Discover and read the best of Twitter Threads about #100DayswithMachinelearning

Day 58 of #100DayswithMachineLearning

Topic - Stochastic Gradient Descent (SGD)

A Thread 🧵

SGD is an optimization algorithm often used in machine learning applications to find the model parameters that correspond to the best fit between predicted and actual outputs. It’s an inexact but powerful technique.
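
A minimal sketch of SGD, assuming a linear model with squared-error loss (the thread does not name a specific model). Each update uses the gradient of a single, randomly chosen sample instead of the whole dataset. That makes every step cheap but noisy, which is the "inexact" part.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
    """Fit y ~ X @ w + b with plain SGD on squared error, one sample per update."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch
        for i in rng.permutation(n_samples):
            error = X[i] @ w + b - y[i]   # prediction error for one sample
            w -= lr * error * X[i]        # gradient of 0.5 * error**2 w.r.t. w
            b -= lr * error               # gradient of 0.5 * error**2 w.r.t. b
    return w, b

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0   # noise-free line: slope 2, intercept 1
w, b = sgd_linear_regression(X, y, lr=0.05, epochs=100)
# w should end up close to [2.0] and b close to 1.0
```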

A saddle point (or minimax point) is a point on the surface of the graph of a function where the slopes (derivatives) in orthogonal directions are all zero (a critical point), but which is not a local extremum of the function.

A saddle point (in red) on the graph of z = x² − y² (hyperbolic paraboloid)
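
To check the definition on this example surface, the sketch below (our illustration, not from the thread) confirms that the gradient of z = x² − y² is zero at the origin, and that the Hessian there has one negative and one positive eigenvalue. Mixed signs mean the origin is a critical point that is neither a minimum nor a maximum.

```python
import numpy as np

def grad(x, y):
    # Partial derivatives of z = x**2 - y**2
    return np.array([2 * x, -2 * y])

def hessian(x, y):
    # Second derivatives are constant for this quadratic surface
    return np.array([[2.0, 0.0], [0.0, -2.0]])

g = grad(0.0, 0.0)
eigvals = np.linalg.eigvalsh(hessian(0.0, 0.0))   # ascending order
is_critical = np.allclose(g, 0.0)
is_saddle = bool(is_critical and eigvals.min() < 0 < eigvals.max())
```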

Day 57 of #100dayswithMachinelearning

Topic - Batch Gradient Descent (BGD)

A Thread 🧵

Batch Gradient Descent (BGD) is an optimization algorithm commonly used in ML and optimization problems to minimize a cost function (or maximize an objective function).

It is a type of GD algorithm that updates the model parameters using the average gradient over the entire training dataset at each iteration.

Here's how the BGD algorithm works:

1) Initialize the model parameters: Start by initializing the model parameters, such as weights and biases, with random values.
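
The thread breaks off after step 1. A minimal sketch of the whole loop, assuming linear regression with mean squared error (our choice of model, not stated in the thread): after random initialization, every iteration computes the gradient averaged over all training samples and takes one step against it.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, epochs=500, seed=0):
    """Linear regression with MSE loss; every update uses the whole training set."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Step 1: initialize weights and bias with random values
    w = rng.normal(size=n_features)
    b = rng.normal()
    for _ in range(epochs):
        errors = X @ w + b - y                      # residuals for all samples at once
        grad_w = (2 / n_samples) * (X.T @ errors)   # average MSE gradient w.r.t. w
        grad_b = (2 / n_samples) * errors.sum()     # average MSE gradient w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))
y = X @ np.array([3.0, -1.5]) + 0.5
w, b = batch_gradient_descent(X, y)
```

Compared with the SGD loop, each step here is more expensive but the gradient is exact for the training set, so the path to the minimum is smooth.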

Day 47 of #100dayswithmachinelearning

Topic -- Principal Component Analysis (PCA) Part 1

PCA is a statistical technique for analyzing all the dimensions of a dataset and reducing them as much as possible while preserving as much of the information (variance) as possible.

PCA also lets you visualize multi-dimensional data in 2D or 3D by projecting it onto its first two or three principal components.

Step-by-step explanation of Principal Component Analysis:

1) Standardization
2) Covariance matrix computation
3) Feature vector
4) Recast the data along the principal component axes
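
The four steps above can be sketched in NumPy as follows. This is a minimal version that assumes the data fit in memory and uses an eigendecomposition of the covariance matrix; libraries such as scikit-learn compute PCA via SVD instead, so treat this as an illustration rather than a reference implementation.

```python
import numpy as np

def pca(X, n_components):
    # 1) Standardization: zero mean, unit variance per feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2) Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3) Feature vector: top eigenvectors of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    feature_vector = eigvecs[:, :n_components]
    # 4) Recast the data along the principal component axes
    X_projected = X_std @ feature_vector
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return X_projected, explained_ratio

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Two strongly correlated features plus one independent feature
X = np.hstack([base, 2 * base + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])
Z, ratio = pca(X, n_components=2)
```

Because two of the three features are nearly redundant, two components keep almost all of the variance.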

Day 46 of #100dayswithmachinelearning

Topic -- Curse of Dimensionality

🧵

The curse of dimensionality refers to the phenomenon where the performance of ML algorithms deteriorates as the number of dimensions (features) of the input data increases ⬆️

This is because the volume of the space increases exponentially with the number of dimensions, which causes the data to become sparse and the distances between data points to increase.

Many ML algorithms struggle to find meaningful patterns and relationships in high-dimensional data and may suffer from overfitting or poor generalization. This can lead to longer training times, increased memory requirements, and reduced accuracy and efficiency in predictions.
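
A small NumPy experiment (our illustration, assuming uniformly random points in the unit hypercube) shows why distances lose meaning as dimensions grow: the relative gap between the nearest and farthest point from a query shrinks, so "near" and "far" become hard to tell apart.

```python
import numpy as np

def distance_contrast(n_dims, n_points=500, seed=0):
    """(max - min) / min distance from one random query to random points in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrasts = {d: distance_contrast(d) for d in (2, 10, 100, 1000)}
```

In low dimensions the farthest point is many times farther away than the nearest one; in 1000 dimensions all points sit at almost the same distance.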

Day 45 of #100dayswithmachinelearning

Topic - Feature Construction & Feature Splitting

A Thread 🧵

Feature construction is a critical aspect of feature engineering, which involves the process of creating new features or transforming existing ones to improve the performance of machine learning models.

The goal of feature construction is to extract meaningful information from raw data and represent it in a way that can be effectively used by machine learning algorithms.
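
A minimal pandas sketch of both ideas in the topic. The columns (`name`, `sibsp`, `parch`) and the rows are hypothetical, Titanic-style examples chosen for illustration: construction combines existing columns into a new one, while splitting breaks one column into several parts.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Smith, Mr. John", "Brown, Mrs. Anna", "Lee, Miss. Emma"],
    "sibsp": [1, 1, 0],   # siblings/spouses aboard
    "parch": [0, 0, 0],   # parents/children aboard
})

# Feature construction: combine existing columns into new, more informative ones
df["family_size"] = df["sibsp"] + df["parch"] + 1
df["is_alone"] = (df["family_size"] == 1).astype(int)

# Feature splitting: break the name into surname and title
df[["surname", "rest"]] = df["name"].str.split(", ", n=1, expand=True)
df["title"] = df["rest"].str.split(".", n=1).str[0]
```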

Day 44 of #100dayswithmachinelearning

Topic -- Outlier Detection using Percentile Method

A Thread 🧵

Handling outliers is a crucial part of data analysis.

Outliers can be treated in different ways: trimming, capping, discretization, or treating them as missing values.

Percentile Method -

This technique sets lower and upper percentile thresholds, chosen based on the problem statement; values outside them are treated as outliers.

When the outliers are capped at those thresholds instead of being removed, the method is known as Winsorization.
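
A minimal pandas sketch on made-up height data. The 5th/95th percentiles are only an example threshold; the thread notes the choice depends on the problem. Trimming drops values outside the thresholds, while capping (Winsorization) pulls them back to the thresholds.

```python
import pandas as pd

heights = pd.Series([150, 152, 155, 158, 160, 161, 163, 165, 168, 170, 250])  # toy data

# Example thresholds; pick them based on the problem statement
lower = heights.quantile(0.05)
upper = heights.quantile(0.95)

# Trimming: drop values outside the thresholds
trimmed = heights[(heights >= lower) & (heights <= upper)]

# Capping (Winsorization): replace values outside the thresholds with the thresholds
capped = heights.clip(lower=lower, upper=upper)

print(lower, upper)   # 151.0 210.0
print(len(trimmed))   # 9
```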

Day 43 of #100dayswithmachinelearning

Topic - Outlier Detection and Removal using the IQR Method

A Thread 🧵

The IQR (Interquartile Range) method is a common approach for detecting and removing outliers from a dataset.

IQR is the difference between the 75th and 25th percentiles, i.e. the third and first quartiles (IQR = Q3 − Q1).

Because it relies on quartiles rather than the mean and standard deviation, the IQR method can also remove bad data from left- or right-skewed distributions.

Finding the IQR

To remove these outliers, we start by finding the IQR:

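# df is assumed to be a pandas DataFrame with a 'placement_exam_marks' column
percentile25 = df['placement_exam_marks'].quantile(0.25)
percentile75 = df['placement_exam_marks'].quantile(0.75)
iqr = percentile75 - percentile25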
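
A self-contained sketch of the full procedure. Only the column name comes from the thread; the marks are made-up toy values. The limits use the widely used 1.5 × IQR rule (Tukey's fences), which is an assumption here since the thread is cut off before the limits are defined.

```python
import pandas as pd

# Toy stand-in for the thread's dataset
df = pd.DataFrame({"placement_exam_marks": [10, 12, 14, 15, 16, 18, 20, 22, 25, 95]})

percentile25 = df["placement_exam_marks"].quantile(0.25)
percentile75 = df["placement_exam_marks"].quantile(0.75)
iqr = percentile75 - percentile25

# Common convention: values more than 1.5 * IQR beyond the quartiles are outliers
upper_limit = percentile75 + 1.5 * iqr
lower_limit = percentile25 - 1.5 * iqr

# Trimming: keep only rows inside the limits
new_df = df[(df["placement_exam_marks"] >= lower_limit) & (df["placement_exam_marks"] <= upper_limit)]

# Capping: clip values to the limits instead of dropping rows
capped_df = df.assign(placement_exam_marks=df["placement_exam_marks"].clip(lower_limit, upper_limit))
```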

Day 42 of #100dayswithMachinelearning

Topic -- Outlier Detection & Removal using Z-score Method

A Thread 🧵

The Z-score method is a statistical approach for detecting and removing outliers in a dataset. An outlier is an observation that lies far away from the other observations in the dataset. Such observations can significantly affect the statistical properties of the dataset and lead to erroneous conclusions.

Approach for Outliers

- The first step is to set the upper and lower limits, usually the mean ± 3 standard deviations (observations with |z| > 3 are flagged as outliers).

- The first technique for dealing with outliers is trimming (dropping the observations outside the limits). Regardless of the data distribution you are working with, trimming is an applicable and proven technique in most cases.
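
A minimal sketch on made-up data: roughly normal marks with two injected extreme values. It assumes the usual ±3 standard deviation limits; the Z-score approach is most meaningful when the data are approximately normal.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy data: 200 roughly normal marks plus two injected extreme values
marks = pd.Series(np.concatenate([rng.normal(70, 5, 200), [120, 10]]))

mean, std = marks.mean(), marks.std()
# Step 1: upper and lower limits at mean +/- 3 standard deviations
upper_limit = mean + 3 * std
lower_limit = mean - 3 * std

# The same rule expressed with z-scores
z = (marks - mean) / std

# Trimming: drop observations outside the limits (|z| > 3)
trimmed = marks[z.abs() <= 3]
```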

Day 33 of #100dayswithmachinelearning

Topic - Handling Mixed Variable in Feature Engineering 👨‍💻

A Thread 🧵

Handling missing values is very important because many machine learning algorithms do not support data with missing values. Missing values in a dataset can cause errors and poor performance with some machine learning algorithms.

Variable deletion involves dropping variables (columns) with missing values on a case-by-case basis. This method makes sense when a variable has a lot of missing values and is of relatively little importance.
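
A minimal pandas sketch of variable deletion. The 50% missing-value threshold and the toy columns are hypothetical; in practice the decision is case-by-case and also depends on how important the variable is, which code alone cannot judge.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 31, 40, 29],
    "cabin": [np.nan, np.nan, "C85", np.nan, np.nan],   # mostly missing
    "fare": [7.25, 71.28, 8.05, np.nan, 13.0],
})

# Fraction of missing values in each column
missing_ratio = df.isnull().mean()

# Hypothetical rule: drop columns with more than 50% missing values
threshold = 0.5
df_reduced = df.drop(columns=missing_ratio[missing_ratio > threshold].index)
```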

Day 32 of #100dayswithmachinelearning

Topic - Encode Numerical Features (Binning & Binarization)

A Thread 🧵

Discretization: the process of transforming continuous variables into categorical variables by creating a set of contiguous intervals that span the range of the variable’s values. It is also known as “Binning”, where “bin” is another name for an interval.

Benefits of Discretization or Binning:

1⃣ Handles outliers better.
2⃣ Improves the spread of values.
3⃣ Minimizes the effects of small observation errors.
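
A minimal pandas sketch of binning and binarization on made-up ages. The bin edges and the threshold of 60 are hypothetical choices. scikit-learn offers `KBinsDiscretizer` and `Binarizer` for the same ideas inside a pipeline.

```python
import pandas as pd

ages = pd.Series([3, 15, 22, 35, 47, 58, 64, 80])

# Equal-width binning: 4 contiguous intervals spanning the range of the values
equal_width = pd.cut(ages, bins=4, labels=False)

# Equal-frequency (quantile) binning: each bin holds about the same number of values
equal_freq = pd.qcut(ages, q=4, labels=False)

# Custom bins from domain knowledge (hypothetical edges)
age_group = pd.cut(ages, bins=[0, 18, 60, 100], labels=["child", "adult", "senior"])

# Binarization: 1 if the value exceeds a threshold, else 0
is_senior = (ages > 60).astype(int)
```

Equal-frequency binning is what spreads the values evenly across bins, while equal-width binning keeps the original scale of the intervals.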

Day 31 of #100dayswithMachinelearning

Topic - Power Transformer in ML

A Thread 🧵

🔸Power transformation techniques are a type of feature transformation in which a power is applied to the data observations to transform the data, typically to make the distribution closer to normal (Gaussian).

🔸Two types of Power Transformation techniques:

1⃣ Box-Cox Transform
2⃣ Yeo-Johnson Transform

▶️Box-Cox Transform :

It transforms the data observations by raising them to a power, denoted by lambda (λ). There are two cases for the power in this transform: λ equal to 0 and λ not equal to 0.
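
A minimal NumPy sketch of the two Box-Cox cases: (x^λ − 1) / λ when λ ≠ 0, and ln(x) when λ = 0. Box-Cox is only defined for strictly positive values. Here λ is passed in by hand; `scipy.stats.boxcox` and scikit-learn's `PowerTransformer(method="box-cox")` instead estimate λ from the data by maximum likelihood.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform with a fixed lambda; x must be strictly positive."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(x)             # case lambda = 0
    return (x ** lam - 1) / lam      # case lambda != 0

log_case = box_cox([1.0, np.e], 0)
linear_case = box_cox([1.0, 2.0, 3.0], 1)
```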

Day 22 of #100dayswithMachinelearning

Topic -- Pandas Profiling

A Thread🧵

🔸Pandas Profiling generates a report for a dataset, with many features and customization options for the generated report.

🔸Let’s explore all the sections of the report one by one.

1. Overview
This section consists of the 3 tabs: Overview, Warnings, and Reproduction.

2. Variables
This section of the report gives a detailed analysis of all the variables/columns/features of the dataset. The information presented varies depending upon the data type of variable.

Day 21 of #100dayswithmachinelearning

Topic -- EDA using Bivariate and Multivariate Analysis

🧵

✅Bivariate Analysis
Bi means two and variate means variable, so here there are two variables. The analysis explores the relationship between the two variables, such as whether a change in one is associated with a change in the other.

Three common techniques -

🔸Scatter Plot
🔸Linear Correlation
🔸Chi-square Test
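
A minimal sketch of two of these techniques on made-up data. The Pearson correlation measures the linear relationship behind a scatter plot of two numeric variables, and the chi-square statistic compares observed counts of two categorical variables with the counts expected if they were independent. `scipy.stats.chi2_contingency` computes the same test, but for 2×2 tables it applies Yates' continuity correction by default, so its statistic differs from this uncorrected one.

```python
import numpy as np
import pandas as pd

# Two numeric variables (what a scatter plot would show)
df = pd.DataFrame({"hours_studied": [1, 2, 3, 4, 5, 6],
                   "exam_score": [52, 55, 61, 64, 70, 73]})
pearson_r = df["hours_studied"].corr(df["exam_score"])   # linear correlation in [-1, 1]

# Chi-square test of independence for two categorical variables
observed = np.array([[10, 20],    # rows: group A / group B
                     [20, 10]])   # columns: outcome yes / no
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()   # counts under independence
chi2 = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
```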

✅Multivariate Analysis
Multivariate analysis is required when more than two variables have to be analyzed simultaneously.

Types -

🔸Cluster Analysis
🔸Factor Analysis
🔸Multiple Regression Analysis
🔸Principal Component Analysis
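
As one example from this list, multiple regression relates a target to several predictors at once. A minimal least-squares sketch with NumPy, assuming made-up, noise-free data so the fitted coefficients can be checked exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # three predictor variables
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 4.0                     # noise-free target with intercept 4

# Add a column of ones so the intercept is estimated together with the slopes
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slopes = coef[0], coef[1:]
```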

Day 20 of #100DayswithMachineLearning

Topic - EDA using Univariate Analysis

🧵

🔸Uni means one and variate means variable, so univariate analysis deals with only one variable. Its objective is to describe and summarize the data and find patterns in it.

🔸Univariate data can be described through:

- Frequency Distribution Tables
- Bar Charts
- Histograms
- Pie Charts
- Frequency Polygons
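
A minimal sketch of the numbers behind these charts, using made-up data: a frequency table for a categorical variable (the input to bar and pie charts), and binned counts for a numeric variable (the input to a histogram, whose bin midpoints give the points of a frequency polygon). Plotting itself, e.g. `value_counts().plot.bar()`, additionally requires matplotlib.

```python
import numpy as np
import pandas as pd

grades = pd.Series(["A", "B", "B", "C", "A", "B", "D", "C", "B"])
# Frequency distribution table (absolute and relative)
freq_table = grades.value_counts().sort_index()
relative_freq = grades.value_counts(normalize=True).sort_index()

scores = np.array([45, 52, 58, 61, 64, 67, 70, 73, 78, 85, 91])
# Histogram data: counts of values in 5 equal-width bins
counts, edges = np.histogram(scores, bins=5)
midpoints = (edges[:-1] + edges[1:]) / 2   # x-coordinates for a frequency polygon
```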

Day 19 of #100DayswithMachineLearning

Topic - Understanding Your Data in ML

🧵

🔸The quality and quantity of data available for training and testing play a significant role in determining the performance of an ML model.

🔸ML algorithms use data to learn patterns and relationships between the input variables and the target output, which can then be used for prediction or classification tasks.

🔸Data can be divided into training and testing sets. The training set is used to train the model, and the testing set is used to evaluate the performance of the model. It is important to ensure that the data is split in a random and representative way.
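
A minimal sketch of a random train/test split with NumPy; the 80/20 ratio and the helper name `train_test_split_indices` are hypothetical. In practice, scikit-learn's `train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)` does the same, and `stratify` keeps class proportions equal in both parts, which helps the split stay representative.

```python
import numpy as np

def train_test_split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle the indices randomly, then cut them into train and test parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]

X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)
train_idx, test_idx = train_test_split_indices(len(X), test_size=0.2)
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```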

Day 12 of #100DayswithMachineLearning

Topic - Installing Anaconda & Basic Knowledge of These Tools - Jupyter Notebook || Google Colab

🧵

1⃣ Anaconda is a distribution of the #Python and R #programming languages for scientific computing (#datascience, #machinelearning applications, large-scale #data processing, predictive analytics, etc.), that aims to simplify package management and #deployment.

2⃣ The #JupyterNotebook is an open source #web application that you can use to create and share documents that contain live #code, equations, #visualizations, and text. Jupyter #Notebook is maintained by the people at Project Jupyter.

Blog Link - realpython.com/jupyter-notebo…

Day 11 of #100dayswithMachineLearning

Topic : Tensors in Machine Learning

🧵

A tensor is a container that can house data in N #dimensions. Often (and erroneously) used interchangeably with the matrix (which is specifically a 2-dimensional #tensor), tensors are generalizations of #matrices to N-dimensional space.

Tensor notation is much like matrix notation.

Tensors are more than simply a data container, however. Aside from holding numeric #data, tensors also include descriptions of the valid linear #transformations between tensors. Examples include the cross product and the dot product.
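
A minimal sketch of the data-container side of tensors using NumPy arrays (our choice of library; the thread names none). An array's `ndim` is its number of dimensions: 0 for a scalar, 1 for a vector, 2 for a matrix, and so on. The last lines show the two operations the thread mentions, the dot product and the cross product.

```python
import numpy as np

scalar = np.array(5.0)                        # 0-D tensor
vector = np.array([1.0, 2.0, 3.0])            # 1-D tensor
matrix = np.array([[1.0, 0.0], [0.0, 1.0]])   # 2-D tensor (a matrix)
tensor3d = np.zeros((2, 3, 4))                # 3-D tensor, e.g. 2 samples x 3 rows x 4 columns

ranks = {name: t.ndim for name, t in
         [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("3-D", tensor3d)]}

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
dot = np.dot(a, b)      # 0.0, since the vectors are orthogonal
cross = np.cross(a, b)  # unit vector along the third axis
```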

Blog Link - kdnuggets.com/2018/05/wtf-te…

Day 10 of #100dayswithMachineLearning

Topic - Data Engineer Vs Data Analyst Vs Data Scientist Vs ML Engineer

🧵

Do you want to start a career in #DataScience or #MachineLearning, but are confused about the different job titles in this data-driven field and the skill sets needed to excel in each?

#Data powers today's world, a world transformed radically by #data @avizyt

This article aims to demystify the different job titles in #datascience and #machinelearning career paths. We will look at job titles such as #DataAnalyst, Data Scientist, Data Engineer, and Machine Learning #Engineer.

Blog Link - medium.com/campusx/explor…

Day 9⃣ of #100dayswithMachineLearning

Topic - Machine Learning Development Life Cycle

🧵

#Machinelearning gives computer systems the ability to learn automatically without being explicitly #programmed. But how does a machine learning system work? It can be described using the machine learning #lifecycle.

The #Machinelearning life cycle is a cyclic process for building an efficient machine learning #project. Its main purpose is to find a solution to the problem the project addresses.

The machine learning life cycle involves seven major steps, which are given below:
#DataScience