2. Mathematical derivation
Previously, a variety of dimensionality reduction techniques (DRTs) were briefly introduced. In this chapter we will work through the mathematical details of one linear dimensionality reduction technique, and probably the most popular one: Principal Component Analysis (PCA).
Usually I tend to work in a bottom-up fashion with respect to the simplicity of a machine learning project: whether it is the model itself or a preprocessing step, I start with linear functions unless I have prior information about the underlying data structure.
The goal of PCA is to find orthogonal bases (a.k.a. principal components/eigenvectors) that preserve the largest variance of the original data. …
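The idea above can be sketched in a few lines of NumPy: center the data, eigendecompose the covariance matrix, and project onto the top components. This is a minimal illustrative sketch (the toy data and scaling matrix are made up), not the full derivation.

```python
import numpy as np

# Toy data: 100 points in 3 dimensions, stretched so one axis
# carries most of the variance (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.2])

# 1. Center the data so the covariance is computed about the mean
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features
cov = np.cov(X_centered, rowvar=False)

# 3. Eigendecomposition: eigenvectors are the principal components,
#    eigenvalues are the variance each component preserves
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort components by descending preserved variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Project onto the top-k components to reduce dimensionality
k = 2
X_reduced = X_centered @ eigvecs[:, :k]
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order for symmetric matrices, hence the explicit re-sort.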
Big data is data that is high in the 3Vs: Volume, Variety, and Velocity. Big data has benefits, but it also comes with consequences such as increased computational complexity, noisy data, and difficulty of visualization.
Having high dimensionality leads to the well-known problem of the “Curse of Dimensionality”: as the dimension increases, the number of rows must increase exponentially in order to retain the same explanatory power. …
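Two symptoms of this curse can be demonstrated numerically. The sketch below is illustrative only: the first part is the exponential-rows arithmetic, the second shows "distance concentration" — in high dimensions, pairwise distances between random points all look alike, which hurts any method that relies on nearness.

```python
import numpy as np

# Symptom 1: keeping a density of 10 sample points per axis requires
# 10**d rows in d dimensions -- exponential growth.
points_needed = {d: 10 ** d for d in (1, 2, 3, 6)}
# 1-D needs 10 rows; 6-D already needs 1,000,000.

# Symptom 2: pairwise distances concentrate as the dimension grows,
# so "nearest" and "farthest" neighbours become hard to tell apart.
rng = np.random.default_rng(0)

def relative_contrast(dim, n=100):
    X = rng.uniform(size=(n, dim))
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d = d[np.triu_indices(n, k=1)]   # upper-triangle pairwise distances
    return d.std() / d.mean()        # spread relative to the mean distance
```

In 2 dimensions the distances vary a lot relative to their mean; in 500 dimensions they are nearly identical.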
While developing a reinforcement learning model at work, I came across an autoregressive model used to update the policy of an RL agent. This activated a very deep and unvisited part of my brain: the “already learned” part. I remembered that I had written a blog on using ARIMA, which combines an autoregressive model with a moving average model. I thought it would be a good idea to recap my understanding and also bring my blog out into the light. So here it goes.
Before going into ARIMA, we must recap what a “time series” is.
Data points that are observed…
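As a quick illustration of the “AR” in ARIMA, here is a minimal NumPy sketch (the process parameters are made up): an AR(1) series, where each observation depends linearly on the previous one plus noise, and a least-squares recovery of that dependence.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate an AR(1) time series: x_t = phi * x_{t-1} + noise
phi_true = 0.8
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal(scale=0.5)

# Recover phi by regressing x_t on x_{t-1} (ordinary least squares)
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
```

In practice one would use a library such as statsmodels to fit a full ARIMA model; the sketch above only shows why "autoregressive" means regressing the series on its own past.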
In my opinion, it’s useful to know why a certain technology or algorithm was developed. So whenever there is a new technology or algorithm I want to study, I often ask myself these questions: What limitations did the prior algorithm or technique face that led to this development? How does it work? And is there a problem it didn’t solve, or did a new problem arise?
We will take the same approach and quickly learn why Federated Learning was developed.
What limitations did the prior algorithm or technique face that led to the development of such an algorithm…
Many people have probably learned that when classes are imbalanced, they should use performance metrics such as recall, precision, F1-score, or the ROC curve instead of the standard accuracy measure.
I’m not sure if it’s just me, but I didn’t know that training with imbalanced data influenced model performance. While studying deep learning concepts on YouTube (@4:10), I learned that imbalanced class labels affect how the model is trained and that the best method is to oversample the minority class.
This got me thinking: “What about machine learning algorithms? Why did I never try to balance class labels?”
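The oversampling idea mentioned above can be sketched without any special library: resample minority-class rows with replacement until both classes match. The toy data (90 vs. 10 labels) is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced toy dataset: 90 rows of class 0, 10 rows of class 1
y = np.array([0] * 90 + [1] * 10)
X = rng.normal(size=(100, 3))

# Random oversampling: draw minority-class rows with replacement
# until both classes have the same count
minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])

X_bal, y_bal = X[idx], y[idx]
```

Libraries such as imbalanced-learn offer more sophisticated variants (e.g. SMOTE), but the principle is the same.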
Today we are going to talk about how to regularize linear regression, that is, how to make our model generalize better and become less prone to overfitting during training.
Linear regression is a simple yet powerful method, as it provides fast predictions once trained. Prediction speed is one of the most important features to consider when developing a machine learning model, because in the real world there are customers waiting for predictions, and the longer they wait, the worse the customer experience.
When linear regression is underfitting, there is no other way (given that you can’t add more data) than to increase the complexity of the model, making it…
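One standard way to regularize linear regression is ridge regression, which penalizes large weights. Here is a minimal NumPy sketch of the closed-form solution (the toy data and the penalty strength `alpha=10.0` are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: 50 rows, 5 features, known true weights
X = rng.normal(size=(50, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ w_true + rng.normal(scale=0.1, size=50)

def ridge_fit(X, y, alpha):
    # Closed-form ridge solution: (X^T X + alpha * I)^{-1} X^T y.
    # alpha = 0 recovers ordinary least squares; larger alpha
    # shrinks the weights toward zero, trading bias for variance.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

w_ols = ridge_fit(X, y, alpha=0.0)
w_ridge = ridge_fit(X, y, alpha=10.0)
```

The ridge weights are strictly smaller in norm than the unregularized ones, which is exactly the mechanism that tames overfitting.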
Today we are going to go through the breast_cancer dataset from sklearn to understand different types of performance metrics for classification problems, and why one is sometimes preferred over another.
Even though it is a simple topic, the answer does not immediately arise when someone asks “what is precision/recall?” By going over examples, I hope everyone, including myself, gets a firm grasp of this topic.
Most people use “accuracy” as the performance metric in classification problems. Sometimes it is okay to use, but when we have imbalanced labels we should avoid it.
Metrics are what we use to compare different…
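A small worked example shows why accuracy misleads on imbalanced labels. The confusion counts below are hypothetical: 95 negatives, 5 positives, and a classifier that almost always predicts "negative".

```python
# Hypothetical confusion counts for an imbalanced test set
tp, fn = 1, 4     # positives caught / positives missed
tn, fp = 94, 1    # negatives correct / negatives wrong

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # fraction of all correct
precision = tp / (tp + fp)                    # of predicted positives, how many real
recall    = tp / (tp + fn)                    # of real positives, how many caught
f1 = 2 * precision * recall / (precision + recall)
```

Accuracy comes out at 0.95, which looks excellent, while recall is only 0.2: the model misses 4 out of 5 positive cases. That gap is why recall, precision, and F1 matter under class imbalance.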
From multiple data scientist interviews, there are a few questions I have been asked frequently, and one of them is “explain how logistic regression works.” I understood the concept at a high level and knew how to implement it, but I got stuck when asked to explain “what is the sigmoid function?”, “what cost function does logistic regression use, and why?”, etc. I could not answer them. …
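Those two interview questions have short concrete answers, sketched below: the sigmoid squashes any real-valued score into (0, 1) so it can be read as a probability, and logistic regression minimizes the log loss (cross-entropy), which heavily penalizes confident wrong predictions.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into (0, 1); sigmoid(0) == 0.5 is the
    # natural decision boundary between the two classes
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, p_pred):
    # Cross-entropy cost used by logistic regression. It is convex in
    # the model weights (unlike squared error applied after the sigmoid),
    # so gradient descent reaches the global optimum.
    p_pred = np.clip(p_pred, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(y_true * np.log(p_pred)
                    + (1 - y_true) * np.log(1 - p_pred))
```

For a true label of 1, predicting 0.9 incurs a much smaller loss than predicting 0.1, which is exactly the behaviour we want when training a probabilistic classifier.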
Today I am going to talk about one of the topics that has confused me for a while: the Central Limit Theorem (CLT).
I had always convinced myself that having lots of data will generate a normal distribution, which is nonsense if you take the time to think about it: why would collecting a large number of data points lead to a normal distribution (unless the data distribution is itself normal)?
Before learning about the CLT, it is important to clearly understand the difference between a data distribution and a sampling distribution.
Data Distribution: A function or listing showing all possible values of data, how often each data…
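The distinction can be simulated directly. In the sketch below (sample sizes are chosen arbitrarily for illustration), the *data* distribution is uniform, clearly not normal; yet the *sampling* distribution of the mean clusters tightly around the true mean, with spread σ/√n, which is what the CLT actually claims.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data distribution: uniform(0, 1) -- clearly NOT normal.
# Its true mean is 0.5 and true std is sqrt(1/12) ~= 0.2887.
sample_size = 50
n_samples = 5000

# Sampling distribution of the mean: draw many samples of size 50
# and record the mean of each one
sample_means = rng.uniform(size=(n_samples, sample_size)).mean(axis=1)

# CLT prediction: the means are approximately normal around 0.5
# with standard deviation sigma / sqrt(n)
expected_spread = np.sqrt(1 / 12) / np.sqrt(sample_size)
```

So it is not the raw data that becomes normal as you collect more of it; it is the distribution of sample means that does.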
Today we will go over one of the bagging methods, called Random Forest, to predict one of the air pollutants in China. The model is called a “forest” because it is built from multiple decision trees, and “random” because it selects a subset of rows (not features) for each tree and uses a subset of features (columns) at each node split.
I assume readers know what bagging and decision trees are; if not, I recommend reading my previous blogs for a brief overview: Decision Tree part I, Decision Tree part II, and Ensemble Learning methods.
Before we get started, I’ve noticed a lot…
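The two "random" ingredients described above can be sketched in a few lines (the row/feature counts are made up for illustration; a real implementation like sklearn's `RandomForestRegressor` redraws the feature subset at every split, not once per tree as shown here):

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_features = 100, 8
n_trees = 5
max_features = 3  # features considered at a split

# Ingredient 1: each tree is trained on a bootstrap sample of the rows
# (drawn with replacement, so some rows repeat and some are left out)
bootstrap_rows = [rng.choice(n_rows, size=n_rows, replace=True)
                  for _ in range(n_trees)]

# Ingredient 2: each split considers only a random subset of the
# features (one draw shown per tree here, for brevity)
split_features = [rng.choice(n_features, size=max_features, replace=False)
                  for _ in range(n_trees)]
```

The rows left out of each bootstrap sample ("out-of-bag" rows) are what allow a Random Forest to estimate its own generalization error without a separate validation set.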
Data Scientist passionate about helping the environment.