Binary classification is a strong tool in machine learning, as many problems are binary or can be reformulated as binary classification problems. The ability to understand how your model performed, and to debug it, is as important as building the model itself, if not more so. We will talk about how to measure the performance of binary classification problems; more specifically, the following topics will be covered: the ROC curve, the area under the ROC curve (AUC), the precision-recall curve, and average precision.
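As a small taste of what's ahead: AUC can be interpreted as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, which means it can be computed from ranks alone. Here is a minimal pure-Python sketch of that rank-based (Mann-Whitney) formulation; the labels and scores below are made-up toy data, not from any real model.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative)."""
    pairs = sorted(zip(scores, labels))
    # Assign 1-based ranks; tied scores share the average rank of their group.
    ranks = {}
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j + 1) / 2
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    pos = sum(1 for _, y in pairs if y == 1)
    neg = len(pairs) - pos
    rank_sum = sum(ranks[k] for k, (_, y) in enumerate(pairs) if y == 1)
    # Mann-Whitney U statistic normalized to [0, 1]
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

In practice you would reach for `sklearn.metrics.roc_auc_score`, but seeing the rank formulation makes it clear why AUC is insensitive to the classification threshold.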
In the previous blog we briefly covered the ROC curve and AUC. Today we…
Say I’ve developed a recommendation system that I think will beat the original system, leading to a better CTR (# clicks / # exposures). After conducting an A/B test in which the newer system achieved a higher CTR, it is critical to show that the result was not due to chance. This process is referred to as hypothesis testing, and today we will talk about the permutation test, which leverages the Monte Carlo method to carry out hypothesis testing. …
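To make the idea concrete before we dig in: under the null hypothesis the two systems are interchangeable, so we can repeatedly shuffle the pooled click data, re-split it, and see how often a difference at least as large as the observed one appears by chance. A minimal Monte Carlo sketch follows; the per-user click indicators are made-up toy data.

```python
import random

def permutation_test(a, b, n_iter=10_000, seed=0):
    """One-sided two-sample permutation test for a difference in means (CTRs)."""
    rng = random.Random(seed)
    observed = sum(b) / len(b) - sum(a) / len(a)
    pooled = a + b
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = sum(perm_b) / len(perm_b) - sum(perm_a) / len(perm_a)
        if diff >= observed:
            count += 1
    # p-value: fraction of shuffles at least as extreme as what we observed
    return count / n_iter

# Hypothetical per-user click indicators (1 = clicked), 50 users per system
old = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 5   # CTR = 0.3
new = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1] * 5   # CTR = 0.6
p = permutation_test(old, new)
print(p)  # small p-value: the CTR lift is unlikely to be pure chance
```

A small p-value (conventionally below 0.05) lets us reject the hypothesis that the lift was due to chance.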
A long time ago, at an interview for a Data Analyst position, I was asked, “Why is the Beta distribution used with Bayes’ theorem?” At the time I had never heard of the Beta distribution, so I remember saying “I don’t know.” Recently at work, while developing Thompson Sampling and contextual bandits, the Beta distribution appeared once again. So now I’ve decided to take a deeper look into the Beta distribution and really understand it once and for all.
We will focus only on the statistical viewpoint, so knowledge of Reinforcement Learning and Thompson Sampling is unnecessary.
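That said, it is worth one glance at why the Beta distribution shows up in Thompson Sampling at all: it is the conjugate prior for a Bernoulli success rate, so after observing `a` clicks and `b` misses the posterior is simply Beta(1 + a, 1 + b). Below is a minimal sketch using only the standard library’s `random.betavariate`; the two arms’ “true” click rates are invented for illustration.

```python
import random

rng = random.Random(42)
true_ctr = [0.04, 0.06]   # hypothetical true click rates for two arms
alpha = [1, 1]            # Beta(1, 1) = uniform prior on each arm's CTR
beta = [1, 1]

for _ in range(5000):
    # Thompson sampling: draw one sample from each arm's posterior,
    # then play the arm whose sample is largest
    samples = [rng.betavariate(alpha[i], beta[i]) for i in range(2)]
    arm = samples.index(max(samples))
    clicked = rng.random() < true_ctr[arm]
    # Conjugate update: Beta(a, b) -> Beta(a + clicks, b + misses)
    alpha[arm] += clicked
    beta[arm] += 1 - clicked

print(alpha, beta)  # the better arm tends to accumulate most of the pulls
```

The point to notice is how cheap the update is: incrementing two counters is the entire Bayesian machinery, which is exactly what makes the Beta distribution so convenient here.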
Previously, we learned what the binomial distribution is and how we can use it to solve coin-toss problems. If you need a refresher on the binomial distribution, it is recommended that you review the previous blog.
Part II will focus on mitigating the limitations of the binomial distribution we introduced previously, using the normal approximation and continuity correction.
The goal of the normal approximation is this: if a binomial distribution satisfies certain conditions, treat it like a normal distribution, so that we can apply tricks that are applicable to normal distributions.
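Here is a minimal sketch of the idea, comparing the exact binomial CDF with the normal approximation for Binomial(100, 0.5); the continuity correction is the `k + 0.5` term, which accounts for approximating a discrete distribution by a continuous one. Only the standard library is used.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma^2) via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p, k = 100, 0.5, 45
mu = n * p                        # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of the binomial

exact = binom_cdf(k, n, p)
approx = normal_cdf(k + 0.5, mu, sigma)  # continuity correction: k + 0.5
print(exact, approx)  # the two values agree to within about 0.001
```

Dropping the `+ 0.5` makes the approximation noticeably worse, which is exactly the gap the continuity correction is designed to close.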
We know that the normal distribution is…
At work, colleagues and I had a discussion about an algorithm that uses the binomial distribution and its normal approximation. Even though I had studied these concepts before, I could not immediately visualize what happens as p gets larger, so I thought a refresher would be a good idea. This also shows how applicable statistics concepts are; they are not just for finding the probability of a coin toss.
We will also discuss the limitations of the binomial distribution in Part I; then the way of mitigating them by the normal approximation will…
2. Mathematical derivation
Previously, a variety of DRTs (dimensionality reduction techniques) were briefly introduced. In this chapter we will work through the mathematical details of one linear dimensionality reduction technique, probably the most popular one: Principal Component Analysis.
I usually tend to work in a bottom-up fashion with respect to the complexity of a machine learning project, so whether it is the model itself or a preprocessing step, I start with linear functions unless I have prior information about the underlying data structure.
The goal of PCA is to find orthogonal bases (a.k.a. principal components/eigenvectors) that preserve the largest variance of the original data. …
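To make the "largest variance" goal concrete, here is a minimal sketch for the 2-D case, where the eigen-decomposition of the 2×2 covariance matrix has a closed form; the data points are made-up toy values. (In practice you would use `numpy.linalg.eigh` or `sklearn.decomposition.PCA` for higher dimensions.)

```python
import math

def pca_2d(data):
    """First principal component of 2-D data via the 2x2 covariance matrix."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix entries (divide by n - 1)
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]] (closed form for 2x2)
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # Corresponding eigenvector: (A - lam*I) v = 0  =>  v ∝ (sxy, lam - sxx)
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
        (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1)]
pc, var = pca_2d(data)
print(pc, var)  # unit direction of maximum variance, and the variance along it
```

Projecting each centered point onto `pc` gives its first principal component score; the eigenvalue `var` is exactly the variance the projection preserves.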
Big data is data that is high in the 3Vs: Volume, Variety, and Velocity. There are benefits to big data, but it also comes with consequences such as increased computational complexity, the presence of noisy data, and difficulty of visualization.
High dimensionality leads to the well-known problem of the “Curse of Dimensionality”: as the dimension increases, the number of rows must increase exponentially in order to retain the same explanatory power. …
While developing a reinforcement learning model at work, I came across an autoregressive model being used to update the policy of an RL agent. This activated a very deep and unvisited part of my brain: the “already learned” part. I remembered that I had written a blog on ARIMA, which combines an autoregressive (AR) model with a moving average (MA) model. I thought it would be a good idea to recap my understanding and also bring my blog out into the light. So here it goes.
Before going into ARIMA, we must recap what a “time series” is.
Data points that are observed…
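Before we get to the full ARIMA machinery, the AR part on its own is worth seeing: an AR(1) model simply regresses the series on its own previous value, x[t] = c + φ·x[t−1] + noise. Here is a minimal least-squares sketch; the series is a made-up toy generated without noise, so the fit should recover the true coefficients almost exactly.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (an AR(1) model)."""
    x, y = series[:-1], series[1:]   # regress each value on its predecessor
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / \
          sum((a - mean_x) ** 2 for a in x)
    c = mean_y - phi * mean_x
    return c, phi

# Toy series generated from x[t] = 1 + 0.5 * x[t-1] (no noise term),
# so the fit should return c ≈ 1 and phi ≈ 0.5
series = [0.0]
for _ in range(20):
    series.append(1 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(c, phi)
```

ARIMA generalizes this by using p lags (AR(p)), differencing the series d times to remove trend (the “I”), and adding q lagged error terms (MA(q)); a library such as `statsmodels` handles the full fit.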
In my opinion it is useful to know why a certain technology or algorithm was developed. So whenever there is a new technology or algorithm I want to study, I often ask myself these questions: What limitations did the prior algorithm or technique face that led to this development? How does it work? And is there a problem it did not solve, or did a new problem arise?
We will take the same approach and quickly learn why Federated Learning was developed.
What limitations did the prior algorithm or technique face that led to the development of such an algorithm…
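While the motivation is the subject of the next section, the mechanical core of the canonical algorithm, Federated Averaging (FedAvg), is compact enough to sketch here: each client trains on its own private data, and the server only ever sees parameter vectors, which it averages weighted by each client's dataset size. The parameter values and dataset sizes below are invented toy numbers.

```python
def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: average client parameters, weighted by data size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# One toy round: two clients, each holding a locally trained parameter vector
w_a = [0.2, 1.0]   # client A, trained on 100 local samples
w_b = [0.8, 3.0]   # client B, trained on 300 local samples
global_w = federated_average([w_a, w_b], [100, 300])
print(global_w)  # client B's update dominates, in proportion to its data
```

Note what never leaves the clients: the raw data. Only the weight vectors travel, which is the privacy property that motivated the approach in the first place.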
Many people have probably learned that when classes are imbalanced, they need to use performance metrics such as recall, precision, F1-score, or the ROC curve instead of the standard accuracy measure.
I’m not sure if it’s just me, but I didn’t know that training with imbalanced data influenced model performance. While studying deep learning concepts on YouTube (@4:10), I learned that imbalanced class labels affect how a model is trained, and that the best method is to oversample the class that has fewer examples.
This got me thinking: “What about in machine learning algorithms? Why did I never try to balance class labels?”
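For reference, the oversampling fix mentioned above is simple to implement: randomly duplicate minority-class rows until every class has as many examples as the largest one. A minimal standard-library sketch follows; the rows and labels are toy placeholders. (In practice, be sure to oversample only the training split, never the test set, and note that libraries like `imbalanced-learn` offer this plus smarter variants such as SMOTE.)

```python
import random

def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(v) for v in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        # Keep originals, then sample with replacement to reach the target size
        resampled = members + [rng.choice(members)
                               for _ in range(target - len(members))]
        out_rows.extend(resampled)
        out_labels.extend([y] * target)
    return out_rows, out_labels

rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2          # 8 majority vs 2 minority samples
bal_rows, bal_labels = oversample_minority(rows, labels)
print(bal_labels.count(0), bal_labels.count(1))  # 8 8
```

After resampling, both classes contribute equally to whatever loss or split criterion the model optimizes, which is the whole point of the exercise.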