This article covers methods for quality-checking the predictions of a machine learning model before you deploy it to production.

It often happens that a data science team spends weeks, even months, understanding the data, performing state-of-the-art feature engineering, trying various machine learning and deep learning techniques, and tuning hyperparameters, and then builds the ULTIMATE model to make predictions.

Yet when this machine learning model is put into production, the business results fall short of those seen in development.

We shall apply the subpopulation analysis technique to the model predictions, and understand why it is important to look at these subpopulations and how to do such an…


In this article, we shall discuss one of the ubiquitous steps in the machine learning pipeline: feature scaling. The article grew out of a coffee discussion in my office about which models are actually affected by feature scaling, and what the best way to do it is: to normalize, to standardize, or something else?

In addition to the above, this article also gives a gentle introduction to feature scaling, the various feature scaling techniques, how scaling might lead to data leakage, when to perform feature scaling, and when NOT to…
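To make the link between scaling and data leakage concrete, here is a minimal sketch in plain Python. The helper names (`fit_standardizer`, `standardize`) and the toy values are hypothetical and not taken from the article. It assumes the leakage in question is the common one, where scaling statistics are computed on the full dataset before the train/test split.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn the mean and standard deviation from the given values only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0:
        sigma = 1.0  # constant feature: avoid dividing by zero
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply previously learned statistics to any split."""
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature, already split into train and test.
train = [10.0, 12.0, 14.0, 16.0]
test = [18.0, 40.0]

# Leak-free: statistics come from the training split only.
mu, sigma = fit_standardizer(train)
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)

# Leaky alternative: the test values influence the statistics,
# so information about unseen data reaches the training features.
leaky_mu, leaky_sigma = fit_standardizer(train + test)

print(mu, leaky_mu)
```

The two means differ, which shows that the leaky version has already "seen" the test values before any model is trained.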


Ever wondered how we can leverage the outcome of a probabilistic time series forecast? How can atoti be used on top of the output of a neural network?

In this article, we will discuss how we can use atoti to leverage probabilistic time series forecasting from a deep learning model.

Here, we have used GluonTS, a library from AWS, to produce a probabilistic time series forecast of the sales figures of a salad manufacturer. This probabilistic forecast is then used to optimize the inventory and find the optimum number of refrigerators needed to store the salad products.
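As a rough illustration of how a probabilistic forecast can feed an inventory decision, the sketch below takes a list of sampled demand values for one day and reads off quantiles. Everything in it is an assumption for illustration: the sample values, the nearest-rank quantile rule, and the refrigerator capacity are hypothetical, and no GluonTS or atoti API is used.

```python
import math


def empirical_quantile(samples, q):
    """Nearest-rank empirical quantile of a list of samples, for 0 < q <= 1."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered))
    return ordered[rank - 1]


def refrigerators_needed(units, capacity_per_fridge):
    """Smallest number of refrigerators that can hold the given units."""
    return math.ceil(units / capacity_per_fridge)


# Hypothetical sampled demand for a single day (one value per sample path).
demand_samples = [80, 95, 100, 105, 110, 120, 125, 130, 150, 170]

median_demand = empirical_quantile(demand_samples, 0.5)
p90_demand = empirical_quantile(demand_samples, 0.9)

# Hypothetical capacity: 40 units per refrigerator.
fridges_median = refrigerators_needed(median_demand, 40)
fridges_p90 = refrigerators_needed(p90_demand, 40)

print(median_demand, p90_demand, fridges_median, fridges_p90)
```

Planning for a higher quantile lowers the risk of running out of storage at the cost of extra refrigerators, which is the trade-off a point forecast cannot express.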

If Tarot cards were right with probability = 1, we would not have needed probabilistic forecasting at all!

What is probabilistic time series forecasting and why do we need it?

The concept of…


In this article, we will explore a way to optimize inventory management using quantile-based time series predictions and atoti, a data visualization platform with an aggregation engine and native support for multidimensional and what-if analysis.

Inventory management is one of the most critical components of manufacturing and supply chain processes. It is also one of the major challenges that industries face.

When it comes to sales in manufacturing industries, both extreme scenarios are unfavorable:

  • When there is unexpectedly high demand, it will cause stock-outs
  • When there is unexpectedly…


‘Data leakage’ is a ubiquitous term in predictive modeling and a staple of most Kagglers’ vocabulary.

If your model is performing too well, reflect on your methods before popping open the champagne.

“Too good to be true” performance is “a dead giveaway” of data leakage.

Predictive modeling & Cross-validation

Predictive modeling focuses on making predictions on novel data using a model that learns patterns from the training data.

This is a challenging problem. It’s hard because the model cannot be evaluated on data that is not yet available.

Hence, the existing training data is used both to learn the patterns and to test the model’s ability to accurately predict an…


Question:

I’m trying to implement a Naive Bayes model following Bayes’ theorem. The problem I face is that some class labels are missing when applying the theorem, which leads to an overall probability estimate of zero. How should such missing classes be handled when using the Naive Bayes model?

Answer:

Background: Let us first recall what the Naive Bayes algorithm is. As the name suggests, it is based on Bayes’ theorem from probability and statistics, with the naive assumption that the features are independent of each other given the class.
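Because of the independence assumption, the score for a class is the prior multiplied by one conditional probability per feature, so a single zero count wipes out the whole product. The sketch below illustrates this with hypothetical counts. The `alpha` parameter shows additive (Laplace) smoothing, a common remedy that is assumed here for illustration and is not necessarily the approach the full answer takes.

```python
def conditional_prob(count, class_total, n_values, alpha=0.0):
    """Estimate P(feature value | class) from counts.

    alpha = 0 gives the raw frequency; alpha > 0 applies additive smoothing
    over n_values possible values of the feature.
    """
    return (count + alpha) / (class_total + alpha * n_values)


def naive_bayes_score(prior, feature_counts, class_total, n_values, alpha=0.0):
    """Unnormalized class score: prior times the product of conditionals."""
    score = prior
    for count in feature_counts:
        score *= conditional_prob(count, class_total, n_values, alpha)
    return score


# Hypothetical class with 10 training rows and prior 0.5.
# For the three observed feature values, the class has counts 4, 0 and 6;
# each feature is assumed to take 3 possible values.
counts = [4, 0, 6]

raw_score = naive_bayes_score(0.5, counts, class_total=10, n_values=3)
smoothed_score = naive_bayes_score(0.5, counts, class_total=10, n_values=3, alpha=1.0)

print(raw_score, smoothed_score)
```

With raw frequencies the score collapses to zero because one value was never seen with the class, whereas the smoothed estimate stays small but positive.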

Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that…

Raghav Vashisht

Master in Data Science and Business Analytics, ESSEC Business School- CentraleSupélec
