Machine learning and artificial intelligence are becoming ubiquitous and an integral part of our lives. As they spread, concern about machine learning bias is growing as well.

In this article, we will talk about one of the hot topics in machine learning ethics: how to reduce machine learning bias. We will also discuss the tools and techniques for doing so.

Machine Learning Bias

Machine learning bias, also sometimes known as bias in artificial intelligence, is a phenomenon that occurs when an algorithm produces results that are systemically prejudiced due to erroneous assumptions in the machine learning process.

Bias is prejudice in favor of or against a person, group, or thing, in a way that is considered unfair.

This article covers methods for quality-checking the predictions of a machine learning model before you deploy it to production.

It happens quite often that the data science team has spent weeks, even months, understanding the data, performing state-of-the-art feature engineering, trying various machine learning and deep learning modeling techniques, and tuning the hyperparameters, and has then built the ULTIMATE model to make predictions.

And when this machine learning model is put into production, the business results fall short of those seen in development.

We will apply subpopulation analysis to the model predictions and see why it is important to look at these subpopulations and how to do such an…
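As a rough illustration of the idea, the sketch below computes accuracy separately for each subpopulation and compares it with the overall figure. The function name accuracy_by_group, the labels, the predictions, and the grouping column are all hypothetical and chosen only for this example; the article itself may use a different metric or tooling.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, computed separately for each subpopulation."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {g: hits[g] / totals[g] for g in totals}


# Hypothetical labels, predictions, and a grouping column (assumed data).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Overall accuracy hides how unevenly the errors are distributed.
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, groups)

print("overall:", overall)
print("per group:", per_group)
```

In this toy data the overall accuracy looks acceptable while every error but none of the hits concentrates in one group, which is the kind of gap a per-subpopulation check is meant to surface.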

In this article, we discuss one of the ubiquitous steps in the machine learning pipeline: feature scaling. The article grew out of a coffee discussion in my office about which models are actually affected by feature scaling and what the best way to do it is: normalize, standardize, or something else?

In addition to the above, we will also cover a gentle introduction to feature scaling, the various feature scaling techniques, how it might lead to data leakage, when to perform feature scaling, and when NOT to…
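To make the leakage point concrete, here is a minimal sketch of standardization in which the mean and standard deviation are estimated on the training split only and then reused on the test split. The helper names fit_standardizer and standardize and the numbers are assumptions made for illustration, not code from the article.

```python
from statistics import mean, pstdev


def fit_standardizer(values):
    """Estimate the scaling parameters (mean, std) from training data only."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        sigma = 1.0  # constant feature: avoid division by zero
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply previously fitted parameters: z = (x - mu) / sigma."""
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values, already split into train and test.
train = [10.0, 20.0, 30.0]
test = [40.0]

# Fit on the training split only; the test split never influences mu or sigma.
mu, sigma = fit_standardizer(train)
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)

print(mu, sigma)
print(train_scaled, test_scaled)
```

Fitting the parameters on the full dataset before splitting would let information from the test rows shape the training features, which is one way scaling can introduce leakage.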

Perform a probabilistic time series forecast with a deep learning model and run what-if analysis on the forecast.

We will build a probabilistic time-series forecast of sales for a salad manufacturer, then use atoti for inventory management to find the optimum number of refrigerators for storing the salad products.

In this article, we will explore a way to optimize inventory management using quantile-based time-series predictions and atoti, a data visualization platform with an aggregation engine and native multidimensional and what-if analysis support.

Inventory management is one of the most critical components of manufacturing and supply chain processes. It is also one of the major challenges these industries face.

When it comes to sales in the manufacturing industries, both extreme scenarios are unfavorable:

  • When there is unexpectedly high demand, it will cause stock-outs
  • When there is unexpectedly…

‘Data leakage’ is a ubiquitous term in predictive modeling and a common entry in most Kagglers’ dictionaries.

If your model is performing too well, reflect on your methods before popping open the champagne.

Predictive modeling & Cross-validation

Predictive modeling focuses on making predictions on novel data using a model that learns the pattern from the training data.

This is a challenging problem, because the model cannot be evaluated on data that is not yet available.

Hence, the existing training data is leveraged for learning the patterns and, at the same time, testing the capabilities of the model to accurately predict an…
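The reuse of training data for both learning and testing can be sketched as a plain k-fold split. This is a minimal illustration assuming contiguous, unshuffled folds; k_fold_indices is a hypothetical helper written for this example, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Six samples, three folds: each sample is held out exactly once.
folds = list(k_fold_indices(6, 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

Any preprocessing that learns from the data (scaling, imputation, feature selection) would need to be fitted inside each fold on train_idx only; otherwise the held-out rows leak into training.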


I’m trying to implement a Naive Bayes model following Bayes’ theorem. The problem I face is that some class labels are missing when applying the theorem, which drives the overall probability estimate to zero. How can I handle such missing classes when using the Naive Bayes model?


Background

Let us first recall what the Naive Bayes algorithm is. As the name suggests, it is based on Bayes’ theorem from probability and statistics, with the naive assumption that the features are independent of each other.

Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might…
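One common remedy for the zero-probability problem raised in the question is additive (Laplace) smoothing, sketched below. It is offered as a general technique and is not necessarily the approach the rest of this answer takes; the function smoothed_likelihood and the toy data are assumptions made for illustration.

```python
from collections import Counter


def smoothed_likelihood(feature_values, value, vocabulary_size, alpha=1.0):
    """Estimate P(value | class) for one categorical feature.

    Uses additive smoothing: (count + alpha) / (n + alpha * vocabulary_size).
    With alpha = 0 this reduces to the raw relative frequency.
    """
    counts = Counter(feature_values)
    return (counts[value] + alpha) / (len(feature_values) + alpha * vocabulary_size)


# Hypothetical data: values of one feature observed within a single class.
outlook_given_yes = ["sunny", "sunny", "overcast", "overcast"]
vocabulary = {"sunny", "overcast", "rainy"}  # all values the feature can take

# "rainy" was never seen with this class, so the raw estimate is zero and
# would zero out the whole product of likelihoods.
unsmoothed = smoothed_likelihood(outlook_given_yes, "rainy", len(vocabulary), alpha=0.0)
smoothed = smoothed_likelihood(outlook_given_yes, "rainy", len(vocabulary), alpha=1.0)

print(unsmoothed, smoothed)
```

With alpha set to 1, every value gets a small non-zero probability, so a single unseen combination no longer forces the posterior to zero.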

Raghav Vashisht

Master in Data Science and Business Analytics, ESSEC Business School- CentraleSupélec
