In this article, learn one of the most sought-after skills for data scientists: how to generate random datasets. We will see why synthetic data generation is important, and we will explore the various Python libraries for generating synthetic data.

Introduction: Why Data Synthesis?

Testing proof of concept

As a data scientist, you can benefit from data generation since it allows you to experiment with various ways of exploring datasets, algorithms, data visualization techniques or to validate assumptions about the behavior of some method against many different datasets of your choosing.

When you have to test a Proof of concept, a tempting option is just to use real data. One small problem though is that production data is typically hard to obtain, even partially, and it is not getting easier with new European laws about privacy and security.

Data is indeed a scarce resource

The algorithms, programming frameworks, and machine learning packages (or even…
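As a small illustration of the idea above, here is a minimal sketch of generating a reproducible synthetic dataset in plain Python. The column names, value ranges, and labelling rule are all made up for the demo; real synthetic data would mimic a target schema.

```python
import random

def make_synthetic_dataset(n_rows, seed=42):
    """Generate a toy dataset of (age, income, label) rows.

    The feature ranges and the labelling rule are illustrative only;
    a fixed seed makes the dataset reproducible across runs.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        age = rng.randint(18, 70)                     # uniform integer feature
        income = round(rng.gauss(50_000, 15_000), 2)  # Gaussian feature
        label = int(income > 55_000)                  # arbitrary rule, demo only
        rows.append((age, income, label))
    return rows

data = make_synthetic_dataset(5)
print(data)
```

Libraries such as scikit-learn (`make_classification`, `make_regression`) offer richer versions of the same idea; the point here is simply that a seeded generator gives you as many datasets as you need, on demand.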

We build a sales cube on top of automobile data from the U.S. In this article, learn how to slice and dice the dataset across multiple axes (the customers, the cars, and the brands) to find interesting KPIs, and how to summarize those KPIs in an atoti dashboard.

PART 1: Sales Cube and how atoti can ‘boost’ it.

What is a Sales Cube?

Sales cubes are used to report on sales transactions, specifically concerning posting sales order invoices and sales order packing slips. Sales cube datasets are self-contained and do not require users to create table profiles.

Various measures can be incorporated into a sales cube report to ensure that the reported quantities are correct, such as a SUM, MEAN, or COUNT aggregation.
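To make the SUM / MEAN / COUNT idea concrete, here is a tiny hand-rolled aggregation in plain Python. The transactions and field names are invented for the demo and are not the actual schema of the article's dataset; in atoti these measures would be defined on the cube itself.

```python
from collections import defaultdict

# Toy sales transactions: (brand, quantity, amount) — illustrative values.
sales = [
    ("Ford", 2, 40_000.0),
    ("Ford", 1, 25_000.0),
    ("BMW", 1, 55_000.0),
]

# Aggregate per brand, the way a cube would along the "brand" axis.
totals = defaultdict(lambda: {"SUM": 0.0, "COUNT": 0})
for brand, qty, amount in sales:
    totals[brand]["SUM"] += amount
    totals[brand]["COUNT"] += 1

for agg in totals.values():
    agg["MEAN"] = agg["SUM"] / agg["COUNT"]

print(dict(totals))
```

Slicing along a different axis (customers, cars) is the same computation with a different grouping key, which is exactly what a cube generalizes.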

Follow along as we walk you through the process of creating, saving, sharing, and persisting dashboards in atoti.

Would you like to create a business intelligence web application with atoti?

Business intelligence tools are helpful to stay competitive. Organizations of every size and stage use BI tools to analyze, manage and visualize business data. It is extremely easy to create a business intelligence web application with atoti.

Creating a session

First, let us create a session in atoti. This session will give us a web application without any cube or any data loaded into it yet.

And after running the above, you will get a link to go to the web application.

Machine learning and artificial intelligence are becoming more and more ubiquitous and an integral part of our lives. Along with the rise of machine learning and artificial intelligence, the concern around machine learning bias is also increasing.

In this article, we will talk about one of the hot topics in machine learning ethics: how to reduce machine learning bias. We shall also discuss the relevant tools and techniques.

Machine Learning Bias

Machine learning bias, also sometimes known as bias in artificial intelligence, is a phenomenon that occurs when an algorithm produces results that are systemically prejudiced due to erroneous assumptions in the machine learning process.

Bias could be prejudice in favor of or against a person, group, or thing, in a way that is considered to be unfair.

This article covers methodologies for quality-checking the predictions of a machine learning model before you deploy it into production.

It happens quite often that the data science team spends weeks, even months, understanding the data, performing state-of-the-art feature engineering, trying various machine learning and deep learning modeling techniques, tuning the hyperparameters, and then building the ULTIMATE model to make predictions.

And when this machine learning model is put into production, the business results fall short of those obtained under development conditions.

We shall use the subpopulation analysis technique on the model predictions, understand why it is important to look into these subpopulations, and see how to do such an…
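The core of subpopulation analysis is simply computing your quality metric per group instead of globally. A minimal sketch with made-up predictions (the group key, labels, and values are all hypothetical):

```python
from collections import defaultdict

# Hypothetical model outputs: (subpopulation, y_true, y_pred) triples.
records = [
    ("north", 1, 1), ("north", 0, 0), ("north", 1, 1),
    ("south", 1, 0), ("south", 0, 1), ("south", 1, 1),
]

hits = defaultdict(int)
counts = defaultdict(int)
for group, y_true, y_pred in records:
    hits[group] += int(y_true == y_pred)
    counts[group] += 1

# Per-group accuracy: a strong overall score can hide a weak subpopulation.
accuracy = {g: hits[g] / counts[g] for g in counts}
print(accuracy)
```

Here the overall accuracy looks reasonable, yet one subpopulation performs far worse, which is exactly the kind of gap that only shows up in production.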

In this article, we shall discuss one of the ubiquitous steps in the machine learning pipeline: feature scaling. This article's origin lies in one of the coffee discussions in my office about which models are actually affected by feature scaling, and what the best way to do it is: to normalize, to standardize, or something else?

In this article, in addition to the above, we would also cover a gentle introduction to feature scaling, the various feature scaling techniques, how it might lead to data leakage, when to perform feature scaling, and when NOT to…
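The two techniques most often compared in that discussion can be sketched in a few lines of plain Python (the sample values are made up; scikit-learn's `MinMaxScaler` and `StandardScaler` do the same thing, fitted on the training set only to avoid leakage):

```python
import statistics

values = [2.0, 4.0, 6.0, 8.0]

# Min-max normalization: rescale each value into the [0, 1] range.
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Standardization: shift to zero mean and scale to unit standard deviation.
mean = statistics.mean(values)
std = statistics.stdev(values)
standardized = [(v - mean) / std for v in values]

print(normalized)
print(standardized)
```

The leakage point from the article applies directly: `lo`, `hi`, `mean`, and `std` must be computed on training data only, then reused to transform the test data.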

Produce a probabilistic time-series forecast from a deep learning model and perform what-if analysis on the forecast.

We shall be doing a probabilistic time-series forecast of the sales for a salad manufacturer and then use atoti for inventory management and find the optimum number of refrigerators to store the salad products.

In this article, we will explore a method to tackle the issue of optimizing inventory management using quantile-based time-series predictions, leveraging atoti, a data visualization platform with an aggregation engine and native multidimensional and what-if analysis support.

Inventory management is one of the most critical components of the manufacturing and supply chain processes. It is also one of the major issues faced by the industries.

When it comes to sales for the manufacturing industries, both extreme scenarios are unfavorable:

  • When there is unexpectedly high demand, it will cause stock-outs
  • When there is unexpectedly…
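A quantile-based forecast addresses exactly this trade-off: instead of one number, the model gives a distribution of possible demand, and the stocking level is read off a chosen quantile. A minimal sketch with invented forecast samples (a real setup would take these from the deep learning model's predictive distribution):

```python
import math

# Hypothetical demand scenarios drawn from a probabilistic forecast.
forecast_samples = sorted([80, 95, 100, 102, 110, 115, 120, 140, 150, 200])

def quantile(sorted_samples, q):
    """Nearest-rank quantile of an already-sorted sample (simple convention)."""
    idx = max(math.ceil(q * len(sorted_samples)) - 1, 0)
    return sorted_samples[idx]

# Stocking at the 90% quantile covers 9 out of 10 demand scenarios,
# trading a small overstock risk against a much lower stock-out risk.
stock_level = quantile(forecast_samples, 0.9)
print(stock_level)
```

Choosing a higher quantile shifts the balance toward avoiding stock-outs at the cost of holding more inventory, which is the what-if lever the article then explores in atoti.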

‘Data leakage’ is a ubiquitous term associated with predictive modeling and a prevalent entry in most Kagglers’ dictionaries.

If your model is performing too well, reflect on your methods before popping open the champagne.

Predictive modeling & Cross-validation

Predictive modeling focuses on making predictions on novel data using a model that learns the pattern from the training data.

This is a challenging problem. It’s hard because the model cannot be evaluated on data that is not yet available.

Hence, the existing training data is leveraged for learning the patterns and, at the same time, testing the capabilities of the model to accurately predict an…
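The standard way to reuse training data for both learning and evaluation is k-fold cross-validation. Here is a simplified sketch of the index splitting (scikit-learn's `KFold` does a more complete version of the same thing):

```python
def kfold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds.

    Each fold serves once as the held-out test set while the remaining
    indices form the training set.
    """
    # Distribute any remainder across the first folds so sizes differ by <= 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i not in test]
        folds.append((train, test))
        start += size
    return folds

for train, test in kfold_indices(6, 3):
    print(train, test)
```

Crucially for the leakage discussion, any preprocessing (scaling, imputation, feature selection) must be fitted inside each fold's training split only, never on the full dataset.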


I’m trying to implement a Naive Bayes model following Bayes’ theorem. The problem I face is that some class labels are missing when applying the theorem, leading to the overall probability estimate being zero. How can such missing classes be handled when using the Naive Bayes model?


Background: Let us first recall what the Naive Bayes algorithm is. As the name suggests, it is based on Bayes’ theorem of probability and statistics, with the naive assumption that the features are independent of each other.

Bayes’ theorem describes the probability of an event based on prior knowledge of conditions that might…
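The usual answer to the zero-probability problem in the question is Laplace (add-one) smoothing: add a pseudo-count to every feature/class combination so that an unseen combination never contributes an exact zero to the product. A minimal sketch with a made-up training set:

```python
from collections import Counter

# Tiny training set of (feature_value, class_label) pairs — invented data.
train = [("sunny", "yes"), ("sunny", "yes"), ("rainy", "no")]

def smoothed_likelihood(value, label, data, alpha=1.0):
    """P(feature = value | class = label) with Laplace smoothing.

    alpha > 0 adds a pseudo-count for every value in the vocabulary,
    so combinations never seen in training still get a small probability.
    """
    vocab = {v for v, _ in data}
    in_class = [v for v, c in data if c == label]
    counts = Counter(in_class)
    return (counts[value] + alpha) / (len(in_class) + alpha * len(vocab))

# "rainy" never occurs with class "yes", yet its likelihood stays > 0,
# so the overall Naive Bayes product no longer collapses to zero.
print(smoothed_likelihood("rainy", "yes", train))  # (0 + 1) / (2 + 2) = 0.25
```

Setting `alpha=0` recovers the unsmoothed estimate and the zero-probability problem; larger `alpha` pulls all likelihoods toward the uniform distribution.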

Raghav Vashisht

Master in Data Science and Business Analytics, ESSEC Business School- CentraleSupélec
