October 2019 – OnData.blog

Why Do Analysts Need Data Lakes?

With substantial analytical needs at Sopra Steria Apps, we are looking to expand our Data Science environment. I am leaning towards a Data Lake architecture, approached from a concrete angle: we have practical requirements and know quite precisely what we want. I’ve

Simple hack to improve data clustering visualizations

Here is how to make your data clusters look pretty in no time (with Python and Matplotlib), using a one-line code hack. I wanted to visualize, in Python and Matplotlib, the data clusters returned by clustering algorithms such as K-means (sklearn.cluster.KMeans)

How to isolate data that constitutes a spike in a histogram?

We would all love to spot business problems early on, to react before they become painful. You can learn a lot by looking at past problems. Hence, understanding the nature of anomalies in data can bring substantial operational benefits and