In this article I tackle the following problem: how to define and distinguish anomalies (spikes, peaks, and outliers in data) in real-life production situations. Typically, data drift means there is no stable reference level. Since we do not
Kolmogorov-Smirnov test: a practical intro
I feel that today’s network lags are not normal… But is that really so, or is my mind playing tricks on me? The Kolmogorov-Smirnov (KS) test is a practical tool for answering such questions objectively. Here is
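To make the idea concrete, here is a minimal sketch of the two-sample KS statistic in plain Python: it measures the largest vertical gap between the empirical distribution functions of two samples (for example, yesterday's lags versus today's). The function names `ks_statistic` and `ks_critical_value` are hypothetical, and the critical value uses the standard large-sample approximation c(α)·sqrt((n+m)/(n·m)), so treat it as a rough guide for small samples rather than an exact test.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(alpha, n, m):
    """Asymptotic critical value of D at significance level alpha."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical lag samples in milliseconds (illustrative numbers only).
baseline = [12, 15, 14, 13, 16, 15, 14, 13, 12, 15]
today = [18, 21, 17, 19, 22, 20, 18, 19, 21, 20]
d = ks_statistic(baseline, today)
threshold = ks_critical_value(0.05, len(baseline), len(today))
print(f"D = {d:.2f}, critical value at 5% = {threshold:.2f}")
print("distributions differ" if d > threshold else "no evidence of a difference")
```

If D exceeds the critical value, the two samples are unlikely to come from the same distribution at the chosen significance level; otherwise the "abnormal" feeling is not supported by the data.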
Are people fair?
Suppose all you had was a history of user requests to the service desk. Would you be able to determine how many of those requests were honest?
When Accuracy Grows But Precision Falls
My Machine Learning classifier’s prediction accuracy improves as the volume of training data grows. But at the same time, its precision falls. Why? And how can it be fixed? Read on.
Reducing the problem to classification
At Sopra Steria, we
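Accuracy and precision are computed from different cells of the confusion matrix, so they can move in opposite directions. The sketch below uses hypothetical counts (not results from the article) to show one way this can happen: if the larger data set contains proportionally more negatives, correctly rejected negatives raise accuracy even while the share of false positives among positive predictions grows. The `Confusion` class is an illustrative helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts."""
    tp: int
    fp: int
    tn: int
    fn: int

    def accuracy(self):
        # Share of all predictions that are correct.
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self):
        # Share of positive predictions that are truly positive.
        predicted_positive = self.tp + self.fp
        return self.tp / predicted_positive if predicted_positive else 0.0


# Hypothetical counts for a model trained on less and on more data.
smaller = Confusion(tp=40, fp=10, tn=40, fn=10)
larger = Confusion(tp=45, fp=15, tn=135, fn=5)

for name, c in (("smaller", smaller), ("larger", larger)):
    print(f"{name}: accuracy={c.accuracy():.2f} precision={c.precision():.2f}")
# smaller: accuracy=0.80 precision=0.80
# larger: accuracy=0.90 precision=0.75
```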
3 Steps to Unmask Data in Camouflage
I am looking at the distribution of a certain data set (left). It has two peaks (it is ‘bimodal’), so I suspect it consists of two superimposed populations. How do I split the data to rediscover the original two populations
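One common way to split bimodal data, shown here as a swapped-in technique that is not necessarily the article's three-step method, is to fit a two-component Gaussian mixture with expectation-maximisation (EM) and assign each point to the component most likely to have produced it. The sketch below is plain Python under the assumption that each population is roughly normal; `fit_two_gaussians` is a hypothetical helper name, and the sample data are synthetic.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(data, iterations=100):
    """Fit a 1-D two-component Gaussian mixture by EM (needs len(data) >= 2).

    Returns (weights, means, stdevs, labels); labels[i] is 0 or 1 for data[i].
    """
    xs = list(data)
    n = len(xs)
    ordered = sorted(xs)
    lower, upper = ordered[: n // 2], ordered[n // 2:]
    # Initialise each component from the lower and upper halves of the data.
    w = [0.5, 0.5]
    mu = [statistics.fmean(lower), statistics.fmean(upper)]
    sd = [max(statistics.pstdev(lower), 1e-3), max(statistics.pstdev(upper), 1e-3)]

    resp = [0.5] * n  # responsibility of component 0 for each point
    for _ in range(iterations):
        # E-step: probability that each point belongs to component 0.
        for i, x in enumerate(xs):
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            total = p0 + p1
            resp[i] = p0 / total if total > 0 else 0.5
        # M-step: re-estimate weight, mean and stdev of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1 - ri for ri in resp]
            rk = sum(r)
            if rk == 0:
                continue
            w[k] = rk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / rk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / rk
            sd[k] = max(math.sqrt(var), 1e-3)  # avoid collapsing to zero width

    labels = [0 if ri >= 0.5 else 1 for ri in resp]
    return w, mu, sd, labels


# Synthetic bimodal sample: two overlapping normal populations.
rng = random.Random(0)
sample = [rng.gauss(0, 1) for _ in range(500)] + [rng.gauss(6, 1) for _ in range(500)]
weights, means, stdevs, labels = fit_two_gaussians(sample)
print("weights:", [round(v, 2) for v in weights])
print("means:", [round(v, 2) for v in means])
print("stdevs:", [round(v, 2) for v in stdevs])
```

When the two peaks overlap heavily, the split becomes uncertain, and the responsibilities in `resp` are more informative than the hard 0/1 labels.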
The truth behind a histogram dent
Here is some intriguing research on data from our Sopra Steria IT operations (ITSM, AIOps, and Infrastructure Management). I came across an interesting situation in an IT Applications Management project for a large corporate client. In such a
No, the virus did NOT survive 17 days.
Here is how one careless sentence triggered a surge of detergents in our oceans.
How herd instinct hijacked herd immunity
In this article, I am not advocating any strategy towards herd immunity against coronavirus. I want to show that the mainstream discussion misses the point.
Coronavirus mortality: less than we think
Note 1: If you are looking for COVID-19 conspiracy theories, go elsewhere. Below are only some boring statistics.
Data Lake: simple usage example
Data Lakes differ from one another, and standards are only now emerging. The Lagoon Data Lake we built at Sopra Steria (introduced in the previous post) is an internal IaaS Data Lake solution, built mostly from open-source components (Spark,