OnData.blog


How to tell anomalies in data drift

In this article I tackle the following problem: how to define and distinguish anomalies (spikes, peaks, and outliers in data) in real-life production situations. Typically, data drift results in the absence of a reference level. Since we do not

Pawel Plaszczak | February 22, 2021 (updated February 23, 2021) | Articles | No Comments

Kolmogorov-Smirnov test: a practical intro

I feel that today’s network lags are not normal… But is it really so? Or is it my mind playing tricks on me? The KS test (Kolmogorov-Smirnov) is a practical tool to provide objective answers to such questions. Here is

Pawel Plaszczak | October 30, 2020 (updated November 2, 2020) | Articles | No Comments

Are people fair?

Suppose all you had was a history of user requests to the service desk. Would you be able to determine how many of those requests were honest?

Pawel Plaszczak | October 23, 2020 (updated October 30, 2020) | Articles | No Comments

The phantom I followed

When you really want to find a pattern in data, you will. Even if there is no pattern. What happened to me yesterday was embarrassing… it is also a lesson worth sharing. I learned how to interpret unusual data patterns,

admin | October 18, 2020 (updated April 13, 2024) | Articles | No Comments

When Accuracy Grows But Precision Falls

My Machine Learning classifier’s prediction accuracy improves with the growing volume of training data. But at the same time, its precision falls. Why so? And how to fix it? Read on. Reducing the problem to classification: At Sopra Steria, we

Pawel Plaszczak | July 23, 2020 | Articles | No Comments

3 Steps to Unmask Data in Camouflage

I am looking at the distribution of a certain data set (left). It has two peaks (this is called ‘bimodal’), so I suspect that it consists of two superimposed populations. How do I split the data, to rediscover the original two populations

Pawel Plaszczak | June 29, 2020 (updated November 20, 2021) | Articles | No Comments

The truth behind a histogram dent

Here is some quite intriguing research on data from our Sopra Steria IT operations (ITSM, AIOps, and Infrastructure Management). I faced an interesting situation in an IT Applications Management project for a large corporate client. In such a

Pawel Plaszczak | June 19, 2020 (updated June 29, 2020) | Articles | No Comments

No, the virus did NOT survive 17 days.

Here is how one careless sentence triggered a surge of detergents in our oceans.

Pawel Plaszczak | April 11, 2020 (updated April 22, 2020) | Articles | 2 Comments

How herd instinct hijacked herd immunity

In this article, I am not advocating any strategy towards herd immunity against coronavirus. I want to show that the mainstream discussion misses the point.

Pawel Plaszczak | April 2, 2020 (updated April 13, 2020) | Articles | No Comments

Coronavirus mortality: less than we think

Note 1: If you are looking for COVID-19 conspiracy theories, go elsewhere. Below are only some boring statistics.

Pawel Plaszczak | March 29, 2020 (updated April 13, 2020) | Articles, General Public | 9 Comments

Recent Posts

  • Data Literacy: Six examples of bad data interpretation April 29, 2024
  • Porting PyTorch neural network to Amazon AWS June 30, 2022
  • Porting pyTorch cloud detection model to Amazon AWS S3 June 17, 2022
  • pushing data to AWS. SageMaker sucks. So does Anaconda June 14, 2022
  • Linear Regression: Killer App with 19-century maths January 19, 2022
  • Democratization of statistics: Chi2 for non-experts January 12, 2022
  • An approach to categorize multi-lingual phrases December 15, 2021
  • The implications of Scikit-learn bug #21455 November 29, 2021
  • Your model may be inaccurate November 25, 2021
  • Answering Why (with Chi-Square) November 19, 2021
Copyright © 2025 OnData.blog. All rights reserved.