OnData.blog

Answering Why (with Chi-Square)

Analysts don’t like the “why” questions. They are tough to answer. For instance, in a help desk analysis, it is easy to show which tickets are resolved faster. But it is difficult to say why. In my practice in Sopra

Pawel Plaszczak | November 19, 2021 | November 20, 2021 | Articles | No Comments

What makes Data Quality so difficult

Garbage in, garbage out. Analysis of untrusted or poorly understood data will yield incorrect results. Hence the textbook approach is to clean the data first, and only then proceed with data analytics. For instance, in the data lakes, the data

Pawel Plaszczak | November 8, 2021 | November 9, 2021 | Articles | No Comments

Don’t trust Data Science. Ask the people

Can intuition beat the popular data science tools? Is SelectKBest, the popular feature selection method, wrong? Here is a story of a recent project. The story ends with a puzzle which I cannot solve. Help is welcome. Both data and

Pawel Plaszczak | October 24, 2021 | November 29, 2021 | Articles | 1 Comment

Mistaken by factor of 100,000

Lognormal data is very tricky. The wrong visualization methods can lead to a radical misinterpretation of the results. In this article I show an example of such a mistake based on a real project, and I demonstrate how to avoid the caveats

Pawel Plaszczak | October 14, 2021 | October 24, 2021 | Articles | No Comments

Practical AIOps: 5 use cases

At Sopra Steria, we manage the IT infrastructure and applications of big clients. We process millions of service tickets and infrastructure events. This massive stream of data comes from monitoring tools such as Zabbix, Nagios, SolarWinds, and higher level frameworks:

Pawel Plaszczak | June 8, 2021 | June 14, 2021 | Articles | No Comments

How to delete your data for good

I had to permanently erase data from a few external hard drives before selling them. Some of them were USB, some were NAS (connected through Ethernet). I collected some observations which some people might find helpful. In most filesystems, deleting

Pawel Plaszczak | June 1, 2021 | June 2, 2021 | Articles | No Comments

I stopped writing reports, and so can you

This post explains how to generate management-quality PDF or HTML reporting directly from Jupyter Notebook. With this technique, I reduced to zero the most irritating part of my projects: copy-pasting diagrams into PowerPoint.

Pawel Plaszczak | April 21, 2021 | April 22, 2021 | Articles | No Comments

Nine Circles of Hell: time in Python

Python is powerful, concise, and robust. Simply great. Except…when you work with time. Coping with mysterious errors in transforming dates and timestamps took me hours and days of frustration. I was like, ‘why is Python doing it to me’? I

Pawel Plaszczak | April 15, 2021 | April 16, 2021 | Articles | No Comments

Data Puzzle explained

For the Data Puzzle I posted last week, I received about a dozen thoughtful and highly relevant answers. THANK YOU. I primarily want to thank Luis Ruiz Santiago, Chetan Waman and anonymous J for comments under the previous

Pawel Plaszczak | March 29, 2021 | March 30, 2021 | Articles | No Comments

Data Puzzle

Here is a new data puzzle, coming from my recent analytics in Sopra Steria. I will describe the problem, but not the answer. If you like the challenge, please contribute your thoughts in the comments. The title of the data

Pawel Plaszczak | March 24, 2021 | March 29, 2021 | Articles | 4 Comments

Recent Posts

  • Moving On April 4, 2026
  • Data Literacy: Six examples of bad data interpretation April 29, 2024
  • Porting PyTorch neural network to Amazon AWS June 30, 2022
  • Porting pyTorch cloud detection model to Amazon AWS S3 June 17, 2022
  • pushing data to AWS. SageMaker sucks. So does Anaconda June 14, 2022
  • Linear Regression: Killer App with 19-century maths January 19, 2022
  • Democratization of statistics: Chi2 for non-experts January 12, 2022
  • An approach to categorize multi-lingual phrases December 15, 2021
  • The implications of Scikit-learn bug #21455 November 29, 2021
  • Your model may be inaccurate November 25, 2021

Copyright © 2026 OnData.blog. All rights reserved.