OnData.blog

data science

Data Literacy: Six examples of bad data interpretation

Two people will interpret the same data in different ways. This is the norm rather than the exception. Due to the human factor (personal experience, emotions, deficiencies of the human brain, and a tendency to fall for logical fallacies) understanding of the data

Pawel Plaszczak | April 29, 2024 | Uncategorized | No Comments

Democratization of statistics: Chi2 for non-experts

I am a big fan of advanced methods deployed to solve practical problems by ordinary users. Here is our recent achievement. My colleague, an experienced service desk manager, observed that the volume of work in his team had grown. He would

Pawel Plaszczak | January 12, 2022 (updated January 13, 2022) | Articles | No Comments

An approach to categorize multi-lingual phrases

I have 130,000 help desk tickets with multi-lingual descriptions. I need to divide this set into categories, such as “password reset”, “license expired”, or “storage failure”. Why? Users could then allocate a category to a new ticket they create. Then

Pawel Plaszczak | December 15, 2021 | Articles | No Comments

Answering Why (with Chi-Square)

Analysts don’t like the “why” questions. They are tough to answer. For instance, in a help desk analysis, it is easy to show which tickets are resolved faster. But it is difficult to say why. In my practice in Sopra

Pawel Plaszczak | November 19, 2021 (updated November 20, 2021) | Articles | No Comments

What makes Data Quality so difficult

Garbage in, garbage out. Analysis of untrusted or poorly understood data will yield incorrect results. Hence the textbook approach is to clean the data first, and only then proceed with data analytics. For instance, in the data lakes, the data

Pawel Plaszczak | November 8, 2021 (updated November 9, 2021) | Articles | No Comments

Practical AIOps: 5 use cases

At Sopra Steria we manage the IT infrastructure and applications of big clients. We process millions of service tickets and infrastructure events. This massive stream of data comes from monitoring tools such as Zabbix, Nagios, SolarWinds, and higher-level frameworks:

Pawel Plaszczak | June 8, 2021 (updated June 14, 2021) | Articles | No Comments

I stopped writing reports, and so can you

This post explains how to generate management-quality PDF or HTML reporting directly from Jupyter Notebook. With this technique, I reduced to zero the most irritating part of my projects: copy-pasting diagrams into PowerPoint.

Pawel Plaszczak | April 21, 2021 (updated April 22, 2021) | Articles | No Comments

Data Puzzle explained

For the Data Puzzle I posted last week, I received about a dozen thoughtful and highly relevant answers. THANK YOU. I primarily want to thank Luis Ruiz Santiago, Chetan Waman and anonymous J for comments under the previous

Pawel Plaszczak | March 29, 2021 (updated March 30, 2021) | Articles | No Comments

Data Puzzle

Here is a new data puzzle, coming from my recent analytics work at Sopra Steria. I will describe the problem, but not the answer. If you like the challenge, please contribute your thoughts in the comments. The title of the data

Pawel Plaszczak | March 24, 2021 (updated March 29, 2021) | Articles | 4 Comments

A picture worth 1,000 words

I love mountains. Some of my dear ones say that this is only because they resemble histograms, which I love more. Not true (ha ha), but I must agree that visualizations done properly bring plenty of satisfaction. Histograms, when prepared

Pawel Plaszczak | February 27, 2021 (updated March 4, 2021) | Articles | No Comments

Copyright © 2025 OnData.blog. All rights reserved.