OnData.blog

Moving On

Dear Friends, as you might have noticed, I have been publishing on Medium for some time. My old articles will stay here so you can find them, but please follow me on Medium for anything new. If you have

Pawel Plaszczak | April 4, 2026 | Uncategorized | No Comments

Data Literacy: Six examples of bad data interpretation

Two people will interpret the same data in different ways. This is the norm rather than the exception. Due to the human factor (personal experience, emotions, deficiencies of the human brain and a tendency to fall for logical fallacies) understanding of the data

Pawel Plaszczak | April 29, 2024 | Uncategorized | No Comments

Porting PyTorch neural network to Amazon AWS

As part of my Sopra Steria engagement, I have lately been fortunate to spend time in the so-called Aerospace Valley, a cluster of aerospace engineering research centers in Toulouse, France. My recent task had to do with cloud

Pawel Plaszczak | June 30, 2022 | Articles | No Comments

Porting PyTorch cloud detection model to Amazon AWS S3

As part of my Sopra Steria engagement, I have lately been fortunate to spend time in the so-called Aerospace Valley, a cluster of aerospace engineering research centers in Toulouse, France. My recent task had to do with cloud

Pawel Plaszczak | June 17, 2022 (updated June 28, 2022) | Articles | No Comments

Pushing data to AWS. SageMaker sucks. So does Anaconda

I did a lot of tech work on the infrastructure underlying my analytics over the past weeks. I am putting my notes here so they don’t get lost and maybe help someone. Here are three stories, unrelated to each other.

Pawel Plaszczak | June 14, 2022 | Articles | No Comments

Linear Regression: Killer App with 19th-century maths

I often feel that the gap between the mainstream Data Science rhetoric and the true business needs is widening. When I hear of Hyperautomation, Edge AI, AutoML, or GANs, I challenge myself to take a step back and understand our needs better.

Pawel Plaszczak | January 19, 2022 (updated January 24, 2022) | Articles | No Comments

Democratization of statistics: Chi2 for non-experts

I am a big fan of advanced methods deployed to solve practical problems by ordinary users. Here is our recent achievement. My colleague, an experienced service desk manager, observed that the volume of work in his team had grown. He would

Pawel Plaszczak | January 12, 2022 (updated January 13, 2022) | Articles | No Comments

An approach to categorize multi-lingual phrases

I have 130,000 help desk tickets with multi-lingual descriptions. I need to divide this set into categories, such as “password reset”, “license expired”, or “storage failure”. Why? Users could then allocate a category to a new ticket they create. Then

Pawel Plaszczak | December 15, 2021 | Articles | No Comments

The implications of Scikit-learn bug #21455

As described last week, Scikit-learn's chi-square feature selection is not usable until bug #21455 is addressed. The problem concerns sklearn.feature_selection.chi2 and the methods that build on it, including SelectKBest, when they are used for categorical features other than binary ones. The nature of the

Pawel Plaszczak | November 29, 2021 | Articles | No Comments
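For readers who want to see what a discrepancy with non-binary categorical features can look like, here is a minimal sketch. It is my own illustration on made-up counts, not taken from the article or from the issue report, and the excerpt above does not spell out the mechanism. It passes a three-level feature to sklearn.feature_selection.chi2 in two encodings (raw integer codes and one-hot columns) and compares both with a Pearson chi-square test on the contingency table from scipy.stats.chi2_contingency. The variable names and the counts are assumptions chosen so the arithmetic is easy to check by hand.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.feature_selection import chi2

# Toy data (hypothetical): 60 samples per class, one feature with levels 0, 1, 2.
# Class 0 has level counts [30, 10, 20]; class 1 has [10, 30, 20].
y = np.repeat([0, 1], 60)
levels = np.concatenate([
    np.repeat([0, 1, 2], [30, 10, 20]),
    np.repeat([0, 1, 2], [10, 30, 20]),
])

# (a) Pass the integer codes straight to chi2 as a single column.
stat_codes, _ = chi2(levels.reshape(-1, 1), y)

# (b) One-hot encode the feature and add up the per-column statistics.
one_hot = np.eye(3, dtype=int)[levels]
stat_one_hot, _ = chi2(one_hot, y)

# (c) Reference: Pearson chi-square on the 2 x 3 contingency table.
table = np.zeros((2, 3), dtype=int)
np.add.at(table, (y, levels), 1)
stat_table = chi2_contingency(table, correction=False)[0]

print("integer codes  :", stat_codes[0])
print("one-hot, summed:", stat_one_hot.sum())
print("contingency    :", stat_table)
```

On these counts the integer-coded column scores roughly 3.33, while the summed one-hot statistics and the contingency-table test both give 20. The score depends on how the categories are encoded, so it is worth checking the encoding before trusting a ranking.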

Your model may be inaccurate

With Machine Learning in Python, you may do feature selection with SelectKBest. As I just confirmed, this method sometimes returns faulty results. This potentially impacts the accuracy of numerous ML models worldwide. Below are the details and a way out. The

Pawel Plaszczak | November 25, 2021 (updated November 29, 2021) | Articles | 1 Comment
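For readers who have not used it, here is a minimal sketch of the feature selection the excerpt refers to: SelectKBest with the chi2 score function. The data is a made-up toy set with binary (0/1) features only, and the column names are my own assumptions. It shows the basic call pattern, not the faulty case discussed in the article.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy data (hypothetical): 100 samples, binary target, three binary features.
y = np.repeat([0, 1], 50)

informative = y.copy()        # follows the target ...
informative[:5] = 1           # ... except for five flipped samples per class
informative[-5:] = 0

noise_a = np.tile([0, 1], 50)         # unrelated to the target
noise_b = np.tile([0, 0, 1, 1], 25)   # unrelated to the target

X = np.column_stack([noise_a, informative, noise_b])

# Keep the single feature with the highest chi-square score.
selector = SelectKBest(score_func=chi2, k=1)
X_selected = selector.fit_transform(X, y)

print("chi2 scores :", selector.scores_)
print("kept columns:", selector.get_support(indices=True))
```

The selector keeps column 1, the informative feature. The fitted scores_ and pvalues_ attributes hold the per-feature statistics if you want to inspect them before trusting the selection.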

Copyright © 2026 OnData.blog. All rights reserved. Theme Spacious by ThemeGrill. Powered by: WordPress.