Garbage in, garbage out. Analysis of untrusted or poorly understood data yields incorrect results. Hence the textbook approach is to clean the data first, and only then proceed with data analytics. For instance, in data lakes, the data
Don’t trust Data Science. Ask the people
Can intuition beat popular data science tools? Is SelectKBest, the widely used feature selection method, wrong? Here is the story of a recent project. The story ends with a puzzle that I cannot solve. Help is welcome. Both data and
Mistaken by a factor of 100,000
Lognormal data is very tricky. Wrong visualization methods can lead to a radical misinterpretation of the results. In this article I show an example of such a mistake, based on a real project, and I demonstrate how to avoid the caveats
Practical AIOps: 5 use cases
At Sopra Steria, we manage the IT infrastructure and applications of large clients. We process millions of service tickets and infrastructure events. This massive stream of data comes from monitoring tools such as Zabbix, Nagios, SolarWinds, and higher-level frameworks:
How to delete your data for good
I had to permanently erase data from a few external hard drives before selling them. Some were USB drives, others were NAS devices (connected over Ethernet). I collected a few observations that others might find helpful. In most filesystems, deleting
I stopped writing reports, and so can you
This post explains how to generate management-quality PDF or HTML reports directly from a Jupyter Notebook. With this technique, I reduced to zero the most irritating part of my projects: copy-pasting diagrams into PowerPoint.
Nine Circles of Hell: time in Python
Python is powerful, concise, and robust. Simply great. Except… when you work with time. Coping with mysterious errors in transforming dates and timestamps cost me hours, even days, of frustration. I was like, ‘why is Python doing this to me?’ I
Data Puzzle explained
For the Data Puzzle I posted last week, I received about a dozen thoughtful and highly relevant answers. THANK YOU. I especially want to thank Luis Ruiz Santiago, Chetan Waman, and anonymous J for their comments under the previous
Data Puzzle
Here is a new data puzzle from my recent analytics work at Sopra Steria. I will describe the problem but not the answer. If you like the challenge, please contribute your thoughts in the comments. The title of the data
A picture worth 1,000 words
I love mountains. Some of my dear ones say that this is only because mountains resemble histograms, which I love even more. Not true (ha ha), but I must agree that visualizations done properly bring plenty of satisfaction. Histograms, when prepared