Here is a new data puzzle from my recent analytics work at Sopra Steria. I will describe the problem, but not the answer. If you like the challenge, please contribute your thoughts in the comments. The title of the data puzzle is:
Resolution delta versus time spent
I work a lot with Service Desk data. The data points represent the tickets we receive from clients. Each ticket indicates an incident: some technical problem that needs to be fixed. In this work I am frequently puzzled by various data phenomena. Solving them is fun. Some past ones are "The truth behind histogram dent" and "The phantom I followed".
The problem I tackled this time is related to the resolution delta of incidents. The resolution delta is the time that has passed between the incident open time and the incident close time. The open time field is assigned automatically by the system when an incident arrives. The close time indicates the moment when the problem has been solved for good.
How long are those resolution deltas? Often hours or days, because once the work has been done, the ticket moves to the ‘resolved’ state, which triggers an observation period or a request for client feedback. Only after that can the ticket be marked ‘closed’.
The resolution delta has one disadvantage. It does not represent the ticket cost, expressed in the man-hours of the specialist who fixed the ticket. Therefore, some projects introduce the ‘time spent’ field, which is supposed to contain the billable time that an engineer actually spent on the job. As an example, a ticket might have a resolution delta of 6 hours, including 2 hours of time spent, and 4 hours of waiting for client feedback.
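To make the arithmetic in that example concrete, here is a minimal sketch in Python. The function name, the variable names and the timestamps are my own illustrative assumptions, not the fields of any real Service Desk system; only the 6, 2 and 4 hour figures come from the example above.

```python
from datetime import datetime, timedelta


def resolution_delta(open_time: datetime, close_time: datetime) -> timedelta:
    """Time elapsed between the incident open time and the incident close time."""
    return close_time - open_time


# Hypothetical ticket: the date and clock times are invented for illustration.
open_time = datetime(2024, 1, 15, 9, 0)    # assigned automatically on arrival
close_time = datetime(2024, 1, 15, 15, 0)  # ticket marked 'closed'
time_spent = timedelta(hours=2)            # billable time logged by the engineer

delta = resolution_delta(open_time, close_time)
waiting = delta - time_spent               # time not spent working on the ticket

print("resolution delta:", delta)
print("time spent:      ", time_spent)
print("waiting:         ", waiting)
```

The point of the sketch is only that time spent is one component of the resolution delta, so the delta can never be shorter than the time spent on a single ticket.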
To understand the data, I plotted the relationship between the time spent and the resolution delta in one project. I expected to see a linear relationship (this is a scatter plot, where each dot represents one ticket):
My reasoning was that easy tickets, which consume little time spent, should be resolved and closed faster, while difficult tickets, which consume a lot of time spent, should be resolved later. So the two values should have a strong positive correlation.
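For readers who want to check such an expectation on their own data, below is a rough sketch of how the Pearson correlation between the two fields could be computed in plain Python. The five tickets are made-up values that follow my expectation; they are not the project data behind this puzzle, and the variable names are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented tickets (hours): more time spent goes with a longer resolution delta.
time_spent_h = [0.5, 1.0, 2.0, 3.5, 5.0]
resolution_delta_h = [4.0, 6.0, 9.0, 20.0, 30.0]

r = pearson(time_spent_h, resolution_delta_h)
print(f"Pearson r = {r:.2f}")  # positive for data shaped like my expectation
```

A value of r close to +1 would match the linear picture I had in mind, while a negative r would correspond to a downward-sloping cloud of dots.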
However, to my surprise, the plot came out like this:
We are again looking at a scatter plot, where one dot represents one ticket. However, the relationship seems to be the opposite of what I expected. Tickets with a high resolution delta (upper left part of the diagram) always have low time spent. In contrast, all tickets with high time spent (bottom right part of the diagram) have a consistently low resolution delta.
The puzzle is: how to explain this chart? Is it logically possible that the correlation between those two variables is negative? Why?
If you think you have an idea, please submit it in a comment below or, if you prefer, on my Facebook fan page. Good luck and have fun!
The explanation: The Data Puzzle Explained.