
Project Hamster: A word on statistics


http://projecthamster.wordpress.com: Any visualization that abstracts data is a dialogue of trust between observer and creator. Statistics are used to expose and display information from an angle that reveals new knowledge, knowledge that could not be seen just by looking at the whole haystack. Since we are now in the era of information overflow, they have become more essential than ever. Now, that's quite a mouthful, hahaha.

However, you should always question the methodology used to produce results. In open source we trust in masses of individuals: that somebody has certainly looked at the code of the program you are using to transfer sensitive data; that somebody has made sure your computer won't fry when carried in a bag because a program suddenly decided to wake it up; or, at least, that somebody else has already been burned, the bug has been fixed, and your operating system contains the fix; and that somebody has made sure the visualizations don't lie (at least not horribly).

OK, I got carried away, but it's all because I made some adjustments to show statistics more appropriately. Before the recent commit by Patryk patrys Zawadski, we were splitting activities that overlap midnight in two. In the 2.27 cycle I made the split happen only virtually, but Patryk took it further, and now we have a concept called hamster_midnight, which corresponds to 5:30am. Activities before 5:30 fall into the previous day; activities overlapping 5:30 tip to the side where the largest part of the activity is.

In the first iteration of stats, just to get things done, I did the same old midnight split. That certainly influenced the average starts and ends. Now I have just pushed a slightly better approach to git master: the starts and ends charts can now span more than 24 hours, and facts respect the hamster midnight.
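To make the rule concrete, here is a minimal sketch of how the hamster midnight could be applied. It is not the actual Hamster code: the names `hamster_day`, `activity_day` and `chart_hour` are made up for illustration. It assumes activities are shorter than 24 hours and that ties go to the earlier day. The `chart_hour` helper is one guess at how a chart axis can run past 24h.

```python
from datetime import date, datetime, time, timedelta

# The day boundary from the post: 5:30am instead of 00:00.
HAMSTER_MIDNIGHT = time(5, 30)


def hamster_day(moment: datetime) -> date:
    """Day a single moment belongs to when days roll over at 5:30am."""
    if moment.time() < HAMSTER_MIDNIGHT:
        return moment.date() - timedelta(days=1)
    return moment.date()


def activity_day(start: datetime, end: datetime) -> date:
    """Day an activity is filed under (assumes it is shorter than 24h).

    An activity overlapping 5:30am goes to the side that holds the
    larger part of it; a tie goes to the earlier day (arbitrary choice).
    """
    first, last = hamster_day(start), hamster_day(end)
    if first == last:
        return first
    # The 5:30am boundary that the activity crosses.
    boundary = datetime.combine(first + timedelta(days=1), HAMSTER_MIDNIGHT)
    before, after = boundary - start, end - boundary
    return first if before >= after else first + timedelta(days=1)


def chart_hour(moment: datetime) -> float:
    """Hours since 00:00 of the hamster day, so 1:00am plots as 25.0."""
    day_start = datetime.combine(hamster_day(moment), time(0, 0))
    return (moment - day_start).total_seconds() / 3600
```

With this, an activity from 4:00 to 9:00 lands on the new day, one from 1:00 to 6:00 stays on the previous day, and a late night ending at 1:00am averages in as hour 25 rather than hour 1.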
Here is the resulting difference in the charts: How could you possibly tell which is the new version (on the left) and which is the previous one (on the right)? (Hint: compare the weekend days; also, the hacking has now scooted more toward the end of the day.) Truth is, unless the data is totally opposite to your gut feeling, you can't.