Time Series Data Analysis and Visualization: Git Commit Timestamps as an Example

UUID: 06494758-96ed-4f04-bac9-c1e855d883cd
Remarks
- This article is essentially a new use case (Git commits) of the same time series data analysis and visualization techniques used in my other article JH-Articles: Data and Visualization of Making My PONS Dictionary.
Timestamps
- 20240111.Added this article

Timestamps are not only universal but also interesting in many cases. Just as time proceeds every second without a stop, timestamp data keeps accumulating as time goes by.

Time is a human concept that we use to mark the progress of our universe. Because not all human beings live in one place, we also invented further concepts within the scope of time, such as year, month, day and time zones.

Time can of course be tied to events. But a time series by itself is already a very fruitful topic, so in this article, let's focus on pure timestamps.

Digital Representation of Time

A timestamp, in my impression, is mainly a computer concept: the number of seconds elapsed since Thursday 1 January 1970 00:00:00 UTC, a point in time known as the Unix epoch. One good thing about the Unix timestamp is that it is based on UTC and is therefore time-zone independent. But as humans, in daily life, we think of time in relation to daylight, so we have created time zones to allow people all around the globe to use the same number to describe noon or midnight.

Therefore, when handling timestamps, it is necessary to consider time zones if the relation of a point in time to daylight is of interest.

Take one of my Git projects as an example: Qia-Articles, the program project for the website JH-Articles. I started using Git for the program files in 2017 in P.R. China, which uses CST +0800 as its local time standard (which also means that people's daily life, of course including mine, adapts to this time representation), and I continued the project after I came to Germany, which uses CET +0100 or CEST +0200 depending on the time of the year. To make meaningful statistics about when in a day I usually make Git commits, it is necessary to convert the Git commit timestamps into local time with the time zone taken into account.
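To make the point concrete, here is a minimal sketch of how one and the same Unix timestamp maps to different wall-clock times. It uses the first commit timestamp of the project (1507960100) and fixed UTC offsets from Python's standard library; the names `CST`, `CET` and `local_wall_clock` are my own choices for illustration, and fixed offsets deliberately ignore daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets named after the zones mentioned above (no daylight-saving logic).
CST = timezone(timedelta(hours=8), "CST")
CET = timezone(timedelta(hours=1), "CET")


def local_wall_clock(epoch_seconds: int, tz: timezone) -> str:
    """Render a Unix timestamp as wall-clock time in the given fixed-offset zone."""
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M:%S %z")


ts = 1507960100  # first commit timestamp of the project
print(local_wall_clock(ts, timezone.utc))  # 2017-10-14 05:48:20 +0000
print(local_wall_clock(ts, CST))           # 2017-10-14 13:48:20 +0800
print(local_wall_clock(ts, CET))           # 2017-10-14 06:48:20 +0100
```

The instant is the same in all three lines; only its relation to daylight changes, which is exactly what matters for hour-of-day statistics.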

Git Commit Timestamps

Git commit timestamps can be easily fetched using the git log command, like this:

$ git log --all --reverse --pretty=format:'%ct'
1507960100
1508578130
1508583841
...
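If the timestamps are to be processed further in Python, the same command can be called from a script. The following is only a sketch: the function names `parse_git_timestamps` and `fetch_git_timestamps` and the `repo_path` parameter are hypothetical, and it assumes `git` is available on the PATH.

```python
import subprocess


def parse_git_timestamps(log_output: str) -> list[int]:
    """Turn the text printed by git log with the %ct format into integers."""
    return [int(line) for line in log_output.splitlines() if line.strip()]


def fetch_git_timestamps(repo_path: str = ".") -> list[int]:
    """Run git log in repo_path and return the commit timestamps, oldest first."""
    result = subprocess.run(
        ["git", "log", "--all", "--reverse", "--pretty=format:%ct"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails
    )
    return parse_git_timestamps(result.stdout)
```

Splitting parsing from fetching keeps the parsing step usable on a saved log file as well.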

With some Python magic, the timestamps can be transformed into something like the following.

Datetime,Timezone
2017-10-14 13:48:20,+0800 CST
2017-10-21 17:28:50,+0800 CST
2017-10-21 19:04:01,+0800 CST
...
2019-08-02 00:24:14,+0800 CST
2020-03-01 14:26:54,+0100 CET
2020-03-27 16:08:38,+0100 CET
2020-03-27 16:25:26,+0100 CET
...
2020-03-27 16:48:00,+0100 CET
2020-05-05 17:37:16,+0200 CEST
2020-05-14 12:16:39,+0200 CEST
2020-05-14 12:17:03,+0200 CEST

These are much easier for humans to read and interpret for aspects such as how many commits were made on a certain day or at which hour certain commits were made. To keep the time series data analysis and visualization focused, let's use just the Datetime column as the dataset and leave out the Timezone data.
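One possible way to do such a transformation is sketched below with Python's `zoneinfo` module. This is not necessarily the script used for the table above. In particular, `ASSUMED_MOVE` is a hypothetical cutoff that I chose so that it falls between the last +0800 row and the first +0100 row shown; the real date of the move is not part of the data. The zone names `Asia/Shanghai` and `Europe/Berlin` are likewise assumptions standing in for CST and CET/CEST.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical cutoff: commits before this instant are treated as made in China,
# later ones as made in Germany. Adjust to the real date of the move.
ASSUMED_MOVE = datetime(2020, 1, 1, tzinfo=timezone.utc)


def to_local_row(epoch_seconds: int) -> str:
    """Format a Unix timestamp as 'Datetime,Timezone' in the assumed local zone."""
    utc_time = datetime.fromtimestamp(epoch_seconds, timezone.utc)
    if utc_time < ASSUMED_MOVE:
        zone = ZoneInfo("Asia/Shanghai")
    else:
        zone = ZoneInfo("Europe/Berlin")  # switches between CET and CEST by itself
    return utc_time.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S,%z %Z")


for ts in (1507960100, 1508578130, 1508583841):
    print(to_local_row(ts))
```

For the three timestamps listed earlier, this prints the first three rows of the table. Using a named zone instead of a fixed offset lets the library decide between +0100 and +0200 for each date.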

Time Series Data Analysis

What are the interesting questions we can ask of the dataset? I think at least the following.

How many commits have I made per day along the timeline?

This question also covers, to some extent, which days: are there big gaps, i.e. sequences of days with 0 commits, or are there big chunks of days with many commits?

How to answer this question? Just aggregate (count) the date-time data by the first 10 characters (yyyy-mm-dd). For better human consumption, let's look at the result in the later visualization part.
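The counting step can be sketched in a few lines of Python. The function name `commits_per_day` is hypothetical, and the sample rows are the first three entries of the Datetime column.

```python
from collections import Counter


def commits_per_day(datetimes: list[str]) -> dict[str, int]:
    """Count entries by their first 10 characters (yyyy-mm-dd), in date order."""
    counts = Counter(row[:10] for row in datetimes)
    return dict(sorted(counts.items()))


sample = [
    "2017-10-14 13:48:20",
    "2017-10-21 17:28:50",
    "2017-10-21 19:04:01",
]
print(commits_per_day(sample))  # {'2017-10-14': 1, '2017-10-21': 2}
```

Days without commits simply do not appear in the result, so gaps show up as missing dates rather than as explicit zeros.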

At what hours did I make most commits or fewest commits?

Hours on different days are usually not equivalent: 9 am on a workday is usually quite different from 9 am on a weekend, and the same goes for a workday versus a holiday. As for frequency, both the average and the sum are good indicators; I prefer the sum as it is an integer. So the date-time data can be grouped by the 12th and 13th characters (hh), with the first 10 characters used as metadata to retrieve the day of the week.
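A minimal sketch of this grouping follows. The function name `hourly_counts` is hypothetical, and it only distinguishes Saturday and Sunday from the other days; public holidays are not handled.

```python
from datetime import date


def hourly_counts(datetimes: list[str]) -> dict[str, list[int]]:
    """Sum commits per hour of day, split into weekday and weekend buckets."""
    counts = {"weekday": [0] * 24, "weekend": [0] * 24}
    for row in datetimes:
        day = date.fromisoformat(row[:10])  # first 10 characters: yyyy-mm-dd
        hour = int(row[11:13])              # 12th and 13th characters: hh
        # date.weekday() returns 0 for Monday up to 6 for Sunday.
        bucket = "weekend" if day.weekday() >= 5 else "weekday"
        counts[bucket][hour] += 1
    return counts


rows = ["2017-10-14 13:48:20", "2020-03-27 16:08:38", "2020-03-27 16:25:26"]
result = hourly_counts(rows)
print(result["weekend"][13], result["weekday"][16])  # 1 2
```

Each bucket is a list of 24 sums indexed by hour, which maps directly onto the x axis of an hourly bar chart.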

Time Series Data Visualization

Daily Numbers of Commits Along the Timeline

PlotlyTimestampBarGraph { "data":[{ "x":["2017-10-14","2017-10-21","2017-11-09","2017-11-21","2017-11-22","2017-11-26","2017-11-27","2017-12-04","2017-12-05","2017-12-31","2018-04-30","2018-06-09","2018-09-05","2018-11-18","2018-11-27","2018-12-02","2018-12-12","2018-12-16","2018-12-17","2019-07-27","2019-07-28","2019-08-02","2020-03-01","2020-03-27","2020-05-05","2020-05-14","2020-06-09","2020-06-12","2020-06-24","2020-07-05","2020-12-06","2021-01-30","2021-01-31","2021-04-04","2021-04-05","2021-06-09","2021-09-26","2021-10-26","2021-10-27","2021-10-28","2021-10-31","2021-11-04","2021-11-06","2021-11-07","2021-11-08","2021-11-21","2021-11-22","2021-12-11","2021-12-12","2021-12-16","2021-12-17","2022-01-19","2022-01-20","2022-03-06","2022-03-10","2022-04-02","2023-10-20","2023-10-28","2023-10-29","2023-11-03","2023-11-07","2023-11-09","2023-11-10","2023-11-11","2023-11-17","2023-11-18","2023-11-20","2023-11-21","2023-11-23","2023-11-24","2023-11-27","2023-12-05","2023-12-06","2023-12-08","2023-12-09","2023-12-10","2023-12-12","2023-12-15","2023-12-16","2023-12-26","2024-01-02","2024-01-07","2024-01-11"], "y":["1","2","1","2","1","1","1","2","1","1","1","3","1","3","1","6","1","5","1","1","1","1","1","6","1","3","1","1","3","2","3","1","3","2","1","1","2","1","1","1","1","1","3","1","1","4","1","5","1","1","2","1","2","2","1","1","3","2","3","4","2","3","1","1","2","1","1","1","1","2","1","1","1","1","1","2","1","2","1","3","2","1","1"] }], "layout":{"title":"Qia-Articles Git Commits Timestamps"} }
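A graph specification like the one above could be generated from the per-day counts rather than typed by hand. The sketch below only builds the JSON object with its "data" and "layout" keys, keeping the counts as strings as in the specification above; the function name `daily_bar_spec` is hypothetical, and how the result is embedded into the page is specific to this website and not covered here.

```python
import json


def daily_bar_spec(counts: dict[str, int], title: str) -> str:
    """Serialize per-day commit counts as a JSON object with data and layout keys."""
    days = sorted(counts)  # yyyy-mm-dd strings sort chronologically
    spec = {
        "data": [{"x": days, "y": [str(counts[day]) for day in days]}],
        "layout": {"title": title},
    }
    return json.dumps(spec)


print(daily_bar_spec({"2017-10-14": 1, "2017-10-21": 2}, "Qia-Articles Git Commits Timestamps"))
```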

Hourly Numbers of Commits (Separated in Weekday and Weekend to Compare Between)

PlotlyGraph { "data":[{ "marker": { "color": "#4876b0" }, "name": "weekday", "type": "bar", "x":["0","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23"], "y":["7","3","0","0","0","0","0","0","0","0","3","2","5","4","5","2","6","3","2","5","7","8","7","8"] }, { "marker": { "color": "#004494" }, "name": "weekend", "type": "bar", "x":["0","1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23"], "y":["10","0","1","0","1","0","0","0","0","0","0","0","0","7","3","3","9","10","7","3","3","2","6","3"] }], "layout":{ "title":"Qia-Articles Git Commits Timestamps", "barmode": "stack" } }

Hourly Numbers of Commits (Separated in Years to Compare Among)

PlotlyGraph { "data": [{ "type": "bar", "visible": "legendonly", "name": "2017", "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "y": [3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 2, 0, 1, 1, 3] }, { "type": "bar", "name": "2018", "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "y": [7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 2, 3, 0, 2, 1, 3, 1] }, { "type": "bar", "visible": "legendonly", "name": "2019", "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "y": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0] }, { "type": "bar", "visible": "legendonly", "name": "2020", "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "y": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2, 1, 0, 6, 3, 1, 2, 3, 0, 0, 0] }, { "type": "bar", "visible": "legendonly", "name": "2021", "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "y": [3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 2, 6, 5, 3, 0, 1, 1, 4, 1] }, { "type": "bar", "visible": "legendonly", "name": "2022", "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "y": [2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0] }, { "type": "bar", "name": "2023", "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "y": [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 0, 4, 4, 3, 0, 2, 2, 3, 1, 7, 4, 6] }, { "type": "bar", "visible": "legendonly", "name": "2024", "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0] }], "layout":{ "title":"Qia-Articles Git Commits Timestamps" } }
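The per-year traces above follow one pattern: one bar trace per year with 24 hourly sums, where some years are hidden until clicked in the legend via Plotly's "visible": "legendonly" setting. A sketch of how such traces could be built from the Datetime column is shown below; the function name `yearly_hourly_traces` and the `visible_years` parameter are hypothetical.

```python
def yearly_hourly_traces(datetimes: list[str], visible_years: set[str]) -> list[dict]:
    """Build one bar trace per year holding the commit count for each hour of day."""
    per_year: dict[str, list[int]] = {}
    for row in datetimes:
        year = row[:4]          # first 4 characters: yyyy
        hour = int(row[11:13])  # 12th and 13th characters: hh
        per_year.setdefault(year, [0] * 24)[hour] += 1

    traces = []
    for year in sorted(per_year):
        trace = {"type": "bar", "name": year, "x": list(range(24)), "y": per_year[year]}
        if year not in visible_years:
            # Hidden by default; the trace can still be toggled on from the legend.
            trace["visible"] = "legendonly"
        traces.append(trace)
    return traces


rows = ["2017-10-14 13:48:20", "2020-03-27 16:08:38", "2020-03-27 16:25:26"]
for trace in yearly_hourly_traces(rows, visible_years={"2020"}):
    print(trace["name"], trace.get("visible", "shown"), sum(trace["y"]))
```

Showing only a few years by default keeps the chart readable while still allowing any pair of years to be compared.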