Every month, the Press Information Bureau (PIB) posts GST collection data for India. However, the unstructured nature of this release presents significant challenges in calculating trends for both – the country and state levels. This difficulty arises due to the need to aggregate historical data and append it to the latest figures. The key point to note is that the issue goes beyond data structure alone; historical data is fragmented into separate Excel files for each financial year, laced with structural inconsistencies between them. Furthermore, the PIB data has a completely different format and granularity, and it’s posted 2 months in advance of the historical data. In the face of these complexities, how can we efficiently process this data upon each upload, thereby eliminating the need for extensive formatting? This note aims to guide you through straightforward coding steps that automate this process, enabling rapid visualization of trends within seconds.
Step1 – Aggregating Historical Data
As a first step download the individual files and store them in a folder. This will look something like this.