The interactive visualizations can be viewed here:
and the partial data files can be downloaded here:
This repository scrapes data from the federal budgets released as PDFs by the Ministry of Finance, Government of Pakistan.
Individual budget files are in their respective folders.
Note that numbers are added up from the lowest category, ID6. If the lowest category is missing in the published budgets, it will also be missing from the tables. Such entries are relatively few and are updated as the budget files are revised.
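The roll-up from ID6 rows to a higher level can be sketched as follows. This is only an illustration and not the script used in this repository: the rows, names, and amounts are made up, and `budget_N` is a placeholder for whichever `budget_<N>` column a given file contains.

```python
from collections import defaultdict

# Hypothetical ID6-level rows; names and amounts are invented for illustration.
rows = [
    {"ministry_name": "Ministry A", "ID1_name": "General services",
     "ID6_name": "Pay of officers", "budget_N": 100.0},
    {"ministry_name": "Ministry A", "ID1_name": "General services",
     "ID6_name": "Utilities", "budget_N": 50.0},
    {"ministry_name": "Ministry A", "ID1_name": "Health",
     "ID6_name": "Medicines", "budget_N": 25.0},
    {"ministry_name": "Ministry B", "ID1_name": "Education",
     "ID6_name": "Pay of staff", "budget_N": 40.0},
]


def roll_up(rows, keys, value_col="budget_N"):
    """Sum the ID6-level values for each combination of the given key columns."""
    totals = defaultdict(float)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += row[value_col]
    return dict(totals)


# Totals per ministry, and per ministry and first-level category.
ministry_totals = roll_up(rows, ["ministry_name"])
level1_totals = roll_up(rows, ["ministry_name", "ID1_name"])
```

Because the totals are built only from ID6 rows that exist, an ID6 entry missing from the published budget simply does not contribute to any higher-level sum.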
Also note that actual spending can differ from the allocated amount. This can be checked in the budget files, which also provide the allocated and revised numbers for fiscal year N-1.
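As a rough sketch of that check, the allocated and revised values for fiscal year N-1 can be compared row by row. The keys `budget_prev` and `budget_prev_revised` are placeholders standing in for the `budget_<N-1>` and `budget_<N-1>_revised` columns, and the amounts are invented.

```python
def revision_change(allocated, revised):
    """Return the absolute and percentage change from allocation to revision."""
    diff = revised - allocated
    # Percentage change is undefined when nothing was allocated.
    pct = (diff * 100.0 / allocated) if allocated else None
    return diff, pct


# Hypothetical ID6 row; the amounts are made up for illustration.
row = {"ID6_name": "Utilities", "budget_prev": 200.0, "budget_prev_revised": 230.0}

diff, pct = revision_change(row["budget_prev"], row["budget_prev_revised"])
print(f"{row['ID6_name']}: revised by {diff} PKR ({pct}%)")
```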
Disclaimer: The files are generated using pattern recognition scripts that had to be fine-tuned over several iterations, and they can contain errors. If you come across data issues, please report them as soon as possible. This is a hobby project to improve my data scraping skills, so if you intend to use this data for research or policy work, please double-check the original source files. Also note that the budget files are intermittently updated, so the data might only reflect the version that was available at the time of scraping. A full cleaning is possible, but it would require a properly funded project.
| Variable | Type | Description |
|---|---|---|
| `fund` | num | Name of the fund e.g. current expenditure, capital expenditure, consolidated funds etc. |
| `ministry_name` | str | Ministry name |
| `division_name` | str | Name of the division |
| `ID1_name` | str | Name of the first level |
| `ID2_name` | str | Name of the second level |
| `ID3_name` | str | Name of the third level |
| `ID4_name` | str | Name of the fourth level |
| `ID5_name` | str | Name of the fifth level |
| `ID6_name` | str | Name of the sixth (lowest) level |
| `posts_<N-1>` | num | The number of posts (jobs) in year N-1. |
| `posts_<N>` | num | The number of posts (jobs) in year N. |
| `budget_<N-1>` | num | The value in PKR of item ID6 in fiscal year N-1. |
| `budget_<N-1>_revised` | num | The value in PKR of item ID6 revised in fiscal year N-1. |
| `budget_<N>` | num | The value in PKR of item ID6 in fiscal year N. |
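A minimal sketch of reading a file with this layout, using only the Python standard library. The sample text, the year `2023` in the headers, and the comma-separated format are assumptions for illustration; the actual files may use a different year format or delimiter.

```python
import csv
import io

# Hypothetical two-row extract; real headers may format the year differently.
sample = """ministry_name,ID6_name,posts_2023,budget_2023
Ministry A,Pay of officers,12,1000000
Ministry A,Utilities,,250000
"""

# Columns described as "num" in the table above share these prefixes.
NUMERIC_PREFIXES = ("posts_", "budget_")


def parse_row(row):
    """Convert posts_* and budget_* fields to float; empty cells become None."""
    parsed = {}
    for col, value in row.items():
        if col.startswith(NUMERIC_PREFIXES):
            parsed[col] = float(value) if value.strip() else None
        else:
            parsed[col] = value
    return parsed


records = [parse_row(r) for r in csv.DictReader(io.StringIO(sample))]
```

Keeping empty cells as `None` rather than zero makes it easier to tell a value that is missing in the published budget apart from a genuine zero allocation.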
ID6 categories were being skipped because the columns were misaligned. The other main issue was that entries with a single column were not being assigned to the correct column. While most fit a generic pattern, not all may end up in the correct column. This was the bulk of the fine-tuning. Such cases should be extremely few and should ONLY matter when analyzing the data at the highest level of disaggregation, i.e. ID6. Please report them if you find them.