Data Gathering notes
Example files from @elischutze: https://www.dropbox.com/sh/nuciypizp91bc10/AAB6z7G8LEgzWeQv6PBrdeu1a?dl=0
@sikesLpp Thanks for this. See the attached. That was the correct website; we just need to make sure the search is correctly filtered.
The 15 departments are:
Priority government departments (15 we discussed):
1. Cabinet Office
2. HM Treasury
3. Department for Communities and Local Government
4. Department for Culture, Media & Sport
5. Department for Education
6. Department for Environment, Food & Rural Affairs
7. Department for International Development
8. Department for Transport
9. Department for Work and Pensions
10. Department of Health
11. Foreign & Commonwealth Office
12. Home Office
13. Ministry of Defence
14. Ministry of Justice
New departments whose data we will want once they start publishing it:
15. Department for Business, Energy & Industrial Strategy
16. Department for Exiting the European Union
17. Department for International Trade
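Filtering the search means running it once per department. As a minimal sketch, assuming the gov.uk search keeps the query format of the example URL in Momo's usage notes (`publication_filter_option=transparency-data`, `topics[]=all`, `departments[]=<slug>`), the per-department URLs could be generated as below. The helper `buildSearchUrl` and the department slugs are hypothetical; the real slugs should be copied from the site's department dropdown.

```php
<?php
// Sketch: build one filtered search URL per department, copying the query
// format of the example URL in the usage notes (transparency data, all topics).
// buildSearchUrl() is a hypothetical helper, not part of govharvester.php.

function buildSearchUrl(string $departmentSlug): string
{
    $query = implode('&', [
        'keywords=',
        'publication_filter_option=transparency-data',
        'topics[]=all',
        'departments[]=' . rawurlencode($departmentSlug),
    ]);
    return 'https://www.gov.uk/government/publications?' . $query;
}

// Hypothetical name => slug map (assumed slugs, only a few entries shown).
$departments = [
    'Cabinet Office' => 'cabinet-office',
    'HM Treasury'    => 'hm-treasury',
    'Home Office'    => 'home-office',
];

foreach ($departments as $name => $slug) {
    echo $name, ': ', buildSearchUrl($slug), PHP_EOL;
}
```

Each printed URL can then be passed to the harvester script in single quotes.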
Notes from Momo

As discussed at the last meeting, I have created a script to harvest data. I had to create a fork of the repo at https://github.com/sikesLpp/MinistersUnderTheInfluence, as I do not have push access. The script requires a Linux machine with libxml and php-cli installed, and must be run from the shell. It takes a 'rich' URL for a search at https://www.gov.uk/government/publications and dumps links to all documents found, plus some relevant metadata, to a CSV file (govharvester_listfile.csv).
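A rough sketch of the link-dumping step, assuming result links on the search page are `<a>` elements whose `href` starts with `/government/publications/` (the real page markup may differ, and this is not the code in govharvester.php). It uses PHP's DOM extension, which is backed by libxml, together with `fputcsv`; the function names `extractPublicationLinks` and `writeCsv` are hypothetical.

```php
<?php
// Sketch: pull publication links out of a search-results page and write them
// as CSV rows. ASSUMPTION: result links are <a> elements whose href starts
// with /government/publications/; the live markup may differ.

function extractPublicationLinks(string $html, string $baseUrl = 'https://www.gov.uk'): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate real-world, non-strict HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//a[starts-with(@href, "/government/publications/")]') as $a) {
        // One row per link: visible title plus absolute URL.
        $rows[] = [trim($a->textContent), $baseUrl . $a->getAttribute('href')];
    }
    return $rows;
}

function writeCsv($handle, array $rows): void
{
    fputcsv($handle, ['title', 'url'], ',', '"', '\\');   // header row
    foreach ($rows as $row) {
        fputcsv($handle, $row, ',', '"', '\\');
    }
}
```

For example, `writeCsv(fopen('govharvester_listfile.csv', 'w'), extractPublicationLinks($html));` where `$html` holds the fetched search page. Keeping parsing and writing separate makes it easy to swap the CSV for a database later.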
As discussed, this is a proof-of-concept script that will need further tuning (including actually downloading the docs and storing the metadata in some sort of database).
Short usage instructions:

1. Go to https://www.gov.uk/government/publications in your browser.
2. Make a selection via the search dropdowns.
3. Press Search.
4. Paste the URL of the resulting page as the first argument to the script.

NB: do not forget to enclose the URL in single quotes, as the shell otherwise treats '&' as a command separator (running the command before it in the background) and '[]' as a glob pattern. Example:

./govharvester.php 'https://www.gov.uk/government/publications?keywords=&publication_filter_option=transparency-data&topics[]=all&departments[]=attorney-generals-office'
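The quoting matters because the whole query string has to reach the script as a single argument. A minimal sketch of reading and checking that argument, using a hypothetical helper `parseSearchUrl` (not part of govharvester.php); `parse_str()` turns `departments[]=x` into a PHP array.

```php
<?php
// Sketch: read the search URL passed as the first CLI argument and extract
// the filters. parseSearchUrl() is hypothetical, not part of govharvester.php.

function parseSearchUrl(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false
        || ($parts['host'] ?? '') !== 'www.gov.uk'
        || ($parts['path'] ?? '') !== '/government/publications') {
        throw new InvalidArgumentException("Not a gov.uk publications search URL: $url");
    }
    // "topics[]=all&departments[]=x" becomes ['topics' => ['all'], 'departments' => ['x']]
    parse_str($parts['query'] ?? '', $filters);
    return $filters;
}

if (PHP_SAPI === 'cli' && isset($argv[1])) {
    print_r(parseSearchUrl($argv[1]));
}
```

Without the quotes, the shell ends the command at the first `&`, so the script would typically receive only `https://www.gov.uk/government/publications?keywords=` and silently lose the department filter.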
PS: I will not be able to make it to the meetup tomorrow, as I am out of town for a wedding.
Do we already have a Slack channel or mailing list for better communication?
grtz Momo
Hey @sikesLpp, I tried to run this script today but kept getting this error on line 120, when parsing the type from the DOM: "Fatal error: Cannot use object of type DOMNodeList as array". I haven't done any PHP, so I'm not sure how to move forward from this...
Also, our Slack channel is hackbrexit.
@sikesLpp, it was my PHP version, and the script needed to have PHP tags.
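Both fixes can be shown in a small sketch. A PHP file needs the `<?php` opening tag, otherwise PHP just echoes the source as text, and a `#!/usr/bin/env php` line lets it run as `./govharvester.php`. For the error itself, older PHP releases reject array-style access such as `$nodes[0]` on a `DOMNodeList`, while `DOMNodeList::item()` works on every version. The helper `firstText()` and the XPath used in it are hypothetical; the actual code at line 120 is not shown in this thread.

```php
#!/usr/bin/env php
<?php
// Sketch: version-independent way to read the first match of an XPath query.
// Older PHP releases fail on $nodes[0] with "Cannot use object of type
// DOMNodeList as array"; DOMNodeList::item() avoids that on all versions.
// firstText() is a hypothetical helper, not code from govharvester.php.

function firstText(DOMXPath $xpath, string $query, ?DOMNode $context = null): ?string
{
    $nodes = $xpath->query($query, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;   // no match, or an invalid XPath expression
    }
    return trim($nodes->item(0)->textContent);   // instead of $nodes[0]
}
```

Using `item()` together with a `length` check also avoids a second failure mode: calling `->textContent` on `null` when the query matches nothing.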