This project aims to aggregate, analyze, and display data for home listings.
The scraper can currently parse PDFs and store the results as JSON, which can then be fed into a data lake as semi-structured data.
The next step is to parse the JSON for key attributes, then extract, transform, and load (ETL) them into a database.
python3 pdf_scraper.py <path/to/file.pdf>
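
The planned ETL step is not implemented yet. The sketch below shows one way it could look, assuming the scraper's JSON holds either a single listing object or a list of them. The attribute names (`address`, `price`, `bedrooms`), the `listings` table, and the choice of SQLite are all hypothetical placeholders, not the project's actual schema.

```python
import json
import sqlite3


def extract(json_text):
    """Extract: decode the JSON text into a list of listing dicts."""
    data = json.loads(json_text)
    # Accept either a single listing object or a list of them.
    return data if isinstance(data, list) else [data]


def transform(listing):
    """Transform: keep the key attributes and normalize their types.

    The keys used here are assumed for illustration; the real ones
    depend on what pdf_scraper.py writes out.
    """
    price = listing.get("price")
    if isinstance(price, str):
        # Strip currency formatting such as "$350,000".
        price = price.replace("$", "").replace(",", "").strip()
    bedrooms = listing.get("bedrooms")
    return (
        listing.get("address"),
        float(price) if price not in (None, "") else None,
        int(bedrooms) if bedrooms not in (None, "") else None,
    )


def load(conn, rows):
    """Load: insert the transformed rows into a (hypothetical) listings table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(address TEXT, price REAL, bedrooms INTEGER)"
    )
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(json_text, conn):
    """Run extract, transform, and load in order; return the row count."""
    rows = [transform(item) for item in extract(json_text)]
    load(conn, rows)
    return len(rows)
```

For a quick trial, `run_etl(open("listing.json").read(), sqlite3.connect(":memory:"))` would load one scraped file into an in-memory database, where `listing.json` stands in for whatever file the scraper produced. Missing attributes are stored as NULL rather than rejected, which keeps partially parsed listings available for later cleanup.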