Web scraping is a software technique for extracting information from websites. It focuses mostly on transforming unstructured data on the web (HTML) into structured data (a database or spreadsheet).
This project scrapes 500 Hindi news articles from the Jagaran Newspaper website.
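
The HTML-to-structured-data step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual scraper: the `SAMPLE_HTML` markup, the `HeadlineParser` class, and the helper names are all hypothetical, and the real site's page structure will differ.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded page; the real site's
# structure is not known here and is assumed only for illustration.
SAMPLE_HTML = """
<html><body>
  <h2><a href="/news/1">पहला समाचार</a></h2>
  <p>Some unstructured body text.</p>
  <h2><a href="/news/2">दूसरा समाचार</a></h2>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collect title/url pairs from <a> tags nested inside <h2> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_h2 = False
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.rows.append({"title": title, "url": self._href})
            self._href = None
        elif tag == "h2":
            self._in_h2 = False


def html_to_rows(html):
    """Turn raw HTML into a list of structured records."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize the records as CSV text (spreadsheet-friendly)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = html_to_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

In practice the HTML would come from an HTTP request rather than an inline string, and a dedicated parsing library may be more convenient than `html.parser`. The overall shape of fetching a page, parsing it, and writing out records would stay the same.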
- Docs: For document change/update
- Gather: For the wrangling process - reading/gathering data
- Assess: For the wrangling process - assessing data
- Clean: For the wrangling process - cleaning quality and tidiness issues; may include test code too
- Viz: For visualization
- Refactor: Refactoring existing code
- Chore: For package manager changes