A Bash web crawler that finds URLs on the homepage of a target website domain by parsing the HTML source code and the JavaScript links discovered there. An optional pattern word can be passed as an argument to filter the extracted URLs.
➜ git clone https://github.com/torsh4rk/BASHkrawler.git
➜ cd BASHkrawler/ && chmod +x bashkrawler.sh
➜ ./bashkrawler.sh
Fig.2 - Choosing option 1 to find all URLs at the target domain www.nasa.gov via HTML parsing
Fig.3 - Finding all URLs at the target domain www.nasa.gov via HTML parsing
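Option 1 boils down to fetching the homepage and pulling absolute URLs out of the raw HTML. The helper below is a minimal sketch of that idea, not the actual parsing logic in bashkrawler.sh; the function name `extract_urls` is hypothetical:

```shell
#!/usr/bin/env bash

# extract_urls: pull absolute http/https URLs out of HTML read from stdin
# (hypothetical helper; the real script's regex and filtering may differ)
extract_urls() {
  grep -Eo "https?://[^\"'<> ]+" | sort -u
}

# Typical use against a live domain (requires network access):
#   curl -s "https://www.nasa.gov" | extract_urls
```

`sort -u` deduplicates, since the same URL usually appears many times in a page's markup.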
Fig.4 - Choosing option 2 to find all JS links at the target domain www.nasa.gov and extract all URLs from those JS links
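Option 2 first collects the JavaScript files referenced by the homepage, then extracts URLs from each file's source. A rough sketch of the first step, assuming the JS links appear in `src="..."` attributes (the helper name `find_js_links` is hypothetical, not from the script):

```shell
#!/usr/bin/env bash

# find_js_links: list .js file URLs referenced in HTML read from stdin
# (hypothetical helper; bashkrawler.sh's own extraction may differ)
find_js_links() {
  grep -Eo 'src="[^"]+\.js[^"]*"' | sed 's/^src="//; s/"$//' | sort -u
}

# Each found JS file can then be fetched and scanned for URLs in turn:
#   curl -s "https://www.nasa.gov" | find_js_links | while read -r js; do
#     curl -s "$js" | grep -Eo "https?://[^\"'<> ]+"
#   done
```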
Fig.5 - Choosing option 3 to find all URLs at the target domain www.nasa.gov via options 1 and 2, without a pattern word to match
Fig.6 - Finishing the full web crawl of the target domain www.nasa.gov
Fig.7 - Crawling a target domain and finding all URLs containing the word ".nasa"
Fig.8 - Choosing option 3 to find all URLs containing the word "nasa" at the target domain www.nasa.gov via options 1 and 2
Fig.9 - Finishing the full web crawl of the target domain www.nasa.gov using the word ".nasa"
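The optional pattern word simply narrows the final URL list to entries containing that literal string. A minimal sketch of that filtering step (the helper name `filter_urls` is hypothetical; the actual script may apply the pattern differently):

```shell
#!/usr/bin/env bash

# filter_urls: keep only URLs containing a literal pattern word
# (hypothetical helper mirroring the script's optional argument)
filter_urls() {
  grep -F -- "$1"
}

# Example: keep only URLs containing ".nasa"
printf '%s\n' "https://www.nasa.gov/news" "https://cdn.example.com/lib.js" |
  filter_urls ".nasa"
# prints only https://www.nasa.gov/news
```

`grep -F` treats the pattern as a fixed string, so a word like ".nasa" is matched literally rather than as a regex.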
Reference: https://medium.datadriveninvestor.com/what-is-a-web-crawler-and-how-does-it-work-b9e9c2e4c35d