Archive.org booklet scraper: a tool to download and browse archive.org books offline

Disclaimer: Borrowed books that you don't hold a personal physical copy of are legally available to you only for the duration of the loan.

As of October 2020, the old Python and Bash scrapers are broken, so I made a simple semi-automatic scraper. For any urgent need, please contact me at nazmi.fr/contact to have your item sent as a PDF within 24 hours.

Work in progress, available soon

The new JavaScript bookmarklet

  • works in the browser
  • dynamic download of all highest resolution pages with progress display
  • automatic estimate of the number of pages
  • "from" and "to" page values editable in a prompt before the script starts
  • automatic naming and metadata
  • automatic PDF conversion
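
Since the bookmarklet is not published yet, the following is only a minimal sketch of how the range prompt, progress display, and automatic naming listed above could fit together. Every name in it (`parsePageRange`, `formatProgress`, `pageFileName`, `downloadPages`, and the caller-supplied `fetchPage`) is hypothetical and not taken from the actual script, and the way a page image is fetched is deliberately left to the caller.

```javascript
// Hypothetical helpers: none of these names come from the real bookmarklet.

// Turn the "from" / "to" prompt answers into a valid page range,
// falling back to the whole book when an answer is empty or cancelled.
function parsePageRange(fromInput, toInput, estimatedTotal) {
  const from = Number.parseInt(fromInput, 10);
  const to = Number.parseInt(toInput, 10);
  const first = Number.isNaN(from) ? 1 : Math.max(1, from);
  const last = Number.isNaN(to) ? estimatedTotal : Math.min(estimatedTotal, to);
  if (first > last) {
    throw new RangeError(`Invalid page range: ${first}-${last}`);
  }
  return { first, last };
}

// Build a human-readable progress line.
function formatProgress(done, total) {
  const percent = Math.round((done / total) * 100);
  return `${done}/${total} pages (${percent}%)`;
}

// Derive a safe, zero-padded file name for one page from the book title.
function pageFileName(title, pageNumber, total) {
  const slug = title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything unsafe into a dash
    .replace(/^-+|-+$/g, "");    // drop leading and trailing dashes
  const width = String(total).length;
  return `${slug}-p${String(pageNumber).padStart(width, "0")}.jpg`;
}

// Download the pages one by one and report progress after each.
// fetchPage(n) is an assumption: a caller-supplied async function that
// returns the image data for page n.
async function downloadPages(range, fetchPage, onProgress) {
  const total = range.last - range.first + 1;
  const pages = [];
  for (let n = range.first; n <= range.last; n += 1) {
    pages.push(await fetchPage(n));
    onProgress(formatProgress(pages.length, total));
  }
  return pages;
}

// In a browser, the range could be collected like this (estimate is the
// automatically estimated page count):
//   const range = parsePageRange(
//     prompt("From page", "1"),
//     prompt("To page", String(estimate)),
//     estimate
//   );
```

Downloading sequentially keeps the progress display simple and avoids sending many requests at once; a real script might choose to fetch several pages in parallel instead.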

Requirements for the new script

  • A web browser that supports bookmarklets (Firefox, or Chrome and Chromium-based browsers such as Brave, Edge, Opera, ...)
  • Optional: a VPS running the online PDF converter suite (PHP, Apache, ImageMagick, Ghostscript). My server is set as the default and can be used to convert the images to PDF for a small donation of your choice, or in exchange for some time spent on OCR verification for some books.