Fundamentally, all bookworm needs is a .txt file. If you can assemble your own material from texts you already own, go ahead!
- Project Gutenberg is a brilliant resource of freely available, out-of-copyright texts, with plenty of room to explore historical literature. Click on a book and download the "Plain Text UTF-8" copy.
- The British Library / Microsoft OCR project set out in 2007 to digitise a significant portion of the Library's historic texts using Optical Character Recognition (OCR). Computer vision was still fairly nascent at that point, and the project stopped short of its intended volume, but there's room for a lot of interesting work to be done with what it produced.
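If you'd rather script the Gutenberg step than click through the site, the sketch below uses only the Python standard library. It is not part of bookworm, and the function names are hypothetical. It assumes Gutenberg's current plain-text URL pattern (`https://www.gutenberg.org/cache/epub/<id>/pg<id>.txt`) and the `*** START OF ... ***` / `*** END OF ... ***` marker lines that recent Gutenberg files use around the book. Trimming at those markers keeps the licence header and footer out of whatever bookworm reads.

```python
import re
import urllib.request
from pathlib import Path

# Assumption: Gutenberg serves the "Plain Text UTF-8" copy of ebook N at this URL.
# If it moves, copy the link from the book's download page instead.
GUTENBERG_TXT_URL = "https://www.gutenberg.org/cache/epub/{id}/pg{id}.txt"

# Assumption: the book is wrapped in marker lines such as
# "*** START OF THE PROJECT GUTENBERG EBOOK ... ***" (older files say "THIS").
START_RE = re.compile(r"^\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                      re.IGNORECASE | re.MULTILINE)
END_RE = re.compile(r"^\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                    re.IGNORECASE | re.MULTILINE)


def gutenberg_txt_url(book_id: int) -> str:
    """Build the (assumed) plain-text URL for a Gutenberg ebook number."""
    return GUTENBERG_TXT_URL.format(id=book_id)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the text between the START and END marker lines, if both exist."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    if start and end and start.end() < end.start():
        return text[start.end():end.start()].strip()
    # No recognisable markers: return the text with surrounding whitespace trimmed.
    return text.strip()


def download_book(book_id: int, path: str) -> Path:
    """Fetch one book, trim the licence header/footer and save it as a UTF-8 .txt."""
    with urllib.request.urlopen(gutenberg_txt_url(book_id)) as response:
        # "utf-8-sig" also drops the byte-order mark some files begin with
        text = response.read().decode("utf-8-sig")
    out = Path(path)
    out.write_text(strip_gutenberg_boilerplate(text), encoding="utf-8")
    return out
```

For example, `download_book(1342, "pride_and_prejudice.txt")` should save Pride and Prejudice as a plain .txt file ready for bookworm. This is fine for a handful of books, but check Gutenberg's guidance on automated access before fetching many.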