The data for guides to breweries across the United States from 1896 to 1918

hadro/brewery-guides

Brewery Directories from 1896 - 1918

The files in this repository are derived from two related New York Public Library collections, linked below.

They cover listings of breweries and maltsters in the United States from 1896 to 1918, and include fascinating location information as well as excellent historical advertising context and more.

Files come from two collections of similar titles, and the Digital Collections links below contain all the images of the volumes held by The New York Public Library, in a variety of useful resolutions:

Files

Each folder in this repository represents one volume of the brewery guides. Within each folder you'll find five things:

  • A searchable PDF of the original volume;
  • A .txt file that is the concatenated output of the 'txt' folder;
  • A _with_layout.txt file that is the text output of the PDF file, produced using pdftotext with the -layout flag;
  • A 'txt' folder with all of the raw OCR txt for each page, with sequence and image ID information embedded in the filename;
  • An 'hocr' folder with all of the coordinate hocr files for each page, with sequence and image ID information embedded in the filename
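
As a rough illustration of how the per-page files listed above might be used, here is a minimal Python sketch. It rests on assumptions that this README does not confirm: that sorting the filenames in a 'txt' folder yields page order, and that the hocr files mark words with the common `ocrx_word` class and a `bbox x0 y0 x1 y1` entry in the `title` attribute. The function and class names are made up for this example.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


def concatenate_pages(txt_dir):
    """Join every per-page .txt file in txt_dir, in sorted filename order.

    Assumption: sorted filenames correspond to page sequence.
    """
    pages = sorted(Path(txt_dir).glob("*.txt"))
    return "\n".join(
        page.read_text(encoding="utf-8", errors="replace") for page in pages
    )


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for elements with class 'ocrx_word'.

    Assumption: the hocr files follow the usual hOCR word markup.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bounding box of the word currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
        if "ocrx_word" in classes and match:
            self._bbox = tuple(int(n) for n in match.groups())

    def handle_data(self, data):
        # Attach the first non-blank text after a word tag to its bounding box.
        if self._bbox and data.strip():
            self.words.append((data.strip(), self._bbox))
            self._bbox = None


def hocr_words(hocr_text):
    """Return a list of (word, bbox) pairs found in one hocr page."""
    parser = HocrWords()
    parser.feed(hocr_text)
    return parser.words
```

Calling `concatenate_pages` on a volume's 'txt' folder should give something close to that volume's concatenated .txt file, and `hocr_words` can be fed the contents of a single hocr page to locate words on the page image. If the real filenames or hocr markup differ from the assumptions above, the sorting and the class name will need adjusting.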

Full-Text PDFs

N.B. These are 14-26 MB files.

More fun with Public Domain materials

The images and the data that allowed me to pull down these volumes from The New York Public Library's Digital Collections are all contained in the Public Domain release NYPL put out in January 2016. (Disclosure: I co-led that project as an NYPL staffer, and remain very proud of it!) If you want to play with the data and utilities yourself, there's a great NYPL Public Domain GitHub repo waiting for you.

License

All materials in these two collections are in the public domain according to rights analysis by the NYPL Copyright and Information Policy group. As a result, all the data and PDFs in this repo are likewise in the public domain. A CC0 license has been applied to this repo -- on the off chance that someone believes new copyright could attach to these images or the files, they are hereby released CC0 (even though they don't need to be, by virtue of being public domain in the first place).