hrbrmstr/pubcrawl


*** IMPORTANT ***

No further development will occur in this package, as it has been superseded by the actively maintained (and quite spiffy!) epubr package.



pubcrawl

Convert ‘epub’ Files to Text

Description

Convert ‘epub’ Files to Text

The ‘epub’ file format is really just a structured ‘ZIP’ archive with metadata, graphics and (usually) ‘HTML’ text. Tools are provided to turn an ‘epub’ file into a tidy data frame.
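Because an epub is a ZIP archive, base R can already show what is inside one. The following is a minimal sketch of that idea, not the package's own implementation: `html_entries()` and `list_epub_html()` are hypothetical helper names, and the `demo` listing is made up for illustration. It relies only on `utils::unzip(path, list = TRUE)`, which returns a data frame with `Name`, `Length` and `Date` columns.

```r
# Keep only the (X)HTML entries from a ZIP listing such as the one
# returned by utils::unzip(path, list = TRUE).
html_entries <- function(listing) {
  is_html <- grepl("\\.x?html?$", listing$Name, ignore.case = TRUE)
  listing[is_html, , drop = FALSE]
}

# Hypothetical helper: list the HTML documents inside an epub file
# without extracting anything to disk.
list_epub_html <- function(path) {
  html_entries(utils::unzip(path, list = TRUE))
}

# Made-up listing standing in for a real archive, so the filter can be
# tried without an epub file at hand.
demo <- data.frame(
  Name = c("mimetype", "OEBPS/content.opf", "OEBPS/ch01.html",
           "OEBPS/cover.jpg", "OEBPS/ch02.xhtml"),
  Length = c(20, 1500, 12007, 50000, 900),
  stringsAsFactors = FALSE
)

html_entries(demo)$Name
```

With a real file, `list_epub_html("some-book.epub")` (the file name is a placeholder) would return the rows for the HTML documents only, leaving out metadata and images.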

What’s Inside The Tin

The following functions are implemented:

  • epub_to_text: Convert an epub file into a data frame of plaintext chapters

NOTE

There are edge cases I’ve totally not covered yet. Feel free to jump in and make this a real, useful package!

TODO

  • Refactor so there aren’t so many heavy dependencies
  • [ ] Try to get hgr on CRAN so it’s not a GH dep. Moved the cleaner code into here
  • Better docs
  • Embed some epubs for examples and tests
  • Setup Travis, Appveyor, code coverage

Installation

devtools::install_github("hrbrmstr/pubcrawl")

Usage

library(pubcrawl)
library(tidyverse)

# current version
packageVersion("pubcrawl")
## [1] '0.1.0'

An O’Reilly epub

epub_to_text("~/Data/R Packages.epub")
## # A tibble: 26 x 4
##    path                         size date                content                                                       
##    <chr>                       <dbl> <dttm>              <chr>                                                         
##  1 OEBPS/cover.html              315 2015-03-24 21:49:16 Cover                                                         
##  2 OEBPS/titlepage01.html        466 2015-03-24 21:49:16 "R Packages\n\nHadley Wickham"                                
##  3 OEBPS/copyright-page01.html  3286 2015-03-24 21:49:16 "R Packages\n\nby Hadley  Wickham\n\n\n\nPrinted in the Unite…
##  4 OEBPS/toc01.html            17557 2015-03-24 21:49:16 "navPrefaceIn This Book\n\nConventions Used in This Book\n\nU…
##  5 OEBPS/preface01.html        17784 2015-03-24 21:49:16 "Preface\n\n\nIn This Book\n\nThis book will guide you from b…
##  6 OEBPS/part01.html             444 2015-03-24 21:49:16 Getting Started                                               
##  7 OEBPS/ch01.html             12007 2015-03-24 21:49:16 "Introduction\n\nIn R, the fundamental unit of shareable code…
##  8 OEBPS/ch02.html             28633 2015-03-24 21:49:18 "Package Structure\n\nThis chapter will start you on the road…
##  9 OEBPS/part02.html             454 2015-03-24 21:49:18 Package Components                                            
## 10 OEBPS/ch03.html             28629 2015-03-24 21:49:18 "R Code\n\nThe first principle of using a package is that all…
## # ... with 16 more rows

A Project Gutenberg epub that comes with the package

epub_to_text(system.file("extdat", "augustine.epub", package="pubcrawl")) %>% 
  mutate(path = abbreviate(path))
## # A tibble: 10 x 4
##    path                             size date                content                                                   
##    <chr>                           <dbl> <dttm>              <chr>                                                     
##  1 OEBPS/@@@@@@@3296@3296-@3296--0 63804 2017-10-02 07:00:00 "THE CONFESSIONS\nOF\nSAINT AUGUSTINE\n\nBy Saint Augusti…
##  2 OEBPS/@@@@@@@3296@3296-@3296--1 68504 2017-10-02 07:00:00 "BOOK III\nTo Carthage I came, where there sang all aroun…
##  3 OEBPS/@@@@@@@3296@3296-@3296--2 80192 2017-10-02 07:00:00 "BOOK V\nAccept the sacrifice of my confessions from the …
##  4 OEBPS/@@@@@@@3296@3296-@3296--3 51898 2017-10-02 07:00:00 "O crooked paths! Woe to the audacious soul, which hoped,…
##  5 OEBPS/@@@@@@@3296@3296-@3296--4 80194 2017-10-02 07:00:00 "Anubis, barking Deity, and all         The monster Gods …
##  6 OEBPS/@@@@@@@3296@3296-@3296--5 80718 2017-10-02 07:00:00 "The boy then being stilled from weeping, Euodius took up…
##  7 OEBPS/@@@@@@@3296@3296-@3296--6 65956 2017-10-02 07:00:00 "And Thou knowest how far Thou hast already changed me, w…
##  8 OEBPS/@@@@@@@3296@3296-@3296--7 57022 2017-10-02 07:00:00 "BOOK XII\nMy heart, O Lord, touched with the words of Th…
##  9 OEBPS/@@@@@@@3296@3296-@3296--8 69513 2017-10-02 07:00:00 "BOOK XIII\nI call upon Thee, O my God, my mercy, Who cre…
## 10 OEBPS/@@@@@@@3296@3296-@3296--9 21223 2017-10-02 07:00:00 "The Confessions of Saint Augustine, by Saint Augustine\n…
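
Once the text is in a data frame, ordinary data frame tools apply. Here is a rough sketch of one follow-up step, counting words per chapter, using base R only. The `book` data frame is a made-up stand-in that borrows just the `path` and `content` column names from the output above; `count_words()` and the "ch + digits" file-naming rule are assumptions for illustration, not part of pubcrawl.

```r
# Stand-in for epub_to_text() output: same `path` and `content` column
# names as the tibbles shown above, but with made-up text.
book <- data.frame(
  path = c("OEBPS/cover.html", "OEBPS/ch01.html", "OEBPS/ch02.html"),
  content = c("Cover",
              "Introduction\n\nIn R, the unit of code",
              "Package Structure\n\nThis chapter"),
  stringsAsFactors = FALSE
)

# Count whitespace-separated words in each element of a character vector.
count_words <- function(text) {
  lengths(strsplit(trimws(text), "\\s+"))
}

book$words <- count_words(book$content)

# Keep only chapter files (assumed naming rule: "ch" followed by digits).
chapters <- book[grepl("/ch[0-9]+\\.html$", book$path), ]
chapters[, c("path", "words")]
```

The naming rule will not hold for every book (the Project Gutenberg file above uses a different scheme), so the filter would need adjusting per source.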

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
