readability_analysis

A demonstration of readability analysis performed using the Blog Authorship Corpus and Wikipedia.

This repository provides reproducible evidence to support an upcoming discussion on readability analysis that will be published in CHANCE magazine.

To reproduce the work contained herein should require fairly standard computing resources and Python tools. The most resource intensive notebook is vocabulary_and_readability_blog_corpus.ipynb: this was run on an Amazon EC2 m5d.4xlarge machine, with 16 virtual CPUs and 64 GiB of RAM. The environment was Amazon's Deep Learning AMI (Ubuntu 18.04.5 LTS) Version 26.0. The environment.yml file in this directory provides details for the environment used, but this is overkill: the easiest way to run the notebooks provided is to just use conda or pip to install any modules not currently in your Python environment. The data used is not cached here, but is described in the notebooks, particularly blog_data_prep.ipynb, wikipedia_readability.ipynb, and word_ease_analysis.ipynb.

Please direct any questions or comments to [email protected].

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
IMAGES		IMAGES
NOTEBOOKS		NOTEBOOKS
LICENSE		LICENSE
README.md		README.md
Tables.pdf		Tables.pdf
Tables.tex		Tables.tex
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IMAGES

IMAGES

NOTEBOOKS

NOTEBOOKS

LICENSE

LICENSE

README.md

README.md

Tables.pdf

Tables.pdf

Tables.tex

Tables.tex

environment.yml

environment.yml

Repository files navigation

readability_analysis

About

Releases

Packages

Languages

License

linesn/readability_analysis

Folders and files

Latest commit

History

Repository files navigation

readability_analysis

About

Topics

Resources

License

Stars

Watchers

Forks

Languages