Substack2Markdown

Substack2Markdown is a Python tool for scraping free and premium Substack posts and saving them as Markdown files. It saves paid content as long as you are subscribed to that Substack. Most "save for later" apps (such as Pocket) don't save these posts; with this script you can keep them and browse and sort them in a user-friendly HTML interface.

When you run the script, it creates a folder named after the Substack in /substack_md_files, then scrapes the Substack URL and converts each blog post into a Markdown file. Once all the posts have been saved, it generates an HTML file in the /substack_html_pages directory that lets you browse the posts.
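The folder layout described above can be sketched roughly as follows. This is only an illustration, not the script's actual code: the helper names `writer_name_from_url` and `prepare_output_dirs` are hypothetical, and deriving the folder name from the subdomain of the URL is an assumption.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse


def writer_name_from_url(url: str) -> str:
    """Hypothetical helper: take the subdomain as the folder name (an assumption)."""
    host = urlparse(url).netloc  # e.g. "example.substack.com"
    return host.split(".")[0]


def prepare_output_dirs(url: str, base: Path) -> tuple[Path, Path]:
    """Create the Markdown and HTML output folders under `base` and return them."""
    md_dir = base / "substack_md_files" / writer_name_from_url(url)
    html_dir = base / "substack_html_pages"
    # exist_ok=True makes repeated runs safe if the folders already exist.
    md_dir.mkdir(parents=True, exist_ok=True)
    html_dir.mkdir(parents=True, exist_ok=True)
    return md_dir, html_dir
```

Calling `prepare_output_dirs("https://example.substack.com", Path("."))` would create `substack_md_files/example` and `substack_html_pages` in the current directory.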

You can either hardcode the Substack URL and the number of posts you'd like to save at the top of the file, or specify them as command-line arguments.

Features

Converts Substack posts into Markdown files.
Generates an HTML file to browse Markdown files.
Supports free and premium content (with subscription).
The HTML interface allows sorting essays by date or likes.
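To illustrate the ordering behind the sort feature, here is a minimal sketch in Python. The real interface presumably sorts in the browser, so this is not the project's code: the `Essay` class, its field names, and `sort_essays` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Essay:
    # Hypothetical record; the field names are assumptions for illustration.
    title: str
    published: date
    likes: int


def sort_essays(essays, key="date", descending=True):
    """Return a new list of essays ordered by publication date or like count."""
    if key not in ("date", "likes"):
        raise ValueError("key must be 'date' or 'likes'")
    attr = "published" if key == "date" else "likes"
    return sorted(essays, key=lambda e: getattr(e, attr), reverse=descending)
```

With `descending=True` (the default in this sketch), the newest or most-liked essays come first.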

Installation

Clone the repo and install the dependencies:

git clone https://github.com/yourusername/substack_scraper.git
cd substack_scraper

# Optionally create a virtual environment
# python -m venv venv
# Activate the virtual environment
# .\venv\Scripts\activate  # Windows
# source venv/bin/activate  # Linux/macOS

pip install -r requirements.txt

For the premium scraper, update the config.py in the root directory with your Substack email and password:

EMAIL = "your-email@example.com"
PASSWORD = "your-password"

You'll also need Microsoft Edge installed for the Selenium webdriver.

Usage

Specify the Substack URL and the directory to save the posts to, either in the script or on the command line.

To use the Substack URL and number of posts hardcoded at the top of the file, run:

python substack_scraper.py

For free Substack sites:

python substack_scraper.py --url https://example.substack.com --directory /path/to/save/posts

For premium Substack sites:

python substack_scraper.py --url https://example.substack.com --directory /path/to/save/posts --premium

To scrape a specific number of posts:

python substack_scraper.py --url https://example.substack.com --directory /path/to/save/posts --number 5
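For readers curious how the flags above might be parsed, here is a minimal `argparse` sketch. The flag names (`--url`, `--directory`, `--premium`, `--number`) come from the commands above; the function name `build_parser`, the help text, and the defaults are assumptions and may differ from what substack_scraper.py actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Scrape a Substack into Markdown files.")
    parser.add_argument("--url", help="Base URL of the Substack to scrape")
    parser.add_argument("--directory", help="Directory to save the posts to")
    # store_true makes --premium a switch that takes no value.
    parser.add_argument("--premium", action="store_true",
                        help="Log in and scrape subscriber-only posts")
    # Assumption: 0 means "no limit" in this sketch.
    parser.add_argument("--number", type=int, default=0,
                        help="Number of posts to scrape")
    return parser


# Parse an explicit argument list, equivalent to the last command above
# without the --directory flag.
args = build_parser().parse_args(
    ["--url", "https://example.substack.com", "--number", "5"]
)
print(args.url, args.number, args.premium)
```

Leaving any flag unset lets the script fall back to its hardcoded values, which matches the two ways of running it described in this section.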

Viewing Markdown Files in Browser

To read the Markdown files in your browser, install the Markdown Viewer browser extension.

timf34/Substack2Markdown
