
Archive tagesschau, tagesthemen, and nachtmagazin & convert subtitles from EBU-TT-D or WEBVTT to SRT


alexmerkel/tsarchiver


tsarchiver

What is tsarchiver

tsarchiver is a script that archives tagesschau, tagesthemen, and nachtmagazin videos from the tagesschau.de website. Metadata and subtitles are added to the video files and are also stored in a SQLite database.

Usage:

$ tsarchiver.py ARCHIVEDIR

where ARCHIVEDIR is the directory in which to store the downloaded files. The script also looks for a SQLite database called archive.db inside this folder. If it can't find one, you will be asked to create one. The script then asks, for each show, for the page index at which to start archiving. The index is part of the video URL: for example, for https://www.tagesschau.de/multimedia/sendung/ts-34001.html the index is 34001.
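To illustrate how the page index relates to the URL, here is a minimal sketch that pulls the number out of a video page address. The helper name `extract_page_index` and the regular expression are assumptions made for this example; they are not taken from tsarchiver.py, which simply asks for the index interactively.

```python
import re

# Hypothetical helper, not part of tsarchiver itself.
# Assumes the index is the run of digits right before ".html".
_INDEX_PATTERN = re.compile(r"-(\d+)\.html$")


def extract_page_index(url):
    """Return the numeric page index from a tagesschau.de video page URL."""
    match = _INDEX_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no page index found in {url!r}")
    return int(match.group(1))


if __name__ == "__main__":
    url = "https://www.tagesschau.de/multimedia/sendung/ts-34001.html"
    print(extract_page_index(url))  # prints 34001
```

The value printed here is what you would enter when the script asks for the start index of that show.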

subconvert.py

subconvert.py can also be used on its own to convert subtitles from the EBU-TT-D or WEBVTT format to the SRT format. Usage:

$ subconvert.py SUBFILE

where SUBFILE is the subtitle file in the EBU-TT-D (.xml) or the WEBVTT (.vtt) format. The script also looks for a file called subignore.txt inside the script folder. Any subtitle line that contains a word or sentence listed in this file is left out of the output.
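The ignore rule and the SRT output can be sketched roughly as follows. This is not the code from subconvert.py: the function names are hypothetical, the cues are assumed to be already parsed into `(start_seconds, end_seconds, text)` tuples, and subignore.txt is assumed to hold one word or sentence per line.

```python
def load_ignore_phrases(text):
    """Split the contents of an ignore file into phrases (assumed: one per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_ignored(subtitle_text, phrases):
    """True if the subtitle text contains any of the ignore phrases."""
    return any(phrase in subtitle_text for phrase in phrases)


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues, phrases):
    """Build SRT text from (start, end, text) cues, skipping ignored ones."""
    blocks = []
    index = 1  # SRT cues are numbered from 1, counting only the kept ones
    for start, end, text in cues:
        if is_ignored(text, phrases):
            continue
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
        index += 1
    return "\n".join(blocks)


if __name__ == "__main__":
    phrases = load_ignore_phrases("Untertitel:\n")
    cues = [
        (0.0, 2.0, "Hallo"),
        (2.0, 4.0, "Untertitel: NDR"),
        (4.0, 5.5, "Welt"),
    ]
    print(to_srt(cues, phrases))
```

In this sketch the second cue is dropped because it contains the phrase "Untertitel:", and the remaining cues are renumbered so the SRT indices stay consecutive.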

Requirements

Python packages:

License

MIT
