when I listen the podcast think fast talk smart on the spotify, I found it is so interesting and useful material for me to learn English.(exactly, you could find my poor English through the text (/ω\), I really want to lean English well.)
But I think that I really need the subscript of podcast TFTS, so, why don't make a program to help me to do that?
Come on, let's do it!
the steps :
- crawl the original subscript. (crawl)
- format as a LaTex file. (LaTex template)
- transfer to pdf file. (call the compile command)
- sent to me. (email)
I choose the python's module "urllib3" to get the webpage and "BeautifulSoup4" to parser it.
the TFTS website stored subscripts of all episodes. And all we need to do , is let python to crap the subscript of the newest episode( or other episode is ok).
When I scrapy the text I want, the next thing is to format it to a standered .tex file, and I use the template as fallowing:
\\documentclass{article}
\\usepackage[UTF8]{ctex}
\\usepackage{hyperref}
\\linespread{1.3}
\\setlength{\parskip}{0.5em}
\\setlength{\parindent}{0em}
\\begin{document}
\\section*{title}
summary
\\begin{flushright}
\\textit{infor}
\\end{flushright}
content
\\end{document}
So, I just need to format the "title",”summary“,”infor“ and "content".
However, at least for me, the difficult is the "content" part, and I choose use re.sub
to replace the <em>
, <stong>
and <a href="">
tag with \textsl{}
, \textbf{}
and \hyref{}{}
.
Normally, compared with LaTex IDE (like TexStudio ), Commands to compile .tex file looks more difficult and complex.
Thank God ! we don't need too much deep technique, just a command line xelatex script.tex
is OK.
What? You even didn't install the LaTex yet! oh my dear~ so, install and config LaTex , take it, no thanks.
python standerd module to create email and sent email is email and smtpib
scrapy the subscript.ipynb
: the program finish the first three steps, but .ipynb file.
send_mail.py
: the program finish the last steps.
subscript.tex
: the generated .tex file.
subscript.pdf
: the complied .pdf file.
info.json
: some information about the episode we scrapied, which used to help send email better.
account.json
: Emm.... Not surprisingly, you couldn't find this file, if you do, email me soon, Otherwise there will be a terrible thing to happen.(just a cold joke, the file is my contains my email password, used to log in the email server)
main.py
: mian .py file, which joint scrapy the subscript.ipynb
and send_mail.py
, you can only use commond ipython main.py
to finish the goal.