Live Stream transcripts #98

Open
frisch1 opened this issue Feb 24, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@frisch1

frisch1 commented Feb 24, 2021

Hello. This is a feature request rather than a bug, methinks.

Have you looked at extracting captions from a live stream? If you look at any live stream (for example https://www.youtube.com/whitehouse) while it is actually live (that part is key), auto-generated subtitles are delivered embedded in a videoplayback file that streams in, e.g.:

https://r6---sn-8xgp1vo-p5qy.googlevideo.com/videoplayback?expire=1614211486&ei=PpU2YM-ULYm98wTm0L_gDA&ip=71.246.232.10&id=yhxmnlGtJ-g.1&itag=386&source=yt_live_broadcast&requiressl=yes&mh=zc&mm=44,29&mn=sn-8xgp1vo-p5qy,sn-p5qs7nel&ms=lva,rdu&mv=m&mvi=6&pl=18&initcwndbps=1717500&vprv=1&live=1&hang=1&noclen=1&xtags=lang=en:ttkind=asr&mime=text/mp4&ns=aD6U7aY6idhNPyXEqiXu6K0F&gir=yes&mt=1614189620&fvip=6&keepalive=yes&fexp=23983797&beids=9466586&c=WEB&n=lmOMV3MuzrpzRQ&sparams=expire,ei,ip,id,itag,source,requiressl,vprv,live,hang,noclen,xtags,mime,ns,gir&sig=AOq0QJ8wRAIgd0qHHqBF3aRir-pw93UKhFNuFxrlpe6OqyMerxsZ4JsCIHZK74UbKX7ig08-egt6vMDzP6g_7EhOyuOOoUXAkSVW&lsparams=mh,mm,mn,ms,mv,mvi,pl,initcwndbps&lsig=AG3C_xAwRAIgHa9tABbFKMiVQSnLLWa7iO_iu7pcVtrea43G-zdfGBUCIGbqOL15uN0-32Yki8s5vwXD2XDkvCBUgntS54w9xvjc&alr=yes&cpn=LW2TAYe5jfbjzMjx&cver=2.20210223.09.00&sq=664

That URL has expired, of course, but as an example, the payload it returned is:

�ftyp��moovlmvhd�_�����@�(mvex trex���}trak\tkhd���@��mdia mdhd�_�UÄ!hdlrtextÐminf$dinf�dref�url �˜stblHstsd�8tx3g�
ftab�stts�stsc�stco�stsz��Vnmhd��emsghttp://youtube.com/streaming/metadata/segment/102015�ˆ°D«Sequence-Number: 664
Stream-Finished: F
Ingestion-Walltime-Us: 1614189870022158
Stream-Duration-Us: 3320017000
Max-Dvr-Duration-Us: 14400000000
Target-Duration-Us: 5000000
Encoding-Alias: L1_Ag

Xmoof�mfhd@traf�tfhd���ß’�tfdt�ÏYz�trun��`�^mdat<?xml version="1.0" encoding="utf-8" ?><timedtext format="3">
<body>
<p t="0" d="345">what&#39;s in the Declassified
report or when it comes out</p>
<p t="345" d="3750">because many elements of Italy
two years ago when when it was</p>
<p t="4095" d="910">first first came out if you come
to the conclusion that there</p>
</body>
</timedtext>

The timedtext is embedded in the file:

<?xml version="1.0" encoding="utf-8" ?><timedtext format="3">
<body>
<p t="0" d="345">what&#39;s in the Declassified
report or when it comes out</p>
<p t="345" d="3750">because many elements of Italy
two years ago when when it was</p>
<p t="4095" d="910">first first came out if you come
to the conclusion that there</p>
</body>
</timedtext>
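
A minimal sketch of pulling that XML back out of a raw segment, assuming the response body is already in hand as bytes. It only searches for the `<timedtext>` element and does not parse the surrounding MP4 boxes; `extract_timedtext` is a hypothetical helper name, and reading `t` and `d` as milliseconds is a guess based on the sample above.

```python
import re
import xml.etree.ElementTree as ET


def extract_timedtext(segment: bytes):
    """Return (t, d, text) tuples for the first <timedtext> document in a segment."""
    # Skip the binary MP4 boxes and grab only the embedded XML element.
    match = re.search(rb"<timedtext\b.*?</timedtext>", segment, re.DOTALL)
    if match is None:
        return []
    root = ET.fromstring(match.group(0).decode("utf-8", errors="replace"))
    cues = []
    for p in root.iter("p"):
        # Collapse the line break inside each cue into a single space.
        text = " ".join("".join(p.itertext()).split())
        cues.append((int(p.get("t", "0")), int(p.get("d", "0")), text))
    return cues


# Stand-in for a real segment: a few binary bytes followed by the XML from the dump.
sample = (
    b"\x00\x00\x00\x08mdat"
    b'<?xml version="1.0" encoding="utf-8" ?><timedtext format="3">\n'
    b"<body>\n"
    b'<p t="0" d="345">what&#39;s in the Declassified\nreport or when it comes out</p>\n'
    b'<p t="345" d="3750">because many elements of Italy\ntwo years ago when when it was</p>\n'
    b"</body>\n</timedtext>\n"
)
cues = extract_timedtext(sample)
for t, d, text in cues:
    print(t, d, text)
```

Searching for the element rather than walking the `moof`/`mdat` boxes keeps the sketch short, at the cost of breaking if a segment ever carries more than one document.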

It's not TTMLv3, but we can tell from the URL (sq=664) that this text belongs to sequence #664. The "t" attribute appears to be a millisecond offset relative to the start of the sequence chunk, and "d" appears to be the duration. But even without the timing, the stream of text is there. Note that it doesn't appear by default. It appears that "xtags" has to be listed in the URL's "sparams" to get the live captioning, but if you try to insert it yourself, it messes up the hash/key associated with the URL, so captions need to be triggered on some other way (cc_load_policy=1 in the URL does NOT seem to work).
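
To make the URL observations concrete, here is a rough sketch that pulls the relevant query parameters out of a videoplayback URL using only the standard library. The URL is the one above, trimmed to the parameters discussed; `describe_playback_url` is a hypothetical helper, and treating `sparams` as the list of parameters covered by the signature is an assumption, not something confirmed in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def describe_playback_url(url: str) -> dict:
    """Summarise the caption-related query parameters of a videoplayback URL."""
    query = parse_qs(urlsplit(url).query)

    def first(key):
        return query.get(key, [None])[0]

    # Assumption: sparams names the parameters covered by the URL signature.
    signed = (first("sparams") or "").split(",")
    xtags = first("xtags") or ""
    # xtags looks like "lang=en:ttkind=asr": colon-separated key=value pairs.
    tags = dict(part.split("=", 1) for part in xtags.split(":") if "=" in part)
    sq = first("sq")
    return {
        "sequence": int(sq) if sq is not None else None,
        "mime": first("mime"),
        "xtags": tags,
        "xtags_is_signed": "xtags" in signed,
    }


url = (
    "https://r6---sn-8xgp1vo-p5qy.googlevideo.com/videoplayback"
    "?id=yhxmnlGtJ-g.1&itag=386&xtags=lang=en:ttkind=asr&mime=text/mp4"
    "&sparams=expire,ei,ip,id,itag,source,requiressl,vprv,live,hang,noclen,"
    "xtags,mime,ns,gir&sq=664"
)
info = describe_playback_url(url)
print(info)
```

If `xtags` really is part of the signed set, that would be consistent with the observation that adding it by hand invalidates the URL.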

youtube-dl et al. don't recognize this, since it's not delivered as a standalone subtitle file. They act as if the live stream has no subtitles, because the response doesn't identify itself as a subtitles file.

Thoughts?

@jdepoix
Owner

jdepoix commented Feb 25, 2021

Hi @frisch1, I would definitely say that this is a feature request and not a bug. Sounds interesting, but I don't see myself implementing this anytime soon, as this module is mostly used for data-science purposes and I don't really see the use-case for livestreams. However, if you want to contribute this feature I'd be happy to merge it. Deserializing the response probably isn't a big deal, you just gotta find out how to scrape the URL you'll have to call to actually get that response. Let me know if you have that figured out and are interested in contributing it, so we can have a chat on how to implement this into the current API 😊
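
As a rough idea of what deserializing the rest of the response could look like, the sketch below reads the plain-text metadata headers from the dump above and uses them to place cues on the stream timeline. The header names are copied from that dump; the function names are hypothetical, and using `Stream-Duration-Us` as the segment's start offset is an assumption (in the dump, sequence 664 times the 5 s target duration is about 3320 s, which roughly matches it).

```python
import re

# Header names copied from the dump in this issue; other segments may carry more.
KNOWN_KEYS = (
    "Sequence-Number",
    "Stream-Finished",
    "Ingestion-Walltime-Us",
    "Stream-Duration-Us",
    "Max-Dvr-Duration-Us",
    "Target-Duration-Us",
    "Encoding-Alias",
)
HEADER_RE = re.compile(
    rb"(" + b"|".join(re.escape(k.encode("ascii")) for k in KNOWN_KEYS) + rb"): ([^\r\n]*)"
)


def parse_segment_headers(segment: bytes) -> dict:
    """Collect the 'Key: value' metadata lines embedded in a segment."""
    return {
        key.decode("ascii"): value.decode("ascii", "replace").strip()
        for key, value in HEADER_RE.findall(segment)
    }


def place_on_timeline(headers: dict, cues):
    """Shift chunk-relative cue offsets (ms) by the assumed segment start."""
    # Assumption: Stream-Duration-Us is the stream position where this segment begins.
    base_ms = int(headers["Stream-Duration-Us"]) // 1000
    return [(base_ms + t, d, text) for t, d, text in cues]


sample = (
    b"emsg\x00http://youtube.com/streaming/metadata/segment/102015\x00\x00"
    b"Sequence-Number: 664\n"
    b"Stream-Finished: F\n"
    b"Ingestion-Walltime-Us: 1614189870022158\n"
    b"Stream-Duration-Us: 3320017000\n"
    b"Max-Dvr-Duration-Us: 14400000000\n"
    b"Target-Duration-Us: 5000000\n"
    b"Encoding-Alias: L1_Ag\n\n"
)
headers = parse_segment_headers(sample)
placed = place_on_timeline(headers, [(0, 345, "first cue"), (345, 3750, "second cue")])
print(headers["Sequence-Number"], placed)
```

Matching only known header names avoids picking up stray bytes from the binary boxes that sit directly in front of the first header.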

jdepoix added the enhancement label Feb 25, 2021