New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[post-processing] Refactor MoveFilesPP to respect non-video files #9774
base: master
Are you sure you want to change the base?
Conversation
When I created this PP originally, the I believe the path to move to can be obtained by running |
That's a really good shout! I had tried something related to Looking deeper, I see that gets handled elsewhere thanks to |
self.assertTrue(os.path.exists(audiofile), '%s doesn\'t exist' % audiofile) | ||
os.unlink(filename) | ||
os.unlink(audiofile) | ||
run_pp({'keepvideo': True, 'outtmpl': filename}, SimplePP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling this out specifically - these changes necessitated an outtmpl
(or setting title
and id
for the default outtmpl
to use). I don't think this is an issue, but I wanted to confirm!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To expand, the reason is that the old implementation didn't concern itself with outtmpls at all - it just got a list of source/destination filenames and it moved files based on that.
The new implementation uses prepare_filename
which allows it to keep source/destination logic contained entirely within the MoveFilesPP
module, but the side effect is that it needs to be able to generate a valid outtmpl
somehow
@pukkandan Thanks for the tip - that ended up working out! I've updated the description and this is ready for re-review with a few notes:
rm -f output.json && \
./yt-dlp.sh https://www.youtube.com/watch\?v\=Bx1AiQdMQro \
--write-thumbnail \
--convert-thumbnail jpg \
--write-description \
--write-subs \
--write-auto-subs \
--sub-langs en,de \
--convert-subs srt \
--output "%(title)s.%(ext)s" \
--output "thumbnail:%(title)s-thumb.%(ext)s" \
--output "subtitle:%(title)s-sub.%(ext)s" \
--output "description:%(title)s-desc.%(ext)s" \
--print-to-file "after_move:%()j" output.json and I'd add in these as-needed: --write-all-thumbnails
--paths "temp: <TODO: tmp directory here>"
--no-overwrites |
ext = ext if ext.startswith('.') else '.' + ext | ||
|
||
return '{}{}'.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For viz. I've updated this method to allow you to pass an ext
with or without a leading .
.
This doesn't impact existing usage and makes it easier to pipe together existing commands like:
_, ext = os.path.splitext('foo/bar/file.mp4')
print(ext) # => .mp4
replace_extension(filename, ext)
@pukkandan (not trying to rush this review, just letting you know where I'm at!) I've been playing around with the followup PR(s) I want to make and I know the general approach I want to take for ensuring filepath-related data is accurate for all file types. I've also determined that those changes are somewhat dependent on this change so I'll wait until you're happy with this + it's merged before I put those up 🤙 |
Sorry, but proper review will take some time. From a quick look, here's some things:
Otherwise the impl looks more or less what I had in mind 👍 |
No rush at all! Thank you for the feedback - I'll review and make those changes 👍
Woof. I'm not sure how to solve that one in all cases, but I'll think about it. If I can't come up with anything maybe I'll add some one-off logic to handle the numeric indexed extensions from |
I also think we'll have to resort to that. Note that a similar issue exists for subtitles
|
I don't think so but I'll check again! The new code shouldn't touch anything that's written directly to its final filepath like the More generally, this code should only concern itself with things that were previously stored under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pukkandan (again, no rush! I get it). I've made the requested changes, including the issue with filepaths not being deleted even if the file was being embedded. I've made that change for thumbnails, subs, and infojson but please let me know if there is a filetype I'm missing!
Thanks for the guidance 🙌
self.assertTrue(os.path.exists(audiofile), '%s doesn\'t exist' % audiofile) | ||
os.unlink(filename) | ||
os.unlink(audiofile) | ||
run_pp({'keepvideo': True, 'outtmpl': filename}, SimplePP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To expand, the reason is that the old implementation didn't concern itself with outtmpls at all - it just got a list of source/destination filenames and it moved files based on that.
The new implementation uses prepare_filename
which allows it to keep source/destination logic contained entirely within the MoveFilesPP
module, but the side effect is that it needs to be able to generate a valid outtmpl
somehow
for f in files_to_delete: | ||
infodict['__files_to_move'].setdefault(f, '') | ||
else: | ||
if not self.params.get('keepvideo', False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could I get eyes on this to confirm that this logic inversion works as expected now that I've deleted the files_to_move
related stuff?
if files_to_move is not None: | ||
self.report_warning('[pre_process] "files_to_move" is deprecated and may be removed in a future version') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For viz: wanted to make sure this messaging was on-brand for deprecating a parameter rather than an entire method
@@ -4339,6 +4330,7 @@ def _write_thumbnails(self, label, info_dict, filename, thumb_filename_base=None | |||
self.to_screen(f'[info] There are no {label} thumbnails to download') | |||
return ret | |||
multiple = write_all and len(thumbnails) > 1 | |||
info_dict['__multiple_thumbnails'] = multiple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how I'm determining whether thumbnails need to use an indexed file "extension". I could use get_param('write_all_thumbnails')
and similar in the MoveFilesPP
, but I reckon doing it this way lessens the risk of logic here getting out of sync with logic in MoveFilesPP
if this is refactored/modified in the future
@@ -224,4 +224,8 @@ def run(self, info): | |||
thumbnail_filename if converted or not self._already_have_thumbnail else None, | |||
original_thumbnail if converted and not self._already_have_thumbnail else None, | |||
info=info) | |||
|
|||
if not self._already_have_thumbnail: | |||
info['thumbnails'][idx].pop('filepath', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using the .pop(name, None)
pattern a few places to prevent raising an error if a key doesn't exist. While it shouldn't be strictly necessary based on my testing, there is so much I don't know about yt-dlp that I figured this was the safer bet
@pukkandan Again, no rush but is there anything I can do here to make it easier to review and test? Let me know if I can help! |
Description of your pull request and other information
Related to #9445 along with this comment #9445 (comment)
Issue:
The
MoveFilesAfterDownloadPP
pre-processor moves all files as expected, but it only updates the filepath for the main media file in theinfo
dict. This means that file moves related to thumbnails and subtitles do take place, but the filepath for these entries in the info map isn't updated.Example:
After running the above command, you'll see that the thumbnail is correctly appended with
-thumb
on the filesystem, but thethumbnails
section inoutput.json
references the original non-thumb
path.Fix:
My approach roughly looks like this:
files_to_move
- that is no longer relevantMoveFilesAfterDownloadPP
has been updated to determine the final filename usingprepare_filename
--write-all-thumbnail
's indexed filenames work as expectedfilepath
entries from the info dict if a file was downloaded as an embed but is deleted after processingImportant to note that this PR only concerns itself with files that are moved as a standalone step (requested_subtitles, thumbnails, main media file). Files that are written straight to their final location (description, etc.) are not processed by
MoveFilesAfterDownloadPP
. This is the same behaviour as before but I wanted to call it outI'll be the first to say that I'm NOT a Python guy, so please let me know if there are better ways to accomplish things.
Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:
What is the purpose of your pull request?