Check the header “Content-Disposition” to determine the filename to store #1118
Comments
+1 |
aria2 supports the Content-Disposition header field, but the value the server sent is syntactically incorrect. The trailing ";" must not be there. The server is broken. |
A simple workaround for this:
diff --git a/src/util.cc b/src/util.cc
index be73a6e4..ffa4236b 100644
--- a/src/util.cc
+++ b/src/util.cc
@@ -1555,6 +1555,12 @@ ssize_t parse_content_disposition(char* dest, size_t destlen,
case CD_AFTER_VALUE:
case CD_TOKEN:
return destlen - dlen;
+ case CD_BEFORE_DISPOSITION_PARM_NAME:
+ if ((flags & CD_FILENAME_FOUND) == 0 &&
+ (flags & CD_EXT_FILENAME_FOUND) == 0) {
+ return -1;
+ }
+ return destlen - dlen;
case CD_VALUE_CHARS:
if (charset == CD_ENC_UTF8 && dfa_state != UTF8_ACCEPT) {
return -1; |
Is the workaround ok? |
Well, whether the server is broken or not, this is something that happens in the wild and should be handled. wget recognizes all Content-Disposition replies pretty well, but aria2 fails many times. |
_ttps://drscdn.500px.org/photo/1022280281/q%3D80_m%3D1000/v2?sig=3234f9ae309e3e6a7a31f95ace6754e45c5171a8be579de3ae30a464f239beae
content-disposition: filename=stock-photo-1022280281.jpg
aria2c v1.35 doesn't use the remote name when saving the file; curl does. |
@eddiezato @tats-u @tatsuhiro-t @ahmedtds @Baneeishaque @kwkam |
@tatsuhiro-t One of the examples in RFC 2183 is:
This shows that trailing semicolons are officially allowed. It is unfortunate that the above links are dead, though. |
I can't download models from https://huggingface.co due to this bug 😞 |
you can just wget directly from huggingface. change the /blob/ into /resolve/ |
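For anyone scripting that rewrite, here is a minimal sketch of the /blob/ to /resolve/ substitution described above. The function name `blob_to_resolve` and the repository path used in the usage note are made up for illustration; the only thing taken from this thread is the idea of swapping the path segment.

```cpp
#include <cassert>
#include <string>

// Replace the first "/blob/" path segment with "/resolve/".
// Returns the URL unchanged when no "/blob/" segment is present.
std::string blob_to_resolve(std::string url) {
  const std::string from = "/blob/";
  const std::string to = "/resolve/";
  const std::string::size_type pos = url.find(from);
  if (pos != std::string::npos) {
    url.replace(pos, from.size(), to);
  }
  return url;
}
```

Called with a hypothetical URL such as https://huggingface.co/some-org/some-model/blob/main/model.bin, it returns the same URL with /resolve/ in place of /blob/.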
Surely I can download using wget or curl, but I want to use aria2c for multipart download. |
There's no reason to reject a response from a server just because of its trivial and easily-recoverable syntax "error". |
@popugasik said:
It says quite the opposite, actually. The grammar is defined as:
Specifically, the
we see that Absolutely nothing prohibits having a trailing semicolon, it's explicitly allowed. In fact you could have a hundred of them and still be compliant.
Even if RFC6266 did forbid a trailing semicolon - which it doesn't - rejecting the header makes aria2 functionally unusable on a huge percentage of websites, and is a horrible user experience.
HuggingFace is serving these files from AWS S3 via AWS CloudFront. CloudFront/S3 are responsible for these headers. A significant portion of websites serve large files from S3 via CloudFront. Rejecting this RFC-spec-compliant header breaks all downloads from these sites. It also (at last check) breaks downloads from recent nginx versions using the default out-of-the-box header configuration.
Yet this issue has been open for over five years and aria2 is still getting it wrong...
n.b. on top of the above, section 4.3 of RFC 6266 states:
I'd argue semicolons fall into that category, though that's probably beside the point. |
Aria2 seems to be kind of dead anyway, is there an active aria2 fork out there that doesn't do this kind of nonsense? If not, maybe it's time to fork it now. |
nobody willing to start an active fork and fix issues like this? |
@tatsuhiro-t are you still refusing to fix this? Do you still think curl is wrong, wget is wrong, python is wrong, Amazon is wrong, nginx and several of the other most popular web servers are wrong, and only you are correct? |
@tatsuhiro-t while you're active nowadays, can you finally solve this issue before a new version? |
Agreed. Even if it is wrong, others support it as best they can. |
It translates to zero or more instances of
Anyway, I agree aria2 should handle this situation. Maybe something like this at the beginning of
if (len > 0 && in[len - 1] == ';') {
  // Drop a single trailing ';' so the rest of the parser sees a well-formed value.
  A2_LOG_NOTICE(fmt("Content-Disposition header field (RFC1806) minor corruption (trailing ';') - fixing"));
  --len;
}
Output:
|
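To make the idea above concrete outside of aria2, here is a small standalone sketch of the same pre-trimming step. It is not aria2 code: `strip_trailing_semicolons` is a hypothetical helper working on a `std::string` rather than on aria2's `in`/`len` pair, and it removes any run of trailing semicolons and blanks, which is a slightly more generous choice than dropping exactly one ';'.

```cpp
#include <cassert>
#include <string>

// Return the header value with trailing ';', space and tab characters removed.
// A ';' inside a properly closed quoted-string is never touched, because the
// value then ends with the closing '"' and the loop stops there.
std::string strip_trailing_semicolons(std::string value) {
  while (!value.empty() &&
         (value.back() == ';' || value.back() == ' ' || value.back() == '\t')) {
    value.pop_back();
  }
  return value;
}
```

The trimmed value could then be handed to a strict parser unchanged. Whether such a step should log anything is a separate question.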
I don't know how to compile it but I'm not the only one who needs it, as you can see it affects a lot of people, so a fix in the release would be better. |
I don't think it needs to be logged. It's not something the user needs/wants to know and considering all the other major software and web server behaviour, it's most likely not right to call it a "corruption that was fixed" |
@fenopa, the situation is similar with 7-Zip. You can find discussions there. Basically, in many cases, Igor refused to open/handle files that don't follow the specification. A case similar to aria2's was the tar files generated by the WordPress plugin BackWPup: GNU tar (and other programs) silently ignored the error, while Igor decided to still report the error but also unpack all files ( link ). Here is my workflow script file. And here is the github-actions release. |
Maintainers did not explicitly close this issue or state that they would not accept a pull request, so I suppose anyone could send a PR. |
@mgrinzPlayer thanks a lot for that build. |
@reschke can you put an end to this? who is right? |
The grammar does not allow trailing semicolons. But that doesn't necessarily mean it would be a problem to allow them. In any case, I would report site bugs when I'd encounter those. |
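As a rough illustration of what "allowing them" could look like, here is a sketch of a lenient filename extractor. It is not aria2's parser and not a full RFC 6266 implementation: all function names are hypothetical, it only looks at the plain `filename` parameter (no `filename*`, no charset handling), and it skips empty segments, so a trailing ";" is tolerated.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Trim ASCII spaces and tabs from both ends.
std::string trim(const std::string& s) {
  const char* ws = " \t";
  const std::string::size_type b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  const std::string::size_type e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// ASCII lower-casing, used for case-insensitive parameter names.
std::string to_lower(std::string s) {
  for (char& c : s) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return s;
}

// Split on ';' that is not inside a quoted-string. Inside quotes, a backslash
// escapes the next character, so an escaped '"' does not end the string.
std::vector<std::string> split_segments(const std::string& value) {
  std::vector<std::string> out;
  std::string cur;
  bool quoted = false;
  for (std::string::size_type i = 0; i < value.size(); ++i) {
    const char c = value[i];
    if (quoted && c == '\\' && i + 1 < value.size()) {
      cur += c;
      cur += value[++i];
    } else if (c == '"') {
      quoted = !quoted;
      cur += c;
    } else if (c == ';' && !quoted) {
      out.push_back(cur);
      cur.clear();
    } else {
      cur += c;
    }
  }
  out.push_back(cur);
  return out;
}

// Remove surrounding double quotes and resolve backslash escapes.
std::string unquote(const std::string& s) {
  if (s.size() < 2 || s.front() != '"' || s.back() != '"') return s;
  std::string out;
  for (std::string::size_type i = 1; i + 1 < s.size(); ++i) {
    if (s[i] == '\\' && i + 2 < s.size()) ++i;
    out += s[i];
  }
  return out;
}

// Return the plain "filename" parameter, or "" if there is none.
std::string lenient_filename(const std::string& header_value) {
  for (const std::string& raw : split_segments(header_value)) {
    const std::string seg = trim(raw);
    if (seg.empty()) continue;  // tolerate "; ;" and a trailing ';'
    const std::string::size_type eq = seg.find('=');
    if (eq == std::string::npos) continue;  // disposition type, e.g. "attachment"
    if (to_lower(trim(seg.substr(0, eq))) == "filename") {
      return unquote(trim(seg.substr(eq + 1)));
    }
  }
  return "";
}
```

With the header from the original report, `lenient_filename` yields opensmile-2.3.0.tar.gz, and it also accepts the 500px-style value that has no disposition type at all. Being this forgiving is a design choice, and a stricter client could still warn or report the site.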
You and @mgrinzPlayer are right, I misread the grammar (think my annoyed-at-the-time brain inserted a
W.R.T. 7-Zip, Igor's persistent refusal to accept the reality of formats that people are using and want to use has caused just as much strife (if not more) and no doubt had a negative effect on uptake/usage of 7-Zip. It's also not entirely comparable, since in the 7-Zip case we're talking about new, relatively-unknown formats that are not supported by much else in the way of software save for the existing 7-Zip forks. In this case, by contrast, we're talking about something that every other library/client comparable to aria2 handles gracefully, and that's widely deployed and accepted across the internet as a whole.
|
If cloudfront doesn't follow simple |
same for me... so frustrated I use this now: https://github.com/Zibri/HFdl |
For example, the link http://audeering.com/download/1318/ (linked from http://audeering.com/technology/opensmile/#download) provides a Gzip-compressed tarball, but aria2 stored the tarball under the name index.html when run with the command
aria2c -x4 http://audeering.com/download/1318/
This link returns the header
Content-Disposition: attachment; filename="opensmile-2.3.0.tar.gz";
so aria2 must interpret it and store the tarball under the name opensmile-2.3.0.tar.gz.