Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formidable detects MIME-type according to the file extension and not by the real content #749

Open
pubmikeb opened this issue Jul 10, 2021 · 18 comments
Labels
Area: files Things related to handling files, names, etc. Priority: High After critical issues are fixed, these should be dealt with before any further issues. Status: Accepted It's clear what the subject of the issue is about, and what the resolution should be. Status: In Progress This issue is being worked on, and has someone assigned. Type: Enhancement Most issues will probably be for additions or changes. Expected that this will result in a PR.

Comments

@pubmikeb
Copy link

Support plan

  • which support plan is this issue covered by? (e.g. Community, Sponsor, or
    Enterprise): Community
  • is this issue currently blocking your project? (yes/no): no
  • is this issue affecting a production system? (yes/no): yes

Context

  • node version: 16.4.2
  • module (formidable) version: 3.0.0-canary.20210428
  • environment (e.g. node, browser, native, OS): Node.js
  • used with (i.e. popular names of modules):
  • any other relevant information:
    Formidable detects MIME-type according to the file extension and not by the real content. Which means that user can fake the file MIME by changing file's extension and as a result to upload to the server not allowed file types.

BTW, multer detects file's MIME not by the extension only.

What are you trying to achieve or the steps to reproduce?

  1. Given JPG file tst.jpg
  2. Rename it to tst.pdf
  3. Set a break point inside of uploader.parse(req, async (err, fields, files) => {…}
  4. Try to upload tst.pdf
uploader.parse(req, async (err, fields, files) => {
	if (err) {
		reject(err);
	} else {}
});

What was the result you got?

mimetype = application/pdf

11_010801

What result did you expect?

mimetype = image/jpeg
Since this file is actually JPG file but with a from extension.

@pubmikeb pubmikeb added the bug label Jul 10, 2021
@GrosSacASac
Copy link
Contributor

That is good point, and the readme could definitely have a small guide on how to check this.

I don't think we will ever bake in content-based MIME detection since there are thousands of file types (and more created each year) and each one needs special detection. For example to know if it is a jpg is very different algorithm than zip, pdf etc.

@GrosSacASac GrosSacASac added docs and removed bug labels Jul 10, 2021
@pubmikeb
Copy link
Author

since there are thousands of file types (and more created each year) and each one needs special detection

Sure, it's difficult to support all possible types. But a deep check for the very common types (e.g. office, PDF, media) could be a great feature with ability to extend it by custom plugins.

@GrosSacASac
Copy link
Contributor

multer detects file's MIME not by the extension only.

How does multer do it ?

@GrosSacASac
Copy link
Contributor

Sure, it's difficult to support all possible types. But a deep check for the very common types (e.g. office, PDF, media) could be a great feature with ability to extend it by custom plugins.

Common types is very subjective.

And we are trying to go to the opposite direction, make the lib smaller not bigger. For example #718 (comment)

Personally I think we should embrace the npm philosophy of having a lot of modules that do 1 thing and do it well.
1 lib that validates pdf and jpg etc.

Then everyone can decide to import or not, if he needs alongside formidable.

@pubmikeb
Copy link
Author

pubmikeb commented Jul 13, 2021

Personally I think we should embrace the npm philosophy of having a lot of modules that do 1 thing and do it well.
1 lib that validates pdf and jpg etc.

Then everyone can decide to import or not, if he needs alongside formidable.

Sure, ideally would be to make formidable-core for the minimalistic and generic as much as possible functionality and a set of plugins e.g. formidable-pdf, formidable-msoffice, etc. So every project could select the required set of supported types and every developer could enrich a set of supported file types by writing a plugin. Something similar to ESLint's set of rules.

@pubmikeb
Copy link
Author

And we are trying to go to the opposite direction, make the lib smaller not bigger. For example #718 (comment)

Great approach, zero dependencies and 85 KB package are much better then alternative solutions with 2-5 MB of dependencies.

@pubmikeb
Copy link
Author

pubmikeb commented Jul 13, 2021

How does multer do it ?

It looks like the file type is detected by checking the magic number of the buffer.

It worth using the file-type package:

import FileType from "file-type";

(async () => {
	console.log(await FileType.fromFile("Unicorn.png"));
	console.log(await FileType.fromBuffer(fileBuffer));
	console.log(await FileType.fromStream(fileStream));

	//Output: {ext: "png", mime: "image/png'"}
})();

@TheThing
Copy link

How does multer do it ?

It looks like the file type is detected by checking the magic number of the buffer.

It worth using the file-type package:

import FileType from "file-type";

(async () => {
	console.log(await FileType.fromFile("Unicorn.png"));
	console.log(await FileType.fromBuffer(fileBuffer));
	console.log(await FileType.fromStream(fileStream));

	//Output: {ext: "png", mime: "image/png'"}
})();

Except file-type adds not a single dependency but twelve. Some of which are redundant at this point:

And formidable v3 already has plugin support so if you really want file type detection magic, just make a plugin for that.
I just usually don't even bother. If people wanna upload jpeg as pdf, that's their fault.

@pubmikeb
Copy link
Author

If people wanna upload jpeg as pdf, that's their fault.

OK, since such functionality is not must have and it takes to implement about 10 minutes, this issue can be closed.

@tunnckoCore
Copy link
Member

tunnckoCore commented Jul 16, 2021

Leaning more and more towards a separate package like utils or plugins. where we can include such helpers for common things, just like we started with separate examples in the examples dir.

There's never ending discussion between small and big packages. And I'm still bouncing between the two. Okay, small, almost no dependencies package.. but end users anyway will include a lot of deps to do what their needs are.. so.. that's why i don't see big problem of including things like some of the two qs.. or other basic things like correct basic mime handling, here in the core package. Convenience and usability, over impracticality.
Small packages / unix philosophy starts to break when you realize that you really need few things to have some meaningful and functional final product, with common features. I'm not saying I'm okey with the mentioned problem how much v2 started to grow.

I'm for staying thin and this to remain in userland with some guides, but it also might be a good thing to include some basic detection for most common types, and not something huge and very generic like file-type.

@pubmikeb
Copy link
Author

pubmikeb commented Jul 16, 2021

I think there is no right or wrong answer here.
I would take a decision in a more sophisticated way, based not solely on amount of dependencies, but on a combination of the following parameters:

  • How wanted/popular a feature, a dependency brings, is
    If it is requested by many or is a common and implemented in many scenarios, so why not to include it out-of-box?
  • Popularity of the dependency
    It should have at least 1M weekly downloads.
  • Maintenance status of the dependency
    It is actively maintained and the last release has been published at most 12M ago.
  • Dependency size (incl. all its sub-dependencies)
    Don't see any issue to include a dependency with a total (incl. all its sub-dependencies) size up to 1MB if it meets at least 3 of 4 requirements in this list.

@tunnckoCore, should I reopen this issue?

@tunnckoCore
Copy link
Member

Agree.

should I reopen this issue?

I don't know. I'm not very active last half year or more. Maybe we can open it if we decide to implement some basic detection.

@pubmikeb pubmikeb reopened this Jul 16, 2021
@GrosSacASac
Copy link
Contributor

If nobody wants to work on this we might as well close the issue

@tunnckoCore
Copy link
Member

tunnckoCore commented Oct 30, 2021

I'll work on it in some time.

Made some research:

  • file-type is too much and complex.
  • mimer is kind of okay, almost, except that instead of having dependency has db inside of it, which is huge size too.
  • magic-bytes.js is 192kb but its code architecture seems good and could be stripped down and replicated
    • i can suggest some updates to him.
    • it adds all the types here (12kb) like add('jpg', magicnumbers); add('pdf', pdfMagicNumbers);
    • it also adds the tests in the npm, so..
    • otherwise magic-bytes is free package name btw 😆

Can make @formidable/helper-mimetypes or @formidable/mimetype or formidable-mimetype, plugin or helper that uses magic-bytes or magic-bytes.js

@GrosSacASac
Copy link
Contributor

https://github.com/vader-sama/typective similar to magic-bytes but works with streams

Also remember that a valid number does not prove anything at the end, I remember it was possible to make valid jpeg files that were also valid php files, which was one way to hack a server that looked like it did everything correct.

@tunnckoCore
Copy link
Member

I remember it was possible to make valid jpeg files that were also valid php files, which was one way to hack a server that looked like it did everything correct.

Oh yea.. 😆 I remember that too.

typective

yea, seems good.

@tunnckoCore tunnckoCore added Priority: High After critical issues are fixed, these should be dealt with before any further issues. Status: Accepted It's clear what the subject of the issue is about, and what the resolution should be. Status: In Progress This issue is being worked on, and has someone assigned. Type: Enhancement Most issues will probably be for additions or changes. Expected that this will result in a PR. Area: files Things related to handling files, names, etc. and removed question labels Apr 2, 2022
@tunnckoCore
Copy link
Member

Another one: https://github.com/lukeed/mrmime

@Bessonov
Copy link

Bessonov commented Dec 6, 2022

I'm happy user of mmmagic. It was chosen ~6 years ago for execution inside aws lambda because of speed. But for formidable I would suggest to make it plugin-able. I mean not a specific plugin, but just a parameter like:

const form = formidable({
  getMimeType(file: Buffer): string {
    // custom logic
  }
})

Or plugin which allows custom logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: files Things related to handling files, names, etc. Priority: High After critical issues are fixed, these should be dealt with before any further issues. Status: Accepted It's clear what the subject of the issue is about, and what the resolution should be. Status: In Progress This issue is being worked on, and has someone assigned. Type: Enhancement Most issues will probably be for additions or changes. Expected that this will result in a PR.
Projects
None yet
Development

No branches or pull requests

5 participants