
Improve file size estimate #205

Merged
merged 4 commits into from
Oct 24, 2020

Conversation

sindresorhus
Copy link
Owner

@sindresorhus sindresorhus commented Oct 15, 2020

I tried out many different algorithms and variants, and in the end, found that this one produced the most accurate estimate:

  • Get all the normal frame time codes.
  • Pick 5 consecutive samples from 5 evenly distributed places in the video.
  • Convert with current settings.
  • Scale the result by the ratio of the original frame count to the number of frames used in the estimate.
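The steps above can be sketched roughly like this (a Python sketch for illustration only; Gifski is written in Swift, and `convert_sample` is a hypothetical callback standing in for the real converter):

```python
def estimate_file_size(frame_timecodes, convert_sample,
                       sample_count=5, samples_per_spot=5):
    """Estimate output size by converting short runs of consecutive
    frames taken from evenly distributed spots in the video.

    `convert_sample` converts a list of frame time codes with the
    current settings and returns the resulting size in bytes.
    """
    total = len(frame_timecodes)
    sampled = []
    for i in range(sample_count):
        # Evenly distributed starting points across the video.
        start = (i * total) // sample_count
        # Take a few consecutive frames from each spot, so inter-frame
        # compression behaves similarly to the real conversion.
        sampled.extend(frame_timecodes[start:start + samples_per_spot])
    sample_size = convert_sample(sampled)
    # Scale by the ratio of total frames to sampled frames.
    return sample_size * total / len(sampled)
```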

Fixes #41


Here is a build: Gifski - with estimate.zip

Please try it out on various videos and check the estimate against the final result. The most important part is that the estimate never shows less than the actual file size.

Note: The code and look are not done. It should be cleaned up a lot, but I would like to finalize the algorithm first.

Some potential optimization would be to run the 5 samples concurrently, but I don't plan to do that in this PR. I'll add a TODO comment.


Open questions:

  • Should we still show the naive estimate (the one we currently use) while the good estimate is being generated? Currently, the estimate takes 5–15 seconds. The downside of showing the naive one is that it can sometimes be woefully incorrect.

@kornelski
Copy link
Collaborator

Looks good.

I see you're adding 10% just in case. I have an idea to make it more scientific: measure frame sizes in bytes, and compute standard deviation of frame sizes. If the deviation is large, then the estimate is uncertain and should be inflated. If all frame sizes are about the same, then the estimate is likely to be accurate.
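The standard-deviation idea could look something like this (a hypothetical sketch, not code from this PR; the 10% base margin matches the flat fudge factor mentioned above):

```python
import statistics

def estimate_with_uncertainty(sample_frame_sizes, total_frames,
                              base_margin=0.10):
    """Estimate total size from sampled per-frame sizes (bytes),
    inflating the estimate more when frame sizes vary a lot."""
    mean = statistics.mean(sample_frame_sizes)
    stdev = statistics.pstdev(sample_frame_sizes)
    # Coefficient of variation: relative spread of frame sizes.
    # Zero when all frames compress to the same size.
    cv = stdev / mean if mean else 0.0
    # Inflate proportionally to the spread, on top of the base margin.
    return total_frames * mean * (1 + base_margin + cv)
```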

@kornelski
Copy link
Collaborator

Alternative solution, which I think we've discussed previously, is to start the actual final conversion in the background and use it for the estimate. When the user presses start, instead of restarting, just reuse the in-progress conversion. This will have the disadvantage of using frames from the beginning for the estimate, but OTOH it will make the conversion seem faster, since it gets a head start.

@sindresorhus
Copy link
Owner Author

sindresorhus commented Oct 16, 2020

I have an idea to make it more scientific: measure frame sizes in bytes, and compute standard deviation of frame sizes. If the deviation is large, then the estimate is uncertain and should be inflated. If all frame sizes are about the same, then the estimate is likely to be accurate.

That's a good idea. I'll try it out.

This will have the disadvantage of using frames from the beginning for the estimate

I tried that too and the estimate was much worse. We really need to evenly spread out samples to get an accurate estimate. I'd rather have a more accurate estimate over a slightly shorter conversion time.

@sunshinejr
Copy link
Contributor

@sindresorhus this is looking really good! I tried a few recordings from apps (this is mostly my use case for Gifski), and the file size was always a bit bigger, but not by much. E.g. I saw the naive estimate at 59 MB, which then updated to 35.1 MB, and it generated 34.9 MB. This is a really big improvement for me. And personally, I'd skip the "naive" one and maybe add a loader or an "Estimating file size" text?

@kornelski
Copy link
Collaborator

kornelski commented Oct 20, 2020

Regarding naive estimate:

  • Show a min-max range. "20MB-50MB" makes it clearer how imprecise it is.

  • Once you get any better estimate, remember the ratio between the naive and the better estimate. Apply that ratio to later naive estimates. This will give you decent estimates in real time for changing parameters.
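The ratio suggestion in the second bullet could be sketched like this (a hypothetical illustration; names are made up and this is not code from the PR):

```python
class CalibratedEstimator:
    """Remember the ratio between the last accurate estimate and the
    naive estimate at that moment, then apply it to later naive
    estimates so parameter changes update the shown size instantly."""

    def __init__(self):
        self.ratio = 1.0  # Uncalibrated: pass naive estimates through.

    def record_accurate(self, naive, accurate):
        # Called once the slow, sampled estimate finishes.
        if naive:
            self.ratio = accurate / naive

    def corrected(self, naive):
        # Cheap, instant estimate for the current settings.
        return naive * self.ratio
```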

@sindresorhus
Copy link
Owner Author

Show a min-max range. "20MB-50MB" makes it clearer how imprecise it is.

How should we get the range? Just expand the current naive estimate both ways to get a lower/upper bound, or do you have something more clever in mind?

Once you get any better estimate, remember the ratio between the naive and the better estimate. Apply that ratio to later naive estimates. This will give you decent estimates real time for changing parameters.

That's a good idea.

@kornelski
Copy link
Collaborator

For lower/upper guesstimate I had in mind plugging in different constants/assumptions into the algorithm.

@sindresorhus
Copy link
Owner Author

For lower/upper guesstimate I had in mind plugging in different constants/assumptions into the algorithm.

Let's continue this in #130.

@sindresorhus
Copy link
Owner Author

I don't have time to implement and test all these things right now, but I've opened an issue to track them: #211

@sindresorhus
Copy link
Owner Author

I think it's more important to get this out there now. A lot of people have complained about inaccurate estimates.

@sindresorhus sindresorhus merged commit e6c97bc into master Oct 24, 2020
@sindresorhus sindresorhus deleted the better-estimate branch October 24, 2020 14:19