
Import time #245

Closed
henryiii opened this issue May 21, 2020 · 9 comments · Fixed by #244

Comments

@henryiii
Member

I've noticed that something that changed recently has caused the import time to skyrocket (this could even just be in my branch). Running Python 3.7 with `python -X importtime` shows a whopping 10+ seconds devoted to `particle.particle.literals`!
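(For reference, an editorial addition rather than part of the original comment: the measurement can be reproduced with `python -X importtime -c "import particle.particle.literals" 2> import.log`, using the standard CPython 3.7+ flag that writes per-import timings to stderr, or with a quick timer like the sketch below. The module path is the real one from the particle package; the numbers will of course vary by machine and version.)

```python
# Illustrative timing of the import under discussion.
import time

start = time.perf_counter()
import particle.particle.literals  # noqa: E402,F401  (import under test)
elapsed = time.perf_counter() - start

print(f"import particle.particle.literals took {elapsed:.2f} s")
```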

@henryiii
Member Author

henryiii commented May 21, 2020

Loading the nuclei table takes 10 seconds. @eduardo-rodrigues, we need to address this before releasing. You could load the entire numpy library almost 100 times in that time. Particle was designed to load a few hundred particles into a table, not 6,500 (though even the per-object speed is rather dismal...).

@henryiii
Member Author

henryiii commented May 21, 2020

Ran py-spy, and the flame graph showed that all the time was spent in the search for an existing particle. I've moved the internal `cls._table` to a set instead of a list and used mypy to help me make all the necessary changes (Particle was always designed to be hashable via its PDGID number). It now runs in 0.1 seconds instead of 10, and it worked on the first try (after the mypy assistance).
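(A minimal sketch of the idea, assuming nothing about the actual particle internals beyond what this comment says: hashing on the PDG ID lets the internal table be a set, so the "is this particle already loaded?" check becomes an O(1) hash lookup instead of an O(n) list scan.)

```python
from __future__ import annotations


class Particle:
    """Toy stand-in for the real class; only the hashing idea matters."""

    _table: set[Particle] = set()  # previously a list, scanned linearly

    def __init__(self, pdgid: int, name: str) -> None:
        self.pdgid = pdgid
        self.name = name

    # Hash and compare on the PDG ID only, so set membership means
    # "a particle with this PDGID is already in the table".
    def __hash__(self) -> int:
        return hash(self.pdgid)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Particle):
            return NotImplemented
        return self.pdgid == other.pdgid


# Loading ~6,500 nuclei: each membership test is now constant time.
Particle._table.add(Particle(2212, "proton"))
print(Particle(2212, "proton") in Particle._table)  # True, via hash lookup
```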

@eduardo-rodrigues
Member

@henryiii, the timing issues you are seeing are something I had sorted out a while back; see https://github.com/scikit-hep/particle/pull/231/commits and 55486ea specifically. I'm very surprised you are seeing the same kind of issues again. It was all working fine, and I have no idea what happened in between. Good that you found another solution anyway! I will take a closer look at the updates today ...

@eduardo-rodrigues
Member

Just confirming: on my local Windows Python 3.7.7 installation everything runs fast. Hmm …

@eduardo-rodrigues
Member

> Loading the nuclei table takes 10 seconds. @eduardo-rodrigues, we need to address this before releasing. You could load the entire numpy library almost 100 times in that time. Particle was designed to load a few hundred particles into a table, not 6,500 (though even the per-object speed is rather dismal...).

The timing issue was indeed something I saw when adding the information on nuclei, hence the discussion at the time: taking such a hit for information that >95% of our users will not care about was a no-go. Indeed, there are many isotopes added …

BTW, I'm working on a complete rewrite of the data files, which will be much easier to maintain and rather comprehensive. This also relates to #118 and will be much nicer. More news as soon as I find time to add the new data per category - quarks, leptons, light mesons, charm baryons, etc.; you get the idea.

@eduardo-rodrigues
Member

> Ran py-spy, and the flame graph showed that all the time was spent in the search for an existing particle. I've moved the internal `cls._table` to a set instead of a list and used mypy to help me make all the necessary changes (Particle was always designed to be hashable via its PDGID number). It now runs in 0.1 seconds instead of 10, and it worked on the first try (after the mypy assistance).

I have not played with py-spy, so if you have some commands to teach me so I can give it a go, I would be very interested. Thanks a lot!
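(Not part of the original thread: py-spy's documented CLI, for anyone wanting to reproduce the flame graph; the `profile.svg` name is just an example.)

```bash
pip install py-spy

# Record a profile of a fresh import and write an interactive
# flame-graph SVG you can open in a browser:
py-spy record -o profile.svg -- python -c "import particle"

# Or attach to an already-running process for a live, top-like view:
py-spy top --pid <PID>
```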

@henryiii
Member Author

The problem was that I had to remove the file chaining that was being done, and that removed your speed hack. I might be able to restore that speed boost too, and get even more! (Greedy cackling.) I will check. The set nicely speeds up user code as well, while the old hack only helped the original file loading.

@eduardo-rodrigues
Member

Agreed that using a set is neat :-).

Yep, we may be able to be even faster ...

@henryiii
Member Author

Set membership checking is O(1); I can even drop the initial existence check before appending entirely (if multiple particles share the same PDGID, duplicates collapse to a single entry in the table). Turning the check off makes no measurable performance difference.
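(A tiny illustration, not repository code, of why the pre-insertion check can go: adding an element that compares equal to one already in a set leaves the set unchanged, so duplicates never accumulate. Plain ints stand in here for PDGID-keyed particles.)

```python
table = set()  # the internal table of loaded particles, keyed by PDGID

table.add(2212)  # stand-in for "proton loaded"
table.add(2212)  # same PDGID again: add() is a no-op, no check needed

print(len(table))     # 1 -- duplicates collapse automatically
print(2212 in table)  # True, an O(1) membership test
```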

henryiii added a commit that referenced this issue May 21, 2020
* Some ignores for backports for typing

* Fixing a bunch of unclosed file warnings

* Starting on mypy support

* Adding MyPy support to PDGID

* Full mypy support added to pdgid/functions

As a result, anything that supports __int__ can now be used in the
functions; this includes Particle directly! (A sketch of the pattern
follows at the end of this commit list.)

* Adding pre-commit changes

* Basic types for Particle (with several skips)

* Fix for extra __main__ being dumped in the wrong place

* Fix for Python 2

* Fix for loading from zipfiles

* Make typing optional for the ZipApp

* Prepare to push ZipApp as well

* ZipApp requires Python 3.7+

* ZipApp compression and earlier Python 3 support

* Minor optimizations

* No need to skip a file that is not present

* Change internal table to a set. 100x faster load, faster searches. Fixes #245

* Add GHA badge

* Update docs/CHANGELOG.md

* Update docs/CHANGELOG.md

* Update docs/CHANGELOG.md

* Update docs/CHANGELOG.md

* Update docs/CHANGELOG.md

* Remove old performance addition - not needed with sets!

* Mention PDGID works on any SupportsInt

Co-authored-by: Eduardo Rodrigues <[email protected]>
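(As referenced in the commit message above, a hypothetical sketch of the SupportsInt pattern it describes; `pdgid_to_int` and `FakeParticle` are illustrative names, not part of the particle API. The idea is that the pdgid functions coerce their argument with `int()`, so anything implementing `__int__`, including Particle itself, is accepted.)

```python
from typing import SupportsInt


def pdgid_to_int(value: SupportsInt) -> int:
    """Illustrative helper showing the coercion the pdgid functions use:
    call int() so any __int__-bearing object is accepted."""
    return int(value)


class FakeParticle:
    """Stand-in for Particle, which implements __int__ -> PDG ID."""

    def __init__(self, pdgid: int) -> None:
        self._pdgid = pdgid

    def __int__(self) -> int:
        return self._pdgid


print(pdgid_to_int(FakeParticle(211)))  # 211, coerced via __int__
print(pdgid_to_int(-13))                # plain ints work too
```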