SortingAnalyzer issues #3031

Open
khl02007 opened this issue Jun 14, 2024 · 7 comments
Labels
core Changes to core module

Comments

@khl02007
Contributor

khl02007 commented Jun 14, 2024

@alejoe91 @samuelgarcia
I just started trying the new sorting analyzer and wanted to share some issues that I have encountered:

  • It seems that every time you run sorting_analyzer.compute, you recompute all the extensions. I think it would be better if it first checked whether the extension already exists with the same parameters and skipped the computation if so. Maybe there could be a recompute=True parameter that you can pass.
  • Some of the extensions depend on other extensions. For example, computing PC metrics requires the principal_components extension, but the parameters passed may differ: principal_components could be computed with n_components=5, while a PC metric that depends on it may ask for n_components=10. In that case the user should get at least a warning, or an error message saying that the parameters are not compatible.
  • Another idea is for the user to just be able to compute the quality metrics from the sorting analyzer, and the sorting analyzer to silently compute the extensions if they don't exist. I prefer this, as I don't really want to think about what kind of extensions are required for me to get my metrics.
  • In general from the user's perspective it is very opaque as to how the extensions depend on one another. Perhaps this can be better documented.
  • I feel that si.load_sorting_analyzer should have load_extensions=False by default (rather than load all the extensions by default, which could take a long time)
  • I'm not sure how I can add / attach a recording back to a sorting analyzer. I created a sorting analyzer with a sorting and a recording and computed some extensions. Later I tried to load the sorting analyzer and compute more extensions, but when I loaded it with load_sorting_analyzer and tried to compute, I got an error saying that a recording is required and the sorting analyzer doesn't have one. I can load the recording independently, but I don't know how to associate it with the loaded sorting analyzer again. Would I 'recreate' the sorting analyzer with create_sorting_analyzer in that case? (A minimal sketch of this workflow follows the list.)
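
A minimal sketch of the workflow in the last point (folder name is a placeholder; `sorting` and `recording` are the objects I used to build the analyzer):

```python
import spikeinterface.full as si

# first session: create the analyzer from a sorting + recording and compute some extensions
analyzer = si.create_sorting_analyzer(
    sorting, recording, format="binary_folder", folder="analyzer_folder"
)
analyzer.compute(["random_spikes", "waveforms", "templates"])

# later session: reload the analyzer from disk
analyzer = si.load_sorting_analyzer("analyzer_folder")

# computing a recording-dependent extension now fails for me, because the loaded
# analyzer has no recording attached and I don't see how to re-attach one
analyzer.compute("spike_amplitudes")
```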
@h-mayorquin h-mayorquin added the core Changes to core module label Jun 14, 2024
@h-mayorquin
Collaborator

Just a comment, there is documentation for the extension dependency graph but it might not be easily discoverable:

https://spikeinterface.readthedocs.io/en/latest/modules/postprocessing.html#extensions-as-analyzerextensions

@khl02007
Contributor Author

@h-mayorquin I see, thanks, should have read that

@zm711
Collaborator

zm711 commented Jun 17, 2024

Another idea is for the user to just be able to compute the quality metrics from the sorting analyzer, and the sorting analyzer to silently compute the extensions if they don't exist. I prefer this, as I don't really want to think about what kind of extensions are required for me to get my metrics.

I wonder about this. I too like this idea to some extent, especially since a chunk of users really only want to use defaults for everything and don't care about the fine-tuning...

But I think the idea for the analyzer was that it would force the user to make these choices for two reasons: 1) tunability, so that it is easy to optimize and reproduce, and 2) education, so the user can see how their choices affect quality.

Could I ask what your specific use-case is? You just want to use defaults?

I'm not sure how I can add / attach a recording back to a sorting analyzer. I created a sorting analyzer with a sorting and a recording, and computed some extensions. Then later I tried to load the sorting analyzer and compute more extensions. But when I loaded the sorting analyzer with load_sorting_analyzer and tried to compute the extensions, I got an error saying that it requires a recording and the sorting analyzer doesn't have one. I can load the recording independently, but I don't know how it can be associated with the loaded sorting analyzer again. Would I 'recreate' the sorting analyzer again with create_sorting_analyzer again in that case?

This will be solved with #2785. :) Just not merged yet.

Some of the extensions depend on other extensions. For example, computing PC metrics requires the principal components extension. But the parameters passed may differ. For example, principal_components extension could be computed with n_components=5, but a PC metric that depends on it may ask for n_components=10. In that case it seems like the user should get at least a warning, or an error message saying that the parameters are not compatible.

This is interesting. What is the actual example you're thinking about?

It seems that every time you run sorting_analyzer.compute, you recompute all the extensions. I think that it would be better if it first checks if the extension exists with the same parameters and not re-do it if possible. Maybe there could be a recompute=True parameter that you can pass.

This shouldn't happen. Could you double-check whether this is happening within the context of the dependency graph?
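
For debugging, a quick way to see what is already stored on the analyzer and with which parameters (a sketch using has_extension / get_extension on an existing SortingAnalyzer called `analyzer`):

```python
# list what has already been computed, with its parameters
for name in ["random_spikes", "waveforms", "templates", "principal_components"]:
    if analyzer.has_extension(name):
        print(name, analyzer.get_extension(name).params)
    else:
        print(name, "not computed")
```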

@samuelgarcia
Member

Hi Kyu,
thank you for the feedback.

  • It seems that every time you run sorting_analyzer.compute, you recompute all the extensions. I think that it would be better if it first checks if the extension exists with the same parameters and not re-do it if possible. Maybe there could be a recompute=True parameter that you can pass.

Normally, compute launches the computation only on the targeted list or dict. This is strange.
"compute" is more or less "recompute" when it already exists.

  • Some of the extensions depend on other extensions. For example, computing PC metrics requires the principal components extension. But the parameters passed may differ. For example, principal_components extension could be computed with n_components=5, but a PC metric that depends on it may ask for n_components=10. In that case it seems like the user should get at least a warning, or an error message saying that the parameters are not compatible.

Having several instances of an extension with different parameters would be extremely complicated and also error prone, I think.
For this specific case I would recommend having several analyzers with several parameter sets.
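
For example, something along these lines (just a sketch; folder names are placeholders):

```python
import spikeinterface.full as si

# one analyzer per parameter set, rather than one extension with several parameter sets
analyzer_5pc = si.create_sorting_analyzer(
    sorting, recording, format="binary_folder", folder="analyzer_5pc"
)
analyzer_5pc.compute(["random_spikes", "waveforms", "templates"])
analyzer_5pc.compute("principal_components", n_components=5)

analyzer_10pc = si.create_sorting_analyzer(
    sorting, recording, format="binary_folder", folder="analyzer_10pc"
)
analyzer_10pc.compute(["random_spikes", "waveforms", "templates"])
analyzer_10pc.compute("principal_components", n_components=10)
```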

  • Another idea is for the user to just be able to compute the quality metrics from the sorting analyzer, and the sorting analyzer to silently compute the extensions if they don't exist. I prefer this, as I don't really want to think about what kind of extensions are required for me to get my metrics.

I think I would prefer not.
I am slowly trying to remove all the magic happening under the hood in favor of more explicit steps in spikeinterface.

  • I feel that si.load_sorting_analyzer should have load_extensions=False by default (rather than load all the extensions by default, which could take a long time)

I agree with you, but Alessio convinced me that most end users would prefer to load everything at once.
Loading only a few extensions is a niche optimization, which can be achieved with load_extensions=False.
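
i.e. something like (a sketch, assuming get_extension pulls a saved extension from disk on demand):

```python
import spikeinterface.full as si

# load only the analyzer itself, not every saved extension
analyzer = si.load_sorting_analyzer("analyzer_folder", load_extensions=False)

# pull just the extension you actually need
qm = analyzer.get_extension("quality_metrics")
print(qm.get_data())
```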

@khl02007
Contributor Author

khl02007 commented Jun 17, 2024

Thanks @zm711 and @samuelgarcia

  • If you guys would like to make every parameter explicit for every processing step in general, I think that's fine and I see value in doing so. In that case, better documentation may help.
  • I meant that if you run e.g. sorting_analyzer.compute('waveforms') twice, it will run the processing both times even if it has already been done with the same parameters the first time. Seems like it would be better to avoid that. I'm working with a long recording and accidentally did this, which led to some wasted time.
  • For the clash between extensions issue, what I had in mind was principal_components vs. quality_metrics, specifically the PC-based metrics like nearest neighbor isolation / noise overlap. In this case, the user may give different values for the same parameter (e.g. n_components). I subsequently learned that these metrics cannot be computed anyway (they are disabled because they take too long to compute?), but I think it can still happen elsewhere. Maybe you guys have already thought about this and designed the propagation of the parameters such that there is no duplication. (A hypothetical example is sketched after this list.)
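
The kind of clash I had in mind, as a hypothetical sketch (whether and how n_components is actually forwarded to the nn metrics, and the metric_params name, are assumptions on my part, just to illustrate the potential mismatch):

```python
# principal components computed with 5 components
analyzer.compute("principal_components", n_components=5)

# a PC-based metric later asks for a different number of components
# (the metric_params name and the n_components forwarding are hypothetical)
analyzer.compute(
    "quality_metrics",
    metric_names=["nn_isolation"],
    metric_params={"nn_isolation": {"n_components": 10}},
)
```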

Feel free to close this issue once #2785 is merged.

@zm711
Collaborator

zm711 commented Jun 17, 2024

I meant that if you run e.g. sorting_analyzer.compute('waveforms') twice, it will run the processing both times even if it has already been done with the same parameters the first time. Seems like it would be better to avoid that. I'm working with a long recording and accidentally did this, which led to some wasted time.

The assumption here is that if you run it again, it is because you have changed something and want to re-run it. But I agree, I have made this mistake many times accidentally and it is frustrating. Maybe we could add a check such that if the params are exactly the same then we don't re-run?
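
Until something like that lands in compute itself, a user-side guard could look like this (a sketch; compute_if_needed is just a hypothetical helper name):

```python
def compute_if_needed(analyzer, extension_name, **params):
    # skip the computation if the extension already exists with identical params
    if analyzer.has_extension(extension_name):
        existing = analyzer.get_extension(extension_name).params
        if all(existing.get(k) == v for k, v in params.items()):
            return analyzer.get_extension(extension_name)
    return analyzer.compute(extension_name, **params)

# the second call with the same params would then be a no-op
compute_if_needed(analyzer, "waveforms", ms_before=1.0, ms_after=2.0)
compute_if_needed(analyzer, "waveforms", ms_before=1.0, ms_after=2.0)
```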

I subsequently learned that these metrics cannot be computed anyway (they are disabled because they take too long to compute?)

I think that is just for the defaults... If you look at the code, if you force it by passing metric_names = [nn...], then even though the PC metrics are skipped by default, the fact that it is listed as a desired metric should include it, based on my reading. But I also agree it is a little annoying that we remove something from the defaults. Not sure of the thought process, i.e., was it for testing?

@alejoe91
Member

We removed nn_isolation and nn_overlap because they are extremely slow on high-density probes (e.g. Neuropixels). Maybe we can make this behavior dependent on the number of channels/units? Of course, one can always force the metric_names.
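
i.e. something like (a sketch, assuming the required extensions such as principal_components have already been computed):

```python
# the nn metrics are excluded from the defaults but can still be requested explicitly
analyzer.compute(
    "quality_metrics",
    metric_names=["snr", "isi_violation", "nn_isolation", "nn_noise_overlap"],
)
```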
