Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only make existing paths relative #3014

Closed
wants to merge 13 commits into from

Conversation

alejoe91
Copy link
Member

@alejoe91 alejoe91 commented Jun 11, 2024

Fixes #3013

  • Add tests for this behavior

@samuelgarcia
Copy link
Member

Not sure to like this.
I think the main problem is that electrical_series_path is a path internal to a file and not a system file.

Here, this patch is detecting if file existst or not but this notion is not related to differentiate system path and intra hdf5 path.

@alejoe91
Copy link
Member Author

We currently don't have a better way to tell which paths are real paths and which paths are not. The other option would be to remove path from the argument, but I think that those are good names.

@samuelgarcia
Copy link
Member

We should have an explicit way to declare some kwargs as system path and we should convert to relative only theses ones. Or the reverse skip some kwargs from this relative mechanism.
Here the behavior is totally hidden and this patch do not make many sens sens outside the context of this particular bug.

@alejoe91
Copy link
Member Author

What do you suggest? we could have a class attribute skip_relative_paths which lists the paths that need to be skipped?

@alejoe91 alejoe91 added bug Something isn't working core Changes to core module labels Jun 12, 2024
@samuelgarcia samuelgarcia added this to the 0.101.0 milestone Jun 12, 2024
@h-mayorquin
Copy link
Collaborator

This is an interesting one.

I agree with @samuelgarcia that the patch is too far away from the bug which is undesirable.
I also agree that electrical_series_path is a good name and we should not let an implementation detail make a decision over the exposed API (i.e. the naming of arguments). Moreover, if even if we change it or exclude it will show again when someone makes another argument that has path on their name and is a string. So we need a more robust solution.

My take on the origin of the problem is that the criteria that we use to decide which paths we modify is to loose:

Function to apply to the path. It must take a path as input and return a path
target : str, default: "path"

That is, we are using "path" in the name of the argument as a criteria to decide if something is a path and I think this is not precise enough and will cause problems

My proposal:

  • Either add @alejoe91 logic just just there in the recursive_path_modifier. The criteria is that path is in the name of the argument and that the file exists. This is more localized and makes the criteria very clear on what we are counting as a path. I think this is better than adding a new argument (check_if_exists) in many places and will be easier to change and modify if we need to. Or, an even simpler option is just to hardcode an exclude list right there that contains the arguments that should not be counted as paths. Both are super localized, we can write in a comment why they are there and change them easily.

I think this proposal has the advantage of fixing this and not changing a lot. I would start with that.

That said, long term what I would really like to see is for the criteria to be that we only modify to relative or absolute kwargs that are of Path type. Right now we are casting Paths to strings in most places but I don't think this is necessary. Both json and pickle can handle paths so we don't really need to keep them as strings internally, do we? If we do this, then the criteria to modify something becomes even more straighforward: modify kwargs that are of type Path. This is marked and controlled at the level of the extractor by storing something on the kwargs as a Path type or not. So it is both clear and controllable.

The problem with the second proposal is that is more disruptive and will take us a while on discussing the details (and maybe finding bugs) so I think the first solution is better.

@alejoe91
Copy link
Member Author

@h-mayorquin I tried to implement this at the recursive path modifier level and it works, but it requires an additional check_existance_after_func flag.
The reason is that it's ok for making paths relative, because paths exist before the modification, but it will fail with making path absolute, because the paths will exist after the modification.

IMO, this is good to go! @h-mayorquin tell me what you think!

I also modified tests so they actually create existing and non existing test paths

alejoe91 added a commit to alejoe91/spikeinterface that referenced this pull request Jun 25, 2024
@h-mayorquin
Copy link
Collaborator

I did not think about that problem, the thing is that you don't have the relative folder until you apply the function.

So I understand that we need check_if_exists because some tests depend on it.

But maybe we can get rid of the second keyword by just checking before and after?

  path_before_function = Path(v)
  result = func(v)
  path_after_function = Path(result)
  if path_before_function.exists() or path_after_function.exists():
      dc[k] = result

I really would prefer not to leak path checking implementation details outside of this function.

@alejoe91
Copy link
Member Author

I did not think about that problem, the thing is that you don't have the relative folder until you apply the function.

So I understand that we need check_if_exists because some tests depend on it.

But maybe we can get rid of the second keyword by just checking before and after?

  path_before_function = Path(v)
  result = func(v)
  path_after_function = Path(result)
  if path_before_function.exists() or path_after_function.exists():
      dc[k] = result

I really would prefer not to leak path checking implementation details outside of this function.

Great idea! I'll do that :) I agree that we should minimize the extra kwargs

@alejoe91
Copy link
Member Author

@h-mayorquin done!

@alejoe91
Copy link
Member Author

@h-mayorquin unfortunately I think we have to revert to the previous logic, here's why:

Some functions, like this one use the recursive_path_modifier to append to a list. So calling the func multiple times will defeat the purpose...let me know what you think, but maybe we need to go back to the check_existance_after_func construct...

@h-mayorquin
Copy link
Collaborator

Mmm I think those functions that are using recursive_path_modifier are hacky and will cause more problems. I also don't like that the current tests are forcing us to have keyword that we really should not need (the tests are dicatating features and it should be the other way around) .

@h-mayorquin
Copy link
Collaborator

@alejoe91 check out the appraoach in #3089. I am afraid of recursive_path_modifier growing more complicated and that we need to modify things at the deep level to make it work. The approach in #3089 does not need the new arguments as the logic for determining if the path is real can be added in make_paths_relative and make_paths_absolute as a simple filter in a list comprehension.

@alejoe91
Copy link
Member Author

alejoe91 commented Jul 3, 2024

Close in favor of #3089

@alejoe91 alejoe91 closed this Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core Changes to core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

electrical_series_path/unit_table_path are made relative with relative_to
3 participants