FileDataSource.get_dataframe(args_dict) should also provide simple equality-based selection #96

Open
schuderer opened this issue Mar 31, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@schuderer
Owner

schuderer commented Mar 31, 2020

Description

Right now, DBMS-style DataSources allow for querying using parameters provided through args_dict. The most common case is probably providing a value to check equality with (e.g. an ID for an item to fetch).

Although DataSources claim to be interchangeable from the model code's point of view, a FileDataSource currently requires source-specific code to select the desired record from the loaded CSV.

To make this claim somewhat more true (and usage more consistent between kinds of DataSources, at least for the equality case), I propose to add functionality to FileDataSource.get_dataframe to use the args_dict parameter for selection. args_dict would be a dictionary of column key(s) with values to equality-test. get_dataframe would return a subset of the originally loaded DataFrame.
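As a rough illustration of the proposed behaviour, here is a minimal sketch of equality-based selection on an already loaded DataFrame. The helper name `select_equal` and the sample data are hypothetical and not part of the existing FileDataSource code; how this would be wired into get_dataframe is left open.

```python
import pandas as pd


def select_equal(df, args_dict=None):
    """Return the rows of df in which every column named in args_dict
    equals the corresponding value (conditions are AND-combined)."""
    if not args_dict:
        # No selection criteria: return the data unchanged.
        return df
    missing = [col for col in args_dict if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in data: {missing}")
    # Start with an all-True mask and narrow it per column.
    mask = pd.Series(True, index=df.index)
    for column, value in args_dict.items():
        mask &= (df[column] == value)
    return df[mask]


# Hypothetical sample data standing in for a loaded CSV.
customers = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["Ann", "Bob", "Cem"],
        "city": ["Oslo", "Bern", "Oslo"],
    }
)
selected = select_equal(customers, {"city": "Oslo", "id": 3})
```

Raising on unknown columns is one possible design choice; silently returning an empty result would hide typos in args_dict.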

Other comments

If there is a clean, reasonably fast, pandas-supported way to do this on CSVs without loading them into memory first, that would be preferable to loading all data and then filtering. Maybe this is relevant: https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function
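One common pandas pattern for this is to read the CSV in chunks and keep only the matching rows of each chunk. The sketch below assumes that approach; `read_csv_filtered` and its `chunksize` default are hypothetical names and values, not existing FileDataSource API. Note that the whole file is still parsed; only peak memory use is reduced.

```python
import io

import pandas as pd


def read_csv_filtered(path_or_buffer, args_dict, chunksize=10_000):
    """Read a CSV chunk by chunk, keeping only the rows in which every
    column named in args_dict equals the corresponding value."""
    matches = []
    for chunk in pd.read_csv(path_or_buffer, chunksize=chunksize):
        mask = pd.Series(True, index=chunk.index)
        for column, value in args_dict.items():
            mask &= (chunk[column] == value)
        matches.append(chunk[mask])
    if not matches:
        # No chunks were produced at all; return an empty frame.
        return pd.DataFrame()
    return pd.concat(matches, ignore_index=True)


# In-memory stand-in for a CSV file on disk.
csv_text = "id,name\n1,a\n2,b\n3,c\n2,d\n"
result = read_csv_filtered(io.StringIO(csv_text), {"id": 2}, chunksize=2)
```

Whether this is faster than loading and filtering in one go likely depends on file size and how selective the filter is, so it would need measuring.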

@schuderer
Owner Author

A mapping between "param name" and column name would also be nice. Otherwise, to keep lookup behaviour consistent between a database source and a CSV source, one would have to rename the parameter in the SQL to match the corresponding column name.
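Such a mapping could be as simple as a dictionary that is applied to args_dict before selection. This is only a sketch: `translate_args` and `param_to_column` are hypothetical names, and where the mapping would be configured is not decided here.

```python
def translate_args(args_dict, param_to_column=None):
    """Rename args_dict keys from query parameter names to CSV column
    names; parameters without a mapping entry are passed through as-is."""
    param_to_column = param_to_column or {}
    return {
        param_to_column.get(param, param): value
        for param, value in args_dict.items()
    }


# The SQL query keeps its parameter name "customer_id",
# while the CSV column is called "id".
translated = translate_args(
    {"customer_id": 7, "city": "Oslo"},
    {"customer_id": "id"},
)
```

With this, model code can pass the same args_dict to either kind of DataSource.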

@schuderer schuderer added this to To do in Prioritized User Issues via automation May 17, 2021