Always treat price data as pd.DataFrame, never as pd.Series #642

Open
andreas-vester opened this issue Aug 16, 2023 · 3 comments

@andreas-vester

I realized that vectorbt is treating price data differently depending on the number of symbols provided.

With a single symbol, vectorbt returns a pd.Series, while it returns a pd.DataFrame when multiple symbols are provided.

The following get function is responsible for this distinction.

def get(self, column: tp.Optional[tp.Label] = None, **kwargs) -> tp.MaybeTuple[tp.SeriesFrame]:
    """Get column data.

    If one symbol, returns data for that symbol.
    If multiple symbols, performs concatenation first and returns a DataFrame if one column
    and a tuple of DataFrames if a list of columns passed."""
    if len(self.symbols) == 1:
        if column is None:
            return self.data[self.symbols[0]]
        return self.data[self.symbols[0]][column]
    concat_data = self.concat(**kwargs)
    if len(concat_data) == 1:
        return tuple(concat_data.values())[0]
    if column is not None:
        if isinstance(column, list):
            return tuple([concat_data[c] for c in column])
        return concat_data[column]
    return tuple(concat_data.values())

It might look reasonable to treat a single column as a pd.Series. However, this can cause problems in downstream code.

For example, I further process the data that I retrieve with the get method in my code. I ALWAYS need to get back a pd.DataFrame with the same column level names, no matter how many symbols I provide.
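
To illustrate the kind of downstream breakage meant here, the following is a minimal pandas-only sketch. It does not call vectorbt; the symbols AAA and BBB, the prices, and the column_labels function are made up for illustration.

```python
import pandas as pd

index = pd.date_range("2023-01-02", periods=3, freq="D")

# Shape of a single-symbol result: a Series named after the requested column.
single = pd.Series([100.0, 101.0, 102.0], index=index, name="Close")

# Shape of a multi-symbol result: one column per symbol.
multi = pd.DataFrame(
    {"AAA": [100.0, 101.0, 102.0], "BBB": [50.0, 51.0, 52.0]},
    index=index,
)
multi.columns = multi.columns.set_names("symbol")


def column_labels(prices):
    """Hypothetical downstream code that assumes a DataFrame."""
    return list(prices.columns)


multi_labels = column_labels(multi)  # works: DataFrame has .columns

try:
    column_labels(single)
    single_failed = False
except AttributeError:
    # A Series has no .columns attribute, so the same code path breaks.
    single_failed = True
```

The same function works for two symbols and fails for one, which is why a stable return type matters to calling code.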

There MIGHT be a quick solution for this issue. I changed part of the function in the following way:

def get(self, column: tp.Optional[tp.Label] = None, **kwargs) -> tp.MaybeTuple[tp.SeriesFrame]:
    """Get column data.

    If one symbol, returns data for that symbol.
    If multiple symbols, performs concatenation first and returns a DataFrame if one column
    and a tuple of DataFrames if a list of columns passed."""
    if len(self.symbols) == 1:
        if column is None:
            return self.data[self.symbols[0]]  # maybe this part needs to be adjusted, too?

        # NEW CODE
        df = pd.DataFrame(self.data[self.symbols[0]][column])
        df = df.rename({column: self.symbols[0]}, axis=1)
        df.columns = df.columns.set_names("symbol")

        return df

    # Remainder of the function (the multi-symbol branch) is unchanged.

With this small change, my downstream code seems to work properly whether I provide a single symbol or multiple symbols (fingers crossed).
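
For anyone who would rather not patch the library, the same normalization could probably be done on the caller's side. This is only a sketch: as_symbol_frame is a hypothetical helper, not part of vectorbt, and the symbol name AAA and the prices are placeholders.

```python
import pandas as pd


def as_symbol_frame(prices, symbol):
    """Return `prices` as a DataFrame whose columns level is named "symbol".

    Hypothetical helper, not part of vectorbt.
    """
    if isinstance(prices, pd.Series):
        # A single-symbol result: turn it into a one-column frame.
        prices = prices.to_frame(name=symbol)
    else:
        # Copy so the caller's frame is not modified.
        prices = prices.copy()
    prices.columns = prices.columns.set_names("symbol")
    return prices


index = pd.date_range("2023-01-02", periods=3, freq="D")
close = pd.Series([1.0, 2.0, 3.0], index=index, name="Close")
frame = as_symbol_frame(close, "AAA")
```

Wrapping every result of get this way gives downstream code one shape to handle, at the cost of having to pass the symbol name along for the single-symbol case.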

Having said that, I would be more confident if you were to change and test the library code.

In my experience, mixing pd.Series and pd.DataFrame causes trouble more often than not. I always go with pd.DataFrame no matter how many columns.

@polakowo
Owner

I agree fully. There's an entire argument in the pro version called single_symbol that switches this behavior, but there's no such argument in the open-source version. Maybe here we could add an argument return_frame in Data.get. I'll add it to feature requests.
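
A rough idea of how such a flag might behave, written as a standalone function rather than as a change to Data.get. Everything here is an assumption for illustration: get_column, the plain dict of per-symbol frames, and the default of False (chosen so the existing behavior would stay as it is) are not taken from vectorbt.

```python
import pandas as pd


def get_column(data, column, return_frame=False):
    """Pick `column` from a dict mapping symbol -> DataFrame.

    Hypothetical stand-in for the proposed flag, not vectorbt code.
    """
    symbols = list(data)
    if len(symbols) == 1 and not return_frame:
        # Behavior described in the issue: a bare Series for one symbol.
        return data[symbols[0]][column]
    # One column per symbol, however many symbols there are.
    frame = pd.concat({s: data[s][column] for s in symbols}, axis=1)
    frame.columns = frame.columns.set_names("symbol")
    return frame


index = pd.date_range("2023-01-02", periods=2, freq="D")
one = {"AAA": pd.DataFrame({"Close": [1.0, 2.0]}, index=index)}
two = {**one, "BBB": pd.DataFrame({"Close": [3.0, 4.0]}, index=index)}

legacy = get_column(one, "Close")                       # Series
framed = get_column(one, "Close", return_frame=True)    # one-column DataFrame
both = get_column(two, "Close")                         # two-column DataFrame
```

With return_frame=True the single-symbol and multi-symbol results share the same type and the same "symbol" columns level.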

github-actions bot commented Feb 5, 2024

This issue is stale because it has been open for 90 days with no activity.

@github-actions github-actions bot added the stale label Feb 5, 2024
@andreas-vester
Author

@polakowo Any chance that this issue will be solved in the near future, or at all?

@github-actions github-actions bot removed the stale label Feb 20, 2024