Always treat price data as pd.DataFrame, never as pd.Series #642

andreas-vester · 2023-08-16T11:35:58Z

I realized that vectorbt is treating price data differently depending on the number of symbols provided.

When providing a single symbol, vectorbt returns a pd.Series, while it returns a pd.DataFrame when providing multiple symbols.

The following get function is responsible for this distinction.

vectorbt/vectorbt/data/base.py

Lines 699 to 718 in f78d1fd

def get(self, column: tp.Optional[tp.Label] = None, **kwargs) -> tp.MaybeTuple[tp.SeriesFrame]:
    """Get column data.

    If one symbol, returns data for that symbol.
    If multiple symbols, performs concatenation first and returns a DataFrame if one column
    and a tuple of DataFrames if a list of columns passed."""
    if len(self.symbols) == 1:
        if column is None:
            return self.data[self.symbols[0]]
        return self.data[self.symbols[0]][column]

    concat_data = self.concat(**kwargs)
    if len(concat_data) == 1:
        return tuple(concat_data.values())[0]
    if column is not None:
        if isinstance(column, list):
            return tuple([concat_data[c] for c in column])
        return concat_data[column]
    return tuple(concat_data.values())

It might look reasonable to treat a single column as a pd.Series. However, this can create problems in downstream code.

For example, I'm further processing the data that I retrieve with the get method in my code. I need to ALWAYS get back a pd.DataFrame with the same column level names, no matter how many symbols I provide.
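To illustrate the kind of mismatch described here, the following is a minimal sketch in plain pandas. It does not call vectorbt. The tickers "AAA" and "BBB", the prices, and the helper column_level_names are made up for illustration, and the "symbol" column level name is assumed from the snippet further down.

```python
import pandas as pd

index = pd.date_range("2023-01-02", periods=3, freq="D")

# Shape of a single-symbol result: a Series named after the requested column.
single = pd.Series([100.0, 101.0, 102.0], index=index, name="Close")

# Shape of a multi-symbol result: a DataFrame with one column per symbol.
multi = pd.DataFrame(
    {"AAA": [100.0, 101.0, 102.0], "BBB": [50.0, 51.0, 52.0]},
    index=index,
)
multi.columns.name = "symbol"  # assumed column level name


def column_level_names(obj):
    """Return the column level names, or None if obj has no columns."""
    if isinstance(obj, pd.DataFrame):
        return list(obj.columns.names)
    return None


single_names = column_level_names(single)  # a Series carries no column levels
multi_names = column_level_names(multi)
```

Any downstream step that relies on obj.columns, or on a column level named "symbol", works on the second object but not on the first.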

There MIGHT be a quick solution for this issue. I changed part of the function in the following way:

def get(self, column: tp.Optional[tp.Label] = None, **kwargs) -> tp.MaybeTuple[tp.SeriesFrame]:
    """Get column data.

    If one symbol, returns data for that symbol.
    If multiple symbols, performs concatenation first and returns a DataFrame if one column
    and a tuple of DataFrames if a list of columns passed."""
    if len(self.symbols) == 1:
        if column is None:
            return self.data[self.symbols[0]]  # maybe this part needs to be adjusted, too?

        # NEW CODE
        df = pd.DataFrame(self.data[self.symbols[0]][column])
        df = df.rename({column: self.symbols[0]}, axis=1)
        df.columns = df.columns.set_names("symbol")

        return df

With this small change my downstream code seems to work properly, no matter if I provide a single or multiple symbols (fingers crossed).

Having said that, I would be more confident if you were to change and test the library code.

In my experience, mixing pd.Series and pd.DataFrame causes trouble more often than not. I always go with pd.DataFrame no matter how many columns.
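Until the library itself changes, the same normalization could also be done in user code without patching anything. The sketch below assumes the result of a single-symbol call is a pd.Series. The helper name as_symbol_frame and the ticker "AAA" are hypothetical, and the "symbol" level name is taken from the snippet above.

```python
import pandas as pd


def as_symbol_frame(obj, symbol):
    """Return obj as a DataFrame whose column level is named "symbol".

    A Series becomes a one-column DataFrame labelled with `symbol`;
    a DataFrame is returned unchanged.
    """
    if isinstance(obj, pd.Series):
        frame = obj.to_frame(name=symbol)
        frame.columns = frame.columns.set_names("symbol")
        return frame
    return obj


index = pd.date_range("2023-01-02", periods=3, freq="D")
close = pd.Series([100.0, 101.0, 102.0], index=index, name="Close")

normalized = as_symbol_frame(close, "AAA")
```

Wrapping the call site this way keeps the workaround in one place, so it can be deleted once the library offers an equivalent option.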


polakowo · 2023-08-18T22:01:36Z

I fully agree. The pro version has a dedicated argument called single_symbol that switches this behavior, but the open-source version has no such argument. Maybe we could add a return_frame argument to Data.get here. I'll add it to the feature requests.
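As a rough idea of how such a flag might behave, here is a sketch on a toy container. MiniData is a hypothetical stand-in, not vectorbt's Data class, and the return_frame semantics shown are an assumption, since the thread does not specify them.

```python
import pandas as pd


class MiniData:
    """Toy stand-in for a data container; NOT vectorbt's Data class."""

    def __init__(self, data):
        self.data = data  # dict mapping symbol -> DataFrame of columns
        self.symbols = list(data)

    def get(self, column, return_frame=False):
        """Return one column; return_frame=True forces a DataFrame (assumed semantics)."""
        if len(self.symbols) == 1:
            symbol = self.symbols[0]
            series = self.data[symbol][column]
            if not return_frame:
                return series  # current behavior: a Series for one symbol
            frame = series.to_frame(name=symbol)
            frame.columns = frame.columns.set_names("symbol")
            return frame
        # Multiple symbols: one column per symbol.
        frame = pd.concat({s: self.data[s][column] for s in self.symbols}, axis=1)
        frame.columns = frame.columns.set_names("symbol")
        return frame


index = pd.date_range("2023-01-02", periods=2, freq="D")
prices = pd.DataFrame({"Close": [1.0, 2.0]}, index=index)

one = MiniData({"AAA": prices})
legacy = one.get("Close")
framed = one.get("Close", return_frame=True)
two = MiniData({"AAA": prices, "BBB": prices * 2}).get("Close")
```

Defaulting the flag to False would leave existing callers unaffected, while callers that need a uniform shape could opt in.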

github-actions · 2024-02-05T01:48:43Z

This issue is stale because it has been open for 90 days with no activity.

andreas-vester · 2024-02-19T14:21:55Z

@polakowo Any chance that issue will be solved in the near future/at all?

github-actions bot added the stale label Feb 5, 2024

github-actions bot removed the stale label Feb 20, 2024
