Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow option in set_with_dataframe and get_as_dataframe to omit redundant higher-level labels in MultiIndex index columns #22

Open
robin900 opened this issue Apr 2, 2020 · 7 comments
Assignees

Comments

@robin900
Copy link
Owner

robin900 commented Apr 2, 2020

From an earlier issue #20 , @sam-s said:

In this case, I would like to have a separate column (with a header!) for each level.

I am torn between

foo one
foo two

and

foo one
    two

I think there should be an option to select one of them, the default being the 1st variant.
The same goes for column indexes.

@robin900 robin900 changed the title Allow option in set_with_dataframe to omit redundant higher-level labels in MultiIndex index columns Allow option in set_with_dataframe to omit redundant higher-level labels in MultiIndex index or column header Apr 2, 2020
@sam-s
Copy link

sam-s commented Apr 2, 2020

Just for clarity -- the 2nd variant is NOT an empty cell under foo but a double cell with foo.

@robin900
Copy link
Owner Author

robin900 commented Apr 2, 2020

OK then, let's re-define this as:

Variant 2 - use empty values to avoid redundant higher-level labels on consecutive rows. Thus "empty cell under foo".

Variant 3 - merge all the cells in a higher-level column with same higher-level label. Thus "double cell with foo". Problem with this is that it involves merging cells, which is a separate sheets API mergeCells request for each group of cells to be merged. They can all be put into a single batchUpdate request, but it's a whole lot of bookkeeping to perform. Can't promise this variant would be implemented anytime soon.

Incidentally, what would you like the new parameter to set_with_dataframe to be called to select this/these new variants?

@sam-s
Copy link

sam-s commented Apr 2, 2020

I suggest something like MultiIndex or handle_MultiIndex with descriptive str values as is common in pandas.

@robin900
Copy link
Owner Author

Here's a sample DataFrame, posted by @sam-s and copied here by me:

Describe the bug

Uploading a DataFrame with MultiIndex columns should merge cells appropriately

Version and Environment
Version of package: 0.3.7
Python interpreter:

Python 3.9.1 (default, Dec 10 2020, 10:36:35) 
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin

and Python 3.6.9 on Linux 5.4.67
OS: MacOS, Linux

To Reproduce

df = pd.DataFrame({("a","x"):[1,2,3],("a","y"):[4,5,6],("b","x"):[7,8,9],("b","y"):[1,3,5],"i":[11,11,22],"j":[1,2,2]})
df.set_index(keys=["i","j"],verify_integrity=True, inplace=True)
df.columns = pd.MultiIndex.from_tuples(df.columns)

now, df is

      a     b   
      x  y  x  y
i  j            
11 1  1  4  7  1
   2  2  5  8  3
22 2  3  6  9  5

now upload it to google using set_with_dataframe.

Expected behavior

I expect a and b to appear on the screen once, just like in the printed representation above.

Screenshots

here is what I get:

image

here is what I want:

image

Desktop (please complete the following information):

  • OS: MacOS
  • Browser chrome
  • Version 87

@robin900
Copy link
Owner Author

robin900 commented Nov 30, 2021

I'm going to write out the specs for this requested feature below:

For both set_with_dataframe and also get_as_dataframe, a new parameter called handle_MultiIndex indicates how columns participating in a MultiIndex are either written or parsed. The allowed parameter values are:

  • 'repeat' indicates that columns for upper levels in a MultiIndex should repeat the level values in every cell. This is the default value.
  • 'blank' indicates that columns for upper levels in a MultiIndex should leave empty all but the first cell showing a particular value for that level.
  • 'merge' indicates that columns for upper levels in a MultiIndex should merge all cells sharing a particular value for the level.

Parsing behavior in get_as_dataframe will interpret worksheet data in the same manner for both 'blank' and 'merge' modes, since when cells are merged the first/leftmost/topmost cell contains the value and the remaining cells in the merge have an empty value. Repeated values in an upper-level column will each be read as distinct values for the MultiIndex level if the parameter value is 'blank' or 'merge'. To achieve safe round-trip travel for values between a Google worksheet and a DataFrame, use the same value for handle_MultiIndex in both the set_with_dataframe and get_as_dataframe function calls.

@robin900 robin900 changed the title Allow option in set_with_dataframe to omit redundant higher-level labels in MultiIndex index or column header Allow option in set_with_dataframe and get_as_dataframe to omit redundant higher-level labels in MultiIndex index columns Nov 30, 2021
@robin900
Copy link
Owner Author

robin900 commented Mar 2, 2022

An update, since I've been spending a little time on this feature. The work that remains is to have get_as_dataframe() correctly build a MultiIndex for both a multiple-column index (indexcols=[...] in pandas reader) and multiple-row column header (header=[...] in pandas reader). When handle_MultiIndex is either 'blank' or 'merge', the parsing logic will need either to repair a MultiIndex riddled with empty level values, or fill in the empty values before passing the data to pandas for parsing. For row indexes, fixing after the fact might work; but for column headers, it seems less likely to work.

@robin900
Copy link
Owner Author

robin900 commented Mar 15, 2022

Note that the proper representation of MultiIndex columns in #44 is a separate issue that is nonetheless related, because header cells for MultiIndex columns will need to be blanked and cell-merged properly after the adjustments of #44 are implemented.

@robin900 robin900 self-assigned this Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants