Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(I#1750) Option to remove Blank Columns upon project creation #4757

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

wsmmxmm
Copy link

@wsmmxmm wsmmxmm commented Apr 14, 2022

Fixes #1750
Changes proposed in this pull request:

  • Determine whether a column is blank: (based on Added 'Store blank columns' importing feature  #3497): Create a new list in the TabularImportingParserBase class, that represents whether each column has data.
  • If the store blank column is false, delete this column (only happens when store blank column == false): removeColumn method is created in the ColumnModel class. This will use columns.remove(all empty columns), which will remove both data lines and header lines.
  • The checkbox of the store blank column added to the front-end page. After testing with real files (csv, xls, tsv), this function works. I think it should work on other Tabular files as well, since I'm optimizing directly in the TabularImportingParserBase class. But I'm not really sure how to test in TabularImportingParserBase directly. In addition, I've tested tree files (xml, json), and now it is possible to do blank columns. So I didn't make changes.

Three parts of changes:
1. Determine whether a column is blank: (based on OpenRefine#3497): Create a new list<Boolean> in the TabularImportingParserBase class, that represents whether each column has data. 2. If the store blank column is false, delete this column (only happens when store blank column == false): removeColumn method is created in the ColumnModel class. This will use columns.remove(all empty columns), which will remove both data lines and header lines. 3. The checkbox of the store blank column added to the front-end page. After testing with real files (csv, xls, tsv), this function works. I think it should work on other Tabular files as well, since I'm optimizing directly in the TabularImportingParserBase class. But I'm not really sure how to test in TabularImportingParserBase directly. In addition, I've tested tree files (xml, json), and now it is possible to do blank columns. So I didn't make changes.
@github-actions github-actions bot added Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements. Priority: Medium Represents important issues that need to be addressed but are not urgent labels Apr 14, 2022
@antoine2711 antoine2711 changed the title Fixes #1750 (I#1750) Option to remove Blank Columns upon project creation Apr 15, 2022
Copy link
Sponsor Member

@wetneb wetneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, it looks pretty good to me! It would be good to have unit tests to demonstrate the feature, for instance you could test the feature on a few tabular importers (CSV, Excel…)

// for (int j = 0; j < deletedSoFar ; j++) {
// row.setCell(row.cells.size()-1-j, null);
// }
// }
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this commented out code meant to be submitted in this PR? If it is not needed, I would remove it entirely.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this commented out code meant to be submitted in this PR? If it is not needed, I would remove it entirely.

I have removed the extra code in the new commits(?). I think it is more reasonable to remove the automatically added column names when deleting blank columns.

@wsmmxmm
Copy link
Author

wsmmxmm commented Apr 18, 2022

Thank you, it looks pretty good to me! It would be good to have unit tests to demonstrate the feature, for instance you could test the feature on a few tabular importers (CSV, Excel…)

And I add some unit tests in the test part.
It's weird that the feature is worked in the website on excel importer.
image
image

But it has different results in unit test.
I found the reason for this was that the results of the unit test processing on the blank cell were not consistent with expectations. But I decided not to look further for a solution after comparing it with csv importor test, perhaps because of historical reasons.

image

@wsmmxmm
Copy link
Author

wsmmxmm commented Apr 20, 2022

I found the failed email just now. This is my first pull request. Is there anything else I should do?

@WaltonG
Copy link
Member

WaltonG commented Apr 20, 2022

@wsmmxmm The reason for failling intergation is explained when you click on detaills. In your case its formatting issues.

From the developers manual

If you made Java changes, run linting to make sure they conform to our code style, with mvn formatter:format.

@WaltonG
Copy link
Member

WaltonG commented Apr 21, 2022

@wsmmxmm The integration error still persists.

/OpenRefine/main/tests/server/src/com/google/refine/importers/ImporterUtilitiesTests.java' has not been previously formatted. Please format file and commit before running validation! ->

@wetneb wetneb self-assigned this Jul 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: Medium Represents important issues that need to be addressed but are not urgent Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove Blank Columns upon project creation/import
3 participants