Failing to read XLS file: process doesn't terminate. #258

Open
Yv-vY opened this issue Jan 28, 2021 · 6 comments

Comments

@Yv-vY

Yv-vY commented Jan 28, 2021

Using the spreadsheet gem, I hit this bug when trying to load the attached XLS file, which is evidently bloated after a "Save As" from XLSX to XLS in Microsoft Excel.
When selecting worksheet 0, the Ruby process goes to high CPU usage and never completes (I waited at least 5 minutes).

Notes:

  • there are only ~300 rows of data in the file
  • the file reads perfectly with LibreOffice too

Attachment: BloatedExcelFile.zip

Here's the backtrace after interrupting the process with Ctrl-C:


^C/opt/rubies/ruby-3.0.0/lib/ruby/gems/3.0.0/gems/irb-1.3.1/lib/irb.rb:418:in `raise': abort then interrupt! (IRB::Abort)
	from /home/yann/.gem/ruby/3.0.0/gems/spreadsheet-1.2.6/lib/spreadsheet/excel/reader.rb:677:in `block in read_row'
	from /home/yann/.gem/ruby/3.0.0/gems/spreadsheet-1.2.6/lib/spreadsheet/excel/reader.rb:672:in `fetch'
	from /home/yann/.gem/ruby/3.0.0/gems/spreadsheet-1.2.6/lib/spreadsheet/worksheet.rb:157:in `map'
	from /home/yann/.gem/ruby/3.0.0/gems/spreadsheet-1.2.6/lib/spreadsheet/worksheet.rb:157:in `each'
	from /home/yann/.gem/ruby/3.0.0/gems/spreadsheet-1.2.6/lib/spreadsheet/excel/worksheet.rb:47:in `each'
	from /home/yann/.gem/ruby/3.0.0/gems/roo-xls-1.2.0/lib/roo/xls/excel.rb:252:in `read_cells'
	from /home/yann/.gem/ruby/3.0.0/gems/roo-2.8.3/lib/roo/base.rb:119:in `block (2 levels) in <class:Base>'
	from /home/yann/.gem/ruby/3.0.0/gems/simple-spreadsheet-0.5.0/lib/simple-spreadsheet/modules/roo_module.rb:20:in `last_row'
	from (irb):4:in `<main>'
	... 3 levels...

Thanks for your support.

@zdavatz
Owner

zdavatz commented Jan 28, 2021

So the file was created with Microsoft Office and saved as XLSX? Then it was saved with Microsoft Office as XLS?

@Yv-vY
Author

Yv-vY commented Jan 28, 2021

Yes, that is correct. To be precise: the file was first created with Microsoft Excel, saved (in the default XLSX format) and closed, then reopened and saved as XLS.

@zdavatz
Owner

zdavatz commented Jan 29, 2021

What happens if you save the file to XLS using LibreOffice? Does spreadsheet gem then work?

@zdavatz
Owner

zdavatz commented Jan 29, 2021

And why was the file first saved as XLSX and not as XLS?

@Yv-vY
Author

Yv-vY commented Jan 29, 2021

> What happens if you save the file to XLS using LibreOffice? Does spreadsheet gem then work?

Yes, the file shrinks to 97 KB and then everything works fine.

> And why was the file first saved as XLSX and not as XLS?

Because the user didn't pay attention to the format when he created the file (XLSX is the default).

My concern is not that the spreadsheet gem has some limitations; that's fine with me. But it should at least fail nicely when it can't process an XLS file. Otherwise it's a nightmare for users of any system built on this gem: we can't possibly tell them to check for bloated files, or expect them to guess that a failed import is caused by a bloated file.

@zdavatz
Owner

zdavatz commented Jan 29, 2021

I kind of agree. If you have an idea how to "fail nicely" when the spreadsheet gem cannot process an XLS file, then please submit a patch with the corresponding test cases.

I am happy that LibreOffice is doing a good job.
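
Until such a patch exists, a caller could approximate "failing nicely" on their own side by putting a time limit around the read. The following is only a minimal sketch using Ruby's standard `Timeout` module. `read_with_timeout`, `SpreadsheetReadTimeout`, and the 30-second default are hypothetical names and values chosen for illustration, not part of the spreadsheet gem, and this does not address why the bloated file takes so long to read.

```ruby
require "timeout"

# Hypothetical error class for illustration; NOT part of the spreadsheet gem.
class SpreadsheetReadTimeout < StandardError; end

# Runs the given block and returns its result. If the block takes longer
# than `seconds`, raises SpreadsheetReadTimeout instead of hanging.
# The helper name and the default limit are assumptions.
def read_with_timeout(seconds = 30)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  raise SpreadsheetReadTimeout,
        "gave up reading after #{seconds}s; the file may be bloated"
end

# Usage sketch (needs the spreadsheet gem; the file path is a placeholder):
#   require "spreadsheet"
#   book  = Spreadsheet.open("path/to/file.xls")
#   sheet = book.worksheet(0)
#   rows  = []
#   read_with_timeout(10) { sheet.each { |row| rows << row.to_a } }
```

The calling application can then rescue `SpreadsheetReadTimeout` and show the user a message suggesting they re-save the file, rather than leaving the import running indefinitely. Note that `Timeout` interrupts the block from another thread, so it is a coarse safeguard rather than a fix.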
