Gef files not parsed with read_cpt function #334

Open
tlukkezen opened this issue Jun 27, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@tlukkezen
Collaborator

These GEF files could not be parsed using the read_cpt() function. It's unclear why or what went wrong, so this ticket requires some investigation.

KNM_GEF_stuk.zip

@tlukkezen added the bug label on Jun 27, 2023
@RDWimmers
Member

import pygef

pygef.read_cpt("./KNM_GEF_stuk/S0270_35.gef")

> Traceback (most recent call last):
>   File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3460, in run_code
>     exec(code_obj, self.user_global_ns, self.user_ns)
>   File "<ipython-input-3-4d7bdd2c1549>", line 3, in <module>
>     pygef.read_cpt("./KNM_GEF_stuk/S0270_35.gef")
>   File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/pygef/shim.py", line 82, in read_cpt
>     return gef_cpt_to_cpt_data(_GefCpt(path=file))
>   File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/pygef/gef/parse_cpt.py", line 134, in __init__
>     self.parse_data(
>   File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/pygef/gef/gef.py", line 151, in parse_data
>     return pl.read_csv(
>   File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/polars/io/csv/functions.py", line 354, in read_csv
>     df = pl.DataFrame._read_csv(
>   File "/home/robin/Documents/Repositories/pygef/venv/lib/python3.9/site-packages/polars/dataframe/frame.py", line 784, in _read_csv
>     self._df = PyDataFrame.read_csv(
> exceptions.ComputeError: projection index 1 is out of bounds for CSV schema with 1 columns

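The #COLUMNSEPARATOR header is not set in the GEF file, so pygef assumes a space is used as the separator. Based on the contents of the GEF file, a tab is used instead. Splitting on spaces then yields a single column, which matches the polars error above.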
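To confirm this kind of mismatch on other files, a small check like the one below may help. It is only a sketch: `inspect_gef_separator` is a hypothetical helper, not part of pygef, and it assumes the header ends at a `#EOH=` line and that header lines have the form `#KEYWORD= value`.

```python
def inspect_gef_separator(text: str):
    """Return (declared_separator, first_data_row) for the text of a GEF file.

    Hypothetical helper, not part of pygef. Assumes the header ends at "#EOH=".
    """
    header, _, data = text.partition("#EOH=")

    declared = None
    for line in header.splitlines():
        # Header lines are assumed to look like "#KEYWORD= value".
        if line.upper().startswith("#COLUMNSEPARATOR") and "=" in line:
            declared = line.split("=", 1)[1].strip()

    # Skip empty lines, including the remainder of the "#EOH=" line itself.
    rows = [row for row in data.splitlines() if row.strip()]
    first_row = rows[0] if rows else ""
    return declared, first_row
```

If `declared` is `None` and `"\t" in first_row` is true, the file is in the same situation as the one reported here.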

import pygef

# Read the file contents
with open('./KNM_GEF_stuk/S0270_35.gef', 'r') as file:
    filedata = file.read()

# Replace every tab with a space
filedata = filedata.replace('\t', ' ')

# Write the result back, overwriting the original file
with open('./KNM_GEF_stuk/S0270_35.gef', 'w') as file:
    file.write(filedata)

pygef.read_cpt("./KNM_GEF_stuk/S0270_35.gef")
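Note that the workaround above overwrites the original GEF file. A variant that leaves the source untouched could look like the sketch below. `write_space_separated_copy` is a hypothetical helper and the `_spaces` suffix is an arbitrary choice. It works on bytes so it does not depend on the file's text encoding.

```python
from pathlib import Path


def write_space_separated_copy(source: str) -> Path:
    """Write a copy of `source` with tabs replaced by spaces and return its path.

    Hypothetical helper, not part of pygef. The original file is not modified.
    """
    src = Path(source)
    target = src.with_name(f"{src.stem}_spaces{src.suffix}")
    # Operate on raw bytes so the file's encoding does not matter.
    target.write_bytes(src.read_bytes().replace(b"\t", b" "))
    return target
```

The returned path can then be passed on, for example `pygef.read_cpt(str(write_space_separated_copy("./KNM_GEF_stuk/S0270_35.gef")))`.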

It would be nicer to raise a parsing error here instead of surfacing a polars error.

@tlukkezen
Collaborator Author

Yes, throwing a custom error would definitely be preferred, e.g. pygef.exceptions.ParseCptGefError

We could raise it when the inferred column separator does not occur the expected number of times (= number of columns - 1) on every row of the CSV data.
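A minimal sketch of that check, assuming the data block, the inferred separator, and the column count are already available as plain values. `ParseCptGefError` here is a stand-in class using the name suggested above, and `check_separator` is a hypothetical function. Neither exists in pygef at the time of writing this comment.

```python
class ParseCptGefError(Exception):
    """Stand-in for the proposed pygef.exceptions.ParseCptGefError."""


def check_separator(data: str, separator: str, n_columns: int) -> None:
    """Raise ParseCptGefError if any non-empty row has an unexpected separator count."""
    expected = n_columns - 1
    for number, row in enumerate(data.splitlines(), start=1):
        if not row.strip():
            continue  # ignore blank lines
        found = row.count(separator)
        if found != expected:
            raise ParseCptGefError(
                f"Data row {number}: expected {expected} occurrences of "
                f"column separator {separator!r}, found {found}. "
                "Is #COLUMNSEPARATOR missing or wrong in the GEF header?"
            )
```

This counts separators literally, so files with trailing or repeated separators (for example space-aligned columns) would need a more lenient rule before this could be used as is.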

@tlukkezen
Collaborator Author

Linked to #367
