Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDBIOException re-raised ValueError with obfuscating message #4729

Open
CatChenal opened this issue May 15, 2024 · 3 comments · May be fixed by #4753
Open

PDBIOException re-raised ValueError with obfuscating message #4729

CatChenal opened this issue May 15, 2024 · 3 comments · May be fixed by #4753
Assignees

Comments

@CatChenal
Copy link

Setup

I am reporting a problem with Biopython version, Python version, and operating
system as follows:

3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
CPython
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
1.83

Expected behaviour

The final error message — after re-raising a ValueError as a PDBIOException — is :

PDBIOException: Error when writing atom ('cif', 0, 'e', (' ', 609, ' '), ('CZ2', ' '))

While the error message of its source is:

ValueError: Atom serial number ('100000') exceeds PDB format limit.

So, contrary to the comment at PDBIO.save:L390 (see traceback), the "caught" exception message is LESS informative than the original one.

Actual behaviour

The traceback shows the original error message was more informative:

ValueError                                Traceback (most recent call last)
File ~/<...>/lib/python3.12/site-packages/Bio/PDB/PDBIO.py:379, in PDBIO.save(self, file, select, write_end, preserve_atom_numbering)
    378 try:
--> 379     s = get_atom_line(
    380         atom,
    381         hetfield,
    382         segid,
    383         atom_number,
    384         resname,
    385         resseq,
    386         icode,
    387         chain_id,
    388     )
    389 except Exception as err:
    390     # catch and re-raise with more information

File ~/<...>/lib/python3.12/site-packages/Bio/PDB/PDBIO.py:188, in PDBIO._get_atom_line(self, atom, hetfield, segid, atom_number, resname, resseq, icode, chain_id, charge)
    187 if atom_number > 99999:
--> 188     raise ValueError(
    189         f"Atom serial number ('{atom_number}') exceeds PDB format limit."
    190     )
    192 # Check if the element is valid, unknown (X), or blank

ValueError: Atom serial number ('100000') exceeds PDB format limit.

The above exception was the direct cause of the following exception:

PDBIOException                            Traceback (most recent call last)
Cell In[31], line 12
      9 io.set_structure(structure)
     10 with open(cif_out, "w") as fh:
     11     #try:
---> 12     io.save(fh)
     13     #except Exception as e:
     14     #    print(f"Could not save cif file as {cif_out}: {e}")

File ~/<...>/lib/python3.12/site-packages/Bio/PDB/PDBIO.py:391, in PDBIO.save(self, file, select, write_end, preserve_atom_numbering)
    379     s = get_atom_line(
    380         atom,
    381         hetfield,
   (...)
    387         chain_id,
    388     )
    389 except Exception as err:
    390     # catch and re-raise with more information
--> 391     raise PDBIOException(
    392         f"Error when writing atom {atom.full_id}"
    393     ) from err
    394 else:
    395     fhandle.write(s)

PDBIOException: Error when writing atom ('cif', 0, 'e', (' ', 609, ' '), ('CZ2', ' '))

Steps to reproduce

import Bio.PDB as PDB
parser = PDB.MMCIFParser(QUIET=True)

cif_file = Path("6kig-assembly1.cif") # downloaded from rcsb.org & unzipped
cif_out = Path("6kig.pdb")

structure = parser.get_structure("cif", cif_file)
io = PDB.PDBIO()
io.set_structure(structure)
with open(cif_out, "w") as fh:
    #try:    # I removed this in order to get the error trace
    io.save(fh)
    #except Exception as e:
    #    print(f"Could not save cif file as {cif_out}: {e}")
@peterjc
Copy link
Member

peterjc commented May 16, 2024

You're getting both exceptions with an unmodified Biopython, right?

Rather than this:

raise PDBIOException(
    f"Error when writing atom {atom.full_id}"
) from err

would you prefer:

raise PDBIOException(
    f"Error when writing atom {atom.full_id}: {err}"
) from err

or with a single exception:

raise PDBIOException(
    f"Error when writing atom {atom.full_id}: {err}"
) from None

Or there may be another way to improve this. @etal or @JoaoRodrigues ?

@CatChenal
Copy link
Author

Thanks for responding so promptly, @peterjc.

Yes, the biopython library I use is unmodified.

Additional information I forgot to mention:

  • This exception does not abort the saving operation: the pdb file is created & the number of atoms is truncated up to the pdb format limit, silently.
  • Setting auth_residues=False in PDB.MMCIFParser suppresses the io.save errors, but yields the same outcome: a truncated pdb file is created.

Since the root cause of the error is made explicit in the initial ValueError message, i.e:

Atom serial number ('100000') exceeds PDB format limit.

It should not be lost. So yes, propagating its message into a PDBIOException would be an improvement. Better yet: the message could mention that the pdb file created will be truncated to the atoms number limit of the pdb format.

The following modification would improve the io.save method`:

Add a parameter: truncate_ok (bool, default: True), to io.save to indicate that this is the effective behavior, and to abort the saving operation if it is set to False.

Thank you.

@JoaoRodrigues
Copy link
Member

Thanks for reporting @CatChenal, I'll have a look this weekend. This doesn't seem great..

@JoaoRodrigues JoaoRodrigues self-assigned this May 16, 2024
@JoaoRodrigues JoaoRodrigues linked a pull request Jun 21, 2024 that will close this issue
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants