ValueError: could not convert string to int, better type checks for OOBTree keys under Python 3? #272

Open
zopyx opened this issue Jun 18, 2019 · 14 comments

Comments

@zopyx
Member

zopyx commented Jun 18, 2019

Plone 5.2RC3, ZODB 5.5.1, Python 3.7

During a Plone migration run through plone.restapi I received this ZODB error at a seemingly random point; I am not sure how to reproduce it:

2019-06-18 00:02:33,247 WARNING [waitress:354][waitress] application-written content was ignored due to HTTP response that may not contain a message-body: (204 No Content)
2019-06-18 00:02:35,161 ERROR   [ZODB.Connection:809][waitress] Couldn't load state for BTrees.OOBTree.OOBTree 0x52b7
Traceback (most recent call last):
  File "/home/ajung/sandboxes/ugent-longtime/eggs/ZODB-5.5.1-py3.7.egg/ZODB/Connection.py", line 795, in setstate
    self._reader.setGhostState(obj, p)
  File "/home/ajung/sandboxes/ugent-longtime/eggs/ZODB-5.5.1-py3.7.egg/ZODB/serialize.py", line 633, in setGhostState
    state = self.getState(pickle)
  File "/home/ajung/sandboxes/ugent-longtime/eggs/ZODB-5.5.1-py3.7.egg/ZODB/serialize.py", line 626, in getState
    return unpickler.load()
ValueError: could not convert string to int
2019-06-18 00:02:35,281 ERROR   [Zope.SiteErrorLog:251][waitress] 1560808955.19451550.22340673761002505 http://localhost:30081/plone_portal/lw/geschiedenis/en/POST_application_json_
Traceback (innermost last):
  Module ZPublisher.WSGIPublisher, line 142, in transaction_pubevents
  Module ZPublisher.WSGIPublisher, line 295, in publish_module
  Module ZPublisher.WSGIPublisher, line 229, in publish
  Module ZPublisher.mapply, line 85, in mapply
  Module ZPublisher.WSGIPublisher, line 57, in call_object
  Module plone.rest.service, line 23, in __call__
  Module plone.restapi.services, line 19, in render
  Module plone.restapi.services.content.add, line 93, in reply
  Module plone.restapi.serializer.dxcontent, line 138, in __call__
  Module Products.CMFPlone.CatalogTool, line 429, in searchResults
  Module Products.CMFCore.indexing, line 95, in processQueue
  Module Products.CMFCore.indexing, line 222, in process
  Module Products.CMFCore.indexing, line 50, in reindex
  Module Products.CMFCore.CatalogTool, line 367, in _reindexObject
  Module Products.CMFPlone.CatalogTool, line 351, in catalog_object
  Module Products.ZCatalog.ZCatalog, line 498, in catalog_object
  Module Products.ZCatalog.Catalog, line 369, in catalogObject
  Module Products.PluginIndexes.unindex, line 240, in index_object
  Module Products.PluginIndexes.unindex, line 276, in _index_object
  Module Products.PluginIndexes.unindex, line 216, in insertForwardIndexEntry
  Module ZODB.Connection, line 795, in setstate
  Module ZODB.serialize, line 633, in setGhostState
  Module ZODB.serialize, line 626, in getState
ValueError: could not convert string to int
@d-maurer
Contributor

d-maurer commented Jun 18, 2019 via email

@zopyx
Member Author

zopyx commented Jun 18, 2019

> The index likely contains both "int" and "string" values. Under Python 3, BTrees can no longer have keys of different types: BTrees are based on an order and Python 3 (unlike Python 2) does not order objects of different types.

That does not explain how this corruption of self._index can occur.
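
The ordering rule described above can be reproduced without BTrees. The sketch below uses a sorted Python list with `bisect.insort` as a stand-in for an ordered key structure. That is an assumption for illustration only, not how OOBTree is implemented, and `insert_key` is a made-up helper. It shows only the ordering rule, not the unpickling error from the traceback.

```python
import bisect


def insert_key(sorted_keys, key):
    """Insert key into an ordered list; return False if it cannot be ordered."""
    try:
        bisect.insort(sorted_keys, key)
        return True
    except TypeError:
        # Python 3 refuses to order values of unrelated types (e.g. str vs int).
        return False


keys = []
results = [insert_key(keys, k) for k in (3, 1, 2, "department")]
print(results)  # [True, True, True, False]
print(keys)     # [1, 2, 3]
```

Note that the first key inserted into an empty structure never gets compared, so a mismatch only shows up once a second type arrives.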

@d-maurer
Contributor

d-maurer commented Jun 18, 2019 via email

@d-maurer
Contributor

d-maurer commented Jun 18, 2019 via email

@zopyx
Member Author

zopyx commented Jun 18, 2019

(Pdb) pickletools.dis(pickle)
    0: \x80 PROTO      3
    2: c    GLOBAL     'BTrees.OOBTree OOBTree'
   26: q    BINPUT     0
   28: .    STOP
highest protocol among opcodes = 2
(Pdb) for x in pickletools.genops(pickle): print(x)
(<pickletools.OpcodeInfo object at 0x7f1b86dad048>, 3, 0)
(<pickletools.OpcodeInfo object at 0x7f1b86da7d08>, 'BTrees.OOBTree OOBTree', 2)
(<pickletools.OpcodeInfo object at 0x7f1b86da7ac8>, 0, 26)
(<pickletools.OpcodeInfo object at 0x7f1b86dad0a8>, None, 28)

@zopyx
Member Author

zopyx commented Jun 18, 2019

The analysis is like this:

The issue is clearly an index corruption that happened within a Plone migration after a series of plone.restapi calls.

The affected field index is department, which is defined in two content types as a Choice field connected to a vocabulary. However, the affected pickle of the underlying index structure indicates that a content object was indexed which has a parent folder with the ID department.
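
To make the mechanism concrete, here is a plain-Python imitation of attribute lookup that falls back to the containment chain. It does not use the real Acquisition package; the `Item` class and the sample objects are hypothetical. It only shows how an object without its own `department` field can end up yielding a folder with that ID instead of a vocabulary value.

```python
class Item:
    """Hypothetical content object with a simplified acquisition-like lookup."""

    def __init__(self, id, parent=None, **fields):
        self.id = id
        self.parent = parent
        self.contents = {}
        for name, value in fields.items():
            setattr(self, name, value)
        if parent is not None:
            parent.contents[id] = self

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in ("parent", "contents"):
            raise AttributeError(name)
        container = self.parent
        while container is not None:
            if name in container.contents:
                return container.contents[name]
            if name in vars(container):
                return vars(container)[name]
            container = container.parent
        raise AttributeError(name)


site = Item("site")
folder = Item("department", parent=site)
with_field = Item("a", parent=folder, department="physics")
without_field = Item("b", parent=folder)

print(type(with_field.department).__name__)     # str
print(type(without_field.department).__name__)  # Item
```

An indexer that simply reads `obj.department` would therefore see a string for the first object and a folder object for the second.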

So the problem is clearly caused at the application level through acquisition.
However, this raises the question of how the ZODB, and OOBTrees under Python 3 in particular, could be improved to avoid this kind of corruption. The OOBTree implementation would have to perform type checks...
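
A rough idea of what such a check could look like, as a sketch only: the `TypeCheckedMapping` class below is hypothetical, wraps a plain dict rather than an OOBTree, and simply pins the key type to whatever the first key was. It is not an existing BTrees feature.

```python
class TypeCheckedMapping:
    """Hypothetical wrapper that rejects keys of a different type than the first key."""

    def __init__(self):
        self._data = {}        # a dict stands in for the OOBTree here
        self._key_type = None  # fixed by the first inserted key

    def __setitem__(self, key, value):
        if self._key_type is None:
            self._key_type = type(key)
        elif type(key) is not self._key_type:
            raise TypeError(
                "key %r has type %s, expected %s"
                % (key, type(key).__name__, self._key_type.__name__)
            )
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._data)


index = TypeCheckedMapping()
index["physics"] = {1, 2}
try:
    index[42] = {3}
    rejected = False
except TypeError:
    rejected = True
print(rejected, len(index))  # True 1
```

With a check like this the offending write fails immediately at indexing time, instead of leaving a structure that breaks later.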

@zopyx zopyx closed this as completed Jun 18, 2019
@zopyx zopyx reopened this Jun 18, 2019
@zopyx zopyx changed the title from "ValueError: could not convert string to int" to "ValueError: could not convert string to int, better type checks for OOBTree keys under Python 3?" Jun 18, 2019
@d-maurer
Contributor

d-maurer commented Jun 18, 2019 via email

@zopyx
Member Author

zopyx commented Jun 18, 2019

> I have told you that the returned pickle consists of 2 adjacent
> pickles (first class, second state).
> Above you show the not very interesting first pickle.
> You must look at the second one.
>
> Apparently, the first pickle is 29 bytes long. This would
> mean that the second pickle starts at byte 30.
>
> Alternatively, you could wrap your pickle into a file like
> object (likely io.BytesIO) and call "dis" 2 times on it.

No further analysis needed since the reason for the corruption appears to be clear now.
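
For anyone landing here later, the suggestion above of wrapping the data in io.BytesIO and calling "dis" twice can be sketched like this. The helper name `disassemble_adjacent` and the sample payloads are invented for illustration; a real record would come from the storage. A shared `memo` dict is passed on the assumption that both pickles may have been written by the same pickler.

```python
import io
import pickle
import pickletools


def disassemble_adjacent(data, count=2):
    """Disassemble `count` pickles stored back to back in `data`.

    Returns a list of (start_offset, listing) pairs.
    """
    stream = io.BytesIO(data)
    memo = {}  # shared, because one pickler may have written both pickles
    listings = []
    for _ in range(count):
        start = stream.tell()
        out = io.StringIO()
        # dis() reads from the current position and stops after STOP,
        # leaving the stream positioned at the start of the next pickle.
        pickletools.dis(stream, out=out, memo=memo)
        listings.append((start, out.getvalue()))
    return listings


# Stand-in record: a "class" pickle followed by a "state" pickle.
buffer = io.BytesIO()
pickler = pickle.Pickler(buffer, protocol=3)
pickler.dump("stand-in for the class reference")
pickler.dump({"department": ["physics", "chemistry"]})
record = buffer.getvalue()

listings = disassemble_adjacent(record)
for start, listing in listings:
    print("pickle starting at offset %d:" % start)
    print(listing)
```

The offsets printed by "dis" are zero-based, so a first pickle whose STOP sits at offset 28 is 29 bytes long and the second pickle begins at offset 29, i.e. at the 30th byte.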

@d-maurer
Contributor

d-maurer commented Jun 18, 2019 via email

@zopyx
Member Author

zopyx commented Jun 18, 2019

There is no Python 2 in the game here. The ZODB was created from scratch under Python 3.7/Plone 5.2RC3

@d-maurer
Contributor

d-maurer commented Jun 18, 2019 via email

@d-maurer
Contributor

d-maurer commented Jun 18, 2019 via email

@NicolasGoeddel

I stumbled on a similar issue here.
I have a field index myIndex and depending on the portal_type I sometimes add a list of strings to it and sometimes just a string. This seems not to work anymore. This happened while migrating a project from Plone 4 to Plone 5.2.2 with Python 3.8.
According to your responses so far it seems that I can no longer mix these types within one index, am I right? I now have to use different indexes for each portal_type. But this is a bit of an issue because the index directly accesses a field from the schema. So I have to rename the field in the schema and in every other place in my code?
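
One option that avoids mixing types is to normalize every value to the same type before it reaches the index. This is only a sketch and was not proposed in this thread; `normalize_index_value` is a hypothetical helper, and whether tuple keys suit a given index and its queries is a separate question.

```python
def normalize_index_value(value):
    """Return the value as a tuple of strings, whatever shape it came in."""
    if value is None:
        return ()
    if isinstance(value, str):
        return (value,)
    return tuple(value)


raw_values = ["b", ["a", "c"], None, ("a",)]
normalized = [normalize_index_value(v) for v in raw_values]
# All keys now share one type, so they can be ordered against each other.
print(sorted(normalized))
```

Sorting `raw_values` directly would raise a TypeError under Python 3, since a str cannot be ordered against a list.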

@d-maurer
Contributor

d-maurer commented Nov 27, 2020 via email

3 participants