Interesting corruption case: RuntimeError: bucket being iterated changed size and AssertionError: Bucket next pointer is damaged but only with C extension
#68 · Open · jamadden opened this issue Apr 13, 2017 · 2 comments
We have a corrupt BTree in a ZODB. I don't know exactly how we got here, but it just happened recently with the most recent releases of BTree and its C extension, so I suspect there's a lurking bug.
This happens to be an IOBTree in a catalog, so it's probably pretty large.
The first symptom is that doing list(btree.keys()) results in RuntimeError: the bucket being iterated changed size. But this is a single-threaded program, and that's an atomic call implemented in C, so changing size is not possible.
When we subsequently do btree._check() we get this happy error: AssertionError: Bucket next pointer is damaged. Those two errors seem to fit correctly together (I think).
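The two checks above can be bundled into a small helper so that the same tree can be probed under each implementation and the results compared. This is only a sketch: `diagnose` and `DamagedStub` are hypothetical names that are not part of BTrees, and the stub merely imitates the two errors quoted in this report so the snippet runs without a damaged database.

```python
def diagnose(tree):
    """Run the two checks from the report and record any error each one raises.

    `tree` is any object exposing keys() and _check(), as BTrees objects do.
    """
    errors = {"keys": None, "_check": None}
    try:
        # Force a full walk over the keys, as list(btree.keys()) does.
        list(tree.keys())
    except RuntimeError as exc:
        errors["keys"] = f"RuntimeError: {exc}"
    try:
        # Validate the tree's internal invariants.
        tree._check()
    except AssertionError as exc:
        errors["_check"] = f"AssertionError: {exc}"
    return errors


class DamagedStub:
    """Hypothetical stand-in that fails the way the C implementation did."""

    def keys(self):
        raise RuntimeError("the bucket being iterated changed size")

    def _check(self):
        raise AssertionError("Bucket next pointer is damaged")


print(diagnose(DamagedStub()))
```

With a real tree, one would pass the loaded IOBTree instead of the stub; a result with both entries set to `None` corresponds to what the Python implementation reported here.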
Here's where it gets interesting. If we do the same two operations with the Python implementation instead of the C implementation, they both pass. The iteration doesn't do that kind of checking, so the absence of a RuntimeError is not surprising. But the Python _check does look for next pointers being damaged, and doesn't raise an error on our tree:
for i in range(len(data) - 1):
    assert_(data[i].child._next is data[i + 1].child,
            "Bucket next pointer is damaged")
So we appear to have a case where Python and C are interpreting the same pickle data in different ways. (That's bad.)
It's highly probable that this BTree was written to by both the Python and C implementations at different times over the past few weeks. I think we removed the BTree C extension while we investigated zopefoundation/persistent#62 (BTree being another library with a C extension that had recently changed). It seems the obvious compatibility issues of mixing C and Python implementations may be gone, but there's still something going on there.
I can try to take a look at this (if no one wants to beat me to it 😄), but it'll probably be a while before I can get back to it in depth.
"#118 (comment)" contains the analysis of a ZODB provided by @matthchr. There, a bucket deletion updates the bucket chain correctly but "forgets" to update the data in the parent. As a consequence, the bucket chain gets damaged.
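To make that failure mode concrete, here is a toy model of a parent node whose leaf buckets are also linked into a chain. It is not BTrees code: `Bucket`, `Parent`, `check_chain`, `buggy_delete` and `correct_delete` are hypothetical names, and the model only illustrates how a deletion that relinks the chain without updating the parent trips a next-pointer check of the kind `_check` performs.

```python
class Bucket:
    """Toy leaf: holds some keys and a link to the next leaf."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.next = None


class Parent:
    """Toy interior node: an ordered list of child buckets."""

    def __init__(self, children):
        self.children = list(children)
        # Link the leaves into a chain, left to right.
        for left, right in zip(self.children, self.children[1:]):
            left.next = right


def check_chain(parent):
    """Child i must point at child i + 1, as seen from the parent."""
    kids = parent.children
    for i in range(len(kids) - 1):
        assert kids[i].next is kids[i + 1], "Bucket next pointer is damaged"


def buggy_delete(parent, index):
    """Unlink a bucket from the chain but leave the parent's list untouched."""
    victim = parent.children[index]
    if index > 0:
        parent.children[index - 1].next = victim.next
    # Missing step: the parent still lists the deleted bucket.


def correct_delete(parent, index):
    """Unlink a bucket from the chain and from the parent's list."""
    victim = parent.children[index]
    if index > 0:
        parent.children[index - 1].next = victim.next
    del parent.children[index]


tree = Parent([Bucket([1, 2]), Bucket([3, 4]), Bucket([5, 6])])
buggy_delete(tree, 1)
try:
    check_chain(tree)
except AssertionError as exc:
    print("check failed:", exc)
```

In this model the chain itself is consistent after `buggy_delete`; the check fails only because the parent's view of its children no longer agrees with it.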