bug(python): offset overflow when issuing table update #1291
Comments
It's unclear where this is coming from. I tried to reproduce this in Lance, but it worked fine. Are there any important details I might be missing? Here's my script:

import lance
import pyarrow as pa
import random
import string
import tqdm

def rand_string(n):
    return ''.join(random.choices(string.ascii_lowercase + string.digits, k=n))

# Create a batch with 100MB of string data
data = pa.table({
    "text": pa.array([rand_string(100 * 1024) for _ in range(1024)]),
})

# Write over 5GB of data
for _ in tqdm.tqdm(range(500)):
    ds = lance.write_dataset(data, "test", mode="append")

# Try running an update query
ds.update(updates={"text": "'hello'"}, where="text = '{}'".format(data['text'][0].as_py()))
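For scale, the sizes in the script above can be worked out with plain arithmetic. The sketch below makes no Lance calls. It also compares one batch against the 32-bit offset limit of Arrow's `string` type. Whether that limit is what triggers the reported error is an assumption, not something confirmed in this thread.

```python
# Sizes implied by the reproduction script (pure arithmetic, no Lance calls).
ROW_BYTES = 100 * 1024      # each random string is 100 KiB of ASCII
ROWS_PER_BATCH = 1024
APPENDS = 500

batch_bytes = ROW_BYTES * ROWS_PER_BATCH   # 104_857_600 bytes = 100 MiB
total_bytes = batch_bytes * APPENDS        # 52_428_800_000 bytes, about 48.8 GiB

# Arrow's `string` type uses signed 32-bit offsets, so a single array can hold
# at most 2**31 - 1 bytes of string data; `large_string` uses 64-bit offsets.
INT32_OFFSET_LIMIT = 2**31 - 1

# Whole batches of this size that fit in one 32-bit-offset string array.
batches_per_array = INT32_OFFSET_LIMIT // batch_bytes

print(batch_bytes, total_bytes, batches_per_array)
```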
Hm, my table is pretty wide (14 columns). Would that potentially come into play when walking through the pages during the update?
It's possible it has something to do with that. Could you share what operations you did to write your table? That would help me figure out how to reproduce this.
This table has grown a lot over time (and hence has a lot of versions, although I've periodically cleaned those up using
LanceDB version
0.6.8
What happened?
I seem to be running into an offset overflow when issuing an update spanning my entire table:
Any guidance on how to potentially work around this/apply my updates in smaller batches? Happy to provide additional info.
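One possible way to split a whole-table update into smaller pieces is to issue it once per key range. This is a minimal sketch, assuming the table has a monotonically increasing integer column, here called `id` (hypothetical, not mentioned in this issue). The helper names are also illustrative. Whether smaller batches actually avoid the overflow has not been verified here.

```python
from typing import Callable, Iterator


def id_range_clauses(start: int, stop: int, batch_size: int) -> Iterator[str]:
    """Yield SQL predicates that together cover [start, stop) on the
    hypothetical integer column `id`, one predicate per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for lo in range(start, stop, batch_size):
        hi = min(lo + batch_size, stop)
        yield f"id >= {lo} AND id < {hi}"


def update_in_batches(
    update_fn: Callable[[str], None], start: int, stop: int, batch_size: int
) -> int:
    """Call update_fn once per predicate and return the number of batches."""
    count = 0
    for clause in id_range_clauses(start, stop, batch_size):
        update_fn(clause)
        count += 1
    return count
```

With LanceDB, `update_fn` could be something like `lambda clause: table.update(where=clause, values={"text": "hello"})`, where `table` is an open table and the column names are placeholders. Each call then rewrites only the rows matching one range instead of the entire table.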
Are there known steps to reproduce?
No response