Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't keep recently accessed files during housekeeping #5053

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

link2xt
Copy link
Collaborator

@link2xt link2xt commented Nov 28, 2023

I have just unpacked an old backup,
changed the setting delete_device_after
and tried to export a backup.
This did not delete anything
because all files were considered recent
by their atime and crtime.

Here is an example file stat right after unpacking a backup:

$ stat image_2021-03-06_21-40-28.jpg
  File: image_2021-03-06_21-40-28.jpg
  Size: 66715           Blocks: 136        IO Block: 4096   regular file
Device: 254,3   Inode: 5338300     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/    user)   Gid: ( 1000/    user)
Access: 2023-11-28 14:00:12.436093996 +0000
Modify: 2021-03-13 18:21:28.000000000 +0000
Change: 2023-11-28 13:57:56.641835196 +0000
 Birth: 2023-11-28 13:57:56.641835196 +0000

atime is updated simply by opening a file,
it is not a good time reference e.g.
if user just viewed old media
and then enabled deletion of files from device.

mtime is the right thing to check,
it is the time blob was written to the disk,
this commit keeps this check.

ctime is the time of metadata modification,
such as permission changes,
we are not interested in this.

crtime is not even standardized
and not implemented by some filesystems.
In my case it is the time when I unpacked the backup, while the file in the backup is actually old.

@link2xt link2xt force-pushed the link2xt/housekeeping-recently-modified branch from 58ad780 to cc00e51 Compare November 28, 2023 17:19
@link2xt link2xt force-pushed the link2xt/housekeeping-recently-modified branch from cc00e51 to f1f21af Compare November 28, 2023 17:27
@r10s
Copy link
Member

r10s commented Nov 28, 2023

i was conservative for the access times because of the different systems.

iirc, on windows, at least on older ones, when you copy a file to a different location or move it from one location to another, the modification date is typically preserved. if i skim over https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfiletime it can still be interpreted like that

so, my fear was that sth. as ...

copy(file, file_in_blobdir)
msg = dc_msg_new(..)
dc_msg_set_file(msg, file_in_blobdir);
dc_set_draft(chat, msg);

... fails if houskeeping races in between copy() and dc_set_draft(). might be that the rust file api handles that somehow and maps it to "mtime as the time blob was written to the disk"

so, all in all pretty unsure about this pr. even if the "copy to blobdir" case is probably rare, esp. on window - for the example of the backup - housekeeping would probably still delete the files, only a bit later

@link2xt link2xt marked this pull request as draft November 28, 2023 22:42
@link2xt
Copy link
Collaborator Author

link2xt commented Nov 28, 2023

Ok, I think we need a test that creates a file, rewinds as many timestamps as possible to some known to be old date, creates a new blob and runs housekeeping, then makes sure the file is still in the blobdir. This is the only reason we actually want this check.

There should never be need to move message into the blobdir manually, if library users do this they are on their own to set mtime correctly.

@r10s
Copy link
Member

r10s commented Nov 28, 2023

There should never be need to move message into the blobdir manually, if library users do this they are on their own to set mtime correctly.

it seems to be also about copying. and i agree, in theory, it should probably not happen

in practise, however, i would not bet that this never happens, draft handling is still a mess here and there, the code out there is grown, and not always rethought from scratch on changes - therefore, i think, we should be careful and graceful here

nb: we would need to also double check internal races: if core copies file before adding a database record, there may also be races on windows

surely, one can cleanup and check all codepaths, but that costs time that will be missing elsewhere. so, being too strict here is maybe a wrong tradeoff currently

however, it core is fine, this PR would affect only desktop/windows, not the most tested or known platform, however, maybe @Simon-Laux knows more - if desktop does not copy/move things to the blobdir, this pr should be fine

@Simon-Laux
Copy link
Member

Simon-Laux commented Nov 29, 2023

copy(file, file_in_blobdir)

we don't do that manually in the UIs, do we? at least not on desktop. (we pass file path to core, if it is in clipboard we write it to a temp folder first, BTW: I would really like a core api to pass the blob directly to core without creating temp files)

@r10s
Copy link
Member

r10s commented Nov 29, 2023

we don't do that manually in the UIs, do we?

well, this is the question here :)

i checked the code bases roughly:

  • iOS: OK -- iOS seems not to call getBlobdir() at all - and also is linux-based
  • android: OK -- despite android using getBlobdir() in several code parts for performance reasons to avoid copying files around 1, android is linux-based and the mtime change of this PR should be fine
  • ubuntu touch: OK -- ubuntu touch seems not to call getBlobdir() at all - and also is linux-based
  • desktop: OK for mac/linux -- desktop's only usage for getBlobdir() seems to be to calculate the sticker folder, apart from that mtime change does not affect linux-bases systems
  • desktop: ?? for windows -- even if UI calls only send_msg() with file not in blobdir, reading the code, core seems to copy files before creating a database record, so there is a potential race, where housekeeping could happen between file-copy and database-modify (doing things the other way round may also be ugly as UI won't see the file for a record for a moment)

so, again, all in all, unsure if we want to open that can of worms currently :) main issue seems to be in core, however, maybe we've also overseen other things

compared to the bugs potential or the PR in its current form, the too large backup is a bit annoying, but probably the better tradeoff 2

alternative suggestion: assuming that ctime maps to windows lpCreationTime - why not check that together with mtime? so only remove the atime with this pr?

Footnotes

  1. writing files directly to blobdir is okay from current documentation and core also takes care of it; surely, that may be done differently, however, that would be quite some work and may also make things slower on android, the current discussion here is to not break existing UI code

  2. the too large backup is also a bit of a cornercase, importing a backup, deleting everything, exporting backup - if time would have been passed, things would have been deleted

@link2xt
Copy link
Collaborator Author

link2xt commented Nov 30, 2023

alternative suggestion: assuming that ctime maps to windows lpCreationTime - why not check that together with mtime? so only remove the atime with this pr?

You probably mean crtime? ctime is the metadata modification time, not file creation time.

Both would not work in my case, I have included stat output in the first post. All times except mtime got modified, creation time is modified as well when unpacking a backup. mtime is the time when the file contents was modified, while ctime and crtime point to the time when the file copy was created.

@link2xt
Copy link
Collaborator Author

link2xt commented Nov 30, 2023

compared to the bugs potential or the PR in its current form, the too large backup is a bit annoying, but probably the better tradeoff

Well, it simply does not work in my case, I have > 2Gb account on desktop and it does not fit on my phone. I can clone it on desktop and shrink to several hundred Mbs by removing all files older than 5 weeks, but if I try to transfer it to the phone without properly removing the blobs the transfer process runs out of space.

Or I have to wait 1 hour after changing the setting.

@r10s
Copy link
Member

r10s commented Nov 30, 2023

compared to the bugs potential or the PR in its current form, the too large backup is a bit annoying, but probably the better tradeoff

Well, it simply does not work in my case, I have > 2Gb account on desktop and it does not fit on my phone.

sure, i am not saying that you don't have issues :)

it's about weighing things up :)

You probably mean crtime? ctime is the metadata modification time, not file creation time.

Both would not work in my case, I have included stat output in the first post. All times except mtime got modified, creation time is modified as well when unpacking a backup. mtime is the time when the file contents was modified, while ctime and crtime point to the time when the file copy was created.

i see. do we know that we run under windows? maybe we can take crtime into account only on windows.

maybe we should also really try out things on windows.

we could also consider to use a smaller timespan, however, we should probably still be within at least tens of minutes, video recording may take some time on older devices (or are the files referenced once prepared?)

but still, if we do not find a good solution here, it think it is better to leave things as is, it seems to be the better tradeoff, even if some backups will make problems on devices with low memory. but that happens anyways at some point, if you run with few gb, you're always on the edge where backups will stop working

@iequidoo
Copy link
Collaborator

iequidoo commented Dec 1, 2023

  • desktop: ?? for windows -- even if UI calls only send_msg() with file not in blobdir, reading the code, core seems to copy files before creating a database record, so there is a potential race, where housekeeping could happen between file-copy and database-modify

But if this copying is done only in the core, maybe just create a new file (so that it has fresh mtime) before copying file contents? Looks like this is the only problem preventing merging this PR.

I have just unpacked an old backup,
changed the setting delete_device_after
and tried to export a backup.
This did not delete anything
because all files were considered recent
by their atime and crtime.

Here is an example file stat right after unpacking a backup:

$ stat image_2021-03-06_21-40-28.jpg
  File: image_2021-03-06_21-40-28.jpg
  Size: 66715           Blocks: 136        IO Block: 4096   regular file
Device: 254,3   Inode: 5338300     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/    user)   Gid: ( 1000/    user)
Access: 2023-11-28 14:00:12.436093996 +0000
Modify: 2021-03-13 18:21:28.000000000 +0000
Change: 2023-11-28 13:57:56.641835196 +0000
 Birth: 2023-11-28 13:57:56.641835196 +0000

atime is updated simply by opening a file,
it is not a good time reference e.g.
if user just viewed old media
and then enabled deletion of files from device.

mtime is the right thing to check,
it is the time blob was written to the disk,
this commit keeps this check.

ctime is the time of metadata modification,
such as permission changes,
we are not interested in this.

crtime is not even standardized
and not implemented by some filesystems.
In my case it is the time when I unpacked the backup,
while the file in the backup is actually old.
@link2xt link2xt force-pushed the link2xt/housekeeping-recently-modified branch from f1f21af to cfec3d5 Compare December 1, 2023 03:18
@link2xt link2xt force-pushed the main branch 2 times, most recently from 1abb12e to 2af9ff1 Compare March 4, 2024 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants