Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash check if external process changed the content of the file #10662

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

TornaxO7
Copy link
Contributor

@TornaxO7 TornaxO7 commented May 2, 2024

Closes #10657

@TornaxO7
Copy link
Contributor Author

TornaxO7 commented May 2, 2024

I chose AHasher because it seems to be pretty fast but I can switch it with blake3 if you want.

@@ -888,7 +890,21 @@ impl Document {
if let Ok(metadata) = fs::metadata(&path).await {
if let Ok(mtime) = metadata.modified() {
if last_saved_time < mtime {
bail!("file modified by an external process, use :w! to overwrite");
let external_content = fs::read(&path).await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid reading the entire file into memory at once

@pascalkuthe
Copy link
Member

pascalkuthe commented May 2, 2024

Ahash has too many collisions it's a good hashmap hash but not suited for something like this where we want to avoid collisios. I want to use a new hashfunction @cessen is working on in the future but for now blake3 is a good option

@tgross35
Copy link

tgross35 commented May 2, 2024

Wow, thanks for jumping on this so quick.

Blake3 update_mmap_rayon computes the hash without reading the file to memory

@TornaxO7
Copy link
Contributor Author

TornaxO7 commented May 2, 2024

Replaced ahash with blake3 :)

@@ -49,7 +49,7 @@ toml = "0.8"
log = "~0.4"

parking_lot = "0.12.1"

blake3 = { version = "1.5.1", features = ["mmap", "rayon"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pulls in 4 new dependencies...

let mut editor_hasher = blake3::Hasher::default();

external_hasher.update_mmap_rayon(&path)?;
text.chunks().for_each(|chunk| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense. If you just want to check if the file on disk is the same on as the one in memory you can just do a comparison but that isnt what we should do here.

Instead you need to compute the hash when the file is loaded from disk so you can check whether the file in memory right now is the same as the one we read from disk. Like I said everywhere we read the timestamp we also need to compute the hash (if the timestamp changed)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason not to just directly compare text contents here, avoiding the hash altogether? Because I can't think of one.

Either way we have to load the file contents from disk (either to compute the hash or to compare directly). And computing the hash is likely(?) more expensive than just a direct equality check of the actual contents.

A hash is certainly useful to have around, but I think that's more for things like persistent undo where you don't want to store a whole copy of the file with the on-disk undo stack just to verify the stack's validity. So it's more of a space-saving measure, there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore the current implementation here that does not reflect what I had in mind. I left a review about that.

The hash is supposed to be a backup to mtime to avoid false positives. The hash will be computed when the file is read and the compared when the file is written. So we can't compare text directly because the text will have been edited since the file was read. You could keep a rope of the old file around but that would probably increase the memory consumption too much if you edit a lot of/very large files.

So it's a memory saving measure so we don't have to keep a copy of the filet hat was read form disk around. Vim does that too afaik

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can't compare text directly because the text will have been edited since the file was read.

Ah! Right. Brain fart on that, ha ha. Yeah, you're absolutely right that a hash is a good approach for this.

@kirawi
Copy link
Member

kirawi commented May 2, 2024

You can see my function for persistent undo:

fn get_hash<R: Read>(reader: &mut R) -> std::io::Result<[u8; HASH_DIGEST_LENGTH]> {

Edit: Nvm, I see the review is about something else

@TornaxO7 TornaxO7 force-pushed the improve-file-modified-detection branch from 75b87d8 to fa7bebc Compare May 11, 2024 12:46
@TornaxO7 TornaxO7 marked this pull request as draft May 11, 2024 12:46
@TornaxO7
Copy link
Contributor Author

TornaxO7 commented May 11, 2024

@tgross35 could you please try the current state of the branch?

This is how I tried it out:

mkdir /tmp/yeet
cd /tmp/yeet
git init

git switch --create 1
echo "yeetus deletus" > test.txt
git add test.txt
git commit -m "hello there"

gswc 2

cd <helix-repo>
cargo run -- /tmp/yeet/test.txt

# <-- write some random stuff in helix -->

git switch 1
touch test.txt

# now try `:w` in helix and it should pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"file modified by an external process" is noisy
7 participants