New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hash check if external process changed the content of the file #10662
base: master
Are you sure you want to change the base?
Hash check if external process changed the content of the file #10662
Conversation
I chose AHasher because it seems to be pretty fast but I can switch it with |
helix-view/src/document.rs
Outdated
@@ -888,7 +890,21 @@ impl Document { | |||
if let Ok(metadata) = fs::metadata(&path).await { | |||
if let Ok(mtime) = metadata.modified() { | |||
if last_saved_time < mtime { | |||
bail!("file modified by an external process, use :w! to overwrite"); | |||
let external_content = fs::read(&path).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid reading the entire file into memory at once
Ahash has too many collisions it's a good hashmap hash but not suited for something like this where we want to avoid collisios. I want to use a new hashfunction @cessen is working on in the future but for now blake3 is a good option |
Wow, thanks for jumping on this so quick. Blake3 update_mmap_rayon computes the hash without reading the file to memory |
Replaced |
helix-view/Cargo.toml
Outdated
@@ -49,7 +49,7 @@ toml = "0.8" | |||
log = "~0.4" | |||
|
|||
parking_lot = "0.12.1" | |||
|
|||
blake3 = { version = "1.5.1", features = ["mmap", "rayon"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pulls in 4 new dependencies...
helix-view/src/document.rs
Outdated
let mut editor_hasher = blake3::Hasher::default(); | ||
|
||
external_hasher.update_mmap_rayon(&path)?; | ||
text.chunks().for_each(|chunk| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense. If you just want to check if the file on disk is the same on as the one in memory you can just do a comparison but that isnt what we should do here.
Instead you need to compute the hash when the file is loaded from disk so you can check whether the file in memory right now is the same as the one we read from disk. Like I said everywhere we read the timestamp we also need to compute the hash (if the timestamp changed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good reason not to just directly compare text contents here, avoiding the hash altogether? Because I can't think of one.
Either way we have to load the file contents from disk (either to compute the hash or to compare directly). And computing the hash is likely(?) more expensive than just a direct equality check of the actual contents.
A hash is certainly useful to have around, but I think that's more for things like persistent undo where you don't want to store a whole copy of the file with the on-disk undo stack just to verify the stack's validity. So it's more of a space-saving measure, there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore the current implementation here that does not reflect what I had in mind. I left a review about that.
The hash is supposed to be a backup to mtime to avoid false positives. The hash will be computed when the file is read and the compared when the file is written. So we can't compare text directly because the text will have been edited since the file was read. You could keep a rope of the old file around but that would probably increase the memory consumption too much if you edit a lot of/very large files.
So it's a memory saving measure so we don't have to keep a copy of the filet hat was read form disk around. Vim does that too afaik
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we can't compare text directly because the text will have been edited since the file was read.
Ah! Right. Brain fart on that, ha ha. Yeah, you're absolutely right that a hash is a good approach for this.
You can see my function for persistent undo: helix/helix-core/src/history.rs Line 90 in 8f82b0a
Edit: Nvm, I see the review is about something else |
75b87d8
to
fa7bebc
Compare
@tgross35 could you please try the current state of the branch? This is how I tried it out: mkdir /tmp/yeet
cd /tmp/yeet
git init
git switch --create 1
echo "yeetus deletus" > test.txt
git add test.txt
git commit -m "hello there"
gswc 2
cd <helix-repo>
cargo run -- /tmp/yeet/test.txt
# <-- write some random stuff in helix -->
git switch 1
touch test.txt
# now try `:w` in helix and it should pass |
Closes #10657