RCORE-2141 RCORE-2142 Clean up a bunch of old encryption cruft #7698

tgoyne · 2024-05-15T23:03:36Z

The global shared cache of encrypted file maps was originally required because we actually opened Realm files multiple times in normal usage, so the open files had to know about each other to copy things around. #4839 made it so that in normal usage we only ever have one DB instance per file per process, so it became dead code. Multiprocess encryption made it unnecessary even when the one-DB-per-process rule is violated, as the multiprocess code path covers that.

This eliminates our last reliance on file UniqueIDs, so it lets us get rid of hacks related to that.

The encryption page reclaimer mostly never actually worked. It used a very conservative page reclamation rule that meant that pages would never be reclaimed if there was a long-lived Transaction, even if it was frozen or kept refreshed. This is very common in practice, and when it doesn't happen the DB usually isn't kept open either, making it redundant.

Encryption used to rely on handling BAD_EXEC signals (or mach exceptions) rather than explicit barriers, so it had to read and write in page-sized chunks. That's no longer the case, so we can eliminate a lot of complexity by always reading and writing in 4k blocks.

Our use of off_t meant that on Windows we didn't support >2GB files because off_t is 32-bit even on x64 Windows. The encryption layer now theoretically supports files up to 8 TB on 32-bit (which isn't relevant because SlabAlloc doesn't).
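As a rough illustration of the last two points (fixed 4k blocks and offsets that don't depend on the width of off_t), here is a minimal sketch. The names `block_size`, `BlockPos`, `locate` and `block_start` are hypothetical and are not taken from the Realm source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only; not Realm's actual types.
constexpr std::uint64_t block_size = 4096; // the fixed 4k block size

struct BlockPos {
    std::uint64_t index;  // which 4k block the byte falls in
    std::uint64_t offset; // byte offset within that block
};

// Takes a fixed-width 64-bit offset, so the result does not depend on the
// platform's off_t (a 32-bit type on Windows, even on x64).
inline BlockPos locate(std::uint64_t file_offset)
{
    return {file_offset / block_size, file_offset % block_size};
}

// File offset at which a given block begins.
inline std::uint64_t block_start(std::uint64_t index)
{
    assert(index <= UINT64_MAX / block_size); // guard against overflow
    return index * block_size;
}
```

With a signed 32-bit offset the largest addressable byte is just under 2 GiB, whereas `locate(3ULL << 30)` (a 3 GiB offset) is representable here without any special handling.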

This makes it so that the multiprocess encryption codepaths can be tested in a single process, and in fact UNITTEST_ENCRYPT_ALL=1 will incidentally test it in a bunch of places. This revealed a preexisting bug:

1. Process 1 reads page X
2. Process 2 writes to one byte range in page X
3. Process 1 refreshes the reader mapping and marks the page as StaleIV
4. Process 1 writes to a different byte range in page X
5. This byte range is copied to the read mapping and the page is marked as UpToDate
6. Process 1 reads from the byte range written by process 2 and gets garbage data

When copying data to a StaleIV page we need to copy the entire page rather than just the modified bytes. We can't just mark the page as Clean because while it's fine for the reader mapping to see the data on disk rather than the newly written data while the write is still in progress, we wouldn't know when to actually reread the page.
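The fix can be sketched with a simplified, hypothetical model of one page in a reader mapping. Only the state names (Clean, StaleIV, UpToDate) come from the description above; the types, the `copy_to_reader` function, and the assumption that the writer's copy of the page is fully up to date are illustrative and not taken from the Realm source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified model; not Realm's actual data structures.
enum class PageState { Clean, StaleIV, UpToDate };

constexpr std::size_t page_size = 4096;
using Page = std::array<char, page_size>;

struct ReaderPage {
    PageState state = PageState::Clean;
    Page data{};
};

// Propagate a write of [offset, offset + len) from the writer's copy of a
// page into a reader's copy. Assumes `writer` holds the whole current page.
void copy_to_reader(ReaderPage& reader, const Page& writer, std::size_t offset, std::size_t len)
{
    assert(offset <= page_size && len <= page_size - offset);
    switch (reader.state) {
        case PageState::Clean:
            // Assumption: a Clean page has not been loaded yet and will be
            // read from disk on first access, so there is nothing to copy.
            break;
        case PageState::StaleIV:
            // Bytes outside the written range may be out of date too, so
            // copy the entire page before declaring it up to date.
            std::memcpy(reader.data.data(), writer.data(), page_size);
            reader.state = PageState::UpToDate;
            break;
        case PageState::UpToDate:
            // Only the modified bytes differ from what the reader has.
            std::memcpy(reader.data.data() + offset, writer.data() + offset, len);
            break;
    }
}
```

In the buggy behaviour, the StaleIV case would have copied only `len` bytes and still marked the page UpToDate, leaving the other process's bytes stale in the reader's copy.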

We no longer use file seeking anywhere and use explicit position offsets. File seeking is spooky when multiple threads are involved and it involved a lot of extra syscalls.
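For readers unfamiliar with the distinction, here is a POSIX-only sketch of I/O at explicit offsets using `pread()` and `pwrite()`. The helper names `read_at` and `write_at` are hypothetical and are not the PR's actual File API; Windows would need a different API.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <system_error>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helpers for illustration; not Realm's File interface.

// Reads up to `size` bytes starting at `pos`. Returns the number of bytes
// read, which is less than `size` only if end of file was reached.
std::size_t read_at(int fd, std::uint64_t pos, char* dst, std::size_t size)
{
    std::size_t total = 0;
    while (total < size) {
        ssize_t n = ::pread(fd, dst + total, size - total, static_cast<off_t>(pos + total));
        if (n < 0) {
            if (errno == EINTR)
                continue; // interrupted by a signal; retry
            throw std::system_error(errno, std::generic_category(), "pread");
        }
        if (n == 0)
            break; // end of file
        total += static_cast<std::size_t>(n);
    }
    return total;
}

// Writes exactly `size` bytes starting at `pos`.
void write_at(int fd, std::uint64_t pos, const char* src, std::size_t size)
{
    std::size_t total = 0;
    while (total < size) {
        ssize_t n = ::pwrite(fd, src + total, size - total, static_cast<off_t>(pos + total));
        if (n < 0) {
            if (errno == EINTR)
                continue; // interrupted by a signal; retry
            throw std::system_error(errno, std::generic_category(), "pwrite");
        }
        if (n == 0)
            throw std::system_error(EIO, std::generic_category(), "pwrite wrote nothing");
        total += static_cast<std::size_t>(n);
    }
}
```

Neither call reads or modifies the file descriptor's shared position, so two threads can use the same descriptor without coordinating, and each operation is one syscall rather than a seek followed by a read or write.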

IV refreshing now involves fewer read() calls. I don't think this is actually a meaningful perf gain since they would all have been warm cache hits anyway. Might be faster, though.

The global mapping_mutex is gone and encryption operations on two different DBs can now happen concurrently.
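The general idea of replacing one global lock with per-file locking can be sketched as follows. `EncryptedFile` and its members are hypothetical stand-ins and do not reflect how the PR actually structures its locking.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the encryption state of one file.
struct EncryptedFile {
    std::mutex mutex;         // guards only this file's state
    std::vector<char> buffer; // placeholder for the file's mapped data

    void write(std::size_t pos, char value)
    {
        // Locks this file only, so work on other files is not blocked.
        std::lock_guard<std::mutex> lock(mutex);
        if (buffer.size() <= pos)
            buffer.resize(pos + 1);
        buffer[pos] = value;
    }
};
```

With a single global mutex, a thread holding the lock for file A would stall every operation on file B. Here, holding `a.mutex` has no effect on `b.write(...)`.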

The error messages when decryption fails now include a little more information.

Fixes #7743. Fixes #7744.

finnschiermer · 2024-05-16T10:09:56Z

I applaud the wisdom found here :-)

coveralls-official · 2024-05-21T01:48:49Z

Pull Request Test Coverage Report for Build thomas.goyne_396

Details

1048 of 1098 (95.45%) changed or added relevant lines in 27 files are covered.
132 unchanged lines in 20 files lost coverage.
Overall coverage increased (+0.1%) to 90.956%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/realm/alloc_slab.cpp	20	21	95.24%
src/realm/db.cpp	17	18	94.44%
test/test_json.cpp	0	1	0.0%
test/test_encrypted_file_mapping.cpp	347	349	99.43%
test/test_file.cpp	99	101	98.02%
src/realm/util/file_mapper.cpp	35	38	92.11%
test/util/test_path.hpp	0	3	0.0%
src/realm/util/encrypted_file_mapping.hpp	5	11	45.45%
src/realm/util/file.cpp	98	109	89.91%
src/realm/util/encrypted_file_mapping.cpp	359	379	94.72%

Files with Coverage Reduction	New Missed Lines	%
src/realm/sync/instructions.hpp	1	76.03%
src/realm/util/encrypted_file_mapping.hpp	1	41.27%
src/realm/util/serializer.cpp	1	90.43%
src/realm/uuid.cpp	1	98.48%
test/test_all.cpp	1	76.47%
test/test_dictionary.cpp	1	99.83%
src/realm/alloc_slab.cpp	2	90.56%
src/realm/object-store/shared_realm.cpp	2	91.89%
test/test_file.cpp	2	97.4%
test/test_lang_bind_helper.cpp	2	93.2%

Totals
Change from base Build 2388:	0.1%
Covered Lines:	214536
Relevant Lines:	235869

💛 - Coveralls

ironage · 2024-05-27T23:07:34Z

src/realm/util/encrypted_file_mapping.cpp

- flush();
- sync();
+ do_flush();


What is the reasoning for removing the call to sync? Is it because we can rely on the IV un-bumping strategy for consistency?

Removing a map is the wrong granularity for syncing. Either we need to be syncing between the IV write and the data write for every page, or we need to be syncing once (or twice) per transaction as part of committing. This was making us sync at fairly random times in the middle of the commit which weren't connected to anything logical, and is why some of the tests had to do a lot less work on Windows to not be unreasonably slow (FlushFileBuffers() is closer to F_FULLFSYNC than fsync()).

ironage · 2024-05-27T23:57:07Z

src/realm/util/encrypted_file_mapping.cpp

+static void memcpy_if_changed(void* dst, const void* src, size_t n)
+{
+#if REALM_SANITIZE_THREAD
+ // Because our copying is page-level granularity, we have some benign races


Can we remove the suppression for this case from test/tsan.suppress?

Looks like we can.
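The diff excerpt above is truncated, so the function body is not visible here. One plausible shape for such a helper, offered purely as a guess and not as the PR's actual code, is to skip the store when the bytes already match so that ThreadSanitizer does not flag a write that changes nothing:

```cpp
#include <cstddef>
#include <cstring>

#ifndef REALM_SANITIZE_THREAD
#define REALM_SANITIZE_THREAD 0 // assumption: normally set by the build system
#endif

// Hypothetical body; only the signature and the #if come from the excerpt.
static void memcpy_if_changed(void* dst, const void* src, std::size_t n)
{
#if REALM_SANITIZE_THREAD
    // Guess: when the destination already holds the same bytes, avoid the
    // store so TSAN does not report a write racing with concurrent reads of
    // data that is not actually changing.
    if (std::memcmp(dst, src, n) == 0)
        return;
#endif
    std::memcpy(dst, src, n);
}
```

In non-TSAN builds this sketch is a plain `memcpy`, so the extra comparison would only cost anything when running under the sanitizer.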

ironage · 2024-05-28T00:17:31Z

src/realm/util/file.cpp

@@ -1649,23 +1488,6 @@ FileDesc File::dup_file_desc(FileDesc fd)
 return fd_duped;
 }

-File::UniqueID File::get_unique_id()


Great to finally be rid of this 💯

ironage · 2024-05-28T00:47:06Z

src/realm/util/load_file.cpp

 if (n == 0)
 break;
 used_size += n;
 }
 return std::string(buffer.data(), used_size); // Throws
 }
-
-
-std::string util::load_file_and_chomp(const std::string& path)


Could you move the simple deletions/cleanups like this one into a separate PR? A lot of the file/mapping stuff is probably too interconnected to be worth extracting, but it would be nice to reduce the number of non-encryption-related cleanups here. (These are all great though!)

There are a few steps in between, but this is actually connected to the encryption changes. Making File::seek() work for encrypted files was previously done with a global mutex which I wanted to kill. Rather than trying to figure out some locking scheme that would make seeking work, I updated all of our uses of File to not rely on seeking and instead issue atomic read/write calls at specific offsets. For each thing I had to update for this, I first checked if it was actually still used and just deleted it if not.

Ideally the File changes would all be a separate commit that goes before the encryption changes but they are pretty entangled, largely because of the map_flags thing that was passed around through every layer without ever being used for anything.

ironage

Nice improvements across the board 👍 I'm glad to see that the encryption code has been simplified as well. I gave this a fairly detailed review, but given the large scope of changes it may be best to get @finnschiermer to review as well in case I missed something.

tgoyne · 2024-05-29T18:18:01Z

I did a bit of benchmarking of this and concluded that while there's some things that are faster, it was really hard to actually hit the cases where you could hit the performance pitfalls of the old code, which is a good thing I guess. As a result the only real performance change from this is that operations on unrelated encrypted files no longer sometimes block each other.

finnschiermer

Very, very nice.

tgoyne added the no-jira-ticket Skip checking the PR title for Jira reference label May 15, 2024

tgoyne self-assigned this May 15, 2024

cla-bot bot added the cla: yes label May 15, 2024

tgoyne mentioned this pull request May 15, 2024

RNET-1141 multiprocess encryption for writers with different page sizes #7689

Open

4 tasks

tgoyne force-pushed the tg/file-map-cache branch 9 times, most recently from 8f9e7f4 to aa3d9c0 Compare May 21, 2024 01:08

tgoyne force-pushed the tg/file-map-cache branch 15 times, most recently from 4e6e8b1 to c468758 Compare May 24, 2024 23:08

tgoyne force-pushed the tg/file-map-cache branch 2 times, most recently from 80c8d66 to e2e78b2 Compare May 25, 2024 01:46

tgoyne marked this pull request as ready for review May 25, 2024 03:41

tgoyne requested review from ironage and finnschiermer May 25, 2024 03:41

ironage reviewed May 28, 2024

tgoyne force-pushed the tg/file-map-cache branch from e2e78b2 to 2a8ffbe Compare May 28, 2024 17:42

tgoyne changed the title ~~Clean up a bunch of old encryption cruft~~ RCORE-2141 Clean up a bunch of old encryption cruft May 28, 2024

ironage approved these changes May 28, 2024

tgoyne force-pushed the tg/file-map-cache branch from 2a8ffbe to 275822a Compare May 29, 2024 18:13

tgoyne changed the title ~~RCORE-2141 Clean up a bunch of old encryption cruft~~ RCORE-2141 RCORE-2142 Clean up a bunch of old encryption cruft May 29, 2024

tgoyne force-pushed the tg/file-map-cache branch 6 times, most recently from d6c8fec to 2ed0f28 Compare May 31, 2024 21:58

ironage mentioned this pull request Jun 3, 2024

Crashes on attempting to read realm file from some external drives #7454

Open

finnschiermer approved these changes Jun 6, 2024

tgoyne added 2 commits June 6, 2024 08:48

Fix UB in Tokenizer

7ab83e8

tgoyne force-pushed the tg/file-map-cache branch from 2ed0f28 to 7ab83e8 Compare June 6, 2024 15:48

tgoyne merged commit 42e4a85 into master Jun 6, 2024
39 checks passed

tgoyne deleted the tg/file-map-cache branch June 6, 2024 17:02
