
/$/compact/ with deleteOld=true does not remove the old folder (on Windows) #2127

Open
andreasnef opened this issue Dec 11, 2023 · 8 comments

@andreasnef

Version

4.8.0

What happened?

We noticed that on Windows (at least on Windows Server 2019), a compact call with deleteOld=true does not remove the old folder. There is no error message or exception in the log. The result is a new folder with a compacted database, but the old folder remains untouched.

Relevant output and stacktrace

curl -X POST -i -u "admin:XXX" "http://localhost:3030/$/compact/repository?deleteOld=true"

...
18:48:18 INFO  Admin           :: [2] Compact dataset /repository
18:48:18 INFO  Server          :: Task : 1 : Compact
18:48:18 INFO  Server          :: [Task 1] starts : Compact
18:48:18 INFO  Compact         :: [2] >>>> Start compact /repository
18:48:18 DEBUG DatabaseOps     :: Compact Data-0004 -> Data-0005

18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG TDB2StorageBuilder :: Triple table: SPO :: SPO,POS,OSP
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG TDB2StorageBuilder :: Quad table: GSPO :: GSPO,GPOS,GOSP,POSG,OSPG,SPOG
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:18 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment:8388608  BlockSize=8192  blocksPerSegment=1024
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG TDB2StorageBuilder :: Prefixes: GPU :: GPU
18:48:19 DEBUG LockMRSW        :: Lock : pool-4-thread-1
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:19 DEBUG BlockAccessMapped :: Segment: 0
18:48:20 DEBUG DatabaseOps     :: Deleting old database after successful compaction (old db path='D:\data\fuseki\repository\Data-0004')...
18:48:20 INFO  Compact         :: [2] <<<< Finish compact /repository
18:48:20 INFO  Server          :: [Task 1] finishes : Compact
...

Are you interested in making a pull request?

None

@andreasnef andreasnef added the bug label Dec 11, 2023
@kinow
Member

kinow commented Dec 11, 2023

I am quite sure there was a similar comment, either in a GitHub issue/discussion or on the mailing list, and I think at that time the answer was that it was by design… but I couldn't find the issue or email.

@andreasnef
Author

Hm. I've read #1252, but that was before 4.8.0.

@andreasnef
Author

andreasnef commented Dec 12, 2023

Adapted org.apache.jena.tdb2.sys.DatabaseOps to use java.nio.file.Files.delete() (the current use of java.io.File.delete() doesn't check whether the delete is successful) and extended the logging a bit; it seems that on Windows other processes are accessing these files (a sketch of this change follows the log output below):

10:16:20 DEBUG DatabaseOps     :: Deleting old database after successful compaction (old db path='D:\data\fuseki\repository\Data-0009')...
10:16:20 DEBUG DatabaseOps     :: deleting path D:\data\fuseki\repository\Data-0009\tdb.lock
10:16:20 DEBUG DatabaseOps     :: deleting path D:\data\fuseki\repository\Data-0009\SPOG.idn
10:16:20 ERROR DatabaseOps     :: --> failed
java.nio.file.FileSystemException: D:\data\fuseki\repository\Data-0009\SPOG.idn: The process cannot access the file because it is being used by another process.

	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) ~[?:?]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108) ~[?:?]
	at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:274) ~[?:?]
	at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105) ~[?:?]
	at java.nio.file.Files.delete(Files.java:1142) ~[?:?]
	at org.apache.jena.tdb2.sys.DatabaseOps.lambda$deleteDatabase$1(DatabaseOps.java:297) ~[fuseki-server.jar:4.8.0]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
	at java.util.ArrayList.forEach(ArrayList.java:1541) ~[?:?]
	at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:395) ~[?:?]
	at java.util.stream.Sink$ChainedReference.end(Sink.java:258) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
	at org.apache.jena.tdb2.sys.DatabaseOps.deleteDatabase(DatabaseOps.java:294) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.tdb2.sys.DatabaseOps.compact(DatabaseOps.java:286) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.tdb2.DatabaseMgr.compact(DatabaseMgr.java:81) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.fuseki.ctl.ActionCompact$CompactTask.run(ActionCompact.java:109) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.fuseki.async.AsyncPool.lambda$submit$0(AsyncPool.java:66) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.fuseki.async.AsyncTask.call(AsyncTask.java:100) [fuseki-server.jar:4.8.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
10:16:20 WARN  Compact         :: [9] **** Exception in compact
org.apache.jena.atlas.RuntimeIOException: java.nio.file.FileSystemException: D:\data\fuseki\repository\Data-0009\SPOG.idn: The process cannot access the file because it is being used by another process.

	at org.apache.jena.atlas.io.IOX.exception(IOX.java:52) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.tdb2.sys.DatabaseOps.lambda$deleteDatabase$1(DatabaseOps.java:300) ~[fuseki-server.jar:4.8.0]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
	at java.util.ArrayList.forEach(ArrayList.java:1541) ~[?:?]
	at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:395) ~[?:?]
	at java.util.stream.Sink$ChainedReference.end(Sink.java:258) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
	at org.apache.jena.tdb2.sys.DatabaseOps.deleteDatabase(DatabaseOps.java:294) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.tdb2.sys.DatabaseOps.compact(DatabaseOps.java:286) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.tdb2.DatabaseMgr.compact(DatabaseMgr.java:81) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.fuseki.ctl.ActionCompact$CompactTask.run(ActionCompact.java:109) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.fuseki.async.AsyncPool.lambda$submit$0(AsyncPool.java:66) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.fuseki.async.AsyncTask.call(AsyncTask.java:100) [fuseki-server.jar:4.8.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.nio.file.FileSystemException: D:\data\fuseki\repository\Data-0009\SPOG.idn: The process cannot access the file because it is being used by another process.

	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) ~[?:?]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108) ~[?:?]
	at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:274) ~[?:?]
	at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105) ~[?:?]
	at java.nio.file.Files.delete(Files.java:1142) ~[?:?]
	at org.apache.jena.tdb2.sys.DatabaseOps.lambda$deleteDatabase$1(DatabaseOps.java:297) ~[fuseki-server.jar:4.8.0]
	... 20 more
10:16:20 ERROR Server          :: Exception in task 1 execution
org.apache.jena.atlas.RuntimeIOException: java.nio.file.FileSystemException: D:\data\fuseki\repository\Data-0009\SPOG.idn: The process cannot access the file because it is being used by another process.

	at org.apache.jena.atlas.io.IOX.exception(IOX.java:52) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.tdb2.sys.DatabaseOps.lambda$deleteDatabase$1(DatabaseOps.java:300) ~[fuseki-server.jar:4.8.0]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
	at java.util.ArrayList.forEach(ArrayList.java:1541) ~[?:?]
	at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:395) ~[?:?]
	at java.util.stream.Sink$ChainedReference.end(Sink.java:258) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
	at org.apache.jena.tdb2.sys.DatabaseOps.deleteDatabase(DatabaseOps.java:294) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.tdb2.sys.DatabaseOps.compact(DatabaseOps.java:286) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.tdb2.DatabaseMgr.compact(DatabaseMgr.java:81) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.fuseki.ctl.ActionCompact$CompactTask.run(ActionCompact.java:109) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.fuseki.async.AsyncPool.lambda$submit$0(AsyncPool.java:66) ~[fuseki-server.jar:4.8.0]
	at org.apache.jena.fuseki.async.AsyncTask.call(AsyncTask.java:100) [fuseki-server.jar:4.8.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.nio.file.FileSystemException: D:\data\fuseki\repository\Data-0009\SPOG.idn: The process cannot access the file because it is being used by another process.

	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) ~[?:?]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108) ~[?:?]
	at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:274) ~[?:?]
	at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105) ~[?:?]
	at java.nio.file.Files.delete(Files.java:1142) ~[?:?]
	at org.apache.jena.tdb2.sys.DatabaseOps.lambda$deleteDatabase$1(DatabaseOps.java:297) ~[fuseki-server.jar:4.8.0]
	... 20 more
10:16:20 INFO  Server          :: [Task 1] finishes : Compact
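For reference, a minimal sketch of the kind of change described above (the class and method names are illustrative, not the actual DatabaseOps code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class DeleteSketch {
    // Sketch only: java.io.File.delete() returns false on failure and is easy to
    // ignore silently, while java.nio.file.Files.delete() throws an IOException
    // that surfaces problems such as the Windows "file in use" error above.
    static void deleteDirectoryTree(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            // Delete children before parents by reversing the walk order.
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> {
                     try {
                         Files.delete(p);    // throws instead of returning false
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }
}
```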

@rvesse
Member

rvesse commented Dec 12, 2023

Same root cause as #2092 (comment) - a long-standing JDK-on-Windows bug.
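For anyone unfamiliar with the underlying JDK behaviour, here is a small standalone demo (not Jena code) of the effect: once a file has been memory-mapped, the mapping pins the file handle until it is garbage collected, and on Windows the file cannot be deleted in the meantime.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedDeleteDemo {
    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("mapped", ".dat");
        try (FileChannel ch = FileChannel.open(f, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8192);
            buf.put(0, (byte) 1);
        }
        // The channel is closed, but the mapping still pins the file handle.
        // On Windows this delete typically fails with "being used by another
        // process"; on Linux it succeeds because the inode just lives on until
        // the mapping is garbage collected.
        try {
            Files.delete(f);
            System.out.println("Deleted");
        } catch (IOException e) {
            System.out.println("Delete failed: " + e);
        }
    }
}
```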

@andreasnef
Author

andreasnef commented Dec 12, 2023

Hi @rvesse

Thanks for putting this into context! That sounds a bit discouraging, though, as it means the deleteOld parameter is effectively not supported on Windows.

I'm not familiar with the code here, but I wonder whether it would be possible to properly clean up the mapped files of the old "database"? I notice that the Windows file handles disappear after a while, and then it is possible to delete the folder. Is that connected to the GC settings as well?

@afs
Member

afs commented Jan 3, 2024

I notice that the Windows file handles disappear after a while, and then it is possible to delete the folder. Is that connected to the GC settings as well?

Could be - there have been proposed workarounds, some of which forced a GC, but they seemed to be GC-specific and there were reports that they didn't always work.

The best Jena can do is detect that it is running on Windows and log that deleteOld is being ignored. (I don't know whether that works if the host is Linux, that is, a container running on a Windows host with mounted host files.)
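A minimal sketch of that kind of platform check and warning, assuming a plain os.name check and an SLF4J logger (illustrative only, not existing Jena code):

```java
import java.util.Locale;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DeleteOldGuard {
    private static final Logger LOG = LoggerFactory.getLogger(DeleteOldGuard.class);

    // Crude platform check via the os.name system property. As noted above, this
    // would not detect a Linux container running on a Windows host with mounted files.
    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).contains("windows");
    }

    // Downgrade deleteOld=true to false on Windows and log that it is being ignored.
    static boolean effectiveDeleteOld(boolean requested) {
        if (requested && isWindows()) {
            LOG.warn("deleteOld=true is ignored on Windows: mapped files keep the old directory in use");
            return false;
        }
        return requested;
    }
}
```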

@rvesse
Member

rvesse commented Mar 20, 2024

@afs possible solution idea:

On Windows, instead of attempting the delete, write a marker file (.deleteme) after a successful compaction. Then, on a subsequent startup, scan through the old data directories (if any) and delete those with the marker file in place. WDYT?
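A rough sketch of that marker-file idea (the ".deleteme" name comes from the suggestion above; the class and method names are hypothetical, not an existing Jena API):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class DeleteMarkerSketch {
    private static final String MARKER = ".deleteme";   // marker file name from the suggestion

    // After a successful compaction, mark the old Data-NNNN directory instead of deleting it.
    static void markForDeletion(Path oldDataDir) throws IOException {
        Files.createFile(oldDataDir.resolve(MARKER));
    }

    // On a later startup, before any files are mapped, remove directories carrying the marker.
    static void scanForDeletions(Path databaseDir) throws IOException {
        try (Stream<Path> dirs = Files.list(databaseDir)) {
            dirs.filter(Files::isDirectory)
                .filter(d -> Files.exists(d.resolve(MARKER)))
                .forEach(DeleteMarkerSketch::deleteTree);
        }
    }

    private static void deleteTree(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try { Files.delete(p); } catch (IOException e) { throw new UncheckedIOException(e); }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```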

@afs
Member

afs commented Mar 20, 2024

In the stacktrace, it appears there is another process active in some way.

Deleting the old database's disk space is attempted on Windows; however, on Windows the space is only reclaimed (released to the file system) when the JVM exits. This release "on exit" (or sometimes earlier, but still delayed) is the long-standing Java issue.

The situation around compaction is different in Jena 5.0.0 when a previous JVM had a problem during compaction. In 4.x, a partial, broken compaction could be left behind, causing a subsequent JVM to encounter problems. That might be what is occurring in the stacktrace (it isn't conclusive, though).

5.0.0 has compaction robustness - issue #2254.

PR #2304 added robustness by compacting to a temporary directory, then moving it into place with an atomic filesystem rename within the same parent directory. This is the commit step. It then deletes the old directory before the compacted database is made available to the application. (There is a period of vulnerability to external effects, not Java factors, between the rename and finishing the filesystem delete.)
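A simplified sketch of that compact-then-rename commit step (the class, method, and directory names here are illustrative, not the code from PR #2304):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CompactCommitSketch {
    // Compact into a temporary sibling directory, then commit with an atomic rename
    // within the same parent directory. Names such as "Data-0005" are illustrative.
    static Path commit(Path dbDir, Path tmpCompacted, String newName) throws IOException {
        Path target = dbDir.resolve(newName);                               // e.g. "Data-0005"
        Files.move(tmpCompacted, target, StandardCopyOption.ATOMIC_MOVE);   // the commit step
        // Between the rename above and finishing the delete of the old Data-NNNN
        // directory lies the window of vulnerability to external, non-Java effects.
        return target;
    }
}
```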

#2315 reported that the storage directory can't be renamed into place on MS Windows: the atomic rename above does not work if the directory has memory-mapped files.

#2321 changes the MS Windows path to write the new database storage directly into its final place. It provides compaction robustness through a different clean-up mechanism.

A control file is written which lists the directory to be cleaned up the next time this database is opened/connected to. Compaction writes the name of the new database storage directory (the Data-(N+1) directory) to this clean-up file.

If compaction runs smoothly, the old directory, Data-N, is deleted; the space is only truly released by the OS when the JVM exits, which is unavoidable. But the system is using Data-(N+1), Data-(N+2), ... as compactions happen successfully.

If the compaction does go wrong, there is a highest Data-N directory that cannot be used. Its name is in the control file.

The Windows compaction code deletes the control file when the compaction has finished, before the switchover to the new compacted storage and before it is available to the application. Cleaning up the control file is the commit step. There is an equivalent period of vulnerability to external effects.

Code runs once per JVM per database, before a database is used for the first time in that JVM. That code understands both Windows-style and Linux-style compactions, so the database folder can still be moved between systems.
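A simplified sketch of that control-file scheme as described (the control file name and method names are hypothetical, not the actual #2321 code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class CleanupControlSketch {
    private static final String CONTROL = "compact-cleanup.txt";   // hypothetical control file name

    // Before compacting in place on Windows, record which directory is being built,
    // so a crash leaves enough information to clean it up on the next open.
    static void recordPending(Path dbDir, String newDataDirName) throws IOException {
        Files.writeString(dbDir.resolve(CONTROL), newDataDirName);
    }

    // Deleting the control file is the commit step once compaction has finished.
    static void commit(Path dbDir) throws IOException {
        Files.deleteIfExists(dbDir.resolve(CONTROL));
    }

    // Run once per JVM per database, before the database is first used:
    // if a control file is present, the named directory is from a failed compaction.
    static void cleanupOnOpen(Path dbDir) throws IOException {
        Path control = dbDir.resolve(CONTROL);
        if (!Files.exists(control))
            return;
        Path broken = dbDir.resolve(Files.readString(control).trim());
        if (Files.exists(broken))
            deleteTree(broken);
        Files.delete(control);
    }

    private static void deleteTree(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try { Files.delete(p); } catch (IOException e) { throw new UncheckedIOException(e); }
            });
        }
    }
}
```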

Please try Jena 5.0.0. After running many Windows GitHub Actions builds without seeing an occurrence of the Jena 4.x test failure that would occasionally happen, I hope this is robust.
