-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-8784. trigger compaction outside of volume check. #6611
base: master
Are you sure you want to change the base?
Conversation
dbVolumeSet != null ? 1 : | ||
dnConf.getAutoCompactionSmallSstFileExecutors(), | ||
new ThreadFactoryBuilder().setNameFormat( | ||
"RocksDB Compact Thread-%d").build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datanodeDetails.threadNamePrefix() + "RocksDBCompactionThread-%d"
HddsVolume hddsVolume = (HddsVolume) volume; | ||
CompletableFuture.runAsync(hddsVolume::compactDb, compactExecutor); | ||
// If set dbVolumeSet only need to execute the compact db once | ||
if (dbVolumeSet != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not fully understand this. For every HddsVolume, there will be one RocksDB need compaction. It's true even when dbVolume is configured.
if (SchemaV3.isFinalizedAndEnabled(config)) { | ||
HddsVolumeUtil.loadAllHddsVolumeDbStore( | ||
volumeSet, dbVolumeSet, false, LOG); | ||
if (dnConf.autoCompactionSmallSstFile()) { | ||
this.compactExecutor = Executors.newScheduledThreadPool( | ||
dbVolumeSet != null ? 1 : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this thread pool is to accelerate the db compaction. So the thread count matters when there are multiple rocksdb to compact. The rocksdb and HddsVolume has a 1:1 mapping. So evern dbVolumeSet is set, if there are 10 HddsVolume configured, there will be 10 RocksDB to compact. So there I think dbVolumeSet null check is not required.
private long autoCompactionSmallSstFileIntervalMinutes = | ||
AUTO_COMPACTION_SMALL_SST_FILE_INTERVAL_MINUTES_DEFAULT; | ||
|
||
@Config(key = "rocksdb.auto-compaction-small-sst-file.executors", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"rocksdb.auto-compaction-small-sst-file.executors" ->
"rocksdb.auto-compaction-small-sst-file.threads"
@@ -121,7 +124,7 @@ public class OzoneContainer { | |||
private final ReplicationServer replicationServer; | |||
private DatanodeDetails datanodeDetails; | |||
private StateContext context; | |||
|
|||
private ScheduledExecutorService compactExecutor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compactExecutor -> dbCompactionExecutorService .
public boolean compactDb() { | ||
File dbDir = getDbParentDir(); | ||
File dbFile = new File(dbDir, CONTAINER_DB_NAME); | ||
if (dbFile.exists() && dbFile.canRead()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to check exists and canRead, which is covered in check() already. If it's checked here, then you should mark the volume as failure if check failed. This boolean result is not used any where, maybe a void return is better.
AUTO_COMPACTION_SMALL_SST_FILE_INTERVAL_MINUTES_DEFAULT; | ||
|
||
@Config(key = "rocksdb.auto-compaction-small-sst-file.executors", | ||
defaultValue = "1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current RocksDB compaction is run in volume checker, which uses a unlimited thread pool( Executors.newCachedThreadPool). Not sure whether by default 1 thread will have performance impact to DN or not.
What changes were proposed in this pull request?
Currently RocksDB compaction is triggered for schema v3 RocksDBs (one DB per volume) as part of the volume check. Periodic manual compaction is necessary because import of sst files as part of container replication can lead to many small sst files in RocksDB.
This operation may slow down the volume check and can have unintended consequences if the minimum gap between volume checks is set very high or very low. Ideally compaction should be triggered on its own independently controlled thread.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8784
How was this patch tested?
UT: org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer#testAutoCompactionSmallSstFile