Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-8784. trigger compaction outside of volume check. #6611

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

guohao-rosicky
Copy link
Contributor

@guohao-rosicky guohao-rosicky commented Apr 30, 2024

What changes were proposed in this pull request?

Currently RocksDB compaction is triggered for schema v3 RocksDBs (one DB per volume) as part of the volume check. Periodic manual compaction is necessary because import of sst files as part of container replication can lead to many small sst files in RocksDB.

This operation may slow down the volume check and can have unintended consequences if the minimum gap between volume checks is set very high or very low. Ideally compaction should be triggered on its own independently controlled thread.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8784

How was this patch tested?

UT: org.apache.hadoop.ozone.container.keyvalue.TestKeyValueContainer#testAutoCompactionSmallSstFile

@adoroszlai adoroszlai marked this pull request as draft May 6, 2024 11:17
@guohao-rosicky guohao-rosicky marked this pull request as ready for review May 8, 2024 06:04
dbVolumeSet != null ? 1 :
dnConf.getAutoCompactionSmallSstFileExecutors(),
new ThreadFactoryBuilder().setNameFormat(
"RocksDB Compact Thread-%d").build());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datanodeDetails.threadNamePrefix() + "RocksDBCompactionThread-%d"

HddsVolume hddsVolume = (HddsVolume) volume;
CompletableFuture.runAsync(hddsVolume::compactDb, compactExecutor);
// If set dbVolumeSet only need to execute the compact db once
if (dbVolumeSet != null) {
Copy link
Contributor

@ChenSammi ChenSammi May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fully understand this. For every HddsVolume, there will be one RocksDB need compaction. It's true even when dbVolume is configured.

if (SchemaV3.isFinalizedAndEnabled(config)) {
HddsVolumeUtil.loadAllHddsVolumeDbStore(
volumeSet, dbVolumeSet, false, LOG);
if (dnConf.autoCompactionSmallSstFile()) {
this.compactExecutor = Executors.newScheduledThreadPool(
dbVolumeSet != null ? 1 :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this thread pool is to accelerate the db compaction. So the thread count matters when there are multiple rocksdb to compact. The rocksdb and HddsVolume has a 1:1 mapping. So evern dbVolumeSet is set, if there are 10 HddsVolume configured, there will be 10 RocksDB to compact. So there I think dbVolumeSet null check is not required.

private long autoCompactionSmallSstFileIntervalMinutes =
AUTO_COMPACTION_SMALL_SST_FILE_INTERVAL_MINUTES_DEFAULT;

@Config(key = "rocksdb.auto-compaction-small-sst-file.executors",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"rocksdb.auto-compaction-small-sst-file.executors" ->

"rocksdb.auto-compaction-small-sst-file.threads"

@@ -121,7 +124,7 @@ public class OzoneContainer {
private final ReplicationServer replicationServer;
private DatanodeDetails datanodeDetails;
private StateContext context;

private ScheduledExecutorService compactExecutor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compactExecutor -> dbCompactionExecutorService .

public boolean compactDb() {
File dbDir = getDbParentDir();
File dbFile = new File(dbDir, CONTAINER_DB_NAME);
if (dbFile.exists() && dbFile.canRead()) {
Copy link
Contributor

@ChenSammi ChenSammi May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to check exists and canRead, which is covered in check() already. If it's checked here, then you should mark the volume as failure if check failed. This boolean result is not used any where, maybe a void return is better.

AUTO_COMPACTION_SMALL_SST_FILE_INTERVAL_MINUTES_DEFAULT;

@Config(key = "rocksdb.auto-compaction-small-sst-file.executors",
defaultValue = "1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current RocksDB compaction is run in volume checker, which uses a unlimited thread pool( Executors.newCachedThreadPool). Not sure whether by default 1 thread will have performance impact to DN or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants