-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-8101. Add FSO repair tool to ozone CLI in read-only and repair modes. #6608
base: master
Are you sure you want to change the base?
Conversation
cc. @errose28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DaveTeng0 for the patch.
Some comments about POM and CLI. Note: I haven't checked the code of the tool itself (FSORepairTool
).
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/FSORepairCLI.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/FSORepairCLI.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/FSORepairCLI.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/FSORepairCLI.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need to decide what the CLI for this should look like. We could do ozone {debug,repair} fso-tree
or ozone repair fso-tree [--dry-run]
. Also as we add more of these type of commands I think ones that are specific to a component should be under their own subcommand for organization, like ozone repair om fso-tree
.
Attila also brought up the --dry-run
mode. I think if the command is under repair
only, then dry run would not be the expected default value. If we add the read-only invocation under debug
then that becomes the equivalent of dry run and no flag is needed.
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/TestFSORepairTool.java
Show resolved
Hide resolved
…d base class FSOBaseCLI and FSOBaseTool
Yeah! extracted common codes between FSODebugCLI and FSORepairCLI to separated base classes FSOBaseCLI and FSOBaseTool, and make them reuse same logic. |
Hello team! please feel free to let me know if there is any new comment~ Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DaveTeng0 I just did a quick pass on the code outside of the main repair logic.
There seems to be a lot of failures on your branch, although I haven't looked deeply and it could just be a bad CI run. Can you get that to a green state?
For the CLI, If we go the route of different debug and repair commands, I think we want ozone debug om fso-tree
and ozone repair om fso-tree
to be the two options. Just having ozone repair om fso-tree [--dry-run]
isn't a bad approach either. I think I slightly prefer that one and it prevents the need for an extra om
subcommand under each option (for now at least), but I'm ok with two separate commands as well.
@@ -87,6 +89,11 @@ public static ManagedRocksDB open( | |||
); | |||
} | |||
|
|||
public static ManagedRocksDB open(final String path) throws RocksDBException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we actually want to use RocksDatabase
here, which wraps ManagedRocksDB
. Looks like it should already have all the operations we need implemented except drop column family, which would need to be added to RocksDatabase
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, makes sense! updated to use RocksDatabase class.
public Void call() throws Exception { | ||
|
||
try { | ||
// TODO case insensitive enum options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add this to the change, it should be easy to support. I think this was copy/pasted into the other child classes too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use boolean value '--dry-run' instead.
/** | ||
* Parser for scm.db file. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an old comment that needs to be updated. Child classes have this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on another suggestion, this class is removed.
"INFO and DEBUG levels." | ||
) | ||
@MetaInfServices(SubcommandWithParent.class) | ||
public class FSOBaseCLI implements Callable<Void>, SubcommandWithParent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure there's enough shared code to warrant a base class for debug and repair. The shared CLI flags like --db
and --verbose
can be used in both commands with PicoCLI mixins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on another suggestion, this class is removed.
} | ||
|
||
if (verbose) { | ||
System.out.println("FSO inspection finished. See client logs for results."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated as well since I think in a previous comment we decided to output everything to stdout and use --verbose
to change the level of output. This is more user friendly than log4j for client output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense! updated to use log4j for here.
…h wraps ManagedRocksDB
What changes were proposed in this pull request?
Bugs like HDDS-7592 can break the FSO tree and cause data to be orphaned in the OM. We have developed a tool to identify and repair this condition in the OM and tested it on affected clusters. This jira is to contribute the tool back to the community under the ozone CLI.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8101
How was this patch tested?
Unit test, integration test.