
HDDS-10634. Recon - listKeys API for listing keys with optional filters #6658

Merged

18 commits into apache:master, May 26, 2024

Conversation

@devmadhuu (Contributor) commented May 8, 2024

This PR adds a new Recon API that lists keys from OBS, LEGACY and FSO buckets recursively, in a flat structure, with optional filters.

New API:

api/v1/keys/listKeys?startPrefix=/volume1/obs-bucket/&limit=105

Default values of API parameters if not provided:

  1. replicationType - empty string; the filter is not applied, so keys of all replication types are listed.
  2. creationDate - empty string; the filter is not applied, so keys of any age are listed. When provided, only keys created on or after the given creationDate are listed.
  3. keySize - 0 bytes, so all keys larger than zero bytes are listed, i.e. effectively all keys.
  4. startPrefix - /
  5. prevKey - ""
  6. limit - 1000

Behavior of API:
For an OBS bucket, up to limit keys under the provided path are listed (the paging sketch below shows how a client can walk through them).
The API supports pagination via the prevKey and limit params.
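A minimal paging sketch follows. The Recon host/port and the use of Java's built-in HTTP client plus Jackson are assumptions; the endpoint path and the `keys`/`lastKey`/`prevKey` names come from the examples in this description, and the "empty lastKey on the last page" stop condition is taken from the test scenarios listed further down.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ListKeysPager {
  public static void main(String[] args) throws Exception {
    // Recon address is an assumption; adjust to your deployment.
    String base = "http://localhost:9888/api/v1/keys/listKeys"
        + "?startPrefix=/volume1/obs-bucket&limit=1000";
    HttpClient client = HttpClient.newHttpClient();
    ObjectMapper mapper = new ObjectMapper();

    String prevKey = "";  // default: start from the beginning
    while (true) {
      String url = base + "&prevKey="
          + URLEncoder.encode(prevKey, StandardCharsets.UTF_8);
      HttpResponse<String> resp = client.send(
          HttpRequest.newBuilder(URI.create(url)).GET().build(),
          HttpResponse.BodyHandlers.ofString());
      JsonNode body = mapper.readTree(resp.body());

      // Each page carries the matched keys plus the lastKey to resume from.
      body.get("keys").forEach(k -> System.out.println(k.get("key").asText()));

      String lastKey = body.get("lastKey").asText();
      if (lastKey.isEmpty()) {
        break;  // last page returns an empty lastKey
      }
      prevKey = lastKey;  // next page starts after the last key returned
    }
  }
}
```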

Get List of All Keys:
GET /api/v1/keys/listKeys

API params:

  1. replicationType - filter for RATIS or EC replicated keys
  2. creationDate - filter for keys created on or after this date, in "MM-dd-yyyy HH:mm:ss" string format (see the example request built after this list)
  3. startPrefix - path prefix to start listing from
  4. prevKey - key to resume listing after, for pagination
  5. limit - maximum number of keys to return
  6. keySize - minimum key size filter, in bytes
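For illustration, a request combining several filters might be assembled like this. It is only a sketch: the concrete filter values are made up, and URL-encoding the date with `URLEncoder` (spaces become `+`, colons `%3A`) is an assumption about how a client would pass it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ListKeysFilterUrl {
  public static void main(String[] args) {
    // Keys created on or after this date, larger than 1 MiB, RATIS-replicated.
    String creationDate = "05-08-2024 00:00:00";  // "MM-dd-yyyy HH:mm:ss"
    String url = "/api/v1/keys/listKeys"
        + "?startPrefix=/volume1/obs-bucket"
        + "&replicationType=RATIS"
        + "&keySize=" + (1024 * 1024)
        + "&creationDate=" + URLEncoder.encode(creationDate, StandardCharsets.UTF_8);
    System.out.println(url);
    // prints (shown wrapped):
    // /api/v1/keys/listKeys?startPrefix=/volume1/obs-bucket&replicationType=RATIS
    //   &keySize=1048576&creationDate=05-08-2024+00%3A00%3A00
  }
}
```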

Now let's consider the following OBS, LEGACY and FSO bucket key/file namespace tree structure:

For OBS Bucket

  • /volume1/obs-bucket/key1
  • /volume1/obs-bucket/key1/key2
  • /volume1/obs-bucket/key1/key2/key3
  • /volume1/obs-bucket/key4
  • /volume1/obs-bucket/key5
  • /volume1/obs-bucket/key6

For LEGACY Bucket

  • /volume1/legacy-bucket/key1
  • /volume1/legacy-bucket/key1/key2
  • /volume1/legacy-bucket/key1/key2/key3
  • /volume1/legacy-bucket/key4
  • /volume1/legacy-bucket/key5
  • /volume1/legacy-bucket/key6

For FSO Bucket

  • /volume1/fso-bucket/dir1/dir2/dir3
  • /volume1/fso-bucket/dir1/testfile
  • /volume1/fso-bucket/dir1/file1
  • /volume1/fso-bucket/dir1/dir2/testfile
  • /volume1/fso-bucket/dir1/dir2/file1
  • /volume1/fso-bucket/dir1/dir2/dir3/testfile
  • /volume1/fso-bucket/dir1/dir2/dir3/file1

Input Request for OBS bucket:

   `api/v1/keys/listKeys?startPrefix=/volume1/obs-bucket&limit=2&replicationType=RATIS`

Output Response:

{
   "status": "OK",
   "path": "/volume1/obs-bucket",
   "replicatedDataSize": 20971520,
   "unReplicatedDataSize": 20971520,
   "keyCount": 2,
   "lastKey": "/volume1/obs-bucket/key1/key2",
   "keys": [
       {
           "key": "/volume1/obs-bucket/key1",
           "path": "key1",
           "inStateSince": 1715174266126,
           "size": 10485760,
           "replicatedSize": 10485760,
           "replicationInfo": {
               "replicationFactor": "ONE",
               "requiredNodes": 1,
               "replicationType": "RATIS"
           },
           "creationTime": 1715174266126,
           "modificationTime": 1715174267480,
           "isKey": true
       },
       {
           "key": "/volume1/obs-bucket/key1/key2",
           "path": "key1/key2",
           "inStateSince": 1715174269510,
           "size": 10485760,
           "replicatedSize": 10485760,
           "replicationInfo": {
               "replicationFactor": "ONE",
               "requiredNodes": 1,
               "replicationType": "RATIS"
           },
           "creationTime": 1715174269510,
           "modificationTime": 1715174270410,
           "isKey": true
       }
   ]
}

Input Request for FSO bucket:

       `api/v1/keys/listKeys?startPrefix=/volume1/fso-bucket&limit=2&replicationType=RATIS`

Output Response:

{
    "status": "OK",
    "path": "/volume1/fso-bucket",
    "replicatedDataSize": 62914560,
    "unReplicatedDataSize": 20971520,
    "keyCount": 2,
    "lastKey": "/-9223372036854775552/-9223372036854775040/-9223372036854774525/testfile",
    "keys": [
        {
            "key": "/-9223372036854775552/-9223372036854775040/-9223372036854774525/file1",
            "path": "file1",
            "inStateSince": 1715174237440,
            "size": 10485760,
            "replicatedSize": 31457280,
            "replicationInfo": {
                "replicationFactor": "THREE",
                "requiredNodes": 3,
                "replicationType": "RATIS"
            },
            "creationTime": 1715174237440,
            "modificationTime": 1715174238161,
            "isKey": true
        },
        {
            "key": "/-9223372036854775552/-9223372036854775040/-9223372036854774525/testfile",
            "path": "testfile",
            "inStateSince": 1715174234840,
            "size": 10485760,
            "replicatedSize": 31457280,
            "replicationInfo": {
                "replicationFactor": "THREE",
                "requiredNodes": 3,
                "replicationType": "RATIS"
            },
            "creationTime": 1715174234840,
            "modificationTime": 1715174235562,
            "isKey": true
        }
    ]
}

Input Request for Legacy bucket:

       `api/v1/keys/listKeys?startPrefix=/volume1/legacy-bucket&limit=2&replicationType=RATIS`

Output Response:

{
    "status": "OK",
    "path": "/volume1/legacy-bucket",
    "replicatedDataSize": 52428800,
    "unReplicatedDataSize": 52428800,
    "keyCount": 2,
    "lastKey": "/volume1/legacy-bucket/key1/key2",
    "keys": [
        {
            "key": "/volume1/legacy-bucket/key1",
            "path": "key1",
            "inStateSince": 1715174303702,
            "size": 10485760,
            "replicatedSize": 10485760,
            "replicationInfo": {
                "replicationFactor": "ONE",
                "requiredNodes": 1,
                "replicationType": "RATIS"
            },
            "creationTime": 1715174303702,
            "modificationTime": 1715174304619,
            "isKey": true
        },
        {
            "key": "/volume1/legacy-bucket/key1/key2",
            "path": "key1/key2",
            "inStateSince": 1715174306641,
            "size": 41943040,
            "replicatedSize": 41943040,
            "replicationInfo": {
                "replicationFactor": "ONE",
                "requiredNodes": 1,
                "replicationType": "RATIS"
            },
            "creationTime": 1715174306641,
            "modificationTime": 1715174307994,
            "isKey": true
        }
    ]
}

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10634

How was this patch tested?

Added JUnit test cases and tested various assertions.

@devmadhuu devmadhuu changed the title branch_HDDS-10634. Recon - listKeys API for listing of OBS , FSO and Legacy bucket keys with filters. HDDS-10634. Recon - listKeys API for listing of OBS , FSO and Legacy bucket keys with filters. May 8, 2024
@adoroszlai adoroszlai changed the title HDDS-10634. Recon - listKeys API for listing of OBS , FSO and Legacy bucket keys with filters. HDDS-10634. Recon - listKeys API for listing keys with optional filters May 10, 2024
@devmadhuu (Contributor Author) commented:
@sodonnel @sumitagrawl @ArafatKhan2198 @dombizita kindly review.

@devmadhuu devmadhuu marked this pull request as ready for review May 10, 2024 10:39
@sumitagrawl (Contributor) left a comment:

@devmadhuu Thanks for working on this, I have a few comments.

@ArafatKhan2198 (Contributor) left a comment:

Thanks for working on this @devmadhuu. A few comments for now; I'll send the next comments after some time.

Comment on lines +546 to +548
if (StringUtils.isEmpty(dateString)) {
return Instant.now().toEpochMilli();
}
Contributor: Should we also validate the dateFormat and timeZone parameters to ensure they are not null?

@devmadhuu (Contributor Author): In this code flow, the caller passes dateFormat and timeZone, so they cannot be null; however, I have added a null check since this method can be called by other callers as well.

* @param dateString
* @param dateFormat
* @param timeZone
* @return
Contributor:

  • @return the epoch milliseconds representation of the date.

@devmadhuu (Contributor Author):

Done.

Date date = sdf.parse(dateString);
return date.getTime(); // Convert to epoch milliseconds
} catch (ParseException parseException) {
log.error("Date parse exception for date: {} in format: {} -> {}", dateString, dateFormat, parseException);
Contributor: Can we also log the exception details in the general Exception catch block?

@devmadhuu (Contributor Author): OK, done.
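Putting the quoted fragments from this thread together, a minimal sketch of how the date-conversion helper might look after the review comments. The class name, method name, and the fall-back-to-now behaviour on failure are assumptions; the empty-string check, the parse, and the error log line come from the quoted code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.TimeZone;

import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class DateConversion {
  private static final Logger LOG = LoggerFactory.getLogger(DateConversion.class);

  /**
   * @param dateString date string to convert, e.g. "05-08-2024 00:00:00"
   * @param dateFormat pattern such as "MM-dd-yyyy HH:mm:ss"
   * @param timeZone   time zone id such as "UTC"
   * @return the epoch milliseconds representation of the date
   */
  public static long convertToEpochMillis(String dateString, String dateFormat,
      String timeZone) {
    // Fall back to "now" when no date (or no format/zone) is supplied, so
    // callers that omit the filter still get a usable value.
    if (StringUtils.isEmpty(dateString) || StringUtils.isEmpty(dateFormat)
        || StringUtils.isEmpty(timeZone)) {
      return Instant.now().toEpochMilli();
    }
    try {
      SimpleDateFormat sdf = new SimpleDateFormat(dateFormat);
      sdf.setTimeZone(TimeZone.getTimeZone(timeZone));
      Date date = sdf.parse(dateString);
      return date.getTime();  // convert to epoch milliseconds
    } catch (ParseException parseException) {
      LOG.error("Date parse exception for date: {} in format: {} -> {}",
          dateString, dateFormat, parseException);
      return Instant.now().toEpochMilli();  // assumed fallback on bad input
    } catch (Exception exception) {
      LOG.error("Unexpected error while converting date: {}", dateString, exception);
      return Instant.now().toEpochMilli();  // assumed fallback on bad input
    }
  }
}
```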

Comment on lines 1409 to 1420
public Response validateNames(String resName)
throws IllegalArgumentException {
if (resName.length() < OzoneConsts.OZONE_MIN_BUCKET_NAME_LENGTH ||
resName.length() > OzoneConsts.OZONE_MAX_BUCKET_NAME_LENGTH) {
throw new IllegalArgumentException(
"Bucket or Volume name length should be between " +
OzoneConsts.OZONE_MIN_BUCKET_NAME_LENGTH + " and " +
OzoneConsts.OZONE_MAX_BUCKET_NAME_LENGTH);
}

if (resName.charAt(0) == '.' || resName.charAt(0) == '-' ||
resName.charAt(resName.length() - 1) == '.' ||
Contributor: Can we move some of the methods in this class to ReconUtils? For example validateNames here, which could be utilised by other endpoints as well.

@devmadhuu (Contributor Author): OK, sure.

* @param startPrefix The search prefix that was used.
* @return The response indicating that no keys matched the search prefix.
*/
private Response noMatchedKeysResponse(String startPrefix) {
Contributor:

This is an effective way to generate response messages. Can we create a new class named ResponseUtil and move these methods to it? This will help reduce the amount of code in the current class and make these utility methods available for other endpoints as well.

@devmadhuu (Contributor Author): Done.

Comment on lines +1360 to +1361
.filter(keySizeFilter)
.collect(Collectors.toList());
Contributor: The anyMatch method can be used instead of collecting the filtered entries to a list and checking its size. This makes the code more efficient, as the stream will short-circuit on the first match.

@devmadhuu (Contributor Author): As discussed, anyMatch just matches any one element in the stream and terminates. We need to apply the filters to all elements, so this may not suit our use case.
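To make the trade-off concrete, a small self-contained illustration (the keySizeFilter name mirrors the quoted snippet; the data is made up): anyMatch short-circuits and only answers yes/no, while the listing needs every entry that passes the filters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterVsAnyMatch {
  public static void main(String[] args) {
    List<Long> keySizes = Arrays.asList(5L, 20L, 7L, 42L, 30L);
    Predicate<Long> keySizeFilter = size -> size > 10;  // hypothetical filter

    // anyMatch: stops at the first element > 10 and only answers yes/no.
    boolean anyLarge = keySizes.stream().anyMatch(keySizeFilter);
    System.out.println(anyLarge);  // true

    // filter + collect: evaluates every element and keeps all matches,
    // which is what the listing response needs.
    List<Long> matchedKeys = keySizes.stream()
        .filter(keySizeFilter)
        .collect(Collectors.toList());
    System.out.println(matchedKeys);  // [20, 42, 30]
  }
}
```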

return matchedKeys;
}

private boolean applyFilters(Table.KeyValue<String, OmKeyInfo> entry, ParamInfo paramInfo) throws IOException {
Contributor: Since we are already propagating the throws IOException upwards, do we require the try/catch block to handle the IOException? We could further simplify the code in the method by removing it.

@devmadhuu (Contributor Author): Yes, we need it for the predicate definition, otherwise the compiler gives an error.
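For context on why the try/catch stays: java.util.function.Predicate#test does not declare checked exceptions, so a lambda that calls an IOException-throwing method must catch it (or wrap it) to compile. A minimal sketch with a hypothetical throwing lookup:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

public class CheckedExceptionInPredicate {
  // Hypothetical helper that, like applyFilters, declares a checked IOException.
  static boolean applyFilters(String entry) throws IOException {
    return entry.startsWith("/volume1");
  }

  public static void main(String[] args) {
    // Predicate<String> p = entry -> applyFilters(entry);  // does not compile:
    // "unhandled exception: java.io.IOException"

    Predicate<String> p = entry -> {
      try {
        return applyFilters(entry);
      } catch (IOException e) {
        // Either skip the entry or rethrow unchecked; shown here as a rethrow.
        throw new UncheckedIOException(e);
      }
    };

    System.out.println(p.test("/volume1/obs-bucket/key1"));  // true
  }
}
```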

@ArafatKhan2198 (Contributor) left a comment:

Some more comments @devmadhuu.

Comment on lines +740 to +752
* {
* "status": "OK",
* "path": "/volume1/obs-bucket",
* "replicatedDataSize": 62914560,
* "unReplicatedDataSize": 62914560,
* "lastKey": "/volume1/obs-bucket/key6",
* "keys": [
* {
* "key": "/volume1/obs-bucket/key1",
* "path": "volume1/obs-bucket/key1",
* "size": 10485760,
* "replicatedSize": 10485760,
* "replicationInfo": {
Contributor:

If the structure of the JSON response remains consistent across different bucket types, there is no need to show multiple example responses for each bucket type. Instead, we can show one example response and clarify that this structure applies to all bucket types (OBS, Legacy, and FSO).

@devmadhuu (Contributor Author): I want to keep OBS and FSO, since their output values differ in path, so that it is evident from the javadoc comment. I have removed the example for LEGACY buckets.

* @param ids The IDs to construct the object path with.
* @return The constructed object path.
*/
private String constructObjectPathWithPrefix(long... ids) {
Contributor: I believe this can also be placed inside ReconUtils.

@devmadhuu (Contributor Author): OK.
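For reference, a minimal sketch of what such a helper could look like once moved to ReconUtils. The class name is hypothetical and the example IDs are taken from the FSO response above; the PR's actual implementation may differ.

```java
public final class ObjectPathUtil {
  /**
   * Constructs an object path from the given IDs, e.g. the FSO keys above
   * ("/-9223372036854775552/-9223372036854775040/.../file1") are built from
   * volume, bucket and directory object IDs joined by "/".
   *
   * @param ids The IDs to construct the object path with.
   * @return The constructed object path.
   */
  public static String constructObjectPathWithPrefix(long... ids) {
    StringBuilder pathBuilder = new StringBuilder();
    for (long id : ids) {
      pathBuilder.append('/').append(id);  // "/" as the OM key path separator
    }
    return pathBuilder.toString();
  }

  public static void main(String[] args) {
    System.out.println(constructObjectPathWithPrefix(
        -9223372036854775552L, -9223372036854775040L));
    // prints: /-9223372036854775552/-9223372036854775040
  }
}
```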

@@ -300,5 +300,4 @@ public OmDirectoryInfo getDirInfo(String[] names) throws IOException {
.getDirectoryTable().getSkipCache(dirKey);
return dirInfo;
}

Contributor: Please remove this unwanted change from the file.

@devmadhuu (Contributor Author): The extra blank line was not needed, so I removed it.

@@ -785,10 +1103,284 @@ public void testGetDirectorySizeInfo() throws Exception {
assertEquals(18L, keyInsightInfoResp.getUnreplicatedDataSize());
}

@Test
public void testListKeysFSOBucket() {
Contributor:

The existing test cases cover the following main scenarios which is great!

  • Listing keys at different levels (bucket, directory) for FSO and OBS buckets.
  • Verifying the number of keys returned, paths, keys, replication types, and last keys.
  • Testing pagination with different limits and across multiple pages.
  • Filtering keys based on replication type, creation date, and key size.
  • Handling edge cases like empty results and last page with an empty last key.

Can we write a few more methods that test these simple scenarios as well?

  • Empty buckets: verify the behaviour when the bucket is empty.
  • Non-existent paths: test with paths that do not exist to ensure the API handles such cases gracefully.

@devmadhuu (Contributor Author): OK, done.

@sumitagrawl (Contributor) left a comment:

@devmadhuu a few more comments, please check.

@sumitagrawl (Contributor) left a comment:

@devmadhuu I have given a few comments.

@sumitagrawl (Contributor) left a comment:

LGTM

@Arafat2198 left a comment:

Thanks for the changes @devmadhuu. The patch looks good! LGTM +1

@sumitagrawl sumitagrawl merged commit 40951a4 into apache:master May 26, 2024
39 checks passed