Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] How to Improve the Readability and Composability of the NativeLink Config JSON part 1 #834

Open
MarcusSorealheis opened this issue Apr 5, 2024 · 6 comments · May be fixed by #889

Comments

@MarcusSorealheis
Copy link
Collaborator

MarcusSorealheis commented Apr 5, 2024

Currently, one area of confusion stems from the fact that we accept user-defined keys (in other words, dynamic keys) for the names of stores and other entities. It's fine to have user-defined names, but they should be anchored to a non-dynamic key, and added as fields. Here is an example config in our current state:

"AC_MAIN_STORE" and "FS_CONTENT_STORE" are both user-supplied values.

{
  "stores": {
    "AC_MAIN_STORE": {
        "filesystem": {}
    },
    "FS_CONTENT_STORE": {
        "filesystem": {}
    }
  }
}

Below is an improved version of the same config where:

  • the name is dynamic but the key is always name

  • the stores objects are nested in an array because multiple objects should be in an array

    • and must be when they share a key (e.g. "name")
  • the details about each store are embedded in a self-describing key called options.

The sum of all these changes results in many improvements.

{
  "stores": [
    {
      "name": "AC_MAIN_STORE",
      "options": {
        "filesystem": {
            "content_path": "/tmp/nativelink/data-worker-test/content_path-ac",
            "temp_path": "/tmp/nativelink/data-worker-test/tmp_path-ac"
            ...
        }
      }
    },
    {
      "name": "FS_CONTENT_STORE",
      "options": {
        "filesystem": {}
      }
    }
  ]
}
@aaronmondal
Copy link
Contributor

aaronmondal commented Apr 5, 2024

I like this. It reminds me of how such configs would look like in YAML.

Some ideas (which effectively can be compiled down to json):

GHA style (seems nice and concise initially, but the more I think about it the less convinced I get as it seems very easy to mess up when configs become more complex):

---
stores:
  - name: ac-main-store-fs
    uses: filesystem-store
    with:
      contentPath: /tmp/nativelink/data-worker-test/content_path-ac
      tempPath: /tmp/nativelink/data-worker-test/tmp_path-ac
  - name: ac-main-store-memory
    uses: memory-store
    with:
      maxSize: 20G
  - name: ac-main-store
    uses: fast-slow-store
    with:
      fast: ac-main-store-memory
      slow: ac-main-store-fs

K8s style (seems overly verbose initially, but the more I think about it the more I like it as it provides a clear, familiar structure and a standardized way to iterate on the API):

---
apiVersion: nativelink/v1alpha1
kind: FilesystemStore
metadata:
  name: ac-main-store-fs
spec:
  contentPath: /tmp/nativelink/data-worker-test/content_path-ac
  tempPath: /tmp/nativelink/data-worker-test/tmp_path-ac
---
apiVersion: nativelink/v1alpha1
kind: MemoryStore
metadata:
  name: ac-main-store-memory
spec:
  maxSize: 20G
---
apiVersion: nativelink/v1alpha1
kind: FastSlowStore
metadata:
  name: ac-main-store
spec:
  fast: ac-main-store-memory
  slow: ac-main-store-fs

@MarcusSorealheis
Copy link
Collaborator Author

I thought about this for a long time and the only possible downside of the new format is that lookup in the vector format (example 2) changes from direct access to O(n) time. However, given that we will never have more than 5 or 10 stores in NativeLink, I don't see this as a material concern. In the worst case scenario, 100 stores would still not cause any negligible performance problems.

@allada
Copy link
Collaborator

allada commented Apr 18, 2024

In this case the Big-O complexity doesn't change. We have to translate these configs into the implementation which requires iterating through them all at startup time. Internally it'll still hold them as a HashMap.

I'm fine with this, the only bikeshed I have is "options". I would like to find a more intuitive name... maybe "config" or something?

@blizzardc0der
Copy link
Contributor

I'd like to work on this issue.

@blizzardc0der blizzardc0der linked a pull request Apr 26, 2024 that will close this issue
7 tasks
@blizzardc0der
Copy link
Contributor

blizzardc0der commented Apr 27, 2024

I've questions while I was working on this issue.
I just had felt that schedulers had the same issue as stores and I think it needs to updated as well.

Before:

"schedulers": {
    "MAIN_SCHEDULER": {
      "simple": {
        "supported_platform_properties": {
          "cpu_count": "minimum",
          "memory_kb": "minimum",
          ...
        }
      }
    }
  }

After:

"schedulers": [
    "name": "MAIN_SCHEDULER",
    "config": {
      "simple": {
        "supported_platform_properties": {
          "cpu_count": "minimum",
          "memory_kb": "minimum",
          ...
        }
      }
    }
  ]

Do we need this change in this PR?

cc: @MarcusSorealheis , @aaronmondal , @allada , @adam-singer

@blizzardc0der
Copy link
Contributor

Okay, I already made PR for both stores and schedulers.

#889

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants