feat: SDK methods to fetch pachyderm configs [MD-406] #9348

Merged
azhou-determined merged 1 commit into main from pach-config-sdk
May 20, 2024

@azhou-determined azhou-determined commented May 9, 2024

Ticket

MD-406

Description

We want to expose the Pachyderm configs in the expconf to Pachyderm integrations through the Python SDK. The new methods are convenience methods that read a specific field out of the experiment config. Pachyderm configs were added to the expconf in #8933 and have the following structure:

    integrations:
      pachyderm: 
        pachd:
          host: localhost
          port: 80
        proxy:
          scheme: http
          host: localhost
          port: 80
        dataset:
          branch: br
          commit: commit-hash
          project: proj
          repo: repo-name
          token: tkn

Today, users are expected to populate most of these fields manually.
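
As a rough sketch of what "reading a field out of the experiment config" amounts to, assuming the expconf is available as a plain nested dict: the helper name `extract_pachyderm_config` is hypothetical and is not the SDK's actual implementation, and returning an empty dict for a missing section is an assumption made for this example.

```python
from typing import Any, Dict


def extract_pachyderm_config(expconf: Dict[str, Any]) -> Dict[str, Any]:
    """Return the integrations.pachyderm section of an expconf dict.

    Hypothetical helper for illustration only. Returns an empty dict
    when the section is absent or null (an assumption, not SDK behavior).
    """
    integrations = expconf.get("integrations") or {}
    return integrations.get("pachyderm") or {}


# Example expconf, trimmed to the fields shown in the structure above.
expconf = {
    "name": "metrics",
    "integrations": {
        "pachyderm": {
            "pachd": {"host": "localhost", "port": 80},
            "dataset": {"repo": "repo-name", "commit": "commit-hash"},
        }
    },
}
pach = extract_pachyderm_config(expconf)
print(pach["dataset"]["commit"])
```

Using `or {}` rather than a default argument to `get` also covers the case where a key is present but explicitly set to null in the YAML.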

Test Plan

  • Submit an experiment that a) has the integrations: pachyderm config defined in the expconf, and b) creates at least 1 checkpoint.
    For example, using the Core API script at examples/tutorials/core_api/2_checkpoints.py, with the following expconf:
    pach.yaml:

        name: metrics
        entrypoint: python3 2_checkpoints.py

        searcher:
          name: single
          metric: x
          max_length: 1

        max_restarts: 0

        integrations:
          pachyderm:
            pachd:
              host: localhost
              port: 80
            proxy:
              scheme: http
              host: localhost
              port: 80
            dataset:
              branch: pach-branch
              commit: test-pach-commit-id-12345
              project: pach-project
              repo: pach-repo
              token: testpachtoken123

  • After the experiment completes, note the experiment ID and trial ID, then run a script that exercises the new Python SDK methods:

        import json

        from determined import experimental


        DET_MASTER = "XXXXX"
        USER = "determined"
        PASSWORD = "******"


        def main():
            client = experimental.Determined(
                master=DET_MASTER,
                user=USER,
                password=PASSWORD,
            )
            experiment_id = 6503
            trial_id = 44486

            trial = client.get_trial(trial_id=trial_id)
            exp = trial.get_experiment()
            assert exp.id == experiment_id

            pach_conf = exp.get_pachyderm_config()
            print(json.dumps(pach_conf, indent=4))

            ckpt = trial.list_checkpoints()[0]
            print(ckpt.get_pachyderm_commit())


        if __name__ == "__main__":
            main()

The printed commit and configs should match what was submitted in the experiment config.
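
The "should match" check above can be done by eye, but it could also be sketched as a small comparison. The following is a minimal sketch, assuming both the submitted and returned configs are plain nested dicts; `find_mismatches` is a hypothetical helper and not part of the SDK or this PR.

```python
from typing import Any, Dict, List


def find_mismatches(
    expected: Dict[str, Any], actual: Dict[str, Any], prefix: str = ""
) -> List[str]:
    """Recursively list dotted key paths where actual differs from expected."""
    mismatches: List[str] = []
    for key, want in expected.items():
        path = f"{prefix}{key}"
        got = actual.get(key)
        if isinstance(want, dict) and isinstance(got, dict):
            # Descend into nested sections such as pachd or dataset.
            mismatches.extend(find_mismatches(want, got, prefix=f"{path}."))
        elif want != got:
            mismatches.append(path)
    return mismatches


# Values taken from the pach.yaml used in the test plan.
submitted = {
    "pachd": {"host": "localhost", "port": 80},
    "dataset": {"commit": "test-pach-commit-id-12345", "repo": "pach-repo"},
}
# A deliberately wrong copy, to show what a failure looks like.
returned = {
    "pachd": {"host": "localhost", "port": 80},
    "dataset": {"commit": "some-other-commit", "repo": "pach-repo"},
}

print(find_mismatches(submitted, submitted))  # no differences
print(find_mismatches(submitted, returned))   # differs at dataset.commit
```

The helper only checks keys present in the submitted config, so extra keys or defaults in the returned config would not be flagged.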

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

netlify bot commented May 9, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit 0fe9e05
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/663d6a0f6c80230008a88f13

codecov bot commented May 10, 2024

Codecov Report

Attention: Patch coverage is 14.81481%, with 23 lines in your changes missing coverage. Please review.

Project coverage is 45.11%. Comparing base (86aa319) to head (0fe9e05).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9348      +/-   ##
==========================================
- Coverage   45.16%   45.11%   -0.06%     
==========================================
  Files        1230     1229       -1     
  Lines      154523   154524       +1     
  Branches     2404     2404              
==========================================
- Hits        69788    69707      -81     
- Misses      84540    84622      +82     
  Partials      195      195              
Flag      Coverage Δ
backend   41.74% <ø> (-0.06%) ⬇️
harness   63.97% <14.81%> (-0.14%) ⬇️
web       35.63% <14.81%> (-0.44%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files                                                   Coverage Δ
harness/determined/common/experimental/trial.py         54.50% <28.57%> (-1.11%) ⬇️
...ined/common/experimental/checkpoint/_checkpoint.py   66.66% <10.00%> (-2.93%) ⬇️
...rness/determined/common/experimental/experiment.py   69.53% <10.00%> (-2.42%) ⬇️

... and 10 files with indirect coverage changes

@@ -449,6 +449,23 @@ def get_metrics(self, group: Optional[str] = None) -> Iterable["metrics.TrialMet
        for d in resp.metrics:
            yield metrics.TrialMetrics._from_bindings(d, group)

    def get_pachyderm_commit(self) -> str:

non-blocking question: Is there a reason why we're exposing the commit hash specifically? Is there a use case for retrieving this field that's different than the other fields in the pachyderm configuration?

Contributor Author

yeah, it's mainly a product framing consideration. we want to make the integration seem smooth.

conceptually we're putting pachyderm integration on a determined checkpoint assuming the user wants to get the data associated with this checkpoint. my understanding is that the commit hash is the only thing you need to be able to get the data in pachyderm, so the idea here is to make the user journey between determined checkpoints and its pachyderm data obvious. it'd be weird if a determined checkpoint object returned the whole pachyderm config.
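
To make the point about the commit hash concrete, here is a minimal sketch of pulling only `dataset.commit` out of a Pachyderm config dict. The function name `commit_from_pachyderm_config` is hypothetical, and raising `ValueError` when no commit is set is an assumption for this example; the PR does not state how the real `get_pachyderm_commit` handles a missing value.

```python
from typing import Any, Dict


def commit_from_pachyderm_config(pach_conf: Dict[str, Any]) -> str:
    """Return only the dataset commit from a Pachyderm config dict.

    Hypothetical helper: raises ValueError when no commit is recorded,
    which is an assumption and not documented SDK behavior.
    """
    dataset = pach_conf.get("dataset") or {}
    commit = dataset.get("commit")
    if not commit:
        raise ValueError("no Pachyderm commit recorded in the experiment config")
    return str(commit)


pach_conf = {
    "pachd": {"host": "localhost", "port": 80},
    "dataset": {"repo": "pach-repo", "commit": "test-pach-commit-id-12345"},
}
print(commit_from_pachyderm_config(pach_conf))
```

Returning a single string keeps the checkpoint-level call narrow, while the full config stays available from the experiment.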

@corban-beaird corban-beaird left a comment

Would it be possible to automate the test plan?

@corban-beaird corban-beaird left a comment

Looks good!

@azhou-determined
Contributor Author

Would it be possible to automate the test plan?

so, this could be made into an e2e test, but i'm not sure it's worth the additional cost to run. the change itself is rather small and logically trivial IMO. most of the existing expconf logic is tested elsewhere, this change just grabs a key out of it.

i had originally written unit/mock tests for these methods, but decided not to include them because mocking the API endpoints reduced the logic down to "pass in a mocked response dict, make sure the exact same dict is returned", and it felt like a silly test to include.

if we build out this pachyderm integration further to include logic beyond getting a key from a dict, then i think tests would definitely be warranted.
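
For illustration, the kind of mock-based test described above might look roughly like the sketch below. It does not use the real SDK classes: `FakeExperiment` and its `get_experiment_config` session call are hypothetical stand-ins, chosen only to show why such a test reduces to "the same dict comes back".

```python
from typing import Any, Dict
from unittest import mock


class FakeExperiment:
    """Hypothetical stand-in for an SDK experiment object (not the real class)."""

    def __init__(self, session: Any) -> None:
        self._session = session

    def get_pachyderm_config(self) -> Dict[str, Any]:
        # Fetch the expconf from the (mocked) session and pick one key out of it.
        expconf = self._session.get_experiment_config()
        return (expconf.get("integrations") or {}).get("pachyderm") or {}


def test_get_pachyderm_config_returns_mocked_dict() -> None:
    pach = {"dataset": {"commit": "test-pach-commit-id-12345"}}
    session = mock.Mock()
    session.get_experiment_config.return_value = {"integrations": {"pachyderm": pach}}

    exp = FakeExperiment(session)

    # The assertion only confirms that the mocked dict is passed through.
    assert exp.get_pachyderm_config() == pach
    session.get_experiment_config.assert_called_once_with()


test_get_pachyderm_config_returns_mocked_dict()
```

Because the method under test is a single dictionary lookup, the mock supplies both the input and the expected output, which is the triviality the comment above is pointing at.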

@gt2345 gt2345 left a comment

Works as expected

@jgongd jgongd left a comment

LGTM!

@azhou-determined azhou-determined merged commit 0c42ced into main May 20, 2024
83 of 96 checks passed
@azhou-determined azhou-determined deleted the pach-config-sdk branch May 20, 2024 15:45