Adds new artifacts colab. #526

Open
wants to merge 20 commits into
base: master

Conversation

katjacksonWB

Adds a new Artifacts colab to replace the old, outdated one linked on the Artifacts landing page.

github-actions bot commented May 15, 2024

Thanks for contributing to wandb/examples!
We appreciate your efforts in opening a PR for the examples repository. Our goal is to ensure a smooth and enjoyable experience for you 😎.

Guidelines

The examples repo is regularly tested against the ever-evolving ML stack. To facilitate our work, please adhere to the following guidelines:

  • Notebook naming: You can use a combination of snake_case and CamelCase for your notebook name. Avoid using spaces (replace them with _) and special characters (&%$?). For example:
Cool_Keras_integration_example_with_weights_and_biases.ipynb 

is acceptable, but

Cool Keras Example with W&B.ipynb

is not. Avoid spaces and the & character. To refer to W&B, you can use: weights_and_biases or just wandb (it's our library, after all!)

  • Managing dependencies within the notebook: You may need to set up dependencies to ensure that your code works. Please avoid the following practices:

    • Docker-related activities. If Docker installation is required, consider adding a full example with the corresponding Dockerfile to the wandb/examples/examples folder (where non-Colab examples reside).
    • Using pip install as the primary method to install packages. When calling pip in a cell, avoid performing other tasks. We automatically filter these types of cells, and executing other actions might break the automatic testing of the notebooks. For example,
    pip install -qU wandb transformers gpt4
    

    is acceptable, but

    pip install -qU wandb
    import wandb

    is not.

    • Installing packages from a GitHub branch. Although it's acceptable 😎 to directly obtain the latest bleeding-edge libraries from GitHub, did you know that you can install them like this:
    !pip install -q git+https://github.com/huggingface/transformers

    You don't need to clone, then cd into the repo and install it in editable mode.

    • Avoid referencing specific Colab directories. Google Colab has a /content directory where everything resides. Avoid explicitly referencing this directory because we test our notebooks with pure Jupyter (without Colab). Instead, use relative paths to make the notebook reproducible.
  • The Jupyter notebook file .ipynb is nothing more than a JSON file with primarily two types of cells: markdown and code. There is also a bunch of other metadata specific to Google Colab. We have a set of tools to ensure proper notebook formatting. These tools can be found at wandb/nb_helpers.
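
As a rough illustration of the guidelines above, here is a minimal sketch of a checker that treats the notebook as plain JSON. It is not the wandb/nb_helpers implementation; the function names (`name_is_valid`, `is_pip_line`, `check_notebook`) and the warning strings are hypothetical, and the naming rule is assumed to be "letters, digits, and underscores only".

```python
import json
import re

# Assumption: a valid name uses only letters, digits, and underscores.
VALID_NAME = re.compile(r"^[A-Za-z0-9_]+\.ipynb$")


def name_is_valid(filename: str) -> bool:
    """True when the notebook name has no spaces or special characters."""
    return bool(VALID_NAME.match(filename))


def is_pip_line(line: str) -> bool:
    """True for lines such as `pip install ...`, `!pip install ...`, `%pip install ...`."""
    return line.lstrip("!%").startswith("pip install")


def check_notebook(notebook_json: str) -> list[str]:
    """Return warnings for code cells that break the dependency or path guidelines."""
    warnings = []
    notebook = json.loads(notebook_json)
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # The source field may be a list of lines or a single string.
        text = "".join(source) if isinstance(source, list) else source
        # Ignore blank lines and Python comments when looking for "other code".
        lines = [ln.strip() for ln in text.splitlines()]
        lines = [ln for ln in lines if ln and not ln.startswith("#")]
        pip_lines = [ln for ln in lines if is_pip_line(ln)]
        if pip_lines and len(pip_lines) != len(lines):
            warnings.append(f"cell {index}: pip install mixed with other code")
        if "/content/" in text:
            warnings.append(f"cell {index}: hard-coded /content path")
    return warnings
```

Running `check_notebook` on the raw text of an `.ipynb` file gives one warning per offending cell, which is enough to catch the `pip install` plus `import wandb` pattern and the `/content/...` paths flagged later in this review.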

Before merging, wait for a maintainer to clean and format the notebooks you're adding. You can tag @tcapelle.

Before marking the PR as ready for review, please run your notebook one more time. Restart the Colab and run all. We will provide you with links to open the Colabs below

The following colabs were changed
-colabs/artifact_basics/Artifact_Basics.ipynb

@noaleetz
Contributor

Hey @katjacksonWB - some feedback from reviewing the whole thing:

  • I think it's important to cover how to version an artifact, because otherwise the colab doesn't really show the utility of logging your stuff to an artifact. It can be a minimal example, like adding a few new images and showing that a new version is created
  • the colab should also showcase how someone can navigate to the UI for the logged artifact, and link to a public project that the user can look at to understand how the actions done in the colab are reflected in the UI (the SDK prints out a URL, so we should show the user how to find it to get to the UI from their log_artifact command)
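
A minimal sketch of what such a versioning cell could look like, assuming a local folder of files and the `artifact-basics` project and `my_first_artifact` name that appear elsewhere in this PR. The helper name `log_dataset_version` is hypothetical, and `wandb` is imported inside the function so that the definition itself has no dependency until it is called.

```python
def log_dataset_version(data_dir: str, project: str = "artifact-basics") -> str:
    """Log the contents of data_dir as a dataset artifact and return its version label."""
    import wandb  # imported lazily; requires `pip install wandb` and a W&B login

    with wandb.init(project=project) as run:
        artifact = wandb.Artifact(name="my_first_artifact", type="dataset")
        artifact.add_dir(data_dir)

        logged = run.log_artifact(artifact)
        logged.wait()  # block until the upload finishes so the version is known

        # The run page is one place to start when looking for the artifact in the UI.
        print(f"Run page: {run.url}")
        return logged.version
```

Calling the helper once, adding a few files to the folder, and calling it again should report two different version labels, which is the behavior the colab needs to demonstrate.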

Contributor

@noaleetz noaleetz left a comment

left a comment with some requested changes!

Contributor

@noaleetz noaleetz left a comment

left another round of comments!

@tcapelle
Collaborator

Ping me when ready for final review/merge

@noaleetz noaleetz requested review from rymc and removed request for moredatarequired May 28, 2024 19:10
@ngrayluna ngrayluna requested a review from tcapelle May 31, 2024 19:22
Collaborator

@tcapelle tcapelle left a comment

Why don't we show the classic:

# you can log using the one-liner:
wandb.log_artifact("file.csv", name="my_artifact", type="data")

# or
at = wandb.Artifact(name="my_artifact", type="data")
at.add_file("file.csv")
# at.add_dir(...)
wandb.log_artifact(at)

  • Shouldn't this live in wandb-artifacts?
  • Please also replace the "old outdated one"

@noaleetz
Contributor

Why we don't show the classic:

# you can log using the one-liner:
wandb.log_artifact("file.csv", name="my_artifact", type="data")

# or
at = Artifact(name="my_artifact",  type="data")
at.add_file("file.csv")
# add_dir(...)
wandb.log_artifact(at)
  • Shouldn't this live in wandb-artifacts?
  • Please also replace the "old outdated one"

Is this referring to a specific line?

Contributor

@noaleetz noaleetz left a comment

last request on lineage

"source": [
"You can also manage your Artifacts via the W&B platform. This can give you insight into your model's performance or dataset versioning. To navigate to the relevant information, click this [link](https://wandb.ai/wandb/artifact-basics/overview), then click on the **Artifacts** tab.\n",
"\n",
"Navigating to the **Lineage** section in the tab will show the dependency graph formed by calling `run.use_artifact()` when an Artifact is an input to a run, and `run.log_artifact()` when an Artifact is output to a run. This helps visualize the relationship between different model versions and other objects like datasets and jobs in your project. Click [this](https://wandb.ai/wandb/artifact-basics/artifacts/dataset/my_first_artifact/v0/lineage) link to navigate to the project's lineage page."
Contributor

can you make sure we include a screenshot of a more complex lineage for a user to explore, and also link the relevant project (probably the artifact workflow project)

Collaborator

Yep, I am missing some screenshots. Add them either from the docs or upload them as files in the same folder.

" inplace=True)\n",
"csvData.to_csv(\"/content/sample_data/california_housing_test.csv\") # overwrites file with the sorted data\n",
"# adds the new file to the artifact\n",
"run = wandb.init(project=\"artifact-basics\")\n",

I think it would be better here to init the run at the start of the code block.

Collaborator

  • maybe split in 2 cells.
  • csv_data instead of csvData.
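
One way the data-preparation half of that split could look, as a sketch only. The notebook under review appears to use pandas; this version swaps in the standard-library `csv` module so it runs anywhere, uses snake_case names, and takes relative paths instead of `/content/...`. The helper name `sort_csv` and the choice of sort column are hypothetical.

```python
import csv
from pathlib import Path


def sort_csv(source: Path, destination: Path, column: str) -> None:
    """Write a copy of source to destination, sorted numerically by column."""
    with source.open(newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = sorted(reader, key=lambda row: float(row[column]))

    with destination.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

With the sorting isolated in its own cell, the second cell can start with `wandb.init(...)` and contain only the artifact logging.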

"cell_type": "markdown",
"metadata": {},
"source": [
"Now the sorted file will be logged in `my_first_artifact`. Any changes you log to an artifact will overwrite any older version. \n",

hmm the wording of overwriting old versions may be confusing to users (line 228 and 236) as we don't really overwrite the old version, we create a new version instead. Overwriting to me implies it replaces the previous.
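
To make the distinction concrete, here is a toy in-memory model of "new version instead of overwrite". It is not how W&B stores artifacts; `ToyArtifactStore` is a hypothetical class, and the rule that identical contents do not create a new version is an assumption of this sketch.

```python
import hashlib


class ToyArtifactStore:
    """In-memory stand-in for an artifact registry; for illustration only."""

    def __init__(self) -> None:
        self._versions: dict[str, list[bytes]] = {}

    def log(self, name: str, contents: bytes) -> str:
        """Store contents under name and return the version label, e.g. 'v1'."""
        history = self._versions.setdefault(name, [])
        digest = hashlib.sha256(contents).digest()
        # Assumption: logging identical contents again does not add a version.
        if history and hashlib.sha256(history[-1]).digest() == digest:
            return f"v{len(history) - 1}"
        history.append(contents)  # older versions stay in the history
        return f"v{len(history) - 1}"

    def get(self, name: str, version: str = "latest") -> bytes:
        """Fetch a specific version, or the newest one when version is 'latest'."""
        history = self._versions[name]
        index = -1 if version == "latest" else int(version.lstrip("v"))
        return history[index]
```

In this model, logging the sorted file after the unsorted one yields `v1`, while `get(name, "v0")` still returns the unsorted contents, which is the behavior the notebook text should describe.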

"artifact = run.use_artifact(artifact_or_name=\"my_first_artifact:latest\")\n",
"# This will download the specified artifact to where your code is running\n",
"datadir = artifact.download()\n",
"run.finish()\n",

I would suggest moving the run.finish() to the end of the block as it is usually better practice (e.g., in this case, it would capture what is printed out)
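
A sketch of that ordering, assuming the same project and artifact names as the cell under review. The helper name `download_latest` is hypothetical; the `try`/`finally` is one way to guarantee that `run.finish()` runs last, even if the download raises.

```python
def download_latest(project: str = "artifact-basics") -> str:
    """Download the latest version of the artifact and return the local directory."""
    import wandb  # imported lazily; requires `pip install wandb` and a W&B login

    run = wandb.init(project=project)
    try:
        artifact = run.use_artifact("my_first_artifact:latest")

        # This downloads the specified artifact to where your code is running
        data_dir = artifact.download()
        print(f"Downloaded to {data_dir}")
        return data_dir
    finally:
        # Finishing last keeps everything printed above inside the run.
        run.finish()
```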

"source": [
"You can also manage your Artifacts via the W&B platform. This can give you insight into your model's performance or dataset versioning. To navigate to the relevant information, click this [link](https://wandb.ai/wandb/artifact-basics/overview), then click on the **Artifacts** tab.\n",
"\n",
"Navigating to the **Lineage** section in the tab will show the dependency graph formed by calling `run.use_artifact()` when an Artifact is an input to a run, and `run.log_artifact()` when an Artifact is output to a run. This helps visualize the relationship between different model versions and other objects like datasets and jobs in your project. Click [this](https://wandb.ai/wandb/artifact-basics/artifacts/dataset/my_first_artifact/v0/lineage) link to navigate to the project's lineage page."

I think the following wording here may be a little confusing to some users:

"run.log_artifact() when an Artifact is output to a run"

I think "to a run" should be "of a run"?
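
The input/output relationship can be illustrated with a toy graph. This is a conceptual sketch, not the W&B lineage implementation; `ToyLineage` and its method names merely mirror `use_artifact` and `log_artifact`, and the run and artifact labels in the usage note are made up.

```python
from collections import defaultdict


class ToyLineage:
    """Records which artifacts each run consumed and produced."""

    def __init__(self) -> None:
        self.inputs = defaultdict(set)   # run -> artifacts used as inputs
        self.outputs = defaultdict(set)  # run -> artifacts logged as outputs

    def use_artifact(self, run: str, artifact: str) -> None:
        """Mark artifact as an input to run."""
        self.inputs[run].add(artifact)

    def log_artifact(self, run: str, artifact: str) -> None:
        """Mark artifact as an output of run."""
        self.outputs[run].add(artifact)

    def upstream(self, artifact: str) -> set:
        """Return every artifact that the given artifact depends on."""
        found = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            for run, produced in self.outputs.items():
                if current in produced:
                    # Inputs of the producing run are dependencies of its outputs.
                    for parent in self.inputs[run] - found:
                        found.add(parent)
                        stack.append(parent)
        return found
```

For example, if a training run uses `dataset:v0` and logs `model:v0`, and an evaluation run uses `model:v0` and logs `report:v0`, then `upstream("report:v0")` contains both the model and the dataset, which is the chain the Lineage view draws.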

@tcapelle
Collaborator

Is this referring to a specific line?

This plan and the code don't match:
[image]

You are doing steps 2, 3, and 4 in one line.

@tcapelle
Collaborator

I would also use this PR to remove/replace old stuff in wandb-artifacts (and put this file in there as a getting started)

"outputs": [],
"source": [
"!pip install wandb\n",
"import wandb\n",
Collaborator

split pip install from import wandb please

"The general workflow for creating an Artifact is:\n",
"\n",
"\n",
"1. Intialize a run.\n",
Collaborator

I just feel like the code does not follow this plan.

"source": [
"run = wandb.init(project=\"artifact-basics\")\n",
"artifact = run.use_artifact(artifact_or_name=\"my_first_artifact:latest\")\n",
"# This will download the specified artifact to where your code is running\n",
Collaborator

Add a blank line before the comment for readability.

4 participants