Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Kaggle example - Beginner classification #128

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

vanshhhhh
Copy link
Contributor

@vanshhhhh vanshhhhh commented Sep 2, 2022

Kaggle dataset used - Titanic - Machine Learning from Disaster
Category - Beginner
Type - Classification

Note -

  • In the link section (image attached below), the links given will work only when this PR merges..
    Screenshot 2022-08-25 191207
  • Alternatively, you can check the same example on my personal repository in which the "Run in Google Colab" link is working. I'd suggest viewing this example on Colab since some cell outputs are handled by limiting the cell size.

Mentor - @mirhyman
Reviewers - @random-forests, @markmcd

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Word the title such that it explains what is in this doc, e.g. what task is being done (style guide).

The title will be used to list this doc in site navigation, so it needs to be descriptive and something a user wants to click on.


Reply via ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this cell. You introduce the dataset further down, no need to do it twice.


Reply via ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a beginner's guide to TensorFlow Decision Forests,  please refer to this tutorial.

Is this tutorial a beginner's guide? If so, there's no need to link somewhere else. If not, then we can remove a bunch of introductory concepts from this guide and focus it more specifically. (Alternatively, reword this link such that it explains the the difference - e.g. "For an introduction to [TFDF] without Kaggle, please refer to this tutorial"


Reply via ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before you begin experimenting with neural networks.

Try not to down-play DFs. They are a production-ready option that provide a number of benefits over DNN-style architectures.

Perhaps here just write "...will often outperform neural networks."

Also, I think the first paragraph under "random forest" repeats much of the introduction paragraph. Can you remove the "Random Forest" heading and flow this section on from the introduction? I think we can keep the intro short & clear.


Reply via ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try not to recommend libraries that are "competition". If there's a good reason to recommend SKL or one of the others, do so with context (e.g. "SKL supports $x algo, go check it out if you are interested.")

I think the second para can go too. We don't talk about how old things are (documentation should be timeless), and we've already introduced TFDF.


Reply via ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the nice features about this particular hyperparameter is that larger values are usually better, and come with little risk aside from slowing down training.

Can you provide a citation for this?


Reply via ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #3.    plt.plot([log.num_trees for log in logs], [log.evaluation.accuracy for log in logs])

You have stated that this is OOB / test data, but the only dataset I can see used so far is train_ds (from model.fit(x=train_ds)). Is this test data or training data?


Reply via ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, is this evaluation data or training data?

If it's from training data, we need to communicate that. If it's from validation data, what's the difference between this and the next cell?


Reply via ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our docs need to be structured with a single H1 at the top - the title. Other headings start from H2.

Also I think it'd be clearer to have this section be "Use your model to make predictions". Using the test set makes sense, but pedagogically the user really just needs to know: 1) load data, 2) train model, 3) understand model fitness and now 4) use the model.


Reply via ReviewNB

@@ -0,0 +1,2105 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notebooks need to run start-to-finish. If you want a user to do some work with incomplete or un-runnable code, consider using a code block in a markdown cell.

Some other options:

  • Rewrite this to show the user each step
  • Provide a short list of ideas (e.g. "For further exercise, try creating a model to do ...")

Keep in mind that these notebooks are published to tensorflow.org in a read-only format, so it probably doesn't make sense to say "Try it yourself, ... your code here ... etc" when there's no way a user can interact.


Reply via ReviewNB

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for this, this one was due to my own suggestion basing things off of some colabs I had seen previously!

I think I wasn't considering the fact, when supplying feedback to Vansh, that the notebooks would be published in a read-only format not similar to what is viewable on colab. Apologies for that confusion I caused myself!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all - if you want to make an interactive notebook instead, that's totally fine. We'll just need to exclude it from publication on tensorflow.org.

I do recommend getting it into a format we can use on the site though, you'll get a lot more visibility there.

Copy link
Member

markmcd commented Sep 6, 2022

Thanks for the guide - looks like a good start. I've made some comments that apply in several places, please be sure to fix them everywhere (e.g. style related comments).

Can you also ensure that you run nbfmt and nblint on your notebook to ensure stable diffs and that the right links/licenses/etc are present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants