Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added an Example for BLEU. #806

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Neeshamraghav012
Copy link
Contributor

I would like to inform you that I have made some updates to the content. Firstly, I have added an example for BLEU which was not present before. This new example serves as an illustration of the concept and should be helpful for those who are new to this topic.

Furthermore, I have also fixed the URLs by wrapping them in markdown. This should make it easier for readers to access the resources mentioned in the content.

I would like to inform you that I have made some updates to the content. Firstly, I have added an example for BLEU which was not present before. This new example serves as an illustration of the concept and should be helpful for those who are new to this topic.

Furthermore, I have also fixed the URLs by wrapping them in markdown. This should make it easier for readers to access the resources mentioned in the content.
@Neeshamraghav012
Copy link
Contributor Author

@abheesht17 I have deleted the previous pull request because I messed up with it. Please review this one.

Copy link
Collaborator

@abheesht17 abheesht17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments. Most of them our minor, but there is one major comment regarding rank of inputs!

@@ -4,7 +4,7 @@
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
# https://www.apache.org/licenses/LICENSE-2.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove extra spaces.

>>> bleu = keras_nlp.metrics.Bleu(max_order=4)
>>> ref_sentence = "the quick brown fox jumps over the lazy dog"
>>> pred_sentence = "the quick brown fox jumps over the box"
>>> score = bleu([ref_sentence], [pred_sentence])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we mention that this example is with Python string inputs, this should ideally be

bleu([ref_sentence], pred_sentence)

"the quick brown fox jumps over the lazy frog"
]
>>> pred_sentence = ["the quick brown fox jumps over the box"]
>>> score = bleu(ref_sentence, pred_sentence)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of assigning the score to a variable, let's just have

>>> bleu(ref_sentence, pred_sentence)
<tf.Tensor(0.7420885, shape=(), dtype=float32)>

Please do the same in other places as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the above example works, the ideal input formula is:

rank(ref_sentence) = rank(pred_sentence) + 1

I think we should probably stick to this formula in our examples. Otherwise, it becomes confusing for the reader. Could you please make this change everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the above example works, the ideal input formula is:

rank(ref_sentence) = rank(pred_sentence) + 1

I think we should probably stick to this formula in our examples. Otherwise, it becomes confusing for the reader. Could you please make this change everywhere?

Correct me if I am wrong, for this, do I need to change blue(ref_sentence, pred_sentence) to bleu([ref_sentence], pred_sentence)?

Comment on lines 127 to 139
c. RaggedTensor.
>>> bleu = keras_nlp.metrics.Bleu(max_order=4)
>>> ref_sentence = tf.ragged.constant([
[
"the quick brown fox jumps over the lazy dog",
"the quick brown fox jumps over the lazy frog"
]
])
>>> pred_sentence = tf.ragged.constant([
["the quick brown fox jumps over the box"]
])
>>> score = bleu(ref_sentence, pred_sentence)
<tf.Tensor(0.7420885, shape=(), dtype=float32)>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we mention that we are using ragged tensors here, it would be great if we could actually have a tensor with variable dimension along an axis. Also, I think your example will throw an error; pred_sentence should be a dense tensor here, not a ragged tensor.

So, something like:

>>> bleu = keras_nlp.metrics.Bleu(max_order=4)
>>> ref_sentence = tf.ragged.constant([
...         ["the quick brown fox jumps over the lazy dog"],
...         ["the quick brown fox jumps over the lazy dog", "the quick brown fox jumps over the lazy frog"]
...     ])
>>> pred_sentence = tf.constant([
...         ["the quick brown fox jumps over the box"],
...         ["the quick brown fox jumps over the box"]
...     ])
>>> bleu(ref_sentence, pred_sentence)
<tf.Tensor: shape=(), dtype=float32, numpy=0.7420885>

>>> score = bleu([ref_sentence], [pred_sentence])
<tf.Tensor(0.7420885, shape=(), dtype=float32)>

1.2. rank 1 inputs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change these to

rank 1 prediction inputs, rank 2 reference inputs

or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the Rouge L metric documentation, where this same convention is used. Is it okay to make these changes?

@abheesht17 abheesht17 requested review from abheesht17 and mattdangerw and removed request for abheesht17 March 8, 2023 03:42
@abheesht17
Copy link
Collaborator

@mattdangerw - for context, this is the previous PR: #799

Minor improvements are done!
@Neeshamraghav012
Copy link
Contributor Author

Hi @abheesht17 and @mattdangerw, I made the minor changes as you suggested. Thank you for your feedback!

However, I am not sure about the part where you mentioned changing the rank of inputs. Could you please provide more details or clarify what you meant by that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants