Update MaskedLMHead to support dtype=bfloat16/float16/float64 #1197

g-dspencer · 2023-08-04T18:39:06Z

Update MaskedLMHead to support dtype=bfloat16/float16/float64.

Inspired by keras-team/keras@397ad57
i.e. using the idiom (?) of dtype=self._dtype_policy.

This is to fix #1195

I had a previous try at this where I accidentally included print statements, sorry.

mattdangerw

Thanks!

mattdangerw · 2023-08-08T17:36:16Z

keras_nlp/layers/modeling/masked_lm_head.py

@@ -153,9 +153,11 @@ def build(self, inputs_shape, masked_positions_shape=None):
 activation=self.intermediate_activation,
 kernel_initializer=self.kernel_initializer,
 bias_initializer=self.bias_initializer,
+ dtype=self._dtype_policy,


It looks like self.dtype_policy should just work. Can we do that?

https://github.com/keras-team/keras/blob/b3ffea6602dbbb481e82312baa24fe657de83e11/keras/engine/base_layer.py#L2213C9-L2218
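For illustration, a minimal sketch of the pattern under discussion: a composite layer that forwards its own dtype policy to a sublayer through the public `dtype_policy` property instead of the private `_dtype_policy` attribute. `TinyHead` is a hypothetical stand-in, not the actual `MaskedLMHead` code, and the sketch assumes a Keras install where `import keras` works and `bfloat16` runs on the local device.

```python
import keras
import numpy as np


class TinyHead(keras.layers.Layer):
    """Hypothetical composite layer; NOT the real MaskedLMHead."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Forward this layer's policy so the sublayer computes and stores
        # its weights in the same dtype as the parent layer.
        self.dense = keras.layers.Dense(self.units, dtype=self.dtype_policy)
        self.dense.build(input_shape)

    def call(self, inputs):
        return self.dense(inputs)


head = TinyHead(4, dtype="bfloat16")
outputs = head(np.ones((2, 8), dtype="float32"))
print(head.dense.compute_dtype, outputs.dtype)
```

Without the `dtype=` argument the inner `Dense` would fall back to the global default policy, which is how a sublayer can end up in a different dtype from the layer that owns it.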

mattdangerw · 2023-08-08T17:38:07Z

keras_nlp/layers/modeling/masked_lm_head_test.py

@@ -36,6 +39,30 @@ def test_valid_call(self):
 position_data = ops.random.randint(minval=0, maxval=10, shape=(4, 5))
 model((token_data, position_data))

+ @parameterized.named_parameters(
+ ("bfloat16", tf.bfloat16),


Because we now run our test suite against jax/torch/tf with keras-core, we generally refer to dtypes by string name, e.g. "float16" instead of tf.float16.

Does anything break if we switch to that?
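A rough sketch of what the string-based parameterization could look like. It uses a plain `keras.layers.Dense` as a stand-in for `MaskedLMHead` so that it stays self-contained; the class and test names are hypothetical, and `absl-py` is assumed to be installed. `float64` is left out of the sketch because some backends need extra configuration for 64-bit floats.

```python
import unittest

import keras
import numpy as np
from absl.testing import parameterized


class DTypeByNameTest(parameterized.TestCase):
    # Dtypes are passed as backend-agnostic strings, not tf.DType objects.
    @parameterized.named_parameters(
        ("bfloat16", "bfloat16"),
        ("float16", "float16"),
        ("float32", "float32"),
    )
    def test_layer_dtype(self, dtype):
        # Stand-in layer; the real test would construct MaskedLMHead here.
        layer = keras.layers.Dense(4, dtype=dtype)
        layer(np.ones((2, 8), dtype="float32"))
        self.assertEqual(layer.compute_dtype, dtype)
        self.assertEqual(layer.variable_dtype, dtype)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DTypeByNameTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Since the parameters are plain strings, nothing in the test body has to import TensorFlow directly.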

mattdangerw · 2023-08-08T17:40:04Z

keras_nlp/layers/modeling/masked_lm_head_test.py

@@ -119,6 +146,32 @@ def test_one_train_step(self):
 loss = model.train_on_batch(x=(token_data, position_data), y=label_data)
 self.assertGreater(loss, 0)

+ @parameterized.named_parameters(


I would kill this test. Compiling a real loss function can make for slower tests, and with the parameterized testing this could slow down our suite.

mattdangerw · 2023-08-08T17:41:08Z

keras_nlp/layers/modeling/masked_lm_head.py

@@ -153,9 +153,11 @@ def build(self, inputs_shape, masked_positions_shape=None):
 activation=self.intermediate_activation,


It seems like we should really have this for all of our "composite" layers in KerasNLP, right?

- token and position embedding
- transformer decoder
- transformer encoder
- cached multi head attention
- f net encoder

Are you interested in following up for other layers? (same PR or split PRs fine!)

mattdangerw · 2023-08-08T18:21:36Z

keras_nlp/layers/modeling/masked_lm_head_test.py

+ )
+ encoded_tokens = keras.Input(shape=(10, 16))
+ positions = keras.Input(shape=(5,), dtype="int32")
+ outputs = head(encoded_tokens, masked_positions=positions)


This might need a rebase over master. The argument should be mask_positions now, and the old name is causing a lot of test failures.

g-dspencer added 2 commits August 4, 2023 18:22

Update MaskedLMHead to support dtype=bfloat16/float16/float64.

50697e5

update

ff0fede

mattdangerw reviewed Aug 8, 2023
