Need help to understand the logic here #19579
Open
IshanRattan opened this issue Apr 21, 2024 · 4 comments

Comments


IshanRattan commented Apr 21, 2024

latent_dim = 2048
num_heads = 8

encoder_inputs = keras.Input(shape=(None,), dtype="int64", name="encoder_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, latent_dim, num_heads)(x)
encoder = keras.Model(encoder_inputs, encoder_outputs)

decoder_inputs = keras.Input(shape=(None,), dtype="int64", name="decoder_inputs")
encoded_seq_inputs = keras.Input(shape=(None, embed_dim), name="decoder_state_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, latent_dim, num_heads)(x, encoded_seq_inputs)
x = layers.Dropout(0.5)(x)
decoder_outputs = layers.Dense(vocab_size, activation="softmax")(x)
decoder = keras.Model([decoder_inputs, encoded_seq_inputs], decoder_outputs)

decoder_outputs = decoder([decoder_inputs, encoder_outputs])
transformer = keras.Model(
    [encoder_inputs, decoder_inputs], decoder_outputs, name="transformer"
)

Hi,

I took this code from the "English-to-Spanish translation with a sequence-to-sequence Transformer" example in the Keras docs. I am unable to understand the reason for the two lines below.

decoder = keras.Model([decoder_inputs, encoded_seq_inputs], decoder_outputs)
decoder_outputs = decoder([decoder_inputs, encoder_outputs])

Why did we not simply do this? (This code is taken from Deep Learning with Python, Second Edition.)

decoder_outputs = layers.Dense(vocab_size, activation='softmax')(x)
sachinprasadhs added the type:support label ("User is asking for help / asking an implementation question. Stackoverflow would be better suited.") on Apr 23, 2024
@sachinprasadhs
Collaborator

Below is the same code from the Keras example. I think you may have been confused by the multiple occurrences of decoder_outputs.
The first, decoder_outputs = layers.Dense(vocab_size, activation='softmax')(x), is the softmax output layer of the decoder (refer to the architecture below).
The second, decoder_outputs = decoder([decoder_inputs, encoder_outputs]), is where the output is formed from decoder_inputs and encoder_outputs, which carries the encoder context (follow the arrow flowing from the encoder to the decoder in the architecture below).

x = layers.Dropout(0.5)(x)
decoder_outputs = layers.Dense(vocab_size, activation="softmax")(x)
decoder = keras.Model([decoder_inputs, encoded_seq_inputs], decoder_outputs)

decoder_outputs = decoder([decoder_inputs, encoder_outputs])
transformer = keras.Model(
    [encoder_inputs, decoder_inputs], decoder_outputs, name="transformer"
)

Architecture of the transformer: [diagram omitted]
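
To make the distinction concrete, here is the same wiring with the two tensors given separate names (the renaming is purely illustrative; behaviour is unchanged):

decoder_softmax_out = layers.Dense(vocab_size, activation="softmax")(x)  # softmax layer inside the decoder
decoder = keras.Model([decoder_inputs, encoded_seq_inputs], decoder_softmax_out)

# Calling the decoder *model* on the encoder's symbolic output produces a new tensor;
# this second tensor is what the end-to-end transformer model is built from.
transformer_out = decoder([decoder_inputs, encoder_outputs])
transformer = keras.Model(
    [encoder_inputs, decoder_inputs], transformer_out, name="transformer"
)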

@IshanRattan
Author

Hi @sachinprasadhs, thanks for helping me understand this better! But I still have a question. In the code below:

encoder_inputs = keras.Input(shape=(None,), dtype="int64", name="english")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)

decoder_inputs = keras.Input(shape=(None,), dtype='int64', name='spanish')
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, dense_dim, num_heads)(x, encoder_outputs)
x = layers.Dropout(.5)(x)
decoder_outputs = layers.Dense(vocab_size, activation='softmax')(x)
transformer = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)

Why did we not build the transformer like this (above)?

What is the benefit of using keras.Model() for the encoder and decoder separately, as written in the code below?

encoder_inputs = keras.Input(shape=(None,), dtype="int64", name="encoder_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, latent_dim, num_heads)(x)
encoder = keras.Model(encoder_inputs, encoder_outputs)

decoder_inputs = keras.Input(shape=(None,), dtype="int64", name="decoder_inputs")
encoded_seq_inputs = keras.Input(shape=(None, embed_dim), name="decoder_state_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, latent_dim, num_heads)(x, encoded_seq_inputs)
x = layers.Dropout(0.5)(x)
decoder_outputs = layers.Dense(vocab_size, activation="softmax")(x)
decoder = keras.Model([decoder_inputs, encoded_seq_inputs], decoder_outputs)

decoder_outputs = decoder([decoder_inputs, encoder_outputs])
transformer = keras.Model(
    [encoder_inputs, decoder_inputs], decoder_outputs, name="transformer"
)

@sachinprasadhs
Collaborator

The encoder and decoder are basically two separate models in this case; it's just a way of structuring the code so everything is ready for the final transformer model, which depends on decoder_outputs and the decoder model.

In the above code, you can comment out the line that builds the standalone encoder model (# encoder = keras.Model(encoder_inputs, encoder_outputs)) and it makes no difference to the final outcome, since decoder_outputs depends on encoder_outputs, which is already produced by the TransformerEncoder layer.

Here is another way of doing it, mainly using subclassed models and layers: https://www.tensorflow.org/text/tutorials/transformer
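
One practical benefit of keeping the encoder and decoder as separate keras.Model objects shows up at inference time: you can encode the source sentence once and then call the decoder repeatedly while growing the target sequence. A rough sketch (not from the Keras example; it assumes encoder, decoder, sequence_length, a tokenized source batch src_ids of shape (1, sequence_length), and start_id / end_id token ids from the vocabulary):

import numpy as np

encoded = encoder(src_ids)                       # run the TransformerEncoder exactly once
target_ids = np.zeros((1, sequence_length), dtype="int64")
target_ids[0, 0] = start_id                      # seed the target with the [start] token
for i in range(sequence_length - 1):
    probs = decoder([target_ids, encoded])       # reuse the cached encoder output
    next_id = int(np.argmax(probs[0, i, :]))     # greedy pick for position i + 1
    target_ids[0, i + 1] = next_id
    if next_id == end_id:
        break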

github-actions bot commented

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label May 16, 2024