Add alignment matrix learning #1108

iPRET · 2024-05-15T13:31:22Z

Added alignment matrix learning to training. See issue #1105 for more details.
Added command line arguments
--alignment-matrix for specifying a file to read alignments from.
--alignment-matrix-weight for specifying loss coefficient for alignment matrix cross entropy.
--attention-alignment-layer for specifying layer of decoder in which attention alignment will happen.
--align-attentions for telling model to learn alignments when that's impossible to infer from other command line arguments.
--shift-alignments for telling data preparation to shift alignments one target token forward, and translation one target token backward.

Report on performance impact:
Sockeye_Alignment_Matrix_Report-6.pdf

Pull Request Checklist

Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
until you can check this box.
Unit tests pass (pytest)
Were system tests modified? If so did you run these at least 5 times to account for the variation across runs?
System tests pass (pytest test/system)
Passed code style checking (./style-check.sh)
You have considered writing a test
Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

iPRET · 2024-05-16T08:12:00Z

So I ran the style_check.sh (My appologies that I didn't do it before the pull request) (ᗒᗣᗕ)՞.
.
One of the issues is a type error.
I modified TransformerDecoder.decode_seq to return Tuple[Tensor, Tensor] istead of the previous Tensor.
This is necessary because the decoder now has to return both the translation and the attention matrices.
But it doesn't fit what the abstract class Decoder.decode_seq expects of the function.
I see a couple paths forward here:

Make decode_seq return a dictionary of results where keys correspond to different tensors.
Make Decoder.decode_seq return Tuple[Tensor, Tensor]
Make a new function called decode_seq_and_attention, and modify the logic everywhere that calls decode_seq.
Perhaps something completely different?

Which approach do you think might be most pleasing?
Thanks,
IP

mjdenkowski · 2024-05-16T16:09:54Z

Hi Ingus,

It looks like you've created a thorough implementation and tested it extensively. Given the size of the pull request and our current priorities, it may be some time before we get to this. We'll follow up when we start to review the code and your report.

Best,
Michael

iPRET · 2024-05-17T06:34:37Z

Hello Michael.

Before You do the code review, can you please give a quick opinion on the type error, so I can fix it up? (It's the last problem holding me from ticking the style_check.sh box) ༼ つ ◕.◕ ༽つ

Thanks,
IP

mjdenkowski · 2024-06-07T14:52:15Z

Hi Ingus,

Thank you for your interest in using and contributing to Sockeye!

Unfortunately, we were not able to include your new feature before the cutoff for transitioning Sockeye to maintenance mode. We appreciate the amount of work that went into developing and testing this feature and encourage you to share your extended version of Sockeye with others.

Best wishes on your future projects!

--Michael

Add alignment matrix learning

5af66b6

iPRET requested review from fhieber, tdomhan, mjdenkowski, xingniu and annacurrey as code owners May 15, 2024 13:31

iPRET added 3 commits May 16, 2024 12:31

Fix alignment matrix relevant type errors

2eecfb9

Update CHANGELOG.md with alignment matrix changes

7c15891

Change version to 3.1.38 (backwards compatible)

9c8f2e6

mjdenkowski closed this Jun 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add alignment matrix learning #1108

Add alignment matrix learning #1108

iPRET commented May 15, 2024 •

edited

iPRET commented May 16, 2024

mjdenkowski commented May 16, 2024

iPRET commented May 17, 2024 •

edited

mjdenkowski commented Jun 7, 2024

Add alignment matrix learning #1108

Add alignment matrix learning #1108

Conversation

iPRET commented May 15, 2024 • edited

Pull Request Checklist

iPRET commented May 16, 2024

mjdenkowski commented May 16, 2024

iPRET commented May 17, 2024 • edited

mjdenkowski commented Jun 7, 2024

iPRET commented May 15, 2024 •

edited

iPRET commented May 17, 2024 •

edited