Multimodal classification - multiple images for each record #3343

Answered by abidwael
anakin87 asked this question in Q&A
Hi @anakin87, your task fits right in and we're glad you stumbled across Ludwig!

Averaging the image embeddings is a good idea. Here are a few more:

  1. If a record contains more than one image, you can unravel it into multiple training examples. For example, if one training example has four images A, B, C and D, one text E, and one label F, you would transform it into four records, each with a different image and the same text and label. Your config would be:
input_features:
  - name: image_feature
    type: image
  - name: text_feature
    type: text
output_features:
  - name: label
    type: category  # or another output feature type
combiner:
  ...
trainer:
  ...
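
The unravelling step described above could be sketched roughly as follows. This is only an illustration using the Python standard library, not part of Ludwig itself: the `unravel_records` helper, the `image_paths` key, and the example file names are all hypothetical, and the output column names are simply chosen to match the feature names in the config.

```python
import csv
import io


def unravel_records(records, images_key="image_paths"):
    """Turn each multi-image record into one row per image.

    `images_key` is a hypothetical field holding the list of image
    paths for a record; text and label are copied onto every row.
    """
    rows = []
    for record in records:
        for image_path in record[images_key]:
            rows.append(
                {
                    "image_feature": image_path,
                    "text_feature": record["text_feature"],
                    "label": record["label"],
                }
            )
    return rows


# One record with four images A-D, one text E, and one label F.
records = [
    {
        "image_paths": ["A.jpg", "B.jpg", "C.jpg", "D.jpg"],
        "text_feature": "E",
        "label": "F",
    }
]
rows = unravel_records(records)

# Serialize the unravelled rows as CSV (in memory here, for brevity).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["image_feature", "text_feature", "label"]
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Each of the four resulting rows carries a different image with the same text and label, so the CSV has one image column and one text column, as the config expects.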

I'd go with this opti…
