VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)

nttmdlab-nlp/VisualMRC

VisualMRC

VisualMRC is a visual machine reading comprehension dataset. It defines the following task: given a question and a document image, a model produces an abstractive answer.

Figure 1 from paper

You can find more details, analyses, and baseline results in our paper. You can cite it as follows:

@inproceedings{VisualMRC2021,
  author    = {Ryota Tanaka and
               Kyosuke Nishida and
               Sen Yoshida},
  title     = {VisualMRC: Machine Reading Comprehension on Document Images},
  booktitle = {AAAI},
  year      = {2021}
}

Statistics

  • 10,197 images
  • 30,562 QA pairs
  • 10.53 average question tokens (tokenized with the NLTK tokenizer)
  • 9.53 average answer tokens (tokenized with the NLTK tokenizer)
  • 151.46 average OCR tokens (tokenized with the NLTK tokenizer)
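
As a rough illustration of how the per-item averages above could be recomputed, the sketch below averages token counts over a list of strings. The function name `average_token_count` and the sample questions are hypothetical. The default tokenizer here is plain whitespace splitting, a stand-in rather than the NLTK tokenizer used for the reported numbers, so the counts will differ (NLTK typically splits punctuation into separate tokens). A callable such as `nltk.word_tokenize` can be passed in if NLTK and its tokenizer data are installed.

```python
def average_token_count(texts, tokenize=str.split):
    """Return the mean number of tokens per text.

    `tokenize` is any callable mapping a string to a list of tokens.
    The default (whitespace splitting) is only a stand-in for NLTK.
    """
    if not texts:
        return 0.0
    return sum(len(tokenize(text)) for text in texts) / len(texts)


# Made-up questions, not taken from the dataset.
sample_questions = [
    "What is the title of the page?",
    "Who wrote it?",
]

print(average_token_count(sample_questions))
```

Passing the tokenizer as a parameter keeps the averaging logic independent of the tokenization choice, which makes it easy to compare against the figures reported above.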

Get Started

To use the dataset, including the ground-truth annotations, please contact us at [email protected] and let us know your name, institution, and purpose.

Dataset Format

id: "image id",
url: "URL",
screenshot_filename: "screenshot file name",
image_filename: "image file name",
bounding_boxes: [
  {
    id: "bounding box id",
    structure: "semantic class of the bounding box",
    shape:
      {
        x: "INT, Top left x coordinate of the bounding box",
        y: "INT, Top left y coordinate of the bounding box",
        width: "INT, Width of the ROI bounding box",
        height: "INT, Height of the bounding box"
      },
    ocr_info: [
      {
        word: "OCR token",
        confidence: "Confidence score produced by Tesseract",
        bbox:
          {
            x: "INT, Top left x coordinate of the OCR bounding box",
            y: "INT, Top left y coordinate of the OCR bounding box",
            width: "INT, Width of the OCR bounding box",
            height: "INT, Height of the OCR bounding box"
          }
      }
    ]
  }
],
qa_data: [
  {
    question:
      {
        text: "question"
      },
    answer:
      {
        text: "answer",
        relevant: ["relevant bounding boxes needed to answer the question"]
      }
  }
]
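
The field layout above can be explored with a few lines of Python. The following is a minimal sketch, not the authors' loading code. It assumes each record is stored as JSON with the fields listed above, that `relevant` holds bounding box ids, and that `confidence` is numeric. The sample record, its values (including the `structure` label), and the helper names `box_corners`, `box_text`, and `qa_with_context` are all hypothetical.

```python
import json

# Hypothetical record following the documented fields; every value is made up.
SAMPLE_JSON = """
{
  "id": "0001",
  "url": "https://example.com/page",
  "screenshot_filename": "screenshot-0001.png",
  "image_filename": "image-0001.png",
  "bounding_boxes": [
    {
      "id": "b1",
      "structure": "Paragraph",
      "shape": {"x": 10, "y": 20, "width": 200, "height": 50},
      "ocr_info": [
        {"word": "Hello", "confidence": 96.0,
         "bbox": {"x": 12, "y": 22, "width": 40, "height": 12}},
        {"word": "world", "confidence": 41.5,
         "bbox": {"x": 56, "y": 22, "width": 42, "height": 12}}
      ]
    }
  ],
  "qa_data": [
    {
      "question": {"text": "What does the page say?"},
      "answer": {"text": "It says hello world.", "relevant": ["b1"]}
    }
  ]
}
"""


def box_corners(shape):
    """Convert a top-left (x, y, width, height) box to (left, top, right, bottom)."""
    left, top = shape["x"], shape["y"]
    return (left, top, left + shape["width"], top + shape["height"])


def box_text(box, min_confidence=0.0):
    """Join the OCR tokens of one bounding box, skipping low-confidence tokens."""
    return " ".join(
        token["word"]
        for token in box["ocr_info"]
        if float(token["confidence"]) >= min_confidence
    )


def qa_with_context(record):
    """Yield (question, answer, relevant_text) for each QA pair in a record.

    Assumes `relevant` lists bounding box ids; unknown ids are skipped.
    """
    boxes_by_id = {box["id"]: box for box in record["bounding_boxes"]}
    for qa in record["qa_data"]:
        relevant = [
            boxes_by_id[box_id]
            for box_id in qa["answer"]["relevant"]
            if box_id in boxes_by_id
        ]
        context = " ".join(box_text(box) for box in relevant)
        yield qa["question"]["text"], qa["answer"]["text"], context


record = json.loads(SAMPLE_JSON)
for question, answer, context in qa_with_context(record):
    print(question, "|", answer, "|", context)
```

The corner tuple returned by `box_corners` follows the (left, top, right, bottom) order that many image libraries expect for cropping, which is why the conversion is kept as a separate helper.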
