
Question Answering using document retrieval

The dataset we have collected has this format:

{
  x: question,
  y: exact answer,
  z: passage/document
}
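For concreteness, a hypothetical entry in this format might look like the following (all values are invented purely for illustration):

```python
# Hypothetical example entry; question, answer, and passage are made up.
sample = {
    "x": "Who wrote Hamlet?",       # question
    "y": "William Shakespeare",     # exact answer
    "z": "Hamlet is a tragedy written by William Shakespeare "
         "around 1600.",            # passage/document
}
```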

Encoding

We need to convert the document (z) and the question (x) into embedding vectors.

Document encoding

What is document encoding for?

To convert each text document/passage in our dataset to an embedding vector.

How to do document encoding?

Using a model that takes text as input and produces a single vector as output.

Which model to use for document encoding?

A pretrained BERT model. The [CLS] token output of the BERT model can be treated as the document embedding/vector.
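As a minimal sketch (assuming the Hugging Face transformers library and the bert-base-uncased checkpoint, both illustrative choices, not fixed by this repo):

```python
import torch
from transformers import BertModel, BertTokenizer

# Sketch: encode a passage into a single vector via the [CLS] token of a
# frozen, pretrained BERT. The model name is an assumption.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
doc_encoder = BertModel.from_pretrained("bert-base-uncased")
doc_encoder.eval()  # the document encoder stays frozen

def encode_document(passage: str) -> torch.Tensor:
    inputs = tokenizer(passage, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        outputs = doc_encoder(**inputs)
    # [CLS] is the first token of the last hidden layer: shape (hidden_size,)
    return outputs.last_hidden_state[0, 0]
```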

Question encoding

What is question encoding for?

To convert a question to an embedding vector. The answer to the question lies in the documents, so the question vector should help us select the few documents carrying the answer. Its similarity score should be higher for the document vectors that correspond to the answer.

How to do question encoding?

Using a model that takes text as input and produces a single vector as output. This time the model should be trainable.

Which model to use for question encoding?

A BERT model. The [CLS] token output can be treated as the question embedding/vector. We will use the same pretrained BERT model as in the document encoder, but this time its weights are fine-tuned during training.
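Continuing the sketch above, the only difference is that the question encoder's weights stay trainable:

```python
# Sketch: same pretrained weights as the document encoder, but the
# parameters remain trainable (reuses the tokenizer from the sketch above).
question_encoder = BertModel.from_pretrained("bert-base-uncased")
question_encoder.train()

def encode_question(question: str) -> torch.Tensor:
    inputs = tokenizer(question, truncation=True, max_length=64,
                       return_tensors="pt")
    outputs = question_encoder(**inputs)  # no torch.no_grad(): gradients flow
    return outputs.last_hidden_state[0, 0]  # [CLS] vector
```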

Finding documents with the answer

We have a question vector. This vector tells us which documents are relevant (similar) to the question, so the question vector alone is sufficient for ranking the documents. But following this approach directly is computationally expensive: for every question, we have to compute a similarity score against every document before ranking, as the sketch below illustrates.
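A naive ranking sketch under that approach (documents is an assumed list of passages; this builds on the encoder sketches above and is illustrative, not the repo's implementation):

```python
import numpy as np

# Naive ranking: one dot product per document, so the cost grows linearly
# with corpus size for every incoming question.
doc_matrix = np.stack([encode_document(d).numpy() for d in documents])

def rank_all(question: str, top_k: int = 5) -> np.ndarray:
    q = encode_question(question).detach().numpy()
    scores = doc_matrix @ q            # similarity score for every document
    return np.argsort(-scores)[:top_k]
```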

Is there an alternative?

Yes: FAISS (Facebook AI Similarity Search).

This library makes the ranking computation much faster and cheaper, so we will use it to find the top-K matching documents.
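A minimal FAISS sketch, assuming an exact inner-product index built over the document vectors from the sketch above:

```python
import faiss

# Build a flat inner-product index over all document vectors, then retrieve
# the top-K documents for a question. IndexFlatIP is exact; approximate
# indexes (e.g. IVF) trade accuracy for speed on large corpora.
dim = doc_matrix.shape[1]                # e.g. 768 for bert-base
index = faiss.IndexFlatIP(dim)
index.add(doc_matrix.astype("float32"))

def retrieve(question: str, top_k: int = 5):
    q = encode_question(question).detach().numpy().astype("float32")
    scores, ids = index.search(q.reshape(1, -1), top_k)
    return ids[0], scores[0]
```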

Answer generation

Once we have the top-K documents, we concatenate the retrieved documents with the query (as raw text) and feed the result to the BART encoder. The BART model is trained to generate the exact answer or a semantically similar answer.
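A sketch of the generation step (the facebook/bart-base checkpoint and the separator format are assumptions, not fixed by this repo):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

bart_tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def generate_answer(question: str, passages: list) -> str:
    # Concatenate the query with the retrieved passages (raw text) and let
    # BART generate the answer from the combined input.
    text = question + " </s> " + " ".join(passages)
    inputs = bart_tokenizer(text, truncation=True, max_length=1024,
                            return_tensors="pt")
    output_ids = bart.generate(**inputs, max_new_tokens=64)
    return bart_tokenizer.decode(output_ids[0], skip_special_tokens=True)
```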
