impira/layoutlm-invoices is a model repository hosted on the Hugging Face Hub. License: cc-by-nc-sa-4.0
Latest commit: Richard Stebbing, 416427ddd2, "Update model (#3)" (updates pytorch_model.bin, 5ab2ed873c59b186fbc95b22d36bba37076033ae), 2022-11-09 20:11:59 +00:00

README.md

---
language: en
license: cc-by-nc-sa-4.0
pipeline_tag: document-question-answering
tags:
- layoutlm
- document-question-answering
- pdf
- invoices
widget:
- text: "What is the invoice number?"
  src: "2359223c18/invoice.png"
- text: "What is the purchase amount?"
  src: "2359223c18/contract.jpeg"
---

LayoutLM for Invoices

This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on invoices and other documents. It has been fine-tuned on a proprietary dataset of invoices as well as both SQuAD2.0 and DocVQA for general comprehension.

Non-consecutive tokens

Unlike other QA models, which can only extract consecutive tokens (because they predict the start and end of a sequence), this model can predict longer-range, non-consecutive sequences with an additional classifier head. For example, QA models often encounter this failure mode:
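The difference can be sketched with a toy example (this is an illustration of the general idea, not the model's actual head or weights): a start/end span head must return one contiguous range, whereas thresholding independent per-token "answer" scores can select tokens that are separated in reading order.

```python
# Toy illustration of per-token answer classification vs. span extraction.
# The token order mimics OCR reading order, where unrelated text
# ("Invoice #42") falls between the two lines of an address.
tokens = ["Ship", "to", ":", "123", "Main", "St", "Invoice", "#42", "Springfield", "IL"]

# Hypothetical per-token scores from a token-classification head.
scores = [0.02, 0.03, 0.01, 0.92, 0.95, 0.91, 0.05, 0.04, 0.90, 0.88]

# Keep every token scored above a threshold, contiguous or not.
answer = [t for t, s in zip(tokens, scores) if s > 0.5]
print(answer)  # ['123', 'Main', 'St', 'Springfield', 'IL']
```

A span-based head answering "What is the address?" on this sequence would be forced to either truncate the address or include "Invoice #42", since it can only predict one start and one end index.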

Before

Broken Address

After

However, this model is able to predict non-consecutive tokens and therefore extracts the address correctly:

Two-line Address

Getting started with the model

The best way to use this model is via DocQuery.
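The model can also be loaded directly through the Hugging Face transformers `document-question-answering` pipeline. A minimal sketch (the helper function name and the local image path are illustrative; running the query downloads the model weights and requires an OCR backend such as pytesseract for plain image inputs):

```python
# Requires: pip install transformers torch pillow pytesseract
from transformers import pipeline


def ask_document(image_path: str, question: str):
    """Query the invoice QA model; weights are downloaded on first call."""
    nlp = pipeline(
        "document-question-answering",
        model="impira/layoutlm-invoices",
    )
    return nlp(image_path, question)


# Example usage, assuming a local file "invoice.png":
# ask_document("invoice.png", "What is the invoice number?")
```

DocQuery wraps this same pipeline with document loading and OCR handling built in, which is why it is the recommended entry point.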

About us

This model was created by the team at Impira.