Compare commits
10 Commits: 1a67f8ef3f...416427ddd2
| Author | SHA1 | Date |
|---|---|---|
| | 416427ddd2 | |
| | 479914aa1e | |
| | 783b0c2ae8 | |
| | 106a8afa8a | |
| | b5e7d8a1b3 | |
| | f3db68b382 | |
| | 7234eee0a1 | |
| | 9d33a03f04 | |
| | dcab013d85 | |
| | 0771b61a10 | |
19
README.md
@@ -1,11 +1,17 @@
---
language: en
license: cc-by-nc-sa-4.0
pipeline_tag: document-question-answering
tags:
- layoutlm
- document-question-answering
- pdf
- invoices
widget:
- text: "What is the invoice number?"
  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png"
- text: "What is the purchase amount?"
  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/contract.jpeg"
---

# LayoutLM for Invoices

@@ -16,9 +22,18 @@ invoices as well as both [SQuAD2.0](https://huggingface.co/datasets/squad_v2) an
## Non-consecutive tokens

Unlike other QA models, which can only extract consecutive tokens (because they predict the start and end of a sequence), this model can predict longer-range, non-consecutive sequences with an additional
classifier head. For example, it can extract the two-line address as below:
classifier head. For example, QA models often encounter this failure mode:

![Invoice example](./funny_case.png)
### Before

![Broken Address](./before.png)


### After

However this model is able to predict non-consecutive tokens and therefore the address correctly:

![Two-line Address](./after.png)

## Getting started with the model


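The README hunk above contrasts start/end span prediction with an additional classifier head that can pick non-consecutive tokens. The following is a minimal sketch of that contrast in plain Python, not the model's actual implementation: the function names, the token list, the scores, and the 0.5 threshold are all hypothetical values chosen for illustration, and it assumes the extra head scores each token independently.

```python
from typing import List


def extract_span(tokens: List[str], start_scores: List[float],
                 end_scores: List[float]) -> List[str]:
    """Start/end decoding: the answer is one contiguous slice of tokens."""
    start = max(range(len(tokens)), key=lambda i: start_scores[i])
    # The end index is searched only at or after the chosen start.
    end = max(range(start, len(tokens)), key=lambda i: end_scores[i])
    return tokens[start:end + 1]


def extract_by_token(tokens: List[str], token_probs: List[float],
                     threshold: float = 0.5) -> List[str]:
    """Per-token decoding (assumed behaviour of the extra head):
    keep every token whose score clears the threshold, contiguous or not."""
    return [tok for tok, p in zip(tokens, token_probs) if p >= threshold]


# Hypothetical reading order in which an invoice number sits between
# the two lines of an address.
tokens = ["123", "Main", "St", "Invoice", "#", "4521", "Springfield", "IL"]
start_scores = [0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
end_scores = [0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.9]
token_probs = [0.9, 0.9, 0.8, 0.1, 0.1, 0.2, 0.9, 0.8]

span_answer = extract_span(tokens, start_scores, end_scores)
token_answer = extract_by_token(tokens, token_probs)

print(" ".join(span_answer))   # contiguous slice, includes the invoice number
print(" ".join(token_answer))  # address tokens only
```

With these made-up scores, the span decoder has to swallow the unrelated tokens between the two address lines, while the per-token decoder can skip them.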
Binary file not shown.
After Width: | Height: | Size: 30 KiB |
@@ -1,5 +1,5 @@
{
  "_name_or_path": "impira/layoutlm-document-qa",
  "_name_or_path": "impira/layoutlm-invoices",
  "architectures": [
    "LayoutLMForQuestionAnswering"
  ],
@@ -16,7 +16,7 @@
  "layer_norm_eps": 1e-05,
  "max_2d_position_embeddings": 1024,
  "max_position_embeddings": 514,
  "model_type": "layoutlm-docquery",
  "model_type": "layoutlm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
BIN
pytorch_model.bin (Stored with Git LFS)
Binary file not shown.
@@ -1 +1 @@
{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "roberta-base"}
{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "roberta-base", "add_prefix_space": true}
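The second tokenizer line in the hunk above keeps `"add_prefix_space": false` and also appends `"add_prefix_space": true`, so the key appears twice in one JSON object. A minimal sketch of how that resolves, assuming the file is read with Python's standard `json` module (an assumption about the loader, which the diff itself does not state): `raw` is a shortened, hypothetical stand-in for the full line, and `find_duplicate_keys` is an illustrative helper, not part of any library.

```python
import json

# Shortened stand-in for the changed line: same key given twice.
raw = '{"add_prefix_space": false, "model_max_length": 512, "add_prefix_space": true}'

# By default, json.loads keeps the last value seen for a repeated key.
parsed = json.loads(raw)
print(parsed["add_prefix_space"])


def find_duplicate_keys(pairs):
    """object_pairs_hook helper: return the keys that occur more than once."""
    seen, dupes = set(), []
    for key, _value in pairs:
        if key in seen:
            dupes.append(key)
        seen.add(key)
    return dupes


# With the hook installed, the parse result is the list of repeated keys
# for this flat object, which makes the duplication easy to flag in review.
dupes = json.loads(raw, object_pairs_hook=find_duplicate_keys)
print(dupes)
```

Under that assumption the later `true` takes effect, but other JSON parsers may handle repeated keys differently, so dropping the earlier `false` entry would make the intent unambiguous.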