Compare commits
No commits in common. "416427ddd23a0b11a6e45302125158e09eb20cab" and "1a67f8ef3f829f7ad6ad7b320ede8b245c33e067" have entirely different histories.
416427ddd2...1a67f8ef3f
README.md (19 changes)
@@ -1,17 +1,11 @@
 ---
 language: en
 license: cc-by-nc-sa-4.0
-pipeline_tag: document-question-answering
 tags:
 - layoutlm
 - document-question-answering
 - pdf
 - invoices
-widget:
-- text: "What is the invoice number?"
-  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png"
-- text: "What is the purchase amount?"
-  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/contract.jpeg"
 ---
 
 # LayoutLM for Invoices
 
@@ -22,18 +16,9 @@ invoices as well as both [SQuAD2.0](https://huggingface.co/datasets/squad_v2) an
 ## Non-consecutive tokens
 
 Unlike other QA models, which can only extract consecutive tokens (because they predict the start and end of a sequence), this model can predict longer-range, non-consecutive sequences with an additional
-classifier head. For example, QA models often encounter this failure mode:
+classifier head. For example, it can extract the two-line address as below:
 
-### Before
-
-![Broken Address](./before.png)
-
-
-### After
-
-However this model is able to predict non-consecutive tokens and therefore the address correctly:
-
-![Two-line Address](./after.png)
+![Broken Address](./after.png)
 
 ## Getting started with the model
 
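Review note on the "Non-consecutive tokens" hunk: the README contrasts start/end span extraction with a per-token classifier head. The sketch below illustrates only that contrast. The function names, the threshold, the toy tokens, and the scores are all made up for illustration; this is not the model's actual decoding code.

```python
from typing import List


def extract_span(tokens: List[str], start: int, end: int) -> str:
    """Start/end decoding: the answer is every token between two indices."""
    return " ".join(tokens[start : end + 1])


def extract_by_token_labels(
    tokens: List[str], scores: List[float], threshold: float = 0.5
) -> str:
    """Per-token decoding: keep each token whose (hypothetical) score clears the threshold."""
    return " ".join(t for t, s in zip(tokens, scores) if s >= threshold)


# Toy reading order for a two-column invoice: the two address lines are
# separated by an unrelated field from the neighbouring column.
tokens = ["123", "Main", "St", "Invoice", "#42", "Springfield", "IL"]
scores = [0.9, 0.9, 0.8, 0.1, 0.1, 0.9, 0.7]  # made-up classifier outputs

# A single contiguous span that covers both address lines also swallows
# the unrelated field in between.
span_answer = extract_span(tokens, 0, 6)

# Per-token selection can skip the unrelated field.
label_answer = extract_by_token_labels(tokens, scores)

print(span_answer)
print(label_answer)
```

With these toy inputs, the span decoder has to choose between one address line and a span polluted by the invoice field, while the per-token selection returns both address lines.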
BIN before.png
Binary file not shown.
Before: Size: 30 KiB
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "impira/layoutlm-invoices",
+  "_name_or_path": "impira/layoutlm-document-qa",
   "architectures": [
     "LayoutLMForQuestionAnswering"
   ],
@@ -16,7 +16,7 @@
   "layer_norm_eps": 1e-05,
   "max_2d_position_embeddings": 1024,
   "max_position_embeddings": 514,
-  "model_type": "layoutlm",
+  "model_type": "layoutlm-docquery",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
   "pad_token_id": 1,
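Review note on the two model config hunks: a quick way to confirm which keys differ is to compare the two versions as dictionaries. The sketch below includes only the keys visible in the hunks above; `changed_keys`, `left_config`, and `right_config` are hypothetical names, and the full config files contain more keys than are shown here.

```python
from typing import Any, Dict, Tuple


def changed_keys(
    old: Dict[str, Any], new: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each key whose value differs to an (old, new) pair; None marks a missing key."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# Only the keys visible in the hunks; the real files have more.
left_config = {
    "_name_or_path": "impira/layoutlm-invoices",
    "architectures": ["LayoutLMForQuestionAnswering"],
    "max_position_embeddings": 514,
    "model_type": "layoutlm",
}
right_config = {
    "_name_or_path": "impira/layoutlm-document-qa",
    "architectures": ["LayoutLMForQuestionAnswering"],
    "max_position_embeddings": 514,
    "model_type": "layoutlm-docquery",
}

differences = changed_keys(left_config, right_config)
for key, (old_value, new_value) in differences.items():
    print(f"{key}: {old_value!r} -> {new_value!r}")
```

The `model_type` change is the one to check when loading: a loader generally has to recognize that string, and nothing on this page shows whether `layoutlm-docquery` is registered in a given environment.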
BIN pytorch_model.bin (Stored with Git LFS)
Binary file not shown.
@@ -1 +1 @@
-{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "roberta-base", "add_prefix_space": true}
+{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "roberta-base"}
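Review note on the one-line tokenizer config hunk: the first version lists `add_prefix_space` twice (`false`, then `true` at the end), and the second version keeps only the `false` entry. Python's standard `json` module keeps the last value for a repeated name, so under that parser the effective value flips from `true` to `false`. The sketch below demonstrates this on shortened copies of the two lines; `duplicate_keys` is a hypothetical helper, and how any particular tokenizer loader treats the duplicate is not shown on this page.

```python
import json
from collections import Counter
from typing import List


def duplicate_keys(text: str) -> List[str]:
    """Return every key that appears more than once within a single JSON object."""
    dupes: List[str] = []

    def hook(pairs):
        # object_pairs_hook receives the raw (key, value) pairs before
        # duplicates are collapsed into a dict.
        counts = Counter(key for key, _ in pairs)
        dupes.extend(key for key, n in counts.items() if n > 1)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return dupes


# Shortened copies of the two versions, keeping only the relevant keys.
left = '{"add_prefix_space": false, "name_or_path": "roberta-base", "add_prefix_space": true}'
right = '{"add_prefix_space": false, "name_or_path": "roberta-base"}'

left_value = json.loads(left)["add_prefix_space"]    # last duplicate wins
right_value = json.loads(right)["add_prefix_space"]

print(duplicate_keys(left), left_value)
print(duplicate_keys(right), right_value)
```

If the two versions are expected to tokenize identically, this key is worth checking explicitly rather than assuming the removal is cosmetic.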