Update model (#3)

- Update pytorch_model.bin (5ab2ed873c59b186fbc95b22d36bba37076033ae)
Fix _name_or_path in config.json (#2)
2022-11-09 20:11:59 +00:00 · 2022-10-16 23:15:00 +00:00 · 2022-09-21 15:11:27 +00:00 · 2022-09-21 12:50:20 +00:00 · 2022-09-14 11:50:17 +00:00 · 2022-09-08 15:12:32 -07:00
7 changed files with 22 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -1,11 +1,17 @@
 ---
 language: en
 license: cc-by-nc-sa-4.0
+pipeline_tag: document-question-answering
 tags:
 - layoutlm
 - document-question-answering
 - pdf
 - invoices
+widget:
+- text: "What is the invoice number?"
+  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png"
+- text: "What is the purchase amount?"
+  src: "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/contract.jpeg"
 ---

 # LayoutLM for Invoices
@@ -16,9 +22,18 @@ invoices as well as both [SQuAD2.0](https://huggingface.co/datasets/squad_v2) an
 ## Non-consecutive tokens

 Unlike other QA models, which can only extract consecutive tokens (because they predict the start and end of a sequence), this model can predict longer-range, non-consecutive sequences with an additional
-classifier head. For example, it can extract the two-line address as below:
+classifier head. For example, QA models often encounter this failure mode:

-![Two-line Address](./demo.png)
+### Before
+
+![Broken Address](./before.png)
+
+
+### After
+
+However, this model can predict non-consecutive tokens and therefore extracts the address correctly:
+
+![Two-line Address](./after.png)

 ## Getting started with the model

--- a/after.png
+++ b/after.png
--- a/before.png
+++ b/before.png
--- a/config.json
+++ b/config.json
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "impira/layoutlm-document-qa",
+  "_name_or_path": "impira/layoutlm-invoices",
  "architectures": [
    "LayoutLMForQuestionAnswering"
  ],
@@ -16,7 +16,7 @@
  "layer_norm_eps": 1e-05,
  "max_2d_position_embeddings": 1024,
  "max_position_embeddings": 514,
-  "model_type": "layoutlm-docquery",
+  "model_type": "layoutlm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
--- a/demo.png
+++ b/demo.png
--- a/pytorch_model.bin
+++ b/pytorch_model.bin
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -1 +1 @@
-{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "roberta-base"}
+{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "roberta-base", "add_prefix_space": true}
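
Note on the tokenizer_config.json change above: the new line contains "add_prefix_space" twice, first as false and then as true. How duplicate keys are resolved depends on the parser. Python's standard `json` module keeps the last occurrence, so under that parser the setting reads as true. The following is a minimal sketch of that behavior only; the `raw` string is a shortened copy of the line made for illustration, and it says nothing about how other JSON parsers treat the file.

```python
import json

# Shortened copy of the new tokenizer_config.json line (an assumption for
# illustration): only the duplicated key and one neighbouring key are kept.
raw = '{"add_prefix_space": false, "model_max_length": 512, "add_prefix_space": true}'

# json.loads keeps the last value seen for a repeated key.
config = json.loads(raw)
print(config["add_prefix_space"])

# object_pairs_hook receives every (key, value) pair in order, which makes
# the duplicate visible instead of silently collapsing it.
pairs = json.loads(raw, object_pairs_hook=list)
keys = [key for key, _ in pairs]
duplicates = sorted({key for key in keys if keys.count(key) > 1})
print(duplicates)
```

If the duplicate is unintended, dropping the earlier `"add_prefix_space": false` entry would leave the same effective value under this parser while removing the ambiguity.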
Author	SHA1	Message	Date
Richard Stebbing	416427ddd2	Update model (#3) - Update pytorch_model.bin (5ab2ed873c59b186fbc95b22d36bba37076033ae)	2022-11-09 20:11:59 +00:00
Ankur Goyal	479914aa1e	Fix _name_or_path in config.json (#2) - Fix _name_or_path in config.json (4d8385babf18df8230039850173b21ecc62694c1) Co-authored-by: Timo Witte <Spacefish007@users.noreply.huggingface.co>	2022-10-16 23:15:00 +00:00
Ankur Goyal	783b0c2ae8	Fix model_type (#1) - Fix model_type (70dd60f1c410191721676412e889a4dc3b442e5c) Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>	2022-09-21 15:11:27 +00:00
Niels Rogge	106a8afa8a	Update README.md	2022-09-21 12:50:20 +00:00
Ankur Goyal	b5e7d8a1b3	Update README.md	2022-09-14 11:50:17 +00:00
Ankur Goyal	f3db68b382	Update tokenizer to add_prefix_space by default	2022-09-08 15:12:32 -07:00
Ankur Goyal	7234eee0a1	Fix model-type name	2022-09-06 11:04:48 -07:00
Ankur Goyal	9d33a03f04	Improve titles	2022-09-06 11:04:13 -07:00
Ankur Goyal	dcab013d85	Fix "before" image	2022-09-06 11:02:52 -07:00
Ankur Goyal	0771b61a10	Improve README	2022-09-06 11:01:39 -07:00
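
The README change in this diff contrasts QA models that predict a start and an end position with a model that can select non-consecutive tokens. The toy sketch below illustrates only that difference in output shape. It is not the model's implementation: the token list, the `keep` mask, and both function names are hypothetical, and the mask stands in for whatever the additional classifier head would predict.

```python
from typing import List


def extract_span(tokens: List[str], start: int, end: int) -> List[str]:
    """Span-style extraction: return every token from start to end, inclusive."""
    return tokens[start : end + 1]


def extract_selected(tokens: List[str], keep: List[bool]) -> List[str]:
    """Per-token selection: return only the tokens flagged in the mask."""
    return [token for token, flag in zip(tokens, keep) if flag]


# Hypothetical reading order for a two-column layout: the two address lines
# are separated by a field from the neighbouring column.
tokens = ["123", "Main", "St", "Invoice", "#42", "Springfield", "IL"]

# Hypothetical per-token decisions (True = part of the answer).
keep = [True, True, True, False, False, True, True]

# A start/end span that covers both address lines must also include the
# unrelated tokens that sit between them.
span = extract_span(tokens, 0, 6)
selected = extract_selected(tokens, keep)

print(" ".join(span))
print(" ".join(selected))
```

In this toy input, the span necessarily carries the intervening "Invoice #42" tokens, while the masked selection skips them, which is the failure mode and fix that the before/after images in the README are meant to show.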