Update README.md

Adding generation config file(s)
Update README.md
2023-03-14 11:51:54 +00:00 · 2023-01-24 17:02:46 +00:00 · 2022-10-06 02:45:08 +00:00 · 2022-07-14 10:12:06 +00:00 · 2022-05-17 08:26:08 +00:00 · 2022-05-05 07:01:43 +00:00
5 changed files with 128 additions and 40 deletions
--- a/README.md
+++ b/README.md
@@ -1,3 +1,78 @@
---
-license: mit
---
+---
+language: en
+tags:
+- tapex
+- table-question-answering
+datasets:
+- wikitablequestions
+license: mit
+---
+
+# TAPEX (large-sized model) 
+
+TAPEX was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found [here](https://github.com/microsoft/Table-Pretraining).
+
+## Model description
+
+TAPEX (**Ta**ble **P**re-training via **Ex**ecution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with *table reasoning* skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
+
+TAPEX is based on the BART architecture, a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
+
+This model is the `tapex-large` model fine-tuned on the [WikiTableQuestions](https://huggingface.co/datasets/wikitablequestions) dataset.
+
+## Intended Uses
+
+You can use the model for table question answering on *complex* questions. Some **solvable** questions are shown below (corresponding tables not shown):
+
+| Question | Answer |
+|:---: |:---:|
+| according to the table, what is the last title that spicy horse produced? | Akaneiro: Demon Hunters |
+| what is the difference in runners-up from coleraine academical institution and royal school dungannon? | 20 |
+| what were the first and last movies greenstreet acted in? | The Maltese Falcon, Malaya |
+| in which olympic games did arasay thondike not finish in the top 20? | 2012 |
+| which broadcaster hosted 3 titles but they had only 1 episode? | Channel 4 |
+
+
+### How to Use
+
+Here is how to use this model in transformers:
+
+```python
+from transformers import TapexTokenizer, BartForConditionalGeneration
+import pandas as pd
+
+tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-large-finetuned-wtq")
+model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-large-finetuned-wtq")
+
+data = {
+    "year": [1896, 1900, 1904, 2004, 2008, 2012],
+    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
+}
+table = pd.DataFrame.from_dict(data)
+
+# TAPEX accepts uncased input because it was pre-trained on an uncased corpus
+query = "In which year did beijing host the Olympic Games?"
+encoding = tokenizer(table=table, query=query, return_tensors="pt")
+
+outputs = model.generate(**encoding)
+
+print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
+# [' 2008.0']
+```
+
+### How to Eval
+
+Please find the eval script [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/tapex).
+
+### BibTeX entry and citation info
+
+```bibtex
+@inproceedings{
+    liu2022tapex,
+    title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
+    author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
+    booktitle={International Conference on Learning Representations},
+    year={2022},
+    url={https://openreview.net/forum?id=O50443AsCP}
+}
+```
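The usage snippet in the README above decodes to `[' 2008.0']`, with a leading space and a float-style year. A minimal sketch of one way to tidy such strings before comparing them with reference answers follows; `normalize_answer` is a hypothetical helper introduced here for illustration, not part of `transformers` or the TAPEX repository.

```python
def normalize_answer(decoded: str) -> str:
    """Strip whitespace and drop a trailing '.0' from integer-valued numbers.

    Hypothetical post-processing helper (an assumption, not the library's API).
    """
    text = decoded.strip()
    try:
        value = float(text)
    except ValueError:
        # Not a number (e.g. a title or a name): return the stripped text.
        return text
    if value.is_integer():
        return str(int(value))
    # Non-integer numbers are left as written.
    return text


answers = [" 2008.0", " Channel 4", " 3.5"]
print([normalize_answer(a) for a in answers])
# ['2008', 'Channel 4', '3.5']
```

Whether this normalization is appropriate depends on the evaluation protocol; the official eval script linked in the README may apply its own rules.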
--- a/config.json
+++ b/config.json
@@ -1,37 +1,37 @@
-{
-  "_name_or_path": "tapex-large-finetuned-wtq",
-  "activation_dropout": 0.0,
-  "activation_function": "gelu",
-  "architectures": [
-    "BartForConditionalGeneration"
-  ],
-  "attention_dropout": 0.1,
-  "bos_token_id": 0,
-  "classifier_dropout": 0.0,
-  "d_model": 1024,
-  "decoder_attention_heads": 16,
-  "decoder_ffn_dim": 4096,
-  "decoder_layerdrop": 0.0,
-  "decoder_layers": 12,
-  "decoder_start_token_id": 2,
-  "dropout": 0.1,
-  "encoder_attention_heads": 16,
-  "encoder_ffn_dim": 4096,
-  "encoder_layerdrop": 0.0,
-  "encoder_layers": 12,
-  "eos_token_id": 2,
-  "forced_bos_token_id": 0,
-  "forced_eos_token_id": 2,
-  "init_std": 0.02,
-  "is_encoder_decoder": true,
-  "max_length": 1024,
-  "max_position_embeddings": 1024,
-  "model_type": "bart",
-  "num_hidden_layers": 12,
-  "pad_token_id": 1,
-  "scale_embedding": false,
-  "torch_dtype": "float32",
-  "transformers_version": "4.17.0.dev0",
-  "use_cache": true,
-  "vocab_size": 50265
-}
+{
+  "_name_or_path": "microsoft/tapex-large-finetuned-wtq",
+  "activation_dropout": 0.0,
+  "activation_function": "gelu",
+  "architectures": [
+    "BartForConditionalGeneration"
+  ],
+  "attention_dropout": 0.1,
+  "bos_token_id": 0,
+  "classifier_dropout": 0.0,
+  "d_model": 1024,
+  "decoder_attention_heads": 16,
+  "decoder_ffn_dim": 4096,
+  "decoder_layerdrop": 0.0,
+  "decoder_layers": 12,
+  "decoder_start_token_id": 2,
+  "dropout": 0.1,
+  "encoder_attention_heads": 16,
+  "encoder_ffn_dim": 4096,
+  "encoder_layerdrop": 0.0,
+  "encoder_layers": 12,
+  "eos_token_id": 2,
+  "forced_bos_token_id": 0,
+  "forced_eos_token_id": 2,
+  "init_std": 0.02,
+  "is_encoder_decoder": true,
+  "max_length": 1024,
+  "max_position_embeddings": 1024,
+  "model_type": "bart",
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "scale_embedding": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.17.0.dev0",
+  "use_cache": true,
+  "vocab_size": 50265
+}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,11 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "decoder_start_token_id": 2,
+  "eos_token_id": 2,
+  "forced_bos_token_id": 0,
+  "forced_eos_token_id": 2,
+  "max_length": 1024,
+  "pad_token_id": 1,
+  "transformers_version": "4.27.0.dev0"
+}
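The new `generation_config.json` is flagged `"_from_model_config": true`, and its generation-related keys appear to mirror those already present in `config.json`. A quick sanity check of that claim, using only the values visible in this diff (the `IGNORED` set is a choice made for this sketch):

```python
import json

# Generation-related subset of config.json, copied from the diff above.
model_config = json.loads("""
{
  "bos_token_id": 0,
  "decoder_start_token_id": 2,
  "eos_token_id": 2,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "max_length": 1024,
  "pad_token_id": 1,
  "transformers_version": "4.17.0.dev0"
}
""")

# Full generation_config.json, copied from the diff above.
generation_config = json.loads("""
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "decoder_start_token_id": 2,
  "eos_token_id": 2,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "max_length": 1024,
  "pad_token_id": 1,
  "transformers_version": "4.27.0.dev0"
}
""")

# Keys that are metadata rather than generation parameters.
IGNORED = {"_from_model_config", "transformers_version"}

# Collect every generation parameter whose value differs from config.json.
mismatches = {
    key: (model_config.get(key), value)
    for key, value in generation_config.items()
    if key not in IGNORED and model_config.get(key) != value
}
print(mismatches)
# {}
```

An empty result means the only differences between the two files are the `_from_model_config` flag and the `transformers_version` string.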
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1 @@
+{"do_lower_case": true, "errors": "replace", "bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": true, "max_cell_length": 15, "model_max_length": 1024, "special_tokens_map_file": null, "name_or_path": "microsoft/tapex-large-finetuned-wtq", "use_fast": true, "tokenizer_class": "TapexTokenizer"}
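The tokenizer config above sets `"do_lower_case": true`, which matches the README's note about uncased input. As a rough illustration of what the model's text input looks like, here is a simplified sketch of the table-flattening convention TAPEX is generally described as using (`col : ... row 1 : ...`). The `linearize_table` function is a hypothetical stand-in written for this note, it is not `TapexTokenizer`, and it does not reproduce the `max_cell_length` truncation.

```python
def linearize_table(header, rows, lowercase=True):
    """Flatten a table into one string (simplified, assumed TAPEX-style format)."""
    # Header cells are joined with " | " after a "col : " prefix.
    parts = ["col : " + " | ".join(str(h) for h in header)]
    # Each row is prefixed with its 1-based index.
    for index, row in enumerate(rows, start=1):
        parts.append(f"row {index} : " + " | ".join(str(cell) for cell in row))
    text = " ".join(parts)
    # Mirrors the effect of "do_lower_case": true in the tokenizer config.
    return text.lower() if lowercase else text


header = ["year", "city"]
rows = [[1896, "Athens"], [1900, "Paris"]]
print(linearize_table(header, rows))
# col : year | city row 1 : 1896 | athens row 2 : 1900 | paris
```

In practice the tokenizer handles this step internally, so the sketch is only meant to make the input format easier to picture.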
--- a/vocab.json
+++ b/vocab.json
Author	SHA1	Message	Date
Qian Liu	8338affcb0	Update README.md	2023-03-14 11:51:54 +00:00
Joao Gante	a755f6ff34	Adding generation config file(s)	2023-01-24 17:02:46 +00:00
Qian Liu	ec1a11f879	Update README.md Update the old URL link with the latest one.	2022-10-06 02:45:08 +00:00
Niels Rogge	0cd74d3e2a	Update README.md	2022-07-14 10:12:06 +00:00
Qian Liu	f5d2d80895	Update config.json	2022-05-17 08:26:08 +00:00
Niels Rogge	59fa639a30	Update README.md	2022-05-05 07:01:43 +00:00
Qian Liu	03c7da25f9	Update README.md	2022-03-10 08:26:35 +00:00
Qian Liu	08b43d3f06	Update README.md	2022-03-10 05:45:49 +00:00
Qian Liu	881e372614	Upload vocab.json	2022-03-10 05:40:57 +00:00
Qian Liu	bcdb8e3138	Upload tokenizer_config.json	2022-03-10 05:40:50 +00:00