Adding generation config file(s)

Update README.md
Update the old link with official link.
2023-01-24 17:02:43 +00:00 · 2022-10-06 02:48:19 +00:00 · 2022-05-17 08:26:50 +00:00 · 2022-05-05 07:02:33 +00:00 · 2022-03-10 05:28:45 +00:00 · 2022-03-10 05:04:00 +00:00
5 changed files with 55 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -1,3 +1,39 @@
----
-license: mit
----
+---
+language: en
+tags:
+- tapex
+- table-question-answering
+license: mit
+---
+
+# TAPEX (large-sized model) 
+
+TAPEX was proposed in [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found [here](https://github.com/microsoft/Table-Pretraining).
+
+## Model description
+
+TAPEX (**Ta**ble **P**re-training via **Ex**ecution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with *table reasoning* skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.
+
+TAPEX is based on the BART architecture, a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
+
+## Intended Uses
+
+⚠️ This model checkpoint is intended **ONLY** for fine-tuning on downstream tasks. You **CANNOT** use it to simulate neural SQL execution, i.e., to have TAPEX execute a SQL query over a given table. The checkpoint that can execute SQL queries neurally is available [here](https://huggingface.co/microsoft/tapex-large-sql-execution).
+> The two uses are served by separate checkpoints because of a known issue with BART-large; see [this comment](https://github.com/huggingface/transformers/issues/15559#issuecomment-1062880564) for more details.
+
+### How to Fine-tune
+
+Please find the fine-tuning script [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/tapex).
+
+### BibTeX entry and citation info
+
+```bibtex
+@inproceedings{
+    liu2022tapex,
+    title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
+    author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
+    booktitle={International Conference on Learning Representations},
+    year={2022},
+    url={https://openreview.net/forum?id=O50443AsCP}
+}
+```
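The model description in the README says TAPEX is pre-trained by learning to execute synthesized SQL queries over tables. The sketch below illustrates what one such (input, target) pair might look like, under stated assumptions: Python's built-in `sqlite3` acts as the real executor that produces the target answer, and the table is flattened as `col : h1 | h2 row 1 : v1 | v2 ...`, a format modelled on TAPEX's linearization (treat the exact format as an assumption). The helper names, table, and queries are hypothetical; this is not the authors' corpus-synthesis pipeline.

```python
import sqlite3


def linearize_table(header, rows):
    """Flatten a table as 'col : h1 | h2 row 1 : v1 | v2 ...' (format assumed, modelled on TAPEX)."""
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(v) for v in row))
    return " ".join(parts)


def make_execution_pair(sql, header, rows, table_name="t"):
    """Run `sql` on an in-memory copy of the table and return (source, target).

    source: the SQL query followed by the flattened table (the model input).
    target: the execution result as text (what a neural SQL executor learns to generate).
    """
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{h}"' for h in header)
        conn.execute(f"CREATE TABLE {table_name} ({columns})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO {table_name} VALUES ({placeholders})", rows)
        result = conn.execute(sql).fetchall()
    finally:
        conn.close()
    # Simplification: join every returned cell into a single answer string.
    target = ", ".join(str(cell) for row in result for cell in row)
    source = f"{sql} {linearize_table(header, rows)}"
    return source, target


# Hypothetical toy table and queries.
header = ["year", "city"]
rows = [(1896, "athens"), (2004, "athens"), (2008, "beijing")]
print(make_execution_pair("SELECT city FROM t WHERE year = 2008", header, rows)[1])  # beijing
print(make_execution_pair("SELECT COUNT(*) FROM t WHERE city = 'athens'", header, rows)[1])  # 2
```

Using a real SQL engine for the target keeps every synthesized pair correct by construction, which is the property the pre-training objective relies on.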
--- a/config.json
+++ b/config.json
@@ -1,4 +1,5 @@
 {
+  "_name_or_path": "microsoft/tapex-large",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "architectures": [
@@ -32,9 +33,10 @@
  "model_type": "bart",
  "num_hidden_layers": 12,
  "pad_token_id": 1,
+  "num_beams": 4,
  "scale_embedding": false,
  "torch_dtype": "float32",
-  "transformers_version": "4.15.0",
+  "transformers_version": "4.17.0.dev0",
  "use_cache": true,
  "vocab_size": 50265
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,11 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "decoder_start_token_id": 2,
+  "eos_token_id": 2,
+  "forced_bos_token_id": 0,
+  "forced_eos_token_id": 2,
+  "num_beams": 4,
+  "pad_token_id": 1,
+  "transformers_version": "4.27.0.dev0"
+}
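The new `generation_config.json` holds decoding defaults derived from the model config (`"_from_model_config": true`). With `decoder_start_token_id` 2 (`</s>` in the BART vocabulary) and `forced_bos_token_id` 0 (`<s>`), every generated sequence should begin with `</s> <s>`, and `forced_eos_token_id` 2 forces `</s>` once the length limit is reached. The toy greedy decoder below illustrates only those constraints. It is a simplified sketch, not the `transformers` implementation: it ignores `num_beams` (the config requests 4-beam search), and `score_fn` and `prefer` are hypothetical stand-ins for the model.

```python
import json

# Values copied from generation_config.json in this diff.
GENERATION_CONFIG = json.loads("""
{
  "bos_token_id": 0,
  "decoder_start_token_id": 2,
  "eos_token_id": 2,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "num_beams": 4,
  "pad_token_id": 1
}
""")


def toy_greedy_decode(score_fn, cfg, max_length):
    """Greedy decoding with forced BOS/EOS, mimicking how these config fields constrain output.

    `score_fn(prefix)` is a hypothetical model stand-in: it returns one score
    per vocabulary id for the next position.
    """
    seq = [cfg["decoder_start_token_id"]]  # the decoder starts from </s> (id 2)
    while len(seq) < max_length:
        if len(seq) == 1:
            next_id = cfg["forced_bos_token_id"]  # the first generated token is forced to <s> (id 0)
        elif len(seq) == max_length - 1:
            next_id = cfg["forced_eos_token_id"]  # the last slot is forced to </s>
        else:
            scores = score_fn(seq)
            next_id = max(range(len(scores)), key=scores.__getitem__)
        seq.append(next_id)
        # The start token (also id 2) is placed before the loop, so it cannot end decoding.
        if next_id == cfg["eos_token_id"]:
            break
    return seq


def prefer(token_id, vocab_size=10):
    """Hypothetical scorer that always ranks `token_id` highest."""
    return lambda prefix: [1.0 if i == token_id else 0.0 for i in range(vocab_size)]


print(toy_greedy_decode(prefer(7), GENERATION_CONFIG, max_length=5))  # [2, 0, 7, 7, 2]
```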
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1 @@
+{"do_lower_case": true, "errors": "replace", "bos_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "unk_token": {"content": "<unk>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "sep_token": {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "cls_token": {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": {"content": "<pad>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "max_cell_length": 15, "model_max_length": 1024, "special_tokens_map_file": null, "name_or_path": "microsoft/tapex-large", "tokenizer_class": "TapexTokenizer"}
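Two fields in `tokenizer_config.json` shape how table cells are preprocessed: `do_lower_case` is `true`, and `max_cell_length` is 15, which caps how much of each cell is kept. To the best of my knowledge the `TapexTokenizer` in `transformers` counts that limit in subword tokens and leaves numeric cells untouched; the sketch below instead uses whitespace-separated words as a stand-in, so treat it as an approximation. The function names are hypothetical.

```python
MAX_CELL_LENGTH = 15  # "max_cell_length" in tokenizer_config.json


def normalize_cell(value, max_cell_length=MAX_CELL_LENGTH):
    """Lowercase a text cell and keep at most `max_cell_length` words.

    Assumption: whitespace words approximate the subword tokens the real
    tokenizer counts. Numeric cells are returned unchanged.
    """
    if isinstance(value, (int, float)):
        return value
    words = str(value).lower().split()
    return " ".join(words[:max_cell_length])


def normalize_table(header, rows, max_cell_length=MAX_CELL_LENGTH):
    """Apply normalize_cell to every header entry and every cell."""
    new_header = [normalize_cell(h, max_cell_length) for h in header]
    new_rows = [[normalize_cell(v, max_cell_length) for v in row] for row in rows]
    return new_header, new_rows


print(normalize_table(["City", "Year"], [["Athens", 1896]]))  # (['city', 'year'], [['athens', 1896]])
```

Truncating per cell keeps one verbose cell from crowding the rest of the table out of the `model_max_length` window of 1024 tokens.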
--- a/vocab.json
+++ b/vocab.json
Author	SHA1	Message	Date
Joao Gante	e1a21a34d2	Adding generation config file(s)	2023-01-24 17:02:43 +00:00
Qian Liu	4faed89596	Update README.md Update the old link with official link.	2022-10-06 02:48:19 +00:00
Qian Liu	6315ea4500	Update config.json	2022-05-17 08:26:50 +00:00
Niels Rogge	5d604f67dc	Update README.md	2022-05-05 07:02:33 +00:00
Qian Liu	f1b17e54a7	Update README.md	2022-03-10 05:28:45 +00:00
Qian Liu	08b6fcc3c7	Update README.md	2022-03-10 05:04:00 +00:00
Qian Liu	015428a242	Update README.md	2022-03-10 05:01:46 +00:00
Qian Liu	22c087cf83	Upload vocab.json	2022-03-10 05:01:27 +00:00
Qian Liu	288d4bc353	Upload tokenizer_config.json	2022-03-10 05:01:20 +00:00
Qian Liu	ce529b7599	Upload config.json	2022-03-10 05:00:53 +00:00