add tokenizer

Author: MohammedRakib
Date: 2021-07-03 18:10:33 +00:00
parent 2ee5bbdfb1
commit bc60334996
5 changed files with 50005 additions and 0 deletions

merges.txt (Normal file, 50001 additions)

File diff suppressed because it is too large.

special_tokens_map.json (Normal file, 1 addition)

@@ -0,0 +1 @@
{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}

tokenizer.json (Normal file, 1 addition)

File diff suppressed because one or more lines are too long.

tokenizer_config.json (Normal file, 1 addition)

@@ -0,0 +1 @@
{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "/content/drive/MyDrive/models/C10_roberta-base-100%-using-CUAD-trained-on-Only-Has-Ans-dataset", "tokenizer_class": "RobertaTokenizer"}

vocab.json (Normal file, 1 addition)

File diff suppressed because one or more lines are too long.
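merges.txt and vocab.json are the byte-level BPE data themselves: vocab.json maps each subword piece to an id, and merges.txt lists the merge rules in priority order. A quick sketch of inspecting them directly with the tokenizers library (the file paths are assumed to point at the committed files):

```python
from tokenizers import ByteLevelBPETokenizer

# Build a bare byte-level BPE tokenizer from the two committed files.
bpe = ByteLevelBPETokenizer("vocab.json", "merges.txt")

enc = bpe.encode("add tokenizer")
print(enc.tokens)  # BPE pieces; "Ġ" marks a token that starts with a space
print(enc.ids)
```

Note that this bare tokenizer has no special tokens or post-processing; transformers layers those on from the other three files in the commit.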