Compare commits

10 Commits

Author SHA1 Message Date
Prithiviraj Damodaran 9f32aa1e45 Update README.md 2021-05-18 07:53:27 +00:00
Prithiviraj Damodaran 53be3e7260 Update README.md 2021-05-11 06:53:48 +00:00
Prithiviraj Damodaran efbc2b90f5 Update README.md 2021-05-11 06:50:46 +00:00
Prithiviraj Damodaran 845ef10820 Update README.md 2021-05-11 06:48:55 +00:00
Prithivida d314313c7d V2 with more training data 2021-05-10 11:51:54 +05:30
Prithiviraj Damodaran 16ebb1410b Update README.md 2021-05-08 13:16:22 +00:00
Prithiviraj Damodaran 2bdcc837e3 Update README.md 2021-05-08 12:43:09 +00:00
Prithiviraj Damodaran 3e62bba831 Update README.md 2021-05-08 12:42:49 +00:00
Prithiviraj Damodaran 53fa195f05 Update README.md 2021-05-08 12:37:15 +00:00
Prithiviraj Damodaran 4d86284e9c Update README.md 2021-05-08 12:36:06 +00:00
4 changed files with 32 additions and 66 deletions

@ -1,35 +1,38 @@
# Parrot
## 1. What is Parrot?
Parrot is a paraphrase-based utterance augmentation framework purpose-built to accelerate training NLU models. A paraphrase framework is more than just a paraphrasing model. Please refer to the [GitHub page](https://github.com/PrithivirajDamodaran/Parrot).
Parrot is a paraphrase-based utterance augmentation framework purpose-built to accelerate training NLU models. A paraphrase framework is more than just a paraphrasing model. For more details on the library and its usage, please refer to the [GitHub page](https://github.com/PrithivirajDamodaran/Parrot).
### Installation
```shell
pip install git+https://github.com/PrithivirajDamodaran/Parrot.git
pip install git+https://github.com/PrithivirajDamodaran/Parrot_Paraphraser.git
```
### Quickstart
```python
# Caveat: the generate part is NOT seeded, so for the same input, multiple runs will produce DIFFERENT outputs for now
from parrot import Parrot
import torch
import warnings
warnings.filterwarnings("ignore")
def set_seed(seed):
'''
uncomment to get reproducible paraphrase generations
def random_state(seed):
torch.manual_seed(seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(seed)
set_seed(42)
random_state(1234)
'''
# Init models (make sure you init ONLY once if you integrate this into your code)
parrot = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5", use_gpu=False)
phrases = ["Can you recommed some upscale restaurants in Rome?",
phrases = ["Can you recommed some upscale restaurants in Newyork?",
"What are the famous places we should not miss in Russia?"
]
@ -43,64 +46,27 @@ for phrase in phrases:
```
```
-----------------------------------------------------------------------------
Input_phrase: Can you recommed some upscale restaurants in Rome?
-----------------------------------------------------------------------------
"which upscale restaurants are recommended in rome?"
"which are the best restaurants in rome?"
"are there any upscale restaurants near rome?"
"can you recommend a good restaurant in rome?"
"can you recommend some of the best restaurants in rome?"
"can you recommend some best restaurants in rome?"
"can you recommend some upscale restaurants in rome?"
-----------------------------------------------------------------------------
----------------------------------------------------------------------
Input_phrase: Can you recommed some upscale restaurants in Newyork?
----------------------------------------------------------------------
list some excellent restaurants to visit in new york city?
what upscale restaurants do you recommend in new york?
i want to try some upscale restaurants in new york?
recommend some upscale restaurants in newyork?
can you recommend some high end restaurants in newyork?
can you recommend some upscale restaurants in new york?
can you recommend some upscale restaurants in newyork?
----------------------------------------------------------------------
Input_phrase: What are the famous places we should not miss in Russia
-----------------------------------------------------------------------------
"which are the must do places for tourists to visit in russia?"
"what are the best places to visit in russia?"
"what are some of the most visited sights in russia?"
"what are some of the most beautiful places in russia that tourists should not miss?"
"which are some of the most beautiful places to visit in russia?"
"what are some of the most important places to visit in russia?"
"what are some of the most famous places of russia?"
"what are some places we should not miss in russia?"
----------------------------------------------------------------------
what should we not miss when visiting russia?
recommend some of the best places to visit in russia?
list some of the best places to visit in russia?
can you list the top places to visit in russia?
show the places that we should not miss in russia?
list some famous places which we should not miss in russia?
```
### How to get syntactic and phrasal diversity/variety in your paraphrases using Parrot?
You can adjust the `do_diverse` knob (see the next section for more knobs).
Consider this example with `do_diverse = False` (the default):
```
------------------------------------------------------------------------------
Input_phrase: The ultimate test of your knowledge is your capacity to convey it to another.
------------------------------------------------------------------------------
'the final test of knowledge is your capacity to impart it '
'the ultimate test of a person's knowledge is his ability to transmit it to another '
'the ultimate test of knowledge is the ability to communicate it to another '
'the ultimate test of knowledge is your ability to communicate it to others '
'the test of your knowledge is your capacity to communicate it to others '
'the ultimate test for knowledge is the capacity to show it to another '
'the ultimate test of your knowledge is your ability to transmit to others '
'the ultimate test of a knowledge is your capacity to communicate it to another '
'the ultimate test of knowledge is your capacity to transmit it to another '
'the final test of your knowledge is your ability to convey it to another '
```
do_diverse = True
```
------------------------------------------------------------------------------
Input_phrase: The ultimate test of your knowledge is your capacity to convey it to another.
------------------------------------------------------------------------------
'one of the ultimate tests of knowledge is your ability to communicate it to another person '
'one of the ultimate tests of knowledge is your ability to transmit it to another person '
'one of the ultimate tests of knowledge is your ability to communicate it to another '
'one of the greatest tests of knowledge is your ability to convey it to another '
'one of the ultimate tests of knowledge is your ability to transmit it to another '
'one of the ultimate tests of knowledge is your ability to convey it to another person '
'one of the ultimate tests of knowledge is the ability to convey it to another '
'the ultimate test of your knowledge is your ability to communicate it to another '
'the ultimate test of your knowledge is your ability to transmit it to another '
```
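One rough way to compare the two outputs above is to score each paraphrase by its character-level edit distance from the input phrase. The sketch below is a plain-Python illustration only and is not Parrot's own ranking code; `levenshtein` and `rank_by_diversity` are hypothetical helper names introduced here, and the sample paraphrases are copied from the listings above.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rank_by_diversity(input_phrase, paraphrases):
    """Return (distance, paraphrase) pairs, most different from the input first."""
    source = input_phrase.lower()
    scored = [(levenshtein(source, p.lower()), p) for p in paraphrases]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    phrase = "The ultimate test of your knowledge is your capacity to convey it to another."
    samples = [
        "the final test of your knowledge is your ability to convey it to another",
        "one of the ultimate tests of knowledge is your ability to communicate it to another person",
    ]
    for distance, text in rank_by_diversity(phrase, samples):
        print(distance, text)
```

A larger distance only means more surface change. It says nothing about whether the meaning is preserved, so a score like this would need to be paired with an adequacy check.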
### Knobs
@ -139,7 +105,7 @@ But in general being a generative model paraphrasers doesn't guarantee to preser
## 3. Scope
In the space of conversational engines, knowledge bots are the ones to which **we ask questions** like *"When was the Berlin Wall torn down?"*, transactional bots are the ones to which **we give commands** like *"Turn on the music please"*, and voice assistants are the ones that can both answer questions and act on our commands. Parrot mainly focuses on augmenting texts typed into or spoken to conversational interfaces for building robust NLU models. (*People usually neither type out nor yell out long paragraphs to conversational interfaces. Hence the pre-trained model is trained on text samples with a maximum length of 64.*)
In the space of conversational engines, knowledge bots are the ones to which **we ask questions** like *"When was the Berlin Wall torn down?"*, transactional bots are the ones to which **we give commands** like *"Turn on the music please"*, and voice assistants are the ones that can both answer questions and act on our commands. Parrot mainly focuses on augmenting texts typed into or spoken to conversational interfaces for building robust NLU models. (*People usually neither type out nor yell out long paragraphs to conversational interfaces. Hence the pre-trained model is trained on text samples with a maximum length of 32.*)
*While Parrot predominantly aims to be a text augmentor for building good NLU models, it can also be used as a pure-play paraphraser.*
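Because the model is trained on short samples, it can help to screen utterances before augmenting them. The snippet below is a minimal sketch under a loud assumption: it counts whitespace-separated words as a crude proxy for length, whereas the model's limit is most likely measured in tokenizer tokens, so the real cutoff will differ. `split_by_length` and `max_words` are hypothetical names, not part of Parrot.

```python
def split_by_length(utterances, max_words=32):
    """Split utterances into (within_limit, too_long) by whitespace word count.

    Word count is only an approximation of the model's length limit.
    """
    within_limit, too_long = [], []
    for utterance in utterances:
        bucket = within_limit if len(utterance.split()) <= max_words else too_long
        bucket.append(utterance)
    return within_limit, too_long


if __name__ == "__main__":
    candidates = [
        "Turn on the music please",
        "word " * 40,  # an artificially long utterance
    ]
    short, long_ones = split_by_length(candidates)
    print(len(short), "within limit;", len(long_ones), "too long")
```

Utterances in the second list could be shortened or split by hand before being paraphrased.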

@ -1,5 +1,5 @@
{
"_name_or_path": "paraphrase/checkpoint-18660",
"_name_or_path": "paraphrase/checkpoint-19329",
"architectures": [
"T5ForConditionalGeneration"
],

BIN
pytorch_model.bin (Stored with Git LFS)

Binary file not shown.

BIN
training_args.bin (Stored with Git LFS)

Binary file not shown.