Update README.md

Thomas De Decker 2022-06-01 07:57:55 +00:00 committed by huggingface-web
parent 4fdb3ab1c8
commit 2ef9d8fee5
1 changed file with 12 additions and 12 deletions

@@ -83,18 +83,19 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):
```python
# Load pipeline
-model_name = "DeDeckerThomas/keyphrase-extraction-distilbert-inspec"
+model_name = "ml6team/keyphrase-extraction-distilbert-inspec"
extractor = KeyphraseExtractionPipeline(model=model_name)
```
```python
# Inference
text = """
Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
-Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics, are widely used for the extraction process.
-The fact that these methods have been widely used in the community has the advantage that there are many easy-to-use libraries.
-Now with the recent innovations in deep learning methods (such as recurrent neural networks and transformers, GANS, …),
-keyphrase extraction can be improved. These new methods also focus on the semantics and context of a document, which is quite an improvement.
+Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+Currently, classical machine learning methods, that use statistics and linguistics,
+are widely used for the extraction process. The fact that these methods have been widely used in the community
+has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
+transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
+and context of a document, which is quite an improvement.
""".replace(
"\n", ""
)
@@ -106,10 +107,9 @@ print(keyphrases)
```
# Output
-['Artificial Intelligence' 'GANS' 'Keyphrase extraction'
-'classical machine learning' 'deep learning methods'
-'keyphrase extraction' 'linguistics' 'recurrent neural networks'
-'semantics' 'statistics' 'text analysis' 'transformers']
+['artificial intelligence', 'classical machine learning methods',
+'keyphrase extraction', 'linguistics', 'statistics',
+'text analysis']
```
## 📚 Training Dataset
@@ -172,7 +172,7 @@ def preprocess_fuction(all_samples_per_split):
```
### Postprocessing
-For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive B and Is. As last you strip the keyphrase to ensure all spaces are removed.
+For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive Bs and Is. As last you strip the keyphrase to ensure all spaces are removed.
```python
# Define post_process functions
def concat_tokens_by_tag(keyphrases):
@@ -216,4 +216,4 @@ The model achieves the following results on the Inspec test set:
For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
## 🚨 Issues
-Please feel free to contact Thomas De Decker for any problems with this model.
+Please feel free to start discussions in the Community Tab.
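
The post-processing step described in the README text of this diff (keep the B- and I-labeled tokens, join consecutive ones, strip whitespace) can be illustrated with a small standalone sketch. This is not the README's own `concat_tokens_by_tag` function, which is cut off by the hunk boundary. The function name `extract_keyphrases`, the plain `"B"`/`"I"`/`"O"` string labels, the word-level tokens, and the choice to drop an `I` that has no preceding `B` are all assumptions made for illustration.

```python
def extract_keyphrases(tokens, labels):
    """Group B/I-tagged tokens into keyphrases (hypothetical helper).

    Assumes one label per word-level token, with labels given as the
    strings "B" (begin), "I" (inside) and "O" (outside).
    """
    keyphrases = []
    current = []  # tokens of the keyphrase currently being built

    for token, label in zip(tokens, labels):
        if label == "B":
            # A new keyphrase starts: flush the previous one, if any.
            if current:
                keyphrases.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            # Continue the keyphrase that is already open.
            current.append(token)
        else:
            # "O" token, or an "I" without a preceding "B" (dropped here
            # by assumption): close any open keyphrase.
            if current:
                keyphrases.append(" ".join(current))
            current = []

    # Flush a keyphrase that runs to the end of the input.
    if current:
        keyphrases.append(" ".join(current))

    # Strip surrounding whitespace, as the README text asks.
    return [keyphrase.strip() for keyphrase in keyphrases]


tokens = ["Keyphrase", "extraction", "is", "a", "technique", "in", "text", "analysis"]
labels = ["B", "I", "O", "O", "O", "O", "B", "I"]
print(extract_keyphrases(tokens, labels))
```

With real model output the tokens would typically be subword pieces, so joining them with spaces as done here would need to be replaced by the tokenizer's own decoding.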