Update README.md

Thomas De Decker 2022-06-01 07:57:55 +00:00 committed by huggingface-web
parent 4fdb3ab1c8
commit 2ef9d8fee5
1 changed file with 12 additions and 12 deletions

@@ -83,7 +83,7 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):
```python ```python
# Load pipeline # Load pipeline
model_name = "DeDeckerThomas/keyphrase-extraction-distilbert-inspec" model_name = "ml6team/keyphrase-extraction-distilbert-inspec"
extractor = KeyphraseExtractionPipeline(model=model_name) extractor = KeyphraseExtractionPipeline(model=model_name)
``` ```
```python ```python
@@ -91,10 +91,11 @@ extractor = KeyphraseExtractionPipeline(model=model_name)
text = """ text = """
Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text. Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
Since this is a time-consuming process, Artificial Intelligence is used to automate it. Since this is a time-consuming process, Artificial Intelligence is used to automate it.
Currently, classical machine learning methods, that use statistics and linguistics, are widely used for the extraction process. Currently, classical machine learning methods, that use statistics and linguistics,
The fact that these methods have been widely used in the community has the advantage that there are many easy-to-use libraries. are widely used for the extraction process. The fact that these methods have been widely used in the community
Now with the recent innovations in deep learning methods (such as recurrent neural networks and transformers, GANS, …), has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
keyphrase extraction can be improved. These new methods also focus on the semantics and context of a document, which is quite an improvement. transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
and context of a document, which is quite an improvement.
""".replace( """.replace(
"\n", "" "\n", ""
) )
@@ -106,10 +107,9 @@ print(keyphrases)
``` ```
# Output # Output
['Artificial Intelligence' 'GANS' 'Keyphrase extraction' ['artificial intelligence', 'classical machine learning methods',
'classical machine learning' 'deep learning methods' 'keyphrase extraction', 'linguistics', 'statistics',
'keyphrase extraction' 'linguistics' 'recurrent neural networks' 'text analysis']
'semantics' 'statistics' 'text analysis' 'transformers']
``` ```
## 📚 Training Dataset ## 📚 Training Dataset
@@ -172,7 +172,7 @@ def preprocess_fuction(all_samples_per_split):
``` ```
### Postprocessing ### Postprocessing
For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive B and Is. As last you strip the keyphrase to ensure all spaces are removed. For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive Bs and Is. As last you strip the keyphrase to ensure all spaces are removed.
```python ```python
# Define post_process functions # Define post_process functions
def concat_tokens_by_tag(keyphrases): def concat_tokens_by_tag(keyphrases):
@@ -216,4 +216,4 @@ The model achieves the following results on the Inspec test set:
For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook. For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
## 🚨 Issues ## 🚨 Issues
Please feel free to contact Thomas De Decker for any problems with this model. Please feel free to start discussions in the Community Tab.
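
The post-processing step described in the diff above (keep the B and I labeled tokens, join each B with the I tokens that follow it, then strip whitespace) can be sketched as follows. This is a minimal illustration, not the README's `concat_tokens_by_tag` implementation, which the diff does not show in full. The function name `extract_keyphrases`, the word-level token list, and the bare `B`/`I`/`O` tag strings are all assumptions made for the example.

```python
def extract_keyphrases(tokens, tags):
    """Join consecutive B/I-tagged tokens into keyphrases.

    Hypothetical helper: assumes one tag ("B", "I" or "O") per word-level token.
    """
    keyphrases = []
    current = []  # tokens of the keyphrase currently being built

    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new keyphrase starts: flush the previous one, if any.
            if current:
                keyphrases.append(" ".join(current).strip())
            current = [token]
        elif tag == "I" and current:
            # Continue the keyphrase opened by the preceding B.
            current.append(token)
        else:
            # "O" (or an "I" with no preceding "B") closes any open keyphrase
            # and is itself filtered out.
            if current:
                keyphrases.append(" ".join(current).strip())
            current = []

    # Flush a keyphrase that runs to the end of the input.
    if current:
        keyphrases.append(" ".join(current).strip())
    return keyphrases


tokens = ["Keyphrase", "extraction", "is", "a", "technique", "in", "text", "analysis"]
tags = ["B", "I", "O", "O", "O", "O", "B", "I"]
print(extract_keyphrases(tokens, tags))
```

In this sketch an `I` tag that does not follow a `B` is dropped rather than promoted to the start of a keyphrase; that is a design choice for the example, and the model's own post-processing may handle that case differently.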