Compare commits


10 Commits

Author SHA1 Message Date
Niels Rogge c1c0c663ec Update README.md 2023-02-07 09:20:24 +00:00
Niels Rogge 238aef2e1e Update config.json 2022-11-21 15:13:34 +00:00
Niels Rogge d20f174a6b Update README.md 2022-11-21 10:43:38 +00:00
Niels Rogge 0a45c915b3 Update README.md 2022-11-21 10:42:44 +00:00
Niels Rogge 7779200c41 Update README.md 2022-11-21 10:42:14 +00:00
Niels Rogge 61e7b2e134 Create README.md 2022-11-21 10:40:49 +00:00
Niels Rogge 2437b5b828 Upload feature extractor 2022-11-21 10:33:14 +00:00
Niels Rogge f32fcdf8f4 Upload ASTForSequenceClassification 2022-11-21 10:33:11 +00:00
Niels Rogge 5ddb6d6a4c Upload feature extractor 2022-11-17 14:36:35 +00:00
Niels Rogge 52a0212175 Upload ASTForSequenceClassification 2022-11-17 14:36:32 +00:00
4 changed files with 26 additions and 6 deletions

README.md (new file, 19 lines)

@@ -0,0 +1,19 @@
---
license: bsd-3-clause
tags:
- audio-classification
---
# Audio Spectrogram Transformer (fine-tuned on AudioSet)
Audio Spectrogram Transformer (AST) model fine-tuned on AudioSet. It was introduced in the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Gong et al. and first released in [this repository](https://github.com/YuanGongND/ast).
Disclaimer: The team releasing the Audio Spectrogram Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.
## Model description
The Audio Spectrogram Transformer is equivalent to [ViT](https://huggingface.co/docs/transformers/model_doc/vit), but applied to audio. Audio is first turned into an image (a spectrogram), after which a Vision Transformer is applied. The model achieves state-of-the-art results on several audio classification benchmarks.
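As a rough sketch of the "spectrogram as image" idea: the ViT-style patch embedding slices the spectrogram into overlapping 16x16 patches. The values below come from this repository's config.json (`num_mel_bins`, `max_length`, `patch_size`, the two strides); the sliding-window formula itself is an assumption based on the AST paper, not something shown on this page.

```python
# Sketch: how many 16x16 patches a ViT-style patch embedding would extract
# from an AST input spectrogram, assuming overlapping patches with stride 10
# (config.json values: num_mel_bins=128, max_length=1024, patch_size=16).

def num_patches(length: int, patch_size: int = 16, stride: int = 10) -> int:
    """Number of patch positions along one axis for a given stride."""
    return (length - patch_size) // stride + 1

time_patches = num_patches(1024)  # along the time axis (max_length frames)
freq_patches = num_patches(128)   # along the frequency axis (num_mel_bins)
total = time_patches * freq_patches
print(time_patches, freq_patches, total)  # 101 12 1212
```

The resulting 1212 patches match the sequence length reported in the AST paper (before the special tokens the model prepends).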
## Usage
You can use the raw model for classifying audio into one of the AudioSet classes. See the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/audio-spectrogram-transformer#transformers.ASTForAudioClassification.forward.example) for more info.
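The classifier returns one logit per AudioSet class; turning that into a label amounts to inverting the `label2id` mapping from config.json and taking the argmax. A minimal framework-free sketch (the mapping below is a made-up three-class subset; only `"Zither": 150` is visible in the diff on this page, and the logit values are hypothetical):

```python
# Toy sketch of label selection from classifier logits by inverting a
# label2id mapping like the one in config.json. Only "Zither": 150 is
# taken from the diff below; the other entries and scores are made up.
label2id = {"Speech": 0, "Music": 1, "Zither": 150}
id2label = {i: name for name, i in label2id.items()}

logits = {0: 0.1, 1: 2.3, 150: -0.5}   # hypothetical per-class scores
predicted_id = max(logits, key=logits.get)
print(id2label[predicted_id])  # prints "Music", the highest-scoring class
```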

config.json

@@ -1,9 +1,8 @@
 {
   "architectures": [
-    "AudioSpectrogramTransformerForSequenceClassification"
+    "ASTForAudioClassification"
   ],
   "attention_probs_dropout_prob": 0.0,
-  "frequency_dimension": 128,
   "frequency_stride": 10,
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.0,
@@ -1069,12 +1068,13 @@
     "Zither": 150
   },
   "layer_norm_eps": 1e-12,
+  "max_length": 1024,
   "model_type": "audio-spectrogram-transformer",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
+  "num_mel_bins": 128,
   "patch_size": 16,
   "qkv_bias": true,
-  "time_dimension": 1024,
   "time_stride": 10,
   "torch_dtype": "float32",
   "transformers_version": "4.25.0.dev0"

preprocessor_config.json

@@ -1,7 +1,8 @@
 {
   "do_normalize": true,
-  "feature_extractor_type": "AudioSpectrogramTransformerFeatureExtractor",
+  "feature_extractor_type": "ASTFeatureExtractor",
   "feature_size": 1,
+  "max_length": 1024,
   "mean": -4.2677393,
   "num_mel_bins": 128,
   "padding_side": "right",
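The `do_normalize` flag in the feature-extractor config above means log-mel values are standardized with dataset statistics before reaching the model. A small sketch of that step: the `mean` is taken from the diff above, but the `std` value is truncated out of this page, and dividing by twice the std is an assumption based on the AST feature extractor's convention; both are illustrative, not read from this page.

```python
# Sketch of the do_normalize step: standardize log-mel values with dataset
# statistics. MEAN comes from preprocessor_config.json above; STD and the
# divide-by-2*std convention are assumptions (illustrative values only).
MEAN = -4.2677393
STD = 4.57  # illustrative; the real value is not visible in the diff

def normalize(values, mean=MEAN, std=STD):
    """Return standardized copies of the input log-mel values."""
    return [(v - mean) / (2 * std) for v in values]

frames = [-4.2677393, 0.0, -8.5]
print(normalize(frames))  # dataset-mean frames map to 0.0
```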

pytorch_model.bin (stored with Git LFS): binary file not shown.