Compare commits
10 Commits
c87aed3ce0...c1c0c663ec
| Author | SHA1 | Date |
|---|---|---|
| | c1c0c663ec | |
| | 238aef2e1e | |
| | d20f174a6b | |
| | 0a45c915b3 | |
| | 7779200c41 | |
| | 61e7b2e134 | |
| | 2437b5b828 | |
| | f32fcdf8f4 | |
| | 5ddb6d6a4c | |
| | 52a0212175 | |
README.md
@@ -0,0 +1,19 @@
+---
+license: bsd-3-clause
+tags:
+- audio-classification
+---
+
+# Audio Spectrogram Transformer (fine-tuned on AudioSet)
+
+Audio Spectrogram Transformer (AST) model fine-tuned on AudioSet. It was introduced in the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Gong et al. and first released in [this repository](https://github.com/YuanGongND/ast).
+
+Disclaimer: The team releasing Audio Spectrogram Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.
+
+## Model description
+
+The Audio Spectrogram Transformer is equivalent to [ViT](https://huggingface.co/docs/transformers/model_doc/vit), but applied to audio. Audio is first turned into an image (a spectrogram), after which a Vision Transformer is applied. The model gets state-of-the-art results on several audio classification benchmarks.
+
+## Usage
+
+You can use the raw model to classify audio into one of the AudioSet classes. See the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/audio-spectrogram-transformer#transformers.ASTForAudioClassification.forward.example) for more info.
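As a concrete illustration of the usage section added above, here is a minimal inference sketch using the renamed `transformers` classes (`ASTFeatureExtractor`, `ASTForAudioClassification`). The model id below is an assumption for illustration; substitute this repository's actual id on the Hub.

```python
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

# Hypothetical model id for illustration; use this repository's actual Hub id.
model_id = "MIT/ast-finetuned-audioset-10-10-0.4593"

feature_extractor = ASTFeatureExtractor.from_pretrained(model_id)
model = ASTForAudioClassification.from_pretrained(model_id)

# One second of silence at 16 kHz stands in for a real waveform.
waveform = torch.zeros(16000).numpy()

# The feature extractor converts the waveform into a normalized log-mel
# spectrogram of shape (max_length, num_mel_bins) = (1024, 128).
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit to one of the AudioSet class names.
predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```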
config.json
@@ -1,9 +1,8 @@
 {
   "architectures": [
-    "AudioSpectrogramTransformerForSequenceClassification"
+    "ASTForAudioClassification"
   ],
   "attention_probs_dropout_prob": 0.0,
-  "frequency_dimension": 128,
   "frequency_stride": 10,
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.0,
@@ -1069,12 +1068,13 @@
     "Zither": 150
   },
   "layer_norm_eps": 1e-12,
+  "max_length": 1024,
   "model_type": "audio-spectrogram-transformer",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
+  "num_mel_bins": 128,
   "patch_size": 16,
   "qkv_bias": true,
-  "time_dimension": 1024,
   "time_stride": 10,
   "torch_dtype": "float32",
   "transformers_version": "4.25.0.dev0"
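The renamed config keys above (`frequency_dimension` → `num_mel_bins`, `time_dimension` → `max_length`) map directly onto the `ASTConfig` API. A minimal sketch, assuming a current `transformers` install, of rebuilding an architecture-compatible (randomly initialized) model from the values shown in these hunks:

```python
from transformers import ASTConfig, ASTForAudioClassification

# Values taken from the config.json hunks above: the spectrogram is
# 1024 time frames x 128 mel bins, split into 16x16 patches with a
# stride of 10 along both the time and frequency axes.
config = ASTConfig(
    hidden_act="gelu",
    num_hidden_layers=12,
    num_attention_heads=12,
    patch_size=16,
    qkv_bias=True,
    frequency_stride=10,
    time_stride=10,
    num_mel_bins=128,   # formerly "frequency_dimension"
    max_length=1024,    # formerly "time_dimension"
)

# Randomly initialized AST with the same shapes as the checkpoint.
model = ASTForAudioClassification(config)
print(config.model_type)  # audio-spectrogram-transformer
```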
preprocessor_config.json
@@ -1,7 +1,8 @@
 {
   "do_normalize": true,
-  "feature_extractor_type": "AudioSpectrogramTransformerFeatureExtractor",
+  "feature_extractor_type": "ASTFeatureExtractor",
   "feature_size": 1,
   "max_length": 1024,
   "mean": -4.2677393,
   "num_mel_bins": 128,
   "padding_side": "right",
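For completeness, a sketch of instantiating the renamed feature extractor from the values visible in this hunk. The normalization `std` is not shown in the truncated diff, so it is omitted here and falls back to the library default (an AudioSet statistic); treat the exact constructor values as illustrative.

```python
import numpy as np
from transformers import ASTFeatureExtractor

# Constructor arguments mirror the preprocessor_config.json above; std is
# omitted and falls back to the library's AudioSet default.
feature_extractor = ASTFeatureExtractor(
    feature_size=1,
    num_mel_bins=128,
    max_length=1024,
    do_normalize=True,
    mean=-4.2677393,
    padding_side="right",
)

waveform = np.zeros(16000, dtype=np.float32)  # 1 s of silence at 16 kHz
features = feature_extractor(waveform, sampling_rate=16000, return_tensors="np")
print(features["input_values"].shape)  # (1, 1024, 128): batch, time, mel bins
```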
BIN pytorch_model.bin (Stored with Git LFS)
Binary file not shown.