Compare commits
No commits in common. "c1c0c663ecf7a4de90db1bc2f8d4e2d38a4f93b4" and "c87aed3ce094f3ec19ae144d6e6fd010e34d7c57" have entirely different histories.
c1c0c663ec...c87aed3ce0
README.md (19 deletions)
README.md
@@ -1,19 +0,0 @@
----
-license: bsd-3-clause
-tags:
-- audio-classification
----
-
-# Audio Spectrogram Transformer (fine-tuned on AudioSet)
-
-Audio Spectrogram Transformer (AST) model fine-tuned on AudioSet. It was introduced in the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Gong et al. and first released in [this repository](https://github.com/YuanGongND/ast).
-
-Disclaimer: The team releasing the Audio Spectrogram Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.
-
-## Model description
-
-The Audio Spectrogram Transformer is equivalent to [ViT](https://huggingface.co/docs/transformers/model_doc/vit), but applied to audio. Audio is first turned into an image (a spectrogram), after which a Vision Transformer is applied. The model obtains state-of-the-art results on several audio classification benchmarks.
-
-## Usage
-
-You can use the raw model for classifying audio into one of the AudioSet classes. See the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/audio-spectrogram-transformer#transformers.ASTForAudioClassification.forward.example) for more info.
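The deleted Usage section defers to the Transformers docs for a classification example. As a companion, here is a minimal sketch of that workflow using the post-rename classes (`ASTFeatureExtractor`, `ASTForAudioClassification`); the checkpoint id is a hypothetical stand-in, since the compare view does not name this repo:

```python
import numpy as np
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

ckpt = "MIT/ast-finetuned-audioset-10-10-0.4593"  # hypothetical checkpoint id

feature_extractor = ASTFeatureExtractor.from_pretrained(ckpt)
model = ASTForAudioClassification.from_pretrained(ckpt)

# AST expects a 16 kHz mono waveform; a 1-second dummy signal stands in here.
waveform = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, number of AudioSet classes)

print(model.config.id2label[logits.argmax(-1).item()])
```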
config.json
@@ -1,8 +1,9 @@
 {
   "architectures": [
-    "ASTForAudioClassification"
+    "AudioSpectrogramTransformerForSequenceClassification"
   ],
   "attention_probs_dropout_prob": 0.0,
+  "frequency_dimension": 128,
   "frequency_stride": 10,
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.0,
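Note: the `architectures` list is what the Auto classes use to map a checkpoint onto a model class, so this hunk tracks the class rename in Transformers. A quick way to inspect these fields (checkpoint id again a hypothetical stand-in):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")  # hypothetical id

print(config.architectures)  # e.g. ['ASTForAudioClassification'] on the renamed side
print(config.model_type)     # 'audio-spectrogram-transformer'
print(config.patch_size, config.frequency_stride, config.time_stride)
```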
@@ -1068,13 +1069,12 @@
     "Zither": 150
   },
   "layer_norm_eps": 1e-12,
-  "max_length": 1024,
   "model_type": "audio-spectrogram-transformer",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
-  "num_mel_bins": 128,
   "patch_size": 16,
   "qkv_bias": true,
+  "time_dimension": 1024,
   "time_stride": 10,
   "torch_dtype": "float32",
   "transformers_version": "4.25.0.dev0"
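The geometry keys in this hunk (`patch_size`, the two strides, and the 128/1024 frequency/time extents under either naming) determine the ViT-style patch grid. A back-of-the-envelope check, assuming the usual convolutional output formula `(dim - patch_size) // stride + 1`:

```python
patch_size = 16
freq_dim, frequency_stride = 128, 10   # num_mel_bins / frequency_dimension
time_dim, time_stride = 1024, 10       # max_length / time_dimension

freq_patches = (freq_dim - patch_size) // frequency_stride + 1  # 12
time_patches = (time_dim - patch_size) // time_stride + 1       # 101

print(freq_patches * time_patches)  # 1212 patches per spectrogram
```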
preprocessor_config.json
@@ -1,8 +1,7 @@
 {
   "do_normalize": true,
-  "feature_extractor_type": "ASTFeatureExtractor",
+  "feature_extractor_type": "AudioSpectrogramTransformerFeatureExtractor",
   "feature_size": 1,
-  "max_length": 1024,
   "mean": -4.2677393,
   "num_mel_bins": 128,
   "padding_side": "right",
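With `do_normalize` set to true, the extractor standardizes the log-mel spectrogram using the stored dataset statistics (the `mean` above; the matching `std` sits further down in the file, outside this hunk). A sketch of the resulting tensor, with a hypothetical checkpoint id:

```python
import numpy as np
from transformers import ASTFeatureExtractor

fe = ASTFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")  # hypothetical id

waveform = np.random.randn(16000).astype(np.float32)  # 1 s of 16 kHz audio
batch = fe(waveform, sampling_rate=16000, return_tensors="np")

# Clips shorter than max_length frames are padded on the right.
print(batch["input_values"].shape)  # (1, 1024, 128) = (batch, max_length, num_mel_bins)
```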
pytorch_model.bin (Stored with Git LFS)
Binary file not shown.