Compare commits
10 Commits
c87aed3ce0...c1c0c663ec
| Author | SHA1 | Date |
|---|---|---|
| | c1c0c663ec | |
| | 238aef2e1e | |
| | d20f174a6b | |
| | 0a45c915b3 | |
| | 7779200c41 | |
| | 61e7b2e134 | |
| | 2437b5b828 | |
| | f32fcdf8f4 | |
| | 5ddb6d6a4c | |
| | 52a0212175 | |
README.md
@@ -0,0 +1,19 @@
+---
+license: bsd-3-clause
+tags:
+- audio-classification
+---
+
+# Audio Spectrogram Transformer (fine-tuned on AudioSet)
+
+Audio Spectrogram Transformer (AST) model fine-tuned on AudioSet. It was introduced in the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Gong et al. and first released in [this repository](https://github.com/YuanGongND/ast).
+
+Disclaimer: The team releasing Audio Spectrogram Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.
+
+## Model description
+
+The Audio Spectrogram Transformer is equivalent to [ViT](https://huggingface.co/docs/transformers/model_doc/vit), but applied to audio. Audio is first turned into an image (a spectrogram), after which a Vision Transformer is applied. The model gets state-of-the-art results on several audio classification benchmarks.
+
+## Usage
+
+You can use the raw model to classify audio into one of the AudioSet classes. See the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/audio-spectrogram-transformer#transformers.ASTForAudioClassification.forward.example) for more info.
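As a concrete illustration of the usage section added above, here is a minimal inference sketch using the renamed `transformers` classes (`ASTFeatureExtractor`, `ASTForAudioClassification`). The model id below is an assumption for illustration; substitute this repository's actual id on the Hub.

```python
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

# Hypothetical model id for illustration; use this repository's actual Hub id.
model_id = "MIT/ast-finetuned-audioset-10-10-0.4593"

feature_extractor = ASTFeatureExtractor.from_pretrained(model_id)
model = ASTForAudioClassification.from_pretrained(model_id)

# One second of silence at 16 kHz stands in for a real waveform.
waveform = torch.zeros(16000).numpy()

# The feature extractor converts the waveform into a normalized log-mel
# spectrogram of shape (max_length, num_mel_bins) = (1024, 128).
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit to one of the AudioSet class names.
predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```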
config.json
@@ -1,9 +1,8 @@
 {
   "architectures": [
-    "AudioSpectrogramTransformerForSequenceClassification"
+    "ASTForAudioClassification"
   ],
   "attention_probs_dropout_prob": 0.0,
-  "frequency_dimension": 128,
   "frequency_stride": 10,
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.0,
@@ -1069,12 +1068,13 @@
     "Zither": 150
   },
   "layer_norm_eps": 1e-12,
+  "max_length": 1024,
   "model_type": "audio-spectrogram-transformer",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
+  "num_mel_bins": 128,
   "patch_size": 16,
   "qkv_bias": true,
-  "time_dimension": 1024,
   "time_stride": 10,
   "torch_dtype": "float32",
   "transformers_version": "4.25.0.dev0"
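The renamed config keys above (`frequency_dimension` → `num_mel_bins`, `time_dimension` → `max_length`) map directly onto the `ASTConfig` API. A minimal sketch, assuming a current `transformers` install, of rebuilding an architecture-compatible (randomly initialized) model from the values shown in these hunks:

```python
from transformers import ASTConfig, ASTForAudioClassification

# Values taken from the config.json hunks above: the spectrogram is
# 1024 time frames x 128 mel bins, split into 16x16 patches with a
# stride of 10 along both the time and frequency axes.
config = ASTConfig(
    hidden_act="gelu",
    num_hidden_layers=12,
    num_attention_heads=12,
    patch_size=16,
    qkv_bias=True,
    frequency_stride=10,
    time_stride=10,
    num_mel_bins=128,   # formerly "frequency_dimension"
    max_length=1024,    # formerly "time_dimension"
)

# Randomly initialized AST with the same shapes as the checkpoint.
model = ASTForAudioClassification(config)
print(config.model_type)  # audio-spectrogram-transformer
```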
preprocessor_config.json
@@ -1,7 +1,8 @@
 {
   "do_normalize": true,
-  "feature_extractor_type": "AudioSpectrogramTransformerFeatureExtractor",
+  "feature_extractor_type": "ASTFeatureExtractor",
   "feature_size": 1,
   "max_length": 1024,
   "mean": -4.2677393,
   "num_mel_bins": 128,
   "padding_side": "right",
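For completeness, a sketch of instantiating the renamed feature extractor from the values visible in this hunk. The normalization `std` is not shown in the truncated diff, so it is omitted here and falls back to the library default (an AudioSet statistic); treat the exact constructor values as illustrative.

```python
import numpy as np
from transformers import ASTFeatureExtractor

# Constructor arguments mirror the preprocessor_config.json above; std is
# omitted and falls back to the library's AudioSet default.
feature_extractor = ASTFeatureExtractor(
    feature_size=1,
    num_mel_bins=128,
    max_length=1024,
    do_normalize=True,
    mean=-4.2677393,
    padding_side="right",
)

waveform = np.zeros(16000, dtype=np.float32)  # 1 s of silence at 16 kHz
features = feature_extractor(waveform, sampling_rate=16000, return_tensors="np")
print(features["input_values"].shape)  # (1, 1024, 128): batch, time, mel bins
```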
BIN pytorch_model.bin (Stored with Git LFS)
Binary file not shown.