Compare commits
No commits in common. "c1c0c663ecf7a4de90db1bc2f8d4e2d38a4f93b4" and "c87aed3ce094f3ec19ae144d6e6fd010e34d7c57" have entirely different histories.
c1c0c663ec...c87aed3ce0
README.md (19 deletions)
README.md
@@ -1,19 +0,0 @@
----
-license: bsd-3-clause
-tags:
-- audio-classification
----
-
-# Audio Spectrogram Transformer (fine-tuned on AudioSet)
-
-Audio Spectrogram Transformer (AST) model fine-tuned on AudioSet. It was introduced in the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Gong et al. and first released in [this repository](https://github.com/YuanGongND/ast).
-
-Disclaimer: The team releasing the Audio Spectrogram Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.
-
-## Model description
-
-The Audio Spectrogram Transformer is equivalent to [ViT](https://huggingface.co/docs/transformers/model_doc/vit), but applied to audio. Audio is first turned into an image (a spectrogram), after which a Vision Transformer is applied. The model obtains state-of-the-art results on several audio classification benchmarks.
-
-## Usage
-
-You can use the raw model for classifying audio into one of the AudioSet classes. See the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/audio-spectrogram-transformer#transformers.ASTForAudioClassification.forward.example) for more info.
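The deleted Usage section defers to the Transformers docs for a classification example. As a companion, here is a minimal sketch of that workflow using the post-rename classes (`ASTFeatureExtractor`, `ASTForAudioClassification`); the checkpoint id is a hypothetical stand-in, since the compare view does not name this repo:

```python
import numpy as np
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

ckpt = "MIT/ast-finetuned-audioset-10-10-0.4593"  # hypothetical checkpoint id

feature_extractor = ASTFeatureExtractor.from_pretrained(ckpt)
model = ASTForAudioClassification.from_pretrained(ckpt)

# AST expects a 16 kHz mono waveform; a 1-second dummy signal stands in here.
waveform = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, number of AudioSet classes)

print(model.config.id2label[logits.argmax(-1).item()])
```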
config.json
@@ -1,8 +1,9 @@
 {
   "architectures": [
-    "ASTForAudioClassification"
+    "AudioSpectrogramTransformerForSequenceClassification"
   ],
   "attention_probs_dropout_prob": 0.0,
+  "frequency_dimension": 128,
   "frequency_stride": 10,
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.0,
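Note: the `architectures` list is what the Auto classes use to map a checkpoint onto a model class, so this hunk tracks the class rename in Transformers. A quick way to inspect these fields (checkpoint id again a hypothetical stand-in):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")  # hypothetical id

print(config.architectures)  # e.g. ['ASTForAudioClassification'] on the renamed side
print(config.model_type)     # 'audio-spectrogram-transformer'
print(config.patch_size, config.frequency_stride, config.time_stride)
```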
@@ -1068,13 +1069,12 @@
     "Zither": 150
   },
   "layer_norm_eps": 1e-12,
-  "max_length": 1024,
   "model_type": "audio-spectrogram-transformer",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
-  "num_mel_bins": 128,
   "patch_size": 16,
   "qkv_bias": true,
+  "time_dimension": 1024,
   "time_stride": 10,
   "torch_dtype": "float32",
   "transformers_version": "4.25.0.dev0"
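The geometry keys in this hunk (`patch_size`, the two strides, and the 128/1024 frequency/time extents under either naming) determine the ViT-style patch grid. A back-of-the-envelope check, assuming the usual convolutional output formula `(dim - patch_size) // stride + 1`:

```python
patch_size = 16
freq_dim, frequency_stride = 128, 10   # num_mel_bins / frequency_dimension
time_dim, time_stride = 1024, 10       # max_length / time_dimension

freq_patches = (freq_dim - patch_size) // frequency_stride + 1  # 12
time_patches = (time_dim - patch_size) // time_stride + 1       # 101

print(freq_patches * time_patches)  # 1212 patches per spectrogram
```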
preprocessor_config.json
@@ -1,8 +1,7 @@
 {
   "do_normalize": true,
-  "feature_extractor_type": "ASTFeatureExtractor",
+  "feature_extractor_type": "AudioSpectrogramTransformerFeatureExtractor",
   "feature_size": 1,
-  "max_length": 1024,
   "mean": -4.2677393,
   "num_mel_bins": 128,
   "padding_side": "right",
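With `do_normalize` set to true, the extractor standardizes the log-mel spectrogram using the stored dataset statistics (the `mean` above; the matching `std` sits further down in the file, outside this hunk). A sketch of the resulting tensor, with a hypothetical checkpoint id:

```python
import numpy as np
from transformers import ASTFeatureExtractor

fe = ASTFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")  # hypothetical id

waveform = np.random.randn(16000).astype(np.float32)  # 1 s of 16 kHz audio
batch = fe(waveform, sampling_rate=16000, return_tensors="np")

# Clips shorter than max_length frames are padded on the right.
print(batch["input_values"].shape)  # (1, 1024, 128) = (batch, max_length, num_mel_bins)
```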
pytorch_model.bin (Stored with Git LFS)
Binary file not shown.