From 64cbfbc5d51275d255c884bd8c050029f661db91 Mon Sep 17 00:00:00 2001
From: Niels Rogge
Date: Mon, 2 May 2022 12:56:22 +0000
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index be82241..186aaa3 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ Disclaimer: The team releasing YOLOS did not write a model card for this model s
 
 ## Model description
 
-YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, the model is able to achieve 42 AP on COCO validation 2017.
+YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, a base-sized YOLOS model is able to achieve 42 AP on COCO validation 2017 (similar to DETR and more complex frameworks such as Faster R-CNN).
 
 The model is trained using a "bipartite matching loss": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a "no object" as class and "no bounding box" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model.
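
The matching step described in the paragraph above can be sketched as follows. This is a hypothetical illustration, not the actual YOLOS/DETR code: it matches queries to padded targets with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) using a plain L1 box cost, whereas the real loss also mixes in classification and generalized-IoU terms.

```python
# Hedged sketch of DETR-style bipartite matching (NOT the YOLOS implementation).
# Each of the N object queries is assigned to exactly one of the N padded
# targets so that the total L1 box cost of the assignment is minimal.
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_queries_to_targets(pred_boxes, target_boxes):
    """Return the optimal one-to-one query->target assignment (Hungarian)."""
    # cost[i, j] = L1 distance between predicted box i and target box j
    cost = np.abs(pred_boxes[:, None, :] - target_boxes[None, :, :]).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return rows, cols


# Toy example with N = 3 queries and 3 targets (already padded to length N),
# boxes in (cx, cy, w, h) format.
preds = np.array([[0.5, 0.5, 0.2, 0.2],
                  [0.1, 0.1, 0.1, 0.1],
                  [0.9, 0.9, 0.3, 0.3]])
targets = np.array([[0.9, 0.9, 0.3, 0.3],
                    [0.5, 0.5, 0.2, 0.2],
                    [0.1, 0.1, 0.1, 0.1]])
rows, cols = match_queries_to_targets(preds, targets)
# Each query is paired with its identical target: 0->1, 1->2, 2->0.
```

After this matching, the per-pair losses (cross-entropy on the classes, L1 plus generalized IoU on the boxes) are computed only between each query and its assigned target.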