From 64cbfbc5d51275d255c884bd8c050029f661db91 Mon Sep 17 00:00:00 2001
From: Niels Rogge
Date: Mon, 2 May 2022 12:56:22 +0000
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index be82241..186aaa3 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ Disclaimer: The team releasing YOLOS did not write a model card for this model s
 
 ## Model description
 
-YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, the model is able to achieve 42 AP on COCO validation 2017.
+YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, a base-sized YOLOS model is able to achieve 42 AP on COCO validation 2017 (similar to DETR and more complex frameworks such as Faster R-CNN).
 
 The model is trained using a "bipartite matching loss": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a "no object" as class and "no bounding box" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model.
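
The matching step described in the paragraph above can be sketched as follows. This is a hypothetical illustration, not the actual YOLOS/DETR code: it matches queries to padded targets with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) using a plain L1 box cost, whereas the real loss also mixes in classification and generalized-IoU terms.

```python
# Hedged sketch of DETR-style bipartite matching (NOT the YOLOS implementation).
# Each of the N object queries is assigned to exactly one of the N padded
# targets so that the total L1 box cost of the assignment is minimal.
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_queries_to_targets(pred_boxes, target_boxes):
    """Return the optimal one-to-one query->target assignment (Hungarian)."""
    # cost[i, j] = L1 distance between predicted box i and target box j
    cost = np.abs(pred_boxes[:, None, :] - target_boxes[None, :, :]).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return rows, cols


# Toy example with N = 3 queries and 3 targets (already padded to length N),
# boxes in (cx, cy, w, h) format.
preds = np.array([[0.5, 0.5, 0.2, 0.2],
                  [0.1, 0.1, 0.1, 0.1],
                  [0.9, 0.9, 0.3, 0.3]])
targets = np.array([[0.9, 0.9, 0.3, 0.3],
                    [0.5, 0.5, 0.2, 0.2],
                    [0.1, 0.1, 0.1, 0.1]])
rows, cols = match_queries_to_targets(preds, targets)
# Each query is paired with its identical target: 0->1, 1->2, 2->0.
```

After this matching, the per-pair losses (cross-entropy on the classes, L1 plus generalized IoU on the boxes) are computed only between each query and its assigned target.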