Abstract:
Real-time object detection is a computer vision task that involves identifying and localizing objects of interest within an image or video. Many challenges need to be addressed in object detection, including occlusions, scale variations, background clutter, object deformations and variations, limited data, real-time processing demands, imbalanced classes, and the need to adapt to new object categories. This project proposes a Transformer-based object detection model to tackle these challenges. The proposed model adapts Transformers, originally designed for natural language processing, to the object detection task. It leverages the self-attention mechanism in Transformers for feature extraction rather than
relying on convolutional neural networks. This allows the model to effectively capture both global and local features and to learn complex spatial relationships between objects. Furthermore, the fully connected layers in the conventional object detection method are replaced with a Transformer-based detection head in the proposed model. This modification allows the model
to utilize the strengths of Transformers in processing the extracted features
and generating precise bounding box predictions. In addition, the model can learn complex object representations and handle object occlusion, scale variation, and other challenging scenarios more effectively. This adaptation enhances the model's capability to accurately detect and localize objects in a variety of real-world applications. The performance of the proposed Transformer-based
object detection model is evaluated through experiments on widely recognized
object detection benchmarks like COCO. Additionally, proprietary datasets
like Next wealth are used to gauge the model’s performance. The results of
these evaluations exhibit significant enhancements in metrics such as mean
average precision and localization accuracy compared with other state-of-the-art methods. The Transformer-based object detection model demonstrates promising outcomes, showcasing improved accuracy and its capability to handle challenging scenarios and complex object interactions effectively.
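To make the two ideas summarized above concrete (self-attention-based feature extraction in place of a CNN backbone, and a Transformer-based detection head in place of fully connected layers), the sketch below shows one minimal, DETR-style way they can be combined in PyTorch. It is an illustrative sketch only, not the proposed model: the class name, layer sizes, number of object queries, and all hyperparameters are assumptions.

```python
# Illustrative sketch only: a minimal Transformer detector in PyTorch.
# Patch embedding + self-attention replaces a CNN backbone, and a
# Transformer decoder with learned object queries replaces the fully
# connected detection head. All names and sizes are hypothetical.
import torch
import torch.nn as nn


class TransformerDetector(nn.Module):
    def __init__(self, num_classes=80, num_queries=100, d_model=256,
                 patch=16, img_size=224):
        super().__init__()
        # Self-attention-based feature extraction: embed image patches
        # instead of extracting features with a convolutional backbone.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))
        # Transformer encoder-decoder processes the extracted features.
        self.transformer = nn.Transformer(d_model=d_model, nhead=8,
                                          num_encoder_layers=6,
                                          num_decoder_layers=6,
                                          batch_first=True)
        # Learned object queries, one per candidate detection.
        self.query_embed = nn.Parameter(torch.zeros(1, num_queries, d_model))
        # Prediction heads: class logits (+1 for "no object") and
        # normalized (cx, cy, w, h) bounding boxes.
        self.class_head = nn.Linear(d_model, num_classes + 1)
        self.box_head = nn.Linear(d_model, 4)

    def forward(self, images):                      # images: (B, 3, H, W)
        x = self.patch_embed(images)                # (B, d_model, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)            # (B, N, d_model)
        x = x + self.pos_embed
        queries = self.query_embed.expand(images.size(0), -1, -1)
        hs = self.transformer(src=x, tgt=queries)   # (B, num_queries, d_model)
        return self.class_head(hs), self.box_head(hs).sigmoid()


# Example usage on a dummy batch.
model = TransformerDetector()
logits, boxes = model(torch.randn(2, 3, 224, 224))
print(logits.shape, boxes.shape)  # (2, 100, 81) and (2, 100, 4)
```

In this kind of design, each object query attends to the full set of patch features, which is what allows the detector to reason about global context and object interactions rather than only local receptive fields.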