Bounding boxes augmentation for object detection

2023-12-07 21:28


Different annotation formats¶

Bounding boxes are rectangles that mark objects on an image. There are multiple formats for bounding box annotations. Each format uses its own representation of bounding box coordinates. Albumentations supports four formats: pascal_voc, albumentations, coco, and yolo.

Let's take a look at each of those formats and how they represent the coordinates of bounding boxes.

As an example, we will use an image from the dataset named Common Objects in Context. It contains one bounding box that marks a cat. The image width is 640 pixels, and its height is 480 pixels. The width of the bounding box is 322 pixels, and its height is 117 pixels.

 An example image with a bounding box from the COCO dataset

pascal_voc¶

pascal_voc is a format used by the Pascal VOC dataset. Coordinates of a bounding box are encoded with four values in pixels: [x_min, y_min, x_max, y_max]. x_min and y_min are the coordinates of the top-left corner of the bounding box. x_max and y_max are the coordinates of the bottom-right corner of the bounding box.

Coordinates of the example bounding box in this format are [98, 345, 420, 462].

albumentations¶

albumentations is similar to pascal_voc, because it also uses four values [x_min, y_min, x_max, y_max] to represent a bounding box. But unlike pascal_voc, albumentations uses normalized values. To normalize the values, we divide the pixel coordinates for the x- and y-axis by the width and the height of the image, respectively.

Coordinates of the example bounding box in this format are [98 / 640, 345 / 480, 420 / 640, 462 / 480] which are [0.153125, 0.71875, 0.65625, 0.9625].

Albumentations uses this format internally to work with bounding boxes and augment them.

coco¶

coco is a format used by the Common Objects in Context (COCO) dataset.

In coco, a bounding box is defined by four values in pixels [x_min, y_min, width, height]. They are coordinates of the top-left corner along with the width and height of the bounding box.

Coordinates of the example bounding box in this format are [98, 345, 322, 117].

yolo¶

In yolo, a bounding box is represented by four values [x_center, y_center, width, height]. x_center and y_center are the normalized coordinates of the center of the bounding box. To normalize the coordinates, we take the pixel values of x and y, which mark the center of the bounding box on the x- and y-axis. Then we divide the value of x by the width of the image and the value of y by the height of the image. width and height represent the width and the height of the bounding box. They are normalized as well.

Coordinates of the example bounding box in this format are [((420 + 98) / 2) / 640, ((462 + 345) / 2) / 480, 322 / 640, 117 / 480] which are [0.4046875, 0.840625, 0.503125, 0.24375].
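
To make the arithmetic behind these formats concrete, here is a small illustrative sketch. The helper functions below are made up for this example and are not part of the Albumentations API (the library performs such conversions internally); they convert the example pascal_voc box into the other three formats for the 640x480 image.

def pascal_voc_to_coco(box):
    # [x_min, y_min, x_max, y_max] -> [x_min, y_min, width, height]
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]

def pascal_voc_to_albumentations(box, img_width, img_height):
    # Normalize each coordinate by the image width or height.
    x_min, y_min, x_max, y_max = box
    return [x_min / img_width, y_min / img_height, x_max / img_width, y_max / img_height]

def pascal_voc_to_yolo(box, img_width, img_height):
    # Normalized center coordinates plus normalized width and height.
    x_min, y_min, x_max, y_max = box
    return [
        (x_min + x_max) / 2 / img_width,
        (y_min + y_max) / 2 / img_height,
        (x_max - x_min) / img_width,
        (y_max - y_min) / img_height,
    ]

box = [98, 345, 420, 462]  # the example cat bounding box in pascal_voc
print(pascal_voc_to_coco(box))                      # [98, 345, 322, 117]
print(pascal_voc_to_albumentations(box, 640, 480))  # [0.153125, 0.71875, 0.65625, 0.9625]
print(pascal_voc_to_yolo(box, 640, 480))            # [0.4046875, 0.840625, 0.503125, 0.24375]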

Bounding boxes augmentation¶

Just like with image and mask augmentation, the process of augmenting bounding boxes consists of four steps.

  1. You import the required libraries.
  2. You define an augmentation pipeline.
  3. You read images and bounding boxes from the disk.
  4. You pass an image and bounding boxes to the augmentation pipeline and receive augmented images and boxes.

Note

Some transforms in Albumentations don't support bounding boxes. If you try to use them, you will get an exception. Please refer to this article to check whether a transform can augment bounding boxes.

Step 1. Import the required libraries.¶

import albumentations as A
import cv2

Step 2. Define an augmentation pipeline.¶

Here is an example of a minimal declaration of an augmentation pipeline that works with bounding boxes.

transform = A.Compose([
    A.RandomCrop(width=450, height=450),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco'))

Note that unlike image and mask augmentation, Compose now has an additional parameter bbox_params. You need to pass an instance of A.BboxParams to that argument. A.BboxParams specifies settings for working with bounding boxes. format sets the format for bounding box coordinates.

It can either be pascal_voc, albumentations, coco, or yolo. This value is required because Albumentations needs to know the source format of the bounding box coordinates to apply augmentations correctly.

Besides format, A.BboxParams supports a few more settings.

Here is an example of Compose that shows all available settings with A.BboxParams:

transform = A.Compose([
    A.RandomCrop(width=450, height=450),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco', min_area=1024, min_visibility=0.1, label_fields=['class_labels']))

min_area and min_visibility

The min_area and min_visibility parameters control what Albumentations should do with augmented bounding boxes if their size has changed after augmentation. The size of a bounding box can change if you apply spatial augmentations, for example, when you crop a part of an image or when you resize an image.

min_area is a value in pixels. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations will drop that box. So the returned list of augmented bounding boxes won't contain that bounding box.

min_visibility is a value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations will drop that box. So if the augmentation process cuts out most of a bounding box, that box won't be present in the returned list of augmented bounding boxes.

Here is an example image that contains two bounding boxes. The bounding box coordinates are declared using the coco format.

 An example image with two bounding boxes

First, we apply the CenterCrop augmentation without declaring parameters min_area and min_visibility. The augmented image contains two bounding boxes.

 An example image with two bounding boxes after applying augmentation

Next, we apply the same CenterCrop augmentation, but now we also use the min_area parameter. Now, the augmented image contains only one bounding box, because the other bounding box's area after augmentation became smaller than min_area, so Albumentations dropped that bounding box.

 An example image with one bounding box after applying augmentation with 'min_area'

Finally, we apply the CenterCrop augmentation with min_visibility. After that augmentation, the resulting image doesn't contain any bounding boxes, because the visibility of all bounding boxes after augmentation is below the threshold set by min_visibility.

An example image with zero bounding boxes after applying augmentation with 'min_visibility' 
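
For reference, the three pipelines used in this example could be declared roughly as follows. This is only a sketch: the crop size, min_area, and min_visibility values below are assumptions chosen to illustrate where the parameters go, not the exact values used to produce the images above.

# Sketch of the three CenterCrop pipelines discussed above.
# The crop size and threshold values are illustrative assumptions.
crop_only = A.Compose(
    [A.CenterCrop(height=400, width=400)],
    bbox_params=A.BboxParams(format='coco'),
)

crop_with_min_area = A.Compose(
    [A.CenterCrop(height=400, width=400)],
    bbox_params=A.BboxParams(format='coco', min_area=4500),
)

crop_with_min_visibility = A.Compose(
    [A.CenterCrop(height=400, width=400)],
    bbox_params=A.BboxParams(format='coco', min_visibility=0.3),
)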

Class labels for bounding boxes¶

Besides coordinates, each bounding box should have an associated class label that tells which object lies inside the bounding box. There are two ways to pass a label for a bounding box.

Let's say you have an example image with three objects: dog, cat, and sports ball. Bounding box coordinates in the coco format for those objects are [23, 74, 295, 388], [377, 294, 252, 161], and [333, 421, 49, 49].

An example image with 3 bounding boxes from the COCO dataset

1. You can pass labels along with bounding box coordinates by adding them as additional values to the list of coordinates.¶

For the image above, bounding boxes with class labels will become [23, 74, 295, 388, 'dog'], [377, 294, 252, 161, 'cat'], and [333, 421, 49, 49, 'sports ball'].

Class labels can be of any type: integer, string, or any other Python data type. For example, integer values as class labels will look like the following: [23, 74, 295, 388, 18], [377, 294, 252, 161, 17], and [333, 421, 49, 49, 37].

Also, you can use multiple class values for each bounding box, for example [23, 74, 295, 388, 'dog', 'animal'], [377, 294, 252, 161, 'cat', 'animal'], and [333, 421, 49, 49, 'sports ball', 'item'].

2. You can pass labels for bounding boxes as a separate list (the preferred way).¶

For example, if you have three bounding boxes like [23, 74, 295, 388], [377, 294, 252, 161], and [333, 421, 49, 49], you can create a separate list with values like ['cat', 'dog', 'sports ball'] or [18, 17, 37] that contains class labels for those bounding boxes. Next, you pass that list with class labels as a separate argument to the transform function. Albumentations needs to know the names of all those lists with class labels to join them with augmented bounding boxes correctly. Then, if a bounding box is dropped after augmentation because it is no longer visible, Albumentations will drop the class label for that box as well. Use the label_fields parameter to set names for all arguments in transform that will contain label descriptions for bounding boxes (more on that in Step 4).

Step 3. Read images and bounding boxes from the disk.¶

Read an image from the disk.

image = cv2.imread("/path/to/image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

Bounding boxes can be stored on the disk in different serialization formats: JSON, XML, YAML, CSV, etc. So the code to read bounding boxes depends on the actual format of data on the disk.

After you read the data from the disk, you need to prepare bounding boxes for Albumentations.

Albumentations expects that bounding boxes will be represented as a list of lists. Each list contains information about a single bounding box. A bounding box definition should have at least four elements that represent the coordinates of that bounding box. The actual meaning of those four values depends on the format of the bounding boxes (either pascal_voc, albumentations, coco, or yolo). Besides the four coordinates, each definition of a bounding box may contain one or more extra values. You can use those extra values to store additional information about the bounding box, such as the class label of the object inside the box. During augmentation, Albumentations will not process those extra values. The library will return them as is, along with the updated coordinates of the augmented bounding box.
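
As a hedged example, suppose your annotations are stored in a COCO-style JSON file. The file name and field layout below are assumptions about your data, not something Albumentations requires; adapt the reading code to whatever serialization format you actually use.

import json

# Assumed COCO-style annotation file; adjust the path and keys to your own data.
with open("/path/to/annotations.json") as f:
    coco_data = json.load(f)

# In COCO, each annotation stores its box as [x_min, y_min, width, height]
# under the "bbox" key and its class as "category_id".
bboxes = [annotation["bbox"] for annotation in coco_data["annotations"]]
class_labels = [annotation["category_id"] for annotation in coco_data["annotations"]]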

Step 4. Pass an image and bounding boxes to the augmentation pipeline and receive augmented images and boxes.¶

As discussed in Step 2, there are two ways of passing class labels along with bounding box coordinates:

1. Pass class labels along with coordinates.¶

So, if you have coordinates of three bounding boxes that look like this:

bboxes = [
    [23, 74, 295, 388],
    [377, 294, 252, 161],
    [333, 421, 49, 49],
]

you can add a class label for each bounding box as an additional element of the list along with the four coordinates. Now the list with bounding boxes and their coordinates will look like the following:

bboxes = [
    [23, 74, 295, 388, 'dog'],
    [377, 294, 252, 161, 'cat'],
    [333, 421, 49, 49, 'sports ball'],
]

or with multiple labels per bounding box:

bboxes = [
    [23, 74, 295, 388, 'dog', 'animal'],
    [377, 294, 252, 161, 'cat', 'animal'],
    [333, 421, 49, 49, 'sports ball', 'item'],
]

You can use any data type for declaring class labels. It can be string, integer, or any other Python data type.

Next, you pass an image and bounding boxes for it to the transform function and receive the augmented image and bounding boxes.

transformed = transform(image=image, bboxes=bboxes)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']

 Example input and output data for bounding boxes augmentation
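
If you pass labels inside the bboxes list, each augmented box comes back with its extra values still attached after the four coordinates, so you can split them apart afterwards. A minimal sketch, assuming one label per box:

# Separate the augmented coordinates from the labels (assuming one label per box).
augmented_coords = [box[:4] for box in transformed_bboxes]
augmented_labels = [box[4] for box in transformed_bboxes]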

2. Pass class labels in a separate argument to transform (the preferred way).¶

Let's say you have coordinates of three bounding boxes:

bboxes = [
    [23, 74, 295, 388],
    [377, 294, 252, 161],
    [333, 421, 49, 49],
]

You can create a separate list that contains class labels for those bounding boxes:

class_labels = ['cat', 'dog', 'parrot']

Then you pass both bounding boxes and class labels to transform. Note that to pass class labels, you need to use the name of the argument that you declared in label_fields when creating an instance of Compose in step 2. In our case, we set the name of the argument to class_labels.

transformed = transform(image=image, bboxes=bboxes, class_labels=class_labels)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']
transformed_class_labels = transformed['class_labels']

Example input and output data for bounding boxes augmentation with a separate argument for class labels 

Note that label_fields expects a list, so you can set multiple fields that contain labels for your bounding boxes. So if you declare Compose like

transform = A.Compose([
    A.RandomCrop(width=450, height=450),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco', label_fields=['class_labels', 'class_categories']))

you can use those multiple arguments to pass info about class labels, like

class_labels = ['cat', 'dog', 'parrot']
class_categories = ['animal', 'animal', 'item']

transformed = transform(image=image, bboxes=bboxes, class_labels=class_labels, class_categories=class_categories)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']
transformed_class_labels = transformed['class_labels']
transformed_class_categories = transformed['class_categories']
