Feature: PoC for the bounding box reflection in bbox_shift_scale_rotate #1125

Open · wants to merge 7 commits into base: main
Conversation

@i-aki-y (Contributor) commented on Feb 15, 2022

About this PR

I implemented bbox reflection functionality for bbox_shift_scale_rotate for my own specific use case. I think this functionality will be beneficial to other users, so I opened this PR.

My goal is to integrate bbox reflection modes (such as cv2.BORDER_REFLECT and cv2.BORDER_WRAP) into existing transforms like ShiftScaleRotate.

However, that turned out not to be an easy task.

So in this PR, I provide only a functional version (bboxes_shift_scale_rotate_reflect) as a first step. I think it is sufficient for looking into how it works and what challenges exist. If you think this implementation seems promising to merge, I would like to move on to the next step.

I'm not sure this PR can be merged, but I hope this implementation and analysis inspire someone.

Demo

This is a demo: an input image (left), the result of bbox_shift_scale_rotate (center), and the result of the bboxes_shift_scale_rotate_reflect proposed in this PR (right).

[image: bbox_reflection]

The full runnable code is here:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import cv2
import skimage
import albumentations as A

# define helper funcs
def add_bbox(ax, bbox):
    label = 0    
    if len(bbox) > 4:
        bbox, label = bbox[:4], bbox[4]        
    bbox_color = plt.get_cmap("tab10").colors[label]
    x_min, y_min, x_max, y_max = bbox
    w, h = x_max - x_min, y_max - y_min
    pat = Rectangle(xy=(x_min, y_min), width=w, height=h, fill=False, lw=3, color=bbox_color)
    ax.add_patch(pat)

def plot_image_and_bboxes(image, bboxes, ax):
    ax.imshow(image)
    for i in range(len(bboxes)):
        add_bbox(ax, bboxes[i])

def get_shift_scale_rotate_bboxes(bboxes, params):
    bboxes_out = []
    for bbox in bboxes:
        label = []
        if len(bbox) > 4:
            label.append(bbox[4])
        bbox = A.normalize_bbox(bbox, params["rows"], params["cols"])
        bbox = A.bbox_shift_scale_rotate(bbox, **params)
        bbox = A.denormalize_bbox(bbox, params["rows"], params["cols"])
        bboxes_out.append((*bbox, *label))
    return bboxes_out

def get_shift_scale_rotate_reflect_bboxes(bboxes, params):
    bboxes, labels = A.to_ndarray_bboxes(bboxes)
    bboxes = A.normalize_bboxes2(bboxes, params["rows"], params["cols"])
    bboxes = A.bboxes_shift_scale_rotate_reflect(bboxes, **params)
    bboxes = A.denormalize_bboxes2(bboxes, params["rows"], params["cols"])
    bboxes = A.to_tuple_bboxes(bboxes, labels)
    return bboxes

# setup
image = skimage.data.astronaut()
rows, cols = image.shape[:2]
bboxes = [[170, 30, 280, 180, 0], [350, 80, 460, 290, 1], [140, 350, 200, 420, 2]]
img_params = dict(angle=45, scale=0.5, dx=0.1, dy=0.2, border_mode=cv2.BORDER_REFLECT)
box_params = img_params.copy()
box_params.update({"rows": rows, "cols": cols})
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# show original image
axes[0].set_title("original image")
plot_image_and_bboxes(image, bboxes, axes[0])

## shift_scale_rotate
axes[1].set_title("shift_scale_rotate")
print("image transform")
%timeit image_tr = A.shift_scale_rotate(image, **img_params)
print("shift_scale_rotate")
%timeit bboxes_no_reflect = get_shift_scale_rotate_bboxes(bboxes, box_params)
plot_image_and_bboxes(image_tr, bboxes_no_reflect, axes[1])

## shift_scale_rotate_reflect
axes[2].set_title("shift_scale_rotate_reflect")
print("shift_scale_rotate with reflection")
%timeit bboxes_reflect = get_shift_scale_rotate_reflect_bboxes(bboxes, box_params)
plot_image_and_bboxes(image_tr, bboxes_reflect, axes[2])
#plt.savefig("./bbox_reflection.jpg")

About implementation

This implementation is very straightforward.

A summary is here:

  1. Generate flipped copies and lay them out around the original image (a sketch of this step is shown after the diagram below).
  2. Apply the affine transform to the bboxes, including the copied ones.
  3. Apply a center crop and remove invisible bboxes.
        step.1       step.2       step.3

                       q p q
        +-+-+-+      +-+-+-+
        |q|p|q|      | |d|b|d
        +-+-+-+      +-+-+-+       +-+
  b  -> |d|b|d|  ->  | |q|p|q  ->  |q|
        +-+-+-+      +-+-+-+       +-+
        |q|p|q|      | | | |
        +-+-+-+      +-+-+-+
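
To make step 1 concrete, here is a minimal sketch of the tiling idea in normalized coordinates. This is an illustration only, not the code in this PR; reflect_tile_bboxes is a hypothetical name, and it ignores the one-pixel difference between BORDER_REFLECT and BORDER_REFLECT_101 discussed below.

import numpy as np

def reflect_tile_bboxes(bboxes):
    # bboxes: ndarray of shape (N, 4), normalized (x_min, y_min, x_max, y_max) in [0, 1]
    tiles = []
    for ty in (-1, 0, 1):      # vertical tile offset
        for tx in (-1, 0, 1):  # horizontal tile offset
            b = bboxes.copy()
            if tx != 0:
                # mirror across the vertical border shared with the center tile (x = 0 or x = 1)
                x_ref = 0.0 if tx < 0 else 1.0
                b[:, [0, 2]] = 2 * x_ref - b[:, [2, 0]]
            if ty != 0:
                # mirror across the horizontal border shared with the center tile (y = 0 or y = 1)
                y_ref = 0.0 if ty < 0 else 1.0
                b[:, [1, 3]] = 2 * y_ref - b[:, [3, 1]]
            tiles.append(b)
    return np.concatenate(tiles, axis=0)  # shape (9 * N, 4), ready for step 2

Steps 2 and 3 then apply the usual affine transform to all 9 * N boxes and keep only the ones that remain visible in the central unit square.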

This works, but it is not efficient because it creates many bbox copies that are eventually removed.
I searched for a more efficient algorithm for this task but could not find one, so I kept this method.

To mitigate the performance disadvantage, I implemented this functionality on vectorized (ndarray) bboxes.
This is why this PR adds several bboxes_xyz functions, which are vectorized versions of the existing bbox_xyz counterparts; a short sketch of the idea follows.
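
As an illustration of what "vectorized" means here (a hypothetical sketch, not necessarily how normalize_bboxes2 is implemented in this PR), the whole (N, 4) array is handled in one numpy operation instead of a per-bbox loop:

import numpy as np

def normalize_bboxes_vectorized(bboxes, rows, cols):
    # bboxes: ndarray of shape (N, 4) as (x_min, y_min, x_max, y_max) in pixels
    scale = np.array([cols, rows, cols, rows], dtype=float)
    return bboxes / scale  # all N boxes normalized at once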

This PR contains many changes, but I tried not to modify any existing code in order to avoid unintended problems.

About performance

The standard output of the example above is:

image transform
3.39 ms ± 880 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
shift_scale_rotate
155 µs ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
shift_scale_rotate with reflection
385 µs ± 9.66 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The 385 µs is not bad compared with the 3.39 ms of the image transform, but it is still slower than the original bbox_shift_scale_rotate despite vectorization.
The benefit of vectorization shows up as the number of bboxes grows.
To see this, I ran the following code:

def get_grid_bboxes(n_grid, bbox_size, rows, cols):
    # lay out an n_grid x n_grid array of boxes of size bbox_size (in pixels)
    bboxes = []
    d_y = rows / n_grid
    d_x = cols / n_grid
    for i_y in range(n_grid):
        y_min = (i_y + 0.5) * d_y
        y_max = y_min + bbox_size
        for i_x in range(n_grid):
            x_min = (i_x + 0.5) * d_x
            x_max = x_min + bbox_size            
            bboxes.append([x_min, y_min, x_max, y_max])
    return bboxes

#grid_bboxes = get_grid_bboxes(3, 32, rows, cols)
#fig, ax = plt.subplots(1, 1, figsize=(6, 6))
#plot_image_and_bboxes(image, grid_bboxes, ax)

box_params["border_mode"] = cv2.BORDER_REFLECT_101
#box_params["border_mode"] = cv2.BORDER_CONSTANT

grid_bboxes = get_grid_bboxes(1, 32, rows, cols)
print(f"{len(grid_bboxes)} boxes")
%timeit get_shift_scale_rotate_bboxes(grid_bboxes, box_params)
%timeit get_shift_scale_rotate_reflect_bboxes(grid_bboxes, box_params)

grid_bboxes = get_grid_bboxes(3, 32, rows, cols)
print(f"{len(grid_bboxes)} boxes")
%timeit get_shift_scale_rotate_bboxes(grid_bboxes, box_params)
%timeit get_shift_scale_rotate_reflect_bboxes(grid_bboxes, box_params)

grid_bboxes = get_grid_bboxes(10, 32, rows, cols)
print(f"{len(grid_bboxes)} boxes")
%timeit get_shift_scale_rotate_bboxes(grid_bboxes, box_params)
%timeit get_shift_scale_rotate_reflect_bboxes(grid_bboxes, box_params)

The results are shown below (the parameters are the same as in the previous example):

1 boxes
45.3 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
293 µs ± 5.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
9 boxes
402 µs ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
418 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
100 boxes
4.48 ms ± 71.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.68 ms ± 36.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The vectorized version surpasses the original one as the number of bboxes increases. Also, note that bboxes_shift_scale_rotate_reflect processes several times more bboxes than bbox_shift_scale_rotate does, because bbox_shift_scale_rotate does not process any of the reflected bboxes.

Known issues and some notes

Here are some known issues and notes that I found during the implementation.

1. Computational cost depends on parameters.

For example, suppose a very small scale factor such as scale=0.01 is set. In that case, the number of bboxes is multiplied by roughly 10^4, because the shrunken content has to be tiled about 100 times in each direction to cover the output frame.
This can cause performance issues, so users should be careful with small scale factors. A rough estimate is sketched below.
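
As a back-of-the-envelope estimate (an illustration only, not code from this PR; approx_tile_count is a hypothetical helper):

import math

def approx_tile_count(scale):
    # mirrored tiles needed per axis to cover the output, squared for 2-D
    tiles_per_axis = math.ceil(1.0 / scale) + 1
    return tiles_per_axis ** 2

print(approx_tile_count(0.5))   # 9
print(approx_tile_count(0.01))  # 10201, i.e. on the order of 10^4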

2. Extra care for tracking bboxes and labels is needed

Albumentations allows label information to be added as a separate target by using label_fields.
Since bbox reflection changes the number of input bboxes, it becomes difficult to track the correspondence between the bboxes and the label_fields.
I think some extra care about label_fields is needed to integrate this functionality into the albumentations pipeline.

--> The label_fields are automatically concatenated to the bboxes, so no extra work is needed for this.
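
For context, the label_fields mechanism referred to here is used like this (a minimal, standard Albumentations example, unrelated to the code in this PR):

import numpy as np
import albumentations as A

transform = A.Compose(
    [A.ShiftScaleRotate(p=1.0)],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)
# labels are passed as a separate target and must stay aligned with the bboxes
out = transform(
    image=np.zeros((512, 512, 3), dtype=np.uint8),
    bboxes=[(170, 30, 280, 180)],
    labels=["astronaut"],
)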

3. Extra care for mismatches between numpy ndarray and tuple bboxes is needed

Albumentations allows label information to be added to a bbox as an extra element, and that label can be a string.
So a bbox has to be split into its pure coordinates and its label, so that np.array(bboxes) does not create a str ndarray.
To solve this, I introduced the to_ndarray_bboxes(bboxes) and to_tuple_bboxes(bboxes, labels) functions.
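
A small sketch of the dtype problem and the split (an illustration of the idea; not necessarily how to_ndarray_bboxes / to_tuple_bboxes are implemented in this PR):

import numpy as np

bboxes = [(0.1, 0.2, 0.3, 0.4, "cat"), (0.5, 0.5, 0.7, 0.8, "dog")]

# converting directly coerces everything to strings, which breaks arithmetic
print(np.array(bboxes).dtype)  # a string dtype such as <U32

# splitting coordinates and labels keeps the coordinates numeric
coords = np.array([b[:4] for b in bboxes], dtype=float)  # shape (N, 4), float64
labels = [b[4] for b in bboxes]                          # labels kept as plain Python objects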

4. This implementation does not care about the difference between BORDER_REFLECT and BORDER_REFLECT_101

Since I am not sure this gives a significant disadvantage in the results, this implementation does not distinguish between BORDER_REFLECT and BORDER_REFLECT_101, to avoid extra complication (the two modes differ only in whether the border pixel itself is duplicated, which amounts to a one-pixel offset of the mirrored copies).
I may need to do something about it, but at the moment I have no idea how to implement BORDER_REFLECT_101 precisely.
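
The difference between the two modes can be seen directly with plain OpenCV (a standalone illustration, not code from this PR):

import numpy as np
import cv2

row = np.array([[1, 2, 3, 4]], dtype=np.uint8)
# BORDER_REFLECT duplicates the border pixel
print(cv2.copyMakeBorder(row, 0, 0, 3, 0, cv2.BORDER_REFLECT)[0])      # [3 2 1 1 2 3 4]
# BORDER_REFLECT_101 reflects about the border pixel itself (no duplication)
print(cv2.copyMakeBorder(row, 0, 0, 3, 0, cv2.BORDER_REFLECT_101)[0])  # [4 3 2 1 2 3 4]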

5. A well-established algorithm is wanted

This implementation is easy to understand, but it would be better if there were a more efficient, well-established algorithm for this task.

@Dipet (Collaborator) commented on Feb 16, 2022

Thank you for your contribution. This is great and complex work!
Unfortunately, we need some time to look deeper into this PR and understand how it works.
Thus, reviewing this PR may take some time.

@i-aki-y (Contributor, Author) commented on Feb 17, 2022

Of course, please take your time.
This is not a simple patch; it includes things to consider and design decisions that need to be made.

@Dipet added the WIP label on Jun 11, 2022
Some vectorized bbox functions are also added
@i-aki-y (Contributor, Author) commented on Sep 4, 2022

I made some minor changes to follow up on recent albumentations updates, and I integrated the bbox reflection functionality into the ShiftScaleRotate transform by adding a new argument, reflec_annotation.

@Dipet removed the WIP label on Sep 19, 2022