Add support slice function, padding the mask to the full images and t… #881

Open · wants to merge 10 commits into base: develop
Conversation

AdonaiVera · Contributor

Description

This pull request introduces segmentation support for InferenceSlicer. It is the second part of the discussion in issue #678.

First, I added a conditional statement to pad slices that are smaller than the slice size, such as the ones at the image corners. Next, I created the _apply_padding_to_slice function, which pads a slice using the letterbox resizing method.
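To make the padding step concrete, here is a minimal sketch of letterbox-style padding for undersized edge slices. The function name, signature, and the fill value 114 are assumptions for illustration; the actual _apply_padding_to_slice in this PR may differ.

```python
import numpy as np


def pad_slice_to_size(
    slice_image: np.ndarray, target_wh: tuple, pad_value: int = 114
) -> np.ndarray:
    """Pad a slice on its right/bottom edges so it reaches (target_w, target_h).

    Hypothetical sketch of letterbox padding for slices near the image border;
    the original pixels stay at the top-left, padding fills the remainder.
    """
    target_w, target_h = target_wh
    h, w = slice_image.shape[:2]
    pad_bottom = max(target_h - h, 0)
    pad_right = max(target_w - w, 0)
    return np.pad(
        slice_image,
        ((0, pad_bottom), (0, pad_right), (0, 0)),
        mode="constant",
        constant_values=pad_value,
    )
```

Padding only on the right and bottom keeps the slice's top-left corner anchored, so detection coordinates within the slice stay valid without an extra shift.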

We adjust the masks so they align with the coordinate system of the full image. Most of each mask is empty except for the specific area it covers. This simplifies merging all the masks, but it also requires more memory, since each mask becomes as large as the original image. In addition, the merge function currently uses a concatenate call to join all the masks.
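The relocation step described above can be sketched as follows. This is a hedged illustration of what a move_masks-style helper could look like, not the PR's actual implementation; the name and signature are assumptions.

```python
import numpy as np


def move_masks(
    masks: np.ndarray, offset: np.ndarray, image_wh: tuple
) -> np.ndarray:
    """Place slice-local masks onto full-image canvases at the given offset.

    masks: (n, mask_h, mask_w) boolean array in slice coordinates.
    offset: (x, y) top-left position of the slice in the full image.
    Returns an (n, image_h, image_w) array, mostly empty, which is why
    memory usage grows with the number of detections.
    """
    image_w, image_h = image_wh
    n, mask_h, mask_w = masks.shape
    full = np.zeros((n, image_h, image_w), dtype=masks.dtype)
    x, y = int(offset[0]), int(offset[1])
    # Clip in case a slice extends past the image border.
    h = min(mask_h, image_h - y)
    w = min(mask_w, image_w - x)
    full[:, y : y + h, x : x + w] = masks[:, :h, :w]
    return full
```

The zero-filled canvas per detection is exactly the memory cost discussed in this PR: n full-image masks instead of n slice-sized ones.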

The main difficulty is the high computational cost and runtime of this method. The algorithm has to fit every mask detection onto the entire image, which significantly inflates the size of the data.

In my opinion, this PR is not ready for production yet, as we still need to address the computational cost of the masks. However, I would like to hear your perspective, @SkalskiP.

Currently, the method is consuming a lot of RAM.

- load image ✅ 
- slice the image into NxN tiles ✅ 
- surround smaller slices (the ones close to the edges) with a letterbox so that all tiles are NxN
- loop over slices ✅ 
   - run inference ✅ 
- update box coordinate values to match the image coordinate system, not the slice coordinate system ✅ 
- pad masks to match the image coordinate system, not the slice coordinate system ✅ 
- merge detections ✅ 
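The tiling step in the checklist above amounts to computing top-left offsets for each slice. A minimal sketch, assuming a simple stride-based scheme with optional overlap (names and parameters are illustrative, not the library's API):

```python
def generate_offsets(image_wh, slice_wh, overlap_wh=(0, 0)):
    """Return the (x, y) top-left offsets of NxN tiles covering the image.

    Edge tiles may be smaller than slice_wh; those are the ones that
    need letterbox padding before inference.
    """
    image_w, image_h = image_wh
    slice_w, slice_h = slice_wh
    stride_x = slice_w - overlap_wh[0]
    stride_y = slice_h - overlap_wh[1]
    offsets = []
    for y in range(0, image_h, stride_y):
        for x in range(0, image_w, stride_x):
            offsets.append((x, y))
    return offsets
```

Each offset is later added to box coordinates (and used to place masks) to map slice-local detections back into the full-image coordinate system.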

In addition, this PR includes comprehensive unit tests for the new move_masks functionality and a demo that shows the algorithm's application and effectiveness in real-world scenarios.

Type of change

New feature (non-breaking change which adds functionality)

How has this change been tested

I created a demo to showcase this functionality here.

@AdonaiVera · Contributor Author

Hi @SkalskiP 👋

I was discussing with Tal Yagev, a Roboflow user, the issue of large images consuming too much RAM. I tried several approaches to reduce RAM consumption, such as merging the masks in batches, incremental stacking with lists, and generator expressions. However, none of these approaches eliminated the RAM peak that crashed the code.

After further investigation, I found that the issue was the NMS (non-maximum suppression) step in the segmentation path, which consumes a lot of RAM when the image is sliced. I therefore suggest two changes and would like to hear your perspective on them.

The first change is to modify the code from this:

```python
with ThreadPoolExecutor(max_workers=self.thread_workers) as executor:
    futures = [
        executor.submit(self._run_callback, image, offset) for offset in offsets
    ]
    for future in as_completed(futures):
        detections_list.append(future.result())

return Detections.merge(detections_list=detections_list).with_nms(
    threshold=self.iou_threshold
)
```

to this:

```python
with ThreadPoolExecutor(max_workers=self.thread_workers) as executor:
    futures = [
        executor.submit(self._run_callback, image, offset) for offset in offsets
    ]
    for future in as_completed(futures):
        detections_list.append(
            future.result().with_nms(threshold=self.iou_threshold)
        )

return Detections.merge(detections_list=detections_list)
```

The reason for this change is that we don't need to run NMS over the merged full-image detections if we have already applied it to each slice.

The second change is optional: add a conditional in the with_nms function so the user can decide whether to run NMS on masks or on bounding boxes.

```python
if self.mask is not None and nms_mask:
    indices = mask_non_max_suppression(
        predictions=predictions, masks=self.mask, iou_threshold=threshold
    )
else:
    indices = box_non_max_suppression(
        predictions=predictions, iou_threshold=threshold
    )
```
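The mask branch above suppresses overlapping detections by mask overlap rather than box overlap. As a minimal sketch of the underlying idea (this is not supervision's mask_non_max_suppression, just an illustration of mask IoU), the pairwise score could be computed like this:

```python
import numpy as np


def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks of the same shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union else 0.0
```

Mask IoU is tighter than box IoU for elongated or curved objects, which is why exposing the choice to the user can change which detections survive suppression.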

With these changes, the results improved significantly. I integrated a memory_profiler check in the demo to measure the improvements.
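The demo reportedly uses memory_profiler; for reference, a rough stdlib-only analogue of the same measurement can be built with tracemalloc (this is an illustrative sketch, not the demo's actual code, and tracemalloc only sees Python-level allocations, so its numbers differ from process-level RSS figures like those below):

```python
import tracemalloc


def peak_memory_mib(fn, *args):
    """Run fn(*args) and return the peak traced allocation in MiB."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / (1024 ** 2)


# Example: a 4 MiB allocation should register a peak of roughly 4 MiB.
mib = peak_memory_mib(lambda: bytearray(4 * 1024 * 1024))
```

Wrapping each slicer configuration in such a helper is one way to reproduce before/after comparisons like the table below without external dependencies.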

## RAM CONSUMPTION
### BEFORE
1. mask 1024 x 1024, peak memory: 7415.32 MiB, increment: 6418.33 MiB
2. mask 512 x 512 - Crash.
3. mask 256 x 256 - Crash.
4. mask 128 x 128 - Crash.

### AFTER
1. mask 1024 x 1024, peak memory: 2775.74 MiB, increment: 1775.71 MiB
2. mask 512 x 512, peak memory: 4574.13 MiB, increment: 2378.86 MiB
3. mask 256 x 256, peak memory: 5558.58 MiB, increment: 4558.59 MiB
4. mask 128 x 128, peak memory: 9688.52 MiB, increment: 4861.30 MiB

What do you think, Piotr?

On the other hand, I would like to create a cookbook for this functionality, because I believe it will be of great help to people who work with large images and need to detect small objects.
