This code implements a video segmentation and inpainting pipeline using the Stable Diffusion (SD) model and the LangSAM (Language Segment-Anything) model. The goal is to segment specific objects in each frame of a video based on textual prompts and then inpaint the segmented regions.
The script performs two main tasks:
- Segmentation: The script uses the LangSAM model to segment an image based on the `text_prompt`.
- Inpainting: After segmentation, the script uses the `StableDiffusionInpaintPipeline` to inpaint the segmented parts of the image based on the `prompt_sd` (a minimal end-to-end sketch follows this list).
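A minimal end-to-end sketch of these two steps on a single frame, assuming the `runwayml/stable-diffusion-inpainting` weights (the model ID, file names, and prompts here are illustrative assumptions, not the author's exact settings):

```python
import numpy as np
import torch
from diffusers import StableDiffusionInpaintPipeline
from lang_sam import LangSAM
from PIL import Image

# Hypothetical single frame; SD inpainting works best at 512x512.
image = Image.open("frame_00000.png").convert("RGB").resize((512, 512))

# 1) Segmentation: LangSAM returns one mask per instance matching the prompt.
seg_model = LangSAM()
masks, boxes, phrases, logits = seg_model.predict(image, "costume")
mask = Image.fromarray((masks[0].cpu().numpy() * 255).astype(np.uint8))

# 2) Inpainting: white mask pixels mark the region Stable Diffusion repaints.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # needs accelerate; keeps VRAM usage low
result = pipe(prompt="A fancy summer shirt with flowers in a black background",
              image=image, mask_image=mask).images[0]
result.save("inpainted_00000.png")
```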
To run the code successfully, follow these steps to set up the required environment:
- Create a new Anaconda environment with Python 3.10.12:

  ```bash
  conda create --name your_environment_name python=3.10.12
  conda activate your_environment_name
  ```
- Install the CUDA Toolkit (version 11.7.0; tested with an Nvidia GeForce GTX 1660 Ti):

  ```bash
  conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit
  ```
- Export the CUDA path:

  ```bash
  export CUDA_HOME=/usr/local/cuda-11.7/
  ```

  If this fails, you can also try to find the existing path with `echo $CUDA_HOME`, although this is not recommended.
- Install PyTorch:

  ```bash
  pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
  ```
- Install lang-sam:

  ```bash
  git clone https://github.com/luca-medeiros/lang-segment-anything
  cd lang-segment-anything
  pip install -e .
  ```
- Install diffusers and accelerate for GPU Stable Diffusion inference:

  ```bash
  pip install diffusers==0.20.2 accelerate==0.22.0
  ```
- Export the paths below. Ensure that these are the correct folders; this needs to be done only once, and the lines should be added to your `.bashrc` file. Taken from PyTorch issue #102535:

  ```bash
  export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}
  export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64
  ```
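Once everything is installed, a quick sanity check that PyTorch sees the CUDA toolkit and the GPU might look like this (a minimal sketch; the printed device name depends on your hardware):

```python
import torch

print(torch.__version__)          # expected: 2.0.1
print(torch.version.cuda)         # expected: 11.7
print(torch.cuda.is_available())  # should print True on a working setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce GTX 1660 Ti"
```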
This script performs the following tasks:
- Resizes input videos to a specified resolution (1920x1080 by default) for compatibility with the deepfake processing script.
- Extracts individual frames from a deepfake video to be used as input to the inpainting pipeline (a frame-extraction sketch follows this list).
- Recreates the deepfake video with the initial audio in it.
- Crops the resized videos from step 1 (those not used in steps 2-3) to a smaller target size (1080x1080 by default) for the final output.
- Concatenates the deepfake video (`final_deepfake_video.mp4`), created in steps 2-3 using the deepfake creation pipeline, with any other cropped videos found before and after it in the `final_video_folder`, and resizes the combined video to a desired resolution.
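A minimal sketch of the frame-extraction step, assuming OpenCV is used to read frames (the paths and naming scheme are hypothetical):

```python
import os
import cv2

video_path = "deepfake_video.mp4"   # hypothetical input path
frame_dir = "frames"                # hypothetical output directory
os.makedirs(frame_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break  # end of video
    # Zero-padded names keep the frames in order for later reassembly.
    cv2.imwrite(os.path.join(frame_dir, f"frame_{idx:05d}.png"), frame)
    idx += 1
cap.release()
```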
The script imports the following libraries (see the import block sketched after this list):

- numpy: Numerical computing library.
- cv2: OpenCV library for computer vision tasks.
- matplotlib: Library for creating static, animated, and interactive visualizations.
- os: Module for interacting with the operating system.
- torch: PyTorch deep learning framework.
- diffusers: Library providing implementations of various diffusion models.
- PIL (Image, ImageFilter): Python Imaging Library for image processing tasks.
- lang_sam: Language-based segmentation model.
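The corresponding import block at the top of the script would look roughly like this (the aliases and exact imports are assumptions):

```python
import os

import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch
from diffusers import StableDiffusionInpaintPipeline
from lang_sam import LangSAM
from PIL import Image, ImageFilter
```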
Useful links:

- Segment Anything GitHub Repository
- LangSAM GitHub Repository
- GroundingDINO Installation
- Alternative to LangSAM
- Stable Diffusion Documentation
- LabLab Stable Diffusion img2img
- Segment-Anything-Video GitHub Repository
- Segment-Everything-Everywhere-All-At-Once GitHub Repository
The main parameters of the script (a configuration sketch follows this list):

- `text_prompt`: Text describing the object to segment (e.g., "costume").
- `prompt_sd`: Text prompt for inpainting (e.g., "A fancy summer shirt with flowers in a black background").
- `sam_checkpoint`: Path to the SAM model checkpoint.
- `model_type`: SAM model type (e.g., "vit_b").
- `frame_path`: Path to the directory containing video frames.
- `save_path`: Path to the directory for saving inpainted frames.
- `test_save_path`: Path to save images for manual checking (debugging).
- `target_width`: Desired width for cropped images and masks.
- `target_height`: Desired height for cropped images and masks.
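As a configuration sketch, these parameters might be defined at the top of the script as follows (all values are placeholder assumptions, not the author's actual settings):

```python
text_prompt = "costume"  # object to segment
prompt_sd = "A fancy summer shirt with flowers in a black background"
sam_checkpoint = "checkpoints/sam_vit_b_01ec64.pth"  # placeholder path
model_type = "vit_b"
frame_path = "frames/"            # input video frames
save_path = "inpainted_frames/"   # output directory
test_save_path = "debug_frames/"  # manual-check images
target_width = 1080
target_height = 1080
```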
Before the main loop, the script:

- Creates directories for saving inpainted frames if they don't exist.
- Initializes the Stable Diffusion inpainting pipeline, enabling CPU offloading and attention slicing for efficiency (see the sketch after this list).
- Loads the SAM model.
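In code, that initialization plausibly looks like this (the model ID and the `sam_type` argument are assumptions; `save_path`, `test_save_path`, and `model_type` come from the configuration sketch above):

```python
import os

import torch
from diffusers import StableDiffusionInpaintPipeline
from lang_sam import LangSAM

os.makedirs(save_path, exist_ok=True)       # inpainted frames
os.makedirs(test_save_path, exist_ok=True)  # manual-check images

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed model ID
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # offload idle submodules to CPU between steps
pipe.enable_attention_slicing()  # compute attention in slices to cut peak VRAM

seg_model = LangSAM(sam_type=model_type)  # "vit_b"; the argument name is an assumption
```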
- `crop_and_resize`: Crops an image and its mask around the center, resizing them to the specified dimensions.
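A sketch of what such a helper might look like with PIL, assuming a center crop followed by a resize (an assumed implementation, not the author's exact code):

```python
from PIL import Image

def crop_and_resize(image: Image.Image, mask: Image.Image,
                    target_width: int, target_height: int):
    """Center-crop the image and mask, then resize both to the target size."""
    w, h = image.size
    side = min(w, h)  # largest centered square (assuming a square-ish target)
    left = (w - side) // 2
    top = (h - side) // 2
    box = (left, top, left + side, top + side)
    image = image.crop(box).resize((target_width, target_height), Image.LANCZOS)
    # NEAREST keeps the mask binary instead of introducing gray edges.
    mask = mask.crop(box).resize((target_width, target_height), Image.NEAREST)
    return image, mask
```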
The inpainting helper (sketched after this list):

- Crops the image and mask using `crop_and_resize`.
- Inpaints the cropped image using Stable Diffusion based on the provided prompt and mask.
- Returns the inpainted image.
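Continuing the sketches above, the core inpainting call likely looks something like this (the explicit `width`/`height` arguments are an assumption):

```python
def inpaint_image(image, mask, prompt_sd, pipe, target_width, target_height):
    """Crop to the target size, then repaint the masked region with SD."""
    image, mask = crop_and_resize(image, mask, target_width, target_height)
    # White pixels in mask_image mark the region the pipeline regenerates.
    result = pipe(
        prompt=prompt_sd,
        image=image,
        mask_image=mask,
        width=target_width,
        height=target_height,
    ).images[0]
    return result
```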
The blending helper (sketched after this list):

- Blends the inpainted image with the original image using the mask and applies Gaussian blur for smoother transitions.
- Returns the blended image.
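A sketch of mask-guided blending with a feathered edge, assuming PIL's `Image.composite` and `ImageFilter.GaussianBlur` (the blur radius is an arbitrary assumption):

```python
from PIL import Image, ImageFilter

def blend_images(inpainted: Image.Image, original: Image.Image,
                 mask: Image.Image, blur_radius: int = 5) -> Image.Image:
    """Paste the inpainted content into the original, feathering the mask edge."""
    soft_mask = mask.convert("L").filter(ImageFilter.GaussianBlur(blur_radius))
    # Where the mask is white, take the inpainted pixels; elsewhere, the original.
    return Image.composite(inpainted, original, soft_mask)
```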
- Loops through each frame in the video (a condensed sketch of the full loop follows this list):
- For the first frame:
- Loads the frame, predicts masks using SAM, and converts them to uint8 format.
- Creates a copy of the first frame's mask.
- Crops and resizes the image and mask.
- Inpaints the cropped image using Stable Diffusion.
- Blends the inpainted image with the original cropped image and saves it.
- For subsequent frames:
- Loads the inpainted image from the previous iteration.
- Loads, crops, and resizes the current frame.
- Predicts masks for the current frame using SAM.
- Extracts the inpainted portion from the previous frame using the current frame's mask.
- Replaces pixels in the cropped current frame with the corresponding pixels from the extracted inpainted portion.
- Inpaints the remaining unmasked pixels in the cropped current frame using Stable Diffusion.
- Blends the inpainted cropped image with the original cropped image and saves it.
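Putting the pieces together, the per-frame loop might be structured roughly as follows (a condensed sketch built on the hypothetical helpers above; the first-frame/subsequent-frame logic is simplified):

```python
import os

import numpy as np
from PIL import Image

frames = sorted(os.listdir(frame_path))
prev_inpainted = None  # holds the previous frame's blended result

for name in frames:
    image = Image.open(os.path.join(frame_path, name)).convert("RGB")

    # Predict masks for the current frame with LangSAM; take the first instance.
    masks, boxes, phrases, logits = seg_model.predict(image, text_prompt)
    mask = Image.fromarray((masks[0].cpu().numpy() * 255).astype(np.uint8))

    image, mask = crop_and_resize(image, mask, target_width, target_height)

    if prev_inpainted is not None:
        # Copy the previously inpainted pixels into the current masked region
        # so the generated texture stays consistent across frames.
        image = Image.composite(prev_inpainted, image, mask)

    inpainted = pipe(prompt=prompt_sd, image=image, mask_image=mask,
                     width=target_width, height=target_height).images[0]
    blended = blend_images(inpainted, image, mask)
    blended.save(os.path.join(save_path, name))
    prev_inpainted = blended
```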
- The code uses Stable Diffusion for inpainting based on masks generated by LangSAM.
- The inpainting process involves blending the inpainted image with the original image using the provided mask.
- The code contains debugging and manual checking options, which can be activated by uncommenting specific sections.
- Ensure the necessary dependencies are installed.
- Verify that paths and checkpoints are correctly set.
- Adjust parameters and prompts based on specific use cases.
- Carefully review and understand the inpainting process, as it involves blending and masking operations.
- This documentation assumes familiarity with the referenced models and libraries.
- It is recommended to refer to the linked repositories and documentation for detailed explanations of the models and methods used.
- Some external links may be subject to changes, and it is advisable to refer to the latest documentation.