Blip 2 Captioning, Mass Captioning, Question Answering, and other tools.

rodrigo-barraza/inscriptor

Inscriptor ✍️

An easy-to-use image-captioning tool for dataset annotation. Need to caption your images? Inscriptor has you covered. It is built on Hugging Face's diffusers and uses BLIP-2's zero-shot instructed vision-to-language generation. It runs best with the blip2-opt-6.7b-coco model, but you can swap in any other model in the BLIP-2 family with lower or higher hardware requirements.
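A minimal sketch of captioning one image with a BLIP-2 checkpoint via the Hugging Face transformers API. The function name caption_image is a hypothetical helper, not part of Inscriptor; the model name is the real blip2-opt-6.7b-coco checkpoint on the Hub, and Inscriptor's notebook may wire this up differently.

```python
def caption_image(image_path: str,
                  model_name: str = "Salesforce/blip2-opt-6.7b-coco") -> str:
    """Generate a caption for one image with a BLIP-2 checkpoint.

    Sketch only: Inscriptor's notebook may differ. Heavy dependencies are
    imported lazily so the function can be defined without them installed.
    """
    import torch
    from PIL import Image
    from transformers import Blip2Processor, Blip2ForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = Blip2Processor.from_pretrained(model_name)
    model = Blip2ForConditionalGeneration.from_pretrained(
        model_name,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    ).to(device)

    # Preprocess the image and generate a caption (zero-shot, no prompt).
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device, model.dtype)
    generated_ids = model.generate(**inputs, max_new_tokens=50)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

Swapping model_name for a smaller checkpoint such as Salesforce/blip2-opt-2.7b trades caption quality for lower VRAM use.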

Recommended Requirements

24GB VRAM and 48GB RAM

Install Torch

with pip:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

with conda:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

or install from https://pytorch.org/get-started/locally/

Examples

Coming soon

How to use

Inside inscriptor-mass-captioning.ipynb, point imagesDirectory to your local dataset directory. The dataset directory should contain images in any of these formats: .jpg, .jpeg, .png, .webp, or .gif, and can contain subfolders with their own images; Inscriptor searches the directories recursively. The names of these subfolders can be used as tokens added to the caption: for example, images in a folder named cat will get the token cat, and images in a folder named dog will get the token dog. For each image, Inscriptor generates a .txt file with the same filename in the same location as the original image. The .txt file contains the caption generated by Inscriptor.