Introduction Video 🎥

A picture speaks volumes, as do the words that frame it.

Turn on the music if possible 🎧

trex2_ongithub.mp4

News 📰

2024-03-27: Online Gradio demo hosted at https://huggingface.co/spaces/Mountchicken/T-Rex2
2024-03-26: New Gradio demo available 🤗! Have an API token but not sure how to use it? We now provide a local Gradio demo with a GUI for the API. Check here: Gradio APP

Contents 📜

Introduction Video 🎥
News 📰
Contents 📜
1. Introduction 📚
- What Can T-Rex Do 📝
2. Try Demo 🎮
3. API Usage Examples📚
4. Local Gradio Demo with API🎨
5. Related Works
BibTeX 📚

1. Introduction 📚

Object detection, the ability to locate and identify objects within an image, is a cornerstone of computer vision, pivotal to applications ranging from autonomous driving to content moderation. A notable limitation of traditional object detection models is their closed-set nature. These models are trained on a predetermined set of categories, so they can recognize only those specific categories. The training process itself is arduous, demanding expert knowledge, extensive datasets, and intricate model tuning to achieve desirable accuracy. Moreover, introducing a novel object category exacerbates these challenges, because the entire process has to be repeated.

T-Rex2 addresses these limitations by integrating both text and visual prompts in one model, thereby harnessing the strengths of both modalities. The synergy of text and visual prompts equips T-Rex2 with robust zero-shot capabilities, making it a versatile tool in the ever-changing landscape of object detection.

What Can T-Rex Do 📝

T-Rex2 is well-suited for a variety of real-world applications, including but not limited to: agriculture, industry, livestock and wild animal monitoring, biology, medicine, OCR, retail, electronics, transportation, and logistics. T-Rex2 supports three major workflows: the interactive visual prompt workflow, the generic visual prompt workflow, and the text prompt workflow. Together they cover most application scenarios that require object detection.

workflows.mp4

2. Try Demo 🎮

An online demo for T-Rex2 is now available. Check out our demo here.

3. API Usage Examples📚

We now offer free API access to T-Rex2. For educators, students, and researchers, we offer an API with a generous usage quota to support your educational and research endeavors. You can request API access here: request API.

Full API documentation can be found here.

Setup

Install the API package and retrieve your API token from the email you received.

```
git clone https://github.com/IDEA-Research/T-Rex.git
cd T-Rex
pip install dds-cloudapi-sdk==0.1.1
pip install -v -e .
```

Interactive Visual Prompt API

In the interactive visual prompt workflow, users provide visual prompts as boxes or points on a given image to specify the object to be detected.
```
python demo_examples/interactive_inference.py --token <your_token> 
```
- You should get the visualization results in demo_vis/
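
To make the two prompt formats concrete, here is a minimal sketch of how box and point prompts could be represented and sanity-checked against the image size before being sent. The `VisualPrompt` class, its field names, and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration; they are not taken from the T-Rex2 API, whose actual request format is described in the API documentation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VisualPrompt:
    """Hypothetical container for one visual prompt (not a T-Rex2 API type)."""
    kind: str                   # "box" or "point"
    coords: Tuple[float, ...]   # assumed: box = (x1, y1, x2, y2), point = (x, y)


def validate_prompt(prompt: VisualPrompt, width: int, height: int) -> bool:
    """Return True if the prompt has the expected arity and lies inside the image."""
    if prompt.kind == "box":
        if len(prompt.coords) != 4:
            return False
        x1, y1, x2, y2 = prompt.coords
        # A box must have positive area and stay within the image bounds.
        return 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height
    if prompt.kind == "point":
        if len(prompt.coords) != 2:
            return False
        x, y = prompt.coords
        return 0 <= x <= width and 0 <= y <= height
    # Any other kind is rejected.
    return False


# Example: one box prompt and one point prompt on a 640x480 image.
prompts = [
    VisualPrompt("box", (10, 20, 110, 220)),
    VisualPrompt("point", (50, 60)),
]
all_valid = all(validate_prompt(p, 640, 480) for p in prompts)
```

Checking prompts locally like this can catch swapped corners or out-of-range clicks before an API call is spent on them.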

Generic Visual Prompt API

In the generic visual prompt workflow, users provide visual prompts on one reference image and run detection on another image.
```
python demo_examples/generic_inference.py --token <your_token> 
```
- You should get the visualization results in demo_vis/

Customize Visual Prompt Embedding API

In this workflow, you can customize a visual embedding for an object category using multiple images. With this embedding, you can run detection on any image.

```
python demo_examples/customize_embedding.py --token <your_token>
```

You should get a download link for this visual prompt embedding in safetensors format. Save the file; it is used in embedding_inference below.
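
As a rough illustration of the "save it" step, the sketch below downloads a file from a link to a local path using only the Python standard library. The function name `save_embedding` and the destination path are hypothetical; where embedding_inference.py expects the file to live is not specified here, so check the script before choosing a location.

```python
import urllib.request
from pathlib import Path


def save_embedding(url: str, dest: str) -> Path:
    """Download the file at `url` to `dest`, creating parent folders as needed."""
    path = Path(dest)
    path.parent.mkdir(parents=True, exist_ok=True)
    # urlretrieve copies the remote content to the given local filename.
    urllib.request.urlretrieve(url, str(path))
    return path


# Usage (hypothetical link and path; substitute the link printed by the script):
# save_embedding("<download_link>", "embeddings/my_category.safetensors")
```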

Embedding Inference API

With the visual prompt embedding generated by the previous API, you can run detection on any image.

```
python demo_examples/embedding_inference.py --token <your_token>
```
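
All four demo scripts above take the same `--token` argument, so a small wrapper can avoid pasting the token into every command. The following is a sketch under stated assumptions: the environment variable name `TREX2_API_TOKEN` and the helper names are hypothetical and not part of this repository; only the script paths and the `--token` flag come from the commands shown above.

```python
import os
import subprocess
import sys
from typing import List

# Script paths as used in the commands above, keyed by a short workflow name.
DEMO_SCRIPTS = {
    "interactive": "demo_examples/interactive_inference.py",
    "generic": "demo_examples/generic_inference.py",
    "customize": "demo_examples/customize_embedding.py",
    "embedding": "demo_examples/embedding_inference.py",
}


def build_demo_command(workflow: str, token: str) -> List[str]:
    """Build the command line for one demo script without running it."""
    if workflow not in DEMO_SCRIPTS:
        raise ValueError(f"unknown workflow: {workflow!r}")
    if not token:
        raise ValueError("API token is empty")
    return [sys.executable, DEMO_SCRIPTS[workflow], "--token", token]


def run_demo(workflow: str) -> None:
    """Run one demo, reading the token from a (hypothetical) environment variable."""
    token = os.environ.get("TREX2_API_TOKEN", "")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_demo_command(workflow, token), check=True)


# Usage, from the repository root with the token exported in the environment:
# run_demo("interactive")
```

Keeping the token in the environment rather than on the command line also keeps it out of shell history.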

4. Local Gradio Demo with API🎨

4.1. Setup

Install the T-Rex2 API first if you haven't already (see Setup in Section 3).

- Install gradio and other dependencies
```bash
# install gradio and other dependencies
pip install gradio==4.22.0
pip install gradio-image-prompter
```

4.2. Run the Gradio Demo

```bash
python gradio_demo.py --trex2_api_token <your_token>
```

4.3. Basic Operations

Draw Box: Draw a box on the image to specify the object to be detected. Drag the left mouse button to draw a box.
Draw Point: Draw a point on the image to specify the object to be detected. Click the left mouse button to draw a point.
Interactive Visual Prompt: Provide visual prompts as boxes or points on a given image to specify the object to be detected. The Input Target Image and Interactive Visual Prompt Image should be the same.
Generic Visual Prompt: Provide visual prompts on multiple reference images and run detection on a different target image.

5. Related Works

🔥 We release the training and inference code and demo link of DINOv, which can handle in-context visual prompts for open-set and referring detection & segmentation. Check it out!

BibTeX 📚

@misc{jiang2024trex2,
      title={T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy}, 
      author={Qing Jiang and Feng Li and Zhaoyang Zeng and Tianhe Ren and Shilong Liu and Lei Zhang},
      year={2024},
      eprint={2403.14610},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
