# VisionLLaMA

Implementation of VisionLLaMA from the paper ["VisionLLaMA: A Unified LLaMA Interface for Vision Tasks"](https://arxiv.org/abs/2403.00522) in PyTorch and Zeta.

## Install

```bash
$ pip install vision-llama
```

## Usage

```python
import torch
from vision_llama.main import VisionLlama

# Example input: a batch of one 224x224 RGB image
x = torch.randn(1, 3, 224, 224)

# Create an instance of the VisionLlama model with the specified parameters
model = VisionLlama(
    dim=768, depth=12, channels=3, heads=12, num_classes=1000
)

# Pass the input through the model and print the output
print(model(x))
```
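
With `num_classes=1000` the model is set up ImageNet-style, so a natural next step is a classification training loop. Below is a minimal sketch of a single training step; it assumes (not confirmed by this README) that the model maps `(batch, 3, 224, 224)` images to `(batch, num_classes)` logits, and the optimizer and learning rate are illustrative choices:

```python
import torch
import torch.nn.functional as F
from vision_llama.main import VisionLlama

# Assumption: the model returns raw classification logits of shape (batch, num_classes)
model = VisionLlama(dim=768, depth=12, channels=3, heads=12, num_classes=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

images = torch.randn(8, 3, 224, 224)   # dummy batch of 8 images
labels = torch.randint(0, 1000, (8,))  # dummy class targets

logits = model(images)                 # (8, 1000), assuming logits output
loss = F.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```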

## License

MIT

## Citation

```bibtex
@misc{chu2024visionllama,
    title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
    author={Xiangxiang Chu and Jianlin Su and Bo Zhang and Chunhua Shen},
    year={2024},
    eprint={2403.00522},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

## Todo

- Implement AS2DRoPE; the current implementation is rough, so axial rotary embeddings may be used instead (see the sketch after this list).
- Improve the GSA attention; the current implementation needs work.
- Add a distributed ImageNet training script.
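
For reference, here is a minimal sketch of axial 2D rotary embeddings in plain PyTorch, the alternative mentioned in the first todo item. This is not the repo's code: `rope_1d`, `axial_rope_2d`, and the base frequency of 10000 are assumptions. The idea is to rotate one half of each head dimension by the token's row index and the other half by its column index:

```python
import torch

def rope_1d(x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    # Standard 1D RoPE. x: (..., seq, dim) with even dim; pos: (seq,) positions.
    dim = x.shape[-1]
    freqs = 1.0 / (10000 ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = pos[:, None].float() * freqs[None, :]   # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]              # rotate adjacent pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def axial_rope_2d(x: torch.Tensor, height: int, width: int) -> torch.Tensor:
    # Axial 2D RoPE. x: (batch, heads, height*width, dim), tokens in row-major
    # order. Half the channels are rotated by row index, half by column index.
    d = x.shape[-1] // 2
    rows = torch.arange(height).repeat_interleave(width)  # row index per token
    cols = torch.arange(width).repeat(height)             # column index per token
    return torch.cat(
        [rope_1d(x[..., :d], rows), rope_1d(x[..., d:], cols)], dim=-1
    )

# Example: rotate queries for a 14x14 grid of patch tokens
q = torch.randn(1, 12, 14 * 14, 64)
q_rot = axial_rope_2d(q, height=14, width=14)  # same shape as q
```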
