TXH-mercury / VAST (Stars: 194)
Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Topics: dataset, vision-language, audio-language, multimodal-foundation-model, cross-modality-pretraining, vision-audio-subtitle-text
Updated Mar 14, 2024. Language: Jupyter Notebook