Visually informed embedding of word (VIEW) is a tool for transferring multimodal background knowledge to NLP algorithms.
We aim to generate realistic images from text descriptions using a GAN architecture. The network we designed generates images for two datasets: MSCOCO and CUBS.
Image Captioning on Microsoft Coco Dataset
Generates captions for images using a CNN-RNN model trained on the Microsoft Common Objects in Context (MS COCO) dataset
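The CNN-RNN pattern mentioned above pairs a convolutional encoder (which summarizes the image as a feature vector) with a recurrent decoder (which emits caption tokens one at a time). A minimal toy sketch of that shape, with illustrative names, sizes, and random untrained weights standing in for the repo's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<start>", "<end>", "a", "dog", "on", "grass"]   # toy vocabulary
V, D, H = len(vocab), 8, 16

# "CNN" stage: stand-in for a convolutional encoder; here we just
# global-average-pool a fake feature map and project it to the RNN size.
W_img = rng.normal(size=(D, H)) * 0.1

def encode_image(feature_map):
    pooled = feature_map.mean(axis=(0, 1))      # global average pooling
    return np.tanh(pooled @ W_img)              # initial decoder hidden state

# "RNN" stage: a single vanilla recurrent cell decoding greedily.
E = rng.normal(size=(V, H)) * 0.1               # token embeddings
W_h = rng.normal(size=(H, H)) * 0.1
W_out = rng.normal(size=(H, V)) * 0.1

def decode(h, max_len=5):
    token, caption = 0, []                      # begin from "<start>"
    for _ in range(max_len):
        h = np.tanh(E[token] + h @ W_h)         # recurrent update
        token = int(np.argmax(h @ W_out))       # greedy next-token choice
        if vocab[token] == "<end>":
            break
        caption.append(vocab[token])
    return caption

fake_features = rng.normal(size=(7, 7, D))      # stand-in CNN feature map
caption = decode(encode_image(fake_features))
print(caption)
```

In a real system the encoder would be a pretrained CNN (e.g. ResNet features) and the decoder an LSTM trained on MS COCO caption pairs; this sketch only shows how the two stages connect.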
The Jakarnotator is an annotation tool for creating your own database for instance segmentation problems.
A demo for mapping class labels from ImageNet to COCO.
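Label mapping between the two taxonomies usually comes down to a many-to-one lookup table, since several fine-grained ImageNet classes collapse into one coarse COCO category. A hypothetical minimal sketch (the entries below are illustrative, not the repo's actual table):

```python
# Many-to-one mapping: fine-grained ImageNet labels -> coarse COCO categories.
# These example entries are assumptions for illustration only.
IMAGENET_TO_COCO = {
    "tabby": "cat",
    "tiger cat": "cat",
    "golden retriever": "dog",
    "sports car": "car",
}

def map_label(imagenet_label, default="unmapped"):
    """Return the COCO category for an ImageNet label, or a default."""
    return IMAGENET_TO_COCO.get(imagenet_label, default)

print(map_label("tabby"))    # cat
print(map_label("abacus"))   # unmapped
```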
A deep-learning-based application intended to help visually impaired people. The application automatically generates a textual description of what is happening in front of the camera and conveys it to the user through audio. It can also recognise faces and tell the user whether a known person is present.
High-resolution Networks for the Fully Convolutional One-Stage Object Detection (FCOS) algorithm
Implementation of models in our EMNLP 2019 paper: A Logic-Driven Framework for Consistency of Neural Models
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
[ECCV 2020] Boundary-preserving Mask R-CNN
MS COCO captions in Arabic
A TensorFlow implementation of MobileNetV3 CenterNet, which can be easily deployed on Android (MNN) and iOS (Core ML).