Skip to content

Arenaa/foundation-models-implementation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Foundation models Implementations from Scratch

This repository contains my custom implementations of various models using PyTorch. The models include all of these categories of models.

  • Computer Vision
  • Natural Language Processing
  • Generative AI
  • Vision-Language (Multimodel)

Computer vision models are designed to translate visual data based on features and contextual information identified during training. This enables models to interpret images and video and apply those interpretations to predictive or decision-making tasks. This repository contains some of the most famous models like ResNet, EfficientNet, and YOLO.

Natural Language Processing models are designed to process textual data and manipulate human language or data that resembles human language. NLP models work by finding relationships between the constituent parts of language, for example, the letters, words, and sentences found in a text dataset. NLP architectures use various methods for data preprocessing, feature extraction, and modeling. This repository contains Bidirectional LSTM, seq2seq, and Masked language models.

Generative AI models are designed for learning any kind of data distribution using unsupervised learning. All types of generative models aim at learning the true data distribution of the training set so as to generate new data points with some variations. Two of the most commonly used and efficient approaches are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). This repository contains various models improved based on GAN and VAE. Recently a new group of models was introduced in this area called Diffusion models. They also will be added soon.

Vision-Langauge models combine both the vision and language modalities. One characteristic that helps define these models is their ability to process both images (vision) and natural language text (language). This process depends on the inputs, outputs, and the task these models are asked to perform. This repository contains the Flamingo model which is one of the most effective and efficient models in this category.

It's important to note that the primary focus of these implementations is on understanding the functionality of the models rather than prioritizing efficiency. My goal is to delve into the intricacies of these models to enhance my learning experience.

About

Models implementations from scratch in Pytorch

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages