Hemm: Holistic Evaluation of Multi-modal Generative Models

Hemm is a library for comprehensive benchmarking of text-to-image diffusion models on image quality and prompt comprehension, integrated with Weights & Biases and Weave. Hemm is inspired by Holistic Evaluation of Text-To-Image Models.

Installation

git clone https://github.com/soumik12345/Hemm
cd Hemm
pip install -e ".[core]"
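
To confirm the editable install succeeded, a quick import check is enough (this snippet only re-uses names that appear later in this README):

python -c "from hemm.eval_pipelines import StableDiffusionEvaluationPipeline"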

Quickstart

First, let's publish a small subset of the MSCOCO validation set as a Weave Dataset.

import weave
from hemm.utils import publish_dataset_to_weave


if __name__ == "__main__":
    weave.init(project_name="t2i_eval")

    dataset_reference = publish_dataset_to_weave(
        dataset_path="HuggingFaceM4/COCO",
        prompt_column="sentences",
        ground_truth_image_column="image",
        split="validation",
        dataset_transforms=[
            lambda item: {**item, "sentences": item["sentences"]["raw"]}
        ],
        data_limit=5,
    )
Weave Datasets enable you to collect examples for evaluation and automatically track versions for accurate comparisons. Easily update datasets with the UI and download the latest version locally with a simple API.
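
For example, once the dataset above has been published, it can be pulled back down by reference. The sketch below uses the generic Weave API; the "COCO:v1" name is an assumption based on the evaluation snippet further down, so substitute the reference returned by publish_dataset_to_weave (or shown in the Weave UI) for your project:

import weave

weave.init(project_name="t2i_eval")

# Fetch the published dataset by name and version; the exact
# reference string depends on what was registered in your project.
dataset = weave.ref("COCO:v1").get()

# Inspect the first example.
for row in dataset.rows:
    print(row)
    break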

Next, you can evaluate Stable Diffusion 1.4 on image quality metrics as shown in the following code snippet:

from hemm.eval_pipelines import StableDiffusionEvaluationPipeline
from hemm.metrics.image_quality import LPIPSMetric, PSNRMetric, SSIMMetric


if __name__ == "__main__":
    diffusion_evaluation_pipeline = StableDiffusionEvaluationPipeline(
        "CompVis/stable-diffusion-v1-4"
    )

    # Add PSNR Metric
    psnr_metric = PSNRMetric(image_size=diffusion_evaluation_pipeline.image_size)
    diffusion_evaluation_pipeline.add_metric(psnr_metric)

    # Add SSIM Metric
    ssim_metric = SSIMMetric(image_size=diffusion_evaluation_pipeline.image_size)
    diffusion_evaluation_pipeline.add_metric(ssim_metric)

    # Add LPIPS Metric
    lpips_metric = LPIPSMetric(image_size=diffusion_evaluation_pipeline.image_size)
    diffusion_evaluation_pipeline.add_metric(lpips_metric)

    diffusion_evaluation_pipeline(
        dataset="COCO:v1",
        init_params=dict(project="t2i_eval", entity="geekyrakshit"),
    )
The evaluation pipeline takes each example, passes it through your application, and scores the output with multiple custom scoring functions using Weave Evaluation. This gives you a view of your model's performance and a rich UI for drilling into individual outputs and scores.
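
Under the hood, this builds on the generic Weave Evaluation API. The sketch below is a simplified, hypothetical illustration of that mechanism, not Hemm's actual internals: generate_image and prompt_length_score are made-up stand-ins showing how a dataset, a model callable, and scoring functions fit together.

import asyncio

import weave


@weave.op()
def generate_image(prompt: str) -> dict:
    # Stand-in for a text-to-image pipeline call; Hemm's pipelines return
    # the generated image (and metadata) here instead.
    return {"prompt": prompt, "image": None}


@weave.op()
def prompt_length_score(prompt: str, output: dict) -> dict:
    # Toy scorer for illustration only; real Hemm metrics (PSNR, SSIM, LPIPS)
    # compare the generated image against the ground truth instead. Older
    # Weave versions name the model-output argument `model_output` rather
    # than `output`.
    return {"prompt_length": len(prompt)}


if __name__ == "__main__":
    weave.init(project_name="t2i_eval")
    evaluation = weave.Evaluation(
        dataset=[{"prompt": "a photo of a cat"}, {"prompt": "a red bicycle"}],
        scorers=[prompt_length_score],
    )
    asyncio.run(evaluation.evaluate(generate_image))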
