[WIP] πŸš€πŸš€πŸš€ Transformers.js V3 πŸš€πŸš€πŸš€ #545

Draft · xenova wants to merge 190 commits into main
Conversation

@xenova (Owner) commented on Jan 27, 2024

In preparation for Transformers.js v3, I'm compiling a list of issues/features which will be fixed/included in the release.

How to use WebGPU

First, install the development branch:

npm install xenova/transformers.js#v3

Then specify the device parameter when loading the model. Here's example code to get started. Please note that this is still a WORK IN PROGRESS, so the following usage may change before release.

import { pipeline } from '@xenova/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    device: 'webgpu',
    dtype: 'fp32', // or 'fp16'
});

// Generate embeddings
const sentences = ['That is a happy person', 'That is a very happy person'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
console.log(output.tolist());
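Since normalize: true returns unit-length embeddings, the dot product of two rows is their cosine similarity. As a minimal follow-up sketch (plain JavaScript, reusing the output tensor from the example above; all-MiniLM-L6-v2 embeddings are 384-dimensional):

// Convert the output tensor to a nested array of shape [2][384]
const [a, b] = output.tolist();

// With normalized embeddings, cosine similarity reduces to a dot product
const similarity = a.reduce((sum, v, i) => sum + v * b[i], 0);
console.log(similarity); // close to 1 for these two similar sentences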

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova marked this pull request as a draft on January 27, 2024, 18:02
@Huguet57 commented
Hey! This is great. Is this already in alpha?

@beaufortfrancois commented on Jun 4, 2024

@xenova For some models, performance may be a blocker. Since model downloads can be quite large, I wonder if there should be a way for web developers to determine their machine's performance class for running a model without downloading it completely first.

I believe this would involve running the model code with zeroed-out weights, which would still require buffer allocations but would allow the web app to catch out-of-memory errors and the like. The model architecture would still be needed to generate shaders, but it is much smaller than the model weights.

Essentially, knowing the model architecture and testing with empty weights would allow assessing performance capability without downloading the full model.

I thought I could use from_config for that, but I now wonder whether this should be a built-in V3 feature. What are your thoughts?
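For illustration, here is a hypothetical sketch of what such a probe could look like. AutoConfig.from_pretrained, AutoModel, and Tensor are real transformers.js exports, but from_config and the weights: 'zeros' option are assumptions invented to illustrate the idea, not an existing API:

// HYPOTHETICAL sketch of the proposed performance probe.
import { AutoConfig, AutoModel, Tensor } from '@xenova/transformers';

// Config files are tiny compared to weights, so this download is cheap
const config = await AutoConfig.from_pretrained('Xenova/all-MiniLM-L6-v2');

// HYPOTHETICAL API: build the architecture with zero-initialized weights,
// forcing buffer allocation and shader compilation without the real model
const probe = await AutoModel.from_config(config, { device: 'webgpu', weights: 'zeros' });

// Dummy input: a batch of 8 tokens, all set to an arbitrary valid id
const ids = new Tensor('int64', new BigInt64Array(8).fill(1n), [1, 8]);
const mask = new Tensor('int64', new BigInt64Array(8).fill(1n), [1, 8]);

try {
  const start = performance.now();
  await probe({ input_ids: ids, attention_mask: mask });
  console.log(`Probe forward pass: ${(performance.now() - start).toFixed(1)} ms`);
} catch (e) {
  // Out-of-memory or device-lost here suggests the real model won't run either
  console.warn('Model too large for this device:', e);
}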
