[WIP] πŸš€πŸš€πŸš€ Transformers.js V3 πŸš€πŸš€πŸš€ #545

Draft · xenova wants to merge 190 commits into main
Conversation

@xenova (Owner) commented on Jan 27, 2024

In preparation for Transformers.js v3, I'm compiling a list of issues/features which will be fixed/included in the release.

How to use WebGPU

First, install the development branch:

npm install xenova/transformers.js#v3

Then specify the device parameter when loading the model. Here's example code to get started. Please note that this is still a WORK IN PROGRESS, so the following usage may change before release.

import { pipeline } from '@xenova/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    device: 'webgpu',
    dtype: 'fp32', // or 'fp16'
});

// Generate embeddings
const sentences = ['That is a happy person', 'That is a very happy person'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
console.log(output.tolist());
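Since normalize: true returns unit-length embeddings, the dot product of two rows is their cosine similarity. As a minimal follow-up sketch (plain JavaScript, reusing the output tensor from the example above; all-MiniLM-L6-v2 embeddings are 384-dimensional):

// Convert the output tensor to a nested array of shape [2][384]
const [a, b] = output.tolist();

// With normalized embeddings, cosine similarity reduces to a dot product
const similarity = a.reduce((sum, v, i) => sum + v * b[i], 0);
console.log(similarity); // close to 1 for these two similar sentences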

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova marked this pull request as a draft on January 27, 2024, 18:02
@Huguet57 commented
Hey! This is great. Is this already in alpha?

@beaufortfrancois commented on Jun 4, 2024

@xenova For some models, performance may be a blocker. Since model downloads can be quite large, I wonder if there should be a way for web developers to determine their machine's performance class for running a model without downloading it completely first.

I believe this would involve running the model code with zeroed-out weights, which would still require buffer allocations but would allow the web app to catch out-of-memory errors and the like. The model architecture would still be needed to generate shaders, but it is much smaller than the model weights.

Essentially, knowing the model architecture and testing with empty weights would allow assessing performance capability without downloading the full model.

I thought I could use from_config for that, but I now wonder whether this should be a built-in V3 feature. What are your thoughts?
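For illustration, here is a hypothetical sketch of what such a probe could look like. AutoConfig.from_pretrained, AutoModel, and Tensor are real transformers.js exports, but from_config and the weights: 'zeros' option are assumptions invented to illustrate the idea, not an existing API:

// HYPOTHETICAL sketch of the proposed performance probe.
import { AutoConfig, AutoModel, Tensor } from '@xenova/transformers';

// Config files are tiny compared to weights, so this download is cheap
const config = await AutoConfig.from_pretrained('Xenova/all-MiniLM-L6-v2');

// HYPOTHETICAL API: build the architecture with zero-initialized weights,
// forcing buffer allocation and shader compilation without the real model
const probe = await AutoModel.from_config(config, { device: 'webgpu', weights: 'zeros' });

// Dummy input: a batch of 8 tokens, all set to an arbitrary valid id
const ids = new Tensor('int64', new BigInt64Array(8).fill(1n), [1, 8]);
const mask = new Tensor('int64', new BigInt64Array(8).fill(1n), [1, 8]);

try {
  const start = performance.now();
  await probe({ input_ids: ids, attention_mask: mask });
  console.log(`Probe forward pass: ${(performance.now() - start).toFixed(1)} ms`);
} catch (e) {
  // Out-of-memory or device-lost here suggests the real model won't run either
  console.warn('Model too large for this device:', e);
}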
