Model Card: VALL-E X

Author: Songting.

This is the official codebase for running open-sourced VALL-E X.

The following is additional information about the models released here.

Model Details

VALL-E X is a series of two transformer models that turn text into audio.

Phoneme to coarse tokens

  • Input: IPA phonemes converted from the input text by a rule-based G2P (grapheme-to-phoneme) tool.
  • Output: tokens from the first codebook of the EnCodec codec from Facebook.

Coarse to fine tokens

  • Input: IPA phonemes converted from the input text by a rule-based G2P tool, plus the first-codebook tokens from EnCodec
  • Output: 8 codebooks from EnCodec
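
The two-stage data flow described above can be sketched with stand-in functions. This is only an illustration of the token shapes, not the released implementation: `phoneme_to_coarse`, `coarse_to_fine`, the phoneme list, and the frame count are all hypothetical, and random numbers take the place of real model predictions.

```python
import random

NUM_CODEBOOKS = 8      # total EnCodec codebooks in the final output
CODEBOOK_SIZE = 1024   # output vocab size per codebook

def phoneme_to_coarse(phonemes, frames, rng):
    """Stand-in for the causal model: emits first-codebook tokens frame by frame."""
    tokens = []
    for _ in range(frames):
        # A real model would condition on `phonemes` and on the tokens so far;
        # this sketch just draws a random token id.
        tokens.append(rng.randrange(CODEBOOK_SIZE))
    return tokens

def coarse_to_fine(phonemes, coarse, rng):
    """Stand-in for the non-causal model: adds the 7 remaining codebooks."""
    codes = [list(coarse)]  # codebook 1 is passed through unchanged
    for _ in range(NUM_CODEBOOKS - 1):
        codes.append([rng.randrange(CODEBOOK_SIZE) for _ in coarse])
    return codes

rng = random.Random(0)
phonemes = ["h", "ə", "l", "oʊ"]  # hypothetical IPA output of the G2P tool
coarse = phoneme_to_coarse(phonemes, frames=6, rng=rng)
codes = coarse_to_fine(phonemes, coarse, rng)
print(len(codes), len(codes[0]))  # 8 6
```

The resulting 8-by-frames grid of token ids is what an EnCodec decoder would turn back into a waveform.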

Architecture

| Model | Parameters | Attention | Output Vocab size |
| --- | --- | --- | --- |
| G2P tool | - | - | 69 |
| Phoneme to coarse tokens | 150 M | Causal | 1x 1,024 |
| Coarse to fine tokens | 150 M | Non-causal | 7x 1,024 |
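
As a quick consistency check, the table can be written down as data. The sketch below assumes that "150 M" means roughly 150 million parameters per model and that "1x" and "7x" count the codebooks each model predicts; the `Stage` class and its field names are hypothetical and do not come from the codebase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    params_millions: int   # assumed reading of "150 M"
    causal: bool
    output_codebooks: int  # assumed reading of "1x" / "7x"
    vocab_size: int

STAGES = [
    Stage("phoneme_to_coarse", 150, True, 1, 1024),
    Stage("coarse_to_fine", 150, False, 7, 1024),
]

total_params = sum(s.params_millions for s in STAGES)
total_codebooks = sum(s.output_codebooks for s in STAGES)
# Bits needed to index one token in a 1,024-entry codebook.
bits_per_token = (STAGES[0].vocab_size - 1).bit_length()
print(total_params, total_codebooks, bits_per_token)  # 300 8 10
```

Under this reading, the two models together predict 1 + 7 = 8 codebooks, which matches the 8 EnCodec codebooks listed as the final output.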

Release date

August 2023

Broader Implications

We anticipate that this model's text-to-audio capabilities can be used to improve accessibility tools in a variety of languages. Straightforward improvements will allow the models to run faster than real time, making them useful for applications such as virtual assistants.