how to generate wide sparse features #73

kiminh · 2021-11-29T15:13:18Z

Hi, I'm confused about how to generate the wide sparse features. Here is my understanding: combine the multi-field categorical features into a single multi-hot sparse feature. Is the index then generated by a hash value, or in a similar way to a label encoder?

kiminh · 2021-11-29T15:17:00Z

I mean that every single-field categorical feature has its own vocabulary, so multiple fields have multiple vocabularies. Is the vocabulary of the multi-hot sparse feature then the union of those vocabularies, with each field's values indexed against it? Or do you just hash a string like "field_name:categorical feature value" to get the index? Hashing may cause some collisions, but you don't have to maintain the whole vocabulary.
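To make the two options in this question concrete, here is a minimal Python sketch. It is not DeText code: `build_union_vocab`, `hashed_index`, the field names, and the bucket count are all hypothetical, and MD5 is used only as a convenient stable hash.

```python
import hashlib


def build_union_vocab(field_vocabs):
    """Option 1: one global index per "field:value" key across all fields."""
    index = {}
    for field in sorted(field_vocabs):
        for value in sorted(field_vocabs[field]):
            index[f"{field}:{value}"] = len(index)
    return index


def hashed_index(field, value, num_buckets):
    """Option 2: hash "field:value" into a fixed number of buckets.

    No vocabulary has to be stored, but two keys can land in the same bucket.
    """
    key = f"{field}:{value}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


# Hypothetical per-field vocabularies, for illustration only.
field_vocabs = {"country": {"us", "cn"}, "device": {"ios", "android", "web"}}

vocab = build_union_vocab(field_vocabs)
bucket = hashed_index("country", "us", num_buckets=1000)
```

The union vocabulary gives collision-free indices but has to be built and stored ahead of time. The hashed version needs only the bucket count, at the cost of occasional collisions.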

StarWang · 2021-12-01T06:31:34Z

Hi @kiminh, I assume that your question is about DeText-TF2. In DeText-TF2, each sparse feature field (the wide part) is a multi-hot vector. The user should generate this vector beforehand (e.g. by hashing). The vocab size can be passed to DeText through nums_sparse_ftrs.

The vocab for each field is independent of the others. There's no correlation between them.
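As a rough illustration of the answer above, the following Python sketch builds one multi-hot vector per field, each with its own independent hash space. The field names, sizes, and helper functions are hypothetical. The thread does not say what serialized format DeText expects for these vectors, so this shows only the hashing idea, not the input pipeline.

```python
import hashlib


def stable_bucket(value, num_buckets):
    """Map a string to a bucket with a hash that is stable across runs."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def multi_hot(values, num_buckets):
    """Build a dense multi-hot vector for one field."""
    vec = [0.0] * num_buckets
    for v in values:
        vec[stable_bucket(v, num_buckets)] = 1.0
    return vec


# Hypothetical fields; each has its own vocab size, independent of the others.
field_sizes = {"skills": 16, "industries": 8}
example = {"skills": ["python", "sql"], "industries": ["software"]}

vectors = {f: multi_hot(example[f], field_sizes[f]) for f in field_sizes}
```

Under this assumption, the per-field sizes (16 and 8 here) would be the values passed through nums_sparse_ftrs, one per sparse field.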
