
how to generate wide sparse features #73

Open
kiminh opened this issue Nov 29, 2021 · 2 comments

Comments

@kiminh

kiminh commented Nov 29, 2021

Hi, I'm confused about how to generate the wide sparse features. Here is my understanding: combine the multi-field categorical features together to form a multi-hot sparse feature, then generate the index with a hash value or with something similar to label encoding?
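A minimal sketch of the label-encoding option, assuming plain Python dicts as input rows (the field names `city` and `device` and all helper names are hypothetical, not part of DeText): each field gets its own vocabulary, and per-field offsets place those vocabularies side by side in one combined multi-hot index space.

```python
from typing import Dict, List, Tuple


def build_vocabularies(rows: List[Dict[str, str]], fields: List[str]) -> Dict[str, Dict[str, int]]:
    """Build one vocabulary per field, mapping each value to a field-local id."""
    vocabs: Dict[str, Dict[str, int]] = {f: {} for f in fields}
    for row in rows:
        for f in fields:
            value = row[f]
            if value not in vocabs[f]:
                vocabs[f][value] = len(vocabs[f])
    return vocabs


def union_offsets(vocabs: Dict[str, Dict[str, int]], fields: List[str]) -> Tuple[Dict[str, int], int]:
    """Give each field a starting offset so the union of vocabularies forms one index space."""
    offsets: Dict[str, int] = {}
    total = 0
    for f in fields:
        offsets[f] = total
        total += len(vocabs[f])
    return offsets, total


def encode_multi_hot(row: Dict[str, str], vocabs: Dict[str, Dict[str, int]],
                     offsets: Dict[str, int], fields: List[str]) -> List[int]:
    """Return the active (value 1) indices of the combined multi-hot vector; unseen values are dropped."""
    return sorted(offsets[f] + vocabs[f][row[f]] for f in fields if row[f] in vocabs[f])


rows = [
    {"city": "SF", "device": "ios"},
    {"city": "NYC", "device": "android"},
    {"city": "SF", "device": "android"},
]
fields = ["city", "device"]
vocabs = build_vocabularies(rows, fields)
offsets, total_size = union_offsets(vocabs, fields)
print(total_size, encode_multi_hot({"city": "NYC", "device": "ios"}, vocabs, offsets, fields))
```

The trade-off is that the full vocabulary has to be built and stored, and values unseen at vocabulary-building time have no index.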

@kiminh
Author

kiminh commented Nov 29, 2021

I mean that every single-field categorical feature has its own vocabulary, so multiple categorical fields have multiple vocabularies. The vocabulary of the multi-hot sparse feature would then be the union of those vocabularies, used to index the multi-field categorical features. Alternatively, just hash each categorical feature as a string like "field_name:categorical feature value" to get its index; this may cause some collisions, but there is no need to maintain the whole vocabulary.
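A minimal sketch of the hashing alternative, assuming a fixed bucket count chosen by the user (the helper names and bucket count are hypothetical). It uses `hashlib.md5` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would give different indices across runs.

```python
import hashlib
from typing import Dict, List


def hashed_index(field: str, value: str, num_buckets: int) -> int:
    """Map the string "field_name:value" to a stable bucket in [0, num_buckets)."""
    key = f"{field}:{value}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


def encode_hashed(row: Dict[str, str], num_buckets: int) -> List[int]:
    """Active indices of the multi-hot vector; features that collide share one index."""
    return sorted({hashed_index(f, v, num_buckets) for f, v in row.items()})


print(encode_hashed({"city": "SF", "device": "ios"}, num_buckets=1000))
```

Prefixing the field name keeps the same value in different fields (e.g. "unknown") from always landing in the same bucket; a larger `num_buckets` lowers the collision rate at the cost of a wider vector.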

@StarWang
Contributor

StarWang commented Dec 1, 2021

Hi @kiminh, I assume that your question is based on DeText-TF2. In DeText-TF2, each sparse feature field (the wide part) is a multi-hot vector. This vector should be generated by the user beforehand (e.g., by hashing). The vocab size can be passed to DeText through nums_sparse_ftrs.

The vocabularies of the different fields are independent of each other; there is no correlation between them.
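A minimal preprocessing sketch of the per-field setup described above, assuming hashing is used and each field has its own independent bucket count (the field names `query_terms`/`doc_tags`, the sizes, and all helpers are hypothetical). The per-field sizes correspond to the vocab sizes one would pass through nums_sparse_ftrs; the exact input format DeText-TF2 expects for the vectors and for that parameter is not shown here and should be checked against the DeText documentation.

```python
import hashlib
from typing import Dict, List

# Hypothetical bucket counts, one per sparse field; each field is its own index space.
FIELD_SIZES: Dict[str, int] = {"query_terms": 16, "doc_tags": 8}


def field_hash(value: str, num_buckets: int) -> int:
    """Stable hash of a value into this field's own range [0, num_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


def to_multi_hot(values: List[str], num_buckets: int) -> List[int]:
    """Dense 0/1 multi-hot vector for one field."""
    vec = [0] * num_buckets
    for v in values:
        vec[field_hash(v, num_buckets)] = 1
    return vec


def encode_fields(example: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Encode every field independently; a missing field becomes an all-zero vector."""
    return {f: to_multi_hot(example.get(f, []), n) for f, n in FIELD_SIZES.items()}


# Per-field vocab sizes, in field order (the kind of value nums_sparse_ftrs describes).
field_vocab_sizes = list(FIELD_SIZES.values())

encoded = encode_fields({"query_terms": ["red", "shoes"], "doc_tags": ["sale"]})
print(field_vocab_sizes, encoded["doc_tags"])
```

Because each field hashes into its own range, the same string appearing in two fields never competes for the same slot, which matches the point that the field vocabularies are independent.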
