[Question] Is there pipeline mechanism to help the lookup requests always be handled on device cache in HugeCTR? #437

Open
Lifann opened this issue Dec 21, 2023 · 1 comment


Lifann commented Dec 21, 2023

Background

In recommender system training, the user/item/history features can be extremely large in production. Treated as a multi-level cache, HPS can store large sparse parameters well, at the cost of cache misses. If the number of keys in one query grows too large for HPS, latency may increase significantly.

Question

Is there an N-step pipeline mechanism in HugeCTR that keeps the device cache holding the lookup results for the next M steps?

Example

Assume the current step is N = 100 and HPS is in the following state:
Device: [0,2,4,6], Host: [0,1,2,3,4], Disk: [0,1,2,3,4,5,6,7,8,9,10,100]

Now a lookup request [0,1,2,3,100,500] arrives. The device will miss [1,3,100,500], the host will miss [100,500], and the disk will miss [500].

The delay grows as the cache-miss rate rises.

Instead, if there were a mechanism that lets the device predict and hold [0,1,2,3,100,500] before the lookup request arrives, whether actively or passively, then no cache misses would occur, which would clearly reduce the delay.

If M = 10, HPS would be told to prefetch [0,1,2,3,100,500] at step N - M = 90, as in the sketch below.
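For concreteness, here is a minimal sketch in plain Python (not the HugeCTR API) of the behavior I am asking about. The `TieredCache`/`Prefetcher` classes, tier names, and capacities are all made up for illustration; a real implementation would also need locking around the tiers.

```python
# M-step-ahead prefetch sketch: while the trainer runs step n, a background
# worker promotes the keys needed at step n + M into the device tier.
import collections
import queue
import threading


class TieredCache:
    """Three-tier store: device (small, fast) -> host -> disk (large, slow)."""

    def __init__(self, device_capacity):
        self.device = collections.OrderedDict()  # key -> embedding, LRU order
        self.device_capacity = device_capacity
        self.host = {}
        self.disk = {}

    def _promote(self, key, value):
        # Insert into the device tier, demoting LRU entries to host when full.
        self.device[key] = value
        self.device.move_to_end(key)
        while len(self.device) > self.device_capacity:
            old_key, old_value = self.device.popitem(last=False)
            self.host[old_key] = old_value

    def lookup(self, key):
        for tier in (self.device, self.host, self.disk):
            if key in tier:
                value = tier[key]
                if tier is not self.device:       # miss on device: pull upward
                    self._promote(key, value)
                return value
        raise KeyError(key)


class Prefetcher:
    """Warms the device tier with the keys expected M steps in the future."""

    def __init__(self, cache):
        self.cache = cache
        self.pending = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def prefetch(self, keys):
        self.pending.put(list(keys))

    def _run(self):
        while True:
            for key in self.pending.get():
                try:
                    self.cache.lookup(key)        # side effect: promotion
                except KeyError:
                    pass                          # key not materialized yet


# At step N - M = 90 the trainer announces the keys for step N = 100, so by
# step 100 the lookup [0,1,2,3,100,500] hits the device tier directly.
cache = TieredCache(device_capacity=8)
cache.disk.update({k: "emb_%d" % k for k in (0, 1, 2, 3, 100, 500)})
prefetcher = Prefetcher(cache)
prefetcher.prefetch([0, 1, 2, 3, 100, 500])
```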

@Lifann Lifann added the question Further information is requested label Dec 21, 2023
@Lifann Lifann changed the title [Question] Is there pipeline mechanism to help the lookup requests always be blocked on device cache in HugeCTR? [Question] Is there pipeline mechanism to help the lookup requests always be handled on device cache in HugeCTR? Dec 21, 2023
yingcanw (Collaborator) commented

@Lifann Thanks for your feedback. To understand your question more accurately, let me first make it clear that the current GPU cache in HPS is only used for inference in recommender systems, so the prefetching mechanism you described is difficult to implement in high-concurrency inference scenarios. However, we have implemented a high-performance lock-free GPU cache for inference that supports concurrent lookup & insertion, which will be released in the near future.

For training, we have implemented the prefetching mechanism you suggested in the ETC (Embedding Training Cache). Regarding the difference between ETC and HPS, you can refer to #424.

However, we have deprecated the ETC; it will be replaced by HierarchicalKV (HKV) for training with hierarchical memory. In addition, we have integrated HKV into SOK, which enables seamless training on the TensorFlow platform. For how to use SOK and HKV, you can also refer to the examples provided in #424.
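A rough sketch of what HKV-backed training through SOK can look like, pieced together from the public SOK examples. The exact argument names (`var_type="hybrid"`, `init_capacity`, `max_capacity`) are assumptions based on the SOK documentation and may differ between releases; treat this as an illustration, not the definitive API.

```python
# Sketch of HKV-backed embedding training via SOK on TensorFlow.
# Argument names follow the SOK docs but may vary between releases.
import tensorflow as tf
import sparse_operation_kit as sok

sok.init()

# "hybrid" selects the HierarchicalKV backend, which spills embeddings
# from device HBM to host memory once the device-side capacity is reached.
table = sok.DynamicVariable(
    dimension=16,
    var_type="hybrid",
    init_capacity=1024 * 1024,
    max_capacity=64 * 1024 * 1024,
)

# Two samples with a variable number of keys each, as in the example above.
indices = tf.ragged.constant([[0, 1, 2, 3], [100, 500]], dtype=tf.int64)

with tf.GradientTape() as tape:
    # lookup_sparse returns one dense tensor per table, reduced per sample.
    (embeddings,) = sok.lookup_sparse([table], [indices], combiners=["sum"])
    loss = tf.reduce_sum(embeddings)

# DynamicVariable gradients are applied through SOK's optimizer wrapper.
grads = tape.gradient(loss, [table])
optimizer = sok.OptimizerWrapper(tf.keras.optimizers.SGD(learning_rate=0.1))
optimizer.apply_gradients(zip(grads, [table]))
```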
