[Question] Difference between Embedding Training Cache and GPU Embedding Cache #424

Open
hsezhiyan opened this issue Oct 19, 2023 · 9 comments
Labels
question Further information is requested

Comments

@hsezhiyan

What is the difference between the Embedding Training Cache (https://github.com/NVIDIA-Merlin/HugeCTR/tree/main/HugeCTR/src/embedding_training_cache) and the GPU Embedding Cache (https://github.com/NVIDIA-Merlin/HugeCTR/tree/main/gpu_cache)?

It appears as if the Embedding Training Cache is used only during training. Does it use the GPU Embedding Cache under the hood?

@hsezhiyan added the question label on Oct 19, 2023
@minseokl
Collaborator

Hi @hsezhiyan

  • Yes, the Embedding Training Cache (ETC) is a training feature that enables the use of embedding tables larger than GPU memory. It is not implemented on top of the GPU Embedding Cache. Please also note that this feature is being deprecated.
  • The GPU Embedding Cache is mainly used in our inference use cases, through the Hierarchical Parameter Server (HPS); a sketch of that path follows below. If you are interested in HPS, please check out https://nvidia-merlin.github.io/HugeCTR/main/hierarchical_parameter_server/index.html

Thanks,
Minseok
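
For concreteness, here is a minimal sketch of that inference path through the HPS TensorFlow plugin, following the API shown in the docs linked above. The model name, batch size, key values, and the contents of `hps_config.json` are illustrative assumptions, not a definitive recipe:

```python
# Minimal HPS inference sketch (illustrative; names and config values are assumptions).
# The GPU embedding cache lives inside HPS and is configured through the JSON config
# file (e.g. the per-table GPU cache ratio), not addressed directly from user code.
import tensorflow as tf
import hierarchical_parameter_server as hps

# "hps_config.json" is assumed to describe the deployed model and its embedding
# tables; see the HPS documentation for the exact schema.
hps.Init(global_batch_size=1024, ps_config_file="hps_config.json")

# LookupLayer routes key lookups through HPS: hot keys are served from the
# GPU embedding cache, and misses fall back to the lower storage tiers.
lookup = hps.LookupLayer(model_name="demo_model", table_id=0,
                         emb_vec_size=16, emb_vec_dtype=tf.float32)

keys = tf.constant([[10, 20, 30]], dtype=tf.int64)
embeddings = lookup(keys)  # expected shape: [1, 3, 16]
```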

@hsezhiyan
Author

Thank you for the response, @minseokl.

In that case, will the ETC (which is being deprecated) be replaced by the GPU Embedding Cache for training? It looks like the GPU Embedding Cache could serve both inference and training.

@yingcanw
Collaborator

@hsezhiyan
The ETC will be replaced by HierarchicalKV (HKV) for training with hierarchical memory. We have no plans to integrate the GPU embedding cache into training. In addition, we have completed the implementation of a new-generation GPU embedding cache with higher performance and will release it soon.

@sezhiyanhari

sezhiyanhari commented Oct 31, 2023

Thank you for the answer @yingcanw! I'd like to ask a few follow-up questions:

  1. Are there any instructions on how to use HierarchicalKV during training? I can only find HugeCTR training examples that use ETC.
  2. Is there an expected timeframe for when the updated GPU embedding cache will be released?
  3. From a design perspective, why are there different caching systems (ETC, GPU Embedding Cache) for training and inference? Was there a reason not to use a single caching system for both?

@sezhiyanhari

@minseokl if you also have any insights, I would appreciate it!

@yingcanw
Collaborator

yingcanw commented Nov 7, 2023

@sezhiyanhari Sorry for the late reply.
1. Here is the relevant API description for HKV. In addition, we have integrated HKV into SOK, which enables seamless training on the TensorFlow platform. @kanghui0204 can provide a more detailed introduction if you have any questions about SOK.
2. It is expected to be soon. If you only need the highest-performance GPU embedding cache lookup right now, you can also use this version of the cache.
3. Training and inference prioritize different metrics in industrial settings. Inference has very strict requirements on prediction latency, and the model also needs to be updated in real time at high frequency, which requires the cache to support high-performance concurrent reads and writes. Synchronous training, on the other hand, can separate cache reads from writes, and the pipeline can be optimized through operations such as prefetching (see the sketch below). Therefore, different cache systems need to be designed to meet the performance requirements of training and inference.
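
To make point 3 concrete, here is a hypothetical, framework-agnostic sketch (not HugeCTR code; `cache`, `model`, and `optimizer` are invented for illustration) of why a training-side cache can avoid concurrent read/write support: reads and writes happen at fixed, non-overlapping points of each step, while the next batch's read is overlapped with compute.

```python
# Hypothetical sketch, NOT a HugeCTR API. In synchronous training, the prefetch (read)
# and the post-update flush (write) occur at known, separate points of each step, so
# the cache never has to serve readers and writers concurrently. An inference cache,
# by contrast, is queried continuously while being refreshed in real time.

def train_loop(batches, cache, model, optimizer):
    it = iter(batches)
    batch = next(it)                                 # assumes at least one batch
    handle = cache.read_async(batch.keys)            # prefetch first batch's embeddings
    while batch is not None:
        nxt = next(it, None)
        nxt_handle = cache.read_async(nxt.keys) if nxt is not None else None
        emb = handle.wait()                          # read phase ends before compute
        grads = model.forward_backward(batch, emb)   # overlaps with the prefetch above
        cache.write(batch.keys, optimizer.apply(emb, grads))  # exclusive write phase
        batch, handle = nxt, nxt_handle
```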

@lausannel

lausannel commented Dec 22, 2023


Hi, could you provide an example script for training with HKV and SOK?

I am a little confused about how HKV could replace ETC because, as far as I know, HKV is a single-GPU key-value store. Could it eliminate the Parameter Server in ETC?

Any insights are appreciated.

@kanghui0204
Collaborator

Hi @lausannel,
here is an example of using SOK+HKV:
SOK+HKV example

HKV is a key-value store that uses both GPU and CPU memory; the embedding values can be stored either on the GPU or on the CPU.

HKV repo
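
For readers who cannot follow the links, a minimal sketch of the SOK-on-HKV path, assuming the `sok.DynamicVariable` API from the SOK documentation; the `var_type` string, capacity parameters, and the lookup call are assumptions that should be verified against the linked example:

```python
# Minimal SOK + HKV training-side sketch (assumed API; verify against the example).
import tensorflow as tf
import sparse_operation_kit as sok

sok.init()  # assumes the usual SOK/Horovod distributed setup has been done first

# var_type="hybrid" is assumed to select the HierarchicalKV backend, letting the
# table grow beyond HBM by spilling embedding values to host memory.
var = sok.DynamicVariable(dimension=16,
                          var_type="hybrid",
                          initializer="uniform",
                          init_capacity=1024 * 1024,
                          max_capacity=8 * 1024 * 1024)

ids = tf.constant([0, 1, 2], dtype=tf.int64)
emb = tf.nn.embedding_lookup(var, ids)  # assumed to be overloaded for DynamicVariable
print(emb.shape)  # expected: (3, 16)
```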

@lausannel

@kanghui0204 Thanks for your explanation!
