💡[feat] Track each task GPU utilization (and other information) and display it on the WebUI #5043
Comments
Hello, please check out our profiling feature: |
Hi @ioga, If I understand correctly, do the |
@tshu-w correct, today it needs to be a |
@tshu-w Note: You can set up Prometheus and Grafana to monitor the usage of all Determined workloads: https://docs.determined.ai/latest/integrations/prometheus/prometheus.html |
Thanks, @vishnu2kmohan, is it possible to monitor each GPU and know which task is running on it intuitively with Prometheus and Grafana? Our intent is to monitor GPU utilization per task to ensure that resources are not wasted. |
Hi, I also ran into this situation. Have you figured out whether Prometheus and Grafana can be used to monitor GPU utilization for each task? |
See https://docs.determined.ai/latest/integrations/prometheus/prometheus.html. We provide a pre-configured Grafana panel for monitoring hardware metrics including GPU utilization. We currently only provide preset filters for However, we do surface various container/task mappings through a Prometheus API endpoint ( |
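For anyone who wants per-task numbers in the meantime, here is a rough sketch of the join described above: take per-GPU utilization samples and a GPU-to-task mapping, then average per task. The function name, the input shapes, and the `GPU-a` / `task-1` identifiers are all hypothetical and only for illustration; this is not Determined's API, and how you pull the samples and the mapping out of Prometheus is left out.

```python
from collections import defaultdict
from statistics import mean


def per_task_gpu_utilization(gpu_samples, gpu_to_task):
    """Average GPU utilization per task.

    gpu_samples: iterable of (gpu_uuid, utilization_percent) pairs.
    gpu_to_task: dict mapping gpu_uuid -> task_id (hypothetical shape).
    GPUs with no known task are grouped under the key None.
    """
    by_task = defaultdict(list)
    for gpu_uuid, util in gpu_samples:
        # Unmapped GPUs fall back to None so idle devices stay visible.
        by_task[gpu_to_task.get(gpu_uuid)].append(util)
    return {task: mean(values) for task, values in by_task.items()}


# Made-up sample data, purely illustrative.
samples = [("GPU-a", 90.0), ("GPU-b", 10.0), ("GPU-c", 50.0), ("GPU-a", 80.0)]
mapping = {"GPU-a": "task-1", "GPU-b": "task-1"}
result = per_task_gpu_utilization(samples, mapping)
```

Keeping unmapped GPUs under `None` makes it easy to spot devices that are allocated but not attributed to any task, which is the "wasted resources" case raised earlier in the thread.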
Describe the problem
It would be very convenient for cluster managers and users if the WebUI could display system information for each task.
Here are screenshots from wandb:
Describe the solution you'd like
Users can view these charts in the WebUI.
Describe alternatives you've considered
No response
Additional context
cc @luyaojie