Support Azure VM Agent #2014
Comments
Guest OS metrics such as memory usage are only available for a given VM or VM Scale Set if the Azure diagnostic extension has been enabled in that VM or VMSS. Otherwise, such metrics cannot be viewed in the Metrics section of the Azure portal and cannot be used in autoscale trigger rules, even if the VM is correctly sending metrics data. One reason is that metrics data is stored in tables in a storage account, and the Azure portal learns which storage account is being used from the configuration settings of the diagnostic extension.
This change adds a new "azure" klib that implements an Azure extension similar to the Linux Diagnostic extension. The current implementation supports sending 4 types of memory metrics (i.e. available and used memory, both as a number of bytes and as a percentage of total memory). This klib is configured in the manifest options via an "azure" tuple; the diagnostic functionalities are enabled and configured by inserting a "diagnostics" tuple with the following attributes:

- storage_account: indicates the Azure storage account to be used to store metrics data generated by the klib; the storage account must be located in the same region as the Azure instance
- storage_account_sas: Shared Access Signature token for accessing the storage account; this token must have the permissions needed to create Azure storage tables and add table entities in the above storage account. SAS tokens for a given storage account can be generated, for example, via the Azure portal in the "Security + networking" menu.
- metrics: tuple that enables sending memory metrics; it can contain 2 optional attributes:
  - sample_interval: interval expressed in seconds at which metrics data is collected (default: 15)
  - transfer_interval: interval expressed in seconds at which metrics data is aggregated and sent to the storage account (default: 60)

Example snippet of Ops configuration file:

```
"ManifestPassthrough": {
  "azure": {
    "diagnostics": {
      "storage_account": "mystorageaccount",
      "storage_account_sas": "sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-05-22T14:50:28Z&st=2024-05-12T06:50:28Z&spr=https&sig=xxyyzz",
      "metrics": {"sample_interval": "15", "transfer_interval": "60"}
    }
  }
}
```

Aggregated memory metrics data consist of the number of samples, the minimum, maximum, last, and average value, and the sum of all values; these data are inserted in an Azure storage table (one entity per aggregate).
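The aggregation step described above can be sketched in a few lines of Python. This is only an illustration of the six aggregated values named in the description; the `Aggregate` class and its field names are hypothetical and are not the klib's internal structures or the column names used in the Azure table.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Aggregate:
    """One aggregated record, i.e. what would become one table entity.

    Field names are illustrative only.
    """
    count: int       # number of samples in the transfer interval
    minimum: float
    maximum: float
    last: float      # most recent sample
    average: float
    total: float     # sum of all samples


def aggregate(samples: Iterable[float]) -> Aggregate:
    """Reduce the samples collected during one transfer interval."""
    values = list(samples)
    if not values:
        raise ValueError("at least one sample is required")
    total = sum(values)
    return Aggregate(
        count=len(values),
        minimum=min(values),
        maximum=max(values),
        last=values[-1],
        average=total / len(values),
        total=total,
    )


# With the default settings (sample_interval 15, transfer_interval 60),
# each aggregate would cover four samples, e.g. used-memory percentages:
agg = aggregate([40.0, 50.0, 30.0, 60.0])
```

With the defaults, one such record would be produced per minute for each of the 4 memory metrics.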
The name of the table is in the format "WADMetricsxxxxP10DV2Syyyymmdd", where xxxx is the transfer interval expressed in ISO8601 format, and yyyymmdd is a representation of the 10-day date interval to which the metrics refer (thus, a new table is created every 10 days). For example, a table named WADMetricsPT1MP10DV2S20240503 contains metrics data aggregated every minute ("PT1M" is the ISO8601 representation of a 1-minute period) generated for a 10-day period starting on May 3, 2024.

By default, the Azure portal does not display these metrics in its charts; in order for metrics to be available in the portal, the Linux Diagnostics Extension must be enabled and configured in a running instance (this can be done in the "Diagnostic settings" section in the portal) to match the settings in the Nanos manifest options. More specifically, the storage account and the metric aggregation interval specified in the Azure diagnostic settings must match those specified in the manifest options.

Note: the Azure VM agent implemented in the cloud_init klib responds to requests to enable and configure the diagnostic extension, but does not actually apply the extension settings specified in the requests; instead, it always applies the settings from the manifest.

Closes #2014
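To make the table naming scheme concrete, here is a minimal Python sketch that builds a name from a date and a transfer interval. Two things in it are assumptions rather than facts from the description: the 10-day windows are counted from 1601-01-01 (a choice that reproduces the 20240503 example above, but the actual alignment rule is not stated), and intervals that are not whole minutes are rendered with combined ISO8601 components such as "PT1M30S". The function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: 10-day windows are counted from this date. It reproduces the
# example table name in the description, which does not state the rule.
WINDOW_EPOCH = date(1601, 1, 1)
WINDOW_DAYS = 10


def iso8601_duration(seconds: int) -> str:
    """Render a positive number of seconds as an ISO8601 duration."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    text = "PT"
    if hours:
        text += f"{hours}H"
    if minutes:
        text += f"{minutes}M"
    if secs:
        text += f"{secs}S"
    return text


def metrics_table_name(day: date, transfer_interval: int) -> str:
    """Name of the table that would hold aggregates generated on `day`."""
    elapsed = (day - WINDOW_EPOCH).days
    # Round down to the first day of the enclosing 10-day window.
    start = WINDOW_EPOCH + timedelta(days=elapsed - elapsed % WINDOW_DAYS)
    return (
        f"WADMetrics{iso8601_duration(transfer_interval)}"
        f"P10DV2S{start:%Y%m%d}"
    )


name = metrics_table_name(date(2024, 5, 12), 60)
```

Under these assumptions, any date from May 3 to May 12, 2024 with a 60-second transfer interval maps to WADMetricsPT1MP10DV2S20240503, and May 13 starts a new table.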
#2022 implements an Azure VM agent and a diagnostic extension that publishes memory metrics. Please note that the extension is enabled and configured via manifest options when creating a Nanos image; it sends metrics regardless of whether the diagnostic extension is enabled in the Azure portal, but in order for these metrics to show up in the charts you need to enable the extension (in fact, without enabling the extension you cannot even select guest OS metrics in the charts).
Awesome! Thanks!
It's pretty cool that Nanos runs on Azure. It's the only unikernel able to do that so far.
I'd like to put it in a VM Scale Set with autoscaling based on cpu and memory usage, but memory metrics are only available with an agent.
Surely this is easier said than done, but now there's an issue on it.
Thanks for a great product!