Welcome to the New Relic Integration for Databricks! This repository provides scripts and instructions for integrating New Relic with Databricks, either through the New Relic Databricks Integration or through open source tools such as OpenTelemetry and Prometheus.
You can monitor your Databricks environment via the standalone integration, OpenTelemetry (OTel), or Prometheus, and choose whichever best fits your operational needs.
- New Relic Databricks Integration: A direct connection between Databricks and New Relic, enabling seamless data transfer and analysis. This integration supports Spark metrics, Databricks query metrics, and Databricks job run events. Combined with the New Relic APM agent, it can also pull logs and cluster performance data.
- OpenTelemetry (OTel) Integration: An open-source observability framework for generating and managing telemetry data. It supports Spark metrics from Databricks. Please follow the instructions here for a detailed guide on how to add initialization scripts for OpenTelemetry to your Databricks cluster.
- Prometheus Integration: A powerful open-source systems monitoring and alerting toolkit that can process metrics from Databricks. It supports Spark metrics from Databricks. Please follow the instructions here for a detailed guide on how to add initialization scripts to your Databricks cluster.
Pick the option that suits your use-case and follow the associated guide to get started.
The standalone environment runs the data pipelines as an independent service, either on-premises or on cloud instances such as AWS EC2. It can run on Linux, macOS, Windows, and any other OS that Go supports.
- Go 1.20 or later.
Open a terminal, change to the cmd/standalone directory, and run:
$ go build
The standalone environment requires a YAML file for pipeline configuration. The configuration keys are:
- interval: Integer. Time in seconds between requests.
- nr_account_id: String. Account ID.
- nr_api_key: String. API key for writing.
- nr_endpoint: String. New Relic endpoint region. Either US or EU. Optional, default value is US.
Check config/example_config.yaml for a configuration example.
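As an illustration, the keys listed above could be written to a configuration file as follows. This is a minimal sketch: the account ID and API key are placeholders, the output location (CONFIG_PATH) is arbitrary, and the pipeline may need keys that this section does not list, so treat config/example_config.yaml as the authoritative reference.

```shell
#!/bin/bash
# Sketch: write a minimal config containing only the keys described above.
# All values are placeholders; CONFIG_PATH is an arbitrary location.
CONFIG_PATH="${CONFIG_PATH:-./config.yaml}"

cat > "${CONFIG_PATH}" <<'EOF'
# Time in seconds between requests
interval: 60
# New Relic account ID (placeholder)
nr_account_id: "1234567"
# New Relic API key used for writing data (placeholder)
nr_api_key: "REPLACE_WITH_YOUR_API_KEY"
# Endpoint region: US or EU (optional, defaults to US)
nr_endpoint: "US"
EOF

echo "Wrote ${CONFIG_PATH}"
```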
Just run the following command from the build folder:
$ ./standalone path/to/config.yaml
To run the pipeline on system start, check your specific system init documentation.
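On Linux distributions that use systemd, one possible approach is a service unit. The sketch below only writes a unit file to the current directory. The install paths (/opt/newrelic-databricks/standalone and /etc/newrelic-databricks/config.yaml) and the unit name are hypothetical and not defined by this repository; other init systems need a different mechanism.

```shell
#!/bin/bash
# Sketch: generate a systemd unit for the standalone binary.
# BIN_PATH, CONF_PATH, and UNIT_FILE are hypothetical; adjust to your install.
BIN_PATH="/opt/newrelic-databricks/standalone"
CONF_PATH="/etc/newrelic-databricks/config.yaml"
UNIT_FILE="./newrelic-databricks.service"

cat > "${UNIT_FILE}" <<EOF
[Unit]
Description=New Relic Databricks standalone pipeline
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=${BIN_PATH} ${CONF_PATH}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

echo "Wrote ${UNIT_FILE}"
```

After reviewing the generated file, it would typically be copied to /etc/systemd/system/ and enabled with `systemctl enable --now newrelic-databricks.service`.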
Databricks Initialization Scripts are shell scripts that run when a cluster is starting. They are useful for setting up custom configurations or third-party integrations such as setting up monitoring agents. Here is how you add an init script to Databricks.
Based on the cloud Databricks is hosted on, you will be able to run the APM agent.
- Add script to Databricks: Create a new file in the workspace named nr-agent-installation.sh and add the script below to it.

    #!/bin/bash

    # Define the New Relic agent version and file paths
    NR_VERSION="8.10.0" # replace with the version you want
    NR_JAR_PATH="/databricks/jars/newrelic-agent-${NR_VERSION}.jar"
    NR_CONFIG_FILE="/databricks/jars/newrelic.yml"

    # Download the New Relic Java agent
    curl -o "${NR_JAR_PATH}" -L "https://download.newrelic.com/newrelic/java-agent/newrelic-agent/${NR_VERSION}/newrelic-agent-${NR_VERSION}.jar"

    # Create the New Relic YAML configuration file
    echo "common: &default_settings
      license_key: 'xxxxxx' # Replace with your License Key
      agent_enabled: true

    production:
      <<: *default_settings
      app_name: Databricks" > "${NR_CONFIG_FILE}"
- Add the script to your Databricks cluster: To add the initialization script to your cluster in Databricks, follow these steps:
  - Navigate to your Databricks workspace and go to the Clusters page.
  - Choose the cluster you want to add the script to and click Edit.
  - In the Advanced Options section, find the Init Scripts field.
  - Click Add, then in the Script Path input, select the workspace or cloud storage path where your script is stored.
  - Click Confirm and then Update.
- Add Spark configurations to attach the Java agent:
  - Navigate to your cluster Advanced Options, then Spark.
  - Add or update Spark configurations as key-value pairs. Here's an example using the jar path /databricks/jars/newrelic-agent-8.10.0.jar, which must match NR_JAR_PATH from the init script:

        spark.driver.extraJavaOptions -javaagent:/databricks/jars/newrelic-agent-8.10.0.jar
        spark.executor.extraJavaOptions -javaagent:/databricks/jars/newrelic-agent-8.10.0.jar
- Verify the script was executed: After your cluster starts or restarts, verify that the script ran successfully by checking the cluster logs via the Logs tab on your cluster's page.
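Besides reading the logs, one could also check that the files the init script is meant to create are present, for example from a notebook %sh cell (which runs on the driver node only). The following is a minimal sketch; check_agent_files is a hypothetical helper, not part of this repository, and its default directory assumes the /databricks/jars paths used in the init script.

```shell
#!/bin/bash
# Sketch: check that the files the init script should have created exist.
# check_agent_files is a hypothetical helper; the default directory matches
# the /databricks/jars paths used in the init script.
check_agent_files() {
  local version="$1"
  local jars_dir="${2:-/databricks/jars}"
  local status=0
  local f
  for f in "${jars_dir}/newrelic-agent-${version}.jar" "${jars_dir}/newrelic.yml"; do
    if [ -s "${f}" ]; then
      # File exists and is not empty
      echo "OK: ${f}"
    else
      echo "MISSING or empty: ${f}"
      status=1
    fi
  done
  return "${status}"
}

check_agent_files "8.10.0" || echo "Init script output is incomplete"
```

The -s test treats an empty file as a failure, which also catches a download that created the jar but wrote nothing to it.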
Note: Any changes to the script settings will apply only to new clusters or when existing clusters are restarted.
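For the Spark configuration step above, the jar path has to agree with the agent version chosen in the init script, and a shell variable such as ${NR_JAR_PATH} is not likely to be expanded inside the Spark configuration field. A small helper, run locally, can print the literal lines to paste in. This is a sketch only: spark_agent_conf is a hypothetical function name, and the path layout is assumed to be the one used in the init script.

```shell
#!/bin/bash
# Sketch: print Spark config lines for a given agent version.
# spark_agent_conf is a hypothetical helper, not part of this repository.
spark_agent_conf() {
  local version="$1"
  # Assumed to match NR_JAR_PATH in the init script
  local jar_path="/databricks/jars/newrelic-agent-${version}.jar"
  printf 'spark.driver.extraJavaOptions -javaagent:%s\n' "${jar_path}"
  printf 'spark.executor.extraJavaOptions -javaagent:%s\n' "${jar_path}"
}

spark_agent_conf "8.10.0"
```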
New Relic hosts and moderates an online forum where customers can interact with New Relic employees as well as other customers to get help and share best practices. If you're running into a problem, please raise an issue on this repository and we will try to help you ASAP. Please bear in mind this is an open source project and hence it isn't directly supported by New Relic.
We encourage your contributions to improve the New Relic Databricks Integration! Keep in mind that when you submit your pull request, you'll need to sign the CLA via the click-through using CLA-Assistant. You only have to sign the CLA one time per project.
If you have any questions, or to execute our corporate CLA (which is required if your contribution is on behalf of a company), drop us an email at [email protected].
A note about vulnerabilities
As noted in our security policy, New Relic is committed to the privacy and security of our customers and their data. We believe that providing coordinated disclosure by security researchers and engaging with the security community are important means to achieve our security goals.
If you believe you have found a security vulnerability in this project or any of New Relic's products or websites, we welcome and greatly appreciate you reporting it to New Relic through our bug bounty program.
If you would like to contribute to this project, review these guidelines.
New Relic Databricks Integration is licensed under the Apache 2.0 License.