Databricks Key Azure Integrations
Databricks offers several integrations with Azure to provide a seamless and powerful data analytics and machine learning environment. These integrations leverage the capabilities of Azure services to enhance data engineering, data science, and machine learning workflows.
Here are the major Databricks integrations with Azure:
Azure Databricks service – Azure Databricks itself is a managed Apache Spark-based data analytics platform tightly integrated with Azure. It provides a collaborative environment for data engineers and data scientists to work together on big data and machine learning projects.
Azure Blob Storage – Databricks integrates seamlessly with Azure Blob Storage, making it easy to access and process data stored in Blob Storage containers or in Azure Data Lake Storage. This integration lets you read and write data efficiently, enhancing data engineering workflows.
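As a minimal sketch of how that access works in practice, a notebook typically reads Blob Storage data through a `wasbs://` URI. The helper below just builds such a URI; the container, account, and path names are illustrative, and the actual read requires a `spark` session on a Databricks cluster (shown only as a comment):

```python
def blob_uri(container: str, storage_account: str, path: str) -> str:
    # Build a wasbs:// URI for a blob path; names here are illustrative examples.
    return f"wasbs://{container}@{storage_account}.blob.core.windows.net/{path.lstrip('/')}"

# On a Databricks cluster you would then read it with Spark, e.g.:
# df = (spark.read.format("csv")
#         .option("header", "true")
#         .load(blob_uri("raw", "mystorageacct", "sales/2023.csv")))
```

ADLS Gen2 paths follow the same shape with the `abfss://` scheme and the `dfs.core.windows.net` endpoint.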
Azure Machine Learning – Databricks can integrate with Azure Machine Learning, allowing data scientists to train machine learning models on Databricks clusters and then easily deploy them to Azure Machine Learning for production use.
Azure Monitor and Azure Log Analytics – Databricks can integrate with Azure Monitor and Azure Log Analytics to provide monitoring, logging, and diagnostic capabilities for your Databricks workloads. This integration helps in performance tuning and troubleshooting.
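One common wiring is sending custom logs from a Databricks job to a Log Analytics workspace via the Azure Monitor HTTP Data Collector API, which requires an HMAC-SHA256 signature over the request. This is a sketch of the documented signing scheme; the workspace ID and shared key values are placeholders you would fetch from your workspace:

```python
import base64
import hashlib
import hmac


def build_log_analytics_signature(workspace_id: str, shared_key: str,
                                  date_rfc1123: str, content_length: int) -> str:
    # Authorization header for the Azure Monitor HTTP Data Collector API:
    # sign "POST\n<length>\napplication/json\nx-ms-date:<date>\n/api/logs"
    # with the base64-decoded workspace shared key, then base64-encode the digest.
    string_to_sign = (f"POST\n{content_length}\napplication/json\n"
                      f"x-ms-date:{date_rfc1123}\n/api/logs")
    decoded_key = base64.b64decode(shared_key)
    signature = base64.b64encode(
        hmac.new(decoded_key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    ).decode("utf-8")
    return f"SharedKey {workspace_id}:{signature}"
```

The resulting header goes on a POST to `https://<workspace-id>.ods.opinsights.azure.com/api/logs?api-version=2016-04-01` along with an `x-ms-date` header carrying the same RFC 1123 timestamp.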
Azure Active Directory – Single sign-on (SSO) with Azure Active Directory (Azure AD) is the recommended way to sign in to Azure Databricks. Azure Databricks also supports automated user provisioning with Azure AD, so you can create new users, grant them the proper level of access, and remove users to deprovision their access.
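Automated provisioning is driven by Databricks' SCIM 2.0 API (`POST /api/2.0/preview/scim/v2/Users` on the workspace). A minimal sketch of the user payload such a request carries is below; the email address and entitlement are example values, and real calls also need an Azure AD bearer token:

```python
def scim_user_payload(user_name: str) -> dict:
    # Minimal SCIM 2.0 user object for the Databricks SCIM API.
    # The entitlement shown is one example of an access level you can grant.
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": user_name,
        "entitlements": [{"value": "allow-cluster-create"}],
    }
```

Deprovisioning is the mirror image: a DELETE against the same endpoint with the user's SCIM ID.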
Azure Data Lake Storage – The Azure Databricks native connector to ADLS supports multiple methods of access to your data lake. Simplify data access security by using the same Azure AD identity you use to log in to Azure Databricks, via Azure Active Directory credential passthrough. Data access is then controlled by the ADLS roles and access control lists (ACLs) you have already set up.
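Credential passthrough is enabled through cluster Spark configuration rather than code. A sketch of the relevant settings is below, under the assumption (from the Databricks docs) that High Concurrency clusters with passthrough also restrict the allowed notebook languages; you would paste these key/value pairs into the cluster's Spark config or pass them via the Clusters API:

```python
def passthrough_spark_conf(high_concurrency: bool = False) -> dict:
    # Spark config entries that turn on Azure AD credential passthrough for a cluster.
    conf = {"spark.databricks.passthrough.enabled": "true"}
    if high_concurrency:
        # High Concurrency clusters with passthrough limit the REPL languages.
        conf["spark.databricks.repl.allowedLanguages"] = "python,sql"
    return conf
```

With passthrough on, reads against `abfss://...@<account>.dfs.core.windows.net/...` paths are authorized as the signed-in user, so the ADLS ACLs apply directly.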
Azure Data Factory – Seamlessly run Azure Databricks jobs using Azure Data Factory and leverage 90+ built-in data source connectors to ingest all your data sources into a single data lake. ADF provides built-in workflow control, data transformation, pipeline scheduling, data integration, and many more capabilities to help you create reliable data pipelines.
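In an ADF pipeline, a Databricks notebook runs as an activity of type `DatabricksNotebook`. The sketch below builds that activity's JSON as a Python dict; the linked-service name is hypothetical and must match an Azure Databricks linked service you have defined in the factory:

```python
def databricks_notebook_activity(name: str, notebook_path: str, parameters: dict) -> dict:
    # Sketch of an ADF pipeline activity that runs a Databricks notebook.
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": "AzureDatabricksLinkedService",  # hypothetical name
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            "baseParameters": parameters,  # surfaced in the notebook via widgets
        },
    }
```

ADF then handles scheduling, retries, and chaining this activity with its ingestion connectors.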
Azure Synapse Analytics – Azure Databricks integrates with Azure Synapse Analytics to bring analytics, business intelligence (BI), and data science together. The high-performance connector between Azure Databricks and Azure Synapse enables fast data transfer between the services, including support for streaming data.
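The connector is exposed to Spark as the `com.databricks.spark.sqldw` data source and stages data through an ADLS temp directory. This sketch collects the typical read options; the JDBC URL, temp path, and table name are placeholders, and the actual `spark.read` call only works on a cluster (shown as a comment):

```python
def synapse_read_options(jdbc_url: str, temp_dir: str, table: str) -> dict:
    # Options for the Azure Synapse connector; tempDir is an ADLS staging path.
    return {
        "url": jdbc_url,
        "tempDir": temp_dir,
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": table,
    }

# On a cluster:
# df = (spark.read.format("com.databricks.spark.sqldw")
#         .options(**synapse_read_options(
#             jdbc_url="jdbc:sqlserver://myserver.database.windows.net:1433;database=mydw",
#             temp_dir="abfss://tmp@mystorageacct.dfs.core.windows.net/staging",
#             table="dbo.Sales"))
#         .load())
```

Writes use the same format and options with `spark.write`, which is how the fast bulk transfer between the two services works.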
Power BI – One of the key features customers look for when adopting a Lakehouse strategy is the ability to efficiently and securely consume data directly from the data lake with BI tools. This typically reduces the additional latency, compute, and storage costs associated with the traditional flow of copying data already stored in a data lake to a data warehouse for BI consumption. The Azure Databricks connector in Power BI makes for a more secure, more interactive data visualization experience for data stored in your data lake.
Azure DevOps – Azure Databricks connects with Azure DevOps to help enable Continuous Integration and Continuous Deployment (CI/CD). Configure Azure DevOps as your Git provider and take advantage of the integrated version control features.
Azure Virtual Network – The default deployment of Azure Databricks is a fully managed service on Azure that includes a virtual network (VNet). Azure Databricks also supports deployment in your own virtual network (sometimes called VNet injection), which enables full control of network security rules.
Azure Event Hubs – Get insights from live streaming data by connecting Azure Event Hubs to Azure Databricks, then process messages as they arrive. With Event Hubs and Azure Databricks, stream millions of events per second from any IoT device, or logs from website clickstreams, and process them in near-real time.
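One way to consume Event Hubs from Databricks Structured Streaming is through the hub's Kafka-compatible endpoint. The helper below assembles the source options under the assumption (per Databricks guidance) that the shaded Kafka `PlainLoginModule` class is used for SASL auth on Databricks clusters; namespace, hub, and connection-string values are placeholders:

```python
def eventhubs_kafka_options(namespace: str, event_hub: str, connection_string: str) -> dict:
    # Options for reading an Event Hub via its Kafka endpoint (port 9093, SASL_SSL).
    # Authentication uses the literal username "$ConnectionString" with the
    # Event Hubs connection string as the password.
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": event_hub,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": (
            "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
            f'required username="$ConnectionString" password="{connection_string}";'
        ),
    }

# On a cluster:
# stream = spark.readStream.format("kafka").options(**eventhubs_kafka_options(
#     "myns", "clickstream", "<event hubs connection string>")).load()
```

There is also a dedicated `azure-event-hubs-spark` connector; the Kafka route shown here avoids an extra library at the cost of Kafka-style option names.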
Azure Key Vault – Manage your secrets, such as keys and passwords, with the integration to Azure Key Vault. By default, all Azure Databricks notebooks and results are encrypted at rest. If you want to own and manage the encryption key yourself, you can bring your own key (BYOK).
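In notebooks, Key Vault secrets are read through a Key Vault-backed secret scope via `dbutils.secrets.get`. The sketch below takes `dbutils` as a parameter so it stays testable outside a workspace; the scope and key names are hypothetical:

```python
def get_jdbc_password(dbutils, scope: str = "kv-backed-scope", key: str = "jdbc-password") -> str:
    # Read a secret from a Key Vault-backed Databricks secret scope.
    # `dbutils` is the utility object available in Databricks notebooks;
    # the scope/key names here are hypothetical examples.
    return dbutils.secrets.get(scope=scope, key=key)
```

Secret values fetched this way are redacted in notebook output, so credentials never appear in results or logs.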
Azure confidential computing – Customers can run their Azure Databricks workloads on Azure confidential virtual machines (VMs). With support for Azure confidential computing, customers can build an end-to-end data platform on the Databricks Lakehouse with increased confidentiality and privacy by encrypting data in use. This builds on support for customer-managed keys (CMK) for encrypting data at rest.