Databricks Unity Catalog Volume
You can configure the Databricks Unity Catalog Volume provider in the Litmus Edge web UI to send files directly to a Databricks Unity Catalog volume (supported on AWS, Azure, and GCP).
Make sure you have the following:
- Access to your Databricks workspace URL and an access token.
Follow the steps to Connect a Device. The device stores the tags that are later used to create outbound topics in the connector. Make sure to select the Enable Data Store checkbox.
After connecting the device in Litmus Edge, you can Add Tags to the device. Create the tags you want to use for the connector's outbound topics.
To add the cloud storage sync job:
- Navigate to System > External Storage.
- Click Add sync job. The Add Cloud Sync Job dialog box appears.
- From the Add Cloud Sync Job dialog box, enter the following details:
- Name: Enter a name for the cloud sync job.
- Provider: Select the Databricks Unity Catalog Volume provider from the drop-down list.
Note: It is recommended to review the Run your first ETL workload on Databricks | Databricks on AWS guide before starting this section.
To configure the Databricks Unity Catalog Volume:
- From the Add Cloud Sync Job dialog box, enter the following details:
- Name: Enter a friendly, user-defined name.
- Workspace URL: Enter the URL of your Databricks workspace. See Get identifiers for workspace objects | Databricks on AWS for more details.
- Access Token: Copy and paste the access token from your Databricks account. See Databricks SQL Driver for Go | Databricks on AWS for more details.
- Source: Enter the source path from where the files will be copied.
- Destination: Enter the path of the remote destination. For this scenario, it is the path of your Unity Catalog volume on Databricks, which must be created before you set up Litmus Edge (a minimal notebook sketch for creating the volume follows this procedure). See Create and work with volumes | Databricks on AWS for more details.
- Transfer mode: Select Copy to ensure files are copied from the source to the destination.
- Click Save.
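The Unity Catalog volume used as the Destination can be created from a Databricks notebook before you configure the sync job. The following is a minimal sketch; the catalog, schema, and volume names are placeholders, not values from this guide.

```python
# Minimal sketch: create the Unity Catalog volume used as the sync destination.
# Run in a Databricks notebook. The catalog, schema, and volume names below are
# placeholders; replace them with your own.
catalog = "litmus_catalog"   # hypothetical catalog name
schema = "litmus_schema"     # hypothetical schema name
volume = "litmus_volume"     # hypothetical volume name

spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")
spark.sql(f"CREATE VOLUME IF NOT EXISTS {catalog}.{schema}.{volume}")

# The resulting Destination path for the Litmus Edge sync job:
print(f"/Volumes/{catalog}/{schema}/{volume}")
```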
Click the toggle button to enable the storage sync job and start transferring files from the source to the destination.
Once a successful connection is established, the status changes from Transferring to Connected.
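If the status does not reach Connected, you can check the Workspace URL, Access Token, and Destination path independently with the Databricks SDK for Python. This is a minimal sketch run outside Litmus Edge, assuming the databricks-sdk package; the host, token, and volume path shown are placeholders.

```python
# Minimal sketch: confirm that the workspace URL, access token, and destination
# path used in the sync job are valid, using the Databricks SDK for Python
# (pip install databricks-sdk). All values below are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://<your-workspace>.cloud.databricks.com",  # Workspace URL
    token="<your-access-token>",                           # Access Token
)

# List the destination volume directory; an error here usually means the
# token, workspace URL, or volume path is wrong.
for entry in w.files.list_directory_contents(
    "/Volumes/litmus_catalog/litmus_schema/litmus_volume"
):
    print(entry.path)
```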
To verify the files in the Databricks Unity Catalog Volume:
- Go to the Unity Catalog volume in your Databricks workspace.
- Refresh the page to see the newly uploaded files.
- Confirm that the test.csv file has been uploaded successfully. A notebook-based check is sketched below.
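You can also confirm the upload from a Databricks notebook instead of the UI. A minimal sketch, assuming the placeholder volume path used earlier:

```python
# Minimal sketch: list the volume contents from a Databricks notebook to confirm
# that the synced file (for example, test.csv) has arrived. The volume path is a
# placeholder; use your own Destination path.
volume_path = "/Volumes/litmus_catalog/litmus_schema/litmus_volume"

for f in dbutils.fs.ls(volume_path):
    print(f.name, f.size)
```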
The notebook will use the data transferred to the Databricks Unity Catalog volume. Follow these steps to create and run a Delta Live Tables (DLT) pipeline (a minimal notebook sketch follows the list):
1. Import the necessary libraries and dependencies.
2. Define the file path, table name, and DLT configuration variables. See Create a Delta Live Tables materialized view for more details.
3. After defining the variables, create and publish a pipeline. See Create and publish a pipeline guide for detailed steps.
4. Schedule the pipeline to run at desired intervals. See Schedule the pipeline guide for detailed steps.
5. Monitor the pipeline to ensure data is being processed as expected. You can query the table created in the Unity Catalog Volume to confirm that the data is correctly ingested.
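The following is a minimal sketch of such a DLT notebook, not the exact notebook referenced in this guide. It assumes the CSV files land in the placeholder volume path used above, and the table name is a placeholder as well.

```python
# Minimal DLT notebook sketch: ingest the CSV files that Litmus Edge synced into
# the Unity Catalog volume. Paths and table names are placeholders.
import dlt
from pyspark.sql.functions import current_timestamp

# File path and table name variables (placeholders).
SOURCE_PATH = "/Volumes/litmus_catalog/litmus_schema/litmus_volume/"
TABLE_NAME = "litmus_edge_raw"

# Incrementally ingest new CSV files with Auto Loader into a streaming table.
@dlt.table(name=TABLE_NAME, comment="Raw CSV data synced from Litmus Edge")
def litmus_edge_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .load(SOURCE_PATH)
        .withColumn("ingested_at", current_timestamp())
    )
```

Attach a notebook like this to a DLT pipeline whose target catalog and schema point at your Unity Catalog destination, then schedule and monitor it as described in the steps above.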