Databricks Unity Catalog Volume

You can configure a Databricks Unity Catalog Volume sync job in the Litmus Edge web UI to send files directly to a Databricks Unity Catalog volume. Databricks on AWS, Azure, and GCP is supported.

Before You Begin

Make sure you have the following:

Step 1: Add Device

Follow the steps to Connect a Device. The device stores the tags that are later used to create outbound topics in the connector. Make sure to select the Enable Data Store checkbox.

Step 2: Add Tags

After connecting the device in Litmus Edge, you can Add Tags to the device. Create the tags that you want to use for the connector's outbound topics.

Step 3: Add Cloud Sync Job

To add the cloud storage sync job:

  1. Navigate to System > External Storage.
  2. Click Add sync job. The Add Cloud Sync Job dialog box appears.

    External Storage page in Litmus Edge
    
  3. From the Add Cloud Sync Job dialog box, enter the following details:
    • Name: Enter a name for the cloud sync job.
    • Provider: Select the Databricks Unity Catalog Volume provider from the drop-down list.

      Add Cloud Sync Job dialog box
      

Step 4: Configure Databricks Unity Catalog Volume

Note: Review the Run your first ETL workload on Databricks | Databricks on AWS guide before starting this section.

To configure the Databricks Unity Catalog Volume:

  1. From the Add Cloud Sync Job dialog box, enter the following details:
    • Name: Enter a friendly, user-defined name.
    • Workspace URL: Enter the URL of your Databricks workspace. See Get identifiers for workspace objects | Databricks on AWS for more details.
    • Access Token: Copy and paste the access token from your Databricks account. See Databricks SQL Driver for Go | Databricks on AWS for more details.
    • Source: Enter the source path from where the files will be copied.
    • Destination: Enter the path of the remote destination. For this scenario, this is the path of your Unity Catalog volume on Databricks, which must be created before you configure Litmus Edge. See Create and work with volumes | Databricks on AWS for more details.
    • Transfer mode: Select Copy to ensure files are copied from the source to the destination.

      Databricks workspace environment
      
  2. Click Save.

    Add Cloud Sync Job dialog box
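
Before saving, it can help to sanity-check the Workspace URL and Destination values. The following is a minimal sketch, assuming the Destination uses the standard Unity Catalog volume path form `/Volumes/<catalog>/<schema>/<volume>/<subpath>`. The function names and example values are hypothetical and are not part of Litmus Edge.

```python
import re
from urllib.parse import urlparse

# Unity Catalog volume paths look like /Volumes/<catalog>/<schema>/<volume>[/<subpath>].
VOLUME_PATH_RE = re.compile(r"^/Volumes/([^/]+)/([^/]+)/([^/]+)(?:/(.*))?$")


def parse_volume_path(path):
    """Split a volume path into its parts, or raise ValueError if it is malformed."""
    match = VOLUME_PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a Unity Catalog volume path: {path!r}")
    catalog, schema, volume, subpath = match.groups()
    return {
        "catalog": catalog,
        "schema": schema,
        "volume": volume,
        "subpath": subpath or "",
    }


def is_valid_workspace_url(url):
    """Return True when the URL has an https scheme and a host name."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)
```

For example, `parse_volume_path("/Volumes/main/default/litmus_edge/incoming")` returns the catalog `main`, schema `default`, volume `litmus_edge`, and subpath `incoming`. These names are placeholders for your own objects.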
    

Note: To generate CSV, JSON, or Parquet files for syncing with the Databricks Unity Catalog, you can use the File Reading Processor in Litmus Edge.

Step 5: Enable the Cloud Storage Sync

Click the toggle button to enable the storage sync job and start transferring files from the source to the destination.

Once a connection is successfully established, the status changes from transferring to connected.

Cloud Storage Sync pane


Step 6: Confirm Transfer Completion

To verify the files in the Databricks Unity Catalog Volume:

  1. Go to your Unity Catalog volume in the Databricks workspace.
  2. Refresh the page to see the newly uploaded files.
  3. Confirm that the synced file (test.csv in this example) has been uploaded successfully.

    Databricks Unity Catalog Volume
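
The same check can be scripted instead of done in the UI. The sketch below assumes the Databricks Files API directory listing endpoint (`GET /api/2.0/fs/directories/<path>`), which returns a `contents` array of entries with `path` and `is_directory` fields. Confirm the endpoint and response shape against the Databricks REST API reference for your workspace. The helper names are hypothetical, and only the first page of results is read.

```python
import json
import urllib.request
from urllib.parse import quote


def directory_listing_url(workspace_url, volume_dir):
    """Build the listing URL for a volume directory such as /Volumes/main/default/vol."""
    return workspace_url.rstrip("/") + "/api/2.0/fs/directories" + quote(volume_dir)


def uploaded_paths(payload):
    """Return the file paths from a listing response, skipping subdirectories."""
    return [
        entry["path"]
        for entry in payload.get("contents", [])
        if not entry.get("is_directory", False)
    ]


def list_volume_files(workspace_url, token, volume_dir):
    """Call the listing endpoint with a bearer token and return the file paths."""
    request = urllib.request.Request(
        directory_listing_url(workspace_url, volume_dir),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return uploaded_paths(json.load(response))
```

Calling `list_volume_files` with your workspace URL, access token, and volume directory should include the path ending in `test.csv` once the sync has completed.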
    

Example Notebook

This example notebook uses the data transferred to the Databricks Unity Catalog Volume. Follow these steps to create and run a Delta Live Tables (DLT) pipeline:

1. Import the necessary libraries and dependencies.

Python


2. Define the file path, table name, and DLT configuration variables. See Create a Delta Live Tables materialized view for more details.

Python
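
A minimal sketch of what this step might look like, assuming CSV files with a header row and a materialized view defined with the `@dlt.table` decorator over a batch read. The path, table name, and function name are hypothetical placeholders. The `dlt` module is only importable inside a pipeline run, so it is imported lazily here.

```python
# Hypothetical values: replace with your own catalog, schema, and volume.
SOURCE_PATH = "/Volumes/main/default/litmus_edge/"
TABLE_NAME = "litmus_edge_raw"

# Spark CSV reader options: treat the first row as a header and infer column types.
READER_OPTIONS = {
    "header": "true",
    "inferSchema": "true",
}


def define_raw_table(spark):
    """Register a DLT dataset that reads the synced CSV files from SOURCE_PATH.

    Call this from a notebook attached to a DLT pipeline, passing the
    notebook's `spark` session.
    """
    import dlt  # available only when the notebook runs as part of a pipeline

    @dlt.table(name=TABLE_NAME, comment="Files synced from Litmus Edge")
    def raw_files():
        # A batch read makes this dataset a materialized view.
        return spark.read.format("csv").options(**READER_OPTIONS).load(SOURCE_PATH)
```

In the pipeline notebook, calling `define_raw_table(spark)` once registers the dataset under `TABLE_NAME`.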


3. After defining the variables, create and publish a pipeline. See Create and publish a pipeline guide for detailed steps.

4. Schedule the pipeline to run at desired intervals. See Schedule the pipeline guide for detailed steps.

5. Monitor the pipeline to ensure data is being processed as expected. You can query the table created from the files in the Unity Catalog volume to confirm that the data is correctly ingested.
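
The confirmation query can be as simple as a row count. The sketch below builds that query for a three-level Unity Catalog table name. The helper and the example names are hypothetical.

```python
def count_rows_sql(catalog, schema, table):
    """Build a row-count query for a three-level Unity Catalog table name."""
    # Quote each identifier with backticks, doubling any embedded backtick.
    qualified = ".".join(
        "`" + part.replace("`", "``") + "`" for part in (catalog, schema, table)
    )
    return f"SELECT COUNT(*) AS row_count FROM {qualified}"
```

In a Databricks notebook, `spark.sql(count_rows_sql("main", "default", "litmus_edge_raw")).show()` would display the count, assuming those placeholder names match your pipeline's target.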

Dataframe table