Developer Blog

Tips and tricks for developers and IT enthusiasts

Azure Databricks | Working with Unit Tests

Introduction

Problem

Like any other program, Azure Databricks notebooks should be tested automatically to ensure code quality.

Using standard Python test tools is not straightforward, because these tools expect Python files in a file system, and a notebook does not correspond to a Python file.

Solution

To enable automated testing with unittest (documentation), we proceed as follows:

  • Create a test class that contains all the desired tests
  • Run all defined tests

Create Notebook with the Code

We will create a simple Notebook for our test.

This notebook implements a simple calculator, so that we can test the basic calculator operations like add and multiply.

Create a new Notebook with the name Calculator:

class Calculator:

	def __init__(self, x=10, y=8):
		# Default operands used when a method is called without arguments
		self.x = x
		self.y = y

	def add(self, x=None, y=None):
		if x is None: x = self.x
		if y is None: y = self.y
		return x + y

	def subtract(self, x=None, y=None):
		if x is None: x = self.x
		if y is None: y = self.y
		return x - y

	def multiply(self, x=None, y=None):
		if x is None: x = self.x
		if y is None: y = self.y
		return x * y

	def divide(self, x=None, y=None):
		if x is None: x = self.x
		if y is None: y = self.y
		if y == 0:
			raise ValueError('cannot divide by zero')
		return x / y


To use this class, write the following lines:

c = Calculator()
print(c.add(20, 10), c.subtract(20, 10), c.multiply(20, 10), c.divide(20, 10))
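
This should print: 30 10 200 2.0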

Create Notebook with the Tests

Create a new Notebook in the same folder with the name Calculator.Tests.

The name itself is not important, but it is convenient to name the test notebook after the notebook under test, adding the suffix ‘.Tests’.

Create the first command to import the Calculator Notebook
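
In Databricks, one notebook is included in another with the %run magic command. Since the Calculator notebook is in the same folder, the first cell can simply be:

%run ./Calculator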

Create the Test Class

import unittest

class CalculatorTests(unittest.TestCase):
  
  @classmethod
  def setUpClass(cls):
    cls.app = Calculator()

  def setUp(self):
    # print("this is setup for every method")
    pass

  def test_add(self):
    self.assertEqual(self.app.add(10, 5), 15)

  def test_subtract(self):
    self.assertEqual(self.app.subtract(10,5), 5)
    self.assertNotEqual(self.app.subtract(10,2), 4)

  def test_multiply(self):
    self.assertEqual(self.app.multiply(10,5), 50)

  def tearDown(self):
    # print("teardown for every method")
    pass

  @classmethod
  def tearDownClass(cls):
    # print("this is teardown class")
    pass

Finally, run all defined tests:

suite = unittest.TestLoader().loadTestsFromTestCase(CalculatorTests)
unittest.TextTestRunner(verbosity=2).run(suite)
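
The error branch of divide is not covered by the class above. A test for it could look like this (a sketch using assertRaises, to be added to CalculatorTests):

  def test_divide_by_zero(self):
    with self.assertRaises(ValueError):
      self.app.divide(10, 0)

With verbosity=2, the text runner prints one line per test method together with its result.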

Azure Databricks | Cookbook

Databricks CLI

Export all Notebooks

The following PowerShell one-liner iterates over all top-level workspace folders and exports each of them:

databricks workspace list | ForEach { databricks workspace export_dir /$_ $_ }

Troubleshooting

Problem

Error in SQL statement: AnalysisException: Can not create the managed table('`demo`'). The associated location('dbfs:/user/hive/warehouse/demo') already exists.;

Solution

dbutils.fs.rm("dbfs:/user/hive/warehouse/demo/", True)  # recurse=True removes the folder and its contents

Azure Databricks | Working with Widgets

TL;DR

If you don’t want to read the whole post, explore this Azure Notebook instead.

Requirements

Define the needed modules and functions:

from datetime import datetime

import pyspark.sql.functions as F

Create a DataFrame for this post:

df = spark.sql("select * from diamonds")
df.show()
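
This assumes the diamonds table already exists in the metastore. If it does not, it can be created from the Databricks sample datasets (a sketch; the path follows the standard databricks-datasets layout):

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv"))
df.write.saveAsTable("diamonds")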

Working with Widgets

Default Widgets

dbutils.widgets.removeAll()

dbutils.widgets.text("W1", "1", "Text")
dbutils.widgets.combobox("W2", "3", [str(x) for x in range(1, 10)], "Combobox")
dbutils.widgets.dropdown("W3", "4", [str(x) for x in range(1, 10)], "Dropdown")

Multiselect Widgets

list = [ f"Square of {x} is {x*x}" for x in range(1, 10)]
dbutils.widgets.multiselect("W4", list[0], list, "Multi-Select")
Monitor the changes when selection values
print("Selection: ", dbutils.widgets.get("W4"))
print("Current Time =", datetime.now().strftime(



Filter Query by widgets

Prepare widgets

dbutils.widgets.removeAll()

df = spark.sql("select * from diamonds")

vals = [str(x[0]) for x in df.select("cut").distinct().orderBy("cut").collect()]
dbutils.widgets.dropdown("Cuts", vals[0], vals)

vals = [str(x[0]) for x in df.select("carat").distinct().orderBy("carat").collect()]
dbutils.widgets.dropdown("Carat", vals[0], vals)

Now, change some values

filter_cut = dbutils.widgets.get("Cuts")
spark.sql(f"select * from diamonds where cut='{filter_cut}'").show()
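
The Carat widget defined above can be combined with the Cuts widget in the same way (a sketch extending the original query):

filter_cut = dbutils.widgets.get("Cuts")
filter_carat = dbutils.widgets.get("Carat")
spark.sql(f"select * from diamonds where cut='{filter_cut}' and carat={filter_carat}").show()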

Azure Databricks | Cookbook

Reading Data

Create Table from CSV file with SQL

DROP TABLE IF EXISTS quickstart;

CREATE TABLE quickstart
USING csv
OPTIONS (path "/databricks-datasets/data.csv", header "true")

Create Table from CSV file with PySpark
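
The PySpark equivalent of the SQL statement above might look like this (a minimal sketch, assuming the same sample path and header option):

df = (spark.read
      .option("header", "true")
      .csv("/databricks-datasets/data.csv"))
df.write.mode("overwrite").saveAsTable("quickstart")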




Analyse Data

Group and Display
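
A minimal sketch, assuming the diamonds table from the widgets section: group by one column, count, and render the result with the notebook's display function:

df = spark.sql("select * from diamonds")
display(df.groupBy("cut").count().orderBy("cut"))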