# Category Databricks

## Introduction

### Problem

Like any other program, Azure Databricks notebooks should be tested automatically to ensure code quality.

Using standard Python Test Tools is not easy because these tools are based on Python files in a file system. And a notebook doesn’t correspond to a Python file.

### Solution

To enable automated testing with unittest (documentation), we proceed as follows:

• Create a test class that contains all the tests you want
• Execution of all defined tests

## Create Notebook with the Code

We will create a simple Notebook for our test.

This notebook will implement a simple calculator, so that we can test the basic calculator operations like add ad multiply.

Create a new Notebook with the name Calculator:

```class Calculator:

def __init__(self, x = 10, y = 8):
self.x = x
self.y = y

def add(self, x = None, y = None):
if x == None: x = self.x
if y == None: y = self.y

return x+y

def subtract(self, x = None, y = None):
if x == None: x = self.x
if y == None: y = self.y

return x-y

def multiply(self, x = None, y = None):
if x == None: x = self.x
if y == None: y = self.y

return x*y

def divide(self, x = None, y = None):
if x == None: x = self.x
if y == None: y = self.y

if y == 0:
raise ValueError('cannot divide by zero')
else:
return x/y```

The notebook should look like this

To use this class, write the following lines:

```c = Calculator()
print(c.add(20, 10), c.subtract(20, 10), c.multiply(20, 10), c.divide(20, 10))```

## Create Notebook with the Tests

Create a new Notebook in the same folder with the name Calculator.Tests.

The name is not important, but it is convenient to name the test program like the program to be tested with the suffix ‘Tests’.

Create the first command to import the Calculator Notebook

`%run "./Calculator"`

Create the Test Class

```import unittest

class CalculatorTests(unittest.TestCase):

@classmethod
def setUpClass(cls):
cls.app = Calculator()

def setUp(self):
# print("this is setup for every method")
pass

def test_subtract(self):
self.assertEqual(self.app.subtract(10,5), 5)
self.assertNotEqual(self.app.subtract(10,2), 4)

def test_multiply(self):
self.assertEqual(self.app.multiply(10,5), 50)

def tearDown(self):
# print("teardown for every method")
pass

@classmethod
def tearDownClass(cls):
# print("this is teardown class")
pass```

Create the code to run the tests

```suite =  unittest.TestLoader().loadTestsFromTestCase(CalculatorTests)
unittest.TextTestRunner(verbosity=2).run(suite)```

## Databricks CLI

#### Export all Notebooks

`databricks workspace list | ForEach { databricks workspace export_dir /\$_ \$_ }`

#### TroubleshootingProblem

`Error in SQL statement: AnalysisException: Can not create the managed table('`demo`'). The associated location('dbfs:/user/hive/warehouse/demo') already exists.;`

#### Solution

`dbutils.fs.rm("dbfs:/user/hive/warehouse/demo/", true)`

## TL;DR

Don’t want to read the post, then explore this Azure Notebook

## Requirements

### Define needed moduls and functions

```from datetime import datetime

import pyspark.sql.functions as F```

### Create DataFrame for this post:

```df = spark.sql("select * from diamonds")
df.show()```

## Working with Widgets

### Default Widgets

```dbutils.widgets.removeAll()

dbutils.widgets.text("W1", "1", "Text")
dbutils.widgets.combobox("W2", "3", [str(x) for x in range(1, 10)], "Combobox")
dbutils.widgets.dropdown("W3", "4", [str(x) for x in range(1, 10)], "Dropdown")```

### Multiselect Widgets

```list = [ f"Square of {x} is {x*x}" for x in range(1, 10)]
dbutils.widgets.multiselect("W4", list, list, "Multi-Select")```

### Monitor the changes when selection values

```print("Selection: ", dbutils.widgets.get("W4"))
print("Current Time =", datetime.now().strftime("%H:%M:%S"))```

## Filter Query by widgets

### Prepare widgets

```dbutils.widgets.removeAll()

df = spark.sql("select * from diamonds")

vals = [ str(x) for x in df.select("cut").orderBy("cut").distinct().collect() ]
dbutils.widgets.dropdown("Cuts", vals, vals)

vals = [ str(x) for x in df.select("carat").orderBy("carat").distinct().collect() ]
dbutils.widgets.dropdown("Carat", vals, vals)```

### Now, change some values

```filter_cut = dbutils.widgets.get("Cuts")
df=spark.sql(f"select * from diamonds where cut='{filter_cut}'").show()```

### Create Table from CSV file with SQL

```DROP TABLE IF EXISTS quickstart;

CREATE TABLE quickstart
USING csv

### Create Table from CSV file with PySpark

```%python
```%python