16 April 2020
Azure Databricks | Cookbook
Reading Data
Create Table from CSV file with SQL
<pre class="EnlighterJSRAW" data-enlighter-group="" data-enlighter-highlight="" data-enlighter-language="sql" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-theme="" data-enlighter-title="">DROP TABLE IF EXISTS quickstart;
CREATE TABLE quickstart
USING csv
OPTIONS (path "/databricks-datasets/data.csv", header "true");</pre>
Create DataFrame from CSV file with PySpark
<pre class="EnlighterJSRAW" data-enlighter-group="" data-enlighter-highlight="" data-enlighter-language="generic" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-theme="" data-enlighter-title="">%python
# Read the CSV into a DataFrame: the first row is the header, column types are inferred
quickstart = spark.read.csv("/databricks-datasets/data.csv", header=True, inferSchema=True)</pre>
Analyse Data
Group and Display
<pre class="EnlighterJSRAW" data-enlighter-group="" data-enlighter-highlight="" data-enlighter-language="generic" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-theme="" data-enlighter-title="">%python
from pyspark.sql.functions import avg

# Average price per color, sorted by color
avg_price = quickstart.select("color", "price").groupBy("color").agg(avg("price"))
display(avg_price.sort("color"))</pre>
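What the grouped aggregation computes

To make the result of the grouping step easier to check by hand, here is a minimal plain-Python sketch of the same logic: group rows by color, average the price per group, and sort by color. It does not use Spark. The sample rows and the helper name `average_price_by_color` are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the CSV contents (not real data).
rows = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": 4.0},
    {"color": "red", "price": 20.0},
    {"color": "blue", "price": 6.0},
]


def average_price_by_color(records):
    """Return (color, average price) pairs sorted by color."""
    totals = defaultdict(lambda: [0.0, 0])  # color -> [sum of prices, row count]
    for record in records:
        entry = totals[record["color"]]
        entry[0] += record["price"]
        entry[1] += 1
    return sorted((color, total / count) for color, (total, count) in totals.items())


for color, avg_price in average_price_by_color(rows):
    print(color, avg_price)
```

On a small extract of the data, comparing this output with the Spark result is a quick sanity check that the column names and types were read as expected.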