March 2020 – Developer Blog

PySpark | Cookbook

by Ralph Apache Spark, Cookbook, PySpark

Websites

The Blaze Ecosystem (Blaze)
Dask: Flexible library for parallel computing in Python.
DataShape: Data layout language for array programming.
Odo: Shapeshifting for your data
It efficiently migrates data from the source to the target through a network of conversions.

Reading Textfiles

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Demo") \
    .config("spark.demo.config.option", "demo-value") \
    .getOrCreate()

df = spark.read.csv("input.csv",header=True,sep="|");

from pyspark import SparkContext
from pyspark.sql import SQLContext
import pandas as pd

sc = SparkContext('local','example') 
sql_sc = SQLContext(sc)

p_df= pd.read_csv('file.csv')  # assuming the file contains a header
# p_df = pd.read_csv('file.csv', names = ['column 1','column 2']) # if no header
s_df = sql_sc.createDataFrame(pandas_df)

sc.textFile("file.csv") \
    .map(lambda line: line.split(",")) \
    .filter(lambda line: len(line)>1) \
    .map(lambda line: (line[0],line[1])) \
    .collect()

sc.textFile("input.csv") \
    .map(lambda line: line.split(",")) \
    .filter(lambda line: len(line)<=1) \
    .collect()

spark.read.csv(
    "input.csv", header=True, mode="DROPMALFORMED", schema=schema
)

(spark.read
    .schema(schema)
    .option("header", "true")
    .option("mode", "DROPMALFORMED")
    .csv("input.csv"))

Read CSV file with known structure

from pyspark.sql.types import StructType, StructField
from pyspark.sql.types import DoubleType, IntegerType, StringType

schema = StructType([
    StructField("A", IntegerType()),
    StructField("B", DoubleType()),
    StructField("C", StringType())
])

(sqlContext
    .read
    .format("com.databricks.spark.csv")
    .schema(schema)
    .option("header", "true")
    .option("mode", "DROPMALFORMED")
    .load("input.csv"))

2Mar

Jenkins | Build and Deploy a Groovy App

by Ralph CI/CD, Groovy, Jenkins

Introduction

Using Jenkins as an automation server for your development, you can automate such repeating tasks as testing and deploying your app.

Starting with a sample Groovy App (a simple calculator) with tests, you will learn how to integrate your app in Jenkins and build a pipeline, so that Jenkins runs the desired tasks every time, you change the code.

Prepare the sources

Clone the sample repository from Github.

You should clone the demo repository into you demo account, because you may change some file during this post., and you will not get write permissions for the demo repository.

Also, clone the repository to your local machine to see what our demo app looks like.

$ git clone https://github.com/jenkins-toolbox/SampleApp_GroovyCalculator Cloning into 'SampleApp_GroovyCalculator'... remote: Enumerating objects: 194, done. remote: Counting objects: 100 remote: Compressing objects: 100 remote: Total 194 (delta 44), reused 137 (delta 23), pack-reused 0 Receiving objects: 100 Resolving deltas: 100

Go into the new create folder

$ cd SampleApp_GroovyCalculator/
$ ls
Jenkinsfile      README.md        bin              build.gradle     gradlew          src
Makefile         SampleCalculator build            gradle           settings.gradle

The first task, Jenkins will do in our pipeline: build your app

$ ./gradlew build

Because it’s the first time you start gradlew, the required software will be downloaded:

First: the current Gradle Version (Gradle is the Build Tool used by Groovy Projects)

Downloading https://services.gradle.org/distributions/gradle-6.2.1-bin.zip
………10

Welcome to Gradle 6.2.1!

Here are the highlights of this release:
 - Dependency checksum and signature verification
 - Shareable read-only dependency cache
 - Documentation links in deprecation messages

For more details see https://docs.gradle.org/6.2.1/release-notes.html

Starting a Gradle Daemon, 2 stopped Daemons could not be reused, use --status for details

After this, your app will be tested

> Task :test

Calculator02Spec > two plus two should equal four PASSED

Calculator01Spec > add: 2 + 3 PASSED

Calculator01Spec > subtract: 4 - 3 PASSED

Calculator01Spec > multiply: 2 * 3 PASSED

BUILD SUCCESSFUL in 34s
5 actionable tasks: 5 executed

Perform the build again

No download is required. The build is much quicker.

$ ./gradlew build

BUILD SUCCESSFUL in 1s
5 actionable tasks: 5 up-to-date

Now, test our app:

./gradlew clean test

> Task :test

Calculator02Spec > two plus two should equal four PASSED

Calculator01Spec > add: 2 + 3 PASSED

Calculator01Spec > subtract: 4 - 3 PASSED

Calculator01Spec > multiply: 2 * 3 PASSED

BUILD SUCCESSFUL in 4s
5 actionable tasks: 5 executed

Create a Jenkins Pipeline

Start by clicking on the BlueOcean menu item.

Hint: Blue Ocean is not installed with the default Jenkins installation.

You have to install the corresponding Plugins.

Select Manage Jenkins → Manage Plugins.

Then, select the tab Available and enter in the Filter box: Blue Ocean.

Install all plugins, that will be listed.

Next: Click on the New Pipeline to create your first Pipeline

Use the Item GitHub to specify, where our code is stored

Next, use your GitHub account.

Be sure, that you cloned the demo repository

Next, we select the demo repository SampleApp_GroovyCalculator

Click on Create Pipeline and after a few seconds, the pipeline is created.

Immediately after creating the pipeline, Jenkins is starting the pipeline and all steps included.

If everything went well, you see a positive status

Now, click on the pipeline (e.g. the text master or the status icon) and you will see the pipeline with all steps and their corresponding state.

If you, want to edit the pipeline, for example to add another step, like on the pencil in the header.

Click on Cancel to leave the Pipeline editor.

Hint: If you click on Save, all changes are pushed back to the repository and Jenkins starts the Pipeline again.

Run the Pipeline

If you want to run your pipeline, click on the rerun icon for your pipeline

2Mar

Jenkins | Cookbook

by Ralph CI/CD, Cookbook, Jenkins

Working with VS Code

Validate Jenkins File

Install VS Code Plugin Jenkins Pipeline Linter Connector

Add configuration in .vscode/settings.json

"jenkins.pipeline.linter.connector.crumbUrl": "<JENKINS_URL>/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,
"jenkins.pipeline.linter.connector.user": "<USERNAME>",
"jenkins.pipeline.linter.connector.pass": "<PASSWORD>",
"jenkins.pipeline.linter.connector.url": "<JENKINS_URL>/pipeline-model-converter/validate",

Replace <USERNAME>, <PASSWORD> and <JENKINS_URL> with your values, for example

"jenkins.pipeline.linter.connector.crumbUrl": "http://localhost:8080/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,%22:%22,//crumb)",
"jenkins.pipeline.linter.connector.user": "admin",
"jenkins.pipeline.linter.connector.pass": "secret",
"jenkins.pipeline.linter.connector.url": "http://localhost:8080/pipeline-model-converter/validate",

Working with Jenkins Client (CLI)

Download Client

wget localhost:8080/jnlpJars/jenkins-cli.jar

Working with Plugins

Create aPlugin

mkdir SamplePlugin
cd SamplePlugin

mvn -U archetype:generate -Dfilter="io.jenkins.archetypes:"

mvn -U archetype:generate -Dfilter="io.jenkins.archetypes:global-configuration-plugin"
[INFO] Scanning for projects...
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-metadata.xml
Downloading from central: https://repo.maven.apache.org/maven2/org/codehaus/mojo/maven-metadata.xml
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-metadata.xml (14 kB at 32 kB/s)
Downloaded from central: https://repo.maven.apache.org/maven2/org/codehaus/mojo/maven-metadata.xml (20 kB at 44 kB/s)
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-archetype-plugin/maven-metadata.xml
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-archetype-plugin/maven-metadata.xml (918 B at 18 kB/s)
[INFO]
[INFO] ------------------< org.apache.maven:standalone-pom >-------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:3.1.2:generate (default-cli) > generate-sources @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:3.1.2:generate (default-cli) < generate-sources @ standalone-pom <<<
[INFO]
[INFO]
[INFO] --- maven-archetype-plugin:3.1.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Interactive mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)

Choose archetype:
1: remote -> io.jenkins.archetypes:global-configuration-plugin (Skeleton of a Jenkins plugin with a POM and an example piece of global configuration.)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : 1
Choose io.jenkins.archetypes:global-configuration-plugin version:
1: 1.2
2: 1.3
3: 1.4
4: 1.5
5: 1.6
Choose a number: 5:
[INFO] Using property: groupId = unused
Define value for property 'artifactId': com.examples.jenkins.plugins
Define value for property 'version' 1.0-SNAPSHOT: :
[INFO] Using property: package = io.jenkins.plugins.sample
Confirm properties configuration:
groupId: unused
artifactId: com.examples.jenkins.plugins
version: 1.0-SNAPSHOT
package: io.jenkins.plugins.sample
 Y: : y

[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: global-configuration-plugin:1.6
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: unused
[INFO] Parameter: artifactId, Value: com.examples.jenkins.plugins
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: package, Value: io.jenkins.plugins.sample
[INFO] Parameter: packageInPathFormat, Value: io/jenkins/plugins/sample
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: package, Value: io.jenkins.plugins.sample
[INFO] Parameter: groupId, Value: unused
[INFO] Parameter: artifactId, Value: com.examples.jenkins.plugins
[INFO] Project created from Archetype in dir: /Users/Shared/CLOUD/Kunde.BSH/workspace/SamplePlugin_Config/com.examples.jenkins.plugins
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  45.525 s
[INFO] Finished at: 2020-03-01T17:28:27+01:00
[INFO] ------------------------------------------------------------------------

Verify Plugin

 cd com.examples.jenkins.plugins
mvn verify

Run Plugin

mvn hpi:run

Working with Groovy Scripts

Include a common groovy script in Jenkins file

1: Create a common.groovy file with function as needed

def mycommoncode() {
}

2: In the main Jenkinfile load the file and use the function as shown below

node{ 
   def common = load “common.groovy”
   common.mycommoncode()
}

Basic example of Loading Groovy scripts

File example.groovy

def example1() {
 println 'Hello from example1' 
}
def example2() {
 println 'Hello from example2'
}

The example.groovy script defines example1 and example2 functions before ending with return this. Note that return this is definitely required and one common mistake is to forget ending the Groovy script with it.Jenkinsfile

def code node('java-agent') { 
    stage('Checkout') { checkout scm } 
    stage('Load') { code = load 'example.groovy' } 
    stage('Execute') { code.example1() }
} 

code.example2()

Processing Github JSON from Groovy

In this demo, we first show how to process JSON response from Github API in Groovy.Processing JSON from Github

String username = System.getenv('GITHUB_USERNAME') 
String password = System.getenv('GITHUB_PASSWORD') 
String GITHUB_API = 'https://api.github.com/repos' String repo = 'groovy' 
String PR_ID = '2' // Pull request ID 
String url = "${GITHUB_API}/${username}/${repo}/pulls/${PR_ID}" 

println "Querying ${url}" 

def text = url.toURL().getText(requestProperties: ['Authorization': "token ${password}"]) 
def json = new JsonSlurper().parseText(text) 

def bodyText = json.body // Check if Pull Request body has certain text if ( bodyText.find('Safari') ) {
     println 'Found Safari user' }

The equivalent bash command for retrieving JSON response from Github API is as follows:Equivalent bash command

// Groovy formatted string 
String cmd = "curl -s -H \"Authorization: token ${password}\" ${url}" 
// Example String 
example = 'curl -s -H "Authorization: token XXX" https://api.github.com/repos/tdongsi/groovy/pulls/2'

Processing Github JSON from Jenkinsfile

Continuing the demo from the last section, we now put the Groovy code into a callable function in a script called “github.groovy”. Then, in our Jenkinsfile, we proceed to load the script and use the function to process JSON response from Github API.github.groovy

import groovy.json.JsonSlurper
def getPrBody(String githubUsername, String githubToken, String repo, String id) {
   String GITHUB_API = 'https://api.github.com/repos' 
   String url = "${GITHUB_API}/${githubUsername}/${repo}/pulls/${id}" 

   println "Querying ${url}"

def text = url.toURL().getText(requestProperties: ['Authorization': "token ${githubToken}"])
def json = new JsonSlurper().parseText(text) 
def bodyText = json.body return bodyText } 
return this

Jenkinsfile

def code node('java-agent') { 
    stage('Checkout') { checkout scm }
    stage('Load') { code = load 'github.groovy' } 
    stage('Execute') {

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Developer Blog