Apache Spark is a lightning-fast cluster computing framework designed for fast computation. It extends the Hadoop MapReduce model to efficiently support more types of computations, including interactive queries and stream processing.
This is an extract from a brief tutorial that explains the basics of Spark Core programming.
Environment / Requirements
Installation on Mac OS X
Check or install java
$ java -version
java version "12.0.1" 2019-04-16
Java(TM) SE Runtime Environment (build 12.0.1+12)
Java HotSpot(TM) 64-Bit Server VM (build 12.0.1+12, mixed mode, sharing)
Check or install Scala
$ brew install scala
$ scala -version
Scala code runner version 2.13.0 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.
Check or install Apache Spark
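On macOS, one common option is to install Spark with Homebrew. This is a sketch assuming the Homebrew formula is named `apache-spark` (verify with `brew search spark`):

```shell
# Install Spark via Homebrew (formula name assumed to be apache-spark)
brew install apache-spark

# Verify the installation by printing the Spark version
spark-shell --version
```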
Set up the environment in .bashrc
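A minimal sketch of the environment variables to add, assuming Spark was unpacked to /usr/local/spark (adjust the path to your actual installation):

```shell
# Point SPARK_HOME at the Spark installation directory (path is an assumption)
export SPARK_HOME=/usr/local/spark
# Add the Spark binaries (spark-shell, spark-submit, etc.) to PATH
export PATH=$PATH:$SPARK_HOME/bin
```

Reload the file with `source ~/.bashrc` so the changes take effect in the current shell.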
Installation on Ubuntu
apt update
apt upgrade
apt-get install openjdk-8-jdk
java -version
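On Ubuntu, Spark itself is typically downloaded as a prebuilt tarball from the Apache archive. A sketch of the steps, where the version and package name (spark-2.4.3 with Hadoop 2.7) are examples only -- check https://spark.apache.org/downloads.html for the current release:

```shell
# Download a prebuilt Spark release (version here is an example)
wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz

# Unpack the tarball and move it to a system-wide location
tar -xzf spark-2.4.3-bin-hadoop2.7.tgz
sudo mv spark-2.4.3-bin-hadoop2.7 /usr/local/spark
```

After this, set SPARK_HOME and PATH in .bashrc as shown in the macOS section.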