How To Install Apache Spark On Ubuntu 20.04

In this article, you’ll learn how to install Apache Spark on Ubuntu 20.04. Apache Spark is a cluster computing system that provides high-level APIs in Java, Scala, and Python. It also ships with higher-level tools such as Spark SQL, MLlib, GraphX, and Spark Streaming. Follow the steps below to install Apache Spark.

Step 1: Update Your System

As usual, update your system before installing any new packages.

sudo apt update && sudo apt upgrade -y

Once the update has finished, reboot your system.

sudo reboot

Step 2: Install Java On Ubuntu 20.04

Apache Spark needs Java to run, so install it by typing:

sudo apt install default-jdk

Verify the installed Java version by typing:

$ java -version
openjdk version "11.0.9.1" 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
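
If you would rather check the Java version from a script than read it by eye, here is a minimal sketch. The java_major helper is a hypothetical name introduced for this example; it assumes the first line of the output contains the version in double quotes, as in the listing above. The Spark 3.0.x documentation lists Java 8 and 11 as supported, so a result of 8 or 11 is what you would hope to see.

```shell
# Print the major version found in the first line of `java -version` output,
# e.g. 'openjdk version "11.0.9.1" 2020-11-04' -> 11
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # legacy scheme: 1.8.0_275 -> 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # modern scheme: 11.0.9.1 -> 11
  esac
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, hence the 2>&1
  echo "Java major version: $(java_major "$(java -version 2>&1 | head -n 1)")"
else
  echo "java not found on PATH"
fi
```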

Step 3: Download & Install Apache Spark On Ubuntu 20.04

Run the commands below in your terminal to download Apache Spark 3.0.1, extract it, and move it to /opt/spark. Alternatively, visit the official download page to download it manually.

wget https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
tar xvzf spark-3.0.1-bin-hadoop2.7.tgz
sudo mv spark-3.0.1-bin-hadoop2.7/ /opt/spark
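
A download link sometimes saves an HTML page instead of the archive itself, and tar then fails with a confusing error. The sketch below is a quick sanity check you could run between the wget and tar steps. The is_tarball helper is a hypothetical name for this example, and the script assumes the archive sits in the current directory.

```shell
# Succeed only if the file can be listed as a gzip-compressed tar archive.
is_tarball() {
  tar -tzf "$1" >/dev/null 2>&1
}

archive=spark-3.0.1-bin-hadoop2.7.tgz
if [ ! -f "$archive" ]; then
  echo "$archive not found in the current directory"
elif is_tarball "$archive"; then
  echo "$archive looks like a valid archive"
else
  echo "$archive is not a gzip tarball; re-download it before extracting"
fi
```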

Now, configure the Spark environment by opening your shell profile in an editor.

nano ~/.bashrc

Add the following environment variables to the end of the file.

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
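
If you prefer not to edit the file by hand, the same two lines can be appended from the shell. This is only a sketch: add_line_once is a hypothetical helper written for this example, and BASHRC_FILE is an assumed override variable that defaults to ~/.bashrc. The helper skips a line that is already present, so running it twice does not create duplicates.

```shell
# Append a line to a file only if that exact line is not already there.
add_line_once() {
  # $1: file to edit, $2: exact line to ensure is present
  touch "$1" || return 1
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

profile="${BASHRC_FILE:-$HOME/.bashrc}"

# Single quotes keep $PATH and $SPARK_HOME literal, so they are expanded
# when the profile is sourced rather than when this script runs.
for line in \
  'export SPARK_HOME=/opt/spark' \
  'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin'
do
  add_line_once "$profile" "$line" || echo "could not update $profile" >&2
done
```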

Finally, source the file so the changes take effect in your current shell:

source ~/.bashrc
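
To confirm that the variables took effect, you could run a small check like the one below. It is a sketch under the assumptions of this guide: Spark lives in /opt/spark, and spark-submit (one of the scripts in Spark's bin directory) should now resolve on your PATH. The check_spark_env name is made up for this example.

```shell
# Report the first problem found with the Spark environment, if any.
check_spark_env() {
  [ -n "${SPARK_HOME:-}" ] || { echo "SPARK_HOME is not set"; return 1; }
  [ -x "$SPARK_HOME/bin/spark-submit" ] || { echo "no spark-submit under $SPARK_HOME/bin"; return 1; }
  command -v spark-submit >/dev/null 2>&1 || { echo "spark-submit is not on PATH"; return 1; }
  echo "Spark environment looks fine: $SPARK_HOME"
}

check_spark_env || echo "Re-open ~/.bashrc and check the two export lines"
```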

Step 4: Starting Spark Master Server

You can start the Apache Spark Master server by typing the following command in your terminal.

start-master.sh

Step 5: Access Apache Spark Via Web Interface

Open your browser and enter your server’s IP address followed by port 8080 to access the Apache Spark web interface. On the machine running the master, that is:

http://127.0.0.1:8080/
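
On a server without a browser, you can probe the same address from the terminal. The following is a rough sketch that assumes curl is installed and that the master uses the default web port 8080. The check_ui function is a hypothetical helper for this example.

```shell
# Succeed if the given URL answers an HTTP request within 5 seconds.
check_ui() {
  command -v curl >/dev/null 2>&1 || { echo "curl is not installed"; return 1; }
  # -f: treat HTTP errors as failures, -sS: quiet but still report errors
  curl -fsS -o /dev/null --max-time 5 "$1" 2>/dev/null
}

ui_url="http://127.0.0.1:8080/"
if check_ui "$ui_url"; then
  echo "Spark web interface is reachable at $ui_url"
else
  echo "Nothing answered at $ui_url; is the master running?"
fi
```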

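To start a new slave (worker) server under this master, run the following command. Replace ubuntu1 with your own master’s hostname; the full spark:// URL is shown at the top of the web interface.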

start-slave.sh spark://ubuntu1:7077

Reload the web page and you’ll see the slave server running.
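
If you are unsure what to pass to start-slave.sh, the sketch below builds a likely master URL. It assumes the master was started with its defaults (the machine’s fully qualified hostname and port 7077) unless SPARK_MASTER_HOST or SPARK_MASTER_PORT are set. The URL printed in the web interface remains the authoritative value, and master_url is just a helper name chosen for this example.

```shell
# Print the spark:// URL a default standalone master would most likely use.
master_url() {
  host="${SPARK_MASTER_HOST:-$(hostname -f 2>/dev/null || hostname)}"
  port="${SPARK_MASTER_PORT:-7077}"
  printf 'spark://%s:%s\n' "$host" "$port"
}

# Show the command instead of running it, so you can compare the URL
# with the one in the web interface first.
echo "start-slave.sh $(master_url)"
```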

Finally, finish the configuration and verify the installation.
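
One way to check the installation is to ask spark-submit for its version banner. This is offered as an assumption, not necessarily the exact command the step had in mind. The sketch below assumes the banner contains a line ending in "version X.Y.Z" and merges stderr into stdout in case the banner is written there. The spark_version helper is a name invented for this example.

```shell
# Print the Spark version reported by `spark-submit --version`, if available.
spark_version() {
  command -v spark-submit >/dev/null 2>&1 || return 1
  # Take the first "version X.Y.Z" found in the banner.
  spark-submit --version 2>&1 \
    | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' \
    | head -n 1
}

if v=$(spark_version) && [ -n "$v" ]; then
  echo "Spark $v is installed"
else
  echo "spark-submit not found or no version reported; re-check SPARK_HOME and PATH"
fi
```

For an end-to-end test, Spark also ships a run-example script in its bin directory; running run-example SparkPi 10 should print an approximation of pi if everything is wired up.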

So, this is how you can install Apache Spark on Ubuntu 20.04.
