angeloma
Senior Writer and partner

How to install Apache Spark on Debian 10


Hello, folks. Many companies are looking at Apache Spark as a way to reduce their dependence on Elasticsearch. That’s why, in this post, I’ll show you how to install Apache Spark on Debian 10.

According to the project website:

Apache Spark is a unified analytics engine for large-scale data processing.

Its maintenance and development are carried out by well-established working groups, and it integrates flexibly with other Apache projects such as Hadoop, Hive, or Kafka.

Spark is used by a wide range of organizations to process large datasets. In fact, since 2009, more than 1200 developers have contributed to Spark!

Learning Apache Spark is easy whether you come from a Java, Scala, Python, R, or SQL background.

Install Apache Spark on Debian 10

Installing Apache Spark is simpler than you might think.

Install some required packages

So, connect via SSH to your server or open a terminal. To make sure there are no problems, update the distribution completely.

sudo apt update
sudo apt upgrade

After that, install Java on Debian 10.

sudo apt install default-jdk
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  ca-certificates-java default-jdk-headless default-jre default-jre-headless fontconfig-config fonts-dejavu-core java-common libasound2 libasound2-data
  libavahi-client3 libavahi-common-data libavahi-common3 libcups2 libdrm-amdgpu1 libdrm-common libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libfontconfig1
  libgif7 libgl1 libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0 libjpeg62-turbo liblcms2-2 libllvm7 libnspr4 libnss3 libpciaccess0 libpcsclite1
  libsensors-config libsensors5 libx11-6 libx11-data libx11-xcb1 libxau6 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-sync1 libxcb1 libxdamage1
  libxdmcp6 libxext6 libxfixes3 libxi6 libxrender1 libxshmfence1 libxtst6 libxxf86vm1 openjdk-11-jdk openjdk-11-jdk-headless openjdk-11-jre openjdk-11-jre-headless
  x11-common
Suggested packages:
  libasound2-plugins alsa-utils cups-common liblcms2-utils pciutils pcscd lm-sensors openjdk-11-demo openjdk-11-source visualvm libnss-mdns fonts-dejavu-extra
  fonts-ipafont-gothic fonts-ipafont-mincho fonts-wqy-microhei | fonts-wqy-zenhei fonts-indic
Recommended packages:
  libxt-dev libatk-wrapper-java-jni fonts-dejavu-extra
The following NEW packages will be installed:
  ca-certificates-java default-jdk default-jdk-headless default-jre default-jre-headless fontconfig-config fonts-dejavu-core java-common libasound2 libasound2-data
  libavahi-client3 libavahi-common-data libavahi-common3 libcups2 libdrm-amdgpu1 libdrm-common libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libfontconfig1
  libgif7 libgl1 libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0 libjpeg62-turbo liblcms2-2 libllvm7 libnspr4 libnss3 libpciaccess0 libpcsclite1
  libsensors-config libsensors5 libx11-6 libx11-data libx11-xcb1 libxau6 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-sync1 libxcb1 libxdamage1
  libxdmcp6 libxext6 libxfixes3 libxi6 libxrender1 libxshmfence1 libxtst6 libxxf86vm1 openjdk-11-jdk openjdk-11-jdk-headless openjdk-11-jre openjdk-11-jre-headless
  x11-common
0 upgraded, 61 newly installed, 0 to remove and 0 not upgraded.
Need to get 294 MB of archives.
After this operation, 642 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Then verify that everything went well by displaying the installed version.

java --version
openjdk 11.0.9.1 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode, sharing)
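
If you want to script this check instead of reading the output by eye, the following is a minimal sketch. The function names java_major and is_supported_java are made up for this example, and the accepted versions (8 and 11) are taken from what the Spark 3.0.x documentation lists as supported Java versions.

```shell
# java_major VERSION: print the major Java version for a version string.
# Handles the modern scheme ("11.0.9.1" -> 11) and the legacy one ("1.8.0_275" -> 8).
java_major() {
    case "$1" in
        1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
        *)   printf '%s\n' "$1" | cut -d. -f1 ;;
    esac
}

# is_supported_java VERSION: succeed only for Java 8 or Java 11,
# the versions assumed here to be supported by Spark 3.0.x.
is_supported_java() {
    major=$(java_major "$1")
    [ "$major" = "8" ] || [ "$major" = "11" ]
}

# Usage on a real system (the version is the second field of the first line):
#   is_supported_java "$(java --version | head -n 1 | awk '{print $2}')" && echo "Java OK"
```

With the output shown above, the version string passed to the function would be 11.0.9.1.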

With Java running correctly, it’s time to install the Scala package on Debian 10.

sudo apt install scala

Check the version of Scala to make sure it was installed correctly.

scala -version
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

With this, we are done with the Apache Spark dependencies.

Download and install Apache Spark on Debian 10

Now we can download the Apache Spark binary.

So, navigate to the /tmp/ folder and, from there, use the wget command to perform the download:

cd /tmp
wget -c https://archive.apache.org/dist/spark/spark-3.0.2/spark-3.0.2-bin-hadoop2.7.tgz
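
Before unpacking, it can be worth checking that the archive was not corrupted in transit. Apache normally publishes a .sha512 file next to each release archive, but the layout of that file differs between releases, so this sketch simply takes the expected hex digest as an argument. The function name verify_sha512 is hypothetical, and you would copy the digest by hand from the published file.

```shell
# verify_sha512 FILE EXPECTED_HEX
# Succeeds if the SHA-512 digest of FILE matches EXPECTED_HEX.
verify_sha512() {
    file=$1
    # Normalise the expected digest: strip whitespace and lower-case the hex digits.
    expected=$(printf '%s' "$2" | tr -d ' \n\t' | tr 'A-F' 'a-f')
    # sha512sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha512sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage (replace the placeholder with the digest published by Apache):
#   verify_sha512 spark-3.0.2-bin-hadoop2.7.tgz "<digest from the .sha512 file>" \
#       && echo "checksum OK"
```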

Then decompress it and move it to a safe location such as /opt/.

tar -xvzf spark-3.0.2-bin-hadoop2.7.tgz
sudo mv spark-3.0.2-bin-hadoop2.7/ /opt/spark
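
To confirm that the move left the scripts used later in this post in the expected place, a small check like the one below can help. It is only a sketch: check_spark_layout is a made-up helper, and the list of files is limited to the scripts this tutorial relies on.

```shell
# check_spark_layout DIR
# Verify that the Spark scripts used in this tutorial exist under DIR.
check_spark_layout() {
    dir=$1
    for f in bin/spark-submit bin/spark-shell sbin/start-master.sh sbin/start-slave.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    echo "layout OK: $dir"
}

# Usage:
#   check_spark_layout /opt/spark
```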

To use Apache Spark from any location at the prompt, add its bin and sbin directories to your PATH in the .bashrc file:

nano ~/.bashrc

At the end of the file, add the following lines:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Save the changes and close the editor. To apply the changes run:

source ~/.bashrc
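
If you prefer to script this step instead of editing the file by hand, one option is to append each line only when it is not already present, so that running the script twice does not duplicate the entries. This is a sketch under that assumption; append_once is a hypothetical helper, not something shipped with Spark or Debian.

```shell
# append_once LINE FILE
# Append LINE to FILE unless an identical line is already there.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    # (needed because the line contains "$" characters).
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH and $SPARK_HOME unexpanded in the file):
#   append_once 'export SPARK_HOME=/opt/spark' ~/.bashrc
#   append_once 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' ~/.bashrc
```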

Now start Apache Spark with these commands. The first one starts the master of the cluster:

start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-angelo-org.apache.spark.deploy.master.Master-1-osradar.out

Then start the slave (worker). In this case it connects to a master on the same localhost, but you can replace that with the IP address or domain of the machine running the master.

start-slave.sh spark://localhost:7077
starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-angelo-org.apache.spark.deploy.worker.Worker-1-osradar.out

Now you can open a web browser and access the web interface via http://your-server:8080.
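
If the page does not load right away, the master may still be starting. The sketch below polls a TCP port until it accepts connections; the ports 8080 (web interface) and 7077 (master) come from the commands above, while the helper names port_open and wait_for_port are invented for this example. It relies on bash's /dev/tcp feature, so it needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be opened.
# Uses bash's /dev/tcp pseudo-device inside a subshell so the socket is
# closed again as soon as the check finishes.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Poll once per second until the port opens; fail after TIMEOUT_SECONDS (default 30).
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-30} waited=0
    until port_open "$host" "$port"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage:
#   wait_for_port localhost 8080 60 && echo "web interface is up"
#   wait_for_port localhost 7077 60 && echo "master is accepting workers"
```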

1.- Apache Spark on Debian 10

So, Apache Spark is working properly…

Conclusion

Apache Spark is easy to install on Debian 10 but so powerful that you can hardly believe it. With this tool, you can do a lot of things with a lot of data.
