
How To Install Apache Hadoop / HBase on Ubuntu 18.04


HBase is an open-source, distributed, non-relational database developed under the Apache Software Foundation. It is written in Java and runs on top of the Hadoop Distributed File System (HDFS). HBase is one of the dominant databases for working with big data, designed for quick read and write access to huge amounts of structured data.

Today we will cover the first part of this guide: installing Hadoop and HBase on Ubuntu 18.04, as an HBase installation on a single-node Hadoop cluster. The setup was done on a bare Ubuntu 18.04 virtual machine with 8 GB RAM and 4 vCPUs.

Installing Hadoop on Ubuntu 18.04

Follow these steps to install a single-node Hadoop cluster on Ubuntu 18.04 LTS.

Step 1: Update System

To deploy Hadoop & HBase on Ubuntu, first update the system and reboot:

sudo apt update
sudo apt -y upgrade
sudo reboot

Step 2: Install Java

Skip this step if you have already installed Java. Because the JAVA_HOME setup below relies on javac, install the JDK rather than only the JRE.

sudo apt update
sudo apt install openjdk-8-jdk-headless

Confirm the Java installation:

sabi@Ubuntu:~$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

Set up the JAVA_HOME variable. The heredoc delimiter is quoted so the expansions are evaluated when the file is sourced rather than when it is written.

cat <<'EOF' | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export PATH=$PATH:$JAVA_HOME/bin
EOF

Now, update your PATH & settings.

source /etc/profile.d/hadoop_java.sh

Testing Java

sabi@Ubuntu:~$ echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-amd64

Step 3: Creating User Account

Next, create a dedicated user account for Hadoop so that the Hadoop file system is isolated from the Unix file system.

sabi@Ubuntu:~$ sudo adduser hadoop
Adding user `hadoop' ...
Adding new group `hadoop' (1001) ...
Adding new user `hadoop' (1001) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []: Sabir Hussain
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
sabi@Ubuntu:~$ sudo usermod -aG sudo hadoop

After adding the user, generate an SSH key pair for it.

sabi@Ubuntu:~$ sudo su - hadoop
hadoop@Ubuntu:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:f/lEUTkJyr49dZEHr9xZ7wCD4Lg3+ephloHQ8w8GVlY hadoop@Ubuntu
The key's randomart image is:
+---[RSA 2048]----+
| +.E .o +|
| . = …. B.|
| . * . .oo .o=|
| o * .. + +*|
| o S . =o+|
| o O oo.o.|
| = +.oo. .|
| o o . o. |
| .o . |
+----[SHA256]-----+

Authorize the key

Add this user's public key to the list of authorized SSH keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

Make sure that you can SSH in using the added key.

hadoop@Ubuntu:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:jyWPWJLVC9MCHnOAFJjN8c8bwLu0o0U85cWTxHwuHvE.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 5.0.0-37-generic x86_64)
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
Canonical Livepatch is available for installation.
Reduce system reboots and improve kernel security. Activate at:
https://ubuntu.com/livepatch
0 packages can be updated.
0 updates are security updates.
Your Hardware Enablement Stack (HWE) is supported until April 2023.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
hadoop@Ubuntu:~$ exit
logout
Connection to localhost closed.

Step 4: Download & Install Hadoop

Download the latest release of Hadoop (2.10.0 at the time of writing):

wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz

Extract the files.

tar xzvf hadoop-2.10.0.tar.gz

Move the resulting directory to /usr/local/hadoop:

sudo mv hadoop-2.10.0 /usr/local/hadoop

Set up HADOOP_HOME and add the directory with the Hadoop binaries to your $PATH, again quoting the heredoc delimiter so the variables are expanded when the file is sourced:

cat <<'EOF' | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF

Source the file:

source /etc/profile.d/hadoop_java.sh

Confirm your Hadoop version:

hadoop@Ubuntu:~$ hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar

Step 5: Configure Hadoop

Hadoop configuration files are located under /usr/local/hadoop/etc/hadoop/.

Several of these files need to be modified to complete the installation on Ubuntu 18.04.

First of all, edit JAVA_HOME in the hadoop-env.sh shell script, pointing it at your Java installation:

$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Then configure:

1. core-site.xml

The core-site.xml file contains Hadoop cluster information used at startup. These properties include:

  • The port number used by the Hadoop instance
  • The memory allocated for the file system
  • The memory limit for data storage
  • The size of the read/write buffers

Open core-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following properties between the <configuration> and </configuration> tags.
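The exact snippet is not reproduced here, so the following is a minimal single-node sketch. fs.defaultFS is a standard Hadoop property; the hdfs://localhost:9000 address is an assumption for a single-node setup and can be adjusted to your environment.

<configuration>
  <property>
    <!-- URI of the default file system; the NameNode listens on this address -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>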

2. hdfs-site.xml

Configure this file on each host to be used in the cluster. It holds the information of:

  • The namenode & datanode paths on the local filesystem
  • The data replication value

I'm using my disk to store the Hadoop infrastructure; you can follow the same procedure for a secondary disk. First, list your block devices:

hadoop@Ubuntu:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 149.9M 1 loop /snap/gnome-3-28-1804/67
loop1 7:1 0 54.4M 1 loop /snap/core18/1066
loop2 7:2 0 4.2M 1 loop /snap/gnome-calculator/544
loop3 7:3 0 14.8M 1 loop /snap/gnome-characters/296
loop4 7:4 0 4M 1 loop /snap/gnome-calculator/406
loop5 7:5 0 3.7M 1 loop /snap/gnome-system-monitor/123
loop6 7:6 0 89.1M 1 loop /snap/core/8268
loop7 7:7 0 14.8M 1 loop /snap/gnome-characters/375
loop8 7:8 0 3.7M 1 loop /snap/gnome-system-monitor/100
loop9 7:9 0 1008K 1 loop /snap/gnome-logs/61
loop10 7:10 0 88.5M 1 loop /snap/core/7270
loop11 7:11 0 156.7M 1 loop /snap/gnome-3-28-1804/110
loop12 7:12 0 956K 1 loop /snap/gnome-logs/81
loop13 7:13 0 44.2M 1 loop /snap/gtk-common-themes/1353
loop14 7:14 0 42.8M 1 loop /snap/gtk-common-themes/1313
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
sr0 11:0 1 2G 0 rom

Partition the disk (/dev/sdb in this example) and mount it to the /hadoop directory:

sudo parted -s -- /dev/sdb mklabel gpt
sudo parted -s -a optimal -- /dev/sdb mkpart primary 0% 100%
sudo parted -s -- /dev/sdb align-check optimal 1
sudo mkfs.xfs /dev/sdb1
sudo mkdir /hadoop
echo "/dev/sdb1 /hadoop xfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a

Check:

hadoop@Ubuntu:~$ df -hT | grep /dev/sda1
/dev/sda1 ext4 20G 7.4G 12G 40% /

Create directories for namenode & datanode

sudo mkdir -p /hadoop/hdfs/{namenode,datanode}

Now, set ownership to hadoop user & group

sudo chown -R hadoop:hadoop /hadoop

Open the file

sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Then add the data below between the <configuration> and </configuration> tags.
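The original listing is not shown here; a minimal sketch for this single-node setup, reusing the /hadoop/hdfs directories created above, would be:

<configuration>
  <property>
    <!-- Single node, so keep only one replica of each block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- NameNode metadata directory created earlier -->
    <name>dfs.namenode.name.dir</name>
    <value>file:///hadoop/hdfs/namenode</value>
  </property>
  <property>
    <!-- DataNode block storage directory created earlier -->
    <name>dfs.datanode.data.dir</name>
    <value>file:///hadoop/hdfs/datanode</value>
  </property>
</configuration>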

3. mapred-site.xml

Use this file to set the MapReduce framework.

sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

Set it according to the snippet below.
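The original snippet is not included here; a minimal sketch that tells MapReduce to run on YARN looks like this:

<configuration>
  <property>
    <!-- Run MapReduce jobs on YARN instead of the local job runner -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>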

4. yarn-site.xml

This file overrides the default YARN configuration for Hadoop; it defines the resource management and job scheduling logic.

sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

Apply a similar configuration, as shown below.
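Again, the original snippet is not reproduced; a minimal sketch that enables the shuffle service MapReduce needs on each NodeManager would be:

<configuration>
  <property>
    <!-- Auxiliary service required by MapReduce jobs running on YARN -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>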

Step 6: Validate Hadoop Configuration

Initialize the Hadoop infrastructure store by formatting the NameNode:

sudo su - hadoop
hdfs namenode -format

Test HDFS Configuration

$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hbase]
hbase: Warning: Permanently added 'hbase' (ECDSA) to the list of known hosts.

Finally, verify the YARN configuration:

$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

Hadoop 2.x default web UI ports.

  • NameNode – Default HTTP port is 50070 (9870 from Hadoop 3.x onward).
  • ResourceManager – Default HTTP port is 8088.
  • MapReduce JobHistory Server – Default HTTP port is 19888.

Check these by typing

ss -tunelp
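To narrow the output to just the Hadoop web UI ports listed above (adjust the list if you changed the defaults), you can filter it:

ss -tunelp | grep -E '50070|8088|19888'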

Access the Hadoop NameNode web dashboard at http://ServerIP:50070

See the YARN cluster overview at http://ServerIP:8088

Let’s create a directory to test

$ hadoop fs -mkdir /test
$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2019-12-29 10:23 /test
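As an optional extra check, you can copy a local file into the new HDFS directory and read it back; /etc/hosts is just an arbitrary example file here.

hadoop fs -put /etc/hosts /test/
hadoop fs -cat /test/hosts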
Stopping Hadoop Services

Run the following commands to stop the Hadoop services.

$ stop-dfs.sh
$ stop-yarn.sh

See our next article to learn how to install HBase on Ubuntu 18.04.
