Quick Start Hadoop in WSL2

Recently I installed Hadoop 3.2.1 on Ubuntu 20.04 running in WSL2, and I want to share the problems I ran into along the way.

First, I went to the official website to look for a guide. Since I only needed a single node, I opened the Single Node Setup page and typed the commands from its first step:

$ sudo apt-get install ssh
$ sudo apt-get install pdsh   # Note: this is a mistake for the next step, pdsh causes trouble later.
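The pdsh trouble usually shows up as "localhost: rcmd: socket: Permission denied" when starting HDFS. A commonly reported cause is that pdsh defaults to rsh rather than ssh as its remote command type; setting the PDSH_RCMD_TYPE environment variable to ssh switches it. The sketch below, assuming you want the setting in hadoop-env.sh, appends that export only if it is not already there. The helper name append_once is hypothetical, and whether this alone clears the error on your machine is an assumption.

```shell
# append_once FILE LINE
# Hypothetical helper: appends LINE to FILE only if an identical line is not
# already present, so re-running the setup does not duplicate it.
append_once() {
  local file=$1 line=$2
  # -x matches the whole line, -F treats LINE as a fixed string, not a regex
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example (path taken from this post; adjust to your install):
# append_once ~/hadoop-3.2.1/etc/hadoop/hadoop-env.sh 'export PDSH_RCMD_TYPE=ssh'
```

pdsh still needs passwordless ssh to localhost to work after this change.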

Hadoop needs a Java environment (the 3.2.x line supports only Java 8), so I installed openjdk-8-jdk:

$ sudo apt install openjdk-8-jdk
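To confirm the JDK is installed and to see which directory JAVA_HOME should point to, you can resolve the symlink chain behind the java command. The helper java_home_from_binary below is hypothetical; it is a minimal sketch that strips the trailing /bin/java, and the /jre directory that JDK 8 places the runtime in, from the resolved path.

```shell
# java_home_from_binary PATH
# Hypothetical helper: turns a resolved java binary path such as
# /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java into its JDK directory.
java_home_from_binary() {
  local home=${1%/bin/java}   # drop the trailing /bin/java
  home=${home%/jre}           # JDK 8 keeps java under jre/, drop that too
  printf '%s\n' "$home"
}

# Example, on the Ubuntu machine itself:
# java -version
# java_home_from_binary "$(readlink -f "$(command -v java)")"
```

On Ubuntu the result is usually /usr/lib/jvm/java-8-openjdk-amd64; the java-1.8.0-openjdk-amd64 path used in hadoop-env.sh is typically a symlink to that same directory, so either should work.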

Then I followed the download link and fetched the release package:

wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar -xzvf hadoop-3.2.1.tar.gz -C /home/gavin
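Apache publishes a .sha512 checksum file next to release tarballs, so it is worth checking the download before extracting it. The helper verify_sha512 below is a hypothetical sketch: it pulls the first 128-hex-digit digest out of the checksum file, which covers both the plain sha512sum layout and the BSD-style "SHA512 (file) = ..." layout, and compares it with the digest of the local file. If the checksum file uses some other layout, compare the values by eye instead. Also note that older releases are eventually moved off downloads.apache.org; if the URL above returns 404, the same files are normally available under https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/.

```shell
# verify_sha512 TARBALL CHECKSUM_FILE
# Hypothetical helper: succeeds (exit 0) only if the SHA-512 digest of
# TARBALL matches the first 128-hex-digit string found in CHECKSUM_FILE.
verify_sha512() {
  local expected actual
  # Extract the digest from GNU or BSD-style layouts and lower-case it
  expected=$(grep -oE '[0-9a-fA-F]{128}' "$2" | head -n 1 | tr 'A-F' 'a-f')
  # sha512sum prints "<digest>  <file>"; keep only the digest
  actual=$(sha512sum "$1" | cut -d ' ' -f 1)
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example:
# wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz.sha512
# verify_sha512 hadoop-3.2.1.tar.gz hadoop-3.2.1.tar.gz.sha512 && echo "checksum OK"
```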

Before running Hadoop, we need to modify its configuration. Open core-site.xml with this command:

vim ~/hadoop-3.2.1/etc/hadoop/core-site.xml

and put the following in it:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
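Then edit hdfs-site.xml and set the replication factor to 1, since a single node can hold only one copy of each block:

vim ~/hadoop-3.2.1/etc/hadoop/hdfs-site.xml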


<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
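If you would rather not edit the two files in vim, the same properties can be written non-interactively with heredocs. This is a minimal sketch; write_pseudo_distributed_conf is a hypothetical name, and it overwrites both files, so back them up first if you have already changed them.

```shell
# write_pseudo_distributed_conf CONF_DIR
# Hypothetical helper: writes the core-site.xml and hdfs-site.xml shown
# above into CONF_DIR (e.g. ~/hadoop-3.2.1/etc/hadoop), replacing them.
write_pseudo_distributed_conf() {
  local conf_dir=$1
  # NameNode address that clients and the start scripts connect to
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
  # One DataNode, so keep a single replica of each block
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Example:
# write_pseudo_distributed_conf ~/hadoop-3.2.1/etc/hadoop
```

Afterwards, "~/hadoop-3.2.1/bin/hdfs getconf -confKey fs.defaultFS" should print hdfs://localhost:9000.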

Then append the environment variables to hadoop-env.sh:

vim ~/hadoop-3.2.1/etc/hadoop/hadoop-env.sh


export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/home/gavin/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
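The exports in hadoop-env.sh are read by Hadoop's own scripts, but they do not change the PATH of your interactive shell, so commands such as hdfs or start-dfs.sh are otherwise only found by their full path. A common approach is to put the HADOOP_HOME and PATH lines in ~/.bashrc as well. The sketch below does this once, guarded by a marker comment; the function name and the marker text are hypothetical.

```shell
# add_hadoop_to_rc RC_FILE HADOOP_HOME
# Hypothetical helper: appends HADOOP_HOME and PATH exports to RC_FILE
# (normally ~/.bashrc) unless the marker comment is already present.
add_hadoop_to_rc() {
  local rc=$1 home=$2
  local marker='# >>> hadoop env >>>'
  if grep -qF -- "$marker" "$rc" 2>/dev/null; then
    return 0   # already added on a previous run
  fi
  {
    printf '%s\n' "$marker"
    printf 'export HADOOP_HOME=%s\n' "$home"
    # Single quotes keep $PATH and $HADOOP_HOME unexpanded until RC_FILE is sourced
    printf '%s\n' 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin'
  } >> "$rc"
}

# Example, using the install path from this post:
# add_hadoop_to_rc ~/.bashrc /home/gavin/hadoop-3.2.1 && source ~/.bashrc
# hadoop version
```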

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
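Running the "cat ... >> ~/.ssh/authorized_keys" line a second time appends the same key again. The hypothetical helper below is a minimal sketch that appends the public key only when it is missing and applies the permissions sshd expects. Afterwards, "ssh -o BatchMode=yes localhost true" is a quick non-interactive test: it fails instead of prompting if a password would still be needed. It also fails on the very first connection, before the localhost host key has been accepted, so run a plain "ssh localhost" once first.

```shell
# authorize_key PUB_KEY_FILE AUTH_KEYS_FILE
# Hypothetical helper: adds the public key to authorized_keys only if that
# exact line is not there yet, then applies the permissions sshd checks.
authorize_key() {
  local pub=$1 auth=$2
  mkdir -p "$(dirname "$auth")"
  chmod 700 "$(dirname "$auth")"   # ~/.ssh must not be group/world writable
  touch "$auth"
  # -f reads the key line from PUB_KEY_FILE, -x/-F require an exact line match
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}

# Example:
# authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
# ssh -o BatchMode=yes localhost true && echo "passwordless ssh works"
```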

After that, according to the official documentation, we should be done. But this is where the trouble started. First, I couldn't start Hadoop: it reported "localhost: rcmd: socket: Permission denied". Then I tried "ssh localhost", and the connection to port 22 was refused. From that message I guessed the SSH server was not running, so I searched for how to run the SSH server on WSL2 and, following what I found, modified the SSH configuration:

$ sudo vim /etc/ssh/sshd_config

Find these settings and remove the leading #:
# Port 22
# AddressFamily any
# ListenAddress 0.0.0.0
# ListenAddress ::

Change PasswordAuthentication to yes:
PasswordAuthentication yes
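The same edits can be made with sed instead of vim. This sketch assumes the lines look like the stock Ubuntu file, commented out with "#" (with or without a following space); the function name is hypothetical. Editing the file does not by itself start the SSH server: on WSL2 without systemd, the daemon typically has to be started by hand with "sudo service ssh start" (or "restart" if it is already running), and again after each WSL restart.

```shell
# enable_sshd_settings CONFIG_FILE
# Hypothetical helper: uncomments Port 22, AddressFamily any and both
# ListenAddress lines, and forces PasswordAuthentication yes.
# GNU sed -i edits the file in place and keeps a .bak copy.
enable_sshd_settings() {
  sed -i.bak -E \
    -e 's/^#[[:space:]]*(Port 22|AddressFamily any|ListenAddress 0\.0\.0\.0|ListenAddress ::)[[:space:]]*$/\1/' \
    -e 's/^#?[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication yes/' \
    "$1"
}

# Example, from a root shell (the file belongs to root):
# enable_sshd_settings /etc/ssh/sshd_config
# service ssh restart
```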

After changing the configuration, try:

$ ssh localhost

If you can log in, you're done. That's all.
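Once "ssh localhost" works, the remaining steps from the official Single Node Setup are to format the NameNode and start HDFS ("bin/hdfs namenode -format", then "sbin/start-dfs.sh"). The jps tool that ships with the JDK lists running Java processes; the hypothetical helper below is a minimal sketch that reads that listing and reports which of the three HDFS daemons expected in pseudo-distributed mode are missing.

```shell
# missing_hdfs_daemons
# Hypothetical helper: reads `jps` output on stdin and prints every
# expected HDFS daemon that does not appear; returns 1 if any is missing.
missing_hdfs_daemons() {
  local listing missing=0 daemon
  listing=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode; do
    # jps prints "<pid> <MainClass>"; anchor both ends so NameNode
    # does not match the SecondaryNameNode line
    if ! printf '%s\n' "$listing" | grep -qE "^[0-9]+ ${daemon}\$"; then
      echo "$daemon"
      missing=1
    fi
  done
  return "$missing"
}

# Example, from ~/hadoop-3.2.1:
# bin/hdfs namenode -format
# sbin/start-dfs.sh
# jps | missing_hdfs_daemons && echo "HDFS is up"
```

If all three are running, the NameNode web UI should be reachable at http://localhost:9870/.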

Author: Gavin Zhao
Link: https://www.gavinz.xyz/2021/03/19/quick-start-hadoop-in-wsl2/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.