This post demonstrates the Hadoop YARN distributed shell example.
zhengqiu@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar ../share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar -help
12/10/19 15:49:33 INFO distributedshell.Client: Initializing Client
usage: Client
-appname <arg> Application Name. Default value -DistributedShell
-class <arg> Main class to be run for the ApplicationMaster.
-container_memory <arg> Amount of memory in MB to be requested to run the shell command
-debug Dump out debug information
-help Print usage
-jar <arg> Jar file containing the application master
-log_properties <arg> log4j.properties file
-master_memory <arg> Amount of memory in MB to be requested to run the application master
-num_containers <arg> No. of containers on which the shell command needs to be executed
-priority <arg> Application Priority. Default 0
-queue <arg> RM Queue in which this application is to be submitted
-shell_args <arg> Command line args for the shell script
-shell_cmd_priority <arg> Priority for the shell command containers
-shell_command <arg> Shell command to be executed by the Application Master
-shell_env <arg> Environment for shell script. Specified as env_key=env_val pairs
-shell_script <arg> Location of the shell script to be executed
-timeout <arg> Application timeout in milliseconds
-user <arg> User to run the application as
zhengqiu@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar ../share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar -shell_command ls
12/10/19 14:45:05 INFO distributedshell.Client: Initializing Client
12/10/19 14:45:05 INFO distributedshell.Client: Starting Client
12/10/19 14:45:05 INFO distributedshell.Client: Connecting to ResourceManager at /172.22.244.177:8040
12/10/19 14:45:05 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
12/10/19 14:45:06 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=2
12/10/19 14:45:06 INFO distributedshell.Client: Got Cluster node info from ASM
12/10/19 14:45:06 INFO distributedshell.Client: Got node report from ASM for, nodeId=hadoop4:46971, nodeAddresshadoop4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1350672194514,
12/10/19 14:45:06 INFO distributedshell.Client: Got node report from ASM for, nodeId=hadoop3:44447, nodeAddresshadoop3:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1350672195177,
12/10/19 14:45:06 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
12/10/19 14:45:06 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
12/10/19 14:45:06 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
12/10/19 14:45:06 INFO distributedshell.Client: Got new application id=application_1350584236302_0003
12/10/19 14:45:06 INFO distributedshell.Client: Min mem capabililty of resources in this cluster 128
12/10/19 14:45:06 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 10240
12/10/19 14:45:06 INFO distributedshell.Client: AM memory specified below min threshold of cluster. Using min value., specified=10, min=128
12/10/19 14:45:06 INFO distributedshell.Client: Setting up application submission context for ASM
12/10/19 14:45:06 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
12/10/19 14:45:07 INFO distributedshell.Client: Set the environment for the application master
12/10/19 14:45:07 INFO distributedshell.Client: Trying to generate classpath for app master from current thread's classpath
12/10/19 14:45:07 INFO distributedshell.Client: Readable bytes from stream=8559
12/10/19 14:45:07 INFO distributedshell.Client: Setting up app master command
12/10/19 14:45:07 INFO distributedshell.Client: Completed setting up app master command ${JAVA_HOME}/bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 1 --priority 0 --shell_command ls 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
12/10/19 14:45:07 INFO distributedshell.Client: Submitting application to ASM
12/10/19 14:45:08 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=, appMasterRpcPort=0, appStartTime=1350672290726, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=172.22.244.177:8088/proxy/application_1350584236302_0003/, appUser=zhengqiu
12/10/19 14:45:09 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=, appMasterRpcPort=0, appStartTime=1350672290726, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=172.22.244.177:8088/proxy/application_1350584236302_0003/, appUser=zhengqiu
12/10/19 14:45:10 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=, appMasterRpcPort=0, appStartTime=1350672290726, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=172.22.244.177:8088/proxy/application_1350584236302_0003/, appUser=zhengqiu
12/10/19 14:45:11 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=, appMasterRpcPort=0, appStartTime=1350672290726, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=zhengqiu
12/10/19 14:45:12 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=, appMasterRpcPort=0, appStartTime=1350672290726, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=zhengqiu
12/10/19 14:45:13 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=, appMasterRpcPort=0, appStartTime=1350672290726, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=zhengqiu
12/10/19 14:45:14 INFO distributedshell.Client: Got application report from ASM for, appId=3, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=, appMasterRpcPort=0, appStartTime=1350672290726, yarnAppState=FINISHED, distributedFinalState=FAILED, appTrackingUrl=, appUser=zhengqiu
12/10/19 14:45:14 INFO distributedshell.Client: Application did finished unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring loop
12/10/19 14:45:14 ERROR distributedshell.Client: Application failed to complete successfully
zhengqiu@hadoop:~/hadoop-2.0.1-alpha/bin$
Go to the ResourceManager web interface to check the logs. You may find a message like:
"……………is running beyond virtual memory limits. Current usage: 35.4mb of 128.0mb physical memory used; 286.0mb of 268.8mb virtual memory used. Killing container…………………"
This indicates that the virtual memory in use (286.0mb) exceeded the allowed virtual memory (268.8mb), so the container was killed.
Solution: edit yarn-site.xml by adding
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>3</value>
</property>
yarn.nodemanager.vmem-pmem-ratio: the virtual memory usage of each task may exceed its physical memory limit by this ratio, and the total virtual memory used by tasks on a NodeManager may exceed its physical memory by the same ratio.
By default the value is 2.1, so increasing it raises the allowed virtual memory.
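The arithmetic behind the failure can be checked directly. A minimal sketch, using the 128 MB minimum container allocation reported in the logs above:

```shell
# The NodeManager caps a container's virtual memory at
# (physical allocation) x (yarn.nodemanager.vmem-pmem-ratio).
default_limit=$(awk 'BEGIN { printf "%.1f", 128 * 2.1 }')  # default ratio
raised_limit=$(awk 'BEGIN { printf "%.1f", 128 * 3 }')     # ratio set in yarn-site.xml above
echo "default vmem limit: ${default_limit} MB"  # 268.8 MB -- below the 286.0 MB actually used
echo "raised vmem limit: ${raised_limit} MB"    # 384.0 MB -- enough headroom
```

This matches the error message: 286.0 MB of virtual memory was needed, but the default ceiling was 128 × 2.1 = 268.8 MB.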
The same problem was also reported at http://x-rip.iteye.com/blog/1533106.
Update yarn-site.xml on each node, restart Hadoop, and run the command again:
zhengqiu@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar ../share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar -shell_command ls
12/10/19 15:19:48 INFO distributedshell.Client: Initializing Client
12/10/19 15:19:48 INFO distributedshell.Client: Starting Client
12/10/19 15:19:48 INFO distributedshell.Client: Connecting to ResourceManager at /172.22.244.177:8040
12/10/19 15:19:49 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
12/10/19 15:19:49 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=2
12/10/19 15:19:49 INFO distributedshell.Client: Got Cluster node info from ASM
12/10/19 15:19:49 INFO distributedshell.Client: Got node report from ASM for, nodeId=hadoop3:53328, nodeAddresshadoop3:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1350674330637,
12/10/19 15:19:49 INFO distributedshell.Client: Got node report from ASM for, nodeId=hadoop4:41566, nodeAddresshadoop4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1350674332057,
12/10/19 15:19:49 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
12/10/19 15:19:49 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
12/10/19 15:19:49 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
12/10/19 15:19:49 INFO distributedshell.Client: Got new application id=application_1350674345000_0001
12/10/19 15:19:49 INFO distributedshell.Client: Min mem capabililty of resources in this cluster 128
12/10/19 15:19:49 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 10240
12/10/19 15:19:49 INFO distributedshell.Client: AM memory specified below min threshold of cluster. Using min value., specified=10, min=128
12/10/19 15:19:49 INFO distributedshell.Client: Setting up application submission context for ASM
12/10/19 15:19:49 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
12/10/19 15:19:50 INFO distributedshell.Client: Set the environment for the application master
12/10/19 15:19:50 INFO distributedshell.Client: Trying to generate classpath for app master from current thread's classpath
12/10/19 15:19:50 INFO distributedshell.Client: Readable bytes from stream=8559
12/10/19 15:19:50 INFO distributedshell.Client: Setting up app master command
12/10/19 15:19:50 INFO distributedshell.Client: Completed setting up app master command ${JAVA_HOME}/bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 1 --priority 0 --shell_command ls 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
12/10/19 15:19:50 INFO distributedshell.Client: Submitting application to ASM
12/10/19 15:19:51 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=, appMasterRpcPort=0, appStartTime=1350674374290, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=172.22.244.177:8088/proxy/application_1350674345000_0001/, appUser=zhengqiu
12/10/19 15:19:52 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=, appMasterRpcPort=0, appStartTime=1350674374290, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=172.22.244.177:8088/proxy/application_1350674345000_0001/, appUser=zhengqiu
12/10/19 15:19:53 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=, appMasterRpcPort=0, appStartTime=1350674374290, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=172.22.244.177:8088/proxy/application_1350674345000_0001/, appUser=zhengqiu
12/10/19 15:19:54 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=, appMasterRpcPort=0, appStartTime=1350674374290, yarnAppState=SUBMITTED, distributedFinalState=UNDEFINED, appTrackingUrl=172.22.244.177:8088/proxy/application_1350674345000_0001/, appUser=zhengqiu
12/10/19 15:19:55 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=, appMasterRpcPort=0, appStartTime=1350674374290, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=zhengqiu
12/10/19 15:19:56 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=, appMasterRpcPort=0, appStartTime=1350674374290, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=zhengqiu
12/10/19 15:19:57 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=, appMasterRpcPort=0, appStartTime=1350674374290, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=, appUser=zhengqiu
12/10/19 15:19:58 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=, appMasterRpcPort=0, appStartTime=1350674374290, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=, appUser=zhengqiu
12/10/19 15:19:58 INFO distributedshell.Client: Application has completed successfully. Breaking monitoring loop
12/10/19 15:19:58 INFO distributedshell.Client: Application completed successfully
Check the logs for the result of the ls command above:
zhengqiu@hadoop4:/tmp/logs/application_1350675606528_0023$ ll
total 16
drwx--x--- 4 zhengqiu zhengqiu 4096 2012-10-19 16:46 ./
drwxr-xr-x 22 zhengqiu zhengqiu 4096 2012-10-19 16:46 ../
drwx--x--- 2 zhengqiu zhengqiu 4096 2012-10-19 16:46 container_1350675606528_0023_01_000001/
drwx--x--- 2 zhengqiu zhengqiu 4096 2012-10-19 16:46 container_1350675606528_0023_01_000002/
zhengqiu@hadoop4:/tmp/logs/application_1350675606528_0023$ find
.
./container_1350675606528_0023_01_000001
./container_1350675606528_0023_01_000001/AppMaster.stderr
./container_1350675606528_0023_01_000001/AppMaster.stdout
./container_1350675606528_0023_01_000002
./container_1350675606528_0023_01_000002/stdout
./container_1350675606528_0023_01_000002/stderr
zhengqiu@hadoop4:/tmp/logs/application_1350675606528_0023$ more ./container_1350675606528_0023_01_000002/stdout
container_tokens
default_container_executor.sh
launch_container.sh
tmp
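Since each container writes its own stdout/stderr under the application's log directory, the per-container output above can also be gathered in one pass. A minimal sketch, assuming the same /tmp/logs layout shown in the listing above:

```shell
# Print every container's stdout for one application in a single pass.
# Assumes NodeManager logs live under /tmp/logs/<application id>,
# as in the directory listing above.
APP_DIR=/tmp/logs/application_1350675606528_0023
for f in "$APP_DIR"/container_*/stdout; do
  [ -f "$f" ] || continue   # skip the unexpanded glob when nothing matches
  echo "== $f =="
  cat "$f"
done
```

On a multi-node cluster, note that each NodeManager keeps only the logs of containers that ran on that node, so this has to be repeated per node (here, on hadoop3 and hadoop4).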
Run the Linux cal command:
zhengqiu@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar ../share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar -debug -shell_command cal
Check log:
zhengqiu@hadoop4:/tmp/logs$ cd application_1350675606528_0030/
zhengqiu@hadoop4:/tmp/logs/application_1350675606528_0030$ find
.
./container_1350675606528_0030_01_000002
./container_1350675606528_0030_01_000002/stdout
./container_1350675606528_0030_01_000002/stderr
./container_1350675606528_0030_01_000001
./container_1350675606528_0030_01_000001/AppMaster.stderr
./container_1350675606528_0030_01_000001/AppMaster.stdout
zhengqiu@hadoop4:/tmp/logs/application_1350675606528_0030$ more ./container_1350675606528_0030_01_000002/stdout
October 2012
Su Mo Tu We Th Fr Sa
1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
Run the Linux cal command in two containers:
zhengqiu@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar ../share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar -debug -shell_command cal -num_containers 2
Check log:
On datanode hadoop3, the output was generated twice.
zhengqiu@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop org.apache.hadoop.yarn.applications.distributedshell.Client -jar ../share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.1-alpha.jar -debug -shell_command cal -num_containers 3
Check log:
On datanode hadoop4, the output was generated three times.
Reference: http://www.mail-archive.com/mapreduce-user@hadoop.apache.org/msg03853.html