
Archive for the ‘Cloud Computing’ Category

Under VirtualBox, install 8 Ubuntu virtual machines with the following host names and IP addresses, and assign them the following roles:

ubuntu1 192.168.10.1 shard1
ubuntu2 192.168.10.2 shard1
ubuntu3 192.168.10.3 shard1
ubuntu4 192.168.10.4 shard2
ubuntu5 192.168.10.5 shard2
ubuntu6 192.168.10.6 shard2
ubuntu7 192.168.10.7 config server
ubuntu8 192.168.10.8 route server

all Ubuntu machines share the following configuration:
ubuntu100 192.168.10.100 gateway and DNS
network mask 255.255.255.0
domain yourdomain.com
user name: use the same user name on every machine
download mongodb-linux-x86_64-2.2.1 to your home directory on each virtual machine

start shard1:

on 192.168.10.1

cd ~/mongodb-linux-x86_64-2.2.1/bin

mkdir -p /mongodb/data/shard11

mkdir -p /mongodb/logs

./mongod --shardsvr --replSet shard1 --dbpath /mongodb/data/shard11 --logpath /mongodb/logs/shard11.log --port 27017 --fork

on 192.168.10.2

cd ~/mongodb-linux-x86_64-2.2.1/bin

mkdir -p /mongodb/data/shard12

mkdir -p /mongodb/logs

./mongod --shardsvr --replSet shard1 --dbpath /mongodb/data/shard12 --logpath /mongodb/logs/shard12.log --port 27017 --fork

on 192.168.10.3

cd ~/mongodb-linux-x86_64-2.2.1/bin

mkdir -p /mongodb/data/shard13

mkdir -p /mongodb/logs

./mongod --shardsvr --replSet shard1 --dbpath /mongodb/data/shard13 --logpath /mongodb/logs/shard13.log --port 27017 --fork

Run the command shown in the following figure to configure shard1:

Capture11
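The figure is not reproduced here; initiating the shard1 replica set from the mongo shell on one of its members looks roughly like the following sketch (the member list is taken from the setup above; the exact command in the figure may differ):

./mongo 192.168.10.1:27017
> config = { _id: "shard1", members: [
    { _id: 0, host: "192.168.10.1:27017" },
    { _id: 1, host: "192.168.10.2:27017" },
    { _id: 2, host: "192.168.10.3:27017" } ] }
> rs.initiate(config)
> rs.status()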

start shard2:

on 192.168.10.4

cd ~/mongodb-linux-x86_64-2.2.1/bin

mkdir -p /mongodb/data/shard21

mkdir -p /mongodb/logs

./mongod --shardsvr --replSet shard2 --dbpath /mongodb/data/shard21 --logpath /mongodb/logs/shard21.log --port 27017 --fork

on 192.168.10.5

cd ~/mongodb-linux-x86_64-2.2.1/bin

mkdir -p /mongodb/data/shard22

mkdir -p /mongodb/logs

./mongod --shardsvr --replSet shard2 --dbpath /mongodb/data/shard22 --logpath /mongodb/logs/shard22.log --port 27017 --fork

on 192.168.10.6

cd ~/mongodb-linux-x86_64-2.2.1/bin

mkdir -p /mongodb/data/shard23

mkdir -p /mongodb/logs

./mongod --shardsvr --replSet shard2 --dbpath /mongodb/data/shard23 --logpath /mongodb/logs/shard23.log --port 27017 --fork

 

On any node of shard1, run the command shown in the following figure to configure shard1:
Capture21

On any node of shard2, run the command shown in the following figure to configure shard2:
Capture12
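Again, the exact command is in the figure; a sketch of the shard2 replica set initiation, mirroring shard1 with the shard2 hosts, would be:

./mongo 192.168.10.4:27017
> config = { _id: "shard2", members: [
    { _id: 0, host: "192.168.10.4:27017" },
    { _id: 1, host: "192.168.10.5:27017" },
    { _id: 2, host: "192.168.10.6:27017" } ] }
> rs.initiate(config)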

start config server on ubuntu7 (192.168.10.7):
Capture13
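The figure is not shown here; a typical config server startup on ubuntu7 looks roughly like this (the data/log paths and port 20000 are assumptions, not necessarily what the figure used):

cd ~/mongodb-linux-x86_64-2.2.1/bin
mkdir -p /mongodb/data/config
mkdir -p /mongodb/logs
./mongod --configsvr --dbpath /mongodb/data/config --logpath /mongodb/logs/config.log --port 20000 --fork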

start route server on ubuntu8 (192.168.10.8):
Capture14
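A typical mongos (route server) startup on ubuntu8 would look roughly like the following; port 30000 matches the connection used below, while the config server port 20000 is the assumption carried over from the previous step:

cd ~/mongodb-linux-x86_64-2.2.1/bin
./mongos --configdb 192.168.10.7:20000 --port 30000 --logpath /mongodb/logs/mongos.log --fork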

configure shard cluster:
run ./mongo localhost:30000 on the route server ubuntu8 (192.168.10.8). Note the port number 30000, which is the port the route server (mongos) listens on.
Capture15
Note: the addshard command must be run against the admin database (run "use admin" first).
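A sketch of the addshard commands (the replica set member lists follow the setup above; the figure's exact commands may differ):

./mongo localhost:30000
> use admin
> db.runCommand({ addshard: "shard1/192.168.10.1:27017,192.168.10.2:27017,192.168.10.3:27017" })
> db.runCommand({ addshard: "shard2/192.168.10.4:27017,192.168.10.5:27017,192.168.10.6:27017" })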

check if the shard cluster is configured successfully:
Capture16
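For example, on the mongos you can list the registered shards with either of:

> use admin
> db.runCommand({ listshards: 1 })
> sh.status()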

create a collection named table1 under the database testDB and insert 1,000,000 documents (records) without enabling sharding. In this case, all documents are located on one shard.
Capture17
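A minimal way to do this from the mongos shell (the field names are only an example):

> use testDB
> for (var i = 0; i < 1000000; i++) { db.table1.insert({ _id: i, name: "test" + i }); }
> db.stats()
> db.table1.stats()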

enable sharding on the database. This does not automatically shard any collections, but makes it possible to begin sharding collections using sh.shardCollection() or db.runCommand( { shardcollection: ... } ).
Capture18
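For example, run against the admin database on the mongos:

> use admin
> db.runCommand({ enablesharding: "testDB" })

or, equivalently:

> sh.enableSharding("testDB")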

enable sharding on the collection (table); collection table1 is then split across the two shards. Sharding is on a per-collection basis: we must explicitly specify which collections to shard, and those collections must belong to a database for which sharding has been enabled.
Capture19
Capture20
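A sketch of the shardcollection step, assuming _id is used as the shard key (the actual shard key in the figures may differ):

> use admin
> db.runCommand({ shardcollection: "testDB.table1", key: { _id: 1 } })

or, equivalently:

> sh.shardCollection("testDB.table1", { _id: 1 })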

The following figures show an example of how sharding happens:
create a database named testDB, create a collection named table1, and insert 1,000,000 documents into this collection. Enabling sharding on the database does not shard table1, as the first db.stats() shows. shardcollection does shard table1, as the second and third db.stats() show. The second db.stats() still reports table1 as not sharded because the sharding process takes some time to finish.
Capture24
Capture25
Capture26
Capture27

 

——————————————————————————————————————————–

 

How to change chunk size?

refer to http://docs.mongodb.org/manual/administration/sharding/#sharding-balancing-modify-chunk-size

When you initialize a sharded cluster, the default chunk size is 64 megabytes. This default chunk size works well for most deployments. However, if you notice that automatic migrations are incurring a level of I/O that your hardware cannot handle, you may want to reduce the chunk size. For the automatic splits and migrations, a small chunk size leads to more rapid and frequent migrations.

To modify the chunk size, use the following procedure:
1. Connect to any mongos in the cluster using the mongo shell.

2. Issue the following command to switch to the config database:

use config

3. Issue the following save() operation:
db.settings.save( { _id: "chunksize", value: <size> } )

Where the value of <size> reflects the new chunk size in megabytes. Here, you’re essentially writing a document whose values store the global chunk size configuration value.
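For example, to set the chunk size to 32 megabytes (32 is only an example value), the whole sequence on a mongos is:

./mongo localhost:30000
> use config
> db.settings.save( { _id: "chunksize", value: 32 } )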

Note:

The chunkSize and --chunkSize options, passed at runtime to mongos, do not affect the chunk size after you have initialized the cluster.

To eliminate confusion you should always set chunk size using the above procedure and never use the runtime options.

Modifying the chunk size has several limitations:
• Automatic splitting only occurs when inserting documents or updating existing documents.
• If you lower the chunk size, it may take time for all chunks to split to the new size.
• Splits cannot be “undone.”

If you increase the chunk size, existing chunks must grow through insertion or updates until they reach the new size.

————————————————————————————-

collection remove() and drop() difference
Capture2
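The figure is not reproduced; the difference can be summarized as follows (the collection name is just an example):

> db.table1.remove({})   // deletes all documents but keeps the collection and its indexes
> db.table1.drop()       // deletes the documents, the indexes, and the collection itself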

 

remove a database
Capture3
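For example:

> use testDB
> db.dropDatabase()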

show content of a collection
Capture4
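For example:

> use testDB
> db.table1.find()      // prints documents 20 at a time in the shell
> db.table1.count()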

show sharding status
Capture28

Capture29
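The sharding status shown in the figures can be printed with either of:

> sh.status()
> db.printShardingStatus()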

 

test GridFS. On the route server ubuntu8 (192.168.10.8), create a file and upload it to GridFS
Capture31

Capture30
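The figures are not reproduced; a typical GridFS test with the bundled mongofiles tool looks roughly like this (the file name is just an example):

cd ~/mongodb-linux-x86_64-2.2.1/bin
echo "hello gridfs" > /tmp/mytest.txt
./mongofiles --host localhost --port 30000 put /tmp/mytest.txt
./mongofiles --host localhost --port 30000 list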

Read Full Post »

Solution:

First, check the Apache web site to make sure your Hadoop and Pig versions are compatible.

Then, make sure the following two environment variables are set correctly:

export PIG_CLASSPATH=~/hadoop-0.23.4/etc/hadoop

export HADOOP_HOME=~/hadoop-0.23.4

Read Full Post »

This post shows step-by-step instructions to deploy a Hadoop cluster (3 nodes) on a virtual network using VirtualBox.

NameNode: 192.168.10.1 hadoop

ResourceManager: 192.168.10.2 hadoop2

DataNode: 192.168.10.3 hadoop3

Install Virtualbox

Install Ubuntu in Virtualbox (Install 3 copies for the 3 nodes and name them as hadoop, hadoop2 and hadoop3 respectively)

download Ubuntu from http://releases.ubuntu.com/lucid/ubuntu-10.04.4-server-i386.iso

fig8
check “Enable Network Adapter”, select “Bridged Adapter”

fig9
Choose Install Ubuntu Server

fig10
Use the same username for each node (in this example, I used zcai for all three nodes)

fig11
choose OpenSSH Server to install

create passwordless ssh login between nodes
zcai@hadoop:~$ssh-keygen -t rsa
zcai@hadoop:~$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

copy the keys to the other nodes
zcai@hadoop:~$scp -r ~/.ssh 192.168.10.2:
zcai@hadoop:~$scp -r ~/.ssh 192.168.10.3:

use ssh to login to make sure the keys work.

install java jdk on each node
$sudo apt-get install openjdk-6-jdk

download the hadoop package and unpack it
zcai@hadoop:~$tar zxvf hadoop-2.0.1-alpha.tar.gz

create and edit configuration files
create a file hadoop-env.sh under ~/hadoop-2.0.1-alpha/etc/hadoop with the content the following command shows:
zcai@hadoop:~/hadoop-2.0.1-alpha/etc/hadoop$ more hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/jre
export HADOOP_HOME=~/hadoop-2.0.1-alpha
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

Set JAVA_HOME in yarn-env.sh
zcai@hadoop:~/hadoop-2.0.1-alpha/etc/hadoop$ more yarn-env.sh
…………
# some Java parameters
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/jre
…………….
……………….

zcai@hadoop:~/hadoop-2.0.1-alpha/etc/hadoop$ more yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.10.2:8040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.10.2:8025</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.10.2:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>192.168.10.2:8031</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.10.2:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

zcai@hadoop:~/hadoop-2.0.1-alpha/etc/hadoop$ more core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.10.1:8020</value>
</property>
</configuration>

create a file mapred-site.xml under ~/hadoop-2.0.1-alpha/etc/hadoop with the content the following command shows:
zcai@hadoop:~/hadoop-2.0.1-alpha/etc/hadoop$ more mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>/tmp</value>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/local</value>
</property>
</configuration>

zcai@hadoop:~/hadoop-2.0.1-alpha/etc/hadoop$ more hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/zcai/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/zcai/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

zcai@hadoop:~/hadoop-2.0.1-alpha/etc/hadoop$ more slaves
192.168.10.3

Copy the configured hadoop package to the other nodes:
zcai@hadoop:~$ scp -r hadoop-2.0.1-alpha 192.168.10.2:
zcai@hadoop:~$ scp -r hadoop-2.0.1-alpha 192.168.10.3:

Format namenode (the command should be run on the namenode 192.168.10.1):
zcai@hadoop:~/hadoop-2.0.1-alpha/bin$./hdfs namenode -format

zcai@hadoop:~/hadoop-2.0.1-alpha/sbin$ ./start-dfs.sh
12/10/17 14:51:34 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
Starting namenodes on [172.22.244.221]
192.168.10.1: starting namenode, logging to /home/zcai/hadoop-2.0.1-alpha/logs/hadoop-zcai-namenode-hadoop.out
192.168.10.3: starting datanode, logging to /home/zcai/hadoop-2.0.1-alpha/logs/hadoop-zcai-datanode-hadoop3.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/zcai/hadoop-2.0.1-alpha/logs/hadoop-zcai-secondarynamenode-hadoop.out

Start YARN (run on the ResourceManager node hadoop2)
zcai@hadoop2:~/hadoop-2.0.1-alpha/sbin$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/zcai/hadoop-2.0.1-alpha/logs/yarn-zcai-resourcemanager-hadoop2.out
192.168.10.3: starting nodemanager, logging to /home/zcai/hadoop-2.0.1-alpha/logs/yarn-zcai-nodemanager-hadoop3.out

if it is working, you will find the following processes running.
On ResourceManager node
zcai@hadoop2:~/hadoop-2.0.1-alpha/sbin$ jps
1811 ResourceManager
2062 Jps

On namenode
zcai@hadoop:~/hadoop-2.0.1-alpha/sbin$ jps
13079 NameNode
13362 Jps
13312 SecondaryNameNode

On datanode:
zcai@hadoop3:~/hadoop-2.0.1-alpha/sbin$ jps
9886 DataNode
10050 NodeManager
10237 Jps

Test HDFS:
zcai@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop fs -mkdir caitest
12/10/17 16:25:33 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
zhengqiu@hadoop3:~/hadoop-2.0.1-alpha/bin$ ./hadoop fs -lsr
lsr: DEPRECATED: Please use ‘ls -R’ instead.
12/10/17 16:25:48 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
drwxr-xr-x - zcai supergroup 0 2012-10-17 16:25 caitest
zcai@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop fs -put x caitest
12/10/17 16:26:11 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
zhengqiu@hadoop3:~/hadoop-2.0.1-alpha/bin$ ./hadoop fs -lsr
lsr: DEPRECATED: Please use ‘ls -R’ instead.
12/10/17 16:26:15 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
drwxr-xr-x - zcai supergroup 0 2012-10-17 16:26 caitest
-rw-r--r-- 1 zcai supergroup 12814 2012-10-17 16:26 caitest/x

go to http://192.168.10.2:8088 (yarn.resourcemanager.webapp.address in yarn-site.xml)

fig7

Run a map/reduce example application on the virtual hadoop cluster to ensure it is working
download an earlier version of Hadoop, such as hadoop-0.20.XX.tar.gz, uncompress it, and you will find the examples jar hadoop-examples-0.20-xxx.jar.

zcai@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop fs -mkdir test-input
12/10/18 11:34:21 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
zcai@hadoop3:~/hadoop-2.0.1-alpha/bin$ ./hadoop fs -put x test-input
12/10/18 11:34:35 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
zcai@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop fs -ls -R
12/10/18 11:34:44 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
drwxr-xr-x - zcai supergroup 0 2012-10-18 11:34 test-input
-rw-r--r-- 1 zcai supergroup 12814 2012-10-18 11:34 test-input/x

zcai@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop jar ~/hadoop-examples-0.20.205.0.jar wordcount test-input test-output
12/10/18 11:40:18 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
12/10/18 11:40:19 INFO input.FileInputFormat: Total input paths to process : 1
12/10/18 11:40:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/18 11:40:19 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/18 11:40:19 INFO mapreduce.JobSubmitter: number of splits:1
12/10/18 11:40:19 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
12/10/18 11:40:19 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
12/10/18 11:40:19 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
12/10/18 11:40:19 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
12/10/18 11:40:19 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
12/10/18 11:40:19 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
12/10/18 11:40:19 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
12/10/18 11:40:19 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/10/18 11:40:19 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
12/10/18 11:40:19 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
12/10/18 11:40:19 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
12/10/18 11:40:19 INFO mapred.ResourceMgrDelegate: Submitted application application_1350573728815_0002 to ResourceManager at /172.22.244.177:8040
12/10/18 11:40:19 INFO mapreduce.Job: The url to track the job: http://192.168.10.2:8088/proxy/application_1350573728815_0002/
12/10/18 11:40:19 INFO mapreduce.Job: Running job: job_1350573728815_0002
12/10/18 11:40:26 INFO mapreduce.Job: Job job_1350573728815_0002 running in uber mode : false
12/10/18 11:40:26 INFO mapreduce.Job: map 0% reduce 0%
12/10/18 11:40:31 INFO mapreduce.Job: map 100% reduce 0%
12/10/18 11:40:32 INFO mapreduce.Job: map 100% reduce 100%
12/10/18 11:40:32 INFO mapreduce.Job: Job job_1350573728815_0002 completed successfully
12/10/18 11:40:32 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=12511
FILE: Number of bytes written=118934
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=12932
HDFS: Number of bytes written=11718
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=23944
Total time spent by all reduces in occupied slots (ms)=25504
Map-Reduce Framework
Map input records=38
Map output records=269
Map output bytes=13809
Map output materialized bytes=12271
Input split bytes=118
Combine input records=269
Combine output records=137
Reduce input groups=137
Reduce shuffle bytes=12271
Reduce input records=137
Reduce output records=137
Spilled Records=274
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=243
CPU time spent (ms)=960
Physical memory (bytes) snapshot=199864320
Virtual memory (bytes) snapshot=749105152
Total committed heap usage (bytes)=136843264
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=12814
File Output Format Counters
Bytes Written=11718

Check the result
zcai@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop fs -ls -R
12/10/18 11:59:19 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
drwxr-xr-x - zhengqiu supergroup 0 2012-10-18 11:34 test-input
-rw-r--r-- 1 zhengqiu supergroup 12814 2012-10-18 11:34 test-input/x
drwxr-xr-x - zhengqiu supergroup 0 2012-10-18 11:40 test-output
-rw-r--r-- 1 zhengqiu supergroup 0 2012-10-18 11:40 test-output/_SUCCESS
-rw-r--r-- 1 zhengqiu supergroup 11718 2012-10-18 11:40 test-output/part-r-00000

Show word count statistics
zcai@hadoop:~/hadoop-2.0.1-alpha/bin$ ./hadoop fs -cat test-output/part-r-00000

go to http://192.168.10.2:8088 (yarn.resourcemanager.webapp.address in yarn-site.xml)

fig13

done.

Read Full Post »

This post shows how to deploy a Eucalyptus cloud on a virtual Ubuntu machine. All components (CLC, CC, SC, Walrus and NC) are deployed on a single virtual machine. In this deployment, instances cannot actually be launched because the node itself is a virtual machine, but the other features work.

Install virtual box

Install Ubuntu

Install Eucalyptus (refer to http://www.eucalyptus.com/docs/3.1/ig/installing_euca_ubuntu.html#installing_euca_ubuntu)

Reboot Ubuntu

Capture8
Register the cluster controller. If it is not working, deregister it and register it again. Walrus should be registered automatically.

Capture9
Go to https://front-end-ip:8443, open the Credentials tab, and download the credentials as name-of-credentials.zip

mkdir ~/.euca
cd ~/.euca
unzip name-of-credentials.zip
chmod 0700 ~/.euca
chmod 0600 ~/.euca/*
. ~/.euca/eucarc
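After sourcing eucarc, you can check that the credentials work with the euca2ools command-line tools, for example:

euca-describe-availability-zones verbose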

Capture2
Go to the Store tab, where you will see some available images. Select the image you want and install it

If the following message appears:
ubuntu cloud Error 60: server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
Do the following steps (use at your own risk; reference: http://iamnotrhce.blogspot.com/2012/02/ubuntu-cloud-error-60-server.html):
1. cd /usr/lib/python2.6/dist-packages/imagestore/lib/
2. cp fetch.py fetch.py_original
3. sudo vim fetch.py
After line 143, add the following two lines:
##############################################
curl.setopt(pycurl.SSL_VERIFYPEER, 0)
curl.setopt(pycurl.SSL_VERIFYHOST, 0)
##############################################
4. sudo wget -P /usr/local/share/ca-certificates/ --no-check-certificate https://certs.godaddy.com/repository/gd-class2-root.crt https://certs.godaddy.com/repository/gd_intermediate.crt https://certs.godaddy.com/repository/gd_cross_intermediate.crt
5. sudo update-ca-certificates
6. sudo service image-store-proxy restart

Capture3
After installation, you will find the installed images under Images tab

Capture4
Use the command line tool to check the installed image
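The command-line check is likely something along these lines with euca2ools (the exact command in the figure may differ):

euca-describe-images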

Install Elasticfox (http://s3.amazonaws.com/ec2-downloads/elasticfox.xpi)

Capture1
Use the Firefox Elasticfox plugin to interact with the cloud. First we need to set the endpoint. Click on "Regions"; the region name is "cluster1", which is the cluster controller I registered. You can find the endpoint information in the file "eucarc" in the downloaded credentials; see the figure below: EC2_URL is the endpoint URL.

Capture10

Capture7
Set the user account. The account information can also be found in the file "eucarc" in the downloaded credentials.

Capture5
create an EBS volume
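The equivalent euca2ools commands would be roughly the following (the 1 GB size and the zone/cluster name "cluster1" are assumptions based on this setup):

euca-create-volume -s 1 -z cluster1
euca-describe-volumes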

Capture6
Use Elasticfox to view the volume I created

Capture11
Add a keypair, caikeypair.private will be used to ssh to the instance
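The keypair can also be created from the command line with euca2ools, roughly as follows (the keypair name matches the one used here):

euca-add-keypair caikeypair > caikeypair.private
chmod 0600 caikeypair.private
euca-describe-keypairs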

Capture12
Use Elasticfox to view the key pairs

Capture13
Run an instance. Note: the instance cannot actually be launched here; its status will stay pending for a while and then go to terminated, since the node is itself a virtual machine.
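For reference, launching an instance from the command line with euca2ools looks roughly like this; emi-XXXXXXXX is a placeholder for the image id reported by euca-describe-images:

euca-run-instances -k caikeypair -t m1.small emi-XXXXXXXX
euca-describe-instances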

Read Full Post »