Tuesday, November 19, 2013

Commands in HDFS:

           Creating Directory in HDFS
           ->hadoop fs -mkdir /user/systemname/foldername
                       OR
           ->hadoop fs -mkdir foldername

           Delete a Folder in HDFS
           ->hadoop fs -rmr /user/systemname/foldername
                       OR
           ->hadoop fs -rmr foldername

           Delete a File in HDFS
           ->hadoop fs -rm /user/systemname/foldername/filename
                       OR
           ->hadoop fs -rm foldername/filename

           Copying a file from one location to another location in HDFS
           ->hadoop fs -cp source destination

           Moving a file from one location to another location in HDFS
           ->hadoop fs -mv source destination

           Putting a file into HDFS (local filesystem to HDFS)
           ->hadoop fs -put /source /destination
                       OR
           ->hadoop fs -copyFromLocal /source /destination

           Getting a file from HDFS (HDFS to local filesystem)
           ->hadoop fs -get /source /destination
                       OR
           ->hadoop fs -copyToLocal /source /destination
           
           Getting a list of files in HDFS
           ->hadoop fs -ls
           
           Reading the content of a file in HDFS
           ->hadoop fs -cat filelocation    (for just the start or end of a file: hadoop fs -cat filelocation | head, or hadoop fs -tail filelocation)
           
           Checking HDFS condition
           ->hadoop fsck /

           Getting the health of Hadoop
           ->sudo -u hdfs hadoop fsck /
           ->sudo -u hdfs hadoop fsck /sourcepath -files -blocks -racks
           
           Extracting a tar.gz archive (on the local filesystem, e.g. before putting the files into HDFS)
           ->tar -xzvf filename
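
           Example: a short end-to-end session (a minimal sketch; the local file /home/hdpuser/sample.txt and the folder name "demo" are made up for illustration)
           ->hadoop fs -mkdir demo
           ->hadoop fs -put /home/hdpuser/sample.txt demo/
           ->hadoop fs -ls demo
           ->hadoop fs -cat demo/sample.txt
           ->hadoop fs -rm demo/sample.txt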
           

Monday, November 18, 2013

Hadoop Installation steps on Centos


// Installation Steps on CentOS //
// Suppose MyClusterOne is the master, MyClusterTwo and MyClusterThree are slaves, and hdpuser is the user on all nodes //

Step 1: Install Java & Eclipse using System -> Administration -> Add or Remove Programs -> Package
        Collection; select the needed packages and download.
                      (OR)
    Download latest java package from "http://www.oracle.com/technetwork/java/javase/downloads/....."

    1)cd /opt
    2)tar -xzf /home/hdpuser/Downloads/jdk-7u40-linux-x64.tar.gz
    3)alternatives --install /usr/bin/java java /opt/jdk1.7.0_40/bin/java 2
    4)alternatives --config java
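
    To verify the Java install (a quick check, assuming the alternatives step above succeeded):
    java -version        (should report java version "1.7.0_40")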

Step 2: Configure Environment variables for JAVA
    # export JAVA_HOME=/opt/jdk1.7.0_40
    # export JRE_HOME=/opt/jdk1.7.0_40/jre
    # export PATH=$PATH:/opt/jdk1.7.0_40/bin:/opt/jdk1.7.0_40/jre/bin
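
    These exports only last for the current shell. A minimal sketch to make them permanent, assuming the variables go into ~/.bashrc (using /etc/profile.d works as well):
    # echo 'export JAVA_HOME=/opt/jdk1.7.0_40' >> ~/.bashrc
    # echo 'export JRE_HOME=/opt/jdk1.7.0_40/jre' >> ~/.bashrc
    # echo 'export PATH=$PATH:/opt/jdk1.7.0_40/bin:/opt/jdk1.7.0_40/jre/bin' >> ~/.bashrc
    # source ~/.bashrc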

Step 3: Steps to Give SUDO permission to User:
    1) Go to a terminal and type "su -"; it switches to root after you enter the root password.
    2) Type "visudo"; it opens the "sudoers" file. Press "i" to edit the file.
    3) Add user details after line
        ## Allow root to run any commands anywhere 
        Ex : hdpuser    ALL=(ALL)    ALL
    4) Add Password permission details after line
        ## Same thing without a password
        Ex : hdpuser    ALL=(ALL)    NOPASSWD: ALL
    5) Press Esc, then enter ":x" to save and exit.

Step 4: Create User ( We can also use existing user)
    1) sudo useradd hdpuser
    2) sudo passwd hdpuser , then enter the new password.

Step 5: Edit Host file
    1) Open /etc/hosts
    2) Enter the master and slave node IP addresses and hostnames on every node (master and slaves)
        xxx.xx.xx.xxx    MyClusterOne
        xxx.xx.xx.xxx    MyClusterTwo ....
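        For example (the addresses below are placeholders, not real cluster IPs):
        192.168.1.101    MyClusterOne
        192.168.1.102    MyClusterTwo
        192.168.1.103    MyClusterThree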

Step 6: Configuring Key Based Login
    1)Type "su - hdpuser"
    2)Type "ssh-keygen -t rsa"
    3)Type "sudo service sshd restart" to restart the service
    4)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterOne"
    5)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterTwo"
    6)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterThree" 
           ( Add details of all slave nodes one by one)
           ( If there is a connection error, start the sshd service on all slaves first)
    7)Type "chmod 0600 ~/.ssh/authorized_keys"
    8)Exit
    9)ssh-add
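
    To confirm key-based login works (a quick check; the command should run without asking for a password):
    ssh hdpuser@MyClusterTwo hostname        (should print MyClusterTwo)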

Step 7: Download and Extract Hadoop Source
    1) cd /opt
    2) sudo wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
    3) sudo tar -xzvf hadoop-1.2.1.tar.gz
    4) sudo chown -R hdpuser /opt/hadoop-1.2.1
    5) cd /opt/hadoop-1.2.1/conf

Step 8: Edit configuration files
    1) gedit conf/core-site.xml  ( Master Node Details)
    
       <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://MyClusterOne:9000/</value>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
        </configuration>

    2) gedit conf/hdfs-site.xml     ( Master Node Details)
        <configuration>
        <property>
            <name>dfs.data.dir</name>
            <value>/opt/hadoop-1.2.1/dfs/name/data</value>
            <final>true</final>
        </property>
    
        <property>
            <name>dfs.name.dir</name>
            <value>/opt/hadoop-1.2.1/dfs/name</value>
            <final>true</final>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        </configuration>
    3) gedit conf/mapred-site.xml
        <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>MyClusterOne:9001</value>
        </property>
        </configuration>
    
    4) gedit conf/hadoop-env.sh
        export JAVA_HOME=/opt/jdk1.7.0_40
        export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
        export HADOOP_CONF_DIR=/opt/hadoop-1.2.1/conf

Step 9: Copy Hadoop Source to Slave Servers 
    1)su - hdpuser
    2)cd /opt
    3)scp -r hadoop-1.2.1 hdpuser@MyClusterTwo:/opt/
    4)scp -r hadoop-1.2.1 hdpuser@MyClusterThree:/opt/ ....
      ( /opt on each slave must be writable by hdpuser)

Step 10 : Configure Hadoop on Master Server Only
    1) su - hdpuser
    2) cd /opt/hadoop-1.2.1
    3) gedit conf/masters ( Add Master Node Name)
        MyClusterOne
    4) gedit conf/slaves ( Add Slave Node Names)
        MyClusterTwo
        MyClusterThree

Step 11: To communicate with the slaves, the firewall needs to be OFF
    1) sudo service iptables save
    2) sudo service iptables stop
    3) sudo chkconfig iptables off
Step 12: To use all system space or to keep a NameNode backup under /home
    1) Edit core-site.xml and add a folder created under /home/ as the storage location
    2) Give permission to that created folder:
       sudo chmod 755 /home/foldername
Step 13: Add the Hadoop path details in hadoop.sh
    1) gedit /etc/profile.d/hadoop.sh
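       A minimal sketch of what hadoop.sh can contain, assuming the install paths used above (/opt/hadoop-1.2.1 and /opt/jdk1.7.0_40); the variable choices are an assumption, not shown in the original steps:
       export HADOOP_HOME=/opt/hadoop-1.2.1
       export JAVA_HOME=/opt/jdk1.7.0_40
       export PATH=$PATH:$HADOOP_HOME/bin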

Step 14: Format the NameNode on the Hadoop master only
    1) su - hdpuser
    2) cd /opt/hadoop-1.2.1
    3) bin/hadoop namenode -format

Step 15: Start Hadoop Services
    1) bin/start-all.sh
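
    To confirm the daemons started (a quick check; the daemon names below are what this master/slave layout should show):
    jps                              (master: NameNode, SecondaryNameNode, JobTracker; slaves: DataNode, TaskTracker)
    bin/hadoop dfsadmin -report      (should list the slave nodes as live datanodes)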

Comma Separated values


-- Sample Table 

create table #Emp(Name varchar(10),Skills varchar(max))

insert into #Emp values('Ramesh','Hadoop,SQL,DBA')
insert into #Emp values('Arjun','SQL,MSBI')
insert into #Emp values('Mohan','Java')

select * from #Emp

Name  Skills
---------------------------
Ramesh  Hadoop,SQL,DBA
Arjun  SQL,MSBI
Mohan  Java

-- Code to Display below output

Name  Skill
---------------------
Arjun  MSBI
Arjun  SQL
Mohan  Java
Ramesh  DBA
Ramesh  Hadoop
Ramesh  SQL


SELECT DISTINCT Name,LTRIM(RTRIM(i.value('.', 'VARCHAR(MAX)'))) AS SKILL into #sample
FROM
(
SELECT Name,Skills
,CAST('<i>' + REPLACE(Skills, ',', '</i><i>') + '</i>' AS XML) AS Des
FROM #Emp
) List
CROSS APPLY Des.nodes('//i') x(i)


select * from #sample 

-- For the above output, code to display the output below

Name  Skills
---------------------------
Ramesh  Hadoop,SQL,DBA
Arjun  SQL,MSBI
Mohan  Java

SELECT Distinct Name
,STUFF((SELECT ','+SKILL FROM #sample f where f.Name=s.Name FOR XML PATH('')),1,1,'') AS Skills
from #sample s