Tuesday, November 19, 2013

Commands in HDFS:

           Creating a directory in HDFS
           ->hadoop fs -mkdir /user/systemname/foldername
                       OR
           ->hadoop fs -mkdir foldername

           Deleting a folder in HDFS
           ->hadoop fs -rmr /user/systemname/foldername
                       OR
           ->hadoop fs -rmr foldername
           
           Deleting a file in HDFS
           ->hadoop fs -rm /user/systemname/foldername/filename
                       OR
           ->hadoop fs -rm foldername/filename
           
           Copying a file from one location to another within HDFS
           ->hadoop fs -cp source destination
           
           Moving a file from one location to another within HDFS
           ->hadoop fs -mv source destination
           
           Putting a file into HDFS (local -> HDFS)
           ->hadoop fs -put /source /destination
                       OR
           ->hadoop fs -copyFromLocal /source /destination
           
           Getting a file from HDFS (HDFS -> local)
           ->hadoop fs -get /source /destination
                       OR
           ->hadoop fs -copyToLocal /source /destination
           
           Getting the list of files in HDFS
           ->hadoop fs -ls
           
           Reading the content of a file in HDFS
           ->hadoop fs -cat filelocation   (use hadoop fs -tail filelocation for just the end of the file)
           
           Checking HDFS condition
           ->hadoop fsck /
           
           Getting the health of HDFS (as the hdfs user)
           ->sudo -u hdfs hadoop fsck /
           ->sudo -u hdfs hadoop fsck /sourcepath -files -blocks -racks
           
           Extracting a tar.gz file (runs on the local filesystem, not inside HDFS)
           ->tar -xzvf filename
           

Monday, November 18, 2013

Hadoop Installation Steps on CentOS


// Installation Steps on CentOS //
// Suppose MyClusterOne is the master, MyClusterTwo and MyClusterThree are slaves, and hdpuser is the user on all of them //

Step 1: Install Java & Eclipse using System -> Administration -> Add/Remove Programs -> Package
        Collection; select the needed packages and download.
                      (OR)
    Download latest java package from "http://www.oracle.com/technetwork/java/javase/downloads/....."

    1)cd /opt
    2)tar -xzf /home/hdpuser/Downloads/jdk-7u40-linux-x64.tar.gz   (this extracts to /opt/jdk1.7.0_40)
    3)alternatives --install /usr/bin/java java /opt/jdk1.7.0_40/bin/java 2
    4)alternatives --config java

Step 2: Configure Environment variables for JAVA
    # export JAVA_HOME=/opt/jdk1.7.0_40
    # export JRE_HOME=/opt/jdk1.7.0_40/jre
    # export PATH=$PATH:/opt/jdk1.7.0_40/bin:/opt/jdk1.7.0_40/jre/bin

Step 3: Steps to Give SUDO permission to User:
    1) Go to the terminal and type "su -"; it will switch to the root user after you enter the root password.
    2) Type "visudo"; it will open the "sudoers" file. Enter "i" to edit the file.
    3) Add user details after line
        ## Allow root to run any commands anywhere 
        Ex : hdpuser    ALL=(ALL)    ALL
    4) Add Password permission details after line
        ## Same thing without a password
        Ex : hdpuser    ALL=(ALL)    NOPASSWD: ALL
    5) Press Esc, then enter ":x" to save and exit.

Step 4: Create User ( We can also use existing user)
    1) sudo useradd hdpuser
    2) sudo passwd hdpuser , then enter the new password.

Step 5: Edit Host file
    1) open /etc/hosts 
    2) Enter the master and slave node IP addresses on every node (master and slaves)
        xxx.xx.xx.xxx    MyClusterOne
        xxx.xx.xx.xxx    MyClusterTwo ....

Step 6: Configuring Key Based Login
    1)Type "su - hdpuser"
    2)Type "ssh-keygen -t rsa"
    3)Type "sudo service sshd restart" to restart the service
    4)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterOne"
    5)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterTwo"
    6)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterThree" 
           ( Add details of all slave nodes one by one)
           ( If there is any connection error, start the sshd service on all slaves)
    7)Type "chmod 0600 ~/.ssh/authorized_keys"
    8)Exit
    9)ssh-add

Step 7: Download and Extract Hadoop Source
    1) cd /opt
    2) sudo wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
    3) sudo tar -xzvf hadoop-1.2.1.tar.gz
    4) sudo chown -R hdpuser /opt/hadoop-1.2.1
    5) cd /opt/hadoop-1.2.1/conf

Step 8: Edit configuration files
    1) gedit conf/core-site.xml  ( Master Node Details)
    
       <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://MyClusterOne:9000/</value>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
        </configuration>

    2) gedit conf/hdfs-site.xml     ( Master Node Details)
        <configuration>
        <property>
            <name>dfs.data.dir</name>
            <value>/opt/hadoop-1.2.1/dfs/name/data</value>
            <final>true</final>
        </property>
    
        <property>
            <name>dfs.name.dir</name>
            <value>/opt/hadoop-1.2.1/dfs/name</value>
            <final>true</final>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        </configuration>
    3) gedit conf/mapred-site.xml
        <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>MyClusterOne:9001</value>
        </property>
        </configuration>
    
    4) gedit conf/hadoop-env.sh
        export JAVA_HOME=/opt/jdk1.7.0_40
        export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
        export HADOOP_CONF_DIR=/opt/hadoop-1.2.1/conf

Step 9: Copy Hadoop Source to Slave Servers 
    1)su - hdpuser
    2)cd /opt/hadoop-1.2.1
    3)scp -r /opt/hadoop-1.2.1 hdpuser@MyClusterTwo:/opt/
    4)scp -r /opt/hadoop-1.2.1 hdpuser@MyClusterThree:/opt/ ....

Step 10 : Configure Hadoop on Master Server Only
    1) su - hdpuser
    2) cd /opt/hadoop-1.2.1
    3) gedit conf/masters ( Add Master Node Name)
        MyClusterOne
    4) gedit conf/slaves ( Add Slave Node Names)
        MyClusterTwo
        MyClusterThree

Step 11 : To communicate with the slaves, the firewall needs to be turned off
    1) sudo service iptables save
    2) sudo service iptables stop
    3) sudo chkconfig iptables off
Step 12: To use all system space or for a NameNode backup location
    1) Edit core-site.xml and add a folder under /home/
    2) We should give permission to that created folder:
       sudo chmod 755 /home/foldername
Step 13: Add the Hadoop path details (HADOOP_HOME and PATH) in hadoop.sh
    1) gedit /etc/profile.d/hadoop.sh

Step 14: Format Name Node on Hadoop Master only
    1) su - hdpuser
    2) cd /opt/hadoop-1.2.1
    3) bin/hadoop namenode -format

Step 15 : Start Hadoop Services
    1) bin/start-all.sh

Comma Separated values


-- Sample Table 

create table #Emp(Name varchar(10),Skills varchar(max))

insert into #Emp values('Ramesh','Hadoop,SQL,DBA')
insert into #Emp values('Arjun','SQL,MSBI')
insert into #Emp values('Mohan','Java')

select * from #Emp

Name  Skills
---------------------------
Ramesh  Hadoop,SQL,DBA
Arjun  SQL,MSBI
Mohan  Java

-- Code to Display below output

Name  Skill
---------------------
Arjun  MSBI
Arjun  SQL
Mohan  Java
Ramesh  DBA
Ramesh  Hadoop
Ramesh  SQL


SELECT DISTINCT Name,LTRIM(RTRIM(i.value('.', 'VARCHAR(MAX)'))) AS SKILL into #sample
FROM
(
SELECT Name,Skills
,CAST('<i>' + REPLACE(Skills, ',', '</i><i>') + '</i>' AS XML) AS Des
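-- e.g. 'Hadoop,SQL,DBA' becomes '<i>Hadoop</i><i>SQL</i><i>DBA</i>', so each skill is a separate <i> XML node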
FROM #Emp
) List
CROSS APPLY Des.nodes('//i') x(i)


select * from #sample 

-- For the per-skill rows in #sample above, code to re-combine them into the output below

Name  Skills
---------------------------
Ramesh  Hadoop,SQL,DBA
Arjun  SQL,MSBI
Mohan  Java

SELECT Distinct Name
,STUFF((SELECT ','+SKILL FROM #sample f where f.Name=s.Name FOR XML PATH('')),1,1,'') AS Skills
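-- FOR XML PATH('') concatenates the skills as ',A,B,C'; STUFF(...,1,1,'') removes the leading comma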
from #sample s

Thursday, January 31, 2013

Differences....

Differences Between Function and Procedure

  • A procedure can return zero or n values, whereas a function must return exactly one value.
  • Procedures can have input/output parameters, whereas functions can have only input parameters.
  • A procedure allows SELECT as well as DML statements in it, whereas a function allows only SELECT statements.
  • Functions can be called from a procedure, whereas procedures cannot be called from a function.
  • Exceptions can be handled with a TRY-CATCH block in a procedure, whereas TRY-CATCH cannot be used in a function.
  • Transaction management is possible in a procedure, but not in a function.
  • Procedures cannot be used in a SELECT statement, whereas a function can be embedded in a SELECT statement.
  • A UDF can be used anywhere in the SELECT/WHERE/HAVING clauses of a SQL statement, whereas stored procedures cannot.
  • UDFs that return tables can be treated as another rowset and used in JOINs with other tables.
  • Inline UDFs can be thought of as views that take parameters and can be used in JOINs and other rowset operations (see the sketch below).
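
A minimal sketch of a few of these points, assuming a simple sample table (dbo.Employee, dbo.GetBonus and dbo.usp_RaiseSalary are made up for illustration): the function can sit inside a SELECT, while the procedure contains DML and has to be run with EXEC.

-- Hypothetical sample table, only for illustration
CREATE TABLE dbo.Employee (EmpId INT, Salary MONEY)
INSERT INTO dbo.Employee VALUES (1, 1000)
GO

-- Scalar function: exactly one (mandatory) return value
CREATE FUNCTION dbo.GetBonus (@Salary MONEY)
RETURNS MONEY
AS
BEGIN
    RETURN @Salary * 0.10
END
GO

-- Procedure: can take an OUTPUT parameter and run DML
CREATE PROCEDURE dbo.usp_RaiseSalary
    @EmpId INT,
    @NewSalary MONEY OUTPUT
AS
BEGIN
    UPDATE dbo.Employee SET Salary = Salary * 1.05 WHERE EmpId = @EmpId  -- DML allowed here
    SELECT @NewSalary = Salary FROM dbo.Employee WHERE EmpId = @EmpId
END
GO

-- The function can be embedded in a SELECT; a procedure cannot
SELECT EmpId, Salary, dbo.GetBonus(Salary) AS Bonus FROM dbo.Employee

-- The procedure has to be executed
DECLARE @s MONEY
EXEC dbo.usp_RaiseSalary @EmpId = 1, @NewSalary = @s OUTPUT
SELECT @s AS NewSalary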

Differences Between Char, Varchar and Nvarchar

  • Char datatype is used to store a fixed length of characters. For example, if we declare char(10), it allocates storage for 10 characters. If we insert a 6-character word, only 6 characters of that storage hold data and the other 4 characters are wasted on padding.
  • Varchar means variable characters and is used to store non-Unicode characters. It allocates storage based on the number of characters inserted. For example, if we declare varchar(10), no character storage is used at declaration time, and if we insert a 6-character word, storage is used for only those 6 characters.
  • Nvarchar is the same as varchar, the only difference being that nvarchar stores Unicode characters, which lets you store multiple languages in the database. Nvarchar takes twice as much space to store the extended character set required by other languages (see the DATALENGTH example below).
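
A quick way to see the storage difference is DATALENGTH, which returns the number of bytes actually stored (a small sketch; the variable names are arbitrary and the inline DECLARE assignment assumes SQL Server 2008 or later).

DECLARE @c  CHAR(10)     = 'Hadoop'   -- fixed length: always padded out to 10 characters
DECLARE @v  VARCHAR(10)  = 'Hadoop'   -- variable length, non-Unicode: 1 byte per character stored
DECLARE @nv NVARCHAR(10) = 'Hadoop'   -- variable length, Unicode: 2 bytes per character stored

SELECT DATALENGTH(@c)  AS CharBytes,      -- 10 (4 of them wasted on padding)
       DATALENGTH(@v)  AS VarcharBytes,   -- 6
       DATALENGTH(@nv) AS NvarcharBytes   -- 12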

Differences Between Primary Key and Unique Key

  • Primary Key creates a clustered index by default, whereas Unique Key creates a non-clustered index
  • Primary Key doesn't allow NULLs, whereas Unique Key allows exactly one NULL value (illustrated in the sketch below)
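
A small sketch of both points (dbo.Customer and its columns are made up for illustration): the primary key gets the clustered index and rejects NULL, while the unique key gets a non-clustered index and accepts a single NULL.

CREATE TABLE dbo.Customer
(
    CustomerId INT NOT NULL PRIMARY KEY,   -- clustered index by default, NULL not allowed
    Email      VARCHAR(100) NULL UNIQUE    -- non-clustered index, exactly one NULL allowed
)

INSERT INTO dbo.Customer VALUES (1, 'a@b.com')
INSERT INTO dbo.Customer VALUES (2, NULL)       -- allowed once
-- INSERT INTO dbo.Customer VALUES (3, NULL)    -- would fail: duplicate value (NULL) in the unique key

-- The index types can be verified from sys.indexes
SELECT name, type_desc FROM sys.indexes WHERE object_id = OBJECT_ID('dbo.Customer')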

Differences Between Delete, Drop and Truncate

DELETE

  • Delete is a DML command
  • We can use a WHERE clause to filter data; it deletes only the rows that match the condition
  • A DELETE statement is executed using a row lock; each row in the table is locked for deletion
  • DELETE retains the identity counter
  • DELETE activates triggers because the operations are logged individually
  • Slower than TRUNCATE because it keeps logs
  • Rollback is possible

DROP

  • DROP is a DDL command
  • Removes all rows and also the table definition, including indexes, triggers, grants and storage parameters
  • No filter criteria allowed; it removes all rows and no triggers are fired
  • The DROP ANY privilege on a specific table cannot be granted to another user or role.
  • A DROP operation cannot be rolled back in any way, as it is auto-committed, while a DELETE operation can be rolled back and is not auto-committed.

TRUNCATE

  • TRUNCATE is a DDL command
  • It removes all the data
  • TRUNCATE TABLE always locks the table and pages, but not each row
  • If the table contains an identity column, the counter for that column is reset to the seed value defined for the column
  • TRUNCATE TABLE cannot activate a trigger because the operation does not log individual row deletions
  • Faster performance-wise, because it doesn't keep individual row logs
  • Rollback is possible within a TRANSACTION (see the sketch below)
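
A short sketch tying the three together on a throwaway #Demo table (made up for illustration); it shows the WHERE filter and identity behaviour of DELETE, the identity reset of TRUNCATE, and TRUNCATE rolling back inside a transaction, before DROP removes the table definition itself.

CREATE TABLE #Demo (Id INT IDENTITY(1,1), Name VARCHAR(10))
INSERT INTO #Demo (Name) VALUES ('A'), ('B'), ('C')

DELETE FROM #Demo WHERE Name = 'B'      -- DML: filter allowed, identity counter kept
INSERT INTO #Demo (Name) VALUES ('D')   -- gets Id = 4, the counter was not reset

TRUNCATE TABLE #Demo                    -- DDL: removes all rows, resets identity to its seed
INSERT INTO #Demo (Name) VALUES ('E')   -- gets Id = 1 again

BEGIN TRANSACTION
    TRUNCATE TABLE #Demo                -- TRUNCATE can be rolled back inside a transaction
ROLLBACK TRANSACTION
SELECT * FROM #Demo                     -- the row 'E' is back

DROP TABLE #Demo                        -- DDL: removes the rows and the table definition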