Creating a directory in HDFS (a relative path is resolved under the user's HDFS home directory, /user/systemname)
->hadoop fs -mkdir /user/systemname/foldername
OR
->hadoop fs -mkdir foldername

Deleting a folder in HDFS (newer Hadoop releases use "-rm -r" in place of "-rmr")
->hadoop fs -rmr /user/systemname/foldername
OR
->hadoop fs -rmr foldername

Deleting a file in HDFS
->hadoop fs -rm /user/systemname/foldername/filename
OR
->hadoop fs -rm foldername/filename

Copying a file from one location to another in HDFS
->hadoop fs -cp source destination

Moving a file from one location to another in HDFS
->hadoop fs -mv source destination

Putting a local file into HDFS
->hadoop fs -put /source /destination
OR
->hadoop fs -copyFromLocal /source /destination

Getting a file from HDFS to the local file system
->hadoop fs -get /source /destination
OR
->hadoop fs -copyToLocal /source /destination

Listing files in HDFS
->hadoop fs -ls

Reading the content of a file in HDFS (the output can be piped to head or tail)
->hadoop fs -cat filelocation

Checking the condition of HDFS
->hadoop fsck /

Getting the health of Hadoop
->sudo -u hdfs hadoop fsck /
->sudo -u hdfs hadoop fsck /sourcepath -files -blocks -racks

Unzipping a .tar.gz file (on the local file system)
->tar xzvf filename
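The commands above can be chained into one round trip: create a folder, upload a file, list and read it, download it again, then clean up. The script below is only a sketch. The names run, hdfs_roundtrip, DRY_RUN, sample.txt and demo are made up for illustration and are not part of Hadoop. By default it just prints the commands it would run, so it is safe to try on a machine without a cluster.

```shell
#!/usr/bin/env bash
# Sketch: one round trip through HDFS with the commands listed above.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
# Set DRY_RUN=0 on a machine where the hadoop client is configured.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# hdfs_roundtrip LOCAL_FILE HDFS_DIR (both arguments are hypothetical names)
hdfs_roundtrip() {
  local local_file="$1"   # local file to upload
  local hdfs_dir="$2"     # HDFS folder, relative to the user's HDFS home
  local name
  name="$(basename "$local_file")"

  # Each step runs only if the previous one succeeded.
  run hadoop fs -mkdir "$hdfs_dir" &&
  run hadoop fs -put "$local_file" "$hdfs_dir/" &&
  run hadoop fs -ls "$hdfs_dir" &&
  run hadoop fs -cat "$hdfs_dir/$name" &&
  run hadoop fs -get "$hdfs_dir/$name" "$name.copy" &&
  run hadoop fs -rm "$hdfs_dir/$name" &&
  run hadoop fs -rmr "$hdfs_dir"
}

hdfs_roundtrip "${1:-sample.txt}" "${2:-demo}"
```

Saved as a script, it takes the local file and the HDFS folder as its two arguments; with DRY_RUN=0 the same seven commands are actually executed in order.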
Tuesday, November 19, 2013
Commands in HDFS
Monday, November 18, 2013
Hadoop Installation Steps on CentOS
// Installation steps on CentOS //
// Suppose MyClusterOne is the master, MyClusterTwo and MyClusterThree are the slaves, and hdpuser is the user on all nodes //

Step 1: Install Java and Eclipse
Use System -> Administration -> Add or Remove Programs -> Package Collection, select the needed packages and download them.
(OR)
Download the latest Java package from "http://www.oracle.com/technetwork/java/javase/downloads/....." and extract it under /opt (the archive unpacks into /opt/jdk1.7.0_40):
1) cd /opt
2) tar -xzf /home/hdpuser/Downloads/jdk-7u40-linux-x64.tar.gz
3) alternatives --install /usr/bin/java java /opt/jdk1.7.0_40/bin/java 2
4) alternatives --config java

Step 2: Configure environment variables for Java
# export JAVA_HOME=/opt/jdk1.7.0_40
# export JRE_HOME=/opt/jdk1.7.0_40/jre
# export PATH=$PATH:/opt/jdk1.7.0_40/bin:/opt/jdk1.7.0_40/jre/bin

Step 3: Give sudo permission to the user
1) Open a terminal and type "su -", then enter the root password to switch to root.
2) Type "visudo" to open the sudoers file, and press "i" to edit it.
3) Add the user details after the line "## Allow root to run any commands anywhere"
   Ex: hdpuser ALL=(ALL) ALL
4) Add the password permission details after the line "## Same thing without a password"
   Ex: hdpuser ALL=(ALL) NOPASSWD: ALL
5) Press Esc and enter ":x" to save and exit.

Step 4: Create the user (an existing user can also be used)
1) sudo useradd hdpuser
2) sudo passwd hdpuser, then enter the new password.

Step 5: Edit the hosts file
1) Open /etc/hosts
2) On every node (master and slaves), add the IP address and name of each master and slave node:
   xxx.xx.xx.xxx MyClusterOne
   xxx.xx.xx.xxx MyClusterTwo
   ....
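A missing or misspelled entry in /etc/hosts is an easy mistake to make in Step 5. As a rough check, the sketch below reports which node names do not appear in a hosts file. The function name unresolved_hosts is invented for this example, and it only looks at the file itself; it does not test DNS or network reachability.

```shell
#!/usr/bin/env bash
# Sketch: list the node names that have no entry in a hosts file.
# unresolved_hosts HOSTS_FILE NAME... (hypothetical helper)
unresolved_hosts() {
  local hosts_file="$1"; shift
  local name
  for name in "$@"; do
    # A hosts line is "<ip> <name> [alias...]"; text after '#' is a comment.
    if ! awk -v n="$name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
      ' "$hosts_file"; then
      echo "$name"
    fi
  done
}

# Prints every cluster name that still needs an entry on this machine.
unresolved_hosts /etc/hosts MyClusterOne MyClusterTwo MyClusterThree
```

No output means all three names are present. Run it on each node, since every machine keeps its own copy of /etc/hosts.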
Step 6: Configure key-based login
1) Type "su - hdpuser"
2) Type "ssh-keygen -t rsa"
3) Type "sudo service sshd restart" to restart the service
4) Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterOne"
5) Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterTwo"
6) Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterThree"
   (Add all slave nodes one by one. If there is a connection error, start the sshd service on all slaves.)
7) Type "chmod 0600 ~/.ssh/authorized_keys"
8) exit
9) ssh-add

Step 7: Download and extract the Hadoop source
(The archive unpacks into a hadoop-1.2.1 folder, so download and extract it from /opt.)
1) cd /opt
2) sudo wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
3) sudo tar -xzvf hadoop-1.2.1.tar.gz
4) sudo chown -R hdpuser /opt/hadoop-1.2.1
5) cd /opt/hadoop-1.2.1

Step 8: Edit the configuration files
1) gedit conf/core-site.xml (master node details)
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MyClusterOne:9000/</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

2) gedit conf/hdfs-site.xml (master node details)
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop-1.2.1/dfs/name/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop-1.2.1/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

3) gedit conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>MyClusterOne:9001</value>
  </property>
</configuration>

4) gedit conf/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_40
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop-1.2.1/conf

Step 9: Copy the Hadoop source to the slave servers
(hdpuser needs write permission on /opt on each slave.)
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) scp -r /opt/hadoop-1.2.1 hdpuser@MyClusterTwo:/opt/
4) scp -r /opt/hadoop-1.2.1 hdpuser@MyClusterThree:/opt/
....
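Once the files are copied, it can be worth confirming that a node really has the values from Step 8. The sketch below reads one property from a Hadoop *-site.xml file. The helper name get_prop is made up for this example, and it assumes each <name> and <value> tag sits on its own line; it is a quick text match, not a real XML parser.

```shell
#!/usr/bin/env bash
# Sketch: print the <value> that follows <name>KEY</name> in a Hadoop
# *-site.xml file. Assumes one tag per line (not a full XML parser).
# get_prop FILE KEY (hypothetical helper)
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and what follows
      print
      exit
    }
  ' "$1"
}

# Only runs where the configuration directory from Step 8 exists.
conf_dir="${HADOOP_CONF_DIR:-/opt/hadoop-1.2.1/conf}"
if [ -f "$conf_dir/core-site.xml" ]; then
  echo "fs.default.name = $(get_prop "$conf_dir/core-site.xml" fs.default.name)"
  echo "dfs.replication = $(get_prop "$conf_dir/hdfs-site.xml" dfs.replication)"
fi
```

Running it on the master and on each slave should print the same values everywhere if the copy in Step 9 worked.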
Step 10: Configure Hadoop on the master server only
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) gedit conf/masters (add the master node name)
   MyClusterOne
4) gedit conf/slaves (add the slave node names)
   MyClusterTwo
   MyClusterThree

Step 11: Turn the firewall off so the master can communicate with the slaves
1) sudo service iptables save
2) sudo service iptables stop
3) sudo chkconfig iptables off

Step 12: To use all system space, or for a NameNode backup
1) Edit core-site.xml and add any folder under /home/
2) Give permission to the created folder:
   sudo chmod 755 /home/foldername

Step 13: Add the Hadoop path details in hadoop.sh
1) gedit /etc/profile.d/hadoop.sh

Step 14: Format the NameNode, on the Hadoop master only
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) bin/hadoop namenode -format

Step 15: Start the Hadoop services
1) bin/start-all.sh
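After bin/start-all.sh, the jps tool that ships with the JDK lists the running Java processes, which gives a quick way to see whether the daemons came up. With this layout the master is expected to show NameNode, SecondaryNameNode and JobTracker, and each slave DataNode and TaskTracker. The sketch below reports the ones that are missing; missing_daemons is a hypothetical helper name, not a Hadoop command.

```shell
#!/usr/bin/env bash
# Sketch: read jps output on stdin and print every expected daemon
# that is not in it. missing_daemons NAME... (hypothetical helper)
missing_daemons() {
  local listing daemon
  listing="$(cat)"
  for daemon in "$@"; do
    # jps prints "<pid> <main class>", one JVM per line.
    if ! printf '%s\n' "$listing" |
         awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
      echo "$daemon"
    fi
  done
}

# On the master; skipped when jps is not on the PATH.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons NameNode SecondaryNameNode JobTracker
fi
```

On a slave, the same check would be "jps | missing_daemons DataNode TaskTracker". No output means every expected daemon is running; for a daemon that is reported missing, the files under the Hadoop logs folder are the place to look next.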
Comma Separated values
-- Sample table
create table #Emp(Name varchar(10), Skills varchar(max))
insert into #Emp values('Ramesh','Hadoop,SQL,DBA')
insert into #Emp values('Arjun','SQL,MSBI')
insert into #Emp values('Mohan','Java')

select * from #Emp

Name     Skills
---------------------------
Ramesh   Hadoop,SQL,DBA
Arjun    SQL,MSBI
Mohan    Java

-- Code to display the output below

Name     Skill
---------------------
Arjun    MSBI
Arjun    SQL
Mohan    Java
Ramesh   DBA
Ramesh   Hadoop
Ramesh   SQL

-- Each comma is replaced by </i><i>, so the list becomes one <i> element per skill,
-- and nodes('//i') returns one row per element.
SELECT DISTINCT Name, LTRIM(RTRIM(i.value('.', 'VARCHAR(MAX)'))) AS SKILL
into #sample
FROM (
    SELECT Name, Skills,
           CAST('<i>' + REPLACE(Skills, ',', '</i><i>') + '</i>' AS XML) AS Des
    FROM #Emp
) List
CROSS APPLY Des.nodes('//i') x(i)

select * from #sample

-- From the output above, code to display the output below

Name     Skills
---------------------------
Ramesh   Hadoop,SQL,DBA
Arjun    SQL,MSBI
Mohan    Java

-- FOR XML PATH('') concatenates the skills of each name; STUFF removes the leading comma.
SELECT DISTINCT Name,
       STUFF((SELECT ',' + SKILL
              FROM #sample f
              WHERE f.Name = s.Name
              FOR XML PATH('')), 1, 1, '') AS Skills
FROM #sample s