Tuesday, November 19, 2013
Commands in HDFS
Creating a Directory in HDFS
->hadoop fs -mkdir /user/systemname/foldername
OR
->hadoop fs -mkdir foldername
Delete a Folder in HDFS
->hadoop fs -rmr /user/systemname/foldername
OR
->hadoop fs -rmr foldername
Delete a File in HDFS
->hadoop fs -rm /user/systemname/foldername/filename
OR
->hadoop fs -rm foldername/filename
Copying a file from one location to another location in HDFS
->hadoop fs -cp source destination
Moving a File from one location to another location in HDFS
->hadoop fs -mv source destination
Putting a file in HDFS
->hadoop fs -put /source /destination
OR
->hadoop fs -copyFromLocal /source /destination
Getting a file from HDFS
->hadoop fs -get /source /destination
OR
->hadoop fs -copyToLocal /source /destination
Getting a list of files in HDFS
->hadoop fs -ls
Reading the content of a file in HDFS
->hadoop fs -cat filelocation ( pipe through head or tail to read only part of it)
Checking HDFS condition
->hadoop fsck /
Getting Health of Hadoop
->sudo -u hdfs hadoop fsck /
->sudo -u hdfs hadoop fsck /sourcepath -files -blocks -racks
Extracting a tar.gz archive ( on the local filesystem, before putting it into HDFS)
->tar -xzvf filename
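Example: a quick session that puts a local file into HDFS and reads it back ( paths and names here are only illustrative):
->hadoop fs -mkdir /user/systemname/demo
->hadoop fs -put /home/systemname/sample.txt /user/systemname/demo
->hadoop fs -ls /user/systemname/demo
->hadoop fs -cat /user/systemname/demo/sample.txt | head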
Monday, November 18, 2013
Hadoop Installation Steps on CentOS
// Installation Steps on CentOS //
// Suppose MyClusterOne is the master, MyClusterTwo and MyClusterThree are slaves, and hdpuser is the user on all nodes //
Step 1: Install Java & Eclipse using System->Administration->Add/Remove Software: select the needed packages and download.
(OR)
Download the latest Java package from "http://www.oracle.com/technetwork/java/javase/downloads/....."
1)cd /opt
2)tar -xzf /home/hdpuser/Downloads/jdk-7u40-linux-x64.tar.gz
3)alternatives --install /usr/bin/java java /opt/jdk1.7.0_40/bin/java 2
4)alternatives --config java
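Verify that the chosen Java is now the default:
# java -version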
Step 2: Configure Environment variables for JAVA
# export JAVA_HOME=/opt/jdk1.7.0_40
# export JRE_HOME=/opt/jdk1.7.0_40/jre
# export PATH=$PATH:/opt/jdk1.7.0_40/bin:/opt/jdk1.7.0_40/jre/bin
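These exports only last for the current shell; to make them permanent, append the same lines to ~/.bashrc (or to a file under /etc/profile.d/):
# echo 'export JAVA_HOME=/opt/jdk1.7.0_40' >> ~/.bashrc
# echo 'export JRE_HOME=/opt/jdk1.7.0_40/jre' >> ~/.bashrc
# echo 'export PATH=$PATH:/opt/jdk1.7.0_40/bin:/opt/jdk1.7.0_40/jre/bin' >> ~/.bashrc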
Step 3: Steps to give sudo permission to the user:
1) Go to a terminal and type "su -"; it will connect as root after the root password is entered.
2) Type "visudo"; it opens the "sudoers" file; press "i" to edit it.
3) Add the user details after the line
## Allow root to run any commands anywhere
Ex : hdpuser ALL=(ALL) ALL
4) Add the password permission details after the line
## Same thing without a password
Ex : hdpuser ALL=(ALL) NOPASSWD: ALL
5) Press Esc, then enter ":x" to save and exit.
Step 4: Create the user ( an existing user can also be used)
1) sudo useradd hdpuser
2) sudo passwd hdpuser , then enter the new password.
Step 5: Edit the hosts file
1) Open /etc/hosts
2) Enter the master and slave node IP addresses on every node ( master and slaves)
xxx.xx.xx.xxx MyClusterOne
xxx.xx.xx.xxx MyClusterTwo ....
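After saving, check that every hostname resolves from every node:
ping -c 1 MyClusterTwo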
Step 6: Configuring Key Based Login
1)Type "su - hdpuser"
2)Type "ssh-keygen -t rsa"
3)Type "sudo service sshd restart" to restart the service
4)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterOne"
5)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterTwo"
6)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterThree"
( Add all slave nodes one by one)
( If there is a connection error, start the sshd service on the slaves first)
7)Type "chmod 0600 ~/.ssh/authorized_keys"
8)Exit
9)ssh-add
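To verify key-based login, ssh to each slave; it should log in without prompting for a password:
ssh hdpuser@MyClusterTwo
exit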
Step 7: Download and Extract Hadoop Source
1) cd /opt
2) sudo wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
3) sudo tar -xzvf hadoop-1.2.1.tar.gz
4) sudo chown -R hdpuser /opt/hadoop-1.2.1
5) cd /opt/hadoop-1.2.1
Step 8: Edit configuration files
1) gedit conf/core-site.xml ( Master Node Details)
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://MyClusterOne:9000/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
2) gedit conf/hdfs-site.xml ( Master Node Details)
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop-1.2.1/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop-1.2.1/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
3) gedit conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>MyClusterOne:9001</value>
</property>
</configuration>
4) gedit conf/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_40
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop-1.2.1/conf
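To confirm that Hadoop starts with this Java, run from /opt/hadoop-1.2.1 ( it should print the Hadoop 1.2.1 version details):
bin/hadoop version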
Step 9: Copy Hadoop Source to Slave Servers
1)su - hdpuser
2)cd /opt
3)scp -r hadoop-1.2.1 hdpuser@MyClusterTwo:/opt/
4)scp -r hadoop-1.2.1 hdpuser@MyClusterThree:/opt/ ....
Step 10 : Configure Hadoop on Master Server Only
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) gedit conf/masters ( Add Master Node Name)
MyClusterOne
4) gedit conf/slaves ( Add Slave Node Names)
MyClusterTwo
MyClusterThree
Step 11 : To communicate with the slaves, the firewall needs to be OFF
1) sudo service iptables save
2) sudo service iptables stop
3) sudo chkconfig iptables off
Step 12: To use space on another partition (or to keep a NameNode backup copy)
1) Edit core-site.xml and point it at a folder created under /home/ (see the sketch after this step)
2) We should give permission to that created folder:
sudo chmod 755 /home/foldername
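A minimal sketch of the extra core-site.xml property, assuming the folder is /home/hdptmp ( both the property choice and the folder name are assumptions; adjust to your setup):
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdptmp</value>
</property>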
Step 13: Add the Hadoop path details in hadoop.sh
1) gedit /etc/profile.d/hadoop.sh
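Typical contents for hadoop.sh ( an assumption, since the post does not list them; adjust the path if Hadoop lives elsewhere):
export HADOOP_HOME=/opt/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin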
Step 14: Format Name Node on Hadoop Master only
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) bin/hadoop namenode -format
Step 15 : Start Hadoop Services
1) bin/start-all.sh
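Once started, run jps on each node to confirm the daemons: the master should list NameNode, SecondaryNameNode, and JobTracker, and each slave should list DataNode and TaskTracker.
jps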
Comma Separated values
-- Sample Table
create table #Emp(Name varchar(10),Skills varchar(max))
insert into #Emp values('Ramesh','Hadoop,SQL,DBA')
insert into #Emp values('Arjun','SQL,MSBI')
insert into #Emp values('Mohan','Java')
select * from #Emp
Name Skills
---------------------------
Ramesh Hadoop,SQL,DBA
Arjun SQL,MSBI
Mohan Java
-- Code to Display below output
Name Skill
---------------------
Arjun MSBI
Arjun SQL
Mohan Java
Ramesh DBA
Ramesh Hadoop
Ramesh SQL
SELECT DISTINCT Name,LTRIM(RTRIM(i.value('.', 'VARCHAR(MAX)'))) AS SKILL into #sample
FROM
(
SELECT Name,Skills
-- wrap each skill in <i>...</i> tags so nodes('//i') can shred the list into one row per skill
,CAST('<i>' + REPLACE(Skills, ',', '</i><i>') + '</i>' AS XML) AS Des
FROM #Emp
) List
CROSS APPLY Des.nodes('//i') x(i)
select * from #sample
-- For the output above, code to recombine the rows into the comma-separated list below
Name Skills
---------------------------
Ramesh Hadoop,SQL,DBA
Arjun SQL,MSBI
Mohan Java
SELECT Distinct Name
,STUFF((SELECT ','+SKILL FROM #sample f where f.Name=s.Name FOR XML PATH('')),1,1,'') AS Skills
from #sample s
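-- Note: FOR XML PATH('') concatenates ','+SKILL once per matching row, and STUFF(...,1,1,'') strips the leading comma.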