Tuesday, November 19, 2013
Commands in HDFS
Creating Directory in HDFS
->hadoop fs -mkdir /user/systemname/foldername
OR
->hadoop fs -mkdir foldername
Delete a Folder in HDFS
->hadoop fs -rmr /user/systemname/foldername
OR
->hadoop fs -rmr foldername
Delete a File in HDFS
->hadoop fs -rm /user/systemname/foldername/filename
OR
->hadoop fs -rm foldername/filename
Copying a file from one location to another location in HDFS
->hadoop fs -cp source destination
Moving a file from one location to another location in HDFS
->hadoop fs -mv source destination
Putting a file into HDFS (local to HDFS)
->hadoop fs -put /source /destination
OR
->hadoop fs -copyFromLocal /source /destination
Getting a file from HDFS (HDFS to local)
->hadoop fs -get /source /destination
OR
->hadoop fs -copyToLocal /source /destination
Getting a list of files in HDFS
->hadoop fs -ls
Reading the content of a file in HDFS
->hadoop fs -cat filelocation (pipe through head or tail to see only part of the file)
Checking HDFS condition
->hadoop fsck /
Getting the health of HDFS (as the hdfs user)
->sudo -u hdfs hadoop fsck /
->sudo -u hdfs hadoop fsck /sourcepath -files -blocks -racks
Extracting a tar.gz file (on the local file system)
->tar -xzvf filename
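Putting a few of the commands above together (a sketch; the local file /home/hdpuser/sample.txt and the HDFS directory /user/hdpuser/demo are only example names):
# create a directory, load a file, inspect it, then clean up
hadoop fs -mkdir /user/hdpuser/demo
hadoop fs -put /home/hdpuser/sample.txt /user/hdpuser/demo
hadoop fs -ls /user/hdpuser/demo
hadoop fs -cat /user/hdpuser/demo/sample.txt | head
hadoop fs -rmr /user/hdpuser/demo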
Monday, November 18, 2013
Hadoop Installation Steps on CentOS
// Installation steps on CentOS //
// Suppose MyClusterOne is the master, MyClusterTwo and MyClusterThree are slaves, and hdpuser is the user on all nodes //
Step 1: Install Java & Eclipse using System -> Administration -> Add/Remove Software -> Package
Collections; select the needed packages and download them.
(OR)
Download the latest Java package from "http://www.oracle.com/technetwork/java/javase/downloads/....."
1) cd /opt
2) tar -xzf /home/hdpuser/Downloads/jdk-7u40-linux-x64.tar.gz
3) alternatives --install /usr/bin/java java /opt/jdk1.7.0_40/bin/java 2
4) alternatives --config java
Step 2: Configure Environment variables for JAVA
# export JAVA_HOME=/opt/jdk1.7.0_40
# export JRE_HOME=/opt/jdk1.7.0_40/jre
# export PATH=$PATH:/opt/jdk1.7.0_40/bin:/opt/jdk1.7.0_40/jre/bin
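To make these variables survive a logout or reboot, they can be placed in a profile script; a minimal sketch, assuming the file name /etc/profile.d/java.sh (the name is just a convention, matching the hadoop.sh used later):
# /etc/profile.d/java.sh (create as root, then re-login or "source" it)
export JAVA_HOME=/opt/jdk1.7.0_40
export JRE_HOME=/opt/jdk1.7.0_40/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin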
Step 3: Steps to Give SUDO permission to User:
1) Go to the terminal and type "su -"; it will switch to root after you enter the root password.
2) Type "visudo"; it will open the "sudoers" file; press "i" to edit the file.
3) Add the user details after the line
## Allow root to run any commands anywhere
Ex : hdpuser ALL=(ALL) ALL
4) Add the password permission details after the line
## Same thing without a password
Ex : hdpuser ALL=(ALL) NOPASSWD: ALL
5) Press Esc, then enter ":x" to save and exit.
Step 4: Create User ( We can also use existing user)
1) sudo useradd hdpuser
2) sudo passwd hdpuser , then enter the new password.
Step 5: Edit the hosts file
1) Open /etc/hosts
2) Enter the master and slave node IP addresses on every node (master and slaves):
xxx.xx.xx.xxx MyClusterOne
xxx.xx.xx.xxx MyClusterTwo ....
Step 6: Configuring Key Based Login
1)Type "su - hdpuser"
2)Type "ssh-keygen -t rsa"
3)Type "sudo service sshd restart" to restart the service
4)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterOne"
5)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterTwo"
6)Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterThree"
( Add all slave nodes one by one )
( If there is any connection error, the sshd service needs to be started on all slaves )
7)Type "chmod 0600 ~/.ssh/authorized_keys"
8)Exit
9)ssh-add
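To confirm the key-based login works before continuing, a quick check from the master (these should not prompt for a password):
ssh hdpuser@MyClusterTwo hostname
ssh hdpuser@MyClusterThree hostname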
Step 7: Download and Extract Hadoop Source
1) cd /opt
2) sudo wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
3) sudo tar -xzvf hadoop-1.2.1.tar.gz
4) sudo chown -R hdpuser /opt/hadoop-1.2.1
5) cd /opt/hadoop-1.2.1
Step 8: Edit configuration files
1) gedit conf/core-site.xml ( Master Node Details)
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://MyClusterOne:9000/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
2) gedit conf/hdfs-site.xml ( Master Node Details)
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop-1.2.1/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop-1.2.1/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
3) gedit conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>MyClusterOne:9001</value>
</property>
</configuration>
4) gedit conf/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_40
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop-1.2.1/conf
Step 9: Copy Hadoop Source to Slave Servers
1)su - hdpuser
2) cd /opt
3) scp -r hadoop-1.2.1 hdpuser@MyClusterTwo:/opt
4) scp -r hadoop-1.2.1 hdpuser@MyClusterThree:/opt ....
Step 10 : Configure Hadoop on Master Server Only
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) gedit conf/masters ( Add Master Node Name)
MyClusterOne
4) gedit conf/slaves ( Add Slave Node Names)
MyClusterTwo
MyClusterThree
Step 11 : To Communicate with the Slaves, the Firewall needs to be OFF
1) sudo service iptables save
2) sudo service iptables stop
3) sudo chkconfig iptables off
Step 12: To use all of the system space (or as a NameNode backup location)
1) Edit core-site.xml and add a folder under /home/ as the Hadoop temp/data location
2) Give permission to that created folder:
sudo chmod 755 /home/foldername
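A minimal sketch of what the core-site.xml addition might look like, assuming the folder created above is /home/hdpuser/hdptmp (the folder name is only an example); in Hadoop 1.x the property normally used for this is hadoop.tmp.dir:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdpuser/hdptmp</value>
</property>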
Step 13: Add Hadoop path details in hadoop.sh
1) gedit /etc/profile.d/hadoop.sh
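The post does not show the contents of hadoop.sh; a minimal sketch, assuming Hadoop lives in /opt/hadoop-1.2.1:
# /etc/profile.d/hadoop.sh
export HADOOP_HOME=/opt/hadoop-1.2.1
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PATH=$PATH:$HADOOP_HOME/bin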
Step 14: Format the Name Node on the Hadoop Master only
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) bin/hadoop namenode -format
Step 15 : Start Hadoop Services
1) bin/start-all.sh
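To check that the daemons actually started, jps (shipped with the JDK) can be run on each node; roughly, the master should list NameNode, SecondaryNameNode and JobTracker, and each slave should list DataNode and TaskTracker:
jps                              # on MyClusterOne (master)
ssh hdpuser@MyClusterTwo jps     # on a slave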
Comma Separated values
-- Sample Table
create table #Emp(Name varchar(10),Skills varchar(max))
insert into #Emp values('Ramesh','Hadoop,SQL,DBA')
insert into #Emp values('Arjun','SQL,MSBI')
insert into #Emp values('Mohan','Java')
select * from #Emp
Name Skills
---------------------------
Ramesh Hadoop,SQL,DBA
Arjun SQL,MSBI
Mohan Java
-- Code to display the output below
Name Skill
---------------------
Arjun MSBI
Arjun SQL
Mohan Java
Ramesh DBA
Ramesh Hadoop
Ramesh SQL
SELECT DISTINCT Name,LTRIM(RTRIM(i.value('.', 'VARCHAR(MAX)'))) AS SKILL into #sample
FROM
(
SELECT Name,Skills
,CAST('<i>' + REPLACE(Skills, ',', '</i><i>') + '</i>' AS XML) AS Des
FROM #Emp
) List
CROSS APPLY Des.nodes('//i') x(i)
select * from #sample
-- For the above #sample output, code to display the output below
Name Skills
---------------------------
Ramesh Hadoop,SQL,DBA
Arjun SQL,MSBI
Mohan Java
SELECT Distinct Name
,STUFF((SELECT ','+SKILL FROM #sample f where f.Name=s.Name FOR XML PATH('')),1,1,'') AS Skills
from #sample s
Thursday, January 31, 2013
Differences....
Differences Between Function and Procedure
- A procedure can return zero or n values, whereas a function must return exactly one value.
- Procedures can have input/output parameters, whereas functions can have only input parameters.
- A procedure allows SELECT as well as DML statements in it, whereas a function allows only SELECT statements.
- Functions can be called from a procedure, whereas procedures cannot be called from a function.
- Exceptions can be handled by a try-catch block in a procedure, whereas a try-catch block cannot be used in a function.
- We can use transaction management in a procedure, whereas we cannot in a function.
- Procedures cannot be used in a SELECT statement, whereas a function can be embedded in a SELECT statement.
- A UDF can be used anywhere in the WHERE/HAVING/SELECT clauses of a SQL statement, whereas stored procedures cannot be.
- UDFs that return tables can be treated as another rowset and used in JOINs with other tables.
- Inline UDFs can be thought of as views that take parameters and can be used in JOINs and other rowset operations.
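A small sketch of the "can be embedded in a SELECT" difference (the object names dbo.fnSquare and dbo.uspSquare are made up for the example):
-- Scalar UDF: can be embedded in a SELECT
CREATE FUNCTION dbo.fnSquare(@n INT) RETURNS INT
AS
BEGIN
    RETURN @n * @n;
END
GO
-- Stored procedure: must be EXECuted, cannot appear inside a SELECT
CREATE PROCEDURE dbo.uspSquare @n INT
AS
    SELECT @n * @n AS Squared;
GO
SELECT dbo.fnSquare(5) AS Squared;   -- works
EXEC dbo.uspSquare @n = 5;           -- works
-- SELECT * FROM dbo.uspSquare(5)    -- not allowed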
Differences Between Char, Varchar and Nvarchar
- Char is used to store a fixed length of characters. For example, if we declare char(10) it allocates memory for 10 characters; if we insert a word of only 6 characters, 6 characters of memory are used and the other 4 characters of memory are wasted.
- Varchar means variable characters and is used to store non-Unicode characters. It allocates memory based on the number of characters inserted. For example, if we declare varchar(10) it allocates no memory at declaration time, and once we insert a 6-character word it allocates memory for only 6 characters.
- Nvarchar is the same as varchar, the only difference being that nvarchar is used to store Unicode characters, which allows you to store multiple languages in the database. Nvarchar takes twice as much space to store the extended set of characters required by other languages.
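A quick way to see the storage difference (DATALENGTH returns the number of bytes actually stored):
DECLARE @c char(10) = 'Hadoop', @v varchar(10) = 'Hadoop', @nv nvarchar(10) = N'Hadoop'
SELECT DATALENGTH(@c)  AS CharBytes,     -- 10 (padded to the declared length)
       DATALENGTH(@v)  AS VarcharBytes,  -- 6
       DATALENGTH(@nv) AS NvarcharBytes  -- 12 (2 bytes per character)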
Differences Between Primary Key and Unique Key
- A Primary Key creates a clustered index by default, whereas a Unique Key creates a non-clustered index.
- A Primary Key doesn't allow NULLs, whereas a Unique Key allows exactly one NULL value.
DELETE
- DELETE is a DML command
- We can use a WHERE clause to filter data; it deletes only the rows that match the WHERE condition
- A DELETE statement is executed using row locks; each row in the table is locked for deletion
- DELETE retains the identity counter
- DELETE activates triggers because the operations are logged individually
- Slower than TRUNCATE because it keeps logs
- Rollback is possible
DROP
- DROP is a DDL command
- Removes all rows and also the table definition, including indexes, triggers, grants and storage parameters
- No filter criteria allowed; removes all rows and no triggers are fired
- The DROP ANY privilege on a specific table cannot be granted to another user or role
- A DROP operation cannot be rolled back in any way as it is auto-committed, while a DELETE statement can be rolled back and is not auto-committed
TRUNCATE
- TRUNCATE is a DDL command
- It removes all the data
- TRUNCATE TABLE always locks the table and pages, but not each row
- If the table contains an identity column, the counter for that column is reset to the seed value defined for the column
- TRUNCATE TABLE cannot activate a trigger because the operation does not log individual row deletions
- Faster performance-wise, because it doesn't keep per-row logs
- Rollback is possible if TRUNCATE is executed inside a transaction
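The identity-reset difference is easy to verify with a small throwaway table (the table name #T is just an example):
CREATE TABLE #T (Id INT IDENTITY(1,1), Val VARCHAR(10))
INSERT INTO #T (Val) VALUES ('a'), ('b')

DELETE FROM #T                    -- rows gone, identity keeps counting
INSERT INTO #T (Val) VALUES ('c')
SELECT * FROM #T                  -- Id = 3

TRUNCATE TABLE #T                 -- rows gone, identity reset to the seed
INSERT INTO #T (Val) VALUES ('d')
SELECT * FROM #T                  -- Id = 1

DROP TABLE #T                     -- removes the table definition itself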