Tuesday, November 19, 2013

Commands in HDFS :

Creating a directory in HDFS
-> hadoop fs -mkdir /user/systemname/foldername
OR
-> hadoop fs -mkdir foldername

Deleting a folder in HDFS
-> hadoop fs -rmr /user/systemname/foldername
OR
-> hadoop fs -rmr foldername

Deleting a file in HDFS
-> hadoop fs -rm /user/systemname/foldername/filename
OR
-> hadoop fs -rm foldername/filename

Copying a file from one location to another in HDFS
-> hadoop fs -cp source destination

Moving a file from one location to another in HDFS
-> hadoop fs -mv source destination

Putting a file into HDFS
-> hadoop fs -put /source /destination
OR
-> hadoop fs -copyFromLocal /source /destination

Getting a file from HDFS
-> hadoop fs -get /source /destination
OR
-> hadoop fs -copyToLocal /source /destination

Listing files in HDFS
-> hadoop fs -ls

Reading the content of a file in HDFS
-> hadoop fs -cat filelocation (head and tail also work)

Checking the HDFS condition
-> hadoop fsck

Getting the health of Hadoop
-> sudo -u hdfs hadoop fsck /
-> sudo -u hdfs hadoop fsck /sourcepath -files -blocks -racks

Extracting a tar.gz archive (local filesystem command, not an HDFS command)
-> tar -xzvf filename
Monday, November 18, 2013
Hadoop Installation steps on Centos
// Installation steps on CentOS //
// Suppose MyClusterOne is the master, MyClusterTwo and MyClusterThree are slaves, and hdpuser is the user on all nodes //

Step 1: Install Java & Eclipse
Use System -> Administration -> Add/Remove Programs -> Package Collection, select the needed packages and download.
(OR) Download the latest Java package from "http://www.oracle.com/technetwork/java/javase/downloads/....."
1) cd /opt/
2) tar -xzf /home/hdpuser/Downloads/jdk-7u40-linux-x64.tar.gz
3) alternatives --install /usr/bin/java java /opt/jdk1.7.0_40/bin/java 2
4) alternatives --config java

Step 2: Configure environment variables for Java
# export JAVA_HOME=/opt/jdk1.7.0_40
# export JRE_HOME=/opt/jdk1.7.0_40/jre
# export PATH=$PATH:/opt/jdk1.7.0_40/bin:/opt/jdk1.7.0_40/jre/bin

Step 3: Give sudo permission to the user
1) In a terminal type "su -"; it connects as root after the root password is entered.
2) Type "visudo"; it opens the "sudoers" file; press "i" to edit it.
3) Add the user details after the line "## Allow root to run any commands anywhere"
   Ex: hdpuser ALL=(ALL) ALL
4) Add the password permission details after the line "## Same thing without a password"
   Ex: hdpuser ALL=(ALL) NOPASSWD: ALL
5) Press Esc and enter ":x" to save and exit.

Step 4: Create the user (an existing user can also be used)
1) sudo useradd hdpuser
2) sudo passwd hdpuser , then enter the new password.

Step 5: Edit the hosts file
1) Open /etc/hosts
2) Enter the master and slave node IP addresses on every node (master and slaves)
   xxx.xx.xx.xxx MyClusterOne
   xxx.xx.xx.xxx MyClusterTwo
   ....

Step 6: Configure key-based login
1) Type "su - hdpuser"
2) Type "ssh-keygen -t rsa"
3) Type "sudo service sshd restart" to restart the service
4) Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterOne"
5) Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterTwo"
6) Type "ssh-copy-id -i ~/.ssh/id_rsa.pub hdpuser@MyClusterThree"
   (Add details of all slave nodes one by one; if there is any connection error, start the sshd service on all slaves)
7) Type "chmod 0600 ~/.ssh/authorized_keys"
8) Exit
9) ssh-add

Step 7: Download and extract the Hadoop source
1) cd /opt/
2) sudo wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
3) sudo tar -xzvf hadoop-1.2.1.tar.gz
4) sudo chown -R hdpuser /opt/hadoop-1.2.1
5) cd /opt/hadoop-1.2.1/conf

Step 8: Edit the configuration files
1) gedit conf/core-site.xml (master node details)
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MyClusterOne:9000/</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

2) gedit conf/hdfs-site.xml (master node details)
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop-1.2.1/dfs/name/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop-1.2.1/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

3) gedit conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>MyClusterOne:9001</value>
  </property>
</configuration>

4) gedit conf/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_40
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop-1.2.1/conf

Step 9: Copy the Hadoop source to the slave servers
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) scp -r /opt/hadoop-1.2.1 hdpuser@MyClusterTwo:/opt/hadoop-1.2.1
4) scp -r /opt/hadoop-1.2.1 hdpuser@MyClusterThree:/opt/hadoop-1.2.1
....
Step 10: Configure Hadoop on the master server only
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) gedit conf/masters (add the master node name)
   MyClusterOne
4) gedit conf/slaves (add the slave node names)
   MyClusterTwo
   MyClusterThree

Step 11: To communicate with the slaves, the firewall needs to be OFF
1) sudo service iptables save
2) sudo service iptables stop
3) sudo chkconfig iptables off

Step 12: To use all system space or for a NameNode backup location
1) Edit core-site.xml and add any folder under /home/
2) Give permission to that created folder: sudo chmod 755 /home/foldername

Step 13: Add the Hadoop path details in hadoop.sh
1) gedit /etc/profile.d/hadoop.sh

Step 14: Format the NameNode on the Hadoop master only
1) su - hdpuser
2) cd /opt/hadoop-1.2.1
3) bin/hadoop namenode -format

Step 15: Start the Hadoop services
1) bin/start-all.sh
Comma Separated values
-- Sample table
create table #Emp(Name varchar(10), Skills varchar(max))

insert into #Emp values('Ramesh','Hadoop,SQL,DBA')
insert into #Emp values('Arjun','SQL,MSBI')
insert into #Emp values('Mohan','Java')

select * from #Emp

Name     Skills
---------------------------
Ramesh   Hadoop,SQL,DBA
Arjun    SQL,MSBI
Mohan    Java

-- Code to display the output below (one row per skill)

Name     Skill
---------------------
Arjun    MSBI
Arjun    SQL
Mohan    Java
Ramesh   DBA
Ramesh   Hadoop
Ramesh   SQL

SELECT DISTINCT Name, LTRIM(RTRIM(i.value('.', 'VARCHAR(MAX)'))) AS SKILL
INTO #sample
FROM
(
    SELECT Name, Skills,
           CAST('<i>' + REPLACE(Skills, ',', '</i><i>') + '</i>' AS XML) AS Des
    FROM #Emp
) List
CROSS APPLY Des.nodes('//i') x(i)

select * from #sample

-- For the above output, code to combine the skills back into a comma separated list

Name     Skills
---------------------------
Ramesh   Hadoop,SQL,DBA
Arjun    SQL,MSBI
Mohan    Java

SELECT DISTINCT Name,
       STUFF((SELECT ',' + SKILL
              FROM #sample f
              WHERE f.Name = s.Name
              FOR XML PATH('')), 1, 1, '') AS Skills
FROM #sample s
Thursday, January 31, 2013
Differences....
Differences Between Function and Procedure
- A procedure can return zero or n values, whereas a function must return exactly one value.
- Procedures can have input and output parameters, whereas functions can have only input parameters.
- A procedure allows SELECT as well as DML statements (INSERT/UPDATE/DELETE), whereas a function allows only SELECT statements.
- Functions can be called from a procedure, whereas procedures cannot be called from a function.
- Exceptions can be handled with a TRY-CATCH block in a procedure, whereas a TRY-CATCH block cannot be used in a function.
- Transaction management is possible in a procedure, whereas it is not possible in a function.
- Procedures cannot be used in a SELECT statement, whereas a function can be embedded in a SELECT statement.
- A UDF can be used anywhere in the SELECT/WHERE/HAVING clauses of a SQL statement, whereas stored procedures cannot.
- UDFs that return tables can be treated as another rowset and can be used in JOINs with other tables.
- Inline UDFs can be thought of as views that take parameters and can be used in JOINs and other rowset operations (a short sketch follows below).
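To make a few of these points concrete, here is a minimal T-SQL sketch; the table, function, and procedure names (dbo.Employee, fn_FullName, usp_RenameEmployee) are made up purely for illustration:

-- Hypothetical objects, for illustration only
CREATE TABLE dbo.Employee (EmpId INT IDENTITY(1,1) PRIMARY KEY,
                           FirstName VARCHAR(50), LastName VARCHAR(50))
INSERT INTO dbo.Employee VALUES ('Ramesh', 'Kumar'), ('Arjun', 'Reddy')
GO

-- Function: input parameters only, must return a value, no DML
CREATE FUNCTION dbo.fn_FullName (@First VARCHAR(50), @Last VARCHAR(50))
RETURNS VARCHAR(101)
AS
BEGIN
    RETURN @First + ' ' + @Last
END
GO

-- Procedure: DML, transactions and TRY-CATCH are all allowed
CREATE PROCEDURE dbo.usp_RenameEmployee @EmpId INT, @NewLast VARCHAR(50)
AS
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION
        UPDATE dbo.Employee SET LastName = @NewLast WHERE EmpId = @EmpId
        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION
    END CATCH
END
GO

-- The function can be embedded in a SELECT; the procedure has to be executed on its own
SELECT EmpId, dbo.fn_FullName(FirstName, LastName) AS FullName FROM dbo.Employee
EXEC dbo.usp_RenameEmployee @EmpId = 1, @NewLast = 'Sharma'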
Differences Between Char, Varchar and Nvarchar

- The char datatype stores a fixed length of characters. For example, char(10) allocates space for 10 characters; if we insert a word of only 6 characters, the column still occupies 10 characters and the remaining 4 characters of space are wasted.
- Varchar means variable-length characters and is used to store non-Unicode characters. It allocates storage based on the number of characters inserted. For example, varchar(10) reserves no space at declaration time, and if we insert a word of only 6 characters it stores only those 6 characters.
- The nvarchar datatype is the same as varchar, except that nvarchar stores Unicode characters, which lets you store data in multiple languages in the database. nvarchar takes twice as much space per character in order to store the extended character sets required by other languages (a short sketch follows below).
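A quick way to see the storage difference is DATALENGTH; a minimal sketch, assuming SQL Server 2008 or later for the inline DECLARE initialisation:

-- Compare storage of the same 6-character value
DECLARE @c  CHAR(10)     = 'Hadoop'
DECLARE @v  VARCHAR(10)  = 'Hadoop'
DECLARE @nv NVARCHAR(10) = N'Hadoop'

SELECT DATALENGTH(@c)  AS CharBytes,     -- 10: fixed length, padded with trailing spaces
       DATALENGTH(@v)  AS VarcharBytes,  -- 6: only what was actually inserted
       DATALENGTH(@nv) AS NvarcharBytes  -- 12: Unicode uses 2 bytes per character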
Differences Between Primary Key and Unique Key

- A primary key creates a clustered index by default, whereas a unique key creates a non-clustered index by default.
- A primary key doesn't allow NULLs, whereas a unique key allows exactly one NULL value (a short sketch follows below).
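A minimal sketch (the dbo.Dept table is made up for illustration) showing the default index types and the single-NULL behaviour of a unique key:

-- Illustrative only
CREATE TABLE dbo.Dept
(
    DeptId   INT         PRIMARY KEY,   -- clustered index by default
    DeptCode VARCHAR(10) UNIQUE         -- non-clustered index by default
)

-- Confirm the index types created for the two constraints
SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.Dept')

INSERT INTO dbo.Dept VALUES (1, NULL)   -- allowed: a unique key accepts one NULL
INSERT INTO dbo.Dept VALUES (2, NULL)   -- fails: a second NULL violates the unique key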
DELETE
- DELETE is a DML command.
- A WHERE clause can be used to filter data; only the rows matching the condition are deleted.
- A DELETE statement is executed using row locks; each row in the table is locked for deletion.
- DELETE retains the identity counter (it is not reset).
- DELETE activates triggers because the operations are logged individually, row by row.
- Slower than TRUNCATE because it logs each deleted row.
- Rollback is possible (a short sketch follows below).
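A minimal sketch of the DELETE behaviour listed above (dbo.Orders is a made-up table): it filters with WHERE, can be rolled back, and leaves the identity counter untouched:

-- Illustrative only
CREATE TABLE dbo.Orders (OrderId INT IDENTITY(1,1), Amount INT)
INSERT INTO dbo.Orders (Amount) VALUES (100), (200), (300)

BEGIN TRANSACTION
    DELETE FROM dbo.Orders WHERE Amount > 150       -- only the matching rows are removed
ROLLBACK TRANSACTION                                 -- DELETE can be rolled back

SELECT COUNT(*) FROM dbo.Orders                      -- 3: the delete was undone

INSERT INTO dbo.Orders (Amount) VALUES (400)
SELECT IDENT_CURRENT('dbo.Orders') AS LastIdentity   -- 4: DELETE never resets the counter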
DROP
- DROP is a DDL Command
- Removes all rows and also the table definition, including indexes, triggers, grants, storage parameters
- No filter criteria are allowed; it removes all rows and no triggers are fired.
- A DROP privilege on a specific table cannot be granted to another user or role (there is no object-level DROP privilege).
- A DROP operation cannot be rolled back in any way because it is auto-committed, while a DELETE operation can be rolled back and is not auto-committed.
TRUNCATE
- TRUNCATE is a DDL command
- It removes all the data but keeps the table definition.
- TRUNCATE TABLE always locks the table and its pages, but not each individual row.
- If the table contains an identity column, the counter for that column is reset to the seed value that is defined for the column
- TRUNCATE TABLE cannot activate a trigger because the operation does not log individual row deletions
- Faster than DELETE because it does not log individual row deletions (only page deallocations are logged).
- Rollback is possible when TRUNCATE is run inside an explicit transaction (a short sketch follows below).
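A minimal sketch contrasting TRUNCATE and DROP (dbo.Orders2 is a made-up table): TRUNCATE resets the identity seed and can still be rolled back inside a transaction, while DROP removes the table definition itself:

-- Illustrative only
CREATE TABLE dbo.Orders2 (OrderId INT IDENTITY(1,1), Amount INT)
INSERT INTO dbo.Orders2 (Amount) VALUES (100), (200), (300)

BEGIN TRANSACTION
    TRUNCATE TABLE dbo.Orders2          -- deallocates pages; no per-row logging, no triggers
ROLLBACK TRANSACTION                     -- still recoverable inside an explicit transaction

SELECT COUNT(*) FROM dbo.Orders2         -- 3: the truncate was rolled back

TRUNCATE TABLE dbo.Orders2
INSERT INTO dbo.Orders2 (Amount) VALUES (400)
SELECT OrderId FROM dbo.Orders2          -- 1: the identity restarted at its seed value

DROP TABLE dbo.Orders2                    -- removes the rows and the table definition, indexes and triggers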