Welcome~~~


Another blog:
http://fun-st.blogspot.com/

It is easier to write an incorrect program than understand a correct one.

Thursday, March 31, 2011

Hadoop - Standalone Operation

Glad to finish this first try - set- up the standalone operation on Ubuntu 10

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at leastJAVA_HOME to be the root of your Java installation.

Prerequisites---Sun Java 6

Hadoop requires a working Java 1.5.x (aka 5.0.x) installation. However, using Java 1.6.x (aka 6.0.x aka 6) is recommended for running Hadoop. For the sake of this tutorial, I will therefore describe the installation of Java 1.6.

In Ubuntu 10.04 LTS, the package sun-java6-jdk has been dropped from the Multiverse section of the Ubuntu archive. You have to perform the following four steps to install the package.

1. Add the Canonical Partner Repository to your apt repositories:

1 sudo add-apt-repository "deb http://archive.canonical.com/lucid partner"

2. Update the source list

1 sudo apt-get update

3. Install sun-java6-jdk

1 sudo apt-get install sun-java6-jdk

4. Select Sun's Java as the default on your machine.

1 sudo update-java-alternatives -s java-6-sun

The full JDK which will be placed in /usr/lib/jvm/java-6-sun (well, this directory is actually a symlink on Ubuntu).

After installation, make a quick check whether Sun's JDK is correctly set up:

1 user@ubuntu:~# java -version
2 java version "1.6.0_20"
3 Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
4 Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)

hadoop-env.sh

The only required environment variable we have to configure for Hadoop in this tutorial isJAVA_HOME. Open /conf/hadoop-env.sh in the editor of your choice (if you used the installation path in this tutorial, the full path is /usr/local/hadoop/conf/hadoop-env.sh) and set theJAVA_HOME environment variable to the Sun JDK/JRE 6 directory.


  # The java implementation to use.  Required. export JAVA_HOME=/usr/lib/jvm/java-6-sun

Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory. 
$ mkdir input 
$ cp conf/*.xml input 
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' 
$ cat output/*



--
REF: http://hadoop.apache.org/common/docs/current/single_node_setup.html#PreReqs

No comments:

Post a Comment