Installing Hadoop 1.0.x-stable On 10+ Nodes With HBase, ZooKeeper, Thrift, HappyBase – Part 2!

We left off setting up the NameNode for HDFS. Let’s configure/install the JobTracker (www4) and rest of the slaves www5-www15.

  • Set up the hdfs Unix user on each node with the same uid/gid (509 in our case): groupadd -g 509 hdfs ; useradd -u 509 -g 509 hdfs
  • Install Java as before, as root: yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel
  • wget http://apache.petsads.us/hadoop/common/stable/hadoop-1.0.4.tar.gz
  • tar -C /usr/local -zxvf hadoop-1.0.4.tar.gz
  • ln -s /usr/local/hadoop-1.0.4 /usr/local/hadoop
  • mkdir /usr/local/hadoop/namenode ; mkdir /usr/local/hadoop/datanode
  • mkdir -p /var/hadoop/temp ; chown hdfs /var/hadoop/temp
  • chown -R hdfs /usr/local/hadoop-1.0.4
  • Copy the config files modified in Part 1 over from the master node: scp /usr/local/hadoop/conf/* hdfs@www4:/usr/local/hadoop/conf/
  • Put JAVA_HOME in your ~/.bashrc (or equivalent rc file): export JAVA_HOME=/usr/lib/jvm/java
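
Since the steps above are repeated on www4 and on every slave, a small dry-run loop can help you see what has to be pushed where. This is only a sketch: slave_hosts and push_conf_cmd are hypothetical helper names, and the script prints the scp commands instead of running them.

```shell
#!/bin/bash
# Sketch only: slave_hosts and push_conf_cmd are hypothetical helpers.

# Print the slave hostnames used in this guide (www5 through www15), one per line.
slave_hosts() {
    local i
    for i in $(seq 5 15); do
        echo "www${i}"
    done
}

# Print (do not run) the scp command that pushes the Part 1 config files to host $1.
push_conf_cmd() {
    echo "scp /usr/local/hadoop/conf/* hdfs@$1:/usr/local/hadoop/conf/"
}

# Dry run: show the command for the JobTracker and for every slave.
for host in www4 $(slave_hosts); do
    push_conf_cmd "$host"
done
```

Once the printed commands look right, you could pipe the output to sh on the master node, assuming the hdfs user can already ssh to each host.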

Download HBase and get started with its installation:

  • wget http://apache.mesi.com.ar/hbase/stable/hbase-0.94.4.tar.gz
  • tar -C /usr/local -zxvf hbase-0.94.4.tar.gz
  • ln -s /usr/local/hbase-0.94.4 /usr/local/hbase
  • edit /usr/local/hbase/conf/hbase-site.xml:
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://www3:8020/hbase</value>
        <description>The directory shared by RegionServers.
        </description>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>www3,www5,www7,www9,www11</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/usr/local/hbase/zookeeper</value>
      </property>
    </configuration>
    
  • start-hbase.sh
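
ZooKeeper needs a majority of its servers to be up, which is why the quorum above lists an odd number of hosts. The sketch below counts the hosts in a hbase.zookeeper.quorum value and works out how many can fail; quorum_size and quorum_tolerates are hypothetical helper names, not part of HBase or ZooKeeper.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers for sanity-checking a quorum string.

# Count the hosts in a comma-separated hbase.zookeeper.quorum value.
quorum_size() {
    echo "$1" | tr ',' '\n' | grep -c .
}

# Number of ZooKeeper servers that can fail while a majority remains.
quorum_tolerates() {
    echo $(( ($1 - 1) / 2 ))
}

n=$(quorum_size "www3,www5,www7,www9,www11")
echo "ensemble size: $n, tolerated failures: $(quorum_tolerates "$n")"
```

With the five hosts from hbase-site.xml this reports that two servers can be lost. An ensemble of four would tolerate only one failure, the same as three.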

Installing Hadoop 1.0.x-stable On 10+ Nodes With HBase, ZooKeeper, Thrift, HappyBase – Part 1!

For more information about Hadoop, please watch https://www.youtube.com/watch?v=d2xeNpfzsYI. HBase on top of Hadoop provides powerful, extremely high-throughput storage (via Hadoop HDFS) with secondary indexing, automatic sharding of data, and Map-Reduce. I’m going to try to keep this guide as simple as possible for our future reference. I hope you find it useful!

The hardware topology in our case will be 13 Dell R420 1RU webservers, each with 143GB SSD RAID 1 drives and 2 x Intel E5-2450 @ 2.10GHz CPUs (20MB Cache), 16 cores (32 with HT), with 32GB of RAM. These servers are in 2 racks connected by 2 x 1gig Foundry switching split into 2 vlans – frontend and backend.

N.B., Hadoop 1.0.x-stable requires one NameNode for filesystem metadata (see Hadoop 2.0 for NameNode HA and HDFS Federation), at least three DataNodes (the default replication level is 3), exactly one JobTracker, and many TaskTrackers.

Footnote: In Hadoop 2.0.x, the Map-Reduce JobTracker has been split into two components: the ResourceManager and ApplicationMaster. Apache mentions that, “the new ResourceManager manages the global assignment of compute resources to applications and the per-application ApplicationMaster manages the application’s scheduling and coordination. An application is either a single job in the sense of classic MapReduce jobs or a DAG of such jobs. The ResourceManager and per-machine NodeManager daemon, which manages the user processes on that machine, form the computation fabric. The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.” For more information see this YARN document.

In our case, we will set up 1 master NameNode with hostname www3 and 1 master JobTracker with hostname www4 for Map/Reduce jobs. The rest of the servers will be slaves, and as such will be DataNodes with TaskTrackers. These will have hostnames www5 through www15. www2 will be a hot-spare of www3 in the event of system failure. We achieve this by specifying a secondary NFS path for dfs.name.dir, which will be mounted on our failover server and replayed in the event of www3 failure. The operating systems on these servers are all CentOS 6.x x86_64.

  • Create a new Unix user hdfs that will be for HDFS daemon operations on each node: useradd hdfs
  • Download Hadoop 1.0.x stable. In our case, we get from a mirror: wget http://apache.petsads.us/hadoop/common/stable/hadoop-1.0.4.tar.gz.
  • Extract the directory to /usr/local: tar -C /usr/local -zxvf hadoop-1.0.4.tar.gz then ln -s /usr/local/hadoop-1.0.4 /usr/local/hadoop. Then make sure it is owned by the hdfs user: chown -R hdfs /usr/local/hadoop-1.0.4
  • Make sure java is installed. yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel
  • First we set up the NameNode on www3. Open /usr/local/hadoop/conf/core-site.xml:
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <property>
    <!-- URI of NameNode (master metadata HDFS node) -->
        <name>fs.default.name</name>
        <value>hdfs://www3/</value>
      </property>
      <property>
        <name>fs.inmemory.size.mb</name>
        <value>200</value>
        <!--Larger amount of memory allocated for the in-memory file-system used to merge map-outputs at the reduces. -->
      </property>
      <property>
        <name>io.sort.factor</name>
        <value>100</value>
      </property>
      <property>
        <name>io.sort.mb</name>
        <value>200</value>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
    </configuration>
    
  • Next create a folder for Namenode metadata and mapreduce temp data: mkdir /usr/local/hadoop/namenode ; mkdir /usr/local/hadoop/datanode; mkdir -p /var/hadoop/temp ; chown hdfs /usr/local/hadoop/namenode; chown hdfs /usr/local/hadoop/datanode; chown -R hdfs /var/hadoop/temp. Edit /usr/local/hadoop/conf/hdfs-site.xml:
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop/namenode,/nfs_storage/hadoop/namenode_backup</value>
      </property>
      <property>
        <name>dfs.block.size</name>
        <!-- 128MB -->
        <value>134217728</value>
      </property>
      <property>
        <name>dfs.namenode.handler.count</name>
        <!-- # RPC threads from datanodes -->
        <value>40</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop/datanode</value>
      </property>
      <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
      </property>
      <property>
        <name>dfs.support.append</name>
        <value>true</value>
      </property>
    </configuration>
    
  • Next modify /usr/local/hadoop/conf/mapred-site.xml:
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>www4:8021</value>
      </property>
      <property>
        <name>mapred.system.dir</name>
        <value>/hadoop/mapred/system/</value>
      </property>
      <property>
        <name>mapred.local.dir</name>
        <value>/var/hadoop/temp</value>
      </property>
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>20</value>
      </property>
      <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>20</value>
      </property>
      <property>
        <name>mapred.queue.names</name>
        <value>default,rooms</value>
      </property>
      <property>
        <name>mapred.acls.enabled</name>
        <value>false</value>
      </property>
      <property>
        <name>mapred.reduce.parallel.copies</name>
        <value>20</value>
        <!--Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.-->
      </property>
      <property>
        <name>mapred.map.child.java.opts</name>
        <value>-Xmx512M</value>
      </property>
      <property>
        <name>mapred.reduce.child.java.opts</name>
        <value>-Xmx512M</value>
      </property>
      <property>
        <name>mapred.task.tracker.task-controller</name>
        <value>org.apache.hadoop.mapred.DefaultTaskController</value>
      </property>
    </configuration>
  • Next modify your slaves and masters files /usr/local/hadoop/conf/slaves and /usr/local/hadoop/conf/masters:
    [root@www3 conf]# cat masters
    www3
    www4
    
    [root@www3 conf]# cat slaves
    www5
    www6
    www7
    www8
    www9
    www10
    www11
    www12
    www13
    www14
    www15
    
  • We now need to set up your environment on the NameNode www3. Open /usr/local/hadoop/conf/hadoop-env.sh:
    export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"
    export HADOOP_HEAPSIZE="1000"
    export JAVA_HOME=/usr/lib/jvm/java
    

    Also you will want to put

    export JAVA_HOME=/usr/lib/jvm/java

    in your ~/.bashrc
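
Before copying the conf directory around, it can be worth reading a few values back out of the XML and checking that the task slots fit in RAM. The sketch below is an illustration only: get_prop and task_heap_mb are hypothetical helpers (not Hadoop tools), and get_prop assumes each name and value sits on its own line, as in the files above.

```shell
#!/bin/bash
# Sketch only: get_prop and task_heap_mb are hypothetical helpers, not Hadoop tools.

# Print the <value> that follows <name>$2</name> in the Hadoop-style config file $1.
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            n = $0
            sub(/.*<name>/, "", n)
            sub(/<\/name>.*/, "", n)
            found = (n == want)
        }
        /<value>/ && found {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$1"
}

# Worst-case child JVM heap per node in MB: (map slots + reduce slots) * heap per task.
task_heap_mb() {
    echo $(( ($1 + $2) * $3 ))
}

# Values from mapred-site.xml above: 20 map slots, 20 reduce slots, -Xmx512M per child.
task_heap_mb 20 20 512
```

With those settings the child JVMs can use up to 20480 MB (20 GB) of each node's 32 GB, which leaves the remainder for the DataNode, the TaskTracker, and HBase. A call such as get_prop /usr/local/hadoop/conf/mapred-site.xml mapred.job.tracker should print www4:8021 for the file shown above.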

After www3 (NameNode master server) has been set up as above, we just have to copy the conf files over to www4 and the other nodes, then fire up Hadoop. Here are the steps after syncing the conf files and making the appropriate (meta)data directories like /usr/local/hadoop/namenode or /usr/local/hadoop/datanode.

  • To start a Hadoop cluster you will need to start both the HDFS and Map/Reduce cluster. Format a new distributed filesystem:
    $ bin/hadoop namenode -format
  • You will get output like:
    [hdfs@www3 hadoop]$ bin/hadoop namenode -format
    12/12/29 02:00:05 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = www3/10.23.23.12
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.0.4
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
    ************************************************************/
    Re-format filesystem in /usr/local/hadoop/namenode ? (Y or N) 
    12/12/29 02:15:39 INFO util.GSet: VM type       = 64-bit
    12/12/29 02:15:39 INFO util.GSet: 2% max memory = 17.77875 MB
    12/12/29 02:15:39 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    12/12/29 02:15:39 INFO util.GSet: recommended=2097152, actual=2097152
    12/12/29 02:15:40 INFO namenode.FSNamesystem: fsOwner=hdfs
    12/12/29 02:15:40 INFO namenode.FSNamesystem: supergroup=supergroup
    12/12/29 02:15:40 INFO namenode.FSNamesystem: isPermissionEnabled=true
    12/12/29 02:15:40 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    12/12/29 02:15:40 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    12/12/29 02:15:40 INFO namenode.NameNode: Caching file names occuring more than 10 times 
    12/12/29 02:15:40 INFO common.Storage: Image file of size 110 saved in 0 seconds.
    12/12/29 02:15:40 INFO common.Storage: Storage directory /usr/local/hadoop/namenode has been successfully formatted.
    12/12/29 02:15:40 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at www3/10.23.23.12
    ************************************************************/
    
    
  • The hdfs user should have ssh (pubkey) access to all the slaves, as well as to itself, from the NameNode and the JobTracker node. Here we create a key pair on each (ssh into the NameNode and the JobTracker to do this) and add the public keys to the ssh authorized_keys files: sudo su - hdfs ; ssh-keygen -t rsa ; cat ~/.ssh/id_rsa.pub. Copy that public key to the ~hdfs/.ssh/authorized_keys file on the slaves and the master NameNode.
  • Start the HDFS with the following command *** after the slaves are configured/installed *** – see Part 2 and then run on the designated NameNode:
    $ bin/start-dfs.sh
  • Start the jobTracker node ***after part2!***
    $ bin/start-mapred.sh
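
The start scripts rely on the passwordless ssh described above, so a quick check before running them can save some head-scratching. This is a minimal sketch: check_ssh is a hypothetical helper, and SSH_CMD is a hypothetical override that exists only so the function can be exercised without a real cluster.

```shell
#!/bin/bash
# Sketch only: check_ssh and SSH_CMD are hypothetical, not part of Hadoop.
# BatchMode=yes makes ssh fail instead of prompting for a password.
SSH_CMD="${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5}"

# Try to run "true" on every host given as an argument.
# Prints one line per host and returns non-zero if any host failed.
check_ssh() {
    local host failed=0
    for host in "$@"; do
        if $SSH_CMD "$host" true >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return $failed
}

# Example (run as hdfs on www3 or www4):
#   check_ssh $(cat /usr/local/hadoop/conf/slaves)
```

Any host reported as FAIL most likely lacks the public key in ~hdfs/.ssh/authorized_keys or has wrong permissions on that file.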

Arch Linux on Mac Pro with Luks Full Disk Encryption

You need: OS X Install CD (I used a copy of Snow Leopard), Arch Linux net install CD

1. Boot into Arch. Wipe hard drive, for example with #badblocks -c 10240 -wsvt random /dev/sda

2. #parted , set partition type to “msdos” (instead of “GPT”)

3. #cfdisk /dev/sda , add /dev/sda1 size 1024M , bootable, type 83 (linux) (for /boot); add /dev/sda2 size 8024M, type 82 (swap); add /dev/sda3 size XXXG , type 83 (linux) for / . Save partition table.

4. Reboot into OS X install DVD. Open terminal. #bless --device /dev/disk0sX --setBoot --legacy --verbose …. where “X” is the number for your /boot partition you created. You can find it by doing #diskutil list . Now your mac is configured to boot from /boot .

5. Boot into Arch CD. Set up encrypted swap volume: #cryptsetup -c aes-xts-plain -s 512 -h sha512 -v luksFormat /dev/sda2 (put your passphrase in)… Set up encrypted root volume: #cryptsetup -c aes-xts-plain -y -s 512 luksFormat /dev/sda3 (put your passphrase in)

6. Set up mappers for crypt volumes in /dev: #cryptsetup luksOpen /dev/sda3 root … #cryptsetup luksOpen /dev/sda2 swapDevice

7. Create swap on swapDevice: #mkswap /dev/mapper/swapDevice

8. edit /lib/initcpio/hooks/openswap:

# vim: set ft=sh:
run_hook ()
{
cryptsetup luksOpen /dev/sda2 swapDevice
}

9. edit /lib/initcpio/install/openswap:

# vim: set ft=sh:
build ()
{
MODULES=""
BINARIES=""
FILES=""
SCRIPT="openswap"
}
help ()
{
cat <<HELPEOF
This opens the encrypted swap partition /dev/sda2 on swapDevice mapper.
HELPEOF
}

10. Edit /etc/mkinitcpio.conf; add “openswap” before “filesystems” but after “encrypt”. Add “resume” between “openswap” and “filesystems”. Should look something like this: HOOKS="base udev autodetect pata scsi sata usb usbinput keymap encrypt openswap resume filesystems"

11. Install arch with /arch/setup , and use /dev/mapper/root for / and /dev/mapper/swapDevice for your swap . Use /dev/sda1 for /boot . I use XFS for / .

12. When it comes time to modify the config files after installing packages, modify mkinitcpio.conf like above. Also set MODULES="xfs", or whatever filesystem you used for /.

13. When it comes time to set up the bootloader grub, make this change to your grub config: kernel /vmlinuz-linux cryptdevice=/dev/sda3:root root=/dev/mapper/root resume=/dev/mapper/swapDevice ro

14. Install grub on /dev/sda

15. Reboot. You’ll be asked twice for your LUKS passphrase: once for swap and once for root.

16. #pacman --sync --refresh

17. #pacman -Syu xorg-server
18. #pacman -S xorg-xinit xterm fluxbox xorg-utils xorg-server-utils xf86-video-ati chromium mesa-demos artwiz-fonts bdf-unifont cantarell-fonts font-bitstream-speedo font-misc-ethiopic font-misc-meltho ftgl gsfonts xpdf xv gv ttf-cheapskate ttf-bitstream-vera ttf-freefont ttf-linux-libertine xorg-xlsfonts

18a. edit ~/.xinitrc , make it executable.

#!/bin/sh

xset +fp /usr/share/fonts/local
xset fp rehash
nitrogen -restore
dropbox start
pidgin
exec startfluxbox

19. edit /etc/pacman.conf and uncomment the multilib section for 32bit support . #pacman --sync --refresh

19a. Edit /etc/inittab to make SLiM handle the login instead of the text login: uncomment x:5:respawn:/usr/bin/slim >/dev/null 2>&1 and comment out the other xdm line. Uncomment id:5:initdefault: and comment out the runlevel 3 initdefault line. To change the SLiM theme, edit the current_theme line in /etc/slim.conf, for example: current_theme fingerprint,default,rear-window,subway,wave,lake,flat,capernoited

20. #pacman -S dina-font font-mathematica terminus-font profont zsh vim slim slim-themes archlinux-themes-slim alsa-utils alsa-tools alsamixer thunderbird pidgin gtk-theme-switch2 gtk-engines nautilus rsync gnupg irssi flashplugin nitrogen xlockmore dnsutils wget glib wine gpgme cups ntp skype vpnc scrot xclip … etc etc

To update your system from time to time… do pacman -Syu
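
The hook order from step 10 is easy to get wrong, and a wrong order can leave you with an initramfs that cannot unlock swap or resume. The sketch below checks that a HOOKS string lists the given hooks in the expected order; hooks_in_order is a hypothetical helper, not part of mkinitcpio.

```shell
#!/bin/bash
# Sketch only: hooks_in_order is a hypothetical helper, not part of mkinitcpio.

# $1 = HOOKS string; remaining arguments = hook names that must appear in this order.
# Returns 0 if every hook is present and they appear in the given order.
hooks_in_order() {
    local hooks=" $1 " prev=-1 name rest pos
    shift
    for name in "$@"; do
        case "$hooks" in
            *" $name "*) ;;          # hook is present as a whole word
            *) return 1 ;;           # hook is missing
        esac
        rest="${hooks%% $name *}"    # text before the first occurrence
        pos=${#rest}
        [ "$pos" -gt "$prev" ] || return 1
        prev=$pos
    done
}

HOOKS="base udev autodetect pata scsi sata usb usbinput keymap encrypt openswap resume filesystems"
if hooks_in_order "$HOOKS" encrypt openswap resume filesystems; then
    echo "HOOKS order looks right"
else
    echo "HOOKS order is wrong"
fi
```

On an installed system you could source /etc/mkinitcpio.conf first so that HOOKS holds your real value, then run the same check before rebuilding the initramfs.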

Cron to Re-Sign DNSSEC Zones Nightly


#!/bin/bash
# this script uses sign_zone.sh from a previous blog post:
# http://packetcloud.net/2011/10/13/script-to-easy-nsec3rsasha1-sign-dnssec-zones/

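cd /var/named/dynamic/
for dir in `find . -mindepth 1 -maxdepth 1 -type d -print`; do
domain=`basename $dir`
# no cd here: sign_zone.sh changes into /var/named/dynamic/$domain itself,
# and a relative cd would fail on the second iteration
/usr/local/bin/sign_zone.sh $domain
done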

/sbin/service named restart

Script to Generate NSEC3 KSK and ZSK for DNSSEC

Here’s my script to generate keys in /var/named/dynamic for DNSSEC. Usage: generate_keys <domain>


#!/bin/bash
#this script is /usr/local/bin/generate_keys
domain=$1

cd /var/named/dynamic/

dnssec-keygen -r /dev/urandom -3 $domain
dnssec-keygen -r /dev/urandom -3 -fk $domain

chown named:named /var/named/dynamic/K*

mkdir /var/named/dynamic/$domain
chown named:named /var/named/dynamic/$domain

Script to Easy-NSEC3RSASHA1 Sign DNSSEC Zones

DNSSEC has a lot of commands to learn and type when maintaining your system. Hopefully this simplifies it for you. Usage: sign_zone.sh <domain>. I verified this working with Bind 9.7.3 on Amazon EC2 and also with Bind 9.7.0 on CentOS using the bind97 RPMs and chroot jail. I store my ZSK and KSK for all domains in /var/named/dynamic. Then I have each zone in a subfolder /var/named/dynamic/<domain>. /etc/named.conf is configured to look for the generated <domain>.signed file. The script automatically increments the serial number for the zone and then re-signs it. I have a separate script to run this every night from cron.


#!/bin/bash
#this file is /usr/local/bin/sign_zone.sh
domain=$1
nsec3_salt=`/usr/local/bin/random_salt`

cd /var/named/dynamic/$domain

ZSK=`grep -iH 'zone' ../K${domain}.*key | cut -d':' -f1`
KSK=`grep -iH 'key-sign' ../K${domain}.*key | cut -d':' -f1`

SOA_SERIAL=`grep serial $domain | sed -e 's/^[ \t]*//g' | awk '{print $1}'`
NEW_SERIAL=`expr $SOA_SERIAL + 1`

echo "detected SOA SERIAL: $SOA_SERIAL"
echo "generating a new zone with NEW SOA SERIAL: $NEW_SERIAL"

cat $domain | sed -e "s/[0-9][0-9]*.*;.*serial/${NEW_SERIAL} ; serial/" > $domain.new
cp $domain.new $domain

echo "detected ZSK: $ZSK"
echo "detected KSK: $KSK"
echo "running signzone..."
echo dnssec-signzone -3 $nsec3_salt -a -S -k $KSK $domain $ZSK
dnssec-signzone -3 $nsec3_salt -a -S -k $KSK $domain $ZSK
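
The serial bump in the script only works if the SOA serial sits on its own line followed by a "; serial" comment. The sketch below isolates that step so you can try it on a copy of a zone file first; bump_serial is a hypothetical name, and it matches lines containing "; ... serial" instead of any line containing the word serial.

```shell
#!/bin/bash
# Sketch only: bump_serial is a hypothetical helper that mirrors the
# serial-increment step of sign_zone.sh.

# $1 = zone file whose SOA serial line looks like "   2011101301 ; serial".
# Rewrites the file in place and prints the new serial.
bump_serial() {
    local zone="$1" old new
    # First field of the first line that carries a "; serial" comment.
    old=$(awk '/;.*serial/ { print $1; exit }' "$zone")
    new=$((old + 1))
    sed -e "s/[0-9][0-9]*.*;.*serial/${new} ; serial/" "$zone" > "$zone.new" &&
        mv "$zone.new" "$zone"
    echo "$new"
}

# Example (on a copy, not the live zone):
#   cp /var/named/dynamic/example.com/example.com /tmp/zone.test
#   bump_serial /tmp/zone.test
```

If the function prints 1 or leaves the file unchanged, the zone file probably has no "; serial" comment on the serial line, and the signing script would not bump it either.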

Code to make a random salt for above:

#!/bin/bash
# save this file as /usr/local/bin/random_salt
dd if=/dev/urandom bs=16 count=1 2>/dev/null | hexdump -e \"%08x\"
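
If hexdump is not installed on your system, the same kind of salt can be produced with od. This is an alternative sketch, not the script used above; random_salt_od is a hypothetical name, and the bytes are printed in a different order than with hexdump, which does not matter for a random salt.

```shell
#!/bin/bash
# Alternative sketch: random_salt_od is a hypothetical replacement for random_salt.

# Print 16 random bytes as 32 lowercase hex characters, without a trailing newline.
random_salt_od() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

random_salt_od
echo
```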

Screenshot Script to Put Image In Dropbox – Puts URL in Clipboard – For *nix

Here are two scripts to do what GrabBox does on OS X: they take screenshots and then place each in your public Dropbox folder (edit the url to fit your id where xxxxxxx is). The first script, snap.sh, takes a full screenshot. The second lets you click on a window, and that window will be grabbed. I set up some fluxbox key bindings to call these scripts. The url will be injected into your clipboard using xclip for quick paste in an email or in IRC :-)

snap.sh:

#!/bin/bash
RAND=`cat /dev/urandom| tr -dc 'a-zA-Z0-9' | fold -w 24 | head -n 1`
IMAGE=${RAND}.png

scrot -d 0 -q 10 ~/$IMAGE

cp ~/$IMAGE ~/Dropbox/Public/Screenshots/${IMAGE}
echo "http://dl.dropbox.com/u/xxxxxxx/Screenshots/${IMAGE}" | xclip
rm -f ~/$IMAGE
#http://dl.dropbox.com/u/xxxxxxx/Screenshots/2vaq%7Exr5lr6x.png

snap2.sh:

#!/bin/bash

RAND=`cat /dev/urandom| tr -dc 'a-zA-Z0-9' | fold -w 24 | head -n 1`
IMAGE=${RAND}.png

scrot -d 0 -b -s -q 10 ~/$IMAGE

cp ~/$IMAGE ~/Dropbox/Public/Screenshots/${IMAGE}
echo "http://dl.dropbox.com/u/xxxxxxx/Screenshots/${IMAGE}" | xclip
rm -f ~/$IMAGE

Upgrade from FreeBSD 8.1 to 8.2 REL

1. cd /usr/src
2. edit supfile to use RELENG_8_2
*default host=cvsup15.FreeBSD.org
*default tag=RELENG_8_2
*default prefix=/usr
*default base=/var/db
*default release=cvs delete use-rel-suffix
src-all
3. cvsup -g -L 2 supfile
4. cd /usr/obj
5. chflags -R noschg *
6. rm -rf *
7. cd /usr/src && mergemaster -p
8. make buildworld

8.5. Make sure to edit your /root/kernels/PACKETCLOUD or whatever kernel config file you have

9. make buildkernel KERNCONF=PACKETCLOUD
10. make installkernel KERNCONF=PACKETCLOUD
11. mergemaster -p
12. make installworld
13. mergemaster
14. reboot
15. redo your ports…. cd /usr/ports …. portsnap fetch …. portsnap update
16. pkg_version -v | grep -v =
17. portupgrade -ra
18. etc etc. as usual

Amazon EC2 Snapshot Script – Nightly Instance Backup


#!/bin/bash

export AWS_ACCESS_KEY_ID=*******************
export AWS_SECRET_ACCESS_KEY=******************
export EC2_CERT=/root/api_cert.txt
export EC2_PRIVATE_KEY=/root/private_key.txt
export REGION=us-west-1
export VOLUME=vol-********
export OLD=`date +%m-%d-%Y --date '2 days ago'`

TODAY=`date +%m-%d-%Y`

echo "Stopping MySQL to create consistent snapshot."
/etc/init.d/mysql.server stop

echo "Creating snapshot of volume $VOLUME."
ec2-create-snapshot -C $EC2_CERT -K $EC2_PRIVATE_KEY --region $REGION -d "www.whatever.com $TODAY" $VOLUME

echo "Deleting oldest snapshot of $VOLUME."
OLDEST=`ec2-describe-snapshots -C $EC2_CERT -K $EC2_PRIVATE_KEY --region $REGION | grep $VOLUME | grep $OLD | sed -e 's/.*snap/snap/' | sed -e 's/\t.*//'`

if [ "x$OLDEST" != "x" ]; then
ec2-delete-snapshot -C $EC2_CERT -K $EC2_PRIVATE_KEY --region $REGION $OLDEST
else
echo "No snapshot to delete."
fi

echo "Starting MySQL."
/etc/init.d/mysql.server start

Setting up Streaming Replication in PostgreSQL 9.0


* install slave with empty 9.0 postgresql. Some hints are at http://packetcloud.net/index.php/how-to-upgrade-from-postgresql-8-4-5-centos-5-base-to-postgresql-9-pgdg-rpm/ -- just stop at initdb (no need to run pg_upgrade ), erase 8.4 rpms on slave if installed

* stop master pgsql (service postgresql-9.0 stop)

* modify /var/lib/pgsql/9.0/data/pg_hba.conf on master to have (adjust with your IP info)
host replication postgres 10.24.16.0/24 trust

* mkdir /var/lib/pgsql/9.0/data/pg_wal ; chown postgres /var/lib/pgsql/9.0/data/pg_wal

* modify /var/lib/pgsql/9.0/data/postgresql.conf to have
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/9.0/data/pg_wal/%f'

* service postgresql-9.0 start on master

* Copy master data to standby with a point in time backup:
su - postgres
psql -c "SELECT pg_start_backup('label', true)"
rsync -a -v -e ssh /var/lib/pgsql/9.0/data/ slave:/var/lib/pgsql/9.0/data/ --exclude postmaster.pid
psql -c "SELECT pg_stop_backup()"

* copy the postgresql.conf settings above to slave so it can act as a primary after failover
* enable read only queries on slave in postgresql.conf
hot_standby = on

* create a recovery file in slave with streaming replication in /var/lib/pgsql/9.0/data/recovery.conf, adjust IP as needed
standby_mode = 'on'
primary_conninfo = 'host=10.24.16.11 port=5432 user=postgres'
trigger_file = '/tmp/pgsql.trigger'
restore_command = 'cp /var/lib/pgsql/9.0/data/pg_wal/%f "%p"'

* service postgresql-9.0 start on slave

* calculate replication lag:
SELECT pg_current_xlog_location() --- on master
SELECT pg_last_xlog_receive_location() --- on slave
SELECT pg_last_xlog_replay_location() --- on slave

* check replication using ps command
ps -ef | grep sender (on master) :
postgres 6879 6831 0 10:31 ? 00:00:00 postgres: wal sender process postgres 127.0.0.1(44663) streaming 0/2000000

ps -ef | grep receiver ( on slave ):
postgres 6878 6872 1 10:31 ? 00:00:01 postgres: wal receiver process streaming 0/2000000
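
The lag queries above return positions such as 0/2000000, which are not directly subtractable. The sketch below converts two positions to byte counts and prints the difference. It assumes the pre-9.3 layout, where each logical xlog id covers 0xFF000000 bytes (255 segments of 16 MB); xlog_to_bytes and replication_lag_bytes are hypothetical helper names.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers for PostgreSQL 9.0-era xlog positions.
# Assumption: one xlog id spans 0xFF000000 bytes (pre-9.3 behaviour).

# Convert a position like 0/2000000 (hex logid / hex offset) to a byte count.
xlog_to_bytes() {
    local logid offset
    logid=$(printf '%d' "0x${1%%/*}")
    offset=$(printf '%d' "0x${1##*/}")
    echo $(( logid * 4278190080 + offset ))   # 4278190080 = 0xFF000000
}

# $1 = position on the master, $2 = position on the slave; prints the lag in bytes.
replication_lag_bytes() {
    local master standby
    master=$(xlog_to_bytes "$1")
    standby=$(xlog_to_bytes "$2")
    echo $(( master - standby ))
}

# Example: master at 0/2000100, slave has replayed up to 0/2000000.
replication_lag_bytes 0/2000100 0/2000000
```

The example prints 256, meaning the slave is 256 bytes of WAL behind. Feed it the output of pg_current_xlog_location() from the master and pg_last_xlog_replay_location() from the slave to get the replay lag.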

How to failover:
* touch /tmp/pgsql.trigger ... start querying to failover server

How to restart replication after failover :
* remake a fresh backup. master doesn't have to be stopped

How to restart replication after standby fails:
* restart postgres in standby after eliminating cause of failure

How to disconnect standby from primary:
* touch /tmp/pgsql.trigger in slave while primary running.

How to re-sync standby after isolation:
* shutdown standby, make a fresh backup as per above