Red Hat 4 Cluster Install


#!/bin/bash
exit # this is here in case you run this as a script
# install cluster suite for rhes4u3 last updated 17th April 2007 by Seamus

# Below details how to install the RedHat Cluster Suite and GFS on any RedHat ES4 Update 3 box
# the prerequisites are based upon the RHEL4u3 kickstart build prepared by Seamus
# fileserver.example.com:/u1/Distros/KickStart_rhes4u3/RHEL4u3_32bit.iso
# All of the software below is installed from an NFS server
# Simply cut and paste the sections into a root shell.
# WARNING: there are no error checks in this file, so please don't run it as a script...
# just watch the terminal for any errors such as "can't find rpm", "can't mount NFS volume", etc.
# I have divided up the commands based on the source of the software.
# There are a few comments at the bottom which need to be actioned manually.

NFS_SERVER=fileserver.example.com
NFS_SHARE=/u1/Distros
VERSION=rhes4u3
NFS_PATH=$NFS_SHARE/$VERSION
TEMP_MOUNT=/tmp/software
mkdir $TEMP_MOUNT
mount $NFS_SERVER:$NFS_SHARE $TEMP_MOUNT
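
# Optional sanity check, not part of the original procedure: make sure the share actually
# mounted and the expected version directory is visible before pasting further sections
df -h $TEMP_MOUNT
ls $TEMP_MOUNT/$VERSION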

#######################################
# Install cluster suite prerequisites #
#######################################
# The following RedHat RPMs are not part of the standard build prepared by Seamus
# and therefore need to be installed prior to the installation of the cluster suite

mkdir /tmp/rhcs_install
# the cluster suite ISO mounted here is used in the next section;
# the prerequisite RPMs below come straight from the RHEL4u3 install tree on the NFS share
mount -o ro,loop -t iso9660 /tmp/software/rhes4u3/rhel-4-u3-rhcs-i386.iso /tmp/rhcs_install
RPM_PATH=/tmp/software/rhes4u3/install/RedHat/RPMS/

rpm -ivh $RPM_PATH/libidn-0.5.6-1.i386.rpm
rpm -ivh $RPM_PATH/curl-7.12.1-8.rhel4.i386.rpm
rpm -ivh $RPM_PATH/php-pear-4.3.9-3.9.i386.rpm $RPM_PATH/php-4.3.9-3.9.i386.rpm
rpm -ivh $RPM_PATH/device-mapper-multipath-0.4.5-12.0.RHEL4.i386.rpm
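
# Optional check: confirm the prerequisites registered with rpm before moving on
# (any "is not installed" output means the matching rpm -ivh above needs to be rerun)
rpm -q libidn curl php-pear php device-mapper-multipath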

#######################################
# Install cluster suite               #
#######################################
# not all of the following are required to get the cluster running;
# I figured it's easier to have them here just in case you need them in the future

RHCS_PATH=/tmp/rhcs_install/RedHat/RPMS/

rpm -ivh $RHCS_PATH/ipvsadm-1.24-6.i386.rpm
rpm -ivh $RHCS_PATH/piranha-0.8.2-1.i386.rpm
rpm -ivh $RHCS_PATH/perl-Net-Telnet-3.03-3.noarch.rpm
rpm -ivh $RHCS_PATH/magma-1.0.4-0.i686.rpm
rpm -ivh $RHCS_PATH/ccs-1.0.3-0.i686.rpm
rpm -ivh $RHCS_PATH/gulm-1.0.6-0.i686.rpm
rpm -ivh $RHCS_PATH/cman-kernel-2.6.9-43.8.i686.rpm
rpm -ivh $RHCS_PATH/cman-1.0.4-0.i686.rpm
rpm -ivh $RHCS_PATH/cman-kernel-smp-2.6.9-43.8.i686.rpm
rpm -ivh $RHCS_PATH/fence-1.32.18-0.i686.rpm
rpm -ivh $RHCS_PATH/rgmanager-1.9.46-0.i386.rpm
rpm -ivh $RHCS_PATH/system-config-cluster-1.0.25-1.0.noarch.rpm
rpm -ivh $RHCS_PATH/iddev-2.0.0-3.i686.rpm

# add these to the production servers
rpm -ivh $RHCS_PATH/dlm-1.0.0-5.i686.rpm
rpm -ivh $RHCS_PATH/dlm-kernel-2.6.9-41.7.i686.rpm
rpm -ivh $RHCS_PATH/dlm-kernheaders-2.6.9-41.7.i686.rpm
rpm -ivh $RHCS_PATH/dlm-kernel-smp-2.6.9-41.7.i686.rpm
rpm -ivh $RHCS_PATH/magma-plugins-1.0.6-0.i386.rpm
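
# Optional check: all of the userland cluster suite packages should now report a version
rpm -q magma magma-plugins ccs gulm cman fence rgmanager system-config-cluster iddev dlm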

######################################
# Install the few GFS packages       #
######################################

mkdir /tmp/rhgfs_install

mount -o ro,loop -t iso9660 /tmp/software/rhes4u3/rhel-4-u3-rhgfs-i386.iso /tmp/rhgfs_install

RHGFS_PATH=/tmp/rhgfs_install/RedHat/RPMS/

rpm -ivh $RHGFS_PATH/GFS-6.1.5-0.i386.rpm
rpm -ivh $RHGFS_PATH/GFS-kernel-smp-2.6.9-49.1.i686.rpm
rpm -ivh $RHGFS_PATH/lvm2-cluster-2.02.01-1.2.RHEL4.i386.rpm
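
# Optional check: the GFS kernel package is built against a specific kernel,
# so make sure the versions line up with the kernel you are actually running
uname -r
rpm -q GFS GFS-kernel-smp lvm2-cluster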

#########################################################
#                                                       #
# This is the end of the cluster software installation  #
# All further steps should be performed manually        #
#                                                       #
#########################################################

# You need to set up the hosts file on each cluster node:
# either cut and paste into a shell or paste into a vi session.
# Watch out for tabs vs. whitespace when you cut and paste.

TIME=`date +%Y_%m_%d_%H%M`
cp /etc/hosts /etc/hosts_$TIME.bak
cat << EOF > /etc/hosts

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost

# production search engine host file last updated 13th April 2007

#temporary blue IPs
10.10.10.50 example10.example.com example10
10.10.10.51 example11.example.com example11
10.10.10.52 example12.example.com example12
10.10.10.53 example13.example.com example13
10.10.10.54 example14.example.com example14
10.10.10.55 example15.example.com example15
10.10.10.56 example16.example.com example16
10.10.10.57 example17.example.com example17
10.10.10.58 example18.example.com example18
10.10.10.59 example19.example.com example19

# ilo interfaces
# warning if you move a blade the ilo ip will change
10.10.10.24 example10-ilo
10.10.10.25 example11-ilo
10.10.10.26 example12-ilo
10.10.10.27 example13-ilo
10.10.10.28 example14-ilo
10.10.10.40 example15-ilo
10.10.10.41 example16-ilo
10.10.10.42 example17-ilo
10.10.10.43 example18-ilo
10.10.10.44 example19-ilo

#end of hosts file

EOF
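
# Optional check (node names assumed from the hosts file above): every node should resolve and answer a ping
for h in example10 example11 example12 example13 example14 example15 example16 example17 example18 example19; do
    ping -c 1 -w 2 $h > /dev/null 2>&1 && echo "$h ok" || echo "$h UNREACHABLE"
done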

################################################
#                                              #
# Setting up the fibre channel cards and paths #
#                                              #
################################################

# make backup of multipath.conf file
TIME=`date +%Y_%m_%d_%H%M`
cp /etc/multipath.conf /etc/multipath.conf_$TIME.bak

# To enable multipathd to scan for LUNs,
# you need to comment out the following 3 lines in /etc/multipath.conf

#devnode_blacklist {
# devnode "*"
#}
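
# A hedged one-liner that comments out that block in place, assuming the stock file layout
# (block starts at column 0, closing brace on its own line); double check the result by eye afterwards
sed -i -e '/^devnode_blacklist {/,/^}/ s/^/#/' /etc/multipath.conf
grep -A 2 devnode_blacklist /etc/multipath.conf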

# At this point it's easiest to reboot. You can look up the rescan method
# for your particular HBA driver, but these always change.
# Once the box comes back up, run:
multipath -l

# you should see something like

mpath1 (360060e80000000000000000000000000000)
[size=500 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16 [active][ready]
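
# Assumed here rather than taken from the original notes: make sure the multipath
# daemon itself is running and comes back after a reboot
chkconfig --level 2345 multipathd on
service multipathd start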

# edit /etc/ssh/sshd_config and adjust the PermitRootLogin setting to suit your environment

mkdir /root/scripts
touch /root/scripts/cluster_services.sh
chmod 700 /root/scripts/cluster_services.sh
vi /root/scripts/cluster_services.sh

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
#!/bin/bash
# this script makes it easy to enable or disable automatic startup of all the cluster services

ACTION=$1
if [ -z "${ACTION}" ]; then
    echo "Usage: $0 on|off|start|stop|status"
    exit 1
fi

if [ "${ACTION}" = "on" ]; then
    printf "Setting cluster services to start on runlevels 2345\n"

    chkconfig --level 2345 ccsd on
    chkconfig --level 2345 cman on
    chkconfig --level 2345 fenced on
    chkconfig --level 2345 clvmd on
    chkconfig --level 2345 gfs on
    chkconfig --level 2345 rgmanager on

elif [ "${ACTION}" = "off" ]; then
    printf "Turning cluster services off for runlevels 2345\n"

    chkconfig --level 2345 ccsd off
    chkconfig --level 2345 cman off
    chkconfig --level 2345 fenced off
    chkconfig --level 2345 clvmd off
    chkconfig --level 2345 gfs off
    chkconfig --level 2345 rgmanager off

elif [ "${ACTION}" = "status" ]; then

    /usr/sbin/clustat

elif [ "${ACTION}" = "stop" ]; then
    printf "Run the following commands manually, in order:\n"

    echo "service rgmanager stop"
    echo "service gfs stop"
    echo "service clvmd stop"
    echo "service fenced stop"
    echo "service cman stop"
    echo "service ccsd stop"

elif [ "${ACTION}" = "start" ]; then
    printf "Run the following commands manually, in order:\n"

    echo "service ccsd start"
    echo "service cman start"
    echo "service fenced start"
    echo "service clvmd start"
    echo "service gfs start"
    echo "service rgmanager start"

fi

#end of file

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
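
# Example usage of the helper above, run on each node once the software is installed:
/root/scripts/cluster_services.sh on       # enable the init scripts for runlevels 2345
/root/scripts/cluster_services.sh status   # wraps clustat once the cluster is up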

# Below is the initial cluster configuration file that I created. The only deviation from standard is that
# I set up a unique ILO fence account on each host node; these account names are based on the host name.
# This step is very important when dealing with blade enclosures.

# The reason for having unique names is the cluster fencing mechanism.
# The problem stems from the way the ILO IP addresses are assigned via DHCP:
# the IP addresses are assigned based upon the physical location of the blade within the enclosure cabinet,
# not permanently assigned to a particular blade.
# i.e. if you move a blade from slot 9 to slot 10, its ILO IP address will also change, and whatever blade is put
# back into slot 9 will inherit its old IP. Who cares, I hear you say? What if you put someone else's server in the old slot 9
# and the cluster tries to fence the blade you have just moved to slot 10?
# If you failed to update the /etc/hosts file, the fencing mechanism would shut down the wrong node (if it could log in).

# So whenever a blade is physically moved, make sure you update the hosts file on each cluster node and manually test the login via ssh

# There is a bug in the ILO's ssh daemon that prevents you from logging in; a workaround is to create
# an ssh config file containing "ForwardAgent no" and point ssh at it with -F when you connect

mkdir /root/.ssh
echo "ForwardAgent no" > /root/.ssh/ilo_bug

ssh -F /root/.ssh/ilo_bug xxxxxx_test_fence@10.10.10.29
# to restart the server, type the appropriate power command at the iLO prompt

Fencing via the HP iLO

Field       Description
Name        A name for the server with HP iLO support.
Login       The login name used to access the device.
Password    The password used to authenticate the connection to the device.
Hostname    The hostname assigned to the device.
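
Before trusting it for real fencing, it is worth testing the fence agent by hand from another node.
A hedged example only: the fence account name is an assumption (use the per-node account you created),
substitute the real password, and check fence_ilo -h on your build since the action flag and its values
vary between releases.

fence_ilo -a example10-ilo -l example10_fence -p <password> -o status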

###############################################
#                                             #
# sample initial /etc/cluster/cluster.conf    #
#                                             #
###############################################

<?xml version="1.0" ?>
removed   
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

This change is made by the installation of lvm2-cluster-2.02.01-1.2.RHEL4.i386.rpm:

diff /etc/lvm/lvm.conf /etc/lvm/lvm.conf.lvmconfold
172,173d171
< library_dir = "/usr/lib"
< locking_library = "liblvm2clusterlock.so"
215c213
< locking_type = 2
---
> locking_type = 1
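
A quick way to confirm the change took effect on each node (just a grep, nothing clever):

grep locking_type /etc/lvm/lvm.conf

It should report locking_type = 2 on every cluster member.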

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The following directories need to be shared between all nodes via GFS or NFS (a rough GFS sketch follows the list):

/home/search
/big/search
/var/www/html/search
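
For the GFS option, a hedged sketch only: the volume group and logical volume names are placeholders,
the journal count should match (or exceed) the number of nodes, and the cluster name must match the one
in cluster.conf. Run the mkfs once from a single node, then mount on each node (and add an fstab entry
so the gfs init script mounts it at boot):

gfs_mkfs -p lock_dlm -t <clustername>:search_home -j 10 /dev/<vgname>/<lvname>
mount -t gfs /dev/<vgname>/<lvname> /home/search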

manually replicate this file between the nodes
/etc/httpd/conf/funnelback-httpd.conf
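
A hedged sketch for pushing it out from the node you edited (node names assumed from the hosts file above):

for h in example11 example12 example13 example14 example15 example16 example17 example18 example19; do
    scp /etc/httpd/conf/funnelback-httpd.conf $h:/etc/httpd/conf/
done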