Setup NG2HPC at University of Canterbury

The NG2HPC is a copy of the NG2 machine created to integrate the [IBM p575 HPC] with the grid.

The steps in the setup are:

  1. Integrate the machine with the cluster's filesystem
  2. Integrate the virtual machine with LoadLeveler
  3. Install Globus
  4. Install the Globus LoadLeveler integration
  5. Install and configure MIP to register the cluster in the MDS.

Filesystem integration

In order for Globus to stage in files needed by jobs, ng2hpc needs access to the cluster's filesystem. The HPC uses GPFS. We considered either mounting the filesystem directly via GPFS, or exporting the filesystem from a node (the p520) via NFS and mounting it on the gateway.

GPFS

We have obtained the GPFS Multiplatform CD with GPFS binaries and drivers for Linux. RHEL4 is a supported platform, and the driver compiles fine under a CentOS 4.4 kernel (2.6.9-42.0.10ELsmp). The drivers, however, did not compile under the modified Xen kernel. It is nevertheless possible to set up a Xen HVM virtual machine running an unmodified CentOS kernel and install the drivers there.

We did not use this option, but the steps are:

  1. Pre-install: needs compatibility libraries and imake:
    yum install compat-libstdc++-33 xorg-x11-devel
  2. Run gpfs_install-3.1.0-0_i386 from the CD; this creates rpm files in /usr/lpp/mmfs/3.1/
  3. Install the RPM packages:
    rpm -Uvh /usr/lpp/mmfs/3.1/*.rpm
  4. Follow the instructions in /usr/lpp/mmfs/src/README
    1. export SHARKCLONEROOT=/usr/lpp/mmfs/src
    2. cd /usr/lpp/mmfs/src/config
    3. cp site.mcr.proto site.mcr
    4. edit site.mcr
      • LINUX_DISTRIBUTION = REDHAT_LINUX
      • #define LINUX_DISTRIBUTION_LEVEL 44
      • #define LINUX_KERNEL_VERSION 2060942
      • If you forget to change the definitions at this time, you must later edit both src/site.mcr and src/shark/config/site.mcr.
    5. Compile and install:
make World
su -c "make InstallImages"
cd /usr/lpp/mmfs/bin
insmod tracedev
insmod mmfslinux
insmod mmfs26

NFS

Export the filesystems (/hpc/{home,work,projects,griddata,gridusers}) from the cluster via NFS, and mount them on ng2hpc.

/etc/fstab:

hpcgrid1-c:/hpc/gridusers      /hpc/gridusers   nfs     fg,retry=20,hard,acregmin=1,acdirmin=1    0 0

To force tighter synchronization and avoid the situation where a file has been created on the server side but is not yet visible locally, the minimum attribute caching interval has been reduced to 1s.
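
As a quick sanity check, the mount options of an fstab entry like the one above can be read back from a script. The following is only a sketch: the helper name fstab_opt is hypothetical (it is not part of any package), and it assumes the usual whitespace-separated fstab layout with the options in the fourth field.

```shell
# Print the value of one mount option from a single fstab line.
# Usage: fstab_opt "<fstab line>" <option-name>
# Exits non-zero if the option is not present on that line.
fstab_opt() {
    fstab_line=$1
    opt_name=$2
    # Field 4 of an fstab entry holds the comma-separated mount options.
    fstab_opts=$(printf '%s\n' "$fstab_line" | awk '{ print $4 }')
    # Split the options one per line, then match on the name before "=".
    printf '%s\n' "$fstab_opts" | tr ',' '\n' |
        awk -F= -v n="$opt_name" '$1 == n { print $2; found = 1 } END { exit !found }'
}
```

For example, fstab_opt "$(grep '^hpcgrid1-c:/hpc/gridusers' /etc/fstab)" acregmin prints the configured minimum caching interval for regular files, if the entry is present.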

Note that for client NFS to work, portmapper must be running - and in order for NFS filesystems from /etc/fstab to be mounted at boot time, service netfs must be on:

chkconfig portmap on
chkconfig netfs on  

service portmap start
service netfs start

LoadLeveler

This has been done and the machine is capable of submitting LoadLeveler jobs. Note that this had to be redone: the GT40-LoadLeveler integration library requires the full version of LoadLeveler. It has hardcoded path names into /opt/ibmll/LoadL/full/bin/, while the submit-only version installs into /opt/ibmll/LoadL/so/bin. Also note that LoadLeveler and GT40 must be installed before the integration library can be installed.


Install LoadLeveler binaries

We have installed the LoadLeveler "full" binaries from the LoadLeveler 3.4 Multiplatform CD.

yum install openmotif
# needed by LoadL-full

rpm -e --noscripts LoadL-so-license-RH4-X86-3.4.0.0-0 LoadL-so-RH4-X86-3.4.0.0-0
### --noscripts is essential - the LoadL-*license RPMs would remove /opt/ibmll when uninstalled
rpm -Uvh /root/inst/LoadLMulti/LoadL-full-license-RH4-X86-3.4.0.0-0.i386.rpm
rpm -Uvh /root/inst/LoadLMulti/LoadL-full-RH4-X86-3.4.0.0-0.i386.rpm

Then either run:

/opt/ibmll/LoadL/sbin/install_ll -d /root/inst

or create /opt/ibmll/LoadL/lap/license/status.dat with the following content:

#Wed May 16 16:50:45 NZST 2007
Status=9

Finally,

rpm -Uvh LoadL-so-RH4-X86-3.4.0.0-0.i386.rpm

The binaries are now in /opt/ibmll/LoadL/full/bin/.

The next step is to configure LoadLeveler. LoadLeveler expects that user loadl exists, and reads the configuration from ~loadl.

adduser -u 1005 loadl

Now, the optimal step would be to mount all home directories from /hpc/home, including ~loadl. Until the directories are exported, the temporary solution is to copy configuration from the HPC:

su loadl
cd ~loadl
ssh vme28@hpclogin2 tar cvzf - -C /hpc/home/loadl LoadL_{admin,config} local > loadl-config-snapshot-2007-05-18.tar.gz
tar xzf loadl-config-snapshot-2007-05-18.tar.gz
chmod 755 . 

Update: The home directories are already exported via NFS, and I am now using the shared LoadLeveler configuration from /hpc/home/loadl (vipw, change home directory of user loadl from /home/loadl to /hpc/home/loadl).

Your admin needs to set up a public scheduler, add your machine as a submit-only node, and create a configuration file for your machine. This file should exist both under the name by which the cluster knows the machine (ng2hpc-c) and under the hostname of the machine (ng2hpc). Thus, in ~loadl/local/, create the files LoadL_config.ng2hpc and LoadL_config.ng2hpc-c with

SCHEDD_RUNS_HERE = FALSE
STARTD_RUNS_HERE = FALSE
START_DAEMONS = FALSE
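
Since the same three settings go into one file per host name, the files can be generated in a loop. This is a minimal sketch only; the function name write_submit_only_configs is hypothetical, and the target directory is passed in rather than assumed.

```shell
# Write a submit-only LoadL_config.<name> file for every given host name.
# Usage: write_submit_only_configs <local-config-dir> <name>...
write_submit_only_configs() {
    cfg_dir=$1
    shift
    for cfg_name in "$@"; do
        # Same content for every name: no LoadLeveler daemons run on this host.
        cat > "$cfg_dir/LoadL_config.$cfg_name" <<'EOF'
SCHEDD_RUNS_HERE = FALSE
STARTD_RUNS_HERE = FALSE
START_DAEMONS = FALSE
EOF
    done
}

# Example (run as user loadl):
#   write_submit_only_configs ~loadl/local ng2hpc ng2hpc-c
```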

Add /opt/ibmll/LoadL/full/bin/ to your PATH and llsubmit, llq, ... should work now. Note that if different LoadLeveler versions are mixed, the Central Manager must run the most recent version of all those involved. Otherwise, commands such as llstatus may report communication errors.

To make LoadLeveler binaries automatically accessible to everyone create (executable) /etc/profile.d/loadl.sh:

PATH=$PATH:/opt/ibmll/LoadL/full/bin/
export PATH

Globus

The Globus installation has roughly followed the NG2 setup - and thus, roughly followed the [APAC NG2 setup]. The key difference has been that as LoadLeveler is used instead of PBS, no PBS-specific packages were installed, and the PBS-specific installation steps from the build script were skipped.

The main steps have been:

  • yum install Gbuild Gpulse
  • skipping Gtorque-client (and other PBS-specific instructions)
  • install host certificate into /etc/grid-security/host{cert,key}.pem (key protected)
  • cleanup services that can't run (and cause gridpulse.sh to report the host as Not OK)
chkconfig lvm2-monitor off
chkconfig cpuspeed off
  • yum update (to update to CentOS 4.5, and to avoid inconsistencies in package update status, as new packages would be installed from the CentOS 4.5 distribution).
  • modify BuildNg2Vdt161.sh to skip PRIMA setup and to skip any PBS-specific checks and configuration steps (saved as BuildNg2Vdt161NoPrima.sh)
    • do not install pbs-telltail
    • do not check for qstat
    • do not install Globus-WS-PBS-Setup (I did let it install, and it was a pain to remove all traces of it from the gateway)
    • do not configure PRIMA (let us use EDG-GridMap instead)
    • do not set up the pbs-logmaker service
--- BuildNg2Vdt161.sh   2007-05-16 18:59:49.000000000 +1200
+++ BuildNg2Vdt161NoPrima.sh    2007-07-20 15:19:16.000000000 +1200
@@ -26,8 +26,11 @@
             vim-enhanced iptables ntp yp-tools mailx nss_ldap libXp   \
             tcsh openssh-server sudo lsof slocate bind-utils telnet   \
             gcc vixie-cron anacron crontabs diffutils xinetd tmpwatch \
-            sysklogd logrotate man pbs-telltail compat-libstdc++-33   \
+            sysklogd logrotate man compat-libstdc++-33   \
             compat-libcom_err perl-DBD-MySQL openssl097a gcc-c++ $Extras
+###### disabled by VLADIMIR MENCL: #pbs-telltail
+## DISABLED by VLADIMIR MENCL 2007-07-11
+if [ -n "$REALLYBUGMEWITHQSTAT" ] ; then
 until qstat >/dev/null 2>/dev/null ; do
   echo    "==> qstat not found or not configured!"
   echo -n "==> Enter path (e.g. /usr/local/pbs/bin), else enter 'q' .. "
@@ -35,6 +38,7 @@
   [ "$_Ans" = q ] && echo "==> You might want to do: yum install Gtorque-client" && exit 1
 done
 [ -d /usr/spool/PBS/server_logs ] && export PBS_HOME=/usr/spool/PBS
+fi

 #
 # Pacman, port-range adjustment, java-version adjustment, VDT
@@ -64,7 +68,8 @@

 #
 # VDT Components
-for Component in JDK-1.5 Globus-WS PRIMA-GT4 Fetch-CRL Globus-WS-PBS-Setup ; do
+###### disabled by VLADIMIR MENCL: Globus-WS-PBS-Setup
+for Component in JDK-1.5 Globus-WS PRIMA-GT4 Fetch-CRL ; do
   echo "==> Checking/Installing: $Component"
   pacman -pretend-platform linux-rhel-4 $ProxyString \
     -get http://www.grid.apac.edu.au/repository/mirror/vdt-1.6.1.mirror:$Component || echo "==> Failed!"
@@ -87,7 +92,9 @@
 wait_timeout=2764800
 ' /opt/vdt/mysql/var/my.cnf
 . /etc/profile; vdt-control --force --on && echo "==> Installed: startup scripts"
-if [ ! -f /etc/grid-security/prima-authz.conf ] ; then
+
+## DISABLED by VLADIMIR MENCL 2007-07-11
+if [ -n "$REALLYINSTALLPRIMA" -a ! -f /etc/grid-security/prima-authz.conf ] ; then
   until [ -n "$Gums_Server" ] ; do
     echo -n "==> Please enter the name of your GUMS server [e.g. nggums.vpac.org ] .. "
     read Gums_Server
@@ -126,7 +133,7 @@
 #
 # Wrapup
 [ -x  /usr/local/sbin/SecureMdsVdt161.sh ] && /usr/local/sbin/SecureMdsVdt161.sh Supress
-chkconfig --add pbs-logmaker; service pbs-logmaker start
+###### disabled by VLADIMIR MENCL: chkconfig --add pbs-logmaker; service pbs-logmaker start
 echo "==> Re-starting: xinetd"
 chkconfig --add xinetd; service xinetd start; service xinetd reload
 echo "==> Running: /opt/vdt/fetch-crl/share/doc/fetch-crl-2.6.2/fetch-crl.cron"
  • run BuildNg2Vdt161NoPrima.sh
  • visudo, copy and paste from /opt/vdt/post-install/README (the Build script configures the sudo permissions for the syntax used with PRIMA; with a grid-mapfile, the command syntax is different)
  • /opt/vdt/post-install/README did not give any additional instructions
  • the container did not start because we had no grid-mapfile
  • Install EDG-Make-Gridmap
 pacman -pretend-platform linux-rhel-4 -get http://www.grid.apac.edu.au/repository/mirror/vdt-1.6.1.mirror:EDG-Make-Gridmap
 vdt-control --on edg-mkgridmap # this enables cron job
  • Install UberFTP - to use as a client tool.
pacman -pretend-platform linux-rhel-4 -get http://www.grid.apac.edu.au/repository/mirror/vdt-1.6.1.mirror:UberFTP

LoadLeveler integration

This has been done according to the IBM instructions. Before this integration is done, LoadLeveler (full version) and Globus Toolkit 4.0 must be installed and configured.

Installing LoadLeveler GT40 integration library

  • Extract the integration library (it is in the llgrid.tar file packaged with the LoadL-full RPM)
mkdir /root/inst/llgrid
cd /root/inst/llgrid
tar xvf /opt/ibmll/LoadL/full/lib/llgrid.tar
  • Change the configuration file
vi /root/inst/llgrid/gt4/globus-loadleveler.conf
    • change log_path to a directory that is accessible both by globus and by the LoadLeveler scheduler --- that likely means an NFS-mounted directory.
log_path=/hpc/gridusers/grid-bgd/log/globus-loadleveler.log

Now the instructions ask to run

cd /root/inst/llgrid/gt4
./deploy.sh

The deploy script configures LoadLeveler as an additional scheduler in your Globus installation. Namely, it:

  • installs the Perl script $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/loadleveler.pm to handle job submission and status inquiry.
  • installs the information service provider $GLOBUS_LOCATION/libexec/globus-scheduler-provider-loadleveler to provide basic MDS information (though this is not the GLUE MDS information).
  • installs $GLOBUS_LOCATION/etc/grid-services/jobmanager-loadleveler (purpose unclear to me; possibly GT2).
  • installs $GLOBUS_LOCATION/etc/globus-loadleveler.conf, the configuration file for the SEG (Scheduler Event Generator) specifying where the log file is.
  • configures directory mappings for the Loadleveler Factory Type in $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml.
  • copies the SEG binaries into $GLOBUS_LOCATION/lib/
    cp -f seg-binary-linux/* $GLOBUS_LOCATION/lib/
  • creates the log file

However, a number of additional steps have to be done.

  • Important! jndi-config.xml must specify substitution definitions file and a refresh period. jndi-config.xml files for other GRAM services have these definitions, but etc/gram-service-Loadleveler/jndi-config.xml does not. Add
                <parameter>
                    <name>
                        substitutionDefinitionsFile
                    </name>
                    <value>
                        /opt/vdt/globus/etc/gram-service-Loadleveler/substitution-definition.properties
                    </value>
                </parameter>
                <parameter>
                    <name>
                        substitutionDefinitionsRefreshPeriod
                    </name>
                    <value>
                        <!-- MINUTES -->
                        480
                    </value>
                </parameter>
to the end of <resourceParams> and copy /opt/vdt/globus/etc/gram-service-Fork/substitution-definition.properties to /opt/vdt/globus/etc/gram-service-Loadleveler/substitution-definition.properties.
  • Also, while editing /opt/vdt/globus/etc/gram-service-Loadleveler/jndi-config.xml, add ${GLOBUS_USER_HOME}/ to the value of scratchDirectory parameter - the value should be ${GLOBUS_USER_HOME}/.globus/scratch.
  • Fix the /opt/vdt/globus/libexec/globus-scheduler-provider-loadleveler script - it tries to create a temporary file in the current directory - which is the globus base directory where it has no write access.
mkdir -p /opt/vdt/globus/var/llglobus/tmp
chown daemon:daemon /opt/vdt/globus/var/llglobus/tmp
--- globus-scheduler-provider-loadleveler.ORIG  2007-07-11 17:36:52.000000000 +1200
+++ globus-scheduler-provider-loadleveler       2007-07-12 11:28:59.000000000 +1200
@@ -6,6 +6,9 @@
 # Information Provider service for LoadLeveler
 #

+### FIX:
+LLGLOBUS_VAR_DIR=$GLOBUS_LOCATION/var/llglobus/tmp
+
 # programs used in this script located by autoconf:
 grep=${GLOBUS_SH_GREP-grep}
 sed=${GLOBUS_SH_SED-sed}
@@ -51,10 +54,10 @@
   fi
 fi

-llstatus_file="./globus_llstatus_tmp_file.$$"
-llq_file="./globus_llq_tmp_file.$$"
-llclass_file="./globus_llclass_tmp_file.$$"
-llclass_l_file="./globus_llclass_l_tmp_file.$$"
+llstatus_file="$LLGLOBUS_VAR_DIR/globus_llstatus_tmp_file.$$"
+llq_file="$LLGLOBUS_VAR_DIR/globus_llq_tmp_file.$$"
+llclass_file="$LLGLOBUS_VAR_DIR/globus_llclass_tmp_file.$$"
+llclass_l_file="$LLGLOBUS_VAR_DIR/globus_llclass_l_tmp_file.$$"

 ############################################################
 #

  • Make sure the user your grid-mapfile maps to does exist.
  • Make sure the working directory of your LoadLeveler job does exist on the target system. If it is in the home directory of a user, the path to the home directory must be the same on the gateway and on the LoadLeveler system.
  • Make sure your LoadLeveler log file (as specified in /opt/vdt/globus/etc/globus-loadleveler.conf) is accessible from the LoadLeveler scheduler. The LoadLeveler scheduler must write to that file; only then will the LoadLeveler SEG be able to let Globus know about the job's progress. Otherwise, your job will hang as Unsubmitted.

In that case, the log message in the LoadLeveler log files will be

 07/12 17:38:56 TI-9260 Cannot open globus LoadLeveler log file for l4n02-c.85.0.
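
A rough way to check the gateway side of this is to read log_path from the configuration file and test whether the file (or, if it does not exist yet, its directory) is writable. This is a sketch under assumptions: the helper names conf_log_path and log_path_writable are hypothetical, and the file is assumed to contain a plain log_path=... line as shown earlier. It only checks access on the machine where it runs, so the same test would have to be repeated on the scheduler node.

```shell
# Print the log_path value from a globus-loadleveler.conf style file.
# Usage: conf_log_path <conf-file>
conf_log_path() {
    sed -n 's/^log_path=//p' "$1" | head -n 1
}

# Succeed only if the configured log file is writable by the current user.
# If the file does not exist yet, test its parent directory instead.
# Usage: log_path_writable <conf-file>
log_path_writable() {
    ll_log=$(conf_log_path "$1")
    [ -n "$ll_log" ] || return 1
    if [ -e "$ll_log" ]; then
        [ -w "$ll_log" ]
    else
        [ -w "$(dirname "$ll_log")" ]
    fi
}

# Example:
#   log_path_writable /opt/vdt/globus/etc/globus-loadleveler.conf || echo "log file not writable"
```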



Troubleshooting

Check the LoadLeveler log files (on the scheduler node, l4n02-c:/var/loadl/log/SchedLog ???) for error messages. See also the WS GRAM troubleshooting guide:

http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/user-index.html#s-wsgram-user-troubleshooting

Remaining problems:

  • (unresolved, did not reoccur) GridFTP could not retrieve the output file when a job completed too fast
    • may have been solved by reducing the NFS attribute caching interval.
  • unresolved: if a job does not produce any error output, LoadLeveler deletes the stdErr file and Globus reports a GridFTP file not found error.

Additional configuration

The GBLL_{TASKS_PER_NODE,COMMENT,RESTART,...} environment variables set the environment for loadleveler.pm.

See the Grid Toolbox Administration Guide, http://dl.alphaworks.ibm.com/technologies/gridtoolbox/GridAdmin.pdf


If compiling the SEG libraries, install the Globus SDK:

pacman -pretend-platform linux-rhel-4 -get http://www.grid.apac.edu.au/repository/mirror/vdt-1.6.1.mirror:Globus-Base-SDK
 ./configure --prefix=$GLOBUS_LOCATION --with-flavor=gcc32dbg
make 
make install


Cleaning up PBS

If Globus-PBS-Setup / Globus-WS-PBS-Setup is accidentally installed, removing pacman packages is not enough.

pacman -remove Globus-PBS-Setup Globus-WS-PBS-Setup

In addition, all files installed by vdt_globus_jobmanager_pbs-VDT1.6.0-x86_rhas_4.tar.gz and vdt_globus_wsjobmanager_pbs-VDT1.6.0-x86_rhas_4.tar.gz have to be removed manually.


Testing LoadLeveler jobs

job submission: -Ft Loadleveler ??? what happens to -Ft PBS jobs (if submitted accidentally?)


job submission: "two strategies" [LLGT40UserGuide]

(1) submit via GRAM only "llsubmit sample1.jcf". The obvious (though unstated) drawback is that the GRAM job will terminate immediately and the LL job won't be monitorable via Globus.
(2) -Ft Loadleveler

Open question: as "fs mappings" are entered for each port (-Ft) separately, can we have multiple home mappings for different ports (and have a single ng2)?


Logging grid usage

I have created a /usr/local/sbin/send_grid_usage script to report LoadLeveler job usage. The script sends the information in the same way as the send_grid_usage script written by David Bannon for PBS systems. However, as LoadLeveler has a completely different system of storing job accounting information, the script first has to obtain the information from LoadLeveler with llsummary and then convert it into PBS format with the script /usr/local/sbin/loadl2pbs.pl, which I wrote for this purpose. Note that this script has to use the Job Step Id (Job Id with ".0" appended) as the PBS JobID to match the information sent in the Job-DN emails by the auditquery script.

The script keeps a local copy of the LoadLeveler data and the converted PBS output in /opt/vdt/globus/var/llacct

The /etc/cron.hourly/auditquery script worked without modification, but I have extended it to log the JobID-DN pair locally into /opt/vdt/globus/var/llacct/jobdn.log


Tweaking loadleveler.pm

  • If job description does not specify a job class (via a <queue> element), set a default job class: par4_6 for parallel jobs and serial_6 for serial jobs.
  • Tagging jobs: extract user identity from X509_USER_CERT (if job credentials have been delegated) and tag the job with this information by setting GLOBUS_USER_DN and GLOBUS_USER_EMAIL in the LoadLeveler job environment.
    • If user email is available, set LoadLeveler notify_user to the user's email:
      # \@ notify_user    = $job_environment{GLOBUS_USER_EMAIL}
  • Change the POE executable from /bin/poe to /usr/bin/poe. On AIX, /bin is symlink to /usr/bin anyway; on Linux, only /usr/bin exists.
  • If job does not have uniq_id (has not been seen yet), use "".time().".$$" as the unique ID for log file name.
  • Remove (Adapter == "ethernet") job requirement (for an unexplained reason, this requirement could not be satisfied on Linux, and is not necessary on any nodes anyway).
  • Let the script print a "letterhead" statement to stderr to prevent LoadLeveler from deleting the empty stderr file.
    $script_file->print('echo "This job has been processed at the University of Canterbury Supercomputing Center (node `hostname`)" >&2'."\n");
  • Until the modules package is installed, add at least /usr/local/bin to the job's PATH:
    $script_file->print('PATH=/usr/local/bin:/hpc/home/vme28/bin:$PATH'."\n");
  • Log the JobID - User-DN pair to /opt/vdt/globus/var/llacct/jobdn-subm.log
  • If job submission fails, report the llsubmit error output as a GT3_FAILURE_MESSAGE message, so that it gets displayed on the globusrun-ws console (and also output the error message into the job standard error).
  • Workaround for a LoadLeveler bug: if job environment size is just below 1kB, a job with a number of tasks >=8 may fail with
    0031-769 Invalid task environment data received.
    For ordinary jobs, this would happen when GLOBUS_USER_DN and GLOBUS_USER_EMAIL are both set. In this case, we push the environment size over the threshold with a comment environment variable (GLOBUS_COMMENT).
  • Get BlueGene job submission working.
  • If (cpu)Count is > $LL_MAX_TASKS_PER_NODE (==16) and hostCount is not specified, assume hostCount (ll_node) to be the least number of nodes necessary to accommodate all the tasks:
    ($ll_total_tasks - 1)/$LL_MAX_TASKS_PER_NODE + 1;
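
The node count formula in the last item is a ceiling division. The sketch below restates it in shell for illustration only; the function name ll_node_count is hypothetical, and it assumes integer division is intended (in Perl, the expression would need int() around the quotient, since / is floating-point there).

```shell
# Least number of nodes needed to hold all tasks,
# i.e. ceil(total_tasks / max_tasks_per_node).
# Usage: ll_node_count <total_tasks> [max_tasks_per_node]
ll_node_count() {
    total_tasks=$1
    # Default of 16 matches the LL_MAX_TASKS_PER_NODE value quoted above.
    max_per_node=${2:-16}
    # Shell arithmetic truncates, so this is the integer form of the formula.
    echo $(( (total_tasks - 1) / max_per_node + 1 ))
}
```

With the default of 16 tasks per node, 16 tasks fit on one node, while 17 tasks need two.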

Receiving Mail

As LoadLeveler may be sending email to the user submitting the job (which may be a virtual account, but never mind), I have enabled receiving remote email on Ng2Hpc. Edit /etc/mail/sendmail.cf:

O DaemonPortOptions=Port=smtp,Addr=0.0.0.0, Name=MTA

Installing MIP

The gateway has MIP installed, and feeds the GLUE information via MIP remote to Ng2, where the information gets published into MDS. Due to its complexity, installing MIP on Ng2HPC has been documented on a separate page.

TODO

  • check with someone if it is worth looking at the Axis complaint in globus/var/container-real.log
    2007-07-11 16:02:20,598 WARN  utils.JavaUtils [main,isAttachmentSupported:1218] Unable to find required classes (javax.activation.DataHandler and javax.mail.internet.MimeMultipart). Attachment support is disabled.
    • .... likely only Axis complaining, not needed.

Optionally:

  • disable GRAM+RFT service registration from SecureMDS
  • Get job uniq_id (for logging) from the GLOBUS_GRAM_JOB_HANDLE
  • Remove temporary submission log in loadleveler.pm even if submission fails

Done:

  • move loadleveler log file to /hpc/gridusers/var/
  • prevent LoadLeveler from deleting empty stdErr
  • setup audit (to send JobID-DNs to VPAC)
  • setup PBS log equivalent to be sent to VPAC (JobID CPU usage)
  • JobDN-subm is logged with Unix time (int); switch to readable date