I work in the High Performance Computing Center at the University of Southern California. Here are a few things of interest to the community...
xcat-dist-oss-1.2.0.tgz md5 signed md5 Manifest, xCAT OSS package
xcat-dist-oss-1.3.0.tgz md5 signed md5 Manifest, xCAT OSS package
xcat-ipmitool.tgz md5 signed md5 README.ipmitool, ipmitool support for xCAT 1.1.x (1.2.x already includes it)
perl-PBS, a Perl module for the PBS client libraries; it also includes a newer pbstop. This is still alpha code.
dumpmom, dumps some info from pbs_mom for scripting/debugging purposes. The configure script and makefile require TORQUE-2.1.0 or newer.
submit_p4shmem csh script, a PBS job script for the mpich p4shmem device; it tries to fix some lameness in mpirun.
torque-1.1.0p4-qstat-empty-headers.patch makes qstat print column headers even with empty output.
torque-1.2.0b0-dumpmom.patch adds the dumpmom command.
torque-1.2.0b0-down_on_error.patch marks nodes down if they have an ERROR message (see health check docs).
torque-1.2.0p1-momupdateinternval.patch is a trivial patch that increases the mom stat update interval. The non-configurable default is too low in my opinion.
torque-1.2.0p5-jobnanny.patch protects against jobs that are stuck in an exiting or preexiting state by adding a "job deletion nanny" that periodically retries killing jobs that have already been deleted (by qdel or your scheduler). This mechanism also purges jobs that don't exist on the mother superior node. The code is disabled unless JOB_DELETE_NANNY is defined at compile time.
torque-1.2.0p5-jobdepterm2.patch ensures that deleted or aborted jobs also remove any dependent jobs.
BSD process accounting on Linux is broken if you have UIDs and/or GIDs of 65536 (2^16) or higher. These patches fix the problem while maintaining backwards compatibility.
32bit-pacct-howto.txt read this before doing anything else
linux-acct-uid32.patch, fixes BSD process accounting in Linux for UIDs/GIDs of 2^16 or higher.
acct.h-uid32.patch, fix up /usr/include/sys/acct.h if you've patched the kernel with linux-acct-uid32.patch.
process accounting tools: acct-6.3.2-32bit.patch and psacct-6.3.2-9uid32.src.rpm, for psacct-6.3.2 on RedHat 7.2
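To see why the 2^16 limit bites, here is a minimal Python sketch. It assumes the unpatched accounting record keeps each UID/GID in an unsigned 16-bit field, so larger values wrap around. The helper name stored_id_16bit is made up for illustration and is not part of any of the patches above.

```python
# Illustration only: assumes the unpatched accounting record keeps each
# UID/GID in an unsigned 16-bit field (hypothetical helper, not kernel code).

def stored_id_16bit(real_id: int) -> int:
    """Return the value that survives when real_id is squeezed into 16 bits."""
    if real_id < 0:
        raise ValueError("IDs are non-negative")
    return real_id & 0xFFFF  # keep only the low 16 bits


if __name__ == "__main__":
    for uid in (1000, 65535, 65536, 70000):
        print(f"real uid {uid:>6} -> recorded as {stored_id_16bit(uid)}")
```

Under that assumption, UID 65536 would be recorded as 0 and UID 70000 as 4464, so accounting records would end up attributed to the wrong users.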
symlink_unbalanced_kunmap.diff, fixes a kernel oops when many nodes create the same symlink at the same time in an NFS mount using a Solaris server
big-ring-buffer.patch, if the top of "dmesg" is getting lost, use this patch
linux-2.4.20-ext3.patch, important ext3 fixes for 2.4.20
irqbalance-2.4.20-MRC.patch, IRQ load balancing performance enhancement
linux-2.4.20-VFS-lock.patch, filesystem locking within the VFS (mostly for LVM and ext3/quotas)
linux-2.4.20-mrc-base.patch, fix filesystem quotas for 32bit uids
linux-ipmi-2.4.20-v21.diff, OpenIPMI driver
preempt-kernel-rml-2.4.20-3.patch, improve system responsiveness with Robert Love's wonderful preemptible patch (it's no coincidence that his site looks just like mine; I stole his stylesheet!)
tg3-2.4.20.patch, version 1.4 of the tigon3 driver (use this instead of that nappy bcm5700 driver).
big-ring-buffer.patch, if the top of "dmesg" is getting lost, use this patch
preempt-kernel-rml-2.4.21-1.patch, improve system responsiveness with Robert Love's wonderful preemptible patch
Send questions to garrick@usc.edu
Consider everything on this page (unless noted as from another author) to be GPL'd
My GPG pubkey