Bulletin 116 - 2004 Jun 17
The sxcross command no longer supports the sx5 and sx6 options (see sxcross -h). The change was made because of problems with supporting multiple versions of the MPI libraries, and the sx5 and sx6 options are no longer appropriate now that the SX-5 (Super-UX 12.1) crosskit has gone. There are new options, default and latest, to select the default or latest installed compiler revisions. Note that there is now only one version of MPI. The default environment after loading sxcross no longer sets SX_BASE_MPI or a corresponding PATH element, as you should not need these set. You can force these to be set with "sxcross mpisx/inst". Currently, with this setting, you will not see the locally customised sxmpif90 and sxmpic++ commands, which honour F90_SITE_OPTIONS, SX_BASE_F90, etc.
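Build scripts that used to rely on SX_BASE_MPI being set may want to check for it before proceeding. The following is a minimal sketch for a POSIX shell; the function name report_sx_mpi_env is hypothetical and is not part of sxcross.

```shell
#!/bin/sh
# Hypothetical helper: report whether SX_BASE_MPI is set in the current shell.
# It only inspects the environment; it does not call sxcross.
report_sx_mpi_env() {
    if [ -n "${SX_BASE_MPI:-}" ]; then
        echo "SX_BASE_MPI is set to ${SX_BASE_MPI}"
    else
        echo "SX_BASE_MPI is not set"
    fi
}

report_sx_mpi_env
```

If the variable is reported as not set and your build needs it, the "sxcross mpisx/inst" setting described above is the documented way to force it.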
See the user guide for more information on sxcross.

On the SX-6/TX7 system, the temporary directories $LOCALDIR, $MMFSDIR and $TMPDIR are intended to be deleted automatically on logout (and at the end of batch jobs). This is not happening consistently for all sessions and jobs, in particular for tcsh batch jobs and some ksh jobs. If the temporary directories are not cleaned up, there is the potential for filling the filesystem, going over quota or running out of inodes. We are looking at a number of options to rectify the situation. The first option, which is currently in place, is to flush these directories at some time after job completion. In this case there is some delay before the directories and files are deleted. Please do not rely on this delay.

3. SX-5 files mirrored on cherax for CSIRO users
This weekend, all the CSIRO SX-5 user files on cherax in the location cherax:/cs/data/SX5userdata/group/csabc will be moved into user home directories under the directory ~abc123/SX-5.csabc, after which the files will be under each individual user's control. (The directory name is a change from that notified previously, so that we can easily cope with cases where a user had more than one old username.) If you do not want this action, please let us know urgently.

4. New batch system for cherax

The current batch system for jobs on cherax is licensed for only 8 processors. A new batch system, based on the torque distribution, has been installed on cherax for testing. If testing is satisfactory, we will move to the torque-based system as soon as possible. This will allow the Altix to be upgraded immediately to 16 processors and, when electrical work is completed and the system relocated within the machine room, to 64 processors, so that the system can cope better with the increasing load. Users on the Altix are urged to try the new batch system by entering the command pkgenv torque to set up the correct path for the torque 1.0.1p5 system. The job parameters are compatible with the PBS system currently in production on the Altix. There are some deficiencies which should be noted, and which we will work to improve.
A sample job for the batch system on cherax follows.

    #!/bin/sh
    #PBS -N myjob
    #PBS -l walltime=3:20:00
    #PBS -l ncpus=1
    #PBS -l mem=100mb
    #PBS -l pcput=3990
    #PBS -l cput=4000
    #PBS -j oe
    #
    cd $PBS_O_WORKDIR
    efc -V -tpp2 -i8 -r8 -O3 -Vaxlib -o myprog testit.f
    ./myprog

(Note that there should be no space between the '#' and the 'PBS'.)

Note that any computational work on cherax should be submitted via the batch system (and not just backgrounded from an interactive session). We are exploring mechanisms for limiting CPU usage in interactive sessions, and implementation will be announced in a future HPCbull.

5. GFS file systems, servers and integrity

On 26th May and 14th June, there were failovers of GFS file systems from one TX7 to the other. Under these circumstances, the integrity of the data being accessed by jobs could not be guaranteed, so jobs were terminated to ensure that incorrect results were not returned, and e-mails were sent to the users. NEC has a fix available for the integrity problem, which the HPCCC will install shortly.

Please note that the GFS file systems are based on NFS semantics, which are not fully POSIX compliant. For example, if one node in a multi-node system writes to a GFS file, there is no guarantee that another node subsequently reading the file will see the data written by the first node. You should therefore not use files accessed from multiple nodes for inter-process communication. With care, you can potentially use do_t7 or similar to do all the file accesses from one host. Please contact HPCCC for further guidance.

6. SX-6 batch job accounting

Gareth Williams has written a utility to summarise job accounting information for SX-6 batch jobs and interactive sessions. It is a ksh script called 'jobinfo' (in /usr/local/bin). Please try it out and send comments to him. The HPCCC may have the utility run for all users at the end of all batch jobs (and interactive sessions).
To get a usage message run:

    $ jobinfo help
    usage: jobinfo [help] [init [now]] [detail] [help] [off]
    "jobinfo" generates and filters process accounting info from acctcom
    for the current batch job or session (user/tty combination)
    init: set level and start time for subsequent jobinfo call in this session
    now: only show accounting for commands after now (only valid with init)
    detail: enable raw acctcom output for all processes considered
    all: include processes with low cpu (< 0.3 s)
    off: suppress jobinfo
    help: this message

Without the help or init options, jobinfo runs acctcom and summarises the output to stdout. When the detail, all or off options are given, they are saved to a file (in $TMPDIR) and reused for further runs of jobinfo (which have no options specified). Options may also be read from the environment variable JOBINFO (in particular 'NO'), but this cannot be set in a batch script, as the script runs in a sub-shell (not the login shell). It can be set in .login or .profile. It is intended that jobinfo be run on logout for all batch and interactive login sessions: the HPCCC is hoping to implement this soon.

7. e-mail from batch jobs

We have found that attempts to send output by e-mail from batch jobs on both the TX7s and the Altix sometimes fail, but that adding the -v flag (for verbose) to the mail command, or using the mailto program, works.

8. Network access restrictions, and working with multiple systems

We currently have routing limitations to and from the SX-6 cluster, including the TX7s, mawson and eccles. For now, you are best off transferring files from the TX7s over a fast network to a front-end such as cherax or gale, and then from there to your local machines. These restrictions may be lifted soon for file transfers from the TX7s, but not from the SX-6s.
We recommend you consider continuing to use the SX-6 cluster and front-ends in the following way (please substitute eccles for mawson and the name of your own front-end such as gale for cherax in the following):
With this strategy, since any file that is not purely transient gets transferred back to cherax, you can copy the file from cherax to local systems when you need to. (The above is mostly CSIRO specific, but Bureau users can translate appropriately, noting that cherax serves both as an HPCCC front-end and as the CSIRO datastore, whereas these functions are separated (gale and sam) for Bureau users.)

9. Weeding out small block i/o

Please accept a reminder to look at the i/o processing in all jobs destined for the SX-6/TX7 system. Recently, the performance of one program execution was improved by a large factor. The job included the setting F_SETBUF06=0, so that there was a zero-length buffer for the file on unit 6 (stdout), and every write involved the data being sent over NFS to the GFS discs. (This setting had been inserted for debugging purposes, but had not been removed.) This is extremely poor practice in the SX-6/TX7 environment. Please sweep all your scripts and codes and look for such settings - the immediate fix is to change the setting to some value larger than the expected file size.

Given the amount of memory on each node (64 Gbyte, or about 8 Gbyte per processor), think about using that memory in your applications to reduce i/o traffic. If data from a file is read more than once in a program, read it once into a large array and cut out the re-read. All i/o to the GFS file systems should have appropriate buffer sizes set, e.g. with commands like export F_SETBUF=4096 (the units are kbytes). In addition, any jobs using Fortran direct access i/o probably shouldn't, but if they do, they should set F_HSDIR, e.g. with export F_HSDIR=10 for Fortran unit 10. Any job using netCDF should use the recent library, which supports the variable NC_BLOCKSIZE, e.g. export NC_BLOCKSIZE=2097152 to set a buffer size of 2 Mbyte.

For further information, see the FORTRAN90/SX Programmer's Guide. See also the FAQs:
"How can I improve performance of file i/o in Fortran?"
"How can I speed up I/O to and from netCDF files (on SX-5 or Cray)?"

Per Nyberg, formerly with NEC and posted at the HPCCC, will be giving presentations at:

    CSIRO Atmospheric Research, Station Street, Aspendale Vic
    Monday 12th July from 14:00 in the Lecture Theatre

    HPCCC, 24th Floor, 150 Lonsdale Street, Melbourne
    Tuesday 13th July from 10:30 in the Meeting Room

Per is responsible for Cray's worldwide strategic planning, business development and marketing for Environmental Sciences. He will explore two issues: research in atmospheric science and the computational issues that will arise; and how Cray sees the computational needs of environmental science developing, and being met by Cray. Cray will also deliver a formal presentation and update on Cray in the Environmental Science sector. For the presentation at the HPCCC, please RSVP by 7th July to Ms Erika Stojanovic, 03 9669 8113, erika.stojanovic@csiro.au, as space will be limited.

11. Queue on a dedicated node for testing

The HPCCC has received a request for access to a node for dedicated testing of performance issues. The HPCCC will shortly create a new queue, bmtest, open to all Bureau users, which will be restricted to one node and have a run limit of one job at a time. No jobs from other queues will be run on that node. This will allow the node to be used for such testing but, so that the node is not wasted at other times, some users might like to investigate the use of that queue.
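As a footnote to item 9 (weeding out small block i/o), the sweep for zero-length buffer settings can be partly automated. The following is a minimal sketch, assuming a grep that supports the -r option (as GNU and BSD grep do); the function name find_zero_buffers is hypothetical. It lists lines that set F_SETBUF, or a unit-specific F_SETBUFnn, to zero in either sh/ksh (NAME=0) or csh (setenv NAME 0) style.

```shell
#!/bin/sh
# Hypothetical helper: list lines under a directory (default: current
# directory) that set F_SETBUF or F_SETBUFnn to zero.
# The pattern accepts "=" or spaces before the value and requires the 0
# to be followed by a non-digit or end of line, so values such as 4096
# or 04096 are not reported.
find_zero_buffers() {
    grep -rnE 'F_SETBUF[0-9]*[= ]+0([^0-9]|$)' "${1:-.}"
}
```

For example, find_zero_buffers $HOME/jobs prints each offending file name and line number. The function returns a non-zero status when nothing is found, and the results are only a starting point: settings made inside program source rather than in scripts will not be caught.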
© Copyright 2010, CSIRO Australia. Use of this web site and information available from it is subject to our Legal Notice and Disclaimer and Privacy Statement.