Bulletin 145 - 2005 Sep 02

  1. HPCCC SX-6 system upgrade to SUPER-UX R15.1
  2. HPCCC SX-6 environment for SUPER-UX R15.1
  3. HPCCC SX-6 NQS II batch parameter -l cpunum_job=n
  4. Use of memory
  5. HPCCC req system - change of appearance
  6. New HPCCC utilities for the SX-6s
  7. R utility on cherax


1. HPCCC SX-6 system upgrade to SUPER-UX R15.1

All SX-6 nodes except sx610-sx617 are now running under SUPER-UX Release 15.1.

The remaining nodes are due to be upgraded on Tue-Wed 13-14 September. On Wed 14 September there will be a one-hour reboot in the early morning to restore the system to a single cluster.

Where no nodes running SUPER-UX R13.1 remain for a queue, that queue has been redirected to the R15.1 nodes (e.g. the cs queue). All CSIRO queues are now directed to R15.1 nodes.

To allow queued jobs to be re-routed after the final changes, please submit jobs to routing queues such as sx, sxrt, sxmn, sxrtmn and their "3" variants, rather than to explicit execution queues such as bmml.

The original Bureau queues are currently directed to R13.1 nodes, and all "3" queues are directed to R15.1 nodes. After 14 September all Bureau queues will be directed to R15.1 nodes.

After that date, users should stop using the "3" queues, which will be removed on 28 September. (CSIRO users can stop using them now.)



2. HPCCC SX-6 environment for SUPER-UX R15.1

2.1 Background information

The HPCCC provides a compilation or cross-compilation environment for the SX-6s on the following platforms:

SX-6 SUPER-UX (28 nodes)
TX7 ia64 Linux (eccles, mawson, r2)
PA-RISC HP-UX (gale)
Altix ia64 Linux (cherax)
ia32 Linux (farrer)

(Each platform has certain unique features that we have tried to standardise.)

The environment includes at least the following components:

NEC Compilers:
Fortran90/SX    C++/SX  

NEC supplied and supported libraries and utilities:
MathKeisan      ASL/SX  Vampirtrace     Vampir
Psuite          MPI/SX  sxld            FSA

Open Source libraries and utilities, repackaged and supported by NEC:
netCDF          NCO     
udunits         NCAR Graphics   

CSIRO-only library:
NAG

Local utilities:
sxcross         sxenv           sxcoffinfo      qsub

The HPCCC has provided access to multiple sxf90 release levels, and various library versions (e.g. 32- and 64-bit versions). With the SUPER-UX upgrade, the environments must also support R13.1 and R15.1 system libraries to minimise risk to all users.

Providing all combinations of compilers and libraries is non-trivial. To make this as user-friendly as possible, a number of special utilities have been developed. Examples follow:

The sxcross and sxcross_upgrade commands set up the cross-environment for use. (Use sxcross -help for details.)

The sxenv command reports which versions are available.

The new sxcoffinfo command lists information about an object or executable file, for example the compiler and library versions used to build it. (Use sxcoffinfo -help for details.)

(Man pages for these commands are in preparation.)

The R15.1 cross-compilation environments use some conventions that differ from previous versions. These may affect user scripts and Makefiles, depending on the compiler level and library options needed.

2.2 Information for SUPER-UX Release 15.1

To set up your environment for the SUPER-UX Release 15.1, use the command

        sxcross_upgrade

followed by

        sxf90_new_site_options

This sets your environment to the default R15.1 environment, including the default R302 sxf90 compiler.

To use a different compiler version, follow these commands with

        sxcross sxf90/rev313    or
        sxcross latest

or similar.

To link with alternative library versions, you may need to use explicit paths in Makefiles and scripts. If you do this, keep track of where you have done so, since these paths may need updating when versions change in future. For example, to use netCDF 3.6.0p1 with -ew and -size_t64, specify

LIBS= -L/SX/local/netcdf/3.6.0p1/lib-ew-64 -lnetcdf
INCLUDES= -I/SX/local/netcdf/3.6.0p1/include-ew-64

Take care to link the library that matches your compile options, e.g. -ew. You should also specify the corresponding include directory; although the include files are normally identical across variants, this is not guaranteed.
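The directory names in the LIBS and INCLUDES lines above appear to follow a pattern: /SX/local/netcdf/<version>/lib-<ew|dw>-<32|64>, with a matching include-<ew|dw>-<32|64> directory. As a sketch, assuming that pattern holds for the version you need (check that the directories exist first), a small hypothetical shell function can build both options from one set of choices, so the library and include directories cannot drift apart:

```shell
#!/bin/sh
# netcdf_flags VERSION FLOAT WIDTH
# Hypothetical helper (not an HPCCC utility). Prints matching -I and -L
# options for a netCDF build, ASSUMING the layout seen in this bulletin:
#   /SX/local/netcdf/VERSION/include-FLOAT-WIDTH
#   /SX/local/netcdf/VERSION/lib-FLOAT-WIDTH
# FLOAT is ew or dw (matching the -ew/-dw compile option);
# WIDTH is 32 or 64 (matching -size_t32/-size_t64).
netcdf_flags() {
    version=$1
    float=$2
    width=$3
    case $float in
        ew|dw) ;;
        *) echo "netcdf_flags: FLOAT must be ew or dw" >&2; return 1 ;;
    esac
    case $width in
        32|64) ;;
        *) echo "netcdf_flags: WIDTH must be 32 or 64" >&2; return 1 ;;
    esac
    base=/SX/local/netcdf/$version
    # Both options are built from the same FLOAT/WIDTH pair, so they always match.
    echo "-I$base/include-$float-$width -L$base/lib-$float-$width"
}
```

For example, netcdf_flags 3.6.0p1 ew 64 prints the -I and -L options for the ew-64 build; the library itself (-lnetcdf) still has to be named when linking.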

2.3 Further information

For more information, including alternate options, see

http://www.hpccc.gov.au/hpccc/userguides/faq/ or http://intra.hpsc.csiro.au/userguides/faq/

  • look for SX-6 under the contents list, then "How do I cross-compile for SX during the transition to a new version of SUPER-UX?"

See also http://www.hpccc.gov.au/hpccc/userdocs/index_user.shtml for information on

  • sxenv - SX cross environment reporting utility user guide
  • NEC Super-UX and Cross Environment default software locations
  • Release details - Super-UX cross environment installed software

and the local user guide at

http://www.hpccc.gov.au/hpccc/userguides/sx/

2.4 cherax specifics

Some redundant links have been removed from /SX/usr/lib/.

Existing redundant links for netCDF and udunits will be removed from /SX/usr/include on Monday 12 September.

This may affect users who have been including these files without specifying any -I compile options.

The fix for such users is to use the new F90_SITE_OPTIONS environment variable, or simply the sxf90_new_site_options command. Alternatively, specify -I/SX/local/include for the default (-ew -size_t32) version, or -I with a specific directory, e.g.

        -I/SX/local/netcdf/inst/include    or
        -I/SX/local/netcdf/3.6.0p1-1/include-dw-32


3. HPCCC SX-6 NQS II batch parameter -l cpunum_job=n

HPCbull 144 highlighted that SUPER-UX R15.1 now uses the new NQS II parameter -l cpunum_job=n to control the number of CPUs assigned to a job. This limit is now enforced at run time - a job will not be given more than cpunum_job CPUs at any time.

Applications that put tasks into the background have slowed down when no value of -l cpunum_job=n was specified, because only one CPU at a time was then assigned to the job. MPI jobs must also have an appropriate value specified.

Please ensure all jobs include the -l cpunum_job=n parameter with an appropriate value of n. For example, if a job runs two 3-CPU tasks in the background while continuing execution in the main shell, the correct settings are

#PBS -l tasknum_prc=3
#PBS -l cpunum_prc=3
#PBS -l cpunum_job=7
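The arithmetic behind cpunum_job=7 above is 3 + 3 CPUs for the two background tasks plus 1 for the main shell that keeps running. A minimal sketch of that rule as a hypothetical shell function, assuming (as in this example) that the main shell needs exactly one CPU; for an MPI job, cpunum_job is simply the number of CPUs the job requires:

```shell
#!/bin/sh
# cpunum_job_for N1 [N2 ...]
# Hypothetical helper: suggest a -l cpunum_job value for a job that starts
# background tasks using N1, N2, ... CPUs and keeps working in the main shell.
# Assumption (from the bulletin's example): the main shell needs one CPU.
cpunum_job_for() {
    total=1                      # the main shell itself
    for n in "$@"; do
        total=$((total + n))     # CPUs for each background task
    done
    echo "$total"
}
```

Here cpunum_job_for 3 3 prints 7, matching the settings above.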

An MPI job requiring 4 CPUs should have

#PBS -l tasknum_prc=1
#PBS -l cpunum_prc=1
#PBS -l cpunum_job=4

Note that it is bad practice on the SX-6s to run several multi-CPU programs in one job when those programs require different numbers of CPUs: there is no way to specify a different CPU count for each program execution.

In the first example above, the Gang scheduler uses #PBS -l cpunum_prc=3 to set the number of CPUs assigned to a task at each time slice.

On nodes that are not Gang-scheduled, a job specifying #PBS -l cpunum_job=4 will be assigned four CPUs whether or not it actually needs or uses them.



4. Use of memory

The SX-6 nodes have plenty of memory.

Recently, a user was running an old program, written around 25 years ago when machines had much smaller memories. The model was performing poorly.

Analysis showed a great deal of I/O: one file of about 1.5 Mbyte was read about 900 million times.

After the program was changed to read this file and similar files just once, its execution time dropped from 42 hours to 7 hours.
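For scale, taking 1 Mbyte as 10^6 bytes, the repeated reads amounted to more than a petabyte of I/O, and the change gave a six-fold reduction in run time:

```latex
\[
1.5\times10^{6}\ \text{bytes} \times 9\times10^{8}\ \text{reads}
  \approx 1.35\times10^{15}\ \text{bytes} \approx 1.35\ \text{Pbyte},
\qquad
\frac{42\ \text{h}}{7\ \text{h}} = 6
\]
```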

Many old programs could benefit from such reworking: remove the old out-of-core capabilities and make good use of the large memory available.

Likewise, if your programs use local disc on a node, consider using the main-memory file system on the SX-6 nodes instead, via $MMFSDIR; there is about 6 Gbyte of it on each SX-6 node. However, using this area makes your jobs non-migratable, so the no-migrate flag (-J n) must be set.

Please ensure your jobs remove any files in $MMFSDIR before they finish - use traps to ensure this is done even if the job crashes.
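A minimal sketch of the trap approach, assuming a POSIX shell job script and that MMFSDIR is set on the SX-6 nodes (the function falls back to /tmp so it can be tried elsewhere). The helper name with_mmfs_scratch and the SCRATCH variable are hypothetical. It runs a command in a subshell with a private directory under $MMFSDIR and removes that directory when the subshell exits, either normally or because of HUP, INT or TERM. No trap can run if the job is killed with SIGKILL, so treat this as a safety net rather than a guarantee:

```shell
#!/bin/sh
# with_mmfs_scratch COMMAND [ARGS...]
# Hypothetical helper. Runs COMMAND with SCRATCH set to a private directory
# under $MMFSDIR (or /tmp if MMFSDIR is unset) and removes that directory
# when the command finishes or the job is interrupted.
# Assumes one call at a time per job script (the name uses the shell's PID).
with_mmfs_scratch() {
    (
        scratch="${MMFSDIR:-/tmp}/scratch.$$"
        mkdir -p "$scratch" || exit 1
        # Remove the scratch area whenever this subshell exits.
        trap 'rm -rf "$scratch"' EXIT
        # Turn HUP/INT/TERM into an ordinary exit so the EXIT trap still runs.
        trap 'exit 1' HUP INT TERM
        SCRATCH=$scratch "$@"
        status=$?
        exit "$status"      # exit explicitly, passing on COMMAND's status
    )
}
```

In a job submitted with the no-migrate flag (-J n), the model run would then be written as with_mmfs_scratch ./model, where ./model is a placeholder for a program that keeps its temporary files under $SCRATCH.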



5. HPCCC req system - change of appearance

We have changed the appearance of the req system: the new version is more compact and wraps long text lines.

See http://intra.hpsc.csiro.au/cgi-bin/wreq/req and http://intra.hpsc.csiro.au/cgi-bin/wreq/req?list-1

If you prefer a different font size, many browsers let you change it with Ctrl (or the Apple key on a Mac) together with + and -.



6. New HPCCC utilities for the SX-6s

The erstatj command and a variant, erstatjfixed, have been installed on all SX-6 nodes. erstatjfixed produces its lists in a one-line format.

On the TX7s and SX-6s, the command

        /usr/local/bin/job_diag [--version|--verbose|--help]

is being developed to provide comprehensive system diagnostics from within a job. It runs erstatjfixed, ersys, ps -ef and similar commands, and provides options for verbose output.

These commands can be useful for interrogating the system while a job is still running, for example to find out how long the job has been held when tracking down performance and scheduling issues.
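job_diag itself is still under development, so the following is only a sketch of the same idea, not its implementation: a hypothetical shell function that a job script could call to log a snapshot from erstatjfixed, ersys and ps -ef, skipping any command that is not installed on the current host. It assumes erstatjfixed and ersys give useful output when run with no arguments; check this locally:

```shell
#!/bin/sh
# diag_snapshot
# Hypothetical sketch (not the HPCCC job_diag script): print a labelled
# snapshot of system state from inside a running job.
# Assumption: erstatjfixed and ersys give useful output with no arguments.
diag_snapshot() {
    date
    for cmd in "erstatjfixed" "ersys" "ps -ef"; do
        set -- $cmd                      # split "ps -ef" into ps and -ef
        if command -v "$1" >/dev/null 2>&1; then
            echo "=== $cmd ==="
            "$@" || echo "($cmd exited with status $?)"
        else
            echo "=== $cmd: not found on $(uname -n) ==="
        fi
    done
}
```

Called from a job script as, say, diag_snapshot >> diag.log 2>&1 (the log name is illustrative), it records the state of the node at that point in the run and always returns success, so it will not stop a job script that uses set -e.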



7. R utility on cherax

R version 2.1.0 - a "free software environment for statistical computing and graphics" - has been installed on cherax.

See http://www.r-project.org/ for further information.

Use pkgenv R to access it.





BoM Solar Help:

CSIRO ASC Help:

For urgent help at all times:
  • CSIRO users 0428 108 333
  • Bureau out of hours emergencies are managed through internal policy
HPCCC WWW Site: http://www.hpccc.gov.au/
CSIRO External ASC Site: http://www.hpsc.csiro.au/
CSIRO ASC Users' Site: http://intra.hpsc.csiro.au/

Comments to:


© Copyright 2010, CSIRO Australia