Bulletin 141 - 2005 July 29

  1. HPCCC SX-6 SUPER-UX upgrade to R15.1
  2. SXF90 R313 Available for Evaluation
  3. Upgrade of SX MPI libraries
  4. cherax downtimes and upgrades - disk and Linux
  5. Strategies for those moving applications to multi-processing
  6. Changes to the SX-6 Enhanced Resource Scheduler (ERS)
  7. Latest Intel Fortran and C compilers (CSIRO)
  8. Cluster downtime notification (CSIRO)


1. HPCCC SX-6 SUPER-UX upgrade to R15.1

Information about the upgrade to SUPER-UX R15.1 is being made available at: http://www.hpccc.gov.au/hpccc/user_news_advice/ and http://www.hpccc.gov.au/hpccc/userdocs/index_user.shtml .

In particular, there is a list of upgrade pre-requisites:

  • re-configuration of NQSII, ERSII and resource groups to allow Bureau operational jobs to use nodes sx607 to sx624 - 3rd August
    • multi-CPU jobs will go to queue bmrtml instead of bmrtmn.

  • installing updated cross-environments on the front-ends - 3rd August
    SYSTEM CHANGE NOTICE 2005-A020

  • applying patches to NQSII/ERSII - 4th August onward
    SYSTEM CHANGE NOTICE 2005-A019
    • fixes problems, including qdel failures

  • providing updated versions of the MPI libraries, commands and daemons - 3rd August
  • updating C++ on the SX-6s and front-ends - 9th August
  • updating nco, udunits and netCDF libraries - 9th August

At http://www.hpccc.gov.au/hpccc/userdocs/SuperUX_Node_Upgrade_to_15-1.shtml there is information about the timetable for nodes to be moved from R13.1 to R15.1.

From Wed 10 August until the upgrade is complete, users can submit jobs to nodes running R13.1 by using the usual queues, and to nodes running R15.1 by the usual queue names with a '3' suffix: e.g. bm3.

NEC documentation for R15.1 will appear at
http://www.hpccc.gov.au/hpccc/userdocs/index_user.shtml .

All SUPER-UX R13.1 executables can be run under SUPER-UX R15.1, but note that a SUPER-UX R15.1 executable (compiled with the SUPER-UX R15.1 Crosskit) cannot be safely run on SUPER-UX R13.1.


2. SXF90 R313 Available for Evaluation

The R313 compiler is now available for evaluation as the latest compiler. Some limited performance issues have been reported, but neither HPCCC nor NEC will be able to investigate them until the SUPER-UX upgrade is complete. In the short term, R313 problem reports will only be acted upon if they involve functional failures, or performance issues that seriously affect production or research, and neither R285 (previous) nor R302 (default) is a viable alternative.

Note that R302 will remain the default compiler for some time to come.

Note that there are licence key issues preventing use on the TX7s and cherax - we are investigating.


3. Upgrade of SX MPI libraries

New versions of the NEC SX-6 MPI libraries, commands and daemons will be available on or before 3rd August.

To use these new versions, please relink your code with the new libraries by issuing the command:

 sxcross mpisx/latest

NEC has now confirmed that executables compiled with SUPER-UX R13.1 and MPI/SX Library 6.7.12 are safe to run on SUPER-UX R15.1 (with MPI/SX Library Version 6.7.25 and command/daemon 7.0.5/7.0.15). This corrects the advice given in HPCbull 140.5.

It is also possible to test the new MPI/SX libraries with SUPER-UX R13.1. However, a special procedure is needed to use the matching MPI command and daemon - if you need to use this test facility, please contact HPCCC staff.


4. cherax downtimes and upgrades - disk and Linux

cherax will be down on Saturday 13th August, and possibly for part of Sunday 14th August.

New disks will be added to the /cs/datastore and /work file systems and configured for higher performance. /cs/datastore will be dumped and reloaded, which will take several hours.

Please note: all files and directories on the temporary /work file system will be lost.

The operating system will be upgraded from Red Hat Linux to SUSE Linux Enterprise Server 9.2 with SGI Propack 4 at the same time. New versions of DMF and the tape management software will be installed.

We expect better scalability and performance from the new operating system. In particular, file system scans are expected to be much faster (these often trigger slowdowns).

Most of the changes are at the system level - we are scanning the documentation for user-level issues.

Information about the upgrades can be gleaned from our WWW pages at
http://intra.hpsc.csiro.au/user/userdocs/ax/
under the section "Upgrade to SUSE Linux Enterprise Server 9.2 from Red Hat - August 2005" - there are pointers to guides provided by Novell and SGI.

More details will be provided later.


5. Strategies for those moving applications to multi-processing

The FAQ has been extensively updated to address two cases separately: developing code for multi-processing, and inheriting code that is already capable of using MPI or OpenMP.

See
http://www.hpccc.gov.au/hpccc/userguides/faq/parallel_choice.php
or
http://intra.hpsc.csiro.au/userguides/faq/parallel_choice.php


6. Changes to the SX-6 Enhanced Resource Scheduler (ERS)

Changes to ERS scheduling sometimes need to be made at short notice because of changed circumstances. For most tuning changes, there is no interruption to ERS functioning.

However, when the changes interact with NQS or other parts of the system, ERS needs to be stopped and started briefly - the interruption is less than 5 minutes.

The HPCCC would like to be able to do these changes at any time without issuing a change notice. Executing jobs continue unimpeded: ERS just ceases to manage jobs and display status during such interruptions.

If commands such as erstatj and ersys fail to respond for a short period (less than five minutes), we may be making such a change.
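Scripts that call erstatj or ersys may want to tolerate such a brief interruption rather than fail immediately. A minimal Python sketch of one approach follows. It assumes (unverified) that these commands either exit with a non-zero status or hang while ERS is stopped; the function name run_with_retry and its default timings are illustrative choices, not an HPCCC-provided tool.

```python
import subprocess
import time


def run_with_retry(cmd, max_wait=300.0, interval=30.0, per_call_timeout=60.0):
    """Run cmd, retrying until it succeeds or max_wait seconds have elapsed.

    Hypothetical helper: assumes a non-zero exit status, a hang, or a failure
    to start means "no response", e.g. while ERS is briefly stopped.
    Returns the CompletedProcess of the first successful run, or None.
    """
    deadline = time.monotonic() + max_wait
    while True:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=per_call_timeout)
            if result.returncode == 0:
                return result
        except (subprocess.TimeoutExpired, OSError):
            pass  # command hung or could not be started; treat as no response
        # Give up if the next attempt would start after the deadline.
        if time.monotonic() + interval > deadline:
            return None
        time.sleep(interval)
```

For example, run_with_retry(["erstatj"]) would keep trying for up to five minutes (matching the maximum interruption quoted above) before giving up and returning None.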


7. Latest Intel Fortran and C compilers (CSIRO)

The pkgenv utility on cherax and burnet now provides access to multiple versions of the icc and ifort compilers. Enter pkgenv without arguments to see what is available.

The latest Intel versions are now available - icc-8.1 and ifort-9.0.024.

Still to come are:

  • latest icc 9 on cherax
  • latest icc and ifort 9 on burnet (+ idb and other tool updates if they are new versions)
  • check release notes for 8.1 -> 9.0 runtime shared library compatibility
  • set INTEL_LICENSE_FILE appropriately on both cherax and burnet so the licenses can be pooled
  • make /ld.so.conf point to last 8.1 compiler shared libs or 9...
  • getting rid of some old versions

8. Cluster downtime notification (CSIRO)

A reminder that the CSIRO cluster, burnet, is scheduled to be down for maintenance and upgrades on the second Saturday of each month. The next downtime will be on 13th August.
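The "second Saturday of each month" rule can be checked with a short calculation. The Python sketch below uses a hypothetical helper, second_saturday, which finds the first Saturday of the month and adds seven days. For August 2005 it gives the 13th, matching the date above.

```python
import calendar
import datetime


def second_saturday(year: int, month: int) -> datetime.date:
    """Return the date of the second Saturday in the given month (hypothetical helper)."""
    first = datetime.date(year, month, 1)
    # weekday(): Monday=0 ... Saturday=5; days forward from the 1st to the first Saturday
    offset = (calendar.SATURDAY - first.weekday()) % 7
    # The second Saturday is one week after the first
    return first + datetime.timedelta(days=offset + 7)


print(second_saturday(2005, 8))  # 2005-08-13
```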



BoM Solar Help:

CSIRO ASC Help:

For urgent help at all times:
  • CSIRO users 0428 108 333
  • Bureau out of hours emergencies are managed through internal policy
HPCCC WWW Site: http://www.hpccc.gov.au/
CSIRO External ASC Site: http://www.hpsc.csiro.au/
CSIRO ASC Users' Site: http://intra.hpsc.csiro.au/

Comments to:


© Copyright 2010, CSIRO Australia
Use of this web site and information available from it is subject to our Legal Notice and Disclaimer and Privacy Statement