Bulletin 174 - 2007 November 30

  1. Invitation to Register
  2. Draft policy for the Bureau $WORKDIR flush areas on the SX-6/TX7 system
  3. New Web Page showing active SX-6 Queues and Nodes
  4. HPCCC Christmas Party
  5. CSIRO Advanced Scientific Computing Review
  6. Holiday shutdown period
  7. CSIRO cluster - burnet - shutdown
  8. CSIRO clusters - burnet and nelson - re-fresh
  9. cherax $DATADIR
  10. Cherax - tcl-nap v6.4.0

Note: "CSIRO" items may also apply to BoM users of cherax and burnet.


1. Invitation to Register

The Bureau of Meteorology issued an Invitation to Register for the "Supply, Installation and Maintenance of a Supercomputer System for the Australian Bureau of Meteorology" on 16th November. The Invitation to Register is available from the Australian Government AusTender site.




2. Draft policy for the Bureau $WORKDIR flush areas on the SX-6/TX7 system

Following recent incidents where flushable file systems on the SX-6/TX7 system have nearly filled, a revised policy for the management of Bureau $WORKDIR areas is proposed.

Currently, Bureau $WORKDIR files are retained for at least 30 days before flushing occurs.

The draft policy proposes that Bureau $WORKDIR files will be retained for a minimum of only 7 days.

(Refer to http://www.hpccc.gov.au/hpccc/system_stats/usage/SX-6/performance/disk_util/yearly_index.shtml for /bm/flush[1234] utilisation history.)

The key difference from the current policy for Bureau flush areas is that the minimum retention period will go down from 30 days to 7 days.

Please see reqs #8180 and #8677.

We would welcome feedback from users on the following.

Files will be flushed from a flushable file system when its usage reaches a threshold, H1. At this point, automatic flushing will be invoked, starting with the oldest files.

Flushing will stop when either:

  • a second threshold of usage, H2, is reached, or
  • all files older than T1 days have been removed (a file counts as older than T1 only if both its last-access and last-modification dates are at least T1 days old)

A report will be placed in the top-level directory of the file system and on the WWW pages.

If flushing stops for the second reason, operators will be notified that the flush has failed to clear sufficient space. Operators will be authorised, and given instructions, to start a second flush run that removes files from the oldest down to an age of T2 days, or until usage falls to the H2 threshold. During normal business hours, operators will contact HPCCC systems staff before starting this run; outside normal business hours, they will notify staff by e-mail.

If the second flush run fails to bring usage down to H2, operators should call out HPCCC staff to investigate. HPCCC staff will then move or delete files manually to reduce usage, typically by identifying recent, large, anomalous usage.

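Typical values for /bm/flush* will be:

H1 = 95% (usage at which automatic flushing starts)
H2 = 80% (usage at which flushing stops)
T1 = 30 days (minimum file age for the automatic flush)
T2 = 7 days (minimum file age for the operators' second flush run)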
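The draft procedure can be modelled as a short Python simulation. This is a minimal sketch under stated assumptions, not the HPCCC flushing tool: the FileEntry record, the flush_run and apply_policy functions and the in-memory file list are all hypothetical, usage is expressed as a fraction of capacity, and the operators' second run is modelled as a second call. A file's effective age is taken as the younger of its access and modify ages, so it becomes eligible only when both dates are old enough.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical record for one file on a flushable file system."""
    name: str
    size_gb: float
    atime_age_days: float  # days since last access
    mtime_age_days: float  # days since last modification

    @property
    def age_days(self) -> float:
        # A file counts as old only if BOTH dates are old,
        # so its effective age is the younger of the two.
        return min(self.atime_age_days, self.mtime_age_days)


def flush_run(files, capacity_gb, h2, min_age_days):
    """Delete oldest files first until usage <= h2, or until the oldest
    remaining file is younger than min_age_days.
    Returns (kept, removed, reached_h2)."""
    used = sum(f.size_gb for f in files)
    kept = sorted(files, key=lambda f: f.age_days, reverse=True)  # oldest first
    removed = []
    while kept and used / capacity_gb > h2 and kept[0].age_days >= min_age_days:
        victim = kept.pop(0)
        used -= victim.size_gb
        removed.append(victim)
    return kept, removed, used / capacity_gb <= h2


def apply_policy(files, capacity_gb, h1=0.95, h2=0.80, t1=30, t2=7):
    """Model of the draft policy; defaults are the typical /bm/flush* values."""
    if sum(f.size_gb for f in files) / capacity_gb < h1:
        return list(files), [], "no flush needed"
    # Automatic flush: oldest files first, stopping at H2 or at age T1.
    kept, removed, ok = flush_run(files, capacity_gb, h2, t1)
    if ok:
        return kept, removed, "automatic flush reached H2"
    # Operators' second flush run, down to age T2.
    kept, more, ok = flush_run(kept, capacity_gb, h2, t2)
    removed += more
    if ok:
        return kept, removed, "second flush reached H2"
    # Still above H2: HPCCC staff are called out to investigate.
    return kept, removed, "call out HPCCC staff"


# Usage: a 100 Gbyte file system at 96% usage.
files = [
    FileEntry("a", 10, 40, 45),
    FileEntry("b", 10, 35, 5),   # read 35 days ago, but modified 5 days ago
    FileEntry("c", 10, 20, 20),
    FileEntry("d", 66, 1, 1),
]
kept, removed, status = apply_policy(files, capacity_gb=100)
print([f.name for f in removed], status)  # ['a', 'c'] second flush reached H2
```

In this example the 30-day run removes only "a" and leaves usage at 86%, so the 7-day second run removes "c" to bring usage down to 76%. File "b" survives both runs because it was modified 5 days ago, even though it was last read 35 days ago.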




3. New Web Page showing active SX-6 Queues and Nodes

On the HPCCC Home Page, under "User News and Advice", select "SX-6 Queue Assignments" to see the active SX-6 queues, or go directly to http://www.hpccc.gov.au/hpccc/user_news_advice/queue_status/queue_status.shtml




4. HPCCC Christmas Party

Owing to unusual circumstances this year, the HPCCC Christmas Party will be a low-key informal event.

You are invited to join HPCCC staff in the Summit Cafe, 700 Collins Street, Docklands, from 4 pm on Tuesday 18th December to reflect on the year.

If you wish to attend, please advise Teresa Curcio (03 9669 8113, Teresa.Curcio@csiro.au).




5. CSIRO Advanced Scientific Computing Review

The CSIRO Executive Team has considered and endorsed the findings of the Advanced Scientific Computing Review.




6. Holiday shutdown period

The period from close of business on 24th December to the commencement of business on 2nd January will be a shutdown period for HPCCC staff.

Systems will be left running, but if there are problems, only essential and operational systems will be restored to service.

In addition, during periods of high temperatures, parts of some systems will be shut down, and may not be restored to service until 2nd January or later.




7. CSIRO cluster - burnet - shutdown

The RAID disc on the CSIRO cluster burnet has reported a fault, and a firmware upgrade is recommended.

The upgrade is scheduled for Saturday morning, 8th December.

File systems will be backed up before the downtime, to provide protection in case of major problems. To reduce the amount of data to be backed up, and so shorten the downtime, the $WORKDIR area on burnet will be flushed of all files older than 7 days.

Please check your holdings in burnet $WORKDIR carefully before 7th December.
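One way to check is to list everything in $WORKDIR that is older than 7 days. The bulletin does not say whether the burnet flush uses access or modification dates, so this hypothetical Python helper, files_at_risk, assumes the conservative case and lists a file if either date is more than 7 days old. It only reads file metadata and deletes nothing.

```python
import os
import stat
import time


def files_at_risk(root, days=7, now=None):
    """List regular files under `root` that a flush of files older than
    `days` days might remove. Conservative assumption: a file is listed if
    EITHER its access or its modification time is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    at_risk = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # vanished or unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            # The older of the two timestamps decides (either one is old).
            if min(st.st_atime, st.st_mtime) < cutoff:
                at_risk.append((path, st.st_size))
    return sorted(at_risk)
```

For example, files_at_risk(os.environ["WORKDIR"]) returns (path, size in bytes) pairs sorted by path, which can be reviewed and copied elsewhere before the flush.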

Please watch the traffic lights or incident reports for further information.




8. CSIRO clusters - burnet and nelson - re-fresh

Approval has been given to upgrade the infrastructure for the CSIRO IBM e1350 clusters. The upgrade will provide more robust head and management nodes, allow the partial merging of burnet and nelson, and provide better storage facilities.

In order to make the changes heat-neutral, some older nodes will be switched off.




9. cherax $DATADIR

A new storage area is available on cherax, accessed through the environment variable $DATADIR. The area is on low-performance disc and has 2.4 Tbyte allocated.

Like the $DATADIR areas on other systems, the area will provide storage with the following management arrangements:

  • quotas - initially 200 Gbyte per user. If space becomes short, quotas will be lowered, and users with large amounts of idle data will be notified and asked to remove data.

  • NO BACKUP - three of the 30 disc drives in the new disc array have already failed; no data was lost, thanks to RAID. Nothing important should be stored solely in this disc area.

  • NO migration

  • NO flushing

We suggest using the area as a working area for data that is also stored elsewhere.




10. Cherax - tcl-nap v6.4.0

tcl-nap v6.4.0 (http://tcl-nap.sourceforge.net/) has been released by Harvey Davies (CMAR) and is now available on cherax. Support for CAPS, netCDF, HDF and PROJ4 is included.

Until now, Harvey has kindly provided access to his own installation of tcl-nap on cherax. We suggest that tcl-nap users move to the HPSC installation, which allows us to better manage the software and provide ongoing support.

To use the tcl-nap software, first load the module:

% module load tcl-nap/6.4.0

then start one of the following:

% tclsh
% wish
% tkcon

Usage instructions are available from the APAC software map: http://nf.apac.edu.au/facilities/software/software.php?software=TCL-NAP&site=CSIRO&from_site=CSIRO





BoM Solar Help:

CSIRO ASC Help:

For urgent help at all times:
  • CSIRO users 0428 108 333
  • Bureau out of hours emergencies are managed through internal policy
HPCCC WWW Site: http://www.hpccc.gov.au/
CSIRO External ASC Site: http://www.hpsc.csiro.au/
CSIRO ASC Users' Site: http://intra.hpsc.csiro.au/

Comments to:


© Copyright 2010, CSIRO Australia