Bulletin 124 - 2004 Aug 26
There will be an outage (to be confirmed) on the Grangenet network in Melbourne on Saturday 28th August. The proposed outage window is 8 am to midday (with the actual outage being most of the window). Access to the following sites will be affected:
In addition, all CSIRO traffic between 700 Collins St and the University of Melbourne will be affected. This is currently the only CSIRO network link from 700 Collins Street, so CSIRO users will not be able to access the HPCCC systems during this outage.

2. HPCCC Users' Liaison Meetings

The HPCCC has re-constituted its meetings with the transition to the new Memorandum of Understanding between the Australian Bureau of Meteorology and CSIRO. Replacing the former Operations Committee will be a Users' Liaison Meeting, to be held on the first Thursday of each month at 14:00, at a location to be advised. A representative from any of the major user groups is welcome to attend. The goals of the meeting are to deal with major issues concerning use of the HPCCC systems, and to seek guidance from the user community about plans for upgrades and enhancements.

Please contact Rob Bell, 03 9669 8102, Robert.Bell@csiro.au for further information, or if your group wishes to be represented. The meeting on 2nd September will be held on level 11, 700 Collins Street, Docklands - please meet at the HPCCC area.

3. HPCCC SX-6/TX7 upgrade

In September 2004, the HPCCC will be taking delivery of additional equipment to upgrade the SX-6/TX7 system. Ten additional SX-6 nodes will be installed, along with about 9 Tbyte of additional disc, and 8 extra processors for the TX7s. NEC and the HPCCC are preparing plans for the installation and upgrade, with the aim of minimising disruption. At least one outage will be needed to connect the new nodes to the Inter-node Crossbar Switch.

4. HPCCC SX-6/TX7 Corrections and upgrades

The HPCCC receives frequent updates to the SX-6/TX7 software, including patches for security issues. The HPCCC is currently using a rolling upgrade program, in which a few SX-6 nodes at a time are taken out of service, upgraded and then put back into service. Jobs (other than operational jobs) may be checkpointed during these times.
A new round is scheduled to commence on 8th September.

5. HPCCC SX-6 scheduling - testing job migration

The HPCCC plans to start testing ERS automatic job migration on the SX-6s, initially for the CSIRO jobs. This feature allows ERS to migrate jobs from over-committed nodes onto under-utilised nodes, to allow for better throughput and higher utilisation. Any job subject to migration must not have any dependencies on a local node, e.g. using a local disc or memory-based file system. We plan to introduce this feature from 8th September.

Would any CSIRO users whose jobs have local dependencies please add the no-hold flag to such jobs, viz:

-H n

(Later in the year, NEC will supply a no-migrate flag, which will be a more accurate specification.) Note that jobs with the no-hold option cannot be saved across shutdowns.

6. Altix Users' Guide

A new draft guide to the use of the Altix (cherax) is temporarily available at http://b6.hpsc.csiro.au/datastore/userdocs/. This Guide has notes on getting the most out of the migrating file system. We would welcome readers, and suggestions for improving the Guide. (Note that the format is not final - the content is the initial focus.)

7. SYSTEM CHANGE NOTICE 2004-H010 Loss of CCF Air Conditioning

SYSTEM CHANGE NOTICE 2004-H010 Loss of CCF Air Conditioning & SX-6 Operational Cluster

TARGET DATE OF CHANGE: 2004-09-01 08:30 AEST
Duration: 2 hours.
SYSTEMS AFFECTED: SX-6 Operational Cluster & eccles & mawson
SOFTWARE AFFECTED: Interactive access cannot be guaranteed.
IMPACT: Jobs will be held.
DETAILS: Due to the loss of air conditioning, the SX-6 nodes will be shut down. Kernel patches that require an SX-6 reboot will be applied. Patches to eccles & mawson will also be applied, so as to reduce the total number of service interruptions.
CONTACT: Ann Eblen
NOTICE APPROVAL: Rob Bell