Bulletin 149 - 2005 November 10
1. HPCCC documentation and the request system

The CSIRO Data Store Userguide at http://intra.hpsc.csiro.au/userguides/ds/ has been updated, with revisions to the backup and free-space management sections, some information about strategies for compression (don't!) and consolidation of files, and other minor changes.

There is an updated FAQ section about the HPCCC problem tracking system (req), which leads to a new Userguide on the req system. See http://www.hpccc.gov.au/hpccc/userguides/faq/ or http://intra.hpsc.csiro.au/userguides/faq/wreq.php .

It helps HPCCC staff greatly if users acknowledge that problems have been solved. To help reduce the backlog of requests, the HPCCC will consider resolving all problems in the category 'pending user response' that are more than a month old.

2. Totalview class - correction

Last issue's item about the Totalview class should have read:

The class presentation materials are available at http://www.hpccc.gov.au/hpccc/seminars/ . A PC-video of the recent Totalview class is also available. Contact Greg Roff on 03 9669 4822.

3. Use of Totalview

A Totalview debugging session requires an X window into an SX-6 node. Rather than users choosing an SX-6 node for an interactive login, there is a capability to have an X window displayed from a batch job. This allows the system to place the session on an appropriate node, and to do resource scheduling. Otherwise, Totalview sessions may have an adverse impact on other work. See the Local Userguide for the SX-6 Cluster at http://www.hpccc.gov.au/hpccc/userguides/sx/ or http://intra.hpsc.csiro.au/userguides/sx/ .

A brief overview follows.

To use this facility, log into a TX7 with a command like

    ssh -X tx7

Upon login, check that X windows is working by checking the DISPLAY variable

    echo $DISPLAY

and making sure it is something like tx701:21.0 . Then do a test X command, e.g.

    xclock &

and see that a clock-face appears.
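The DISPLAY check above can also be scripted. The following is only a sketch: the function name display_looks_forwarded is hypothetical and not part of any HPCCC tool, and the pattern assumes that a forwarded display always has the form host:display or host:display.screen, as in the tx701:21.0 example.

```shell
# Hypothetical helper (not an HPCCC-supplied command): succeeds when its
# argument has the form host:display or host:display.screen,
# e.g. tx701:21.0. An empty value or a bare local display such as :0
# is rejected, because the host part must be non-empty.
display_looks_forwarded() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9._-]+:[0-9]+(\.[0-9]+)?$'
}

# Report on the current session without aborting the login shell.
if display_looks_forwarded "${DISPLAY:-}"; then
    echo "DISPLAY=$DISPLAY looks usable; try 'xclock &' next."
else
    echo "DISPLAY is unset or unexpected; log in again with 'ssh -X tx7'." >&2
fi
```

A matching DISPLAY value only suggests that forwarding is configured; the xclock test remains the real confirmation that windows can be drawn.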
Then you can submit a batch job with a command like

    xterm_batch -l cpunum_prc=1 -l cpunum_job=1 \
                -l cputim_prc=70 -l cputim_job=80 \
                -l memsz_prc=100MB -l memsz_job=100MB

An X window should appear soon, and can be used to run a program under Totalview.

4. Altix (cherax) system upgrade

The planned upgrade of the CSIRO Altix (cherax) and the CSIRO Data Store on Monday 31st October did not succeed. The cause of the configuration problem has been identified, and a new configuration has been tested. There will be an outage on Saturday 12th November to upgrade from 64 to 128 processors. This outage (09:00-12:00) will be substantially shorter than that of the previous attempt.

5. CSIRO Linux Cluster update

There was a complete failure of the main network switch in burnet last week. There will be an outage on burnet on Friday 11th November from 09:00 to 10:00 to replace the main switch. The head node will also be rebooted to allow a second system image to be created. Running jobs that require network connectivity are likely to fail during the switch replacement, and may need to be restarted. Other jobs may survive. The batch queues will be stopped in the time leading up to the outage.

6. Impending retirement - Len Makin

Len Makin, who has been working for the CSIRO Supercomputing Support Group, the HPCCC, and CSIRO HPSC since 1991, is retiring at the end of this year. A farewell function will be held at 700 Collins St Docklands on Friday 9th December from 3 pm, to which users are invited. If you wish to attend, please RSVP to Robert Bell on 03 9669 8102, robert.bell@csiro.au by 5th December.
© Copyright 2010, CSIRO Australia. Use of this web site and information available from it is subject to our Legal Notice and Disclaimer and Privacy Statement.