Bulletin 165 - 2007 January 10
Note: "CSIRO" items can apply to BoM users of cherax and burnet.

1. cherax outages and upgrade: starting Monday 15th January

We are planning to upgrade cherax to SLES 10 together with SGI's Pro Pack 5. There are enhancements to the kernel which may bring better network performance. To do the upgrade, we wish to run the new operating system on the test partition of cherax (cherax-1), but the last attempted re-boot of the test partition also brought cherax down. There will therefore be a scheduled time on Monday 15th January from 08:00 to 11:00 when the system will be tested, and there may be outages. We plan to have SLES 10 available on cherax-1 from Monday 22nd January, with the upgrade of the main machine to follow on the weekends starting either 26th January or 2nd February. There are no known user impacts with this upgrade.

2. NQSII qsub wrapper to be updated: Wednesday 17th January

There have been several changes to the qsubnew version of the qsub wrapper recently, to improve error handling, and to support new queues. The production version of the qsub wrapper will be updated at 09:30 on Wednesday 17th January.

3. NQSII: qstat -Ps superseding qstato

With the initial release of NQSII on the SX-6/TX7 system, the standard qstat command had no provision for users to display other users' jobs, so a local utility qstato was provided. qstat now has the option -Ps, which allows such display. Users are requested to use the qstat -Ps form rather than qstato in future, as the former is more efficient. qstato will be removed as a command on Wednesday 28th February 2007.

4. NQSII jobs in EXT state

Over a long time, users have reported that jobs can spend a lot of time in the exit (EXT) state from the SX-6 systems. (See for example req #7724).
Recently, a user reported jobs spending over 10 minutes in this state (the jobs were returning a standard output file of around 10 Mbyte). Investigation by NEC led to a recommendation to change a system buffer size; a test indicated a speedup by a factor of 90 with a changed buffer size. Changed buffer sizes have been implemented for all the NQSII network queues to fix the problem.

5. Improvement in SX-6 throughput

At times, under a heavy load of very short jobs, the throughput on the SX-6 system would actually decrease because of the pressure to process arriving URGENT class jobs. See for example http://www.hpsc.csiro.au/users/bel107/pictures/SX-6/j2006-12-18.png which shows that from 09:00, the number of CPUs' worth of work being run decreased as the load increased. Changes have been made to the queues, and NMOC changed its job submission rate after that time. The impact of the large numbers of jobs has since lessened substantially: there was no peak in queued jobs at 21:00 on the above graph.

6. SX-6 Documentation

NEC Australia has recently provided additions to their local documentation offering, including a combined table of contents for all of the "New Format" SUPER-UX manuals, as well as a New Format Manuals Global Index with hyperlinks to the respective sections. These new features cover only the New Format manuals, which will henceforth be grouped together on http://www.hpccc.gov.au/hpccc/userdocs/index_user.shtml to highlight this. Please note that the referenced web page will have a new documentation ordering arrangement.

7. R and ferret updates on cherax and burnet

Recently, R 2.4.0 was installed on burnet and cherax. See http://www.r-project.org/ The default on burnet was changed to 2.3.1, and will change on both systems to 2.4.0 on Wednesday 31st January. Versions older than 2.3.1 will then be retired.
To set up your environment to use R 2.4.0, do:

    module load R/2.4.0

It would be useful to know the relative performance of these versions and platforms on user applications. Please give us some feedback when you can.

Recently, ferret 6.0 was installed on burnet and cherax. (see req #7727, and http://ferret.wrc.noaa.gov/Ferret/) The default on cherax is the 32-bit version 5.81, and the default on burnet was changed to 5.81. The default will change on both systems to 6.0 on Wednesday 31st January. Versions older than 5.81 will then be retired.

To set up your environment to use ferret 6.0, do:

    module load ferret/6.00

Note that the 6.0 version for cherax (ia64) is an unannounced version, so take extra care when using it. There is also a copy of the 32-bit version on cherax:

    module load ferret/6.00-ia32

Note also that the 5.81 version is actually a 32-bit version. It would be useful to know the relative performance of these versions and platforms on user applications. Please give us some feedback when you can.

8. Removing csxxx usernames on cherax

The removal of csxxx usernames on cherax which have corresponding NEXUS usernames is scheduled to be completed today.

9. cherax slowdowns

A further round of slowdowns has been observed over recent months on cherax. A change was made to the management of the buffer cache on 9th January, and immediate improvement was seen in several areas.

10. Traffic lights

The traffic light status display is a WWW page, visible from http://www.hpccc.gov.au/hpccc/system_status/ and http://intra.hpsc.csiro.au/
There have been many small changes over recent months, the most recent being the addition of information about the queue of work on SAM-FS, the Bureau's large-scale data store. Many of the other services now have some indication of the load on the services. If there are problems with any of the services, shown by a red traffic light, then please contact the Help Desk as indicated. The traffic light display will sometimes also show advance notice of downtimes, and can display incident reports for some systems.
© Copyright 2010, CSIRO Australia. Use of this web site and information available from it is subject to our Legal Notice and Disclaimer and Privacy Statement.