Bulletin 201 - 2010 July 12
1. Bureau Constellation operational

Following operational trials, the Bureau's Sun/Oracle Constellation system ('solar') was declared operational on 22nd June. ACCESS and Wavewatch system products are now being produced. Thanks were expressed to the large team of people who worked to achieve this milestone. For assistance, email
For information, see http://www.hpccc.gov.au/hpccc/userguides/solar/ and http://www.hpccc.gov.au/hpccc/userdocs/solar/ and, if you have access to the BoM network, http://wiki.bom.gov.au/foswiki/Main/HPCCC

2. SX-6/TX7 system closure

It is anticipated that the HPCCC NEC SX-6/TX7 system will be fully switched off by the end of July 2010. It now runs only operational jobs and operational support jobs; unused parts of the system have already been shut down. At the close of service, the req system, which has handled request tracking for this system and its predecessors, will be decommissioned. The general 'req' queue recently reached 11771 items.

Users requiring assistance with LSDSS, including sam and disks on flurry mounted under /g/ns, should now email

4. CSIRO ASC Microsoft Windows Server 2008 HPC Cluster

CSIRO ASC has 15 new compute nodes on a Microsoft Windows Server 2008 HPC Cluster. The cluster is operational and ready for jobs to be submitted. Combined, the compute nodes provide 15 x 2 x 6 = 180 CPU cores as seen in Windows. To submit jobs, point the Microsoft Windows HPC toolkit at dellcpu.csiro.au. User information about all the Windows clusters, and about obtaining the HPC toolkit, can be found at http://intra.hpsc.csiro.au/userguides/WinGPU/getting_started.shtml
If you have questions or need help or advice on getting started, please contact

The cluster is a Dell M1000e chassis with an M600 head/management node and M610 compute nodes. It runs Microsoft Windows Server 2008 HPC SP2. Each node has two Intel(R) Xeon(R) X5650 2.66GHz CPUs. It is a 64-bit system.

5. CSIRO ASC Data Store upgrades

CSIRO has taken delivery of an additional four T10000B tape drives, which are being connected one after another to improve DMF performance, in particular to reduce the wait time for file recalls and to improve throughput.

Power is now connected to the new IS4600 disc subsystem, and configuration work has started to bring on-line the large increase in disc storage that has been delivered. This process is likely to take a month or two. Specifically, the following increases in the capacity of the main user filesystems are likely:
/cs/datastore 6.2TiB --> 26.3TiB
/work 2.6TiB --> 6.2TiB
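The size of the change is easiest to see as growth factors. The following Python sketch is illustrative only: it assumes nothing beyond the before-and-after figures in the list above, and the dictionary and function names are hypothetical, not part of any CSIRO tool. It computes the growth for each filesystem and for the two combined:

```python
# Before/after capacities in TiB, copied from the list of likely increases.
# CAPACITY_TIB and growth_factor are hypothetical names for illustration.
CAPACITY_TIB = {
    "/cs/datastore": (6.2, 26.3),
    "/work": (2.6, 6.2),
}


def growth_factor(old: float, new: float) -> float:
    """Return how many times larger the new capacity is than the old one."""
    return new / old


# Per-filesystem growth.
for fs, (old, new) in CAPACITY_TIB.items():
    print(f"{fs}: {old} TiB -> {new} TiB ({growth_factor(old, new):.1f}x)")

# Combined growth across both main user filesystems.
total_old = sum(old for old, _ in CAPACITY_TIB.values())
total_new = sum(new for _, new in CAPACITY_TIB.values())
print(f"combined: {total_old:.1f} TiB -> {total_new:.1f} TiB "
      f"({growth_factor(total_old, total_new):.1f}x)")
```

Run as-is, it reports roughly a 4.2x increase for /cs/datastore, 2.4x for /work, and about 3.7x overall (8.8 TiB to 32.5 TiB).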
Other DMF-internal and user filesystems may be resized at a later date.

Of equal importance is the increase in speed that should result from the new hardware. This comes not just from the use of more modern disk drives, but also from the introduction of a solid-state disk (SSD), which will hold essential filesystem data structures such as inodes, and possibly directories, in "flash" memory. This will indirectly speed up users' disk activity by speeding up various system and DMF administrative tasks, thereby reducing their impact on user work. Using SSD for this purpose is innovative and will require some experimentation to determine the best way to implement it. Fortunately these experiments can be done without impact on cherax, but the final integration of the new disk into our configuration will nonetheless involve some down-time. Short interruptions to update system software and make minor but disruptive changes will take place from 16:00 on Wednesdays, but when the disk is placed into production all the files will need to be copied over, which is likely to take an entire weekend. Outage times will be advertised through the login message in /etc/motd.

6. CSIRO ASC Data Store recent load

Recently we have seen some high demands for file recalls, with 24 Tbyte in total being recalled on two successive days in early June. The 6.6 Tbyte home file system is clearly not adequate to cope with such high recall rates. We believe we should aim to accommodate the recalls made during business hours in the on-line disc space, without needing to write data to tape or remove dual-state files. Users should not run multiple streams of retrievals, to the detriment of themselves and other users. Until the new disc is in production, please limit your recalls to less than 1 Tbyte per day.

Here is a way to see how much you are recalling. Go to the directory where all the files are that you will be recalling.
If the entire directory is needed, run:

    dmget --list --recurse . > $WORKDIR/dmget.list

If only some files are needed, run something like:

    dmget --list --recurse files* > $WORKDIR/dmget.list

You will get a message like:

    You would be recalling 43 of the 99 files specified. 14 tape mounts may be required.

Then, in the same directory, run:

    dmdu -ch `cat $WORKDIR/dmget.list` | tail -1

and you will get a number like:

    7317.29 Mbyte

Then repeat the dmget, but remove --list and add -a. This will update the access time (use dmls -alu to see this) and start the dmget.

7. CSIRO ASC Data Store - dmput command

One of the commands available to users of cherax and the CSIRO ASC Data Store is dmput. It is roughly the reverse of the dmget command, and enables users to influence the migration of their files. Migration is normally a two-phase process:
These steps are performed automatically by DMF at least once each day, but the dmput command allows users to request that they be done sooner for selected files of their own, at a time of their choosing. See man dmput, and also the details on the dmput command in the local Userguide for the CSIRO Data Store at http://intra.hpsc.csiro.au/userguides/ds/

The principal reason for using the dmput command is to free up on-line disc space, and so make more space available for your own and others' work. Typically, if you have a long-running processing job that recalls many large files, it is good practice to follow the processing with a dmput command of the form:

    dmput -r file

where -r says to release the blocks. Note that users should normally issue dmput commands only for files that are dual-state. It is best not to issue dmput commands for new files, since this may initiate a tape write for only a small amount of data; it is better to leave this to the system tasks, which run every evening and whenever free space becomes low.

A problem arises when users share large files. Although other users may have permission to use shared files, they do not have permission to migrate them. We have therefore implemented a new feature in a local dmput wrapper that allows users to dmput other users' files, provided the initiating user has read access and belongs to the group of the file. This will allow users who make extensive use of other users' files to reduce their usage of on-line space, and reduce the need for the system to free disc space itself, which can be heavy-handed and can impact the work of the readers, the owner, and other users of the system.

MolSim Downunder is a week-long workshop providing a thorough grounding in molecular dynamics and related methods. The course will be taught by international speakers and local experts, integrating theoretical and practical sessions. Speakers include: Dr. Bernd Ensing, University of Amsterdam, The Netherlands; Prof. Luciano Colombo, University of Cagliari, Italy; Prof. Julian Gale, Curtin University; A/Prof. Mike Ford, UTS Sydney; A/Prof. Ben Corry, University of Western Australia; A/Prof. Nigel Marks, Curtin University; Dr. Paolo Raiteri, Curtin University.
Check the website for more information: http://molsim2010.ivec.org
Venue: Opening - ARRC Auditorium, 26 Dick Perry Ave, Kensington
Tutorials - Exhibition space, Dept. of Chemistry (B500), Curtin University
Date: Monday 26th to Friday 30th July
Cost: $150 for students and $250 for postdocs. Travel and registration grants may become available at a later stage and will be advertised on the web page as soon as they become available.
For registration and payment details visit http://molsim2010.ivec.org or email

9. 9th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2011)

This symposium is in Perth from the 17th to 20th of January 2011: http://www.swinflow.org/confs/auspdc2011/cfp.htm
Three types of submission are accepted: full research papers, full industry experience papers, and extended abstracts. The due date is 16th August. The symposium covers a wide range of topics matching Advanced Scientific Computing needs, and we encourage you to make a submission.

10. CSIRO ASC - New and Upgraded Software

Recently, NCI issued a newsletter that provides useful information on a number of topics, with a particular focus on file systems and their management. Please see http://nf.nci.org.au/notices_news/news.php
Comments to:
© Copyright 2010, CSIRO Australia
Use of this web site and information available from it is subject to our Legal Notice and Disclaimer and Privacy Statement