Bulletin 189 - 2009 July 6
1. Bureau Supercomputer installation updates

Equipment for the Bureau's Sun Constellation system has been delivered to the Bureau's Central Computing Facility and has passed a first-stage acceptance test. Final installation is awaiting upgrades to the electrical and mechanical services. Work is underway to finalise the hardware and software installation and documentation in readiness for handover for user access. Key Bureau users have been accessing an Exemplar system for some months to port the Bureau's operational models and infrastructure, and key research users may request early access to the Exemplar system in mid-August.

The Bureau's research Linux cluster (FLURRY) is to be upgraded in the next month, replacing the current compute nodes with 24 energy-efficient nodes, each with two quad-core 2.53 GHz Intel Nehalem processors, 24 GB of DDR3 memory, and 40 Gbps QDR InfiniBand network interfaces. These nodes are similar to the compute nodes in the Bureau's and NCI's Sun Constellation systems.

Seminars for research users on the application development environment for the Sun Constellation system are being planned. The seminars are expected to start when the system is available for research access.

2. CSIRO ASC User Support Phone

We have established a new CSIRO ASC phone number for users requiring assistance with ASC shared systems, software and consulting.
External 03 8601-3800

Use of the HPCCC helpdesk number will be reviewed as part of planning the support arrangements for the Sun Constellation systems.

3. Porting data files from the NEC to the Sun

Please make sure your data is portable beyond the NEC *_before_* the NEC becomes unavailable! When migrating applications and data from the NEC to the Sun, don't forget that the data representation and file structure of the two machines are different. The Sun and Intel compilers have a number of options for reading and writing non-native files, but any files written with F_EXPRCW or F_PARTRCW will need to be converted *_before_* the SX-6 is decommissioned. Please contact hpchelp@csiro.au or hpchelp@hpccc.gov.au for assistance with porting code and data to the new systems.

Other significant differences are:
The latest Sun and Intel compiler manuals are available at http://developers.sun.com/sunstudio/documentation/product/compiler.jsp and http://software.intel.com/en-us/articles/intel-software-technical-documentation/ respectively.

4. CSIRO Data Store updates

On Friday 26th June, we ceased using the fast-access T9840 tape drives and media for the Data Migration Facility. The tape volumes have been disposed of in accordance with our risk assessment. On Monday 29th June, the CSIRO StorageTek Powderhorn tape library was decommissioned, dismantled and removed, bringing to an end CSIRO's use of this line of tape libraries, which began in 1993.

CSIRO is now using a 6500-slot SL8500 tape library, and has five T9940 tape drives (200 Gbyte uncompressed) and two T10000A tape drives (500 Gbyte uncompressed) in use for DMF and dumps. In addition, four T10000B tape drives (1 Tbyte uncompressed) are on-site, but are awaiting a cherax operating system upgrade before they can be brought into service (see next item).

We have nearly finished copying one copy of all data to the T10000A tapes, a process which started last July. At the end of this process, we expect better recall times for user requests, although this will be offset somewhat by the loss of the T9840 drives for smaller files.

5. cherax upgrade

An upgrade to the latest Service Pack for the operating system (SLES10SP2) and associated SGI software (in particular, DMF and the tape subsystem) is tentatively scheduled for 29th July, starting at 16:00. A reservation for this time has been placed in the batch system, and long-running jobs may be blocked until after this time.

6. CSIRO Data Store - dmget wrapper upgrade

We have brought into operation a new version of the dmget wrapper. The dmget wrapper was designed to stop users from flooding the DMF recall queue at the expense of other users.
The new version sorts the recalls into tape order and attempts to process all recalls from the same tape as a single batch, which makes recalls more efficient by minimising tape mounts. In addition, a new parameter, -l, is supported. If it is specified, the dmget wrapper will not initiate any dmgets, but will report the order in which the files would be recalled and the number of tapes required. This option is useful when the files being recalled are to be input to another process, e.g. scp to another host.

When run interactively, the new wrapper attempts to give an indication of how much work has been requested and how long it might take:

    cherax$ dmget *
    You are recalling 12 of the 19 files specified.
    The oldest currently queued recall request has been waiting for 0h 2m
    5 tape mounts may be required.

7. Software training courses

The following external courses may be of interest.
8. Over-writing files

There is a deficiency in the way the rcp and scp commands interact with both the SAM-FS and DMF Hierarchical Storage Management systems. When rcp or scp is used to over-write an existing file or files, it initiates a read of the target files. If the files are off-line, a recall is started for each file serially, which can delay completion of the command considerably and overload the systems.

There are several workarounds:
The third option is probably the best: rsync is as fast as or faster than rcp in many circumstances, and has the added advantage of not transferring files if the target directory already contains the source files. The -a option to rsync provides a convenient 'archive mode'. If you don't use -a, then as a minimum you should use -t (--times). If you don't, rsync tries to checksum files to see whether the copy should be skipped, and checksumming a directory of off-line files is very slow. If the underlying link does not support rsh or rcp, then add the option --rsh=ssh to the rsync command line.

HPCCC and ASC staff can assist further with tailoring rsync commands, which have been in use for over two years to support backups of CSIRO ASC systems into cherax.
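The rsync options discussed in this item can be combined as in the following minimal sketch, which only assembles and prints a command line rather than running it. The helper name build_rsync_command, the source directory results/ and the destination cherax:archive/results/ are hypothetical and chosen purely for illustration; only the options -a, -t and --rsh=ssh come from the text above.

```python
import shlex


def build_rsync_command(source, dest, archive=True, use_ssh=False):
    """Return an rsync argument list for copying source to dest.

    Hypothetical helper for illustration only; it does not run rsync.
    """
    cmd = ["rsync"]
    # -a is rsync's archive mode and already includes -t (--times).
    # Without -a, use at least -t so that timestamps are preserved.
    # Note that -t on its own does not recurse into directories.
    cmd.append("-a" if archive else "-t")
    if use_ssh:
        # Use ssh as the transport where rsh is not available.
        cmd.append("--rsh=ssh")
    cmd += [source, dest]
    return cmd


# Hypothetical source directory and destination on cherax.
cmd = build_rsync_command("results/", "cherax:archive/results/", use_ssh=True)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell once the paths have been replaced with real ones. Keeping the command as a list also allows it to be passed directly to subprocess.run if the transfer is to be started from Python.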
Comments to: © Copyright 2010, CSIRO Australia
Use of this web site and information available from it is subject to our Legal Notice and Disclaimer and Privacy Statement