Overview
SAM-FS software is both an archival and a migration system. It automatically manages the migration and retrieval of files between a number of levels of the storage hierarchy: from on-line disk cache to on-line tape media to off-line tape media. Different types of media can be supported, from CD-ROM to digital linear tape to standard IBM-type formats.
The SAM-FS system enables multiple copies of files to be automatically generated immediately after a file is created. Files can be left resident on discs for pre-determined times or can be immediately removed from the disc cache freeing up disc storage; subsequent reference to such a file involves the file being recovered automatically from secondary storage media. SAM-FS enables up to four copies of data to be made to different media in order to protect against media malfunction and enable copies to be made for off-site storage.
The SAM-FS system incorporates a hierarchical storage management component: whenever a disc cache area fills, file space is released to ensure that predefined levels of disc cache area are available.
The file format adopted by the SAM-FS system is standard tar format; provided the location of a file on a tape is known, any file can be recovered on a standard UNIX system. Transport of tapes to other UNIX systems is thus possible without the need for special recovery or conversion software. All tapes have standard ANSI labels.
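Because the archive format is standard tar, any tar reader can inspect an archive image once it has been read off tape. The following is a minimal sketch, assuming the tar image is already available as ordinary bytes (for example, copied from tape into a file); the function names are illustrative only and are not part of SAM-FS.

```python
import io
import tarfile
from typing import List


def list_members(tar_bytes: bytes) -> List[str]:
    """Return the names of all entries in an uncompressed tar image."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        return tar.getnames()


def read_member(tar_bytes: bytes, name: str) -> bytes:
    """Return the contents of one regular file stored in the tar image."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        member = tar.extractfile(name)  # None for directories and similar
        if member is None:
            raise ValueError(f"{name} is not a regular file")
        return member.read()
```

On a UNIX command line the same result is normally obtained with the standard tar command after positioning the tape.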
Hardware Configuration
The hardware configuration for the SAM-FS server is a Sun Fire 4800 Server. The detailed hardware is:
Control of the Silo robot is by another SUN system, a Sun Fire v240. The hardware is:
System Operation
The SAM-FS system is based on the concept of a disc cache area into which data can be stored. When data is placed in a disc area which is under the control of SAM-FS, SAM-FS will migrate/move the data from the disc to cartridge/tape media according to configuration parameters set by the system configurator. An index/catalogue of the location of the data on a tape is created by SAM-FS to give fast direct access to the data when recovery to disc is needed.
The following features can be set for data within a SAM-FS controlled disc area:
Typically, when a file is placed into a SAM-FS controlled disc cache, the SAM-FS archiver, when next invoked, checks all files in the disc cache filesystem to see whether a copy of each file should be made. (New copies of a file will be generated if the file is replaced; renaming does not create a new copy.) If so, a copy is made. Where multiple copies are required, they are generally made at different times after the file was created or modified, to minimise queueing for cartridge drives. On-line disc storage for a file is released if specified in the configuration information, or when disc cache usage rises above a pre-determined level.
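The release step can be pictured with a small sketch. It assumes a trigger level and a target level expressed as percentages of the cache, and assumes that the least recently accessed files are released first; these thresholds, the ordering, and all names below are illustrative assumptions, not the actual SAM-FS configuration on this system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedFile:
    name: str
    size_gb: float
    archived: bool       # True once at least one tape copy exists
    last_access: float   # seconds since the epoch


def files_to_release(files: List[CachedFile], capacity_gb: float,
                     high_pct: float = 80.0, low_pct: float = 60.0) -> List[str]:
    """Pick archived files to release, least recently accessed first."""
    used = sum(f.size_gb for f in files)
    if used * 100.0 / capacity_gb <= high_pct:
        return []  # usage is below the trigger level: nothing to release
    target = capacity_gb * low_pct / 100.0
    released = []
    # Only files that already have a tape copy are candidates for release.
    candidates = sorted((f for f in files if f.archived),
                        key=lambda f: f.last_access)
    for f in candidates:
        if used <= target:
            break
        released.append(f.name)
        used -= f.size_gb
    return released
```

A file that has not yet been archived is never chosen, which mirrors the rule that disc space is only freed once the data is safe on secondary media.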
Two copies of files will be generated in order to ensure data loss will not occur if there is a media failure.
Access to the SAM-FS disc cache areas is by either NFS or rcp. Either mechanism can be used to store or recover data to/from the SAM-FS system. The fastest method to store and recover data is via the rcp command.
When using rcp from the supercomputers TO sam, please use the following format for rcp:
The getepdata command has been extended to enable users to continue to use this feature for recovery of data from SAM-FS systems. It is preferred that users recover archived operational data via the getepdata command, as this permits statistics gathering for planning purposes (see below for further information).
If a file exists only on tape and is accessed by a user, the file will be recovered from tape automatically. Recovering a 233Mb file which is offline (i.e. a tape is required to recover the file) using NFS across the HiPPI link takes 66 seconds. Retrieving the same file again (i.e. when it is resident in the SAM-FS disc cache) reduces this time to 44 seconds.
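The two timings can be turned into effective transfer rates. This is a small worked calculation, assuming that "Mb" in the figures above means megabytes; the variable names are illustrative.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Effective transfer rate for a file of size_mb moved in the given time."""
    return size_mb / seconds


offline = throughput_mb_per_s(233, 66)   # file must first be recalled from tape
resident = throughput_mb_per_s(233, 44)  # file already in the disc cache
staging_overhead_s = 66 - 44             # extra time attributable to the tape recall
```

Under that assumption the offline retrieval runs at roughly 3.5 MB/s and the resident retrieval at roughly 5.3 MB/s, so the tape recall accounts for about 22 seconds of the offline case.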
File System Configuration
The SAM-FS disc cache has been configured into operational and general user areas. Large disc cache areas have been provided for operational requirements, while smaller areas have been made available for user groups.
Operational File Systems
Large disc cache areas have been provided for operational needs so that frequently accessed data files can remain on disc and be immediately available for user access; no cartridge mount will be required to access such data. The operational filesystems and retention period for selected data appear below:
| Operational Area | File System Name | Size |
|---|---|---|
| NMC Forecasts/Analyses (files generated after Jul 30, 2001) | /samnmc_reg | 130GB |
| | /samnmc_trop | 65GB |
| | /samnmc_med | 65GB |
| | /samnmc_lo | 32GB |
| | /samnmc_glob | 100GB |
| | /samnmc_expires | 130GB |
| NMC Forecasts/Analyses (files generated before Mon Jul 30, 2001) | /samnmc_h | 65GB |
| NMC RTDB Archived Data, CMSS Data | /samepms | 65GB |
| Satellite Information (files generated after Mon Jul 30, 2001) | /samsat | 100GB |
| Satellite Information (files generated before Mon Jul 30, 2001) | /samsat_h | 65GB |
| National Climate Centre | /samncc | 65GB |
Operational disc cache areas will be monitored to ensure data access is efficient and retrieval requests can be met quickly. Additional space will be added if required.
All data stored within the above disc areas, including NMC archived data, has two copies made: one for onsite storage and the second destined for offsite storage.
User Storage Areas
Where possible, user areas have been allocated on a group basis, such as the groups within BMRC. The disc areas allocated to user groups are given below. User groups within NMC and NCC share the operational filesystems for that area at present; if monitoring of usage indicates access conflicts, separate areas will be created for those user groups.
Two levels of retention of user data are proposed: a general area for data not required beyond 3 years, and a long term archive.
Two copies of data will be generated for all data.
User disc cache areas will be exported via NFS to nominated systems with read-only permission enabled. Users must ensure that these file systems are never included in search paths, as this degrades NFS performance for other users. Users may write small files using NFS; larger files should be transferred into the SAM-FS system using rcp.
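One way to check a search path for SAM-FS areas is sketched below. It assumes that every SAM-FS mount point starts with "/sam", as the filesystem names in this guide do; that prefix test is a heuristic and the function name is illustrative.

```python
import os
from typing import List, Optional


def sam_entries_in_path(path_value: Optional[str] = None,
                        prefix: str = "/sam") -> List[str]:
    """Return search-path entries that appear to lie on a SAM-FS filesystem."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Keep only entries whose directory name begins with the assumed prefix.
    return [entry for entry in path_value.split(os.pathsep)
            if entry.startswith(prefix)]
```

An empty result means no entry matched; any entry returned should be removed from the search path. Note that the prefix test would also match unrelated directories whose names happen to begin with "/sam".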
The filesystems currently available for general use by user groups are as follows:
| User Area | File System Name | Size |
|---|---|---|
| BMRC Group mrsr | /sammrsr | 130GB |
| BMRC Group mrme | /sammrme | 130GB |
| BMRC Group mrgh | /sammrgh | 130GB |
| BMRC Group mrlr | /sammrlr | 130GB |
| BMRC Group mrms | /sammrms | 130GB |
| BMRC Group mroc | /sammroc | 130GB |
| Experimental | /samex | 65GB |
| OEB | /samoeb | 65GB |
| MOS | /sammos | 32GB |
| CRC | /samcrc | 128GB |
Note that areas for other groups will be progressively made available.
BMRC Storage Areas
For user areas, within each main group directory, two top-level directories, ext and gen, will be created; the ext directory will contain data for long term storage, and the gen directory will contain files with a lifetime of up to three years. Within each of the "ext" and "gen" directories, a sub-directory should be created for each user. Users can store data within each area as appropriate. Users can create their own directories within the ext and gen directories by using rlogin.
For example, for /sammrsr the directory structure will look as follows:
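Following the description above, the layout for one user can be sketched as a pair of paths. The user name "jsmith" used in the usage note is hypothetical, and the helper names are illustrative only.

```python
from pathlib import PurePosixPath
from typing import Dict


def user_directories(group_fs: str, user: str) -> Dict[str, str]:
    """Return the per-user directories under the ext and gen areas."""
    root = PurePosixPath(group_fs)
    return {area: str(root / area / user) for area in ("ext", "gen")}


def area_for(retention_years: float) -> str:
    """Choose gen for data needed up to three years, otherwise ext."""
    return "gen" if retention_years <= 3 else "ext"
```

For a user jsmith in group mrsr this gives /sammrsr/ext/jsmith for long term storage and /sammrsr/gen/jsmith for data with a lifetime of up to three years.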
Groups may wish to set up additional functional directories in either of these areas to contain special source etc. If these files have special archival requirements, such as three copies or separate archiving, Operations should be contacted.
Users should note that the SAM-FS system is quite flexible; if any special sets of data need to be grouped and stored as an entity, Operations should be contacted to enable such collection of data.
Where data is likely to be stored for a specific function, such as a parallel trial, Operations should be contacted so that separate storage media can be allocated. At the end of the trial, users would then have the option of deleting the data from the SAM-FS system, or of having a listing of the tape contents created, the data deleted from within the SAM-FS system, and the tapes returned to the user. In the latter case, the data would still be accessible by use of standard UNIX commands.
Extended Climate/ Model Evaluation Experiments
In addition to the general user areas, a special disc cache area, /samex, has been created to enable archiving of climate experiments, model evaluation runs etc. Storage on media through this area will be set up so that each file, as written, is archived sequentially onto a particular cartridge. When model post-processing is underway, this will enable data to be streamed from a single tape; no unmounting or mounting should occur during the processing, so no time should be lost to tape changes.
Access to Archival Data in SAM-FS
Currently data is archived into the SAM-FS system for:
Access Method
Access to data stored in these areas can be handled via a number of different methods.
Where possible, users are encouraged to use the rcp_files_from_sam script; it retrieves large numbers of files using an efficient strategy. If rcp_files_from_sam is not available, rcp can be used. See the manual page on your local system for rcp usage, and use the --help option with rcp_files_from_sam for its usage information.
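For the plain rcp fallback, a retrieval command can be assembled as sketched below. This uses only the standard rcp form of a host:path source followed by a local destination; the file path in the usage note is hypothetical, and the options of rcp_files_from_sam are deliberately not shown because they should be taken from its --help output.

```python
from typing import List


def rcp_fetch_command(host: str, remote_path: str,
                      local_dest: str = ".") -> List[str]:
    """Build the argument list for copying one file from the SAM-FS server."""
    return ["rcp", f"{host}:{remote_path}", local_dest]
```

For example, rcp_fetch_command("sam2", "/sammrsr/gen/jsmith/run1.dat") yields the arguments for rcp sam2:/sammrsr/gen/jsmith/run1.dat . and the list can be passed to subprocess.run to execute the copy.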
The getepdata command will allow users to recover data from either sam2 via Ethernet, or sam2jf via the Jumbo Frame Gigabit ethernet link. An additional parameter - "h" - has been added to enable users to specify which host to request data from.
Users should check the man page for getepdata for usage.
Users who require the new version of getepdata to be installed should ask their System Administrator to contact SRSL. Operational file systems have been exported read-only to the major systems where data is regularly accessed, enabling users to check naming and whether data is available.
Additional Information about NMC Data
Users should check the NMC Archival Policy Document for availability of data.
The naming of the NMC archive has been altered in moving from the EPOCH storage system. The basic structure now adopted on the SAM-FS server is of the form:
where
Additional Information about Satellite Data
A full description of, and naming convention for, satellite archive data is available. Lists of data held are also provided. Satellite data archived before July 30th, 2001 can generally be found under /samsat_h.
Appendix
As of 30th July 2001, /samnmc was split up into a number of different filesystems. Users can still access data archived to sam2 via the old /samnmc directory, but the true locations of the data are given below.
Note: Data archived before 30th July 2001 will be found in /samnmc_h.
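Since the old /samnmc paths are kept working through symbolic links, the true location of a file can be found by resolving its path. The following is a minimal sketch using the standard library; the function name is illustrative, and it must be run on a system where the filesystems are mounted.

```python
import os


def true_location(path: str) -> str:
    """Resolve all symbolic links in path and return the canonical location."""
    return os.path.realpath(path)
```

Calling true_location on a path under /samnmc returns the corresponding path on one of the /samnmc_* filesystems, provided the link exists on the local system.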
/samnmc has a set of symbolic links which map the new /samnmc_* filesystems to:
Known Problems
Changelog
Here is a list of updates in this userguide for quick reference for users returning to this guide.
Last updated:
Wed Mar 29 10:27:08 AEST 2006