Bulletin 185 - 2009 January 22
1. Getting more out of the SX-6 system

Although the SX-6 system is fully allocated with jobs most of the time, the processor utilisation is not high, and there is scope for getting more out of the system, to get more throughput and better turnaround. HPCCC staff members are working on changes to the scheduling, but you can help with changes to your jobs, to improve the ability of the system to schedule work effectively. The following items deal with changes you can make to help, changes you will see, and changes being undertaken by HPCCC staff.

2. User Changes - SX-6 jobs

One of the big causes of churning on the system has recently been identified - jobs that make excessive requests for memory. Starting these jobs causes the scheduler to reserve space for jobs by holding other jobs, but in many cases this is in fact unnecessary, and causes nodes to be idle while checkpointing is in progress. We have seen an instance where this lasted 45 minutes.

Please note that the qsub parameter:
-l memsz_job=45G
requests 45 Gbyte on each node of a multi-node job, not 45 Gbyte in total. We have recent examples of jobs requesting 45 Gbyte, and using a maximum of less than 20 Gbyte on each of two nodes. (The terminology is confusing: NEC NQSII refers to each part of a multi-node request as a 'job', and the limit applies to each part, not the entire request. There is no way to request different memory sizes on different nodes of a multi-node request. If such an uneven distribution is desired, then it may be preferable to split the task into separate jobs.)

Would users please urgently review their jobs, to ensure that the memory limits requested are close to actual needs. The maximum memory usage of a job is shown in the last few lines of its output.

The Enhanced Resource Scheduler (ERS) does not cope well with a high input rate of jobs. We have found that the amount of work being run on the SX-6s decreases as the number of jobs being submitted increases! [Within its scheduling cycle, ERS works down the list of jobs, but if the queue is too long it does not finish considering them before it needs to return to the top of the queue to deal with URGENT (operational) jobs.] Combining many small jobs into longer-running jobs will help.

The more resources a job requests, the harder it is for the scheduler to find places to run it, and the worse the turnaround becomes (it's a bin-packing problem). For development work, you may find better turnaround by restricting jobs to a single node.

The ERS scheduler allocates resources, such as CPUs, to a job according to the job limits for the life of the job. Jobs that have widely varying resource requirements during their life can waste a lot of resources. For example, a job was recently found to be running on 3 SX-6 nodes, requesting 8 CPUs per node, but for most of the job only 9 CPUs were used, thus wasting 15 CPUs for most of the job. If possible, such jobs should be split into smaller pieces.

3. System changes, SX-6 scheduling

Changes made or under consideration:
Please consult http://www.hpccc.gov.au/hpccc/user_news_advice/queue_status/queue_status.shtml for queue and node assignments.

4. CSIRO ASC Cluster outage

One rack (of 3) of the CSIRO cluster burnet (and burnet-old) will be down on the morning of 28th January, to allow further re-configuration of the power feeds. Job reservations will be set on the nodes to be shut down during the outage, to prevent jobs from starting if their finishing time would otherwise span the outage. This work should allow all the chassis to be powered on, but the resiliency of dual power feeds to each chassis will not be restored.
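
Worked example for item 2: the short Python sketch below reproduces the arithmetic behind the two cases described there, namely a per-node memory limit multiplied across the nodes of a request, and CPUs allocated for the life of a job but left idle. The function names are hypothetical and exist only for this illustration. The figures are those quoted in this bulletin, and the sketch does not query NQSII or ERS in any way.

```python
# Illustrative arithmetic only; function names are hypothetical, and the
# figures are the ones quoted in item 2 of this bulletin.

def reserved_memory_gb(memsz_job_gb, nodes):
    """Total memory reserved: the memsz_job limit applies to each node."""
    return memsz_job_gb * nodes


def idle_cpus(nodes, cpus_per_node, cpus_used):
    """CPUs held by the job's limits but not actually in use."""
    return nodes * cpus_per_node - cpus_used


# Memory example: 45 Gbyte requested on each of two nodes,
# with less than 20 Gbyte used on each node.
reserved = reserved_memory_gb(45, 2)
used_upper_bound = 20 * 2
print(f"reserved {reserved} Gbyte, used under {used_upper_bound} Gbyte")

# CPU example: 3 nodes with 8 CPUs each, only 9 CPUs busy for most of the job.
print(f"idle CPUs: {idle_cpus(3, 8, 9)}")
```

In the memory case the request holds 90 Gbyte while under 40 Gbyte is used, so a limit close to the measured maximum would free more than half of the reservation for other work.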