Discussion:
"Dispatching rate" of jobs
chambon
2010-12-06 09:21:39 UTC
Hello,

Is there any way to control the "dispatching rate" (I mean the number of jobs dispatched per scheduling pass), mainly to avoid bottlenecks caused by jobs using storage services?
(I mean, to avoid too many jobs starting at the same time and all trying to access files.)

I am not talking about controlling the number of running jobs, which is, of course, possible via a complex.
I also know that users can "serialize" their jobs (via hold|release states or job dependencies).

Best regards
Bernard CHAMBON

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302387

To unsubscribe from this discussion, e-mail: [users-***@gridengine.sunsource.net].
andre_ismll
2010-12-06 11:04:32 UTC
Hi Bernhard,
Post by chambon
Is there any way to control the "dispatching rate" (I mean the number of jobs dispatched per scheduling pass), mainly to avoid bottlenecks caused by jobs using storage services?
you could possibly play around with the scheduler configuration, esp.
with "job_load_adjustments", "load_adjustment_decay_time", and possibly
the "load_formula" (depending on your scenario):

e.g., we set the following:
job_load_adjustments np_load_avg=2.00
load_adjustment_decay_time 0:3:00

resulting in slowly dispatching the jobs (each job increases the load
instantly by 2, which is then decayed within 3 mins.) This works for us,
as we only have jobs requiring a single CPU/slot.
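
To illustrate the effect of these two settings, here is a minimal Python
sketch. It assumes the correction decays linearly from its full value at
job start to zero after load_adjustment_decay_time, as sched_conf(5)
describes it (please check the man page of your version). The function
names and the example numbers are made up for illustration only.

```python
def load_adjustment(age_s, adjustment=2.0, decay_time_s=180.0):
    """Artificial load still attributed to a job dispatched age_s seconds ago.

    Assumption: linear decay from `adjustment` at age 0 down to 0 at
    decay_time_s (np_load_avg=2.00 and 0:3:00 in the settings above).
    """
    if age_s < 0 or age_s >= decay_time_s:
        return 0.0
    return adjustment * (1.0 - age_s / decay_time_s)


def adjusted_load(measured_load, job_ages_s, adjustment=2.0, decay_time_s=180.0):
    """Load the scheduler would assume for a host: measured load plus corrections."""
    return measured_load + sum(
        load_adjustment(age, adjustment, decay_time_s) for age in job_ages_s
    )


if __name__ == "__main__":
    # Hypothetical host: measured np_load_avg of 0.10, with two jobs
    # dispatched 0 s and 90 s ago.
    print(adjusted_load(0.10, [0, 90]))
```

A freshly dispatched job makes the host look heavily loaded, so the
scheduler prefers other hosts (or waits, if a load threshold is exceeded)
until the correction has decayed.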

You could check and tune the scheduling behaviour by submitting a bunch
of jobs (more than there are slots available) to an empty queue, each
generating 100% CPU load: e.g., if using Ganglia, you should see the
cluster load increase more slowly (jobs are dispatched more slowly) as
you increase load_adjustment_decay_time.

HTH (and I hope we did nothing silly in this case),
André
--
André Busche, MSc.
Information Systems and Machine Learning Lab (ISMLL),
University of Hildesheim,
Tel: +49 (5121) 883 765
http://www.ismll.uni-hildesheim.de/personen/busche.html

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302405

chambon
2010-12-06 15:02:51 UTC
Post by andre_ismll
Hi Bernhard,
...
Post by andre_ismll
you could possibly play around with the scheduler configuration, esp.
with "job_load_adjustments", "load_adjustment_decay_time", and possibly
Ok, I understand what you mean.

The drawback is that this rule would apply to ALL jobs (and not only to the jobs | users | groups | projects | complexes | etc. that use storage services).

Moreover, in my case I have already defined a load_formula that takes into account the CPU load, disk space and memory per machine
(with the help of load sensors).

Thank you very much for your answer.

Bernard CHAMBON

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302476

reuti
2010-12-06 15:27:17 UTC
Post by chambon
Post by andre_ismll
Hi Bernhard,
...
Post by andre_ismll
you could possibly play around with the scheduler configuration, esp.
with "job_load_adjustments", "load_adjustment_decay_time", and possibly
Ok, I understand what you mean.
The drawback is that this rule would apply to ALL jobs (and not only to the jobs | users | groups | projects | complexes | etc. that use storage services).
Moreover, in my case I have already defined a load_formula that takes into account the CPU load, disk space and memory per machine
(with the help of load sensors).
What you can do: submit all jobs with an operator or system hold. Then use a cron job which makes sure that one and only one job is pending without a hold.

The problem is, of course, that you will influence the scheduling, as you are doing it partially on your own. You could try to keep a limited number of jobs from each project/group/resource request eligible for scheduling.
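
The cron job could be sketched roughly like this in Python. The two
function names are hypothetical, and the lists of held and unheld
pending jobs are taken as inputs here: they would have to be collected
from qstat first (held pending jobs usually show up in state "hqw",
unheld ones in "qw"). Only the `qrls -h o <job_id>` call is a real
command; it removes an operator hold and needs operator privileges.

```python
import subprocess


def jobs_to_release(held_job_ids, pending_without_hold, target=1):
    """Pick the held jobs to release so that `target` jobs are pending without a hold."""
    missing = max(0, target - pending_without_hold)
    # Assumption: lower job IDs were submitted first, so release those first.
    return sorted(held_job_ids)[:missing]


def release(job_ids, run=subprocess.run):
    """Remove the operator hold from each job via `qrls -h o <job_id>`."""
    for job_id in job_ids:
        run(["qrls", "-h", "o", str(job_id)], check=True)
```

Called once per cron run, e.g. `release(jobs_to_release(held, n_unheld))`,
this keeps at most one job eligible for dispatch at any time. Raising
`target`, or calling it once per project or group, gives the "limited
number per project/group" variant.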

-- Reuti
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302482

kdoman
2010-12-06 21:00:58 UTC
This is pretty extreme, but it works:

- The queue is always in the disabled state
- A cron job checks for pending jobs every minute
- If there are pending jobs, enable ***@compute-node one at a time,
sleeping for some seconds before enabling another ***@compute-node
- Check the list of pending jobs while the loop above is running
- Exit from the loop when all pending jobs are cleared
- Disable the queue again.
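
The steps above can be sketched roughly as follows in Python. This is
only an illustration under assumptions: the function name, the
`has_pending` callback and the delay are hypothetical, and the caller
has to supply the list of queue instances and a way to detect pending
jobs (for instance by checking whether `qstat -s p` lists anything).
The `qmod -e` and `qmod -d` calls enable and disable a queue instance.

```python
import subprocess
import time


def staggered_enable(queue_instances, has_pending, delay_s=30,
                     run=subprocess.run, sleep=time.sleep):
    """Enable queue instances one at a time while jobs are pending,
    then disable every instance that was enabled."""
    enabled = []
    for qi in queue_instances:
        # Re-check the pending list before enabling the next instance.
        if not has_pending():
            break
        run(["qmod", "-e", qi], check=True)
        enabled.append(qi)
        sleep(delay_s)  # give the scheduler time to dispatch before the next one
    # Back to the default state: everything disabled again.
    for qi in enabled:
        run(["qmod", "-d", qi], check=True)
    return enabled
```

If jobs are still pending after all instances have been enabled once,
this sketch simply disables them again and leaves the rest to the next
cron run. Disabling a queue instance only stops new jobs from being
dispatched to it; jobs that are already running continue.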
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302568
