--- On Tue, 23/11/10, craffi <***@sonsorol.org> wrote:
From: craffi <***@sonsorol.org>
Subject: Re: [GE users] checking mount points or any other user defined attributes
To: ***@gridengine.sunsource.net
Date: Tuesday, 23 November, 2010, 5:30 PM
Missing mount points representing OS and cluster problems are usually
checked by non-SGE cluster tools although you could presumably write a
JSV or Prolog script that could check for these things.
Best implementation I saw was at a site where the admins had a script
that probed for every OS issue they had ever encountered in the past.
The script ran at node boot time and periodically afterwards. As soon as
any problem was detected, the node was put into the disabled state 'd' and
the admins were notified. The same script also put the node into the 'd'
state for the first 5 minutes after boot, so that there was time
for problems to show up and be detected before jobs started landing on it.
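
A rough sketch of what such a check could look like, in Python. The mount list and the function names are made up for illustration, and it assumes the exec host is known to SGE under the name that socket.gethostname() returns (which may not hold if SGE uses fully qualified names). It only covers the mount check, not the admin notification or the 5 minute hold after boot.

```python
import os
import socket
import subprocess

# Hypothetical list of mounts every node is expected to have.
REQUIRED_MOUNTS = ["/home", "/scratch"]


def missing_mounts(required, is_mount=os.path.ismount):
    """Return the required paths that are not currently mount points."""
    return [path for path in required if not is_mount(path)]


def disable_command(hostname):
    """Build the qmod call that disables all queue instances on a host."""
    return ["qmod", "-d", "*@" + hostname]


def check_node():
    """Disable this node in SGE if any required mount is missing."""
    missing = missing_mounts(REQUIRED_MOUNTS)
    if missing:
        print("missing mounts: " + ", ".join(missing))
        # Needs to run as a user allowed to change queue states.
        subprocess.call(disable_command(socket.gethostname()))
    return missing
```

check_node() would be called from cron or a boot script. Once the problem is fixed, "qmod -e" with the same queue instance pattern re-enables the node.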
If the mounts are supposed to be missing (perhaps because different
servers have different mounts configured by design) then you can attach
a Boolean true/false attribute to the exec hosts and users could submit
jobs like: "qsub -hard -l fastScratch=true ./myJob.sh" or whatever.
For serious and transparent use a JSV might work. The JSV can examine
the user job script and make changes on the fly such as redirecting to a
different queue or queue instance.
License-aware scheduling is another matter. Google "Olesen FlexLM" to
see how it's done with SGE. Basically the modern method involves
declaring requestable/consumable resources for each license entitlement
and making it dynamic via a script that polls the license server and
constantly adjusts the value of the resource. This method has superseded
the load-sensor method.
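
To give an idea of the polling approach, here is a much simplified Python sketch. It is not the Olesen script itself. It assumes lmstat prints lines of the form "Users of FEATURE:  (Total of N licenses issued;  Total of M licenses in use)" and that a consumable with the same name as the feature is already defined on the global host. The function names are invented for the example. A real implementation also has to account for licenses already held by running SGE jobs, which this sketch ignores.

```python
import re

# Assumed lmstat line format, for example:
# Users of matlab:  (Total of 10 licenses issued;  Total of 3 licenses in use)
LMSTAT_LINE = re.compile(
    r"Users of (\S+):\s+\(Total of (\d+) licenses? issued;"
    r"\s+Total of (\d+) licenses? in use\)"
)


def free_licenses(lmstat_output):
    """Map each feature name to the number of licenses not in use."""
    free = {}
    for match in LMSTAT_LINE.finditer(lmstat_output):
        feature = match.group(1)
        issued = int(match.group(2))
        in_use = int(match.group(3))
        free[feature] = max(issued - in_use, 0)
    return free


def qconf_update_command(resource, value):
    """Build the qconf call that sets a consumable on the global host."""
    return ["qconf", "-mattr", "exechost", "complex_values",
            "%s=%d" % (resource, value), "global"]
```

A cron job could feed the lmstat output to free_licenses() and run the command from qconf_update_command() for each feature it manages.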
Hi Craffi,
That's a lot of information, but I'm really not sure I'll be able to set it up like this, because we are currently using DRMAA for submitting array jobs. Our DRMAA code is in Python, and it does not use any -l flag at the moment.
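
For what it's worth, a resource request can probably be passed through DRMAA via the job template's nativeSpecification attribute, which carries extra qsub-style options. A minimal sketch, assuming the drmaa Python package and the fastScratch attribute from the example above (the helper names are made up):

```python
def native_spec(resources):
    """Turn {'fastScratch': 'true'} into '-hard -l fastScratch=true'."""
    requests = ",".join(
        "%s=%s" % (name, value) for name, value in sorted(resources.items())
    )
    return "-hard -l " + requests


def submit_array(command, first, last, resources):
    """Submit an array job whose tasks all carry the resource request."""
    import drmaa  # imported here so native_spec() works without DRMAA

    session = drmaa.Session()
    session.initialize()
    try:
        template = session.createJobTemplate()
        template.remoteCommand = command
        template.nativeSpecification = native_spec(resources)
        # Tasks first..last with a step of 1.
        job_ids = session.runBulkJobs(template, first, last, 1)
        session.deleteJobTemplate(template)
        return job_ids
    finally:
        session.exit()
```

Something like submit_array("./myJob.sh", 1, 10, {"fastScratch": "true"}) would then correspond to the qsub example with an added -t 1-10.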
Post by llikethat
Hi,
Is there an option by which SGE can check for the mount points, licenses
etc before starting a job on a node?
By doing this I want to stop SGE from dispatching jobs to nodes
that do not satisfy these checks.
Thanks,
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=297928
To unsubscribe from this discussion, e-mail: [users-***@gridengine.sunsource.net].
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298222