Discussion:
checking mount points or any other user defined attributes
llikethat
2010-11-23 11:23:37 UTC
Hi,
Is there an option by which SGE can check for mount points, licenses, etc. before starting a job on a node?
I want to prevent SGE from dispatching jobs to nodes that do not satisfy these requirements.
Thanks,

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=297917

To unsubscribe from this discussion, e-mail: [users-***@gridengine.sunsource.net].
reuti
2010-11-23 11:56:46 UTC
Post by llikethat
Is there an option by which SGE can check for mount points, licenses, etc. before starting a job on a node?
I want to prevent SGE from dispatching jobs to nodes that do not satisfy these requirements.
Please check out complexes: http://wikis.sun.com/display/gridengine62u5/Configuring+Resource+Attributes

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=297927

llikethat
2010-11-24 05:21:48 UTC
Post by reuti
Please check out complexes: http://wikis.sun.com/display/gridengine62u5/Configuring+Resource+Attributes
Hi Reuti,
Thank you for the reply. I read about configuring resource attributes, but I don't understand how this can be set up for mount points.


------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298219

reuti
2010-11-24 08:42:21 UTC
Post by llikethat
Hi Reuti,
Thank you for the reply. I read about configuring resource attributes, but I don't understand how this can be set up for mount points.
You define a complex as RESTRING and fill it with the mount points available on a machine in each exechost's specification:

$ qconf -me node01
...
complex_values mounts=:/nfs/fubar1:/nfs/fubar2:

and then you request e.g.

$ qsub -l mounts="*:/nfs/fubar2:*" ...

-- Reuti
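
A minimal sketch of how the colon-delimited value and the wildcard request fit together. The helper names mounts_value and request_matches are made up for illustration, the complex definition in the comment is only an assumed example of a `qconf -mc` entry, and the shell glob merely imitates the match; SGE does its own matching of the RESTRING request.

```shell
#!/bin/sh
# Assumed complex definition (added via `qconf -mc`), columns:
# name    shortcut  type      relop  requestable  consumable  default  urgency
# mounts  mounts    RESTRING  ==     YES          NO          NONE     0

# Build the colon-delimited value for complex_values from a list of mount points.
mounts_value() {
    value=":"
    for mp in "$@"; do
        value="${value}${mp}:"
    done
    printf '%s\n' "$value"
}

# Imitate the wildcard request with a shell glob (illustration only).
# $1 = value configured on the host, $2 = request pattern
request_matches() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

host_value=$(mounts_value /nfs/fubar1 /nfs/fubar2)
echo "complex_values mounts=${host_value}"

if request_matches "$host_value" '*:/nfs/fubar2:*'; then
    echo "request for /nfs/fubar2 matches"
fi
if ! request_matches "$host_value" '*:/nfs/fubar:*'; then
    echo "request for /nfs/fubar does not match"
fi
```

The enclosing colons are what keep a request for /nfs/fubar from matching a host that only has /nfs/fubar1 and /nfs/fubar2.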

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298259

llikethat
2010-11-24 11:15:17 UTC
Thank you very much for the reply. I will try this out and get back to you.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298291

llikethat
2010-11-24 11:27:17 UTC
Hi Reuti,
I have a question about setting up the complexes.
If I set the mount points in the complex and configure it in the node configuration, how does SGE interpret it? Will SGE check for the presence of these mount points before dispatching the job to the node?
-Bharani

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298293

reuti
2010-11-24 11:38:02 UTC
Post by llikethat
Hi Reuti,
I have a question about setting up the complexes.
If I set the mount points in the complex and configure it in the node configuration, how does SGE interpret it? Will SGE check for the presence of these mount points before dispatching the job to the node?
No, it's just a fixed string. SGE doesn't know what it means, and it doesn't need to. Normally I would assume that you don't change mount points twice an hour, so they are permanently bound to the machines. There is nothing for SGE to check.

You could nevertheless set up a load sensor that reports the string of mount points it finds, in a generic way for all machines. If it uses the format described above (each entry enclosed in colons, so that a request cannot match a substring of another mount point), the values can then be filled in automatically.

-- Reuti
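
A rough sketch of what such a load sensor could look like, not a tested production script. It assumes a Linux /proc/mounts table, a complex named "mounts" as in the earlier example, and the usual load sensor framing (a line on stdin triggers a report between "begin" and "end", and "quit" stops the sensor). The function names are made up, and whether execd accepts a reported value that itself contains colons should be verified on your installation.

```shell
#!/bin/sh
# Sketch of a load sensor that reports NFS mount points as ":/a:/b:".
# Assumptions: Linux /proc/mounts format, a complex named "mounts".

# Print the colon-delimited list of NFS mount points found in a mount table.
# $1 = mount table to read (defaults to /proc/mounts)
nfs_mounts_value() {
    table="${1:-/proc/mounts}"
    awk 'BEGIN { printf ":" }
         $3 == "nfs" || $3 == "nfs4" { printf "%s:", $2 }
         END { printf "\n" }' "$table"
}

# Print one load report: begin, host:complex:value, end.
# $1 = host name, $2 = mount table (optional)
report() {
    echo "begin"
    echo "$1:mounts:$(nfs_mounts_value "${2:-/proc/mounts}")"
    echo "end"
}

# Main loop: each line on stdin triggers a report; "quit" ends the sensor.
run_sensor() {
    host=$(hostname)
    while read -r input; do
        [ "$input" = "quit" ] && break
        report "$host"
    done
}

# When installed as a load sensor, the script would end with a call to:
# run_sensor
```

Calling `report node01` by hand on an exec host prints a single report, which is a quick way to check the string before registering the script as a load sensor.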

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298294

llikethat
2010-11-24 12:06:50 UTC
Hi,
Oh OK, now I understand it better. Instead of using a load sensor, what if I use a prolog that runs the mount commands to mount the NFS shares before the job starts? Will this work?
-Bharani

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298299

reuti
2010-11-25 09:42:53 UTC
<snip>
Post by llikethat
Oh OK, now I understand it better. Instead of using a load sensor, what if I use a prolog that runs the mount commands to mount the NFS shares before the job starts? Will this work?
You would have to specify for each job which particular mount points it needs. If I understand you correctly, you don't want to have all mount points mounted all the time.

A place for such information (which is unrelated to SGE itself) is the job context. This is so-called meta-information and is not used by SGE in any way, but you can set and access it yourself:

$ qsub -ac MOUNTS=/nfs/app1,/nfs/app2 myjob.sh

You can then read this information from the output of `qstat -j $JOB_ID`, in the line starting with "context:". It may be necessary to run the prolog and epilog as root, which can be achieved by prefixing the path to the script with the user name, as in root@/usr/sge/cluster/myprolog.sh

Pitfall: when more than one job runs on a node at a time, you may need to check whether any other job on that node is still using a mount point that you would like to unmount in the epilog. To also avoid a race condition, the clean solution in this case would be to disable the queue instance in the epilog, check for other jobs using the mount point, unmount it, and then enable the queue instance again.

-- Reuti
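
A minimal sketch of the parsing step such a prolog would need. It assumes that MOUNTS is the only context variable on the job, so that everything after "MOUNTS=" on the "context:" line is the comma-separated mount list. The function names are made up, and mount_requested additionally assumes root rights, the `mountpoint` utility, and /etc/fstab entries for the paths.

```shell
#!/bin/sh
# Sketch of a prolog helper. Assumption: MOUNTS is the only context
# variable, so the rest of the "context:" line is the mount list.

# Read `qstat -j` output on stdin and print the requested mount points,
# one per line.
context_mounts() {
    sed -n 's/^context:[[:space:]]*MOUNTS=//p' | tr ',' '\n'
}

# Mount each requested path unless it is already mounted.
# Needs root, an fstab entry per path, and JOB_ID set by SGE.
mount_requested() {
    qstat -j "$JOB_ID" | context_mounts | while read -r mp; do
        [ -n "$mp" ] || continue
        mountpoint -q "$mp" || mount "$mp"
    done
}

# Dry run on a canned "context:" line instead of a live job:
printf 'context:                    MOUNTS=/nfs/app1,/nfs/app2\n' | context_mounts
```

A real prolog would call mount_requested and exit non-zero if a mount fails; the matching epilog would still have to handle the multi-job pitfall described above.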

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298660

craffi
2010-11-23 12:00:09 UTC
Missing mount points representing OS and cluster problems are usually
checked by non-SGE cluster tools although you could presumably write a
JSV or Prolog script that could check for these things.

Best implementation I saw was at a site where the admins had a script
that probed for every OS issue they had ever encountered in the past.
The script ran at node boot time and periodically afterwards. As soon as
any problem was detected the node gets put into disabled state 'd' and
the admins get notified. The same script also puts the node into 'd'
state for the first 5 minutes after boot to make sure that there is time
for problems to show up and be detected before jobs start landing on it.

If the mounts are supposed to be missing (perhaps because different
servers have different mounts configured by design) then you can attach
a Boolean true/false attribute to the exec hosts and users could submit
jobs like: "qsub -hard -l fastScratch=true ./myJob.sh" or whatever.

For serious and transparent use a JSV might work. The JSV can examine
the user job script and make changes on the fly such as redirecting to a
different queue or queue instance.

License-aware scheduling is another matter. Google "Olesen FlexLM" to
see how it's done with SGE. Basically the modern method involves
declaring requestable/consumable resources for each license entitlement
and making it dynamic via a script that polls the license server and
constantly adjusts the value of the resource. This method has superseded
the load-sensor method.
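
As a rough illustration of the polling idea, not a reproduction of the Olesen scripts: the sketch below assumes lmstat-style "Users of ..." summary lines, a consumable complex named after the license feature, and that the free count is published on the global host with `qconf -mattr`. The function names and the "matlab" feature are made up, and the command is only printed, not executed.

```shell
#!/bin/sh
# Sketch: turn license server summary lines into resource updates.
# Assumptions: lmstat-style input, a consumable named after the feature.

# Read lmstat-style output on stdin and print "<feature> <free>" per feature.
free_licenses() {
    sed -n 's/^Users of \([^:]*\):.*Total of \([0-9][0-9]*\) license[s]* issued;.*Total of \([0-9][0-9]*\) license[s]* in use.*/\1 \2 \3/p' |
    while read -r feature issued used; do
        echo "$feature $((issued - used))"
    done
}

# Print the qconf command that would publish the free count globally.
# $1 = feature (complex name), $2 = free licenses
publish_cmd() {
    echo "qconf -mattr exechost complex_values $1=$2 global"
}

# Dry run on a canned summary line instead of a live license server:
printf 'Users of matlab:  (Total of 10 licenses issued;  Total of 3 licenses in use)\n' |
free_licenses | while read -r feature free; do
    publish_cmd "$feature" "$free"
done
```

Run periodically (for example from cron) with the echo replaced by the real command, this keeps the resource value in step with the license server, so licenses taken outside SGE are accounted for as well.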
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=297928

llikethat
2010-11-24 05:31:57 UTC
Hi Craffi,
That's a lot of information, but I'm really not sure I'll be able to set it up like this, because we currently use DRMAA to submit array jobs. Our DRMAA code is in Python, and it does not use any -l flag at the moment.
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298222

reuti
2010-11-24 08:36:20 UTC
Post by llikethat
Hi Craffi,
That's a lot of information, but I'm really not sure I'll be able to set it up like this, because we currently use DRMAA to submit array jobs. Our DRMAA code is in Python, and it does not use any -l flag at the moment.
You can pass an -l flag through the native specification in DRMAA.

-- Reuti
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=298257

fx
2010-11-28 15:55:57 UTC
Post by craffi
Best implementation I saw was at a site where the admins had a script
that probed for every OS issue they had ever encountered in the past.
The script ran at node boot time and periodically afterwards. As soon as
any problem was detected the node gets put into disabled state 'd' and
the admins get notified.
I'd have hoped that sort of thing was standard practice, for some value
of `every OS issue'. (I use Nagios.) You do need to judge whether it's
worth it for a particular failure mode, both in terms of resources to
write/organize a test, and the resources to run it, which might have a
significant effect on the compute nodes, or the head, if you're running
it there.

The SGE angle is that the job prolog/epilog are a convenient place to
make tests just at the time they particularly matter, without putting a
continual load on the node. You can either ensure the queue goes into
an error state, check that and rely on figuring out why, or use
something like NSCA under Nagios. To have Nagios disable queues, for
instance, you have to be careful either to run specific commands under
sudo or make sure nagios has appropriate SGE privileges, and it's not
necessarily easy to test that it all works.
--
Dave Love
Advanced Research Computing, Computing Services, University of Liverpool
AKA ***@gnu.org

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=299888
