mdsteeves
2010-12-13 20:27:05 UTC
We're running SGE 6.2u4 on RHEL5.4.
We've set up Olesen's FLEXlm integration so that users can run jobs on
the cluster that require FLEXlm licenses. We would also like to set up a
resource quota so that users launching jobs can't lock up all of the
licenses:
{
name moe_limit
description limit everyone to no more than 20 moe license
enabled TRUE
limit users {*} to moe=20
}
For some reason, though, we're running into a problem: jobs that use a
PE and also request certain resources with the "-l" switch get stuck in
the qw state, and the scheduling message references the resource quota:
scheduling info: queue instance "***@compute-1-25.local" dropped because it is disabled
                 queue instance "***@compute-0-11.local" dropped because it is disabled
                 queue instance "***@compute-1-26.local" dropped because it is full
                 cannot run in queue "himem.q" because it is not contained in its hard queue list (-q)
                 cannot run because it exceeds limit "steevmi1/////" in rule "moe_limit/1"
                 cannot run in PE "orte" because it only offers 0 slots
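The "scheduling info" block above is what qstat -j prints for a pending job. For anyone trying to reproduce this, here is a rough sketch of how the rule and its current usage could be inspected side by side. The function name show_quota_state and the JOB_ID placeholder are made up for illustration; only the rule name moe_limit comes from the configuration above.

```shell
#!/bin/bash
# Sketch only: JOB_ID is a placeholder for the ID of the job stuck in qw.
JOB_ID="${1:-}"

show_quota_state() {
    # Print the resource quota set as the qmaster currently holds it.
    qconf -srqs moe_limit

    # Show what is currently counted against resource quotas for this user.
    qquota -u "$USER"

    # Show the scheduler's reasons for not dispatching the job, if an ID was given.
    if [ -n "$JOB_ID" ]; then
        qstat -j "$JOB_ID"
    fi
}

# Only call the SGE client tools if they are actually on the PATH.
if command -v qconf >/dev/null 2>&1; then
    show_quota_state
else
    echo "SGE client commands not found in PATH"
fi
```

Comparing the qquota output with the limit named in the qstat -j message may show whether the quota is really exhausted or whether the rule is being applied in an unexpected way.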
For testing, I've been using the following script:
#!/bin/bash
#$ -S /bin/ksh
#$ -j y
#$ -cwd
#$ -q mpi.q
#$ -pe orte 8
#$ -N mdsTest
## The following all work:
## #$ -l h_cpu=1
## #$ -l mem_total=5G
## #$ -l arch=lx26-amd64
## #$ -l moe=1
## None of the following work; each causes the job to hang in the queue:
## #$ -l q=mpi.q
## #$ -l hostname="compute-0-2"
## #$ -l hostname="compute-0-78|compute-0-106|compute-0-69|compute-0-68|compute-0-100|compute-0-63|compute-0-93|compute-0-82|compute-0-76"
hostname
sleep 300
Even switching from "-q mpi.q" to "-masterq mpi.q" doesn't help any. If
we disable the resource quota rule, then the jobs run without any
problems. Is there something that we're missing?
-Mike
--
Michael Steeves (***@gmail.com)
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=305177
To unsubscribe from this discussion, e-mail: [users-***@gridengine.sunsource.net].