Discussion:
SGE 6.1 (6.1u3) sends double email notifications
adary
2010-12-01 15:21:56 UTC
To clarify this:

The job is submitted with the -m a and -M ***@email parameters.

When a job is killed with qdel, two emails are sent to the user instead of one.

The first email is the regular one:


Job 1016932 (vim) Aborted
Exit Status = 137
Signal = KILL
User = adary
Queue = ***@lnx4073.il.marvell.com
Host = lnx4073.il.marvell.com
Start Time = 12/01/2010 16:04:53
End Time = 12/01/2010 16:05:20
CPU = 00:00:00
Max vmem = 408.438M
failed assumedly after job because:
job 1016932.1 died through signal KILL (9)



The second mail looks like this:



Job 1079336 (sleep) was killed by ***@adary-lnx.il.marvell.com





I can't find a reason for this behavior, and users claim that they started getting the second mail only in the last few weeks (this grid has been in production for the last three years).



Does anyone have an idea how something like this can happen, and how to suppress the extra second mail?



Another related question: is there a way to get only one email when a job array is killed? Right now, even in the ideal case, I get a mail for every running task in the job array (and we have arrays with 500+ running tasks).



Looking forward to any answers :)



Y.


________________________________
Yuval Adar, Marvell Israel - Senior UNIX Administrator
6 Hamada Street
Mordot HaCarmel Industrial Park
Yokneam, 20692, Israel
Email: ***@marvell.com
Office: +972.4.9091188 - OnNet: 704.1188
Fax: +972.4.9091501
Mobile: +972.54.2493958
Web site: http://www.marvell.com

This message may contain confidential, proprietary or legally privileged information. The information is intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by telephone or by e-mail and delete the message from your computer.
________________________________

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=301052

To unsubscribe from this discussion, e-mail: [users-***@gridengine.sunsource.net].
reuti
2010-12-01 16:24:22 UTC
Hi,
Post by adary
To clarify this,
When a job is killed with qdel, two emails are sent to the user instead of one.
Yes, AFAIK this is the default behavior.
Post by adary
Job 1016932 (vim) Aborted
Exit Status = 137
Signal = KILL
User = adary
Host = lnx4073.il.marvell.com
Start Time = 12/01/2010 16:04:53
End Time = 12/01/2010 16:05:20
CPU = 00:00:00
Max vmem = 408.438M
job 1016932.1 died through signal KILL (9)
I can't find a reason for this behavior, and users claim that they started getting the second mail only in the last few weeks (this grid has been in production for the last three years).
Does anyone have an idea how something like this can happen, and how to suppress the extra second mail?
Another related question: is there a way to get only one email when a job array is killed? Right now, even in the ideal case, I get a mail for every running task in the job array (and we have arrays with 500+ running tasks).
This is a long-standing request, but nothing has been implemented to cover it so far.

You can use a mail wrapper that filters the mails and suppresses the unwanted ones:

http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=254376

The script there only checks whether the index of the array task is "1" and sends mail for that task alone (it was originally designed to send "Set job to Error" only for task 1 of each array job). Maybe the script can be adjusted for your needs: in your case you would look for "Failed" and/or "Aborted" instead of "Set" in the subject line and simply ignore those mails.
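
To make the idea concrete, here is a minimal sketch of such a wrapper in Python. It is not the script from the linked post. It assumes the wrapper is configured as the cluster's `mailer` and is called as `mailer -s <subject> <recipient>` with the mail body on stdin, and that mails for array tasks carry `<job_id>.<task_id>` in the subject line. The names `REAL_MAILER`, `TASK_RE` and `should_forward` are made up for this example, so check the subjects your installation actually produces before relying on it.

```python
#!/usr/bin/env python3
"""Hypothetical Grid Engine mail wrapper sketch (not the linked script)."""
import re
import subprocess
import sys

# Assumption: path of the real mail program at your site.
REAL_MAILER = "/bin/mail"

# Assumption: array task mails contain "<job_id>.<task_id>" in the subject.
TASK_RE = re.compile(r"\b(\d+)\.(\d+)\b")
SUPPRESS_WORDS = ("Aborted", "Failed")


def should_forward(subject):
    """Return False for Aborted/Failed mails of array tasks other than task 1."""
    if not any(word in subject for word in SUPPRESS_WORDS):
        return True  # some other kind of mail, pass it on
    match = TASK_RE.search(subject)
    if match is None:
        return True  # no task index found, treat as a non-array job
    return match.group(2) == "1"


def main(argv):
    # Assumption: called as "wrapper -s <subject> <recipient>...".
    if len(argv) < 4 or argv[1] != "-s":
        return 2
    subject = argv[2]
    body = sys.stdin.buffer.read()  # drain stdin even if the mail is dropped
    if should_forward(subject):
        # Hand the unchanged arguments and body to the real mailer.
        return subprocess.run([REAL_MAILER] + argv[1:], input=body).returncode
    return 0
```

To try it, one would end the script with `sys.exit(main(sys.argv))` and point the `mailer` entry of the cluster configuration (`qconf -mconf`) at it. Note that this only works if some task with index 1 is among the killed tasks; otherwise no mail is sent at all.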

-- Reuti
Post by adary
Looking forward to any answers :)
Y.
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=301063