Discussion:
Dealing with Berkeley DB
chambon
2010-12-07 10:36:43 UTC
Permalink
I am trying to survive with Berkeley DB...

Question 1:

How can I restore a configuration (the sge file)?
I have used the save_sge_config.sh and load_sge_config.sh scripts, but they need a running qmaster.
What can I do if qmaster cannot start because the sge file is damaged? (That is not my case for the moment.)

Question 2:
For the moment I have a running GE master. I can read and modify the configuration, and submit and run jobs.
Nevertheless, db_verify shows errors on the sge file:
db_verify -h /nbs/ge6.2u5/default/spool/spooldb sge
db_verify: Page 15: page 17 encountered a second time on free list
db_verify: sge: DB_VERIFY_BAD: Database verification failed

Can I trust db_verify?
If a repair is necessary, how do I do it with the db_* tools?

(The only solution I have seen, and successfully tested, is db_dump + rm sge + db_load.)

Best regards
Bernard CHAMBON

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302697

To unsubscribe from this discussion, e-mail: [users-***@gridengine.sunsource.net].
reuti
2010-12-07 10:53:16 UTC
Permalink
Hi,
Post by chambon
I am trying to survive with Berkeley DB...
How can I restore a configuration (the sge file)?
I have used the save_sge_config.sh and load_sge_config.sh scripts, but they need a running qmaster.
What can I do if qmaster cannot start because the sge file is damaged? (That is not my case for the moment.)
The best approach is to remove the old configuration completely (i.e. reinstall the qmaster with an empty configuration) and then load the saved one. This is advisable in any case, because the load script is not sophisticated enough to always remove an old installation in the correct order. For example, you first have to remove certain entries from queues and exechosts before you can remove them from the complex definition. Queues must also be removed in a certain order when there are entries in subordinate_list: if two queues list each other in their subordinate_list, you first have to remove one subordinate_list entry, then you can remove the other queue, and finally the initial queue.

When the script leaves a queue behind, the attached PEs and CKPTs may not be removed either.
Post by chambon
For the moment I have a running GE master. I can read and modify the configuration, and submit and run jobs.
Nevertheless, db_verify shows errors on the sge file:
db_verify -h /nbs/ge6.2u5/default/spool/spooldb sge
db_verify: Page 15: page 17 encountered a second time on free list
db_verify: sge: DB_VERIFY_BAD: Database verification failed
Can I trust db_verify?
If a repair is necessary, how do I do it with the db_* tools?
(The only solution I have seen, and successfully tested, is db_dump + rm sge + db_load.)
Yes, it should: http://gridengine.info/2008/11/11/fixing-a-berkeley-db-spool-database
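
The dump-and-reload cycle mentioned in the question can be sketched roughly as follows. This is only a sketch under assumptions: qmaster is stopped, the Berkeley DB utilities are on the PATH, the spool path is the one from the post, and the .dump and .corrupt file names are hypothetical. It also moves the damaged file aside instead of removing it, which differs from the "rm sge" step in the question. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: rebuild a Berkeley DB spool file via db_dump / db_load.
# Assumptions (not from the thread): qmaster is stopped, and the
# ".dump" / ".corrupt" file names are hypothetical.
SPOOLDB=${SPOOLDB:-/nbs/ge6.2u5/default/spool/spooldb}
RUN=${RUN-echo}   # default: print each command; run with RUN= to execute

rebuild_db() {
    db=$1
    # 1. Dump the database to a flat text file.
    $RUN db_dump -h "$SPOOLDB" -f "$SPOOLDB/$db.dump" "$db" || return 1
    # 2. Move the damaged file aside (kept as a fallback, not deleted).
    $RUN mv "$SPOOLDB/$db" "$SPOOLDB/$db.corrupt" || return 1
    # 3. Recreate the database from the dump.
    $RUN db_load -h "$SPOOLDB" -f "$SPOOLDB/$db.dump" "$db" || return 1
    # 4. Check the rebuilt file.
    $RUN db_verify -h "$SPOOLDB" "$db"
}

rebuild_db sge
```

Each step stops the function on failure, so a failed dump never leads to the original file being moved. If a plain db_dump cannot read the file, its salvage option (-r) may be worth trying before giving up.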

-- Reuti
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302705

chambon
2010-12-08 09:04:55 UTC
Permalink
Hello,
Post by reuti
The best approach is to remove the old configuration completely (i.e. reinstall the qmaster with an empty configuration) and then load the saved one. This is advisable in any case, because the load script is not sophisticated enough to always remove an old installation in the correct order. For example, you first have to remove certain entries from queues and exechosts before you can remove them from the complex definition. Queues must also be removed in a certain order when there are entries in subordinate_list: if two queues list each other in their subordinate_list, you first have to remove one subordinate_list entry, then you can remove the other queue, and finally the initial queue.
When the script leaves a queue behind, the attached PEs and CKPTs may not be removed either.
Ok
Thank you for the suggestions.

I have also set up a periodic db_dump for the configuration (sge) and the jobs (sge_job).
db_dump and db_load do not need a running qmaster.
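
A periodic dump of this kind might look like the sketch below. The backup directory and the file naming are hypothetical, the spool path is the one from the first post, and the script prints the commands unless it is started with RUN= (empty).

```shell
#!/bin/sh
# Sketch: timestamped db_dump backups of the two spool databases.
# Assumptions: BACKUP_DIR and the file naming scheme are hypothetical.
SPOOLDB=${SPOOLDB:-/nbs/ge6.2u5/default/spool/spooldb}
BACKUP_DIR=${BACKUP_DIR:-/var/backups/sge-spooldb}
RUN=${RUN-echo}   # default: print each command; run with RUN= to execute

backup_db() {
    db=$1
    stamp=$2
    # One flat-text dump per database and per run.
    $RUN db_dump -h "$SPOOLDB" -f "$BACKUP_DIR/$db.$stamp.dump" "$db"
}

stamp=$(date +%Y%m%d-%H%M%S)
for db in sge sge_job; do
    backup_db "$db" "$stamp"
done
```

Such a script could be called from cron at whatever interval suits the site. Old dumps are not pruned here, so some cleanup would have to be added separately.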


I have another question, about the removal of BDB log files:
Are log files that are no longer involved in transactions removed automatically?

It seems to me that they are.
If so, what is the purpose of the script util/bdb_checkpoint.sh?
And, more troublesome, can I still perform catastrophic recovery if such log files have been removed?

Finally, how do I disable the automatic removal of log files? I have tried to set up a DB_CONFIG file, but the only flag I know is
DB_LOG_AUTOREMOVE. What is the syntax or flag to disable it?
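
To see which log files are actually present and which ones Berkeley DB considers no longer in use, the db_archive utility can be queried, roughly as sketched below. This does not answer the DB_CONFIG question. It assumes the spool path from the first post and prints the commands unless started with RUN= (empty).

```shell
#!/bin/sh
# Sketch: inspect Berkeley DB log files before deciding what may be removed.
# Assumption: SPOOLDB is the environment directory from the first post.
SPOOLDB=${SPOOLDB:-/nbs/ge6.2u5/default/spool/spooldb}
RUN=${RUN-echo}   # default: print each command; run with RUN= to execute

log_report() {
    # All log files currently present in the environment.
    $RUN db_archive -h "$SPOOLDB" -l
    # Log files no longer in use (not involved in active transactions).
    $RUN db_archive -h "$SPOOLDB"
}

log_report
```

Comparing the two lists over time should show whether files are disappearing on their own. Copying the files from the second list to backup storage before they go is the usual way to keep catastrophic recovery (db_recover -c) possible.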

Best regards
Bernard CHAMBON

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=303063
