Recovering A Corrupt OpenLDAP Database On OSX Server

Last night we noticed some services provided by an OSX Leopard Server instance were not working correctly. The iChat, AFP and Web services were not authenticating. In Server Admin.app, the “Overview” tab of the Open Directory service reported…
LDAP Server is: Not Running
Password Server is: Running
Kerberos is: Not Running
Looking at the server error logs through Console.app, the following was occurring every 10 seconds..
com.apple.launchd[1] (org.openldap.slapd[27382]) Exited with exit code: 1
com.apple.launchd[1] (org.openldap.slapd) Throttling respawn: Will start in 10 seconds
The slapd daemon appeared not to be starting. Jumping to the command line, I tested the configuration using the `slapd -Tt` command.
core:openldap admin$ sudo /usr/libexec/slapd -Tt
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
bdb(dc=openrain,dc=com): PANIC: fatal region error detected; run recovery
bdb_db_open: Database cannot be opened, err -30978. Restore from backup!
bdb(dc=openrain,dc=com): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem
backend_startup_one: bi_db_open failed! (-30978)
slap_startup failed (test would succeed using the -u switch)
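The check above can be scripted. This is a minimal sketch, assuming the output of the config test has been captured into a variable; the function name `needs_bdb_recovery` is hypothetical, and the matched strings are simply the ones that appear in the log above.

```shell
#!/bin/sh
# Hypothetical helper: decide from captured `slapd -Tt` output whether
# Berkeley DB is asking for recovery. The patterns are taken from the
# error messages shown above, not from any official list.
needs_bdb_recovery() {
    # $1: captured output of the configuration test
    printf '%s\n' "$1" |
        grep -Eq 'PANIC: fatal region error|run recovery|bi_db_open failed'
}

# Example usage (on the server itself):
#   out=$(sudo /usr/libexec/slapd -Tt 2>&1)
#   if needs_bdb_recovery "$out"; then
#       echo "database needs recovery"
#   fi
```

The function returns success (exit status 0) only when one of the recovery messages is present, so it can be used directly in an `if`.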
With a little research (http://discussions.apple.com/message.jspa?messageID=9548971), I concluded that..
  1. The OpenLDAP database had been corrupted, and..
  2. The `slapd_db_recover` tool (as present on some Linux installations) is instead named `db_recover`. Ah!
After carefully backing up the /var/db/openldap folder, I ran the recovery tool and re-tested the configuration..
core:openldap admin$ sudo db_recover -h /var/db/openldap/openldap-data/
core:openldap admin$ sudo /usr/libexec/slapd -Tt
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
overlay_config(): warning, overlay “dynid” already in list
config file testing succeeded
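For the backup step, something like the following would do. It is only a sketch: the function name `backup_dir` and the destination directory are assumptions, not what was actually typed that night. Berkeley DB recovery is safest when no other process has the database open, so it is best run while slapd is not running.

```shell
#!/bin/sh
# Hypothetical helper: archive a directory before touching it.
# Prints the path of the archive it created.
backup_dir() {
    src=$1      # directory to back up, e.g. /var/db/openldap
    dest=$2     # directory that will hold the archive (an assumption)
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$(basename "$src")-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C makes the paths inside the archive relative to the parent of $src
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}

# Example usage (as root; /var/root/ldap-backups is a made-up location):
#   backup_dir /var/db/openldap /var/root/ldap-backups &&
#       db_recover -h /var/db/openldap/openldap-data/
```

Chaining the two commands with `&&` means the recovery tool only runs if the archive was written successfully.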
The errors in Console.app stopped, and the Server Admin.app panel started reporting..
LDAP Server is: Running
Password Server is: Running
Kerberos is: Running
I had to restart the AFP, iChat and Web services on the machine to get everything working again, but all seems well now.
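Restarting the dependent services can also be done from the command line rather than through Server Admin.app. The sketch below assumes the `serveradmin` tool and the service names `afp`, `jabber` (iChat) and `web`; those names are assumptions, so check `serveradmin list` on your own server first. The function name and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: stop and start each named service in turn.
# With DRY_RUN=1 the commands are printed instead of executed.
restart_services() {
    for svc in "$@"; do
        for action in stop start; do
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "serveradmin $action $svc"
            else
                serveradmin "$action" "$svc" || return 1
            fi
        done
    done
}

# Example usage (as root; service names are assumptions):
#   DRY_RUN=1 restart_services afp jabber web   # preview the commands
#   restart_services afp jabber web             # actually restart
```

Previewing with `DRY_RUN=1` first is a cheap way to confirm the service names before anything is stopped.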

20 Responses to “Recovering A Corrupt OpenLDAP Database On OSX Server”

  1. Paul says:

    This tip saved my ass! Thank you.

  2. preston.lee says:

    @Paul
    You’re welcome!

  3. Terrance says:

    Followed your instructions, worked like a champ!! Thanks!!

  4. jan says:

    Thank you! Weird enough the configuration tester already fixed it for me!

  5. Bob says:

    Add me to the grateful throngs. The LDAP db on my Leopard Server was corrupted after a restart that didn’t. After a panicked morning, I found your post. Worked like a champ!

    (Does 4 count as a throng?)

  6. Craig says:

    This worked great but it was corrupted again this morning after a restart, any reason why this would keep happening? Thanks!

  7. preston.lee says:

    @Craig
    I haven’t seen that happen before. The only time this has happened to me is when I’ve had to hard reset the machine and/or manually kill the daemons: in other words not shutting stuff down properly. My best guess is that there may be something external to LDAP that is corrupting the files on disk, or perhaps a disk issue itself.

  8. Craig says:

    Hi,
    Thanks for your quick response, and actually we are having to force shutdown the machine as it is getting stuck on shutdown. Thanks so much for the insight.

    @preston.lee

  9. preston.lee says:

    @Craig
    No problem. If you’re consistently needing to force shutdown, try running `sync` at the command line before hitting the power button. That should flush the disk’s write buffers. Hopefully that’ll at least keep LDAP from getting corrupted.

  10. Marco Papa says:

    Preston, another great thanks from me. Anyway, my situation was identical to yours. Same identical messages, and corruption generated by hitting the power button.

    I did the db_recover and it worked. Only thing, Kerberos is still listed as “Stopped”. Any ideas why? Everything seems to work.

  11. preston.lee says:

    @Marco Papa
    If you’ve already tried restarting the “Open Directory” service a couple times, I’m not sure. Is there anything in the system logs that seems relevant? Could it be a DNS problem?

  12. Adam says:

    Thank you!

  13. Mel says:

    Preston, yet another big vote of thanks for this – you saved our bacon big time!

  14. Adam says:

    MAN YOU SAVED OUR BACON!! Thanks a million we owe you a PINT. :)

  15. Quentin says:

    Great post & instructions; like the others, it got me out of a jam after an update hung the restart and left me with a corrupt db.

  16. Ranj says:

    Thanks a lot this assisted me with a similar problem I was having.

    I had to run a few more commands in terminal to get it working though

    1) sudo to root

    sudo -i

    2) shutdown the open directory server

    service org.openldap.slapd stop

    3) dump a copy of the Open Directory database to an LDIF format text file

    mkdir /var/root/opendirectory
    cd /var/root/opendirectory
    slapcat -l dir.ldif

    4) move the old (corrupt) database files out of the way (or remove them).

    cd /var/db/openldap/openldap-data
    mkdir SAVE
    mv *.bdb SAVE/

    be sure you don’t move, rename or delete the file named DB_CONFIG. It’s needed.

    5) recreate the database from the LDIF format file

    cd /var/root/opendirectory
    slapadd -l dir.ldif
    slapindex

    You will see some harmless warnings during slapadd. Ignore them.

    6) restart open directory

    service org.openldap.slapd start

  17. Jon Zgoda says:

    Thank you! Thank you! Thank you!!!

    I thought I was being good, doing a backup through ServerAdmin before any updates…but then I wasn’t able to restore through ServerAdmin when this problem occurred.

    Your solution worked perfectly, and was back up in minutes.

  18. john lewis says:

    Ranj’s instructions worked for me. THANKS GUYS! Phew!

  19. Moshik says:

    Ranj 10x Alot…!!! saved us also…

  20. Daniel says:

    Thank you.

    This has happened to me three times.
    Two times I rebuilt from scratch.

    This time – thank you!
