Recovering A Corrupt OpenLDAP Database On OSX Server

Last night we noticed some services provided by an OSX Leopard Server instance were not working correctly. The iChat, AFP and Web services were not authenticating. In Server Admin.app, the “Overview” tab of the Open Directory service reported…

LDAP Server is: Not Running

Password Server is: Running

Kerberos is: Not Running

Looking at the server error logs through Console.app, the following was occurring every 10 seconds:

com.apple.launchd[1] (org.openldap.slapd[27382]) Exited with exit code: 1

com.apple.launchd[1] (org.openldap.slapd) Throttling respawn: Will start in 10 seconds

The slapd daemon appeared not to be starting. Jumping to the command line, I tested the configuration using the `slapd -Tt` command.

core:openldap admin$ sudo /usr/libexec/slapd -Tt

overlay_config(): warning, overlay “dynid” already in list

bdb(dc=openrain,dc=com): PANIC: fatal region error detected; run recovery

bdb_db_open: Database cannot be opened, err -30978. Restore from backup!

bdb(dc=openrain,dc=com): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem

backend_startup_one: bi_db_open failed! (-30978)

slap_startup failed (test would succeed using the -u switch)
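If you want to script this check, here is a minimal sketch. The function name `classify_slapd_test` is hypothetical, and it matches only the message strings shown in the output above, so treat it as a starting point rather than a complete diagnosis.

```shell
# Hypothetical helper: classify the text printed by `slapd -Tt`.
# Reads the output on stdin and prints one of: corrupt, ok, unknown.
classify_slapd_test() {
    output=$(cat)
    case "$output" in
        *"run recovery"*|*"Database cannot be opened"*)
            # Berkeley DB is asking for recovery
            echo "corrupt" ;;
        *"config file testing succeeded"*)
            echo "ok" ;;
        *)
            echo "unknown" ;;
    esac
}

# Usage (2>&1 so that diagnostics written to stderr are captured too):
#   sudo /usr/libexec/slapd -Tt 2>&1 | classify_slapd_test
```

The corruption patterns are checked first on purpose, so that output containing both an error and a success line is still reported as corrupt.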

With a little research (http://discussions.apple.com/message.jspa?messageID=9548971), I concluded that:

1) The OpenLDAP database had been corrupted, and

2) The `slapd_db_recover` tool (as present on some Linux installations) is instead named `db_recover` on OS X. Ah!

After carefully backing up the /var/db/openldap folder, I ran the recovery tool and re-tested the configuration:

core:openldap admin$ sudo db_recover -h /var/db/openldap/openldap-data/

core:openldap admin$ sudo /usr/libexec/slapd -Tt

overlay_config(): warning, overlay “dynid” already in list

config file testing succeeded
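The backup-then-recover sequence above can be wrapped in a small script so the recovery never runs without a copy to fall back on. This is a sketch under assumptions: the function names and the timestamped backup naming are mine, not part of the original steps, and only the `db_recover -h` invocation comes from this post.

```shell
# Copy a directory to a timestamped sibling before touching it.
# Prints the path of the backup on success.
backup_dir() {
    src=${1%/}                                   # strip a trailing slash
    dest="${src}.backup.$(date +%Y%m%d%H%M%S)"   # hypothetical naming scheme
    cp -Rp "$src" "$dest" && echo "$dest"
}

# Back up first, then recover. Refuses to recover if the backup failed.
recover_openldap() {
    data_root=${1:-/var/db/openldap}
    backup_dir "$data_root" >/dev/null || {
        echo "backup failed; not running db_recover" >&2
        return 1
    }
    # Berkeley DB recovery expects that no other process has the
    # environment open, so this is safest while slapd is not running.
    db_recover -h "$data_root/openldap-data/"
}

# Usage (as root):  recover_openldap
```

Afterwards, re-run `slapd -Tt` as shown above to confirm the database opens cleanly.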

The errors in Console.app stopped, and the Server Admin.app panel started reporting:

LDAP Server is: Running

Password Server is: Running

Kerberos is: Running

I had to restart the AFP, iChat and Web services on the machine to get everything working again, but all seems well now.
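Restarting the dependent services can also be done from the command line with `serveradmin`. The sketch below is an assumption-heavy illustration: the function name is hypothetical, and the service names in the usage line (`afp`, `jabber` for iChat, `web`) are my guess at what `serveradmin list` reports on this server, so check that list first.

```shell
# Restart each named service via serveradmin, printing what is being done.
# SERVERADMIN can be overridden (for example, for a dry run with `echo`).
restart_services() {
    for svc in "$@"; do
        echo "restarting $svc"
        "${SERVERADMIN:-serveradmin}" stop "$svc"  || return 1
        "${SERVERADMIN:-serveradmin}" start "$svc" || return 1
    done
}

# Usage (as root; service names are assumptions, see `serveradmin list`):
#   restart_services afp jabber web
```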

Comments

44 responses to “Recovering A Corrupt OpenLDAP Database On OSX Server”

2009.07.08

Paul

This tip saved my ass! Thank you.
2009.07.09

preston.lee

@Paul
You’re welcome!
2009.07.09

Terrance

Followed your instructions, worked like a champ!! Thanks!!
2009.08.12

jan

Thank you! Weirdly enough, the configuration tester already fixed it for me!
2009.08.20

Bob

Add me to the grateful throngs. The LDAP db on my Leopard Server was corrupted after a restart that didn’t. After a panicked morning, I found your post. Worked like a champ!

(Does 4 count as a throng?)
2009.09.08

Craig

This worked great but it was corrupted again this morning after a restart, any reason why this would keep happening? Thanks!
2009.09.08

preston.lee

@Craig
I haven’t seen that happen before. The only time this has happened to me is when I’ve had to hard reset the machine and/or manually kill the daemons: in other words not shutting stuff down properly. My best guess is that there may be something external to LDAP that is corrupting the files on disk, or perhaps a disk issue itself.
2009.09.08

Craig

Hi,
Thanks for your quick response, and actually we are having to force shutdown the machine as it is getting stuck on shutdown. Thanks so much for the insight.

@preston.lee
2009.09.08

preston.lee

@Craig
No problem. If you’re consistently needing to force shutdown, try running `sync` at the command line before hitting the power button. That should force a flush of the disk’s write buffers. Hopefully that’ll at least keep LDAP from getting corrupted.
2009.09.21

Marco Papa

Preston, another great thanks from me. Anyway, my situation was identical to yours. Same identical messages, and corruption generated by hitting the power button.

I did the db_recover and it worked. Only thing, Kerberos is still listed as “Stopped”. Any ideas why? Everything seems to work.
2009.09.22

preston.lee

@Marco Papa
If you’ve already tried restarting the “Open Directory” service a couple times, I’m not sure. Is there anything in the system logs that seems relevant? Could it be a DNS problem?
2009.09.29

Adam

Thank you!
2009.11.10

Mel

Preston, yet another big vote of thanks for this – you saved our bacon big time!
2009.11.11

Adam

MAN YOU SAVED OUR BACON!! Thanks a million we owe you a PINT. 🙂
2009.12.01

Quentin

Great post & instructions; like the others, it got me out of a jam after an update hung the restart and left me with a corrupt db.
2009.12.07

Ranj

Thanks a lot this assisted me with a similar problem I was having.

I had to run a few more commands in Terminal to get it working, though:

1) sudo to root

sudo -i

2) shut down the Open Directory server

service org.openldap.slapd stop

3) dump a copy of the Open Directory database to an LDIF format text file

mkdir /var/root/opendirectory
cd /var/root/opendirectory
slapcat -l dir.ldif

4) move the old (corrupt) database files out of the way (or remove them)

cd /var/db/openldap/openldap-data
mkdir SAVE
mv *.bdb SAVE/

Be sure you don’t move, rename or delete the file named DB_CONFIG. It’s needed.

5) recreate the database from the LDIF format file

cd /var/root/opendirectory
slapadd -l dir.ldif
slapindex

You will see some harmless warnings during slapadd. Ignore them.

6) restart Open Directory

service org.openldap.slapd start
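The six steps above can be collected into one script. This is only a sketch: the `run` and `rebuild_openldap` names and the `DRY_RUN` switch are hypothetical additions, and by default nothing is executed, the commands are only printed so they can be reviewed first.

```shell
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = "0" ] || return 0
    "$@"
}

# Dump the directory to LDIF, set the old *.bdb files aside, reload.
rebuild_openldap() {
    dump_dir=${1:-/var/root/opendirectory}
    data_dir=${2:-/var/db/openldap/openldap-data}
    run service org.openldap.slapd stop   || return 1
    run mkdir -p "$dump_dir"              || return 1
    run slapcat -l "$dump_dir/dir.ldif"   || return 1
    run mkdir -p "$data_dir/SAVE"         || return 1
    # Move only the *.bdb files; DB_CONFIG must stay where it is.
    for f in "$data_dir"/*.bdb; do
        [ -e "$f" ] || continue
        run mv "$f" "$data_dir/SAVE/"     || return 1
    done
    run slapadd -l "$dump_dir/dir.ldif"   || return 1
    run slapindex                         || return 1
    run service org.openldap.slapd start
}

# Usage (as root): review the plan, then run it for real.
#   rebuild_openldap
#   DRY_RUN=0 rebuild_openldap
```

Each step stops the script on failure, so a failed `slapcat` dump never leads to the database files being moved.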
2009.12.08

Jon Zgoda

Thank you! Thank you! Thank you!!!

I thought I was being good, doing a backup through ServerAdmin before any updates…but then I wasn’t able to restore through ServerAdmin when this problem occurred.

Your solution worked perfectly, and was back up in minutes.
2009.12.16

john lewis

Ranj’s instructions worked for me. THANKS GUYS! Phew!
2009.12.23

Moshik

Ranj 10x Alot…!!! saved us also…
2010.03.11

Daniel

Thank you.

This has happened to me three times.
Two times I rebuilt from scratch.

This time – thank you!
2010.05.14

Junjun

This works! I wouldn’t have known db_recover was the right tool without reading your post.

On my Xserve, I have to use this to recover the database.

sudo db_recover -cev -h /var/db/openldap/openldap-data/
2010.05.14

MacDave

Worked great for me — thanks!
2010.06.29

BradDS

I may try this, seeing all the positive responses and the fact that I am currently in the same boat. Did you lose all of your LDAP settings, or did they stay intact? I see you said you carefully backed up the openldap directory, but does this require you to replace it, or does it fix the existing LDAP?
2010.06.29

admin

For *me* it fixed my existing database.
2010.06.29

BradDS

That would be wonderful. One last question if I may. You also mentioned you had to restart services such as AFP after. What was the reason? Did it prevent the LDAP from fully starting or running that recovery caused those services to stop?
2010.06.29

BradDS

Chalk up another Success Story! Thanks!
2010.07.25

BigClay

Thanks for the post it saved my ‘bacon’ as well. Now on to make a better back up approach for the OS X server!!!
2010.08.17

Gill

KARMA is a boomerang and you have a lot of good KARMA coming your way my friend. This fix is fantastic! and a life saver for all who lose their ldap data!!

TY TY TY !!
2010.08.17

admin

No problem and I certainly hope so! 🙂
2010.08.18

Brian Jønch

Followed your instructions, worked like a champ. Thanks a lot.

/BJ
2010.08.23

Eric

You are the man!!! You saved me from a LOT of work, time, frustration, profanity, ulcers, high blood pressure, etc.
2010.08.25

Geoff Smyth

Elevated to legend status you are.
2010.10.05

Michael

Just about to poo my pants until I came across this article. Many thanks Guys!
2011.01.14

Kenny

LDAP Server up and running again.
Thanks a lot, this did it, great !!!!!
2011.10.04

Steve

Preston, thanks for the article. I’m experiencing a similar problem, though OD reports that LDAP, Password Server and Kerberos are all running. But in my LDAP log I get messages similar to yours:

Oct 4 11:07:36 s1 slapd[26448]: bdb(cn=accesslog): DB_ENV->lock_id interface requires an environment configured for the locking subsystem
Oct 4 11:07:36: — last message repeated 3 times —
Oct 4 11:07:36 s1 slapd[26448]: findbase failed! 80

When I tested the config, I get this:

s1:~ sadmin$ sudo /usr/libexec/slapd -Tt
bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
bdb_db_open: database “cn=accesslog”: unclean shutdown detected; attempting recovery.
bdb_db_open: database “cn=accesslog”: recovery skipped in read-only mode. Run manual recovery if errors are encountered.
bdb_db_open: database “cn=accesslog”: alock_recover failed
bdb_db_open: could not restore bdb backend -1config file testing succeeded
bdb_db_close: database “cn=accesslog”: alock_close failed

Do you think I should follow your steps above in using the db_recover tool, or is that not applicable to my situation? Authentication and all services are working on the server, but I’m afraid this problem could manifest itself in ugly ways if I let it continue.
2011.10.04

Steve

After some other troubleshooting I tried the db_recover tool, and rather than recovering the database, I received these messages:

Oct 4 11:46:18 s1 slapd[1116]: bdb(dc=*****,dc=*****): PANIC: fatal region error detected; run recovery
Oct 4 11:46:18 s1 slapd[1116]: SASL [conn=29] Failure: no user in database _ldap_replicator

I’ve rebooted the machine, manually unloaded & loaded slapd, and ran the recovery tool multiple times, and nothing has worked. Any other ideas? Getting pretty nervous now.
2011.10.05

Steve

Not sure if anyone is checking this thread anymore, but just in case someone with the same problem reads it, here was my solution. None of the commands above worked in my situation – I suppose the corruption was too bad for the db_recover tool (or the manual .ldif steps), so I took one of my healthy OD Replicas and promoted it to Master. I then destroyed OD on the previous Master, rebooted and made it a Replica of the new Master. This resulted in a fresh, clean database on the old Master, as well as not losing any records or passwords on the new one. Took me hours of troubleshooting and trial and error tests, but I should have done this as my very first step. There was minimal downtime and to most users the transition was transparent.
2011.10.18

leolor

has exactly the same problem on leopard 10.5.8 server. thanks. it works again for me!!!!!!!
2011.12.22

MV

Seriously, put up a Paypal link since you saved me a few hours of research. Or a mailing address.
2012.02.12

linux-blog – Fa. anracon – Dr. Mönchmeyer » Blog Archive » OX5 – LDAP Restaurierung

[…] http://www.prestonlee.com/2009/07/08/recovering-a-corrupt-openldap-database-on-osx-server/ http://serverfault.com/questions/87889/ldap-database-recovery-after-server-crash […]
2012.09.06

Sanjiv Singh

Thanks man !!
2013.08.18

chris

Just saved my Sunday. thanks!
2013.10.07

TD

Thank you, problem I encountered was identical.
2015.07.28

Bryan

Thank you for the help. It is 2015 but I am still running 10.6.8 on my server and this solved the problem.
