Ticket #3082 (closed defect: fixed)

Opened 22 months ago

Last modified 22 months ago

LDAP Slaves fail to sync on Zentyal Beta 2.2-rc1

Reported by: r.hoelzemer@…
Owned by: cperez@…
Milestone: 2.2
Component: users
Severity: normal
Keywords: ldap master slave replication
Cc:

Description

Tested with three (virtual) machines, a master ("ldap") and two slaves ("gateway" and "server"), following the wiki HOWTO. I already have a master/slave scenario with EBox 1.5.x in production at work, so I know the pitfalls of that setup.

It might also be worth noting that I have now done this whole setup three times to make sure it is indeed a bug and not some silly mistake I might have made.

Here's what I have done to get things going:

  • made sure to uninstall apparmor and reboot on each machine
  • made sure that all machines are resolvable via dns
  • set the incoming ldap rule on the master to accept traffic

Configuring the master and joining the slaves afterwards was a piece of cake. Basically, everything went fine up to this point. I can see the slaves listed in the master and the slaves themselves seem to be connected to the master.

After successfully joining the slaves, here's what I did:

  • created a new group named "samba" (this went OK, the group is visible on the slaves)
  • created a new user "bob" (failed! It is listed in the slave operations list on the master for both slaves)

From this point on, all operations, whether new users or new groups, fail to replicate to the slaves.

Attached are the logfiles of all three machines.

Attachments

zentyal_gateway.log Download (58.3 KB) - added by anonymous 22 months ago.
software_gateway.log Download (36.9 KB) - added by anonymous 22 months ago.
error_gateway.log Download (83.7 KB) - added by anonymous 22 months ago.
error_ldap.log Download (26.7 KB) - added by anonymous 22 months ago.
software_ldap.log Download (13.2 KB) - added by anonymous 22 months ago.
zentyal_ldap.log Download (11.5 KB) - added by anonymous 22 months ago.

Change History

Changed 22 months ago by anonymous

comment:1 Changed 22 months ago by r.hoelzemer@…

Sorry for the doublepost! Trac wouldn't let me upload a zipfile of all logs because it thought it was some sort of spam. :)

Also, I tried to attach all nine logfiles, but apparently there is a six file maximum for attachments here. The posted ones are from the master ("ldap") and one slave ("gateway"). If you still need logs from the other slave ("server"), just ping me. :)

comment:2 Changed 22 months ago by cperez@…

  • Status changed from new to closed
  • Resolution set to fixed

(In [22624]) Do not stop slapd daemons after slave enable (closes #3082, #3070 and #3054)

comment:3 Changed 22 months ago by r.hoelzemer@…

  • Status changed from closed to reopened
  • Resolution fixed deleted

Hi cperez,

unfortunately changeset 22624 didn't actually fix this bug. I tested this in an environment similar to the one described above, but manually removed the line

$self->_manageService('stop');

in UsersAndGroups.pm right after installing the module and before configuring anything. Before and after configuring/joining the master and the slaves, I made sure that slapd is indeed running on all machines.

openldap 11568  0.0  1.4 161956  7084 ?        Ssl  13:38   0:00 /usr/sbin/slapd -d 0 -h ldap://0.0.0.0:1389/ -u openldap -g openldap -F /etc/ldap/slapd-replica.d
openldap 11589  0.0  1.4 229772  7140 ?        Ssl  13:38   0:00 /usr/sbin/slapd -d 0 -h ldap://127.0.0.1:1390/ -u openldap -g openldap -F /etc/ldap/slapd-translucent.d
openldap 11609  0.0  1.4 147780  7196 ?        Ssl  13:38   0:00 /usr/sbin/slapd -d 0 -h ldap://0.0.0.0/ ldapi://%2fvar%2frun%2fslapd%2fldapi/????x-mod=0777 -u openldap -g openldap -F /etc/ldap/slapd-frontend.d

The result is exactly the same as before. The errors I get are already available in the last logs I posted.

In the master logfile, lines 77-78:

77	2011/07/31 19:46:05 ERROR> Ldap.pm:701 EBox::Ldap::_errorOnLdap - $VAR1 = 'cn=master,dc=example,dc=de';
78	2011/07/31 19:46:05 ERROR> Ldap.pm:703 EBox::Ldap::_errorOnLdap - Unknown error at EBox::UsersAndGroups::__ANON__ No such object

and the slave logfile, lines 190-195:

190	2011/07/31 19:49:53 ERROR> Ldap.pm:701 EBox::Ldap::_errorOnLdap - $VAR1 = 'ou=Users,dc=example,dc=de';
191	2011/07/31 19:49:53 ERROR> Ldap.pm:703 EBox::Ldap::_errorOnLdap - Unknown error at EBox::UsersAndGroups::__ANON__ Referral received
192	2011/07/31 19:49:53 ERROR> Ldap.pm:701 EBox::Ldap::_errorOnLdap - $VAR1 = 'ou=Groups,dc=example,dc=de';
193	2011/07/31 19:49:53 ERROR> Ldap.pm:703 EBox::Ldap::_errorOnLdap - Unknown error at EBox::UsersAndGroups::__ANON__ Referral received
194	2011/07/31 19:49:53 ERROR> Ldap.pm:701 EBox::Ldap::_errorOnLdap - $VAR1 = 'cn=__USERS__,ou=Groups,dc=example,dc=de';
195	2011/07/31 19:49:53 ERROR> Ldap.pm:703 EBox::Ldap::_errorOnLdap - Unknown error at EBox::UsersAndGroups::__ANON__ Referral received

and lines 216-217:

216	2011/07/31 20:32:55 ERROR> Ldap.pm:701 EBox::Ldap::_errorOnLdap - $VAR1 = 'cn=samba,ou=Groups,dc=example,dc=de';
217	2011/07/31 20:32:55 ERROR> Ldap.pm:703 EBox::Ldap::_errorOnLdap - Unknown error at EBox::UsersAndGroups::__ANON__ Referral received

If I am missing something here, please let me know. For now I am reopening the ticket for reference.

comment:4 Changed 22 months ago by cperez@…

Hi,

Have you applied all my patches in that branch? I left a package in my public dir:

 http://people.zentyal.org/~exekias/zentyal-users_2.1.7_all.deb

If you want, you can try it. First reinstall the users module with:

/usr/share/zentyal-users/reinstall (this will remove all your data, so do not do it in production)

Then install the downloaded package with:

dpkg -i zentyal-users_2.1.7_all.deb

and configure and enable the module as usual.

comment:5 Changed 22 months ago by r.hoelzemer@…

Ok. I knew I was missing something. Sorry, I assumed the fix was the above changeset alone.

I just did another test with the above-mentioned package and the error is still the same.

Here's what I did:

  • went back to a clean install on all three machines
  • uninstalled apparmor everywhere
  • installed the necessary modules plus the new zentyal-users_2.1.7 package on all machines
  • ran /usr/share/zentyal-users/reinstall on all machines
  • made sure the master has ldap ports enabled
  • made sure the master has only the users module installed
  • made sure slapd is running everywhere

Still no go, unfortunately.

comment:6 Changed 22 months ago by cperez@…

Sorry, you did this:

  • installed the necessary modules plus the new zentyal-users_2.1.7 package on all machines
  • ran /usr/share/zentyal-users/reinstall on all machines

The problem is that the reinstall script reinstalls the zentyal-users module from the official repository, so you should install with dpkg after running that script :)
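The order suggested here can be sketched as a small shell script. This is only an illustration: the DRY_RUN variable and the run and install_patched_users helpers are hypothetical names, not part of Zentyal, and by default the script only prints the commands because the reinstall script removes all users module data.

```shell
#!/bin/sh
# Sketch: reinstall the users module first, then install the patched package,
# so the local .deb is the last thing installed.
# DRY_RUN, run() and install_patched_users() are hypothetical helpers.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"      # only show what would be executed
    else
        "$@"
    fi
}

install_patched_users() {
    deb=$1
    run /usr/share/zentyal-users/reinstall   # WARNING: removes all users module data
    run dpkg -i "$deb"                       # patched package goes in last
    run dpkg-query -W zentyal-users          # show the version that ended up installed
}
```

Calling install_patched_users zentyal-users_2.1.7_all.deb prints the three commands in order; setting DRY_RUN=0 would actually execute them and should only be done on a test machine.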

comment:7 Changed 22 months ago by r.hoelzemer@…

Hmmm, I am pretty sure I checked that version 2.1.7 was installed prior to configuring master/slaves. After installing with dpkg, wouldn't the new package be in the apt cache and get picked up by a reinstall/upgrade anyway?

Ok. I'll do another test. :)

comment:8 Changed 22 months ago by r.hoelzemer@…

Nope! Out of interest, I tried both scenarios: first dpkg -i zentyal-users_2.1.7, then /usr/share/zentyal-users/reinstall, and vice versa. Both times the installed version of zentyal-users is 2.1.7, and both scenarios give the exact same error as before.

I did, however, find something unusual. After some investigation, I decided to activate the firewall logs and discovered that on the master, incoming packets on port 389 are dropped. I then double-checked the firewall settings; everything is fine there.

iptables -L confirms that the port is open:

...
Chain iglobal (1 references)
target     prot opt source               destination         
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ldap state NEW 
drop       tcp  --  anywhere             anywhere            tcp dpt:6677 state NEW 
ACCEPT     udp  --  anywhere             anywhere            udp dpt:ntp state NEW 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh state NEW 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:https state NEW 
...

That's OK, I guess. Then why are ldap packets dropped by the master? Even syslog confirms the drop:

Aug  2 19:37:37 ldap kernel: [ 7090.392991] ebox-firewall drop IN=eth0 OUT= MAC=08:00:27:6e:c5:9a:08:00:27:6b:e7:27:08:00 SRC=10.0.0.1 DST=10.0.0.5 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=34268 DPT=389 WINDOW=0 RES=0x00 RST URGP=0 MARK=0x1 

My suspicion was that those packets are dropped by the global input INVALID filter, which itself goes into the drop chain and creates that message in syslog.

-A INPUT -m state --state INVALID -j idrop
...
-A drop -m limit --limit 50/min --limit-burst 10 -j LOG --log-prefix "ebox-firewall drop " --log-level 7
-A drop -j DROP

So to distinguish an invalid packet drop from other drop events in the log, I made a copy of the drop chain with a custom "invalid" message just for that first filter.

-A INPUT -m state --state INVALID -j iinvalid
...
-A invalid -m limit --limit 50/min --limit-burst 10 -j LOG --log-prefix "ebox-firewall invalid " --log-level 7
-A invalid -j DROP

Here's the result:

Aug  2 19:46:32 ldap kernel: [ 7625.351966] ebox-firewall invalid IN=eth0 OUT= MAC=08:00:27:6e:c5:9a:08:00:27:6b:e7:27:08:00 SRC=10.0.0.1 DST=10.0.0.5 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=51787 DPT=389 WINDOW=0 RES=0x00 RST URGP=0 MARK=0x1 

So it seems that ldap packets that arrive at the master are INVALID and therefore dropped by the firewall. :) This is basically as far as I could get for now. Hopefully this helps with further investigation of the issue.
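One way to read these log lines is to pull out the fields that matter. The sketch below (summarize_drop is a name made up for illustration) extracts the source address, destination port and TCP flags from an iptables LOG line in the format shown above. For the logged packet it reports only the RST flag. A reset that no longer matches a tracked connection is commonly classified as INVALID by connection tracking, which might explain the entry without new LDAP connections being blocked; that reading is an assumption and is not confirmed in this ticket.

```shell
#!/bin/sh
# Summarize one iptables LOG line: source address, destination port, TCP flags.
# Field names (SRC=, DPT=) and bare flag words such as RST follow the
# kernel LOG format seen in the syslog excerpts above.
summarize_drop() {
    line=$1
    src=$(printf '%s\n' "$line" | sed -n 's/.* SRC=\([^ ]*\).*/\1/p')
    dpt=$(printf '%s\n' "$line" | sed -n 's/.* DPT=\([^ ]*\).*/\1/p')
    flags=""
    for f in SYN ACK FIN RST PSH URG; do
        case " $line " in
            # flags appear as standalone words; URGP=0 is not a flag
            *" $f "*) flags="$flags${flags:+,}$f" ;;
        esac
    done
    printf 'src=%s dpt=%s flags=%s\n' "$src" "$dpt" "${flags:-none}"
}
```

For example, grep 'ebox-firewall' /var/log/syslog | while IFS= read -r l; do summarize_drop "$l"; done would give one summary per logged packet.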

comment:9 Changed 22 months ago by cperez@…

Uhm, that's weird.

Just a question: if you disable the firewall, does the master-slave stuff work?

If the answer is yes, we should close this ticket and work on the firewall issue (in this or another one, actually).

comment:10 Changed 22 months ago by r.hoelzemer@…

Just tested this with the firewall disabled on all machines. Still the same error.

comment:11 Changed 22 months ago by jsalamero@…

Some things to check:

  • the NTP module is installed on all machines; check that the time is synchronized with date
  • from the master you can resolve the slaves' hostnames, and you can even connect to their SOAP port (the same as the Zentyal interface) using openssl s_client -connect slave:443
  • from the slaves you can connect to the master ldap port using ldapsearch -h master -b 'dc=foo,dc=bar' -x -w
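The three checks above could be scripted roughly as follows. This is a sketch under assumptions: skew_ok, check_soap and check_ldap are hypothetical helper names, the host names and base DN are placeholders passed in by the caller, and the LDAP check uses an anonymous bind (no -D or -w), so it says nothing about the replication credentials.

```shell
#!/bin/sh
# Sketch of the three checks; hosts and base DN are supplied by the caller.

# Succeeds if two epoch timestamps differ by at most max_skew seconds.
skew_ok() {
    local_ts=$1 remote_ts=$2 max_skew=$3
    diff=$(( local_ts - remote_ts ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "$max_skew" ]
}

# Master -> slave: TLS handshake against port 443 (SOAP / web interface).
check_soap() {
    openssl s_client -connect "$1:443" </dev/null >/dev/null 2>&1
}

# Slave -> master: anonymous search of the base entry only.
# Fails if the port is unreachable or if anonymous reads are refused.
check_ldap() {
    ldapsearch -x -H "ldap://$1" -b "$2" -s base >/dev/null 2>&1
}
```

A possible use, assuming ssh access to the slave is available: skew_ok "$(date +%s)" "$(ssh slave date +%s)" 5 && check_soap slave && echo "slave looks reachable".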

comment:12 Changed 22 months ago by r.hoelzemer@…

Yes, I made sure time is synchronized on all machines by installing the virtualbox guest additions plus the NTP module. Also checked before every action that time is the same everywhere.

The whole network is resolvable by IP, name and FQDN. I can connect from the slaves to the master with openssl s_client -connect ldap:443. No problem here.

ldapsearch fails with

ldap_bind: Invalid credentials (49)

I assume the password is the one displayed in the web ui? Made sure the password was correct. Also tried to read it from file with the "-y" switch:

PING ldap.example.de (10.0.0.5) 56(84) bytes of data.
64 bytes from ldap.example.de (10.0.0.5): icmp_seq=1 ttl=64 time=0.485 ms
64 bytes from ldap.example.de (10.0.0.5): icmp_seq=2 ttl=64 time=0.669 ms
64 bytes from ldap.example.de (10.0.0.5): icmp_seq=3 ttl=64 time=0.489 ms
ldapsearch -h ldap.example.de -b 'dc=example,dc=de' -x -w *master_password* 
ldap_bind: Invalid credentials (49)
ldapsearch -h ldap.example.de -b 'dc=example,dc=de' -x -y /var/lib/zentyal/conf/ebox-ldap.passwd 
Warning: Password file /var/lib/zentyal/conf/ebox-ldap.passwd is publicly readable/writeable
ldap_bind: Invalid credentials (49)

comment:13 Changed 22 months ago by cperez@…

Ok,

I think I have reproduced your problem, can you see something like this in your /var/log/syslog?

Aug 3 16:51:57 zentyal slapd[555]: syncrepl_message_to_entry: rid=110 mods check (objectClass: value #3 invalid per syntax)

comment:14 Changed 22 months ago by r.hoelzemer@…

Yes! After adding a new user on the master, I have many entries like this on the slaves.

Aug  3 17:07:28 gateway slapd[9021]: syncrepl_message_to_entry: rid=110 mods check (objectClass: value #3 invalid per syntax)
Aug  3 17:07:28 gateway slapd[9021]: do_syncrepl: rid=110 rc 21 retrying (4 retries left)
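To see how often a slave hits this error, the syslog can be filtered for the syncrepl message. The sketch below (syncrepl_failures is a hypothetical helper name) counts "mods check" failures per replica ID from log lines on standard input. The "rc 21" in the retry line is the standard LDAP result code for invalid attribute syntax, which matches the "invalid per syntax" wording of the first message.

```shell
#!/bin/sh
# Count syncrepl "mods check" failures per replica ID (rid) from syslog lines
# read on stdin. Lines that do not match the pattern are ignored.
syncrepl_failures() {
    sed -n 's/.*syncrepl_message_to_entry: rid=\([0-9]*\) mods check.*/\1/p' |
        sort | uniq -c |
        awk '{ printf "rid=%s failures=%s\n", $2, $1 }'
}
```

On a slave this could be run as syncrepl_failures < /var/log/syslog; no output means no such failures were logged.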

comment:15 Changed 22 months ago by cperez@…

Ok,

Now I know where the problem is: low-level replication is not working well, for sure something related to different schemas between master and slave.

I'm going to work on fixing it! Thank you for your patience and effort :)

Will keep this ticket updated

comment:16 Changed 22 months ago by r.hoelzemer@…

Ahhh, finally some progress!

Thank you as well for investigating and dealing with my never-ending poking sessions. :)

I am looking forward to a bugfix.

comment:17 Changed 22 months ago by cperez@…

  • Status changed from reopened to closed
  • Resolution set to fixed

(In [22630]) Include quota schema in slaves LDAP (fixes replication, closes #3082)

comment:18 Changed 22 months ago by cperez@…

Thank you very much!

This last commit truly fixes the problem. The quota schemas were moved to the users module (from samba), which caused all this mess.

You will need to apply the patch to the package and reinstall the slaves (remember /usr/share/zentyal-users/reinstall).

comment:19 Changed 22 months ago by cperez@…

I updated the package if you want to use mine:

 http://people.zentyal.org/~exekias/zentyal-users_2.1.7_all.deb

comment:20 Changed 22 months ago by r.hoelzemer@…

Fix confirmed!

Thanks again and have a nice day :)
