Ticket #4603 (closed defect: fixed)

Opened 12 months ago

Last modified 11 months ago

Enabling network module causes shutdown to hang

Reported by: me@…
Owned by: jamor@…
Milestone: 3.0
Component: network
Severity: normal
Keywords:
Cc:

Description

Under certain conditions a Zentyal server will fail to finish halting or rebooting. I've done quite a bit of troubleshooting to try and narrow it down. If I do "aptitude install zentyal-all" I can reproduce the problem and then I can fix it by doing "aptitude purge ~nzentyal". If I install just the Zentyal packages I want through the web admin and do the wizard configuration, reboot/halt works just fine. If, however, I enable the network module, reboot/halt hangs. If I then disable the network module, reboot/halt works again.

This was all done on a fresh install of Ubuntu Server 12.04 with the Zentyal 2.3 repositories added, running "aptitude install zentyal" and then installing the following components: antivirus, bandwidth monitor, file sharing and domain services, firewall, intrusion detection system, layer-7 filter, monitor, network configuration, printer sharing service, traffic shaping, users and groups, and VPN service. The machine is an HP 2133 Mini-note with a Sabrent USB-G1000 USB ethernet adapter (ASIX AX88178 chipset).

Attachments

dhcp-clear.strace (250.6 KB) - added by me@… 12 months ago.
strace /usr/share/zentyal-network/dhcp-clear.pl eth0
redis-lock.lsof (368 bytes) - added by me@… 12 months ago.
lsof /run/shm/zentyal/redis.lock

Change History

comment:1 Changed 12 months ago by jamor@…

  • Status changed from new to accepted

comment:2 Changed 12 months ago by me@…

Is there any documentation available about getting debugging information out of Zentyal itself? Can anyone give me any information about how to trace the module stopping/shutdown process so I can figure out where, more specifically, the freeze is happening?

comment:3 Changed 12 months ago by me@…

Manually shutting down zentyal with service zentyal stop (without shutting down the host) also reproduces a console hang.

comment:4 Changed 12 months ago by me@…

Doing ifdown for all interfaces works without any hang. If I reproduce the hang with service zentyal stop I see the following processes:

root      5580  0.0  0.1  13884  3340 tty1     S+   13:56   0:00 sudo service zentyal stop
ebox      5581  4.5  3.8  90480 68620 tty1     S+   13:56   0:05 /usr/bin/perl /etc/init.d/zentyal stop
ebox      6111  0.0  0.0   2216   284 tty1     S+   13:57   0:00 sh -c /usr/bin/sudo -p sudo: /var/lib/zentyal/tmp/M4mgm0hjwk.cmd 2> /var/lib/zentyal/tmp/stderr
root      6112  0.0  0.1  13880  3332 tty1     S+   13:57   0:00 /usr/bin/sudo -p sudo: /var/lib/zentyal/tmp/M4mgm0hjwk.cmd
root      6113  0.0  0.0   2216   512 tty1     S+   13:57   0:00 sh /var/lib/zentyal/tmp/M4mgm0hjwk.cmd
root      6148  0.0  0.0   2160   512 tty1     S    13:57   0:00 /sbin/ifdown --force -i /etc/network/interfaces eth0
root      6168  0.0  0.0   2216   284 tty1     S    13:57   0:00 /bin/sh -c dhclient3 -r -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root      6169  0.0  0.0   2908  1160 tty1     S    13:57   0:00 dhclient3 -r -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root      6170  0.0  0.0   3440  1484 tty1     S    13:57   0:00 /bin/bash /sbin/dhclient-script
ebox      6205  2.7  2.5  55748 45160 tty1     S    13:57   0:02 /usr/bin/perl /usr/share/zentyal-network/dhcp-clear.pl eth0

If I kill the dhcp-clear.pl eth0 process, the original console is still hung. If I look at the processes again, I see:

root      5580  0.0  0.1  13884  3340 tty1     S+   13:56   0:00 sudo service zentyal stop
ebox      5581  3.0  3.8  90480 68620 tty1     S+   13:56   0:05 /usr/bin/perl /etc/init.d/zentyal stop
ebox      6111  0.0  0.0   2216   284 tty1     S+   13:57   0:00 sh -c /usr/bin/sudo -p sudo: /var/lib/zentyal/tmp/M4mgm0hjwk.cmd 2> /var/lib/zentyal/tmp/stderr
root      6112  0.0  0.1  13880  3332 tty1     S+   13:57   0:00 /usr/bin/sudo -p sudo: /var/lib/zentyal/tmp/M4mgm0hjwk.cmd
root      6113  0.0  0.0   2216   512 tty1     S+   13:57   0:00 sh /var/lib/zentyal/tmp/M4mgm0hjwk.cmd
root      6148  0.0  0.0   2160   512 tty1     S    13:57   0:00 /sbin/ifdown --force -i /etc/network/interfaces eth0
root      6168  0.0  0.0   2216   284 tty1     S    13:57   0:00 /bin/sh -c dhclient3 -r -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root      6169  0.0  0.0   2908  1160 tty1     S    13:57   0:00 dhclient3 -r -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root      6170  0.0  0.0   3440  1500 tty1     S    13:57   0:00 /bin/bash /sbin/dhclient-script
ebox      6463 10.1  2.6  57996 47368 tty1     S    13:59   0:03 /usr/bin/perl /usr/share/zentyal-firewall/dhcp-firewall.pl eth0

Then if I kill the dhcp-firewall.pl eth0 process, the console is no longer hung and the service zentyal stop command exits.
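
The kill-one-then-the-next procedure above amounts to walking down the process chain to whichever child is currently blocking. A minimal sketch of that walk follows. The PIDs are the ones in the first listing, but the parent links are an assumption inferred from the way the commands nest, since the ps output shown has no PPID column.

```python
# Parent of each PID. The PIDs come from the first listing above; the
# parent links are ASSUMED from the command nesting, not taken from ps.
PARENT_OF = {
    5581: 5580,  # /etc/init.d/zentyal stop   <- sudo service zentyal stop
    6111: 5581,  # sh -c /usr/bin/sudo ...    <- init script
    6112: 6111,  # sudo ... M4mgm0hjwk.cmd
    6113: 6112,  # sh M4mgm0hjwk.cmd
    6148: 6113,  # ifdown --force ... eth0
    6168: 6148,  # sh -c dhclient3 -r ...
    6169: 6168,  # dhclient3 -r ...
    6170: 6169,  # dhclient-script
    6205: 6170,  # dhcp-clear.pl eth0
}


def deepest_descendant(parent_of, root):
    """Follow the chain of children from root and return the last PID."""
    children = {}
    for pid, ppid in parent_of.items():
        children.setdefault(ppid, []).append(pid)
    pid = root
    while children.get(pid):
        # If a process has several children, follow the newest one
        # (highest PID). This is a simplification for the sketch.
        pid = max(children[pid])
    return pid


blocking_pid = deepest_descendant(PARENT_OF, 5580)
```

On a live system the same mapping could be filled in from `ps -eo pid,ppid` instead of being typed in by hand.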

comment:5 Changed 12 months ago by me@…

I removed the DHCP module but the problem still persists.

Changed 12 months ago by me@…

strace /usr/share/zentyal-network/dhcp-clear.pl eth0

comment:6 Changed 12 months ago by me@…

I can reproduce the problem by running /usr/share/zentyal-network/dhcp-clear.pl eth0 so I ran it through strace and it looks like it's hanging trying to get the redis lock:

...
open("/run/shm/zentyal/redis.lock", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 10
ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf949328) = -1 ENOTTY (Inappropriate ioctl for device)
_llseek(10, 0, [0], SEEK_CUR)           = 0
fstat64(10, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fcntl64(10, F_SETFD, FD_CLOEXEC)        = 0
flock(10, LOCK_EX 

Might something not be releasing that lock?

I also attached a more complete strace in case it's helpful.
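
For reference, the blocking flock(10, LOCK_EX call at the end of the trace can be reproduced in isolation. This is only a sketch of the general flock behaviour, not of Zentyal's code: it uses a scratch file (a hypothetical stand-in for redis.lock) and adds LOCK_NB so that the second attempt fails immediately instead of hanging the way dhcp-clear.pl does.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open path with the same flags as in the trace and try LOCK_EX.

    LOCK_NB is added so the call returns instead of blocking. Returns the
    file descriptor on success, or None if the lock is already held
    through another open of the same file.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


# Scratch file standing in for redis.lock (hypothetical path).
lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")

holder = try_exclusive_lock(lock_path)     # first opener gets the lock
contender = try_exclusive_lock(lock_path)  # second opener is refused

fcntl.flock(holder, fcntl.LOCK_UN)         # holder releases the lock
retry = try_exclusive_lock(lock_path)      # now it can be acquired
```

Without LOCK_NB the second call would simply wait until the holder unlocks or closes its descriptor, which matches a trace that stops in the middle of the flock line.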

Changed 12 months ago by me@…

lsof /run/shm/zentyal/redis.lock

comment:7 Changed 12 months ago by me@…

Here are the processes that have that lock file open:

root:/usr/share/zentyal-network# lsof /run/shm/zentyal/redis.lock 
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
zentyal    782 ebox    9wW  REG   0,27        0 9718 /run/shm/zentyal/redis.lock
dhcp-clea 1227 ebox    8w   REG   0,27        0 9718 /run/shm/zentyal/redis.lock
manage-lo 1256 ebox    7w   REG   0,27        0 9718 /run/shm/zentyal/redis.lock

Looks like zentyal itself has the write lock, specifically /usr/bin/perl /etc/init.d/zentyal stop the initial stop command itself.
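
The reading above relies on the FD column: per the lsof manual, the character after the access mode is the lock status, and W means a write lock on the entire file. A small sketch that picks the holder out of the output pasted above (the function name is made up for illustration):

```python
import re

# lsof output copied from this comment.
LSOF_OUTPUT = """\
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
zentyal    782 ebox    9wW  REG   0,27        0 9718 /run/shm/zentyal/redis.lock
dhcp-clea 1227 ebox    8w   REG   0,27        0 9718 /run/shm/zentyal/redis.lock
manage-lo 1256 ebox    7w   REG   0,27        0 9718 /run/shm/zentyal/redis.lock
"""

# FD column: descriptor number, optional access mode, optional lock char.
FD_PATTERN = re.compile(r"(\d+)([rwu-]?)([NrRwWuUxX]?)")


def whole_file_write_lockers(lsof_text):
    """Return (command, pid) for every row whose lock character is 'W'."""
    holders = []
    for line in lsof_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        match = FD_PATTERN.fullmatch(fields[3])
        if match and match.group(3) == "W":
            holders.append((fields[0], int(fields[1])))
    return holders


holders = whole_file_write_lockers(LSOF_OUTPUT)
```

The other two rows (8w and 7w) have the file open for writing but carry no lock character, so they are waiting for the lock rather than holding it.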

comment:8 Changed 12 months ago by jamor@…

Hello,

We will look into this in the next few days, but if your system is frozen you could try removing the redis.lock file by hand.

comment:9 Changed 12 months ago by me@…

Removing the redis.lock file doesn't un-hang service zentyal stop
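
One plausible explanation, assuming the lock is a plain flock as the strace suggests: a flock belongs to the open file, not to its name, so deleting the name does not release it, and anything that had already opened the old file is still waiting on it. A sketch with a scratch file (hypothetical path, not the real redis.lock):

```python
import fcntl
import os
import tempfile

# Scratch file standing in for redis.lock (hypothetical path).
lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")


def can_lock(fd):
    """Return True if a non-blocking exclusive flock on fd succeeds."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


# Two independent opens of the same file: one holder, one waiter.
holder_fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT, 0o666)
waiter_fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT, 0o666)
holder_has_lock = can_lock(holder_fd)

# Delete the lock file by hand while the lock is held.
os.unlink(lock_path)

# The waiter still refers to the old, locked file, so it is still refused.
waiter_after_unlink = can_lock(waiter_fd)

# Only a fresh open creates a new, unlocked file under the same name.
newcomer_fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT, 0o666)
newcomer_gets_lock = can_lock(newcomer_fd)
```

So removing the file could let a newly started process through, but it would not wake up a process that is already blocked.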

comment:10 Changed 12 months ago by jamor@…

  • Status changed from accepted to closed
  • Resolution set to fixed

Fixed in [f3197f5]

If you want to hotfix your system:

  1. Download the new version of Init.pm from  http://git.zentyal.org/zentyal.git/blob_plain/f3197f58684718c75e4a6fc400110a8ad942ed79:/main/core/src/EBox/Util/Init.pm
  2. Use it to replace /usr/share/perl5/EBox/Util/Init.pm
  3. Execute "sudo /etc/init.d/zentyal apache restart"
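
Step 2 can be done a little more carefully by keeping a copy of the shipped file and checking that the installed one is byte-identical to the download. A minimal sketch, in which the helper name and the ".orig" backup suffix are assumptions chosen for illustration:

```python
import filecmp
import shutil
from pathlib import Path


def replace_with_backup(new_file, target):
    """Copy new_file over target, keeping the old target as target + '.orig'.

    Returns the backup path. Raises RuntimeError if the installed copy is
    not byte-identical to new_file.
    """
    new_file, target = Path(new_file), Path(target)
    backup = target.with_name(target.name + ".orig")  # assumed suffix
    shutil.copy2(target, backup)    # keep the shipped version
    shutil.copy2(new_file, target)  # install the downloaded version
    if not filecmp.cmp(new_file, target, shallow=False):
        raise RuntimeError(f"{target} does not match {new_file}")
    return backup
```

Run as root after the download in step 1, this would be called as replace_with_backup("Init.pm", "/usr/share/perl5/EBox/Util/Init.pm"), followed by the restart in step 3.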

Thanks for taking time to report this issue.

Javier

Last edited 12 months ago by jamor@…

comment:11 Changed 12 months ago by jamor@…

By the way, the key that enables debug information in Zentyal is located in /etc/zentyal/zentyal.conf, but it is already enabled in beta installations.

comment:12 Changed 12 months ago by me@…

  • Status changed from closed to reopened
  • Resolution fixed deleted

I applied the hotfix as described and did sudo /etc/init.d/zentyal apache restart and then did a Ctrl-Alt-Del but it still hung. Then I did a forced reboot, let it boot back up fully just in case the restart command didn't reload the fix and then tried another Ctrl-Alt-Del but it still hung. So it appears this doesn't fix it.

I've also double confirmed that my Init.pm is the one specified:

root:/tmp# wget http://git.zentyal.org/zentyal.git/blob_plain/f3197f58684718c75e4a6fc400110a8ad942ed79:/main/core/src/EBox/Util/Init.pm
--2012-06-25 11:03:42--  http://git.zentyal.org/zentyal.git/blob_plain/f3197f58684718c75e4a6fc400110a8ad942ed79:/main/core/src/EBox/Util/Init.pm
Resolving git.zentyal.org (git.zentyal.org)... 217.70.188.63
Connecting to git.zentyal.org (git.zentyal.org)|217.70.188.63|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/x-perl]
Saving to: `Init.pm'

    [ <=>                                                                                                                             ] 5,020       28.3K/s   in 0.2s    

2012-06-25 11:03:44 (28.3 KB/s) - `Init.pm' saved [5020]

root:/tmp# diff Init.pm /usr/share/perl5/EBox/Util/Init.pm 
root:/tmp# 

I also manually inspected that the changes in that commit are indeed reflected in my Init.pm.

comment:13 Changed 12 months ago by jamor@…

Can you run 'sudo /etc/init.d/zentyal network stop'? If this command works, the problem is somewhere else.

comment:14 Changed 12 months ago by me@…

Did you see comment 3? Isn't service zentyal stop the same thing?

comment:15 Changed 12 months ago by jamor@…

  • Status changed from reopened to accepted

Not exactly. 'service zentyal stop' is equivalent to '/etc/init.d/zentyal stop', which stops all Zentyal services. '/etc/init.d/zentyal network stop' only stops the network module.

I found that it was the network service that hung, so I could reproduce the problem more easily with just a network stop. I am asking you to try the network stop because, if it succeeds, it could mean that you have a different issue on your server.

comment:16 Changed 12 months ago by me@…

Sorry, misread. I just did /etc/init.d/zentyal network stop and it does indeed reproduce the problem. BTW, I'm now using debs built from git master (as of yesterday afternoon) for all my installed zentyal packages so this is still a problem.

comment:17 follow-up: ↓ 18 Changed 12 months ago by jamor@…

The branch with the fix has not yet been merged into the master branch, so no wonder it still has the problem. Could you try the hotfix?

comment:18 in reply to: ↑ 17 Changed 12 months ago by me@…

Replying to jamor@…:

The branch with the fix has not yet been merged into the master branch, so no wonder it still has the problem. Could you try the hotfix?

I already did that.

comment:19 Changed 12 months ago by jamor@…

Hello R. Patterson,

The patch solved the issue for me. Maybe you _additionally_ had a lock problem. Could you remove the file '/run/shm/zentyal/redis.lock' and then do the network stop _with_ the patch?

If it hangs again, could you attach the file '/var/log/zentyal/zentyal.log' and the output of 'ps auxwww'?

Regards,

Javier

comment:20 follow-up: ↓ 21 Changed 12 months ago by jamor@…

  • Status changed from accepted to closed
  • Resolution set to fixed

R.Patterson,

I have checked with another system that the fix works.

I think the problem is that I made a typo in comment 10. The correct path to replace is /usr/share/perl5/EBox/Util/Init.pm and not /usr/share/perl5/EBox/UtilInit.pm. I have already fixed the comment.

Regards,

Javier

comment:21 in reply to: ↑ 20 Changed 11 months ago by me@…

Replying to jamor@…:

I think the problem is that I made a typo in comment 10. The correct path to replace is /usr/share/perl5/EBox/Util/Init.pm and not /usr/share/perl5/EBox/UtilInit.pm. I have already fixed the comment.

Actually, if you look at my comment:12, I had already corrected for that earlier.

comment:22 Changed 11 months ago by me@…

This is working for me with the latest releases.
