Ticket #4603 (closed defect: fixed)
Enabling network module causes shutdown to hang
| Reported by: | me@… | Owned by: | jamor@… |
|---|---|---|---|
| Milestone: | 3.0 | Component: | network |
| Severity: | normal | Keywords: | |
| Cc: |
Description
Under certain conditions a Zentyal server will fail to finish halting or rebooting. I've done quite a bit of troubleshooting to try and narrow it down. If I do "aptitude install zentyal-all" I can reproduce the problem and then I can fix it by doing "aptitude purge ~nzentyal". If I install just the Zentyal packages I want through the web admin and do the wizard configuration, reboot/halt works just fine. If, however, I enable the network module, reboot/halt hangs. If I then disable the network module, reboot/halt works again.
This was all done on a fresh install of Ubuntu Server 12.04 with the Zentyal 2.3 repositories added, running "aptitude install zentyal" and then installing the following components: antivirus, bandwidth monitor, file sharing and domain services, firewall, intrusion detection system, layer-7 filter, monitor, network configuration, printer sharing service, traffic shaping, users and groups, and VPN service. The machine is an HP 2133 Mini-note with a Sabrent USB-G1000 USB ethernet adapter (ASIX AX88178 chipset).
Attachments
Change History
comment:2 Changed 12 months ago by me@…
Is there any documentation available about getting debugging information out of Zentyal itself? Can anyone give me any information about how to trace the module stopping/shutdown process so I can figure out where, more specifically, the freeze is happening?
comment:3 Changed 12 months ago by me@…
Manually shutting down zentyal with service zentyal stop (without shutting down the host) also reproduces a console hang.
comment:4 Changed 12 months ago by me@…
Doing ifdown for all interfaces works without any hang. If I reproduce the hang with service zentyal stop I see the following processes:
root 5580 0.0 0.1 13884 3340 tty1 S+ 13:56 0:00 sudo service zentyal stop
ebox 5581 4.5 3.8 90480 68620 tty1 S+ 13:56 0:05 /usr/bin/perl /etc/init.d/zentyal stop
ebox 6111 0.0 0.0 2216 284 tty1 S+ 13:57 0:00 sh -c /usr/bin/sudo -p sudo: /var/lib/zentyal/tmp/M4mgm0hjwk.cmd 2> /var/lib/zentyal/tmp/stderr
root 6112 0.0 0.1 13880 3332 tty1 S+ 13:57 0:00 /usr/bin/sudo -p sudo: /var/lib/zentyal/tmp/M4mgm0hjwk.cmd
root 6113 0.0 0.0 2216 512 tty1 S+ 13:57 0:00 sh /var/lib/zentyal/tmp/M4mgm0hjwk.cmd
root 6148 0.0 0.0 2160 512 tty1 S 13:57 0:00 /sbin/ifdown --force -i /etc/network/interfaces eth0
root 6168 0.0 0.0 2216 284 tty1 S 13:57 0:00 /bin/sh -c dhclient3 -r -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root 6169 0.0 0.0 2908 1160 tty1 S 13:57 0:00 dhclient3 -r -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root 6170 0.0 0.0 3440 1484 tty1 S 13:57 0:00 /bin/bash /sbin/dhclient-script
ebox 6205 2.7 2.5 55748 45160 tty1 S 13:57 0:02 /usr/bin/perl /usr/share/zentyal-network/dhcp-clear.pl eth0
If I kill the dhcp-clear.pl eth0 process, the original console is still hung. If I look at the processes again, I see:
root 5580 0.0 0.1 13884 3340 tty1 S+ 13:56 0:00 sudo service zentyal stop
ebox 5581 3.0 3.8 90480 68620 tty1 S+ 13:56 0:05 /usr/bin/perl /etc/init.d/zentyal stop
ebox 6111 0.0 0.0 2216 284 tty1 S+ 13:57 0:00 sh -c /usr/bin/sudo -p sudo: /var/lib/zentyal/tmp/M4mgm0hjwk.cmd 2> /var/lib/zentyal/tmp/stderr
root 6112 0.0 0.1 13880 3332 tty1 S+ 13:57 0:00 /usr/bin/sudo -p sudo: /var/lib/zentyal/tmp/M4mgm0hjwk.cmd
root 6113 0.0 0.0 2216 512 tty1 S+ 13:57 0:00 sh /var/lib/zentyal/tmp/M4mgm0hjwk.cmd
root 6148 0.0 0.0 2160 512 tty1 S 13:57 0:00 /sbin/ifdown --force -i /etc/network/interfaces eth0
root 6168 0.0 0.0 2216 284 tty1 S 13:57 0:00 /bin/sh -c dhclient3 -r -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root 6169 0.0 0.0 2908 1160 tty1 S 13:57 0:00 dhclient3 -r -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root 6170 0.0 0.0 3440 1500 tty1 S 13:57 0:00 /bin/bash /sbin/dhclient-script
ebox 6463 10.1 2.6 57996 47368 tty1 S 13:59 0:03 /usr/bin/perl /usr/share/zentyal-firewall/dhcp-firewall.pl eth0
Then if I kill the dhcp-firewall.pl eth0 process, the console is no longer hung and the service zentyal stop command exits.
Changed 12 months ago by me@…
- attachment dhcp-clear.strace added
strace /usr/share/zentyal-network/dhcp-clear.pl eth0
comment:6 Changed 12 months ago by me@…
I can reproduce the problem by running /usr/share/zentyal-network/dhcp-clear.pl eth0 so I ran it through strace and it looks like it's hanging trying to get the redis lock:
...
open("/run/shm/zentyal/redis.lock", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 10
ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf949328) = -1 ENOTTY (Inappropriate ioctl for device)
_llseek(10, 0, [0], SEEK_CUR) = 0
fstat64(10, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
flock(10, LOCK_EX
Might something not be releasing that lock?
I also attached a more complete strace in case it's helpful.
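A non-blocking probe is one way to check whether something still holds the lock without hanging the way the traced call does. The following is a minimal Python sketch, not Zentyal code: it assumes the lock is a plain flock(2) lock, as the strace suggests, and the helper name lock_is_held is hypothetical.

```python
import fcntl
import os


def lock_is_held(path):
    """Return True if another open file description holds an flock on path.

    Hypothetical helper for illustration; not part of Zentyal.
    """
    try:
        # Read-only open: probing should neither create nor truncate the file.
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return False
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # The lock was free and is now ours; release it again.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

On the affected machine this could be called as lock_is_held("/run/shm/zentyal/redis.lock") by a user allowed to read that file. It only reports whether the lock is taken, not who holds it.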
comment:7 Changed 12 months ago by me@…
Here are the processes that have that lock file open:
root:/usr/share/zentyal-network# lsof /run/shm/zentyal/redis.lock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
zentyal 782 ebox 9wW REG 0,27 0 9718 /run/shm/zentyal/redis.lock
dhcp-clea 1227 ebox 8w REG 0,27 0 9718 /run/shm/zentyal/redis.lock
manage-lo 1256 ebox 7w REG 0,27 0 9718 /run/shm/zentyal/redis.lock
Looks like zentyal itself has the write lock, specifically /usr/bin/perl /etc/init.d/zentyal stop, the initial stop command itself.
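The pattern this output points to is a parent that holds the lock while waiting for a child that needs the same lock. Below is a small Python sketch of that pattern under stated assumptions: it uses flock(2) on a temporary file, a timeout stands in for the indefinite hang, and the function name and CHILD_CODE are hypothetical. It does not reproduce Zentyal's actual code paths.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child: open the lock file and block on an exclusive flock,
# much like the hung helper scripts in the process listing.
CHILD_CODE = (
    "import fcntl, os, sys\n"
    "fd = os.open(sys.argv[1], os.O_WRONLY)\n"
    "fcntl.flock(fd, fcntl.LOCK_EX)\n"
)


def child_hangs_while_parent_holds_lock(timeout=2.0):
    """Return True if a child blocks on a lock that its waiting parent holds."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.lock")
        parent_fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o666)
        fcntl.flock(parent_fd, fcntl.LOCK_EX)  # parent takes the lock first
        try:
            # The parent now waits for the child, which waits for the lock.
            subprocess.run([sys.executable, "-c", CHILD_CODE, path],
                           timeout=timeout, check=True)
            return False  # child got the lock and exited: no deadlock
        except subprocess.TimeoutExpired:
            return True   # child was still blocked when the timeout expired
        finally:
            os.close(parent_fd)
```

Without the timeout, neither process could make progress until the child is killed from outside, which matches what was observed when the helper scripts were killed by hand.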
comment:8 Changed 12 months ago by jamor@…
Hello,
We will look into this in the next few days, but if your system is frozen you could try removing the redis.lock file by hand.
comment:9 Changed 12 months ago by me@…
Removing the redis.lock file doesn't un-hang service zentyal stop
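A likely explanation, assuming the lock is taken with flock(2) as the strace in comment 6 suggests: the lock belongs to the open file, not to the path, so unlinking the path releases nothing for processes that already have the file open. The Python sketch below illustrates this on a scratch file; the function name is hypothetical and nothing here is Zentyal code.

```python
import fcntl
import os


def unlink_does_not_release(path):
    """Show that unlinking a lock file leaves an existing flock in place.

    Returns (old_waiter_blocked, new_file_lockable).
    """
    holder = os.open(path, os.O_WRONLY | os.O_CREAT, 0o666)
    fcntl.flock(holder, fcntl.LOCK_EX)   # plays the process holding the lock
    waiter = os.open(path, os.O_WRONLY)  # plays a helper that is already waiting

    os.unlink(path)                      # remove the lock file by hand

    try:
        fcntl.flock(waiter, fcntl.LOCK_EX | fcntl.LOCK_NB)
        old_waiter_blocked = False
    except BlockingIOError:
        old_waiter_blocked = True        # same inode, so the lock is still held

    # A process opening the path afterwards creates a brand-new inode.
    fresh = os.open(path, os.O_WRONLY | os.O_CREAT, 0o666)
    try:
        fcntl.flock(fresh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        new_file_lockable = True
    except BlockingIOError:
        new_file_lockable = False

    for fd in (holder, waiter, fresh):
        os.close(fd)
    os.unlink(path)
    return old_waiter_blocked, new_file_lockable
```

Under these assumptions, removing the file can only help processes that open it afterwards; anything already blocked on the old file stays blocked until the holder releases the lock or exits.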
comment:10 Changed 12 months ago by jamor@…
- Status changed from accepted to closed
- Resolution set to fixed
Fixed in [f3197f5]
If you want to hotfix your system:
- Download the new version of Init.pm from http://git.zentyal.org/zentyal.git/blob_plain/f3197f58684718c75e4a6fc400110a8ad942ed79:/main/core/src/EBox/Util/Init.pm
- Use it to replace /usr/share/perl5/EBox/Util/Init.pm
- Execute "sudo /etc/init.d/zentyal apache restart"
Thanks for taking time to report this issue.
Javier
comment:11 Changed 12 months ago by jamor@…
By the way, the key that enables debug information in Zentyal is located in /etc/zentyal/zentyal.conf, but it is already enabled in beta installations.
comment:12 Changed 12 months ago by me@…
- Status changed from closed to reopened
- Resolution fixed deleted
I applied the hotfix as described and did sudo /etc/init.d/zentyal apache restart and then did a Ctrl-Alt-Del but it still hung. Then I did a forced reboot, let it boot back up fully just in case the restart command didn't reload the fix and then tried another Ctrl-Alt-Del but it still hung. So it appears this doesn't fix it.
I've also double confirmed that my Init.pm is the one specified:
root:/tmp# wget http://git.zentyal.org/zentyal.git/blob_plain/f3197f58684718c75e4a6fc400110a8ad942ed79:/main/core/src/EBox/Util/Init.pm
--2012-06-25 11:03:42-- http://git.zentyal.org/zentyal.git/blob_plain/f3197f58684718c75e4a6fc400110a8ad942ed79:/main/core/src/EBox/Util/Init.pm
Resolving git.zentyal.org (git.zentyal.org)... 217.70.188.63
Connecting to git.zentyal.org (git.zentyal.org)|217.70.188.63|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/x-perl]
Saving to: `Init.pm'
[ <=> ] 5,020 28.3K/s in 0.2s
2012-06-25 11:03:44 (28.3 KB/s) - `Init.pm' saved [5020]
root:/tmp# diff Init.pm /usr/share/perl5/EBox/Util/Init.pm
root:/tmp#
I also manually inspected that the changes in that commit are indeed reflected in my Init.pm.
comment:13 Changed 12 months ago by jamor@…
Can you run 'sudo /etc/init.d/zentyal network stop'? If this command works, the problem is somewhere else.
comment:14 Changed 12 months ago by me@…
Did you see comment 3? Isn't service zentyal stop the same thing?
comment:15 Changed 12 months ago by jamor@…
- Status changed from reopened to accepted
Not exactly. 'service zentyal stop' is equivalent to '/etc/init.d/zentyal stop', which stops all Zentyal services. '/etc/init.d/zentyal network stop' only stops the network module.
I found that it was the network service that hung, so I could reproduce it more easily with just a network stop. I am asking you to try the network stop because if it succeeds, it could mean that you have a different issue on your server.
comment:16 Changed 12 months ago by me@…
Sorry, misread. I just did /etc/init.d/zentyal network stop and it does indeed reproduce the problem. BTW, I'm now using debs built from git master (as of yesterday afternoon) for all my installed zentyal packages so this is still a problem.
comment:17 follow-up: ↓ 18 Changed 12 months ago by jamor@…
The branch with the fix has not yet been merged into the master branch, so no wonder it still has the problem. Could you try the hotfix?
comment:18 in reply to: ↑ 17 Changed 12 months ago by me@…
Replying to jamor@…:
The branch with the fix has not yet been merged into the master branch, so no wonder it still has the problem. Could you try the hotfix?
comment:19 Changed 12 months ago by jamor@…
Hello R. Patterson,
The patch solved the issue for me. Maybe you _additionally_ had a lock problem. Could you remove the file '/run/shm/zentyal/redis.lock' and then do the network stop _with_ the patch?
If it hangs again, could you attach the file '/var/log/zentyal/zentyal.log' and the output of 'ps auxwww'?
Regards,
Javier
comment:20 follow-up: ↓ 21 Changed 12 months ago by jamor@…
- Status changed from accepted to closed
- Resolution set to fixed
R.Patterson,
I have checked with another system that the fix works.
I think the problem is that I made a typo in comment 10. The correct path to replace is /usr/share/perl5/EBox/Util/Init.pm and not /usr/share/perl5/EBox/UtilInit.pm. I have already fixed the comment.
Regards,
Javier
comment:21 in reply to: ↑ 20 Changed 11 months ago by me@…
Replying to jamor@…:
I think the problem is that I made a typo in comment 10. The correct path to replace is /usr/share/perl5/EBox/Util/Init.pm and not /usr/share/perl5/EBox/UtilInit.pm. I have already fixed the comment.
Actually, if you look at my comment:12, I had already corrected for that.
comment:22 Changed 11 months ago by me@…
This is working for me with the latest releases.