git.proxmox.com Git - mirror_ovs.git/commitdiff
ovs-lib: Handle daemon segfaults during exit.
author Gurucharan Shetty <guru@ovn.org>
Fri, 18 Sep 2020 21:43:32 +0000 (14:43 -0700)
committer Gurucharan Shetty <guru@ovn.org>
Tue, 22 Sep 2020 01:31:59 +0000 (18:31 -0700)
Currently, we terminate a daemon by trying
"ovs-appctl exit", "SIGTERM" and finally "SIGKILL".
But the logic fails if the daemon crashes (segfaults)
during "ovs-appctl exit". The monitor then automatically
restarts the daemon with a new pid. The current check for
the non-existence of the old pid succeeds, and we proceed
on the assumption that the daemon is dead.
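
The failure mode can be reproduced outside OVS with a small
simulation. This is only a sketch: pid_alive, the temporary pidfile
and the "sleep" stand-ins for the daemon are illustrative assumptions,
not code from ovs-lib.

```shell
#!/bin/sh
# Hypothetical simulation; none of these names come from ovs-lib.
pid_alive() { kill -0 "$1" 2>/dev/null; }

pidfile=$(mktemp)

# Stand-in for the running daemon.
sleep 30 &
old_pid=$!
echo "$old_pid" > "$pidfile"

# Simulate the crash: the old process dies, and a "monitor" starts a
# replacement that records its new pid in the same pidfile.
kill -9 "$old_pid"
wait "$old_pid" 2>/dev/null
sleep 30 &
new_pid=$!
echo "$new_pid" > "$pidfile"

# Checking only the old pid suggests the daemon is gone ...
if pid_alive "$old_pid"; then old_state=alive; else old_state=dead; fi
# ... while the pidfile still names a live process.
if pid_alive "$(cat "$pidfile")"; then file_state=alive; else file_state=dead; fi
echo "old pid: $old_state, pidfile pid: $file_state"

# Clean up the stand-in process and the temporary file.
kill "$new_pid"
wait "$new_pid" 2>/dev/null
rm -f "$pidfile"
```

In this simulation the old pid is reported as gone even though the
pidfile points at a running process, which is the situation the stop
logic misreads as success.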

This is a problem during OVS upgrades as we will continue
to run the older version of OVS.

With this commit, we take care of this situation. If there
is a segfault, the pidfile is not deleted. So, we wait two
seconds to give the monitor time to restart the daemon
(which is usually instantaneous) and then re-read the pidfile.
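
The re-check described above can be sketched as a standalone helper.
The function name resolve_live_pid, its arguments and the configurable
grace period are assumptions made for illustration; the actual change
is the inline logic in stop_daemon shown in the diff.

```shell
#!/bin/sh
# Sketch only: resolve_live_pid and pid_alive are hypothetical helpers,
# not functions provided by ovs-lib.
pid_alive() { kill -0 "$1" 2>/dev/null; }

# Usage: resolve_live_pid OLD_PID PIDFILE [GRACE_SECONDS]
# Prints the pid that still has to be stopped and returns 0, or
# returns 1 when the daemon appears to be gone.
resolve_live_pid() {
    old_pid=$1
    pidfile=$2
    grace=${3:-2}

    # The original process is still around: nothing new to learn.
    if pid_alive "$old_pid"; then
        echo "$old_pid"
        return 0
    fi

    # Per the commit message, a crash leaves the pidfile behind. If the
    # file is gone, treat the daemon as stopped.
    [ -e "$pidfile" ] || return 1

    # Give a monitor time to restart the daemon, then re-read the file.
    sleep "$grace"
    new_pid=$(cat "$pidfile" 2>/dev/null)
    if [ -n "$new_pid" ] && pid_alive "$new_pid"; then
        echo "$new_pid"
        return 0
    fi
    return 1
}
```

A caller could use it as
pid=$(resolve_live_pid "$pid" "$rundir/$1.pid") || return 0
before moving on to the next stop action.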

VMware-BZ: #2633995
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
utilities/ovs-lib.in

index d646b444a40c46bed2d5d844a199a3b1fe3c3973..f7e97567406ab8554cb3a96cb758fad681f22575 100644
@@ -255,20 +255,36 @@ stop_daemon () {
             if version_geq "$version" "2.5.90"; then
                 actions="$graceful $actions"
             fi
+            actiontype=""
             for action in $actions; do
                 if pid_exists "$pid" >/dev/null 2>&1; then :; else
-                    return 0
+                    # pid does not exist.
+                    if [ -n "$actiontype" ]; then
+                        return 0
+                    fi
+                    # But, does the file exist? We may have had a daemon
+                    # segfault with `ovs-appctl exit`. Check one more time
+                    # before deciding that the daemon is dead.
+                    [ -e "$rundir/$1.pid" ] && sleep 2 && pid=`cat "$rundir/$1.pid"` 2>/dev/null
+                    if pid_exists "$pid" >/dev/null 2>&1; then :; else
+                        return 0
+                    fi
                 fi
                 case $action in
                     EXIT)
                         action "Exiting $1 ($pid)" \
                             ${bindir}/ovs-appctl -T 1 -t $rundir/$1.$pid.ctl exit $2
+                        # The above command could have resulted in delayed
+                        # daemon segfault. And if a monitor is running, it
+                        # would restart the daemon giving it a new pid.
                         ;;
                     TERM)
                         action "Killing $1 ($pid)" kill $pid
+                        actiontype="force"
                         ;;
                     KILL)
                         action "Killing $1 ($pid) with SIGKILL" kill -9 $pid
+                        actiontype="force"
                         ;;
                     FAIL)
                         log_failure_msg "Killing $1 ($pid) failed"