Lots of the GR topotests kill daemons in order to test code
that deals with crashing daemons. Under heavy system load
it was noticed that a kill command was sent and, if told to
wait, we would sleep 2 seconds, send another kill command and
call it good. This was causing issues when subsequent
json commands would get errors like `lost connection to daemon`
as the daemon finally shut down after some time due to load.
Modify the kill-the-daemon function to notice that the daemon
was not actually killed and, if we need to wait, keep waiting
until it has actually gone away.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
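
Note for reviewers: the retry behaviour described above can be sketched
in isolation roughly as follows. This is only an illustration, not the
topotest code itself. The helper name `kill_until_gone`, its `kill` and
`exists` callbacks, and the `max_tries` bound are all hypothetical and
are assumptions made for the sketch (the patched loop has no such bound).

```python
import time


def kill_until_gone(pids, kill, exists, delay=2.0, max_tries=30, sleep=time.sleep):
    """Send kill to every pid, then keep retrying until none still exist.

    kill(pid) and exists(pid) are caller-supplied callbacks (hypothetical
    stand-ins for `kill -9` and a pid-existence check).
    Returns the set of pids still present when giving up (empty on success).
    """
    remaining = set(pids)
    for pid in remaining:
        kill(pid)
    # Only pids that survived the first kill need the wait-and-retry path.
    remaining = {pid for pid in remaining if exists(pid)}

    tries = 0
    while remaining and tries < max_tries:
        sleep(delay)  # give a loaded system time to tear the process down
        for pid in remaining:
            kill(pid)
        remaining = {pid for pid in remaining if exists(pid)}
        tries += 1
    return remaining
```

The difference from a single `if` is that the caller only proceeds once
every pid is confirmed gone (or the assumed retry bound is hit), so later
commands do not race with a daemon that is still shutting down.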
self.cmd("kill -9 %s" % daemonpid)
if pid_exists(int(daemonpid)):
numRunning += 1
- if wait and numRunning > 0:
+ while wait and numRunning > 0:
sleep(
2,
"{}: waiting for {} daemon to be stopped".format(
)
)
self.cmd("kill -9 %s" % daemonpid)
- self.cmd("rm -- {}".format(d.rstrip()))
+ if daemonpid.isdigit() and not pid_exists(
+ int(daemonpid)
+ ):
+ numRunning -= 1
+ self.cmd("rm -- {}".format(d.rstrip()))
if wait:
errors = self.checkRouterCores(reportOnce=True)
if self.checkRouterVersion("<", minErrorVersion):