While testing an MQ + NETEM setup, I got confirmation that the default
timer migration setting (/proc/sys/kernel/timer_migration) is killing us.
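For reference, the current value can be read, and migration turned off
to check whether keeping timers on the arming cpu helps, with the usual
procfs/sysctl knobs (quick sketch):

cat /proc/sys/kernel/timer_migration      # 1 = migration enabled (default)
echo 0 > /proc/sys/kernel/timer_migration # keep timers on the arming cpu
# or: sysctl -w kernel.timer_migration=0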
Installing the following on the receiver side of a TCP_STREAM test
(the NIC has 8 TX queues):
EST="est 1sec 4sec"
for ETH in eth1
do
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
# one netem instance per mq class / TX queue, delays stepping from 6ms to 20ms
tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms
tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms
tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms
tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms
tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms
tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms
tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms
tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms
done
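Once this is in place, the per-queue netem instances and their
estimator output can be checked with something like:

tc -s qdisc show dev eth1   # per-queue backlog, drops and estimated rate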
We can see that timers get migrated onto a single cpu, presumably idle
at the time the timers are armed.
All qdisc dequeues then run from this one cpu and huge lock contention
follows. This single cpu is stuck in softirq processing and cannot
dequeue packets fast enough.
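One way to watch the pile-up while the test runs (rough sketch; mpstat
comes from the sysstat package and may not be installed):

# a single NET_TX column climbing much faster than the others points
# at the cpu doing all the netem dequeues
watch -d -n1 cat /proc/softirqs

# per-cpu softirq load; the stuck cpu shows up with %soft close to 100
mpstat -P ALL 1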