Ferenc Wágner [Sat, 27 Jan 2018 22:22:45 +0000 (23:22 +0100)]
build: sanitize LDFLAGS handling
The Automake manual states that according to the GNU Coding Standards
the so-called "Variables reserved for the user" mustn't be changed by
the build system, and they must override the default settings from the
build system. The "Flag Variables Ordering" section provides the
recipes on which we build here.
Using $lt_prog_compiler_pic directly shouldn't be necessary, so let's try
leaving it out.
Use LD_LIBRARY_PATH to find the build tree artifacts and disable RPATH across the project
The libtool wrapper scripts of the test binaries set LD_LIBRARY_PATH to
find the libknet library and its modules in the build tree, but RPATH
is searched before LD_LIBRARY_PATH and so overrides this setting. This
is why RPATH is deprecated, so we switch to RUNPATH (which is searched
after LD_LIBRARY_PATH) by using --enable-new-dtags, which is already
the linker default on Debian, for example.
Based on the original patch from Ferenc.
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
[PMTUd] drop (now) unnecessary and dangerous usleep
Before all threads were able to notify PMTUd of EMSGSIZE errors,
we had this arbitrary usleep in there to allow time to collect data.
It worked at the time, but it's a bad idea.
On super large clusters (>66 nodes) with 4 links on each node, when
applying heavy load (cpghum on all nodes at once), the average latency
between nodes can increase so much that the PMTUd thread usleep
could literally block corosync for seconds at a time.
Drop the usleep and live happily ever after
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
[PMTUd] if any thread receives an EMSGSIZE outside of a PMTUd run, force PMTUd to run
Scenario:
node X has MTU Y on the interface and application is sending packets with size >= Y.
The interface MTU is suddenly reduced to < Y
Before this change, the kernel would be dropping packets till the next PMTUd run.
After this change, the PMTUd will be informed that it has to rerun (overriding
the pmtud_interval), reducing the packet drop to a minimum.
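
The notification described above can be sketched roughly as follows. This is
only an illustration under assumptions: struct pmtud_state, its fields and
pmtud_note_send_error() are hypothetical names, not the actual libknet code.
A sending thread reports the errno of a failed send; if it is EMSGSIZE and no
discovery run is in progress, the PMTUd thread is woken up instead of waiting
for the interval to expire.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state; names are illustrative, not libknet's. */
struct pmtud_state {
    pthread_mutex_t lock;
    pthread_cond_t  wake;      /* PMTUd thread sleeps here between runs */
    bool            running;   /* true while a discovery run is in progress */
    bool            force_run; /* set to skip the remaining interval wait */
};

void pmtud_state_init(struct pmtud_state *st)
{
    assert(st != NULL);
    pthread_mutex_init(&st->lock, NULL);
    pthread_cond_init(&st->wake, NULL);
    st->running = false;
    st->force_run = false;
}

/*
 * Called by any sending thread after a failed send.
 * Returns true only when this call newly requested a rerun.
 */
bool pmtud_note_send_error(struct pmtud_state *st, int send_errno)
{
    bool requested = false;

    assert(st != NULL);
    if (send_errno != EMSGSIZE)
        return false;

    pthread_mutex_lock(&st->lock);
    if (!st->running && !st->force_run) {
        st->force_run = true;
        pthread_cond_signal(&st->wake); /* cut the interval sleep short */
        requested = true;
    }
    pthread_mutex_unlock(&st->lock);

    return requested;
}
```

In this sketch the PMTUd thread would wait on "wake" with a timeout equal to
the interval and clear force_run when it starts a run.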
How to test:
- Force knet_bench to send 1500-byte packets with ping_data (requires a code
  change) and start it.
- Reduce the MTU on the interface from 1500 to 1300 (for example).
- Notice an immediate trigger of a PMTUd run in debug mode.
Note: going up from 1300 to 1500 will wait for the next PMTUd re-run, as there
is no immediate way to detect this change unless we start listening to kernel
netlink sockets with libnl3 (investigation in progress but not critical enough atm).
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
[PMTUd] Use kernel MTU information to determine next packet size during discovery
Using this information we can, for good links (*), determine and verify the link
MTU with 2 packets.
(*) A good link means:
node X has MTU Y configured on a given interface, and any network object between
node X and the destination is capable of handling MTU >= Y.
In no case will the kernel allow us to send packets > Y.
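
A minimal sketch of how a kernel MTU hint can shorten the search, under stated
assumptions: discover_mtu(), the probe callback and the sim_link simulator are
hypothetical and are not the libknet implementation. Sizes are treated as
on-wire sizes and header overhead is ignored. When a probe is refused with
EMSGSIZE, the next probe goes straight to the size reported by the kernel
instead of bisecting, so a good link is settled with 2 packets.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of one probe, as a hypothetical sender would report it. */
enum probe_result {
    PROBE_ACKED,    /* peer acknowledged a packet of this size */
    PROBE_EMSGSIZE, /* kernel refused the packet, *kernel_mtu is filled in */
    PROBE_TIMEOUT   /* packet was sent, but something on the path dropped it */
};

typedef enum probe_result (*probe_fn)(void *ctx, size_t size,
                                      size_t *kernel_mtu);

struct discovery {
    size_t   mtu;    /* largest size found to work */
    unsigned probes; /* number of packets used */
};

struct discovery discover_mtu(probe_fn probe, void *ctx,
                              size_t min_size, size_t max_size)
{
    struct discovery out = { min_size, 0 };
    size_t lo = min_size;   /* assumed to work */
    size_t hi = max_size;   /* largest size still worth trying */
    size_t next = max_size; /* first probe: the biggest packet we would send */

    assert(min_size <= max_size);

    while (lo < hi) {
        size_t hint = 0;
        enum probe_result r = probe(ctx, next, &hint);

        out.probes++;
        if (r == PROBE_ACKED) {
            lo = next;
        } else if (r == PROBE_EMSGSIZE && hint < next) {
            /* Trust the kernel: nothing above its MTU can be sent. */
            hi = hint > lo ? hint : lo;
            next = hi; /* verify the hinted size directly */
            continue;
        } else {
            hi = next - 1;
        }
        next = lo + (hi - lo + 1) / 2; /* fall back to bisection */
    }
    out.mtu = lo;
    return out;
}

/* Hypothetical link simulator, used only to exercise the sketch. */
struct sim_link {
    size_t iface_mtu; /* MTU Y configured on the local interface */
    size_t path_mtu;  /* smallest MTU on the path to the destination */
};

enum probe_result sim_probe(void *ctx, size_t size, size_t *kernel_mtu)
{
    const struct sim_link *l = ctx;

    if (size > l->iface_mtu) {
        *kernel_mtu = l->iface_mtu;
        return PROBE_EMSGSIZE;
    }
    return size > l->path_mtu ? PROBE_TIMEOUT : PROBE_ACKED;
}
```

With iface_mtu == path_mtu (a good link) the first probe is refused and the
second one, at the hinted size, is acknowledged. A path that is narrower than
the local interface falls back to bisection and needs more packets.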
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
To test this fix it is necessary to run knet_bench or corosync with openssl
under heavy load (perf benchmark). Sooner or later the application will
crash with some random tracebacks.
After this patch, the crash can no longer be reproduced.
Tested using 9 nodes x 2 active/active links, all running corosync + cpghum.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
PMTUd can take a long time to release the global read lock, mostly due
to the pthread_cond_timedwait required to ack/nack packets from the
other hosts. This delay could block any wrlock operation for several seconds
if not more.
The solution:
Each call to the global pthread_rwlock_wrlock has been changed to a wrapper
that first notifies PMTUd to interrupt its operations (and restart later),
then requests the global write lock, which is acquired as soon as PMTUd is
out of the way.
This solution also greatly improves shutdown speed.
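
The wrapper idea can be sketched as below. This is an illustration under
assumptions, not the actual patch: struct handle, its fields,
get_global_wrlock(), pmtud_wait_reply() and pmtud_run_once() are hypothetical
here, and reply handling is left out. The writer raises an abort flag and
signals the condition PMTUd is waiting on, so PMTUd drops its read lock early
and the write lock request does not sit behind a long timed wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical handle; field names are illustrative only. */
struct handle {
    pthread_rwlock_t global_rwlock;
    pthread_mutex_t  pmtud_mutex; /* protects pmtud_abort */
    pthread_cond_t   pmtud_cond;  /* PMTUd waits here for ack/nack */
    bool             pmtud_abort;
};

void handle_init(struct handle *h)
{
    assert(h != NULL);
    pthread_rwlock_init(&h->global_rwlock, NULL);
    pthread_mutex_init(&h->pmtud_mutex, NULL);
    pthread_cond_init(&h->pmtud_cond, NULL);
    h->pmtud_abort = false;
}

/* Writer side: ask PMTUd to back out, then queue for the write lock. */
int get_global_wrlock(struct handle *h)
{
    pthread_mutex_lock(&h->pmtud_mutex);
    h->pmtud_abort = true;
    pthread_cond_signal(&h->pmtud_cond);
    pthread_mutex_unlock(&h->pmtud_mutex);

    return pthread_rwlock_wrlock(&h->global_rwlock);
}

/*
 * PMTUd side: wait up to timeout_ms for a reply (replies are not modelled).
 * Returns ECANCELED if a writer asked us to stop, otherwise the error from
 * pthread_cond_timedwait (ETIMEDOUT when the deadline passes).
 */
int pmtud_wait_reply(struct handle *h, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&h->pmtud_mutex);
    while (!h->pmtud_abort && rc == 0)
        rc = pthread_cond_timedwait(&h->pmtud_cond, &h->pmtud_mutex,
                                    &deadline);
    if (h->pmtud_abort)
        rc = ECANCELED;
    pthread_mutex_unlock(&h->pmtud_mutex);

    return rc;
}

/* One simplified PMTUd run: the read lock is held only while waiting. */
int pmtud_run_once(struct handle *h, long timeout_ms)
{
    int rc;

    /* Clear the flag before locking so a later writer is still noticed. */
    pthread_mutex_lock(&h->pmtud_mutex);
    h->pmtud_abort = false;
    pthread_mutex_unlock(&h->pmtud_mutex);

    pthread_rwlock_rdlock(&h->global_rwlock);
    rc = pmtud_wait_reply(h, timeout_ms);
    pthread_rwlock_unlock(&h->global_rwlock);

    return rc;
}
```

Clearing the flag before taking the read lock matters in this sketch:
clearing it afterwards could erase a request from a writer that is already
waiting, and that writer would then be stuck for the whole run.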
How to test:
This is not super simple to test and verify. I used 2 VMs with a known MTU of
1500. Start knet_bench on both (normal ping_data -C is more than enough).
Once they have established data exchange, change the MTU on one of the nodes
to 1600 (or higher). This should guarantee that the PMTUd process will take
a very long time to complete.
First verify that the PMTUd process takes several seconds.
Once the next PMTUd run starts, hit ctrl+c on the node that is executing
the PMTUd and the process should exit much faster than before this patch.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
[PMTUd] fix multiple issues and stability problems
- resolve locking issue with thread_heartbeat that was causing
spurious up/down link events.
In the event of a PMTUd run taking too long, the heartbeat
thread could hang for much longer than ping_timeout.
Use backoff_mutex to sync between threads instead of the global lock.
- pause the DATA tx thread when sending any PMTUd related packets.
A method similar to knet_send_sync, using the tx_mutex, allows much
more stable communication between nodes without any visible performance
hit.
- calculate higher timeouts when using crypto to improve stability
- fix an odd race condition with the kernel where, during a single PMTUd run,
the same packet size was marked both BAD and GOOD (via EMSGSIZE) by the
kernel. That situation would cause our PMTUd to run away and calculate
bad values.
- add a minor usleep between sending PMTUd packets to give the kernel
time to make up its own mind about the link PMTU. This is based on average
latency.
- since PMTUd can take several seconds before completion, record the
"end time" of the last run rather than the start time.
- fix a major issue in sending PMTUd reply where an errno was not being
passed down the link layer and would cause the RX thread to block forever.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Bin Liu [Thu, 7 Dec 2017 06:48:57 +0000 (14:48 +0800)]
allow choosing whether to build debuginfo packages or not
1. If neither "--enable-rpm-debuginfo" nor "--disable-rpm-debuginfo" is
specified, follow the system default behavior.
2. If either is set, build or skip the debuginfo package according to the
flag.