git.proxmox.com Git - mirror_kronosnet.git/log
4 years agoMerge pull request #279 from kronosnet/nitpick
Fabio M. Di Nitto [Mon, 27 Jan 2020 09:08:37 +0000 (10:08 +0100)]
Merge pull request #279 from kronosnet/nitpick

[udp] simplify code (same logic)

4 years ago[udp] simplify code (same logic)
Fabio M. Di Nitto [Sat, 25 Jan 2020 05:26:28 +0000 (06:26 +0100)]
[udp] simplify code (same logic)

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #278 from kronosnet/dont-spin-enetunreach
Fabio M. Di Nitto [Fri, 24 Jan 2020 12:17:55 +0000 (13:17 +0100)]
Merge pull request #278 from kronosnet/dont-spin-enetunreach

[udp] don't make socket spin if a network I/F is down

4 years ago[udp] don't make socket spin if a network I/F is down
Christine Caulfield [Fri, 24 Jan 2020 09:33:50 +0000 (09:33 +0000)]
[udp] don't make socket spin if a network I/F is down

UDP treats ENETUNREACH as a temporary error and just retries,
but this causes the TX thread to spin on sendto(), blocking
all other traffic.

(To reproduce this try starting corosync with 2 links configured in
corosync.conf but only one of them configured to the 'right' address
- it will spin in a tight loop and need to be killed with -9)

SCTP does not seem to suffer from this.
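
The idea can be sketched as a small errno classifier for a failed sendto(). This is a minimal illustration, not knet's actual code: enum tx_action and classify_sendto_errno() are hypothetical names, and the set of errors treated as transient is an assumption.

```c
#include <errno.h>

/* Hypothetical outcome of a failed sendto(); not part of the knet API. */
enum tx_action {
	TX_RETRY,	/* transient condition: retry the same packet */
	TX_SKIP_LINK,	/* link unusable right now: stop retrying, move on */
	TX_FATAL	/* unexpected error: report it to the caller */
};

/* Decide what the TX path should do for a given errno value. */
static enum tx_action classify_sendto_errno(int err)
{
	switch (err) {
	case EAGAIN:
#if defined(EWOULDBLOCK) && (EWOULDBLOCK != EAGAIN)
	case EWOULDBLOCK:
#endif
	case EINTR:
	case ENOBUFS:
		return TX_RETRY;
	case ENETUNREACH:
		/* interface down or no route: retrying in a loop only spins */
		return TX_SKIP_LINK;
	default:
		return TX_FATAL;
	}
}
```

With ENETUNREACH mapped to TX_SKIP_LINK, the sending loop can leave the dead link alone and keep servicing the others.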

4 years agoMerge pull request #277 from kronosnet/gcc10
Fabio M. Di Nitto [Wed, 22 Jan 2020 07:48:49 +0000 (08:48 +0100)]
Merge pull request #277 from kronosnet/gcc10

Fix errors detected by gcc10

4 years ago[nozzle] use interface name size consistently and drop strncpy in favour of memmove
Fabio M. Di Nitto [Wed, 22 Jan 2020 04:39:17 +0000 (05:39 +0100)]
[nozzle] use interface name size consistently and drop strncpy in favour of memmove

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[host] use KNET_MAX_HOST_LEN consistently
Fabio M. Di Nitto [Wed, 22 Jan 2020 04:17:39 +0000 (05:17 +0100)]
[host] use KNET_MAX_HOST_LEN consistently

detected by gcc10

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #274 from kronosnet/ppc-clang
Fabio M. Di Nitto [Wed, 20 Nov 2019 10:42:51 +0000 (11:42 +0100)]
Merge pull request #274 from kronosnet/ppc-clang

[tests] mark array as static

4 years ago[tests] mark array as static
Fabio M. Di Nitto [Wed, 20 Nov 2019 08:47:39 +0000 (09:47 +0100)]
[tests] mark array as static

fixes an odd segfault when running the test on ppc when built with clang

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #273 from kronosnet/wferi-patch-1
Fabio M. Di Nitto [Mon, 4 Nov 2019 07:35:16 +0000 (08:35 +0100)]
Merge pull request #273 from kronosnet/wferi-patch-1

[handle] fix typo in error log message

4 years ago[handle] fix typo in error log message
wferi [Sun, 3 Nov 2019 08:22:38 +0000 (09:22 +0100)]
[handle] fix typo in error log message

4 years agoMerge pull request #272 from kronosnet/cov-scan-errors
Fabio M. Di Nitto [Tue, 29 Oct 2019 13:25:03 +0000 (14:25 +0100)]
Merge pull request #272 from kronosnet/cov-scan-errors

[handle] make sure to unlock config handle on failure

4 years ago[handle] make sure to unlock config handle on failure
Fabio M. Di Nitto [Tue, 29 Oct 2019 12:20:55 +0000 (13:20 +0100)]
[handle] make sure to unlock config handle on failure

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #270 from kronosnet/bsd-build-fix
Fabio M. Di Nitto [Sun, 27 Oct 2019 14:08:00 +0000 (15:08 +0100)]
Merge pull request #270 from kronosnet/bsd-build-fix

[build] fix openssl version detection when not using pkg-config

4 years ago[build] fix openssl version detection when not using pkg-config
Fabio M. Di Nitto [Sun, 27 Oct 2019 05:42:54 +0000 (06:42 +0100)]
[build] fix openssl version detection when not using pkg-config

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #268 from kronosnet/netload-fixes
Fabio M. Di Nitto [Wed, 23 Oct 2019 13:40:27 +0000 (15:40 +0200)]
Merge pull request #268 from kronosnet/netload-fixes

Netload fixes

4 years ago[RX] silence defrag buffer expiration debug error
Fabio M. Di Nitto [Sat, 19 Oct 2019 07:05:16 +0000 (09:05 +0200)]
[RX] silence defrag buffer expiration debug error

when using active-active links, it is simply too noisy and
doesn't provide very useful information.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[RX] handle short write to the application properly
Fabio M. Di Nitto [Sat, 19 Oct 2019 06:47:27 +0000 (08:47 +0200)]
[RX] handle short write to the application properly

this change affects only applications that are not using knet
generated socketpairs to deliver/receive data to/from knet.

If an application uses a fd that is not SOCK_SEQPACKET (basically
streaming), we have to handle short writes accordingly, and knet
will continue delivering as long as there is progress.

The application is responsible for verifying that the data packet
is complete, as the delivery is not guaranteed to be complete.
The application can either embed the size of the packet in its
data structure or use the socket error notification callback
that will be invoked in case of errors or 0 data delivery.
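
A short-write loop of the kind described above might look like the following. This is a generic POSIX sketch under stated assumptions, not the knet implementation; write_with_progress() is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write up to len bytes to fd, continuing after short writes for as long
 * as each write() call makes progress. Returns the number of bytes that
 * were delivered, or -1 if nothing could be written at all.
 */
static ssize_t write_with_progress(int fd, const char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = write(fd, buf + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return done > 0 ? (ssize_t)done : -1;
		}
		if (n == 0)
			break;			/* no progress: stop here */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

A return value smaller than len is the case the commit message warns about: the receiver has to detect the truncated packet itself.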

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[TX] discard too big packets when reading from socketpairs
Fabio M. Di Nitto [Fri, 18 Oct 2019 09:17:57 +0000 (11:17 +0200)]
[TX] discard too big packets when reading from socketpairs

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[RX] Discard incoming packets if knet cannot reply back.
Fabio M. Di Nitto [Fri, 18 Oct 2019 08:41:49 +0000 (10:41 +0200)]
[RX] Discard incoming packets if knet cannot reply back.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #266 from kronosnet/wferi/newline
Fabio M. Di Nitto [Fri, 18 Oct 2019 08:20:32 +0000 (10:20 +0200)]
Merge pull request #266 from kronosnet/wferi/newline

[test] append newline to knet_send timeout message

4 years ago[test] append newline to knet_send timeout message
Ferenc Wágner [Fri, 18 Oct 2019 06:38:04 +0000 (08:38 +0200)]
[test] append newline to knet_send timeout message

4 years agoMerge pull request #265 from kronosnet/netload-fixes
Fabio M. Di Nitto [Wed, 16 Oct 2019 07:04:08 +0000 (09:04 +0200)]
Merge pull request #265 from kronosnet/netload-fixes

[test] add packet verification option to knet_bench

4 years ago[test] add packet verification option to knet_bench
Fabio M. Di Nitto [Wed, 16 Oct 2019 06:10:23 +0000 (08:10 +0200)]
[test] add packet verification option to knet_bench

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #264 from kronosnet/netload-fixes
Fabio M. Di Nitto [Tue, 15 Oct 2019 14:17:18 +0000 (16:17 +0200)]
Merge pull request #264 from kronosnet/netload-fixes

Netload fixes

4 years ago[PMTUd] invalidate MTU for a link if the value is lower than minimum
Fabio M. Di Nitto [Tue, 15 Oct 2019 09:53:56 +0000 (11:53 +0200)]
[PMTUd] invalidate MTU for a link if the value is lower than minimum

Under heavy network load and packet loss, the calculated MTU can be
too small. In that case we need to invalidate the link MTU, which
removes the link from the rotation (and traffic) and gives
PMTUd time to get the right MTU in the next round.
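
As a rough sketch of that rule, assuming that an MTU of 0 marks a link as invalid: struct link_state, MIN_LINK_MTU and both helpers below are hypothetical, and the 576-byte floor is only an illustrative value (the minimum datagram size every IPv4 host must accept), not necessarily the one knet uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_LINK_MTU 576u	/* illustrative minimum, assumption */

/* Hypothetical per-link state; mtu == 0 means "no valid MTU known". */
struct link_state {
	uint32_t mtu;
};

/* Store a PMTUd result, invalidating the link MTU if it is too small. */
static void pmtud_store_result(struct link_state *link, uint32_t calculated_mtu)
{
	link->mtu = (calculated_mtu < MIN_LINK_MTU) ? 0 : calculated_mtu;
}

/* A link with an invalid MTU is kept out of the rotation. */
static bool link_in_rotation(const struct link_state *link)
{
	return link->mtu != 0;
}
```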

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[test] add ability to knet_bench to specify a fixed packet size for perf test
Fabio M. Di Nitto [Tue, 15 Oct 2019 05:16:22 +0000 (07:16 +0200)]
[test] add ability to knet_bench to specify a fixed packet size for perf test

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[rx] copy data into the defrag buffer only if we know the size of the frame
Fabio M. Di Nitto [Tue, 15 Oct 2019 05:02:05 +0000 (07:02 +0200)]
[rx] copy data into the defrag buffer only if we know the size of the frame

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[host] fix defrag buffers reclaim logic
Fabio M. Di Nitto [Tue, 15 Oct 2019 04:53:24 +0000 (06:53 +0200)]
[host] fix defrag buffers reclaim logic

The problem:

- let's assume a 2 nodes (A and B) cluster setup
- node A sends fragmented packets to node B and there is
  packet loss on the network.
- node B receives all those fragments and attempts to
  reassemble them.
- node A sends packet seq_num X in Y fragments.
- node B receives only part of the fragments and stores
  them in a defrag buf.
- packet loss stops.
- node A continues to send packets and a seq_num
  roll-over takes place.
- node A sends a new packet seq_num X in Y fragments.
- node B gets confused here because the parts of the old
  packet seq_num X are still stored and the buffer
  has not been reclaimed.
- node B continues to rebuild packet seq_num X with
  old stale data and new data from after the roll-over.
- node B completes reassembling the packet and delivers
  junk to the application.

The solution:

Add a much stronger buffer reclaim logic that will apply
on each received packet and not only when defrag buffers
are needed, as there might be a mix of fragmented and not
fragmented packets in-flight.

The new logic creates a window of N packets that can be
handled at the same time (based on the number of buffers)
and clears everything else.
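
The window idea can be illustrated with a small sketch that runs on every received packet. It assumes 16-bit sequence numbers and uses hypothetical names (struct defrag_buf, seq_distance(), reclaim_stale_bufs(), DEFRAG_BUFS); the real knet logic may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEFRAG_BUFS 4	/* hypothetical number of defrag buffers (window size N) */

struct defrag_buf {
	bool in_use;
	uint16_t seq;	/* sequence number being reassembled in this buffer */
};

/* Shortest distance between two sequence numbers on the 16-bit circle. */
static uint16_t seq_distance(uint16_t a, uint16_t b)
{
	uint16_t fwd = (uint16_t)(a - b);
	uint16_t back = (uint16_t)(b - a);

	return fwd < back ? fwd : back;
}

/*
 * Called for every received packet, fragmented or not: release any buffer
 * whose sequence number has fallen outside the window around rx_seq.
 * Returns the number of buffers reclaimed.
 */
static size_t reclaim_stale_bufs(struct defrag_buf *bufs, size_t nbufs,
				 uint16_t rx_seq)
{
	size_t reclaimed = 0;

	for (size_t i = 0; i < nbufs; i++) {
		if (bufs[i].in_use &&
		    (size_t)seq_distance(rx_seq, bufs[i].seq) >= nbufs) {
			bufs[i].in_use = false;
			reclaimed++;
		}
	}
	return reclaimed;
}
```

Because stale buffers are dropped as soon as traffic moves past them, a half-assembled seq_num X cannot survive until the roll-over brings X around again.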

Fixes https://github.com/kronosnet/kronosnet/issues/261

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[host] rename variables to make it easier to read the code
Fabio M. Di Nitto [Tue, 15 Oct 2019 04:46:36 +0000 (06:46 +0200)]
[host] rename variables to make it easier to read the code

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #263 from kronosnet/runtime-debug
Fabio M. Di Nitto [Wed, 9 Oct 2019 10:45:56 +0000 (12:45 +0200)]
Merge pull request #263 from kronosnet/runtime-debug

[build] add --with-sanitizers= option for sanitizer builds

4 years ago[build] add --with-sanitizers= option for sanitizer builds
Fabio M. Di Nitto [Wed, 9 Oct 2019 08:28:14 +0000 (10:28 +0200)]
[build] add --with-sanitizers= option for sanitizer builds

this option is strictly meant for runtime debugging purposes.
do NOT use in production.

check gcc/clang man pages on how to use ASAN/UBSAN/TSAN.

Also allow users to specify SANITIZERS_CFLAGS and SANITIZERS_LDFLAGS
for advanced use.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #262 from ThomasLamprecht/fix-doxyxml-overflow
Fabio M. Di Nitto [Wed, 9 Oct 2019 05:35:55 +0000 (07:35 +0200)]
Merge pull request #262 from ThomasLamprecht/fix-doxyxml-overflow

doxyxml: print_param: fix heap-buffer-overflow on read

4 years agodoxyxml: print_param: fix heap-buffer-overflow on read
Thomas Lamprecht [Tue, 8 Oct 2019 15:09:07 +0000 (17:09 +0200)]
doxyxml: print_param: fix heap-buffer-overflow on read

in read_struct we can get the pi->paramtype assigned with:
> pi->paramtype = type?strdup(type):strdup("");

And in print_param we then always check the last character by getting
the strlen and subtracting one. But in the case where either type was
NULL and we assigned an empty string, or type wasn't NULL but
pointed to an empty string, we ran into a heap-buffer-overflow on read:
here strlen is zero, and so the first if branch evaluated to
> if (pi->paramtype[-1] == '*') {
which isn't valid. Depending on the OS and on how the OS or the compiler
protects the surrounding memory, this can crash the program.

The next check, for double pointers, had a similar issue,
here for all strings with strlen < 2.

To solve this, get the strlen early and check that we cannot underflow
before doing the real read.
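
The guarded checks could be sketched as follows. The helper names are hypothetical and this is not the doxyxml code itself, only the same pattern: compute the length first, then index.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if type ends in '*'. Safe for NULL and for the empty string. */
static bool ends_with_pointer(const char *type)
{
	size_t len = type ? strlen(type) : 0;

	return len >= 1 && type[len - 1] == '*';
}

/* True if type ends in "**". Safe for strings shorter than 2 bytes. */
static bool ends_with_double_pointer(const char *type)
{
	size_t len = type ? strlen(type) : 0;

	return len >= 2 && type[len - 1] == '*' && type[len - 2] == '*';
}
```

The length test comes first in each expression, so short-circuit evaluation guarantees that type[len - 1] and type[len - 2] are never read out of bounds.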

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
4 years agoMerge pull request #260 from kronosnet/test-suite
Fabio M. Di Nitto [Thu, 26 Sep 2019 10:17:36 +0000 (12:17 +0200)]
Merge pull request #260 from kronosnet/test-suite

[tests] add common function to sleep based on how the test suite is r…

4 years ago[tests] add common function to sleep based on how the test suite is running
Fabio M. Di Nitto [Thu, 26 Sep 2019 05:18:46 +0000 (07:18 +0200)]
[tests] add common function to sleep based on how the test suite is running

Address issue while waiting for host to be up and PMTUd first run.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #258 from kronosnet/wferi/fixes
Fabio M. Di Nitto [Wed, 25 Sep 2019 07:21:59 +0000 (09:21 +0200)]
Merge pull request #258 from kronosnet/wferi/fixes

Assorted small fixups

4 years agoFix typo: trasport -> transport
Ferenc Wágner [Wed, 29 May 2019 09:42:08 +0000 (11:42 +0200)]
Fix typo: trasport -> transport

Signed-off-by: Ferenc Wágner <wferi@debian.org>
4 years agotests: skip the SCTP test if SCTP is not supported by the kernel
Ferenc Wágner [Wed, 3 Apr 2019 08:26:11 +0000 (10:26 +0200)]
tests: skip the SCTP test if SCTP is not supported by the kernel

For example, module loading is disabled on Debian build daemons.
(In the vein of c5aa1c3343703455b480cef5c173f471e1bb020f.)

Signed-off-by: Ferenc Wágner <wferi@debian.org>
4 years agoMerge pull request #257 from kronosnet/netload-fixes
Fabio M. Di Nitto [Thu, 19 Sep 2019 11:32:02 +0000 (13:32 +0200)]
Merge pull request #257 from kronosnet/netload-fixes

[links] fix memory corruption of link structure

4 years ago[links] fix memory corruption of link structure
Fabio M. Di Nitto [Thu, 19 Sep 2019 07:02:44 +0000 (09:02 +0200)]
[links] fix memory corruption of link structure

the index would overflow the buffer and overwrite data in the link
structure. Depending on what was written, the cluster could fall
apart in many ways, from crashing to hanging.

Fixes: https://github.com/kronosnet/kronosnet/issues/255
thanks to the proxmox developers and community for reporting the issue
and for all the help reproducing / debugging the problem.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #254 from kronosnet/test-fixes
Fabio M. Di Nitto [Fri, 13 Sep 2019 11:09:00 +0000 (13:09 +0200)]
Merge pull request #254 from kronosnet/test-fixes

Test fixes

4 years ago[tests] give PMTUd more time to redetect MTU
Fabio M. Di Nitto [Fri, 13 Sep 2019 05:30:06 +0000 (07:30 +0200)]
[tests] give PMTUd more time to redetect MTU

The ideal fix would be to use the PMTUd callback, but that requires a lot of
extra test infrastructure. For now just work around the problem.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[tests] fix ip generation boundaries
Fabio M. Di Nitto [Fri, 13 Sep 2019 05:28:55 +0000 (07:28 +0200)]
[tests] fix ip generation boundaries

https://ci.kronosnet.org/job/knet-build-all-voting/1450/knet-build-all-voting=rhel80z-s390x/console

and similar: when pid = 255, the secondary IP would hit 256, which is of course invalid.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #253 from kronosnet/bsd-fixes
Fabio M. Di Nitto [Thu, 12 Sep 2019 15:55:03 +0000 (17:55 +0200)]
Merge pull request #253 from kronosnet/bsd-fixes

[nozzle] fix tapX range on newer FreeBSD

4 years ago[nozzle] fix tapX range on newer FreeBSD
Fabio M. Di Nitto [Thu, 12 Sep 2019 04:01:38 +0000 (06:01 +0200)]
[nozzle] fix tapX range on newer FreeBSD

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #252 from kronosnet/lock-fix
Fabio M. Di Nitto [Tue, 10 Sep 2019 07:19:43 +0000 (09:19 +0200)]
Merge pull request #252 from kronosnet/lock-fix

[pmtud] switch to use async version of dstcache update due to locking…

4 years ago[pmtud] switch to use async version of dstcache update due to locking context (read...
Fabio M. Di Nitto [Mon, 9 Sep 2019 13:11:25 +0000 (15:11 +0200)]
[pmtud] switch to use async version of dstcache update due to locking context (read vs write)

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #251 from kronosnet/latency-fixes
Fabio M. Di Nitto [Mon, 9 Sep 2019 03:35:59 +0000 (05:35 +0200)]
Merge pull request #251 from kronosnet/latency-fixes

[links] stabilize latency calculation when nodes are not responsive

4 years ago[links] stabilize latency calculation when nodes are not responsive
Fabio M. Di Nitto [Fri, 6 Sep 2019 05:05:19 +0000 (07:05 +0200)]
[links] stabilize latency calculation when nodes are not responsive

The following scenario is more of a corner case than the norm, but
this change deals with it better:

1) 2 nodes cluster (corosync) (node A and node B)
2) kill -stop $(pidof corosync) on node A
3) node B will continue to send ping packets to node A
4) node A is accumulating those ping packets in the kernel network socket
5) wait some seconds and unpause node A
6) node A will start processing the ping packets in the queue
   and send pong replies to node B
7) node B will see an extreme increase of latency due
   to those "obsoleted" ping/pong packets
8) node B, as latency increases, will take longer and longer
   to notice that node A is down due to the pong_timeout adjustment
   for latency (required for initial cluster spike).

the solution:

1) Use average latency, rather than latency_max, to calculate
   pong_timeout_adj. Average latency will go down again in time,
   while latency_max is never reset.

2) RX thread will filter out all pong packets that have higher latency
   than the currently configured pong_timeout. This barrier should have
   been in place even before.

this solution reduces the latency spike on node B to a perfectly
reasonable level, and it will all eventually stabilize over time
as latency samples accumulate and latency decreases.

Please be aware that using a pong_timeout smaller than latency will
simply mark the link down now.
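
Both parts of the solution can be pictured with a small sketch. It assumes a plain running mean and microsecond units; struct link_latency and latency_add_sample() are hypothetical, and knet's real averaging formula is not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-link latency bookkeeping, all values in microseconds. */
struct link_latency {
	uint64_t pong_timeout_us;	/* configured pong timeout */
	uint64_t avg_us;		/* running average latency */
	uint64_t max_us;		/* highest latency seen, never reset */
	uint64_t samples;		/* number of accepted samples */
};

/*
 * Account for one pong. Samples above the configured pong timeout are
 * filtered out (stale pongs); accepted ones update the running mean.
 * Returns true if the sample was accepted.
 */
static bool latency_add_sample(struct link_latency *l, uint64_t sample_us)
{
	if (sample_us > l->pong_timeout_us)
		return false;

	l->samples++;
	/* incremental mean, written to avoid unsigned underflow */
	if (sample_us >= l->avg_us)
		l->avg_us += (sample_us - l->avg_us) / l->samples;
	else
		l->avg_us -= (l->avg_us - sample_us) / l->samples;

	if (sample_us > l->max_us)
		l->max_us = sample_us;
	return true;
}
```

After a spike, avg_us drifts back down as new samples arrive while max_us stays at its peak, which is why the average is the better input for a timeout adjustment.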

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #250 from jfriesse/musl_fix
Fabio M. Di Nitto [Tue, 3 Sep 2019 10:12:28 +0000 (12:12 +0200)]
Merge pull request #250 from jfriesse/musl_fix

Fix compilation and running on Linux distribution with musl libc

4 years ago[handle] Set thread stack size on create
Jan Friesse [Mon, 2 Sep 2019 11:56:34 +0000 (13:56 +0200)]
[handle] Set thread stack size on create

Musl libc has a small default stack size for threads. Knet needs ~300KiB
(tested at the time when this patch was created). Glibc seems to use
~8MiB. As a compromise, 1MiB is used.
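
Setting an explicit stack size at thread creation uses the standard pthread attribute calls. The sketch below is illustrative: start_thread_with_stack(), echo_arg() and THREAD_STACK_SIZE are hypothetical names, and only the 1MiB value comes from the commit message.

```c
#include <pthread.h>
#include <stddef.h>

#define THREAD_STACK_SIZE (1024 * 1024)	/* 1MiB, as chosen in the commit */

/* Trivial thread body used only to demonstrate the helper. */
static void *echo_arg(void *arg)
{
	return arg;
}

/*
 * Create a thread with an explicit stack size instead of relying on the
 * libc default. Returns 0 on success or a pthread error number.
 */
static int start_thread_with_stack(pthread_t *thread,
				   void *(*fn)(void *), void *arg)
{
	pthread_attr_t attr;
	int err;

	err = pthread_attr_init(&attr);
	if (err)
		return err;

	err = pthread_attr_setstacksize(&attr, THREAD_STACK_SIZE);
	if (!err)
		err = pthread_create(thread, &attr, fn, arg);

	pthread_attr_destroy(&attr);
	return err;
}
```

The same binary then gets the same stack size on musl and glibc, regardless of each libc's default.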

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years ago[common] Conditionalize RTLD_DI_ORIGIN
Jan Friesse [Mon, 2 Sep 2019 09:11:27 +0000 (11:11 +0200)]
[common] Conditionalize RTLD_DI_ORIGIN

RTLD_DI_ORIGIN is used to get the absolute path of the plugin. It is used
only for logging useful info and not strictly needed, so use it only when
it is defined (musl is the only libc known to the author of the patch that lacks it)
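
The conditional use can be sketched like this. describe_plugin() is a hypothetical helper, not the knet function; dlinfo() with RTLD_DI_ORIGIN is the glibc extension the commit refers to, and the fallback branch is what a libc without it would compile.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* RTLD_DI_ORIGIN is a GNU extension on glibc */
#endif
#include <dlfcn.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a log line for a loaded plugin. The origin directory is added
 * only when the libc provides RTLD_DI_ORIGIN; otherwise the message is
 * simply less detailed.
 */
static void describe_plugin(void *handle, const char *name,
			    char *out, size_t outlen)
{
#ifdef RTLD_DI_ORIGIN
	char origin[PATH_MAX];

	if (dlinfo(handle, RTLD_DI_ORIGIN, origin) == 0) {
		snprintf(out, outlen, "loaded plugin %s from %s", name, origin);
		return;
	}
#else
	(void)handle;	/* unused without RTLD_DI_ORIGIN */
#endif
	snprintf(out, outlen, "loaded plugin %s", name);
}
```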

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years ago[common] Include correct errno.h
Jan Friesse [Mon, 2 Sep 2019 08:05:18 +0000 (10:05 +0200)]
[common] Include correct errno.h

sys/errno.h is a system-specific path and errno.h should be used instead.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agoMerge pull request #248 from jfriesse/fix-prio-description
Fabio M. Di Nitto [Tue, 27 Aug 2019 07:25:20 +0000 (09:25 +0200)]
Merge pull request #248 from jfriesse/fix-prio-description

[man] Fix priority description of POLICY_PASSIVE

4 years ago[man] Fix priority description of POLICY_PASSIVE
Jan Friesse [Mon, 26 Aug 2019 13:41:23 +0000 (15:41 +0200)]
[man] Fix priority description of POLICY_PASSIVE

... to match source code.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
4 years agoMerge pull request #245 from kronosnet/pmtud-fixes
Fabio M. Di Nitto [Wed, 21 Aug 2019 04:17:33 +0000 (06:17 +0200)]
Merge pull request #245 from kronosnet/pmtud-fixes

[PMTUd] rework the whole math to calculate MTU

4 years ago[PMTUd] add ability to manually override MTU and disable PMTUd
Fabio M. Di Nitto [Tue, 20 Aug 2019 04:57:45 +0000 (06:57 +0200)]
[PMTUd] add ability to manually override MTU and disable PMTUd

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[PMTUd] add dynamic pong timeout when using crypto
Fabio M. Di Nitto [Tue, 13 Aug 2019 04:41:32 +0000 (06:41 +0200)]
[PMTUd] add dynamic pong timeout when using crypto

Problem originally reported by the proxmox community: users
observed that, under pressure, the MTU would flap back and forth
between 2 values because responses from the other node timed out.

Implement a dynamic timeout multiplier when using crypto, which
should solve the problem in a more flexible fashion.

When a timeout hits, these new log messages will show:

[knet]: [info] host: host: 1 (passive) best link: 0 (pri: 0)
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (4) for host 1 link: 0
[knet]: [info] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 65429
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
[knet]: [info] pmtud: Global data MTU changed to: 65429
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (8) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (16) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (32) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (64) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (128) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429

and when the latency reduces and it is safe to be more responsive again:

[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Decreasing PMTUd response timeout multiplier to (64) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429

....
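
Judging only from the log lines above, the multiplier appears to double on a timeout and halve again once responses arrive in time. A minimal sketch of that behaviour follows; adjust_multiplier() and both bounds are assumptions, since the log does not state the actual limits.

```c
#include <stdbool.h>
#include <stdint.h>

#define MULT_MIN 2u	/* assumed floor (the old code used a fixed "* 2") */
#define MULT_MAX 128u	/* assumed ceiling (highest value in the logs above) */

/*
 * Return the next timeout multiplier: doubled after a timeout, halved
 * after a timely response, always clamped to [MULT_MIN, MULT_MAX].
 */
static uint32_t adjust_multiplier(uint32_t mult, bool timed_out)
{
	if (timed_out)
		return (mult > MULT_MAX / 2) ? MULT_MAX : mult * 2;

	return (mult / 2 < MULT_MIN) ? MULT_MIN : mult / 2;
}
```

Starting from 2, repeated timeouts give 4, 8, 16 and so on, matching the "Increasing" messages, and a single good round takes 128 back to 64 as in the "Decreasing" message.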

testing this patch on normal hosts is a bit challenging, though.

Patch was tested by hardcoding a super low timeout here:

diff --git a/libknet/threads_pmtud.c b/libknet/threads_pmtud.c
index 4f0ba0f..5e2b89b 100644
--- a/libknet/threads_pmtud.c
+++ b/libknet/threads_pmtud.c
@@ -261,7 +271,8 @@ retry:
                        /*
                         * crypto, under pressure, is a royal PITA
                         */
-                       pong_timeout_adj_tmp = dst_link->pong_timeout_adj * 2;
+                       //pong_timeout_adj_tmp = dst_link->pong_timeout_adj * dst_link->pmtud_crypto_timeout_multiplier;
+                       pong_timeout_adj_tmp = 30 * dst_link->pmtud_crypto_timeout_multiplier;
                } else {
                        pong_timeout_adj_tmp = dst_link->pong_timeout_adj;
                }

and using a long running version of api_knet_send_crypto_test with a short PMTUd setfreq (10 sec).

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[PMTUd] rework the whole math to calculate MTU
Fabio M. Di Nitto [Mon, 12 Aug 2019 14:52:59 +0000 (16:52 +0200)]
[PMTUd] rework the whole math to calculate MTU

internal changes:
- drop the concept of sec_header_size that was completely wrong
  and unnecessary
- bump crypto API to version 3 due to the above change
- clarify the difference between link->proto_overhead and
  link->status->proto_overhead. We cannot rename the status
  one as it would also change ABI.
- add onwire.c with documentation on the packet format
  and what various len(s) mean in context.
- add 3 new functions to calculate MTUs back and forth
  and use them around, hopefully with enough clarification
  on why things are done in a given way.
- heavily change threads_pmtud.c to use those new facilities.
- fix major calculation issues when using crypto (non-crypto
  was not affected by the problem).
- fix checks around to make sure they match the new math.
- fix padding calculation.
- add functional PMTUd crypto test
  this test can take several hours (12+) and should be executed
  in a controlled environment since it automatically changes
  loopback MTU to run tests.
- fix the way the lowest MTU is calculated during a PMTUd run
  to avoid spurious double notifications.
- drop redundant checks.

user visible changes:
- Global MTU is now calculated properly when using crypto
  and values will be in general bigger than before due
  to incorrect padding calculation in the previous implementation.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #242 from kronosnet/pmtud-fixes
Fabio M. Di Nitto [Fri, 2 Aug 2019 11:22:45 +0000 (13:22 +0200)]
Merge pull request #242 from kronosnet/pmtud-fixes

Pmtud fixes

4 years ago[PMTUd] fix MTU calculation when using crypto and add docs
Fabio M. Di Nitto [Fri, 2 Aug 2019 08:44:23 +0000 (10:44 +0200)]
[PMTUd] fix MTU calculation when using crypto and add docs

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[docs] add knet packet layout
Fabio M. Di Nitto [Fri, 2 Aug 2019 08:43:09 +0000 (10:43 +0200)]
[docs] add knet packet layout

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[udp] log information about detected kernel MTU
Fabio M. Di Nitto [Wed, 31 Jul 2019 12:15:07 +0000 (14:15 +0200)]
[udp] log information about detected kernel MTU

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[crypto] fix log information
Fabio M. Di Nitto [Tue, 30 Jul 2019 09:18:33 +0000 (11:18 +0200)]
[crypto] fix log information

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #240 from kronosnet/cov-scan
Fabio M. Di Nitto [Fri, 26 Jul 2019 13:07:32 +0000 (15:07 +0200)]
Merge pull request #240 from kronosnet/cov-scan

coverity scan fixes

4 years ago[sctp] retry locking in case of failure
Fabio M. Di Nitto [Fri, 26 Jul 2019 07:58:05 +0000 (09:58 +0200)]
[sctp] retry locking in case of failure

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[tx] clean up channel management code for internal communications
Fabio M. Di Nitto [Thu, 25 Jul 2019 09:18:19 +0000 (11:18 +0200)]
[tx] clean up channel management code for internal communications

the code is still not in use but it's clearer and no longer triggers
a memory overrun

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[nozzle] fix a few coverity errors in the test suite
Fabio M. Di Nitto [Thu, 25 Jul 2019 07:24:26 +0000 (09:24 +0200)]
[nozzle] fix a few coverity errors in the test suite

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[tx] drop unnecessary usleep when sending to localhost
Fabio M. Di Nitto [Thu, 25 Jul 2019 06:28:34 +0000 (08:28 +0200)]
[tx] drop unnecessary usleep when sending to localhost

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[common] make sure string is null terminated
Fabio M. Di Nitto [Wed, 24 Jul 2019 11:59:47 +0000 (13:59 +0200)]
[common] make sure string is null terminated

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[test] simplify flush log
Fabio M. Di Nitto [Wed, 24 Jul 2019 11:46:51 +0000 (13:46 +0200)]
[test] simplify flush log

allocate on stack only once and make sure strings are null terminated.
drop useless read loop since log msgs are always smaller than PAGE_SIZE
and reads are atomic at that level

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[nozzle] avoid tons of possible buffer overruns
Fabio M. Di Nitto [Wed, 24 Jul 2019 09:00:00 +0000 (11:00 +0200)]
[nozzle] avoid tons of possible buffer overruns

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[PMTUd] do not double unlock global read lock
Fabio M. Di Nitto [Wed, 24 Jul 2019 06:38:56 +0000 (08:38 +0200)]
[PMTUd] do not double unlock global read lock

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[coverity] add test targets to run coverity automatically
Fabio M. Di Nitto [Tue, 23 Jul 2019 07:15:15 +0000 (09:15 +0200)]
[coverity] add test targets to run coverity automatically

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[compress] do not overrun allocated array for compress modules
Fabio M. Di Nitto [Thu, 18 Jul 2019 13:39:57 +0000 (15:39 +0200)]
[compress] do not overrun allocated array for compress modules

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[logging] make sure not to overrun buffers by pre-allocating them
Fabio M. Di Nitto [Thu, 18 Jul 2019 11:31:32 +0000 (13:31 +0200)]
[logging] make sure not to overrun buffers by pre-allocating them

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[compress] don't leak memory in case of errors during zstd init
Fabio M. Di Nitto [Thu, 18 Jul 2019 11:12:36 +0000 (13:12 +0200)]
[compress] don't leak memory in case of errors during zstd init

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[nozzle] don't leak memory on error
Fabio M. Di Nitto [Thu, 18 Jul 2019 11:09:05 +0000 (13:09 +0200)]
[nozzle] don't leak memory on error

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[nozzle] fix negative return detected by coverity scan
Fabio M. Di Nitto [Thu, 18 Jul 2019 11:04:54 +0000 (13:04 +0200)]
[nozzle] fix negative return detected by coverity scan

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[sctp] cleanup bugs detected in error paths by coverity scan
Fabio M. Di Nitto [Thu, 18 Jul 2019 09:57:36 +0000 (11:57 +0200)]
[sctp] cleanup bugs detected in error paths by coverity scan

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[acl] avoid forward null dereferencing
Fabio M. Di Nitto [Thu, 18 Jul 2019 09:08:32 +0000 (11:08 +0200)]
[acl] avoid forward null dereferencing

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[rx] better error report if we can't resolve hostname / port
Fabio M. Di Nitto [Thu, 18 Jul 2019 08:43:58 +0000 (10:43 +0200)]
[rx] better error report if we can't resolve hostname / port

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[common] fix dlopen error handling
Fabio M. Di Nitto [Thu, 18 Jul 2019 08:36:43 +0000 (10:36 +0200)]
[common] fix dlopen error handling

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[tests] fix knet_bench coverity errors
Fabio M. Di Nitto [Thu, 18 Jul 2019 08:23:14 +0000 (10:23 +0200)]
[tests] fix knet_bench coverity errors

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[sctp] revalidate fd to make coverity scan happy
Fabio M. Di Nitto [Thu, 18 Jul 2019 05:59:01 +0000 (07:59 +0200)]
[sctp] revalidate fd to make coverity scan happy

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[handle] make sure that the pmtud buf contains at least knet header size
Fabio M. Di Nitto [Thu, 18 Jul 2019 05:50:37 +0000 (07:50 +0200)]
[handle] make sure that the pmtud buf contains at least knet header size

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[rx] align data types
Fabio M. Di Nitto [Thu, 18 Jul 2019 05:11:56 +0000 (07:11 +0200)]
[rx] align data types

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[sctp] free access list only if the socket is valid
Fabio M. Di Nitto [Thu, 18 Jul 2019 05:03:11 +0000 (07:03 +0200)]
[sctp] free access list only if the socket is valid

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[sctp] fix dereference after null check
Fabio M. Di Nitto [Mon, 15 Jul 2019 13:10:15 +0000 (15:10 +0200)]
[sctp] fix dereference after null check

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #238 from kronosnet/coverity_scan
Fabio M. Di Nitto [Wed, 17 Jul 2019 08:56:21 +0000 (10:56 +0200)]
Merge pull request #238 from kronosnet/coverity_scan

[coverity] add .travis.yml to integrate CI with coverity scan

4 years ago[coverity] add .travis.yml to integrate CI with coverity scan
Fabio M. Di Nitto [Wed, 17 Jul 2019 07:41:20 +0000 (09:41 +0200)]
[coverity] add .travis.yml to integrate CI with coverity scan

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #236 from kronosnet/queue-flush
Fabio M. Di Nitto [Fri, 28 Jun 2019 09:33:40 +0000 (11:33 +0200)]
Merge pull request #236 from kronosnet/queue-flush

[threads] allow knet_handle_setfwd to flush socket queues

4 years ago[threads] allow knet_handle_setfwd to flush socket queues
Fabio M. Di Nitto [Thu, 27 Jun 2019 08:55:23 +0000 (10:55 +0200)]
[threads] allow knet_handle_setfwd to flush socket queues

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #235 from kronosnet/minor-cleanup
Fabio M. Di Nitto [Wed, 26 Jun 2019 08:38:04 +0000 (10:38 +0200)]
Merge pull request #235 from kronosnet/minor-cleanup

Minor cleanup

4 years ago[compress] fix #if def around BZIP2 testing
Fabio M. Di Nitto [Wed, 26 Jun 2019 03:31:23 +0000 (05:31 +0200)]
[compress] fix #if def around BZIP2 testing

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years ago[compress] fix a few minor space vs tab and code formatting
Fabio M. Di Nitto [Wed, 26 Jun 2019 03:31:06 +0000 (05:31 +0200)]
[compress] fix a few minor space vs tab and code formatting

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
4 years agoMerge pull request #233 from ReyRen/compression_default_level
Fabio M. Di Nitto [Wed, 26 Jun 2019 03:11:18 +0000 (05:11 +0200)]
Merge pull request #233 from ReyRen/compression_default_level

[compress]Default compress level use

4 years ago[compress]Default compression level use
yuan ren [Tue, 25 Jun 2019 13:55:26 +0000 (21:55 +0800)]
[compress]Default compression level use

1. Add test cases for a module without a default, using the default
compression level.
2. As discussed with Fabio, an invalid compression level is not
knet's responsibility, so an error is logged. But if compression succeeds
and dstLen is larger than srcLen, the default compression level will be
used, because the requested level is not effective.

Signed-off-by: yuan ren <yren@suse.com>
4 years ago[tests] ignore libnss errors from OpenSuse Tumbleweed
Fabio M. Di Nitto [Tue, 25 Jun 2019 11:30:23 +0000 (13:30 +0200)]
[tests] ignore libnss errors from OpenSuse Tumbleweed

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>