git.proxmox.com Git - mirror_qemu.git/commit

author	Emilio G. Cota <cota@braap.org>
	Tue, 9 Oct 2018 17:45:56 +0000 (13:45 -0400)
committer	Richard Henderson <richard.henderson@linaro.org>
	Fri, 19 Oct 2018 01:58:10 +0000 (18:58 -0700)
commit	71aec3541d87d611f6efad71d45b310e515372cc
tree	1168445e8f1278b6986fa13e46db6f513eb3cf7f
parent	ea9025cb49027d9b3c4f48c56602351b9cf65ff1

cputlb: serialize tlb updates with env->tlb_lock

Currently we rely on atomic operations for cross-CPU invalidations.
There are two cases that these atomics miss: cross-CPU invalidations
can race with either (1) vCPU threads flushing their TLB, which
happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB,
which updates .addr_write with a regular store. This results in
undefined behaviour, since we're mixing regular and atomic ops
on concurrent accesses.

Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table
and the corresponding victim cache now hold the lock.
The readers that do not hold tlb_lock must use atomic reads when
reading .addr_write, since this field can be updated by other threads;
the conversion to atomic reads is done in the next patch.
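A minimal sketch of the locking discipline described above, assuming a heavily simplified TLB. The names `TLBEntrySketch`, `CPUTLBSketch`, `TLB_NOTDIRTY_SKETCH` and all `*_sketch` functions are hypothetical and are not QEMU's definitions. A pthread mutex stands in for the per-vCPU `tlb_lock`, and the GCC/Clang `__atomic` builtins provide the atomic accesses. Every updater (the owner's memset flush and a cross-CPU dirty reset) holds the lock. The owner's lock-free fast path reads `.addr_write` atomically, and the cross-CPU writer stores it atomically, so no access mixes plain and atomic operations concurrently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define TLB_SIZE_SKETCH 256
#define TLB_NOTDIRTY_SKETCH ((uintptr_t)1 << 4)  /* hypothetical flag bit */

/* Hypothetical, simplified TLB entry: only the fields needed here. */
typedef struct {
    uintptr_t addr_read;
    uintptr_t addr_write;
} TLBEntrySketch;

/* Hypothetical per-vCPU TLB state; tlb_lock serializes updates to table[]. */
typedef struct {
    pthread_mutex_t tlb_lock;
    TLBEntrySketch table[TLB_SIZE_SKETCH];
} CPUTLBSketch;

void tlb_init_sketch(CPUTLBSketch *tlb)
{
    pthread_mutex_init(&tlb->tlb_lock, NULL);
    memset(tlb->table, -1, sizeof(tlb->table));  /* all bits set: invalid */
}

/* Owner vCPU flushes its whole TLB: one lock acquisition, plain memset. */
void tlb_flush_sketch(CPUTLBSketch *tlb)
{
    pthread_mutex_lock(&tlb->tlb_lock);
    memset(tlb->table, -1, sizeof(tlb->table));
    pthread_mutex_unlock(&tlb->tlb_lock);
}

/* Any thread may mark entries in [start, start + len) as not dirty.
 * The store is atomic because the owner may be reading without the lock. */
void tlb_reset_dirty_sketch(CPUTLBSketch *tlb, uintptr_t start, uintptr_t len)
{
    pthread_mutex_lock(&tlb->tlb_lock);
    for (size_t i = 0; i < TLB_SIZE_SKETCH; i++) {
        TLBEntrySketch *e = &tlb->table[i];
        /* Plain read is safe: all writers hold the lock we now hold. */
        uintptr_t addr = e->addr_write;
        if (addr - start < len && !(addr & TLB_NOTDIRTY_SKETCH)) {
            __atomic_store_n(&e->addr_write, addr | TLB_NOTDIRTY_SKETCH,
                             __ATOMIC_RELAXED);
        }
    }
    pthread_mutex_unlock(&tlb->tlb_lock);
}

/* Owner's fast path: no lock taken, so .addr_write is read atomically. */
uintptr_t tlb_addr_write_sketch(CPUTLBSketch *tlb, size_t idx)
{
    assert(idx < TLB_SIZE_SKETCH);
    return __atomic_load_n(&tlb->table[idx].addr_write, __ATOMIC_RELAXED);
}
```

The owner's memset never races with its own lock-free reads because both happen on the same thread; the lock only has to exclude other threads' updates.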

Note that an alternative fix would be to expand the use of atomic ops.
However, in the case of TLB flushes this would have a huge performance
impact, since (1) TLB flushes can happen very frequently and (2) we
currently use a full memory barrier to flush each TLB entry, and a TLB
has many entries. Instead, acquiring the lock is barely slower than a
full memory barrier since it is uncontended, and with a single lock
acquisition we can flush the entire TLB.
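To make the cost argument concrete, here is a hedged sketch contrasting the two flush strategies. It is not QEMU code: `EntrySketch`, `TLBSketch`, `N_ENTRIES_SKETCH`, `flush_with_atomics` and `flush_with_lock` are hypothetical names, and `__atomic_thread_fence(__ATOMIC_SEQ_CST)` (a GCC/Clang builtin) stands in for "a full memory barrier per TLB entry". The first function pays one barrier per entry, so its cost grows with the TLB size; the second pays for a single, normally uncontended lock acquisition and then clears the table with a plain memset.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define N_ENTRIES_SKETCH 1024   /* hypothetical TLB size */

typedef struct {
    uintptr_t addr_read;
    uintptr_t addr_write;
    uintptr_t addr_code;
} EntrySketch;

typedef struct {
    pthread_mutex_t lock;
    EntrySketch table[N_ENTRIES_SKETCH];
} TLBSketch;

/* Rejected alternative: atomic stores for every field plus a full
 * barrier after each entry, i.e. N_ENTRIES_SKETCH barriers per flush. */
void flush_with_atomics(TLBSketch *tlb)
{
    for (size_t i = 0; i < N_ENTRIES_SKETCH; i++) {
        EntrySketch *e = &tlb->table[i];
        __atomic_store_n(&e->addr_read, UINTPTR_MAX, __ATOMIC_RELAXED);
        __atomic_store_n(&e->addr_write, UINTPTR_MAX, __ATOMIC_RELAXED);
        __atomic_store_n(&e->addr_code, UINTPTR_MAX, __ATOMIC_RELAXED);
        __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* one full barrier per entry */
    }
}

/* Locked approach: one lock acquisition, then a plain memset that sets
 * every field of every entry to all-ones (the same value as above). */
void flush_with_lock(TLBSketch *tlb)
{
    pthread_mutex_lock(&tlb->lock);
    memset(tlb->table, -1, sizeof(tlb->table));
    pthread_mutex_unlock(&tlb->lock);
}
```

Both functions leave the table in the same state; they differ only in how many ordering operations a full flush costs.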

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009174557.16125-6-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg/cputlb.c
include/exec/cpu-defs.h