Move tb_add_jump() call and surrounding code from cpu_exec() into
tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct
chaining optimization details into tb_find_fast(). It also allows to
move tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer
to tb_find_slow() which also manipulates the lock.