crypto: ofb - fix handling partial blocks and make thread-safe
Fix multiple bugs in the OFB implementation:
1. It stored the per-request state 'cnt' in the tfm context, which can be
used by multiple threads concurrently (e.g. via AF_ALG).
2. It didn't support messages not a multiple of the block cipher size,
despite being a stream cipher.
3. It didn't set cra_blocksize to 1 to indicate it is a stream cipher.
To fix these, set the 'chunksize' property to the cipher block size to
guarantee that when walking through the scatterlist, a partial block can
only occur at the end. Then change the implementation to XOR a block at
a time at first, then XOR the partial block at the end if needed. This
is the same way CTR and CFB are implemented. As a bonus, this also
improves performance in most cases over the current approach.