blk-throttle: add throtl_qnode for dispatch fairness
With flat hierarchy, there's only single level of dispatching
happening and fairness beyond that point is the responsibility of the
rest of the block layer and driver, which usually works out okay;
however, with the planned hierarchy support,
service_queue->bio_lists[] can be filled up by bios from a single
source. While the limits would still be honored, it'd be very easy to
starve IOs from siblings or children.
To avoid such starvation, this patch implements throtl_qnode and
converts service_queue->bio_lists[] to lists of per-source qnodes
which in turn contains the bio's. For example, when a bio is
dispatched from a child group, the bio doesn't get queued on
->bio_lists[] directly but it first gets queued on the group's qnode
which in turn gets queued on service_queue->queued[]. When
dispatching for the upper level, the ->queued[] list is consumed in
round-robing order so that the dispatch windows is consumed fairly by
all IO sources.
There are two ways a bio can come to a throtl_grp - directly queued to
the group or dispatched from a child. For the former
throtl_grp->qnode_on_self[rw] is used. For the latter, the child's
->qnode_on_parent[rw].
Note that this means that the child which is contributing a bio to its
parent should stay pinned until all its bios are dispatched to its
grand-parent. This patch moves blkg refcnting from bio add/remove
spots to qnode activation/deactivation so that the blkg containing an
active qnode is always pinned. As child pins the parent, this is
sufficient for keeping the relevant sub-tree pinned while bios are in
flight.
The starvation issue was spotted by Vivek Goyal.
v2: The original patch used the same throtl_grp->qnode_on_self/parent
for reads and writes causing RWs to be queued incorrectly if there
already are outstanding IOs in the other direction. They should
be throtl_grp->qnode_on_self/parent[2] so that READs and WRITEs
can use different qnodes. Spotted by Vivek Goyal.