CFQ ioscheduler tunables
========================

slice_idle
----------
This specifies how long CFQ should idle, waiting for the next request, on
certain cfq queues (for sequential workloads) and service trees (for random
workloads) before the queue is expired and CFQ selects the next queue to
dispatch from.

By default slice_idle is non-zero, which means that by default we idle on
queues/service trees. This can be very helpful on highly seeky media like
single-spindle SATA/SAS disks, where idling cuts down on the overall number
of seeks and improves throughput.

Setting slice_idle to 0 removes all idling at the queue/service tree level
and should give an overall throughput improvement on faster storage devices
like multiple SATA/SAS disks in a hardware RAID configuration. The downside
is that the isolation provided from WRITES also goes down and the notion of
IO priority becomes weaker.

So depending on storage and workload, it might be useful to set slice_idle=0.
In general I think that for SATA/SAS disks and software RAID of SATA/SAS
disks, keeping slice_idle enabled should be useful. For any configuration
where there are multiple spindles behind a single LUN (host-based hardware
RAID controller or storage arrays), setting slice_idle=0 might end up giving
better throughput and acceptable latencies.

CFQ IOPS Mode for group scheduling
==================================
Basic CFQ design is to provide priority-based time slices: a higher-priority
process gets a bigger time slice and a lower-priority process gets a smaller
one. Measuring time becomes harder if the storage is fast and supports NCQ,
where it is better to dispatch requests from multiple cfq queues into the
request queue at the same time. In such a scenario, it is not possible to
accurately measure the time consumed by a single queue.

What is possible, though, is to measure the number of requests dispatched
from a single queue while still allowing dispatch from multiple cfq queues
at the same time. This effectively becomes fairness in terms of IOPS
(IO operations per second).

If one sets slice_idle=0 and the storage supports NCQ, CFQ internally
switches to IOPS mode and starts providing fairness in terms of the number
of requests dispatched. Note that this mode switch takes effect only for
group scheduling. For non-cgroup users nothing should change.

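Both conditions for the mode switch can be inspected from userspace. A small
sketch follows; the device name is illustrative, and the queue_depth
attribute (exposed for SCSI devices) is used only as a rough proxy for NCQ
support:

# Sketch: check whether CFQ group scheduling would run in IOPS mode
# on a device, i.e. slice_idle == 0 and the device keeps more than
# one command in flight. Device name is illustrative; queue_depth
# (a SCSI device attribute) is only a rough proxy for NCQ support.
DEV = "sda"

def read_int(path):
    with open(path) as f:
        return int(f.read())

slice_idle = read_int(f"/sys/block/{DEV}/queue/iosched/slice_idle")
queue_depth = read_int(f"/sys/block/{DEV}/device/queue_depth")

if slice_idle == 0 and queue_depth > 1:
    print("CFQ group scheduling runs in IOPS mode")
else:
    print("CFQ group scheduling uses time-based slices")
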
CFQ IO scheduler Idling Theory
==============================
Idling on a queue is primarily about waiting for the next request to arrive
on the same queue after completion of a request. While idling, CFQ will not
dispatch requests from other cfq queues even if requests are pending there.

The rationale behind idling is that it can cut down on the number of seeks
on rotational media. For example, if a process is doing dependent sequential
reads (the next read is issued only after completion of the previous one),
then not dispatching requests from other queues should help: we do not move
the disk head and keep dispatching sequential IO from one queue.

CFQ has the following service trees, and the various queues are put on
these trees:

	sync-idle	sync-noidle	async

All cfq queues doing synchronous sequential IO go on the sync-idle tree.
On this tree we idle on each queue individually.

All synchronous non-sequential queues go on the sync-noidle tree, as does
any request marked with REQ_NOIDLE. On this tree we do not idle on
individual queues; instead we idle on the whole group of queues, i.e. the
tree. So if there are 4 queues waiting for IO to dispatch, we will idle
only once the last queue has dispatched its IO and there is no more IO on
this service tree.

All async writes go on the async service tree. There is no idling on async
queues.

CFQ has some optimizations for SSDs: if it detects non-rotational media
that supports a higher queue depth (multiple requests in flight at a time),
it cuts down on the idling of individual queues; all the queues move to the
sync-noidle tree and only the tree idle remains. This tree idling provides
isolation from buffered write queues on the async tree.

FAQ
===
Q1. Why idle at all on queues marked with REQ_NOIDLE?

A1. We only do tree idle (all queues on the sync-noidle tree) for queues
marked with REQ_NOIDLE. This helps in providing isolation from all the
sync-idle queues. Otherwise, in the presence of many sequential readers,
other synchronous IO might not get its fair share of the disk.

For example, say there are 10 sequential readers doing IO and each gets a
100ms slice. If a REQ_NOIDLE request comes in, it will be scheduled roughly
after 1 second (10 x 100ms). If after completion of the REQ_NOIDLE request
we do not idle, and a couple of milliseconds later another REQ_NOIDLE
request comes in, it will again be scheduled after 1 second. Repeat this
and notice how such a workload can lose its disk share and suffer due to
multiple sequential readers.

fsync can generate dependent IO, where a bunch of data is written in the
context of fsync and some journaling data is written afterwards. The
journaling data comes in only after fsync has finished its IO (at least
for ext4 that seemed to be the case). Now, if one decides not to idle on
the fsync thread due to REQ_NOIDLE, then the next journaling write will
not get scheduled for another second. A process doing small fsyncs will
suffer badly in the presence of multiple sequential readers (a minimal
reproduction of such a writer is sketched after this answer).

Hence, doing tree idling on threads using the REQ_NOIDLE flag on requests
provides isolation from multiple sequential readers, while at the same
time we do not idle on individual threads.

Q2. When to specify REQ_NOIDLE?
A2. I would think that whenever one is doing a synchronous write and is not
expecting more writes to be dispatched from the same context soon, one
should be able to specify REQ_NOIDLE on the writes, and that should work
well for most cases.