Chris Wilson [Tue, 12 Jan 2021 10:07:59 +0000 (10:07 +0000)]
drm/i915/gt: Perform an arbitration check before busywaiting
During igt_reset_nop_engine, it was observed that an unexpected failed
engine reset lead to us busywaiting on the stop-ring semaphore (set
during the reset preparations) on the first request afterwards. There was
no explicit MI_ARB_CHECK in this sequence as the presumption was that
the failed MI_SEMAPHORE_WAIT would itself act as an arbitration point.
It did not in this circumstance, so force it.
This patch is based on the assumption that the MI_SEMAPHORE_WAIT failure
to arbitrate is a rare Tigerlake bug, similar to the lite-restore vs
semaphore issues previously seen in the CS. The explicit MI_ARB_CHECK
should always ensure that there is at least one arbitration point in the
request before the MI_SEMAPHORE_WAIT to trigger the IDLE->ACTIVE event.
Upon processing that event, we will clear the stop-ring flag and release
the semaphore from its busywait.
Chris Wilson [Tue, 12 Jan 2021 10:07:58 +0000 (10:07 +0000)]
drm/i915/gt: Check for arbitration after writing start seqno
On the off chance that we need to arbitrate before launching the
payload, perform the check after we signal the request is ready to
start. Assuming instantaneous processing of the CS event, the request
will then be treated as having started when we make the decisions as to
how to process that CS event.
v2: More commentary about the users of i915_request_started() as a
reminder about why we are marking the initial breadcrumb.
Chris Wilson [Tue, 12 Jan 2021 02:00:13 +0000 (02:00 +0000)]
drm/i915/selftests: Allow huge_gem_object to kick the shrinker
A new fi-cml-dallium CI machine has 8G and apparently plenty free, yet
fails some selftests with ENOMEM. The failures all seem to be from
huge_gem_object which does not try very hard to allocate memory,
skipping reclaim entirely. Let's try a bit harder and direct reclaim
before failing.
Chris Wilson [Mon, 11 Jan 2021 22:52:20 +0000 (22:52 +0000)]
drm/i915: Allow the sysadmin to override security mitigations
The clear-residuals mitigation is a relatively heavy hammer and under some
circumstances the user may wish to forgo the context isolation in order
to meet some performance requirement. Introduce a generic module
parameter to allow selectively enabling/disabling different mitigations.
To disable just the clear-residuals mitigation (on Ivybridge, Baytrail,
or Haswell) use the module parameter: i915.mitigations=auto,!residuals
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1858 Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: stable@vger.kernel.org # v5.7 Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210111225220.3483-3-chris@chris-wilson.co.uk
Chris Wilson [Mon, 11 Jan 2021 22:52:18 +0000 (22:52 +0000)]
drm/i915/gt: Limit VFE threads based on GT
MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the
range [0, n-1] where n is #EU * (#threads/EU) with the number of threads
based on plaform and the number of EU based on the number of slices and
subslices. This is a fixed number per platform/gt, so appropriately
limit the number of threads we spawn to match the device.
v2: Oversaturate the system with tasks to force execution on every HW
thread; if the thread idles it is returned to the pool and may be reused
again before an unused thread.
v3: Fix more state commands, which was causing Baytrail to barf.
v4: STATE_CACHE_INVALIDATE requires a stall on Ivybridge
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024 Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Randy Wright <rwright@hpe.com> Cc: stable@vger.kernel.org # v5.7+ Reviewed-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210111225220.3483-1-chris@chris-wilson.co.uk
Chris Wilson [Mon, 11 Jan 2021 10:57:32 +0000 (10:57 +0000)]
drm/i915/gt: Disable arbitration around Braswell's pdp updates
Braswell's pdp workaround is full of dragons, that may be being angered
when they are interrupted. Let's not take that risk and disable
arbitration during the update.
Dan Carpenter [Mon, 11 Jan 2021 13:13:20 +0000 (13:13 +0000)]
drm/i915/selftests: Fix some error codes
These error paths return success instead of negative error codes as
intended.
Fixes: c92724de6db1 ("drm/i915/selftests: Try to detect rollback during batchbuffer preemption") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/X/xMdcewtft7+QFM@mwanda
Chris Wilson [Sat, 9 Jan 2021 16:34:55 +0000 (16:34 +0000)]
drm/i915: Refactor marking a request as EIO
When wedging the device, we cancel all outstanding requests and mark
them as EIO. Rather than duplicate the small function to do so between
each submission backend, export one.
Chris Wilson [Sat, 9 Jan 2021 16:34:54 +0000 (16:34 +0000)]
drm/i915/gt: Mark up a debug-only function
drivers/gpu/drm/i915//gt/intel_workarounds.c:1394:20: error: function 'is_nonpriv_flags_valid' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static inline bool is_nonpriv_flags_valid(u32 flags)
This is only used by debug build, so mark it as maybe-unused to keep the
compiler from complaining.
Chris Wilson [Fri, 8 Jan 2021 20:40:26 +0000 (20:40 +0000)]
drm/i915/gt: Disable arbitration on no-preempt requests
If a request is submitted and known to require no preemption, disable
arbitration around the batch which prevents the HW from handling a
preemption request during the payload.
Chris Wilson [Fri, 8 Jan 2021 20:40:25 +0000 (20:40 +0000)]
drm/i915/gt: Only disable preemption on gen8 render engines
The reason why we did not enable preemption on Broadwater was due to
missing GPGPU workarounds. Since this only applies to rcs0, only
restrict rcs0 (and our global capabilities).
While this does not affect exposing a preemption capability to
userspace, it does affect our internal decisions on whether to use
timeslicing and semaphores between individual engines.
Chris Wilson [Fri, 8 Jan 2021 20:40:24 +0000 (20:40 +0000)]
drm/i915/gt: Only retire on the last breadcrumb if the last request
We use the completion of the last active breadcrumb to retire the
requests along a timeline. This is purely opportunistic as nothing
guarantees that any particular timeline is terminated by a breadcrumb;
except for parking the engine where we explicitly add a breadcrumb so
that we park quickly and do an explicit retire upon signaling to reduce
the latency dramatically (avoiding a retire worker roundtrip).
With scheduling, we anticipate retiring completed timelines as a matter
of course. Performing the same action from inside the breadcrumbs is
intended to provide similar functionality for legacy ringbuffer
submission.
Chris Wilson [Fri, 8 Jan 2021 20:40:23 +0000 (20:40 +0000)]
drm/i915/gt: Restore ce->signal flush before releasing virtual engine
Before we mark the virtual engine as no longer inflight, flush any
ongoing signaling that may be using the ce->signal_link along the
previous breadcrumbs. On switch to a new physical engine, that link will
be inserted into the new set of breadcrumbs, causing confusion to an
ongoing iterator.
This patch undoes a last minute mistake introduced into commit bab0557c8dca ("drm/i915/gt: Remove virtual breadcrumb before transfer"),
whereby instead of unconditionally applying the flush, it was only
applied if the request itself was going to be reused.
v2: Generalise and cancel all remaining ce->signals
Chris Wilson [Fri, 8 Jan 2021 20:40:22 +0000 (20:40 +0000)]
drm/i915/selftests: Rearrange ktime_get to reduce latency against CS
In our tests where we measure the elapsed time on both the CPU and CS
using a udelay, our CS results match the udelay much more accurately
than the ktime (even when using ktime_get_fast_ns). With preemption
disabled, we can go one step lower than ktime and use local_clock.
If any of the perf tests run into 0 time, not only are we liable to
divide by zero, but the result would be highly questionable.
Nevertheless, let's not have a div-by-zero error.
Chris Wilson [Fri, 8 Jan 2021 20:40:20 +0000 (20:40 +0000)]
drm/i915/gt: Prevent use of engine->wa_ctx after error
On error we unpin and free the wa_ctx.vma, but do not clear any of the
derived flags. During lrc_init, we look at the flags and attempt to
dereference the wa_ctx.vma if they are set. To protect the error path
where we try to limp along without the wa_ctx, make sure we clear those
flags!
Reported-by: Matt Roper <matthew.d.roper@intel.com> Fixes: 604a8f6f1e33 ("drm/i915/lrc: Only enable per-context and per-bb buffers if set") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.15+ Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210108204026.20682-1-chris@chris-wilson.co.uk
Chris Wilson [Thu, 7 Jan 2021 13:23:22 +0000 (13:23 +0000)]
drm/i915/gt: Remove timeslice suppression
In the next^W future patch, we remove the strict priority system and
continuously re-evaluate the relative priority of tasks. As such we need
to enable the timeslice whenever there is more than one context in the
pipeline. This simplifies the decision and removes some of the tweaks to
suppress timeslicing, allowing us to lift the timeslice enabling to a
common spot at the end of running the submission tasklet.
One consequence of the suppression is that it was reducing fairness
between virtual engines on an over saturated system; undermining the
principle for timeslicing.
v2: Commentary
v3: Commentary for the right cancel_timer()
v4: Add tracing for why we need a timeslice
Chris Wilson [Wed, 6 Jan 2021 12:39:36 +0000 (12:39 +0000)]
drm/i915/selftests: Break out of the lrc layout test after register mismatch
AFter detecting a register mismatch between the protocontext and the
image generated by HW, immediately break out of the double loop.
Otherwise we end up with a second confusing error message.
Chris Wilson [Wed, 23 Sep 2020 11:41:56 +0000 (12:41 +0100)]
drm/i915/selftests: Switch 4k kmalloc to use get_free_page for alignment
In generating the reference LRC, we want a page-aligned address for
simplicity in computing the offsets within. This then shares the
computation for the HW LRC which is mapped and so page aligned, making
the comparison straightforward. It seems that kmalloc(4k) is not always
returning from a 4k-aligned slab cache (which would give us a page aligned
address) so force alignment by explicitly allocating a page.
Chris Wilson [Mon, 4 Jan 2021 11:51:42 +0000 (11:51 +0000)]
drm/i915/gt: Allow failed resets without assertion
If the engine reset fails, we will attempt to resume with the current
inflight submissions. When that happens, we cannot assert that the
engine reset cleared the pending submission, so do not.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2878 Fixes: 16f2941ad307 ("drm/i915/gt: Replace direct submit with direct call to tasklet") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210104115145.24460-3-chris@chris-wilson.co.uk
Chris Wilson [Mon, 4 Jan 2021 11:51:41 +0000 (11:51 +0000)]
drm/i915: Set rawclk earlier during mmio probe
Older platforms use rawclk to derive the CS clock. rawclk is being
determined during intel_device_info_init(), and so that needs to be
pushed slightly earlier.
Chris Wilson [Mon, 4 Jan 2021 17:15:11 +0000 (17:15 +0000)]
drm/i915/selftests: Guard against redifinition of SZ_8G
In the near future, upstream will introduce a SZ_8G macro that is
slightly different to our own. Employ a temporary ifndef to avoid
compilation failure until we have backmerged.
Matt Roper [Thu, 31 Dec 2020 19:11:03 +0000 (11:11 -0800)]
drm/i915: Clarify error message on failed workaround
Let's modify the "workaround lost" error message slightly to make it
more clear what the various numbers represent. Also, the 'expected'
value needs to be &'d with wa->read so that it doesn't include the mask
bits for masked registers (those bits are write-only in the hardware and
will usually always read out as 0's).
Chris Wilson [Thu, 31 Dec 2020 09:31:49 +0000 (09:31 +0000)]
drm/i915: Drop i915_request.lock requirement for intel_rps_boost()
Since we use a flag within i915_request.flags to indicate when we have
boosted the request (so that we only apply the boost) once, this can be
used as the serialisation with i915_request_retire() to avoid having to
explicitly take the i915_request.lock which is more heavily contended.
Chris Wilson [Thu, 31 Dec 2020 09:39:46 +0000 (09:39 +0000)]
drm/i915/gt: Pull context closure check from request submit to schedule-in
We only need to evaluate the current status of the context when it is
scheduled in, we will force a reschedule when the context is closed
propagating the change to inflight contexts.
Chris Wilson [Wed, 30 Dec 2020 22:00:28 +0000 (22:00 +0000)]
drm/i915/gt: Cancel submitted requests upon context reset
Since we process schedule-in of a context after submitting the request,
if we decide to reset the context at that time, we also have to cancel
the requets we have marked for submission.
Chris Wilson [Tue, 29 Dec 2020 14:16:26 +0000 (14:16 +0000)]
drm/i915/gt: Taint the reset mutex with the shrinker
Declare that, under extreme circumstances, the shrinker may need to wait
upon a request, in which case reset must not itself deadlock in order to
ensure forward progress of the driver. That is since the shrinker may
depend upon a reset, any reset cannot touch the shrinker.
Chris Wilson [Tue, 29 Dec 2020 12:08:28 +0000 (12:08 +0000)]
drm/i915/gt: Define guc firmware blob for older Cometlakes
When splitting the Coffeelake define to also identify Cometlakes, I
missed the double fw_def for Coffeelake. That is only newer Cometlakes
use the cml specific guc firmware, older Cometlakes should use kbl
firmware.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2859 Fixes: 5f4ae2704d59 ("drm/i915: Identify Cometlake platform") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.9+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201229120828.29931-1-chris@chris-wilson.co.uk
Chris Wilson [Thu, 24 Dec 2020 16:02:13 +0000 (16:02 +0000)]
drm/i915/gt: Refactor heartbeat request construction and submission
Pull the individual strands of creating a custom heartbeat requests into
a pair of common functions. This will reduce the number of changes we
will need to make in future.
Matthew Auld [Thu, 24 Dec 2020 15:13:58 +0000 (15:13 +0000)]
drm/i915: clear the gpu reloc batch
The reloc batch is short lived but can exist in the user visible ppGTT,
and since it's backed by an internal object, which lacks page clearing,
we should take care to clear it upfront.
Matthew Auld [Thu, 24 Dec 2020 15:13:57 +0000 (15:13 +0000)]
drm/i915: clear the shadow batch
The shadow batch is an internal object, which doesn't have any page
clearing, and since the batch_len can be smaller than the object, we
should take care to clear it.
Testcase: igt/gen9_exec_parse/shadow-peek Fixes: 4f7af1948abc ("drm/i915: Support ro ppgtt mapped cmdparser shadow buffers") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201224151358.401345-1-matthew.auld@intel.com Cc: stable@vger.kernel.org
Chris Wilson [Thu, 24 Dec 2020 13:55:44 +0000 (13:55 +0000)]
drm/i915/gt: ce->inflight updates are now serialised
Since schedule-in and schedule-out are now both always under the tasklet
bitlock, we can reduce the individual atomic operations to simple
instructions and worry less.
This notably eliminates the race observed with intel_context_inflight in
__engine_unpark().
Chris Wilson [Thu, 24 Dec 2020 13:55:43 +0000 (13:55 +0000)]
drm/i915/gt: Simplify virtual engine handling for execlists_hold()
Now that the tasklet completely controls scheduling of the requests, and
we postpone scheduling out the old requests, we can keep a hanging
virtual request bound to the engine on which it hung, and remove it from
te queue. On release, it will be returned to the same engine and remain
in its queue until it is scheduled; after which point it will become
eligible for transfer to a sibling. Instead, we could opt to resubmit the
request along the virtual engine on unhold, making it eligible for load
balancing immediately -- but that seems like a pointless optimisation
for a hanging context.
Chris Wilson [Thu, 24 Dec 2020 13:55:42 +0000 (13:55 +0000)]
drm/i915/gt: Resubmit the virtual engine on schedule-out
Having recognised that we do not change the sibling until we schedule
out, we can then defer the decision to resubmit the virtual engine from
the unwind of the active queue to scheduling out of the virtual context.
This improves our resilence in virtual engine scheduling, and should
eliminate the rare cases of gem_exec_balance failing.
By keeping the unwind order intact on the local engine, we can preserve
data dependency ordering while doing a preempt-to-busy pass until we
have determined the new ELSP. This means that if we try to timeslice
between a virtual engine and a data-dependent ordinary request, the pair
will maintain their relative ordering and we will avoid the
resubmission, cancelling the timeslicing until further change.
The dilemma though is that we then may end up in a situation where the
'demotion' of the virtual request to an ordinary request in the engine
queue results in filling the ELSP[] with virtual requests instead of
spreading the load across the engines. To compensate for this, we mark
each virtual request and refuse to resubmit a virtual request in the
secondary ELSP slots, thus forcing subsequent virtual requests to be
scheduled out after timeslicing. By delaying the decision until we
schedule out, we will avoid unnecessary resubmission.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2079 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2098 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-7-chris@chris-wilson.co.uk
Chris Wilson [Thu, 24 Dec 2020 13:55:41 +0000 (13:55 +0000)]
drm/i915/gt: Shrink the critical section for irq signaling
Let's only wait for the list iterator when decoupling the virtual
breadcrumb, as the signaling of all the requests may take a long time,
during which we do not want to keep the tasklet spinning.
Chris Wilson [Thu, 24 Dec 2020 13:55:40 +0000 (13:55 +0000)]
drm/i915/gt: Remove virtual breadcrumb before transfer
The issue with stale virtual breadcrumbs remain. Now we have the problem
that if the irq-signaler is still referencing the stale breadcrumb as we
transfer it to a new sibling, the list becomes spaghetti. This is a very
small window, but that doesn't stop it being hit infrequently. To
prevent the lists being tangled (the iterator starting on one engine's
b->signalers but walking onto another list), always decouple the virtual
breadcrumb on schedule-out and make sure that the walker has stepped out
of the lists.
Chris Wilson [Thu, 24 Dec 2020 13:55:39 +0000 (13:55 +0000)]
drm/i915/gt: Defer schedule_out until after the next dequeue
Inside schedule_out, we do extra work upon idling the context, such as
updating the runtime, kicking off retires, kicking virtual engines.
However, if we are in a series of processing single requests per
contexts, we may find ourselves scheduling out the context, only to
immediately schedule it back in during dequeue. This is just extra work
that we can avoid if we keep the context marked as inflight across the
dequeue. This becomes more significant later on for minimising virtual
engine misses.
Chris Wilson [Thu, 24 Dec 2020 13:55:38 +0000 (13:55 +0000)]
drm/i915/gt: Decouple inflight virtual engines
Once a virtual engine has been bound to a sibling, it will remain bound
until we finally schedule out the last active request. We can not rebind
the context to a new sibling while it is inflight as the context save
will conflict, hence we wait. As we cannot then use any other sibliing
while the context is inflight, only kick the bound sibling while it
inflight and upon scheduling out the kick the rest (so that we can swap
engines on timeslicing if the previously bound engine becomes
oversubscribed).
Chris Wilson [Thu, 24 Dec 2020 13:55:37 +0000 (13:55 +0000)]
drm/i915/gt: Use virtual_engine during execlists_dequeue
Rather than going back and forth between the rb_node entry and the
virtual_engine type, store the ve local and reuse it. As the
container_of conversion from rb_node to virtual_engine requires a
variable offset, performing that conversion just once shaves off a bit
of code.
v2: Keep a single virtual engine lookup, for typical use.
Chris Wilson [Thu, 24 Dec 2020 13:55:36 +0000 (13:55 +0000)]
drm/i915/gt: Replace direct submit with direct call to tasklet
Rather than having special case code for opportunistically calling
process_csb() and performing a direct submit while holding the engine
spinlock for submitting the request, simply call the tasklet directly.
This allows us to retain the direct submission path, including the CS
draining to allow fast/immediate submissions, without requiring any
duplicated code paths, and most importantly greatly simplifying the
control flow by removing reentrancy. This will enable us to close a few
races in the virtual engines in the next few patches.
The trickiest part here is to ensure that paired operations (such as
schedule_in/schedule_out) remain under consistent locking domains,
e.g. when pulled outside of the engine->active.lock
v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET
softirq synchronous").
v3: Update engine-reset to be tasklet aware
Chris Wilson [Wed, 23 Dec 2020 12:20:50 +0000 (12:20 +0000)]
drm/i915/gt: Prefer recycling an idle fence
If we want to reuse a fence that is in active use by the GPU, we have to
wait an uncertain amount of time, but if we reuse an inactive fence, we
can change it right away. Loop through the list of available fences
twice, ignoring any active fences on the first pass.
Chris Wilson [Wed, 23 Dec 2020 12:23:59 +0000 (12:23 +0000)]
drm/i915/gt: Consolidate the CS timestamp clocks
Pull the GT clock information [used to derive CS timestamps and PM
interval] under the GT so that is it local to the users. In doing so, we
consolidate the two references for the same information, of which the
runtime-info took note of a potential clock source override and scaling
factors.
Chris Wilson [Wed, 23 Dec 2020 12:23:58 +0000 (12:23 +0000)]
drm/i915/selftests: Confirm CS_TIMESTAMP / CTX_TIMESTAMP share a clock
We assume that both timestamps are driven off the same clock [reported
to userspace as I915_PARAM_CS_TIMESTAMP_FREQUENCY]. Verify that this is
so by reading the timestamp registers around a busywait (on an otherwise
idle engine so there should be no preemptions).
v2: Icelake (not ehl, nor tgl) seems to be using a fixed 80ns interval
for, and only for, CTX_TIMESTAMP -- or it may be GPU frequency and the
test is always running at maximum frequency?. As far as I can tell, this
isolated change in behaviour is undocumented.
Chris Wilson [Tue, 22 Dec 2020 11:35:36 +0000 (11:35 +0000)]
drm/i915/selftests: Be paranoid and flush the tasklet before checking status
When waiting for the submit, before checking the status of the request,
kick the tasklet to make sure we are processing the submission. This
speeds up submission if we are using any tasklet suppression for
secondary requests.
Chris Wilson [Tue, 22 Dec 2020 11:35:35 +0000 (11:35 +0000)]
drm/i915/selftests: Flush the preemption request before waiting
Make sure that the request has been submitted to HW before we begin our
wait. This reduces our reliance on the semaphore yield interrupt driving
the preemption request.
Chris Wilson [Tue, 22 Dec 2020 10:42:42 +0000 (10:42 +0000)]
drm/i915/gt: Track all timelines created using the HWSP
We assume that the contents of the HWSP are lost across suspend, and so
upon resume we must restore critical values such as the timeline seqno.
Keep track of every timeline allocated that uses the HWSP as its storage
and so we can then reset all seqno values by walking that list.
Chris Wilson [Sat, 19 Dec 2020 02:03:43 +0000 (02:03 +0000)]
drm/i915/gt: Provide a utility to create a scratch buffer
Primarily used by selftests, but also by runtime debugging of engine
w/a, is a routine to create a temporarily bound buffer for readback.
Almagamate the duplicated routines into one.
Chris Wilson [Sat, 19 Dec 2020 02:03:42 +0000 (02:03 +0000)]
drm/i915/gt: Split logical ring contexts from execlist submission
Split the definition, construction and updating of the Logical Ring
Context from the execlist submission interface. The LRC is used by the
HW, irrespective of our different submission backends.
Chris Wilson [Sun, 20 Dec 2020 13:48:58 +0000 (13:48 +0000)]
drm/i915/gt: Another tweak for flushing the tasklets
tasklet_kill() ensures that we _yield_ the processor until a remote
tasklet is completed. However, this leads to a starvation condition as
being at the bottom of the scheduler's runqueue means that anything else
is able to run, including all hogs keeping the tasklet occupied.
Chris Wilson [Fri, 18 Dec 2020 12:24:21 +0000 (12:24 +0000)]
drm/i915: Check for rq->hwsp validity after acquiring RCU lock
Since we allow removing the timeline map at runtime, there is a risk
that rq->hwsp points into a stale page. To control that risk, we hold
the RCU read lock while reading *rq->hwsp, but we missed a couple of
important barriers. First, the unpinning / removal of the timeline map
must be after all RCU readers into that map are complete, i.e. after an
rcu barrier (in this case courtesy of call_rcu()). Secondly, we must
make sure that the rq->hwsp we are about to dereference under the RCU
lock is valid. In this case, we make the rq->hwsp pointer safe during
i915_request_retire() and so we know that rq->hwsp may become invalid
only after the request has been signaled. Therefore is the request is
not yet signaled when we acquire rq->hwsp under the RCU, we know that
rq->hwsp will remain valid for the duration of the RCU read lock.
This is a very small window that may lead to either considering the
request not completed (causing a delay until the request is checked
again, any wait for the request is not affected) or dereferencing an
invalid pointer.
Fixes: 3adac4689f58 ("drm/i915: Introduce concept of per-timeline (context) HWSP") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.1+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201218122421.18344-1-chris@chris-wilson.co.uk
Aditya Swarup [Thu, 3 Dec 2020 07:23:59 +0000 (23:23 -0800)]
drm/i915/tgl: Add bound checks and simplify TGL REVID macros
Add bound checks for TGL REV ID array. Since, there might
be a possibility of using older kernels on latest platform
revisions, resulting in out of bounds access for rev ID array.
In this scenario, use the latest rev ID available and apply
those WAs.
Also, modify GT macros for TGL rev ID to reuse tgl_revids_get().
Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201203072359.156682-2-aditya.swarup@intel.com
Aditya Swarup [Thu, 3 Dec 2020 07:23:58 +0000 (23:23 -0800)]
drm/i915/tgl: Fix REVID macros for TGL to fetch correct stepping
Fix TGL REVID macros to fetch correct display/gt stepping based
on SOC rev id from INTEL_REVID() macro. Previously, we were just
returning the first element of the revid array instead of using
the correct index based on SOC rev id.
Fixes: c33298cb34f5 ("drm/i915/tgl: Fix stepping WA matching") Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201203072359.156682-1-aditya.swarup@intel.com
Chris Wilson [Tue, 15 Dec 2020 15:44:56 +0000 (15:44 +0000)]
drm/i915/gt: Track the overall awake/busy time
Since we wake the GT up before executing a request, and go to sleep as
soon as it is retired, the GT wake time not only represents how long the
device is powered up, but also provides a summary, albeit an overestimate,
of the device runtime (i.e. the rc0 time to compare against rc6 time).
v2: s/busy/awake/
v3: software-gt-awake-time and I915_PMU_SOFTWARE_GT_AWAKE_TIME
Chris Wilson [Thu, 17 Dec 2020 09:15:24 +0000 (09:15 +0000)]
drm/i915/gt: Drain the breadcrumbs just once
Matthew Brost pointed out that the while-loop on a shared breadcrumb was
inherently fraught with danger as it competed with the other users of
the breadcrumbs. However, in order to completely drain the re-arming irq
worker, the while-loop is a necessity, despite my optimism that we could
force cancellation with a couple of irq_work invocations.
Given that we can't merely drop the while-loop, use an activity counter on
the breadcrumbs to detect when we are parking the breadcrumbs for the
last time.
Based on a patch by Matthew Brost.
Reported-by: Matthew Brost <matthew.brost@intel.com> Suggested-by: Matthew Brost <matthew.brost@intel.com> Fixes: 9d5612ca165a ("drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201217091524.10258-1-chris@chris-wilson.co.uk
Chris Wilson [Wed, 16 Dec 2020 16:58:50 +0000 (16:58 +0000)]
drm/i915: Encode fence specific waitqueue behaviour into the wait.flags
Use the wait_queue_entry.flags to denote the special fence behaviour
(flattening continuations along fence chains, and for propagating
errors) rather than trying to detect ordinary waiters by their
functions.
Chris Wilson [Tue, 15 Dec 2020 15:21:38 +0000 (15:21 +0000)]
drm/i915/gem: Drop free_work for GEM contexts
The free_list and worker was introduced in commit 5f09a9c8ab6b ("drm/i915:
Allow contexts to be unreferenced locklessly"), but subsequently made
redundant by the removal of the last sleeping lock in commit 2935ed5339c4
("drm/i915: Remove logical HW ID"). As we can now free the GEM context
immediately from any context, remove the deferral of the free_list
v2: Lift removing the context from the global list into close().
Chris Wilson [Wed, 16 Dec 2020 09:29:51 +0000 (09:29 +0000)]
drm/i915: Fix mismatch between misplaced vma check and vma insert
When inserting a VMA, we restrict the placement to the low 4G unless the
caller opts into using the full range. This was done to allow usersapce
the opportunity to transition slowly from a 32b address space, and to
avoid breaking inherent 32b assumptions of some commands.
However, for insert we limited ourselves to 4G-4K, but on verification
we allowed the full 4G. This causes some attempts to bind a new buffer
to sporadically fail with -ENOSPC, but at other times be bound
successfully.
commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1
page") suggests that there is a genuine problem with stateless addressing
that cannot utilize the last page in 4G and so we purposefully excluded
it. This means that the quick pin pass may cause us to utilize a buggy
placement.
Reported-by: CQ Tang <cq.tang@intel.com>
Testcase: igt/gem_exec_params/larger-than-life-batch Fixes: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: CQ Tang <cq.tang@intel.com> Reviewed-by: CQ Tang <cq.tang@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v4.5+ Link: https://patchwork.freedesktop.org/patch/msgid/20201216092951.7124-1-chris@chris-wilson.co.uk
doc: Fix build of documentation after i915 file rename
Commit 70a2b431c364 ("drm/i915/gt: Rename lrc.c to
execlists_submission.c") renamed intel_lrc.c to
intel_execlists_submission.c but forgot to update i915.rst.
Tvrtko Ursulin [Mon, 14 Dec 2020 09:43:49 +0000 (09:43 +0000)]
drm/i915/pmu: Remove !CONFIG_PM code
Chris spotted that since 16ffe73c186b ("drm/i915/pmu: Use GT parked for
estimating RC6 while asleep") we don't rely on runtime pm internals when
estimating RC6 while asleep. We can remove the ifdef code to simplify and
at the same time wake up the device less when querying RC6 if CONFIG_PM is
not compiled in.
Tvrtko Ursulin [Mon, 14 Dec 2020 09:43:47 +0000 (09:43 +0000)]
drm/i915/pmu: Don't grab wakeref when enabling events
Chris found a CI report which points out calling intel_runtime_pm_get from
inside i915_pmu_enable hook is not allowed since it can be invoked from
hard irq context. This is something we knew but forgot, so lets fix it
once again.
We do this by syncing the internal book keeping with hardware rc6 counter
on driver load.
v2:
* Always sync on parking and fully sync on init.
John Harrison [Thu, 10 Dec 2020 17:06:15 +0000 (09:06 -0800)]
drm/i915: Correct location of Wa_1408615072
The above workaround was added as an engine workaround not a GT
workaround. Moved it to the correct location.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201210170615.3107266-1-lucas.demarchi@intel.com
drm/i915: split gen8+ flush and bb_start emission functions
These functions are independent from the backend used and can therefore
be split out of the exelists submission file, so they can be re-used by
the upcoming GuC submission backend.
Chris Wilson [Wed, 9 Dec 2020 23:36:17 +0000 (23:36 +0000)]
drm/i915/gt: Rename lrc.c to execlists_submission.c
We want to separate the utility functions for controlling the logical
ring context from the execlists submission mechanism (which is an
overgrown scheduler).
This is similar to Daniele's work to split up the files, but being
selfish I wanted to base it after my own changes to intel_lrc.c petered
out.
Chris Wilson [Wed, 9 Dec 2020 23:36:16 +0000 (23:36 +0000)]
drm/i915/gt: Move move context layout registers and offsets to lrc_reg.h
Cleanup intel_lrc.h by moving some of the residual common register
definitions into intel_lrc_reg.h, prior to rebranding and splitting off
the submission backends.
v2: keep the SCHEDULE enum in the old file, since it is specific to the
gvt usage of the execlists submission backend (John)
Chris Wilson [Wed, 9 Dec 2020 16:40:07 +0000 (16:40 +0000)]
drm/i915: Sleep around performing iommu unmaps on Tigerlake
Tigerlake is plagued by spontaneous DMAR faults [reason 7, next page
table ptr is invalid] which lead to GPU hangs. These faults occur when
an iommu map is immediately reused. Adding further clflushes and
barriers around either the GTT PTE or iommu PTE updates do not prevent
the faults. So far the only effect has been from inducing a delay
between reuse of the iommu on the GPU, and applying the delay at the
iommu map allows for the smallest stable delay.
Note that such a delay is hideous and clearly does not fix the root cause,
and so should only be a bandaid until a complete solution is found. The
delay was determined by running igt/gem_exec_fence/parallel in a loop for
a few hours (unpatched MTBF is about 10s).
We have also seen such DMAR fault [reason 7] errors on other platforms,
notably gen9-gen11, but so far it has only been trivially and
consistently reproduced on Tigerlake.
v2: Leave a tell-tale to know when we apply the vt'd quirk, and as a
reminder to remove it again. Hopefully.
Chris Wilson [Wed, 9 Dec 2020 16:40:06 +0000 (16:40 +0000)]
drm/i915: Remove livelock from "do_idle_maps" vtd w/a
A call to wait for the GT to idle from inside the put_pages fallback is
prone to cause an uninterruptible livelock. As it does not provide
adequate serialisation with new requests, simply fallback to a trivial
sleep.
Lucas De Marchi [Wed, 9 Dec 2020 04:52:45 +0000 (20:52 -0800)]
drm/i915/gt: rename wa_write_masked_or()
The use of "masked" in this function is due to its history. Once upon a
time it received a mask and a value as parameter. Since
commit eeec73f8a4a4 ("drm/i915/gt: Skip rmw for masked registers")
that is not true anymore and now there is a clear and a set parameter.
Depending on the case, that can still be thought as a mask and value,
but there are some subtle differences: what we clear doesn't need to be
the same bits we are setting, particularly when we are using masked
registers.
The fact that we also have "masked registers", i.e. registers whose mask
is stored in the upper 16 bits of the register, makes it even more
confusing, because "masked" in wa_write_masked_or() has little to do
with masked registers, but rather refers to the old mask parameter the
function received (that can also, but not exclusively, be used to write
to masked register).
Avoid the ambiguity and misnomer by renaming it to something else,
hopefully less confusing: wa_write_clr_set(), to designate that we are
doing both clr and set operations in the register.
Lucas De Marchi [Wed, 9 Dec 2020 04:52:44 +0000 (20:52 -0800)]
drm/i915/gt: stop ignoring read with wa_masked_field_set
When using masked registers, there is nothing to clear since a masked
register has the mask in the upper 16b: we can just write to the
location we want and use the mask to control what bits we are writing
to.
However that doesn't mean we don't want to read back the register and
check the value actually matched what we wanted to write, i.e. that
the WA stick. That should be an explicit opt-out for registers that are
either write-only or that are affected by hardware misbehavior.
Moreover both wa_masked_en() and wa_masked_dis() check the WA stick, so
skipping the check just because the field is more than 1 bit is
surprising and error-prone.
Chris Wilson [Mon, 7 Dec 2020 19:38:05 +0000 (19:38 +0000)]
drm/i915/gem: Drop false !i915_vma_is_closed assertion
Closed vma are protected by the GT wakeref held as we lookup the vma, so
we know that the vma will not be freed as we process it for the execbuf.
Instead we expect to catch the closed status of the context, and simply
allow the close-race on an individual vma to be washed away.
Longer term, the GT wakeref protection will be removed by explicit
vma.kref tracking.
Colin Ian King [Fri, 2 Oct 2020 17:03:54 +0000 (18:03 +0100)]
drm/i915: fix size_t greater or equal to zero comparison
Currently the check that the unsigned size_t variable i is >= 0
is always true because the unsigned variable will never be negative,
causing the loop to run forever. Fix this by changing the
pre-decrement check to a zero check on i followed by a decrement of i.
Addresses-Coverity: ("Unsigned compared against 0") Fixes: bfed6708d6c9 ("drm/i915: use vmap in shmem_pin_map") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201002170354.94627-1-colin.king@canonical.com
Lucas De Marchi [Sat, 5 Dec 2020 09:25:42 +0000 (01:25 -0800)]
drm/i915: remove WA_SET_FIELD_MASKED()
Remove the last macro and implement it as a function like the rest of
the operations that don't assume there is a `wal` list, but rather
receive it as argument.