Chris Wilson [Mon, 19 Aug 2019 07:58:19 +0000 (08:58 +0100)]
drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe
We use a fake timeline->mutex lock to reassure lockdep that the timeline
is always locked when emitting requests. However, the use inside
__engine_park() may be inside hardirq and so lockdep now complains about
the mixed irq-state of the nested lock. Disable irqs around the
lockdep tracking to keep it happy.
Fixes: 6c69a45445af ("drm/i915/gt: Mark context->active_count as protected by timeline->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819075835.20065-3-chris@chris-wilson.co.uk
Chris Wilson [Sat, 17 Aug 2019 23:25:11 +0000 (00:25 +0100)]
drm/i915: Propagate fence errors
Errors spread like wildfire, and must eventually be returned to the
user. They need to be captured and passed along the flow of fences,
infecting each in turn with the existing error, until finally they fall
out of a user visible result.
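As a rough illustration of the propagation described above, the sketch below copies an error from a signaling fence to the fence waiting on it. The struct and function names are hypothetical and are not the driver's API; the only assumption is that an error is a negative errno and that the first error a fence sees is the one it keeps.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, minimal fence: 0 means success, otherwise a negative errno. */
struct fence {
	int error;
};

/* A waiter inherits the signaler's error unless it already carries one. */
void fence_propagate_error(struct fence *waiter, const struct fence *signaler)
{
	if (signaler->error && !waiter->error)
		waiter->error = signaler->error;
}
```

Calling this once per dependency edge, in signaling order, carries an error down a chain of fences until it reaches whichever fence the user finally waits on.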
Michal Wajdeczko [Sun, 18 Aug 2019 09:52:04 +0000 (09:52 +0000)]
drm/i915/uc: Never fail on HuC firmware errors
There is no need to mark whole GPU as wedged just because
of the custom HuC fw failure as users can always verify
actual HuC firmware status using existing HUC_STATUS ioctl.
Michal Wajdeczko [Sun, 18 Aug 2019 09:52:03 +0000 (09:52 +0000)]
drm/i915/uc: Don't always fail on unavailable GuC firmware
If we failed to fetch default GuC firmware and we didn't plan
to use it for the submission and we never have used GuC before
then we may continue normal driver load, no need to declare
GPU wedged (we can use execlist for submission) and it is safe
to run without the HuC (users will check HuC status anyway).
Michal Wajdeczko [Sun, 18 Aug 2019 09:52:02 +0000 (09:52 +0000)]
drm/i915/guc: Don't open log relay if GuC is not running
As we plan to continue driver load after GuC initialization
failure, we can't assume that GuC log data will be available
just because GuC was initially enabled. We must check that
GuC is still running instead.
Michal Wajdeczko [Sat, 17 Aug 2019 13:11:43 +0000 (13:11 +0000)]
drm/i915/uc: Cleanup fw fetch on every GuC/HuC init failure
Be consistent and always perform fw fetch cleanup in GuC/HuC specific
init functions on every failure. Also while converting firmware
status to error, stop treating SELECTED as non-error, as long term
we should not see it.
Michal Wajdeczko [Sat, 17 Aug 2019 13:11:42 +0000 (13:11 +0000)]
drm/i915/uc: Cleanup fw fetch only if it was successful
We can rely on firmware status AVAILABLE to determine if any
firmware cleanup is required. Also don't unconditionally reset
fw status to SELECTED as we will lose MISSING/ERROR codes.
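A minimal sketch of the rule described above, assuming a simplified status enum and a heap-allocated blob standing in for the fetched image. All names here are hypothetical; only the state names come from the message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status values, named after the states mentioned above. */
enum fw_status { FW_SELECTED, FW_MISSING, FW_ERROR, FW_AVAILABLE };

struct fw {
	enum fw_status status;
	void *blob; /* stands in for the fetched firmware image */
};

/* Release the fetched image only if the fetch actually succeeded. */
void fw_cleanup_fetch(struct fw *fw)
{
	if (fw->status != FW_AVAILABLE)
		return; /* keep MISSING/ERROR so the cause is not lost */

	free(fw->blob);
	fw->blob = NULL;
	fw->status = FW_SELECTED;
}
```

The early return is the point of the sketch: a failed fetch has nothing to free, and leaving its status alone preserves the reason for the failure.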
Chris Wilson [Sat, 17 Aug 2019 07:37:11 +0000 (08:37 +0100)]
drm/i915/selftests: Check the context size
Add a redzone to our context image and check the HW does not write into
it after a context save, to verify that we have the correct context size.
(This does vary with feature bits, so test with a live setup that should
match how we run userspace.)
v2: Check the redzone on every context unpin
v3: Use a kernel context to prevent loading garbage for ringbuffer
submission
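The redzone idea can be sketched in plain C as below. This is not the selftest itself: the guard length, the poison byte and both helper names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define REDZONE_SIZE 64   /* hypothetical guard length in bytes */
#define REDZONE_BYTE 0x5a /* hypothetical poison value */

/* Poison the guard area that follows the first `size` bytes of `buf`. */
void redzone_set(unsigned char *buf, size_t size)
{
	memset(buf + size, REDZONE_BYTE, REDZONE_SIZE);
}

/* Return true if nothing wrote past the first `size` bytes of `buf`. */
bool redzone_intact(const unsigned char *buf, size_t size)
{
	for (size_t i = 0; i < REDZONE_SIZE; i++) {
		if (buf[size + i] != REDZONE_BYTE)
			return false;
	}
	return true;
}
```

The buffer must be allocated with REDZONE_SIZE extra bytes. Checking after every save (or unpin, as in v2) catches a writer that overruns the size we believed the image to have.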
Mika Kuoppala [Fri, 16 Aug 2019 09:47:54 +0000 (12:47 +0300)]
drm/i915/gtt: Fold gen8 insertions into one
As we give page directory pointer (lvl 3) structure
for pte insertion, we can fold both versions into
one function by teaching it to get pdp regardless
of top level.
Michal Wajdeczko [Fri, 16 Aug 2019 20:56:58 +0000 (20:56 +0000)]
drm/i915/uc: Add explicit DISABLED state for firmware
We really need to have separate NOT_SUPPORTED state (for
lack of hardware support) and DISABLED state (to indicate
user decision) as we will have to take special steps even
if GuC firmware is now disabled but hardware exists and
could have been previously used.
v2: fix logic (Chris/CI)
v3: use proper check to avoid probe failure (CI)
v4: explain status transitions (Chris)
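One way to picture the distinction drawn above is the sketch below. The enum values and helper names are hypothetical, and the rule in the second helper is an assumption drawn only from the text: hardware that exists may need attention even when the firmware is disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; only the NOT_SUPPORTED/DISABLED split is from the text. */
enum uc_fw_state {
	UC_FW_NOT_SUPPORTED, /* no hardware support */
	UC_FW_DISABLED,      /* hardware exists, user turned the firmware off */
	UC_FW_SELECTED,      /* hardware exists, firmware will be fetched */
};

/* Pick the starting state from hardware capability and user choice. */
enum uc_fw_state uc_fw_initial_state(bool hw_supported, bool user_enabled)
{
	if (!hw_supported)
		return UC_FW_NOT_SUPPORTED;
	return user_enabled ? UC_FW_SELECTED : UC_FW_DISABLED;
}

/* Existing hardware may hold state from earlier use, even if disabled now. */
bool uc_fw_needs_hw_steps(enum uc_fw_state state)
{
	return state != UC_FW_NOT_SUPPORTED;
}
```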
To reduce the number of explicit dev_priv->uncore calls in the display
code ahead of the introduction of dev_priv->de_uncore, this patch
introduces a wrapper for one of the main usages of it, the register
waits. When we transition to the new uncore, we can just update the
wrapper to point to the appropriate structure.
Since the vast majority of waits are on a set or clear of a bit or mask,
add set & clear flavours of the wrapper to simplify the code.
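The shape of such a wrapper, with set and clear flavours layered on one generic wait, might look like the sketch below. The register file is faked with an array, the timeout is a simple retry count, and every name is hypothetical rather than the driver's.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for an MMIO register file. */
uint32_t fake_regs[4];

uint32_t reg_read(unsigned int reg)
{
	return fake_regs[reg];
}

/* Poll until (reg & mask) == value; give up after `tries` reads. */
int wait_for_register(unsigned int reg, uint32_t mask, uint32_t value,
		      unsigned int tries)
{
	while (tries--) {
		if ((reg_read(reg) & mask) == value)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Wait for every bit in `mask` to be set. */
int wait_for_set(unsigned int reg, uint32_t mask, unsigned int tries)
{
	return wait_for_register(reg, mask, mask, tries);
}

/* Wait for every bit in `mask` to be clear. */
int wait_for_clear(unsigned int reg, uint32_t mask, unsigned int tries)
{
	return wait_for_register(reg, mask, 0, tries);
}
```

Because callers only see the wrappers, switching the backing register accessor later means changing reg_read() in one place.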
Chris Wilson [Fri, 16 Aug 2019 17:16:08 +0000 (18:16 +0100)]
drm/i915/execlists: Lift process_csb() out of the irq-off spinlock
If we only call process_csb() from the tasklet, though we lose the
ability to bypass ksoftirqd interrupt processing on direct submission
paths, we can push it out of the irq-off spinlock.
The penalty is that we then allow schedule_out to be called concurrently
with schedule_in requiring us to handle the usage count (baked into the
pointer itself) atomically.
As we do kick the tasklets (via local_bh_enable()) after our submission,
there is a possibility there to see if we can pull the local softirq
processing back from the ksoftirqd.
v2: Store the 'switch_priority_hint' on submission, so that we can
safely check during process_csb().
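To illustrate what a usage count baked into the pointer itself can mean, here is a sketch using C11 atomics. It assumes the pointee is at least 4-byte aligned so the two low bits are free; the type and function names are hypothetical and do not mirror the driver's helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define COUNT_BITS 2u
#define COUNT_MASK (((uintptr_t)1 << COUNT_BITS) - 1)

/* Pointer and usage count share one word, updated atomically. */
typedef _Atomic uintptr_t counted_ptr;

void counted_ptr_init(counted_ptr *slot, void *ptr)
{
	assert(((uintptr_t)ptr & COUNT_MASK) == 0); /* low bits must be free */
	atomic_init(slot, (uintptr_t)ptr);
}

void *counted_ptr_get(counted_ptr *slot)
{
	return (void *)(atomic_load(slot) & ~COUNT_MASK);
}

unsigned int counted_ptr_count(counted_ptr *slot)
{
	return (unsigned int)(atomic_load(slot) & COUNT_MASK);
}

/* The "schedule in" side: take one more use. */
void counted_ptr_inc(counted_ptr *slot)
{
	uintptr_t old = atomic_fetch_add(slot, 1);

	assert((old & COUNT_MASK) < COUNT_MASK); /* would spill into the pointer */
	(void)old;
}

/* The "schedule out" side: returns true when the last use is dropped. */
bool counted_ptr_dec(counted_ptr *slot)
{
	uintptr_t old = atomic_fetch_sub(slot, 1);

	assert((old & COUNT_MASK) > 0);
	return (old & COUNT_MASK) == 1;
}
```

Since increment and decrement are single atomic read-modify-write operations, the two sides can run concurrently without a lock.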
Chris Wilson [Fri, 16 Aug 2019 12:10:00 +0000 (13:10 +0100)]
drm/i915: Markup expected timeline locks for i915_active
As every i915_active_request should be serialised by a dedicated lock,
i915_active consists of a tree of locks; one for each node. Mark up
the i915_active_request with what lock is supposed to be guarding it so
that we can verify that the serialised updates are indeed serialised.
Chris Wilson [Fri, 16 Aug 2019 12:09:59 +0000 (13:09 +0100)]
drm/i915/gt: Mark context->active_count as protected by timeline->mutex
We use timeline->mutex to protect modifications to
context->active_count, and the associated enable/disable callbacks.
Due to complications with engine-pm barrier there is a path where we used
a "superlock" to provide serialised protection and so could not
unconditionally assert with lockdep that it was always held. However,
we can mark the mutex as taken (noting that we may be nested underneath
ourselves) which means we can be reassured the right timeline->mutex is
always treated as held and let lockdep roam free.
Michal Wajdeczko [Fri, 16 Aug 2019 10:54:59 +0000 (10:54 +0000)]
drm/i915/wopcm: Try to use already locked WOPCM layout
If WOPCM layout is already locked in HW we shouldn't continue
with our own partitioning as it could be likely different and
we will be unable to enforce it and fail. Instead we should try
to reuse what is already programmed, maybe there will be a fit.
This should enable us to reload driver with slightly different
HuC firmware (or even without HuC) without need to reboot.
v2: reordered/rebased
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816105501.31020-4-michal.wajdeczko@intel.com
Michał Winiarski [Fri, 16 Aug 2019 10:54:57 +0000 (10:54 +0000)]
drm/i915/uc: Move FW size sanity check back to fetch
While we need to know WOPCM size to do this sanity check, it has more to
do with FW than with WOPCM. Let's move the check to fetch phase, it's
not like WOPCM is going to grow in the meantime.
v2: rebased
v3: use __intel_uc_fw_get_upload_size (Daniele)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jackie Li <yaodong.li@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816105501.31020-2-michal.wajdeczko@intel.com
Matthew Auld [Fri, 16 Aug 2019 10:53:57 +0000 (11:53 +0100)]
drm/i915/buddy: use kmemleak_update_trace
Since nodes are cached in a free-list, and potentially marked as free
without actually being destroyed, thus allowing them to be
opportunistically re-allocated, we should apply kmemleak_update_trace
every time a node is given a new owner and marked as allocated, to aid
in debugging.
Chris Wilson [Fri, 16 Aug 2019 07:46:35 +0000 (08:46 +0100)]
drm/i915: Extract intel_frontbuffer active tracking
Move the active tracking for the frontbuffer operations out of the
i915_gem_object and into its own first class (refcounted) object. In the
process of detangling, we switch from low level request tracking to the
easier i915_active -- with the plan that this avoids any potential
atomic callbacks as the frontbuffer tracking wishes to sleep as it
flushes.
Chris Wilson [Thu, 15 Aug 2019 20:57:09 +0000 (21:57 +0100)]
drm/i915: Protect request retirement with timeline->mutex
Forgo the struct_mutex requirement for request retirement as we have
been transitioning over to only using the timeline->mutex for
controlling the lifetime of a request on that timeline.
Chris Wilson [Thu, 15 Aug 2019 20:57:08 +0000 (21:57 +0100)]
drm/i915/gt: Guard timeline pinning without relying on struct_mutex
In preparation for removing struct_mutex from around context retirement,
we need to make timeline pinning and unpinning safe. Since multiple
engines/contexts can share a single timeline, we cannot rely on
borrowing the context mutex (otherwise we could state that the timeline
is only pinned/unpinned inside the context pin/unpin and so guarded by
it). However, we only perform a sequence of atomic operations inside the
timeline pin/unpin and the sequence of those operations is safe for a
concurrent unpin / pin, so we can relax the struct_mutex requirement.
Chris Wilson [Thu, 15 Aug 2019 20:57:07 +0000 (21:57 +0100)]
drm/i915/gt: Convert timeline tracking to spinlock
Convert the active_list manipulation of timelines to use spinlocks so
that we can perform the updates from underneath a quick interrupt
callback, if need be.
Chris Wilson [Thu, 15 Aug 2019 04:20:30 +0000 (05:20 +0100)]
drm/i915: Move tasklet kicking to __i915_request_queue caller
Since __i915_request_queue() may be called from hardirq (timer) context,
we cannot use local_bh_disable/enable at the lower level. As we do want
to kick the tasklet to speed up initial submission or preemption for
normal client submission, lift it to the normal process context
callpath.
Mika Kuoppala [Thu, 15 Aug 2019 09:49:29 +0000 (12:49 +0300)]
drm/i915/icl: Add gen11 specific render breadcrumbs
Flush according to what gen11 expects when writing
breadcrumbs. As only the seqno write + flush differs
between engines and gens, enclose the footer in a
helper.
v2: avoid problem of sane local naming by not using them
Mika Kuoppala [Thu, 15 Aug 2019 08:30:53 +0000 (11:30 +0300)]
drm/i915/icl: Implement gen11 flush including tile cache
Add tile cache flushing for gen11. To relieve us from the
burden of previous obsolete workarounds, make a dedicated
flush/invalidate callback for gen11.
To fortify an independent single flush, do post
sync op as there are indications that without it
we don't flush everything. This should also make this
callback more readily usable in tgl (see l3 fabric flush).
Dan reported the following static checker warning:
drivers/gpu/drm/i915/selftests/i915_buddy.c:670 igt_buddy_alloc_range()
error: we previously assumed 'block' could be null (see line 665)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815103210.11802-1-matthew.auld@intel.com
Chris Wilson [Thu, 15 Aug 2019 09:36:04 +0000 (10:36 +0100)]
drm/i915: Convert a few more bland dmesg info to be device specific
Looking around the GT initialisation, we have a few log messages we
think are interesting enough to present to the user (such as the amount of L4
cache) and a few to inform them of the result of actions or conflicting
HW restrictions (i.e. quirks). These are device specific messages, so
use the dev family of printk.
Chris Wilson [Tue, 13 Aug 2019 20:09:05 +0000 (21:09 +0100)]
drm/i915: Serialise read/write of the barrier's engine
We use the request pointer inside the i915_active_node as the indicator
of the barrier's status; we mark it as used during
i915_request_add_active_barriers(), and search for an available barrier
in reuse_idle_barrier(). That check must be carefully serialised to
ensure we do use an engine for the barrier and not just a random
pointer. (Along the other reuse path, we are fully serialised by the
timeline->mutex.) The acquisition of the barrier itself is ordered through
the strong memory barrier in llist_del_all().
Chris Wilson [Tue, 13 Aug 2019 18:21:12 +0000 (19:21 +0100)]
drm/i915: Disregard drm_mode_config.fb_base
The fb_base is only used for communicating the GTT BAR from one piece of
the display code (kms setup) to another (fbdev). What is required in the
fbdev is just the aperture address which should be derived from the
bo we allocate for the framebuffer directly.
The same appears true for drm/; it is not used by the core or the uAPI,
it is merely for conveniently passing a device address from one bit of
display management code to another.
v2: Note that since we only expose enough of a system map to cover our
single framebuffer, the screen_base/size and the smem are one and the
same.
The engine->guc_id is GuC FW defined and it is not guaranteed to be
below I915_NUM_ENGINES, so we shouldn't use it with the i915-defined
client->submissions, as we might overflow.
Instead of fixing it, just get rid of client->submissions, because the
information we get from it is not interesting anymore now that we only
have 1 client.
A new macro that is going to be added in a further patch will need to
adjust the offset returned by _MMIO_TRANS2(), so here adding
_TRANS2() and moving most of the implementation of _MMIO_TRANS2() to
it and while at it taking the opportunity to rename pipe to trans.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiya@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiya@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730224753.14907-2-jose.souza@intel.com
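The refactoring pattern described in this message, a raw-offset macro with a typed wrapper on top, can be sketched as follows. The offsets, the table and the macro names (TRANS2_OFFSET, MMIO_TRANS2) are hypothetical stand-ins, not the values or identifiers used by the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-transcoder base offsets; not real register values. */
const uint32_t trans_offsets[] = { 0x60000, 0x61000, 0x62000 };
#define TRANS_A_OFFSET 0x60000u

/* Typed register handle, so raw integers are not passed around by accident. */
typedef struct { uint32_t reg; } mmio_reg_t;

/* Raw offset of `reg` (defined for transcoder A) on transcoder `trans`. */
#define TRANS2_OFFSET(trans, reg) \
	(trans_offsets[(trans)] - TRANS_A_OFFSET + (reg))

/* Typed wrapper kept for existing callers. */
#define MMIO_TRANS2(trans, reg) ((mmio_reg_t){ TRANS2_OFFSET(trans, reg) })
```

With the raw offset available on its own, a later macro can add or subtract an adjustment before wrapping the result in the typed handle.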
Chris Wilson [Tue, 13 Aug 2019 19:07:05 +0000 (20:07 +0100)]
drm/i915: Push the wakeref->count deferral to the backend
If the backend wishes to defer the wakeref parking, make it responsible
for unlocking the wakeref (i.e. bumping the counter). This allows it to
time the unlock much more carefully in case it happens to need the
wakeref to be active during its deferral.
For instance, during engine parking we may choose to emit an idle
barrier (a request). To do so, we borrow the engine->kernel_context
timeline and to ensure exclusive access we keep the
engine->wakeref.count as 0. However, to submit that request to HW may
require an intel_engine_pm_get() (e.g. to keep the submission tasklet
alive) and before we allow that we have to rewake our wakeref to avoid a
recursive deadlock.
drm/i915/tgl: Fix missing parentheses on TGL_TRANS_DDI_FUNC_CTL_VAL_TO_PORT
In this case we want to apply the mask and then shift so the
parentheses are needed.
SPANK! SPANK! SPANK! Naughty programmer!
Fixes: 9749a5b6c09f ("drm/i915/tgl: Fix the read of the DDI that transcoder is attached to")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190812175405.14479-1-jose.souza@intel.com
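The class of bug fixed here comes from C operator precedence: '>>' binds tighter than '&', so a mask-then-shift written without parentheses shifts the mask first. The sketch below uses illustrative mask and shift values and hypothetical macro names; the first macro spells out how the compiler groups the unparenthesised form.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_SHIFT 27                 /* illustrative field position */
#define PORT_MASK  (0xfu << PORT_SHIFT)

/*
 * How "val & PORT_MASK >> PORT_SHIFT" is parsed: the shift is applied to
 * the mask, so the field is never extracted.
 */
#define VAL_TO_PORT_AS_PARSED(val) ((val) & (PORT_MASK >> PORT_SHIFT))

/* Intended form: apply the mask, then shift the field down. */
#define VAL_TO_PORT(val) (((val) & PORT_MASK) >> PORT_SHIFT)
```

For a value with the field set to 3, the intended form yields 3 while the mis-grouped form yields 0, because the field bits never overlap the shifted-down mask.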
Gao, Fred [Thu, 18 Jul 2019 01:39:01 +0000 (09:39 +0800)]
drm/i915/gvt: Utility for valid command length check
Add utility for valid command length check.
v2: Add F_VAL_CONST flag to identify the value is const
although LEN may be variable. (Zhenyu)
v3: unused code removal, flag rename/conflict. (Zhenyu)
v4: redefine F_IP_ADVANCE_CUSTOM and move the check function to
next patch. (Zhenyu)
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Gao, Fred <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Zhi Wang [Mon, 22 Jul 2019 11:07:07 +0000 (14:07 +0300)]
drm/i915/gvt: factor out tlb and mocs register offset table
Factor out tlb and mocs register offset table to fix the issues reported
by klocwork, #512 and #550. Klocwork reports these problems mainly
because platforms that have more rings than the ring offset table has
entries can take dirty data from the stack as the register offset. In
that scenario, a context switch between vGPUs ends up writing to a
random HW register offset.
After the factoring, the ring offset table of TLB and MOCS should be per
platform.
v2:
- Enable TLB register switch for GEN8. (Zhenyu)
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
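A per-platform table that carries its own length makes the out-of-range case explicit, as in the sketch below. The struct, the function name and the choice of -EINVAL are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-platform table: offsets plus the number of valid entries. */
struct ring_reg_table {
	const uint32_t *offsets;
	size_t count;
};

/* Look up the register offset for `ring`; refuse rings the table lacks. */
int ring_reg_offset(const struct ring_reg_table *table, unsigned int ring,
		    uint32_t *offset)
{
	if (ring >= table->count)
		return -EINVAL; /* never read past the end of the table */

	*offset = table->offsets[ring];
	return 0;
}
```

A platform with more rings than table entries now gets an error instead of an offset read from whatever follows the table in memory.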
drm/i915/gvt: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Because there is no need to check these functions, a number of local
functions can be made to return void to simplify things as nothing can
fail.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gvt-dev@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Michal Wajdeczko [Tue, 13 Aug 2019 08:15:59 +0000 (08:15 +0000)]
drm/i915/uc: Log fw status changes only under debug config
We don't care about internal firmware status changes unless
we are doing some real debugging. Note that our CI is not
using DRM_I915_DEBUG_GUC config by default so use it.
Chris Wilson [Mon, 12 Aug 2019 20:36:26 +0000 (21:36 +0100)]
drm/i915/guc: Use a local cancel_port_requests
Since execlists and the guc have diverged in their port tracking, we
cannot simply reuse the execlists cancellation code as it leads to
unbalanced reference counting. Use a local, simpler routine for the guc.
We rely on the tasklet to update the GT PM refcount, so we can't disable
it even if we've processed all the requests for the engine because we
might have detected the request completion before the interrupt arrived.
Since on all platforms on which we plan to support guc submission we
don't allow disabling the breadcrumb interrupts, we can further simplify
the park/unpark flow by removing the interrupt pin/unpin. A BUG_ON has
been added to catch changes to this flow that would require us to
restore some kind of pinning.
v2: split removal of engine_pin/unpin_breadcrumbs_irq to its own
patch (chris)
Chris Wilson [Mon, 12 Aug 2019 17:48:04 +0000 (18:48 +0100)]
drm/i915/overlay: Switch to using i915_active tracking
Remove the raw i915_active_request tracking in favour of the higher
level i915_active tracking for the sole purpose of making the lockless
transition easier in later patches.
Chris Wilson [Mon, 12 Aug 2019 17:48:03 +0000 (18:48 +0100)]
drm/i915: Forgo last_fence active request tracking
We were using the last_fence to track the last request that used this
vma that might be interpreted by a fence register and forced ourselves
to wait for this request before modifying any fence register that
overlapped our vma. Due to the requirement that we need to track any XY_BLT
command, linear or tiled, this in effect meant that we have to track the
vma for its active lifespan anyway, so we can forgo the explicit
last_fence tracking and just use the whole vma->active.
Another solution would be to pipeline the register updates, and would
help resolve some long running stalls for gen3 (but only gen 2 and 3!)
Andi Shyti [Sun, 11 Aug 2019 21:06:33 +0000 (22:06 +0100)]
drm/i915: Extract general GT interrupt handlers
i915_irq.c is large. It serves as the central dispatch and handler for
all of our device interrupts. Let's break it up by pulling out the GT
interrupt handlers.
i915_irq.c is large. It serves as the central dispatch and handler for
all of our device interrupts. Pull out the GT pm interrupt handling
(leaving the central dispatch) so that we can encapsulate the logic a
little better.
Chris Wilson [Mon, 12 Aug 2019 09:10:38 +0000 (10:10 +0100)]
drm/i915/execlists: Avoid sync calls during park
Since we allow ourselves to use non-process context during parking, we
cannot allow ourselves to sleep and in particular cannot call
del_timer_sync() -- but we can use a plain del_timer().
Anshuman Gupta [Sun, 11 Aug 2019 10:02:32 +0000 (15:32 +0530)]
drm/i915/tgl: Fixing up list of PG3 power domains.
The DDI-IO power wells (PWR_WELL_CTL_DDI) are backing
the IO/PHY functionality, which doesn't need the PG3
power well. Accordingly fixing up the list of
PG3 power domains.
Anshuman Gupta [Sun, 11 Aug 2019 08:19:08 +0000 (13:49 +0530)]
drm/i915/icl: Remove DDI IO power domain from PG3 power domains
The DDI-IO power wells (PWR_WELL_CTL_DDI) are backing
the IO/PHY functionality, which doesn't need the PG3
power well. Accordingly fixing up the list of
PG3 power domains.
v2: Removed "DDI E/F IO" power domain as well [Imre]
Michal Wajdeczko [Sun, 11 Aug 2019 19:51:32 +0000 (19:51 +0000)]
drm/i915/uc: Use -EIO code for GuC initialization failures
Since commit 6ca9a2beb54a ("drm/i915: Unwind i915_gem_init() failure")
we believed that we correctly handle all errors encountered during
GuC initialization, including special one that indicates request to
run driver with disabled GPU submission (-EIO).
Unfortunately since commit 121981fafe69 ("drm/i915/guc: Combine
enable_guc_loading|submission modparams") we stopped using that
error code to avoid unwanted fallback to execlist submission mode.
As a result, any GuC initialization failure was treated as non-recoverable
error leading to driver load abort, so we could not even read related
GuC error log to investigate cause of the problem.
For now always return -EIO on any uC hardware related failure.
Michal Wajdeczko [Mon, 12 Aug 2019 07:39:49 +0000 (07:39 +0000)]
drm/i915/uc: Include HuC firmware version in summary
After successful uC initialization we are reporting GuC
firmware version and status of GuC submission and HuC.
Add HuC fw version to this report to make it complete,
but also skip all HuC info if HuC is not supported.
Chris Wilson [Sat, 10 Aug 2019 09:03:28 +0000 (10:03 +0100)]
drm/i915: Remove unused debugfs/i915_emon_status
Before we start upon our great GT interrupt refactor, throw out the
cruft! In this case, it is an unloved debugfs showing the current ips
status, a fairly meaningless bunch of numbers that we are not checking.
Matthew Auld [Fri, 9 Aug 2019 20:29:24 +0000 (21:29 +0100)]
drm/i915: buddy allocator
Simple buddy allocator. We want to allocate properly aligned
power-of-two blocks to promote usage of huge-pages for the GTT, so 64K,
2M and possibly even 1G. While we do support allocating stuff at a
specific offset, it is more intended for preallocating portions of the
address space, say for an initial framebuffer, for other uses drm_mm is
probably a much better fit. Anyway, hopefully this can all be thrown
away if we eventually move to having the core MM manage device memory.
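Two small calculations sit at the heart of any buddy scheme: rounding a request up to a power-of-two order, and finding a block's buddy. The sketch below shows the classic XOR formulation with hypothetical helper names; the driver's own implementation may organise this differently.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest order such that (chunk << order) >= size. */
unsigned int buddy_order_for(uint64_t size, uint64_t chunk)
{
	unsigned int order = 0;

	assert(size > 0 && chunk > 0);
	while ((chunk << order) < size)
		order++;
	return order;
}

/* Offset of the buddy of the block at `offset` with the given order. */
uint64_t buddy_of(uint64_t offset, unsigned int order, uint64_t chunk)
{
	/* Buddies differ only in the bit that equals the block size. */
	return offset ^ (chunk << order);
}
```

With a 4K minimum chunk, a 64K request maps to order 4 and a 2M request to order 9, which is what keeps such blocks naturally aligned for huge-page use.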
Matthew Auld [Sat, 10 Aug 2019 17:43:38 +0000 (18:43 +0100)]
drm/i915/blt: support copying objects
We can already clear an object with the blt, so try to do the same to
support copying from one object backing store to another. Really this is
just object -> object, which is not that useful yet, what we really want
is two backing stores, but that will require some vma rework first,
otherwise we are stuck with "tmp" objects.
Matthew Auld [Fri, 9 Aug 2019 19:34:56 +0000 (20:34 +0100)]
drm/i915/gtt: disable 2M pages for pre-gen11
We currently disable THP(Transparent-Huge-Pages) for our shmem objects
due to a performance regression with read BW in some internal
benchmarks. Given that this is our main source of 2M pages, there really
isn't much point in enabling 2M GTT pages, especially as that comes at
the cost of disabling the GTT cache. However from gen11 it looks like we
should hopefully see the HW issue resolved. Given this opt for only
enabling 2M GTT pages from gen11 onwards.
Matthew Auld [Fri, 9 Aug 2019 19:34:55 +0000 (20:34 +0100)]
drm/i915/gtt: enable GTT cache by default
For some platforms the GTT cache is by default not enabled, and
currently where we explicitly enable it, we make it conditional on 2M GTT
page support, since the BSpec states that we must disable it if we
enable 2M/1G pages. To make this more consistent opt for blanket
enabling the GTT cache for all relevant gens in a single place, while
still keeping the same behaviour of checking for 2M support.
BSpec: 9314
BSpec: 423
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809193456.3836-1-matthew.auld@intel.com
Matthew Auld [Sat, 10 Aug 2019 10:50:08 +0000 (11:50 +0100)]
drm/i915/selftests: move gpu-write-dw into utils
Using the gpu to write to some dword over a number of pages is rather
useful, and we already have two copies of such a thing, and we don't
want a third so move it to utils. There is probably some other stuff
also...
Matthew Auld [Sat, 10 Aug 2019 09:29:45 +0000 (10:29 +0100)]
drm/i915/blt: bump the size restriction
As pointed out by Chris, with our current approach we are actually
limited to S16_MAX * PAGE_SIZE for our size when using the blt to clear
pages. Keeping things simple try to fix this by reducing the copy to a
sequence of S16_MAX * PAGE_SIZE blocks.
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: hide the details of the engine pool inside emit_vma]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190810092945.2762-1-chris@chris-wilson.co.uk
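Splitting a large operation into blocks of at most S16_MAX pages can be sketched as below. A 4K page size is assumed, INT16_MAX from stdint.h stands in for the kernel's S16_MAX, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLT_PAGE_SIZE 4096ull                      /* assumed page size */
#define BLT_BLOCK_MAX (INT16_MAX * BLT_PAGE_SIZE)  /* S16_MAX * PAGE_SIZE */

/* Size of the next block to emit for `remaining` bytes. */
uint64_t blt_next_block(uint64_t remaining)
{
	return remaining < BLT_BLOCK_MAX ? remaining : BLT_BLOCK_MAX;
}

/* Number of blocks needed to cover `size` bytes. */
unsigned int blt_count_blocks(uint64_t size)
{
	unsigned int count = 0;

	while (size) {
		size -= blt_next_block(size);
		count++;
	}
	return count;
}
```

Each iteration corresponds to one blt command covering at most the per-command limit, so the total size is bounded only by the number of commands emitted.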
Matthew Auld [Sat, 10 Aug 2019 09:17:47 +0000 (10:17 +0100)]
drm/i915/blt: don't assume pinned intel_context
Currently we just pass in bcs0->engine_context so it matters not, but in
the future we may want to pass in something that is not a
kernel_context, so try to be a bit more generic.
Multiple uncore structures will share the debug infrastructure, so
move it to a common place and add extra locking around it.
Also, since we now have a separate object, it is cleaner to have
dedicated functions working on the object to stop and restart the
mmio debug. Apart from the cosmetic changes, this patch introduces
2 functional updates:
- All calls to check_for_unclaimed_mmio will now return false when
the debug is suspended, not just the ones that are active only when
i915_modparams.mmio_debug is set. If we don't trust the result of the
check while a user is doing mmio access then we shouldn't attempt the
check anywhere.
- i915_modparams.mmio_debug is not save/restored anymore around user
access. The value is now never touched by the kernel while debug is
disabled so no need for save/restore.
The filesystem reconfigure API is undergoing a transition, breaking our
current code. As we only set the default options, we can simply remove
the call to s_op->remount_fs(). In the future, when HW permits, we can
try re-enabling huge page support, albeit as suggested with new per-file
controls.
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808172226.18306-1-chris@chris-wilson.co.uk
Chris Wilson [Fri, 9 Aug 2019 18:25:18 +0000 (19:25 +0100)]
drm/i915: Lift timeline into intel_context
Move the timeline from being inside the intel_ring to intel_context
itself. This saves much pointer dancing and makes the relations of the
context to its timeline much clearer.
Chris Wilson [Fri, 9 Aug 2019 18:25:16 +0000 (19:25 +0100)]
drm/i915/gt: Make deferred context allocation explicit
Refactor the backends to handle the deferred context allocation in a
consistent manner, and allow calling it as an explicit first step in
pinning a context for the first time. This should make it easier for
backends to keep track of partially constructed contexts from
initialisation.
Chris Wilson [Fri, 9 Aug 2019 18:25:15 +0000 (19:25 +0100)]
drm/i915: Remove i915_gem_context_create_gvt()
As we are phasing out using the GEM context for internal clients that
need to manipulate logical context state directly, remove the
constructor for the GVT context. We are not using it for anything other
than default setup and allocation of an i915_ppgtt.
Chris Wilson [Thu, 8 Aug 2019 07:41:52 +0000 (08:41 +0100)]
drm/i915: Drop the fudge warning on ring restart for ctg/elk
Since we have already stopped the ring, cleared the ring, disabled the
ring (and verified the ring is clear), a later debug message that the
ring is no longer clear serves no function. It appears it restarts
anyway, and we verify that the ring started correctly afterwards.
Chris Wilson [Fri, 9 Aug 2019 09:10:09 +0000 (10:10 +0100)]
drm/i915: Replace global bsd_dispatch_index with random seed
We keep a global seed for the legacy BSD round-robin selector, but in
our testing of multiple simultaneous client workloads, a random seed
spreads the load more evenly. (As even as an initial round-robin selector
can be!) Removing the global is one less variable we have to find a home
for!
We can simulate multi-client (both same and mixed workloads) using
igt/gem_wsim to work out optimal strategies and then compare our
simulation with the actual transcoder on multi-engine machines. This
fixed round-robin turns out to be one of the worst methods.
No user is advised to use this method; the current suggestion is to use
a virtual engine for agnostic batches, randomised submission or using
the busyness tracking to select the most idle engine at the time of
dispatch. At the present time, intel-media is explicit, but libva still
seems to use it, with the exception of batches that must execute on vcs0.
Oh well.
Chris Wilson [Fri, 9 Aug 2019 12:31:53 +0000 (13:31 +0100)]
drm/i915: Check for a second VCS engine more carefully
To use the legacy BSD selector, you must have a second VCS engine, or
else the ABI simply maps the request for another engine onto VCS0.
However, we only checked a single VCS1 location, overlooking the
possibility of a sparse VCS set being mapped to the dense ABI.
v2: num_vcs_engines() turns out to be reusable; futureproof it so we
never have to worry about this silly bit of ABI again!
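To illustrate why checking a single VCS1 slot is not enough, the following
userspace sketch counts every VCS engine present in an engine mask. The bit
layout (VCS0 at bit 2, four possible VCS slots) and the helper names
bit_range(), popcount64(), count_vcs_engines() and has_second_vcs_naive()
are assumptions made for this example, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical engine-bit layout, for illustration only. */
enum { RCS0 = 0, BCS0 = 1, VCS0 = 2, MAX_VCS = 4, VECS0 = VCS0 + MAX_VCS };

/* Mask covering bits [lo, hi] inclusive. */
static uint64_t bit_range(unsigned int hi, unsigned int lo)
{
	assert(hi >= lo && hi < 64);
	return (~UINT64_C(0) >> (63 - hi)) & (~UINT64_C(0) << lo);
}

/* Count set bits by clearing the lowest set bit until none remain. */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Count all VCS engines present, wherever they sit in the VCS range. */
static unsigned int count_vcs_engines(uint64_t engine_mask)
{
	return popcount64(engine_mask & bit_range(VCS0 + MAX_VCS - 1, VCS0));
}

/* Naive check: looks only at the VCS1 slot and misses sparse sets. */
static int has_second_vcs_naive(uint64_t engine_mask)
{
	return (int)((engine_mask >> (VCS0 + 1)) & 1);
}
```

With a sparse set such as VCS0 plus VCS2, count_vcs_engines() reports two
engines while the naive VCS1 check reports none.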
Chris Wilson [Fri, 9 Aug 2019 07:37:23 +0000 (08:37 +0100)]
drm/i915/execlists: Backtrack along timeline
After a preempt-to-busy, we may find an active request that is caught
between execution states. Walk back along the timeline instead of the
execution list to be safe.
[ 106.417541] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 106.417659] ==================================================================
[ 106.418041] BUG: KASAN: slab-out-of-bounds in __execlists_reset+0x2f2/0x440 [i915]
[ 106.418123] Read of size 8 at addr ffff888703506b30 by task swapper/1/0
[ 106.418194]
[ 106.418267] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U 5.3.0-rc3+ #5
[ 106.418344] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[ 106.418434] Call Trace:
[ 106.418508] <IRQ>
[ 106.418585] dump_stack+0x5b/0x90
[ 106.418941] ? __execlists_reset+0x2f2/0x440 [i915]
[ 106.419022] print_address_description+0x67/0x32d
[ 106.419376] ? __execlists_reset+0x2f2/0x440 [i915]
[ 106.419731] ? __execlists_reset+0x2f2/0x440 [i915]
[ 106.419810] __kasan_report.cold.6+0x1a/0x3c
[ 106.419888] ? __trace_bprintk+0xc0/0xd0
[ 106.420239] ? __execlists_reset+0x2f2/0x440 [i915]
[ 106.420318] check_memory_region+0x144/0x1c0
[ 106.420671] __execlists_reset+0x2f2/0x440 [i915]
[ 106.421029] execlists_reset+0x3d/0x50 [i915]
[ 106.421387] intel_engine_reset+0x203/0x3a0 [i915]
[ 106.421744] ? igt_reset_nop+0x2b0/0x2b0 [i915]
[ 106.421825] ? _raw_spin_trylock_bh+0xe0/0xe0
[ 106.421901] ? rcu_core+0x1b9/0x6a0
[ 106.422251] preempt_reset+0x9a/0xf0 [i915]
[ 106.422333] tasklet_action_common.isra.15+0xc0/0x1e0
[ 106.422685] ? execlists_submit_request+0x200/0x200 [i915]
[ 106.422764] __do_softirq+0x106/0x3cf
[ 106.422840] irq_exit+0xdc/0xf0
[ 106.422914] smp_apic_timer_interrupt+0x81/0x1c0
[ 106.422988] apic_timer_interrupt+0xf/0x20
[ 106.423059] </IRQ>
[ 106.423144] RIP: 0010:cpuidle_enter_state+0xc3/0x620
[ 106.423222] Code: 24 0f 1f 44 00 00 31 ff e8 da 87 9c ff 80 7c 24 10 00 74 12 9c 58 f6 c4 02 0f 85 33 05 00 00 31 ff e8 c1 77 a3 ff fb 45 85 e4 <0f> 89 bf 02 00 00 48 8d 7d 10 e8 4e 45 b9 ff c7 45 10 00 00 00 00
[ 106.423311] RSP: 0018:ffff88881c30fda8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 106.423390] RAX: 0000000000000000 RBX: ffffffff825b4c80 RCX: ffffffff810c8a00
[ 106.423465] RDX: dffffc0000000000 RSI: 0000000039f89620 RDI: ffff88881f6b00a8
[ 106.423540] RBP: ffff88881f6b5bf8 R08: 0000000000000002 R09: 000000000002ed80
[ 106.423616] R10: 0000003fdd956146 R11: ffff88881c2d1e47 R12: 0000000000000008
[ 106.423691] R13: 0000000000000008 R14: ffffffff825b4f80 R15: ffffffff825b4fc0
[ 106.423772] ? sched_idle_set_state+0x20/0x30
[ 106.423851] ? cpuidle_enter_state+0xa6/0x620
[ 106.423874] ? tick_nohz_idle_stop_tick+0x1d1/0x3f0
[ 106.423896] cpuidle_enter+0x37/0x60
[ 106.423919] do_idle+0x246/0x280
[ 106.423941] ? arch_cpu_idle_exit+0x30/0x30
[ 106.423964] ? __wake_up_common+0x46/0x240
[ 106.423986] cpu_startup_entry+0x14/0x20
[ 106.424009] start_secondary+0x1b0/0x200
[ 106.424031] ? set_cpu_sibling_map+0x990/0x990
[ 106.424054] secondary_startup_64+0xa4/0xb0
[ 106.424075]
[ 106.424096] Allocated by task 626:
[ 106.424119] save_stack+0x19/0x80
[ 106.424143] __kasan_kmalloc.constprop.7+0xc1/0xd0
[ 106.424165] kmem_cache_alloc+0xb2/0x1d0
[ 106.424277] i915_sched_lookup_priolist+0x1ab/0x320 [i915]
[ 106.424385] execlists_submit_request+0x73/0x200 [i915]
[ 106.424498] submit_notify+0x59/0x60 [i915]
[ 106.424600] __i915_sw_fence_complete+0x9b/0x330 [i915]
[ 106.424713] __i915_request_commit+0x4bf/0x570 [i915]
[ 106.424818] intel_engine_pulse+0x213/0x310 [i915]
[ 106.424925] context_close+0x22f/0x470 [i915]
[ 106.425033] i915_gem_context_destroy_ioctl+0x7b/0xa0 [i915]
[ 106.425058] drm_ioctl_kernel+0x131/0x170
[ 106.425081] drm_ioctl+0x2d9/0x4f1
[ 106.425104] do_vfs_ioctl+0x115/0x890
[ 106.425126] ksys_ioctl+0x35/0x70
[ 106.425147] __x64_sys_ioctl+0x38/0x40
[ 106.425169] do_syscall_64+0x66/0x220
[ 106.425191] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 106.425213]
[ 106.425234] Freed by task 0:
[ 106.425255] (stack is not available)
[ 106.425276]
[ 106.425297] The buggy address belongs to the object at ffff888703506a40
[ 106.425297] which belongs to the cache i915_priolist of size 104
[ 106.425321] The buggy address is located 136 bytes to the right of
[ 106.425321] 104-byte region [ffff888703506a40, ffff888703506aa8)
[ 106.425345] The buggy address belongs to the page:
[ 106.425367] page:ffffea001c0d4180 refcount:1 mapcount:0 mapping:ffff88873e1cf740 index:0xffff888703506e40 compound_mapcount: 0
[ 106.425391] flags: 0x8000000000010200(slab|head)
[ 106.425415] raw: 8000000000010200 ffffea0020192b88 ffff8888174b5450 ffff88873e1cf740
[ 106.425439] raw: ffff888703506e40 000000000010000e 00000001ffffffff 0000000000000000
[ 106.425464] page dumped because: kasan: bad access detected
[ 106.425486]
[ 106.425506] Memory state around the buggy address:
[ 106.425528] ffff888703506a00: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
[ 106.425551] ffff888703506a80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
[ 106.425573] >ffff888703506b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 106.425597] ^
[ 106.425619] ffff888703506b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 106.425642] ffff888703506c00: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
[ 106.425664] ==================================================================
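The backtracking described in this commit message can be sketched in
userspace as a walk over older requests on one timeline, stopping at the
first completed one. This is a simplified model under stated assumptions:
struct request, its prev pointer and oldest_incomplete() are hypothetical
and stand in for the driver's request and list structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: linked to the older request on its timeline. */
struct request {
	struct request *prev;	/* older request, or NULL at the start */
	bool completed;
};

/*
 * Walk back along the timeline from rq and return the oldest request
 * that has not completed. If the request before rq has completed (or
 * there is none), rq itself is returned.
 */
static struct request *oldest_incomplete(struct request *rq)
{
	struct request *active = rq;

	assert(rq != NULL);
	for (struct request *it = rq->prev; it != NULL; it = it->prev) {
		if (it->completed)
			break;
		active = it;
	}
	return active;
}
```

The point of the sketch is that the walk only follows the timeline links of
the request itself, so it never dereferences entries from a separate
execution list that may already have been freed.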
Chris Wilson [Fri, 9 Aug 2019 11:07:52 +0000 (12:07 +0100)]
drm/i915: Free the imported shmemfs file for phys objects
Matthew spotted that we lost the fput() for phys objects now that we are
not relying on the core to clean up the GEM object. (For the record, phys
objects import the shmemfs from their original set of pages and keep it
to provide swap space, but we never transform back into a shmem object.)
Reported-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: 0c159ffef628 ("drm/i915/gem: Defer obj->base.resv fini until RCU callback") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809110752.19763-1-chris@chris-wilson.co.uk
Jani Nikula [Thu, 8 Aug 2019 13:42:49 +0000 (16:42 +0300)]
drm/i915: extract i915_gem_shrinker.h from i915_drv.h
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
from i915_drv.h to avoid sprinkling includes all over the place; this
can be changed as a follow-up if necessary.
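A self-contained header with forward declarations, as described above,
might look like the following sketch. Everything here is hypothetical
(example_shrinker.h, struct example_device, example_shrink()); it only
shows the pattern and is not the content of the extracted i915 header. The
header and its implementation are shown in one block for brevity.

```c
/* ---- example_shrinker.h (hypothetical header) ---- */
#ifndef EXAMPLE_SHRINKER_H
#define EXAMPLE_SHRINKER_H

#include <stddef.h>	/* size_t: the only include the declarations need */

struct example_device;	/* forward declaration; defined elsewhere */

size_t example_shrink(struct example_device *dev, size_t target);

#endif /* EXAMPLE_SHRINKER_H */

/* ---- example_shrinker.c (hypothetical implementation) ---- */
#include <assert.h>

struct example_device {
	size_t reclaimable;
};

/* Release up to target units and return how many were released. */
size_t example_shrink(struct example_device *dev, size_t target)
{
	size_t freed;

	assert(dev != NULL);
	freed = target < dev->reclaimable ? target : dev->reclaimable;
	dev->reclaimable -= freed;
	return freed;
}
```

Because the prototype only takes a pointer to struct example_device, the
header compiles on its own without pulling in the full structure
definition.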
Jani Nikula [Thu, 8 Aug 2019 13:42:48 +0000 (16:42 +0300)]
drm/i915: extract gem/i915_gem_stolen.h from i915_drv.h
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
from i915_drv.h to avoid sprinkling includes all over the place; this
can be changed as a follow-up if necessary.
Jani Nikula [Thu, 8 Aug 2019 13:42:47 +0000 (16:42 +0300)]
drm/i915: extract i915_memcpy.h from i915_drv.h
It used to be handy that we only had a couple of headers, but over time
i915_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.
Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.