John Harrison [Thu, 22 Sep 2022 20:12:09 +0000 (13:12 -0700)]
drm/i915/guc: Enable compute scheduling on DG2
DG2 has issues. To work around one of these the GuC must schedule
apps in an exclusive manner across both RCS and CCS. That is, if a
context from app X is running on RCS then all CCS engines must sit
idle even if there are contexts from apps Y, Z, ... waiting to run. A
certain OS favours RCS to the total starvation of CCS. Linux does not.
Hence the GuC now has a scheduling policy setting to control this
abitration.
Chris Wilson [Fri, 16 Sep 2022 20:48:23 +0000 (13:48 -0700)]
drm/i915/gt: Bump the reset-failure timeout to 60s
If attempting to perform a GT reset takes long than 5 seconds (including
resetting the display for gen3/4), then we declare all hope lost and
discard all user work and wedge the device to prevent further
misbehaviour. 5 seconds is too short a time for such drastic action, as
we may be stuck on other timeouts and watchdogs. If we allow a little
bit longer before hitting the big red button, we should at the very
least capture other hung task indicators pointing towards the reason why
the reset was hanging; and allow more marginal cases the extra headroom
to complete the reset without further collateral damage.
Chris Wilson [Mon, 26 Sep 2022 15:50:18 +0000 (16:50 +0100)]
drm/i915/gt: Move scratch page into system memory on all platforms
The scratch page should never be accessed, and is only assigned as a
filler page to redirection invalid userspace access. It is not of a
performance concern and so we prefer to have a single consistent
configuration across all platforms, reducing the pressure on device
memory and avoiding the direct device access that would be required to
initialise the scratch page.
Chris Wilson [Mon, 26 Sep 2022 15:33:33 +0000 (16:33 +0100)]
drm/i915/gt: Use i915_vm_put on ppgtt_create error paths
Now that the scratch page and page directories have a reference back to
the i915_address_space, we cannot do an immediate free of the ppgtt upon
error as those buffer objects will perform a later i915_vm_put in their
deferred frees.
The downside is that by replacing the onion unwind along the error
paths, the ppgtt cleanup must handle a partially constructed vm. This
includes ensuring that the vm->cleanup is set prior to the error path.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6900 Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Fixes: 4d8151ae5329 ("drm/i915: Don't free shared locks while shared") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v5.14+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220926153333.102195-1-matthew.auld@intel.com
Chris Wilson [Wed, 21 Sep 2022 13:52:58 +0000 (15:52 +0200)]
drm/i915/gt: Restrict forced preemption to the active context
When we submit a new pair of contexts to ELSP for execution, we start a
timer by which point we expect the HW to have switched execution to the
pending contexts. If the promotion to the new pair of contexts has not
occurred, we declare the executing context to have hung and force the
preemption to take place by resetting the engine and resubmitting the
new contexts.
This can lead to an unfair situation where almost all of the preemption
timeout is consumed by the first context which just switches into the
second context immediately prior to the timer firing and triggering the
preemption reset (assuming that the timer interrupts before we process
the CS events for the context switch). The second context hasn't yet had
a chance to yield to the incoming ELSP (and send the ACk for the
promotion) and so ends up being blamed for the reset.
If we see that a context switch has occurred since setting the
preemption timeout, but have not yet received the ACK for the ELSP
promotion, rearm the preemption timer and check again. This is
especially significant if the first context was not schedulable and so
we used the shortest timer possible, greatly increasing the chance of
accidentally blaming the second innocent context.
Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption") Fixes: d12acee84ffb ("drm/i915/execlists: Cancel banned contexts on schedule-out") Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Tested-by: Andrzej Hajda <andrzej.hajda@intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220921135258.1714873-1-andrzej.hajda@intel.com
Matt Atwood [Tue, 20 Sep 2022 20:43:59 +0000 (13:43 -0700)]
drm/i915/dg2: introduce Wa_22015475538
Wa_22015475538 applies to all DG2 (and ATSM) skus. The workaround
implementation is identical to Wa_16011620976. LSC_CHICKEN_BIT_0_UDW is
a general render register instead of rcs so adding this move to the
proper wa init function.
Tvrtko Ursulin [Thu, 30 Jun 2022 12:57:16 +0000 (13:57 +0100)]
drm/i915/selftests: Remove flush_scheduled_work() from live_execlists
There are ongoing efforts to remove usages of flush_scheduled_work() from
drivers in order to avoid several cases of potentential problems when
flushing is done from certain contexts.
Remove the call from the live_execlists selftest. Its purpose was to be
thorough and sync with the execlists capture state handling, but that is
not strictly required for the test to function and can be removed.
Lucas De Marchi [Wed, 7 Sep 2022 23:08:41 +0000 (16:08 -0700)]
drm/i915: Noop lrc_init_wa_ctx() on recent/future platforms
Except for graphics version 8 and 9, nothing is done in
lrc_init_wa_ctx(). Assume this won't be needed on future platforms as
well and remove the warning.
Note that this function is not called for anything below version 8 since
those don't use either guc or execlist, i.e. HAS_EXECLISTS() is false.
Lucas De Marchi [Fri, 16 Sep 2022 17:36:08 +0000 (10:36 -0700)]
drm/i915/dgfx: Make failure to setup stolen non-fatal
There is no reason to consider the setup of Data Stolen Memory fatal on
dgfx and non-fatal on integrated. Move the debug and error propagation
around so both have the same behavior: non-fatal. Before this change,
loading i915 on a system with TGL + DG2 would result in just TGL
succeeding the initialization (without stolen).
Now loading i915 on the same system with an injected failure in
i915_gem_init_stolen():
$ dmesg | grep stolen
i915 0000:00:02.0: [drm] Injected failure, disabling use of stolen memory
i915 0000:00:02.0: [drm:init_stolen_smem [i915]] Skip stolen region: failed to setup
i915 0000:03:00.0: [drm] Injected failure, disabling use of stolen memory
i915 0000:03:00.0: [drm:init_stolen_lmem [i915]] Skip stolen region: failed to setup
Lucas De Marchi [Fri, 16 Sep 2022 17:36:07 +0000 (10:36 -0700)]
drm/i915: Split i915_gem_init_stolen()
Add some helpers: adjust_stolen(), request_smem_stolen_() and
init_reserved_stolen() that are now called by i915_gem_init_stolen() to
initialize each part of the Data Stolen Memory region.
Main goal is to split the reserved part within the stolen, also known as
WOPCM, as its calculation changes often per platform and is a big source
of confusion when handling stolen memory.
Lucas De Marchi [Fri, 16 Sep 2022 17:36:06 +0000 (10:36 -0700)]
drm/i915: Add missing mask when reading GEN12_DSMBASE
DSMBASE register is defined so BDSM bitfield contains the bits 63 to 20
of the base address of stolen. For the supported platforms bits 0-19 are
zero but that may not be true in future. Add the missing mask.
Matt Roper [Fri, 16 Sep 2022 01:43:45 +0000 (18:43 -0700)]
drm/i915: Split GAM and MSLICE steering
Although the bspec lists several MMIO ranges as "MSLICE," it turns out
that a subset of these are of a "GAM" subclass that has unique rules and
doesn't followed regular mslice steering behavior.
* Xe_HP SDV: GAM ranges must always be steered to 0,0. These
registers share the regular steering control register (0xFDC) with
other steering types
* DG2: GAM ranges must always be steered to 1,0. GAM registers have a
dedicated steering control register (0xFE0) so we can set the value
once at startup and rely on implicit steering. Technically the
hardware default should already be set to 1,0 properly, but it never
hurts to ensure that in the driver.
Nirmoy Das [Tue, 20 Sep 2022 17:06:28 +0000 (19:06 +0200)]
drm/i915: Do not cleanup obj with NULL bo->resource
For delayed BO release i915_ttm_delete_mem_notify()
gets called twice, once with proper bo->resource and
another time with NULL. We shouldn't do anything for
the 2nd time as we already cleaned up the obj once.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6850 Fixes: ad74457a6b5a96 ("drm/i915/dgfx: Release mmap on rpm suspend") Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220920170628.3391-1-nirmoy.das@intel.com
Ashutosh Dixit [Mon, 19 Sep 2022 16:24:01 +0000 (09:24 -0700)]
drm/i915: Perf_limit_reasons are only available for Gen11+
Register GT0_PERF_LIMIT_REASONS (0x1381a8) is available only for
Gen11+. Therefore ensure perf_limit_reasons sysfs/debugfs files are created
only for Gen11+. Otherwise on Gen < 5 accessing these files results in the
following oops:
<1> [88.829420] BUG: unable to handle page fault for address: ffffc90000bb81a8
<1> [88.829438] #PF: supervisor read access in kernel mode
<1> [88.829447] #PF: error_code(0x0000) - not-present page
Chris Wilson [Fri, 16 Sep 2022 09:24:03 +0000 (11:24 +0200)]
drm/i915/gem: Really move i915_gem_context.link under ref protection
i915_perf assumes that it can use the i915_gem_context reference to
protect its i915->gem.contexts.list iteration. However, this requires
that we do not remove the context from the list until after we drop the
final reference and release the struct. If, as currently, we remove the
context from the list during context_close(), the link.next pointer may
be poisoned while we are holding the context reference and cause a GPF:
v3: fix incorrect syntax of spin_lock() replacing spin_lock_irqsave()
v2: irqsave not required in a worker, neither conversion to irq safe
elsewhere (Tvrtko),
- perf: it's safe to call gen8_configure_context() even if context has
been closed, no need to check,
- drop unrelated cleanup (Andi, Tvrtko)
Reported-by: Mark Janes <mark.janes@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222
References: a4e7ccdac38e ("drm/i915: Move context management under GEM") Fixes: f8246cf4d9a9 ("drm/i915/gem: Drop free_work for GEM contexts") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.12+ Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220916092403.201355-3-janusz.krzysztofik@linux.intel.com
Due to i915_perf assuming that it can use the i915_gem_context reference
to protect its i915->gem.contexts.list iteration, we need to defer removal
of the context from the list until last reference to the context is put.
However, there is a risk of triggering kernel warning on contexts list not
empty at driver release time if we deleagate that task to a worker for
i915_gem_context_release_work(), unless that work is flushed first.
Unfortunately, it is not flushed on driver release. Fix it.
Instead of additionally calling flush_workqueue(), either directly or via
a new dedicated wrapper around it, replace last call to
i915_gem_drain_freed_objects() with existing i915_gem_drain_workqueue()
that performs both tasks.
Fixes: 75eefd82581f ("drm/i915: Release i915_gem_context from a worker") Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Cc: stable@kernel.org # v5.16+ Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220916092403.201355-2-janusz.krzysztofik@linux.intel.com
Matt Roper [Sat, 10 Sep 2022 00:16:31 +0000 (17:16 -0700)]
drm/i915/mtl: Add MTL forcewake support
MTL has separate forcewake tables for the primary/render GT and the
media GT; each GT's intel_uncore will use a separate forcewake table and
should only initialize the domains that are relevant to that GT. The GT
ack register also moves to a new location of (GSI base + 0xDFC) on this
platform.
Note that although our uncore handlers take care of transparently
redirecting all register accesses in the media GT's GSI range to their
new offset at 0x380000, the forcewake ranges listed in the table should
use the final, post-translation offsets.
NOTE: There are two ranges in the media IP that have multicast
registers where the two register instances reside in different power
wells (either VD0 or VD2). We don't have an easy way to deal with this
today (and in fact we don't even access these register ranges in the
driver today), so for now we just mark those ranges as FORCEWAKE_ALL
which will cause all of the media power wells to be grabbed, ensuring
proper operation. If we start reading/writing in those ranges in the
future, we can re-visit whether it's worth adding extra steering
complexity into our forcewake support.
A patch was merged to remove the GuC log size override module
parameters. That patch was broken and caused kernel error messages on
boot in non CONFIG_DEBUG_GUC|GEM builds:
[ 12.085121] i915 0000:00:02.0: [drm] *ERROR* Zero GuC log crash dump size!
[ 12.092035] i915 0000:00:02.0: [drm] *ERROR* Zero GuC log debug size!
So fit it.
Fixes: f54e515c9180 ("drm/i915/guc: Remove log size module parameters") Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Julia Lawall <Julia.Lawall@inria.fr> Cc: Chris Wilson <chris.p.wilson@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220913010929.2734885-2-John.C.Harrison@Intel.com
Ashutosh Dixit [Sat, 10 Sep 2022 14:38:44 +0000 (07:38 -0700)]
drm/i915/rps: Freq caps for MTL
For MTL, when reading from HW, RP0, RP1 (actuall RPe) and RPn freq use an
entirely different set of registers with different fields, bitwidths and
units.
v2: Move MTL check into a separate function (Jani)
drm/i915/debugfs: Add perf_limit_reasons in debugfs
Add perf_limit_reasons in debugfs. The upper 16 perf_limit_reasons RW "log"
bits are identical to the lower 16 RO "status" bits except that the "log"
bits remain set until cleared, thereby ensuring the throttling occurrence
is not missed. The clear fop clears the upper 16 "log" bits, the get fop
gets all 32 "log" and "status" bits.
v2: Expand commit message and clarify "log" and "status" bits in
comment (Rodrigo)
This, along with the changes already landed in commit 1c66a12ab431
("drm/i915: Handle each GT on init/release and suspend/resume") makes
engines from all GTs actually known to the driver.
To accomplish this we need to sprinkle a lot of for_each_gt calls around
but is otherwise pretty un-eventuful.
v2:
- Consolidate adjacent GT loops in a couple places. (Daniele)
If we abort driver initialisation in the middle of gt/engine discovery,
some engines will be fully setup and some not. Those incompletely setup
engines only have 'engine->release == NULL' and so will leak any of the
common objects allocated.
v2:
- Drop the destroy_pinned_context() helper for now. It's not really
worth it with just a single callsite at the moment. (Janusz)
John Harrison [Wed, 14 Sep 2022 23:46:05 +0000 (16:46 -0700)]
drm/i915/uc: Update to latest GuC and use new-format GuC/HuC names
Going forwards, the intention is for GuC firmware files to be named
for their major version only and HuC firmware files to have no version
number in the name at all. This patch adds those entries for all
platforms that are officially GuC/HuC enabled.
Also, update the expected GuC version numbers to the latest firmware
release for those platforms.
Lucas De Marchi [Tue, 13 Sep 2022 21:09:58 +0000 (14:09 -0700)]
drm/i915: Invert if/else ladder for stolen init
Continue converting the driver to the convention of last version first,
extending it to the future platforms. Now, any GRAPHICS_VER >= 11 will
be handled by the first branch.
Lucas De Marchi [Tue, 13 Sep 2022 21:09:57 +0000 (14:09 -0700)]
drm/i915/gt: Extract per-platform function for frequency read
Instead of calling read_clock_frequency() to walk the if/else ladder
per platform, move the ladder to intel_gt_init_clock_frequency() and
use one function per branch.
With the new logic, it's now clear the call to
gen9_get_crystal_clock_freq() was just dead code, as gen9 is handled by
another function and there is no version 10. Remove that function and
the caller.
v2: Correctly handle intel_gt_check_clock_frequency() that also calls
the function to read clock frequency (Gustavo)
Lucas De Marchi [Tue, 13 Sep 2022 21:09:56 +0000 (14:09 -0700)]
drm/i915: Invert if/else ladder for frequency read
Continue converting the driver to the convention of last version first,
extending it to the future platforms. Now, any GRAPHICS_VER >= 11 will
be handled by the first branch.
With the new ranges it's easier to see what platform a branch started to
be taken. Besides the >= 11 change, the branch taken for GRAPHICS_VER == 10
is also different, but currently there is no such platform in i915.
John Harrison [Wed, 14 Sep 2022 00:58:21 +0000 (17:58 -0700)]
drm/i915/uc: Fix issues with overriding firmware files
The earlier update to support reduced versioning of firmware files
introduced an issue with the firmware override module parameter. A
self test would specify an invalid file name (invalid meaning not in
the table) both with and without setting the override flag. The
*non-override* case would cause an infinite loop. I.e. a situation
that is impossible to hit outside of the selftest because either the
file name has come from the table in first place or it came from an
override. However, the override case was also broken in that it would
bypass some of the later processing.
The first fix is to update the scanning loop code so that if an
invalid file is passed in, it will exit rather than loop forever. So
if the impossible situation did somehow occur in the future, it
wouldn't be such a big problem.
The second flips the logic on the override early exit to be negative
rather than positive. That way if an explicit override has been set,
then it won't try to scan for backup options (because there is no
point anyway - the user wanted X and if X is not available, that's
their problem). It also means that it won't skip code that still needs
to be run once a valid firmware file has been selected.
v2: Also remove ANSI colour codes that accidentally got left in an
error message in the original patch.
Fixes: 665ae9c9ca79 ("drm/i915/uc: Support for version reduced and multiple firmware files") Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220914005821.3702446-2-John.C.Harrison@Intel.com
Release all mmap mapping for all lmem objects which are associated
with userfault such that, while pcie function in D3hot, any access
to memory mappings will raise a userfault.
Runtime resume the dgpu(when gem object lies in lmem).
This will transition the dgpu graphics function to D0
state if it was in D3 in order to access the mmap memory
mappings.
v2:
- Squashes the patches. [Matt Auld]
- Add adequate locking for lmem_userfault_list addition. [Matt Auld]
- Reused obj->userfault_count to avoid double addition. [Matt Auld]
- Added i915_gem_object_lock to check
i915_gem_object_is_lmem. [Matt Auld]
v3:
- Use i915_ttm_cpu_maps_iomem. [Matt Auld]
- Fix 'ret == 0 to ret == VM_FAULT_NOPAGE'. [Matt Auld]
- Reuse obj->userfault_count as a bool 0 or 1. [Matt Auld]
- Delete the mmaped obj from lmem_userfault_list in obj
destruction path. [Matt Auld]
- Get a wakeref for object destruction patch. [Matt Auld]
- Use intel_wakeref_auto to delay runtime PM. [Matt Auld]
v4:
- Avoid using mmo offset to get the vma_node. [Matt Auld]
- Added comment to use the lmem_userfault_lock. [Matt Auld]
- Get lmem_userfault_lock in i915_gem_object_release_mmap_offset.
[Matt Auld]
- Fixed kernel test robot generated warning.
v5:
- Addressed the cosmetics comments. [Andi]
- Changed i915_gem_runtime_pm_object_release_mmap_offset() name to
i915_gem_object_runtime_pm_release_mmap_offset() to be rhythmic.
Refactor userfault_wakeref to re-use for discrete lmem mmap mapping
as well, as on discrete GTT mmap are not supported. Moving
userfault_wakeref from ggtt to gt structure.
Chris Wilson [Tue, 13 Sep 2022 15:21:51 +0000 (17:21 +0200)]
drm/i915/selftest: Clear the output buffers before GPU writes
When testing whether we can get the GPU to leak information about
non-privileged state, we first need to ensure that the output buffer is
set to a known value as the HW may opt to skip the write into memory for
a non-privileged read of a sensitive register. We chose POISON_INUSE (0x5a)
so that is both non-zero and distinct from the poison values used during
the test.
v2:
Use i915_gem_object_pin_map_unlocked
Reported-by: CQ Tang <cq.tang@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: CQ Tang <cq.tang@intel.com>
cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/5cebab02d182c171cf40cb5b73d6c3eeb7619360.1663081418.git.karolina.drobnik@intel.com
Chris Wilson [Tue, 13 Sep 2022 15:21:50 +0000 (17:21 +0200)]
drm/i915/selftest: Always cancel semaphore on error
Ensure that we always signal the semaphore when timing out, so that if it
happens to be stuck waiting for the semaphore we will quickly recover
without having to wait for a reset.
Reported-by: CQ Tang <cq.tang@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: CQ Tang <cq.tang@intel.com>
cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/8b7781f7dbaf2791156491b76d5faa7852e5cbbb.1663081418.git.karolina.drobnik@intel.com
Chris Wilson [Tue, 13 Sep 2022 15:21:49 +0000 (17:21 +0200)]
drm/i915/selftests: Check for incomplete LRI from the context image
In order to keep the context image parser simple, we assume that all
commands follow a similar format. A few, especially not MI commands on
the render engines, have fixed lengths not encoded in a length field.
This caused us to incorrectly skip over 3D state commands, and start
interpreting context data as instructions. Eventually, as Daniele
discovered, this would lead us to find addition LRI as part of the data
and mistakenly add invalid LRI commands to the context probes.
Stop parsing after we see the first !MI command, as we know we will have
seen all the context registers by that point. (Mostly true for all gen
so far, though the render context does have LRI after the first page
that we have been ignoring so far. It would be useful to extract those
as well so that we have the full list of user accessible registers.)
Similarly, emit a warning if we do try to emit an invalid zero-length
LRI.
Testcase: igt@i915_selftest@live@gt_lrc Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6580 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6670 Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Acked-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/7377cb3b371a983dce02be69f6611fcf85c822bb.1663081418.git.karolina.drobnik@intel.com
Chris Wilson [Tue, 13 Sep 2022 15:21:48 +0000 (17:21 +0200)]
drm/i915/gt: Explicitly clear BB_OFFSET for new contexts
Even though the initial protocontext we load onto HW has the register
cleared, by the time we save it into the default image, BB_OFFSET has
had the enable bit set. Reclear BB_OFFSET for each new context.
Testcase: igt/i915_selftests/gt_lrc
v2:
Extend it for gen8.
v3:
BB_OFFSET is recorded per engine from Gen9 onwards
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/37c67abb3303852f06a570a4360addf52bf941c1.1663081418.git.karolina.drobnik@intel.com
Lucas De Marchi [Mon, 12 Sep 2022 16:19:38 +0000 (09:19 -0700)]
drm/i915: Skip applying copy engine fuses
Support for reading the fuses to check what are the Link Copy engines
was added in commit ad5f74f34201 ("drm/i915/pvc: read fuses for link
copy engines"). However they were added unconditionally because the
FUSE3 register is present since graphics version 10.
However the bitfield with meml3 fuses only exists since graphics version
12. Moreover, Link Copy engines are currently only available in PVC.
Tying additional copy engines to the meml3 fuses is not correct for
other platforms.
Make sure there is a check for `12.60 <= ver < 12.70`. Later platforms
may extend this function later if it's needed to fuse off copy engines.
Currently it's harmless as the Link Copy engines are still not exported:
info->engine_mask only has BCS0 set and the register is only read for
platforms that do have it.
Lucas De Marchi [Fri, 9 Sep 2022 23:18:16 +0000 (16:18 -0700)]
drm/i915/gt: Extract function to apply media fuses
Just like is done for compute and copy engines, extract a function to
handle media engines. While at it, be consistent on using or not the
uncore/gt/info variable aliases.
Lucas De Marchi [Fri, 9 Sep 2022 23:18:15 +0000 (16:18 -0700)]
drm/i915/gt: Use MEDIA_VER() when handling media fuses
Check for media IP version instead of graphics since this is figuring
out the media engines' configuration. Currently the only platform with
non-matching graphics/media version is Meteor Lake: update the check in
gen11_vdbox_has_sfc() so it considers not only version 12, but also any
later version which then includes that platform.
Matt Roper [Tue, 6 Sep 2022 23:49:34 +0000 (16:49 -0700)]
drm/i915/mtl: Hook up interrupts for standalone media
Top-level handling of standalone media interrupts will be processed as
part of the primary GT's interrupt handler (since primary and media GTs
share an MMIO space, unlike remote tile setups). When we get down to
the point of handling engine interrupts, we need to take care to lookup
VCS and VECS engines in the media GT rather than the primary.
There are also a couple of additional "other" instance bits that
correspond to the media GT's GuC and media GT's power management
interrupts; we need to direct those to the media GT instance as well.
Matt Roper [Tue, 6 Sep 2022 23:49:33 +0000 (16:49 -0700)]
drm/i915/mtl: Use primary GT's irq lock for media GT
When we hook up interrupts (in the next patch), interrupts for the media
GT are still processed as part of the primary GT's interrupt flow. As
such, we should share the same IRQ lock with the primary GT. Let's
convert gt->irq_lock into a pointer and just point the media GT's
instance at the same lock the primary GT is using.
v2:
- Point media's gt->irq_lock at the primary GT lock properly. (Daniele)
- Fix jump target for intel_root_gt_init_early errors. (Daniele)
Matt Roper [Tue, 6 Sep 2022 23:49:32 +0000 (16:49 -0700)]
drm/i915/xelpmp: Expose media as another GT
Xe_LPM+ platforms have "standalone media." I.e., the media unit is
designed as an additional GT with its own engine list, GuC, forcewake,
etc. Let's allow platforms to include media GTs in their device info.
v2:
- Simplify GSI register handling and split it out to a separate patch
for ease of review. (Daniele)
Matt Roper [Tue, 6 Sep 2022 23:49:31 +0000 (16:49 -0700)]
drm/i915/mtl: Add gsi_offset when emitting aux table invalidation
The aux table invalidation registers are a bit unique --- they're
engine-centric registers that reside in the GSI register space rather
than within the engines' regular MMIO ranges. That means that when
issuing invalidation on engines in the standalone media GT, the GSI
offset must be added to the regular MMIO offset for the invalidation
registers.
Matt Roper [Thu, 8 Sep 2022 22:45:50 +0000 (15:45 -0700)]
drm/i915/uncore: Add GSI offset to uncore
GT non-engine registers (referred to as "GSI" registers by the spec)
have the same relative offsets on standalone media as they do on the
primary GT, just with an additional "GSI offset" added to their MMIO
address. If we store this GSI offset in the standalone media's
intel_uncore structure, it can be automatically applied to all GSI reg
reads/writes that happen on that GT, allowing us to re-use our existing
GT code with minimal changes.
Forcewake and shadowed register tables for the media GT (which will be
added in a future patch) are listed as final addresses that already
include the GSI offset, so we also need to add the GSI offset before
doing lookups of registers in one of those tables.
v2:
- Add comment on raw_reg_*() macros explaining why we don't bother with
GSI offsets in them. (Daniele)
Matt Roper [Tue, 6 Sep 2022 23:49:29 +0000 (16:49 -0700)]
drm/i915: Handle each GT on init/release and suspend/resume
In preparation for enabling a second GT, there are a number of GT/uncore
operations that happen during initialization or suspend flows that need
to be performed on each GT, not just the primary,
Matt Roper [Tue, 6 Sep 2022 23:49:27 +0000 (16:49 -0700)]
drm/i915: Use a DRM-managed action to release the PCI bridge device
As we start supporting multiple uncore structures in future patches, the
MMIO cleanup (which may also get called mid-init if there's a failure)
will become more complicated. Moving to DRM-managed actions will help
keep things simple.
Matt Roper [Tue, 6 Sep 2022 23:49:26 +0000 (16:49 -0700)]
drm/i915: Rename and expose common GT early init routine
The common early GT init is needed for initialization of all GT types
(root/primary, remote tile, standalone media). Since standalone media
(coming in a future patch) will be implemented in a separate file,
rename and expose the function for use.
Matt Roper [Tue, 6 Sep 2022 23:49:25 +0000 (16:49 -0700)]
drm/i915: Prepare more multi-GT initialization
We're going to introduce an additional intel_gt for MTL's media unit
soon. Let's provide a bit more multi-GT initialization framework in
preparation for that. The initialization will pull the list of GTs for
a platform from the device info structure. Although necessary for the
immediate MTL media enabling, this same framework will also be used
farther down the road when we enable remote tiles on xehpsdv and pvc.
v2:
- Re-add missing test for !HAS_EXTRA_GT_LIST in intel_gt_probe_all().
v3:
- Move intel_gt_definition struct to intel_gt_types.h. (Jani)
- Drop gtdef->setup(). For now we'll just use a switch() based on GT
type since we don't have too many different handlers for the
foreseeable future. (Jani)
Matt Roper [Tue, 6 Sep 2022 23:49:24 +0000 (16:49 -0700)]
drm/i915: Drop intel_gt_tile_cleanup()
Unmapping of the MMIO range can be done as a DRM-managed action, which
will take care of the unmapping on device teardown and error paths.
This will also ensure proper ordering with respect to other DRM-managed
actions that we'll be using to clean up non-primary GTs in upcoming
patches.
We have not yet enabled any non-root GTs in the driver yet, so the
kfree() of the GT structure is effectively dead code. When we do start
enabling non-root GTs in upcoming patches, those are going to be using
DRM-managed allocations tied to the device lifetime, so we don't need to
explicitly free them (and kfree would be incorrect anyway).
Matt Roper [Tue, 6 Sep 2022 23:49:23 +0000 (16:49 -0700)]
drm/i915: Use managed allocations for extra uncore objects
We're slowly transitioning the init-time kzalloc's of the driver over to
DRM-managed allocations; let's make sure the uncore objects allocated
for non-root GTs are thus allocated.
Matt Roper [Tue, 6 Sep 2022 23:49:22 +0000 (16:49 -0700)]
drm/i915: Only hook up uncore->debug for primary uncore
The original intent of intel_uncore_mmio_debug as described in commit 0a9b26306d6a ("drm/i915: split out uncore_mmio_debug") was to be a
singleton structure that could be shared between multiple GTs' uncore
objects in a multi-tile system. Somehow we went off track and
started allocating separate instances of this structure for each GT,
which defeats that original goal.
But in reality, there isn't even a need to share the mmio_debug between
multiple GTs; on all modern platforms (i.e., everything after gen7)
unclaimed register accesses are something that can only be detected for
display registers. There's no point in grabbing the debug spinlock and
checking for unclaimed accesses on an uncore used by an xehpsdv or pvc
remote tile GT, or the uncore used by a mtl standalone media GT since
all of the display accesses go through the primary intel_uncore.
The simplest solution is to simply leave uncore->debug NULL on all
intel_uncore instances except for the primary one. This will allow us
to avoid the pointless debug spinlock acquisition we've been doing on
MMIO accesses coming in through these intel_uncores.
Matt Roper [Tue, 6 Sep 2022 23:49:21 +0000 (16:49 -0700)]
drm/i915: Move locking and unclaimed check into mmio_debug_{suspend, resume}
Moving the locking for MMIO debug (and the final check for unclaimed
accesses when resuming debug after a userspace-initiated forcewake) will
make it simpler to completely skip MMIO debug handling on uncores that
don't support it in future patches.
The worker is canceled in gt_park path, but earlier it was assumed that
gt_park path cannot sleep and the cancel is asynchronous. This caused a
race with suspend flow where the worker runs after suspend and causes an
unclaimed register access warning. Cancel the worker synchronously since
the gt_park is indeed allowed to sleep.
v2: Fix author name and sign-off mismatch
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4419 Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220827002135.139349-1-umesh.nerlige.ramappa@intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Tomas Winkler [Wed, 7 Sep 2022 21:51:12 +0000 (00:51 +0300)]
drm/i915/gsc: allocate extended operational memory in LMEM
GSC requires more operational memory than available on chip.
Reserve 4M of LMEM for GSC operation. The memory is provided to the
GSC as struct resource to the auxiliary data of the child device.
Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220907215113.1596567-16-tomas.winkler@intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Tomas Winkler [Wed, 7 Sep 2022 21:51:11 +0000 (00:51 +0300)]
mei: debugfs: add pxp mode to devstate in debugfs
Add pxp mode devstate to debugfs to monitor pxp state machine progress.
This is useful to debug issues in scenarios in which the pxp state
needs to be re-initialized, like during power transitions such as
suspend/resume. With this debugfs the state could be monitored
to ensure that pxp is in the ready state.
The check that hardware and host ready bits are set after start
is redundant and may fail and disable driver if there is
back-to-back link reset issued right after start.
This happens during pxp mode transitions when firmware
undergo reset. Remove these checks to eliminate such failures.
Tomas Winkler [Wed, 7 Sep 2022 21:51:08 +0000 (00:51 +0300)]
mei: gsc: setup gsc extended operational memory
1. Retrieve extended operational memory physical pointers from the
auxiliary device info.
2. Setup memory registers.
3. Notify firmware that the memory is ready by sending the memory
ready command.
4. Disable PXP device if GSC is not in PXP mode.
Tomas Winkler [Wed, 7 Sep 2022 21:51:03 +0000 (00:51 +0300)]
mei: gsc: use polling instead of interrupts
A work-around for a HW issue in XEHPSDV that manifests itself when SW reads
a gsc register when gsc is sending an interrupt. The work-around is
to disable interrupts and to use polling instead.
Define GSC on XeHP SDV (Intel(R) dGPU without display)
XeHP SDV uses the same hardware settings as DG1, but uses polling
instead of interrupts and runs the firmware in slow pace due to
hardware limitations.
Tomas Winkler [Wed, 7 Sep 2022 21:51:00 +0000 (00:51 +0300)]
mei: add slow_firmware flag to the mei auxiliary device
Add slow_firmware flag to the mei auxiliary device info
to inform the mei driver about slow underlying firmware.
Such firmware will require to use larger operation timeouts.
drm/i915/gsc: skip irq initialization if using polling
Some platforms require the host to poll on the
GSC registers instead of relaying on the interrupts.
For those platforms, irq initialization should be skipped
Nirmoy Das [Wed, 7 Sep 2022 17:26:41 +0000 (19:26 +0200)]
drm/i915: Set correct domains values at _i915_vma_move_to_active
Fix regression introduced by commit:
"drm/i915: Individualize fences before adding to dma_resv obj"
which sets obj->read_domains to 0 for both read and write paths.
Also set obj->write_domain to 0 on read path which was removed by
the commit.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6639 Fixes: 420a07b841d0 ("drm/i915: Individualize fences before adding to dma_resv obj") Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Cc: <stable@vger.kernel.org> # v5.16+ Cc: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220907172641.12555-1-nirmoy.das@intel.com
So far, different views (normal, partial, rotated and remapped)
into the same object are only supported for GGTT mappings.
But with the upcoming VM_BIND feature, PPGTT will also use the
partial view mapping. Hence rename ggtt_view to more generic
gtt_view.
John Harrison [Tue, 6 Sep 2022 23:01:47 +0000 (16:01 -0700)]
drm/i915/uc: Add patch level version number support
With the move to un-versioned filenames, it becomes more difficult to
know exactly what version of a given firmware is being used. So add
the patch level version number to the debugfs output.
Also, support matching by patch level when selecting code paths for
firmware compatibility. While a patch level change cannot be backwards
breaking, it is potentially possible that a new feature only works
from a given patch level onwards (even though it was theoretically
added in an earlier version that bumped the major or minor version).
John Harrison [Tue, 6 Sep 2022 23:01:46 +0000 (16:01 -0700)]
drm/i915/uc: Support for version reduced and multiple firmware files
There was a misunderstanding in how firmware file compatibility should
be managed within i915. This has been clarified as:
i915 must support all existing firmware releases forever
new minor firmware releases should replace prior versions
only backwards compatibility breaking releases should be a new file
This patch cleans up the single fallback file support that was added
as a quick fix emergency effort. That is now removed in preference to
supporting arbitrary numbers of firmware files per platform.
The patch also adds support for having GuC firmware files that are
named by major version only (because the major version indicates
backwards breaking changes that affect the KMD) and for having HuC
firmware files with no version number at all (because the KMD has no
interface requirements with the HuC).
For GuC, the driver will report via dmesg if the found file is older than
expected. For HuC, the KMD will no longer require updating for any new
HuC release so will not be able to report what the latest expected
version is.
Matthew Auld [Mon, 5 Sep 2022 10:53:29 +0000 (11:53 +0100)]
drm/i915: consider HAS_FLAT_CCS() in needs_ccs_pages
Just move the HAS_FLAT_CCS() check into needs_ccs_pages. This also then
fixes i915_ttm_memcpy_allowed() which was incorrectly reporting true on
DG1, even though it doesn't have small-BAR or flat-CCS.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6605 Fixes: efeb3caf4341 ("drm/i915/ttm: disallow CPU fallback mode for ccs pages") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220905105329.41455-1-matthew.auld@intel.com
Nirmoy Das [Thu, 1 Sep 2022 17:22:17 +0000 (19:22 +0200)]
drm/i915/ttm: Abort suspend on i915_ttm_backup failure
On system suspend when system memory is low then i915_gem_obj_copy_ttm()
could fail trying to backup a lmem obj. GEM_WARN_ON() is not enough,
suspend shouldn't continue if i915_ttm_backup() throws an error.
v2: Keep the fdo issue till we have a igt test(Matt).
v3: Use %pe(Andrzej)
References: https://gitlab.freedesktop.org/drm/intel/-/issues/6529 Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Suggested-by: Chris P Wilson <chris.p.wilson@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220901172217.18392-1-nirmoy.das@intel.com
Rodrigo Vivi [Wed, 31 Aug 2022 21:45:38 +0000 (17:45 -0400)]
drm/i915/slpc: Let's fix the PCODE min freq table setup for SLPC
We need to inform PCODE of a desired ring frequencies so PCODE update
the memory frequencies to us. rps->min_freq and rps->max_freq are the
frequencies used in that request. However they were unset when SLPC was
enabled and PCODE never updated the memory freq.
v2 (as Suggested by Ashutosh): if SLPC is in use, let's pick the right
frequencies from the get_ia_constants instead of the fake init of
rps' min and max.
v3: don't forget the max <= min return
v4: Move all the freq conversion to intel_rps.c. And the max <= min
check to where it belongs.
v5: (Ashutosh) Fix old comment s/50 HZ/50 MHz and add a doc explaining
the "raw format"
Fixes: 7ba79a671568 ("drm/i915/guc/slpc: Gate Host RPS when SLPC is enabled") Cc: <stable@vger.kernel.org> # v5.15+ Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Tested-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220831214538.143950-1-rodrigo.vivi@intel.com
Rodrigo Vivi [Tue, 30 Aug 2022 19:35:37 +0000 (15:35 -0400)]
drm/i915/slpc: Fix inconsistent locked return
Fix for intel_guc_slpc_set_min_freq() warn:
inconsistent returns '&slpc->lock'.
v2: Avoid with_intel_runtime_pm with the
internal goto/return. (Ashutosh)
Also standardize the 'ret' if this came from
the efficient setup. And avoid the 'unlikely'.
Fixes: 95ccf312a1e4 ("drm/i915/guc/slpc: Allow SLPC to use efficient frequency") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220830193537.52201-1-rodrigo.vivi@intel.com
On client DG2 platforms, optimal performance is achieved with the
hardware's default "age based" thread execution setting. However on
ATS-M, switching this to "round robin after dependencies" provides
better performance. We'll add a new "tuning" feature flag to the ATS-M
device info to enable/disable this setting.
The intent of Wa_14015141709 was to inform us that userspace can no
longer control object-level preemption as it has on past platforms
(i.e., by twiddling register bit CS_CHICKEN1[0]). The description of
the workaround in the spec wasn't terribly well-written, and when we
requested clarification from the hardware teams we were told that on the
kernel side we should also probably stop setting
FF_SLICE_CS_CHICKEN1[14], which is the register bit that directs the
hardware to honor the settings in per-context register CS_CHICKEN1. It
turns out that this guidance about FF_SLICE_CS_CHICKEN1[14] was a
mistake; even though CS_CHICKEN1[0] is non-operational and useless to
userspace, there are other bits in the register that do still work and
might need to be adjusted by userspace in the future (e.g., to implement
other workarounds that show up). If we don't set
FF_SLICE_CS_CHICKEN1[14] in i915, then those future workarounds would
not take effect.
This miscommunication came to light because another workaround
(Wa_16013994831) has now shown up that requires userspace to adjust the
value of CS_CHICKEN[10] in certain circumstances. To ensure userspace's
updates to this chicken bit are handled properly by the hardware, we
need to make sure that FF_SLICE_CS_CHICKEN1[14] is once again set by the
kernel.
Andrzej Hajda [Thu, 25 Aug 2022 15:42:39 +0000 (17:42 +0200)]
drm/i915/selftests: allow misaligned_pin test work with unmappable memory
In case of Small BAR configurations stolen local memory can be unmappable.
Since the test does not touch the memory, passing I915_BO_ALLOC_GPU_ONLY
flag to i915_gem_object_create_region, will prevent -ENOSPC error from
_i915_gem_object_stolen_init.
Matt Roper [Tue, 23 Aug 2022 20:24:49 +0000 (13:24 -0700)]
drm/i915/dg2: Incorporate Wa_16014892111 into DRAW_WATERMARK tuning
Although register tuning settings are generally implemented via the
workaround infrastructure, it turns out that the DRAW_WATERMARK register
is not properly saved/restored by hardware around power events (i.e.,
RC6 entry) so updates to the value cannot be applied in the usual
manner. New workaround Wa_16014892111 informs us that any tuning
updates to this register must instead be applied via an INDIRECT_CTX
batch buffer. This will ensure that the necessary value is re-applied
when a context begins running, even if an RC6 entry had wiped the
register back to hardware defaults since the last context ran.
Juston Li [Thu, 18 Aug 2022 17:42:05 +0000 (17:42 +0000)]
drm/i915/pxp: don't start pxp without mei_pxp bind
pxp will not start correctly until after mei_pxp bind completes and
intel_pxp_init_hw() is called.
Wait for the bind to complete before proceeding with startup.
This fixes a race condition during bootup where we observed a small
window for pxp commands to be sent, starting pxp before mei_pxp bind
completed.
Changes since v2:
- wait for pxp_component to bind instead of returning -EAGAIN (Daniele)
Changes since v1:
- check pxp_component instead of pxp_component_added (Daniele)
- pxp_component needs tee_mutex (Daniele)
- return -EAGAIN so caller knows to retry (Daniele)
Matt Roper [Thu, 18 Aug 2022 23:41:45 +0000 (16:41 -0700)]
drm/i915/mtl: Don't mask off CCS according to DSS fusing
Unlike the Xe_HP platforms, MTL only has a single CCS engine; the
quad-based engine masking logic does not apply to this platform (or
presumably any future platforms that only have 0 or 1 CCS).
Vinay Belgaumkar [Sat, 20 Aug 2022 01:08:32 +0000 (18:08 -0700)]
drm/i915/guc/slpc: Allow SLPC to use efficient frequency
Host Turbo operates at efficient frequency when GT is not idle unless
the user or workload has forced it to a higher level. Replicate the same
behavior in SLPC by allowing the algorithm to use efficient frequency.
We had disabled it during boot due to concerns that it might break
kernel ABI for min frequency. However, this is not the case since
SLPC will still abide by the (min,max) range limits.
With this change, min freq will be at efficient frequency level at init
instead of fused min (RPn). If user chooses to reduce min freq below the
efficient freq, we will turn off usage of efficient frequency and honor
the user request. When a higher value is written, it will get toggled
back again.
The patch also corrects the register which needs to be read for obtaining
the correct efficient frequency for Gen9+.
We see much better perf numbers with benchmarks like glmark2 with
efficient frequency usage enabled as expected.
v2: Address review comments (Rodrigo)
v3: with efficient frequency being dynamic, it is possible that the req
frequency may go beyond max freq. This will cause SLPC selftests to fail.
Add a FIXME there to start the test with [RPn, RP0] instead and restore
it afterwards.
Jani Nikula [Fri, 19 Aug 2022 12:02:34 +0000 (15:02 +0300)]
drm/i915/guc: remove runtime info printing from time stamp logging
Commit 368d179adbac ("drm/i915/guc: Add GuC <-> kernel time stamp
translation information") added intel_device_info_print_runtime() in the
time info dump for no obvious reason or explanation in the commit
message. It only logs the rawclk freq. Remove it.
Everything in CI using GuC is now timing out[1], and killing the machine
with this change (perhaps a deadlock?). CI was recently on fire due to
some changes coming in from -rc1, so likely the pre-merge CI results for
this series were invalid? For now just revert, unless GuC experts
already have a fix in mind.
Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220819123904.913750-1-matthew.auld@intel.com
Matthew Brost [Wed, 17 Aug 2022 02:05:11 +0000 (19:05 -0700)]
drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
Add a delay, configurable via debugfs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disable
scheduling is a costly operation as it requires synchronizing with
the GuC. So the idea is that a delay allows the user to resubmit
something before doing this operation. This delay is only done if
the context isn't closed and less than a given threshold
(default is 3/4) of the guc_ids are in use.
As temporary WA disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow up patch will fixup the selftests to understand this delay.
Alan Previn: Matt Brost first introduced this series back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.
Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads were not saturating the engines to their max (in order to
maintain just enough headrooom to meet the min fps and max latencies
of incoming container submissions).
Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of the cycles was determined to be processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.
Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.
Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.
If the GuC CTs are full and we need to stall the request submission
while waiting for space, we save the stalled request and where the stall
occurred; when the CTs have space again we pick up the request submission
from where we left off.
If a full GT reset occurs, the state of all contexts is cleared and all
non-guilty requests are unsubmitted, therefore we need to restart the
stalled request submission from scratch. To make sure that we do so,
clear the saved request after a reset.
Fixes note: the patch that introduced the bug is in 5.15, but no
officially supported platform had GuC submission enabled by default
in that kernel, so the backport to that particular version (and only
that one) can potentially be skipped.
Fixes: 925dc1cf58ed ("drm/i915/guc: Implement GuC submission tasklet") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: <stable@vger.kernel.org> # v5.15+ Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com