Koby Elbaz [Sun, 29 Oct 2023 17:53:26 +0000 (19:53 +0200)]
drm/xe: move the lmem verification code into a separate function
If lmem (VRAM) is not fully initialized, the punit will power down
the GT, which will prevent register access from the driver side.
Move that code into a dedicated function (xe_verify_lmem_ready)
to make the flow clearer.
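A minimal sketch of the resulting helper, assuming the GU_CNTL/LMEM_INIT
readiness check is what gets factored out (the exact body in the patch may
differ):
  static int xe_verify_lmem_ready(struct xe_device *xe, struct xe_gt *gt)
  {
          /*
           * If VRAM init is incomplete, the punit powers the GT down and
           * register access from the driver would fail, so refuse to
           * continue probing.
           */
          if (IS_DGFX(xe) && !(xe_mmio_read32(gt, GU_CNTL) & LMEM_INIT)) {
                  drm_err(&xe->drm, "VRAM not initialized by firmware\n");
                  return -ENODEV;
          }

          return 0;
  }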
Brian Welty [Thu, 2 Nov 2023 23:04:53 +0000 (16:04 -0700)]
drm/xe: Fix unbind of unaccessed VMA (fault mode)
In fault mode, page table binding is deferred until fault handler.
Thus vma->tile_present will be unset unless the VMA is accessed by GPU.
During a later unbind, the logic doesn't account for the fact that the local
fence variable will be NULL in this case, so NULL ends up being passed to
dma_fence_add_callback(), causing a few WARN_ONs to be printed to the console.
The fix is already present in the code; simply hoist the fence variable
computation so it is done earlier.
Resolves warnings seen with igt@xe_exec_fault_mode@once-invalid-fault
Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The "GPL-2.0" SPDX license identifier is deprecated. Update the
code to use "GPL-2.0-only" instead. Choose this identifier over
"GPL-2.0-or-later" since it's the most restrictive of the two and it's
not fully clear that "GPL-2.0" also allows "GPL-2.0-or-later".
Brian Welty [Tue, 31 Oct 2023 20:32:24 +0000 (13:32 -0700)]
drm/xe: Fix pagefault and access counter worker functions
When processing G2H messages for pagefault or access counters, we queue a
work item and call queue_work(). This fails if the work item is already
queued to run.
The expectation is that the worker function will do more than process a
single item and return. It needs to either process all pending items or
requeue itself if items are pending. But requeuing adds latency, and a
context switch can potentially occur.
We don't want to add unnecessary latency, so the worker should process
as many faults as it can within a reasonable duration of time.
We also do not want to hog the CPU core, so here we execute in a loop
and requeue if still running after more than 20 ms.
This seems a reasonable framework and is easy to tune further if needed.
This resolves issues seen with several igt@xe_exec_fault_mode subtests
where the GPU will hang when KMD ignores a pending pagefault.
v2: requeue the worker instead of having an internal processing loop.
v3: implement hybrid model of v1 and v2
now run for 20 msec before requeuing if work is still pending
v4: only requeue in worker if queue is non-empty (Matt B)
Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
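A rough sketch of the hybrid scheme described above (20 ms budget plus
conditional requeue); the structure and helper names here are illustrative
rather than taken from the patch:
  #define USM_QUEUE_MAX_RUNTIME_MS 20

  static void pf_queue_work_func(struct work_struct *w)
  {
          struct pf_queue *pf_queue = container_of(w, struct pf_queue, worker);
          unsigned long threshold;
          struct pagefault pf;

          threshold = jiffies + msecs_to_jiffies(USM_QUEUE_MAX_RUNTIME_MS);

          /* Drain as many faults as possible within the time budget... */
          while (get_pagefault(pf_queue, &pf)) {
                  handle_pagefault(pf_queue, &pf);

                  /*
                   * ...but requeue instead of hogging the CPU if the budget
                   * is exceeded and faults are still pending.
                   */
                  if (time_after(jiffies, threshold) &&
                      pf_queue->head != pf_queue->tail) {
                          queue_work(pf_queue->wq, w);
                          break;
                  }
          }
  }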
Andrzej Hajda [Fri, 27 Oct 2023 09:42:55 +0000 (11:42 +0200)]
drm/xe: implement driver initiated function-reset
Driver initiated function-reset (FLR) is the highest level of reset
that we can trigger from within the driver. In contrast to PCI FLR it
doesn't require re-enumeration of PCI BAR. It can be useful in case
GT fails to reset. It is also the only way to trigger GSC reset from
the driver and can be used in future addition of GSC support.
v2:
- use regs from xe_regs.h
- move the flag to xe.mmio
- call flr only on root gt
- use BIOS protection check
- copy/paste comments from i915
v3:
- flr code moved to xe_device.c
v4:
- needs_flr_on_fini moved to xe_device
Carlos Santa [Thu, 26 Oct 2023 22:01:27 +0000 (15:01 -0700)]
drm/xe: stringify the argument to avoid potential vulnerability
This error gets printed inside a sandbox with warnings turned on.
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/xe/xe_gt_idle_sysfs.c:87:26:
error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
        return sysfs_emit(buff, gtidle->name);
                                ^~~~~~~~~~~~
/mnt/host/source/src/third_party/kernel/v5.15/drivers/gpu/drm/xe/xe_gt_idle_sysfs.c:87:26:
note: treat the string as an argument to avoid this
        return sysfs_emit(buff, gtidle->name);
                                ^
                                "%s",
1 error generated.
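The fix is simply to pass the name through a literal format string, as the
diagnostic suggests:
  -       return sysfs_emit(buff, gtidle->name);
  +       return sysfs_emit(buff, "%s", gtidle->name);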
CC: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Carlos Santa <carlos.santa@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Roper [Thu, 2 Nov 2023 12:48:55 +0000 (05:48 -0700)]
drm/xe: Add Wa_14019821291
This workaround is primarily implemented by the BIOS. However if the
BIOS applies the workaround it will reserve a small piece of our DSM
(which should be at the top, right below the WOPCM); we just need to
keep that region reserved so that nothing else attempts to re-use it.
v2 (Gustavo):
- Check for NULL media_gt
- Mask bits [5:0] to avoid potential issues in future platforms
Matt Roper [Tue, 31 Oct 2023 14:05:37 +0000 (07:05 -0700)]
drm/xe/xe2: Program correct MOCS registers
The LNCFCMOCS registers no longer exist on Xe2 so there's no need to
attempt to program them. Since GLOB_MOCS is the only set of MOCS
registers now, it's expected to be used for all platforms (both igpu and
dgpu) going forward, so adjust the MOCS programming flags accordingly.
v2:
- Fix typo (global mocs condition is >=, not >)
Brian Welty [Tue, 31 Oct 2023 21:12:16 +0000 (14:12 -0700)]
drm/xe: Fix dequeue of access counter work item
The access counters worker function is fixed to advance the head pointer
when dequeuing from the acc_queue. This now matches the similar logic in
get_pagefault().
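A sketch of the dequeue pattern being aligned with get_pagefault(); the
structure, helper and constant names are assumptions for illustration:
  static bool get_acc(struct acc_queue *acc_queue, struct acc *acc)
  {
          bool ret = false;

          spin_lock(&acc_queue->lock);
          if (acc_queue->head != acc_queue->tail) {
                  parse_acc_msg(acc, acc_queue->data + acc_queue->head);
                  /*
                   * The fix: consume the entry by advancing the head,
                   * just as get_pagefault() does for the pf_queue.
                   */
                  acc_queue->head = (acc_queue->head + ACC_MSG_LEN_DW) %
                                    ACC_QUEUE_NUM_DW;
                  ret = true;
          }
          spin_unlock(&acc_queue->lock);

          return ret;
  }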
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Brian Welty [Tue, 26 Sep 2023 00:12:48 +0000 (17:12 -0700)]
drm/xe: Replace usage of mem_type_to_tile
Currently mem_type_to_tile() is being used to access the tile's underlying
tile.mem.vram. However, this function makes the assumption that a mem_type
will only ever map to a single tile. Now that the TTM vram manager contains
a pointer to the memory_region, make use of this in xe_bo.c.
As such, introduce a helper function res_to_mem_region() to get the
ttm_vram_mgr->vram from the BO's resource, and use this to replace usage
of mem_type_to_tile().
xe_tile is still needed to choose the migration context, so that part is
unchanged. But as it is now only used for that purpose, the function is
renamed to mem_type_to_migrate().
Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Brian Welty [Mon, 25 Sep 2023 21:02:32 +0000 (14:02 -0700)]
drm/xe: Replace xe_ttm_vram_mgr.tile with xe_mem_region
Replace the xe_ttm_vram_mgr.tile pointer with a xe_mem_region pointer
instead. The former is currently unused.
TTM VRAM regions expose device VRAM, so it is better to store a pointer
directly to the xe_mem_region instead of the tile. This allows cleaning up
unnecessary usage of xe_tile in xe_bo.c in a later patch.
Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Badal Nilawar [Mon, 30 Oct 2023 11:56:18 +0000 (17:26 +0530)]
drm/xe/hwmon: Expose power1_max_interval
Expose power1_max_interval, i.e. the tau corresponding to PL1, as a
custom hwmon attribute. Some bit manipulation is needed because of the
format of PKG_PWR_LIM_1_TIME in the PACKAGE_RAPL_LIMIT register
(1.x * power(2,y)).
v2: Get rpm wake ref while accessing power1_max_interval
v3: %s/hwmon/xe_hwmon/
v4:
- As power1_max_interval is rw attr take lock in read function as well
- Refine comment about val to fix point conversion (Andi)
- Update kernel version and date in doc
v5: Fix review comments (Anshuman)
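For reference, a small standalone sketch of the 1.x * power(2,y) decode
mentioned above, assuming a 2-bit fractional part x and an exponent y
(register field extraction and unit scaling are omitted):
  /*
   * 1.x * 2^y with a 2-bit fraction: interval = ((4 + x) << y) / 4,
   * expressed in the hardware's time reference unit.
   */
  static u64 decode_pl1_tau(u32 x, u32 y)
  {
          return ((4ull + x) << y) >> 2;
  }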
Badal Nilawar [Mon, 30 Oct 2023 11:56:17 +0000 (17:26 +0530)]
drm/xe/hwmon: Protect hwmon rw attributes with hwmon_lock
Take hwmon_lock while accessing hwmon rw attributes. For read-only
attributes it is not required to take the lock, as reads are protected
by the sysfs layer and are therefore sequential.
Matthew Auld [Wed, 25 Oct 2023 17:39:41 +0000 (18:39 +0100)]
drm/xe/bo: sync kernel fences for KMD buffers
With things like pipelined evictions, VRAM pages can be marked as free
and yet still have some active kernel fences, with the idea that the
next caller to allocate the memory will respect them. However it looks
like we are missing synchronisation for KMD internal buffers, like
page-tables, lrc etc. For userspace objects we should already have the
required synchronisation for CPU access via the fault handler, and
likewise for GPU access when vm_binding them.
To fix this, synchronise against any kernel fences for all KMD objects at
creation. This should resolve some severe corruption seen during
evictions.
v2 (Matt B):
- Revamp the comment explaining this. Also mention why USAGE_KERNEL is
correct here.
v3 (Thomas):
- Make sure to use ctx.interruptible for the wait.
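A sketch of the added synchronisation, waiting on the KERNEL-usage fences of
the freshly created object with an interruptible wait (its exact placement in
the BO creation path is simplified here):
  long timeout;

  /*
   * KMD-internal objects have no fault handler or vm_bind step to pick
   * up pipelined-eviction fences, so wait for them here at creation.
   */
  timeout = dma_resv_wait_timeout(bo->ttm.base.resv,
                                  DMA_RESV_USAGE_KERNEL,
                                  ctx.interruptible,
                                  MAX_SCHEDULE_TIMEOUT);
  if (timeout < 0)
          return ERR_PTR(timeout);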
Testcase: igt@xe-evict-ccs Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/853 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/855 Reported-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Tested-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Auld [Wed, 25 Oct 2023 17:39:40 +0000 (18:39 +0100)]
drm/xe/bo: consider dma-resv fences for clear job
There could be active fences already in the dma-resv for the object
prior to clearing. Make sure to input them as dependencies for the clear
job.
v2 (Matt B):
- We can use USAGE_KERNEL here, since it's only the move fences we
care about here. Also add a comment.
Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Auld [Wed, 25 Oct 2023 17:39:39 +0000 (18:39 +0100)]
drm/xe/migrate: fix MI_ARB_ON_OFF usage
Spec says: "This is a privileged command; it will not be effective (will
be converted to a no-op) if executed from within a non-privileged batch
buffer." However here it looks like we are just emitting it inside some
bb which was jumped to via the ppGTT, which should be considered
a non-privileged address space.
It looks like we just need some way of preventing things like the
emit_pte() and the later copy/clear from being preempted in between, so
instead just emit the arbitration directly in the ring for migration jobs.
Bspec: 45716 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/xe/xe_exec_queue: Add check for access counter granularity
Add a check for the access counter granularity.
This check returns -EINVAL if the granularity is beyond 64M,
which is a hardware limitation.
v2: Defined
XE_ACC_GRANULARITY_128K 0
XE_ACC_GRANULARITY_2M 1
XE_ACC_GRANULARITY_16M 2
XE_ACC_GRANULARITY_64M 3
as part of the uAPI, so that userspace can also use it. (Oak)
v3: Move the uAPI to its proper location and add proper
documentation. (Brian, Oak)
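A minimal sketch of the check, using the granularity defines listed above
(the surrounding variable names are illustrative):
  /* Reject granularities beyond the 64M hardware limit. */
  if (XE_IOCTL_DBG(xe, value > XE_ACC_GRANULARITY_64M))
          return -EINVAL;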
Cc: Oak Zeng <oak.zeng@intel.com> Cc: Janga Rahul Kumar <janga.rahul.kumar@intel.com> Cc: Brian Welty <brian.welty@intel.com> Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com> Reviewed-by: Oak Zeng <oak.zeng@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Lucas De Marchi [Tue, 24 Oct 2023 22:04:12 +0000 (15:04 -0700)]
drm/xe: Fix WA 14010918519 write to wrong register
FORCE_SLM_FENCE_SCOPE_TO_TILE and FORCE_UGM_FENCE_SCOPE_TO_TILE are in
the upper dword of the LSC_CHICKEN_BIT_0 register. Also, the 14010918519
workaround only applies to early steppings, A*. Eventually those should
be dropped, like they were in commit eaeb4b361452 ("drm/i915/dg2: Drop
pre-production GT workarounds"), so let's make sure they are annotated
appropriately.
v2: don't use the filename to identify the header type (Lucas)
v3: fix commit msg (Lucas)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/xe/huc: Don't re-auth HuC if it's already authenticated
On newer platforms the HuC survives reset and stays authenticated, so no
need to re-authenticate it.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/xe/huc: HuC is not supported on GTs that don't have video engines
On MTL-style multi-gt platforms, the HuC is only available on the media
GT, so we need to consider it as not supported on the render GT.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/xe/huc: Extract version and binary offset from new HuC headers
The GSC-enabled HuC binary starts with a GSC header, which is followed
by the legacy-style CSS header and the binary itself. We can parse the
GSC headers to find the HuC version and the location of the binary to
be used for the DMA transfer.
The parsing function has been designed to be re-used for the GSC binary,
so the entry names are external parameters (because the GSC uses
different ones) and the CSS entry is optional (because the GSC doesn't
have it).
v2: move new code to uc_fw.c, better comments and error checking, split
old code move to separate patch (Lucas), move headers and
documentation to uc_fw_abi.h.
v3: use 2 separate loops, rework marker check (Lucas)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/xe/uc: Prepare for parsing of different header types
GSC binaries and newer HuC ones use GSC-style headers instead of the
CSS. In preparation for adding support for such parsing, split out the
current parsing code to its own function, to make it cleaner to add the
new paths. The existing doc section has also been renamed to narrow it
to CSS-based binaries.
v2: new patch in series, split out from next patch for easier reviewing
v3: drop unneeded include (Lucas)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Badal Nilawar [Wed, 25 Oct 2023 16:12:01 +0000 (21:42 +0530)]
drm/xe: Extend rpX values extraction for future platforms
In the existing code flow for future platforms, i.e. > 1270, the rpX
(rp0, rpn and rpe) fused values are read from the gen 6 registers,
which is not correct. Unless otherwise specified, the gen 1270 registers
should also be valid for gen 1270+ platforms.
Matt Roper [Mon, 23 Oct 2023 20:41:13 +0000 (13:41 -0700)]
drm/xe/mocs: MOCS registers are multicast on Xe_HP and beyond
The MOCS registers should be written in an MCR-specific manner on Xe_HP
and beyond to prevent any other driver threads or external firmware from
putting the hardware into unicast mode while we initialize the MOCS
table.
Matt Roper [Wed, 25 Oct 2023 15:17:36 +0000 (08:17 -0700)]
drm/xe/xe2: Update SVG state handling
As with DG2/MTL, Xe2 also fails to emit instruction headers for SVG
state instructions if no explicit state has been set. The SVG part of
the LRC is nearly identical to DG2/MTL; the only change is that
3DSTATE_DRAWING_RECTANGLE has been replaced by
3DSTATE_DRAWING_RECTANGLE_FAST, so we can just re-use the same state
table and handle that single instruction when we encounter it.
Matt Roper [Wed, 25 Oct 2023 15:17:35 +0000 (08:17 -0700)]
drm/xe: Emit SVG state on RCS during driver load on DG2 and MTL
When recording the default LRC, the expectation is that the hardware's
original state settings (both register and instruction) will be written
out to the LRC upon first context switch. For many 3DSTATE_* state
instructions that don't truly have "default" values, this translates to
a simple instruction header (opcodes + dword length) being written to
the LRC, followed by an appropriate number of blank dwords as a place
holder. When userspace creates a context (which starts as a copy of the
default LRC), they'll generally emit real 3DSTATE_* as part of their
initialization to select the settings they desire. If they don't emit
one of the 3DSTATE instructions, then the zeroed dwords that remain in
their LRC image generally translate to various state remaining disabled.
This will either be what userspace wants or will lead to very
reproducible and easily-debugged problems (rendering glitches, engine
hangs).
It turns out that a subset of the 3DSTATE instructions, specifically
those belonging to the SVG (State Variable - Global) unit, are not only
emitting 0's for the instruction's "body" dwords, but also for the
instruction header dword if no specific state has been explicitly set
before context switch. This means that when the hardware switches to a
context that hasn't explicitly provided an appropriate state setting,
the hardware will just see a sequence of NOOPs in the spot reserved for
that 3DSTATE instruction while executing the LRC, and the actual
hardware state setting will unintentionally inherit the configuration
used by the previously running context. Now when userspace makes a
mistake and forgets to emit an important state instruction they no
longer get consistent, easily-reproducible corruption/hangs, but rather
erratic behavior where the presence/absence of a problem depends on what
other workloads are running on the system and what order the contexts
are scheduled on the engine.
A specific example of this came up recently, related to mesh shading.
The OpenGL driver was not specifically emitting a 3DSTATE_MESH_CONTROL
to disable mesh shading at context init, so on context switch, mesh
shading would either be on or off depending on what the previous context
had been doing. Vulkan apps _were_ enabling mesh shading, so running a
Vulkan app and then context switching to an OpenGL app resulted in mesh
shading still unexpectedly being enabled during OpenGL operation, and
since other Mesh-related state was not properly initialized for that
context a GPU hang was seen. Due to the specific ordering requirements
(Vulkan app runs first, followed by OpenGL app), it took additional
debug effort to track down the cause of the problem.
There are various workarounds related to this behavior, with current
implementations handled in the userspace drivers. E.g., Wa_14019789679
and Wa_22018402687. However it's been suggested that the kernel driver
can help simplify things here by emitting zeroed SVG state with proper
instruction headers as part of our default context creation (i.e., at
the same point we apply LRC workarounds). This will help ensure that
any future cases where a userspace driver does not emit an important
state setting will result in consistent behavior.
Matt Roper [Wed, 25 Oct 2023 15:17:34 +0000 (08:17 -0700)]
drm/xe: Prepare to emit non-register state while recording default LRC
On some platforms we need to emit some non-register state while
recording an engine class' default LRC. Add the infrastructure to
support this; actual per-platform tables will be added in future
patches.
v2:
- Checkpatch whitespace fix
- Add extra assertion to ensure num_dw != 0. (Bala)
Shekhar Chauhan [Tue, 24 Oct 2023 22:07:38 +0000 (15:07 -0700)]
drm/xe: Add performance tuning settings for MTL and Xe2
Add L3SQCREG5 as part of HW recommended settings. The recommended value
in Bspec is 00e0007f. For Xe2-LPG, bits 23:21 don't exist anymore, but
it's confirmed with HW engineers that setting them doesn't do anything.
They still exist on the media GT, Xe2-LPM, but there they are already
set to the HW default value. So for Xe2 platforms, the only bits that
need to be set are 9:0, since the HW default is 0x1ff and the
recommended value is 0x7f.
Unlike most registers, which have the same relative offset on both
the primary and media GT, this register has a different base offset
on the media GT.
On MTL the register only exists for the primary (graphics) GT, so
there's no need to program it on the media gt. Also, it's part of the
RCS engine's context, so it needs to be added as a LRC workaround.
Bspec: 72161 Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20231024220739.224251-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add the initial collection of gt/engine/lrc workarounds.
While at it, add some newlines around the platform/IP comments to make
them consistent across all workarounds.
v2:
- FF_MODE is an MCR register (Matt Roper)
- Group 18032247524 with other Xe2 workarounds (Matt Roper)
- Move WA changing PSS_CHICKEN to lrc_was[] as for Xe2 that register
is part of the render context image (Matt Roper)
- Apply WA 16020518922 only on render engine (Matt Roper)
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20231024220739.224251-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Auld [Wed, 18 Oct 2023 12:34:24 +0000 (13:34 +0100)]
drm/xe: fix pat[2] programming with 2M/1G pages
Bit 7 in the leaf node is normally programmed with pat[2], however with
2M/1G pages that same bit in the PDE/PDPE also toggles 2M/1G pages. For
2M/1G entries the pat[2] is rather moved to bit 12, which is now free
given that the address must be aligned to 2M or 1G, leaving bit 7 for
toggling 2M/1G pages.
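A sketch of the leaf-entry encoding described above; the bit-define and
helper names follow the driver's naming style but are assumptions here:
  #define XE_PPGTT_PTE_PAT2       BIT_ULL(7)   /* 4K leaf entries */
  #define XE_PPGTT_PDE_PDPE_PAT2  BIT_ULL(12)  /* 2M/1G leaf entries */

  static u64 pte_encode_pat2(u64 pte, u16 pat_index, bool huge)
  {
          if (!(pat_index & BIT(2)))
                  return pte;

          /*
           * For 2M/1G leaf entries bit 7 toggles the huge page itself, so
           * pat[2] moves to bit 12 instead; the address is aligned to at
           * least 2M, which leaves that bit free.
           */
          return pte | (huge ? XE_PPGTT_PDE_PDPE_PAT2 : XE_PPGTT_PTE_PAT2);
  }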
Bspec: 59510, 45038 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The PVC GuC version that we're currently using (70.6.4) has a known
issue that leads to dropping the disabling of contexts that have
pending page faults. This is fixed in newer blobs, so we need to
update to a more recent release.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/696 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
During the uAPI review, a possible confusion was identified between the
plural of an acronym and a new acronym, so the recommendation is to go
with gt_list instead.
Suggested-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Matthew Brost [Thu, 14 Sep 2023 20:40:51 +0000 (13:40 -0700)]
drm/xe: Fix VM bind out-sync signaling ordering
A case existed where an out-sync of a later VM bind operation could
signal before a previous one if the later operation results in a NOP
(e.g. an unbind or prefetch to a VA range without any mappings). This
breaks the ordering rules; fix this. This patch also lays the groundwork
for users to pass in num_binds == 0 and out-syncs.
Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Brost [Thu, 14 Sep 2023 20:40:50 +0000 (13:40 -0700)]
drm/xe: Remove async worker and rework sync binds
The async worker is gone. All jobs and memory allocations are now done in
the IOCTL to align with dma-fencing rules.
Async vs. sync now means when do bind operations complete relative to
the IOCTL. Async completes when out-syncs signal while sync completes
when the IOCTL returns. In-syncs and out-syncs are only allowed in async
mode.
If memory allocations fail in the job creation step, the VM is killed.
This is temporary; eventually a proper unwind will be done and the VM will
remain usable.
Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This extension is currently not used and is not aligned with the error
handling of async VM_BIND. Let's remove it and, along with that, since it
was the only extension for vm_create, remove VM extension support
entirely.
v2: rebase on top of the removal of drm_xe_ext_exec_queue_set_property
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Ashutosh Dixit [Wed, 20 Sep 2023 19:29:31 +0000 (15:29 -0400)]
drm/xe/uapi: Use common drm_xe_ext_set_property extension
There really is no difference between 'struct drm_xe_ext_vm_set_property'
and 'struct drm_xe_ext_exec_queue_set_property', they are extensions which
specify a <property, value> pair. Replace the two extensions with a single
common 'struct drm_xe_ext_set_property' extension. The rationale is that
rather than have each XE module (including future modules) invent their own
property/value extensions, all XE modules use a common set_property
extension when possible.
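Such a common property/value extension looks roughly like this (the exact
field layout, padding and any reserved fields belong to the actual uAPI
header; this is only a sketch):
  struct drm_xe_ext_set_property {
          /** @base: base user extension */
          struct drm_xe_user_extension base;

          /** @property: property to set */
          __u32 property;

          /** @pad: MBZ */
          __u32 pad;

          /** @value: property value */
          __u64 value;
  };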
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Matthew Brost [Wed, 20 Sep 2023 19:29:30 +0000 (15:29 -0400)]
drm/xe: Remove XE_EXEC_QUEUE_SET_PROPERTY_COMPUTE_MODE from uAPI
The functionality of XE_EXEC_QUEUE_SET_PROPERTY_COMPUTE_MODE was deprecated
in a previous patch; drop it from the uAPI. The property is simply inherited
from the VM.
We are going to remove XE_EXEC_QUEUE_SET_PROPERTY_COMPUTE_MODE from the
uAPI, so deprecate the implementation first by making
XE_EXEC_QUEUE_SET_PROPERTY_COMPUTE_MODE a NOP. After removal of
XE_EXEC_QUEUE_SET_PROPERTY_COMPUTE_MODE the property is simply inherited
from the VM.
v2:
- Update commit message with explanation of removal (Niranjana)
drm/xe: Correlate engine and cpu timestamps with better accuracy
Perf measurements rely on CPU and engine timestamps to correlate
events of interest across these time domains. Current mechanisms get
these timestamps separately, and the calculated delta between them lacks
enough accuracy.
To improve the accuracy of these time measurements to within a few us,
add a query that returns the engine and cpu timestamps captured as
close to each other as possible.
v2:
- Fix kernel-doc warnings (CI)
- Document input params and group them together (Jose)
- s/cs/engine/ (Jose)
- Remove padding in the query (Ashutosh)
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo finished the s/cs/engine renaming]
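Conceptually the query hands back an engine timestamp and a CPU timestamp
captured back to back; a sketch of how one such correlated sample can be
used to map engine time onto CPU time (names are illustrative):
  /*
   * Map an engine timestamp (in engine clock ticks) to CPU nanoseconds,
   * given one correlated (engine_ts, cpu_ns) sample returned by the
   * query. Overflow handling is omitted in this sketch.
   */
  static u64 engine_ts_to_cpu_ns(u64 engine_ts, u64 sample_engine_ts,
                                 u64 sample_cpu_ns, u64 engine_freq_hz)
  {
          s64 delta_ticks = (s64)(engine_ts - sample_engine_ts);

          return sample_cpu_ns + div64_s64(delta_ticks * NSEC_PER_SEC,
                                           (s64)engine_freq_hz);
  }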
Matt Roper [Mon, 16 Oct 2023 16:34:56 +0000 (09:34 -0700)]
drm/xe/debugfs: Include GFXPIPE commands in LRC dump
RCS and CCS engines include several non-register gfxpipe commands in
their LRC images. Include these in the dump output so that we can see
exactly what's inside the context snapshot.
v2:
- Include raw instruction header in output
- Add 3DSTATE_AMFS_TEXTURE_POINTERS and 3DSTATE_MONOFILTER_SIZE. The
first was supposed to be removed in Xe_HPG, and the second by
gen12, but both still show up in the RCS LRC.
v3:
- Sanity check that we don't have numdw > remaining_dw. (Lucas)
Matt Roper [Mon, 16 Oct 2023 16:34:55 +0000 (09:34 -0700)]
drm/xe/debugfs: Add dump of default LRCs' MI instructions
For non-RCS engines, nearly all of the LRC state is composed of MI
instructions (specifically MI_LOAD_REGISTER_IMM). Providing a dump
interface allows us to verify that the context image layout matches
what's documented in the bspec, and also allows us to check whether LRC
workarounds are being properly captured by the default state we record
at startup.
For now, the non-MI instructions found in the RCS and CCS engines will
dump as "unknown;" parsing of those will be added in a follow-up patch.
v2:
- Add raw instruction header as well as decoded meaning. (Lucas)
- Check that num_dw isn't greater than remaining_dw for instructions
that have a "# dwords" field. (Lucas)
- Clarify comment about skipping over ppHWSP. (Lucas)
Matt Roper [Mon, 16 Oct 2023 16:34:54 +0000 (09:34 -0700)]
drm/xe: Extract MI_* instructions to their own header
Extracting the common MI_* instructions that can be used with any engine
to their own header will make it easier as we add additional engine
instructions in upcoming patches.
Also, since the majority of GPU instructions (both MI and non-MI) have
a "length" field in bits 7:0 of the instruction header, a common define
is added for that. Instruction-specific length fields are still defined
for special case instructions that have larger/smaller length fields.
v2:
- Use "instr" instead of "inst" as the short form of "instruction"
everywhere. (Lucas)
- Include xe_reg_defs.h instead of the i915 compat header. (Lucas)
Matt Roper [Mon, 16 Oct 2023 16:34:53 +0000 (09:34 -0700)]
drm/xe: Clarify number of dwords/qwords stored by MI_STORE_DATA_IMM
MI_STORE_DATA_IMM can store either dword values or qword values, and can
store more than one value if the instruction's length field is large
enough. Create explicit defines to specify the number of dwords/qwords
to be stored, which will set the instruction length correctly and, if
necessary, turn on the 'store qword' bit.
While we're here, also replace an open-coded version of
MI_STORE_DATA_IMM with the common macros.
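The resulting defines would look something like the following; the macro
names and field widths are assumptions based on the description above:
  #define MI_STORE_DATA_IMM      __MI_INSTR(0x20)
  #define   MI_SDI_GGTT          REG_BIT(22)
  #define   MI_SDI_LEN_DW        GENMASK(9, 0)
  /* The length field excludes the first two dwords of the instruction. */
  #define   MI_SDI_NUM_DW(x)     REG_FIELD_PREP(MI_SDI_LEN_DW, (x) + 3 - 2)
  #define   MI_SDI_NUM_QW(x)     (REG_BIT(21) /* store qwords */ | \
                                  REG_FIELD_PREP(MI_SDI_LEN_DW, 2 * (x) + 3 - 2))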
Matt Roper [Mon, 16 Oct 2023 16:34:51 +0000 (09:34 -0700)]
drm/xe: Make MI_FLUSH_DW immediate size more explicit
Despite its name, MI_FLUSH_DW instruction can write an immediate value
of either dword size or qword size, depending on the 'length' field of
the instruction. Since "length" excludes the first two dwords of the
instruction, a value of 2 in the length field implies a dword write and
a value of 3 implies a qword write. Even in cases where the flush
instruction's post-sync operation is set to "no write" we're still
expected to size the overall instruction as if we were doing a dword or
qword write (i.e., a length of 1 shouldn't be used on modern platforms).
Rather than baking a size of "1" into the #define and then adding
another unexplained "+ 1" at all the spots where the definition gets
used, let's just create MI_FLUSH_IMM_DW and MI_FLUSH_IMM_QW definitions
that should be OR'd into the instruction header to make it more explicit
what behavior we're requesting.
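That is, something along these lines (again, names are illustrative of the
approach rather than quoted from the patch):
  #define MI_FLUSH_DW            __MI_INSTR(0x26)
  #define   MI_FLUSH_DW_LEN_DW   GENMASK(5, 0)
  /* length excludes the first two dwords: a 4-dword total = dword write */
  #define   MI_FLUSH_IMM_DW      REG_FIELD_PREP(MI_FLUSH_DW_LEN_DW, 4 - 2)
  /* a 5-dword total = qword write */
  #define   MI_FLUSH_IMM_QW      REG_FIELD_PREP(MI_FLUSH_DW_LEN_DW, 5 - 2)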
Vitaly Lubart [Wed, 30 Aug 2023 10:05:40 +0000 (13:05 +0300)]
drm/xe/gsc: add has_heci_gscfi indication to device
Mark support of MEI-GSC interaction per device.
Add has_heci_gscfi indication to xe_device and xe_pci structures.
Mark DG1 and DG2 devices as supported.
drm/xe: Set PTE_AE for smem allocations in integrated devices
Without this, if an atomic operation is executed on Xe2 integrated GPUs
it causes an engine memory catastrophic error.
This fixes at least 3 failures in piglit sanity and 2 failures in
crucible for LNL.
v3:
- only add PTE_AE to smem in integrated
Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Koby Elbaz [Thu, 5 Oct 2023 15:06:18 +0000 (11:06 -0400)]
drm/xe: map MMIO BAR according to the num of tiles in device desc
When the MMIO BAR is initially mapped, the driver assumes a single-tile
device. However, the earlier memory allocations take all tiles into account.
First, a common standard for resource usage is needed here.
Second, with the next (6th) patch in this series, the MMIO BAR remapping
will be done only if a reduced-tile device is attached.
Koby Elbaz [Thu, 5 Oct 2023 15:06:17 +0000 (11:06 -0400)]
drm/xe: add MMIO extension support flags
Besides the regular MMIO space that exists by default, MMIO
extension support & MMIO extension tile size should both be
defined per device, and updated from the device's descriptor.
Koby Elbaz [Thu, 5 Oct 2023 15:06:14 +0000 (11:06 -0400)]
drm/xe: add 28-bit address support in struct xe_reg
Xe driver currently supports 22-bit addresses for MMIO access.
Future platforms will have additional MMIO extension with
larger address spaces, and to access them, the driver will
have to support wider address representation.
Please note that while the XE_REG macro is used for regular MMIO access,
the XE_REG_EXT macro will be used for MMIO-extension access.
Matthew Auld [Fri, 6 Oct 2023 08:46:16 +0000 (09:46 +0100)]
drm/xe: directly use pat_index for pte_encode
In a future patch userspace will be able to directly set the pat_index
as part of vm_bind. To support this we need to get away from using
xe_cache_level in the low level routines and rather just use the
pat_index directly.
v2: Rebase
v3: Some missed conversions, also prefer tile_to_xe() (Niranjana)
v4: remove leftover const (Lucas)
Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Auld [Fri, 6 Oct 2023 08:46:15 +0000 (09:46 +0100)]
drm/xe/pat: trim the xelp PAT table
We don't seem to use the 4-7 pat indexes, even though they are defined
by the HW. In a future patch userspace will be able to directly set the
pat_index as part of vm_bind and we don't want to allow setting 4-7.
Simplest is to just ignore them here.
Suggested-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Anusha Srivatsa [Thu, 5 Oct 2023 20:54:47 +0000 (13:54 -0700)]
drm/xe/rplu: s/ADLP/ALDERLAKE_P
i915 now uses full names for platforms, so we now have
ALDERLAKE instead of ADL. Extend this to the xe driver as well.
This will make macro magic usages easier.
v2: Do not make changes to compat-i915-headers/i915_drv.h
file with the rest of the changes (Jani)
Cc: Jani Nikula <jani.nikula@intel.com> Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231005205450.3177354-3-anusha.srivatsa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Lucas De Marchi [Fri, 6 Oct 2023 18:23:24 +0000 (11:23 -0700)]
drm/xe/pat: Add debugfs node to dump PAT
This is useful for debugging cache issues, to double-check that the PAT
indexes match what the spec says they should be set to.
v2: Add separate functions for XeHP, XeHPC and XeLPG so it correctly
reads the index based on MCR/REG registers and also decodes the
fields (Matt Roper)
v3: Starting with XeHPC, do not translate values to human-readable
formats as the main goal is to make it easy to compare the table
with the spec. Also, share a single array for xelp/xehp str map
(Matt Roper)
Lucas De Marchi [Fri, 6 Oct 2023 18:23:23 +0000 (11:23 -0700)]
drm/xe/xe2: Add one more bit to encode PAT to ppgtt entries
Xe2 adds one more bit to cover all the possible 32 entries. Although
those entries are not used by internal kernel code paths, it's expected
that userspace will make use of it.
Bspec: 59510, 67095 Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20231006182325.3617685-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Roper [Fri, 6 Oct 2023 18:23:22 +0000 (11:23 -0700)]
drm/xe/xe2: Program PAT tables
The PAT tables become significantly more complicated on Xe2 platforms.
They now control L3, L4, and coherency settings, as well as additional
characteristics such as compression.
Aside from the main PAT table, there's an additional register that
also needs to be programmed with PAT settings for PCI Address
Translation Services.
Bspec: 71582 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20231006182325.3617685-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Lucas De Marchi [Fri, 29 Sep 2023 05:02:49 +0000 (22:02 -0700)]
drm/xe/xe2: Follow XeHPC for TLB invalidation
Register GUC_TLB_INV_CR is gone in xe2. When GuC submission is not yet
enabled, make sure to follow the same path as XeHPC.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Lucas De Marchi [Fri, 29 Sep 2023 05:02:48 +0000 (22:02 -0700)]
drm/xe/vm: Prefer xe_assert() over XE_WARN_ON()
When xelp_pte_encode_addr() was added in commit 23c8495efeed
("drm/xe/migrate: Do not hand-encode pte"), there was no xe pointer for
using xe_assert(). This is not the case anymore, so prefer it over
XE_WARN_ON().
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Atwood [Fri, 6 Oct 2023 16:47:59 +0000 (09:47 -0700)]
drm/xe: add gt tuning for indirect state
Force indirect state sampler data to only be in the dynamic state pool,
which is more convenient for the UMD. The behavior change mirrors a similar
change for i915 in commit 16fc9c08f0ec ("drm/i915: disable sampler
indirect state in bindless heap")
v2: split out per engine tuning into separate patch, commit message
(Lucas)
v3: rebase
v4: Change to match render only, g.ver 1200 to 1271 (MattR)
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Auld [Thu, 5 Oct 2023 16:38:55 +0000 (17:38 +0100)]
drm/xe/hwmon: fix uaf on unload
It doesn't look like you can mix and match devm_ and drmm_ for a
managed resource. For drmm_ the resources are all tracked in drm with
its own list, and there is only one devm_ resource for the entire list.
If the driver itself also adds some of its own devm resources, then
those will be released first. In the case of hwmon, the devm_kzalloc will
be freed before the drmm_ action that destroys the mutex allocated within,
leading to a use-after-free.
Since hwmon itself wants to use devm, rather use that for the mutex
destroy.