Matt Roper [Mon, 16 Oct 2023 16:34:56 +0000 (09:34 -0700)]
drm/xe/debugfs: Include GFXPIPE commands in LRC dump
RCS and CCS engines include several non-register gfxpipe commands in
their LRC images. Include these in the dump output so that we can see
exactly what's inside the context snapshot.
v2:
- Include raw instruction header in output
- Add 3DSTATE_AMFS_TEXTURE_POINTERS and 3DSTATE_MONOFILTER_SIZE. The
first was supposed to be removed in Xe_HPG, and the second by
gen12, but both still show up in the RCS LRC.
v3:
- Sanity check that we don't have numdw > remaining_dw. (Lucas)
Matt Roper [Mon, 16 Oct 2023 16:34:55 +0000 (09:34 -0700)]
drm/xe/debugfs: Add dump of default LRCs' MI instructions
For non-RCS engines, nearly all of the LRC state is composed of MI
instructions (specifically MI_LOAD_REGISTER_IMM). Providing a dump
interface allows us to verify that the context image layout matches
what's documented in the bspec, and also allows us to check whether LRC
workarounds are being properly captured by the default state we record
at startup.
For now, the non-MI instructions found in the RCS and CCS engines will
dump as "unknown;" parsing of those will be added in a follow-up patch.
v2:
- Add raw instruction header as well as decoded meaning. (Lucas)
- Check that num_dw isn't greater than remaining_dw for instructions
that have a "# dwords" field. (Lucas)
- Clarify comment about skipping over ppHWSP. (Lucas)
Matt Roper [Mon, 16 Oct 2023 16:34:54 +0000 (09:34 -0700)]
drm/xe: Extract MI_* instructions to their own header
Extracting the common MI_* instructions that can be used with any engine
to their own header will make it easier as we add additional engine
instructions in upcoming patches.
Also, since the majority of GPU instructions (both MI and non-MI) have
a "length" field in bits 7:0 of the instruction header, a common define
is added for that. Instruction-specific length fields are still defined
for special case instructions that have larger/smaller length fields.
v2:
- Use "instr" instead of "inst" as the short form of "instruction"
everywhere. (Lucas)
- Include xe_reg_defs.h instead of the i915 compat header. (Lucas)
Matt Roper [Mon, 16 Oct 2023 16:34:53 +0000 (09:34 -0700)]
drm/xe: Clarify number of dwords/qwords stored by MI_STORE_DATA_IMM
MI_STORE_DATA_IMM can store either dword values or qword values, and can
store more than one value if the instruction's length field is large
enough. Create explicit defines to specify the number of dwords/qwords
to be stored, which will set the instruction length correctly and, if
necessary, turn on the 'store qword' bit.
While we're here, also replace an open-coded version of
MI_STORE_DATA_IMM with the common macros.
Matt Roper [Mon, 16 Oct 2023 16:34:51 +0000 (09:34 -0700)]
drm/xe: Make MI_FLUSH_DW immediate size more explicit
Despite its name, MI_FLUSH_DW instruction can write an immediate value
of either dword size or qword size, depending on the 'length' field of
the instruction. Since "length" excludes the first two dwords of the
instruction, a value of 2 in the length field implies a dword write and
a value of 3 implies a qword write. Even in cases where the flush
instruction's post-sync operation is set to "no write" we're still
expected to size the overall instruction as if we were doing a dword or
qword write (i.e., a length of 1 shouldn't be used on modern platforms).
Rather than baking a size of "1" into the #define and then adding
another unexplained "+ 1" at all the spots where the definition gets
used, lets just create MI_FLUSH_IMM_DW and MI_FLUSH_IMM_QW definitions
that should be OR'd into the instruction header to make it more explicit
what behavior we're requesting.
Vitaly Lubart [Wed, 30 Aug 2023 10:05:40 +0000 (13:05 +0300)]
drm/xe/gsc: add has_heci_gscfi indication to device
Mark support of MEI-GSC interaction per device.
Add has_heci_gscfi indication to xe_device and xe_pci structures.
Mark DG1 and DG2 devices as supported.
drm/xe: Set PTE_AE for smem allocations in integrated devices
Without this if a atomic operation is executed in Xe2 integrated GPUs
it causes engine memory catastrophic error.
This fixes at least 3 failures in piglit sanity and 2 failures in
crucible for LNL.
v3:
- only add PTE_AE to smem in integrated
Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Koby Elbaz [Thu, 5 Oct 2023 15:06:18 +0000 (11:06 -0400)]
drm/xe: map MMIO BAR according to the num of tiles in device desc
When MMIO BAR is initially mapped, the driver assumes a single tile device.
However, former memory allocations take all tiles into account.
First, a common standard for resource usage is needed here.
Second, with the next (6th) patch in this series, the MMIO BAR remapping
will be done only if a reduced-tile device is attached.
Koby Elbaz [Thu, 5 Oct 2023 15:06:17 +0000 (11:06 -0400)]
drm/xe: add MMIO extension support flags
Besides the regular MMIO space that exists by default, MMIO
extension support & MMIO extension tile size should both be
defined per device, and updated from the device's descriptor.
Koby Elbaz [Thu, 5 Oct 2023 15:06:14 +0000 (11:06 -0400)]
drm/xe: add 28-bit address support in struct xe_reg
Xe driver currently supports 22-bit addresses for MMIO access.
Future platforms will have additional MMIO extension with
larger address spaces, and to access them, the driver will
have to support wider address representation.
Please note that while the XE_REG macro is used for MMIO access,
XE_REG_EXT macro will be used for MMIO-extension access.
Matthew Auld [Fri, 6 Oct 2023 08:46:16 +0000 (09:46 +0100)]
drm/xe: directly use pat_index for pte_encode
In a future patch userspace will be able to directly set the pat_index
as part of vm_bind. To support this we need to get away from using
xe_cache_level in the low level routines and rather just use the
pat_index directly.
v2: Rebase
v3: Some missed conversions, also prefer tile_to_xe() (Niranjana)
v4: remove leftover const (Lucas)
Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Auld [Fri, 6 Oct 2023 08:46:15 +0000 (09:46 +0100)]
drm/xe/pat: trim the xelp PAT table
We don't seem to use the 4-7 pat indexes, even though they are defined
by the HW. In a future patch userspace will be able to directly set the
pat_index as part of vm_bind and we don't want to allow setting 4-7.
Simplest is to just ignore them here.
Suggested-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Anusha Srivatsa [Thu, 5 Oct 2023 20:54:47 +0000 (13:54 -0700)]
drm/xe/rplu: s/ADLP/ALDERLAKE_P
i915 now uses full names for platforms. So we now have
ALDERLAKE instead of ADL. Extend this to xe driver as well.
This will make it easier for macro magic usages.
v2: Do not make changes to compat-i915-headers/i915_drv.h
file with the rest of the changes (Jani)
Cc: Jani Nikula <jani.nikula@intel.com> Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231005205450.3177354-3-anusha.srivatsa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Lucas De Marchi [Fri, 6 Oct 2023 18:23:24 +0000 (11:23 -0700)]
drm/xe/pat: Add debugfs node to dump PAT
This is useful to debug cache issues, to double check if the PAT
indexes match what they were supposed to be set to from spec.
v2: Add separate functions for XeHP, XeHPC and XeLPG so it correctly
reads the index based on MCR/REG registers and also decodes the
fields (Matt Roper)
v3: Starting with XeHPC, do not translate values to human-readable
formats as the main goal is to make it easy to compare the table
with the spec. Also, share a single array for xelp/xehp str map
(Matt Roper)
Lucas De Marchi [Fri, 6 Oct 2023 18:23:23 +0000 (11:23 -0700)]
drm/xe/xe2: Add one more bit to encode PAT to ppgtt entries
Xe2 adds one more bit to cover all the possible 32 entries. Although
those entries are not used by internal kernel code paths, it's expected
that userspace will make use of it.
Bspec: 59510, 67095 Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20231006182325.3617685-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Roper [Fri, 6 Oct 2023 18:23:22 +0000 (11:23 -0700)]
drm/xe/xe2: Program PAT tables
The PAT tables become significantly more complicated on Xe2 platforms.
They now control L3, L4, and coherency settings, as well as additional
characteristics such as compression.
Aside from the main PAT table, there's an additional register that
also needs to be programmed with PAT settings for PCI Address
Translation Services.
Bspec: 71582 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20231006182325.3617685-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Lucas De Marchi [Fri, 29 Sep 2023 05:02:49 +0000 (22:02 -0700)]
drm/xe/xe2: Follow XeHPC for TLB invalidation
Register GUC_TLB_INV_CR is gone in xe2. When GuC submission is not yet
enabled, make sure to follow the same path as XeHPC.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Lucas De Marchi [Fri, 29 Sep 2023 05:02:48 +0000 (22:02 -0700)]
drm/xe/vm: Prefer xe_assert() over XE_WARN_ON()
When xelp_pte_encode_addr() was added in commit 23c8495efeed
("drm/xe/migrate: Do not hand-encode pte"), there was no xe pointer for
using xe_assert(). This is not the case anymore, so prefer it over
XE_WARN_ON().
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Atwood [Fri, 6 Oct 2023 16:47:59 +0000 (09:47 -0700)]
drm/xe: add gt tuning for indirect state
Force indirect state sampler data to only be in the dynamic state pool,
which is more convienent for the UMD. Behavior change mirrors similar
change for i915 in commit 16fc9c08f0ec ("drm/i915: disable sampler
indirect state in bindless heap")
v2: split out per engine tuning into separate patch, commit message
(Lucas)
v3: rebase
v4: Change to match render only, g.ver 1200 to 1271 (MattR)
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Auld [Thu, 5 Oct 2023 16:38:55 +0000 (17:38 +0100)]
drm/xe/hwmon: fix uaf on unload
It doesn't look like you can mix and match devm_ and drmmm_ for a
managed resource. For drmmm the resources are all tracked in drm with
its own list, and there is only one devm_ resource for the entire list.
If the driver itself also adds some of its own devm resources, then
those will be released first. In the case of hwmon the devm_kzalloc will
be freed before the drmmm_ action to destroy the mutex allocated within,
leading to uaf.
Since hwmon itself wants to use devm, rather use that for the mutex
destroy.
It was reading (base) + 0x8c but that is not a valid register
and instead it should read (base) + 0x68.
So here reading the correct register and removing the wrong and
duplicated.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Paulo Zanoni [Fri, 29 Sep 2023 17:31:04 +0000 (10:31 -0700)]
drm/xe: fix range printing for debug messages
We're already using the half-open interval notation "[A, B)", that "-
1" there makes it wrong. Also, getting rid of the "-1" makes it much
easier to grep for the logs when you're looking for an address that's
the end of a vma and the start of another.
Expose the card reactive critical (I1) power. I1 is exposed as
power1_crit in microwatts (typically for client products) or as
curr1_crit in milliamperes (typically for server).
Matthew Brost [Fri, 29 Sep 2023 20:02:54 +0000 (13:02 -0700)]
drm/xe: Fix exec queue usage for unbinds
Passing in a NULL exec queue to __xe_pt_unbind_vma results in the
migrate exec queue being used. This is not the intent from the VM bind
IOCTL, rather a NULL exec queue should use default VM exec queue.
drm/xe/xe2: Update MOCS fields in blitter instructions
Xe2 changes or adds bits for mocs in a few BLT instructions:
XY_CTRL_SURF_COPY_BLT, XY_FAST_COLOR_BLT, XY_FAST_COPY_BLT, and MEM_SET.
Modify the code to deal with the new location. Unlike Xe1, the MOCS
field in those instructions is only the MOCS index and not the
Structure_MEMORY_OBJECT_CONTROL_STATE anymore. The pxp bit is now
explicitly documented separately.
Bspec: 57567,57566,57565,57562 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/xe/xe2: Set tile y type in XY_FAST_COPY_BLT to Tile4
Set bits 30 and 31 of XY_FAST_COPY_BLT's dword1 for XeHP and above.
Destination or source being Y-Major is selected on dword0 and there's
nothing to set on dword1. According to the bspec for Xe2,
"Behavior is undefined when programmed the value 0". Also for XeHP,
the only value allowed in those bits is 0b11, not being possible to
select "Legacy Tile-Y" anymore, only the newer Tile4.
So, unconditionally set those bits for graphics IP 12.50 and above.
v2: Reword commit message and extend it to graphics version >= 12.50
(Matt Roper)
Bspec: 57567 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Instead of using xe_mocs_index_to_value(), simply define the bitmask
with the shift left applied. This will make it easier to adapt to new
platforms that simply use the index.
This also fixes PVC bug in emit_clear_link_copy() where the MOCS was
getting shifted both by PVC_MS_MOCS_INDEX_MASK definition and by the
xe_moc_index_to_value function.
Bspec: 44509 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Roper [Fri, 29 Sep 2023 23:03:33 +0000 (16:03 -0700)]
drm/xe/tuning: Add missing engine class rules for LRC tuning
The LRC tuning settings we have today are modifying registers that are
part of the RCS engine's context; they're not part of the general CSFE
context that would apply to all engines. Add ENGINE_CLASS(RENDER) to
the RTP rules to properly restrict these to the RCS.
Fei Yang [Thu, 21 Sep 2023 22:05:00 +0000 (15:05 -0700)]
drm/xe: timeout needs to be a signed value
In xe_wait_user_fence_ioctl, the timeout is currently defined as
unsigned long. That could potentially pass a negative value to
the schedule_timeout() call because nsecs_to_jiffies() returns an
unsigned long which gets used as signed long.
Fixes: 5572a0046857 ("drm/xe: Use nanoseconds instead of jiffies in uapi for user fence") Signed-off-by: Fei Yang <fei.yang@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20230921220500.994558-2-fei.yang@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Fei Yang [Thu, 28 Sep 2023 04:43:35 +0000 (21:43 -0700)]
drm/xe: set PTE_AE for all platforms supporting it
Atomic access is supported by PVC, and became a common feature for all
platforms starting from Xe2. To enable that XE_VMA_ATOMIC_PTE_BIT needs
to be set, then pte encode will eventually set PTE_AE for devmem.
Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230928044335.1474903-2-fei.yang@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When working without GuC (i.e. working with execlists), the flow
attempts to perform suspend operation which is failing due to a
lack of support without GuC.
If PM ops are not supported without GuC we may as well avoid PM
registration rather than returning errors from various PM flows.
Lucas De Marchi [Wed, 27 Sep 2023 19:39:01 +0000 (12:39 -0700)]
drm/xe: Use vfunc for ggtt pte encoding
Use 2 different functions for encoding the ggtt's pte, assigning them
during initialization. Main difference is that before Xe-LPG, the pte
didn't have the cache bits.
v2: Re-use xelp_ggtt_pte_encode_bo() for the common part with
xelpg_ggtt_pte_encode_bo() (Matt Roper)
Lucas De Marchi [Wed, 27 Sep 2023 19:39:00 +0000 (12:39 -0700)]
drm/xe: Use pat_index to encode pde/pte
Change the xelp_pte_encode() and xelp_pde_encode() functions to use the
platform-dependent pat_index. The same function can be used for all
platforms as they only need to encode the pat_index bits in the same
pte/pde layout. For platforms that don't have the most significant bit,
as long as they don't return a bogus index they should be fine.
v2: Use the same logic to encode pde as it's compatible with previous
logic, it's more future proof and also fixes the cache setting for
PVC (Matt Roper)
Lucas De Marchi [Wed, 27 Sep 2023 19:38:59 +0000 (12:38 -0700)]
drm/xe/pat: Keep track of relevant indexes
Some of the PAT entries are relevant for internal driver use, which
varies per platform. Let the PAT early initialization set what they
should point to so the rest of the driver can use them where needed.
Lucas De Marchi [Wed, 27 Sep 2023 19:38:58 +0000 (12:38 -0700)]
drm/xe/pat: Prefer the arch/IP names
Both DG2 and PVC are derived from XeHP, but DG2 should not really
re-use something introduced by PVC, so it's odd to have DG2 re-using the
PVC programming for PAT. Let's prefer using the architecture and/or IP
names.
Lucas De Marchi [Wed, 27 Sep 2023 19:38:56 +0000 (12:38 -0700)]
drm/xe: Use vfunc to initialize PAT
Split the PAT initialization between SW-only and HW. The _early() only
sets up the ops and data structure that are used later to program the
tables. This allows the PAT to be easily extended to other platforms.
Lucas De Marchi [Wed, 27 Sep 2023 19:38:55 +0000 (12:38 -0700)]
drm/xe/migrate: Do not hand-encode pte
Instead of encoding the pte, call a new vfunc from xe_vm to handle that.
The encoding may not be the same on every platform, so keeping it in one
place helps to better support them.
Lucas De Marchi [Wed, 27 Sep 2023 19:38:54 +0000 (12:38 -0700)]
drm/xe: Use vfunc for pte/pde ppgtt encoding
Move the function to encode pte/pde to be vfuncs inside struct xe_vm.
This will allow to easily extend to platforms that don't have a
compatible encoding.
Lucas De Marchi [Wed, 27 Sep 2023 19:38:53 +0000 (12:38 -0700)]
drm/xe: Remove check for vma == NULL
vma at this point can never be NULL as otherwise it would crash earlier
in the only caller, xe_pt_stage_bind_entry(). Remove the extra check and
avoid adding and removing the bits from the pte.
Lucas De Marchi [Wed, 27 Sep 2023 19:38:52 +0000 (12:38 -0700)]
drm/xe: Normalize pte/pde encoding
Split functions that do only part of the pde/pte encoding and that can
be called by the different places. This normalizes how pde/pte are
encoded so they can be moved elsewhere in a subsequent change.
xe_pte_encode() was calling __pte_encode() with a NULL vma, which is the
opposite of what xe_pt_stage_bind_entry() does. Stop passing a NULL vma
and just split another function that deals with a vma rather than a bo.
Matt Roper [Wed, 27 Sep 2023 20:51:44 +0000 (13:51 -0700)]
drm/xe: Infer service copy functionality from engine list
On platforms with multiple BCS engines (i.e., PVC and Xe2), not all BCS
engines are created equal. The BCS0 engine is what the specs refer to
as a "resource copy engine," which supports the platform's full set of
copy/fill instructions. In contast, the non-BCS0 "service copy" engines
are more streamlined and only support a subset of the GPU instructions
supported by the resource copy engine. Platforms with both types of
copy engines always support the MEM_COPY and MEM_SET instructions which
can be used for simple copy and fill operations on either type of BCS
engine. Since the simple MEM_SET instruction meets the needs of Xe's
migrate code (and since the more elaborate XY_FAST_COLOR_BLT instruction
isn't available to use on service copy engines), we always prefer to use
MEM_SET for clearing buffers on our newer platforms.
We've been using a 'has_link_copy_engine' feature flag to keep track of
which platforms should use MEM_SET for fills. However a feature flag
like this is unnecessary since we can already derive the presence of
service copy engines (and in turn the MEM_SET instruction) just by
looking at the platform's pre-fusing engine list. Utilizing the engine
list for this also avoids mistakes like we've made on Xe2 where we
forget to set the feature flag in the IP definition.
For clarity, "service copy" is a general term that covers any blitter
engines that support a limited subset of the overall blitter instruction
set (in practice this is any non-BCS0 blitter engine). The "link copy
engines" introduced on PVC and the "paging copy engine" present in Xe2
are both instances of service copy engines.
drm/xe/irq: Clear GFX_MSTR_IRQ as part of IRQ reset
Starting with Xe_LP+, GFX_MSTR_IRQ contains status bits that have W1C
behavior. If we do not properly reset them, we would miss delivery of
interrupts if a pending bit is set when enabling IRQs.
As an example, the display part of our probe routine contains paths
where we wait for vblank interrupts. If a display interrupt was already
pending when enabling IRQs, we would time out waiting for the vblank.
That in fact happened recently when modprobing Xe on a Lunar Lake with a
specific configuration; and that's how we found out we were missing this
step in the IRQ enabling logic.
Fix the issue by clearing GFX_MSTR_IRQ as part of the IRQ reset.
v2:
- Make resetting GFX_MSTR_IRQ be the last step to avoid bit
re-latching. (Ville)
v3:
- Swap nesting order: guard loop with the IP version check instead of
doing the check at each iteration. (Lucas)
v4:
- Add braces for the "if" statement guarding the loop to make the
compiler happy. (Gustavo)
BSpec: 50875, 54028, 62357 Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230926221914.106843-2-gustavo.sousa@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Ensure alignment with PAGE_SIZE for the size parameter
passed to __xe_bo_create_locked()
v2: move size alignment under else condition (Lucas)
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230920213259.3458968-1-pallavi.mishra@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Lucas De Marchi [Fri, 22 Sep 2023 17:43:20 +0000 (10:43 -0700)]
drm/xe: Accept a const xe device
Depending on the context, it's preferred to have a const pointer to make
sure nothing is modified underneath. The assert macros only ever read
data from xe/tile/gt for printing, so they can be made const by default.
Use the newly added drm_print_memory_stats helper to show memory
utilisation of our objects in drm/driver specific fdinfo output.
To collect the stats we walk the per memory regions object lists
and accumulate object size into the respective drm_memory_stats
categories.
Objects with multiple possible placements are reported in multiple
regions for total and shared sizes, while other categories are
counted only for the currently active region.
V4:
- Remove rcu lock - Auld/Thomas
- take refcnt only if its non-zero - Auld
- DMA_RESV_USAGE_BOOKKEEP covers all fences - Auld
- covert to xe_bo for public objects
V3:
- dont use xe_bo_get/put, not needed
- use designated initializer - Jani
- use list_for_each_entry_rcu
- Fix Checkpatch err - CI
V2:
- Use static initializer for mem_type - Himal/Jani
In order to show per client memory consumption, we
need tracking support APIs to add at every bo consumption
and removal. Adding APIs here to add tracking calls at
places wherever it is applicable.
V5:
- Rebase
V4:
- remove client bo before vm_put
- spin_lock_irqsave not required - Auld
V3:
- update .h to return xe_drm_client_remove_bo void
- protect xe_drm_client_remove_bo under CONFIG_PROC_FS check - Himal
- Fixed Checkpatch error - CI
V2:
- make xe_drm_client_remove_bo return void - Himal
drm/xe: Interface xe drm client with fdinfo interface
DRM core driver has introduced recently fdinfo interface to
show memory stats of individual drm client. Lets interface
xe drm client to fdinfo interface.
V2:
- cover call to xe_drm_client_fdinfo under PROC_FS
drm/xe: Add child contexts to the GuC context lookup
The CAT_ERROR message from the GuC provides the guc id of the context
that caused the problem, which can be a child context. We therefore
need to be able to match that id to the exec_queue that owns it, which
we do by adding child context to the context lookup.
While at it, fix the error path of the guc id allocation code to
correctly free the ids allocated for parallel queues.
v2: rebase on s/XE_WARN_ON/xe_assert
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/590 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Roper [Wed, 13 Sep 2023 23:14:17 +0000 (16:14 -0700)]
drm/xe/wa: Apply tile workarounds at probe/resume
Although the vast majority of workarounds the driver needs to implement
are either GT-based or display-based, there are occasionally workarounds
that reside outside those parts of the hardware (i.e., in they target
registers in the sgunit/soc); we can consider these to be "tile"
workarounds since there will be instance of these registers per tile.
The registers in question should only lose their values during a
function-level reset, so they only need to be applied during probe and
resume; the registers will not be affected by GT/engine resets.
Tile workarounds are rare (there's only one, 22010954014, that's
relevant to Xe at the moment) so it's probably not worth updating the
xe_rtp design to handle tile-level workarounds yet, although we may want
to consider that in the future if/when more of these show up on future
platforms.
Thomas Hellström [Wed, 20 Sep 2023 09:50:01 +0000 (11:50 +0200)]
drm/xe: Disallow pinning dma-bufs in VRAM
For now only support pinning in TT memory, for two reasons:
1) Avoid pinning in a placement not accessible to some importers.
2) Pinning in VRAM requires PIN accounting which is a to-do.
With the GPUVA conversion, the xe_bo::vmas member became replaced with
drm_gem_object::gpuva.list, however there was a couple of usage instances
left using the old member. Most notably the pipelined fence
enable_signaling.
Remove the xe_bo::vmas member completely, fix usage instances and
also enable this pipelined fence enable_signaling even for faulting
VM:s since we actually wait for bind fences to complete.
When testing a new binary and/or debugging binary-related issues, it is
useful to have the option to change which binary is loaded without
having to update and re-compile the kernel. To support this option, this
patch adds 2 new modparams to override the FW path for GuC and HuC. The
HuC modparam can also be set to an empty string to disable HuC loading.
Note that those modparams only take effect on platforms where we already
have a default FW, so we're sure there is support for FW loading and the
kernel isn't going to explode in an undefined path.
v2: simplify comment (John),
rebase on s/guc_submission_enabled/uc_enabled
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1) the HuC is moved to "disabled" instead of "not supported"
2) the status is left uninitialized instead of "disabled" when the
modparam is used to disable support
3) due to #1, a number of checks are done against "disabled" instead of
the appropriate status.
Address all of those by making sure to follow the appropriate state
transition and checking against the required state.
v2: rebase on s/guc_submission_enabled/uc_enabled/
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/xe/uc: Rename guc_submission_enabled() to uc_enabled()
The guc_submission_enabled() function is being used as a boolean toggle
for all firmwares and all related features, not just GuC submission. We
could add additional flags/functions to distinguish and allow different
use-cases (e.g. loading HuC but not using GuC submission), but given
that not using GuC is a debug-only scenario having a global switch for
all FWs is enough. However, we want to make it clear that this switch
turns off everything, so rename it to uc_enabled().
v2: rebase on s/XE_WARN_ON/xe_assert
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
v2:
Store last known value when device is awake return that while the GT is
suspended and then update the driver copy when read during awake.
v3:
1. drop init_samples, as storing counters before going to suspend should
be sufficient.
2. ported the "drm/i915/pmu: Make PMU sample array two-dimensional" and
dropped helpers to store and read samples.
3. use xe_device_mem_access_get_if_ongoing to check if device is active
before reading the OA registers.
4. dropped format attr as no longer needed
5. introduce xe_pmu_suspend to call engine_group_busyness_store
6. few other nits.
v4: minor nits.
v5: take forcewake when accessing the OAG registers
v6:
1. drop engine_busyness_sample_type
2. update UAPI documentation
v7:
1. update UAPI documentation
2. drop MEDIA_GT specific change for media busyness counter.
drm/xe: Use spinlock in forcewake instead of mutex
In PMU we need to access certain registers which fall under GT power
domain for which we need to take forcewake. But as PMU being an atomic
context can't expect to have any sleeping calls.
drm/xe/guc: Switch to major-only GuC FW tracking for MTL
Newer HuC binaries for MTL (8.5.1+) require GuC 70.7 or newer, so we
need to move on from 70.6.4. Given that the MTL GuC uses major-only
version matching in i915, we can do the same here instead of just
bumping the version (and having to push the versioned binaries,
because they're not there already for i915).
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drm/xe: Use Xe assert macros instead of XE_WARN_ON macro
The XE_WARN_ON macro maps to WARN_ON which is not justified
in many cases where only a simple debug check is needed.
Replace the use of the XE_WARN_ON macro with the new xe_assert
macros which relies on drm_*. This takes a struct drm_device
argument, which is one of the main changes in this commit. The
other main change is that the condition is reversed, as with
XE_WARN_ON a message is displayed if the condition is true,
whereas with xe_assert it is if the condition is false.
v2:
- Rebase
- Keep WARN splats in xe_wopcm.c (Matt Roper)