Kai-Heng Feng [Wed, 15 Mar 2023 12:07:23 +0000 (20:07 +0800)]
drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi
S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is
caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default").
The root cause is still not clear for now.
So extend and apply the ASPM quirk from commit e02fe3bc7aba
("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to
work around the issue on Navi cards too.
Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Hung [Mon, 6 Mar 2023 03:39:51 +0000 (11:39 +0800)]
drm/amd/display: remove outdated 8bpc comments
[Why]
The commit c76e483cd916 ("drm/amd/display: Don't restrict bpc to 8 bpc")
removes the historical 8bpc dependency and sets max_bpc to 16.
[How]
The comment that states "8bpc for non-edp" needs to be removed as well.
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Saaem Rizvi [Mon, 6 Mar 2023 20:10:13 +0000 (15:10 -0500)]
drm/amd/display: Implement workaround for writing to OTG_PIXEL_RATE_DIV register
[Why and How]
Current implementation requires FPGA builds to take a different
code path from DCN32 to write to OTG_PIXEL_RATE_DIV. Now that
we have a workaround to write to OTG_PIXEL_RATE_DIV register without
blanking display on hotplug on DCN32, we can allow the code paths for
FPGA to be exactly the same allowing for more consistent
testing.
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Mon, 20 Mar 2023 09:33:47 +0000 (17:33 +0800)]
drm/amdgpu: Initialize umc ras callback
Fix a coding error that results in a null interrupt
handler for umc ras.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:17:17 +0000 (08:17 +0000)]
drm/amd/display/dc/link/link_detection: Demote a couple of kerneldoc abuses
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_detection.c:877: warning: Function parameter or member 'link' not described in 'detect_link_and_local_sink'
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_detection.c:877: warning: Function parameter or member 'reason' not described in 'detect_link_and_local_sink'
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_detection.c:1232: warning: Function parameter or member 'link' not described in 'dc_link_detect_connection_type'
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Lee Jones <lee@kernel.org> Cc: Wenjing Liu <wenjing.liu@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:17:16 +0000 (08:17 +0000)]
drm/amd/display/dc/dce60/Makefile: Fix previous attempt to silence known override-init warnings
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/../display/dc/dce60/dce60_resource.c:157:21: note: in expansion of macro ‘mmCRTC1_DCFE_MEM_LIGHT_SLEEP_CNTL’
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_transform.h:170:9: note: in expansion of macro ‘SRI’
drivers/gpu/drm/amd/amdgpu/../display/dc/dce60/dce60_resource.c:183:17: note: in expansion of macro ‘XFM_COMMON_REG_LIST_DCE60’
drivers/gpu/drm/amd/amdgpu/../display/dc/dce60/dce60_resource.c:188:17: note: in expansion of macro ‘transform_regs’
drivers/gpu/drm/amd/amdgpu/../include/asic_reg/dce/dce_6_0_d.h:722:43: warning: initialized field overwritten [-Woverride-init]
drivers/gpu/drm/amd/amdgpu/../display/dc/dce60/dce60_resource.c:157:21: note: in expansion of macro ‘mmCRTC2_DCFE_MEM_LIGHT_SLEEP_CNTL’
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_transform.h:170:9: note: in expansion of macro ‘SRI’
drivers/gpu/drm/amd/amdgpu/../display/dc/dce60/dce60_resource.c:183:17: note: in expansion of macro ‘XFM_COMMON_REG_LIST_DCE60’
drivers/gpu/drm/amd/amdgpu/../display/dc/dce60/dce60_resource.c:189:17: note: in expansion of macro ‘transform_regs’
drivers/gpu/drm/amd/amdgpu/../include/asic_reg/dce/dce_6_0_d.h:722:43: note: (near initialization for ‘xfm_regs[2].DCFE_MEM_LIGHT_SLEEP_CN
[100 lines snipped for brevity]
Fixes: ceb3cf476a441 ("drm/amd/display/dc/dce60/Makefile: Ignore -Woverride-init warning") Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Mauro Rossi <issor.oruam@gmail.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:2190: warning: Function parameter or member 'link' not described in 'dc_link_is_dp_sink_present'
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:17:13 +0000 (08:17 +0000)]
drm/amd/display/dc/link/protocols/link_dp_capability: Remove unused variable and mark another as __maybe_unused
‘ds_port’ is clearly not used anywhere and ‘result_write_min_hblank’ is
only utilised when debugging is enabled. The alternative would be to
allocate the variable under the same clause as the debugging code, but
that would become very messy, very quickly.
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c: In function ‘dp_wa_power_up_0010FA’:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:280:42: warning: variable ‘ds_port’ set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c: In function ‘dpcd_set_source_specific_data’:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:1296:32: warning: variable ‘result_write_min_hblank’ set but not used [-Wunused-but-set-variable]
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Wenjing Liu <wenjing.liu@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:17:11 +0000 (08:17 +0000)]
drm/amd/display/dc/link/protocols/link_dp_training: Remove set but unused variable 'result'
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_training.c: In function ‘perform_link_training_with_retries’:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_training.c:1586:38: warning: variable ‘result’ set but not used [-Wunused-but-set-variable]
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Wenjing Liu <wenjing.liu@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_detection.c: In function ‘query_hdcp_capability’:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_detection.c:501:42: warning: variable ‘status’ set but not used [-Wunused-but-set-variable]
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Wenjing Liu <wenjing.liu@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.c:257: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Zhang <dingchen.zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:17:01 +0000 (08:17 +0000)]
drm/amd/display/amdgpu_dm/amdgpu_dm_helpers: Move defines out to where they are actually used
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/../display/include/ddc_service_types.h: At top level:
drivers/gpu/drm/amd/amdgpu/../display/include/ddc_service_types.h:143:22:
warning: ‘SYNAPTICS_DEVICE_ID’ defined but not used [-Wunused-const-variable=]
drivers/gpu/drm/amd/amdgpu/../display/include/ddc_service_types.h:140:22:
warning: ‘DP_VGA_LVDS_CONVERTER_ID_3’ defined but not used [-Wunused-const-variable=]
drivers/gpu/drm/amd/amdgpu/../display/include/ddc_service_types.h:138:22:
warning: ‘DP_VGA_LVDS_CONVERTER_ID_2’ defined but not used [-Wunused-const-variable=]
drivers/gpu/drm/amd/amdgpu/../display/include/ddc_service_types.h:133:22:
warning: ‘DP_SINK_DEVICE_STR_ID_2’ defined but not used [-Wunused-const-variable=]
drivers/gpu/drm/amd/amdgpu/../display/include/ddc_service_types.h:132:22:
warning: ‘DP_SINK_DEVICE_STR_ID_1’ defined but not used [-Wunused-const-variable=]
[400 similar lines snipped for brevity]
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:17:00 +0000 (08:17 +0000)]
drm/amd/pm/swsmu/smu11/vangogh_ppt: Provide a couple of missing parameter descriptions
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/vangogh_ppt.c:2381: warning: Function parameter or member 'residency' not described in 'vangogh_get_gfxoff_residency'
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/vangogh_ppt.c:2399: warning: Function parameter or member 'entrycount' not described in 'vangogh_get_gfxoff_entrycount'
Cc: Evan Quan <evan.quan@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Li Ma <li.ma@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:16:58 +0000 (08:16 +0000)]
drm/amd/amdgpu/amdgpu_mes: Ensure amdgpu_bo_create_kernel()'s return value is checked
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c: In function ‘amdgpu_mes_ctx_alloc_meta_data’:
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1099:13: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]
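The warning above fires when a return code is stored but never inspected. A minimal userspace sketch of the pattern follows. All names here (create_buffer_sketch, ctx_sketch, ctx_alloc_meta_data_sketch) are hypothetical stand-ins and not the driver's functions; the only assumption carried over is the common kernel convention of returning 0 on success and a negative errno on failure.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator: returns 0 on success, negative errno on failure. */
static int create_buffer_sketch(size_t size, void **out)
{
	*out = size ? malloc(size) : NULL;
	return *out ? 0 : -ENOMEM;
}

struct ctx_sketch {
	void *meta_data;
};

/* Checks the return value instead of leaving 'r' set but unused. */
static int ctx_alloc_meta_data_sketch(struct ctx_sketch *ctx, size_t size)
{
	int r;

	r = create_buffer_sketch(size, &ctx->meta_data);
	if (r)
		return r;	/* propagate the failure to the caller */

	return 0;
}
```

Propagating the code lets the caller decide how to recover, and it also removes the -Wunused-but-set-variable warning as a side effect.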
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Jack Xiao <Jack.Xiao@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:16:57 +0000 (08:16 +0000)]
drm/amd/amdgpu/ih_v6_0: Repair misspelling and provide descriptions for 'ih'
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/ih_v6_0.c:392: warning: Function parameter or member 'ih' not described in 'ih_v6_0_get_wptr'
drivers/gpu/drm/amd/amdgpu/ih_v6_0.c:432: warning: Function parameter or member 'ih' not described in 'ih_v6_0_irq_rearm'
drivers/gpu/drm/amd/amdgpu/ih_v6_0.c:458: warning: Function parameter or member 'ih' not described in 'ih_v6_0_set_rptr'
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:16:56 +0000 (08:16 +0000)]
drm/amd/amdgpu/gmc_v11_0: Provide a few missing param descriptions relating to hubs and flushes
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c:282: warning: Function parameter or member 'vmhub' not described in 'gmc_v11_0_flush_gpu_tlb'
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c:282: warning: Function parameter or member 'flush_type' not described in 'gmc_v11_0_flush_gpu_tlb'
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c:322: warning: Function parameter or member 'flush_type' not described in 'gmc_v11_0_flush_gpu_tlb_pasid'
drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c:322: warning: Function parameter or member 'all_hub' not described in 'gmc_v11_0_flush_gpu_tlb_pasid'
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lee Jones [Fri, 17 Mar 2023 08:16:42 +0000 (08:16 +0000)]
drm/amd/display/dc/dc_hdmi_types: Move string definition to the only file it's used in
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_hdmi_types.h:53:22:
warning: ‘dp_hdmi_dongle_signature_str’ defined but not used [-Wunused-const-variable=]
[snipped 400 similar lines for brevity]
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Wenjing Liu <wenjing.liu@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jane Jian [Fri, 8 Jul 2022 10:07:38 +0000 (18:07 +0800)]
drm/amdgpu/jpeg: enable jpeg v4_0 for sriov
- skip direct jpeg register reads and writes since they are not allowed
- reset Doorbell range layout for sriov
Signed-off-by: Jane Jian <Jane.Jian@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1) Turn the connector-type + signal check into an early exit
condition to reduce the indentation level of the rest of the code
2) Add an array bounds check for the arrays indexed by dm->num_of_edps
3) register_backlight_device() always increases dm->num_of_edps if
amdgpu_dm_register_backlight_device() has assigned a backlight_dev to
the current dm->backlight_link[dm->num_of_edps] slot.
So on its next call dm->backlight_dev[dm->num_of_edps] always points to
the next empty slot and the "if (!dm->backlight_dev[dm->num_of_edps])"
check will thus always succeed and can be removed.
4) Add a bl_idx local variable to use as array index, rather than
using dm->num_of_edps, to improve the code readability.
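The four points above can be sketched in plain C as follows. This is an illustration only: struct dm_sketch, MAX_EDPS_SKETCH and register_backlight_sketch are hypothetical names, the array size is arbitrary, and the real driver's types and registration call are not reproduced.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDPS_SKETCH 2	/* hypothetical array size */

struct dm_sketch {
	int num_of_edps;
	const void *backlight_link[MAX_EDPS_SKETCH];
	const void *backlight_dev[MAX_EDPS_SKETCH];
};

static bool register_backlight_sketch(struct dm_sketch *dm,
				      bool is_backlight_connector,
				      const void *link, const void *dev)
{
	int bl_idx;

	/* 1) early exit instead of wrapping the body in an if block */
	if (!is_backlight_connector)
		return false;

	/* 4) local index for readability */
	bl_idx = dm->num_of_edps;

	/* 2) bounds check before touching the arrays */
	if (bl_idx >= MAX_EDPS_SKETCH)
		return false;

	/* registration failed: leave the slot empty, do not advance */
	if (!dev)
		return false;

	/* 3) the slot at bl_idx is always the next empty one */
	dm->backlight_link[bl_idx] = link;
	dm->backlight_dev[bl_idx] = dev;
	dm->num_of_edps++;
	return true;
}
```

Because the counter only advances after a successful assignment, the slot it names is empty on every call, which is why a separate "is this slot free" check is redundant.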
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
backlight_device_register() returns an ERR_PTR on error, but other code
such as amdgpu_dm_connector_destroy() assumes dm->backlight_dev[i] is NULL
if no backlight is registered.
Clear dm->backlight_dev[i] on registration failure, to avoid other code
trying to deref an ERR_PTR pointer.
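A userspace sketch of the hazard, assuming simplified stand-ins for the kernel's ERR_PTR helpers: an error pointer is non-NULL, so a later "if (ptr)" test treats it as valid. The helper definitions, struct backlight_sketch, fake_backlight_register and register_slot are all hypothetical and written for illustration; they are not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's err.h helpers (assumption). */
#define MAX_ERRNO_SKETCH 4095

static inline void *ERR_PTR(intptr_t error)
{
	return (void *)error;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

static inline intptr_t PTR_ERR(const void *ptr)
{
	return (intptr_t)ptr;
}

struct backlight_sketch {
	int brightness;
};

/* Hypothetical registration call: returns an error pointer on failure. */
static struct backlight_sketch *
fake_backlight_register(struct backlight_sketch *storage, int fail)
{
	if (fail)
		return ERR_PTR(-12);	/* -ENOMEM */
	return storage;
}

/* Store the result, but never leave an error pointer in the slot. */
static intptr_t register_slot(struct backlight_sketch **slot,
			      struct backlight_sketch *storage, int fail)
{
	*slot = fake_backlight_register(storage, fail);
	if (IS_ERR(*slot)) {
		intptr_t err = PTR_ERR(*slot);

		*slot = NULL;	/* cleanup code only tests for NULL */
		return err;
	}
	return 0;
}
```

With the slot cleared on failure, cleanup paths that only test for NULL stay correct without having to learn about error pointers.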
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
YuBiao Wang [Thu, 16 Mar 2023 03:30:32 +0000 (11:30 +0800)]
drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
[Why]
For engines not supporting soft reset, e.g. VCN, there will be a failed
ib test before mode 1 reset during asic reset. The fences in this case
are never signaled, and the next time we try to free the sa_bo, the kernel
will hang.
[How]
During pre_asic_reset, driver will clear job fences and afterwards the
fences' refcount will be reduced to 1. For drm_sched_jobs it will be
released in job_free_cb, and for non-sched jobs like ib_test, it's meant
to be released in sa_bo_free but only when the fences are signaled. So
we have to force signal the non_sched bad job's fence during
pre_asic_reset or the clear is not complete.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tong Liu01 [Wed, 15 Mar 2023 07:24:22 +0000 (15:24 +0800)]
drm/amdgpu: add mes resume when do gfx post soft reset
[why]
When gfx does a soft reset, mes is reset as well. If mes is not
resumed while recovering from the soft reset, mes is unable to respond
in the later sequence.
[how]
Resume mes in the gfx post soft reset step.
Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tim Huang [Thu, 9 Mar 2023 08:27:51 +0000 (16:27 +0800)]
drm/amdgpu: skip ASIC reset for APUs when go to S4
For GC IP v11.0.4/11, the PSP TMR needs to be reserved
for ASIC mode2 reset. But for S4, PSP suspend
destroys the TMR, which makes the ASIC reset fail.
We can skip the reset on APUs, assuming we can resume them
properly. Verified on some GFX11, GFX10 and old GFX9 APUs.
Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tim Huang [Wed, 15 Mar 2023 07:52:09 +0000 (15:52 +0800)]
drm/amdgpu: reposition the gpu reset checking for reuse
Move amdgpu_acpi_should_gpu_reset out of
CONFIG_SUSPEND to share it with the hibernate case.
Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 15 Mar 2023 17:51:43 +0000 (13:51 -0400)]
drm/amdgpu: drop the extra sign extension
amdgpu_bo_gpu_offset_no_check() already calls
amdgpu_gmc_sign_extend() so no need to call it twice.
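Dropping the second call is safe because sign extension is idempotent. The sketch below assumes a 48-bit address space with bit 47 as the sign bit; the constants and the name sign_extend48_sketch are illustrative assumptions, not taken from the driver.

```c
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, so bit 47 acts as the sign bit. */
#define HOLE_START_SKETCH 0x0000800000000000ULL
#define HOLE_END_SKETCH   0xffff800000000000ULL

/* Fill bits 63..47 with ones when the address is in the upper half. */
static uint64_t sign_extend48_sketch(uint64_t addr)
{
	if (addr >= HOLE_START_SKETCH)
		addr |= HOLE_END_SKETCH;
	return addr;
}
```

An already extended address is still at or above the hole start, and OR-ing in the same mask again changes nothing, so a second call only costs cycles.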
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Wed, 22 Mar 2023 00:35:45 +0000 (10:35 +1000)]
Merge tag 'drm-habanalabs-next-2023-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains habanalabs driver and accel changes for v6.4:
- uAPI changes:
- Add opcodes to the CS ioctl to allow user to stall/resume specific engines
inside Gaudi2. This is to allow the user to perform power
testing/measurements when training different topologies.
- Expose in the INFO ioctl the amount of device memory that the driver
and f/w reserve for themselves.
- Expose in the INFO ioctl a bit-mask of the available rotator engines
in Gaudi2. This is to align with other engines that are already exposed.
- Expose in the INFO ioctl the register's address of the f/w that should
be used to trigger interrupts from within the user's code running in the
compute engines.
- Add a critical-event bit in the eventfd bitmask so the user will know the
event that was received was critical, and a reset will now occur
- Expose in the INFO ioctl two new opcodes to fetch information on h/w and
f/w events. The events recorded are the events that were reported in the
eventfd.
- New features and improvements:
- Add a dedicated interrupt ID in MSI-X in the device to the notification of
an unexpected user-related event in Gaudi2. Handle it in the driver by
reporting this event.
- Allow the user to fetch the device memory current usage even when the
device is undergoing compute-reset (a reset type that only clears the
compute engines).
- Enable graceful reset mechanism for compute-reset. This will give the
user a few seconds before the device is reset. For example, the user can,
during that time, perform certain device operations (dump data for debug)
or close the device in an orderly fashion.
- Align the decoder with the rest of the engines in regard to notification
to the user about interrupts and in regard to performing graceful reset
when needed (instead of immediate reset).
- Add support for assert interrupt from the TPC engine.
- Get the reset type that is necessary to perform per event from the
auto-generated irq_map array.
- Print the specific reason why a device is still in use when notifying to
the user about it (after the user closed the device's FD).
- Move to threaded IRQ when handling interrupts of workload completions.
- Firmware related fixes:
- Fix RAZWI event handler to match newest f/w version.
- Read error cause register in dma core events because the f/w doesn't
do that.
- Increase maximum time to wait for completion of Gaudi2 reset due to f/w
bug.
- Align to the latest firmware specs.
- Enforce the release order of the compute device and dma-buf.
i.e., increment the device file refcount for any dma-buf that was exported
for that device. This will make sure the compute device release function
won't be called until the user closes all the FDs of the relevant
dma-bufs. Without this change, closing the device's FD before/without
closing the dma-buf's FD would always lead to hard-reset of the device.
- Fix a link in the drm documentation to correctly point to the accel section.
- Compilation warnings cleanups
- Misc bug fixes and code cleanups
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Oded Gabbay <ogabbay@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230320154026.GA766126@ogabbay-vm-u20.habana-labs.com
Dave Airlie [Tue, 21 Mar 2023 20:49:01 +0000 (06:49 +1000)]
Merge tag 'drm-intel-gt-next-2023-03-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver Changes:
- Fix issue #6333: "list_add corruption" and full system lockup from
performance monitoring (Janusz)
- Give the punit time to settle before fatally failing (Aravind, Chris)
- Don't use stolen memory or BAR for ring buffers on LLC platforms (John)
- Add missing ecodes and correct timeline seqno on GuC error captures (John)
- Make sure DSM size has correct 1MiB granularity on Gen12+ (Nirmoy,
Lucas)
- Fix potential SSEU max_subslices array-index-out-of-bounds access on Gen11 (Andrea)
- Whitelist COMMON_SLICE_CHICKEN3 for UMD access on Gen12+ (Matt R.)
- Apply Wa_1408615072/Wa_1407596294 correctly on Gen11 (Matt R)
- Apply LNCF/LBCF workarounds correctly on XeHP SDV/PVC/DG2 (Matt R)
- Implement Wa_1606376872 for Xe_LP (Gustavo)
- Consider GSI offset when doing MCR lookups on Meteorlake+ (Matt R.)
- Add engine TLB invalidation for Meteorlake (Matt R.)
- Fix GSC Driver-FLR completion on Meteorlake (Alan)
- Fix GSC races on driver load/unload on Meteorlake+ (Daniele)
- Disable MC6 for MTL A step (Badal)
- Consolidate TLB invalidation flow (Tvrtko)
- Improve GuC/HuC debug messages (Michal Wa., John)
- Move fd_install after last use of fence (Rob)
- Initialize the obj flags for shmem objects (Aravind)
- Fix missing debug object activation (Nirmoy)
- Probe lmem before the stolen portion (Matt A)
- Improve clean up of GuC busyness stats worker (John)
- Fix missing return code checks in GuC submission init (John)
- Annotate two more workaround/tuning registers as MCR on PVC (Matt R)
- Fix GEN8_MISCCPCTL definition and remove unused INF_UNIT_LEVEL_CLKGATE (Lucas)
- Use sysfs_emit() and sysfs_emit_at() (Nirmoy)
- Make kobj_type structures constant (Thomas W.)
- make kobj attributes const on gt/ (Jani)
- Remove the unused virtualized start hack on buddy allocator (Matt A)
- Remove redundant check for DG1 (Lucas)
- Move DG2 tuning to the right function (Lucas)
- Rename dev_priv to i915 for private data naming consistency in gt/ (Andi)
- Remove unnecessary whitelisting of CS_CTX_TIMESTAMP on Xe_HP platforms (Matt R.)
-
Dave Airlie [Tue, 21 Mar 2023 01:03:16 +0000 (11:03 +1000)]
Merge tag 'drm-misc-next-2023-03-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.4-rc1:
Cross-subsystem Changes:
- Add drm_bridge.h to drm_bridge maintainers.
Core Changes:
- Assorted fixes to TTM, tests, format-helper, accel.
- Assorted Makefile fixes to drivers and accel.
- Implement fbdev emulation for GEM DMA drivers, and convert a lot of
drivers to use it.
- Use tgid instead of pid for tracking clients.
Driver Changes:
- Assorted fixes in rockchip, vmwgfx, nouveau, cirrus.
- Add imx25 driver.
- Add Elida KD50T048A, Sony TD4353, Novatek NT36523, STARRY 2081101QFH032011-53G panels.
- Add 4K mode support to rockchip.
- Convert cirrus to use regular atomic helpers, and more cirrus
improvements.
- Add damage clipping to cirrus, virtio.
Dani Liberman [Thu, 16 Mar 2023 13:03:12 +0000 (15:03 +0200)]
accel/habanalabs: change razwi handle after fw fix
FW had one data route for tpc0 and tpc1 when running in secured mode
and a different one when running without secured mode. After fw fixed
this issue, both modes have the same data path.
Ofir Bitton [Wed, 8 Mar 2023 11:34:52 +0000 (13:34 +0200)]
accel/habanalabs: add handling for unexpected user event
In order for the user to be aware of unexpected events in Gaudi2 that
aren't assigned to a specific engine, we are adding the handling of
this dedicated interrupt.
Bagas Sanjaya [Tue, 7 Mar 2023 04:35:26 +0000 (11:35 +0700)]
accel: Link to compute accelerator subsystem intro
Commit 2c204f3d53218d ("accel: add dedicated minor for accelerator
devices") adds link to accelerator nodes section of DRM internals doc
(Documentation/gpu/drm-internals.rst), but the target doesn't exist.
Instead, there is only an introduction doc for the compute accelerator
subsystem.
Link to that doc until there is documentation of accelerator internals.
Fixes: 2c204f3d53218d ("accel: add dedicated minor for accelerator devices") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Tomer Tayar [Sun, 12 Mar 2023 15:15:03 +0000 (17:15 +0200)]
accel/habanalabs: remove '\n' when passing strings to gaudi2_print_event()
Remove all '\n' from strings which are passed as arguments to
gaudi2_print_event(), because the newline character is added internally
in this function.
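A small sketch of why the caller-side '\n' is redundant when the print helper appends its own. print_event_sketch and trailing_newlines are hypothetical helpers written for this illustration, and the "event %u:" prefix is an assumed format; gaudi2_print_event() itself is not reproduced here.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Count the newline characters at the end of a string. */
static size_t trailing_newlines(const char *s)
{
	size_t len = strlen(s), n = 0;

	while (n < len && s[len - 1 - n] == '\n')
		n++;
	return n;
}

/* Hypothetical print helper that appends the newline internally. */
static int print_event_sketch(char *out, size_t out_len,
			      unsigned int event_type, const char *fmt, ...)
{
	char msg[128];
	va_list args;

	va_start(args, fmt);
	vsnprintf(msg, sizeof(msg), fmt, args);
	va_end(args);

	return snprintf(out, out_len, "event %u: %s\n", event_type, msg);
}
```

Passing a format that already ends in '\n' to such a helper yields two trailing newlines, which shows up as a blank line in the log.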
Koby Elbaz [Tue, 7 Mar 2023 08:13:44 +0000 (10:13 +0200)]
accel/habanalabs: return tlb inv error code upon failure
Now that CQ-completion based jobs do not trigger a reset upon failure,
failure of such jobs (e.g., MMU cache invalidation) should be handled
by the caller itself depending on the error code returned to it.
- remove reset_sleep_ms arg from functions that don't use it.
- move the msleep(reset_sleep_ms) call from btm poll to gaudi2_hw_fini,
as it is already called from there for another flow.
Dafna Hirschfeld [Mon, 20 Feb 2023 08:09:25 +0000 (10:09 +0200)]
accel/habanalabs: in hw_fini return error code if polling timed-out
In hw_fini callback, we use either the cpucp packet method or polling a
register. Currently we return error only in the case of cpucp packet
failure. In this patch we also return error if polling timed out.
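The change described above can be sketched as follows, with both paths reporting failure to the caller. The device model, the poll budget expressed in iterations rather than time, and every name here (fake_dev, fake_read_status, poll_reset_done, hw_fini_sketch) are hypothetical and only illustrate the control flow.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical device: status register flips to 1 after N reads. */
struct fake_dev {
	uint32_t status;
	unsigned int reads_until_done;	/* 0 means "never completes" */
};

static uint32_t fake_read_status(struct fake_dev *dev)
{
	if (dev->reads_until_done > 0 && --dev->reads_until_done == 0)
		dev->status = 1;
	return dev->status;
}

/* Poll up to max_polls times; report a timeout instead of ignoring it. */
static int poll_reset_done(struct fake_dev *dev, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (fake_read_status(dev) == 1)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Both completion methods now feed their result back to the caller. */
static int hw_fini_sketch(struct fake_dev *dev, int use_fw_packet,
			  int fw_packet_rc, unsigned int max_polls)
{
	if (use_fw_packet)
		return fw_packet_rc;

	return poll_reset_done(dev, max_polls);
}
```

Returning the timeout lets the caller treat a reset that never completed the same way it treats a failed firmware packet.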
Koby Elbaz [Mon, 6 Mar 2023 14:43:41 +0000 (16:43 +0200)]
accel/habanalabs: do not verify engine modes after being changed
Engines' idle state can't always be verified between changes of
engine modes (e.g., stall/halt).
For example, if a CS is in flight when an engine's mode is altered,
the idle check will always return NOT idle.
Dave Airlie [Mon, 20 Mar 2023 06:44:36 +0000 (16:44 +1000)]
Merge tag 'amd-drm-next-6.4-2023-03-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.4-2023-03-17:
amdgpu:
- Misc code cleanups
- Documentation fixes
- Make kobj structures const
- Add thermal throttling adjustments for supported APUs
- UMC RAS fixes
- Display reset fixes
- DCN 3.2 fixes
- Freesync fixes
- DC code reorg
- Generalize dmabuf import to work with KFD
- DC DML fixes
- SRIOV fixes
- UVD code cleanups
- IH 4.4.2 updates
- HDP 4.4.2 updates
- SDMA 4.4.2 updates
- PSP 13.0.6 updates
- Add capped/uncapped workload handling for supported APUs
- DCN 3.1.4 updates
- Re-org DC Kconfig
- USB4 fixes
- Reorg DC plane and stream handling
- Register vga_switcheroo for apple-gmux
- SMU 13.0.6 updates
- Fix error checking in read_mm_registers functions for affected families
- VCN 4.0.4 fix
- Drop redundant pci_enable_pcie_error_reporting() call
- RDNA2 SMU OD suspend/resume fix
- Expose additional memory stats via fdinfo
- RAS fixes
- Misc display fixes
- DP MST fixes
- IOMMU regression fix for KFD
amdkfd:
- Make kobj structures const
- Support for exporting buffers via dmabuf
- Multi-VMA page migration fixes
- NBIO fixes
- Misc code cleanups
- Fix possible double free
- Fix possible UAF
radeon:
- iMac fix
UAPI:
- KFD dmabuf export support. Required for importing KFD buffers into GEM contexts and for RDMA P2P support.
Proposed user mode changes: https://github.com/fxkamd/ROCT-Thunk-Interface/commits/fxkamd/dmabuf
Tom Rix [Fri, 3 Mar 2023 13:27:31 +0000 (08:27 -0500)]
drm/nouveau/fifo: set gf100_fifo_nonstall_block_dump storage-class-specifier to static
gcc with W=1 reports
drivers/gpu/drm/nouveau/nvkm/engine/fifo/gf100.c:451:1: error:
no previous prototype for ‘gf100_fifo_nonstall_block’ [-Werror=missing-prototypes]
451 | gf100_fifo_nonstall_block(struct nvkm_event *event, int type, int index)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
gf100_fifo_nonstall_block is only used in gf100.c, so it should be static
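As a minimal illustration of the fix pattern (the function below is hypothetical, not the nouveau code): gcc's -Wmissing-prototypes warns when a global function is defined without a previous prototype, so giving a file-local helper internal linkage with static removes the warning and keeps the symbol out of the global namespace.

```c
/*
 * Hypothetical file-local helper. Without 'static', gcc -Wmissing-prototypes
 * would warn here because no earlier declaration of the function is visible.
 * With 'static' the function has internal linkage and needs no prototype.
 */
static int nonstall_block_sketch(int type, int index)
{
	/* Placeholder body; the arithmetic only exists so the function does something. */
	return type * 16 + index;
}
```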
Felix Kuehling [Tue, 14 Mar 2023 00:03:08 +0000 (20:03 -0400)]
drm/amdgpu: Don't resume IOMMU after incomplete init
Check kfd->init_complete in kgd2kfd_iommu_resume, consistent with other
kgd2kfd calls. This should fix IOMMU errors on resume from suspend when
KFD IOMMU initialization failed.
Hawking Zhang [Sat, 4 Mar 2023 12:22:23 +0000 (20:22 +0800)]
drm/amdgpu: drop ras check at asic level for new blocks
amdgpu_ras_register_ras_block should always be invoked
by ras_sw_init, where the driver needs to check ras caps
at the ip level instead of the asic level.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Mon, 13 Mar 2023 06:18:34 +0000 (14:18 +0800)]
drm/amdgpu: Rework pcie_bif ras sw_init
The pcie_bif ras block needs to be initialized as early
as possible to handle fatal errors detected in the hw_init
phase. Also align the pcie_bif ras sw_init with other
ras blocks.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Sat, 4 Mar 2023 11:54:14 +0000 (19:54 +0800)]
drm/amdgpu: Rework xgmi_wafl_pcs ras sw_init
To align with other IP blocks.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Wed, 15 Mar 2023 00:59:04 +0000 (08:59 +0800)]
drm/amdgpu: Rework mca ras sw_init
To align with other IP blocks
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Belanger [Tue, 28 Feb 2023 19:11:24 +0000 (14:11 -0500)]
drm/amdkfd: Fixed kfd_process cleanup on module exit.
Handle case when module is unloaded (kfd_exit) before a process space
(mm_struct) is released.
v2: Fixed potential race conditions by removing all kfd_process from
the process table first, then working on releasing the resources.
v3: Fixed loop element access / synchronization. Fixed extra empty lines.
Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aric Cyr [Mon, 6 Mar 2023 01:48:26 +0000 (20:48 -0500)]
drm/amd/display: 3.2.227
This version brings along the following:
- FW Release 0.0.158.0
- Fixes to HDCP, DP MST and more
- Improvements on USB4 links and more
- Code re-architecture on link.h
Reviewed-by: Aric Cyr <aric.cyr@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Fri, 3 Mar 2023 22:30:25 +0000 (17:30 -0500)]
drm/amd/display: fix assert condition
[Why & How]
Reversed assert condition when checking that phy_pix_clk[] is not 0
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stylon Wang [Wed, 1 Mar 2023 15:56:51 +0000 (23:56 +0800)]
drm/amd/display: Clearly states if long or short HPD event in dmesg logs
[Why]
The log "DMUB HPD callback" is crucial for identifying when DP tunneling
has been established and the driver is notified of this event by DMUB.
The same log is shared by long and short hotplug events, so we need to
check the trailing DC debug log to distinguish between the two, which
makes debugging DPIA-related issues more troublesome.
[How]
Clearly state in dmesg logs whether this is a long or short hotplug
event.
Reviewed-by: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Stylon Wang <stylon.wang@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wesley Chalmers [Mon, 27 Feb 2023 18:21:17 +0000 (13:21 -0500)]
drm/amd/display: Make DCN32 functions available to future DCNs
[Why & How]
Make DCN32 functions available for more DCNs.
Reviewed-by: Chris Park <Chris.Park@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yifan Zha [Mon, 6 Mar 2023 06:54:05 +0000 (14:54 +0800)]
drm/amdgpu: Init MMVM_CONTEXTS_DISABLE in gmc11 golden setting under SRIOV
[Why]
If the mmhub vm contexts are disabled (MMVM_CONTEXTS_DISABLE set to 0xffff),
driver loading fails on the vf because the fence fallback timer expires on
all rings. FLR cannot reset MMVM_CONTEXTS_DISABLE,
so this vf cannot be recovered unless a whole gpu reset is triggered.
[How]
Under SRIOV, init MMVM_CONTEXTS_DISABLE in gmc11 golden register setting.
Samson Tam [Tue, 28 Feb 2023 19:33:00 +0000 (14:33 -0500)]
drm/amd/display: reallocate DET for dual displays with high pixel rate ratio
[Why]
For dual displays where pixel rate is much higher on one display,
we may get underflow when DET is evenly allocated.
[How]
Allocate fewer DET segments for the lower pixel rate display and
more DET segments for the higher pixel rate display.
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
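One way to picture the reallocation is a split proportional to pixel rate. The sketch below is an assumption made for illustration only: the commit message does not say how the driver picks the segment counts, and det_split, split_det_segments() and the one-segment minimum are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type: DET segments assigned to each of two displays. */
struct det_split {
	unsigned int seg_a;
	unsigned int seg_b;
};

/*
 * Split 'total' segments between two displays in proportion to their
 * pixel rates, keeping at least one segment per display.
 * Falls back to an even split when there is nothing to weigh.
 */
static struct det_split split_det_segments(unsigned int total,
					   uint64_t rate_a, uint64_t rate_b)
{
	struct det_split s;
	uint64_t sum = rate_a + rate_b;

	if (total < 2 || sum == 0) {
		s.seg_a = total / 2;
		s.seg_b = total - s.seg_a;
		return s;
	}

	/* Integer division rounds the share of display A down. */
	s.seg_a = (unsigned int)((total * rate_a) / sum);
	if (s.seg_a < 1)
		s.seg_a = 1;
	if (s.seg_a > total - 1)
		s.seg_a = total - 1;
	s.seg_b = total - s.seg_a;
	return s;
}
```

With a hypothetical budget of 18 segments and pixel rates of 600000 and 100000, this gives 15 and 3 segments instead of the even 9 and 9 that the commit message identifies as an underflow risk.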
Cruise Hung [Thu, 2 Mar 2023 02:33:51 +0000 (10:33 +0800)]
drm/amd/display: Fix DP MST sinks removal issue
[Why]
In USB4 DP tunneling, the path can become unavailable while the CM
tears it down a little late.
In this case, HPD is high but reading any DPCD register fails.
That causes the link connection type to be set to SST,
and not all sinks behind the MST branch are removed.
[How]
Restore the link connection type if reading the DPCD register fails.
Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
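The save-and-restore idea in [How] can be sketched as follows. This is a simplified model rather than the display driver's code: conn_type_sketch, link_sketch and redetect_link_sketch() are hypothetical, and the DPCD read result is passed in as a flag.

```c
#include <stdbool.h>

/* Hypothetical connection types for the sketch. */
enum conn_type_sketch { CONN_NONE, CONN_SST, CONN_MST };

struct link_sketch {
	enum conn_type_sketch type;
};

/*
 * Re-detect a link. Detection tentatively marks the link as SST; if the
 * DPCD read failed, the previous type is restored so that an existing
 * MST topology is still torn down through the MST path.
 * Returns true when detection completed, false when it was rolled back.
 */
static bool redetect_link_sketch(struct link_sketch *link, bool dpcd_read_ok,
				 enum conn_type_sketch detected)
{
	enum conn_type_sketch prev = link->type;

	link->type = CONN_SST;

	if (!dpcd_read_ok) {
		/* Nothing was learned about the sink: keep the previous type. */
		link->type = prev;
		return false;
	}

	link->type = detected;
	return true;
}
```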
Tvrtko Ursulin [Tue, 14 Mar 2023 14:18:55 +0000 (14:18 +0000)]
drm: Track clients by tgid and not tid
Thread group id (aka pid from userspace point of view) is a more
interesting thing to show as an owner of a DRM fd, so track and show that
instead of the thread id.
In the next patch we will make the owner updated post file descriptor
handover, which will also be tgid based to avoid ping-pong when multiple
threads access the fd.
Bjorn Helgaas [Tue, 7 Mar 2023 20:27:29 +0000 (14:27 -0600)]
accel/habanalabs: Drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since
commit f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"),
the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Tomer Tayar [Wed, 1 Mar 2023 15:45:58 +0000 (17:45 +0200)]
accel/habanalabs: postpone mem_mgr IDR destruction to hpriv_release()
The memory manager IDR is currently destroyed when user releases the
file descriptor.
However, at this point the user context might be still held, and memory
buffers might be still in use.
Later on, calls to release those buffers will fail due to not finding
their handles in the IDR, leading to a memory leak.
To avoid this leak, split the IDR destruction from the memory manager
fini, and postpone it to hpriv_release() when there is no user context
and no buffers are used.
Dafna Hirschfeld [Mon, 20 Feb 2023 05:54:44 +0000 (07:54 +0200)]
accel/habanalabs: move soft-reset wait to soft-reset execute
We plan to do soft-reset either by mmio or by using a cpucp packet,
depending on the FW version. We don't want to check the FW version in two
different places for that (executing soft-reset and waiting for soft-reset),
so move the waiting to gaudi2_execute_soft_reset. This also makes sense
because the cpucp also does the waiting.
Koby Elbaz [Wed, 15 Feb 2023 15:51:14 +0000 (17:51 +0200)]
accel/habanalabs: add uapi to stall/resume engine
The user might want to stall/resume engines to perform power testing
for various scenarios. Because our current
HL_CS_FLAGS_ENGINE_CORE_COMMAND command only handles the engines' cores,
we need to add another opcode for handling entire engine and not just
its core.
The user supplies an array, where each entry holds the engine's ID and
the command to send to the engine. The size of the array is limited
by the number of engines in the ASIC (only Gaudi2 is currently
supported).
Tomer Tayar [Fri, 17 Feb 2023 10:56:48 +0000 (12:56 +0200)]
accel/habanalabs: use scnprintf() in print_device_in_use_info()
compose_device_in_use_info() was added to handle the snprintf() return
value in a single place.
However, the buffer size in print_device_in_use_info() is set such that
it would be enough for the max possible print, so
compose_device_in_use_info() is not really needed.
Moreover, scnprintf() can be used instead of snprintf(), which removes the
need to check whether the return value is larger than the given size.
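The difference in return values can be shown with a userspace sketch. The kernel's scnprintf() is not available outside the kernel, so scnprintf_sketch() below is a hypothetical wrapper around the standard vsnprintf() that mimics its documented behavior: it returns the number of characters actually written to the buffer, excluding the trailing NUL, rather than the length the full output would have had.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace approximation of the kernel's scnprintf():
 * returns the number of characters stored in buf (without the NUL),
 * which is never larger than size - 1.
 */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;		/* encoding error: nothing usable written */
	if ((size_t)n < size)
		return n;		/* output fit completely */
	return (int)(size - 1);		/* output was truncated */
}
```

With an 8-byte buffer and a 10-character string, snprintf() returns 10 and the caller has to clamp it before using it as an offset, while the wrapper returns 7, which can be added to a write position directly.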
Koby Elbaz [Thu, 23 Feb 2023 16:17:02 +0000 (18:17 +0200)]
accel/habanalabs: use a mutex rather than a spinlock
There are two reasons why a mutex is better here:
1. There's a relatively long critical section, where in
certain scenarios (e.g., multiple VM allocations) taking a spinlock
might cause noticeable performance degradation.
2. It removes the incorrect usage of a mutex under
spin_lock (where preemption is disabled).
Koby Elbaz [Tue, 21 Feb 2023 12:21:39 +0000 (14:21 +0200)]
accel/habanalabs: fix register address on PDMA/EDMA idle check
The PDMA/EDMA is_idle routines didn't check the correct CORE register
in order to get the accurate idle state.
Moreover, it's better to make the is_idle routine more robust by adding
additional checks (IS_HALTED) before announcing that the core is idle.
Koby Elbaz [Sun, 26 Feb 2023 06:22:45 +0000 (08:22 +0200)]
accel/habanalabs: remove a useless is_idle TPC flag
It appears that the flag
DCORE0_TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK has no actual use when
it comes to querying TPC idleness, since this flag's corresponding bit
turns off after stalling the engine and turns back on after resuming
it.
Koby Elbaz [Thu, 23 Feb 2023 08:43:14 +0000 (10:43 +0200)]
accel/habanalabs: verify return code after scrubbing ARCs DCCMs
In case the KDMA fails scrubbing the DCCMs (following a soft-reset
upon device release), the driver will only print failure until reset
flow ends, rather than escalating it into a hard-reset.
Dafna Hirschfeld [Wed, 15 Feb 2023 10:15:57 +0000 (12:15 +0200)]
accel/habanalabs: assert return value of hw_fini
Since hw_fini returns an error code to indicate failure, we should
check its return value. Currently it might only fail upon soft-reset
from hl_device_reset. A later patch will add a hw_fini failure in case of
a polling timeout in hard-reset.