git.proxmox.com Git - mirror_ubuntu-hirsute-kernel.git/log

drm/amd/powerplay: cleanup the interfaces for powergate setting through SMU

Provided an unified entry point. And fixed the confusing that the API
usage is conflict with what the naming implies.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: issue proper hdp flush for table transferring

Guard the content consistence between the view of GPU and CPU
during the table transferring.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: refine code to support no-dpm case

With "dpm=0", there will be no DPM enabled. The code
needs to be refined to support this.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: unified VRAM address for driver table interaction with SMU V2

By this, we can avoid to pass in the VRAM address on every table
transferring. That puts extra unnecessary traffics on SMU on
some cases(e.g. polling the amdgpu_pm_info sysfs interface).

V2: document what the driver table is for and how it works

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: cache the watermark settings on system memory

So that we do not need to allocate a piece of VRAM for it. This
is a preparation for coming change which unifies the VRAM address
for all driver tables interaction with SMU.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: custom pstate profiling clock frequence for navi series asics

add navi10 & navi14 pstate profiling clock value support.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: L1 Policy(5/5) - removed IH_CHICKEN from VF

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Jane Jian <jane.jian@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: L1 Policy(3/5) - removed ECC interrupt from VF

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Jane Jian <jane.jian@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: L1 Policy(2/5) - removed GC GRBM violations from gfxhub

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Jane Jian <jane.jian@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: L1 Policy(1/5) - removed VM settings for mmhub and gfxhub from VF

Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Jane Jian <jane.jian@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: removed GFX RAS support check in UMC ECC callback

enable GPU recovery in event of uncorrectable UMC error

Signed-off-by: John Clements <john.clements@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: added function to wait for PSP BL availability

reduced duplicate code

increased wait time for PSP BL readiness

Signed-off-by: John Clements <john.clements@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use linux size macro to simplify ONE_Kib & One_Mib

replace internal size macro with linux size macro

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Tianci Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: resolve bug in UMC 6 error counter query

iterate over all error counter registers in SMN space

removed support error counter access via MMIO

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: add smu11_driver_if_arcturus.h new OOB members

This is to fit the latest SMC firmware and it's backward compatible.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amd/amdgpu/sriov tdr enablement with pp_onevf_mode

Under sriov and pp_onevf mode,
1.take resume instead of hw_init for smc recover to avoid
potential memory leak.

2.add return condition inside smc resume function for
sriov_pp_onevf_mode and pm_enabled param.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amd/amdgpu/sriov enable onevf mode for ARCTURUS VF

Before, initialization of smu ip block would be skipped
for sriov ASICs. But if there's only one VF being used,
guest driver should be able to dump some HW info such as
clks, temperature,etc.

To solve this, now after onevf mode is enabled, host
driver will notify guest. If it's onevf mode, guest will
do smu hw_init and skip some steps in normal smu hw_init
flow because host driver has already done it for smu.

With this fix, guest app can talk with smu and dump hw
information from smu.

v2: refine the logic for pm_enabled.Skip hw_init by not
changing pm_enabled.
v3: refine is_support_sw_smu and fix some indentation
issue.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: retrieve the enabled feature mask from cache

This is why those feature mask members designed for. And this
can reduce the SMU workload.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: avoid deadlock on Vega20 swSMU routine

The lock required was already hold by its parent API.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update UMC 6.1 RAS error counter register access path

use proper method for SMN register access

Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: add helper function smu_get_dpm_level_range() for smu driver

this function can help smu driver to query dpm level clock range from
smu firmware.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: remove three set but not used variable

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/gpu/drm/radeon/radeon_atombios.c: In function
‘radeon_get_atom_connector_info_from_object_table’:
drivers/gpu/drm/radeon/radeon_atombios.c:651:26: warning: variable
‘grph_obj_num’ set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/radeon/radeon_atombios.c:651:13: warning: variable
‘grph_obj_id’ set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/radeon/radeon_atombios.c:573:37: warning: variable
‘con_obj_type’ set but not used [-Wunused-but-set-variable]

They are never used, and so can be removed.

Signed-off-by: yu kuai <yukuai3@huawei.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/powerplay: fix NULL pointer issue when SMU disabled

Fix smu related NULL pointer issue which occurs when SMU is disabled.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: use unified variable smu->is_apu to check apu asic platform

use unified variable smu->is_apu to check apu asic in smu driver.

related patch:
drm/amd/powerplay: bypass dpm_context null pointer check guard for some
smu series

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: amalgamated PSP TA invoke functions

reduce duplicate code

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: amalgamate PSP TA load/unload functions

reduce duplicate code

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: by default output PSP ret status in event of cmd failure

update log level from DRM_DEBUG_DRIVER to DRM_WARN

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: correct RLC firmwares loading sequence

Per confirmation with RLC firmware team, the RLC should
be unhalted after all RLC related firmwares uploaded.
However, in fact the RLC is unhalted immediately after
RLCG firmware uploaded. And that may causes unexpected
PSP hang on loading the succeeding RLC save restore
list related firmwares.
So, we correct the firmware loading sequence to load
RLC save restore list related firmwares before RLCG
ucode. That will help to get around this issue.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: add check for baco support on Arcturus

This is used to determine whether runtime pm can be
supported or not.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use true, false for bool variable in display_rq_dlg_calc_21.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:85:6-13: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:88:2-9: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:225:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:226:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:251:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:252:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:256:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:257:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:267:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:269:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:682:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c:1013:1-9: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use true, false for bool variable in display_rq_dlg_calc_20v2.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:110:6-13: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:113:2-9: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:243:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:244:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:267:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:268:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:272:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:273:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:283:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:285:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:673:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:962:1-9: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use true, false for bool variable in display_rq_dlg_calc_20.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:110:6-13: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:113:2-9: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:243:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:244:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:267:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:268:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:272:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:273:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:283:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:285:3-11: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:673:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c:961:1-9: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use true, false for bool variable in dce_calcs.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c:157:46-64: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c:159:2-20: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c:161:46-64: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c:163:2-20: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c:289:1-12: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c:290:1-12: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c:341:3-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c:343:4-15: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use true, false for bool variable in display_mode_vba_21.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/dml/dcn21/display_mode_vba_21.c:4124:3-28: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_mode_vba_21.c:4128:5-30: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dml/dcn21/display_mode_vba_21.c:5207:3-37: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use true, false for bool variable in dcn20_hwseq.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c:186:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c:189:2-10: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use true, false for bool variable in dcn10_hw_sequencer.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c:482:6-14: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c:485:2-10: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use true, false for bool variable in dc_link_ddc.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c:593:6-9: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: use true, false for bool variable in vega20_hwmgr.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c:875:1-31: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: make the set_performance_level logic easier to follow

Have every asic provide a callback for this rather than a mix
of generic and asic specific code.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdgpu: simplify ATPX detection"

This reverts commit f5fda6d89afe6e9cedaa1c3303903c905262f6e8.

You can't use BASE_CLASS in pci_get_class.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/995
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: simplify function return logic

Former return logic is redundant.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: support custom power profile setting

Support custom power profile mode settings on Arcturus.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: fix kernel_fpu_begin/_end() warnings

kernel_fpu_begin/_end() are already called inside dcn20_resource_construct,
and calling kernel_fpu_begin/_end() recursively triggers WARN_ON() when
CONFIG_X86_DEBUG_FPU is enabled.

[  107.060434] WARNING: CPU: 6 PID: 1370 at arch/x86/kernel/fpu/core.c:90 kernel_fpu_begin+0xbd/0xe0
<snip>
[  107.268197] Call Trace:
[  107.270751]  dcn20_patch_bounding_box+0x17/0x100 [amdgpu]
[  107.276204]  init_soc_bounding_box+0x1b3/0x5f0 [amdgpu]
[  107.281536]  ? _cond_resched+0x19/0x30
[  107.285307]  dcn20_resource_construct+0x3a9/0xa90 [amdgpu]
[  107.290957]  ? dcn20_resource_construct+0x3a9/0xa90 [amdgpu]
[  107.296621]  ? __alloc_pages_nodemask+0x16a/0x330
[  107.301476]  ? _cond_resched+0x19/0x30
[  107.305284]  ? kmem_cache_alloc_trace+0x197/0x230
[  107.310063]  ? _cond_resched+0x19/0x30
[  107.313783]  ? kmem_cache_alloc_trace+0x197/0x230
[  107.318691]  dcn20_create_resource_pool+0x42/0x70 [amdgpu]
[  107.324315]  dc_create_resource_pool+0x12d/0x170 [amdgpu]
[  107.329851]  dc_create+0x1b8/0x6a0 [amdgpu]
[  107.334013]  ? kmem_cache_alloc_trace+0x1e2/0x230
[  107.338832]  amdgpu_dm_init+0x13e/0x1c0 [amdgpu]
<snip>

Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Avoid hanging hardware in stop_cpsch

Don't use the HWS if it's known to be hanging. In a reset also
don't try to destroy the HIQ because that may hang on SRIOV if the
KIQ is unresponsive.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Improve HWS hang detection and handling

Move HWS hang detection into unmap_queues_cpsch to catch hangs in all
cases. If this happens during a reset, don't schedule another reset
because the reset already in progress is expected to take care of it.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Remove unused variable

dqm->pipeline_mem wasn't used anywhere.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix permissions of hang_hws

Reading from /sys/kernel/debug/kfd/hang_hws would cause a kernel
oops because we didn't implement a read callback. Set the permission
to write-only to prevent that.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add DRIVER_SYNCOBJ_TIMELINE to amdgpu

Can expose it now that the khronos has exposed the
vlk extension.

Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use true, false for bool variable in amdgpu_psp.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c:674:2-26: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c:794:1-25: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c:897:2-36: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c:1016:1-35: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c:1087:2-34: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c:1177:1-33: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use true, false for bool variable in amdgpu_debugfs.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:132:2-10: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:140:2-10: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:142:13-21: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use true, false for bool variable in amdgpu_device.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3961:1-19: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3981:1-19: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use true, false for bool variable in mxgpu_nv.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:255:2-20: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:267:2-20: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use true, false for bool variable in mxgpu_ai.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c:253:2-20: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c:265:2-20: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: use true,false for bool variable in ni.c

Fixes coccicheck warning:

drivers/gpu/drm/radeon/ni.c:2020:2-15: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/radeon/ni.c:2088:2-15: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: use true,false for bool variable in cik.c

Fixes coccicheck warning:

drivers/gpu/drm/radeon/cik.c:8140:2-15: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/radeon/cik.c:8212:2-15: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: use true,false for bool variable in rv770.c

Fixes coccicheck warning:

drivers/gpu/drm/radeon/rv770.c:1706:2-15: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: use true, false for bool variable in evergreen.c

Fixes coccicheck warning:

drivers/gpu/drm/radeon/evergreen.c:4948:2-15: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: use true,false for bool variable in r600.c

Fixes coccicheck warning:

drivers/gpu/drm/radeon/r600.c:3056:2-15: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: use true,false for bool variable in si.c

Fixes coccicheck warning:

drivers/gpu/drm/radeon/si.c:6475:2-15: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/radeon/si.c:6542:2-15: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: use true,false for bool variable in r100.c

Fixes coccicheck warning:

drivers/gpu/drm/radeon/r100.c:1826:3-31: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/radeon/r100.c:1828:3-31: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/radeon/r100.c:2390:2-22: WARNING: Assignment of 0/1 to bool variable
drivers/gpu/drm/radeon/r100.c:2395:2-22: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: add peak profile support for navi12

Add defined peak sclk for navi12 peak profile mode.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu/navi: Adjust default behavior for peak sclk profile

Fetch the sclk from the pptable if there is no specified sclk for
the board.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add missed return value set for error case

Return value should be set when going to error handle tag
for error case, this can avoid potential invalid array
access by upper caller.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove FB location config for sriov

FB location is already programmed by HV driver
for arcutus so remove this part

Signed-off-by: Frank.Min <Frank.Min@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable xgmi init for sriov use case

1. enable xgmi ta initialization for sriov
2. enable xgmi initialization for sriov

Signed-off-by: Frank.Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove memory training p2c buffer reservation(V2)

IP discovery TMR(occupied the top VRAM with size DISCOVERY_TMR_SIZE)
has been reserved, and the p2c buffer is in the range of this TMR, so
the p2c buffer reservation is unnecessary.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update the method to get fb_loc of memory training(V4)

The method of getting fb_loc changed from parsing VBIOS to
taking certain offset from top of VRAM

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Remove unneeded variable 'ret' in navi10_ih.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/navi10_ih.c:113:5-8: Unneeded variable: "ret". Return "0" on line 182

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Remove unneeded variable 'ret' in amdgpu_device.c

Fixes coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1036:5-8: Unneeded variable: "ret". Return "0" on line 1079

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx: Add mmSDMA2-7_EDC_COUNTER to support Arcturus

Add mmSDMA2-7_EDC_COUNTER to support Arcturus

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx: Add mmCOMPUTE_STATIC_THREAD_MGMT_SE4-7 to support Arcturus

Add mmCOMPUTE_STATIC_THREAD_MGMT_SE4-7 to support Arcturus

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx: Replace ARRAY_SIZE with size variable

Replace ARRAY_SIZE with size variables to support
different ASICs.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add mmCOMPUTE_STATIC_THREAD_MGMT_SE4-7 to support Arcturus

Arcturus has 8 SEs. Add mmCOMPUTE_STATIC_THREAD_MGMT_SE4-7
for EDC GPR _workarounds,

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Added ASIC specific check in gmc v9.0 ECC interrupt programming sequence

Devices newer then VEGA10/12 shall have these programming sequences performed by PSP BL

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enlarge agp_start address into 48bit

max range of the agp aperture is 48 bits, so
enlarge agp_start address into 48bit with all bits set

Signed-off-by: Frank.Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: disable VCN2.5 ib test for Arcturus sriov

currently using TMR loading VCN fw MMSCH would fail
to init after FLR, just disable ib test for temporarily
daily testing, continuing debug with mm team.

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix ctx init failure for asics without gfx ring

This workaround does not affect other asics because amdgpu only need expose
one gfx sched to user for now.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: attempt xgmi perfmon re-arm on failed arm

The DF routines to arm xGMI performance will attempt to re-arm both on
performance monitoring start and read on initial failure to arm.

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add perfmons accessible during df c-states

During DF C-State, Perfmon counters outside of range 1D700-1D7FF will
encounter SLVERR affecting xGMI performance monitoring. PerfmonCtr[7:4]
is being added to avoid SLVERR during read since it falls within this
range. PerfmonCtl[7:4] is being added in order to arm PerfmonCtr[7:4].
Since PerfmonCtl[7:4] exists outside of range 1D700-1D7FF, DF routines
will be enabled to opportunistically re-arm PerfmonCtl[7:4] on retry
after SLVERR.

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: simplify padding calculations (v2)

Simplify padding calculations.

v2: Comment update and spacing.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: expose num_cp_queues data field to topology node (v2)

Thunk driver would like to know the num_cp_queues data, however this data relied
on different asic specific. So it's better to get it from kfd driver.

v2: don't update name size.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: expose num_sdma_queues_per_engine data field to topology node (v2)

Thunk driver would like to know the num_sdma_queues_per_engine data, however
this data relied on different asic specific. So it's better to get it from kfd
driver.

v2: don't update the name size.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: skip disable dynamic state management

Under sriov, the disable operation is no allowed.

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable VCN0 and VCN1 sriov instances support for Arcturus

v1: compared to bare-metal: sriov support psp loading VCN firmware; only one
encoding ring would be used in each instance.
v2: keep unchange for bare-metal VCN2.5 hw_init, just add a flag with sriov
and also remove multiple lines.
v3: squash in warning fix

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: skip VCN2.5 power gating and clock gating for sriov Arcturus

v1: skip gating in serveral called functions by power gating and clock gating
v2: from suggestion, skip setting gate in both set function, which is where
it being called.

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update VCN1(dual instances) fw types ID and VCN ip block type

Previously there is no VCN1 type ID in psp gfx interface. Also add VCN ip
block type unless the reinit after FLR for sriov would fail.

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add VCN2.5 sriov start for Arctrus

Use MMSCH V1 to finish Memory Controller
programming as well as start MMSCH to do
VCN2.5 initialization.

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add VCN2.5 MMSCH start for Arcturus

Use MMSCH to do the initialization since MMSCH
manages VCN2.5 instances and its world switch.

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move umc offset to one new header file for Arcturus

Code refactor and no functional change.

Fixes: 4cf781c24c3b ("drm/amdgpu: Added RAS UMC error query support for Arcturus")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/display: include delay.h

For udelay. This is needed for some platforms.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazluaskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: add metrics table lock for vega20 (v2)

To protect access to the metrics table.

v2: unlock on error

Bug: https://gitlab.freedesktop.org/drm/amd/issues/900
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: add metrics table lock for renoir (v2)

To protect access to the metrics table.

v2: unlock on error

Bug: https://gitlab.freedesktop.org/drm/amd/issues/900
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: add metrics table lock for navi (v2)

To protect access to the metrics table.

v2: unlock on error

Bug: https://gitlab.freedesktop.org/drm/amd/issues/900
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: add metrics table lock for arcturus (v2)

To protect access to the metrics table.

v2: unlock on error

Bug: https://gitlab.freedesktop.org/drm/amd/issues/900
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: add metrics table lock

This table is used for lots of things, add it's own lock.

Bug: https://gitlab.freedesktop.org/drm/amd/issues/900
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

gpu: drm: dead code elimination

this set adds support for removal of gpu drm dead code.

patch3 is similar with patch 1:
`num` is a data of u8 type and ATOM_MAX_HW_I2C_READ == 255,

so there is a impossible condition '(num > 255) => (0-255 > 255)'.

Signed-off-by: Pan Zhang <zhangpan26@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: wait for all rings to drain before runtime suspending

Add a safety check to runtime suspend to make sure all outstanding
fences have signaled before we suspend. Doesn't fix any known issue.

We already do this via the fence driver suspend function, but we
just force completion rather than bailing. This bails on runtime
suspend so we can try again later once the fences are signaled to
avoid missing any outstanding work.

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: fix spelling

s/dispaly/display/g

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Switch from system_highpri_wq to system_unbound_wq

This is to avoid queueing jobs to same CPU during XGMI hive reset
because there is a strict timeline for when the reset commands
must reach all the GPUs in the hive.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>