Oak Zeng [Wed, 28 Nov 2018 04:08:25 +0000 (22:08 -0600)]
drm/amdkfd: Fix a potential memory leak
Free mqd_mem_obj if GTT buffer allocation for MQD+control stack fails.
Signed-off-by: Oak Zeng <ozeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
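The error path described in "Fix a potential memory leak" can be illustrated with a small user-space sketch. It is not the driver code: struct mem_obj, mem_obj_create() and the use of malloc() in place of a GTT allocation are assumptions made for illustration only.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the MQD memory object; not the kernel's struct. */
struct mem_obj {
    void *gtt_buf;   /* second allocation that may fail */
};

/*
 * Allocate the object and then its buffer. If the buffer allocation
 * fails, free the object before returning so it is not leaked.
 * buf_size == 0 is used here only to simulate an allocation failure.
 */
struct mem_obj *mem_obj_create(size_t buf_size)
{
    struct mem_obj *obj = calloc(1, sizeof(*obj));

    if (!obj)
        return NULL;

    obj->gtt_buf = buf_size ? malloc(buf_size) : NULL;
    if (!obj->gtt_buf) {
        free(obj);      /* the cleanup step this kind of fix adds */
        return NULL;
    }
    return obj;
}

/* Release both allocations; accepts NULL. */
void mem_obj_destroy(struct mem_obj *obj)
{
    if (!obj)
        return;
    free(obj->gtt_buf);
    free(obj);
}
```

Without the free(obj) in the failure branch, every failed buffer allocation would leak one object.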
Oak Zeng [Wed, 28 Nov 2018 03:58:54 +0000 (21:58 -0600)]
drm/amdkfd: Allocate MQD trunk for HIQ and SDMA
MEC FW for some new ASICs requires all SDMA MQDs to be in a contiguous
trunk of memory right after the HIQ MQD. Add a field in the device queue
manager to hold the HIQ/SDMA MQD memory object and allocate the MQD trunk
on device queue manager initialization.
Signed-off-by: Oak Zeng <ozeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
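A minimal sketch of the layout this commit describes, assuming the trunk is one block with the HIQ MQD first and fixed-size SDMA MQDs packed after it. The function names and the sizes used in the example are hypothetical, not taken from the driver.

```c
#include <stddef.h>

/*
 * Byte offset of SDMA MQD number sdma_id inside one contiguous block laid
 * out as: [HIQ MQD][SDMA MQD 0][SDMA MQD 1]...
 * All names and sizes are illustrative assumptions.
 */
size_t sdma_mqd_offset(size_t hiq_mqd_size, size_t sdma_mqd_size,
                       unsigned int sdma_id)
{
    return hiq_mqd_size + (size_t)sdma_id * sdma_mqd_size;
}

/* Total size of the block for a given number of SDMA queues. */
size_t mqd_trunk_size(size_t hiq_mqd_size, size_t sdma_mqd_size,
                      unsigned int num_sdma_queues)
{
    return hiq_mqd_size + (size_t)num_sdma_queues * sdma_mqd_size;
}
```

With an assumed 4096-byte HIQ MQD and 512-byte SDMA MQDs, SDMA MQD 3 would start at byte 5632 of the trunk.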
Oak Zeng [Wed, 5 Dec 2018 16:56:41 +0000 (10:56 -0600)]
drm/amdkfd: Add mqd size in mqd manager struct
Also initialize mqd size on mqd manager initialization
Signed-off-by: Oak Zeng <ozeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Oak Zeng [Wed, 5 Dec 2018 16:15:27 +0000 (10:15 -0600)]
drm/amdkfd: Init mqd managers in device queue manager init
Previously mqd managers were initialized on demand. As there
are only a few types of mqd managers, the on-demand initialization
doesn't save much memory. Initialize them on device
queue initialization instead and delete the get_mqd_manager
interface. This makes the code more organized for future changes.
Signed-off-by: Oak Zeng <ozeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Oak Zeng [Tue, 4 Dec 2018 02:38:43 +0000 (20:38 -0600)]
drm/amdkfd: Introduce DIQ type mqd manager
With the introduction of a new mqd allocation scheme for HIQ,
DIQ and HIQ use different mqd allocation schemes, so DIQ
can't reuse the HIQ mqd manager.
Signed-off-by: Oak Zeng <ozeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Oak Zeng [Mon, 3 Dec 2018 19:56:14 +0000 (13:56 -0600)]
drm/amdkfd: Introduce asic-specific mqd_manager_init function
The global function mqd_manager_init just calls asic-specific functions and
is not necessary. Delete it and introduce a mqd_manager_init interface in
dqm for asic-specific mqd manager init. Call the mqd_manager_init interface
directly to initialize the mqd manager.
Signed-off-by: Oak Zeng <ozeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Tue, 7 May 2019 21:46:14 +0000 (17:46 -0400)]
drm/amdgpu: Improve error handling for HMM
Use unsigned long for number of pages.
Check that pfns are valid after hmm_vma_fault. If they are not,
return an error instead of continuing with invalid page pointers and
PTEs.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Mon, 4 Mar 2019 15:37:55 +0000 (10:37 -0500)]
drm/amdgpu: more descriptive message if HMM not enabled
With an old kernel config file, CONFIG_ZONE_DEVICE is not selected,
so CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled, and the current driver
error message "Failed to register MMU notifier" is not clear. Give the
user a more descriptive message on how to fix the missing kernel
config option.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Mon, 4 Mar 2019 19:41:03 +0000 (14:41 -0500)]
drm/amdgpu: support userptr cross VMAs case with HMM
A userptr may cross two VMAs if a forked child process (one that does not
call exec after fork) mallocs a buffer, frees it, and then mallocs a larger
buffer. The kernel creates a new VMA adjacent to the old VMA that was cloned
from the parent process, so some pages of the userptr are in the first VMA
and the remaining pages are in the second VMA.
HMM expects a range to cover only one VMA. Loop over all VMAs in the address
range and create multiple ranges to handle this case. See
is_mergeable_anon_vma in mm/mmap.c for details.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
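The idea of creating one range per VMA can be sketched in plain C as follows. This is only an illustration: struct span and split_by_vma() are hypothetical, VMAs are modelled as sorted half-open intervals, and no HMM or mm API is used.

```c
#include <stddef.h>

/* Half-open address interval [start, end). Hypothetical type. */
struct span {
    unsigned long start;
    unsigned long end;
};

/*
 * Split [start, end) at the boundaries of the sorted, non-overlapping
 * intervals in vmas[], writing one sub-range per interval touched.
 * Returns the number of sub-ranges written, or -1 if part of the range
 * is not covered or out[] is too small.
 */
int split_by_vma(const struct span *vmas, size_t nvmas,
                 unsigned long start, unsigned long end,
                 struct span *out, size_t max_out)
{
    size_t n = 0;
    unsigned long cur = start;

    for (size_t i = 0; i < nvmas && cur < end; i++) {
        if (vmas[i].end <= cur)
            continue;               /* interval lies before the range */
        if (vmas[i].start > cur)
            return -1;              /* hole in the address range */
        if (n == max_out)
            return -1;              /* caller's array is full */
        out[n].start = cur;
        out[n].end = vmas[i].end < end ? vmas[i].end : end;
        cur = out[n].end;
        n++;
    }
    return cur == end ? (int)n : -1;
}
```

For two adjacent intervals [0x1000, 0x3000) and [0x3000, 0x6000), a userptr covering [0x2000, 0x5000) is split into [0x2000, 0x3000) and [0x3000, 0x5000).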
Philip Yang [Mon, 4 Mar 2019 19:10:12 +0000 (14:10 -0500)]
drm/amdkfd: support concurrent userptr update for HMM
Userptr restore may race with a concurrent userptr invalidation after
hmm_vma_fault adds the range to the hmm->ranges list. In that case, call
hmm_vma_range_done to remove the range from the hmm->ranges list first,
then reschedule the restore worker. Otherwise hmm_vma_fault will add the
same range to the list again, which causes a loop in the list because
range->next points to the range itself.
Add function untrack_invalid_user_pages to reduce code duplication.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 21 Feb 2019 17:39:21 +0000 (12:39 -0500)]
drm/amdgpu: fix HMM config dependency issue
Selecting only HMM_MIRROR produces kernel config dependency warnings
if CONFIG_HMM is missing from the config. Adding a depends on HMM
solves the issue.
Add conditional compilation to fix compilation errors if HMM_MIRROR
is not enabled because the HMM config is not enabled.
Remove unused function amdgpu_ttm_tt_mark_user_pages.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Thu, 13 Dec 2018 20:35:28 +0000 (15:35 -0500)]
drm/amdgpu: replace get_user_pages with HMM mirror helpers
Use HMM helper function hmm_vma_fault() to get the physical pages backing
a userptr and start tracking CPU page table updates for those pages. Then
use hmm_vma_range_done() to check if those pages were updated before
amdgpu_cs_submit for gfx or before user queues are resumed for kfd.
If userptr pages were updated, amdgpu_cs_ioctl restarts from scratch
for gfx, and the restore worker is rescheduled to retry for kfd.
HMM simplifies the concurrent CPU page table update check, so remove the
guptasklock, mmu_invalidations, last_set_pages fields from the
amdgpu_ttm_tt struct.
HMM does not pin the page (increase page ref count), so remove related
operations like release_pages(), put_page(), mark_page_dirty().
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 5 Dec 2018 19:03:43 +0000 (14:03 -0500)]
drm/amdkfd: avoid HMM change cause circular lock
There is a circular lock dependency between the gfx and kfd paths with the
HMM change:
lock(dqm) -> bo::reserve -> amdgpu_mn_lock
To avoid this, move init/uninit_mqd() out of lock(dqm) to remove the nested
locking between mmap_sem and bo::reserve. The locking order
is: bo::reserve -> amdgpu_mn_lock(p->mn)
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Mon, 23 Jul 2018 21:45:46 +0000 (17:45 -0400)]
drm/amdgpu: use HMM callback to replace mmu notifier
Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.
It supports both KFD userptr and gfx userptr paths.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
shaoyunl [Thu, 25 Oct 2018 19:40:51 +0000 (15:40 -0400)]
drm/amdgpu: Use heavy weight for tlb invalidation on xgmi configuration
There is a bug in the VML2 xgmi logic:
mtype is always sent as NC on the VMC to TC interface for a page walk,
regardless of whether the request is being sent to the local or a remote GPU.
NC means non-coherent and will cause the VMC return data to be cached
in the TCC (versus UC, uncached, which will not cache the data). Since the
page table updates are done by SDMA/HDP, the TCC will never be
updated, so the GC VML2 will continue to hit on the TCC, never get
the updated page tables, and fault as a result.
Heavy weight TLB invalidation does a WB/INVAL of the L1/L2 GL data
caches so the TCC will not be hit on the next request.
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jay Cornwall [Tue, 2 Apr 2019 16:43:30 +0000 (11:43 -0500)]
drm/amdkfd: Preserve ttmp[4:5] instead of ttmp[14:15]
ttmp[4:5] is initialized by the SPI with SPI_GDBG_TRAP_DATA* values.
These values are more useful to the debugger than ttmp[14:15], which
carries dispatch_scratch_base*. There are too few registers to
preserve both.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jay Cornwall [Tue, 19 Feb 2019 20:51:56 +0000 (14:51 -0600)]
drm/amdkfd: Fix gfx9 XNACK state save/restore
SQ_WAVE_IB_STS.RCNT grew from 4 bits to 5 in gfx9. Do not truncate
when saving in the high bits of TTMP1.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
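The truncation described in "Fix gfx9 XNACK state save/restore" can be shown with a small bit-packing sketch. The shift value and helper names are assumptions chosen for illustration; only the 4-bit versus 5-bit field width comes from the commit message.

```c
#include <stdint.h>

/* Illustrative layout only: the field sits in the top bits of a 32-bit word. */
#define RCNT_SHIFT   27u
#define RCNT_MASK_4  0xFu    /* 4-bit field width */
#define RCNT_MASK_5  0x1Fu   /* 5-bit field width */

/* Store rcnt in the high bits of word using the given field mask. */
uint32_t save_rcnt(uint32_t word, uint32_t rcnt, uint32_t mask)
{
    word &= ~(mask << RCNT_SHIFT);
    return word | ((rcnt & mask) << RCNT_SHIFT);
}

/* Read the field back with the same mask. */
uint32_t restore_rcnt(uint32_t word, uint32_t mask)
{
    return (word >> RCNT_SHIFT) & mask;
}
```

Saving the 5-bit value 0x13 with the 4-bit mask restores 0x3, which is the kind of loss the fix avoids by widening the saved field.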
Jay Cornwall [Thu, 31 Jan 2019 17:38:18 +0000 (11:38 -0600)]
drm/amdkfd: Preserve wave state after instruction fetch MEM_VIOL
If instruction fetch fails the wave cannot be halted and returned to
the shader without raising MEM_VIOL again. Currently the wave is
terminated if this occurs, but this loses information about the cause
of the fault. The debugger would prefer the faulting wave state to be
context-saved.
Poll inside the trap handler until TRAPSTS.SAVECTX indicates context
save is ready. Exit the poll loop and complete the remainder of the
exception handler, then return to the shader. The next instruction
fetch will be from the trap handler and not the faulting PC. Context
save will then deschedule the wave and save its state.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jay Cornwall [Thu, 15 Nov 2018 04:23:25 +0000 (22:23 -0600)]
drm/amdkfd: Fix gfx8 MEM_VIOL exception handler
When MEM_VIOL is asserted the context save handler rewinds the
program counter. This is incorrect for any source of the exception.
MEM_VIOL may be raised in normal operation by out-of-bounds access
to LDS or GDS and does not require special handling.
Remove PC adjustment when MEM_VIOL has been raised.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix compute profile switching on process termination.
Add a dedicated reference counter to keep track of entry/exit to/from
compute profile. This enables switching compute profiles for other
reasons than process creation or termination.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
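A dedicated reference counter for entering and leaving the compute profile might look like the sketch below. struct profile_state and both helpers are hypothetical and single-threaded; the driver's actual data structures and locking are not shown in the commit message.

```c
#include <stdbool.h>

/* Hypothetical state; how the driver stores this is not shown here. */
struct profile_state {
    int  compute_refs;     /* number of active compute-profile users */
    bool compute_active;   /* whether the compute profile is selected */
};

/* Enter: switch to the compute profile only on the 0 -> 1 transition. */
void compute_profile_get(struct profile_state *s)
{
    if (s->compute_refs++ == 0)
        s->compute_active = true;
}

/* Exit: switch back only when the last user leaves. */
void compute_profile_put(struct profile_state *s)
{
    if (s->compute_refs > 0 && --s->compute_refs == 0)
        s->compute_active = false;
}
```

Because the counter is independent of process creation and termination, any caller can take or drop a reference.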
Oak Zeng [Tue, 4 Dec 2018 22:08:33 +0000 (16:08 -0600)]
drm/amdkfd: Shift sdma_engine_id and sdma_queue_id in mqd
FW of some new ASICs requires sdma mqd size to be not more than
128 dwords. Repurpose the last 2 reserved fields of sdma mqd for
driver internal use, so the total mqd size is no bigger than 128
dwords
Signed-off-by: Oak Zeng <ozeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
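The size constraint can be checked at compile time. The structure below is a hypothetical stand-in, not the real SDMA MQD layout: it only models 126 hardware dwords followed by the two formerly reserved dwords that the driver reuses.

```c
#include <stdint.h>

#define SDMA_MQD_DWORDS 128

/* Hypothetical MQD layout: hardware-defined dwords followed by two
 * formerly reserved dwords reused for driver bookkeeping. */
struct sdma_mqd_sketch {
    uint32_t hw_fields[SDMA_MQD_DWORDS - 2];
    uint32_t sdma_engine_id;   /* was reserved */
    uint32_t sdma_queue_id;    /* was reserved */
};

/* Compile-time check that the structure stays within the size limit. */
_Static_assert(sizeof(struct sdma_mqd_sketch) <=
               SDMA_MQD_DWORDS * sizeof(uint32_t),
               "SDMA MQD must not exceed 128 dwords");
```

Appending the two fields after the reserved ones instead of reusing them would make the assertion fail at build time.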
Oak Zeng [Mon, 3 Dec 2018 15:20:20 +0000 (09:20 -0600)]
drm/amdkfd: Differentiate b/t sdma_id and sdma_queue_id
sdma_queue_id is sdma queue index inside one sdma engine.
sdma_id is sdma queue index among all sdma engines. Use
those two names properly.
Signed-off-by: Oak Zeng <ozeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
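To make the two names concrete, here is one possible mapping from the global sdma_id to a per-engine position. The round-robin distribution, struct sdma_slot and sdma_slot_from_id() are assumptions of this sketch; the commit message only defines the two terms.

```c
/* One possible mapping (an assumption): global ids are spread
 * round-robin over the engines. */
struct sdma_slot {
    unsigned int engine_id;   /* which SDMA engine */
    unsigned int queue_id;    /* queue index inside that engine */
};

/* num_engines must be nonzero. */
struct sdma_slot sdma_slot_from_id(unsigned int sdma_id,
                                   unsigned int num_engines)
{
    struct sdma_slot s;

    s.engine_id = sdma_id % num_engines;
    s.queue_id  = sdma_id / num_engines;
    return s;
}
```

Under this assumed mapping with two engines, sdma_id 5 is queue 2 on engine 1.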
Oak Zeng [Thu, 8 Nov 2018 15:40:41 +0000 (10:40 -0500)]
drm/amdkfd: Add sdma allocation debug message
Add debug messages during SDMA queue allocation.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Oak Zeng [Thu, 1 Nov 2018 15:06:25 +0000 (11:06 -0400)]
drm/amdkfd: Use 64 bit sdma_bitmap
Support a maximum of 64 sdma queues.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
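A 64-bit allocation bitmap can be sketched as below. The helper names are hypothetical, and the sketch assumes a set bit means "queue is free"; the point it illustrates is that shifts must be done on a 64-bit constant once more than 32 queues are tracked.

```c
#include <stdint.h>

/*
 * Claim the lowest set bit of a 64-bit bitmap of free queues.
 * Returns the bit index, or -1 if no queue is free.
 */
int sdma_bitmap_alloc(uint64_t *bitmap)
{
    for (int bit = 0; bit < 64; bit++) {
        /* 1ULL keeps the shift 64 bits wide; a plain 1 would be an int. */
        if (*bitmap & (1ULL << bit)) {
            *bitmap &= ~(1ULL << bit);
            return bit;
        }
    }
    return -1;
}

/* Return a queue to the free pool. */
void sdma_bitmap_free(uint64_t *bitmap, int bit)
{
    if (bit >= 0 && bit < 64)
        *bitmap |= 1ULL << bit;
}
```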
Evan Quan [Mon, 29 Apr 2019 03:35:42 +0000 (11:35 +0800)]
drm/amd/powerplay: enable ppfeaturemask module parameter support on Vega20
Support DPM/DS/ULV related bitmasks of ppfeaturemask module parameter.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/powerplay: Fix maybe-uninitialized in get_ppfeature_status
This fixes the warning below
error: ‘feature_mask’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
*features_enabled = ((((uint64_t)feature_mask[0] << SMU_FEATURES_LOW_SHIFT) & SMU_FEATURES_LOW_MASK) |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(((uint64_t)feature_mask[1] << SMU_FEATURES_HIGH_SHIFT) & SMU_FEATURES_HIGH_MASK));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
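One common way to address this class of warning is to initialize the array and bail out before combining the words when the query fails. The sketch below is not the actual patch: read_feature_mask() is a made-up helper and the SMU_FEATURES_* values are assumed for illustration.

```c
#include <stdint.h>

/* Illustrative values; the real macros are defined by the driver. */
#define SMU_FEATURES_LOW_MASK    0x00000000FFFFFFFFull
#define SMU_FEATURES_LOW_SHIFT   0
#define SMU_FEATURES_HIGH_MASK   0xFFFFFFFF00000000ull
#define SMU_FEATURES_HIGH_SHIFT  32

/* Hypothetical query that may fail without writing to mask[]. */
int read_feature_mask(int fail, uint32_t mask[2])
{
    if (fail)
        return -1;
    mask[0] = 0x89ABCDEFu;
    mask[1] = 0x01234567u;
    return 0;
}

int get_enabled_features(int fail, uint64_t *features_enabled)
{
    uint32_t feature_mask[2] = { 0, 0 };   /* initialized up front */
    int ret = read_feature_mask(fail, feature_mask);

    if (ret)
        return ret;                        /* never combine on failure */

    *features_enabled =
        (((uint64_t)feature_mask[0] << SMU_FEATURES_LOW_SHIFT) & SMU_FEATURES_LOW_MASK) |
        (((uint64_t)feature_mask[1] << SMU_FEATURES_HIGH_SHIFT) & SMU_FEATURES_HIGH_MASK);
    return 0;
}
```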
Jun Lei [Tue, 30 Apr 2019 20:22:38 +0000 (16:22 -0400)]
drm/amd/display: dont set otg offset
Move the update of the otg instance outside of the hw programming logic.
Since this is sw state, it should always be updated and should
never be optimized away.
Signed-off-by: Jun Lei <Jun.Lei@amd.com> Reviewed-by: Eric Yang <eric.yang2@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Explicitly specify update type per plane info change
[Why]
The bit for flip addr is being set causing the determination for
FAST vs MEDIUM to always return MEDIUM when plane info is provided
as a surface update. This causes extreme stuttering for the typical
atomic update path on Linux.
[How]
Don't use update_flags->raw for determining FAST vs MEDIUM. It's too
fragile to changes like this.
Explicitly specify the update type per update flag instead. It's not
as clever as checking the bits itself but at least it's correct.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Acked-by: Eryk Brol <Eryk.Brol@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Program VTG params after programming Global Sync
[Why]
VTG has a parameter FP2, which is defined as:
if VSTARTUP is before VSYNC:
FP2 = number of lines in between VSTARTUP and VSYNC
else
FP2 = 0
Currently, FP2 is only programmed during "program_timing". However, the
position of VSTARTUP is affected by the prefetching requirements on all pipes,
so the position might change when we do memory request control on another pipe, so we need
to make sure that FP2 stays up-to-date whenever we adjust VSTARTUP.
[How]
- refactor VTG_CONTROL programming into a new function "set_vtg_params"
- call it after calling "program_global_sync"
- make sure it's called after because it relies on the cached dlg params
Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Reviewed-by: Tony Cheng <Tony.Cheng@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Acked-by: Jun Lei <Jun.Lei@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
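The FP2 definition given in the [Why] section above translates directly into a small function. This sketch assumes both positions are line numbers within the same frame and that "lines in between" is the plain difference; compute_fp2() is a hypothetical name and wrap-around at the end of the frame is not handled.

```c
/*
 * FP2 as defined above, with both positions given as line numbers in
 * the same frame (an assumption of this sketch).
 */
unsigned int compute_fp2(unsigned int vstartup_line, unsigned int vsync_line)
{
    if (vstartup_line < vsync_line)
        return vsync_line - vstartup_line;   /* VSTARTUP is before VSYNC */
    return 0;
}
```

Since the result depends on the current VSTARTUP position, it has to be recomputed every time VSTARTUP moves, which is why the commit reprograms it after global sync.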
drm/amd/display: Remove DPMS state dependency for fast boot
[Why]
The DPMS state of a display should not impact whether we want to enable fast boot.
Currently fast boot is not enabled when resuming from S4 because of this.
[How]
Remove check for DPMS state when determining if fast boot
can be applied.
Signed-off-by: SivapiriyanKumarasamy <sivapiriyan.kumarasamy@amd.com> Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Hook up CRC capture support for dce120
[Why]
Many IGT tests require CRC capture in order to confirm that the output
is visually correct.
These skip on dce120 because configure_crc and get_crc aren't set.
[How]
Hook up is_tg_enabled, configure_crc and get_crc functions on dce120's
timing generator.
The logic should be the same as DCE and DCN with some minor register
naming differences.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: David Francis <David.Francis@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Tue, 14 May 2019 03:46:27 +0000 (11:46 +0800)]
drm/amd/powerplay: support sw smu hotspot and memory temperature retrieval
Support hotspot and memory temperature retrieval on sw smu routine.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Tue, 14 May 2019 02:38:42 +0000 (10:38 +0800)]
drm/amd/powerplay: support uclk activity retrieve on sw smu routine
Support realtime uclk activity report.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Tue, 14 May 2019 13:12:45 +0000 (09:12 -0400)]
drm/amd/display: Drop DCN1_01 guards
[WHY]
These were only needed for bringup. They're not needed anymore.
Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Mon, 29 Apr 2019 13:39:15 +0000 (09:39 -0400)]
drm/amd/display: Don't load DMCU for Raven 1 (v2)
[WHY]
Some early Raven boards had a bad SBIOS that doesn't play nicely with
the DMCU FW. We thought the issues were fixed by ignoring errors on DMCU
load but that doesn't seem to be the case. We've still seen reports of
users unable to boot their systems at all.
[HOW]
Disable DMCU load on Raven 1. Only load it for Raven 2 and Picasso.
v2: Fix ifdef (Alex)
Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Tue, 14 May 2019 13:05:37 +0000 (09:05 -0400)]
drm/amd/display: Add ASICREV_IS_PICASSO
[WHY]
We only want to load DMCU FW on Picasso and Raven 2, not on Raven 1.
Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Ori Messinger [Mon, 22 Apr 2019 17:52:52 +0000 (13:52 -0400)]
drm/amdgpu: Report firmware versions with sysfs v2
Firmware versions can be found as separate sysfs files at:
/sys/class/drm/cardX/device/fw_version (where X is the card number)
The firmware versions are displayed in hexadecimal.
v2: Moved sysfs files to subfolder
Signed-off-by: Ori Messinger <ori.messinger@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tiecheng Zhou [Tue, 14 May 2019 02:03:35 +0000 (10:03 +0800)]
drm/amdgpu/sriov: Need to initialize the HDP_NONSURFACE_BASE
HDP_NONSURFACE_BASE must be initialized to avoid
using the value left by a previous VM in the sriov scenario.
v2: it should not hurt baremetal, so generalize it for both sriov
and baremetal
Signed-off-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Tiecheng Zhou <Tiecheng.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Mon, 13 May 2019 05:57:29 +0000 (13:57 +0800)]
drm/amdgpu: suppress repeating tmo report
Only report once per TMO job. The timer is
restarted when the job finishes if it was just slow.
Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Fri, 10 May 2019 17:56:30 +0000 (19:56 +0200)]
drm/amdgpu: remove static GDS, GWS and OA allocation
As far as we know this was never used by userspace and so should be removed.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 8 May 2019 05:55:21 +0000 (13:55 +0800)]
drm/amd/powerplay: force to update all clock tables on OD reset
On OD reset, the clock tables in SMU need to be reset to default.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Tue, 7 May 2019 04:49:03 +0000 (12:49 +0800)]
drm/amd/powerplay: update Vega10 power state on OD
Update Vega10 top performance level power state accordingly
on OD.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to enable or disable AVFS if it's already in the wanted
state.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With a user-specified voltage (DPMTABLE_OD_UPDATE_VDDC), AVFS
will be disabled. However, a bug in the code prevents this from
working as expected.
- V2: clear all OD flags except DPMTABLE_OD_UPDATE_VDDC
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Tue, 30 Apr 2019 08:34:20 +0000 (16:34 +0800)]
drm/amd/powerplay: fix Vega10 mclk/socclk voltage link setup
This may affect the Vega10 MCLK OD functionality.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Wed, 8 May 2019 15:13:53 +0000 (11:13 -0400)]
drm/amdgpu: check no_user_fence flag for engines
Replace the ring type checks with the flag to make them generic.
Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
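The no_user_fence series replaces per-ring-type checks with a capability flag. A reduced sketch of the idea follows; the structures are hypothetical cut-down versions, and returning -EINVAL is an assumption of this example rather than something stated in the commit messages.

```c
#include <errno.h>
#include <stdbool.h>

/* Reduced, hypothetical versions of the ring structures. */
struct ring_funcs {
    bool no_user_fence;   /* engine cannot signal user fences */
};

struct ring {
    const struct ring_funcs *funcs;
};

/* Reject a user fence request based on the flag, not on the ring type. */
int check_user_fence(const struct ring *ring)
{
    if (ring->funcs->no_user_fence)
        return -EINVAL;
    return 0;
}
```

A new engine without user fence support then only needs to set the flag in its ring funcs; the check itself does not change.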
Leo Liu [Wed, 8 May 2019 15:10:05 +0000 (11:10 -0400)]
drm/amdgpu/VCN: set no_user_fence flag to true
There is no user fence support for VCN
Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Wed, 8 May 2019 15:08:58 +0000 (11:08 -0400)]
drm/amdgpu/VCE: set no_user_fence flag to true
There is no user fence support for VCE
Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Wed, 8 May 2019 15:07:26 +0000 (11:07 -0400)]
drm/amdgpu/UVD: set no_user_fence flag to true
There is no user fence support for UVD
Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Wed, 8 May 2019 15:05:11 +0000 (11:05 -0400)]
drm/amdgpu: add no_user_fence flag to ring funcs
So we can generalize the handling of engines without user fence support.
Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Thu, 9 May 2019 01:00:14 +0000 (09:00 +0800)]
drm/amdgpu: sdma handle ras resume
During S3/S4, the bootloader will re-init ras state behind us.
Resume might fail or raise a gpu reset.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Thu, 9 May 2019 00:58:56 +0000 (08:58 +0800)]
drm/amdgpu: gfx handle ras resume
During S3/S4, the bootloader will re-init ras state behind us.
Resume might fail or raise a gpu reset.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Thu, 9 May 2019 00:26:02 +0000 (08:26 +0800)]
drm/amdgpu: gmc handle ras resume
During S3/S4, the bootloader will re-init ras state behind us.
Resume might fail or raise a gpu reset.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Wed, 8 May 2019 23:32:54 +0000 (07:32 +0800)]
drm/amdgpu: enable ras suspend/resume
suspend/resume will change ras state behind us. Let driver get notified.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Thu, 9 May 2019 00:26:27 +0000 (08:26 +0800)]
drm/amdgpu: ras support suspend/resume
add ras suspend function. rename ras_post_init to amdgpu_ras_resume.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Tue, 7 May 2019 03:53:31 +0000 (11:53 +0800)]
drm/amdgpu: add badpages sysfs interface
add badpages node.
it will output badpages list in format
gpu pfn : gpu page size : flags
example
0x00000000 : 0x00001000 : R
0x00000001 : 0x00001000 : R
0x00000002 : 0x00001000 : R
0x00000003 : 0x00001000 : R
0x00000004 : 0x00001000 : R
0x00000005 : 0x00001000 : R
0x00000006 : 0x00001000 : R
0x00000007 : 0x00001000 : P
0x00000008 : 0x00001000 : P
0x00000009 : 0x00001000 : P
flags can be one of the characters below
R: reserved.
P: pending for reserve.
F: failed to reserve for some reasons.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
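A user-space reader of the badpages node could parse each line of the "gpu pfn : gpu page size : flags" format shown above roughly as follows. struct bad_page and parse_bad_page() are hypothetical names for this sketch; the path of the sysfs node is not given in the commit message and is not assumed here.

```c
#include <stdio.h>

struct bad_page {
    unsigned long pfn;    /* gpu pfn */
    unsigned long size;   /* gpu page size */
    char flag;            /* 'R', 'P' or 'F' */
};

/* Parse one "pfn : size : flag" line. Returns 0 on success, -1 otherwise. */
int parse_bad_page(const char *line, struct bad_page *bp)
{
    /* %lx accepts an optional 0x prefix; spaces match any whitespace. */
    if (sscanf(line, "%lx : %lx : %c", &bp->pfn, &bp->size, &bp->flag) != 3)
        return -1;
    if (bp->flag != 'R' && bp->flag != 'P' && bp->flag != 'F')
        return -1;
    return 0;
}
```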
James Zhu [Wed, 8 May 2019 20:38:58 +0000 (16:38 -0400)]
drm/amdgpu: Fix S3 test issue
During the S3 test, when the system wakes up and resumes, the ras interface
is already allocated. Move the workaround before ras jumps to the resume
step in gfx_v9_0_ecc_late_init, and make sure the workaround is applied
during resume. Also remove the unused mmGB_EDC_MODE clearing.
Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wang Hai [Wed, 8 May 2019 12:55:16 +0000 (20:55 +0800)]
drm/amd/display: Make some functions static
Fix the following sparse warnings:
drivers/gpu/drm/amd/amdgpu/../display/dc/dce120/dce120_resource.c:483:21: warning: symbol 'dce120_clock_source_create' was not declared. Should it be static?
drivers/gpu/drm/amd/amdgpu/../display/dc/dce120/dce120_resource.c:506:6: warning: symbol 'dce120_clock_source_destroy' was not declared. Should it be static?
drivers/gpu/drm/amd/amdgpu/../display/dc/dce120/dce120_resource.c:513:6: warning: symbol 'dce120_hw_sequencer_create' was not declared. Should it be static?
Fixes: b8fdfcc6a92c ("drm/amd/display: Add DCE12 core support") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai26@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Trigger Huang [Fri, 1 Mar 2019 03:56:20 +0000 (11:56 +0800)]
drm/amdgpu: add basic func for RLC program reg
New feature for RLC, some registers can be programmed by
RLC interface under SR-IOV VF:
WREG32_SOC15_RLC_SHADOW:
1, for GRBM_GFX_CNTL, firstly the new register value should be
programmed to SCRATCH_REG2
2, for GRBM_GFX_INDEX, firstly the new register value should be
programmed to SCRATCH_REG3
WREG32_RLC:
for registers supported to be programmed by RLC interface, the
following sequence should be used:
1, write the value to SCRATCH_REG0
2, write reg | 0x80000000 to SCRATCH_REG1
3, write 0x1 to RLC_SPARE_INT to notify RLC
4, polling SCRATCH_REG1 to check if finished
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
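The four-step WREG32_RLC sequence listed above can be modelled against a fake register file. Everything below is a simulation under stated assumptions: the register indexes are made up, fake_rlc_service() stands in for the RLC firmware, and "finished" is assumed to mean that bit 31 of SCRATCH_REG1 has been cleared.

```c
#include <stdint.h>

/* Register indexes into a fake register file; the offsets are made up. */
enum { SCRATCH_REG0, SCRATCH_REG1, RLC_SPARE_INT, TARGET_BASE, NUM_REGS = 16 };

#define RLC_BUSY 0x80000000u

uint32_t regs[NUM_REGS];

/* Stand-in for the RLC firmware: on notification, perform the write and
 * clear the busy bit (the completion condition is an assumption). */
void fake_rlc_service(void)
{
    uint32_t reg = regs[SCRATCH_REG1] & ~RLC_BUSY;

    if (reg < NUM_REGS)
        regs[reg] = regs[SCRATCH_REG0];
    regs[SCRATCH_REG1] &= ~RLC_BUSY;
}

/* Returns 0 on success, -1 if the poll times out. */
int wreg32_rlc_sketch(uint32_t reg, uint32_t value)
{
    regs[SCRATCH_REG0] = value;             /* 1: write the value */
    regs[SCRATCH_REG1] = reg | RLC_BUSY;    /* 2: write reg | 0x80000000 */
    regs[RLC_SPARE_INT] = 0x1;              /* 3: notify the RLC */
    fake_rlc_service();                     /* simulated firmware response */

    for (int retries = 0; retries < 1000; retries++) {   /* 4: poll */
        if (!(regs[SCRATCH_REG1] & RLC_BUSY))
            return 0;
    }
    return -1;
}
```

On real hardware the firmware runs asynchronously, so the poll loop would also need a delay between reads and a sensible timeout.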
xinhui pan [Wed, 8 May 2019 16:13:22 +0000 (00:13 +0800)]
drm/amdgpu: gpu reset will run ras post init
ras needs to initialize proper state after late init
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Wed, 8 May 2019 14:38:37 +0000 (22:38 +0800)]
drm/amdgpu: sdma support ras gpu reset
request a gpu reset if ras returns EAGAIN.
we will run late init again so it is ok to do nothing this time.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Wed, 8 May 2019 14:36:10 +0000 (22:36 +0800)]
drm/amdgpu: gfx support ras gpu reset
request a gpu reset if ras returns EAGAIN.
we will run late init again so it is ok to do nothing this time.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Wed, 8 May 2019 14:32:34 +0000 (22:32 +0800)]
drm/amdgpu: gmc support ras gpu reset
request a gpu reset if ras returns EAGAIN.
we will run late init again so it is ok to do nothing this time.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
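The pattern shared by the gmc, gfx and sdma "support ras gpu reset" commits can be sketched as below. struct dev_sketch and ras_late_init_sketch() are hypothetical; the sketch only models "request a reset on EAGAIN and otherwise do nothing this time", not how the driver schedules the reset.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state for this sketch. */
struct dev_sketch {
    bool reset_requested;
};

/*
 * ras_ret is the result of enabling ras for one IP block.
 * On -EAGAIN, request a gpu reset and report success: late init will
 * run again after the reset, so nothing else is done this time.
 */
int ras_late_init_sketch(struct dev_sketch *dev, int ras_ret)
{
    if (ras_ret == -EAGAIN) {
        dev->reset_requested = true;
        return 0;
    }
    return ras_ret;   /* success or any other error is passed through */
}
```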
xinhui pan [Wed, 8 May 2019 11:12:24 +0000 (19:12 +0800)]
drm/amdgpu: handle ras reset
add another flag to allow an IP to do a gpu reset after device init.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Wed, 8 May 2019 08:13:03 +0000 (16:13 +0800)]
drm/amdgpu: Issue ras TA disable/enable cmd forcely on boot
Check ras TA error code and return EAGAIN.
Issue ras enable/disable cmd without checking current state.
Looks like ras TA will handle the current state == target state case.
Now the driver might need to do a reset to satisfy ras TA.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xinhui pan [Wed, 8 May 2019 14:17:57 +0000 (22:17 +0800)]
drm/amdgpu: gpu reset will run late_init
ras needs late init to initialize proper state.
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>