Eric Yang [Fri, 9 Jul 2021 16:57:50 +0000 (12:57 -0400)]
drm/amd/display: add workaround for riommu invalidation request hang
[Why]
When an riommu invalidation request come at the same time as a pipe is
disabled there can be a case where DCN cannot ACK the request if only
one VMID is setup in the inuse list.
[How]
Setup a second unused VMID will work around the issue.
Reviewed-by: Jun Lei <jun.lei@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Eric Yang <Eric.Yang2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
DCN 3x increased Line buffer size for DCHUB latency hiding, from 4 lines
of 4K resolution lines to 5 lines of 4K resolution lines. All Line
Buffer can be used as extended memory for P State change latency hiding.
The maximum number of lines is increased to 32 lines. Finally,
LB_MEMORY_CONFIG_1 (LB memory piece 1) and LB_MEMORY _CONFIG_2 (LB
memory piece 2) are not affected, no change in size, only 3 pieces is
affected, i.e., when all 3 pieces are used in both LB_MEMORY_CONFIG_0
and LB_MEMORY_CONFIG_3 (for 4:2:0) modes.
Reviewed-by: Jun Lei <jun.lei@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Nevenko Stupar <Nevenko.Stupar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: DCN2X Prefer ODM over bottom pipe to find second pipe
[WHY]
When finding a second pipe for pipe split, currently will look for
bottom pipe in context first to decide the second pipe. This causes
issues in 2 plane to 1 plane transitions like fullscreen video where
bottom pipe no longer exists in the new configuration.
[HOW]
If previous context had an ODM pipe, use that to find the secondary pipe
first before looking at bottom pipe.
[Why & How]
We're missing a default value for dram_channel_width_bytes in the
DCN3.1 SOC bounding box and we don't currently have the interface in
place to query the actual value from VBIOS.
Put in a hardcoded default until we have the interface in place.
Reviewed-by: Eric Yang <eric.yang2@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Query VCO frequency from register for DCN3.1
[Why]
Hardcoding the VCO frequency isn't correct since we don't own or control
the value.
In the case where the hardcode is also missing we can't lightup display.
[How]
Query from the CLK register instead. Update the DFS frequency to be able
to compute the VCO frequency.
Reviewed-by: Eric Yang <eric.yang2@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix max vstartup calculation for modes with borders
[Why]
Vertical and horizontal borders in timings are treated as increasing the
active area - vblank and hblank actually shrink.
Our input into DML does not include these borders so it incorrectly
assumes it has more time than available for vstartup and tmdl
calculations for some modes with borders.
An example of such a timing would be 640x480@72Hz:
Victor Lu [Thu, 24 Jun 2021 15:05:42 +0000 (11:05 -0400)]
drm/amd/display: Fix comparison error in dcn21 DML
[why]
A comparison error made it possible to not iterate through all the
specified prefetch modes.
[how]
Correct "<" to "<="
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Reviewed-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jake Wang [Wed, 30 Jun 2021 17:55:50 +0000 (13:55 -0400)]
drm/amd/display: Fixed hardware power down bypass during headless boot
[Why]
During headless boot, DIG may be on which causes HW/SW discrepancies.
To avoid this we power down hardware on boot if DIG is turned on. With
introduction of multiple eDP, hardware power down is being bypassed
under certain conditions.
[How]
Fixed hardware power down bypass, and ensured hardware will power down
if DIG is on and seamless boot is not enabled.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Jake Wang <haonan.wang2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhan Liu [Tue, 29 Jun 2021 01:20:18 +0000 (21:20 -0400)]
drm/amd/display: Reduce delay when sink device not able to ACK 00340h write
[Why]
Theoretically, per DP 1.4a spec, sink device needs to AUX_ACK 00340h
write. However, due to hardware limitation, some sink devices have no
00340h dpcd address at all. This results in sink side fails to reply
ACK, and consequently cause source side keep retrying DPCD write on DPCD
00340h. This results in significant delay when DPCD 00340h write is
triggered (e.g. at S3 resume).
[How]
Check whether sink device could ACK on DPCD 00340h write on boot. If
sink device fails to ACK, then remember that, so we won't write to DPCD
00340h later on.
There will be a drm.debug KMS level message to inform user once a 00340h
DPCD write is skipped on purpose.
Reviewed-by: Nikola Cornij <Nikola.Cornij@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Zhan Liu <zhan.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Tue, 29 Jun 2021 01:41:08 +0000 (21:41 -0400)]
drm/amd/display: add debug print for DCC validation failure
[Why&How]
Print a debug message when dcc validation fails in the display driver.
Most DCC enablement related errors are from userspace. Adding a debug
print in case of a failure from display driver will aid quicker triage.
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This tends to take miliseconds in certain scenarios and we'd rather not
wait that long. Due to how this interacts with det size update and
locking waiting should not be necessary as compbuf updates before
unlock.
Add a watch for config error instead as that is something we actually do
care about.
Eric Yang [Wed, 23 Jun 2021 19:48:02 +0000 (15:48 -0400)]
drm/amd/display: implement workaround for riommu related hang
[Why]
During S4/S5/reboot, sometimes riommu invalidation request arrive too
early, DCN may be unable to respond to the invalidation request
resulting in pstate hang.
[How]
VBIOS will force allow pstate for riommu invalidation and driver will
clear it after powering down display pipes.
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Eric Yang <Eric.Yang2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Josip Pavic [Mon, 21 Jun 2021 16:17:24 +0000 (12:17 -0400)]
drm/amd/display: log additional register state for debug
[Why & How]
Extend existing state collection functions to add some additional
registers useful for debug, and add state collection function for DC
hubbub
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Josip Pavic <Josip.Pavic@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Krunoslav Kovac [Tue, 22 Jun 2021 22:42:28 +0000 (18:42 -0400)]
drm/amd/display: Assume active upper layer owns the HW cursor
[why]
The current logic checks if there's an upper pipe whose viewport
completely covers the current pipe viewport.
This fails in pipe splitting case as you can have layer 1 pipe that
crosses the two layer 0 pipes where it's contained in both, but neither
covers it completely, hence we allow the cursor on both layers.
[How]
Instead of trying to "sum up" rectangles from the higher level pipes
which could leave gaps and would not work generically, we will assume if
there's an upper layer that is active, it will control the HW cursor.
Commit 72a7cf0aec0c ("drm/amd/display: Keep linebuffer pixel depth at
30bpp for DCE-11.0.") doesn't seems to have fixed 10bit 4K rendering over
DisplayPort for CIK GPUs. On my machine with a HAWAII GPU I get a broken
image that looks like it has an effective resolution of 1920x1080 but
scaled up in an irregular way. Reverting the commit or applying this
patch fixes the problem on v5.14-rc1.
Fixes: 72a7cf0aec0c ("drm/amd/display: Keep linebuffer pixel depth at 30bpp for DCE-11.0.") Acked-by: Mario Kleiner <mario.kleiner.de@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Liviu Dudau <liviu@dudau.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kevin Wang [Fri, 16 Jul 2021 18:03:08 +0000 (14:03 -0400)]
drm/amdgpu/ttm: optimize vram access in amdgpu_ttm_access_memory()
1. using vram aper to access vram if possible
2. avoid MM_INDEX/MM_DATA is not working when mmio protect feature is
enabled.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kevin Wang [Thu, 15 Jul 2021 07:51:28 +0000 (15:51 +0800)]
drm/amdgpu/ttm: replace duplicate code with exiting function
using exiting function to replace duplicate code blocks in
amdgpu_ttm_vram_write().
Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kevin Wang [Fri, 16 Jul 2021 17:57:49 +0000 (13:57 -0400)]
drm/amdgpu: split amdgpu_device_access_vram() into two small parts
split amdgpu_device_access_vram()
1. amdgpu_device_mm_access(): using MM_INDEX/MM_DATA to access vram
2. amdgpu_device_aper_access(): using vram aperature to access vram (option)
Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiaojian Du [Wed, 14 Jul 2021 07:07:22 +0000 (15:07 +0800)]
drm/amdgpu: update the golden setting for vangogh
This patch is to update the golden setting for vangogh.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: avoid printing ERROR for unknown CEA parse(v2)
For the unknown CEA parse case on DMUB-enabled ASICs, dmesg will
print an error message like below, this will be captured by
automation tools as it has the word like ERROR during boot up
and treated as a false error, as it does not break bootup process.
So use DRM_WARN printing for this.
[drm:amdgpu_dm_update_freesync_caps [amdgpu]] *ERROR* Unknown EDID CEA parser results
v2: Use DRM_WARN to print such info.
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Stylon Wang <stylon.wang@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhan Liu [Thu, 8 Jul 2021 04:51:52 +0000 (00:51 -0400)]
drm/amdgpu/display - only update eDP's backlight level when necessary
[Why]
The original logic is to update eDP's backlight level
on every amdgpu dm atomic commit, which causes excessive
DMUB write. As a result, when playing game or moving window
around, DMUB timeout and system lagging are observed.
[How]
We only need to update eDP's backlight level when current level
doesn't match requested level.
Signed-off-by: Zhan Liu <zhan.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 7 Jul 2021 16:42:34 +0000 (12:42 -0400)]
drm/amdkfd: handle fault counters on invalid address
prange is NULL if vm fault retry on invalid address, for this case, can
not use prange to get pdd, use adev to get gpuidx and then get pdd
instead, then increase pdd vm fault counter.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/pm: drop smu_v13_0_1.c|h files for yellow carp
Since there's nothing special in smu implementation for yellow carp,
it's better to reuse the common smu_v13_0 interfaces and drop the
specific smu_v13_0_1.c|h files.
v2: remove the duplicate register offset and shift mask header files as
well.
Signed-off-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dan Carpenter [Sat, 3 Jul 2021 09:46:20 +0000 (12:46 +0300)]
drm/amdgpu: return -EFAULT if copy_to_user() fails
If copy_to_user() fails then this should return -EFAULT instead of
-EINVAL.
Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dan Carpenter [Sat, 3 Jul 2021 09:45:39 +0000 (12:45 +0300)]
drm/amdgpu: unlock on error in amdgpu_ras_debugfs_table_read()
This error path needs to unlock before returning. While we're at it,
the correct error code from copy_to_user() failure is -EFAULT, not
-EINVAL.
Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dan Carpenter [Sat, 3 Jul 2021 09:44:57 +0000 (12:44 +0300)]
drm/amdgpu: Fix signedness bug in __amdgpu_eeprom_xfer()
The i2c_transfer() function returns negatives or else the number of
messages transferred. This code does not work because ARRAY_SIZE()
is type size_t and so that means negative values of "r" are type
promoted to high positive values which are greater than the ARRAY_SIZE().
Fix this by changing the < to != which works regardless of type
promotion.
Fixes: 746b584762e452 ("drm/amdgpu: Fixes to the AMDGPU EEPROM driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dan Carpenter [Sat, 3 Jul 2021 09:44:12 +0000 (12:44 +0300)]
drm/amdgpu: fix a signedness bug in __verify_ras_table_checksum()
If amdgpu_eeprom_read() returns a negative error code then the error
handling checks:
if (res < buf_size) {
The problem is that "buf_size" is a u32 so negative values are type
promoted to a high positive values and the condition is false. Fix
this by changing the type of "buf_size" to int.
Fixes: 63d4c081a556a1 ("drm/amdgpu: Optimize EEPROM RAS table I/O") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aric Cyr [Mon, 21 Jun 2021 16:01:45 +0000 (12:01 -0400)]
drm/amd/display: Round KHz up when calculating clock requests
[Why]
When requesting clocks from SMU which takes MHz inputs, DC will round
down KHz when converting to MHz, thus potentially requesting too low a
clock value.
[How]
Round up (ceil) when converting KHz to MHz for clock requests to SMU.
Signed-off-by: Aric Cyr <aric.cyr@amd.com> Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some displays are not lighting up when put in LTTPR Transparent Mode
Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix updating infoframe for DCN3.1 eDP
[Why]
We're only treating TMDS as a valid target for infoframe updates which
results in PSR being unable to transition from state 4 to state 5.
[How]
Also allow infoframe updates for DCN3.1 - following how we handle
this path for earlier ASIC as well.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Eric Yang <eric.yang2@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stylon Wang [Sat, 29 May 2021 06:19:20 +0000 (14:19 +0800)]
drm/amd/display: Add Freesync HDMI support to DM with DMUB
[Why]
Changes in DM needed to support Freesync HDMI on DMUB.
[How]
Change implementation to parse CEA blocks in case of DMUB-enabled ASICs.
Signed-off-by: Stylon Wang <stylon.wang@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wang [Tue, 15 Jun 2021 21:22:57 +0000 (17:22 -0400)]
drm/amd/display: Add null checks
Added NULL checks before two problematic statements
Signed-off-by: Wang <anguwang@amd.com> Reviewed-by: Martin Leung <Martin.Leung@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
dmub would notify x86 response time violation by GPINT_DATAOUT
[How]
1. Use GPINT_DATAOUT to trigger x86 interrupt
2. Register GPINT_DATAOUT interrupt handler.
3. Trigger ACR while GPINT_DATAOUT occurred.
Signed-off-by: Chun-Liang Chang <Chun-Liang.Chang@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wenjing Liu [Tue, 4 May 2021 20:39:08 +0000 (16:39 -0400)]
drm/amd/display: isolate link training setting override to its own function
There is a difference between our default behavior and override
behavior. For default behavior we need to decide link training settings
within specs' limitation and mandates.
For override behavior we do not need to follow all these requirements.
We are isolating override decision to its own function to maintain the
integrity of our specs compliant default behavior.
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Reviewed-by: George Shen <George.Shen@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In amdgpu_ras_query_error_count() return an error
if the device doesn't support RAS. This prevents
that function from having to always set the values
of the integer pointers (if set), and thus
prevents function side effects--always to have to
set values of integers if integer pointers set,
regardless of whether RAS is supported or
not--with this change this side effect is
mitigated.
Also, if no pointers are set, don't count, since
we've no way of reporting the counts.
Also, give this function a kernel-doc.
Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Reported-by: Tom Rix <trix@redhat.com> Fixes: a46751fbcde505 ("drm/amdgpu: Fix RAS function interface") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: The I2C IP doesn't support 0 writes/reads
The I2C IP doesn't support writes or reads of 0 bytes.
In order for a START/STOP transaction to take
place on the bus, the data written/read has to be
at least one byte.
That is, you cannot generate a write with 0 bytes,
just to get the ACK from a device, just so you can
probe that device if it is on the bus and so to
discover all devices on the bus--you'll have to
read at least one byte. Writes of 0 bytes generate
no START/STOP on this I2C IP--the bus is not
engaged at all.
Set the I2C_AQ_NO_ZERO_LEN to the existing I2C
quirk tables for Aldebaran, Arcturus, Navi10 and
Sienna Cichlid, and add a quirk table to the I2C
driver which drives the bus when the SMU
doesn't--for instance on Vega20.
YuBiao Wang [Tue, 29 Jun 2021 03:21:25 +0000 (11:21 +0800)]
drm/amdgpu: Read clock counter via MMIO to reduce delay (v5)
[Why]
GPU timing counters are read via KIQ under sriov, which will introduce
a delay.
[How]
It could be directly read by MMIO.
v2: Add additional check to prevent carryover issue.
v3: Only check for carryover for once to prevent performance issue.
v4: Add comments of the rough frequency where carryover happens.
v5: Remove mutex and gfxoff ctrl unused with current timing registers.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Acked-by: Horace Chen <horace.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.co> Reviewed-by: Monk Liu <monk.liu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Eric Huang [Wed, 30 Jun 2021 21:58:12 +0000 (17:58 -0400)]
drm/amdkfd: Only apply TLB flush optimization on ALdebaran
It is based on reverting two patches back.
drm/amdkfd: Make TLB flush conditional on mapping
drm/amdgpu: Add table_freed parameter to amdgpu_vm_bo_update
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nirmoy Das [Fri, 2 Jul 2021 09:10:52 +0000 (11:10 +0200)]
drm/amdgpu: separate out vm pasid assignment
Use new helper function amdgpu_vm_set_pasid() to
assign vm pasid value. This also ensures that we don't free
a pasid from vm code as pasids are allocated somewhere else.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nirmoy Das [Mon, 28 Jun 2021 21:29:39 +0000 (23:29 +0200)]
drm/amdgpu: use xarray for storing pasid in vm
Replace idr with xarray as we actually need hash functionality.
Cleanup code related to vm pasid by adding helper function.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jiri Kosina [Thu, 24 Jun 2021 11:11:36 +0000 (13:11 +0200)]
drm/amdgpu: Avoid printing of stack contents on firmware load error
In case when psp_init_asd_microcode() fails to load ASD microcode file,
psp_v12_0_init_microcode() tries to print the firmware filename that
failed to load before bailing out.
This is wrong because:
- the firmware filename it would want it print is an incorrect one as
psp_init_asd_microcode() and psp_v12_0_init_microcode() are loading
different filenames
- it tries to print fw_name, but that's not yet been initialized by that
time, so it prints random stack contents, e.g.
amdgpu 0000:04:00.0: Direct firmware load for amdgpu/renoir_asd.bin failed with error -2
amdgpu 0000:04:00.0: amdgpu: fail to initialize asd microcode
amdgpu 0000:04:00.0: amdgpu: psp v12.0: Failed to load firmware "\xfeTO\x8e\xff\xff"
Fix that by bailing out immediately, instead of priting the bogus error
message.
It is not true (as stated in the reverted commit changelog) that we never
unmap the BAR on failure; it actually does happen properly on
amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
amdgpu_device_fini() error path.
What's worse, this commit actually completely breaks resource freeing on
probe failure (like e.g. failure to load microcode), as
amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
early, leaving all the resources that'd normally be freed in
amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
to all sorts of oopses when someone tries to, for example, access the
sysfs and procfs resources which are still around while the driver is
gone.
Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") Reported-by: Vojtech Pavlik <vojtech@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lukas Bulwahn [Mon, 28 Jun 2021 10:53:34 +0000 (12:53 +0200)]
drm/amdgpu: rectify line endings in umc v8_7_0 IP headers
Commit 6b36fa6143f6 ("drm/amdgpu: add umc v8_7_0 IP headers") adds the new
file ./drivers/gpu/drm/amd/include/asic_reg/umc/umc_8_7_0_sh_mask.h with
DOS line endings, which is very uncommon for the kernel repository.
Rectify the line endings in this file with dos2unix.
Identified by a checkpatch evaluation on the whole kernel repository and
spot-checking for really unexpected checkpatch rule violations.
Reported-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Luben Tuikov [Fri, 11 Jun 2021 06:15:49 +0000 (02:15 -0400)]
drm/amdgpu: Correctly disable the I2C IP block
On long transfers to the EEPROM device,
i.e. write, it is observed that the driver aborts
the transfer.
The reason for this is that the driver isn't
patient enough--the IC_STATUS register's contents
is 0x27, which is MST_ACTIVITY | TFE | TFNF |
ACTIVITY. That is, while the transmission FIFO is
empty, we, the I2C master device, are still
driving the bus.
Implement the correct procedure to disable
the block, as described in the DesignWare I2C
Databook, section 3.8.3 Disabling DW_apb_i2c on
page 56. Now there are no premature aborts on long
data transfers.
Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Debugfs RAS EEPROM files are available when
the ASIC supports RAS, and when the debugfs is
enabled, an also when "ras_enable" module
parameter is set to 0. However in this case,
we get a kernel oops when accessing some of
the "ras_..." controls in debugfs. The reason
for this is that struct amdgpu_ras::adev is
unset. This commit sets it, thus enabling access
to those facilities. Note that this facilitates
EEPROM access and not necessarily RAS features or
functionality.
Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 30 Jun 2021 15:19:14 +0000 (11:19 -0400)]
drm/amdgpu: fix 64 bit divide in eeprom code
pos is 64 bits.
Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs") Cc: luben.tuikov@amd.com Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add "ras_eeprom_size" file in debugfs, which
reports the maximum size allocated to the RAS
table in EEROM, as the number of bytes and the
number of records it could store. For instance,
$cat /sys/kernel/debug/dri/0/ras/ras_eeprom_size
262144 bytes or 10921 records
$_
Add "ras_eeprom_table" file in debugfs, which
dumps the RAS table stored EEPROM, in a formatted
way. For instance,
Split functionality between read and write, which
simplifies the code and exposes areas of
optimization and more or less complexity, and take
advantage of that.
Read and write the table in one go; use a separate
stage to decode or encode the data, as opposed to
on the fly, which keeps the I2C bus busy. Use a
single read/write to read/write the table or at
most two if the number of records we're
reading/writing wraps around.
Check the check-sum of a table in EEPROM on init.
Update the checksum at the same time as when
updating the table header signature, when the
threshold was increased on boot.
Take advantage of arithmetic modulo 256, that is,
use a byte!, to greatly simplify checksum
arithmetic.
Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>