fred gao [Fri, 18 Aug 2017 07:41:10 +0000 (15:41 +0800)]
drm/i915/gvt: Refine error handling in dispatch_workload
When an error occurs in dispatch_workload, this patch is to do the
proper cleanup and rollback to the original states before the workload
is abandoned.
v2:
- split the mixed several error paths for better review. (Zhenyu)
v3:
- original PTR_ERR(cs) is good and code cleanup. (Zhenyu)
v4:
- reuse the existing i915_add_request for error handling. (Zhenyu)
v5:
- remove the duplicate error handling release_shadow_wa_ctx and
move the engine->context_unpin upper. (Zhenyu)
v6:
- keep the old label "out". (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
fred gao [Fri, 18 Aug 2017 07:41:07 +0000 (15:41 +0800)]
drm/i915/gvt: Add error handling for intel_gvt_scan_and_shadow_workload
When an error occurs after shadow_indirect_ctx, this patch is to do the
proper cleanup and rollback to the original states for shadowed indirect
context before the workload is abandoned.
v2:
- split the mixed several error paths for better review. (Zhenyu)
v3:
- no return check for clean up functions. (Changbin)
v4:
- expose and reuse the existing release_shadow_wa_ctx. (Zhenyu)
v5:
- move the release function to scheduler.c file. (Zhenyu)
v6:
- move error handling code of intel_gvt_scan_and_shadow_workload
to here. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
fred gao [Fri, 18 Aug 2017 07:41:06 +0000 (15:41 +0800)]
drm/i915/gvt: Separate cmd scan from request allocation
Currently i915 request structure and shadow ring buffer are allocated
before command scan, so it will have to restore to previous states once
any error happens afterwards in the long dispatch_workload path.
This patch is to introduce a reserved ring buffer created at the beginning
of vGPU initialization. Workload will be coped to this reserved buffer and
be scanned first, the i915 request and shadow ring buffer are only
allocated after the result of scan is successful.
To balance the memory usage and buffer alloc time, the coming bigger ring
buffer will be reallocated and kept until more bigger buffer is coming.
v2:
- use kmalloc for the smaller ring buffer, realloc if required. (Zhenyu)
v3:
- remove the dynamically allocated ring buffer. (Zhenyu)
Changbin Du [Tue, 15 Aug 2017 05:14:04 +0000 (13:14 +0800)]
drm/i915/gvt: Add emulation for BAR2 (aperture) with normal file RW approach
For vfio-pci, if the region support MMAP then it should support both
mmap and normal file access. The user-space is free to choose which is
being used. For qemu, we just need add 'x-no-mmap=on' for vfio-pci
option.
Currently GVTg only support MMAP for BAR2. So GVTg will not work when
user turn on x-no-mmap option.
This patch added file style access for BAR2, aka the GPU aperture. We
map the entire aperture partition of active vGPU to kernel space when
guest driver try to enable PCI Memory Space. Then we redirect the file
RW operation from kvmgt to this mapped area.
Changbin Du [Tue, 15 Aug 2017 05:20:51 +0000 (13:20 +0800)]
drm/i915/kvmgt: Sanitize PCI bar emulation
For PCI, 64bit bar consumes two BAR registers, but this doesn't mean
both of two BAR are valid. Actually the second BAR is regarded as
reserved in this case. So we shouldn't emulate the second BAR.
Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Chris Wilson [Wed, 6 Sep 2017 13:52:20 +0000 (14:52 +0100)]
drm/i915: Lift has-pinned-pages assert to caller of ____i915_gem_object_get_pages
i915_gem_object_attach_phys() is trying to swap out its shmemfs pages
for a new set of physically contiguous pages, but unfortunately triggers
an assert inside get-pages.
drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk
Skip compressing 1 segment at the end of the frame,
avoid a pixel count mismatch nuke event when last active
pixel and dummy pixel has same color for Odd Plane
Width / Height.
For both platforms Gemini Lake and Cannon Lake.
v2: Use function-like macro and also use mask to clean
to make sure bit 11 is 0. (Suggested by Paulo).
v3: Add Display WA notation and also apply for GLK.
Both Forgotten on v2.
Using "GLK_" prefix since GLK came before CNL.
v4: Forgot to "|=" when moving directly macro to masked
val. (Noticed by Paulo.)
v5: Rebased on top of 0a46ddd57c9e ("drm/i915/cnp: Wa 1181:
Fix Backlight issue")
Chris Wilson [Wed, 6 Sep 2017 10:56:53 +0000 (11:56 +0100)]
drm/i915: Move device_info.has_snoop into the static tables
Currently we define any !llc machine as using snoop instead. However,
some platforms run into trouble using snoop that we would like to
disable, and to do so easily we want to be able to use the static
device_info tables.
v2: Leave the old snoop = !llc as a warning for the time being to check
that all stanzas are filled as either llc or snoop.
Chris Wilson [Wed, 6 Sep 2017 15:28:59 +0000 (16:28 +0100)]
drm/i915: Disable MI_STORE_DATA_IMM for i915g/i915gm
The early gen3 machines (i915g/Grantsdale and i915gm/Alviso) share a lot
of characteristics in their MI/GTT blocks with gen2, and in particular
can only use physical addresses in MI_STORE_DATA_IMM. This makes it
incompatible with our usage, so include those two machines in the
blacklist to prevent usage.
v2: Make it easy for gcc and rewrite it as a switch to save some space.
Chris Wilson [Wed, 6 Sep 2017 11:14:05 +0000 (12:14 +0100)]
drm/i915: Re-enable GTT following a device reset
Ville Syrjälä spotted that PGETBL_CTL was losing its enable bit upon a
reset. That was causing the display to show garbage on his 945gm. On my
i915gm the effect was far more severe; re-enabling the display following
the reset without PGETBL_CTL being enabled lead to an immediate hard
hang.
We do have a routine to re-enable PGETBL_CTL which is applicable to
gen2-4, although on gen4 it is documented that a graphics reset doesn't
alter the register (no such wording is given for gen3) and should be safe
to call to punch back in the enable bit. However, that leaves the question
of whether we need to completely re-initialise the register and the
rest of the GSM. For g33/pnv/gen4+, where we do have a configurable
page table, its contents do seem to be kept, and so we should be able to
recover without having to reinitialise the GTT from scratch (as prior to
g33, that register is configured by the BIOS and we leave alone except
for the enable bit).
This appears to have been broken by commit 5fbd0418eef2 ("drm/i915:
Re-enable GGTT earlier during resume on pre-gen6 platforms"), which
moved the intel_enable_gtt() from i915_gem_init_hw() (also used by
reset) to add it earlier during hw init and resume, missing the reset
path.
v2: Find the culprit, rearrange ggtt_enable to be before gem_init_hw to
match init/resume
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: 5fbd0418eef2 ("drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101852 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170906111405.27110-1-chris@chris-wilson.co.uk Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 1 Sep 2017 16:54:34 +0000 (19:54 +0300)]
drm/i915: Annotate user relocs with __user
Add the missing __user to the urelocs cast to fix the following sparse
warning:
i915_gem_execbuffer.c:1541:47: warning: cast removes address space of expression
i915_gem_execbuffer.c:1541:62: warning: incorrect type in argument 2 (different address spaces)
i915_gem_execbuffer.c:1541:62: expected void const [noderef] <asn:1>*from
i915_gem_execbuffer.c:1541:62: got char *
Ville Syrjälä [Fri, 1 Sep 2017 17:12:52 +0000 (20:12 +0300)]
drm/i915: io unmap functions want __iomem
Don't cast away the __iomem from the io_mapping functions so that
sparse won't be so unhappy when we pass the pointer to the unmap
functions. Instead let's move the cast to where we actually use the
pointer.
Fixes the following sparse warnings:
i915_gem.c:1022:33: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1022:33: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1022:33: got void *[assigned] vaddr
i915_gem.c:1027:34: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1027:34: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1027:34: got void *[assigned] vaddr
i915_gem.c:1199:33: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1199:33: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1199:33: got void *[assigned] vaddr
i915_gem.c:1204:34: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1204:34: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1204:34: got void *[assigned] vaddr
Ville Syrjälä [Fri, 1 Sep 2017 19:54:56 +0000 (22:54 +0300)]
drm/i915: Wake up the device for the fbdev setup
Our fbdev setup requires the device to be awake for access
through the GTT. If one boots without connected displays and
later plugs one in, we won't have any runtime PM references when
the fbdev setup runs. Explicitly grab a runtime PM reference during
the fbdev setup to avoid the following spew:
Changbin Du [Mon, 4 Sep 2017 08:01:01 +0000 (16:01 +0800)]
drm/i915: Add interface to reserve fence registers for vGPU
In the past, vGPU alloc fence registers by walking through mm.fence_list
to find fence which pin_count = 0 and vma is empty. vGPU may not find
enough fence registers this way. Because a fence can be bind to vma even
though it is not in using. We have found such failure many times these
days.
An option to resolve this issue is that we can force-remove fence from
vma in this case.
This patch added two new api to the fence management code:
- i915_reserve_fence() will try to find a free fence from fence_list
and force-remove vma if need.
- i915_unreserve_fence() reclaim a reserved fence after vGPU has
finished.
With this change, the fence management is more clear to work with vGPU.
GVTg do not need remove fence from fence_list in private.
v3: (Chris)
- Add struct_mutex lock assertion.
- Only count for unpinned fence.
v2: (Chris)
- Rename the new api for symmetry.
- Add safeguard to ensure at least 1 fence remained for host display.
The header comment in include/trace/define_trace.h specifies that the
TRACE_INCLUDE_PATH needs to be relative to the define_trace.h header
rather than the trace file including it. Most instances get that wrong
and work around it by adding the $(src) directory to the include path.
While this works, it is preferable to refer to the correct path to the
trace file in the first place and avoid any workaround.
Ville Syrjälä [Fri, 1 Sep 2017 14:31:23 +0000 (17:31 +0300)]
drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
Use enum pipe for PCH transcoders also in the FIFO underrun code.
Fixes the following new sparse warnings:
intel_fifo_underrun.c:340:49: warning: mixing different enum types
intel_fifo_underrun.c:340:49: int enum pipe versus
intel_fifo_underrun.c:340:49: int enum transcoder
intel_fifo_underrun.c:344:49: warning: mixing different enum types
intel_fifo_underrun.c:344:49: int enum pipe versus
intel_fifo_underrun.c:344:49: int enum transcoder
intel_fifo_underrun.c:397:57: warning: mixing different enum types
intel_fifo_underrun.c:397:57: int enum pipe versus
intel_fifo_underrun.c:397:57: int enum transcoder
intel_fifo_underrun.c:398:17: warning: mixing different enum types
intel_fifo_underrun.c:398:17: int enum pipe versus
intel_fifo_underrun.c:398:17: int enum transcoder
Ville Syrjälä [Fri, 1 Sep 2017 14:31:22 +0000 (17:31 +0300)]
drm/i915: Make i2c lock ops static
Make gmbus_lock_ops and proxy_lock_ops static to appease sparse
intel_i2c.c:652:34: warning: symbol 'gmbus_lock_ops' was not declared. Should it be static?
intel_sdvo.c:2981:34: warning: symbol 'proxy_lock_ops' was not declared. Should it be static?
Ville Syrjälä [Fri, 1 Sep 2017 14:31:21 +0000 (17:31 +0300)]
drm/i915: Make i9xx_load_ycbcr_conversion_matrix() static
Make i9xx_load_ycbcr_conversion_matrix() static to appease sparse:
intel_color.c:110:6: warning: symbol 'i9xx_load_ycbcr_conversion_matrix' was not declared. Should it be static?
Ville Syrjälä [Wed, 23 Aug 2017 15:22:23 +0000 (18:22 +0300)]
drm/i915: Pass proper old/new states to intel_plane_atomic_check_with_state()
Eliminate plane->state and crtc->state usage from
intel_plane_atomic_check_with_state() and its callers. Instead pass the
proper states in or dig them up from the top level atomic state.
Note that intel_plane_atomic_check_with_state() itself isn't allowed to
use the top level atomic state as there is none when it gets called from
the legacy cursor short circuit path.
v2: Rename some variables for easier comprehension (Maarten)
Up to Coffeelake we could deduce this GT number from the device ID.
This doesn't seem to be the case anymore. This change reorders pciids
per GT and adds a gt field to intel_device_info. We set this field on
the following platforms :
Manasi Navare [Tue, 15 Aug 2017 18:59:51 +0000 (11:59 -0700)]
drm/i915/edp: Increase T12 panel delay to 900 ms to fix DP AUX CH timeouts
This patch fixes the DP AUX CH timeouts observed during CI runs causing
CI Failures on a specific PCI device. This issue was fixed previously
by adding a quirk but looks like we need to increase this delay even more
in order to get rid all the DP AUX CH timeouts.
Fixes: c99a259b4b5192ba ("drm/i915/edp: Add a T12 panel delay quirk to fix
DP AUX CH timeouts")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101144 Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Cc: Clinton Taylor <clinton.a.taylor@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1502823591-25310-1-git-send-email-manasi.d.navare@intel.com
Ville Syrjälä [Mon, 10 Jul 2017 19:33:47 +0000 (22:33 +0300)]
drm/i915: Consolidate max_cdclk_freq check in intel_crtc_compute_min_cdclk()
Currently the .modeset_calc_cdclk() hooks check the final cdclk value
against the max allowed. That's not really sufficient since the low
level calc_cdclk() functions effectively clamp the minimum required
cdclk to the max supported by the platform. Hence if the minimum
required exceeds the platforms capabilities we'd keep going anyway
using the max cdclk frequency.
To fix that let's move the check earlier into
intel_crtc_compute_min_cdclk() and we'll check the minimum required
cdclk of the pipe against the maximum supported by the platform.
Ville Syrjälä [Wed, 30 Aug 2017 18:57:03 +0000 (21:57 +0300)]
drm/i915: Track minimum acceptable cdclk instead of "minimum dotclock"
Make the min_pixclk thing less confusing by changing it to track
the minimum acceptable cdclk frequency instead. This means moving
the application of the guardbands to a slightly higher level from
the low level platform specific calc_cdclk() functions.
The immediate benefit is elimination of the confusing 2x factors
on GLK/CNL+ in the audio workarounds (which stems from the fact
that the pipes produce two pixels per clock).
v2: Keep cdclk higher on CNL to workaround missing DDI clock voltage handling
v3: Squash with the CNL cdclk limits patch (DK)
v4: s/intel_min_cdclk/intel_pixel_rate_to_cdclk/ (DK)
Rodrigo Vivi [Thu, 31 Aug 2017 14:53:56 +0000 (07:53 -0700)]
drm/i915/cnl: Fix DP max voltage
On clock recovery this function is called to find out
the max voltage swing level that we could go.
However gen 9 functions use the old buffer translation tables
to figure that out. That table is not valid for CNL
causing an invalid number of entries and an invalid selection
on the max voltage swing level.
v2: Let's use same approach that previous platforms.
v3: Actually use n_entries and avoid duplicated -1.
v4: Avoid cnl_max_level and use current style.
Rodrigo Vivi [Tue, 29 Aug 2017 23:22:28 +0000 (16:22 -0700)]
drm/i915/cnl: Move ddi buf trans related functions up.
No functional changes. But those functions will be needed
to get max level for HDMI and DP, so let's move those
up closer to other similar functions existent for previous
platforms.
Rodrigo Vivi [Tue, 29 Aug 2017 23:22:27 +0000 (16:22 -0700)]
drm/i915/cnl: Move voltage check into ddi buf trans functions.
Let's start converging CNL buf translations to same style
used on previous platforms. So first thing is to use the
standard signature so we don't need to propagate the voltage
check into other parts of the code, but only on the parts
that it is really useful.
Rodrigo Vivi [Tue, 29 Aug 2017 23:22:26 +0000 (16:22 -0700)]
drm/i915: Enable voltage swing before enabling DDI_BUF_CTL.
Sequences for DisplayPort asks us to
" Configure voltage swing and related IO settings.
Refer to DDI Buffer section."
before "Configure and enable DDI_BUF_CTL"
On BXT and CNL this means to execute the ddi vswing sequences.
At this point these sequences calls are getting duplicated for DP
because they are all called from DP link trainning sequences.
However this patch is not yet removing it before a futher discussion
since spec also allows that during link training without disabling
anything:
"
Notes
Changing voltage swing during link training:
Change the swing setting following the DDI Buffer section.
The port does not need to be disabled.
"
Rodrigo Vivi [Tue, 29 Aug 2017 23:22:25 +0000 (16:22 -0700)]
drm/i915: Align vswing sequences with old ddi buffer registers.
Vswing sequences on BXT and CNL are equivalent
to the ddi buffer registers setting on other platforms.
For some reason it got aligned with skl_ddi_set_iboost what
is semantically incorrect. This forced us to keep skipping
ddi buffer translation tables on the platforms that has
the vswing sequences.
v2: Don't mess with DP signal levels on this patch.
Cc: Vandana Kannan <vandana.kannan@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170829232230.23051-3-rodrigo.vivi@intel.com
Rodrigo Vivi [Tue, 29 Aug 2017 23:09:07 +0000 (16:09 -0700)]
drm/i915/cnl: Avoid ioremap_wc on Cannonlake as well.
Driver’s CPU access to GTT is via the GTTMMADR BAR.
The current HW implementation of that BAR is to only
support <= DW (and maybe QW) writes—not 16/32/64B writes
that could occur with WC and/or SSE/AVX moves.
GTTMMADR must be marked uncacheable (UC).
Accesses to GTTMMADR(GTT), must be 64 bits or less (ie. 1 GTT entry).
v2: Get clarification on the reasons and spec is getting
updated to reflect it now.
Ville Syrjälä [Thu, 24 Aug 2017 19:10:50 +0000 (22:10 +0300)]
drm/i915: Skip fence alignemnt check for the CCS plane
The CCS won't have the same stride as the main surface anyway so trying
to guard against the fence stride not matching the CCS stride is
not sensible. Just skip the fence vs. fb alignment check for the aux
plane.
Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Daniel Stone <daniels@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170824191100.10949-3-ville.syrjala@linux.intel.com Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Fixes: 2e2adb05736c ("drm/i915: Add render decompression support") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Thu, 24 Aug 2017 19:10:49 +0000 (22:10 +0300)]
drm/i915: Treat fb->offsets[] as a raw byte offset instead of a linear offset
Userspace wants to treat fb->offsets[] as raw byte offsets into the gem
bo. Adjust the kernel code to match.
Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Daniel Stone <daniels@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170824191100.10949-2-ville.syrjala@linux.intel.com Acked-by: Ben Widawsky <ben@bwidawsk.net> Fixes: 2e2adb05736c ("drm/i915: Add render decompression support") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Chris Wilson [Tue, 29 Aug 2017 19:25:46 +0000 (20:25 +0100)]
drm/i915: Always wake the device to flush the GTT
Since we hold the device wakeref when writing through the GTT (otherwise
the writes would fail), we presumed that before the device sleeps those
writes would naturally be flushed and that we wouldn't need our mmio
read trick. However, that presumption seems false and a sleepy bxt seems
to require us to always manually flush the GTT writes prior to direct
access.
Chris Wilson [Sat, 26 Aug 2017 11:09:35 +0000 (12:09 +0100)]
drm/i915: Discard the request queue if we fail to sleep before suspend
If we fail to clear the outstanding request queue before suspending,
mark those requests as lost.
References: https://bugs.freedesktop.org/show_bug.cgi?id=102037 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-3-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Chris Wilson [Sat, 26 Aug 2017 11:09:34 +0000 (12:09 +0100)]
drm/i915: Clear wedged status upon resume
When we wake up from suspend, the device has been powered down and
should come back afresh. We should be able to safely remove the wedged
status from the previous session and start afresh.
Chris Wilson [Sat, 26 Aug 2017 11:09:33 +0000 (12:09 +0100)]
drm/i915: Always sanity check engine state upon idling
When we do a locked idle we know that afterwards all requests have been
completed and the engines have been cleared of tasks. For whatever
reason, this doesn't always happen and we may go into a suspend with
ELSP still full, and this causes an issue upon resume as we get very,
very confused.
If the engines refuse to idle, mark the device as wedged. In the process
we get rid of the maybe unused open-coded version of wait_for_engines
reported by Nick Desaulniers and Matthias Kaehlcke.
v2: Suppress the -EIO before suspend, but keep it for seqno wrap.
References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
References: https://bugs.freedesktop.org/show_bug.cgi?id=102456 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthias Kaehlcke <mka@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Chris Wilson [Sat, 26 Aug 2017 13:56:20 +0000 (14:56 +0100)]
drm/i915: Don't use GPU relocations prior to cmdparser stalls
If we are using the cmdparser, we will have to copy the batch and so
stall for the relocations. Rather than prolong that stall by adding more
relocation requests, just use CPU relocations and do the stall upfront.
Chris Wilson [Mon, 28 Aug 2017 10:46:31 +0000 (11:46 +0100)]
drm/i915: Recreate vmapping even when the object is pinned
Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.
This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).
We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.
v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.
Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Dave Airlie [Tue, 29 Aug 2017 00:38:14 +0000 (10:38 +1000)]
Merge branch 'drm-vmwgfx-next' of git://people.freedesktop.org/~syeh/repos_linux into drm-next
vmwgfx add fence fd support.
* 'drm-vmwgfx-next' of git://people.freedesktop.org/~syeh/repos_linux:
drm/vmwgfx: Bump the version for fence FD support
drm/vmwgfx: Add export fence to file descriptor support
drm/vmwgfx: Add support for imported Fence File Descriptor
drm/vmwgfx: Prepare to support fence fd
drm/vmwgfx: Fix incorrect command header offset at restart
drm/vmwgfx: Support the NOP_ERROR command
drm/vmwgfx: Restart command buffers after errors
drm/vmwgfx: Move irq bottom half processing to threads
drm/vmwgfx: Don't use drm_irq_[un]install
Dave Airlie [Tue, 29 Aug 2017 00:37:36 +0000 (10:37 +1000)]
Merge tag 'exynos-drm-next-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next
Summary:
- Provide NV12MT pixel format support of Mixer driver in generic way.
- Refactor Exynos KMS drivers
. Refactoring to panel detection way
. Refactoring to setting up possible_crtcs
. Refactoring to video and command mode support
- Some cleanups
* tag 'exynos-drm-next-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: simplify set_pixfmt() in DECON and FIMD drivers
drm/exynos: consistent use of cpp
drm/exynos: mixer: remove src offset from mixer_graph_buffer()
drm/exynos: mixer: simplify mixer_graph_buffer()
drm/exynos: mixer: simplify vp_video_buffer()
drm/exynos: mixer: enable NV12MT support for the video plane
drm/exynos: mixer: fix chroma comment in vp_video_buffer()
arm64: dts: exynos: remove i80-if-timings nodes
dt-bindings: exynos5433-decon: remove i80-if-timings property
drm/exynos/decon5433: use mode info stored in CRTC to detect i80 mode
drm/exynos: add mode_valid callback to exynos_drm
drm/exynos/decon5433: refactor irq requesting code
drm/exynos/mic: use mode info stored in CRTC to detect i80 mode
drm/exynos/dsi: propagate info about command mode from panel
drm/exynos/dsi: refactor panel detection logic
drm/exynos: use helper to set possible crtcs
drm/exynos/decon5433: use readl_poll_timeout helpers
Dave Airlie [Tue, 29 Aug 2017 00:36:06 +0000 (10:36 +1000)]
Merge tag 'drm-misc-next-fixes-2017-08-28' of git://anongit.freedesktop.org/git/drm-misc into drm-next
UAPI Changes:
- Rename u32 to __u32 in struct drm_format_modifier_blob (Lionel)
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* tag 'drm-misc-next-fixes-2017-08-28' of git://anongit.freedesktop.org/git/drm-misc:
drm: rename u32 in __u32 in uapi
Jason Ekstrand [Mon, 28 Aug 2017 21:10:28 +0000 (14:10 -0700)]
drm/syncobj: Add a signal ioctl (v3)
This IOCTL provides a mechanism for userspace to trigger a sync object
directly. There are other ways that userspace can trigger a syncobj
such as submitting a dummy batch somewhere or hanging on to a triggered
sync_file and doing an import. This just provides an easy way to
manually trigger the sync object without weird hacks.
The motivation for this IOCTL is Vulkan fences. Vulkan lets you create
a fence already in the signaled state so that you can wait on it
immediatly without stalling. We could also handle this with a new
create flag to ask the driver to create a syncobj that is already
signaled but the IOCTL seemed a bit cleaner and more generic.
v2:
- Take an array of sync objects (Dave Airlie)
v3:
- Throw -EINVAL if pad != 0
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Mon, 28 Aug 2017 21:10:27 +0000 (14:10 -0700)]
drm/syncobj: Add a reset ioctl (v3)
This just resets the dma_fence to NULL so it looks like it's never been
signaled. This will be useful once we add the new wait API for allowing
wait on "submit and signal" behavior.
v2:
- Take an array of sync objects (Dave Airlie)
v3:
- Throw -EINVAL if pad != 0
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian König <christian.koenig@amd.com> (v1) Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 25 Aug 2017 17:52:26 +0000 (10:52 -0700)]
drm/syncobj: Add a syncobj_array_find helper
The wait ioctl has a bunch of code to read an syncobj handle array from
userspace and turn it into an array of syncobj pointers. We're about to
add two new IOCTLs which will need to work with arrays of syncobj
handles so let's make some helpers.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 25 Aug 2017 17:52:24 +0000 (10:52 -0700)]
drm/syncobj: Allow wait for submit and signal behavior (v5)
Vulkan VkFence semantics require that the application be able to perform
a CPU wait on work which may not yet have been submitted. This is
perfectly safe because the CPU wait has a timeout which will get
triggered eventually if no work is ever submitted. This behavior is
advantageous for multi-threaded workloads because, so long as all of the
threads agree on what fences to use up-front, you don't have the extra
cross-thread synchronization cost of thread A telling thread B that it
has submitted its dependent work and thread B is now free to wait.
Within a single process, this can be implemented in the userspace driver
by doing exactly the same kind of tracking the app would have to do
using posix condition variables or similar. However, in order for this
to work cross-process (as is required by VK_KHR_external_fence), we need
to handle this in the kernel.
This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which
instructs the IOCTL to wait for the syncobj to have a non-null fence and
then wait on the fence. Combined with DRM_IOCTL_SYNCOBJ_RESET, you can
easily get the Vulkan behavior.
v2:
- Fix a bug in the invalid syncobj error path
- Unify the wait-all and wait-any cases
v3:
- Unify the timeout == 0 case a bit with the timeout > 0 case
- Use wait_event_interruptible_timeout
v4:
- Use proxy fence
v5:
- Revert to a combination of v2 and v3
- Don't use proxy fences
- Don't use wait_event_interruptible_timeout because it just adds an
extra layer of callbacks
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: Dave Airlie <airlied@redhat.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 25 Aug 2017 17:52:25 +0000 (10:52 -0700)]
drm/syncobj: Add a CREATE_SIGNALED flag
This requests that the driver create the sync object such that it
already has a signaled dma_fence attached. Because we don't need
anything in particular (just something signaled), we use a dummy null
fence. This is useful for Vulkan which has a similar flag that can be
passed to vkCreateFence.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Mon, 28 Aug 2017 14:39:25 +0000 (07:39 -0700)]
drm/syncobj: Add a callback mechanism for replace_fence (v3)
It is useful in certain circumstances to know when the fence is replaced
in a syncobj. Specifically, it may be useful to know when the fence
goes from NULL to something valid. This does make syncobj_replace_fence
a little more expensive because it has to take a lock but, in the common
case where there is no callback list, it spends a very short amount of
time inside the lock.
v2:
- Don't lock in drm_syncobj_fence_get. We only really need to lock
around fence_replace to make the callback work.
v3:
- Fix the cb_list comment to make kbuild happy
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 25 Aug 2017 17:52:22 +0000 (10:52 -0700)]
drm/syncobj: add sync obj wait interface. (v8)
This interface will allow sync object to be used to back
Vulkan fences. This API is pretty much the vulkan fence waiting
API, and I've ported the code from amdgpu.
v2: accept relative timeout, pass remaining time back
to userspace.
v3: return to absolute timeouts.
v4: absolute zero = poll,
rewrite any/all code to have same operation for arrays
return -EINVAL for 0 fences.
v4.1: fixup fences allocation check, use u64_to_user_ptr
v5: move to sec/nsec, and use timespec64 for calcs.
v6: use -ETIME and drop the out status flag. (-ETIME
is suggested by ickle, I can feel a shed painting)
v7: talked to Daniel/Arnd, use ktime and ns everywhere.
v8: be more careful in the timeout calculations
use uint32_t for counter variables so we don't overflow
graciously handle -ENOINT being returned from dma_fence_wait_timeout
Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 25 Aug 2017 17:52:20 +0000 (10:52 -0700)]
drm/syncobj: Add a race-free drm_syncobj_fence_get helper (v2)
The atomic exchange operation in drm_syncobj_replace_fence is sufficient
for the case where it races with itself. However, if you have a race
between a replace_fence and dma_fence_get(syncobj->fence), you may end
up with the entire replace_fence happening between the point in time
where the one thread gets the syncobj->fence pointer and when it calls
dma_fence_get() on it. If this happens, then the reference may be
dropped before we get a chance to get a new one. The new helper uses
dma_fence_get_rcu_safe to get rid of the race.
This is also needed because it allows us to do a bit more than just get
a reference in drm_syncobj_fence_get should we wish to do so.
v2:
- RCU isn't that scary
- Call rcu_read_lock/unlock
- Don't rename fence to _fence
- Make the helper static inline
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Christian König <christian.koenig@amd.com> (v1) Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 25 Aug 2017 17:52:19 +0000 (10:52 -0700)]
drm/syncobj: Rename fence_get to find_fence
The function has far more in common with drm_syncobj_find than with
any in the get/put functions.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Christian König <christian.koenig@amd.com> (v1) Signed-off-by: Dave Airlie <airlied@redhat.com>
John Stultz [Tue, 22 Aug 2017 18:42:26 +0000 (11:42 -0700)]
drm: kirin: Add mode_valid logic to avoid mode clocks we can't generate
Currently the hikey dsi logic cannot generate accurate byte
clocks values for all pixel clock values. Thus if a mode clock
is selected that cannot match the calculated byte clock, the
device will boot with a blank screen.
This patch uses the new mode_valid callback (many thanks to
Jose Abreu for upstreaming it!) to ensure we don't select
modes we cannot generate.
Also, since the ade crtc code will adjust the mode in mode_set,
this patch also adds a mode_fixup callback which we use to make
sure we are validating the mode clock that will eventually be
used.
Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Sean Paul <seanpaul@chromium.org> Cc: David Airlie <airlied@linux.ie> Cc: Rob Clark <robdclark@gmail.com> Cc: Xinliang Liu <xinliang.liu@linaro.org> Cc: Xinliang Liu <z.liuxinliang@hisilicon.com> Cc: Rongrong Zou <zourongrong@gmail.com> Cc: Xinwei Kong <kong.kongxinwei@hisilicon.com> Cc: Chen Feng <puck.chen@hisilicon.com> Cc: Jose Abreu <Jose.Abreu@synopsys.com> Cc: Archit Taneja <architt@codeaurora.org> Cc: dri-devel@lists.freedesktop.org Reviewed-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Xinliang Liu <xinliang.liu@linaro.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
drm/vmwgfx: Add export fence to file descriptor support
Added code to link a fence to a out_fence_fd file descriptor and
thread out_fence_fd down to vmw_execbuf_copy_fence_user() so it can be
copied into the IOCTL reply and be passed back up the the user.
v2:
Make sure to sync and clean up in case of failure
Thomas Hellstrom [Thu, 24 Aug 2017 06:06:30 +0000 (08:06 +0200)]
drm/vmwgfx: Support the NOP_ERROR command
Can be used by user-space applications to test and verify the kernel
command buffer error recovery functionality.
Malicious user-space apps could potentially use this command to slow down
graphics processing somewhat, but they could also accomplish the same thing
using a random malformed command so this should be considered safe.
At least as safe as it gets.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Thomas Hellstrom [Thu, 24 Aug 2017 06:06:29 +0000 (08:06 +0200)]
drm/vmwgfx: Restart command buffers after errors
Previously we skipped the command buffer and added an extra fence to
avoid hangs due to skipped fence commands.
Now we instead restart the command buffer after the failing command,
if there are any commands left.
In addition we print out some information about the failing command
and its location in the command buffer.
Testing Done: ran glxgears using mesa modified to send the NOP_ERROR
command before each 10th clear and verified that we detected the device
error properly and that there were no other device errors caused by
incorrectly ordered command buffers. Also ran the piglit "quick" test
suite which generates a couple of device errors and verified that
they were handled as intended.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Thomas Hellstrom [Thu, 24 Aug 2017 06:06:28 +0000 (08:06 +0200)]
drm/vmwgfx: Move irq bottom half processing to threads
This gets rid of the irq bottom half tasklets and instead performs the
work needed in process context. We also convert irq-disabling spinlocks to
ordinary spinlocks.
This should decrease system latency for other system components, like
sound for example but has the potential to increase latency for processes
that wait on the GPU.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Thomas Hellstrom [Thu, 24 Aug 2017 06:06:27 +0000 (08:06 +0200)]
drm/vmwgfx: Don't use drm_irq_[un]install
We're not allowed to change the upstream version of the drm_irq_install
function to be able to incorporate threaded irqs. So roll our own irq
install- and uninstall functions instead of relying on the drm core ones.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Marta Lofstedt [Mon, 28 Aug 2017 12:18:10 +0000 (15:18 +0300)]
drm/i915: Beef up of Beef up the IPS vs. CRC workaround
Commit 6e644626945c ("drm/i915: Beef up the IPS vs. CRC
workaround") was supposed to solve below bug. However, the
patch I tested is not the same as the one that got merged.
With this addition the test pass.
Chris Wilson [Fri, 25 Aug 2017 15:02:15 +0000 (16:02 +0100)]
drm/i915: Quietly cancel FBC activation if CRTC is turned off before worker
Since we use a worker to enable FBC on the CRTC, it is possible for the
CRTC to be switched off before we run. In this case, the CRTC will not
allow us to wait upon a vblank, so remove the DRM_ERROR as this is very
much expected.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102410 Fixes: ca18d51d77eb ("drm/i915/fbc: wait for a vblank instead of 50ms when enabling") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170825150215.19236-1-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Praveen Paneri [Thu, 10 Aug 2017 18:30:33 +0000 (00:00 +0530)]
drm/i915: Fix FBC cfb stride programming for non X-tiled FB
When FBC is enabled for linear, legacy Y-tiled and Yf-tiled
surfaces on gen9, the cfb stride must be programmed by SW as
cfb_stride = ceiling[(at least plane width in pixels)/
(32 * compression limit factor)] * 8
v2: Minor fix for a build error
v3: Fixed subject, register name and platform check (Ville)
v4: Added WA details in comment (Paulo)
v5:
- Read modified reg write to preserve other bit values (Paulo)
- Store modified stride value in reg_params (Paulo)
- Keep GLK out of the WA (Paulo)
v6:
- added additional field in reg_params for gen9_wa_cfb_stride (Paulo)
- Used appropriate bit mask while writing the register (Paulo)
v7 (from Paulo):
- Fix coding style and spacing issues.
- Mask the old values before writing.
- Bikeshed comments and unnecessary checks.