Liviu Dudau [Tue, 6 Jun 2017 14:05:21 +0000 (15:05 +0100)]
drm/arm: hdlcd: Set the CRTC's port before binding the encoder.
The component-based encoder(s) used by HDLCD expect the CRTC port
to be set before binding in order to find the right endpoint.
Without this patch, the TDA19988 encoder driver prints a warning
"Falling back to first CRTC".
Dave Airlie [Tue, 20 Jun 2017 22:55:22 +0000 (08:55 +1000)]
Merge tag 'drm-intel-next-2017-06-19' of git://anongit.freedesktop.org/git/drm-intel into drm-next
Final pile of features for 4.13
New uabi:
- batch bo in first slot, for faster execbuf assembly in userspace
(Chris Wilson)
- (sub)slice getparam, needed for mesa perf support (Robert Bragg)
First pile of patches for cnl/cfl support, maintained by Rodrigo but
with lots of contributions from others. Still incomplete since public
review still ongoing.
Features/refactoring:
- Make execbuf faster (Chris Wilson), a pile of series to make execbuf
buffer handling have fewer passes, use less list walking, postpone
more work to async workers and shuffle buffers less, all to make the
common case much faster (in some cases at least).
- cold boot support for glk dsi (Madhav Chauhan)
- Clean up pipe A quirk and related old platform hacks (Ville)
- perf sampling support for kbl/glk (Lionel)
- perf cleanups (Robert Bragg)
- wire atomic state to backlight code, to avoid pipe lookup hacks
(Maarten)
- reduce request waiting latency/overhead to remove the spinning and
associated cpu cycle wasting (Chris)
- fix 90/270 rotation wm computation (Ville)
- new ddb allocation algo for skl (Kumar Mahesh)
- fix regression due to system suspend optimiazatino (Imre)
- the usual pile of small cleanups and refactors all over
GVT updates contained in this tag:
- optimization for per-VM mmio save/restore (Changbin)
- optimization for mmio hash table (Changbin)
- scheduler optimization with event (Ping)
- vGPU reset refinement (Fred)
- other misc refactor and cleanups, etc.
* tag 'drm-intel-next-2017-06-19' of git://anongit.freedesktop.org/git/drm-intel: (170 commits)
drm/i915: Update DRIVER_DATE to 20170619
drm/i915/cfl: Introduce Coffee Lake workarounds.
drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCH
drm/i915: Stash a pointer to the obj's resv in the vma
drm/i915: Async GPU relocation processing
drm/i915: Allow execbuffer to use the first object as the batch
drm/i915: Wait upon userptr get-user-pages within execbuffer
drm/i915: First try the previous execbuffer location
drm/i915: Store a persistent reference for an object in the execbuffer cache
drm/i915: Eliminate lots of iterations over the execobjects array
drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations
drm/i915: Pass vma to relocate entry
drm/i915: Store a direct lookup from object handle to vma
drm/i915: Fix retrieval of hangcheck stats
drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty
drm/i915: Mark CPU cache as dirty on every transition for CPU writes
drm/i915: Make i915_vma_destroy() static
drm/i915: Actually attach the tv_format property to the SDVO connector
Revert "drm/i915/skl: New ddb allocation algorithm"
drm/i915/glk: Add cold boot sequence for GLK DSI
...
Dave Airlie [Tue, 20 Jun 2017 22:49:54 +0000 (08:49 +1000)]
Merge tag 'drm-msm-next-2017-06-20' of git://people.freedesktop.org/~robclark/linux into drm-next
This time around, the biggest thing is a bunch of GEM rework for more
fine grained locking and prep work to handle multiple address spaces
(ie. per-process pagetables). Also some HDMI fixes for 8x96
(snapdragon 820).
One unrelated bus patch, for something that seems to get merged
through whatever random tree (and has all the right ack's).
* tag 'drm-msm-next-2017-06-20' of git://people.freedesktop.org/~robclark/linux:
drm/msm: Fix potential buffer overflow issue
bus: SIMPLE_PM_BUS does not depend on ARCH_RENESAS
drm/msm: Separate locking of buffer resources from struct_mutex
drm/msm/hdmi: Fix HDMI pink strip issue seen on 8x96
drm/msm/hdmi: 8996 PLL: Populate unprepare
drm/msm/hdmi: Use bitwise operators when building register values
drm/msm: update generated headers
drm/msm: remove address-space id
drm/msm: support for an arbitrary number of address spaces
drm/msm: refactor how we handle vram carveout buffers
drm/msm: pass address-space to _get_iova() and friends
drm/msm/mdp4+5: move aspace/id to base class
drm/msm/mdp5: kill pipe_lock
drm/msm: fix locking inconsistency for gpu->hw_init()
drm/msm: Remove memptrs->wptr
drm/msm: Add a struct to pass configuration to msm_gpu_init()
drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
drm/msm: Remove idle function hook
drm/msm: Remove DRM_MSM_NUM_IOCTLS
drm/msm: gpu: Enable zap shader for A5XX
Dave Airlie [Tue, 20 Jun 2017 01:19:08 +0000 (11:19 +1000)]
Merge branch 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux into drm-next
A few more things for 4.13:
- Semaphore support using sync objects
- Drop fb location programming
- Optimize bo list ioctl
* 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: Optimize mutex usage (v4)
drm/amdgpu: Optimization of AMDGPU_BO_LIST_OP_CREATE (v2)
amdgpu: use drm sync objects for shared semaphores (v6)
amdgpu/cs: split out fence dependency checking (v2)
drm/amdgpu: don't check the default value for vm size
Dave Airlie [Tue, 20 Jun 2017 01:17:30 +0000 (11:17 +1000)]
Merge branch 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld into drm-next
Here are the Mali DP driver changes. They include the mali-dp specific
changes from Jose Abreu on crtc->mode_valid() as well as a couple of
patches for fixing the sharing of IRQ lines and use of DRM CMA helper
for framebuffer physical address calculation. Please pull!
* 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld:
drm/arm: mali-dp: Use CMA helper for plane buffer address calculation
drm/mali-dp: Check PM status when sharing interrupt lines
drm/arm: malidp: Use crtc->mode_valid() callback
Dave Airlie [Tue, 20 Jun 2017 01:10:49 +0000 (11:10 +1000)]
Merge branch 'linux-4.13' of git://github.com/skeggsb/linux into drm-next
- HDMI stereoscopic support
- Rework of display code to properly support SOR pad macro routing on
>=GM20x GPUs
- Various other fixes/improvements.
* 'linux-4.13' of git://github.com/skeggsb/linux: (73 commits)
drm/nouveau/disp/nv50-: avoid creating ORs that aren't present on HW
drm/nouveau: use proper prototype in nouveau_pmops_runtime() definition
drm/nouveau: Skip vga_fini on non-PCI device
drm/nouveau/tegra: Don't leave GPU in reset
drm/nouveau/tegra: Skip manual unpowergating when not necessary
drm/nouveau/hwmon: Change permissions to numeric
drm/nouveau/hwmon: expose the auto_point and pwm_min/max attrs
drm/nouveau/hwmon: Remove old code, add .write/.read operations
drm/nouveau/hwmon: Add nouveau_hwmon_ops structure with .is_visible/.read_string
drm/nouveau/hwmon: Add config for all sensors and their settings
drm/nouveau/disp/gm200-: allow non-identity mapping of SOR <-> macro links
drm/nouveau/disp/nv50-: implement a common supervisor 3.0
drm/nouveau/disp/nv50-: implement a common supervisor 2.2
drm/nouveau/disp/nv50-: implement a common supervisor 2.1
drm/nouveau/disp/nv50-: implement a common supervisor 2.0
drm/nouveau/disp/nv50-: implement a common supervisor 1.0
drm/nouveau/disp/nv50-gt21x: remove workaround for dp->tmds hotplug issues
drm/nouveau/disp/dp: use new devinit script interpreter entry-point
drm/nouveau/disp/dp: determine link bandwidth requirements from head state
drm/nouveau/disp: introduce acquire/release display path methods
...
Dave Airlie [Tue, 20 Jun 2017 01:07:03 +0000 (11:07 +1000)]
Merge tag 'drm/tegra/for-4.13-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
drm/tegra: Changes for v4.13-rc1
This starts off with the addition of more documentation for the host1x
and DRM drivers and finishes with a slew of fixes and enhancements for
the staging IOCTLs as a result of the awesome work done by Dmitry and
Erik on the grate reverse-engineering effort.
* tag 'drm/tegra/for-4.13-rc1' of git://anongit.freedesktop.org/tegra/linux:
gpu: host1x: At first try a non-blocking allocation for the gather copy
gpu: host1x: Refactor channel allocation code
gpu: host1x: Remove unused host1x_cdma_stop() definition
gpu: host1x: Remove unused 'struct host1x_cmdbuf'
gpu: host1x: Check waits in the firewall
gpu: host1x: Correct swapped arguments in the is_addr_reg() definition
gpu: host1x: Forbid unrelated SETCLASS opcode in the firewall
gpu: host1x: Forbid RESTART opcode in the firewall
gpu: host1x: Forbid relocation address shifting in the firewall
gpu: host1x: Do not leak BO's phys address to userspace
gpu: host1x: Correct host1x_job_pin() error handling
gpu: host1x: Initialize firewall class to the job's one
drm/tegra: dc: Disable plane if it is invisible
drm/tegra: dc: Apply clipping to the plane
drm/tegra: dc: Avoid reset asserts on Tegra20
drm/tegra: Check syncpoint ID in the 'submit' IOCTL
drm/tegra: Correct copying of waitchecks and disable them in the 'submit' IOCTL
drm/tegra: Check for malformed offsets and sizes in the 'submit' IOCTL
drm/tegra: Add driver documentation
gpu: host1x: Flesh out kerneldoc
Kasin Li [Mon, 19 Jun 2017 21:36:53 +0000 (15:36 -0600)]
drm/msm: Fix potential buffer overflow issue
In function submit_create, if nr_cmds or nr_bos is assigned with
negative value, the allocated buffer may be small than intended.
Using this buffer will lead to buffer overflow issue.
Signed-off-by: Kasin Li <donglil@codeaurora.org> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
Alex Xie [Fri, 16 Jun 2017 13:07:29 +0000 (09:07 -0400)]
drm/amdgpu: Optimize mutex usage (v4)
In original function amdgpu_bo_list_get, the waiting
for result->lock can be quite long while mutex
bo_list_lock was holding. It can make other tasks
waiting for bo_list_lock for long period.
Secondly, this patch allows several tasks(readers of idr)
to proceed at the same time.
v2: use rcu and kref (Dave Airlie and Christian König)
v3: update v1 commit message (Michel Dänzer)
v4: rebase on upstream (Alex Deucher)
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
Alex Xie [Fri, 16 Jun 2017 04:23:41 +0000 (00:23 -0400)]
drm/amdgpu: Optimization of AMDGPU_BO_LIST_OP_CREATE (v2)
v2: Remove duplication of zeroing of bo list (Christian König)
Move idr_alloc function to end of ioctl (Christian König)
Call kfree bo_list when amdgpu_bo_list_set return error.
Combine the previous two patches into this patch.
Add amdgpu_bo_list_set function prototype.
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
Rob Clark [Tue, 13 Jun 2017 12:59:11 +0000 (08:59 -0400)]
bus: SIMPLE_PM_BUS does not depend on ARCH_RENESAS
In fact, it is needed for PCI to work on msm8996 (and probably other
things). No idea why it was depending on renesas but that doesn't make
any sense. So drop the dependency.
Signed-off-by: Rob Clark <robdclark@gmail.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Arnd Bergmann <arnd@arndb.de>
drm/msm: Separate locking of buffer resources from struct_mutex
Buffer object specific resources like pages, domains, sg list
need not be protected with struct_mutex. They can be protected
with a buffer object level lock. This simplifies locking and
makes it easier to avoid potential recursive locking scenarios
for SVM involving mmap_sem and struct_mutex. This also removes
unnecessary serialization when creating buffer objects, and also
between buffer object creation and GPU command submission.
Signed-off-by: Sushmita Susheelendra <ssusheel@codeaurora.org>
[robclark: squash in handling new locking for shrinker] Signed-off-by: Rob Clark <robdclark@gmail.com>
Rodrigo Vivi [Fri, 16 Jun 2017 22:49:58 +0000 (15:49 -0700)]
drm/i915/cfl: Introduce Coffee Lake workarounds.
Coffee Lake inherit most of Kabylake production
workarounds.
v2: Fix typo on commit message and remove
WaDisableKillLogic and GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC,
since as Mika pointed out they shouldn't be here for cfl
according to BSpec.
Dave Airlie [Mon, 13 Mar 2017 22:18:15 +0000 (22:18 +0000)]
amdgpu: use drm sync objects for shared semaphores (v6)
This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.
Sync objects are managed via the drm syncobj ioctls.
The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.
This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.
In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.
NOTE: this interface addition needs a version bump to expose
it to userspace.
TODO: update to dep_sync when rebasing onto amdgpu master.
(with this - r-b from Christian)
v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis
v6: lookup post deps earlier, and just replace fences
in post deps stage (Christian)
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Thu, 9 Mar 2017 03:45:52 +0000 (03:45 +0000)]
amdgpu/cs: split out fence dependency checking (v2)
This just splits out the fence depenency checking into it's
own function to make it easier to add semaphore dependencies.
v2: rebase onto other changes.
v1-Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 15 Jun 2017 22:20:09 +0000 (18:20 -0400)]
drm/amdgpu: don't check the default value for vm size
Avoids printing spurious messages like this:
[ 3.102059] amdgpu 0000:01:00.0: VM size (-1) must be a power of 2
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCH
Although we use 9 bits of Device ID for identifying PCH, only 8 bits are
stored in dev_priv->pch_id. This makes HAS_PCH_CNP_LP() and
HAS_PCH_SPT_LP() incorrect. Fix this by storing all the 9 bits for the
platforms with LP PCH.
v2: Drop PCH_LPT_LP change (Imre)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Imre Deak <imre.deak@intel.com> Fixes: commit ec7e0bb35f8d ("drm/i915/cnp: Add PCI ID for Cannonpoint LP PCH") Reported-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1497641774-29104-1-git-send-email-dhinakaran.pandiyan@intel.com
Chris Wilson [Fri, 16 Jun 2017 14:05:25 +0000 (15:05 +0100)]
drm/i915: Stash a pointer to the obj's resv in the vma
During execbuf, a mandatory step is that we add this request (this
fence) to each object's reservation_object. Inside execbuf, we track the
vma, and to add the fence to the reservation_object then means having to
first chase the obj, incurring another cache miss. We can reduce the
number of cache misses by stashing a pointer to the reservation_object
in the vma itself.
Chris Wilson [Fri, 16 Jun 2017 14:05:24 +0000 (15:05 +0100)]
drm/i915: Async GPU relocation processing
If the user requires patching of their batch or auxiliary buffers, we
currently make the alterations on the cpu. If they are active on the GPU
at the time, we wait under the struct_mutex for them to finish executing
before we rewrite the contents. This happens if shared relocation trees
are used between different contexts with separate address space (and the
buffers then have different addresses in each), the 3D state will need
to be adjusted between execution on each context. However, we don't need
to use the CPU to do the relocation patching, as we could queue commands
to the GPU to perform it and use fences to serialise the operation with
the current activity and future - so the operation on the GPU appears
just as atomic as performing it immediately. Performing the relocation
rewrites on the GPU is not free, in terms of pure throughput, the number
of relocations/s is about halved - but more importantly so is the time
under the struct_mutex.
v2: Break out the request/batch allocation for clearer error flow.
v3: A few asserts to ensure rq ordering is maintained
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Fri, 16 Jun 2017 14:05:23 +0000 (15:05 +0100)]
drm/i915: Allow execbuffer to use the first object as the batch
Currently, the last object in the execlist is the always the batch.
However, when building the batch buffer we often know the batch object
first and if we can use the first slot in the execlist we can emit
relocation instructions relative to it immediately and avoid a separate
pass to adjust the relocations to point to the last execlist slot.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Fri, 16 Jun 2017 14:05:22 +0000 (15:05 +0100)]
drm/i915: Wait upon userptr get-user-pages within execbuffer
This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Chris Wilson [Fri, 16 Jun 2017 14:05:21 +0000 (15:05 +0100)]
drm/i915: First try the previous execbuffer location
When choosing a slot for an execbuffer, we ideally want to use the same
address as last time (so that we don't have to rebind it) and the same
address as expected by the user (so that we don't have to fixup any
relocations pointing to it). If we first try to bind the incoming
execbuffer->offset from the user, or the currently bound offset that
should hopefully achieve the goal of avoiding the rebind cost and the
relocation penalty. However, if the object is not currently bound there
we don't want to arbitrarily unbind an object in our chosen position and
so choose to rebind/relocate the incoming object instead. After we
report the new position back to the user, on the next pass the
relocations should have settled down.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com>
Chris Wilson [Fri, 16 Jun 2017 14:05:20 +0000 (15:05 +0100)]
drm/i915: Store a persistent reference for an object in the execbuffer cache
If we take a reference to the object/vma when it is first used in an
execbuf, we can keep that reference until the object's file-local handle
is closed. Thereby saving a frequent ref/unref pair.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Fri, 16 Jun 2017 14:05:19 +0000 (15:05 +0100)]
drm/i915: Eliminate lots of iterations over the execobjects array
The major scaling bottleneck in execbuffer is the processing of the
execobjects. Creating an auxiliary list is inefficient when compared to
using the execobject array we already have allocated.
Reservation is then split into phases. As we lookup up the VMA, we
try and bind it back into active location. Only if that fails, do we add
it to the unbound list for phase 2. In phase 2, we try and add all those
objects that could not fit into their previous location, with fallback
to retrying all objects and evicting the VM in case of severe
fragmentation. (This is the same as before, except that phase 1 is now
done inline with looking up the VMA to avoid an iteration over the
execobject array. In the ideal case, we eliminate the separate reservation
phase). During the reservation phase, we only evict from the VM between
passes (rather than currently as we try to fit every new VMA). In
testing with Unreal Engine's Atlantis demo which stresses the eviction
logic on gen7 class hardware, this speed up the framerate by a factor of
2.
The second loop amalgamation is between move_to_gpu and move_to_active.
As we always submit the request, even if incomplete, we can use the
current request to track active VMA as we perform the flushes and
synchronisation required.
The next big advancement is to avoid copying back to the user any
execobjects and relocations that are not changed.
v2: Add a Theory of Operation spiel.
v3: Fall back to slow relocations in preparation for flushing userptrs.
v4: Document struct members, factor out eb_validate_vma(), add a few
more comments to explain some magic and hide other magic behind macros.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Fri, 16 Jun 2017 14:05:18 +0000 (15:05 +0100)]
drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations
If we write a relocation into the buffer, we require our own implicit
synchronisation added after the start of the execbuf, outside of the
user's control. As we may end up clflushing, or doing the patch itself
on the GPU, asynchronously we need to look at the implicit serialisation
on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this
object.
If the user does trigger a stall for relocations, we make sure the stall
is complete enough so that the batch is not submitted before we complete
those relocations.
Fixes: 77ae9957897d ("drm/i915: Enable userspace to opt-out of implicit fencing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Fri, 16 Jun 2017 14:05:17 +0000 (15:05 +0100)]
drm/i915: Pass vma to relocate entry
We can simplify our tracking of pending writes in an execbuf to the
single bit in the vma->exec_entry->flags, but that requires the
relocation function knowing the object's vma. Pass it along.
Note we have only been using a single bit to track flushing since
Chris Wilson [Fri, 16 Jun 2017 14:05:16 +0000 (15:05 +0100)]
drm/i915: Store a direct lookup from object handle to vma
The advent of full-ppgtt lead to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizable hashtable to jump from the object to the per-ctx vma.
rhashtable was considered but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hastable on demand in a background task and
serialize it before iterating.
In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.
v2: Prettier names, more magic.
v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Archit Taneja [Fri, 16 Jun 2017 05:09:34 +0000 (10:39 +0530)]
drm/msm/hdmi: 8996 PLL: Populate unprepare
Without doing anything in unprepare, the HDMI driver isn't able to
switch modes successfully. Calling set_rate with a new rate results
in an un-locked PLL.
If we reset the PLL in unprepare, the PLL is able to lock with the
new rate.
Signed-off-by: Archit Taneja <architt@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
Liviu Dudau [Thu, 15 Jun 2017 14:13:46 +0000 (15:13 +0100)]
drm/msm/hdmi: Use bitwise operators when building register values
Commit c0c0d9eeeb8d ("drm/msm: hdmi audio support") uses logical
OR operators to build up a value to be written in the
REG_HDMI_AUDIO_INFO0 and REG_HDMI_AUDIO_INFO1 registers when it
should have used bitwise operators.
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Fixes: c0c0d9eeeb8d ("drm/msm: hdmi audio support") Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 13 Jun 2017 18:27:45 +0000 (14:27 -0400)]
drm/msm: remove address-space id
Now that the msm_gem supports an arbitrary number of vma's, we no longer
need to assign an id (index) to each address space. So rip out the
associated code.
Rob Clark [Tue, 13 Jun 2017 15:50:05 +0000 (11:50 -0400)]
drm/msm: refactor how we handle vram carveout buffers
Pull some of the logic out into msm_gem_new() (since we don't need to
care about the imported-bo case), and don't defer allocating pages. The
latter is generally a good idea, since if we are using VRAM carveout to
allocate contiguous buffers (ie. no IOMMU), the allocation is more
likely to fail. So failing at allocation time is a more sane option.
Plus this simplifies things in the next patch.
Rob Clark [Tue, 13 Jun 2017 15:07:08 +0000 (11:07 -0400)]
drm/msm: pass address-space to _get_iova() and friends
No functional change, that will come later. But this will make it
easier to deal with dynamically created address spaces (ie. per-
process pagetables for gpu).
Rob Clark [Tue, 13 Jun 2017 14:22:37 +0000 (10:22 -0400)]
drm/msm/mdp4+5: move aspace/id to base class
Before we can shift to passing the address-space object to _get_iova(),
we need to fix a few places (dsi+fbdev) that were hard-coding the adress
space id. That gets somewhat easier if we just move these to the kms
base class.
Rob Clark [Tue, 13 Jun 2017 17:58:23 +0000 (13:58 -0400)]
drm/msm/mdp5: kill pipe_lock
It serves no purpose, things should be sufficiently synchronized already
by atomic framework. And it is somewhat awkward to be holding a spinlock
when msm_gem_iova() is going to start needing to grab a mutex.
Rob Clark [Tue, 13 Jun 2017 13:15:36 +0000 (09:15 -0400)]
drm/msm: fix locking inconsistency for gpu->hw_init()
Most, but not all, paths where calling the with struct_mutex held. The
fast-path in msm_gem_get_iova() (plus some sub-code-paths that only run
the first time) was masking this issue.
So lets just always hold struct_mutex for hw_init(). And sprinkle some
WARN_ON()'s and might_lock() to avoid this sort of problem in the
future.
Jordan Crouse [Mon, 8 May 2017 20:35:03 +0000 (14:35 -0600)]
drm/msm: Add a struct to pass configuration to msm_gpu_init()
The amount of information that we need to pass into msm_gpu_init()
is steadily increasing, so add a new struct to stabilize the function
call and make it easier to add new configuration down the line.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
Jordan Crouse [Mon, 8 May 2017 20:35:01 +0000 (14:35 -0600)]
drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
Modify the 'pad' member of struct drm_msm_gem_info to 'flags'. If the
user sets 'flags' to non-zero it means that they want a IOVA for the
GEM object instead of a mmap() offset. Return the iova in the 'offset'
member.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[robclark: s/hint/flags in commit msg] Signed-off-by: Rob Clark <robdclark@gmail.com>
Jordan Crouse [Mon, 8 May 2017 20:34:59 +0000 (14:34 -0600)]
drm/msm: Remove DRM_MSM_NUM_IOCTLS
The ioctl array is sparsely populated but the compiler will make sure
that it is sufficiently sized for all the values that we have so we
can safely use ARRAY_SIZE() instead of having a constantly changing
#define in the uapi header.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
Jordan Crouse [Wed, 17 May 2017 14:45:29 +0000 (08:45 -0600)]
drm/msm: gpu: Enable zap shader for A5XX
The A5XX GPU powers on in "secure" mode. In secure mode the GPU can
only render to buffers that are marked as secure and inaccessible
to the kernel and user through a series of hardware protections. In
practice secure mode is used to draw things like a UI on a secure
video frame.
In order to switch out of secure mode the GPU executes a special
shader that clears out the GMEM and other sensitve registers and
then writes a register. Because the kernel can't be trusted the
shader binary is signed and verified and programmed by the
secure world. To do this we need to read the MDT header and the
segments from the firmware location and put them in memory and
present them for approval.
For targets without secure support there is an out: if the
secure world doesn't support secure then there are no hardware
protections and we can freely write the SECVID_TRUST register from
the CPU. We don't have 100% confidence that we can query the
secure capabilities at run time but we have enough calls that
need to go right to give us some confidence that we're at least doing
something useful.
Of course if we guess wrong you trigger a permissions violation
which usually ends up in a system crash but thats a problem
that shows up immediately.
[v2: use child device per Bjorn]
[v3: use generic MDT loader per Bjorn]
[v4: use managed dma functions and ifdefs for the MDT loader]
[v5: Add depends for QCOM_MDT_LOADER]
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[robclark: fix Kconfig to use select instead of depends + #if IS_ENABLED()] Signed-off-by: Rob Clark <robdclark@gmail.com>
Chris Wilson [Fri, 16 Jun 2017 10:54:55 +0000 (11:54 +0100)]
drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty
For ease of use (i.e. avoiding a few checks and function calls), store
the object's cache coherency next to the cache is dirty bit.
Specifically this patch aims to reduce the frequency of no-op calls to
i915_gem_object_clflush() to counter-act the increase of such calls for
GPU only objects in the previous patch.
v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
!cache_coherent as gcc generates much better code for the latter
(Tvrtko)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Tested-by: Dongwon Kim <dongwon.kim@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Chris Wilson [Thu, 15 Jun 2017 12:38:49 +0000 (13:38 +0100)]
drm/i915: Mark CPU cache as dirty on every transition for CPU writes
Currently, we only mark the CPU cache as dirty if we skip a clflush.
This leads to some confusion where we have to ask if the object is in
the write domain or missed a clflush. If we always mark the cache as
dirty, this becomes a much simply question to answer.
The goal remains to do as few clflushes as required and to do them as
late as possible, in the hope of deferring the work to a kthread and not
block the caller (e.g. execbuf, flips).
v2: Always call clflush before GPU execution when the cache_dirty flag
is set. This may cause some extra work on llc systems that migrate dirty
buffers back and forth - but we do try to limit that by only setting
cache_dirty at the end of the gpu sequence.
v3: Always mark the cache as dirty upon a level change, as we need to
invalidate any stale cachelines due to external writes.
Reported-by: Dongwon Kim <dongwon.kim@intel.com> Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning the scanout") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Tested-by: Dongwon Kim <dongwon.kim@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Ville Syrjälä [Thu, 15 Jun 2017 17:23:08 +0000 (20:23 +0300)]
drm/i915: Actually attach the tv_format property to the SDVO connector
Attach the tv_format property to the SDVO connector instead of passing
a '0' in place of the pointer to the property. This got broken when
the SDVO connector properties were converted to atomic.
We can thank sparse for catching this:
drivers/gpu/drm/i915/intel_sdvo.c:2742:75: warning: Using plain integer as NULL pointer
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Fixes: 630d30a4ee27 ("drm/i915: Convert intel_sdvo connector properties to atomic.") Link: http://patchwork.freedesktop.org/patch/msgid/20170615172308.10121-1-ville.syrjala@linux.intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Liviu Dudau [Tue, 23 May 2017 13:18:18 +0000 (14:18 +0100)]
drm/mali-dp: Check PM status when sharing interrupt lines
If an instance of Mali DP hardware shares the interrupt line with
another hardware (usually another instance of the Mali DP) its
interrupt handler can get called when the device is suspended.
Check the PM status before making access to the hardware registers
to avoid deadlocks.
Jose Abreu [Fri, 19 May 2017 00:52:17 +0000 (01:52 +0100)]
drm/arm: malidp: Use crtc->mode_valid() callback
Now that we have a callback to check if crtc supports a given mode
we can use it in malidp so that we restrict the number of probbed
modes to the ones we can actually display.
Also, remove the mode_fixup() callback as this is no longer needed
because mode_valid() will be called before.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Carlos Palminha <palminha@synopsys.com> Cc: Alexey Brodkin <abrodkin@synopsys.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@linux.ie> Cc: Andrzej Hajda <a.hajda@samsung.com> Cc: Archit Taneja <architt@codeaurora.org> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
Jani Nikula [Fri, 16 Jun 2017 07:03:00 +0000 (10:03 +0300)]
Merge tag 'gvt-next-2017-06-08' of https://github.com/01org/gvt-linux into drm-intel-next-queued
gvt-next-2017-06-08
First gvt-next pull for 4.13:
- optimization for per-VM mmio save/restore (Changbin)
- optimization for mmio hash table (Changbin)
- scheduler optimization with event (Ping)
- vGPU reset refinement (Fred)
- other misc refactor and cleanups, etc.
Arnd Bergmann [Fri, 9 Jun 2017 10:38:33 +0000 (12:38 +0200)]
drm/nouveau: use proper prototype in nouveau_pmops_runtime() definition
There is a prototype for this function in the header, but the function
itself lacks a 'void' in the argument list, causing a harmless warning
when building with 'make W=1':
drivers/gpu/drm/nouveau/nouveau_drm.c: In function 'nouveau_pmops_runtime':
drivers/gpu/drm/nouveau/nouveau_drm.c:730:1: error: old-style function definition [-Werror=old-style-definition]
Fixes: 321f5c5f2c49 ("drm/nouveau: replace multiple open-coded runpm support checks with function") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Mikko Perttunen [Fri, 9 Jun 2017 12:25:41 +0000 (15:25 +0300)]
drm/nouveau: Skip vga_fini on non-PCI device
As with vga_init, this function doesn't make sense on non-PCI devices,
and the Thunderbolt check in it dereferences a NULL pointer in that
case. Add some code to skip this function when the device is not a PCI
device.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Mikko Perttunen [Fri, 9 Jun 2017 12:25:39 +0000 (15:25 +0300)]
drm/nouveau/tegra: Skip manual unpowergating when not necessary
On Tegra186, powergating is handled by the BPMP power domain provider
and the "legacy" powergating API is not available. Therefore skip
these calls if we are attached to a power domain.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Oscar Salvador [Thu, 18 May 2017 21:24:38 +0000 (23:24 +0200)]
drm/nouveau/hwmon: Change permissions to numeric
This patch replaces the symbolic permissions with the numeric ones.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com> Reviewed-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Oscar Salvador [Thu, 18 May 2017 21:24:37 +0000 (23:24 +0200)]
drm/nouveau/hwmon: expose the auto_point and pwm_min/max attrs
This patch creates a special group attributes for attrs like "*auto_point*".
We check if we have support for them, and if we do, we gather them all in
an attribute_group's structure which is the parameter regarding special groups
of hwmon_device_register_with_info
We also do the same for pwm_min/max attrs.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com> Reviewed-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Oscar Salvador [Thu, 18 May 2017 21:24:36 +0000 (23:24 +0200)]
drm/nouveau/hwmon: Remove old code, add .write/.read operations
This patch removes old code related to the old api and transforms the
functions for the new api. It also adds the .write and .read operations.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com> Reviewed-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Oscar Salvador [Thu, 18 May 2017 21:24:35 +0000 (23:24 +0200)]
drm/nouveau/hwmon: Add nouveau_hwmon_ops structure with .is_visible/.read_string
This patch introduces the nouveau_hwmon_ops structure, sets up
.is_visible and .read_string operations and adds all the functions
for these operations.
This is also a preparation for the next patches, where most of the
work is being done.
This code doesn't interacture with the old one.
It's just to make easier the review of all patches.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com> Reviewed-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Oscar Salvador [Thu, 18 May 2017 21:24:34 +0000 (23:24 +0200)]
drm/nouveau/hwmon: Add config for all sensors and their settings
This is a preparation for the next patches. It just adds the sensors with
their possible configurable settings and then fills the struct hwmon_channel_info
with all this information.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com> Reviewed-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Fri, 19 May 2017 13:59:35 +0000 (23:59 +1000)]
drm/nouveau/disp/nv50-: implement a common supervisor 3.0
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Ben Skeggs [Fri, 19 May 2017 13:59:35 +0000 (23:59 +1000)]
drm/nouveau/disp/nv50-: implement a common supervisor 2.2
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Ben Skeggs [Fri, 19 May 2017 13:59:35 +0000 (23:59 +1000)]
drm/nouveau/disp/nv50-: implement a common supervisor 2.1
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Ben Skeggs [Fri, 19 May 2017 13:59:35 +0000 (23:59 +1000)]
drm/nouveau/disp/nv50-: implement a common supervisor 2.0
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Ben Skeggs [Fri, 19 May 2017 13:59:35 +0000 (23:59 +1000)]
drm/nouveau/disp/nv50-: implement a common supervisor 1.0
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
These exist to give NVKM information on the set of display paths that
the DD needs to be active at any given time.
Previously, the supervisor attempted to determine this solely from OR
state, but there's a few configurations where this information on its
own isn't enough to determine the specific display paths in question:
- ANX9805, where the PIOR protocol for both DP and TMDS is TMDS.
- On a device using DCB Switched Outputs.
- On GM20x and newer, with a crossbar between the SOR and macro links.
After this commit, the DD tells NVKM *exactly* which display path it's
attempting a modeset on.
Ben Skeggs [Fri, 19 May 2017 13:59:35 +0000 (23:59 +1000)]
drm/nouveau/disp/dp: train link only when actively displaying an image
This essentially (unless the link becomes unstable and needs to be
re-trained) gives us a single entry-point to link training, during
supervisor handling, where we can ensure all routing is up to date.
Ben Skeggs [Fri, 19 May 2017 13:59:35 +0000 (23:59 +1000)]
drm/nouveau/disp/dp: determine a failsafe link training rate
The aim here is to protect the OR against locking up when something
unexpected happens (such as the display disappearing during modeset,
or the DD misbehaving).
Ben Skeggs [Fri, 19 May 2017 13:59:35 +0000 (23:59 +1000)]
drm/nouveau/disp: identity-map display paths to output resources
This essentially replicates our current behaviour in a way that's
compatible with the new model that's emerging, so that we're able
to start porting the hw-specific functions to it.
Ben Skeggs [Fri, 19 May 2017 13:59:35 +0000 (23:59 +1000)]
drm/nouveau/disp: fork off some new hw-specific implementations
Upcoming commits make supervisor handling share code between the NV50
and GF119 implementations. Because of this, and a few other cleanups,
we need to allow some additional customisation.
In order to properly support the SOR -> SOR + pad macro separation
that occurred with GM20x GPUs, we need to separate OR handling out
of the output path code.
This will be used as the base to support ORs (DAC, SOR, PIOR).