Chris Wilson [Thu, 4 Aug 2016 06:52:23 +0000 (07:52 +0100)]
drm/i915: Split early global GTT initialisation
Initialising the global GTT is tricky as we wish to use the drm_mm range
manager during the modesetting initialisation (to capture stolen
allocations from the BIOS) before we actually enable GEM. To overcome
this, we currently setup the drm_mm first and then carefully rebind
them.
v2: Fixup after rebasing
v3: GGTT initialisation needs to be split around kicking out conflicts
v4: Restore an old UMS BUG_ON(mappable > total) as a DRM_ERROR plus
fixup of probe results.
Chris Wilson [Thu, 4 Aug 2016 06:52:22 +0000 (07:52 +0100)]
drm/i915: Update GGTT initialisation functions to take drm_i915_private
Since these are internal functions they operate on drm_i915_private and
not the drm_device being passed in. So pass in the drm_i915_private
instead, and remove one layer of dancing. No space wins here, just
conforming to the norm in function parameters.
Chris Wilson [Thu, 4 Aug 2016 06:52:21 +0000 (07:52 +0100)]
drm/i915: Split GGTT initialisation between probing and setup
In order to handle conflicting drivers (i.e. vgacon) having a different
setup of hardware, we have to remove those other drivers before we try
to setup our own mappings. This requires us to split GGTT initialisation
between probing for the hardware location (part of the PCI BAR) and
later establishing the kernel resources for it.
Chris Wilson [Thu, 4 Aug 2016 06:52:20 +0000 (07:52 +0100)]
drm/i915: Amalgamate GGTT/ppGTT vma debug list walkers
As we can now have multiple VMA inside the global GTT (with partial
mappings, rotations, etc), it is no longer true that there may just be a
single GGTT entry and so we should walk the full vma_list to count up
the actual usage. In addition to unifying the two walkers, switch from
multiplying the object size for each vma to summing the bound vma sizes.
Chris Wilson [Tue, 2 Aug 2016 21:50:39 +0000 (22:50 +0100)]
drm/i915: Simplify calling engine->sync_to
Since requests can no longer be generated as a side-effect of
intel_ring_begin(), we know that the seqno will be unchanged during
ring-emission. This predicatablity then means we do not have to check
for the seqno wrapping around whilst emitting the semaphore for
engine->sync_to().
Now that emitting requests is identical between legacy and execlists, we
can use the same function to build up the ring for submitting to either
engine. (With the exception of i915_switch_contexts(), but in time that
will also be handled gracefully.)
Chris Wilson [Tue, 2 Aug 2016 21:50:37 +0000 (22:50 +0100)]
drm/i915: Refactor golden render state emission to unconfuse gcc
GCC was inlining the init and setup functions, but was getting itself
confused into thinking that variables could be used uninitialised. If we
do the inline for gcc, it is happy! As a bonus we shrink the code.
Chris Wilson [Tue, 2 Aug 2016 21:50:36 +0000 (22:50 +0100)]
drm/i915: Remove duplicate golden render state init from execlists
Now that we use the same vfuncs for emitting the batch buffer in both
execlists and legacy, the golden render state initialisation is
identical between both.
v2: gcc wants so.ggtt_offset initialised (even though it is not used)
Chris Wilson [Tue, 2 Aug 2016 21:50:33 +0000 (22:50 +0100)]
drm/i915: Stop passing caller's num_dwords to engine->semaphore.signal()
Rather than pass in the num_dwords that the caller wishes to use after
the signal command packet, split the breadcrumb emission into two phases
and have both the signal and breadcrumb individiually acquire space on
the ring. This makes the interface simpler for the reader, and will
simplify for patches.
Chris Wilson [Tue, 2 Aug 2016 21:50:32 +0000 (22:50 +0100)]
drm/i915/lrc: Update function names to match request flow
With adding engine->submit_request, we now have a bunch of functions
with similar names used at different stages of the execlist submission.
Try a different coat of paint, to hopefully reduce confusion between the
requests, intel_engine_cs and the actual execlists submision process.
Chris Wilson [Tue, 2 Aug 2016 21:50:31 +0000 (22:50 +0100)]
drm/i915: Unify request submission
Move request submission from emit_request into its own common vfunc
from i915_add_request().
v2: Convert I915_DISPATCH_flags to BIT(x) whilst passing
v3: Rename a few functions to match.
v4: Reenable execlists submission after disabling guc.
v5: Be aware that everyone calls i915_guc_submission_disable()!
Chris Wilson [Tue, 2 Aug 2016 21:50:30 +0000 (22:50 +0100)]
drm/i915: Move the modulus for ring emission to the register write
Space reservation is already safe with respect to the ring->size
modulus, but hardware only expects to see values in the range
0...ring->size-1 (inclusive) and so requires the modulus to prevent us
writing the value ring->size instead of 0. As this is only required for
the register itself, we can defer the modulus to the register update and
not perform it after every command packet. We keep the
intel_ring_advance() around in the code to provide demarcation for the
end-of-packet (which then can be compared against intel_ring_begin() as
the number of dwords emitted must match the reserved space).
v2: Assert that the ring size is a power-of-two to match assumptions in
the code. Simplify the comment before writing the tail value to explain
why the modulus is necessary.
Chris Wilson [Tue, 2 Aug 2016 21:50:29 +0000 (22:50 +0100)]
drm/i915: Convert engine->write_tail to operate on a request
If we rewrite the I915_WRITE_TAIL specialisation for the legacy
ringbuffer as submitting the request onto the ringbuffer, we can unify
the vfunc with both execlists and GuC in the next patch.
v2: Drop the modulus from the I915_WRITE_TAIL as it is currently being
applied in intel_ring_advance() after every command packet, and add a
comment explaining why we need the modulus at all.
Chris Wilson [Tue, 2 Aug 2016 21:50:28 +0000 (22:50 +0100)]
drm/i915: Remove intel_ring_get_tail()
Joonas doesn't like the tiny function, especially if I go around making
it more complicated and using it elsewhere. To remove that temptation,
remove the function!
Chris Wilson [Tue, 2 Aug 2016 21:50:27 +0000 (22:50 +0100)]
drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START
Both the ->dispatch_execbuffer and ->emit_bb_start callbacks do exactly
the same thing, add MI_BATCHBUFFER_START to the request's ringbuffer -
we need only one vfunc.
Chris Wilson [Tue, 2 Aug 2016 21:50:25 +0000 (22:50 +0100)]
drm/i915: Reduce engine->emit_flush() to a single mode parameter
Rather than passing a complete set of GPU cache domains for either
invalidation or for flushing, or even both, just pass a single parameter
to the engine->emit_flush to determine the required operations.
Space for flushing the GPU cache prior to completing the request is
preallocated and so cannot fail - the GPU caches will always be flushed
along with the completed request. This means we no longer have to track
whether the GPU cache is dirty between batches like we had to with the
outstanding_lazy_seqno.
With the removal of the duplication in the per-backend entry points for
emitting the obsolete lazy flush, we can then further unify the
engine->emit_flush.
v2: Expand a bit on the legacy of gpu_caches_dirty
Chris Wilson [Tue, 2 Aug 2016 21:50:21 +0000 (22:50 +0100)]
drm/i915: Rename struct intel_ringbuffer to struct intel_ring
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.
Keith Packard [Sun, 31 Jul 2016 07:54:51 +0000 (00:54 -0700)]
drm/i915: cleanup_plane_fb: also drop reference to current state wait_req
There are two paths into intel_cleanup_plane_fb, the normal completion
path and the failure path.
In the failure case, intel_cleanup_plane_fb is called before
drm_atomic_helper_swap_state, so any wait_req reference made in
intel_prepare_plane_fb will be in old_intel_state->wait_req.
In the normal completion path, drm_atomic_helper_swap_state has
already been called, so the plane state holding the just-used wait_req
will not be in old_intel_state->wait_req, rather it will be in the
state associated with the plane itself.
Clearing this reference ensures that the wait_req will be freed as
soon as it the related mode setting operation is complete, rather than
waiting for some future mode setting operation to eventually
dereference it.
The existing dereference of old_intel_state->wait_req is still
required as that will hold the wait_req when the mode setting
operation fails.
cc: Daniel Vetter <daniel.vetter@intel.com>
cc: David Airlie <airlied@linux.ie>
cc: intel-gfx@lists.freedesktop.org
cc: dri-devel@lists.freedesktop.org Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ville Syrjälä [Fri, 29 Jul 2016 14:57:02 +0000 (17:57 +0300)]
drm/i915: Program FW_BLC_SELF on 915G as well
According to Bspec FW_BLC_SELF exists on 915G also. Let's program it.
The only open question is whether there's is a memory self-refresh
enable bit somewhere as well. For 945G/GM it's in FW_BLC_SELF, for
915GM it's in INSTPM. For 915G I can't find one in the docs. Let's drop
a FIXME about this, in case someone with the hardware is ever bored
enough to look for it.
Ville Syrjälä [Fri, 29 Jul 2016 14:57:01 +0000 (17:57 +0300)]
drm/i915: Always use cpp==4 for FW_BLC_SELF on 915GM/945GM
Bspec says:
"FW_BLC_SELF
...
Programming Note [DevALV] and [DevCST]: When calculating watermark
values for 15/16bpp, assume 32bpp for purposes of calculation using
the high priority bandwidth analysis spreadsheet."
Let's do that.
Perhaps this might even help with the problem that resulted in
commit 2ab1bc9df01d ("drm/i915: Disable self-refresh for untiled fbs on i915gm")
Ville Syrjälä [Tue, 12 Jul 2016 12:59:35 +0000 (15:59 +0300)]
drm/i915: Simplify intel_ddi_get_encoder_port()
We no longer have any need to look up the intel_digital_port based
on the passed in intel_encoder, but we still want to look up the port.
Let's just move that logic into intel_ddi_get_encoder_port() and drop
the dig_port stuff.
Ville Syrjälä [Tue, 12 Jul 2016 12:59:33 +0000 (15:59 +0300)]
drm/i915: Split DP/eDP/FDI and HDMI/DVI DDI buffer programming apart
DDI buffer prorgramming works quite differently depending on
the mode of the DDI port (DP/eDP/FDI vs. HDMI/DVI). Let's split
the function that does the programming into two matching variants
as well.
Ville Syrjälä [Tue, 12 Jul 2016 12:59:31 +0000 (15:59 +0300)]
drm/i915: Move bxt_ddi_vswing_sequence() call into intel_ddi_pre_enable() for HDMI
Now that the SKL iboost programming is done from intel_ddi_pre_enable()
for HDMI, let's move the BXT bxt_ddi_vswing_sequence() call there as
well. This makes things look more similar to the DP/eDP case which
is handled in ddi_signal_levels().
Ville Syrjälä [Tue, 12 Jul 2016 12:59:28 +0000 (15:59 +0300)]
drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
Bspec says:
"For DDIA with x4 capability (DDI_BUF_CTL DDIA Lane Capability Control =
DDIA x4), the I_boost value has to be programmed in both
tx_blnclegsctl_0 and tx_blnclegsctl_4."
Currently we only program tx_blnclegsctl_0. Let's do the other one as
well.
Chris Wilson [Tue, 2 Aug 2016 10:15:27 +0000 (11:15 +0100)]
drm/i915: Protect older gen against intel_gt_init_powersave()
In the middle of intel_gt_init_powersave() we have an if-chain that ends
with a universal else clause to read gen6+ registers. Older platforms
like Pineview that end up here do not like those registers and may even
OOPS whilst reading them!
Fixes: 3ea9a80132 ("drm/i915: Perform static RPS frequency setup ...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470132927-1821-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
David Weinehall [Mon, 1 Aug 2016 14:33:27 +0000 (17:33 +0300)]
drm/i915/debugfs: Take runtime_pm ref for sseu
When reading the SSEU statistics, we need to call
intel_runtime_pm_get() first, otherwise we might end up
triggering "Device suspended during HW access".
Chris Wilson [Thu, 28 Jul 2016 23:45:35 +0000 (00:45 +0100)]
drm/i915: Add missing ring_mask to Pineview
It appears that we never told Pineview it has a RENDER_RING. This was
all fine until we started using the ring_mask for determining all the
available rings to initialise for legacy ringbuffer submission in commit 88d2ba2e95c8 ("drm/i915: Unify engine init loop"). Though really it is a
latent bug since the ring_mask inception in commit 73ae478cdf6a
("drm/i915: Replace has_bsd/blt/vebox with a mask").
To prevent similar mishaps in future, add a WARN_ON() if we find
ourselves with a device without any rings.
Fixes: 73ae478cdf6a ("drm/i915: Replace has_bsd/blt/vebox with a mask") Fixes: 88d2ba2e95c8 ("drm/i915: Unify engine init loop") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ben Widawsky <ben@bwidawsk.net> Link: http://patchwork.freedesktop.org/patch/msgid/1469749535-2382-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: drm-intel-fixes@lists.freedesktop.org
Ville Syrjälä [Mon, 23 May 2016 14:42:48 +0000 (17:42 +0300)]
drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB
SNB (and IVB too I suppose) starts to misbehave if the GPU gets stuck
in an infinite batch buffer loop. The GPU apparently hogs something
critical and CPUs start to lose interrupts and whatnot. We can keep
the system limping along by unmasking some interrupts in
GEN6_PMINTRMSK. The EI up interrupt has been previously chosen for
that task, so let's never mask it.
Chris Wilson [Wed, 27 Jul 2016 08:07:30 +0000 (09:07 +0100)]
drm/i915: Update a couple of hangcheck comments to talk about engines
We still have lots of comments that refer to the old ring when we mean
struct intel_engine_cs and its hardware correspondence. This patch fixes
an instance inside hangcheck to talk about engines.
Chris Wilson [Wed, 27 Jul 2016 08:07:28 +0000 (09:07 +0100)]
drm/i915: Avoid using intel_engine_cs *ring for GPU error capture
Inside the error capture itself, we refer to not only the hardware
engine, its ringbuffer but also the capture state. Finding clear names
for each whilst avoiding mixing ring/intel_engine_cs is tricky. As a
compromise we keep using ering for the error capture.
v2: Use 'ee' locals for struct drm_i915_error_engine
Chris Wilson [Wed, 27 Jul 2016 08:07:27 +0000 (09:07 +0100)]
drm/i915: Use engine to refer to the user's BSD intel_engine_cs
This patch transitions the execbuf engine selection away from using the
ring nomenclature - though we still refer to the user's incoming
selector as their user_ring_id.
Ville Syrjälä [Wed, 13 Jul 2016 13:32:03 +0000 (16:32 +0300)]
drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL
Bspec tells us to keep bashing the PCU for up to 3ms when trying to
inform it about an upcoming change in the cdclk frequency. Currently
we only keep at it for 15*10usec (+ whatever delays gets added by
the sandybridge_pcode_read() itself). Let's change the limit to 3ms.
I decided to keep 10 usec delay per iteration for now, even though
the spec doesn't really tell us to do that.
Cc: stable@vger.kernel.org Fixes: 5d96d8afcfbb ("drm/i915/skl: Deinit/init the display at suspend/resume") Cc: David Weinehall <david.weinehall@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468416723-23440-1-git-send-email-ville.syrjala@linux.intel.com Tested-by: David Weinehall <david.weinehall@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris Wilson [Tue, 26 Jul 2016 11:01:53 +0000 (12:01 +0100)]
drm/i915: Only drop the batch-pool's object reference
The obj->batch_pool_link is only inspected when traversing the batch
pool list and when on the batch pool list the object is referenced. Thus
when freeing the batch pool list, we only need to unreference the object
and do not have to worry about the obj->batch_pool_link.
Chris Wilson [Tue, 26 Jul 2016 11:01:52 +0000 (12:01 +0100)]
drm/i915: Only clear the client pointer when tearing down the file
Upon release of the file (i.e. the user calls close(fd)), we decouple
all objects from the client list so that we don't chase the dangling
file_priv. As we always inspect file_priv first, we only need to nullify
that pointer and can safely ignore the list_head.
Chris Wilson [Tue, 26 Jul 2016 11:01:50 +0000 (12:01 +0100)]
drm/i915: Reduce breadcrumb lock coverage for intel_engine_enable_signaling()
Since intel_engine_enable_signaling() is now only called via
fence_enable_sw_signaling(), we can rely on it to provide serialisation
and run-once for us and so make ourselves slightly simpler.
Chris Wilson [Sun, 24 Jul 2016 09:10:21 +0000 (10:10 +0100)]
drm/i915: Update the breadcrumb interrupt counter before enabling
In order to close a race with a long running hangcheck comparing a stale
interrupt counter with a just started waiter, we need to first bump the
counter as we start the fresh wait.
Chris Wilson [Sun, 24 Jul 2016 09:10:20 +0000 (10:10 +0100)]
drm/i915: Drop racy markup of missed-irqs from idle-worker
During the idle-worker we disable the hangcheck and so kick any waiters
that should have been completed (since the GPU is now idle). Unlike the
hangcheck, we do not take any care to avoid the race between the irq
handler and ourselves, and so it is possible for us to declare a missed
interrupt even as the bottom-half is being scheduled to run. Let's
ignore this race to stop a potential false-positive error.
Chris Wilson [Thu, 21 Jul 2016 20:16:19 +0000 (21:16 +0100)]
Revert "drm/i915: Enable RC6 immediately"
This reverts commit b12e0ee2080c ("drm/i915: Enable RC6 immediately"),
as it was never meant to be sent anywhere other than the bug report for
experimentation.
I forgot to remove these when reworking the firmware loading sequence
last year. The new sequence is that we load firmware, and if it's not
there we entirely (and permanently) fail dmc setup.
Dave Gordon [Thu, 21 Jul 2016 17:39:38 +0000 (18:39 +0100)]
drm/i915: use i915_gem_object_put_unlocked() after releasing mutex
The exit path in intel_overlay_put_image_ioctl() first unlocks the
struct_mutex, then drops its reference to 'new_bo' by calling
i915_gem_object_put(). As it isn't holding the mutex at this point,
this should be i915_gem_object_put_unlocked().
This was previously correct but got splatted in the recent
s/drm_gem_object_unreference/i915_gem_object_put/
where the _unlocked suffix was lost in this one case.
v2: don't bother fixing whitespace glitch [Chris Wilson]
Chris can do it next time he touches gem_evict.c ;)
Dave Gordon [Wed, 20 Jul 2016 17:16:07 +0000 (18:16 +0100)]
drm/i915: rename & update eb_select_ring()
'ring' is an old deprecated term for a GPU engine, so we're trying to
phase out all such terminology. eb_select_ring() not only has 'ring'
(meaning engine) in its name, but it has an ugly calling convention
whereby it returns an errno and stores a pointer-to-engine indirectly
through an output parameter. As there is only one error it ever returns
(-EINVAL), we can make it return the pointer directly, and have the
caller pass back the error code -EINVAL if the pointer result is NULL.
Thus we can replace
- ret = eb_select_ring(dev_priv, file, args, &engine);
- if (ret)
- return ret;
with
+ engine = eb_select_engine(dev_priv, file, args);
+ if (!engine)
+ return -EINVAL;
for increased clarity and maybe save a few cycles too.
Dave Gordon [Wed, 20 Jul 2016 17:16:06 +0000 (18:16 +0100)]
drm/i915: rename 'ring' where it refers to an engine or engine_id
'ring' is an old deprecated term for a GPU engine. Chris Wilson wants to
use the name for what is currently known as an intel_ringbuffer, but it
will be dreadfully confusing if some rings are ringbuffers but other
rings are still engines. So this patch changes the names of a bunch of
parameters called 'ring' to either 'engine' or 'engine_id' according to
what they actually are.
Dave Gordon [Wed, 20 Jul 2016 17:16:05 +0000 (18:16 +0100)]
drm/i915: rename macro parameter(ring) to (engine)
'ring' is an old deprecated term for a GPU engine. Here we make the
terminology more consistent by renaming the 'ring' parameter of lots of
macros that calculate addresses within the MMIO space of an engine.
Add WaDisableGatherAtSetShaderCommonSlice for all gen9 as stated
by bspec. The bspec told to put this workaround to the per ctx bb.
Initial implementation and subsequent review were done based on
bspec. Arun raised a suspicion that this would belong to indirect bb
instead and he conducted more throughout investigation on the matter
and indeed the documentation was wrong.
v2: Move to indirect_ctx wa bb, as it is correct place (Arun)
References: HSD#2135817 Cc: Arun Siluvery <arun.siluvery@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> (v1) Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469013973-24104-1-git-send-email-mika.kuoppala@intel.com
Chris Wilson [Wed, 20 Jul 2016 12:31:57 +0000 (13:31 +0100)]
drm/i915: Convert i915_semaphores_is_enabled over to early sanitize
Rather than recomputing whether semaphores are enabled, we can do that
computation once during early initialisation as the i915.semaphores
module parameter is now read-only.
Chris Wilson [Wed, 20 Jul 2016 12:31:55 +0000 (13:31 +0100)]
drm/i915: Treat ringbuffer writes as write to normal memory
Ringbuffers are now being written to either through LLC or WC paths, so
treating them as simply iomem is no longer adequate. However, for the
older !llc hardware, the hardware is documentated as treating the TAIL
register update as serialising, so we can relax the barriers when filling
the rings (but even if it were not, it is still an uncached register write
and so serialising anyway.).
For simplicity, let's ignore the iomem annotation.
v2: Remove iomem from ringbuffer->virtual_address
v3: And for good measure add iomem elsewhere to keep sparse happy
As these are wrappers around kref_get/kref_put() it is preferable to
follow the naming convention and use the same verb get/put in our
wrapper names for manipulating a reference to the context.
Chris Wilson [Wed, 20 Jul 2016 08:21:15 +0000 (09:21 +0100)]
drm/i915: Wait on external rendering for GEM objects
When transitioning to the GTT or CPU domain we wait on all rendering
from i915 to complete (with the optimisation of allowing concurrent read
access by both the GPU and client). We don't yet ensure all rendering
from third parties (tracked by implicit fences on the dma-buf) is
complete. Since implicitly tracked rendering by third parties will
ignore our cache-domain tracking, we have to always wait upon rendering
from third-parties when transitioning to direct access to the backing
store. We still rely on clients notifying us of cache domain changes
(i.e. they need to move to the GTT read or write domain after doing a CPU
access before letting the third party render again).
v2:
This introduces a potential WARN_ON into i915_gem_object_free() as the
current i915_vma_unbind() calls i915_gem_object_wait_rendering(). To
hit this path we first need to render with the GPU, have a dma-buf
attached with an unsignaled fence and then interrupt the wait. It does
get fixed later in the series (when i915_vma_unbind() only waits on the
active VMA and not all, including third-party, rendering.
To offset that risk, use the __i915_vma_unbind_no_wait hack.
Chris Wilson [Wed, 20 Jul 2016 08:21:14 +0000 (09:21 +0100)]
drm/i915: Mark imported dma-buf objects as being coherent
A foreign dma-buf does not share our cache domain tracking, and we rely
on the producer ensuring cache coherency. Marking them as being in the
CPU domain is incorrect.
v2: Add commentary about the GTT domain. This is not the best place for
it, but pending an actual overhaul of our domain tracking and explaining
each one, this comment should help the next reader...
Chris Wilson [Wed, 20 Jul 2016 08:21:13 +0000 (09:21 +0100)]
drm/i915: Disable waitboosting for mmioflips/semaphores
Since commit a6f766f39751 ("drm/i915: Limit ring synchronisation (sw
sempahores) RPS boosts") and commit bcafc4e38b6a ("drm/i915: Limit mmio
flip RPS boosts") we have limited the waitboosting for semaphores and
flips. Ideally we do not want to boost in either of these instances as no
userspace consumer is waiting upon the results (though a userspace producer
may be stalled trying to submit an execbuf - but in this case the
producer is being throttled due to the engine being saturated with
work). With the introduction of NO_WAITBOOST in the previous patch, we
can finally disable these needless boosts.
Chris Wilson [Wed, 20 Jul 2016 08:21:12 +0000 (09:21 +0100)]
drm/i915: Disable waitboosting for fence_wait()
We want to restrict waitboosting to known process contexts, where we can
track which clients are receiving waitboosts and prevent excessive power
wasting. For fence_wait() we do not have any client tracking and so that
leaves it open to abuse.
v2: Hide the IS_ERR_OR_NULL testing for special clients
Chris Wilson [Wed, 20 Jul 2016 08:21:11 +0000 (09:21 +0100)]
drm/i915: Derive GEM requests from dma-fence
dma-buf provides a generic fence class for interoperation between
drivers. Internally we use the request structure as a fence, and so with
only a little bit of interfacing we can rebase those requests on top of
dma-buf fences. This will allow us, in the future, to pass those fences
back to userspace or between drivers.
v2: The fence_context needs to be globally unique, not just unique to
this device.
Chris Wilson [Wed, 20 Jul 2016 08:21:10 +0000 (09:21 +0100)]
drm/i915: Mark all current requests as complete before resetting them
Following a GPU reset upon hang, we retire all the requests and then
mark them all as complete. If we mark them as complete first, we both
keep the normal retirement order (completed first then retired) and
provide a small optimisation for concurrent lookups.
Chris Wilson [Wed, 20 Jul 2016 08:21:09 +0000 (09:21 +0100)]
drm/i915: Retire oldest completed request before allocating next
In order to keep the memory allocated for requests reasonably tight, try
to reuse the oldest request (so long as it is completed and has no
external references) for the next allocation.
v2: Throw in a comment to hopefully make sure no one mistakes the
optimistic retirement of the oldest request for simply stealing it.
We have latency issues that might impact the performance: #96606.
and hangs and loading issues on resume after S4: #96526.
This is also blocking a platform milestone so let's disable
this for now while we make sure we don't have any more loading
issue, or related basic hangs and it pass BAT for real in all
platofmrs.
In case BAT is wrong let's first fix BAT before re-enable it here.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96606
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96526 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Stable <stable@vger.kernel.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Christophe Prigent <christophe.prigent@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1468884477-30086-1-git-send-email-rodrigo.vivi@intel.com
Imre Deak [Fri, 1 Jul 2016 14:32:08 +0000 (17:32 +0300)]
drm/i915: Give proper names to MOCS entries
The purpose for each MOCS entry isn't well defined atm. Defining these
is important to remove any uncertainty about the use of these entries
for example in terms of performance and GPU/CPU coherency.
Suggested by Ville.
v4:
- Rename I915_MOCS_AUTO to I915_MOCS_PTE. (Chris)
Imre Deak [Fri, 1 Jul 2016 13:40:05 +0000 (16:40 +0300)]
drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config
Setting a write-back cache policy in the MOCS entry definition also
implies snooping, which has a considerable overhead. This is
unexpected for a few reasons:
- From user-space's point of view since it didn't want a coherent
surface (it didn't set the buffer as such via the set caching IOCTL).
- There is a separate MOCS entry field for snooping (which we never
set).
- This MOCS table is about caching in (e)LLC and there is no (e)LLC on
BXT. There is a separate table for L3 cache control.
Considering the above the current behavior of snooping looks like an
unintentional side-effect of the WB setting. Changing it to be LLC-UC
gets rid of the snooping without any ill-effects. For a coherent
surface the application would use a separate MOCS entry at index 1 and
call the set caching IOCTL to setup the PTE entries for the
corresponding buffer to be snooped. In the future we could also add a
new MOCS entry for coherent surfaces.
This resulted in 70% improvement in synthetic texturing benchmarks.
Kudos to Valtteri Rantala, Eero Tamminen and Michael T Frederick and
Ville who helped to narrow the source of problem to the kernel and to
the snooping behaviour in particular.
With a follow-up change to adjust the 3rd entry value
igt/gem_mocs_settings is passing after this change.
v2:
- Rebase on v2 of patch 1/2.
v3:
- Set the entry as LLC uncached instead of PTE-passthrough. This way
we also keep snooping disabled, but we also make the cacheability/
coherency setting indepent of the PTE which is managed by the
kernel. (Chris)
CC: Rong R Yang <rong.r.yang@intel.com> CC: Yakui Zhao <yakui.zhao@intel.com> CC: Valtteri Rantala <valtteri.rantala@intel.com> CC: Eero Tamminen <eero.t.tamminen@intel.com> CC: Michael T Frederick <michael.t.frederick@intel.com> CC: Ville Syrjälä <ville.syrjala@linux.intel.com> CC: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Acked-by: Zhao Yakui <yakui.zhao@intel.com> Tested-by: Rong R Yang <rong.r.yang@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467380406-11954-3-git-send-email-imre.deak@intel.com
Imre Deak [Fri, 1 Jul 2016 13:40:04 +0000 (16:40 +0300)]
drm/i915/gen9: Clean up MOCS table definitions
Use named struct initializers for clarity. Also fix the target cache
definition to reflect its role in GEN9 onwards. On GEN8 a TC value of 0
meant ELLC but on GEN9+ it means the TC and LRU controls are taken from
the PTE.
No functional change, igt/gem_mocs_settings still passing after this
change.
v2: (Chris)
- Add back the hexa literals for the entries.
Add note that igt/gem_mocs_settings still passes.
Daniel Vetter [Fri, 15 Jul 2016 19:48:06 +0000 (21:48 +0200)]
drm/i915: Clean up kerneldoc for intel_lrc.c
Fairly minimal, there's still lots of functions without any docs, and
which aren't static. But probably we want to first clean this up some more.
- Drop the bogus const. Marking argument pointers themselves (instead of
what they point at) as const provides roughly 0 value. And it's confusing,
since the data the pointer points at _is_ being changed.
- Remove kerneldoc for static functions. Keep comments where they seem valuable.
- Indent and whitespace fixes.
- Blockquote the bit field definitions of the descriptor for correct layouting.
Daniel Vetter [Fri, 15 Jul 2016 19:48:05 +0000 (21:48 +0200)]
drm/i915: Fixup kerneldoc code snippets in intel_uncore.c
We need :: before, blank lines around and indentation with 4 _additional_
spaces to make it work. Also, don't use @param in code snippets, it results
in confusion.
Chris Wilson [Sat, 16 Jul 2016 17:42:36 +0000 (18:42 +0100)]
drm/i915: Handle ENOSPC after failing to insert a mappable node
Even after adding individual page support for GTT mmaping, we can still
fail to find any space within the mappable region, and
drm_mm_insert_node() will then report ENOSPC. We have to then handle
this error by using the shmem access to the pages.
Fixes: b50a53715f09 ("drm/i915: Support for pread/pwrite ... objects")
Testcase: igt/gem_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com Link: http://patchwork.freedesktop.org/patch/msgid/1468690956-23480-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Dave Gordon [Thu, 14 Jul 2016 13:52:04 +0000 (14:52 +0100)]
drm/i915: refactor eb_get_batch()
Precursor for fix to secure batch execution. We will need to be able to
retrieve the batch VMA (as well as the batch itself) from the eb list,
so this patch extracts that part of eb_get_batch() into a separate
function, and moves both parts to a more logical place in the file, near
where the eb list is created.
Also, it may not be obvious, but the current execbuffer2 ioctl interface
requires that the buffer object containing the batch-to-be-executed be
the LAST entry in the exec2_list[] array (I expected it to be the
first!).
To clarify this, we can replace the rather obscure construct
"list_entry(eb->vmas.prev, ...)"
in the old version of eb_get_batch() with the equivalent but more
explicit
"list_last_entry(&eb->vmas,...)"
in the new eb_get_batch_vma() and of course add an explanatory comment.
Dave Gordon [Thu, 14 Jul 2016 13:52:03 +0000 (14:52 +0100)]
drm/i915: compile-time consistency check on __EXEC_OBJECT flags
Two different sets of flag bits are stored in the 'flags' member of a
'struct drm_i915_gem_exec_object2', and they're defined in two different
source files, increasing the risk of an accidental clash.
Some flags in this field are supplied by the user; these are defined in
i915_drm.h, and they start from the LSB and work up.
Other flags are defined in i915_gem_execbuffer, for internal use within
that file only; they start from the MSB and work down.
So here we add a compile-time check that the two sets of flags do not
overlap, which would cause all sorts of confusion.
Bob Paauwe [Fri, 15 Jul 2016 13:59:02 +0000 (14:59 +0100)]
drm/i915: Set legacy properties when using legacy gamma set IOCTL. (v2)
The i915 driver is now using atomic properties and atomic commit
to handle the legacy set gamma IOCTL. However, if the driver is
configured without atomic (nuclear_pageflip = false), it won't
update the legacy properties for degamma_lut, gamma_lut and ctm
leaving them out of sync with the atomic version of the properties.
Until the driver is full atomic, make sure we update the non-atomic
version of the properties.
v2: Update the comment with a FIXME. (Daniel)
v3: Update arguments of the gamma_set vfunc (Lionel)