Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/mmu: implement base for new vm management
This is the first chunk of the new VMM code that provides the structures
needed to describe a GPU virtual address-space layout, as well as common
interfaces to handle VMM creation, and connecting instances to a VMM.
The constructor now allocates the PD itself, rather than having the user
handle that manually. This won't/can't be used until after all backends
have been ported to these interfaces, so a little bit of memory will be
wasted on Fermi and newer for a couple of commits in the series.
Compatibility has been hacked into the old code to allow each GPU backend
to be ported individually.
GP100 "big" (which is a funny name, when it supports "even bigger") page
tables are small enough that we want to be able to suballocate them from
a larger block of memory.
This builds on the previous page table cache interfaces so that the VMM
code doesn't need to know the difference.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/mmu: implement page table cache
Builds up and maintains a small cache of each page table size in order
to reduce the frequency of expensive allocations, particularly in the
pathological case where an address range ping-pongs between allocated
and free.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau: allocate vram with nvkm_ram_get()
This will cause a subtle behaviour change on GPUs that are in mixed-memory
configurations in that VRAM in the degraded section of VRAM will no longer
be used for TTM buffer objects.
That section of VRAM is not meant to be used for displayable/compressed
surfaces, and we have no reliable way with the current interfaces to be
able to make that decision properly.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/fb/ram: add interface to allocate vram as an nvkm_memory object
Upcoming MMU changes use nvkm_memory as its basic representation of memory,
so we need to be able to allocate VRAM like this.
The code is basically identical to the current chipset-specific allocators,
minus support for compression tags (which will be handled elsewhere anyway).
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/core/memory: add mechanism to retrieve allocation granularity
Needed by VMM code to determine whether an allocation is compatible with
a given page size (ie. you can't map 4KiB system memory pages into 64KiB
GPU pages).
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/core/memory: change map interface to support upcoming mmu changes
Map flags (access, kind, etc) are currently defined in either the VMA,
or the memory object, which turns out to not be ideal for things like
suballocated buffers, etc.
These will become per-map flags instead, so we need to support passing
these arguments in nvkm_memory_map().
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/core/memory: comptag allocation
nvkm_memory is going to be used by the upcoming mmu rework for the basic
representation of a memory allocation, as such, this commit adds support
for comptag allocation to nvkm_memory.
This is very simple for now, in that it requires comptags for the entire
memory allocation even if only certain ranges are compressed.
Support for tracking ranges will be added at a later date.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/fb: move comptags mm into nvkm_fb
We're moving towards having a central place to handle comptag allocation,
and as some GPUs don't have a ram submodule (ie. Tegra), we need to move
the mm somewhere else.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau: hang drm client of a master
TTM memory allocations will be hanging off the DRM's client, but the
locking needed to do so gets really tricky with all the other use of
the DRM's object tree.
To solve this, we make the normal DRM client a child of a new master,
where the memory allocations will be done from instead.
This also solves a potential race with client creation.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/imem/nv50: support eviction of BAR2 mappings
A good deal of the structures we map into here aren't accessed very often
at all, and Fedora 26 has exposed an issue where after creating a heap of
channels, BAR2 space would run out, and we'd need to make use of the slow
path while accessing important structures like page tables.
This implements an LRU on BAR2 space, which allows eviction of mappings
that aren't currently needed, to make space for other objects.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj
This is not as simple as it was for earlier GPUs, due to the need to swap
accessor functions depending on whether BAR2 is usable or not.
We were previously protected by nvkm_instobj's accessor functions keeping
an object mapped permanently, with some unclear magic that managed to hit
the slow-path where needed even if an object was marked as mapped.
That's been replaced here by reference counting maps (some objects, like
page tables can be accessed concurrently), and swapping the functions as
necessary.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/imem/nv50: move slow-path locking into rd/wr functions
This is to simplify upcoming changes. The slow-path is something that
currently occurs during bootstrap of the BAR2 VMM, while backing up an
object during suspend/resume, or when BAR2 address space runs out.
The latter is a real problem that can happen at runtime, and occurs in
Fedora 26 already (due to some change that causes a lot of channels to
be created at login), so ideally we'd prefer not to make it any slower.
We'd also like suspend/resume speed to not suffer.
Upcoming commits will solve those problems in a better way, making the
extra overhead of moving the locking here a non-issue.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/bar: prevent BAR2 mapping of objects during destructor
GP100's page table nests a lot more deeply than the GF100-compatible
layout we're currently using, which means our hackish-but-simple way
of dealing with BAR2 VMM teardown won't work anymore.
In order to sanely handle the chicken-and-egg (BAR2's PTs get mapped
into themselves) problem, we need prevent page tables getting mapped
back into BAR2 during the destruction of its VMM.
To do this, we simply key off the state that's now maintained by the
BAR2 init/fini functions.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/core/object: allow arguments to be passed to map function
MMU will be needing this to specify kind info on BAR mappings.
We have no userspace currently using these interfaces, so break the ABI
instead of supporting both. NVIF version bump so any future use can be
guarded.
Ben Skeggs [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau: fix handling of GART OOM on pre-NV50 chipsets
The correct thing to do on OOM is to return 0 and set mm_node to NULL,
otherwise TTM will assume some other kind of error, and not attempt to
evict other buffers to make space.
Jérémy Lefaure [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/bios/init: use ARRAY_SIZE
Using the ARRAY_SIZE macro improves the readability of the code. Also,
it is useless to re-invent it.
Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
(sizeof(E)@p /sizeof(*E))
|
(sizeof(E)@p /sizeof(E[...]))
|
(sizeof(E)@p /sizeof(T))
)
Rhys Kidd [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau: Document nouveau support for Tegra in DRIVER_DESC
nouveau supports the Tegra K1 and higher after the SoC-based GPUs converged
with the main GeForce GPU families.
v2:
- Qualify that support is Tegra K1+ (Martin Peres)
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Martin Peres <martin.peres@free.fr> Acked-by: Pierre Moreau <pierre.morrow@free.fr> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Rhys Kidd [Tue, 31 Oct 2017 17:56:19 +0000 (03:56 +1000)]
drm/nouveau/therm/gp100: initial implementation of new gp1xx temperature sensor
v2:
- add nv138 and drop nv13b chipsets (Ilia Mirkin)
- refactor out status variable and instead mask tsensor (Ilia Mirkin)
- switch SHADOWed state message away from nvkm_error() (Ilia Mirkin)
- rename internal temperature variable (Karol Herbst)
v3:
- use nvkm_trace() for SHADOWed state message (Ben Skeggs)
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Dave Airlie [Thu, 2 Nov 2017 01:29:28 +0000 (11:29 +1000)]
Merge tag 'drm-msm-next-2017-11-01' of git://people.freedesktop.org/~robclark/linux into drm-next
+ preemption support for a5xx[1][2]
+ display fixes for 8x96 (snapdragon 820) including fixes for 4k scanout
(hwpipe assignment re-work to handle multiple hwpipe assigned to plane
for wide scanout)
+ async cursor plane updates and fixes
+ refactor adreno_bind/hwinit.. still defer fw loading until device open,
but move clk/irq/etc to probe/bind time to fix issues when fw isn't
present in filesys
+ clk/dt bindings cleanups w/ backward compat via msm_clk_get() (dt docs
part ack'ed by Rob Herring)
+ fw loading re-work with helper to handle either /lib/firmware/qcom/$fw
or /lib/firmware/$fw.. background, we've started landing fw for some of
generations in linux-firmware, but there is a preference to put fw files
under 'qcom' subdirectory, which is not what was done on android or for
people who copied fw from android. So now we first look in qcom subdir
and then fallback to the original location.
+ bunch of GPU debugging enhancements, to dump full cmdline of processes
that trigger faults, and to add a new debugfs to capture cmdstream of
just submits that triggered faults.. both quite useful for piglit ;-)
* tag 'drm-msm-next-2017-11-01' of git://people.freedesktop.org/~robclark/linux: (38 commits)
drm/msm: use %z format modifier for printing size_t
drm/msm/mdp5: Don't use async plane update path if plane visibility changes
drm/msm/mdp5: mdp5_crtc: Restore cursor state only if LM cursors are enabled
drm/msm/mdp5: Update mdp5_pipe_assign to spit out both planes
drm/msm/mdp5: Prepare mdp5_pipe_assign for some rework
drm/msm: remove mdp5_cursor_plane_funcs
drm/msm: update cursors asynchronously through atomic
drm/msm/atomic: switch to drm_atomic_helper_check
drm/msm/mdp5: restore cursor state when enabling crtc
drm/msm/mdp5: don't use autosuspend
drm/msm/mdp5: ignore planes that are not visible
drm/msm: dump submits which triggered gpu hang
drm/msm: preserve IOVAs in submit's bo table
drm/msm/rd: allow adding addition msg to top of dump
drm/msm: split rd debugfs file
drm/msm: add special _get_vaddr_active() for cmdstream dumps
drm/msm: show task cmdline in gpu recovery messages
drm/msm: dump a rd GPUADDR header for all buffers in the command
drm/msm: Removed unused struct_mutex_task
drm/msm: Implement preemption for A5XX targets
...
Arnd Bergmann [Thu, 3 Aug 2017 11:50:48 +0000 (13:50 +0200)]
drm/msm: use %z format modifier for printing size_t
The return type of ARRAY_SIZE() is size_t, so we have to use
%zu instead of %lu to avoid this warning:
drivers/gpu/drm/msm/msm_gpu.c: In function 'msm_gpu_init':
drivers/gpu/drm/msm/msm_gpu.c:742:31: error: format '%lu' expects argument of type 'long unsigned int', but argument 7 has type 'unsigned int' [-Werror=format=]
The warning it otherwise harmless as size_t is always the
same size as unsigned long in all supported architectures,
but gcc doesn't know that.
Fixes: c2fceabca6d5 ("drm/msm: Support multiple ringbuffers") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rob Clark <robdclark@gmail.com>
Peter Griffin [Tue, 15 Aug 2017 14:14:25 +0000 (15:14 +0100)]
drm/hisilicon: Ensure LDI regs are properly configured.
This patch fixes the following soft lockup:
BUG: soft lockup - CPU#0 stuck for 23s! [weston:307]
On weston idle-timeout the IP is powered down and reset
asserted. On weston resume we get a massive vblank
IRQ storm due to the LDI registers having lost some state.
This state loss is caused by ade_crtc_atomic_begin() not
calling ade_ldi_set_mode(). With this patch applied
resuming from Weston idle-timeout works well.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Cc: stable@vger.kernel.org Reviewed-by: Xinliang Liu <xinliang.liu@linaro.org> Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
2) In mac80211, validate user rate mask before configuring it. From
Johannes Berg.
3) Properly enforce memory limits in fair queueing code, from Toke
Hoiland-Jorgensen.
4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.
5) Fix TSO header allocation and management in mvpp2 driver, from Yan
Markman.
6) Don't take socket lock in BH handler in strparser code, from Tom
Herbert.
7) Don't show sockets from other namespaces in AF_UNIX code, from
Andrei Vagin.
8) Fix double free in error path of tap_open(), from Girish Moodalbail.
9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
and Alexander Duyck.
10) Fix DCB mode programming in stmmac driver, from Jose Abreu.
11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
Long.
12) Properly align SKB head before building SKB in tuntap, from Jason
Wang.
13) Avoid matching qdiscs with a zero handle during lookups, from Cong
Wang.
14) Fix various endianness bugs in sctp, from Xin Long.
15) Fix tc filter callback races and add selftests which trigger the
problem, from Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
selftests: Introduce a new test case to tc testsuite
selftests: Introduce a new script to generate tc batch file
net_sched: fix call_rcu() race on act_sample module removal
net_sched: add rtnl assertion to tcf_exts_destroy()
net_sched: use tcf_queue_work() in tcindex filter
net_sched: use tcf_queue_work() in rsvp filter
net_sched: use tcf_queue_work() in route filter
net_sched: use tcf_queue_work() in u32 filter
net_sched: use tcf_queue_work() in matchall filter
net_sched: use tcf_queue_work() in fw filter
net_sched: use tcf_queue_work() in flower filter
net_sched: use tcf_queue_work() in flow filter
net_sched: use tcf_queue_work() in cgroup filter
net_sched: use tcf_queue_work() in bpf filter
net_sched: use tcf_queue_work() in basic filter
net_sched: introduce a workqueue for RCU callbacks of tc filter
sctp: fix some type cast warnings introduced since very beginning
sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
sctp: fix some type cast warnings introduced by transport rhashtable
sctp: fix some type cast warnings introduced by stream reconf
...