Ben Skeggs [Wed, 1 Jun 2022 10:48:14 +0000 (20:48 +1000)]
drm/nouveau/gr/gf117-: make ppc_nr[gpc] accurate
We're going to be pulling in a chunk of code from NVGPU to fix up our
SMID mappings on Volta and above, which depends on ppc_nr[gpc]
reflecting the actual number of PPCs present, not the maximum number.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
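To illustrate the difference between "maximum" and "actual" PPC counts, here is a minimal sketch. It assumes a hypothetical per-GPC bitmask in which bit i is set when PPC i is present; the mask layout and every sketch_* name are illustrative assumptions, not taken from the nouveau source.

```c
#include <assert.h>
#include <stdint.h>

/* Count the PPCs actually present in one GPC (hypothetical mask layout:
 * bit i set means PPC i exists). Bits at or above ppc_max are ignored. */
static unsigned sketch_ppc_count(uint32_t ppc_mask, unsigned ppc_max)
{
	unsigned nr = 0;

	for (unsigned i = 0; i < ppc_max && i < 32; i++) {
		if (ppc_mask & (UINT32_C(1) << i))
			nr++;
	}
	return nr;
}

/* Fill ppc_nr[gpc] with the per-GPC count of present PPCs, instead of
 * storing ppc_max for every GPC. */
static void sketch_fill_ppc_nr(const uint32_t *ppc_mask, unsigned gpc_nr,
			       unsigned ppc_max, uint8_t *ppc_nr)
{
	assert(ppc_mask && ppc_nr);

	for (unsigned gpc = 0; gpc < gpc_nr; gpc++)
		ppc_nr[gpc] = (uint8_t)sketch_ppc_count(ppc_mask[gpc], ppc_max);
}
```

With a maximum of 3 PPCs per GPC, a GPC whose mask is 0x5 would be recorded as having 2 PPCs rather than 3.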
This won't work on Ampere, and it's questionable whether we should have
been using our FW's method of storing the golden context image with NV's
firmware to begin with.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:52 +0000 (20:47 +1000)]
drm/nouveau/acr: use common falcon HS FW code for ACR FWs
Adds context binding and support for FWs with a bootloader to the code
that was added to load VPR scrubber HS binaries, and ports ACR over to
using all of it.
- gv100 split from gp108 to handle FW exit status differences
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:52 +0000 (20:47 +1000)]
drm/nouveau/fb/gp102-: unlock VPR right after devinit
Under memory load, instmem allocations could end up in the regions of
VRAM that are inaccessible right after boot, and be corrupted after a
suspend/resume cycle as a result of being restored before booting the
mem unlock firmware.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:51 +0000 (20:47 +1000)]
drm/nouveau/flcn: new code to load+boot simple HS FWs (VPR scrubber)
Adds the start of common interfaces to load and boot the HS binaries
provided by NVIDIA that enable the usage of GR.
ACR already handles most of this, but it's very much tied into ACR's
init process, and there's other code that could benefit from reusing
a lot of this stuff too (i.e. VBIOS DEVINIT/PreOS, VPR scrubber).
The VPR scrubber code is fairly independent, and a good first target.
- adds better debug output to fw loading process, to ease bring-up/debug
v2:
- whitespace, 0->false
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:51 +0000 (20:47 +1000)]
drm/nouveau/flcn: rework falcon reset
Mostly preparation to fit in Ampere changes, but should result in reset
sequences a lot closer to RM's, and perhaps help out with the issues we
sometimes see reported in this area.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:38 +0000 (20:47 +1000)]
drm/nouveau/fifo: add new channel classes
Exposes a bunch of the new features that became possible as a result
of the earlier commits. DRM will build on this in the future to add
support for features such as SCG ("async compute") and multi-device
rendering, as part of the work necessary to be able to write a
half-decent Vulkan driver - finally.
For the moment, this just crudely ports DRM to the API changes.
- channel class interfaces now the same for all HW classes
- channel group class exposed (SCG)
- channel runqueue selector exposed (SCG)
- channel sub-device id control exposed (multi-device rendering)
- channel names in logging will reflect creating process, not fd owner
- explicit USERD allocation required by VOLTA_CHANNEL_GPFIFO_A and newer
- DRM is smarter about determining the appropriate channel class to use
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:35 +0000 (20:47 +1000)]
drm/nouveau/fifo: add common runlist control
- less dependence on waiting for runlist updates, on GPUs that allow it
- supports runqueue selector in RAMRL entries
- completes switch to common runl/cgrp/chan topology info
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:34 +0000 (20:47 +1000)]
drm/nouveau/fifo: add common channel recovery
That sure was fun to untangle.
- handled per-runlist, rather than globally
- more straightforward process in general
- various potential SW/HW races have been fixed
- fixes lockdep issues that were present in >=gk104's prior implementation
- Volta recovery now actually stands a chance of working
- Volta/Turing waiting for PBDMA idle before engine reset
- Turing using HW-provided TSG info for CTXSW_TIMEOUT
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:33 +0000 (20:47 +1000)]
drm/nouveau/fifo: kill channel on a selection of PBDMA errors
A bunch of these can be handled in such a way that the channel can
continue. However, any of them is a pretty decent sign that something
has gone horribly wrong, and the safest option is to disable the
channel.
This is a bit of a hack; we will want to handle these individually
and dump relevant debug info for each at some point.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:32 +0000 (20:47 +1000)]
drm/nouveau/fifo: add chan start()/stop()
- nvkm_chan_error() built on top, stops channel and sends 'killed' event
- removes an odd double-bashing of channel enable regs on Kepler and up
- pokes doorbell on Turing and up, after enabling channel
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
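The layering described in the chan start()/stop() entry above (an error path built on top of stop(), and a doorbell poke after enabling) can be sketched roughly as follows. This is an illustrative model only: the sketch_* names, the struct fields, and the choice to send the 'killed' event at most once are assumptions, and no real register access is performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical channel state; counters stand in for HW side effects. */
struct sketch_chan {
	bool has_doorbell;      /* assumed true on Turing-and-up style HW */
	bool enabled;
	unsigned doorbells;     /* number of doorbell pokes issued */
	unsigned killed_events; /* number of 'killed' events sent */
};

/* Enable the channel, then poke the doorbell where the HW has one. */
static void sketch_chan_start(struct sketch_chan *chan)
{
	assert(chan);
	chan->enabled = true;
	if (chan->has_doorbell)
		chan->doorbells++;
}

/* Disable the channel. */
static void sketch_chan_stop(struct sketch_chan *chan)
{
	assert(chan);
	chan->enabled = false;
}

/* Error path built on top of stop(): stop the channel and notify.
 * Assumption: a channel that is already stopped is not reported again. */
static void sketch_chan_error(struct sketch_chan *chan)
{
	assert(chan);
	if (!chan->enabled)
		return;
	sketch_chan_stop(chan);
	chan->killed_events++;
}
```

Keeping the error path as a thin wrapper means every caller that kills a channel goes through the same stop sequence.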
Ben Skeggs [Wed, 1 Jun 2022 10:47:30 +0000 (20:47 +1000)]
drm/nouveau/fifo: add new engine context tracking
Channel groups have somewhat more complicated requirements than what we
currently support. An engine context is shared between all channels in
a channel group, VEID/subctx support (later) brings per-VEID components,
and we need to track an individual channel's engine context pointers.
This commit adds the structures and refcounting to support the above,
wrapping the prior implementation for the moment.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
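As a rough sketch of the sharing requirement described in the engine context tracking entry above, the following shows one refcounted engine context per (channel group, engine) pair, handed out to each channel that needs it. All sketch_* names and the fixed engine count are hypothetical; per-VEID components and the real nvkm structures are not modelled.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_ENGINE_NR 4 /* hypothetical number of engines */

/* Engine context shared by all channels in a group. */
struct sketch_ectx {
	int engine;
	unsigned refs;
};

/* Channel group: at most one shared context per engine. */
struct sketch_cgrp {
	struct sketch_ectx *ectx[SKETCH_ENGINE_NR];
};

/* Take a reference on the group's context for an engine, creating the
 * context on first use. Returns NULL on allocation failure. */
static struct sketch_ectx *
sketch_ectx_get(struct sketch_cgrp *cgrp, int engine)
{
	struct sketch_ectx *ectx;

	assert(cgrp && engine >= 0 && engine < SKETCH_ENGINE_NR);

	ectx = cgrp->ectx[engine];
	if (!ectx) {
		ectx = calloc(1, sizeof(*ectx));
		if (!ectx)
			return NULL;
		ectx->engine = engine;
		cgrp->ectx[engine] = ectx;
	}
	ectx->refs++;
	return ectx;
}

/* Drop a reference; the last put frees the shared context. */
static void
sketch_ectx_put(struct sketch_cgrp *cgrp, struct sketch_ectx *ectx)
{
	assert(cgrp);
	if (!ectx)
		return;
	if (--ectx->refs == 0) {
		cgrp->ectx[ectx->engine] = NULL;
		free(ectx);
	}
}
```

Two channels in the same group asking for the same engine receive the same pointer, and the context survives until both have released it.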
Ben Skeggs [Wed, 1 Jun 2022 10:47:29 +0000 (20:47 +1000)]
drm/nouveau/fifo: merge mmu fault handlers together
After updating the GF100 implementation from the GK104/TU102 ones, and using
the new runlist/engine topology info, all three handlers become (almost)
identical.
- there's a temporary kludge to call through to the HW-specific recovery
- engine fault mapping info determined at load time, not on every fault
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:28 +0000 (20:47 +1000)]
drm/nouveau/fifo: move PBDMA init to runq
- bumps pbdma timeout to value RM uses on newer HW
- bumps fb timeout to max from boot default
- one/both of these greatly improves stability on parallel piglit runs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:24 +0000 (20:47 +1000)]
drm/nouveau/fifo: add common runlist/engine topology
Creates an nvkm_runl for each runlist on the GPU, and an nvkm_engn for
each engine that is reachable from a runlist.
- basically what gk104- already does, but extended to all chips
- adds per-runlist CHID allocators (Ampere)
- splits g98/gt2xx out from g84 (different target engines)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:23 +0000 (20:47 +1000)]
drm/nouveau/fifo: add chid allocator
We need to be able to allocate TSG IDs as well as channel IDs. Also,
Ampere has per-runlist channel IDs.
- holds per-ID private data, which will be used for/to protect lookup
- holds an nvkm_event which will be used for events tied to IDs
- not used yet beyond setup, and switching use of "fifo->nr - 1" for
channel ID mask to "chid->mask"
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
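A minimal sketch of an ID allocator with the properties listed in the chid allocator entry above: per-ID private data, a mask derived from the ID count, and the ability to instantiate it more than once (for example once per runlist). The sketch_* names, the power-of-two restriction, and the lowest-free-ID policy are assumptions made for illustration; the event object and any locking are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ID allocator; one instance per ID space. */
struct sketch_chid {
	uint32_t nr;    /* number of IDs (assumed to be a power of two) */
	uint32_t mask;  /* nr - 1, usable as an ID mask */
	uint8_t *used;  /* used[id] != 0 when the ID is allocated */
	void **data;    /* per-ID private data */
};

static void sketch_chid_del(struct sketch_chid *chid)
{
	if (!chid)
		return;
	free(chid->used);
	free(chid->data);
	free(chid);
}

/* Create an allocator for nr IDs; returns NULL if nr is not a power of
 * two or if allocation fails. */
static struct sketch_chid *sketch_chid_new(uint32_t nr)
{
	struct sketch_chid *chid;

	if (nr == 0 || (nr & (nr - 1)))
		return NULL;

	chid = calloc(1, sizeof(*chid));
	if (!chid)
		return NULL;

	chid->used = calloc(nr, sizeof(*chid->used));
	chid->data = calloc(nr, sizeof(*chid->data));
	if (!chid->used || !chid->data) {
		sketch_chid_del(chid);
		return NULL;
	}
	chid->nr = nr;
	chid->mask = nr - 1;
	return chid;
}

/* Allocate the lowest free ID and attach private data; -1 when full. */
static int sketch_chid_get(struct sketch_chid *chid, void *data)
{
	assert(chid);
	for (uint32_t id = 0; id < chid->nr; id++) {
		if (!chid->used[id]) {
			chid->used[id] = 1;
			chid->data[id] = data;
			return (int)id;
		}
	}
	return -1;
}

/* Release an ID and clear its private data. */
static void sketch_chid_put(struct sketch_chid *chid, int id)
{
	assert(chid);
	if (id < 0 || (uint32_t)id >= chid->nr)
		return;
	chid->used[id] = 0;
	chid->data[id] = NULL;
}
```

Because each allocator owns its own ID space, two instances can both hand out ID 0, which is the behaviour per-runlist channel IDs would need.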
Ben Skeggs [Wed, 1 Jun 2022 10:47:21 +0000 (20:47 +1000)]
drm/nouveau/fifo: unify handling of channel classes
Adds the basic skeleton for common channel (group) interfaces.
- common behaviour between <gk104 and >=gk104 impl's
- separates priv/user channel objects
- passthrough to existing object for now, kludges removed later
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Wed, 1 Jun 2022 10:47:19 +0000 (20:47 +1000)]
drm/nouveau/nvkm: add locking to subdev/engine init paths
This wasn't really needed before; the main place this could race is with
channel recovery, but (through potentially fragile means) that shouldn't
have been possible.
However, a number of upcoming patches benefit from having better control
over subdev init, necessitating some improvements here.
- allows subdev/engine oneinit() without init() (host/fifo patches)
- merges engine use locking/tracking into subdev, and extends it to fix
some issues that will arise with future usage patterns (acr patches)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
It's quite a lot of tedious and error-prone work to switch over all the
subdevs at once, so allow an nvkm_intr to request new-style handlers to
be created that wrap the existing interfaces.
This will allow a more gradual transition.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
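The wrapping approach described above is essentially an adapter: a new-style handler object whose callback forwards to the existing old-style interface. A minimal sketch, in which every type, field, and function name is hypothetical and does not reflect the real nvkm signatures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old-style interface: a subdev with a single intr() callback. */
struct sketch_subdev {
	const char *name;
	void (*intr)(struct sketch_subdev *);
	unsigned hits; /* incremented by the example callback below */
};

/* New-style interface: a handler object with its own callback. */
struct sketch_inth {
	bool (*func)(struct sketch_inth *);
};

/* Shim: a new-style handler that wraps an old-style subdev. The handler
 * is the first member, so a pointer to it can be cast back to the shim. */
struct sketch_shim {
	struct sketch_inth inth;
	struct sketch_subdev *subdev;
};

/* New-style callback that forwards to the old-style intr(). */
static bool sketch_shim_func(struct sketch_inth *inth)
{
	struct sketch_shim *shim = (struct sketch_shim *)inth;

	if (!shim->subdev || !shim->subdev->intr)
		return false;
	shim->subdev->intr(shim->subdev);
	return true;
}

static void sketch_shim_init(struct sketch_shim *shim,
			     struct sketch_subdev *subdev)
{
	assert(shim);
	shim->inth.func = sketch_shim_func;
	shim->subdev = subdev;
}

/* Example old-style callback, used only to demonstrate the forwarding. */
static void sketch_old_intr(struct sketch_subdev *subdev)
{
	subdev->hits++;
}
```

Subdevs can then be converted to native new-style handlers one at a time, with the shim removed once none rely on it.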
Ben Skeggs [Wed, 1 Jun 2022 10:46:52 +0000 (20:46 +1000)]
drm/nouveau/intr: support multiple trees, and explicit interfaces
Turing adds a second top-level interrupt tree in HW, in addition to the
trees available via NV_PMC. Most of the interrupts we care about are
exposed in both trees, but not all of them, and we have some rather
nasty hacks to route the fault buffer interrupts.
Ampere removes the NV_PMC trees entirely.
Here we add some infrastructure to be able to handle all of this more
cleanly, as well as providing more explicit control over handlers.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
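One way to picture "multiple trees with explicit handlers" is a table in which each handler names the tree, leaf, and bits it cares about, and dispatch for a given tree only considers handlers registered against that tree. The sketch below is a simplified model under those assumptions; the sketch_* names, the leaf count, and the bit values are hypothetical and no real interrupt registers are read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LEAF_NR 2 /* hypothetical number of leaves per tree */

/* Explicit handler: which tree, which leaf, which bits, what to call. */
struct sketch_inth {
	int tree;
	unsigned leaf;
	uint32_t mask;
	void (*func)(void *priv);
	void *priv;
};

/* Run every handler registered for 'tree' whose bits are pending.
 * 'pending' holds one status word per leaf of that tree.
 * Returns the number of handlers called. */
static unsigned
sketch_intr_dispatch(int tree, const uint32_t *pending,
		     const struct sketch_inth *inth, size_t nr)
{
	unsigned called = 0;

	assert(pending && inth);

	for (size_t i = 0; i < nr; i++) {
		if (inth[i].tree != tree || inth[i].leaf >= SKETCH_LEAF_NR)
			continue;
		if (pending[inth[i].leaf] & inth[i].mask) {
			inth[i].func(inth[i].priv);
			called++;
		}
	}
	return called;
}

/* Example callback: bump a counter passed as private data. */
static void sketch_count_cb(void *priv)
{
	(*(unsigned *)priv)++;
}
```

An interrupt that is only exposed in one tree is then handled by registering it against that tree, rather than by special-case routing.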
Ben Skeggs [Wed, 1 Jun 2022 10:46:40 +0000 (20:46 +1000)]
drm/nouveau/kms: switch to drm fbdev helpers
This removes support for accelerated fbcon rendering, and fixes a number
of races/crashes/issues around suspend/resume/module unload etc.
Losing HW accelerated rendering isn't ideal, but it's been significantly
reduced in performance since the removal of accelerated scrolling in the
kernel anyway - not to mention, can be racy (skips cpu<->gpu sync) from
certain contexts.