Matt Roper [Mon, 22 Mar 2021 23:38:40 +0000 (16:38 -0700)]
drm/i915/display: Simplify GLK display version tests
GLK has always been a bit of a special case since it reports INTEL_GEN()
as 9, but has version 10 display IP. Now we can properly represent the
display version as 10 and simplify the display generation tests
throughout the display code.
Aside from manually adding the version to the glk_info structure, the
rest of this patch is generated with a Coccinelle semantic patch. Note
that we also need to switch any code that matches gen10 today but *not*
GLK to be CNL-specific:
Matt Roper [Sat, 20 Mar 2021 04:42:43 +0000 (21:42 -0700)]
drm/i915: Convert INTEL_GEN() to DISPLAY_VER() as appropriate in intel_pm.c
Although most of the code in this file is display-related (watermarks),
there's some functions that are not (e.g., clock gating). Thus we need
to do the conversions to DISPLAY_VER() manually here rather than using
Coccinelle.
In the near-future we'll probably want to think about moving watermark
logic out of intel_pm.c and into watermark-specific files under the
display/ directory.
v2:
- Use new IS_DISPLAY_VER macro where appropriate.
Matt Roper [Sat, 20 Mar 2021 04:42:42 +0000 (21:42 -0700)]
drm/i915/display: Eliminate most usage of INTEL_GEN()
Use Coccinelle to convert most of the usage of INTEL_GEN() and IS_GEN()
in the display code to use DISPLAY_VER() comparisons instead. The
following semantic patch was used:
Matt Roper [Sat, 20 Mar 2021 04:42:41 +0000 (21:42 -0700)]
drm/i915: Add DISPLAY_VER() and related macros
Although we've long referred to platforms by a single "GEN" number, the
hardware teams have recommended that we stop doing this since the
various component IP blocks are going to start using independent number
schemes with varying cadence. To support this, hardware platforms a bit
down the road are going to start providing MMIO registers that the
driver can read to obtain the "graphics version," "media version," and
"display version" without needing to do a PCI ID -> platform -> version
translation.
Although our current platforms don't yet expose these registers (and the
next couple we release probably won't have them yet either), the
hardware teams would still like to see us move to this independent
numbering scheme now in preparation. For i915 that means we should try
to eliminate all usage of INTEL_GEN() throughout our code and instead
replace it with separate GRAPHICS_VER(), MEDIA_VER(), and DISPLAY_VER()
constructs in the code. For old platforms, these will all usually give
the same value for each IP block (aside from a few special cases like
GLK which we can no more accurately represent as graphics=9 +
display=10), but future platforms will have more flexibility to bump IP
version numbers independently.
The upcoming ADL-P platform will have a display version of 13 and a
graphics version of 12, so let's just the first step of breaking out
DISPLAY_VER(), but leaving the rest of INTEL_GEN() untouched for now.
For now we'll automatically derive the display version from the
platform's INTEL_GEN() value except in cases where an alternative
display version is explicitly provided in the device info structure.
We also add some helper macros IS_DISPLAY_VER(i915, ver) and
IS_DISPLAY_RANGE(i915, from, until) that match the behavior of the
existing gen-based macros. However unlike IS_GEN(), we will implement
those macros with direct comparisons rather than trying to maintain a
mask to help compiler optimization. In practice the optimization winds
up not being used in very many places (since the vast majority of our
platform checks are of the form "gen >= x") so there is pretty minimal
size reduction in the final driver binary[1]. We're also likely going
to need to extend these version numbers to non-integer major.minor
values at some point in the future, so the mask approach won't work at
all once we get to platforms like that.
[1] The results before/after the next patch in this series, which
switches our code over to the new display macros:
$ size i915.ko.{orig,new}
text data bss dec hex filename 2940291 102944 5384 3048619 2e84ab i915.ko.orig 2940723 102956 5384 3049063 2e8667 i915.ko.new
v2:
- Move version into device info's display sub-struct. (Jani)
- Add extra parentheses to macros. (Jani)
- Note the lack of genmask optimization in the display-based macros and
give size data. (Lucas)
Matt Roper [Sat, 20 Mar 2021 04:42:40 +0000 (21:42 -0700)]
drm/i915/display: Convert gen5/gen6 tests to IS_IRONLAKE/IS_SANDYBRIDGE
ILK is the only platform that we consider "gen5" and SNB is the only
platform we consider "gen6." Add an IS_SANDYBRIDGE() macro and then
replace numeric platform tests for these two generations with direct
platform tests with the following Coccinelle semantic patch:
Ville Syrjälä [Fri, 5 Mar 2021 15:36:05 +0000 (17:36 +0200)]
drm/i915: Fix enabled_planes bitmask
The enabled_planes bitmask was supposed to track logically enabled
planes (ie. fb!=NULL and crtc!=NULL), but instead we end up putting
even disabled planes into the bitmask since
intel_plane_atomic_check_with_state() only takes the early exit
if the plane was disabled and stays disabled. I think I misread
the early said codepath to exit whenever the plane is logically
disabled, which is not true.
So let's fix this up properly and set the bit only when the plane
actually is logically enabled.
Anshuman Gupta [Fri, 19 Mar 2021 10:02:08 +0000 (15:32 +0530)]
drm/i915/hdcp: return correct error code
hdcp2_enable_stream_encryption shouldn't get called in case
of any port authentication or encryption error, though
hdcp2_enable_stream_encryption checks for link encryption
before enabling stream encryption and returns error but
this return error code won't be correct in case of any error
due to port authentication and encryption.
Anshuman Gupta [Fri, 19 Mar 2021 10:02:07 +0000 (15:32 +0530)]
drm/i915/hdcp: link hdcp2 recovery on link enc stopped
When stream encryption enabling fails due to Link encryption status
has stopped, prepare HDCP2 for recovery by disabling port authentication
and encryption such that it can re-attempt port authentication
and encryption.
Anshuman Gupta [Fri, 19 Mar 2021 10:02:06 +0000 (15:32 +0530)]
drm/i915/hdcp: HDCP2.2 MST Link failure recovery
DP MST Link Check performed only for the connector involved with
HDCP port authentication and encryption, for other connector it
simply returns link check with true and update the uevent.
Therefore in case of HDCP 2.2 link failure, disable HDCP encryption
and de-authenticate the port so next time it can enable port
authentication and encryption.
Anshuman Gupta [Fri, 19 Mar 2021 09:17:32 +0000 (14:47 +0530)]
drm/i915/hdcp: mst streams type1 capability check
It requires to check streams type1 capability in mst topology
by checking Rxinfo instead connector HDCP2.x capability in
order to enforce type0 stream encryption in a mix of
HDCP {1.x,2.x} mst topology.
Rxcaps always shows HDCP 2.x capability of immediate downstream
connector. Let's use Rxinfo HDCP1_DEVICE_DOWNSTREAM bit to
detect a HDCP {1.x,2.x} mix mst topology.
Ville Syrjälä [Thu, 18 Mar 2021 16:10:14 +0000 (18:10 +0200)]
drm/i915: Introduce g4x_hdmi.c
Extract the g4x+ HDMI low level code to its own file,
leaving intel_hdmi.c to deal with higher level issues.
The infoframe support I decided to leave in intel_hdmi.c
since I think we need to move that as a whole to its own file.
It is after all used also for DP SDPs, so no longer HDMI
specific.
Ville Syrjälä [Thu, 18 Mar 2021 16:10:13 +0000 (18:10 +0200)]
drm/i915: Introduce g4x_dp.c
Move the g4x+ DP code into a new file. This will leave mostly
platform agnostic code in intel_dp.c. Well, the misplaced phy
test stuff pretty much ruins that, but let's squint real hard
for now.
v2: Add comment exlaining which platforms are covered (Daniel)
Leave intel_dp_unused_lane_mask() be since it is pretty generic
Imre Deak [Wed, 17 Mar 2021 19:01:49 +0000 (21:01 +0200)]
drm/i915: Disable LTTPR support when the DPCD rev < 1.4
By the specification the 0xF0000-0xF02FF range is only valid when the
DPCD revision is 1.4 or higher. Disable LTTPR support if this isn't so.
Trying to detect LTTPRs returned corrupted values for the above DPCD
range at least on a Skylake host with an LG 43UD79-B monitor with a DPCD
revision 1.2 connected.
v2: Add the actual version check.
v3: Fix s/DRPX/DPRX/ typo.
Fixes: 7b2a4ab8b0ef ("drm/i915: Switch to LTTPR transparent mode link training") Cc: <stable@vger.kernel.org> # v5.11 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210317190149.4032966-1-imre.deak@intel.com
Imre Deak [Wed, 17 Mar 2021 18:48:59 +0000 (20:48 +0200)]
drm/i915/ilk-glk: Fix link training on links with LTTPRs
The spec requires to use at least 3.2ms for the AUX timeout period if
there are LT-tunable PHY Repeaters on the link (2.11.2). An upcoming
spec update makes this more specific, by requiring a 3.2ms minimum
timeout period for the LTTPR detection reading the 0xF0000-0xF0007
range (3.6.5.1).
Accordingly disable LTTPR detection until GLK, where the maximum timeout
we can set is only 1.6ms.
Link training in the non-transparent mode is known to fail at least on
some SKL systems with a WD19 dock on the link, which exposes an LTTPR
(see the References below). While this could have different reasons
besides the too short AUX timeout used, not detecting LTTPRs (and so not
using the non-transparent LT mode) fixes link training on these systems.
While at it add a code comment about the platform specific maximum
timeout values.
v2: Add a comment about the g4x maximum timeout as well. (Ville)
Reported-by: Takashi Iwai <tiwai@suse.de> Reported-and-tested-by: Santiago Zarate <santiago.zarate@suse.com> Reported-and-tested-by: Bodo Graumann <mail@bodograumann.de>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3166 Fixes: b30edfd8d0b4 ("drm/i915: Switch to LTTPR non-transparent mode link training") Cc: <stable@vger.kernel.org> # v5.11 Cc: Takashi Iwai <tiwai@suse.de> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210317184901.4029798-2-imre.deak@intel.com
Jani Nikula [Wed, 17 Mar 2021 16:36:53 +0000 (18:36 +0200)]
drm/i915/bios: add intel_bios_encoder_data to encoder, use for iboost
Add intel_bios_encoder_data pointer to encoder, and use it for hdmi and
dp iboost. For starters, we only set the encoder->devdata for DDI
encoders, i.e. we can only use it for data that is used by DDI encoders.
Jani Nikula [Wed, 17 Mar 2021 16:36:52 +0000 (18:36 +0200)]
drm/i915/bios: start using intel_bios_encoder_data for Type-C USB and TBT
Stop caching the information in ddi_port_info. We're phasing out
ddi_port_info usage completely, and prefer using the VBT child device
information directly using the provided helpers.
v2:
- Remove supports_typec_usb & supports_tbt from ddi_vbt_port_info (Lucas)
Jani Nikula [Wed, 17 Mar 2021 16:36:51 +0000 (18:36 +0200)]
drm/i915/bios: start using the intel_bios_encoder_data directly
Start using struct intel_bios_encoder_data directly. We'll start
sanitizing the child device data directly as well, instead of the cached
data in ddi_port_info[]. The one downside here is having to store a
non-const pointer back to intel_bios_encoder_data.
Eventually we'll be able to have a direct pointer from encoder to
intel_bios_encoder_data, removing the need to go through the
ddi_port_info[] array altogether. And we'll be able to remove all the
cached data in ddi_port_info[].
v2:
- Remove supports_dp and supports_edp from ddi_port_info too
- Add devdata != NULL check in intel_bios_is_port_edp()
Jani Nikula [Wed, 17 Mar 2021 16:36:50 +0000 (18:36 +0200)]
drm/i915/bios: save a higher level pointer in ddi_vbt_port_info[]
We'll be needing the intel_bios_encoder_data pointer going forward, and
it's just easier to store the higher level pointer in the
ddi_vbt_port_info[] array.
Jani Nikula [Wed, 17 Mar 2021 16:36:46 +0000 (18:36 +0200)]
drm/i915/bios: create fake child devices on missing VBT
Instead of initialing data directly in ddi_port_info array, create fake
child devices for default outputs when the VBT is missing. This makes
further unification of output handling easier.
This will make intel_bios_is_port_present() return true for the fake
child devices. This may cause subtle changes in a handful of places.
Jani Nikula [Wed, 17 Mar 2021 16:36:45 +0000 (18:36 +0200)]
drm/i915/bios: limit default outputs to ports A through F
There are two main cases where the default outputs are useful when the
VBT is missing:
- There are some DDI-platform Chromebooks out there that do not have a
VBT, which worked by coincidence because of the default outputs. The
machines need to continue to work.
- Early platform enabling when the VBT might not be available. (This
could be circumvented by using the i915.vbt_firmware parameter.)
Prepare for generating fake child devices for the default outputs by
limiting the number of outputs. We don't want to generate excessive
amounts of fake child devices. This could be perhaps be limited even
more in the future, but match what's possible on all DDI platforms.
Note that limiting the defaults to non-TypeC ports in commit 828ccb31cf41 ("drm/i915/icl: Add TypeC ports only if VBT is present") is
a more strict limit, and makes this a no-op on recent platforms.
Jani Nikula [Wed, 17 Mar 2021 16:36:44 +0000 (18:36 +0200)]
drm/i915/bios: limit default outputs by platform on missing VBT
Pre-DDI and non-CHV aren't using the information created here anyway, so
don't bother setting the defaults for them. This should be a
non-functional change, but is separated here to catch any regressions in
a single commit.
i915/perf: Start hrtimer only if sampling the OA buffer
SAMPLE_OA parameter enables sampling of OA buffer and results in a call
to init the OA buffer which initializes the OA unit head/tail pointers.
The OA_EXPONENT parameter controls the periodicity of the OA reports in
the OA buffer and results in starting a hrtimer.
Before gen12, all use cases required the use of the OA buffer and i915
enforced this setting when vetting out the parameters passed. In these
platforms the hrtimer was enabled if OA_EXPONENT was passed. This worked
fine since it was implied that SAMPLE_OA is always passed.
With gen12, this changed. Users can use perf without enabling the OA
buffer as in OAR use cases. While an OAR use case should ideally not
start the hrtimer, we see that passing an OA_EXPONENT parameter will
start the hrtimer even though SAMPLE_OA is not specified. This results
in an uninitialized OA buffer, so the head/tail pointers used to track
the buffer are zero.
This itself does not fail, but if we ran a use-case that SAMPLED the OA
buffer previously, then the OA_TAIL register is still pointing to an old
value. When the timer callback runs, it ends up calculating a
wrong/large number of available reports. Since we do a spinlock_irq_save
and start processing a large number of reports, NMI watchdog fires and
causes a crash.
Start the timer only if SAMPLE_OA is specified.
v2:
- Drop SAMPLE OA check when appending samples (Ashutosh)
- Prevent read if OA buffer is not being sampled
Ville Syrjälä [Fri, 5 Mar 2021 15:36:08 +0000 (17:36 +0200)]
drm/i915: Calculate min_ddb_alloc for trans_wm
Let's make all the "do we have enough DDB for this WM level?"
checks use min_ddb_alloc. To achieve that we need to populate
this for the transition watermarks as well.
Ville Syrjälä [Fri, 5 Mar 2021 15:36:07 +0000 (17:36 +0200)]
drm/i915: Check SAGV wm min_ddb_alloc rather than plane_res_b
For non-transition watermarks we are supposed to check min_ddb_alloc
rather than plane_res_b when determining if we have enough DDB space
for it. A bit too much copy pasta made me check the wrong thing.
Ville Syrjälä [Fri, 5 Mar 2021 15:36:06 +0000 (17:36 +0200)]
drm/i915: Tighten SAGV constraint for pre-tgl
Say we have two planes enabled with watermarks configured
as follows:
plane A: wm0=enabled/can_sagv=false, wm1=enabled/can_sagv=true
plane B: wm0=enabled/can_sagv=true, wm1=disabled
This is possible since the latency we use to calculate
can_sagv may not be the same for both planes due to
skl_needs_memory_bw_wa().
In this case skl_crtc_can_enable_sagv() will see that
both planes have enabled at least one watermark level
with can_sagv==true, and thus proceeds to allow SAGV.
However, since plane B does not have wm1 enabled
plane A can't actually use it either. Thus we are
now running with SAGV enabled, but plane A can't
actually tolerate the extra latency it imposes.
To remedy this only allow SAGV on if the highest common
enabled watermark level for all active planes can tolerate
the extra SAGV latency.
Ville Syrjälä [Sat, 20 Feb 2021 10:33:03 +0000 (12:33 +0200)]
drm/i915: Workaround async flip + VT-d corruption on HSW/BDW
On HSW/BDW with VT-d active the first tile row scanned out
after the first async flip of the frame often ends up corrupted.
Whether the corruption happens or not depends on the scanline
on which the async flip happens, but the behaviour seems very
consistent. Ie. the same set of scanlines (which are most scanlines)
always show the corruption. And another set of scanlines (far less
of them) never shows the corruption.
I discovered that disabling the fetch-stride stretching
feature cures the corruption. This is some kind of TLB related
prefetch thing AFAIK. We already disable it on SNB primary
planes due to a documented workaround. The hardware folks
indicated that disabling this should be fine, so let's go
with that.
And while we're here, let's document the relevant bits on all
pre-skl platforms.
Ville Syrjälä [Wed, 10 Mar 2021 19:43:51 +0000 (21:43 +0200)]
drm/i915: Tolerate bogus DPLL selection
Let's check that we actually found the PLL before doing the
port_clock readout, just in case the hardware was severly
misprogrammed by the previous guy. Not sure the hw would
even survive such misprogramming without hanging but no
real harm in checking anyway.
Ville Syrjälä [Wed, 24 Feb 2021 14:42:14 +0000 (16:42 +0200)]
drm/i915: Extend icl_sanitize_encoder_pll_mapping() to all DDI platforms
Now that all the encoder clock stuff is uniformly abstracted
for all hsw+ platforms, let's extend icl_sanitize_encoder_pll_mapping()
to cover all of them.
Not sure there is a particular benefit in doing so, but less special
cases always makes me happy.
Ville Syrjälä [Wed, 24 Feb 2021 14:42:13 +0000 (16:42 +0200)]
drm/i915: Add encoder->is_clock_enabled()
Support reading out the current state of the DDI clock.
Not sure we really want this. Seems a bit excessive just to
restore the debug print to icl_sanitize_encoder_pll_mapping()?
But maybe there's more use for it?
Ville Syrjälä [Wed, 24 Feb 2021 14:42:12 +0000 (16:42 +0200)]
drm/i915: Move DDI clock readout to encoder->get_config()
Move the *_get_ddi_pll() stuff into the encodet->get_config() hook.
There it neatly sits next to the matching .{enable,disable}_clock()
functions.
In order to avoid excessive boilerplate I changed the behaviour
such that all platforms now do the readout via
crtc_state->port_dpll[].
ICL+ TC is still a bit special due to TBTPLL not having a functional
.get_freq(). Should probably change that by adopting the LCPLL
approach, but that would require a fairly substantial rework of the
DPLL ID handling. So leave it for later.
Ville Syrjälä [Wed, 24 Feb 2021 14:42:11 +0000 (16:42 +0200)]
drm/i915: Use pipes instead crtc indices in PLL state tracking
All the other places we have use pipes instead of crtc indices
when tracking resource usage. Life is easier when we do it
the same way always, so switch the dpll mgr to using pipes as
well. Looks like it was actually mixing these up in some cases
so it would not even have worked correctly except when the
device has a contiguous set of pipes starting from pipe A.
Granted, that is the typical case but supposedly it may not
always hold on modern hw.
Ville Syrjälä [Thu, 25 Feb 2021 16:12:25 +0000 (18:12 +0200)]
drm/i915: Do intel_dpll_readout_hw_state() after encoder readout
The clock readout for DDI encoders needs to moved into the encoders.
To that end intel_dpll_readout_hw_state() needs to happen after
the encoder readout as otherwise it can't correctly populate
the PLL crtc_mask/active_mask bitmasks.
v2: Populate DPLL ref clocks before the encoder->get_config()
Ville Syrjälä [Wed, 24 Feb 2021 14:42:09 +0000 (16:42 +0200)]
drm/i915: Call primary encoder's .get_config() from MST .get_config()
Stop assuming intel_ddi_get_config() is all we need from the primary
encoder, and instead call it via the .get_config() vfunc. This
will allow customized .get_config() for the primary, which I plan
to use to handle the differences in the clock readout between various
platforms.
Linus Torvalds [Sat, 6 Mar 2021 01:27:59 +0000 (17:27 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Nothing special here, though Bob's regression fixes for rxe would have
made it before the rc cycle had there not been such strong winter
weather!
- Fix corner cases in the rxe reference counting cleanup that are
causing regressions in blktests for SRP
- Two kdoc fixes so W=1 is clean
- Missing error return in error unwind for mlx5
- Wrong lock type nesting in IB CM"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()
RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()
RDMA/rxe: Fix missed IB reference counting in loopback
RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc
RDMA/mlx5: Set correct kernel-doc identifier
IB/mlx5: Add missing error code
RDMA/rxe: Fix missing kconfig dependency on CRYPTO
RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep
Linus Torvalds [Sat, 6 Mar 2021 01:23:03 +0000 (17:23 -0800)]
Merge tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc-plugins fixes from Kees Cook:
"Tiny gcc-plugin fixes for v5.12-rc2. These issues are small but have
been reported a couple times now by static analyzers, so best to get
them fixed to reduce the noise. :)
- Fix coding style issues (Jason Yan)"
* tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: latent_entropy: remove unneeded semicolon
gcc-plugins: structleak: remove unneeded variable 'ret'
Linus Torvalds [Sat, 6 Mar 2021 01:21:25 +0000 (17:21 -0800)]
Merge tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:
- Rate-limit ECC warnings (Dmitry Osipenko)
- Fix error path check for NULL (Tetsuo Handa)
* tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Rate-limit "uncorrectable error in header" message
pstore: Fix warning in pstore_kill_sb()
Linus Torvalds [Fri, 5 Mar 2021 21:25:23 +0000 (13:25 -0800)]
Merge tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Fix DM verity target's optional Forward Error Correction (FEC) for
Reed-Solomon roots that are unaligned to block size"
* tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm verity: fix FEC for RS roots unaligned to block size
dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size
Linus Torvalds [Fri, 5 Mar 2021 20:59:37 +0000 (12:59 -0800)]
Merge tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe fixes:
- more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal
Terjan)
- fix a hwmon error return (Daniel Wagner)
- fix the keep alive timeout initialization (Martin George)
- ensure the model_number can't be changed on a used subsystem
(Max Gurtovoy)
- rsxx missing -EFAULT on copy_to_user() failure (Dan)
- rsxx remove unused linux.h include (Tian)
- kill unused RQF_SORTED (Jean)
- updated outdated BFQ comments (Joseph)
- revert work-around commit for bd_size_lock, since we removed the
offending user in this merge window (Damien)
* tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block:
nvmet: model_number must be immutable once set
nvme-fabrics: fix kato initialization
nvme-hwmon: Return error code when registration fails
nvme-pci: add quirks for Lexar 256GB SSD
nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
rsxx: Return -EFAULT if copy_to_user() fails
block/bfq: update comments and default value in docs for fifo_expire
rsxx: remove unused including <linux/version.h>
block: Drop leftover references to RQF_SORTED
block: revert "block: fix bd_size_lock use"
Linus Torvalds [Fri, 5 Mar 2021 20:44:43 +0000 (12:44 -0800)]
Merge tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A bit of a mix between fallout from the worker change, cleanups and
reductions now possible from that change, and fixes in general. In
detail:
- Fully serialize manager and worker creation, fixing races due to
that.
- Clean up some naming that had gone stale.
- SQPOLL fixes.
- Fix race condition around task_work rework that went into this
merge window.
- Implement unshare. Used for when the original task does unshare(2)
or setuid/seteuid and friends, drops the original workers and forks
new ones.
- Drop the only remaining piece of state shuffling we had left, which
was cred. Move it into issue instead, and we can drop all of that
code too.
- Kill f_op->flush() usage. That was such a nasty hack that we had
out of necessity, we no longer need it.
- Following from ->flush() removal, we can also drop various bits of
ctx state related to SQPOLL and cancelations.
- Fix an issue with IOPOLL retry, which originally was fallout from a
filemap change (removing iov_iter_revert()), but uncovered an issue
with iovec re-import too late.
- Fix an issue with system suspend.
- Use xchg() for fallback work, instead of cmpxchg().
- Properly destroy io-wq on exec.
- Add create_io_thread() core helper, and use that in io-wq and
io_uring. This allows us to remove various silly completion events
related to thread setup.
- A few error handling fixes.
This should be the grunt of fixes necessary for the new workers, next
week should be quieter. We've got a pending series from Pavel on
cancelations, and how tasks and rings are indexed. Outside of that,
should just be minor fixes. Even with these fixes, we're still killing
a net ~80 lines"
* tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block: (41 commits)
io_uring: don't restrict issue_flags for io_openat
io_uring: make SQPOLL thread parking saner
io-wq: kill hashed waitqueue before manager exits
io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
io_uring: don't keep looping for more events if we can't flush overflow
io_uring: move to using create_io_thread()
kernel: provide create_io_thread() helper
io_uring: reliably cancel linked timeouts
io_uring: cancel-match based on flags
io-wq: ensure all pending work is canceled on exit
io_uring: ensure that threads freeze on suspend
io_uring: remove extra in_idle wake up
io_uring: inline __io_queue_async_work()
io_uring: inline io_req_clean_work()
io_uring: choose right tctx->io_wq for try cancel
io_uring: fix -EAGAIN retry with IOPOLL
io-wq: fix error path leak of buffered write hash map
io_uring: remove sqo_task
io_uring: kill sqo_dead and sqo submission halting
io_uring: ignore double poll add on the same waitqueue head
...
Linus Torvalds [Fri, 5 Mar 2021 20:36:33 +0000 (12:36 -0800)]
Merge tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the usage of device links in the runtime PM core code and
update the DTPM (Dynamic Thermal Power Management) feature added
recently.
Specifics:
- Make the runtime PM core code avoid attempting to suspend supplier
devices before updating the PM-runtime status of a consumer to
'suspended' (Rafael Wysocki).
- Fix DTPM (Dynamic Thermal Power Management) root node
initialization and label that feature as EXPERIMENTAL in Kconfig
(Daniel Lezcano)"
* tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap/drivers/dtpm: Add the experimental label to the option description
powercap/drivers/dtpm: Fix root node initialization
PM: runtime: Update device status before letting suppliers suspend
Linus Torvalds [Fri, 5 Mar 2021 20:32:17 +0000 (12:32 -0800)]
Merge tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Make the empty stubs of some helper functions used when CONFIG_ACPI is
not set actually match those functions (Andy Shevchenko)"
* tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: bus: Constify is_acpi_node() and friends (part 2)
Linus Torvalds [Fri, 5 Mar 2021 20:26:24 +0000 (12:26 -0800)]
Merge tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix a sleeping-while-atomic issue in the AMD IOMMU code
- Disable lazy IOTLB flush for untrusted devices in the Intel VT-d
driver
- Fix status code definitions for Intel VT-d
- Fix IO Page Fault issue in Tegra IOMMU driver
* tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix status code for Allocate/Free PASID command
iommu: Don't use lazy flush for untrusted device
iommu/tegra-smmu: Fix mc errors on tegra124-nyan
iommu/amd: Fix sleeping in atomic in increase_address_space()
Linus Torvalds [Fri, 5 Mar 2021 20:21:14 +0000 (12:21 -0800)]
Merge tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"More regression fixes and stabilization.
Regressions:
- zoned mode
- count zone sizes in wider int types
- fix space accounting for read-only block groups
- subpage: fix page tail zeroing
Fixes:
- fix spurious warning when remounting with free space tree
- fix warning when creating a directory with smack enabled
- ioctl checks for qgroup inheritance when creating a snapshot
- qgroup
- fix missing unlock on error path in zero range
- fix amount of released reservation on error
- fix flushing from unsafe context with open transaction,
potentially deadlocking
- minor build warning fixes"
* tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: do not account freed region of read-only block group as zone_unusable
btrfs: zoned: use sector_t for zone sectors
btrfs: subpage: fix the false data csum mismatch error
btrfs: fix warning when creating a directory with smack enabled
btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
btrfs: export and rename qgroup_reserve_meta
btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
btrfs: fix spurious free_space_tree remount warning
btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
btrfs: ref-verify: use 'inline void' keyword ordering
Linus Torvalds [Fri, 5 Mar 2021 20:12:28 +0000 (12:12 -0800)]
Merge tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Another batch of graph and video-interfaces schema conversions
- Drop DT header symlink for dropped C6X arch
- Fix bcm2711-hdmi schema error
* tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: media: Use graph and video-interfaces schemas, round 2
dts: drop dangling c6x symlink
dt-bindings: bcm2711-hdmi: Fix broken schema
Linus Torvalds [Fri, 5 Mar 2021 20:04:59 +0000 (12:04 -0800)]
Merge tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Functional fixes:
- Fix big endian conversion for arm64 in recordmcount processing
- Fix timestamp corruption in ring buffer on discarding events
- Fix memory leak in __create_synth_event()
- Skip selftests if tracing is disabled as it will cause them to
fail.
Non-functional fixes:
- Fix help text in Kconfig
- Remove duplicate prototype for trace_empty()
- Fix stale comment about the trace_event_call flags.
Self test update:
- Add more information to the validation output of when a corrupt
timestamp is found in the ring buffer, and also trigger a warning
to make sure that tests catch it"
* tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix comment about the trace_event_call flags
tracing: Skip selftests if tracing is disabled
tracing: Fix memory leak in __create_synth_event()
ring-buffer: Add a little more information and a WARN when time stamp going backwards is detected
ring-buffer: Force before_stamp and write_stamp to be different on discard
tracing: Fix help text of TRACEPOINT_BENCHMARK in Kconfig
tracing: Remove duplicate declaration from trace.h
ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount
Bob Pearson [Thu, 4 Mar 2021 19:20:49 +0000 (13:20 -0600)]
RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()
In rxe_comp.c in rxe_completer() the function free_pkt() did not clear skb
which triggered a warning at 'done:' and could possibly at 'exit:'. The
WARN_ONCE() calls are not actually needed. The call to free_pkt() is
moved to the end to clearly show that all skbs are freed.
Bob Pearson [Thu, 4 Mar 2021 19:20:49 +0000 (13:20 -0600)]
RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()
rxe_rcv_mcast_pkt() dropped a reference to ib_device when no error
occurred causing an underflow on the reference counter. This code is
cleaned up to be clearer and easier to read.
Bob Pearson [Thu, 4 Mar 2021 19:20:49 +0000 (13:20 -0600)]
RDMA/rxe: Fix missed IB reference counting in loopback
When the noted patch below extending the reference taken by
rxe_get_dev_from_net() in rxe_udp_encap_recv() until each skb is freed it
was not matched by a reference in the loopback path resulting in
underflows.
Pavel Begunkov [Sun, 28 Feb 2021 22:35:14 +0000 (22:35 +0000)]
io_uring: don't restrict issue_flags for io_openat
45d189c606292 ("io_uring: replace force_nonblock with flags") did
something strange for io_openat() slicing all issue_flags but
IO_URING_F_NONBLOCK. Not a bug for now, but better to just forward the
flags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 5 Mar 2021 16:13:07 +0000 (09:13 -0700)]
Merge tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme into block-5.12
Pull NVMe fixes from Christoph:
"nvme fixes for 5.12:
- more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal Terjan)
- fix a hwmon error return (Daniel Wagner)
- fix the keep alive timeout initialization (Martin George)
- ensure the model_number can't be changed on a used subsystem
(Max Gurtovoy)"
* tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme:
nvmet: model_number must be immutable once set
nvme-fabrics: fix kato initialization
nvme-hwmon: Return error code when registration fails
nvme-pci: add quirks for Lexar 256GB SSD
nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
Jens Axboe [Fri, 5 Mar 2021 15:44:39 +0000 (08:44 -0700)]
io_uring: make SQPOLL thread parking saner
We have this weird true/false return from parking, and then some of the
callers decide to look at that. It can lead to unbalanced parks and
sqd locking. Have the callers check the thread status once it's parked.
We know we have the lock at that point, so it's either valid or it's NULL.
Fix race with parking on thread exit. We need to be careful here with
ordering of the sdq->lock and the IO_SQ_THREAD_SHOULD_PARK bit.
Rename sqd->completion to sqd->parked to reflect that this is the only
thing this completion event doesn.
Jens Axboe [Fri, 5 Mar 2021 15:14:08 +0000 (08:14 -0700)]
io-wq: kill hashed waitqueue before manager exits
If we race with shutting down the io-wq context and someone queueing
a hashed entry, then we can exit the manager with it armed. If it then
triggers after the manager has exited, we can have a use-after-free where
io_wqe_hash_wake() attempts to wake a now gone manager process.
Move the killing of the hashed write queue into the manager itself, so
that we know we've killed it before the task exits.
Fixes: e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx") Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 5 Mar 2021 04:02:58 +0000 (21:02 -0700)]
io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
The callback can only be armed, if we get -EIOCBQUEUED returned. It's
important that we clear the WAITQ bit for other cases, otherwise we can
queue for async retry and filemap will assume that we're armed and
return -EAGAIN instead of just blocking for the IO.
Jens Axboe [Fri, 5 Mar 2021 00:15:48 +0000 (17:15 -0700)]
io_uring: don't keep looping for more events if we can't flush overflow
It doesn't make sense to wait for more events to come in, if we can't
even flush the overflow we already have to the ring. Return -EBUSY for
that condition, just like we do for attempts to submit with overflow
pending.
Jens Axboe [Thu, 4 Mar 2021 19:39:36 +0000 (12:39 -0700)]
io_uring: move to using create_io_thread()
This allows us to do task creation and setup without needing to use
completions to try and synchronize with the starting thread. Get rid of
the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
completion events - we can now do setup before the task is running.
Ville Syrjälä [Thu, 4 Mar 2021 17:04:21 +0000 (19:04 +0200)]
drm/i915: Fix DSI TE max_vblank_count handling
commit 33267703df15 ("drm/i915/dsi: Enable software vblank counter")
claims to get the mode_flags from the crtc_state, but in fact does
not. Fix it to do it right.
Ville Syrjälä [Thu, 4 Mar 2021 17:04:20 +0000 (19:04 +0200)]
drm/i915: Return zero as the scanline counter for disabled pipes
We print the scanline counters as unsigned integers so the -1
here just makes the debugs/traces look a bit messy. Zero seems
equally valid for this usecase.
Ville Syrjälä [Thu, 4 Mar 2021 17:04:19 +0000 (19:04 +0200)]
drm/i915: Don't try to query the frame counter for disabled pipes
For platforms/outputs without hardware frame counters we can't
call drm_crtc_accurate_vblank_count() when the vblank support is
disabled or we just get a WARN due to the crtc timings
(vblank->hwmode) being considered invalid. Note that until the
pipe in question has been enabled and drm_crtc_set_max_vblank_count()
has been called on it we would also take this path on platforms
which have a working frame counter. So getting the WARN is rather
likely on any platform unless you always boot with lots of displays
plugged in.
Also even on hardware with a working frame counter we may not be
able to read the actual frame counter register on disabled pipes
due the relevant power well being disabled. Ie. would just result
in the unclaimed reg spew.
So let's just avoid all this an directly report zero in case
the pipe is disabled.
Ville Syrjälä [Thu, 4 Mar 2021 17:04:18 +0000 (19:04 +0200)]
drm/i915: Move pipe enable/disable tracepoints to intel_crtc_vblank_{on,off}()
On platforms/outputs without a working frame counter we rely
on the vblank code to cook up the frame counter from the timestamps.
That requires that vblank support is enabled. Thus we need to
move the pipe enable/disable tracepoints to the other side
of the drm_vblank_{on,off}() calls. There shouldn't really be
much happening between these old and new call sites so the
tracepoints should still provide reasonable data.
The alternative would be to give up on having the frame counter
values in the trace which would render the tracepoints more or
less pointless.
v2: Missed one case in intel_ddi_post_disable()
Drop the now useless i915_trace.h includes
Max Gurtovoy [Wed, 17 Feb 2021 17:19:40 +0000 (17:19 +0000)]
nvmet: model_number must be immutable once set
In case we have already established connection to nvmf target, it
shouldn't be allowed to change the model_number. E.g. if someone will
identify ctrl and get model_number of "my_model" later on will change
the model_numbel via configfs to "my_new_model" this will break the NVMe
specification for "Get Log Page – Persistent Event Log" that refers to
Model Number as: "This field contains the same value as reported in the
Model Number field of the Identify Controller data structure, bytes
63:24."
Although it doesn't mentioned explicitly that this field can't be
changed, we can assume it.
So allow setting this field only once: using configfs or in the first
identify ctrl operation.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Martin George [Thu, 11 Feb 2021 17:58:26 +0000 (23:28 +0530)]
nvme-fabrics: fix kato initialization
Currently kato is initialized to NVME_DEFAULT_KATO for both
discovery & i/o controllers. This is a problem specifically
for non-persistent discovery controllers since it always ends
up with a non-zero kato value. Fix this by initializing kato
to zero instead, and ensuring various controllers are assigned
appropriate kato values as follows:
non-persistent controllers - kato set to zero
persistent controllers - kato set to NVMF_DEV_DISC_TMO
(or any positive int via nvme-cli)
i/o controllers - kato set to NVME_DEFAULT_KATO
(or any positive int via nvme-cli)
Signed-off-by: Martin George <marting@netapp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Daniel Wagner [Fri, 12 Feb 2021 09:30:15 +0000 (10:30 +0100)]
nvme-hwmon: Return error code when registration fails
The hwmon pointer wont be NULL if the registration fails. Though the
exit code path will assign it to ctrl->hwmon_device. Later
nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by
returning the error code from hwmon_device_register_with_info().
Fixes: ed7770f66286 ("nvme/hwmon: rework to avoid devm allocation") Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
My 2TB SKC2000 showed the exact same symptoms that were provided
in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on
Kingston A2000 SSDs"), i.e. a complete NVME lockup that needed
cold boot to get it back.
According to some sources, the A2000 is simply a rebadged
SKC2000 with a slightly optimized firmware.
Adding the SKC2000 PCI ID to the quirk list with the same workaround
as the A2000 made my laptop survive a 5 hours long Yocto bootstrap
buildfest which reliably triggered the SSD lockup previously.
Signed-off-by: Zoltán Böszörményi <zboszor@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Linus Torvalds [Fri, 5 Mar 2021 03:06:28 +0000 (19:06 -0800)]
Merge tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"More may show up but this is what I have at this stage: just a single
nouveau regression fix, and a bunch of amdgpu fixes.
amdgpu:
- S0ix fix
- Handle new NV12 SKU
- Misc power fixes
- Display uninitialized value fix
- PCIE debugfs register access fix
nouveau:
- regression fix for gk104"
* tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie
drm/amd/display: fix the return of the uninitialized value in ret
drm/amdgpu: enable BACO runpm by default on sienna cichlid and navy flounder
drm/amd/pm: correct Arcturus mmTHM_BACO_CNTL register address
drm/amdgpu/swsmu/vangogh: Only use RLCPowerNotify msg for disable
drm/amdgpu/pm: make unsupported power profile messages debug
drm/amdgpu:disable VCN for Navi12 SKU
drm/amdgpu: Only check for S0ix if AMD_PMC is configured
drm/nouveau/fifo/gk104-gp1xx: fix creation of sw class
Linus Torvalds [Fri, 5 Mar 2021 02:53:30 +0000 (18:53 -0800)]
Merge tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi
Pull iSCSI fixes from Martin Petersen:
"Three fixes for missed iSCSI verification checks (and make the sysfs
files use "sysfs_emit()" - that's what it is there for)"
* tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
scsi: iscsi: Verify lengths on passthrough PDUs
scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
scsi: iscsi: Restrict sessions and handles to admin capabilities
Dave Airlie [Fri, 5 Mar 2021 00:55:57 +0000 (10:55 +1000)]
Merge branch '00.00-inst' of git://github.com/skeggsb/linux into drm-fixes
A single regression fix here that I noticed while testing a bunch of
boards for something else, not sure where this got lost! Prevents 3D
driver from initialising on some GPUs.
Chris Leech [Wed, 24 Feb 2021 05:39:01 +0000 (21:39 -0800)]
scsi: iscsi: Verify lengths on passthrough PDUs
Open-iSCSI sends passthrough PDUs over netlink, but the kernel should be
verifying that the provided PDU header and data lengths fall within the
netlink message to prevent accessing beyond that in memory.
Cc: stable@vger.kernel.org Reported-by: Adam Nichols <adam@grimm-co.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Chris Leech [Wed, 24 Feb 2021 02:00:17 +0000 (18:00 -0800)]
scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
As the iSCSI parameters are exported back through sysfs, it should be
enforcing that they never are more than PAGE_SIZE (which should be more
than enough) before accepting updates through netlink.
Change all iSCSI sysfs attributes to use sysfs_emit().
Cc: stable@vger.kernel.org Reported-by: Adam Nichols <adam@grimm-co.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Lee Duncan [Tue, 23 Feb 2021 21:06:24 +0000 (13:06 -0800)]
scsi: iscsi: Restrict sessions and handles to admin capabilities
Protect the iSCSI transport handle, available in sysfs, by requiring
CAP_SYS_ADMIN to read it. Also protect the netlink socket by restricting
reception of messages to ones sent with CAP_SYS_ADMIN. This disables
normal users from being able to end arbitrary iSCSI sessions.
Cc: stable@vger.kernel.org Reported-by: Adam Nichols <adam@grimm-co.com> Reviewed-by: Chris Leech <cleech@redhat.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jens Axboe [Thu, 4 Mar 2021 19:21:05 +0000 (12:21 -0700)]
kernel: provide create_io_thread() helper
Provide a generic helper for setting up an io_uring worker. Returns a
task_struct so that the caller can do whatever setup is needed, then call
wake_up_new_task() to kick it into gear.
Add a kernel_clone_args member, io_thread, which tells copy_process() to
mark the task with PF_IO_WORKER.
Pavel Begunkov [Thu, 4 Mar 2021 13:59:25 +0000 (13:59 +0000)]
io_uring: reliably cancel linked timeouts
Linked timeouts are fired asynchronously (i.e. soft-irq), and use
generic cancellation paths to do its stuff, including poking into io-wq.
The problem is that it's racy to access tctx->io_wq, as
io_uring_task_cancel() and others may be happening at this exact moment.
Mark linked timeouts with REQ_F_INLIFGHT for now, making sure there are
no timeouts before io-wq destraction.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>