Jens Axboe [Sun, 26 Jan 2020 17:17:12 +0000 (10:17 -0700)]
io_uring: don't cancel all work on process exit
If we're sharing the ring across forks, then one process exiting means
that we cancel ALL work and prevent future work. This is overly
restrictive. As long as we cancel the work associated with the files
from the current task, it's safe to let others persist. Normal fd close
on exit will still wait (and cancel) pending work.
Fixes: fcb323cc53e2 ("io_uring: io_uring: add support for async work inheriting files") Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 26 Jan 2020 16:53:12 +0000 (09:53 -0700)]
Revert "io_uring: only allow submit from owning task"
This ends up being too restrictive for tasks that willingly fork and
share the ring between forks. Andres reports that this breaks his
postgresql work. Since we're close to 5.5 release, revert this change
for now.
Cc: stable@vger.kernel.org Fixes: 44d282796f81 ("io_uring: only allow submit from owning task") Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
fds field of struct io_uring_files_update is problematic with regards
to compat user space, as pointer size is different in 32-bit, 32-on-64-bit,
and 64-bit user space. In order to avoid custom handling of compat in
the syscall implementation, make fds __u64 and use u64_to_user_ptr in
order to retrieve it. Also, align the field naturally and check that
no garbage is passed there.
Fixes: c3a31e605620c279 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 17 Jan 2020 02:00:24 +0000 (19:00 -0700)]
io_uring: only allow submit from owning task
If the credentials or the mm doesn't match, don't allow the task to
submit anything on behalf of this ring. The task that owns the ring can
pass the file descriptor to another task, but we don't want to allow
that task to submit an SQE that then assumes the ring mm and creds if
it needs to go async.
Cc: stable@vger.kernel.org Suggested-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 16 Jan 2020 04:51:17 +0000 (21:51 -0700)]
io_uring: ensure workqueue offload grabs ring mutex for poll list
A previous commit moved the locking for the async sqthread, but didn't
take into account that the io-wq workers still need it. We can't use
req->in_async for this anymore as both the sqthread and io-wq workers
set it, gate the need for locking on io_wq_current_is_worker() instead.
Fixes: 8a4955ff1cca ("io_uring: sqthread should grab ctx->uring_lock for submissions") Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bijan Mottahedeh [Thu, 16 Jan 2020 02:37:45 +0000 (18:37 -0800)]
io_uring: clear req->result always before issuing a read/write request
req->result is cleared when io_issue_sqe() calls io_read/write_pre()
routines. Those routines however are not called when the sqe
argument is NULL, which is the case when io_issue_sqe() is called from
io_wq_submit_work(). io_issue_sqe() may then examine a stale result if
a polled request had previously failed with -EAGAIN:
if (ctx->flags & IORING_SETUP_IOPOLL) {
if (req->result == -EAGAIN)
return -EAGAIN;
io_iopoll_req_issued(req);
}
and in turn cause a subsequently completed request to be re-issued in
io_wq_submit_work().
Jens Axboe [Wed, 15 Jan 2020 05:09:06 +0000 (22:09 -0700)]
io_uring: be consistent in assigning next work from handler
If we pass back dependent work in case of links, we need to always
ensure that we call the link setup and work prep handler. If not, we
might be missing some setup for the next work item.
Jens Axboe [Tue, 14 Jan 2020 02:23:24 +0000 (19:23 -0700)]
io_uring: don't setup async context for read/write fixed
We don't need it, and if we have it, then the retry handler will attempt
to copy the non-existent iovec with the inline iovec, with a segment
count that doesn't make sense.
Jens Axboe [Tue, 7 Jan 2020 20:08:56 +0000 (13:08 -0700)]
io_uring: remove punt of short reads to async context
We currently punt any short read on a regular file to async context,
but this fails if the short read is due to running into EOF. This is
especially problematic since we only do the single prep for commands
now, as we don't reset kiocb->ki_pos. This can result in a 4k read on
a 1k file returning zero, as we detect the short read and then retry
from async context. At the time of retry, the position is now 1k, and
we end up reading nothing, and hence return 0.
Instead of trying to patch around the fact that short reads can be
legitimate and won't succeed in case of retry, remove the logic to punt
a short read to async context. Simply return it.
Hillf Danton [Sun, 22 Dec 2019 14:46:54 +0000 (22:46 +0800)]
io-wq: remove unused busy list from io_sqe
Commit e61df66c69b1 ("io-wq: ensure free/busy list browsing see all
items") added a list for io workers in addition to the free and busy
lists, not only making worker walk cleaner, but leaving the busy list
unused. Let's remove it.
Jens Axboe [Fri, 20 Dec 2019 01:24:38 +0000 (18:24 -0700)]
io_uring: pass in 'sqe' to the prep handlers
This moves the prep handlers outside of the opcode handlers, and allows
us to pass in the sqe directly. If the sqe is non-NULL, it means that
the request should be prepared for the first time.
With the opcode handlers not having access to the sqe at all, we are
guaranteed that the prep handler has setup the request fully by the
time we get there. As before, for opcodes that need to copy in more
data then the io_kiocb allows for, the io_async_ctx holds that info. If
a prep handler is invoked with req->io set, it must use that to retain
information for later.
Jens Axboe [Thu, 19 Dec 2019 21:44:26 +0000 (14:44 -0700)]
io_uring: standardize the prep methods
We currently have a mix of use cases. Most of the newer ones are pretty
uniform, but we have some older ones that use different calling
calling conventions. This is confusing.
For the opcodes that currently rely on the req->io->sqe copy saving
them from reuse, add a request type struct in the io_kiocb command
union to store the data they need.
Prepare for all opcodes having a standard prep method, so we can call
it in a uniform fashion and outside of the opcode handler. This is in
preparation for passing in the 'sqe' pointer, rather than storing it
in the io_kiocb. Once we have uniform prep handlers, we can leave all
the prep work to that part, and not even pass in the sqe to the opcode
handler. This ensures that we don't reuse sqe data inadvertently.
Jens Axboe [Fri, 20 Dec 2019 16:02:01 +0000 (09:02 -0700)]
io_uring: read 'count' for IORING_OP_TIMEOUT in prep handler
Add the count field to struct io_timeout, and ensure the prep handler
has read it. Timeout also needs an async context always, set it up
in the prep handler if we don't have one.
Jens Axboe [Fri, 20 Dec 2019 15:58:21 +0000 (08:58 -0700)]
io_uring: move all prep state for IORING_OP_{SEND,RECV}_MGS to prep handler
Add struct io_sr_msg in our io_kiocb per-command union, and ensure that
the send/recvmsg prep handlers have grabbed what they need from the SQE
by the time prep is done.
Jens Axboe [Fri, 20 Dec 2019 15:45:55 +0000 (08:45 -0700)]
io_uring: add and use struct io_rw for read/writes
Put the kiocb in struct io_rw, and add the addr/len for the request as
well. Use the kiocb->private field for the buffer index for fixed reads
and writes.
Any use of kiocb->ki_filp is flipped to req->file. It's the same thing,
and less confusing.
Jens Axboe [Wed, 18 Dec 2019 19:19:41 +0000 (12:19 -0700)]
io_uring: io_wq_submit_work() should not touch req->rw
I've been chasing a weird and obscure crash that was userspace stack
corruption, and finally narrowed it down to a bit flip that made a
stack address invalid. io_wq_submit_work() unconditionally flips
the req->rw.ki_flags IOCB_NOWAIT bit, but since it's a generic work
handler, this isn't valid. Normal read/write operations own that
part of the request, on other types it could be something else.
Move the IOCB_NOWAIT clear to the read/write handlers where it belongs.
Pavel Begunkov [Wed, 18 Dec 2019 16:53:45 +0000 (19:53 +0300)]
io_uring: don't wait when under-submitting
There is no reliable way to submit and wait in a single syscall, as
io_submit_sqes() may under-consume sqes (in case of an early error).
Then it will wait for not-yet-submitted requests, deadlocking the user
in most cases.
Don't wait/poll if can't submit all sqes
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 18 Dec 2019 02:45:06 +0000 (19:45 -0700)]
io_uring: warn about unhandled opcode
Now that we have all the opcodes handled in terms of command prep and
SQE reuse, add a printk_once() to warn about any potentially new and
unhandled ones.
Jens Axboe [Wed, 18 Dec 2019 02:53:05 +0000 (19:53 -0700)]
io_uring: read opcode and user_data from SQE exactly once
If we defer a request, we can't be reading the opcode again. Ensure that
the user_data and opcode fields are stable. For the user_data we already
have a place for it, for the opcode we can fill a one byte hold and store
that as well. For both of them, assign them when we originally read the
SQE in io_get_sqring(). Any code that uses sqe->opcode or sqe->user_data
is switched to req->opcode and req->user_data.
Jens Axboe [Wed, 18 Dec 2019 01:50:29 +0000 (18:50 -0700)]
io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable
If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the timeout remove op into
the prep handling to make it safe for SQE reuse.
Jens Axboe [Wed, 18 Dec 2019 01:45:56 +0000 (18:45 -0700)]
io_uring: make IORING_OP_CANCEL_ASYNC deferrable
If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the async cancel op into
the prep handling to make it safe for SQE reuse.
Jens Axboe [Wed, 18 Dec 2019 01:40:57 +0000 (18:40 -0700)]
io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable
If we defer these commands as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the poll add/remove into
the prep handling to make it safe for SQE reuse.
Pavel Begunkov [Tue, 17 Dec 2019 17:57:05 +0000 (20:57 +0300)]
io_uring: make HARDLINK imply LINK
The rules are as follows, if IOSQE_IO_HARDLINK is specified, then it's a
link and there is no need to set IOSQE_IO_LINK separately, though it
could be there. Add proper check and ensure that IOSQE_IO_HARDLINK
implies IOSQE_IO_LINK.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 16 Dec 2019 18:55:28 +0000 (11:55 -0700)]
io_uring: any deferred command must have stable sqe data
We're currently not retaining sqe data for accept, fsync, and
sync_file_range. None of these commands need data outside of what
is directly provided, hence it can't go stale when the request is
deferred. However, it can get reused, if an application reuses
SQE entries.
Ensure that we retain the information we need and only read the sqe
contents once, off the submission path. Most of this is just moving
code into a prep and finish function.
Jens Axboe [Tue, 10 Dec 2019 21:38:45 +0000 (14:38 -0700)]
io_uring: remove 'sqe' parameter to the OP helpers that take it
We pass in req->sqe for all of them, no need to pass it in as the
request is always passed in. This is a necessary prep patch to be
able to cleanup/fix the request prep path.
Jens Axboe [Mon, 16 Dec 2019 05:13:43 +0000 (22:13 -0700)]
io_uring: fix pre-prepped issue with force_nonblock == true
Some of these code paths assume that any force_nonblock == true issue
is not prepped, but that's not true if we did prep as part of link setup
earlier. Check if we already have an async context allocate before
setting up a new one.
Cleanup the async context setup in general, we have a lot of duplicated
code there.
Jens Axboe [Sun, 15 Dec 2019 17:57:46 +0000 (10:57 -0700)]
io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG
If we have to punt the recvmsg to async context, we copy all the
context. But since the iovec used can be either on-stack (if small) or
dynamically allocated, if it's on-stack, then we need to ensure we reset
the iov pointer. If we don't, then we're reusing old stack data, and
that can lead to -EFAULTs if things get overwritten.
Ensure we retain the right pointers for the iov, and free it as well if
we end up having to go beyond UIO_FASTIOV number of vectors.
Linus Torvalds [Fri, 13 Dec 2019 22:58:24 +0000 (14:58 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Some fixes and cleanup patches"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_balloon: divide/multiply instead of shifts
virtio_balloon: name cleanups
virtio-balloon: fix managed page counts when migrating pages between zones
Linus Torvalds [Fri, 13 Dec 2019 22:45:40 +0000 (14:45 -0800)]
Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
- removal of an old API where all in-kernel users have been converted
as of this merge window.
- a kdoc fix
- a new helper that will make dependencies for the next API conversion
a tad easier
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: add helper to check if a client has a driver attached
i2c: fix header file kernel-doc warning
i2c: remove i2c_new_dummy() API
Linus Torvalds [Fri, 13 Dec 2019 22:43:26 +0000 (14:43 -0800)]
Merge tag 'pm-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These add PM QoS support to devfreq and fix a few issues in that
subsystem, fix two cpuidle issues and do one minor cleanup in there,
and address an ACPI power management problem related to devices with
special power management requirements, like fans.
Specifics:
- Add PM QoS support, based on the frequency QoS introduced during
the 5.4 cycle, to devfreq (Leonard Crestez).
- Fix some assorted devfreq issues (Leonard Crestez).
- Fix an unintentional cpuidle behavior change (introduced during the
5.4 cycle) related to the active polling time limit (Marcelo
Tosatti).
- Fix a recently introduced cpuidle helper function and do a minor
cleanup in the cpuidle core (Rafael Wysocki).
- Avoid adding devices with special power management requirements,
like fans, to the generic ACPI PM domain (Rafael Wysocki)"
* tag 'pm-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: Drop unnecessary type cast in cpuidle_poll_time()
cpuidle: Fix cpuidle_driver_state_disabled()
ACPI: PM: Avoid attaching ACPI PM domain to certain devices
cpuidle: use first valid target residency as poll time
PM / devfreq: Use PM QoS for sysfs min/max_freq
PM / devfreq: Add PM QoS support
PM / devfreq: Don't fail devfreq_dev_release if not in list
PM / devfreq: Introduce get_freq_range helper
PM / devfreq: Set scaling_max_freq to max on OPP notifier error
PM / devfreq: Fix devfreq_notifier_call returning errno
Linus Torvalds [Fri, 13 Dec 2019 22:40:38 +0000 (14:40 -0800)]
Merge tag 'sound-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A small collection of fixes.
The main changes are fixes for a couple of regressions in AMD HD-audio
and FireWire that were introduced in 5.5-rc1. The rest are small fixes
for echoaudio and FireWire, as well as a usual Dell HD-audio fixup"
* tag 'sound-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO
ALSA: hda/hdmi - Fix duplicate unref of pci_dev
ALSA: fireface: fix return value in error path of isochronous resources reservation
ALSA: oxfw: fix return value in error path of isochronous resources reservation
ALSA: firewire-motu: fix double unlocked 'motu->mutex'
ALSA: echoaudio: simplify get_audio_levels
Linus Torvalds [Fri, 13 Dec 2019 22:36:45 +0000 (14:36 -0800)]
Merge tag 'drm-fixes-2019-12-13' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Usual round of rc2 fixes.
i915 and amdgpu leading the charge, but a few others in here,
including some nouveau fixes, all seems pretty for rc2, but hey it's a
Fri 13th pull so I'm sure it'll cause untold bad fortune.
i915:
- GPU hang on idle transition
- GLK+ FBC corruption fix
- non-priv OA access on Tigerlake
- HDCP state fix
- CI found race fixes
amdgpu:
- renoir DC fixes
- GFX8 fence flush alignment with userspace
- Arcturus power profile fix
- DC aux + i2c over aux fixes
- GPUVM invalidation semaphore fixes
- gfx10 golden registers update
mgag200:
- expand startadd fix
panfrost:
- devfreq fix
- memory fixes
mcde:
- DSI pointer deref fix"
* tag 'drm-fixes-2019-12-13' of git://anongit.freedesktop.org/drm/drm: (51 commits)
drm/amdgpu: add invalidate semaphore limit for SRIOV in gmc10
drm/amdgpu: add invalidate semaphore limit for SRIOV and picasso in gmc9
drm/amdgpu: avoid using invalidate semaphore for picasso
Revert "drm/amdgpu: dont schedule jobs while in reset"
drm/amdgpu: fix license on Kconfig and Makefiles
drm/amdgpu/gfx10: update gfx golden settings for navi14
drm/amdgpu/gfx10: update gfx golden settings
drm/amdgpu/gfx10: update gfx golden settings for navi14
drm/amdgpu/gfx10: update gfx golden settings
drm/i915: Serialise with remote retirement
drm/amd/display: include linux/slab.h where needed
drm/amd/display: fix undefined struct member reference
drm/nouveau/kms/nv50-: fix panel scaling
drm/nouveau/kms/nv50-: Limit MST BPC to 8
drm/nouveau/kms/nv50-: Store the bpc we're using in nv50_head_atom
drm/nouveau/kms/nv50-: Call outp_atomic_check_view() before handling PBN
drm/nouveau: Fix drm-core using atomic code-paths on pre-nv50 hardware
drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
drm/i915/gt: Detect if we miss WaIdleLiteRestore
drm/i915/hdcp: Nuke intel_hdcp_transcoder_config()
...
Linus Torvalds [Fri, 13 Dec 2019 22:27:19 +0000 (14:27 -0800)]
Merge tag 'for-linus-20191212' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- stable fix for the bi_size overflow. Not a corruption issue, but a
case wher we could merge but disallowed (Andreas)
- NVMe pull request via Keith, with various fixes.
- MD pull request from Song.
- Merge window regression fix for the rq passthrough stats (Logan)
- Remove unused blkcg_drain_queue() function (Guoqing)
* tag 'for-linus-20191212' of git://git.kernel.dk/linux-block:
blk-cgroup: remove blkcg_drain_queue
block: fix NULL pointer dereference in account statistics with IDE
md: make sure desc_nr less than MD_SB_DISKS
md: raid1: check rdev before reference in raid1_sync_request func
raid5: need to set STRIPE_HANDLE for batch head
block: fix "check bi_size overflow before merge"
nvme/pci: Fix read queue count
nvme/pci Limit write queue sizes to possible cpus
nvme/pci: Fix write and poll queue types
nvme/pci: Remove last_cq_head
nvme: Namepace identification descriptor list is optional
nvme-fc: fix double-free scenarios on hw queues
nvme: else following return is not needed
nvme: add error message on mismatching controller ids
nvme_fc: add module to ops template to allow module references
nvmet-loop: Avoid preallocating big SGL for data
nvme-fc: Avoid preallocating big SGL for data
nvme-rdma: Avoid preallocating big SGL for data
Linus Torvalds [Fri, 13 Dec 2019 22:24:54 +0000 (14:24 -0800)]
Merge tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that
don't sever if the result is < 0.
This is mostly for linked timeouts, where if we ask for a pure
timeout we always get -ETIME. This makes links useless for that case,
hence allow a case where it works.
- Five minor optimizations to fix and improve cases that regressed
since v5.4.
- An SQTHREAD locking fix.
- A sendmsg/recvmsg iov assignment fix.
- Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and
subsequently ensuring that works for io_uring.
- Fix a case where for an invalid opcode we might return -EBADF instead
of -EINVAL, if the ->fd of that sqe was set to an invalid fd value.
* tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block:
io_uring: ensure we return -EINVAL on unknown opcode
io_uring: add sockets to list of files that support non-blocking issue
net: make socket read/write_iter() honor IOCB_NOWAIT
io_uring: only hash regular files for async work execution
io_uring: run next sqe inline if possible
io_uring: don't dynamically allocate poll data
io_uring: deferred send/recvmsg should assign iov
io_uring: sqthread should grab ctx->uring_lock for submissions
io-wq: briefly spin for new work after finishing work
io-wq: remove worker->wait waitqueue
io_uring: allow unbreakable links
Linus Torvalds [Fri, 13 Dec 2019 22:13:15 +0000 (14:13 -0800)]
Merge tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM multipath by restoring full path selector functionality for
bio-based configurations that don't haave a SCSI device handler.
- Fix dm-btree removal to ensure non-root btree nodes have at least
(max_entries / 3) entries. This resolves userspace thin_check
utility's report of "too few entries in btree_node".
- Fix both the DM thin-provisioning and dm-clone targets to properly
flush the data device prior to metadata commit. This resolves the
potential for inconsistency across a power loss event when the data
device has a volatile writeback cache.
- Small documentation fixes to dm-clone and dm-integrity.
* tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
docs: dm-integrity: remove reference to ARC4
dm thin: Flush data device before committing metadata
dm thin metadata: Add support for a pre-commit callback
dm clone: Flush destination device before committing metadata
dm clone metadata: Use a two phase commit
dm clone metadata: Track exact changes per transaction
dm btree: increase rebalance threshold in __rebalance2()
dm: add dm-clone to the documentation index
dm mpath: remove harmful bio-based optimization
Linus Torvalds [Fri, 13 Dec 2019 22:02:12 +0000 (14:02 -0800)]
Merge tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull FIELD_SIZEOF conversion from Kees Cook:
"A mostly mechanical treewide conversion from FIELD_SIZEOF() to
sizeof_field(). This avoids the redundancy of having 2 macros
(actually 3) doing the same thing, and consolidates on sizeof_field().
While "field" is not an accurate name, it is the common name used in
the kernel, and doesn't result in any unintended innuendo.
As there are still users of FIELD_SIZEOF() in -next, I will clean up
those during this coming development cycle and send the final old
macro removal patch at that time"
* tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
treewide: Use sizeof_field() macro
MIPS: OCTEON: Replace SIZEOF_FIELD() macro
* pm-cpuidle:
cpuidle: Drop unnecessary type cast in cpuidle_poll_time()
cpuidle: Fix cpuidle_driver_state_disabled()
cpuidle: use first valid target residency as poll time
* acpi-pm:
ACPI: PM: Avoid attaching ACPI PM domain to certain devices
Dave Airlie [Fri, 13 Dec 2019 04:50:01 +0000 (14:50 +1000)]
Merge tag 'drm-fixes-5.5-2019-12-12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
drm-fixes-5.5-2019-12-12:
amdgpu:
- DC fixes for renoir
- Gfx8 fence flush align with mesa
- Power profile fix for arcturus
- Freesync fix
- DC I2c over aux fix
- DC aux defer fix
- GPU reset fix
- GPUVM invalidation semaphore fixes for PCO and SR-IOV
- Golden settings updates for gfx10
Dave Airlie [Fri, 13 Dec 2019 04:44:08 +0000 (14:44 +1000)]
Merge tag 'drm-intel-fixes-2019-12-12' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix user reported issue #673: GPU hang on transition to idle
- Avoid corruption on the top of the screen on GLK+ by disabling FBC
- Fix non-privileged access to OA on Tigerlake
- Fix HDCP code not to touch global state when just computing commit
- Fix CI splat by saving irqstate around virtual_context_destroy
- Serialise context retirement possibly on another CPU
Saravana Kannan [Mon, 9 Dec 2019 19:31:19 +0000 (11:31 -0800)]
of/platform: Unconditionally pause/resume sync state during kernel init
Commit 5e6669387e22 ("of/platform: Pause/resume sync state during init
and of_platform_populate()") paused/resumed sync state during init only
if Linux had parsed and populated a devicetree.
However, the check for that (of_have_populated_dt()) can change after
of_platform_default_populate_init() executes. One example of this is
when devicetree unittests are enabled. This causes an unmatched
pause/resume of sync state. To avoid this, just unconditionally
pause/resume sync state during init.
Fixes: 5e6669387e22 ("of/platform: Pause/resume sync state during init and of_platform_populate()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Saravana Kannan <saravanak@google.com> Reviewed-by: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org>
Rob Herring [Wed, 11 Dec 2019 14:51:01 +0000 (08:51 -0600)]
dt-bindings: memory-controllers: tegra: Fix type references
Json-schema requires a $ref to be under an 'allOf' if there are
additional constraints otherwise the additional constraints are
ignored. (Note that this behavior will be changed in draft8.)
Fixes: 641262f5e1ed ("dt-bindings: memory: Add binding for NVIDIA Tegra30 External Memory Controller") Fixes: 785685b7a106 ("dt-bindings: memory: Add binding for NVIDIA Tegra30 Memory Controller") Fixes: 8da65c377b21 ("dt-bindings: memory: tegra30: Convert to Tegra124 YAML") Cc: Thierry Reding <treding@nvidia.com> Cc: Jonathan Hunter <jonathanh@nvidia.com> Cc: linux-tegra@vger.kernel.org Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org>
PCI: rockchip: Fix IO outbound ATU register number
Since 62240a88004b ("PCI: rockchip: Drop storing driver private outbound
resource data), the offset calculation is wrong to access the register
number to program the IO outbound ATU.
Fix this by computing the ATU IO register number based on the number of MEM
registers, not the size of the IO region.
This causes 'synchronous external aborts' like the following:
changzhu [Tue, 10 Dec 2019 14:50:16 +0000 (22:50 +0800)]
drm/amdgpu: add invalidate semaphore limit for SRIOV in gmc10
It may fail to load guest driver in round 2 when using invalidate
semaphore for SRIOV. So it needs to avoid using invalidate semaphore
for SRIOV.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
changzhu [Tue, 10 Dec 2019 14:00:59 +0000 (22:00 +0800)]
drm/amdgpu: add invalidate semaphore limit for SRIOV and picasso in gmc9
It may fail to load guest driver in round 2 or cause Xstart problem
when using invalidate semaphore for SRIOV or picasso. So it needs avoid
using invalidate semaphore for SRIOV and picasso.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Linus Torvalds [Thu, 12 Dec 2019 18:56:37 +0000 (10:56 -0800)]
Merge tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A fix to avoid a corner case when scheduling cap reclaim in batches
from Xiubo, a patch to add some observability into cap waiters from
Jeff and a couple of cleanups"
* tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-client:
ceph: add more debug info when decoding mdsmap
ceph: switch to global cap helper
ceph: trigger the reclaim work once there has enough pending caps
ceph: show tasks waiting on caps in debugfs caps file
ceph: convert int fields in ceph_mount_options to unsigned int
cpuidle: Drop unnecessary type cast in cpuidle_poll_time()
The data type of the target_residency_ns field in struct cpuidle_state
is u64, so it does not need to be cast into u64.
Get rid of the unnecessary type cast.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Logan Gunthorpe [Tue, 10 Dec 2019 18:47:04 +0000 (11:47 -0700)]
block: fix NULL pointer dereference in account statistics with IDE
The IDE driver creates some passthru requests which never get
submitted to the block layer in such a way that blk_account_io_start()
gets called. However, the driver still calls __blk_mq_end_request() in
ide_end_rq() which will call blk_account_io_completion() which tries
to dereferences req->part which is never set. See ide_prep_sense() for
an example of where these requests come from.
To fix this, blk_account_io_completion() and blk_account_io_done()
should do nothing if req->part is not set.
changzhu [Tue, 10 Dec 2019 02:23:09 +0000 (10:23 +0800)]
drm/amdgpu: avoid using invalidate semaphore for picasso
It may cause timeout waiting for sem acquire in VM flush when using
invalidate semaphore for picasso. So it needs to avoid using invalidate
semaphore for piasso.
Guenter Roeck [Thu, 12 Dec 2019 08:34:03 +0000 (16:34 +0800)]
nios2: Fix ioremap
Commit 5ace77e0b41a ("nios2: remove __ioremap") removed the following code,
with the argument that cacheflag is always 0 and the expression would
therefore always be false.
Jens Axboe [Thu, 12 Dec 2019 03:49:58 +0000 (20:49 -0700)]
Merge branch 'md-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-linus
Pull MD fixes from Song.
* 'md-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: make sure desc_nr less than MD_SB_DISKS
md: raid1: check rdev before reference in raid1_sync_request func
raid5: need to set STRIPE_HANDLE for batch head
Linus Torvalds [Thu, 12 Dec 2019 02:10:40 +0000 (18:10 -0800)]
Merge tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
"Fixes for AFS plus one patch to make debugging easier:
- Fix how addresses are matched to server records. This is currently
incorrect which means cache invalidation callbacks from the server
don't necessarily get delivered correctly. This causes stale data
and metadata to be seen under some circumstances.
- Make the dynamic root superblock R/W so that rpm/dnf can reapply
the SELinux label to it when upgrading the Fedora filesystem-afs
package. If the filesystem is R/O, this fails and the upgrade
fails.
It might be better in future to allow setxattr from an LSM to
bypass the R/O protections, if only for pseudo-filesystems.
- Fix the parsing of mountpoint strings. The mountpoint object has to
have a terminal dot, whereas the source/device string passed to
mount should not. This confuses type-forcing suffix detection
leading to the wrong volume variant being mounted.
- Make lookups in the dynamic root superblock for creation events
(such as mkdir) fail with EOPNOTSUPP rather than something like
EEXIST. The dynamic root only allows implicit creation by the
->lookup() method - and only if the target cell exists.
- Fix the looking up of an AFS superblock to include the cell in the
matching key - otherwise all volumes with the same ID number are
treated as the same thing, irrespective of which cell they're in.
- Show the volume name of each volume in the volume records displayed
in /proc/net/afs/<cell>/volumes. This proved useful in debugging as
it provides a way to map the volume IDs to names, where the names
are what appear in /proc/mounts"
* tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Show volume name in /proc/net/afs/<cell>/volumes
afs: Fix missing cell comparison in afs_test_super()
afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
afs: Fix mountpoint parsing
afs: Fix SELinux setting security label on /afs
afs: Fix afs_find_server lookups for ipv4 peers
Jens Axboe [Wed, 11 Dec 2019 22:55:43 +0000 (15:55 -0700)]
io_uring: ensure we return -EINVAL on unknown opcode
If we submit an unknown opcode and have fd == -1, io_op_needs_file()
will return true as we default to needing a file. Then when we go and
assign the file, we find the 'fd' invalid and return -EBADF. We really
should be returning -EINVAL for that case, as we normally do for
unsupported opcodes.
Change io_op_needs_file() to have the following return values:
0 - does not need a file
1 - does need a file
< 0 - error value
and use this to pass back the right value for this invalid case.
Linus Torvalds [Wed, 11 Dec 2019 20:25:32 +0000 (12:25 -0800)]
Merge tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Mainly address a regression reported by David recently observed
together with overlayfs due to the improper return value of
listxattr() without xattr. Update outdated expressions in document as
well.
Summary:
- Fix improper return value of listxattr() with no xattr
- Keep up documentation with latest code"
* tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: update documentation
erofs: zero out when listxattr is called with no xattr
Linus Torvalds [Wed, 11 Dec 2019 20:22:38 +0000 (12:22 -0800)]
Merge tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Remove code I accidentally applied when doing a minor fix up to a
patch, and then using "git commit -a --amend", which pulled in some
other changes I was playing with.
- Remove an used variable in trace_events_inject code
- Fix function graph tracer when it traces a ftrace direct function.
It will now ignore tracing a function that has a ftrace direct
tramploine attached. This is needed for eBPF to use the ftrace direct
code.
* tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix function_graph tracer interaction with BPF trampoline
tracing: remove set but not used variable 'buffer'
module: Remove accidental change of module_enable_x()
Linus Torvalds [Wed, 11 Dec 2019 19:46:19 +0000 (11:46 -0800)]
pipe: simplify signal handling in pipe_read() and add comments
There's no need to separately check for signals while inside the locked
region, since we're going to do "wait_event_interruptible()" right
afterwards anyway, and the error handling is much simpler there.
The check for whether we had already read anything was also redundant,
since we no longer do the odd merging of reads when there are pending
writers.
But perhaps more importantly, this adds commentary about why we still
need to wake up possible writers even though we didn't read any data,
and why we can skip all the finishing touches now if we get a signal (or
had a signal pending) while waiting for more data.
[ This is a split-out cleanup from my "make pipe IO use exclusive wait
queues" thing, which I can't apply because it triggers a nasty bug in
the GNU make jobserver - Linus ]
Guoqing Jiang [Wed, 27 Nov 2019 16:57:50 +0000 (17:57 +0100)]
raid5: need to set STRIPE_HANDLE for batch head
With commit 6ce220dd2f8ea71d6afc29b9a7524c12e39f374a ("raid5: don't set
STRIPE_HANDLE to stripe which is in batch list"), we don't want to set
STRIPE_HANDLE flag for sh which is already in batch list.
However, the stripe which is the head of batch list should set this flag,
otherwise panic could happen inside init_stripe at BUG_ON(sh->batch_head),
it is reproducible with raid5 on top of nvdimm devices per Xiao oberserved.
Thanks for Xiao's effort to verify the change.
Fixes: 6ce220dd2f8ea ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list") Reported-by: Xiao Ni <xni@redhat.com> Tested-by: Xiao Ni <xni@redhat.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
David Howells [Wed, 11 Dec 2019 08:58:59 +0000 (08:58 +0000)]
afs: Show volume name in /proc/net/afs/<cell>/volumes
Show the name of each volume in /proc/net/afs/<cell>/volumes to make it
easier to work out the name corresponding to a volume ID. This makes it
easier to work out which mounts in /proc/mounts correspond to which volume
ID.
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
David Howells [Wed, 11 Dec 2019 08:06:08 +0000 (08:06 +0000)]
afs: Fix missing cell comparison in afs_test_super()
Fix missing cell comparison in afs_test_super(). Without this, any pair
volumes that have the same volume ID will share a superblock, no matter the
cell, unless they're in different network namespaces.
Normally, most users will only deal with a single cell and so they won't
see this. Even if they do look into a second cell, they won't see a
problem unless they happen to hit a volume with the same ID as one they've
already got mounted.
David Howells [Wed, 11 Dec 2019 08:56:04 +0000 (08:56 +0000)]
afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
Fix the lookup method on the dynamic root directory such that creation
calls, such as mkdir, open(O_CREAT), symlink, etc. fail with EOPNOTSUPP
rather than failing with some odd error (such as EEXIST).
lookup() itself tries to create automount directories when it is invoked.
These are cached locally in RAM and not committed to storage.
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
David Howells [Mon, 9 Dec 2019 15:04:45 +0000 (15:04 +0000)]
afs: Fix mountpoint parsing
Each AFS mountpoint has strings that define the target to be mounted. This
is required to end in a dot that is supposed to be stripped off. The
string can include suffixes of ".readonly" or ".backup" - which are
supposed to come before the terminal dot. To add to the confusion, the "fs
lsmount" afs utility does not show the terminal dot when displaying the
string.
The kernel mount source string parser, however, assumes that the terminal
dot marks the suffix and that the suffix is always "" and is thus ignored.
In most cases, there is no suffix and this is not a problem - but if there
is a suffix, it is lost and this affects the ability to mount the correct
volume.
The command line mount command, on the other hand, is expected not to
include a terminal dot - so the problem doesn't arise there.
Fix this by making sure that the dot exists and then stripping it when
passing the string to the mount configuration.
Fixes: bec5eb614130 ("AFS: Implement an autocell mount capability [ver #2]") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
dt-bindings: net: ti: cpsw-switch: update to fix comments
After original patch was merged there were additional comments/requests
provided by Rob Herring [1]. Mostly they are related to json-schema usage,
and this patch fixes them. Also SPDX-License-Identifier has been changed to
(GPL-2.0-only OR BSD-2-Clause) as requested.
[1] https://lkml.org/lkml/2019/11/21/875 Fixes: ef63fe72f698 ("dt-bindings: net: ti: add new cpsw switch driver bindings") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
[robh: Remove 2 more maxItems that aren't necessary] Signed-off-by: Rob Herring <robh@kernel.org>
Chris Wilson [Thu, 21 Nov 2019 07:10:41 +0000 (07:10 +0000)]
drm/i915: Serialise with remote retirement
Since retirement may be running in a worker on another CPU, it may be
skipped in the local intel_gt_wait_for_idle(). To ensure the state is
consistent for our sanity checks upon load, serialise with the remote
retirer by waiting on the timeline->mutex.
Outside of this use case, e.g. on suspend or module unload, we expect the
slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to
put the special case serialisation with retirement in its single user,
for now at least.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-2-chris@chris-wilson.co.uk
(cherry picked from commit 2d0fb251360ab7eccbffd99f6933a2a4de678d52) Fixes: 093b92287363 ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree") Closes: https://gitlab.freedesktop.org/drm/intel/issues/754 Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We managed to get confused about the shift direction at least once.
Let's switch to division/multiplcation instead. Add a number of pages
macro for this purpose. We still keep the order macro around too since
this is what alloc/free pages want.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com>
free_page_order is a confusing name. It's not a page order
actually, it's the order of the block of memory we are hinting.
Rename to hint_block_order. Also, rename SIZE to BYTES
to make it clear it's the block size in bytes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com>
virtio-balloon: fix managed page counts when migrating pages between zones
In case we have to migrate a ballon page to a newpage of another zone, the
managed page count of both zones is wrong. Paired with memory offlining
(which will adjust the managed page count), we can trigger kernel crashes
and all kinds of different symptoms.
In another instance (older kernel), I was able to observe this
(negative page count :/):
[ 180.896971] Offlined Pages 32768
[ 182.667462] Offlined Pages 32768
[ 184.408117] Offlined Pages 32768
[ 186.026321] Offlined Pages 32768
[ 187.684861] Offlined Pages 32768
[ 189.227013] Offlined Pages 32768
[ 190.830303] Offlined Pages 32768
[ 190.833071] Built 1 zonelists, mobility grouping on. Total pages: -36920272750453009
In another instance (older kernel), I was no longer able to start any
process:
[root@vm ~]# [ 214.348068] Offlined Pages 32768
[ 215.973009] Offlined Pages 32768
cat /proc/meminfo
-bash: fork: Cannot allocate memory
[root@vm ~]# cat /proc/meminfo
-bash: fork: Cannot allocate memory
Fix it by properly adjusting the managed page count when migrating if
the zone changed. The managed page count of the zones now looks after
unplug of the DIMM (and after deflating the balloon) just like before
inflating the balloon (and plugging+onlining the DIMM).
We'll temporarily modify the totalram page count. If this ever becomes a
problem, we can fine tune by providing helpers that don't touch
the totalram pages (e.g., adjust_zone_managed_page_count()).
Please note that fixing up the managed page count is only necessary when
we adjusted the managed page count when inflating - only if we
don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the
managed page count is not touched when inflating/deflating.
Reported-by: Yumei Huang <yuhuang@redhat.com> Fixes: 3dcc0571cd64 ("mm: correctly update zone->managed_pages") Cc: <stable@vger.kernel.org> # v3.11+ Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Wolfram Sang [Wed, 6 Nov 2019 21:21:01 +0000 (22:21 +0100)]
i2c: add helper to check if a client has a driver attached
As a preparation for an API conversion, factor out something frequently
used in the media subsystem. As an improvement, it bails out on both,
NULL and ERRPTR to handle the old and new API.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Hui Wang [Wed, 11 Dec 2019 05:13:21 +0000 (13:13 +0800)]
ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO
After applying the fixup ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, the
Line-out jack works well. And instead of adding a new set of pin
definition in the pin_fixup_tbl, we put a more generic matching entry
in the fallback_pin_fixup_tbl.
Jens Axboe [Tue, 10 Dec 2019 03:16:22 +0000 (20:16 -0700)]
io_uring: add sockets to list of files that support non-blocking issue
In chasing a performance issue between using IORING_OP_RECVMSG and
IORING_OP_READV on sockets, tracing showed that we always punt the
socket reads to async offload. This is due to io_file_supports_async()
not checking for S_ISSOCK on the inode. Since sockets supports the
O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list
of file types that we can do a non-blocking issue to.
Jens Axboe [Tue, 10 Dec 2019 03:58:56 +0000 (20:58 -0700)]
net: make socket read/write_iter() honor IOCB_NOWAIT
The socket read/write helpers only look at the file O_NONBLOCK. not
the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2
and io_uring that rely on not having the file itself marked nonblocking,
but rather the iocb itself.
Cc: netdev@vger.kernel.org Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 10 Dec 2019 03:01:01 +0000 (20:01 -0700)]
io_uring: run next sqe inline if possible
One major use case of linked commands is the ability to run the next
link inline, if at all possible. This is done correctly for async
offload, but somewhere along the line we lost the ability to do so when
we were able to complete a request without having to punt it. Ensure
that we do so correctly.
Jens Axboe [Tue, 10 Dec 2019 00:52:20 +0000 (17:52 -0700)]
io_uring: don't dynamically allocate poll data
This essentially reverts commit e944475e6984. For high poll ops
workloads, like TAO, the dynamic allocation of the wait_queue
entry for IORING_OP_POLL_ADD adds considerable extra overhead.
Go back to embedding the wait_queue_entry, but keep the usage of
wait->private for the pointer stashing.
Jens Axboe [Sun, 8 Dec 2019 04:03:59 +0000 (21:03 -0700)]
io-wq: remove worker->wait waitqueue
We only have one cases of using the waitqueue to wake the worker, the
rest are using wake_up_process(). Since we can save some cycles not
fiddling with the waitqueue io_wqe_worker(), switch the work activation
to task wakeup and get rid of the now unused wait_queue_head_t in
struct io_worker.
Jens Axboe [Sun, 8 Dec 2019 03:59:47 +0000 (20:59 -0700)]
io_uring: allow unbreakable links
Some commands will invariably end in a failure in the sense that the
completion result will be less than zero. One such example is timeouts
that don't have a completion count set, they will always complete with
-ETIME unless cancelled.
For linked commands, we sever links and fail the rest of the chain if
the result is less than zero. Since we have commands where we know that
will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever
regardless of the completion result. Note that the link will still sever
if we fail submitting the parent request, hard links are only resilient
in the presence of completion results for requests that did submit
correctly.