Additional updates of the generic power domains (genpd) framework
(support for devices attached to multiple domains) and the cpupower
utility (minor fixes) for 4.18-rc1.
* pm-domains:
PM / Domains: Add dev_pm_domain_attach_by_id() to manage multi PM domains
PM / Domains: Add support for multi PM domains per device to genpd
PM / Domains: Split genpd_dev_pm_attach()
PM / Domains: Don't attach devices in genpd with multi PM domains
PM / Domains: dt: Allow power-domain property to be a list of specifiers
* pm-tools:
cpupower : Fix header name to read idle state name
cpupower: fix spelling mistake: "logilename" -> "logfilename"
Additional cpufreq updates for 4.18-rc1: fixes and cleanups in the
core and drivers and intel_pstate extension to do iowait boosting
on systems with HWP that improves performance quite a bit.
* pm-cpufreq:
cpufreq: imx6q: check speed grades for i.MX6ULL
cpufreq: governors: Fix long idle detection logic in load calculation
cpufreq: intel_pstate: enable boost for Skylake Xeon
cpufreq: intel_pstate: New sysfs entry to control HWP boost
cpufreq: intel_pstate: HWP boost performance on IO wakeup
cpufreq: intel_pstate: Add HWP boost utility and sched util hooks
cpufreq: ti-cpufreq: Use devres managed API in probe()
cpufreq: ti-cpufreq: Fix an incorrect error return value
cpufreq: ACPI: make function acpi_cpufreq_fast_switch() static
cpufreq: kryo: allow building as a loadable module
Revert "PM / runtime: Fixup reference counting of device link suppliers at probe"
Revert commit 1e8378619841 (PM / runtime: Fixup reference counting of
device link suppliers at probe), as it has introduced a regression
and the condition it was designed to address should be covered by the
existing code.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Chen Yu [Fri, 8 Jun 2018 01:07:33 +0000 (09:07 +0800)]
cpufreq: governors: Fix long idle detection logic in load calculation
According to current code implementation, detecting the long
idle period is done by checking if the interval between two
adjacent utilization update handlers is long enough. Although
this mechanism can detect if the idle period is long enough
(no utilization hooks invoked during idle period), it might
not cover a corner case: if the task has occupied the CPU
for too long which causes no context switches during that
period, then no utilization handler will be launched until this
high prio task is scheduled out. As a result, the idle_periods
field might be calculated incorrectly because it regards the
100% load as 0% and makes the conservative governor who uses
this field confusing.
Change the detection to compare the idle_time with sampling_rate
directly.
Reported-by: Artem S. Tashkinov <t.artem@mailcity.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PM / wakeup: Export wakeup_count instead of event_count via sysfs
Currently we export event_count instead of wakeup_count via the
per-device wakeup_count sysfs attribute. Change it to wakeup_count
to make it more meaningful.
wakeup_count increments only when events_check_enabled is set,
that is whenever writes the current wakeup count to
/sys/power/wakeup_count. Also events_check_enabled is cleared on
every resume. User space is expected to write to this just before
suspend. This way pm_wakeup_event(), when called from IRQs handles,
will increment wakeup_count only if we are in system-wide
suspend-resume cycle and should give a fair approximation of how many
times a device may have triggered a wakeup from system suspend.
event_count on the other hand will increment every time
pm_wakeup_event() is called irrespective of whether we are in a
suspend-resume cycle and some drivers call it on every interrupt
which makes it less useful for system wakeup tracking.
Signed-off-by: Ravi Chandra Sadineni <ravisadineni@chromium.org> Acked-by: Pavel Machek <pavel@ucw.cz>
[ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ulf Hansson [Thu, 31 May 2018 10:59:59 +0000 (12:59 +0200)]
PM / Domains: Add dev_pm_domain_attach_by_id() to manage multi PM domains
The existing dev_pm_domain_attach() function, allows a single PM domain to
be attached per device. To be able to support devices that are partitioned
across multiple PM domains, let's introduce a new interface,
dev_pm_domain_attach_by_id().
The dev_pm_domain_attach_by_id() returns a new allocated struct device with
the corresponding attached PM domain. This enables for example a driver to
operate on the new device from a power management point of view. The driver
may then also benefit from using the received device, to set up so called
device-links towards its original device. Depending on the situation, these
links may then be dynamically changed.
The new interface is typically called by drivers during their probe phase,
in case they manages devices which uses multiple PM domains. If that is the
case, the driver also becomes responsible of managing the detaching of the
PM domains, which typically should be done at the remove phase. Detaching
is done by calling the existing dev_pm_domain_detach() function and for
each of the received devices from dev_pm_domain_attach_by_id().
Note, currently its only genpd that supports multiple PM domains per
device, but dev_pm_domain_attach_by_id() can easily by extended to cover
other PM domain types, if/when needed.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ulf Hansson [Thu, 31 May 2018 10:59:58 +0000 (12:59 +0200)]
PM / Domains: Add support for multi PM domains per device to genpd
To support devices being partitioned across multiple PM domains, let's
begin with extending genpd to cope with these kind of configurations.
Therefore, add a new exported function genpd_dev_pm_attach_by_id(), which
is similar to the existing genpd_dev_pm_attach(), but with the difference
that it allows its callers to provide an index to the PM domain that it
wants to attach.
Note that, genpd_dev_pm_attach_by_id() shall only be called by the driver
core / PM core, similar to how the existing dev_pm_domain_attach() makes
use of genpd_dev_pm_attach(). However, this is implemented by following
changes on top.
Because, only one PM domain can be attached per device, genpd needs to
create a virtual device that it can attach/detach instead. More precisely,
let the new function genpd_dev_pm_attach_by_id() register a virtual struct
device via calling device_register(). Then let it attach this device to the
corresponding PM domain, rather than the one that is provided by the
caller. The actual attaching is done via re-using the existing genpd OF
functions.
At successful attachment, genpd_dev_pm_attach_by_id() returns the created
virtual device, which allows the caller to operate on it to deal with power
management. Following changes on top, provides more details in this
regards.
To deal with detaching of a PM domain for the multiple PM domains case,
let's also extend the existing genpd_dev_pm_detach() function, to cover the
cleanup of the created virtual device, via make it call device_unregister()
on it. In this way, there is no need to introduce a new function to deal
with detach for the multiple PM domain case, but instead the existing one
is re-used.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ulf Hansson [Thu, 31 May 2018 10:59:57 +0000 (12:59 +0200)]
PM / Domains: Split genpd_dev_pm_attach()
To extend genpd to deal with allowing multiple PM domains per device, some
of the code in genpd_dev_pm_attach() can be re-used. Let's prepare for this
by moving some of the code into a sub-function.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ulf Hansson [Thu, 31 May 2018 10:59:56 +0000 (12:59 +0200)]
PM / Domains: Don't attach devices in genpd with multi PM domains
The power-domain DT property may now contain a list of PM domain
specifiers, which represents that a device are partitioned across multiple
PM domains. This leads to a new situation in genpd_dev_pm_attach(), as only
one PM domain can be attached per device.
To remain things simple for the most common configuration, when a single PM
domain is used, let's treat the multiple PM domain case as being specific.
In other words, let's change genpd_dev_pm_attach() to check for multiple PM
domains and prevent it from attach any PM domain for this case. Instead,
leave this to be managed separately, from following changes to genpd.
Suggested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ulf Hansson [Thu, 31 May 2018 10:59:55 +0000 (12:59 +0200)]
PM / Domains: dt: Allow power-domain property to be a list of specifiers
To be able to describe topologies where devices are partitioned across
multiple power domains, let's extend the power-domain property to allow
being a list of PM domain specifiers.
Suggested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge tag 'linux-cpupower-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Pull cpupower updates for v4.18-rc1 from Shuah Khan:
"This cpupower update for 4.18-rc1 consists of two minor fixes."
* tag 'linux-cpupower-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
cpupower : Fix header name to read idle state name
cpupower: fix spelling mistake: "logilename" -> "logfilename"
cpufreq: intel_pstate: HWP boost performance on IO wakeup
This change uses SCHED_CPUFREQ_IOWAIT flag to boost HWP performance.
Since SCHED_CPUFREQ_IOWAIT flag is set frequently, we don't start
boosting steps unless we see two consecutive flags in two ticks. This
avoids boosting due to IO because of regular system activities.
To avoid synchronization issues, the actual processing of the flag is
done on the local CPU callback.
Reported-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq: intel_pstate: Add HWP boost utility and sched util hooks
Added two utility functions to HWP boost up gradually and boost down to
the default cached HWP request values.
Boost up:
Boost up updates HWP request minimum value in steps. This minimum value
can reach upto at HWP request maximum values depends on how frequently,
this boost up function is called. At max, boost up will take three steps
to reach the maximum, depending on the current HWP request levels and HWP
capabilities. For example, if the current settings are:
If P0 (Turbo max) = P1 (Guaranteed max) = min
No boost at all.
If P0 (Turbo max) > P1 (Guaranteed max) = min
Should result in one level boost only for P0.
If P0 (Turbo max) = P1 (Guaranteed max) > min
Should result in two level boost:
(min + p1)/2 and P1.
If P0 (Turbo max) > P1 (Guaranteed max) > min
Should result in three level boost:
(min + p1)/2, P1 and P0.
We don't set any level between P0 and P1 as there is no guarantee that
they will be honored.
Boost down:
After the system is idle for hold time of 3ms, the HWP request is reset
to the default value from HWP init or user modified one via sysfs.
Caching of HWP Request and Capabilities
Store the HWP request value last set using MSR_HWP_REQUEST and read
MSR_HWP_CAPABILITIES. This avoid reading of MSRs in the boost utility
functions.
These boost utility functions calculated limits are based on the latest
HWP request value, which can be modified by setpolicy() callback. So if
user space modifies the minimum perf value, that will be accounted for
every time the boost up is called. There will be case when there can be
contention with the user modified minimum perf, in that case user value
will gain precedence. For example just before HWP_REQUEST MSR is updated
from setpolicy() callback, the boost up function is called via scheduler
tick callback. Here the cached MSR value is already the latest and limits
are updated based on the latest user limits, but on return the MSR write
callback called from setpolicy() callback will update the HWP_REQUEST
value. This will be used till next time the boost up function is called.
In addition add a variable to control HWP dynamic boosting. When HWP
dynamic boost is active then set the HWP specific update util hook. The
contents in the utility hooks will be filled in the subsequent patches.
Reported-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Suman Anna [Thu, 31 May 2018 22:21:44 +0000 (17:21 -0500)]
cpufreq: ti-cpufreq: Use devres managed API in probe()
The ti_cpufreq_probe() function uses regular kzalloc to allocate
the ti_cpufreq_data structure and kfree for freeing this memory
on failures. Simplify this code by using the devres managed
API.
Signed-off-by: Suman Anna <s-anna@ti.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Suman Anna [Thu, 31 May 2018 22:21:43 +0000 (17:21 -0500)]
cpufreq: ti-cpufreq: Fix an incorrect error return value
Commit 05829d9431df (cpufreq: ti-cpufreq: kfree opp_data when
failure) has fixed a memory leak in the failure path, however
the patch returned a positive value on get_cpu_device() failure
instead of the previous negative value. Fix this incorrect error
return value properly.
Fixes: 05829d9431df (cpufreq: ti-cpufreq: kfree opp_data when failure) Cc: 4.14+ <stable@vger.kernel.org> # v4.14+ Signed-off-by: Suman Anna <s-anna@ti.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Colin Ian King [Fri, 1 Jun 2018 13:05:12 +0000 (14:05 +0100)]
cpufreq: ACPI: make function acpi_cpufreq_fast_switch() static
The acpi_cpufreq_fast_switch() function is local to the source and
does not need to be in global scope, so make it static.
Cleans up sparse warning:
drivers/cpufreq/acpi-cpufreq.c:468:14: warning: symbol
'acpi_cpufreq_fast_switch' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Arnd Bergmann [Tue, 5 Jun 2018 11:44:24 +0000 (13:44 +0200)]
cpufreq: kryo: allow building as a loadable module
Building the kryo cpufreq driver while QCOM_SMEM is a loadable module
results in a link error:
drivers/cpufreq/qcom-cpufreq-kryo.o: In function `qcom_cpufreq_kryo_probe':
qcom-cpufreq-kryo.c:(.text+0xbc): undefined reference to `qcom_smem_get'
The problem is that Kconfig ignores interprets the dependency as met
when the dependent symbol is a 'bool' one. By making it 'tristate',
it will be forced to be a module here, which builds successfully.
Fixes: 46e2856b8e18 (cpufreq: Add Kryo CPU scaling driver) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Tue, 5 Jun 2018 16:38:39 +0000 (09:38 -0700)]
Merge tag 'pm-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These include a significant update of the generic power domains
(genpd) and Operating Performance Points (OPP) frameworks, mostly
related to the introduction of power domain performance levels,
cpufreq updates (new driver for Qualcomm Kryo processors, updates of
the existing drivers, some core fixes, schedutil governor
improvements), PCI power management fixes, ACPI workaround for
EC-based wakeup events handling on resume from suspend-to-idle, and
major updates of the turbostat and pm-graph utilities.
Specifics:
- Introduce power domain performance levels into the the generic
power domains (genpd) and Operating Performance Points (OPP)
frameworks (Viresh Kumar, Rajendra Nayak, Dan Carpenter).
- Fix two issues in the runtime PM framework related to the
initialization and removal of devices using device links (Ulf
Hansson).
- Clean up the initialization of drivers for devices in PM domains
(Ulf Hansson, Geert Uytterhoeven).
- Fix a cpufreq core issue related to the policy sysfs interface
causing CPU online to fail for CPUs sharing one cpufreq policy in
some situations (Tao Wang).
- Make it possible to use platform-specific suspend/resume hooks in
the cpufreq-dt driver and make the Armada 37xx DVFS use that
feature (Viresh Kumar, Miquel Raynal).
- Optimize policy transition notifications in cpufreq (Viresh Kumar).
- Improve the iowait boost mechanism in the schedutil cpufreq
governor (Patrick Bellasi).
- Improve the handling of deferred frequency updates in the schedutil
cpufreq governor (Joel Fernandes, Dietmar Eggemann, Rafael Wysocki,
Viresh Kumar).
- Add a new cpufreq driver for Qualcomm Kryo (Ilia Lin).
- Fix and clean up some cpufreq drivers (Colin Ian King, Dmitry
Osipenko, Doug Smythies, Luc Van Oostenryck, Simon Horman, Viresh
Kumar).
- Fix the handling of PCI devices with the DPM_SMART_SUSPEND flag set
and update stale comments in the PCI core PM code (Rafael Wysocki).
- Work around an issue related to the handling of EC-based wakeup
events in the ACPI PM core during resume from suspend-to-idle if
the EC has been put into the low-power mode (Rafael Wysocki).
- Improve the handling of wakeup source objects in the PM core (Doug
Berger, Mahendran Ganesh, Rafael Wysocki).
- Update the driver core to prevent deferred probe from breaking
suspend/resume ordering (Feng Kan).
- Clean up the PM core somewhat (Bjorn Helgaas, Ulf Hansson, Rafael
Wysocki).
- Make the core suspend/resume code and cpufreq support the RT patch
(Sebastian Andrzej Siewior, Thomas Gleixner).
- Consolidate the PM QoS handling in cpuidle governors (Rafael
Wysocki).
- Fix a possible crash in the hibernation core (Tetsuo Handa).
- Update the rockchip-io Adaptive Voltage Scaling (AVS) driver (David
Wu).
- Update the turbostat utility (fixes, cleanups, new CPU IDs, new
command line options, built-in "Low Power Idle" counters support,
new POLL and POLL% columns) and add an entry for it to MAINTAINERS
(Len Brown, Artem Bityutskiy, Chen Yu, Laura Abbott, Matt Turner,
Prarit Bhargava, Srinivas Pandruvada).
- Update the pm-graph to version 5.1 (Todd Brandt).
- Update the intel_pstate_tracer utility (Doug Smythies)"
* tag 'pm-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (128 commits)
tools/power turbostat: update version number
tools/power turbostat: Add Node in output
tools/power turbostat: add node information into turbostat calculations
tools/power turbostat: remove num_ from cpu_topology struct
tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node
tools/power turbostat: track thread ID in cpu_topology
tools/power turbostat: Calculate additional node information for a package
tools/power turbostat: Fix node and siblings lookup data
tools/power turbostat: set max_num_cpus equal to the cpumask length
tools/power turbostat: if --num_iterations, print for specific number of iterations
tools/power turbostat: Add Cannon Lake support
tools/power turbostat: delete duplicate #defines
x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
tools/power turbostat: add POLL and POLL% column
tools/power turbostat: Fix --hide Pk%pc10
tools/power turbostat: Build-in "Low Power Idle" counters support
tools/power turbostat: Don't make man pages executable
tools/power turbostat: remove blank lines
tools/power turbostat: a small C-states dump readability immprovement
...
Linus Torvalds [Tue, 5 Jun 2018 16:34:03 +0000 (09:34 -0700)]
Merge tag 'for-linus-20180605' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"This just contains the dm kzalloc fix that was discussed, and a fix
that I queued up yesterday for a case where blk-mq doesn't honor the
stop bit appropriately"
* tag 'for-linus-20180605' of git://git.kernel.dk/linux-block:
dm: Use kzalloc for all structs with embedded biosets/mempools
blk-mq: return when hctx is stopped in blk_mq_run_work_fn
Linus Torvalds [Tue, 5 Jun 2018 16:04:46 +0000 (09:04 -0700)]
Merge branch 'faddr2line' (patches from Josh)
Merge faddr2line updates from Josh Poimboeuf:
- revert faddr2line's default output to its original non-code-listing
output, and make the code listing an optional feature
- give faddr2line a real maintainer, so get_maintainer.pl will actually
CC me on future patches
* emailed patches from Josh Poimboeuf <jpoimboe@redhat.com>:
MAINTAINERS: add Josh Poimboeuf as faddr2line maintainer
scripts/faddr2line: make the new code listing format optional
scripts/faddr2line: make the new code listing format optional
Commit 6870c0165feaa5 ("scripts/faddr2line: show the code context")
radically altered the output format of the faddr2line tool. And while
the new list output format might have merit it broke my vim usage and
was hard to read.
Make the new format optional; using a '--list' argument and attempt to
make the output slightly easier to read by adding a little whitespace to
separate the different files and explicitly mark the line in question.
Cc: Changbin Du <changbin.du@intel.com> Fixes: 6870c0165feaa5 ("scripts/faddr2line: show the code context") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kent Overstreet [Tue, 5 Jun 2018 09:26:33 +0000 (05:26 -0400)]
dm: Use kzalloc for all structs with embedded biosets/mempools
mempool_init()/bioset_init() require that the mempools/biosets be zeroed
first; they probably should not _require_ this, but not allocating those
structs with kzalloc is a fairly nonsensical thing to do (calling
mempool_exit()/bioset_exit() on an uninitialized mempool/bioset is legal
and safe, but only works if said memory was zeroed.)
Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Tue, 5 Jun 2018 04:34:39 +0000 (21:34 -0700)]
Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource controller updates from Thomas Gleixner:
"An update for the Intel Resource Director Technolgy (RDT) which adds a
feedback driven software controller to runtime adjust the bandwidth
allocation MSRs.
This makes the allocations more accurate and allows to use bandwidth
values in understandable units (MB/s) instead of using percentage
based allocations as the original, still available, interface.
The software controller can be enabled with a new mount option for the
resctrl filesystem"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth
x86/intel_rdt/mba_sc: Prepare for feedback loop
x86/intel_rdt/mba_sc: Add schemata support
x86/intel_rdt/mba_sc: Add initialization support
x86/intel_rdt/mba_sc: Enable/disable MBA software controller
x86/intel_rdt/mba_sc: Documentation for MBA software controller(mba_sc)
Linus Torvalds [Tue, 5 Jun 2018 03:26:07 +0000 (20:26 -0700)]
Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Thomas Gleixner:
- Fix a stack out of bounds write in the MCE error injection code.
- Avoid IPIs during CPU hotplug to read the MCx_MISC block address from
a remote CPU. That's fragile and pointless because the block
addresses are the same on all CPUs. So they can be read once and
local.
- Add support for MCE broadcasting on newer VIA Centaur CPUs.
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE/AMD: Read MCx_MISC block addresses on any CPU
x86/MCE: Fix stack out-of-bounds write in mce-inject.c: Flags_read()
x86/MCE: Enable MCE broadcasting on new Centaur CPUs
Linus Torvalds [Tue, 5 Jun 2018 02:59:22 +0000 (19:59 -0700)]
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
- Consolidation of softirq pending:
The softirq mask and its accessors/mutators have many implementations
scattered around many architectures. Most do the same things
consisting in a field in a per-cpu struct (often irq_cpustat_t)
accessed through per-cpu ops. We can provide instead a generic
efficient version that most of them can use. In fact s390 is the only
exception because the field is stored in lowcore.
- Support for level!?! triggered MSI (ARM)
Over the past couple of years, we've seen some SoCs coming up with
ways of signalling level interrupts using a new flavor of MSIs, where
the MSI controller uses two distinct messages: one that raises a
virtual line, and one that lowers it. The target MSI controller is in
charge of maintaining the state of the line.
This allows for a much simplified HW signal routing (no need to have
hundreds of discrete lines to signal level interrupts if you already
have a memory bus), but results in a departure from the current idea
the kernel has of MSIs.
- Support for Meson-AXG GPIO irqchip
- Large stm32 irqchip rework (suspend/resume, hierarchical domains)
- More SPDX conversions
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
ARM: dts: stm32: Add exti support to stm32mp157 pinctrl
ARM: dts: stm32: Add exti support for stm32mp157c
pinctrl/stm32: Add irq_eoi for stm32gpio irqchip
irqchip/stm32: Add suspend/resume support for hierarchy domain
irqchip/stm32: Add stm32mp1 support with hierarchy domain
irqchip/stm32: Prepare common functions
irqchip/stm32: Add host and driver data structures
irqchip/stm32: Add suspend support
irqchip/stm32: Add falling pending register support
irqchip/stm32: Checkpatch fix
irqchip/stm32: Optimizes and cleans up stm32-exti irq_domain
irqchip/meson-gpio: Add support for Meson-AXG SoCs
dt-bindings: interrupt-controller: New binding for Meson-AXG SoC
dt-bindings: interrupt-controller: Fix the double quotes
softirq/s390: Move default mutators of overwritten softirq mask to s390
softirq/x86: Switch to generic local_softirq_pending() implementation
softirq/sparc: Switch to generic local_softirq_pending() implementation
softirq/powerpc: Switch to generic local_softirq_pending() implementation
softirq/parisc: Switch to generic local_softirq_pending() implementation
softirq/ia64: Switch to generic local_softirq_pending() implementation
...
Linus Torvalds [Tue, 5 Jun 2018 02:19:16 +0000 (19:19 -0700)]
Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug updates from Ingo Molnar:
"This contains the x86 oops code printing reorganization and cleanups
from Borislav Betkov, with a particular focus in enhancing opcode
dumping all around"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Explain the reasoning for the prologue and buffer size
x86/dumpstack: Save first regs set for the executive summary
x86/dumpstack: Add a show_ip() function
x86/fault: Dump user opcode bytes on fatal faults
x86/dumpstack: Add loglevel argument to show_opcodes()
x86/dumpstack: Improve opcodes dumping in the code section
x86/dumpstack: Carve out code-dumping into a function
x86/dumpstack: Unexport oops_begin()
x86/dumpstack: Remove code_bytes
Linus Torvalds [Tue, 5 Jun 2018 02:17:47 +0000 (19:17 -0700)]
Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Misc cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apm: Fix spelling mistake: "caculate" -> "calculate"
x86/mtrr: Rename main.c to mtrr.c and remove duplicate prefixes
x86: Remove pr_fmt duplicate logging prefixes
x86/early-quirks: Rename duplicate define of dev_err
x86/bpf: Clean up non-standard comments, to make the code more readable
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/io: Define readq()/writeq() to use 64-bit type
x86/asm/64: Micro-optimize __clear_user() - Use immediate constants
Linus Torvalds [Tue, 5 Jun 2018 01:19:18 +0000 (18:19 -0700)]
Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
- Centaur CPU updates (David Wang)
- AMD and other CPU topology enumeration improvements and fixes
(Borislav Petkov, Thomas Gleixner, Suravee Suthikulpanit)
- Continued 5-level paging work (Kirill A. Shutemov)
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Mark __pgtable_l5_enabled __initdata
x86/mm: Mark p4d_offset() __always_inline
x86/mm: Introduce the 'no5lvl' kernel parameter
x86/mm: Stop pretending pgtable_l5_enabled is a variable
x86/mm: Unify pgtable_l5_enabled usage in early boot code
x86/boot/compressed/64: Fix trampoline page table address calculation
x86/CPU: Move x86_cpuinfo::x86_max_cores assignment to detect_num_cpu_cores()
x86/Centaur: Report correct CPU/cache topology
x86/CPU: Move cpu_detect_cache_sizes() into init_intel_cacheinfo()
x86/CPU: Make intel_num_cpu_cores() generic
x86/CPU: Move cpu local function declarations to local header
x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available
x86/CPU: Modify detect_extended_topology() to return result
x86/CPU/AMD: Calculate last level cache ID from number of sharing threads
x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c
perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id
x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present
x86/Centaur: Initialize supported CPU features properly
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Update util_est before updating schedutil
sched/cpufreq: Modify aggregate utilization to always include blocked FAIR utilization
sched/deadline/Documentation: Add overrun signal and GRUB-PA documentation
sched/core: Distinguish between idle_cpu() calls based on desired effect, introduce available_idle_cpu()
sched/wait: Include <linux/wait.h> in <linux/swait.h>
sched/numa: Stagger NUMA balancing scan periods for new threads
sched/core: Don't schedule threads on pre-empted vCPUs
sched/fair: Avoid calling sync_entity_load_avg() unnecessarily
sched/fair: Rearrange select_task_rq_fair() to optimize it
Tooling side changes, which you can build and install in a single step
via:
make -C tools/perf clean install
perf annotate:
- Support 'perf annotate --group' for non-explicit recorded event
"groups", showing multiple columns, one for each event, just like
when dealing with explicit event groups (those enclosed with {})
(Jin Yao)
- Record min/max LBR cycles (>= Skylake) and add 'perf annotate' TUI
hotkey to show it (c) (Jin Yao)
perf bpf:
- Add infrastructure to help in writing eBPF C programs to be used
with '-e name.c' type events in tools such as 'record' and 'trace',
with headers for common constructs and an examples directory that
will get populated as we add more such helpers and the 'perf bpf'
(Arnaldo Carvalho de Melo)
perf stat:
- Display time in precision based on std deviation (Jiri Olsa)
- Add --table option to display time of each run (Jiri Olsa)
- Display length strings of each run for --table option (Jiri Olsa)
perf buildid-cache:
- Add --list and --purge-all options (Ravi Bangoria)
perf test:
- Let 'perf test list' display subtests (Hendrik Brueckner)
perf pti:
- Create extra kernel maps to help in decoding samples in x86 PTI
entry trampolines (Adrian Hunter)
- Copy x86 PTI entry trampoline sections in the kcore copy used for
annotation and intel_pt CPU traces decoding (Adrian Hunter)
... and a lot of other fixes, enhancements and cleanups I did not
list, see the shortlog and git log for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
perf/x86/intel/uncore: Clean up client IMC uncore
perf/x86/intel/uncore: Expose uncore_pmu_event*() functions
perf/x86/intel/uncore: Support IIO free-running counters on SKX
perf/x86/intel/uncore: Add infrastructure for free running counters
perf/x86/intel/uncore: Add new data structures for free running counters
perf/x86/intel/uncore: Correct fixed counter index check in generic code
perf/x86/intel/uncore: Correct fixed counter index check for NHM
perf/x86/intel/uncore: Introduce customized event_read() for client IMC uncore
perf/x86: Store user space frame-pointer value on a sample
perf/core: Wire up compat PERF_EVENT_IOC_QUERY_BPF, PERF_EVENT_IOC_MODIFY_ATTRIBUTES
perf/core: Fix bad use of igrab()
perf/core: Fix group scheduling with mixed hw and sw events
perf kcore_copy: Amend the offset of sections that remap kernel text
perf kcore_copy: Copy x86 PTI entry trampoline sections
perf kcore_copy: Get rid of kernel_map
perf kcore_copy: Iterate phdrs
perf kcore_copy: Layout sections
perf kcore_copy: Calculate offset from phnum
perf kcore_copy: Keep a count of phdrs
perf kcore_copy: Keep phdr data in a list
...
Linus Torvalds [Tue, 5 Jun 2018 00:12:50 +0000 (17:12 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Ingo Molnar:
"Leftover perf tooling fixes from the v4.17 cycle: they sync up updated
ABI headers with their tooling versions"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools intel-pt-decoder: Update insn.h from the kernel sources
tools headers: Sync x86 cpufeatures.h with the kernel sources
tools headers: Synchronize prctl.h ABI header
perf trace beauty prctl: Default header_dir to cwd to work without parms
Linus Torvalds [Mon, 4 Jun 2018 23:40:11 +0000 (16:40 -0700)]
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Lots of tidying up changes all across the map for Linux's formal
memory/locking-model tooling, by Alan Stern, Akira Yokosawa, Andrea
Parri, Paul E. McKenney and SeongJae Park.
Notable changes beyond an overall update in the tooling itself is the
tidying up of spin_is_locked() semantics, which spills over into the
kernel proper as well.
- qspinlock improvements: the locking algorithm now guarantees forward
progress whereas the previous implementation in mainline could starve
threads indefinitely in cmpxchg() loops. Also other related cleanups
to the qspinlock code (Will Deacon)
- misc smaller improvements, cleanups and fixes all across the locking
subsystem
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
locking/rwsem: Simplify the is-owner-spinnable checks
tools/memory-model: Add reference for 'Simplifying ARM concurrency'
tools/memory-model: Update ASPLOS information
MAINTAINERS, tools/memory-model: Update e-mail address for Andrea Parri
tools/memory-model: Fix coding style in 'lock.cat'
tools/memory-model: Remove out-of-date comments and code from lock.cat
tools/memory-model: Improve mixed-access checking in lock.cat
tools/memory-model: Improve comments in lock.cat
tools/memory-model: Remove duplicated code from lock.cat
tools/memory-model: Flag "cumulativity" and "propagation" tests
tools/memory-model: Add model support for spin_is_locked()
tools/memory-model: Add scripts to test memory model
tools/memory-model: Fix coding style in 'linux-kernel.def'
tools/memory-model: Model 'smp_store_mb()'
tools/memory-order: Update the cheat-sheet to show that smp_mb__after_atomic() orders later RMW operations
tools/memory-order: Improve key for SELF and SV
tools/memory-model: Fix cheat sheet typo
tools/memory-model: Update required version of herdtools7
tools/memory-model: Redefine rb in terms of rcu-fence
tools/memory-model: Rename link and rcu-path to rcu-link and rb
...
Linus Torvalds [Mon, 4 Jun 2018 23:31:06 +0000 (16:31 -0700)]
Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
- decode x86 CPER data (Yazen Ghannam)
- ignore unrealistically large option ROMs (Hans de Goede)
- initialize UEFI secure boot state during Xen dom0 boot (Daniel Kiper)
- additional minor tweaks and fixes.
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/capsule-loader: Don't output reset log when reset flags are not set
efi/x86: Ignore unrealistically large option ROMs
efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function
efi: Align efi_pci_io_protocol typedefs to type naming convention
efi/libstub/tpm: Make function efi_retrieve_tpm2_eventlog_1_2() static
efi: Decode IA32/X64 Context Info structure
efi: Decode IA32/X64 MS Check structure
efi: Decode additional IA32/X64 Bus Check fields
efi: Decode IA32/X64 Cache, TLB, and Bus Check structures
efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs
efi: Decode IA32/X64 Processor Error Info Structure
efi: Decode IA32/X64 Processor Error Section
efi: Fix IA32/X64 Processor Error Record definition
efi/cper: Remove the INDENT_SP silliness
x86/xen/efi: Initialize UEFI secure boot state during dom0 boot
Linus Torvalds [Mon, 4 Jun 2018 22:54:04 +0000 (15:54 -0700)]
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
- updates to the handling of expedited grace periods
- updates to reduce lock contention in the rcu_node combining tree
[ These are in preparation for the consolidation of RCU-bh,
RCU-preempt, and RCU-sched into a single flavor, which was
requested by Linus in response to a security flaw whose root cause
included confusion between the multiple flavors of RCU ]
- torture-test updates that save their users some time and effort
- miscellaneous fixes
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
rcu/x86: Provide early rcu_cpu_starting() callback
torture: Make kvm-find-errors.sh find build warnings
rcutorture: Abbreviate kvm.sh summary lines
rcutorture: Print end-of-test state in kvm.sh summary
rcutorture: Print end-of-test state
torture: Fold parse-torture.sh into parse-console.sh
torture: Add a script to edit output from failed runs
rcu: Update list of rcu_future_grace_period() trace events
rcu: Drop early GP request check from rcu_gp_kthread()
rcu: Simplify and inline cpu_needs_another_gp()
rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp()
rcu: Make rcu_start_this_gp() check for out-of-range requests
rcu: Add funnel locking to rcu_start_this_gp()
rcu: Make rcu_start_future_gp() caller select grace period
rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp()
rcu: Clear request other than RCU_GP_FLAG_INIT at GP end
rcu: Cleanup, don't put ->completed into an int
rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs()
rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition
rcu: Make rcu_migrate_callbacks wake GP kthread when needed
...
Linus Torvalds [Mon, 4 Jun 2018 22:50:22 +0000 (15:50 -0700)]
Merge tag 'm68k-for-v4.18-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- a few time-related fixes:
- off-by-one calendar month on some classes of machines
- Y2038 preparation
- build fix for ndelay() being called with a 64-bit type
- revive 64-bit get_user(), which is used by some Android code
- defconfig updates
- fix for a long-standing fatal bug in iounmap() on '020/030, which was
actually fixed in 2.4.23, but never in 2.5.x and later
- default DMA mask to avoid warning splats
- minor fixes and cleanups
* tag 'm68k-for-v4.18-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Set default dma mask for platform devices
m68k/mm: Adjust VM area to be unmapped by gap size for __iounmap()
m68k/defconfig: Update defconfigs for v4.17-rc3
m68k/uaccess: Revive 64-bit get_user()
m68k: Implement ndelay() as an inline function to force type checking/casting
zorro: Add a blank line after declarations
m68k: Use read_persistent_clock64() consistently
m68k: Fix off-by-one calendar month
m68k: Fix style, spelling, and grammar in siginfo_build_tests()
m68k/mac: Fix SWIM memory resource end address
Linus Torvalds [Mon, 4 Jun 2018 22:23:48 +0000 (15:23 -0700)]
Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo updates from Eric Biederman:
"This set of changes close the known issues with setting si_code to an
invalid value, and with not fully initializing struct siginfo. There
remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64
and x86 to get the code that generates siginfo into a simpler and more
maintainable state. Most of that work involves refactoring the signal
handling code and thus careful code review.
Also not included is the work to shrink the in kernel version of
struct siginfo. That depends on getting the number of places that
directly manipulate struct siginfo under control, as it requires the
introduction of struct kernel_siginfo for the in kernel things.
Overall this set of changes looks like it is making good progress, and
with a little luck I will be wrapping up the siginfo work next
development cycle"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
signal/sh: Stop gcc warning about an impossible case in do_divide_error
signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions
signal/um: More carefully relay signals in relay_signal.
signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR}
signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code
signal/signalfd: Add support for SIGSYS
signal/signalfd: Remove __put_user from signalfd_copyinfo
signal/xtensa: Use force_sig_fault where appropriate
signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
signal/um: Use force_sig_fault where appropriate
signal/sparc: Use force_sig_fault where appropriate
signal/sparc: Use send_sig_fault where appropriate
signal/sh: Use force_sig_fault where appropriate
signal/s390: Use force_sig_fault where appropriate
signal/riscv: Replace do_trap_siginfo with force_sig_fault
signal/riscv: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_mceerr where appropriate
signal/openrisc: Use force_sig_fault where appropriate
signal/nios2: Use force_sig_fault where appropriate
...
Linus Torvalds [Mon, 4 Jun 2018 22:21:19 +0000 (15:21 -0700)]
Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns updates from Eric Biederman:
"This is the last couple of vfs bits to enable root in a user namespace
to mount and manipulate a filesystem with backing store (AKA not a
virtual filesystem like proc, but a filesystem where the unprivileged
user controls the content). The target filesystem for this work is
fuse, and Miklos should be sending you the pull request for the fuse
bits this merge window.
The two key patches are "evm: Don't update hmacs in user ns mounts"
and "vfs: Don't allow changing the link count of an inode with an
invalid uid or gid". Those close small gaps in the vfs that would be a
problem if an unprivileged fuse filesystem is mounted.
The rest of the changes are things that are now safe to allow a root
user in a user namespace to do with a filesystem they have mounted.
The most interesting development is that remount is now safe"
* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
capabilities: Allow privileged user in s_user_ns to set security.* xattrs
fs: Allow superblock owner to access do_remount_sb()
fs: Allow superblock owner to replace invalid owners of inodes
vfs: Allow userns root to call mknod on owned filesystems.
vfs: Don't allow changing the link count of an inode with an invalid uid or gid
evm: Don't update hmacs in user ns mounts
Linus Torvalds [Mon, 4 Jun 2018 21:42:46 +0000 (14:42 -0700)]
Merge tag '4.18-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
- smb3 fixes for stable
- addition of ftrace hooks for cifs.ko
- improvements in compounding and smbdirect (rdma)
* tag '4.18-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (38 commits)
CIFS: Add support for direct pages in wdata
CIFS: Use offset when reading pages
CIFS: Add support for direct pages in rdata
cifs: update multiplex loop to handle compounded responses
cifs: remove header_preamble_size where it is always 0
cifs: remove struct smb2_hdr
CIFS: 511c54a2f69195b28afb9dd119f03787b1625bb4 adds a check for session expiry, status STATUS_NETWORK_SESSION_EXPIRED, however the server can also respond with STATUS_USER_SESSION_DELETED in cases where the session has been idle for some time and the server reaps the session to recover resources.
cifs: change smb2_get_data_area_len to take a smb2_sync_hdr as argument
cifs: update smb2_calc_size to use smb2_sync_hdr instead of smb2_hdr
cifs: remove struct smb2_oplock_break_rsp
cifs: remove rfc1002 header from all SMB2 response structures
smb3: on reconnect set PreviousSessionId field
smb3: Add posix create context for smb3.11 posix mounts
smb3: add tracepoints for smb2/smb3 open
cifs: add debug output to show nocase mount option
smb3: add define for id for posix create context and corresponding struct
cifs: update smb2_check_message to handle PDUs without a 4 byte length header
smb3: allow "posix" mount option to enable new SMB311 protocol extensions
smb3: add support for posix negotiate context
cifs: allow disabling less secure legacy dialects
...
Linus Torvalds [Mon, 4 Jun 2018 21:36:38 +0000 (14:36 -0700)]
Merge tag 'gfs2-4.18.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
"We've got nine more patches for this merge window.
- remove sd_jheightsize to greatly simplify some code (Andreas
Gruenbacher)
- fix some comments (Andreas)
- fix a glock recursion bug when allocation errors occur (Andreas)
- improve the hole_size function so it returns the entire hole rather
than figuring it out piecemeal (Andreas)
- clean up gfs2_stuffed_write_end to remove a lot of redundancy
(Andreas)
- clarify code with regard to the way ordered writes are processed
(Andreas)
- a bunch of improvements and cleanups of the iomap code to pave the
way for iomap writes, which is a future patch set (Andreas)
- fix a bug where block reservations can run off the end of a bitmap
(Bob Peterson)
- add Andreas to the MAINTAINERS file (Bob Peterson)"
* tag 'gfs2-4.18.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
MAINTAINERS: Add Andreas Gruenbacher as a maintainer for gfs2
gfs2: Iomap cleanups and improvements
gfs2: Remove ordered write mode handling from gfs2_trans_add_data
gfs2: gfs2_stuffed_write_end cleanup
gfs2: hole_size improvement
GFS2: gfs2_free_extlen can return an extent that is too long
GFS2: Fix allocation error bug with recursive rgrp glocking
gfs2: Update find_metapath comment
gfs2: Remove sdp->sd_jheightsize
Linus Torvalds [Mon, 4 Jun 2018 21:34:06 +0000 (14:34 -0700)]
Merge tag 'dlm-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"These three commits fix and clean up the flags dlm was using on its
SCTP sockets. This improves performance and fixes some bad connection
delays"
* tag 'dlm-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: remove O_NONBLOCK flag in sctp_connect_to_sock
dlm: make sctp_connect_to_sock() return in specified time
dlm: fix a clerical error when set SCTP_NODELAY
Linus Torvalds [Mon, 4 Jun 2018 21:29:13 +0000 (14:29 -0700)]
Merge tag 'for-4.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"User visible features:
- added support for the ioctl FS_IOC_FSGETXATTR, per-inode flags,
successor of GET/SETFLAGS; now supports only existing flags:
append, immutable, noatime, nodump, sync
- 3 new unprivileged ioctls to allow users to enumerate subvolumes
- dedupe syscall implementation does not restrict the range to 16MiB,
though it still splits the whole range to 16MiB chunks
- on user demand, rmdir() is able to delete an empty subvolume,
export the capability in sysfs
- fix inode number types in tracepoints, other cleanups
- send: improved speed when dealing with a large removed directory,
measurements show decrease from 2000 minutes to 2 minutes on a
directory with 2 million entries
- pre-commit check of superblock to detect a mysterious in-memory
corruption
- log message updates
Other changes:
- orphan inode cleanup improved, does no keep long-standing
reservations that could lead up to early ENOSPC in some cases
- slight improvement of handling snapshotted NOCOW files by avoiding
some unnecessary tree searches
- avoid OOM when dealing with many unmergeable small extents at flush
time
- speedup conversion of free space tree representations from/to
bitmap/tree
- code refactoring, deletion, cleanups:
+ delayed refs
+ delayed iput
+ redundant argument removals
+ memory barrier cleanups
+ remove a redundant mutex supposedly excluding several ioctls to
run in parallel
- new tracepoints for blockgroup manipulation
- more sanity checks of compressed headers"
* tag 'for-4.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (183 commits)
btrfs: Add unprivileged version of ino_lookup ioctl
btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF
btrfs: Add unprivileged ioctl which returns subvolume information
Btrfs: clean up error handling in btrfs_truncate()
btrfs: Factor out write portion of btrfs_get_blocks_direct
btrfs: Factor out read portion of btrfs_get_blocks_direct
btrfs: return ENOMEM if path allocation fails in btrfs_cross_ref_exist
btrfs: raid56: Remove VLA usage
btrfs: return error value if create_io_em failed in cow_file_range
btrfs: drop useless member qgroup_reserved of btrfs_pending_snapshot
btrfs: drop unused parameter qgroup_reserved
btrfs: balance dirty metadata pages in btrfs_finish_ordered_io
btrfs: lift some btrfs_cross_ref_exist checks in nocow path
btrfs: Remove fs_info argument from btrfs_uuid_tree_rem
btrfs: Remove fs_info argument from btrfs_uuid_tree_add
Btrfs: remove unused check of skip_locking
Btrfs: remove always true check in unlock_up
Btrfs: grab write lock directly if write_lock_level is the max level
Btrfs: move get root out of btrfs_search_slot to a helper
Btrfs: use more straightforward extent_buffer_uptodate check
...
Linus Torvalds [Mon, 4 Jun 2018 20:57:43 +0000 (13:57 -0700)]
Merge branch 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull aio updates from Al Viro:
"Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.
The only thing I'm holding back for a day or so is Adam's aio ioprio -
his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
but let it sit in -next for decency sake..."
* 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
aio: sanitize the limit checking in io_submit(2)
aio: fold do_io_submit() into callers
aio: shift copyin of iocb into io_submit_one()
aio_read_events_ring(): make a bit more readable
aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
aio: take list removal to (some) callers of aio_complete()
aio: add missing break for the IOCB_CMD_FDSYNC case
random: convert to ->poll_mask
timerfd: convert to ->poll_mask
eventfd: switch to ->poll_mask
pipe: convert to ->poll_mask
crypto: af_alg: convert to ->poll_mask
net/rxrpc: convert to ->poll_mask
net/iucv: convert to ->poll_mask
net/phonet: convert to ->poll_mask
net/nfc: convert to ->poll_mask
net/caif: convert to ->poll_mask
net/bluetooth: convert to ->poll_mask
net/sctp: convert to ->poll_mask
net/tipc: convert to ->poll_mask
...
Linus Torvalds [Mon, 4 Jun 2018 20:46:22 +0000 (13:46 -0700)]
Merge branch 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache lookup cleanups from Al Viro:
"Cleaning ->lookup() instances up - mostly d_splice_alias() conversions"
* 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (29 commits)
switch the rest of procfs lookups to d_splice_alias()
procfs: switch instantiate_t to d_splice_alias()
don't bother with tid_fd_revalidate() in lookups
proc_lookupfd_common(): don't bother with instantiate unless the file is open
procfs: get rid of ancient BS in pid_revalidate() uses
cifs_lookup(): switch to d_splice_alias()
cifs_lookup(): cifs_get_inode_...() never returns 0 with *inode left NULL
9p: unify paths in v9fs_vfs_lookup()
ncp_lookup(): use d_splice_alias()
hfsplus: switch to d_splice_alias()
hfs: don't allow mounting over .../rsrc
hfs: use d_splice_alias()
omfs_lookup(): report IO errors, use d_splice_alias()
orangefs_lookup: simplify
openpromfs: switch to d_splice_alias()
xfs_vn_lookup: simplify a bit
adfs_lookup: do not fail with ENOENT on negatives, use d_splice_alias()
adfs_lookup_byname: .. *is* taken care of in fs/namei.c
romfs_lookup: switch to d_splice_alias()
qnx6_lookup: switch to d_splice_alias()
...
Linus Torvalds [Mon, 4 Jun 2018 20:05:02 +0000 (13:05 -0700)]
Merge tag 'locks-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull fasync fix from Jeff Layton:
"Just a single fix for a deadlock in the fasync handling code that
Kirill observed while testing.
The fix is to change the fa_lock to be rwlock_t, and use a read lock
in kill_fasync_rcu"
* tag 'locks-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fasync: Fix deadlock between task-context and interrupt-context kill_fasync()
Linus Torvalds [Mon, 4 Jun 2018 19:34:27 +0000 (12:34 -0700)]
Merge tag 'docs-4.18' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"There's been a fair amount of work in the docs tree this time around,
including:
- Extensive RST conversions and organizational work in the
memory-management docs thanks to Mike Rapoport.
- An update of Documentation/features from Andrea Parri and a script
to keep it updated.
- Various LICENSES updates from Thomas, along with a script to check
SPDX tags.
- Work to fix dangling references to documentation files; this
involved a fair number of one-liner comment changes outside of
Documentation/
... and the usual list of documentation improvements, typo fixes, etc"
* tag 'docs-4.18' of git://git.lwn.net/linux: (103 commits)
Documentation: document hung_task_panic kernel parameter
docs/admin-guide/mm: add high level concepts overview
docs/vm: move ksm and transhuge from "user" to "internals" section.
docs: Use the kerneldoc comments for memalloc_no*()
doc: document scope NOFS, NOIO APIs
docs: update kernel versions and dates in tables
docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge
docs/vm: transhuge: minor updates
docs/vm: transhuge: change sections order
Documentation: arm: clean up Marvell Berlin family info
Documentation: gpio: driver: Fix a typo and some odd grammar
docs: ranoops.rst: fix location of ramoops.txt
scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode
docs: uio-howto.rst: use a code block to solve a warning
mm, THP, doc: Add document for thp_swpout/thp_swpout_fallback
w1: w1_io.c: fix a kernel-doc warning
Documentation/process/posting: wrap text at 80 cols
docs: admin-guide: add cgroup-v2 documentation
Revert "Documentation/features/vm: Remove arch support status file for 'pte_special'"
Documentation: refcount-vs-atomic: Update reference to LKMM doc.
...
Linus Torvalds [Mon, 4 Jun 2018 19:01:15 +0000 (12:01 -0700)]
swait: strengthen language to discourage use
We already earlier discouraged people from using this interface in
commit 88796e7e5c45 ("sched/swait: Document it clearly that the swait
facilities are special and shouldn't be used"), but I just got a pull
request with a new broken user.
So make the comment *really* clear.
The swait interfaces are bad, and should not be used unless you have
some *very* strong reasons that include tons of hard performance numbers
on just why you want to use them, and you show that you actually
understand that they aren't at all like the normal wait/wakeup
interfaces.
So far, every single user has been suspect. The main user is KVM, which
is completely pointless (there is only ever one waiter, which avoids the
interface subtleties, but also means that having a queue instead of a
pointer is counter-productive and certainly not an "optimization").
So make the comments much stronger.
Not that anybody likely reads them anyway, but there's always some
slight hope that it will cause somebody to think twice.
I'd like to remove this interface entirely, but there is the theoretical
possibility that it's actually the right thing to use in some situation,
most likely some deep RT use.
Linus Torvalds [Mon, 4 Jun 2018 18:38:16 +0000 (11:38 -0700)]
Merge tag 'regmap-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"This is another quiet release for regmap, there's one minor feature
improvement for the recently added slimbus support and a few minor
fixes and cleanups"
* tag 'regmap-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: slimbus: allow register offsets up to 16 bits
regmap: add missing prototype for devm_init_slimbus
regmap: Skip clk_put for attached clocks when freeing context
regmap: include <linux/ktime.h> from include/linux/regmap.h
Linus Torvalds [Mon, 4 Jun 2018 18:34:06 +0000 (11:34 -0700)]
Merge tag 'spi-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"Quite a busy release for SPI, mainly as a result of Boris Brezillon's
work on improving the integration with MTD for accelerated SPI flash
controllers. He's added a new spi_mem interface which works a lot
better with general hardware and converted the users over to it, as a
result of this work we've got some MTD changes in here as well.
Other highlights include:
- Lots of spring cleaning for the s3c64xx driver.
- Removal of the bcm53xx, the hardware is also supported by the mspi
driver but SoC naming had caused people to miss the duplication.
- Conversion of the pxa2xx driver to use the standard message
processing loop rather than open coding.
- A bunch of improvements to the runtime PM of the OMAP McSPI driver"
* tag 'spi-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (47 commits)
spi: Fix typo on SPI_MEM help text
spi: sh-msiof: Fix setting SIRMDR1.SYNCAC to match SITMDR1.SYNCAC
mtd: devices: m25p80: Use spi_mem_set_drvdata() instead of spi_set_drvdata()
spi: omap2-mcspi: Remove unnecessary pm_runtime_force_suspend()
spi: Add missing pm_runtime_put_noidle() after failed get
spi: ti-qspi: Make sure res_mmap != NULL before dereferencing it
spi: spi-s3c64xx: Fix system resume support
spi: bcm-qspi: Fix build failure caused by spi_flash_read() API removal
spi: Get rid of the spi_flash_read() API
mtd: spi-nor: Use the spi_mem_xx() API
spi: ti-qspi: Implement the spi_mem interface
spi: bcm-qspi: Implement the spi_mem interface
spi: Make support for regular transfers optional when ->mem_ops != NULL
spi: Extend the core to ease integration of SPI memory controllers
spi: remove forgotten CONFIG_SPI_BCM53XX
spi: remove the older/duplicated bcm53xx driver
spi: pxa2xx: check clk_prepare_enable() return value
spi: lpspi: Switch to SPDX identifier
spi: mxs: Switch to SPDX identifier
spi: imx: Switch to SPDX identifier
...
Linus Torvalds [Mon, 4 Jun 2018 18:28:58 +0000 (11:28 -0700)]
Merge tag 'chrome-platform-for-linus-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform
Pull chrome platform updates from Benson Leung:
- further changes from Dmitry related to the removal of platform data
from atmel_mxt_ts and chromeos_laptop.
This time, we have some changes that teach chromeos_laptop how to
supply acpi properties for some input devices so that the peripheral
driver doesn't have to do dmi matching on some Chromebook platforms.
- new Chromebook Tablet switch driver, which is useful for x86
convertible Chromebooks.
- other misc cleanup
* tag 'chrome-platform-for-linus-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform:
platform/chrome: Use to_cros_ec_dev more broadly
platform/chrome: chromeos_laptop: fix touchpad button mapping on Celes
platform: chrome: Add input dependency for tablet switch driver
platform/chrome: chromeos_laptop - supply properties for ACPI devices
platform/chrome: chromeos_tbmc - add SPDX identifier
platform: chrome: Add Tablet Switch ACPI driver
platform/chrome: cros_ec_lpc: do not try DMI match when ACPI device found
Linus Torvalds [Mon, 4 Jun 2018 18:25:15 +0000 (11:25 -0700)]
Merge tag 'hwmon-for-linus-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
- asus_atk0110 driver modified to use new API
- k10temp supports new CPUs and reports both Tctl and Tdie
- minor fixes in gpio-fan, ltc2990, fschmd, and mc13783 drivers
* tag 'hwmon-for-linus-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (asus_atk0110) Make use of device managed memory
hwmon: (asus_atk0110) Replace deprecated device register call
hwmon: (k10temp) Make function get_raw_temp static
hwmon: (gpio-fan) Fix "#cooling-cells" property name in bindings
MAINTAINERS: hwmon: Add Documentation/devicetree/bindings/hwmon
hwmon: (ltc2990) support all measurement modes
hwmon: (ltc2990) add devicetree binding
hwmon: (ltc2990) Fix incorrect conversion of negative temperatures
hwmon: (core) check parent dev != NULL when chip != NULL
hwmon: (fschmd) fix typo 'can by' to 'can be'
hwmon: (k10temp) Display both Tctl and Tdie
hwmon: (k10temp) Add support for Stoney Ridge and Bristol Ridge CPUs
hwmon: MC13783: Add uid and die temperature sensor inputs
Jianchao Wang [Mon, 4 Jun 2018 09:03:55 +0000 (17:03 +0800)]
blk-mq: return when hctx is stopped in blk_mq_run_work_fn
If a hardware queue is stopped, it should not be run again before
explicitly started. Ignore stopped queues in blk_mq_run_work_fn(),
fixing a regression recently introduced when the START_ON_RUN bit
was removed.
Fixes: 15fe8a90bb45 ("blk-mq: remove blk_mq_delay_queue()") Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Mon, 4 Jun 2018 17:58:12 +0000 (10:58 -0700)]
Merge tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- replace the force_dma flag with a dma_configure bus method. (Nipun
Gupta, although one patch is іncorrectly attributed to me due to a
git rebase bug)
- use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)
- remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
right thing for bounce buffering.
- move dma-debug initialization to common code, and apply a few
cleanups to the dma-debug code.
- cleanup the Kconfig mess around swiotlb selection
- swiotlb comment fixup (Yisheng Xie)
- a trivial swiotlb fix. (Dan Carpenter)
- support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)
- add a new generic dma-noncoherent dma_map_ops implementation and use
it for arc, c6x and nds32.
- improve scatterlist validity checking in dma-debug. (Robin Murphy)
- add a struct device quirk to limit the dma-mask to 32-bit due to
bridge/system issues, and switch x86 to use it instead of a local
hack for VIA bridges.
- handle devices without a dma_mask more gracefully in the dma-direct
code.
* tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits)
dma-direct: don't crash on device without dma_mask
nds32: use generic dma_noncoherent_ops
nds32: implement the unmap_sg DMA operation
nds32: consolidate DMA cache maintainance routines
x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag
x86/pci-dma: remove the explicit nodac and allowdac option
x86/pci-dma: remove the experimental forcesac boot option
Documentation/x86: remove a stray reference to pci-nommu.c
core, dma-direct: add a flag 32-bit dma limits
dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs
dma-debug: check scatterlist segments
c6x: use generic dma_noncoherent_ops
arc: use generic dma_noncoherent_ops
arc: fix arc_dma_{map,unmap}_page
arc: fix arc_dma_sync_sg_for_{cpu,device}
arc: simplify arc_dma_sync_single_for_{cpu,device}
dma-mapping: provide a generic dma-noncoherent implementation
dma-mapping: simplify Kconfig dependencies
riscv: add swiotlb support
riscv: only enable ZONE_DMA32 for 64-bit
...
Linus Torvalds [Mon, 4 Jun 2018 17:14:28 +0000 (10:14 -0700)]
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Misc bits and pieces not fitting into anything more specific"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: delete unnecessary assignment in vfs_listxattr
Documentation: filesystems: update filesystem locking documentation
vfs: namei: use path_equal() in follow_dotdot()
fs.h: fix outdated comment about file flags
__inode_security_revalidate() never gets NULL opt_dentry
make xattr_getsecurity() static
vfat: simplify checks in vfat_lookup()
get rid of dead code in d_find_alias()
it's SB_BORN, not MS_BORN...
msdos_rmdir(): kill BS comment
remove rpc_rmdir()
fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()
Linus Torvalds [Mon, 4 Jun 2018 16:53:33 +0000 (09:53 -0700)]
Merge branch 'work.rmdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull rmdir update from Al Viro:
"More shrink_dcache_parent()-related stuff - killing the main source of
potentially contended calls of that on large subtrees"
* 'work.rmdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
rmdir(),rename(): do shrink_dcache_parent() only on success
Linus Torvalds [Mon, 4 Jun 2018 15:57:36 +0000 (08:57 -0700)]
Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache updates from Al Viro:
"This is the first part of dealing with livelocks etc around
shrink_dcache_parent()."
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
restore cond_resched() in shrink_dcache_parent()
dput(): turn into explicit while() loop
dcache: move cond_resched() into the end of __dentry_kill()
d_walk(): kill 'finish' callback
d_invalidate(): unhash immediately
- prep work for bcachefs and some block layer optimizations (Kent).
- convert users of bio_sets to using embedded structs (Kent).
- fixes for the BFQ io scheduler (Paolo/Davide/Filippo)
- lightnvm fixes and improvements (Matias, with contributions from Hans
and Javier)
- adding discard throttling to blk-wbt (me)
- sbitmap blk-mq-tag handling (me/Omar/Ming).
- remove the sparc jsflash block driver, acked by DaveM.
- Kyber scheduler improvement from Jianchao, making it more friendly
wrt merging.
- conversion of symbolic proc permissions to octal, from Joe Perches.
Previously the block parts were a mix of both.
- nbd fixes (Josef and Kevin Vigor)
- unify how we handle the various kinds of timestamps that the block
core and utility code uses (Omar)
- three NVMe pull requests from Keith and Christoph, bringing AEN to
feature completeness, file backed namespaces, cq/sq lock split, and
various fixes
- various little fixes and improvements all over the map
* tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits)
blk-mq: update nr_requests when switching to 'none' scheduler
block: don't use blocking queue entered for recursive bio submits
dm-crypt: fix warning in shutdown path
lightnvm: pblk: take bitmap alloc. out of critical section
lightnvm: pblk: kick writer on new flush points
lightnvm: pblk: only try to recover lines with written smeta
lightnvm: pblk: remove unnecessary bio_get/put
lightnvm: pblk: add possibility to set write buffer size manually
lightnvm: fix partial read error path
lightnvm: proper error handling for pblk_bio_add_pages
lightnvm: pblk: fix smeta write error path
lightnvm: pblk: garbage collect lines with failed writes
lightnvm: pblk: rework write error recovery path
lightnvm: pblk: remove dead function
lightnvm: pass flag on graceful teardown to targets
lightnvm: pblk: check for chunk size before allocating it
lightnvm: pblk: remove unnecessary argument
lightnvm: pblk: remove unnecessary indirection
lightnvm: pblk: return NVM_ error on failed submission
lightnvm: pblk: warn in case of corrupted write buffer
...
Clean up gfs2_iomap_alloc and gfs2_iomap_get. Document how
gfs2_iomap_alloc works: it now needs to be called separately after
gfs2_iomap_get where necessary; this will be used later by iomap write.
Move gfs2_iomap_ops into bmap.c.
Introduce a new gfs2_iomap_get_alloc helper and use it in
fallocate_chunk: gfs2_iomap_begin will become unsuitable for fallocate
with proper iomap write support.
In gfs2_block_map and fallocate_chunk, zero-initialize struct iomap.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
gfs2: Remove ordered write mode handling from gfs2_trans_add_data
In journaled data mode, we need to add each buffer head to the current
transaction. In ordered write mode, we only need to add the inode to
the ordered inode list. So far, both cases are handled in
gfs2_trans_add_data. This makes the code look misleading and is
inefficient for small block sizes as well. Handle both cases separately
instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reimplement function hole_size based on a generic function for walking
the metadata tree and rename hole_size to gfs2_hole_size. While
previously, multiple invocations of hole_size were sometimes needed to
walk across the entire hole, the new implementation always returns the
entire hole at once (provided that the caller is interested in the total
size).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Bob Peterson [Sat, 2 Jun 2018 03:09:50 +0000 (22:09 -0500)]
GFS2: gfs2_free_extlen can return an extent that is too long
Function gfs2_free_extlen calculates the length of an extent of
free blocks that may be reserved. The end pointer was calculated as
end = start + bh->b_size but b_size is incorrect because the
bitmap usually stops prior to the end of the buffer data on
the last bitmap.
What this means is that when you do a write, you can reserve a
chunk of blocks that runs off the end of the last bitmap. For
example, I've got a file system where there is only one bitmap
for each rgrp, so ri_length==1. I saw cases in which iozone
tried to do a big write, grabbed a large block reservation,
chose rgrp 5464152, which has ri_data0 5464153 and ri_data 8188.
So 5464153 + 8188 = 5472341 which is the end of the rgrp.
When it grabbed a reservation it got back: 5470936, length 7229.
But 5470936 + 7229 = 5478165. So the reservation starts inside
the rgrp but runs 5824 blocks past the end of the bitmap.
This patch fixes the calculation so it won't exceed the last
bitmap. It also adds a BUG_ON to guard against overflows in the
future.
GFS2: Fix allocation error bug with recursive rgrp glocking
Before this patch function gfs2_write_begin, upon discovering an
error, called gfs2_trim_blocks while the rgrp glock was still held.
That's because gfs2_inplace_release is not called until later.
This patch reorganizes the logic a bit so gfs2_inplace_release
is called to release the lock prior to the call to gfs2_trim_blocks,
thus preventing the glock recursion.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
* pm-cpufreq: (25 commits)
dt-bindings: cpufreq: Document operating-points-v2-kryo-cpu
cpufreq: Add Kryo CPU scaling driver
cpufreq: Use static SRCU initializer
kernel/SRCU: provide a static initializer
cpufreq: Fix new policy initialization during limits updates via sysfs
cpufreq: tegra20: Wrap cpufreq into platform driver
cpufreq: tegra20: Allow cpufreq driver to be built as loadable module
cpufreq: tegra20: Check if this is Tegra20 machine
cpufreq: tegra20: Remove unneeded variable initialization
cpufreq: tegra20: Remove unnecessary parentheses
cpufreq: tegra20: Remove unneeded check in tegra_cpu_init
cpufreq: tegra20: Release clocks properly
cpufreq: tegra20: Remove EMC clock usage
cpufreq: tegra20: Clean up included headers
cpufreq: tegra20: Clean up whitespaces in the code
cpufreq: tegra20: Change module description
Revert "cpufreq: rcar: Add support for R8A7795 SoC"
Revert "cpufreq: dt: Add r8a7796 support to to use generic cpufreq driver"
cpufreq: intel_pstate: allow trace in passive mode
cpufreq: optimize cpufreq_notify_transition()
...
* pm-opp: (24 commits)
PM / Domains: Drop unused parameter in genpd_allocate_dev_data()
PM / Domains: Drop genpd as in-param for pm_genpd_remove_device()
PM / Domains: Drop __pm_genpd_add_device()
PM / Domains: Drop extern declarations of functions in pm_domain.h
PM / domains: Add perf_state attribute to genpd debugfs
OPP: Allow same OPP table to be used for multiple genpd
PM / Domain: Return 0 on error from of_genpd_opp_to_performance_state()
PM / OPP: Fix shared OPP table support in dev_pm_opp_register_set_opp_helper()
PM / OPP: Fix shared OPP table support in dev_pm_opp_set_regulators()
PM / OPP: Fix shared OPP table support in dev_pm_opp_set_prop_name()
PM / OPP: Fix shared OPP table support in dev_pm_opp_set_supported_hw()
PM / OPP: silence an uninitialized variable warning
PM / OPP: Remove dev_pm_opp_{un}register_get_pstate_helper()
PM / OPP: Get performance state using genpd helper
PM / Domain: Implement of_genpd_opp_to_performance_state()
PM / Domain: Add support to parse domain's OPP table
PM / Domain: Add struct device to genpd
PM / OPP: Implement dev_pm_opp_get_of_node()
PM / OPP: Implement of_dev_pm_opp_find_required_opp()
PM / OPP: Implement dev_pm_opp_of_add_table_indexed()
...
* pm-domains:
PM / domains: Improve wording of dev_pm_domain_attach() comment
PM / Domains: Don't return -EEXIST at attach when PM domain exists
spi: Respect all error codes from dev_pm_domain_attach()
soundwire: Respect all error codes from dev_pm_domain_attach()
mmc: sdio: Respect all error codes from dev_pm_domain_attach()
i2c: Respect all error codes from dev_pm_domain_attach()
driver core: Respect all error codes from dev_pm_domain_attach()
amba: Respect all error codes from dev_pm_domain_attach()
PM / Domains: Allow a better error handling of dev_pm_domain_attach()
PM / Domains: Check for existing PM domain in dev_pm_domain_attach()
PM / Domains: Drop redundant code in genpd while attaching devices
PM / Domains: Drop comment in genpd about legacy Samsung DT binding
PM / Domains: Fix error path during attach in genpd
* pm-qos:
PM / QoS: Drop redundant declaration of pm_qos_get_value()
* pm-core:
PM / runtime: Drop usage count for suppliers at device link removal
PM / runtime: Fixup reference counting of device link suppliers at probe
PM: wakeup: Use pr_debug() for the "aborting suspend" message
PM / core: Drop unused internal inline functions for sysfs
PM / core: Drop unused internal functions for pm_qos sysfs
PM / core: Drop unused internal inline functions for wakeirqs
PM / core: Drop internal unused inline functions for wakeups
PM / wakeup: Only update last time for active wakeup sources
PM / wakeup: Use seq_open() to show wakeup stats
PM / core: Use dev_printk() and symbols in suspend/resume diagnostics
PM / core: Simplify initcall_debug_report() timing
PM / core: Remove unused initcall_debug_report() arguments
PM / core: fix deferred probe breaking suspend resume order
Linus Torvalds [Sun, 3 Jun 2018 18:01:28 +0000 (11:01 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.
- fix io_destroy()/aio_complete() race
- the vfs_open() change to get rid of open_check_o_direct() boilerplate
was nice, but buggy. Al has a patch avoiding a revert, but that's
definitely not a last-day fodder, so for now revert it is...
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Revert "fs: fold open_check_o_direct into do_dentry_open"
fix io_destroy()/aio_complete() race
error = vfs_open(path, f, cred);
if (!error) {
/* from now on we need fput() to dispose of f */
error = open_check_o_direct(f);
if (error) {
fput(f);
f = ERR_PTR(error);
}
} else {
put_filp(f);
f = ERR_PTR(error);
}
and sure, having that open_check_o_direct() boilerplate gotten rid of is
nice, but not that way...
Worse, another call chain (via finish_open()) is FUBAR now wrt
FILE_OPENED handling - in that case we get error returned, with file
already hit by fput() *AND* FILE_OPENED not set. Guess what happens in
path_openat(), when it hits
if (!(opened & FILE_OPENED)) {
BUG_ON(!error);
put_filp(file);
}
The root cause of all that crap is that the callers of do_dentry_open()
have no way to tell which way did it fail; while that could be fixed up
(by passing something like int *opened to do_dentry_open() and have it
marked if we'd called ->open()), it's probably much too late in the
cycle to do so right now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Sun, 3 Jun 2018 17:11:38 +0000 (19:11 +0200)]
Merge tag 'perf-urgent-for-mingo-4.17-20180602' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Carvalho de Melo:
- Update prctl and cpufeatures.h tools/ copies with the kernel sources
originals, which makes 'perf trace' know about the new prctl options
for speculation control and silences the build warnings (Arnaldo Carvalho de Melo)
- Update insn.h in Intel-PT instruction decoder with its original from from the
kernel sources, to silence build warnings, no effect on the actual tools this
time around (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 3 Jun 2018 16:01:41 +0000 (09:01 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
- two patches addressing the problem that the scheduler allows under
certain conditions user space tasks to be scheduled on CPUs which are
not yet fully booted which causes a few subtle and hard to debug
issue
- add a missing runqueue clock update in the deadline scheduler which
triggers a warning under certain circumstances
- fix a silly typo in the scheduler header file
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/headers: Fix typo
sched/deadline: Fix missing clock update
sched/core: Require cpu_active() in select_task_rq(), for user tasks
sched/core: Fix rules for running on online && !active CPUs
Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat utility updates for v4.18 from Len Brown.
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (65 commits)
tools/power turbostat: update version number
tools/power turbostat: Add Node in output
tools/power turbostat: add node information into turbostat calculations
tools/power turbostat: remove num_ from cpu_topology struct
tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node
tools/power turbostat: track thread ID in cpu_topology
tools/power turbostat: Calculate additional node information for a package
tools/power turbostat: Fix node and siblings lookup data
tools/power turbostat: set max_num_cpus equal to the cpumask length
tools/power turbostat: if --num_iterations, print for specific number of iterations
tools/power turbostat: Add Cannon Lake support
tools/power turbostat: delete duplicate #defines
x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
tools/power turbostat: add POLL and POLL% column
tools/power turbostat: Fix --hide Pk%pc10
tools/power turbostat: Build-in "Low Power Idle" counters support
tools/power turbostat: Don't make man pages executable
tools/power turbostat: remove blank lines
tools/power turbostat: a small C-states dump readability immprovement
...
Ming Lei [Sat, 2 Jun 2018 07:18:09 +0000 (15:18 +0800)]
blk-mq: update nr_requests when switching to 'none' scheduler
Now we setup q->nr_requests when switching to one new scheduler,
but not do it for 'none', then q->nr_requests may not be correct
for 'none'.
This patch fixes this issue by always updating 'nr_requests' when
switching to 'none'.
Cc: Marco Patalano <mpatalan@redhat.com> Cc: "Ewan D. Milne" <emilne@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 2 Jun 2018 20:04:07 +0000 (14:04 -0600)]
block: don't use blocking queue entered for recursive bio submits
If we end up splitting a bio and the queue goes away between
the initial submission and the later split submission, then we
can block forever in blk_queue_enter() waiting for the reference
to drop to zero. This will never happen, since we already hold
a reference.
Mark a split bio as already having entered the queue, so we can
just use the live non-blocking queue enter variant.
Kent Overstreet [Sat, 2 Jun 2018 17:45:04 +0000 (13:45 -0400)]
dm-crypt: fix warning in shutdown path
The counter for the number of allocated pages includes pages in the
mempool's reserve, so checking that the number of allocated pages is 0
needs to happen after we exit the mempool.
Fixes: 6f1c819c219f ("dm: convert to bioset_init()/mempool_init()") Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Reported-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Mike Snitzer <snitzer@redhat.com>
Fixed to always just use percpu_counter_sum()
1) Infinite loop in _decode_session6(), from Eric Dumazet.
2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric
Dumazet.
3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux.
4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki
Makita.
5) Incorrect idr release in cls_flower, from Paul Blakey.
6) Probe error handling fix in davinci_emac, from Dan Carpenter.
7) Memory leak in XPS configuration, from Alexander Duyck.
8) Use after free with cloned sockets in kcm, from Kirill Tkhai.
9) MTU handling fixes fo ip_tunnel and ip6_tunnel, from Nicolas
Dichtel.
10) Fix UAPI hole in bpf data structure for 32-bit compat applications,
from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
bpf: fix uapi hole for 32 bit compat applications
net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
ip6_tunnel: remove magic mtu value 0xFFF8
ip_tunnel: restore binding to ifaces with a large mtu
net: dsa: b53: Add BCM5389 support
kcm: Fix use-after-free caused by clonned sockets
net-sysfs: Fix memory leak in XPS configuration
ixgbe: fix parsing of TC actions for HW offload
net: ethernet: davinci_emac: fix error handling in probe()
net/ncsi: Fix array size in dumpit handler
cls_flower: Fix incorrect idr release when failing to modify rule
net/sonic: Use dma_mapping_error()
xfrm Fix potential error pointer dereference in xfrm_bundle_create.
vhost_net: flush batched heads before trying to busy polling
tun: Fix NULL pointer dereference in XDP redirect
be2net: Fix error detection logic for BE3
net: qmi_wwan: Add Netgear Aircard 779S
mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
atm: zatm: fix memcmp casting
iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs
...
Long Li [Wed, 30 May 2018 19:47:56 +0000 (12:47 -0700)]
CIFS: Add support for direct pages in wdata
Add a function to allocate wdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages that
point to the data buffer to write to.
wdata is reponsible for free those pages after it's done.
Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
Long Li [Wed, 30 May 2018 19:47:54 +0000 (12:47 -0700)]
CIFS: Add support for direct pages in rdata
Add a function to allocate rdata without allocating pages for data
transfer. This gives the caller an option to pass a number of pages
that point to the data buffer.
rdata is still reponsible for free those pages after it's done.
Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>