git.proxmox.com Git - mirror_ubuntu-kernels.git/log

Merge tag 'pm-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"This time the majority of changes go to the cpufreq subsystem (and to
  the intel_pstate driver in particular) and there are some updates in
  the generic power domains framework, cpuidle, tools and a couple of
  other places.

  One thing worth mentioning is that the intel_pstate's sysfs interface
  has been reworked to be more consistent with the general expectations
  of the cpufreq core and less confusing, hopefully for the better.
  Also, we have a new cpufreq driver for Tegra186 and new hardware
  support in intel_pstata and the Mediatek cpufreq driver.

  Apart from that, the AnalyzeSuspend utility for system suspend
  profiling gets a companion called AnalyzeBoot for the analogous
  profiling of system boot and they both go into one place under
  tools/power/pm-graph/.

  The rest is mostly fixes, cleanups and code reorganization.

  Specifics:

   - Rework the intel_pstate driver's sysfs interface to make it more
     straightforward and more intuitive (Rafael Wysocki).

   - Make intel_pstate support all processors which advertise HWP
     (hardware-managed P-states) to the kernel in all operation modes
     and make it use the load-based P-state selection algorithm on a
     wider range of systems in the active mode (Rafael Wysocki).

   - Add cpufreq driver for Tegra186 (Mikko Perttunen).

   - Add support for Gemini Lake SoCs to intel_pstate (David Box).

   - Add support for MT8176 and MT817x to the Mediatek cpufreq driver
     and clean up that driver a bit (Daniel Kurtz).

   - Clean up intel_pstate and optimize it slightly (Rafael Wysocki).

   - Update the schedutil cpufreq governor, mostly to fix a couple of
     issues with it related to specific workloads, and rework its sysfs
     tunable and initialization a bit (Rafael Wysocki, Viresh Kumar).

   - Fix minor issues in the imx6q, dbx500 and qoriq cpufreq drivers
     (Christophe Jaillet, Irina Tirdea, Leonard Crestez, Viresh Kumar,
     YuanTian Tang).

   - Add file patterns for cpufreq DT bindings to MAINTAINERS (Geert
     Uytterhoeven).

   - Add support for "always on" power domains to the genpd (generic
     power domains) framework and clean up that code somewhat (Ulf
     Hansson, Lina Iyer, Viresh Kumar).

   - Fix minor issues in the powernv cpuidle driver and clean it up
     (Anton Blanchard, Gautham Shenoy).

   - Move the AnalyzeSuspend utility under tools/power/pm-graph/ and add
     an analogous boot-profiling utility called AnalyzeBoot to it (Todd
     Brandt).

   - Add rk3328 support to the rockchip-io AVS (Adaptive Voltage
     Scaling) driver (David Wu).

   - Fix minor issues in the cpuidle core, the intel_pstate_tracer
     utility, the devfreq framework and the PM core documentation
     (Chanwoo Choi, Doug Smythies, Johan Hovold, Marcin Nowakowski)"

* tag 'pm-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (56 commits)
  PM / runtime: Document autosuspend-helper side effects
  PM / runtime: Fix autosuspend documentation
  tools: power: pm-graph: Package makefile and man pages
  tools: power: pm-graph: AnalyzeBoot v2.0
  tools: power: pm-graph: AnalyzeSuspend v4.6
  cpufreq: Add Tegra186 cpufreq driver
  cpufreq: imx6q: Fix error handling code
  cpufreq: imx6q: Set max suspend_freq to avoid changes during suspend
  cpufreq: imx6q: Fix handling EPROBE_DEFER from regulator
  cpuidle: powernv: Avoid a branch in the core snooze_loop() loop
  cpuidle: powernv: Don't continually set thread priority in snooze_loop()
  cpuidle: powernv: Don't bounce between low and very low thread priority
  cpuidle: cpuidle-cps: remove unused variable
  tools/power/x86/intel_pstate_tracer: Adjust directory ownership
  cpufreq: schedutil: Use policy-dependent transition delays
  cpufreq: schedutil: Reduce frequencies slower
  PM / devfreq: Move struct devfreq_governor to devfreq directory
  PM / Domains: Ignore domain-idle-states that are not compatible
  cpufreq: intel_pstate: Add support for Gemini Lake
  powernv-cpuidle: Validate DT property array size
  ...

Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:
"Nothing major. Two notable fixes are Li's second stab at fixing the
  long-standing race condition in the mount path and suppression of
  spurious warning from cgroup_get(). All other changes are trivial"

* 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: mark cgroup_get() with __maybe_unused
  cgroup: avoid attaching a cgroup root to two different superblocks, take 2
  cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc()
  cgroup: move cgroup_subsys_state parent field for cache locality
  cpuset: Remove cpuset_update_active_cpus()'s parameter.
  cgroup: switch to BUG_ON()
  cgroup: drop duplicate header nsproxy.h
  kernel: convert css_set.refcount from atomic_t to refcount_t
  kernel: convert cgroup_namespace.count from atomic_t to refcount_t

Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue update from Tejun Heo:
"One trivial patch to use setup_deferrable_timer() instead of
open-coding the initialization"

* 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: use setup_deferrable_timer

Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata updates from Tejun Heo:
"The biggest core change is removal of SCT WRITE SAME support, which
  never worked properly.

  Other than that, trivial updates in core code and specific embedded
  driver updates"

* 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: remove SCT WRITE SAME support
  libata: reject passthrough WRITE SAME requests
  dt-bindings: ata: add DT bindings for ahci-dm816 SATA controller
  ata: ahci: add support for DaVinci DM816 SATA controller
  pata: remove the at91 driver
  libata: make ata_sg_clean static over again
  libata: use setup_deferrable_timer
  ata: allow subsystem to be used on m32r and s390 archs
  Delete redundant return value check of platform_get_resource()
  ata: constify of_device_id structures

Merge tag 'leds_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds

Pull LED updates from Jacek Anaszewski:
"New drivers:

   - add LED support for MT6323 PMIC

   - add LED support for Motorola CPCAP PMIC

  New features and improvements:

   - add LED trigger for all CPUs aggregated which is useful on tiny
     boards with more CPU cores than LED pins

   - add OF variants of LED registering functions as a preparation for
     adding generic support for Device Tree parsing

   - dell-led improvements and cleanups, followed by moving it to the
     x86 platform driver subsystem which is a more appropriate place for
     it

   - extend pca9532 Device Tree support by adding the LEDs
     'default-state' property

   - extend pca963x Device Tree support by adding nxp,inverted-out
     property for inverting the polarity of the output

   - remove ACPI support for lp3952 since it relied on a non-official
     ACPI IDs"

* tag 'leds_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds: pca9532: Extend pca9532 device tree support
  leds: cpcap: new driver
  mfd: cpcap: Add missing include dependencies
  leds: lp3952: Use 'if (ret)' pattern
  leds: lp3952: Remove ACPI support for lp3952
  leds: mt6323: Fix an off by one bug in probe
  dt-bindings: leds: Add document bindings for leds-mt6323
  leds: Add LED support for MT6323 PMIC
  leds: gpio: use OF variant of LED registering function
  leds: core: add OF variants of LED registering functions
  platform/x86: dell-wmi-led: fix coding style issues
  dell-led: move driver to drivers/platform/x86/dell-wmi-led.c
  dell-led: remove code related to mic mute LED
  platform/x86: dell-laptop: import dell_micmute_led_set() from drivers/leds/dell-led.c
  ALSA: hda - rename dell_led_set_func to dell_micmute_led_set_func
  ALSA: hda - use dell_micmute_led_set() instead of dell_app_wmi_led_set()
  dell-led: remove GUID check from dell_micmute_led_set()
  leds/trigger/cpu: Add LED trigger for all CPUs aggregated

Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration

Pull mailbox updates from Jassi Brar:

- new driver for Broadcom FlexRM controller

- constify data structures of callback functions in some drivers

- a few bug fixes uncovered by multi-threaded use of mailbox channels
   in blocking mode

* 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  mailbox: handle empty message in tx_tick
  mailbox: skip complete wait event if timer expired
  mailbox: always wait in mbox_send_message for blocking Tx mode
  mailbox: Remove depends on COMPILE_TEST for BCM_FLEXRM_MBOX
  mailbox: check ->last_tx_done for NULL in case of timer-based polling
  dt-bindings: Add DT bindings info for FlexRM ring manager
  mailbox: Add driver for Broadcom FlexRM ring manager
  dt-bindings: mailbox: Update doc with NSP PDC/mailbox support
  mailbox: bcm-pdc: Add Northstar Plus support to PDC driver
  mailbox: constify mbox_chan_ops structures

Merge tag 'for-linux-4.12' of git://github.com/cminyard/linux-ipmi

Pull IPMI updates from Corey Minyard:
"A few fixes of things in the IPMI area, the watchdog would have issues
  at panic time cause by a recently introduced change, a problem with
  device numbering, one possible panic in the I2C driver (destined for
  stable).

  Nothing earth-shattering, but some things that need to go in"

* tag 'for-linux-4.12' of git://github.com/cminyard/linux-ipmi:
  ipmi/watchdog: fix wdog hang on panic waiting for ipmi response
  ipmi_si: use smi_num for init_name
  ipmi: bt-bmc: Add ast2500 compatible string
  ACPI / IPMI: change warning to debug on timeout
  ACPI / IPMI: allow ACPI_IPMI with IPMI_SSIF
  ipmi_ssif: use setup_timer
  ipmi: Fix kernel panic at ipmi_ssif_thread()

Merge tag 'hsi-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi

Pull HSI fix from Sebastian Reichel:
"Fix double free fix in ssi-protocol"

* tag 'hsi-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
HSI: ssi_protocol: double free in ssip_pn_xmit()

Merge tag 'for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply and reset updates from Sebastian Reichel:
"New drivers:
   - gemini-poweroff
   - cpcap-charger (for Motorola Droid 4)
   - battery-lego-ev3 (for LEGO Mindstorms EV3)

  New chip/feature support:
   - bq24190-charger: add runtime PM support
   - bq24190-charger: add bq24192i support
   - register masking for syscon-poweroff

  ... and misc small fixes & cleanups

* tag 'for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (29 commits)
  power: supply: bq24190_charger: Use new extcon_register_notifier_all()
  power: supply: bq24190_charger: Longer delay while polling reset flag
  power: supply: bq24190_charger: Uniform pm_runtime_get() failure handling
  power: supply: bq24190_charger: Clean up extcon code
  power: supply: bq24190_charger: Limit over/under voltage fault logging
  power: supply: New driver for LEGO MINDSTORMS EV3 battery
  dt-bindings: power: supply: New bindings for LEGO MINDSTORMS EV3 battery
  power: supply: tps65217: remove debug messages for function calls
  power: supply: ltc2941-battery-gauge: Add OF device ID table
  power: supply: ltc2941-battery-gauge: Add vendor to compatibles in binding
  power: supply: charger-manager: simplify return statements
  power: supply: lp8788: prevent out of bounds array access
  power: supply: cpcap-charger: Add minimal CPCAP PMIC battery charger
  power: supply: bq24190_charger: Use extcon to determine ilimit, 5v boost
  power: supply: bq24190_charger: Add support for bq24192i
  power: supply: bq24190_charger: Use i2c-core irq-mapping code
  power: bq24190_charger: mark PM functions as __maybe_unused
  power: supply: sbs-charger: simplified bool function
  power: supply: ab8500: Replaced spaces with tabs in indent
  power: supply: bq25890: Use gpiod_get()
  ...

cgroup: mark cgroup_get() with __maybe_unused

a590b90d472f ("cgroup: fix spurious warnings on cgroup_is_dead() from
cgroup_sk_alloc()") converted most cgroup_get() usages to
cgroup_get_live() leaving cgroup_sk_alloc() the sole user of
cgroup_get(). When !CONFIG_SOCK_CGROUP_DATA, this ends up triggering
unused warning for cgroup_get().

Silence the warning by adding __maybe_unused to cgroup_get().

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20170501145340.17e8ef86@canb.auug.org.au
Signed-off-by: Tejun Heo <tj@kernel.org>

Merge tag 'hwmon-for-linus-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon updates from Guenter Roeck:

- removed twl4030-madc driver

- added ASPEED PWM/fan driver

- various minor improvements and fixes in several drivers

* tag 'hwmon-for-linus-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (36 commits)
  hwmon: (twl4030-madc) drop driver
  hwmon: (tmp103) Use SIMPLE_DEV_PM_OPS helper macro
  hwmon: (adt7475) set start bit in probe
  hwmon: (ina209) Handled signed registers
  hwmon: (lm87) Add OF device ID table
  hwmon: (lm87) Remove unused I2C devices driver_data
  drivers: hwmon: Support for ASPEED PWM/Fan tach
  Documentation: dt-bindings: Document bindings for ASPEED AST2400/AST2500 PWM and Fan tach controller device driver
  hwmon: (lm87) Allow channel data to be set from dts file
  Documentation: dtb: lm87: Add hwmon binding documentation
  hwmon: (ads7828) Accept optional parameters from device tree
  hwmon: (dell-smm) Add Dell XPS 15 9560 into DMI list
  hwmon: Constify str parameter of hwmon_ops->read_string
  dt: Add vendor prefix for Sensirion
  hwmon: (tmp421) Add OF device ID table
  hwmon: (tmp103) Add OF device ID table
  hwmon: (tmp102) Add OF device ID table
  hwmon: (stts751) Add OF device ID table
  hwmon: (ucd9200) Add OF device ID table
  hwmon: (ucd9000) Add OF device ID table
  ...

Merge tag 'edac_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

- an EDAC driver for Cavium ThunderX RAS IP (Sergey Temerkhanov)

- removal of DRAM error reporting through PCI SERR NMI (Borislav
   Petkov)

- misc small fixes (Jan Glauber, Thor Thayer)

* tag 'edac_for_4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, ghes: Do not enable it by default
  EDAC: Rename report status accessors
  EDAC: Delete edac_stub.c
  EDAC: Update Kconfig help text
  EDAC: Remove EDAC_MM_EDAC
  EDAC: Issue tracepoint only when it is defined
  ACPI/extlog: Add EDAC dependency
  EDAC: Move edac_op_state to edac_mc.c
  EDAC: Remove edac_err_assert
  EDAC: Get rid of edac_handlers
  x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI
  EDAC, highbank: Align Makefile directives
  EDAC, thunderx: Remove unused code
  EDAC, thunderx: Change LMC index calculation
  EDAC, altera: Fix peripheral warnings for Cyclone5
  EDAC, thunderx: Fix L2C MCI interrupt disable
  EDAC, thunderx: Add Cavium ThunderX EDAC driver

Merge branch 'for-4.12/post-merge' of git://git.kernel.dk/linux-block

Pull second round of block layer updates from Jens Axboe:

- Further fixups to the NVMe APST code, from Andy.

- Various fixes for (mostly) nvme-fc, from Christoph and James.

- NVMe scsi fixes from Jon and Christoph.

* 'for-4.12/post-merge' of git://git.kernel.dk/linux-block: (39 commits)
  nvme-scsi: remove nvme_trans_security_protocol
  nvme-lightnvm: add missing endianess conversion in nvme_nvm_end_io
  nvme-scsi: Consider LBA format in IO splitting calculation
  nvme-fc: avoid memory corruption caused by calling nvmf_free_options() twice
  lpfc: Fix memory corruption of the lpfc_ncmd->list pointers
  nvme: Add nvme_core.force_apst to ignore the NO_APST quirk
  nvme: Display raw APST configuration via DYNAMIC_DEBUG
  nvme: Fix APST comment
  lpfc revison 11.2.0.12
  Fix Express lane queue creation.
  Update ABORT processing for NVMET.
  Fix implicit logo and RSCN handling for NVMET
  Add Fabric assigned WWN support.
  Fix max_sgl_segments settings for NVME / NVMET
  Fix crash after issuing lip reset
  Fix driver load issues when MRQ=8
  Remove hba lock from NVMET issue WQE.
  Fix nvme initiator handling when not enabled.
  Fix driver usage of 128B WQEs when WQ_CREATE is V1.
  Fix driver unload/reload operation.
  ...

Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:

- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
   was initially a fork of CFQ, but subsequently changed to implement
   fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
   to be used on desktop type single drives, providing good fairness.
   From Paolo.

- Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
   using a scalable token based algorithm that throttles IO based on
   live completion IO stats, similary to blk-wbt. From Omar.

- A series from Jan, moving users to separately allocated backing
   devices. This continues the work of separating backing device life
   times, solving various problems with hot removal.

- A series of updates for lightnvm, mostly from Javier. Includes a
   'pblk' target that exposes an open channel SSD as a physical block
   device.

- A series of fixes and improvements for nbd from Josef.

- A series from Omar, removing queue sharing between devices on mostly
   legacy drivers. This helps us clean up other bits, if we know that a
   queue only has a single device backing. This has been overdue for
   more than a decade.

- Fixes for the blk-stats, and improvements to unify the stats and user
   windows. This both improves blk-wbt, and enables other users to
   register a need to receive IO stats for a device. From Omar.

- blk-throttle improvements from Shaohua. This provides a scalable
   framework for implementing scalable priotization - particularly for
   blk-mq, but applicable to any type of block device. The interface is
   marked experimental for now.

- Bucketized IO stats for IO polling from Stephen Bates. This improves
   efficiency of polled workloads in the presence of mixed block size
   IO.

- A few fixes for opal, from Scott.

- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
   From a variety of folks, mostly Sagi and James Smart.

- A series from Bart, improving our exposed info and capabilities from
   the blk-mq debugfs support.

- A series from Christoph, cleaning up how handle WRITE_ZEROES.

- A series from Christoph, cleaning up the block layer handling of how
   we track errors in a request. On top of being a nice cleanup, it also
   shrinks the size of struct request a bit.

- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
   never used by platforms, and the latter has outlived it's usefulness.

- Various little bug fixes and cleanups from a wide variety of folks.

* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
  block: hide badblocks attribute by default
  blk-mq: unify hctx delay_work and run_work
  block: add kblock_mod_delayed_work_on()
  blk-mq: unify hctx delayed_run_work and run_work
  nbd: fix use after free on module unload
  MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
  blk-mq-sched: alloate reserved tags out of normal pool
  mtip32xx: use runtime tag to initialize command header
  scsi: Implement blk_mq_ops.show_rq()
  blk-mq: Add blk_mq_ops.show_rq()
  blk-mq: Show operation, cmd_flags and rq_flags names
  blk-mq: Make blk_flags_show() callers append a newline character
  blk-mq: Move the "state" debugfs attribute one level down
  blk-mq: Unregister debugfs attributes earlier
  blk-mq: Only unregister hctxs for which registration succeeded
  blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
  blk-mq: Let blk_mq_debugfs_register() look up the queue name
  blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
  ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
  ide-pm: always pass 0 error to __blk_end_request_all
  ..

Linux 4.11

hwmon: (twl4030-madc) drop driver

This driver is no longer needed:

* It has no mainline users
* It has no DT support and OMAP is DT only
* iio-hwmon can be used for madc, which also works with DT

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"The final fixes for 4.11:

   - prevent a triple fault with function graph tracing triggered via
     suspend to ram

   - prevent optimizing for size when function graph tracing is enabled
     and the compiler does not support -mfentry

   - prevent mwaitx() being called with a zero timeout as mwaitx() might
     never return. Observed on the new Ryzen CPUs"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Prevent timer value 0 for MWAITX
  x86/build: convert function graph '-Os' error to warning
  ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Thomas Gleixner:
"A single fix for a cputime accounting regression which got introduced
in the 4.11 cycle"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Fix ksoftirqd cputime accounting regression

Prevent timer value 0 for MWAITX

Newer hardware has uncovered a bug in the software implementation of
using MWAITX for the delay function. A value of 0 for the timer is meant
to indicate that a timeout will not be used to exit MWAITX. On newer
hardware this can result in MWAITX never returning, resulting in NMI
soft lockup messages being printed. On older hardware, some of the other
conditions under which MWAITX can exit masked this issue. The AMD APM
does not currently document this and will be updated.

Please refer to http://marc.info/?l=kvm&m=148950623231140 for
information regarding NMI soft lockup messages on an AMD Ryzen 1800X.
This has been root-caused as a 0 passed to MWAITX causing it to wait
indefinitely.

This change has the added benefit of avoiding the unnecessary setup of
MONITORX/MWAITX when the delay value is zero.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Link: http://lkml.kernel.org/r/1493156643-29366-1-git-send-email-Janakarajan.Natarajan@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull iov iter fix from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix a braino in ITER_PIPE iov_iter_revert()

fix a braino in ITER_PIPE iov_iter_revert()

Fixes: 27c0e3748e41
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fix from Stephen Boyd:
"One odd config build fix for a recent Allwinner clock driver change
  that got merged. The common code called code in another file that
  wasn't always built. This just forces it on so people don't run into
  this bad configuration"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: always select CCU_GATE

libata: remove SCT WRITE SAME support

This was already disabled a while ago because it caused I/O errors,
and it's severly getting into the way of the discard / write zeroes
rework.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

libata: reject passthrough WRITE SAME requests

The WRITE SAME to TRIM translation rewrites the DATA OUT buffer. While
the SCSI code accomodates for this by passing a read-writable buffer
userspace applications don't cater for this behavior. In fact it can
be used to rewrite e.g. a readonly file through mmap and should be
considered as a security fix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

cgroup: avoid attaching a cgroup root to two different superblocks, take 2

Commit bfb0b80db5f9 ("cgroup: avoid attaching a cgroup root to two
different superblocks") is broken. Now we try to fix the race by
delaying the initialization of cgroup root refcnt until a superblock
has been allocated.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Tested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Merge branch 'pm-tools'

* pm-tools:
  tools: power: pm-graph: Package makefile and man pages
  tools: power: pm-graph: AnalyzeBoot v2.0
  tools: power: pm-graph: AnalyzeSuspend v4.6
  tools/power/x86/intel_pstate_tracer: Adjust directory ownership

Merge branches 'pm-cpuidle', 'pm-core', 'pm-domains', 'pm-avs' and 'pm-devfreq'

* pm-cpuidle:
  cpuidle: powernv: Avoid a branch in the core snooze_loop() loop
  cpuidle: powernv: Don't continually set thread priority in snooze_loop()
  cpuidle: powernv: Don't bounce between low and very low thread priority
  cpuidle: cpuidle-cps: remove unused variable
  powernv-cpuidle: Validate DT property array size

* pm-core:
  PM / runtime: Document autosuspend-helper side effects
  PM / runtime: Fix autosuspend documentation

* pm-domains:
  PM / Domains: Ignore domain-idle-states that are not compatible
  PM / Domains: Don't warn about IRQ safe device for an always on PM domain
  PM / Domains: Respect errors from genpd's ->power_off() callback
  PM / Domains: Enable users of genpd to specify always on PM domains
  PM / Domains: Clean up code validating genpd's status
  PM / Domain: remove conditional from error case

* pm-avs:
  PM / AVS: rockchip-io: add io selectors and supplies for rk3328

* pm-devfreq:
  PM / devfreq: Move struct devfreq_governor to devfreq directory

Merge branch 'pm-cpufreq'

* pm-cpufreq: (37 commits)
  cpufreq: Add Tegra186 cpufreq driver
  cpufreq: imx6q: Fix error handling code
  cpufreq: imx6q: Set max suspend_freq to avoid changes during suspend
  cpufreq: imx6q: Fix handling EPROBE_DEFER from regulator
  cpufreq: schedutil: Use policy-dependent transition delays
  cpufreq: schedutil: Reduce frequencies slower
  cpufreq: intel_pstate: Add support for Gemini Lake
  cpufreq: intel_pstate: Eliminate intel_pstate_get_min_max()
  cpufreq: intel_pstate: Do not walk policy->cpus
  cpufreq: intel_pstate: Introduce pid_in_use()
  cpufreq: intel_pstate: Drop struct cpu_defaults
  cpufreq: intel_pstate: Move cpu_defaults definitions
  cpufreq: intel_pstate: Add update_util callback to pstate_funcs
  cpufreq: intel_pstate: Use different utilization update callbacks
  cpufreq: intel_pstate: Modify check in intel_pstate_update_status()
  cpufreq: intel_pstate: Drop driver_registered variable
  cpufreq: intel_pstate: Skip unnecessary PID resets on init
  cpufreq: intel_pstate: Set HWP sampling interval once
  cpufreq: intel_pstate: Clean up intel_pstate_busy_pid_reset()
  cpufreq: intel_pstate: Fold intel_pstate_reset_all_pid() into the caller
  ...

Merge schedutil governor updates for v4.12.

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"Just a couple more stragglers, I really hope this is it.

  1) Don't let frags slip down into the GRO segmentation handlers, from
     Steffen Klassert.

  2) Truesize under-estimation triggers warnings in TCP over loopback
     with socket filters, 2 part fix from Eric Dumazet.

  3) Fix undesirable reset of bonding MTU to ETH_HLEN on slave removal,
     from Paolo Abeni.

  4) If we flush the XFRM policy after garbage collection, it doesn't
     work because stray entries can be created afterwards. Fix from Xin
     Long.

  5) Hung socket connection fixes in TIPC from Parthasarathy Bhuvaragan.

  6) Fix GRO regression with IPSEC when netfilter is disabled, from
     Sabrina Dubroca.

  7) Fix cpsw driver Kconfig dependency regression, from Arnd Bergmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: hso: register netdev later to avoid a race condition
  net: adjust skb->truesize in ___pskb_trim()
  tcp: do not underestimate skb->truesize in tcp_trim_head()
  bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
  ipv4: Don't pass IP fragments to upper layer GRO handlers.
  cpsw/netcp: refine cpts dependency
  tipc: close the connection if protocol messages contain errors
  tipc: improve error validations for sockets in CONNECTING state
  tipc: Fix missing connection request handling
  xfrm: fix GRO for !CONFIG_NETFILTER
  xfrm: do the garbage collection after flushing policy

Merge intel_pstate driver updates for v4.12.

net: hso: register netdev later to avoid a race condition

If the netdev is accessed before the urbs are initialized,
there will be NULL pointer dereferences. That is avoided by
registering it when it is fully initialized.

This case occurs e.g. if dhcpcd is running in the background
and the device is probed, either after insmod hso or
when the device appears on the usb bus.

A backtrace is the following:

[ 1357.356048] usb 1-2: new high-speed USB device number 12 using ehci-omap
[ 1357.551177] usb 1-2: New USB device found, idVendor=0af0, idProduct=8800
[ 1357.558654] usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=0
[ 1357.568572] usb 1-2: Product: Globetrotter HSUPA Modem
[ 1357.574096] usb 1-2: Manufacturer: Option N.V.
[ 1357.685882] hso 1-2:1.5: Not our interface
[ 1460.886352] hso: unloaded
[ 1460.889984] usbcore: deregistering interface driver hso
[ 1513.769134] hso: ../drivers/net/usb/hso.c: Option Wireless
[ 1513.846771] Unable to handle kernel NULL pointer dereference at virtual address 00000030
[ 1513.887664] hso 1-2:1.5: Not our interface
[ 1513.906890] usbcore: registered new interface driver hso
[ 1513.937988] pgd = ecdec000
[ 1513.949890] [00000030] *pgd=acd15831, *pte=00000000, *ppte=00000000
[ 1513.956573] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
[ 1513.962371] Modules linked in: hso usb_f_ecm omap2430 bnep bluetooth g_ether usb_f_rndis u_ether libcomposite configfs ipv6 arc4 wl18xx wlcore mac80211 cfg80211 bq27xxx_battery panel_tpo_td028ttec1 omapdrm drm_kms_helper cfbfillrect snd_soc_simple_card syscopyarea cfbimgblt snd_soc_simple_card_utils sysfillrect sysimgblt fb_sys_fops snd_soc_omap_twl4030 cfbcopyarea encoder_opa362 drm twl4030_madc_hwmon wwan_on_off snd_soc_gtm601 pwm_omap_dmtimer generic_adc_battery connector_analog_tv pwm_bl extcon_gpio omap3_isp wlcore_sdio videobuf2_dma_contig videobuf2_memops w1_bq27000 videobuf2_v4l2 videobuf2_core omap_hdq snd_soc_omap_mcbsp ov9650 snd_soc_omap bmp280_i2c bmg160_i2c v4l2_common snd_pcm_dmaengine bmp280 bmg160_core at24 bmc150_magn_i2c nvmem_core videodev phy_twl4030_usb bmc150_accel_i2c tsc2007
[ 1514.037384]  bmc150_magn bmc150_accel_core media leds_tca6507 bno055 industrialio_triggered_buffer kfifo_buf gpio_twl4030 musb_hdrc snd_soc_twl4030 twl4030_vibra twl4030_madc twl4030_pwrbutton twl4030_charger industrialio w2sg0004 ehci_omap omapdss [last unloaded: hso]
[ 1514.062622] CPU: 0 PID: 3433 Comm: dhcpcd Tainted: G        W       4.11.0-rc8-letux+ #1
[ 1514.071136] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[ 1514.077758] task: ee748240 task.stack: ecdd6000
[ 1514.082580] PC is at hso_start_net_device+0x50/0xc0 [hso]
[ 1514.088287] LR is at hso_net_open+0x68/0x84 [hso]
[ 1514.093231] pc : [<bf79c304>]    lr : [<bf79ced8>]    psr: a00f0013
sp : ecdd7e20  ip : 00000000  fp : ffffffff
[ 1514.105316] r10: 00000000  r9 : ed0e080c  r8 : ecd8fe2c
[ 1514.110839] r7 : bf79cef4  r6 : ecd8fe00  r5 : 00000000  r4 : ed0dbd80
[ 1514.117706] r3 : 00000000  r2 : c0020c80  r1 : 00000000  r0 : ecdb7800
[ 1514.124572] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[ 1514.132110] Control: 10c5387d  Table: acdec019  DAC: 00000051
[ 1514.138153] Process dhcpcd (pid: 3433, stack limit = 0xecdd6218)
[ 1514.144470] Stack: (0xecdd7e20 to 0xecdd8000)
[ 1514.149078] 7e20: ed0dbd80 ecd8fe98 00000001 00000000 ecd8f800 ecd8fe00 ecd8fe60 00000000
[ 1514.157714] 7e40: ed0e080c bf79ced8 bf79ce70 ecd8f800 00000001 bf7a0258 ecd8f830 c068d958
[ 1514.166320] 7e60: c068d8b8 ecd8f800 00000001 00001091 00001090 c068dba4 ecd8f800 00001090
[ 1514.174926] 7e80: ecd8f940 ecd8f800 00000000 c068dc60 00000000 00000001 ed0e0800 ecd8f800
[ 1514.183563] 7ea0: 00000000 c06feaa8 c0ca39c2 beea57dc 00000020 00000000 306f7368 00000000
[ 1514.192169] 7ec0: 00000000 00000000 00001091 00000000 00000000 00000000 00000000 00008914
[ 1514.200805] 7ee0: eaa9ab60 beea57dc c0c9bfc0 eaa9ab40 00000006 00000000 00046858 c066a948
[ 1514.209411] 7f00: beea57dc eaa9ab60 ecc6b0c0 c02837b0 00000006 c0282c90 0000c000 c0283654
[ 1514.218017] 7f20: c09b0c00 c098bc31 00000001 c0c5e513 c0c5e513 00000000 c0151354 c01a20c0
[ 1514.226654] 7f40: c0c5e513 c01a3134 ecdd6000 c01a3160 ee7487f0 600f0013 00000000 ee748240
[ 1514.235260] 7f60: ee748734 00000000 ecc6b0c0 ecc6b0c0 beea57dc 00008914 00000006 00000000
[ 1514.243896] 7f80: 00046858 c02837b0 00001091 0003a1f0 00046608 0003a248 00000036 c01071e4
[ 1514.252502] 7fa0: ecdd6000 c0107040 0003a1f0 00046608 00000006 00008914 beea57dc 00001091
[ 1514.261108] 7fc0: 0003a1f0 00046608 0003a248 00000036 0003ac0c 00046608 00046610 00046858
[ 1514.269744] 7fe0: 0003a0ac beea57d4 000167eb b6f23106 400f0030 00000006 00000000 00000000
[ 1514.278411] [<bf79c304>] (hso_start_net_device [hso]) from [<bf79ced8>] (hso_net_open+0x68/0x84 [hso])
[ 1514.288238] [<bf79ced8>] (hso_net_open [hso]) from [<c068d958>] (__dev_open+0xa0/0xf4)
[ 1514.296600] [<c068d958>] (__dev_open) from [<c068dba4>] (__dev_change_flags+0x8c/0x130)
[ 1514.305023] [<c068dba4>] (__dev_change_flags) from [<c068dc60>] (dev_change_flags+0x18/0x48)
[ 1514.313934] [<c068dc60>] (dev_change_flags) from [<c06feaa8>] (devinet_ioctl+0x348/0x714)
[ 1514.322540] [<c06feaa8>] (devinet_ioctl) from [<c066a948>] (sock_ioctl+0x2b0/0x308)
[ 1514.330627] [<c066a948>] (sock_ioctl) from [<c0282c90>] (vfs_ioctl+0x20/0x34)
[ 1514.338165] [<c0282c90>] (vfs_ioctl) from [<c0283654>] (do_vfs_ioctl+0x82c/0x93c)
[ 1514.346038] [<c0283654>] (do_vfs_ioctl) from [<c02837b0>] (SyS_ioctl+0x4c/0x74)
[ 1514.353759] [<c02837b0>] (SyS_ioctl) from [<c0107040>] (ret_fast_syscall+0x0/0x1c)
[ 1514.361755] Code: e3822103 e3822080 e1822781 e5981014 (e5832030)
[ 1514.510833] ---[ end trace dfb3e53c657f34a0 ]---

Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: adjust skb->truesize in ___pskb_trim()

Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
skb_try_coalesce() using syzkaller and a filter attached to a TCP
socket.

As we did recently in commit 158f323b9868 ("net: adjust skb->truesize in
pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
via a call to skb_condense().

If all frags were freed, then skb->truesize can be recomputed.

This call can be done if skb is not yet owned, or destructor is
sock_edemux().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not underestimate skb->truesize in tcp_trim_head()

Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
skb_try_coalesce() using syzkaller and a filter attached to a TCP
socket over loopback interface.

I believe one issue with looped skbs is that tcp_trim_head() can end up
producing skb with under estimated truesize.

It hardly matters for normal conditions, since packets sent over
loopback are never truncated.

Bytes trimmed from skb->head should not change skb truesize, since
skb->head is not reallocated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal

On slave list updates, the bonding driver computes its hard_header_len
as the maximum of all enslaved devices's hard_header_len.
If the slave list is empty, e.g. on last enslaved device removal,
ETH_HLEN is used.

Since the bonding header_ops are set only when the first enslaved
device is attached, the above can lead to header_ops->create()
being called with the wrong skb headroom in place.

If bond0 is configured on top of ipoib devices, with the
following commands:

ifup bond0
for slave in $BOND_SLAVES_LIST; do
ip link set dev $slave nomaster
done
ping -c 1 <ip on bond0 subnet>

we will obtain a skb_under_panic() with a similar call trace:
skb_push+0x3d/0x40
push_pseudo_header+0x17/0x30 [ib_ipoib]
ipoib_hard_header+0x4e/0x80 [ib_ipoib]
arp_create+0x12f/0x220
arp_send_dst.part.19+0x28/0x50
arp_solicit+0x115/0x290
neigh_probe+0x4d/0x70
__neigh_event_send+0xa7/0x230
neigh_resolve_output+0x12e/0x1c0
ip_finish_output2+0x14b/0x390
ip_finish_output+0x136/0x1e0
ip_output+0x76/0xe0
ip_local_out+0x35/0x40
ip_send_skb+0x19/0x40
ip_push_pending_frames+0x33/0x40
raw_sendmsg+0x7d3/0xb50
inet_sendmsg+0x31/0xb0
sock_sendmsg+0x38/0x50
SYSC_sendto+0x102/0x190
SyS_sendto+0xe/0x10
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25

This change addresses the issue avoiding updating the bonding device
hard_header_len when the slaves list become empty, forbidding to
shrink it below the value used by header_ops->create().

The bug is there since commit 54ef31371407 ("[PATCH] bonding: Handle large
hard_header_len") but the panic can be triggered only since
commit fc791b633515 ("IB/ipoib: move back IB LL address into the hard
header").

Reported-by: Norbert P <noe@physik.uzh.ch>
Fixes: 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len")
Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Don't pass IP fragments to upper layer GRO handlers.

Upper layer GRO handlers can not handle IP fragments, so
exit GRO processing in this case.

This fixes ESP GRO because the packet must be reassembled
before we can decapsulate, otherwise we get authentication
failures.

It also aligns IPv4 to IPv6 where packets with fragmentation
headers are not passed to upper layer GRO handlers.

Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cpsw/netcp: refine cpts dependency

Tony Lindgren reports a kernel oops that resulted from my compile-time
fix on the default config. This shows two problems:

a) configurations that did not already enable PTP_1588_CLOCK will
now miss the cpts driver

b) when cpts support is disabled, the driver crashes. This is a
preexisting problem that we did not notice before my patch.

While the second problem is still being investigated, this modifies
the dependencies again, getting us back to the original state, with
another 'select NET_PTP_CLASSIFY' added in to avoid the original
link error we got, and the 'depends on POSIX_TIMERS' to hide
the CPTS support when turning it on would be useless.

Cc: stable@vger.kernel.org # 4.11 needs this
Fixes: 07fef3623407 ("cpsw/netcp: cpts depends on posix_timers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmi/watchdog: fix wdog hang on panic waiting for ipmi response

Commit c49c097610fe ("ipmi: Don't call receive handler in the
panic context") means that the panic_recv_free is not called during a
panic and the atomic count does not drop to 0.

Fix this by only expecting one decrement of the atomic variable
which comes from panic_smi_free.

Signed-off-by: Robert Lippert <rlippert@google.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2017-04-28

1) Do garbage collecting after a policy flush to remove old
bundles immediately. From Xin Long.

2) Fix GRO if netfilter is not defined.
From Sabrina Dubroca.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc()

cgroup_get() expected to be called only on live cgroups and triggers
warning on a dead cgroup; however, cgroup_sk_alloc() may be called
while cloning a socket which is left in an empty and removed cgroup
and thus may legitimately duplicate its reference on a dead cgroup.
This currently triggers the following warning spuriously.

WARNING: CPU: 14 PID: 0 at kernel/cgroup.c:490 cgroup_get+0x55/0x60
...
  [<ffffffff8107e123>] __warn+0xd3/0xf0
  [<ffffffff8107e20e>] warn_slowpath_null+0x1e/0x20
  [<ffffffff810ff465>] cgroup_get+0x55/0x60
  [<ffffffff81106061>] cgroup_sk_alloc+0x51/0xe0
  [<ffffffff81761beb>] sk_clone_lock+0x2db/0x390
  [<ffffffff817cce06>] inet_csk_clone_lock+0x16/0xc0
  [<ffffffff817e8173>] tcp_create_openreq_child+0x23/0x4b0
  [<ffffffff818601a1>] tcp_v6_syn_recv_sock+0x91/0x670
  [<ffffffff817e8b16>] tcp_check_req+0x3a6/0x4e0
  [<ffffffff81861ba3>] tcp_v6_rcv+0x693/0xa00
  [<ffffffff81837429>] ip6_input_finish+0x59/0x3e0
  [<ffffffff81837cb2>] ip6_input+0x32/0xb0
  [<ffffffff81837387>] ip6_rcv_finish+0x57/0xa0
  [<ffffffff81837ac8>] ipv6_rcv+0x318/0x4d0
  [<ffffffff817778c7>] __netif_receive_skb_core+0x2d7/0x9a0
  [<ffffffff81777fa6>] __netif_receive_skb+0x16/0x70
  [<ffffffff81778023>] netif_receive_skb_internal+0x23/0x80
  [<ffffffff817787d8>] napi_gro_frags+0x208/0x270
  [<ffffffff8168a9ec>] mlx4_en_process_rx_cq+0x74c/0xf40
  [<ffffffff8168b270>] mlx4_en_poll_rx_cq+0x30/0x90
  [<ffffffff81778b30>] net_rx_action+0x210/0x350
  [<ffffffff8188c426>] __do_softirq+0x106/0x2c7
  [<ffffffff81082bad>] irq_exit+0x9d/0xa0 [<ffffffff8188c0e4>] do_IRQ+0x54/0xd0
  [<ffffffff8188a63f>] common_interrupt+0x7f/0x7f <EOI>
  [<ffffffff8173d7e7>] cpuidle_enter+0x17/0x20
  [<ffffffff810bdfd9>] cpu_startup_entry+0x2a9/0x2f0
  [<ffffffff8103edd1>] start_secondary+0xf1/0x100

This patch renames the existing cgroup_get() with the dead cgroup
warning to cgroup_get_live() after cgroup_kn_lock_live() and
introduces the new cgroup_get() which doesn't check whether the cgroup
is live or dead.

All existing cgroup_get() users except for cgroup_sk_alloc() are
converted to use cgroup_get_live().

Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
Cc: stable@vger.kernel.org # v4.5+
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fix from Dmitry Torokhov:
"Yet another quirk to i8042 to get touchpad recognized on some laptops"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Clevo P650RS to the i8042 reset list

clk: sunxi-ng: always select CCU_GATE

When the base driver is enabled but all SoC specific drivers are turned
off, we now get a build error after code was added to always refer to the
clk gates:

drivers/clk/built-in.o: In function `ccu_pll_notifier_cb':
:(.text+0x154f8): undefined reference to `ccu_gate_helper_disable'
:(.text+0x15504): undefined reference to `ccu_gate_helper_enable'

This changes the Kconfig to always require the gate code to be built-in
when CONFIG_SUNXI_CCU is set.

Fixes: 02ae2bc6febd ("clk: sunxi-ng: Add clk notifier to gate then ungate PLL clocks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>

Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fix from Chris Mason:
"We have one more fix for btrfs.

  This gets rid of a new WARN_ON from rc1 that ended up making more
  noise than we really want. The larger fix for the underflow got
  delayed a bit and it's better for now to put it under
  CONFIG_BTRFS_DEBUG"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: qgroup: move noisy underflow warning to debugging build

Merge branch 'tipc-socket-connection-hangs'

Parthasarathy Bhuvaragan says:

====================
tipc: fix hanging socket connections

This patch series contains fixes for the socket layer to
prevent hanging / stale connections.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: close the connection if protocol messages contain errors

When a socket is shutting down, we notify the peer node about the
connection termination by reusing an incoming message if possible.
If the last received message was a connection acknowledgment
message, we reverse this message and set the error code to
TIPC_ERR_NO_PORT and send it to peer.

In tipc_sk_proto_rcv(), we never check for message errors while
processing the connection acknowledgment or probe messages. Thus
this message performs the usual flow control accounting and leaves
the session hanging.

In this commit, we terminate the connection when we receive such
error messages.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: improve error validations for sockets in CONNECTING state

Until now, the checks for sockets in CONNECTING state was based on
the assumption that the incoming message was always from the
peer's accepted data socket.

However an application using a non-blocking socket sends an implicit
connect, this socket which is in CONNECTING state can receive error
messages from the peer's listening socket. As we discard these
messages, the application socket hangs as there due to inactivity.
In addition to this, there are other places where we process errors
but do not notify the user.

In this commit, we process such incoming error messages and notify
our users about them using sk_state_change().

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: Fix missing connection request handling

In filter_connect, we use waitqueue_active() to check for any
connections to wakeup. But waitqueue_active() is missing memory
barriers while accessing the critical sections, leading to
inconsistent results.

In this commit, we replace this with an SMP safe wq_has_sleeper()
using the generic socket callback sk_data_ready().

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

block: hide badblocks attribute by default

Commit 99e6608c9e74 "block: Add badblock management for gendisks"
allowed for drivers like pmem and software-raid to advertise a list of
bad media areas. However, it inadvertently added a 'badblocks' to all
block devices. Lets clean this up by having the 'badblocks' attribute
not be visible when the driver has not populated a 'struct badblocks'
instance in the gendisk.

Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: unify hctx delay_work and run_work

The only difference between ->run_work and ->delay_work, is that
the latter is used to defer running a queue. This is done by
marking the queue stopped, and scheduling ->delay_work to run
sometime in the future. While the queue is stopped, direct runs
or runs through ->run_work will not run the queue.

If we combine the handlers, then we need to handle two things:

1) If a delayed/stopped run is scheduled, then we should not run
   the queue before that has been completed.
2) If a queue is delayed/stopped, the handler needs to restart
   the queue. Normally a run of a queue with the stopped bit set
   would be a no-op.

Case 1 is handled by modifying a currently pending queue run
to the deadline set by the caller of blk_mq_delay_queue().
Subsequent attempts to queue a queue run will find the work
item already pending, and direct runs will see a stopped queue
as before.

Case 2 is handled by adding a new bit, BLK_MQ_S_START_ON_RUN,
that tells the work handler that it should clear a stopped
queue and run the handler.

Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: add kblock_mod_delayed_work_on()

This modifies (or adds, if not currently pending) an existing
delayed work item.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: unify hctx delayed_run_work and run_work

They serve the exact same purpose. Get rid of the non-delayed
work variant, and just run it without delay for the normal case.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

nbd: fix use after free on module unload

list_for_each_entry() isn't super safe if we're freeing the objects
while we traverse the list. Also don't bother taking the extra
reference, the module refcounting stuff will save us from having anybody
messing with the device while we're trying to unload.

Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler

Seems like this was forgotten in the bfq-series from Paolo. Let's do it now
so people don't miss out involving Paolo for any future changes or when
reporting bugs.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

Merge tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
"Thanks to Ari Kauppi and Tuomas Haanpää at Synopsis for spotting bugs
  in our NFSv2/v3 xdr code that could crash the server or leak memory"

* tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux:
  nfsd: stricter decoding of write-like NFSv2/v3 ops
  nfsd4: minor NFSv2/v3 write decoding cleanup
  nfsd: check for oversized NFSv2/v3 arguments

Merge tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
"A fix for a kernel stack overflow bug in ceph setattr code, marked for
stable"

* tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client:
ceph: fix recursion between ceph_set_acl() and __ceph_setattr()

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:

- fix orangefs handling of faults on write() - I'd missed that one back
   when orangefs was going through review.

- readdir counterpart of "9p: cope with bogus responses from server in
   p9_client_{read,write}" - server might be lying or broken, and we'd
   better not overrun the kmalloc'ed buffer we are copying the results
   into.

- NFS O_DIRECT read/write can leave iov_iter advanced by too much;
   that's what had been causing iov_iter_pipe() warnings davej had been
   seeing.

- statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should
   go in before 4.11.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  uapi: change the type of struct statx_timestamp.tv_nsec to unsigned
  fix nfs O_DIRECT advancing iov_iter too much
  p9_client_readdir() fix
  orangefs_bufmap_copy_from_iovec(): fix EFAULT handling

statx: correct error handling of NULL pathname

The change in commit 1e2f82d1e9d1 ("statx: Kill fd-with-NULL-path
support in favour of AT_EMPTY_PATH") to error on a NULL pathname to
statx() is inconsistent.

It results in the error EINVAL for a NULL pathname. Other system calls
with similar APIs (fchownat(), fstatat(), linkat()), return EFAULT.

The solution is simply to remove the EINVAL check. As I already pointed
out in [1], user_path_at*() and filename_lookup() will handle the NULL
pathname as per the other APIs, to correctly produce the error EFAULT.

[1] https://lkml.org/lkml/2017/4/26/561

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-4.12/post-merge

Christoph writes:

"A couple more updates for 4.12. The biggest pile is fc and lpfc
updates from James, but there are various small fixes and cleanups as
well."

Fixes up a few merge issues, and also a warning in
lpfc_nvmet_rcv_unsol_abort() if CONFIG_NVME_TARGET_FC isn't enabled.

Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq-sched: alloate reserved tags out of normal pool

At least one driver, mtip32xx, has a hard coded dependency on
the value of the reserved tag used for internal commands. While
that should really be fixed up, for now let's ensure that we just
bypass the scheduler tags an allocation marked as reserved. They
are used for house keeping or error handling, so we can safely
ignore them in the scheduler.

Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

mtip32xx: use runtime tag to initialize command header

mtip32xx supposes that 'request_idx' passed to .init_request()
is tag of the request, and use that as request's tag to initialize
command header.

After MQ IO scheduler is in, request tag assigned isn't same with
the request index anymore, so cause strange hardware failure on
mtip32xx, even whole system panic is triggered.

This patch fixes the issue by initializing command header via
request's real tag.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

EDAC, ghes: Do not enable it by default

Leave it to the user to decide whether to enable this or not. Otherwise,
platform-specific drivers won't initialize (currently, EDAC supports
only a single platform driver loaded).

Signed-off-by: Borislav Petkov <bp@suse.de>

mailbox: handle empty message in tx_tick

We already check if the message is empty before calling the client
tx_done callback. Calling completion on a wait event is also invalid
if the message is empty.

This patch moves the existing empty message check earlier.

Fixes: 2b6d83e2b8b7 ("mailbox: Introduce framework for mailbox")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

mailbox: skip complete wait event if timer expired

If a wait_for_completion_timeout() call returns due to a timeout,
complete() can get called after returning from the wait which is
incorrect and can cause subsequent transmissions on a channel to fail.
Since the wait_for_completion_timeout() sees the completion variable
is non-zero caused by the erroneous/spurious complete() call, and
it immediately returns without waiting for the time as expected by the
client.

This patch fixes the issue by skipping complete() call for the timer
expiry.

Fixes: 2b6d83e2b8b7 ("mailbox: Introduce framework for mailbox")
Reported-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

mailbox: always wait in mbox_send_message for blocking Tx mode

There exists a race when msg_submit return immediately as there was an
active request being processed which may have completed just before it's
checked again in mbox_send_message. This will result in return to the
caller without waiting in mbox_send_message even when it's blocking Tx.

This patch fixes the issue by waiting for the completion always if Tx
is in blocking mode.

Fixes: 2b6d83e2b8b7 ("mailbox: Introduce framework for mailbox")
Reported-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

xfrm: fix GRO for !CONFIG_NETFILTER

In xfrm_input() when called from GRO, async == 0, and we end up
skipping the processing in xfrm4_transport_finish(). GRO path will
always skip the NF_HOOK, so we don't need the special-case for
!NETFILTER during GRO processing.

Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

sched/cputime: Fix ksoftirqd cputime accounting regression

irq_time_read() returns the irqtime minus the ksoftirqd time. This
is necessary because irq_time_read() is used to substract the IRQ time
from the sum_exec_runtime of a task. If we were to include the softirq
time of ksoftirqd, this task would substract its own CPU time everytime
it updates ksoftirqd->sum_exec_runtime which would therefore never
progress.

But this behaviour got broken by:

a499a5a14db ("sched/cputime: Increment kcpustat directly on irqtime account")

... which now includes ksoftirqd softirq time in the time returned by
irq_time_read().

This has resulted in wrong ksoftirqd cputime reported to userspace
through /proc/stat and thus "top" not showing ksoftirqd when it should
after intense networking load.

ksoftirqd->stime happens to be correct but it gets scaled down by
sum_exec_runtime through task_cputime_adjusted().

To fix this, just account the strict IRQ time in a separate counter and
use it to report the IRQ time.

Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

nvme-scsi: remove nvme_trans_security_protocol

This function just returns the same error code and sense data as
the default statement in the switch in the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>

uapi: change the type of struct statx_timestamp.tv_nsec to unsigned

The comment asserting that the value of struct statx_timestamp.tv_nsec
must be negative when statx_timestamp.tv_sec is negative, is wrong, as
could be seen from the following example:

#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
#include <asm/unistd.h>
#include <linux/stat.h>

int main(void)
{
static const struct timespec ts[2] = {
{ .tv_nsec = UTIME_OMIT },
{ .tv_sec = -2, .tv_nsec = 42 }
};
assert(utimensat(AT_FDCWD, ".", ts, 0) == 0);

struct stat st;
assert(stat(".", &st) == 0);
printf("st_mtim.tv_sec = %lld, st_mtim.tv_nsec = %lu\n",
       (long long) st.st_mtim.tv_sec,
       (unsigned long) st.st_mtim.tv_nsec);

struct statx stx;
assert(syscall(__NR_statx, AT_FDCWD, ".", 0, 0, &stx) == 0);
printf("stx_mtime.tv_sec = %lld, stx_mtime.tv_nsec = %lu\n",
       (long long) stx.stx_mtime.tv_sec,
       (unsigned long) stx.stx_mtime.tv_nsec);

return 0;
}

It expectedly prints:
st_mtim.tv_sec = -2, st_mtim.tv_nsec = 42
stx_mtime.tv_sec = -2, stx_mtime.tv_nsec = 42

The more generic comment asserting that the value of struct
statx_timestamp.tv_nsec might be negative is confusing to say the least.

It contradicts both the struct stat.st_[acm]time_nsec tradition and
struct timespec.tv_nsec requirements in utimensat syscall.
If statx syscall ever returns a stx_[acm]time containing a negative
tv_nsec that cannot be passed unmodified to utimensat syscall,
it will cause an immense confusion.

Fix this source of confusion by changing the type of struct
statx_timestamp.tv_nsec from __s32 to __u32.

Fixes: a528d35e8bfc ("statx: Add a system call to make enhanced file info available")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
cc: mtk.manpages@gmail.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
"I didn't want the release to go out without the statx system call
  properly hooked up"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Update syscall tables.
  sparc64: Fill in rest of HAVE_REGS_AND_STACK_ACCESS_API

statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH

With the new statx() syscall, the following both allow the attributes of
the file attached to a file descriptor to be retrieved:

statx(dfd, NULL, 0, ...);

and:

statx(dfd, "", AT_EMPTY_PATH, ...);

Change the code to reject the first option, though this means copying
the path and engaging pathwalk for the fstat() equivalent.  dfd can be a
non-directory provided path is "".

[ The timing of this isn't wonderful, but applying this now before we
  have statx() in any released kernel, before anybody starts using the
  NULL special case.    - Linus ]

Fixes: a528d35e8bfc ("statx: Add a system call to make enhanced file info available")
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Sandeen <sandeen@sandeen.net>
cc: fstests@vger.kernel.org
cc: linux-api@vger.kernel.org
cc: linux-man@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

scsi: Implement blk_mq_ops.show_rq()

Show the SCSI CDB for pending SCSI commands in
/sys/kernel/debug/block/*/mq/*/dispatch and */rq_list. An example
of how SCSI commands are displayed by this code:

ffff8801703245c0 {.op=READ, .cmd_flags=META PRIO, .rq_flags=DONTPREP IO_STAT STATS, .tag=14, .internal_tag=-1, .cmd=Read(10) 28 00 2a 81 1b 30 00 00 08 00}

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: <linux-scsi@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Add blk_mq_ops.show_rq()

This new callback function will be used in the next patch to show
more information about SCSI requests.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Show operation, cmd_flags and rq_flags names

Show the operation name, .cmd_flags and .rq_flags as names instead
of numbers.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Make blk_flags_show() callers append a newline character

This patch does not change any functionality but makes it possible
to produce a single line of output with multiple flag-to-name
translations.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Move the "state" debugfs attribute one level down

Move the "state" attribute from the top level to the "mq" directory
as requested by Omar.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Unregister debugfs attributes earlier

We currently call blk_mq_free_queue() from blk_cleanup_queue()
before we unregister the debugfs attributes for that queue in
blk_release_queue(). This leaves a window open during which
accessing most of the mq debugfs attributes would cause a
use-after-free. Additionally, the "state" attribute allows
running the queue, which we should not do after the queue has
entered the "dead" state. Fix both cases by unregistering the
debugfs attributes before freeing queue resources starts.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Only unregister hctxs for which registration succeeded

Hctx unregistration involves calling kobject_del(). kobject_del()
must not be called if kobject_add() has not been called. Hence in
the error path only unregister hctxs for which registration succeeded.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq-debugfs: Rename functions for registering and unregistering the mq directory

Since the blk_mq_debugfs_*register_hctxs() functions register and
unregister all attributes under the "mq" directory, rename these
into blk_mq_debugfs_*register_mq().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Let blk_mq_debugfs_register() look up the queue name

A later patch will move the call of blk_mq_debugfs_register() to
a function to which the queue name is not passed as an argument.
To avoid having to add a 'name' argument to multiple callers, let
blk_mq_debugfs_register() look up the queue name.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: Register <dev>/queue/mq after having registered <dev>/queue

A later patch in this series will modify blk_mq_debugfs_register()
such that it uses q->kobj.parent to determine the name of a
request queue. Hence make sure that that pointer is initialized
before blk_mq_debugfs_register() is called. To avoid lock inversion,
protect sysfs / debugfs registration with the queue sysfs_lock
instead of the global mutex all_q_mutex.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) MLX5 bug fixes from Saeed Mahameed et al:
     - released wrong resources when firmware timeout happens
     - fix wrong check for encapsulation size limits
     - UAR memory leak
     - ETHTOOL_GRXCLSRLALL failed to fill in info->data

2) Don't cache l3mdev on mis-matches local route, causes net devices to
    leak refs. From Robert Shearman.

3) Handle fragmented SKBs properly in macsec driver, the problem is
    that we were mis-sizing the sgvec table. From Jason A. Donenfeld.

4) We cannot have checksum offload enabled for inner UDP tunneled
    packet during IPSEC, from Ansis Atteka.

5) Fix double SKB free in ravb driver, from Dan Carpenter.

6) Fix CPU port handling in b53 DSA driver, from Florian Dainelli.

7) Don't use on-stack buffers for usb_control_msg() in CAN usb driver,
    from Maksim Salau.

8) Fix device leak in macvlan driver, from Herbert Xu. We have to purge
    the broadcast queue properly on port destroy.

9) Fix tx ring entry limit on EF10 devices in sfc driver. From Bert
    Kenward.

10) Fix memory leaks in team driver, from Pan Bian.

11) Don't setup ipv6_stub before it can be actually used, from Paolo
    Abeni.

12) Fix tipc socket flow control accounting, from Parthasarathy
    Bhuvaragan.

13) Fix crash on module unload in hso driver, from Andreas Kemnade.

14) Fix purging of bridge multicast entries, the problem is that if we
    don't defer it to ndo_uninit it's possible for new entries to get
    added after we purge. Fix from Xin Long.

15) Don't return garbage for PACKET_HDRLEN getsockopt, from Alexander
    Potapenko.

16) Fix autoneg stall properly in PHY layer, and revert micrel driver
    change that was papering over it. From Alexander Kochetkov.

17) Don't dereference an ipv4 route as an ipv6 one in the ip6_tunnnel
    code, from Cong Wang.

18) Clear out the congestion control private of the TCP socket in all of
    the right places, from Wei Wang.

19) rawv6_ioctl measures SKB length incorrectly, fix from Jamie
    Bainbridge.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  ipv6: check raw payload size correctly in ioctl
  tcp: memset ca_priv data to 0 properly
  ipv6: check skb->protocol before lookup for nexthop
  net: core: Prevent from dereferencing null pointer when releasing SKB
  macsec: dynamically allocate space for sglist
  Revert "phy: micrel: Disable auto negotiation on startup"
  net: phy: fix auto-negotiation stall due to unavailable interrupt
  net/packet: check length in getsockopt() called with PACKET_HDRLEN
  net: ipv6: regenerate host route if moved to gc list
  bridge: move bridge multicast cleanup to ndo_uninit
  ipv6: fix source routing
  qed: Fix error in the dcbx app meta data initialization.
  netvsc: fix calculation of available send sections
  net: hso: fix module unloading
  tipc: fix socket flow control accounting error at tipc_recv_stream
  tipc: fix socket flow control accounting error at tipc_send_stream
  ipv6: move stub initialization after ipv6 setup completion
  team: fix memory leaks
  sfc: tx ring can only have 2048 entries for all EF10 NICs
  macvlan: Fix device ref leak when purging bc_queue
  ...

ipv6: check raw payload size correctly in ioctl

In situations where an skb is paged, the transport header pointer and
tail pointer can be the same because the skb contents are in frags.

This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
length of 0 when the length to receive is actually greater than zero.

skb->len is already correctly set in ip6_input_finish() with
pskb_pull(), so use skb->len as it always returns the correct result
for both linear and paged data.

Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: memset ca_priv data to 0 properly

Always zero out ca_priv data in tcp_assign_congestion_control() so that
ca_priv data is cleared out during socket creation.
Also always zero out ca_priv data in tcp_reinit_congestion_control() so
that when cc algorithm is changed, ca_priv data is cleared out as well.
We should still zero out ca_priv data even in TCP_CLOSE state because
user could call connect() on AF_UNSPEC to disconnect the socket and
leave it in TCP_CLOSE state and later call setsockopt() to switch cc
algorithm on this socket.

Fixes: 2b0a8c9ee ("tcp: add CDG congestion control")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: check skb->protocol before lookup for nexthop

Andrey reported a out-of-bound access in ip6_tnl_xmit(), this
is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
neigh key as an IPv6 address:

        neigh = dst_neigh_lookup(skb_dst(skb),
                                 &ipv6_hdr(skb)->daddr);
        if (!neigh)
                goto tx_err_link_failure;

        addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
        addr_type = ipv6_addr_type(addr6);

        if (addr_type == IPV6_ADDR_ANY)
                addr6 = &ipv6_hdr(skb)->daddr;

        memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));

Also the network header of the skb at this point should be still IPv4
for 4in6 tunnels, we shold not just use it as IPv6 header.

This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
is, we are safe to do the nexthop lookup using skb_dst() and
ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
dest address we can pick here, we have to rely on callers to fill it
from tunnel config, so just fall to ip6_route_output() to make the
decision.

Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: Prevent from dereferencing null pointer when releasing SKB

Added NULL check to make __dev_kfree_skb_irq consistent with kfree
family of functions.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macsec: dynamically allocate space for sglist

We call skb_cow_data, which is good anyway to ensure we can actually
modify the skb as such (another error from prior). Now that we have the
number of fragments required, we can safely allocate exactly that amount
of memory.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "phy: micrel: Disable auto negotiation on startup"

This reverts commit 99f81afc139c6edd14d77a91ee91685a414a1c66.

It was papering over the real problem, which is fixed by commit
f555f34fdc58 ("net: phy: fix auto-negotiation stall due to unavailable
interrupt")

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: fix auto-negotiation stall due to unavailable interrupt

The Ethernet link on an interrupt driven PHY was not coming up if the Ethernet
cable was plugged before the Ethernet interface was brought up.

The patch trigger PHY state machine to update link state if PHY was requested to
do auto-negotiation and auto-negotiation complete flag already set.

During power-up cycle the PHY do auto-negotiation, generate interrupt and set
auto-negotiation complete flag. Interrupt is handled by PHY state machine but
doesn't update link state because PHY is in PHY_READY state. After some time
MAC bring up, start and request PHY to do auto-negotiation. If there are no new
settings to advertise genphy_config_aneg() doesn't start PHY auto-negotiation.
PHY continue to stay in auto-negotiation complete state and doesn't fire
interrupt. At the same time PHY state machine expect that PHY started
auto-negotiation and is waiting for interrupt from PHY and it won't get it.

Fixes: 321beec5047a ("net: phy: Use interrupts when available in NOLINK state")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Cc: stable <stable@vger.kernel.org> # v4.9+
Tested-by: Roger Quadros <rogerq@ti.com>
Tested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'sound-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Since we got a bonus week, let me try to screw a few pending fixes.

  A slightly large fix is the locking fix in ASoC STI driver, but it's
  pretty board-specific, and the risk is fairly low.

  All the rest are small / trivial fixes, mostly marked as stable, for
  ALSA sequencer core, ASoC topology, ASoC Intel bytcr and Firewire
  drivers"

* tag 'sound-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: intel: Fix PM and non-atomic crash in bytcr drivers
  ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type
  ALSA: seq: Don't break snd_use_lock_sync() loop by timeout
  ASoC: topology: Fix to store enum text values
  ASoC: STI: Fix null ptr deference in IRQ handler
  ALSA: oxfw: fix regression to handle Stanton SCS.1m/1d

ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset

The caller only looks at the scsi_request result field anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

ide-pm: always pass 0 error to __blk_end_request_all

ide_pm_execute_rq exectures a PM request synchronously, and in the failure
case where it calls __blk_end_request_all it never checks the error field
passed to the end_io callback, so don't bother setting it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

scsi_transport_sas: always pass 0 error to blk_end_request_all

The SAS transport queues are only used by bsg, and bsg always looks at
the scsi_request results and never add the error passed in the end_io
callback.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

xfrm: do the garbage collection after flushing policy

Now xfrm garbage collection can be triggered by 'ip xfrm policy del'.
These is no reason not to do it after flushing policies, especially
considering that 'garbage collection deferred' is only triggered
when it reaches gc_thresh.

It's no good that the policy is gone but the xdst still hold there.
The worse thing is that xdst->route/orig_dst is also hold and can
not be released even if the orig_dst is already expired.

This patch is to do the garbage collection if there is any policy
removed in xfrm_policy_flush.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Merge tag 'arc-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fix from Vineet Gupta:
"Last minute fixes for ARC:

   - build error in Mellanox nps platform

   - addressing lack of saving FPU regs in releavnt configs"

* tag 'arc-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARCv2: entry: save Accumulator register pair (r58:59) if present
  ARC: [plat-eznps] Fix build error

nfsd: stricter decoding of write-like NFSv2/v3 ops

The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer. This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative. Add
checks to catch these.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: minor NFSv2/v3 write decoding cleanup

Use a couple shortcuts that will simplify a following bugfix.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: check for oversized NFSv2/v3 arguments

A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.

Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply.  This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE).  But a client that sends an incorrectly long reply
can violate those assumptions.  This was observed to cause crashes.

Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.

So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.

As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.

We may also consider rejecting calls that have any extra garbage
appended.  That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ceph: fix recursion between ceph_set_acl() and __ceph_setattr()

ceph_set_acl() calls __ceph_setattr() if the setacl operation needs
to modify inode's i_mode. __ceph_setattr() updates inode's i_mode,
then calls posix_acl_chmod().

The problem is that __ceph_setattr() calls posix_acl_chmod() before
sending the setattr request. The get_acl() call in posix_acl_chmod()
can trigger a getxattr request. The reply of the getxattr request
can restore inode's i_mode to its old value. The set_acl() call in
posix_acl_chmod() sees old value of inode's i_mode, so it calls
__ceph_setattr() again.

Cc: stable@vger.kernel.org # needs backporting for < 4.9
Link: http://tracker.ceph.com/issues/19688
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

net/packet: check length in getsockopt() called with PACKET_HDRLEN

In the case getsockopt() is called with PACKET_HDRLEN and optlen < 4
|val| remains uninitialized and the syscall may behave differently
depending on its value, and even copy garbage to userspace on certain
architectures. To fix this we now return -EINVAL if optlen is too small.

This bug has been detected with KMSAN.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: regenerate host route if moved to gc list

Taking down the loopback device wreaks havoc on IPv6 routing. By
extension, taking down a VRF device wreaks havoc on its table.

Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
FIB code while running syzkaller fuzzer. The root cause is a dead dst
that is on the garbage list gets reinserted into the IPv6 FIB. While on
the gc (or perhaps when it gets added to the gc list) the dst->next is
set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
out-of-bounds access.

Andrey's reproducer was the key to getting to the bottom of this.

With IPv6, host routes for an address have the dst->dev set to the
loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
a walk of the fib evicting routes with the 'lo' device which means all
host routes are removed. That process moves the dst which is attached to
an inet6_ifaddr to the gc list and marks it as dead.

The recent change to keep global IPv6 addresses added a new function,
fixup_permanent_addr, that is called on admin up. That function restarts
dad for an inet6_ifaddr and when it completes the host route attached
to it is inserted into the fib. Since the route was marked dead and
moved to the gc list, re-inserting the route causes the reported
out-of-bounds accesses. If the device with the address is taken down
or the address is removed, the WARN_ON in fib6_del is triggered.

All of those faults are fixed by regenerating the host route if the
existing one has been moved to the gc list, something that can be
determined by checking if the rt6i_ref counter is 0.

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>