If pci_stop_and_remove_bus_device() is run concurrently for a device and
its parent bridge via remove_callback(), both code paths attempt to acquire
pci_rescan_remove_lock. If the child device removal acquires it first,
there will be no problems. However, if the parent bridge removal acquires
it first, it will eventually execute pci_destroy_dev() for the child
device, but that device object will not be freed yet due to the reference
held by the concurrent child removal. Consequently, both
pci_stop_bus_device() and pci_remove_bus_device() will be executed for that
device unnecessarily and pci_destroy_dev() will see a corrupted list head
in that object. Moreover, an excess put_device() will be executed for that
device in that case which may lead to a use-after-free in the final
kobject_put() done by sysfs_schedule_callback_work().
To avoid that problem, make pci_destroy_dev() check if the device's parent
kobject is NULL, which only happens after device_del() has already run for
it. Make pci_destroy_dev() return immediately whithout doing anything in
that case.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
xen/pcifront: Use global PCI rescan-remove locking
Multiple race conditions are possible between the Xen pcifront device
addition and removal and the generic PCI device addition and removal that
can be triggered via sysfs.
To avoid those race conditions make the Xen pcifront code use global PCI
rescan-remove locking.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Race conditions are theoretically possible between the PCI device addition
and removal in the PPC64 PCI error recovery driver and the generic PCI bus
rescan and device removal that can be triggered via sysfs.
To avoid those race conditions make PPC64 PCI error recovery driver use
global PCI rescan-remove locking around PCI device addition and removal.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
MPT / PCI: Use pci_stop_and_remove_bus_device_locked()
Race conditions are theoretically possible between the MPT PCI device
removal and the generic PCI bus rescan and device removal that can be
triggered via sysfs.
To avoid those race conditions make the MPT PCI code use
pci_stop_and_remove_bus_device_locked().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
platform / x86: Use global PCI rescan-remove locking
Multiple race conditions are possible between the rfkill hotplug in the
asus-wmi and eeepc-laptop drivers and the generic PCI bus rescan and device
removal that can be triggered via sysfs.
To avoid those race conditions make asus-wmi and eeepc-laptop use global
PCI rescan-remove locking around the rfkill hotplug.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Multiple race conditions are possible between the cardbus PCI device
addition and removal and the generic PCI bus rescan and device removal that
can be triggered via sysfs.
To avoid those race conditions make the cardbus code use global PCI
rescan-remove locking.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
ACPI / hotplug / PCI: Use global PCI rescan-remove locking
Multiple race conditions are possible between the ACPI-based PCI hotplug
(ACPIPHP) and the generic PCI bus rescan and device removal that can be
triggered via sysfs.
To avoid those race conditions make the ACPIPHP code use global PCI
rescan-remove locking.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
ACPI / PCI: Use global PCI rescan-remove locking in PCI root hotplug
Multiple race conditions are possible between the addition and removal of
PCI devices during ACPI PCI host bridge hotplug and the generic PCI bus
rescan and device removal that can be triggered via sysfs.
To avoid those race conditions make the ACPI PCI host bridge addition and
removal code use global PCI rescan-remove locking.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There are multiple PCI device addition and removal code paths that may be
run concurrently with the generic PCI bus rescan and device removal that
can be triggered via sysfs. If that happens, it may lead to multiple
different, potentially dangerous race conditions.
The most straightforward way to address those problems is to run
the code in question under the same lock that is used by the
generic rescan/remove code in pci-sysfs.c. To prepare for those
changes, move the definition of the global PCI remove/rescan lock
to probe.c and provide global wrappers, pci_lock_rescan_remove()
and pci_unlock_rescan_remove(), allowing drivers to manipulate
that lock. Also provide pci_stop_and_remove_bus_device_locked()
for the callers of pci_stop_and_remove_bus_device() who only need
to hold the rescan/remove lock around it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bjorn Helgaas [Mon, 13 Jan 2014 23:46:15 +0000 (16:46 -0700)]
Merge branch 'pci/aer' into next
* pci/aer:
PCI/AER: Support ACPI HEST AER error sources for PCI domains other than 0
ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
Betty Dall [Thu, 5 Dec 2013 15:08:24 +0000 (08:08 -0700)]
PCI/AER: Support ACPI HEST AER error sources for PCI domains other than 0
In the discussion for this set of patches [link below], Bjorn Helgaas
pointed out that the ACPI HEST AER error sources do not have the PCIe
segment number associated with the bus. I worked with the ACPI spec and
got this change to definition of the "Bus" field into the recently released
ACPI Spec 5.0a section 18.3.2.3-5:
Identifies the PCI Bus and Segment of the device. The Bus is encoded in
bits 0-7. For systems that expose multiple PCI segment groups, the
segment number is encoded in bits 8-23 and bits 24-31 must be zero. For
systems that do not expose multiple PCI segment groups, bits 8-31 must be
zero. If the GLOBAL flag is specified, this field is ignored.
This patch makes use of the new definition in the only place in the kernel
that uses the acpi_hest_aer_common's bus field.
This depends on 36f3615152c1 ("ACPICA: Add helper macros to extract
bus/segment numbers from HEST table.")
Betty Dall [Mon, 13 Jan 2014 19:17:47 +0000 (12:17 -0700)]
ACPICA: Add helper macros to extract bus/segment numbers from HEST table.
This change adds two macros to extract the encoded bus and segment
numbers from the HEST Bus field.
Signed-off-by: Betty Dall <betty.dall@hp.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts part of f46753c5e354 ("PCI: introduce pci_slot") and d25b7c8d6ba2 ("PCI: rename pci_update_slot_number to pci_renumber_slot"),
removing this interface:
pci_renumber_slot()
[bhelgaas: split to separate patch, add historical link from Alex] Link: http://lkml.kernel.org/r/20081009043140.8678.44164.stgit@bob.kio Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Chiang <achiang@canonical.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts part of 3e1b16002af2 ("ACPI/PCI: PCIe ASPM _OSC support
capabilities called when root bridge added"), removing this interface:
pcie_aspm_enabled()
[bhelgaas: split to separate patch] Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Andrew Patterson <andrew.patterson@hp.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts db5679437a2b ("PCI: add interface to set visible size of
VPD"), removing this interface:
pci_vpd_truncate()
[bhelgaas: split to separate patch, also remove prototype from pci.h] Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts b48d4425b602 ("PCI: add ID-based ordering enable/disable
support"), removing these interfaces:
pci_enable_ido()
pci_disable_ido()
[bhelgaas: split to separate patch, also remove prototypes from pci.h] Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI: Remove unused Optimized Buffer Flush/Fill support
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts 48a92a8179b3 ("PCI: add OBFF enable/disable support"),
removing these interfaces:
pci_enable_obff()
pci_disable_obff()
[bhelgaas: split to separate patch, also remove prototypes from pci.h] Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI: Remove unused Latency Tolerance Reporting support
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts 51c2e0a7e5bc ("PCI: add latency tolerance reporting
enable/disable support"), removing these interfaces:
pci_enable_ltr()
pci_disable_ltr()
pci_set_ltr()
[bhelgaas: split to separate patch, also remove prototypes from pci.h] Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org>
Bjorn Helgaas [Fri, 10 Jan 2014 21:23:15 +0000 (14:23 -0700)]
Merge branch 'pci/resource' into next
* pci/resource:
PCI: Allocate 64-bit BARs above 4G when possible
PCI: Enforce bus address limits in resource allocation
PCI: Split out bridge window override of minimum allocation address
agp/ati: Use PCI_COMMAND instead of hard-coded 4
agp/intel: Use CPU physical address, not bus address, for ioremap()
agp/intel: Use pci_bus_address() to get GTTADR bus address
agp/intel: Use pci_bus_address() to get MMADR bus address
agp/intel: Support 64-bit GMADR
agp/intel: Rename gtt_bus_addr to gtt_phys_addr
drm/i915: Rename gtt_bus_addr to gtt_phys_addr
agp: Use pci_resource_start() to get CPU physical address for BAR
agp: Support 64-bit APBASE
PCI: Add pci_bus_address() to get bus address of a BAR
PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev
PCI: Change pci_bus_region addresses to dma_addr_t
Erik Ekman [Wed, 1 Jan 2014 21:15:25 +0000 (22:15 +0100)]
PCI: Update documentation 00-INDEX file
The PCI-DMA-mapping.txt moved to general docs and became DMA-API-HOWTO.txt
in 5e07c2c7301b ("Documentation: rename PCI/PCI-DMA-mapping.txt to
DMA-API-HOWTO.txt"). Add new file about PCI Express I/O Virtualization.
Signed-off-by: Erik Ekman <erik@kryo.se> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Yinghai Lu [Fri, 20 Dec 2013 17:55:44 +0000 (10:55 -0700)]
PCI: Allocate 64-bit BARs above 4G when possible
Try to allocate space for 64-bit BARs above 4G first, to preserve the space
below 4G for 32-bit BARs. If there's no space above 4G available, fall
back to allocating anywhere.
[bhelgaas: reworked starting from http://lkml.kernel.org/r/1387485843-17403-2-git-send-email-yinghai@kernel.org] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Yinghai Lu [Fri, 20 Dec 2013 16:57:37 +0000 (09:57 -0700)]
PCI: Enforce bus address limits in resource allocation
When allocating space for 32-bit BARs, we previously limited RESOURCE
addresses so they would fit in 32 bits. However, the BUS address need not
be the same as the resource address, and it's the bus address that must fit
in the 32-bit BAR.
This patch adds:
- pci_clip_resource_to_region(), which clips a resource so it contains
only the range that maps to the specified bus address region, e.g., to
clip a resource to 32-bit bus addresses, and
- pci_bus_alloc_from_region(), which allocates space for a resource from
the specified bus address region,
and changes pci_bus_alloc_resource() to allocate space for 64-bit BARs from
the entire bus address region, and space for 32-bit BARs from only the bus
address region below 4GB.
we previously could not put a 32-bit BAR there, because the CPU addresses
don't fit in 32 bits. This patch fixes this, so we can use this space for
32-bit BARs.
It's also possible (though unlikely) to have resources with 32-bit CPU
addresses but bus addresses above 4GB. In this case the previous code
would allocate space that a 32-bit BAR could not map.
Remove PCIBIOS_MAX_MEM_32, which is no longer used.
[bhelgaas: reworked starting from http://lkml.kernel.org/r/1386658484-15774-3-git-send-email-yinghai@kernel.org] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bjorn Helgaas [Wed, 18 Dec 2013 23:31:39 +0000 (16:31 -0700)]
PCI: Split out bridge window override of minimum allocation address
pci_bus_alloc_resource() avoids allocating space below the "min" supplied
by the caller (usually PCIBIOS_MIN_IO or PCIBIOS_MIN_MEM). This is to
protect badly documented motherboard resources. But if we're allocating
space inside an already-configured PCI-PCI bridge window, we ignore "min".
See 688d191821de ("pci: make bus resource start address override minimum IO
address").
This patch moves the check to make it more visible and simplify future
patches. No functional change.
Bjorn Helgaas [Mon, 6 Jan 2014 23:15:31 +0000 (16:15 -0700)]
agp/ati: Use PCI_COMMAND instead of hard-coded 4
We're accessing the PCI_COMMAND register here, so use the appropriate
#define. The bit we're writing (1 << 14) isn't defined by the PCI or PCIe
spec, so we don't have a name for it.
Bjorn Helgaas [Mon, 6 Jan 2014 21:43:13 +0000 (14:43 -0700)]
agp/intel: Use CPU physical address, not bus address, for ioremap()
In i810_setup(), i830_setup(), and i9xx_setup(), we use the result of
pci_bus_address() as an argument to ioremap() and to compute gtt_phys_addr.
These should use pci_resource_start() instead because we want the CPU
physical address, not the bus address.
If there were an AGP device behind a host bridge that translated addresses,
e.g., a PNP0A08 device with _TRA != 0, this would fix a bug. I'm not aware
of any of those, but they are possible.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bjorn Helgaas [Sat, 4 Jan 2014 01:29:00 +0000 (18:29 -0700)]
agp/intel: Use pci_bus_address() to get GTTADR bus address
Per the Intel 915G/915GV/... Chipset spec (document number 301467-005),
GTTADR is a standard PCI BAR.
The PCI core reads GTTADR at enumeration-time. Use pci_bus_address()
instead of reading it again in the driver. This works correctly for both
32-bit and 64-bit BARs. The spec above only mentions 32-bit GTTADR, but we
should still use the standard interface.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bjorn Helgaas [Sat, 4 Jan 2014 01:28:31 +0000 (18:28 -0700)]
agp/intel: Use pci_bus_address() to get MMADR bus address
Per the Intel 915G/915GV/... Chipset spec (document number 301467-005),
MMADR is a standard PCI BAR.
The PCI core reads MMADR at enumeration-time. Use pci_bus_address()
instead of reading it again in the driver. This works correctly for both
32-bit and 64-bit BARs. The spec above only mentions 32-bit MMADR, but we
should still use the standard interface.
Also, stop clearing the low 19 bits of the bus address because it's invalid
to use addresses outside the region defined by the BAR. The spec claims
MMADR is 512KB; if that's the case, those bits will be zero anyway.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Yinghai Lu [Sat, 4 Jan 2014 01:28:06 +0000 (18:28 -0700)]
agp/intel: Support 64-bit GMADR
Per the Intel 915G/915GV/... Chipset spec (document number 301467-005),
GMADR is a standard PCI BAR.
The PCI core reads GMADR at enumeration-time. Use pci_bus_address()
instead of reading it again in the driver. This works correctly for both
32-bit and 64-bit BARs. The spec above only mentions 32-bit GMADR, but
Yinghai's patch (link below) indicates some devices have a 64-bit GMADR.
[bhelgaas: reworked starting from http://lkml.kernel.org/r/1385851238-21085-13-git-send-email-yinghai@kernel.org] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bjorn Helgaas [Mon, 6 Jan 2014 21:39:40 +0000 (14:39 -0700)]
agp/intel: Rename gtt_bus_addr to gtt_phys_addr
The only use of gtt_bus_addr is as an argument to ioremap(), so it is a CPU
physical address, not a bus address. Rename it to gtt_phys_addr to reflect
this.
No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bjorn Helgaas [Mon, 6 Jan 2014 22:21:16 +0000 (15:21 -0700)]
agp: Use pci_resource_start() to get CPU physical address for BAR
amd_irongate_configure(), ati_configure(), and nvidia_configure() call
ioremap() on an address read directly from a BAR. But a BAR contains a
bus address, and ioremap() expects a CPU physical address. Use
pci_resource_start() to obtain the physical address.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bjorn Helgaas [Sat, 4 Jan 2014 01:26:58 +0000 (18:26 -0700)]
agp: Support 64-bit APBASE
Per the AGP 3.0 spec, APBASE is a standard PCI BAR and may be either 32
bits or 64 bits wide. Many drivers read APBASE directly, but they only
handled 32-bit BARs.
The PCI core reads APBASE at enumeration-time. Use pci_bus_address()
instead of reading it again in the driver. This works correctly for both
32-bit and 64-bit BARs.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PCI/MSI: Add pci_enable_msi_range() and pci_enable_msix_range()
This adds pci_enable_msi_range(), which supersedes the pci_enable_msi()
and pci_enable_msi_block() MSI interfaces.
It also adds pci_enable_msix_range(), which supersedes the
pci_enable_msix() MSI-X interface.
The old interfaces have three categories of return values:
negative: failure; caller should not retry
positive: failure; value indicates number of interrupts that *could*
have been allocated, and caller may retry with a smaller request
zero: success; at least as many interrupts allocated as requested
It is error-prone to handle these three cases correctly in drivers.
The new functions return either a negative error code or a number of
successfully allocated MSI/MSI-X interrupts, which is expected to lead to
clearer device driver code.
pci_enable_msi(), pci_enable_msi_block() and pci_enable_msix() still exist
unchanged, but are deprecated and may be removed after callers are updated.
[bhelgaas: tweak changelog] Suggested-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Tejun Heo <tj@kernel.org>
This creates an MSI-X counterpart for pci_msi_vec_count(). Device drivers
can use this function to obtain maximum number of MSI-X interrupts the
device supports and use that number in a subsequent call to
pci_enable_msix().
pci_msix_vec_count() supersedes pci_msix_table_size() and returns a
negative errno if device does not support MSI-X interrupts. After this
update, callers must always check the returned value.
The only user of pci_msix_table_size() was the PCI-Express port driver,
which is also updated by this change.
Device drivers can use this interface to obtain the maximum number of MSI
interrupts the device supports and use that number, e.g., in a subsequent
call to pci_enable_msi_block().
Thomas Petazzoni [Thu, 26 Dec 2013 15:52:41 +0000 (16:52 +0100)]
PCI: mvebu: Call pci_ioremap_io() at startup instead of dynamically
The mvebu PCI host controller driver uses an emulated PCI-to-PCI bridge to
leverage the core PCI kernel enumeration logic to dynamically create and
remove the MBus windows needed to access the memory and I/O regions of each
PCI interface.
In the context of this PCI-to-PCI bridge emulation, the driver emulates
all reads and writes to the PCI bridge registers. Upon a write to the
registers configuring the I/O base and limit, the driver was creating the
MBus window and calling pci_ioremap_io() to setup the mapping.
However, it turns out that accesses to these registers are made in an IRQ
disabled context, while pci_ioremap_io() is a potentially sleeping
function. Not only this is wrong, but it is causing fairly loud warnings
at boot time when the appropriate kernel hacking options are enabled.
This patch solves this by moving the pci_ioremap_io() call to the startup
of the driver. At this point, we don't know how many PCI interfaces will
be enabled, so we are simply remapping the entire PCI I/O space to virtual
addresses. This is reasonable since this I/O space is limited to 1 MB in
size, and also because the MBus windows continue to be created in a dynamic
fashion only when devices need them.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bjorn Helgaas [Sat, 21 Dec 2013 15:33:26 +0000 (08:33 -0700)]
PCI: Add pci_bus_address() to get bus address of a BAR
We store BAR information as a struct resource, which contains the CPU
address, not the bus address. Drivers often need the bus address, and
there's currently no convenient way to get it, so they often read the
BAR directly, or use the resource address (which doesn't work if there's
any translation between CPU and bus addresses).
took a pci_dev, but they really depend only on the pci_bus. And we want to
use them in resource allocation paths where we have the bus but not a
device, so this patch converts them to take the pci_bus instead of the
pci_dev:
In fact, with standard PCI-PCI bridges, they only depend on the host
bridge, because that's the only place address translation occurs, but
we aren't going that far yet.
[bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bjorn Helgaas [Fri, 20 Dec 2013 19:41:40 +0000 (12:41 -0700)]
Merge branch 'pci/msi' into next
* pci/msi:
PCI/MSI: Make pci_enable_msi/msix() 'nvec' argument type as int
PCI/MSI: Return -ENOSYS for unimplemented interfaces, not -1
PCI/MSI: Return msix_capability_init() failure if populate_msi_sysfs() fails
s390/PCI: Remove superfluous check of MSI type
s390/PCI: Fix single MSI only check
PCI/MSI: Export MSI mode using attributes, not kobjects
Bjorn Helgaas [Fri, 20 Dec 2013 19:38:07 +0000 (12:38 -0700)]
Merge branch 'pci/host-imx6' into next
* pci/host-imx6:
PCI: imx6: Fix bugs in PCIe startup code
PCI: imx6: Start link in Gen1 before negotiating for Gen2 mode
PCI: imx6: Factor out link up wait loop
PCI: imx6: Factor out PHY reset
PCI: imx6: Report "link up" only after link training completes
PCI: imx6: Make reset-gpio optional
PCI/MSI: Return msix_capability_init() failure if populate_msi_sysfs() fails
If populate_msi_sysfs() function failed msix_capability_init() must return
the error code, but it returns the success instead. This update fixes the
described misbehaviour.
Pratyush Anand [Wed, 11 Dec 2013 09:38:33 +0000 (15:08 +0530)]
PCI: designware: Fix I/O transfers by using CPU (not realio) address
pp->io_base, which is the input of the outbound IO address translation
unit, should be the CPU address. It was incorrectly programmed to the
realio address.
We should pass global_io_offset rather than sys->io_offset to
pci_ioremap_io(), so we map the new window into the first available spot in
the Linux view of the I/O space.
We must also pass CPU address instead of realio address to pci_ioremap_io().
This patch fixes above issue. It has been tested with Lecroy PTC in AIC
mode and Pericom PI7C9X2G303EL PCIe switch, which does not work otherwise.
Tested-by: Mohit Kumar <mohit.kumar@st.com> Tested-by: Tim Harvey <tharvey@gateworks.com> Signed-off-by: Pratyush Anand <pratyush.anand@st.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Marek Vasut <marex@denx.de Reviewed-by: Jagannadha Sutradharudu Teki <jagannadh.teki@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jingoo Han <jg1.han@samsung.com> Cc: Richard Zhu <Hong-Xing.Zhu@freescale.com>
Harro Haan [Thu, 12 Dec 2013 18:29:03 +0000 (19:29 +0100)]
PCI: designware: Fix missing MSI IRQs
The interrupts were cleared after the IRQ handler was called. This means
that new interrupts that occur after the handler handled the previous IRQ
but before the interrupt is cleared will be missed.
Tested-by: Marek Vasut <marex@denx.de> Tested-by: Matthias Mann <m.mann@arkona-technologies.de> Signed-off-by: Harro Haan <hrhaan@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jingoo Han <jg1.han@samsung.com> Acked-by: Mohit Kumar <mohit.kumar@st.com> Cc: Richard Zhu <hong-xing.zhu@freescale.com> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Pratyush Anand <pratyush.anand@st.com> Cc: Tim Harvey <tharvey@gateworks.com> Cc: Juergen Beisert <jbe@pengutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Siva Reddy Kallam <siva.kallam@samsung.com> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com> Cc: Sean Cross <xobs@kosagi.com>
PCI/MSI: Export MSI mode using attributes, not kobjects
The PCI MSI sysfs code is a mess with kobjects for things that don't really
need to be kobjects. This patch creates attributes dynamically for the MSI
interrupts instead of using kobjects.
Note, this removes a directory from sysfs. Old MSI kobjects:
pci_device
└── msi_irqs
  └── 40
  └── mode
New MSI attributes:
pci_device
└── msi_irqs
  └── 40
As there was only one file "mode" with the kobject model, the interrupt
number is now a file that returns the "mode" of the interrupt (msi vs.
msix).
Bjorn Helgaas [Thu, 19 Dec 2013 21:24:13 +0000 (14:24 -0700)]
PCI/portdrv: Remove extra get_device()/put_device() for pcie_device
Previously pcie_device_init() called get_device() if device_register() for
the new pcie_device succeeded, and remove_iter() called put_device() when
removing before unregistering the device.
But device_register() already increments the reference count in
device_add(), so we don't need to do it again here.
Levente Kurusa [Thu, 19 Dec 2013 21:22:35 +0000 (14:22 -0700)]
PCI/portdrv: Add put_device() after device_register() failure
This is required so that we give up the last reference to the device.
Removed the kfree() as put_device will result in release_pcie_device()
being called and hence the container of the device will be kfree'd.
[bhelgaas: fix conflict after my previous cleanup] Signed-off-by: Levente Kurusa <levex@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bjorn Helgaas [Thu, 19 Dec 2013 21:20:09 +0000 (14:20 -0700)]
PCI/portdrv: Cleanup error paths
Make the straightline path the normal no-error path. Check for errors and
return them directly, instead of checking for success and putting the
normal path in an "if" body.
Marek Vasut [Thu, 12 Dec 2013 21:50:02 +0000 (22:50 +0100)]
PCI: imx6: Start link in Gen1 before negotiating for Gen2 mode
This patch first forces the link into Gen1 mode before starting up the link
and, only after the link is up, start negotiating possible Gen2 mode
operation. This is because without such sequence, some PCIe switches are
not detected at all.
Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Shawn Guo <shawn.guo@linaro.org> Cc: Frank Li <lznuaa@gmail.com> Cc: Harro Haan <hrhaan@gmail.com> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Mohit KUMAR <Mohit.KUMAR@st.com> Cc: Pratyush Anand <pratyush.anand@st.com> Cc: Richard Zhu <r65037@freescale.com> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Sean Cross <xobs@kosagi.com> Cc: Siva Reddy Kallam <siva.kallam@samsung.com> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com> Cc: Tim Harvey <tharvey@gateworks.com> Cc: Troy Kisky <troy.kisky@boundarydevices.com> Cc: Yinghai Lu <yinghai@kernel.org>
Marek Vasut [Thu, 12 Dec 2013 21:50:01 +0000 (22:50 +0100)]
PCI: imx6: Factor out link up wait loop
Split the function that waits for the PCIe link to come up from the rest if
the host init function. We will find this change useful in the subsequent
patch, since this will be called twice then.
No functional change.
[bhelgaas: remove useless "return;"] Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Shawn Guo <shawn.guo@linaro.org> Cc: Frank Li <lznuaa@gmail.com> Cc: Harro Haan <hrhaan@gmail.com> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Mohit KUMAR <Mohit.KUMAR@st.com> Cc: Pratyush Anand <pratyush.anand@st.com> Cc: Richard Zhu <r65037@freescale.com> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Sean Cross <xobs@kosagi.com> Cc: Siva Reddy Kallam <siva.kallam@samsung.com> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com> Cc: Tim Harvey <tharvey@gateworks.com> Cc: Troy Kisky <troy.kisky@boundarydevices.com> Cc: Yinghai Lu <yinghai@kernel.org>
Marek Vasut [Thu, 12 Dec 2013 21:49:59 +0000 (22:49 +0100)]
PCI: imx6: Report "link up" only after link training completes
While waiting for the PHY to report the PCIe link is up, we might hit a
situation where the link training is still in progress, while the PHY
already reports the link is up. Add additional check for this condition.
Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Shawn Guo <shawn.guo@linaro.org> Cc: Frank Li <lznuaa@gmail.com> Cc: Harro Haan <hrhaan@gmail.com> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Mohit KUMAR <Mohit.KUMAR@st.com> Cc: Pratyush Anand <pratyush.anand@st.com> Cc: Richard Zhu <r65037@freescale.com> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Sean Cross <xobs@kosagi.com> Cc: Siva Reddy Kallam <siva.kallam@samsung.com> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com> Cc: Tim Harvey <tharvey@gateworks.com> Cc: Troy Kisky <troy.kisky@boundarydevices.com> Cc: Yinghai Lu <yinghai@kernel.org>
Bjorn Helgaas [Wed, 18 Dec 2013 21:04:35 +0000 (14:04 -0700)]
Merge branch 'pci/vc' into next
* pci/vc:
PCI: Rename PCI_VC_PORT_REG1/2 to PCI_VC_PORT_CAP1/2
PCI: Add Virtual Channel to save/restore support
PCI: Add support for save/restore of extended capabilities
PCI: Add pci_wait_for_pending() (refactor pci_wait_for_pending_transaction())
Bjorn Helgaas [Wed, 18 Dec 2013 21:04:09 +0000 (14:04 -0700)]
Merge branch 'pci/host-mvebu' into next
* pci/host-mvebu:
PCI: mvebu: Remove duplicate of_clk_get_by_name() call
PCI: mvebu: Support a bridge with no IO port window
PCI: mvebu: Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits
PCI: mvebu: Drop writes to bridge Secondary Status register
Bjorn Helgaas [Wed, 18 Dec 2013 21:03:38 +0000 (14:03 -0700)]
Merge branch 'pci/deletion' into next
* pci/deletion:
PCI: Remove from bus_list and release resources in pci_release_dev()
PCI: Move pci_proc_attach_device() to pci_bus_add_device()
PCI: Use device_release_driver() in pci_stop_root_bus()
PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
Yinghai Lu [Sat, 30 Nov 2013 22:40:29 +0000 (14:40 -0800)]
PCI: Remove from bus_list and release resources in pci_release_dev()
Previously we removed the pci_dev from the bus_list and released its
resources in pci_destroy_dev(). But that's too early: it's possible to
call pci_destroy_dev() twice for the same device (e.g., via sysfs), and
that will cause an oops when we try to remove it from bus_list the second
time.
We should remove it from the bus_list only when the last reference to the
pci_dev has been released, i.e., in pci_release_dev().
[bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Yinghai Lu [Sat, 30 Nov 2013 22:40:28 +0000 (14:40 -0800)]
PCI: Move pci_proc_attach_device() to pci_bus_add_device()
4f535093cf8f ("PCI: Put pci_dev in device tree as early as possible")
moved pci_proc_attach_device() from pci_bus_add_device() to
pci_device_add().
This moves it back to pci_bus_add_device(), essentially reverting that
part of 4f535093cf8f. This makes it symmetric with pci_stop_dev(),
where we call pci_proc_detach_device() and pci_remove_sysfs_dev_files()
and set dev->is_added = 0.
[bhelgaas: changelog, create sysfs then attach proc for symmetry] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Yinghai Lu [Sat, 30 Nov 2013 22:40:27 +0000 (14:40 -0800)]
PCI: Use device_release_driver() in pci_stop_root_bus()
To be consistent with 4bff6749905d ("PCI: Move device_del() from
pci_stop_dev() to pci_destroy_dev()", this changes pci_stop_root_bus()
to use device_release_driver() instead of device_del().
This also changes pci_remove_root_bus() to use device_unregister()
instead of put_device() so it corresponds with the device_register()
call in pci_create_root_bus().
[bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some investigation documented in kernel bug #65281 led me to the
conclusion that the source of the problem is the device_del() in
pci_stop_dev() as it now causes the sysfs directory of the device to be
removed recursively along with all of its subdirectories. That includes
the sysfs directory of the device's subordinate bus (dev->subordinate) and
its "power" group.
Consequently, when pci_remove_bus() is called for dev->subordinate in
pci_remove_bus_device(), it calls device_unregister(&bus->dev), but at this
point the sysfs directory of bus->dev doesn't exist any more and its
"power" group doesn't exist either. Thus, when dpm_sysfs_remove() called
from device_del() tries to remove that group, it triggers the above
warning.
That indicates a logical mistake in the design of
pci_stop_and_remove_bus_device(), which causes bus device objects to be
left behind their parents (bridge device objects) and can be fixed by
moving the device_del() from pci_stop_dev() into pci_destroy_dev(), so
pci_remove_bus() can be called for the device's subordinate bus before the
device itself is unregistered from the hierarchy. Still, the driver, if
any, should be detached from the device in pci_stop_dev(), so use
device_release_driver() directly from there.
References: https://bugzilla.kernel.org/show_bug.cgi?id=65281#c6 Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Alex Williamson [Tue, 17 Dec 2013 23:43:51 +0000 (16:43 -0700)]
PCI: Add Virtual Channel to save/restore support
While we don't really have any infrastructure for making use of VC
support, the system BIOS can configure the topology to non-default
VC values prior to boot. This may be due to silicon bugs, desire to
reserve traffic classes, or perhaps just BIOS bugs. When we reset
devices, the VC configuration may return to default values, which can
be incompatible with devices upstream. For instance, Nvidia GRID
cards provide a PCIe switch and some number of GPUs, all supporting
VC. The power-on default for VC is to support TC0-7 across VC0,
however some platforms will only enable TC0/VC0 mapping across the
topology. When we do a secondary bus reset on the downstream switch
port, the GPU is reset to a TC0-7/VC0 mapping while the opposite end
of the link only enables TC0/VC0. If the GPU attempts to use TC1-7,
it fails.
This patch attempts to provide complete support for VC save/restore,
even beyond the minimally required use case above. This includes
save/restore and reload of the arbitration table, save/restore and
reload of the port arbitration tables, and re-enabling of the
channels for VC, VC9, and MFVC capabilities.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bjorn Helgaas [Mon, 16 Dec 2013 00:23:54 +0000 (17:23 -0700)]
PCI: pciehp: Move Attention & Power Indicator support tests to accessors
Previously, the caller checked ATTN_LED() or PWR_LED() to see whether the
slot has indicators before setting the indicator state. That clutters the
caller unnecessarily, so this moves the test inside the callees. The test
may not even be necessary; per spec it should be harmless to try to turn on
a non-existent LED. But checking first does avoid unnecessary hotplug
commands.