Andreas Herrmann [Fri, 13 Jan 2006 01:26:11 +0000 (02:26 +0100)]
[SCSI] zfcp: transport class adaptations II
Replaced zfcp adapter attributes with fc_host attributes:
fc_topology by port_type, physical_wwpn by permanent_port_name.
Make use of fc_host attribute supported_speeds.
Removed zfcp adapter attribute physical_s_id.
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Added host stats, removed superfluous get_starget_ functions,
removed some attributes from zfcp specific sysfs tree (e.g.
scsi_host_no, scsi_lun, wwnn and d_id).
Host stats are given for the physical adapter port not for the
virtual adapter. Reset stats is implemented in the device driver.
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Andreas Herrmann [Fri, 13 Jan 2006 01:16:54 +0000 (02:16 +0100)]
[SCSI] fc transport: add permanent_port_name fc_host attribute
Add fc_host attribute permanent_port_name which is
used to show the port name of the primary port -
the port that initially logged into the fabric.
For a virtual port (registered via the primary port with
FDISC command) it is useful to know not only its (virtual)
port name but also the permanent port name.
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
[SCSI] sr: split sr_audio_ioctl into specific helpers
split each ioctl handled in sr_audio_ioctl into a function of it's own.
This cleans the code up nicely, and allows various places in sr_ioctl
to call these helpers directly instead of going through the multiplexer.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
[SCSI] always handle REQ_BLOCK_PC requests in common code
LLDDs should never see REQ_BLOCK_PC requests, we can handle them just
fine in the core code. There is a small behaviour change in that some
check in sr's rw_intr are bypassed, but I consider the old behaviour
a bug.
Mike found this cleanup opportunity and provdided early patches, so all
the credit goes to him, even if I redid the patches from scratch beause
that was easier than forward-porting the old patches.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
[SCSI] sas: fix removal of devices behind expanders
We need to iterate over all children when removing and expander, else
stale objects will be around after host removal. This fixes the oops
Eric Moore saw when removing and reloading mptsas.
Also don't try the scsi_remove_target call unless operating on an end
device. The current unconditional call is harmless but confusing.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Petr Vandrovec [Wed, 11 Jan 2006 19:31:07 +0000 (11:31 -0800)]
[SCSI] Pass proper device from BusLogic to SCSI layer
While trying to get SUSE's SLES9 working on system with more than 4GB we've
noticed that SCSI layer happilly passes addresses over 4GB to the buslogic
driver, which is quite a big problem as buslogic can generate only 32bit
busmastering cycles.
Fortunately in the current kernels this problem does not exist anymore as
SCSI layer now assumes 4GB capable device by default, but it is still good
idea to pass correct device structure to the SCSI layer. If nothing else,
/sys/block/sda/device now points to
/sys/devices/pci0000:00/0000:00:10.0/host0/... instead of
/sys/devices/platform/host0/... like it did in the past.
Change does nothing for ISA based BusLogic adapters, they'll still end
under platform (and they are probably broken for long time as I do not see
anything forcing ISA 16MB limit for them).
Signed-off-by: Petr Vandrovec <petr@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
James Bottomley [Thu, 12 Jan 2006 18:07:13 +0000 (12:07 -0600)]
[SCSI] aic79xx: bump version to 3.0
This takes us past the old 1.x version of the SCSI driver and the 2.x
version of the aic website version to reflect the full incorporation
of both branches.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Mark Haverkamp [Wed, 11 Jan 2006 17:28:29 +0000 (09:28 -0800)]
[SCSI] aacraid: 17 element sg performance update
Received From Mark Salyzyn.
The Jaguar and Corsair class of adapters (2410, 2810, 2610, 21610, CERC)
perform better (about 10% better read performance, write performance
neutral) with current Firmware if the OS limits the number of scatter
gather elements to 17 per request.
Signed-off-by: Mark Haverkamp <markh@osdl.org> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
James Bottomley [Tue, 10 Jan 2006 18:11:42 +0000 (12:11 -0600)]
[SCSI] aic7xxx: fix timer handling bug
The driver is doing a rather stupid mod_timer allegedly to "give
request sense more time to complete". This is illegal and pointless,
so just eliminate it. Also eliminate all the other uses of struct
timer_list in the driver, which are mostly bogus.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Adrian Bunk [Fri, 6 Jan 2006 19:21:28 +0000 (20:21 +0100)]
[SCSI] lpfc_scsi.c: make lpfc_get_scsi_buf() static
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: James Smart <James.Smart@Emulex.Com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Moore, Eric [Wed, 4 Jan 2006 21:58:43 +0000 (14:58 -0700)]
[SCSI] raid_class.c - adding RAID10 and RAID10 defines
Adding defines for RAID10 and RAID50 levels, in preparation
of adding RAID Transport support in the mpt fusion drivers.
(BTW: IME is RAID10, and IM is RAID1).
Signed-off-by: Eric Moore <Eric.Moore@lsil.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
o This patch fixes the problem of secondary cpus boot up. This situation
is faced when kernel is built for default locations like 16MB and
onwards. In this configuration, only primary cpu (BP) comes and
secondary cpus don't boot.
o Problem occurs because in trampoline code, lgdt is not able to load the
GDT as it happens to be situated beyond 16MB. This is due to the fact
that cpu is still in real mode and default operand size is 16bit.
o This patch uses lgdtl instead of lgdt to force operand size to 32
instead of 16.
Andi Kleen [Wed, 11 Jan 2006 21:46:57 +0000 (22:46 +0100)]
[PATCH] x86_64: Allow kernel page tables upto the end of memory
Previously they would be only allocated before the kernel text at
1MB. This limited the maximum supported memory to 128GB.
Now allow the e820 allocator to put them everywhere. Try
to put them beyond any DMA zones to avoid filling them up.
This should free some GFP_DMA memory compared to earlier kernels.
Andi Kleen [Wed, 11 Jan 2006 21:46:54 +0000 (22:46 +0100)]
[PATCH] x86_64: Use safe_smp_processor_id in MCE handler
hard_smp_processor_id would return the local APIC id instead
of the Linux processor id. On big systems they are often
not identical. safe_smp_processor_id is just a wrapper
around it that does the necessary conversions.
Andi Kleen [Wed, 11 Jan 2006 21:46:51 +0000 (22:46 +0100)]
[PATCH] x86_64: Some housekeeping in local APIC code
Remove support for obsolete hardware and cleanup.
- Remove checks for non integrated APICs
- Replace apic_write_around with apic_write.
- Remove apic_read_around
- Remove APIC version reads used by old workarounds
- Remove old workaround for Simics
- Fix indentation
Jan Beulich [Wed, 11 Jan 2006 21:46:48 +0000 (22:46 +0100)]
[PATCH] x86_64: Display meaningful part of filename during BUG()
When building in a separate objtree, file names produced by BUG() & Co. can
get fairly long; printing only the first 50 characters may thus result in
(almost) no useful information. The following change makes it so that rather
the last 50 characters of the filename get printed.
Jan Beulich [Wed, 11 Jan 2006 21:46:45 +0000 (22:46 +0100)]
[PATCH] x86_64: Reduce screen space needed by stack trace
Especially under Xen, where the console cannot be adjusted to more than 25
lines, it is fairly important that the information displayed during a panic
is as compact as possible. Below adjustments work towards that.
Jan Beulich [Wed, 11 Jan 2006 21:46:42 +0000 (22:46 +0100)]
[PATCH] x86_64: Fix get_cmos_time()
Due to a broken condition, the body of the loop that is intended to wait for
the Update-In-Progress bit to get set and then cleared again was never
entered; in fact, the entire loop was optimized out by the compiler. Here is
a change to fix the condition (and to also move the initialization of locals
out of the spin lock protected region).
Andi Kleen [Wed, 11 Jan 2006 21:46:36 +0000 (22:46 +0100)]
[PATCH] x86_64: Remove unused AMD K8 C stepping flag
X86_FEATURE_K8_C was a synthetic Linux CPUID flag that was used for some
code optimizations in Opteron C stepping or later. But support for pre C
stepping optimizations has been removed, so this isn't needed anymore.
Vivek Goyal [Wed, 11 Jan 2006 21:46:21 +0000 (22:46 +0100)]
[PATCH] x86_64: ioapic virtual wire mode fix
o Currently, during kexec reboot, IOAPIC is re-programmed back to virtual
wire mode if there was an i8259 connected to it. This enables getting
timer interrupts in second kernel in legacy mode.
o After putting into virtual wire mode, IOAPIC delivers the i8259 interrupts
to CPU0. This works well for kexec but not for kdump as we might crash
on a different CPU and second kernel will not see timer interrupts.
o This patch modifies the redirection table entry to deliver the timer
interrupts to the cpu we are rebooting (instead of hardcoding to zero).
This ensures that second kernel receives timer interrupts even on a
non-boot cpu.
[PATCH] x86_64: Inclusion of ScaleMP vSMP architecture patches - vsmp_arch
Introduce vSMP arch to the kernel.
This patch:
1. Adds CONFIG_X86_VSMP
2. Adds machine specific macros for local_irq_disabled, local_irq_enabled
and irqs_disabled
3. Writes to the vSMP CTL device to indicate kernel compiled with CONFIG_VSMP
[PATCH] x86_64: Inclusion of ScaleMP vSMP architecture patches - vsmp_align
vSMP specific alignment patch to
1. Define INTERNODE_CACHE_SHIFT for vSMP
2. Use this for alignment of critical structures
3. Use INTERNODE_CACHE_SHIFT for ARCH_MIN_TASKALIGN,
and let the slab align task_struct allocations to the internode cacheline size
4. Introduce and use ARCH_MIN_MMSTRUCT_ALIGN for mm_struct slab allocations.
Andi Kleen [Wed, 11 Jan 2006 21:46:12 +0000 (22:46 +0100)]
[PATCH] x86_64: Make sure BITS_PER_ATOMIC is defined in asm-generic/atomic.h
Fixes
CC fs/nfsctl.o
In file included from include2/asm/atomic.h:427,
from /home/lsrc/quilt/linux/include/linux/file.h:8,
from /home/lsrc/quilt/linux/fs/nfsctl.c:8:
/home/lsrc/quilt/linux/include/asm-generic/atomic.h:20:5: warning: "BITS_PER_LONG" is not defined
[PATCH] x86_64: Memorize location of i8259 for reboots.
Currently we attempt to restore virtual wire mode on reboot, which only
works if we can figure out where the i8259 is connected. This is very
useful when we are kexec another kernel and likely helpful to an peculiar
BIOS that make assumptions about how the system is setup.
Since the acpi MADT table does not provide the location where the i8259 is
connected we have to look at the hardware to figure it out.
Most systems have the i8259 connected the local apic of the cpu so won't be
affected but people running Opteron and some serverworks chipsets should be
able to use kexec now.
In addition this patch removes the hard coded assumption that the io_apic
that delivers isa interrups is always known to the kernel as io_apic 0.
There does not appear to be anything to guarantee that assumption is true.
And From: Vivek Goyal <vgoyal@in.ibm.com>
A minor fix to the patch which remembers the location of where i8259 is
connected. Now counter i has been replaced by apic. counter i is having
some junk value which was leading to non-detection of i8259 connected to
IOAPIC.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chuck Ebbert [Wed, 11 Jan 2006 21:46:03 +0000 (22:46 +0100)]
[PATCH] x86_64: allow setting RF in EFLAGS
Setting RF (resume flag) allows a debugger to resume execution after a code
breakpoint without tripping the breakpoint again. It is reset by the CPU
after executing one instruction.
arch/x86_64/kernel/mce_amd.c:321:29: warning: Using plain integer as NULL pointer
arch/x86_64/kernel/mce_amd.c:410:41: warning: Using plain integer as NULL pointer
Andi Kleen [Wed, 11 Jan 2006 21:45:45 +0000 (22:45 +0100)]
[PATCH] x86_64: Fix warning in nmi.c on uniprocessor kernels
Fix
CC arch/x86_64/kernel/nmi.o
linux/arch/x86_64/kernel/nmi.c: In function ???check_nmi_watchdog???:
linux/arch/x86_64/kernel/nmi.c:155: warning: statement with no effect
Patch uses a static PDA array early at boot and reallocates processor PDA
with node local memory when kmalloc is ready, just before pda_init.
The boot_cpu_pda is needed since the cpu_pda is used even before pda_init for
that cpu is called (to set the static per-cpu areas offset table etc)
[PATCH] x86_64: Early initialization of cpu_to_node
Patch enables early intialization of cpu_to_node.
apicid_to_node is built by reading the SRAT table, from acpi_numa_init with
ACPI_NUMA and k8_scan_nodes with K8_NUMA.
x86_cpu_to_apicid is built by parsing the ACPI MADT table, from acpi_boot_init.
We combine these two tables and setup cpu_to_node.
Early intialization helps the static per_cpu_areas in getting pages from
correct node.
Change since last release:
Do not initialize early init_cpu_to_node for faking node cases.
Patch tested on TYAN dual core 4P board with K8 only, ACPI_NUMA.
Tested on EM64T NUMA. Also tested with numa=off, numa=fake, and running
a kernel compiled with NUMA on a regular EM64 2 way SMP.
Andi Kleen [Wed, 11 Jan 2006 21:45:27 +0000 (22:45 +0100)]
[PATCH] i386: Replace broken serialize_cpu in microcode driver with correct sync_core
Passing random input values in eax to cpuid is not a good idea
because the CPU will GPF for unknown ones.
Use the correct x86-64 version that exists for a longer time too.
This also adds a memory barrier to prevent the optimizer from
reordering.
Andi Kleen [Wed, 11 Jan 2006 21:45:21 +0000 (22:45 +0100)]
[PATCH] x86_64: Support alternative() in vsyscalls
The real vsyscall .text addresses are not mapped when the alternative()
replacement runs early, so use some black magic to access them using
the direct mapping.
Andi Kleen [Wed, 11 Jan 2006 21:44:45 +0000 (22:44 +0100)]
[PATCH] x86_64: Clean up copy_*_user
- Remove optimization for old B stepping Opteron
- Make the fast path for copies with a multiple of eight length faster.
- Minor instruction rearrangement to hopefully avoid a pipeline
stall or two.
- Add comment about errata to consider.
Muli Ben-Yehuda [Wed, 11 Jan 2006 21:44:42 +0000 (22:44 +0100)]
[PATCH] x86_64: Use function pointers to call DMA mapping functions
AK: I hacked Muli's original patch a lot and there were a lot
of changes - all bugs are probably to blame on me now.
There were also some changes in the fall back behaviour
for swiotlb - in particular it doesn't try to use GFP_DMA
now anymore. Also all DMA mapping operations use the
same core dma_alloc_coherent code with proper fallbacks now.
And various other changes and cleanups.
Known problems: iommu=force swiotlb=force together breaks
needs more testing.
This patch cleans up x86_64's DMA mapping dispatching code. Right now
we have three possible IOMMU types: AGP GART, swiotlb and nommu, and
in the future we will also have Xen's x86_64 swiotlb and other HW
IOMMUs for x86_64. In order to support all of them cleanly, this
patch:
- introduces a struct dma_mapping_ops with function pointers for each
of the DMA mapping operations of gart (AMD HW IOMMU), swiotlb
(software IOMMU) and nommu (no IOMMU).
- gets rid of:
if (swiotlb)
return swiotlb_xxx();
- PCI_DMA_BUS_IS_PHYS is now checked against the dma_ops being set
This makes swiotlb faster by avoiding double copying in some cases.
Signed-Off-By: Muli Ben-Yehuda <mulix@mulix.org> Signed-Off-By: Jon D. Mason <jdmason@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Wed, 11 Jan 2006 21:44:39 +0000 (22:44 +0100)]
[PATCH] x86_64: Reject SRAT tables that don't cover all memory
Broken BIOS on Iwill 8way systems reports these and it causes the bootmem
allocator to crash. Add a sanity check if all the PXMs in the
SRAT table cover all memory as reported by e820. If the sanity
check fails the SRAT is rejected and the code will fall back
to discover the NUMA topology using the K8 northbridge registers
when applicable.
Andi Kleen [Wed, 11 Jan 2006 21:44:36 +0000 (22:44 +0100)]
[PATCH] x86_64: Add idle notifiers
This adds a new notifier chain that is called with IDLE_START
when a CPU goes idle and IDLE_END when it goes out of idle.
The context can be idle thread or interrupt context.
Since we cannot rely on MONITOR/MWAIT existing the idle
end check currently has to be done in all interrupt
handlers.
They were originally inspired by the similar s390 implementation.
They have a variety of applications:
- They will be needed for CONFIG_NO_IDLE_HZ
- They can be used for oprofile to fix up the missing time
in idle when performance counters don't tick.
- They can be used for better C state management in ACPI
- They could be used for microstate accounting.
[PATCH] x86_64: Handle missing local APIC timer interrupts on C3 state
Whenever we see that a CPU is capable of C3 (during ACPI cstate init), we
disable local APIC timer and switch to using a broadcast from external timer
interrupt (IRQ 0).
[PATCH] i386: Handle missing local APIC timer interrupts on C3 state
Whenever we see that a CPU is capable of C3 (during ACPI cstate init), we
disable local APIC timer and switch to using a broadcast from external timer
interrupt (IRQ 0). This is needed because Intel CPUs stop the local
APIC timer in C3. This is currently only enabled for Intel CPUs.
Patch below adds the code for i386 and also the ACPI hunk.
[PATCH] i386/x86-64: Remove sub jiffy profile timer support
Remove the finer control of local APIC timer. We cannot provide a sub-jiffy
control like this when we use broadcast from external timer in place of
local APIC. Instead of removing this only on systems that may end up using
broadcast from external timer (due to C3), I am going the
"I'm feeling lucky" way to remove this fully. Basically, I am not sure about
usefulness of this code today. Few other architectures also don't seem to
support this today.
If you are using profiling and fine grained control and don't like this going
away in normal case, yell at me right now.
John Blackwood [Wed, 11 Jan 2006 21:44:15 +0000 (22:44 +0100)]
[PATCH] x86_64: Report hardware breakpoints in user space when triggered by the kernel
I would like to throw out a suggestion for a possible change in the way that
the debug register traps are handled in do_debug() when the trap occurs
in kernel-mode.
In the x86_64 version of do_debug(), the code will skip around sending
a SIGTRAP to the current task if the trap occurred while in kernel mode.
On the i386-side of things, if the access happens to occur in kernel mode
(say during a read(2) of user's buffer that matches the address of a
debug register trap), then the do_debug() routine for i386 will go ahead
and call send_sigtrap() and send the SIGTRAP signal. The send_sigtrap()
code will also set the info.si_addr to NULL in this case (even though I
don't understand why, since the SIGTRAP siginfo processing doesn't use
the si_addr field...).
So I would like to suggest that the x86_64 do_debug() routine also
follow this type of behavior and have it go ahead and send the
SIGTRAP signal to the current task, even if the debug register trap
happens to have occurred in kernel mode. I have taken a stab at
a patch for this change below. (It includes the i386-ish change
for setting si_addr to NULL when the trap occurred in kernel mode.)
It seems like a useful feature to be able to 'watch' a user location that
might also be modified in the kernel via a system service call, and have the
debugger report that information back to the user, rather than to just
silently ignore the trap.
Additionally, I realize that users that pull in a kernel debugger such as
KGDB into their kernel might want to remove this change below when they add
in KGDB support. However, they could alternatively look at the current
task's thread.debugreg[] values to see if the trap occurred due to KGDB
or instead because of a user-space debugger trap, and still honor the
user SIGTRAP processing (instead of the KGDB breakpoint processing)
if the trap matches up with the thread.debugreg[] registers.
Andi Kleen [Wed, 11 Jan 2006 21:43:54 +0000 (22:43 +0100)]
[PATCH] x86_64: Allow compilation on a 32bit biarch toolchain
This might help on distributions that use a 32bit biarch compiler.
First pass -m64 by default.
Secondly add some more .code32s because at least the Ubuntu biarch
32bit as called by gcc doesn't seem to handle -m64 -m32 as generated
by the Makefile without such assistance.
And finally make sure the linker script can be preprocessed
with a 32bit cpp.
Ross Biro [Wed, 11 Jan 2006 21:43:51 +0000 (22:43 +0100)]
[PATCH] x86_64: Make udelay more accurate
The attempt to avoid overflow in __delay caused varying precision
on different CPUs depending on differences in the CPU speed.
We should be able to do this multiplication with out overflowing
provided the
cpu is running at less than about 128 GHz. xloops < 20000 * 0x10c6.
loops_per_jiffy * HZ <= cpu_clock_speed. So if the cpu clock speed
< 2^64/(20000 * 0x10c6) = 2^64/ 51E6CC0 < 2^64/2^27 = 2^37 = 128G we
will not overflow the calculation.
Andi Kleen [Wed, 11 Jan 2006 21:43:45 +0000 (22:43 +0100)]
[PATCH] x86_64: Handle unknown node (-1) in alloc_pages_node
Following kmalloc_node.
Needed for another patch to return -1 for unknown nodes in x86-64.
Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: kiran@scalex86.org Signed-off-by: Andi Kleen <ak@suse.de>
[ Changed 0 to numa_node_id() on suggestion by Christoph Lameter ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen [Wed, 11 Jan 2006 21:43:42 +0000 (22:43 +0100)]
[PATCH] x86_64: Validate SLIT table
A lot of Opteron BIOS just pass 10 in all SLIT entries (10 is the
normalized unit). This is actually worse than the default heuristic
because it leads to pci_distance not knowing the difference between
local and remote nodes anymore. This messes up some NUMA
heuristics in generic code.
In this case it's better to fall back to the default heuristic
which just does nodea == nodeb ? 10 : 20.
This patch does some basic sanity checking on the SLIT and only accepts
the SLIT when it passes.
Invariants enforced are:
- Node to itself shall be 10
- Any other distance shouldn't be 10
- Distances smaller than 10 are illegal
Jan Beulich [Wed, 11 Jan 2006 21:43:36 +0000 (22:43 +0100)]
[PATCH] x86_64: Fix 64bit FXSAVE encoding
The separation of the rex64 prefix (on fxsave/fxrstor) by way of using
a semicolon resulted in the prefix not always taking effect (because
when extended registers are needed for addressing, another rex prefix
would have been generated by the compiler), thus (depending on the
build) resulting in eventually getting 32-bit saves and/or restores.