Implemented the use of a cache object for all internal
namespace nodes. Since there are about 1000 static nodes
in a typical system, this will decrease memory use for
cache implementations that minimize per-allocation overhead
(such as a slab allocator.)
Removed the reference count mechanism for internal
namespace nodes, since it was deemed unnecessary. This
reduces the size of each namespace node by about 5%-10%
on all platforms. Nodes are now 20 bytes for the 32-bit
case, and 32 bytes for the 64-bit case.
Optimized several internal data structures to reduce
object size on 64-bit platforms by packing data within
the 64-bit alignment. This includes the frequently used
ACPI_OPERAND_OBJECT, of which there can be ~1000 static
instances corresponding to the namespace objects.
Added two new strings for the predefined _OSI method:
"Windows 2001.1 SP1" and "Windows 2006".
Split the allocation tracking mechanism out to a separate
file, from utalloc.c to uttrack.c. This mechanism appears
to be only useful for application-level code. Kernels may
wish to not include uttrack.c in distributions.
Removed all remnants of the obsolete ACPI_REPORT_* macros
and the associated code. (These macros have been replaced
by the ACPI_ERROR and ACPI_WARNING macros.)
Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Tagged all external interfaces to the subsystem with the
new ACPI_EXPORT_SYMBOL macro. This macro can be defined
as necessary to assist kernel integration. For Linux,
the macro resolves to the EXPORT_SYMBOL macro. The default
definition is NULL.
Added the ACPI_THREAD_ID type for the return value from
acpi_os_get_thread_id(). This allows the host to define this
as necessary to simplify kernel integration. The default
definition is ACPI_NATIVE_UINT.
Valery Podrezov fixed two interpreter problems related
to error processing, the deletion of objects, and placing
invalid pointers onto the internal operator result stack.
http://bugzilla.kernel.org/show_bug.cgi?id=6028
http://bugzilla.kernel.org/show_bug.cgi?id=6151
Increased the reference count threshold where a warning is
emitted for large reference counts in order to eliminate
unnecessary warnings on systems with large namespaces
(especially 64-bit.) Increased the value from 0x400
to 0x800.
Due to universal disagreement as to the meaning of the
'c' in the calloc() function, the ACPI_MEM_CALLOCATE
macro has been renamed to ACPI_ALLOCATE_ZEROED so that the
purpose of the interface is 'clear'. ACPI_MEM_ALLOCATE and
ACPI_MEM_FREE are renamed to ACPI_ALLOCATE and ACPI_FREE.
Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Implemented a change to the IndexField support to match
the behavior of the Microsoft AML interpreter. The value
written to the Index register is now a byte offset,
no longer an index based upon the width of the Data
register. This should fix IndexField problems seen on
some machines where the Data register is not exactly one
byte wide. The ACPI specification will be clarified on
this point.
Fixed a problem where several resource descriptor
types could overrun the internal descriptor buffer due
to size miscalculation: VendorShort, VendorLong, and
Interrupt. This was noticed on IA64 machines, but could
affect all platforms.
Fixed a problem where individual resource descriptors were
misaligned within the internal buffer, causing alignment
faults on IA64 platforms.
Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
Removed a couple of extraneous ACPI_ERROR messages that
appeared during normal execution. These became apparent
after the conversion from ACPI_DEBUG_PRINT.
Fixed a problem where the CreateField operator could hang
if the BitIndex or NumBits parameter referred to a named
object. From Valery Podrezov.
http://bugzilla.kernel.org/show_bug.cgi?id=5359
Fixed a problem where a DeRefOf operation on a buffer
object incorrectly failed with an exception. This also
fixes a couple of related RefOf and DeRefOf issues.
From Valery Podrezov.
http://bugzilla.kernel.org/show_bug.cgi?id=5360
http://bugzilla.kernel.org/show_bug.cgi?id=5387
http://bugzilla.kernel.org/show_bug.cgi?id=5392
Fixed a problem where the AE_BUFFER_LIMIT exception was
returned instead of AE_STRING_LIMIT on an out-of-bounds
Index() operation. From Valery Podrezov.
http://bugzilla.kernel.org/show_bug.cgi?id=5480
Implemented a memory cleanup at the end of the execution
of each iteration of an AML While() loop, preventing the
accumulation of outstanding objects. From Valery Podrezov.
http://bugzilla.kernel.org/show_bug.cgi?id=5427
Eliminated a chunk of duplicate code in the object
resolution code. From Valery Podrezov.
http://bugzilla.kernel.org/show_bug.cgi?id=5336
Fixed several warnings during the 64-bit code generation.
Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6: (24 commits)
[PARISC] Fix double free when removing HIL drivers
[PARISC] Add atomic_sub_and_test
[PARISC] Enabled some NLS modules in a500, b180 and c3000 defconfigs
[PARISC] Kill duplicated EXPORT_SYMBOL warnings
[PARISC] Move ioremap EXPORT_SYMBOL from parisc_ksyms.c
[PARISC] Make local_t use atomic_long_t
[PARISC] Update defconfigs
[PARISC] Add PREEMPT support
[PARISC] More useful readwrite lock helpers
[PARISC] Convert HIL drivers to use input_allocate_device
[PARISC] Fixup CONFIG_EISA a bit
[PARISC] getsockopt should be ENTRY_COMP
[PARISC] Remove obsolete CONFIG_DEBUG_IOREMAP
[PARISC] Temporary FIXME for ioremapping EISA regions
[PARISC] Enable ioremap functionality unconditionally
[PARISC] Fix stifb with IOREMAP and a 64-bit kernel
[PARISC] Add CONFIG_HPPA_IOREMAP to conditionally enable ioremap
[PARISC] Add STRICT_MM_TYPECHECKS
[PARISC] Fix IOREMAP with a 64-bit kernel
[PARISC] Add parisc implementation of flush_kernel_dcache_page()
...
Linus Torvalds [Thu, 30 Mar 2006 22:32:38 +0000 (14:32 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mad: RMPP support for additional classes
IB/mad: include GID/class when matching receives
IB/mthca: Fix section mismatch problems
IPoIB: Fix oops with raw sockets
IB/mthca: Fix check of size in SRQ creation
IB/srp: Fix unmapping of fake scatterlist
Linus Torvalds [Thu, 30 Mar 2006 22:29:20 +0000 (14:29 -0800)]
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[PATCH] sata_mv: three bug fixes
[PATCH] libata: ata_dev_init_params() fixes
[PATCH] libata: Fix interesting use of "extern" and also some bracketing
[PATCH] libata: Simplex and other mode filtering logic
[PATCH] libata - ATA is both ATA and CFA
[PATCH] libata: Add ->set_mode hook for odd drivers
[PATCH] libata: BMDMA handling updates
[PATCH] libata: kill trailing whitespace
[PATCH] libata: add FIXME above ata_dev_xfermask()
[PATCH] libata: cosmetic changes in ata_bus_softreset()
[PATCH] libata: kill E.D.D.
Linus Torvalds [Thu, 30 Mar 2006 20:38:18 +0000 (12:38 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] ioremap() should prefer WB over UC
[IA64] Add __mca_table to the DISCARD list in gate.lds
[IA64] Move __mca_table out of the __init section
[IA64] simplify some condition checks in iosapic_check_gsi_range
[IA64] correct some messages and fixes some minor things
[IA64-SGI] fix for-loop in sn_hwperf_geoid_to_cnode()
[IA64-SGI] sn_hwperf use of num_online_cpus()
[IA64] optimize flush_tlb_range on large numa box
[IA64] lazy_mmu_prot_update needs to be aware of huge pages
Jens Axboe [Thu, 30 Mar 2006 13:15:30 +0000 (15:15 +0200)]
[PATCH] Introduce sys_splice() system call
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).
From the splice.c comments:
"splice": joining two ropes together by interweaving their strands.
This is the "extended pipe" functionality, where a pipe is used as
an arbitrary in-memory buffer. Think of a pipe as a small kernel
buffer that you can use to transfer data from one end to the other.
The traditional unix read/write is extended with a "splice()" operation
that transfers data buffers to or from a pipe buffer.
Named by Larry McVoy, original implementation from Linus, extended by
Jens to support splicing to files and fixing the initial implementation
bugs.
Kyle McMartin [Thu, 30 Mar 2006 16:47:32 +0000 (11:47 -0500)]
[PARISC] Fix double free when removing HIL drivers
On Thu, Mar 30, 2006 at 08:31:02AM -0500, Dmitry Torokhov wrote:
> Don't do that, its double free. input_unregister_device() normally
> causes release() to be called and free the device. input_free_device
> is only to be called when input_register_device has not been called or
> failed.
>
> Plus you might want to unregister device after closing serio port,
> otherwise your interrupt routine might be referencing already freed
> memory.
Kyle McMartin [Wed, 29 Mar 2006 22:21:12 +0000 (15:21 -0700)]
[PARISC] Kill duplicated EXPORT_SYMBOL warnings
Some symbols are exported both in parisc_ksyms.c, and at their
definition site. Nuke the redundant EXPORT_SYMBOL in ksyms to quiet
warnings when vmlinux is linked.
Helge Deller [Wed, 22 Mar 2006 22:19:46 +0000 (15:19 -0700)]
[PARISC] Fix stifb with IOREMAP and a 64-bit kernel
Kill various warnings when built using ioremap.
Remove stifb_{read,write} functions, which are now obsolete (and stack abusers!)
Disable stifb mmap() functionality on a 64-bit kernel, it will crash the
machine.
Bjorn Helgaas [Thu, 30 Mar 2006 16:53:39 +0000 (09:53 -0700)]
[IA64] ioremap() should prefer WB over UC
efi_memmap_init() collects full granules of WB memory, without
regard for whether they also support UC. So in order for ioremap()
to work for main memory, it must prefer WB mappings when possible.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Jack Morgenstein [Wed, 29 Mar 2006 00:39:07 +0000 (16:39 -0800)]
IB/mad: include GID/class when matching receives
Received responses are currently matched against sent requests based
on TID only. According to the spec, responses should match based on
the combination of TID, management class, and requester LID/GID.
Without the additional qualification, an agent that is responding to
two requests, both of which have the same TID, can match RMPP ACKs
with the incorrect transaction. This problem can occur on the SM node
when responding to SA queries.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Mark Lord [Wed, 29 Mar 2006 14:50:31 +0000 (09:50 -0500)]
[PATCH] sata_mv: three bug fixes
(1) A DMA transfer size of 0x10000 was not being written
as 0x0000 in the PRDs. Fixed.
(1) The DEV_IRQ interrupt cause bit happens spuriously
during EDMA operation, and was not being ignored by the driver.
This led to various "drive busy" errors being reported,
with associated unpredictable behaviour. Fixed.
(2) If a SATA or PCI interrupt was received with no outstanding
command, the interrupt handler still attempted to invoke
ata_qc_complete(), triggering assert()/BUG_ON() behaviour
elsewhere in libata. Fixed.
The driver still has issues with confusion after error-recovery,
but should now be reliable in the absence of drive errors.
I will be looking more into the error-handling bugs next.
Signed-Off-By: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Albert Lee [Mon, 27 Mar 2006 08:39:18 +0000 (16:39 +0800)]
[PATCH] libata: ata_dev_init_params() fixes
ata_dev_init_params() fixes:
- Get the "heads" and "sectors" parameters from caller instead of implicitly from dev->id[].
- Return AC_ERR_INVALID instead of 0 if an invalid parameter is found
Signed-off-by: Albert Lee <albertcc@tw.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Mon, 27 Mar 2006 18:01:32 +0000 (19:01 +0100)]
[PATCH] libata: Fix interesting use of "extern" and also some bracketing
Signed-off-by: Alan Cox <alan@redhat.com>
Last of the set, just clean up some oddments. Assuming the whole set is
now ok then the remaining differences are the setup of PIO_0 at reset
and the ->data_xfer method. Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Mon, 27 Mar 2006 17:58:20 +0000 (18:58 +0100)]
[PATCH] libata: Simplex and other mode filtering logic
Add a field to the host_set called 'flags' (was host_set_flags changed
to suit Jeff)
Add a simplex_claimed field so we can remember who owns the DMA channel
Add a ->mode_filter() hook to allow drivers to filter modes
Add docs for mode_filter and set_mode
Filter according to simplex state
Filter cable in core
This provides the needed framework to support all the mode rules found
in the PATA world. The simplex filter deals with 'to spec' simplex DMA
systems found in older chips. The cable filter avoids duplicating the
same rules in each chip driver with PATA. Finally the mode filter is
neccessary because drive/chip combinations have errata that forbid
certain modes with some drives or types of ATA object.
Drive speed setup remains per channel for now and the filters now use
the framework Tejun put into place which cleans them up a lot from the
older libata-pata patches.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Mon, 27 Mar 2006 17:46:37 +0000 (18:46 +0100)]
[PATCH] libata: Add ->set_mode hook for odd drivers
Some hardware doesn't want the usual mode setup logic running. This
allows the hardware driver to replace it for special cases in the least
invasive way possible.
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Mon, 27 Mar 2006 17:42:40 +0000 (18:42 +0100)]
[PATCH] libata: BMDMA handling updates
This is the minimal patch set to enable the current code to be used with
a controller following SFF (ie any PATA and early SATA controllers)
safely without crashes if there is no BMDMA area or if BMDMA is not
assigned by the BIOS for some reason.
Simplex status is recorded but not acted upon in this change, this isn't
a problem with the current drivers as none of them are for simplex
hardware. A following diff will deal with that.
The flags in the probe structure remain ->host_set_flags although Jeff
asked me to rename them, simply because the rename would break the usual
Linux rules that old code should break when there are changes. not
compile and run and then blow up/eat your computer/etc. Renaming this
later is a trivial exercise once a better name is chosen.
gcc4 doesn't like us declaring a static function inside another
function. We can do away with this construct altogether and use
BUILD_BUG_ON() instead (idea from Andi Kleen.)
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
The comments concerning how the pcnet32 ethernet device driver selects
the MAC addr to use are incorrect. A recent patch (in the last 3 months)
changed how the code worked, but did not change the comments.
Side comment: the new behaviour is good; I've got a pcnet32 card which
powers up with garbage in the CSR's, and a good MAC addr in the PROM.
Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Arthur Othieno [Tue, 28 Mar 2006 22:09:01 +0000 (14:09 -0800)]
[PATCH] net: remove CONFIG_NET_CBUS conditional for NS8390
Don't bother testing for CONFIG_NET_CBUS ("NEC PC-9800 C-bus cards"); it went
out with the rest of PC98 subarch.
Signed-off-by: Arthur Othieno <apgo@patchbomb.org> Cc: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Mark Brown [Tue, 28 Mar 2006 22:08:55 +0000 (14:08 -0800)]
[PATCH] natsemi: Support oversized EEPROMs
The natsemi chip can have a larger EEPROM attached than it itself uses for
configuration. This patch adds support for user space access to such an
EEPROM.
Signed-off-by: Mark Brown <broonie@sirena.org.uk> Cc: Tim Hockin <thockin@hockin.org> Cc: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jay Vosburgh [Mon, 27 Mar 2006 21:27:43 +0000 (13:27 -0800)]
[PATCH] bonding: support carrier state for master
Add support for the bonding master to specify its carrier state
based upon the state of the slaves. For 802.3ad, the bond is up if
there is an active, parterned aggregator. For other modes, the bond is
up if any slaves are up. Updates driver version to 3.0.3.
Based on a patch by jamal <hadi@cyberus.ca>.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Randy Dunlap [Mon, 27 Mar 2006 20:26:12 +0000 (12:26 -0800)]
[PATCH] acenic: fix section mismatches
Fix section mismatches in acenic driver:
WARNING: drivers/net/acenic.o - Section mismatch: reference to .init.data:tigon2FwText from .text between 'acenic_probe_one' (at offset 0x2409) and 'ace_interrupt'
WARNING: drivers/net/acenic.o - Section mismatch: reference to .init.data:tigon2FwRodata from .text between 'acenic_probe_one' (at offset 0x2422) and 'ace_interrupt'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Gary Zambrano [Wed, 29 Mar 2006 22:12:05 +0000 (17:12 -0500)]
b44: fix force mac address before ifconfig up
Initializing the b44 MAC & PCI functional blocks in the controller must
occur inside init_one(). This will allow access to the MAC registers.
The controller was being powered up in b44_open() which would not allow
access to the registers before ifconfig was up.
Philip Kohlbecher found this bug.
Signed-off-by: Gary Zambrano <zambrano@broadcom.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (67 commits)
[PATCH] powerpc: Remove oprofile spinlock backtrace code
[PATCH] powerpc: Add oprofile calltrace support to all powerpc cpus
[PATCH] powerpc: Add oprofile calltrace support
[PATCH] for_each_possible_cpu: ppc
[PATCH] for_each_possible_cpu: powerpc
[PATCH] lock PTE before updating it in 440/BookE page fault handler
[PATCH] powerpc: Kill _machine and hard-coded platform numbers
ppc: Fix compile error in arch/ppc/lib/strcase.c
[PATCH] git-powerpc: WARN was a dumb idea
[PATCH] powerpc: a couple of trivial compile warning fixes
powerpc: remove OCP references
powerpc: Make uImage default build output for MPC8540 ADS
powerpc: move math-emu over to arch/powerpc
powerpc: use memparse() for mem= command line parsing
ppc: fix strncasecmp prototype
[PATCH] powerpc: make ISA floppies work again
[PATCH] powerpc: Fix some initcall return values
[PATCH] powerpc: Workaround for pSeries RTAS bug
[PATCH] spufs: fix __init/__exit annotations
[PATCH] powerpc: add hvc backend for rtas
...
Roland Dreier [Wed, 29 Mar 2006 17:36:46 +0000 (09:36 -0800)]
IB/mthca: Fix section mismatch problems
Quite a few cleanup functions in mthca were marked as __devexit.
However, they could also be called from error paths during
initialization, so they cannot be marked that way. Just delete all of
the incorrect annotations.
Roland Dreier [Wed, 29 Mar 2006 17:36:46 +0000 (09:36 -0800)]
IPoIB: Fix oops with raw sockets
ipoib_hard_header() needs to handle the case that daddr is NULL. This
can happen when packets are injected via a raw socket, and IPoIB
shouldn't oops in this case.
Jack Morgenstein [Sun, 26 Mar 2006 15:01:12 +0000 (17:01 +0200)]
IB/mthca: Fix check of size in SRQ creation
The previous patch for Tavor broke MemFree logic.
The driver should perform limit check only for Tavor. For MemFree,
the check is incorrect, since ds (WQE stride) is always a power-of-2
(although the max_desc_size may not be).
In Tavor, however, WQE stride and desc_size are the same, and are not
necessarily power-of-2. The check was really for the WQE stride (and
it Tavor, we use max_desc_size for the stride).
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Roland Dreier [Wed, 29 Mar 2006 17:36:45 +0000 (09:36 -0800)]
IB/srp: Fix unmapping of fake scatterlist
The recently merged patch to create a fake scatterlist for non-SG SCSI
commands had a bug: the driver ended up doing dma_unmap_sg() on a
scatterlist scmnd->request_buffer rather than the fake scatter list it
created. Fix this so that the driver unmaps the same thing it maps.
Linus Torvalds [Wed, 29 Mar 2006 16:55:36 +0000 (08:55 -0800)]
Merge git://oss.sgi.com:8090/oss/git/xfs-2.6
* git://oss.sgi.com:8090/oss/git/xfs-2.6:
[XFS] Cleanup in XFS after recent get_block_t interface tweaks.
[XFS] Remove unused/obsoleted function: xfs_bmap_do_search_extents()
[XFS] A change to inode chunk allocation to try allocating the new chunk
Fixes a regression from the recent "remove ->get_blocks() support"
[XFS] Fix compiler warning and small code inconsistencies in compat
[XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all
Remove oprofile spinlock backtrace code now we have proper calltrace
support. Also make MMCRA sihv and sipr bits a variable since they may
change in future cpus. Finally, MMCRA should be a 64bit quantity.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Eugene Surovegin [Tue, 28 Mar 2006 18:13:12 +0000 (10:13 -0800)]
[PATCH] lock PTE before updating it in 440/BookE page fault handler
Fix 44x and BookE page fault handler to correctly lock PTE before
trying to pte_update() it, otherwise this PTE might be swapped out
after pte_present() check but before pte_uptdate() call, resulting in
corrupted PTE. This can happen with enabled preemption and low memory
condition.
Signed-off-by: Eugene Surovegin <ebs@ebshome.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
Oleg Nesterov [Wed, 29 Mar 2006 00:11:30 +0000 (16:11 -0800)]
[PATCH] send_sigqueue: simplify and fix the race
send_sigqueue() checks PF_EXITING, then locks p->sighand->siglock. This is
unsafe: 'p' can exit in between and set ->sighand = NULL. The race is
theoretical, the window is tiny and irqs are disabled by the caller, so I
don't think we need the fix for -stable tree.
Convert send_sigqueue() to use lock_task_sighand() helper.
Also, delete 'p->flags & PF_EXITING' re-check, it is unneeded and the
comment is wrong.
The previous patch has changed callsites of do_notify_parent_cldstop() so that
to_self == (->ptrace & PT_PTRACED) always (as it should be). We can remove
this parameter now.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Wed, 29 Mar 2006 00:11:28 +0000 (16:11 -0800)]
[PATCH] simplify do_signal_stop()
do_signal_stop() considers 'thread_group_empty()' as a special case.
This was needed to avoid taking tasklist_lock. Since this lock is
unneeded any longer, we can remove this special case and simplify
the code even more.
Also, before this patch, finish_stop() was called with stop_count == -1
for 'thread_group_empty()' case. This is not strictly wrong, but confusing
and unneeded.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move 'tsk->sighand = NULL' from cleanup_sighand() to __exit_signal(). This
makes the exit path more understandable and allows us to do
cleanup_sighand() outside of ->siglock protected section.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Wed, 29 Mar 2006 00:11:26 +0000 (16:11 -0800)]
[PATCH] make fork() atomic wrt pgrp/session signals
Eric W. Biederman wrote:
>
> Ok. SUSV3/Posix is clear, fork is atomic with respect
> to signals. Either a signal comes before or after a
> fork but not during. (See the rationale section).
> http://www.opengroup.org/onlinepubs/000095399/functions/fork.html
>
> The tasklist_lock does not stop forks from adding to a process
> group. The forks stall while the tasklist_lock is held, but a fork
> that began before we grabbed the tasklist_lock simply completes
> afterwards, and the child does not receive the signal.
This also means that SIGSTOP or sig_kernel_coredump() signal can't
be delivered to pgrp/session reliably.
With this patch copy_process() returns -ERESTARTNOINTR when it
detects a pending signal, fork() will be restarted transparently
after handling the signals.
This patch also deletes now unneeded "group_stop_count > 0" check,
copy_process() can no longer succeed while group stop in progress.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-By: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Wed, 29 Mar 2006 00:11:25 +0000 (16:11 -0800)]
[PATCH] pids: kill PIDTYPE_TGID
This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
kernel/pid.c and speeding up subthreads create/destroy a bit. It is also a
preparation for the further tref/pids rework.
This patch adds 'struct list_head thread_group' to 'struct task_struct'
instead.
We don't detach group leader from PIDTYPE_PID namespace until another
thread inherits it's ->pid == ->tgid, so we are safe wrt premature
free_pidmap(->tgid) call.
Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
Should the need arise, we can use find_task_by_pid()->group_leader.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-By: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Wed, 29 Mar 2006 00:11:20 +0000 (16:11 -0800)]
[PATCH] do __unhash_process() under ->siglock
This patch moves __unhash_process() call from realease_task() to
__exit_signal(), so __detach_pid() is called with ->siglock held.
This means we don't need tasklist_lock to iterate over thread group anymore:
copy_process() was already changed to do attach_pid()
under ->siglock.
Eric's "pidhash-kill-switch_exec_pids.patch" from -mm
changed de_thread() so it doesn't touch PIDTYPE_TGID.
NOTE: de_thread() still needs some attention. It still changes task->pid
lockless. Taking ->sighand.siglock here allows to do more tasklist_lock
removals.
Oleg Nesterov [Wed, 29 Mar 2006 00:11:19 +0000 (16:11 -0800)]
[PATCH] revert "Optimize sys_times for a single thread process"
This patch reverts 'CONFIG_SMP && thread_group_empty()' optimization in
sys_times(). The reason is that the next patch breaks memory ordering which
is needed for that optimization.
tasklist_lock in sys_times() will be eliminated completely by further patch.
Oleg Nesterov [Wed, 29 Mar 2006 00:11:18 +0000 (16:11 -0800)]
[PATCH] move __exit_signal() to kernel/exit.c
__exit_signal() is private to release_task() now. I think it is better to
make it static in kernel/exit.c and export flush_sigqueue() instead - this
function is much more simple and straightforward.
__exit_signal() does important cleanups atomically under ->siglock. It is
also called from copy_process's error path. This is not good, for example we
can't move __unhash_process() under ->siglock for that reason.
We should not mix these 2 paths, just look at ugly 'if (p->sighand)' under
'bad_fork_cleanup_sighand:' label. For copy_process() case it is sufficient
to just backout copy_signal(), nothing more.
Again, nobody can see this task yet. For CLONE_THREAD case we just decrement
signal->count, otherwise nobody can see this ->signal and we can free it
lockless.
This patch assumes it is safe to do exit_thread_group_keys() without
tasklist_lock.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>