Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...

author Linus Torvalds <torvalds@linux-foundation.org>

Thu, 21 Oct 2010 19:49:15 +0000 (12:49 -0700)

committer Linus Torvalds <torvalds@linux-foundation.org>

Thu, 21 Oct 2010 19:49:15 +0000 (12:49 -0700)
diff --git a/CREDITS b/CREDITS

index 72b487869788c14cd40e9535b700f4be166aa12b..41d8e63d5165b5b786db6ab7d8c14fbc49fc0107 100644
--- a/CREDITS
+++ b/CREDITS
@@ -3554,12 +3554,12 @@ E: cvance@nai.com
  D: portions of the Linux Security Module (LSM) framework and security modules
  
  N: Petr Vandrovec
-E: vandrove@vc.cvut.cz
+E: petr@vandrovec.name
  D: Small contributions to ncpfs
  D: Matrox framebuffer driver
-S: Chudenicka 8
-S: 10200 Prague 10, Hostivar
-S: Czech Republic
+S: 21513 Conradia Ct
+S: Cupertino, CA 95014
+S: USA
  
  N: Thibaut Varene
  E: T-Bone@parisc-linux.org
diff --git a/Documentation/networking/e1000.txt b/Documentation/networking/e1000.txt

index 2df71861e578b6ebcb2b94371d4a32fbdf7fca61..d9271e74e488a54177c548a5ad859d51ffdaff28 100644
--- a/Documentation/networking/e1000.txt
+++ b/Documentation/networking/e1000.txt
@@ -1,82 +1,35 @@
  Linux* Base Driver for the Intel(R) PRO/1000 Family of Adapters
  ===============================================================
  
-September 26, 2006
-
+Intel Gigabit Linux driver.
+Copyright(c) 1999 - 2010 Intel Corporation.
  
  Contents
  ========
  
-- In This Release
  - Identifying Your Adapter
-- Building and Installation
  - Command Line Parameters
  - Speed and Duplex Configuration
  - Additional Configurations
-- Known Issues
  - Support
  
-
-In This Release
-===============
-
-This file describes the Linux* Base Driver for the Intel(R) PRO/1000 Family
-of Adapters.  This driver includes support for Itanium(R)2-based systems.
-
-For questions related to hardware requirements, refer to the documentation
-supplied with your Intel PRO/1000 adapter. All hardware requirements listed
-apply to use with Linux.
-
-The following features are now available in supported kernels:
- - Native VLANs
- - Channel Bonding (teaming)
- - SNMP
-
-Channel Bonding documentation can be found in the Linux kernel source:
-/Documentation/networking/bonding.txt
-
-The driver information previously displayed in the /proc filesystem is not
-supported in this release.  Alternatively, you can use ethtool (version 1.6
-or later), lspci, and ifconfig to obtain the same information.
-
-Instructions on updating ethtool can be found in the section "Additional
-Configurations" later in this document.
-
-NOTE: The Intel(R) 82562v 10/100 Network Connection only provides 10/100
-support.
-
-
  Identifying Your Adapter
  ========================
  
  For more information on how to identify your adapter, go to the Adapter &
  Driver ID Guide at:
  
-    http://support.intel.com/support/network/adapter/pro100/21397.htm
+    http://support.intel.com/support/go/network/adapter/idguide.htm
  
  For the latest Intel network drivers for Linux, refer to the following
  website.  In the search field, enter your adapter name or type, or use the
  networking link on the left to search for your adapter:
  
-    http://downloadfinder.intel.com/scripts-df/support_intel.asp
-
+    http://support.intel.com/support/go/network/adapter/home.htm
  
  Command Line Parameters
  =======================
  
-If the driver is built as a module, the  following optional parameters
-are used by entering them on the command line with the modprobe command
-using this syntax:
-
-     modprobe e1000 [<option>=<VAL1>,<VAL2>,...]
-
-For example, with two PRO/1000 PCI adapters, entering:
-
-     modprobe e1000 TxDescriptors=80,128
-
-loads the e1000 driver with 80 TX descriptors for the first adapter and
-128 TX descriptors for the second adapter.
-
  The default value for each parameter is generally the recommended setting,
  unless otherwise noted.
  
@@ -89,10 +42,6 @@ NOTES:  For more information about the AutoNeg, Duplex, and Speed
          parameters, see the application note at:
          http://www.intel.com/design/network/applnots/ap450.htm
  
-        A descriptor describes a data buffer and attributes related to
-        the data buffer.  This information is accessed by the hardware.
-
-
  AutoNeg
  -------
  (Supported only on adapters with copper connections)
@@ -106,7 +55,6 @@ Duplex parameters must not be specified.
  NOTE:  Refer to the Speed and Duplex section of this readme for more
         information on the AutoNeg parameter.
  
-
  Duplex
  ------
  (Supported only on adapters with copper connections)
@@ -119,7 +67,6 @@ set to auto-negotiate, the board auto-detects the correct duplex.  If the
  link partner is forced (either full or half), Duplex defaults to half-
  duplex.
  
-
  FlowControl
  -----------
  Valid Range:   0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
@@ -128,16 +75,16 @@ Default Value: Reads flow control settings from the EEPROM
  This parameter controls the automatic generation(Tx) and response(Rx)
  to Ethernet PAUSE frames.
  
-
  InterruptThrottleRate
  ---------------------
  (not supported on Intel(R) 82542, 82543 or 82544-based adapters)
-Valid Range:   0,1,3,100-100000 (0=off, 1=dynamic, 3=dynamic conservative)
+Valid Range:   0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
+                                   4=simplified balancing)
  Default Value: 3
  
  The driver can limit the amount of interrupts per second that the adapter
-will generate for incoming packets. It does this by writing a value to the 
-adapter that is based on the maximum amount of interrupts that the adapter 
+will generate for incoming packets. It does this by writing a value to the
+adapter that is based on the maximum amount of interrupts that the adapter
  will generate per second.
  
  Setting InterruptThrottleRate to a value greater or equal to 100
@@ -146,37 +93,43 @@ per second, even if more packets have come in. This reduces interrupt
  load on the system and can lower CPU utilization under heavy load,
  but will increase latency as packets are not processed as quickly.
  
-The default behaviour of the driver previously assumed a static 
-InterruptThrottleRate value of 8000, providing a good fallback value for 
-all traffic types,but lacking in small packet performance and latency. 
-The hardware can handle many more small packets per second however, and 
+The default behaviour of the driver previously assumed a static
+InterruptThrottleRate value of 8000, providing a good fallback value for
+all traffic types,but lacking in small packet performance and latency.
+The hardware can handle many more small packets per second however, and
  for this reason an adaptive interrupt moderation algorithm was implemented.
  
  Since 7.3.x, the driver has two adaptive modes (setting 1 or 3) in which
-it dynamically adjusts the InterruptThrottleRate value based on the traffic 
+it dynamically adjusts the InterruptThrottleRate value based on the traffic
  that it receives. After determining the type of incoming traffic in the last
-timeframe, it will adjust the InterruptThrottleRate to an appropriate value 
+timeframe, it will adjust the InterruptThrottleRate to an appropriate value
  for that traffic.
  
  The algorithm classifies the incoming traffic every interval into
-classes.  Once the class is determined, the InterruptThrottleRate value is 
-adjusted to suit that traffic type the best. There are three classes defined: 
+classes.  Once the class is determined, the InterruptThrottleRate value is
+adjusted to suit that traffic type the best. There are three classes defined:
  "Bulk traffic", for large amounts of packets of normal size; "Low latency",
  for small amounts of traffic and/or a significant percentage of small
-packets; and "Lowest latency", for almost completely small packets or 
+packets; and "Lowest latency", for almost completely small packets or
  minimal traffic.
  
-In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 
-for traffic that falls in class "Bulk traffic". If traffic falls in the "Low 
-latency" or "Lowest latency" class, the InterruptThrottleRate is increased 
+In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
+for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
+latency" or "Lowest latency" class, the InterruptThrottleRate is increased
  stepwise to 20000. This default mode is suitable for most applications.
  
  For situations where low latency is vital such as cluster or
  grid computing, the algorithm can reduce latency even more when
  InterruptThrottleRate is set to mode 1. In this mode, which operates
-the same as mode 3, the InterruptThrottleRate will be increased stepwise to 
+the same as mode 3, the InterruptThrottleRate will be increased stepwise to
  70000 for traffic in class "Lowest latency".
  
+In simplified mode the interrupt rate is based on the ratio of Tx and
+Rx traffic.  If the bytes per second rate is approximately equal, the
+interrupt rate will drop as low as 2000 interrupts per second.  If the
+traffic is mostly transmit or mostly receive, the interrupt rate could
+be as high as 8000.
+
  Setting InterruptThrottleRate to 0 turns off any interrupt moderation
  and may improve small packet latency, but is generally not suitable
  for bulk throughput traffic.
@@ -212,8 +165,6 @@ NOTE:  When e1000 is loaded with default settings and multiple adapters
         be platform-specific.  If CPU utilization is not a concern, use
         RX_POLLING (NAPI) and default driver settings.
  
-
-
  RxDescriptors
  -------------
  Valid Range:   80-256 for 82542 and 82543-based adapters
@@ -225,15 +176,14 @@ by the driver.  Increasing this value allows the driver to buffer more
  incoming packets, at the expense of increased system memory utilization.
  
  Each descriptor is 16 bytes.  A receive buffer is also allocated for each
-descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending 
+descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending
  on the MTU setting. The maximum MTU size is 16110.
  
-NOTE:  MTU designates the frame size.  It only needs to be set for Jumbo 
-       Frames.  Depending on the available system resources, the request 
-       for a higher number of receive descriptors may be denied.  In this 
+NOTE:  MTU designates the frame size.  It only needs to be set for Jumbo
+       Frames.  Depending on the available system resources, the request
+       for a higher number of receive descriptors may be denied.  In this
         case, use a lower number.
  
-
  RxIntDelay
  ----------
  Valid Range:   0-65535 (0=off)
@@ -254,7 +204,6 @@ CAUTION:  When setting RxIntDelay to a value other than 0, adapters may
            restoring the network connection.  To eliminate the potential
            for the hang ensure that RxIntDelay is set to 0.
  
-
  RxAbsIntDelay
  -------------
  (This parameter is supported only on 82540, 82545 and later adapters.)
@@ -268,7 +217,6 @@ packet is received within the set amount of time.  Proper tuning,
  along with RxIntDelay, may improve traffic throughput in specific network
  conditions.
  
-
  Speed
  -----
  (This parameter is supported only on adapters with copper connections.)
@@ -280,7 +228,6 @@ Speed forces the line speed to the specified value in megabits per second
  partner is set to auto-negotiate, the board will auto-detect the correct
  speed.  Duplex should also be set when Speed is set to either 10 or 100.
  
-
  TxDescriptors
  -------------
  Valid Range:   80-256 for 82542 and 82543-based adapters
@@ -295,6 +242,36 @@ NOTE:  Depending on the available system resources, the request for a
         higher number of transmit descriptors may be denied.  In this case,
         use a lower number.
  
+TxDescriptorStep
+----------------
+Valid Range:    1 (use every Tx Descriptor)
+               4 (use every 4th Tx Descriptor)
+
+Default Value:  1 (use every Tx Descriptor)
+
+On certain non-Intel architectures, it has been observed that intense TX
+traffic bursts of short packets may result in an improper descriptor
+writeback. If this occurs, the driver will report a "TX Timeout" and reset
+the adapter, after which the transmit flow will restart, though data may
+have stalled for as much as 10 seconds before it resumes.
+
+The improper writeback does not occur on the first descriptor in a system
+memory cache-line, which is typically 32 bytes, or 4 descriptors long.
+
+Setting TxDescriptorStep to a value of 4 will ensure that all TX descriptors
+are aligned to the start of a system memory cache line, and so this problem
+will not occur.
+
+NOTES: Setting TxDescriptorStep to 4 effectively reduces the number of
+       TxDescriptors available for transmits to 1/4 of the normal allocation.
+       This has a possible negative performance impact, which may be
+       compensated for by allocating more descriptors using the TxDescriptors
+       module parameter.
+
+       There are other conditions which may result in "TX Timeout", which will
+       not be resolved by the use of the TxDescriptorStep parameter. As the
+       issue addressed by this parameter has never been observed on Intel
+       Architecture platforms, it should not be used on Intel platforms.
  
  TxIntDelay
  ----------
@@ -307,7 +284,6 @@ efficiency if properly tuned for specific network traffic.  If the
  system is reporting dropped transmits, this value may be set too high
  causing the driver to run out of available transmit descriptors.
  
-
  TxAbsIntDelay
  -------------
  (This parameter is supported only on 82540, 82545 and later adapters.)
@@ -330,6 +306,35 @@ Default Value: 1
  A value of '1' indicates that the driver should enable IP checksum
  offload for received packets (both UDP and TCP) to the adapter hardware.
  
+Copybreak
+---------
+Valid Range:   0-xxxxxxx (0=off)
+Default Value: 256
+Usage: insmod e1000.ko copybreak=128
+
+Driver copies all packets below or equaling this size to a fresh Rx
+buffer before handing it up the stack.
+
+This parameter is different than other parameters, in that it is a
+single (not 1,1,1 etc.) parameter applied to all driver instances and
+it is also available during runtime at
+/sys/module/e1000/parameters/copybreak
+
+SmartPowerDownEnable
+--------------------
+Valid Range: 0-1
+Default Value:  0 (disabled)
+
+Allows PHY to turn off in lower power states. The user can turn off
+this parameter in supported chipsets.
+
+KumeranLockLoss
+---------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+This workaround skips resetting the PHY at shutdown for the initial
+silicon releases of ICH8 systems.
  
  Speed and Duplex Configuration
  ==============================
@@ -385,40 +390,9 @@ If the link partner is forced to a specific speed and duplex, then this
  parameter should not be used.  Instead, use the Speed and Duplex parameters
  previously mentioned to force the adapter to the same speed and duplex.
  
-
  Additional Configurations
  =========================
  
-  Configuring the Driver on Different Distributions
-  -------------------------------------------------
-  Configuring a network driver to load properly when the system is started
-  is distribution dependent.  Typically, the configuration process involves
-  adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well
-  as editing other system startup scripts and/or configuration files.  Many
-  popular Linux distributions ship with tools to make these changes for you.
-  To learn the proper way to configure a network device for your system,
-  refer to your distribution documentation.  If during this process you are
-  asked for the driver or module name, the name for the Linux Base Driver
-  for the Intel(R) PRO/1000 Family of Adapters is e1000.
-
-  As an example, if you install the e1000 driver for two PRO/1000 adapters
-  (eth0 and eth1) and set the speed and duplex to 10full and 100half, add
-  the following to modules.conf or or modprobe.conf:
-
-       alias eth0 e1000
-       alias eth1 e1000
-       options e1000 Speed=10,100 Duplex=2,1
-
-  Viewing Link Messages
-  ---------------------
-  Link messages will not be displayed to the console if the distribution is
-  restricting system messages.  In order to see network driver link messages
-  on your console, set dmesg to eight by entering the following:
-
-       dmesg -n 8
-
-  NOTE: This setting is not saved across reboots.
-
    Jumbo Frames
    ------------
    Jumbo Frames support is enabled by changing the MTU to a value larger than
@@ -437,9 +411,11 @@ Additional Configurations
     setting in a different location.
  
    Notes:
-
-  - To enable Jumbo Frames, increase the MTU size on the interface beyond
-    1500.
+  Degradation in throughput performance may be observed in some Jumbo frames
+  environments. If this is observed, increasing the application's socket buffer
+  size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
+  See the specific application manual and /usr/src/linux*/Documentation/
+  networking/ip-sysctl.txt for more details.
  
    - The maximum MTU setting for Jumbo Frames is 16110.  This value coincides
      with the maximum Jumbo Frames size of 16128.
@@ -447,40 +423,11 @@ Additional Configurations
    - Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or
      loss of link.
  
-  - Some Intel gigabit adapters that support Jumbo Frames have a frame size
-    limit of 9238 bytes, with a corresponding MTU size limit of 9216 bytes.
-    The adapters with this limitation are based on the Intel(R) 82571EB,
-    82572EI, 82573L and 80003ES2LAN controller.  These correspond to the
-    following product names:
-     Intel(R) PRO/1000 PT Server Adapter
-     Intel(R) PRO/1000 PT Desktop Adapter
-     Intel(R) PRO/1000 PT Network Connection
-     Intel(R) PRO/1000 PT Dual Port Server Adapter
-     Intel(R) PRO/1000 PT Dual Port Network Connection
-     Intel(R) PRO/1000 PF Server Adapter
-     Intel(R) PRO/1000 PF Network Connection
-     Intel(R) PRO/1000 PF Dual Port Server Adapter
-     Intel(R) PRO/1000 PB Server Connection
-     Intel(R) PRO/1000 PL Network Connection
-     Intel(R) PRO/1000 EB Network Connection with I/O Acceleration
-     Intel(R) PRO/1000 EB Backplane Connection with I/O Acceleration
-     Intel(R) PRO/1000 PT Quad Port Server Adapter
-
    - Adapters based on the Intel(R) 82542 and 82573V/E controller do not
      support Jumbo Frames. These correspond to the following product names:
       Intel(R) PRO/1000 Gigabit Server Adapter
       Intel(R) PRO/1000 PM Network Connection
  
-  - The following adapters do not support Jumbo Frames:
-     Intel(R) 82562V 10/100 Network Connection
-     Intel(R) 82566DM Gigabit Network Connection
-     Intel(R) 82566DC Gigabit Network Connection
-     Intel(R) 82566MM Gigabit Network Connection
-     Intel(R) 82566MC Gigabit Network Connection
-     Intel(R) 82562GT 10/100 Network Connection
-     Intel(R) 82562G 10/100 Network Connection
-
-
    Ethtool
    -------
    The driver utilizes the ethtool interface for driver configuration and
@@ -490,142 +437,14 @@ Additional Configurations
    The latest release of ethtool can be found from
    http://sourceforge.net/projects/gkernel.
  
-  NOTE: Ethtool 1.6 only supports a limited set of ethtool options.  Support
-  for a more complete ethtool feature set can be enabled by upgrading
-  ethtool to ethtool-1.8.1.
-
    Enabling Wake on LAN* (WoL)
    ---------------------------
-  WoL is configured through the Ethtool* utility.  Ethtool is included with
-  all versions of Red Hat after Red Hat 7.2.  For other Linux distributions,
-  download and install Ethtool from the following website:
-  http://sourceforge.net/projects/gkernel.
-
-  For instructions on enabling WoL with Ethtool, refer to the website listed
-  above.
+  WoL is configured through the Ethtool* utility.
  
    WoL will be enabled on the system during the next shut down or reboot.
    For this driver version, in order to enable WoL, the e1000 driver must be
    loaded when shutting down or rebooting the system.
  
-  Wake On LAN is only supported on port A for the following devices:
-  Intel(R) PRO/1000 PT Dual Port Network Connection
-  Intel(R) PRO/1000 PT Dual Port Server Connection
-  Intel(R) PRO/1000 PT Dual Port Server Adapter
-  Intel(R) PRO/1000 PF Dual Port Server Adapter
-  Intel(R) PRO/1000 PT Quad Port Server Adapter
-
-  NAPI
-  ----
-  NAPI (Rx polling mode) is enabled in the e1000 driver.
-
-  See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI.
-
-
-Known Issues
-============
-
-Dropped Receive Packets on Half-duplex 10/100 Networks
-------------------------------------------------------
-If you have an Intel PCI Express adapter running at 10mbps or 100mbps, half-
-duplex, you may observe occasional dropped receive packets.  There are no
-workarounds for this problem in this network configuration.  The network must
-be updated to operate in full-duplex, and/or 1000mbps only.
-
-Jumbo Frames System Requirement
--------------------------------
-Memory allocation failures have been observed on Linux systems with 64 MB
-of RAM or less that are running Jumbo Frames.  If you are using Jumbo
-Frames, your system may require more than the advertised minimum
-requirement of 64 MB of system memory.
-
-Performance Degradation with Jumbo Frames
------------------------------------------
-Degradation in throughput performance may be observed in some Jumbo frames
-environments.  If this is observed, increasing the application's socket
-buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values
-may help.  See the specific application manual and
-/usr/src/linux*/Documentation/
-networking/ip-sysctl.txt for more details.
-
-Jumbo Frames on Foundry BigIron 8000 switch
--------------------------------------------
-There is a known issue using Jumbo frames when connected to a Foundry
-BigIron 8000 switch.  This is a 3rd party limitation.  If you experience
-loss of packets, lower the MTU size.
-
-Allocating Rx Buffers when Using Jumbo Frames 
----------------------------------------------
-Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if 
-the available memory is heavily fragmented. This issue may be seen with PCI-X 
-adapters or with packet split disabled. This can be reduced or eliminated 
-by changing the amount of available memory for receive buffer allocation, by
-increasing /proc/sys/vm/min_free_kbytes. 
-
-Multiple Interfaces on Same Ethernet Broadcast Network
-------------------------------------------------------
-Due to the default ARP behavior on Linux, it is not possible to have
-one system on two IP networks in the same Ethernet broadcast domain
-(non-partitioned switch) behave as expected.  All Ethernet interfaces
-will respond to IP traffic for any IP address assigned to the system.
-This results in unbalanced receive traffic.
-
-If you have multiple interfaces in a server, either turn on ARP
-filtering by entering:
-
-    echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
-(this only works if your kernel's version is higher than 2.4.5),
-
-NOTE: This setting is not saved across reboots.  The configuration
-change can be made permanent by adding the line:
-    net.ipv4.conf.all.arp_filter = 1
-to the file /etc/sysctl.conf
-
-      or,
-
-install the interfaces in separate broadcast domains (either in
-different switches or in a switch partitioned to VLANs).
-
-82541/82547 can't link or are slow to link with some link partners
------------------------------------------------------------------
-There is a known compatibility issue with 82541/82547 and some
-low-end switches where the link will not be established, or will
-be slow to establish.  In particular, these switches are known to
-be incompatible with 82541/82547:
-
-    Planex FXG-08TE
-    I-O Data ETG-SH8
-
-To workaround this issue, the driver can be compiled with an override
-of the PHY's master/slave setting.  Forcing master or forcing slave
-mode will improve time-to-link.
-
-    # make CFLAGS_EXTRA=-DE1000_MASTER_SLAVE=<n>
-
-Where <n> is:
-
-    0 = Hardware default
-    1 = Master mode
-    2 = Slave mode
-    3 = Auto master/slave
-
-Disable rx flow control with ethtool
-------------------------------------
-In order to disable receive flow control using ethtool, you must turn
-off auto-negotiation on the same command line.
-
-For example:
-
-   ethtool -A eth? autoneg off rx off
-
-Unplugging network cable while ethtool -p is running
-----------------------------------------------------
-In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging
-the network cable while ethtool -p is running will cause the system to
-become unresponsive to keyboard commands, except for control-alt-delete.
-Restarting the system appears to be the only remedy.
-
-
  Support
  =======
  
diff --git a/Documentation/networking/e1000e.txt b/Documentation/networking/e1000e.txt

new file mode 100644

index 0000000..6aa048b
--- /dev/null
+++ b/Documentation/networking/e1000e.txt
@@ -0,0 +1,302 @@
+Linux* Driver for Intel(R) Network Connection
+===============================================================
+
+Intel Gigabit Linux driver.
+Copyright(c) 1999 - 2010 Intel Corporation.
+
+Contents
+========
+
+- Identifying Your Adapter
+- Command Line Parameters
+- Additional Configurations
+- Support
+
+Identifying Your Adapter
+========================
+
+The e1000e driver supports all PCI Express Intel(R) Gigabit Network
+Connections, except those that are 82575, 82576 and 82580-based*.
+
+* NOTE: The Intel(R) PRO/1000 P Dual Port Server Adapter is supported by
+  the e1000 driver, not the e1000e driver due to the 82546 part being used
+  behind a PCI Express bridge.
+
+For more information on how to identify your adapter, go to the Adapter &
+Driver ID Guide at:
+
+    http://support.intel.com/support/go/network/adapter/idguide.htm
+
+For the latest Intel network drivers for Linux, refer to the following
+website.  In the search field, enter your adapter name or type, or use the
+networking link on the left to search for your adapter:
+
+    http://support.intel.com/support/go/network/adapter/home.htm
+
+Command Line Parameters
+=======================
+
+The default value for each parameter is generally the recommended setting,
+unless otherwise noted.
+
+NOTES:  For more information about the InterruptThrottleRate,
+        RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay
+        parameters, see the application note at:
+        http://www.intel.com/design/network/applnots/ap450.htm
+
+InterruptThrottleRate
+---------------------
+Valid Range:   0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
+                                   4=simplified balancing)
+Default Value: 3
+
+The driver can limit the amount of interrupts per second that the adapter
+will generate for incoming packets. It does this by writing a value to the
+adapter that is based on the maximum amount of interrupts that the adapter
+will generate per second.
+
+Setting InterruptThrottleRate to a value greater or equal to 100
+will program the adapter to send out a maximum of that many interrupts
+per second, even if more packets have come in. This reduces interrupt
+load on the system and can lower CPU utilization under heavy load,
+but will increase latency as packets are not processed as quickly.
+
+The driver has two adaptive modes (setting 1 or 3) in which
+it dynamically adjusts the InterruptThrottleRate value based on the traffic
+that it receives. After determining the type of incoming traffic in the last
+timeframe, it will adjust the InterruptThrottleRate to an appropriate value
+for that traffic.
+
+The algorithm classifies the incoming traffic every interval into
+classes.  Once the class is determined, the InterruptThrottleRate value is
+adjusted to suit that traffic type the best. There are three classes defined:
+"Bulk traffic", for large amounts of packets of normal size; "Low latency",
+for small amounts of traffic and/or a significant percentage of small
+packets; and "Lowest latency", for almost completely small packets or
+minimal traffic.
+
+In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
+for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
+latency" or "Lowest latency" class, the InterruptThrottleRate is increased
+stepwise to 20000. This default mode is suitable for most applications.
+
+For situations where low latency is vital such as cluster or
+grid computing, the algorithm can reduce latency even more when
+InterruptThrottleRate is set to mode 1. In this mode, which operates
+the same as mode 3, the InterruptThrottleRate will be increased stepwise to
+70000 for traffic in class "Lowest latency".
+
+In simplified mode the interrupt rate is based on the ratio of Tx and
+Rx traffic.  If the bytes per second rate is approximately equal the
+interrupt rate will drop as low as 2000 interrupts per second.  If the
+traffic is mostly transmit or mostly receive, the interrupt rate could
+be as high as 8000.
+
+Setting InterruptThrottleRate to 0 turns off any interrupt moderation
+and may improve small packet latency, but is generally not suitable
+for bulk throughput traffic.
+
+NOTE:  InterruptThrottleRate takes precedence over the TxAbsIntDelay and
+       RxAbsIntDelay parameters.  In other words, minimizing the receive
+       and/or transmit absolute delays does not force the controller to
+       generate more interrupts than what the Interrupt Throttle Rate
+       allows.
+
+NOTE:  When e1000e is loaded with default settings and multiple adapters
+       are in use simultaneously, the CPU utilization may increase non-
+       linearly.  In order to limit the CPU utilization without impacting
+       the overall throughput, we recommend that you load the driver as
+       follows:
+
+           modprobe e1000e InterruptThrottleRate=3000,3000,3000
+
+       This sets the InterruptThrottleRate to 3000 interrupts/sec for
+       the first, second, and third instances of the driver.  The range
+       of 2000 to 3000 interrupts per second works on a majority of
+       systems and is a good starting point, but the optimal value will
+       be platform-specific.  If CPU utilization is not a concern, use
+       RX_POLLING (NAPI) and default driver settings.
+
+RxIntDelay
+----------
+Valid Range:   0-65535 (0=off)
+Default Value: 0
+
+This value delays the generation of receive interrupts in units of 1.024
+microseconds.  Receive interrupt reduction can improve CPU efficiency if
+properly tuned for specific network traffic.  Increasing this value adds
+extra latency to frame reception and can end up decreasing the throughput
+of TCP traffic.  If the system is reporting dropped receives, this value
+may be set too high, causing the driver to run out of available receive
+descriptors.
+
+CAUTION:  When setting RxIntDelay to a value other than 0, adapters may
+          hang (stop transmitting) under certain network conditions.  If
+          this occurs a NETDEV WATCHDOG message is logged in the system
+          event log.  In addition, the controller is automatically reset,
+          restoring the network connection.  To eliminate the potential
+          for the hang ensure that RxIntDelay is set to 0.
+
+RxAbsIntDelay
+-------------
+Valid Range:   0-65535 (0=off)
+Default Value: 8
+
+This value, in units of 1.024 microseconds, caps the time that can elapse
+before a receive interrupt is generated.  Useful only if RxIntDelay is
+non-zero, it ensures that an interrupt is generated within the set amount
+of time after the initial packet is received.  Proper tuning,
+along with RxIntDelay, may improve traffic throughput in specific network
+conditions.
+
+TxIntDelay
+----------
+Valid Range:   0-65535 (0=off)
+Default Value: 8
+
+This value delays the generation of transmit interrupts in units of
+1.024 microseconds.  Transmit interrupt reduction can improve CPU
+efficiency if properly tuned for specific network traffic.  If the
+system is reporting dropped transmits, this value may be set too high,
+causing the driver to run out of available transmit descriptors.
+
+TxAbsIntDelay
+-------------
+Valid Range:   0-65535 (0=off)
+Default Value: 32
+
+This value, in units of 1.024 microseconds, caps the time that can elapse
+before a transmit interrupt is generated.  Useful only if TxIntDelay is
+non-zero, it ensures that an interrupt is generated within the set amount
+of time after the initial packet is sent on the wire.  Proper tuning,
+along with TxIntDelay, may improve traffic throughput in specific
+network conditions.
+
+Copybreak
+---------
+Valid Range:   0-xxxxxxx (0=off)
+Default Value: 256
+
+The driver copies all packets at or below this size to a fresh Rx
+buffer before handing them up the stack.
+
+This parameter differs from the other parameters in that it is a
+single (not 1,1,1 etc.) parameter applied to all driver instances, and
+it is also available at runtime at
+/sys/module/e1000e/parameters/copybreak
+
+SmartPowerDownEnable
+--------------------
+Valid Range: 0-1
+Default Value:  0 (disabled)
+
+Allows the PHY to turn off in lower power states.  The user can set this
+parameter on supported chipsets.
+
+KumeranLockLoss
+---------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+This workaround skips resetting the PHY at shutdown for the initial
+silicon releases of ICH8 systems.
+
+IntMode
+-------
+Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X)
+Default Value: 2
+
+Allows changing the interrupt mode at module load time, without requiring a
+recompile. If the driver load fails to enable a specific interrupt mode, the
+driver will try other interrupt modes, from least to most compatible.  The
+interrupt order is MSI-X, MSI, Legacy.  If specifying MSI (IntMode=1)
+interrupts, only MSI and Legacy will be attempted.
+
+CrcStripping
+------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+Strip the CRC from received packets before sending them up the network
+stack.  If you have a machine with a BMC enabled but cannot receive IPMI
+traffic after loading or enabling the driver, try disabling this feature.
+
+WriteProtectNVM
+---------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+Set the hardware to ignore all write/erase cycles to the GbE region in the
+ICHx NVM (non-volatile memory).  This feature can be disabled by the
+WriteProtectNVM module parameter (enabled by default) only after a hardware
+reset, but the machine must be power cycled before trying to enable writes.
+
+Note: if the kernel is built with CONFIG_STRICT_DEVMEM=y, the kernel boot
+option iomem=relaxed may need to be set before the root user can write the
+NVM from user space via ethtool.
+
+Additional Configurations
+=========================
+
+  Jumbo Frames
+  ------------
+  Jumbo Frames support is enabled by changing the MTU to a value larger than
+  the default of 1500.  Use the ifconfig command to increase the MTU size.
+  For example:
+
+       ifconfig eth<x> mtu 9000 up
+
+  This setting is not saved across reboots.
+
+  Notes:
+
+  - The maximum MTU setting for Jumbo Frames is 9216.  This value coincides
+    with the maximum Jumbo Frames size of 9234 bytes.
+
+  - Using Jumbo Frames at 10 or 100 Mbps is not supported and may result in
+    poor performance or loss of link.
+
+  - Some adapters limit Jumbo Frames sized packets to a maximum of
+    4096 bytes and some adapters do not support Jumbo Frames.
+
+
+  Ethtool
+  -------
+  The driver utilizes the ethtool interface for driver configuration and
+  diagnostics, as well as displaying statistical information.  We
+  strongly recommend downloading the latest version of Ethtool at:
+
+  http://sourceforge.net/projects/gkernel
+
+  Speed and Duplex
+  ----------------
+  Speed and Duplex are configured through the Ethtool* utility. For
+  instructions, refer to the Ethtool man page.
+
+  Enabling Wake on LAN* (WoL)
+  ---------------------------
+  WoL is configured through the Ethtool* utility. For instructions on
+  enabling WoL with Ethtool, refer to the Ethtool man page.
+
+  WoL will be enabled on the system during the next shut down or reboot.
+  For this driver version, in order to enable WoL, the e1000e driver must be
+  loaded when shutting down or rebooting the system.
+
+  In most cases, Wake on LAN is only supported on port A of multi-port
+  adapters.  To verify whether a port supports it, run ethtool eth<X>.
+
+
+Support
+=======
+
+For general information, go to the Intel support website at:
+
+    www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+    http://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on the supported
+kernel with a supported adapter, email the specific information related
+to the issue to e1000-devel@lists.sf.net
diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt

old mode 100755 (executable)

new mode 100644 (file)

index 19015de..21dd5d1
--- a/Documentation/networking/ixgbevf.txt
+++ b/Documentation/networking/ixgbevf.txt
@@ -1,19 +1,16 @@
  Linux* Base Driver for Intel(R) Network Connection
  ==================================================
  
-November 24, 2009
+Intel Gigabit Linux driver.
+Copyright(c) 1999 - 2010 Intel Corporation.
  
  Contents
  ========
  
-- In This Release
  - Identifying Your Adapter
  - Known Issues/Troubleshooting
  - Support
  
-In This Release
-===============
-
  This file describes the ixgbevf Linux* Base Driver for Intel Network
  Connection.
  
@@ -33,7 +30,7 @@ Identifying Your Adapter
  For more information on how to identify your adapter, go to the Adapter &
  Driver ID Guide at:
  
-    http://support.intel.com/support/network/sb/CS-008441.htm
+    http://support.intel.com/support/go/network/adapter/idguide.htm
  
  Known Issues/Troubleshooting
  ============================
@@ -57,34 +54,3 @@ or the Intel Wired Networking project hosted by Sourceforge at:
  If an issue is identified with the released source code on the supported
  kernel with a supported adapter, email the specific information related
  to the issue to e1000-devel@lists.sf.net
-
-License
-=======
-
-Intel 10 Gigabit Linux driver.
-Copyright(c) 1999 - 2009 Intel Corporation.
-
-This program is free software; you can redistribute it and/or modify it
-under the terms and conditions of the GNU General Public License,
-version 2, as published by the Free Software Foundation.
-
-This program is distributed in the hope it will be useful, but WITHOUT
-ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-more details.
-
-You should have received a copy of the GNU General Public License along with
-this program; if not, write to the Free Software Foundation, Inc.,
-51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
-The full GNU General Public License is included in this distribution in
-the file called "COPYING".
-
-Trademarks
-==========
-
-Intel, Itanium, and Pentium are trademarks or registered trademarks of
-Intel Corporation or its subsidiaries in the United States and other
-countries.
-
-* Other names and brands may be claimed as the property of others.
diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c

index ccd951fa94eeda42a5a50c854b6b92e512fe6724..cc96ee2666f2e5f10f13b83b1fab72bcf8e18924 100644 (file)
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -478,7 +478,7 @@ static void prepare_hwpoison_fd(void)
         }
  
         if (opt_unpoison && !hwpoison_forget_fd) {
-               sprintf(buf, "%s/renew-pfn", hwpoison_debug_fs);
+               sprintf(buf, "%s/unpoison-pfn", hwpoison_debug_fs);
                 hwpoison_forget_fd = checked_open(buf, O_WRONLY);
         }
  }
diff --git a/MAINTAINERS b/MAINTAINERS

index 668682d1f5fa23f296b23c87efa1a8c024f3a62a..3d4179fbc5263ff4b624753c9ce06004286ce246 100644 (file)
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -962,6 +962,23 @@ W: http://www.fluff.org/ben/linux/
  S:     Maintained
  F:     arch/arm/mach-s3c6410/
  
+ARM/S5P ARM ARCHITECTURES
+M:     Kukjin Kim <kgene.kim@samsung.com>
+L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+L:     linux-samsung-soc@vger.kernel.org (moderated for non-subscribers)
+S:     Maintained
+F:     arch/arm/mach-s5p*/
+
+ARM/SAMSUNG S5P SERIES FIMC SUPPORT
+M:     Kyungmin Park <kyungmin.park@samsung.com>
+M:     Sylwester Nawrocki <s.nawrocki@samsung.com>
+L:     linux-arm-kernel@lists.infradead.org
+L:     linux-media@vger.kernel.org
+S:     Maintained
+F:     arch/arm/plat-s5p/dev-fimc*
+F:     arch/arm/plat-samsung/include/plat/*fimc*
+F:     drivers/media/video/s5p-fimc/
+
  ARM/SHMOBILE ARM ARCHITECTURE
  M:     Paul Mundt <lethal@linux-sh.org>
  M:     Magnus Damm <magnus.damm@gmail.com>
@@ -1510,6 +1527,8 @@ T:        git git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
  S:     Supported
  F:     Documentation/filesystems/ceph.txt
  F:     fs/ceph
+F:     net/ceph
+F:     include/linux/ceph
  
  CERTIFIED WIRELESS USB (WUSB) SUBSYSTEM:
  M:     David Vrabel <david.vrabel@csr.com>
@@ -2528,7 +2547,7 @@ S:        Supported
  F:     drivers/scsi/gdt*
  
  GENERIC GPIO I2C DRIVER
-M:     Haavard Skinnemoen <hskinnemoen@atmel.com>
+M:     Haavard Skinnemoen <hskinnemoen@gmail.com>
  S:     Supported
  F:     drivers/i2c/busses/i2c-gpio.c
  F:     include/linux/i2c-gpio.h
@@ -3056,16 +3075,27 @@ L:      netdev@vger.kernel.org
  S:     Maintained
  F:     drivers/net/ixp2000/
  
-INTEL ETHERNET DRIVERS (e100/e1000/e1000e/igb/igbvf/ixgb/ixgbe)
+INTEL ETHERNET DRIVERS (e100/e1000/e1000e/igb/igbvf/ixgb/ixgbe/ixgbevf)
  M:     Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  M:     Jesse Brandeburg <jesse.brandeburg@intel.com>
  M:     Bruce Allan <bruce.w.allan@intel.com>
-M:     Alex Duyck <alexander.h.duyck@intel.com>
+M:     Carolyn Wyborny <carolyn.wyborny@intel.com>
+M:     Don Skidmore <donald.c.skidmore@intel.com>
+M:     Greg Rose <gregory.v.rose@intel.com>
  M:     PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
+M:     Alex Duyck <alexander.h.duyck@intel.com>
  M:     John Ronciak <john.ronciak@intel.com>
  L:     e1000-devel@lists.sourceforge.net
  W:     http://e1000.sourceforge.net/
  S:     Supported
+F:     Documentation/networking/e100.txt
+F:     Documentation/networking/e1000.txt
+F:     Documentation/networking/e1000e.txt
+F:     Documentation/networking/igb.txt
+F:     Documentation/networking/igbvf.txt
+F:     Documentation/networking/ixgb.txt
+F:     Documentation/networking/ixgbe.txt
+F:     Documentation/networking/ixgbevf.txt
  F:     drivers/net/e100.c
  F:     drivers/net/e1000/
  F:     drivers/net/e1000e/
@@ -3073,6 +3103,7 @@ F:        drivers/net/igb/
  F:     drivers/net/igbvf/
  F:     drivers/net/ixgb/
  F:     drivers/net/ixgbe/
+F:     drivers/net/ixgbevf/
  
  INTEL PRO/WIRELESS 2100 NETWORK CONNECTION SUPPORT
  L:     linux-wireless@vger.kernel.org
@@ -3133,7 +3164,7 @@ F:        drivers/net/ioc3-eth.c
  
  IOC3 SERIAL DRIVER
  M:     Pat Gefre <pfg@sgi.com>
-L:     linux-mips@linux-mips.org
+L:     linux-serial@vger.kernel.org
  S:     Maintained
  F:     drivers/serial/ioc3_serial.c
  
@@ -3781,9 +3812,8 @@ W:        http://www.syskonnect.com
  S:     Supported
  
  MATROX FRAMEBUFFER DRIVER
-M:     Petr Vandrovec <vandrove@vc.cvut.cz>
  L:     linux-fbdev@vger.kernel.org
-S:     Maintained
+S:     Orphan
  F:     drivers/video/matrox/matroxfb_*
  F:     include/linux/matroxfb.h
  
@@ -3970,8 +4000,8 @@ S:        Maintained
  F:     drivers/net/natsemi.c
  
  NCP FILESYSTEM
-M:     Petr Vandrovec <vandrove@vc.cvut.cz>
-S:     Maintained
+M:     Petr Vandrovec <petr@vandrovec.name>
+S:     Odd Fixes
  F:     fs/ncpfs/
  
  NCR DUAL 700 SCSI DRIVER (MICROCHANNEL)
@@ -4777,6 +4807,15 @@ F:       fs/qnx4/
  F:     include/linux/qnx4_fs.h
  F:     include/linux/qnxtypes.h
  
+RADOS BLOCK DEVICE (RBD)
+F:     include/linux/qnxtypes.h
+M:     Yehuda Sadeh <yehuda@hq.newdream.net>
+M:     Sage Weil <sage@newdream.net>
+L:     ceph-devel@vger.kernel.org
+S:     Supported
+F:     drivers/block/rbd.c
+F:     drivers/block/rbd_types.h
+
  RADEON FRAMEBUFFER DISPLAY DRIVER
  M:     Benjamin Herrenschmidt <benh@kernel.crashing.org>
  L:     linux-fbdev@vger.kernel.org
@@ -5002,6 +5041,12 @@ F:       drivers/media/common/saa7146*
  F:     drivers/media/video/*7146*
  F:     include/media/*7146*
  
+SAMSUNG AUDIO (ASoC) DRIVERS
+M:     Jassi Brar <jassi.brar@samsung.com>
+L:     alsa-devel@alsa-project.org (moderated for non-subscribers)
+S:     Supported
+F:     sound/soc/s3c24xx
+
  TLG2300 VIDEO4LINUX-2 DRIVER
  M:     Huang Shijie <shijie8@gmail.com>
  M:     Kang Yong <kangyong@telegent.com>
@@ -6444,8 +6489,10 @@ F:       include/linux/wm97xx.h
  WOLFSON MICROELECTRONICS DRIVERS
  M:     Mark Brown <broonie@opensource.wolfsonmicro.com>
  M:     Ian Lartey <ian@opensource.wolfsonmicro.com>
+M:     Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
+T:     git git://opensource.wolfsonmicro.com/linux-2.6-asoc
  T:     git git://opensource.wolfsonmicro.com/linux-2.6-audioplus
-W:     http://opensource.wolfsonmicro.com/node/8
+W:     http://opensource.wolfsonmicro.com/content/linux-drivers-wolfson-devices
  S:     Supported
  F:     Documentation/hwmon/wm83??
  F:     drivers/leds/leds-wm83*.c
diff --git a/Makefile b/Makefile

index 471c49fd2f434e0cdfdb937ba16d35bac828fe07..860c26af52c31c1d354d979b00f8aad8e97d7c01 100644 (file)
--- a/Makefile
+++ b/Makefile
@@ -1,8 +1,8 @@
  VERSION = 2
  PATCHLEVEL = 6
  SUBLEVEL = 36
-EXTRAVERSION = -rc6
-NAME = Sheep on Meth
+EXTRAVERSION =
+NAME = Flesh-Eating Bats with Fangs
  
  # *DOCUMENTATION*
  # To see a list of typical targets execute "make help"
diff --git a/arch/alpha/kernel/signal.c b/arch/alpha/kernel/signal.c

index d290845aef5981f5db242d8abbd2f4d4c96c8fe5..6f7feb5db27193f33e24e962e4d1be257d8464ef 100644 (file)
--- a/arch/alpha/kernel/signal.c
+++ b/arch/alpha/kernel/signal.c
@@ -48,7 +48,7 @@ SYSCALL_DEFINE2(osf_sigprocmask, int, how, unsigned long, newmask)
         sigset_t mask;
         unsigned long res;
  
-       siginitset(&mask, newmask & ~_BLOCKABLE);
+       siginitset(&mask, newmask & _BLOCKABLE);
         res = sigprocmask(how, &mask, &oldmask);
         if (!res) {
                 force_successful_syscall_return();
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig

index 88c97bc7a6f5b7a751b0b628e75a17f4604f82f3..9c26ba7244fb450b0c73f15ca2565336033e152b 100644 (file)
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1101,6 +1101,20 @@ config ARM_ERRATA_720789
           invalidated are not, resulting in an incoherency in the system page
           tables. The workaround changes the TLB flushing routines to invalidate
           entries regardless of the ASID.
+
+config ARM_ERRATA_743622
+       bool "ARM errata: Faulty hazard checking in the Store Buffer may lead to data corruption"
+       depends on CPU_V7
+       help
+         This option enables the workaround for the 743622 Cortex-A9
+         (r2p0..r2p2) erratum. Under very rare conditions, a faulty
+         optimisation in the Cortex-A9 Store Buffer may lead to data
+         corruption. This workaround sets a specific bit in the diagnostic
+         register of the Cortex-A9 which disables the Store Buffer
+         optimisation, preventing the defect from occurring. This has no
+         visible impact on the overall performance or power consumption of the
+         processor.
+
  endmenu
  
  source "arch/arm/common/Kconfig"
diff --git a/arch/arm/kernel/kprobes-decode.c b/arch/arm/kernel/kprobes-decode.c

index 8bccbfa693ffc359dc55d6004837d2a149e2c5cd..2c1f0050c9c4d9fd74ac08b1c0a9c193e16df4de 100644 (file)
--- a/arch/arm/kernel/kprobes-decode.c
+++ b/arch/arm/kernel/kprobes-decode.c
@@ -1162,11 +1162,12 @@ space_cccc_001x(kprobe_opcode_t insn, struct arch_specific_insn *asi)
  {
         /*
          * MSR   : cccc 0011 0x10 xxxx xxxx xxxx xxxx xxxx
-        * Undef : cccc 0011 0x00 xxxx xxxx xxxx xxxx xxxx
+        * Undef : cccc 0011 0100 xxxx xxxx xxxx xxxx xxxx
          * ALU op with S bit and Rd == 15 :
          *         cccc 001x xxx1 xxxx 1111 xxxx xxxx xxxx
          */
-       if ((insn & 0x0f900000) == 0x03200000 ||        /* MSR & Undef */
+       if ((insn & 0x0fb00000) == 0x03200000 ||        /* MSR */
+           (insn & 0x0ff00000) == 0x03400000 ||        /* Undef */
             (insn & 0x0e10f000) == 0x0210f000)          /* ALU s-bit, R15  */
                 return INSN_REJECTED;
  
@@ -1177,7 +1178,7 @@ space_cccc_001x(kprobe_opcode_t insn, struct arch_specific_insn *asi)
          * *S (bit 20) updates condition codes
          * ADC/SBC/RSC reads the C flag
          */
-       insn &= 0xfff00fff;     /* Rn = r0, Rd = r0 */
+       insn &= 0xffff0fff;     /* Rd = r0 */
         asi->insn[0] = insn;
         asi->insn_handler = (insn & (1 << 20)) ?  /* S-bit */
                         emulate_alu_imm_rwflags : emulate_alu_imm_rflags;
diff --git a/arch/arm/mach-at91/include/mach/system.h b/arch/arm/mach-at91/include/mach/system.h

index c80e090b36708706671d8c66ff1a67a1c0106d61..ee8db152592e087c8fe986dd1c7d7e557d473ad4 100644 (file)
--- a/arch/arm/mach-at91/include/mach/system.h
+++ b/arch/arm/mach-at91/include/mach/system.h
@@ -28,17 +28,16 @@
  
  static inline void arch_idle(void)
  {
-#ifndef CONFIG_DEBUG_KERNEL
         /*
          * Disable the processor clock.  The processor will be automatically
          * re-enabled by an interrupt or by a reset.
          */
         at91_sys_write(AT91_PMC_SCDR, AT91_PMC_PCK);
-#else
+#ifndef CONFIG_CPU_ARM920T
         /*
          * Set the processor (CP15) into 'Wait for Interrupt' mode.
-        * Unlike disabling the processor clock via the PMC (above)
-        *  this allows the processor to be woken via JTAG.
+        * Post-RM9200 processors need this in conjunction with the above
+        * to save power when idle.
          */
         cpu_do_idle();
  #endif
diff --git a/arch/arm/mach-ep93xx/dma-m2p.c b/arch/arm/mach-ep93xx/dma-m2p.c

index 8904ca4e2e24fc9a4984bd0b88ce9d3eab52e7ea..a696d354b1f82598649586e6478be6334a91f978 100644 (file)
--- a/arch/arm/mach-ep93xx/dma-m2p.c
+++ b/arch/arm/mach-ep93xx/dma-m2p.c
@@ -276,7 +276,7 @@ static void channel_disable(struct m2p_channel *ch)
         v &= ~(M2P_CONTROL_STALL_IRQ_EN | M2P_CONTROL_NFB_IRQ_EN);
         m2p_set_control(ch, v);
  
-       while (m2p_channel_state(ch) == STATE_ON)
+       while (m2p_channel_state(ch) >= STATE_ON)
                 cpu_relax();
  
         m2p_set_control(ch, 0x0);
diff --git a/arch/arm/mach-imx/Kconfig b/arch/arm/mach-imx/Kconfig

index c5c0369bb481dff32f4bbb8c214f4859a16787ac..2f7e2728970d66966a959542f106fb9a216860ea 100644 (file)
--- a/arch/arm/mach-imx/Kconfig
+++ b/arch/arm/mach-imx/Kconfig
@@ -122,6 +122,7 @@ config MACH_CPUIMX27
         select IMX_HAVE_PLATFORM_IMX_I2C
         select IMX_HAVE_PLATFORM_IMX_UART
         select IMX_HAVE_PLATFORM_MXC_NAND
+       select MXC_ULPI if USB_ULPI
         help
           Include support for Eukrea CPUIMX27 platform. This includes
           specific configurations for the module and its peripherals.
diff --git a/arch/arm/mach-imx/mach-cpuimx27.c b/arch/arm/mach-imx/mach-cpuimx27.c

index 339150ab0ea5d63e13f0520e06612fc827f37f49..6830afd1d2baf01bc60aa959d15841aed48e26a6 100644 (file)
--- a/arch/arm/mach-imx/mach-cpuimx27.c
+++ b/arch/arm/mach-imx/mach-cpuimx27.c
@@ -259,7 +259,7 @@ static void __init eukrea_cpuimx27_init(void)
         i2c_register_board_info(0, eukrea_cpuimx27_i2c_devices,
                                 ARRAY_SIZE(eukrea_cpuimx27_i2c_devices));
  
-       imx27_add_i2c_imx1(&cpuimx27_i2c1_data);
+       imx27_add_i2c_imx0(&cpuimx27_i2c1_data);
  
         platform_add_devices(platform_devices, ARRAY_SIZE(platform_devices));
  
diff --git a/arch/arm/mach-s5p6440/cpu.c b/arch/arm/mach-s5p6440/cpu.c

index 526f33adb31d65e56d0a5064f3ac7b16496eef33..ec592e8660547a45956bcddc4bb6426523973b30 100644 (file)
--- a/arch/arm/mach-s5p6440/cpu.c
+++ b/arch/arm/mach-s5p6440/cpu.c
@@ -19,6 +19,7 @@
  #include <linux/sysdev.h>
  #include <linux/serial_core.h>
  #include <linux/platform_device.h>
+#include <linux/sched.h>
  
  #include <asm/mach/arch.h>
  #include <asm/mach/map.h>
diff --git a/arch/arm/mach-s5p6442/cpu.c b/arch/arm/mach-s5p6442/cpu.c

index a48fb553fd01cf5d42e1ab590146d7cad1e7680f..70ac681af72bc7a1a9ce5434c164e5b86e1fdf55 100644 (file)
--- a/arch/arm/mach-s5p6442/cpu.c
+++ b/arch/arm/mach-s5p6442/cpu.c
@@ -19,6 +19,7 @@
  #include <linux/sysdev.h>
  #include <linux/serial_core.h>
  #include <linux/platform_device.h>
+#include <linux/sched.h>
  
  #include <asm/mach/arch.h>
  #include <asm/mach/map.h>
diff --git a/arch/arm/mach-s5pc100/cpu.c b/arch/arm/mach-s5pc100/cpu.c

index 251c92ac5b227e05a28251ed7b890ff1f84eccfb..cd1afbce83e2cfa6d006d6084e384e1705408be8 100644 (file)
--- a/arch/arm/mach-s5pc100/cpu.c
+++ b/arch/arm/mach-s5pc100/cpu.c
@@ -21,6 +21,7 @@
  #include <linux/sysdev.h>
  #include <linux/serial_core.h>
  #include <linux/platform_device.h>
+#include <linux/sched.h>
  
  #include <asm/mach/arch.h>
  #include <asm/mach/map.h>
diff --git a/arch/arm/mach-s5pv210/clock.c b/arch/arm/mach-s5pv210/clock.c

index cfecd70657cb62e51d01ad820ce2f7856957f9fc..d562670e1b0b44005ade9644f69338de44b459bb 100644 (file)
--- a/arch/arm/mach-s5pv210/clock.c
+++ b/arch/arm/mach-s5pv210/clock.c
@@ -173,11 +173,6 @@ static int s5pv210_clk_ip3_ctrl(struct clk *clk, int enable)
         return s5p_gatectrl(S5P_CLKGATE_IP3, clk, enable);
  }
  
-static int s5pv210_clk_ip4_ctrl(struct clk *clk, int enable)
-{
-       return s5p_gatectrl(S5P_CLKGATE_IP4, clk, enable);
-}
-
  static int s5pv210_clk_mask0_ctrl(struct clk *clk, int enable)
  {
         return s5p_gatectrl(S5P_CLK_SRC_MASK0, clk, enable);
diff --git a/arch/arm/mach-s5pv210/cpu.c b/arch/arm/mach-s5pv210/cpu.c

index 77f456c91ad36ba97001630613f9c75d1d335819..245b82b53df4612d63d45984659e2221605d124f 100644 (file)
--- a/arch/arm/mach-s5pv210/cpu.c
+++ b/arch/arm/mach-s5pv210/cpu.c
@@ -19,6 +19,7 @@
  #include <linux/io.h>
  #include <linux/sysdev.h>
  #include <linux/platform_device.h>
+#include <linux/sched.h>
  
  #include <asm/mach/arch.h>
  #include <asm/mach/map.h>
diff --git a/arch/arm/mach-vexpress/ct-ca9x4.c b/arch/arm/mach-vexpress/ct-ca9x4.c

index efb127022d42facb807b4cb54220d3299fe24e04..71fb173495209572817a0e15f8794ab8c756b70f 100644 (file)
--- a/arch/arm/mach-vexpress/ct-ca9x4.c
+++ b/arch/arm/mach-vexpress/ct-ca9x4.c
@@ -68,7 +68,7 @@ static void __init ct_ca9x4_init_irq(void)
  }
  
  #if 0
-static void ct_ca9x4_timer_init(void)
+static void __init ct_ca9x4_timer_init(void)
  {
         writel(0, MMIO_P2V(CT_CA9X4_TIMER0) + TIMER_CTRL);
         writel(0, MMIO_P2V(CT_CA9X4_TIMER1) + TIMER_CTRL);
@@ -222,7 +222,7 @@ static struct platform_device pmu_device = {
         .resource       = pmu_resources,
  };
  
-static void ct_ca9x4_init(void)
+static void __init ct_ca9x4_init(void)
  {
         int i;
  
diff --git a/arch/arm/mach-vexpress/v2m.c b/arch/arm/mach-vexpress/v2m.c

index 817f0ad38a0b5100ec0884e8c908c84af471bec4..7eaa232180a5ae627c3639ba642755444a5d3e27 100644 (file)
--- a/arch/arm/mach-vexpress/v2m.c
+++ b/arch/arm/mach-vexpress/v2m.c
@@ -48,7 +48,7 @@ void __init v2m_map_io(struct map_desc *tile, size_t num)
  }
  
  
-static void v2m_timer_init(void)
+static void __init v2m_timer_init(void)
  {
         writel(0, MMIO_P2V(V2M_TIMER0) + TIMER_CTRL);
         writel(0, MMIO_P2V(V2M_TIMER1) + TIMER_CTRL);
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c

index ab506272b2d3ef459b264b7741d61af46f6aa6b8..17e7b0b57e49f80e6e30c8bbe65ea9d1ebcf14a6 100644 (file)
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -204,8 +204,12 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
         /*
          * Don't allow RAM to be mapped - this causes problems with ARMv6+
          */
-       if (WARN_ON(pfn_valid(pfn)))
-               return NULL;
+       if (pfn_valid(pfn)) {
+               printk(KERN_WARNING "BUG: Your driver calls ioremap() on system memory.  This leads\n"
+                      KERN_WARNING "to architecturally unpredictable behaviour on ARMv6+, and ioremap()\n"
+                      KERN_WARNING "will fail in the next kernel release.  Please fix your driver.\n");
+               WARN_ON(1);
+       }
  
         type = get_mem_type(mtype);
         if (!type)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c

index 6a3a2d0cd6db15806342a7357c2c154b6ab5970d..e8ed9dc461fe39631efeeb4746b80854e13415c6 100644 (file)
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -248,7 +248,7 @@ static struct mem_type mem_types[] = {
         },
         [MT_MEMORY] = {
                 .prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
-                               L_PTE_USER | L_PTE_EXEC,
+                               L_PTE_WRITE | L_PTE_EXEC,
                 .prot_l1   = PMD_TYPE_TABLE,
                 .prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE,
                 .domain    = DOMAIN_KERNEL,
@@ -259,7 +259,7 @@ static struct mem_type mem_types[] = {
         },
         [MT_MEMORY_NONCACHED] = {
                 .prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
-                               L_PTE_USER | L_PTE_EXEC | L_PTE_MT_BUFFERABLE,
+                               L_PTE_WRITE | L_PTE_EXEC | L_PTE_MT_BUFFERABLE,
                 .prot_l1   = PMD_TYPE_TABLE,
                 .prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE,
                 .domain    = DOMAIN_KERNEL,
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S

index 7563ff0141bd85cee6d4cc626b69f7210141094c..197f21bed5e919f3a13f3cbc9404d428ad80fb97 100644 (file)
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -253,6 +253,14 @@ __v7_setup:
         orreq   r10, r10, #1 << 22              @ set bit #22
         mcreq   p15, 0, r10, c15, c0, 1         @ write diagnostic register
  #endif
+#ifdef CONFIG_ARM_ERRATA_743622
+       teq     r6, #0x20                       @ present in r2p0
+       teqne   r6, #0x21                       @ present in r2p1
+       teqne   r6, #0x22                       @ present in r2p2
+       mrceq   p15, 0, r10, c15, c0, 1         @ read diagnostic register
+       orreq   r10, r10, #1 << 6               @ set bit #6
+       mcreq   p15, 0, r10, c15, c0, 1         @ write diagnostic register
+#endif
  
  3:     mov     r10, #0
  #ifdef HARVARD_CACHE
@@ -365,7 +373,7 @@ __v7_ca9mp_proc_info:
         b       __v7_ca9mp_setup
         .long   cpu_arch_name
         .long   cpu_elf_name
-       .long   HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP
+       .long   HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP|HWCAP_TLS
         .long   cpu_v7_name
         .long   v7_processor_functions
         .long   v7wbi_tlb_fns
diff --git a/arch/arm/oprofile/common.c b/arch/arm/oprofile/common.c

index 0691176899ffc24f0a176d154a34f5b7200c6047..72e09eb642dd7c7a00d47fce210bb1290e926fd9 100644 (file)
--- a/arch/arm/oprofile/common.c
+++ b/arch/arm/oprofile/common.c
@@ -102,6 +102,7 @@ static int op_create_counter(int cpu, int event)
         if (IS_ERR(pevent)) {
                 ret = PTR_ERR(pevent);
         } else if (pevent->state != PERF_EVENT_STATE_ACTIVE) {
+               perf_event_release_kernel(pevent);
                 pr_warning("oprofile: failed to enable event %d "
                                 "on CPU %d\n", event, cpu);
                 ret = -EBUSY;
@@ -365,6 +366,7 @@ int __init oprofile_arch_init(struct oprofile_operations *ops)
         ret = init_driverfs();
         if (ret) {
                 kfree(counter_config);
+               counter_config = NULL;
                 return ret;
         }
  
@@ -402,7 +404,6 @@ void oprofile_arch_exit(void)
         struct perf_event *event;
  
         if (*perf_events) {
-               exit_driverfs();
                 for_each_possible_cpu(cpu) {
                         for (id = 0; id < perf_num_counters; ++id) {
                                 event = perf_events[cpu][id];
@@ -413,8 +414,10 @@ void oprofile_arch_exit(void)
                 }
         }
  
-       if (counter_config)
+       if (counter_config) {
                 kfree(counter_config);
+               exit_driverfs();
+       }
  }
  #else
  int __init oprofile_arch_init(struct oprofile_operations *ops)
diff --git a/arch/arm/plat-omap/Kconfig b/arch/arm/plat-omap/Kconfig

index e39a417a368dc92ad6776a68ef6fefc22a046b76..a92cb499313fdc9583890ebcc182ecae280cdc09 100644 (file)
--- a/arch/arm/plat-omap/Kconfig
+++ b/arch/arm/plat-omap/Kconfig
@@ -33,7 +33,7 @@ config OMAP_DEBUG_DEVICES
  config OMAP_DEBUG_LEDS
         bool
         depends on OMAP_DEBUG_DEVICES
-       default y if LEDS
+       default y if LEDS_CLASS
  
  config OMAP_RESET_CLOCKS
         bool "Reset unused clocks during boot"
diff --git a/arch/arm/plat-omap/iommu.c b/arch/arm/plat-omap/iommu.c

index a202a2ce6e3d0018ee3022ead0ba835527e3a4b0..6cd151b31bc5f7ee37afe54882737ddb60be1fbf 100644 (file)
--- a/arch/arm/plat-omap/iommu.c
+++ b/arch/arm/plat-omap/iommu.c
@@ -320,6 +320,7 @@ void flush_iotlb_page(struct iommu *obj, u32 da)
                 if ((start <= da) && (da < start + bytes)) {
                         dev_dbg(obj->dev, "%s: %08x<=%08x(%x)\n",
                                 __func__, start, da, bytes);
+                       iotlb_load_cr(obj, &cr);
                         iommu_write_reg(obj, 1, MMU_FLUSH_ENTRY);
                 }
         }
diff --git a/arch/arm/plat-omap/mcbsp.c b/arch/arm/plat-omap/mcbsp.c

index e31496e35b0f452d4ff9e375855718fc0a078d40..0c8612fd831237164968b1f2120a1134618557e3 100644 (file)
--- a/arch/arm/plat-omap/mcbsp.c
+++ b/arch/arm/plat-omap/mcbsp.c
@@ -156,7 +156,7 @@ static irqreturn_t omap_mcbsp_rx_irq_handler(int irq, void *dev_id)
                 /* Writing zero to RSYNC_ERR clears the IRQ */
                 MCBSP_WRITE(mcbsp_rx, SPCR1, MCBSP_READ_CACHE(mcbsp_rx, SPCR1));
         } else {
-               complete(&mcbsp_rx->tx_irq_completion);
+               complete(&mcbsp_rx->rx_irq_completion);
         }
  
         return IRQ_HANDLED;
diff --git a/arch/arm/plat-samsung/adc.c b/arch/arm/plat-samsung/adc.c

index 04d9521ddc9f0e652d62453a99bbb19fa877c7fa..e8f2be2d67f2cac33961487b47a7880b68f33832 100644 (file)
--- a/arch/arm/plat-samsung/adc.c
+++ b/arch/arm/plat-samsung/adc.c
@@ -435,7 +435,6 @@ static int s3c_adc_suspend(struct platform_device *pdev, pm_message_t state)
  static int s3c_adc_resume(struct platform_device *pdev)
  {
         struct adc_device *adc = platform_get_drvdata(pdev);
-       unsigned long flags;
  
         clk_enable(adc->clk);
         enable_irq(adc->irq);
diff --git a/arch/arm/plat-samsung/clock.c b/arch/arm/plat-samsung/clock.c

index 90a20512d68d5a40d9fa789b15c24620a0c8f9b3..e8d20b0bc50e11ee43d78f230b811409def1cb87 100644 (file)
--- a/arch/arm/plat-samsung/clock.c
+++ b/arch/arm/plat-samsung/clock.c
@@ -48,6 +48,9 @@
  #include <plat/clock.h>
  #include <plat/cpu.h>
  
+#include <linux/serial_core.h>
+#include <plat/regs-serial.h> /* for s3c24xx_uart_devs */
+
  /* clock information */
  
  static LIST_HEAD(clocks);
@@ -65,6 +68,28 @@ static int clk_null_enable(struct clk *clk, int enable)
         return 0;
  }
  
+static int dev_is_s3c_uart(struct device *dev)
+{
+       struct platform_device **pdev = s3c24xx_uart_devs;
+       int i;
+       for (i = 0; i < ARRAY_SIZE(s3c24xx_uart_devs); i++, pdev++)
+               if (*pdev && dev == &(*pdev)->dev)
+                       return 1;
+       return 0;
+}
+
+/*
+ * Serial drivers call clk_get() very early, before the platform bus
+ * has been set up. This requires a special check to let them get
+ * a proper clock.
+ */
+
+static int dev_is_platform_device(struct device *dev)
+{
+       return dev->bus == &platform_bus_type ||
+              (dev->bus == NULL && dev_is_s3c_uart(dev));
+}
+
  /* Clock API calls */
  
  struct clk *clk_get(struct device *dev, const char *id)
@@ -73,7 +98,7 @@ struct clk *clk_get(struct device *dev, const char *id)
         struct clk *clk = ERR_PTR(-ENOENT);
         int idno;
  
-       if (dev == NULL || dev->bus != &platform_bus_type)
+       if (dev == NULL || !dev_is_platform_device(dev))
                 idno = -1;
         else
                 idno = to_platform_device(dev)->id;
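Note on the clock.c change above: the new check lets clk_get() treat an early S3C UART as a platform device even though its bus pointer is still NULL. A minimal user-space sketch of that decision follows. The struct layouts, the uart_devs array and the clk_idno() helper are hypothetical stand-ins for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct bus_type { const char *name; };
struct device { struct bus_type *bus; };
struct platform_device { int id; struct device dev; };

static struct bus_type platform_bus_type = { "platform" };

/* Stand-in for s3c24xx_uart_devs: early UARTs with no bus assigned yet. */
static struct platform_device uart0 = { 0, { NULL } };
static struct platform_device uart1 = { 1, { NULL } };
static struct platform_device *uart_devs[] = { &uart0, &uart1, NULL };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 if dev is embedded in one of the known UART platform devices. */
static int dev_is_uart(const struct device *dev)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(uart_devs); i++)
		if (uart_devs[i] && dev == &uart_devs[i]->dev)
			return 1;
	return 0;
}

/* Same shape as the patched check: on the platform bus, or a bus-less UART. */
static int dev_is_platform_device(const struct device *dev)
{
	return dev->bus == &platform_bus_type ||
	       (dev->bus == NULL && dev_is_uart(dev));
}

/* Mirrors the idno selection in clk_get(): -1 unless dev is a platform device. */
static int clk_idno(struct device *dev)
{
	struct platform_device *pdev;

	if (dev == NULL || !dev_is_platform_device(dev))
		return -1;
	/* container_of-style recovery of the enclosing platform_device */
	pdev = (struct platform_device *)
		((char *)dev - offsetof(struct platform_device, dev));
	return pdev->id;
}
```

With the old test (dev->bus != &platform_bus_type), the bus-less UART case would have produced -1 instead of the UART's own id, which is the situation the patch appears to address.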
diff --git a/arch/avr32/kernel/module.c b/arch/avr32/kernel/module.c

index 98f94d041d9c1dd212a0519efac60def8034b7db..a727f54d64d6e633d58ae2836bbbfae4c82f0017 100644 (file)
--- a/arch/avr32/kernel/module.c
+++ b/arch/avr32/kernel/module.c
@@ -314,10 +314,9 @@ int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
         vfree(module->arch.syminfo);
         module->arch.syminfo = NULL;
  
-       return module_bug_finalize(hdr, sechdrs, module);
+       return 0;
  }
  
  void module_arch_cleanup(struct module *module)
  {
-       module_bug_cleanup(module);
  }
diff --git a/arch/h8300/kernel/module.c b/arch/h8300/kernel/module.c

index 0865e291c20d2948c95edc70f52925121409599d..db4953dc4e1b445adbdd7e004a68890a5c1c07a6 100644 (file)
--- a/arch/h8300/kernel/module.c
+++ b/arch/h8300/kernel/module.c
@@ -112,10 +112,9 @@ int module_finalize(const Elf_Ehdr *hdr,
                     const Elf_Shdr *sechdrs,
                     struct module *me)
  {
-       return module_bug_finalize(hdr, sechdrs, me);
+       return 0;
  }
  
  void module_arch_cleanup(struct module *mod)
  {
-       module_bug_cleanup(mod);
  }
diff --git a/arch/m32r/include/asm/elf.h b/arch/m32r/include/asm/elf.h

index 2f85412ef7302a0aedaf1b5e26f63719c69dc033..b8da7d0574d20635f489315ea8caf957d063be08 100644 (file)
--- a/arch/m32r/include/asm/elf.h
+++ b/arch/m32r/include/asm/elf.h
@@ -82,9 +82,9 @@ typedef elf_fpreg_t elf_fpregset_t;
   * These are used to set parameters in the core dumps.
   */
  #define ELF_CLASS      ELFCLASS32
-#if defined(__LITTLE_ENDIAN)
+#if defined(__LITTLE_ENDIAN__)
  #define ELF_DATA       ELFDATA2LSB
-#elif defined(__BIG_ENDIAN)
+#elif defined(__BIG_ENDIAN__)
  #define ELF_DATA       ELFDATA2MSB
  #else
  #error no endian defined
diff --git a/arch/m32r/kernel/.gitignore b/arch/m32r/kernel/.gitignore

new file mode 100644 (file)

index 0000000..c5f676c
--- /dev/null
+++ b/arch/m32r/kernel/.gitignore
@@ -0,0 +1 @@
+vmlinux.lds
diff --git a/arch/m32r/kernel/signal.c b/arch/m32r/kernel/signal.c

index 7bbe38645ed5559395f85c5b5fce93eb3f4992e3..a08697f0886d7988b012fa727c777d8bbed7b3fb 100644 (file)
--- a/arch/m32r/kernel/signal.c
+++ b/arch/m32r/kernel/signal.c
@@ -28,6 +28,8 @@
  
  #define DEBUG_SIG 0
  
+#define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP)))
+
  asmlinkage int
  sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss,
                 unsigned long r2, unsigned long r3, unsigned long r4,
@@ -254,7 +256,7 @@ give_sigsegv:
  static int prev_insn(struct pt_regs *regs)
  {
         u16 inst;
-       if (get_user(&inst, (u16 __user *)(regs->bpc - 2)))
+       if (get_user(inst, (u16 __user *)(regs->bpc - 2)))
                 return -EFAULT;
         if ((inst & 0xfff0) == 0x10f0)  /* trap ? */
                 regs->bpc -= 2;
diff --git a/arch/m68k/mac/macboing.c b/arch/m68k/mac/macboing.c

index 8f0640847ad2bf7bf99d0a184ed10ce8272a84f8..05285d08e54767a71a814773c23a506386f9626a 100644 (file)
--- a/arch/m68k/mac/macboing.c
+++ b/arch/m68k/mac/macboing.c
@@ -162,7 +162,7 @@ static void mac_init_asc( void )
  void mac_mksound( unsigned int freq, unsigned int length )
  {
         __u32 cfreq = ( freq << 5 ) / 468;
-       __u32 flags;
+       unsigned long flags;
         int i;
  
         if ( mac_special_bell == NULL )
@@ -224,7 +224,7 @@ static void mac_nosound( unsigned long ignored )
   */
  static void mac_quadra_start_bell( unsigned int freq, unsigned int length, unsigned int volume )
  {
-       __u32 flags;
+       unsigned long flags;
  
         /* if the bell is already ringing, ring longer */
         if ( mac_bell_duration > 0 )
@@ -271,7 +271,7 @@ static void mac_quadra_start_bell( unsigned int freq, unsigned int length, unsig
  static void mac_quadra_ring_bell( unsigned long ignored )
  {
         int     i, count = mac_asc_samplespersec / HZ;
-       __u32 flags;
+       unsigned long flags;
  
         /*
          * we neither want a sound buffer overflow nor underflow, so we need to match
diff --git a/arch/mips/Kbuild b/arch/mips/Kbuild

index e322d65f33a41e0085e5d352410ca0b5c3f37e26..7dd65cfae83759562e43ab20bb03be071ae56ce5 100644 (file)
--- a/arch/mips/Kbuild
+++ b/arch/mips/Kbuild
@@ -7,6 +7,10 @@ subdir-ccflags-y := -Werror
  include arch/mips/Kbuild.platforms
  obj-y := $(platform-y)
  
+# make clean traverses $(obj-) without having included .config, so
+# everything ends up here
+obj- := $(platform-)
+
  # mips object files
  # The object files are linked as core-y files would be linked
  
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig

index 3ad59dde485209bce858c425e4fc8a2f64b04c09..4c9f402295dd3d9b11548ab47d26853761df62f1 100644 (file)
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -13,6 +13,7 @@ config MIPS
         select HAVE_KPROBES
         select HAVE_KRETPROBES
         select RTC_LIB if !MACH_LOONGSON
+       select GENERIC_ATOMIC64 if !64BIT
  
  mainmenu "Linux/MIPS Kernel Configuration"
  
@@ -880,11 +881,15 @@ config NO_IOPORT
  config GENERIC_ISA_DMA
         bool
         select ZONE_DMA if GENERIC_ISA_DMA_SUPPORT_BROKEN=n
+       select ISA_DMA_API
  
  config GENERIC_ISA_DMA_SUPPORT_BROKEN
         bool
         select GENERIC_ISA_DMA
  
+config ISA_DMA_API
+       bool
+
  config GENERIC_GPIO
         bool
  
@@ -1646,8 +1651,16 @@ config MIPS_MT_SMP
         select SYS_SUPPORTS_SMP
         select SMP_UP
         help
-         This is a kernel model which is also known a VSMP or lately
-         has been marketesed into SMVP.
+         This is a kernel model which is known as VSMP but lately has been
+         marketesed into SMVP.
+         Virtual SMP uses the processor's VPEs to implement virtual
+         processors. In the currently available configuration of the 34K
+         processor this allows for a dual processor. Both processors will share
+         the same primary caches; each will obtain half of the TLB for its own
+         exclusive use. For a layman this model can be described as similar to
+         what Intel calls Hyperthreading.
+
+         For further information see http://www.linux-mips.org/wiki/34K#VSMP
  
  config MIPS_MT_SMTC
         bool "SMTC: Use all TCs on all VPEs for SMP"
@@ -1664,6 +1677,14 @@ config MIPS_MT_SMTC
         help
           This is a kernel model which is known a SMTC or lately has been
           marketesed into SMVP.
+         It presents the available TCs of the core as processors to Linux.
+         On currently available 34K processors this means a Linux system will
+         see up to 5 processors. The implementation of the SMTC kernel differs
+         significantly from VSMP and cannot efficiently coexist in the same
+         kernel binary so the choice between VSMP and SMTC is a compile time
+         decision.
+
+         For further information see http://www.linux-mips.org/wiki/34K#SMTC
  
  endchoice
  
diff --git a/arch/mips/alchemy/common/prom.c b/arch/mips/alchemy/common/prom.c

index c29511b11d44fd6732b0dd153d009dbd7051555d..5340210596297fa54c8723e866aebaeb8c20269e 100644 (file)
--- a/arch/mips/alchemy/common/prom.c
+++ b/arch/mips/alchemy/common/prom.c
@@ -43,7 +43,7 @@ int prom_argc;
  char **prom_argv;
  char **prom_envp;
  
-void prom_init_cmdline(void)
+void __init prom_init_cmdline(void)
  {
         int i;
  
@@ -104,7 +104,7 @@ static inline void str2eaddr(unsigned char *ea, unsigned char *str)
         }
  }
  
-int prom_get_ethernet_addr(char *ethernet_addr)
+int __init prom_get_ethernet_addr(char *ethernet_addr)
  {
         char *ethaddr_str;
  
@@ -123,7 +123,6 @@ int prom_get_ethernet_addr(char *ethernet_addr)
  
         return 0;
  }
-EXPORT_SYMBOL(prom_get_ethernet_addr);
  
  void __init prom_free_prom_memory(void)
  {
diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile

index ed9bb709c9a3816a4738d0ceef91db8c8de20cd4..5042d51b0512a087b02b85853aad2d2b38d9d7d6 100644 (file)
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -59,7 +59,7 @@ $(obj)/piggy.o: $(obj)/dummy.o $(obj)/vmlinux.bin.z FORCE
  hostprogs-y := calc_vmlinuz_load_addr
  
  VMLINUZ_LOAD_ADDRESS = $(shell $(obj)/calc_vmlinuz_load_addr \
-               $(objtree)/$(KBUILD_IMAGE) $(VMLINUX_LOAD_ADDRESS))
+               $(obj)/vmlinux.bin $(VMLINUX_LOAD_ADDRESS))
  
  vmlinuzobjs-y += $(obj)/piggy.o
  
@@ -105,4 +105,4 @@ OBJCOPYFLAGS_vmlinuz.srec := $(OBJCOPYFLAGS) -S -O srec
  vmlinuz.srec: vmlinuz
         $(call cmd,objcopy)
  
-clean-files := $(objtree)/vmlinuz.*
+clean-files := $(objtree)/vmlinuz $(objtree)/vmlinuz.{32,ecoff,bin,srec}
diff --git a/arch/mips/cavium-octeon/Kconfig b/arch/mips/cavium-octeon/Kconfig

index 094c17e38e163ab691b71bef78710f73848c7645..47323ca452dcbde751536c58c9f613dd1d9d51c8 100644 (file)
--- a/arch/mips/cavium-octeon/Kconfig
+++ b/arch/mips/cavium-octeon/Kconfig
@@ -83,3 +83,7 @@ config ARCH_SPARSEMEM_ENABLE
         def_bool y
         select SPARSEMEM_STATIC
         depends on CPU_CAVIUM_OCTEON
+
+config CAVIUM_OCTEON_HELPER
+       def_bool y
+       depends on OCTEON_ETHERNET || PCI
diff --git a/arch/mips/cavium-octeon/cpu.c b/arch/mips/cavium-octeon/cpu.c

index c664c8cc2b42cb8970b9f57531a03e2998075566..a5b427909b5cac04d28c4da1b099342ee72df4ce 100644 (file)
--- a/arch/mips/cavium-octeon/cpu.c
+++ b/arch/mips/cavium-octeon/cpu.c
@@ -41,7 +41,7 @@ static int cnmips_cu2_call(struct notifier_block *nfb, unsigned long action,
         return NOTIFY_OK;               /* Let default notifier send signals */
  }
  
-static int cnmips_cu2_setup(void)
+static int __init cnmips_cu2_setup(void)
  {
         return cu2_notifier(cnmips_cu2_call, 0);
  }
diff --git a/arch/mips/cavium-octeon/executive/Makefile b/arch/mips/cavium-octeon/executive/Makefile

index 2fd66db6939e0f981c338015127c69788d960843..7f41c5be2190ddca03fc92a00e8f21bd735414f5 100644 (file)
--- a/arch/mips/cavium-octeon/executive/Makefile
+++ b/arch/mips/cavium-octeon/executive/Makefile
@@ -11,4 +11,4 @@
  
  obj-y += cvmx-bootmem.o cvmx-l2c.o cvmx-sysinfo.o octeon-model.o
  
-obj-$(CONFIG_PCI) += cvmx-helper-errata.o cvmx-helper-jtag.o
+obj-$(CONFIG_CAVIUM_OCTEON_HELPER) += cvmx-helper-errata.o cvmx-helper-jtag.o
diff --git a/arch/mips/dec/Platform b/arch/mips/dec/Platform

index 3adbcbd95db1efd4dbb18fba91bc978153079d7f..cf55a6f4e720c4e831733ceab40d820fa136a2c8 100644 (file)
--- a/arch/mips/dec/Platform
+++ b/arch/mips/dec/Platform
@@ -1,7 +1,7 @@
  #
  # DECstation family
  #
-platform-$(CONFIG_MACH_DECSTATION)     = dec/
+platform-$(CONFIG_MACH_DECSTATION)     += dec/
  cflags-$(CONFIG_MACH_DECSTATION)       += \
                         -I$(srctree)/arch/mips/include/asm/mach-dec
  libs-$(CONFIG_MACH_DECSTATION)         += arch/mips/dec/prom/
diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h

index c63c56bfd18461b558e0cba7380d6297fc47115e..47d87da379f947c8a578f2a2b84e7da804f559a8 100644 (file)
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -782,6 +782,10 @@ static __inline__ int atomic64_add_unless(atomic64_t *v, long a, long u)
   */
  #define atomic64_add_negative(i, v) (atomic64_add_return(i, (v)) < 0)
  
+#else /* !CONFIG_64BIT */
+
+#include <asm-generic/atomic64.h>
+
  #endif /* CONFIG_64BIT */
  
  /*
diff --git a/arch/mips/include/asm/cop2.h b/arch/mips/include/asm/cop2.h

index 2cb2f0c2c4f89342ae5256a79d8082f01af4d319..3532e2c5f098ae46a4a79f7455cd004d699dd4a9 100644 (file)
--- a/arch/mips/include/asm/cop2.h
+++ b/arch/mips/include/asm/cop2.h
@@ -24,7 +24,7 @@ extern int cu2_notifier_call_chain(unsigned long val, void *v);
  
  #define cu2_notifier(fn, pri)                                          \
  ({                                                                     \
-       static struct notifier_block fn##_nb __cpuinitdata = {          \
+       static struct notifier_block fn##_nb = {                        \
                 .notifier_call = fn,                                    \
                 .priority = pri                                         \
         };                                                              \
diff --git a/arch/mips/include/asm/fcntl.h b/arch/mips/include/asm/fcntl.h

index e482fe90fe8850609ed05f79d4caa9423e31031a..75eddedcfc3ee31a5ba500089fb92215311f51eb 100644 (file)
--- a/arch/mips/include/asm/fcntl.h
+++ b/arch/mips/include/asm/fcntl.h
@@ -56,6 +56,7 @@
   */
  
  #ifdef CONFIG_32BIT
+#include <linux/types.h>
  
  struct flock {
         short   l_type;
diff --git a/arch/mips/include/asm/gic.h b/arch/mips/include/asm/gic.h

index 9b9436a4d816bfe7f26a1eb8a7e7cb677507c388..86548da650e765f79db345f4d3964a5d0eb6c0fe 100644 (file)
--- a/arch/mips/include/asm/gic.h
+++ b/arch/mips/include/asm/gic.h
@@ -321,6 +321,7 @@ struct gic_intrmask_regs {
   */
  struct gic_intr_map {
         unsigned int cpunum;    /* Directed to this CPU */
+#define GIC_UNUSED             0xdead                  /* Dummy data */
         unsigned int pin;       /* Directed to this Pin */
         unsigned int polarity;  /* Polarity : +/-       */
         unsigned int trigtype;  /* Trigger  : Edge/Levl */
diff --git a/arch/mips/include/asm/mach-tx49xx/kmalloc.h b/arch/mips/include/asm/mach-tx49xx/kmalloc.h

index b74caf65482b2e068b86d2763daa81733bc45503..ff9a8b86cb9363c1fb458546c56550fc35a7a3ab 100644 (file)
--- a/arch/mips/include/asm/mach-tx49xx/kmalloc.h
+++ b/arch/mips/include/asm/mach-tx49xx/kmalloc.h
@@ -1,6 +1,6 @@
  #ifndef __ASM_MACH_TX49XX_KMALLOC_H
  #define __ASM_MACH_TX49XX_KMALLOC_H
  
-#define ARCH_KMALLOC_MINALIGN  L1_CACHE_BYTES
+#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
  
  #endif /* __ASM_MACH_TX49XX_KMALLOC_H */
diff --git a/arch/mips/include/asm/mips-boards/maltaint.h b/arch/mips/include/asm/mips-boards/maltaint.h

index cea872fc6f5c0d1ae92f00f920f7f991c8b9b806..d11aa02a956a57ca41ff890dbe16bbdd0145db53 100644 (file)
--- a/arch/mips/include/asm/mips-boards/maltaint.h
+++ b/arch/mips/include/asm/mips-boards/maltaint.h
@@ -88,9 +88,6 @@
  
  #define GIC_EXT_INTR(x)                x
  
-/* Dummy data */
-#define X                      0xdead
-
  /* External Interrupts used for IPI */
  #define GIC_IPI_EXT_INTR_RESCHED_VPE0  16
  #define GIC_IPI_EXT_INTR_CALLFNC_VPE0  17
diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h

index a16beafcea91dd091f0b491ea572b5902e272a3d..e59cd1ac09c2f82eb8c91af1bdb0b6a520513d89 100644 (file)
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -150,6 +150,20 @@ typedef struct { unsigned long pgprot; } pgprot_t;
      ((unsigned long)(x) - PAGE_OFFSET + PHYS_OFFSET)
  #endif
  #define __va(x)                ((void *)((unsigned long)(x) + PAGE_OFFSET - PHYS_OFFSET))
+
+/*
+ * RELOC_HIDE was originally added by 6007b903dfe5f1d13e0c711ac2894bdd4a61b1ad
+ * (lmo) resp. 8431fd094d625b94d364fe393076ccef88e6ce18 (kernel.org).  The
+ * discussion can be found in lkml posting
+ * <a2ebde260608230500o3407b108hc03debb9da6e62c@mail.gmail.com> which is
+ * archived at http://lists.linuxcoding.com/kernel/2006-q3/msg17360.html
+ *
+ * It is unclear if the miscompilations mentioned in
+ * http://lkml.org/lkml/2010/8/8/138 also affect MIPS so we keep this one
+ * until GCC 3.x has been retired before we can apply
+ * https://patchwork.linux-mips.org/patch/1541/
+ */
+
  #define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x), 0))
  
  #define pfn_to_kaddr(pfn)      __va((pfn) << PAGE_SHIFT)
diff --git a/arch/mips/include/asm/siginfo.h b/arch/mips/include/asm/siginfo.h

index 96e28f18dad11fbc0dabf41e4fbaf25f1302f3b0..1ca64b4d33d96844da375d3df443246f781e564a 100644 (file)
--- a/arch/mips/include/asm/siginfo.h
+++ b/arch/mips/include/asm/siginfo.h
@@ -88,6 +88,7 @@ typedef struct siginfo {
  #ifdef __ARCH_SI_TRAPNO
                         int _trapno;    /* TRAP # which caused the signal */
  #endif
+                       short _addr_lsb;
                 } _sigfault;
  
                 /* SIGPOLL, SIGXFSZ (To do ...)  */
diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h

index 2376f2e06e470a264eeff5115dc692c80e9f37c3..70df9c0d3c5be20e2d7b646460276a44374fc077 100644 (file)
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -146,7 +146,8 @@ register struct thread_info *__current_thread_info __asm__("$28");
  #define _TIF_LOAD_WATCH                (1<<TIF_LOAD_WATCH)
  
  /* work to do on interrupt/exception return */
-#define _TIF_WORK_MASK         (0x0000ffef & ~_TIF_SECCOMP)
+#define _TIF_WORK_MASK         (0x0000ffef &                           \
+                                       ~(_TIF_SECCOMP | _TIF_SYSCALL_AUDIT))
  /* work to do on any return to u-space */
  #define _TIF_ALLWORK_MASK      (0x8000ffff & ~_TIF_SECCOMP)
  
diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h

index baa318a59c97f8c2791842df809c1218796c7492..550725b881d5edec666a5b32bbe1200164ac7fd8 100644 (file)
--- a/arch/mips/include/asm/unistd.h
+++ b/arch/mips/include/asm/unistd.h
@@ -356,16 +356,19 @@
  #define __NR_perf_event_open           (__NR_Linux + 333)
  #define __NR_accept4                   (__NR_Linux + 334)
  #define __NR_recvmmsg                  (__NR_Linux + 335)
+#define __NR_fanotify_init             (__NR_Linux + 336)
+#define __NR_fanotify_mark             (__NR_Linux + 337)
+#define __NR_prlimit64                 (__NR_Linux + 338)
  
  /*
   * Offset of the last Linux o32 flavoured syscall
   */
-#define __NR_Linux_syscalls            335
+#define __NR_Linux_syscalls            338
  
  #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
  
  #define __NR_O32_Linux                 4000
-#define __NR_O32_Linux_syscalls                335
+#define __NR_O32_Linux_syscalls                338
  
  #if _MIPS_SIM == _MIPS_SIM_ABI64
  
@@ -668,16 +671,19 @@
  #define __NR_perf_event_open           (__NR_Linux + 292)
  #define __NR_accept4                   (__NR_Linux + 293)
  #define __NR_recvmmsg                  (__NR_Linux + 294)
+#define __NR_fanotify_init             (__NR_Linux + 295)
+#define __NR_fanotify_mark             (__NR_Linux + 296)
+#define __NR_prlimit64                 (__NR_Linux + 297)
  
  /*
   * Offset of the last Linux 64-bit flavoured syscall
   */
-#define __NR_Linux_syscalls            294
+#define __NR_Linux_syscalls            297
  
  #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
  
  #define __NR_64_Linux                  5000
-#define __NR_64_Linux_syscalls         294
+#define __NR_64_Linux_syscalls         297
  
  #if _MIPS_SIM == _MIPS_SIM_NABI32
  
@@ -985,16 +991,19 @@
  #define __NR_accept4                   (__NR_Linux + 297)
  #define __NR_recvmmsg                  (__NR_Linux + 298)
  #define __NR_getdents64                        (__NR_Linux + 299)
+#define __NR_fanotify_init             (__NR_Linux + 300)
+#define __NR_fanotify_mark             (__NR_Linux + 301)
+#define __NR_prlimit64                 (__NR_Linux + 302)
  
  /*
   * Offset of the last N32 flavoured syscall
   */
-#define __NR_Linux_syscalls            299
+#define __NR_Linux_syscalls            302
  
  #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
  
  #define __NR_N32_Linux                 6000
-#define __NR_N32_Linux_syscalls                299
+#define __NR_N32_Linux_syscalls                302
  
  #ifdef __KERNEL__
  
diff --git a/arch/mips/jz4740/Platform b/arch/mips/jz4740/Platform

index 6a97230e3d05ee4a53478c2a4625c51d92b26bd2..ba91be9c21ef405f65e0ae87fafe92a66ac8b06f 100644 (file)
--- a/arch/mips/jz4740/Platform
+++ b/arch/mips/jz4740/Platform
@@ -1,3 +1,3 @@
-core-$(CONFIG_MACH_JZ4740)     += arch/mips/jz4740/
+platform-$(CONFIG_MACH_JZ4740) += jz4740/
  cflags-$(CONFIG_MACH_JZ4740)   += -I$(srctree)/arch/mips/include/asm/mach-jz4740
  load-$(CONFIG_MACH_JZ4740)     += 0xffffffff80010000
diff --git a/arch/mips/kernel/branch.c b/arch/mips/kernel/branch.c

index 0176ed015c895644bc72fc30bc661c2ad555b515..32103cc2a2576877592d91050b86c8944a2479d9 100644 (file)
--- a/arch/mips/kernel/branch.c
+++ b/arch/mips/kernel/branch.c
@@ -40,7 +40,6 @@ int __compute_return_epc(struct pt_regs *regs)
                 return -EFAULT;
         }
  
-       regs->regs[0] = 0;
         switch (insn.i_format.opcode) {
         /*
          * jr and jalr are in r_format format.
diff --git a/arch/mips/kernel/irq-gic.c b/arch/mips/kernel/irq-gic.c

index b181f2f0ea8e71709f331c8ee32a7f6792999106..82ba9f62f49e3b2faa98abc1f6da2fe1e062a4ed 100644 (file)
--- a/arch/mips/kernel/irq-gic.c
+++ b/arch/mips/kernel/irq-gic.c
@@ -7,7 +7,6 @@
  #include <asm/io.h>
  #include <asm/gic.h>
  #include <asm/gcmpregs.h>
-#include <asm/mips-boards/maltaint.h>
  #include <asm/irq.h>
  #include <linux/hardirq.h>
  #include <asm-generic/bitops/find.h>
@@ -131,7 +130,7 @@ static int gic_set_affinity(unsigned int irq, const struct cpumask *cpumask)
         int             i;
  
         irq -= _irqbase;
-       pr_debug(KERN_DEBUG "%s(%d) called\n", __func__, irq);
+       pr_debug("%s(%d) called\n", __func__, irq);
         cpumask_and(&tmp, cpumask, cpu_online_mask);
         if (cpus_empty(tmp))
                 return -1;
@@ -222,7 +221,7 @@ static void __init gic_basic_init(int numintrs, int numvpes,
         /* Setup specifics */
         for (i = 0; i < mapsize; i++) {
                 cpu = intrmap[i].cpunum;
-               if (cpu == X)
+               if (cpu == GIC_UNUSED)
                         continue;
                 if (cpu == 0 && i != 0 && intrmap[i].flags == 0)
                         continue;
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c

index 1f4e2fa64140ee8204aed74ecf82eba7bab056be..f4546e97c60db111215495f924aa567595c39581 100644 (file)
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -283,7 +283,7 @@ static int kgdb_mips_notify(struct notifier_block *self, unsigned long cmd,
         struct pt_regs *regs = args->regs;
         int trap = (regs->cp0_cause & 0x7c) >> 2;
  
-       /* Userpace events, ignore. */
+       /* Userspace events, ignore. */
         if (user_mode(regs))
                 return NOTIFY_DONE;
  
diff --git a/arch/mips/kernel/kspd.c b/arch/mips/kernel/kspd.c

index 80e2ba694babcd0d70bd8266a6be941996e2a8a8..29811f043399588604da9bbc00efd9f0997aa530 100644 (file)
--- a/arch/mips/kernel/kspd.c
+++ b/arch/mips/kernel/kspd.c
@@ -251,7 +251,7 @@ void sp_work_handle_request(void)
                 memset(&tz, 0, sizeof(tz));
                 if ((ret.retval = sp_syscall(__NR_gettimeofday, (int)&tv,
                                              (int)&tz, 0, 0)) == 0)
-               ret.retval = tv.tv_sec;
+                       ret.retval = tv.tv_sec;
                 break;
  
         case MTSP_SYSCALL_EXIT:
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c

index c2dab140dc98fb1588259063699c7ba09b6f8ed7..6343b4a5b8350cb3a93edea5d75f3154cde48343 100644 (file)
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -341,3 +341,10 @@ asmlinkage long sys32_lookup_dcookie(u32 a0, u32 a1, char __user *buf,
  {
         return sys_lookup_dcookie(merge_64(a0, a1), buf, len);
  }
+
+SYSCALL_DEFINE6(32_fanotify_mark, int, fanotify_fd, unsigned int, flags,
+               u64, a3, u64, a4, int, dfd, const char  __user *, pathname)
+{
+       return sys_fanotify_mark(fanotify_fd, flags, merge_64(a3, a4),
+                                dfd, pathname);
+}
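Note on the 32_fanotify_mark wrapper above: a 32-bit caller passes the 64-bit mask as two register-sized halves, and the wrapper joins them with merge_64() before calling sys_fanotify_mark(). The definition of merge_64() is not part of this diff, so the sketch below is an assumption: it models a helper whose half ordering depends on endianness, with hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed big-endian ordering: the first register holds the high word.
 * Hypothetical helper; the kernel's merge_64() is a macro not shown here.
 */
static uint64_t merge_64_be(uint32_t r1, uint32_t r2)
{
	return ((uint64_t)r1 << 32) | r2;
}

/* Assumed little-endian ordering: the first register holds the low word. */
static uint64_t merge_64_le(uint32_t r1, uint32_t r2)
{
	return ((uint64_t)r2 << 32) | r1;
}
```

Casting to uint64_t before shifting matters: shifting a 32-bit value left by 32 is undefined in C, so the widening has to happen first.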
diff --git a/arch/mips/kernel/mips-mt-fpaff.c b/arch/mips/kernel/mips-mt-fpaff.c

index 2340f11dc29cc8de689593c8b49315b613a6aae8..9a526ba6f25766f3ab6b58bc70b7f792c4011ff9 100644 (file)
--- a/arch/mips/kernel/mips-mt-fpaff.c
+++ b/arch/mips/kernel/mips-mt-fpaff.c
@@ -103,7 +103,7 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
         if (!check_same_owner(p) && !capable(CAP_SYS_NICE))
                 goto out_unlock;
  
-       retval = security_task_setscheduler(p, 0, NULL);
+       retval = security_task_setscheduler(p);
         if (retval)
                 goto out_unlock;
  
diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c

index c51b95ff86443e2fcb1eed618afe0861fd781eef..c8777333e19833667fe882110fe40d954fee5eeb 100644 (file)
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -536,7 +536,7 @@ asmlinkage void do_syscall_trace(struct pt_regs *regs, int entryexit)
  {
         /* do the secure computing check first */
         if (!entryexit)
-               secure_computing(regs->regs[0]);
+               secure_computing(regs->regs[2]);
  
         if (unlikely(current->audit_context) && entryexit)
                 audit_syscall_exit(AUDITSC_RESULT(regs->regs[2]),
@@ -565,7 +565,7 @@ asmlinkage void do_syscall_trace(struct pt_regs *regs, int entryexit)
  
  out:
         if (unlikely(current->audit_context) && !entryexit)
-               audit_syscall_entry(audit_arch(), regs->regs[0],
+               audit_syscall_entry(audit_arch(), regs->regs[2],
                                     regs->regs[4], regs->regs[5],
                                     regs->regs[6], regs->regs[7]);
  }
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S

index 17202bbe843f91172534dd30e41e8a9542e0d762..fbaabad0e6e28466aa1098d4f50c516d520b0c7d 100644 (file)
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -63,9 +63,9 @@ stack_done:
         sw      t0, PT_R7(sp)           # set error flag
         beqz    t0, 1f
  
+       lw      t1, PT_R2(sp)           # syscall number
         negu    v0                      # error
-       sw      v0, PT_R0(sp)           # set flag for syscall
-                                       # restarting
+       sw      t1, PT_R0(sp)           # save it for syscall restarting
  1:     sw      v0, PT_R2(sp)           # result
  
  o32_syscall_exit:
@@ -104,9 +104,9 @@ syscall_trace_entry:
         sw      t0, PT_R7(sp)           # set error flag
         beqz    t0, 1f
  
+       lw      t1, PT_R2(sp)           # syscall number
         negu    v0                      # error
-       sw      v0, PT_R0(sp)           # set flag for syscall
-                                       # restarting
+       sw      t1, PT_R0(sp)           # save it for syscall restarting
  1:     sw      v0, PT_R2(sp)           # result
  
         j       syscall_exit
@@ -169,8 +169,7 @@ stackargs:
          * We probably should handle this case a bit more drastic.
          */
  bad_stack:
-       negu    v0                              # error
-       sw      v0, PT_R0(sp)
+       li      v0, EFAULT
         sw      v0, PT_R2(sp)
         li      t0, 1                           # set error flag
         sw      t0, PT_R7(sp)
@@ -583,7 +582,10 @@ einval:    li      v0, -ENOSYS
         sys     sys_rt_tgsigqueueinfo   4
         sys     sys_perf_event_open     5
         sys     sys_accept4             4
-       sys     sys_recvmmsg            5
+       sys     sys_recvmmsg            5       /* 4335 */
+       sys     sys_fanotify_init       2
+       sys     sys_fanotify_mark       6
+       sys     sys_prlimit64           4
         .endm
  
         /* We pre-compute the number of _instruction_ bytes needed to
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S

index a8a6c596eb0405bab886e8dfff6ffeb8097d7fc6..3f4179283207b1cc21e7fc14d9fea4da0c38bb28 100644 (file)
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -66,9 +66,9 @@ NESTED(handle_sys64, PT_SIZE, sp)
         sd      t0, PT_R7(sp)           # set error flag
         beqz    t0, 1f
  
+       ld      t1, PT_R2(sp)           # syscall number
         dnegu   v0                      # error
-       sd      v0, PT_R0(sp)           # set flag for syscall
-                                       # restarting
+       sd      t1, PT_R0(sp)           # save it for syscall restarting
  1:     sd      v0, PT_R2(sp)           # result
  
  n64_syscall_exit:
@@ -109,8 +109,9 @@ syscall_trace_entry:
         sd      t0, PT_R7(sp)           # set error flag
         beqz    t0, 1f
  
+       ld      t1, PT_R2(sp)           # syscall number
         dnegu   v0                      # error
-       sd      v0, PT_R0(sp)           # set flag for syscall restarting
+       sd      t1, PT_R0(sp)           # save it for syscall restarting
  1:     sd      v0, PT_R2(sp)           # result
  
         j       syscall_exit
@@ -416,9 +417,12 @@ sys_call_table:
         PTR     sys_pipe2
         PTR     sys_inotify_init1
         PTR     sys_preadv
-       PTR     sys_pwritev                     /* 5390 */
+       PTR     sys_pwritev                     /* 5290 */
         PTR     sys_rt_tgsigqueueinfo
         PTR     sys_perf_event_open
         PTR     sys_accept4
-       PTR     sys_recvmmsg
+       PTR     sys_recvmmsg
+       PTR     sys_fanotify_init               /* 5295 */
+       PTR     sys_fanotify_mark
+       PTR     sys_prlimit64
         .size   sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S

index a3d66137731ac24386972c82426a4679d9ff9be3..f08ece6d8acc7f3aa78ecbca23f801ae76727f6a 100644 (file)
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -65,8 +65,9 @@ NESTED(handle_sysn32, PT_SIZE, sp)
         sd      t0, PT_R7(sp)           # set error flag
         beqz    t0, 1f
  
+       ld      t1, PT_R2(sp)           # syscall number
         dnegu   v0                      # error
-       sd      v0, PT_R0(sp)           # set flag for syscall restarting
+       sd      t1, PT_R0(sp)           # save it for syscall restarting
  1:     sd      v0, PT_R2(sp)           # result
  
         local_irq_disable               # make sure need_resched and
@@ -106,8 +107,9 @@ n32_syscall_trace_entry:
         sd      t0, PT_R7(sp)           # set error flag
         beqz    t0, 1f
  
+       ld      t1, PT_R2(sp)           # syscall number
         dnegu   v0                      # error
-       sd      v0, PT_R0(sp)           # set flag for syscall restarting
+       sd      t1, PT_R0(sp)           # save it for syscall restarting
  1:     sd      v0, PT_R2(sp)           # result
  
         j       syscall_exit
@@ -320,10 +322,10 @@ EXPORT(sysn32_call_table)
         PTR     sys_cacheflush
         PTR     sys_cachectl
         PTR     sys_sysmips
-       PTR     sys_io_setup                    /* 6200 */
+       PTR     compat_sys_io_setup                     /* 6200 */
         PTR     sys_io_destroy
-       PTR     sys_io_getevents
-       PTR     sys_io_submit
+       PTR     compat_sys_io_getevents
+       PTR     compat_sys_io_submit
         PTR     sys_io_cancel
         PTR     sys_exit_group                  /* 6205 */
         PTR     sys_lookup_dcookie
@@ -419,5 +421,8 @@ EXPORT(sysn32_call_table)
         PTR     sys_perf_event_open
         PTR     sys_accept4
         PTR     compat_sys_recvmmsg
-       PTR     sys_getdents
+       PTR     sys_getdents64
+       PTR     sys_fanotify_init               /* 6300 */
+       PTR     sys_fanotify_mark
+       PTR     sys_prlimit64
         .size   sysn32_call_table,.-sysn32_call_table
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index 813689ef23847c6a2db230ebd6dcae60e0ff79f6..78d768a3e19da78fc986e9170f73b240ee98c1c2 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -93,8 +93,9 @@ NESTED(handle_sys, PT_SIZE, sp)
         sd      t0, PT_R7(sp)           # set error flag
         beqz    t0, 1f
  
+       ld      t1, PT_R2(sp)           # syscall number
         dnegu   v0                      # error
-       sd      v0, PT_R0(sp)           # flag for syscall restarting
+       sd      t1, PT_R0(sp)           # save it for syscall restarting
  1:     sd      v0, PT_R2(sp)           # result
  
  o32_syscall_exit:
@@ -142,8 +143,9 @@ trace_a_syscall:
         sd      t0, PT_R7(sp)           # set error flag
         beqz    t0, 1f
  
+       ld      t1, PT_R2(sp)           # syscall number
         dnegu   v0                      # error
-       sd      v0, PT_R0(sp)           # set flag for syscall restarting
+       sd      t1, PT_R0(sp)           # save it for syscall restarting
  1:     sd      v0, PT_R2(sp)           # result
  
         j       syscall_exit
@@ -154,8 +156,7 @@ trace_a_syscall:
          * The stackpointer for a call with more than 4 arguments is bad.
          */
  bad_stack:
-       dnegu   v0                      # error
-       sd      v0, PT_R0(sp)
+       li      v0, EFAULT
         sd      v0, PT_R2(sp)
         li      t0, 1                   # set error flag
         sd      t0, PT_R7(sp)
@@ -444,10 +445,10 @@ sys_call_table:
         PTR     compat_sys_futex
         PTR     compat_sys_sched_setaffinity
         PTR     compat_sys_sched_getaffinity    /* 4240 */
-       PTR     sys_io_setup
+       PTR     compat_sys_io_setup
         PTR     sys_io_destroy
-       PTR     sys_io_getevents
-       PTR     sys_io_submit
+       PTR     compat_sys_io_getevents
+       PTR     compat_sys_io_submit
         PTR     sys_io_cancel                   /* 4245 */
         PTR     sys_exit_group
         PTR     sys32_lookup_dcookie
@@ -538,5 +539,8 @@ sys_call_table:
         PTR     compat_sys_rt_tgsigqueueinfo
         PTR     sys_perf_event_open
         PTR     sys_accept4
-       PTR     compat_sys_recvmmsg
+       PTR     compat_sys_recvmmsg             /* 4335 */
+       PTR     sys_fanotify_init
+       PTR     sys_32_fanotify_mark
+       PTR     sys_prlimit64
         .size   sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c
index 2099d5a4c4b78224f85ee5f9b175907be3d8f15c..5922342bca3991d4b7ab4a9d6f8483fb3e416779 100644
--- a/arch/mips/kernel/signal.c
+++ b/arch/mips/kernel/signal.c
@@ -390,7 +390,6 @@ asmlinkage void sys_rt_sigreturn(nabi_no_regargs struct pt_regs regs)
  {
         struct rt_sigframe __user *frame;
         sigset_t set;
-       stack_t st;
         int sig;
  
         frame = (struct rt_sigframe __user *) regs.regs[29];
@@ -411,11 +410,9 @@ asmlinkage void sys_rt_sigreturn(nabi_no_regargs struct pt_regs regs)
         else if (sig)
                 force_sig(sig, current);
  
-       if (__copy_from_user(&st, &frame->rs_uc.uc_stack, sizeof(st)))
-               goto badframe;
         /* It is more difficult to avoid calling this function than to
            call it and ignore errors.  */
-       do_sigaltstack((stack_t __user *)&st, NULL, regs.regs[29]);
+       do_sigaltstack(&frame->rs_uc.uc_stack, NULL, regs.regs[29]);
  
         /*
          * Don't let your children do this ...
@@ -550,23 +547,26 @@ static int handle_signal(unsigned long sig, siginfo_t *info,
         struct mips_abi *abi = current->thread.abi;
         void *vdso = current->mm->context.vdso;
  
-       switch(regs->regs[0]) {
-       case ERESTART_RESTARTBLOCK:
-       case ERESTARTNOHAND:
-               regs->regs[2] = EINTR;
-               break;
-       case ERESTARTSYS:
-               if (!(ka->sa.sa_flags & SA_RESTART)) {
+       if (regs->regs[0]) {
+               switch(regs->regs[2]) {
+               case ERESTART_RESTARTBLOCK:
+               case ERESTARTNOHAND:
                         regs->regs[2] = EINTR;
                         break;
+               case ERESTARTSYS:
+                       if (!(ka->sa.sa_flags & SA_RESTART)) {
+                               regs->regs[2] = EINTR;
+                               break;
+                       }
+               /* fallthrough */
+               case ERESTARTNOINTR:
+                       regs->regs[7] = regs->regs[26];
+                       regs->regs[2] = regs->regs[0];
+                       regs->cp0_epc -= 4;
                 }
-       /* fallthrough */
-       case ERESTARTNOINTR:            /* Userland will reload $v0.  */
-               regs->regs[7] = regs->regs[26];
-               regs->cp0_epc -= 8;
-       }
  
-       regs->regs[0] = 0;              /* Don't deal with this again.  */
+               regs->regs[0] = 0;              /* Don't deal with this again.  */
+       }
  
         if (sig_uses_siginfo(ka))
                 ret = abi->setup_rt_frame(vdso + abi->rt_signal_return_offset,
@@ -575,6 +575,9 @@ static int handle_signal(unsigned long sig, siginfo_t *info,
                 ret = abi->setup_frame(vdso + abi->signal_return_offset,
                                        ka, regs, sig, oldset);
  
+       if (ret)
+               return ret;
+
         spin_lock_irq(&current->sighand->siglock);
         sigorsets(&current->blocked, &current->blocked, &ka->sa.sa_mask);
         if (!(ka->sa.sa_flags & SA_NODEFER))
@@ -622,17 +625,13 @@ static void do_signal(struct pt_regs *regs)
                 return;
         }
  
-       /*
-        * Who's code doesn't conform to the restartable syscall convention
-        * dies here!!!  The li instruction, a single machine instruction,
-        * must directly be followed by the syscall instruction.
-        */
         if (regs->regs[0]) {
                 if (regs->regs[2] == ERESTARTNOHAND ||
                     regs->regs[2] == ERESTARTSYS ||
                     regs->regs[2] == ERESTARTNOINTR) {
+                       regs->regs[2] = regs->regs[0];
                         regs->regs[7] = regs->regs[26];
-                       regs->cp0_epc -= 8;
+                       regs->cp0_epc -= 4;
                 }
                 if (regs->regs[2] == ERESTART_RESTARTBLOCK) {
                         regs->regs[2] = current->thread.abi->restart;
diff --git a/arch/mips/kernel/signal_n32.c b/arch/mips/kernel/signal_n32.c
index 2c5df818c65ae0395768264f1192059b792aaf02..ee24d814d5b91bb474ff3ff114e49f86618cdeae 100644
--- a/arch/mips/kernel/signal_n32.c
+++ b/arch/mips/kernel/signal_n32.c
@@ -109,6 +109,7 @@ asmlinkage int sysn32_rt_sigsuspend(nabi_no_regargs struct pt_regs regs)
  asmlinkage void sysn32_rt_sigreturn(nabi_no_regargs struct pt_regs regs)
  {
         struct rt_sigframe_n32 __user *frame;
+       mm_segment_t old_fs;
         sigset_t set;
         stack_t st;
         s32 sp;
@@ -143,7 +144,11 @@ asmlinkage void sysn32_rt_sigreturn(nabi_no_regargs struct pt_regs regs)
  
         /* It is more difficult to avoid calling this function than to
            call it and ignore errors.  */
+       old_fs = get_fs();
+       set_fs(KERNEL_DS);
         do_sigaltstack((stack_t __user *)&st, NULL, regs.regs[29]);
+       set_fs(old_fs);
+
  
         /*
          * Don't let your children do this ...
diff --git a/arch/mips/kernel/unaligned.c b/arch/mips/kernel/unaligned.c
index 69b039ca8d8337e60ecead9e32fbe7bd64659a64..33d5a5ce4a29d56037a38abb346a99e0212c16c2 100644
--- a/arch/mips/kernel/unaligned.c
+++ b/arch/mips/kernel/unaligned.c
@@ -109,8 +109,6 @@ static void emulate_load_store_insn(struct pt_regs *regs,
         unsigned long value;
         unsigned int res;
  
-       regs->regs[0] = 0;
-
         /*
          * This load never faults.
          */
diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
index 7ba890860d98cb3916c84f369e3fef0200b07f2b..469d4019f795bd072b0aa4ba109d075b5377f55c 100644
--- a/arch/mips/mm/dma-default.c
+++ b/arch/mips/mm/dma-default.c
@@ -44,27 +44,39 @@ static inline int cpu_is_noncoherent_r10000(struct device *dev)
  
  static gfp_t massage_gfp_flags(const struct device *dev, gfp_t gfp)
  {
+       gfp_t dma_flag;
+
         /* ignore region specifiers */
         gfp &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
  
-#ifdef CONFIG_ZONE_DMA
+#ifdef CONFIG_ISA
         if (dev == NULL)
-               gfp |= __GFP_DMA;
-       else if (dev->coherent_dma_mask < DMA_BIT_MASK(24))
-               gfp |= __GFP_DMA;
+               dma_flag = __GFP_DMA;
         else
  #endif
-#ifdef CONFIG_ZONE_DMA32
+#if defined(CONFIG_ZONE_DMA32) && defined(CONFIG_ZONE_DMA)
              if (dev->coherent_dma_mask < DMA_BIT_MASK(32))
-               gfp |= __GFP_DMA32;
+                       dma_flag = __GFP_DMA;
+       else if (dev->coherent_dma_mask < DMA_BIT_MASK(64))
+                       dma_flag = __GFP_DMA32;
+       else
+#endif
+#if defined(CONFIG_ZONE_DMA32) && !defined(CONFIG_ZONE_DMA)
+            if (dev->coherent_dma_mask < DMA_BIT_MASK(64))
+               dma_flag = __GFP_DMA32;
+       else
+#endif
+#if defined(CONFIG_ZONE_DMA) && !defined(CONFIG_ZONE_DMA32)
+            if (dev->coherent_dma_mask < DMA_BIT_MASK(64))
+               dma_flag = __GFP_DMA;
         else
  #endif
-               ;
+               dma_flag = 0;
  
         /* Don't invoke OOM killer */
         gfp |= __GFP_NORETRY;
  
-       return gfp;
+       return gfp | dma_flag;
  }
  
  void *dma_alloc_noncoherent(struct device *dev, size_t size,
diff --git a/arch/mips/mm/sc-rm7k.c b/arch/mips/mm/sc-rm7k.c
index 1ef75cd80a0d819827f5057549fdd2622442a752..274af3be1442b42fa41d3cb960b598ddcbf5b8c2 100644
--- a/arch/mips/mm/sc-rm7k.c
+++ b/arch/mips/mm/sc-rm7k.c
@@ -30,7 +30,7 @@
  #define tc_lsize       32
  
  extern unsigned long icache_way_size, dcache_way_size;
-unsigned long tcache_size;
+static unsigned long tcache_size;
  
  #include <asm/r4kcache.h>
  
diff --git a/arch/mips/mti-malta/malta-int.c b/arch/mips/mti-malta/malta-int.c
index 15949b0be811f9718af9e2896d4bd9e947c84897..b79b24afe3a2fc67ab6687a082e44d45bf219242 100644
--- a/arch/mips/mti-malta/malta-int.c
+++ b/arch/mips/mti-malta/malta-int.c
@@ -385,6 +385,8 @@ static int __initdata msc_nr_eicirqs = ARRAY_SIZE(msc_eicirqmap);
   */
  
  #define GIC_CPU_NMI GIC_MAP_TO_NMI_MSK
+#define X GIC_UNUSED
+
  static struct gic_intr_map gic_intr_map[GIC_NUM_INTRS] = {
         { X, X,            X,           X,              0 },
         { X, X,            X,           X,              0 },
@@ -404,6 +406,7 @@ static struct gic_intr_map gic_intr_map[GIC_NUM_INTRS] = {
         { X, X,            X,           X,              0 },
         /* The remainder of this table is initialised by fill_ipi_map */
  };
+#undef X
  
  /*
   * GCMP needs to be detected before any SMP initialisation
diff --git a/arch/mips/pci/pci-rc32434.c b/arch/mips/pci/pci-rc32434.c
index 71f7d27b0d4cccf28dba3777b0a8848de53f82c4..f31218e17d3c1437f4f8ef15a7d1855ddc6a32fb 100644
--- a/arch/mips/pci/pci-rc32434.c
+++ b/arch/mips/pci/pci-rc32434.c
@@ -118,7 +118,7 @@ static int __init rc32434_pcibridge_init(void)
         if (!((pcicvalue == PCIM_H_EA) ||
               (pcicvalue == PCIM_H_IA_FIX) ||
               (pcicvalue == PCIM_H_IA_RR))) {
-               pr_err(KERN_ERR "PCI init error!!!\n");
+               pr_err("PCI init error!!!\n");
                 /* Not in Host Mode, return ERROR */
                 return -1;
         }
diff --git a/arch/mips/pnx8550/common/reset.c b/arch/mips/pnx8550/common/reset.c
index fadd8744a6bccfbf25283369608c66b16afeab04..e7a12ff304b9475c0db2e6097989c510b288a3c4 100644
--- a/arch/mips/pnx8550/common/reset.c
+++ b/arch/mips/pnx8550/common/reset.c
@@ -22,29 +22,19 @@
   */
  #include <linux/kernel.h>
  
+#include <asm/processor.h>
  #include <asm/reboot.h>
  #include <glb.h>
  
  void pnx8550_machine_restart(char *command)
  {
-       char head[] = "************* Machine restart *************";
-       char foot[] = "*******************************************";
-
-       printk("\n\n");
-       printk("%s\n", head);
-       if (command != NULL)
-               printk("* %s\n", command);
-       printk("%s\n", foot);
-
         PNX8550_RST_CTL = PNX8550_RST_DO_SW_RST;
  }
  
  void pnx8550_machine_halt(void)
  {
-       printk("*** Machine halt. (Not implemented) ***\n");
-}
-
-void pnx8550_machine_power_off(void)
-{
-       printk("*** Machine power off.  (Not implemented) ***\n");
+       while (1) {
+               if (cpu_wait)
+                       cpu_wait();
+       }
  }
diff --git a/arch/mips/pnx8550/common/setup.c b/arch/mips/pnx8550/common/setup.c
index 64246c9c875c51d09e5c3861ca0e6f1096d50ac5..43cb3945fdbfffb8b355789e237a9b27df21abef 100644
--- a/arch/mips/pnx8550/common/setup.c
+++ b/arch/mips/pnx8550/common/setup.c
@@ -44,7 +44,6 @@
  extern void __init board_setup(void);
  extern void pnx8550_machine_restart(char *);
  extern void pnx8550_machine_halt(void);
-extern void pnx8550_machine_power_off(void);
  extern struct resource ioport_resource;
  extern struct resource iomem_resource;
  extern char *prom_getcmdline(void);
@@ -100,7 +99,7 @@ void __init plat_mem_setup(void)
  
          _machine_restart = pnx8550_machine_restart;
          _machine_halt = pnx8550_machine_halt;
-        pm_power_off = pnx8550_machine_power_off;
+        pm_power_off = pnx8550_machine_halt;
  
         /* Clear the Global 2 Register, PCI Inta Output Enable Registers
            Bit 1:Enable DAC Powerdown
diff --git a/arch/mn10300/kernel/module.c b/arch/mn10300/kernel/module.c
index 6aea7fd76993b931f31f2dda76e72aecc1e31b4d..196a111e2e2937b134217356991c0ca2f68bda05 100644
--- a/arch/mn10300/kernel/module.c
+++ b/arch/mn10300/kernel/module.c
@@ -206,7 +206,7 @@ int module_finalize(const Elf_Ehdr *hdr,
                     const Elf_Shdr *sechdrs,
                     struct module *me)
  {
-       return module_bug_finalize(hdr, sechdrs, me);
+       return 0;
  }
  
  /*
@@ -214,5 +214,4 @@ int module_finalize(const Elf_Ehdr *hdr,
   */
  void module_arch_cleanup(struct module *mod)
  {
-       module_bug_cleanup(mod);
  }
diff --git a/arch/mn10300/mm/cache.c b/arch/mn10300/mm/cache.c
index 1b76719ec1c37b1686a648cd07f8c5e7baaf9ce1..9261217e8d2c5741bb500b829bbd7663859b5541 100644
--- a/arch/mn10300/mm/cache.c
+++ b/arch/mn10300/mm/cache.c
@@ -54,13 +54,30 @@ EXPORT_SYMBOL(flush_icache_page);
  void flush_icache_range(unsigned long start, unsigned long end)
  {
  #ifdef CONFIG_MN10300_CACHE_WBACK
-       unsigned long addr, size, off;
+       unsigned long addr, size, base, off;
         struct page *page;
         pgd_t *pgd;
         pud_t *pud;
         pmd_t *pmd;
         pte_t *ppte, pte;
  
+       if (end > 0x80000000UL) {
+               /* addresses above 0xa0000000 do not go through the cache */
+               if (end > 0xa0000000UL) {
+                       end = 0xa0000000UL;
+                       if (start >= end)
+                               return;
+               }
+
+               /* kernel addresses between 0x80000000 and 0x9fffffff do not
+                * require page tables, so we just map such addresses directly */
+               base = (start >= 0x80000000UL) ? start : 0x80000000UL;
+               mn10300_dcache_flush_range(base, end);
+               if (base == start)
+                       goto invalidate;
+               end = base;
+       }
+
         for (; start < end; start += size) {
                 /* work out how much of the page to flush */
                 off = start & (PAGE_SIZE - 1);
@@ -104,6 +121,7 @@ void flush_icache_range(unsigned long start, unsigned long end)
         }
  #endif
  
+invalidate:
         mn10300_icache_inv();
  }
  EXPORT_SYMBOL(flush_icache_range);
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index 159a2b81e90c630db82eb9834c2096df7eb66896..6e81bb596e5b476e598e4a7309e4aba80ba0a322 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -941,11 +941,10 @@ int module_finalize(const Elf_Ehdr *hdr,
         nsyms = newptr - (Elf_Sym *)symhdr->sh_addr;
         DEBUGP("NEW num_symtab %lu\n", nsyms);
         symhdr->sh_size = nsyms * sizeof(Elf_Sym);
-       return module_bug_finalize(hdr, sechdrs, me);
+       return 0;
  }
  
  void module_arch_cleanup(struct module *mod)
  {
         deregister_unwind_table(mod);
-       module_bug_cleanup(mod);
  }
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 477c663e014043a5c08fbaf51e82853005391344..49cee9df225be8bfc6b06a429ee9243d10484439 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -63,11 +63,6 @@ int module_finalize(const Elf_Ehdr *hdr,
                 const Elf_Shdr *sechdrs, struct module *me)
  {
         const Elf_Shdr *sect;
-       int err;
-
-       err = module_bug_finalize(hdr, sechdrs, me);
-       if (err)
-               return err;
  
         /* Apply feature fixups */
         sect = find_section(hdr, sechdrs, "__ftr_fixup");
@@ -101,5 +96,4 @@ int module_finalize(const Elf_Ehdr *hdr,
  
  void module_arch_cleanup(struct module *mod)
  {
-       module_bug_cleanup(mod);
  }
diff --git a/arch/powerpc/platforms/512x/clock.c b/arch/powerpc/platforms/512x/clock.c
index 5b243bd3eb3b699ee6a0712340c9df51db2ed948..3dc2a8d262b8731aa4995b6c4b20621f06742fef 100644
--- a/arch/powerpc/platforms/512x/clock.c
+++ b/arch/powerpc/platforms/512x/clock.c
@@ -57,7 +57,7 @@ static struct clk *mpc5121_clk_get(struct device *dev, const char *id)
         int id_match = 0;
  
         if (dev == NULL || id == NULL)
-               return NULL;
+               return clk;
  
         mutex_lock(&clocks_mutex);
         list_for_each_entry(p, &clocks, node) {
diff --git a/arch/powerpc/platforms/52xx/efika.c b/arch/powerpc/platforms/52xx/efika.c
index 45c0cb9b67e6774958c621b6e3c8055d43e46163..18c10482019811fd4fa25cbf1b0972ef87d4d0c7 100644
--- a/arch/powerpc/platforms/52xx/efika.c
+++ b/arch/powerpc/platforms/52xx/efika.c
@@ -99,7 +99,7 @@ static void __init efika_pcisetup(void)
         if (bus_range == NULL || len < 2 * sizeof(int)) {
                 printk(KERN_WARNING EFIKA_PLATFORM_NAME
                        ": Can't get bus-range for %s\n", pcictrl->full_name);
-               return;
+               goto out_put;
         }
  
         if (bus_range[1] == bus_range[0])
@@ -111,12 +111,12 @@ static void __init efika_pcisetup(void)
         printk(" controlled by %s\n", pcictrl->full_name);
         printk("\n");
  
-       hose = pcibios_alloc_controller(of_node_get(pcictrl));
+       hose = pcibios_alloc_controller(pcictrl);
         if (!hose) {
                 printk(KERN_WARNING EFIKA_PLATFORM_NAME
                        ": Can't allocate PCI controller structure for %s\n",
                        pcictrl->full_name);
-               return;
+               goto out_put;
         }
  
         hose->first_busno = bus_range[0];
@@ -124,6 +124,9 @@ static void __init efika_pcisetup(void)
         hose->ops = &rtas_pci_ops;
  
         pci_process_bridge_OF_ranges(hose, pcictrl, 0);
+       return;
+out_put:
+       of_node_put(pcictrl);
  }
  
  #else
diff --git a/arch/powerpc/platforms/52xx/mpc52xx_common.c b/arch/powerpc/platforms/52xx/mpc52xx_common.c
index 6e905314ad5d66035daf38a514adbfaa2aa60dfa..41f3a7eda1def670c1c12864de488788fc27ff98 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_common.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_common.c
@@ -325,12 +325,16 @@ int mpc5200_psc_ac97_gpio_reset(int psc_number)
         clrbits32(&simple_gpio->simple_dvo, sync | out);
         clrbits8(&wkup_gpio->wkup_dvo, reset);
  
-       /* wait at lease 1 us */
-       udelay(2);
+       /* wait for 1 us */
+       udelay(1);
  
         /* Deassert reset */
         setbits8(&wkup_gpio->wkup_dvo, reset);
  
+       /* wait at least 200ns */
+       /* 7 ~= (200ns * timebase) / ns2sec */
+       __delay(7);
+
         /* Restore pin-muxing */
         out_be32(&simple_gpio->port_config, mux);
  
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 22cfd634c35531b7f8a1d4057f92b4af72e20d1a..f7167ee4604cf7033e30eb3845aa8f9ba9fd538b 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -407,10 +407,9 @@ int module_finalize(const Elf_Ehdr *hdr,
  {
         vfree(me->arch.syminfo);
         me->arch.syminfo = NULL;
-       return module_bug_finalize(hdr, sechdrs, me);
+       return 0;
  }
  
  void module_arch_cleanup(struct module *mod)
  {
-       module_bug_cleanup(mod);
  }
diff --git a/arch/sh/kernel/module.c b/arch/sh/kernel/module.c
index 43adddfe4c04b6d2eee9acfa8cad99a7788dfcc4..ae0be697a89e4b220f527a2bdb34f4da7ec7d318 100644
--- a/arch/sh/kernel/module.c
+++ b/arch/sh/kernel/module.c
@@ -149,13 +149,11 @@ int module_finalize(const Elf_Ehdr *hdr,
         int ret = 0;
  
         ret |= module_dwarf_finalize(hdr, sechdrs, me);
-       ret |= module_bug_finalize(hdr, sechdrs, me);
  
         return ret;
  }
  
  void module_arch_cleanup(struct module *mod)
  {
-       module_bug_cleanup(mod);
         module_dwarf_cleanup(mod);
  }
diff --git a/arch/um/drivers/hostaudio_kern.c b/arch/um/drivers/hostaudio_kern.c
index 0c46e398cd8f313d89a3ff07187916aa6021b93f..63c740a85b4cca0ed9333091593885cbdfface2e 100644
--- a/arch/um/drivers/hostaudio_kern.c
+++ b/arch/um/drivers/hostaudio_kern.c
@@ -40,6 +40,11 @@ static char *mixer = HOSTAUDIO_DEV_MIXER;
  "    This is used to specify the host mixer device to the hostaudio driver.\n"\
  "    The default is \"" HOSTAUDIO_DEV_MIXER "\".\n\n"
  
+module_param(dsp, charp, 0644);
+MODULE_PARM_DESC(dsp, DSP_HELP);
+module_param(mixer, charp, 0644);
+MODULE_PARM_DESC(mixer, MIXER_HELP);
+
  #ifndef MODULE
  static int set_dsp(char *name, int *add)
  {
@@ -56,15 +61,6 @@ static int set_mixer(char *name, int *add)
  }
  
  __uml_setup("mixer=", set_mixer, "mixer=<mixer device>\n" MIXER_HELP);
-
-#else /*MODULE*/
-
-module_param(dsp, charp, 0644);
-MODULE_PARM_DESC(dsp, DSP_HELP);
-
-module_param(mixer, charp, 0644);
-MODULE_PARM_DESC(mixer, MIXER_HELP);
-
  #endif
  
  /* /dev/dsp file operations */
diff --git a/arch/um/drivers/net_kern.c b/arch/um/drivers/net_kern.c
index 2ab233ba32c1564f8323884017108b0d53978366..47d0c37897d5874d0bfb95d3d2b87441bd2df6a0 100644
--- a/arch/um/drivers/net_kern.c
+++ b/arch/um/drivers/net_kern.c
@@ -255,18 +255,6 @@ static void uml_net_tx_timeout(struct net_device *dev)
         netif_wake_queue(dev);
  }
  
-static int uml_net_set_mac(struct net_device *dev, void *addr)
-{
-       struct uml_net_private *lp = netdev_priv(dev);
-       struct sockaddr *hwaddr = addr;
-
-       spin_lock_irq(&lp->lock);
-       eth_mac_addr(dev, hwaddr->sa_data);
-       spin_unlock_irq(&lp->lock);
-
-       return 0;
-}
-
  static int uml_net_change_mtu(struct net_device *dev, int new_mtu)
  {
         dev->mtu = new_mtu;
@@ -373,7 +361,7 @@ static const struct net_device_ops uml_netdev_ops = {
         .ndo_start_xmit         = uml_net_start_xmit,
         .ndo_set_multicast_list = uml_net_set_multicast_list,
         .ndo_tx_timeout         = uml_net_tx_timeout,
-       .ndo_set_mac_address    = uml_net_set_mac,
+       .ndo_set_mac_address    = eth_mac_addr,
         .ndo_change_mtu         = uml_net_change_mtu,
         .ndo_validate_addr      = eth_validate_addr,
  };
@@ -472,7 +460,8 @@ static void eth_configure(int n, void *init, char *mac,
             ((*transport->user->init)(&lp->user, dev) != 0))
                 goto out_unregister;
  
-       eth_mac_addr(dev, device->mac);
+       /* don't use eth_mac_addr, it will not work here */
+       memcpy(dev->dev_addr, device->mac, ETH_ALEN);
         dev->mtu = transport->user->mtu;
         dev->netdev_ops = &uml_netdev_ops;
         dev->ethtool_ops = &uml_net_ethtool_ops;
diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 1bcd208c459f609ab3634f951e1c11b3b2e50038..9734994cba1e86c53f60dead72952f7feb3ab02d 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -163,6 +163,7 @@ struct ubd {
         struct scatterlist sg[MAX_SG];
         struct request *request;
         int start_sg, end_sg;
+       sector_t rq_pos;
  };
  
  #define DEFAULT_COW { \
@@ -187,6 +188,7 @@ struct ubd {
         .request =              NULL, \
         .start_sg =             0, \
         .end_sg =               0, \
+       .rq_pos =               0, \
  }
  
  /* Protected by ubd_lock */
@@ -1228,7 +1230,6 @@ static void do_ubd_request(struct request_queue *q)
  {
         struct io_thread_req *io_req;
         struct request *req;
-       sector_t sector;
         int n;
  
         while(1){
@@ -1239,12 +1240,12 @@ static void do_ubd_request(struct request_queue *q)
                                 return;
  
                         dev->request = req;
+                       dev->rq_pos = blk_rq_pos(req);
                         dev->start_sg = 0;
                         dev->end_sg = blk_rq_map_sg(q, req, dev->sg);
                 }
  
                 req = dev->request;
-               sector = blk_rq_pos(req);
                 while(dev->start_sg < dev->end_sg){
                         struct scatterlist *sg = &dev->sg[dev->start_sg];
  
@@ -1256,10 +1257,9 @@ static void do_ubd_request(struct request_queue *q)
                                 return;
                         }
                         prepare_request(req, io_req,
-                                       (unsigned long long)sector << 9,
+                                       (unsigned long long)dev->rq_pos << 9,
                                         sg->offset, sg->length, sg_page(sg));
  
-                       sector += sg->length >> 9;
                         n = os_write_file(thread_fd, &io_req,
                                           sizeof(struct io_thread_req *));
                         if(n != sizeof(struct io_thread_req *)){
@@ -1272,6 +1272,7 @@ static void do_ubd_request(struct request_queue *q)
                                 return;
                         }
  
+                       dev->rq_pos += sg->length >> 9;
                         dev->start_sg++;
                 }
                 dev->end_sg = 0;
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 0350311906ae731ca91e9ecedbc3e7acbaa668d0..2d93bdbc9ac026f2c0ef1e3fcd9c9a208b609787 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -34,7 +34,7 @@
  #include <asm/ia32.h>
  
  #undef WARN_OLD
-#undef CORE_DUMP /* probably broken */
+#undef CORE_DUMP /* definitely broken */
  
  static int load_aout_binary(struct linux_binprm *, struct pt_regs *regs);
  static int load_aout_library(struct file *);
@@ -131,21 +131,15 @@ static void set_brk(unsigned long start, unsigned long end)
   * macros to write out all the necessary info.
   */
  
-static int dump_write(struct file *file, const void *addr, int nr)
-{
-       return file->f_op->write(file, addr, nr, &file->f_pos) == nr;
-}
+#include <linux/coredump.h>
  
  #define DUMP_WRITE(addr, nr)                        \
         if (!dump_write(file, (void *)(addr), (nr))) \
                 goto end_coredump;
  
-#define DUMP_SEEK(offset)                                              \
-       if (file->f_op->llseek) {                                       \
-               if (file->f_op->llseek(file, (offset), 0) != (offset))  \
-                       goto end_coredump;                              \
-       } else                                                          \
-               file->f_pos = (offset)
+#define DUMP_SEEK(offset)              \
+       if (!dump_seek(file, offset))   \
+               goto end_coredump;
  
  #define START_DATA()   (u.u_tsize << PAGE_SHIFT)
  #define START_STACK(u) (u.start_stack)
@@ -217,12 +211,6 @@ static int aout_core_dump(long signr, struct pt_regs *regs, struct file *file,
                 dump_size = dump.u_ssize << PAGE_SHIFT;
                 DUMP_WRITE(dump_start, dump_size);
         }
-       /*
-        * Finally dump the task struct.  Not be used by gdb, but
-        * could be useful
-        */
-       set_fs(KERNEL_DS);
-       DUMP_WRITE(current, sizeof(*current));
  end_coredump:
         set_fs(fs);
         return has_dumped;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 502e53f999cf28a25cc2b00f1766bd1ffb2f9ed2..c52e2eb40a1e254339658481621be634b427a8f0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -652,20 +652,6 @@ static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
         return (struct kvm_mmu_page *)page_private(page);
  }
  
-static inline u16 kvm_read_fs(void)
-{
-       u16 seg;
-       asm("mov %%fs, %0" : "=g"(seg));
-       return seg;
-}
-
-static inline u16 kvm_read_gs(void)
-{
-       u16 seg;
-       asm("mov %%gs, %0" : "=g"(seg));
-       return seg;
-}
-
  static inline u16 kvm_read_ldt(void)
  {
         u16 ldt;
@@ -673,16 +659,6 @@ static inline u16 kvm_read_ldt(void)
         return ldt;
  }
  
-static inline void kvm_load_fs(u16 sel)
-{
-       asm("mov %0, %%fs" : : "rm"(sel));
-}
-
-static inline void kvm_load_gs(u16 sel)
-{
-       asm("mov %0, %%gs" : : "rm"(sel));
-}
-
  static inline void kvm_load_ldt(u16 sel)
  {
         asm("lldt %0" : : "rm"(sel));
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index fb7a5f052e2b8766d11115e3f7fc174fadf6ac2f..fb16f17e59bea7dcde91e6ad6275c7bad5bbdfb8 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -61,7 +61,7 @@ struct cstate_entry {
                 unsigned int ecx;
         } states[ACPI_PROCESSOR_MAX_POWER];
  };
-static struct cstate_entry *cpu_cstate_entry;  /* per CPU ptr */
+static struct cstate_entry __percpu *cpu_cstate_entry; /* per CPU ptr */
  
  static short mwait_supported[ACPI_PROCESSOR_MAX_POWER];
  
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index f1efebaf55105fa835ac7938c1654295fdd81562..5c5b8f3dddb58686ba4afc8b314237c0c318370b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -306,14 +306,19 @@ void arch_init_copy_chip_data(struct irq_desc *old_desc,
  
         old_cfg = old_desc->chip_data;
  
-       memcpy(cfg, old_cfg, sizeof(struct irq_cfg));
+       cfg->vector = old_cfg->vector;
+       cfg->move_in_progress = old_cfg->move_in_progress;
+       cpumask_copy(cfg->domain, old_cfg->domain);
+       cpumask_copy(cfg->old_domain, old_cfg->old_domain);
  
         init_copy_irq_2_pin(old_cfg, cfg, node);
  }
  
-static void free_irq_cfg(struct irq_cfg *old_cfg)
+static void free_irq_cfg(struct irq_cfg *cfg)
  {
-       kfree(old_cfg);
+       free_cpumask_var(cfg->domain);
+       free_cpumask_var(cfg->old_domain);
+       kfree(cfg);
  }
  
  void arch_free_chip_data(struct irq_desc *old_desc, struct irq_desc *desc)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 490dac63c2d21e90ab3af70c543b012f1c3f43b5..f2f9ac7da25ccfba6d5ba7ea44b63899ba672e97 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -545,7 +545,7 @@ void __cpuinit cpu_detect(struct cpuinfo_x86 *c)
         }
  }
  
-static void __cpuinit get_cpu_cap(struct cpuinfo_x86 *c)
+void __cpuinit get_cpu_cap(struct cpuinfo_x86 *c)
  {
         u32 tfms, xlvl;
         u32 ebx;
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 3624e8a0f71bf72e4c3cd86abcedfbb1c53d9b4d..f668bb1f7d432811bc3f773d777ca2596b6f6e33 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -33,5 +33,6 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[],
                             *const __x86_cpu_dev_end[];
  
  extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
+extern void get_cpu_cap(struct cpuinfo_x86 *c);
  
  #endif
diff --git a/arch/x86/kernel/cpu/cpufreq/pcc-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/pcc-cpufreq.c
index 994230d4dc4e545986a8c982629589905f0f224d..4f6f679f27990198640f9a1e139ea3a838b05eb8 100644
--- a/arch/x86/kernel/cpu/cpufreq/pcc-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/pcc-cpufreq.c
@@ -368,16 +368,22 @@ static int __init pcc_cpufreq_do_osc(acpi_handle *handle)
                 return -ENODEV;
  
         out_obj = output.pointer;
-       if (out_obj->type != ACPI_TYPE_BUFFER)
-               return -ENODEV;
+       if (out_obj->type != ACPI_TYPE_BUFFER) {
+               ret = -ENODEV;
+               goto out_free;
+       }
  
         errors = *((u32 *)out_obj->buffer.pointer) & ~(1 << 0);
-       if (errors)
-               return -ENODEV;
+       if (errors) {
+               ret = -ENODEV;
+               goto out_free;
+       }
  
         supported = *((u32 *)(out_obj->buffer.pointer + 4));
-       if (!(supported & 0x1))
-               return -ENODEV;
+       if (!(supported & 0x1)) {
+               ret = -ENODEV;
+               goto out_free;
+       }
  
  out_free:
         kfree(output.pointer);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 85f69cdeae1020a18e1c9097c8da276a8e50b992..b4389441efbbd8e791289aa936369e5e73917c18 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -39,6 +39,7 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
                         misc_enable &= ~MSR_IA32_MISC_ENABLE_LIMIT_CPUID;
                         wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
                         c->cpuid_level = cpuid_eax(0);
+                       get_cpu_cap(c);
                 }
         }
  
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 5e975298fa819ff3ca648e647f531700edb1f408..39aaee5c1ab23f742dd749500ca09834e0d39bc4 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -141,6 +141,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
                                 address = (low & MASK_BLKPTR_LO) >> 21;
                                 if (!address)
                                         break;
+
                                 address += MCG_XBLK_ADDR;
                         } else
                                 ++address;
@@ -148,12 +149,8 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
                         if (rdmsr_safe(address, &low, &high))
                                 break;
  
-                       if (!(high & MASK_VALID_HI)) {
-                               if (block)
-                                       continue;
-                               else
-                                       break;
-                       }
+                       if (!(high & MASK_VALID_HI))
+                               continue;
  
                         if (!(high & MASK_CNTP_HI)  ||
                              (high & MASK_LOCKED_HI))
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index d9368eeda3090eb9f53482704e000e600b6237e3..169d8804a9f8eed37faf06503e1b8243a566cb37 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -216,7 +216,7 @@ static __cpuinit int thermal_throttle_add_dev(struct sys_device *sys_dev,
                 err = sysfs_add_file_to_group(&sys_dev->kobj,
                                               &attr_core_power_limit_count.attr,
                                               thermal_attr_group.name);
-       if (cpu_has(c, X86_FEATURE_PTS))
+       if (cpu_has(c, X86_FEATURE_PTS)) {
                 err = sysfs_add_file_to_group(&sys_dev->kobj,
                                               &attr_package_throttle_count.attr,
                                               thermal_attr_group.name);
@@ -224,6 +224,7 @@ static __cpuinit int thermal_throttle_add_dev(struct sys_device *sys_dev,
                         err = sysfs_add_file_to_group(&sys_dev->kobj,
                                         &attr_package_power_limit_count.attr,
                                         thermal_attr_group.name);
+       }
  
         return err;
  }
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index b560db3305be16ff954fc416137d17b17189b5e7..2490151739921e074085a5d4a39a6cfb783f7187 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -660,8 +660,12 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
         for (idx = 0; idx < x86_pmu.num_counters; idx++) {
                 int overflow;
  
-               if (!test_bit(idx, cpuc->active_mask))
+               if (!test_bit(idx, cpuc->active_mask)) {
+                       /* catch in-flight IRQs */
+                       if (__test_and_clear_bit(idx, cpuc->running))
+                               handled++;
                         continue;
+               }
  
                 event = cpuc->events[idx];
                 hwc = &event->hw;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 410fdb3f1939ba98cfabea61071d0ffe403ef1a3..7494999141b3ae9c9ab67302481a2e6c15e969e6 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -506,7 +506,7 @@ static int hpet_assign_irq(struct hpet_dev *dev)
  {
         unsigned int irq;
  
-       irq = create_irq();
+       irq = create_irq_nr(0, -1);
         if (!irq)
                 return -EINVAL;
  
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index e0bc186d7501f123265ae288ae071e772016e89b..1c355c550960ab75279c644b52269cd5a368239a 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -239,11 +239,10 @@ int module_finalize(const Elf_Ehdr *hdr,
                 apply_paravirt(pseg, pseg + para->sh_size);
         }
  
-       return module_bug_finalize(hdr, sechdrs, me);
+       return 0;
  }
  
  void module_arch_cleanup(struct module *mod)
  {
         alternatives_smp_module_del(mod);
-       module_bug_cleanup(mod);
  }
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bc5b9b8d4a33117259882835bfb884f4f8f37656..8a3f9f64f86f9e7fee5bc5112bf50a04fbe37b15 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -766,7 +766,6 @@ static void init_vmcb(struct vcpu_svm *svm)
  
         control->iopm_base_pa = iopm_base;
         control->msrpm_base_pa = __pa(svm->msrpm);
-       control->tsc_offset = 0;
         control->int_ctl = V_INTR_MASKING_MASK;
  
         init_seg(&save->es);
@@ -902,6 +901,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
         svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
         svm->asid_generation = 0;
         init_vmcb(svm);
+       svm->vmcb->control.tsc_offset = 0-native_read_tsc();
  
         err = fx_init(&svm->vcpu);
         if (err)
@@ -3163,8 +3163,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
         sync_lapic_to_cr8(vcpu);
  
         save_host_msrs(vcpu);
-       fs_selector = kvm_read_fs();
-       gs_selector = kvm_read_gs();
+       savesegment(fs, fs_selector);
+       savesegment(gs, gs_selector);
         ldt_selector = kvm_read_ldt();
         svm->vmcb->save.cr2 = vcpu->arch.cr2;
         /* required for live migration with NPT */
@@ -3251,10 +3251,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
         vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
         vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
  
-       kvm_load_fs(fs_selector);
-       kvm_load_gs(gs_selector);
-       kvm_load_ldt(ldt_selector);
         load_host_msrs(vcpu);
+       loadsegment(fs, fs_selector);
+#ifdef CONFIG_X86_64
+       load_gs_index(gs_selector);
+       wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs);
+#else
+       loadsegment(gs, gs_selector);
+#endif
+       kvm_load_ldt(ldt_selector);
  
         reload_tss(vcpu);
  
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 49b25eee25acc075538a411fc24c23a326f02fd4..7bddfab120139435d1d10885520cc5293fefcc07 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -803,7 +803,7 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
          */
         vmx->host_state.ldt_sel = kvm_read_ldt();
         vmx->host_state.gs_ldt_reload_needed = vmx->host_state.ldt_sel;
-       vmx->host_state.fs_sel = kvm_read_fs();
+       savesegment(fs, vmx->host_state.fs_sel);
         if (!(vmx->host_state.fs_sel & 7)) {
                 vmcs_write16(HOST_FS_SELECTOR, vmx->host_state.fs_sel);
                 vmx->host_state.fs_reload_needed = 0;
@@ -811,7 +811,7 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
                 vmcs_write16(HOST_FS_SELECTOR, 0);
                 vmx->host_state.fs_reload_needed = 1;
         }
-       vmx->host_state.gs_sel = kvm_read_gs();
+       savesegment(gs, vmx->host_state.gs_sel);
         if (!(vmx->host_state.gs_sel & 7))
                 vmcs_write16(HOST_GS_SELECTOR, vmx->host_state.gs_sel);
         else {
@@ -841,27 +841,21 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
  
  static void __vmx_load_host_state(struct vcpu_vmx *vmx)
  {
-       unsigned long flags;
-
         if (!vmx->host_state.loaded)
                 return;
  
         ++vmx->vcpu.stat.host_state_reload;
         vmx->host_state.loaded = 0;
         if (vmx->host_state.fs_reload_needed)
-               kvm_load_fs(vmx->host_state.fs_sel);
+               loadsegment(fs, vmx->host_state.fs_sel);
         if (vmx->host_state.gs_ldt_reload_needed) {
                 kvm_load_ldt(vmx->host_state.ldt_sel);
-               /*
-                * If we have to reload gs, we must take care to
-                * preserve our gs base.
-                */
-               local_irq_save(flags);
-               kvm_load_gs(vmx->host_state.gs_sel);
  #ifdef CONFIG_X86_64
-               wrmsrl(MSR_GS_BASE, vmcs_readl(HOST_GS_BASE));
+               load_gs_index(vmx->host_state.gs_sel);
+               wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs);
+#else
+               loadsegment(gs, vmx->host_state.gs_sel);
  #endif
-               local_irq_restore(flags);
         }
         reload_tss();
  #ifdef CONFIG_X86_64
@@ -2589,8 +2583,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
         vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS);  /* 22.2.4 */
         vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
         vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
-       vmcs_write16(HOST_FS_SELECTOR, kvm_read_fs());    /* 22.2.4 */
-       vmcs_write16(HOST_GS_SELECTOR, kvm_read_gs());    /* 22.2.4 */
+       vmcs_write16(HOST_FS_SELECTOR, 0);            /* 22.2.4 */
+       vmcs_write16(HOST_GS_SELECTOR, 0);            /* 22.2.4 */
         vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
  #ifdef CONFIG_X86_64
         rdmsrl(MSR_FS_BASE, a);
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index f9897f7a9ef1e25cfa81e9190a3449bec9e35f70..9c0d0d399c307678a09668155d9067139ab30416 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -420,9 +420,11 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
                 return -1;
         }
  
-       for_each_node_mask(i, nodes_parsed)
-               e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
-                                               nodes[i].end >> PAGE_SHIFT);
+       for (i = 0; i < num_node_memblks; i++)
+               e820_register_active_regions(memblk_nodeid[i],
+                               node_memblk_range[i].start >> PAGE_SHIFT,
+                               node_memblk_range[i].end >> PAGE_SHIFT);
+
         /* for out of order entries in SRAT */
         sort_node_map();
         if (!nodes_cover_memory(nodes)) {
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 009b819f48d0a9fee5d3677e343056fe8d9f4c79..f1575c9a2572074cad464ae2f9c51ee88e63bce3 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -674,6 +674,7 @@ static int __init ppro_init(char **cpu_type)
         case 0x0f:
         case 0x16:
         case 0x17:
+       case 0x1d:
                 *cpu_type = "i386/core_2";
                 break;
         case 0x1a:
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 1a5353a753fcd10e1330c03d1af43a6fe9c5a5b7..b2bb5aa3b0540e42847a7664aa529cf5d2c0fb83 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -489,8 +489,9 @@ static void xen_hvm_setup_cpu_clockevents(void)
  __init void xen_hvm_init_time_ops(void)
  {
         /* vector callback is needed otherwise we cannot receive interrupts
-        * on cpu > 0 */
-       if (!xen_have_vector_callback && num_present_cpus() > 1)
+        * on cpu > 0 and at this point we don't know how many cpus are
+        * available */
+       if (!xen_have_vector_callback)
                 return;
         if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
                 printk(KERN_INFO "Xen doesn't support pvclock on HVM,"
diff --git a/block/bsg.c b/block/bsg.c
index 82d58829ba591eeef13a3ed4eca96b8a019c4d85..0c00870553a3fca2c58c325b780b92689fc677a6 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -426,7 +426,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
         /*
          * fill in all the output members
          */
-       hdr->device_status = status_byte(rq->errors);
+       hdr->device_status = rq->errors & 0xff;
         hdr->transport_status = host_byte(rq->errors);
         hdr->driver_status = driver_byte(rq->errors);
         hdr->info = 0;
diff --git a/block/elevator.c b/block/elevator.c
index 205b09a5bd9ead9e64d7bcfb60cec924f1a22cbd..4e11559aa2b02d78c4d90c104e81e7a2e84aa05d 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -938,6 +938,7 @@ int elv_register_queue(struct request_queue *q)
                         }
                 }
                 kobject_uevent(&e->kobj, KOBJ_ADD);
+               e->registered = 1;
         }
         return error;
  }
@@ -947,6 +948,7 @@ static void __elv_unregister_queue(struct elevator_queue *e)
  {
         kobject_uevent(&e->kobj, KOBJ_REMOVE);
         kobject_del(&e->kobj);
+       e->registered = 0;
  }
  
  void elv_unregister_queue(struct request_queue *q)
@@ -1042,11 +1044,13 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
  
         spin_unlock_irq(q->queue_lock);
  
-       __elv_unregister_queue(old_elevator);
+       if (old_elevator->registered) {
+               __elv_unregister_queue(old_elevator);
  
-       err = elv_register_queue(q);
-       if (err)
-               goto fail_register;
+               err = elv_register_queue(q);
+               if (err)
+                       goto fail_register;
+       }
  
         /*
          * finally exit old elevator and turn off BYPASS.
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index b811f2173f6f167c84700040fe2720bafb43156f..88681aca88c581891399e18b056d9a6df94a9e01 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -105,7 +105,7 @@ config ACPI_EC_DEBUGFS
  
           Be aware that using this interface can confuse your Embedded
           Controller in a way that a normal reboot is not enough. You then
-         have to power of your system, and remove the laptop battery for
+         have to power off your system, and remove the laptop battery for
           some seconds.
           An Embedded Controller typically is available on laptops and reads
           sensor values like battery state and temperature.
diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
index b76848c80be34729c56bd20d04f75f9f534cbb36..6b115f6c43133c211acd79c68a83785506a4357a 100644
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -382,31 +382,32 @@ static void acpi_pad_remove_sysfs(struct acpi_device *device)
         device_remove_file(&device->dev, &dev_attr_rrtime);
  }
  
-/* Query firmware how many CPUs should be idle */
-static int acpi_pad_pur(acpi_handle handle, int *num_cpus)
+/*
+ * Query firmware how many CPUs should be idle
+ * return -1 on failure
+ */
+static int acpi_pad_pur(acpi_handle handle)
  {
         struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER, NULL};
         union acpi_object *package;
-       int rev, num, ret = -EINVAL;
+       int num = -1;
  
         if (ACPI_FAILURE(acpi_evaluate_object(handle, "_PUR", NULL, &buffer)))
-               return -EINVAL;
+               return num;
  
         if (!buffer.length || !buffer.pointer)
-               return -EINVAL;
+               return num;
  
         package = buffer.pointer;
-       if (package->type != ACPI_TYPE_PACKAGE || package->package.count != 2)
-               goto out;
-       rev = package->package.elements[0].integer.value;
-       num = package->package.elements[1].integer.value;
-       if (rev != 1 || num < 0)
-               goto out;
-       *num_cpus = num;
-       ret = 0;
-out:
+
+       if (package->type == ACPI_TYPE_PACKAGE &&
+               package->package.count == 2 &&
+               package->package.elements[0].integer.value == 1) /* rev 1 */
+
+               num = package->package.elements[1].integer.value;
+
         kfree(buffer.pointer);
-       return ret;
+       return num;
  }
  
  /* Notify firmware how many CPUs are idle */
@@ -433,7 +434,8 @@ static void acpi_pad_handle_notify(acpi_handle handle)
         uint32_t idle_cpus;
  
         mutex_lock(&isolated_cpus_lock);
-       if (acpi_pad_pur(handle, &num_cpus)) {
+       num_cpus = acpi_pad_pur(handle);
+       if (num_cpus < 0) {
                 mutex_unlock(&isolated_cpus_lock);
                 return;
         }
diff --git a/drivers/acpi/acpica/aclocal.h b/drivers/acpi/acpica/aclocal.h
index df85b53a674fc33105fa1434b3800db7ab26a423..7dad9160f20998112cc302d0a239c9fa27fcd429 100644
--- a/drivers/acpi/acpica/aclocal.h
+++ b/drivers/acpi/acpica/aclocal.h
@@ -854,6 +854,7 @@ struct acpi_bit_register_info {
         ACPI_BITMASK_POWER_BUTTON_STATUS   | \
         ACPI_BITMASK_SLEEP_BUTTON_STATUS   | \
         ACPI_BITMASK_RT_CLOCK_STATUS       | \
+       ACPI_BITMASK_PCIEXP_WAKE_DISABLE   | \
         ACPI_BITMASK_WAKE_STATUS)
  
  #define ACPI_BITMASK_TIMER_ENABLE               0x0001
diff --git a/drivers/acpi/acpica/exutils.c b/drivers/acpi/acpica/exutils.c
index 74c24d517f81768a6a00418db71c9f93be335b2e..4093522eed45692012b5da617c57187ef20c4458 100644
--- a/drivers/acpi/acpica/exutils.c
+++ b/drivers/acpi/acpica/exutils.c
@@ -109,7 +109,7 @@ void acpi_ex_enter_interpreter(void)
   *
   * DESCRIPTION: Reacquire the interpreter execution region from within the
   *              interpreter code. Failure to enter the interpreter region is a
- *              fatal system error. Used in  conjuction with
+ *              fatal system error. Used in  conjunction with
   *              relinquish_interpreter
   *
   ******************************************************************************/
diff --git a/drivers/acpi/acpica/rsutils.c b/drivers/acpi/acpica/rsutils.c
index 22cfcfbd9fff77cd4cd5bb77be80c244461b2df5..491191e6cf692bffe1811a22674a0eb0720c3771 100644
--- a/drivers/acpi/acpica/rsutils.c
+++ b/drivers/acpi/acpica/rsutils.c
@@ -149,7 +149,7 @@ acpi_rs_move_data(void *destination, void *source, u16 item_count, u8 move_type)
  
                         /*
                          * 16-, 32-, and 64-bit cases must use the move macros that perform
-                        * endian conversion and/or accomodate hardware that cannot perform
+                        * endian conversion and/or accommodate hardware that cannot perform
                          * misaligned memory transfers
                          */
                 case ACPI_RSC_MOVE16:
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 907e350f1c7df58370cb0e68a399b91d903ec466..fca34ccfd294a782e3f05312d99d5b5fdef5bfb4 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -34,6 +34,6 @@ config ACPI_APEI_ERST_DEBUG
         depends on ACPI_APEI
         help
           ERST is a way provided by APEI to save and retrieve hardware
-         error infomation to and from a persistent store. Enable this
+         error information to and from a persistent store. Enable this
           if you want to debugging and testing the ERST kernel support
           and firmware implementation.
diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
index 73fd0c7487c1ae0b1311f4d7e1c048ddc809153a..4a904a4bf05f83a1b952ec2365811f8c5db931e1 100644
--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -445,11 +445,15 @@ EXPORT_SYMBOL_GPL(apei_resources_sub);
  int apei_resources_request(struct apei_resources *resources,
                            const char *desc)
  {
-       struct apei_res *res, *res_bak;
+       struct apei_res *res, *res_bak = NULL;
         struct resource *r;
+       int rc;
  
-       apei_resources_sub(resources, &apei_resources_all);
+       rc = apei_resources_sub(resources, &apei_resources_all);
+       if (rc)
+               return rc;
  
+       rc = -EINVAL;
         list_for_each_entry(res, &resources->iomem, list) {
                 r = request_mem_region(res->start, res->end - res->start,
                                        desc);
@@ -475,7 +479,11 @@ int apei_resources_request(struct apei_resources *resources,
                 }
         }
  
-       apei_resources_merge(&apei_resources_all, resources);
+       rc = apei_resources_merge(&apei_resources_all, resources);
+       if (rc) {
+               pr_err(APEI_PFX "Fail to merge resources!\n");
+               goto err_unmap_ioport;
+       }
  
         return 0;
  err_unmap_ioport:
@@ -491,12 +499,13 @@ err_unmap_iomem:
                         break;
                 release_mem_region(res->start, res->end - res->start);
         }
-       return -EINVAL;
+       return rc;
  }
  EXPORT_SYMBOL_GPL(apei_resources_request);
  
  void apei_resources_release(struct apei_resources *resources)
  {
+       int rc;
         struct apei_res *res;
  
         list_for_each_entry(res, &resources->iomem, list)
@@ -504,7 +513,9 @@ void apei_resources_release(struct apei_resources *resources)
         list_for_each_entry(res, &resources->ioport, list)
                 release_region(res->start, res->end - res->start);
  
-       apei_resources_sub(&apei_resources_all, resources);
+       rc = apei_resources_sub(&apei_resources_all, resources);
+       if (rc)
+               pr_err(APEI_PFX "Fail to sub resources!\n");
  }
  EXPORT_SYMBOL_GPL(apei_resources_release);
  
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 465c885938ee89a45db10f1f39d1ceb8cba41cf9..cf29df69380b8dd32585a074c53efc7af7a65471 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -426,7 +426,9 @@ DEFINE_SIMPLE_ATTRIBUTE(error_inject_fops, NULL,
  
  static int einj_check_table(struct acpi_table_einj *einj_tab)
  {
-       if (einj_tab->header_length != sizeof(struct acpi_table_einj))
+       if ((einj_tab->header_length !=
+            (sizeof(struct acpi_table_einj) - sizeof(einj_tab->header)))
+           && (einj_tab->header_length != sizeof(struct acpi_table_einj)))
                 return -EINVAL;
         if (einj_tab->header.length < sizeof(struct acpi_table_einj))
                 return -EINVAL;
diff --git a/drivers/acpi/apei/erst-dbg.c b/drivers/acpi/apei/erst-dbg.c
index 5281ddda2777c99c40f0830c1d4e5022cc5b37aa..da1228a9a544f026bab6c549e65892d439367266 100644
--- a/drivers/acpi/apei/erst-dbg.c
+++ b/drivers/acpi/apei/erst-dbg.c
@@ -2,7 +2,7 @@
   * APEI Error Record Serialization Table debug support
   *
   * ERST is a way provided by APEI to save and retrieve hardware error
- * infomation to and from a persistent store. This file provide the
+ * information to and from a persistent store. This file provide the
   * debugging/testing support for ERST kernel support and firmware
   * implementation.
   *
@@ -111,11 +111,13 @@ retry:
                 goto out;
         }
         if (len > erst_dbg_buf_len) {
-               kfree(erst_dbg_buf);
+               void *p;
                 rc = -ENOMEM;
-               erst_dbg_buf = kmalloc(len, GFP_KERNEL);
-               if (!erst_dbg_buf)
+               p = kmalloc(len, GFP_KERNEL);
+               if (!p)
                         goto out;
+               kfree(erst_dbg_buf);
+               erst_dbg_buf = p;
                 erst_dbg_buf_len = len;
                 goto retry;
         }
@@ -150,11 +152,13 @@ static ssize_t erst_dbg_write(struct file *filp, const char __user *ubuf,
         if (mutex_lock_interruptible(&erst_dbg_mutex))
                 return -EINTR;
         if (usize > erst_dbg_buf_len) {
-               kfree(erst_dbg_buf);
+               void *p;
                 rc = -ENOMEM;
-               erst_dbg_buf = kmalloc(usize, GFP_KERNEL);
-               if (!erst_dbg_buf)
+               p = kmalloc(usize, GFP_KERNEL);
+               if (!p)
                         goto out;
+               kfree(erst_dbg_buf);
+               erst_dbg_buf = p;
                 erst_dbg_buf_len = usize;
         }
         rc = copy_from_user(erst_dbg_buf, ubuf, usize);
diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index 18645f4e83cdd2f22d526276b320dac631be8940..1211c03149e8c7c258fee89dd1109e1e6b901c26 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -2,7 +2,7 @@
   * APEI Error Record Serialization Table support
   *
   * ERST is a way provided by APEI to save and retrieve hardware error
- * infomation to and from a persistent store.
+ * information to and from a persistent store.
   *
   * For more information about ERST, please refer to ACPI Specification
   * version 4.0, section 17.4.
@@ -266,13 +266,30 @@ static int erst_exec_move_data(struct apei_exec_context *ctx,
  {
         int rc;
         u64 offset;
+       void *src, *dst;
+
+       /* ioremap does not work in interrupt context */
+       if (in_interrupt()) {
+               pr_warning(ERST_PFX
+                          "MOVE_DATA can not be used in interrupt context");
+               return -EBUSY;
+       }
  
         rc = __apei_exec_read_register(entry, &offset);
         if (rc)
                 return rc;
-       memmove((void *)ctx->dst_base + offset,
-               (void *)ctx->src_base + offset,
-               ctx->var2);
+
+       src = ioremap(ctx->src_base + offset, ctx->var2);
+       if (!src)
+               return -ENOMEM;
+       dst = ioremap(ctx->dst_base + offset, ctx->var2);
+       if (!dst)
+               return -ENOMEM;
+
+       memmove(dst, src, ctx->var2);
+
+       iounmap(src);
+       iounmap(dst);
  
         return 0;
  }
@@ -750,7 +767,9 @@ __setup("erst_disable", setup_erst_disable);
  
  static int erst_check_table(struct acpi_table_erst *erst_tab)
  {
-       if (erst_tab->header_length != sizeof(struct acpi_table_erst))
+       if ((erst_tab->header_length !=
+            (sizeof(struct acpi_table_erst) - sizeof(erst_tab->header)))
+           && (erst_tab->header_length != sizeof(struct acpi_table_einj)))
                 return -EINVAL;
         if (erst_tab->header.length < sizeof(struct acpi_table_erst))
                 return -EINVAL;
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 385a6059714a72dd32256685a2606586b92d771e..0d505e59214df73dfb40b67b7d5fc45a2d48e127 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -302,7 +302,7 @@ static int __devinit ghes_probe(struct platform_device *ghes_dev)
         struct ghes *ghes = NULL;
         int rc = -EINVAL;
  
-       generic = ghes_dev->dev.platform_data;
+       generic = *(struct acpi_hest_generic **)ghes_dev->dev.platform_data;
         if (!generic->enabled)
                 return -ENODEV;
  
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index 343168d1826626c202171c3a53c5b5e4c2090a65..1a3508a7fe03f157c2e0144727bbeb5c244d2a46 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -137,20 +137,23 @@ static int hest_parse_ghes_count(struct acpi_hest_header *hest_hdr, void *data)
  
  static int hest_parse_ghes(struct acpi_hest_header *hest_hdr, void *data)
  {
-       struct acpi_hest_generic *generic;
         struct platform_device *ghes_dev;
         struct ghes_arr *ghes_arr = data;
         int rc;
  
         if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR)
                 return 0;
-       generic = (struct acpi_hest_generic *)hest_hdr;
-       if (!generic->enabled)
+
+       if (!((struct acpi_hest_generic *)hest_hdr)->enabled)
                 return 0;
         ghes_dev = platform_device_alloc("GHES", hest_hdr->source_id);
         if (!ghes_dev)
                 return -ENOMEM;
-       ghes_dev->dev.platform_data = generic;
+
+       rc = platform_device_add_data(ghes_dev, &hest_hdr, sizeof(void *));
+       if (rc)
+               goto err;
+
         rc = platform_device_add(ghes_dev);
         if (rc)
                 goto err;
diff --git a/drivers/acpi/atomicio.c b/drivers/acpi/atomicio.c
index 8f8bd736d4ff11919656e79d2ef9442b037fafff..542e5390389120de7ecd7da4cb7bd059baa18b01 100644
--- a/drivers/acpi/atomicio.c
+++ b/drivers/acpi/atomicio.c
@@ -142,7 +142,7 @@ static void __iomem *acpi_pre_map(phys_addr_t paddr,
         list_add_tail_rcu(&map->list, &acpi_iomaps);
         spin_unlock_irqrestore(&acpi_iomaps_lock, flags);
  
-       return vaddr + (paddr - pg_off);
+       return map->vaddr + (paddr - map->paddr);
  err_unmap:
         iounmap(vaddr);
         return NULL;
diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
index dc58402b0a177a4e03e8dc54e7a094804edc1d36..98417201e9ce3881257354e7c2a360ee019a4e39 100644
--- a/drivers/acpi/battery.c
+++ b/drivers/acpi/battery.c
@@ -273,7 +273,6 @@ static enum power_supply_property energy_battery_props[] = {
         POWER_SUPPLY_PROP_CYCLE_COUNT,
         POWER_SUPPLY_PROP_VOLTAGE_MIN_DESIGN,
         POWER_SUPPLY_PROP_VOLTAGE_NOW,
-       POWER_SUPPLY_PROP_CURRENT_NOW,
         POWER_SUPPLY_PROP_POWER_NOW,
         POWER_SUPPLY_PROP_ENERGY_FULL_DESIGN,
         POWER_SUPPLY_PROP_ENERGY_FULL,
diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
index 2bb28b9d91c4c2643106ed0d63069ae85799d136..af308d03f49235a6da0b30791e6857c824e8e32f 100644
--- a/drivers/acpi/blacklist.c
+++ b/drivers/acpi/blacklist.c
@@ -183,6 +183,8 @@ static int __init dmi_disable_osi_vista(const struct dmi_system_id *d)
  {
         printk(KERN_NOTICE PREFIX "DMI detected: %s\n", d->ident);
         acpi_osi_setup("!Windows 2006");
+       acpi_osi_setup("!Windows 2006 SP1");
+       acpi_osi_setup("!Windows 2006 SP2");
         return 0;
  }
  static int __init dmi_disable_osi_win7(const struct dmi_system_id *d)
@@ -202,6 +204,23 @@ static struct dmi_system_id acpi_osi_dmi_table[] __initdata = {
                 },
         },
         {
+       /*
+        * The MSI GX723 DSDT contains an NVIF method that the Nvidia
+        * driver (e.g. nouveau) must call when the user presses a
+        * brightness hotkey. nouveau does not do this yet, so the DSDT
+        * spins in an infinite while loop when the hotkey is pressed.
+        * Add the MSI GX723's DMI information to this table as a
+        * workaround.
+        * Remove MSI GX723 from the table once nouveau grows support.
+        */
+       .callback = dmi_disable_osi_vista,
+       .ident = "MSI GX723",
+       .matches = {
+                    DMI_MATCH(DMI_SYS_VENDOR, "Micro-Star International"),
+                    DMI_MATCH(DMI_PRODUCT_NAME, "GX723"),
+               },
+       },
+       {
         .callback = dmi_disable_osi_vista,
         .ident = "Sony VGN-NS10J_S",
         .matches = {
@@ -226,6 +245,14 @@ static struct dmi_system_id acpi_osi_dmi_table[] __initdata = {
                 },
         },
         {
+       .callback = dmi_disable_osi_vista,
+       .ident = "Toshiba Satellite L355",
+       .matches = {
+                    DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
+                    DMI_MATCH(DMI_PRODUCT_VERSION, "Satellite L355"),
+               },
+       },
+       {
         .callback = dmi_disable_osi_win7,
         .ident = "ASUS K50IJ",
         .matches = {
@@ -233,6 +260,14 @@ static struct dmi_system_id acpi_osi_dmi_table[] __initdata = {
                      DMI_MATCH(DMI_PRODUCT_NAME, "K50IJ"),
                 },
         },
+       {
+       .callback = dmi_disable_osi_vista,
+       .ident = "Toshiba P305D",
+       .matches = {
+                    DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
+                    DMI_MATCH(DMI_PRODUCT_NAME, "Satellite P305D"),
+               },
+       },
  
         /*
          * BIOS invocation of _OSI(Linux) is almost always a BIOS bug.
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c

index 5c221ab535d5b0a459516a16034f80a94a7d1bb4..310e3b9749cbbacdabb3c03d288a6b42874aa41b 100644 (file)
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -55,7 +55,7 @@ EXPORT_SYMBOL(acpi_root_dir);
  static int set_power_nocheck(const struct dmi_system_id *id)
  {
         printk(KERN_NOTICE PREFIX "%s detected - "
-               "disable power check in power transistion\n", id->ident);
+               "disable power check in power transition\n", id->ident);
         acpi_power_nocheck = 1;
         return 0;
  }
@@ -80,23 +80,15 @@ static int set_copy_dsdt(const struct dmi_system_id *id)
  
  static struct dmi_system_id dsdt_dmi_table[] __initdata = {
         /*
-        * Insyde BIOS on some TOSHIBA machines corrupt the DSDT.
+        * Invoke DSDT corruption work-around on all Toshiba Satellite models.
          * https://bugzilla.kernel.org/show_bug.cgi?id=14679
          */
         {
          .callback = set_copy_dsdt,
-        .ident = "TOSHIBA Satellite A505",
+        .ident = "TOSHIBA Satellite",
          .matches = {
                 DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
-               DMI_MATCH(DMI_PRODUCT_NAME, "Satellite A505"),
-               },
-       },
-       {
-        .callback = set_copy_dsdt,
-        .ident = "TOSHIBA Satellite L505D",
-        .matches = {
-               DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
-               DMI_MATCH(DMI_PRODUCT_NAME, "Satellite L505D"),
+               DMI_MATCH(DMI_PRODUCT_NAME, "Satellite"),
                 },
         },
         {}
@@ -1027,7 +1019,7 @@ static int __init acpi_init(void)
  
         /*
          * If the laptop falls into the DMI check table, the power state check
-        * will be disabled in the course of device power transistion.
+        * will be disabled in the course of device power transition.
          */
         dmi_check_system(power_nocheck_dmi_table);
  
diff --git a/drivers/acpi/fan.c b/drivers/acpi/fan.c

index 8a3b840c0bb268d0580cd5550c41986bf3c09662..d94d2953c9740f34675eeb576e00e74ee621bfd2 100644 (file)
--- a/drivers/acpi/fan.c
+++ b/drivers/acpi/fan.c
@@ -369,7 +369,9 @@ static void __exit acpi_fan_exit(void)
  
         acpi_bus_unregister_driver(&acpi_fan_driver);
  
+#ifdef CONFIG_ACPI_PROCFS
         remove_proc_entry(ACPI_FAN_CLASS, acpi_root_dir);
+#endif
  
         return;
  }
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c

index e9699aaed1092874b0f275d2ca86a8142850db1c..bec561c14bebee3a77817bdce8fa09b939f85ac0 100644 (file)
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -28,12 +28,6 @@ static int set_no_mwait(const struct dmi_system_id *id)
  }
  
  static struct dmi_system_id __cpuinitdata processor_idle_dmi_table[] = {
-       {
-       set_no_mwait, "IFL91 board", {
-       DMI_MATCH(DMI_BIOS_VENDOR, "COMPAL"),
-       DMI_MATCH(DMI_SYS_VENDOR, "ZEPTO"),
-       DMI_MATCH(DMI_PRODUCT_VERSION, "3215W"),
-       DMI_MATCH(DMI_BOARD_NAME, "IFL91") }, NULL},
         {
         set_no_mwait, "Extensa 5220", {
         DMI_MATCH(DMI_BIOS_VENDOR, "Phoenix Technologies LTD"),
@@ -352,4 +346,5 @@ void __init acpi_early_processor_set_pdc(void)
         acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
                             ACPI_UINT32_MAX,
                             early_init_pdc, NULL, NULL, NULL);
+       acpi_get_devices("ACPI0007", early_init_pdc, NULL, NULL);
  }
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c

index 15602189238942cc9db03b9d9f25c18823128ea0..347eb21b235302d44d0c5e2d768ce3a8870142f2 100644 (file)
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -850,7 +850,7 @@ static int __init acpi_processor_init(void)
                 printk(KERN_DEBUG "ACPI: %s registered with cpuidle\n",
                         acpi_idle_driver.name);
         } else {
-               printk(KERN_DEBUG "ACPI: acpi_idle yielding to %s",
+               printk(KERN_DEBUG "ACPI: acpi_idle yielding to %s\n",
                         cpuidle_get_driver()->name);
         }
  
diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c

index ba1bd263d903094692c3683c9557e8b919d1d985..3a73a93596e88a29c1e66fabec91dd2d0db96f6c 100644 (file)
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -447,8 +447,8 @@ int acpi_processor_notify_smm(struct module *calling_module)
         if (!try_module_get(calling_module))
                 return -EINVAL;
  
-       /* is_done is set to negative if an error occured,
-        * and to postitive if _no_ error occured, but SMM
+       /* is_done is set to negative if an error occurred,
+        * and to positive if _no_ error occurred, but SMM
          * was already notified. This avoids double notification
          * which might lead to unexpected results...
          */
diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c

index cf82989ae7568c3c54ded69d16829a2851119d80..4754ff6e70e6daa25b13df6f115745eb7bfb7e55 100644 (file)
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -363,6 +363,12 @@ static int __init init_old_suspend_ordering(const struct dmi_system_id *d)
         return 0;
  }
  
+static int __init init_nvs_nosave(const struct dmi_system_id *d)
+{
+       acpi_nvs_nosave();
+       return 0;
+}
+
  static struct dmi_system_id __initdata acpisleep_dmi_table[] = {
         {
         .callback = init_old_suspend_ordering,
@@ -397,6 +403,22 @@ static struct dmi_system_id __initdata acpisleep_dmi_table[] = {
                 DMI_MATCH(DMI_BOARD_NAME, "CF51-2L"),
                 },
         },
+       {
+       .callback = init_nvs_nosave,
+       .ident = "Sony Vaio VGN-SR11M",
+       .matches = {
+               DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"),
+               DMI_MATCH(DMI_PRODUCT_NAME, "VGN-SR11M"),
+               },
+       },
+       {
+       .callback = init_nvs_nosave,
+       .ident = "Everex StepNote Series",
+       .matches = {
+               DMI_MATCH(DMI_SYS_VENDOR, "Everex Systems, Inc."),
+               DMI_MATCH(DMI_PRODUCT_NAME, "Everex StepNote Series"),
+               },
+       },
         {},
  };
  #endif /* CONFIG_SUSPEND */
diff --git a/drivers/acpi/sysfs.c b/drivers/acpi/sysfs.c

index 68e2e4582fa2f18968058c8125704a23f151cb62..f8588f81048ac989d6af1d727693a234f54bc27b 100644 (file)
--- a/drivers/acpi/sysfs.c
+++ b/drivers/acpi/sysfs.c
@@ -100,7 +100,7 @@ static const struct acpi_dlevel acpi_debug_levels[] = {
         ACPI_DEBUG_INIT(ACPI_LV_EVENTS),
  };
  
-static int param_get_debug_layer(char *buffer, struct kernel_param *kp)
+static int param_get_debug_layer(char *buffer, const struct kernel_param *kp)
  {
         int result = 0;
         int i;
@@ -128,7 +128,7 @@ static int param_get_debug_layer(char *buffer, struct kernel_param *kp)
         return result;
  }
  
-static int param_get_debug_level(char *buffer, struct kernel_param *kp)
+static int param_get_debug_level(char *buffer, const struct kernel_param *kp)
  {
         int result = 0;
         int i;
@@ -149,10 +149,18 @@ static int param_get_debug_level(char *buffer, struct kernel_param *kp)
         return result;
  }
  
-module_param_call(debug_layer, param_set_uint, param_get_debug_layer,
-                 &acpi_dbg_layer, 0644);
-module_param_call(debug_level, param_set_uint, param_get_debug_level,
-                 &acpi_dbg_level, 0644);
+static struct kernel_param_ops param_ops_debug_layer = {
+       .set = param_set_uint,
+       .get = param_get_debug_layer,
+};
+
+static struct kernel_param_ops param_ops_debug_level = {
+       .set = param_set_uint,
+       .get = param_get_debug_level,
+};
+
+module_param_cb(debug_layer, &param_ops_debug_layer, &acpi_dbg_layer, 0644);
+module_param_cb(debug_level, &param_ops_debug_level, &acpi_dbg_level, 0644);
  
  static char trace_method_name[6];
  module_param_string(trace_method_name, trace_method_name, 6, 0644);
diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c

index c5fef01b3c9591701fb03ae86290a6b5b643c32a..b836761265988590cea46dbcffc39d30ea3c5578 100644 (file)
--- a/drivers/acpi/video_detect.c
+++ b/drivers/acpi/video_detect.c
@@ -59,8 +59,8 @@ acpi_backlight_cap_match(acpi_handle handle, u32 level, void *context,
                                   "support\n"));
                 *cap |= ACPI_VIDEO_BACKLIGHT;
                 if (ACPI_FAILURE(acpi_get_handle(handle, "_BQC", &h_dummy)))
-                       printk(KERN_WARNING FW_BUG PREFIX "ACPI brightness "
-                                       "control misses _BQC function\n");
+                       printk(KERN_WARNING FW_BUG PREFIX "No _BQC method, "
+                               "cannot determine initial brightness\n");
                 /* We have backlight support, no need to scan further */
                 return AE_CTRL_TERMINATE;
         }
diff --git a/drivers/atm/iphase.c b/drivers/atm/iphase.c

index ee9ddeb53417c7da782d252f7d9e4f8113ee44b4..8cb0347dec2848e4d6c33f5276b9ba986a9f3fac 100644 (file)
--- a/drivers/atm/iphase.c
+++ b/drivers/atm/iphase.c
@@ -3156,7 +3156,6 @@ static int __devinit ia_init_one(struct pci_dev *pdev,
  {  
         struct atm_dev *dev;  
         IADEV *iadev;  
-        unsigned long flags;
         int ret;
  
         iadev = kzalloc(sizeof(*iadev), GFP_KERNEL);
@@ -3188,19 +3187,14 @@ static int __devinit ia_init_one(struct pci_dev *pdev,
         ia_dev[iadev_count] = iadev;
         _ia_dev[iadev_count] = dev;
         iadev_count++;
-       spin_lock_init(&iadev->misc_lock);
-       /* First fixes first. I don't want to think about this now. */
-       spin_lock_irqsave(&iadev->misc_lock, flags); 
         if (ia_init(dev) || ia_start(dev)) {  
                 IF_INIT(printk("IA register failed!\n");)
                 iadev_count--;
                 ia_dev[iadev_count] = NULL;
                 _ia_dev[iadev_count] = NULL;
-               spin_unlock_irqrestore(&iadev->misc_lock, flags); 
                 ret = -EINVAL;
                 goto err_out_deregister_dev;
         }
-       spin_unlock_irqrestore(&iadev->misc_lock, flags); 
         IF_EVENT(printk("iadev_count = %d\n", iadev_count);)
  
         iadev->next_board = ia_boards;  
diff --git a/drivers/atm/iphase.h b/drivers/atm/iphase.h

index b2cd20f549cb4d474edb0a2a9a2f419d40d5072b..077735e0e04bfdd1d12f048ac8006f8a598832f8 100644 (file)
--- a/drivers/atm/iphase.h
+++ b/drivers/atm/iphase.h
@@ -1022,7 +1022,7 @@ typedef struct iadev_t {
         struct dle_q rx_dle_q;  
         struct free_desc_q *rx_free_desc_qhead;  
         struct sk_buff_head rx_dma_q;  
-        spinlock_t rx_lock, misc_lock;
+       spinlock_t rx_lock;
         struct atm_vcc **rx_open;       /* list of all open VCs */  
          u16 num_rx_desc, rx_buf_sz, rxing;
          u32 rx_pkt_ram, rx_tmp_cnt;
diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c

index f916ddf63938a03444c2ad6e4d8a6b63c46212aa..f46138ab38b6c310fffc589681f727840a05c646 100644 (file)
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -444,6 +444,7 @@ static ssize_t console_show(struct device *dev, struct device_attribute *attr,
         struct atm_dev *atmdev = container_of(dev, struct atm_dev, class_dev);
         struct solos_card *card = atmdev->dev_data;
         struct sk_buff *skb;
+       unsigned int len;
  
         spin_lock(&card->cli_queue_lock);
         skb = skb_dequeue(&card->cli_queue[SOLOS_CHAN(atmdev)]);
@@ -451,11 +452,12 @@ static ssize_t console_show(struct device *dev, struct device_attribute *attr,
         if(skb == NULL)
                 return sprintf(buf, "No data.\n");
  
-       memcpy(buf, skb->data, skb->len);
-       dev_dbg(&card->dev->dev, "len: %d\n", skb->len);
+       len = skb->len;
+       memcpy(buf, skb->data, len);
+       dev_dbg(&card->dev->dev, "len: %d\n", len);
  
         kfree_skb(skb);
-       return skb->len;
+       return len;
  }
  
  static int send_command(struct solos_card *card, int dev, const char *buf, size_t size)
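Note on the console_show() change above: the old code returned skb->len after kfree_skb(skb), reading a field of an object that had already been released. The fix snapshots the length into a local first. Below is a minimal userspace sketch of the same pattern; `struct msg`, `msg_new`, `msg_free`, and `msg_consume` are hypothetical stand-ins for illustration and are not the sk_buff API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a queued buffer: a length plus owned data. */
struct msg {
	size_t len;
	char *data;
};

/* Allocate a message holding a copy of text (without the trailing NUL). */
static struct msg *msg_new(const char *text)
{
	struct msg *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->len = strlen(text);
	m->data = malloc(m->len ? m->len : 1);
	if (!m->data) {
		free(m);
		return NULL;
	}
	memcpy(m->data, text, m->len);
	return m;
}

static void msg_free(struct msg *m)
{
	free(m->data);
	free(m);
}

/* Copy the payload into buf, release the message, return the byte count. */
static size_t msg_consume(struct msg *m, char *buf)
{
	size_t len;

	assert(m && buf);
	len = m->len;		/* snapshot before the object is freed */
	memcpy(buf, m->data, len);
	msg_free(m);
	return len;		/* never m->len here: m no longer exists */
}
```

The caller is assumed to supply a buffer at least as large as the payload, as the sysfs show path does with its page-sized buffer.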
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig

index de277689da6153fac725e54e50f752e7a6ee5978..4b9359a6f6ca45c4cd5f8803b5e7cdb9ba496678 100644 (file)
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -488,4 +488,21 @@ config BLK_DEV_HD
  
           If unsure, say N.
  
+config BLK_DEV_RBD
+       tristate "Rados block device (RBD)"
+       depends on INET && EXPERIMENTAL && BLOCK
+       select CEPH_LIB
+       select LIBCRC32C
+       select CRYPTO_AES
+       select CRYPTO
+       default n
+       help
+         Say Y here if you want to include the Rados block device, which stripes
+         a block device over objects stored in the Ceph distributed object
+         store.
+
+         More information at http://ceph.newdream.net/.
+
+         If unsure, say N.
+
  endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile

index aff5ac925c34332f235e523164f7f85cd76b6751..d7f463d6312d6494c3ef795c5b96ff11fa0ec801 100644 (file)
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -37,5 +37,6 @@ obj-$(CONFIG_BLK_DEV_HD)      += hd.o
  
  obj-$(CONFIG_XEN_BLKDEV_FRONTEND)      += xen-blkfront.o
  obj-$(CONFIG_BLK_DEV_DRBD)     += drbd/
+obj-$(CONFIG_BLK_DEV_RBD)     += rbd.o
  
  swim_mod-objs  := swim.o swim_asm.o
diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c

index e9da874d04192b125561f4b71d8ba21ce55fadcc..03688c2da319c007f4923c4ffd989e4f9666b755 100644 (file)
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -113,7 +113,7 @@ static void ps3disk_scatter_gather(struct ps3_storage_device *dev,
                         memcpy(buf, dev->bounce_buf+offset, size);
                 offset += size;
                 flush_kernel_dcache_page(bvec->bv_page);
-               bvec_kunmap_irq(bvec, &flags);
+               bvec_kunmap_irq(buf, &flags);
                 i++;
         }
  }
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c

new file mode 100644 (file)

index 0000000..6ec9d53
--- /dev/null
+++ b/drivers/block/rbd.c
@@ -0,0 +1,1841 @@
+/*
+   rbd.c -- Export ceph rados objects as a Linux block device
+
+
+   based on drivers/block/osdblk.c:
+
+   Copyright 2009 Red Hat, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; see the file COPYING.  If not, write to
+   the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+
+
+
+   Instructions for use
+   --------------------
+
+   1) Map a Linux block device to an existing rbd image.
+
+      Usage: <mon ip addr> <options> <pool name> <rbd image name> [snap name]
+
+      $ echo "192.168.0.1 name=admin rbd foo" > /sys/class/rbd/add
+
+      The snapshot name can be "-" or omitted to map the image read/write.
+
+   2) List all active blkdev<->object mappings.
+
+      In this example, we have performed step #1 twice, creating two blkdevs,
+      mapped to two separate rados objects in the rados rbd pool.
+
+      $ cat /sys/class/rbd/list
+      #id     major   client_name     pool    name    snap    KB
+      0       254     client4143      rbd     foo     -      1024000
+
+      The columns, in order, are:
+      - blkdev unique id
+      - blkdev assigned major
+      - rados client id
+      - rados pool name
+      - rados block device name
+      - mapped snapshot ("-" if none)
+      - device size in KB
+
+
+   3) Create a snapshot.
+
+      Usage: <blkdev id> <snapname>
+
+      $ echo "0 mysnap" > /sys/class/rbd/snap_create
+
+
+   4) List snapshots.
+
+      $ cat /sys/class/rbd/snaps_list
+      #id     snap    KB
+      0       -       1024000 (*)
+      0       foo     1024000
+
+      The columns, in order, are:
+      - blkdev unique id
+      - snapshot name, '-' means none (active read/write version)
+      - size of device at time of snapshot
+      - the (*) indicates this is the active version
+
+   5) Rollback to snapshot.
+
+      Usage: <blkdev id> <snapname>
+
+      $ echo "0 mysnap" > /sys/class/rbd/snap_rollback
+
+
+   6) Map an image using a snapshot.
+
+      A snapshot mapping is read-only. This is done by passing
+      snap=<snapname> to the options when adding a device.
+
+      $ echo "192.168.0.1 name=admin,snap=mysnap rbd foo" > /sys/class/rbd/add
+
+
+   7) Remove an active blkdev<->rbd image mapping.
+
+      In this example, we remove the mapping with blkdev unique id 1.
+
+      $ echo 1 > /sys/class/rbd/remove
+
+
+   NOTE:  The actual creation and deletion of rados objects is outside the scope
+   of this driver.
+
+ */
+
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/osd_client.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/decode.h>
+
+#include <linux/kernel.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/blkdev.h>
+
+#include "rbd_types.h"
+
+#define DRV_NAME "rbd"
+#define DRV_NAME_LONG "rbd (rados block device)"
+
+#define RBD_MINORS_PER_MAJOR   256             /* max minors per blkdev */
+
+#define RBD_MAX_MD_NAME_LEN    (96 + sizeof(RBD_SUFFIX))
+#define RBD_MAX_POOL_NAME_LEN  64
+#define RBD_MAX_SNAP_NAME_LEN  32
+#define RBD_MAX_OPT_LEN                1024
+
+#define RBD_SNAP_HEAD_NAME     "-"
+
+#define DEV_NAME_LEN           32
+
+/*
+ * block device image metadata (in-memory version)
+ */
+struct rbd_image_header {
+       u64 image_size;
+       char block_name[32];
+       __u8 obj_order;
+       __u8 crypt_type;
+       __u8 comp_type;
+       struct rw_semaphore snap_rwsem;
+       struct ceph_snap_context *snapc;
+       size_t snap_names_len;
+       u64 snap_seq;
+       u32 total_snaps;
+
+       char *snap_names;
+       u64 *snap_sizes;
+};
+
+/*
+ * an instance of the client.  multiple devices may share a client.
+ */
+struct rbd_client {
+       struct ceph_client      *client;
+       struct kref             kref;
+       struct list_head        node;
+};
+
+/*
+ * a single io request
+ */
+struct rbd_request {
+       struct request          *rq;            /* blk layer request */
+       struct bio              *bio;           /* cloned bio */
+       struct page             **pages;        /* list of used pages */
+       u64                     len;
+};
+
+/*
+ * a single device
+ */
+struct rbd_device {
+       int                     id;             /* blkdev unique id */
+
+       int                     major;          /* blkdev assigned major */
+       struct gendisk          *disk;          /* blkdev's gendisk and rq */
+       struct request_queue    *q;
+
+       struct ceph_client      *client;
+       struct rbd_client       *rbd_client;
+
+       char                    name[DEV_NAME_LEN]; /* blkdev name, e.g. rbd3 */
+
+       spinlock_t              lock;           /* queue lock */
+
+       struct rbd_image_header header;
+       char                    obj[RBD_MAX_OBJ_NAME_LEN]; /* rbd image name */
+       int                     obj_len;
+       char                    obj_md_name[RBD_MAX_MD_NAME_LEN]; /* hdr nm. */
+       char                    pool_name[RBD_MAX_POOL_NAME_LEN];
+       int                     poolid;
+
+       char                    snap_name[RBD_MAX_SNAP_NAME_LEN];
+       u32 cur_snap;   /* index+1 of current snapshot within snap context
+                          0 - for the head */
+       int read_only;
+
+       struct list_head        node;
+};
+
+static spinlock_t node_lock;      /* protects client get/put */
+
+static struct class *class_rbd;          /* /sys/class/rbd */
+static DEFINE_MUTEX(ctl_mutex);          /* Serialize open/close/setup/teardown */
+static LIST_HEAD(rbd_dev_list);    /* devices */
+static LIST_HEAD(rbd_client_list);      /* clients */
+
+
+static int rbd_open(struct block_device *bdev, fmode_t mode)
+{
+       struct gendisk *disk = bdev->bd_disk;
+       struct rbd_device *rbd_dev = disk->private_data;
+
+       set_device_ro(bdev, rbd_dev->read_only);
+
+       if ((mode & FMODE_WRITE) && rbd_dev->read_only)
+               return -EROFS;
+
+       return 0;
+}
+
+static const struct block_device_operations rbd_bd_ops = {
+       .owner                  = THIS_MODULE,
+       .open                   = rbd_open,
+};
+
+/*
+ * Initialize an rbd client instance.
+ * We own *opt.
+ */
+static struct rbd_client *rbd_client_create(struct ceph_options *opt)
+{
+       struct rbd_client *rbdc;
+       int ret = -ENOMEM;
+
+       dout("rbd_client_create\n");
+       rbdc = kmalloc(sizeof(struct rbd_client), GFP_KERNEL);
+       if (!rbdc)
+               goto out_opt;
+
+       kref_init(&rbdc->kref);
+       INIT_LIST_HEAD(&rbdc->node);
+
+       rbdc->client = ceph_create_client(opt, rbdc);
+       if (IS_ERR(rbdc->client))
+               goto out_rbdc;
+       opt = NULL; /* Now rbdc->client is responsible for opt */
+
+       ret = ceph_open_session(rbdc->client);
+       if (ret < 0)
+               goto out_err;
+
+       spin_lock(&node_lock);
+       list_add_tail(&rbdc->node, &rbd_client_list);
+       spin_unlock(&node_lock);
+
+       dout("rbd_client_create created %p\n", rbdc);
+       return rbdc;
+
+out_err:
+       ceph_destroy_client(rbdc->client);
+out_rbdc:
+       kfree(rbdc);
+out_opt:
+       if (opt)
+               ceph_destroy_options(opt);
+       return ERR_PTR(ret);
+}
+
+/*
+ * Find a ceph client with specific addr and configuration.
+ */
+static struct rbd_client *__rbd_client_find(struct ceph_options *opt)
+{
+       struct rbd_client *client_node;
+
+       if (opt->flags & CEPH_OPT_NOSHARE)
+               return NULL;
+
+       list_for_each_entry(client_node, &rbd_client_list, node)
+               if (ceph_compare_options(opt, client_node->client) == 0)
+                       return client_node;
+       return NULL;
+}
+
+/*
+ * Get a ceph client with specific addr and configuration; if one does
+ * not exist, create it.
+ */
+static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
+                         char *options)
+{
+       struct rbd_client *rbdc;
+       struct ceph_options *opt;
+       int ret;
+
+       ret = ceph_parse_options(&opt, options, mon_addr,
+                                mon_addr + strlen(mon_addr), NULL, NULL);
+       if (ret < 0)
+               return ret;
+
+       spin_lock(&node_lock);
+       rbdc = __rbd_client_find(opt);
+       if (rbdc) {
+               ceph_destroy_options(opt);
+
+               /* using an existing client */
+               kref_get(&rbdc->kref);
+               rbd_dev->rbd_client = rbdc;
+               rbd_dev->client = rbdc->client;
+               spin_unlock(&node_lock);
+               return 0;
+       }
+       spin_unlock(&node_lock);
+
+       rbdc = rbd_client_create(opt);
+       if (IS_ERR(rbdc))
+               return PTR_ERR(rbdc);
+
+       rbd_dev->rbd_client = rbdc;
+       rbd_dev->client = rbdc->client;
+       return 0;
+}
+
+/*
+ * Destroy ceph client
+ */
+static void rbd_client_release(struct kref *kref)
+{
+       struct rbd_client *rbdc = container_of(kref, struct rbd_client, kref);
+
+       dout("rbd_client_release %p\n", rbdc);
+       spin_lock(&node_lock);
+       list_del(&rbdc->node);
+       spin_unlock(&node_lock);
+
+       ceph_destroy_client(rbdc->client);
+       kfree(rbdc);
+}
+
+/*
+ * Drop reference to ceph client node. If it's not referenced anymore, release
+ * it.
+ */
+static void rbd_put_client(struct rbd_device *rbd_dev)
+{
+       kref_put(&rbd_dev->rbd_client->kref, rbd_client_release);
+       rbd_dev->rbd_client = NULL;
+       rbd_dev->client = NULL;
+}
+
+
+/*
+ * Create a new header structure, translate header format from the on-disk
+ * header.
+ */
+static int rbd_header_from_disk(struct rbd_image_header *header,
+                                struct rbd_image_header_ondisk *ondisk,
+                                int allocated_snaps,
+                                gfp_t gfp_flags)
+{
+       int i;
+       u32 snap_count = le32_to_cpu(ondisk->snap_count);
+       int ret = -ENOMEM;
+
+       init_rwsem(&header->snap_rwsem);
+
+       header->snap_names_len = le64_to_cpu(ondisk->snap_names_len);
+       header->snapc = kmalloc(sizeof(struct ceph_snap_context) +
+                               snap_count *
+                                sizeof(struct rbd_image_snap_ondisk),
+                               gfp_flags);
+       if (!header->snapc)
+               return -ENOMEM;
+       if (snap_count) {
+               header->snap_names = kmalloc(header->snap_names_len,
+                                            GFP_KERNEL);
+               if (!header->snap_names)
+                       goto err_snapc;
+               header->snap_sizes = kmalloc(snap_count * sizeof(u64),
+                                            GFP_KERNEL);
+               if (!header->snap_sizes)
+                       goto err_names;
+       } else {
+               header->snap_names = NULL;
+               header->snap_sizes = NULL;
+       }
+       memcpy(header->block_name, ondisk->block_name,
+              sizeof(ondisk->block_name));
+
+       header->image_size = le64_to_cpu(ondisk->image_size);
+       header->obj_order = ondisk->options.order;
+       header->crypt_type = ondisk->options.crypt_type;
+       header->comp_type = ondisk->options.comp_type;
+
+       atomic_set(&header->snapc->nref, 1);
+       header->snap_seq = le64_to_cpu(ondisk->snap_seq);
+       header->snapc->num_snaps = snap_count;
+       header->total_snaps = snap_count;
+
+       if (snap_count &&
+           allocated_snaps == snap_count) {
+               for (i = 0; i < snap_count; i++) {
+                       header->snapc->snaps[i] =
+                               le64_to_cpu(ondisk->snaps[i].id);
+                       header->snap_sizes[i] =
+                               le64_to_cpu(ondisk->snaps[i].image_size);
+               }
+
+               /* copy snapshot names */
+               memcpy(header->snap_names, &ondisk->snaps[i],
+                       header->snap_names_len);
+       }
+
+       return 0;
+
+err_names:
+       kfree(header->snap_names);
+err_snapc:
+       kfree(header->snapc);
+       return ret;
+}
+
+static int snap_index(struct rbd_image_header *header, int snap_num)
+{
+       return header->total_snaps - snap_num;
+}
+
+static u64 cur_snap_id(struct rbd_device *rbd_dev)
+{
+       struct rbd_image_header *header = &rbd_dev->header;
+
+       if (!rbd_dev->cur_snap)
+               return 0;
+
+       return header->snapc->snaps[snap_index(header, rbd_dev->cur_snap)];
+}
+
+static int snap_by_name(struct rbd_image_header *header, const char *snap_name,
+                       u64 *seq, u64 *size)
+{
+       int i;
+       char *p = header->snap_names;
+
+       for (i = 0; i < header->total_snaps; i++, p += strlen(p) + 1) {
+               if (strcmp(snap_name, p) == 0)
+                       break;
+       }
+       if (i == header->total_snaps)
+               return -ENOENT;
+       if (seq)
+               *seq = header->snapc->snaps[i];
+
+       if (size)
+               *size = header->snap_sizes[i];
+
+       return i;
+}
+
+static int rbd_header_set_snap(struct rbd_device *dev,
+                              const char *snap_name,
+                              u64 *size)
+{
+       struct rbd_image_header *header = &dev->header;
+       struct ceph_snap_context *snapc = header->snapc;
+       int ret = -ENOENT;
+
+       down_write(&header->snap_rwsem);
+
+       if (!snap_name ||
+           !*snap_name ||
+           strcmp(snap_name, "-") == 0 ||
+           strcmp(snap_name, RBD_SNAP_HEAD_NAME) == 0) {
+               if (header->total_snaps)
+                       snapc->seq = header->snap_seq;
+               else
+                       snapc->seq = 0;
+               dev->cur_snap = 0;
+               dev->read_only = 0;
+               if (size)
+                       *size = header->image_size;
+       } else {
+               ret = snap_by_name(header, snap_name, &snapc->seq, size);
+               if (ret < 0)
+                       goto done;
+
+               dev->cur_snap = header->total_snaps - ret;
+               dev->read_only = 1;
+       }
+
+       ret = 0;
+done:
+       up_write(&header->snap_rwsem);
+       return ret;
+}
+
+static void rbd_header_free(struct rbd_image_header *header)
+{
+       kfree(header->snapc);
+       kfree(header->snap_names);
+       kfree(header->snap_sizes);
+}
+
+/*
+ * get the actual striped segment name, offset and length
+ */
+static u64 rbd_get_segment(struct rbd_image_header *header,
+                          const char *block_name,
+                          u64 ofs, u64 len,
+                          char *seg_name, u64 *segofs)
+{
+       u64 seg = ofs >> header->obj_order;
+
+       if (seg_name)
+               snprintf(seg_name, RBD_MAX_SEG_NAME_LEN,
+                        "%s.%012llx", block_name, seg);
+
+       ofs = ofs & ((1ULL << header->obj_order) - 1);
+       len = min_t(u64, len, (1ULL << header->obj_order) - ofs);
+
+       if (segofs)
+               *segofs = ofs;
+
+       return len;
+}
+
+/*
+ * bio helpers
+ */
+
+static void bio_chain_put(struct bio *chain)
+{
+       struct bio *tmp;
+
+       while (chain) {
+               tmp = chain;
+               chain = chain->bi_next;
+               bio_put(tmp);
+       }
+}
+
+/*
+ * zeros a bio chain, starting at specific offset
+ */
+static void zero_bio_chain(struct bio *chain, int start_ofs)
+{
+       struct bio_vec *bv;
+       unsigned long flags;
+       void *buf;
+       int i;
+       int pos = 0;
+
+       while (chain) {
+               bio_for_each_segment(bv, chain, i) {
+                       if (pos + bv->bv_len > start_ofs) {
+                               int remainder = max(start_ofs - pos, 0);
+                               buf = bvec_kmap_irq(bv, &flags);
+                               memset(buf + remainder, 0,
+                                      bv->bv_len - remainder);
+                               bvec_kunmap_irq(buf, &flags);
+                       }
+                       pos += bv->bv_len;
+               }
+
+               chain = chain->bi_next;
+       }
+}
+
+/*
+ * bio_chain_clone - clone a chain of bios up to a certain length.
+ * May return a bio_pair that the caller will need to release.
+ */
+static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
+                                  struct bio_pair **bp,
+                                  int len, gfp_t gfpmask)
+{
+       struct bio *tmp, *old_chain = *old, *new_chain = NULL, *tail = NULL;
+       int total = 0;
+
+       if (*bp) {
+               bio_pair_release(*bp);
+               *bp = NULL;
+       }
+
+       while (old_chain && (total < len)) {
+               tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+               if (!tmp)
+                       goto err_out;
+
+               if (total + old_chain->bi_size > len) {
+                       struct bio_pair *bp;
+
+                       /*
+                        * this split can only happen with a single-page bio,
+                        * bio_split will BUG_ON if this is not the case
+                        */
+                       dout("bio_chain_clone split! total=%d remaining=%d "
+                            "bi_size=%d\n",
+                            (int)total, (int)len-total,
+                            (int)old_chain->bi_size);
+
+                       /* split the bio. We'll release it either in the next
+                          call, or it will have to be released outside */
+                       bp = bio_split(old_chain, (len - total) / 512ULL);
+                       if (!bp)
+                               goto err_out;
+
+                       __bio_clone(tmp, &bp->bio1);
+
+                       *next = &bp->bio2;
+               } else {
+                       __bio_clone(tmp, old_chain);
+                       *next = old_chain->bi_next;
+               }
+
+               tmp->bi_bdev = NULL;
+               gfpmask &= ~__GFP_WAIT;
+               tmp->bi_next = NULL;
+
+               if (!new_chain) {
+                       new_chain = tail = tmp;
+               } else {
+                       tail->bi_next = tmp;
+                       tail = tmp;
+               }
+               old_chain = old_chain->bi_next;
+
+               total += tmp->bi_size;
+       }
+
+       BUG_ON(total < len);
+
+       if (tail)
+               tail->bi_next = NULL;
+
+       *old = old_chain;
+
+       return new_chain;
+
+err_out:
+       dout("bio_chain_clone with err\n");
+       bio_chain_put(new_chain);
+       return NULL;
+}
+
+/*
+ * helpers for osd request op vectors.
+ */
+static int rbd_create_rw_ops(struct ceph_osd_req_op **ops,
+                           int num_ops,
+                           int opcode,
+                           u32 payload_len)
+{
+       *ops = kzalloc(sizeof(struct ceph_osd_req_op) * (num_ops + 1),
+                      GFP_NOIO);
+       if (!*ops)
+               return -ENOMEM;
+       (*ops)[0].op = opcode;
+       /*
+        * op extent offset and length will be set later on
+        * in ceph_calc_raw_layout()
+        */
+       (*ops)[0].payload_len = payload_len;
+       return 0;
+}
+
+static void rbd_destroy_ops(struct ceph_osd_req_op *ops)
+{
+       kfree(ops);
+}
+
+/*
+ * Send ceph osd request
+ */
+static int rbd_do_request(struct request *rq,
+                         struct rbd_device *dev,
+                         struct ceph_snap_context *snapc,
+                         u64 snapid,
+                         const char *obj, u64 ofs, u64 len,
+                         struct bio *bio,
+                         struct page **pages,
+                         int num_pages,
+                         int flags,
+                         struct ceph_osd_req_op *ops,
+                         int num_reply,
+                         void (*rbd_cb)(struct ceph_osd_request *req,
+                                        struct ceph_msg *msg))
+{
+       struct ceph_osd_request *req;
+       struct ceph_file_layout *layout;
+       int ret;
+       u64 bno;
+       struct timespec mtime = CURRENT_TIME;
+       struct rbd_request *req_data;
+       struct ceph_osd_request_head *reqhead;
+       struct rbd_image_header *header = &dev->header;
+
+       ret = -ENOMEM;
+       req_data = kzalloc(sizeof(*req_data), GFP_NOIO);
+       if (!req_data)
+               goto done;
+
+       dout("rbd_do_request len=%lld ofs=%lld\n", len, ofs);
+
+       down_read(&header->snap_rwsem);
+
+       req = ceph_osdc_alloc_request(&dev->client->osdc, flags,
+                                     snapc,
+                                     ops,
+                                     false,
+                                     GFP_NOIO, pages, bio);
+       if (IS_ERR(req)) {
+               up_read(&header->snap_rwsem);
+               ret = PTR_ERR(req);
+               goto done_pages;
+       }
+
+       req->r_callback = rbd_cb;
+
+       req_data->rq = rq;
+       req_data->bio = bio;
+       req_data->pages = pages;
+       req_data->len = len;
+
+       req->r_priv = req_data;
+
+       reqhead = req->r_request->front.iov_base;
+       reqhead->snapid = cpu_to_le64(CEPH_NOSNAP);
+
+       strncpy(req->r_oid, obj, sizeof(req->r_oid));
+       req->r_oid_len = strlen(req->r_oid);
+
+       layout = &req->r_file_layout;
+       memset(layout, 0, sizeof(*layout));
+       layout->fl_stripe_unit = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+       layout->fl_stripe_count = cpu_to_le32(1);
+       layout->fl_object_size = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+       layout->fl_pg_preferred = cpu_to_le32(-1);
+       layout->fl_pg_pool = cpu_to_le32(dev->poolid);
+       ceph_calc_raw_layout(&dev->client->osdc, layout, snapid,
+                            ofs, &len, &bno, req, ops);
+
+       ceph_osdc_build_request(req, ofs, &len,
+                               ops,
+                               snapc,
+                               &mtime,
+                               req->r_oid, req->r_oid_len);
+       up_read(&header->snap_rwsem);
+
+       ret = ceph_osdc_start_request(&dev->client->osdc, req, false);
+       if (ret < 0)
+               goto done_err;
+
+       if (!rbd_cb) {
+               ret = ceph_osdc_wait_request(&dev->client->osdc, req);
+               ceph_osdc_put_request(req);
+       }
+       return ret;
+
+done_err:
+       bio_chain_put(req_data->bio);
+       ceph_osdc_put_request(req);
+done_pages:
+       kfree(req_data);
+done:
+       if (rq)
+               blk_end_request(rq, ret, len);
+       return ret;
+}
+
+/*
+ * Ceph osd op callback
+ */
+static void rbd_req_cb(struct ceph_osd_request *req, struct ceph_msg *msg)
+{
+       struct rbd_request *req_data = req->r_priv;
+       struct ceph_osd_reply_head *replyhead;
+       struct ceph_osd_op *op;
+       __s32 rc;
+       u64 bytes;
+       int read_op;
+
+       /* parse reply */
+       replyhead = msg->front.iov_base;
+       WARN_ON(le32_to_cpu(replyhead->num_ops) == 0);
+       op = (void *)(replyhead + 1);
+       rc = le32_to_cpu(replyhead->result);
+       bytes = le64_to_cpu(op->extent.length);
+       read_op = (le32_to_cpu(op->op) == CEPH_OSD_OP_READ);
+
+       dout("rbd_req_cb bytes=%lld readop=%d rc=%d\n", bytes, read_op, rc);
+
+       if (rc == -ENOENT && read_op) {
+               zero_bio_chain(req_data->bio, 0);
+               rc = 0;
+       } else if (rc == 0 && read_op && bytes < req_data->len) {
+               zero_bio_chain(req_data->bio, bytes);
+               bytes = req_data->len;
+       }
+
+       blk_end_request(req_data->rq, rc, bytes);
+
+       if (req_data->bio)
+               bio_chain_put(req_data->bio);
+
+       ceph_osdc_put_request(req);
+       kfree(req_data);
+}
+
+/*
+ * Do a synchronous ceph osd operation
+ */
+static int rbd_req_sync_op(struct rbd_device *dev,
+                          struct ceph_snap_context *snapc,
+                          u64 snapid,
+                          int opcode,
+                          int flags,
+                          struct ceph_osd_req_op *orig_ops,
+                          int num_reply,
+                          const char *obj,
+                          u64 ofs, u64 len,
+                          char *buf)
+{
+       int ret;
+       struct page **pages;
+       int num_pages;
+       struct ceph_osd_req_op *ops = orig_ops;
+       u32 payload_len;
+
+       num_pages = calc_pages_for(ofs, len);
+       pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
+       if (IS_ERR(pages))
+               return PTR_ERR(pages);
+
+       if (!orig_ops) {
+               payload_len = (flags & CEPH_OSD_FLAG_WRITE ? len : 0);
+               ret = rbd_create_rw_ops(&ops, 1, opcode, payload_len);
+               if (ret < 0)
+                       goto done;
+
+               if ((flags & CEPH_OSD_FLAG_WRITE) && buf) {
+                       ret = ceph_copy_to_page_vector(pages, buf, ofs, len);
+                       if (ret < 0)
+                               goto done_ops;
+               }
+       }
+
+       ret = rbd_do_request(NULL, dev, snapc, snapid,
+                         obj, ofs, len, NULL,
+                         pages, num_pages,
+                         flags,
+                         ops,
+                         2,
+                         NULL);
+       if (ret < 0)
+               goto done_ops;
+
+       if ((flags & CEPH_OSD_FLAG_READ) && buf)
+               ret = ceph_copy_from_page_vector(pages, buf, ofs, ret);
+
+done_ops:
+       if (!orig_ops)
+               rbd_destroy_ops(ops);
+done:
+       ceph_release_page_vector(pages, num_pages);
+       return ret;
+}
+
+/*
+ * Do an asynchronous ceph osd operation
+ */
+static int rbd_do_op(struct request *rq,
+                    struct rbd_device *rbd_dev,
+                    struct ceph_snap_context *snapc,
+                    u64 snapid,
+                    int opcode, int flags, int num_reply,
+                    u64 ofs, u64 len,
+                    struct bio *bio)
+{
+       char *seg_name;
+       u64 seg_ofs;
+       u64 seg_len;
+       int ret;
+       struct ceph_osd_req_op *ops;
+       u32 payload_len;
+
+       seg_name = kmalloc(RBD_MAX_SEG_NAME_LEN + 1, GFP_NOIO);
+       if (!seg_name)
+               return -ENOMEM;
+
+       seg_len = rbd_get_segment(&rbd_dev->header,
+                                 rbd_dev->header.block_name,
+                                 ofs, len,
+                                 seg_name, &seg_ofs);
+
+       payload_len = (flags & CEPH_OSD_FLAG_WRITE ? seg_len : 0);
+
+       ret = rbd_create_rw_ops(&ops, 1, opcode, payload_len);
+       if (ret < 0)
+               goto done;
+
+       /* we've taken care of segment sizes earlier when we
+          cloned the bios. We should never have a segment
+          truncated at this point */
+       BUG_ON(seg_len < len);
+
+       ret = rbd_do_request(rq, rbd_dev, snapc, snapid,
+                            seg_name, seg_ofs, seg_len,
+                            bio,
+                            NULL, 0,
+                            flags,
+                            ops,
+                            num_reply,
+                            rbd_req_cb);
+done:
+       kfree(seg_name);
+       return ret;
+}
+
+/*
+ * Request async osd write
+ */
+static int rbd_req_write(struct request *rq,
+                        struct rbd_device *rbd_dev,
+                        struct ceph_snap_context *snapc,
+                        u64 ofs, u64 len,
+                        struct bio *bio)
+{
+       return rbd_do_op(rq, rbd_dev, snapc, CEPH_NOSNAP,
+                        CEPH_OSD_OP_WRITE,
+                        CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
+                        2,
+                        ofs, len, bio);
+}
+
+/*
+ * Request async osd read
+ */
+static int rbd_req_read(struct request *rq,
+                        struct rbd_device *rbd_dev,
+                        u64 snapid,
+                        u64 ofs, u64 len,
+                        struct bio *bio)
+{
+       return rbd_do_op(rq, rbd_dev, NULL,
+                        (snapid ? snapid : CEPH_NOSNAP),
+                        CEPH_OSD_OP_READ,
+                        CEPH_OSD_FLAG_READ,
+                        2,
+                        ofs, len, bio);
+}
+
+/*
+ * Request sync osd read
+ */
+static int rbd_req_sync_read(struct rbd_device *dev,
+                         struct ceph_snap_context *snapc,
+                         u64 snapid,
+                         const char *obj,
+                         u64 ofs, u64 len,
+                         char *buf)
+{
+       return rbd_req_sync_op(dev, NULL,
+                              (snapid ? snapid : CEPH_NOSNAP),
+                              CEPH_OSD_OP_READ,
+                              CEPH_OSD_FLAG_READ,
+                              NULL,
+                              1, obj, ofs, len, buf);
+}
+
+/*
+ * Request sync osd rollback of an object to a snapshot
+ */
+static int rbd_req_sync_rollback_obj(struct rbd_device *dev,
+                                    u64 snapid,
+                                    const char *obj)
+{
+       struct ceph_osd_req_op *ops;
+       int ret = rbd_create_rw_ops(&ops, 1, CEPH_OSD_OP_ROLLBACK, 0);
+       if (ret < 0)
+               return ret;
+
+       ops[0].snap.snapid = snapid;
+
+       ret = rbd_req_sync_op(dev, NULL,
+                              CEPH_NOSNAP,
+                              0,
+                              CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
+                              ops,
+                              1, obj, 0, 0, NULL);
+
+       rbd_destroy_ops(ops);
+
+       if (ret < 0)
+               return ret;
+
+       return ret;
+}
+
+/*
+ * Request sync osd class method call
+ */
+static int rbd_req_sync_exec(struct rbd_device *dev,
+                            const char *obj,
+                            const char *cls,
+                            const char *method,
+                            const char *data,
+                            int len)
+{
+       struct ceph_osd_req_op *ops;
+       int cls_len = strlen(cls);
+       int method_len = strlen(method);
+       int ret = rbd_create_rw_ops(&ops, 1, CEPH_OSD_OP_CALL,
+                                   cls_len + method_len + len);
+       if (ret < 0)
+               return ret;
+
+       ops[0].cls.class_name = cls;
+       ops[0].cls.class_len = (__u8)cls_len;
+       ops[0].cls.method_name = method;
+       ops[0].cls.method_len = (__u8)method_len;
+       ops[0].cls.argc = 0;
+       ops[0].cls.indata = data;
+       ops[0].cls.indata_len = len;
+
+       ret = rbd_req_sync_op(dev, NULL,
+                              CEPH_NOSNAP,
+                              0,
+                              CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
+                              ops,
+                              1, obj, 0, 0, NULL);
+
+       rbd_destroy_ops(ops);
+
+       dout("cls_exec returned %d\n", ret);
+       return ret;
+}
+
+/*
+ * block device queue callback
+ */
+static void rbd_rq_fn(struct request_queue *q)
+{
+       struct rbd_device *rbd_dev = q->queuedata;
+       struct request *rq;
+       struct bio_pair *bp = NULL;
+
+       rq = blk_fetch_request(q);
+
+       while (1) {
+               struct bio *bio;
+               struct bio *rq_bio, *next_bio = NULL;
+               bool do_write;
+               int size, op_size = 0;
+               u64 ofs;
+
+               /* peek at request from block layer */
+               if (!rq)
+                       break;
+
+               dout("fetched request\n");
+
+               /* filter out block requests we don't understand */
+               if (rq->cmd_type != REQ_TYPE_FS) {
+                       __blk_end_request_all(rq, 0);
+                       goto next;
+               }
+
+               /* deduce our operation (read, write) */
+               do_write = (rq_data_dir(rq) == WRITE);
+
+               size = blk_rq_bytes(rq);
+               ofs = blk_rq_pos(rq) * 512ULL;
+               rq_bio = rq->bio;
+               if (do_write && rbd_dev->read_only) {
+                       __blk_end_request_all(rq, -EROFS);
+                       goto next;
+               }
+
+               spin_unlock_irq(q->queue_lock);
+
+               dout("%s 0x%x bytes at 0x%llx\n",
+                    do_write ? "write" : "read",
+                    size, blk_rq_pos(rq) * 512ULL);
+
+               do {
+                       /* a bio clone to be passed down to OSD req */
+                       dout("rq->bio->bi_vcnt=%d\n", rq->bio->bi_vcnt);
+                       op_size = rbd_get_segment(&rbd_dev->header,
+                                                 rbd_dev->header.block_name,
+                                                 ofs, size,
+                                                 NULL, NULL);
+                       bio = bio_chain_clone(&rq_bio, &next_bio, &bp,
+                                             op_size, GFP_ATOMIC);
+                       if (!bio) {
+                               spin_lock_irq(q->queue_lock);
+                               __blk_end_request_all(rq, -ENOMEM);
+                               goto next;
+                       }
+
+                       /* init OSD command: write or read */
+                       if (do_write)
+                               rbd_req_write(rq, rbd_dev,
+                                             rbd_dev->header.snapc,
+                                             ofs,
+                                             op_size, bio);
+                       else
+                               rbd_req_read(rq, rbd_dev,
+                                            cur_snap_id(rbd_dev),
+                                            ofs,
+                                            op_size, bio);
+
+                       size -= op_size;
+                       ofs += op_size;
+
+                       rq_bio = next_bio;
+               } while (size > 0);
+
+               if (bp)
+                       bio_pair_release(bp);
+
+               spin_lock_irq(q->queue_lock);
+next:
+               rq = blk_fetch_request(q);
+       }
+}
+
+/*
+ * a queue callback. Makes sure that we don't create a bio that spans
+ * multiple osd objects. The one exception is a single-page bio, which we
+ * split later in bio_chain_clone
+ */
+static int rbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bmd,
+                         struct bio_vec *bvec)
+{
+       struct rbd_device *rbd_dev = q->queuedata;
+       unsigned int chunk_sectors = 1 << (rbd_dev->header.obj_order - 9);
+       sector_t sector = bmd->bi_sector + get_start_sect(bmd->bi_bdev);
+       unsigned int bio_sectors = bmd->bi_size >> 9;
+       int max;
+
+       max = (chunk_sectors - ((sector & (chunk_sectors - 1))
+                                + bio_sectors)) << 9;
+       if (max < 0)
+               max = 0; /* bio_add cannot handle a negative return */
+       if (max <= bvec->bv_len && bio_sectors == 0)
+               return bvec->bv_len;
+       return max;
+}
+
+static void rbd_free_disk(struct rbd_device *rbd_dev)
+{
+       struct gendisk *disk = rbd_dev->disk;
+
+       if (!disk)
+               return;
+
+       rbd_header_free(&rbd_dev->header);
+
+       if (disk->flags & GENHD_FL_UP)
+               del_gendisk(disk);
+       if (disk->queue)
+               blk_cleanup_queue(disk->queue);
+       put_disk(disk);
+}
+
+/*
+ * reload the on-disk header
+ */
+static int rbd_read_header(struct rbd_device *rbd_dev,
+                          struct rbd_image_header *header)
+{
+       ssize_t rc;
+       struct rbd_image_header_ondisk *dh;
+       int snap_count = 0;
+       u64 snap_names_len = 0;
+
+       while (1) {
+               int len = sizeof(*dh) +
+                         snap_count * sizeof(struct rbd_image_snap_ondisk) +
+                         snap_names_len;
+
+               rc = -ENOMEM;
+               dh = kmalloc(len, GFP_KERNEL);
+               if (!dh)
+                       return -ENOMEM;
+
+               rc = rbd_req_sync_read(rbd_dev,
+                                      NULL, CEPH_NOSNAP,
+                                      rbd_dev->obj_md_name,
+                                      0, len,
+                                      (char *)dh);
+               if (rc < 0)
+                       goto out_dh;
+
+               rc = rbd_header_from_disk(header, dh, snap_count, GFP_KERNEL);
+               if (rc < 0)
+                       goto out_dh;
+
+               if (snap_count != header->total_snaps) {
+                       snap_count = header->total_snaps;
+                       snap_names_len = header->snap_names_len;
+                       rbd_header_free(header);
+                       kfree(dh);
+                       continue;
+               }
+               break;
+       }
+
+out_dh:
+       kfree(dh);
+       return rc;
+}
+
+/*
+ * create a snapshot
+ */
+static int rbd_header_add_snap(struct rbd_device *dev,
+                              const char *snap_name,
+                              gfp_t gfp_flags)
+{
+       int name_len = strlen(snap_name);
+       u64 new_snapid;
+       int ret;
+       void *data, *data_start, *data_end;
+
+       /* we should create a snapshot only if we're pointing at the head */
+       if (dev->cur_snap)
+               return -EINVAL;
+
+       ret = ceph_monc_create_snapid(&dev->client->monc, dev->poolid,
+                                     &new_snapid);
+       dout("created snapid=%lld\n", new_snapid);
+       if (ret < 0)
+               return ret;
+
+       data = kmalloc(name_len + 16, gfp_flags);
+       if (!data)
+               return -ENOMEM;
+
+       data_start = data;
+       data_end = data + name_len + 16;
+
+       ceph_encode_string_safe(&data, data_end, snap_name, name_len, bad);
+       ceph_encode_64_safe(&data, data_end, new_snapid, bad);
+
+       ret = rbd_req_sync_exec(dev, dev->obj_md_name, "rbd", "snap_add",
+                               data_start, data - data_start);
+
+       kfree(data_start);
+
+       if (ret < 0)
+               return ret;
+
+       dev->header.snapc->seq = new_snapid;
+
+       return 0;
+bad:
+       return -ERANGE;
+}
+
+/*
+ * re-read the on-disk header and refresh the in-memory snapshot info
+ */
+static int rbd_update_snaps(struct rbd_device *rbd_dev)
+{
+       int ret;
+       struct rbd_image_header h;
+       u64 snap_seq;
+
+       ret = rbd_read_header(rbd_dev, &h);
+       if (ret < 0)
+               return ret;
+
+       down_write(&rbd_dev->header.snap_rwsem);
+
+       snap_seq = rbd_dev->header.snapc->seq;
+
+       kfree(rbd_dev->header.snapc);
+       kfree(rbd_dev->header.snap_names);
+       kfree(rbd_dev->header.snap_sizes);
+
+       rbd_dev->header.total_snaps = h.total_snaps;
+       rbd_dev->header.snapc = h.snapc;
+       rbd_dev->header.snap_names = h.snap_names;
+       rbd_dev->header.snap_sizes = h.snap_sizes;
+       rbd_dev->header.snapc->seq = snap_seq;
+
+       up_write(&rbd_dev->header.snap_rwsem);
+
+       return 0;
+}
+
+static int rbd_init_disk(struct rbd_device *rbd_dev)
+{
+       struct gendisk *disk;
+       struct request_queue *q;
+       int rc;
+       u64 total_size = 0;
+
+       /* contact OSD, request size info about the object being mapped */
+       rc = rbd_read_header(rbd_dev, &rbd_dev->header);
+       if (rc)
+               return rc;
+
+       rc = rbd_header_set_snap(rbd_dev, rbd_dev->snap_name, &total_size);
+       if (rc)
+               return rc;
+
+       /* create gendisk info */
+       rc = -ENOMEM;
+       disk = alloc_disk(RBD_MINORS_PER_MAJOR);
+       if (!disk)
+               goto out;
+
+       sprintf(disk->disk_name, DRV_NAME "%d", rbd_dev->id);
+       disk->major = rbd_dev->major;
+       disk->first_minor = 0;
+       disk->fops = &rbd_bd_ops;
+       disk->private_data = rbd_dev;
+
+       /* init rq */
+       rc = -ENOMEM;
+       q = blk_init_queue(rbd_rq_fn, &rbd_dev->lock);
+       if (!q)
+               goto out_disk;
+       blk_queue_merge_bvec(q, rbd_merge_bvec);
+       disk->queue = q;
+
+       q->queuedata = rbd_dev;
+
+       rbd_dev->disk = disk;
+       rbd_dev->q = q;
+
+       /* finally, announce the disk to the world */
+       set_capacity(disk, total_size / 512ULL);
+       add_disk(disk);
+
+       pr_info("%s: added with size 0x%llx\n",
+               disk->disk_name, (unsigned long long)total_size);
+       return 0;
+
+out_disk:
+       put_disk(disk);
+out:
+       return rc;
+}
+
+/********************************************************************
+ * /sys/class/rbd/
+ *                   add       map rados objects to blkdev
+ *                   remove    unmap rados objects
+ *                   list      show mappings
+ *******************************************************************/
+
+static void class_rbd_release(struct class *cls)
+{
+       kfree(cls);
+}
+
+static ssize_t class_rbd_list(struct class *c,
+                             struct class_attribute *attr,
+                             char *data)
+{
+       int n = 0;
+       struct list_head *tmp;
+       int max = PAGE_SIZE;
+
+       mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+       n += snprintf(data, max,
+                     "#id\tmajor\tclient_name\tpool\tname\tsnap\tKB\n");
+
+       list_for_each(tmp, &rbd_dev_list) {
+               struct rbd_device *rbd_dev;
+
+               rbd_dev = list_entry(tmp, struct rbd_device, node);
+               n += snprintf(data+n, max-n,
+                             "%d\t%d\tclient%lld\t%s\t%s\t%s\t%lld\n",
+                             rbd_dev->id,
+                             rbd_dev->major,
+                             ceph_client_id(rbd_dev->client),
+                             rbd_dev->pool_name,
+                             rbd_dev->obj, rbd_dev->snap_name,
+                             rbd_dev->header.image_size >> 10);
+               if (n == max)
+                       break;
+       }
+
+       mutex_unlock(&ctl_mutex);
+       return n;
+}
+
+static ssize_t class_rbd_add(struct class *c,
+                            struct class_attribute *attr,
+                            const char *buf, size_t count)
+{
+       struct ceph_osd_client *osdc;
+       struct rbd_device *rbd_dev;
+       ssize_t rc = -ENOMEM;
+       int irc, new_id = 0;
+       struct list_head *tmp;
+       char *mon_dev_name;
+       char *options;
+
+       if (!try_module_get(THIS_MODULE))
+               return -ENODEV;
+
+       mon_dev_name = kmalloc(RBD_MAX_OPT_LEN, GFP_KERNEL);
+       if (!mon_dev_name)
+               goto err_out_mod;
+
+       options = kmalloc(RBD_MAX_OPT_LEN, GFP_KERNEL);
+       if (!options)
+               goto err_mon_dev;
+
+       /* new rbd_device object */
+       rbd_dev = kzalloc(sizeof(*rbd_dev), GFP_KERNEL);
+       if (!rbd_dev)
+               goto err_out_opt;
+
+       /* static rbd_device initialization */
+       spin_lock_init(&rbd_dev->lock);
+       INIT_LIST_HEAD(&rbd_dev->node);
+
+       /* generate unique id: find highest unique id, add one */
+       mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+       list_for_each(tmp, &rbd_dev_list) {
+               struct rbd_device *rbd_dev;
+
+               rbd_dev = list_entry(tmp, struct rbd_device, node);
+               if (rbd_dev->id >= new_id)
+                       new_id = rbd_dev->id + 1;
+       }
+
+       rbd_dev->id = new_id;
+
+       /* add to global list */
+       list_add_tail(&rbd_dev->node, &rbd_dev_list);
+
+       /* parse add command */
+       if (sscanf(buf, "%" __stringify(RBD_MAX_OPT_LEN) "s "
+                  "%" __stringify(RBD_MAX_OPT_LEN) "s "
+                  "%" __stringify(RBD_MAX_POOL_NAME_LEN) "s "
+                  "%" __stringify(RBD_MAX_OBJ_NAME_LEN) "s"
+                  "%" __stringify(RBD_MAX_SNAP_NAME_LEN) "s",
+                  mon_dev_name, options, rbd_dev->pool_name,
+                  rbd_dev->obj, rbd_dev->snap_name) < 4) {
+               rc = -EINVAL;
+               goto err_out_slot;
+       }
+
+       if (rbd_dev->snap_name[0] == 0)
+               rbd_dev->snap_name[0] = '-';
+
+       rbd_dev->obj_len = strlen(rbd_dev->obj);
+       snprintf(rbd_dev->obj_md_name, sizeof(rbd_dev->obj_md_name), "%s%s",
+                rbd_dev->obj, RBD_SUFFIX);
+
+       /* initialize rest of new object */
+       snprintf(rbd_dev->name, DEV_NAME_LEN, DRV_NAME "%d", rbd_dev->id);
+       rc = rbd_get_client(rbd_dev, mon_dev_name, options);
+       if (rc < 0)
+               goto err_out_slot;
+
+       mutex_unlock(&ctl_mutex);
+
+       /* pick the pool */
+       osdc = &rbd_dev->client->osdc;
+       rc = ceph_pg_poolid_by_name(osdc->osdmap, rbd_dev->pool_name);
+       if (rc < 0)
+               goto err_out_client;
+       rbd_dev->poolid = rc;
+
+       /* register our block device */
+       irc = register_blkdev(0, rbd_dev->name);
+       if (irc < 0) {
+               rc = irc;
+               goto err_out_client;
+       }
+       rbd_dev->major = irc;
+
+       /* set up and announce blkdev mapping */
+       rc = rbd_init_disk(rbd_dev);
+       if (rc)
+               goto err_out_blkdev;
+
+       return count;
+
+err_out_blkdev:
+       unregister_blkdev(rbd_dev->major, rbd_dev->name);
+err_out_client:
+       rbd_put_client(rbd_dev);
+       mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+err_out_slot:
+       list_del_init(&rbd_dev->node);
+       mutex_unlock(&ctl_mutex);
+
+       kfree(rbd_dev);
+err_out_opt:
+       kfree(options);
+err_mon_dev:
+       kfree(mon_dev_name);
+err_out_mod:
+       dout("Error adding device %s\n", buf);
+       module_put(THIS_MODULE);
+       return rc;
+}
+
+static struct rbd_device *__rbd_get_dev(unsigned long id)
+{
+       struct list_head *tmp;
+       struct rbd_device *rbd_dev;
+
+       list_for_each(tmp, &rbd_dev_list) {
+               rbd_dev = list_entry(tmp, struct rbd_device, node);
+               if (rbd_dev->id == id)
+                       return rbd_dev;
+       }
+       return NULL;
+}
+
+static ssize_t class_rbd_remove(struct class *c,
+                               struct class_attribute *attr,
+                               const char *buf,
+                               size_t count)
+{
+       struct rbd_device *rbd_dev = NULL;
+       int target_id, rc;
+       unsigned long ul;
+
+       rc = strict_strtoul(buf, 10, &ul);
+       if (rc)
+               return rc;
+
+       /* convert to int; abort if we lost anything in the conversion */
+       target_id = (int) ul;
+       if (target_id != ul)
+               return -EINVAL;
+
+       /* remove object from list immediately */
+       mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+       rbd_dev = __rbd_get_dev(target_id);
+       if (rbd_dev)
+               list_del_init(&rbd_dev->node);
+
+       mutex_unlock(&ctl_mutex);
+
+       if (!rbd_dev)
+               return -ENOENT;
+
+       rbd_put_client(rbd_dev);
+
+       /* clean up and free blkdev */
+       rbd_free_disk(rbd_dev);
+       unregister_blkdev(rbd_dev->major, rbd_dev->name);
+       kfree(rbd_dev);
+
+       /* release module ref */
+       module_put(THIS_MODULE);
+
+       return count;
+}
+
+static ssize_t class_rbd_snaps_list(struct class *c,
+                             struct class_attribute *attr,
+                             char *data)
+{
+       struct rbd_device *rbd_dev = NULL;
+       struct list_head *tmp;
+       struct rbd_image_header *header;
+       int i, n = 0, max = PAGE_SIZE;
+       int ret;
+
+       mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+       n += snprintf(data, max, "#id\tsnap\tKB\n");
+
+       list_for_each(tmp, &rbd_dev_list) {
+               char *names, *p;
+               struct ceph_snap_context *snapc;
+
+               rbd_dev = list_entry(tmp, struct rbd_device, node);
+               header = &rbd_dev->header;
+
+               down_read(&header->snap_rwsem);
+
+               names = header->snap_names;
+               snapc = header->snapc;
+
+               n += snprintf(data + n, max - n, "%d\t%s\t%lld%s\n",
+                             rbd_dev->id, RBD_SNAP_HEAD_NAME,
+                             header->image_size >> 10,
+                             (!rbd_dev->cur_snap ? " (*)" : ""));
+               if (n == max)
+                       break;
+
+               p = names;
+               for (i = 0; i < header->total_snaps; i++, p += strlen(p) + 1) {
+                       n += snprintf(data + n, max - n, "%d\t%s\t%lld%s\n",
+                             rbd_dev->id, p, header->snap_sizes[i] >> 10,
+                             (rbd_dev->cur_snap &&
+                              (snap_index(header, i) == rbd_dev->cur_snap) ?
+                              " (*)" : ""));
+                       if (n == max)
+                               break;
+               }
+
+               up_read(&header->snap_rwsem);
+       }
+
+
+       ret = n;
+       mutex_unlock(&ctl_mutex);
+       return ret;
+}
+
+static ssize_t class_rbd_snaps_refresh(struct class *c,
+                               struct class_attribute *attr,
+                               const char *buf,
+                               size_t count)
+{
+       struct rbd_device *rbd_dev = NULL;
+       int target_id, rc;
+       unsigned long ul;
+       int ret = count;
+
+       rc = strict_strtoul(buf, 10, &ul);
+       if (rc)
+               return rc;
+
+       /* convert to int; abort if we lost anything in the conversion */
+       target_id = (int) ul;
+       if (target_id != ul)
+               return -EINVAL;
+
+       mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+       rbd_dev = __rbd_get_dev(target_id);
+       if (!rbd_dev) {
+               ret = -ENOENT;
+               goto done;
+       }
+
+       rc = rbd_update_snaps(rbd_dev);
+       if (rc < 0)
+               ret = rc;
+
+done:
+       mutex_unlock(&ctl_mutex);
+       return ret;
+}
+
+static ssize_t class_rbd_snap_create(struct class *c,
+                               struct class_attribute *attr,
+                               const char *buf,
+                               size_t count)
+{
+       struct rbd_device *rbd_dev = NULL;
+       int target_id, ret;
+       char *name;
+
+       name = kmalloc(RBD_MAX_SNAP_NAME_LEN + 1, GFP_KERNEL);
+       if (!name)
+               return -ENOMEM;
+
+       /* parse snaps add command */
+       if (sscanf(buf, "%d "
+                  "%" __stringify(RBD_MAX_SNAP_NAME_LEN) "s",
+                  &target_id,
+                  name) != 2) {
+               ret = -EINVAL;
+               goto done;
+       }
+
+       mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+       rbd_dev = __rbd_get_dev(target_id);
+       if (!rbd_dev) {
+               ret = -ENOENT;
+               goto done_unlock;
+       }
+
+       ret = rbd_header_add_snap(rbd_dev,
+                                 name, GFP_KERNEL);
+       if (ret < 0)
+               goto done_unlock;
+
+       ret = rbd_update_snaps(rbd_dev);
+       if (ret < 0)
+               goto done_unlock;
+
+       ret = count;
+done_unlock:
+       mutex_unlock(&ctl_mutex);
+done:
+       kfree(name);
+       return ret;
+}
+
+static ssize_t class_rbd_rollback(struct class *c,
+                               struct class_attribute *attr,
+                               const char *buf,
+                               size_t count)
+{
+       struct rbd_device *rbd_dev = NULL;
+       int target_id, ret;
+       u64 snapid;
+       char snap_name[RBD_MAX_SNAP_NAME_LEN + 1];
+       u64 cur_ofs;
+       char *seg_name;
+
+       /* parse snap rollback command */
+       if (sscanf(buf, "%d "
+                  "%" __stringify(RBD_MAX_SNAP_NAME_LEN) "s",
+                  &target_id,
+                  snap_name) != 2) {
+               return -EINVAL;
+       }
+
+       ret = -ENOMEM;
+       seg_name = kmalloc(RBD_MAX_SEG_NAME_LEN + 1, GFP_NOIO);
+       if (!seg_name)
+               return ret;
+
+       mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+       rbd_dev = __rbd_get_dev(target_id);
+       if (!rbd_dev) {
+               ret = -ENOENT;
+               goto done_unlock;
+       }
+
+       ret = snap_by_name(&rbd_dev->header, snap_name, &snapid, NULL);
+       if (ret < 0)
+               goto done_unlock;
+
+       dout("snapid=%lld\n", snapid);
+
+       cur_ofs = 0;
+       while (cur_ofs < rbd_dev->header.image_size) {
+               cur_ofs += rbd_get_segment(&rbd_dev->header,
+                                          rbd_dev->obj,
+                                          cur_ofs, (u64)-1,
+                                          seg_name, NULL);
+               dout("seg_name=%s\n", seg_name);
+
+               ret = rbd_req_sync_rollback_obj(rbd_dev, snapid, seg_name);
+               if (ret < 0)
+                       pr_warning("could not roll back obj %s err=%d\n",
+                                  seg_name, ret);
+       }
+
+       ret = rbd_update_snaps(rbd_dev);
+       if (ret < 0)
+               goto done_unlock;
+
+       ret = count;
+
+done_unlock:
+       mutex_unlock(&ctl_mutex);
+       kfree(seg_name);
+
+       return ret;
+}
+
+static struct class_attribute class_rbd_attrs[] = {
+       __ATTR(add,             0200, NULL, class_rbd_add),
+       __ATTR(remove,          0200, NULL, class_rbd_remove),
+       __ATTR(list,            0444, class_rbd_list, NULL),
+       __ATTR(snaps_refresh,   0200, NULL, class_rbd_snaps_refresh),
+       __ATTR(snap_create,     0200, NULL, class_rbd_snap_create),
+       __ATTR(snaps_list,      0444, class_rbd_snaps_list, NULL),
+       __ATTR(snap_rollback,   0200, NULL, class_rbd_rollback),
+       __ATTR_NULL
+};
+
+/*
+ * create control files in sysfs
+ * /sys/class/rbd/...
+ */
+static int rbd_sysfs_init(void)
+{
+       int ret = -ENOMEM;
+
+       class_rbd = kzalloc(sizeof(*class_rbd), GFP_KERNEL);
+       if (!class_rbd)
+               goto out;
+
+       class_rbd->name = DRV_NAME;
+       class_rbd->owner = THIS_MODULE;
+       class_rbd->class_release = class_rbd_release;
+       class_rbd->class_attrs = class_rbd_attrs;
+
+       ret = class_register(class_rbd);
+       if (ret)
+               goto out_class;
+       return 0;
+
+out_class:
+       kfree(class_rbd);
+       class_rbd = NULL;
+       pr_err(DRV_NAME ": failed to create class rbd\n");
+out:
+       return ret;
+}
+
+static void rbd_sysfs_cleanup(void)
+{
+       if (class_rbd)
+               class_destroy(class_rbd);
+       class_rbd = NULL;
+}
+
+int __init rbd_init(void)
+{
+       int rc;
+
+       rc = rbd_sysfs_init();
+       if (rc)
+               return rc;
+       spin_lock_init(&node_lock);
+       pr_info("loaded " DRV_NAME_LONG "\n");
+       return 0;
+}
+
+void __exit rbd_exit(void)
+{
+       rbd_sysfs_cleanup();
+}
+
+module_init(rbd_init);
+module_exit(rbd_exit);
+
+MODULE_AUTHOR("Sage Weil <sage@newdream.net>");
+MODULE_AUTHOR("Yehuda Sadeh <yehuda@hq.newdream.net>");
+MODULE_DESCRIPTION("rados block device");
+
+/* following authorship retained from original osdblk.c */
+MODULE_AUTHOR("Jeff Garzik <jeff@garzik.org>");
+
+MODULE_LICENSE("GPL");
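
Review note on class_rbd_snaps_list() above: snprintf() returns the length it would have written, so after a truncation the running offset n can step past max and the `n == max` test never fires. A minimal userspace sketch of a clamped append follows. The helper name append_bounded() is hypothetical and is not part of this patch; in-kernel code would more likely reach for an existing bounded helper.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): append formatted text to
 * buf at offset *n and never let *n move past max - 1.
 * Returns 1 if the whole string fit, 0 if it was truncated.
 */
static int append_bounded(char *buf, size_t max, size_t *n,
                          const char *fmt, ...)
{
    va_list ap;
    int want;

    assert(max > 0 && *n < max);

    va_start(ap, fmt);
    /* vsnprintf reports the length it wanted to write, not what it wrote */
    want = vsnprintf(buf + *n, max - *n, fmt, ap);
    va_end(ap);

    if (want < 0)
        return 0;               /* output error: offset left unchanged */
    if ((size_t)want >= max - *n) {
        *n = max - 1;           /* cut short; buf is still NUL-terminated */
        return 0;
    }
    *n += (size_t)want;
    return 1;
}
```

A caller would stop appending as soon as the helper returns 0, which covers both the exact-fit and the overflow case with one check.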
diff --git a/drivers/block/rbd_types.h b/drivers/block/rbd_types.h
new file mode 100644
index 0000000..fc6c678
--- /dev/null
+++ b/drivers/block/rbd_types.h
@@ -0,0 +1,73 @@
+/*
+ * Ceph - scalable distributed file system
+ *
+ * Copyright (C) 2004-2010 Sage Weil <sage@newdream.net>
+ *
+ * This is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2.1, as published by the Free Software
+ * Foundation.  See file COPYING.
+ *
+ */
+
+#ifndef CEPH_RBD_TYPES_H
+#define CEPH_RBD_TYPES_H
+
+#include <linux/types.h>
+
+/*
+ * rbd image 'foo' consists of objects
+ *   foo.rbd      - image metadata
+ *   foo.00000000
+ *   foo.00000001
+ *   ...          - data
+ */
+
+#define RBD_SUFFIX             ".rbd"
+#define RBD_DIRECTORY           "rbd_directory"
+#define RBD_INFO                "rbd_info"
+
+#define RBD_DEFAULT_OBJ_ORDER  22   /* 4MB */
+#define RBD_MIN_OBJ_ORDER       16
+#define RBD_MAX_OBJ_ORDER       30
+
+#define RBD_MAX_OBJ_NAME_LEN   96
+#define RBD_MAX_SEG_NAME_LEN   128
+
+#define RBD_COMP_NONE          0
+#define RBD_CRYPT_NONE         0
+
+#define RBD_HEADER_TEXT                "<<< Rados Block Device Image >>>\n"
+#define RBD_HEADER_SIGNATURE   "RBD"
+#define RBD_HEADER_VERSION     "001.005"
+
+struct rbd_info {
+       __le64 max_id;
+} __attribute__ ((packed));
+
+struct rbd_image_snap_ondisk {
+       __le64 id;
+       __le64 image_size;
+} __attribute__((packed));
+
+struct rbd_image_header_ondisk {
+       char text[40];
+       char block_name[24];
+       char signature[4];
+       char version[8];
+       struct {
+               __u8 order;
+               __u8 crypt_type;
+               __u8 comp_type;
+               __u8 unused;
+       } __attribute__((packed)) options;
+       __le64 image_size;
+       __le64 snap_seq;
+       __le32 snap_count;
+       __le32 reserved;
+       __le64 snap_names_len;
+       struct rbd_image_snap_ondisk snaps[0];
+} __attribute__((packed));
+
+
+#endif
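
Review note on rbd_types.h above: because every on-disk structure is packed, the fixed part of the image header has a size that can be checked at compile time. The sketch below mirrors the layout in userspace, assuming uint64_t/uint32_t/uint8_t as stand-ins for __le64/__le32/__u8 and a GCC/Clang-style packed attribute. The helpers rbd_obj_bytes() and rbd_header_bytes() are hypothetical names used only for illustration, and the sketch ignores byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types (assumption, see lead-in). */
struct rbd_image_snap_ondisk {
    uint64_t id;
    uint64_t image_size;
} __attribute__((packed));

struct rbd_image_header_ondisk {
    char text[40];
    char block_name[24];
    char signature[4];
    char version[8];
    struct {
        uint8_t order;
        uint8_t crypt_type;
        uint8_t comp_type;
        uint8_t unused;
    } __attribute__((packed)) options;
    uint64_t image_size;
    uint64_t snap_seq;
    uint32_t snap_count;
    uint32_t reserved;
    uint64_t snap_names_len;
    struct rbd_image_snap_ondisk snaps[];   /* flexible array, adds no size */
} __attribute__((packed));

/* 40 + 24 + 4 + 8 + 4 + 8 + 8 + 4 + 4 + 8 bytes precede the snapshot array */
_Static_assert(sizeof(struct rbd_image_header_ondisk) == 112,
               "fixed header must be 112 bytes");
_Static_assert(offsetof(struct rbd_image_header_ondisk, image_size) == 80,
               "image_size must start at byte 80");

/* Hypothetical helper: object size in bytes for a given order (22 -> 4MB). */
static uint64_t rbd_obj_bytes(unsigned int order)
{
    return (uint64_t)1 << order;
}

/* Hypothetical helper: fixed header plus snap_count snapshot records. */
static size_t rbd_header_bytes(uint32_t snap_count)
{
    return sizeof(struct rbd_image_header_ondisk) +
           (size_t)snap_count * sizeof(struct rbd_image_snap_ondisk);
}
```

With RBD_DEFAULT_OBJ_ORDER at 22, rbd_obj_bytes(22) gives the 4MB object size noted in the header comment.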
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2aafafca2b1374b11714546fb3c063044ed9200c..8320490226b78145f95c7e7801a15080419adc19 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -2,7 +2,6 @@
  #include <linux/spinlock.h>
  #include <linux/slab.h>
  #include <linux/blkdev.h>
-#include <linux/smp_lock.h>
  #include <linux/hdreg.h>
  #include <linux/virtio.h>
  #include <linux/virtio_blk.h>
@@ -202,6 +201,7 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
         struct virtio_blk *vblk = disk->private_data;
         struct request *req;
         struct bio *bio;
+       int err;
  
         bio = bio_map_kern(vblk->disk->queue, id_str, VIRTIO_BLK_ID_BYTES,
                            GFP_KERNEL);
@@ -215,11 +215,14 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
         }
  
         req->cmd_type = REQ_TYPE_SPECIAL;
-       return blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+       err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+       blk_put_request(req);
+
+       return err;
  }
  
-static int virtblk_locked_ioctl(struct block_device *bdev, fmode_t mode,
-                        unsigned cmd, unsigned long data)
+static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
+                            unsigned int cmd, unsigned long data)
  {
         struct gendisk *disk = bdev->bd_disk;
         struct virtio_blk *vblk = disk->private_data;
@@ -234,18 +237,6 @@ static int virtblk_locked_ioctl(struct block_device *bdev, fmode_t mode,
                               (void __user *)data);
  }
  
-static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
-                            unsigned int cmd, unsigned long param)
-{
-       int ret;
-
-       lock_kernel();
-       ret = virtblk_locked_ioctl(bdev, mode, cmd, param);
-       unlock_kernel();
-
-       return ret;
-}
-
  /* We provide getgeo only to please some old bootloader/partitioning tools */
  static int virtblk_getgeo(struct block_device *bd, struct hd_geometry *geo)
  {
diff --git a/drivers/char/tpm/tpm.c b/drivers/char/tpm/tpm.c
index 05ad4a17a28f238ecc197e67051f60a50faad769..7c4133582dbae484f449d49f97b2ee0f179a9891 100644
--- a/drivers/char/tpm/tpm.c
+++ b/drivers/char/tpm/tpm.c
@@ -47,6 +47,16 @@ enum tpm_duration {
  #define TPM_MAX_PROTECTED_ORDINAL 12
  #define TPM_PROTECTED_ORDINAL_MASK 0xFF
  
+/*
+ * Bug workaround - some TPMs don't flush the most
+ * recently changed pcr on suspend, so force the flush
+ * with an extend to the selected _unused_ non-volatile pcr.
+ */
+static unsigned int tpm_suspend_pcr;
+module_param_named(suspend_pcr, tpm_suspend_pcr, uint, 0644);
+MODULE_PARM_DESC(suspend_pcr,
+                "PCR to use for dummy writes to facilitate flush on suspend.");
+
  static LIST_HEAD(tpm_chip_list);
  static DEFINE_SPINLOCK(driver_lock);
  static DECLARE_BITMAP(dev_mask, TPM_NUM_DEVICES);
@@ -1077,18 +1087,6 @@ static struct tpm_input_header savestate_header = {
         .ordinal = TPM_ORD_SAVESTATE
  };
  
-/* Bug workaround - some TPM's don't flush the most
- * recently changed pcr on suspend, so force the flush
- * with an extend to the selected _unused_ non-volatile pcr.
- */
-static int tpm_suspend_pcr;
-static int __init tpm_suspend_setup(char *str)
-{
-       get_option(&str, &tpm_suspend_pcr);
-       return 1;
-}
-__setup("tpm_suspend_pcr=", tpm_suspend_setup);
-
  /*
   * We are about to suspend. Save the TPM state
   * so that it can be restored.
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index c810481a5bc23ae3ca57127729c83e454d681534..6c1b676643a9ef43fe3984bb8495996960cf4337 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -48,6 +48,9 @@ struct ports_driver_data {
         /* Used for exporting per-port information to debugfs */
         struct dentry *debugfs_dir;
  
+       /* List of all the devices we're handling */
+       struct list_head portdevs;
+
         /* Number of devices this driver is handling */
         unsigned int index;
  
@@ -108,6 +111,9 @@ struct port_buffer {
   * ports for that device (vdev->priv).
   */
  struct ports_device {
+       /* Next portdev in the list, head is in the pdrvdata struct */
+       struct list_head list;
+
         /*
          * Workqueue handlers where we process deferred work after
          * notification
@@ -178,15 +184,21 @@ struct port {
         struct console cons;
  
         /* Each port associates with a separate char device */
-       struct cdev cdev;
+       struct cdev *cdev;
         struct device *dev;
  
+       /* Reference-counting to handle port hot-unplugs and file operations */
+       struct kref kref;
+
         /* A waitqueue for poll() or blocking read operations */
         wait_queue_head_t waitqueue;
  
         /* The 'name' of the port that we expose via sysfs properties */
         char *name;
  
+       /* We can notify apps of host connect / disconnect events via SIGIO */
+       struct fasync_struct *async_queue;
+
         /* The 'id' to identify the port with the Host */
         u32 id;
  
@@ -221,6 +233,41 @@ out:
         return port;
  }
  
+static struct port *find_port_by_devt_in_portdev(struct ports_device *portdev,
+                                                dev_t dev)
+{
+       struct port *port;
+       unsigned long flags;
+
+       spin_lock_irqsave(&portdev->ports_lock, flags);
+       list_for_each_entry(port, &portdev->ports, list)
+               if (port->cdev->dev == dev)
+                       goto out;
+       port = NULL;
+out:
+       spin_unlock_irqrestore(&portdev->ports_lock, flags);
+
+       return port;
+}
+
+static struct port *find_port_by_devt(dev_t dev)
+{
+       struct ports_device *portdev;
+       struct port *port;
+       unsigned long flags;
+
+       spin_lock_irqsave(&pdrvdata_lock, flags);
+       list_for_each_entry(portdev, &pdrvdata.portdevs, list) {
+               port = find_port_by_devt_in_portdev(portdev, dev);
+               if (port)
+                       goto out;
+       }
+       port = NULL;
+out:
+       spin_unlock_irqrestore(&pdrvdata_lock, flags);
+       return port;
+}
+
  static struct port *find_port_by_id(struct ports_device *portdev, u32 id)
  {
         struct port *port;
@@ -410,7 +457,10 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
  static ssize_t send_control_msg(struct port *port, unsigned int event,
                                 unsigned int value)
  {
-       return __send_control_msg(port->portdev, port->id, event, value);
+       /* Did the port get unplugged before userspace closed it? */
+       if (port->portdev)
+               return __send_control_msg(port->portdev, port->id, event, value);
+       return 0;
  }
  
  /* Callers must take the port->outvq_lock */
@@ -459,9 +509,12 @@ static ssize_t send_buf(struct port *port, void *in_buf, size_t in_count,
  
         /*
          * Wait till the host acknowledges it pushed out the data we
-        * sent.  This is done for ports in blocking mode or for data
-        * from the hvc_console; the tty operations are performed with
-        * spinlocks held so we can't sleep here.
+        * sent.  This is done for data from the hvc_console; the tty
+        * operations are performed with spinlocks held so we can't
+        * sleep here.  An alternative would be to copy the data to a
+        * buffer and relax the spinning requirement.  The downside is
+        * we need to kmalloc a GFP_ATOMIC buffer each time the
+        * console driver writes something out.
          */
         while (!virtqueue_get_buf(out_vq, &len))
                 cpu_relax();
@@ -522,6 +575,10 @@ static ssize_t fill_readbuf(struct port *port, char *out_buf, size_t out_count,
  /* The condition that must be true for polling to end */
  static bool will_read_block(struct port *port)
  {
+       if (!port->guest_connected) {
+               /* Port got hot-unplugged. Let's exit. */
+               return false;
+       }
         return !port_has_data(port) && port->host_connected;
  }
  
@@ -572,6 +629,9 @@ static ssize_t port_fops_read(struct file *filp, char __user *ubuf,
                 if (ret < 0)
                         return ret;
         }
+       /* Port got hot-unplugged. */
+       if (!port->guest_connected)
+               return -ENODEV;
         /*
          * We could've received a disconnection message while we were
          * waiting for more data.
@@ -613,6 +673,9 @@ static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
                 if (ret < 0)
                         return ret;
         }
+       /* Port got hot-unplugged. */
+       if (!port->guest_connected)
+               return -ENODEV;
  
         count = min((size_t)(32 * 1024), count);
  
@@ -626,6 +689,14 @@ static ssize_t port_fops_write(struct file *filp, const char __user *ubuf,
                 goto free_buf;
         }
  
+       /*
+        * We now ask send_buf() to not spin for generic ports -- we
+        * can re-use the same code path that non-blocking file
+        * descriptors take for blocking file descriptors since the
+        * wait is already done and we're certain the write will go
+        * through to the host.
+        */
+       nonblock = true;
         ret = send_buf(port, buf, count, nonblock);
  
         if (nonblock && ret > 0)
@@ -645,6 +716,10 @@ static unsigned int port_fops_poll(struct file *filp, poll_table *wait)
         port = filp->private_data;
         poll_wait(filp, &port->waitqueue, wait);
  
+       if (!port->guest_connected) {
+               /* Port got unplugged */
+               return POLLHUP;
+       }
         ret = 0;
         if (!will_read_block(port))
                 ret |= POLLIN | POLLRDNORM;
@@ -656,6 +731,8 @@ static unsigned int port_fops_poll(struct file *filp, poll_table *wait)
         return ret;
  }
  
+static void remove_port(struct kref *kref);
+
  static int port_fops_release(struct inode *inode, struct file *filp)
  {
         struct port *port;
@@ -676,6 +753,16 @@ static int port_fops_release(struct inode *inode, struct file *filp)
         reclaim_consumed_buffers(port);
         spin_unlock_irq(&port->outvq_lock);
  
+       /*
+        * Locks aren't necessary here as a port can't be opened after
+        * unplug, and if a port isn't unplugged, a kref would already
+        * exist for the port.  Plus, taking ports_lock here would
+        * create a dependency on other locks taken by functions
+        * inside remove_port if we're the last holder of the port,
+        * creating many problems.
+        */
+       kref_put(&port->kref, remove_port);
+
         return 0;
  }
  
@@ -683,22 +770,31 @@ static int port_fops_open(struct inode *inode, struct file *filp)
  {
         struct cdev *cdev = inode->i_cdev;
         struct port *port;
+       int ret;
  
-       port = container_of(cdev, struct port, cdev);
+       port = find_port_by_devt(cdev->dev);
         filp->private_data = port;
  
+       /* Protect against a port getting hot-unplugged at the same time */
+       spin_lock_irq(&port->portdev->ports_lock);
+       kref_get(&port->kref);
+       spin_unlock_irq(&port->portdev->ports_lock);
+
         /*
          * Don't allow opening of console port devices -- that's done
          * via /dev/hvc
          */
-       if (is_console_port(port))
-               return -ENXIO;
+       if (is_console_port(port)) {
+               ret = -ENXIO;
+               goto out;
+       }
  
         /* Allow only one process to open a particular port at a time */
         spin_lock_irq(&port->inbuf_lock);
         if (port->guest_connected) {
                 spin_unlock_irq(&port->inbuf_lock);
-               return -EMFILE;
+               ret = -EMFILE;
+               goto out;
         }
  
         port->guest_connected = true;
@@ -713,10 +809,23 @@ static int port_fops_open(struct inode *inode, struct file *filp)
         reclaim_consumed_buffers(port);
         spin_unlock_irq(&port->outvq_lock);
  
+       nonseekable_open(inode, filp);
+
         /* Notify host of port being opened */
         send_control_msg(filp->private_data, VIRTIO_CONSOLE_PORT_OPEN, 1);
  
         return 0;
+out:
+       kref_put(&port->kref, remove_port);
+       return ret;
+}
+
+static int port_fops_fasync(int fd, struct file *filp, int mode)
+{
+       struct port *port;
+
+       port = filp->private_data;
+       return fasync_helper(fd, filp, mode, &port->async_queue);
  }
  
  /*
@@ -732,6 +841,8 @@ static const struct file_operations port_fops = {
         .write = port_fops_write,
         .poll  = port_fops_poll,
         .release = port_fops_release,
+       .fasync = port_fops_fasync,
+       .llseek = no_llseek,
  };
  
  /*
@@ -990,6 +1101,12 @@ static unsigned int fill_queue(struct virtqueue *vq, spinlock_t *lock)
         return nr_added_bufs;
  }
  
+static void send_sigio_to_port(struct port *port)
+{
+       if (port->async_queue && port->guest_connected)
+               kill_fasync(&port->async_queue, SIGIO, POLL_OUT);
+}
+
  static int add_port(struct ports_device *portdev, u32 id)
  {
         char debugfs_name[16];
@@ -1004,6 +1121,7 @@ static int add_port(struct ports_device *portdev, u32 id)
                 err = -ENOMEM;
                 goto fail;
         }
+       kref_init(&port->kref);
  
         port->portdev = portdev;
         port->id = id;
@@ -1011,6 +1129,7 @@ static int add_port(struct ports_device *portdev, u32 id)
         port->name = NULL;
         port->inbuf = NULL;
         port->cons.hvc = NULL;
+       port->async_queue = NULL;
  
         port->cons.ws.ws_row = port->cons.ws.ws_col = 0;
  
@@ -1021,14 +1140,20 @@ static int add_port(struct ports_device *portdev, u32 id)
         port->in_vq = portdev->in_vqs[port->id];
         port->out_vq = portdev->out_vqs[port->id];
  
-       cdev_init(&port->cdev, &port_fops);
+       port->cdev = cdev_alloc();
+       if (!port->cdev) {
+               dev_err(&port->portdev->vdev->dev, "Error allocating cdev\n");
+               err = -ENOMEM;
+               goto free_port;
+       }
+       port->cdev->ops = &port_fops;
  
         devt = MKDEV(portdev->chr_major, id);
-       err = cdev_add(&port->cdev, devt, 1);
+       err = cdev_add(port->cdev, devt, 1);
         if (err < 0) {
                 dev_err(&port->portdev->vdev->dev,
                         "Error %d adding cdev for port %u\n", err, id);
-               goto free_port;
+               goto free_cdev;
         }
         port->dev = device_create(pdrvdata.class, &port->portdev->vdev->dev,
                                   devt, port, "vport%up%u",
@@ -1093,7 +1218,7 @@ free_inbufs:
  free_device:
         device_destroy(pdrvdata.class, port->dev->devt);
  free_cdev:
-       cdev_del(&port->cdev);
+       cdev_del(port->cdev);
  free_port:
         kfree(port);
  fail:
@@ -1102,21 +1227,45 @@ fail:
         return err;
  }
  
-/* Remove all port-specific data. */
-static int remove_port(struct port *port)
+/* No users remain, remove all port-specific data. */
+static void remove_port(struct kref *kref)
+{
+       struct port *port;
+
+       port = container_of(kref, struct port, kref);
+
+       sysfs_remove_group(&port->dev->kobj, &port_attribute_group);
+       device_destroy(pdrvdata.class, port->dev->devt);
+       cdev_del(port->cdev);
+
+       kfree(port->name);
+
+       debugfs_remove(port->debugfs_file);
+
+       kfree(port);
+}
+
+/*
+ * Port got unplugged.  Remove port from portdev's list and drop the
+ * kref reference.  If no userspace has this port opened, it will
+ * result in immediate removal of the port.
+ */
+static void unplug_port(struct port *port)
  {
         struct port_buffer *buf;
  
+       spin_lock_irq(&port->portdev->ports_lock);
+       list_del(&port->list);
+       spin_unlock_irq(&port->portdev->ports_lock);
+
         if (port->guest_connected) {
                 port->guest_connected = false;
                 port->host_connected = false;
                 wake_up_interruptible(&port->waitqueue);
-               send_control_msg(port, VIRTIO_CONSOLE_PORT_OPEN, 0);
-       }
  
-       spin_lock_irq(&port->portdev->ports_lock);
-       list_del(&port->list);
-       spin_unlock_irq(&port->portdev->ports_lock);
+               /* Let the app know the port is going down. */
+               send_sigio_to_port(port);
+       }
  
         if (is_console_port(port)) {
                 spin_lock_irq(&pdrvdata_lock);
@@ -1135,9 +1284,6 @@ static int remove_port(struct port *port)
                 hvc_remove(port->cons.hvc);
  #endif
         }
-       sysfs_remove_group(&port->dev->kobj, &port_attribute_group);
-       device_destroy(pdrvdata.class, port->dev->devt);
-       cdev_del(&port->cdev);
  
         /* Remove unused data this port might have received. */
         discard_port_data(port);
@@ -1148,12 +1294,19 @@ static int remove_port(struct port *port)
         while ((buf = virtqueue_detach_unused_buf(port->in_vq)))
                 free_buf(buf);
  
-       kfree(port->name);
-
-       debugfs_remove(port->debugfs_file);
+       /*
+        * We should just assume the device itself has gone off --
+        * else a close on an open port later will try to send out a
+        * control message.
+        */
+       port->portdev = NULL;
  
-       kfree(port);
-       return 0;
+       /*
+        * Locks around here are not necessary - a port can't be
+        * opened after we removed the port struct from ports_list
+        * above.
+        */
+       kref_put(&port->kref, remove_port);
  }
  
  /* Any private messages that the Host and Guest want to share */
@@ -1192,7 +1345,7 @@ static void handle_control_message(struct ports_device *portdev,
                 add_port(portdev, cpkt->id);
                 break;
         case VIRTIO_CONSOLE_PORT_REMOVE:
-               remove_port(port);
+               unplug_port(port);
                 break;
         case VIRTIO_CONSOLE_CONSOLE_PORT:
                 if (!cpkt->value)
@@ -1234,6 +1387,12 @@ static void handle_control_message(struct ports_device *portdev,
                 spin_lock_irq(&port->outvq_lock);
                 reclaim_consumed_buffers(port);
                 spin_unlock_irq(&port->outvq_lock);
+
+               /*
+                * If the guest is connected, it'll be interested in
+                * knowing the host connection state changed.
+                */
+               send_sigio_to_port(port);
                 break;
         case VIRTIO_CONSOLE_PORT_NAME:
                 /*
@@ -1330,6 +1489,9 @@ static void in_intr(struct virtqueue *vq)
  
         wake_up_interruptible(&port->waitqueue);
  
+       /* Send a SIGIO indicating new data in case the process asked for it */
+       send_sigio_to_port(port);
+
         if (is_console_port(port) && hvc_poll(port->cons.hvc))
                 hvc_kick();
  }
@@ -1566,6 +1728,10 @@ static int __devinit virtcons_probe(struct virtio_device *vdev)
                 add_port(portdev, 0);
         }
  
+       spin_lock_irq(&pdrvdata_lock);
+       list_add_tail(&portdev->list, &pdrvdata.portdevs);
+       spin_unlock_irq(&pdrvdata_lock);
+
         __send_control_msg(portdev, VIRTIO_CONSOLE_BAD_ID,
                            VIRTIO_CONSOLE_DEVICE_READY, 1);
         return 0;
@@ -1589,23 +1755,41 @@ static void virtcons_remove(struct virtio_device *vdev)
  {
         struct ports_device *portdev;
         struct port *port, *port2;
-       struct port_buffer *buf;
-       unsigned int len;
  
         portdev = vdev->priv;
  
+       spin_lock_irq(&pdrvdata_lock);
+       list_del(&portdev->list);
+       spin_unlock_irq(&pdrvdata_lock);
+
+       /* Disable interrupts for vqs */
+       vdev->config->reset(vdev);
+       /* Finish up work that's lined up */
         cancel_work_sync(&portdev->control_work);
  
         list_for_each_entry_safe(port, port2, &portdev->ports, list)
-               remove_port(port);
+               unplug_port(port);
  
         unregister_chrdev(portdev->chr_major, "virtio-portsdev");
  
-       while ((buf = virtqueue_get_buf(portdev->c_ivq, &len)))
-               free_buf(buf);
+       /*
+        * When yanking out a device, we immediately lose the
+        * (device-side) queues.  So there's no point in keeping the
+        * guest side around till we drop our final reference.  This
+        * also means that any ports which are in an open state will
+        * have to just stop using the port, as the vqs are going
+        * away.
+        */
+       if (use_multiport(portdev)) {
+               struct port_buffer *buf;
+               unsigned int len;
  
-       while ((buf = virtqueue_detach_unused_buf(portdev->c_ivq)))
-               free_buf(buf);
+               while ((buf = virtqueue_get_buf(portdev->c_ivq, &len)))
+                       free_buf(buf);
+
+               while ((buf = virtqueue_detach_unused_buf(portdev->c_ivq)))
+                       free_buf(buf);
+       }
  
         vdev->config->del_vqs(vdev);
         kfree(portdev->in_vqs);
@@ -1652,6 +1836,7 @@ static int __init init(void)
                            PTR_ERR(pdrvdata.debugfs_dir));
         }
         INIT_LIST_HEAD(&pdrvdata.consoles);
+       INIT_LIST_HEAD(&pdrvdata.portdevs);
  
         return register_virtio_driver(&virtio_console);
  }
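
Review note on the virtio_console.c changes above: the port lifetime now follows a reference count. add_port() takes the initial reference, open() takes one more, and both unplug_port() and release() drop one, so remove_port() runs only when the last holder lets go. A minimal single-threaded sketch of that pattern follows. struct ref, ref_get(), ref_put() and port_add() are simplified, hypothetical stand-ins for illustration and are not the kernel's kref API, which uses atomic operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, non-atomic stand-in for a reference counter (assumption). */
struct ref {
    int count;
};

struct port {
    struct ref ref;
    int *released;      /* set to 1 by the release callback, for the demo */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void ref_init(struct ref *r) { r->count = 1; }
static void ref_get(struct ref *r)  { r->count++; }

/* Drop one reference; run release() when the last one goes away. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
    if (--r->count == 0) {
        release(r);
        return 1;
    }
    return 0;
}

/* Plays the role of remove_port(): frees the port once nobody uses it. */
static void port_release(struct ref *r)
{
    struct port *port = container_of(r, struct port, ref);

    *port->released = 1;
    free(port);
}

/* Plays the role of add_port(): the creator holds the first reference. */
static struct port *port_add(int *released)
{
    struct port *port = malloc(sizeof(*port));

    if (!port)
        return NULL;
    ref_init(&port->ref);
    port->released = released;
    *released = 0;
    return port;
}
```

In this sketch an unplug while the port is open drops the count from 2 to 1 and frees nothing; the later close drops it to 0 and triggers the release, which matches the ordering the patch comments describe.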
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index c2408bbe9c2eed3521f4eb21a86c1e9671774534..f508690eb95859ef80e217f68db827daba606f29 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -80,7 +80,7 @@
   * Limiting Performance Impact
   * ---------------------------
   * C states, especially those with large exit latencies, can have a real
- * noticable impact on workloads, which is not acceptable for most sysadmins,
+ * noticeable impact on workloads, which is not acceptable for most sysadmins,
   * and in addition, less performance has a power price of its own.
   *
   * As a general rule of thumb, menu assumes that the following heuristic
diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 216f9d383b5b7b1b0a4d2062c388518a7270de55..effd140fc042b827617bebce013dae190c7b4d76 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -879,7 +879,7 @@ int __devinit ioat2_dma_probe(struct ioatdma_device *device, int dca)
         dma->device_issue_pending = ioat2_issue_pending;
         dma->device_alloc_chan_resources = ioat2_alloc_chan_resources;
         dma->device_free_chan_resources = ioat2_free_chan_resources;
-       dma->device_tx_status = ioat_tx_status;
+       dma->device_tx_status = ioat_dma_tx_status;
  
         err = ioat_probe(device);
         if (err)
diff --git a/drivers/dma/shdma.c b/drivers/dma/shdma.c
index fb64cf36ba61d0e786ecfeb802f43909ade4f2f2..eb6b54dbb8064a9a5d2e71eb3261132195ff6f8d 100644
--- a/drivers/dma/shdma.c
+++ b/drivers/dma/shdma.c
@@ -580,7 +580,6 @@ static struct dma_async_tx_descriptor *sh_dmae_prep_slave_sg(
  
         sh_chan = to_sh_chan(chan);
         param = chan->private;
-       slave_addr = param->config->addr;
  
         /* Someone calling slave DMA on a public channel? */
         if (!param || !sg_len) {
@@ -589,6 +588,8 @@ static struct dma_async_tx_descriptor *sh_dmae_prep_slave_sg(
                 return NULL;
         }
  
+       slave_addr = param->config->addr;
+
         /*
          * if (param != NULL), this is a successfully requested slave channel,
          * therefore param->config != NULL too.
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index e0187d16dd7c53fd240b58d62005e4c17df14bc4..0fd5b85a0f756745bd1074ae673e89d6c81a237a 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1140,6 +1140,7 @@ static struct mcidev_sysfs_attribute i7core_udimm_counters_attrs[] = {
         ATTR_COUNTER(0),
         ATTR_COUNTER(1),
         ATTR_COUNTER(2),
+       { .attr = { .name = NULL } }
  };
  
  static struct mcidev_sysfs_group i7core_udimm_counters = {
diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 1b05896648bce47cff20ed04acbf5d1ee6175a8b..9dcb17d51aee737bcbb4ac478807c04311588d6c 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -2840,7 +2840,7 @@ static int __devinit pci_probe(struct pci_dev *dev,
                                const struct pci_device_id *ent)
  {
         struct fw_ohci *ohci;
-       u32 bus_options, max_receive, link_speed, version, link_enh;
+       u32 bus_options, max_receive, link_speed, version;
         u64 guid;
         int i, err, n_ir, n_it;
         size_t size;
@@ -2894,23 +2894,6 @@ static int __devinit pci_probe(struct pci_dev *dev,
         if (param_quirks)
                 ohci->quirks = param_quirks;
  
-       /* TI OHCI-Lynx and compatible: set recommended configuration bits. */
-       if (dev->vendor == PCI_VENDOR_ID_TI) {
-               pci_read_config_dword(dev, PCI_CFG_TI_LinkEnh, &link_enh);
-
-               /* adjust latency of ATx FIFO: use 1.7 KB threshold */
-               link_enh &= ~TI_LinkEnh_atx_thresh_mask;
-               link_enh |= TI_LinkEnh_atx_thresh_1_7K;
-
-               /* use priority arbitration for asynchronous responses */
-               link_enh |= TI_LinkEnh_enab_unfair;
-
-               /* required for aPhyEnhanceEnable to work */
-               link_enh |= TI_LinkEnh_enab_accel;
-
-               pci_write_config_dword(dev, PCI_CFG_TI_LinkEnh, link_enh);
-       }
-
         ar_context_init(&ohci->ar_request_ctx, ohci,
                         OHCI1394_AsReqRcvContextControlSet);
  
diff --git a/drivers/firewire/ohci.h b/drivers/firewire/ohci.h
index 0e6c5a466908d58156f4fe7dd978c469efb94aad..ef5e7336da68ddf6af413ecd0cc616a812ae1490 100644
--- a/drivers/firewire/ohci.h
+++ b/drivers/firewire/ohci.h
@@ -155,12 +155,4 @@
  
  #define OHCI1394_phy_tcode             0xe
  
-/* TI extensions */
-
-#define PCI_CFG_TI_LinkEnh             0xf4
-#define  TI_LinkEnh_enab_accel         0x00000002
-#define  TI_LinkEnh_enab_unfair                0x00000080
-#define  TI_LinkEnh_atx_thresh_mask    0x00003000
-#define  TI_LinkEnh_atx_thresh_1_7K    0x00001000
-
  #endif /* _FIREWIRE_OHCI_H */
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index bf92d07510df740d856c173e7dcce292659fe6f7..5663d2719063de9231ca6cc153b63b30422e17aa 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -148,7 +148,7 @@ int drm_gem_object_init(struct drm_device *dev,
                 return -ENOMEM;
  
         kref_init(&obj->refcount);
-       kref_init(&obj->handlecount);
+       atomic_set(&obj->handle_count, 0);
         obj->size = size;
  
         atomic_inc(&dev->object_count);
@@ -462,28 +462,6 @@ drm_gem_object_free(struct kref *kref)
  }
  EXPORT_SYMBOL(drm_gem_object_free);
  
-/**
- * Called after the last reference to the object has been lost.
- * Must be called without holding struct_mutex
- *
- * Frees the object
- */
-void
-drm_gem_object_free_unlocked(struct kref *kref)
-{
-       struct drm_gem_object *obj = (struct drm_gem_object *) kref;
-       struct drm_device *dev = obj->dev;
-
-       if (dev->driver->gem_free_object_unlocked != NULL)
-               dev->driver->gem_free_object_unlocked(obj);
-       else if (dev->driver->gem_free_object != NULL) {
-               mutex_lock(&dev->struct_mutex);
-               dev->driver->gem_free_object(obj);
-               mutex_unlock(&dev->struct_mutex);
-       }
-}
-EXPORT_SYMBOL(drm_gem_object_free_unlocked);
-
  static void drm_gem_object_ref_bug(struct kref *list_kref)
  {
         BUG();
@@ -496,12 +474,8 @@ static void drm_gem_object_ref_bug(struct kref *list_kref)
   * called before drm_gem_object_free or we'll be touching
   * freed memory
   */
-void
-drm_gem_object_handle_free(struct kref *kref)
+void drm_gem_object_handle_free(struct drm_gem_object *obj)
  {
-       struct drm_gem_object *obj = container_of(kref,
-                                                 struct drm_gem_object,
-                                                 handlecount);
         struct drm_device *dev = obj->dev;
  
         /* Remove any name for this object */
@@ -528,6 +502,10 @@ void drm_gem_vm_open(struct vm_area_struct *vma)
         struct drm_gem_object *obj = vma->vm_private_data;
  
         drm_gem_object_reference(obj);
+
+       mutex_lock(&obj->dev->struct_mutex);
+       drm_vm_open_locked(vma);
+       mutex_unlock(&obj->dev->struct_mutex);
  }
  EXPORT_SYMBOL(drm_gem_vm_open);
  
@@ -535,7 +513,10 @@ void drm_gem_vm_close(struct vm_area_struct *vma)
  {
         struct drm_gem_object *obj = vma->vm_private_data;
  
-       drm_gem_object_unreference_unlocked(obj);
+       mutex_lock(&obj->dev->struct_mutex);
+       drm_vm_close_locked(vma);
+       drm_gem_object_unreference(obj);
+       mutex_unlock(&obj->dev->struct_mutex);
  }
  EXPORT_SYMBOL(drm_gem_vm_close);
  
diff --git a/drivers/gpu/drm/drm_info.c b/drivers/gpu/drm/drm_info.c
index 2ef2c78272434dcb6b32dc17fd96e34dd7a8d959..974e970ce3f81ce014170b90ad1b8adc8a1dd5a9 100644
--- a/drivers/gpu/drm/drm_info.c
+++ b/drivers/gpu/drm/drm_info.c
@@ -255,7 +255,7 @@ int drm_gem_one_name_info(int id, void *ptr, void *data)
  
         seq_printf(m, "%6d %8zd %7d %8d\n",
                    obj->name, obj->size,
-                  atomic_read(&obj->handlecount.refcount),
+                  atomic_read(&obj->handle_count),
                    atomic_read(&obj->refcount.refcount));
         return 0;
  }
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index fda67468e603b6169393b92bc4922afef8b4d8ce..5df450683aab8649511aaa96aaa759452b022fc0 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -433,15 +433,7 @@ static void drm_vm_open(struct vm_area_struct *vma)
         mutex_unlock(&dev->struct_mutex);
  }
  
-/**
- * \c close method for all virtual memory types.
- *
- * \param vma virtual memory area.
- *
- * Search the \p vma private data entry in drm_device::vmalist, unlink it, and
- * free it.
- */
-static void drm_vm_close(struct vm_area_struct *vma)
+void drm_vm_close_locked(struct vm_area_struct *vma)
  {
         struct drm_file *priv = vma->vm_file->private_data;
         struct drm_device *dev = priv->minor->dev;
@@ -451,7 +443,6 @@ static void drm_vm_close(struct vm_area_struct *vma)
                   vma->vm_start, vma->vm_end - vma->vm_start);
         atomic_dec(&dev->vma_count);
  
-       mutex_lock(&dev->struct_mutex);
         list_for_each_entry_safe(pt, temp, &dev->vmalist, head) {
                 if (pt->vma == vma) {
                         list_del(&pt->head);
@@ -459,6 +450,23 @@ static void drm_vm_close(struct vm_area_struct *vma)
                         break;
                 }
         }
+}
+
+/**
+ * \c close method for all virtual memory types.
+ *
+ * \param vma virtual memory area.
+ *
+ * Search the \p vma private data entry in drm_device::vmalist, unlink it, and
+ * free it.
+ */
+static void drm_vm_close(struct vm_area_struct *vma)
+{
+       struct drm_file *priv = vma->vm_file->private_data;
+       struct drm_device *dev = priv->minor->dev;
+
+       mutex_lock(&dev->struct_mutex);
+       drm_vm_close_locked(vma);
         mutex_unlock(&dev->struct_mutex);
  }
  
diff --git a/drivers/gpu/drm/i810/i810_dma.c b/drivers/gpu/drm/i810/i810_dma.c
index 61b4caf220fa83bd15815ea0f82b627f2d773727..fb07e73581e84467ab59ebe744e21ff6f712d2ce 100644
--- a/drivers/gpu/drm/i810/i810_dma.c
+++ b/drivers/gpu/drm/i810/i810_dma.c
@@ -116,7 +116,7 @@ static int i810_mmap_buffers(struct file *filp, struct vm_area_struct *vma)
  static const struct file_operations i810_buffer_fops = {
         .open = drm_open,
         .release = drm_release,
-       .unlocked_ioctl = drm_ioctl,
+       .unlocked_ioctl = i810_ioctl,
         .mmap = i810_mmap_buffers,
         .fasync = drm_fasync,
  };
diff --git a/drivers/gpu/drm/i830/i830_dma.c b/drivers/gpu/drm/i830/i830_dma.c
index 671aa18415ac52d17164e79b4c2a9f287b02da0d..cc92c7e6236fbdffb86078290b5b93dffad2cbad 100644
--- a/drivers/gpu/drm/i830/i830_dma.c
+++ b/drivers/gpu/drm/i830/i830_dma.c
@@ -118,7 +118,7 @@ static int i830_mmap_buffers(struct file *filp, struct vm_area_struct *vma)
  static const struct file_operations i830_buffer_fops = {
         .open = drm_open,
         .release = drm_release,
-       .unlocked_ioctl = drm_ioctl,
+       .unlocked_ioctl = i830_ioctl,
         .mmap = i830_mmap_buffers,
         .fasync = drm_fasync,
  };
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 9d67b485303005771a090ea7a3c1e1c6f8b74e9e..2dd2c93ebfa35dace7916b38c6df18c835161978 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1787,9 +1787,9 @@ unsigned long i915_chipset_val(struct drm_i915_private *dev_priv)
                 }
         }
  
-       div_u64(diff, diff1);
+       diff = div_u64(diff, diff1);
         ret = ((m * diff) + c);
-       div_u64(ret, 10);
+       ret = div_u64(ret, 10);
  
         dev_priv->last_count1 = total_count;
         dev_priv->last_time1 = now;
@@ -1858,7 +1858,7 @@ void i915_update_gfx_val(struct drm_i915_private *dev_priv)
  
         /* More magic constants... */
         diff = diff * 1181;
-       div_u64(diff, diffms * 10);
+       diff = div_u64(diff, diffms * 10);
         dev_priv->gfx_power = diff;
  }
  
@@ -2231,6 +2231,9 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
         dev_priv->mchdev_lock = &mchdev_lock;
         spin_unlock(&mchdev_lock);
  
+       /* XXX Prevent module unload due to memory corruption bugs. */
+       __module_get(THIS_MODULE);
+
         return 0;
  
  out_workqueue_free:
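
Reviewer note (not part of the patch): the three div_u64() hunks in i915_dma.c above all fix the same mistake, namely calling div_u64() as if it divided its first argument in place. The kernel helper returns the quotient and leaves its arguments unchanged, so a call whose result is dropped has no effect. The sketch below reproduces that in userspace; the local div_u64() is a stand-in with the same shape as the kernel helper, and scale_discarded() and scale_assigned() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's div_u64(): it returns the
 * quotient and does not modify its arguments. */
uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Pre-patch pattern: the quotient is computed and thrown away,
 * so diff comes back unchanged. */
uint64_t scale_discarded(uint64_t diff, uint32_t diff1)
{
	div_u64(diff, diff1);
	return diff;
}

/* Post-patch pattern: the quotient is assigned back to diff. */
uint64_t scale_assigned(uint64_t diff, uint32_t diff1)
{
	diff = div_u64(diff, diff1);
	return diff;
}
```

With diff = 1000 and diff1 = 10, the first variant still returns 1000 while the second returns 100, which is the behaviour the patch restores.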
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bced9b25c71e2bc5819c3b02dfde055909937323..90b1d6753b9d493d3ed8d2c45153bf2047b54d8f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -136,14 +136,12 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
                 return -ENOMEM;
  
         ret = drm_gem_handle_create(file_priv, obj, &handle);
+       /* drop reference from allocate - handle holds it now */
+       drm_gem_object_unreference_unlocked(obj);
         if (ret) {
-               drm_gem_object_unreference_unlocked(obj);
                 return ret;
         }
  
-       /* Sink the floating reference from kref_init(handlecount) */
-       drm_gem_object_handle_unreference_unlocked(obj);
-
         args->handle = handle;
         return 0;
  }
@@ -471,14 +469,17 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
                 return -ENOENT;
         obj_priv = to_intel_bo(obj);
  
-       /* Bounds check source.
-        *
-        * XXX: This could use review for overflow issues...
-        */
-       if (args->offset > obj->size || args->size > obj->size ||
-           args->offset + args->size > obj->size) {
-               drm_gem_object_unreference_unlocked(obj);
-               return -EINVAL;
+       /* Bounds check source.  */
+       if (args->offset > obj->size || args->size > obj->size - args->offset) {
+               ret = -EINVAL;
+               goto err;
+       }
+
+       if (!access_ok(VERIFY_WRITE,
+                      (char __user *)(uintptr_t)args->data_ptr,
+                      args->size)) {
+               ret = -EFAULT;
+               goto err;
         }
  
         if (i915_gem_object_needs_bit17_swizzle(obj)) {
@@ -490,8 +491,8 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
                                                         file_priv);
         }
  
+err:
         drm_gem_object_unreference_unlocked(obj);
-
         return ret;
  }
  
@@ -580,8 +581,6 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev, struct drm_gem_object *obj,
  
         user_data = (char __user *) (uintptr_t) args->data_ptr;
         remain = args->size;
-       if (!access_ok(VERIFY_READ, user_data, remain))
-               return -EFAULT;
  
  
         mutex_lock(&dev->struct_mutex);
@@ -934,14 +933,17 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
                 return -ENOENT;
         obj_priv = to_intel_bo(obj);
  
-       /* Bounds check destination.
-        *
-        * XXX: This could use review for overflow issues...
-        */
-       if (args->offset > obj->size || args->size > obj->size ||
-           args->offset + args->size > obj->size) {
-               drm_gem_object_unreference_unlocked(obj);
-               return -EINVAL;
+       /* Bounds check destination. */
+       if (args->offset > obj->size || args->size > obj->size - args->offset) {
+               ret = -EINVAL;
+               goto err;
+       }
+
+       if (!access_ok(VERIFY_READ,
+                      (char __user *)(uintptr_t)args->data_ptr,
+                      args->size)) {
+               ret = -EFAULT;
+               goto err;
         }
  
         /* We can only do the GTT pwrite on untiled buffers, as otherwise
@@ -975,8 +977,8 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
                 DRM_INFO("pwrite failed %d\n", ret);
  #endif
  
+err:
         drm_gem_object_unreference_unlocked(obj);
-
         return ret;
  }
  
@@ -3258,6 +3260,8 @@ i915_gem_object_pin_and_relocate(struct drm_gem_object *obj,
                                   (int) reloc->offset,
                                   reloc->read_domains,
                                   reloc->write_domain);
+                       drm_gem_object_unreference(target_obj);
+                       i915_gem_object_unpin(obj);
                         return -EINVAL;
                 }
                 if (reloc->write_domain & I915_GEM_DOMAIN_CPU ||
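
Reviewer note (not part of the patch): the pread and pwrite hunks in i915_gem.c above replace a bounds check of the form offset + size > obj->size with size > obj->size - offset. The difference matters because the sum can wrap around in unsigned arithmetic and slip past the check, whereas the subtraction cannot underflow once offset <= obj->size has been established. A minimal sketch follows, assuming 64-bit unsigned fields throughout; range_ok_naive() and range_ok() are hypothetical helpers, not driver functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive form: offset + size may wrap around, so a huge size can
 * produce a small sum that passes the comparison. */
bool range_ok_naive(uint64_t offset, uint64_t size, uint64_t obj_size)
{
	return offset + size <= obj_size;
}

/* Overflow-safe form, mirroring the shape of the patched check. */
bool range_ok(uint64_t offset, uint64_t size, uint64_t obj_size)
{
	if (offset > obj_size)
		return false;
	/* offset <= obj_size here, so the subtraction cannot underflow. */
	return size <= obj_size - offset;
}
```

For a 4096-byte object, offset 16 with size UINT64_MAX is accepted by the naive form (the sum wraps to 15) and rejected by the safe form.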
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index e85246ef691ce339ab3ba331c30a6e846b7ead36..5c428fa3e0b34049e94786184b646a98ee87c06d 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -93,7 +93,7 @@ i915_gem_evict_something(struct drm_device *dev, int min_size, unsigned alignmen
  {
         drm_i915_private_t *dev_priv = dev->dev_private;
         struct list_head eviction_list, unwind_list;
-       struct drm_i915_gem_object *obj_priv, *tmp_obj_priv;
+       struct drm_i915_gem_object *obj_priv;
         struct list_head *render_iter, *bsd_iter;
         int ret = 0;
  
@@ -175,39 +175,34 @@ i915_gem_evict_something(struct drm_device *dev, int min_size, unsigned alignmen
         return -ENOSPC;
  
  found:
+       /* drm_mm doesn't allow any other other operations while
+        * scanning, therefore store to be evicted objects on a
+        * temporary list. */
         INIT_LIST_HEAD(&eviction_list);
-       list_for_each_entry_safe(obj_priv, tmp_obj_priv,
-                                &unwind_list, evict_list) {
+       while (!list_empty(&unwind_list)) {
+               obj_priv = list_first_entry(&unwind_list,
+                                           struct drm_i915_gem_object,
+                                           evict_list);
                 if (drm_mm_scan_remove_block(obj_priv->gtt_space)) {
-                       /* drm_mm doesn't allow any other other operations while
-                        * scanning, therefore store to be evicted objects on a
-                        * temporary list. */
                         list_move(&obj_priv->evict_list, &eviction_list);
-               } else
-                       drm_gem_object_unreference(&obj_priv->base);
+                       continue;
+               }
+               list_del(&obj_priv->evict_list);
+               drm_gem_object_unreference(&obj_priv->base);
         }
  
         /* Unbinding will emit any required flushes */
-       list_for_each_entry_safe(obj_priv, tmp_obj_priv,
-                                &eviction_list, evict_list) {
-#if WATCH_LRU
-               DRM_INFO("%s: evicting %p\n", __func__, &obj_priv->base);
-#endif
-               ret = i915_gem_object_unbind(&obj_priv->base);
-               if (ret)
-                       return ret;
-
+       while (!list_empty(&eviction_list)) {
+               obj_priv = list_first_entry(&eviction_list,
+                                           struct drm_i915_gem_object,
+                                           evict_list);
+               if (ret == 0)
+                       ret = i915_gem_object_unbind(&obj_priv->base);
+               list_del(&obj_priv->evict_list);
                 drm_gem_object_unreference(&obj_priv->base);
         }
  
-       /* The just created free hole should be on the top of the free stack
-        * maintained by drm_mm, so this BUG_ON actually executes in O(1).
-        * Furthermore all accessed data has just recently been used, so it
-        * should be really fast, too. */
-       BUG_ON(!drm_mm_search_free(&dev_priv->mm.gtt_space, min_size,
-                                  alignment, 0));
-
-       return 0;
+       return ret;
  }
  
  int
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b5bf51a4502dc4f4e2ca4d2914673004c931c3b5..979228594599a28ac7737762679f1c97fd5981bf 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1013,8 +1013,8 @@ void intel_wait_for_vblank(struct drm_device *dev, int pipe)
                 DRM_DEBUG_KMS("vblank wait timed out\n");
  }
  
-/**
- * intel_wait_for_vblank_off - wait for vblank after disabling a pipe
+/*
+ * intel_wait_for_pipe_off - wait for pipe to turn off
   * @dev: drm device
   * @pipe: pipe to wait for
   *
@@ -1022,25 +1022,39 @@ void intel_wait_for_vblank(struct drm_device *dev, int pipe)
   * spinning on the vblank interrupt status bit, since we won't actually
   * see an interrupt when the pipe is disabled.
   *
- * So this function waits for the display line value to settle (it
- * usually ends up stopping at the start of the next frame).
+ * On Gen4 and above:
+ *   wait for the pipe register state bit to turn off
+ *
+ * Otherwise:
+ *   wait for the display line value to settle (it usually
+ *   ends up stopping at the start of the next frame).
+ *  
   */
-void intel_wait_for_vblank_off(struct drm_device *dev, int pipe)
+static void intel_wait_for_pipe_off(struct drm_device *dev, int pipe)
  {
         struct drm_i915_private *dev_priv = dev->dev_private;
-       int pipedsl_reg = (pipe == 0 ? PIPEADSL : PIPEBDSL);
-       unsigned long timeout = jiffies + msecs_to_jiffies(100);
-       u32 last_line;
-
-       /* Wait for the display line to settle */
-       do {
-               last_line = I915_READ(pipedsl_reg) & DSL_LINEMASK;
-               mdelay(5);
-       } while (((I915_READ(pipedsl_reg) & DSL_LINEMASK) != last_line) &&
-                time_after(timeout, jiffies));
-
-       if (time_after(jiffies, timeout))
-               DRM_DEBUG_KMS("vblank wait timed out\n");
+
+       if (INTEL_INFO(dev)->gen >= 4) {
+               int pipeconf_reg = (pipe == 0 ? PIPEACONF : PIPEBCONF);
+
+               /* Wait for the Pipe State to go off */
+               if (wait_for((I915_READ(pipeconf_reg) & I965_PIPECONF_ACTIVE) == 0,
+                            100, 0))
+                       DRM_DEBUG_KMS("pipe_off wait timed out\n");
+       } else {
+               u32 last_line;
+               int pipedsl_reg = (pipe == 0 ? PIPEADSL : PIPEBDSL);
+               unsigned long timeout = jiffies + msecs_to_jiffies(100);
+
+               /* Wait for the display line to settle */
+               do {
+                       last_line = I915_READ(pipedsl_reg) & DSL_LINEMASK;
+                       mdelay(5);
+               } while (((I915_READ(pipedsl_reg) & DSL_LINEMASK) != last_line) &&
+                        time_after(timeout, jiffies));
+               if (time_after(jiffies, timeout))
+                       DRM_DEBUG_KMS("pipe_off wait timed out\n");
+       }
  }
  
  /* Parameters have changed, update FBC info */
@@ -2328,13 +2342,13 @@ static void i9xx_crtc_dpms(struct drm_crtc *crtc, int mode)
                         I915_READ(dspbase_reg);
                 }
  
-               /* Wait for vblank for the disable to take effect */
-               intel_wait_for_vblank_off(dev, pipe);
-
                 /* Don't disable pipe A or pipe A PLLs if needed */
                 if (pipeconf_reg == PIPEACONF &&
-                   (dev_priv->quirks & QUIRK_PIPEA_FORCE))
+                   (dev_priv->quirks & QUIRK_PIPEA_FORCE)) {
+                       /* Wait for vblank for the disable to take effect */
+                       intel_wait_for_vblank(dev, pipe);
                         goto skip_pipe_off;
+               }
  
                 /* Next, disable display pipes */
                 temp = I915_READ(pipeconf_reg);
@@ -2343,8 +2357,8 @@ static void i9xx_crtc_dpms(struct drm_crtc *crtc, int mode)
                         I915_READ(pipeconf_reg);
                 }
  
-               /* Wait for vblank for the disable to take effect. */
-               intel_wait_for_vblank_off(dev, pipe);
+               /* Wait for the pipe to turn off */
+               intel_wait_for_pipe_off(dev, pipe);
  
                 temp = I915_READ(dpll_reg);
                 if ((temp & DPLL_VCO_ENABLE) != 0) {
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 1a51ee07de3e72daf3a3c59ffdb15f70d5c95959..9ab8708ac6ba1370cea75680d6a660daa5f9b147 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -1138,18 +1138,14 @@ static bool
  intel_dp_set_link_train(struct intel_dp *intel_dp,
                         uint32_t dp_reg_value,
                         uint8_t dp_train_pat,
-                       uint8_t train_set[4],
-                       bool first)
+                       uint8_t train_set[4])
  {
         struct drm_device *dev = intel_dp->base.enc.dev;
         struct drm_i915_private *dev_priv = dev->dev_private;
-       struct intel_crtc *intel_crtc = to_intel_crtc(intel_dp->base.enc.crtc);
         int ret;
  
         I915_WRITE(intel_dp->output_reg, dp_reg_value);
         POSTING_READ(intel_dp->output_reg);
-       if (first)
-               intel_wait_for_vblank(dev, intel_crtc->pipe);
  
         intel_dp_aux_native_write_1(intel_dp,
                                     DP_TRAINING_PATTERN_SET,
@@ -1174,10 +1170,15 @@ intel_dp_link_train(struct intel_dp *intel_dp)
         uint8_t voltage;
         bool clock_recovery = false;
         bool channel_eq = false;
-       bool first = true;
         int tries;
         u32 reg;
         uint32_t DP = intel_dp->DP;
+       struct intel_crtc *intel_crtc = to_intel_crtc(intel_dp->base.enc.crtc);
+
+       /* Enable output, wait for it to become active */
+       I915_WRITE(intel_dp->output_reg, intel_dp->DP);
+       POSTING_READ(intel_dp->output_reg);
+       intel_wait_for_vblank(dev, intel_crtc->pipe);
  
         /* Write the link configuration data */
         intel_dp_aux_native_write(intel_dp, DP_LINK_BW_SET,
@@ -1210,9 +1211,8 @@ intel_dp_link_train(struct intel_dp *intel_dp)
                         reg = DP | DP_LINK_TRAIN_PAT_1;
  
                 if (!intel_dp_set_link_train(intel_dp, reg,
-                                            DP_TRAINING_PATTERN_1, train_set, first))
+                                            DP_TRAINING_PATTERN_1, train_set))
                         break;
-               first = false;
                 /* Set training pattern 1 */
  
                 udelay(100);
@@ -1266,8 +1266,7 @@ intel_dp_link_train(struct intel_dp *intel_dp)
  
                 /* channel eq pattern */
                 if (!intel_dp_set_link_train(intel_dp, reg,
-                                            DP_TRAINING_PATTERN_2, train_set,
-                                            false))
+                                            DP_TRAINING_PATTERN_2, train_set))
                         break;
  
                 udelay(400);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index ad312ca6b3e570125732168b3c2f670467264beb..8828b3ac6414eabff93134e34a41ae5c38d1cd34 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -229,7 +229,6 @@ extern struct drm_display_mode *intel_crtc_mode_get(struct drm_device *dev,
                                                     struct drm_crtc *crtc);
  int intel_get_pipe_from_crtc_id(struct drm_device *dev, void *data,
                                 struct drm_file *file_priv);
-extern void intel_wait_for_vblank_off(struct drm_device *dev, int pipe);
  extern void intel_wait_for_vblank(struct drm_device *dev, int pipe);
  extern struct drm_crtc *intel_get_crtc_from_pipe(struct drm_device *dev, int pipe);
  extern struct drm_crtc *intel_get_load_detect_pipe(struct intel_encoder *intel_encoder,
diff --git a/drivers/gpu/drm/i915/intel_fb.c b/drivers/gpu/drm/i915/intel_fb.c
index 7bdc96256bf55b6e87d102377b428a871792be91..b61966c126d3e3839d33be6c8df2c0170c5d1376 100644
--- a/drivers/gpu/drm/i915/intel_fb.c
+++ b/drivers/gpu/drm/i915/intel_fb.c
@@ -237,8 +237,10 @@ int intel_fbdev_destroy(struct drm_device *dev,
         drm_fb_helper_fini(&ifbdev->helper);
  
         drm_framebuffer_cleanup(&ifb->base);
-       if (ifb->obj)
+       if (ifb->obj) {
                 drm_gem_object_unreference(ifb->obj);
+               ifb->obj = NULL;
+       }
  
         return 0;
  }
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index ead7b8fc53fcbcd473dbdc7a97d893a3e2e9c454..19620a6709f55c00e97efd5d2f816705788420f8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -167,11 +167,9 @@ nouveau_gem_ioctl_new(struct drm_device *dev, void *data,
                 goto out;
  
         ret = drm_gem_handle_create(file_priv, nvbo->gem, &req->info.handle);
+       /* drop reference from allocate - handle holds it now */
+       drm_gem_object_unreference_unlocked(nvbo->gem);
  out:
-       drm_gem_object_handle_unreference_unlocked(nvbo->gem);
-
-       if (ret)
-               drm_gem_object_unreference_unlocked(nvbo->gem);
         return ret;
  }
  
diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 79082d4398ae156378609bbbbb4e8a9c900124cc..2f93d46ae69ad58dfb90ea5db02021a402506ae5 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -1137,7 +1137,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev)
  
                 WREG32(RCU_IND_INDEX, 0x203);
                 efuse_straps_3 = RREG32(RCU_IND_DATA);
-               efuse_box_bit_127_124 = (u8)(efuse_straps_3 & 0xF0000000) >> 28;
+               efuse_box_bit_127_124 = (u8)((efuse_straps_3 & 0xF0000000) >> 28);
  
                 switch(efuse_box_bit_127_124) {
                 case 0x0:
@@ -1407,6 +1407,7 @@ int evergreen_mc_init(struct radeon_device *rdev)
         rdev->mc.mc_vram_size = RREG32(CONFIG_MEMSIZE) * 1024 * 1024;
         rdev->mc.real_vram_size = RREG32(CONFIG_MEMSIZE) * 1024 * 1024;
         rdev->mc.visible_vram_size = rdev->mc.aper_size;
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         r600_vram_gtt_location(rdev, &rdev->mc);
         radeon_update_bandwidth_info(rdev);
  
@@ -1520,7 +1521,7 @@ void evergreen_disable_interrupt_state(struct radeon_device *rdev)
  {
         u32 tmp;
  
-       WREG32(CP_INT_CNTL, 0);
+       WREG32(CP_INT_CNTL, CNTX_BUSY_INT_ENABLE | CNTX_EMPTY_INT_ENABLE);
         WREG32(GRBM_INT_CNTL, 0);
         WREG32(INT_MASK + EVERGREEN_CRTC0_REGISTER_OFFSET, 0);
         WREG32(INT_MASK + EVERGREEN_CRTC1_REGISTER_OFFSET, 0);
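
Reviewer note (not part of the patch): the efuse_box_bit_127_124 hunk in evergreen.c above is an operator-precedence fix. A cast binds more tightly than >>, so the old expression truncated the masked value to 8 bits before shifting, and since the mask only keeps bits 31..28 the result was always zero. The sketch below shows both forms in isolation; top_nibble_buggy() and top_nibble_fixed() are hypothetical names, and uint8_t/uint32_t stand in for the kernel's u8/u32.

```c
#include <stdint.h>

/* Pre-patch form: the cast applies to (straps & 0xF0000000) first,
 * keeping only the low 8 bits (all zero), and then shifts. */
uint8_t top_nibble_buggy(uint32_t straps)
{
	return (uint8_t)(straps & 0xF0000000) >> 28;
}

/* Post-patch form: mask and shift in 32 bits, then narrow. */
uint8_t top_nibble_fixed(uint32_t straps)
{
	return (uint8_t)((straps & 0xF0000000) >> 28);
}
```

For straps = 0xA0000000 the buggy form yields 0 and the fixed form yields 0xA, so before the patch the switch on efuse_box_bit_127_124 could only ever take the 0x0 case.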
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index e151f16a8f86d73090ec6a4eb17a3590661868db..e59422320bb6df9873fbf88f9e29d34fdc412110 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -1030,6 +1030,7 @@ int r100_cp_init(struct radeon_device *rdev, unsigned ring_size)
                 return r;
         }
         rdev->cp.ready = true;
+       rdev->mc.active_vram_size = rdev->mc.real_vram_size;
         return 0;
  }
  
@@ -1047,6 +1048,7 @@ void r100_cp_fini(struct radeon_device *rdev)
  void r100_cp_disable(struct radeon_device *rdev)
  {
         /* Disable ring */
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         rdev->cp.ready = false;
         WREG32(RADEON_CP_CSQ_MODE, 0);
         WREG32(RADEON_CP_CSQ_CNTL, 0);
@@ -2295,6 +2297,7 @@ void r100_vram_init_sizes(struct radeon_device *rdev)
         /* FIXME we don't use the second aperture yet when we could use it */
         if (rdev->mc.visible_vram_size > rdev->mc.aper_size)
                 rdev->mc.visible_vram_size = rdev->mc.aper_size;
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         config_aper_size = RREG32(RADEON_CONFIG_APER_SIZE);
         if (rdev->flags & RADEON_IS_IGP) {
                 uint32_t tom;
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index ddc3adea1dda4155374b2de956af7f88fd280404..7b65e4efe8af61e2df5404ea52c468fe8ee564db 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1248,6 +1248,7 @@ int r600_mc_init(struct radeon_device *rdev)
         rdev->mc.mc_vram_size = RREG32(CONFIG_MEMSIZE);
         rdev->mc.real_vram_size = RREG32(CONFIG_MEMSIZE);
         rdev->mc.visible_vram_size = rdev->mc.aper_size;
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         r600_vram_gtt_location(rdev, &rdev->mc);
  
         if (rdev->flags & RADEON_IS_IGP) {
@@ -1917,6 +1918,7 @@ void r600_pciep_wreg(struct radeon_device *rdev, u32 reg, u32 v)
   */
  void r600_cp_stop(struct radeon_device *rdev)
  {
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         WREG32(R_0086D8_CP_ME_CNTL, S_0086D8_CP_ME_HALT(1));
  }
  
@@ -2910,7 +2912,7 @@ static void r600_disable_interrupt_state(struct radeon_device *rdev)
  {
         u32 tmp;
  
-       WREG32(CP_INT_CNTL, 0);
+       WREG32(CP_INT_CNTL, CNTX_BUSY_INT_ENABLE | CNTX_EMPTY_INT_ENABLE);
         WREG32(GRBM_INT_CNTL, 0);
         WREG32(DxMODE_INT_MASK, 0);
         if (ASIC_IS_DCE3(rdev)) {
@@ -3528,7 +3530,8 @@ void r600_ioctl_wait_idle(struct radeon_device *rdev, struct radeon_bo *bo)
         /* r7xx hw bug.  write to HDP_DEBUG1 followed by fb read
          * rather than write to HDP_REG_COHERENCY_FLUSH_CNTL
          */
-       if ((rdev->family >= CHIP_RV770) && (rdev->family <= CHIP_RV740)) {
+       if ((rdev->family >= CHIP_RV770) && (rdev->family <= CHIP_RV740) &&
+           rdev->vram_scratch.ptr) {
                 void __iomem *ptr = (void *)rdev->vram_scratch.ptr;
                 u32 tmp;
  
diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c

index 9ceb2a1ce7996c85f36b86f4ddf0fa834b091adf..3473c00781ffaaac06cab0c520231a5a66a21111 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -532,6 +532,7 @@ int r600_blit_init(struct radeon_device *rdev)
         memcpy(ptr + rdev->r600_blit.ps_offset, r6xx_ps, r6xx_ps_size * 4);
         radeon_bo_kunmap(rdev->r600_blit.shader_obj);
         radeon_bo_unreserve(rdev->r600_blit.shader_obj);
+       rdev->mc.active_vram_size = rdev->mc.real_vram_size;
         return 0;
  }
  
@@ -539,6 +540,7 @@ void r600_blit_fini(struct radeon_device *rdev)
  {
         int r;
  
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         if (rdev->r600_blit.shader_obj == NULL)
                 return;
         /* If we can't reserve the bo, unref should be enough to destroy
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h

index a168d644bf9e96724b5e717f2a8777bb8354f5e5..9ff38c99a6ea0e568f2567c0a7ad34b06e60e512 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -344,6 +344,7 @@ struct radeon_mc {
          * about vram size near mc fb location */
         u64                     mc_vram_size;
         u64                     visible_vram_size;
+       u64                     active_vram_size;
         u64                     gtt_size;
         u64                     gtt_start;
         u64                     gtt_end;
diff --git a/drivers/gpu/drm/radeon/radeon_atombios.c b/drivers/gpu/drm/radeon/radeon_atombios.c

index ebae14c4b768b4413990e84c1055782a72590009..8e43ddae70cc27d3c37d472561527432cc51dd53 100644
--- a/drivers/gpu/drm/radeon/radeon_atombios.c
+++ b/drivers/gpu/drm/radeon/radeon_atombios.c
@@ -317,6 +317,15 @@ static bool radeon_atom_apply_quirks(struct drm_device *dev,
                         *connector_type = DRM_MODE_CONNECTOR_DVID;
         }
  
+       /* MSI K9A2GM V2/V3 board has no HDMI or DVI */
+       if ((dev->pdev->device == 0x796e) &&
+           (dev->pdev->subsystem_vendor == 0x1462) &&
+           (dev->pdev->subsystem_device == 0x7302)) {
+               if ((supported_device == ATOM_DEVICE_DFP2_SUPPORT) ||
+                   (supported_device == ATOM_DEVICE_DFP3_SUPPORT))
+                       return false;
+       }
+
         /* a-bit f-i90hd - ciaranm on #radeonhd - this board has no DVI */
         if ((dev->pdev->device == 0x7941) &&
             (dev->pdev->subsystem_vendor == 0x147b) &&
@@ -1549,39 +1558,39 @@ radeon_atombios_get_tv_info(struct radeon_device *rdev)
                 switch (tv_info->ucTV_BootUpDefaultStandard) {
                 case ATOM_TV_NTSC:
                         tv_std = TV_STD_NTSC;
-                       DRM_INFO("Default TV standard: NTSC\n");
+                       DRM_DEBUG_KMS("Default TV standard: NTSC\n");
                         break;
                 case ATOM_TV_NTSCJ:
                         tv_std = TV_STD_NTSC_J;
-                       DRM_INFO("Default TV standard: NTSC-J\n");
+                       DRM_DEBUG_KMS("Default TV standard: NTSC-J\n");
                         break;
                 case ATOM_TV_PAL:
                         tv_std = TV_STD_PAL;
-                       DRM_INFO("Default TV standard: PAL\n");
+                       DRM_DEBUG_KMS("Default TV standard: PAL\n");
                         break;
                 case ATOM_TV_PALM:
                         tv_std = TV_STD_PAL_M;
-                       DRM_INFO("Default TV standard: PAL-M\n");
+                       DRM_DEBUG_KMS("Default TV standard: PAL-M\n");
                         break;
                 case ATOM_TV_PALN:
                         tv_std = TV_STD_PAL_N;
-                       DRM_INFO("Default TV standard: PAL-N\n");
+                       DRM_DEBUG_KMS("Default TV standard: PAL-N\n");
                         break;
                 case ATOM_TV_PALCN:
                         tv_std = TV_STD_PAL_CN;
-                       DRM_INFO("Default TV standard: PAL-CN\n");
+                       DRM_DEBUG_KMS("Default TV standard: PAL-CN\n");
                         break;
                 case ATOM_TV_PAL60:
                         tv_std = TV_STD_PAL_60;
-                       DRM_INFO("Default TV standard: PAL-60\n");
+                       DRM_DEBUG_KMS("Default TV standard: PAL-60\n");
                         break;
                 case ATOM_TV_SECAM:
                         tv_std = TV_STD_SECAM;
-                       DRM_INFO("Default TV standard: SECAM\n");
+                       DRM_DEBUG_KMS("Default TV standard: SECAM\n");
                         break;
                 default:
                         tv_std = TV_STD_NTSC;
-                       DRM_INFO("Unknown TV standard; defaulting to NTSC\n");
+                       DRM_DEBUG_KMS("Unknown TV standard; defaulting to NTSC\n");
                         break;
                 }
         }
diff --git a/drivers/gpu/drm/radeon/radeon_combios.c b/drivers/gpu/drm/radeon/radeon_combios.c

index a04b7a6ad95f3225b1df2879e45e3ced89304c17..7b7ea269549ccef95c1e6c343083071774e7e9de 100644
--- a/drivers/gpu/drm/radeon/radeon_combios.c
+++ b/drivers/gpu/drm/radeon/radeon_combios.c
@@ -913,47 +913,47 @@ radeon_combios_get_tv_info(struct radeon_device *rdev)
                         switch (RBIOS8(tv_info + 7) & 0xf) {
                         case 1:
                                 tv_std = TV_STD_NTSC;
-                               DRM_INFO("Default TV standard: NTSC\n");
+                               DRM_DEBUG_KMS("Default TV standard: NTSC\n");
                                 break;
                         case 2:
                                 tv_std = TV_STD_PAL;
-                               DRM_INFO("Default TV standard: PAL\n");
+                               DRM_DEBUG_KMS("Default TV standard: PAL\n");
                                 break;
                         case 3:
                                 tv_std = TV_STD_PAL_M;
-                               DRM_INFO("Default TV standard: PAL-M\n");
+                               DRM_DEBUG_KMS("Default TV standard: PAL-M\n");
                                 break;
                         case 4:
                                 tv_std = TV_STD_PAL_60;
-                               DRM_INFO("Default TV standard: PAL-60\n");
+                               DRM_DEBUG_KMS("Default TV standard: PAL-60\n");
                                 break;
                         case 5:
                                 tv_std = TV_STD_NTSC_J;
-                               DRM_INFO("Default TV standard: NTSC-J\n");
+                               DRM_DEBUG_KMS("Default TV standard: NTSC-J\n");
                                 break;
                         case 6:
                                 tv_std = TV_STD_SCART_PAL;
-                               DRM_INFO("Default TV standard: SCART-PAL\n");
+                               DRM_DEBUG_KMS("Default TV standard: SCART-PAL\n");
                                 break;
                         default:
                                 tv_std = TV_STD_NTSC;
-                               DRM_INFO
+                               DRM_DEBUG_KMS
                                     ("Unknown TV standard; defaulting to NTSC\n");
                                 break;
                         }
  
                         switch ((RBIOS8(tv_info + 9) >> 2) & 0x3) {
                         case 0:
-                               DRM_INFO("29.498928713 MHz TV ref clk\n");
+                               DRM_DEBUG_KMS("29.498928713 MHz TV ref clk\n");
                                 break;
                         case 1:
-                               DRM_INFO("28.636360000 MHz TV ref clk\n");
+                               DRM_DEBUG_KMS("28.636360000 MHz TV ref clk\n");
                                 break;
                         case 2:
-                               DRM_INFO("14.318180000 MHz TV ref clk\n");
+                               DRM_DEBUG_KMS("14.318180000 MHz TV ref clk\n");
                                 break;
                         case 3:
-                               DRM_INFO("27.000000000 MHz TV ref clk\n");
+                               DRM_DEBUG_KMS("27.000000000 MHz TV ref clk\n");
                                 break;
                         default:
                                 break;
@@ -1324,7 +1324,7 @@ bool radeon_legacy_get_tmds_info_from_combios(struct radeon_encoder *encoder,
  
         if (tmds_info) {
                 ver = RBIOS8(tmds_info);
-               DRM_INFO("DFP table revision: %d\n", ver);
+               DRM_DEBUG_KMS("DFP table revision: %d\n", ver);
                 if (ver == 3) {
                         n = RBIOS8(tmds_info + 5) + 1;
                         if (n > 4)
@@ -1408,7 +1408,7 @@ bool radeon_legacy_get_ext_tmds_info_from_combios(struct radeon_encoder *encoder
                 offset = combios_get_table_offset(dev, COMBIOS_EXT_TMDS_INFO_TABLE);
                 if (offset) {
                         ver = RBIOS8(offset);
-                       DRM_INFO("External TMDS Table revision: %d\n", ver);
+                       DRM_DEBUG_KMS("External TMDS Table revision: %d\n", ver);
                         tmds->slave_addr = RBIOS8(offset + 4 + 2);
                         tmds->slave_addr >>= 1; /* 7 bit addressing */
                         gpio = RBIOS8(offset + 4 + 3);
diff --git a/drivers/gpu/drm/radeon/radeon_cursor.c b/drivers/gpu/drm/radeon/radeon_cursor.c

index 5731fc9b1ae3ae9274188a5bf8cdae7aa78f3b33..3eef567b0421ae71826abd77ac3bc035a5ec1c33 100644
--- a/drivers/gpu/drm/radeon/radeon_cursor.c
+++ b/drivers/gpu/drm/radeon/radeon_cursor.c
@@ -203,6 +203,7 @@ int radeon_crtc_cursor_move(struct drm_crtc *crtc,
         struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
         struct radeon_device *rdev = crtc->dev->dev_private;
         int xorigin = 0, yorigin = 0;
+       int w = radeon_crtc->cursor_width;
  
         if (x < 0)
                 xorigin = -x + 1;
@@ -213,22 +214,7 @@ int radeon_crtc_cursor_move(struct drm_crtc *crtc,
         if (yorigin >= CURSOR_HEIGHT)
                 yorigin = CURSOR_HEIGHT - 1;
  
-       radeon_lock_cursor(crtc, true);
-       if (ASIC_IS_DCE4(rdev)) {
-               /* cursors are offset into the total surface */
-               x += crtc->x;
-               y += crtc->y;
-               DRM_DEBUG("x %d y %d c->x %d c->y %d\n", x, y, crtc->x, crtc->y);
-
-               /* XXX: check if evergreen has the same issues as avivo chips */
-               WREG32(EVERGREEN_CUR_POSITION + radeon_crtc->crtc_offset,
-                      ((xorigin ? 0 : x) << 16) |
-                      (yorigin ? 0 : y));
-               WREG32(EVERGREEN_CUR_HOT_SPOT + radeon_crtc->crtc_offset, (xorigin << 16) | yorigin);
-               WREG32(EVERGREEN_CUR_SIZE + radeon_crtc->crtc_offset,
-                      ((radeon_crtc->cursor_width - 1) << 16) | (radeon_crtc->cursor_height - 1));
-       } else if (ASIC_IS_AVIVO(rdev)) {
-               int w = radeon_crtc->cursor_width;
+       if (ASIC_IS_AVIVO(rdev)) {
                 int i = 0;
                 struct drm_crtc *crtc_p;
  
@@ -260,7 +246,17 @@ int radeon_crtc_cursor_move(struct drm_crtc *crtc,
                         if (w <= 0)
                                 w = 1;
                 }
+       }
  
+       radeon_lock_cursor(crtc, true);
+       if (ASIC_IS_DCE4(rdev)) {
+               WREG32(EVERGREEN_CUR_POSITION + radeon_crtc->crtc_offset,
+                      ((xorigin ? 0 : x) << 16) |
+                      (yorigin ? 0 : y));
+               WREG32(EVERGREEN_CUR_HOT_SPOT + radeon_crtc->crtc_offset, (xorigin << 16) | yorigin);
+               WREG32(EVERGREEN_CUR_SIZE + radeon_crtc->crtc_offset,
+                      ((w - 1) << 16) | (radeon_crtc->cursor_height - 1));
+       } else if (ASIC_IS_AVIVO(rdev)) {
                 WREG32(AVIVO_D1CUR_POSITION + radeon_crtc->crtc_offset,
                              ((xorigin ? 0 : x) << 16) |
                              (yorigin ? 0 : y));
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c

index 127a395f70fb304d6f4d0df613de1fdfc8ca1223..b92d2f2fcbed6a8bd472ce9f4b936aa82f309278 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -349,6 +349,8 @@ static void radeon_print_display_setup(struct drm_device *dev)
                                         DRM_INFO("    DFP4: %s\n", encoder_names[radeon_encoder->encoder_id]);
                                 if (devices & ATOM_DEVICE_DFP5_SUPPORT)
                                         DRM_INFO("    DFP5: %s\n", encoder_names[radeon_encoder->encoder_id]);
+                               if (devices & ATOM_DEVICE_DFP6_SUPPORT)
+                                       DRM_INFO("    DFP6: %s\n", encoder_names[radeon_encoder->encoder_id]);
                                 if (devices & ATOM_DEVICE_TV1_SUPPORT)
                                         DRM_INFO("    TV1: %s\n", encoder_names[radeon_encoder->encoder_id]);
                                 if (devices & ATOM_DEVICE_CV_SUPPORT)
@@ -841,8 +843,9 @@ static void radeon_user_framebuffer_destroy(struct drm_framebuffer *fb)
  {
         struct radeon_framebuffer *radeon_fb = to_radeon_framebuffer(fb);
  
-       if (radeon_fb->obj)
+       if (radeon_fb->obj) {
                 drm_gem_object_unreference_unlocked(radeon_fb->obj);
+       }
         drm_framebuffer_cleanup(fb);
         kfree(radeon_fb);
  }
diff --git a/drivers/gpu/drm/radeon/radeon_fb.c b/drivers/gpu/drm/radeon/radeon_fb.c

index c74a8b20d9413e921bc6a03cd92578146a9ee8ef..40b0c087b5921384d46bf7f745cf93f2b391015b 100644
--- a/drivers/gpu/drm/radeon/radeon_fb.c
+++ b/drivers/gpu/drm/radeon/radeon_fb.c
@@ -94,6 +94,7 @@ static void radeonfb_destroy_pinned_object(struct drm_gem_object *gobj)
         ret = radeon_bo_reserve(rbo, false);
         if (likely(ret == 0)) {
                 radeon_bo_kunmap(rbo);
+               radeon_bo_unpin(rbo);
                 radeon_bo_unreserve(rbo);
         }
         drm_gem_object_unreference_unlocked(gobj);
@@ -325,8 +326,6 @@ static int radeon_fbdev_destroy(struct drm_device *dev, struct radeon_fbdev *rfb
  {
         struct fb_info *info;
         struct radeon_framebuffer *rfb = &rfbdev->rfb;
-       struct radeon_bo *rbo;
-       int r;
  
         if (rfbdev->helper.fbdev) {
                 info = rfbdev->helper.fbdev;
@@ -338,14 +337,8 @@ static int radeon_fbdev_destroy(struct drm_device *dev, struct radeon_fbdev *rfb
         }
  
         if (rfb->obj) {
-               rbo = rfb->obj->driver_private;
-               r = radeon_bo_reserve(rbo, false);
-               if (likely(r == 0)) {
-                       radeon_bo_kunmap(rbo);
-                       radeon_bo_unpin(rbo);
-                       radeon_bo_unreserve(rbo);
-               }
-               drm_gem_object_unreference_unlocked(rfb->obj);
+               radeonfb_destroy_pinned_object(rfb->obj);
+               rfb->obj = NULL;
         }
         drm_fb_helper_fini(&rfbdev->helper);
         drm_framebuffer_cleanup(&rfb->base);
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c

index c578f265b24cefc6dce21734783b01c1aed1ce27..d1e595d9172396b8104d19c7a1d0a87d3b14b772 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -201,11 +201,11 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
                 return r;
         }
         r = drm_gem_handle_create(filp, gobj, &handle);
+       /* drop reference from allocate - handle holds it now */
+       drm_gem_object_unreference_unlocked(gobj);
         if (r) {
-               drm_gem_object_unreference_unlocked(gobj);
                 return r;
         }
-       drm_gem_object_handle_unreference_unlocked(gobj);
         args->handle = handle;
         return 0;
  }
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c

index 0afd1e62347dcfb9670d20e13a818d8d7a99b59c..b3b5306bb578bf88547e4078fe48f59d9e0ea720 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -69,7 +69,7 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
         u32 c = 0;
  
         rbo->placement.fpfn = 0;
-       rbo->placement.lpfn = 0;
+       rbo->placement.lpfn = rbo->rdev->mc.active_vram_size >> PAGE_SHIFT;
         rbo->placement.placement = rbo->placements;
         rbo->placement.busy_placement = rbo->placements;
         if (domain & RADEON_GEM_DOMAIN_VRAM)
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h

index 353998dc2c03b12992cd244ff116d01e45db2b96..3481bc7f6f582b08a0c2a9ff079fa9787defb6cd 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -124,11 +124,8 @@ static inline int radeon_bo_wait(struct radeon_bo *bo, u32 *mem_type,
         int r;
  
         r = ttm_bo_reserve(&bo->tbo, true, no_wait, false, 0);
-       if (unlikely(r != 0)) {
-               if (r != -ERESTARTSYS)
-                       dev_err(bo->rdev->dev, "%p reserve failed for wait\n", bo);
+       if (unlikely(r != 0))
                 return r;
-       }
         spin_lock(&bo->tbo.lock);
         if (mem_type)
                 *mem_type = bo->tbo.mem.mem_type;
diff --git a/drivers/gpu/drm/radeon/rs600.c b/drivers/gpu/drm/radeon/rs600.c

index cc05b230d7effbbae88524da0d698dace6228ccf..51d5f7b5ab21b40a6e34d2fd286f28da91e4f0b0 100644
--- a/drivers/gpu/drm/radeon/rs600.c
+++ b/drivers/gpu/drm/radeon/rs600.c
@@ -693,6 +693,7 @@ void rs600_mc_init(struct radeon_device *rdev)
         rdev->mc.real_vram_size = RREG32(RADEON_CONFIG_MEMSIZE);
         rdev->mc.mc_vram_size = rdev->mc.real_vram_size;
         rdev->mc.visible_vram_size = rdev->mc.aper_size;
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         rdev->mc.igp_sideport_enabled = radeon_atombios_sideport_present(rdev);
         base = RREG32_MC(R_000004_MC_FB_LOCATION);
         base = G_000004_MC_FB_START(base) << 16;
diff --git a/drivers/gpu/drm/radeon/rs690.c b/drivers/gpu/drm/radeon/rs690.c

index 3e3f75718be3e83ab156465dc80a11f604b58a64..4dc2a87ea68018f0292cc0724d6ef4868c00e8ac 100644
--- a/drivers/gpu/drm/radeon/rs690.c
+++ b/drivers/gpu/drm/radeon/rs690.c
@@ -157,6 +157,7 @@ void rs690_mc_init(struct radeon_device *rdev)
         rdev->mc.aper_base = pci_resource_start(rdev->pdev, 0);
         rdev->mc.aper_size = pci_resource_len(rdev->pdev, 0);
         rdev->mc.visible_vram_size = rdev->mc.aper_size;
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         base = RREG32_MC(R_000100_MCCFG_FB_LOCATION);
         base = G_000100_MC_FB_START(base) << 16;
         rdev->mc.igp_sideport_enabled = radeon_atombios_sideport_present(rdev);
diff --git a/drivers/gpu/drm/radeon/rv770.c b/drivers/gpu/drm/radeon/rv770.c

index bfa59db374d23d3c4a06877a6e9a37aec59904e0..9490da700749487c00fe9c57671ec89727653136 100644
--- a/drivers/gpu/drm/radeon/rv770.c
+++ b/drivers/gpu/drm/radeon/rv770.c
@@ -267,6 +267,7 @@ static void rv770_mc_program(struct radeon_device *rdev)
   */
  void r700_cp_stop(struct radeon_device *rdev)
  {
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         WREG32(CP_ME_CNTL, (CP_ME_HALT | CP_PFP_HALT));
  }
  
@@ -992,6 +993,7 @@ int rv770_mc_init(struct radeon_device *rdev)
         rdev->mc.mc_vram_size = RREG32(CONFIG_MEMSIZE);
         rdev->mc.real_vram_size = RREG32(CONFIG_MEMSIZE);
         rdev->mc.visible_vram_size = rdev->mc.aper_size;
+       rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
         r600_vram_gtt_location(rdev, &rdev->mc);
         radeon_update_bandwidth_info(rdev);
  
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c

index cb4cf7ef4d1eee9bc726c4d4ee34f8962526b316..db809e034cc48b6d8c246cba1ede0660113e30de 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -441,6 +441,43 @@ out_err:
         return ret;
  }
  
+/**
+ * Call bo::reserved and with the lru lock held.
+ * Will release GPU memory type usage on destruction.
+ * This is the place to put in driver specific hooks.
+ * Will release the bo::reserved lock and the
+ * lru lock on exit.
+ */
+
+static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
+{
+       struct ttm_bo_global *glob = bo->glob;
+
+       if (bo->ttm) {
+
+               /**
+                * Release the lru_lock, since we don't want to have
+                * an atomic requirement on ttm_tt[unbind|destroy].
+                */
+
+               spin_unlock(&glob->lru_lock);
+               ttm_tt_unbind(bo->ttm);
+               ttm_tt_destroy(bo->ttm);
+               bo->ttm = NULL;
+               spin_lock(&glob->lru_lock);
+       }
+
+       if (bo->mem.mm_node) {
+               drm_mm_put_block(bo->mem.mm_node);
+               bo->mem.mm_node = NULL;
+       }
+
+       atomic_set(&bo->reserved, 0);
+       wake_up_all(&bo->event_queue);
+       spin_unlock(&glob->lru_lock);
+}
+
+
  /**
   * If bo idle, remove from delayed- and lru lists, and unref.
   * If not idle, and already on delayed list, do nothing.
@@ -456,6 +493,7 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, bool remove_all)
         int ret;
  
         spin_lock(&bo->lock);
+retry:
         (void) ttm_bo_wait(bo, false, false, !remove_all);
  
         if (!bo->sync_obj) {
@@ -464,31 +502,52 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, bool remove_all)
                 spin_unlock(&bo->lock);
  
                 spin_lock(&glob->lru_lock);
-               put_count = ttm_bo_del_from_lru(bo);
+               ret = ttm_bo_reserve_locked(bo, false, !remove_all, false, 0);
+
+               /**
+                * Someone else has the object reserved. Bail and retry.
+                */
  
-               ret = ttm_bo_reserve_locked(bo, false, false, false, 0);
-               BUG_ON(ret);
-               if (bo->ttm)
-                       ttm_tt_unbind(bo->ttm);
+               if (unlikely(ret == -EBUSY)) {
+                       spin_unlock(&glob->lru_lock);
+                       spin_lock(&bo->lock);
+                       goto requeue;
+               }
+
+               /**
+                * We can re-check for sync object without taking
+                * the bo::lock since setting the sync object requires
+                * also bo::reserved. A busy object at this point may
+                * be caused by another thread starting an accelerated
+                * eviction.
+                */
+
+               if (unlikely(bo->sync_obj)) {
+                       atomic_set(&bo->reserved, 0);
+                       wake_up_all(&bo->event_queue);
+                       spin_unlock(&glob->lru_lock);
+                       spin_lock(&bo->lock);
+                       if (remove_all)
+                               goto retry;
+                       else
+                               goto requeue;
+               }
+
+               put_count = ttm_bo_del_from_lru(bo);
  
                 if (!list_empty(&bo->ddestroy)) {
                         list_del_init(&bo->ddestroy);
                         ++put_count;
                 }
-               if (bo->mem.mm_node) {
-                       drm_mm_put_block(bo->mem.mm_node);
-                       bo->mem.mm_node = NULL;
-               }
-               spin_unlock(&glob->lru_lock);
  
-               atomic_set(&bo->reserved, 0);
+               ttm_bo_cleanup_memtype_use(bo);
  
                 while (put_count--)
                         kref_put(&bo->list_kref, ttm_bo_ref_bug);
  
                 return 0;
         }
-
+requeue:
         spin_lock(&glob->lru_lock);
         if (list_empty(&bo->ddestroy)) {
                 void *sync_obj = bo->sync_obj;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c

index 72ec2e2b6e9787196ca1de65f28e4c6a0f090051..a96ed6d9d010b82cfc58ed41ec6240f99d5a9103 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -148,13 +148,16 @@ static struct pci_device_id vmw_pci_id_list[] = {
         {0, 0, 0}
  };
  
-static char *vmw_devname = "vmwgfx";
+static int enable_fbdev;
  
  static int vmw_probe(struct pci_dev *, const struct pci_device_id *);
  static void vmw_master_init(struct vmw_master *);
  static int vmwgfx_pm_notifier(struct notifier_block *nb, unsigned long val,
                               void *ptr);
  
+MODULE_PARM_DESC(enable_fbdev, "Enable vmwgfx fbdev");
+module_param_named(enable_fbdev, enable_fbdev, int, 0600);
+
  static void vmw_print_capabilities(uint32_t capabilities)
  {
         DRM_INFO("Capabilities:\n");
@@ -192,8 +195,6 @@ static int vmw_request_device(struct vmw_private *dev_priv)
  {
         int ret;
  
-       vmw_kms_save_vga(dev_priv);
-
         ret = vmw_fifo_init(dev_priv, &dev_priv->fifo);
         if (unlikely(ret != 0)) {
                 DRM_ERROR("Unable to initialize FIFO.\n");
@@ -206,9 +207,35 @@ static int vmw_request_device(struct vmw_private *dev_priv)
  static void vmw_release_device(struct vmw_private *dev_priv)
  {
         vmw_fifo_release(dev_priv, &dev_priv->fifo);
-       vmw_kms_restore_vga(dev_priv);
  }
  
+int vmw_3d_resource_inc(struct vmw_private *dev_priv)
+{
+       int ret = 0;
+
+       mutex_lock(&dev_priv->release_mutex);
+       if (unlikely(dev_priv->num_3d_resources++ == 0)) {
+               ret = vmw_request_device(dev_priv);
+               if (unlikely(ret != 0))
+                       --dev_priv->num_3d_resources;
+       }
+       mutex_unlock(&dev_priv->release_mutex);
+       return ret;
+}
+
+
+void vmw_3d_resource_dec(struct vmw_private *dev_priv)
+{
+       int32_t n3d;
+
+       mutex_lock(&dev_priv->release_mutex);
+       if (unlikely(--dev_priv->num_3d_resources == 0))
+               vmw_release_device(dev_priv);
+       n3d = (int32_t) dev_priv->num_3d_resources;
+       mutex_unlock(&dev_priv->release_mutex);
+
+       BUG_ON(n3d < 0);
+}
  
  static int vmw_driver_load(struct drm_device *dev, unsigned long chipset)
  {
@@ -228,6 +255,7 @@ static int vmw_driver_load(struct drm_device *dev, unsigned long chipset)
         dev_priv->last_read_sequence = (uint32_t) -100;
         mutex_init(&dev_priv->hw_mutex);
         mutex_init(&dev_priv->cmdbuf_mutex);
+       mutex_init(&dev_priv->release_mutex);
         rwlock_init(&dev_priv->resource_lock);
         idr_init(&dev_priv->context_idr);
         idr_init(&dev_priv->surface_idr);
@@ -244,6 +272,8 @@ static int vmw_driver_load(struct drm_device *dev, unsigned long chipset)
         dev_priv->vram_start = pci_resource_start(dev->pdev, 1);
         dev_priv->mmio_start = pci_resource_start(dev->pdev, 2);
  
+       dev_priv->enable_fb = enable_fbdev;
+
         mutex_lock(&dev_priv->hw_mutex);
  
         vmw_write(dev_priv, SVGA_REG_ID, SVGA_ID_2);
@@ -343,17 +373,6 @@ static int vmw_driver_load(struct drm_device *dev, unsigned long chipset)
  
         dev->dev_private = dev_priv;
  
-       if (!dev->devname)
-               dev->devname = vmw_devname;
-
-       if (dev_priv->capabilities & SVGA_CAP_IRQMASK) {
-               ret = drm_irq_install(dev);
-               if (unlikely(ret != 0)) {
-                       DRM_ERROR("Failed installing irq: %d\n", ret);
-                       goto out_no_irq;
-               }
-       }
-
         ret = pci_request_regions(dev->pdev, "vmwgfx probe");
         dev_priv->stealth = (ret != 0);
         if (dev_priv->stealth) {
@@ -369,26 +388,52 @@ static int vmw_driver_load(struct drm_device *dev, unsigned long chipset)
                         goto out_no_device;
                 }
         }
-       ret = vmw_request_device(dev_priv);
+       ret = vmw_kms_init(dev_priv);
         if (unlikely(ret != 0))
-               goto out_no_device;
-       vmw_kms_init(dev_priv);
+               goto out_no_kms;
         vmw_overlay_init(dev_priv);
-       vmw_fb_init(dev_priv);
+       if (dev_priv->enable_fb) {
+               ret = vmw_3d_resource_inc(dev_priv);
+               if (unlikely(ret != 0))
+                       goto out_no_fifo;
+               vmw_kms_save_vga(dev_priv);
+               vmw_fb_init(dev_priv);
+               DRM_INFO("%s", vmw_fifo_have_3d(dev_priv) ?
+                        "Detected device 3D availability.\n" :
+                        "Detected no device 3D availability.\n");
+       } else {
+               DRM_INFO("Delayed 3D detection since we're not "
+                        "running the device in SVGA mode yet.\n");
+       }
+
+       if (dev_priv->capabilities & SVGA_CAP_IRQMASK) {
+               ret = drm_irq_install(dev);
+               if (unlikely(ret != 0)) {
+                       DRM_ERROR("Failed installing irq: %d\n", ret);
+                       goto out_no_irq;
+               }
+       }
  
         dev_priv->pm_nb.notifier_call = vmwgfx_pm_notifier;
         register_pm_notifier(&dev_priv->pm_nb);
  
-       DRM_INFO("%s", vmw_fifo_have_3d(dev_priv) ? "Have 3D\n" : "No 3D\n");
-
         return 0;
  
-out_no_device:
-       if (dev_priv->capabilities & SVGA_CAP_IRQMASK)
-               drm_irq_uninstall(dev_priv->dev);
-       if (dev->devname == vmw_devname)
-               dev->devname = NULL;
  out_no_irq:
+       if (dev_priv->enable_fb) {
+               vmw_fb_close(dev_priv);
+               vmw_kms_restore_vga(dev_priv);
+               vmw_3d_resource_dec(dev_priv);
+       }
+out_no_fifo:
+       vmw_overlay_close(dev_priv);
+       vmw_kms_close(dev_priv);
+out_no_kms:
+       if (dev_priv->stealth)
+               pci_release_region(dev->pdev, 2);
+       else
+               pci_release_regions(dev->pdev);
+out_no_device:
         ttm_object_device_release(&dev_priv->tdev);
  out_err4:
         iounmap(dev_priv->mmio_virt);
@@ -415,19 +460,20 @@ static int vmw_driver_unload(struct drm_device *dev)
  
         unregister_pm_notifier(&dev_priv->pm_nb);
  
-       vmw_fb_close(dev_priv);
+       if (dev_priv->capabilities & SVGA_CAP_IRQMASK)
+               drm_irq_uninstall(dev_priv->dev);
+       if (dev_priv->enable_fb) {
+               vmw_fb_close(dev_priv);
+               vmw_kms_restore_vga(dev_priv);
+               vmw_3d_resource_dec(dev_priv);
+       }
         vmw_kms_close(dev_priv);
         vmw_overlay_close(dev_priv);
-       vmw_release_device(dev_priv);
         if (dev_priv->stealth)
                 pci_release_region(dev->pdev, 2);
         else
                 pci_release_regions(dev->pdev);
  
-       if (dev_priv->capabilities & SVGA_CAP_IRQMASK)
-               drm_irq_uninstall(dev_priv->dev);
-       if (dev->devname == vmw_devname)
-               dev->devname = NULL;
         ttm_object_device_release(&dev_priv->tdev);
         iounmap(dev_priv->mmio_virt);
         drm_mtrr_del(dev_priv->mmio_mtrr, dev_priv->mmio_start,
@@ -500,7 +546,7 @@ static long vmw_unlocked_ioctl(struct file *filp, unsigned int cmd,
                 struct drm_ioctl_desc *ioctl =
                     &vmw_ioctls[nr - DRM_COMMAND_BASE];
  
-               if (unlikely(ioctl->cmd != cmd)) {
+               if (unlikely(ioctl->cmd_drv != cmd)) {
                         DRM_ERROR("Invalid command format, ioctl %d\n",
                                   nr - DRM_COMMAND_BASE);
                         return -EINVAL;
@@ -589,6 +635,16 @@ static int vmw_master_set(struct drm_device *dev,
         struct vmw_master *vmaster = vmw_master(file_priv->master);
         int ret = 0;
  
+       if (!dev_priv->enable_fb) {
+               ret = vmw_3d_resource_inc(dev_priv);
+               if (unlikely(ret != 0))
+                       return ret;
+               vmw_kms_save_vga(dev_priv);
+               mutex_lock(&dev_priv->hw_mutex);
+               vmw_write(dev_priv, SVGA_REG_TRACES, 0);
+               mutex_unlock(&dev_priv->hw_mutex);
+       }
+
         if (active) {
                 BUG_ON(active != &dev_priv->fbdev_master);
                 ret = ttm_vt_lock(&active->lock, false, vmw_fp->tfile);
@@ -617,7 +673,13 @@ static int vmw_master_set(struct drm_device *dev,
         return 0;
  
  out_no_active_lock:
-       vmw_release_device(dev_priv);
+       if (!dev_priv->enable_fb) {
+               mutex_lock(&dev_priv->hw_mutex);
+               vmw_write(dev_priv, SVGA_REG_TRACES, 1);
+               mutex_unlock(&dev_priv->hw_mutex);
+               vmw_kms_restore_vga(dev_priv);
+               vmw_3d_resource_dec(dev_priv);
+       }
         return ret;
  }
  
@@ -645,11 +707,23 @@ static void vmw_master_drop(struct drm_device *dev,
  
         ttm_lock_set_kill(&vmaster->lock, true, SIGTERM);
  
+       if (!dev_priv->enable_fb) {
+               ret = ttm_bo_evict_mm(&dev_priv->bdev, TTM_PL_VRAM);
+               if (unlikely(ret != 0))
+                       DRM_ERROR("Unable to clean VRAM on master drop.\n");
+               mutex_lock(&dev_priv->hw_mutex);
+               vmw_write(dev_priv, SVGA_REG_TRACES, 1);
+               mutex_unlock(&dev_priv->hw_mutex);
+               vmw_kms_restore_vga(dev_priv);
+               vmw_3d_resource_dec(dev_priv);
+       }
+
         dev_priv->active_master = &dev_priv->fbdev_master;
         ttm_lock_set_kill(&dev_priv->fbdev_master.lock, false, SIGTERM);
         ttm_vt_unlock(&dev_priv->fbdev_master.lock);
  
-       vmw_fb_on(dev_priv);
+       if (dev_priv->enable_fb)
+               vmw_fb_on(dev_priv);
  }
  
  
@@ -722,6 +796,7 @@ static struct drm_driver driver = {
         .irq_postinstall = vmw_irq_postinstall,
         .irq_uninstall = vmw_irq_uninstall,
         .irq_handler = vmw_irq_handler,
+       .get_vblank_counter = vmw_get_vblank_counter,
         .reclaim_buffers_locked = NULL,
         .get_map_ofs = drm_core_get_map_ofs,
         .get_reg_ofs = drm_core_get_reg_ofs,
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h

index 429f917b60bf4b30ecdfd0946f9f1f0978bd5c58..58de6393f611dd79fbbebeab81102dcf0f4abfcb 100644 (file)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -277,6 +277,7 @@ struct vmw_private {
  
         bool stealth;
         bool is_opened;
+       bool enable_fb;
  
         /**
          * Master management.
@@ -285,6 +286,9 @@ struct vmw_private {
         struct vmw_master *active_master;
         struct vmw_master fbdev_master;
         struct notifier_block pm_nb;
+
+       struct mutex release_mutex;
+       uint32_t num_3d_resources;
  };
  
  static inline struct vmw_private *vmw_priv(struct drm_device *dev)
@@ -319,6 +323,9 @@ static inline uint32_t vmw_read(struct vmw_private *dev_priv,
         return val;
  }
  
+int vmw_3d_resource_inc(struct vmw_private *dev_priv);
+void vmw_3d_resource_dec(struct vmw_private *dev_priv);
+
  /**
   * GMR utilities - vmwgfx_gmr.c
   */
@@ -511,6 +518,7 @@ void vmw_kms_write_svga(struct vmw_private *vmw_priv,
                         unsigned bbp, unsigned depth);
  int vmw_kms_update_layout_ioctl(struct drm_device *dev, void *data,
                                 struct drm_file *file_priv);
+u32 vmw_get_vblank_counter(struct drm_device *dev, int crtc);
  
  /**
   * Overlay control - vmwgfx_overlay.c
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c

index 870967a97c15d52eb3f380323e6038d32ed6e76f..409e172f4abfe94502be96502e251b5d6b2e54c9 100644 (file)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
@@ -615,6 +615,11 @@ int vmw_dmabuf_to_start_of_vram(struct vmw_private *vmw_priv,
         if (unlikely(ret != 0))
                 goto err_unlock;
  
+       if (bo->mem.mem_type == TTM_PL_VRAM &&
+           bo->mem.mm_node->start < bo->num_pages)
+               (void) ttm_bo_validate(bo, &vmw_sys_placement, false,
+                                      false, false);
+
         ret = ttm_bo_validate(bo, &ne_placement, false, false, false);
  
         /* Could probably bug on */
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c

index e6a1eb7ea95498f00e65d123adeae79160af8fa8..0fe31766e4cf5f11e6025e5a85f96baa5936408f 100644 (file)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
@@ -106,6 +106,7 @@ int vmw_fifo_init(struct vmw_private *dev_priv, struct vmw_fifo_state *fifo)
         mutex_lock(&dev_priv->hw_mutex);
         dev_priv->enable_state = vmw_read(dev_priv, SVGA_REG_ENABLE);
         dev_priv->config_done_state = vmw_read(dev_priv, SVGA_REG_CONFIG_DONE);
+       dev_priv->traces_state = vmw_read(dev_priv, SVGA_REG_TRACES);
         vmw_write(dev_priv, SVGA_REG_ENABLE, 1);
  
         min = 4;
@@ -175,6 +176,8 @@ void vmw_fifo_release(struct vmw_private *dev_priv, struct vmw_fifo_state *fifo)
                   dev_priv->config_done_state);
         vmw_write(dev_priv, SVGA_REG_ENABLE,
                   dev_priv->enable_state);
+       vmw_write(dev_priv, SVGA_REG_TRACES,
+                 dev_priv->traces_state);
  
         mutex_unlock(&dev_priv->hw_mutex);
         vmw_fence_queue_takedown(&fifo->fence_queue);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c

index 64d7f47df8683ef49cfbb3c83eb449026949af03..e882ba099f0c33dab30f12f7b9328b3b628ac712 100644 (file)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -898,7 +898,19 @@ int vmw_kms_save_vga(struct vmw_private *vmw_priv)
                 save->width = vmw_read(vmw_priv, SVGA_REG_DISPLAY_WIDTH);
                 save->height = vmw_read(vmw_priv, SVGA_REG_DISPLAY_HEIGHT);
                 vmw_write(vmw_priv, SVGA_REG_DISPLAY_ID, SVGA_ID_INVALID);
+               if (i == 0 && vmw_priv->num_displays == 1 &&
+                   save->width == 0 && save->height == 0) {
+
+                       /*
+                        * It should be fairly safe to assume that these
+                        * values are uninitialized.
+                        */
+
+                       save->width = vmw_priv->vga_width - save->pos_x;
+                       save->height = vmw_priv->vga_height - save->pos_y;
+               }
         }
+
         return 0;
  }
  
@@ -984,3 +996,8 @@ out_unlock:
         ttm_read_unlock(&vmaster->lock);
         return ret;
  }
+
+u32 vmw_get_vblank_counter(struct drm_device *dev, int crtc)
+{
+       return 0;
+}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c

index 7083b1a24df35b99fe8792040ef234d6536d69a2..11cb39e3accbfa9581801095ab0398952d2313f1 100644 (file)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
@@ -27,6 +27,8 @@
  
  #include "vmwgfx_kms.h"
  
+#define VMWGFX_LDU_NUM_DU 8
+
  #define vmw_crtc_to_ldu(x) \
         container_of(x, struct vmw_legacy_display_unit, base.crtc)
  #define vmw_encoder_to_ldu(x) \
@@ -536,6 +538,10 @@ static int vmw_ldu_init(struct vmw_private *dev_priv, unsigned unit)
  
  int vmw_kms_init_legacy_display_system(struct vmw_private *dev_priv)
  {
+       struct drm_device *dev = dev_priv->dev;
+       int i;
+       int ret;
+
         if (dev_priv->ldu_priv) {
                 DRM_INFO("ldu system already on\n");
                 return -EINVAL;
@@ -553,23 +559,24 @@ int vmw_kms_init_legacy_display_system(struct vmw_private *dev_priv)
  
         drm_mode_create_dirty_info_property(dev_priv->dev);
  
-       vmw_ldu_init(dev_priv, 0);
-       /* for old hardware without multimon only enable one display */
         if (dev_priv->capabilities & SVGA_CAP_MULTIMON) {
-               vmw_ldu_init(dev_priv, 1);
-               vmw_ldu_init(dev_priv, 2);
-               vmw_ldu_init(dev_priv, 3);
-               vmw_ldu_init(dev_priv, 4);
-               vmw_ldu_init(dev_priv, 5);
-               vmw_ldu_init(dev_priv, 6);
-               vmw_ldu_init(dev_priv, 7);
+               for (i = 0; i < VMWGFX_LDU_NUM_DU; ++i)
+                       vmw_ldu_init(dev_priv, i);
+               ret = drm_vblank_init(dev, VMWGFX_LDU_NUM_DU);
+       } else {
+               /* for old hardware without multimon only enable one display */
+               vmw_ldu_init(dev_priv, 0);
+               ret = drm_vblank_init(dev, 1);
         }
  
-       return 0;
+       return ret;
  }
  
  int vmw_kms_close_legacy_display_system(struct vmw_private *dev_priv)
  {
+       struct drm_device *dev = dev_priv->dev;
+
+       drm_vblank_cleanup(dev);
         if (!dev_priv->ldu_priv)
                 return -ENOSYS;
  
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c

index 5f2d5df01e5c370acbc1be99772701626419daf5..c8c40e9979dbd21442a5d87cf6ac4cdcad08cfad 100644 (file)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -211,6 +211,7 @@ static void vmw_hw_context_destroy(struct vmw_resource *res)
         cmd->body.cid = cpu_to_le32(res->id);
  
         vmw_fifo_commit(dev_priv, sizeof(*cmd));
+       vmw_3d_resource_dec(dev_priv);
  }
  
  static int vmw_context_init(struct vmw_private *dev_priv,
@@ -247,6 +248,7 @@ static int vmw_context_init(struct vmw_private *dev_priv,
         cmd->body.cid = cpu_to_le32(res->id);
  
         vmw_fifo_commit(dev_priv, sizeof(*cmd));
+       (void) vmw_3d_resource_inc(dev_priv);
         vmw_resource_activate(res, vmw_hw_context_destroy);
         return 0;
  }
@@ -406,6 +408,7 @@ static void vmw_hw_surface_destroy(struct vmw_resource *res)
         cmd->body.sid = cpu_to_le32(res->id);
  
         vmw_fifo_commit(dev_priv, sizeof(*cmd));
+       vmw_3d_resource_dec(dev_priv);
  }
  
  void vmw_surface_res_free(struct vmw_resource *res)
@@ -473,6 +476,7 @@ int vmw_surface_init(struct vmw_private *dev_priv,
         }
  
         vmw_fifo_commit(dev_priv, submit_size);
+       (void) vmw_3d_resource_inc(dev_priv);
         vmw_resource_activate(res, vmw_hw_surface_destroy);
         return 0;
  }
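
Note on the vmwgfx_resource.c hunks above: each context or surface creation is now paired with `vmw_3d_resource_inc()` and each hardware destroy with `vmw_3d_resource_dec()`. The bodies of those two functions are not part of this section, so the following is only a guess at the usual first-user/last-user counting pattern, written as a userspace sketch. Every `demo_*` name is hypothetical, and the locking that the new `release_mutex` field presumably provides is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a device that is set up on first use. */
struct demo_priv {
	uint32_t num_3d_resources;  /* mirrors the counter added to vmw_private */
	int hw_active;              /* 1 while the pretend hardware is set up */
	int fail_next_init;         /* test hook: make the next setup fail */
};

static int demo_hw_request(struct demo_priv *p)
{
	if (p->fail_next_init) {
		p->fail_next_init = 0;
		return -1;
	}
	p->hw_active = 1;
	return 0;
}

static void demo_hw_release(struct demo_priv *p)
{
	p->hw_active = 0;
}

/* Take a reference; the first holder brings the hardware up. */
static int demo_3d_resource_inc(struct demo_priv *p)
{
	int ret = 0;

	if (p->num_3d_resources == 0)
		ret = demo_hw_request(p);
	if (ret == 0)
		p->num_3d_resources++;
	return ret;
}

/* Drop a reference; the last holder tears the hardware down. */
static void demo_3d_resource_dec(struct demo_priv *p)
{
	assert(p->num_3d_resources > 0);
	if (--p->num_3d_resources == 0)
		demo_hw_release(p);
}
```

In this sketch a failed setup leaves the counter untouched, so a caller that ignores the return value and later calls the matching dec would unbalance the count.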
diff --git a/drivers/hid/hid-cando.c b/drivers/hid/hid-cando.c

index 4267a6fdc277a183cc840342fb5508cadda90815..5925bdcd417dbbf74d878b52fedcbc24a4d4de31 100644 (file)
--- a/drivers/hid/hid-cando.c
+++ b/drivers/hid/hid-cando.c
@@ -237,6 +237,8 @@ static const struct hid_device_id cando_devices[] = {
                         USB_DEVICE_ID_CANDO_MULTI_TOUCH) },
         { HID_USB_DEVICE(USB_VENDOR_ID_CANDO,
                         USB_DEVICE_ID_CANDO_MULTI_TOUCH_11_6) },
+       { HID_USB_DEVICE(USB_VENDOR_ID_CANDO,
+               USB_DEVICE_ID_CANDO_MULTI_TOUCH_15_6) },
         { }
  };
  MODULE_DEVICE_TABLE(hid, cando_devices);
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c

index 3f7292486024b8feace0b72775c03a2ae122e8fc..a0dea3d1296e65ebc9b84e24ec1aeacd69aa59d7 100644 (file)
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -1292,6 +1292,7 @@ static const struct hid_device_id hid_blacklist[] = {
         { HID_USB_DEVICE(USB_VENDOR_ID_BTC, USB_DEVICE_ID_BTC_EMPREX_REMOTE_2) },
         { HID_USB_DEVICE(USB_VENDOR_ID_CANDO, USB_DEVICE_ID_CANDO_MULTI_TOUCH) },
         { HID_USB_DEVICE(USB_VENDOR_ID_CANDO, USB_DEVICE_ID_CANDO_MULTI_TOUCH_11_6) },
+       { HID_USB_DEVICE(USB_VENDOR_ID_CANDO, USB_DEVICE_ID_CANDO_MULTI_TOUCH_15_6) },
         { HID_USB_DEVICE(USB_VENDOR_ID_CHERRY, USB_DEVICE_ID_CHERRY_CYMOTION) },
         { HID_USB_DEVICE(USB_VENDOR_ID_CHERRY, USB_DEVICE_ID_CHERRY_CYMOTION_SOLAR) },
         { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_TACTICAL_PAD) },
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h

index 765a4f53eb5cb663fd319d0a71386ae1a9ee0fa5..c5ae5f1545bd0a18d516edab6f04b3c18d0fca71 100644 (file)
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -134,6 +134,7 @@
  #define USB_VENDOR_ID_CANDO            0x2087
  #define USB_DEVICE_ID_CANDO_MULTI_TOUCH        0x0a01
  #define USB_DEVICE_ID_CANDO_MULTI_TOUCH_11_6 0x0b03
+#define USB_DEVICE_ID_CANDO_MULTI_TOUCH_15_6 0x0f01
  
  #define USB_VENDOR_ID_CH               0x068e
  #define USB_DEVICE_ID_CH_PRO_PEDALS    0x00f2
@@ -503,6 +504,7 @@
  
  #define USB_VENDOR_ID_TURBOX           0x062a
  #define USB_DEVICE_ID_TURBOX_KEYBOARD  0x0201
+#define USB_DEVICE_ID_TURBOX_TOUCHSCREEN_MOSART        0x7100
  
  #define USB_VENDOR_ID_TWINHAN          0x6253
  #define USB_DEVICE_ID_TWINHAN_IR_REMOTE        0x0100
diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c

index 47d70c523d93474a658bbaa5aa5b1cfb327f194b..a3866b5c0c43da58bceae7bb463a2780846454be 100644 (file)
--- a/drivers/hid/hidraw.c
+++ b/drivers/hid/hidraw.c
@@ -109,6 +109,12 @@ static ssize_t hidraw_write(struct file *file, const char __user *buffer, size_t
         int ret = 0;
  
         mutex_lock(&minors_lock);
+
+       if (!hidraw_table[minor]) {
+               ret = -ENODEV;
+               goto out;
+       }
+
         dev = hidraw_table[minor]->hid;
  
         if (!dev->hid_output_raw_report) {
@@ -244,6 +250,10 @@ static long hidraw_ioctl(struct file *file, unsigned int cmd,
  
         mutex_lock(&minors_lock);
         dev = hidraw_table[minor];
+       if (!dev) {
+               ret = -ENODEV;
+               goto out;
+       }
  
         switch (cmd) {
                 case HIDIOCGRDESCSIZE:
@@ -317,6 +327,7 @@ static long hidraw_ioctl(struct file *file, unsigned int cmd,
  
                 ret = -ENOTTY;
         }
+out:
         mutex_unlock(&minors_lock);
         return ret;
  }
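
Note on the hidraw.c hunks above: both add a check that the table slot is still populated before it is dereferenced, and both route the failure through a label that drops `minors_lock`. A minimal userspace sketch of that shape follows. `demo_table`, `demo_lock()` and `demo_write()` are hypothetical stand-ins, and the lock is modelled as a plain depth counter rather than a real mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_MINORS 4

struct demo_dev { int reports_sent; };

/* Hypothetical stand-ins for hidraw_table[] and minors_lock. */
static struct demo_dev *demo_table[DEMO_MAX_MINORS];
static int demo_lock_depth;   /* 0 = unlocked, 1 = locked */

static void demo_lock(void)   { assert(demo_lock_depth == 0); demo_lock_depth = 1; }
static void demo_unlock(void) { assert(demo_lock_depth == 1); demo_lock_depth = 0; }

/* Returns 0 on success or a negative errno; the lock is always released. */
static int demo_write(unsigned int minor)
{
	int ret = 0;

	if (minor >= DEMO_MAX_MINORS)
		return -EINVAL;

	demo_lock();

	/* The slot may already have been cleared; check it before use. */
	if (!demo_table[minor]) {
		ret = -ENODEV;
		goto out;
	}

	demo_table[minor]->reports_sent++;
out:
	demo_unlock();
	return ret;
}
```

Keeping a single exit label means every path taken after the lock is acquired releases it exactly once.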
diff --git a/drivers/hid/usbhid/hid-quirks.c b/drivers/hid/usbhid/hid-quirks.c

index 70da3181c8a0467663c15abd324dd6fbe801ea51..f0260c699adb45ac9d5b11f119c1491a95e04504 100644 (file)
--- a/drivers/hid/usbhid/hid-quirks.c
+++ b/drivers/hid/usbhid/hid-quirks.c
@@ -36,6 +36,7 @@ static const struct hid_blacklist {
         { USB_VENDOR_ID_DWAV, USB_DEVICE_ID_EGALAX_TOUCHCONTROLLER, HID_QUIRK_MULTI_INPUT | HID_QUIRK_NOGET },
         { USB_VENDOR_ID_DWAV, USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH, HID_QUIRK_MULTI_INPUT },
         { USB_VENDOR_ID_MOJO, USB_DEVICE_ID_RETRO_ADAPTER, HID_QUIRK_MULTI_INPUT },
+       { USB_VENDOR_ID_TURBOX, USB_DEVICE_ID_TURBOX_TOUCHSCREEN_MOSART, HID_QUIRK_MULTI_INPUT },
         { USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_DRIVING, HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT },
         { USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_FLYING, HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT },
         { USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_FIGHTING, HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT },
diff --git a/drivers/hwmon/f71882fg.c b/drivers/hwmon/f71882fg.c

index 537841ef44b99d179318f7510dbf28dddedb0ed8..75afb3b0e0763c184a1b22cdc163ef32d10d2969 100644 (file)
--- a/drivers/hwmon/f71882fg.c
+++ b/drivers/hwmon/f71882fg.c
@@ -111,7 +111,7 @@ static struct platform_device *f71882fg_pdev;
  /* Super-I/O Function prototypes */
  static inline int superio_inb(int base, int reg);
  static inline int superio_inw(int base, int reg);
-static inline void superio_enter(int base);
+static inline int superio_enter(int base);
  static inline void superio_select(int base, int ld);
  static inline void superio_exit(int base);
  
@@ -861,11 +861,20 @@ static int superio_inw(int base, int reg)
         return val;
  }
  
-static inline void superio_enter(int base)
+static inline int superio_enter(int base)
  {
+       /* Don't step on other drivers' I/O space by accident */
+       if (!request_muxed_region(base, 2, DRVNAME)) {
+               printk(KERN_ERR DRVNAME ": I/O address 0x%04x already in use\n",
+                               base);
+               return -EBUSY;
+       }
+
         /* according to the datasheet the key must be send twice! */
         outb(SIO_UNLOCK_KEY, base);
         outb(SIO_UNLOCK_KEY, base);
+
+       return 0;
  }
  
  static inline void superio_select(int base, int ld)
@@ -877,6 +886,7 @@ static inline void superio_select(int base, int ld)
  static inline void superio_exit(int base)
  {
         outb(SIO_LOCK_KEY, base);
+       release_region(base, 2);
  }
  
  static inline int fan_from_reg(u16 reg)
@@ -2175,21 +2185,15 @@ static int f71882fg_remove(struct platform_device *pdev)
  static int __init f71882fg_find(int sioaddr, unsigned short *address,
         struct f71882fg_sio_data *sio_data)
  {
-       int err = -ENODEV;
         u16 devid;
-
-       /* Don't step on other drivers' I/O space by accident */
-       if (!request_region(sioaddr, 2, DRVNAME)) {
-               printk(KERN_ERR DRVNAME ": I/O address 0x%04x already in use\n",
-                               (int)sioaddr);
-               return -EBUSY;
-       }
-
-       superio_enter(sioaddr);
+       int err = superio_enter(sioaddr);
+       if (err)
+               return err;
  
         devid = superio_inw(sioaddr, SIO_REG_MANID);
         if (devid != SIO_FINTEK_ID) {
                 pr_debug(DRVNAME ": Not a Fintek device\n");
+               err = -ENODEV;
                 goto exit;
         }
  
@@ -2213,6 +2217,7 @@ static int __init f71882fg_find(int sioaddr, unsigned short *address,
         default:
                 printk(KERN_INFO DRVNAME ": Unsupported Fintek device: %04x\n",
                        (unsigned int)devid);
+               err = -ENODEV;
                 goto exit;
         }
  
@@ -2223,12 +2228,14 @@ static int __init f71882fg_find(int sioaddr, unsigned short *address,
  
         if (!(superio_inb(sioaddr, SIO_REG_ENABLE) & 0x01)) {
                 printk(KERN_WARNING DRVNAME ": Device not activated\n");
+               err = -ENODEV;
                 goto exit;
         }
  
         *address = superio_inw(sioaddr, SIO_REG_ADDR);
         if (*address == 0) {
                 printk(KERN_WARNING DRVNAME ": Base address not set\n");
+               err = -ENODEV;
                 goto exit;
         }
         *address &= ~(REGION_LENGTH - 1);       /* Ignore 3 LSB */
@@ -2239,7 +2246,6 @@ static int __init f71882fg_find(int sioaddr, unsigned short *address,
                 (int)superio_inb(sioaddr, SIO_REG_DEVREV));
  exit:
         superio_exit(sioaddr);
-       release_region(sioaddr, 2);
         return err;
  }
  
diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c

index f7bd2613ceccbc69e4d04b69432fdc40638b2555..f2de3be35df36265cdfca5d899097b161ef10171 100644 (file)
--- a/drivers/i2c/busses/i2c-cpm.c
+++ b/drivers/i2c/busses/i2c-cpm.c
@@ -677,6 +677,11 @@ static int __devinit cpm_i2c_probe(struct platform_device *ofdev,
         dev_dbg(&ofdev->dev, "hw routines for %s registered.\n",
                 cpm->adap.name);
  
+       /*
+        * register OF I2C devices
+        */
+       of_i2c_register_devices(&cpm->adap);
+
         return 0;
  out_shut:
         cpm_i2c_shutdown(cpm);
diff --git a/drivers/i2c/busses/i2c-davinci.c b/drivers/i2c/busses/i2c-davinci.c

index 2222c87876b97bc711b6d739fd4a82deef7db330..5795c8398c7c3a25af82319a0ce1d46741def8b2 100644 (file)
--- a/drivers/i2c/busses/i2c-davinci.c
+++ b/drivers/i2c/busses/i2c-davinci.c
@@ -331,21 +331,16 @@ i2c_davinci_xfer_msg(struct i2c_adapter *adap, struct i2c_msg *msg, int stop)
         INIT_COMPLETION(dev->cmd_complete);
         dev->cmd_err = 0;
  
-       /* Take I2C out of reset, configure it as master and set the
-        * start bit */
-       flag = DAVINCI_I2C_MDR_IRS | DAVINCI_I2C_MDR_MST | DAVINCI_I2C_MDR_STT;
+       /* Take I2C out of reset and configure it as master */
+       flag = DAVINCI_I2C_MDR_IRS | DAVINCI_I2C_MDR_MST;
  
         /* if the slave address is ten bit address, enable XA bit */
         if (msg->flags & I2C_M_TEN)
                 flag |= DAVINCI_I2C_MDR_XA;
         if (!(msg->flags & I2C_M_RD))
                 flag |= DAVINCI_I2C_MDR_TRX;
-       if (stop)
-               flag |= DAVINCI_I2C_MDR_STP;
-       if (msg->len == 0) {
+       if (msg->len == 0)
                 flag |= DAVINCI_I2C_MDR_RM;
-               flag &= ~DAVINCI_I2C_MDR_STP;
-       }
  
         /* Enable receive or transmit interrupts */
         w = davinci_i2c_read_reg(dev, DAVINCI_I2C_IMR_REG);
@@ -357,7 +352,11 @@ i2c_davinci_xfer_msg(struct i2c_adapter *adap, struct i2c_msg *msg, int stop)
  
         dev->terminate = 0;
  
-       /* write the data into mode register */
+       /*
+        * Write mode register first as needed for correct behaviour
+        * on OMAP-L138, but don't set STT yet to avoid a race with XRDY
+        * occurring before we have loaded DXR
+        */
         davinci_i2c_write_reg(dev, DAVINCI_I2C_MDR_REG, flag);
  
         /*
@@ -365,12 +364,19 @@ i2c_davinci_xfer_msg(struct i2c_adapter *adap, struct i2c_msg *msg, int stop)
          * because transmit-data-ready interrupt can come before
          * NACK-interrupt during sending of previous message and
          * ICDXR may have wrong data
+        * It also saves us one interrupt, slightly faster
          */
         if ((!(msg->flags & I2C_M_RD)) && dev->buf_len) {
                 davinci_i2c_write_reg(dev, DAVINCI_I2C_DXR_REG, *dev->buf++);
                 dev->buf_len--;
         }
  
+       /* Set STT to begin transmit now DXR is loaded */
+       flag |= DAVINCI_I2C_MDR_STT;
+       if (stop && msg->len != 0)
+               flag |= DAVINCI_I2C_MDR_STP;
+       davinci_i2c_write_reg(dev, DAVINCI_I2C_MDR_REG, flag);
+
         r = wait_for_completion_interruptible_timeout(&dev->cmd_complete,
                                                       dev->adapter.timeout);
         if (r == 0) {
diff --git a/drivers/i2c/busses/i2c-ibm_iic.c b/drivers/i2c/busses/i2c-ibm_iic.c

index 43ca32fddde2b77309c92533f9c2c50e44719879..89eedf45d30ed877e6abbc76c66507372c4a3f83 100644 (file)
--- a/drivers/i2c/busses/i2c-ibm_iic.c
+++ b/drivers/i2c/busses/i2c-ibm_iic.c
@@ -761,6 +761,9 @@ static int __devinit iic_probe(struct platform_device *ofdev,
         dev_info(&ofdev->dev, "using %s mode\n",
                  dev->fast_mode ? "fast (400 kHz)" : "standard (100 kHz)");
  
+       /* Now register all the child nodes */
+       of_i2c_register_devices(adap);
+
         return 0;
  
  error_cleanup:
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c

index d1ff9408dc1f2d68cdbe305c6777ba259eed64e4..4c2a62b75b5cf188dd896dff100c80bfe253aded 100644 (file)
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -159,15 +159,9 @@ static int i2c_imx_bus_busy(struct imx_i2c_struct *i2c_imx, int for_busy)
  
  static int i2c_imx_trx_complete(struct imx_i2c_struct *i2c_imx)
  {
-       int result;
-
-       result = wait_event_interruptible_timeout(i2c_imx->queue,
-               i2c_imx->i2csr & I2SR_IIF, HZ / 10);
+       wait_event_timeout(i2c_imx->queue, i2c_imx->i2csr & I2SR_IIF, HZ / 10);
  
-       if (unlikely(result < 0)) {
-               dev_dbg(&i2c_imx->adapter.dev, "<%s> result < 0\n", __func__);
-               return result;
-       } else if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
+       if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
                 dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n", __func__);
                 return -ETIMEDOUT;
         }
@@ -295,7 +289,7 @@ static irqreturn_t i2c_imx_isr(int irq, void *dev_id)
                 i2c_imx->i2csr = temp;
                 temp &= ~I2SR_IIF;
                 writeb(temp, i2c_imx->base + IMX_I2C_I2SR);
-               wake_up_interruptible(&i2c_imx->queue);
+               wake_up(&i2c_imx->queue);
                 return IRQ_HANDLED;
         }
  
diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c

index a1c419a716af8d24f4a84a680cc93123b6c8bff9..b74e6dc6886c71ed5ebe3e02f219a68f8f0a2df8 100644 (file)
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -632,6 +632,7 @@ static int __devinit fsl_i2c_probe(struct platform_device *op,
                 dev_err(i2c->dev, "failed to add adapter\n");
                 goto fail_add;
         }
+       of_i2c_register_devices(&i2c->adap);
  
         return result;
  
diff --git a/drivers/i2c/busses/i2c-octeon.c b/drivers/i2c/busses/i2c-octeon.c

index 0e9f85d0a835718dac97ecd52ff270f327d84136..56dbe54e88118a3fb7b112da16e11ccd5bdbc9fb 100644 (file)
--- a/drivers/i2c/busses/i2c-octeon.c
+++ b/drivers/i2c/busses/i2c-octeon.c
@@ -218,7 +218,7 @@ static int octeon_i2c_wait(struct octeon_i2c *i2c)
                 return result;
         } else if (result == 0) {
                 dev_dbg(i2c->dev, "%s: timeout\n", __func__);
-               result = -ETIMEDOUT;
+               return -ETIMEDOUT;
         }
  
         return 0;
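
Note on the i2c-octeon.c hunk above: the old code stored `-ETIMEDOUT` in `result` and then fell through to `return 0`, so the timeout was never reported to the caller. The sketch below isolates that control flow. Both `demo_*` functions are hypothetical and take the wait result as a parameter instead of calling a real wait primitive.

```c
#include <assert.h>
#include <errno.h>

/* Map a wait result (<0 error, 0 timeout, >0 completed) to a status. */
static int demo_wait_status_old(long result)
{
	if (result < 0)
		return (int)result;
	else if (result == 0)
		result = -ETIMEDOUT;  /* discarded by the return below */

	return 0;
}

static int demo_wait_status_new(long result)
{
	if (result < 0)
		return (int)result;
	else if (result == 0)
		return -ETIMEDOUT;    /* the timeout now reaches the caller */

	return 0;
}
```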
diff --git a/drivers/i2c/busses/i2c-pca-isa.c b/drivers/i2c/busses/i2c-pca-isa.c

index bbd77603a4173b8f29aa1353ef1e47376c594196..29933f87d8fa8fdc31ffdd38c4ff905fddd7774d 100644 (file)
--- a/drivers/i2c/busses/i2c-pca-isa.c
+++ b/drivers/i2c/busses/i2c-pca-isa.c
@@ -71,8 +71,8 @@ static int pca_isa_readbyte(void *pd, int reg)
  
  static int pca_isa_waitforcompletion(void *pd)
  {
-       long ret = ~0;
         unsigned long timeout;
+       long ret;
  
         if (irq > -1) {
                 ret = wait_event_timeout(pca_wait,
@@ -81,11 +81,15 @@ static int pca_isa_waitforcompletion(void *pd)
         } else {
                 /* Do polling */
                 timeout = jiffies + pca_isa_ops.timeout;
-               while (((pca_isa_readbyte(pd, I2C_PCA_CON)
-                               & I2C_PCA_CON_SI) == 0)
-                               && (ret = time_before(jiffies, timeout)))
+               do {
+                       ret = time_before(jiffies, timeout);
+                       if (pca_isa_readbyte(pd, I2C_PCA_CON)
+                                       & I2C_PCA_CON_SI)
+                               break;
                         udelay(100);
+               } while (ret);
         }
+
         return ret > 0;
  }
  
diff --git a/drivers/i2c/busses/i2c-pca-platform.c b/drivers/i2c/busses/i2c-pca-platform.c

index ef5c78487eb779c36fd5b98f3fd8d0727c34494e..5f6d7f89e2252d1a4806a3be9e6368c212ceed48 100644 (file)
--- a/drivers/i2c/busses/i2c-pca-platform.c
+++ b/drivers/i2c/busses/i2c-pca-platform.c
@@ -80,8 +80,8 @@ static void i2c_pca_pf_writebyte32(void *pd, int reg, int val)
  static int i2c_pca_pf_waitforcompletion(void *pd)
  {
         struct i2c_pca_pf_data *i2c = pd;
-       long ret = ~0;
         unsigned long timeout;
+       long ret;
  
         if (i2c->irq) {
                 ret = wait_event_timeout(i2c->wait,
@@ -90,10 +90,13 @@ static int i2c_pca_pf_waitforcompletion(void *pd)
         } else {
                 /* Do polling */
                 timeout = jiffies + i2c->adap.timeout;
-               while (((i2c->algo_data.read_byte(i2c, I2C_PCA_CON)
-                               & I2C_PCA_CON_SI) == 0)
-                               && (ret = time_before(jiffies, timeout)))
+               do {
+                       ret = time_before(jiffies, timeout);
+                       if (i2c->algo_data.read_byte(i2c, I2C_PCA_CON)
+                                       & I2C_PCA_CON_SI)
+                               break;
                         udelay(100);
+               } while (ret);
         }
  
         return ret > 0;
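
Note on the two i2c-pca polling hunks above: in the old loop `ret` started as `~0`, which is -1 as a `long`, and was only overwritten if the status bit was still clear on the first read. A transfer that had already completed therefore left `ret` at -1, and `ret > 0` reported failure. The sketch below reproduces both loop shapes against a fake clock. `demo_now`, `demo_ready_at` and the other `demo_*` helpers are hypothetical replacements for `jiffies`, the `I2C_PCA_CON_SI` test and `time_before()`.

```c
#include <assert.h>

/* Fake clock and completion time; each poll advances the clock by one tick. */
static unsigned long demo_now;
static unsigned long demo_ready_at;

static int demo_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

static int demo_ready(void)
{
	return demo_now >= demo_ready_at;
}

/* Old shape: ret keeps its initial ~0 (-1) if the flag is already set. */
static int demo_wait_old(unsigned long timeout)
{
	long ret = ~0;

	while (!demo_ready() && (ret = demo_time_before(demo_now, timeout)))
		demo_now++;   /* stands in for udelay(100) */
	return ret > 0;
}

/* New shape: the deadline is sampled before every check of the flag. */
static int demo_wait_new(unsigned long timeout)
{
	long ret;

	do {
		ret = demo_time_before(demo_now, timeout);
		if (demo_ready())
			break;
		demo_now++;
	} while (ret);
	return ret > 0;
}
```

With the flag already set on entry, the old shape returns 0 and the new shape returns 1, which is the case the hunks appear to address.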
diff --git a/drivers/i2c/busses/i2c-s3c2410.c b/drivers/i2c/busses/i2c-s3c2410.c

index 72902e0bbfa79a48caaf2193420d4b8712af1e18..bf831bf8158741a9f857eb541afc1f3a48d38e52 100644 (file)
--- a/drivers/i2c/busses/i2c-s3c2410.c
+++ b/drivers/i2c/busses/i2c-s3c2410.c
@@ -662,8 +662,8 @@ static int s3c24xx_i2c_clockrate(struct s3c24xx_i2c *i2c, unsigned int *got)
                 unsigned long sda_delay;
  
                 if (pdata->sda_delay) {
-                       sda_delay = (freq / 1000) * pdata->sda_delay;
-                       sda_delay /= 1000000;
+                       sda_delay = clkin * pdata->sda_delay;
+                       sda_delay = DIV_ROUND_UP(sda_delay, 1000000);
                         sda_delay = DIV_ROUND_UP(sda_delay, 5);
                         if (sda_delay > 3)
                                 sda_delay = 3;
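
Note on the i2c-s3c2410.c hunk above: the new arithmetic rounds up at the division by 1000000 instead of truncating, so a short delay is no longer rounded down to zero cycles. The sketch below works through the same three steps. It assumes `clkin` is in kHz and the platform delay is in ns, which this hunk does not show. The division by 5 and the clamp to 3 are copied from the hunk, and the `demo_*` and `DEMO_*` names are hypothetical.

```c
#include <assert.h>

#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed units: clkin_khz in kHz, delay_ns in ns. */
static unsigned long demo_sda_delay_field(unsigned long clkin_khz,
					  unsigned long delay_ns)
{
	unsigned long cycles;

	cycles = clkin_khz * delay_ns;                  /* kHz * ns = 1e-6 cycles */
	cycles = DEMO_DIV_ROUND_UP(cycles, 1000000UL);  /* whole clock cycles */
	cycles = DEMO_DIV_ROUND_UP(cycles, 5UL);        /* groups of five cycles */
	if (cycles > 3)
		cycles = 3;                             /* clamp as the hunk does */
	return cycles;
}
```

For example, 66000 kHz and 100 ns give 6600000, which rounds up to 7 cycles and then to a field value of 2.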
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c

index 6649176de940572a317b2744bd1d4e393c4bbd04..bea4c5021d26cb5b4c92e9b58cabeda2b882023c 100644 (file)
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -32,7 +32,6 @@
  #include <linux/init.h>
  #include <linux/idr.h>
  #include <linux/mutex.h>
-#include <linux/of_i2c.h>
  #include <linux/of_device.h>
  #include <linux/completion.h>
  #include <linux/hardirq.h>
@@ -197,11 +196,12 @@ static int i2c_device_pm_suspend(struct device *dev)
  {
         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
  
-       if (pm_runtime_suspended(dev))
-               return 0;
-
-       if (pm)
-               return pm->suspend ? pm->suspend(dev) : 0;
+       if (pm) {
+               if (pm_runtime_suspended(dev))
+                       return 0;
+               else
+                       return pm->suspend ? pm->suspend(dev) : 0;
+       }
  
         return i2c_legacy_suspend(dev, PMSG_SUSPEND);
  }
@@ -216,12 +216,6 @@ static int i2c_device_pm_resume(struct device *dev)
         else
                 ret = i2c_legacy_resume(dev);
  
-       if (!ret) {
-               pm_runtime_disable(dev);
-               pm_runtime_set_active(dev);
-               pm_runtime_enable(dev);
-       }
-
         return ret;
  }
  
@@ -229,11 +223,12 @@ static int i2c_device_pm_freeze(struct device *dev)
  {
         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
  
-       if (pm_runtime_suspended(dev))
-               return 0;
-
-       if (pm)
-               return pm->freeze ? pm->freeze(dev) : 0;
+       if (pm) {
+               if (pm_runtime_suspended(dev))
+                       return 0;
+               else
+                       return pm->freeze ? pm->freeze(dev) : 0;
+       }
  
         return i2c_legacy_suspend(dev, PMSG_FREEZE);
  }
@@ -242,11 +237,12 @@ static int i2c_device_pm_thaw(struct device *dev)
  {
         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
  
-       if (pm_runtime_suspended(dev))
-               return 0;
-
-       if (pm)
-               return pm->thaw ? pm->thaw(dev) : 0;
+       if (pm) {
+               if (pm_runtime_suspended(dev))
+                       return 0;
+               else
+                       return pm->thaw ? pm->thaw(dev) : 0;
+       }
  
         return i2c_legacy_resume(dev);
  }
@@ -255,11 +251,12 @@ static int i2c_device_pm_poweroff(struct device *dev)
  {
         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
  
-       if (pm_runtime_suspended(dev))
-               return 0;
-
-       if (pm)
-               return pm->poweroff ? pm->poweroff(dev) : 0;
+       if (pm) {
+               if (pm_runtime_suspended(dev))
+                       return 0;
+               else
+                       return pm->poweroff ? pm->poweroff(dev) : 0;
+       }
  
         return i2c_legacy_suspend(dev, PMSG_HIBERNATE);
  }
@@ -876,9 +873,6 @@ static int i2c_register_adapter(struct i2c_adapter *adap)
         if (adap->nr < __i2c_first_dynamic_bus_num)
                 i2c_scan_static_board_info(adap);
  
-       /* Register devices from the device tree */
-       of_i2c_register_devices(adap);
-
         /* Notify drivers */
         mutex_lock(&core_lock);
         bus_for_each_drv(&i2c_bus_type, NULL, adap, __process_new_adapter);
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c

old mode 100755 (executable)

new mode 100644 (file)

index a10152b..c37ef64
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -83,7 +83,7 @@ static unsigned int mwait_substates;
  /* Reliable LAPIC Timer States, bit 1 for C1 etc.  */
  static unsigned int lapic_timer_reliable_states;
  
-static struct cpuidle_device *intel_idle_cpuidle_devices;
+static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
  static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state);
  
  static struct cpuidle_state *cpuidle_state_table;
@@ -108,7 +108,7 @@ static struct cpuidle_state nehalem_cstates[MWAIT_MAX_NUM_CSTATES] = {
                 .name = "NHM-C3",
                 .desc = "MWAIT 0x10",
                 .driver_data = (void *) 0x10,
-               .flags = CPUIDLE_FLAG_TIME_VALID,
+               .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
                 .exit_latency = 20,
                 .power_usage = 500,
                 .target_residency = 80,
@@ -117,7 +117,7 @@ static struct cpuidle_state nehalem_cstates[MWAIT_MAX_NUM_CSTATES] = {
                 .name = "NHM-C6",
                 .desc = "MWAIT 0x20",
                 .driver_data = (void *) 0x20,
-               .flags = CPUIDLE_FLAG_TIME_VALID,
+               .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
                 .exit_latency = 200,
                 .power_usage = 350,
                 .target_residency = 800,
@@ -149,7 +149,7 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = {
                 .name = "ATM-C4",
                 .desc = "MWAIT 0x30",
                 .driver_data = (void *) 0x30,
-               .flags = CPUIDLE_FLAG_TIME_VALID,
+               .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
                 .exit_latency = 100,
                 .power_usage = 250,
                 .target_residency = 400,
@@ -157,13 +157,13 @@ static struct cpuidle_state atom_cstates[MWAIT_MAX_NUM_CSTATES] = {
         { /* MWAIT C5 */ },
         { /* MWAIT C6 */
                 .name = "ATM-C6",
-               .desc = "MWAIT 0x40",
-               .driver_data = (void *) 0x40,
-               .flags = CPUIDLE_FLAG_TIME_VALID,
-               .exit_latency = 200,
+               .desc = "MWAIT 0x52",
+               .driver_data = (void *) 0x52,
+               .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
+               .exit_latency = 140,
                 .power_usage = 150,
-               .target_residency = 800,
-               .enter = NULL },        /* disabled */
+               .target_residency = 560,
+               .enter = &intel_idle },
  };
  
  /**
@@ -185,6 +185,16 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
  
         local_irq_disable();
  
+       /*
+        * If the state flag indicates that the TLB will be flushed or if this
+        * is the deepest c-state supported, do a voluntary leave mm to avoid
+        * costly and mostly unnecessary wakeups for flushing the user TLB's
+        * associated with the active mm.
+        */
+       if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED ||
+           (&dev->states[dev->state_count - 1] == state))
+               leave_mm(cpu);
+
         if (!(lapic_timer_reliable_states & (1 << (cstate))))
                 clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
  
diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c

index c908c5f83645c901f87823e2a587ba65d9b5ee97..9ddafc30f432236539f9304468d77e2daf761f7e 100644 (file)
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -669,6 +669,9 @@ static long evdev_do_ioctl(struct file *file, unsigned int cmd,
  
                 if ((_IOC_NR(cmd) & ~ABS_MAX) == _IOC_NR(EVIOCGABS(0))) {
  
+                       if (!dev->absinfo)
+                               return -EINVAL;
+
                         t = _IOC_NR(cmd) & ABS_MAX;
                         abs = dev->absinfo[t];
  
@@ -680,10 +683,13 @@ static long evdev_do_ioctl(struct file *file, unsigned int cmd,
                 }
         }
  
-       if (_IOC_DIR(cmd) == _IOC_READ) {
+       if (_IOC_DIR(cmd) == _IOC_WRITE) {
  
                 if ((_IOC_NR(cmd) & ~ABS_MAX) == _IOC_NR(EVIOCSABS(0))) {
  
+                       if (!dev->absinfo)
+                               return -EINVAL;
+
                         t = _IOC_NR(cmd) & ABS_MAX;
  
                         if (copy_from_user(&abs, p, min_t(size_t,
diff --git a/drivers/input/joydev.c b/drivers/input/joydev.c

index d85bd8a7967d2ee26aff8e5313c67a0cd7290532..22239e9884988139df3228a50db421f337818355 100644 (file)
--- a/drivers/input/joydev.c
+++ b/drivers/input/joydev.c
@@ -483,6 +483,9 @@ static int joydev_handle_JSIOCSAXMAP(struct joydev *joydev,
  
         memcpy(joydev->abspam, abspam, len);
  
+       for (i = 0; i < joydev->nabs; i++)
+               joydev->absmap[joydev->abspam[i]] = i;
+
   out:
         kfree(abspam);
         return retval;
diff --git a/drivers/input/misc/uinput.c b/drivers/input/misc/uinput.c

index 0d4266a533a524564adcc85cc546736248753628..360698553eb55d2d1f1b84f912dbc6984eb606bd 100644 (file)
--- a/drivers/input/misc/uinput.c
+++ b/drivers/input/misc/uinput.c
@@ -404,6 +404,13 @@ static int uinput_setup_device(struct uinput_device *udev, const char __user *bu
                 retval = uinput_validate_absbits(dev);
                 if (retval < 0)
                         goto exit;
+               if (test_bit(ABS_MT_SLOT, dev->absbit)) {
+                       int nslot = input_abs_get_max(dev, ABS_MT_SLOT) + 1;
+                       input_mt_create_slots(dev, nslot);
+                       input_set_events_per_packet(dev, 6 * nslot);
+               } else if (test_bit(ABS_MT_POSITION_X, dev->absbit)) {
+                       input_set_events_per_packet(dev, 60);
+               }
         }
  
         udev->state = UIST_SETUP_COMPLETE;
diff --git a/drivers/input/tablet/wacom_sys.c b/drivers/input/tablet/wacom_sys.c

index 42ba3691d908bc1fc8c370da5ada4287ceb21115..b35876ee6908c7328f29e2505fd29a94a61b2d61 100644 (file)
--- a/drivers/input/tablet/wacom_sys.c
+++ b/drivers/input/tablet/wacom_sys.c
@@ -103,27 +103,26 @@ static void wacom_sys_irq(struct urb *urb)
  static int wacom_open(struct input_dev *dev)
  {
         struct wacom *wacom = input_get_drvdata(dev);
+       int retval = 0;
  
-       mutex_lock(&wacom->lock);
-
-       wacom->irq->dev = wacom->usbdev;
-
-       if (usb_autopm_get_interface(wacom->intf) < 0) {
-               mutex_unlock(&wacom->lock);
+       if (usb_autopm_get_interface(wacom->intf) < 0)
                 return -EIO;
-       }
+
+       mutex_lock(&wacom->lock);
  
         if (usb_submit_urb(wacom->irq, GFP_KERNEL)) {
-               usb_autopm_put_interface(wacom->intf);
-               mutex_unlock(&wacom->lock);
-               return -EIO;
+               retval = -EIO;
+               goto out;
         }
  
         wacom->open = true;
         wacom->intf->needs_remote_wakeup = 1;
  
+out:
         mutex_unlock(&wacom->lock);
-       return 0;
+       if (retval)
+               usb_autopm_put_interface(wacom->intf);
+       return retval;
  }
  
  static void wacom_close(struct input_dev *dev)
@@ -135,6 +134,8 @@ static void wacom_close(struct input_dev *dev)
         wacom->open = false;
         wacom->intf->needs_remote_wakeup = 0;
         mutex_unlock(&wacom->lock);
+
+       usb_autopm_put_interface(wacom->intf);
  }
  
  static int wacom_parse_hid(struct usb_interface *intf, struct hid_descriptor *hid_desc,
diff --git a/drivers/input/tablet/wacom_wac.c b/drivers/input/tablet/wacom_wac.c

index 6e29badb969e44192be85080caf801851c3d06fa..47fd7a041c52e1898a8727c5386569fb3caed45c 100644 (file)
--- a/drivers/input/tablet/wacom_wac.c
+++ b/drivers/input/tablet/wacom_wac.c
@@ -442,8 +442,10 @@ static void wacom_intuos_general(struct wacom_wac *wacom)
         /* general pen packet */
         if ((data[1] & 0xb8) == 0xa0) {
                 t = (data[6] << 2) | ((data[7] >> 6) & 3);
-               if (features->type >= INTUOS4S && features->type <= INTUOS4L)
+               if ((features->type >= INTUOS4S && features->type <= INTUOS4L) ||
+                   features->type == WACOM_21UX2) {
                         t = (t << 1) | (data[1] & 1);
+               }
                 input_report_abs(input, ABS_PRESSURE, t);
                 input_report_abs(input, ABS_TILT_X,
                                 ((data[7] << 1) & 0x7e) | (data[8] >> 7));
diff --git a/drivers/isdn/sc/interrupt.c b/drivers/isdn/sc/interrupt.c

index 485be8b1e1b33bef2bd7e3af8fab2066d3687ec3..f0225bc0f2670ce2f0fde9b5d276f26151c6a046 100644 (file)
--- a/drivers/isdn/sc/interrupt.c
+++ b/drivers/isdn/sc/interrupt.c
@@ -112,11 +112,19 @@ irqreturn_t interrupt_handler(int dummy, void *card_inst)
                         }
                         else if(callid>=0x0000 && callid<=0x7FFF)
                         {
+                               int len;
+
                                 pr_debug("%s: Got Incoming Call\n",
                                                 sc_adapter[card]->devicename);
-                               strcpy(setup.phone,&(rcvmsg.msg_data.byte_array[4]));
-                               strcpy(setup.eazmsn,
-                                       sc_adapter[card]->channel[rcvmsg.phy_link_no-1].dn);
+                               len = strlcpy(setup.phone, &(rcvmsg.msg_data.byte_array[4]),
+                                               sizeof(setup.phone));
+                               if (len >= sizeof(setup.phone))
+                                       continue;
+                               len = strlcpy(setup.eazmsn,
+                                               sc_adapter[card]->channel[rcvmsg.phy_link_no - 1].dn,
+                                               sizeof(setup.eazmsn));
+                               if (len >= sizeof(setup.eazmsn))
+                                       continue;
                                 setup.si1 = 7;
                                 setup.si2 = 0;
                                 setup.plan = 0;
@@ -176,7 +184,9 @@ irqreturn_t interrupt_handler(int dummy, void *card_inst)
                  * Handle a GetMyNumber Rsp
                  */
                 if (IS_CE_MESSAGE(rcvmsg,Call,0,GetMyNumber)){
-                       strcpy(sc_adapter[card]->channel[rcvmsg.phy_link_no-1].dn,rcvmsg.msg_data.byte_array);
+                       strlcpy(sc_adapter[card]->channel[rcvmsg.phy_link_no - 1].dn,
+                               rcvmsg.msg_data.byte_array,
+                               sizeof(rcvmsg.msg_data.byte_array));
                         continue;
                 }
                         
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c

index ed4900ade93a4d80b84784e76aafc406689020b2..e4fb58db5454d4bfc5cc5257bb74435907b3b3f2 100644 (file)
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1000,10 +1000,11 @@ static int bitmap_init_from_disk(struct bitmap *bitmap, sector_t start)
                                 page = bitmap->sb_page;
                                 offset = sizeof(bitmap_super_t);
                                 if (!file)
-                                       read_sb_page(bitmap->mddev,
-                                                    bitmap->mddev->bitmap_info.offset,
-                                                    page,
-                                                    index, count);
+                                       page = read_sb_page(
+                                               bitmap->mddev,
+                                               bitmap->mddev->bitmap_info.offset,
+                                               page,
+                                               index, count);
                         } else if (file) {
                                 page = read_page(file, index, bitmap, count);
                                 offset = 0;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c

index ad83a4dcadc3ed7cafa914d2e4dcb7ef1a939fdf..0b830bbe1d8b6323bac02106c2ccc4d806e1e390 100644 (file)
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1839,7 +1839,9 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
  
                 /* take from bio_init */
                 bio->bi_next = NULL;
+               bio->bi_flags &= ~(BIO_POOL_MASK-1);
                 bio->bi_flags |= 1 << BIO_UPTODATE;
+               bio->bi_comp_cpu = -1;
                 bio->bi_rw = READ;
                 bio->bi_vcnt = 0;
                 bio->bi_idx = 0;
@@ -1912,7 +1914,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
                             !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
                                 break;
                         BUG_ON(sync_blocks < (PAGE_SIZE>>9));
-                       if (len > (sync_blocks<<9))
+                       if ((len >> 9) > sync_blocks)
                                 len = sync_blocks<<9;
                 }
  
diff --git a/drivers/media/IR/ir-keytable.c b/drivers/media/IR/ir-keytable.c

index 7e82a9df726b51ab6d90f00fc1b1fa22998c8862..7961d59f5cace91b18fc6a67fef4caea09f08265 100644 (file)
--- a/drivers/media/IR/ir-keytable.c
+++ b/drivers/media/IR/ir-keytable.c
@@ -319,7 +319,7 @@ static void ir_timer_keyup(unsigned long cookie)
          * a keyup event might follow immediately after the keydown.
          */
         spin_lock_irqsave(&ir->keylock, flags);
-       if (time_is_after_eq_jiffies(ir->keyup_jiffies))
+       if (time_is_before_eq_jiffies(ir->keyup_jiffies))
                 ir_keyup(ir);
         spin_unlock_irqrestore(&ir->keylock, flags);
  }
@@ -510,6 +510,13 @@ int __ir_input_register(struct input_dev *input_dev,
                    (ir_dev->props && ir_dev->props->driver_type == RC_DRIVER_IR_RAW) ?
                         " in raw mode" : "");
  
+       /*
+        * Default delay of 250ms is too short for some protocols, especially
+        * since the timeout is currently set to 250ms. Increase it to 500ms,
+        * to avoid wrong repetition of the keycodes.
+        */
+       input_dev->rep[REP_DELAY] = 500;
+
         return 0;
  
  out_event:
diff --git a/drivers/media/IR/ir-lirc-codec.c b/drivers/media/IR/ir-lirc-codec.c

index 77b5946413c0203739d9bdcecb129e218f194356..e63f757d5d72ca0ba6c1b4e206914c7d7625112a 100644 (file)
--- a/drivers/media/IR/ir-lirc-codec.c
+++ b/drivers/media/IR/ir-lirc-codec.c
@@ -267,7 +267,7 @@ static int ir_lirc_register(struct input_dev *input_dev)
                         features |= LIRC_CAN_SET_SEND_CARRIER;
  
                 if (ir_dev->props->s_tx_duty_cycle)
-                       features |= LIRC_CAN_SET_REC_DUTY_CYCLE;
+                       features |= LIRC_CAN_SET_SEND_DUTY_CYCLE;
         }
  
         if (ir_dev->props->s_rx_carrier_range)
diff --git a/drivers/media/IR/ir-raw-event.c b/drivers/media/IR/ir-raw-event.c

index 43094e7eccfa92ba6213115ecdab06d88b1387ec..8e0e1b1f8c87ef9f83f05ab6a564e30ddd78aa4c 100644 (file)
--- a/drivers/media/IR/ir-raw-event.c
+++ b/drivers/media/IR/ir-raw-event.c
@@ -279,9 +279,11 @@ int ir_raw_event_register(struct input_dev *input_dev)
                         "rc%u",  (unsigned int)ir->devno);
  
         if (IS_ERR(ir->raw->thread)) {
+               int ret = PTR_ERR(ir->raw->thread);
+
                 kfree(ir->raw);
                 ir->raw = NULL;
-               return PTR_ERR(ir->raw->thread);
+               return ret;
         }
  
         mutex_lock(&ir_raw_handler_lock);
diff --git a/drivers/media/IR/ir-sysfs.c b/drivers/media/IR/ir-sysfs.c

index 96dafc425c8e61495cd662bf7f4c11182d674e79..46d42467f9b43010739895f165ed7bfe93793137 100644 (file)
--- a/drivers/media/IR/ir-sysfs.c
+++ b/drivers/media/IR/ir-sysfs.c
@@ -67,13 +67,14 @@ static ssize_t show_protocols(struct device *d,
         char *tmp = buf;
         int i;
  
-       if (ir_dev->props->driver_type == RC_DRIVER_SCANCODE) {
+       if (ir_dev->props && ir_dev->props->driver_type == RC_DRIVER_SCANCODE) {
                 enabled = ir_dev->rc_tab.ir_type;
                 allowed = ir_dev->props->allowed_protos;
-       } else {
+       } else if (ir_dev->raw) {
                 enabled = ir_dev->raw->enabled_protocols;
                 allowed = ir_raw_get_allowed_protocols();
-       }
+       } else
+               return sprintf(tmp, "[builtin]\n");
  
         IR_dprintk(1, "allowed - 0x%llx, enabled - 0x%llx\n",
                    (long long)allowed,
@@ -121,10 +122,14 @@ static ssize_t store_protocols(struct device *d,
         int rc, i, count = 0;
         unsigned long flags;
  
-       if (ir_dev->props->driver_type == RC_DRIVER_SCANCODE)
+       if (ir_dev->props && ir_dev->props->driver_type == RC_DRIVER_SCANCODE)
                 type = ir_dev->rc_tab.ir_type;
-       else
+       else if (ir_dev->raw)
                 type = ir_dev->raw->enabled_protocols;
+       else {
+               IR_dprintk(1, "Protocol switching not supported\n");
+               return -EINVAL;
+       }
  
         while ((tmp = strsep((char **) &data, " \n")) != NULL) {
                 if (!*tmp)
@@ -185,7 +190,7 @@ static ssize_t store_protocols(struct device *d,
                 }
         }
  
-       if (ir_dev->props->driver_type == RC_DRIVER_SCANCODE) {
+       if (ir_dev->props && ir_dev->props->driver_type == RC_DRIVER_SCANCODE) {
                 spin_lock_irqsave(&ir_dev->rc_tab.lock, flags);
                 ir_dev->rc_tab.ir_type = type;
                 spin_unlock_irqrestore(&ir_dev->rc_tab.lock, flags);
diff --git a/drivers/media/IR/keymaps/rc-rc6-mce.c b/drivers/media/IR/keymaps/rc-rc6-mce.c

index 64264f7f838f29a0be7861f872f8e852bf376669..39557ad401b63fce5a892e5245fe7d6f42b83ee1 100644 (file)
--- a/drivers/media/IR/keymaps/rc-rc6-mce.c
+++ b/drivers/media/IR/keymaps/rc-rc6-mce.c
@@ -19,6 +19,7 @@ static struct ir_scancode rc6_mce[] = {
  
         { 0x800f0416, KEY_PLAY },
         { 0x800f0418, KEY_PAUSE },
+       { 0x800f046e, KEY_PLAYPAUSE },
         { 0x800f0419, KEY_STOP },
         { 0x800f0417, KEY_RECORD },
  
@@ -37,6 +38,8 @@ static struct ir_scancode rc6_mce[] = {
         { 0x800f0411, KEY_VOLUMEDOWN },
         { 0x800f0412, KEY_CHANNELUP },
         { 0x800f0413, KEY_CHANNELDOWN },
+       { 0x800f043a, KEY_BRIGHTNESSUP },
+       { 0x800f0480, KEY_BRIGHTNESSDOWN },
  
         { 0x800f0401, KEY_NUMERIC_1 },
         { 0x800f0402, KEY_NUMERIC_2 },
diff --git a/drivers/media/IR/mceusb.c b/drivers/media/IR/mceusb.c

index ac6bb2c01a4810446451d2651df53936b57d104c..bc620e10ef77e46149f57bdf278cc8bf79d150b3 100644 (file)
--- a/drivers/media/IR/mceusb.c
+++ b/drivers/media/IR/mceusb.c
@@ -120,6 +120,10 @@ static struct usb_device_id mceusb_dev_table[] = {
         { USB_DEVICE(VENDOR_PHILIPS, 0x0613) },
         /* Philips eHome Infrared Transceiver */
         { USB_DEVICE(VENDOR_PHILIPS, 0x0815) },
+       /* Philips/Spinel plus IR transceiver for ASUS */
+       { USB_DEVICE(VENDOR_PHILIPS, 0x206c) },
+       /* Philips/Spinel plus IR transceiver for ASUS */
+       { USB_DEVICE(VENDOR_PHILIPS, 0x2088) },
         /* Realtek MCE IR Receiver */
         { USB_DEVICE(VENDOR_REALTEK, 0x0161) },
         /* SMK/Toshiba G83C0004D410 */
diff --git a/drivers/media/dvb/dvb-usb/dib0700_core.c b/drivers/media/dvb/dvb-usb/dib0700_core.c

index fe818348b8a36450357f2b19571b000afdcd873f..48397f103d326264b261506539bc1c4cee9a343d 100644 (file)
--- a/drivers/media/dvb/dvb-usb/dib0700_core.c
+++ b/drivers/media/dvb/dvb-usb/dib0700_core.c
@@ -673,9 +673,6 @@ static int dib0700_probe(struct usb_interface *intf,
                         else
                                 dev->props.rc.core.bulk_mode = false;
  
-                       /* Need a higher delay, to avoid wrong repeat */
-                       dev->rc_input_dev->rep[REP_DELAY] = 500;
-
                         dib0700_rc_setup(dev);
  
                         return 0;
diff --git a/drivers/media/dvb/dvb-usb/dib0700_devices.c b/drivers/media/dvb/dvb-usb/dib0700_devices.c

index f634d2e784b2ce24a005b0593372b505b903e1c3..e06acd1fecb61b9f3bf2676f0a16375f9101e907 100644 (file)
--- a/drivers/media/dvb/dvb-usb/dib0700_devices.c
+++ b/drivers/media/dvb/dvb-usb/dib0700_devices.c
@@ -940,6 +940,58 @@ static int stk7070p_frontend_attach(struct dvb_usb_adapter *adap)
         return adap->fe == NULL ? -ENODEV : 0;
  }
  
+/* STK7770P */
+static struct dib7000p_config dib7770p_dib7000p_config = {
+       .output_mpeg2_in_188_bytes = 1,
+
+       .agc_config_count = 1,
+       .agc = &dib7070_agc_config,
+       .bw  = &dib7070_bw_config_12_mhz,
+       .tuner_is_baseband = 1,
+       .spur_protect = 1,
+
+       .gpio_dir = DIB7000P_GPIO_DEFAULT_DIRECTIONS,
+       .gpio_val = DIB7000P_GPIO_DEFAULT_VALUES,
+       .gpio_pwm_pos = DIB7000P_GPIO_DEFAULT_PWM_POS,
+
+       .hostbus_diversity = 1,
+       .enable_current_mirror = 1,
+       .disable_sample_and_hold = 0,
+};
+
+static int stk7770p_frontend_attach(struct dvb_usb_adapter *adap)
+{
+       struct usb_device_descriptor *p = &adap->dev->udev->descriptor;
+       if (p->idVendor  == cpu_to_le16(USB_VID_PINNACLE) &&
+           p->idProduct == cpu_to_le16(USB_PID_PINNACLE_PCTV72E))
+               dib0700_set_gpio(adap->dev, GPIO6, GPIO_OUT, 0);
+       else
+               dib0700_set_gpio(adap->dev, GPIO6, GPIO_OUT, 1);
+       msleep(10);
+       dib0700_set_gpio(adap->dev, GPIO9, GPIO_OUT, 1);
+       dib0700_set_gpio(adap->dev, GPIO4, GPIO_OUT, 1);
+       dib0700_set_gpio(adap->dev, GPIO7, GPIO_OUT, 1);
+       dib0700_set_gpio(adap->dev, GPIO10, GPIO_OUT, 0);
+
+       dib0700_ctrl_clock(adap->dev, 72, 1);
+
+       msleep(10);
+       dib0700_set_gpio(adap->dev, GPIO10, GPIO_OUT, 1);
+       msleep(10);
+       dib0700_set_gpio(adap->dev, GPIO0, GPIO_OUT, 1);
+
+       if (dib7000p_i2c_enumeration(&adap->dev->i2c_adap, 1, 18,
+                                    &dib7770p_dib7000p_config) != 0) {
+               err("%s: dib7000p_i2c_enumeration failed.  Cannot continue\n",
+                   __func__);
+               return -ENODEV;
+       }
+
+       adap->fe = dvb_attach(dib7000p_attach, &adap->dev->i2c_adap, 0x80,
+               &dib7770p_dib7000p_config);
+       return adap->fe == NULL ? -ENODEV : 0;
+}
+
  /* DIB807x generic */
  static struct dibx000_agc_config dib807x_agc_config[2] = {
         {
@@ -1781,7 +1833,7 @@ struct usb_device_id dib0700_usb_id_table[] = {
  /* 60 */{ USB_DEVICE(USB_VID_TERRATEC, USB_PID_TERRATEC_CINERGY_T_XXS_2) },
         { USB_DEVICE(USB_VID_DIBCOM,    USB_PID_DIBCOM_STK807XPVR) },
         { USB_DEVICE(USB_VID_DIBCOM,    USB_PID_DIBCOM_STK807XP) },
-       { USB_DEVICE(USB_VID_PIXELVIEW, USB_PID_PIXELVIEW_SBTVD) },
+       { USB_DEVICE_VER(USB_VID_PIXELVIEW, USB_PID_PIXELVIEW_SBTVD, 0x000, 0x3f00) },
         { USB_DEVICE(USB_VID_EVOLUTEPC, USB_PID_TVWAY_PLUS) },
  /* 65 */{ USB_DEVICE(USB_VID_PINNACLE, USB_PID_PINNACLE_PCTV73ESE) },
         { USB_DEVICE(USB_VID_PINNACLE,  USB_PID_PINNACLE_PCTV282E) },
@@ -2406,7 +2458,7 @@ struct dvb_usb_device_properties dib0700_devices[] = {
                                 .pid_filter_count = 32,
                                 .pid_filter       = stk70x0p_pid_filter,
                                 .pid_filter_ctrl  = stk70x0p_pid_filter_ctrl,
-                               .frontend_attach  = stk7070p_frontend_attach,
+                               .frontend_attach  = stk7770p_frontend_attach,
                                 .tuner_attach     = dib7770p_tuner_attach,
  
                                 DIB0700_DEFAULT_STREAMING_CONFIG(0x02),
diff --git a/drivers/media/dvb/dvb-usb/opera1.c b/drivers/media/dvb/dvb-usb/opera1.c

index 6b22ec64ab0cc69d7124bc16421309d3e9c070cd..f896337b453518e603419beedd5093a9a5f1fb16 100644 (file)
--- a/drivers/media/dvb/dvb-usb/opera1.c
+++ b/drivers/media/dvb/dvb-usb/opera1.c
@@ -483,9 +483,7 @@ static int opera1_xilinx_load_firmware(struct usb_device *dev,
                 }
         }
         kfree(p);
-       if (fw) {
-               release_firmware(fw);
-       }
+       release_firmware(fw);
         return ret;
  }
  
diff --git a/drivers/media/dvb/frontends/dib7000p.c b/drivers/media/dvb/frontends/dib7000p.c

index 2e28b973dfd3cbb1743ac48d3fb931c9d2600063..3aed0d43392152688bbe4176ebc8ee68712ab724 100644 (file)
--- a/drivers/media/dvb/frontends/dib7000p.c
+++ b/drivers/media/dvb/frontends/dib7000p.c
@@ -260,6 +260,9 @@ static void dib7000p_set_adc_state(struct dib7000p_state *state, enum dibx000_ad
  
  //     dprintk( "908: %x, 909: %x\n", reg_908, reg_909);
  
+       reg_909 |= (state->cfg.disable_sample_and_hold & 1) << 4;
+       reg_908 |= (state->cfg.enable_current_mirror & 1) << 7;
+
         dib7000p_write_word(state, 908, reg_908);
         dib7000p_write_word(state, 909, reg_909);
  }
@@ -778,7 +781,10 @@ static void dib7000p_set_channel(struct dib7000p_state *state, struct dvb_fronte
                 default:
                 case GUARD_INTERVAL_1_32: value *= 1; break;
         }
-       state->div_sync_wait = (value * 3) / 2 + 32; // add 50% SFN margin + compensate for one DVSY-fifo TODO
+       if (state->cfg.diversity_delay == 0)
+               state->div_sync_wait = (value * 3) / 2 + 48; // add 50% SFN margin + compensate for one DVSY-fifo
+       else
+               state->div_sync_wait = (value * 3) / 2 + state->cfg.diversity_delay; // add 50% SFN margin + compensate for one DVSY-fifo
  
         /* deactive the possibility of diversity reception if extended interleaver */
         state->div_force_off = !1 && ch->u.ofdm.transmission_mode != TRANSMISSION_MODE_8K;
diff --git a/drivers/media/dvb/frontends/dib7000p.h b/drivers/media/dvb/frontends/dib7000p.h

index 805dd13a97ee347d3b06e517e266d2bb1de9e4d3..da17345bf5bdd66002ea64080e2c4456bfde1ca3 100644 (file)
--- a/drivers/media/dvb/frontends/dib7000p.h
+++ b/drivers/media/dvb/frontends/dib7000p.h
@@ -33,6 +33,11 @@ struct dib7000p_config {
         int (*agc_control) (struct dvb_frontend *, u8 before);
  
         u8 output_mode;
+       u8 disable_sample_and_hold : 1;
+
+       u8 enable_current_mirror : 1;
+       u8 diversity_delay;
+
  };
  
  #define DEFAULT_DIB7000P_I2C_ADDRESS 18
diff --git a/drivers/media/dvb/siano/smscoreapi.c b/drivers/media/dvb/siano/smscoreapi.c

index d93468cd3a85e1a5a3d8eee586c7ec29b69d3051..ff3b0fa901b39f00e23250e20dda2502fbb42e04 100644 (file)
--- a/drivers/media/dvb/siano/smscoreapi.c
+++ b/drivers/media/dvb/siano/smscoreapi.c
@@ -1098,33 +1098,26 @@ EXPORT_SYMBOL_GPL(smscore_onresponse);
   *
   * @return pointer to descriptor on success, NULL on error.
   */
-struct smscore_buffer_t *smscore_getbuffer(struct smscore_device_t *coredev)
+
+struct smscore_buffer_t *get_entry(struct smscore_device_t *coredev)
  {
         struct smscore_buffer_t *cb = NULL;
         unsigned long flags;
  
-       DEFINE_WAIT(wait);
-
         spin_lock_irqsave(&coredev->bufferslock, flags);
-
-       /* This function must return a valid buffer, since the buffer list is
-        * finite, we check that there is an available buffer, if not, we wait
-        * until such buffer become available.
-        */
-
-       prepare_to_wait(&coredev->buffer_mng_waitq, &wait, TASK_INTERRUPTIBLE);
-       if (list_empty(&coredev->buffers)) {
-               spin_unlock_irqrestore(&coredev->bufferslock, flags);
-               schedule();
-               spin_lock_irqsave(&coredev->bufferslock, flags);
+       if (!list_empty(&coredev->buffers)) {
+               cb = (struct smscore_buffer_t *) coredev->buffers.next;
+               list_del(&cb->entry);
         }
+       spin_unlock_irqrestore(&coredev->bufferslock, flags);
+       return cb;
+}
  
-       finish_wait(&coredev->buffer_mng_waitq, &wait);
-
-       cb = (struct smscore_buffer_t *) coredev->buffers.next;
-       list_del(&cb->entry);
+struct smscore_buffer_t *smscore_getbuffer(struct smscore_device_t *coredev)
+{
+       struct smscore_buffer_t *cb = NULL;
  
-       spin_unlock_irqrestore(&coredev->bufferslock, flags);
+       wait_event(coredev->buffer_mng_waitq, (cb = get_entry(coredev)));
  
         return cb;
  }
diff --git a/drivers/media/radio/si470x/radio-si470x-i2c.c b/drivers/media/radio/si470x/radio-si470x-i2c.c

index 67a4ec8768a6145ecfd6fb1d44095bfb51f29fdc..4ce541a5eb47558f5b26dc9bd14523e937c5cfaf 100644 (file)
--- a/drivers/media/radio/si470x/radio-si470x-i2c.c
+++ b/drivers/media/radio/si470x/radio-si470x-i2c.c
@@ -395,7 +395,7 @@ static int __devinit si470x_i2c_probe(struct i2c_client *client,
         radio->registers[POWERCFG] = POWERCFG_ENABLE;
         if (si470x_set_register(radio, POWERCFG) < 0) {
                 retval = -EIO;
-               goto err_all;
+               goto err_video;
         }
         msleep(110);
  
diff --git a/drivers/media/video/cx231xx/Makefile b/drivers/media/video/cx231xx/Makefile

index 755dd0ce65ff724f2fd479f502e314e0cabf236e..6f2b57384488b3bb8124eed072217ef8ecf3194b 100644 (file)
--- a/drivers/media/video/cx231xx/Makefile
+++ b/drivers/media/video/cx231xx/Makefile
@@ -11,4 +11,5 @@ EXTRA_CFLAGS += -Idrivers/media/video
  EXTRA_CFLAGS += -Idrivers/media/common/tuners
  EXTRA_CFLAGS += -Idrivers/media/dvb/dvb-core
  EXTRA_CFLAGS += -Idrivers/media/dvb/frontends
+EXTRA_CFLAGS += -Idrivers/media/dvb/dvb-usb
  
diff --git a/drivers/media/video/cx231xx/cx231xx-cards.c b/drivers/media/video/cx231xx/cx231xx-cards.c

index 6bdc0ef18119716dadc5facc2347f22fda5c6192..f2a4900014bc5c1dace49615b123cb86184ba814 100644 (file)
--- a/drivers/media/video/cx231xx/cx231xx-cards.c
+++ b/drivers/media/video/cx231xx/cx231xx-cards.c
@@ -32,6 +32,7 @@
  #include <media/v4l2-chip-ident.h>
  
  #include <media/cx25840.h>
+#include "dvb-usb-ids.h"
  #include "xc5000.h"
  
  #include "cx231xx.h"
@@ -175,6 +176,8 @@ struct usb_device_id cx231xx_id_table[] = {
          .driver_info = CX231XX_BOARD_CNXT_RDE_250},
         {USB_DEVICE(0x0572, 0x58A1),
          .driver_info = CX231XX_BOARD_CNXT_RDU_250},
+       {USB_DEVICE_VER(USB_VID_PIXELVIEW, USB_PID_PIXELVIEW_SBTVD, 0x4000,0x4fff),
+        .driver_info = CX231XX_BOARD_UNKNOWN},
         {},
  };
  
@@ -226,14 +229,16 @@ void cx231xx_pre_card_setup(struct cx231xx *dev)
                      dev->board.name, dev->model);
  
         /* set the direction for GPIO pins */
-       cx231xx_set_gpio_direction(dev, dev->board.tuner_gpio->bit, 1);
-       cx231xx_set_gpio_value(dev, dev->board.tuner_gpio->bit, 1);
-       cx231xx_set_gpio_direction(dev, dev->board.tuner_sif_gpio, 1);
+       if (dev->board.tuner_gpio) {
+               cx231xx_set_gpio_direction(dev, dev->board.tuner_gpio->bit, 1);
+               cx231xx_set_gpio_value(dev, dev->board.tuner_gpio->bit, 1);
+               cx231xx_set_gpio_direction(dev, dev->board.tuner_sif_gpio, 1);
  
-       /* request some modules if any required */
+               /* request some modules if any required */
  
-       /* reset the Tuner */
-       cx231xx_gpio_set(dev, dev->board.tuner_gpio);
+               /* reset the Tuner */
+               cx231xx_gpio_set(dev, dev->board.tuner_gpio);
+       }
  
         /* set the mode to Analog mode initially */
         cx231xx_set_mode(dev, CX231XX_ANALOG_MODE);
diff --git a/drivers/media/video/cx25840/cx25840-core.c b/drivers/media/video/cx25840/cx25840-core.c
index 86ca8c2359dd8bb409cadd434dc951a17ff170a9..f5a3e74c3c7cc0e6e52ebec222b28604e7cf53ef 100644
--- a/drivers/media/video/cx25840/cx25840-core.c
+++ b/drivers/media/video/cx25840/cx25840-core.c
@@ -1996,7 +1996,7 @@ static int cx25840_probe(struct i2c_client *client,
  
                 state->volume = v4l2_ctrl_new_std(&state->hdl,
                         &cx25840_audio_ctrl_ops, V4L2_CID_AUDIO_VOLUME,
-                       0, 65335, 65535 / 100, default_volume);
+                       0, 65535, 65535 / 100, default_volume);
                 state->mute = v4l2_ctrl_new_std(&state->hdl,
                         &cx25840_audio_ctrl_ops, V4L2_CID_AUDIO_MUTE,
                         0, 1, 1, 0);
diff --git a/drivers/media/video/cx88/Kconfig b/drivers/media/video/cx88/Kconfig
index 99dbae1175919befc55ae7e2e00a3a524f666053..0fa85cbefbb12ffbe1e5729581ce6544dbadcfce 100644
--- a/drivers/media/video/cx88/Kconfig
+++ b/drivers/media/video/cx88/Kconfig
@@ -17,7 +17,7 @@ config VIDEO_CX88
  
  config VIDEO_CX88_ALSA
         tristate "Conexant 2388x DMA audio support"
-       depends on VIDEO_CX88 && SND && EXPERIMENTAL
+       depends on VIDEO_CX88 && SND
         select SND_PCM
         ---help---
           This is a video4linux driver for direct (DMA) audio on
diff --git a/drivers/media/video/gspca/gspca.c b/drivers/media/video/gspca/gspca.c
index b9846106913eb4871f429924092dd27563f0f115..78abc1c1f9d52766704af26c0ec3769cd08f9198 100644
--- a/drivers/media/video/gspca/gspca.c
+++ b/drivers/media/video/gspca/gspca.c
@@ -223,6 +223,7 @@ static int alloc_and_submit_int_urb(struct gspca_dev *gspca_dev,
                 usb_rcvintpipe(dev, ep->bEndpointAddress),
                 buffer, buffer_len,
                 int_irq, (void *)gspca_dev, interval);
+       urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
         gspca_dev->int_urb = urb;
         ret = usb_submit_urb(urb, GFP_KERNEL);
         if (ret < 0) {
diff --git a/drivers/media/video/gspca/sn9c20x.c b/drivers/media/video/gspca/sn9c20x.c
index 83a718f0f3f9841b8412c34967f41476040398e0..9052d5702556539fbf77dbd8cf1da66e5d87bdc4 100644
--- a/drivers/media/video/gspca/sn9c20x.c
+++ b/drivers/media/video/gspca/sn9c20x.c
@@ -2357,8 +2357,7 @@ static void sd_pkt_scan(struct gspca_dev *gspca_dev,
                             (data[33] << 10);
                 avg_lum >>= 9;
                 atomic_set(&sd->avg_lum, avg_lum);
-               gspca_frame_add(gspca_dev, LAST_PACKET,
-                               data, len);
+               gspca_frame_add(gspca_dev, LAST_PACKET, NULL, 0);
                 return;
         }
         if (gspca_dev->last_packet_type == LAST_PACKET) {
diff --git a/drivers/media/video/ivtv/ivtvfb.c b/drivers/media/video/ivtv/ivtvfb.c
index be03a712731c3b61a249d607927fb852f0517881..f0316d02f09f6df1297a6f3e5e6ed4e0c1b8ff1f 100644
--- a/drivers/media/video/ivtv/ivtvfb.c
+++ b/drivers/media/video/ivtv/ivtvfb.c
@@ -466,6 +466,8 @@ static int ivtvfb_ioctl(struct fb_info *info, unsigned int cmd, unsigned long ar
                         struct fb_vblank vblank;
                         u32 trace;
  
+                       memset(&vblank, 0, sizeof(struct fb_vblank));
+
                         vblank.flags = FB_VBLANK_HAVE_COUNT |FB_VBLANK_HAVE_VCOUNT |
                                         FB_VBLANK_HAVE_VSYNC;
                         trace = read_reg(IVTV_REG_DEC_LINE_FIELD) >> 16;
diff --git a/drivers/media/video/mem2mem_testdev.c b/drivers/media/video/mem2mem_testdev.c
index 4525335f9bd416388484cbc5872ea0db3c3af768..a7210d981388e8c4724f524e3fd5c77bbd672dca 100644
--- a/drivers/media/video/mem2mem_testdev.c
+++ b/drivers/media/video/mem2mem_testdev.c
@@ -239,7 +239,7 @@ static int device_process(struct m2mtest_ctx *ctx,
                 return -EFAULT;
         }
  
-       if (in_buf->vb.size < out_buf->vb.size) {
+       if (in_buf->vb.size > out_buf->vb.size) {
                 v4l2_err(&dev->v4l2_dev, "Output buffer is too small\n");
                 return -EINVAL;
         }
@@ -1014,6 +1014,7 @@ static int m2mtest_remove(struct platform_device *pdev)
         v4l2_m2m_release(dev->m2m_dev);
         del_timer_sync(&dev->timer);
         video_unregister_device(dev->vfd);
+       video_device_release(dev->vfd);
         v4l2_device_unregister(&dev->v4l2_dev);
         kfree(dev);
  
diff --git a/drivers/media/video/mt9m111.c b/drivers/media/video/mt9m111.c
index 758a4db27d65651481eec16b970f755a9036f622..c71af4e0e517f61631b1cc936021104dbcd30f92 100644
--- a/drivers/media/video/mt9m111.c
+++ b/drivers/media/video/mt9m111.c
@@ -447,6 +447,9 @@ static int mt9m111_s_crop(struct v4l2_subdev *sd, struct v4l2_crop *a)
         dev_dbg(&client->dev, "%s left=%d, top=%d, width=%d, height=%d\n",
                 __func__, rect.left, rect.top, rect.width, rect.height);
  
+       if (a->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+               return -EINVAL;
+
         ret = mt9m111_make_rect(client, &rect);
         if (!ret)
                 mt9m111->rect = rect;
@@ -466,12 +469,14 @@ static int mt9m111_g_crop(struct v4l2_subdev *sd, struct v4l2_crop *a)
  
  static int mt9m111_cropcap(struct v4l2_subdev *sd, struct v4l2_cropcap *a)
  {
+       if (a->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+               return -EINVAL;
+
         a->bounds.left                  = MT9M111_MIN_DARK_COLS;
         a->bounds.top                   = MT9M111_MIN_DARK_ROWS;
         a->bounds.width                 = MT9M111_MAX_WIDTH;
         a->bounds.height                = MT9M111_MAX_HEIGHT;
         a->defrect                      = a->bounds;
-       a->type                         = V4L2_BUF_TYPE_VIDEO_CAPTURE;
         a->pixelaspect.numerator        = 1;
         a->pixelaspect.denominator      = 1;
  
@@ -487,6 +492,7 @@ static int mt9m111_g_fmt(struct v4l2_subdev *sd,
         mf->width       = mt9m111->rect.width;
         mf->height      = mt9m111->rect.height;
         mf->code        = mt9m111->fmt->code;
+       mf->colorspace  = mt9m111->fmt->colorspace;
         mf->field       = V4L2_FIELD_NONE;
  
         return 0;
diff --git a/drivers/media/video/mt9v022.c b/drivers/media/video/mt9v022.c
index e7cd23cd63941ecb3380243f623960cf91c8ea31..b48473c7896b4b31d6d816ac544be70e6a1ac3c7 100644
--- a/drivers/media/video/mt9v022.c
+++ b/drivers/media/video/mt9v022.c
@@ -402,9 +402,6 @@ static int mt9v022_s_fmt(struct v4l2_subdev *sd,
                 if (mt9v022->model != V4L2_IDENT_MT9V022IX7ATC)
                         return -EINVAL;
                 break;
-       case 0:
-               /* No format change, only geometry */
-               break;
         default:
                 return -EINVAL;
         }
diff --git a/drivers/media/video/mx2_camera.c b/drivers/media/video/mx2_camera.c
index 66ff174151b5f3d909022fbc096f8c432e6f65f4..b6ea67221d1d5fc64594348f49715539c19f9840 100644
--- a/drivers/media/video/mx2_camera.c
+++ b/drivers/media/video/mx2_camera.c
@@ -378,6 +378,9 @@ static void mx25_camera_frame_done(struct mx2_camera_dev *pcdev, int fb,
  
         spin_lock_irqsave(&pcdev->lock, flags);
  
+       if (*fb_active == NULL)
+               goto out;
+
         vb = &(*fb_active)->vb;
         dev_dbg(pcdev->dev, "%s (vb=0x%p) 0x%08lx %d\n", __func__,
                 vb, vb->baddr, vb->bsize);
@@ -402,6 +405,7 @@ static void mx25_camera_frame_done(struct mx2_camera_dev *pcdev, int fb,
  
         *fb_active = buf;
  
+out:
         spin_unlock_irqrestore(&pcdev->lock, flags);
  }
  
diff --git a/drivers/media/video/pvrusb2/pvrusb2-ctrl.c b/drivers/media/video/pvrusb2/pvrusb2-ctrl.c
index 1b992b847198a486bc0eb62342d3d1b020c99919..55ea914c7fcd3e0a21a17671ef68294831b4a567 100644
--- a/drivers/media/video/pvrusb2/pvrusb2-ctrl.c
+++ b/drivers/media/video/pvrusb2/pvrusb2-ctrl.c
@@ -513,7 +513,7 @@ int pvr2_ctrl_sym_to_value(struct pvr2_ctrl *cptr,
                         if (ret >= 0) {
                                 ret = pvr2_ctrl_range_check(cptr,*valptr);
                         }
-                       if (maskptr) *maskptr = ~0;
+                       *maskptr = ~0;
                 } else if (cptr->info->type == pvr2_ctl_bool) {
                         ret = parse_token(ptr,len,valptr,boolNames,
                                           ARRAY_SIZE(boolNames));
@@ -522,7 +522,7 @@ int pvr2_ctrl_sym_to_value(struct pvr2_ctrl *cptr,
                         } else if (ret == 0) {
                                 *valptr = (*valptr & 1) ? !0 : 0;
                         }
-                       if (maskptr) *maskptr = 1;
+                       *maskptr = 1;
                 } else if (cptr->info->type == pvr2_ctl_enum) {
                         ret = parse_token(
                                 ptr,len,valptr,
@@ -531,7 +531,7 @@ int pvr2_ctrl_sym_to_value(struct pvr2_ctrl *cptr,
                         if (ret >= 0) {
                                 ret = pvr2_ctrl_range_check(cptr,*valptr);
                         }
-                       if (maskptr) *maskptr = ~0;
+                       *maskptr = ~0;
                 } else if (cptr->info->type == pvr2_ctl_bitmask) {
                         ret = parse_tlist(
                                 ptr,len,maskptr,valptr,
diff --git a/drivers/media/video/s5p-fimc/fimc-core.c b/drivers/media/video/s5p-fimc/fimc-core.c
index b151c7be8a506b18ae1ff544de0ba604f7e05e8f..6961c55baf9b1140609dd470b4abc3e05eed86c3 100644
--- a/drivers/media/video/s5p-fimc/fimc-core.c
+++ b/drivers/media/video/s5p-fimc/fimc-core.c
@@ -393,6 +393,37 @@ static void fimc_set_yuv_order(struct fimc_ctx *ctx)
         dbg("ctx->out_order_1p= %d", ctx->out_order_1p);
  }
  
+static void fimc_prepare_dma_offset(struct fimc_ctx *ctx, struct fimc_frame *f)
+{
+       struct samsung_fimc_variant *variant = ctx->fimc_dev->variant;
+
+       f->dma_offset.y_h = f->offs_h;
+       if (!variant->pix_hoff)
+               f->dma_offset.y_h *= (f->fmt->depth >> 3);
+
+       f->dma_offset.y_v = f->offs_v;
+
+       f->dma_offset.cb_h = f->offs_h;
+       f->dma_offset.cb_v = f->offs_v;
+
+       f->dma_offset.cr_h = f->offs_h;
+       f->dma_offset.cr_v = f->offs_v;
+
+       if (!variant->pix_hoff) {
+               if (f->fmt->planes_cnt == 3) {
+                       f->dma_offset.cb_h >>= 1;
+                       f->dma_offset.cr_h >>= 1;
+               }
+               if (f->fmt->color == S5P_FIMC_YCBCR420) {
+                       f->dma_offset.cb_v >>= 1;
+                       f->dma_offset.cr_v >>= 1;
+               }
+       }
+
+       dbg("in_offset: color= %d, y_h= %d, y_v= %d",
+           f->fmt->color, f->dma_offset.y_h, f->dma_offset.y_v);
+}
+
  /**
   * fimc_prepare_config - check dimensions, operation and color mode
   *                      and pre-calculate offset and the scaling coefficients.
@@ -406,7 +437,6 @@ static int fimc_prepare_config(struct fimc_ctx *ctx, u32 flags)
  {
         struct fimc_frame *s_frame, *d_frame;
         struct fimc_vid_buffer *buf = NULL;
-       struct samsung_fimc_variant *variant = ctx->fimc_dev->variant;
         int ret = 0;
  
         s_frame = &ctx->s_frame;
@@ -419,61 +449,16 @@ static int fimc_prepare_config(struct fimc_ctx *ctx, u32 flags)
                         swap(d_frame->width, d_frame->height);
                 }
  
-               /* Prepare the output offset ratios for scaler. */
-               d_frame->dma_offset.y_h = d_frame->offs_h;
-               if (!variant->pix_hoff)
-                       d_frame->dma_offset.y_h *= (d_frame->fmt->depth >> 3);
-
-               d_frame->dma_offset.y_v = d_frame->offs_v;
-
-               d_frame->dma_offset.cb_h = d_frame->offs_h;
-               d_frame->dma_offset.cb_v = d_frame->offs_v;
-
-               d_frame->dma_offset.cr_h = d_frame->offs_h;
-               d_frame->dma_offset.cr_v = d_frame->offs_v;
+               /* Prepare the DMA offset ratios for scaler. */
+               fimc_prepare_dma_offset(ctx, &ctx->s_frame);
+               fimc_prepare_dma_offset(ctx, &ctx->d_frame);
  
-               if (!variant->pix_hoff && d_frame->fmt->planes_cnt == 3) {
-                       d_frame->dma_offset.cb_h >>= 1;
-                       d_frame->dma_offset.cb_v >>= 1;
-                       d_frame->dma_offset.cr_h >>= 1;
-                       d_frame->dma_offset.cr_v >>= 1;
-               }
-
-               dbg("out offset: color= %d, y_h= %d, y_v= %d",
-                       d_frame->fmt->color,
-                       d_frame->dma_offset.y_h, d_frame->dma_offset.y_v);
-
-               /* Prepare the input offset ratios for scaler. */
-               s_frame->dma_offset.y_h = s_frame->offs_h;
-               if (!variant->pix_hoff)
-                       s_frame->dma_offset.y_h *= (s_frame->fmt->depth >> 3);
-               s_frame->dma_offset.y_v = s_frame->offs_v;
-
-               s_frame->dma_offset.cb_h = s_frame->offs_h;
-               s_frame->dma_offset.cb_v = s_frame->offs_v;
-
-               s_frame->dma_offset.cr_h = s_frame->offs_h;
-               s_frame->dma_offset.cr_v = s_frame->offs_v;
-
-               if (!variant->pix_hoff && s_frame->fmt->planes_cnt == 3) {
-                       s_frame->dma_offset.cb_h >>= 1;
-                       s_frame->dma_offset.cb_v >>= 1;
-                       s_frame->dma_offset.cr_h >>= 1;
-                       s_frame->dma_offset.cr_v >>= 1;
-               }
-
-               dbg("in offset: color= %d, y_h= %d, y_v= %d",
-                       s_frame->fmt->color, s_frame->dma_offset.y_h,
-                       s_frame->dma_offset.y_v);
-
-               fimc_set_yuv_order(ctx);
-
-               /* Check against the scaler ratio. */
                 if (s_frame->height > (SCALER_MAX_VRATIO * d_frame->height) ||
                     s_frame->width > (SCALER_MAX_HRATIO * d_frame->width)) {
                         err("out of scaler range");
                         return -EINVAL;
                 }
+               fimc_set_yuv_order(ctx);
         }
  
         /* Input DMA mode is not allowed when the scaler is disabled. */
@@ -822,7 +807,8 @@ static int fimc_m2m_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
         } else {
                 v4l2_err(&ctx->fimc_dev->m2m.v4l2_dev,
                          "Wrong buffer/video queue type (%d)\n", f->type);
-               return -EINVAL;
+               ret = -EINVAL;
+               goto s_fmt_out;
         }
  
         pix = &f->fmt.pix;
@@ -1414,8 +1400,10 @@ static int fimc_probe(struct platform_device *pdev)
         }
  
         fimc->work_queue = create_workqueue(dev_name(&fimc->pdev->dev));
-       if (!fimc->work_queue)
+       if (!fimc->work_queue) {
+               ret = -ENOMEM;
                 goto err_irq;
+       }
  
         ret = fimc_register_m2m_device(fimc);
         if (ret)
@@ -1492,6 +1480,7 @@ static struct samsung_fimc_variant fimc2_variant_s5p = {
  };
  
  static struct samsung_fimc_variant fimc01_variant_s5pv210 = {
+       .pix_hoff       = 1,
         .has_inp_rot    = 1,
         .has_out_rot    = 1,
         .min_inp_pixsize = 16,
@@ -1506,6 +1495,7 @@ static struct samsung_fimc_variant fimc01_variant_s5pv210 = {
  };
  
  static struct samsung_fimc_variant fimc2_variant_s5pv210 = {
+       .pix_hoff        = 1,
         .min_inp_pixsize = 16,
         .min_out_pixsize = 32,
  
diff --git a/drivers/media/video/saa7134/saa7134-cards.c b/drivers/media/video/saa7134/saa7134-cards.c
index ec697fcd406ede6b23c9691ceb5d0fe1fc0134a4..bb8d83d8ddafbd79dd892e971e6d8152f5483a37 100644
--- a/drivers/media/video/saa7134/saa7134-cards.c
+++ b/drivers/media/video/saa7134/saa7134-cards.c
@@ -4323,13 +4323,13 @@ struct saa7134_board saa7134_boards[] = {
         },
         [SAA7134_BOARD_BEHOLD_COLUMBUS_TVFM] = {
                 /*       Beholder Intl. Ltd. 2008      */
-               /*Dmitry Belimov <d.belimov@gmail.com> */
-               .name           = "Beholder BeholdTV Columbus TVFM",
+               /* Dmitry Belimov <d.belimov@gmail.com> */
+               .name           = "Beholder BeholdTV Columbus TV/FM",
                 .audio_clock    = 0x00187de7,
                 .tuner_type     = TUNER_ALPS_TSBE5_PAL,
-               .radio_type     = UNSET,
-               .tuner_addr     = ADDR_UNSET,
-               .radio_addr     = ADDR_UNSET,
+               .radio_type     = TUNER_TEA5767,
+               .tuner_addr     = 0xc2 >> 1,
+               .radio_addr     = 0xc0 >> 1,
                 .tda9887_conf   = TDA9887_PRESENT,
                 .gpiomask       = 0x000A8004,
                 .inputs         = {{
diff --git a/drivers/media/video/saa7164/saa7164-buffer.c b/drivers/media/video/saa7164/saa7164-buffer.c
index 5713f3a4b76c952bf9b1db333bb547caf3e40376..ddd25d32723dc0436f2477a641afaa0d75d9f2d3 100644
--- a/drivers/media/video/saa7164/saa7164-buffer.c
+++ b/drivers/media/video/saa7164/saa7164-buffer.c
@@ -136,10 +136,11 @@ ret:
  int saa7164_buffer_dealloc(struct saa7164_tsport *port,
         struct saa7164_buffer *buf)
  {
-       struct saa7164_dev *dev = port->dev;
+       struct saa7164_dev *dev;
  
-       if ((buf == 0) || (port == 0))
+       if (!buf || !port)
                 return SAA_ERR_BAD_PARAMETER;
+       dev = port->dev;
  
         dprintk(DBGLVL_BUF, "%s() deallocating buffer @ 0x%p\n", __func__, buf);
  
diff --git a/drivers/media/video/uvc/uvc_driver.c b/drivers/media/video/uvc/uvc_driver.c
index 8bdd940f32e689c5b51a94ec8975014567d9f00c..2ac85d8984f025cb0ec9933f87bf00e91bf656dd 100644
--- a/drivers/media/video/uvc/uvc_driver.c
+++ b/drivers/media/video/uvc/uvc_driver.c
@@ -486,6 +486,12 @@ static int uvc_parse_format(struct uvc_device *dev,
                             max(frame->dwFrameInterval[0],
                                 frame->dwDefaultFrameInterval));
  
+               if (dev->quirks & UVC_QUIRK_RESTRICT_FRAME_RATE) {
+                       frame->bFrameIntervalType = 1;
+                       frame->dwFrameInterval[0] =
+                               frame->dwDefaultFrameInterval;
+               }
+
                 uvc_trace(UVC_TRACE_DESCR, "- %ux%u (%u.%u fps)\n",
                         frame->wWidth, frame->wHeight,
                         10000000/frame->dwDefaultFrameInterval,
@@ -2026,6 +2032,15 @@ static struct usb_device_id uvc_ids[] = {
           .bInterfaceClass      = USB_CLASS_VENDOR_SPEC,
           .bInterfaceSubClass   = 1,
           .bInterfaceProtocol   = 0 },
+       /* Chicony CNF7129 (Asus EEE 100HE) */
+       { .match_flags          = USB_DEVICE_ID_MATCH_DEVICE
+                               | USB_DEVICE_ID_MATCH_INT_INFO,
+         .idVendor             = 0x04f2,
+         .idProduct            = 0xb071,
+         .bInterfaceClass      = USB_CLASS_VIDEO,
+         .bInterfaceSubClass   = 1,
+         .bInterfaceProtocol   = 0,
+         .driver_info          = UVC_QUIRK_RESTRICT_FRAME_RATE },
         /* Alcor Micro AU3820 (Future Boy PC USB Webcam) */
         { .match_flags          = USB_DEVICE_ID_MATCH_DEVICE
                                 | USB_DEVICE_ID_MATCH_INT_INFO,
@@ -2091,6 +2106,15 @@ static struct usb_device_id uvc_ids[] = {
           .bInterfaceProtocol   = 0,
           .driver_info          = UVC_QUIRK_PROBE_MINMAX
                                 | UVC_QUIRK_PROBE_DEF },
+       /* IMC Networks (Medion Akoya) */
+       { .match_flags          = USB_DEVICE_ID_MATCH_DEVICE
+                               | USB_DEVICE_ID_MATCH_INT_INFO,
+         .idVendor             = 0x13d3,
+         .idProduct            = 0x5103,
+         .bInterfaceClass      = USB_CLASS_VIDEO,
+         .bInterfaceSubClass   = 1,
+         .bInterfaceProtocol   = 0,
+         .driver_info          = UVC_QUIRK_STREAM_NO_FID },
         /* Syntek (HP Spartan) */
         { .match_flags          = USB_DEVICE_ID_MATCH_DEVICE
                                 | USB_DEVICE_ID_MATCH_INT_INFO,
diff --git a/drivers/media/video/uvc/uvcvideo.h b/drivers/media/video/uvc/uvcvideo.h
index bdacf3beabf54fcbe1f9f901692a0134e6b48ed1..892e0e51916c31853d9e8fa681ecce27e75edc53 100644
--- a/drivers/media/video/uvc/uvcvideo.h
+++ b/drivers/media/video/uvc/uvcvideo.h
@@ -182,6 +182,7 @@ struct uvc_xu_control {
  #define UVC_QUIRK_IGNORE_SELECTOR_UNIT 0x00000020
  #define UVC_QUIRK_FIX_BANDWIDTH                0x00000080
  #define UVC_QUIRK_PROBE_DEF            0x00000100
+#define UVC_QUIRK_RESTRICT_FRAME_RATE  0x00000200
  
  /* Format flags */
  #define UVC_FMT_FLAG_COMPRESSED                0x00000001
diff --git a/drivers/media/video/v4l2-compat-ioctl32.c b/drivers/media/video/v4l2-compat-ioctl32.c
index 073f01390cdd0a00de7e34dc7cd30360b72912c4..86294ed35c9b643cc7bab7411904476d0f7467a7 100644
--- a/drivers/media/video/v4l2-compat-ioctl32.c
+++ b/drivers/media/video/v4l2-compat-ioctl32.c
@@ -193,17 +193,24 @@ static int put_video_window32(struct video_window *kp, struct video_window32 __u
  struct video_code32 {
         char            loadwhat[16];   /* name or tag of file being passed */
         compat_int_t    datasize;
-       unsigned char   *data;
+       compat_uptr_t   data;
  };
  
-static int get_microcode32(struct video_code *kp, struct video_code32 __user *up)
+static struct video_code __user *get_microcode32(struct video_code32 *kp)
  {
-       if (!access_ok(VERIFY_READ, up, sizeof(struct video_code32)) ||
-               copy_from_user(kp->loadwhat, up->loadwhat, sizeof(up->loadwhat)) ||
-               get_user(kp->datasize, &up->datasize) ||
-               copy_from_user(kp->data, up->data, up->datasize))
-                       return -EFAULT;
-       return 0;
+       struct video_code __user *up;
+
+       up = compat_alloc_user_space(sizeof(*up));
+
+       /*
+        * NOTE! We don't actually care if these fail. If the
+        * user address is invalid, the native ioctl will do
+        * the error handling for us
+        */
+       (void) copy_to_user(up->loadwhat, kp->loadwhat, sizeof(up->loadwhat));
+       (void) put_user(kp->datasize, &up->datasize);
+       (void) put_user(compat_ptr(kp->data), &up->data);
+       return up;
  }
  
  #define VIDIOCGTUNER32         _IOWR('v', 4, struct video_tuner32)
@@ -739,7 +746,7 @@ static long do_video_ioctl(struct file *file, unsigned int cmd, unsigned long ar
                 struct video_tuner vt;
                 struct video_buffer vb;
                 struct video_window vw;
-               struct video_code vc;
+               struct video_code32 vc;
                 struct video_audio va;
  #endif
                 struct v4l2_format v2f;
@@ -818,8 +825,11 @@ static long do_video_ioctl(struct file *file, unsigned int cmd, unsigned long ar
                 break;
  
         case VIDIOCSMICROCODE:
-               err = get_microcode32(&karg.vc, up);
-               compatible_arg = 0;
+               /* Copy the 32-bit "video_code32" to kernel space */
+               if (copy_from_user(&karg.vc, up, sizeof(karg.vc)))
+                       return -EFAULT;
+               /* Convert the 32-bit version to a 64-bit version in user space */
+               up = get_microcode32(&karg.vc);
                 break;
  
         case VIDIOCSFREQ:
diff --git a/drivers/media/video/videobuf-dma-contig.c b/drivers/media/video/videobuf-dma-contig.c
index 372b87efcd0538ec6c91c1bb45749177294f7442..6ff9e4bac3ea14fd6248bf07e1dfc6109fe43d27 100644
--- a/drivers/media/video/videobuf-dma-contig.c
+++ b/drivers/media/video/videobuf-dma-contig.c
@@ -393,8 +393,10 @@ void videobuf_dma_contig_free(struct videobuf_queue *q,
         }
  
         /* read() method */
-       dma_free_coherent(q->dev, mem->size, mem->vaddr, mem->dma_handle);
-       mem->vaddr = NULL;
+       if (mem->vaddr) {
+               dma_free_coherent(q->dev, mem->size, mem->vaddr, mem->dma_handle);
+               mem->vaddr = NULL;
+       }
  }
  EXPORT_SYMBOL_GPL(videobuf_dma_contig_free);
  
diff --git a/drivers/media/video/videobuf-dma-sg.c b/drivers/media/video/videobuf-dma-sg.c
index 06f9a9c2a39add9256a58850d0cf5c4b0c52bf5b..2ad0bc252b0eaed1612ddaac477d21f6d7033b0c 100644
--- a/drivers/media/video/videobuf-dma-sg.c
+++ b/drivers/media/video/videobuf-dma-sg.c
@@ -94,7 +94,7 @@ err:
   * must free the memory.
   */
  static struct scatterlist *videobuf_pages_to_sg(struct page **pages,
-                                               int nr_pages, int offset)
+                                       int nr_pages, int offset, size_t size)
  {
         struct scatterlist *sglist;
         int i;
@@ -110,12 +110,14 @@ static struct scatterlist *videobuf_pages_to_sg(struct page **pages,
                 /* DMA to highmem pages might not work */
                 goto highmem;
         sg_set_page(&sglist[0], pages[0], PAGE_SIZE - offset, offset);
+       size -= PAGE_SIZE - offset;
         for (i = 1; i < nr_pages; i++) {
                 if (NULL == pages[i])
                         goto nopage;
                 if (PageHighMem(pages[i]))
                         goto highmem;
-               sg_set_page(&sglist[i], pages[i], PAGE_SIZE, 0);
+               sg_set_page(&sglist[i], pages[i], min(PAGE_SIZE, size), 0);
+               size -= min(PAGE_SIZE, size);
         }
         return sglist;
  
@@ -170,7 +172,8 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
  
         first = (data          & PAGE_MASK) >> PAGE_SHIFT;
         last  = ((data+size-1) & PAGE_MASK) >> PAGE_SHIFT;
-       dma->offset   = data & ~PAGE_MASK;
+       dma->offset = data & ~PAGE_MASK;
+       dma->size = size;
         dma->nr_pages = last-first+1;
         dma->pages = kmalloc(dma->nr_pages * sizeof(struct page *), GFP_KERNEL);
         if (NULL == dma->pages)
@@ -252,7 +255,7 @@ int videobuf_dma_map(struct device *dev, struct videobuf_dmabuf *dma)
  
         if (dma->pages) {
                 dma->sglist = videobuf_pages_to_sg(dma->pages, dma->nr_pages,
-                                                  dma->offset);
+                                                  dma->offset, dma->size);
         }
         if (dma->vaddr) {
                 dma->sglist = videobuf_vmalloc_to_sg(dma->vaddr,
diff --git a/drivers/mfd/max8925-core.c b/drivers/mfd/max8925-core.c
index 04028a9ee082735278557b606840619ade7e29a9..428377a5a6f56fe94ca033730a49102f19164dd2 100644
--- a/drivers/mfd/max8925-core.c
+++ b/drivers/mfd/max8925-core.c
@@ -429,24 +429,25 @@ static void max8925_irq_sync_unlock(unsigned int irq)
         irq_tsc = cache_tsc;
         for (i = 0; i < ARRAY_SIZE(max8925_irqs); i++) {
                 irq_data = &max8925_irqs[i];
+               /* 1 -- disable, 0 -- enable */
                 switch (irq_data->mask_reg) {
                 case MAX8925_CHG_IRQ1_MASK:
-                       irq_chg[0] &= irq_data->enable;
+                       irq_chg[0] &= ~irq_data->enable;
                         break;
                 case MAX8925_CHG_IRQ2_MASK:
-                       irq_chg[1] &= irq_data->enable;
+                       irq_chg[1] &= ~irq_data->enable;
                         break;
                 case MAX8925_ON_OFF_IRQ1_MASK:
-                       irq_on[0] &= irq_data->enable;
+                       irq_on[0] &= ~irq_data->enable;
                         break;
                 case MAX8925_ON_OFF_IRQ2_MASK:
-                       irq_on[1] &= irq_data->enable;
+                       irq_on[1] &= ~irq_data->enable;
                         break;
                 case MAX8925_RTC_IRQ_MASK:
-                       irq_rtc &= irq_data->enable;
+                       irq_rtc &= ~irq_data->enable;
                         break;
                 case MAX8925_TSC_IRQ_MASK:
-                       irq_tsc &= irq_data->enable;
+                       irq_tsc &= ~irq_data->enable;
                         break;
                 default:
                         dev_err(chip->dev, "wrong IRQ\n");
diff --git a/drivers/mfd/wm831x-irq.c b/drivers/mfd/wm831x-irq.c
index 7dabe4dbd3732e1d75c396b9b1e01bdeafafa57c..294183b6260b1facff3d26764eb3cea8c6d4b011 100644
--- a/drivers/mfd/wm831x-irq.c
+++ b/drivers/mfd/wm831x-irq.c
@@ -394,8 +394,13 @@ static int wm831x_irq_set_type(unsigned int irq, unsigned int type)
  
         irq = irq - wm831x->irq_base;
  
-       if (irq < WM831X_IRQ_GPIO_1 || irq > WM831X_IRQ_GPIO_11)
-               return -EINVAL;
+       if (irq < WM831X_IRQ_GPIO_1 || irq > WM831X_IRQ_GPIO_11) {
+               /* Ignore internal-only IRQs */
+               if (irq >= 0 && irq < WM831X_NUM_IRQS)
+                       return 0;
+               else
+                       return -EINVAL;
+       }
  
         switch (type) {
         case IRQ_TYPE_EDGE_BOTH:
diff --git a/drivers/misc/bh1780gli.c b/drivers/misc/bh1780gli.c
index 714c6b487313a4ad7b360687591b75a6cddedc55..d5f3a3fd231931508948a9b0227aae2a3648a96f 100644
--- a/drivers/misc/bh1780gli.c
+++ b/drivers/misc/bh1780gli.c
@@ -190,7 +190,6 @@ static int __devexit bh1780_remove(struct i2c_client *client)
  
         ddata = i2c_get_clientdata(client);
         sysfs_remove_group(&client->dev.kobj, &bh1780_attr_group);
-       i2c_set_clientdata(client, NULL);
         kfree(ddata);
  
         return 0;
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 5db49b124ffa158793e0cfb3d1db72321679d440..09eee6df0653c84fc5c08023f8913a6afbb884a7 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1631,6 +1631,19 @@ int mmc_suspend_host(struct mmc_host *host)
         if (host->bus_ops && !host->bus_dead) {
                 if (host->bus_ops->suspend)
                         err = host->bus_ops->suspend(host);
+               if (err == -ENOSYS || !host->bus_ops->resume) {
+                       /*
+                        * We simply "remove" the card in this case.
+                        * It will be redetected on resume.
+                        */
+                       if (host->bus_ops->remove)
+                               host->bus_ops->remove(host);
+                       mmc_claim_host(host);
+                       mmc_detach_bus(host);
+                       mmc_release_host(host);
+                       host->pm_flags = 0;
+                       err = 0;
+               }
         }
         mmc_bus_put(host);
  
diff --git a/drivers/mtd/nand/mxc_nand.c b/drivers/mtd/nand/mxc_nand.c
index b2828e84d24377fdbf2be2d9af3d2492ee7feea2..214b03afdd482920adda308e51088092d884a2d1 100644
--- a/drivers/mtd/nand/mxc_nand.c
+++ b/drivers/mtd/nand/mxc_nand.c
@@ -30,6 +30,8 @@
  #include <linux/clk.h>
  #include <linux/err.h>
  #include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/completion.h>
  
  #include <asm/mach/flash.h>
  #include <mach/mxc_nand.h>
@@ -151,7 +153,7 @@ struct mxc_nand_host {
         int                     irq;
         int                     eccsize;
  
-       wait_queue_head_t       irq_waitq;
+       struct completion       op_completion;
  
         uint8_t                 *data_buf;
         unsigned int            buf_start;
@@ -164,6 +166,7 @@ struct mxc_nand_host {
         void                    (*send_read_id)(struct mxc_nand_host *);
         uint16_t                (*get_dev_status)(struct mxc_nand_host *);
         int                     (*check_int)(struct mxc_nand_host *);
+       void                    (*irq_control)(struct mxc_nand_host *, int);
  };
  
  /* OOB placement block for use with hardware ecc generation */
@@ -216,9 +219,12 @@ static irqreturn_t mxc_nfc_irq(int irq, void *dev_id)
  {
         struct mxc_nand_host *host = dev_id;
  
-       disable_irq_nosync(irq);
+       if (!host->check_int(host))
+               return IRQ_NONE;
  
-       wake_up(&host->irq_waitq);
+       host->irq_control(host, 0);
+
+       complete(&host->op_completion);
  
         return IRQ_HANDLED;
  }
@@ -245,11 +251,54 @@ static int check_int_v1_v2(struct mxc_nand_host *host)
         if (!(tmp & NFC_V1_V2_CONFIG2_INT))
                 return 0;
  
-       writew(tmp & ~NFC_V1_V2_CONFIG2_INT, NFC_V1_V2_CONFIG2);
+       if (!cpu_is_mx21())
+               writew(tmp & ~NFC_V1_V2_CONFIG2_INT, NFC_V1_V2_CONFIG2);
  
         return 1;
  }
  
+/*
+ * It has been observed that the i.MX21 cannot read the CONFIG2:INT bit
+ * if interrupts are masked (CONFIG1:INT_MSK is set). To handle this, the
+ * driver can enable/disable the irq line rather than simply masking the
+ * interrupts.
+ */
+static void irq_control_mx21(struct mxc_nand_host *host, int activate)
+{
+       if (activate)
+               enable_irq(host->irq);
+       else
+               disable_irq_nosync(host->irq);
+}
+
+static void irq_control_v1_v2(struct mxc_nand_host *host, int activate)
+{
+       uint16_t tmp;
+
+       tmp = readw(NFC_V1_V2_CONFIG1);
+
+       if (activate)
+               tmp &= ~NFC_V1_V2_CONFIG1_INT_MSK;
+       else
+               tmp |= NFC_V1_V2_CONFIG1_INT_MSK;
+
+       writew(tmp, NFC_V1_V2_CONFIG1);
+}
+
+static void irq_control_v3(struct mxc_nand_host *host, int activate)
+{
+       uint32_t tmp;
+
+       tmp = readl(NFC_V3_CONFIG2);
+
+       if (activate)
+               tmp &= ~NFC_V3_CONFIG2_INT_MSK;
+       else
+               tmp |= NFC_V3_CONFIG2_INT_MSK;
+
+       writel(tmp, NFC_V3_CONFIG2);
+}
+
  /* This function polls the NANDFC to wait for the basic operation to
   * complete by checking the INT bit of config2 register.
   */
@@ -259,10 +308,9 @@ static void wait_op_done(struct mxc_nand_host *host, int useirq)
  
         if (useirq) {
                 if (!host->check_int(host)) {
-
-                       enable_irq(host->irq);
-
-                       wait_event(host->irq_waitq, host->check_int(host));
+                       INIT_COMPLETION(host->op_completion);
+                       host->irq_control(host, 1);
+                       wait_for_completion(&host->op_completion);
                 }
         } else {
                 while (max_retries-- > 0) {
@@ -799,6 +847,7 @@ static void preset_v3(struct mtd_info *mtd)
                 NFC_V3_CONFIG2_2CMD_PHASES |
                 NFC_V3_CONFIG2_SPAS(mtd->oobsize >> 1) |
                 NFC_V3_CONFIG2_ST_CMD(0x70) |
+               NFC_V3_CONFIG2_INT_MSK |
                 NFC_V3_CONFIG2_NUM_ADDR_PHASE0;
  
         if (chip->ecc.mode == NAND_ECC_HW)
@@ -1024,6 +1073,10 @@ static int __init mxcnd_probe(struct platform_device *pdev)
                 host->send_read_id = send_read_id_v1_v2;
                 host->get_dev_status = get_dev_status_v1_v2;
                 host->check_int = check_int_v1_v2;
+               if (cpu_is_mx21())
+                       host->irq_control = irq_control_mx21;
+               else
+                       host->irq_control = irq_control_v1_v2;
         }
  
         if (nfc_is_v21()) {
@@ -1062,6 +1115,7 @@ static int __init mxcnd_probe(struct platform_device *pdev)
                 host->send_read_id = send_read_id_v3;
                 host->check_int = check_int_v3;
                 host->get_dev_status = get_dev_status_v3;
+               host->irq_control = irq_control_v3;
                 oob_smallpage = &nandv2_hw_eccoob_smallpage;
                 oob_largepage = &nandv2_hw_eccoob_largepage;
         } else
@@ -1093,14 +1147,34 @@ static int __init mxcnd_probe(struct platform_device *pdev)
                 this->options |= NAND_USE_FLASH_BBT;
         }
  
-       init_waitqueue_head(&host->irq_waitq);
+       init_completion(&host->op_completion);
  
         host->irq = platform_get_irq(pdev, 0);
  
+       /*
+        * mask the interrupt. For i.MX21 explicitly call
+        * irq_control_v1_v2 to use the mask bit. We can't call
+        * disable_irq_nosync() for an interrupt we do not own yet.
+        */
+       if (cpu_is_mx21())
+               irq_control_v1_v2(host, 0);
+       else
+               host->irq_control(host, 0);
+
         err = request_irq(host->irq, mxc_nfc_irq, IRQF_DISABLED, DRIVER_NAME, host);
         if (err)
                 goto eirq;
  
+       host->irq_control(host, 0);
+
+       /*
+        * Now that the interrupt is disabled make sure the interrupt
+        * mask bit is cleared on i.MX21. Otherwise we can't read
+        * the interrupt status bit on this machine.
+        */
+       if (cpu_is_mx21())
+               irq_control_v1_v2(host, 1);
+
         /* first scan to find the device and get the page size */
         if (nand_scan_ident(mtd, 1, NULL)) {
                 err = -ENXIO;
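
Note on the mxc_nand.c change: the irq_control_v1_v2() and irq_control_v3() helpers both do a read-modify-write of one mask bit, and the driver picks a variant through a function pointer at probe time. A minimal userspace sketch of that pattern follows. It is not the driver code: the register is a plain struct field instead of readw()/writew(), and SKETCH_INT_MSK, struct sketch_nfc and the sketch_* functions are hypothetical names with an arbitrarily chosen bit position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position, chosen only for this sketch. */
#define SKETCH_INT_MSK (1u << 4)

/* Stand-in for a controller with a memory-mapped CONFIG1 register. */
struct sketch_nfc {
	uint16_t config1;
	void (*irq_control)(struct sketch_nfc *, int);
};

/* Mask (activate == 0) or unmask (activate != 0) via the mask bit. */
static void sketch_irq_control_maskbit(struct sketch_nfc *nfc, int activate)
{
	uint16_t tmp;

	assert(nfc != NULL);
	tmp = nfc->config1;               /* models readw() */

	if (activate)
		tmp &= (uint16_t)~SKETCH_INT_MSK;
	else
		tmp |= SKETCH_INT_MSK;

	nfc->config1 = tmp;               /* models writew() */
}

/* Returns nonzero when the interrupt is currently masked. */
static int sketch_irq_masked(const struct sketch_nfc *nfc)
{
	assert(nfc != NULL);
	return (nfc->config1 & SKETCH_INT_MSK) != 0;
}
```

Other bits of the register are preserved by the read-modify-write, which is why the helpers in the patch read the register first rather than writing a constant.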
diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 133d51528f8dc0fb79eae4d12230e1a65bd7595e..513e0a76a4a73866d52bba8151e43556a3b30a54 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -413,7 +413,7 @@ static inline int omap_nand_dma_transfer(struct mtd_info *mtd, void *addr,
                 prefetch_status = gpmc_read_status(GPMC_PREFETCH_COUNT);
         } while (prefetch_status);
         /* disable and stop the PFPW engine */
-       gpmc_prefetch_reset();
+       gpmc_prefetch_reset(info->gpmc_cs);
  
         dma_unmap_single(&info->pdev->dev, dma_addr, len, dir);
         return 0;
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 2cc81a54cbf322a49ccbf474f5d41f654faf109d..5db667c0b3711f235dfc49c52a4d165b12e6b3fd 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2428,7 +2428,7 @@ config UGETH_TX_ON_DEMAND
  
  config MV643XX_ETH
         tristate "Marvell Discovery (643XX) and Orion ethernet support"
-       depends on MV64X60 || PPC32 || PLAT_ORION
+       depends on (MV64X60 || PPC32 || PLAT_ORION) && INET
         select INET_LRO
         select PHYLIB
         help
@@ -2803,7 +2803,7 @@ config NIU
  
  config PASEMI_MAC
         tristate "PA Semi 1/10Gbit MAC"
-       depends on PPC_PASEMI && PCI
+       depends on PPC_PASEMI && PCI && INET
         select PHYLIB
         select INET_LRO
         help
diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 1e620e287ae0cc9fbe9894a105fedbfdc9fbb6be..efeffdf9e5fab30d2bd97a07e5e866fec60ed595 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -2170,8 +2170,6 @@ static int __devinit b44_init_one(struct ssb_device *sdev,
         dev->irq = sdev->irq;
         SET_ETHTOOL_OPS(dev, &b44_ethtool_ops);
  
-       netif_carrier_off(dev);
-
         err = ssb_bus_powerup(sdev->bus, 0);
         if (err) {
                 dev_err(sdev->dev,
@@ -2213,6 +2211,8 @@ static int __devinit b44_init_one(struct ssb_device *sdev,
                 goto err_out_powerdown;
         }
  
+       netif_carrier_off(dev);
+
         ssb_set_drvdata(sdev, dev);
  
         /* Chip reset provides power to the b44 MAC & PCI cores, which
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3b16f62d5606c741e97fb0a7c7b0b211c903157d..e953c6ad6e6d1ea3fd7e22fddc8f5f27ba1f8b38 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5164,6 +5164,15 @@ int bond_create(struct net *net, const char *name)
                 res = dev_alloc_name(bond_dev, "bond%d");
                 if (res < 0)
                         goto out;
+       } else {
+               /*
+                * If we're given a name to register
+                * we need to ensure that it's not already
+                * registered
+                */
+               res = -EEXIST;
+               if (__dev_get_by_name(net, name) != NULL)
+                       goto out;
         }
  
         res = register_netdevice(bond_dev);
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index a333b42111b8c2ba20b92eca94bf5648704c4f9a..6372610ed24093b8ed99fdf002e44442be49f696 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -533,8 +533,15 @@ static inline void ehea_fill_skb(struct net_device *dev,
         int length = cqe->num_bytes_transfered - 4;     /*remove CRC */
  
         skb_put(skb, length);
-       skb->ip_summed = CHECKSUM_UNNECESSARY;
         skb->protocol = eth_type_trans(skb, dev);
+
+       /* The packet was not an IPV4 packet so a complemented checksum was
+          calculated. The value is found in the Internet Checksum field. */
+       if (cqe->status & EHEA_CQE_BLIND_CKSUM) {
+               skb->ip_summed = CHECKSUM_COMPLETE;
+               skb->csum = csum_unfold(~cqe->inet_checksum_value);
+       } else
+               skb->ip_summed = CHECKSUM_UNNECESSARY;
  }
  
  static inline struct sk_buff *get_skb_by_index(struct sk_buff **skb_array,
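
Note on the ehea_main.c change: the new branch complements the hardware-provided checksum value before passing it to csum_unfold(). The userspace sketch below illustrates the arithmetic behind that step, assuming the standard Internet checksum definition (RFC 1071): the checksum field is the complement of the ones' complement sum, so complementing it again recovers the sum. The sketch_* functions are hypothetical helpers, not the kernel's csum API, and the sketch ignores the kernel's byte-order and __wsum handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t sketch_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement sum over big-endian 16-bit words. */
static uint16_t sketch_ones_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)                      /* odd trailing byte, zero padded */
		sum += (uint32_t)buf[i] << 8;
	return sketch_fold(sum);
}

/* The checksum field holds the complement of the sum. */
static uint16_t sketch_checksum(const uint8_t *buf, size_t len)
{
	return (uint16_t)~sketch_ones_sum(buf, len);
}
```

For the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7, the sum is 0xddf2 and the checksum is 0x220d; complementing 0x220d gives 0xddf2 again.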
diff --git a/drivers/net/ehea/ehea_qmr.h b/drivers/net/ehea/ehea_qmr.h
index f608a6c54af5845727494c9749e2b786390dfa78..38104734a3be82b0fb4480526295510a4769e8bf 100644
--- a/drivers/net/ehea/ehea_qmr.h
+++ b/drivers/net/ehea/ehea_qmr.h
@@ -150,6 +150,7 @@ struct ehea_rwqe {
  #define EHEA_CQE_TYPE_RQ           0x60
  #define EHEA_CQE_STAT_ERR_MASK     0x700F
  #define EHEA_CQE_STAT_FAT_ERR_MASK 0xF
+#define EHEA_CQE_BLIND_CKSUM       0x8000
  #define EHEA_CQE_STAT_ERR_TCP      0x4000
  #define EHEA_CQE_STAT_ERR_IP       0x2000
  #define EHEA_CQE_STAT_ERR_CRC      0x1000
diff --git a/drivers/net/fec.c b/drivers/net/fec.c
index 768b840aeb6b7b0bf2ceec87469bb0b4925b5d20..cce32d43175f5c3ed5f6e7ef2ebd56eede8e9139 100644
--- a/drivers/net/fec.c
+++ b/drivers/net/fec.c
@@ -678,24 +678,37 @@ static int fec_enet_mii_probe(struct net_device *dev)
  {
         struct fec_enet_private *fep = netdev_priv(dev);
         struct phy_device *phy_dev = NULL;
-       int ret;
+       char mdio_bus_id[MII_BUS_ID_SIZE];
+       char phy_name[MII_BUS_ID_SIZE + 3];
+       int phy_id;
  
         fep->phy_dev = NULL;
  
-       /* find the first phy */
-       phy_dev = phy_find_first(fep->mii_bus);
-       if (!phy_dev) {
-               printk(KERN_ERR "%s: no PHY found\n", dev->name);
-               return -ENODEV;
+       /* check for attached phy */
+       for (phy_id = 0; (phy_id < PHY_MAX_ADDR); phy_id++) {
+               if ((fep->mii_bus->phy_mask & (1 << phy_id)))
+                       continue;
+               if (fep->mii_bus->phy_map[phy_id] == NULL)
+                       continue;
+               if (fep->mii_bus->phy_map[phy_id]->phy_id == 0)
+                       continue;
+               strncpy(mdio_bus_id, fep->mii_bus->id, MII_BUS_ID_SIZE);
+               break;
         }
  
-       /* attach the mac to the phy */
-       ret = phy_connect_direct(dev, phy_dev,
-                            &fec_enet_adjust_link, 0,
-                            PHY_INTERFACE_MODE_MII);
-       if (ret) {
-               printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
-               return ret;
+       if (phy_id >= PHY_MAX_ADDR) {
+               printk(KERN_INFO "%s: no PHY, assuming direct connection "
+                       "to switch\n", dev->name);
+               strncpy(mdio_bus_id, "0", MII_BUS_ID_SIZE);
+               phy_id = 0;
+       }
+
+       snprintf(phy_name, MII_BUS_ID_SIZE, PHY_ID_FMT, mdio_bus_id, phy_id);
+       phy_dev = phy_connect(dev, phy_name, &fec_enet_adjust_link, 0,
+               PHY_INTERFACE_MODE_MII);
+       if (IS_ERR(phy_dev)) {
+               printk(KERN_ERR "%s: could not attach to PHY\n", dev->name);
+               return PTR_ERR(phy_dev);
         }
  
         /* mask with MAC supported features */
@@ -738,7 +751,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
         fep->mii_bus->read = fec_enet_mdio_read;
         fep->mii_bus->write = fec_enet_mdio_write;
         fep->mii_bus->reset = fec_enet_mdio_reset;
-       snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", pdev->id);
+       snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", pdev->id + 1);
         fep->mii_bus->priv = fep;
         fep->mii_bus->parent = &pdev->dev;
  
@@ -1311,6 +1324,9 @@ fec_probe(struct platform_device *pdev)
         if (ret)
                 goto failed_mii_init;
  
+       /* Carrier starts down, phylib will bring it up */
+       netif_carrier_off(ndev);
+
         ret = register_netdev(ndev);
         if (ret)
                 goto failed_register;
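
Note on the fec.c change: fec_enet_mii_probe() now walks the bus addresses itself and skips entries that are masked out, absent, or report an ID of zero. A minimal userspace sketch of that scan is shown here. struct sketch_phy, sketch_find_phy() and SKETCH_MAX_ADDR are hypothetical stand-ins; the value 32 is assumed to mirror PHY_MAX_ADDR, and the real code goes on to build a PHY name and call phy_connect() rather than returning an index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ADDR 32   /* assumed to mirror PHY_MAX_ADDR */

/* Hypothetical stand-in for a probed PHY entry. */
struct sketch_phy {
	uint32_t phy_id;
};

/*
 * Return the first address that is not masked out, has an entry, and
 * reports a nonzero ID; return -1 when no such address exists.
 */
static int sketch_find_phy(uint32_t phy_mask,
			   struct sketch_phy *const map[SKETCH_MAX_ADDR])
{
	int addr;

	assert(map != NULL);
	for (addr = 0; addr < SKETCH_MAX_ADDR; addr++) {
		if (phy_mask & (UINT32_C(1) << addr))
			continue;         /* address excluded by the mask */
		if (map[addr] == NULL)
			continue;         /* nothing probed at this address */
		if (map[addr]->phy_id == 0)
			continue;         /* entry present but ID is zero */
		return addr;
	}
	return -1;
}
```

In the patch, the "not found" case does not fail the probe; it falls back to bus "0", address 0, on the assumption of a direct connection to a switch.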
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index a0da4a17b025ce08469659a420d37944861876d3..992db2fa136e9c5e6f5130c053393aa662d4e5c9 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1212,7 +1212,8 @@ static void rtl8169_update_counters(struct net_device *dev)
         if ((RTL_R8(ChipCmd) & CmdRxEnb) == 0)
                 return;
  
-       counters = pci_alloc_consistent(tp->pci_dev, sizeof(*counters), &paddr);
+       counters = dma_alloc_coherent(&tp->pci_dev->dev, sizeof(*counters),
+                                     &paddr, GFP_KERNEL);
         if (!counters)
                 return;
  
@@ -1233,7 +1234,8 @@ static void rtl8169_update_counters(struct net_device *dev)
         RTL_W32(CounterAddrLow, 0);
         RTL_W32(CounterAddrHigh, 0);
  
-       pci_free_consistent(tp->pci_dev, sizeof(*counters), counters, paddr);
+       dma_free_coherent(&tp->pci_dev->dev, sizeof(*counters), counters,
+                         paddr);
  }
  
  static void rtl8169_get_ethtool_stats(struct net_device *dev,
@@ -3292,15 +3294,15 @@ static int rtl8169_open(struct net_device *dev)
  
         /*
          * Rx and Tx desscriptors needs 256 bytes alignment.
-        * pci_alloc_consistent provides more.
+        * dma_alloc_coherent provides more.
          */
-       tp->TxDescArray = pci_alloc_consistent(pdev, R8169_TX_RING_BYTES,
-                                              &tp->TxPhyAddr);
+       tp->TxDescArray = dma_alloc_coherent(&pdev->dev, R8169_TX_RING_BYTES,
+                                            &tp->TxPhyAddr, GFP_KERNEL);
         if (!tp->TxDescArray)
                 goto err_pm_runtime_put;
  
-       tp->RxDescArray = pci_alloc_consistent(pdev, R8169_RX_RING_BYTES,
-                                              &tp->RxPhyAddr);
+       tp->RxDescArray = dma_alloc_coherent(&pdev->dev, R8169_RX_RING_BYTES,
+                                            &tp->RxPhyAddr, GFP_KERNEL);
         if (!tp->RxDescArray)
                 goto err_free_tx_0;
  
@@ -3334,12 +3336,12 @@ out:
  err_release_ring_2:
         rtl8169_rx_clear(tp);
  err_free_rx_1:
-       pci_free_consistent(pdev, R8169_RX_RING_BYTES, tp->RxDescArray,
-                           tp->RxPhyAddr);
+       dma_free_coherent(&pdev->dev, R8169_RX_RING_BYTES, tp->RxDescArray,
+                         tp->RxPhyAddr);
         tp->RxDescArray = NULL;
  err_free_tx_0:
-       pci_free_consistent(pdev, R8169_TX_RING_BYTES, tp->TxDescArray,
-                           tp->TxPhyAddr);
+       dma_free_coherent(&pdev->dev, R8169_TX_RING_BYTES, tp->TxDescArray,
+                         tp->TxPhyAddr);
         tp->TxDescArray = NULL;
  err_pm_runtime_put:
         pm_runtime_put_noidle(&pdev->dev);
@@ -3975,7 +3977,7 @@ static void rtl8169_free_rx_skb(struct rtl8169_private *tp,
  {
         struct pci_dev *pdev = tp->pci_dev;
  
-       pci_unmap_single(pdev, le64_to_cpu(desc->addr), tp->rx_buf_sz,
+       dma_unmap_single(&pdev->dev, le64_to_cpu(desc->addr), tp->rx_buf_sz,
                          PCI_DMA_FROMDEVICE);
         dev_kfree_skb(*sk_buff);
         *sk_buff = NULL;
@@ -4000,7 +4002,7 @@ static inline void rtl8169_map_to_asic(struct RxDesc *desc, dma_addr_t mapping,
  static struct sk_buff *rtl8169_alloc_rx_skb(struct pci_dev *pdev,
                                             struct net_device *dev,
                                             struct RxDesc *desc, int rx_buf_sz,
-                                           unsigned int align)
+                                           unsigned int align, gfp_t gfp)
  {
         struct sk_buff *skb;
         dma_addr_t mapping;
@@ -4008,13 +4010,13 @@ static struct sk_buff *rtl8169_alloc_rx_skb(struct pci_dev *pdev,
  
         pad = align ? align : NET_IP_ALIGN;
  
-       skb = netdev_alloc_skb(dev, rx_buf_sz + pad);
+       skb = __netdev_alloc_skb(dev, rx_buf_sz + pad, gfp);
         if (!skb)
                 goto err_out;
  
         skb_reserve(skb, align ? ((pad - 1) & (unsigned long)skb->data) : pad);
  
-       mapping = pci_map_single(pdev, skb->data, rx_buf_sz,
+       mapping = dma_map_single(&pdev->dev, skb->data, rx_buf_sz,
                                  PCI_DMA_FROMDEVICE);
  
         rtl8169_map_to_asic(desc, mapping, rx_buf_sz);
@@ -4039,7 +4041,7 @@ static void rtl8169_rx_clear(struct rtl8169_private *tp)
  }
  
  static u32 rtl8169_rx_fill(struct rtl8169_private *tp, struct net_device *dev,
-                          u32 start, u32 end)
+                          u32 start, u32 end, gfp_t gfp)
  {
         u32 cur;
  
@@ -4054,7 +4056,7 @@ static u32 rtl8169_rx_fill(struct rtl8169_private *tp, struct net_device *dev,
  
                 skb = rtl8169_alloc_rx_skb(tp->pci_dev, dev,
                                            tp->RxDescArray + i,
-                                          tp->rx_buf_sz, tp->align);
+                                          tp->rx_buf_sz, tp->align, gfp);
                 if (!skb)
                         break;
  
@@ -4082,7 +4084,7 @@ static int rtl8169_init_ring(struct net_device *dev)
         memset(tp->tx_skb, 0x0, NUM_TX_DESC * sizeof(struct ring_info));
         memset(tp->Rx_skbuff, 0x0, NUM_RX_DESC * sizeof(struct sk_buff *));
  
-       if (rtl8169_rx_fill(tp, dev, 0, NUM_RX_DESC) != NUM_RX_DESC)
+       if (rtl8169_rx_fill(tp, dev, 0, NUM_RX_DESC, GFP_KERNEL) != NUM_RX_DESC)
                 goto err_out;
  
         rtl8169_mark_as_last_descriptor(tp->RxDescArray + NUM_RX_DESC - 1);
@@ -4099,7 +4101,8 @@ static void rtl8169_unmap_tx_skb(struct pci_dev *pdev, struct ring_info *tx_skb,
  {
         unsigned int len = tx_skb->len;
  
-       pci_unmap_single(pdev, le64_to_cpu(desc->addr), len, PCI_DMA_TODEVICE);
+       dma_unmap_single(&pdev->dev, le64_to_cpu(desc->addr), len,
+                        PCI_DMA_TODEVICE);
         desc->opts1 = 0x00;
         desc->opts2 = 0x00;
         desc->addr = 0x00;
@@ -4243,7 +4246,8 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
                 txd = tp->TxDescArray + entry;
                 len = frag->size;
                 addr = ((void *) page_address(frag->page)) + frag->page_offset;
-               mapping = pci_map_single(tp->pci_dev, addr, len, PCI_DMA_TODEVICE);
+               mapping = dma_map_single(&tp->pci_dev->dev, addr, len,
+                                        PCI_DMA_TODEVICE);
  
                 /* anti gcc 2.95.3 bugware (sic) */
                 status = opts1 | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
@@ -4313,7 +4317,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
                 tp->tx_skb[entry].skb = skb;
         }
  
-       mapping = pci_map_single(tp->pci_dev, skb->data, len, PCI_DMA_TODEVICE);
+       mapping = dma_map_single(&tp->pci_dev->dev, skb->data, len,
+                                PCI_DMA_TODEVICE);
  
         tp->tx_skb[entry].len = len;
         txd->addr = cpu_to_le64(mapping);
@@ -4477,8 +4482,8 @@ static inline bool rtl8169_try_rx_copy(struct sk_buff **sk_buff,
         if (!skb)
                 goto out;
  
-       pci_dma_sync_single_for_cpu(tp->pci_dev, addr, pkt_size,
-                                   PCI_DMA_FROMDEVICE);
+       dma_sync_single_for_cpu(&tp->pci_dev->dev, addr, pkt_size,
+                               PCI_DMA_FROMDEVICE);
         skb_copy_from_linear_data(*sk_buff, skb->data, pkt_size);
         *sk_buff = skb;
         done = true;
@@ -4549,11 +4554,11 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
                         rtl8169_rx_csum(skb, desc);
  
                         if (rtl8169_try_rx_copy(&skb, tp, pkt_size, addr)) {
-                               pci_dma_sync_single_for_device(pdev, addr,
+                               dma_sync_single_for_device(&pdev->dev, addr,
                                         pkt_size, PCI_DMA_FROMDEVICE);
                                 rtl8169_mark_to_asic(desc, tp->rx_buf_sz);
                         } else {
-                               pci_unmap_single(pdev, addr, tp->rx_buf_sz,
+                               dma_unmap_single(&pdev->dev, addr, tp->rx_buf_sz,
                                                  PCI_DMA_FROMDEVICE);
                                 tp->Rx_skbuff[entry] = NULL;
                         }
@@ -4583,7 +4588,7 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
         count = cur_rx - tp->cur_rx;
         tp->cur_rx = cur_rx;
  
-       delta = rtl8169_rx_fill(tp, dev, tp->dirty_rx, tp->cur_rx);
+       delta = rtl8169_rx_fill(tp, dev, tp->dirty_rx, tp->cur_rx, GFP_ATOMIC);
         if (!delta && count)
                 netif_info(tp, intr, dev, "no Rx buffer allocated\n");
         tp->dirty_rx += delta;
@@ -4769,10 +4774,10 @@ static int rtl8169_close(struct net_device *dev)
  
         free_irq(dev->irq, dev);
  
-       pci_free_consistent(pdev, R8169_RX_RING_BYTES, tp->RxDescArray,
-                           tp->RxPhyAddr);
-       pci_free_consistent(pdev, R8169_TX_RING_BYTES, tp->TxDescArray,
-                           tp->TxPhyAddr);
+       dma_free_coherent(&pdev->dev, R8169_RX_RING_BYTES, tp->RxDescArray,
+                         tp->RxPhyAddr);
+       dma_free_coherent(&pdev->dev, R8169_TX_RING_BYTES, tp->TxDescArray,
+                         tp->TxPhyAddr);
         tp->TxDescArray = NULL;
         tp->RxDescArray = NULL;
  
diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 40e5c46e7571ad46f1c7abf655d0235f89762bfd..465ae7e84507385b079c7922cb7360cec428a30f 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -43,6 +43,7 @@
  #include <linux/seq_file.h>
  #include <linux/mii.h>
  #include <linux/slab.h>
+#include <linux/dmi.h>
  #include <asm/irq.h>
  
  #include "skge.h"
@@ -3868,6 +3869,8 @@ static void __devinit skge_show_addr(struct net_device *dev)
         netif_info(skge, probe, skge->netdev, "addr %pM\n", dev->dev_addr);
  }
  
+static int only_32bit_dma;
+
  static int __devinit skge_probe(struct pci_dev *pdev,
                                 const struct pci_device_id *ent)
  {
@@ -3889,7 +3892,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
  
         pci_set_master(pdev);
  
-       if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
+       if (!only_32bit_dma && !pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
                 using_dac = 1;
                 err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
         } else if (!(err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32)))) {
@@ -4147,8 +4150,21 @@ static struct pci_driver skge_driver = {
         .shutdown =     skge_shutdown,
  };
  
+static struct dmi_system_id skge_32bit_dma_boards[] = {
+       {
+               .ident = "Gigabyte nForce boards",
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_VENDOR, "Gigabyte Technology Co"),
+                       DMI_MATCH(DMI_BOARD_NAME, "nForce"),
+               },
+       },
+       {}
+};
+
  static int __init skge_init_module(void)
  {
+       if (dmi_check_system(skge_32bit_dma_boards))
+               only_32bit_dma = 1;
         skge_debug_init();
         return pci_register_driver(&skge_driver);
  }
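
Note on the skge.c change: the quirk table is a sentinel-terminated array whose entries must all match before 64-bit DMA is disabled. The userspace sketch below models that lookup under the assumption that each DMI_MATCH() field is compared as a substring of the firmware-reported string. struct sketch_board, sketch_quirks and sketch_board_matches() are hypothetical names, and the board strings used to exercise it are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a DMI quirk entry. */
struct sketch_board {
	const char *vendor;   /* substring expected in the board vendor */
	const char *name;     /* substring expected in the board name */
};

/* Table terminated by an all-NULL sentinel, like the {} entry in the patch. */
static const struct sketch_board sketch_quirks[] = {
	{ "Gigabyte Technology Co", "nForce" },
	{ NULL, NULL }
};

/* Return 1 when both fields of some entry occur as substrings. */
static int sketch_board_matches(const char *vendor, const char *name)
{
	const struct sketch_board *b;

	assert(vendor != NULL && name != NULL);
	for (b = sketch_quirks; b->vendor != NULL; b++) {
		if (strstr(vendor, b->vendor) && strstr(name, b->name))
			return 1;
	}
	return 0;
}
```

Because both fields must match, a Gigabyte board without "nForce" in its name, or an nForce board from another vendor, keeps the 64-bit DMA path.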
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index bc3af78a869ff52881077b89cf6e5e544b6e8a91..1ec4b9e0239a8ff8c0d3cb18882ff517ef528881 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4666,7 +4666,7 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
                                        desc_idx, *post_ptr);
                 drop_it_no_recycle:
                         /* Other statistics kept track of by card. */
-                       tp->net_stats.rx_dropped++;
+                       tp->rx_dropped++;
                         goto next_pkt;
                 }
  
@@ -4726,7 +4726,7 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
                 if (len > (tp->dev->mtu + ETH_HLEN) &&
                     skb->protocol != htons(ETH_P_8021Q)) {
                         dev_kfree_skb(skb);
-                       goto next_pkt;
+                       goto drop_it_no_recycle;
                 }
  
                 if (desc->type_flags & RXD_FLAG_VLAN &&
@@ -9240,6 +9240,8 @@ static struct rtnl_link_stats64 *tg3_get_stats64(struct net_device *dev,
         stats->rx_missed_errors = old_stats->rx_missed_errors +
                 get_stat64(&hw_stats->rx_discards);
  
+       stats->rx_dropped = tp->rx_dropped;
+
         return stats;
  }
  
diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
index 4937bd19096413bae1115b82cc63ce5123207536..be7ff138a7f98d58a4f4d4c8ff7fe70bb83911d9 100644
--- a/drivers/net/tg3.h
+++ b/drivers/net/tg3.h
@@ -2759,7 +2759,7 @@ struct tg3 {
  
  
         /* begin "everything else" cacheline(s) section */
-       struct rtnl_link_stats64        net_stats;
+       unsigned long                   rx_dropped;
         struct rtnl_link_stats64        net_stats_prev;
         struct tg3_ethtool_stats        estats;
         struct tg3_ethtool_stats        estats_prev;
diff --git a/drivers/net/wimax/i2400m/rx.c b/drivers/net/wimax/i2400m/rx.c
index 8cc9e319f4356da8904cbf4e7a495d59eba94645..1737d1488b35704f3196975a76bcc919e943808b 100644
--- a/drivers/net/wimax/i2400m/rx.c
+++ b/drivers/net/wimax/i2400m/rx.c
@@ -1244,16 +1244,16 @@ int i2400m_rx(struct i2400m *i2400m, struct sk_buff *skb)
         int i, result;
         struct device *dev = i2400m_dev(i2400m);
         const struct i2400m_msg_hdr *msg_hdr;
-       size_t pl_itr, pl_size, skb_len;
+       size_t pl_itr, pl_size;
         unsigned long flags;
-       unsigned num_pls, single_last;
+       unsigned num_pls, single_last, skb_len;
  
         skb_len = skb->len;
-       d_fnstart(4, dev, "(i2400m %p skb %p [size %zu])\n",
+       d_fnstart(4, dev, "(i2400m %p skb %p [size %u])\n",
                   i2400m, skb, skb_len);
         result = -EIO;
         msg_hdr = (void *) skb->data;
-       result = i2400m_rx_msg_hdr_check(i2400m, msg_hdr, skb->len);
+       result = i2400m_rx_msg_hdr_check(i2400m, msg_hdr, skb_len);
         if (result < 0)
                 goto error_msg_hdr_check;
         result = -EIO;
@@ -1261,10 +1261,10 @@ int i2400m_rx(struct i2400m *i2400m, struct sk_buff *skb)
         pl_itr = sizeof(*msg_hdr) +     /* Check payload descriptor(s) */
                 num_pls * sizeof(msg_hdr->pld[0]);
         pl_itr = ALIGN(pl_itr, I2400M_PL_ALIGN);
-       if (pl_itr > skb->len) {        /* got all the payload descriptors? */
+       if (pl_itr > skb_len) { /* got all the payload descriptors? */
                 dev_err(dev, "RX: HW BUG? message too short (%u bytes) for "
                         "%u payload descriptors (%zu each, total %zu)\n",
-                       skb->len, num_pls, sizeof(msg_hdr->pld[0]), pl_itr);
+                       skb_len, num_pls, sizeof(msg_hdr->pld[0]), pl_itr);
                 goto error_pl_descr_short;
         }
         /* Walk each payload payload--check we really got it */
@@ -1272,7 +1272,7 @@ int i2400m_rx(struct i2400m *i2400m, struct sk_buff *skb)
                 /* work around old gcc warnings */
                 pl_size = i2400m_pld_size(&msg_hdr->pld[i]);
                 result = i2400m_rx_pl_descr_check(i2400m, &msg_hdr->pld[i],
-                                                 pl_itr, skb->len);
+                                                 pl_itr, skb_len);
                 if (result < 0)
                         goto error_pl_descr_check;
                 single_last = num_pls == 1 || i == num_pls - 1;
@@ -1290,16 +1290,16 @@ int i2400m_rx(struct i2400m *i2400m, struct sk_buff *skb)
         if (i < i2400m->rx_pl_min)
                 i2400m->rx_pl_min = i;
         i2400m->rx_num++;
-       i2400m->rx_size_acc += skb->len;
-       if (skb->len < i2400m->rx_size_min)
-               i2400m->rx_size_min = skb->len;
-       if (skb->len > i2400m->rx_size_max)
-               i2400m->rx_size_max = skb->len;
+       i2400m->rx_size_acc += skb_len;
+       if (skb_len < i2400m->rx_size_min)
+               i2400m->rx_size_min = skb_len;
+       if (skb_len > i2400m->rx_size_max)
+               i2400m->rx_size_max = skb_len;
         spin_unlock_irqrestore(&i2400m->rx_lock, flags);
  error_pl_descr_check:
  error_pl_descr_short:
  error_msg_hdr_check:
-       d_fnend(4, dev, "(i2400m %p skb %p [size %zu]) = %d\n",
+       d_fnend(4, dev, "(i2400m %p skb %p [size %u]) = %d\n",
                 i2400m, skb, skb_len, result);
         return result;
  }
diff --git a/drivers/net/wireless/ath/ath9k/ani.c b/drivers/net/wireless/ath/ath9k/ani.c
index cc648b6ae31cef3b3a650ab7414b76fa061173e4..a3d95cca8f0c5be9a32c76d57082da1206cd44d1 100644
--- a/drivers/net/wireless/ath/ath9k/ani.c
+++ b/drivers/net/wireless/ath/ath9k/ani.c
@@ -543,7 +543,7 @@ static u8 ath9k_hw_chan_2_clockrate_mhz(struct ath_hw *ah)
         if (conf_is_ht40(conf))
                 return clockrate * 2;
  
-       return clockrate * 2;
+       return clockrate;
  }
  
  static int32_t ath9k_hw_ani_get_listen_time(struct ath_hw *ah)
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
index 9dd9e64c2b0b1a69f11a6d40a5e0e2cbc84d8b22..8fd00a6e512019075e966a038d2c5d09539ccf85 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
@@ -1411,7 +1411,7 @@ void iwlagn_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
         clear_bit(STATUS_SCAN_HW, &priv->status);
         clear_bit(STATUS_SCANNING, &priv->status);
         /* inform mac80211 scan aborted */
-       queue_work(priv->workqueue, &priv->scan_completed);
+       queue_work(priv->workqueue, &priv->abort_scan);
  }
  
  int iwlagn_manage_ibss_station(struct iwl_priv *priv,
diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index 59a308b02f95fdc077a84a7d12d71db7451b1816..d31661c1ce778259996b5428f9cff95b87f1a3db 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -3018,7 +3018,7 @@ void iwl3945_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
         clear_bit(STATUS_SCANNING, &priv->status);
  
         /* inform mac80211 scan aborted */
-       queue_work(priv->workqueue, &priv->scan_completed);
+       queue_work(priv->workqueue, &priv->abort_scan);
  }
  
  static void iwl3945_bg_restart(struct work_struct *data)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 89ed181cd90cd9cd68506a212eb423ba60f9c8e2..857ae01734a66156c8abb92335be91cc674964a0 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -162,6 +162,26 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_CBUS_1,       quirk_isa_d
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NEC,     PCI_DEVICE_ID_NEC_CBUS_2,       quirk_isa_dma_hangs);
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NEC,     PCI_DEVICE_ID_NEC_CBUS_3,       quirk_isa_dma_hangs);
  
+/*
+ * Intel NM10 "TigerPoint" LPC PM1a_STS.BM_STS must be clear
+ * for some HT machines to use C4 w/o hanging.
+ */
+static void __devinit quirk_tigerpoint_bm_sts(struct pci_dev *dev)
+{
+       u32 pmbase;
+       u16 pm1a;
+
+       pci_read_config_dword(dev, 0x40, &pmbase);
+       pmbase = pmbase & 0xff80;
+       pm1a = inw(pmbase);
+
+       if (pm1a & 0x10) {
+               dev_info(&dev->dev, FW_BUG "TigerPoint LPC.BM_STS cleared\n");
+               outw(0x10, pmbase);
+       }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_TGP_LPC, quirk_tigerpoint_bm_sts);
+
  /*
   *     Chipsets where PCI->PCI transfers vanish or hang
   */
diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c
index 9024480a82288ec071e26008f616fe2d9556c18b..c44a5e8b8b82da9d06706d9cd3a3ec0fcb2b883c 100644
--- a/drivers/platform/x86/intel_ips.c
+++ b/drivers/platform/x86/intel_ips.c
@@ -51,7 +51,6 @@
   * TODO:
   *   - handle CPU hotplug
   *   - provide turbo enable/disable api
- *   - make sure we can write turbo enable/disable reg based on MISC_EN
   *
   * Related documents:
   *   - CDI 403777, 403778 - Auburndale EDS vol 1 & 2
@@ -230,7 +229,7 @@
  #define THM_TC2                0xac
  #define THM_DTV                0xb0
  #define THM_ITV                0xd8
-#define   ITV_ME_SEQNO_MASK 0x000f0000 /* ME should update every ~200ms */
+#define   ITV_ME_SEQNO_MASK 0x00ff0000 /* ME should update every ~200ms */
  #define   ITV_ME_SEQNO_SHIFT (16)
  #define   ITV_MCH_TEMP_MASK 0x0000ff00
  #define   ITV_MCH_TEMP_SHIFT (8)
@@ -325,6 +324,7 @@ struct ips_driver {
         bool gpu_preferred;
         bool poll_turbo_status;
         bool second_cpu;
+       bool turbo_toggle_allowed;
         struct ips_mcp_limits *limits;
  
         /* Optional MCH interfaces for if i915 is in use */
@@ -415,7 +415,7 @@ static void ips_cpu_lower(struct ips_driver *ips)
         new_limit = cur_limit - 8; /* 1W decrease */
  
         /* Clamp to SKU TDP limit */
-       if (((new_limit * 10) / 8) < (ips->orig_turbo_limit & TURBO_TDP_MASK))
+       if (new_limit  < (ips->orig_turbo_limit & TURBO_TDP_MASK))
                 new_limit = ips->orig_turbo_limit & TURBO_TDP_MASK;
  
         thm_writew(THM_MPCPC, (new_limit * 10) / 8);
@@ -461,7 +461,8 @@ static void ips_enable_cpu_turbo(struct ips_driver *ips)
         if (ips->__cpu_turbo_on)
                 return;
  
-       on_each_cpu(do_enable_cpu_turbo, ips, 1);
+       if (ips->turbo_toggle_allowed)
+               on_each_cpu(do_enable_cpu_turbo, ips, 1);
  
         ips->__cpu_turbo_on = true;
  }
@@ -498,7 +499,8 @@ static void ips_disable_cpu_turbo(struct ips_driver *ips)
         if (!ips->__cpu_turbo_on)
                 return;
  
-       on_each_cpu(do_disable_cpu_turbo, ips, 1);
+       if (ips->turbo_toggle_allowed)
+               on_each_cpu(do_disable_cpu_turbo, ips, 1);
  
         ips->__cpu_turbo_on = false;
  }
@@ -598,17 +600,29 @@ static bool mcp_exceeded(struct ips_driver *ips)
  {
         unsigned long flags;
         bool ret = false;
+       u32 temp_limit;
+       u32 avg_power;
+       const char *msg = "MCP limit exceeded: ";
  
         spin_lock_irqsave(&ips->turbo_status_lock, flags);
-       if (ips->mcp_avg_temp > (ips->mcp_temp_limit * 100))
-               ret = true;
-       if (ips->cpu_avg_power + ips->mch_avg_power > ips->mcp_power_limit)
+
+       temp_limit = ips->mcp_temp_limit * 100;
+       if (ips->mcp_avg_temp > temp_limit) {
+               dev_info(&ips->dev->dev,
+                       "%sAvg temp %u, limit %u\n", msg, ips->mcp_avg_temp,
+                       temp_limit);
                 ret = true;
-       spin_unlock_irqrestore(&ips->turbo_status_lock, flags);
+       }
  
-       if (ret)
+       avg_power = ips->cpu_avg_power + ips->mch_avg_power;
+       if (avg_power > ips->mcp_power_limit) {
                 dev_info(&ips->dev->dev,
-                        "MCP power or thermal limit exceeded\n");
+                       "%sAvg power %u, limit %u\n", msg, avg_power,
+                       ips->mcp_power_limit);
+               ret = true;
+       }
+
+       spin_unlock_irqrestore(&ips->turbo_status_lock, flags);
  
         return ret;
  }
@@ -662,6 +676,27 @@ static bool mch_exceeded(struct ips_driver *ips)
         return ret;
  }
  
+/**
+ * verify_limits - verify BIOS provided limits
+ * @ips: IPS structure
+ *
+ * BIOS can optionally provide non-default limits for power and temp.  Check
+ * them here and use the defaults if the BIOS values are not provided or
+ * are otherwise unusable.
+ */
+static void verify_limits(struct ips_driver *ips)
+{
+       if (ips->mcp_power_limit < ips->limits->mcp_power_limit ||
+           ips->mcp_power_limit > 35000)
+               ips->mcp_power_limit = ips->limits->mcp_power_limit;
+
+       if (ips->mcp_temp_limit < ips->limits->core_temp_limit ||
+           ips->mcp_temp_limit < ips->limits->mch_temp_limit ||
+           ips->mcp_temp_limit > 150)
+               ips->mcp_temp_limit = min(ips->limits->core_temp_limit,
+                                         ips->limits->mch_temp_limit);
+}
+
  /**
   * update_turbo_limits - get various limits & settings from regs
   * @ips: IPS driver struct
@@ -680,12 +715,21 @@ static void update_turbo_limits(struct ips_driver *ips)
         u32 hts = thm_readl(THM_HTS);
  
         ips->cpu_turbo_enabled = !(hts & HTS_PCTD_DIS);
-       ips->gpu_turbo_enabled = !(hts & HTS_GTD_DIS);
+       /* 
+        * Disable turbo for now, until we can figure out why the power figures
+        * are wrong
+        */
+       ips->cpu_turbo_enabled = false;
+
+       if (ips->gpu_busy)
+               ips->gpu_turbo_enabled = !(hts & HTS_GTD_DIS);
+
         ips->core_power_limit = thm_readw(THM_MPCPC);
         ips->mch_power_limit = thm_readw(THM_MMGPC);
         ips->mcp_temp_limit = thm_readw(THM_PTL);
         ips->mcp_power_limit = thm_readw(THM_MPPC);
  
+       verify_limits(ips);
         /* Ignore BIOS CPU vs GPU pref */
  }
  
@@ -858,7 +902,7 @@ static u32 get_cpu_power(struct ips_driver *ips, u32 *last, int period)
         ret = (ret * 1000) / 65535;
         *last = val;
  
-       return ret;
+       return 0;
  }
  
  static const u16 temp_decay_factor = 2;
@@ -940,7 +984,6 @@ static int ips_monitor(void *data)
                 kfree(mch_samples);
                 kfree(cpu_samples);
                 kfree(mchp_samples);
-               kthread_stop(ips->adjust);
                 return -ENOMEM;
         }
  
@@ -948,7 +991,7 @@ static int ips_monitor(void *data)
                 ITV_ME_SEQNO_SHIFT;
         seqno_timestamp = get_jiffies_64();
  
-       old_cpu_power = thm_readl(THM_CEC) / 65535;
+       old_cpu_power = thm_readl(THM_CEC);
         schedule_timeout_interruptible(msecs_to_jiffies(IPS_SAMPLE_PERIOD));
  
         /* Collect an initial average */
@@ -1150,11 +1193,18 @@ static irqreturn_t ips_irq_handler(int irq, void *arg)
                                 STS_GPL_SHIFT;
                         /* ignore EC CPU vs GPU pref */
                         ips->cpu_turbo_enabled = !(sts & STS_PCTD_DIS);
-                       ips->gpu_turbo_enabled = !(sts & STS_GTD_DIS);
+                       /* 
+                        * Disable turbo for now, until we can figure
+                        * out why the power figures are wrong
+                        */
+                       ips->cpu_turbo_enabled = false;
+                       if (ips->gpu_busy)
+                               ips->gpu_turbo_enabled = !(sts & STS_GTD_DIS);
                         ips->mcp_temp_limit = (sts & STS_PTL_MASK) >>
                                 STS_PTL_SHIFT;
                         ips->mcp_power_limit = (tc1 & STS_PPL_MASK) >>
                                 STS_PPL_SHIFT;
+                       verify_limits(ips);
                         spin_unlock(&ips->turbo_status_lock);
  
                         thm_writeb(THM_SEC, SEC_ACK);
@@ -1333,8 +1383,10 @@ static struct ips_mcp_limits *ips_detect_cpu(struct ips_driver *ips)
          * turbo manually or we'll get an illegal MSR access, even though
          * turbo will still be available.
          */
-       if (!(misc_en & IA32_MISC_TURBO_EN))
-               ; /* add turbo MSR write allowed flag if necessary */
+       if (misc_en & IA32_MISC_TURBO_EN)
+               ips->turbo_toggle_allowed = true;
+       else
+               ips->turbo_toggle_allowed = false;
  
         if (strstr(boot_cpu_data.x86_model_id, "CPU       M"))
                 limits = &ips_sv_limits;
@@ -1351,9 +1403,10 @@ static struct ips_mcp_limits *ips_detect_cpu(struct ips_driver *ips)
         tdp = turbo_power & TURBO_TDP_MASK;
  
         /* Sanity check TDP against CPU */
-       if (limits->mcp_power_limit != (tdp / 8) * 1000) {
-               dev_warn(&ips->dev->dev, "Warning: CPU TDP doesn't match expected value (found %d, expected %d)\n",
-                        tdp / 8, limits->mcp_power_limit / 1000);
+       if (limits->core_power_limit != (tdp / 8) * 1000) {
+               dev_info(&ips->dev->dev, "CPU TDP doesn't match expected value (found %d, expected %d)\n",
+                        tdp / 8, limits->core_power_limit / 1000);
+               limits->core_power_limit = (tdp / 8) * 1000;
         }
  
  out:
@@ -1390,7 +1443,7 @@ static bool ips_get_i915_syms(struct ips_driver *ips)
         return true;
  
  out_put_busy:
-       symbol_put(i915_gpu_turbo_disable);
+       symbol_put(i915_gpu_busy);
  out_put_lower:
         symbol_put(i915_gpu_lower);
  out_put_raise:
@@ -1532,22 +1585,27 @@ static int ips_probe(struct pci_dev *dev, const struct pci_device_id *id)
         /* Save turbo limits & ratios */
         rdmsrl(TURBO_POWER_CURRENT_LIMIT, ips->orig_turbo_limit);
  
-       ips_enable_cpu_turbo(ips);
-       ips->cpu_turbo_enabled = true;
+       ips_disable_cpu_turbo(ips);
+       ips->cpu_turbo_enabled = false;
  
-       /* Set up the work queue and monitor/adjust threads */
-       ips->monitor = kthread_run(ips_monitor, ips, "ips-monitor");
-       if (IS_ERR(ips->monitor)) {
+       /* Create thermal adjust thread */
+       ips->adjust = kthread_create(ips_adjust, ips, "ips-adjust");
+       if (IS_ERR(ips->adjust)) {
                 dev_err(&dev->dev,
-                       "failed to create thermal monitor thread, aborting\n");
+                       "failed to create thermal adjust thread, aborting\n");
                 ret = -ENOMEM;
                 goto error_free_irq;
+
         }
  
-       ips->adjust = kthread_create(ips_adjust, ips, "ips-adjust");
-       if (IS_ERR(ips->adjust)) {
+       /*
+        * Set up the work queue and monitor thread. The monitor thread
+        * will wake up ips_adjust thread.
+        */
+       ips->monitor = kthread_run(ips_monitor, ips, "ips-monitor");
+       if (IS_ERR(ips->monitor)) {
                 dev_err(&dev->dev,
-                       "failed to create thermal adjust thread, aborting\n");
+                       "failed to create thermal monitor thread, aborting\n");
                 ret = -ENOMEM;
                 goto error_thread_cleanup;
         }
@@ -1566,7 +1624,7 @@ static int ips_probe(struct pci_dev *dev, const struct pci_device_id *id)
         return ret;
  
  error_thread_cleanup:
-       kthread_stop(ips->monitor);
+       kthread_stop(ips->adjust);
  error_free_irq:
         free_irq(ips->dev->irq, ips);
  error_unmap:
diff --git a/drivers/regulator/ad5398.c b/drivers/regulator/ad5398.c
index df1fb53c09d266ff9369add1303c47f3d24702ab..a4be41614eebd41fa5733c09a7e9645c64c72c59 100644
--- a/drivers/regulator/ad5398.c
+++ b/drivers/regulator/ad5398.c
@@ -256,7 +256,6 @@ static int __devexit ad5398_remove(struct i2c_client *client)
  
         regulator_unregister(chip->rdev);
         kfree(chip);
-       i2c_set_clientdata(client, NULL);
  
         return 0;
  }
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 422a709d271d51d593899db82b0b930cf93f2847..cc8b337b9119de5e955aabe1935ad931a895c71a 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -700,7 +700,7 @@ static void print_constraints(struct regulator_dev *rdev)
             constraints->min_uA != constraints->max_uA) {
                 ret = _regulator_get_current_limit(rdev);
                 if (ret > 0)
-                       count += sprintf(buf + count, "at %d uA ", ret / 1000);
+                       count += sprintf(buf + count, "at %d mA ", ret / 1000);
         }
  
         if (constraints->valid_modes_mask & REGULATOR_MODE_FAST)
@@ -2302,8 +2302,10 @@ struct regulator_dev *regulator_register(struct regulator_desc *regulator_desc,
         dev_set_name(&rdev->dev, "regulator.%d",
                      atomic_inc_return(&regulator_no) - 1);
         ret = device_register(&rdev->dev);
-       if (ret != 0)
+       if (ret != 0) {
+               put_device(&rdev->dev);
                 goto clean;
+       }
  
         dev_set_drvdata(&rdev->dev, rdev);
  
diff --git a/drivers/regulator/isl6271a-regulator.c b/drivers/regulator/isl6271a-regulator.c
index d61ecb885a8c94e857e78dcb23452ee9540385f0..b8cc6389a541a0e3cbc0cc4a60231c166cd2e2e1 100644
--- a/drivers/regulator/isl6271a-regulator.c
+++ b/drivers/regulator/isl6271a-regulator.c
@@ -191,8 +191,6 @@ static int __devexit isl6271a_remove(struct i2c_client *i2c)
         struct isl_pmic *pmic = i2c_get_clientdata(i2c);
         int i;
  
-       i2c_set_clientdata(i2c, NULL);
-
         for (i = 0; i < 3; i++)
                 regulator_unregister(pmic->rdev[i]);
  
diff --git a/drivers/regulator/max8649.c b/drivers/regulator/max8649.c
index 4520ace3f7e707f82ccbf0fb068df921dfc6c2df..6b60a9c0366b3c5236fa7019844274c8b1155b3e 100644
--- a/drivers/regulator/max8649.c
+++ b/drivers/regulator/max8649.c
@@ -330,7 +330,7 @@ static int __devinit max8649_regulator_probe(struct i2c_client *client,
                 /* set external clock frequency */
                 info->extclk_freq = pdata->extclk_freq;
                 max8649_set_bits(info->i2c, MAX8649_SYNC, MAX8649_EXT_MASK,
-                                info->extclk_freq);
+                                info->extclk_freq << 6);
         }
  
         if (pdata->ramp_timing) {
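The max8649 fix above shifts the clock selector into position before it is merged into the register under a mask. The sketch below shows why an unshifted value is wrong. It assumes a two-bit field at bits 7:6 (suggested by the `<< 6` in the fix, but not confirmed by the diff) and a set-bits helper that clears the masked bits and ORs in the value. `EXT_MASK`, `set_bits` and `set_extclk` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_SHIFT 6
#define EXT_MASK  (0x3u << EXT_SHIFT)  /* assumed: two-bit field in bits 7:6 */

/* Clear the masked bits, then OR in val (val is not masked here). */
static uint8_t set_bits(uint8_t reg, uint8_t mask, uint8_t val)
{
    return (uint8_t)((reg & ~mask) | val);
}

/* Place a 0..3 selector into the field, as the fixed code does. */
static uint8_t set_extclk(uint8_t reg, unsigned int freq_sel)
{
    assert(freq_sel <= 0x3);
    return set_bits(reg, EXT_MASK, (uint8_t)(freq_sel << EXT_SHIFT));
}
```

With this helper, passing the raw selector would leave the field at zero and disturb the low bits of the register instead.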
diff --git a/drivers/rtc/rtc-ds3232.c b/drivers/rtc/rtc-ds3232.c
index 9daed8db83d3e5400559ac3c51c86d0e6b45f00d..9de8516e3531e70bad818747f41de4b8052486bd 100644
--- a/drivers/rtc/rtc-ds3232.c
+++ b/drivers/rtc/rtc-ds3232.c
@@ -268,7 +268,6 @@ out_irq:
                 free_irq(client->irq, client);
  
  out_free:
-       i2c_set_clientdata(client, NULL);
         kfree(ds3232);
         return ret;
  }
@@ -287,7 +286,6 @@ static int __devexit ds3232_remove(struct i2c_client *client)
         }
  
         rtc_device_unregister(ds3232->rtc);
-       i2c_set_clientdata(client, NULL);
         kfree(ds3232);
         return 0;
  }
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index ad0ed212db4ad094441f7a5656639e989c980738..348fba0a8976467fa9724411699b918a9cd8b0ea 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1046,13 +1046,13 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf,
  
         /* If the user actually wanted this page, we can skip the rest */
         if (page == 0)
-               return -EINVAL;
+               return 0;
  
         for (i = 0; i < min((int)buf[3], buf_len - 4); i++)
                 if (buf[i + 4] == page)
                         goto found;
  
-       if (i < buf[3] && i > buf_len)
+       if (i < buf[3] && i >= buf_len - 4)
                 /* ran off the end of the buffer, give us benefit of doubt */
                 goto found;
         /* The device claims it doesn't support the requested page */
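The second scsi.c change above corrects the "ran off the end of the buffer" test. The loop stops at `min(buf[3], buf_len - 4)`, so `i` can never exceed `buf_len` and the old `i > buf_len` condition was never true. The sketch below isolates the page-list search with the corrected bound. `vpd_page_listed` is a hypothetical name, and the layout (count in `buf[3]`, page codes from `buf[4]`) is taken from the hunk.

```c
#include <assert.h>
#include <stddef.h>

/*
 * buf holds a "supported pages" response: buf[3] is the number of page
 * codes and the codes start at buf[4]. Only buf_len - 4 codes fit in buf.
 * Returns 1 if the page should be treated as supported, 0 otherwise.
 */
static int vpd_page_listed(const unsigned char *buf, int buf_len,
                           unsigned char page)
{
    int i, listed, avail;

    assert(buf != NULL && buf_len >= 4);
    listed = buf[3];
    avail = buf_len - 4;

    for (i = 0; i < listed && i < avail; i++)
        if (buf[i + 4] == page)
            return 1;

    /* The list was cut short by the buffer: benefit of the doubt. */
    if (i < listed && i >= avail)
        return 1;

    return 0;
}
```

When the device reports more pages than the buffer can hold, the page is assumed to be present; when the full list was read and the page is absent, the lookup fails.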
diff --git a/drivers/serial/ioc3_serial.c b/drivers/serial/ioc3_serial.c
index 93de907b12088a54ab5809cf79c46ba5ec2757f2..800c54602339780cc79d615e16fb89f49bf3d5f5 100644
--- a/drivers/serial/ioc3_serial.c
+++ b/drivers/serial/ioc3_serial.c
@@ -2044,6 +2044,7 @@ ioc3uart_probe(struct ioc3_submodule *is, struct ioc3_driver_data *idd)
                 if (!port) {
                         printk(KERN_WARNING
                                "IOC3 serial memory not available for port\n");
+                       ret = -ENOMEM;
                         goto out4;
                 }
                 spin_lock_init(&port->ip_lock);
diff --git a/drivers/serial/mfd.c b/drivers/serial/mfd.c
index 324c385a653d3e0dc8240b0c28c2eab560ee9382..5dff45c76d32c5f026dc1ae408166a4f5e9ac6d2 100644
--- a/drivers/serial/mfd.c
+++ b/drivers/serial/mfd.c
@@ -27,6 +27,7 @@
  #include <linux/init.h>
  #include <linux/console.h>
  #include <linux/sysrq.h>
+#include <linux/slab.h>
  #include <linux/serial_reg.h>
  #include <linux/circ_buf.h>
  #include <linux/delay.h>
diff --git a/drivers/serial/mrst_max3110.c b/drivers/serial/mrst_max3110.c
index f6ad1ecbff79ec9688fbc024a2036f351efff26e..51c15f58e01ef84a8ce43ba8b62472f0e4d7bec0 100644
--- a/drivers/serial/mrst_max3110.c
+++ b/drivers/serial/mrst_max3110.c
@@ -29,6 +29,7 @@
  
  #include <linux/module.h>
  #include <linux/ioport.h>
+#include <linux/irq.h>
  #include <linux/init.h>
  #include <linux/console.h>
  #include <linux/sysrq.h>
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 0bcf4c1601a23a8225c2c4395c2309e191439f5d..b5a78a1f4421a0c19aa8735bd49f076fd8f91b46 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -23,6 +23,7 @@
  #include <linux/init.h>
  #include <linux/cache.h>
  #include <linux/mutex.h>
+#include <linux/of_device.h>
  #include <linux/slab.h>
  #include <linux/mod_devicetable.h>
  #include <linux/spi/spi.h>
@@ -86,6 +87,10 @@ static int spi_match_device(struct device *dev, struct device_driver *drv)
         const struct spi_device *spi = to_spi_device(dev);
         const struct spi_driver *sdrv = to_spi_driver(drv);
  
+       /* Attempt an OF style match */
+       if (of_driver_match_device(dev, drv))
+               return 1;
+
         if (sdrv->id_table)
                 return !!spi_match_id(sdrv->id_table, spi);
  
diff --git a/drivers/spi/spi_gpio.c b/drivers/spi/spi_gpio.c
index e24a63498acb84f7e4f9f83710aca3fadefcc177..63e51b011d508c8922bb047cfb94572049323493 100644
--- a/drivers/spi/spi_gpio.c
+++ b/drivers/spi/spi_gpio.c
@@ -350,7 +350,7 @@ static int __init spi_gpio_probe(struct platform_device *pdev)
         spi_gpio->bitbang.master = spi_master_get(master);
         spi_gpio->bitbang.chipselect = spi_gpio_chipselect;
  
-       if ((master_flags & (SPI_MASTER_NO_RX | SPI_MASTER_NO_RX)) == 0) {
+       if ((master_flags & (SPI_MASTER_NO_TX | SPI_MASTER_NO_RX)) == 0) {
                 spi_gpio->bitbang.txrx_word[SPI_MODE_0] = spi_gpio_txrx_word_mode0;
                 spi_gpio->bitbang.txrx_word[SPI_MODE_1] = spi_gpio_txrx_word_mode1;
                 spi_gpio->bitbang.txrx_word[SPI_MODE_2] = spi_gpio_txrx_word_mode2;
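The spi_gpio.c fix above replaces a duplicated flag in a mask: `NO_RX | NO_RX` is just `NO_RX`, so the old test ignored the no-TX flag entirely. The sketch below contrasts the two tests. The flag values are illustrative and may not match the kernel's bit assignments, and `can_txrx_old` and `can_txrx_new` are hypothetical names.

```c
#include <assert.h>

/* Illustrative bit values only. */
#define FLAG_NO_RX 0x1u
#define FLAG_NO_TX 0x2u

/* Old test: the duplicated flag means NO_TX is never examined. */
static int can_txrx_old(unsigned int flags)
{
    return (flags & (FLAG_NO_RX | FLAG_NO_RX)) == 0;
}

/* Fixed test: full-duplex only when neither restriction is set. */
static int can_txrx_new(unsigned int flags)
{
    assert((flags & ~(FLAG_NO_TX | FLAG_NO_RX)) == 0);
    return (flags & (FLAG_NO_TX | FLAG_NO_RX)) == 0;
}
```

With only the no-TX flag set, the old test still selects the full-duplex word functions, while the fixed test does not.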
diff --git a/drivers/spi/spi_mpc8xxx.c b/drivers/spi/spi_mpc8xxx.c
index d31b57f7baaf3c3bf970294afd13a1396cecf46c..1dd86b835cd86a800aa8db8257e4a37422a33b8a 100644
--- a/drivers/spi/spi_mpc8xxx.c
+++ b/drivers/spi/spi_mpc8xxx.c
@@ -408,11 +408,17 @@ static void mpc8xxx_spi_cpm_bufs_start(struct mpc8xxx_spi *mspi)
  
         xfer_ofs = mspi->xfer_in_progress->len - mspi->count;
  
-       out_be32(&rx_bd->cbd_bufaddr, mspi->rx_dma + xfer_ofs);
+       if (mspi->rx_dma == mspi->dma_dummy_rx)
+               out_be32(&rx_bd->cbd_bufaddr, mspi->rx_dma);
+       else
+               out_be32(&rx_bd->cbd_bufaddr, mspi->rx_dma + xfer_ofs);
         out_be16(&rx_bd->cbd_datlen, 0);
         out_be16(&rx_bd->cbd_sc, BD_SC_EMPTY | BD_SC_INTRPT | BD_SC_WRAP);
  
-       out_be32(&tx_bd->cbd_bufaddr, mspi->tx_dma + xfer_ofs);
+       if (mspi->tx_dma == mspi->dma_dummy_tx)
+               out_be32(&tx_bd->cbd_bufaddr, mspi->tx_dma);
+       else
+               out_be32(&tx_bd->cbd_bufaddr, mspi->tx_dma + xfer_ofs);
         out_be16(&tx_bd->cbd_datlen, xfer_len);
         out_be16(&tx_bd->cbd_sc, BD_SC_READY | BD_SC_INTRPT | BD_SC_WRAP |
                                  BD_SC_LAST);
diff --git a/drivers/staging/tm6000/Kconfig b/drivers/staging/tm6000/Kconfig
index c725356cc3466ec5963290415765aab5fb50188a..de7ebb99d8f6cc61ff600b2d9580627f1a12e78d 100644
--- a/drivers/staging/tm6000/Kconfig
+++ b/drivers/staging/tm6000/Kconfig
@@ -1,6 +1,6 @@
  config VIDEO_TM6000
         tristate "TV Master TM5600/6000/6010 driver"
-       depends on VIDEO_DEV && I2C && INPUT && USB && EXPERIMENTAL
+       depends on VIDEO_DEV && I2C && INPUT && IR_CORE && USB && EXPERIMENTAL
         select VIDEO_TUNER
         select MEDIA_TUNER_XC2028
         select MEDIA_TUNER_XC5000
diff --git a/drivers/staging/tm6000/tm6000-input.c b/drivers/staging/tm6000/tm6000-input.c
index 32f7a0af6938094380e9de73b933f652f0a00612..54f7667cc7062b640f2fb97717f7c7c26768f2cf 100644
--- a/drivers/staging/tm6000/tm6000-input.c
+++ b/drivers/staging/tm6000/tm6000-input.c
@@ -46,7 +46,7 @@ MODULE_PARM_DESC(enable_ir, "enable ir (default is enable");
         }
  
  struct tm6000_ir_poll_result {
-       u8 rc_data[4];
+       u16 rc_data;
  };
  
  struct tm6000_IR {
@@ -60,9 +60,9 @@ struct tm6000_IR {
         int                     polling;
         struct delayed_work     work;
         u8                      wait:1;
+       u8                      key:1;
         struct urb              *int_urb;
         u8                      *urb_data;
-       u8                      key:1;
  
         int (*get_key) (struct tm6000_IR *, struct tm6000_ir_poll_result *);
  
@@ -122,13 +122,14 @@ static void tm6000_ir_urb_received(struct urb *urb)
  
         if (urb->status != 0)
                 printk(KERN_INFO "not ready\n");
-       else if (urb->actual_length > 0)
+       else if (urb->actual_length > 0) {
                 memcpy(ir->urb_data, urb->transfer_buffer, urb->actual_length);
  
-       dprintk("data %02x %02x %02x %02x\n", ir->urb_data[0],
-       ir->urb_data[1], ir->urb_data[2], ir->urb_data[3]);
+               dprintk("data %02x %02x %02x %02x\n", ir->urb_data[0],
+                       ir->urb_data[1], ir->urb_data[2], ir->urb_data[3]);
  
-       ir->key = 1;
+               ir->key = 1;
+       }
  
         rc = usb_submit_urb(urb, GFP_ATOMIC);
  }
@@ -140,30 +141,47 @@ static int default_polling_getkey(struct tm6000_IR *ir,
         int rc;
         u8 buf[2];
  
-       if (ir->wait && !&dev->int_in) {
-               poll_result->rc_data[0] = 0xff;
+       if (ir->wait && !&dev->int_in)
                 return 0;
-       }
  
         if (&dev->int_in) {
-               poll_result->rc_data[0] = ir->urb_data[0];
-               poll_result->rc_data[1] = ir->urb_data[1];
+               if (ir->ir.ir_type == IR_TYPE_RC5)
+                       poll_result->rc_data = ir->urb_data[0];
+               else
+                       poll_result->rc_data = ir->urb_data[0] | ir->urb_data[1] << 8;
         } else {
                 tm6000_set_reg(dev, REQ_04_EN_DISABLE_MCU_INT, 2, 0);
                 msleep(10);
                 tm6000_set_reg(dev, REQ_04_EN_DISABLE_MCU_INT, 2, 1);
                 msleep(10);
  
-               rc = tm6000_read_write_usb(dev, USB_DIR_IN | USB_TYPE_VENDOR |
-                USB_RECIP_DEVICE, REQ_02_GET_IR_CODE, 0, 0, buf, 1);
+               if (ir->ir.ir_type == IR_TYPE_RC5) {
+                       rc = tm6000_read_write_usb(dev, USB_DIR_IN |
+                               USB_TYPE_VENDOR | USB_RECIP_DEVICE,
+                               REQ_02_GET_IR_CODE, 0, 0, buf, 1);
  
-               msleep(10);
+                       msleep(10);
  
-               dprintk("read data=%02x\n", buf[0]);
-               if (rc < 0)
-                       return rc;
+                       dprintk("read data=%02x\n", buf[0]);
+                       if (rc < 0)
+                               return rc;
  
-               poll_result->rc_data[0] = buf[0];
+                       poll_result->rc_data = buf[0];
+               } else {
+                       rc = tm6000_read_write_usb(dev, USB_DIR_IN |
+                               USB_TYPE_VENDOR | USB_RECIP_DEVICE,
+                               REQ_02_GET_IR_CODE, 0, 0, buf, 2);
+
+                       msleep(10);
+
+                       dprintk("read data=%04x\n", buf[0] | buf[1] << 8);
+                       if (rc < 0)
+                               return rc;
+
+                       poll_result->rc_data = buf[0] | buf[1] << 8;
+               }
+               if ((poll_result->rc_data & 0x00ff) != 0xff)
+                       ir->key = 1;
         }
         return 0;
  }
@@ -180,12 +198,11 @@ static void tm6000_ir_handle_key(struct tm6000_IR *ir)
                 return;
         }
  
-       dprintk("ir->get_key result data=%02x %02x\n",
-               poll_result.rc_data[0], poll_result.rc_data[1]);
+       dprintk("ir->get_key result data=%04x\n", poll_result.rc_data);
  
-       if (poll_result.rc_data[0] != 0xff && ir->key == 1) {
+       if (ir->key) {
                 ir_input_keydown(ir->input->input_dev, &ir->ir,
-                       poll_result.rc_data[0] | poll_result.rc_data[1] << 8);
+                               (u32)poll_result.rc_data);
  
                 ir_input_nokey(ir->input->input_dev, &ir->ir);
                 ir->key = 0;
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 29bac5118877ef2028a781e6b409e2fe36463a99..d409495876f11b24fabaebc01d57f02224fe459f 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -755,7 +755,10 @@ int register_xenstore_notifier(struct notifier_block *nb)
  {
         int ret = 0;
  
-       blocking_notifier_chain_register(&xenstore_chain, nb);
+       if (xenstored_ready > 0)
+               ret = nb->notifier_call(nb, 0, NULL);
+       else
+               blocking_notifier_chain_register(&xenstore_chain, nb);
  
         return ret;
  }
@@ -769,7 +772,7 @@ EXPORT_SYMBOL_GPL(unregister_xenstore_notifier);
  
  void xenbus_probe(struct work_struct *unused)
  {
-       BUG_ON((xenstored_ready <= 0));
+       xenstored_ready = 1;
  
         /* Enumerate devices in xenstore and watch for changes. */
         xenbus_probe_devices(&xenbus_frontend);
@@ -835,8 +838,8 @@ static int __init xenbus_init(void)
                         xen_store_evtchn = xen_start_info->store_evtchn;
                         xen_store_mfn = xen_start_info->store_mfn;
                         xen_store_interface = mfn_to_virt(xen_store_mfn);
+                       xenstored_ready = 1;
                 }
-               xenstored_ready = 1;
         }
  
         /* Initialize the interface to xenstore. */
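The `register_xenstore_notifier()` change above follows a common pattern: if the event a caller wants to hear about has already happened, its callback is invoked at once instead of being queued. The sketch below is a minimal userspace illustration of that pattern, not the kernel implementation. `ready_chain`, `chain_register`, `chain_mark_ready` and `count_call` are hypothetical names, and the fixed-size array stands in for the kernel's blocking notifier chain. It assumes readiness is later announced by walking the queued callbacks, which this hunk does not show.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ready_cb)(void *ctx);

#define MAX_CBS 8

struct ready_chain {
    int ready;             /* non-zero once the event has happened */
    size_t n;              /* number of queued callbacks */
    ready_cb cb[MAX_CBS];
    void *ctx[MAX_CBS];
};

/* Call immediately if already ready, otherwise queue for later. */
static int chain_register(struct ready_chain *c, ready_cb cb, void *ctx)
{
    assert(c != NULL && cb != NULL);
    if (c->ready)
        return cb(ctx);
    if (c->n == MAX_CBS)
        return -1;
    c->cb[c->n] = cb;
    c->ctx[c->n] = ctx;
    c->n++;
    return 0;
}

/* Mark the event as having happened and run everything queued so far. */
static void chain_mark_ready(struct ready_chain *c)
{
    size_t i;

    assert(c != NULL);
    c->ready = 1;
    for (i = 0; i < c->n; i++)
        c->cb[i](c->ctx[i]);
    c->n = 0;
}

/* Example callback: counts how many times it has been invoked. */
static int count_call(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}
```

A late registrant is served on the spot, so it does not wait for a notification that has already been delivered.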
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index f96eff04e11ab4a8b23f7489ee4b0de50e67e152..a6395bdb26aeb13b7b98c74df4f77780c1c95412 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -134,10 +134,6 @@ static int aout_core_dump(struct coredump_params *cprm)
                 if (!dump_write(file, dump_start, dump_size))
                         goto end_coredump;
         }
-/* Finally dump the task struct.  Not be used by gdb, but could be useful */
-       set_fs(KERNEL_DS);
-       if (!dump_write(file, current, sizeof(*current)))
-               goto end_coredump;
  end_coredump:
         set_fs(fs);
         return has_dumped;
diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index 0fcd2640c23fdda2c7fd99415b730837c591eba8..9eb134ea6eb223a45be745c77f9530b80fcdc082 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -1,9 +1,11 @@
  config CEPH_FS
          tristate "Ceph distributed file system (EXPERIMENTAL)"
         depends on INET && EXPERIMENTAL
+       select CEPH_LIB
         select LIBCRC32C
         select CRYPTO_AES
         select CRYPTO
+       default n
         help
           Choose Y or M here to include support for mounting the
           experimental Ceph distributed file system.  Ceph is an extremely
@@ -14,15 +16,3 @@ config CEPH_FS
  
           If unsure, say N.
  
-config CEPH_FS_PRETTYDEBUG
-       bool "Include file:line in ceph debug output"
-       depends on CEPH_FS
-       default n
-       help
-         If you say Y here, debug output will include a filename and
-         line to aid debugging.  This icnreases kernel size and slows
-         execution slightly when debug call sites are enabled (e.g.,
-         via CONFIG_DYNAMIC_DEBUG).
-
-         If unsure, say N.
-
diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 278e1172600dc3a3d5acba3654c53719d6f38697..9e6c4f2e8ff1f3e2712979d791da9e55fa780982 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -8,15 +8,8 @@ obj-$(CONFIG_CEPH_FS) += ceph.o
  
  ceph-objs := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
         export.o caps.o snap.o xattr.o \
-       messenger.o msgpool.o buffer.o pagelist.o \
-       mds_client.o mdsmap.o \
-       mon_client.o \
-       osd_client.o osdmap.o crush/crush.o crush/mapper.o crush/hash.o \
-       debugfs.o \
-       auth.o auth_none.o \
-       crypto.o armor.o \
-       auth_x.o \
-       ceph_fs.o ceph_strings.o ceph_hash.o ceph_frag.o
+       mds_client.o mdsmap.o strings.o ceph_frag.o \
+       debugfs.o
  
  else
  #Otherwise we were called directly from the command
diff --git a/fs/ceph/README b/fs/ceph/README
deleted file mode 100644
index 18352fa..0000000
--- a/fs/ceph/README
+++ /dev/null
@@ -1,20 +0,0 @@
-#
-# The following files are shared by (and manually synchronized
-# between) the Ceph userland and kernel client.
-#
-# userland                  kernel
-src/include/ceph_fs.h      fs/ceph/ceph_fs.h
-src/include/ceph_fs.cc     fs/ceph/ceph_fs.c
-src/include/msgr.h         fs/ceph/msgr.h
-src/include/rados.h        fs/ceph/rados.h
-src/include/ceph_strings.cc fs/ceph/ceph_strings.c
-src/include/ceph_frag.h            fs/ceph/ceph_frag.h
-src/include/ceph_frag.cc    fs/ceph/ceph_frag.c
-src/include/ceph_hash.h            fs/ceph/ceph_hash.h
-src/include/ceph_hash.cc    fs/ceph/ceph_hash.c
-src/crush/crush.c          fs/ceph/crush/crush.c
-src/crush/crush.h          fs/ceph/crush/crush.h
-src/crush/mapper.c         fs/ceph/crush/mapper.c
-src/crush/mapper.h         fs/ceph/crush/mapper.h
-src/crush/hash.h           fs/ceph/crush/hash.h
-src/crush/hash.c           fs/ceph/crush/hash.c
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index efbc604001c8bbfa3b96eb107529f0f03256b1a1..51bcc5ce323024a995d300b4a6035ecf8e72e94b 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1,4 +1,4 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/backing-dev.h>
  #include <linux/fs.h>
@@ -10,7 +10,8 @@
  #include <linux/task_io_accounting_ops.h>
  
  #include "super.h"
-#include "osd_client.h"
+#include "mds_client.h"
+#include <linux/ceph/osd_client.h>
  
  /*
   * Ceph address space ops.
@@ -193,7 +194,8 @@ static int readpage_nounlock(struct file *filp, struct page *page)
  {
         struct inode *inode = filp->f_dentry->d_inode;
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_osd_client *osdc = &ceph_inode_to_client(inode)->osdc;
+       struct ceph_osd_client *osdc = 
+               &ceph_inode_to_client(inode)->client->osdc;
         int err = 0;
         u64 len = PAGE_CACHE_SIZE;
  
@@ -265,7 +267,8 @@ static int ceph_readpages(struct file *file, struct address_space *mapping,
  {
         struct inode *inode = file->f_dentry->d_inode;
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_osd_client *osdc = &ceph_inode_to_client(inode)->osdc;
+       struct ceph_osd_client *osdc =
+               &ceph_inode_to_client(inode)->client->osdc;
         int rc = 0;
         struct page **pages;
         loff_t offset;
@@ -365,7 +368,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
  {
         struct inode *inode;
         struct ceph_inode_info *ci;
-       struct ceph_client *client;
+       struct ceph_fs_client *fsc;
         struct ceph_osd_client *osdc;
         loff_t page_off = page->index << PAGE_CACHE_SHIFT;
         int len = PAGE_CACHE_SIZE;
@@ -383,8 +386,8 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
         }
         inode = page->mapping->host;
         ci = ceph_inode(inode);
-       client = ceph_inode_to_client(inode);
-       osdc = &client->osdc;
+       fsc = ceph_inode_to_client(inode);
+       osdc = &fsc->client->osdc;
  
         /* verify this is a writeable snap context */
         snapc = (void *)page->private;
@@ -414,10 +417,10 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
         dout("writepage %p page %p index %lu on %llu~%u snapc %p\n",
              inode, page, page->index, page_off, len, snapc);
  
-       writeback_stat = atomic_long_inc_return(&client->writeback_count);
+       writeback_stat = atomic_long_inc_return(&fsc->writeback_count);
         if (writeback_stat >
-           CONGESTION_ON_THRESH(client->mount_args->congestion_kb))
-               set_bdi_congested(&client->backing_dev_info, BLK_RW_ASYNC);
+           CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
+               set_bdi_congested(&fsc->backing_dev_info, BLK_RW_ASYNC);
  
         set_page_writeback(page);
         err = ceph_osdc_writepages(osdc, ceph_vino(inode),
@@ -496,7 +499,7 @@ static void writepages_finish(struct ceph_osd_request *req,
         struct address_space *mapping = inode->i_mapping;
         __s32 rc = -EIO;
         u64 bytes = 0;
-       struct ceph_client *client = ceph_inode_to_client(inode);
+       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
         long writeback_stat;
         unsigned issued = ceph_caps_issued(ci);
  
@@ -529,10 +532,10 @@ static void writepages_finish(struct ceph_osd_request *req,
                 WARN_ON(!PageUptodate(page));
  
                 writeback_stat =
-                       atomic_long_dec_return(&client->writeback_count);
+                       atomic_long_dec_return(&fsc->writeback_count);
                 if (writeback_stat <
-                   CONGESTION_OFF_THRESH(client->mount_args->congestion_kb))
-                       clear_bdi_congested(&client->backing_dev_info,
+                   CONGESTION_OFF_THRESH(fsc->mount_options->congestion_kb))
+                       clear_bdi_congested(&fsc->backing_dev_info,
                                             BLK_RW_ASYNC);
  
                 ceph_put_snap_context((void *)page->private);
@@ -569,13 +572,13 @@ static void writepages_finish(struct ceph_osd_request *req,
   * mempool.  we avoid the mempool if we can because req->r_num_pages
   * may be less than the maximum write size.
   */
-static void alloc_page_vec(struct ceph_client *client,
+static void alloc_page_vec(struct ceph_fs_client *fsc,
                            struct ceph_osd_request *req)
  {
         req->r_pages = kmalloc(sizeof(struct page *) * req->r_num_pages,
                                GFP_NOFS);
         if (!req->r_pages) {
-               req->r_pages = mempool_alloc(client->wb_pagevec_pool, GFP_NOFS);
+               req->r_pages = mempool_alloc(fsc->wb_pagevec_pool, GFP_NOFS);
                 req->r_pages_from_pool = 1;
                 WARN_ON(!req->r_pages);
         }
@@ -590,7 +593,7 @@ static int ceph_writepages_start(struct address_space *mapping,
         struct inode *inode = mapping->host;
         struct backing_dev_info *bdi = mapping->backing_dev_info;
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_client *client;
+       struct ceph_fs_client *fsc;
         pgoff_t index, start, end;
         int range_whole = 0;
         int should_loop = 1;
@@ -617,13 +620,13 @@ static int ceph_writepages_start(struct address_space *mapping,
              wbc->sync_mode == WB_SYNC_NONE ? "NONE" :
              (wbc->sync_mode == WB_SYNC_ALL ? "ALL" : "HOLD"));
  
-       client = ceph_inode_to_client(inode);
-       if (client->mount_state == CEPH_MOUNT_SHUTDOWN) {
+       fsc = ceph_inode_to_client(inode);
+       if (fsc->mount_state == CEPH_MOUNT_SHUTDOWN) {
                 pr_warning("writepage_start %p on forced umount\n", inode);
                 return -EIO; /* we're in a forced umount, don't write! */
         }
-       if (client->mount_args->wsize && client->mount_args->wsize < wsize)
-               wsize = client->mount_args->wsize;
+       if (fsc->mount_options->wsize && fsc->mount_options->wsize < wsize)
+               wsize = fsc->mount_options->wsize;
         if (wsize < PAGE_CACHE_SIZE)
                 wsize = PAGE_CACHE_SIZE;
         max_pages_ever = wsize >> PAGE_CACHE_SHIFT;
@@ -769,7 +772,7 @@ get_more_pages:
                                 offset = (unsigned long long)page->index
                                         << PAGE_CACHE_SHIFT;
                                 len = wsize;
-                               req = ceph_osdc_new_request(&client->osdc,
+                               req = ceph_osdc_new_request(&fsc->client->osdc,
                                             &ci->i_layout,
                                             ceph_vino(inode),
                                             offset, &len,
@@ -782,7 +785,7 @@ get_more_pages:
                                             &inode->i_mtime, true, 1);
                                 max_pages = req->r_num_pages;
  
-                               alloc_page_vec(client, req);
+                               alloc_page_vec(fsc, req);
                                 req->r_callback = writepages_finish;
                                 req->r_inode = inode;
                         }
@@ -794,10 +797,10 @@ get_more_pages:
                              inode, page, page->index);
  
                         writeback_stat =
-                              atomic_long_inc_return(&client->writeback_count);
+                              atomic_long_inc_return(&fsc->writeback_count);
                         if (writeback_stat > CONGESTION_ON_THRESH(
-                                   client->mount_args->congestion_kb)) {
-                               set_bdi_congested(&client->backing_dev_info,
+                                   fsc->mount_options->congestion_kb)) {
+                               set_bdi_congested(&fsc->backing_dev_info,
                                                   BLK_RW_ASYNC);
                         }
  
@@ -846,7 +849,7 @@ get_more_pages:
                 op->payload_len = cpu_to_le32(len);
                 req->r_request->hdr.data_len = cpu_to_le32(len);
  
-               ceph_osdc_start_request(&client->osdc, req, true);
+               ceph_osdc_start_request(&fsc->client->osdc, req, true);
                 req = NULL;
  
                 /* continue? */
@@ -915,7 +918,7 @@ static int ceph_update_writeable_page(struct file *file,
  {
         struct inode *inode = file->f_dentry->d_inode;
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_mds_client *mdsc = &ceph_inode_to_client(inode)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
         loff_t page_off = pos & PAGE_CACHE_MASK;
         int pos_in_page = pos & ~PAGE_CACHE_MASK;
         int end_in_page = pos_in_page + len;
@@ -1053,8 +1056,8 @@ static int ceph_write_end(struct file *file, struct address_space *mapping,
                           struct page *page, void *fsdata)
  {
         struct inode *inode = file->f_dentry->d_inode;
-       struct ceph_client *client = ceph_inode_to_client(inode);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         unsigned from = pos & (PAGE_CACHE_SIZE - 1);
         int check_cap = 0;
  
@@ -1123,7 +1126,7 @@ static int ceph_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
  {
         struct inode *inode = vma->vm_file->f_dentry->d_inode;
         struct page *page = vmf->page;
-       struct ceph_mds_client *mdsc = &ceph_inode_to_client(inode)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
         loff_t off = page->index << PAGE_CACHE_SHIFT;
         loff_t size, len;
         int ret;
diff --git a/fs/ceph/armor.c b/fs/ceph/armor.c

deleted file mode 100644
index eb2a666..0000000
--- a/fs/ceph/armor.c
+++ /dev/null
@@ -1,103 +0,0 @@
-
-#include <linux/errno.h>
-
-int ceph_armor(char *dst, const char *src, const char *end);
-int ceph_unarmor(char *dst, const char *src, const char *end);
-
-/*
- * base64 encode/decode.
- */
-
-static const char *pem_key =
-       "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
-
-static int encode_bits(int c)
-{
-       return pem_key[c];
-}
-
-static int decode_bits(char c)
-{
-       if (c >= 'A' && c <= 'Z')
-               return c - 'A';
-       if (c >= 'a' && c <= 'z')
-               return c - 'a' + 26;
-       if (c >= '0' && c <= '9')
-               return c - '0' + 52;
-       if (c == '+')
-               return 62;
-       if (c == '/')
-               return 63;
-       if (c == '=')
-               return 0; /* just non-negative, please */
-       return -EINVAL;
-}
-
-int ceph_armor(char *dst, const char *src, const char *end)
-{
-       int olen = 0;
-       int line = 0;
-
-       while (src < end) {
-               unsigned char a, b, c;
-
-               a = *src++;
-               *dst++ = encode_bits(a >> 2);
-               if (src < end) {
-                       b = *src++;
-                       *dst++ = encode_bits(((a & 3) << 4) | (b >> 4));
-                       if (src < end) {
-                               c = *src++;
-                               *dst++ = encode_bits(((b & 15) << 2) |
-                                                    (c >> 6));
-                               *dst++ = encode_bits(c & 63);
-                       } else {
-                               *dst++ = encode_bits((b & 15) << 2);
-                               *dst++ = '=';
-                       }
-               } else {
-                       *dst++ = encode_bits(((a & 3) << 4));
-                       *dst++ = '=';
-                       *dst++ = '=';
-               }
-               olen += 4;
-               line += 4;
-               if (line == 64) {
-                       line = 0;
-                       *(dst++) = '\n';
-                       olen++;
-               }
-       }
-       return olen;
-}
-
-int ceph_unarmor(char *dst, const char *src, const char *end)
-{
-       int olen = 0;
-
-       while (src < end) {
-               int a, b, c, d;
-
-               if (src < end && src[0] == '\n')
-                       src++;
-               if (src + 4 > end)
-                       return -EINVAL;
-               a = decode_bits(src[0]);
-               b = decode_bits(src[1]);
-               c = decode_bits(src[2]);
-               d = decode_bits(src[3]);
-               if (a < 0 || b < 0 || c < 0 || d < 0)
-                       return -EINVAL;
-
-               *dst++ = (a << 2) | (b >> 4);
-               if (src[2] == '=')
-                       return olen + 1;
-               *dst++ = ((b & 15) << 4) | (c >> 2);
-               if (src[3] == '=')
-                       return olen + 2;
-               *dst++ = ((c & 3) << 6) | d;
-               olen += 3;
-               src += 4;
-       }
-       return olen;
-}
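
Note on the armor.c hunk above: the removed ceph_armor()/ceph_unarmor() pair is a base64 codec using the standard alphabet with '=' padding. The following is a minimal userspace sketch of the same 3-byte to 4-character mapping, not the kernel code. The names b64_alphabet, b64_value, b64_encode and b64_decode are hypothetical, and unlike ceph_armor() the sketch does not insert a newline after every 64 output characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical standalone helpers; names are illustrative, not kernel API. */
static const char b64_alphabet[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Map one base64 character to its 6-bit value, or -1 if it is not valid. */
static int b64_value(char c)
{
	const char *p;

	if (c == '\0')
		return -1;
	p = strchr(b64_alphabet, c);
	return p ? (int)(p - b64_alphabet) : -1;
}

/*
 * Encode len bytes from src.  dst must hold 4 * ((len + 2) / 3) bytes.
 * Returns the number of characters written (no terminator, no line wrap).
 */
static size_t b64_encode(char *dst, const unsigned char *src, size_t len)
{
	size_t i, o = 0;

	for (i = 0; i < len; i += 3) {
		unsigned a = src[i];
		unsigned b = (i + 1 < len) ? src[i + 1] : 0;
		unsigned c = (i + 2 < len) ? src[i + 2] : 0;

		dst[o++] = b64_alphabet[a >> 2];
		dst[o++] = b64_alphabet[((a & 3) << 4) | (b >> 4)];
		dst[o++] = (i + 1 < len) ?
			b64_alphabet[((b & 15) << 2) | (c >> 6)] : '=';
		dst[o++] = (i + 2 < len) ? b64_alphabet[c & 63] : '=';
	}
	return o;
}

/*
 * Decode len characters (len must be a multiple of 4, no newlines).
 * Returns the number of bytes written, or -1 on malformed input.
 */
static long b64_decode(unsigned char *dst, const char *src, size_t len)
{
	size_t i;
	long o = 0;

	if (len % 4)
		return -1;
	for (i = 0; i < len; i += 4) {
		int pad3 = (src[i + 2] == '=');
		int pad4 = (src[i + 3] == '=');
		int a = b64_value(src[i]);
		int b = b64_value(src[i + 1]);
		int c = pad3 ? 0 : b64_value(src[i + 2]);
		int d = pad4 ? 0 : b64_value(src[i + 3]);

		if (a < 0 || b < 0 || c < 0 || d < 0)
			return -1;
		/* padding is only accepted in the final group, as "x=" or "==" */
		if ((pad3 || pad4) && i + 4 != len)
			return -1;
		if (pad3 && !pad4)
			return -1;

		dst[o++] = (unsigned char)((a << 2) | (b >> 4));
		if (pad3)
			break;
		dst[o++] = (unsigned char)(((b & 15) << 4) | (c >> 2));
		if (pad4)
			break;
		dst[o++] = (unsigned char)(((c & 3) << 6) | d);
	}
	return o;
}
```

A caller would size the output buffer from the input length, call b64_encode(), and pass the returned length to b64_decode() to recover the original bytes.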
diff --git a/fs/ceph/auth.c b/fs/ceph/auth.c

deleted file mode 100644
index 6d2e306..0000000
--- a/fs/ceph/auth.c
+++ /dev/null
@@ -1,259 +0,0 @@
-#include "ceph_debug.h"
-
-#include <linux/module.h>
-#include <linux/err.h>
-#include <linux/slab.h>
-
-#include "types.h"
-#include "auth_none.h"
-#include "auth_x.h"
-#include "decode.h"
-#include "super.h"
-
-#include "messenger.h"
-
-/*
- * get protocol handler
- */
-static u32 supported_protocols[] = {
-       CEPH_AUTH_NONE,
-       CEPH_AUTH_CEPHX
-};
-
-static int ceph_auth_init_protocol(struct ceph_auth_client *ac, int protocol)
-{
-       switch (protocol) {
-       case CEPH_AUTH_NONE:
-               return ceph_auth_none_init(ac);
-       case CEPH_AUTH_CEPHX:
-               return ceph_x_init(ac);
-       default:
-               return -ENOENT;
-       }
-}
-
-/*
- * setup, teardown.
- */
-struct ceph_auth_client *ceph_auth_init(const char *name, const char *secret)
-{
-       struct ceph_auth_client *ac;
-       int ret;
-
-       dout("auth_init name '%s' secret '%s'\n", name, secret);
-
-       ret = -ENOMEM;
-       ac = kzalloc(sizeof(*ac), GFP_NOFS);
-       if (!ac)
-               goto out;
-
-       ac->negotiating = true;
-       if (name)
-               ac->name = name;
-       else
-               ac->name = CEPH_AUTH_NAME_DEFAULT;
-       dout("auth_init name %s secret %s\n", ac->name, secret);
-       ac->secret = secret;
-       return ac;
-
-out:
-       return ERR_PTR(ret);
-}
-
-void ceph_auth_destroy(struct ceph_auth_client *ac)
-{
-       dout("auth_destroy %p\n", ac);
-       if (ac->ops)
-               ac->ops->destroy(ac);
-       kfree(ac);
-}
-
-/*
- * Reset occurs when reconnecting to the monitor.
- */
-void ceph_auth_reset(struct ceph_auth_client *ac)
-{
-       dout("auth_reset %p\n", ac);
-       if (ac->ops && !ac->negotiating)
-               ac->ops->reset(ac);
-       ac->negotiating = true;
-}
-
-int ceph_entity_name_encode(const char *name, void **p, void *end)
-{
-       int len = strlen(name);
-
-       if (*p + 2*sizeof(u32) + len > end)
-               return -ERANGE;
-       ceph_encode_32(p, CEPH_ENTITY_TYPE_CLIENT);
-       ceph_encode_32(p, len);
-       ceph_encode_copy(p, name, len);
-       return 0;
-}
-
-/*
- * Initiate protocol negotiation with monitor.  Include entity name
- * and list supported protocols.
- */
-int ceph_auth_build_hello(struct ceph_auth_client *ac, void *buf, size_t len)
-{
-       struct ceph_mon_request_header *monhdr = buf;
-       void *p = monhdr + 1, *end = buf + len, *lenp;
-       int i, num;
-       int ret;
-
-       dout("auth_build_hello\n");
-       monhdr->have_version = 0;
-       monhdr->session_mon = cpu_to_le16(-1);
-       monhdr->session_mon_tid = 0;
-
-       ceph_encode_32(&p, 0);  /* no protocol, yet */
-
-       lenp = p;
-       p += sizeof(u32);
-
-       ceph_decode_need(&p, end, 1 + sizeof(u32), bad);
-       ceph_encode_8(&p, 1);
-       num = ARRAY_SIZE(supported_protocols);
-       ceph_encode_32(&p, num);
-       ceph_decode_need(&p, end, num * sizeof(u32), bad);
-       for (i = 0; i < num; i++)
-               ceph_encode_32(&p, supported_protocols[i]);
-
-       ret = ceph_entity_name_encode(ac->name, &p, end);
-       if (ret < 0)
-               return ret;
-       ceph_decode_need(&p, end, sizeof(u64), bad);
-       ceph_encode_64(&p, ac->global_id);
-
-       ceph_encode_32(&lenp, p - lenp - sizeof(u32));
-       return p - buf;
-
-bad:
-       return -ERANGE;
-}
-
-static int ceph_build_auth_request(struct ceph_auth_client *ac,
-                                  void *msg_buf, size_t msg_len)
-{
-       struct ceph_mon_request_header *monhdr = msg_buf;
-       void *p = monhdr + 1;
-       void *end = msg_buf + msg_len;
-       int ret;
-
-       monhdr->have_version = 0;
-       monhdr->session_mon = cpu_to_le16(-1);
-       monhdr->session_mon_tid = 0;
-
-       ceph_encode_32(&p, ac->protocol);
-
-       ret = ac->ops->build_request(ac, p + sizeof(u32), end);
-       if (ret < 0) {
-               pr_err("error %d building auth method %s request\n", ret,
-                      ac->ops->name);
-               return ret;
-       }
-       dout(" built request %d bytes\n", ret);
-       ceph_encode_32(&p, ret);
-       return p + ret - msg_buf;
-}
-
-/*
- * Handle auth message from monitor.
- */
-int ceph_handle_auth_reply(struct ceph_auth_client *ac,
-                          void *buf, size_t len,
-                          void *reply_buf, size_t reply_len)
-{
-       void *p = buf;
-       void *end = buf + len;
-       int protocol;
-       s32 result;
-       u64 global_id;
-       void *payload, *payload_end;
-       int payload_len;
-       char *result_msg;
-       int result_msg_len;
-       int ret = -EINVAL;
-
-       dout("handle_auth_reply %p %p\n", p, end);
-       ceph_decode_need(&p, end, sizeof(u32) * 3 + sizeof(u64), bad);
-       protocol = ceph_decode_32(&p);
-       result = ceph_decode_32(&p);
-       global_id = ceph_decode_64(&p);
-       payload_len = ceph_decode_32(&p);
-       payload = p;
-       p += payload_len;
-       ceph_decode_need(&p, end, sizeof(u32), bad);
-       result_msg_len = ceph_decode_32(&p);
-       result_msg = p;
-       p += result_msg_len;
-       if (p != end)
-               goto bad;
-
-       dout(" result %d '%.*s' gid %llu len %d\n", result, result_msg_len,
-            result_msg, global_id, payload_len);
-
-       payload_end = payload + payload_len;
-
-       if (global_id && ac->global_id != global_id) {
-               dout(" set global_id %lld -> %lld\n", ac->global_id, global_id);
-               ac->global_id = global_id;
-       }
-
-       if (ac->negotiating) {
-               /* server does not support our protocols? */
-               if (!protocol && result < 0) {
-                       ret = result;
-                       goto out;
-               }
-               /* set up (new) protocol handler? */
-               if (ac->protocol && ac->protocol != protocol) {
-                       ac->ops->destroy(ac);
-                       ac->protocol = 0;
-                       ac->ops = NULL;
-               }
-               if (ac->protocol != protocol) {
-                       ret = ceph_auth_init_protocol(ac, protocol);
-                       if (ret) {
-                               pr_err("error %d on auth protocol %d init\n",
-                                      ret, protocol);
-                               goto out;
-                       }
-               }
-
-               ac->negotiating = false;
-       }
-
-       ret = ac->ops->handle_reply(ac, result, payload, payload_end);
-       if (ret == -EAGAIN) {
-               return ceph_build_auth_request(ac, reply_buf, reply_len);
-       } else if (ret) {
-               pr_err("auth method '%s' error %d\n", ac->ops->name, ret);
-               return ret;
-       }
-       return 0;
-
-bad:
-       pr_err("failed to decode auth msg\n");
-out:
-       return ret;
-}
-
-int ceph_build_auth(struct ceph_auth_client *ac,
-                   void *msg_buf, size_t msg_len)
-{
-       if (!ac->protocol)
-               return ceph_auth_build_hello(ac, msg_buf, msg_len);
-       BUG_ON(!ac->ops);
-       if (ac->ops->should_authenticate(ac))
-               return ceph_build_auth_request(ac, msg_buf, msg_len);
-       return 0;
-}
-
-int ceph_auth_is_authenticated(struct ceph_auth_client *ac)
-{
-       if (!ac->ops)
-               return 0;
-       return ac->ops->is_authenticated(ac);
-}
diff --git a/fs/ceph/auth.h b/fs/ceph/auth.h

deleted file mode 100644
index d38a2fb..0000000
--- a/fs/ceph/auth.h
+++ /dev/null
@@ -1,92 +0,0 @@
-#ifndef _FS_CEPH_AUTH_H
-#define _FS_CEPH_AUTH_H
-
-#include "types.h"
-#include "buffer.h"
-
-/*
- * Abstract interface for communicating with the authenticate module.
- * There is some handshake that takes place between us and the monitor
- * to acquire the necessary keys.  These are used to generate an
- * 'authorizer' that we use when connecting to a service (mds, osd).
- */
-
-struct ceph_auth_client;
-struct ceph_authorizer;
-
-struct ceph_auth_client_ops {
-       const char *name;
-
-       /*
-        * true if we are authenticated and can connect to
-        * services.
-        */
-       int (*is_authenticated)(struct ceph_auth_client *ac);
-
-       /*
-        * true if we should (re)authenticate, e.g., when our tickets
-        * are getting old and crusty.
-        */
-       int (*should_authenticate)(struct ceph_auth_client *ac);
-
-       /*
-        * build requests and process replies during monitor
-        * handshake.  if handle_reply returns -EAGAIN, we build
-        * another request.
-        */
-       int (*build_request)(struct ceph_auth_client *ac, void *buf, void *end);
-       int (*handle_reply)(struct ceph_auth_client *ac, int result,
-                           void *buf, void *end);
-
-       /*
-        * Create authorizer for connecting to a service, and verify
-        * the response to authenticate the service.
-        */
-       int (*create_authorizer)(struct ceph_auth_client *ac, int peer_type,
-                                struct ceph_authorizer **a,
-                                void **buf, size_t *len,
-                                void **reply_buf, size_t *reply_len);
-       int (*verify_authorizer_reply)(struct ceph_auth_client *ac,
-                                      struct ceph_authorizer *a, size_t len);
-       void (*destroy_authorizer)(struct ceph_auth_client *ac,
-                                  struct ceph_authorizer *a);
-       void (*invalidate_authorizer)(struct ceph_auth_client *ac,
-                                     int peer_type);
-
-       /* reset when we (re)connect to a monitor */
-       void (*reset)(struct ceph_auth_client *ac);
-
-       void (*destroy)(struct ceph_auth_client *ac);
-};
-
-struct ceph_auth_client {
-       u32 protocol;           /* CEPH_AUTH_* */
-       void *private;          /* for use by protocol implementation */
-       const struct ceph_auth_client_ops *ops;  /* null iff protocol==0 */
-
-       bool negotiating;       /* true if negotiating protocol */
-       const char *name;       /* entity name */
-       u64 global_id;          /* our unique id in system */
-       const char *secret;     /* our secret key */
-       unsigned want_keys;     /* which services we want */
-};
-
-extern struct ceph_auth_client *ceph_auth_init(const char *name,
-                                              const char *secret);
-extern void ceph_auth_destroy(struct ceph_auth_client *ac);
-
-extern void ceph_auth_reset(struct ceph_auth_client *ac);
-
-extern int ceph_auth_build_hello(struct ceph_auth_client *ac,
-                                void *buf, size_t len);
-extern int ceph_handle_auth_reply(struct ceph_auth_client *ac,
-                                 void *buf, size_t len,
-                                 void *reply_buf, size_t reply_len);
-extern int ceph_entity_name_encode(const char *name, void **p, void *end);
-
-extern int ceph_build_auth(struct ceph_auth_client *ac,
-                   void *msg_buf, size_t msg_len);
-
-extern int ceph_auth_is_authenticated(struct ceph_auth_client *ac);
-
-#endif
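
Note on the auth.h hunk above: the header comment says that when handle_reply returns -EAGAIN, another request is built. The sketch below illustrates that control flow with a trimmed-down ops table. It is a toy model under stated assumptions, not the Ceph implementation: struct auth_ops, struct auth_client, the "toy" protocol, the rounds_left counter and auth_step() are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct auth_client;

/* Hypothetical, trimmed-down analogue of struct ceph_auth_client_ops. */
struct auth_ops {
	const char *name;
	int (*is_authenticated)(struct auth_client *ac);
	int (*handle_reply)(struct auth_client *ac, int result);
};

struct auth_client {
	const struct auth_ops *ops;	/* NULL until a protocol is chosen */
	int rounds_left;		/* toy state: replies still required */
	int requests_built;		/* follow-up requests sent so far */
};

static int toy_is_authenticated(struct auth_client *ac)
{
	return ac->rounds_left == 0;
}

/* Consume one reply; ask for another round while any remain. */
static int toy_handle_reply(struct auth_client *ac, int result)
{
	if (result < 0)
		return result;
	if (ac->rounds_left > 0)
		ac->rounds_left--;
	return ac->rounds_left ? -EAGAIN : 0;
}

static const struct auth_ops toy_ops = {
	.name = "toy",
	.is_authenticated = toy_is_authenticated,
	.handle_reply = toy_handle_reply,
};

/*
 * Feed one monitor reply to the protocol handler.
 * Returns 1 if a follow-up request was "built", 0 when the handshake
 * is complete, or a negative errno on failure.
 */
static int auth_step(struct auth_client *ac, int result)
{
	int ret = ac->ops->handle_reply(ac, result);

	if (ret == -EAGAIN) {
		ac->requests_built++;	/* stands in for building a request */
		return 1;
	}
	return ret;
}
```

Dispatching through the ops pointer keeps the generic handshake loop independent of the negotiated protocol, which appears to be the role ceph_handle_auth_reply() plays in the removed auth.c.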
diff --git a/fs/ceph/auth_none.c b/fs/ceph/auth_none.c

deleted file mode 100644
index ad1dc21..0000000
--- a/fs/ceph/auth_none.c
+++ /dev/null
@@ -1,131 +0,0 @@
-
-#include "ceph_debug.h"
-
-#include <linux/err.h>
-#include <linux/module.h>
-#include <linux/random.h>
-#include <linux/slab.h>
-
-#include "auth_none.h"
-#include "auth.h"
-#include "decode.h"
-
-static void reset(struct ceph_auth_client *ac)
-{
-       struct ceph_auth_none_info *xi = ac->private;
-
-       xi->starting = true;
-       xi->built_authorizer = false;
-}
-
-static void destroy(struct ceph_auth_client *ac)
-{
-       kfree(ac->private);
-       ac->private = NULL;
-}
-
-static int is_authenticated(struct ceph_auth_client *ac)
-{
-       struct ceph_auth_none_info *xi = ac->private;
-
-       return !xi->starting;
-}
-
-static int should_authenticate(struct ceph_auth_client *ac)
-{
-       struct ceph_auth_none_info *xi = ac->private;
-
-       return xi->starting;
-}
-
-/*
- * the generic auth code decode the global_id, and we carry no actual
- * authenticate state, so nothing happens here.
- */
-static int handle_reply(struct ceph_auth_client *ac, int result,
-                       void *buf, void *end)
-{
-       struct ceph_auth_none_info *xi = ac->private;
-
-       xi->starting = false;
-       return result;
-}
-
-/*
- * build an 'authorizer' with our entity_name and global_id.  we can
- * reuse a single static copy since it is identical for all services
- * we connect to.
- */
-static int ceph_auth_none_create_authorizer(
-       struct ceph_auth_client *ac, int peer_type,
-       struct ceph_authorizer **a,
-       void **buf, size_t *len,
-       void **reply_buf, size_t *reply_len)
-{
-       struct ceph_auth_none_info *ai = ac->private;
-       struct ceph_none_authorizer *au = &ai->au;
-       void *p, *end;
-       int ret;
-
-       if (!ai->built_authorizer) {
-               p = au->buf;
-               end = p + sizeof(au->buf);
-               ceph_encode_8(&p, 1);
-               ret = ceph_entity_name_encode(ac->name, &p, end - 8);
-               if (ret < 0)
-                       goto bad;
-               ceph_decode_need(&p, end, sizeof(u64), bad2);
-               ceph_encode_64(&p, ac->global_id);
-               au->buf_len = p - (void *)au->buf;
-               ai->built_authorizer = true;
-               dout("built authorizer len %d\n", au->buf_len);
-       }
-
-       *a = (struct ceph_authorizer *)au;
-       *buf = au->buf;
-       *len = au->buf_len;
-       *reply_buf = au->reply_buf;
-       *reply_len = sizeof(au->reply_buf);
-       return 0;
-
-bad2:
-       ret = -ERANGE;
-bad:
-       return ret;
-}
-
-static void ceph_auth_none_destroy_authorizer(struct ceph_auth_client *ac,
-                                     struct ceph_authorizer *a)
-{
-       /* nothing to do */
-}
-
-static const struct ceph_auth_client_ops ceph_auth_none_ops = {
-       .name = "none",
-       .reset = reset,
-       .destroy = destroy,
-       .is_authenticated = is_authenticated,
-       .should_authenticate = should_authenticate,
-       .handle_reply = handle_reply,
-       .create_authorizer = ceph_auth_none_create_authorizer,
-       .destroy_authorizer = ceph_auth_none_destroy_authorizer,
-};
-
-int ceph_auth_none_init(struct ceph_auth_client *ac)
-{
-       struct ceph_auth_none_info *xi;
-
-       dout("ceph_auth_none_init %p\n", ac);
-       xi = kzalloc(sizeof(*xi), GFP_NOFS);
-       if (!xi)
-               return -ENOMEM;
-
-       xi->starting = true;
-       xi->built_authorizer = false;
-
-       ac->protocol = CEPH_AUTH_NONE;
-       ac->private = xi;
-       ac->ops = &ceph_auth_none_ops;
-       return 0;
-}
-
diff --git a/fs/ceph/auth_none.h b/fs/ceph/auth_none.h

deleted file mode 100644
index 8164df1..0000000
--- a/fs/ceph/auth_none.h
+++ /dev/null
@@ -1,30 +0,0 @@
-#ifndef _FS_CEPH_AUTH_NONE_H
-#define _FS_CEPH_AUTH_NONE_H
-
-#include <linux/slab.h>
-
-#include "auth.h"
-
-/*
- * null security mode.
- *
- * we use a single static authorizer that simply encodes our entity name
- * and global id.
- */
-
-struct ceph_none_authorizer {
-       char buf[128];
-       int buf_len;
-       char reply_buf[0];
-};
-
-struct ceph_auth_none_info {
-       bool starting;
-       bool built_authorizer;
-       struct ceph_none_authorizer au;   /* we only need one; it's static */
-};
-
-extern int ceph_auth_none_init(struct ceph_auth_client *ac);
-
-#endif
-
diff --git a/fs/ceph/auth_x.c b/fs/ceph/auth_x.c

deleted file mode 100644
index a2d002c..0000000
--- a/fs/ceph/auth_x.c
+++ /dev/null
@@ -1,687 +0,0 @@
-
-#include "ceph_debug.h"
-
-#include <linux/err.h>
-#include <linux/module.h>
-#include <linux/random.h>
-#include <linux/slab.h>
-
-#include "auth_x.h"
-#include "auth_x_protocol.h"
-#include "crypto.h"
-#include "auth.h"
-#include "decode.h"
-
-#define TEMP_TICKET_BUF_LEN    256
-
-static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed);
-
-static int ceph_x_is_authenticated(struct ceph_auth_client *ac)
-{
-       struct ceph_x_info *xi = ac->private;
-       int need;
-
-       ceph_x_validate_tickets(ac, &need);
-       dout("ceph_x_is_authenticated want=%d need=%d have=%d\n",
-            ac->want_keys, need, xi->have_keys);
-       return (ac->want_keys & xi->have_keys) == ac->want_keys;
-}
-
-static int ceph_x_should_authenticate(struct ceph_auth_client *ac)
-{
-       struct ceph_x_info *xi = ac->private;
-       int need;
-
-       ceph_x_validate_tickets(ac, &need);
-       dout("ceph_x_should_authenticate want=%d need=%d have=%d\n",
-            ac->want_keys, need, xi->have_keys);
-       return need != 0;
-}
-
-static int ceph_x_encrypt_buflen(int ilen)
-{
-       return sizeof(struct ceph_x_encrypt_header) + ilen + 16 +
-               sizeof(u32);
-}
-
-static int ceph_x_encrypt(struct ceph_crypto_key *secret,
-                         void *ibuf, int ilen, void *obuf, size_t olen)
-{
-       struct ceph_x_encrypt_header head = {
-               .struct_v = 1,
-               .magic = cpu_to_le64(CEPHX_ENC_MAGIC)
-       };
-       size_t len = olen - sizeof(u32);
-       int ret;
-
-       ret = ceph_encrypt2(secret, obuf + sizeof(u32), &len,
-                           &head, sizeof(head), ibuf, ilen);
-       if (ret)
-               return ret;
-       ceph_encode_32(&obuf, len);
-       return len + sizeof(u32);
-}
-
-static int ceph_x_decrypt(struct ceph_crypto_key *secret,
-                         void **p, void *end, void *obuf, size_t olen)
-{
-       struct ceph_x_encrypt_header head;
-       size_t head_len = sizeof(head);
-       int len, ret;
-
-       len = ceph_decode_32(p);
-       if (*p + len > end)
-               return -EINVAL;
-
-       dout("ceph_x_decrypt len %d\n", len);
-       ret = ceph_decrypt2(secret, &head, &head_len, obuf, &olen,
-                           *p, len);
-       if (ret)
-               return ret;
-       if (head.struct_v != 1 || le64_to_cpu(head.magic) != CEPHX_ENC_MAGIC)
-               return -EPERM;
-       *p += len;
-       return olen;
-}
-
-/*
- * get existing (or insert new) ticket handler
- */
-static struct ceph_x_ticket_handler *
-get_ticket_handler(struct ceph_auth_client *ac, int service)
-{
-       struct ceph_x_ticket_handler *th;
-       struct ceph_x_info *xi = ac->private;
-       struct rb_node *parent = NULL, **p = &xi->ticket_handlers.rb_node;
-
-       while (*p) {
-               parent = *p;
-               th = rb_entry(parent, struct ceph_x_ticket_handler, node);
-               if (service < th->service)
-                       p = &(*p)->rb_left;
-               else if (service > th->service)
-                       p = &(*p)->rb_right;
-               else
-                       return th;
-       }
-
-       /* add it */
-       th = kzalloc(sizeof(*th), GFP_NOFS);
-       if (!th)
-               return ERR_PTR(-ENOMEM);
-       th->service = service;
-       rb_link_node(&th->node, parent, p);
-       rb_insert_color(&th->node, &xi->ticket_handlers);
-       return th;
-}
-
-static void remove_ticket_handler(struct ceph_auth_client *ac,
-                                 struct ceph_x_ticket_handler *th)
-{
-       struct ceph_x_info *xi = ac->private;
-
-       dout("remove_ticket_handler %p %d\n", th, th->service);
-       rb_erase(&th->node, &xi->ticket_handlers);
-       ceph_crypto_key_destroy(&th->session_key);
-       if (th->ticket_blob)
-               ceph_buffer_put(th->ticket_blob);
-       kfree(th);
-}
-
-static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
-                                   struct ceph_crypto_key *secret,
-                                   void *buf, void *end)
-{
-       struct ceph_x_info *xi = ac->private;
-       int num;
-       void *p = buf;
-       int ret;
-       char *dbuf;
-       char *ticket_buf;
-       u8 reply_struct_v;
-
-       dbuf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
-       if (!dbuf)
-               return -ENOMEM;
-
-       ret = -ENOMEM;
-       ticket_buf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
-       if (!ticket_buf)
-               goto out_dbuf;
-
-       ceph_decode_need(&p, end, 1 + sizeof(u32), bad);
-       reply_struct_v = ceph_decode_8(&p);
-       if (reply_struct_v != 1)
-               goto bad;
-       num = ceph_decode_32(&p);
-       dout("%d tickets\n", num);
-       while (num--) {
-               int type;
-               u8 tkt_struct_v, blob_struct_v;
-               struct ceph_x_ticket_handler *th;
-               void *dp, *dend;
-               int dlen;
-               char is_enc;
-               struct timespec validity;
-               struct ceph_crypto_key old_key;
-               void *tp, *tpend;
-               struct ceph_timespec new_validity;
-               struct ceph_crypto_key new_session_key;
-               struct ceph_buffer *new_ticket_blob;
-               unsigned long new_expires, new_renew_after;
-               u64 new_secret_id;
-
-               ceph_decode_need(&p, end, sizeof(u32) + 1, bad);
-
-               type = ceph_decode_32(&p);
-               dout(" ticket type %d %s\n", type, ceph_entity_type_name(type));
-
-               tkt_struct_v = ceph_decode_8(&p);
-               if (tkt_struct_v != 1)
-                       goto bad;
-
-               th = get_ticket_handler(ac, type);
-               if (IS_ERR(th)) {
-                       ret = PTR_ERR(th);
-                       goto out;
-               }
-
-               /* blob for me */
-               dlen = ceph_x_decrypt(secret, &p, end, dbuf,
-                                     TEMP_TICKET_BUF_LEN);
-               if (dlen <= 0) {
-                       ret = dlen;
-                       goto out;
-               }
-               dout(" decrypted %d bytes\n", dlen);
-               dend = dbuf + dlen;
-               dp = dbuf;
-
-               tkt_struct_v = ceph_decode_8(&dp);
-               if (tkt_struct_v != 1)
-                       goto bad;
-
-               memcpy(&old_key, &th->session_key, sizeof(old_key));
-               ret = ceph_crypto_key_decode(&new_session_key, &dp, dend);
-               if (ret)
-                       goto out;
-
-               ceph_decode_copy(&dp, &new_validity, sizeof(new_validity));
-               ceph_decode_timespec(&validity, &new_validity);
-               new_expires = get_seconds() + validity.tv_sec;
-               new_renew_after = new_expires - (validity.tv_sec / 4);
-               dout(" expires=%lu renew_after=%lu\n", new_expires,
-                    new_renew_after);
-
-               /* ticket blob for service */
-               ceph_decode_8_safe(&p, end, is_enc, bad);
-               tp = ticket_buf;
-               if (is_enc) {
-                       /* encrypted */
-                       dout(" encrypted ticket\n");
-                       dlen = ceph_x_decrypt(&old_key, &p, end, ticket_buf,
-                                             TEMP_TICKET_BUF_LEN);
-                       if (dlen < 0) {
-                               ret = dlen;
-                               goto out;
-                       }
-                       dlen = ceph_decode_32(&tp);
-               } else {
-                       /* unencrypted */
-                       ceph_decode_32_safe(&p, end, dlen, bad);
-                       ceph_decode_need(&p, end, dlen, bad);
-                       ceph_decode_copy(&p, ticket_buf, dlen);
-               }
-               tpend = tp + dlen;
-               dout(" ticket blob is %d bytes\n", dlen);
-               ceph_decode_need(&tp, tpend, 1 + sizeof(u64), bad);
-               blob_struct_v = ceph_decode_8(&tp);
-               new_secret_id = ceph_decode_64(&tp);
-               ret = ceph_decode_buffer(&new_ticket_blob, &tp, tpend);
-               if (ret)
-                       goto out;
-
-               /* all is well, update our ticket */
-               ceph_crypto_key_destroy(&th->session_key);
-               if (th->ticket_blob)
-                       ceph_buffer_put(th->ticket_blob);
-               th->session_key = new_session_key;
-               th->ticket_blob = new_ticket_blob;
-               th->validity = new_validity;
-               th->secret_id = new_secret_id;
-               th->expires = new_expires;
-               th->renew_after = new_renew_after;
-               dout(" got ticket service %d (%s) secret_id %lld len %d\n",
-                    type, ceph_entity_type_name(type), th->secret_id,
-                    (int)th->ticket_blob->vec.iov_len);
-               xi->have_keys |= th->service;
-       }
-
-       ret = 0;
-out:
-       kfree(ticket_buf);
-out_dbuf:
-       kfree(dbuf);
-       return ret;
-
-bad:
-       ret = -EINVAL;
-       goto out;
-}
-
-static int ceph_x_build_authorizer(struct ceph_auth_client *ac,
-                                  struct ceph_x_ticket_handler *th,
-                                  struct ceph_x_authorizer *au)
-{
-       int maxlen;
-       struct ceph_x_authorize_a *msg_a;
-       struct ceph_x_authorize_b msg_b;
-       void *p, *end;
-       int ret;
-       int ticket_blob_len =
-               (th->ticket_blob ? th->ticket_blob->vec.iov_len : 0);
-
-       dout("build_authorizer for %s %p\n",
-            ceph_entity_type_name(th->service), au);
-
-       maxlen = sizeof(*msg_a) + sizeof(msg_b) +
-               ceph_x_encrypt_buflen(ticket_blob_len);
-       dout("  need len %d\n", maxlen);
-       if (au->buf && au->buf->alloc_len < maxlen) {
-               ceph_buffer_put(au->buf);
-               au->buf = NULL;
-       }
-       if (!au->buf) {
-               au->buf = ceph_buffer_new(maxlen, GFP_NOFS);
-               if (!au->buf)
-                       return -ENOMEM;
-       }
-       au->service = th->service;
-
-       msg_a = au->buf->vec.iov_base;
-       msg_a->struct_v = 1;
-       msg_a->global_id = cpu_to_le64(ac->global_id);
-       msg_a->service_id = cpu_to_le32(th->service);
-       msg_a->ticket_blob.struct_v = 1;
-       msg_a->ticket_blob.secret_id = cpu_to_le64(th->secret_id);
-       msg_a->ticket_blob.blob_len = cpu_to_le32(ticket_blob_len);
-       if (ticket_blob_len) {
-               memcpy(msg_a->ticket_blob.blob, th->ticket_blob->vec.iov_base,
-                      th->ticket_blob->vec.iov_len);
-       }
-       dout(" th %p secret_id %lld %lld\n", th, th->secret_id,
-            le64_to_cpu(msg_a->ticket_blob.secret_id));
-
-       p = msg_a + 1;
-       p += ticket_blob_len;
-       end = au->buf->vec.iov_base + au->buf->vec.iov_len;
-
-       get_random_bytes(&au->nonce, sizeof(au->nonce));
-       msg_b.struct_v = 1;
-       msg_b.nonce = cpu_to_le64(au->nonce);
-       ret = ceph_x_encrypt(&th->session_key, &msg_b, sizeof(msg_b),
-                            p, end - p);
-       if (ret < 0)
-               goto out_buf;
-       p += ret;
-       au->buf->vec.iov_len = p - au->buf->vec.iov_base;
-       dout(" built authorizer nonce %llx len %d\n", au->nonce,
-            (int)au->buf->vec.iov_len);
-       BUG_ON(au->buf->vec.iov_len > maxlen);
-       return 0;
-
-out_buf:
-       ceph_buffer_put(au->buf);
-       au->buf = NULL;
-       return ret;
-}
-
-static int ceph_x_encode_ticket(struct ceph_x_ticket_handler *th,
-                               void **p, void *end)
-{
-       ceph_decode_need(p, end, 1 + sizeof(u64), bad);
-       ceph_encode_8(p, 1);
-       ceph_encode_64(p, th->secret_id);
-       if (th->ticket_blob) {
-               const char *buf = th->ticket_blob->vec.iov_base;
-               u32 len = th->ticket_blob->vec.iov_len;
-
-               ceph_encode_32_safe(p, end, len, bad);
-               ceph_encode_copy_safe(p, end, buf, len, bad);
-       } else {
-               ceph_encode_32_safe(p, end, 0, bad);
-       }
-
-       return 0;
-bad:
-       return -ERANGE;
-}
-
-static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed)
-{
-       int want = ac->want_keys;
-       struct ceph_x_info *xi = ac->private;
-       int service;
-
-       *pneed = ac->want_keys & ~(xi->have_keys);
-
-       for (service = 1; service <= want; service <<= 1) {
-               struct ceph_x_ticket_handler *th;
-
-               if (!(ac->want_keys & service))
-                       continue;
-
-               if (*pneed & service)
-                       continue;
-
-               th = get_ticket_handler(ac, service);
-
-               if (IS_ERR(th)) {
-                       *pneed |= service;
-                       continue;
-               }
-
-               if (get_seconds() >= th->renew_after)
-                       *pneed |= service;
-               if (get_seconds() >= th->expires)
-                       xi->have_keys &= ~service;
-       }
-}
-
-
-static int ceph_x_build_request(struct ceph_auth_client *ac,
-                               void *buf, void *end)
-{
-       struct ceph_x_info *xi = ac->private;
-       int need;
-       struct ceph_x_request_header *head = buf;
-       int ret;
-       struct ceph_x_ticket_handler *th =
-               get_ticket_handler(ac, CEPH_ENTITY_TYPE_AUTH);
-
-       if (IS_ERR(th))
-               return PTR_ERR(th);
-
-       ceph_x_validate_tickets(ac, &need);
-
-       dout("build_request want %x have %x need %x\n",
-            ac->want_keys, xi->have_keys, need);
-
-       if (need & CEPH_ENTITY_TYPE_AUTH) {
-               struct ceph_x_authenticate *auth = (void *)(head + 1);
-               void *p = auth + 1;
-               struct ceph_x_challenge_blob tmp;
-               char tmp_enc[40];
-               u64 *u;
-
-               if (p > end)
-                       return -ERANGE;
-
-               dout(" get_auth_session_key\n");
-               head->op = cpu_to_le16(CEPHX_GET_AUTH_SESSION_KEY);
-
-               /* encrypt and hash */
-               get_random_bytes(&auth->client_challenge, sizeof(u64));
-               tmp.client_challenge = auth->client_challenge;
-               tmp.server_challenge = cpu_to_le64(xi->server_challenge);
-               ret = ceph_x_encrypt(&xi->secret, &tmp, sizeof(tmp),
-                                    tmp_enc, sizeof(tmp_enc));
-               if (ret < 0)
-                       return ret;
-
-               auth->struct_v = 1;
-               auth->key = 0;
-               for (u = (u64 *)tmp_enc; u + 1 <= (u64 *)(tmp_enc + ret); u++)
-                       auth->key ^= *(__le64 *)u;
-               dout(" server_challenge %llx client_challenge %llx key %llx\n",
-                    xi->server_challenge, le64_to_cpu(auth->client_challenge),
-                    le64_to_cpu(auth->key));
-
-               /* now encode the old ticket if exists */
-               ret = ceph_x_encode_ticket(th, &p, end);
-               if (ret < 0)
-                       return ret;
-
-               return p - buf;
-       }
-
-       if (need) {
-               void *p = head + 1;
-               struct ceph_x_service_ticket_request *req;
-
-               if (p > end)
-                       return -ERANGE;
-               head->op = cpu_to_le16(CEPHX_GET_PRINCIPAL_SESSION_KEY);
-
-               ret = ceph_x_build_authorizer(ac, th, &xi->auth_authorizer);
-               if (ret)
-                       return ret;
-               ceph_encode_copy(&p, xi->auth_authorizer.buf->vec.iov_base,
-                                xi->auth_authorizer.buf->vec.iov_len);
-
-               req = p;
-               req->keys = cpu_to_le32(need);
-               p += sizeof(*req);
-               return p - buf;
-       }
-
-       return 0;
-}
-
-static int ceph_x_handle_reply(struct ceph_auth_client *ac, int result,
-                              void *buf, void *end)
-{
-       struct ceph_x_info *xi = ac->private;
-       struct ceph_x_reply_header *head = buf;
-       struct ceph_x_ticket_handler *th;
-       int len = end - buf;
-       int op;
-       int ret;
-
-       if (result)
-               return result;  /* XXX hmm? */
-
-       if (xi->starting) {
-               /* it's a hello */
-               struct ceph_x_server_challenge *sc = buf;
-
-               if (len != sizeof(*sc))
-                       return -EINVAL;
-               xi->server_challenge = le64_to_cpu(sc->server_challenge);
-               dout("handle_reply got server challenge %llx\n",
-                    xi->server_challenge);
-               xi->starting = false;
-               xi->have_keys &= ~CEPH_ENTITY_TYPE_AUTH;
-               return -EAGAIN;
-       }
-
-       op = le16_to_cpu(head->op);
-       result = le32_to_cpu(head->result);
-       dout("handle_reply op %d result %d\n", op, result);
-       switch (op) {
-       case CEPHX_GET_AUTH_SESSION_KEY:
-               /* verify auth key */
-               ret = ceph_x_proc_ticket_reply(ac, &xi->secret,
-                                              buf + sizeof(*head), end);
-               break;
-
-       case CEPHX_GET_PRINCIPAL_SESSION_KEY:
-               th = get_ticket_handler(ac, CEPH_ENTITY_TYPE_AUTH);
-               if (IS_ERR(th))
-                       return PTR_ERR(th);
-               ret = ceph_x_proc_ticket_reply(ac, &th->session_key,
-                                              buf + sizeof(*head), end);
-               break;
-
-       default:
-               return -EINVAL;
-       }
-       if (ret)
-               return ret;
-       if (ac->want_keys == xi->have_keys)
-               return 0;
-       return -EAGAIN;
-}
-
-static int ceph_x_create_authorizer(
-       struct ceph_auth_client *ac, int peer_type,
-       struct ceph_authorizer **a,
-       void **buf, size_t *len,
-       void **reply_buf, size_t *reply_len)
-{
-       struct ceph_x_authorizer *au;
-       struct ceph_x_ticket_handler *th;
-       int ret;
-
-       th = get_ticket_handler(ac, peer_type);
-       if (IS_ERR(th))
-               return PTR_ERR(th);
-
-       au = kzalloc(sizeof(*au), GFP_NOFS);
-       if (!au)
-               return -ENOMEM;
-
-       ret = ceph_x_build_authorizer(ac, th, au);
-       if (ret) {
-               kfree(au);
-               return ret;
-       }
-
-       *a = (struct ceph_authorizer *)au;
-       *buf = au->buf->vec.iov_base;
-       *len = au->buf->vec.iov_len;
-       *reply_buf = au->reply_buf;
-       *reply_len = sizeof(au->reply_buf);
-       return 0;
-}
-
-static int ceph_x_verify_authorizer_reply(struct ceph_auth_client *ac,
-                                         struct ceph_authorizer *a, size_t len)
-{
-       struct ceph_x_authorizer *au = (void *)a;
-       struct ceph_x_ticket_handler *th;
-       int ret = 0;
-       struct ceph_x_authorize_reply reply;
-       void *p = au->reply_buf;
-       void *end = p + sizeof(au->reply_buf);
-
-       th = get_ticket_handler(ac, au->service);
-       if (IS_ERR(th))
-               return PTR_ERR(th);
-       ret = ceph_x_decrypt(&th->session_key, &p, end, &reply, sizeof(reply));
-       if (ret < 0)
-               return ret;
-       if (ret != sizeof(reply))
-               return -EPERM;
-
-       if (au->nonce + 1 != le64_to_cpu(reply.nonce_plus_one))
-               ret = -EPERM;
-       else
-               ret = 0;
-       dout("verify_authorizer_reply nonce %llx got %llx ret %d\n",
-            au->nonce, le64_to_cpu(reply.nonce_plus_one), ret);
-       return ret;
-}
-
-static void ceph_x_destroy_authorizer(struct ceph_auth_client *ac,
-                                     struct ceph_authorizer *a)
-{
-       struct ceph_x_authorizer *au = (void *)a;
-
-       ceph_buffer_put(au->buf);
-       kfree(au);
-}
-
-
-static void ceph_x_reset(struct ceph_auth_client *ac)
-{
-       struct ceph_x_info *xi = ac->private;
-
-       dout("reset\n");
-       xi->starting = true;
-       xi->server_challenge = 0;
-}
-
-static void ceph_x_destroy(struct ceph_auth_client *ac)
-{
-       struct ceph_x_info *xi = ac->private;
-       struct rb_node *p;
-
-       dout("ceph_x_destroy %p\n", ac);
-       ceph_crypto_key_destroy(&xi->secret);
-
-       while ((p = rb_first(&xi->ticket_handlers)) != NULL) {
-               struct ceph_x_ticket_handler *th =
-                       rb_entry(p, struct ceph_x_ticket_handler, node);
-               remove_ticket_handler(ac, th);
-       }
-
-       if (xi->auth_authorizer.buf)
-               ceph_buffer_put(xi->auth_authorizer.buf);
-
-       kfree(ac->private);
-       ac->private = NULL;
-}
-
-static void ceph_x_invalidate_authorizer(struct ceph_auth_client *ac,
-                                  int peer_type)
-{
-       struct ceph_x_ticket_handler *th;
-
-       th = get_ticket_handler(ac, peer_type);
-       if (!IS_ERR(th))
-               remove_ticket_handler(ac, th);
-}
-
-
-static const struct ceph_auth_client_ops ceph_x_ops = {
-       .name = "x",
-       .is_authenticated = ceph_x_is_authenticated,
-       .should_authenticate = ceph_x_should_authenticate,
-       .build_request = ceph_x_build_request,
-       .handle_reply = ceph_x_handle_reply,
-       .create_authorizer = ceph_x_create_authorizer,
-       .verify_authorizer_reply = ceph_x_verify_authorizer_reply,
-       .destroy_authorizer = ceph_x_destroy_authorizer,
-       .invalidate_authorizer = ceph_x_invalidate_authorizer,
-       .reset =  ceph_x_reset,
-       .destroy = ceph_x_destroy,
-};
-
-
-int ceph_x_init(struct ceph_auth_client *ac)
-{
-       struct ceph_x_info *xi;
-       int ret;
-
-       dout("ceph_x_init %p\n", ac);
-       ret = -ENOMEM;
-       xi = kzalloc(sizeof(*xi), GFP_NOFS);
-       if (!xi)
-               goto out;
-
-       ret = -EINVAL;
-       if (!ac->secret) {
-               pr_err("no secret set (for auth_x protocol)\n");
-               goto out_nomem;
-       }
-
-       ret = ceph_crypto_key_unarmor(&xi->secret, ac->secret);
-       if (ret)
-               goto out_nomem;
-
-       xi->starting = true;
-       xi->ticket_handlers = RB_ROOT;
-
-       ac->protocol = CEPH_AUTH_CEPHX;
-       ac->private = xi;
-       ac->ops = &ceph_x_ops;
-       return 0;
-
-out_nomem:
-       kfree(xi);
-out:
-       return ret;
-}
-
-
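Note on the removed fs/ceph/auth_x.c: ceph_x_proc_ticket_reply() derives two absolute times from the decoded validity period (expires = now + validity, renew_after = expires - validity / 4), and ceph_x_validate_tickets() later compares the current time against both. A minimal user-space sketch of that arithmetic follows. The names ticket_times, ticket_times_init and ticket_state_at are hypothetical and do not appear in the kernel source.

```c
/* Hypothetical user-space stand-in for a ticket handler's timing fields. */
struct ticket_times {
	unsigned long expires;     /* absolute second at which the ticket is dead */
	unsigned long renew_after; /* absolute second at which to request a new one */
};

enum ticket_state { TICKET_FRESH, TICKET_RENEW, TICKET_EXPIRED };

/* Same arithmetic as the diff: renew once three quarters of the validity has elapsed. */
static struct ticket_times ticket_times_init(unsigned long now,
					     unsigned long validity_sec)
{
	struct ticket_times t;

	t.expires = now + validity_sec;
	t.renew_after = t.expires - validity_sec / 4;
	return t;
}

/*
 * Classify a ticket at time 'now'. In the kernel code an expired ticket is
 * both dropped from have_keys and added to the need mask; this sketch
 * reports only the strongest condition.
 */
static enum ticket_state ticket_state_at(const struct ticket_times *t,
					 unsigned long now)
{
	if (now >= t->expires)
		return TICKET_EXPIRED;
	if (now >= t->renew_after)
		return TICKET_RENEW;
	return TICKET_FRESH;
}
```

With a validity of 400 seconds starting at second 1000, the ticket expires at 1400 and becomes due for renewal at 1300.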
diff --git a/fs/ceph/auth_x.h b/fs/ceph/auth_x.h
deleted file mode 100644
index ff6f818..0000000
--- a/fs/ceph/auth_x.h
+++ /dev/null
@@ -1,49 +0,0 @@
-#ifndef _FS_CEPH_AUTH_X_H
-#define _FS_CEPH_AUTH_X_H
-
-#include <linux/rbtree.h>
-
-#include "crypto.h"
-#include "auth.h"
-#include "auth_x_protocol.h"
-
-/*
- * Handle ticket for a single service.
- */
-struct ceph_x_ticket_handler {
-       struct rb_node node;
-       unsigned service;
-
-       struct ceph_crypto_key session_key;
-       struct ceph_timespec validity;
-
-       u64 secret_id;
-       struct ceph_buffer *ticket_blob;
-
-       unsigned long renew_after, expires;
-};
-
-
-struct ceph_x_authorizer {
-       struct ceph_buffer *buf;
-       unsigned service;
-       u64 nonce;
-       char reply_buf[128];  /* big enough for encrypted blob */
-};
-
-struct ceph_x_info {
-       struct ceph_crypto_key secret;
-
-       bool starting;
-       u64 server_challenge;
-
-       unsigned have_keys;
-       struct rb_root ticket_handlers;
-
-       struct ceph_x_authorizer auth_authorizer;
-};
-
-extern int ceph_x_init(struct ceph_auth_client *ac);
-
-#endif
-
diff --git a/fs/ceph/auth_x_protocol.h b/fs/ceph/auth_x_protocol.h

deleted file mode 100644 (file)

index 671d305..0000000
--- a/fs/ceph/auth_x_protocol.h
+++ /dev/null
@@ -1,90 +0,0 @@
-#ifndef __FS_CEPH_AUTH_X_PROTOCOL
-#define __FS_CEPH_AUTH_X_PROTOCOL
-
-#define CEPHX_GET_AUTH_SESSION_KEY      0x0100
-#define CEPHX_GET_PRINCIPAL_SESSION_KEY 0x0200
-#define CEPHX_GET_ROTATING_KEY          0x0400
-
-/* common bits */
-struct ceph_x_ticket_blob {
-       __u8 struct_v;
-       __le64 secret_id;
-       __le32 blob_len;
-       char blob[];
-} __attribute__ ((packed));
-
-
-/* common request/reply headers */
-struct ceph_x_request_header {
-       __le16 op;
-} __attribute__ ((packed));
-
-struct ceph_x_reply_header {
-       __le16 op;
-       __le32 result;
-} __attribute__ ((packed));
-
-
-/* authenticate handshake */
-
-/* initial hello (no reply header) */
-struct ceph_x_server_challenge {
-       __u8 struct_v;
-       __le64 server_challenge;
-} __attribute__ ((packed));
-
-struct ceph_x_authenticate {
-       __u8 struct_v;
-       __le64 client_challenge;
-       __le64 key;
-       /* ticket blob */
-} __attribute__ ((packed));
-
-struct ceph_x_service_ticket_request {
-       __u8 struct_v;
-       __le32 keys;
-} __attribute__ ((packed));
-
-struct ceph_x_challenge_blob {
-       __le64 server_challenge;
-       __le64 client_challenge;
-} __attribute__ ((packed));
-
-
-
-/* authorize handshake */
-
-/*
- * The authorizer consists of two pieces:
- *  a - service id, ticket blob
- *  b - encrypted with session key
- */
-struct ceph_x_authorize_a {
-       __u8 struct_v;
-       __le64 global_id;
-       __le32 service_id;
-       struct ceph_x_ticket_blob ticket_blob;
-} __attribute__ ((packed));
-
-struct ceph_x_authorize_b {
-       __u8 struct_v;
-       __le64 nonce;
-} __attribute__ ((packed));
-
-struct ceph_x_authorize_reply {
-       __u8 struct_v;
-       __le64 nonce_plus_one;
-} __attribute__ ((packed));
-
-
-/*
- * encyption bundle
- */
-#define CEPHX_ENC_MAGIC 0xff009cad8826aa55ull
-
-struct ceph_x_encrypt_header {
-       __u8 struct_v;
-       __le64 magic;
-} __attribute__ ((packed));
-
-#endif
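Note on the removed fs/ceph/auth_x_protocol.h: every wire structure is marked packed, so sizeof() equals the on-wire byte count. ceph_x_handle_reply() relies on this when it compares the reply length against sizeof(struct ceph_x_server_challenge). The sketch below reproduces two of the layouts in user space. The x_* names are hypothetical, the fixed-width types stand in for __u8/__le16/__le32/__le64, and a GCC- or Clang-compatible compiler with C11 support is assumed.

```c
#include <stdint.h>

/* Packed copy of the reply header: 2 + 4 bytes, no padding. */
struct x_reply_header_packed {
	uint16_t op;
	uint32_t result;
} __attribute__((packed));

/* Same fields without the attribute; most ABIs pad this to 8 bytes. */
struct x_reply_header_padded {
	uint16_t op;
	uint32_t result;
};

/* Packed copy of the initial hello: 1 + 8 bytes. */
struct x_server_challenge {
	uint8_t struct_v;
	uint64_t server_challenge;
} __attribute__((packed));

/* Compile-time checks that the packed layouts match the wire sizes. */
_Static_assert(sizeof(struct x_reply_header_packed) == 6,
	       "reply header must be 6 bytes on the wire");
_Static_assert(sizeof(struct x_server_challenge) == 9,
	       "server challenge must be 9 bytes on the wire");
```

Without the attribute, a length check against sizeof() would compare against a padded size and could reject well-formed messages.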
diff --git a/fs/ceph/buffer.c b/fs/ceph/buffer.c
deleted file mode 100644
index cd39f17..0000000
--- a/fs/ceph/buffer.c
+++ /dev/null
@@ -1,65 +0,0 @@
-
-#include "ceph_debug.h"
-
-#include <linux/slab.h>
-
-#include "buffer.h"
-#include "decode.h"
-
-struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp)
-{
-       struct ceph_buffer *b;
-
-       b = kmalloc(sizeof(*b), gfp);
-       if (!b)
-               return NULL;
-
-       b->vec.iov_base = kmalloc(len, gfp | __GFP_NOWARN);
-       if (b->vec.iov_base) {
-               b->is_vmalloc = false;
-       } else {
-               b->vec.iov_base = __vmalloc(len, gfp, PAGE_KERNEL);
-               if (!b->vec.iov_base) {
-                       kfree(b);
-                       return NULL;
-               }
-               b->is_vmalloc = true;
-       }
-
-       kref_init(&b->kref);
-       b->alloc_len = len;
-       b->vec.iov_len = len;
-       dout("buffer_new %p\n", b);
-       return b;
-}
-
-void ceph_buffer_release(struct kref *kref)
-{
-       struct ceph_buffer *b = container_of(kref, struct ceph_buffer, kref);
-
-       dout("buffer_release %p\n", b);
-       if (b->vec.iov_base) {
-               if (b->is_vmalloc)
-                       vfree(b->vec.iov_base);
-               else
-                       kfree(b->vec.iov_base);
-       }
-       kfree(b);
-}
-
-int ceph_decode_buffer(struct ceph_buffer **b, void **p, void *end)
-{
-       size_t len;
-
-       ceph_decode_need(p, end, sizeof(u32), bad);
-       len = ceph_decode_32(p);
-       dout("decode_buffer len %d\n", (int)len);
-       ceph_decode_need(p, end, len, bad);
-       *b = ceph_buffer_new(len, GFP_NOFS);
-       if (!*b)
-               return -ENOMEM;
-       ceph_decode_copy(p, (*b)->vec.iov_base, len);
-       return 0;
-bad:
-       return -EINVAL;
-}
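Note on the removed fs/ceph/buffer.c: ceph_decode_buffer() reads a 32-bit length, checks that this many bytes remain before 'end', and only then allocates and copies. The same bounds-checked pattern can be sketched in user space as follows. decode_blob and get_le32 are hypothetical names, malloc() stands in for the kernel allocators, and the little-endian length prefix is taken from the ceph_decode_32() call in the diff.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Read a little-endian u32 without assuming alignment or host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Decode one length-prefixed blob from [*p, end), assuming *p <= end.
 * On success returns 0, stores a malloc()ed copy in *out and its size in
 * *out_len, and advances *p past the blob. On failure *p is left untouched.
 */
static int decode_blob(const unsigned char **p, const unsigned char *end,
		       unsigned char **out, size_t *out_len)
{
	const unsigned char *cur = *p;
	unsigned char *copy;
	uint32_t len;

	if ((size_t)(end - cur) < sizeof(uint32_t))
		return -EINVAL;         /* not even a length field */
	len = get_le32(cur);
	cur += sizeof(uint32_t);

	if ((size_t)(end - cur) < len)
		return -EINVAL;         /* claimed length overruns the input */

	copy = malloc(len ? len : 1);   /* avoid a zero-size allocation */
	if (!copy)
		return -ENOMEM;
	memcpy(copy, cur, len);

	*out = copy;
	*out_len = len;
	*p = cur + len;
	return 0;
}
```

One deliberate difference from the kernel helper: this sketch advances the cursor only on success, which keeps the caller's position well defined after an error.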
diff --git a/fs/ceph/buffer.h b/fs/ceph/buffer.h
deleted file mode 100644
index 58d1901..0000000
--- a/fs/ceph/buffer.h
+++ /dev/null
@@ -1,39 +0,0 @@
-#ifndef __FS_CEPH_BUFFER_H
-#define __FS_CEPH_BUFFER_H
-
-#include <linux/kref.h>
-#include <linux/mm.h>
-#include <linux/vmalloc.h>
-#include <linux/types.h>
-#include <linux/uio.h>
-
-/*
- * a simple reference counted buffer.
- *
- * use kmalloc for small sizes (<= one page), vmalloc for larger
- * sizes.
- */
-struct ceph_buffer {
-       struct kref kref;
-       struct kvec vec;
-       size_t alloc_len;
-       bool is_vmalloc;
-};
-
-extern struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp);
-extern void ceph_buffer_release(struct kref *kref);
-
-static inline struct ceph_buffer *ceph_buffer_get(struct ceph_buffer *b)
-{
-       kref_get(&b->kref);
-       return b;
-}
-
-static inline void ceph_buffer_put(struct ceph_buffer *b)
-{
-       kref_put(&b->kref, ceph_buffer_release);
-}
-
-extern int ceph_decode_buffer(struct ceph_buffer **b, void **p, void *end);
-
-#endif
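Note on the removed fs/ceph/buffer.h: struct ceph_buffer is a reference-counted buffer whose get/put helpers wrap kref_get() and kref_put(), with the release callback freeing the storage when the last reference is dropped. A simplified user-space analogue is sketched below. The buf_* names are hypothetical, a plain int replaces struct kref (so this version is not thread-safe), and malloc() replaces the kmalloc/vmalloc fallback.

```c
#include <stdlib.h>

/* Hypothetical single-threaded analogue of struct ceph_buffer. */
struct buf {
	int refs;              /* stands in for struct kref */
	size_t len;
	unsigned char *data;
};

/* Allocate a buffer holding one reference, or return NULL on failure. */
static struct buf *buf_new(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = malloc(len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->refs = 1;
	b->len = len;
	return b;
}

/* Take an extra reference and return the buffer for call chaining. */
static struct buf *buf_get(struct buf *b)
{
	b->refs++;
	return b;
}

/* Drop one reference; returns 1 if this call freed the buffer, else 0. */
static int buf_put(struct buf *b)
{
	if (--b->refs > 0)
		return 0;
	free(b->data);
	free(b);
	return 1;
}
```

Each owner pairs one buf_get() (or the initial buf_new()) with exactly one buf_put(), mirroring how ticket_blob and au->buf are handled in the removed auth_x.c.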
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 73c153092f7292f20616108a3f2e61cc0630659e..98ab13e2b71d9d8ebcbf1715da112d78cb0bac17 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1,4 +1,4 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/fs.h>
  #include <linux/kernel.h>
@@ -9,8 +9,9 @@
  #include <linux/writeback.h>
  
  #include "super.h"
-#include "decode.h"
-#include "messenger.h"
+#include "mds_client.h"
+#include <linux/ceph/decode.h>
+#include <linux/ceph/messenger.h>
  
  /*
   * Capability management
@@ -287,11 +288,11 @@ void ceph_put_cap(struct ceph_mds_client *mdsc, struct ceph_cap *cap)
         spin_unlock(&mdsc->caps_list_lock);
  }
  
-void ceph_reservation_status(struct ceph_client *client,
+void ceph_reservation_status(struct ceph_fs_client *fsc,
                              int *total, int *avail, int *used, int *reserved,
                              int *min)
  {
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_mds_client *mdsc = fsc->mdsc;
  
         if (total)
                 *total = mdsc->caps_total_count;
@@ -399,7 +400,7 @@ static void __insert_cap_node(struct ceph_inode_info *ci,
  static void __cap_set_timeouts(struct ceph_mds_client *mdsc,
                                struct ceph_inode_info *ci)
  {
-       struct ceph_mount_args *ma = mdsc->client->mount_args;
+       struct ceph_mount_options *ma = mdsc->fsc->mount_options;
  
         ci->i_hold_caps_min = round_jiffies(jiffies +
                                             ma->caps_wanted_delay_min * HZ);
@@ -515,7 +516,7 @@ int ceph_add_cap(struct inode *inode,
                  unsigned seq, unsigned mseq, u64 realmino, int flags,
                  struct ceph_cap_reservation *caps_reservation)
  {
-       struct ceph_mds_client *mdsc = &ceph_inode_to_client(inode)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
         struct ceph_inode_info *ci = ceph_inode(inode);
         struct ceph_cap *new_cap = NULL;
         struct ceph_cap *cap;
@@ -873,7 +874,7 @@ void __ceph_remove_cap(struct ceph_cap *cap)
         struct ceph_mds_session *session = cap->session;
         struct ceph_inode_info *ci = cap->ci;
         struct ceph_mds_client *mdsc =
-               &ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+               ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
         int removed = 0;
  
         dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
@@ -1210,7 +1211,7 @@ void __ceph_flush_snaps(struct ceph_inode_info *ci,
         int mds;
         struct ceph_cap_snap *capsnap;
         u32 mseq;
-       struct ceph_mds_client *mdsc = &ceph_inode_to_client(inode)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
         struct ceph_mds_session *session = NULL; /* if session != NULL, we hold
                                                     session->s_mutex */
         u64 next_follows = 0;  /* keep track of how far we've gotten through the
@@ -1336,7 +1337,7 @@ static void ceph_flush_snaps(struct ceph_inode_info *ci)
  void __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask)
  {
         struct ceph_mds_client *mdsc =
-               &ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+               ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
         struct inode *inode = &ci->vfs_inode;
         int was = ci->i_dirty_caps;
         int dirty = 0;
@@ -1378,7 +1379,7 @@ void __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask)
  static int __mark_caps_flushing(struct inode *inode,
                                  struct ceph_mds_session *session)
  {
-       struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
         struct ceph_inode_info *ci = ceph_inode(inode);
         int flushing;
  
@@ -1416,17 +1417,6 @@ static int __mark_caps_flushing(struct inode *inode,
  /*
   * try to invalidate mapping pages without blocking.
   */
-static int mapping_is_empty(struct address_space *mapping)
-{
-       struct page *page = find_get_page(mapping, 0);
-
-       if (!page)
-               return 1;
-
-       put_page(page);
-       return 0;
-}
-
  static int try_nonblocking_invalidate(struct inode *inode)
  {
         struct ceph_inode_info *ci = ceph_inode(inode);
@@ -1436,7 +1426,7 @@ static int try_nonblocking_invalidate(struct inode *inode)
         invalidate_mapping_pages(&inode->i_data, 0, -1);
         spin_lock(&inode->i_lock);
  
-       if (mapping_is_empty(&inode->i_data) &&
+       if (inode->i_data.nrpages == 0 &&
             invalidating_gen == ci->i_rdcache_gen) {
                 /* success. */
                 dout("try_nonblocking_invalidate %p success\n", inode);
@@ -1462,8 +1452,8 @@ static int try_nonblocking_invalidate(struct inode *inode)
  void ceph_check_caps(struct ceph_inode_info *ci, int flags,
                      struct ceph_mds_session *session)
  {
-       struct ceph_client *client = ceph_inode_to_client(&ci->vfs_inode);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_inode_to_client(&ci->vfs_inode);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct inode *inode = &ci->vfs_inode;
         struct ceph_cap *cap;
         int file_wanted, used;
@@ -1533,7 +1523,7 @@ retry_locked:
          */
         if ((!is_delayed || mdsc->stopping) &&
             ci->i_wrbuffer_ref == 0 &&               /* no dirty pages... */
-           ci->i_rdcache_gen &&                     /* may have cached pages */
+           inode->i_data.nrpages &&                 /* have cached pages */
             (file_wanted == 0 ||                     /* no open files */
              (revoking & (CEPH_CAP_FILE_CACHE|
                           CEPH_CAP_FILE_LAZYIO))) && /*  or revoking cache */
@@ -1706,7 +1696,7 @@ ack:
  static int try_flush_caps(struct inode *inode, struct ceph_mds_session *session,
                           unsigned *flush_tid)
  {
-       struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
         struct ceph_inode_info *ci = ceph_inode(inode);
         int unlock_session = session ? 0 : 1;
         int flushing = 0;
@@ -1872,7 +1862,7 @@ int ceph_write_inode(struct inode *inode, struct writeback_control *wbc)
                                        caps_are_flushed(inode, flush_tid));
         } else {
                 struct ceph_mds_client *mdsc =
-                       &ceph_sb_to_client(inode->i_sb)->mdsc;
+                       ceph_sb_to_client(inode->i_sb)->mdsc;
  
                 spin_lock(&inode->i_lock);
                 if (__ceph_caps_dirty(ci))
@@ -2283,7 +2273,8 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
  {
         struct ceph_inode_info *ci = ceph_inode(inode);
         int mds = session->s_mds;
-       int seq = le32_to_cpu(grant->seq);
+       unsigned seq = le32_to_cpu(grant->seq);
+       unsigned issue_seq = le32_to_cpu(grant->issue_seq);
         int newcaps = le32_to_cpu(grant->caps);
         int issued, implemented, used, wanted, dirty;
         u64 size = le64_to_cpu(grant->size);
@@ -2295,8 +2286,8 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
         int revoked_rdcache = 0;
         int queue_invalidate = 0;
  
-       dout("handle_cap_grant inode %p cap %p mds%d seq %d %s\n",
-            inode, cap, mds, seq, ceph_cap_string(newcaps));
+       dout("handle_cap_grant inode %p cap %p mds%d seq %u/%u %s\n",
+            inode, cap, mds, seq, issue_seq, ceph_cap_string(newcaps));
         dout(" size %llu max_size %llu, i_size %llu\n", size, max_size,
                 inode->i_size);
  
@@ -2392,6 +2383,7 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
         }
  
         cap->seq = seq;
+       cap->issue_seq = issue_seq;
  
         /* file layout may have changed */
         ci->i_layout = grant->layout;
@@ -2463,7 +2455,7 @@ static void handle_cap_flush_ack(struct inode *inode, u64 flush_tid,
         __releases(inode->i_lock)
  {
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
         unsigned seq = le32_to_cpu(m->seq);
         int dirty = le32_to_cpu(m->dirty);
         int cleaned = 0;
@@ -2711,7 +2703,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
                       struct ceph_msg *msg)
  {
         struct ceph_mds_client *mdsc = session->s_mdsc;
-       struct super_block *sb = mdsc->client->sb;
+       struct super_block *sb = mdsc->fsc->sb;
         struct inode *inode;
         struct ceph_cap *cap;
         struct ceph_mds_caps *h;
@@ -2774,15 +2766,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
                 if (op == CEPH_CAP_OP_IMPORT)
                         __queue_cap_release(session, vino.ino, cap_id,
                                             mseq, seq);
-
-               /*
-                * send any full release message to try to move things
-                * along for the mds (who clearly thinks we still have this
-                * cap).
-                */
-               ceph_add_cap_releases(mdsc, session);
-               ceph_send_cap_releases(mdsc, session);
-               goto done;
+               goto flush_cap_releases;
         }
  
         /* these will work even if we don't have a cap yet */
@@ -2810,7 +2794,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
                 dout(" no cap on %p ino %llx.%llx from mds%d\n",
                      inode, ceph_ino(inode), ceph_snap(inode), mds);
                 spin_unlock(&inode->i_lock);
-               goto done;
+               goto flush_cap_releases;
         }
  
         /* note that each of these drops i_lock for us */
@@ -2834,6 +2818,17 @@ void ceph_handle_caps(struct ceph_mds_session *session,
                        ceph_cap_op_name(op));
         }
  
+       goto done;
+
+flush_cap_releases:
+       /*
+        * send any full release message to try to move things
+        * along for the mds (who clearly thinks we still have this
+        * cap).
+        */
+       ceph_add_cap_releases(mdsc, session);
+       ceph_send_cap_releases(mdsc, session);
+
  done:
         mutex_unlock(&session->s_mutex);
  done_unlocked:
diff --git a/fs/ceph/ceph_debug.h b/fs/ceph/ceph_debug.h
deleted file mode 100644
index 1818c23..0000000
--- a/fs/ceph/ceph_debug.h
+++ /dev/null
@@ -1,37 +0,0 @@
-#ifndef _FS_CEPH_DEBUG_H
-#define _FS_CEPH_DEBUG_H
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#ifdef CONFIG_CEPH_FS_PRETTYDEBUG
-
-/*
- * wrap pr_debug to include a filename:lineno prefix on each line.
- * this incurs some overhead (kernel size and execution time) due to
- * the extra function call at each call site.
- */
-
-# if defined(DEBUG) || defined(CONFIG_DYNAMIC_DEBUG)
-extern const char *ceph_file_part(const char *s, int len);
-#  define dout(fmt, ...)                                               \
-       pr_debug(" %12.12s:%-4d : " fmt,                                \
-                ceph_file_part(__FILE__, sizeof(__FILE__)),            \
-                __LINE__, ##__VA_ARGS__)
-# else
-/* faux printk call just to see any compiler warnings. */
-#  define dout(fmt, ...)       do {                            \
-               if (0)                                          \
-                       printk(KERN_DEBUG fmt, ##__VA_ARGS__);  \
-       } while (0)
-# endif
-
-#else
-
-/*
- * or, just wrap pr_debug
- */
-# define dout(fmt, ...)        pr_debug(" " fmt, ##__VA_ARGS__)
-
-#endif
-
-#endif
diff --git a/fs/ceph/ceph_frag.c b/fs/ceph/ceph_frag.c
index ab6cf35c40919843e40f29137500e45a8170e5ab..bdce8b1fbd06794d9de7be918c9ab8aab97bcbd2 100644
--- a/fs/ceph/ceph_frag.c
+++ b/fs/ceph/ceph_frag.c
@@ -1,7 +1,8 @@
  /*
   * Ceph 'frag' type
   */
-#include "types.h"
+#include <linux/module.h>
+#include <linux/ceph/types.h>
  
  int ceph_frag_compare(__u32 a, __u32 b)
  {
diff --git a/fs/ceph/ceph_frag.h b/fs/ceph/ceph_frag.h
deleted file mode 100644
index 5babb8e..0000000
--- a/fs/ceph/ceph_frag.h
+++ /dev/null
@@ -1,109 +0,0 @@
-#ifndef FS_CEPH_FRAG_H
-#define FS_CEPH_FRAG_H
-
-/*
- * "Frags" are a way to describe a subset of a 32-bit number space,
- * using a mask and a value to match against that mask.  Any given frag
- * (subset of the number space) can be partitioned into 2^n sub-frags.
- *
- * Frags are encoded into a 32-bit word:
- *   8 upper bits = "bits"
- *  24 lower bits = "value"
- * (We could go to 5+27 bits, but who cares.)
- *
- * We use the _most_ significant bits of the 24 bit value.  This makes
- * values logically sort.
- *
- * Unfortunately, because the "bits" field is still in the high bits, we
- * can't sort encoded frags numerically.  However, it does allow you
- * to feed encoded frags as values into frag_contains_value.
- */
-static inline __u32 ceph_frag_make(__u32 b, __u32 v)
-{
-       return (b << 24) |
-               (v & (0xffffffu << (24-b)) & 0xffffffu);
-}
-static inline __u32 ceph_frag_bits(__u32 f)
-{
-       return f >> 24;
-}
-static inline __u32 ceph_frag_value(__u32 f)
-{
-       return f & 0xffffffu;
-}
-static inline __u32 ceph_frag_mask(__u32 f)
-{
-       return (0xffffffu << (24-ceph_frag_bits(f))) & 0xffffffu;
-}
-static inline __u32 ceph_frag_mask_shift(__u32 f)
-{
-       return 24 - ceph_frag_bits(f);
-}
-
-static inline int ceph_frag_contains_value(__u32 f, __u32 v)
-{
-       return (v & ceph_frag_mask(f)) == ceph_frag_value(f);
-}
-static inline int ceph_frag_contains_frag(__u32 f, __u32 sub)
-{
-       /* is sub as specific as us, and contained by us? */
-       return ceph_frag_bits(sub) >= ceph_frag_bits(f) &&
-              (ceph_frag_value(sub) & ceph_frag_mask(f)) == ceph_frag_value(f);
-}
-
-static inline __u32 ceph_frag_parent(__u32 f)
-{
-       return ceph_frag_make(ceph_frag_bits(f) - 1,
-                        ceph_frag_value(f) & (ceph_frag_mask(f) << 1));
-}
-static inline int ceph_frag_is_left_child(__u32 f)
-{
-       return ceph_frag_bits(f) > 0 &&
-               (ceph_frag_value(f) & (0x1000000 >> ceph_frag_bits(f))) == 0;
-}
-static inline int ceph_frag_is_right_child(__u32 f)
-{
-       return ceph_frag_bits(f) > 0 &&
-               (ceph_frag_value(f) & (0x1000000 >> ceph_frag_bits(f))) == 1;
-}
-static inline __u32 ceph_frag_sibling(__u32 f)
-{
-       return ceph_frag_make(ceph_frag_bits(f),
-                     ceph_frag_value(f) ^ (0x1000000 >> ceph_frag_bits(f)));
-}
-static inline __u32 ceph_frag_left_child(__u32 f)
-{
-       return ceph_frag_make(ceph_frag_bits(f)+1, ceph_frag_value(f));
-}
-static inline __u32 ceph_frag_right_child(__u32 f)
-{
-       return ceph_frag_make(ceph_frag_bits(f)+1,
-             ceph_frag_value(f) | (0x1000000 >> (1+ceph_frag_bits(f))));
-}
-static inline __u32 ceph_frag_make_child(__u32 f, int by, int i)
-{
-       int newbits = ceph_frag_bits(f) + by;
-       return ceph_frag_make(newbits,
-                        ceph_frag_value(f) | (i << (24 - newbits)));
-}
-static inline int ceph_frag_is_leftmost(__u32 f)
-{
-       return ceph_frag_value(f) == 0;
-}
-static inline int ceph_frag_is_rightmost(__u32 f)
-{
-       return ceph_frag_value(f) == ceph_frag_mask(f);
-}
-static inline __u32 ceph_frag_next(__u32 f)
-{
-       return ceph_frag_make(ceph_frag_bits(f),
-                        ceph_frag_value(f) + (0x1000000 >> ceph_frag_bits(f)));
-}
-
-/*
- * comparator to sort frags logically, as when traversing the
- * number space in ascending order...
- */
-int ceph_frag_compare(__u32 a, __u32 b);
-
-#endif
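Reviewer note: the header removed above documents the frag encoding (8 upper bits for the bit count, 24 lower bits for the value, with the value stored in its most significant bits). The following is a minimal standalone sketch of that encoding, not the kernel code itself. The `frag_*` names and `frag_demo()` are hypothetical stand-ins for the removed `ceph_frag_*` helpers, `uint32_t` replaces `__u32`, and the sketch assumes the bit count never exceeds 24.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the removed ceph_frag_* inlines. */

/* Pack a frag: bit count in the top 8 bits, masked value in the low 24.
 * Assumes bits <= 24; larger counts would make the shift undefined. */
static inline uint32_t frag_make(uint32_t bits, uint32_t value)
{
    return (bits << 24) | (value & (0xffffffu << (24 - bits)) & 0xffffffu);
}

/* Number of significant leading bits in the value. */
static inline uint32_t frag_bits(uint32_t f)
{
    return f >> 24;
}

/* The 24-bit value part of the frag. */
static inline uint32_t frag_value(uint32_t f)
{
    return f & 0xffffffu;
}

/* Mask selecting the leading frag_bits(f) bits of a 24-bit value. */
static inline uint32_t frag_mask(uint32_t f)
{
    return (0xffffffu << (24 - frag_bits(f))) & 0xffffffu;
}

/* True if the 24-bit value v falls inside the subset described by f. */
static inline int frag_contains_value(uint32_t f, uint32_t v)
{
    return (v & frag_mask(f)) == frag_value(f);
}

/* Illustrative checks only; not part of the original header. */
void frag_demo(void)
{
    /* One significant bit, set: the upper half of the 24-bit space. */
    uint32_t upper = frag_make(1, 0x800000u);
    /* Zero significant bits: the whole space. */
    uint32_t root = frag_make(0, 0x123456u);

    assert(upper == 0x01800000u);
    assert(frag_mask(upper) == 0x800000u);
    assert(frag_contains_value(upper, 0x9abcdeu));
    assert(!frag_contains_value(upper, 0x123456u));
    assert(root == 0u);
    assert(frag_contains_value(root, 0x123456u));
}
```

Because the value is masked to its leading bits when the frag is built, a frag with zero bits always encodes as 0 and matches every value, which is what makes it usable as the root of a frag tree.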
diff --git a/fs/ceph/ceph_fs.c b/fs/ceph/ceph_fs.c
deleted file mode 100644
index 3ac6cc7..0000000
--- a/fs/ceph/ceph_fs.c
+++ /dev/null
@@ -1,72 +0,0 @@
-/*
- * Some non-inline ceph helpers
- */
-#include "types.h"
-
-/*
- * return true if @layout appears to be valid
- */
-int ceph_file_layout_is_valid(const struct ceph_file_layout *layout)
-{
-       __u32 su = le32_to_cpu(layout->fl_stripe_unit);
-       __u32 sc = le32_to_cpu(layout->fl_stripe_count);
-       __u32 os = le32_to_cpu(layout->fl_object_size);
-
-       /* stripe unit, object size must be non-zero, 64k increment */
-       if (!su || (su & (CEPH_MIN_STRIPE_UNIT-1)))
-               return 0;
-       if (!os || (os & (CEPH_MIN_STRIPE_UNIT-1)))
-               return 0;
-       /* object size must be a multiple of stripe unit */
-       if (os < su || os % su)
-               return 0;
-       /* stripe count must be non-zero */
-       if (!sc)
-               return 0;
-       return 1;
-}
-
-
-int ceph_flags_to_mode(int flags)
-{
-       int mode;
-
-#ifdef O_DIRECTORY  /* fixme */
-       if ((flags & O_DIRECTORY) == O_DIRECTORY)
-               return CEPH_FILE_MODE_PIN;
-#endif
-       if ((flags & O_APPEND) == O_APPEND)
-               flags |= O_WRONLY;
-
-       if ((flags & O_ACCMODE) == O_RDWR)
-               mode = CEPH_FILE_MODE_RDWR;
-       else if ((flags & O_ACCMODE) == O_WRONLY)
-               mode = CEPH_FILE_MODE_WR;
-       else
-               mode = CEPH_FILE_MODE_RD;
-
-#ifdef O_LAZY
-       if (flags & O_LAZY)
-               mode |= CEPH_FILE_MODE_LAZY;
-#endif
-
-       return mode;
-}
-
-int ceph_caps_for_mode(int mode)
-{
-       int caps = CEPH_CAP_PIN;
-
-       if (mode & CEPH_FILE_MODE_RD)
-               caps |= CEPH_CAP_FILE_SHARED |
-                       CEPH_CAP_FILE_RD | CEPH_CAP_FILE_CACHE;
-       if (mode & CEPH_FILE_MODE_WR)
-               caps |= CEPH_CAP_FILE_EXCL |
-                       CEPH_CAP_FILE_WR | CEPH_CAP_FILE_BUFFER |
-                       CEPH_CAP_AUTH_SHARED | CEPH_CAP_AUTH_EXCL |
-                       CEPH_CAP_XATTR_SHARED | CEPH_CAP_XATTR_EXCL;
-       if (mode & CEPH_FILE_MODE_LAZY)
-               caps |= CEPH_CAP_FILE_LAZYIO;
-
-       return caps;
-}
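Reviewer note: the layout check in the file removed above is easy to misread, so here is a small standalone sketch of the same rules. It is an illustration under assumptions, not the kernel function: `struct layout`, `layout_is_valid()` and `MIN_STRIPE_UNIT` are hypothetical names, and the fields are plain host-endian `uint32_t` rather than the `__le32` wire fields that the original converts with `le32_to_cpu()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for CEPH_MIN_STRIPE_UNIT (64 KiB). */
#define MIN_STRIPE_UNIT 65536u

/* Hypothetical host-endian subset of the on-wire file layout. */
struct layout {
    uint32_t stripe_unit;   /* bytes per stripe unit */
    uint32_t stripe_count;  /* number of objects striped over */
    uint32_t object_size;   /* bytes per object */
};

/* Return 1 if the layout passes the same checks as the removed helper. */
static inline int layout_is_valid(const struct layout *l)
{
    /* stripe unit and object size: non-zero, multiple of 64 KiB */
    if (!l->stripe_unit || (l->stripe_unit & (MIN_STRIPE_UNIT - 1)))
        return 0;
    if (!l->object_size || (l->object_size & (MIN_STRIPE_UNIT - 1)))
        return 0;
    /* object size must be a whole number of stripe units */
    if (l->object_size < l->stripe_unit ||
        l->object_size % l->stripe_unit)
        return 0;
    /* stripe count must be non-zero */
    if (!l->stripe_count)
        return 0;
    return 1;
}

/* Illustrative checks only; not part of the original file. */
void layout_demo(void)
{
    struct layout ok = { 65536u, 1u, 4194304u };        /* 64 KiB unit, 4 MiB object */
    struct layout small_unit = { 4096u, 1u, 4194304u }; /* unit not a 64 KiB multiple */
    struct layout uneven = { 196608u, 1u, 4194304u };   /* 4 MiB is not a multiple of 192 KiB */
    struct layout no_count = { 65536u, 0u, 65536u };    /* zero stripe count */

    assert(layout_is_valid(&ok));
    assert(!layout_is_valid(&small_unit));
    assert(!layout_is_valid(&uneven));
    assert(!layout_is_valid(&no_count));
}
```

The mask test `x & (MIN_STRIPE_UNIT - 1)` only works as a multiple-of check because 65536 is a power of two; the object-size-to-stripe-unit check uses `%` because a stripe unit need not be a power of two.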
diff --git a/fs/ceph/ceph_fs.h b/fs/ceph/ceph_fs.h
deleted file mode 100644
index d5619ac..0000000
--- a/fs/ceph/ceph_fs.h
+++ /dev/null
@@ -1,728 +0,0 @@
-/*
- * ceph_fs.h - Ceph constants and data types to share between kernel and
- * user space.
- *
- * Most types in this file are defined as little-endian, and are
- * primarily intended to describe data structures that pass over the
- * wire or that are stored on disk.
- *
- * LGPL2
- */
-
-#ifndef CEPH_FS_H
-#define CEPH_FS_H
-
-#include "msgr.h"
-#include "rados.h"
-
-/*
- * subprotocol versions.  when specific messages types or high-level
- * protocols change, bump the affected components.  we keep rev
- * internal cluster protocols separately from the public,
- * client-facing protocol.
- */
-#define CEPH_OSD_PROTOCOL     8 /* cluster internal */
-#define CEPH_MDS_PROTOCOL    12 /* cluster internal */
-#define CEPH_MON_PROTOCOL     5 /* cluster internal */
-#define CEPH_OSDC_PROTOCOL   24 /* server/client */
-#define CEPH_MDSC_PROTOCOL   32 /* server/client */
-#define CEPH_MONC_PROTOCOL   15 /* server/client */
-
-
-#define CEPH_INO_ROOT  1
-#define CEPH_INO_CEPH  2        /* hidden .ceph dir */
-
-/* arbitrary limit on max # of monitors (cluster of 3 is typical) */
-#define CEPH_MAX_MON   31
-
-
-/*
- * feature bits
- */
-#define CEPH_FEATURE_UID            (1<<0)
-#define CEPH_FEATURE_NOSRCADDR      (1<<1)
-#define CEPH_FEATURE_MONCLOCKCHECK  (1<<2)
-#define CEPH_FEATURE_FLOCK          (1<<3)
-
-
-/*
- * ceph_file_layout - describe data layout for a file/inode
- */
-struct ceph_file_layout {
-       /* file -> object mapping */
-       __le32 fl_stripe_unit;     /* stripe unit, in bytes.  must be multiple
-                                     of page size. */
-       __le32 fl_stripe_count;    /* over this many objects */
-       __le32 fl_object_size;     /* until objects are this big, then move to
-                                     new objects */
-       __le32 fl_cas_hash;        /* 0 = none; 1 = sha256 */
-
-       /* pg -> disk layout */
-       __le32 fl_object_stripe_unit;  /* for per-object parity, if any */
-
-       /* object -> pg layout */
-       __le32 fl_pg_preferred; /* preferred primary for pg (-1 for none) */
-       __le32 fl_pg_pool;      /* namespace, crush ruleset, rep level */
-} __attribute__ ((packed));
-
-#define CEPH_MIN_STRIPE_UNIT 65536
-
-int ceph_file_layout_is_valid(const struct ceph_file_layout *layout);
-
-
-/* crypto algorithms */
-#define CEPH_CRYPTO_NONE 0x0
-#define CEPH_CRYPTO_AES  0x1
-
-#define CEPH_AES_IV "cephsageyudagreg"
-
-/* security/authentication protocols */
-#define CEPH_AUTH_UNKNOWN      0x0
-#define CEPH_AUTH_NONE         0x1
-#define CEPH_AUTH_CEPHX                0x2
-
-#define CEPH_AUTH_UID_DEFAULT ((__u64) -1)
-
-
-/*********************************************
- * message layer
- */
-
-/*
- * message types
- */
-
-/* misc */
-#define CEPH_MSG_SHUTDOWN               1
-#define CEPH_MSG_PING                   2
-
-/* client <-> monitor */
-#define CEPH_MSG_MON_MAP                4
-#define CEPH_MSG_MON_GET_MAP            5
-#define CEPH_MSG_STATFS                 13
-#define CEPH_MSG_STATFS_REPLY           14
-#define CEPH_MSG_MON_SUBSCRIBE          15
-#define CEPH_MSG_MON_SUBSCRIBE_ACK      16
-#define CEPH_MSG_AUTH                  17
-#define CEPH_MSG_AUTH_REPLY            18
-
-/* client <-> mds */
-#define CEPH_MSG_MDS_MAP                21
-
-#define CEPH_MSG_CLIENT_SESSION         22
-#define CEPH_MSG_CLIENT_RECONNECT       23
-
-#define CEPH_MSG_CLIENT_REQUEST         24
-#define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
-#define CEPH_MSG_CLIENT_REPLY           26
-#define CEPH_MSG_CLIENT_CAPS            0x310
-#define CEPH_MSG_CLIENT_LEASE           0x311
-#define CEPH_MSG_CLIENT_SNAP            0x312
-#define CEPH_MSG_CLIENT_CAPRELEASE      0x313
-
-/* pool ops */
-#define CEPH_MSG_POOLOP_REPLY           48
-#define CEPH_MSG_POOLOP                 49
-
-
-/* osd */
-#define CEPH_MSG_OSD_MAP          41
-#define CEPH_MSG_OSD_OP           42
-#define CEPH_MSG_OSD_OPREPLY      43
-
-/* pool operations */
-enum {
-  POOL_OP_CREATE                       = 0x01,
-  POOL_OP_DELETE                       = 0x02,
-  POOL_OP_AUID_CHANGE                  = 0x03,
-  POOL_OP_CREATE_SNAP                  = 0x11,
-  POOL_OP_DELETE_SNAP                  = 0x12,
-  POOL_OP_CREATE_UNMANAGED_SNAP                = 0x21,
-  POOL_OP_DELETE_UNMANAGED_SNAP                = 0x22,
-};
-
-struct ceph_mon_request_header {
-       __le64 have_version;
-       __le16 session_mon;
-       __le64 session_mon_tid;
-} __attribute__ ((packed));
-
-struct ceph_mon_statfs {
-       struct ceph_mon_request_header monhdr;
-       struct ceph_fsid fsid;
-} __attribute__ ((packed));
-
-struct ceph_statfs {
-       __le64 kb, kb_used, kb_avail;
-       __le64 num_objects;
-} __attribute__ ((packed));
-
-struct ceph_mon_statfs_reply {
-       struct ceph_fsid fsid;
-       __le64 version;
-       struct ceph_statfs st;
-} __attribute__ ((packed));
-
-const char *ceph_pool_op_name(int op);
-
-struct ceph_mon_poolop {
-       struct ceph_mon_request_header monhdr;
-       struct ceph_fsid fsid;
-       __le32 pool;
-       __le32 op;
-       __le64 auid;
-       __le64 snapid;
-       __le32 name_len;
-} __attribute__ ((packed));
-
-struct ceph_mon_poolop_reply {
-       struct ceph_mon_request_header monhdr;
-       struct ceph_fsid fsid;
-       __le32 reply_code;
-       __le32 epoch;
-       char has_data;
-       char data[0];
-} __attribute__ ((packed));
-
-struct ceph_mon_unmanaged_snap {
-       __le64 snapid;
-} __attribute__ ((packed));
-
-struct ceph_osd_getmap {
-       struct ceph_mon_request_header monhdr;
-       struct ceph_fsid fsid;
-       __le32 start;
-} __attribute__ ((packed));
-
-struct ceph_mds_getmap {
-       struct ceph_mon_request_header monhdr;
-       struct ceph_fsid fsid;
-} __attribute__ ((packed));
-
-struct ceph_client_mount {
-       struct ceph_mon_request_header monhdr;
-} __attribute__ ((packed));
-
-struct ceph_mon_subscribe_item {
-       __le64 have_version;    __le64 have;
-       __u8 onetime;
-} __attribute__ ((packed));
-
-struct ceph_mon_subscribe_ack {
-       __le32 duration;         /* seconds */
-       struct ceph_fsid fsid;
-} __attribute__ ((packed));
-
-/*
- * mds states
- *   > 0 -> in
- *  <= 0 -> out
- */
-#define CEPH_MDS_STATE_DNE          0  /* down, does not exist. */
-#define CEPH_MDS_STATE_STOPPED     -1  /* down, once existed, but no subtrees.
-                                         empty log. */
-#define CEPH_MDS_STATE_BOOT        -4  /* up, boot announcement. */
-#define CEPH_MDS_STATE_STANDBY     -5  /* up, idle.  waiting for assignment. */
-#define CEPH_MDS_STATE_CREATING    -6  /* up, creating MDS instance. */
-#define CEPH_MDS_STATE_STARTING    -7  /* up, starting previously stopped mds */
-#define CEPH_MDS_STATE_STANDBY_REPLAY -8 /* up, tailing active node's journal */
-
-#define CEPH_MDS_STATE_REPLAY       8  /* up, replaying journal. */
-#define CEPH_MDS_STATE_RESOLVE      9  /* up, disambiguating distributed
-                                         operations (import, rename, etc.) */
-#define CEPH_MDS_STATE_RECONNECT    10 /* up, reconnect to clients */
-#define CEPH_MDS_STATE_REJOIN       11 /* up, rejoining distributed cache */
-#define CEPH_MDS_STATE_CLIENTREPLAY 12 /* up, replaying client operations */
-#define CEPH_MDS_STATE_ACTIVE       13 /* up, active */
-#define CEPH_MDS_STATE_STOPPING     14 /* up, but exporting metadata */
-
-extern const char *ceph_mds_state_name(int s);
-
-
-/*
- * metadata lock types.
- *  - these are bitmasks.. we can compose them
- *  - they also define the lock ordering by the MDS
- *  - a few of these are internal to the mds
- */
-#define CEPH_LOCK_DVERSION    1
-#define CEPH_LOCK_DN          2
-#define CEPH_LOCK_ISNAP       16
-#define CEPH_LOCK_IVERSION    32    /* mds internal */
-#define CEPH_LOCK_IFILE       64
-#define CEPH_LOCK_IAUTH       128
-#define CEPH_LOCK_ILINK       256
-#define CEPH_LOCK_IDFT        512   /* dir frag tree */
-#define CEPH_LOCK_INEST       1024  /* mds internal */
-#define CEPH_LOCK_IXATTR      2048
-#define CEPH_LOCK_IFLOCK      4096  /* advisory file locks */
-#define CEPH_LOCK_INO         8192  /* immutable inode bits; not a lock */
-
-/* client_session ops */
-enum {
-       CEPH_SESSION_REQUEST_OPEN,
-       CEPH_SESSION_OPEN,
-       CEPH_SESSION_REQUEST_CLOSE,
-       CEPH_SESSION_CLOSE,
-       CEPH_SESSION_REQUEST_RENEWCAPS,
-       CEPH_SESSION_RENEWCAPS,
-       CEPH_SESSION_STALE,
-       CEPH_SESSION_RECALL_STATE,
-};
-
-extern const char *ceph_session_op_name(int op);
-
-struct ceph_mds_session_head {
-       __le32 op;
-       __le64 seq;
-       struct ceph_timespec stamp;
-       __le32 max_caps, max_leases;
-} __attribute__ ((packed));
-
-/* client_request */
-/*
- * metadata ops.
- *  & 0x001000 -> write op
- *  & 0x010000 -> follow symlink (e.g. stat(), not lstat()).
- &  & 0x100000 -> use weird ino/path trace
- */
-#define CEPH_MDS_OP_WRITE        0x001000
-enum {
-       CEPH_MDS_OP_LOOKUP     = 0x00100,
-       CEPH_MDS_OP_GETATTR    = 0x00101,
-       CEPH_MDS_OP_LOOKUPHASH = 0x00102,
-       CEPH_MDS_OP_LOOKUPPARENT = 0x00103,
-
-       CEPH_MDS_OP_SETXATTR   = 0x01105,
-       CEPH_MDS_OP_RMXATTR    = 0x01106,
-       CEPH_MDS_OP_SETLAYOUT  = 0x01107,
-       CEPH_MDS_OP_SETATTR    = 0x01108,
-       CEPH_MDS_OP_SETFILELOCK= 0x01109,
-       CEPH_MDS_OP_GETFILELOCK= 0x00110,
-
-       CEPH_MDS_OP_MKNOD      = 0x01201,
-       CEPH_MDS_OP_LINK       = 0x01202,
-       CEPH_MDS_OP_UNLINK     = 0x01203,
-       CEPH_MDS_OP_RENAME     = 0x01204,
-       CEPH_MDS_OP_MKDIR      = 0x01220,
-       CEPH_MDS_OP_RMDIR      = 0x01221,
-       CEPH_MDS_OP_SYMLINK    = 0x01222,
-
-       CEPH_MDS_OP_CREATE     = 0x01301,
-       CEPH_MDS_OP_OPEN       = 0x00302,
-       CEPH_MDS_OP_READDIR    = 0x00305,
-
-       CEPH_MDS_OP_LOOKUPSNAP = 0x00400,
-       CEPH_MDS_OP_MKSNAP     = 0x01400,
-       CEPH_MDS_OP_RMSNAP     = 0x01401,
-       CEPH_MDS_OP_LSSNAP     = 0x00402,
-};
-
-extern const char *ceph_mds_op_name(int op);
-
-
-#define CEPH_SETATTR_MODE   1
-#define CEPH_SETATTR_UID    2
-#define CEPH_SETATTR_GID    4
-#define CEPH_SETATTR_MTIME  8
-#define CEPH_SETATTR_ATIME 16
-#define CEPH_SETATTR_SIZE  32
-#define CEPH_SETATTR_CTIME 64
-
-union ceph_mds_request_args {
-       struct {
-               __le32 mask;                 /* CEPH_CAP_* */
-       } __attribute__ ((packed)) getattr;
-       struct {
-               __le32 mode;
-               __le32 uid;
-               __le32 gid;
-               struct ceph_timespec mtime;
-               struct ceph_timespec atime;
-               __le64 size, old_size;       /* old_size needed by truncate */
-               __le32 mask;                 /* CEPH_SETATTR_* */
-       } __attribute__ ((packed)) setattr;
-       struct {
-               __le32 frag;                 /* which dir fragment */
-               __le32 max_entries;          /* how many dentries to grab */
-               __le32 max_bytes;
-       } __attribute__ ((packed)) readdir;
-       struct {
-               __le32 mode;
-               __le32 rdev;
-       } __attribute__ ((packed)) mknod;
-       struct {
-               __le32 mode;
-       } __attribute__ ((packed)) mkdir;
-       struct {
-               __le32 flags;
-               __le32 mode;
-               __le32 stripe_unit;          /* layout for newly created file */
-               __le32 stripe_count;         /* ... */
-               __le32 object_size;
-               __le32 file_replication;
-               __le32 preferred;
-       } __attribute__ ((packed)) open;
-       struct {
-               __le32 flags;
-       } __attribute__ ((packed)) setxattr;
-       struct {
-               struct ceph_file_layout layout;
-       } __attribute__ ((packed)) setlayout;
-       struct {
-               __u8 rule; /* currently fcntl or flock */
-               __u8 type; /* shared, exclusive, remove*/
-               __le64 pid; /* process id requesting the lock */
-               __le64 pid_namespace;
-               __le64 start; /* initial location to lock */
-               __le64 length; /* num bytes to lock from start */
-               __u8 wait; /* will caller wait for lock to become available? */
-       } __attribute__ ((packed)) filelock_change;
-} __attribute__ ((packed));
-
-#define CEPH_MDS_FLAG_REPLAY        1  /* this is a replayed op */
-#define CEPH_MDS_FLAG_WANT_DENTRY   2  /* want dentry in reply */
-
-struct ceph_mds_request_head {
-       __le64 oldest_client_tid;
-       __le32 mdsmap_epoch;           /* on client */
-       __le32 flags;                  /* CEPH_MDS_FLAG_* */
-       __u8 num_retry, num_fwd;       /* count retry, fwd attempts */
-       __le16 num_releases;           /* # include cap/lease release records */
-       __le32 op;                     /* mds op code */
-       __le32 caller_uid, caller_gid;
-       __le64 ino;                    /* use this ino for openc, mkdir, mknod,
-                                         etc. (if replaying) */
-       union ceph_mds_request_args args;
-} __attribute__ ((packed));
-
-/* cap/lease release record */
-struct ceph_mds_request_release {
-       __le64 ino, cap_id;            /* ino and unique cap id */
-       __le32 caps, wanted;           /* new issued, wanted */
-       __le32 seq, issue_seq, mseq;
-       __le32 dname_seq;              /* if releasing a dentry lease, a */
-       __le32 dname_len;              /* string follows. */
-} __attribute__ ((packed));
-
-/* client reply */
-struct ceph_mds_reply_head {
-       __le32 op;
-       __le32 result;
-       __le32 mdsmap_epoch;
-       __u8 safe;                     /* true if committed to disk */
-       __u8 is_dentry, is_target;     /* true if dentry, target inode records
-                                         are included with reply */
-} __attribute__ ((packed));
-
-/* one for each node split */
-struct ceph_frag_tree_split {
-       __le32 frag;                   /* this frag splits... */
-       __le32 by;                     /* ...by this many bits */
-} __attribute__ ((packed));
-
-struct ceph_frag_tree_head {
-       __le32 nsplits;                /* num ceph_frag_tree_split records */
-       struct ceph_frag_tree_split splits[];
-} __attribute__ ((packed));
-
-/* capability issue, for bundling with mds reply */
-struct ceph_mds_reply_cap {
-       __le32 caps, wanted;           /* caps issued, wanted */
-       __le64 cap_id;
-       __le32 seq, mseq;
-       __le64 realm;                  /* snap realm */
-       __u8 flags;                    /* CEPH_CAP_FLAG_* */
-} __attribute__ ((packed));
-
-#define CEPH_CAP_FLAG_AUTH  1          /* cap is issued by auth mds */
-
-/* inode record, for bundling with mds reply */
-struct ceph_mds_reply_inode {
-       __le64 ino;
-       __le64 snapid;
-       __le32 rdev;
-       __le64 version;                /* inode version */
-       __le64 xattr_version;          /* version for xattr blob */
-       struct ceph_mds_reply_cap cap; /* caps issued for this inode */
-       struct ceph_file_layout layout;
-       struct ceph_timespec ctime, mtime, atime;
-       __le32 time_warp_seq;
-       __le64 size, max_size, truncate_size;
-       __le32 truncate_seq;
-       __le32 mode, uid, gid;
-       __le32 nlink;
-       __le64 files, subdirs, rbytes, rfiles, rsubdirs;  /* dir stats */
-       struct ceph_timespec rctime;
-       struct ceph_frag_tree_head fragtree;  /* (must be at end of struct) */
-} __attribute__ ((packed));
-/* followed by frag array, then symlink string, then xattr blob */
-
-/* reply_lease follows dname, and reply_inode */
-struct ceph_mds_reply_lease {
-       __le16 mask;            /* lease type(s) */
-       __le32 duration_ms;     /* lease duration */
-       __le32 seq;
-} __attribute__ ((packed));
-
-struct ceph_mds_reply_dirfrag {
-       __le32 frag;            /* fragment */
-       __le32 auth;            /* auth mds, if this is a delegation point */
-       __le32 ndist;           /* number of mds' this is replicated on */
-       __le32 dist[];
-} __attribute__ ((packed));
-
-#define CEPH_LOCK_FCNTL    1
-#define CEPH_LOCK_FLOCK    2
-
-#define CEPH_LOCK_SHARED   1
-#define CEPH_LOCK_EXCL     2
-#define CEPH_LOCK_UNLOCK   4
-
-struct ceph_filelock {
-       __le64 start;/* file offset to start lock at */
-       __le64 length; /* num bytes to lock; 0 for all following start */
-       __le64 client; /* which client holds the lock */
-       __le64 pid; /* process id holding the lock on the client */
-       __le64 pid_namespace;
-       __u8 type; /* shared lock, exclusive lock, or unlock */
-} __attribute__ ((packed));
-
-
-/* file access modes */
-#define CEPH_FILE_MODE_PIN        0
-#define CEPH_FILE_MODE_RD         1
-#define CEPH_FILE_MODE_WR         2
-#define CEPH_FILE_MODE_RDWR       3  /* RD | WR */
-#define CEPH_FILE_MODE_LAZY       4  /* lazy io */
-#define CEPH_FILE_MODE_NUM        8  /* bc these are bit fields.. mostly */
-
-int ceph_flags_to_mode(int flags);
-
-
-/* capability bits */
-#define CEPH_CAP_PIN         1  /* no specific capabilities beyond the pin */
-
-/* generic cap bits */
-#define CEPH_CAP_GSHARED     1  /* client can reads */
-#define CEPH_CAP_GEXCL       2  /* client can read and update */
-#define CEPH_CAP_GCACHE      4  /* (file) client can cache reads */
-#define CEPH_CAP_GRD         8  /* (file) client can read */
-#define CEPH_CAP_GWR        16  /* (file) client can write */
-#define CEPH_CAP_GBUFFER    32  /* (file) client can buffer writes */
-#define CEPH_CAP_GWREXTEND  64  /* (file) client can extend EOF */
-#define CEPH_CAP_GLAZYIO   128  /* (file) client can perform lazy io */
-
-/* per-lock shift */
-#define CEPH_CAP_SAUTH      2
-#define CEPH_CAP_SLINK      4
-#define CEPH_CAP_SXATTR     6
-#define CEPH_CAP_SFILE      8
-#define CEPH_CAP_SFLOCK    20 
-
-#define CEPH_CAP_BITS       22
-
-/* composed values */
-#define CEPH_CAP_AUTH_SHARED  (CEPH_CAP_GSHARED  << CEPH_CAP_SAUTH)
-#define CEPH_CAP_AUTH_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SAUTH)
-#define CEPH_CAP_LINK_SHARED  (CEPH_CAP_GSHARED  << CEPH_CAP_SLINK)
-#define CEPH_CAP_LINK_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SLINK)
-#define CEPH_CAP_XATTR_SHARED (CEPH_CAP_GSHARED  << CEPH_CAP_SXATTR)
-#define CEPH_CAP_XATTR_EXCL    (CEPH_CAP_GEXCL     << CEPH_CAP_SXATTR)
-#define CEPH_CAP_FILE(x)    (x << CEPH_CAP_SFILE)
-#define CEPH_CAP_FILE_SHARED   (CEPH_CAP_GSHARED   << CEPH_CAP_SFILE)
-#define CEPH_CAP_FILE_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SFILE)
-#define CEPH_CAP_FILE_CACHE    (CEPH_CAP_GCACHE    << CEPH_CAP_SFILE)
-#define CEPH_CAP_FILE_RD       (CEPH_CAP_GRD       << CEPH_CAP_SFILE)
-#define CEPH_CAP_FILE_WR       (CEPH_CAP_GWR       << CEPH_CAP_SFILE)
-#define CEPH_CAP_FILE_BUFFER   (CEPH_CAP_GBUFFER   << CEPH_CAP_SFILE)
-#define CEPH_CAP_FILE_WREXTEND (CEPH_CAP_GWREXTEND << CEPH_CAP_SFILE)
-#define CEPH_CAP_FILE_LAZYIO   (CEPH_CAP_GLAZYIO   << CEPH_CAP_SFILE)
-#define CEPH_CAP_FLOCK_SHARED  (CEPH_CAP_GSHARED   << CEPH_CAP_SFLOCK)
-#define CEPH_CAP_FLOCK_EXCL    (CEPH_CAP_GEXCL     << CEPH_CAP_SFLOCK)
-
-
-/* cap masks (for getattr) */
-#define CEPH_STAT_CAP_INODE    CEPH_CAP_PIN
-#define CEPH_STAT_CAP_TYPE     CEPH_CAP_PIN  /* mode >> 12 */
-#define CEPH_STAT_CAP_SYMLINK  CEPH_CAP_PIN
-#define CEPH_STAT_CAP_UID      CEPH_CAP_AUTH_SHARED
-#define CEPH_STAT_CAP_GID      CEPH_CAP_AUTH_SHARED
-#define CEPH_STAT_CAP_MODE     CEPH_CAP_AUTH_SHARED
-#define CEPH_STAT_CAP_NLINK    CEPH_CAP_LINK_SHARED
-#define CEPH_STAT_CAP_LAYOUT   CEPH_CAP_FILE_SHARED
-#define CEPH_STAT_CAP_MTIME    CEPH_CAP_FILE_SHARED
-#define CEPH_STAT_CAP_SIZE     CEPH_CAP_FILE_SHARED
-#define CEPH_STAT_CAP_ATIME    CEPH_CAP_FILE_SHARED  /* fixme */
-#define CEPH_STAT_CAP_XATTR    CEPH_CAP_XATTR_SHARED
-#define CEPH_STAT_CAP_INODE_ALL (CEPH_CAP_PIN |                        \
-                                CEPH_CAP_AUTH_SHARED | \
-                                CEPH_CAP_LINK_SHARED | \
-                                CEPH_CAP_FILE_SHARED | \
-                                CEPH_CAP_XATTR_SHARED)
-
-#define CEPH_CAP_ANY_SHARED (CEPH_CAP_AUTH_SHARED |                    \
-                             CEPH_CAP_LINK_SHARED |                    \
-                             CEPH_CAP_XATTR_SHARED |                   \
-                             CEPH_CAP_FILE_SHARED)
-#define CEPH_CAP_ANY_RD   (CEPH_CAP_ANY_SHARED | CEPH_CAP_FILE_RD |    \
-                          CEPH_CAP_FILE_CACHE)
-
-#define CEPH_CAP_ANY_EXCL (CEPH_CAP_AUTH_EXCL |                \
-                          CEPH_CAP_LINK_EXCL |         \
-                          CEPH_CAP_XATTR_EXCL |        \
-                          CEPH_CAP_FILE_EXCL)
-#define CEPH_CAP_ANY_FILE_WR (CEPH_CAP_FILE_WR | CEPH_CAP_FILE_BUFFER |        \
-                             CEPH_CAP_FILE_EXCL)
-#define CEPH_CAP_ANY_WR   (CEPH_CAP_ANY_EXCL | CEPH_CAP_ANY_FILE_WR)
-#define CEPH_CAP_ANY      (CEPH_CAP_ANY_RD | CEPH_CAP_ANY_EXCL | \
-                          CEPH_CAP_ANY_FILE_WR | CEPH_CAP_FILE_LAZYIO | \
-                          CEPH_CAP_PIN)
-
-#define CEPH_CAP_LOCKS (CEPH_LOCK_IFILE | CEPH_LOCK_IAUTH | CEPH_LOCK_ILINK | \
-                       CEPH_LOCK_IXATTR)
-
-int ceph_caps_for_mode(int mode);
-
-enum {
-       CEPH_CAP_OP_GRANT,         /* mds->client grant */
-       CEPH_CAP_OP_REVOKE,        /* mds->client revoke */
-       CEPH_CAP_OP_TRUNC,         /* mds->client trunc notify */
-       CEPH_CAP_OP_EXPORT,        /* mds has exported the cap */
-       CEPH_CAP_OP_IMPORT,        /* mds has imported the cap */
-       CEPH_CAP_OP_UPDATE,        /* client->mds update */
-       CEPH_CAP_OP_DROP,          /* client->mds drop cap bits */
-       CEPH_CAP_OP_FLUSH,         /* client->mds cap writeback */
-       CEPH_CAP_OP_FLUSH_ACK,     /* mds->client flushed */
-       CEPH_CAP_OP_FLUSHSNAP,     /* client->mds flush snapped metadata */
-       CEPH_CAP_OP_FLUSHSNAP_ACK, /* mds->client flushed snapped metadata */
-       CEPH_CAP_OP_RELEASE,       /* client->mds release (clean) cap */
-       CEPH_CAP_OP_RENEW,         /* client->mds renewal request */
-};
-
-extern const char *ceph_cap_op_name(int op);
-
-/*
- * caps message, used for capability callbacks, acks, requests, etc.
- */
-struct ceph_mds_caps {
-       __le32 op;                  /* CEPH_CAP_OP_* */
-       __le64 ino, realm;
-       __le64 cap_id;
-       __le32 seq, issue_seq;
-       __le32 caps, wanted, dirty; /* latest issued/wanted/dirty */
-       __le32 migrate_seq;
-       __le64 snap_follows;
-       __le32 snap_trace_len;
-
-       /* authlock */
-       __le32 uid, gid, mode;
-
-       /* linklock */
-       __le32 nlink;
-
-       /* xattrlock */
-       __le32 xattr_len;
-       __le64 xattr_version;
-
-       /* filelock */
-       __le64 size, max_size, truncate_size;
-       __le32 truncate_seq;
-       struct ceph_timespec mtime, atime, ctime;
-       struct ceph_file_layout layout;
-       __le32 time_warp_seq;
-} __attribute__ ((packed));
-
-/* cap release msg head */
-struct ceph_mds_cap_release {
-       __le32 num;                /* number of cap_items that follow */
-} __attribute__ ((packed));
-
-struct ceph_mds_cap_item {
-       __le64 ino;
-       __le64 cap_id;
-       __le32 migrate_seq, seq;
-} __attribute__ ((packed));
-
-#define CEPH_MDS_LEASE_REVOKE           1  /*    mds  -> client */
-#define CEPH_MDS_LEASE_RELEASE          2  /* client  -> mds    */
-#define CEPH_MDS_LEASE_RENEW            3  /* client <-> mds    */
-#define CEPH_MDS_LEASE_REVOKE_ACK       4  /* client  -> mds    */
-
-extern const char *ceph_lease_op_name(int o);
-
-/* lease msg header */
-struct ceph_mds_lease {
-       __u8 action;            /* CEPH_MDS_LEASE_* */
-       __le16 mask;            /* which lease */
-       __le64 ino;
-       __le64 first, last;     /* snap range */
-       __le32 seq;
-       __le32 duration_ms;     /* duration of renewal */
-} __attribute__ ((packed));
-/* followed by a __le32+string for dname */
-
-/* client reconnect */
-struct ceph_mds_cap_reconnect {
-       __le64 cap_id;
-       __le32 wanted;
-       __le32 issued;
-       __le64 snaprealm;
-       __le64 pathbase;        /* base ino for our path to this ino */
-       __le32 flock_len;       /* size of flock state blob, if any */
-} __attribute__ ((packed));
-/* followed by flock blob */
-
-struct ceph_mds_cap_reconnect_v1 {
-       __le64 cap_id;
-       __le32 wanted;
-       __le32 issued;
-       __le64 size;
-       struct ceph_timespec mtime, atime;
-       __le64 snaprealm;
-       __le64 pathbase;        /* base ino for our path to this ino */
-} __attribute__ ((packed));
-
-struct ceph_mds_snaprealm_reconnect {
-       __le64 ino;     /* snap realm base */
-       __le64 seq;     /* snap seq for this snap realm */
-       __le64 parent;  /* parent realm */
-} __attribute__ ((packed));
-
-/*
- * snaps
- */
-enum {
-       CEPH_SNAP_OP_UPDATE,  /* CREATE or DESTROY */
-       CEPH_SNAP_OP_CREATE,
-       CEPH_SNAP_OP_DESTROY,
-       CEPH_SNAP_OP_SPLIT,
-};
-
-extern const char *ceph_snap_op_name(int o);
-
-/* snap msg header */
-struct ceph_mds_snap_head {
-       __le32 op;                /* CEPH_SNAP_OP_* */
-       __le64 split;             /* ino to split off, if any */
-       __le32 num_split_inos;    /* # inos belonging to new child realm */
-       __le32 num_split_realms;  /* # child realms under new child realm */
-       __le32 trace_len;         /* size of snap trace blob */
-} __attribute__ ((packed));
-/* followed by split ino list, then split realms, then the trace blob */
-
-/*
- * encode info about a snaprealm, as viewed by a client
- */
-struct ceph_mds_snap_realm {
-       __le64 ino;           /* ino */
-       __le64 created;       /* snap: when created */
-       __le64 parent;        /* ino: parent realm */
-       __le64 parent_since;  /* snap: same parent since */
-       __le64 seq;           /* snap: version */
-       __le32 num_snaps;
-       __le32 num_prior_parent_snaps;
-} __attribute__ ((packed));
-/* followed by my snap list, then prior parent snap list */
-
-#endif
diff --git a/fs/ceph/ceph_hash.c b/fs/ceph/ceph_hash.c
deleted file mode 100644
index bd57001..0000000
--- a/fs/ceph/ceph_hash.c
+++ /dev/null
@@ -1,118 +0,0 @@
-
-#include "types.h"
-
-/*
- * Robert Jenkins' hash function.
- * http://burtleburtle.net/bob/hash/evahash.html
- * This is in the public domain.
- */
-#define mix(a, b, c)                                           \
-       do {                                                    \
-               a = a - b;  a = a - c;  a = a ^ (c >> 13);      \
-               b = b - c;  b = b - a;  b = b ^ (a << 8);       \
-               c = c - a;  c = c - b;  c = c ^ (b >> 13);      \
-               a = a - b;  a = a - c;  a = a ^ (c >> 12);      \
-               b = b - c;  b = b - a;  b = b ^ (a << 16);      \
-               c = c - a;  c = c - b;  c = c ^ (b >> 5);       \
-               a = a - b;  a = a - c;  a = a ^ (c >> 3);       \
-               b = b - c;  b = b - a;  b = b ^ (a << 10);      \
-               c = c - a;  c = c - b;  c = c ^ (b >> 15);      \
-       } while (0)
-
-unsigned ceph_str_hash_rjenkins(const char *str, unsigned length)
-{
-       const unsigned char *k = (const unsigned char *)str;
-       __u32 a, b, c;  /* the internal state */
-       __u32 len;      /* how many key bytes still need mixing */
-
-       /* Set up the internal state */
-       len = length;
-       a = 0x9e3779b9;      /* the golden ratio; an arbitrary value */
-       b = a;
-       c = 0;               /* variable initialization of internal state */
-
-       /* handle most of the key */
-       while (len >= 12) {
-               a = a + (k[0] + ((__u32)k[1] << 8) + ((__u32)k[2] << 16) +
-                        ((__u32)k[3] << 24));
-               b = b + (k[4] + ((__u32)k[5] << 8) + ((__u32)k[6] << 16) +
-                        ((__u32)k[7] << 24));
-               c = c + (k[8] + ((__u32)k[9] << 8) + ((__u32)k[10] << 16) +
-                        ((__u32)k[11] << 24));
-               mix(a, b, c);
-               k = k + 12;
-               len = len - 12;
-       }
-
-       /* handle the last 11 bytes */
-       c = c + length;
-       switch (len) {            /* all the case statements fall through */
-       case 11:
-               c = c + ((__u32)k[10] << 24);
-       case 10:
-               c = c + ((__u32)k[9] << 16);
-       case 9:
-               c = c + ((__u32)k[8] << 8);
-               /* the first byte of c is reserved for the length */
-       case 8:
-               b = b + ((__u32)k[7] << 24);
-       case 7:
-               b = b + ((__u32)k[6] << 16);
-       case 6:
-               b = b + ((__u32)k[5] << 8);
-       case 5:
-               b = b + k[4];
-       case 4:
-               a = a + ((__u32)k[3] << 24);
-       case 3:
-               a = a + ((__u32)k[2] << 16);
-       case 2:
-               a = a + ((__u32)k[1] << 8);
-       case 1:
-               a = a + k[0];
-               /* case 0: nothing left to add */
-       }
-       mix(a, b, c);
-
-       return c;
-}
-
-/*
- * linux dcache hash
- */
-unsigned ceph_str_hash_linux(const char *str, unsigned length)
-{
-       unsigned long hash = 0;
-       unsigned char c;
-
-       while (length--) {
-               c = *str++;
-               hash = (hash + (c << 4) + (c >> 4)) * 11;
-       }
-       return hash;
-}
-
-
-unsigned ceph_str_hash(int type, const char *s, unsigned len)
-{
-       switch (type) {
-       case CEPH_STR_HASH_LINUX:
-               return ceph_str_hash_linux(s, len);
-       case CEPH_STR_HASH_RJENKINS:
-               return ceph_str_hash_rjenkins(s, len);
-       default:
-               return -1;
-       }
-}
-
-const char *ceph_str_hash_name(int type)
-{
-       switch (type) {
-       case CEPH_STR_HASH_LINUX:
-               return "linux";
-       case CEPH_STR_HASH_RJENKINS:
-               return "rjenkins";
-       default:
-               return "unknown";
-       }
-}
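Note on ceph_str_hash_linux() in the file removed above: the accumulator is an unsigned long, but the function returns unsigned, so only the low bits survive. Assuming a 32-bit unsigned (as on the usual Linux targets), addition and multiplication commute with truncation modulo 2^32, so a 32-bit accumulator should give the same result on 32-bit and 64-bit builds. A minimal userspace sketch of that loop follows; the name str_hash_linux32 and the use of <stdint.h> types are illustrative assumptions, not part of the removed source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace sketch of the "linux dcache hash" loop from ceph_hash.c.
 * str_hash_linux32 is a hypothetical name, not the removed kernel symbol.
 */
static uint32_t str_hash_linux32(const char *str, size_t length)
{
        const unsigned char *p = (const unsigned char *)str;
        uint32_t hash = 0;

        while (length--) {
                uint32_t c = *p++;

                /* add the byte shifted left and right by 4, then scale by 11 */
                hash = (hash + (c << 4) + (c >> 4)) * 11u;
        }
        return hash;
}
```

For the single byte 'a' (97), the loop computes (0 + 1552 + 6) * 11 = 17138.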
diff --git a/fs/ceph/ceph_hash.h b/fs/ceph/ceph_hash.h
deleted file mode 100644
index d099c3f..0000000
--- a/fs/ceph/ceph_hash.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#ifndef FS_CEPH_HASH_H
-#define FS_CEPH_HASH_H
-
-#define CEPH_STR_HASH_LINUX      0x1  /* linux dcache hash */
-#define CEPH_STR_HASH_RJENKINS   0x2  /* robert jenkins' */
-
-extern unsigned ceph_str_hash_linux(const char *s, unsigned len);
-extern unsigned ceph_str_hash_rjenkins(const char *s, unsigned len);
-
-extern unsigned ceph_str_hash(int type, const char *s, unsigned len);
-extern const char *ceph_str_hash_name(int type);
-
-#endif
diff --git a/fs/ceph/ceph_strings.c b/fs/ceph/ceph_strings.c
deleted file mode 100644
index c6179d3..0000000
--- a/fs/ceph/ceph_strings.c
+++ /dev/null
@@ -1,193 +0,0 @@
-/*
- * Ceph string constants
- */
-#include "types.h"
-
-const char *ceph_entity_type_name(int type)
-{
-       switch (type) {
-       case CEPH_ENTITY_TYPE_MDS: return "mds";
-       case CEPH_ENTITY_TYPE_OSD: return "osd";
-       case CEPH_ENTITY_TYPE_MON: return "mon";
-       case CEPH_ENTITY_TYPE_CLIENT: return "client";
-       case CEPH_ENTITY_TYPE_AUTH: return "auth";
-       default: return "unknown";
-       }
-}
-
-const char *ceph_osd_op_name(int op)
-{
-       switch (op) {
-       case CEPH_OSD_OP_READ: return "read";
-       case CEPH_OSD_OP_STAT: return "stat";
-
-       case CEPH_OSD_OP_MASKTRUNC: return "masktrunc";
-
-       case CEPH_OSD_OP_WRITE: return "write";
-       case CEPH_OSD_OP_DELETE: return "delete";
-       case CEPH_OSD_OP_TRUNCATE: return "truncate";
-       case CEPH_OSD_OP_ZERO: return "zero";
-       case CEPH_OSD_OP_WRITEFULL: return "writefull";
-       case CEPH_OSD_OP_ROLLBACK: return "rollback";
-
-       case CEPH_OSD_OP_APPEND: return "append";
-       case CEPH_OSD_OP_STARTSYNC: return "startsync";
-       case CEPH_OSD_OP_SETTRUNC: return "settrunc";
-       case CEPH_OSD_OP_TRIMTRUNC: return "trimtrunc";
-
-       case CEPH_OSD_OP_TMAPUP: return "tmapup";
-       case CEPH_OSD_OP_TMAPGET: return "tmapget";
-       case CEPH_OSD_OP_TMAPPUT: return "tmapput";
-
-       case CEPH_OSD_OP_GETXATTR: return "getxattr";
-       case CEPH_OSD_OP_GETXATTRS: return "getxattrs";
-       case CEPH_OSD_OP_SETXATTR: return "setxattr";
-       case CEPH_OSD_OP_SETXATTRS: return "setxattrs";
-       case CEPH_OSD_OP_RESETXATTRS: return "resetxattrs";
-       case CEPH_OSD_OP_RMXATTR: return "rmxattr";
-       case CEPH_OSD_OP_CMPXATTR: return "cmpxattr";
-
-       case CEPH_OSD_OP_PULL: return "pull";
-       case CEPH_OSD_OP_PUSH: return "push";
-       case CEPH_OSD_OP_BALANCEREADS: return "balance-reads";
-       case CEPH_OSD_OP_UNBALANCEREADS: return "unbalance-reads";
-       case CEPH_OSD_OP_SCRUB: return "scrub";
-
-       case CEPH_OSD_OP_WRLOCK: return "wrlock";
-       case CEPH_OSD_OP_WRUNLOCK: return "wrunlock";
-       case CEPH_OSD_OP_RDLOCK: return "rdlock";
-       case CEPH_OSD_OP_RDUNLOCK: return "rdunlock";
-       case CEPH_OSD_OP_UPLOCK: return "uplock";
-       case CEPH_OSD_OP_DNLOCK: return "dnlock";
-
-       case CEPH_OSD_OP_CALL: return "call";
-
-       case CEPH_OSD_OP_PGLS: return "pgls";
-       }
-       return "???";
-}
-
-const char *ceph_mds_state_name(int s)
-{
-       switch (s) {
-               /* down and out */
-       case CEPH_MDS_STATE_DNE:        return "down:dne";
-       case CEPH_MDS_STATE_STOPPED:    return "down:stopped";
-               /* up and out */
-       case CEPH_MDS_STATE_BOOT:       return "up:boot";
-       case CEPH_MDS_STATE_STANDBY:    return "up:standby";
-       case CEPH_MDS_STATE_STANDBY_REPLAY:    return "up:standby-replay";
-       case CEPH_MDS_STATE_CREATING:   return "up:creating";
-       case CEPH_MDS_STATE_STARTING:   return "up:starting";
-               /* up and in */
-       case CEPH_MDS_STATE_REPLAY:     return "up:replay";
-       case CEPH_MDS_STATE_RESOLVE:    return "up:resolve";
-       case CEPH_MDS_STATE_RECONNECT:  return "up:reconnect";
-       case CEPH_MDS_STATE_REJOIN:     return "up:rejoin";
-       case CEPH_MDS_STATE_CLIENTREPLAY: return "up:clientreplay";
-       case CEPH_MDS_STATE_ACTIVE:     return "up:active";
-       case CEPH_MDS_STATE_STOPPING:   return "up:stopping";
-       }
-       return "???";
-}
-
-const char *ceph_session_op_name(int op)
-{
-       switch (op) {
-       case CEPH_SESSION_REQUEST_OPEN: return "request_open";
-       case CEPH_SESSION_OPEN: return "open";
-       case CEPH_SESSION_REQUEST_CLOSE: return "request_close";
-       case CEPH_SESSION_CLOSE: return "close";
-       case CEPH_SESSION_REQUEST_RENEWCAPS: return "request_renewcaps";
-       case CEPH_SESSION_RENEWCAPS: return "renewcaps";
-       case CEPH_SESSION_STALE: return "stale";
-       case CEPH_SESSION_RECALL_STATE: return "recall_state";
-       }
-       return "???";
-}
-
-const char *ceph_mds_op_name(int op)
-{
-       switch (op) {
-       case CEPH_MDS_OP_LOOKUP:  return "lookup";
-       case CEPH_MDS_OP_LOOKUPHASH:  return "lookuphash";
-       case CEPH_MDS_OP_LOOKUPPARENT:  return "lookupparent";
-       case CEPH_MDS_OP_GETATTR:  return "getattr";
-       case CEPH_MDS_OP_SETXATTR: return "setxattr";
-       case CEPH_MDS_OP_SETATTR: return "setattr";
-       case CEPH_MDS_OP_RMXATTR: return "rmxattr";
-       case CEPH_MDS_OP_READDIR: return "readdir";
-       case CEPH_MDS_OP_MKNOD: return "mknod";
-       case CEPH_MDS_OP_LINK: return "link";
-       case CEPH_MDS_OP_UNLINK: return "unlink";
-       case CEPH_MDS_OP_RENAME: return "rename";
-       case CEPH_MDS_OP_MKDIR: return "mkdir";
-       case CEPH_MDS_OP_RMDIR: return "rmdir";
-       case CEPH_MDS_OP_SYMLINK: return "symlink";
-       case CEPH_MDS_OP_CREATE: return "create";
-       case CEPH_MDS_OP_OPEN: return "open";
-       case CEPH_MDS_OP_LOOKUPSNAP: return "lookupsnap";
-       case CEPH_MDS_OP_LSSNAP: return "lssnap";
-       case CEPH_MDS_OP_MKSNAP: return "mksnap";
-       case CEPH_MDS_OP_RMSNAP: return "rmsnap";
-       case CEPH_MDS_OP_SETFILELOCK: return "setfilelock";
-       case CEPH_MDS_OP_GETFILELOCK: return "getfilelock";
-       }
-       return "???";
-}
-
-const char *ceph_cap_op_name(int op)
-{
-       switch (op) {
-       case CEPH_CAP_OP_GRANT: return "grant";
-       case CEPH_CAP_OP_REVOKE: return "revoke";
-       case CEPH_CAP_OP_TRUNC: return "trunc";
-       case CEPH_CAP_OP_EXPORT: return "export";
-       case CEPH_CAP_OP_IMPORT: return "import";
-       case CEPH_CAP_OP_UPDATE: return "update";
-       case CEPH_CAP_OP_DROP: return "drop";
-       case CEPH_CAP_OP_FLUSH: return "flush";
-       case CEPH_CAP_OP_FLUSH_ACK: return "flush_ack";
-       case CEPH_CAP_OP_FLUSHSNAP: return "flushsnap";
-       case CEPH_CAP_OP_FLUSHSNAP_ACK: return "flushsnap_ack";
-       case CEPH_CAP_OP_RELEASE: return "release";
-       case CEPH_CAP_OP_RENEW: return "renew";
-       }
-       return "???";
-}
-
-const char *ceph_lease_op_name(int o)
-{
-       switch (o) {
-       case CEPH_MDS_LEASE_REVOKE: return "revoke";
-       case CEPH_MDS_LEASE_RELEASE: return "release";
-       case CEPH_MDS_LEASE_RENEW: return "renew";
-       case CEPH_MDS_LEASE_REVOKE_ACK: return "revoke_ack";
-       }
-       return "???";
-}
-
-const char *ceph_snap_op_name(int o)
-{
-       switch (o) {
-       case CEPH_SNAP_OP_UPDATE: return "update";
-       case CEPH_SNAP_OP_CREATE: return "create";
-       case CEPH_SNAP_OP_DESTROY: return "destroy";
-       case CEPH_SNAP_OP_SPLIT: return "split";
-       }
-       return "???";
-}
-
-const char *ceph_pool_op_name(int op)
-{
-       switch (op) {
-       case POOL_OP_CREATE: return "create";
-       case POOL_OP_DELETE: return "delete";
-       case POOL_OP_AUID_CHANGE: return "auid change";
-       case POOL_OP_CREATE_SNAP: return "create snap";
-       case POOL_OP_DELETE_SNAP: return "delete snap";
-       case POOL_OP_CREATE_UNMANAGED_SNAP: return "create unmanaged snap";
-       case POOL_OP_DELETE_UNMANAGED_SNAP: return "delete unmanaged snap";
-       }
-       return "???";
-}
diff --git a/fs/ceph/crush/crush.c b/fs/ceph/crush/crush.c
deleted file mode 100644
index fabd302..0000000
--- a/fs/ceph/crush/crush.c
+++ /dev/null
@@ -1,151 +0,0 @@
-
-#ifdef __KERNEL__
-# include <linux/slab.h>
-#else
-# include <stdlib.h>
-# include <assert.h>
-# define kfree(x) do { if (x) free(x); } while (0)
-# define BUG_ON(x) assert(!(x))
-#endif
-
-#include "crush.h"
-
-const char *crush_bucket_alg_name(int alg)
-{
-       switch (alg) {
-       case CRUSH_BUCKET_UNIFORM: return "uniform";
-       case CRUSH_BUCKET_LIST: return "list";
-       case CRUSH_BUCKET_TREE: return "tree";
-       case CRUSH_BUCKET_STRAW: return "straw";
-       default: return "unknown";
-       }
-}
-
-/**
- * crush_get_bucket_item_weight - Get weight of an item in given bucket
- * @b: bucket pointer
- * @p: item index in bucket
- */
-int crush_get_bucket_item_weight(struct crush_bucket *b, int p)
-{
-       if (p >= b->size)
-               return 0;
-
-       switch (b->alg) {
-       case CRUSH_BUCKET_UNIFORM:
-               return ((struct crush_bucket_uniform *)b)->item_weight;
-       case CRUSH_BUCKET_LIST:
-               return ((struct crush_bucket_list *)b)->item_weights[p];
-       case CRUSH_BUCKET_TREE:
-               if (p & 1)
-                       return ((struct crush_bucket_tree *)b)->node_weights[p];
-               return 0;
-       case CRUSH_BUCKET_STRAW:
-               return ((struct crush_bucket_straw *)b)->item_weights[p];
-       }
-       return 0;
-}
-
-/**
- * crush_calc_parents - Calculate parent vectors for the given crush map.
- * @map: crush_map pointer
- */
-void crush_calc_parents(struct crush_map *map)
-{
-       int i, b, c;
-
-       for (b = 0; b < map->max_buckets; b++) {
-               if (map->buckets[b] == NULL)
-                       continue;
-               for (i = 0; i < map->buckets[b]->size; i++) {
-                       c = map->buckets[b]->items[i];
-                       BUG_ON(c >= map->max_devices ||
-                              c < -map->max_buckets);
-                       if (c >= 0)
-                               map->device_parents[c] = map->buckets[b]->id;
-                       else
-                               map->bucket_parents[-1-c] = map->buckets[b]->id;
-               }
-       }
-}
-
-void crush_destroy_bucket_uniform(struct crush_bucket_uniform *b)
-{
-       kfree(b->h.perm);
-       kfree(b->h.items);
-       kfree(b);
-}
-
-void crush_destroy_bucket_list(struct crush_bucket_list *b)
-{
-       kfree(b->item_weights);
-       kfree(b->sum_weights);
-       kfree(b->h.perm);
-       kfree(b->h.items);
-       kfree(b);
-}
-
-void crush_destroy_bucket_tree(struct crush_bucket_tree *b)
-{
-       kfree(b->node_weights);
-       kfree(b);
-}
-
-void crush_destroy_bucket_straw(struct crush_bucket_straw *b)
-{
-       kfree(b->straws);
-       kfree(b->item_weights);
-       kfree(b->h.perm);
-       kfree(b->h.items);
-       kfree(b);
-}
-
-void crush_destroy_bucket(struct crush_bucket *b)
-{
-       switch (b->alg) {
-       case CRUSH_BUCKET_UNIFORM:
-               crush_destroy_bucket_uniform((struct crush_bucket_uniform *)b);
-               break;
-       case CRUSH_BUCKET_LIST:
-               crush_destroy_bucket_list((struct crush_bucket_list *)b);
-               break;
-       case CRUSH_BUCKET_TREE:
-               crush_destroy_bucket_tree((struct crush_bucket_tree *)b);
-               break;
-       case CRUSH_BUCKET_STRAW:
-               crush_destroy_bucket_straw((struct crush_bucket_straw *)b);
-               break;
-       }
-}
-
-/**
- * crush_destroy - Destroy a crush_map
- * @map: crush_map pointer
- */
-void crush_destroy(struct crush_map *map)
-{
-       int b;
-
-       /* buckets */
-       if (map->buckets) {
-               for (b = 0; b < map->max_buckets; b++) {
-                       if (map->buckets[b] == NULL)
-                               continue;
-                       crush_destroy_bucket(map->buckets[b]);
-               }
-               kfree(map->buckets);
-       }
-
-       /* rules */
-       if (map->rules) {
-               for (b = 0; b < map->max_rules; b++)
-                       kfree(map->rules[b]);
-               kfree(map->rules);
-       }
-
-       kfree(map->bucket_parents);
-       kfree(map->device_parents);
-       kfree(map);
-}
-
-
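Note on crush_calc_parents() in the file removed above: item ids share one signed namespace. Non-negative ids are devices and index device_parents directly, while negative ids are buckets and index bucket_parents at -1 - id, so bucket id -1 lands in slot 0. The helpers below are a small sketch of that convention; their names are hypothetical and do not appear in the removed source.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating the id convention used by
 * crush_calc_parents(); they are not part of the removed file. */

/* Negative item ids denote buckets; ids >= 0 denote devices. */
static inline int crush_item_is_bucket(int32_t item)
{
        return item < 0;
}

/* Bucket id -1 maps to slot 0, -2 to slot 1, and so on. */
static inline int32_t crush_bucket_index(int32_t id)
{
        return -1 - id;
}

/* Inverse mapping: slot 0 holds bucket id -1. */
static inline int32_t crush_bucket_id(int32_t index)
{
        return -1 - index;
}
```

With max_buckets slots, the valid bucket ids run from -1 down to -max_buckets, which matches the BUG_ON() range check in the removed function.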
diff --git a/fs/ceph/crush/crush.h b/fs/ceph/crush/crush.h
deleted file mode 100644
index 97e435b..0000000
--- a/fs/ceph/crush/crush.h
+++ /dev/null
@@ -1,180 +0,0 @@
-#ifndef CEPH_CRUSH_CRUSH_H
-#define CEPH_CRUSH_CRUSH_H
-
-#include <linux/types.h>
-
-/*
- * CRUSH is a pseudo-random data distribution algorithm that
- * efficiently distributes input values (typically, data objects)
- * across a heterogeneous, structured storage cluster.
- *
- * The algorithm was originally described in detail in this paper
- * (although the algorithm has evolved somewhat since then):
- *
- *     http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf
- *
- * LGPL2
- */
-
-
-#define CRUSH_MAGIC 0x00010000ul   /* for detecting algorithm revisions */
-
-
-#define CRUSH_MAX_DEPTH 10  /* max crush hierarchy depth */
-#define CRUSH_MAX_SET   10  /* max size of a mapping result */
-
-
-/*
- * CRUSH uses user-defined "rules" to describe how inputs should be
- * mapped to devices.  A rule consists of a sequence of steps to perform
- * to generate the set of output devices.
- */
-struct crush_rule_step {
-       __u32 op;
-       __s32 arg1;
-       __s32 arg2;
-};
-
-/* step op codes */
-enum {
-       CRUSH_RULE_NOOP = 0,
-       CRUSH_RULE_TAKE = 1,          /* arg1 = value to start with */
-       CRUSH_RULE_CHOOSE_FIRSTN = 2, /* arg1 = num items to pick */
-                                     /* arg2 = type */
-       CRUSH_RULE_CHOOSE_INDEP = 3,  /* same */
-       CRUSH_RULE_EMIT = 4,          /* no args */
-       CRUSH_RULE_CHOOSE_LEAF_FIRSTN = 6,
-       CRUSH_RULE_CHOOSE_LEAF_INDEP = 7,
-};
-
-/*
- * for specifying choose num (arg1) relative to the max parameter
- * passed to do_rule
- */
-#define CRUSH_CHOOSE_N            0
-#define CRUSH_CHOOSE_N_MINUS(x)   (-(x))
-
-/*
- * The rule mask is used to describe what the rule is intended for.
- * Given a ruleset and size of output set, we search through the
- * rule list for a matching rule_mask.
- */
-struct crush_rule_mask {
-       __u8 ruleset;
-       __u8 type;
-       __u8 min_size;
-       __u8 max_size;
-};
-
-struct crush_rule {
-       __u32 len;
-       struct crush_rule_mask mask;
-       struct crush_rule_step steps[0];
-};
-
-#define crush_rule_size(len) (sizeof(struct crush_rule) + \
-                             (len)*sizeof(struct crush_rule_step))
-
-
-
-/*
- * A bucket is a named container of other items (either devices or
- * other buckets).  Items within a bucket are chosen using one of a
- * few different algorithms.  The table summarizes how the speed of
- * each option measures up against mapping stability when items are
- * added or removed.
- *
- *  Bucket Alg     Speed       Additions    Removals
- *  ------------------------------------------------
- *  uniform         O(1)       poor         poor
- *  list            O(n)       optimal      poor
- *  tree            O(log n)   good         good
- *  straw           O(n)       optimal      optimal
- */
-enum {
-       CRUSH_BUCKET_UNIFORM = 1,
-       CRUSH_BUCKET_LIST = 2,
-       CRUSH_BUCKET_TREE = 3,
-       CRUSH_BUCKET_STRAW = 4
-};
-extern const char *crush_bucket_alg_name(int alg);
-
-struct crush_bucket {
-       __s32 id;        /* this'll be negative */
-       __u16 type;      /* non-zero; type=0 is reserved for devices */
-       __u8 alg;        /* one of CRUSH_BUCKET_* */
-       __u8 hash;       /* which hash function to use, CRUSH_HASH_* */
-       __u32 weight;    /* 16-bit fixed point */
-       __u32 size;      /* num items */
-       __s32 *items;
-
-       /*
-        * cached random permutation: used for uniform bucket and for
-        * the linear search fallback for the other bucket types.
-        */
-       __u32 perm_x;  /* @x for which *perm is defined */
-       __u32 perm_n;  /* num elements of *perm that are permuted/defined */
-       __u32 *perm;
-};
-
-struct crush_bucket_uniform {
-       struct crush_bucket h;
-       __u32 item_weight;  /* 16-bit fixed point; all items equally weighted */
-};
-
-struct crush_bucket_list {
-       struct crush_bucket h;
-       __u32 *item_weights;  /* 16-bit fixed point */
-       __u32 *sum_weights;   /* 16-bit fixed point.  element i is sum
-                                of weights 0..i, inclusive */
-};
-
-struct crush_bucket_tree {
-       struct crush_bucket h;  /* note: h.size is _tree_ size, not number of
-                                  actual items */
-       __u8 num_nodes;
-       __u32 *node_weights;
-};
-
-struct crush_bucket_straw {
-       struct crush_bucket h;
-       __u32 *item_weights;   /* 16-bit fixed point */
-       __u32 *straws;         /* 16-bit fixed point */
-};
-
-
-
-/*
- * CRUSH map includes all buckets, rules, etc.
- */
-struct crush_map {
-       struct crush_bucket **buckets;
-       struct crush_rule **rules;
-
-       /*
-        * Parent pointers to identify the parent bucket of a device or
-        * bucket in the hierarchy.  If an item appears more than
-        * once, this is the _last_ time it appeared (where buckets
-        * are processed in bucket id order, from -1 on down to
-        * -max_buckets).
-        */
-       __u32 *bucket_parents;
-       __u32 *device_parents;
-
-       __s32 max_buckets;
-       __u32 max_rules;
-       __s32 max_devices;
-};
-
-
-/* crush.c */
-extern int crush_get_bucket_item_weight(struct crush_bucket *b, int pos);
-extern void crush_calc_parents(struct crush_map *map);
-extern void crush_destroy_bucket_uniform(struct crush_bucket_uniform *b);
-extern void crush_destroy_bucket_list(struct crush_bucket_list *b);
-extern void crush_destroy_bucket_tree(struct crush_bucket_tree *b);
-extern void crush_destroy_bucket_straw(struct crush_bucket_straw *b);
-extern void crush_destroy_bucket(struct crush_bucket *b);
-extern void crush_destroy(struct crush_map *map);
-
-#endif
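Note on struct crush_rule in the header removed above: the steps array is a trailing variable-length member, and crush_rule_size(len) gives the number of bytes needed for a rule with len steps. The sketch below shows one way such a rule might be allocated in userspace. It swaps the kernel __u32/__s32/__u8 types for <stdint.h> types and the GNU zero-length array for a C99 flexible array member; crush_rule_alloc is a hypothetical helper, not a function from the removed code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace mirrors of the removed definitions (stdint types assumed). */
struct crush_rule_step {
        uint32_t op;
        int32_t arg1;
        int32_t arg2;
};

struct crush_rule_mask {
        uint8_t ruleset;
        uint8_t type;
        uint8_t min_size;
        uint8_t max_size;
};

struct crush_rule {
        uint32_t len;
        struct crush_rule_mask mask;
        struct crush_rule_step steps[];  /* C99 flexible array member */
};

#define crush_rule_size(len) (sizeof(struct crush_rule) + \
                              (len) * sizeof(struct crush_rule_step))

/* Hypothetical helper: allocate a zeroed rule with room for len steps. */
static struct crush_rule *crush_rule_alloc(uint32_t len)
{
        struct crush_rule *r = calloc(1, crush_rule_size(len));

        if (r)
                r->len = len;
        return r;
}
```

A rule obtained this way is a single allocation, which is consistent with crush_destroy() releasing each map->rules[b] with one kfree() call.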
diff --git a/fs/ceph/crush/hash.c b/fs/ceph/crush/hash.c
deleted file mode 100644
index 5873aed..0000000
--- a/fs/ceph/crush/hash.c
+++ /dev/null
@@ -1,149 +0,0 @@
-
-#include <linux/types.h>
-#include "hash.h"
-
-/*
- * Robert Jenkins' function for mixing 32-bit values
- * http://burtleburtle.net/bob/hash/evahash.html
- * a, b = random bits, c = input and output
- */
-#define crush_hashmix(a, b, c) do {                    \
-               a = a-b;  a = a-c;  a = a^(c>>13);      \
-               b = b-c;  b = b-a;  b = b^(a<<8);       \
-               c = c-a;  c = c-b;  c = c^(b>>13);      \
-               a = a-b;  a = a-c;  a = a^(c>>12);      \
-               b = b-c;  b = b-a;  b = b^(a<<16);      \
-               c = c-a;  c = c-b;  c = c^(b>>5);       \
-               a = a-b;  a = a-c;  a = a^(c>>3);       \
-               b = b-c;  b = b-a;  b = b^(a<<10);      \
-               c = c-a;  c = c-b;  c = c^(b>>15);      \
-       } while (0)
-
-#define crush_hash_seed 1315423911
-
-static __u32 crush_hash32_rjenkins1(__u32 a)
-{
-       __u32 hash = crush_hash_seed ^ a;
-       __u32 b = a;
-       __u32 x = 231232;
-       __u32 y = 1232;
-       crush_hashmix(b, x, hash);
-       crush_hashmix(y, a, hash);
-       return hash;
-}
-
-static __u32 crush_hash32_rjenkins1_2(__u32 a, __u32 b)
-{
-       __u32 hash = crush_hash_seed ^ a ^ b;
-       __u32 x = 231232;
-       __u32 y = 1232;
-       crush_hashmix(a, b, hash);
-       crush_hashmix(x, a, hash);
-       crush_hashmix(b, y, hash);
-       return hash;
-}
-
-static __u32 crush_hash32_rjenkins1_3(__u32 a, __u32 b, __u32 c)
-{
-       __u32 hash = crush_hash_seed ^ a ^ b ^ c;
-       __u32 x = 231232;
-       __u32 y = 1232;
-       crush_hashmix(a, b, hash);
-       crush_hashmix(c, x, hash);
-       crush_hashmix(y, a, hash);
-       crush_hashmix(b, x, hash);
-       crush_hashmix(y, c, hash);
-       return hash;
-}
-
-static __u32 crush_hash32_rjenkins1_4(__u32 a, __u32 b, __u32 c, __u32 d)
-{
-       __u32 hash = crush_hash_seed ^ a ^ b ^ c ^ d;
-       __u32 x = 231232;
-       __u32 y = 1232;
-       crush_hashmix(a, b, hash);
-       crush_hashmix(c, d, hash);
-       crush_hashmix(a, x, hash);
-       crush_hashmix(y, b, hash);
-       crush_hashmix(c, x, hash);
-       crush_hashmix(y, d, hash);
-       return hash;
-}
-
-static __u32 crush_hash32_rjenkins1_5(__u32 a, __u32 b, __u32 c, __u32 d,
-                                     __u32 e)
-{
-       __u32 hash = crush_hash_seed ^ a ^ b ^ c ^ d ^ e;
-       __u32 x = 231232;
-       __u32 y = 1232;
-       crush_hashmix(a, b, hash);
-       crush_hashmix(c, d, hash);
-       crush_hashmix(e, x, hash);
-       crush_hashmix(y, a, hash);
-       crush_hashmix(b, x, hash);
-       crush_hashmix(y, c, hash);
-       crush_hashmix(d, x, hash);
-       crush_hashmix(y, e, hash);
-       return hash;
-}
-
-
-__u32 crush_hash32(int type, __u32 a)
-{
-       switch (type) {
-       case CRUSH_HASH_RJENKINS1:
-               return crush_hash32_rjenkins1(a);
-       default:
-               return 0;
-       }
-}
-
-__u32 crush_hash32_2(int type, __u32 a, __u32 b)
-{
-       switch (type) {
-       case CRUSH_HASH_RJENKINS1:
-               return crush_hash32_rjenkins1_2(a, b);
-       default:
-               return 0;
-       }
-}
-
-__u32 crush_hash32_3(int type, __u32 a, __u32 b, __u32 c)
-{
-       switch (type) {
-       case CRUSH_HASH_RJENKINS1:
-               return crush_hash32_rjenkins1_3(a, b, c);
-       default:
-               return 0;
-       }
-}
-
-__u32 crush_hash32_4(int type, __u32 a, __u32 b, __u32 c, __u32 d)
-{
-       switch (type) {
-       case CRUSH_HASH_RJENKINS1:
-               return crush_hash32_rjenkins1_4(a, b, c, d);
-       default:
-               return 0;
-       }
-}
-
-__u32 crush_hash32_5(int type, __u32 a, __u32 b, __u32 c, __u32 d, __u32 e)
-{
-       switch (type) {
-       case CRUSH_HASH_RJENKINS1:
-               return crush_hash32_rjenkins1_5(a, b, c, d, e);
-       default:
-               return 0;
-       }
-}
-
-const char *crush_hash_name(int type)
-{
-       switch (type) {
-       case CRUSH_HASH_RJENKINS1:
-               return "rjenkins1";
-       default:
-               return "unknown";
-       }
-}
diff --git a/fs/ceph/crush/hash.h b/fs/ceph/crush/hash.h
deleted file mode 100644
index 91e8842..0000000
--- a/fs/ceph/crush/hash.h
+++ /dev/null
@@ -1,17 +0,0 @@
-#ifndef CEPH_CRUSH_HASH_H
-#define CEPH_CRUSH_HASH_H
-
-#define CRUSH_HASH_RJENKINS1   0
-
-#define CRUSH_HASH_DEFAULT CRUSH_HASH_RJENKINS1
-
-extern const char *crush_hash_name(int type);
-
-extern __u32 crush_hash32(int type, __u32 a);
-extern __u32 crush_hash32_2(int type, __u32 a, __u32 b);
-extern __u32 crush_hash32_3(int type, __u32 a, __u32 b, __u32 c);
-extern __u32 crush_hash32_4(int type, __u32 a, __u32 b, __u32 c, __u32 d);
-extern __u32 crush_hash32_5(int type, __u32 a, __u32 b, __u32 c, __u32 d,
-                           __u32 e);
-
-#endif
diff --git a/fs/ceph/crush/mapper.c b/fs/ceph/crush/mapper.c
deleted file mode 100644
index a4eec13..0000000
--- a/fs/ceph/crush/mapper.c
+++ /dev/null
@@ -1,609 +0,0 @@
-
-#ifdef __KERNEL__
-# include <linux/string.h>
-# include <linux/slab.h>
-# include <linux/bug.h>
-# include <linux/kernel.h>
-# ifndef dprintk
-#  define dprintk(args...)
-# endif
-#else
-# include <string.h>
-# include <stdio.h>
-# include <stdlib.h>
-# include <assert.h>
-# define BUG_ON(x) assert(!(x))
-# define dprintk(args...) /* printf(args) */
-# define kmalloc(x, f) malloc(x)
-# define kfree(x) free(x)
-#endif
-
-#include "crush.h"
-#include "hash.h"
-
-/*
- * Implement the core CRUSH mapping algorithm.
- */
-
-/**
- * crush_find_rule - find a crush_rule id for a given ruleset, type, and size.
- * @map: the crush_map
- * @ruleset: the storage ruleset id (user defined)
- * @type: storage ruleset type (user defined)
- * @size: output set size
- */
-int crush_find_rule(struct crush_map *map, int ruleset, int type, int size)
-{
-       int i;
-
-       for (i = 0; i < map->max_rules; i++) {
-               if (map->rules[i] &&
-                   map->rules[i]->mask.ruleset == ruleset &&
-                   map->rules[i]->mask.type == type &&
-                   map->rules[i]->mask.min_size <= size &&
-                   map->rules[i]->mask.max_size >= size)
-                       return i;
-       }
-       return -1;
-}
-
-
-/*
- * bucket choose methods
- *
- * For each bucket algorithm, we have a "choose" method that, given a
- * crush input @x and replica position (usually, position in output set) @r,
- * will produce an item in the bucket.
- */
-
-/*
- * Choose based on a random permutation of the bucket.
- *
- * We used to use some prime number arithmetic to do this, but it
- * wasn't very random, and had some other bad behaviors.  Instead, we
- * calculate an actual random permutation of the bucket members.
- * Since this is expensive, we optimize for the r=0 case, which
- * captures the vast majority of calls.
- */
-static int bucket_perm_choose(struct crush_bucket *bucket,
-                             int x, int r)
-{
-       unsigned pr = r % bucket->size;
-       unsigned i, s;
-
-       /* start a new permutation if @x has changed */
-       if (bucket->perm_x != x || bucket->perm_n == 0) {
-               dprintk("bucket %d new x=%d\n", bucket->id, x);
-               bucket->perm_x = x;
-
-               /* optimize common r=0 case */
-               if (pr == 0) {
-                       s = crush_hash32_3(bucket->hash, x, bucket->id, 0) %
-                               bucket->size;
-                       bucket->perm[0] = s;
-                       bucket->perm_n = 0xffff;   /* magic value, see below */
-                       goto out;
-               }
-
-               for (i = 0; i < bucket->size; i++)
-                       bucket->perm[i] = i;
-               bucket->perm_n = 0;
-       } else if (bucket->perm_n == 0xffff) {
-               /* clean up after the r=0 case above */
-               for (i = 1; i < bucket->size; i++)
-                       bucket->perm[i] = i;
-               bucket->perm[bucket->perm[0]] = 0;
-               bucket->perm_n = 1;
-       }
-
-       /* calculate permutation up to pr */
-       for (i = 0; i < bucket->perm_n; i++)
-               dprintk(" perm_choose have %d: %d\n", i, bucket->perm[i]);
-       while (bucket->perm_n <= pr) {
-               unsigned p = bucket->perm_n;
-               /* no point in swapping the final entry */
-               if (p < bucket->size - 1) {
-                       i = crush_hash32_3(bucket->hash, x, bucket->id, p) %
-                               (bucket->size - p);
-                       if (i) {
-                               unsigned t = bucket->perm[p + i];
-                               bucket->perm[p + i] = bucket->perm[p];
-                               bucket->perm[p] = t;
-                       }
-                       dprintk(" perm_choose swap %d with %d\n", p, p+i);
-               }
-               bucket->perm_n++;
-       }
-       for (i = 0; i < bucket->size; i++)
-               dprintk(" perm_choose  %d: %d\n", i, bucket->perm[i]);
-
-       s = bucket->perm[pr];
-out:
-       dprintk(" perm_choose %d sz=%d x=%d r=%d (%d) s=%d\n", bucket->id,
-               bucket->size, x, r, pr, s);
-       return bucket->items[s];
-}
-
-/* uniform */
-static int bucket_uniform_choose(struct crush_bucket_uniform *bucket,
-                                int x, int r)
-{
-       return bucket_perm_choose(&bucket->h, x, r);
-}
-
-/* list */
-static int bucket_list_choose(struct crush_bucket_list *bucket,
-                             int x, int r)
-{
-       int i;
-
-       for (i = bucket->h.size-1; i >= 0; i--) {
-               __u64 w = crush_hash32_4(bucket->h.hash,x, bucket->h.items[i],
-                                        r, bucket->h.id);
-               w &= 0xffff;
-               dprintk("list_choose i=%d x=%d r=%d item %d weight %x "
-                       "sw %x rand %llx",
-                       i, x, r, bucket->h.items[i], bucket->item_weights[i],
-                       bucket->sum_weights[i], w);
-               w *= bucket->sum_weights[i];
-               w = w >> 16;
-               /*dprintk(" scaled %llx\n", w);*/
-               if (w < bucket->item_weights[i])
-                       return bucket->h.items[i];
-       }
-
-       BUG_ON(1);
-       return 0;
-}
-
-
-/* (binary) tree */
-static int height(int n)
-{
-       int h = 0;
-       while ((n & 1) == 0) {
-               h++;
-               n = n >> 1;
-       }
-       return h;
-}
-
-static int left(int x)
-{
-       int h = height(x);
-       return x - (1 << (h-1));
-}
-
-static int right(int x)
-{
-       int h = height(x);
-       return x + (1 << (h-1));
-}
-
-static int terminal(int x)
-{
-       return x & 1;
-}
-
-static int bucket_tree_choose(struct crush_bucket_tree *bucket,
-                             int x, int r)
-{
-       int n, l;
-       __u32 w;
-       __u64 t;
-
-       /* start at root */
-       n = bucket->num_nodes >> 1;
-
-       while (!terminal(n)) {
-               /* pick point in [0, w) */
-               w = bucket->node_weights[n];
-               t = (__u64)crush_hash32_4(bucket->h.hash, x, n, r,
-                                         bucket->h.id) * (__u64)w;
-               t = t >> 32;
-
-               /* descend to the left or right? */
-               l = left(n);
-               if (t < bucket->node_weights[l])
-                       n = l;
-               else
-                       n = right(n);
-       }
-
-       return bucket->h.items[n >> 1];
-}
-
-
-/* straw */
-
-static int bucket_straw_choose(struct crush_bucket_straw *bucket,
-                              int x, int r)
-{
-       int i;
-       int high = 0;
-       __u64 high_draw = 0;
-       __u64 draw;
-
-       for (i = 0; i < bucket->h.size; i++) {
-               draw = crush_hash32_3(bucket->h.hash, x, bucket->h.items[i], r);
-               draw &= 0xffff;
-               draw *= bucket->straws[i];
-               if (i == 0 || draw > high_draw) {
-                       high = i;
-                       high_draw = draw;
-               }
-       }
-       return bucket->h.items[high];
-}
-
-static int crush_bucket_choose(struct crush_bucket *in, int x, int r)
-{
-       dprintk(" crush_bucket_choose %d x=%d r=%d\n", in->id, x, r);
-       switch (in->alg) {
-       case CRUSH_BUCKET_UNIFORM:
-               return bucket_uniform_choose((struct crush_bucket_uniform *)in,
-                                         x, r);
-       case CRUSH_BUCKET_LIST:
-               return bucket_list_choose((struct crush_bucket_list *)in,
-                                         x, r);
-       case CRUSH_BUCKET_TREE:
-               return bucket_tree_choose((struct crush_bucket_tree *)in,
-                                         x, r);
-       case CRUSH_BUCKET_STRAW:
-               return bucket_straw_choose((struct crush_bucket_straw *)in,
-                                          x, r);
-       default:
-               BUG_ON(1);
-               return in->items[0];
-       }
-}
-
-/*
- * true if device is marked "out" (failed, fully offloaded)
- * of the cluster
- */
-static int is_out(struct crush_map *map, __u32 *weight, int item, int x)
-{
-       if (weight[item] >= 0x10000)
-               return 0;
-       if (weight[item] == 0)
-               return 1;
-       if ((crush_hash32_2(CRUSH_HASH_RJENKINS1, x, item) & 0xffff)
-           < weight[item])
-               return 0;
-       return 1;
-}
-
-/**
- * crush_choose - choose numrep distinct items of given type
- * @map: the crush_map
- * @bucket: the bucket we are choose an item from
- * @x: crush input value
- * @numrep: the number of items to choose
- * @type: the type of item to choose
- * @out: pointer to output vector
- * @outpos: our position in that vector
- * @firstn: true if choosing "first n" items, false if choosing "indep"
- * @recurse_to_leaf: true if we want one device under each item of given type
- * @out2: second output vector for leaf items (if @recurse_to_leaf)
- */
-static int crush_choose(struct crush_map *map,
-                       struct crush_bucket *bucket,
-                       __u32 *weight,
-                       int x, int numrep, int type,
-                       int *out, int outpos,
-                       int firstn, int recurse_to_leaf,
-                       int *out2)
-{
-       int rep;
-       int ftotal, flocal;
-       int retry_descent, retry_bucket, skip_rep;
-       struct crush_bucket *in = bucket;
-       int r;
-       int i;
-       int item = 0;
-       int itemtype;
-       int collide, reject;
-       const int orig_tries = 5; /* attempts before we fall back to search */
-
-       dprintk("CHOOSE%s bucket %d x %d outpos %d numrep %d\n", recurse_to_leaf ? "_LEAF" : "",
-               bucket->id, x, outpos, numrep);
-
-       for (rep = outpos; rep < numrep; rep++) {
-               /* keep trying until we get a non-out, non-colliding item */
-               ftotal = 0;
-               skip_rep = 0;
-               do {
-                       retry_descent = 0;
-                       in = bucket;               /* initial bucket */
-
-                       /* choose through intervening buckets */
-                       flocal = 0;
-                       do {
-                               collide = 0;
-                               retry_bucket = 0;
-                               r = rep;
-                               if (in->alg == CRUSH_BUCKET_UNIFORM) {
-                                       /* be careful */
-                                       if (firstn || numrep >= in->size)
-                                               /* r' = r + f_total */
-                                               r += ftotal;
-                                       else if (in->size % numrep == 0)
-                                               /* r'=r+(n+1)*f_local */
-                                               r += (numrep+1) *
-                                                       (flocal+ftotal);
-                                       else
-                                               /* r' = r + n*f_local */
-                                               r += numrep * (flocal+ftotal);
-                               } else {
-                                       if (firstn)
-                                               /* r' = r + f_total */
-                                               r += ftotal;
-                                       else
-                                               /* r' = r + n*f_local */
-                                               r += numrep * (flocal+ftotal);
-                               }
-
-                               /* bucket choose */
-                               if (in->size == 0) {
-                                       reject = 1;
-                                       goto reject;
-                               }
-                               if (flocal >= (in->size>>1) &&
-                                   flocal > orig_tries)
-                                       item = bucket_perm_choose(in, x, r);
-                               else
-                                       item = crush_bucket_choose(in, x, r);
-                               BUG_ON(item >= map->max_devices);
-
-                               /* desired type? */
-                               if (item < 0)
-                                       itemtype = map->buckets[-1-item]->type;
-                               else
-                                       itemtype = 0;
-                               dprintk("  item %d type %d\n", item, itemtype);
-
-                               /* keep going? */
-                               if (itemtype != type) {
-                                       BUG_ON(item >= 0 ||
-                                              (-1-item) >= map->max_buckets);
-                                       in = map->buckets[-1-item];
-                                       retry_bucket = 1;
-                                       continue;
-                               }
-
-                               /* collision? */
-                               for (i = 0; i < outpos; i++) {
-                                       if (out[i] == item) {
-                                               collide = 1;
-                                               break;
-                                       }
-                               }
-
-                               reject = 0;
-                               if (recurse_to_leaf) {
-                                       if (item < 0) {
-                                               if (crush_choose(map,
-                                                        map->buckets[-1-item],
-                                                        weight,
-                                                        x, outpos+1, 0,
-                                                        out2, outpos,
-                                                        firstn, 0,
-                                                        NULL) <= outpos)
-                                                       /* didn't get leaf */
-                                                       reject = 1;
-                                       } else {
-                                               /* we already have a leaf! */
-                                               out2[outpos] = item;
-                                       }
-                               }
-
-                               if (!reject) {
-                                       /* out? */
-                                       if (itemtype == 0)
-                                               reject = is_out(map, weight,
-                                                               item, x);
-                                       else
-                                               reject = 0;
-                               }
-
-reject:
-                               if (reject || collide) {
-                                       ftotal++;
-                                       flocal++;
-
-                                       if (collide && flocal < 3)
-                                               /* retry locally a few times */
-                                               retry_bucket = 1;
-                                       else if (flocal < in->size + orig_tries)
-                                               /* exhaustive bucket search */
-                                               retry_bucket = 1;
-                                       else if (ftotal < 20)
-                                               /* then retry descent */
-                                               retry_descent = 1;
-                                       else
-                                               /* else give up */
-                                               skip_rep = 1;
-                                       dprintk("  reject %d  collide %d  "
-                                               "ftotal %d  flocal %d\n",
-                                               reject, collide, ftotal,
-                                               flocal);
-                               }
-                       } while (retry_bucket);
-               } while (retry_descent);
-
-               if (skip_rep) {
-                       dprintk("skip rep\n");
-                       continue;
-               }
-
-               dprintk("CHOOSE got %d\n", item);
-               out[outpos] = item;
-               outpos++;
-       }
-
-       dprintk("CHOOSE returns %d\n", outpos);
-       return outpos;
-}
-
-
-/**
- * crush_do_rule - calculate a mapping with the given input and rule
- * @map: the crush_map
- * @ruleno: the rule id
- * @x: hash input
- * @result: pointer to result vector
- * @result_max: maximum result size
- * @force: force initial replica choice; -1 for none
- */
-int crush_do_rule(struct crush_map *map,
-                 int ruleno, int x, int *result, int result_max,
-                 int force, __u32 *weight)
-{
-       int result_len;
-       int force_context[CRUSH_MAX_DEPTH];
-       int force_pos = -1;
-       int a[CRUSH_MAX_SET];
-       int b[CRUSH_MAX_SET];
-       int c[CRUSH_MAX_SET];
-       int recurse_to_leaf;
-       int *w;
-       int wsize = 0;
-       int *o;
-       int osize;
-       int *tmp;
-       struct crush_rule *rule;
-       int step;
-       int i, j;
-       int numrep;
-       int firstn;
-       int rc = -1;
-
-       BUG_ON(ruleno >= map->max_rules);
-
-       rule = map->rules[ruleno];
-       result_len = 0;
-       w = a;
-       o = b;
-
-       /*
-        * determine hierarchical context of force, if any.  note
-        * that this may or may not correspond to the specific types
-        * referenced by the crush rule.
-        */
-       if (force >= 0) {
-               if (force >= map->max_devices ||
-                   map->device_parents[force] == 0) {
-                       /*dprintk("CRUSH: forcefed device dne\n");*/
-                       rc = -1;  /* force fed device dne */
-                       goto out;
-               }
-               if (!is_out(map, weight, force, x)) {
-                       while (1) {
-                               force_context[++force_pos] = force;
-                               if (force >= 0)
-                                       force = map->device_parents[force];
-                               else
-                                       force = map->bucket_parents[-1-force];
-                               if (force == 0)
-                                       break;
-                       }
-               }
-       }
-
-       for (step = 0; step < rule->len; step++) {
-               firstn = 0;
-               switch (rule->steps[step].op) {
-               case CRUSH_RULE_TAKE:
-                       w[0] = rule->steps[step].arg1;
-                       if (force_pos >= 0) {
-                               BUG_ON(force_context[force_pos] != w[0]);
-                               force_pos--;
-                       }
-                       wsize = 1;
-                       break;
-
-               case CRUSH_RULE_CHOOSE_LEAF_FIRSTN:
-               case CRUSH_RULE_CHOOSE_FIRSTN:
-                       firstn = 1;
-               case CRUSH_RULE_CHOOSE_LEAF_INDEP:
-               case CRUSH_RULE_CHOOSE_INDEP:
-                       BUG_ON(wsize == 0);
-
-                       recurse_to_leaf =
-                               rule->steps[step].op ==
-                                CRUSH_RULE_CHOOSE_LEAF_FIRSTN ||
-                               rule->steps[step].op ==
-                               CRUSH_RULE_CHOOSE_LEAF_INDEP;
-
-                       /* reset output */
-                       osize = 0;
-
-                       for (i = 0; i < wsize; i++) {
-                               /*
-                                * see CRUSH_N, CRUSH_N_MINUS macros.
-                                * basically, numrep <= 0 means relative to
-                                * the provided result_max
-                                */
-                               numrep = rule->steps[step].arg1;
-                               if (numrep <= 0) {
-                                       numrep += result_max;
-                                       if (numrep <= 0)
-                                               continue;
-                               }
-                               j = 0;
-                               if (osize == 0 && force_pos >= 0) {
-                                       /* skip any intermediate types */
-                                       while (force_pos &&
-                                              force_context[force_pos] < 0 &&
-                                              rule->steps[step].arg2 !=
-                                              map->buckets[-1 -
-                                              force_context[force_pos]]->type)
-                                               force_pos--;
-                                       o[osize] = force_context[force_pos];
-                                       if (recurse_to_leaf)
-                                               c[osize] = force_context[0];
-                                       j++;
-                                       force_pos--;
-                               }
-                               osize += crush_choose(map,
-                                                     map->buckets[-1-w[i]],
-                                                     weight,
-                                                     x, numrep,
-                                                     rule->steps[step].arg2,
-                                                     o+osize, j,
-                                                     firstn,
-                                                     recurse_to_leaf, c+osize);
-                       }
-
-                       if (recurse_to_leaf)
-                               /* copy final _leaf_ values to output set */
-                               memcpy(o, c, osize*sizeof(*o));
-
-                       /* swap t and w arrays */
-                       tmp = o;
-                       o = w;
-                       w = tmp;
-                       wsize = osize;
-                       break;
-
-
-               case CRUSH_RULE_EMIT:
-                       for (i = 0; i < wsize && result_len < result_max; i++) {
-                               result[result_len] = w[i];
-                               result_len++;
-                       }
-                       wsize = 0;
-                       break;
-
-               default:
-                       BUG_ON(1);
-               }
-       }
-       rc = result_len;
-
-out:
-       return rc;
-}
-
-
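Note on the removed tree bucket code: bucket_tree_choose() in the mapper.c hunk above walks an implicitly numbered binary tree, where odd node ids are leaves and the number of trailing zero bits gives a node's height. The standalone sketch below reproduces that numbering and the weighted descent so it can be tried in userspace. It is an illustration, not the kernel code: toy_hash() is a hypothetical stand-in for crush_hash32_4() (it is not the rjenkins1 hash), tree_choose() returns a leaf index instead of an item id, and the assert() in height() is an added guard because the loop never terminates for n == 0.

```c
#include <assert.h>
#include <stdint.h>

/* Height of node n = number of trailing zero bits (leaves are odd, height 0). */
static int height(int n)
{
	int h = 0;

	assert(n > 0);	/* added guard: n == 0 would loop forever */
	while ((n & 1) == 0) {
		h++;
		n >>= 1;
	}
	return h;
}

/* Children of an interior node sit half a subtree width to either side. */
static int left(int x)  { return x - (1 << (height(x) - 1)); }
static int right(int x) { return x + (1 << (height(x) - 1)); }
static int terminal(int x) { return x & 1; }

/* Hypothetical mixing function; stands in for crush_hash32_4(). */
static uint32_t toy_hash(uint32_t x, uint32_t n, uint32_t r)
{
	uint32_t h = (x * 2654435761u) ^ (n * 40503u) ^ (r + 0x9e3779b9u);

	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	return h;
}

/*
 * Descend from the root, going left with probability
 * node_weights[left] / node_weights[n]. Returns the leaf index (n >> 1).
 * node_weights[0] is unused; num_nodes is a power of two.
 */
static int tree_choose(const uint32_t *node_weights, int num_nodes,
		       uint32_t x, uint32_t r)
{
	int n = num_nodes >> 1;	/* start at root */

	while (!terminal(n)) {
		/* scale a 32-bit draw into [0, weight of this subtree) */
		uint64_t t = ((uint64_t)toy_hash(x, n, r) *
			      (uint64_t)node_weights[n]) >> 32;
		int l = left(n);

		n = (t < node_weights[l]) ? l : right(n);
	}
	return n >> 1;
}
```

With num_nodes = 8 the root is node 4, its children are nodes 2 and 6, and the leaves 1, 3, 5, 7 map to indices 0 through 3. A leaf whose weight is 0 is never selected, because the scaled draw at its parent always falls inside the sibling's share.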
diff --git a/fs/ceph/crush/mapper.h b/fs/ceph/crush/mapper.h
deleted file mode 100644
index c46b99c..0000000
--- a/fs/ceph/crush/mapper.h
+++ /dev/null
@@ -1,20 +0,0 @@
-#ifndef CEPH_CRUSH_MAPPER_H
-#define CEPH_CRUSH_MAPPER_H
-
-/*
- * CRUSH functions for find rules and then mapping an input to an
- * output set.
- *
- * LGPL2
- */
-
-#include "crush.h"
-
-extern int crush_find_rule(struct crush_map *map, int pool, int type, int size);
-extern int crush_do_rule(struct crush_map *map,
-                        int ruleno,
-                        int x, int *result, int result_max,
-                        int forcefeed,    /* -1 for none */
-                        __u32 *weights);
-
-#endif
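Note on the removed permutation code: bucket_perm_choose() in the mapper.c hunk above builds a pseudo-random permutation of the bucket lazily, extending it only as far as the requested replica position. The sketch below shows that incremental shuffle in isolation. It rests on several assumptions: struct perm_state, MAX_ITEMS, perm_init() and toy_hash() are hypothetical names introduced here, toy_hash() is not the rjenkins1 hash, and the r=0 shortcut and the reset on a changed x from the original are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ITEMS 16	/* illustrative limit, not from the kernel source */

struct perm_state {
	unsigned size;			/* number of items in the bucket */
	unsigned perm_n;		/* leading entries of perm[] already fixed */
	unsigned perm[MAX_ITEMS];
};

/* Hypothetical mixing function; stands in for crush_hash32_3(). */
static uint32_t toy_hash(uint32_t x, uint32_t p)
{
	uint32_t h = (x * 2654435761u) ^ (p + 0x9e3779b9u);

	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	return h;
}

/* Start from the identity permutation with nothing fixed yet. */
static void perm_init(struct perm_state *s, unsigned size)
{
	unsigned i;

	assert(size > 0 && size <= MAX_ITEMS);
	s->size = size;
	s->perm_n = 0;
	for (i = 0; i < size; i++)
		s->perm[i] = i;
}

/*
 * Return the index at position r of the permutation for input x,
 * extending the permutation only as far as needed. The caller must
 * use the same x for the lifetime of one perm_state.
 */
static unsigned perm_choose(struct perm_state *s, uint32_t x, unsigned r)
{
	unsigned pr = r % s->size;

	while (s->perm_n <= pr) {
		unsigned p = s->perm_n;

		/* no point in swapping the final entry */
		if (p < s->size - 1) {
			unsigned i = toy_hash(x, p) % (s->size - p);

			if (i) {
				unsigned t = s->perm[p + i];

				s->perm[p + i] = s->perm[p];
				s->perm[p] = t;
			}
		}
		s->perm_n++;
	}
	return s->perm[pr];
}
```

Because each step only swaps entries at or after position p, positions that were already returned never change, so successive values of r yield distinct indices until the bucket is exhausted and then wrap around.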
diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
deleted file mode 100644
index a3e627f..0000000
--- a/fs/ceph/crypto.c
+++ /dev/null
@@ -1,412 +0,0 @@
-
-#include "ceph_debug.h"
-
-#include <linux/err.h>
-#include <linux/scatterlist.h>
-#include <linux/slab.h>
-#include <crypto/hash.h>
-
-#include "crypto.h"
-#include "decode.h"
-
-int ceph_crypto_key_encode(struct ceph_crypto_key *key, void **p, void *end)
-{
-       if (*p + sizeof(u16) + sizeof(key->created) +
-           sizeof(u16) + key->len > end)
-               return -ERANGE;
-       ceph_encode_16(p, key->type);
-       ceph_encode_copy(p, &key->created, sizeof(key->created));
-       ceph_encode_16(p, key->len);
-       ceph_encode_copy(p, key->key, key->len);
-       return 0;
-}
-
-int ceph_crypto_key_decode(struct ceph_crypto_key *key, void **p, void *end)
-{
-       ceph_decode_need(p, end, 2*sizeof(u16) + sizeof(key->created), bad);
-       key->type = ceph_decode_16(p);
-       ceph_decode_copy(p, &key->created, sizeof(key->created));
-       key->len = ceph_decode_16(p);
-       ceph_decode_need(p, end, key->len, bad);
-       key->key = kmalloc(key->len, GFP_NOFS);
-       if (!key->key)
-               return -ENOMEM;
-       ceph_decode_copy(p, key->key, key->len);
-       return 0;
-
-bad:
-       dout("failed to decode crypto key\n");
-       return -EINVAL;
-}
-
-int ceph_crypto_key_unarmor(struct ceph_crypto_key *key, const char *inkey)
-{
-       int inlen = strlen(inkey);
-       int blen = inlen * 3 / 4;
-       void *buf, *p;
-       int ret;
-
-       dout("crypto_key_unarmor %s\n", inkey);
-       buf = kmalloc(blen, GFP_NOFS);
-       if (!buf)
-               return -ENOMEM;
-       blen = ceph_unarmor(buf, inkey, inkey+inlen);
-       if (blen < 0) {
-               kfree(buf);
-               return blen;
-       }
-
-       p = buf;
-       ret = ceph_crypto_key_decode(key, &p, p + blen);
-       kfree(buf);
-       if (ret)
-               return ret;
-       dout("crypto_key_unarmor key %p type %d len %d\n", key,
-            key->type, key->len);
-       return 0;
-}
-
-
-
-#define AES_KEY_SIZE 16
-
-static struct crypto_blkcipher *ceph_crypto_alloc_cipher(void)
-{
-       return crypto_alloc_blkcipher("cbc(aes)", 0, CRYPTO_ALG_ASYNC);
-}
-
-static const u8 *aes_iv = (u8 *)CEPH_AES_IV;
-
-static int ceph_aes_encrypt(const void *key, int key_len,
-                           void *dst, size_t *dst_len,
-                           const void *src, size_t src_len)
-{
-       struct scatterlist sg_in[2], sg_out[1];
-       struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
-       struct blkcipher_desc desc = { .tfm = tfm, .flags = 0 };
-       int ret;
-       void *iv;
-       int ivsize;
-       size_t zero_padding = (0x10 - (src_len & 0x0f));
-       char pad[16];
-
-       if (IS_ERR(tfm))
-               return PTR_ERR(tfm);
-
-       memset(pad, zero_padding, zero_padding);
-
-       *dst_len = src_len + zero_padding;
-
-       crypto_blkcipher_setkey((void *)tfm, key, key_len);
-       sg_init_table(sg_in, 2);
-       sg_set_buf(&sg_in[0], src, src_len);
-       sg_set_buf(&sg_in[1], pad, zero_padding);
-       sg_init_table(sg_out, 1);
-       sg_set_buf(sg_out, dst, *dst_len);
-       iv = crypto_blkcipher_crt(tfm)->iv;
-       ivsize = crypto_blkcipher_ivsize(tfm);
-
-       memcpy(iv, aes_iv, ivsize);
-       /*
-       print_hex_dump(KERN_ERR, "enc key: ", DUMP_PREFIX_NONE, 16, 1,
-                      key, key_len, 1);
-       print_hex_dump(KERN_ERR, "enc src: ", DUMP_PREFIX_NONE, 16, 1,
-                       src, src_len, 1);
-       print_hex_dump(KERN_ERR, "enc pad: ", DUMP_PREFIX_NONE, 16, 1,
-                       pad, zero_padding, 1);
-       */
-       ret = crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
-                                    src_len + zero_padding);
-       crypto_free_blkcipher(tfm);
-       if (ret < 0)
-               pr_err("ceph_aes_crypt failed %d\n", ret);
-       /*
-       print_hex_dump(KERN_ERR, "enc out: ", DUMP_PREFIX_NONE, 16, 1,
-                      dst, *dst_len, 1);
-       */
-       return 0;
-}
-
-static int ceph_aes_encrypt2(const void *key, int key_len, void *dst,
-                            size_t *dst_len,
-                            const void *src1, size_t src1_len,
-                            const void *src2, size_t src2_len)
-{
-       struct scatterlist sg_in[3], sg_out[1];
-       struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
-       struct blkcipher_desc desc = { .tfm = tfm, .flags = 0 };
-       int ret;
-       void *iv;
-       int ivsize;
-       size_t zero_padding = (0x10 - ((src1_len + src2_len) & 0x0f));
-       char pad[16];
-
-       if (IS_ERR(tfm))
-               return PTR_ERR(tfm);
-
-       memset(pad, zero_padding, zero_padding);
-
-       *dst_len = src1_len + src2_len + zero_padding;
-
-       crypto_blkcipher_setkey((void *)tfm, key, key_len);
-       sg_init_table(sg_in, 3);
-       sg_set_buf(&sg_in[0], src1, src1_len);
-       sg_set_buf(&sg_in[1], src2, src2_len);
-       sg_set_buf(&sg_in[2], pad, zero_padding);
-       sg_init_table(sg_out, 1);
-       sg_set_buf(sg_out, dst, *dst_len);
-       iv = crypto_blkcipher_crt(tfm)->iv;
-       ivsize = crypto_blkcipher_ivsize(tfm);
-
-       memcpy(iv, aes_iv, ivsize);
-       /*
-       print_hex_dump(KERN_ERR, "enc  key: ", DUMP_PREFIX_NONE, 16, 1,
-                      key, key_len, 1);
-       print_hex_dump(KERN_ERR, "enc src1: ", DUMP_PREFIX_NONE, 16, 1,
-                       src1, src1_len, 1);
-       print_hex_dump(KERN_ERR, "enc src2: ", DUMP_PREFIX_NONE, 16, 1,
-                       src2, src2_len, 1);
-       print_hex_dump(KERN_ERR, "enc  pad: ", DUMP_PREFIX_NONE, 16, 1,
-                       pad, zero_padding, 1);
-       */
-       ret = crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
-                                    src1_len + src2_len + zero_padding);
-       crypto_free_blkcipher(tfm);
-       if (ret < 0)
-               pr_err("ceph_aes_crypt2 failed %d\n", ret);
-       /*
-       print_hex_dump(KERN_ERR, "enc  out: ", DUMP_PREFIX_NONE, 16, 1,
-                      dst, *dst_len, 1);
-       */
-       return 0;
-}
-
-static int ceph_aes_decrypt(const void *key, int key_len,
-                           void *dst, size_t *dst_len,
-                           const void *src, size_t src_len)
-{
-       struct scatterlist sg_in[1], sg_out[2];
-       struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
-       struct blkcipher_desc desc = { .tfm = tfm };
-       char pad[16];
-       void *iv;
-       int ivsize;
-       int ret;
-       int last_byte;
-
-       if (IS_ERR(tfm))
-               return PTR_ERR(tfm);
-
-       crypto_blkcipher_setkey((void *)tfm, key, key_len);
-       sg_init_table(sg_in, 1);
-       sg_init_table(sg_out, 2);
-       sg_set_buf(sg_in, src, src_len);
-       sg_set_buf(&sg_out[0], dst, *dst_len);
-       sg_set_buf(&sg_out[1], pad, sizeof(pad));
-
-       iv = crypto_blkcipher_crt(tfm)->iv;
-       ivsize = crypto_blkcipher_ivsize(tfm);
-
-       memcpy(iv, aes_iv, ivsize);
-
-       /*
-       print_hex_dump(KERN_ERR, "dec key: ", DUMP_PREFIX_NONE, 16, 1,
-                      key, key_len, 1);
-       print_hex_dump(KERN_ERR, "dec  in: ", DUMP_PREFIX_NONE, 16, 1,
-                      src, src_len, 1);
-       */
-
-       ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in, src_len);
-       crypto_free_blkcipher(tfm);
-       if (ret < 0) {
-               pr_err("ceph_aes_decrypt failed %d\n", ret);
-               return ret;
-       }
-
-       if (src_len <= *dst_len)
-               last_byte = ((char *)dst)[src_len - 1];
-       else
-               last_byte = pad[src_len - *dst_len - 1];
-       if (last_byte <= 16 && src_len >= last_byte) {
-               *dst_len = src_len - last_byte;
-       } else {
-               pr_err("ceph_aes_decrypt got bad padding %d on src len %d\n",
-                      last_byte, (int)src_len);
-               return -EPERM;  /* bad padding */
-       }
-       /*
-       print_hex_dump(KERN_ERR, "dec out: ", DUMP_PREFIX_NONE, 16, 1,
-                      dst, *dst_len, 1);
-       */
-       return 0;
-}
-
-static int ceph_aes_decrypt2(const void *key, int key_len,
-                            void *dst1, size_t *dst1_len,
-                            void *dst2, size_t *dst2_len,
-                            const void *src, size_t src_len)
-{
-       struct scatterlist sg_in[1], sg_out[3];
-       struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
-       struct blkcipher_desc desc = { .tfm = tfm };
-       char pad[16];
-       void *iv;
-       int ivsize;
-       int ret;
-       int last_byte;
-
-       if (IS_ERR(tfm))
-               return PTR_ERR(tfm);
-
-       sg_init_table(sg_in, 1);
-       sg_set_buf(sg_in, src, src_len);
-       sg_init_table(sg_out, 3);
-       sg_set_buf(&sg_out[0], dst1, *dst1_len);
-       sg_set_buf(&sg_out[1], dst2, *dst2_len);
-       sg_set_buf(&sg_out[2], pad, sizeof(pad));
-
-       crypto_blkcipher_setkey((void *)tfm, key, key_len);
-       iv = crypto_blkcipher_crt(tfm)->iv;
-       ivsize = crypto_blkcipher_ivsize(tfm);
-
-       memcpy(iv, aes_iv, ivsize);
-
-       /*
-       print_hex_dump(KERN_ERR, "dec  key: ", DUMP_PREFIX_NONE, 16, 1,
-                      key, key_len, 1);
-       print_hex_dump(KERN_ERR, "dec   in: ", DUMP_PREFIX_NONE, 16, 1,
-                      src, src_len, 1);
-       */
-
-       ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in, src_len);
-       crypto_free_blkcipher(tfm);
-       if (ret < 0) {
-               pr_err("ceph_aes_decrypt failed %d\n", ret);
-               return ret;
-       }
-
-       if (src_len <= *dst1_len)
-               last_byte = ((char *)dst1)[src_len - 1];
-       else if (src_len <= *dst1_len + *dst2_len)
-               last_byte = ((char *)dst2)[src_len - *dst1_len - 1];
-       else
-               last_byte = pad[src_len - *dst1_len - *dst2_len - 1];
-       if (last_byte <= 16 && src_len >= last_byte) {
-               src_len -= last_byte;
-       } else {
-               pr_err("ceph_aes_decrypt got bad padding %d on src len %d\n",
-                      last_byte, (int)src_len);
-               return -EPERM;  /* bad padding */
-       }
-
-       if (src_len < *dst1_len) {
-               *dst1_len = src_len;
-               *dst2_len = 0;
-       } else {
-               *dst2_len = src_len - *dst1_len;
-       }
-       /*
-       print_hex_dump(KERN_ERR, "dec  out1: ", DUMP_PREFIX_NONE, 16, 1,
-                      dst1, *dst1_len, 1);
-       print_hex_dump(KERN_ERR, "dec  out2: ", DUMP_PREFIX_NONE, 16, 1,
-                      dst2, *dst2_len, 1);
-       */
-
-       return 0;
-}
-
-
-int ceph_decrypt(struct ceph_crypto_key *secret, void *dst, size_t *dst_len,
-                const void *src, size_t src_len)
-{
-       switch (secret->type) {
-       case CEPH_CRYPTO_NONE:
-               if (*dst_len < src_len)
-                       return -ERANGE;
-               memcpy(dst, src, src_len);
-               *dst_len = src_len;
-               return 0;
-
-       case CEPH_CRYPTO_AES:
-               return ceph_aes_decrypt(secret->key, secret->len, dst,
-                                       dst_len, src, src_len);
-
-       default:
-               return -EINVAL;
-       }
-}
-
-int ceph_decrypt2(struct ceph_crypto_key *secret,
-                       void *dst1, size_t *dst1_len,
-                       void *dst2, size_t *dst2_len,
-                       const void *src, size_t src_len)
-{
-       size_t t;
-
-       switch (secret->type) {
-       case CEPH_CRYPTO_NONE:
-               if (*dst1_len + *dst2_len < src_len)
-                       return -ERANGE;
-               t = min(*dst1_len, src_len);
-               memcpy(dst1, src, t);
-               *dst1_len = t;
-               src += t;
-               src_len -= t;
-               if (src_len) {
-                       t = min(*dst2_len, src_len);
-                       memcpy(dst2, src, t);
-                       *dst2_len = t;
-               }
-               return 0;
-
-       case CEPH_CRYPTO_AES:
-               return ceph_aes_decrypt2(secret->key, secret->len,
-                                        dst1, dst1_len, dst2, dst2_len,
-                                        src, src_len);
-
-       default:
-               return -EINVAL;
-       }
-}
-
-int ceph_encrypt(struct ceph_crypto_key *secret, void *dst, size_t *dst_len,
-                const void *src, size_t src_len)
-{
-       switch (secret->type) {
-       case CEPH_CRYPTO_NONE:
-               if (*dst_len < src_len)
-                       return -ERANGE;
-               memcpy(dst, src, src_len);
-               *dst_len = src_len;
-               return 0;
-
-       case CEPH_CRYPTO_AES:
-               return ceph_aes_encrypt(secret->key, secret->len, dst,
-                                       dst_len, src, src_len);
-
-       default:
-               return -EINVAL;
-       }
-}
-
-int ceph_encrypt2(struct ceph_crypto_key *secret, void *dst, size_t *dst_len,
-                 const void *src1, size_t src1_len,
-                 const void *src2, size_t src2_len)
-{
-       switch (secret->type) {
-       case CEPH_CRYPTO_NONE:
-               if (*dst_len < src1_len + src2_len)
-                       return -ERANGE;
-               memcpy(dst, src1, src1_len);
-               memcpy(dst + src1_len, src2, src2_len);
-               *dst_len = src1_len + src2_len;
-               return 0;
-
-       case CEPH_CRYPTO_AES:
-               return ceph_aes_encrypt2(secret->key, secret->len, dst, dst_len,
-                                        src1, src1_len, src2, src2_len);
-
-       default:
-               return -EINVAL;
-       }
-}
diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h
deleted file mode 100644
index bdf3860..0000000
--- a/fs/ceph/crypto.h
+++ /dev/null
@@ -1,48 +0,0 @@
-#ifndef _FS_CEPH_CRYPTO_H
-#define _FS_CEPH_CRYPTO_H
-
-#include "types.h"
-#include "buffer.h"
-
-/*
- * cryptographic secret
- */
-struct ceph_crypto_key {
-       int type;
-       struct ceph_timespec created;
-       int len;
-       void *key;
-};
-
-static inline void ceph_crypto_key_destroy(struct ceph_crypto_key *key)
-{
-       kfree(key->key);
-}
-
-extern int ceph_crypto_key_encode(struct ceph_crypto_key *key,
-                                 void **p, void *end);
-extern int ceph_crypto_key_decode(struct ceph_crypto_key *key,
-                                 void **p, void *end);
-extern int ceph_crypto_key_unarmor(struct ceph_crypto_key *key, const char *in);
-
-/* crypto.c */
-extern int ceph_decrypt(struct ceph_crypto_key *secret,
-                       void *dst, size_t *dst_len,
-                       const void *src, size_t src_len);
-extern int ceph_encrypt(struct ceph_crypto_key *secret,
-                       void *dst, size_t *dst_len,
-                       const void *src, size_t src_len);
-extern int ceph_decrypt2(struct ceph_crypto_key *secret,
-                       void *dst1, size_t *dst1_len,
-                       void *dst2, size_t *dst2_len,
-                       const void *src, size_t src_len);
-extern int ceph_encrypt2(struct ceph_crypto_key *secret,
-                        void *dst, size_t *dst_len,
-                        const void *src1, size_t src1_len,
-                        const void *src2, size_t src2_len);
-
-/* armor.c */
-extern int ceph_armor(char *dst, const char *src, const char *end);
-extern int ceph_unarmor(char *dst, const char *src, const char *end);
-
-#endif
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 6fd8b20a86112c367c788a20c2f134108acc40e8..7ae1b3d55b58a7b70bf55e79a0788f211b90ff3c 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -1,4 +1,4 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/device.h>
  #include <linux/slab.h>
@@ -7,143 +7,49 @@
  #include <linux/debugfs.h>
  #include <linux/seq_file.h>
  
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/debugfs.h>
+
  #include "super.h"
-#include "mds_client.h"
-#include "mon_client.h"
-#include "auth.h"
  
  #ifdef CONFIG_DEBUG_FS
  
-/*
- * Implement /sys/kernel/debug/ceph fun
- *
- * /sys/kernel/debug/ceph/client*  - an instance of the ceph client
- *      .../osdmap      - current osdmap
- *      .../mdsmap      - current mdsmap
- *      .../monmap      - current monmap
- *      .../osdc        - active osd requests
- *      .../mdsc        - active mds requests
- *      .../monc        - mon client state
- *      .../dentry_lru  - dump contents of dentry lru
- *      .../caps        - expose cap (reservation) stats
- *      .../bdi         - symlink to ../../bdi/something
- */
-
-static struct dentry *ceph_debugfs_dir;
-
-static int monmap_show(struct seq_file *s, void *p)
-{
-       int i;
-       struct ceph_client *client = s->private;
-
-       if (client->monc.monmap == NULL)
-               return 0;
-
-       seq_printf(s, "epoch %d\n", client->monc.monmap->epoch);
-       for (i = 0; i < client->monc.monmap->num_mon; i++) {
-               struct ceph_entity_inst *inst =
-                       &client->monc.monmap->mon_inst[i];
-
-               seq_printf(s, "\t%s%lld\t%s\n",
-                          ENTITY_NAME(inst->name),
-                          pr_addr(&inst->addr.in_addr));
-       }
-       return 0;
-}
+#include "mds_client.h"
  
  static int mdsmap_show(struct seq_file *s, void *p)
  {
         int i;
-       struct ceph_client *client = s->private;
+       struct ceph_fs_client *fsc = s->private;
  
-       if (client->mdsc.mdsmap == NULL)
+       if (fsc->mdsc == NULL || fsc->mdsc->mdsmap == NULL)
                 return 0;
-       seq_printf(s, "epoch %d\n", client->mdsc.mdsmap->m_epoch);
-       seq_printf(s, "root %d\n", client->mdsc.mdsmap->m_root);
+       seq_printf(s, "epoch %d\n", fsc->mdsc->mdsmap->m_epoch);
+       seq_printf(s, "root %d\n", fsc->mdsc->mdsmap->m_root);
         seq_printf(s, "session_timeout %d\n",
-                      client->mdsc.mdsmap->m_session_timeout);
+                      fsc->mdsc->mdsmap->m_session_timeout);
         seq_printf(s, "session_autoclose %d\n",
-                      client->mdsc.mdsmap->m_session_autoclose);
-       for (i = 0; i < client->mdsc.mdsmap->m_max_mds; i++) {
+                      fsc->mdsc->mdsmap->m_session_autoclose);
+       for (i = 0; i < fsc->mdsc->mdsmap->m_max_mds; i++) {
                 struct ceph_entity_addr *addr =
-                       &client->mdsc.mdsmap->m_info[i].addr;
-               int state = client->mdsc.mdsmap->m_info[i].state;
+                       &fsc->mdsc->mdsmap->m_info[i].addr;
+               int state = fsc->mdsc->mdsmap->m_info[i].state;
  
-               seq_printf(s, "\tmds%d\t%s\t(%s)\n", i, pr_addr(&addr->in_addr),
+               seq_printf(s, "\tmds%d\t%s\t(%s)\n", i,
+                              ceph_pr_addr(&addr->in_addr),
                                ceph_mds_state_name(state));
         }
         return 0;
  }
  
-static int osdmap_show(struct seq_file *s, void *p)
-{
-       int i;
-       struct ceph_client *client = s->private;
-       struct rb_node *n;
-
-       if (client->osdc.osdmap == NULL)
-               return 0;
-       seq_printf(s, "epoch %d\n", client->osdc.osdmap->epoch);
-       seq_printf(s, "flags%s%s\n",
-                  (client->osdc.osdmap->flags & CEPH_OSDMAP_NEARFULL) ?
-                  " NEARFULL" : "",
-                  (client->osdc.osdmap->flags & CEPH_OSDMAP_FULL) ?
-                  " FULL" : "");
-       for (n = rb_first(&client->osdc.osdmap->pg_pools); n; n = rb_next(n)) {
-               struct ceph_pg_pool_info *pool =
-                       rb_entry(n, struct ceph_pg_pool_info, node);
-               seq_printf(s, "pg_pool %d pg_num %d / %d, lpg_num %d / %d\n",
-                          pool->id, pool->v.pg_num, pool->pg_num_mask,
-                          pool->v.lpg_num, pool->lpg_num_mask);
-       }
-       for (i = 0; i < client->osdc.osdmap->max_osd; i++) {
-               struct ceph_entity_addr *addr =
-                       &client->osdc.osdmap->osd_addr[i];
-               int state = client->osdc.osdmap->osd_state[i];
-               char sb[64];
-
-               seq_printf(s, "\tosd%d\t%s\t%3d%%\t(%s)\n",
-                          i, pr_addr(&addr->in_addr),
-                          ((client->osdc.osdmap->osd_weight[i]*100) >> 16),
-                          ceph_osdmap_state_str(sb, sizeof(sb), state));
-       }
-       return 0;
-}
-
-static int monc_show(struct seq_file *s, void *p)
-{
-       struct ceph_client *client = s->private;
-       struct ceph_mon_generic_request *req;
-       struct ceph_mon_client *monc = &client->monc;
-       struct rb_node *rp;
-
-       mutex_lock(&monc->mutex);
-
-       if (monc->have_mdsmap)
-               seq_printf(s, "have mdsmap %u\n", (unsigned)monc->have_mdsmap);
-       if (monc->have_osdmap)
-               seq_printf(s, "have osdmap %u\n", (unsigned)monc->have_osdmap);
-       if (monc->want_next_osdmap)
-               seq_printf(s, "want next osdmap\n");
-
-       for (rp = rb_first(&monc->generic_request_tree); rp; rp = rb_next(rp)) {
-               __u16 op;
-               req = rb_entry(rp, struct ceph_mon_generic_request, node);
-               op = le16_to_cpu(req->request->hdr.type);
-               if (op == CEPH_MSG_STATFS)
-                       seq_printf(s, "%lld statfs\n", req->tid);
-               else
-                       seq_printf(s, "%lld unknown\n", req->tid);
-       }
-
-       mutex_unlock(&monc->mutex);
-       return 0;
-}
-
+/*
+ * mdsc debugfs
+ */
  static int mdsc_show(struct seq_file *s, void *p)
  {
-       struct ceph_client *client = s->private;
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = s->private;
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         struct rb_node *rp;
         int pathlen;
@@ -214,61 +120,12 @@ static int mdsc_show(struct seq_file *s, void *p)
         return 0;
  }
  
-static int osdc_show(struct seq_file *s, void *pp)
-{
-       struct ceph_client *client = s->private;
-       struct ceph_osd_client *osdc = &client->osdc;
-       struct rb_node *p;
-
-       mutex_lock(&osdc->request_mutex);
-       for (p = rb_first(&osdc->requests); p; p = rb_next(p)) {
-               struct ceph_osd_request *req;
-               struct ceph_osd_request_head *head;
-               struct ceph_osd_op *op;
-               int num_ops;
-               int opcode, olen;
-               int i;
-
-               req = rb_entry(p, struct ceph_osd_request, r_node);
-
-               seq_printf(s, "%lld\tosd%d\t%d.%x\t", req->r_tid,
-                          req->r_osd ? req->r_osd->o_osd : -1,
-                          le32_to_cpu(req->r_pgid.pool),
-                          le16_to_cpu(req->r_pgid.ps));
-
-               head = req->r_request->front.iov_base;
-               op = (void *)(head + 1);
-
-               num_ops = le16_to_cpu(head->num_ops);
-               olen = le32_to_cpu(head->object_len);
-               seq_printf(s, "%.*s", olen,
-                          (const char *)(head->ops + num_ops));
-
-               if (req->r_reassert_version.epoch)
-                       seq_printf(s, "\t%u'%llu",
-                          (unsigned)le32_to_cpu(req->r_reassert_version.epoch),
-                          le64_to_cpu(req->r_reassert_version.version));
-               else
-                       seq_printf(s, "\t");
-
-               for (i = 0; i < num_ops; i++) {
-                       opcode = le16_to_cpu(op->op);
-                       seq_printf(s, "\t%s", ceph_osd_op_name(opcode));
-                       op++;
-               }
-
-               seq_printf(s, "\n");
-       }
-       mutex_unlock(&osdc->request_mutex);
-       return 0;
-}
-
  static int caps_show(struct seq_file *s, void *p)
  {
-       struct ceph_client *client = s->private;
+       struct ceph_fs_client *fsc = s->private;
         int total, avail, used, reserved, min;
  
-       ceph_reservation_status(client, &total, &avail, &used, &reserved, &min);
+       ceph_reservation_status(fsc, &total, &avail, &used, &reserved, &min);
         seq_printf(s, "total\t\t%d\n"
                    "avail\t\t%d\n"
                    "used\t\t%d\n"
@@ -280,8 +137,8 @@ static int caps_show(struct seq_file *s, void *p)
  
  static int dentry_lru_show(struct seq_file *s, void *ptr)
  {
-       struct ceph_client *client = s->private;
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = s->private;
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_dentry_info *di;
  
         spin_lock(&mdsc->dentry_lru_lock);
@@ -295,199 +152,124 @@ static int dentry_lru_show(struct seq_file *s, void *ptr)
         return 0;
  }
  
-#define DEFINE_SHOW_FUNC(name)                                         \
-static int name##_open(struct inode *inode, struct file *file)         \
-{                                                                      \
-       struct seq_file *sf;                                            \
-       int ret;                                                        \
-                                                                       \
-       ret = single_open(file, name, NULL);                            \
-       sf = file->private_data;                                        \
-       sf->private = inode->i_private;                                 \
-       return ret;                                                     \
-}                                                                      \
-                                                                       \
-static const struct file_operations name##_fops = {                    \
-       .open           = name##_open,                                  \
-       .read           = seq_read,                                     \
-       .llseek         = seq_lseek,                                    \
-       .release        = single_release,                               \
-};
-
-DEFINE_SHOW_FUNC(monmap_show)
-DEFINE_SHOW_FUNC(mdsmap_show)
-DEFINE_SHOW_FUNC(osdmap_show)
-DEFINE_SHOW_FUNC(monc_show)
-DEFINE_SHOW_FUNC(mdsc_show)
-DEFINE_SHOW_FUNC(osdc_show)
-DEFINE_SHOW_FUNC(dentry_lru_show)
-DEFINE_SHOW_FUNC(caps_show)
+CEPH_DEFINE_SHOW_FUNC(mdsmap_show)
+CEPH_DEFINE_SHOW_FUNC(mdsc_show)
+CEPH_DEFINE_SHOW_FUNC(caps_show)
+CEPH_DEFINE_SHOW_FUNC(dentry_lru_show)
+
  
+/*
+ * debugfs
+ */
  static int congestion_kb_set(void *data, u64 val)
  {
-       struct ceph_client *client = (struct ceph_client *)data;
-
-       if (client)
-               client->mount_args->congestion_kb = (int)val;
+       struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
  
+       fsc->mount_options->congestion_kb = (int)val;
         return 0;
  }
  
  static int congestion_kb_get(void *data, u64 *val)
  {
-       struct ceph_client *client = (struct ceph_client *)data;
-
-       if (client)
-               *val = (u64)client->mount_args->congestion_kb;
+       struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
  
+       *val = (u64)fsc->mount_options->congestion_kb;
         return 0;
  }
  
-
  DEFINE_SIMPLE_ATTRIBUTE(congestion_kb_fops, congestion_kb_get,
                         congestion_kb_set, "%llu\n");
  
-int __init ceph_debugfs_init(void)
-{
-       ceph_debugfs_dir = debugfs_create_dir("ceph", NULL);
-       if (!ceph_debugfs_dir)
-               return -ENOMEM;
-       return 0;
-}
  
-void ceph_debugfs_cleanup(void)
+void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
  {
-       debugfs_remove(ceph_debugfs_dir);
+       dout("ceph_fs_debugfs_cleanup\n");
+       debugfs_remove(fsc->debugfs_bdi);
+       debugfs_remove(fsc->debugfs_congestion_kb);
+       debugfs_remove(fsc->debugfs_mdsmap);
+       debugfs_remove(fsc->debugfs_caps);
+       debugfs_remove(fsc->debugfs_mdsc);
+       debugfs_remove(fsc->debugfs_dentry_lru);
  }
  
-int ceph_debugfs_client_init(struct ceph_client *client)
+int ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
  {
-       int ret = 0;
-       char name[80];
-
-       snprintf(name, sizeof(name), "%pU.client%lld", &client->fsid,
-                client->monc.auth->global_id);
+       char name[100];
+       int err = -ENOMEM;
  
-       client->debugfs_dir = debugfs_create_dir(name, ceph_debugfs_dir);
-       if (!client->debugfs_dir)
-               goto out;
-
-       client->monc.debugfs_file = debugfs_create_file("monc",
-                                                     0600,
-                                                     client->debugfs_dir,
-                                                     client,
-                                                     &monc_show_fops);
-       if (!client->monc.debugfs_file)
+       dout("ceph_fs_debugfs_init\n");
+       fsc->debugfs_congestion_kb =
+               debugfs_create_file("writeback_congestion_kb",
+                                   0600,
+                                   fsc->client->debugfs_dir,
+                                   fsc,
+                                   &congestion_kb_fops);
+       if (!fsc->debugfs_congestion_kb)
                 goto out;
  
-       client->mdsc.debugfs_file = debugfs_create_file("mdsc",
-                                                     0600,
-                                                     client->debugfs_dir,
-                                                     client,
-                                                     &mdsc_show_fops);
-       if (!client->mdsc.debugfs_file)
-               goto out;
+       dout("a\n");
  
-       client->osdc.debugfs_file = debugfs_create_file("osdc",
-                                                     0600,
-                                                     client->debugfs_dir,
-                                                     client,
-                                                     &osdc_show_fops);
-       if (!client->osdc.debugfs_file)
+       snprintf(name, sizeof(name), "../../bdi/%s",
+                dev_name(fsc->backing_dev_info.dev));
+       fsc->debugfs_bdi =
+               debugfs_create_symlink("bdi",
+                                      fsc->client->debugfs_dir,
+                                      name);
+       if (!fsc->debugfs_bdi)
                 goto out;
  
-       client->debugfs_monmap = debugfs_create_file("monmap",
+       dout("b\n");
+       fsc->debugfs_mdsmap = debugfs_create_file("mdsmap",
                                         0600,
-                                       client->debugfs_dir,
-                                       client,
-                                       &monmap_show_fops);
-       if (!client->debugfs_monmap)
-               goto out;
-
-       client->debugfs_mdsmap = debugfs_create_file("mdsmap",
-                                       0600,
-                                       client->debugfs_dir,
-                                       client,
+                                       fsc->client->debugfs_dir,
+                                       fsc,
                                         &mdsmap_show_fops);
-       if (!client->debugfs_mdsmap)
-               goto out;
-
-       client->debugfs_osdmap = debugfs_create_file("osdmap",
-                                       0600,
-                                       client->debugfs_dir,
-                                       client,
-                                       &osdmap_show_fops);
-       if (!client->debugfs_osdmap)
+       if (!fsc->debugfs_mdsmap)
                 goto out;
  
-       client->debugfs_dentry_lru = debugfs_create_file("dentry_lru",
-                                       0600,
-                                       client->debugfs_dir,
-                                       client,
-                                       &dentry_lru_show_fops);
-       if (!client->debugfs_dentry_lru)
+       dout("ca\n");
+       fsc->debugfs_mdsc = debugfs_create_file("mdsc",
+                                               0600,
+                                               fsc->client->debugfs_dir,
+                                               fsc,
+                                               &mdsc_show_fops);
+       if (!fsc->debugfs_mdsc)
                 goto out;
  
-       client->debugfs_caps = debugfs_create_file("caps",
+       dout("da\n");
+       fsc->debugfs_caps = debugfs_create_file("caps",
                                                    0400,
-                                                  client->debugfs_dir,
-                                                  client,
+                                                  fsc->client->debugfs_dir,
+                                                  fsc,
                                                    &caps_show_fops);
-       if (!client->debugfs_caps)
+       if (!fsc->debugfs_caps)
                 goto out;
  
-       client->debugfs_congestion_kb =
-               debugfs_create_file("writeback_congestion_kb",
-                                   0600,
-                                   client->debugfs_dir,
-                                   client,
-                                   &congestion_kb_fops);
-       if (!client->debugfs_congestion_kb)
+       dout("ea\n");
+       fsc->debugfs_dentry_lru = debugfs_create_file("dentry_lru",
+                                       0600,
+                                       fsc->client->debugfs_dir,
+                                       fsc,
+                                       &dentry_lru_show_fops);
+       if (!fsc->debugfs_dentry_lru)
                 goto out;
  
-       sprintf(name, "../../bdi/%s", dev_name(client->sb->s_bdi->dev));
-       client->debugfs_bdi = debugfs_create_symlink("bdi", client->debugfs_dir,
-                                                    name);
-
         return 0;
  
  out:
-       ceph_debugfs_client_cleanup(client);
-       return ret;
+       ceph_fs_debugfs_cleanup(fsc);
+       return err;
  }
  
-void ceph_debugfs_client_cleanup(struct ceph_client *client)
-{
-       debugfs_remove(client->debugfs_bdi);
-       debugfs_remove(client->debugfs_caps);
-       debugfs_remove(client->debugfs_dentry_lru);
-       debugfs_remove(client->debugfs_osdmap);
-       debugfs_remove(client->debugfs_mdsmap);
-       debugfs_remove(client->debugfs_monmap);
-       debugfs_remove(client->osdc.debugfs_file);
-       debugfs_remove(client->mdsc.debugfs_file);
-       debugfs_remove(client->monc.debugfs_file);
-       debugfs_remove(client->debugfs_congestion_kb);
-       debugfs_remove(client->debugfs_dir);
-}
  
  #else  /* CONFIG_DEBUG_FS */
  
-int __init ceph_debugfs_init(void)
-{
-       return 0;
-}
-
-void ceph_debugfs_cleanup(void)
-{
-}
-
-int ceph_debugfs_client_init(struct ceph_client *client)
+int ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
  {
         return 0;
  }
  
-void ceph_debugfs_client_cleanup(struct ceph_client *client)
+void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
  {
  }
  
diff --git a/fs/ceph/decode.h b/fs/ceph/decode.h
deleted file mode 100644
index 3d25415..0000000
--- a/fs/ceph/decode.h
+++ /dev/null
@@ -1,196 +0,0 @@
-#ifndef __CEPH_DECODE_H
-#define __CEPH_DECODE_H
-
-#include <asm/unaligned.h>
-#include <linux/time.h>
-
-#include "types.h"
-
-/*
- * in all cases,
- *   void **p     pointer to position pointer
- *   void *end    pointer to end of buffer (last byte + 1)
- */
-
-static inline u64 ceph_decode_64(void **p)
-{
-       u64 v = get_unaligned_le64(*p);
-       *p += sizeof(u64);
-       return v;
-}
-static inline u32 ceph_decode_32(void **p)
-{
-       u32 v = get_unaligned_le32(*p);
-       *p += sizeof(u32);
-       return v;
-}
-static inline u16 ceph_decode_16(void **p)
-{
-       u16 v = get_unaligned_le16(*p);
-       *p += sizeof(u16);
-       return v;
-}
-static inline u8 ceph_decode_8(void **p)
-{
-       u8 v = *(u8 *)*p;
-       (*p)++;
-       return v;
-}
-static inline void ceph_decode_copy(void **p, void *pv, size_t n)
-{
-       memcpy(pv, *p, n);
-       *p += n;
-}
-
-/*
- * bounds check input.
- */
-#define ceph_decode_need(p, end, n, bad)               \
-       do {                                            \
-               if (unlikely(*(p) + (n) > (end)))       \
-                       goto bad;                       \
-       } while (0)
-
-#define ceph_decode_64_safe(p, end, v, bad)                    \
-       do {                                                    \
-               ceph_decode_need(p, end, sizeof(u64), bad);     \
-               v = ceph_decode_64(p);                          \
-       } while (0)
-#define ceph_decode_32_safe(p, end, v, bad)                    \
-       do {                                                    \
-               ceph_decode_need(p, end, sizeof(u32), bad);     \
-               v = ceph_decode_32(p);                          \
-       } while (0)
-#define ceph_decode_16_safe(p, end, v, bad)                    \
-       do {                                                    \
-               ceph_decode_need(p, end, sizeof(u16), bad);     \
-               v = ceph_decode_16(p);                          \
-       } while (0)
-#define ceph_decode_8_safe(p, end, v, bad)                     \
-       do {                                                    \
-               ceph_decode_need(p, end, sizeof(u8), bad);      \
-               v = ceph_decode_8(p);                           \
-       } while (0)
-
-#define ceph_decode_copy_safe(p, end, pv, n, bad)              \
-       do {                                                    \
-               ceph_decode_need(p, end, n, bad);               \
-               ceph_decode_copy(p, pv, n);                     \
-       } while (0)
-
-/*
- * struct ceph_timespec <-> struct timespec
- */
-static inline void ceph_decode_timespec(struct timespec *ts,
-                                       const struct ceph_timespec *tv)
-{
-       ts->tv_sec = le32_to_cpu(tv->tv_sec);
-       ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
-}
-static inline void ceph_encode_timespec(struct ceph_timespec *tv,
-                                       const struct timespec *ts)
-{
-       tv->tv_sec = cpu_to_le32(ts->tv_sec);
-       tv->tv_nsec = cpu_to_le32(ts->tv_nsec);
-}
-
-/*
- * sockaddr_storage <-> ceph_sockaddr
- */
-static inline void ceph_encode_addr(struct ceph_entity_addr *a)
-{
-       __be16 ss_family = htons(a->in_addr.ss_family);
-       a->in_addr.ss_family = *(__u16 *)&ss_family;
-}
-static inline void ceph_decode_addr(struct ceph_entity_addr *a)
-{
-       __be16 ss_family = *(__be16 *)&a->in_addr.ss_family;
-       a->in_addr.ss_family = ntohs(ss_family);
-       WARN_ON(a->in_addr.ss_family == 512);
-}
-
-/*
- * encoders
- */
-static inline void ceph_encode_64(void **p, u64 v)
-{
-       put_unaligned_le64(v, (__le64 *)*p);
-       *p += sizeof(u64);
-}
-static inline void ceph_encode_32(void **p, u32 v)
-{
-       put_unaligned_le32(v, (__le32 *)*p);
-       *p += sizeof(u32);
-}
-static inline void ceph_encode_16(void **p, u16 v)
-{
-       put_unaligned_le16(v, (__le16 *)*p);
-       *p += sizeof(u16);
-}
-static inline void ceph_encode_8(void **p, u8 v)
-{
-       *(u8 *)*p = v;
-       (*p)++;
-}
-static inline void ceph_encode_copy(void **p, const void *s, int len)
-{
-       memcpy(*p, s, len);
-       *p += len;
-}
-
-/*
- * filepath, string encoders
- */
-static inline void ceph_encode_filepath(void **p, void *end,
-                                       u64 ino, const char *path)
-{
-       u32 len = path ? strlen(path) : 0;
-       BUG_ON(*p + sizeof(ino) + sizeof(len) + len > end);
-       ceph_encode_8(p, 1);
-       ceph_encode_64(p, ino);
-       ceph_encode_32(p, len);
-       if (len)
-               memcpy(*p, path, len);
-       *p += len;
-}
-
-static inline void ceph_encode_string(void **p, void *end,
-                                     const char *s, u32 len)
-{
-       BUG_ON(*p + sizeof(len) + len > end);
-       ceph_encode_32(p, len);
-       if (len)
-               memcpy(*p, s, len);
-       *p += len;
-}
-
-#define ceph_encode_need(p, end, n, bad)               \
-       do {                                            \
-               if (unlikely(*(p) + (n) > (end)))       \
-                       goto bad;                       \
-       } while (0)
-
-#define ceph_encode_64_safe(p, end, v, bad)                    \
-       do {                                                    \
-               ceph_encode_need(p, end, sizeof(u64), bad);     \
-               ceph_encode_64(p, v);                           \
-       } while (0)
-#define ceph_encode_32_safe(p, end, v, bad)                    \
-       do {                                                    \
-               ceph_encode_need(p, end, sizeof(u32), bad);     \
-               ceph_encode_32(p, v);                   \
-       } while (0)
-#define ceph_encode_16_safe(p, end, v, bad)                    \
-       do {                                                    \
-               ceph_encode_need(p, end, sizeof(u16), bad);     \
-               ceph_encode_16(p, v);                   \
-       } while (0)
-
-#define ceph_encode_copy_safe(p, end, pv, n, bad)              \
-       do {                                                    \
-               ceph_encode_need(p, end, n, bad);               \
-               ceph_encode_copy(p, pv, n);                     \
-       } while (0)
-
-
-#endif
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index a1986eb52045b8184d6ab31e46b8a2497295e9ba..e0a2dc6fcafcb62266909c5ec71329e58083d1d7 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1,4 +1,4 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/spinlock.h>
  #include <linux/fs_struct.h>
@@ -7,6 +7,7 @@
  #include <linux/sched.h>
  
  #include "super.h"
+#include "mds_client.h"
  
  /*
   * Directory operations: readdir, lookup, create, link, unlink,
@@ -94,10 +95,7 @@ static unsigned fpos_off(loff_t p)
   */
  static int __dcache_readdir(struct file *filp,
                             void *dirent, filldir_t filldir)
-               __releases(inode->i_lock)
-               __acquires(inode->i_lock)
  {
-       struct inode *inode = filp->f_dentry->d_inode;
         struct ceph_file_info *fi = filp->private_data;
         struct dentry *parent = filp->f_dentry;
         struct inode *dir = parent->d_inode;
@@ -153,7 +151,6 @@ more:
  
         atomic_inc(&dentry->d_count);
         spin_unlock(&dcache_lock);
-       spin_unlock(&inode->i_lock);
  
         dout(" %llu (%llu) dentry %p %.*s %p\n", di->offset, filp->f_pos,
              dentry, dentry->d_name.len, dentry->d_name.name, dentry->d_inode);
@@ -171,35 +168,30 @@ more:
                 } else {
                         dput(last);
                 }
-               last = NULL;
         }
-
-       spin_lock(&inode->i_lock);
-       spin_lock(&dcache_lock);
-
         last = dentry;
  
         if (err < 0)
-               goto out_unlock;
+               goto out;
  
-       p = p->prev;
         filp->f_pos++;
  
         /* make sure a dentry wasn't dropped while we didn't have dcache_lock */
-       if ((ceph_inode(dir)->i_ceph_flags & CEPH_I_COMPLETE))
-               goto more;
-       dout(" lost I_COMPLETE on %p; falling back to mds\n", dir);
-       err = -EAGAIN;
+       if (!ceph_i_test(dir, CEPH_I_COMPLETE)) {
+               dout(" lost I_COMPLETE on %p; falling back to mds\n", dir);
+               err = -EAGAIN;
+               goto out;
+       }
+
+       spin_lock(&dcache_lock);
+       p = p->prev;    /* advance to next dentry */
+       goto more;
  
  out_unlock:
         spin_unlock(&dcache_lock);
-
-       if (last) {
-               spin_unlock(&inode->i_lock);
+out:
+       if (last)
                 dput(last);
-               spin_lock(&inode->i_lock);
-       }
-
         return err;
  }
  
@@ -227,15 +219,15 @@ static int ceph_readdir(struct file *filp, void *dirent, filldir_t filldir)
         struct ceph_file_info *fi = filp->private_data;
         struct inode *inode = filp->f_dentry->d_inode;
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_client *client = ceph_inode_to_client(inode);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         unsigned frag = fpos_frag(filp->f_pos);
         int off = fpos_off(filp->f_pos);
         int err;
         u32 ftype;
         struct ceph_mds_reply_info_parsed *rinfo;
-       const int max_entries = client->mount_args->max_readdir;
-       const int max_bytes = client->mount_args->max_readdir_bytes;
+       const int max_entries = fsc->mount_options->max_readdir;
+       const int max_bytes = fsc->mount_options->max_readdir_bytes;
  
         dout("readdir %p filp %p frag %u off %u\n", inode, filp, frag, off);
         if (fi->at_end)
@@ -267,17 +259,17 @@ static int ceph_readdir(struct file *filp, void *dirent, filldir_t filldir)
         /* can we use the dcache? */
         spin_lock(&inode->i_lock);
         if ((filp->f_pos == 2 || fi->dentry) &&
-           !ceph_test_opt(client, NOASYNCREADDIR) &&
+           !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
             ceph_snap(inode) != CEPH_SNAPDIR &&
             (ci->i_ceph_flags & CEPH_I_COMPLETE) &&
             __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
+               spin_unlock(&inode->i_lock);
                 err = __dcache_readdir(filp, dirent, filldir);
-               if (err != -EAGAIN) {
-                       spin_unlock(&inode->i_lock);
+               if (err != -EAGAIN)
                         return err;
-               }
+       } else {
+               spin_unlock(&inode->i_lock);
         }
-       spin_unlock(&inode->i_lock);
         if (fi->dentry) {
                 err = note_last_dentry(fi, fi->dentry->d_name.name,
                                        fi->dentry->d_name.len);
@@ -487,14 +479,13 @@ static loff_t ceph_dir_llseek(struct file *file, loff_t offset, int origin)
  struct dentry *ceph_finish_lookup(struct ceph_mds_request *req,
                                   struct dentry *dentry, int err)
  {
-       struct ceph_client *client = ceph_sb_to_client(dentry->d_sb);
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
         struct inode *parent = dentry->d_parent->d_inode;
  
         /* .snap dir? */
         if (err == -ENOENT &&
-           ceph_vino(parent).ino != CEPH_INO_ROOT && /* no .snap in root dir */
             strcmp(dentry->d_name.name,
-                  client->mount_args->snapdir_name) == 0) {
+                  fsc->mount_options->snapdir_name) == 0) {
                 struct inode *inode = ceph_get_snapdir(parent);
                 dout("ENOENT on snapdir %p '%.*s', linking to snapdir %p\n",
                      dentry, dentry->d_name.len, dentry->d_name.name, inode);
@@ -539,8 +530,8 @@ static int is_root_ceph_dentry(struct inode *inode, struct dentry *dentry)
  static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
                                   struct nameidata *nd)
  {
-       struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         int op;
         int err;
@@ -572,7 +563,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
                 spin_lock(&dir->i_lock);
                 dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
                 if (strncmp(dentry->d_name.name,
-                           client->mount_args->snapdir_name,
+                           fsc->mount_options->snapdir_name,
                             dentry->d_name.len) &&
                     !is_root_ceph_dentry(dir, dentry) &&
                     (ci->i_ceph_flags & CEPH_I_COMPLETE) &&
@@ -629,8 +620,8 @@ int ceph_handle_notrace_create(struct inode *dir, struct dentry *dentry)
  static int ceph_mknod(struct inode *dir, struct dentry *dentry,
                       int mode, dev_t rdev)
  {
-       struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         int err;
  
@@ -685,8 +676,8 @@ static int ceph_create(struct inode *dir, struct dentry *dentry, int mode,
  static int ceph_symlink(struct inode *dir, struct dentry *dentry,
                             const char *dest)
  {
-       struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         int err;
  
@@ -716,8 +707,8 @@ static int ceph_symlink(struct inode *dir, struct dentry *dentry,
  
  static int ceph_mkdir(struct inode *dir, struct dentry *dentry, int mode)
  {
-       struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         int err = -EROFS;
         int op;
@@ -758,8 +749,8 @@ out:
  static int ceph_link(struct dentry *old_dentry, struct inode *dir,
                      struct dentry *dentry)
  {
-       struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         int err;
  
@@ -813,8 +804,8 @@ static int drop_caps_for_unlink(struct inode *inode)
   */
  static int ceph_unlink(struct inode *dir, struct dentry *dentry)
  {
-       struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct inode *inode = dentry->d_inode;
         struct ceph_mds_request *req;
         int err = -EROFS;
@@ -854,8 +845,8 @@ out:
  static int ceph_rename(struct inode *old_dir, struct dentry *old_dentry,
                        struct inode *new_dir, struct dentry *new_dentry)
  {
-       struct ceph_client *client = ceph_sb_to_client(old_dir->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(old_dir->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         int err;
  
@@ -1076,7 +1067,7 @@ static ssize_t ceph_read_dir(struct file *file, char __user *buf, size_t size,
         struct ceph_inode_info *ci = ceph_inode(inode);
         int left;
  
-       if (!ceph_test_opt(ceph_sb_to_client(inode->i_sb), DIRSTAT))
+       if (!ceph_test_mount_opt(ceph_sb_to_client(inode->i_sb), DIRSTAT))
                 return -EISDIR;
  
         if (!cf->dir_info) {
@@ -1177,7 +1168,7 @@ void ceph_dentry_lru_add(struct dentry *dn)
         dout("dentry_lru_add %p %p '%.*s'\n", di, dn,
              dn->d_name.len, dn->d_name.name);
         if (di) {
-               mdsc = &ceph_sb_to_client(dn->d_sb)->mdsc;
+               mdsc = ceph_sb_to_client(dn->d_sb)->mdsc;
                 spin_lock(&mdsc->dentry_lru_lock);
                 list_add_tail(&di->lru, &mdsc->dentry_lru);
                 mdsc->num_dentry++;
@@ -1193,7 +1184,7 @@ void ceph_dentry_lru_touch(struct dentry *dn)
         dout("dentry_lru_touch %p %p '%.*s' (offset %lld)\n", di, dn,
              dn->d_name.len, dn->d_name.name, di->offset);
         if (di) {
-               mdsc = &ceph_sb_to_client(dn->d_sb)->mdsc;
+               mdsc = ceph_sb_to_client(dn->d_sb)->mdsc;
                 spin_lock(&mdsc->dentry_lru_lock);
                 list_move_tail(&di->lru, &mdsc->dentry_lru);
                 spin_unlock(&mdsc->dentry_lru_lock);
@@ -1208,7 +1199,7 @@ void ceph_dentry_lru_del(struct dentry *dn)
         dout("dentry_lru_del %p %p '%.*s'\n", di, dn,
              dn->d_name.len, dn->d_name.name);
         if (di) {
-               mdsc = &ceph_sb_to_client(dn->d_sb)->mdsc;
+               mdsc = ceph_sb_to_client(dn->d_sb)->mdsc;
                 spin_lock(&mdsc->dentry_lru_lock);
                 list_del_init(&di->lru);
                 mdsc->num_dentry--;
diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index 4480cb1c63e7c69b107628481388cf1e35f49b8d..2297d9426992b0132b991e0d4e129982c4ab4a01 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -1,10 +1,11 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/exportfs.h>
  #include <linux/slab.h>
  #include <asm/unaligned.h>
  
  #include "super.h"
+#include "mds_client.h"
  
  /*
   * NFS export support
@@ -42,32 +43,37 @@ struct ceph_nfs_confh {
  static int ceph_encode_fh(struct dentry *dentry, u32 *rawfh, int *max_len,
                           int connectable)
  {
+       int type;
         struct ceph_nfs_fh *fh = (void *)rawfh;
         struct ceph_nfs_confh *cfh = (void *)rawfh;
         struct dentry *parent = dentry->d_parent;
         struct inode *inode = dentry->d_inode;
-       int type;
+       int connected_handle_length = sizeof(*cfh)/4;
+       int handle_length = sizeof(*fh)/4;
  
         /* don't re-export snaps */
         if (ceph_snap(inode) != CEPH_NOSNAP)
                 return -EINVAL;
  
-       if (*max_len >= sizeof(*cfh)) {
+       if (*max_len >= connected_handle_length) {
                 dout("encode_fh %p connectable\n", dentry);
                 cfh->ino = ceph_ino(dentry->d_inode);
                 cfh->parent_ino = ceph_ino(parent->d_inode);
                 cfh->parent_name_hash = parent->d_name.hash;
-               *max_len = sizeof(*cfh);
+               *max_len = connected_handle_length;
                 type = 2;
-       } else if (*max_len > sizeof(*fh)) {
-               if (connectable)
-                       return -ENOSPC;
+       } else if (*max_len >= handle_length) {
+               if (connectable) {
+                       *max_len = connected_handle_length;
+                       return 255;
+               }
                 dout("encode_fh %p\n", dentry);
                 fh->ino = ceph_ino(dentry->d_inode);
-               *max_len = sizeof(*fh);
+               *max_len = handle_length;
                 type = 1;
         } else {
-               return -ENOSPC;
+               *max_len = handle_length;
+               return 255;
         }
         return type;
  }
@@ -115,7 +121,7 @@ static struct dentry *__fh_to_dentry(struct super_block *sb,
  static struct dentry *__cfh_to_dentry(struct super_block *sb,
                                       struct ceph_nfs_confh *cfh)
  {
-       struct ceph_mds_client *mdsc = &ceph_sb_to_client(sb)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_sb_to_client(sb)->mdsc;
         struct inode *inode;
         struct dentry *dentry;
         struct ceph_vino vino;
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 8c044a4f045751c62420e67664705f981efe238a..e77c28cf369059112dd06ee3b7c78c5146f0ffdc 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1,5 +1,6 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
+#include <linux/module.h>
  #include <linux/sched.h>
  #include <linux/slab.h>
  #include <linux/file.h>
@@ -38,8 +39,8 @@
  static struct ceph_mds_request *
  prepare_open_request(struct super_block *sb, int flags, int create_mode)
  {
-       struct ceph_client *client = ceph_sb_to_client(sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         int want_auth = USE_ANY_MDS;
         int op = (flags & O_CREAT) ? CEPH_MDS_OP_CREATE : CEPH_MDS_OP_OPEN;
@@ -117,8 +118,8 @@ static int ceph_init_file(struct inode *inode, struct file *file, int fmode)
  int ceph_open(struct inode *inode, struct file *file)
  {
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_client *client = ceph_sb_to_client(inode->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(inode->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         struct ceph_file_info *cf = file->private_data;
         struct inode *parent_inode = file->f_dentry->d_parent->d_inode;
@@ -216,8 +217,8 @@ struct dentry *ceph_lookup_open(struct inode *dir, struct dentry *dentry,
                                 struct nameidata *nd, int mode,
                                 int locked_dir)
  {
-       struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct file *file = nd->intent.open.file;
         struct inode *parent_inode = get_dentry_parent_inode(file->f_dentry);
         struct ceph_mds_request *req;
@@ -269,163 +270,6 @@ int ceph_release(struct inode *inode, struct file *file)
         return 0;
  }
  
-/*
- * build a vector of user pages
- */
-static struct page **get_direct_page_vector(const char __user *data,
-                                           int num_pages,
-                                           loff_t off, size_t len)
-{
-       struct page **pages;
-       int rc;
-
-       pages = kmalloc(sizeof(*pages) * num_pages, GFP_NOFS);
-       if (!pages)
-               return ERR_PTR(-ENOMEM);
-
-       down_read(&current->mm->mmap_sem);
-       rc = get_user_pages(current, current->mm, (unsigned long)data,
-                           num_pages, 0, 0, pages, NULL);
-       up_read(&current->mm->mmap_sem);
-       if (rc < 0)
-               goto fail;
-       return pages;
-
-fail:
-       kfree(pages);
-       return ERR_PTR(rc);
-}
-
-static void put_page_vector(struct page **pages, int num_pages)
-{
-       int i;
-
-       for (i = 0; i < num_pages; i++)
-               put_page(pages[i]);
-       kfree(pages);
-}
-
-void ceph_release_page_vector(struct page **pages, int num_pages)
-{
-       int i;
-
-       for (i = 0; i < num_pages; i++)
-               __free_pages(pages[i], 0);
-       kfree(pages);
-}
-
-/*
- * allocate a vector new pages
- */
-static struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
-{
-       struct page **pages;
-       int i;
-
-       pages = kmalloc(sizeof(*pages) * num_pages, flags);
-       if (!pages)
-               return ERR_PTR(-ENOMEM);
-       for (i = 0; i < num_pages; i++) {
-               pages[i] = __page_cache_alloc(flags);
-               if (pages[i] == NULL) {
-                       ceph_release_page_vector(pages, i);
-                       return ERR_PTR(-ENOMEM);
-               }
-       }
-       return pages;
-}
-
-/*
- * copy user data into a page vector
- */
-static int copy_user_to_page_vector(struct page **pages,
-                                   const char __user *data,
-                                   loff_t off, size_t len)
-{
-       int i = 0;
-       int po = off & ~PAGE_CACHE_MASK;
-       int left = len;
-       int l, bad;
-
-       while (left > 0) {
-               l = min_t(int, PAGE_CACHE_SIZE-po, left);
-               bad = copy_from_user(page_address(pages[i]) + po, data, l);
-               if (bad == l)
-                       return -EFAULT;
-               data += l - bad;
-               left -= l - bad;
-               po += l - bad;
-               if (po == PAGE_CACHE_SIZE) {
-                       po = 0;
-                       i++;
-               }
-       }
-       return len;
-}
-
-/*
- * copy user data from a page vector into a user pointer
- */
-static int copy_page_vector_to_user(struct page **pages, char __user *data,
-                                   loff_t off, size_t len)
-{
-       int i = 0;
-       int po = off & ~PAGE_CACHE_MASK;
-       int left = len;
-       int l, bad;
-
-       while (left > 0) {
-               l = min_t(int, left, PAGE_CACHE_SIZE-po);
-               bad = copy_to_user(data, page_address(pages[i]) + po, l);
-               if (bad == l)
-                       return -EFAULT;
-               data += l - bad;
-               left -= l - bad;
-               if (po) {
-                       po += l - bad;
-                       if (po == PAGE_CACHE_SIZE)
-                               po = 0;
-               }
-               i++;
-       }
-       return len;
-}
-
-/*
- * Zero an extent within a page vector.  Offset is relative to the
- * start of the first page.
- */
-static void zero_page_vector_range(int off, int len, struct page **pages)
-{
-       int i = off >> PAGE_CACHE_SHIFT;
-
-       off &= ~PAGE_CACHE_MASK;
-
-       dout("zero_page_vector_page %u~%u\n", off, len);
-
-       /* leading partial page? */
-       if (off) {
-               int end = min((int)PAGE_CACHE_SIZE, off + len);
-               dout("zeroing %d %p head from %d\n", i, pages[i],
-                    (int)off);
-               zero_user_segment(pages[i], off, end);
-               len -= (end - off);
-               i++;
-       }
-       while (len >= PAGE_CACHE_SIZE) {
-               dout("zeroing %d %p len=%d\n", i, pages[i], len);
-               zero_user_segment(pages[i], 0, PAGE_CACHE_SIZE);
-               len -= PAGE_CACHE_SIZE;
-               i++;
-       }
-       /* trailing partial page? */
-       if (len) {
-               dout("zeroing %d %p tail to %d\n", i, pages[i], (int)len);
-               zero_user_segment(pages[i], 0, len);
-       }
-}
-
-
  /*
   * Read a range of bytes striped over one or more objects.  Iterate over
   * objects we stripe over.  (That's not atomic, but good enough for now.)
@@ -438,7 +282,7 @@ static int striped_read(struct inode *inode,
                         struct page **pages, int num_pages,
                         int *checkeof)
  {
-       struct ceph_client *client = ceph_inode_to_client(inode);
+       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
         struct ceph_inode_info *ci = ceph_inode(inode);
         u64 pos, this_len;
         int page_off = off & ~PAGE_CACHE_MASK; /* first byte's offset in page */
@@ -459,7 +303,7 @@ static int striped_read(struct inode *inode,
  
  more:
         this_len = left;
-       ret = ceph_osdc_readpages(&client->osdc, ceph_vino(inode),
+       ret = ceph_osdc_readpages(&fsc->client->osdc, ceph_vino(inode),
                                   &ci->i_layout, pos, &this_len,
                                   ci->i_truncate_seq,
                                   ci->i_truncate_size,
@@ -477,8 +321,8 @@ more:
  
                 if (read < pos - off) {
                         dout(" zero gap %llu to %llu\n", off + read, pos);
-                       zero_page_vector_range(page_off + read,
-                                              pos - off - read, pages);
+                       ceph_zero_page_vector_range(page_off + read,
+                                                   pos - off - read, pages);
                 }
                 pos += ret;
                 read = pos - off;
@@ -495,8 +339,8 @@ more:
                 /* was original extent fully inside i_size? */
                 if (pos + left <= inode->i_size) {
                         dout("zero tail\n");
-                       zero_page_vector_range(page_off + read, len - read,
-                                              pages);
+                       ceph_zero_page_vector_range(page_off + read, len - read,
+                                                   pages);
                         read = len;
                         goto out;
                 }
@@ -531,7 +375,7 @@ static ssize_t ceph_sync_read(struct file *file, char __user *data,
              (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
  
         if (file->f_flags & O_DIRECT) {
-               pages = get_direct_page_vector(data, num_pages, off, len);
+               pages = ceph_get_direct_page_vector(data, num_pages, off, len);
  
                 /*
                  * flush any page cache pages in this range.  this
@@ -552,13 +396,13 @@ static ssize_t ceph_sync_read(struct file *file, char __user *data,
         ret = striped_read(inode, off, len, pages, num_pages, checkeof);
  
         if (ret >= 0 && (file->f_flags & O_DIRECT) == 0)
-               ret = copy_page_vector_to_user(pages, data, off, ret);
+               ret = ceph_copy_page_vector_to_user(pages, data, off, ret);
         if (ret >= 0)
                 *poff = off + ret;
  
  done:
         if (file->f_flags & O_DIRECT)
-               put_page_vector(pages, num_pages);
+               ceph_put_page_vector(pages, num_pages);
         else
                 ceph_release_page_vector(pages, num_pages);
         dout("sync_read result %d\n", ret);
@@ -594,7 +438,7 @@ static ssize_t ceph_sync_write(struct file *file, const char __user *data,
  {
         struct inode *inode = file->f_dentry->d_inode;
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_client *client = ceph_inode_to_client(inode);
+       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
         struct ceph_osd_request *req;
         struct page **pages;
         int num_pages;
@@ -642,7 +486,7 @@ static ssize_t ceph_sync_write(struct file *file, const char __user *data,
          */
  more:
         len = left;
-       req = ceph_osdc_new_request(&client->osdc, &ci->i_layout,
+       req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
                                     ceph_vino(inode), pos, &len,
                                     CEPH_OSD_OP_WRITE, flags,
                                     ci->i_snap_realm->cached_context,
@@ -655,7 +499,7 @@ more:
         num_pages = calc_pages_for(pos, len);
  
         if (file->f_flags & O_DIRECT) {
-               pages = get_direct_page_vector(data, num_pages, pos, len);
+               pages = ceph_get_direct_page_vector(data, num_pages, pos, len);
                 if (IS_ERR(pages)) {
                         ret = PTR_ERR(pages);
                         goto out;
@@ -673,7 +517,7 @@ more:
                         ret = PTR_ERR(pages);
                         goto out;
                 }
-               ret = copy_user_to_page_vector(pages, data, pos, len);
+               ret = ceph_copy_user_to_page_vector(pages, data, pos, len);
                 if (ret < 0) {
                         ceph_release_page_vector(pages, num_pages);
                         goto out;
@@ -689,7 +533,7 @@ more:
         req->r_num_pages = num_pages;
         req->r_inode = inode;
  
-       ret = ceph_osdc_start_request(&client->osdc, req, false);
+       ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
         if (!ret) {
                 if (req->r_safe_callback) {
                         /*
@@ -697,15 +541,15 @@ more:
                          * start_request so that a tid has been assigned.
                          */
                         spin_lock(&ci->i_unsafe_lock);
-                       list_add(&ci->i_unsafe_writes, &req->r_unsafe_item);
+                       list_add(&req->r_unsafe_item, &ci->i_unsafe_writes);
                         spin_unlock(&ci->i_unsafe_lock);
                         ceph_get_cap_refs(ci, CEPH_CAP_FILE_WR);
                 }
-               ret = ceph_osdc_wait_request(&client->osdc, req);
+               ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
         }
  
         if (file->f_flags & O_DIRECT)
-               put_page_vector(pages, num_pages);
+               ceph_put_page_vector(pages, num_pages);
         else if (file->f_flags & O_SYNC)
                 ceph_release_page_vector(pages, num_pages);
  
@@ -814,7 +658,8 @@ static ssize_t ceph_aio_write(struct kiocb *iocb, const struct iovec *iov,
         struct ceph_file_info *fi = file->private_data;
         struct inode *inode = file->f_dentry->d_inode;
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_osd_client *osdc = &ceph_sb_to_client(inode->i_sb)->osdc;
+       struct ceph_osd_client *osdc =
+               &ceph_sb_to_client(inode->i_sb)->client->osdc;
         loff_t endoff = pos + iov->iov_len;
         int want, got = 0;
         int ret, err;
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 62377ec37edf05c14cf8567c0d22e19e02762253..1d6a45b5a04c696591879d141165627746d6a476 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1,4 +1,4 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/module.h>
  #include <linux/fs.h>
@@ -13,7 +13,8 @@
  #include <linux/pagevec.h>
  
  #include "super.h"
-#include "decode.h"
+#include "mds_client.h"
+#include <linux/ceph/decode.h>
  
  /*
   * Ceph inode operations
@@ -384,7 +385,7 @@ void ceph_destroy_inode(struct inode *inode)
          */
         if (ci->i_snap_realm) {
                 struct ceph_mds_client *mdsc =
-                       &ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+                       ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
                 struct ceph_snap_realm *realm = ci->i_snap_realm;
  
                 dout(" dropping residual ref to snap realm %p\n", realm);
@@ -685,7 +686,7 @@ static int fill_inode(struct inode *inode,
                 }
  
                 /* it may be better to set st_size in getattr instead? */
-               if (ceph_test_opt(ceph_sb_to_client(inode->i_sb), RBYTES))
+               if (ceph_test_mount_opt(ceph_sb_to_client(inode->i_sb), RBYTES))
                         inode->i_size = ci->i_rbytes;
                 break;
         default:
@@ -901,7 +902,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req,
         struct inode *in = NULL;
         struct ceph_mds_reply_inode *ininfo;
         struct ceph_vino vino;
-       struct ceph_client *client = ceph_sb_to_client(sb);
+       struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
         int i = 0;
         int err = 0;
  
@@ -965,7 +966,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req,
          */
         if (rinfo->head->is_dentry && !req->r_aborted &&
             (rinfo->head->is_target || strncmp(req->r_dentry->d_name.name,
-                                              client->mount_args->snapdir_name,
+                                              fsc->mount_options->snapdir_name,
                                                req->r_dentry->d_name.len))) {
                 /*
                  * lookup link rename   : null -> possibly existing inode
@@ -1533,7 +1534,7 @@ int ceph_setattr(struct dentry *dentry, struct iattr *attr)
         struct inode *parent_inode = dentry->d_parent->d_inode;
         const unsigned int ia_valid = attr->ia_valid;
         struct ceph_mds_request *req;
-       struct ceph_mds_client *mdsc = &ceph_sb_to_client(dentry->d_sb)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_sb_to_client(dentry->d_sb)->mdsc;
         int issued;
         int release = 0, dirtied = 0;
         int mask = 0;
@@ -1728,8 +1729,8 @@ out:
   */
  int ceph_do_getattr(struct inode *inode, int mask)
  {
-       struct ceph_client *client = ceph_sb_to_client(inode->i_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(inode->i_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req;
         int err;
  
diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index 76e307d2aba16868c9b1e8136496d929e0cd971f..8888c9ba68dbfec194e06f06142547ee2d35c8bc 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -1,8 +1,10 @@
  #include <linux/in.h>
  
-#include "ioctl.h"
  #include "super.h"
-#include "ceph_debug.h"
+#include "mds_client.h"
+#include <linux/ceph/ceph_debug.h>
+
+#include "ioctl.h"
  
  
  /*
@@ -37,7 +39,7 @@ static long ceph_ioctl_set_layout(struct file *file, void __user *arg)
  {
         struct inode *inode = file->f_dentry->d_inode;
         struct inode *parent_inode = file->f_dentry->d_parent->d_inode;
-       struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
         struct ceph_mds_request *req;
         struct ceph_ioctl_layout l;
         int err, i;
@@ -89,6 +91,68 @@ static long ceph_ioctl_set_layout(struct file *file, void __user *arg)
         return err;
  }
  
+/*
+ * Set a layout policy on a directory inode. All items in the tree
+ * rooted at this inode will inherit this layout on creation
+ * (it doesn't apply retroactively),
+ * unless a subdirectory has its own layout policy.
+ */
+static long ceph_ioctl_set_layout_policy(struct file *file, void __user *arg)
+{
+       struct inode *inode = file->f_dentry->d_inode;
+       struct ceph_mds_request *req;
+       struct ceph_ioctl_layout l;
+       int err, i;
+       struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
+
+       /* copy and validate */
+       if (copy_from_user(&l, arg, sizeof(l)))
+               return -EFAULT;
+
+       if ((l.object_size & ~PAGE_MASK) ||
+           (l.stripe_unit & ~PAGE_MASK) ||
+           !l.stripe_unit ||
+           (l.object_size &&
+               (unsigned)l.object_size % (unsigned)l.stripe_unit))
+               return -EINVAL;
+
+       /* make sure it's a valid data pool */
+       if (l.data_pool > 0) {
+               mutex_lock(&mdsc->mutex);
+               err = -EINVAL;
+               for (i = 0; i < mdsc->mdsmap->m_num_data_pg_pools; i++)
+                       if (mdsc->mdsmap->m_data_pg_pools[i] == l.data_pool) {
+                               err = 0;
+                               break;
+                       }
+               mutex_unlock(&mdsc->mutex);
+               if (err)
+                       return err;
+       }
+
+       req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SETDIRLAYOUT,
+                                      USE_AUTH_MDS);
+
+       if (IS_ERR(req))
+               return PTR_ERR(req);
+       req->r_inode = igrab(inode);
+
+       req->r_args.setlayout.layout.fl_stripe_unit =
+                       cpu_to_le32(l.stripe_unit);
+       req->r_args.setlayout.layout.fl_stripe_count =
+                       cpu_to_le32(l.stripe_count);
+       req->r_args.setlayout.layout.fl_object_size =
+                       cpu_to_le32(l.object_size);
+       req->r_args.setlayout.layout.fl_pg_pool =
+                       cpu_to_le32(l.data_pool);
+       req->r_args.setlayout.layout.fl_pg_preferred =
+                       cpu_to_le32(l.preferred_osd);
+
+       err = ceph_mdsc_do_request(mdsc, inode, req);
+       ceph_mdsc_put_request(req);
+       return err;
+}
+
  /*
   * Return object name, size/offset information, and location (OSD
   * number, network address) for a given file offset.
@@ -98,7 +162,8 @@ static long ceph_ioctl_get_dataloc(struct file *file, void __user *arg)
         struct ceph_ioctl_dataloc dl;
         struct inode *inode = file->f_dentry->d_inode;
         struct ceph_inode_info *ci = ceph_inode(inode);
-       struct ceph_osd_client *osdc = &ceph_sb_to_client(inode->i_sb)->osdc;
+       struct ceph_osd_client *osdc =
+               &ceph_sb_to_client(inode->i_sb)->client->osdc;
         u64 len = 1, olen;
         u64 tmp;
         struct ceph_object_layout ol;
@@ -174,11 +239,15 @@ long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
         case CEPH_IOC_SET_LAYOUT:
                 return ceph_ioctl_set_layout(file, (void __user *)arg);
  
+       case CEPH_IOC_SET_LAYOUT_POLICY:
+               return ceph_ioctl_set_layout_policy(file, (void __user *)arg);
+
         case CEPH_IOC_GET_DATALOC:
                 return ceph_ioctl_get_dataloc(file, (void __user *)arg);
  
         case CEPH_IOC_LAZYIO:
                 return ceph_ioctl_lazyio(file);
         }
+
         return -ENOTTY;
  }
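The "copy and validate" step in ceph_ioctl_set_layout_policy() above can be read in isolation. What follows is a minimal userspace sketch, not the kernel code: sketch_validate_layout() and SKETCH_PAGE_SIZE are hypothetical names, and a 4096-byte page is assumed. It mirrors the conditions checked before the request is built: both sizes page-aligned, a nonzero stripe unit, and an object size that is a whole multiple of the stripe unit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */
/* Same shape as the kernel's PAGE_MASK: all bits set except the in-page offset. */
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/* Returns 0 if the layout passes the checks, -EINVAL otherwise. */
static int sketch_validate_layout(uint64_t object_size, uint64_t stripe_unit)
{
	if ((object_size & ~SKETCH_PAGE_MASK) ||	/* object size not page-aligned */
	    (stripe_unit & ~SKETCH_PAGE_MASK) ||	/* stripe unit not page-aligned */
	    !stripe_unit ||				/* stripe unit must be nonzero */
	    /*
	     * The kernel code casts both operands to unsigned before the
	     * modulo; this sketch keeps them 64-bit.
	     */
	    (object_size && object_size % stripe_unit))
		return -EINVAL;
	return 0;
}
```

As in the code above, an object_size of zero passes these checks; only a nonzero object size is tested against the stripe unit.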
diff --git a/fs/ceph/ioctl.h b/fs/ceph/ioctl.h
index 88451a3b6857d14bc7b21f418bcd82e56e9fdc64..a6ce54e94eb5ab435670093cd6ad789e72cc627b 100644
--- a/fs/ceph/ioctl.h
+++ b/fs/ceph/ioctl.h
@@ -4,7 +4,7 @@
  #include <linux/ioctl.h>
  #include <linux/types.h>
  
-#define CEPH_IOCTL_MAGIC 0x97
+#define CEPH_IOCTL_MAGIC 0x98
  
  /* just use u64 to align sanely on all archs */
  struct ceph_ioctl_layout {
@@ -17,6 +17,8 @@ struct ceph_ioctl_layout {
                                    struct ceph_ioctl_layout)
  #define CEPH_IOC_SET_LAYOUT _IOW(CEPH_IOCTL_MAGIC, 2,          \
                                    struct ceph_ioctl_layout)
+#define CEPH_IOC_SET_LAYOUT_POLICY _IOW(CEPH_IOCTL_MAGIC, 5,   \
+                                  struct ceph_ioctl_layout)
  
  /*
   * Extract identity, address of the OSD and object storing a given
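Because CEPH_IOCTL_MAGIC is the "type" field of every command built with _IOW(), changing it from 0x97 to 0x98 changes the numeric value of each existing command, not just the new one. The sketch below illustrates this on Linux with the standard macros from <linux/ioctl.h>. struct sketch_layout is an assumed stand-in for struct ceph_ioctl_layout (only its size feeds the encoding), and the SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Assumed stand-in for struct ceph_ioctl_layout; field list is illustrative. */
struct sketch_layout {
	uint64_t stripe_unit, stripe_count, object_size, data_pool;
	int64_t preferred_osd;
};

#define SKETCH_OLD_MAGIC 0x97
#define SKETCH_NEW_MAGIC 0x98

/* Command 2 (set layout) as encoded before and after the magic change. */
static unsigned long sketch_set_layout_old(void)
{
	return _IOW(SKETCH_OLD_MAGIC, 2, struct sketch_layout);
}

static unsigned long sketch_set_layout_new(void)
{
	return _IOW(SKETCH_NEW_MAGIC, 2, struct sketch_layout);
}

/* Command 5 (set layout policy), which exists only under the new magic. */
static unsigned long sketch_set_layout_policy(void)
{
	return _IOW(SKETCH_NEW_MAGIC, 5, struct sketch_layout);
}
```

A caller still using a command number built with the old magic would not match any case in the switch in ceph_ioctl(), which falls through to -ENOTTY.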
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index ff4e753aae929d37d414567d22fd6afef7316c7e..40abde93c345d054279fd51cbca998525c4931c7 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -1,11 +1,11 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/file.h>
  #include <linux/namei.h>
  
  #include "super.h"
  #include "mds_client.h"
-#include "pagelist.h"
+#include <linux/ceph/pagelist.h>
  
  /**
   * Implement fcntl and flock locking functions.
@@ -16,7 +16,7 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct file *file,
  {
         struct inode *inode = file->f_dentry->d_inode;
         struct ceph_mds_client *mdsc =
-               &ceph_sb_to_client(inode->i_sb)->mdsc;
+               ceph_sb_to_client(inode->i_sb)->mdsc;
         struct ceph_mds_request *req;
         int err;
  
@@ -181,8 +181,9 @@ void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count)
   * Encode the flock and fcntl locks for the given inode into the pagelist.
   * Format is: #fcntl locks, sequential fcntl locks, #flock locks,
   * sequential flock locks.
- * Must be called with BLK already held, and the lock numbers should have
- * been gathered under the same lock holding window.
+ * Must be called with lock_flocks() already held.
+ * If we encounter more of a specific lock type than expected,
+ * we return -ENOSPC.
   */
  int ceph_encode_locks(struct inode *inode, struct ceph_pagelist *pagelist,
                       int num_fcntl_locks, int num_flock_locks)
@@ -190,6 +191,8 @@ int ceph_encode_locks(struct inode *inode, struct ceph_pagelist *pagelist,
         struct file_lock *lock;
         struct ceph_filelock cephlock;
         int err = 0;
+       int seen_fcntl = 0;
+       int seen_flock = 0;
  
         dout("encoding %d flock and %d fcntl locks", num_flock_locks,
              num_fcntl_locks);
@@ -198,6 +201,11 @@ int ceph_encode_locks(struct inode *inode, struct ceph_pagelist *pagelist,
                 goto fail;
         for (lock = inode->i_flock; lock != NULL; lock = lock->fl_next) {
                 if (lock->fl_flags & FL_POSIX) {
+                       ++seen_fcntl;
+                       if (seen_fcntl > num_fcntl_locks) {
+                               err = -ENOSPC;
+                               goto fail;
+                       }
                         err = lock_to_ceph_filelock(lock, &cephlock);
                         if (err)
                                 goto fail;
@@ -213,6 +221,11 @@ int ceph_encode_locks(struct inode *inode, struct ceph_pagelist *pagelist,
                 goto fail;
         for (lock = inode->i_flock; lock != NULL; lock = lock->fl_next) {
                 if (lock->fl_flags & FL_FLOCK) {
+                       ++seen_flock;
+                       if (seen_flock > num_flock_locks) {
+                               err = -ENOSPC;
+                               goto fail;
+                       }
                         err = lock_to_ceph_filelock(lock, &cephlock);
                         if (err)
                                 goto fail;
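The seen_fcntl and seen_flock checks above make ceph_encode_locks() fail with -ENOSPC when the list holds more locks than the caller counted earlier. The caller in encode_caps_cb() responds by recounting and encoding again. The following userspace sketch shows that count, reserve, encode, retry pattern in a simplified form. All sketch_* names are hypothetical, and the race is simulated by a flag rather than by real concurrency.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an inode's lock list. */
struct sketch_locks {
	int live;	/* locks currently on the list */
	int grow_once;	/* nonzero: one lock is added right after the next count */
};

/* Count step: snapshot the list length, then simulate a racing locker. */
static int sketch_count(struct sketch_locks *l)
{
	int n = l->live;

	if (l->grow_once) {
		l->live++;
		l->grow_once = 0;
	}
	return n;
}

/*
 * Encode step: write at most `reserved` entries into `out`.
 * Returns the number written, or -ENOSPC if the list has grown.
 */
static int sketch_encode(const struct sketch_locks *l, int *out, int reserved)
{
	int seen = 0;

	for (int i = 0; i < l->live; i++) {
		if (++seen > reserved)
			return -ENOSPC;
		out[i] = i;
	}
	return seen;
}

/*
 * Recount and re-encode until the snapshot and the list agree.
 * `out` must be large enough for the final lock count.
 */
static int sketch_encode_with_retry(struct sketch_locks *l, int *out,
				    int *attempts)
{
	int ret;

	*attempts = 0;
	do {
		int reserved = sketch_count(l);

		(*attempts)++;
		ret = sketch_encode(l, out, reserved);
	} while (ret == -ENOSPC);
	return ret;
}
```

Splitting the work this way lets the space reservation, which may allocate, happen outside the lock, at the cost of an occasional second pass.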
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index fad95f8f2608bc4e86c0876bedd81cb5be46a4ef..3142b15940c25656a43ec3a5d72af3e1ee1cece9 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1,17 +1,21 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
+#include <linux/fs.h>
  #include <linux/wait.h>
  #include <linux/slab.h>
  #include <linux/sched.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
  #include <linux/smp_lock.h>
  
-#include "mds_client.h"
-#include "mon_client.h"
  #include "super.h"
-#include "messenger.h"
-#include "decode.h"
-#include "auth.h"
-#include "pagelist.h"
+#include "mds_client.h"
+
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/pagelist.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/debugfs.h>
  
  /*
   * A cluster of MDS (metadata server) daemons is responsible for
@@ -286,8 +290,9 @@ void ceph_put_mds_session(struct ceph_mds_session *s)
              atomic_read(&s->s_ref), atomic_read(&s->s_ref)-1);
         if (atomic_dec_and_test(&s->s_ref)) {
                 if (s->s_authorizer)
-                       s->s_mdsc->client->monc.auth->ops->destroy_authorizer(
-                               s->s_mdsc->client->monc.auth, s->s_authorizer);
+                    s->s_mdsc->fsc->client->monc.auth->ops->destroy_authorizer(
+                            s->s_mdsc->fsc->client->monc.auth,
+                            s->s_authorizer);
                 kfree(s);
         }
  }
@@ -344,7 +349,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
         s->s_seq = 0;
         mutex_init(&s->s_mutex);
  
-       ceph_con_init(mdsc->client->msgr, &s->s_con);
+       ceph_con_init(mdsc->fsc->client->msgr, &s->s_con);
         s->s_con.private = s;
         s->s_con.ops = &mds_con_ops;
         s->s_con.peer_name.type = CEPH_ENTITY_TYPE_MDS;
@@ -599,7 +604,7 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
         } else if (req->r_dentry) {
                 struct inode *dir = req->r_dentry->d_parent->d_inode;
  
-               if (dir->i_sb != mdsc->client->sb) {
+               if (dir->i_sb != mdsc->fsc->sb) {
                         /* not this fs! */
                         inode = req->r_dentry->d_inode;
                 } else if (ceph_snap(dir) != CEPH_NOSNAP) {
@@ -884,7 +889,7 @@ static int remove_session_caps_cb(struct inode *inode, struct ceph_cap *cap,
         __ceph_remove_cap(cap);
         if (!__ceph_is_any_real_caps(ci)) {
                 struct ceph_mds_client *mdsc =
-                       &ceph_sb_to_client(inode->i_sb)->mdsc;
+                       ceph_sb_to_client(inode->i_sb)->mdsc;
  
                 spin_lock(&mdsc->cap_dirty_lock);
                 if (!list_empty(&ci->i_dirty_item)) {
@@ -1146,7 +1151,7 @@ int ceph_add_cap_releases(struct ceph_mds_client *mdsc,
         struct ceph_msg *msg, *partial = NULL;
         struct ceph_mds_cap_release *head;
         int err = -ENOMEM;
-       int extra = mdsc->client->mount_args->cap_release_safety;
+       int extra = mdsc->fsc->mount_options->cap_release_safety;
         int num;
  
         dout("add_cap_releases %p mds%d extra %d\n", session, session->s_mds,
@@ -2085,7 +2090,7 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
  
         /* insert trace into our cache */
         mutex_lock(&req->r_fill_mutex);
-       err = ceph_fill_trace(mdsc->client->sb, req, req->r_session);
+       err = ceph_fill_trace(mdsc->fsc->sb, req, req->r_session);
         if (err == 0) {
                 if (result == 0 && rinfo->dir_nr)
                         ceph_readdir_prepopulate(req, req->r_session);
@@ -2361,19 +2366,35 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
  
         if (recon_state->flock) {
                 int num_fcntl_locks, num_flock_locks;
-
-               lock_kernel();
-               ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
-               rec.v2.flock_len = (2*sizeof(u32) +
-                                   (num_fcntl_locks+num_flock_locks) *
-                                   sizeof(struct ceph_filelock));
-
-               err = ceph_pagelist_append(pagelist, &rec, reclen);
-               if (!err)
-                       err = ceph_encode_locks(inode, pagelist,
-                                               num_fcntl_locks,
-                                               num_flock_locks);
-               unlock_kernel();
+               struct ceph_pagelist_cursor trunc_point;
+
+               ceph_pagelist_set_cursor(pagelist, &trunc_point);
+               do {
+                       lock_flocks();
+                       ceph_count_locks(inode, &num_fcntl_locks,
+                                        &num_flock_locks);
+                       rec.v2.flock_len = (2*sizeof(u32) +
+                                           (num_fcntl_locks+num_flock_locks) *
+                                           sizeof(struct ceph_filelock));
+                       unlock_flocks();
+
+                       /* pre-alloc pagelist */
+                       ceph_pagelist_truncate(pagelist, &trunc_point);
+                       err = ceph_pagelist_append(pagelist, &rec, reclen);
+                       if (!err)
+                               err = ceph_pagelist_reserve(pagelist,
+                                                           rec.v2.flock_len);
+
+                       /* encode locks */
+                       if (!err) {
+                               lock_flocks();
+                               err = ceph_encode_locks(inode,
+                                                       pagelist,
+                                                       num_fcntl_locks,
+                                                       num_flock_locks);
+                               unlock_flocks();
+                       }
+               } while (err == -ENOSPC);
         } else {
                 err = ceph_pagelist_append(pagelist, &rec, reclen);
         }
@@ -2613,7 +2634,7 @@ static void handle_lease(struct ceph_mds_client *mdsc,
                          struct ceph_mds_session *session,
                          struct ceph_msg *msg)
  {
-       struct super_block *sb = mdsc->client->sb;
+       struct super_block *sb = mdsc->fsc->sb;
         struct inode *inode;
         struct ceph_inode_info *ci;
         struct dentry *parent, *dentry;
@@ -2891,10 +2912,16 @@ static void delayed_work(struct work_struct *work)
         schedule_delayed(mdsc);
  }
  
+int ceph_mdsc_init(struct ceph_fs_client *fsc)
  
-int ceph_mdsc_init(struct ceph_mds_client *mdsc, struct ceph_client *client)
  {
-       mdsc->client = client;
+       struct ceph_mds_client *mdsc;
+
+       mdsc = kzalloc(sizeof(struct ceph_mds_client), GFP_NOFS);
+       if (!mdsc)
+               return -ENOMEM;
+       mdsc->fsc = fsc;
+       fsc->mdsc = mdsc;
         mutex_init(&mdsc->mutex);
         mdsc->mdsmap = kzalloc(sizeof(*mdsc->mdsmap), GFP_NOFS);
         if (mdsc->mdsmap == NULL)
@@ -2927,7 +2954,7 @@ int ceph_mdsc_init(struct ceph_mds_client *mdsc, struct ceph_client *client)
         INIT_LIST_HEAD(&mdsc->dentry_lru);
  
         ceph_caps_init(mdsc);
-       ceph_adjust_min_caps(mdsc, client->min_caps);
+       ceph_adjust_min_caps(mdsc, fsc->min_caps);
  
         return 0;
  }
@@ -2939,7 +2966,7 @@ int ceph_mdsc_init(struct ceph_mds_client *mdsc, struct ceph_client *client)
  static void wait_requests(struct ceph_mds_client *mdsc)
  {
         struct ceph_mds_request *req;
-       struct ceph_client *client = mdsc->client;
+       struct ceph_fs_client *fsc = mdsc->fsc;
  
         mutex_lock(&mdsc->mutex);
         if (__get_oldest_req(mdsc)) {
@@ -2947,7 +2974,7 @@ static void wait_requests(struct ceph_mds_client *mdsc)
  
                 dout("wait_requests waiting for requests\n");
                 wait_for_completion_timeout(&mdsc->safe_umount_waiters,
-                                   client->mount_args->mount_timeout * HZ);
+                                   fsc->client->options->mount_timeout * HZ);
  
                 /* tear down remaining requests */
                 mutex_lock(&mdsc->mutex);
@@ -3030,7 +3057,7 @@ void ceph_mdsc_sync(struct ceph_mds_client *mdsc)
  {
         u64 want_tid, want_flush;
  
-       if (mdsc->client->mount_state == CEPH_MOUNT_SHUTDOWN)
+       if (mdsc->fsc->mount_state == CEPH_MOUNT_SHUTDOWN)
                 return;
  
         dout("sync\n");
@@ -3053,7 +3080,7 @@ bool done_closing_sessions(struct ceph_mds_client *mdsc)
  {
         int i, n = 0;
  
-       if (mdsc->client->mount_state == CEPH_MOUNT_SHUTDOWN)
+       if (mdsc->fsc->mount_state == CEPH_MOUNT_SHUTDOWN)
                 return true;
  
         mutex_lock(&mdsc->mutex);
@@ -3071,8 +3098,8 @@ void ceph_mdsc_close_sessions(struct ceph_mds_client *mdsc)
  {
         struct ceph_mds_session *session;
         int i;
-       struct ceph_client *client = mdsc->client;
-       unsigned long timeout = client->mount_args->mount_timeout * HZ;
+       struct ceph_fs_client *fsc = mdsc->fsc;
+       unsigned long timeout = fsc->client->options->mount_timeout * HZ;
  
         dout("close_sessions\n");
  
@@ -3119,7 +3146,7 @@ void ceph_mdsc_close_sessions(struct ceph_mds_client *mdsc)
         dout("stopped\n");
  }
  
-void ceph_mdsc_stop(struct ceph_mds_client *mdsc)
+static void ceph_mdsc_stop(struct ceph_mds_client *mdsc)
  {
         dout("stop\n");
         cancel_delayed_work_sync(&mdsc->delayed_work); /* cancel timer */
@@ -3129,6 +3156,15 @@ void ceph_mdsc_stop(struct ceph_mds_client *mdsc)
         ceph_caps_finalize(mdsc);
  }
  
+void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
+{
+       struct ceph_mds_client *mdsc = fsc->mdsc;
+
+       ceph_mdsc_stop(mdsc);
+       fsc->mdsc = NULL;
+       kfree(mdsc);
+}
+
  
  /*
   * handle mds map update.
@@ -3145,14 +3181,14 @@ void ceph_mdsc_handle_map(struct ceph_mds_client *mdsc, struct ceph_msg *msg)
  
         ceph_decode_need(&p, end, sizeof(fsid)+2*sizeof(u32), bad);
         ceph_decode_copy(&p, &fsid, sizeof(fsid));
-       if (ceph_check_fsid(mdsc->client, &fsid) < 0)
+       if (ceph_check_fsid(mdsc->fsc->client, &fsid) < 0)
                 return;
         epoch = ceph_decode_32(&p);
         maplen = ceph_decode_32(&p);
         dout("handle_map epoch %u len %d\n", epoch, (int)maplen);
  
         /* do we need it? */
-       ceph_monc_got_mdsmap(&mdsc->client->monc, epoch);
+       ceph_monc_got_mdsmap(&mdsc->fsc->client->monc, epoch);
         mutex_lock(&mdsc->mutex);
         if (mdsc->mdsmap && epoch <= mdsc->mdsmap->m_epoch) {
                 dout("handle_map epoch %u <= our %u\n",
@@ -3176,7 +3212,7 @@ void ceph_mdsc_handle_map(struct ceph_mds_client *mdsc, struct ceph_msg *msg)
         } else {
                 mdsc->mdsmap = newmap;  /* first mds map */
         }
-       mdsc->client->sb->s_maxbytes = mdsc->mdsmap->m_max_file_size;
+       mdsc->fsc->sb->s_maxbytes = mdsc->mdsmap->m_max_file_size;
  
         __wake_requests(mdsc, &mdsc->waiting_for_map);
  
@@ -3277,7 +3313,7 @@ static int get_authorizer(struct ceph_connection *con,
  {
         struct ceph_mds_session *s = con->private;
         struct ceph_mds_client *mdsc = s->s_mdsc;
-       struct ceph_auth_client *ac = mdsc->client->monc.auth;
+       struct ceph_auth_client *ac = mdsc->fsc->client->monc.auth;
         int ret = 0;
  
         if (force_new && s->s_authorizer) {
@@ -3311,7 +3347,7 @@ static int verify_authorizer_reply(struct ceph_connection *con, int len)
  {
         struct ceph_mds_session *s = con->private;
         struct ceph_mds_client *mdsc = s->s_mdsc;
-       struct ceph_auth_client *ac = mdsc->client->monc.auth;
+       struct ceph_auth_client *ac = mdsc->fsc->client->monc.auth;
  
         return ac->ops->verify_authorizer_reply(ac, s->s_authorizer, len);
  }
@@ -3320,12 +3356,12 @@ static int invalidate_authorizer(struct ceph_connection *con)
  {
         struct ceph_mds_session *s = con->private;
         struct ceph_mds_client *mdsc = s->s_mdsc;
-       struct ceph_auth_client *ac = mdsc->client->monc.auth;
+       struct ceph_auth_client *ac = mdsc->fsc->client->monc.auth;
  
         if (ac->ops->invalidate_authorizer)
                 ac->ops->invalidate_authorizer(ac, CEPH_ENTITY_TYPE_MDS);
  
-       return ceph_monc_validate_auth(&mdsc->client->monc);
+       return ceph_monc_validate_auth(&mdsc->fsc->client->monc);
  }
  
  static const struct ceph_connection_operations mds_con_ops = {
@@ -3338,7 +3374,4 @@ static const struct ceph_connection_operations mds_con_ops = {
         .peer_reset = peer_reset,
  };
  
-
-
-
  /* eof */
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index c98267ce6d2ad97e1d9c86bc0660e2d82d39366c..d66d63c7235526ef63d16ff0ea1e9ba3899df586 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -8,9 +8,9 @@
  #include <linux/rbtree.h>
  #include <linux/spinlock.h>
  
-#include "types.h"
-#include "messenger.h"
-#include "mdsmap.h"
+#include <linux/ceph/types.h>
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/mdsmap.h>
  
  /*
   * Some lock dependencies:
@@ -26,7 +26,7 @@
   *
   */
  
-struct ceph_client;
+struct ceph_fs_client;
  struct ceph_cap;
  
  /*
@@ -230,7 +230,7 @@ struct ceph_mds_request {
   * mds client state
   */
  struct ceph_mds_client {
-       struct ceph_client      *client;
+       struct ceph_fs_client  *fsc;
         struct mutex            mutex;         /* all nested structures */
  
         struct ceph_mdsmap      *mdsmap;
@@ -289,11 +289,6 @@ struct ceph_mds_client {
         int             caps_avail_count;    /* unused, unreserved */
         int             caps_min_count;      /* keep at least this many
                                                 (unreserved) */
-
-#ifdef CONFIG_DEBUG_FS
-       struct dentry     *debugfs_file;
-#endif
-
         spinlock_t        dentry_lru_lock;
         struct list_head  dentry_lru;
         int               num_dentry;
@@ -316,10 +311,9 @@ extern void ceph_put_mds_session(struct ceph_mds_session *s);
  extern int ceph_send_msg_mds(struct ceph_mds_client *mdsc,
                              struct ceph_msg *msg, int mds);
  
-extern int ceph_mdsc_init(struct ceph_mds_client *mdsc,
-                          struct ceph_client *client);
+extern int ceph_mdsc_init(struct ceph_fs_client *fsc);
  extern void ceph_mdsc_close_sessions(struct ceph_mds_client *mdsc);
-extern void ceph_mdsc_stop(struct ceph_mds_client *mdsc);
+extern void ceph_mdsc_destroy(struct ceph_fs_client *fsc);
  
  extern void ceph_mdsc_sync(struct ceph_mds_client *mdsc);
  
diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
index 040be6d1150be5ace2be71e955ac3b3525b8fd76..73b7d44e8a354264e3f08f66e8cb788851328029 100644
--- a/fs/ceph/mdsmap.c
+++ b/fs/ceph/mdsmap.c
@@ -1,4 +1,4 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/bug.h>
  #include <linux/err.h>
@@ -6,9 +6,9 @@
  #include <linux/slab.h>
  #include <linux/types.h>
  
-#include "mdsmap.h"
-#include "messenger.h"
-#include "decode.h"
+#include <linux/ceph/mdsmap.h>
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/decode.h>
  
  #include "super.h"
  
@@ -117,7 +117,8 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end)
                 }
  
                 dout("mdsmap_decode %d/%d %lld mds%d.%d %s %s\n",
-                    i+1, n, global_id, mds, inc, pr_addr(&addr.in_addr),
+                    i+1, n, global_id, mds, inc,
+                    ceph_pr_addr(&addr.in_addr),
                      ceph_mds_state_name(state));
                 if (mds >= 0 && mds < m->m_max_mds && state > 0) {
                         m->m_info[mds].global_id = global_id;
diff --git a/fs/ceph/mdsmap.h b/fs/ceph/mdsmap.h
deleted file mode 100644
index 4c5cb08..0000000
--- a/fs/ceph/mdsmap.h
+++ /dev/null
@@ -1,62 +0,0 @@
-#ifndef _FS_CEPH_MDSMAP_H
-#define _FS_CEPH_MDSMAP_H
-
-#include "types.h"
-
-/*
- * mds map - describe servers in the mds cluster.
- *
- * we limit fields to those the client actually xcares about
- */
-struct ceph_mds_info {
-       u64 global_id;
-       struct ceph_entity_addr addr;
-       s32 state;
-       int num_export_targets;
-       bool laggy;
-       u32 *export_targets;
-};
-
-struct ceph_mdsmap {
-       u32 m_epoch, m_client_epoch, m_last_failure;
-       u32 m_root;
-       u32 m_session_timeout;          /* seconds */
-       u32 m_session_autoclose;        /* seconds */
-       u64 m_max_file_size;
-       u32 m_max_mds;                  /* size of m_addr, m_state arrays */
-       struct ceph_mds_info *m_info;
-
-       /* which object pools file data can be stored in */
-       int m_num_data_pg_pools;
-       u32 *m_data_pg_pools;
-       u32 m_cas_pg_pool;
-};
-
-static inline struct ceph_entity_addr *
-ceph_mdsmap_get_addr(struct ceph_mdsmap *m, int w)
-{
-       if (w >= m->m_max_mds)
-               return NULL;
-       return &m->m_info[w].addr;
-}
-
-static inline int ceph_mdsmap_get_state(struct ceph_mdsmap *m, int w)
-{
-       BUG_ON(w < 0);
-       if (w >= m->m_max_mds)
-               return CEPH_MDS_STATE_DNE;
-       return m->m_info[w].state;
-}
-
-static inline bool ceph_mdsmap_is_laggy(struct ceph_mdsmap *m, int w)
-{
-       if (w >= 0 && w < m->m_max_mds)
-               return m->m_info[w].laggy;
-       return false;
-}
-
-extern int ceph_mdsmap_get_random_mds(struct ceph_mdsmap *m);
-extern struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end);
-extern void ceph_mdsmap_destroy(struct ceph_mdsmap *m);
-
-#endif
diff --git a/fs/ceph/messenger.c b/fs/ceph/messenger.c
deleted file mode 100644
index 2502d76..0000000
--- a/fs/ceph/messenger.c
+++ /dev/null
@@ -1,2277 +0,0 @@
-#include "ceph_debug.h"
-
-#include <linux/crc32c.h>
-#include <linux/ctype.h>
-#include <linux/highmem.h>
-#include <linux/inet.h>
-#include <linux/kthread.h>
-#include <linux/net.h>
-#include <linux/slab.h>
-#include <linux/socket.h>
-#include <linux/string.h>
-#include <net/tcp.h>
-
-#include "super.h"
-#include "messenger.h"
-#include "decode.h"
-#include "pagelist.h"
-
-/*
- * Ceph uses the messenger to exchange ceph_msg messages with other
- * hosts in the system.  The messenger provides ordered and reliable
- * delivery.  We tolerate TCP disconnects by reconnecting (with
- * exponential backoff) in the case of a fault (disconnection, bad
- * crc, protocol error).  Acks allow sent messages to be discarded by
- * the sender.
- */
-
-/* static tag bytes (protocol control messages) */
-static char tag_msg = CEPH_MSGR_TAG_MSG;
-static char tag_ack = CEPH_MSGR_TAG_ACK;
-static char tag_keepalive = CEPH_MSGR_TAG_KEEPALIVE;
-
-#ifdef CONFIG_LOCKDEP
-static struct lock_class_key socket_class;
-#endif
-
-
-static void queue_con(struct ceph_connection *con);
-static void con_work(struct work_struct *);
-static void ceph_fault(struct ceph_connection *con);
-
-/*
- * nicely render a sockaddr as a string.
- */
-#define MAX_ADDR_STR 20
-#define MAX_ADDR_STR_LEN 60
-static char addr_str[MAX_ADDR_STR][MAX_ADDR_STR_LEN];
-static DEFINE_SPINLOCK(addr_str_lock);
-static int last_addr_str;
-
-const char *pr_addr(const struct sockaddr_storage *ss)
-{
-       int i;
-       char *s;
-       struct sockaddr_in *in4 = (void *)ss;
-       struct sockaddr_in6 *in6 = (void *)ss;
-
-       spin_lock(&addr_str_lock);
-       i = last_addr_str++;
-       if (last_addr_str == MAX_ADDR_STR)
-               last_addr_str = 0;
-       spin_unlock(&addr_str_lock);
-       s = addr_str[i];
-
-       switch (ss->ss_family) {
-       case AF_INET:
-               snprintf(s, MAX_ADDR_STR_LEN, "%pI4:%u", &in4->sin_addr,
-                        (unsigned int)ntohs(in4->sin_port));
-               break;
-
-       case AF_INET6:
-               snprintf(s, MAX_ADDR_STR_LEN, "[%pI6c]:%u", &in6->sin6_addr,
-                        (unsigned int)ntohs(in6->sin6_port));
-               break;
-
-       default:
-               sprintf(s, "(unknown sockaddr family %d)", (int)ss->ss_family);
-       }
-
-       return s;
-}
-
-static void encode_my_addr(struct ceph_messenger *msgr)
-{
-       memcpy(&msgr->my_enc_addr, &msgr->inst.addr, sizeof(msgr->my_enc_addr));
-       ceph_encode_addr(&msgr->my_enc_addr);
-}
-
-/*
- * work queue for all reading and writing to/from the socket.
- */
-struct workqueue_struct *ceph_msgr_wq;
-
-int __init ceph_msgr_init(void)
-{
-       ceph_msgr_wq = create_workqueue("ceph-msgr");
-       if (IS_ERR(ceph_msgr_wq)) {
-               int ret = PTR_ERR(ceph_msgr_wq);
-               pr_err("msgr_init failed to create workqueue: %d\n", ret);
-               ceph_msgr_wq = NULL;
-               return ret;
-       }
-       return 0;
-}
-
-void ceph_msgr_exit(void)
-{
-       destroy_workqueue(ceph_msgr_wq);
-}
-
-void ceph_msgr_flush(void)
-{
-       flush_workqueue(ceph_msgr_wq);
-}
-
-
-/*
- * socket callback functions
- */
-
-/* data available on socket, or listen socket received a connect */
-static void ceph_data_ready(struct sock *sk, int count_unused)
-{
-       struct ceph_connection *con =
-               (struct ceph_connection *)sk->sk_user_data;
-       if (sk->sk_state != TCP_CLOSE_WAIT) {
-               dout("ceph_data_ready on %p state = %lu, queueing work\n",
-                    con, con->state);
-               queue_con(con);
-       }
-}
-
-/* socket has buffer space for writing */
-static void ceph_write_space(struct sock *sk)
-{
-       struct ceph_connection *con =
-               (struct ceph_connection *)sk->sk_user_data;
-
-       /* only queue to workqueue if there is data we want to write. */
-       if (test_bit(WRITE_PENDING, &con->state)) {
-               dout("ceph_write_space %p queueing write work\n", con);
-               queue_con(con);
-       } else {
-               dout("ceph_write_space %p nothing to write\n", con);
-       }
-
-       /* since we have our own write_space, clear the SOCK_NOSPACE flag */
-       clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-}
-
-/* socket's state has changed */
-static void ceph_state_change(struct sock *sk)
-{
-       struct ceph_connection *con =
-               (struct ceph_connection *)sk->sk_user_data;
-
-       dout("ceph_state_change %p state = %lu sk_state = %u\n",
-            con, con->state, sk->sk_state);
-
-       if (test_bit(CLOSED, &con->state))
-               return;
-
-       switch (sk->sk_state) {
-       case TCP_CLOSE:
-               dout("ceph_state_change TCP_CLOSE\n");
-       case TCP_CLOSE_WAIT:
-               dout("ceph_state_change TCP_CLOSE_WAIT\n");
-               if (test_and_set_bit(SOCK_CLOSED, &con->state) == 0) {
-                       if (test_bit(CONNECTING, &con->state))
-                               con->error_msg = "connection failed";
-                       else
-                               con->error_msg = "socket closed";
-                       queue_con(con);
-               }
-               break;
-       case TCP_ESTABLISHED:
-               dout("ceph_state_change TCP_ESTABLISHED\n");
-               queue_con(con);
-               break;
-       }
-}
-
-/*
- * set up socket callbacks
- */
-static void set_sock_callbacks(struct socket *sock,
-                              struct ceph_connection *con)
-{
-       struct sock *sk = sock->sk;
-       sk->sk_user_data = (void *)con;
-       sk->sk_data_ready = ceph_data_ready;
-       sk->sk_write_space = ceph_write_space;
-       sk->sk_state_change = ceph_state_change;
-}
-
-
-/*
- * socket helpers
- */
-
-/*
- * initiate connection to a remote socket.
- */
-static struct socket *ceph_tcp_connect(struct ceph_connection *con)
-{
-       struct sockaddr_storage *paddr = &con->peer_addr.in_addr;
-       struct socket *sock;
-       int ret;
-
-       BUG_ON(con->sock);
-       ret = sock_create_kern(con->peer_addr.in_addr.ss_family, SOCK_STREAM,
-                              IPPROTO_TCP, &sock);
-       if (ret)
-               return ERR_PTR(ret);
-       con->sock = sock;
-       sock->sk->sk_allocation = GFP_NOFS;
-
-#ifdef CONFIG_LOCKDEP
-       lockdep_set_class(&sock->sk->sk_lock, &socket_class);
-#endif
-
-       set_sock_callbacks(sock, con);
-
-       dout("connect %s\n", pr_addr(&con->peer_addr.in_addr));
-
-       ret = sock->ops->connect(sock, (struct sockaddr *)paddr, sizeof(*paddr),
-                                O_NONBLOCK);
-       if (ret == -EINPROGRESS) {
-               dout("connect %s EINPROGRESS sk_state = %u\n",
-                    pr_addr(&con->peer_addr.in_addr),
-                    sock->sk->sk_state);
-               ret = 0;
-       }
-       if (ret < 0) {
-               pr_err("connect %s error %d\n",
-                      pr_addr(&con->peer_addr.in_addr), ret);
-               sock_release(sock);
-               con->sock = NULL;
-               con->error_msg = "connect error";
-       }
-
-       if (ret < 0)
-               return ERR_PTR(ret);
-       return sock;
-}
-
-static int ceph_tcp_recvmsg(struct socket *sock, void *buf, size_t len)
-{
-       struct kvec iov = {buf, len};
-       struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL };
-
-       return kernel_recvmsg(sock, &msg, &iov, 1, len, msg.msg_flags);
-}
-
-/*
- * write something.  @more is true if caller will be sending more data
- * shortly.
- */
-static int ceph_tcp_sendmsg(struct socket *sock, struct kvec *iov,
-                    size_t kvlen, size_t len, int more)
-{
-       struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL };
-
-       if (more)
-               msg.msg_flags |= MSG_MORE;
-       else
-               msg.msg_flags |= MSG_EOR;  /* superfluous, but what the hell */
-
-       return kernel_sendmsg(sock, &msg, iov, kvlen, len);
-}
-
-
-/*
- * Shutdown/close the socket for the given connection.
- */
-static int con_close_socket(struct ceph_connection *con)
-{
-       int rc;
-
-       dout("con_close_socket on %p sock %p\n", con, con->sock);
-       if (!con->sock)
-               return 0;
-       set_bit(SOCK_CLOSED, &con->state);
-       rc = con->sock->ops->shutdown(con->sock, SHUT_RDWR);
-       sock_release(con->sock);
-       con->sock = NULL;
-       clear_bit(SOCK_CLOSED, &con->state);
-       return rc;
-}
-
-/*
- * Reset a connection.  Discard all incoming and outgoing messages
- * and clear *_seq state.
- */
-static void ceph_msg_remove(struct ceph_msg *msg)
-{
-       list_del_init(&msg->list_head);
-       ceph_msg_put(msg);
-}
-static void ceph_msg_remove_list(struct list_head *head)
-{
-       while (!list_empty(head)) {
-               struct ceph_msg *msg = list_first_entry(head, struct ceph_msg,
-                                                       list_head);
-               ceph_msg_remove(msg);
-       }
-}
-
-static void reset_connection(struct ceph_connection *con)
-{
-       /* reset connection, out_queue, msg_ and connect_seq */
-       /* discard existing out_queue and msg_seq */
-       ceph_msg_remove_list(&con->out_queue);
-       ceph_msg_remove_list(&con->out_sent);
-
-       if (con->in_msg) {
-               ceph_msg_put(con->in_msg);
-               con->in_msg = NULL;
-       }
-
-       con->connect_seq = 0;
-       con->out_seq = 0;
-       if (con->out_msg) {
-               ceph_msg_put(con->out_msg);
-               con->out_msg = NULL;
-       }
-       con->out_keepalive_pending = false;
-       con->in_seq = 0;
-       con->in_seq_acked = 0;
-}
-
-/*
- * mark a peer down.  drop any open connections.
- */
-void ceph_con_close(struct ceph_connection *con)
-{
-       dout("con_close %p peer %s\n", con, pr_addr(&con->peer_addr.in_addr));
-       set_bit(CLOSED, &con->state);  /* in case there's queued work */
-       clear_bit(STANDBY, &con->state);  /* avoid connect_seq bump */
-       clear_bit(LOSSYTX, &con->state);  /* so we retry next connect */
-       clear_bit(KEEPALIVE_PENDING, &con->state);
-       clear_bit(WRITE_PENDING, &con->state);
-       mutex_lock(&con->mutex);
-       reset_connection(con);
-       con->peer_global_seq = 0;
-       cancel_delayed_work(&con->work);
-       mutex_unlock(&con->mutex);
-       queue_con(con);
-}
-
-/*
- * Reopen a closed connection, with a new peer address.
- */
-void ceph_con_open(struct ceph_connection *con, struct ceph_entity_addr *addr)
-{
-       dout("con_open %p %s\n", con, pr_addr(&addr->in_addr));
-       set_bit(OPENING, &con->state);
-       clear_bit(CLOSED, &con->state);
-       memcpy(&con->peer_addr, addr, sizeof(*addr));
-       con->delay = 0;      /* reset backoff memory */
-       queue_con(con);
-}
-
-/*
- * return true if this connection ever successfully opened
- */
-bool ceph_con_opened(struct ceph_connection *con)
-{
-       return con->connect_seq > 0;
-}
-
-/*
- * generic get/put
- */
-struct ceph_connection *ceph_con_get(struct ceph_connection *con)
-{
-       dout("con_get %p nref = %d -> %d\n", con,
-            atomic_read(&con->nref), atomic_read(&con->nref) + 1);
-       if (atomic_inc_not_zero(&con->nref))
-               return con;
-       return NULL;
-}
-
-void ceph_con_put(struct ceph_connection *con)
-{
-       dout("con_put %p nref = %d -> %d\n", con,
-            atomic_read(&con->nref), atomic_read(&con->nref) - 1);
-       BUG_ON(atomic_read(&con->nref) == 0);
-       if (atomic_dec_and_test(&con->nref)) {
-               BUG_ON(con->sock);
-               kfree(con);
-       }
-}
-
-/*
- * initialize a new connection.
- */
-void ceph_con_init(struct ceph_messenger *msgr, struct ceph_connection *con)
-{
-       dout("con_init %p\n", con);
-       memset(con, 0, sizeof(*con));
-       atomic_set(&con->nref, 1);
-       con->msgr = msgr;
-       mutex_init(&con->mutex);
-       INIT_LIST_HEAD(&con->out_queue);
-       INIT_LIST_HEAD(&con->out_sent);
-       INIT_DELAYED_WORK(&con->work, con_work);
-}
-
-
-/*
- * We maintain a global counter to order connection attempts.  Get
- * a unique seq greater than @gt.
- */
-static u32 get_global_seq(struct ceph_messenger *msgr, u32 gt)
-{
-       u32 ret;
-
-       spin_lock(&msgr->global_seq_lock);
-       if (msgr->global_seq < gt)
-               msgr->global_seq = gt;
-       ret = ++msgr->global_seq;
-       spin_unlock(&msgr->global_seq_lock);
-       return ret;
-}
-
-
-/*
- * Prepare footer for currently outgoing message, and finish things
- * off.  Assumes out_kvec* are already valid.. we just add on to the end.
- */
-static void prepare_write_message_footer(struct ceph_connection *con, int v)
-{
-       struct ceph_msg *m = con->out_msg;
-
-       dout("prepare_write_message_footer %p\n", con);
-       con->out_kvec_is_msg = true;
-       con->out_kvec[v].iov_base = &m->footer;
-       con->out_kvec[v].iov_len = sizeof(m->footer);
-       con->out_kvec_bytes += sizeof(m->footer);
-       con->out_kvec_left++;
-       con->out_more = m->more_to_follow;
-       con->out_msg_done = true;
-}
-
-/*
- * Prepare headers for the next outgoing message.
- */
-static void prepare_write_message(struct ceph_connection *con)
-{
-       struct ceph_msg *m;
-       int v = 0;
-
-       con->out_kvec_bytes = 0;
-       con->out_kvec_is_msg = true;
-       con->out_msg_done = false;
-
-       /* Sneak an ack in there first?  If we can get it into the same
-        * TCP packet that's a good thing. */
-       if (con->in_seq > con->in_seq_acked) {
-               con->in_seq_acked = con->in_seq;
-               con->out_kvec[v].iov_base = &tag_ack;
-               con->out_kvec[v++].iov_len = 1;
-               con->out_temp_ack = cpu_to_le64(con->in_seq_acked);
-               con->out_kvec[v].iov_base = &con->out_temp_ack;
-               con->out_kvec[v++].iov_len = sizeof(con->out_temp_ack);
-               con->out_kvec_bytes = 1 + sizeof(con->out_temp_ack);
-       }
-
-       m = list_first_entry(&con->out_queue,
-                      struct ceph_msg, list_head);
-       con->out_msg = m;
-       if (test_bit(LOSSYTX, &con->state)) {
-               list_del_init(&m->list_head);
-       } else {
-               /* put message on sent list */
-               ceph_msg_get(m);
-               list_move_tail(&m->list_head, &con->out_sent);
-       }
-
-       /*
-        * only assign outgoing seq # if we haven't sent this message
-        * yet.  if it is requeued, resend with it's original seq.
-        */
-       if (m->needs_out_seq) {
-               m->hdr.seq = cpu_to_le64(++con->out_seq);
-               m->needs_out_seq = false;
-       }
-
-       dout("prepare_write_message %p seq %lld type %d len %d+%d+%d %d pgs\n",
-            m, con->out_seq, le16_to_cpu(m->hdr.type),
-            le32_to_cpu(m->hdr.front_len), le32_to_cpu(m->hdr.middle_len),
-            le32_to_cpu(m->hdr.data_len),
-            m->nr_pages);
-       BUG_ON(le32_to_cpu(m->hdr.front_len) != m->front.iov_len);
-
-       /* tag + hdr + front + middle */
-       con->out_kvec[v].iov_base = &tag_msg;
-       con->out_kvec[v++].iov_len = 1;
-       con->out_kvec[v].iov_base = &m->hdr;
-       con->out_kvec[v++].iov_len = sizeof(m->hdr);
-       con->out_kvec[v++] = m->front;
-       if (m->middle)
-               con->out_kvec[v++] = m->middle->vec;
-       con->out_kvec_left = v;
-       con->out_kvec_bytes += 1 + sizeof(m->hdr) + m->front.iov_len +
-               (m->middle ? m->middle->vec.iov_len : 0);
-       con->out_kvec_cur = con->out_kvec;
-
-       /* fill in crc (except data pages), footer */
-       con->out_msg->hdr.crc =
-               cpu_to_le32(crc32c(0, (void *)&m->hdr,
-                                     sizeof(m->hdr) - sizeof(m->hdr.crc)));
-       con->out_msg->footer.flags = CEPH_MSG_FOOTER_COMPLETE;
-       con->out_msg->footer.front_crc =
-               cpu_to_le32(crc32c(0, m->front.iov_base, m->front.iov_len));
-       if (m->middle)
-               con->out_msg->footer.middle_crc =
-                       cpu_to_le32(crc32c(0, m->middle->vec.iov_base,
-                                          m->middle->vec.iov_len));
-       else
-               con->out_msg->footer.middle_crc = 0;
-       con->out_msg->footer.data_crc = 0;
-       dout("prepare_write_message front_crc %u data_crc %u\n",
-            le32_to_cpu(con->out_msg->footer.front_crc),
-            le32_to_cpu(con->out_msg->footer.middle_crc));
-
-       /* is there a data payload? */
-       if (le32_to_cpu(m->hdr.data_len) > 0) {
-               /* initialize page iterator */
-               con->out_msg_pos.page = 0;
-               con->out_msg_pos.page_pos =
-                       le16_to_cpu(m->hdr.data_off) & ~PAGE_MASK;
-               con->out_msg_pos.data_pos = 0;
-               con->out_msg_pos.did_page_crc = 0;
-               con->out_more = 1;  /* data + footer will follow */
-       } else {
-               /* no, queue up footer too and be done */
-               prepare_write_message_footer(con, v);
-       }
-
-       set_bit(WRITE_PENDING, &con->state);
-}
-
-/*
- * Prepare an ack.
- */
-static void prepare_write_ack(struct ceph_connection *con)
-{
-       dout("prepare_write_ack %p %llu -> %llu\n", con,
-            con->in_seq_acked, con->in_seq);
-       con->in_seq_acked = con->in_seq;
-
-       con->out_kvec[0].iov_base = &tag_ack;
-       con->out_kvec[0].iov_len = 1;
-       con->out_temp_ack = cpu_to_le64(con->in_seq_acked);
-       con->out_kvec[1].iov_base = &con->out_temp_ack;
-       con->out_kvec[1].iov_len = sizeof(con->out_temp_ack);
-       con->out_kvec_left = 2;
-       con->out_kvec_bytes = 1 + sizeof(con->out_temp_ack);
-       con->out_kvec_cur = con->out_kvec;
-       con->out_more = 1;  /* more will follow.. eventually.. */
-       set_bit(WRITE_PENDING, &con->state);
-}
-
-/*
- * Prepare to write keepalive byte.
- */
-static void prepare_write_keepalive(struct ceph_connection *con)
-{
-       dout("prepare_write_keepalive %p\n", con);
-       con->out_kvec[0].iov_base = &tag_keepalive;
-       con->out_kvec[0].iov_len = 1;
-       con->out_kvec_left = 1;
-       con->out_kvec_bytes = 1;
-       con->out_kvec_cur = con->out_kvec;
-       set_bit(WRITE_PENDING, &con->state);
-}
-
-/*
- * Connection negotiation.
- */
-
-static void prepare_connect_authorizer(struct ceph_connection *con)
-{
-       void *auth_buf;
-       int auth_len = 0;
-       int auth_protocol = 0;
-
-       mutex_unlock(&con->mutex);
-       if (con->ops->get_authorizer)
-               con->ops->get_authorizer(con, &auth_buf, &auth_len,
-                                        &auth_protocol, &con->auth_reply_buf,
-                                        &con->auth_reply_buf_len,
-                                        con->auth_retry);
-       mutex_lock(&con->mutex);
-
-       con->out_connect.authorizer_protocol = cpu_to_le32(auth_protocol);
-       con->out_connect.authorizer_len = cpu_to_le32(auth_len);
-
-       con->out_kvec[con->out_kvec_left].iov_base = auth_buf;
-       con->out_kvec[con->out_kvec_left].iov_len = auth_len;
-       con->out_kvec_left++;
-       con->out_kvec_bytes += auth_len;
-}
-
-/*
- * We connected to a peer and are saying hello.
- */
-static void prepare_write_banner(struct ceph_messenger *msgr,
-                                struct ceph_connection *con)
-{
-       int len = strlen(CEPH_BANNER);
-
-       con->out_kvec[0].iov_base = CEPH_BANNER;
-       con->out_kvec[0].iov_len = len;
-       con->out_kvec[1].iov_base = &msgr->my_enc_addr;
-       con->out_kvec[1].iov_len = sizeof(msgr->my_enc_addr);
-       con->out_kvec_left = 2;
-       con->out_kvec_bytes = len + sizeof(msgr->my_enc_addr);
-       con->out_kvec_cur = con->out_kvec;
-       con->out_more = 0;
-       set_bit(WRITE_PENDING, &con->state);
-}
-
-static void prepare_write_connect(struct ceph_messenger *msgr,
-                                 struct ceph_connection *con,
-                                 int after_banner)
-{
-       unsigned global_seq = get_global_seq(con->msgr, 0);
-       int proto;
-
-       switch (con->peer_name.type) {
-       case CEPH_ENTITY_TYPE_MON:
-               proto = CEPH_MONC_PROTOCOL;
-               break;
-       case CEPH_ENTITY_TYPE_OSD:
-               proto = CEPH_OSDC_PROTOCOL;
-               break;
-       case CEPH_ENTITY_TYPE_MDS:
-               proto = CEPH_MDSC_PROTOCOL;
-               break;
-       default:
-               BUG();
-       }
-
-       dout("prepare_write_connect %p cseq=%d gseq=%d proto=%d\n", con,
-            con->connect_seq, global_seq, proto);
-
-       con->out_connect.features = cpu_to_le64(CEPH_FEATURE_SUPPORTED);
-       con->out_connect.host_type = cpu_to_le32(CEPH_ENTITY_TYPE_CLIENT);
-       con->out_connect.connect_seq = cpu_to_le32(con->connect_seq);
-       con->out_connect.global_seq = cpu_to_le32(global_seq);
-       con->out_connect.protocol_version = cpu_to_le32(proto);
-       con->out_connect.flags = 0;
-
-       if (!after_banner) {
-               con->out_kvec_left = 0;
-               con->out_kvec_bytes = 0;
-       }
-       con->out_kvec[con->out_kvec_left].iov_base = &con->out_connect;
-       con->out_kvec[con->out_kvec_left].iov_len = sizeof(con->out_connect);
-       con->out_kvec_left++;
-       con->out_kvec_bytes += sizeof(con->out_connect);
-       con->out_kvec_cur = con->out_kvec;
-       con->out_more = 0;
-       set_bit(WRITE_PENDING, &con->state);
-
-       prepare_connect_authorizer(con);
-}
-
-
-/*
- * write as much of pending kvecs to the socket as we can.
- *  1 -> done
- *  0 -> socket full, but more to do
- * <0 -> error
- */
-static int write_partial_kvec(struct ceph_connection *con)
-{
-       int ret;
-
-       dout("write_partial_kvec %p %d left\n", con, con->out_kvec_bytes);
-       while (con->out_kvec_bytes > 0) {
-               ret = ceph_tcp_sendmsg(con->sock, con->out_kvec_cur,
-                                      con->out_kvec_left, con->out_kvec_bytes,
-                                      con->out_more);
-               if (ret <= 0)
-                       goto out;
-               con->out_kvec_bytes -= ret;
-               if (con->out_kvec_bytes == 0)
-                       break;            /* done */
-               while (ret > 0) {
-                       if (ret >= con->out_kvec_cur->iov_len) {
-                               ret -= con->out_kvec_cur->iov_len;
-                               con->out_kvec_cur++;
-                               con->out_kvec_left--;
-                       } else {
-                               con->out_kvec_cur->iov_len -= ret;
-                               con->out_kvec_cur->iov_base += ret;
-                               ret = 0;
-                               break;
-                       }
-               }
-       }
-       con->out_kvec_left = 0;
-       con->out_kvec_is_msg = false;
-       ret = 1;
-out:
-       dout("write_partial_kvec %p %d left in %d kvecs ret = %d\n", con,
-            con->out_kvec_bytes, con->out_kvec_left, ret);
-       return ret;  /* done! */
-}
-
-/*
- * Write as much message data payload as we can.  If we finish, queue
- * up the footer.
- *  1 -> done, footer is now queued in out_kvec[].
- *  0 -> socket full, but more to do
- * <0 -> error
- */
-static int write_partial_msg_pages(struct ceph_connection *con)
-{
-       struct ceph_msg *msg = con->out_msg;
-       unsigned data_len = le32_to_cpu(msg->hdr.data_len);
-       size_t len;
-       int crc = con->msgr->nocrc;
-       int ret;
-
-       dout("write_partial_msg_pages %p msg %p page %d/%d offset %d\n",
-            con, con->out_msg, con->out_msg_pos.page, con->out_msg->nr_pages,
-            con->out_msg_pos.page_pos);
-
-       while (con->out_msg_pos.page < con->out_msg->nr_pages) {
-               struct page *page = NULL;
-               void *kaddr = NULL;
-
-               /*
-                * if we are calculating the data crc (the default), we need
-                * to map the page.  if our pages[] has been revoked, use the
-                * zero page.
-                */
-               if (msg->pages) {
-                       page = msg->pages[con->out_msg_pos.page];
-                       if (crc)
-                               kaddr = kmap(page);
-               } else if (msg->pagelist) {
-                       page = list_first_entry(&msg->pagelist->head,
-                                               struct page, lru);
-                       if (crc)
-                               kaddr = kmap(page);
-               } else {
-                       page = con->msgr->zero_page;
-                       if (crc)
-                               kaddr = page_address(con->msgr->zero_page);
-               }
-               len = min((int)(PAGE_SIZE - con->out_msg_pos.page_pos),
-                         (int)(data_len - con->out_msg_pos.data_pos));
-               if (crc && !con->out_msg_pos.did_page_crc) {
-                       void *base = kaddr + con->out_msg_pos.page_pos;
-                       u32 tmpcrc = le32_to_cpu(con->out_msg->footer.data_crc);
-
-                       BUG_ON(kaddr == NULL);
-                       con->out_msg->footer.data_crc =
-                               cpu_to_le32(crc32c(tmpcrc, base, len));
-                       con->out_msg_pos.did_page_crc = 1;
-               }
-
-               ret = kernel_sendpage(con->sock, page,
-                                     con->out_msg_pos.page_pos, len,
-                                     MSG_DONTWAIT | MSG_NOSIGNAL |
-                                     MSG_MORE);
-
-               if (crc && (msg->pages || msg->pagelist))
-                       kunmap(page);
-
-               if (ret <= 0)
-                       goto out;
-
-               con->out_msg_pos.data_pos += ret;
-               con->out_msg_pos.page_pos += ret;
-               if (ret == len) {
-                       con->out_msg_pos.page_pos = 0;
-                       con->out_msg_pos.page++;
-                       con->out_msg_pos.did_page_crc = 0;
-                       if (msg->pagelist)
-                               list_move_tail(&page->lru,
-                                              &msg->pagelist->head);
-               }
-       }
-
-       dout("write_partial_msg_pages %p msg %p done\n", con, msg);
-
-       /* prepare and queue up footer, too */
-       if (!crc)
-               con->out_msg->footer.flags |= CEPH_MSG_FOOTER_NOCRC;
-       con->out_kvec_bytes = 0;
-       con->out_kvec_left = 0;
-       con->out_kvec_cur = con->out_kvec;
-       prepare_write_message_footer(con, 0);
-       ret = 1;
-out:
-       return ret;
-}
-
-/*
- * write some zeros
- */
-static int write_partial_skip(struct ceph_connection *con)
-{
-       int ret;
-
-       while (con->out_skip > 0) {
-               struct kvec iov = {
-                       .iov_base = page_address(con->msgr->zero_page),
-                       .iov_len = min(con->out_skip, (int)PAGE_CACHE_SIZE)
-               };
-
-               ret = ceph_tcp_sendmsg(con->sock, &iov, 1, iov.iov_len, 1);
-               if (ret <= 0)
-                       goto out;
-               con->out_skip -= ret;
-       }
-       ret = 1;
-out:
-       return ret;
-}
-
-/*
- * Prepare to read connection handshake, or an ack.
- */
-static void prepare_read_banner(struct ceph_connection *con)
-{
-       dout("prepare_read_banner %p\n", con);
-       con->in_base_pos = 0;
-}
-
-static void prepare_read_connect(struct ceph_connection *con)
-{
-       dout("prepare_read_connect %p\n", con);
-       con->in_base_pos = 0;
-}
-
-static void prepare_read_ack(struct ceph_connection *con)
-{
-       dout("prepare_read_ack %p\n", con);
-       con->in_base_pos = 0;
-}
-
-static void prepare_read_tag(struct ceph_connection *con)
-{
-       dout("prepare_read_tag %p\n", con);
-       con->in_base_pos = 0;
-       con->in_tag = CEPH_MSGR_TAG_READY;
-}
-
-/*
- * Prepare to read a message.
- */
-static int prepare_read_message(struct ceph_connection *con)
-{
-       dout("prepare_read_message %p\n", con);
-       BUG_ON(con->in_msg != NULL);
-       con->in_base_pos = 0;
-       con->in_front_crc = con->in_middle_crc = con->in_data_crc = 0;
-       return 0;
-}
-
-
-static int read_partial(struct ceph_connection *con,
-                       int *to, int size, void *object)
-{
-       *to += size;
-       while (con->in_base_pos < *to) {
-               int left = *to - con->in_base_pos;
-               int have = size - left;
-               int ret = ceph_tcp_recvmsg(con->sock, object + have, left);
-               if (ret <= 0)
-                       return ret;
-               con->in_base_pos += ret;
-       }
-       return 1;
-}
-
-
-/*
- * Read all or part of the connect-side handshake on a new connection
- */
-static int read_partial_banner(struct ceph_connection *con)
-{
-       int ret, to = 0;
-
-       dout("read_partial_banner %p at %d\n", con, con->in_base_pos);
-
-       /* peer's banner */
-       ret = read_partial(con, &to, strlen(CEPH_BANNER), con->in_banner);
-       if (ret <= 0)
-               goto out;
-       ret = read_partial(con, &to, sizeof(con->actual_peer_addr),
-                          &con->actual_peer_addr);
-       if (ret <= 0)
-               goto out;
-       ret = read_partial(con, &to, sizeof(con->peer_addr_for_me),
-                          &con->peer_addr_for_me);
-       if (ret <= 0)
-               goto out;
-out:
-       return ret;
-}
-
-static int read_partial_connect(struct ceph_connection *con)
-{
-       int ret, to = 0;
-
-       dout("read_partial_connect %p at %d\n", con, con->in_base_pos);
-
-       ret = read_partial(con, &to, sizeof(con->in_reply), &con->in_reply);
-       if (ret <= 0)
-               goto out;
-       ret = read_partial(con, &to, le32_to_cpu(con->in_reply.authorizer_len),
-                          con->auth_reply_buf);
-       if (ret <= 0)
-               goto out;
-
-       dout("read_partial_connect %p tag %d, con_seq = %u, g_seq = %u\n",
-            con, (int)con->in_reply.tag,
-            le32_to_cpu(con->in_reply.connect_seq),
-            le32_to_cpu(con->in_reply.global_seq));
-out:
-       return ret;
-
-}
-
-/*
- * Verify the hello banner looks okay.
- */
-static int verify_hello(struct ceph_connection *con)
-{
-       if (memcmp(con->in_banner, CEPH_BANNER, strlen(CEPH_BANNER))) {
-               pr_err("connect to %s got bad banner\n",
-                      pr_addr(&con->peer_addr.in_addr));
-               con->error_msg = "protocol error, bad banner";
-               return -1;
-       }
-       return 0;
-}
-
-static bool addr_is_blank(struct sockaddr_storage *ss)
-{
-       switch (ss->ss_family) {
-       case AF_INET:
-               return ((struct sockaddr_in *)ss)->sin_addr.s_addr == 0;
-       case AF_INET6:
-               return
-                    ((struct sockaddr_in6 *)ss)->sin6_addr.s6_addr32[0] == 0 &&
-                    ((struct sockaddr_in6 *)ss)->sin6_addr.s6_addr32[1] == 0 &&
-                    ((struct sockaddr_in6 *)ss)->sin6_addr.s6_addr32[2] == 0 &&
-                    ((struct sockaddr_in6 *)ss)->sin6_addr.s6_addr32[3] == 0;
-       }
-       return false;
-}
-
-static int addr_port(struct sockaddr_storage *ss)
-{
-       switch (ss->ss_family) {
-       case AF_INET:
-               return ntohs(((struct sockaddr_in *)ss)->sin_port);
-       case AF_INET6:
-               return ntohs(((struct sockaddr_in6 *)ss)->sin6_port);
-       }
-       return 0;
-}
-
-static void addr_set_port(struct sockaddr_storage *ss, int p)
-{
-       switch (ss->ss_family) {
-       case AF_INET:
-               ((struct sockaddr_in *)ss)->sin_port = htons(p);
-       case AF_INET6:
-               ((struct sockaddr_in6 *)ss)->sin6_port = htons(p);
-       }
-}
-
-/*
- * Parse an ip[:port] list into an addr array.  Use the default
- * monitor port if a port isn't specified.
- */
-int ceph_parse_ips(const char *c, const char *end,
-                  struct ceph_entity_addr *addr,
-                  int max_count, int *count)
-{
-       int i;
-       const char *p = c;
-
-       dout("parse_ips on '%.*s'\n", (int)(end-c), c);
-       for (i = 0; i < max_count; i++) {
-               const char *ipend;
-               struct sockaddr_storage *ss = &addr[i].in_addr;
-               struct sockaddr_in *in4 = (void *)ss;
-               struct sockaddr_in6 *in6 = (void *)ss;
-               int port;
-               char delim = ',';
-
-               if (*p == '[') {
-                       delim = ']';
-                       p++;
-               }
-
-               memset(ss, 0, sizeof(*ss));
-               if (in4_pton(p, end - p, (u8 *)&in4->sin_addr.s_addr,
-                            delim, &ipend))
-                       ss->ss_family = AF_INET;
-               else if (in6_pton(p, end - p, (u8 *)&in6->sin6_addr.s6_addr,
-                                 delim, &ipend))
-                       ss->ss_family = AF_INET6;
-               else
-                       goto bad;
-               p = ipend;
-
-               if (delim == ']') {
-                       if (*p != ']') {
-                               dout("missing matching ']'\n");
-                               goto bad;
-                       }
-                       p++;
-               }
-
-               /* port? */
-               if (p < end && *p == ':') {
-                       port = 0;
-                       p++;
-                       while (p < end && *p >= '0' && *p <= '9') {
-                               port = (port * 10) + (*p - '0');
-                               p++;
-                       }
-                       if (port > 65535 || port == 0)
-                               goto bad;
-               } else {
-                       port = CEPH_MON_PORT;
-               }
-
-               addr_set_port(ss, port);
-
-               dout("parse_ips got %s\n", pr_addr(ss));
-
-               if (p == end)
-                       break;
-               if (*p != ',')
-                       goto bad;
-               p++;
-       }
-
-       if (p != end)
-               goto bad;
-
-       if (count)
-               *count = i + 1;
-       return 0;
-
-bad:
-       pr_err("parse_ips bad ip '%.*s'\n", (int)(end - c), c);
-       return -EINVAL;
-}
-
-static int process_banner(struct ceph_connection *con)
-{
-       dout("process_banner on %p\n", con);
-
-       if (verify_hello(con) < 0)
-               return -1;
-
-       ceph_decode_addr(&con->actual_peer_addr);
-       ceph_decode_addr(&con->peer_addr_for_me);
-
-       /*
-        * Make sure the other end is who we wanted.  note that the other
-        * end may not yet know their ip address, so if it's 0.0.0.0, give
-        * them the benefit of the doubt.
-        */
-       if (memcmp(&con->peer_addr, &con->actual_peer_addr,
-                  sizeof(con->peer_addr)) != 0 &&
-           !(addr_is_blank(&con->actual_peer_addr.in_addr) &&
-             con->actual_peer_addr.nonce == con->peer_addr.nonce)) {
-               pr_warning("wrong peer, want %s/%d, got %s/%d\n",
-                          pr_addr(&con->peer_addr.in_addr),
-                          (int)le32_to_cpu(con->peer_addr.nonce),
-                          pr_addr(&con->actual_peer_addr.in_addr),
-                          (int)le32_to_cpu(con->actual_peer_addr.nonce));
-               con->error_msg = "wrong peer at address";
-               return -1;
-       }
-
-       /*
-        * did we learn our address?
-        */
-       if (addr_is_blank(&con->msgr->inst.addr.in_addr)) {
-               int port = addr_port(&con->msgr->inst.addr.in_addr);
-
-               memcpy(&con->msgr->inst.addr.in_addr,
-                      &con->peer_addr_for_me.in_addr,
-                      sizeof(con->peer_addr_for_me.in_addr));
-               addr_set_port(&con->msgr->inst.addr.in_addr, port);
-               encode_my_addr(con->msgr);
-               dout("process_banner learned my addr is %s\n",
-                    pr_addr(&con->msgr->inst.addr.in_addr));
-       }
-
-       set_bit(NEGOTIATING, &con->state);
-       prepare_read_connect(con);
-       return 0;
-}
-
-static void fail_protocol(struct ceph_connection *con)
-{
-       reset_connection(con);
-       set_bit(CLOSED, &con->state);  /* in case there's queued work */
-
-       mutex_unlock(&con->mutex);
-       if (con->ops->bad_proto)
-               con->ops->bad_proto(con);
-       mutex_lock(&con->mutex);
-}
-
-static int process_connect(struct ceph_connection *con)
-{
-       u64 sup_feat = CEPH_FEATURE_SUPPORTED;
-       u64 req_feat = CEPH_FEATURE_REQUIRED;
-       u64 server_feat = le64_to_cpu(con->in_reply.features);
-
-       dout("process_connect on %p tag %d\n", con, (int)con->in_tag);
-
-       switch (con->in_reply.tag) {
-       case CEPH_MSGR_TAG_FEATURES:
-               pr_err("%s%lld %s feature set mismatch,"
-                      " my %llx < server's %llx, missing %llx\n",
-                      ENTITY_NAME(con->peer_name),
-                      pr_addr(&con->peer_addr.in_addr),
-                      sup_feat, server_feat, server_feat & ~sup_feat);
-               con->error_msg = "missing required protocol features";
-               fail_protocol(con);
-               return -1;
-
-       case CEPH_MSGR_TAG_BADPROTOVER:
-               pr_err("%s%lld %s protocol version mismatch,"
-                      " my %d != server's %d\n",
-                      ENTITY_NAME(con->peer_name),
-                      pr_addr(&con->peer_addr.in_addr),
-                      le32_to_cpu(con->out_connect.protocol_version),
-                      le32_to_cpu(con->in_reply.protocol_version));
-               con->error_msg = "protocol version mismatch";
-               fail_protocol(con);
-               return -1;
-
-       case CEPH_MSGR_TAG_BADAUTHORIZER:
-               con->auth_retry++;
-               dout("process_connect %p got BADAUTHORIZER attempt %d\n", con,
-                    con->auth_retry);
-               if (con->auth_retry == 2) {
-                       con->error_msg = "connect authorization failure";
-                       reset_connection(con);
-                       set_bit(CLOSED, &con->state);
-                       return -1;
-               }
-               con->auth_retry = 1;
-               prepare_write_connect(con->msgr, con, 0);
-               prepare_read_connect(con);
-               break;
-
-       case CEPH_MSGR_TAG_RESETSESSION:
-               /*
-                * If we connected with a large connect_seq but the peer
-                * has no record of a session with us (no connection, or
-                * connect_seq == 0), they will send RESETSESION to indicate
-                * that they must have reset their session, and may have
-                * dropped messages.
-                */
-               dout("process_connect got RESET peer seq %u\n",
-                    le32_to_cpu(con->in_connect.connect_seq));
-               pr_err("%s%lld %s connection reset\n",
-                      ENTITY_NAME(con->peer_name),
-                      pr_addr(&con->peer_addr.in_addr));
-               reset_connection(con);
-               prepare_write_connect(con->msgr, con, 0);
-               prepare_read_connect(con);
-
-               /* Tell ceph about it. */
-               mutex_unlock(&con->mutex);
-               pr_info("reset on %s%lld\n", ENTITY_NAME(con->peer_name));
-               if (con->ops->peer_reset)
-                       con->ops->peer_reset(con);
-               mutex_lock(&con->mutex);
-               break;
-
-       case CEPH_MSGR_TAG_RETRY_SESSION:
-               /*
-                * If we sent a smaller connect_seq than the peer has, try
-                * again with a larger value.
-                */
-               dout("process_connect got RETRY my seq = %u, peer_seq = %u\n",
-                    le32_to_cpu(con->out_connect.connect_seq),
-                    le32_to_cpu(con->in_connect.connect_seq));
-               con->connect_seq = le32_to_cpu(con->in_connect.connect_seq);
-               prepare_write_connect(con->msgr, con, 0);
-               prepare_read_connect(con);
-               break;
-
-       case CEPH_MSGR_TAG_RETRY_GLOBAL:
-               /*
-                * If we sent a smaller global_seq than the peer has, try
-                * again with a larger value.
-                */
-               dout("process_connect got RETRY_GLOBAL my %u peer_gseq %u\n",
-                    con->peer_global_seq,
-                    le32_to_cpu(con->in_connect.global_seq));
-               get_global_seq(con->msgr,
-                              le32_to_cpu(con->in_connect.global_seq));
-               prepare_write_connect(con->msgr, con, 0);
-               prepare_read_connect(con);
-               break;
-
-       case CEPH_MSGR_TAG_READY:
-               if (req_feat & ~server_feat) {
-                       pr_err("%s%lld %s protocol feature mismatch,"
-                              " my required %llx > server's %llx, need %llx\n",
-                              ENTITY_NAME(con->peer_name),
-                              pr_addr(&con->peer_addr.in_addr),
-                              req_feat, server_feat, req_feat & ~server_feat);
-                       con->error_msg = "missing required protocol features";
-                       fail_protocol(con);
-                       return -1;
-               }
-               clear_bit(CONNECTING, &con->state);
-               con->peer_global_seq = le32_to_cpu(con->in_reply.global_seq);
-               con->connect_seq++;
-               con->peer_features = server_feat;
-               dout("process_connect got READY gseq %d cseq %d (%d)\n",
-                    con->peer_global_seq,
-                    le32_to_cpu(con->in_reply.connect_seq),
-                    con->connect_seq);
-               WARN_ON(con->connect_seq !=
-                       le32_to_cpu(con->in_reply.connect_seq));
-
-               if (con->in_reply.flags & CEPH_MSG_CONNECT_LOSSY)
-                       set_bit(LOSSYTX, &con->state);
-
-               prepare_read_tag(con);
-               break;
-
-       case CEPH_MSGR_TAG_WAIT:
-               /*
-                * If there is a connection race (we are opening
-                * connections to each other), one of us may just have
-                * to WAIT.  This shouldn't happen if we are the
-                * client.
-                */
-               pr_err("process_connect peer connecting WAIT\n");
-
-       default:
-               pr_err("connect protocol error, will retry\n");
-               con->error_msg = "protocol error, garbage tag during connect";
-               return -1;
-       }
-       return 0;
-}
-
-
-/*
- * read (part of) an ack
- */
-static int read_partial_ack(struct ceph_connection *con)
-{
-       int to = 0;
-
-       return read_partial(con, &to, sizeof(con->in_temp_ack),
-                           &con->in_temp_ack);
-}
-
-
-/*
- * We can finally discard anything that's been acked.
- */
-static void process_ack(struct ceph_connection *con)
-{
-       struct ceph_msg *m;
-       u64 ack = le64_to_cpu(con->in_temp_ack);
-       u64 seq;
-
-       while (!list_empty(&con->out_sent)) {
-               m = list_first_entry(&con->out_sent, struct ceph_msg,
-                                    list_head);
-               seq = le64_to_cpu(m->hdr.seq);
-               if (seq > ack)
-                       break;
-               dout("got ack for seq %llu type %d at %p\n", seq,
-                    le16_to_cpu(m->hdr.type), m);
-               ceph_msg_remove(m);
-       }
-       prepare_read_tag(con);
-}
-
-
-
-
-static int read_partial_message_section(struct ceph_connection *con,
-                                       struct kvec *section,
-                                       unsigned int sec_len, u32 *crc)
-{
-       int left;
-       int ret;
-
-       BUG_ON(!section);
-
-       while (section->iov_len < sec_len) {
-               BUG_ON(section->iov_base == NULL);
-               left = sec_len - section->iov_len;
-               ret = ceph_tcp_recvmsg(con->sock, (char *)section->iov_base +
-                                      section->iov_len, left);
-               if (ret <= 0)
-                       return ret;
-               section->iov_len += ret;
-               if (section->iov_len == sec_len)
-                       *crc = crc32c(0, section->iov_base,
-                                     section->iov_len);
-       }
-
-       return 1;
-}
-
-static struct ceph_msg *ceph_alloc_msg(struct ceph_connection *con,
-                               struct ceph_msg_header *hdr,
-                               int *skip);
-/*
- * read (part of) a message.
- */
-static int read_partial_message(struct ceph_connection *con)
-{
-       struct ceph_msg *m = con->in_msg;
-       void *p;
-       int ret;
-       int to, left;
-       unsigned front_len, middle_len, data_len, data_off;
-       int datacrc = con->msgr->nocrc;
-       int skip;
-       u64 seq;
-
-       dout("read_partial_message con %p msg %p\n", con, m);
-
-       /* header */
-       while (con->in_base_pos < sizeof(con->in_hdr)) {
-               left = sizeof(con->in_hdr) - con->in_base_pos;
-               ret = ceph_tcp_recvmsg(con->sock,
-                                      (char *)&con->in_hdr + con->in_base_pos,
-                                      left);
-               if (ret <= 0)
-                       return ret;
-               con->in_base_pos += ret;
-               if (con->in_base_pos == sizeof(con->in_hdr)) {
-                       u32 crc = crc32c(0, (void *)&con->in_hdr,
-                                sizeof(con->in_hdr) - sizeof(con->in_hdr.crc));
-                       if (crc != le32_to_cpu(con->in_hdr.crc)) {
-                               pr_err("read_partial_message bad hdr "
-                                      " crc %u != expected %u\n",
-                                      crc, con->in_hdr.crc);
-                               return -EBADMSG;
-                       }
-               }
-       }
-       front_len = le32_to_cpu(con->in_hdr.front_len);
-       if (front_len > CEPH_MSG_MAX_FRONT_LEN)
-               return -EIO;
-       middle_len = le32_to_cpu(con->in_hdr.middle_len);
-       if (middle_len > CEPH_MSG_MAX_DATA_LEN)
-               return -EIO;
-       data_len = le32_to_cpu(con->in_hdr.data_len);
-       if (data_len > CEPH_MSG_MAX_DATA_LEN)
-               return -EIO;
-       data_off = le16_to_cpu(con->in_hdr.data_off);
-
-       /* verify seq# */
-       seq = le64_to_cpu(con->in_hdr.seq);
-       if ((s64)seq - (s64)con->in_seq < 1) {
-               pr_info("skipping %s%lld %s seq %lld, expected %lld\n",
-                       ENTITY_NAME(con->peer_name),
-                       pr_addr(&con->peer_addr.in_addr),
-                       seq, con->in_seq + 1);
-               con->in_base_pos = -front_len - middle_len - data_len -
-                       sizeof(m->footer);
-               con->in_tag = CEPH_MSGR_TAG_READY;
-               con->in_seq++;
-               return 0;
-       } else if ((s64)seq - (s64)con->in_seq > 1) {
-               pr_err("read_partial_message bad seq %lld expected %lld\n",
-                      seq, con->in_seq + 1);
-               con->error_msg = "bad message sequence # for incoming message";
-               return -EBADMSG;
-       }
-
-       /* allocate message? */
-       if (!con->in_msg) {
-               dout("got hdr type %d front %d data %d\n", con->in_hdr.type,
-                    con->in_hdr.front_len, con->in_hdr.data_len);
-               skip = 0;
-               con->in_msg = ceph_alloc_msg(con, &con->in_hdr, &skip);
-               if (skip) {
-                       /* skip this message */
-                       dout("alloc_msg said skip message\n");
-                       BUG_ON(con->in_msg);
-                       con->in_base_pos = -front_len - middle_len - data_len -
-                               sizeof(m->footer);
-                       con->in_tag = CEPH_MSGR_TAG_READY;
-                       con->in_seq++;
-                       return 0;
-               }
-               if (!con->in_msg) {
-                       con->error_msg =
-                               "error allocating memory for incoming message";
-                       return -ENOMEM;
-               }
-               m = con->in_msg;
-               m->front.iov_len = 0;    /* haven't read it yet */
-               if (m->middle)
-                       m->middle->vec.iov_len = 0;
-
-               con->in_msg_pos.page = 0;
-               con->in_msg_pos.page_pos = data_off & ~PAGE_MASK;
-               con->in_msg_pos.data_pos = 0;
-       }
-
-       /* front */
-       ret = read_partial_message_section(con, &m->front, front_len,
-                                          &con->in_front_crc);
-       if (ret <= 0)
-               return ret;
-
-       /* middle */
-       if (m->middle) {
-               ret = read_partial_message_section(con, &m->middle->vec,
-                                                  middle_len,
-                                                  &con->in_middle_crc);
-               if (ret <= 0)
-                       return ret;
-       }
-
-       /* (page) data */
-       while (con->in_msg_pos.data_pos < data_len) {
-               left = min((int)(data_len - con->in_msg_pos.data_pos),
-                          (int)(PAGE_SIZE - con->in_msg_pos.page_pos));
-               BUG_ON(m->pages == NULL);
-               p = kmap(m->pages[con->in_msg_pos.page]);
-               ret = ceph_tcp_recvmsg(con->sock, p + con->in_msg_pos.page_pos,
-                                      left);
-               if (ret > 0 && datacrc)
-                       con->in_data_crc =
-                               crc32c(con->in_data_crc,
-                                         p + con->in_msg_pos.page_pos, ret);
-               kunmap(m->pages[con->in_msg_pos.page]);
-               if (ret <= 0)
-                       return ret;
-               con->in_msg_pos.data_pos += ret;
-               con->in_msg_pos.page_pos += ret;
-               if (con->in_msg_pos.page_pos == PAGE_SIZE) {
-                       con->in_msg_pos.page_pos = 0;
-                       con->in_msg_pos.page++;
-               }
-       }
-
-       /* footer */
-       to = sizeof(m->hdr) + sizeof(m->footer);
-       while (con->in_base_pos < to) {
-               left = to - con->in_base_pos;
-               ret = ceph_tcp_recvmsg(con->sock, (char *)&m->footer +
-                                      (con->in_base_pos - sizeof(m->hdr)),
-                                      left);
-               if (ret <= 0)
-                       return ret;
-               con->in_base_pos += ret;
-       }
-       dout("read_partial_message got msg %p %d (%u) + %d (%u) + %d (%u)\n",
-            m, front_len, m->footer.front_crc, middle_len,
-            m->footer.middle_crc, data_len, m->footer.data_crc);
-
-       /* crc ok? */
-       if (con->in_front_crc != le32_to_cpu(m->footer.front_crc)) {
-               pr_err("read_partial_message %p front crc %u != exp. %u\n",
-                      m, con->in_front_crc, m->footer.front_crc);
-               return -EBADMSG;
-       }
-       if (con->in_middle_crc != le32_to_cpu(m->footer.middle_crc)) {
-               pr_err("read_partial_message %p middle crc %u != exp %u\n",
-                      m, con->in_middle_crc, m->footer.middle_crc);
-               return -EBADMSG;
-       }
-       if (datacrc &&
-           (m->footer.flags & CEPH_MSG_FOOTER_NOCRC) == 0 &&
-           con->in_data_crc != le32_to_cpu(m->footer.data_crc)) {
-               pr_err("read_partial_message %p data crc %u != exp. %u\n", m,
-                      con->in_data_crc, le32_to_cpu(m->footer.data_crc));
-               return -EBADMSG;
-       }
-
-       return 1; /* done! */
-}
-
-/*
- * Process message.  This happens in the worker thread.  The callback should
- * be careful not to do anything that waits on other incoming messages or it
- * may deadlock.
- */
-static void process_message(struct ceph_connection *con)
-{
-       struct ceph_msg *msg;
-
-       msg = con->in_msg;
-       con->in_msg = NULL;
-
-       /* if first message, set peer_name */
-       if (con->peer_name.type == 0)
-               con->peer_name = msg->hdr.src;
-
-       con->in_seq++;
-       mutex_unlock(&con->mutex);
-
-       dout("===== %p %llu from %s%lld %d=%s len %d+%d (%u %u %u) =====\n",
-            msg, le64_to_cpu(msg->hdr.seq),
-            ENTITY_NAME(msg->hdr.src),
-            le16_to_cpu(msg->hdr.type),
-            ceph_msg_type_name(le16_to_cpu(msg->hdr.type)),
-            le32_to_cpu(msg->hdr.front_len),
-            le32_to_cpu(msg->hdr.data_len),
-            con->in_front_crc, con->in_middle_crc, con->in_data_crc);
-       con->ops->dispatch(con, msg);
-
-       mutex_lock(&con->mutex);
-       prepare_read_tag(con);
-}
-
-
-/*
- * Write something to the socket.  Called in a worker thread when the
- * socket appears to be writeable and we have something ready to send.
- */
-static int try_write(struct ceph_connection *con)
-{
-       struct ceph_messenger *msgr = con->msgr;
-       int ret = 1;
-
-       dout("try_write start %p state %lu nref %d\n", con, con->state,
-            atomic_read(&con->nref));
-
-more:
-       dout("try_write out_kvec_bytes %d\n", con->out_kvec_bytes);
-
-       /* open the socket first? */
-       if (con->sock == NULL) {
-               /*
-                * if we were STANDBY and are reconnecting _this_
-                * connection, bump connect_seq now.  Always bump
-                * global_seq.
-                */
-               if (test_and_clear_bit(STANDBY, &con->state))
-                       con->connect_seq++;
-
-               prepare_write_banner(msgr, con);
-               prepare_write_connect(msgr, con, 1);
-               prepare_read_banner(con);
-               set_bit(CONNECTING, &con->state);
-               clear_bit(NEGOTIATING, &con->state);
-
-               BUG_ON(con->in_msg);
-               con->in_tag = CEPH_MSGR_TAG_READY;
-               dout("try_write initiating connect on %p new state %lu\n",
-                    con, con->state);
-               con->sock = ceph_tcp_connect(con);
-               if (IS_ERR(con->sock)) {
-                       con->sock = NULL;
-                       con->error_msg = "connect error";
-                       ret = -1;
-                       goto out;
-               }
-       }
-
-more_kvec:
-       /* kvec data queued? */
-       if (con->out_skip) {
-               ret = write_partial_skip(con);
-               if (ret <= 0)
-                       goto done;
-               if (ret < 0) {
-                       dout("try_write write_partial_skip err %d\n", ret);
-                       goto done;
-               }
-       }
-       if (con->out_kvec_left) {
-               ret = write_partial_kvec(con);
-               if (ret <= 0)
-                       goto done;
-       }
-
-       /* msg pages? */
-       if (con->out_msg) {
-               if (con->out_msg_done) {
-                       ceph_msg_put(con->out_msg);
-                       con->out_msg = NULL;   /* we're done with this one */
-                       goto do_next;
-               }
-
-               ret = write_partial_msg_pages(con);
-               if (ret == 1)
-                       goto more_kvec;  /* we need to send the footer, too! */
-               if (ret == 0)
-                       goto done;
-               if (ret < 0) {
-                       dout("try_write write_partial_msg_pages err %d\n",
-                            ret);
-                       goto done;
-               }
-       }
-
-do_next:
-       if (!test_bit(CONNECTING, &con->state)) {
-               /* is anything else pending? */
-               if (!list_empty(&con->out_queue)) {
-                       prepare_write_message(con);
-                       goto more;
-               }
-               if (con->in_seq > con->in_seq_acked) {
-                       prepare_write_ack(con);
-                       goto more;
-               }
-               if (test_and_clear_bit(KEEPALIVE_PENDING, &con->state)) {
-                       prepare_write_keepalive(con);
-                       goto more;
-               }
-       }
-
-       /* Nothing to do! */
-       clear_bit(WRITE_PENDING, &con->state);
-       dout("try_write nothing else to write.\n");
-done:
-       ret = 0;
-out:
-       dout("try_write done on %p\n", con);
-       return ret;
-}
-
-
-
-/*
- * Read what we can from the socket.
- */
-static int try_read(struct ceph_connection *con)
-{
-       int ret = -1;
-
-       if (!con->sock)
-               return 0;
-
-       if (test_bit(STANDBY, &con->state))
-               return 0;
-
-       dout("try_read start on %p\n", con);
-
-more:
-       dout("try_read tag %d in_base_pos %d\n", (int)con->in_tag,
-            con->in_base_pos);
-       if (test_bit(CONNECTING, &con->state)) {
-               if (!test_bit(NEGOTIATING, &con->state)) {
-                       dout("try_read connecting\n");
-                       ret = read_partial_banner(con);
-                       if (ret <= 0)
-                               goto done;
-                       if (process_banner(con) < 0) {
-                               ret = -1;
-                               goto out;
-                       }
-               }
-               ret = read_partial_connect(con);
-               if (ret <= 0)
-                       goto done;
-               if (process_connect(con) < 0) {
-                       ret = -1;
-                       goto out;
-               }
-               goto more;
-       }
-
-       if (con->in_base_pos < 0) {
-               /*
-                * skipping + discarding content.
-                *
-                * FIXME: there must be a better way to do this!
-                */
-               static char buf[1024];
-               int skip = min(1024, -con->in_base_pos);
-               dout("skipping %d / %d bytes\n", skip, -con->in_base_pos);
-               ret = ceph_tcp_recvmsg(con->sock, buf, skip);
-               if (ret <= 0)
-                       goto done;
-               con->in_base_pos += ret;
-               if (con->in_base_pos)
-                       goto more;
-       }
-       if (con->in_tag == CEPH_MSGR_TAG_READY) {
-               /*
-                * what's next?
-                */
-               ret = ceph_tcp_recvmsg(con->sock, &con->in_tag, 1);
-               if (ret <= 0)
-                       goto done;
-               dout("try_read got tag %d\n", (int)con->in_tag);
-               switch (con->in_tag) {
-               case CEPH_MSGR_TAG_MSG:
-                       prepare_read_message(con);
-                       break;
-               case CEPH_MSGR_TAG_ACK:
-                       prepare_read_ack(con);
-                       break;
-               case CEPH_MSGR_TAG_CLOSE:
-                       set_bit(CLOSED, &con->state);   /* fixme */
-                       goto done;
-               default:
-                       goto bad_tag;
-               }
-       }
-       if (con->in_tag == CEPH_MSGR_TAG_MSG) {
-               ret = read_partial_message(con);
-               if (ret <= 0) {
-                       switch (ret) {
-                       case -EBADMSG:
-                               con->error_msg = "bad crc";
-                               ret = -EIO;
-                               goto out;
-                       case -EIO:
-                               con->error_msg = "io error";
-                               goto out;
-                       default:
-                               goto done;
-                       }
-               }
-               if (con->in_tag == CEPH_MSGR_TAG_READY)
-                       goto more;
-               process_message(con);
-               goto more;
-       }
-       if (con->in_tag == CEPH_MSGR_TAG_ACK) {
-               ret = read_partial_ack(con);
-               if (ret <= 0)
-                       goto done;
-               process_ack(con);
-               goto more;
-       }
-
-done:
-       ret = 0;
-out:
-       dout("try_read done on %p\n", con);
-       return ret;
-
-bad_tag:
-       pr_err("try_read bad con->in_tag = %d\n", (int)con->in_tag);
-       con->error_msg = "protocol error, garbage tag";
-       ret = -1;
-       goto out;
-}
-
-
-/*
- * Atomically queue work on a connection.  Bump @con reference to
- * avoid races with connection teardown.
- *
- * There is some trickery going on with QUEUED and BUSY because we
- * only want a _single_ thread operating on each connection at any
- * point in time, but we want to use all available CPUs.
- *
- * The worker thread only proceeds if it can atomically set BUSY.  It
- * clears QUEUED and does it's thing.  When it thinks it's done, it
- * clears BUSY, then rechecks QUEUED.. if it's set again, it loops
- * (tries again to set BUSY).
- *
- * To queue work, we first set QUEUED, _then_ if BUSY isn't set, we
- * try to queue work.  If that fails (work is already queued, or BUSY)
- * we give up (work also already being done or is queued) but leave QUEUED
- * set so that the worker thread will loop if necessary.
- */
-static void queue_con(struct ceph_connection *con)
-{
-       if (test_bit(DEAD, &con->state)) {
-               dout("queue_con %p ignoring: DEAD\n",
-                    con);
-               return;
-       }
-
-       if (!con->ops->get(con)) {
-               dout("queue_con %p ref count 0\n", con);
-               return;
-       }
-
-       set_bit(QUEUED, &con->state);
-       if (test_bit(BUSY, &con->state)) {
-               dout("queue_con %p - already BUSY\n", con);
-               con->ops->put(con);
-       } else if (!queue_work(ceph_msgr_wq, &con->work.work)) {
-               dout("queue_con %p - already queued\n", con);
-               con->ops->put(con);
-       } else {
-               dout("queue_con %p\n", con);
-       }
-}
-
-/*
- * Do some work on a connection.  Drop a connection ref when we're done.
- */
-static void con_work(struct work_struct *work)
-{
-       struct ceph_connection *con = container_of(work, struct ceph_connection,
-                                                  work.work);
-       int backoff = 0;
-
-more:
-       if (test_and_set_bit(BUSY, &con->state) != 0) {
-               dout("con_work %p BUSY already set\n", con);
-               goto out;
-       }
-       dout("con_work %p start, clearing QUEUED\n", con);
-       clear_bit(QUEUED, &con->state);
-
-       mutex_lock(&con->mutex);
-
-       if (test_bit(CLOSED, &con->state)) { /* e.g. if we are replaced */
-               dout("con_work CLOSED\n");
-               con_close_socket(con);
-               goto done;
-       }
-       if (test_and_clear_bit(OPENING, &con->state)) {
-               /* reopen w/ new peer */
-               dout("con_work OPENING\n");
-               con_close_socket(con);
-       }
-
-       if (test_and_clear_bit(SOCK_CLOSED, &con->state) ||
-           try_read(con) < 0 ||
-           try_write(con) < 0) {
-               mutex_unlock(&con->mutex);
-               backoff = 1;
-               ceph_fault(con);     /* error/fault path */
-               goto done_unlocked;
-       }
-
-done:
-       mutex_unlock(&con->mutex);
-
-done_unlocked:
-       clear_bit(BUSY, &con->state);
-       dout("con->state=%lu\n", con->state);
-       if (test_bit(QUEUED, &con->state)) {
-               if (!backoff || test_bit(OPENING, &con->state)) {
-                       dout("con_work %p QUEUED reset, looping\n", con);
-                       goto more;
-               }
-               dout("con_work %p QUEUED reset, but just faulted\n", con);
-               clear_bit(QUEUED, &con->state);
-       }
-       dout("con_work %p done\n", con);
-
-out:
-       con->ops->put(con);
-}
-
-
-/*
- * Generic error/fault handler.  A retry mechanism is used with
- * exponential backoff
- */
-static void ceph_fault(struct ceph_connection *con)
-{
-       pr_err("%s%lld %s %s\n", ENTITY_NAME(con->peer_name),
-              pr_addr(&con->peer_addr.in_addr), con->error_msg);
-       dout("fault %p state %lu to peer %s\n",
-            con, con->state, pr_addr(&con->peer_addr.in_addr));
-
-       if (test_bit(LOSSYTX, &con->state)) {
-               dout("fault on LOSSYTX channel\n");
-               goto out;
-       }
-
-       mutex_lock(&con->mutex);
-       if (test_bit(CLOSED, &con->state))
-               goto out_unlock;
-
-       con_close_socket(con);
-
-       if (con->in_msg) {
-               ceph_msg_put(con->in_msg);
-               con->in_msg = NULL;
-       }
-
-       /* Requeue anything that hasn't been acked */
-       list_splice_init(&con->out_sent, &con->out_queue);
-
-       /* If there are no messages in the queue, place the connection
-        * in a STANDBY state (i.e., don't try to reconnect just yet). */
-       if (list_empty(&con->out_queue) && !con->out_keepalive_pending) {
-               dout("fault setting STANDBY\n");
-               set_bit(STANDBY, &con->state);
-       } else {
-               /* retry after a delay. */
-               if (con->delay == 0)
-                       con->delay = BASE_DELAY_INTERVAL;
-               else if (con->delay < MAX_DELAY_INTERVAL)
-                       con->delay *= 2;
-               dout("fault queueing %p delay %lu\n", con, con->delay);
-               con->ops->get(con);
-               if (queue_delayed_work(ceph_msgr_wq, &con->work,
-                                      round_jiffies_relative(con->delay)) == 0)
-                       con->ops->put(con);
-       }
-
-out_unlock:
-       mutex_unlock(&con->mutex);
-out:
-       /*
-        * in case we faulted due to authentication, invalidate our
-        * current tickets so that we can get new ones.
-        */
-       if (con->auth_retry && con->ops->invalidate_authorizer) {
-               dout("calling invalidate_authorizer()\n");
-               con->ops->invalidate_authorizer(con);
-       }
-
-       if (con->ops->fault)
-               con->ops->fault(con);
-}
-
-
-
-/*
- * create a new messenger instance
- */
-struct ceph_messenger *ceph_messenger_create(struct ceph_entity_addr *myaddr)
-{
-       struct ceph_messenger *msgr;
-
-       msgr = kzalloc(sizeof(*msgr), GFP_KERNEL);
-       if (msgr == NULL)
-               return ERR_PTR(-ENOMEM);
-
-       spin_lock_init(&msgr->global_seq_lock);
-
-       /* the zero page is needed if a request is "canceled" while the message
-        * is being written over the socket */
-       msgr->zero_page = __page_cache_alloc(GFP_KERNEL | __GFP_ZERO);
-       if (!msgr->zero_page) {
-               kfree(msgr);
-               return ERR_PTR(-ENOMEM);
-       }
-       kmap(msgr->zero_page);
-
-       if (myaddr)
-               msgr->inst.addr = *myaddr;
-
-       /* select a random nonce */
-       msgr->inst.addr.type = 0;
-       get_random_bytes(&msgr->inst.addr.nonce, sizeof(msgr->inst.addr.nonce));
-       encode_my_addr(msgr);
-
-       dout("messenger_create %p\n", msgr);
-       return msgr;
-}
-
-void ceph_messenger_destroy(struct ceph_messenger *msgr)
-{
-       dout("destroy %p\n", msgr);
-       kunmap(msgr->zero_page);
-       __free_page(msgr->zero_page);
-       kfree(msgr);
-       dout("destroyed messenger %p\n", msgr);
-}
-
-/*
- * Queue up an outgoing message on the given connection.
- */
-void ceph_con_send(struct ceph_connection *con, struct ceph_msg *msg)
-{
-       if (test_bit(CLOSED, &con->state)) {
-               dout("con_send %p closed, dropping %p\n", con, msg);
-               ceph_msg_put(msg);
-               return;
-       }
-
-       /* set src+dst */
-       msg->hdr.src = con->msgr->inst.name;
-
-       BUG_ON(msg->front.iov_len != le32_to_cpu(msg->hdr.front_len));
-
-       msg->needs_out_seq = true;
-
-       /* queue */
-       mutex_lock(&con->mutex);
-       BUG_ON(!list_empty(&msg->list_head));
-       list_add_tail(&msg->list_head, &con->out_queue);
-       dout("----- %p to %s%lld %d=%s len %d+%d+%d -----\n", msg,
-            ENTITY_NAME(con->peer_name), le16_to_cpu(msg->hdr.type),
-            ceph_msg_type_name(le16_to_cpu(msg->hdr.type)),
-            le32_to_cpu(msg->hdr.front_len),
-            le32_to_cpu(msg->hdr.middle_len),
-            le32_to_cpu(msg->hdr.data_len));
-       mutex_unlock(&con->mutex);
-
-       /* if there wasn't anything waiting to send before, queue
-        * new work */
-       if (test_and_set_bit(WRITE_PENDING, &con->state) == 0)
-               queue_con(con);
-}
-
-/*
- * Revoke a message that was previously queued for send
- */
-void ceph_con_revoke(struct ceph_connection *con, struct ceph_msg *msg)
-{
-       mutex_lock(&con->mutex);
-       if (!list_empty(&msg->list_head)) {
-               dout("con_revoke %p msg %p - was on queue\n", con, msg);
-               list_del_init(&msg->list_head);
-               ceph_msg_put(msg);
-               msg->hdr.seq = 0;
-       }
-       if (con->out_msg == msg) {
-               dout("con_revoke %p msg %p - was sending\n", con, msg);
-               con->out_msg = NULL;
-               if (con->out_kvec_is_msg) {
-                       con->out_skip = con->out_kvec_bytes;
-                       con->out_kvec_is_msg = false;
-               }
-               ceph_msg_put(msg);
-               msg->hdr.seq = 0;
-       }
-       mutex_unlock(&con->mutex);
-}
-
-/*
- * Revoke a message that we may be reading data into
- */
-void ceph_con_revoke_message(struct ceph_connection *con, struct ceph_msg *msg)
-{
-       mutex_lock(&con->mutex);
-       if (con->in_msg && con->in_msg == msg) {
-               unsigned front_len = le32_to_cpu(con->in_hdr.front_len);
-               unsigned middle_len = le32_to_cpu(con->in_hdr.middle_len);
-               unsigned data_len = le32_to_cpu(con->in_hdr.data_len);
-
-               /* skip rest of message */
-               dout("con_revoke_pages %p msg %p revoked\n", con, msg);
-                       con->in_base_pos = con->in_base_pos -
-                               sizeof(struct ceph_msg_header) -
-                               front_len -
-                               middle_len -
-                               data_len -
-                               sizeof(struct ceph_msg_footer);
-               ceph_msg_put(con->in_msg);
-               con->in_msg = NULL;
-               con->in_tag = CEPH_MSGR_TAG_READY;
-               con->in_seq++;
-       } else {
-               dout("con_revoke_pages %p msg %p pages %p no-op\n",
-                    con, con->in_msg, msg);
-       }
-       mutex_unlock(&con->mutex);
-}
-
-/*
- * Queue a keepalive byte to ensure the tcp connection is alive.
- */
-void ceph_con_keepalive(struct ceph_connection *con)
-{
-       if (test_and_set_bit(KEEPALIVE_PENDING, &con->state) == 0 &&
-           test_and_set_bit(WRITE_PENDING, &con->state) == 0)
-               queue_con(con);
-}
-
-
-/*
- * construct a new message with given type, size
- * the new msg has a ref count of 1.
- */
-struct ceph_msg *ceph_msg_new(int type, int front_len, gfp_t flags)
-{
-       struct ceph_msg *m;
-
-       m = kmalloc(sizeof(*m), flags);
-       if (m == NULL)
-               goto out;
-       kref_init(&m->kref);
-       INIT_LIST_HEAD(&m->list_head);
-
-       m->hdr.tid = 0;
-       m->hdr.type = cpu_to_le16(type);
-       m->hdr.priority = cpu_to_le16(CEPH_MSG_PRIO_DEFAULT);
-       m->hdr.version = 0;
-       m->hdr.front_len = cpu_to_le32(front_len);
-       m->hdr.middle_len = 0;
-       m->hdr.data_len = 0;
-       m->hdr.data_off = 0;
-       m->hdr.reserved = 0;
-       m->footer.front_crc = 0;
-       m->footer.middle_crc = 0;
-       m->footer.data_crc = 0;
-       m->footer.flags = 0;
-       m->front_max = front_len;
-       m->front_is_vmalloc = false;
-       m->more_to_follow = false;
-       m->pool = NULL;
-
-       /* front */
-       if (front_len) {
-               if (front_len > PAGE_CACHE_SIZE) {
-                       m->front.iov_base = __vmalloc(front_len, flags,
-                                                     PAGE_KERNEL);
-                       m->front_is_vmalloc = true;
-               } else {
-                       m->front.iov_base = kmalloc(front_len, flags);
-               }
-               if (m->front.iov_base == NULL) {
-                       pr_err("msg_new can't allocate %d bytes\n",
-                            front_len);
-                       goto out2;
-               }
-       } else {
-               m->front.iov_base = NULL;
-       }
-       m->front.iov_len = front_len;
-
-       /* middle */
-       m->middle = NULL;
-
-       /* data */
-       m->nr_pages = 0;
-       m->pages = NULL;
-       m->pagelist = NULL;
-
-       dout("ceph_msg_new %p front %d\n", m, front_len);
-       return m;
-
-out2:
-       ceph_msg_put(m);
-out:
-       pr_err("msg_new can't create type %d front %d\n", type, front_len);
-       return NULL;
-}
-
-/*
- * Allocate "middle" portion of a message, if it is needed and wasn't
- * allocated by alloc_msg.  This allows us to read a small fixed-size
- * per-type header in the front and then gracefully fail (i.e.,
- * propagate the error to the caller based on info in the front) when
- * the middle is too large.
- */
-static int ceph_alloc_middle(struct ceph_connection *con, struct ceph_msg *msg)
-{
-       int type = le16_to_cpu(msg->hdr.type);
-       int middle_len = le32_to_cpu(msg->hdr.middle_len);
-
-       dout("alloc_middle %p type %d %s middle_len %d\n", msg, type,
-            ceph_msg_type_name(type), middle_len);
-       BUG_ON(!middle_len);
-       BUG_ON(msg->middle);
-
-       msg->middle = ceph_buffer_new(middle_len, GFP_NOFS);
-       if (!msg->middle)
-               return -ENOMEM;
-       return 0;
-}
-
-/*
- * Generic message allocator, for incoming messages.
- */
-static struct ceph_msg *ceph_alloc_msg(struct ceph_connection *con,
-                               struct ceph_msg_header *hdr,
-                               int *skip)
-{
-       int type = le16_to_cpu(hdr->type);
-       int front_len = le32_to_cpu(hdr->front_len);
-       int middle_len = le32_to_cpu(hdr->middle_len);
-       struct ceph_msg *msg = NULL;
-       int ret;
-
-       if (con->ops->alloc_msg) {
-               mutex_unlock(&con->mutex);
-               msg = con->ops->alloc_msg(con, hdr, skip);
-               mutex_lock(&con->mutex);
-               if (!msg || *skip)
-                       return NULL;
-       }
-       if (!msg) {
-               *skip = 0;
-               msg = ceph_msg_new(type, front_len, GFP_NOFS);
-               if (!msg) {
-                       pr_err("unable to allocate msg type %d len %d\n",
-                              type, front_len);
-                       return NULL;
-               }
-       }
-       memcpy(&msg->hdr, &con->in_hdr, sizeof(con->in_hdr));
-
-       if (middle_len && !msg->middle) {
-               ret = ceph_alloc_middle(con, msg);
-               if (ret < 0) {
-                       ceph_msg_put(msg);
-                       return NULL;
-               }
-       }
-
-       return msg;
-}
-
-
-/*
- * Free a generically kmalloc'd message.
- */
-void ceph_msg_kfree(struct ceph_msg *m)
-{
-       dout("msg_kfree %p\n", m);
-       if (m->front_is_vmalloc)
-               vfree(m->front.iov_base);
-       else
-               kfree(m->front.iov_base);
-       kfree(m);
-}
-
-/*
- * Drop a msg ref.  Destroy as needed.
- */
-void ceph_msg_last_put(struct kref *kref)
-{
-       struct ceph_msg *m = container_of(kref, struct ceph_msg, kref);
-
-       dout("ceph_msg_put last one on %p\n", m);
-       WARN_ON(!list_empty(&m->list_head));
-
-       /* drop middle, data, if any */
-       if (m->middle) {
-               ceph_buffer_put(m->middle);
-               m->middle = NULL;
-       }
-       m->nr_pages = 0;
-       m->pages = NULL;
-
-       if (m->pagelist) {
-               ceph_pagelist_release(m->pagelist);
-               kfree(m->pagelist);
-               m->pagelist = NULL;
-       }
-
-       if (m->pool)
-               ceph_msgpool_put(m->pool, m);
-       else
-               ceph_msg_kfree(m);
-}
-
-void ceph_msg_dump(struct ceph_msg *msg)
-{
-       pr_debug("msg_dump %p (front_max %d nr_pages %d)\n", msg,
-                msg->front_max, msg->nr_pages);
-       print_hex_dump(KERN_DEBUG, "header: ",
-                      DUMP_PREFIX_OFFSET, 16, 1,
-                      &msg->hdr, sizeof(msg->hdr), true);
-       print_hex_dump(KERN_DEBUG, " front: ",
-                      DUMP_PREFIX_OFFSET, 16, 1,
-                      msg->front.iov_base, msg->front.iov_len, true);
-       if (msg->middle)
-               print_hex_dump(KERN_DEBUG, "middle: ",
-                              DUMP_PREFIX_OFFSET, 16, 1,
-                              msg->middle->vec.iov_base,
-                              msg->middle->vec.iov_len, true);
-       print_hex_dump(KERN_DEBUG, "footer: ",
-                      DUMP_PREFIX_OFFSET, 16, 1,
-                      &msg->footer, sizeof(msg->footer), true);
-}
diff --git a/fs/ceph/messenger.h b/fs/ceph/messenger.h
deleted file mode 100644
index 76fbc95..0000000
--- a/fs/ceph/messenger.h
+++ /dev/null
@@ -1,253 +0,0 @@
-#ifndef __FS_CEPH_MESSENGER_H
-#define __FS_CEPH_MESSENGER_H
-
-#include <linux/kref.h>
-#include <linux/mutex.h>
-#include <linux/net.h>
-#include <linux/radix-tree.h>
-#include <linux/uio.h>
-#include <linux/version.h>
-#include <linux/workqueue.h>
-
-#include "types.h"
-#include "buffer.h"
-
-struct ceph_msg;
-struct ceph_connection;
-
-extern struct workqueue_struct *ceph_msgr_wq;       /* receive work queue */
-
-/*
- * Ceph defines these callbacks for handling connection events.
- */
-struct ceph_connection_operations {
-       struct ceph_connection *(*get)(struct ceph_connection *);
-       void (*put)(struct ceph_connection *);
-
-       /* handle an incoming message. */
-       void (*dispatch) (struct ceph_connection *con, struct ceph_msg *m);
-
-       /* authorize an outgoing connection */
-       int (*get_authorizer) (struct ceph_connection *con,
-                              void **buf, int *len, int *proto,
-                              void **reply_buf, int *reply_len, int force_new);
-       int (*verify_authorizer_reply) (struct ceph_connection *con, int len);
-       int (*invalidate_authorizer)(struct ceph_connection *con);
-
-       /* protocol version mismatch */
-       void (*bad_proto) (struct ceph_connection *con);
-
-       /* there was some error on the socket (disconnect, whatever) */
-       void (*fault) (struct ceph_connection *con);
-
-       /* a remote host as terminated a message exchange session, and messages
-        * we sent (or they tried to send us) may be lost. */
-       void (*peer_reset) (struct ceph_connection *con);
-
-       struct ceph_msg * (*alloc_msg) (struct ceph_connection *con,
-                                       struct ceph_msg_header *hdr,
-                                       int *skip);
-};
-
-/* use format string %s%d */
-#define ENTITY_NAME(n) ceph_entity_type_name((n).type), le64_to_cpu((n).num)
-
-struct ceph_messenger {
-       struct ceph_entity_inst inst;    /* my name+address */
-       struct ceph_entity_addr my_enc_addr;
-       struct page *zero_page;          /* used in certain error cases */
-
-       bool nocrc;
-
-       /*
-        * the global_seq counts connections i (attempt to) initiate
-        * in order to disambiguate certain connect race conditions.
-        */
-       u32 global_seq;
-       spinlock_t global_seq_lock;
-};
-
-/*
- * a single message.  it contains a header (src, dest, message type, etc.),
- * footer (crc values, mainly), a "front" message body, and possibly a
- * data payload (stored in some number of pages).
- */
-struct ceph_msg {
-       struct ceph_msg_header hdr;     /* header */
-       struct ceph_msg_footer footer;  /* footer */
-       struct kvec front;              /* unaligned blobs of message */
-       struct ceph_buffer *middle;
-       struct page **pages;            /* data payload.  NOT OWNER. */
-       unsigned nr_pages;              /* size of page array */
-       struct ceph_pagelist *pagelist; /* instead of pages */
-       struct list_head list_head;
-       struct kref kref;
-       bool front_is_vmalloc;
-       bool more_to_follow;
-       bool needs_out_seq;
-       int front_max;
-
-       struct ceph_msgpool *pool;
-};
-
-struct ceph_msg_pos {
-       int page, page_pos;  /* which page; offset in page */
-       int data_pos;        /* offset in data payload */
-       int did_page_crc;    /* true if we've calculated crc for current page */
-};
-
-/* ceph connection fault delay defaults, for exponential backoff */
-#define BASE_DELAY_INTERVAL    (HZ/2)
-#define MAX_DELAY_INTERVAL     (5 * 60 * HZ)
-
-/*
- * ceph_connection state bit flags
- *
- * QUEUED and BUSY are used together to ensure that only a single
- * thread is currently opening, reading or writing data to the socket.
- */
-#define LOSSYTX         0  /* we can close channel or drop messages on errors */
-#define CONNECTING     1
-#define NEGOTIATING    2
-#define KEEPALIVE_PENDING      3
-#define WRITE_PENDING  4  /* we have data ready to send */
-#define QUEUED          5  /* there is work queued on this connection */
-#define BUSY            6  /* work is being done */
-#define STANDBY                8  /* no outgoing messages, socket closed.  we keep
-                           * the ceph_connection around to maintain shared
-                           * state with the peer. */
-#define CLOSED         10 /* we've closed the connection */
-#define SOCK_CLOSED    11 /* socket state changed to closed */
-#define OPENING         13 /* open connection w/ (possibly new) peer */
-#define DEAD            14 /* dead, about to kfree */
-
-/*
- * A single connection with another host.
- *
- * We maintain a queue of outgoing messages, and some session state to
- * ensure that we can preserve the lossless, ordered delivery of
- * messages in the case of a TCP disconnect.
- */
-struct ceph_connection {
-       void *private;
-       atomic_t nref;
-
-       const struct ceph_connection_operations *ops;
-
-       struct ceph_messenger *msgr;
-       struct socket *sock;
-       unsigned long state;    /* connection state (see flags above) */
-       const char *error_msg;  /* error message, if any */
-
-       struct ceph_entity_addr peer_addr; /* peer address */
-       struct ceph_entity_name peer_name; /* peer name */
-       struct ceph_entity_addr peer_addr_for_me;
-       unsigned peer_features;
-       u32 connect_seq;      /* identify the most recent connection
-                                attempt for this connection, client */
-       u32 peer_global_seq;  /* peer's global seq for this connection */
-
-       int auth_retry;       /* true if we need a newer authorizer */
-       void *auth_reply_buf;   /* where to put the authorizer reply */
-       int auth_reply_buf_len;
-
-       struct mutex mutex;
-
-       /* out queue */
-       struct list_head out_queue;
-       struct list_head out_sent;   /* sending or sent but unacked */
-       u64 out_seq;                 /* last message queued for send */
-       bool out_keepalive_pending;
-
-       u64 in_seq, in_seq_acked;  /* last message received, acked */
-
-       /* connection negotiation temps */
-       char in_banner[CEPH_BANNER_MAX_LEN];
-       union {
-               struct {  /* outgoing connection */
-                       struct ceph_msg_connect out_connect;
-                       struct ceph_msg_connect_reply in_reply;
-               };
-               struct {  /* incoming */
-                       struct ceph_msg_connect in_connect;
-                       struct ceph_msg_connect_reply out_reply;
-               };
-       };
-       struct ceph_entity_addr actual_peer_addr;
-
-       /* message out temps */
-       struct ceph_msg *out_msg;        /* sending message (== tail of
-                                           out_sent) */
-       bool out_msg_done;
-       struct ceph_msg_pos out_msg_pos;
-
-       struct kvec out_kvec[8],         /* sending header/footer data */
-               *out_kvec_cur;
-       int out_kvec_left;   /* kvec's left in out_kvec */
-       int out_skip;        /* skip this many bytes */
-       int out_kvec_bytes;  /* total bytes left */
-       bool out_kvec_is_msg; /* kvec refers to out_msg */
-       int out_more;        /* there is more data after the kvecs */
-       __le64 out_temp_ack; /* for writing an ack */
-
-       /* message in temps */
-       struct ceph_msg_header in_hdr;
-       struct ceph_msg *in_msg;
-       struct ceph_msg_pos in_msg_pos;
-       u32 in_front_crc, in_middle_crc, in_data_crc;  /* calculated crc */
-
-       char in_tag;         /* protocol control byte */
-       int in_base_pos;     /* bytes read */
-       __le64 in_temp_ack;  /* for reading an ack */
-
-       struct delayed_work work;           /* send|recv work */
-       unsigned long       delay;          /* current delay interval */
-};
-
-
-extern const char *pr_addr(const struct sockaddr_storage *ss);
-extern int ceph_parse_ips(const char *c, const char *end,
-                         struct ceph_entity_addr *addr,
-                         int max_count, int *count);
-
-
-extern int ceph_msgr_init(void);
-extern void ceph_msgr_exit(void);
-extern void ceph_msgr_flush(void);
-
-extern struct ceph_messenger *ceph_messenger_create(
-       struct ceph_entity_addr *myaddr);
-extern void ceph_messenger_destroy(struct ceph_messenger *);
-
-extern void ceph_con_init(struct ceph_messenger *msgr,
-                         struct ceph_connection *con);
-extern void ceph_con_open(struct ceph_connection *con,
-                         struct ceph_entity_addr *addr);
-extern bool ceph_con_opened(struct ceph_connection *con);
-extern void ceph_con_close(struct ceph_connection *con);
-extern void ceph_con_send(struct ceph_connection *con, struct ceph_msg *msg);
-extern void ceph_con_revoke(struct ceph_connection *con, struct ceph_msg *msg);
-extern void ceph_con_revoke_message(struct ceph_connection *con,
-                                 struct ceph_msg *msg);
-extern void ceph_con_keepalive(struct ceph_connection *con);
-extern struct ceph_connection *ceph_con_get(struct ceph_connection *con);
-extern void ceph_con_put(struct ceph_connection *con);
-
-extern struct ceph_msg *ceph_msg_new(int type, int front_len, gfp_t flags);
-extern void ceph_msg_kfree(struct ceph_msg *m);
-
-
-static inline struct ceph_msg *ceph_msg_get(struct ceph_msg *msg)
-{
-       kref_get(&msg->kref);
-       return msg;
-}
-extern void ceph_msg_last_put(struct kref *kref);
-static inline void ceph_msg_put(struct ceph_msg *msg)
-{
-       kref_put(&msg->kref, ceph_msg_last_put);
-}
-
-extern void ceph_msg_dump(struct ceph_msg *msg);
-
-#endif
diff --git a/fs/ceph/mon_client.c b/fs/ceph/mon_client.c
deleted file mode 100644
index b2a5a3e..0000000
--- a/fs/ceph/mon_client.c
+++ /dev/null
@@ -1,1018 +0,0 @@
-#include "ceph_debug.h"
-
-#include <linux/types.h>
-#include <linux/slab.h>
-#include <linux/random.h>
-#include <linux/sched.h>
-
-#include "mon_client.h"
-#include "super.h"
-#include "auth.h"
-#include "decode.h"
-
-/*
- * Interact with Ceph monitor cluster.  Handle requests for new map
- * versions, and periodically resend as needed.  Also implement
- * statfs() and umount().
- *
- * A small cluster of Ceph "monitors" are responsible for managing critical
- * cluster configuration and state information.  An odd number (e.g., 3, 5)
- * of cmon daemons use a modified version of the Paxos part-time parliament
- * algorithm to manage the MDS map (mds cluster membership), OSD map, and
- * list of clients who have mounted the file system.
- *
- * We maintain an open, active session with a monitor at all times in order to
- * receive timely MDSMap updates.  We periodically send a keepalive byte on the
- * TCP socket to ensure we detect a failure.  If the connection does break, we
- * randomly hunt for a new monitor.  Once the connection is reestablished, we
- * resend any outstanding requests.
- */
-
-static const struct ceph_connection_operations mon_con_ops;
-
-static int __validate_auth(struct ceph_mon_client *monc);
-
-/*
- * Decode a monmap blob (e.g., during mount).
- */
-struct ceph_monmap *ceph_monmap_decode(void *p, void *end)
-{
-       struct ceph_monmap *m = NULL;
-       int i, err = -EINVAL;
-       struct ceph_fsid fsid;
-       u32 epoch, num_mon;
-       u16 version;
-       u32 len;
-
-       ceph_decode_32_safe(&p, end, len, bad);
-       ceph_decode_need(&p, end, len, bad);
-
-       dout("monmap_decode %p %p len %d\n", p, end, (int)(end-p));
-
-       ceph_decode_16_safe(&p, end, version, bad);
-
-       ceph_decode_need(&p, end, sizeof(fsid) + 2*sizeof(u32), bad);
-       ceph_decode_copy(&p, &fsid, sizeof(fsid));
-       epoch = ceph_decode_32(&p);
-
-       num_mon = ceph_decode_32(&p);
-       ceph_decode_need(&p, end, num_mon*sizeof(m->mon_inst[0]), bad);
-
-       if (num_mon >= CEPH_MAX_MON)
-               goto bad;
-       m = kmalloc(sizeof(*m) + sizeof(m->mon_inst[0])*num_mon, GFP_NOFS);
-       if (m == NULL)
-               return ERR_PTR(-ENOMEM);
-       m->fsid = fsid;
-       m->epoch = epoch;
-       m->num_mon = num_mon;
-       ceph_decode_copy(&p, m->mon_inst, num_mon*sizeof(m->mon_inst[0]));
-       for (i = 0; i < num_mon; i++)
-               ceph_decode_addr(&m->mon_inst[i].addr);
-
-       dout("monmap_decode epoch %d, num_mon %d\n", m->epoch,
-            m->num_mon);
-       for (i = 0; i < m->num_mon; i++)
-               dout("monmap_decode  mon%d is %s\n", i,
-                    pr_addr(&m->mon_inst[i].addr.in_addr));
-       return m;
-
-bad:
-       dout("monmap_decode failed with %d\n", err);
-       kfree(m);
-       return ERR_PTR(err);
-}
-
-/*
- * return true if *addr is included in the monmap.
- */
-int ceph_monmap_contains(struct ceph_monmap *m, struct ceph_entity_addr *addr)
-{
-       int i;
-
-       for (i = 0; i < m->num_mon; i++)
-               if (memcmp(addr, &m->mon_inst[i].addr, sizeof(*addr)) == 0)
-                       return 1;
-       return 0;
-}
-
-/*
- * Send an auth request.
- */
-static void __send_prepared_auth_request(struct ceph_mon_client *monc, int len)
-{
-       monc->pending_auth = 1;
-       monc->m_auth->front.iov_len = len;
-       monc->m_auth->hdr.front_len = cpu_to_le32(len);
-       ceph_con_revoke(monc->con, monc->m_auth);
-       ceph_msg_get(monc->m_auth);  /* keep our ref */
-       ceph_con_send(monc->con, monc->m_auth);
-}
-
-/*
- * Close monitor session, if any.
- */
-static void __close_session(struct ceph_mon_client *monc)
-{
-       if (monc->con) {
-               dout("__close_session closing mon%d\n", monc->cur_mon);
-               ceph_con_revoke(monc->con, monc->m_auth);
-               ceph_con_close(monc->con);
-               monc->cur_mon = -1;
-               monc->pending_auth = 0;
-               ceph_auth_reset(monc->auth);
-       }
-}
-
-/*
- * Open a session with a (new) monitor.
- */
-static int __open_session(struct ceph_mon_client *monc)
-{
-       char r;
-       int ret;
-
-       if (monc->cur_mon < 0) {
-               get_random_bytes(&r, 1);
-               monc->cur_mon = r % monc->monmap->num_mon;
-               dout("open_session num=%d r=%d -> mon%d\n",
-                    monc->monmap->num_mon, r, monc->cur_mon);
-               monc->sub_sent = 0;
-               monc->sub_renew_after = jiffies;  /* i.e., expired */
-               monc->want_next_osdmap = !!monc->want_next_osdmap;
-
-               dout("open_session mon%d opening\n", monc->cur_mon);
-               monc->con->peer_name.type = CEPH_ENTITY_TYPE_MON;
-               monc->con->peer_name.num = cpu_to_le64(monc->cur_mon);
-               ceph_con_open(monc->con,
-                             &monc->monmap->mon_inst[monc->cur_mon].addr);
-
-               /* initiatiate authentication handshake */
-               ret = ceph_auth_build_hello(monc->auth,
-                                           monc->m_auth->front.iov_base,
-                                           monc->m_auth->front_max);
-               __send_prepared_auth_request(monc, ret);
-       } else {
-               dout("open_session mon%d already open\n", monc->cur_mon);
-       }
-       return 0;
-}
-
-static bool __sub_expired(struct ceph_mon_client *monc)
-{
-       return time_after_eq(jiffies, monc->sub_renew_after);
-}
-
-/*
- * Reschedule delayed work timer.
- */
-static void __schedule_delayed(struct ceph_mon_client *monc)
-{
-       unsigned delay;
-
-       if (monc->cur_mon < 0 || __sub_expired(monc))
-               delay = 10 * HZ;
-       else
-               delay = 20 * HZ;
-       dout("__schedule_delayed after %u\n", delay);
-       schedule_delayed_work(&monc->delayed_work, delay);
-}
-
-/*
- * Send subscribe request for mdsmap and/or osdmap.
- */
-static void __send_subscribe(struct ceph_mon_client *monc)
-{
-       dout("__send_subscribe sub_sent=%u exp=%u want_osd=%d\n",
-            (unsigned)monc->sub_sent, __sub_expired(monc),
-            monc->want_next_osdmap);
-       if ((__sub_expired(monc) && !monc->sub_sent) ||
-           monc->want_next_osdmap == 1) {
-               struct ceph_msg *msg = monc->m_subscribe;
-               struct ceph_mon_subscribe_item *i;
-               void *p, *end;
-
-               p = msg->front.iov_base;
-               end = p + msg->front_max;
-
-               dout("__send_subscribe to 'mdsmap' %u+\n",
-                    (unsigned)monc->have_mdsmap);
-               if (monc->want_next_osdmap) {
-                       dout("__send_subscribe to 'osdmap' %u\n",
-                            (unsigned)monc->have_osdmap);
-                       ceph_encode_32(&p, 3);
-                       ceph_encode_string(&p, end, "osdmap", 6);
-                       i = p;
-                       i->have = cpu_to_le64(monc->have_osdmap);
-                       i->onetime = 1;
-                       p += sizeof(*i);
-                       monc->want_next_osdmap = 2;  /* requested */
-               } else {
-                       ceph_encode_32(&p, 2);
-               }
-               ceph_encode_string(&p, end, "mdsmap", 6);
-               i = p;
-               i->have = cpu_to_le64(monc->have_mdsmap);
-               i->onetime = 0;
-               p += sizeof(*i);
-               ceph_encode_string(&p, end, "monmap", 6);
-               i = p;
-               i->have = 0;
-               i->onetime = 0;
-               p += sizeof(*i);
-
-               msg->front.iov_len = p - msg->front.iov_base;
-               msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
-               ceph_con_revoke(monc->con, msg);
-               ceph_con_send(monc->con, ceph_msg_get(msg));
-
-               monc->sub_sent = jiffies | 1;  /* never 0 */
-       }
-}
-
-static void handle_subscribe_ack(struct ceph_mon_client *monc,
-                                struct ceph_msg *msg)
-{
-       unsigned seconds;
-       struct ceph_mon_subscribe_ack *h = msg->front.iov_base;
-
-       if (msg->front.iov_len < sizeof(*h))
-               goto bad;
-       seconds = le32_to_cpu(h->duration);
-
-       mutex_lock(&monc->mutex);
-       if (monc->hunting) {
-               pr_info("mon%d %s session established\n",
-                       monc->cur_mon, pr_addr(&monc->con->peer_addr.in_addr));
-               monc->hunting = false;
-       }
-       dout("handle_subscribe_ack after %d seconds\n", seconds);
-       monc->sub_renew_after = monc->sub_sent + (seconds >> 1)*HZ - 1;
-       monc->sub_sent = 0;
-       mutex_unlock(&monc->mutex);
-       return;
-bad:
-       pr_err("got corrupt subscribe-ack msg\n");
-       ceph_msg_dump(msg);
-}
-
-/*
- * Keep track of which maps we have
- */
-int ceph_monc_got_mdsmap(struct ceph_mon_client *monc, u32 got)
-{
-       mutex_lock(&monc->mutex);
-       monc->have_mdsmap = got;
-       mutex_unlock(&monc->mutex);
-       return 0;
-}
-
-int ceph_monc_got_osdmap(struct ceph_mon_client *monc, u32 got)
-{
-       mutex_lock(&monc->mutex);
-       monc->have_osdmap = got;
-       monc->want_next_osdmap = 0;
-       mutex_unlock(&monc->mutex);
-       return 0;
-}
-
-/*
- * Register interest in the next osdmap
- */
-void ceph_monc_request_next_osdmap(struct ceph_mon_client *monc)
-{
-       dout("request_next_osdmap have %u\n", monc->have_osdmap);
-       mutex_lock(&monc->mutex);
-       if (!monc->want_next_osdmap)
-               monc->want_next_osdmap = 1;
-       if (monc->want_next_osdmap < 2)
-               __send_subscribe(monc);
-       mutex_unlock(&monc->mutex);
-}
-
-/*
- *
- */
-int ceph_monc_open_session(struct ceph_mon_client *monc)
-{
-       if (!monc->con) {
-               monc->con = kmalloc(sizeof(*monc->con), GFP_KERNEL);
-               if (!monc->con)
-                       return -ENOMEM;
-               ceph_con_init(monc->client->msgr, monc->con);
-               monc->con->private = monc;
-               monc->con->ops = &mon_con_ops;
-       }
-
-       mutex_lock(&monc->mutex);
-       __open_session(monc);
-       __schedule_delayed(monc);
-       mutex_unlock(&monc->mutex);
-       return 0;
-}
-
-/*
- * The monitor responds with mount ack indicate mount success.  The
- * included client ticket allows the client to talk to MDSs and OSDs.
- */
-static void ceph_monc_handle_map(struct ceph_mon_client *monc,
-                                struct ceph_msg *msg)
-{
-       struct ceph_client *client = monc->client;
-       struct ceph_monmap *monmap = NULL, *old = monc->monmap;
-       void *p, *end;
-
-       mutex_lock(&monc->mutex);
-
-       dout("handle_monmap\n");
-       p = msg->front.iov_base;
-       end = p + msg->front.iov_len;
-
-       monmap = ceph_monmap_decode(p, end);
-       if (IS_ERR(monmap)) {
-               pr_err("problem decoding monmap, %d\n",
-                      (int)PTR_ERR(monmap));
-               goto out;
-       }
-
-       if (ceph_check_fsid(monc->client, &monmap->fsid) < 0) {
-               kfree(monmap);
-               goto out;
-       }
-
-       client->monc.monmap = monmap;
-       kfree(old);
-
-out:
-       mutex_unlock(&monc->mutex);
-       wake_up_all(&client->auth_wq);
-}
-
-/*
- * generic requests (e.g., statfs, poolop)
- */
-static struct ceph_mon_generic_request *__lookup_generic_req(
-       struct ceph_mon_client *monc, u64 tid)
-{
-       struct ceph_mon_generic_request *req;
-       struct rb_node *n = monc->generic_request_tree.rb_node;
-
-       while (n) {
-               req = rb_entry(n, struct ceph_mon_generic_request, node);
-               if (tid < req->tid)
-                       n = n->rb_left;
-               else if (tid > req->tid)
-                       n = n->rb_right;
-               else
-                       return req;
-       }
-       return NULL;
-}
-
-static void __insert_generic_request(struct ceph_mon_client *monc,
-                           struct ceph_mon_generic_request *new)
-{
-       struct rb_node **p = &monc->generic_request_tree.rb_node;
-       struct rb_node *parent = NULL;
-       struct ceph_mon_generic_request *req = NULL;
-
-       while (*p) {
-               parent = *p;
-               req = rb_entry(parent, struct ceph_mon_generic_request, node);
-               if (new->tid < req->tid)
-                       p = &(*p)->rb_left;
-               else if (new->tid > req->tid)
-                       p = &(*p)->rb_right;
-               else
-                       BUG();
-       }
-
-       rb_link_node(&new->node, parent, p);
-       rb_insert_color(&new->node, &monc->generic_request_tree);
-}
-
-static void release_generic_request(struct kref *kref)
-{
-       struct ceph_mon_generic_request *req =
-               container_of(kref, struct ceph_mon_generic_request, kref);
-
-       if (req->reply)
-               ceph_msg_put(req->reply);
-       if (req->request)
-               ceph_msg_put(req->request);
-
-       kfree(req);
-}
-
-static void put_generic_request(struct ceph_mon_generic_request *req)
-{
-       kref_put(&req->kref, release_generic_request);
-}
-
-static void get_generic_request(struct ceph_mon_generic_request *req)
-{
-       kref_get(&req->kref);
-}
-
-static struct ceph_msg *get_generic_reply(struct ceph_connection *con,
-                                        struct ceph_msg_header *hdr,
-                                        int *skip)
-{
-       struct ceph_mon_client *monc = con->private;
-       struct ceph_mon_generic_request *req;
-       u64 tid = le64_to_cpu(hdr->tid);
-       struct ceph_msg *m;
-
-       mutex_lock(&monc->mutex);
-       req = __lookup_generic_req(monc, tid);
-       if (!req) {
-               dout("get_generic_reply %lld dne\n", tid);
-               *skip = 1;
-               m = NULL;
-       } else {
-               dout("get_generic_reply %lld got %p\n", tid, req->reply);
-               m = ceph_msg_get(req->reply);
-               /*
-                * we don't need to track the connection reading into
-                * this reply because we only have one open connection
-                * at a time, ever.
-                */
-       }
-       mutex_unlock(&monc->mutex);
-       return m;
-}
-
-static int do_generic_request(struct ceph_mon_client *monc,
-                             struct ceph_mon_generic_request *req)
-{
-       int err;
-
-       /* register request */
-       mutex_lock(&monc->mutex);
-       req->tid = ++monc->last_tid;
-       req->request->hdr.tid = cpu_to_le64(req->tid);
-       __insert_generic_request(monc, req);
-       monc->num_generic_requests++;
-       ceph_con_send(monc->con, ceph_msg_get(req->request));
-       mutex_unlock(&monc->mutex);
-
-       err = wait_for_completion_interruptible(&req->completion);
-
-       mutex_lock(&monc->mutex);
-       rb_erase(&req->node, &monc->generic_request_tree);
-       monc->num_generic_requests--;
-       mutex_unlock(&monc->mutex);
-
-       if (!err)
-               err = req->result;
-       return err;
-}
-
-/*
- * statfs
- */
-static void handle_statfs_reply(struct ceph_mon_client *monc,
-                               struct ceph_msg *msg)
-{
-       struct ceph_mon_generic_request *req;
-       struct ceph_mon_statfs_reply *reply = msg->front.iov_base;
-       u64 tid = le64_to_cpu(msg->hdr.tid);
-
-       if (msg->front.iov_len != sizeof(*reply))
-               goto bad;
-       dout("handle_statfs_reply %p tid %llu\n", msg, tid);
-
-       mutex_lock(&monc->mutex);
-       req = __lookup_generic_req(monc, tid);
-       if (req) {
-               *(struct ceph_statfs *)req->buf = reply->st;
-               req->result = 0;
-               get_generic_request(req);
-       }
-       mutex_unlock(&monc->mutex);
-       if (req) {
-               complete_all(&req->completion);
-               put_generic_request(req);
-       }
-       return;
-
-bad:
-       pr_err("corrupt generic reply, tid %llu\n", tid);
-       ceph_msg_dump(msg);
-}
-
-/*
- * Do a synchronous statfs().
- */
-int ceph_monc_do_statfs(struct ceph_mon_client *monc, struct ceph_statfs *buf)
-{
-       struct ceph_mon_generic_request *req;
-       struct ceph_mon_statfs *h;
-       int err;
-
-       req = kzalloc(sizeof(*req), GFP_NOFS);
-       if (!req)
-               return -ENOMEM;
-
-       kref_init(&req->kref);
-       req->buf = buf;
-       req->buf_len = sizeof(*buf);
-       init_completion(&req->completion);
-
-       err = -ENOMEM;
-       req->request = ceph_msg_new(CEPH_MSG_STATFS, sizeof(*h), GFP_NOFS);
-       if (!req->request)
-               goto out;
-       req->reply = ceph_msg_new(CEPH_MSG_STATFS_REPLY, 1024, GFP_NOFS);
-       if (!req->reply)
-               goto out;
-
-       /* fill out request */
-       h = req->request->front.iov_base;
-       h->monhdr.have_version = 0;
-       h->monhdr.session_mon = cpu_to_le16(-1);
-       h->monhdr.session_mon_tid = 0;
-       h->fsid = monc->monmap->fsid;
-
-       err = do_generic_request(monc, req);
-
-out:
-       kref_put(&req->kref, release_generic_request);
-       return err;
-}
-
-/*
- * pool ops
- */
-static int get_poolop_reply_buf(const char *src, size_t src_len,
-                               char *dst, size_t dst_len)
-{
-       u32 buf_len;
-
-       if (src_len != sizeof(u32) + dst_len)
-               return -EINVAL;
-
-       buf_len = le32_to_cpu(*(u32 *)src);
-       if (buf_len != dst_len)
-               return -EINVAL;
-
-       memcpy(dst, src + sizeof(u32), dst_len);
-       return 0;
-}
-
-static void handle_poolop_reply(struct ceph_mon_client *monc,
-                               struct ceph_msg *msg)
-{
-       struct ceph_mon_generic_request *req;
-       struct ceph_mon_poolop_reply *reply = msg->front.iov_base;
-       u64 tid = le64_to_cpu(msg->hdr.tid);
-
-       if (msg->front.iov_len < sizeof(*reply))
-               goto bad;
-       dout("handle_poolop_reply %p tid %llu\n", msg, tid);
-
-       mutex_lock(&monc->mutex);
-       req = __lookup_generic_req(monc, tid);
-       if (req) {
-               if (req->buf_len &&
-                   get_poolop_reply_buf(msg->front.iov_base + sizeof(*reply),
-                                    msg->front.iov_len - sizeof(*reply),
-                                    req->buf, req->buf_len) < 0) {
-                       mutex_unlock(&monc->mutex);
-                       goto bad;
-               }
-               req->result = le32_to_cpu(reply->reply_code);
-               get_generic_request(req);
-       }
-       mutex_unlock(&monc->mutex);
-       if (req) {
-               complete(&req->completion);
-               put_generic_request(req);
-       }
-       return;
-
-bad:
-       pr_err("corrupt generic reply, tid %llu\n", tid);
-       ceph_msg_dump(msg);
-}
-
-/*
- * Do a synchronous pool op.
- */
-int ceph_monc_do_poolop(struct ceph_mon_client *monc, u32 op,
-                       u32 pool, u64 snapid,
-                       char *buf, int len)
-{
-       struct ceph_mon_generic_request *req;
-       struct ceph_mon_poolop *h;
-       int err;
-
-       req = kzalloc(sizeof(*req), GFP_NOFS);
-       if (!req)
-               return -ENOMEM;
-
-       kref_init(&req->kref);
-       req->buf = buf;
-       req->buf_len = len;
-       init_completion(&req->completion);
-
-       err = -ENOMEM;
-       req->request = ceph_msg_new(CEPH_MSG_POOLOP, sizeof(*h), GFP_NOFS);
-       if (!req->request)
-               goto out;
-       req->reply = ceph_msg_new(CEPH_MSG_POOLOP_REPLY, 1024, GFP_NOFS);
-       if (!req->reply)
-               goto out;
-
-       /* fill out request */
-       req->request->hdr.version = cpu_to_le16(2);
-       h = req->request->front.iov_base;
-       h->monhdr.have_version = 0;
-       h->monhdr.session_mon = cpu_to_le16(-1);
-       h->monhdr.session_mon_tid = 0;
-       h->fsid = monc->monmap->fsid;
-       h->pool = cpu_to_le32(pool);
-       h->op = cpu_to_le32(op);
-       h->auid = 0;
-       h->snapid = cpu_to_le64(snapid);
-       h->name_len = 0;
-
-       err = do_generic_request(monc, req);
-
-out:
-       kref_put(&req->kref, release_generic_request);
-       return err;
-}
-
-int ceph_monc_create_snapid(struct ceph_mon_client *monc,
-                           u32 pool, u64 *snapid)
-{
-       return ceph_monc_do_poolop(monc,  POOL_OP_CREATE_UNMANAGED_SNAP,
-                                  pool, 0, (char *)snapid, sizeof(*snapid));
-
-}
-
-int ceph_monc_delete_snapid(struct ceph_mon_client *monc,
-                           u32 pool, u64 snapid)
-{
-       return ceph_monc_do_poolop(monc,  POOL_OP_CREATE_UNMANAGED_SNAP,
-                                  pool, snapid, 0, 0);
-
-}
-
-/*
- * Resend pending generic requests.
- */
-static void __resend_generic_request(struct ceph_mon_client *monc)
-{
-       struct ceph_mon_generic_request *req;
-       struct rb_node *p;
-
-       for (p = rb_first(&monc->generic_request_tree); p; p = rb_next(p)) {
-               req = rb_entry(p, struct ceph_mon_generic_request, node);
-               ceph_con_revoke(monc->con, req->request);
-               ceph_con_send(monc->con, ceph_msg_get(req->request));
-       }
-}
-
-/*
- * Delayed work.  If we haven't mounted yet, retry.  Otherwise,
- * renew/retry subscription as needed (in case it is timing out, or we
- * got an ENOMEM).  And keep the monitor connection alive.
- */
-static void delayed_work(struct work_struct *work)
-{
-       struct ceph_mon_client *monc =
-               container_of(work, struct ceph_mon_client, delayed_work.work);
-
-       dout("monc delayed_work\n");
-       mutex_lock(&monc->mutex);
-       if (monc->hunting) {
-               __close_session(monc);
-               __open_session(monc);  /* continue hunting */
-       } else {
-               ceph_con_keepalive(monc->con);
-
-               __validate_auth(monc);
-
-               if (monc->auth->ops->is_authenticated(monc->auth))
-                       __send_subscribe(monc);
-       }
-       __schedule_delayed(monc);
-       mutex_unlock(&monc->mutex);
-}
-
-/*
- * On startup, we build a temporary monmap populated with the IPs
- * provided by mount(2).
- */
-static int build_initial_monmap(struct ceph_mon_client *monc)
-{
-       struct ceph_mount_args *args = monc->client->mount_args;
-       struct ceph_entity_addr *mon_addr = args->mon_addr;
-       int num_mon = args->num_mon;
-       int i;
-
-       /* build initial monmap */
-       monc->monmap = kzalloc(sizeof(*monc->monmap) +
-                              num_mon*sizeof(monc->monmap->mon_inst[0]),
-                              GFP_KERNEL);
-       if (!monc->monmap)
-               return -ENOMEM;
-       for (i = 0; i < num_mon; i++) {
-               monc->monmap->mon_inst[i].addr = mon_addr[i];
-               monc->monmap->mon_inst[i].addr.nonce = 0;
-               monc->monmap->mon_inst[i].name.type =
-                       CEPH_ENTITY_TYPE_MON;
-               monc->monmap->mon_inst[i].name.num = cpu_to_le64(i);
-       }
-       monc->monmap->num_mon = num_mon;
-       monc->have_fsid = false;
-
-       /* release addr memory */
-       kfree(args->mon_addr);
-       args->mon_addr = NULL;
-       args->num_mon = 0;
-       return 0;
-}
-
-int ceph_monc_init(struct ceph_mon_client *monc, struct ceph_client *cl)
-{
-       int err = 0;
-
-       dout("init\n");
-       memset(monc, 0, sizeof(*monc));
-       monc->client = cl;
-       monc->monmap = NULL;
-       mutex_init(&monc->mutex);
-
-       err = build_initial_monmap(monc);
-       if (err)
-               goto out;
-
-       monc->con = NULL;
-
-       /* authentication */
-       monc->auth = ceph_auth_init(cl->mount_args->name,
-                                   cl->mount_args->secret);
-       if (IS_ERR(monc->auth))
-               return PTR_ERR(monc->auth);
-       monc->auth->want_keys =
-               CEPH_ENTITY_TYPE_AUTH | CEPH_ENTITY_TYPE_MON |
-               CEPH_ENTITY_TYPE_OSD | CEPH_ENTITY_TYPE_MDS;
-
-       /* msgs */
-       err = -ENOMEM;
-       monc->m_subscribe_ack = ceph_msg_new(CEPH_MSG_MON_SUBSCRIBE_ACK,
-                                    sizeof(struct ceph_mon_subscribe_ack),
-                                    GFP_NOFS);
-       if (!monc->m_subscribe_ack)
-               goto out_monmap;
-
-       monc->m_subscribe = ceph_msg_new(CEPH_MSG_MON_SUBSCRIBE, 96, GFP_NOFS);
-       if (!monc->m_subscribe)
-               goto out_subscribe_ack;
-
-       monc->m_auth_reply = ceph_msg_new(CEPH_MSG_AUTH_REPLY, 4096, GFP_NOFS);
-       if (!monc->m_auth_reply)
-               goto out_subscribe;
-
-       monc->m_auth = ceph_msg_new(CEPH_MSG_AUTH, 4096, GFP_NOFS);
-       monc->pending_auth = 0;
-       if (!monc->m_auth)
-               goto out_auth_reply;
-
-       monc->cur_mon = -1;
-       monc->hunting = true;
-       monc->sub_renew_after = jiffies;
-       monc->sub_sent = 0;
-
-       INIT_DELAYED_WORK(&monc->delayed_work, delayed_work);
-       monc->generic_request_tree = RB_ROOT;
-       monc->num_generic_requests = 0;
-       monc->last_tid = 0;
-
-       monc->have_mdsmap = 0;
-       monc->have_osdmap = 0;
-       monc->want_next_osdmap = 1;
-       return 0;
-
-out_auth_reply:
-       ceph_msg_put(monc->m_auth_reply);
-out_subscribe:
-       ceph_msg_put(monc->m_subscribe);
-out_subscribe_ack:
-       ceph_msg_put(monc->m_subscribe_ack);
-out_monmap:
-       kfree(monc->monmap);
-out:
-       return err;
-}
-
-void ceph_monc_stop(struct ceph_mon_client *monc)
-{
-       dout("stop\n");
-       cancel_delayed_work_sync(&monc->delayed_work);
-
-       mutex_lock(&monc->mutex);
-       __close_session(monc);
-       if (monc->con) {
-               monc->con->private = NULL;
-               monc->con->ops->put(monc->con);
-               monc->con = NULL;
-       }
-       mutex_unlock(&monc->mutex);
-
-       ceph_auth_destroy(monc->auth);
-
-       ceph_msg_put(monc->m_auth);
-       ceph_msg_put(monc->m_auth_reply);
-       ceph_msg_put(monc->m_subscribe);
-       ceph_msg_put(monc->m_subscribe_ack);
-
-       kfree(monc->monmap);
-}
-
-static void handle_auth_reply(struct ceph_mon_client *monc,
-                             struct ceph_msg *msg)
-{
-       int ret;
-       int was_auth = 0;
-
-       mutex_lock(&monc->mutex);
-       if (monc->auth->ops)
-               was_auth = monc->auth->ops->is_authenticated(monc->auth);
-       monc->pending_auth = 0;
-       ret = ceph_handle_auth_reply(monc->auth, msg->front.iov_base,
-                                    msg->front.iov_len,
-                                    monc->m_auth->front.iov_base,
-                                    monc->m_auth->front_max);
-       if (ret < 0) {
-               monc->client->auth_err = ret;
-               wake_up_all(&monc->client->auth_wq);
-       } else if (ret > 0) {
-               __send_prepared_auth_request(monc, ret);
-       } else if (!was_auth && monc->auth->ops->is_authenticated(monc->auth)) {
-               dout("authenticated, starting session\n");
-
-               monc->client->msgr->inst.name.type = CEPH_ENTITY_TYPE_CLIENT;
-               monc->client->msgr->inst.name.num =
-                                       cpu_to_le64(monc->auth->global_id);
-
-               __send_subscribe(monc);
-               __resend_generic_request(monc);
-       }
-       mutex_unlock(&monc->mutex);
-}
-
-static int __validate_auth(struct ceph_mon_client *monc)
-{
-       int ret;
-
-       if (monc->pending_auth)
-               return 0;
-
-       ret = ceph_build_auth(monc->auth, monc->m_auth->front.iov_base,
-                             monc->m_auth->front_max);
-       if (ret <= 0)
-               return ret; /* either an error, or no need to authenticate */
-       __send_prepared_auth_request(monc, ret);
-       return 0;
-}
-
-int ceph_monc_validate_auth(struct ceph_mon_client *monc)
-{
-       int ret;
-
-       mutex_lock(&monc->mutex);
-       ret = __validate_auth(monc);
-       mutex_unlock(&monc->mutex);
-       return ret;
-}
-
-/*
- * handle incoming message
- */
-static void dispatch(struct ceph_connection *con, struct ceph_msg *msg)
-{
-       struct ceph_mon_client *monc = con->private;
-       int type = le16_to_cpu(msg->hdr.type);
-
-       if (!monc)
-               return;
-
-       switch (type) {
-       case CEPH_MSG_AUTH_REPLY:
-               handle_auth_reply(monc, msg);
-               break;
-
-       case CEPH_MSG_MON_SUBSCRIBE_ACK:
-               handle_subscribe_ack(monc, msg);
-               break;
-
-       case CEPH_MSG_STATFS_REPLY:
-               handle_statfs_reply(monc, msg);
-               break;
-
-       case CEPH_MSG_POOLOP_REPLY:
-               handle_poolop_reply(monc, msg);
-               break;
-
-       case CEPH_MSG_MON_MAP:
-               ceph_monc_handle_map(monc, msg);
-               break;
-
-       case CEPH_MSG_MDS_MAP:
-               ceph_mdsc_handle_map(&monc->client->mdsc, msg);
-               break;
-
-       case CEPH_MSG_OSD_MAP:
-               ceph_osdc_handle_map(&monc->client->osdc, msg);
-               break;
-
-       default:
-               pr_err("received unknown message type %d %s\n", type,
-                      ceph_msg_type_name(type));
-       }
-       ceph_msg_put(msg);
-}
-
-/*
- * Allocate memory for incoming message
- */
-static struct ceph_msg *mon_alloc_msg(struct ceph_connection *con,
-                                     struct ceph_msg_header *hdr,
-                                     int *skip)
-{
-       struct ceph_mon_client *monc = con->private;
-       int type = le16_to_cpu(hdr->type);
-       int front_len = le32_to_cpu(hdr->front_len);
-       struct ceph_msg *m = NULL;
-
-       *skip = 0;
-
-       switch (type) {
-       case CEPH_MSG_MON_SUBSCRIBE_ACK:
-               m = ceph_msg_get(monc->m_subscribe_ack);
-               break;
-       case CEPH_MSG_POOLOP_REPLY:
-       case CEPH_MSG_STATFS_REPLY:
-               return get_generic_reply(con, hdr, skip);
-       case CEPH_MSG_AUTH_REPLY:
-               m = ceph_msg_get(monc->m_auth_reply);
-               break;
-       case CEPH_MSG_MON_MAP:
-       case CEPH_MSG_MDS_MAP:
-       case CEPH_MSG_OSD_MAP:
-               m = ceph_msg_new(type, front_len, GFP_NOFS);
-               break;
-       }
-
-       if (!m) {
-               pr_info("alloc_msg unknown type %d\n", type);
-               *skip = 1;
-       }
-       return m;
-}
-
-/*
- * If the monitor connection resets, pick a new monitor and resubmit
- * any pending requests.
- */
-static void mon_fault(struct ceph_connection *con)
-{
-       struct ceph_mon_client *monc = con->private;
-
-       if (!monc)
-               return;
-
-       dout("mon_fault\n");
-       mutex_lock(&monc->mutex);
-       if (!con->private)
-               goto out;
-
-       if (monc->con && !monc->hunting)
-               pr_info("mon%d %s session lost, "
-                       "hunting for new mon\n", monc->cur_mon,
-                       pr_addr(&monc->con->peer_addr.in_addr));
-
-       __close_session(monc);
-       if (!monc->hunting) {
-               /* start hunting */
-               monc->hunting = true;
-               __open_session(monc);
-       } else {
-               /* already hunting, let's wait a bit */
-               __schedule_delayed(monc);
-       }
-out:
-       mutex_unlock(&monc->mutex);
-}
-
-static const struct ceph_connection_operations mon_con_ops = {
-       .get = ceph_con_get,
-       .put = ceph_con_put,
-       .dispatch = dispatch,
-       .fault = mon_fault,
-       .alloc_msg = mon_alloc_msg,
-};
diff --git a/fs/ceph/mon_client.h b/fs/ceph/mon_client.h
deleted file mode 100644
index 8e396f2..0000000
--- a/fs/ceph/mon_client.h
+++ /dev/null
@@ -1,121 +0,0 @@
-#ifndef _FS_CEPH_MON_CLIENT_H
-#define _FS_CEPH_MON_CLIENT_H
-
-#include <linux/completion.h>
-#include <linux/kref.h>
-#include <linux/rbtree.h>
-
-#include "messenger.h"
-
-struct ceph_client;
-struct ceph_mount_args;
-struct ceph_auth_client;
-
-/*
- * The monitor map enumerates the set of all monitors.
- */
-struct ceph_monmap {
-       struct ceph_fsid fsid;
-       u32 epoch;
-       u32 num_mon;
-       struct ceph_entity_inst mon_inst[0];
-};
-
-struct ceph_mon_client;
-struct ceph_mon_generic_request;
-
-
-/*
- * Generic mechanism for resending monitor requests.
- */
-typedef void (*ceph_monc_request_func_t)(struct ceph_mon_client *monc,
-                                        int newmon);
-
-/* a pending monitor request */
-struct ceph_mon_request {
-       struct ceph_mon_client *monc;
-       struct delayed_work delayed_work;
-       unsigned long delay;
-       ceph_monc_request_func_t do_request;
-};
-
-/*
- * ceph_mon_generic_request is being used for the statfs and poolop requests
- * which are bening done a bit differently because we need to get data back
- * to the caller
- */
-struct ceph_mon_generic_request {
-       struct kref kref;
-       u64 tid;
-       struct rb_node node;
-       int result;
-       void *buf;
-       int buf_len;
-       struct completion completion;
-       struct ceph_msg *request;  /* original request */
-       struct ceph_msg *reply;    /* and reply */
-};
-
-struct ceph_mon_client {
-       struct ceph_client *client;
-       struct ceph_monmap *monmap;
-
-       struct mutex mutex;
-       struct delayed_work delayed_work;
-
-       struct ceph_auth_client *auth;
-       struct ceph_msg *m_auth, *m_auth_reply, *m_subscribe, *m_subscribe_ack;
-       int pending_auth;
-
-       bool hunting;
-       int cur_mon;                       /* last monitor i contacted */
-       unsigned long sub_sent, sub_renew_after;
-       struct ceph_connection *con;
-       bool have_fsid;
-
-       /* pending generic requests */
-       struct rb_root generic_request_tree;
-       int num_generic_requests;
-       u64 last_tid;
-
-       /* mds/osd map */
-       int want_next_osdmap; /* 1 = want, 2 = want+asked */
-       u32 have_osdmap, have_mdsmap;
-
-#ifdef CONFIG_DEBUG_FS
-       struct dentry *debugfs_file;
-#endif
-};
-
-extern struct ceph_monmap *ceph_monmap_decode(void *p, void *end);
-extern int ceph_monmap_contains(struct ceph_monmap *m,
-                               struct ceph_entity_addr *addr);
-
-extern int ceph_monc_init(struct ceph_mon_client *monc, struct ceph_client *cl);
-extern void ceph_monc_stop(struct ceph_mon_client *monc);
-
-/*
- * The model here is to indicate that we need a new map of at least
- * epoch @want, and also call in when we receive a map.  We will
- * periodically rerequest the map from the monitor cluster until we
- * get what we want.
- */
-extern int ceph_monc_got_mdsmap(struct ceph_mon_client *monc, u32 have);
-extern int ceph_monc_got_osdmap(struct ceph_mon_client *monc, u32 have);
-
-extern void ceph_monc_request_next_osdmap(struct ceph_mon_client *monc);
-
-extern int ceph_monc_do_statfs(struct ceph_mon_client *monc,
-                              struct ceph_statfs *buf);
-
-extern int ceph_monc_open_session(struct ceph_mon_client *monc);
-
-extern int ceph_monc_validate_auth(struct ceph_mon_client *monc);
-
-extern int ceph_monc_create_snapid(struct ceph_mon_client *monc,
-                                  u32 pool, u64 *snapid);
-
-extern int ceph_monc_delete_snapid(struct ceph_mon_client *monc,
-                                  u32 pool, u64 snapid);
-
-#endif
diff --git a/fs/ceph/msgpool.c b/fs/ceph/msgpool.c
deleted file mode 100644
index dd65a64..0000000
--- a/fs/ceph/msgpool.c
+++ /dev/null
@@ -1,64 +0,0 @@
-#include "ceph_debug.h"
-
-#include <linux/err.h>
-#include <linux/sched.h>
-#include <linux/types.h>
-#include <linux/vmalloc.h>
-
-#include "msgpool.h"
-
-static void *alloc_fn(gfp_t gfp_mask, void *arg)
-{
-       struct ceph_msgpool *pool = arg;
-       void *p;
-
-       p = ceph_msg_new(0, pool->front_len, gfp_mask);
-       if (!p)
-               pr_err("msgpool %s alloc failed\n", pool->name);
-       return p;
-}
-
-static void free_fn(void *element, void *arg)
-{
-       ceph_msg_put(element);
-}
-
-int ceph_msgpool_init(struct ceph_msgpool *pool,
-                     int front_len, int size, bool blocking, const char *name)
-{
-       pool->front_len = front_len;
-       pool->pool = mempool_create(size, alloc_fn, free_fn, pool);
-       if (!pool->pool)
-               return -ENOMEM;
-       pool->name = name;
-       return 0;
-}
-
-void ceph_msgpool_destroy(struct ceph_msgpool *pool)
-{
-       mempool_destroy(pool->pool);
-}
-
-struct ceph_msg *ceph_msgpool_get(struct ceph_msgpool *pool,
-                                 int front_len)
-{
-       if (front_len > pool->front_len) {
-               pr_err("msgpool_get pool %s need front %d, pool size is %d\n",
-                      pool->name, front_len, pool->front_len);
-               WARN_ON(1);
-
-               /* try to alloc a fresh message */
-               return ceph_msg_new(0, front_len, GFP_NOFS);
-       }
-
-       return mempool_alloc(pool->pool, GFP_NOFS);
-}
-
-void ceph_msgpool_put(struct ceph_msgpool *pool, struct ceph_msg *msg)
-{
-       /* reset msg front_len; user may have changed it */
-       msg->front.iov_len = pool->front_len;
-       msg->hdr.front_len = cpu_to_le32(pool->front_len);
-
-       kref_init(&msg->kref);  /* retake single ref */
-}
diff --git a/fs/ceph/msgpool.h b/fs/ceph/msgpool.h
deleted file mode 100644
index a362605..0000000
--- a/fs/ceph/msgpool.h
+++ /dev/null
@@ -1,25 +0,0 @@
-#ifndef _FS_CEPH_MSGPOOL
-#define _FS_CEPH_MSGPOOL
-
-#include <linux/mempool.h>
-#include "messenger.h"
-
-/*
- * we use memory pools for preallocating messages we may receive, to
- * avoid unexpected OOM conditions.
- */
-struct ceph_msgpool {
-       const char *name;
-       mempool_t *pool;
-       int front_len;          /* preallocated payload size */
-};
-
-extern int ceph_msgpool_init(struct ceph_msgpool *pool,
-                            int front_len, int size, bool blocking,
-                            const char *name);
-extern void ceph_msgpool_destroy(struct ceph_msgpool *pool);
-extern struct ceph_msg *ceph_msgpool_get(struct ceph_msgpool *,
-                                        int front_len);
-extern void ceph_msgpool_put(struct ceph_msgpool *, struct ceph_msg *);
-
-#endif
diff --git a/fs/ceph/msgr.h b/fs/ceph/msgr.h
deleted file mode 100644
index 680d3d6..0000000
--- a/fs/ceph/msgr.h
+++ /dev/null
@@ -1,175 +0,0 @@
-#ifndef CEPH_MSGR_H
-#define CEPH_MSGR_H
-
-/*
- * Data types for message passing layer used by Ceph.
- */
-
-#define CEPH_MON_PORT    6789  /* default monitor port */
-
-/*
- * client-side processes will try to bind to ports in this
- * range, simply for the benefit of tools like nmap or wireshark
- * that would like to identify the protocol.
- */
-#define CEPH_PORT_FIRST  6789
-#define CEPH_PORT_START  6800  /* non-monitors start here */
-#define CEPH_PORT_LAST   6900
-
-/*
- * tcp connection banner.  include a protocol version. and adjust
- * whenever the wire protocol changes.  try to keep this string length
- * constant.
- */
-#define CEPH_BANNER "ceph v027"
-#define CEPH_BANNER_MAX_LEN 30
-
-
-/*
- * Rollover-safe type and comparator for 32-bit sequence numbers.
- * Comparator returns -1, 0, or 1.
- */
-typedef __u32 ceph_seq_t;
-
-static inline __s32 ceph_seq_cmp(__u32 a, __u32 b)
-{
-       return (__s32)a - (__s32)b;
-}
-
-
-/*
- * entity_name -- logical name for a process participating in the
- * network, e.g. 'mds0' or 'osd3'.
- */
-struct ceph_entity_name {
-       __u8 type;      /* CEPH_ENTITY_TYPE_* */
-       __le64 num;
-} __attribute__ ((packed));
-
-#define CEPH_ENTITY_TYPE_MON    0x01
-#define CEPH_ENTITY_TYPE_MDS    0x02
-#define CEPH_ENTITY_TYPE_OSD    0x04
-#define CEPH_ENTITY_TYPE_CLIENT 0x08
-#define CEPH_ENTITY_TYPE_AUTH   0x20
-
-#define CEPH_ENTITY_TYPE_ANY    0xFF
-
-extern const char *ceph_entity_type_name(int type);
-
-/*
- * entity_addr -- network address
- */
-struct ceph_entity_addr {
-       __le32 type;
-       __le32 nonce;  /* unique id for process (e.g. pid) */
-       struct sockaddr_storage in_addr;
-} __attribute__ ((packed));
-
-struct ceph_entity_inst {
-       struct ceph_entity_name name;
-       struct ceph_entity_addr addr;
-} __attribute__ ((packed));
-
-
-/* used by message exchange protocol */
-#define CEPH_MSGR_TAG_READY         1  /* server->client: ready for messages */
-#define CEPH_MSGR_TAG_RESETSESSION  2  /* server->client: reset, try again */
-#define CEPH_MSGR_TAG_WAIT          3  /* server->client: wait for racing
-                                         incoming connection */
-#define CEPH_MSGR_TAG_RETRY_SESSION 4  /* server->client + cseq: try again
-                                         with higher cseq */
-#define CEPH_MSGR_TAG_RETRY_GLOBAL  5  /* server->client + gseq: try again
-                                         with higher gseq */
-#define CEPH_MSGR_TAG_CLOSE         6  /* closing pipe */
-#define CEPH_MSGR_TAG_MSG           7  /* message */
-#define CEPH_MSGR_TAG_ACK           8  /* message ack */
-#define CEPH_MSGR_TAG_KEEPALIVE     9  /* just a keepalive byte! */
-#define CEPH_MSGR_TAG_BADPROTOVER  10  /* bad protocol version */
-#define CEPH_MSGR_TAG_BADAUTHORIZER 11 /* bad authorizer */
-#define CEPH_MSGR_TAG_FEATURES      12 /* insufficient features */
-
-
-/*
- * connection negotiation
- */
-struct ceph_msg_connect {
-       __le64 features;     /* supported feature bits */
-       __le32 host_type;    /* CEPH_ENTITY_TYPE_* */
-       __le32 global_seq;   /* count connections initiated by this host */
-       __le32 connect_seq;  /* count connections initiated in this session */
-       __le32 protocol_version;
-       __le32 authorizer_protocol;
-       __le32 authorizer_len;
-       __u8  flags;         /* CEPH_MSG_CONNECT_* */
-} __attribute__ ((packed));
-
-struct ceph_msg_connect_reply {
-       __u8 tag;
-       __le64 features;     /* feature bits for this session */
-       __le32 global_seq;
-       __le32 connect_seq;
-       __le32 protocol_version;
-       __le32 authorizer_len;
-       __u8 flags;
-} __attribute__ ((packed));
-
-#define CEPH_MSG_CONNECT_LOSSY  1  /* messages i send may be safely dropped */
-
-
-/*
- * message header
- */
-struct ceph_msg_header_old {
-       __le64 seq;       /* message seq# for this session */
-       __le64 tid;       /* transaction id */
-       __le16 type;      /* message type */
-       __le16 priority;  /* priority.  higher value == higher priority */
-       __le16 version;   /* version of message encoding */
-
-       __le32 front_len; /* bytes in main payload */
-       __le32 middle_len;/* bytes in middle payload */
-       __le32 data_len;  /* bytes of data payload */
-       __le16 data_off;  /* sender: include full offset;
-                            receiver: mask against ~PAGE_MASK */
-
-       struct ceph_entity_inst src, orig_src;
-       __le32 reserved;
-       __le32 crc;       /* header crc32c */
-} __attribute__ ((packed));
-
-struct ceph_msg_header {
-       __le64 seq;       /* message seq# for this session */
-       __le64 tid;       /* transaction id */
-       __le16 type;      /* message type */
-       __le16 priority;  /* priority.  higher value == higher priority */
-       __le16 version;   /* version of message encoding */
-
-       __le32 front_len; /* bytes in main payload */
-       __le32 middle_len;/* bytes in middle payload */
-       __le32 data_len;  /* bytes of data payload */
-       __le16 data_off;  /* sender: include full offset;
-                            receiver: mask against ~PAGE_MASK */
-
-       struct ceph_entity_name src;
-       __le32 reserved;
-       __le32 crc;       /* header crc32c */
-} __attribute__ ((packed));
-
-#define CEPH_MSG_PRIO_LOW     64
-#define CEPH_MSG_PRIO_DEFAULT 127
-#define CEPH_MSG_PRIO_HIGH    196
-#define CEPH_MSG_PRIO_HIGHEST 255
-
-/*
- * follows data payload
- */
-struct ceph_msg_footer {
-       __le32 front_crc, middle_crc, data_crc;
-       __u8 flags;
-} __attribute__ ((packed));
-
-#define CEPH_MSG_FOOTER_COMPLETE  (1<<0)   /* msg wasn't aborted */
-#define CEPH_MSG_FOOTER_NOCRC     (1<<1)   /* no data crc */
-
-
-#endif
diff --git a/fs/ceph/osd_client.c b/fs/ceph/osd_client.c
deleted file mode 100644
index dfced1d..0000000
--- a/fs/ceph/osd_client.c
+++ /dev/null
@@ -1,1539 +0,0 @@
-#include "ceph_debug.h"
-
-#include <linux/err.h>
-#include <linux/highmem.h>
-#include <linux/mm.h>
-#include <linux/pagemap.h>
-#include <linux/slab.h>
-#include <linux/uaccess.h>
-
-#include "super.h"
-#include "osd_client.h"
-#include "messenger.h"
-#include "decode.h"
-#include "auth.h"
-
-#define OSD_OP_FRONT_LEN       4096
-#define OSD_OPREPLY_FRONT_LEN  512
-
-static const struct ceph_connection_operations osd_con_ops;
-static int __kick_requests(struct ceph_osd_client *osdc,
-                         struct ceph_osd *kickosd);
-
-static void kick_requests(struct ceph_osd_client *osdc, struct ceph_osd *osd);
-
-/*
- * Implement client access to distributed object storage cluster.
- *
- * All data objects are stored within a cluster/cloud of OSDs, or
- * "object storage devices."  (Note that Ceph OSDs have _nothing_ to
- * do with the T10 OSD extensions to SCSI.)  Ceph OSDs are simply
- * remote daemons serving up and coordinating consistent and safe
- * access to storage.
- *
- * Cluster membership and the mapping of data objects onto storage devices
- * are described by the osd map.
- *
- * We keep track of pending OSD requests (read, write), resubmit
- * requests to different OSDs when the cluster topology/data layout
- * change, or retry the affected requests when the communications
- * channel with an OSD is reset.
- */
-
-/*
- * calculate the mapping of a file extent onto an object, and fill out the
- * request accordingly.  shorten extent as necessary if it crosses an
- * object boundary.
- *
- * fill osd op in request message.
- */
-static void calc_layout(struct ceph_osd_client *osdc,
-                       struct ceph_vino vino, struct ceph_file_layout *layout,
-                       u64 off, u64 *plen,
-                       struct ceph_osd_request *req)
-{
-       struct ceph_osd_request_head *reqhead = req->r_request->front.iov_base;
-       struct ceph_osd_op *op = (void *)(reqhead + 1);
-       u64 orig_len = *plen;
-       u64 objoff, objlen;    /* extent in object */
-       u64 bno;
-
-       reqhead->snapid = cpu_to_le64(vino.snap);
-
-       /* object extent? */
-       ceph_calc_file_object_mapping(layout, off, plen, &bno,
-                                     &objoff, &objlen);
-       if (*plen < orig_len)
-               dout(" skipping last %llu, final file extent %llu~%llu\n",
-                    orig_len - *plen, off, *plen);
-
-       sprintf(req->r_oid, "%llx.%08llx", vino.ino, bno);
-       req->r_oid_len = strlen(req->r_oid);
-
-       op->extent.offset = cpu_to_le64(objoff);
-       op->extent.length = cpu_to_le64(objlen);
-       req->r_num_pages = calc_pages_for(off, *plen);
-
-       dout("calc_layout %s (%d) %llu~%llu (%d pages)\n",
-            req->r_oid, req->r_oid_len, objoff, objlen, req->r_num_pages);
-}
-
-/*
- * requests
- */
-void ceph_osdc_release_request(struct kref *kref)
-{
-       struct ceph_osd_request *req = container_of(kref,
-                                                   struct ceph_osd_request,
-                                                   r_kref);
-
-       if (req->r_request)
-               ceph_msg_put(req->r_request);
-       if (req->r_reply)
-               ceph_msg_put(req->r_reply);
-       if (req->r_con_filling_msg) {
-               dout("release_request revoking pages %p from con %p\n",
-                    req->r_pages, req->r_con_filling_msg);
-               ceph_con_revoke_message(req->r_con_filling_msg,
-                                     req->r_reply);
-               ceph_con_put(req->r_con_filling_msg);
-       }
-       if (req->r_own_pages)
-               ceph_release_page_vector(req->r_pages,
-                                        req->r_num_pages);
-       ceph_put_snap_context(req->r_snapc);
-       if (req->r_mempool)
-               mempool_free(req, req->r_osdc->req_mempool);
-       else
-               kfree(req);
-}
-
-/*
- * build new request AND message, calculate layout, and adjust file
- * extent as needed.
- *
- * if the file was recently truncated, we include information about its
- * old and new size so that the object can be updated appropriately.  (we
- * avoid synchronously deleting truncated objects because it's slow.)
- *
- * if @do_sync, include a 'startsync' command so that the osd will flush
- * data quickly.
- */
-struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
-                                              struct ceph_file_layout *layout,
-                                              struct ceph_vino vino,
-                                              u64 off, u64 *plen,
-                                              int opcode, int flags,
-                                              struct ceph_snap_context *snapc,
-                                              int do_sync,
-                                              u32 truncate_seq,
-                                              u64 truncate_size,
-                                              struct timespec *mtime,
-                                              bool use_mempool, int num_reply)
-{
-       struct ceph_osd_request *req;
-       struct ceph_msg *msg;
-       struct ceph_osd_request_head *head;
-       struct ceph_osd_op *op;
-       void *p;
-       int num_op = 1 + do_sync;
-       size_t msg_size = sizeof(*head) + num_op*sizeof(*op);
-       int i;
-
-       if (use_mempool) {
-               req = mempool_alloc(osdc->req_mempool, GFP_NOFS);
-               memset(req, 0, sizeof(*req));
-       } else {
-               req = kzalloc(sizeof(*req), GFP_NOFS);
-       }
-       if (req == NULL)
-               return NULL;
-
-       req->r_osdc = osdc;
-       req->r_mempool = use_mempool;
-       kref_init(&req->r_kref);
-       init_completion(&req->r_completion);
-       init_completion(&req->r_safe_completion);
-       INIT_LIST_HEAD(&req->r_unsafe_item);
-       req->r_flags = flags;
-
-       WARN_ON((flags & (CEPH_OSD_FLAG_READ|CEPH_OSD_FLAG_WRITE)) == 0);
-
-       /* create reply message */
-       if (use_mempool)
-               msg = ceph_msgpool_get(&osdc->msgpool_op_reply, 0);
-       else
-               msg = ceph_msg_new(CEPH_MSG_OSD_OPREPLY,
-                                  OSD_OPREPLY_FRONT_LEN, GFP_NOFS);
-       if (!msg) {
-               ceph_osdc_put_request(req);
-               return NULL;
-       }
-       req->r_reply = msg;
-
-       /* create request message; allow space for oid */
-       msg_size += 40;
-       if (snapc)
-               msg_size += sizeof(u64) * snapc->num_snaps;
-       if (use_mempool)
-               msg = ceph_msgpool_get(&osdc->msgpool_op, 0);
-       else
-               msg = ceph_msg_new(CEPH_MSG_OSD_OP, msg_size, GFP_NOFS);
-       if (!msg) {
-               ceph_osdc_put_request(req);
-               return NULL;
-       }
-       msg->hdr.type = cpu_to_le16(CEPH_MSG_OSD_OP);
-       memset(msg->front.iov_base, 0, msg->front.iov_len);
-       head = msg->front.iov_base;
-       op = (void *)(head + 1);
-       p = (void *)(op + num_op);
-
-       req->r_request = msg;
-       req->r_snapc = ceph_get_snap_context(snapc);
-
-       head->client_inc = cpu_to_le32(1); /* always, for now. */
-       head->flags = cpu_to_le32(flags);
-       if (flags & CEPH_OSD_FLAG_WRITE)
-               ceph_encode_timespec(&head->mtime, mtime);
-       head->num_ops = cpu_to_le16(num_op);
-       op->op = cpu_to_le16(opcode);
-
-       /* calculate max write size */
-       calc_layout(osdc, vino, layout, off, plen, req);
-       req->r_file_layout = *layout;  /* keep a copy */
-
-       if (flags & CEPH_OSD_FLAG_WRITE) {
-               req->r_request->hdr.data_off = cpu_to_le16(off);
-               req->r_request->hdr.data_len = cpu_to_le32(*plen);
-               op->payload_len = cpu_to_le32(*plen);
-       }
-       op->extent.truncate_size = cpu_to_le64(truncate_size);
-       op->extent.truncate_seq = cpu_to_le32(truncate_seq);
-
-       /* fill in oid */
-       head->object_len = cpu_to_le32(req->r_oid_len);
-       memcpy(p, req->r_oid, req->r_oid_len);
-       p += req->r_oid_len;
-
-       if (do_sync) {
-               op++;
-               op->op = cpu_to_le16(CEPH_OSD_OP_STARTSYNC);
-       }
-       if (snapc) {
-               head->snap_seq = cpu_to_le64(snapc->seq);
-               head->num_snaps = cpu_to_le32(snapc->num_snaps);
-               for (i = 0; i < snapc->num_snaps; i++) {
-                       put_unaligned_le64(snapc->snaps[i], p);
-                       p += sizeof(u64);
-               }
-       }
-
-       BUG_ON(p > msg->front.iov_base + msg->front.iov_len);
-       msg_size = p - msg->front.iov_base;
-       msg->front.iov_len = msg_size;
-       msg->hdr.front_len = cpu_to_le32(msg_size);
-       return req;
-}
-
-/*
- * We keep osd requests in an rbtree, sorted by ->r_tid.
- */
-static void __insert_request(struct ceph_osd_client *osdc,
-                            struct ceph_osd_request *new)
-{
-       struct rb_node **p = &osdc->requests.rb_node;
-       struct rb_node *parent = NULL;
-       struct ceph_osd_request *req = NULL;
-
-       while (*p) {
-               parent = *p;
-               req = rb_entry(parent, struct ceph_osd_request, r_node);
-               if (new->r_tid < req->r_tid)
-                       p = &(*p)->rb_left;
-               else if (new->r_tid > req->r_tid)
-                       p = &(*p)->rb_right;
-               else
-                       BUG();
-       }
-
-       rb_link_node(&new->r_node, parent, p);
-       rb_insert_color(&new->r_node, &osdc->requests);
-}
-
-static struct ceph_osd_request *__lookup_request(struct ceph_osd_client *osdc,
-                                                u64 tid)
-{
-       struct ceph_osd_request *req;
-       struct rb_node *n = osdc->requests.rb_node;
-
-       while (n) {
-               req = rb_entry(n, struct ceph_osd_request, r_node);
-               if (tid < req->r_tid)
-                       n = n->rb_left;
-               else if (tid > req->r_tid)
-                       n = n->rb_right;
-               else
-                       return req;
-       }
-       return NULL;
-}
-
-static struct ceph_osd_request *
-__lookup_request_ge(struct ceph_osd_client *osdc,
-                   u64 tid)
-{
-       struct ceph_osd_request *req;
-       struct rb_node *n = osdc->requests.rb_node;
-
-       while (n) {
-               req = rb_entry(n, struct ceph_osd_request, r_node);
-               if (tid < req->r_tid) {
-                       if (!n->rb_left)
-                               return req;
-                       n = n->rb_left;
-               } else if (tid > req->r_tid) {
-                       n = n->rb_right;
-               } else {
-                       return req;
-               }
-       }
-       return NULL;
-}
-
-
-/*
- * If the osd connection drops, we need to resubmit all requests.
- */
-static void osd_reset(struct ceph_connection *con)
-{
-       struct ceph_osd *osd = con->private;
-       struct ceph_osd_client *osdc;
-
-       if (!osd)
-               return;
-       dout("osd_reset osd%d\n", osd->o_osd);
-       osdc = osd->o_osdc;
-       down_read(&osdc->map_sem);
-       kick_requests(osdc, osd);
-       up_read(&osdc->map_sem);
-}
-
-/*
- * Track open sessions with osds.
- */
-static struct ceph_osd *create_osd(struct ceph_osd_client *osdc)
-{
-       struct ceph_osd *osd;
-
-       osd = kzalloc(sizeof(*osd), GFP_NOFS);
-       if (!osd)
-               return NULL;
-
-       atomic_set(&osd->o_ref, 1);
-       osd->o_osdc = osdc;
-       INIT_LIST_HEAD(&osd->o_requests);
-       INIT_LIST_HEAD(&osd->o_osd_lru);
-       osd->o_incarnation = 1;
-
-       ceph_con_init(osdc->client->msgr, &osd->o_con);
-       osd->o_con.private = osd;
-       osd->o_con.ops = &osd_con_ops;
-       osd->o_con.peer_name.type = CEPH_ENTITY_TYPE_OSD;
-
-       INIT_LIST_HEAD(&osd->o_keepalive_item);
-       return osd;
-}
-
-static struct ceph_osd *get_osd(struct ceph_osd *osd)
-{
-       if (atomic_inc_not_zero(&osd->o_ref)) {
-               dout("get_osd %p %d -> %d\n", osd, atomic_read(&osd->o_ref)-1,
-                    atomic_read(&osd->o_ref));
-               return osd;
-       } else {
-               dout("get_osd %p FAIL\n", osd);
-               return NULL;
-       }
-}
-
-static void put_osd(struct ceph_osd *osd)
-{
-       dout("put_osd %p %d -> %d\n", osd, atomic_read(&osd->o_ref),
-            atomic_read(&osd->o_ref) - 1);
-       if (atomic_dec_and_test(&osd->o_ref)) {
-               struct ceph_auth_client *ac = osd->o_osdc->client->monc.auth;
-
-               if (osd->o_authorizer)
-                       ac->ops->destroy_authorizer(ac, osd->o_authorizer);
-               kfree(osd);
-       }
-}
-
-/*
- * remove an osd from our map
- */
-static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
-{
-       dout("__remove_osd %p\n", osd);
-       BUG_ON(!list_empty(&osd->o_requests));
-       rb_erase(&osd->o_node, &osdc->osds);
-       list_del_init(&osd->o_osd_lru);
-       ceph_con_close(&osd->o_con);
-       put_osd(osd);
-}
-
-static void __move_osd_to_lru(struct ceph_osd_client *osdc,
-                             struct ceph_osd *osd)
-{
-       dout("__move_osd_to_lru %p\n", osd);
-       BUG_ON(!list_empty(&osd->o_osd_lru));
-       list_add_tail(&osd->o_osd_lru, &osdc->osd_lru);
-       osd->lru_ttl = jiffies + osdc->client->mount_args->osd_idle_ttl * HZ;
-}
-
-static void __remove_osd_from_lru(struct ceph_osd *osd)
-{
-       dout("__remove_osd_from_lru %p\n", osd);
-       if (!list_empty(&osd->o_osd_lru))
-               list_del_init(&osd->o_osd_lru);
-}
-
-static void remove_old_osds(struct ceph_osd_client *osdc, int remove_all)
-{
-       struct ceph_osd *osd, *nosd;
-
-       dout("__remove_old_osds %p\n", osdc);
-       mutex_lock(&osdc->request_mutex);
-       list_for_each_entry_safe(osd, nosd, &osdc->osd_lru, o_osd_lru) {
-               if (!remove_all && time_before(jiffies, osd->lru_ttl))
-                       break;
-               __remove_osd(osdc, osd);
-       }
-       mutex_unlock(&osdc->request_mutex);
-}
-
-/*
- * reset osd connect
- */
-static int __reset_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
-{
-       struct ceph_osd_request *req;
-       int ret = 0;
-
-       dout("__reset_osd %p osd%d\n", osd, osd->o_osd);
-       if (list_empty(&osd->o_requests)) {
-               __remove_osd(osdc, osd);
-       } else if (memcmp(&osdc->osdmap->osd_addr[osd->o_osd],
-                         &osd->o_con.peer_addr,
-                         sizeof(osd->o_con.peer_addr)) == 0 &&
-                  !ceph_con_opened(&osd->o_con)) {
-               dout(" osd addr hasn't changed and connection never opened,"
-                    " letting msgr retry");
-               /* touch each r_stamp for handle_timeout()'s benefit */
-               list_for_each_entry(req, &osd->o_requests, r_osd_item)
-                       req->r_stamp = jiffies;
-               ret = -EAGAIN;
-       } else {
-               ceph_con_close(&osd->o_con);
-               ceph_con_open(&osd->o_con, &osdc->osdmap->osd_addr[osd->o_osd]);
-               osd->o_incarnation++;
-       }
-       return ret;
-}
-
-static void __insert_osd(struct ceph_osd_client *osdc, struct ceph_osd *new)
-{
-       struct rb_node **p = &osdc->osds.rb_node;
-       struct rb_node *parent = NULL;
-       struct ceph_osd *osd = NULL;
-
-       while (*p) {
-               parent = *p;
-               osd = rb_entry(parent, struct ceph_osd, o_node);
-               if (new->o_osd < osd->o_osd)
-                       p = &(*p)->rb_left;
-               else if (new->o_osd > osd->o_osd)
-                       p = &(*p)->rb_right;
-               else
-                       BUG();
-       }
-
-       rb_link_node(&new->o_node, parent, p);
-       rb_insert_color(&new->o_node, &osdc->osds);
-}
-
-static struct ceph_osd *__lookup_osd(struct ceph_osd_client *osdc, int o)
-{
-       struct ceph_osd *osd;
-       struct rb_node *n = osdc->osds.rb_node;
-
-       while (n) {
-               osd = rb_entry(n, struct ceph_osd, o_node);
-               if (o < osd->o_osd)
-                       n = n->rb_left;
-               else if (o > osd->o_osd)
-                       n = n->rb_right;
-               else
-                       return osd;
-       }
-       return NULL;
-}
-
-static void __schedule_osd_timeout(struct ceph_osd_client *osdc)
-{
-       schedule_delayed_work(&osdc->timeout_work,
-                       osdc->client->mount_args->osd_keepalive_timeout * HZ);
-}
-
-static void __cancel_osd_timeout(struct ceph_osd_client *osdc)
-{
-       cancel_delayed_work(&osdc->timeout_work);
-}
-
-/*
- * Register request, assign tid.  If this is the first request, set up
- * the timeout event.
- */
-static void register_request(struct ceph_osd_client *osdc,
-                            struct ceph_osd_request *req)
-{
-       mutex_lock(&osdc->request_mutex);
-       req->r_tid = ++osdc->last_tid;
-       req->r_request->hdr.tid = cpu_to_le64(req->r_tid);
-       INIT_LIST_HEAD(&req->r_req_lru_item);
-
-       dout("register_request %p tid %lld\n", req, req->r_tid);
-       __insert_request(osdc, req);
-       ceph_osdc_get_request(req);
-       osdc->num_requests++;
-
-       if (osdc->num_requests == 1) {
-               dout(" first request, scheduling timeout\n");
-               __schedule_osd_timeout(osdc);
-       }
-       mutex_unlock(&osdc->request_mutex);
-}
-
-/*
- * called under osdc->request_mutex
- */
-static void __unregister_request(struct ceph_osd_client *osdc,
-                                struct ceph_osd_request *req)
-{
-       dout("__unregister_request %p tid %lld\n", req, req->r_tid);
-       rb_erase(&req->r_node, &osdc->requests);
-       osdc->num_requests--;
-
-       if (req->r_osd) {
-               /* make sure the original request isn't in flight. */
-               ceph_con_revoke(&req->r_osd->o_con, req->r_request);
-
-               list_del_init(&req->r_osd_item);
-               if (list_empty(&req->r_osd->o_requests))
-                       __move_osd_to_lru(osdc, req->r_osd);
-               req->r_osd = NULL;
-       }
-
-       ceph_osdc_put_request(req);
-
-       list_del_init(&req->r_req_lru_item);
-       if (osdc->num_requests == 0) {
-               dout(" no requests, canceling timeout\n");
-               __cancel_osd_timeout(osdc);
-       }
-}
-
-/*
- * Cancel a previously queued request message
- */
-static void __cancel_request(struct ceph_osd_request *req)
-{
-       if (req->r_sent) {
-               ceph_con_revoke(&req->r_osd->o_con, req->r_request);
-               req->r_sent = 0;
-       }
-       list_del_init(&req->r_req_lru_item);
-}
-
-/*
- * Pick an osd (the first 'up' osd in the pg), allocate the osd struct
- * (as needed), and set the request r_osd appropriately.  If there is
- * no up osd, set r_osd to NULL.
- *
- * Return 0 if unchanged, 1 if changed, or negative on error.
- *
- * Caller should hold map_sem for read and request_mutex.
- */
-static int __map_osds(struct ceph_osd_client *osdc,
-                     struct ceph_osd_request *req)
-{
-       struct ceph_osd_request_head *reqhead = req->r_request->front.iov_base;
-       struct ceph_pg pgid;
-       int acting[CEPH_PG_MAX_SIZE];
-       int o = -1, num = 0;
-       int err;
-
-       dout("map_osds %p tid %lld\n", req, req->r_tid);
-       err = ceph_calc_object_layout(&reqhead->layout, req->r_oid,
-                                     &req->r_file_layout, osdc->osdmap);
-       if (err)
-               return err;
-       pgid = reqhead->layout.ol_pgid;
-       req->r_pgid = pgid;
-
-       err = ceph_calc_pg_acting(osdc->osdmap, pgid, acting);
-       if (err > 0) {
-               o = acting[0];
-               num = err;
-       }
-
-       if ((req->r_osd && req->r_osd->o_osd == o &&
-            req->r_sent >= req->r_osd->o_incarnation &&
-            req->r_num_pg_osds == num &&
-            memcmp(req->r_pg_osds, acting, sizeof(acting[0])*num) == 0) ||
-           (req->r_osd == NULL && o == -1))
-               return 0;  /* no change */
-
-       dout("map_osds tid %llu pgid %d.%x osd%d (was osd%d)\n",
-            req->r_tid, le32_to_cpu(pgid.pool), le16_to_cpu(pgid.ps), o,
-            req->r_osd ? req->r_osd->o_osd : -1);
-
-       /* record full pg acting set */
-       memcpy(req->r_pg_osds, acting, sizeof(acting[0]) * num);
-       req->r_num_pg_osds = num;
-
-       if (req->r_osd) {
-               __cancel_request(req);
-               list_del_init(&req->r_osd_item);
-               req->r_osd = NULL;
-       }
-
-       req->r_osd = __lookup_osd(osdc, o);
-       if (!req->r_osd && o >= 0) {
-               err = -ENOMEM;
-               req->r_osd = create_osd(osdc);
-               if (!req->r_osd)
-                       goto out;
-
-               dout("map_osds osd %p is osd%d\n", req->r_osd, o);
-               req->r_osd->o_osd = o;
-               req->r_osd->o_con.peer_name.num = cpu_to_le64(o);
-               __insert_osd(osdc, req->r_osd);
-
-               ceph_con_open(&req->r_osd->o_con, &osdc->osdmap->osd_addr[o]);
-       }
-
-       if (req->r_osd) {
-               __remove_osd_from_lru(req->r_osd);
-               list_add(&req->r_osd_item, &req->r_osd->o_requests);
-       }
-       err = 1;   /* osd or pg changed */
-
-out:
-       return err;
-}
-
-/*
- * caller should hold map_sem (for read) and request_mutex
- */
-static int __send_request(struct ceph_osd_client *osdc,
-                         struct ceph_osd_request *req)
-{
-       struct ceph_osd_request_head *reqhead;
-       int err;
-
-       err = __map_osds(osdc, req);
-       if (err < 0)
-               return err;
-       if (req->r_osd == NULL) {
-               dout("send_request %p no up osds in pg\n", req);
-               ceph_monc_request_next_osdmap(&osdc->client->monc);
-               return 0;
-       }
-
-       dout("send_request %p tid %llu to osd%d flags %d\n",
-            req, req->r_tid, req->r_osd->o_osd, req->r_flags);
-
-       reqhead = req->r_request->front.iov_base;
-       reqhead->osdmap_epoch = cpu_to_le32(osdc->osdmap->epoch);
-       reqhead->flags |= cpu_to_le32(req->r_flags);  /* e.g., RETRY */
-       reqhead->reassert_version = req->r_reassert_version;
-
-       req->r_stamp = jiffies;
-       list_move_tail(&req->r_req_lru_item, &osdc->req_lru);
-
-       ceph_msg_get(req->r_request); /* send consumes a ref */
-       ceph_con_send(&req->r_osd->o_con, req->r_request);
-       req->r_sent = req->r_osd->o_incarnation;
-       return 0;
-}
-
-/*
- * Timeout callback, called every N seconds when 1 or more osd
- * requests has been active for more than N seconds.  When this
- * happens, we ping all OSDs with requests who have timed out to
- * ensure any communications channel reset is detected.  Reset the
- * request timeouts another N seconds in the future as we go.
- * Reschedule the timeout event another N seconds in future (unless
- * there are no open requests).
- */
-static void handle_timeout(struct work_struct *work)
-{
-       struct ceph_osd_client *osdc =
-               container_of(work, struct ceph_osd_client, timeout_work.work);
-       struct ceph_osd_request *req, *last_req = NULL;
-       struct ceph_osd *osd;
-       unsigned long timeout = osdc->client->mount_args->osd_timeout * HZ;
-       unsigned long keepalive =
-               osdc->client->mount_args->osd_keepalive_timeout * HZ;
-       unsigned long last_stamp = 0;
-       struct rb_node *p;
-       struct list_head slow_osds;
-
-       dout("timeout\n");
-       down_read(&osdc->map_sem);
-
-       ceph_monc_request_next_osdmap(&osdc->client->monc);
-
-       mutex_lock(&osdc->request_mutex);
-       for (p = rb_first(&osdc->requests); p; p = rb_next(p)) {
-               req = rb_entry(p, struct ceph_osd_request, r_node);
-
-               if (req->r_resend) {
-                       int err;
-
-                       dout("osdc resending prev failed %lld\n", req->r_tid);
-                       err = __send_request(osdc, req);
-                       if (err)
-                               dout("osdc failed again on %lld\n", req->r_tid);
-                       else
-                               req->r_resend = false;
-                       continue;
-               }
-       }
-
-       /*
-        * reset osds that appear to be _really_ unresponsive.  this
-        * is a failsafe measure.. we really shouldn't be getting to
-        * this point if the system is working properly.  the monitors
-        * should mark the osd as failed and we should find out about
-        * it from an updated osd map.
-        */
-       while (timeout && !list_empty(&osdc->req_lru)) {
-               req = list_entry(osdc->req_lru.next, struct ceph_osd_request,
-                                r_req_lru_item);
-
-               if (time_before(jiffies, req->r_stamp + timeout))
-                       break;
-
-               BUG_ON(req == last_req && req->r_stamp == last_stamp);
-               last_req = req;
-               last_stamp = req->r_stamp;
-
-               osd = req->r_osd;
-               BUG_ON(!osd);
-               pr_warning(" tid %llu timed out on osd%d, will reset osd\n",
-                          req->r_tid, osd->o_osd);
-               __kick_requests(osdc, osd);
-       }
-
-       /*
-        * ping osds that are a bit slow.  this ensures that if there
-        * is a break in the TCP connection we will notice, and reopen
-        * a connection with that osd (from the fault callback).
-        */
-       INIT_LIST_HEAD(&slow_osds);
-       list_for_each_entry(req, &osdc->req_lru, r_req_lru_item) {
-               if (time_before(jiffies, req->r_stamp + keepalive))
-                       break;
-
-               osd = req->r_osd;
-               BUG_ON(!osd);
-               dout(" tid %llu is slow, will send keepalive on osd%d\n",
-                    req->r_tid, osd->o_osd);
-               list_move_tail(&osd->o_keepalive_item, &slow_osds);
-       }
-       while (!list_empty(&slow_osds)) {
-               osd = list_entry(slow_osds.next, struct ceph_osd,
-                                o_keepalive_item);
-               list_del_init(&osd->o_keepalive_item);
-               ceph_con_keepalive(&osd->o_con);
-       }
-
-       __schedule_osd_timeout(osdc);
-       mutex_unlock(&osdc->request_mutex);
-
-       up_read(&osdc->map_sem);
-}
-
-static void handle_osds_timeout(struct work_struct *work)
-{
-       struct ceph_osd_client *osdc =
-               container_of(work, struct ceph_osd_client,
-                            osds_timeout_work.work);
-       unsigned long delay =
-               osdc->client->mount_args->osd_idle_ttl * HZ >> 2;
-
-       dout("osds timeout\n");
-       down_read(&osdc->map_sem);
-       remove_old_osds(osdc, 0);
-       up_read(&osdc->map_sem);
-
-       schedule_delayed_work(&osdc->osds_timeout_work,
-                             round_jiffies_relative(delay));
-}
-
-/*
- * handle osd op reply.  either call the callback if it is specified,
- * or do the completion to wake up the waiting thread.
- */
-static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
-                        struct ceph_connection *con)
-{
-       struct ceph_osd_reply_head *rhead = msg->front.iov_base;
-       struct ceph_osd_request *req;
-       u64 tid;
-       int numops, object_len, flags;
-       s32 result;
-
-       tid = le64_to_cpu(msg->hdr.tid);
-       if (msg->front.iov_len < sizeof(*rhead))
-               goto bad;
-       numops = le32_to_cpu(rhead->num_ops);
-       object_len = le32_to_cpu(rhead->object_len);
-       result = le32_to_cpu(rhead->result);
-       if (msg->front.iov_len != sizeof(*rhead) + object_len +
-           numops * sizeof(struct ceph_osd_op))
-               goto bad;
-       dout("handle_reply %p tid %llu result %d\n", msg, tid, (int)result);
-
-       /* lookup */
-       mutex_lock(&osdc->request_mutex);
-       req = __lookup_request(osdc, tid);
-       if (req == NULL) {
-               dout("handle_reply tid %llu dne\n", tid);
-               mutex_unlock(&osdc->request_mutex);
-               return;
-       }
-       ceph_osdc_get_request(req);
-       flags = le32_to_cpu(rhead->flags);
-
-       /*
-        * if this connection filled our message, drop our reference now, to
-        * avoid a (safe but slower) revoke later.
-        */
-       if (req->r_con_filling_msg == con && req->r_reply == msg) {
-               dout(" dropping con_filling_msg ref %p\n", con);
-               req->r_con_filling_msg = NULL;
-               ceph_con_put(con);
-       }
-
-       if (!req->r_got_reply) {
-               unsigned bytes;
-
-               req->r_result = le32_to_cpu(rhead->result);
-               bytes = le32_to_cpu(msg->hdr.data_len);
-               dout("handle_reply result %d bytes %d\n", req->r_result,
-                    bytes);
-               if (req->r_result == 0)
-                       req->r_result = bytes;
-
-               /* in case this is a write and we need to replay, */
-               req->r_reassert_version = rhead->reassert_version;
-
-               req->r_got_reply = 1;
-       } else if ((flags & CEPH_OSD_FLAG_ONDISK) == 0) {
-               dout("handle_reply tid %llu dup ack\n", tid);
-               mutex_unlock(&osdc->request_mutex);
-               goto done;
-       }
-
-       dout("handle_reply tid %llu flags %d\n", tid, flags);
-
-       /* either this is a read, or we got the safe response */
-       if (result < 0 ||
-           (flags & CEPH_OSD_FLAG_ONDISK) ||
-           ((flags & CEPH_OSD_FLAG_WRITE) == 0))
-               __unregister_request(osdc, req);
-
-       mutex_unlock(&osdc->request_mutex);
-
-       if (req->r_callback)
-               req->r_callback(req, msg);
-       else
-               complete_all(&req->r_completion);
-
-       if (flags & CEPH_OSD_FLAG_ONDISK) {
-               if (req->r_safe_callback)
-                       req->r_safe_callback(req, msg);
-               complete_all(&req->r_safe_completion);  /* fsync waiter */
-       }
-
-done:
-       ceph_osdc_put_request(req);
-       return;
-
-bad:
-       pr_err("corrupt osd_op_reply got %d %d expected %d\n",
-              (int)msg->front.iov_len, le32_to_cpu(msg->hdr.front_len),
-              (int)sizeof(*rhead));
-       ceph_msg_dump(msg);
-}
-
-
-static int __kick_requests(struct ceph_osd_client *osdc,
-                         struct ceph_osd *kickosd)
-{
-       struct ceph_osd_request *req;
-       struct rb_node *p, *n;
-       int needmap = 0;
-       int err;
-
-       dout("kick_requests osd%d\n", kickosd ? kickosd->o_osd : -1);
-       if (kickosd) {
-               err = __reset_osd(osdc, kickosd);
-               if (err == -EAGAIN)
-                       return 1;
-       } else {
-               for (p = rb_first(&osdc->osds); p; p = n) {
-                       struct ceph_osd *osd =
-                               rb_entry(p, struct ceph_osd, o_node);
-
-                       n = rb_next(p);
-                       if (!ceph_osd_is_up(osdc->osdmap, osd->o_osd) ||
-                           memcmp(&osd->o_con.peer_addr,
-                                  ceph_osd_addr(osdc->osdmap,
-                                                osd->o_osd),
-                                  sizeof(struct ceph_entity_addr)) != 0)
-                               __reset_osd(osdc, osd);
-               }
-       }
-
-       for (p = rb_first(&osdc->requests); p; p = rb_next(p)) {
-               req = rb_entry(p, struct ceph_osd_request, r_node);
-
-               if (req->r_resend) {
-                       dout(" r_resend set on tid %llu\n", req->r_tid);
-                       __cancel_request(req);
-                       goto kick;
-               }
-               if (req->r_osd && kickosd == req->r_osd) {
-                       __cancel_request(req);
-                       goto kick;
-               }
-
-               err = __map_osds(osdc, req);
-               if (err == 0)
-                       continue;  /* no change */
-               if (err < 0) {
-                       /*
-                        * FIXME: really, we should set the request
-                        * error and fail if this isn't a 'nofail'
-                        * request, but that's a fair bit more
-                        * complicated to do.  So retry!
-                        */
-                       dout(" setting r_resend on %llu\n", req->r_tid);
-                       req->r_resend = true;
-                       continue;
-               }
-               if (req->r_osd == NULL) {
-                       dout("tid %llu maps to no valid osd\n", req->r_tid);
-                       needmap++;  /* request a newer map */
-                       continue;
-               }
-
-kick:
-               dout("kicking %p tid %llu osd%d\n", req, req->r_tid,
-                    req->r_osd ? req->r_osd->o_osd : -1);
-               req->r_flags |= CEPH_OSD_FLAG_RETRY;
-               err = __send_request(osdc, req);
-               if (err) {
-                       dout(" setting r_resend on %llu\n", req->r_tid);
-                       req->r_resend = true;
-               }
-       }
-
-       return needmap;
-}
-
-/*
- * Resubmit osd requests whose osd or osd address has changed.  Request
- * a new osd map if osds are down, or we are otherwise unable to determine
- * how to direct a request.
- *
- * Close connections to down osds.
- *
- * If @kickosd is specified, resubmit requests for that specific osd.
- *
- * Caller should hold map_sem for read and request_mutex.
- */
-static void kick_requests(struct ceph_osd_client *osdc,
-                         struct ceph_osd *kickosd)
-{
-       int needmap;
-
-       mutex_lock(&osdc->request_mutex);
-       needmap = __kick_requests(osdc, kickosd);
-       mutex_unlock(&osdc->request_mutex);
-
-       if (needmap) {
-               dout("%d requests for down osds, need new map\n", needmap);
-               ceph_monc_request_next_osdmap(&osdc->client->monc);
-       }
-
-}
-/*
- * Process updated osd map.
- *
- * The message contains any number of incremental and full maps, normally
- * indicating some sort of topology change in the cluster.  Kick requests
- * off to different OSDs as needed.
- */
-void ceph_osdc_handle_map(struct ceph_osd_client *osdc, struct ceph_msg *msg)
-{
-       void *p, *end, *next;
-       u32 nr_maps, maplen;
-       u32 epoch;
-       struct ceph_osdmap *newmap = NULL, *oldmap;
-       int err;
-       struct ceph_fsid fsid;
-
-       dout("handle_map have %u\n", osdc->osdmap ? osdc->osdmap->epoch : 0);
-       p = msg->front.iov_base;
-       end = p + msg->front.iov_len;
-
-       /* verify fsid */
-       ceph_decode_need(&p, end, sizeof(fsid), bad);
-       ceph_decode_copy(&p, &fsid, sizeof(fsid));
-       if (ceph_check_fsid(osdc->client, &fsid) < 0)
-               return;
-
-       down_write(&osdc->map_sem);
-
-       /* incremental maps */
-       ceph_decode_32_safe(&p, end, nr_maps, bad);
-       dout(" %d inc maps\n", nr_maps);
-       while (nr_maps > 0) {
-               ceph_decode_need(&p, end, 2*sizeof(u32), bad);
-               epoch = ceph_decode_32(&p);
-               maplen = ceph_decode_32(&p);
-               ceph_decode_need(&p, end, maplen, bad);
-               next = p + maplen;
-               if (osdc->osdmap && osdc->osdmap->epoch+1 == epoch) {
-                       dout("applying incremental map %u len %d\n",
-                            epoch, maplen);
-                       newmap = osdmap_apply_incremental(&p, next,
-                                                         osdc->osdmap,
-                                                         osdc->client->msgr);
-                       if (IS_ERR(newmap)) {
-                               err = PTR_ERR(newmap);
-                               goto bad;
-                       }
-                       BUG_ON(!newmap);
-                       if (newmap != osdc->osdmap) {
-                               ceph_osdmap_destroy(osdc->osdmap);
-                               osdc->osdmap = newmap;
-                       }
-               } else {
-                       dout("ignoring incremental map %u len %d\n",
-                            epoch, maplen);
-               }
-               p = next;
-               nr_maps--;
-       }
-       if (newmap)
-               goto done;
-
-       /* full maps */
-       ceph_decode_32_safe(&p, end, nr_maps, bad);
-       dout(" %d full maps\n", nr_maps);
-       while (nr_maps) {
-               ceph_decode_need(&p, end, 2*sizeof(u32), bad);
-               epoch = ceph_decode_32(&p);
-               maplen = ceph_decode_32(&p);
-               ceph_decode_need(&p, end, maplen, bad);
-               if (nr_maps > 1) {
-                       dout("skipping non-latest full map %u len %d\n",
-                            epoch, maplen);
-               } else if (osdc->osdmap && osdc->osdmap->epoch >= epoch) {
-                       dout("skipping full map %u len %d, "
-                            "older than our %u\n", epoch, maplen,
-                            osdc->osdmap->epoch);
-               } else {
-                       dout("taking full map %u len %d\n", epoch, maplen);
-                       newmap = osdmap_decode(&p, p+maplen);
-                       if (IS_ERR(newmap)) {
-                               err = PTR_ERR(newmap);
-                               goto bad;
-                       }
-                       BUG_ON(!newmap);
-                       oldmap = osdc->osdmap;
-                       osdc->osdmap = newmap;
-                       if (oldmap)
-                               ceph_osdmap_destroy(oldmap);
-               }
-               p += maplen;
-               nr_maps--;
-       }
-
-done:
-       downgrade_write(&osdc->map_sem);
-       ceph_monc_got_osdmap(&osdc->client->monc, osdc->osdmap->epoch);
-       if (newmap)
-               kick_requests(osdc, NULL);
-       up_read(&osdc->map_sem);
-       wake_up_all(&osdc->client->auth_wq);
-       return;
-
-bad:
-       pr_err("osdc handle_map corrupt msg\n");
-       ceph_msg_dump(msg);
-       up_write(&osdc->map_sem);
-       return;
-}
-
-/*
- * Register request, send initial attempt.
- */
-int ceph_osdc_start_request(struct ceph_osd_client *osdc,
-                           struct ceph_osd_request *req,
-                           bool nofail)
-{
-       int rc = 0;
-
-       req->r_request->pages = req->r_pages;
-       req->r_request->nr_pages = req->r_num_pages;
-
-       register_request(osdc, req);
-
-       down_read(&osdc->map_sem);
-       mutex_lock(&osdc->request_mutex);
-       /*
-        * a racing kick_requests() may have sent the message for us
-        * while we dropped request_mutex above, so only send now if
-        * the request still han't been touched yet.
-        */
-       if (req->r_sent == 0) {
-               rc = __send_request(osdc, req);
-               if (rc) {
-                       if (nofail) {
-                               dout("osdc_start_request failed send, "
-                                    " marking %lld\n", req->r_tid);
-                               req->r_resend = true;
-                               rc = 0;
-                       } else {
-                               __unregister_request(osdc, req);
-                       }
-               }
-       }
-       mutex_unlock(&osdc->request_mutex);
-       up_read(&osdc->map_sem);
-       return rc;
-}
-
-/*
- * wait for a request to complete
- */
-int ceph_osdc_wait_request(struct ceph_osd_client *osdc,
-                          struct ceph_osd_request *req)
-{
-       int rc;
-
-       rc = wait_for_completion_interruptible(&req->r_completion);
-       if (rc < 0) {
-               mutex_lock(&osdc->request_mutex);
-               __cancel_request(req);
-               __unregister_request(osdc, req);
-               mutex_unlock(&osdc->request_mutex);
-               dout("wait_request tid %llu canceled/timed out\n", req->r_tid);
-               return rc;
-       }
-
-       dout("wait_request tid %llu result %d\n", req->r_tid, req->r_result);
-       return req->r_result;
-}
-
-/*
- * sync - wait for all in-flight requests to flush.  avoid starvation.
- */
-void ceph_osdc_sync(struct ceph_osd_client *osdc)
-{
-       struct ceph_osd_request *req;
-       u64 last_tid, next_tid = 0;
-
-       mutex_lock(&osdc->request_mutex);
-       last_tid = osdc->last_tid;
-       while (1) {
-               req = __lookup_request_ge(osdc, next_tid);
-               if (!req)
-                       break;
-               if (req->r_tid > last_tid)
-                       break;
-
-               next_tid = req->r_tid + 1;
-               if ((req->r_flags & CEPH_OSD_FLAG_WRITE) == 0)
-                       continue;
-
-               ceph_osdc_get_request(req);
-               mutex_unlock(&osdc->request_mutex);
-               dout("sync waiting on tid %llu (last is %llu)\n",
-                    req->r_tid, last_tid);
-               wait_for_completion(&req->r_safe_completion);
-               mutex_lock(&osdc->request_mutex);
-               ceph_osdc_put_request(req);
-       }
-       mutex_unlock(&osdc->request_mutex);
-       dout("sync done (thru tid %llu)\n", last_tid);
-}
-
-/*
- * init, shutdown
- */
-int ceph_osdc_init(struct ceph_osd_client *osdc, struct ceph_client *client)
-{
-       int err;
-
-       dout("init\n");
-       osdc->client = client;
-       osdc->osdmap = NULL;
-       init_rwsem(&osdc->map_sem);
-       init_completion(&osdc->map_waiters);
-       osdc->last_requested_map = 0;
-       mutex_init(&osdc->request_mutex);
-       osdc->last_tid = 0;
-       osdc->osds = RB_ROOT;
-       INIT_LIST_HEAD(&osdc->osd_lru);
-       osdc->requests = RB_ROOT;
-       INIT_LIST_HEAD(&osdc->req_lru);
-       osdc->num_requests = 0;
-       INIT_DELAYED_WORK(&osdc->timeout_work, handle_timeout);
-       INIT_DELAYED_WORK(&osdc->osds_timeout_work, handle_osds_timeout);
-
-       schedule_delayed_work(&osdc->osds_timeout_work,
-          round_jiffies_relative(osdc->client->mount_args->osd_idle_ttl * HZ));
-
-       err = -ENOMEM;
-       osdc->req_mempool = mempool_create_kmalloc_pool(10,
-                                       sizeof(struct ceph_osd_request));
-       if (!osdc->req_mempool)
-               goto out;
-
-       err = ceph_msgpool_init(&osdc->msgpool_op, OSD_OP_FRONT_LEN, 10, true,
-                               "osd_op");
-       if (err < 0)
-               goto out_mempool;
-       err = ceph_msgpool_init(&osdc->msgpool_op_reply,
-                               OSD_OPREPLY_FRONT_LEN, 10, true,
-                               "osd_op_reply");
-       if (err < 0)
-               goto out_msgpool;
-       return 0;
-
-out_msgpool:
-       ceph_msgpool_destroy(&osdc->msgpool_op);
-out_mempool:
-       mempool_destroy(osdc->req_mempool);
-out:
-       return err;
-}
-
-void ceph_osdc_stop(struct ceph_osd_client *osdc)
-{
-       cancel_delayed_work_sync(&osdc->timeout_work);
-       cancel_delayed_work_sync(&osdc->osds_timeout_work);
-       if (osdc->osdmap) {
-               ceph_osdmap_destroy(osdc->osdmap);
-               osdc->osdmap = NULL;
-       }
-       remove_old_osds(osdc, 1);
-       mempool_destroy(osdc->req_mempool);
-       ceph_msgpool_destroy(&osdc->msgpool_op);
-       ceph_msgpool_destroy(&osdc->msgpool_op_reply);
-}
-
-/*
- * Read some contiguous pages.  If we cross a stripe boundary, shorten
- * *plen.  Return number of bytes read, or error.
- */
-int ceph_osdc_readpages(struct ceph_osd_client *osdc,
-                       struct ceph_vino vino, struct ceph_file_layout *layout,
-                       u64 off, u64 *plen,
-                       u32 truncate_seq, u64 truncate_size,
-                       struct page **pages, int num_pages)
-{
-       struct ceph_osd_request *req;
-       int rc = 0;
-
-       dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
-            vino.snap, off, *plen);
-       req = ceph_osdc_new_request(osdc, layout, vino, off, plen,
-                                   CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
-                                   NULL, 0, truncate_seq, truncate_size, NULL,
-                                   false, 1);
-       if (!req)
-               return -ENOMEM;
-
-       /* it may be a short read due to an object boundary */
-       req->r_pages = pages;
-
-       dout("readpages  final extent is %llu~%llu (%d pages)\n",
-            off, *plen, req->r_num_pages);
-
-       rc = ceph_osdc_start_request(osdc, req, false);
-       if (!rc)
-               rc = ceph_osdc_wait_request(osdc, req);
-
-       ceph_osdc_put_request(req);
-       dout("readpages result %d\n", rc);
-       return rc;
-}
-
-/*
- * do a synchronous write on N pages
- */
-int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_vino vino,
-                        struct ceph_file_layout *layout,
-                        struct ceph_snap_context *snapc,
-                        u64 off, u64 len,
-                        u32 truncate_seq, u64 truncate_size,
-                        struct timespec *mtime,
-                        struct page **pages, int num_pages,
-                        int flags, int do_sync, bool nofail)
-{
-       struct ceph_osd_request *req;
-       int rc = 0;
-
-       BUG_ON(vino.snap != CEPH_NOSNAP);
-       req = ceph_osdc_new_request(osdc, layout, vino, off, &len,
-                                   CEPH_OSD_OP_WRITE,
-                                   flags | CEPH_OSD_FLAG_ONDISK |
-                                           CEPH_OSD_FLAG_WRITE,
-                                   snapc, do_sync,
-                                   truncate_seq, truncate_size, mtime,
-                                   nofail, 1);
-       if (!req)
-               return -ENOMEM;
-
-       /* it may be a short write due to an object boundary */
-       req->r_pages = pages;
-       dout("writepages %llu~%llu (%d pages)\n", off, len,
-            req->r_num_pages);
-
-       rc = ceph_osdc_start_request(osdc, req, nofail);
-       if (!rc)
-               rc = ceph_osdc_wait_request(osdc, req);
-
-       ceph_osdc_put_request(req);
-       if (rc == 0)
-               rc = len;
-       dout("writepages result %d\n", rc);
-       return rc;
-}
-
-/*
- * handle incoming message
- */
-static void dispatch(struct ceph_connection *con, struct ceph_msg *msg)
-{
-       struct ceph_osd *osd = con->private;
-       struct ceph_osd_client *osdc;
-       int type = le16_to_cpu(msg->hdr.type);
-
-       if (!osd)
-               goto out;
-       osdc = osd->o_osdc;
-
-       switch (type) {
-       case CEPH_MSG_OSD_MAP:
-               ceph_osdc_handle_map(osdc, msg);
-               break;
-       case CEPH_MSG_OSD_OPREPLY:
-               handle_reply(osdc, msg, con);
-               break;
-
-       default:
-               pr_err("received unknown message type %d %s\n", type,
-                      ceph_msg_type_name(type));
-       }
-out:
-       ceph_msg_put(msg);
-}
-
-/*
- * lookup and return message for incoming reply.  set up reply message
- * pages.
- */
-static struct ceph_msg *get_reply(struct ceph_connection *con,
-                                 struct ceph_msg_header *hdr,
-                                 int *skip)
-{
-       struct ceph_osd *osd = con->private;
-       struct ceph_osd_client *osdc = osd->o_osdc;
-       struct ceph_msg *m;
-       struct ceph_osd_request *req;
-       int front = le32_to_cpu(hdr->front_len);
-       int data_len = le32_to_cpu(hdr->data_len);
-       u64 tid;
-
-       tid = le64_to_cpu(hdr->tid);
-       mutex_lock(&osdc->request_mutex);
-       req = __lookup_request(osdc, tid);
-       if (!req) {
-               *skip = 1;
-               m = NULL;
-               pr_info("get_reply unknown tid %llu from osd%d\n", tid,
-                       osd->o_osd);
-               goto out;
-       }
-
-       if (req->r_con_filling_msg) {
-               dout("get_reply revoking msg %p from old con %p\n",
-                    req->r_reply, req->r_con_filling_msg);
-               ceph_con_revoke_message(req->r_con_filling_msg, req->r_reply);
-               ceph_con_put(req->r_con_filling_msg);
-               req->r_con_filling_msg = NULL;
-       }
-
-       if (front > req->r_reply->front.iov_len) {
-               pr_warning("get_reply front %d > preallocated %d\n",
-                          front, (int)req->r_reply->front.iov_len);
-               m = ceph_msg_new(CEPH_MSG_OSD_OPREPLY, front, GFP_NOFS);
-               if (!m)
-                       goto out;
-               ceph_msg_put(req->r_reply);
-               req->r_reply = m;
-       }
-       m = ceph_msg_get(req->r_reply);
-
-       if (data_len > 0) {
-               unsigned data_off = le16_to_cpu(hdr->data_off);
-               int want = calc_pages_for(data_off & ~PAGE_MASK, data_len);
-
-               if (unlikely(req->r_num_pages < want)) {
-                       pr_warning("tid %lld reply %d > expected %d pages\n",
-                                  tid, want, m->nr_pages);
-                       *skip = 1;
-                       ceph_msg_put(m);
-                       m = NULL;
-                       goto out;
-               }
-               m->pages = req->r_pages;
-               m->nr_pages = req->r_num_pages;
-       }
-       *skip = 0;
-       req->r_con_filling_msg = ceph_con_get(con);
-       dout("get_reply tid %lld %p\n", tid, m);
-
-out:
-       mutex_unlock(&osdc->request_mutex);
-       return m;
-
-}
-
-static struct ceph_msg *alloc_msg(struct ceph_connection *con,
-                                 struct ceph_msg_header *hdr,
-                                 int *skip)
-{
-       struct ceph_osd *osd = con->private;
-       int type = le16_to_cpu(hdr->type);
-       int front = le32_to_cpu(hdr->front_len);
-
-       switch (type) {
-       case CEPH_MSG_OSD_MAP:
-               return ceph_msg_new(type, front, GFP_NOFS);
-       case CEPH_MSG_OSD_OPREPLY:
-               return get_reply(con, hdr, skip);
-       default:
-               pr_info("alloc_msg unexpected msg type %d from osd%d\n", type,
-                       osd->o_osd);
-               *skip = 1;
-               return NULL;
-       }
-}
-
-/*
- * Wrappers to refcount containing ceph_osd struct
- */
-static struct ceph_connection *get_osd_con(struct ceph_connection *con)
-{
-       struct ceph_osd *osd = con->private;
-       if (get_osd(osd))
-               return con;
-       return NULL;
-}
-
-static void put_osd_con(struct ceph_connection *con)
-{
-       struct ceph_osd *osd = con->private;
-       put_osd(osd);
-}
-
-/*
- * authentication
- */
-static int get_authorizer(struct ceph_connection *con,
-                         void **buf, int *len, int *proto,
-                         void **reply_buf, int *reply_len, int force_new)
-{
-       struct ceph_osd *o = con->private;
-       struct ceph_osd_client *osdc = o->o_osdc;
-       struct ceph_auth_client *ac = osdc->client->monc.auth;
-       int ret = 0;
-
-       if (force_new && o->o_authorizer) {
-               ac->ops->destroy_authorizer(ac, o->o_authorizer);
-               o->o_authorizer = NULL;
-       }
-       if (o->o_authorizer == NULL) {
-               ret = ac->ops->create_authorizer(
-                       ac, CEPH_ENTITY_TYPE_OSD,
-                       &o->o_authorizer,
-                       &o->o_authorizer_buf,
-                       &o->o_authorizer_buf_len,
-                       &o->o_authorizer_reply_buf,
-                       &o->o_authorizer_reply_buf_len);
-               if (ret)
-                       return ret;
-       }
-
-       *proto = ac->protocol;
-       *buf = o->o_authorizer_buf;
-       *len = o->o_authorizer_buf_len;
-       *reply_buf = o->o_authorizer_reply_buf;
-       *reply_len = o->o_authorizer_reply_buf_len;
-       return 0;
-}
-
-
-static int verify_authorizer_reply(struct ceph_connection *con, int len)
-{
-       struct ceph_osd *o = con->private;
-       struct ceph_osd_client *osdc = o->o_osdc;
-       struct ceph_auth_client *ac = osdc->client->monc.auth;
-
-       return ac->ops->verify_authorizer_reply(ac, o->o_authorizer, len);
-}
-
-static int invalidate_authorizer(struct ceph_connection *con)
-{
-       struct ceph_osd *o = con->private;
-       struct ceph_osd_client *osdc = o->o_osdc;
-       struct ceph_auth_client *ac = osdc->client->monc.auth;
-
-       if (ac->ops->invalidate_authorizer)
-               ac->ops->invalidate_authorizer(ac, CEPH_ENTITY_TYPE_OSD);
-
-       return ceph_monc_validate_auth(&osdc->client->monc);
-}
-
-static const struct ceph_connection_operations osd_con_ops = {
-       .get = get_osd_con,
-       .put = put_osd_con,
-       .dispatch = dispatch,
-       .get_authorizer = get_authorizer,
-       .verify_authorizer_reply = verify_authorizer_reply,
-       .invalidate_authorizer = invalidate_authorizer,
-       .alloc_msg = alloc_msg,
-       .fault = osd_reset,
-};
diff --git a/fs/ceph/osd_client.h b/fs/ceph/osd_client.h
deleted file mode 100644
index ce77698..0000000
--- a/fs/ceph/osd_client.h
+++ /dev/null
@@ -1,167 +0,0 @@
-#ifndef _FS_CEPH_OSD_CLIENT_H
-#define _FS_CEPH_OSD_CLIENT_H
-
-#include <linux/completion.h>
-#include <linux/kref.h>
-#include <linux/mempool.h>
-#include <linux/rbtree.h>
-
-#include "types.h"
-#include "osdmap.h"
-#include "messenger.h"
-
-struct ceph_msg;
-struct ceph_snap_context;
-struct ceph_osd_request;
-struct ceph_osd_client;
-struct ceph_authorizer;
-
-/*
- * completion callback for async writepages
- */
-typedef void (*ceph_osdc_callback_t)(struct ceph_osd_request *,
-                                    struct ceph_msg *);
-
-/* a given osd we're communicating with */
-struct ceph_osd {
-       atomic_t o_ref;
-       struct ceph_osd_client *o_osdc;
-       int o_osd;
-       int o_incarnation;
-       struct rb_node o_node;
-       struct ceph_connection o_con;
-       struct list_head o_requests;
-       struct list_head o_osd_lru;
-       struct ceph_authorizer *o_authorizer;
-       void *o_authorizer_buf, *o_authorizer_reply_buf;
-       size_t o_authorizer_buf_len, o_authorizer_reply_buf_len;
-       unsigned long lru_ttl;
-       int o_marked_for_keepalive;
-       struct list_head o_keepalive_item;
-};
-
-/* an in-flight request */
-struct ceph_osd_request {
-       u64             r_tid;              /* unique for this client */
-       struct rb_node  r_node;
-       struct list_head r_req_lru_item;
-       struct list_head r_osd_item;
-       struct ceph_osd *r_osd;
-       struct ceph_pg   r_pgid;
-       int              r_pg_osds[CEPH_PG_MAX_SIZE];
-       int              r_num_pg_osds;
-
-       struct ceph_connection *r_con_filling_msg;
-
-       struct ceph_msg  *r_request, *r_reply;
-       int               r_result;
-       int               r_flags;     /* any additional flags for the osd */
-       u32               r_sent;      /* >0 if r_request is sending/sent */
-       int               r_got_reply;
-
-       struct ceph_osd_client *r_osdc;
-       struct kref       r_kref;
-       bool              r_mempool;
-       struct completion r_completion, r_safe_completion;
-       ceph_osdc_callback_t r_callback, r_safe_callback;
-       struct ceph_eversion r_reassert_version;
-       struct list_head  r_unsafe_item;
-
-       struct inode *r_inode;                /* for use by callbacks */
-
-       char              r_oid[40];          /* object name */
-       int               r_oid_len;
-       unsigned long     r_stamp;            /* send OR check time */
-       bool              r_resend;           /* msg send failed, needs retry */
-
-       struct ceph_file_layout r_file_layout;
-       struct ceph_snap_context *r_snapc;    /* snap context for writes */
-       unsigned          r_num_pages;        /* size of page array (follows) */
-       struct page     **r_pages;            /* pages for data payload */
-       int               r_pages_from_pool;
-       int               r_own_pages;        /* if true, i own page list */
-};
-
-struct ceph_osd_client {
-       struct ceph_client     *client;
-
-       struct ceph_osdmap     *osdmap;       /* current map */
-       struct rw_semaphore    map_sem;
-       struct completion      map_waiters;
-       u64                    last_requested_map;
-
-       struct mutex           request_mutex;
-       struct rb_root         osds;          /* osds */
-       struct list_head       osd_lru;       /* idle osds */
-       u64                    timeout_tid;   /* tid of timeout triggering rq */
-       u64                    last_tid;      /* tid of last request */
-       struct rb_root         requests;      /* pending requests */
-       struct list_head       req_lru;       /* pending requests lru */
-       int                    num_requests;
-       struct delayed_work    timeout_work;
-       struct delayed_work    osds_timeout_work;
-#ifdef CONFIG_DEBUG_FS
-       struct dentry          *debugfs_file;
-#endif
-
-       mempool_t              *req_mempool;
-
-       struct ceph_msgpool     msgpool_op;
-       struct ceph_msgpool     msgpool_op_reply;
-};
-
-extern int ceph_osdc_init(struct ceph_osd_client *osdc,
-                         struct ceph_client *client);
-extern void ceph_osdc_stop(struct ceph_osd_client *osdc);
-
-extern void ceph_osdc_handle_reply(struct ceph_osd_client *osdc,
-                                  struct ceph_msg *msg);
-extern void ceph_osdc_handle_map(struct ceph_osd_client *osdc,
-                                struct ceph_msg *msg);
-
-extern struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *,
-                                     struct ceph_file_layout *layout,
-                                     struct ceph_vino vino,
-                                     u64 offset, u64 *len, int op, int flags,
-                                     struct ceph_snap_context *snapc,
-                                     int do_sync, u32 truncate_seq,
-                                     u64 truncate_size,
-                                     struct timespec *mtime,
-                                     bool use_mempool, int num_reply);
-
-static inline void ceph_osdc_get_request(struct ceph_osd_request *req)
-{
-       kref_get(&req->r_kref);
-}
-extern void ceph_osdc_release_request(struct kref *kref);
-static inline void ceph_osdc_put_request(struct ceph_osd_request *req)
-{
-       kref_put(&req->r_kref, ceph_osdc_release_request);
-}
-
-extern int ceph_osdc_start_request(struct ceph_osd_client *osdc,
-                                  struct ceph_osd_request *req,
-                                  bool nofail);
-extern int ceph_osdc_wait_request(struct ceph_osd_client *osdc,
-                                 struct ceph_osd_request *req);
-extern void ceph_osdc_sync(struct ceph_osd_client *osdc);
-
-extern int ceph_osdc_readpages(struct ceph_osd_client *osdc,
-                              struct ceph_vino vino,
-                              struct ceph_file_layout *layout,
-                              u64 off, u64 *plen,
-                              u32 truncate_seq, u64 truncate_size,
-                              struct page **pages, int nr_pages);
-
-extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
-                               struct ceph_vino vino,
-                               struct ceph_file_layout *layout,
-                               struct ceph_snap_context *sc,
-                               u64 off, u64 len,
-                               u32 truncate_seq, u64 truncate_size,
-                               struct timespec *mtime,
-                               struct page **pages, int nr_pages,
-                               int flags, int do_sync, bool nofail);
-
-#endif
-
diff --git a/fs/ceph/osdmap.c b/fs/ceph/osdmap.c
deleted file mode 100644
index e31f118..0000000
--- a/fs/ceph/osdmap.c
+++ /dev/null
@@ -1,1110 +0,0 @@
-
-#include "ceph_debug.h"
-
-#include <linux/slab.h>
-#include <asm/div64.h>
-
-#include "super.h"
-#include "osdmap.h"
-#include "crush/hash.h"
-#include "crush/mapper.h"
-#include "decode.h"
-
-char *ceph_osdmap_state_str(char *str, int len, int state)
-{
-       int flag = 0;
-
-       if (!len)
-               goto done;
-
-       *str = '\0';
-       if (state) {
-               if (state & CEPH_OSD_EXISTS) {
-                       snprintf(str, len, "exists");
-                       flag = 1;
-               }
-               if (state & CEPH_OSD_UP) {
-                       snprintf(str, len, "%s%s%s", str, (flag ? ", " : ""),
-                                "up");
-                       flag = 1;
-               }
-       } else {
-               snprintf(str, len, "doesn't exist");
-       }
-done:
-       return str;
-}
-
-/* maps */
-
-static int calc_bits_of(unsigned t)
-{
-       int b = 0;
-       while (t) {
-               t = t >> 1;
-               b++;
-       }
-       return b;
-}
-
-/*
- * the foo_mask is the smallest value 2^n-1 that is >= foo.
- */
-static void calc_pg_masks(struct ceph_pg_pool_info *pi)
-{
-       pi->pg_num_mask = (1 << calc_bits_of(le32_to_cpu(pi->v.pg_num)-1)) - 1;
-       pi->pgp_num_mask =
-               (1 << calc_bits_of(le32_to_cpu(pi->v.pgp_num)-1)) - 1;
-       pi->lpg_num_mask =
-               (1 << calc_bits_of(le32_to_cpu(pi->v.lpg_num)-1)) - 1;
-       pi->lpgp_num_mask =
-               (1 << calc_bits_of(le32_to_cpu(pi->v.lpgp_num)-1)) - 1;
-}
-
-/*
- * decode crush map
- */
-static int crush_decode_uniform_bucket(void **p, void *end,
-                                      struct crush_bucket_uniform *b)
-{
-       dout("crush_decode_uniform_bucket %p to %p\n", *p, end);
-       ceph_decode_need(p, end, (1+b->h.size) * sizeof(u32), bad);
-       b->item_weight = ceph_decode_32(p);
-       return 0;
-bad:
-       return -EINVAL;
-}
-
-static int crush_decode_list_bucket(void **p, void *end,
-                                   struct crush_bucket_list *b)
-{
-       int j;
-       dout("crush_decode_list_bucket %p to %p\n", *p, end);
-       b->item_weights = kcalloc(b->h.size, sizeof(u32), GFP_NOFS);
-       if (b->item_weights == NULL)
-               return -ENOMEM;
-       b->sum_weights = kcalloc(b->h.size, sizeof(u32), GFP_NOFS);
-       if (b->sum_weights == NULL)
-               return -ENOMEM;
-       ceph_decode_need(p, end, 2 * b->h.size * sizeof(u32), bad);
-       for (j = 0; j < b->h.size; j++) {
-               b->item_weights[j] = ceph_decode_32(p);
-               b->sum_weights[j] = ceph_decode_32(p);
-       }
-       return 0;
-bad:
-       return -EINVAL;
-}
-
-static int crush_decode_tree_bucket(void **p, void *end,
-                                   struct crush_bucket_tree *b)
-{
-       int j;
-       dout("crush_decode_tree_bucket %p to %p\n", *p, end);
-       ceph_decode_32_safe(p, end, b->num_nodes, bad);
-       b->node_weights = kcalloc(b->num_nodes, sizeof(u32), GFP_NOFS);
-       if (b->node_weights == NULL)
-               return -ENOMEM;
-       ceph_decode_need(p, end, b->num_nodes * sizeof(u32), bad);
-       for (j = 0; j < b->num_nodes; j++)
-               b->node_weights[j] = ceph_decode_32(p);
-       return 0;
-bad:
-       return -EINVAL;
-}
-
-static int crush_decode_straw_bucket(void **p, void *end,
-                                    struct crush_bucket_straw *b)
-{
-       int j;
-       dout("crush_decode_straw_bucket %p to %p\n", *p, end);
-       b->item_weights = kcalloc(b->h.size, sizeof(u32), GFP_NOFS);
-       if (b->item_weights == NULL)
-               return -ENOMEM;
-       b->straws = kcalloc(b->h.size, sizeof(u32), GFP_NOFS);
-       if (b->straws == NULL)
-               return -ENOMEM;
-       ceph_decode_need(p, end, 2 * b->h.size * sizeof(u32), bad);
-       for (j = 0; j < b->h.size; j++) {
-               b->item_weights[j] = ceph_decode_32(p);
-               b->straws[j] = ceph_decode_32(p);
-       }
-       return 0;
-bad:
-       return -EINVAL;
-}
-
-static struct crush_map *crush_decode(void *pbyval, void *end)
-{
-       struct crush_map *c;
-       int err = -EINVAL;
-       int i, j;
-       void **p = &pbyval;
-       void *start = pbyval;
-       u32 magic;
-
-       dout("crush_decode %p to %p len %d\n", *p, end, (int)(end - *p));
-
-       c = kzalloc(sizeof(*c), GFP_NOFS);
-       if (c == NULL)
-               return ERR_PTR(-ENOMEM);
-
-       ceph_decode_need(p, end, 4*sizeof(u32), bad);
-       magic = ceph_decode_32(p);
-       if (magic != CRUSH_MAGIC) {
-               pr_err("crush_decode magic %x != current %x\n",
-                      (unsigned)magic, (unsigned)CRUSH_MAGIC);
-               goto bad;
-       }
-       c->max_buckets = ceph_decode_32(p);
-       c->max_rules = ceph_decode_32(p);
-       c->max_devices = ceph_decode_32(p);
-
-       c->device_parents = kcalloc(c->max_devices, sizeof(u32), GFP_NOFS);
-       if (c->device_parents == NULL)
-               goto badmem;
-       c->bucket_parents = kcalloc(c->max_buckets, sizeof(u32), GFP_NOFS);
-       if (c->bucket_parents == NULL)
-               goto badmem;
-
-       c->buckets = kcalloc(c->max_buckets, sizeof(*c->buckets), GFP_NOFS);
-       if (c->buckets == NULL)
-               goto badmem;
-       c->rules = kcalloc(c->max_rules, sizeof(*c->rules), GFP_NOFS);
-       if (c->rules == NULL)
-               goto badmem;
-
-       /* buckets */
-       for (i = 0; i < c->max_buckets; i++) {
-               int size = 0;
-               u32 alg;
-               struct crush_bucket *b;
-
-               ceph_decode_32_safe(p, end, alg, bad);
-               if (alg == 0) {
-                       c->buckets[i] = NULL;
-                       continue;
-               }
-               dout("crush_decode bucket %d off %x %p to %p\n",
-                    i, (int)(*p-start), *p, end);
-
-               switch (alg) {
-               case CRUSH_BUCKET_UNIFORM:
-                       size = sizeof(struct crush_bucket_uniform);
-                       break;
-               case CRUSH_BUCKET_LIST:
-                       size = sizeof(struct crush_bucket_list);
-                       break;
-               case CRUSH_BUCKET_TREE:
-                       size = sizeof(struct crush_bucket_tree);
-                       break;
-               case CRUSH_BUCKET_STRAW:
-                       size = sizeof(struct crush_bucket_straw);
-                       break;
-               default:
-                       err = -EINVAL;
-                       goto bad;
-               }
-               BUG_ON(size == 0);
-               b = c->buckets[i] = kzalloc(size, GFP_NOFS);
-               if (b == NULL)
-                       goto badmem;
-
-               ceph_decode_need(p, end, 4*sizeof(u32), bad);
-               b->id = ceph_decode_32(p);
-               b->type = ceph_decode_16(p);
-               b->alg = ceph_decode_8(p);
-               b->hash = ceph_decode_8(p);
-               b->weight = ceph_decode_32(p);
-               b->size = ceph_decode_32(p);
-
-               dout("crush_decode bucket size %d off %x %p to %p\n",
-                    b->size, (int)(*p-start), *p, end);
-
-               b->items = kcalloc(b->size, sizeof(__s32), GFP_NOFS);
-               if (b->items == NULL)
-                       goto badmem;
-               b->perm = kcalloc(b->size, sizeof(u32), GFP_NOFS);
-               if (b->perm == NULL)
-                       goto badmem;
-               b->perm_n = 0;
-
-               ceph_decode_need(p, end, b->size*sizeof(u32), bad);
-               for (j = 0; j < b->size; j++)
-                       b->items[j] = ceph_decode_32(p);
-
-               switch (b->alg) {
-               case CRUSH_BUCKET_UNIFORM:
-                       err = crush_decode_uniform_bucket(p, end,
-                                 (struct crush_bucket_uniform *)b);
-                       if (err < 0)
-                               goto bad;
-                       break;
-               case CRUSH_BUCKET_LIST:
-                       err = crush_decode_list_bucket(p, end,
-                              (struct crush_bucket_list *)b);
-                       if (err < 0)
-                               goto bad;
-                       break;
-               case CRUSH_BUCKET_TREE:
-                       err = crush_decode_tree_bucket(p, end,
-                               (struct crush_bucket_tree *)b);
-                       if (err < 0)
-                               goto bad;
-                       break;
-               case CRUSH_BUCKET_STRAW:
-                       err = crush_decode_straw_bucket(p, end,
-                               (struct crush_bucket_straw *)b);
-                       if (err < 0)
-                               goto bad;
-                       break;
-               }
-       }
-
-       /* rules */
-       dout("rule vec is %p\n", c->rules);
-       for (i = 0; i < c->max_rules; i++) {
-               u32 yes;
-               struct crush_rule *r;
-
-               ceph_decode_32_safe(p, end, yes, bad);
-               if (!yes) {
-                       dout("crush_decode NO rule %d off %x %p to %p\n",
-                            i, (int)(*p-start), *p, end);
-                       c->rules[i] = NULL;
-                       continue;
-               }
-
-               dout("crush_decode rule %d off %x %p to %p\n",
-                    i, (int)(*p-start), *p, end);
-
-               /* len */
-               ceph_decode_32_safe(p, end, yes, bad);
-#if BITS_PER_LONG == 32
-               err = -EINVAL;
-               if (yes > ULONG_MAX / sizeof(struct crush_rule_step))
-                       goto bad;
-#endif
-               r = c->rules[i] = kmalloc(sizeof(*r) +
-                                         yes*sizeof(struct crush_rule_step),
-                                         GFP_NOFS);
-               if (r == NULL)
-                       goto badmem;
-               dout(" rule %d is at %p\n", i, r);
-               r->len = yes;
-               ceph_decode_copy_safe(p, end, &r->mask, 4, bad); /* 4 u8's */
-               ceph_decode_need(p, end, r->len*3*sizeof(u32), bad);
-               for (j = 0; j < r->len; j++) {
-                       r->steps[j].op = ceph_decode_32(p);
-                       r->steps[j].arg1 = ceph_decode_32(p);
-                       r->steps[j].arg2 = ceph_decode_32(p);
-               }
-       }
-
-       /* ignore trailing name maps. */
-
-       dout("crush_decode success\n");
-       return c;
-
-badmem:
-       err = -ENOMEM;
-bad:
-       dout("crush_decode fail %d\n", err);
-       crush_destroy(c);
-       return ERR_PTR(err);
-}
-
-/*
- * rbtree of pg_mapping for handling pg_temp (explicit mapping of pgid
- * to a set of osds)
- */
-static int pgid_cmp(struct ceph_pg l, struct ceph_pg r)
-{
-       u64 a = *(u64 *)&l;
-       u64 b = *(u64 *)&r;
-
-       if (a < b)
-               return -1;
-       if (a > b)
-               return 1;
-       return 0;
-}
-
-static int __insert_pg_mapping(struct ceph_pg_mapping *new,
-                              struct rb_root *root)
-{
-       struct rb_node **p = &root->rb_node;
-       struct rb_node *parent = NULL;
-       struct ceph_pg_mapping *pg = NULL;
-       int c;
-
-       while (*p) {
-               parent = *p;
-               pg = rb_entry(parent, struct ceph_pg_mapping, node);
-               c = pgid_cmp(new->pgid, pg->pgid);
-               if (c < 0)
-                       p = &(*p)->rb_left;
-               else if (c > 0)
-                       p = &(*p)->rb_right;
-               else
-                       return -EEXIST;
-       }
-
-       rb_link_node(&new->node, parent, p);
-       rb_insert_color(&new->node, root);
-       return 0;
-}
-
-static struct ceph_pg_mapping *__lookup_pg_mapping(struct rb_root *root,
-                                                  struct ceph_pg pgid)
-{
-       struct rb_node *n = root->rb_node;
-       struct ceph_pg_mapping *pg;
-       int c;
-
-       while (n) {
-               pg = rb_entry(n, struct ceph_pg_mapping, node);
-               c = pgid_cmp(pgid, pg->pgid);
-               if (c < 0)
-                       n = n->rb_left;
-               else if (c > 0)
-                       n = n->rb_right;
-               else
-                       return pg;
-       }
-       return NULL;
-}
-
-/*
- * rbtree of pg pool info
- */
-static int __insert_pg_pool(struct rb_root *root, struct ceph_pg_pool_info *new)
-{
-       struct rb_node **p = &root->rb_node;
-       struct rb_node *parent = NULL;
-       struct ceph_pg_pool_info *pi = NULL;
-
-       while (*p) {
-               parent = *p;
-               pi = rb_entry(parent, struct ceph_pg_pool_info, node);
-               if (new->id < pi->id)
-                       p = &(*p)->rb_left;
-               else if (new->id > pi->id)
-                       p = &(*p)->rb_right;
-               else
-                       return -EEXIST;
-       }
-
-       rb_link_node(&new->node, parent, p);
-       rb_insert_color(&new->node, root);
-       return 0;
-}
-
-static struct ceph_pg_pool_info *__lookup_pg_pool(struct rb_root *root, int id)
-{
-       struct ceph_pg_pool_info *pi;
-       struct rb_node *n = root->rb_node;
-
-       while (n) {
-               pi = rb_entry(n, struct ceph_pg_pool_info, node);
-               if (id < pi->id)
-                       n = n->rb_left;
-               else if (id > pi->id)
-                       n = n->rb_right;
-               else
-                       return pi;
-       }
-       return NULL;
-}
-
-static void __remove_pg_pool(struct rb_root *root, struct ceph_pg_pool_info *pi)
-{
-       rb_erase(&pi->node, root);
-       kfree(pi->name);
-       kfree(pi);
-}
-
-static int __decode_pool(void **p, void *end, struct ceph_pg_pool_info *pi)
-{
-       unsigned n, m;
-
-       ceph_decode_copy(p, &pi->v, sizeof(pi->v));
-       calc_pg_masks(pi);
-
-       /* num_snaps * snap_info_t */
-       n = le32_to_cpu(pi->v.num_snaps);
-       while (n--) {
-               ceph_decode_need(p, end, sizeof(u64) + 1 + sizeof(u64) +
-                                sizeof(struct ceph_timespec), bad);
-               *p += sizeof(u64) +       /* key */
-                       1 + sizeof(u64) + /* u8, snapid */
-                       sizeof(struct ceph_timespec);
-               m = ceph_decode_32(p);    /* snap name */
-               *p += m;
-       }
-
-       *p += le32_to_cpu(pi->v.num_removed_snap_intervals) * sizeof(u64) * 2;
-       return 0;
-
-bad:
-       return -EINVAL;
-}
-
-static int __decode_pool_names(void **p, void *end, struct ceph_osdmap *map)
-{
-       struct ceph_pg_pool_info *pi;
-       u32 num, len, pool;
-
-       ceph_decode_32_safe(p, end, num, bad);
-       dout(" %d pool names\n", num);
-       while (num--) {
-               ceph_decode_32_safe(p, end, pool, bad);
-               ceph_decode_32_safe(p, end, len, bad);
-               dout("  pool %d len %d\n", pool, len);
-               pi = __lookup_pg_pool(&map->pg_pools, pool);
-               if (pi) {
-                       kfree(pi->name);
-                       pi->name = kmalloc(len + 1, GFP_NOFS);
-                       if (pi->name) {
-                               memcpy(pi->name, *p, len);
-                               pi->name[len] = '\0';
-                               dout("  name is %s\n", pi->name);
-                       }
-               }
-               *p += len;
-       }
-       return 0;
-
-bad:
-       return -EINVAL;
-}
-
-/*
- * osd map
- */
-void ceph_osdmap_destroy(struct ceph_osdmap *map)
-{
-       dout("osdmap_destroy %p\n", map);
-       if (map->crush)
-               crush_destroy(map->crush);
-       while (!RB_EMPTY_ROOT(&map->pg_temp)) {
-               struct ceph_pg_mapping *pg =
-                       rb_entry(rb_first(&map->pg_temp),
-                                struct ceph_pg_mapping, node);
-               rb_erase(&pg->node, &map->pg_temp);
-               kfree(pg);
-       }
-       while (!RB_EMPTY_ROOT(&map->pg_pools)) {
-               struct ceph_pg_pool_info *pi =
-                       rb_entry(rb_first(&map->pg_pools),
-                                struct ceph_pg_pool_info, node);
-               __remove_pg_pool(&map->pg_pools, pi);
-       }
-       kfree(map->osd_state);
-       kfree(map->osd_weight);
-       kfree(map->osd_addr);
-       kfree(map);
-}
-
-/*
- * adjust max osd value.  reallocate arrays.
- */
-static int osdmap_set_max_osd(struct ceph_osdmap *map, int max)
-{
-       u8 *state;
-       struct ceph_entity_addr *addr;
-       u32 *weight;
-
-       state = kcalloc(max, sizeof(*state), GFP_NOFS);
-       addr = kcalloc(max, sizeof(*addr), GFP_NOFS);
-       weight = kcalloc(max, sizeof(*weight), GFP_NOFS);
-       if (state == NULL || addr == NULL || weight == NULL) {
-               kfree(state);
-               kfree(addr);
-               kfree(weight);
-               return -ENOMEM;
-       }
-
-       /* copy old? */
-       if (map->osd_state) {
-               memcpy(state, map->osd_state, map->max_osd*sizeof(*state));
-               memcpy(addr, map->osd_addr, map->max_osd*sizeof(*addr));
-               memcpy(weight, map->osd_weight, map->max_osd*sizeof(*weight));
-               kfree(map->osd_state);
-               kfree(map->osd_addr);
-               kfree(map->osd_weight);
-       }
-
-       map->osd_state = state;
-       map->osd_weight = weight;
-       map->osd_addr = addr;
-       map->max_osd = max;
-       return 0;
-}
-
-/*
- * decode a full map.
- */
-struct ceph_osdmap *osdmap_decode(void **p, void *end)
-{
-       struct ceph_osdmap *map;
-       u16 version;
-       u32 len, max, i;
-       u8 ev;
-       int err = -EINVAL;
-       void *start = *p;
-       struct ceph_pg_pool_info *pi;
-
-       dout("osdmap_decode %p to %p len %d\n", *p, end, (int)(end - *p));
-
-       map = kzalloc(sizeof(*map), GFP_NOFS);
-       if (map == NULL)
-               return ERR_PTR(-ENOMEM);
-       map->pg_temp = RB_ROOT;
-
-       ceph_decode_16_safe(p, end, version, bad);
-       if (version > CEPH_OSDMAP_VERSION) {
-               pr_warning("got unknown v %d > %d of osdmap\n", version,
-                          CEPH_OSDMAP_VERSION);
-               goto bad;
-       }
-
-       ceph_decode_need(p, end, 2*sizeof(u64)+6*sizeof(u32), bad);
-       ceph_decode_copy(p, &map->fsid, sizeof(map->fsid));
-       map->epoch = ceph_decode_32(p);
-       ceph_decode_copy(p, &map->created, sizeof(map->created));
-       ceph_decode_copy(p, &map->modified, sizeof(map->modified));
-
-       ceph_decode_32_safe(p, end, max, bad);
-       while (max--) {
-               ceph_decode_need(p, end, 4 + 1 + sizeof(pi->v), bad);
-               pi = kzalloc(sizeof(*pi), GFP_NOFS);
-               if (!pi)
-                       goto bad;
-               pi->id = ceph_decode_32(p);
-               ev = ceph_decode_8(p); /* encoding version */
-               if (ev > CEPH_PG_POOL_VERSION) {
-                       pr_warning("got unknown v %d > %d of ceph_pg_pool\n",
-                                  ev, CEPH_PG_POOL_VERSION);
-                       kfree(pi);
-                       goto bad;
-               }
-               err = __decode_pool(p, end, pi);
-               if (err < 0)
-                       goto bad;
-               __insert_pg_pool(&map->pg_pools, pi);
-       }
-
-       if (version >= 5 && __decode_pool_names(p, end, map) < 0)
-               goto bad;
-
-       ceph_decode_32_safe(p, end, map->pool_max, bad);
-
-       ceph_decode_32_safe(p, end, map->flags, bad);
-
-       max = ceph_decode_32(p);
-
-       /* (re)alloc osd arrays */
-       err = osdmap_set_max_osd(map, max);
-       if (err < 0)
-               goto bad;
-       dout("osdmap_decode max_osd = %d\n", map->max_osd);
-
-       /* osds */
-       err = -EINVAL;
-       ceph_decode_need(p, end, 3*sizeof(u32) +
-                        map->max_osd*(1 + sizeof(*map->osd_weight) +
-                                      sizeof(*map->osd_addr)), bad);
-       *p += 4; /* skip length field (should match max) */
-       ceph_decode_copy(p, map->osd_state, map->max_osd);
-
-       *p += 4; /* skip length field (should match max) */
-       for (i = 0; i < map->max_osd; i++)
-               map->osd_weight[i] = ceph_decode_32(p);
-
-       *p += 4; /* skip length field (should match max) */
-       ceph_decode_copy(p, map->osd_addr, map->max_osd*sizeof(*map->osd_addr));
-       for (i = 0; i < map->max_osd; i++)
-               ceph_decode_addr(&map->osd_addr[i]);
-
-       /* pg_temp */
-       ceph_decode_32_safe(p, end, len, bad);
-       for (i = 0; i < len; i++) {
-               int n, j;
-               struct ceph_pg pgid;
-               struct ceph_pg_mapping *pg;
-
-               ceph_decode_need(p, end, sizeof(u32) + sizeof(u64), bad);
-               ceph_decode_copy(p, &pgid, sizeof(pgid));
-               n = ceph_decode_32(p);
-               ceph_decode_need(p, end, n * sizeof(u32), bad);
-               err = -ENOMEM;
-               pg = kmalloc(sizeof(*pg) + n*sizeof(u32), GFP_NOFS);
-               if (!pg)
-                       goto bad;
-               pg->pgid = pgid;
-               pg->len = n;
-               for (j = 0; j < n; j++)
-                       pg->osds[j] = ceph_decode_32(p);
-
-               err = __insert_pg_mapping(pg, &map->pg_temp);
-               if (err)
-                       goto bad;
-               dout(" added pg_temp %llx len %d\n", *(u64 *)&pgid, len);
-       }
-
-       /* crush */
-       ceph_decode_32_safe(p, end, len, bad);
-       dout("osdmap_decode crush len %d from off 0x%x\n", len,
-            (int)(*p - start));
-       ceph_decode_need(p, end, len, bad);
-       map->crush = crush_decode(*p, end);
-       *p += len;
-       if (IS_ERR(map->crush)) {
-               err = PTR_ERR(map->crush);
-               map->crush = NULL;
-               goto bad;
-       }
-
-       /* ignore the rest of the map */
-       *p = end;
-
-       dout("osdmap_decode done %p %p\n", *p, end);
-       return map;
-
-bad:
-       dout("osdmap_decode fail\n");
-       ceph_osdmap_destroy(map);
-       return ERR_PTR(err);
-}
-
-/*
- * decode and apply an incremental map update.
- */
-struct ceph_osdmap *osdmap_apply_incremental(void **p, void *end,
-                                            struct ceph_osdmap *map,
-                                            struct ceph_messenger *msgr)
-{
-       struct crush_map *newcrush = NULL;
-       struct ceph_fsid fsid;
-       u32 epoch = 0;
-       struct ceph_timespec modified;
-       u32 len, pool;
-       __s32 new_pool_max, new_flags, max;
-       void *start = *p;
-       int err = -EINVAL;
-       u16 version;
-       struct rb_node *rbp;
-
-       ceph_decode_16_safe(p, end, version, bad);
-       if (version > CEPH_OSDMAP_INC_VERSION) {
-               pr_warning("got unknown v %d > %d of inc osdmap\n", version,
-                          CEPH_OSDMAP_INC_VERSION);
-               goto bad;
-       }
-
-       ceph_decode_need(p, end, sizeof(fsid)+sizeof(modified)+2*sizeof(u32),
-                        bad);
-       ceph_decode_copy(p, &fsid, sizeof(fsid));
-       epoch = ceph_decode_32(p);
-       BUG_ON(epoch != map->epoch+1);
-       ceph_decode_copy(p, &modified, sizeof(modified));
-       new_pool_max = ceph_decode_32(p);
-       new_flags = ceph_decode_32(p);
-
-       /* full map? */
-       ceph_decode_32_safe(p, end, len, bad);
-       if (len > 0) {
-               dout("apply_incremental full map len %d, %p to %p\n",
-                    len, *p, end);
-               return osdmap_decode(p, min(*p+len, end));
-       }
-
-       /* new crush? */
-       ceph_decode_32_safe(p, end, len, bad);
-       if (len > 0) {
-               dout("apply_incremental new crush map len %d, %p to %p\n",
-                    len, *p, end);
-               newcrush = crush_decode(*p, min(*p+len, end));
-               if (IS_ERR(newcrush))
-                       return ERR_CAST(newcrush);
-               *p += len;
-       }
-
-       /* new flags? */
-       if (new_flags >= 0)
-               map->flags = new_flags;
-       if (new_pool_max >= 0)
-               map->pool_max = new_pool_max;
-
-       ceph_decode_need(p, end, 5*sizeof(u32), bad);
-
-       /* new max? */
-       max = ceph_decode_32(p);
-       if (max >= 0) {
-               err = osdmap_set_max_osd(map, max);
-               if (err < 0)
-                       goto bad;
-       }
-
-       map->epoch++;
-       map->modified = map->modified;
-       if (newcrush) {
-               if (map->crush)
-                       crush_destroy(map->crush);
-               map->crush = newcrush;
-               newcrush = NULL;
-       }
-
-       /* new_pool */
-       ceph_decode_32_safe(p, end, len, bad);
-       while (len--) {
-               __u8 ev;
-               struct ceph_pg_pool_info *pi;
-
-               ceph_decode_32_safe(p, end, pool, bad);
-               ceph_decode_need(p, end, 1 + sizeof(pi->v), bad);
-               ev = ceph_decode_8(p);  /* encoding version */
-               if (ev > CEPH_PG_POOL_VERSION) {
-                       pr_warning("got unknown v %d > %d of ceph_pg_pool\n",
-                                  ev, CEPH_PG_POOL_VERSION);
-                       goto bad;
-               }
-               pi = __lookup_pg_pool(&map->pg_pools, pool);
-               if (!pi) {
-                       pi = kzalloc(sizeof(*pi), GFP_NOFS);
-                       if (!pi) {
-                               err = -ENOMEM;
-                               goto bad;
-                       }
-                       pi->id = pool;
-                       __insert_pg_pool(&map->pg_pools, pi);
-               }
-               err = __decode_pool(p, end, pi);
-               if (err < 0)
-                       goto bad;
-       }
-       if (version >= 5 && __decode_pool_names(p, end, map) < 0)
-               goto bad;
-
-       /* old_pool */
-       ceph_decode_32_safe(p, end, len, bad);
-       while (len--) {
-               struct ceph_pg_pool_info *pi;
-
-               ceph_decode_32_safe(p, end, pool, bad);
-               pi = __lookup_pg_pool(&map->pg_pools, pool);
-               if (pi)
-                       __remove_pg_pool(&map->pg_pools, pi);
-       }
-
-       /* new_up */
-       err = -EINVAL;
-       ceph_decode_32_safe(p, end, len, bad);
-       while (len--) {
-               u32 osd;
-               struct ceph_entity_addr addr;
-               ceph_decode_32_safe(p, end, osd, bad);
-               ceph_decode_copy_safe(p, end, &addr, sizeof(addr), bad);
-               ceph_decode_addr(&addr);
-               pr_info("osd%d up\n", osd);
-               BUG_ON(osd >= map->max_osd);
-               map->osd_state[osd] |= CEPH_OSD_UP;
-               map->osd_addr[osd] = addr;
-       }
-
-       /* new_down */
-       ceph_decode_32_safe(p, end, len, bad);
-       while (len--) {
-               u32 osd;
-               ceph_decode_32_safe(p, end, osd, bad);
-               (*p)++;  /* clean flag */
-               pr_info("osd%d down\n", osd);
-               if (osd < map->max_osd)
-                       map->osd_state[osd] &= ~CEPH_OSD_UP;
-       }
-
-       /* new_weight */
-       ceph_decode_32_safe(p, end, len, bad);
-       while (len--) {
-               u32 osd, off;
-               ceph_decode_need(p, end, sizeof(u32)*2, bad);
-               osd = ceph_decode_32(p);
-               off = ceph_decode_32(p);
-               pr_info("osd%d weight 0x%x %s\n", osd, off,
-                    off == CEPH_OSD_IN ? "(in)" :
-                    (off == CEPH_OSD_OUT ? "(out)" : ""));
-               if (osd < map->max_osd)
-                       map->osd_weight[osd] = off;
-       }
-
-       /* new_pg_temp */
-       rbp = rb_first(&map->pg_temp);
-       ceph_decode_32_safe(p, end, len, bad);
-       while (len--) {
-               struct ceph_pg_mapping *pg;
-               int j;
-               struct ceph_pg pgid;
-               u32 pglen;
-               ceph_decode_need(p, end, sizeof(u64) + sizeof(u32), bad);
-               ceph_decode_copy(p, &pgid, sizeof(pgid));
-               pglen = ceph_decode_32(p);
-
-               /* remove any? */
-               while (rbp && pgid_cmp(rb_entry(rbp, struct ceph_pg_mapping,
-                                               node)->pgid, pgid) <= 0) {
-                       struct ceph_pg_mapping *cur =
-                               rb_entry(rbp, struct ceph_pg_mapping, node);
-
-                       rbp = rb_next(rbp);
-                       dout(" removed pg_temp %llx\n", *(u64 *)&cur->pgid);
-                       rb_erase(&cur->node, &map->pg_temp);
-                       kfree(cur);
-               }
-
-               if (pglen) {
-                       /* insert */
-                       ceph_decode_need(p, end, pglen*sizeof(u32), bad);
-                       pg = kmalloc(sizeof(*pg) + sizeof(u32)*pglen, GFP_NOFS);
-                       if (!pg) {
-                               err = -ENOMEM;
-                               goto bad;
-                       }
-                       pg->pgid = pgid;
-                       pg->len = pglen;
-                       for (j = 0; j < pglen; j++)
-                               pg->osds[j] = ceph_decode_32(p);
-                       err = __insert_pg_mapping(pg, &map->pg_temp);
-                       if (err) {
-                               kfree(pg);
-                               goto bad;
-                       }
-                       dout(" added pg_temp %llx len %d\n", *(u64 *)&pgid,
-                            pglen);
-               }
-       }
-       while (rbp) {
-               struct ceph_pg_mapping *cur =
-                       rb_entry(rbp, struct ceph_pg_mapping, node);
-
-               rbp = rb_next(rbp);
-               dout(" removed pg_temp %llx\n", *(u64 *)&cur->pgid);
-               rb_erase(&cur->node, &map->pg_temp);
-               kfree(cur);
-       }
-
-       /* ignore the rest */
-       *p = end;
-       return map;
-
-bad:
-       pr_err("corrupt inc osdmap epoch %d off %d (%p of %p-%p)\n",
-              epoch, (int)(*p - start), *p, start, end);
-       print_hex_dump(KERN_DEBUG, "osdmap: ",
-                      DUMP_PREFIX_OFFSET, 16, 1,
-                      start, end - start, true);
-       if (newcrush)
-               crush_destroy(newcrush);
-       return ERR_PTR(err);
-}
-
-
-
-
-/*
- * calculate file layout from given offset, length.
- * fill in correct oid, logical length, and object extent
- * offset, length.
- *
- * for now, we write only a single su, until we can
- * pass a stride back to the caller.
- */
-void ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
-                                  u64 off, u64 *plen,
-                                  u64 *ono,
-                                  u64 *oxoff, u64 *oxlen)
-{
-       u32 osize = le32_to_cpu(layout->fl_object_size);
-       u32 su = le32_to_cpu(layout->fl_stripe_unit);
-       u32 sc = le32_to_cpu(layout->fl_stripe_count);
-       u32 bl, stripeno, stripepos, objsetno;
-       u32 su_per_object;
-       u64 t, su_offset;
-
-       dout("mapping %llu~%llu  osize %u fl_su %u\n", off, *plen,
-            osize, su);
-       su_per_object = osize / su;
-       dout("osize %u / su %u = su_per_object %u\n", osize, su,
-            su_per_object);
-
-       BUG_ON((su & ~PAGE_MASK) != 0);
-       /* bl = *off / su; */
-       t = off;
-       do_div(t, su);
-       bl = t;
-       dout("off %llu / su %u = bl %u\n", off, su, bl);
-
-       stripeno = bl / sc;
-       stripepos = bl % sc;
-       objsetno = stripeno / su_per_object;
-
-       *ono = objsetno * sc + stripepos;
-       dout("objset %u * sc %u = ono %u\n", objsetno, sc, (unsigned)*ono);
-
-       /* *oxoff = *off % layout->fl_stripe_unit;  # offset in su */
-       t = off;
-       su_offset = do_div(t, su);
-       *oxoff = su_offset + (stripeno % su_per_object) * su;
-
-       /*
-        * Calculate the length of the extent being written to the selected
-        * object. This is the minimum of the full length requested (plen) or
-        * the remainder of the current stripe being written to.
-        */
-       *oxlen = min_t(u64, *plen, su - su_offset);
-       *plen = *oxlen;
-
-       dout(" obj extent %llu~%llu\n", *oxoff, *oxlen);
-}
-
-/*
- * calculate an object layout (i.e. pgid) from an oid,
- * file_layout, and osdmap
- */
-int ceph_calc_object_layout(struct ceph_object_layout *ol,
-                           const char *oid,
-                           struct ceph_file_layout *fl,
-                           struct ceph_osdmap *osdmap)
-{
-       unsigned num, num_mask;
-       struct ceph_pg pgid;
-       s32 preferred = (s32)le32_to_cpu(fl->fl_pg_preferred);
-       int poolid = le32_to_cpu(fl->fl_pg_pool);
-       struct ceph_pg_pool_info *pool;
-       unsigned ps;
-
-       BUG_ON(!osdmap);
-
-       pool = __lookup_pg_pool(&osdmap->pg_pools, poolid);
-       if (!pool)
-               return -EIO;
-       ps = ceph_str_hash(pool->v.object_hash, oid, strlen(oid));
-       if (preferred >= 0) {
-               ps += preferred;
-               num = le32_to_cpu(pool->v.lpg_num);
-               num_mask = pool->lpg_num_mask;
-       } else {
-               num = le32_to_cpu(pool->v.pg_num);
-               num_mask = pool->pg_num_mask;
-       }
-
-       pgid.ps = cpu_to_le16(ps);
-       pgid.preferred = cpu_to_le16(preferred);
-       pgid.pool = fl->fl_pg_pool;
-       if (preferred >= 0)
-               dout("calc_object_layout '%s' pgid %d.%xp%d\n", oid, poolid, ps,
-                    (int)preferred);
-       else
-               dout("calc_object_layout '%s' pgid %d.%x\n", oid, poolid, ps);
-
-       ol->ol_pgid = pgid;
-       ol->ol_stripe_unit = fl->fl_object_stripe_unit;
-       return 0;
-}
-
-/*
- * Calculate raw osd vector for the given pgid.  Return pointer to osd
- * array, or NULL on failure.
- */
-static int *calc_pg_raw(struct ceph_osdmap *osdmap, struct ceph_pg pgid,
-                       int *osds, int *num)
-{
-       struct ceph_pg_mapping *pg;
-       struct ceph_pg_pool_info *pool;
-       int ruleno;
-       unsigned poolid, ps, pps;
-       int preferred;
-
-       /* pg_temp? */
-       pg = __lookup_pg_mapping(&osdmap->pg_temp, pgid);
-       if (pg) {
-               *num = pg->len;
-               return pg->osds;
-       }
-
-       /* crush */
-       poolid = le32_to_cpu(pgid.pool);
-       ps = le16_to_cpu(pgid.ps);
-       preferred = (s16)le16_to_cpu(pgid.preferred);
-
-       /* don't forcefeed bad device ids to crush */
-       if (preferred >= osdmap->max_osd ||
-           preferred >= osdmap->crush->max_devices)
-               preferred = -1;
-
-       pool = __lookup_pg_pool(&osdmap->pg_pools, poolid);
-       if (!pool)
-               return NULL;
-       ruleno = crush_find_rule(osdmap->crush, pool->v.crush_ruleset,
-                                pool->v.type, pool->v.size);
-       if (ruleno < 0) {
-               pr_err("no crush rule pool %d ruleset %d type %d size %d\n",
-                      poolid, pool->v.crush_ruleset, pool->v.type,
-                      pool->v.size);
-               return NULL;
-       }
-
-       if (preferred >= 0)
-               pps = ceph_stable_mod(ps,
-                                     le32_to_cpu(pool->v.lpgp_num),
-                                     pool->lpgp_num_mask);
-       else
-               pps = ceph_stable_mod(ps,
-                                     le32_to_cpu(pool->v.pgp_num),
-                                     pool->pgp_num_mask);
-       pps += poolid;
-       *num = crush_do_rule(osdmap->crush, ruleno, pps, osds,
-                            min_t(int, pool->v.size, *num),
-                            preferred, osdmap->osd_weight);
-       return osds;
-}
-
-/*
- * Return acting set for given pgid.
- */
-int ceph_calc_pg_acting(struct ceph_osdmap *osdmap, struct ceph_pg pgid,
-                       int *acting)
-{
-       int rawosds[CEPH_PG_MAX_SIZE], *osds;
-       int i, o, num = CEPH_PG_MAX_SIZE;
-
-       osds = calc_pg_raw(osdmap, pgid, rawosds, &num);
-       if (!osds)
-               return -1;
-
-       /* primary is first up osd */
-       o = 0;
-       for (i = 0; i < num; i++)
-               if (ceph_osd_is_up(osdmap, osds[i]))
-                       acting[o++] = osds[i];
-       return o;
-}
-
-/*
- * Return primary osd for given pgid, or -1 if none.
- */
-int ceph_calc_pg_primary(struct ceph_osdmap *osdmap, struct ceph_pg pgid)
-{
-       int rawosds[CEPH_PG_MAX_SIZE], *osds;
-       int i, num = CEPH_PG_MAX_SIZE;
-
-       osds = calc_pg_raw(osdmap, pgid, rawosds, &num);
-       if (!osds)
-               return -1;
-
-       /* primary is first up osd */
-       for (i = 0; i < num; i++)
-               if (ceph_osd_is_up(osdmap, osds[i]))
-                       return osds[i];
-       return -1;
-}
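Note on the striping arithmetic in the removed `ceph_calc_file_object_mapping()` above: the following is a minimal userspace sketch of the same calculation, not the kernel code itself. It assumes plain host-endian integers instead of the little-endian `ceph_file_layout` fields, uses ordinary 64-bit division in place of `do_div()`, and omits the page-alignment `BUG_ON`. The names `file_layout_sketch`, `object_extent_sketch`, and `map_file_extent` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three layout fields the mapping reads. */
struct file_layout_sketch {
	uint32_t object_size;  /* bytes per object */
	uint32_t stripe_unit;  /* bytes per stripe unit (su) */
	uint32_t stripe_count; /* objects per object set (sc) */
};

/* Hypothetical result type: object number plus extent within that object. */
struct object_extent_sketch {
	uint64_t ono;   /* object number */
	uint64_t oxoff; /* offset of the extent inside the object */
	uint64_t oxlen; /* extent length, clipped to one stripe unit */
};

struct object_extent_sketch
map_file_extent(const struct file_layout_sketch *l, uint64_t off, uint64_t len)
{
	struct object_extent_sketch e;
	uint32_t su = l->stripe_unit;
	uint32_t sc = l->stripe_count;

	/* Guard against division by zero; the kernel code has no such check. */
	assert(su != 0 && sc != 0 && l->object_size >= su);

	uint32_t su_per_object = l->object_size / su;
	uint64_t bl = off / su;         /* index of the stripe unit holding off */
	uint64_t su_offset = off % su;  /* byte offset inside that stripe unit */
	uint64_t stripeno = bl / sc;    /* which stripe (row) across the set */
	uint64_t stripepos = bl % sc;   /* which object (column) in the set */
	uint64_t objsetno = stripeno / su_per_object;

	e.ono = objsetno * sc + stripepos;
	e.oxoff = su_offset + (stripeno % su_per_object) * su;
	/* Only the remainder of the current stripe unit is mapped per call. */
	e.oxlen = len < su - su_offset ? len : su - su_offset;
	return e;
}
```

With a 4 MiB object size, a 1 MiB stripe unit, and a stripe count of 2, a request starting 100 bytes past the 5 MiB mark falls in stripe unit 5, which is stripe 2 at position 1 of object set 0. The extent therefore starts in object 1 at offset 2 MiB + 100 and is clipped to the 1048476 bytes left in that stripe unit, matching the "single su" limitation stated in the original comment.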
diff --git a/fs/ceph/osdmap.h b/fs/ceph/osdmap.h
deleted file mode 100644
index 970b547..0000000
--- a/fs/ceph/osdmap.h
+++ /dev/null
@@ -1,128 +0,0 @@
-#ifndef _FS_CEPH_OSDMAP_H
-#define _FS_CEPH_OSDMAP_H
-
-#include <linux/rbtree.h>
-#include "types.h"
-#include "ceph_fs.h"
-#include "crush/crush.h"
-
-/*
- * The osd map describes the current membership of the osd cluster and
- * specifies the mapping of objects to placement groups and placement
- * groups to (sets of) osds.  That is, it completely specifies the
- * (desired) distribution of all data objects in the system at some
- * point in time.
- *
- * Each map version is identified by an epoch, which increases monotonically.
- *
- * The map can be updated either via an incremental map (diff) describing
- * the change between two successive epochs, or as a fully encoded map.
- */
-struct ceph_pg_pool_info {
-       struct rb_node node;
-       int id;
-       struct ceph_pg_pool v;
-       int pg_num_mask, pgp_num_mask, lpg_num_mask, lpgp_num_mask;
-       char *name;
-};
-
-struct ceph_pg_mapping {
-       struct rb_node node;
-       struct ceph_pg pgid;
-       int len;
-       int osds[];
-};
-
-struct ceph_osdmap {
-       struct ceph_fsid fsid;
-       u32 epoch;
-       u32 mkfs_epoch;
-       struct ceph_timespec created, modified;
-
-       u32 flags;         /* CEPH_OSDMAP_* */
-
-       u32 max_osd;       /* size of osd_state, _offload, _addr arrays */
-       u8 *osd_state;     /* CEPH_OSD_* */
-       u32 *osd_weight;   /* 0 = failed, 0x10000 = 100% normal */
-       struct ceph_entity_addr *osd_addr;
-
-       struct rb_root pg_temp;
-       struct rb_root pg_pools;
-       u32 pool_max;
-
-       /* the CRUSH map specifies the mapping of placement groups to
-        * the list of osds that store+replicate them. */
-       struct crush_map *crush;
-};
-
-/*
- * file layout helpers
- */
-#define ceph_file_layout_su(l) ((__s32)le32_to_cpu((l).fl_stripe_unit))
-#define ceph_file_layout_stripe_count(l) \
-       ((__s32)le32_to_cpu((l).fl_stripe_count))
-#define ceph_file_layout_object_size(l) ((__s32)le32_to_cpu((l).fl_object_size))
-#define ceph_file_layout_cas_hash(l) ((__s32)le32_to_cpu((l).fl_cas_hash))
-#define ceph_file_layout_object_su(l) \
-       ((__s32)le32_to_cpu((l).fl_object_stripe_unit))
-#define ceph_file_layout_pg_preferred(l) \
-       ((__s32)le32_to_cpu((l).fl_pg_preferred))
-#define ceph_file_layout_pg_pool(l) \
-       ((__s32)le32_to_cpu((l).fl_pg_pool))
-
-static inline unsigned ceph_file_layout_stripe_width(struct ceph_file_layout *l)
-{
-       return le32_to_cpu(l->fl_stripe_unit) *
-               le32_to_cpu(l->fl_stripe_count);
-}
-
-/* "period" == bytes before i start on a new set of objects */
-static inline unsigned ceph_file_layout_period(struct ceph_file_layout *l)
-{
-       return le32_to_cpu(l->fl_object_size) *
-               le32_to_cpu(l->fl_stripe_count);
-}
-
-
-static inline int ceph_osd_is_up(struct ceph_osdmap *map, int osd)
-{
-       return (osd < map->max_osd) && (map->osd_state[osd] & CEPH_OSD_UP);
-}
-
-static inline bool ceph_osdmap_flag(struct ceph_osdmap *map, int flag)
-{
-       return map && (map->flags & flag);
-}
-
-extern char *ceph_osdmap_state_str(char *str, int len, int state);
-
-static inline struct ceph_entity_addr *ceph_osd_addr(struct ceph_osdmap *map,
-                                                    int osd)
-{
-       if (osd >= map->max_osd)
-               return NULL;
-       return &map->osd_addr[osd];
-}
-
-extern struct ceph_osdmap *osdmap_decode(void **p, void *end);
-extern struct ceph_osdmap *osdmap_apply_incremental(void **p, void *end,
-                                           struct ceph_osdmap *map,
-                                           struct ceph_messenger *msgr);
-extern void ceph_osdmap_destroy(struct ceph_osdmap *map);
-
-/* calculate mapping of a file extent to an object */
-extern void ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
-                                         u64 off, u64 *plen,
-                                         u64 *bno, u64 *oxoff, u64 *oxlen);
-
-/* calculate mapping of object to a placement group */
-extern int ceph_calc_object_layout(struct ceph_object_layout *ol,
-                                  const char *oid,
-                                  struct ceph_file_layout *fl,
-                                  struct ceph_osdmap *osdmap);
-extern int ceph_calc_pg_acting(struct ceph_osdmap *osdmap, struct ceph_pg pgid,
-                              int *acting);
-extern int ceph_calc_pg_primary(struct ceph_osdmap *osdmap,
-                               struct ceph_pg pgid);
-
-#endif
diff --git a/fs/ceph/pagelist.c b/fs/ceph/pagelist.c
deleted file mode 100644
index 46a368b..0000000
--- a/fs/ceph/pagelist.c
+++ /dev/null
@@ -1,63 +0,0 @@
-
-#include <linux/gfp.h>
-#include <linux/pagemap.h>
-#include <linux/highmem.h>
-
-#include "pagelist.h"
-
-static void ceph_pagelist_unmap_tail(struct ceph_pagelist *pl)
-{
-       struct page *page = list_entry(pl->head.prev, struct page,
-                                      lru);
-       kunmap(page);
-}
-
-int ceph_pagelist_release(struct ceph_pagelist *pl)
-{
-       if (pl->mapped_tail)
-               ceph_pagelist_unmap_tail(pl);
-
-       while (!list_empty(&pl->head)) {
-               struct page *page = list_first_entry(&pl->head, struct page,
-                                                    lru);
-               list_del(&page->lru);
-               __free_page(page);
-       }
-       return 0;
-}
-
-static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
-{
-       struct page *page = __page_cache_alloc(GFP_NOFS);
-       if (!page)
-               return -ENOMEM;
-       pl->room += PAGE_SIZE;
-       list_add_tail(&page->lru, &pl->head);
-       if (pl->mapped_tail)
-               ceph_pagelist_unmap_tail(pl);
-       pl->mapped_tail = kmap(page);
-       return 0;
-}
-
-int ceph_pagelist_append(struct ceph_pagelist *pl, void *buf, size_t len)
-{
-       while (pl->room < len) {
-               size_t bit = pl->room;
-               int ret;
-
-               memcpy(pl->mapped_tail + (pl->length & ~PAGE_CACHE_MASK),
-                      buf, bit);
-               pl->length += bit;
-               pl->room -= bit;
-               buf += bit;
-               len -= bit;
-               ret = ceph_pagelist_addpage(pl);
-               if (ret)
-                       return ret;
-       }
-
-       memcpy(pl->mapped_tail + (pl->length & ~PAGE_CACHE_MASK), buf, len);
-       pl->length += len;
-       pl->room -= len;
-       return 0;
-}
diff --git a/fs/ceph/pagelist.h b/fs/ceph/pagelist.h
deleted file mode 100644
index e8a4187..0000000
--- a/fs/ceph/pagelist.h
+++ /dev/null
@@ -1,54 +0,0 @@
-#ifndef __FS_CEPH_PAGELIST_H
-#define __FS_CEPH_PAGELIST_H
-
-#include <linux/list.h>
-
-struct ceph_pagelist {
-       struct list_head head;
-       void *mapped_tail;
-       size_t length;
-       size_t room;
-};
-
-static inline void ceph_pagelist_init(struct ceph_pagelist *pl)
-{
-       INIT_LIST_HEAD(&pl->head);
-       pl->mapped_tail = NULL;
-       pl->length = 0;
-       pl->room = 0;
-}
-extern int ceph_pagelist_release(struct ceph_pagelist *pl);
-
-extern int ceph_pagelist_append(struct ceph_pagelist *pl, void *d, size_t l);
-
-static inline int ceph_pagelist_encode_64(struct ceph_pagelist *pl, u64 v)
-{
-       __le64 ev = cpu_to_le64(v);
-       return ceph_pagelist_append(pl, &ev, sizeof(ev));
-}
-static inline int ceph_pagelist_encode_32(struct ceph_pagelist *pl, u32 v)
-{
-       __le32 ev = cpu_to_le32(v);
-       return ceph_pagelist_append(pl, &ev, sizeof(ev));
-}
-static inline int ceph_pagelist_encode_16(struct ceph_pagelist *pl, u16 v)
-{
-       __le16 ev = cpu_to_le16(v);
-       return ceph_pagelist_append(pl, &ev, sizeof(ev));
-}
-static inline int ceph_pagelist_encode_8(struct ceph_pagelist *pl, u8 v)
-{
-       return ceph_pagelist_append(pl, &v, 1);
-}
-static inline int ceph_pagelist_encode_string(struct ceph_pagelist *pl,
-                                             char *s, size_t len)
-{
-       int ret = ceph_pagelist_encode_32(pl, len);
-       if (ret)
-               return ret;
-       if (len)
-               return ceph_pagelist_append(pl, s, len);
-       return 0;
-}
-
-#endif
diff --git a/fs/ceph/rados.h b/fs/ceph/rados.h
deleted file mode 100644
index 6d5247f..0000000
--- a/fs/ceph/rados.h
+++ /dev/null
@@ -1,405 +0,0 @@
-#ifndef CEPH_RADOS_H
-#define CEPH_RADOS_H
-
-/*
- * Data types for the Ceph distributed object storage layer RADOS
- * (Reliable Autonomic Distributed Object Store).
- */
-
-#include "msgr.h"
-
-/*
- * osdmap encoding versions
- */
-#define CEPH_OSDMAP_INC_VERSION     5
-#define CEPH_OSDMAP_INC_VERSION_EXT 5
-#define CEPH_OSDMAP_VERSION         5
-#define CEPH_OSDMAP_VERSION_EXT     5
-
-/*
- * fs id
- */
-struct ceph_fsid {
-       unsigned char fsid[16];
-};
-
-static inline int ceph_fsid_compare(const struct ceph_fsid *a,
-                                   const struct ceph_fsid *b)
-{
-       return memcmp(a, b, sizeof(*a));
-}
-
-/*
- * ino, object, etc.
- */
-typedef __le64 ceph_snapid_t;
-#define CEPH_SNAPDIR ((__u64)(-1))  /* reserved for hidden .snap dir */
-#define CEPH_NOSNAP  ((__u64)(-2))  /* "head", "live" revision */
-#define CEPH_MAXSNAP ((__u64)(-3))  /* largest valid snapid */
-
-struct ceph_timespec {
-       __le32 tv_sec;
-       __le32 tv_nsec;
-} __attribute__ ((packed));
-
-
-/*
- * object layout - how objects are mapped into PGs
- */
-#define CEPH_OBJECT_LAYOUT_HASH     1
-#define CEPH_OBJECT_LAYOUT_LINEAR   2
-#define CEPH_OBJECT_LAYOUT_HASHINO  3
-
-/*
- * pg layout -- how PGs are mapped onto (sets of) OSDs
- */
-#define CEPH_PG_LAYOUT_CRUSH  0
-#define CEPH_PG_LAYOUT_HASH   1
-#define CEPH_PG_LAYOUT_LINEAR 2
-#define CEPH_PG_LAYOUT_HYBRID 3
-
-#define CEPH_PG_MAX_SIZE      16  /* max # osds in a single pg */
-
-/*
- * placement group.
- * we encode this into one __le64.
- */
-struct ceph_pg {
-       __le16 preferred; /* preferred primary osd */
-       __le16 ps;        /* placement seed */
-       __le32 pool;      /* object pool */
-} __attribute__ ((packed));
-
-/*
- * pg_pool is a set of pgs storing a pool of objects
- *
- *  pg_num -- base number of pseudorandomly placed pgs
- *
- *  pgp_num -- effective number when calculating pg placement.  this
- * is used for pg_num increases.  new pgs result in data being "split"
- * into new pgs.  for this to proceed smoothly, new pgs are intiially
- * colocated with their parents; that is, pgp_num doesn't increase
- * until the new pgs have successfully split.  only _then_ are the new
- * pgs placed independently.
- *
- *  lpg_num -- localized pg count (per device).  replicas are randomly
- * selected.
- *
- *  lpgp_num -- as above.
- */
-#define CEPH_PG_TYPE_REP     1
-#define CEPH_PG_TYPE_RAID4   2
-#define CEPH_PG_POOL_VERSION 2
-struct ceph_pg_pool {
-       __u8 type;                /* CEPH_PG_TYPE_* */
-       __u8 size;                /* number of osds in each pg */
-       __u8 crush_ruleset;       /* crush placement rule */
-       __u8 object_hash;         /* hash mapping object name to ps */
-       __le32 pg_num, pgp_num;   /* number of pg's */
-       __le32 lpg_num, lpgp_num; /* number of localized pg's */
-       __le32 last_change;       /* most recent epoch changed */
-       __le64 snap_seq;          /* seq for per-pool snapshot */
-       __le32 snap_epoch;        /* epoch of last snap */
-       __le32 num_snaps;
-       __le32 num_removed_snap_intervals; /* if non-empty, NO per-pool snaps */
-       __le64 auid;               /* who owns the pg */
-} __attribute__ ((packed));
-
-/*
- * stable_mod func is used to control number of placement groups.
- * similar to straight-up modulo, but produces a stable mapping as b
- * increases over time.  b is the number of bins, and bmask is the
- * containing power of 2 minus 1.
- *
- * b <= bmask and bmask=(2**n)-1
- * e.g., b=12 -> bmask=15, b=123 -> bmask=127
- */
-static inline int ceph_stable_mod(int x, int b, int bmask)
-{
-       if ((x & bmask) < b)
-               return x & bmask;
-       else
-               return x & (bmask >> 1);
-}
-
-/*
- * object layout - how a given object should be stored.
- */
-struct ceph_object_layout {
-       struct ceph_pg ol_pgid;   /* raw pg, with _full_ ps precision. */
-       __le32 ol_stripe_unit;    /* for per-object parity, if any */
-} __attribute__ ((packed));
-
-/*
- * compound epoch+version, used by storage layer to serialize mutations
- */
-struct ceph_eversion {
-       __le32 epoch;
-       __le64 version;
-} __attribute__ ((packed));
-
-/*
- * osd map bits
- */
-
-/* status bits */
-#define CEPH_OSD_EXISTS 1
-#define CEPH_OSD_UP     2
-
-/* osd weights.  fixed point value: 0x10000 == 1.0 ("in"), 0 == "out" */
-#define CEPH_OSD_IN  0x10000
-#define CEPH_OSD_OUT 0
-
-
-/*
- * osd map flag bits
- */
-#define CEPH_OSDMAP_NEARFULL (1<<0)  /* sync writes (near ENOSPC) */
-#define CEPH_OSDMAP_FULL     (1<<1)  /* no data writes (ENOSPC) */
-#define CEPH_OSDMAP_PAUSERD  (1<<2)  /* pause all reads */
-#define CEPH_OSDMAP_PAUSEWR  (1<<3)  /* pause all writes */
-#define CEPH_OSDMAP_PAUSEREC (1<<4)  /* pause recovery */
-
-/*
- * osd ops
- */
-#define CEPH_OSD_OP_MODE       0xf000
-#define CEPH_OSD_OP_MODE_RD    0x1000
-#define CEPH_OSD_OP_MODE_WR    0x2000
-#define CEPH_OSD_OP_MODE_RMW   0x3000
-#define CEPH_OSD_OP_MODE_SUB   0x4000
-
-#define CEPH_OSD_OP_TYPE       0x0f00
-#define CEPH_OSD_OP_TYPE_LOCK  0x0100
-#define CEPH_OSD_OP_TYPE_DATA  0x0200
-#define CEPH_OSD_OP_TYPE_ATTR  0x0300
-#define CEPH_OSD_OP_TYPE_EXEC  0x0400
-#define CEPH_OSD_OP_TYPE_PG    0x0500
-
-enum {
-       /** data **/
-       /* read */
-       CEPH_OSD_OP_READ      = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_DATA | 1,
-       CEPH_OSD_OP_STAT      = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_DATA | 2,
-
-       /* fancy read */
-       CEPH_OSD_OP_MASKTRUNC = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_DATA | 4,
-
-       /* write */
-       CEPH_OSD_OP_WRITE     = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 1,
-       CEPH_OSD_OP_WRITEFULL = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 2,
-       CEPH_OSD_OP_TRUNCATE  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 3,
-       CEPH_OSD_OP_ZERO      = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 4,
-       CEPH_OSD_OP_DELETE    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 5,
-
-       /* fancy write */
-       CEPH_OSD_OP_APPEND    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 6,
-       CEPH_OSD_OP_STARTSYNC = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 7,
-       CEPH_OSD_OP_SETTRUNC  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 8,
-       CEPH_OSD_OP_TRIMTRUNC = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 9,
-
-       CEPH_OSD_OP_TMAPUP  = CEPH_OSD_OP_MODE_RMW | CEPH_OSD_OP_TYPE_DATA | 10,
-       CEPH_OSD_OP_TMAPPUT = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 11,
-       CEPH_OSD_OP_TMAPGET = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_DATA | 12,
-
-       CEPH_OSD_OP_CREATE  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 13,
-       CEPH_OSD_OP_ROLLBACK= CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 14,
-
-       /** attrs **/
-       /* read */
-       CEPH_OSD_OP_GETXATTR  = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_ATTR | 1,
-       CEPH_OSD_OP_GETXATTRS = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_ATTR | 2,
-       CEPH_OSD_OP_CMPXATTR  = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_ATTR | 3,
-
-       /* write */
-       CEPH_OSD_OP_SETXATTR  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_ATTR | 1,
-       CEPH_OSD_OP_SETXATTRS = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_ATTR | 2,
-       CEPH_OSD_OP_RESETXATTRS = CEPH_OSD_OP_MODE_WR|CEPH_OSD_OP_TYPE_ATTR | 3,
-       CEPH_OSD_OP_RMXATTR   = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_ATTR | 4,
-
-       /** subop **/
-       CEPH_OSD_OP_PULL           = CEPH_OSD_OP_MODE_SUB | 1,
-       CEPH_OSD_OP_PUSH           = CEPH_OSD_OP_MODE_SUB | 2,
-       CEPH_OSD_OP_BALANCEREADS   = CEPH_OSD_OP_MODE_SUB | 3,
-       CEPH_OSD_OP_UNBALANCEREADS = CEPH_OSD_OP_MODE_SUB | 4,
-       CEPH_OSD_OP_SCRUB          = CEPH_OSD_OP_MODE_SUB | 5,
-
-       /** lock **/
-       CEPH_OSD_OP_WRLOCK    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 1,
-       CEPH_OSD_OP_WRUNLOCK  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 2,
-       CEPH_OSD_OP_RDLOCK    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 3,
-       CEPH_OSD_OP_RDUNLOCK  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 4,
-       CEPH_OSD_OP_UPLOCK    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 5,
-       CEPH_OSD_OP_DNLOCK    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 6,
-
-       /** exec **/
-       CEPH_OSD_OP_CALL    = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_EXEC | 1,
-
-       /** pg **/
-       CEPH_OSD_OP_PGLS      = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_PG | 1,
-};
-
-static inline int ceph_osd_op_type_lock(int op)
-{
-       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_LOCK;
-}
-static inline int ceph_osd_op_type_data(int op)
-{
-       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_DATA;
-}
-static inline int ceph_osd_op_type_attr(int op)
-{
-       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_ATTR;
-}
-static inline int ceph_osd_op_type_exec(int op)
-{
-       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_EXEC;
-}
-static inline int ceph_osd_op_type_pg(int op)
-{
-       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_PG;
-}
-
-static inline int ceph_osd_op_mode_subop(int op)
-{
-       return (op & CEPH_OSD_OP_MODE) == CEPH_OSD_OP_MODE_SUB;
-}
-static inline int ceph_osd_op_mode_read(int op)
-{
-       return (op & CEPH_OSD_OP_MODE) == CEPH_OSD_OP_MODE_RD;
-}
-static inline int ceph_osd_op_mode_modify(int op)
-{
-       return (op & CEPH_OSD_OP_MODE) == CEPH_OSD_OP_MODE_WR;
-}
-
-/*
- * note that the following tmap stuff is also defined in the ceph librados.h
- * any modification here needs to be updated there
- */
-#define CEPH_OSD_TMAP_HDR 'h'
-#define CEPH_OSD_TMAP_SET 's'
-#define CEPH_OSD_TMAP_RM  'r'
-
-extern const char *ceph_osd_op_name(int op);
-
-
-/*
- * osd op flags
- *
- * An op may be READ, WRITE, or READ|WRITE.
- */
-enum {
-       CEPH_OSD_FLAG_ACK = 1,          /* want (or is) "ack" ack */
-       CEPH_OSD_FLAG_ONNVRAM = 2,      /* want (or is) "onnvram" ack */
-       CEPH_OSD_FLAG_ONDISK = 4,       /* want (or is) "ondisk" ack */
-       CEPH_OSD_FLAG_RETRY = 8,        /* resend attempt */
-       CEPH_OSD_FLAG_READ = 16,        /* op may read */
-       CEPH_OSD_FLAG_WRITE = 32,       /* op may write */
-       CEPH_OSD_FLAG_ORDERSNAP = 64,   /* EOLDSNAP if snapc is out of order */
-       CEPH_OSD_FLAG_PEERSTAT = 128,   /* msg includes osd_peer_stat */
-       CEPH_OSD_FLAG_BALANCE_READS = 256,
-       CEPH_OSD_FLAG_PARALLELEXEC = 512, /* execute op in parallel */
-       CEPH_OSD_FLAG_PGOP = 1024,      /* pg op, no object */
-       CEPH_OSD_FLAG_EXEC = 2048,      /* op may exec */
-       CEPH_OSD_FLAG_EXEC_PUBLIC = 4096, /* op may exec (public) */
-};
-
-enum {
-       CEPH_OSD_OP_FLAG_EXCL = 1,      /* EXCL object create */
-};
-
-#define EOLDSNAPC    ERESTART  /* ORDERSNAP flag set; writer has old snapc*/
-#define EBLACKLISTED ESHUTDOWN /* blacklisted */
-
-/* xattr comparison */
-enum {
-       CEPH_OSD_CMPXATTR_OP_NOP = 0,
-       CEPH_OSD_CMPXATTR_OP_EQ  = 1,
-       CEPH_OSD_CMPXATTR_OP_NE  = 2,
-       CEPH_OSD_CMPXATTR_OP_GT  = 3,
-       CEPH_OSD_CMPXATTR_OP_GTE = 4,
-       CEPH_OSD_CMPXATTR_OP_LT  = 5,
-       CEPH_OSD_CMPXATTR_OP_LTE = 6
-};
-
-enum {
-       CEPH_OSD_CMPXATTR_MODE_STRING = 1,
-       CEPH_OSD_CMPXATTR_MODE_U64    = 2
-};
-
-/*
- * an individual object operation.  each may be accompanied by some data
- * payload
- */
-struct ceph_osd_op {
-       __le16 op;           /* CEPH_OSD_OP_* */
-       __le32 flags;        /* CEPH_OSD_FLAG_* */
-       union {
-               struct {
-                       __le64 offset, length;
-                       __le64 truncate_size;
-                       __le32 truncate_seq;
-               } __attribute__ ((packed)) extent;
-               struct {
-                       __le32 name_len;
-                       __le32 value_len;
-                       __u8 cmp_op;       /* CEPH_OSD_CMPXATTR_OP_* */
-                       __u8 cmp_mode;     /* CEPH_OSD_CMPXATTR_MODE_* */
-               } __attribute__ ((packed)) xattr;
-               struct {
-                       __u8 class_len;
-                       __u8 method_len;
-                       __u8 argc;
-                       __le32 indata_len;
-               } __attribute__ ((packed)) cls;
-               struct {
-                       __le64 cookie, count;
-               } __attribute__ ((packed)) pgls;
-               struct {
-                       __le64 snapid;
-               } __attribute__ ((packed)) snap;
-       };
-       __le32 payload_len;
-} __attribute__ ((packed));
-
-/*
- * osd request message header.  each request may include multiple
- * ceph_osd_op object operations.
- */
-struct ceph_osd_request_head {
-       __le32 client_inc;                 /* client incarnation */
-       struct ceph_object_layout layout;  /* pgid */
-       __le32 osdmap_epoch;               /* client's osdmap epoch */
-
-       __le32 flags;
-
-       struct ceph_timespec mtime;        /* for mutations only */
-       struct ceph_eversion reassert_version; /* if we are replaying op */
-
-       __le32 object_len;     /* length of object name */
-
-       __le64 snapid;         /* snapid to read */
-       __le64 snap_seq;       /* writer's snap context */
-       __le32 num_snaps;
-
-       __le16 num_ops;
-       struct ceph_osd_op ops[];  /* followed by ops[], obj, ticket, snaps */
-} __attribute__ ((packed));
-
-struct ceph_osd_reply_head {
-       __le32 client_inc;                /* client incarnation */
-       __le32 flags;
-       struct ceph_object_layout layout;
-       __le32 osdmap_epoch;
-       struct ceph_eversion reassert_version; /* for replaying uncommitted */
-
-       __le32 result;                    /* result code */
-
-       __le32 object_len;                /* length of object name */
-       __le32 num_ops;
-       struct ceph_osd_op ops[0];  /* ops[], object */
-} __attribute__ ((packed));
-
-
-#endif
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 190b6c4a6f2b91aace658f8de7664f75cc0bdcc4..39c243acd062c810d33e60da27af51cbcfe058e8 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -1,10 +1,12 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/sort.h>
  #include <linux/slab.h>
  
  #include "super.h"
-#include "decode.h"
+#include "mds_client.h"
+
+#include <linux/ceph/decode.h>
  
  /*
   * Snapshots in ceph are driven in large part by cooperation from the
@@ -526,7 +528,7 @@ int __ceph_finish_cap_snap(struct ceph_inode_info *ci,
                             struct ceph_cap_snap *capsnap)
  {
         struct inode *inode = &ci->vfs_inode;
-       struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+       struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
  
         BUG_ON(capsnap->writing);
         capsnap->size = inode->i_size;
@@ -747,7 +749,7 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
                       struct ceph_mds_session *session,
                       struct ceph_msg *msg)
  {
-       struct super_block *sb = mdsc->client->sb;
+       struct super_block *sb = mdsc->fsc->sb;
         int mds = session->s_mds;
         u64 split;
         int op;
diff --git a/fs/ceph/strings.c b/fs/ceph/strings.c
new file mode 100644
index 0000000..cd5097d
--- /dev/null
+++ b/fs/ceph/strings.c
@@ -0,0 +1,117 @@
+/*
+ * Ceph fs string constants
+ */
+#include <linux/module.h>
+#include <linux/ceph/types.h>
+
+
+const char *ceph_mds_state_name(int s)
+{
+       switch (s) {
+               /* down and out */
+       case CEPH_MDS_STATE_DNE:        return "down:dne";
+       case CEPH_MDS_STATE_STOPPED:    return "down:stopped";
+               /* up and out */
+       case CEPH_MDS_STATE_BOOT:       return "up:boot";
+       case CEPH_MDS_STATE_STANDBY:    return "up:standby";
+       case CEPH_MDS_STATE_STANDBY_REPLAY:    return "up:standby-replay";
+       case CEPH_MDS_STATE_CREATING:   return "up:creating";
+       case CEPH_MDS_STATE_STARTING:   return "up:starting";
+               /* up and in */
+       case CEPH_MDS_STATE_REPLAY:     return "up:replay";
+       case CEPH_MDS_STATE_RESOLVE:    return "up:resolve";
+       case CEPH_MDS_STATE_RECONNECT:  return "up:reconnect";
+       case CEPH_MDS_STATE_REJOIN:     return "up:rejoin";
+       case CEPH_MDS_STATE_CLIENTREPLAY: return "up:clientreplay";
+       case CEPH_MDS_STATE_ACTIVE:     return "up:active";
+       case CEPH_MDS_STATE_STOPPING:   return "up:stopping";
+       }
+       return "???";
+}
+
+const char *ceph_session_op_name(int op)
+{
+       switch (op) {
+       case CEPH_SESSION_REQUEST_OPEN: return "request_open";
+       case CEPH_SESSION_OPEN: return "open";
+       case CEPH_SESSION_REQUEST_CLOSE: return "request_close";
+       case CEPH_SESSION_CLOSE: return "close";
+       case CEPH_SESSION_REQUEST_RENEWCAPS: return "request_renewcaps";
+       case CEPH_SESSION_RENEWCAPS: return "renewcaps";
+       case CEPH_SESSION_STALE: return "stale";
+       case CEPH_SESSION_RECALL_STATE: return "recall_state";
+       }
+       return "???";
+}
+
+const char *ceph_mds_op_name(int op)
+{
+       switch (op) {
+       case CEPH_MDS_OP_LOOKUP:  return "lookup";
+       case CEPH_MDS_OP_LOOKUPHASH:  return "lookuphash";
+       case CEPH_MDS_OP_LOOKUPPARENT:  return "lookupparent";
+       case CEPH_MDS_OP_GETATTR:  return "getattr";
+       case CEPH_MDS_OP_SETXATTR: return "setxattr";
+       case CEPH_MDS_OP_SETATTR: return "setattr";
+       case CEPH_MDS_OP_RMXATTR: return "rmxattr";
+       case CEPH_MDS_OP_READDIR: return "readdir";
+       case CEPH_MDS_OP_MKNOD: return "mknod";
+       case CEPH_MDS_OP_LINK: return "link";
+       case CEPH_MDS_OP_UNLINK: return "unlink";
+       case CEPH_MDS_OP_RENAME: return "rename";
+       case CEPH_MDS_OP_MKDIR: return "mkdir";
+       case CEPH_MDS_OP_RMDIR: return "rmdir";
+       case CEPH_MDS_OP_SYMLINK: return "symlink";
+       case CEPH_MDS_OP_CREATE: return "create";
+       case CEPH_MDS_OP_OPEN: return "open";
+       case CEPH_MDS_OP_LOOKUPSNAP: return "lookupsnap";
+       case CEPH_MDS_OP_LSSNAP: return "lssnap";
+       case CEPH_MDS_OP_MKSNAP: return "mksnap";
+       case CEPH_MDS_OP_RMSNAP: return "rmsnap";
+       case CEPH_MDS_OP_SETFILELOCK: return "setfilelock";
+       case CEPH_MDS_OP_GETFILELOCK: return "getfilelock";
+       }
+       return "???";
+}
+
+const char *ceph_cap_op_name(int op)
+{
+       switch (op) {
+       case CEPH_CAP_OP_GRANT: return "grant";
+       case CEPH_CAP_OP_REVOKE: return "revoke";
+       case CEPH_CAP_OP_TRUNC: return "trunc";
+       case CEPH_CAP_OP_EXPORT: return "export";
+       case CEPH_CAP_OP_IMPORT: return "import";
+       case CEPH_CAP_OP_UPDATE: return "update";
+       case CEPH_CAP_OP_DROP: return "drop";
+       case CEPH_CAP_OP_FLUSH: return "flush";
+       case CEPH_CAP_OP_FLUSH_ACK: return "flush_ack";
+       case CEPH_CAP_OP_FLUSHSNAP: return "flushsnap";
+       case CEPH_CAP_OP_FLUSHSNAP_ACK: return "flushsnap_ack";
+       case CEPH_CAP_OP_RELEASE: return "release";
+       case CEPH_CAP_OP_RENEW: return "renew";
+       }
+       return "???";
+}
+
+const char *ceph_lease_op_name(int o)
+{
+       switch (o) {
+       case CEPH_MDS_LEASE_REVOKE: return "revoke";
+       case CEPH_MDS_LEASE_RELEASE: return "release";
+       case CEPH_MDS_LEASE_RENEW: return "renew";
+       case CEPH_MDS_LEASE_REVOKE_ACK: return "revoke_ack";
+       }
+       return "???";
+}
+
+const char *ceph_snap_op_name(int o)
+{
+       switch (o) {
+       case CEPH_SNAP_OP_UPDATE: return "update";
+       case CEPH_SNAP_OP_CREATE: return "create";
+       case CEPH_SNAP_OP_DESTROY: return "destroy";
+       case CEPH_SNAP_OP_SPLIT: return "split";
+       }
+       return "???";
+}
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 9922628532b2c649dee6761710405adc552ca248..d6e0e042189184183b4ccf82da8787622e7f7d11 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1,5 +1,5 @@
  
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <linux/backing-dev.h>
  #include <linux/ctype.h>
@@ -15,10 +15,13 @@
  #include <linux/statfs.h>
  #include <linux/string.h>
  
-#include "decode.h"
  #include "super.h"
-#include "mon_client.h"
-#include "auth.h"
+#include "mds_client.h"
+
+#include <linux/ceph/decode.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/debugfs.h>
  
  /*
   * Ceph superblock operations
@@ -26,36 +29,22 @@
   * Handle the basics of mounting, unmounting.
   */
  
-
-/*
- * find filename portion of a path (/foo/bar/baz -> baz)
- */
-const char *ceph_file_part(const char *s, int len)
-{
-       const char *e = s + len;
-
-       while (e != s && *(e-1) != '/')
-               e--;
-       return e;
-}
-
-
  /*
   * super ops
   */
  static void ceph_put_super(struct super_block *s)
  {
-       struct ceph_client *client = ceph_sb_to_client(s);
+       struct ceph_fs_client *fsc = ceph_sb_to_client(s);
  
         dout("put_super\n");
-       ceph_mdsc_close_sessions(&client->mdsc);
+       ceph_mdsc_close_sessions(fsc->mdsc);
  
         /*
          * ensure we release the bdi before put_anon_super releases
          * the device name.
          */
-       if (s->s_bdi == &client->backing_dev_info) {
-               bdi_unregister(&client->backing_dev_info);
+       if (s->s_bdi == &fsc->backing_dev_info) {
+               bdi_unregister(&fsc->backing_dev_info);
                 s->s_bdi = NULL;
         }
  
@@ -64,14 +53,14 @@ static void ceph_put_super(struct super_block *s)
  
  static int ceph_statfs(struct dentry *dentry, struct kstatfs *buf)
  {
-       struct ceph_client *client = ceph_inode_to_client(dentry->d_inode);
-       struct ceph_monmap *monmap = client->monc.monmap;
+       struct ceph_fs_client *fsc = ceph_inode_to_client(dentry->d_inode);
+       struct ceph_monmap *monmap = fsc->client->monc.monmap;
         struct ceph_statfs st;
         u64 fsid;
         int err;
  
         dout("statfs\n");
-       err = ceph_monc_do_statfs(&client->monc, &st);
+       err = ceph_monc_do_statfs(&fsc->client->monc, &st);
         if (err < 0)
                 return err;
  
@@ -104,238 +93,28 @@ static int ceph_statfs(struct dentry *dentry, struct kstatfs *buf)
  
  static int ceph_sync_fs(struct super_block *sb, int wait)
  {
-       struct ceph_client *client = ceph_sb_to_client(sb);
+       struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
  
         if (!wait) {
                 dout("sync_fs (non-blocking)\n");
-               ceph_flush_dirty_caps(&client->mdsc);
+               ceph_flush_dirty_caps(fsc->mdsc);
                 dout("sync_fs (non-blocking) done\n");
                 return 0;
         }
  
         dout("sync_fs (blocking)\n");
-       ceph_osdc_sync(&ceph_sb_to_client(sb)->osdc);
-       ceph_mdsc_sync(&ceph_sb_to_client(sb)->mdsc);
+       ceph_osdc_sync(&fsc->client->osdc);
+       ceph_mdsc_sync(fsc->mdsc);
         dout("sync_fs (blocking) done\n");
         return 0;
  }
  
-static int default_congestion_kb(void)
-{
-       int congestion_kb;
-
-       /*
-        * Copied from NFS
-        *
-        * congestion size, scale with available memory.
-        *
-        *  64MB:    8192k
-        * 128MB:   11585k
-        * 256MB:   16384k
-        * 512MB:   23170k
-        *   1GB:   32768k
-        *   2GB:   46340k
-        *   4GB:   65536k
-        *   8GB:   92681k
-        *  16GB:  131072k
-        *
-        * This allows larger machines to have larger/more transfers.
-        * Limit the default to 256M
-        */
-       congestion_kb = (16*int_sqrt(totalram_pages)) << (PAGE_SHIFT-10);
-       if (congestion_kb > 256*1024)
-               congestion_kb = 256*1024;
-
-       return congestion_kb;
-}
-
-/**
- * ceph_show_options - Show mount options in /proc/mounts
- * @m: seq_file to write to
- * @mnt: mount descriptor
- */
-static int ceph_show_options(struct seq_file *m, struct vfsmount *mnt)
-{
-       struct ceph_client *client = ceph_sb_to_client(mnt->mnt_sb);
-       struct ceph_mount_args *args = client->mount_args;
-
-       if (args->flags & CEPH_OPT_FSID)
-               seq_printf(m, ",fsid=%pU", &args->fsid);
-       if (args->flags & CEPH_OPT_NOSHARE)
-               seq_puts(m, ",noshare");
-       if (args->flags & CEPH_OPT_DIRSTAT)
-               seq_puts(m, ",dirstat");
-       if ((args->flags & CEPH_OPT_RBYTES) == 0)
-               seq_puts(m, ",norbytes");
-       if (args->flags & CEPH_OPT_NOCRC)
-               seq_puts(m, ",nocrc");
-       if (args->flags & CEPH_OPT_NOASYNCREADDIR)
-               seq_puts(m, ",noasyncreaddir");
-
-       if (args->mount_timeout != CEPH_MOUNT_TIMEOUT_DEFAULT)
-               seq_printf(m, ",mount_timeout=%d", args->mount_timeout);
-       if (args->osd_idle_ttl != CEPH_OSD_IDLE_TTL_DEFAULT)
-               seq_printf(m, ",osd_idle_ttl=%d", args->osd_idle_ttl);
-       if (args->osd_timeout != CEPH_OSD_TIMEOUT_DEFAULT)
-               seq_printf(m, ",osdtimeout=%d", args->osd_timeout);
-       if (args->osd_keepalive_timeout != CEPH_OSD_KEEPALIVE_DEFAULT)
-               seq_printf(m, ",osdkeepalivetimeout=%d",
-                        args->osd_keepalive_timeout);
-       if (args->wsize)
-               seq_printf(m, ",wsize=%d", args->wsize);
-       if (args->rsize != CEPH_MOUNT_RSIZE_DEFAULT)
-               seq_printf(m, ",rsize=%d", args->rsize);
-       if (args->congestion_kb != default_congestion_kb())
-               seq_printf(m, ",write_congestion_kb=%d", args->congestion_kb);
-       if (args->caps_wanted_delay_min != CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT)
-               seq_printf(m, ",caps_wanted_delay_min=%d",
-                        args->caps_wanted_delay_min);
-       if (args->caps_wanted_delay_max != CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT)
-               seq_printf(m, ",caps_wanted_delay_max=%d",
-                          args->caps_wanted_delay_max);
-       if (args->cap_release_safety != CEPH_CAP_RELEASE_SAFETY_DEFAULT)
-               seq_printf(m, ",cap_release_safety=%d",
-                          args->cap_release_safety);
-       if (args->max_readdir != CEPH_MAX_READDIR_DEFAULT)
-               seq_printf(m, ",readdir_max_entries=%d", args->max_readdir);
-       if (args->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT)
-               seq_printf(m, ",readdir_max_bytes=%d", args->max_readdir_bytes);
-       if (strcmp(args->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT))
-               seq_printf(m, ",snapdirname=%s", args->snapdir_name);
-       if (args->name)
-               seq_printf(m, ",name=%s", args->name);
-       if (args->secret)
-               seq_puts(m, ",secret=<hidden>");
-       return 0;
-}
-
-/*
- * caches
- */
-struct kmem_cache *ceph_inode_cachep;
-struct kmem_cache *ceph_cap_cachep;
-struct kmem_cache *ceph_dentry_cachep;
-struct kmem_cache *ceph_file_cachep;
-
-static void ceph_inode_init_once(void *foo)
-{
-       struct ceph_inode_info *ci = foo;
-       inode_init_once(&ci->vfs_inode);
-}
-
-static int __init init_caches(void)
-{
-       ceph_inode_cachep = kmem_cache_create("ceph_inode_info",
-                                     sizeof(struct ceph_inode_info),
-                                     __alignof__(struct ceph_inode_info),
-                                     (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD),
-                                     ceph_inode_init_once);
-       if (ceph_inode_cachep == NULL)
-               return -ENOMEM;
-
-       ceph_cap_cachep = KMEM_CACHE(ceph_cap,
-                                    SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
-       if (ceph_cap_cachep == NULL)
-               goto bad_cap;
-
-       ceph_dentry_cachep = KMEM_CACHE(ceph_dentry_info,
-                                       SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
-       if (ceph_dentry_cachep == NULL)
-               goto bad_dentry;
-
-       ceph_file_cachep = KMEM_CACHE(ceph_file_info,
-                                     SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
-       if (ceph_file_cachep == NULL)
-               goto bad_file;
-
-       return 0;
-
-bad_file:
-       kmem_cache_destroy(ceph_dentry_cachep);
-bad_dentry:
-       kmem_cache_destroy(ceph_cap_cachep);
-bad_cap:
-       kmem_cache_destroy(ceph_inode_cachep);
-       return -ENOMEM;
-}
-
-static void destroy_caches(void)
-{
-       kmem_cache_destroy(ceph_inode_cachep);
-       kmem_cache_destroy(ceph_cap_cachep);
-       kmem_cache_destroy(ceph_dentry_cachep);
-       kmem_cache_destroy(ceph_file_cachep);
-}
-
-
-/*
- * ceph_umount_begin - initiate forced umount.  Tear down down the
- * mount, skipping steps that may hang while waiting for server(s).
- */
-static void ceph_umount_begin(struct super_block *sb)
-{
-       struct ceph_client *client = ceph_sb_to_client(sb);
-
-       dout("ceph_umount_begin - starting forced umount\n");
-       if (!client)
-               return;
-       client->mount_state = CEPH_MOUNT_SHUTDOWN;
-       return;
-}
-
-static const struct super_operations ceph_super_ops = {
-       .alloc_inode    = ceph_alloc_inode,
-       .destroy_inode  = ceph_destroy_inode,
-       .write_inode    = ceph_write_inode,
-       .sync_fs        = ceph_sync_fs,
-       .put_super      = ceph_put_super,
-       .show_options   = ceph_show_options,
-       .statfs         = ceph_statfs,
-       .umount_begin   = ceph_umount_begin,
-};
-
-
-const char *ceph_msg_type_name(int type)
-{
-       switch (type) {
-       case CEPH_MSG_SHUTDOWN: return "shutdown";
-       case CEPH_MSG_PING: return "ping";
-       case CEPH_MSG_AUTH: return "auth";
-       case CEPH_MSG_AUTH_REPLY: return "auth_reply";
-       case CEPH_MSG_MON_MAP: return "mon_map";
-       case CEPH_MSG_MON_GET_MAP: return "mon_get_map";
-       case CEPH_MSG_MON_SUBSCRIBE: return "mon_subscribe";
-       case CEPH_MSG_MON_SUBSCRIBE_ACK: return "mon_subscribe_ack";
-       case CEPH_MSG_STATFS: return "statfs";
-       case CEPH_MSG_STATFS_REPLY: return "statfs_reply";
-       case CEPH_MSG_MDS_MAP: return "mds_map";
-       case CEPH_MSG_CLIENT_SESSION: return "client_session";
-       case CEPH_MSG_CLIENT_RECONNECT: return "client_reconnect";
-       case CEPH_MSG_CLIENT_REQUEST: return "client_request";
-       case CEPH_MSG_CLIENT_REQUEST_FORWARD: return "client_request_forward";
-       case CEPH_MSG_CLIENT_REPLY: return "client_reply";
-       case CEPH_MSG_CLIENT_CAPS: return "client_caps";
-       case CEPH_MSG_CLIENT_CAPRELEASE: return "client_cap_release";
-       case CEPH_MSG_CLIENT_SNAP: return "client_snap";
-       case CEPH_MSG_CLIENT_LEASE: return "client_lease";
-       case CEPH_MSG_OSD_MAP: return "osd_map";
-       case CEPH_MSG_OSD_OP: return "osd_op";
-       case CEPH_MSG_OSD_OPREPLY: return "osd_opreply";
-       default: return "unknown";
-       }
-}
-
-
  /*
   * mount options
   */
  enum {
         Opt_wsize,
         Opt_rsize,
-       Opt_osdtimeout,
-       Opt_osdkeepalivetimeout,
-       Opt_mount_timeout,
-       Opt_osd_idle_ttl,
         Opt_caps_wanted_delay_min,
         Opt_caps_wanted_delay_max,
         Opt_cap_release_safety,
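Note on the hunk above: the default_congestion_kb() helper removed at this position scales the write-congestion threshold with the square root of total memory and caps it at 256M. The following is a minimal user-space sketch of that arithmetic, not kernel code. The names isqrt_floor() and congestion_kb_for() are hypothetical, and the page count and page shift are passed in as parameters instead of being read from totalram_pages and PAGE_SHIFT.

```c
/*
 * Floor integer square root; a stand-in (assumption) for the kernel's
 * int_sqrt(). The division form avoids overflow in (r + 1) * (r + 1).
 */
static unsigned long long isqrt_floor(unsigned long long x)
{
	unsigned long long r = 0;

	while (r + 1 <= x / (r + 1))
		r++;
	return r;
}

/*
 * Hypothetical mirror of default_congestion_kb():
 *   (16 * int_sqrt(pages)) << (page_shift - 10), capped at 256 * 1024 KB.
 */
static int congestion_kb_for(unsigned long long total_pages, int page_shift)
{
	unsigned long long kb;

	kb = (16 * isqrt_floor(total_pages)) << (page_shift - 10);
	if (kb > 256 * 1024)
		kb = 256 * 1024;
	return (int)kb;
}
```

With 4 KiB pages (page_shift of 12), 64MB is 16384 pages and the sketch gives 8192, matching the first row of the table in the removed comment. Rows whose page count is not a perfect square come out slightly lower than the table because the square root is truncated: 128MB gives 11584 in this sketch, where the comment lists 11585k.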
@@ -344,29 +123,19 @@ enum {
         Opt_congestion_kb,
         Opt_last_int,
         /* int args above */
-       Opt_fsid,
         Opt_snapdirname,
-       Opt_name,
-       Opt_secret,
         Opt_last_string,
         /* string args above */
-       Opt_ip,
-       Opt_noshare,
         Opt_dirstat,
         Opt_nodirstat,
         Opt_rbytes,
         Opt_norbytes,
-       Opt_nocrc,
         Opt_noasyncreaddir,
  };
  
-static match_table_t arg_tokens = {
+static match_table_t fsopt_tokens = {
         {Opt_wsize, "wsize=%d"},
         {Opt_rsize, "rsize=%d"},
-       {Opt_osdtimeout, "osdtimeout=%d"},
-       {Opt_osdkeepalivetimeout, "osdkeepalive=%d"},
-       {Opt_mount_timeout, "mount_timeout=%d"},
-       {Opt_osd_idle_ttl, "osd_idle_ttl=%d"},
         {Opt_caps_wanted_delay_min, "caps_wanted_delay_min=%d"},
         {Opt_caps_wanted_delay_max, "caps_wanted_delay_max=%d"},
         {Opt_cap_release_safety, "cap_release_safety=%d"},
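Note on the option enum in the hunks above: Opt_last_int and Opt_last_string act as sentinels, so a token's position in the enum tells the parser whether it carries an int argument, a string argument, or no argument. parse_fsopt_token() in the following hunk relies on exactly these range comparisons. Below is a small stand-alone sketch of the idea; every DEMO_ and demo_ name is hypothetical and the option list is shortened for illustration.

```c
/* Shortened, hypothetical option list laid out like the enum in the patch. */
enum demo_opt {
	DEMO_wsize,
	DEMO_rsize,
	DEMO_last_int,		/* int args above */
	DEMO_snapdirname,
	DEMO_last_string,	/* string args above */
	DEMO_dirstat,
	DEMO_norbytes,
};

enum demo_kind { DEMO_KIND_INT, DEMO_KIND_STRING, DEMO_KIND_FLAG };

/* Classify a token by its position relative to the two sentinels. */
static enum demo_kind demo_classify(int token)
{
	if (token < DEMO_last_int)
		return DEMO_KIND_INT;
	if (token > DEMO_last_int && token < DEMO_last_string)
		return DEMO_KIND_STRING;
	return DEMO_KIND_FLAG;
}
```

The design choice is that adding an option only requires placing it in the right section of the enum; no separate type table has to be kept in sync. The cost is that the ordering becomes load-bearing, which is why the patch keeps the "int args above" and "string args above" comments next to the sentinels.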
@@ -374,403 +143,459 @@ static match_table_t arg_tokens = {
         {Opt_readdir_max_bytes, "readdir_max_bytes=%d"},
         {Opt_congestion_kb, "write_congestion_kb=%d"},
         /* int args above */
-       {Opt_fsid, "fsid=%s"},
         {Opt_snapdirname, "snapdirname=%s"},
-       {Opt_name, "name=%s"},
-       {Opt_secret, "secret=%s"},
         /* string args above */
-       {Opt_ip, "ip=%s"},
-       {Opt_noshare, "noshare"},
         {Opt_dirstat, "dirstat"},
         {Opt_nodirstat, "nodirstat"},
         {Opt_rbytes, "rbytes"},
         {Opt_norbytes, "norbytes"},
-       {Opt_nocrc, "nocrc"},
         {Opt_noasyncreaddir, "noasyncreaddir"},
         {-1, NULL}
  };
  
-static int parse_fsid(const char *str, struct ceph_fsid *fsid)
+static int parse_fsopt_token(char *c, void *private)
  {
-       int i = 0;
-       char tmp[3];
-       int err = -EINVAL;
-       int d;
-
-       dout("parse_fsid '%s'\n", str);
-       tmp[2] = 0;
-       while (*str && i < 16) {
-               if (ispunct(*str)) {
-                       str++;
-                       continue;
+       struct ceph_mount_options *fsopt = private;
+       substring_t argstr[MAX_OPT_ARGS];
+       int token, intval, ret;
+
+       token = match_token((char *)c, fsopt_tokens, argstr);
+       if (token < 0)
+               return -EINVAL;
+
+       if (token < Opt_last_int) {
+               ret = match_int(&argstr[0], &intval);
+               if (ret < 0) {
+                       pr_err("bad mount option arg (not int) "
+                              "at '%s'\n", c);
+                       return ret;
                 }
-               if (!isxdigit(str[0]) || !isxdigit(str[1]))
-                       break;
-               tmp[0] = str[0];
-               tmp[1] = str[1];
-               if (sscanf(tmp, "%x", &d) < 1)
-                       break;
-               fsid->fsid[i] = d & 0xff;
-               i++;
-               str += 2;
+               dout("got int token %d val %d\n", token, intval);
+       } else if (token > Opt_last_int && token < Opt_last_string) {
+               dout("got string token %d val %s\n", token,
+                    argstr[0].from);
+       } else {
+               dout("got token %d\n", token);
         }
  
-       if (i == 16)
-               err = 0;
-       dout("parse_fsid ret %d got fsid %pU", err, fsid);
-       return err;
+       switch (token) {
+       case Opt_snapdirname:
+               kfree(fsopt->snapdir_name);
+               fsopt->snapdir_name = kstrndup(argstr[0].from,
+                                              argstr[0].to-argstr[0].from,
+                                              GFP_KERNEL);
+               if (!fsopt->snapdir_name)
+                       return -ENOMEM;
+               break;
+
+               /* misc */
+       case Opt_wsize:
+               fsopt->wsize = intval;
+               break;
+       case Opt_rsize:
+               fsopt->rsize = intval;
+               break;
+       case Opt_caps_wanted_delay_min:
+               fsopt->caps_wanted_delay_min = intval;
+               break;
+       case Opt_caps_wanted_delay_max:
+               fsopt->caps_wanted_delay_max = intval;
+               break;
+       case Opt_readdir_max_entries:
+               fsopt->max_readdir = intval;
+               break;
+       case Opt_readdir_max_bytes:
+               fsopt->max_readdir_bytes = intval;
+               break;
+       case Opt_congestion_kb:
+               fsopt->congestion_kb = intval;
+               break;
+       case Opt_dirstat:
+               fsopt->flags |= CEPH_MOUNT_OPT_DIRSTAT;
+               break;
+       case Opt_nodirstat:
+               fsopt->flags &= ~CEPH_MOUNT_OPT_DIRSTAT;
+               break;
+       case Opt_rbytes:
+               fsopt->flags |= CEPH_MOUNT_OPT_RBYTES;
+               break;
+       case Opt_norbytes:
+               fsopt->flags &= ~CEPH_MOUNT_OPT_RBYTES;
+               break;
+       case Opt_noasyncreaddir:
+               fsopt->flags |= CEPH_MOUNT_OPT_NOASYNCREADDIR;
+               break;
+       default:
+               BUG_ON(token);
+       }
+       return 0;
  }
  
-static struct ceph_mount_args *parse_mount_args(int flags, char *options,
-                                               const char *dev_name,
-                                               const char **path)
+static void destroy_mount_options(struct ceph_mount_options *args)
  {
-       struct ceph_mount_args *args;
-       const char *c;
-       int err = -ENOMEM;
-       substring_t argstr[MAX_OPT_ARGS];
+       dout("destroy_mount_options %p\n", args);
+       kfree(args->snapdir_name);
+       kfree(args);
+}
  
-       args = kzalloc(sizeof(*args), GFP_KERNEL);
-       if (!args)
-               return ERR_PTR(-ENOMEM);
-       args->mon_addr = kcalloc(CEPH_MAX_MON, sizeof(*args->mon_addr),
-                                GFP_KERNEL);
-       if (!args->mon_addr)
-               goto out;
+static int strcmp_null(const char *s1, const char *s2)
+{
+       if (!s1 && !s2)
+               return 0;
+       if (s1 && !s2)
+               return -1;
+       if (!s1 && s2)
+               return 1;
+       return strcmp(s1, s2);
+}
  
-       dout("parse_mount_args %p, dev_name '%s'\n", args, dev_name);
-
-       /* start with defaults */
-       args->sb_flags = flags;
-       args->flags = CEPH_OPT_DEFAULT;
-       args->osd_timeout = CEPH_OSD_TIMEOUT_DEFAULT;
-       args->osd_keepalive_timeout = CEPH_OSD_KEEPALIVE_DEFAULT;
-       args->mount_timeout = CEPH_MOUNT_TIMEOUT_DEFAULT; /* seconds */
-       args->osd_idle_ttl = CEPH_OSD_IDLE_TTL_DEFAULT;   /* seconds */
-       args->caps_wanted_delay_min = CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT;
-       args->caps_wanted_delay_max = CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT;
-       args->rsize = CEPH_MOUNT_RSIZE_DEFAULT;
-       args->snapdir_name = kstrdup(CEPH_SNAPDIRNAME_DEFAULT, GFP_KERNEL);
-       args->cap_release_safety = CEPH_CAP_RELEASE_SAFETY_DEFAULT;
-       args->max_readdir = CEPH_MAX_READDIR_DEFAULT;
-       args->max_readdir_bytes = CEPH_MAX_READDIR_BYTES_DEFAULT;
-       args->congestion_kb = default_congestion_kb();
-
-       /* ip1[:port1][,ip2[:port2]...]:/subdir/in/fs */
-       err = -EINVAL;
-       if (!dev_name)
-               goto out;
-       *path = strstr(dev_name, ":/");
-       if (*path == NULL) {
-               pr_err("device name is missing path (no :/ in %s)\n",
-                      dev_name);
-               goto out;
-       }
+static int compare_mount_options(struct ceph_mount_options *new_fsopt,
+                                struct ceph_options *new_opt,
+                                struct ceph_fs_client *fsc)
+{
+       struct ceph_mount_options *fsopt1 = new_fsopt;
+       struct ceph_mount_options *fsopt2 = fsc->mount_options;
+       int ofs = offsetof(struct ceph_mount_options, snapdir_name);
+       int ret;
  
-       /* get mon ip(s) */
-       err = ceph_parse_ips(dev_name, *path, args->mon_addr,
-                            CEPH_MAX_MON, &args->num_mon);
-       if (err < 0)
-               goto out;
+       ret = memcmp(fsopt1, fsopt2, ofs);
+       if (ret)
+               return ret;
+
+       ret = strcmp_null(fsopt1->snapdir_name, fsopt2->snapdir_name);
+       if (ret)
+               return ret;
+
+       return ceph_compare_options(new_opt, fsc->client);
+}
+
+static int parse_mount_options(struct ceph_mount_options **pfsopt,
+                              struct ceph_options **popt,
+                              int flags, char *options,
+                              const char *dev_name,
+                              const char **path)
+{
+       struct ceph_mount_options *fsopt;
+       const char *dev_name_end;
+       int err = -ENOMEM;
+
+       fsopt = kzalloc(sizeof(*fsopt), GFP_KERNEL);
+       if (!fsopt)
+               return -ENOMEM;
+
+       dout("parse_mount_options %p, dev_name '%s'\n", fsopt, dev_name);
+
+        fsopt->sb_flags = flags;
+        fsopt->flags = CEPH_MOUNT_OPT_DEFAULT;
+
+        fsopt->rsize = CEPH_MOUNT_RSIZE_DEFAULT;
+        fsopt->snapdir_name = kstrdup(CEPH_SNAPDIRNAME_DEFAULT, GFP_KERNEL);
+        fsopt->cap_release_safety = CEPH_CAP_RELEASE_SAFETY_DEFAULT;
+        fsopt->max_readdir = CEPH_MAX_READDIR_DEFAULT;
+        fsopt->max_readdir_bytes = CEPH_MAX_READDIR_BYTES_DEFAULT;
+        fsopt->congestion_kb = default_congestion_kb();
+       
+        /* ip1[:port1][,ip2[:port2]...]:/subdir/in/fs */
+        err = -EINVAL;
+        if (!dev_name)
+                goto out;
+        *path = strstr(dev_name, ":/");
+        if (*path == NULL) {
+                pr_err("device name is missing path (no :/ in %s)\n",
+                       dev_name);
+                goto out;
+        }
+       dev_name_end = *path;
+       dout("device name '%.*s'\n", (int)(dev_name_end - dev_name), dev_name);
  
         /* path on server */
         *path += 2;
         dout("server path '%s'\n", *path);
  
-       /* parse mount options */
-       while ((c = strsep(&options, ",")) != NULL) {
-               int token, intval, ret;
-               if (!*c)
-                       continue;
-               err = -EINVAL;
-               token = match_token((char *)c, arg_tokens, argstr);
-               if (token < 0) {
-                       pr_err("bad mount option at '%s'\n", c);
-                       goto out;
-               }
-               if (token < Opt_last_int) {
-                       ret = match_int(&argstr[0], &intval);
-                       if (ret < 0) {
-                               pr_err("bad mount option arg (not int) "
-                                      "at '%s'\n", c);
-                               continue;
-                       }
-                       dout("got int token %d val %d\n", token, intval);
-               } else if (token > Opt_last_int && token < Opt_last_string) {
-                       dout("got string token %d val %s\n", token,
-                            argstr[0].from);
-               } else {
-                       dout("got token %d\n", token);
-               }
-               switch (token) {
-               case Opt_ip:
-                       err = ceph_parse_ips(argstr[0].from,
-                                            argstr[0].to,
-                                            &args->my_addr,
-                                            1, NULL);
-                       if (err < 0)
-                               goto out;
-                       args->flags |= CEPH_OPT_MYIP;
-                       break;
-
-               case Opt_fsid:
-                       err = parse_fsid(argstr[0].from, &args->fsid);
-                       if (err == 0)
-                               args->flags |= CEPH_OPT_FSID;
-                       break;
-               case Opt_snapdirname:
-                       kfree(args->snapdir_name);
-                       args->snapdir_name = kstrndup(argstr[0].from,
-                                             argstr[0].to-argstr[0].from,
-                                             GFP_KERNEL);
-                       break;
-               case Opt_name:
-                       args->name = kstrndup(argstr[0].from,
-                                             argstr[0].to-argstr[0].from,
-                                             GFP_KERNEL);
-                       break;
-               case Opt_secret:
-                       args->secret = kstrndup(argstr[0].from,
-                                               argstr[0].to-argstr[0].from,
-                                               GFP_KERNEL);
-                       break;
-
-                       /* misc */
-               case Opt_wsize:
-                       args->wsize = intval;
-                       break;
-               case Opt_rsize:
-                       args->rsize = intval;
-                       break;
-               case Opt_osdtimeout:
-                       args->osd_timeout = intval;
-                       break;
-               case Opt_osdkeepalivetimeout:
-                       args->osd_keepalive_timeout = intval;
-                       break;
-               case Opt_osd_idle_ttl:
-                       args->osd_idle_ttl = intval;
-                       break;
-               case Opt_mount_timeout:
-                       args->mount_timeout = intval;
-                       break;
-               case Opt_caps_wanted_delay_min:
-                       args->caps_wanted_delay_min = intval;
-                       break;
-               case Opt_caps_wanted_delay_max:
-                       args->caps_wanted_delay_max = intval;
-                       break;
-               case Opt_readdir_max_entries:
-                       args->max_readdir = intval;
-                       break;
-               case Opt_readdir_max_bytes:
-                       args->max_readdir_bytes = intval;
-                       break;
-               case Opt_congestion_kb:
-                       args->congestion_kb = intval;
-                       break;
-
-               case Opt_noshare:
-                       args->flags |= CEPH_OPT_NOSHARE;
-                       break;
-
-               case Opt_dirstat:
-                       args->flags |= CEPH_OPT_DIRSTAT;
-                       break;
-               case Opt_nodirstat:
-                       args->flags &= ~CEPH_OPT_DIRSTAT;
-                       break;
-               case Opt_rbytes:
-                       args->flags |= CEPH_OPT_RBYTES;
-                       break;
-               case Opt_norbytes:
-                       args->flags &= ~CEPH_OPT_RBYTES;
-                       break;
-               case Opt_nocrc:
-                       args->flags |= CEPH_OPT_NOCRC;
-                       break;
-               case Opt_noasyncreaddir:
-                       args->flags |= CEPH_OPT_NOASYNCREADDIR;
-                       break;
-
-               default:
-                       BUG_ON(token);
-               }
-       }
-       return args;
+       err = ceph_parse_options(popt, options, dev_name, dev_name_end,
+                                parse_fsopt_token, (void *)fsopt);
+       if (err)
+               goto out;
+
+       /* success */
+       *pfsopt = fsopt;
+       return 0;
  
  out:
-       kfree(args->mon_addr);
-       kfree(args);
-       return ERR_PTR(err);
+       destroy_mount_options(fsopt);
+       return err;
  }
  
-static void destroy_mount_args(struct ceph_mount_args *args)
+/**
+ * ceph_show_options - Show mount options in /proc/mounts
+ * @m: seq_file to write to
+ * @mnt: mount descriptor
+ */
+static int ceph_show_options(struct seq_file *m, struct vfsmount *mnt)
  {
-       dout("destroy_mount_args %p\n", args);
-       kfree(args->snapdir_name);
-       args->snapdir_name = NULL;
-       kfree(args->name);
-       args->name = NULL;
-       kfree(args->secret);
-       args->secret = NULL;
-       kfree(args);
+       struct ceph_fs_client *fsc = ceph_sb_to_client(mnt->mnt_sb);
+       struct ceph_mount_options *fsopt = fsc->mount_options;
+       struct ceph_options *opt = fsc->client->options;
+
+       if (opt->flags & CEPH_OPT_FSID)
+               seq_printf(m, ",fsid=%pU", &opt->fsid);
+       if (opt->flags & CEPH_OPT_NOSHARE)
+               seq_puts(m, ",noshare");
+       if (opt->flags & CEPH_OPT_NOCRC)
+               seq_puts(m, ",nocrc");
+
+       if (opt->name)
+               seq_printf(m, ",name=%s", opt->name);
+       if (opt->secret)
+               seq_puts(m, ",secret=<hidden>");
+
+       if (opt->mount_timeout != CEPH_MOUNT_TIMEOUT_DEFAULT)
+               seq_printf(m, ",mount_timeout=%d", opt->mount_timeout);
+       if (opt->osd_idle_ttl != CEPH_OSD_IDLE_TTL_DEFAULT)
+               seq_printf(m, ",osd_idle_ttl=%d", opt->osd_idle_ttl);
+       if (opt->osd_timeout != CEPH_OSD_TIMEOUT_DEFAULT)
+               seq_printf(m, ",osdtimeout=%d", opt->osd_timeout);
+       if (opt->osd_keepalive_timeout != CEPH_OSD_KEEPALIVE_DEFAULT)
+               seq_printf(m, ",osdkeepalivetimeout=%d",
+                          opt->osd_keepalive_timeout);
+
+       if (fsopt->flags & CEPH_MOUNT_OPT_DIRSTAT)
+               seq_puts(m, ",dirstat");
+       if ((fsopt->flags & CEPH_MOUNT_OPT_RBYTES) == 0)
+               seq_puts(m, ",norbytes");
+       if (fsopt->flags & CEPH_MOUNT_OPT_NOASYNCREADDIR)
+               seq_puts(m, ",noasyncreaddir");
+
+       if (fsopt->wsize)
+               seq_printf(m, ",wsize=%d", fsopt->wsize);
+       if (fsopt->rsize != CEPH_MOUNT_RSIZE_DEFAULT)
+               seq_printf(m, ",rsize=%d", fsopt->rsize);
+       if (fsopt->congestion_kb != default_congestion_kb())
+               seq_printf(m, ",write_congestion_kb=%d", fsopt->congestion_kb);
+       if (fsopt->caps_wanted_delay_min != CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT)
+               seq_printf(m, ",caps_wanted_delay_min=%d",
+                        fsopt->caps_wanted_delay_min);
+       if (fsopt->caps_wanted_delay_max != CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT)
+               seq_printf(m, ",caps_wanted_delay_max=%d",
+                          fsopt->caps_wanted_delay_max);
+       if (fsopt->cap_release_safety != CEPH_CAP_RELEASE_SAFETY_DEFAULT)
+               seq_printf(m, ",cap_release_safety=%d",
+                          fsopt->cap_release_safety);
+       if (fsopt->max_readdir != CEPH_MAX_READDIR_DEFAULT)
+               seq_printf(m, ",readdir_max_entries=%d", fsopt->max_readdir);
+       if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT)
+               seq_printf(m, ",readdir_max_bytes=%d", fsopt->max_readdir_bytes);
+       if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT))
+               seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name);
+       return 0;
  }
  
  /*
- * create a fresh client instance
+ * handle any mon messages the standard library doesn't understand.
+ * return error if we don't either.
   */
-static struct ceph_client *ceph_create_client(struct ceph_mount_args *args)
+static int extra_mon_dispatch(struct ceph_client *client, struct ceph_msg *msg)
  {
-       struct ceph_client *client;
+       struct ceph_fs_client *fsc = client->private;
+       int type = le16_to_cpu(msg->hdr.type);
+
+       switch (type) {
+       case CEPH_MSG_MDS_MAP:
+               ceph_mdsc_handle_map(fsc->mdsc, msg);
+               return 0;
+
+       default:
+               return -1;
+       }
+}
+
+/*
+ * create a new fs client
+ */
+struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
+                                       struct ceph_options *opt)
+{
+       struct ceph_fs_client *fsc;
         int err = -ENOMEM;
  
-       client = kzalloc(sizeof(*client), GFP_KERNEL);
-       if (client == NULL)
+       fsc = kzalloc(sizeof(*fsc), GFP_KERNEL);
+       if (!fsc)
                 return ERR_PTR(-ENOMEM);
  
-       mutex_init(&client->mount_mutex);
-
-       init_waitqueue_head(&client->auth_wq);
+       fsc->client = ceph_create_client(opt, fsc);
+       if (IS_ERR(fsc->client)) {
+               err = PTR_ERR(fsc->client);
+               goto fail;
+       }
+       fsc->client->extra_mon_dispatch = extra_mon_dispatch;
+       fsc->client->supported_features |= CEPH_FEATURE_FLOCK;
+       fsc->client->monc.want_mdsmap = 1;
  
-       client->sb = NULL;
-       client->mount_state = CEPH_MOUNT_MOUNTING;
-       client->mount_args = args;
+       fsc->mount_options = fsopt;
  
-       client->msgr = NULL;
+       fsc->sb = NULL;
+       fsc->mount_state = CEPH_MOUNT_MOUNTING;
  
-       client->auth_err = 0;
-       atomic_long_set(&client->writeback_count, 0);
+       atomic_long_set(&fsc->writeback_count, 0);
  
-       err = bdi_init(&client->backing_dev_info);
+       err = bdi_init(&fsc->backing_dev_info);
         if (err < 0)
-               goto fail;
+               goto fail_client;
  
         err = -ENOMEM;
-       client->wb_wq = create_workqueue("ceph-writeback");
-       if (client->wb_wq == NULL)
+       fsc->wb_wq = create_workqueue("ceph-writeback");
+       if (fsc->wb_wq == NULL)
                 goto fail_bdi;
-       client->pg_inv_wq = create_singlethread_workqueue("ceph-pg-invalid");
-       if (client->pg_inv_wq == NULL)
+       fsc->pg_inv_wq = create_singlethread_workqueue("ceph-pg-invalid");
+       if (fsc->pg_inv_wq == NULL)
                 goto fail_wb_wq;
-       client->trunc_wq = create_singlethread_workqueue("ceph-trunc");
-       if (client->trunc_wq == NULL)
+       fsc->trunc_wq = create_singlethread_workqueue("ceph-trunc");
+       if (fsc->trunc_wq == NULL)
                 goto fail_pg_inv_wq;
  
         /* set up mempools */
         err = -ENOMEM;
-       client->wb_pagevec_pool = mempool_create_kmalloc_pool(10,
-                             client->mount_args->wsize >> PAGE_CACHE_SHIFT);
-       if (!client->wb_pagevec_pool)
+       fsc->wb_pagevec_pool = mempool_create_kmalloc_pool(10,
+                             fsc->mount_options->wsize >> PAGE_CACHE_SHIFT);
+       if (!fsc->wb_pagevec_pool)
                 goto fail_trunc_wq;
  
         /* caps */
-       client->min_caps = args->max_readdir;
+       fsc->min_caps = fsopt->max_readdir;
+
+       return fsc;
  
-       /* subsystems */
-       err = ceph_monc_init(&client->monc, client);
-       if (err < 0)
-               goto fail_mempool;
-       err = ceph_osdc_init(&client->osdc, client);
-       if (err < 0)
-               goto fail_monc;
-       err = ceph_mdsc_init(&client->mdsc, client);
-       if (err < 0)
-               goto fail_osdc;
-       return client;
-
-fail_osdc:
-       ceph_osdc_stop(&client->osdc);
-fail_monc:
-       ceph_monc_stop(&client->monc);
-fail_mempool:
-       mempool_destroy(client->wb_pagevec_pool);
  fail_trunc_wq:
-       destroy_workqueue(client->trunc_wq);
+       destroy_workqueue(fsc->trunc_wq);
  fail_pg_inv_wq:
-       destroy_workqueue(client->pg_inv_wq);
+       destroy_workqueue(fsc->pg_inv_wq);
  fail_wb_wq:
-       destroy_workqueue(client->wb_wq);
+       destroy_workqueue(fsc->wb_wq);
  fail_bdi:
-       bdi_destroy(&client->backing_dev_info);
+       bdi_destroy(&fsc->backing_dev_info);
+fail_client:
+       ceph_destroy_client(fsc->client);
  fail:
-       kfree(client);
+       kfree(fsc);
         return ERR_PTR(err);
  }
  
-static void ceph_destroy_client(struct ceph_client *client)
+void destroy_fs_client(struct ceph_fs_client *fsc)
  {
-       dout("destroy_client %p\n", client);
+       dout("destroy_fs_client %p\n", fsc);
  
-       /* unmount */
-       ceph_mdsc_stop(&client->mdsc);
-       ceph_osdc_stop(&client->osdc);
+       destroy_workqueue(fsc->wb_wq);
+       destroy_workqueue(fsc->pg_inv_wq);
+       destroy_workqueue(fsc->trunc_wq);
  
-       /*
-        * make sure mds and osd connections close out before destroying
-        * the auth module, which is needed to free those connections'
-        * ceph_authorizers.
-        */
-       ceph_msgr_flush();
-
-       ceph_monc_stop(&client->monc);
+       bdi_destroy(&fsc->backing_dev_info);
  
-       ceph_debugfs_client_cleanup(client);
-       destroy_workqueue(client->wb_wq);
-       destroy_workqueue(client->pg_inv_wq);
-       destroy_workqueue(client->trunc_wq);
+       mempool_destroy(fsc->wb_pagevec_pool);
  
-       bdi_destroy(&client->backing_dev_info);
+       destroy_mount_options(fsc->mount_options);
  
-       if (client->msgr)
-               ceph_messenger_destroy(client->msgr);
-       mempool_destroy(client->wb_pagevec_pool);
+       ceph_fs_debugfs_cleanup(fsc);
  
-       destroy_mount_args(client->mount_args);
+       ceph_destroy_client(fsc->client);
  
-       kfree(client);
-       dout("destroy_client %p done\n", client);
+       kfree(fsc);
+       dout("destroy_fs_client %p done\n", fsc);
  }
  
  /*
- * Initially learn our fsid, or verify an fsid matches.
+ * caches
   */
-int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid)
+struct kmem_cache *ceph_inode_cachep;
+struct kmem_cache *ceph_cap_cachep;
+struct kmem_cache *ceph_dentry_cachep;
+struct kmem_cache *ceph_file_cachep;
+
+static void ceph_inode_init_once(void *foo)
  {
-       if (client->have_fsid) {
-               if (ceph_fsid_compare(&client->fsid, fsid)) {
-                       pr_err("bad fsid, had %pU got %pU",
-                              &client->fsid, fsid);
-                       return -1;
-               }
-       } else {
-               pr_info("client%lld fsid %pU\n", client->monc.auth->global_id,
-                       fsid);
-               memcpy(&client->fsid, fsid, sizeof(*fsid));
-               ceph_debugfs_client_init(client);
-               client->have_fsid = true;
-       }
+       struct ceph_inode_info *ci = foo;
+       inode_init_once(&ci->vfs_inode);
+}
+
+static int __init init_caches(void)
+{
+       ceph_inode_cachep = kmem_cache_create("ceph_inode_info",
+                                     sizeof(struct ceph_inode_info),
+                                     __alignof__(struct ceph_inode_info),
+                                     (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD),
+                                     ceph_inode_init_once);
+       if (ceph_inode_cachep == NULL)
+               return -ENOMEM;
+
+       ceph_cap_cachep = KMEM_CACHE(ceph_cap,
+                                    SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+       if (ceph_cap_cachep == NULL)
+               goto bad_cap;
+
+       ceph_dentry_cachep = KMEM_CACHE(ceph_dentry_info,
+                                       SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+       if (ceph_dentry_cachep == NULL)
+               goto bad_dentry;
+
+       ceph_file_cachep = KMEM_CACHE(ceph_file_info,
+                                     SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+       if (ceph_file_cachep == NULL)
+               goto bad_file;
+
         return 0;
+
+bad_file:
+       kmem_cache_destroy(ceph_dentry_cachep);
+bad_dentry:
+       kmem_cache_destroy(ceph_cap_cachep);
+bad_cap:
+       kmem_cache_destroy(ceph_inode_cachep);
+       return -ENOMEM;
  }
  
+static void destroy_caches(void)
+{
+       kmem_cache_destroy(ceph_inode_cachep);
+       kmem_cache_destroy(ceph_cap_cachep);
+       kmem_cache_destroy(ceph_dentry_cachep);
+       kmem_cache_destroy(ceph_file_cachep);
+}
+
+
  /*
- * true if we have the mon map (and have thus joined the cluster)
+ * ceph_umount_begin - initiate forced umount.  Tear down the
+ * mount, skipping steps that may hang while waiting for server(s).
   */
-static int have_mon_and_osd_map(struct ceph_client *client)
+static void ceph_umount_begin(struct super_block *sb)
  {
-       return client->monc.monmap && client->monc.monmap->epoch &&
-              client->osdc.osdmap && client->osdc.osdmap->epoch;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
+
+       dout("ceph_umount_begin - starting forced umount\n");
+       if (!fsc)
+               return;
+       fsc->mount_state = CEPH_MOUNT_SHUTDOWN;
+       return;
  }
  
+static const struct super_operations ceph_super_ops = {
+       .alloc_inode    = ceph_alloc_inode,
+       .destroy_inode  = ceph_destroy_inode,
+       .write_inode    = ceph_write_inode,
+       .sync_fs        = ceph_sync_fs,
+       .put_super      = ceph_put_super,
+       .show_options   = ceph_show_options,
+       .statfs         = ceph_statfs,
+       .umount_begin   = ceph_umount_begin,
+};
+
  /*
   * Bootstrap mount by opening the root directory.  Note the mount
   * @started time from caller, and time out if this takes too long.
   */
-static struct dentry *open_root_dentry(struct ceph_client *client,
+static struct dentry *open_root_dentry(struct ceph_fs_client *fsc,
                                        const char *path,
                                        unsigned long started)
  {
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct ceph_mds_request *req = NULL;
         int err;
         struct dentry *root;
@@ -784,14 +609,14 @@ static struct dentry *open_root_dentry(struct ceph_client *client,
         req->r_ino1.ino = CEPH_INO_ROOT;
         req->r_ino1.snap = CEPH_NOSNAP;
         req->r_started = started;
-       req->r_timeout = client->mount_args->mount_timeout * HZ;
+       req->r_timeout = fsc->client->options->mount_timeout * HZ;
         req->r_args.getattr.mask = cpu_to_le32(CEPH_STAT_CAP_INODE);
         req->r_num_caps = 2;
         err = ceph_mdsc_do_request(mdsc, NULL, req);
         if (err == 0) {
                 dout("open_root_inode success\n");
                 if (ceph_ino(req->r_target_inode) == CEPH_INO_ROOT &&
-                   client->sb->s_root == NULL)
+                   fsc->sb->s_root == NULL)
                         root = d_alloc_root(req->r_target_inode);
                 else
                         root = d_obtain_alias(req->r_target_inode);
@@ -804,105 +629,86 @@ static struct dentry *open_root_dentry(struct ceph_client *client,
         return root;
  }
  
+
+
+
  /*
   * mount: join the ceph cluster, and open root directory.
   */
-static int ceph_mount(struct ceph_client *client, struct vfsmount *mnt,
+static int ceph_mount(struct ceph_fs_client *fsc, struct vfsmount *mnt,
                       const char *path)
  {
-       struct ceph_entity_addr *myaddr = NULL;
         int err;
-       unsigned long timeout = client->mount_args->mount_timeout * HZ;
         unsigned long started = jiffies;  /* note the start time */
         struct dentry *root;
+       int first = 0;   /* first vfsmount for this super_block */
  
         dout("mount start\n");
-       mutex_lock(&client->mount_mutex);
-
-       /* initialize the messenger */
-       if (client->msgr == NULL) {
-               if (ceph_test_opt(client, MYIP))
-                       myaddr = &client->mount_args->my_addr;
-               client->msgr = ceph_messenger_create(myaddr);
-               if (IS_ERR(client->msgr)) {
-                       err = PTR_ERR(client->msgr);
-                       client->msgr = NULL;
-                       goto out;
-               }
-               client->msgr->nocrc = ceph_test_opt(client, NOCRC);
-       }
+       mutex_lock(&fsc->client->mount_mutex);
  
-       /* open session, and wait for mon, mds, and osd maps */
-       err = ceph_monc_open_session(&client->monc);
+       err = __ceph_open_session(fsc->client, started);
         if (err < 0)
                 goto out;
  
-       while (!have_mon_and_osd_map(client)) {
-               err = -EIO;
-               if (timeout && time_after_eq(jiffies, started + timeout))
-                       goto out;
-
-               /* wait */
-               dout("mount waiting for mon_map\n");
-               err = wait_event_interruptible_timeout(client->auth_wq,
-                      have_mon_and_osd_map(client) || (client->auth_err < 0),
-                      timeout);
-               if (err == -EINTR || err == -ERESTARTSYS)
-                       goto out;
-               if (client->auth_err < 0) {
-                       err = client->auth_err;
-                       goto out;
-               }
-       }
-
         dout("mount opening root\n");
-       root = open_root_dentry(client, "", started);
+       root = open_root_dentry(fsc, "", started);
         if (IS_ERR(root)) {
                 err = PTR_ERR(root);
                 goto out;
         }
-       if (client->sb->s_root)
+       if (fsc->sb->s_root) {
                 dput(root);
-       else
-               client->sb->s_root = root;
+       } else {
+               fsc->sb->s_root = root;
+               first = 1;
+
+               err = ceph_fs_debugfs_init(fsc);
+               if (err < 0)
+                       goto fail;
+       }
  
         if (path[0] == 0) {
                 dget(root);
         } else {
                 dout("mount opening base mountpoint\n");
-               root = open_root_dentry(client, path, started);
+               root = open_root_dentry(fsc, path, started);
                 if (IS_ERR(root)) {
                         err = PTR_ERR(root);
-                       dput(client->sb->s_root);
-                       client->sb->s_root = NULL;
-                       goto out;
+                       goto fail;
                 }
         }
  
         mnt->mnt_root = root;
-       mnt->mnt_sb = client->sb;
+       mnt->mnt_sb = fsc->sb;
  
-       client->mount_state = CEPH_MOUNT_MOUNTED;
+       fsc->mount_state = CEPH_MOUNT_MOUNTED;
         dout("mount success\n");
         err = 0;
  
  out:
-       mutex_unlock(&client->mount_mutex);
+       mutex_unlock(&fsc->client->mount_mutex);
         return err;
+
+fail:
+       if (first) {
+               dput(fsc->sb->s_root);
+               fsc->sb->s_root = NULL;
+       }
+       goto out;
  }
  
  static int ceph_set_super(struct super_block *s, void *data)
  {
-       struct ceph_client *client = data;
+       struct ceph_fs_client *fsc = data;
         int ret;
  
         dout("set_super %p data %p\n", s, data);
  
-       s->s_flags = client->mount_args->sb_flags;
+       s->s_flags = fsc->mount_options->sb_flags;
         s->s_maxbytes = 1ULL << 40;  /* temp value until we get mdsmap */
  
-       s->s_fs_info = client;
-       client->sb = s;
+       s->s_fs_info = fsc;
+       fsc->sb = s;
  
         s->s_op = &ceph_super_ops;
         s->s_export_op = &ceph_export_ops;
@@ -917,7 +723,7 @@ static int ceph_set_super(struct super_block *s, void *data)
  
  fail:
         s->s_fs_info = NULL;
-       client->sb = NULL;
+       fsc->sb = NULL;
         return ret;
  }
  
@@ -926,30 +732,23 @@ fail:
   */
  static int ceph_compare_super(struct super_block *sb, void *data)
  {
-       struct ceph_client *new = data;
-       struct ceph_mount_args *args = new->mount_args;
-       struct ceph_client *other = ceph_sb_to_client(sb);
-       int i;
+       struct ceph_fs_client *new = data;
+       struct ceph_mount_options *fsopt = new->mount_options;
+       struct ceph_options *opt = new->client->options;
+       struct ceph_fs_client *other = ceph_sb_to_client(sb);
  
         dout("ceph_compare_super %p\n", sb);
-       if (args->flags & CEPH_OPT_FSID) {
-               if (ceph_fsid_compare(&args->fsid, &other->fsid)) {
-                       dout("fsid doesn't match\n");
-                       return 0;
-               }
-       } else {
-               /* do we share (a) monitor? */
-               for (i = 0; i < new->monc.monmap->num_mon; i++)
-                       if (ceph_monmap_contains(other->monc.monmap,
-                                        &new->monc.monmap->mon_inst[i].addr))
-                               break;
-               if (i == new->monc.monmap->num_mon) {
-                       dout("mon ip not part of monmap\n");
-                       return 0;
-               }
-               dout("mon ip matches existing sb %p\n", sb);
+
+       if (compare_mount_options(fsopt, opt, other)) {
+               dout("monitor(s)/mount options don't match\n");
+               return 0;
         }
-       if (args->sb_flags != other->mount_args->sb_flags) {
+       if ((opt->flags & CEPH_OPT_FSID) &&
+           ceph_fsid_compare(&opt->fsid, &other->client->fsid)) {
+               dout("fsid doesn't match\n");
+               return 0;
+       }
+       if (fsopt->sb_flags != other->mount_options->sb_flags) {
                 dout("flags differ\n");
                 return 0;
         }
@@ -961,19 +760,20 @@ static int ceph_compare_super(struct super_block *sb, void *data)
   */
  static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
  
-static int ceph_register_bdi(struct super_block *sb, struct ceph_client *client)
+static int ceph_register_bdi(struct super_block *sb,
+                            struct ceph_fs_client *fsc)
  {
         int err;
  
         /* set ra_pages based on rsize mount option? */
-       if (client->mount_args->rsize >= PAGE_CACHE_SIZE)
-               client->backing_dev_info.ra_pages =
-                       (client->mount_args->rsize + PAGE_CACHE_SIZE - 1)
+       if (fsc->mount_options->rsize >= PAGE_CACHE_SIZE)
+               fsc->backing_dev_info.ra_pages =
+                       (fsc->mount_options->rsize + PAGE_CACHE_SIZE - 1)
                         >> PAGE_SHIFT;
-       err = bdi_register(&client->backing_dev_info, NULL, "ceph-%d",
+       err = bdi_register(&fsc->backing_dev_info, NULL, "ceph-%d",
                            atomic_long_inc_return(&bdi_seq));
         if (!err)
-               sb->s_bdi = &client->backing_dev_info;
+               sb->s_bdi = &fsc->backing_dev_info;
         return err;
  }
  
@@ -982,46 +782,52 @@ static int ceph_get_sb(struct file_system_type *fs_type,
                        struct vfsmount *mnt)
  {
         struct super_block *sb;
-       struct ceph_client *client;
+       struct ceph_fs_client *fsc;
         int err;
         int (*compare_super)(struct super_block *, void *) = ceph_compare_super;
         const char *path = NULL;
-       struct ceph_mount_args *args;
+       struct ceph_mount_options *fsopt = NULL;
+       struct ceph_options *opt = NULL;
  
         dout("ceph_get_sb\n");
-       args = parse_mount_args(flags, data, dev_name, &path);
-       if (IS_ERR(args)) {
-               err = PTR_ERR(args);
+       err = parse_mount_options(&fsopt, &opt, flags, data, dev_name, &path);
+       if (err < 0)
                 goto out_final;
-       }
  
         /* create client (which we may/may not use) */
-       client = ceph_create_client(args);
-       if (IS_ERR(client)) {
-               err = PTR_ERR(client);
+       fsc = create_fs_client(fsopt, opt);
+       if (IS_ERR(fsc)) {
+               err = PTR_ERR(fsc);
+               kfree(fsopt);
+               kfree(opt);
                 goto out_final;
         }
  
-       if (client->mount_args->flags & CEPH_OPT_NOSHARE)
+       err = ceph_mdsc_init(fsc);
+       if (err < 0)
+               goto out;
+
+       if (ceph_test_opt(fsc->client, NOSHARE))
                 compare_super = NULL;
-       sb = sget(fs_type, compare_super, ceph_set_super, client);
+       sb = sget(fs_type, compare_super, ceph_set_super, fsc);
         if (IS_ERR(sb)) {
                 err = PTR_ERR(sb);
                 goto out;
         }
  
-       if (ceph_sb_to_client(sb) != client) {
-               ceph_destroy_client(client);
-               client = ceph_sb_to_client(sb);
-               dout("get_sb got existing client %p\n", client);
+       if (ceph_sb_to_client(sb) != fsc) {
+               ceph_mdsc_destroy(fsc);
+               destroy_fs_client(fsc);
+               fsc = ceph_sb_to_client(sb);
+               dout("get_sb got existing client %p\n", fsc);
         } else {
-               dout("get_sb using new client %p\n", client);
-               err = ceph_register_bdi(sb, client);
+               dout("get_sb using new client %p\n", fsc);
+               err = ceph_register_bdi(sb, fsc);
                 if (err < 0)
                         goto out_splat;
         }
  
-       err = ceph_mount(client, mnt, path);
+       err = ceph_mount(fsc, mnt, path);
         if (err < 0)
                 goto out_splat;
         dout("root %p inode %p ino %llx.%llx\n", mnt->mnt_root,
@@ -1029,12 +835,13 @@ static int ceph_get_sb(struct file_system_type *fs_type,
         return 0;
  
  out_splat:
-       ceph_mdsc_close_sessions(&client->mdsc);
+       ceph_mdsc_close_sessions(fsc->mdsc);
         deactivate_locked_super(sb);
         goto out_final;
  
  out:
-       ceph_destroy_client(client);
+       ceph_mdsc_destroy(fsc);
+       destroy_fs_client(fsc);
  out_final:
         dout("ceph_get_sb fail %d\n", err);
         return err;
@@ -1042,11 +849,12 @@ out_final:
  
  static void ceph_kill_sb(struct super_block *s)
  {
-       struct ceph_client *client = ceph_sb_to_client(s);
+       struct ceph_fs_client *fsc = ceph_sb_to_client(s);
         dout("kill_sb %p\n", s);
-       ceph_mdsc_pre_umount(&client->mdsc);
+       ceph_mdsc_pre_umount(fsc->mdsc);
         kill_anon_super(s);    /* will call put_super after sb is r/o */
-       ceph_destroy_client(client);
+       ceph_mdsc_destroy(fsc);
+       destroy_fs_client(fsc);
  }
  
  static struct file_system_type ceph_fs_type = {
@@ -1062,36 +870,20 @@ static struct file_system_type ceph_fs_type = {
  
  static int __init init_ceph(void)
  {
-       int ret = 0;
-
-       ret = ceph_debugfs_init();
-       if (ret < 0)
-               goto out;
-
-       ret = ceph_msgr_init();
-       if (ret < 0)
-               goto out_debugfs;
-
-       ret = init_caches();
+       int ret = init_caches();
         if (ret)
-               goto out_msgr;
+               goto out;
  
         ret = register_filesystem(&ceph_fs_type);
         if (ret)
                 goto out_icache;
  
-       pr_info("loaded (mon/mds/osd proto %d/%d/%d, osdmap %d/%d %d/%d)\n",
-               CEPH_MONC_PROTOCOL, CEPH_MDSC_PROTOCOL, CEPH_OSDC_PROTOCOL,
-               CEPH_OSDMAP_VERSION, CEPH_OSDMAP_VERSION_EXT,
-               CEPH_OSDMAP_INC_VERSION, CEPH_OSDMAP_INC_VERSION_EXT);
+       pr_info("loaded (mds proto %d)\n", CEPH_MDSC_PROTOCOL);
+
         return 0;
  
  out_icache:
         destroy_caches();
-out_msgr:
-       ceph_msgr_exit();
-out_debugfs:
-       ceph_debugfs_cleanup();
  out:
         return ret;
  }
@@ -1101,8 +893,6 @@ static void __exit exit_ceph(void)
         dout("exit_ceph\n");
         unregister_filesystem(&ceph_fs_type);
         destroy_caches();
-       ceph_msgr_exit();
-       ceph_debugfs_cleanup();
  }
  
  module_init(init_ceph);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index b87638e84c4bc266e9b325f12e7fb98fc78e8986..1886294e12f7a3106c00f3376d92e43330514081 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1,7 +1,7 @@
  #ifndef _FS_CEPH_SUPER_H
  #define _FS_CEPH_SUPER_H
  
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
  
  #include <asm/unaligned.h>
  #include <linux/backing-dev.h>
@@ -14,13 +14,7 @@
  #include <linux/writeback.h>
  #include <linux/slab.h>
  
-#include "types.h"
-#include "messenger.h"
-#include "msgpool.h"
-#include "mon_client.h"
-#include "mds_client.h"
-#include "osd_client.h"
-#include "ceph_fs.h"
+#include <linux/ceph/libceph.h>
  
  /* f_type in struct statfs */
  #define CEPH_SUPER_MAGIC 0x00c36400
@@ -30,42 +24,25 @@
  #define CEPH_BLOCK_SHIFT   20  /* 1 MB */
  #define CEPH_BLOCK         (1 << CEPH_BLOCK_SHIFT)
  
-/*
- * Supported features
- */
-#define CEPH_FEATURE_SUPPORTED CEPH_FEATURE_NOSRCADDR | CEPH_FEATURE_FLOCK
-#define CEPH_FEATURE_REQUIRED  CEPH_FEATURE_NOSRCADDR
+#define CEPH_MOUNT_OPT_DIRSTAT         (1<<4) /* `cat dirname` for stats */
+#define CEPH_MOUNT_OPT_RBYTES          (1<<5) /* dir st_bytes = rbytes */
+#define CEPH_MOUNT_OPT_NOASYNCREADDIR  (1<<7) /* no dcache readdir */
  
-/*
- * mount options
- */
-#define CEPH_OPT_FSID             (1<<0)
-#define CEPH_OPT_NOSHARE          (1<<1) /* don't share client with other sbs */
-#define CEPH_OPT_MYIP             (1<<2) /* specified my ip */
-#define CEPH_OPT_DIRSTAT          (1<<4) /* funky `cat dirname` for stats */
-#define CEPH_OPT_RBYTES           (1<<5) /* dir st_bytes = rbytes */
-#define CEPH_OPT_NOCRC            (1<<6) /* no data crc on writes */
-#define CEPH_OPT_NOASYNCREADDIR   (1<<7) /* no dcache readdir */
+#define CEPH_MOUNT_OPT_DEFAULT    (CEPH_MOUNT_OPT_RBYTES)
  
-#define CEPH_OPT_DEFAULT   (CEPH_OPT_RBYTES)
+#define ceph_set_mount_opt(fsc, opt) \
+       (fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt;
+#define ceph_test_mount_opt(fsc, opt) \
+       (!!((fsc)->mount_options->flags & CEPH_MOUNT_OPT_##opt))
  
-#define ceph_set_opt(client, opt) \
-       (client)->mount_args->flags |= CEPH_OPT_##opt;
-#define ceph_test_opt(client, opt) \
-       (!!((client)->mount_args->flags & CEPH_OPT_##opt))
+#define CEPH_MAX_READDIR_DEFAULT        1024
+#define CEPH_MAX_READDIR_BYTES_DEFAULT  (512*1024)
+#define CEPH_SNAPDIRNAME_DEFAULT        ".snap"
  
-
-struct ceph_mount_args {
-       int sb_flags;
+struct ceph_mount_options {
         int flags;
-       struct ceph_fsid fsid;
-       struct ceph_entity_addr my_addr;
-       int num_mon;
-       struct ceph_entity_addr *mon_addr;
-       int mount_timeout;
-       int osd_idle_ttl;
-       int osd_timeout;
-       int osd_keepalive_timeout;
+       int sb_flags;
+
         int wsize;
         int rsize;            /* max readahead */
         int congestion_kb;    /* max writeback in flight */
@@ -73,82 +50,25 @@ struct ceph_mount_args {
         int cap_release_safety;
         int max_readdir;       /* max readdir result (entries) */
         int max_readdir_bytes; /* max readdir result (bytes) */
-       char *snapdir_name;   /* default ".snap" */
-       char *name;
-       char *secret;
-};
  
-/*
- * defaults
- */
-#define CEPH_MOUNT_TIMEOUT_DEFAULT  60
-#define CEPH_OSD_TIMEOUT_DEFAULT    60  /* seconds */
-#define CEPH_OSD_KEEPALIVE_DEFAULT  5
-#define CEPH_OSD_IDLE_TTL_DEFAULT    60
-#define CEPH_MOUNT_RSIZE_DEFAULT    (512*1024) /* readahead */
-#define CEPH_MAX_READDIR_DEFAULT    1024
-#define CEPH_MAX_READDIR_BYTES_DEFAULT    (512*1024)
-
-#define CEPH_MSG_MAX_FRONT_LEN (16*1024*1024)
-#define CEPH_MSG_MAX_DATA_LEN  (16*1024*1024)
-
-#define CEPH_SNAPDIRNAME_DEFAULT ".snap"
-#define CEPH_AUTH_NAME_DEFAULT   "guest"
-/*
- * Delay telling the MDS we no longer want caps, in case we reopen
- * the file.  Delay a minimum amount of time, even if we send a cap
- * message for some other reason.  Otherwise, take the opportunity to
- * update the mds to avoid sending another message later.
- */
-#define CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT      5  /* cap release delay */
-#define CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT     60  /* cap release delay */
-
-#define CEPH_CAP_RELEASE_SAFETY_DEFAULT        (CEPH_CAPS_PER_RELEASE * 4)
-
-/* mount state */
-enum {
-       CEPH_MOUNT_MOUNTING,
-       CEPH_MOUNT_MOUNTED,
-       CEPH_MOUNT_UNMOUNTING,
-       CEPH_MOUNT_UNMOUNTED,
-       CEPH_MOUNT_SHUTDOWN,
-};
-
-/*
- * subtract jiffies
- */
-static inline unsigned long time_sub(unsigned long a, unsigned long b)
-{
-       BUG_ON(time_after(b, a));
-       return (long)a - (long)b;
-}
-
-/*
- * per-filesystem client state
- *
- * possibly shared by multiple mount points, if they are
- * mounting the same ceph filesystem/cluster.
- */
-struct ceph_client {
-       struct ceph_fsid fsid;
-       bool have_fsid;
+       /*
+        * everything above this point can be memcmp'd; everything below
+        * is handled in compare_mount_options()
+        */
  
-       struct mutex mount_mutex;       /* serialize mount attempts */
-       struct ceph_mount_args *mount_args;
+       char *snapdir_name;   /* default ".snap" */
+};
  
+struct ceph_fs_client {
         struct super_block *sb;
  
-       unsigned long mount_state;
-       wait_queue_head_t auth_wq;
-
-       int auth_err;
+       struct ceph_mount_options *mount_options;
+       struct ceph_client *client;
  
+       unsigned long mount_state;
         int min_caps;                  /* min caps i added */
  
-       struct ceph_messenger *msgr;   /* messenger instance */
-       struct ceph_mon_client monc;
-       struct ceph_mds_client mdsc;
-       struct ceph_osd_client osdc;
+       struct ceph_mds_client *mdsc;
  
         /* writeback */
         mempool_t *wb_pagevec_pool;
@@ -160,14 +80,14 @@ struct ceph_client {
         struct backing_dev_info backing_dev_info;
  
  #ifdef CONFIG_DEBUG_FS
-       struct dentry *debugfs_monmap;
-       struct dentry *debugfs_mdsmap, *debugfs_osdmap;
-       struct dentry *debugfs_dir, *debugfs_dentry_lru, *debugfs_caps;
+       struct dentry *debugfs_dentry_lru, *debugfs_caps;
         struct dentry *debugfs_congestion_kb;
         struct dentry *debugfs_bdi;
+       struct dentry *debugfs_mdsc, *debugfs_mdsmap;
  #endif
  };
  
+
  /*
   * File i/o capability.  This tracks shared state with the metadata
   * server that allows us to cache or writeback attributes or to read
@@ -275,6 +195,20 @@ struct ceph_inode_xattr {
         int should_free_val;
  };
  
+/*
+ * Ceph dentry state
+ */
+struct ceph_dentry_info {
+       struct ceph_mds_session *lease_session;
+       u32 lease_gen, lease_shared_gen;
+       u32 lease_seq;
+       unsigned long lease_renew_after, lease_renew_from;
+       struct list_head lru;
+       struct dentry *dentry;
+       u64 time;
+       u64 offset;
+};
+
  struct ceph_inode_xattrs_info {
         /*
          * (still encoded) xattr blob. we avoid the overhead of parsing
@@ -296,11 +230,6 @@ struct ceph_inode_xattrs_info {
  /*
   * Ceph inode.
   */
-#define CEPH_I_COMPLETE  1  /* we have complete directory cached */
-#define CEPH_I_NODELAY   4  /* do not delay cap release */
-#define CEPH_I_FLUSH     8  /* do not delay flush of dirty metadata */
-#define CEPH_I_NOFLUSH  16  /* do not flush dirty caps */
-
  struct ceph_inode_info {
         struct ceph_vino i_vino;   /* ceph ino + snap */
  
@@ -391,6 +320,63 @@ static inline struct ceph_inode_info *ceph_inode(struct inode *inode)
         return container_of(inode, struct ceph_inode_info, vfs_inode);
  }
  
+static inline struct ceph_vino ceph_vino(struct inode *inode)
+{
+       return ceph_inode(inode)->i_vino;
+}
+
+/*
+ * ino_t is <64 bits on many architectures, blech.
+ *
+ * don't include snap in ino hash, at least for now.
+ */
+static inline ino_t ceph_vino_to_ino(struct ceph_vino vino)
+{
+       ino_t ino = (ino_t)vino.ino;  /* ^ (vino.snap << 20); */
+#if BITS_PER_LONG == 32
+       ino ^= vino.ino >> (sizeof(u64)-sizeof(ino_t)) * 8;
+       if (!ino)
+               ino = 1;
+#endif
+       return ino;
+}
+
+/* for printf-style formatting */
+#define ceph_vinop(i) ceph_inode(i)->i_vino.ino, ceph_inode(i)->i_vino.snap
+
+static inline u64 ceph_ino(struct inode *inode)
+{
+       return ceph_inode(inode)->i_vino.ino;
+}
+static inline u64 ceph_snap(struct inode *inode)
+{
+       return ceph_inode(inode)->i_vino.snap;
+}
+
+static inline int ceph_ino_compare(struct inode *inode, void *data)
+{
+       struct ceph_vino *pvino = (struct ceph_vino *)data;
+       struct ceph_inode_info *ci = ceph_inode(inode);
+       return ci->i_vino.ino == pvino->ino &&
+               ci->i_vino.snap == pvino->snap;
+}
+
+static inline struct inode *ceph_find_inode(struct super_block *sb,
+                                           struct ceph_vino vino)
+{
+       ino_t t = ceph_vino_to_ino(vino);
+       return ilookup5(sb, t, ceph_ino_compare, &vino);
+}
+
+
+/*
+ * Ceph inode.
+ */
+#define CEPH_I_COMPLETE  1  /* we have complete directory cached */
+#define CEPH_I_NODELAY   4  /* do not delay cap release */
+#define CEPH_I_FLUSH     8  /* do not delay flush of dirty metadata */
+#define CEPH_I_NOFLUSH  16  /* do not flush dirty caps */
+
  static inline void ceph_i_clear(struct inode *inode, unsigned mask)
  {
         struct ceph_inode_info *ci = ceph_inode(inode);
@@ -414,8 +400,9 @@ static inline bool ceph_i_test(struct inode *inode, unsigned mask)
         struct ceph_inode_info *ci = ceph_inode(inode);
         bool r;
  
-       smp_mb();
+       spin_lock(&inode->i_lock);
         r = (ci->i_ceph_flags & mask) == mask;
+       spin_unlock(&inode->i_lock);
         return r;
  }
  
@@ -432,20 +419,6 @@ extern u32 ceph_choose_frag(struct ceph_inode_info *ci, u32 v,
                             struct ceph_inode_frag *pfrag,
                             int *found);
  
-/*
- * Ceph dentry state
- */
-struct ceph_dentry_info {
-       struct ceph_mds_session *lease_session;
-       u32 lease_gen, lease_shared_gen;
-       u32 lease_seq;
-       unsigned long lease_renew_after, lease_renew_from;
-       struct list_head lru;
-       struct dentry *dentry;
-       u64 time;
-       u64 offset;
-};
-
  static inline struct ceph_dentry_info *ceph_dentry(struct dentry *dentry)
  {
         return (struct ceph_dentry_info *)dentry->d_fsdata;
@@ -456,22 +429,6 @@ static inline loff_t ceph_make_fpos(unsigned frag, unsigned off)
         return ((loff_t)frag << 32) | (loff_t)off;
  }
  
-/*
- * ino_t is <64 bits on many architectures, blech.
- *
- * don't include snap in ino hash, at least for now.
- */
-static inline ino_t ceph_vino_to_ino(struct ceph_vino vino)
-{
-       ino_t ino = (ino_t)vino.ino;  /* ^ (vino.snap << 20); */
-#if BITS_PER_LONG == 32
-       ino ^= vino.ino >> (sizeof(u64)-sizeof(ino_t)) * 8;
-       if (!ino)
-               ino = 1;
-#endif
-       return ino;
-}
-
  static inline int ceph_set_ino_cb(struct inode *inode, void *data)
  {
         ceph_inode(inode)->i_vino = *(struct ceph_vino *)data;
@@ -479,39 +436,6 @@ static inline int ceph_set_ino_cb(struct inode *inode, void *data)
         return 0;
  }
  
-static inline struct ceph_vino ceph_vino(struct inode *inode)
-{
-       return ceph_inode(inode)->i_vino;
-}
-
-/* for printf-style formatting */
-#define ceph_vinop(i) ceph_inode(i)->i_vino.ino, ceph_inode(i)->i_vino.snap
-
-static inline u64 ceph_ino(struct inode *inode)
-{
-       return ceph_inode(inode)->i_vino.ino;
-}
-static inline u64 ceph_snap(struct inode *inode)
-{
-       return ceph_inode(inode)->i_vino.snap;
-}
-
-static inline int ceph_ino_compare(struct inode *inode, void *data)
-{
-       struct ceph_vino *pvino = (struct ceph_vino *)data;
-       struct ceph_inode_info *ci = ceph_inode(inode);
-       return ci->i_vino.ino == pvino->ino &&
-               ci->i_vino.snap == pvino->snap;
-}
-
-static inline struct inode *ceph_find_inode(struct super_block *sb,
-                                           struct ceph_vino vino)
-{
-       ino_t t = ceph_vino_to_ino(vino);
-       return ilookup5(sb, t, ceph_ino_compare, &vino);
-}
-
-
  /*
   * caps helpers
   */
@@ -576,18 +500,18 @@ extern int ceph_reserve_caps(struct ceph_mds_client *mdsc,
                              struct ceph_cap_reservation *ctx, int need);
  extern int ceph_unreserve_caps(struct ceph_mds_client *mdsc,
                                struct ceph_cap_reservation *ctx);
-extern void ceph_reservation_status(struct ceph_client *client,
+extern void ceph_reservation_status(struct ceph_fs_client *client,
                                     int *total, int *avail, int *used,
                                     int *reserved, int *min);
  
-static inline struct ceph_client *ceph_inode_to_client(struct inode *inode)
+static inline struct ceph_fs_client *ceph_inode_to_client(struct inode *inode)
  {
-       return (struct ceph_client *)inode->i_sb->s_fs_info;
+       return (struct ceph_fs_client *)inode->i_sb->s_fs_info;
  }
  
-static inline struct ceph_client *ceph_sb_to_client(struct super_block *sb)
+static inline struct ceph_fs_client *ceph_sb_to_client(struct super_block *sb)
  {
-       return (struct ceph_client *)sb->s_fs_info;
+       return (struct ceph_fs_client *)sb->s_fs_info;
  }
  
  
@@ -616,51 +540,6 @@ struct ceph_file_info {
  
  
  
-/*
- * snapshots
- */
-
-/*
- * A "snap context" is the set of existing snapshots when we
- * write data.  It is used by the OSD to guide its COW behavior.
- *
- * The ceph_snap_context is refcounted, and attached to each dirty
- * page, indicating which context the dirty data belonged when it was
- * dirtied.
- */
-struct ceph_snap_context {
-       atomic_t nref;
-       u64 seq;
-       int num_snaps;
-       u64 snaps[];
-};
-
-static inline struct ceph_snap_context *
-ceph_get_snap_context(struct ceph_snap_context *sc)
-{
-       /*
-       printk("get_snap_context %p %d -> %d\n", sc, atomic_read(&sc->nref),
-              atomic_read(&sc->nref)+1);
-       */
-       if (sc)
-               atomic_inc(&sc->nref);
-       return sc;
-}
-
-static inline void ceph_put_snap_context(struct ceph_snap_context *sc)
-{
-       if (!sc)
-               return;
-       /*
-       printk("put_snap_context %p %d -> %d\n", sc, atomic_read(&sc->nref),
-              atomic_read(&sc->nref)-1);
-       */
-       if (atomic_dec_and_test(&sc->nref)) {
-               /*printk(" deleting snap_context %p\n", sc);*/
-               kfree(sc);
-       }
-}
-
  /*
   * A "snap realm" describes a subset of the file hierarchy sharing
   * the same set of snapshots that apply to it.  The realms themselves
@@ -699,16 +578,33 @@ struct ceph_snap_realm {
         spinlock_t inodes_with_caps_lock;
  };
  
-
-
-/*
- * calculate the number of pages a given length and offset map onto,
- * if we align the data.
- */
-static inline int calc_pages_for(u64 off, u64 len)
+static inline int default_congestion_kb(void)
  {
-       return ((off+len+PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT) -
-               (off >> PAGE_CACHE_SHIFT);
+       int congestion_kb;
+
+       /*
+        * Copied from NFS
+        *
+        * congestion size, scale with available memory.
+        *
+        *  64MB:    8192k
+        * 128MB:   11585k
+        * 256MB:   16384k
+        * 512MB:   23170k
+        *   1GB:   32768k
+        *   2GB:   46340k
+        *   4GB:   65536k
+        *   8GB:   92681k
+        *  16GB:  131072k
+        *
+        * This allows larger machines to have larger/more transfers.
+        * Limit the default to 256M
+        */
+       congestion_kb = (16*int_sqrt(totalram_pages)) << (PAGE_SHIFT-10);
+       if (congestion_kb > 256*1024)
+               congestion_kb = 256*1024;
+
+       return congestion_kb;
  }
  
  
@@ -741,16 +637,6 @@ static inline bool __ceph_have_pending_cap_snap(struct ceph_inode_info *ci)
                            ci_item)->writing;
  }
  
-
-/* super.c */
-extern struct kmem_cache *ceph_inode_cachep;
-extern struct kmem_cache *ceph_cap_cachep;
-extern struct kmem_cache *ceph_dentry_cachep;
-extern struct kmem_cache *ceph_file_cachep;
-
-extern const char *ceph_msg_type_name(int type);
-extern int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid);
-
  /* inode.c */
  extern const struct inode_operations ceph_file_iops;
  
@@ -857,12 +743,18 @@ extern int ceph_mmap(struct file *file, struct vm_area_struct *vma);
  /* file.c */
  extern const struct file_operations ceph_file_fops;
  extern const struct address_space_operations ceph_aops;
+extern int ceph_copy_to_page_vector(struct page **pages,
+                                   const char *data,
+                                   loff_t off, size_t len);
+extern int ceph_copy_from_page_vector(struct page **pages,
+                                   char *data,
+                                   loff_t off, size_t len);
+extern struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags);
  extern int ceph_open(struct inode *inode, struct file *file);
  extern struct dentry *ceph_lookup_open(struct inode *dir, struct dentry *dentry,
                                        struct nameidata *nd, int mode,
                                        int locked_dir);
  extern int ceph_release(struct inode *inode, struct file *filp);
-extern void ceph_release_page_vector(struct page **pages, int num_pages);
  
  /* dir.c */
  extern const struct file_operations ceph_dir_fops;
@@ -892,12 +784,6 @@ extern long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
  /* export.c */
  extern const struct export_operations ceph_export_ops;
  
-/* debugfs.c */
-extern int ceph_debugfs_init(void);
-extern void ceph_debugfs_cleanup(void);
-extern int ceph_debugfs_client_init(struct ceph_client *client);
-extern void ceph_debugfs_client_cleanup(struct ceph_client *client);
-
  /* locks.c */
  extern int ceph_lock(struct file *file, int cmd, struct file_lock *fl);
  extern int ceph_flock(struct file *file, int cmd, struct file_lock *fl);
@@ -914,4 +800,8 @@ static inline struct inode *get_dentry_parent_inode(struct dentry *dentry)
         return NULL;
  }
  
+/* debugfs.c */
+extern int ceph_fs_debugfs_init(struct ceph_fs_client *client);
+extern void ceph_fs_debugfs_cleanup(struct ceph_fs_client *client);
+
  #endif /* _FS_CEPH_SUPER_H */
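
The sizing rule in the `default_congestion_kb()` hunk above can be checked outside the kernel. The following is a minimal userspace sketch, not the kernel code: `isqrt_floor()` is a stand-in for the kernel's `int_sqrt()`, and `congestion_kb_for()` is a hypothetical helper that takes the page count and page shift as arguments instead of reading `totalram_pages` and `PAGE_SHIFT`. With 4 KiB pages it gives 8192 for 64MB and 32768 for 1GB, matching the table in the comment; for 128MB the integer square root yields 11584, one less than the 11585k listed, which suggests the table was computed without integer truncation.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's int_sqrt(): floor(sqrt(x)). */
static unsigned long isqrt_floor(unsigned long x)
{
	unsigned long r = 0;

	/* (r + 1) <= x / (r + 1) is (r + 1)^2 <= x, written to avoid overflow. */
	while ((r + 1) <= x / (r + 1))
		r++;
	return r;
}

/*
 * Hypothetical helper: same arithmetic as default_congestion_kb(), with the
 * page count and page shift passed in rather than taken from the kernel.
 */
static int congestion_kb_for(unsigned long total_pages, int page_shift)
{
	int kb;

	assert(page_shift >= 10);	/* a page is assumed to be at least 1 KiB */

	kb = (int)(16 * isqrt_floor(total_pages)) << (page_shift - 10);
	if (kb > 256 * 1024)		/* limit the default to 256M */
		kb = 256 * 1024;
	return kb;
}
```

For example, `congestion_kb_for(16384, 12)` models a 64MB machine with 4 KiB pages. Because the value grows with the square root of memory, each fourfold increase in RAM doubles the threshold until the 256M cap takes over.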
diff --git a/fs/ceph/types.h b/fs/ceph/types.h
deleted file mode 100644
index 28b35a0..0000000
--- a/fs/ceph/types.h
+++ /dev/null
@@ -1,29 +0,0 @@
-#ifndef _FS_CEPH_TYPES_H
-#define _FS_CEPH_TYPES_H
-
-/* needed before including ceph_fs.h */
-#include <linux/in.h>
-#include <linux/types.h>
-#include <linux/fcntl.h>
-#include <linux/string.h>
-
-#include "ceph_fs.h"
-#include "ceph_frag.h"
-#include "ceph_hash.h"
-
-/*
- * Identify inodes by both their ino AND snapshot id (a u64).
- */
-struct ceph_vino {
-       u64 ino;
-       u64 snap;
-};
-
-
-/* context for the caps reservation mechanism */
-struct ceph_cap_reservation {
-       int count;
-};
-
-
-#endif
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 9578af610b73fb48b69872ddeed222e58c8340f0..6e12a6ba5f79daabc1bc455a3a4db464b240c00a 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -1,6 +1,9 @@
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
+
  #include "super.h"
-#include "decode.h"
+#include "mds_client.h"
+
+#include <linux/ceph/decode.h>
  
  #include <linux/xattr.h>
  #include <linux/slab.h>
@@ -620,12 +623,12 @@ out:
  static int ceph_sync_setxattr(struct dentry *dentry, const char *name,
                               const char *value, size_t size, int flags)
  {
-       struct ceph_client *client = ceph_sb_to_client(dentry->d_sb);
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
         struct inode *inode = dentry->d_inode;
         struct ceph_inode_info *ci = ceph_inode(inode);
         struct inode *parent_inode = dentry->d_parent->d_inode;
         struct ceph_mds_request *req;
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         int err;
         int i, nr_pages;
         struct page **pages = NULL;
@@ -713,10 +716,9 @@ int ceph_setxattr(struct dentry *dentry, const char *name,
  
         /* preallocate memory for xattr name, value, index node */
         err = -ENOMEM;
-       newname = kmalloc(name_len + 1, GFP_NOFS);
+       newname = kmemdup(name, name_len + 1, GFP_NOFS);
         if (!newname)
                 goto out;
-       memcpy(newname, name, name_len + 1);
  
         if (val_len) {
                 newval = kmalloc(val_len + 1, GFP_NOFS);
@@ -777,8 +779,8 @@ out:
  
  static int ceph_send_removexattr(struct dentry *dentry, const char *name)
  {
-       struct ceph_client *client = ceph_sb_to_client(dentry->d_sb);
-       struct ceph_mds_client *mdsc = &client->mdsc;
+       struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
+       struct ceph_mds_client *mdsc = fsc->mdsc;
         struct inode *inode = dentry->d_inode;
         struct inode *parent_inode = dentry->d_parent->d_inode;
         struct ceph_mds_request *req;
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index c65c3419dd3703f12bb4994e9333c085c907ecfa..7e83b356cc9e3a93c2bc1b0e915d118884170474 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -232,7 +232,7 @@ static int
  small_smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
                 void **request_buf)
  {
-       int rc = 0;
+       int rc;
  
         rc = cifs_reconnect_tcon(tcon, smb_command);
         if (rc)
@@ -250,7 +250,7 @@ small_smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
         if (tcon != NULL)
                 cifs_stats_inc(&tcon->num_smbs_sent);
  
-       return rc;
+       return 0;
  }
  
  int
@@ -281,16 +281,9 @@ small_smb_init_no_tc(const int smb_command, const int wct,
  
  /* If the return code is zero, this function must fill in request_buf pointer */
  static int
-smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
-        void **request_buf /* returned */ ,
-        void **response_buf /* returned */ )
+__smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
+                       void **request_buf, void **response_buf)
  {
-       int rc = 0;
-
-       rc = cifs_reconnect_tcon(tcon, smb_command);
-       if (rc)
-               return rc;
-
         *request_buf = cifs_buf_get();
         if (*request_buf == NULL) {
                 /* BB should we add a retry in here if not a writepage? */
@@ -309,7 +302,31 @@ smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
         if (tcon != NULL)
                 cifs_stats_inc(&tcon->num_smbs_sent);
  
-       return rc;
+       return 0;
+}
+
+/* If the return code is zero, this function must fill in request_buf pointer */
+static int
+smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
+        void **request_buf, void **response_buf)
+{
+       int rc;
+
+       rc = cifs_reconnect_tcon(tcon, smb_command);
+       if (rc)
+               return rc;
+
+       return __smb_init(smb_command, wct, tcon, request_buf, response_buf);
+}
+
+static int
+smb_init_no_reconnect(int smb_command, int wct, struct cifsTconInfo *tcon,
+                       void **request_buf, void **response_buf)
+{
+       if (tcon->ses->need_reconnect || tcon->need_reconnect)
+               return -EHOSTDOWN;
+
+       return __smb_init(smb_command, wct, tcon, request_buf, response_buf);
  }
  
  static int validate_t2(struct smb_t2_rsp *pSMB)
@@ -4534,8 +4551,8 @@ CIFSSMBQFSUnixInfo(const int xid, struct cifsTconInfo *tcon)
  
         cFYI(1, "In QFSUnixInfo");
  QFSUnixRetry:
-       rc = smb_init(SMB_COM_TRANSACTION2, 15, tcon, (void **) &pSMB,
-                     (void **) &pSMBr);
+       rc = smb_init_no_reconnect(SMB_COM_TRANSACTION2, 15, tcon,
+                                  (void **) &pSMB, (void **) &pSMBr);
         if (rc)
                 return rc;
  
@@ -4604,8 +4621,8 @@ CIFSSMBSetFSUnixInfo(const int xid, struct cifsTconInfo *tcon, __u64 cap)
         cFYI(1, "In SETFSUnixInfo");
  SETFSUnixRetry:
         /* BB switch to small buf init to save memory */
-       rc = smb_init(SMB_COM_TRANSACTION2, 15, tcon, (void **) &pSMB,
-                     (void **) &pSMBr);
+       rc = smb_init_no_reconnect(SMB_COM_TRANSACTION2, 15, tcon,
+                                       (void **) &pSMB, (void **) &pSMBr);
         if (rc)
                 return rc;
  
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 93f77d438d3c8f3d6702d24486c4e0f64055796b..53cce8cc2224f4abe4754d2cc320a05a6f36d215 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -801,6 +801,8 @@ retry_iget5_locked:
                         inode->i_flags |= S_NOATIME | S_NOCMTIME;
                 if (inode->i_state & I_NEW) {
                         inode->i_ino = hash;
+                       if (S_ISREG(inode->i_mode))
+                               inode->i_data.backing_dev_info = sb->s_bdi;
  #ifdef CONFIG_CIFS_FSCACHE
                         /* initialize per-inode cache cookie pointer */
                         CIFS_I(inode)->fscache = NULL;
diff --git a/fs/exec.c b/fs/exec.c
index 828dd2461d6beb7c37ed7df74ef11177c2bcba6d..6d2b6f93685813ba2b2119dc71c14a941061cf46 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -2014,3 +2014,43 @@ fail_creds:
  fail:
         return;
  }
+
+/*
+ * Core dumping helper functions.  These are the only things you should
+ * do on a core-file: use only these functions to write out all the
+ * necessary info.
+ */
+int dump_write(struct file *file, const void *addr, int nr)
+{
+       return access_ok(VERIFY_READ, addr, nr) && file->f_op->write(file, addr, nr, &file->f_pos) == nr;
+}
+EXPORT_SYMBOL(dump_write);
+
+int dump_seek(struct file *file, loff_t off)
+{
+       int ret = 1;
+
+       if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
+               if (file->f_op->llseek(file, off, SEEK_CUR) < 0)
+                       return 0;
+       } else {
+               char *buf = (char *)get_zeroed_page(GFP_KERNEL);
+
+               if (!buf)
+                       return 0;
+               while (off > 0) {
+                       unsigned long n = off;
+
+                       if (n > PAGE_SIZE)
+                               n = PAGE_SIZE;
+                       if (!dump_write(file, buf, n)) {
+                               ret = 0;
+                               break;
+                       }
+                       off -= n;
+               }
+               free_page((unsigned long)buf);
+       }
+       return ret;
+}
+EXPORT_SYMBOL(dump_seek);
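
When the target file has no usable `llseek`, `dump_seek()` above advances the position by writing zeros one page at a time. The sketch below shows the same chunked zero-fill loop in userspace, under stated assumptions: `pad_with_zeros()` and `CHUNK` are hypothetical names, `CHUNK` stands in for `PAGE_SIZE`, and a static zero buffer replaces `get_zeroed_page()`. It mirrors the kernel function's convention of returning 1 on success and 0 on failure.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK 4096	/* stand-in for PAGE_SIZE; an assumption of this sketch */

/*
 * Advance a stream that cannot seek by writing `off` zero bytes in chunks of
 * at most CHUNK bytes. Returns 1 on success and 0 on failure, following the
 * convention used by dump_seek().
 */
static int pad_with_zeros(FILE *out, long off)
{
	static const char zeros[CHUNK];	/* static storage is zero-initialized */

	assert(out != NULL);

	while (off > 0) {
		size_t n = off > CHUNK ? CHUNK : (size_t)off;

		if (fwrite(zeros, 1, n, out) != n)
			return 0;	/* short write: report failure */
		off -= (long)n;
	}
	return 1;
}
```

Calling `pad_with_zeros(f, 10000)` on a freshly opened stream issues two full 4096-byte writes and one 1808-byte write. Capping each write at one buffer keeps memory use constant regardless of how large the gap is.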
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index eb7368ebd8cdc294c7ec4b2d1ad0d4953f699981..3eadd97324b140e679f269480b3737823fd008cb 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -54,6 +54,9 @@ struct page_collect {
         unsigned nr_pages;
         unsigned long length;
         loff_t pg_first; /* keep 64bit also in 32-arches */
+       bool read_4_write; /* This means two things: that the read is sync
+                           * And the pages should not be unlocked.
+                           */
  };
  
  static void _pcol_init(struct page_collect *pcol, unsigned expected_pages,
@@ -71,6 +74,7 @@ static void _pcol_init(struct page_collect *pcol, unsigned expected_pages,
         pcol->nr_pages = 0;
         pcol->length = 0;
         pcol->pg_first = -1;
+       pcol->read_4_write = false;
  }
  
  static void _pcol_reset(struct page_collect *pcol)
@@ -347,7 +351,8 @@ static int readpage_strip(void *data, struct page *page)
                 if (PageError(page))
                         ClearPageError(page);
  
-               unlock_page(page);
+               if (!pcol->read_4_write)
+                       unlock_page(page);
                 EXOFS_DBGMSG("readpage_strip(0x%lx, 0x%lx) empty page,"
                              " splitting\n", inode->i_ino, page->index);
  
@@ -428,6 +433,7 @@ static int _readpage(struct page *page, bool is_sync)
         /* readpage_strip might call read_exec(,is_sync==false) at several
          * places but not if we have a single page.
          */
+       pcol.read_4_write = is_sync;
         ret = readpage_strip(&pcol, page);
         if (ret) {
                 EXOFS_ERR("_readpage => %d\n", ret);
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 5581122bd2c00cd1f8727436c531bb5245b03b0a..ab38fef1c9a1a52eab7128fa6a7dad217d4ad744 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -72,22 +72,11 @@ int writeback_in_progress(struct backing_dev_info *bdi)
  static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
  {
         struct super_block *sb = inode->i_sb;
-       struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
  
-       /*
-        * For inodes on standard filesystems, we use superblock's bdi. For
-        * inodes on virtual filesystems, we want to use inode mapping's bdi
-        * because they can possibly point to something useful (think about
-        * block_dev filesystem).
-        */
-       if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
-               /* Some device inodes could play dirty tricks. Catch them... */
-               WARN(bdi != sb->s_bdi && bdi_cap_writeback_dirty(bdi),
-                       "Dirtiable inode bdi %s != sb bdi %s\n",
-                       bdi->name, sb->s_bdi->name);
-               return sb->s_bdi;
-       }
-       return bdi;
+       if (strcmp(sb->s_type->name, "bdev") == 0)
+               return inode->i_mapping->backing_dev_info;
+
+       return sb->s_bdi;
  }
  
  static void bdi_queue_work(struct backing_dev_info *bdi,
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index d367af1514efe696b50a73374fffe77784c65b6a..cde755cca5642d41fb9cbbe05ac2d01f70f53c69 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1354,7 +1354,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
         loff_t file_size;
         unsigned int num;
         unsigned int offset;
-       size_t total_len;
+       size_t total_len = 0;
  
         req = fuse_get_req(fc);
         if (IS_ERR(req))
diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index cc9665522148a730b010953596cc24edefc00ed6..c465ae066c62c6392ee22047d2be0ec650b89c98 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -1,6 +1,6 @@
  config GFS2_FS
         tristate "GFS2 file system support"
-       depends on EXPERIMENTAL && (64BIT || LBDAF)
+       depends on (64BIT || LBDAF)
         select DLM if GFS2_FS_LOCKING_DLM
         select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
         select SYSFS if GFS2_FS_LOCKING_DLM
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 194fe16d8418a332a274a74769b15277ff2d6858..6b24afb96aaedade304b48bb427e664eae8e6e53 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -36,8 +36,8 @@
  #include "glops.h"
  
  
-static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
-                                  unsigned int from, unsigned int to)
+void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+                           unsigned int from, unsigned int to)
  {
         struct buffer_head *head = page_buffers(page);
         unsigned int bsize = head->b_size;
@@ -615,7 +615,7 @@ static int gfs2_write_begin(struct file *file, struct address_space *mapping,
         unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
         int alloc_required;
         int error = 0;
-       struct gfs2_alloc *al;
+       struct gfs2_alloc *al = NULL;
         pgoff_t index = pos >> PAGE_CACHE_SHIFT;
         unsigned from = pos & (PAGE_CACHE_SIZE - 1);
         unsigned to = from + len;
@@ -663,6 +663,8 @@ static int gfs2_write_begin(struct file *file, struct address_space *mapping,
                 rblocks += RES_STATFS + RES_QUOTA;
         if (&ip->i_inode == sdp->sd_rindex)
                 rblocks += 2 * RES_STATFS;
+       if (alloc_required)
+               rblocks += gfs2_rg_blocks(al);
  
         error = gfs2_trans_begin(sdp, rblocks,
                                  PAGE_CACHE_SIZE/sdp->sd_sb.sb_bsize);
@@ -696,13 +698,11 @@ out:
  
         page_cache_release(page);
  
-       /*
-        * XXX(truncate): the call below should probably be replaced with
-        * a call to the gfs2-specific truncate blocks helper to actually
-        * release disk blocks..
-        */
+       gfs2_trans_end(sdp);
         if (pos + len > ip->i_inode.i_size)
-               truncate_setsize(&ip->i_inode, ip->i_inode.i_size);
+               gfs2_trim_blocks(&ip->i_inode);
+       goto out_trans_fail;
+
  out_endtrans:
         gfs2_trans_end(sdp);
  out_trans_fail:
@@ -802,10 +802,8 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
         page_cache_release(page);
  
         if (copied) {
-               if (inode->i_size < to) {
+               if (inode->i_size < to)
                         i_size_write(inode, to);
-                       ip->i_disksize = inode->i_size;
-               }
                 gfs2_dinode_out(ip, di);
                 mark_inode_dirty(inode);
         }
@@ -876,8 +874,6 @@ static int gfs2_write_end(struct file *file, struct address_space *mapping,
  
         ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
         if (ret > 0) {
-               if (inode->i_size > ip->i_disksize)
-                       ip->i_disksize = inode->i_size;
                 gfs2_dinode_out(ip, dibh->b_data);
                 mark_inode_dirty(inode);
         }
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 6f482809d1a35b4787e9cb62357958d532aeaa30..5476c066d4ee336733445eda2f804561179ecb41 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -50,7 +50,7 @@ struct strip_mine {
   * @ip: the inode
   * @dibh: the dinode buffer
   * @block: the block number that was allocated
- * @private: any locked page held by the caller process
+ * @page: The (optional) page. This is looked up if @page is NULL
   *
   * Returns: errno
   */
@@ -109,8 +109,7 @@ static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct buffer_head *dibh,
  /**
   * gfs2_unstuff_dinode - Unstuff a dinode when the data has grown too big
   * @ip: The GFS2 inode to unstuff
- * @unstuffer: the routine that handles unstuffing a non-zero length file
- * @private: private data for the unstuffer
+ * @page: The (optional) page. This is looked up if the @page is NULL
   *
   * This routine unstuffs a dinode and returns it to a "normal" state such
   * that the height can be grown in the traditional way.
@@ -132,7 +131,7 @@ int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page)
         if (error)
                 goto out;
  
-       if (ip->i_disksize) {
+       if (i_size_read(&ip->i_inode)) {
                 /* Get a free block, fill it with the stuffed data,
                    and write it out to disk */
  
@@ -161,7 +160,7 @@ int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page)
         di = (struct gfs2_dinode *)dibh->b_data;
         gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode));
  
-       if (ip->i_disksize) {
+       if (i_size_read(&ip->i_inode)) {
                 *(__be64 *)(di + 1) = cpu_to_be64(block);
                 gfs2_add_inode_blocks(&ip->i_inode, 1);
                 di->di_blocks = cpu_to_be64(gfs2_get_inode_blocks(&ip->i_inode));
@@ -884,84 +883,15 @@ out:
         return error;
  }
  
-/**
- * do_grow - Make a file look bigger than it is
- * @ip: the inode
- * @size: the size to set the file to
- *
- * Called with an exclusive lock on @ip.
- *
- * Returns: errno
- */
-
-static int do_grow(struct gfs2_inode *ip, u64 size)
-{
-       struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
-       struct gfs2_alloc *al;
-       struct buffer_head *dibh;
-       int error;
-
-       al = gfs2_alloc_get(ip);
-       if (!al)
-               return -ENOMEM;
-
-       error = gfs2_quota_lock_check(ip);
-       if (error)
-               goto out;
-
-       al->al_requested = sdp->sd_max_height + RES_DATA;
-
-       error = gfs2_inplace_reserve(ip);
-       if (error)
-               goto out_gunlock_q;
-
-       error = gfs2_trans_begin(sdp,
-                       sdp->sd_max_height + al->al_rgd->rd_length +
-                       RES_JDATA + RES_DINODE + RES_STATFS + RES_QUOTA, 0);
-       if (error)
-               goto out_ipres;
-
-       error = gfs2_meta_inode_buffer(ip, &dibh);
-       if (error)
-               goto out_end_trans;
-
-       if (size > sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)) {
-               if (gfs2_is_stuffed(ip)) {
-                       error = gfs2_unstuff_dinode(ip, NULL);
-                       if (error)
-                               goto out_brelse;
-               }
-       }
-
-       ip->i_disksize = size;
-       ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
-       gfs2_trans_add_bh(ip->i_gl, dibh, 1);
-       gfs2_dinode_out(ip, dibh->b_data);
-
-out_brelse:
-       brelse(dibh);
-out_end_trans:
-       gfs2_trans_end(sdp);
-out_ipres:
-       gfs2_inplace_release(ip);
-out_gunlock_q:
-       gfs2_quota_unlock(ip);
-out:
-       gfs2_alloc_put(ip);
-       return error;
-}
-
-
  /**
   * gfs2_block_truncate_page - Deal with zeroing out data for truncate
   *
   * This is partly borrowed from ext3.
   */
-static int gfs2_block_truncate_page(struct address_space *mapping)
+static int gfs2_block_truncate_page(struct address_space *mapping, loff_t from)
  {
         struct inode *inode = mapping->host;
         struct gfs2_inode *ip = GFS2_I(inode);
-       loff_t from = inode->i_size;
         unsigned long index = from >> PAGE_CACHE_SHIFT;
         unsigned offset = from & (PAGE_CACHE_SIZE-1);
         unsigned blocksize, iblock, length, pos;
@@ -1023,9 +953,11 @@ unlock:
         return err;
  }
  
-static int trunc_start(struct gfs2_inode *ip, u64 size)
+static int trunc_start(struct inode *inode, u64 oldsize, u64 newsize)
  {
-       struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+       struct gfs2_inode *ip = GFS2_I(inode);
+       struct gfs2_sbd *sdp = GFS2_SB(inode);
+       struct address_space *mapping = inode->i_mapping;
         struct buffer_head *dibh;
         int journaled = gfs2_is_jdata(ip);
         int error;
@@ -1039,31 +971,26 @@ static int trunc_start(struct gfs2_inode *ip, u64 size)
         if (error)
                 goto out;
  
+       gfs2_trans_add_bh(ip->i_gl, dibh, 1);
+
         if (gfs2_is_stuffed(ip)) {
-               u64 dsize = size + sizeof(struct gfs2_dinode);
-               ip->i_disksize = size;
-               ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
-               gfs2_trans_add_bh(ip->i_gl, dibh, 1);
-               gfs2_dinode_out(ip, dibh->b_data);
-               if (dsize > dibh->b_size)
-                       dsize = dibh->b_size;
-               gfs2_buffer_clear_tail(dibh, dsize);
-               error = 1;
+               gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode) + newsize);
         } else {
-               if (size & (u64)(sdp->sd_sb.sb_bsize - 1))
-                       error = gfs2_block_truncate_page(ip->i_inode.i_mapping);
-
-               if (!error) {
-                       ip->i_disksize = size;
-                       ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
-                       ip->i_diskflags |= GFS2_DIF_TRUNC_IN_PROG;
-                       gfs2_trans_add_bh(ip->i_gl, dibh, 1);
-                       gfs2_dinode_out(ip, dibh->b_data);
+               if (newsize & (u64)(sdp->sd_sb.sb_bsize - 1)) {
+                       error = gfs2_block_truncate_page(mapping, newsize);
+                       if (error)
+                               goto out_brelse;
                 }
+               ip->i_diskflags |= GFS2_DIF_TRUNC_IN_PROG;
         }
  
-       brelse(dibh);
+       i_size_write(inode, newsize);
+       ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
+       gfs2_dinode_out(ip, dibh->b_data);
  
+       truncate_pagecache(inode, oldsize, newsize);
+out_brelse:
+       brelse(dibh);
  out:
         gfs2_trans_end(sdp);
         return error;
@@ -1123,7 +1050,7 @@ static int trunc_end(struct gfs2_inode *ip)
         if (error)
                 goto out;
  
-       if (!ip->i_disksize) {
+       if (!i_size_read(&ip->i_inode)) {
                 ip->i_height = 0;
                 ip->i_goal = ip->i_no_addr;
                 gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode));
@@ -1143,92 +1070,154 @@ out:
  
  /**
   * do_shrink - make a file smaller
- * @ip: the inode
- * @size: the size to make the file
- * @truncator: function to truncate the last partial block
+ * @inode: the inode
+ * @oldsize: the current inode size
+ * @newsize: the size to make the file
   *
- * Called with an exclusive lock on @ip.
+ * Called with an exclusive lock on @inode. The @newsize must
+ * be equal to or smaller than the current inode size.
   *
   * Returns: errno
   */
  
-static int do_shrink(struct gfs2_inode *ip, u64 size)
+static int do_shrink(struct inode *inode, u64 oldsize, u64 newsize)
  {
+       struct gfs2_inode *ip = GFS2_I(inode);
         int error;
  
-       error = trunc_start(ip, size);
+       error = trunc_start(inode, oldsize, newsize);
         if (error < 0)
                 return error;
-       if (error > 0)
+       if (gfs2_is_stuffed(ip))
                 return 0;
  
-       error = trunc_dealloc(ip, size);
-       if (!error)
+       error = trunc_dealloc(ip, newsize);
+       if (error == 0)
                 error = trunc_end(ip);
  
         return error;
  }
  
-static int do_touch(struct gfs2_inode *ip, u64 size)
+void gfs2_trim_blocks(struct inode *inode)
  {
-       struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+       u64 size = inode->i_size;
+       int ret;
+
+       ret = do_shrink(inode, size, size);
+       WARN_ON(ret != 0);
+}
+
+/**
+ * do_grow - Touch and update inode size
+ * @inode: The inode
+ * @size: The new size
+ *
+ * This function updates the timestamps on the inode and
+ * may also increase the size of the inode. This function
+ * must not be called with @size any smaller than the current
+ * inode size.
+ *
+ * Although it is not strictly required to unstuff files here,
+ * earlier versions of GFS2 have a bug in the stuffed file reading
+ * code which will result in a buffer overrun if the size is larger
+ * than the max stuffed file size. In order to prevent this from
+ * occurring, such files are unstuffed, but in other cases we can
+ * just update the inode size directly.
+ *
+ * Returns: 0 on success, or -ve on error
+ */
+
+static int do_grow(struct inode *inode, u64 size)
+{
+       struct gfs2_inode *ip = GFS2_I(inode);
+       struct gfs2_sbd *sdp = GFS2_SB(inode);
         struct buffer_head *dibh;
+       struct gfs2_alloc *al = NULL;
         int error;
  
-       error = gfs2_trans_begin(sdp, RES_DINODE, 0);
+       if (gfs2_is_stuffed(ip) &&
+           (size > (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)))) {
+               al = gfs2_alloc_get(ip);
+               if (al == NULL)
+                       return -ENOMEM;
+
+               error = gfs2_quota_lock_check(ip);
+               if (error)
+                       goto do_grow_alloc_put;
+
+               al->al_requested = 1;
+               error = gfs2_inplace_reserve(ip);
+               if (error)
+                       goto do_grow_qunlock;
+       }
+
+       error = gfs2_trans_begin(sdp, RES_DINODE + RES_STATFS + RES_RG_BIT, 0);
         if (error)
-               return error;
+               goto do_grow_release;
  
-       down_write(&ip->i_rw_mutex);
+       if (al) {
+               error = gfs2_unstuff_dinode(ip, NULL);
+               if (error)
+                       goto do_end_trans;
+       }
  
         error = gfs2_meta_inode_buffer(ip, &dibh);
         if (error)
-               goto do_touch_out;
+               goto do_end_trans;
  
+       i_size_write(inode, size);
         ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
         gfs2_trans_add_bh(ip->i_gl, dibh, 1);
         gfs2_dinode_out(ip, dibh->b_data);
         brelse(dibh);
  
-do_touch_out:
-       up_write(&ip->i_rw_mutex);
+do_end_trans:
         gfs2_trans_end(sdp);
+do_grow_release:
+       if (al) {
+               gfs2_inplace_release(ip);
+do_grow_qunlock:
+               gfs2_quota_unlock(ip);
+do_grow_alloc_put:
+               gfs2_alloc_put(ip);
+       }
         return error;
  }
  
  /**
- * gfs2_truncatei - make a file a given size
- * @ip: the inode
- * @size: the size to make the file
- * @truncator: function to truncate the last partial block
+ * gfs2_setattr_size - make a file a given size
+ * @inode: the inode
+ * @newsize: the size to make the file
   *
- * The file size can grow, shrink, or stay the same size.
+ * The file size can grow, shrink, or stay the same size. This
+ * is called holding i_mutex and an exclusive glock on the inode
+ * in question.
   *
   * Returns: errno
   */
  
-int gfs2_truncatei(struct gfs2_inode *ip, u64 size)
+int gfs2_setattr_size(struct inode *inode, u64 newsize)
  {
-       int error;
+       int ret;
+       u64 oldsize;
  
-       if (gfs2_assert_warn(GFS2_SB(&ip->i_inode), S_ISREG(ip->i_inode.i_mode)))
-               return -EINVAL;
+       BUG_ON(!S_ISREG(inode->i_mode));
  
-       if (size > ip->i_disksize)
-               error = do_grow(ip, size);
-       else if (size < ip->i_disksize)
-               error = do_shrink(ip, size);
-       else
-               /* update time stamps */
-               error = do_touch(ip, size);
+       ret = inode_newsize_ok(inode, newsize);
+       if (ret)
+               return ret;
  
-       return error;
+       oldsize = inode->i_size;
+       if (newsize >= oldsize)
+               return do_grow(inode, newsize);
+
+       return do_shrink(inode, oldsize, newsize);
  }
  
  int gfs2_truncatei_resume(struct gfs2_inode *ip)
  {
         int error;
-       error = trunc_dealloc(ip, ip->i_disksize);
+       error = trunc_dealloc(ip, i_size_read(&ip->i_inode));
         if (!error)
                 error = trunc_end(ip);
         return error;
@@ -1269,7 +1258,7 @@ int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
  
         shift = sdp->sd_sb.sb_bsize_shift;
         BUG_ON(gfs2_is_dir(ip));
-       end_of_file = (ip->i_disksize + sdp->sd_sb.sb_bsize - 1) >> shift;
+       end_of_file = (i_size_read(&ip->i_inode) + sdp->sd_sb.sb_bsize - 1) >> shift;
         lblock = offset >> shift;
         lblock_stop = (offset + len + sdp->sd_sb.sb_bsize - 1) >> shift;
         if (lblock_stop > end_of_file)
diff --git a/fs/gfs2/bmap.h b/fs/gfs2/bmap.h
index a20a5213135a50ae9293da676eb533e9c04063db..42fea03e2bd962b6674747967ac00499009d923f 100644
--- a/fs/gfs2/bmap.h
+++ b/fs/gfs2/bmap.h
@@ -44,14 +44,16 @@ static inline void gfs2_write_calc_reserv(const struct gfs2_inode *ip,
         }
  }
  
-int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page);
-int gfs2_block_map(struct inode *inode, sector_t lblock, struct buffer_head *bh, int create);
-int gfs2_extent_map(struct inode *inode, u64 lblock, int *new, u64 *dblock, unsigned *extlen);
-
-int gfs2_truncatei(struct gfs2_inode *ip, u64 size);
-int gfs2_truncatei_resume(struct gfs2_inode *ip);
-int gfs2_file_dealloc(struct gfs2_inode *ip);
-int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
-                             unsigned int len);
+extern int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page);
+extern int gfs2_block_map(struct inode *inode, sector_t lblock,
+                         struct buffer_head *bh, int create);
+extern int gfs2_extent_map(struct inode *inode, u64 lblock, int *new,
+                          u64 *dblock, unsigned *extlen);
+extern int gfs2_setattr_size(struct inode *inode, u64 size);
+extern void gfs2_trim_blocks(struct inode *inode);
+extern int gfs2_truncatei_resume(struct gfs2_inode *ip);
+extern int gfs2_file_dealloc(struct gfs2_inode *ip);
+extern int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
+                                    unsigned int len);
  
  #endif /* __BMAP_DOT_H__ */
diff --git a/fs/gfs2/dentry.c b/fs/gfs2/dentry.c
index bb7907bde3d81b63b6ae5b198283a52db36e97b0..6798755b3858685b611da5e439393bcffb332563 100644
--- a/fs/gfs2/dentry.c
+++ b/fs/gfs2/dentry.c
@@ -49,7 +49,7 @@ static int gfs2_drevalidate(struct dentry *dentry, struct nameidata *nd)
                 ip = GFS2_I(inode);
         }
  
-       if (sdp->sd_args.ar_localcaching)
+       if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
                 goto valid;
  
         had_lock = (gfs2_glock_is_locked_by_me(dip->i_gl) != NULL);
diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c
index b9dd88a78dd47073e3af1645a1e0fa3bcb94d1bb..5c356d09c321c10133afc7cf93aba2eddd1cb3c1 100644
--- a/fs/gfs2/dir.c
+++ b/fs/gfs2/dir.c
@@ -79,6 +79,9 @@
  #define gfs2_disk_hash2offset(h) (((u64)(h)) >> 1)
  #define gfs2_dir_offset2hash(p) ((u32)(((u64)(p)) << 1))
  
+struct qstr gfs2_qdot __read_mostly;
+struct qstr gfs2_qdotdot __read_mostly;
+
  typedef int (*leaf_call_t) (struct gfs2_inode *dip, u32 index, u32 len,
                             u64 leaf_no, void *data);
  typedef int (*gfs2_dscan_t)(const struct gfs2_dirent *dent,
@@ -127,8 +130,8 @@ static int gfs2_dir_write_stuffed(struct gfs2_inode *ip, const char *buf,
  
         gfs2_trans_add_bh(ip->i_gl, dibh, 1);
         memcpy(dibh->b_data + offset + sizeof(struct gfs2_dinode), buf, size);
-       if (ip->i_disksize < offset + size)
-               ip->i_disksize = offset + size;
+       if (ip->i_inode.i_size < offset + size)
+               i_size_write(&ip->i_inode, offset + size);
         ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
         gfs2_dinode_out(ip, dibh->b_data);
  
@@ -225,8 +228,8 @@ out:
         if (error)
                 return error;
  
-       if (ip->i_disksize < offset + copied)
-               ip->i_disksize = offset + copied;
+       if (ip->i_inode.i_size < offset + copied)
+               i_size_write(&ip->i_inode, offset + copied);
         ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
  
         gfs2_trans_add_bh(ip->i_gl, dibh, 1);
@@ -275,12 +278,13 @@ static int gfs2_dir_read_data(struct gfs2_inode *ip, char *buf, u64 offset,
         unsigned int o;
         int copied = 0;
         int error = 0;
+       u64 disksize = i_size_read(&ip->i_inode);
  
-       if (offset >= ip->i_disksize)
+       if (offset >= disksize)
                 return 0;
  
-       if (offset + size > ip->i_disksize)
-               size = ip->i_disksize - offset;
+       if (offset + size > disksize)
+               size = disksize - offset;
  
         if (!size)
                 return 0;
@@ -727,7 +731,7 @@ static struct gfs2_dirent *gfs2_dirent_search(struct inode *inode,
                 unsigned hsize = 1 << ip->i_depth;
                 unsigned index;
                 u64 ln;
-               if (hsize * sizeof(u64) != ip->i_disksize) {
+               if (hsize * sizeof(u64) != i_size_read(inode)) {
                         gfs2_consist_inode(ip);
                         return ERR_PTR(-EIO);
                 }
@@ -879,7 +883,7 @@ static int dir_make_exhash(struct inode *inode)
         for (x = sdp->sd_hash_ptrs; x--; lp++)
                 *lp = cpu_to_be64(bn);
  
-       dip->i_disksize = sdp->sd_sb.sb_bsize / 2;
+       i_size_write(inode, sdp->sd_sb.sb_bsize / 2);
         gfs2_add_inode_blocks(&dip->i_inode, 1);
         dip->i_diskflags |= GFS2_DIF_EXHASH;
  
@@ -1057,11 +1061,12 @@ static int dir_double_exhash(struct gfs2_inode *dip)
         u64 *buf;
         u64 *from, *to;
         u64 block;
+       u64 disksize = i_size_read(&dip->i_inode);
         int x;
         int error = 0;
  
         hsize = 1 << dip->i_depth;
-       if (hsize * sizeof(u64) != dip->i_disksize) {
+       if (hsize * sizeof(u64) != disksize) {
                 gfs2_consist_inode(dip);
                 return -EIO;
         }
@@ -1072,7 +1077,7 @@ static int dir_double_exhash(struct gfs2_inode *dip)
         if (!buf)
                 return -ENOMEM;
  
-       for (block = dip->i_disksize >> sdp->sd_hash_bsize_shift; block--;) {
+       for (block = disksize >> sdp->sd_hash_bsize_shift; block--;) {
                 error = gfs2_dir_read_data(dip, (char *)buf,
                                             block * sdp->sd_hash_bsize,
                                             sdp->sd_hash_bsize, 1);
@@ -1370,7 +1375,7 @@ static int dir_e_read(struct inode *inode, u64 *offset, void *opaque,
         unsigned depth = 0;
  
         hsize = 1 << dip->i_depth;
-       if (hsize * sizeof(u64) != dip->i_disksize) {
+       if (hsize * sizeof(u64) != i_size_read(inode)) {
                 gfs2_consist_inode(dip);
                 return -EIO;
         }
@@ -1784,7 +1789,7 @@ static int foreach_leaf(struct gfs2_inode *dip, leaf_call_t lc, void *data)
         int error = 0;
  
         hsize = 1 << dip->i_depth;
-       if (hsize * sizeof(u64) != dip->i_disksize) {
+       if (hsize * sizeof(u64) != i_size_read(&dip->i_inode)) {
                 gfs2_consist_inode(dip);
                 return -EIO;
         }
diff --git a/fs/gfs2/dir.h b/fs/gfs2/dir.h
index 4f919440c3be3e20ed49c2acb742db6758096bea..a98f644bd3df33596cf2382767b89ca0cdd08161 100644
--- a/fs/gfs2/dir.h
+++ b/fs/gfs2/dir.h
@@ -17,23 +17,24 @@ struct inode;
  struct gfs2_inode;
  struct gfs2_inum;
  
-struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *filename);
-int gfs2_dir_check(struct inode *dir, const struct qstr *filename,
-                  const struct gfs2_inode *ip);
-int gfs2_dir_add(struct inode *inode, const struct qstr *filename,
-                const struct gfs2_inode *ip, unsigned int type);
-int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *filename);
-int gfs2_dir_read(struct inode *inode, u64 *offset, void *opaque,
-                 filldir_t filldir);
-int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename,
-                  const struct gfs2_inode *nip, unsigned int new_type);
+extern struct inode *gfs2_dir_search(struct inode *dir,
+                                    const struct qstr *filename);
+extern int gfs2_dir_check(struct inode *dir, const struct qstr *filename,
+                         const struct gfs2_inode *ip);
+extern int gfs2_dir_add(struct inode *inode, const struct qstr *filename,
+                       const struct gfs2_inode *ip, unsigned int type);
+extern int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *filename);
+extern int gfs2_dir_read(struct inode *inode, u64 *offset, void *opaque,
+                        filldir_t filldir);
+extern int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename,
+                         const struct gfs2_inode *nip, unsigned int new_type);
  
-int gfs2_dir_exhash_dealloc(struct gfs2_inode *dip);
+extern int gfs2_dir_exhash_dealloc(struct gfs2_inode *dip);
  
-int gfs2_diradd_alloc_required(struct inode *dir,
-                              const struct qstr *filename);
-int gfs2_dir_get_new_buffer(struct gfs2_inode *ip, u64 block,
-                           struct buffer_head **bhp);
+extern int gfs2_diradd_alloc_required(struct inode *dir,
+                                     const struct qstr *filename);
+extern int gfs2_dir_get_new_buffer(struct gfs2_inode *ip, u64 block,
+                                  struct buffer_head **bhp);
  
  static inline u32 gfs2_disk_hash(const char *data, int len)
  {
@@ -61,4 +62,7 @@ static inline void gfs2_qstr2dirent(const struct qstr *name, u16 reclen, struct
         memcpy(dent + 1, name->name, name->len);
  }
  
+extern struct qstr gfs2_qdot;
+extern struct qstr gfs2_qdotdot;
+
  #endif /* __DIR_DOT_H__ */
diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index dfe237a3f8ad9e2a0f11bae403ff1f1d8687cbd0..06d582732d3427a058864d03ca6cb667c5063481 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -126,16 +126,9 @@ static int gfs2_get_name(struct dentry *parent, char *name,
  
  static struct dentry *gfs2_get_parent(struct dentry *child)
  {
-       struct qstr dotdot;
         struct dentry *dentry;
  
-       /*
-        * XXX(hch): it would be a good idea to keep this around as a
-        *           static variable.
-        */
-       gfs2_str2qstr(&dotdot, "..");
-
-       dentry = d_obtain_alias(gfs2_lookupi(child->d_inode, &dotdot, 1));
+       dentry = d_obtain_alias(gfs2_lookupi(child->d_inode, &gfs2_qdotdot, 1));
         if (!IS_ERR(dentry))
                 dentry->d_op = &gfs2_dops;
         return dentry;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4edd662c8232b2c24d1f1767b937f258071f8211..237ee6a940df23ab0720ed99252657611d0be7ec 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -382,8 +382,10 @@ static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
         rblocks = RES_DINODE + ind_blocks;
         if (gfs2_is_jdata(ip))
                 rblocks += data_blocks ? data_blocks : 1;
-       if (ind_blocks || data_blocks)
+       if (ind_blocks || data_blocks) {
                 rblocks += RES_STATFS + RES_QUOTA;
+               rblocks += gfs2_rg_blocks(al);
+       }
         ret = gfs2_trans_begin(sdp, rblocks, 0);
         if (ret)
                 goto out_trans_fail;
@@ -491,7 +493,7 @@ static int gfs2_open(struct inode *inode, struct file *file)
                         goto fail;
  
                 if (!(file->f_flags & O_LARGEFILE) &&
-                   ip->i_disksize > MAX_NON_LFS) {
+                   i_size_read(inode) > MAX_NON_LFS) {
                         error = -EOVERFLOW;
                         goto fail_gunlock;
                 }
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 9adf8f924e08991c32d12938702e570f163c3215..87778857f0994fa504224d93c7c5611d10bc1e8c 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -441,6 +441,8 @@ static void state_change(struct gfs2_glock *gl, unsigned int new_state)
                 else
                         gfs2_glock_put_nolock(gl);
         }
+       if (held1 && held2 && list_empty(&gl->gl_holders))
+               clear_bit(GLF_QUEUED, &gl->gl_flags);
  
         gl->gl_state = new_state;
         gl->gl_tchange = jiffies;
@@ -1012,6 +1014,7 @@ fail:
                 if (unlikely((gh->gh_flags & LM_FLAG_PRIORITY) && !insert_pt))
                         insert_pt = &gh2->gh_list;
         }
+       set_bit(GLF_QUEUED, &gl->gl_flags);
         if (likely(insert_pt == NULL)) {
                 list_add_tail(&gh->gh_list, &gl->gl_holders);
                 if (unlikely(gh->gh_flags & LM_FLAG_PRIORITY))
@@ -1310,10 +1313,12 @@ void gfs2_glock_cb(struct gfs2_glock *gl, unsigned int state)
  
         gfs2_glock_hold(gl);
         holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
-       if (time_before(now, holdtime))
-               delay = holdtime - now;
-       if (test_bit(GLF_REPLY_PENDING, &gl->gl_flags))
-               delay = gl->gl_ops->go_min_hold_time;
+       if (test_bit(GLF_QUEUED, &gl->gl_flags)) {
+               if (time_before(now, holdtime))
+                       delay = holdtime - now;
+               if (test_bit(GLF_REPLY_PENDING, &gl->gl_flags))
+                       delay = gl->gl_ops->go_min_hold_time;
+       }
  
         spin_lock(&gl->gl_spin);
         handle_callback(gl, state, delay);
@@ -1512,7 +1517,7 @@ static void clear_glock(struct gfs2_glock *gl)
         spin_unlock(&lru_lock);
  
         spin_lock(&gl->gl_spin);
-       if (find_first_holder(gl) == NULL && gl->gl_state != LM_ST_UNLOCKED)
+       if (gl->gl_state != LM_ST_UNLOCKED)
                 handle_callback(gl, LM_ST_UNLOCKED, 0);
         spin_unlock(&gl->gl_spin);
         gfs2_glock_hold(gl);
@@ -1660,6 +1665,8 @@ static const char *gflags2str(char *buf, const unsigned long *gflags)
                 *p++ = 'I';
         if (test_bit(GLF_FROZEN, gflags))
                 *p++ = 'F';
+       if (test_bit(GLF_QUEUED, gflags))
+               *p++ = 'q';
         *p = 0;
         return buf;
  }
@@ -1776,10 +1783,12 @@ int __init gfs2_glock_init(void)
         }
  #endif
  
-       glock_workqueue = create_workqueue("glock_workqueue");
+       glock_workqueue = alloc_workqueue("glock_workqueue", WQ_RESCUER |
+                                         WQ_HIGHPRI | WQ_FREEZEABLE, 0);
         if (IS_ERR(glock_workqueue))
                 return PTR_ERR(glock_workqueue);
-       gfs2_delete_workqueue = create_workqueue("delete_workqueue");
+       gfs2_delete_workqueue = alloc_workqueue("delete_workqueue", WQ_RESCUER |
+                                               WQ_FREEZEABLE, 0);
         if (IS_ERR(gfs2_delete_workqueue)) {
                 destroy_workqueue(glock_workqueue);
                 return PTR_ERR(gfs2_delete_workqueue);
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 2bda1911b1563347b52d4d2da7892768d93bd947..db1c26d6d2206c8f9e9b68396380ed8791f3c720 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -215,7 +215,7 @@ void gfs2_glock_dq_uninit_m(unsigned int num_gh, struct gfs2_holder *ghs);
  void gfs2_print_dbg(struct seq_file *seq, const char *fmt, ...);
  
  /**
- * gfs2_glock_nq_init - intialize a holder and enqueue it on a glock
+ * gfs2_glock_nq_init - initialize a holder and enqueue it on a glock
   * @gl: the glock
   * @state: the state we're requesting
   * @flags: the modifier flags
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 49f97d3bb690c512cb0f338d85938e622d501a49..0d149dcc04e515adfaaeb632a6677e5e3b555f45 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -262,13 +262,12 @@ static int inode_go_dump(struct seq_file *seq, const struct gfs2_glock *gl)
         const struct gfs2_inode *ip = gl->gl_object;
         if (ip == NULL)
                 return 0;
-       gfs2_print_dbg(seq, " I: n:%llu/%llu t:%u f:0x%02lx d:0x%08x s:%llu/%llu\n",
+       gfs2_print_dbg(seq, " I: n:%llu/%llu t:%u f:0x%02lx d:0x%08x s:%llu\n",
                   (unsigned long long)ip->i_no_formal_ino,
                   (unsigned long long)ip->i_no_addr,
                   IF2DT(ip->i_inode.i_mode), ip->i_flags,
                   (unsigned int)ip->i_diskflags,
-                 (unsigned long long)ip->i_inode.i_size,
-                 (unsigned long long)ip->i_disksize);
+                 (unsigned long long)i_size_read(&ip->i_inode));
         return 0;
  }
  
@@ -453,7 +452,6 @@ const struct gfs2_glock_operations *gfs2_glops_list[] = {
         [LM_TYPE_META] = &gfs2_meta_glops,
         [LM_TYPE_INODE] = &gfs2_inode_glops,
         [LM_TYPE_RGRP] = &gfs2_rgrp_glops,
-       [LM_TYPE_NONDISK] = &gfs2_trans_glops,
         [LM_TYPE_IOPEN] = &gfs2_iopen_glops,
         [LM_TYPE_FLOCK] = &gfs2_flock_glops,
         [LM_TYPE_NONDISK] = &gfs2_nondisk_glops,
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index fdbf4b366fa540d295dcbd73b298099b50418f8c..764fbb49efc8e3adbdeda7f83f178b0fd6ea70f8 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -196,6 +196,7 @@ enum {
         GLF_REPLY_PENDING               = 9,
         GLF_INITIAL                     = 10,
         GLF_FROZEN                      = 11,
+       GLF_QUEUED                      = 12,
  };
  
  struct gfs2_glock {
@@ -267,7 +268,6 @@ struct gfs2_inode {
         u64 i_no_formal_ino;
         u64 i_generation;
         u64 i_eattr;
-       loff_t i_disksize;
         unsigned long i_flags;          /* GIF_... */
         struct gfs2_glock *i_gl; /* Move into i_gh? */
         struct gfs2_holder i_iopen_gh;
@@ -416,11 +416,8 @@ struct gfs2_args {
         char ar_locktable[GFS2_LOCKNAME_LEN];   /* Name of the Lock Table */
         char ar_hostdata[GFS2_LOCKNAME_LEN];    /* Host specific data */
         unsigned int ar_spectator:1;            /* Don't get a journal */
-       unsigned int ar_ignore_local_fs:1;      /* Ignore optimisations */
         unsigned int ar_localflocks:1;          /* Let the VFS do flock|fcntl */
-       unsigned int ar_localcaching:1;         /* Local caching */
         unsigned int ar_debug:1;                /* Oops on errors */
-       unsigned int ar_upgrade:1;              /* Upgrade ondisk format */
         unsigned int ar_posix_acl:1;            /* Enable posix acls */
         unsigned int ar_quota:2;                /* off/account/on */
         unsigned int ar_suiddir:1;              /* suiddir support */
@@ -497,7 +494,7 @@ struct gfs2_sb_host {
   */
  
  struct lm_lockstruct {
-       unsigned int ls_jid;
+       int ls_jid;
         unsigned int ls_first;
         unsigned int ls_first_done;
         unsigned int ls_nodir;
@@ -572,6 +569,7 @@ struct gfs2_sbd {
         struct list_head sd_rindex_mru_list;
         struct gfs2_rgrpd *sd_rindex_forward;
         unsigned int sd_rgrps;
+       unsigned int sd_max_rg_data;
  
         /* Journal index stuff */
  
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 08140f185a3792153e23bab24f03ac3107d04757..06370f8bd8cf4aafa95fd4df64e93ec8d657328d 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -359,8 +359,7 @@ static int gfs2_dinode_in(struct gfs2_inode *ip, const void *buf)
          * to do that.
          */
         ip->i_inode.i_nlink = be32_to_cpu(str->di_nlink);
-       ip->i_disksize = be64_to_cpu(str->di_size);
-       i_size_write(&ip->i_inode, ip->i_disksize);
+       i_size_write(&ip->i_inode, be64_to_cpu(str->di_size));
         gfs2_set_inode_blocks(&ip->i_inode, be64_to_cpu(str->di_blocks));
         atime.tv_sec = be64_to_cpu(str->di_atime);
         atime.tv_nsec = be32_to_cpu(str->di_atime_nsec);
@@ -1055,7 +1054,7 @@ void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf)
         str->di_uid = cpu_to_be32(ip->i_inode.i_uid);
         str->di_gid = cpu_to_be32(ip->i_inode.i_gid);
         str->di_nlink = cpu_to_be32(ip->i_inode.i_nlink);
-       str->di_size = cpu_to_be64(ip->i_disksize);
+       str->di_size = cpu_to_be64(i_size_read(&ip->i_inode));
         str->di_blocks = cpu_to_be64(gfs2_get_inode_blocks(&ip->i_inode));
         str->di_atime = cpu_to_be64(ip->i_inode.i_atime.tv_sec);
         str->di_mtime = cpu_to_be64(ip->i_inode.i_mtime.tv_sec);
@@ -1085,8 +1084,8 @@ void gfs2_dinode_print(const struct gfs2_inode *ip)
                (unsigned long long)ip->i_no_formal_ino);
         printk(KERN_INFO "  no_addr = %llu\n",
                (unsigned long long)ip->i_no_addr);
-       printk(KERN_INFO "  i_disksize = %llu\n",
-              (unsigned long long)ip->i_disksize);
+       printk(KERN_INFO "  i_size = %llu\n",
+              (unsigned long long)i_size_read(&ip->i_inode));
         printk(KERN_INFO "  blocks = %llu\n",
                (unsigned long long)gfs2_get_inode_blocks(&ip->i_inode));
         printk(KERN_INFO "  i_goal = %llu\n",
diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index 300ada3f21de0cf5fc22677283343a76caf8200e..6720d7d5fbc6aac91083b399b95ba6c978922c67 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -19,6 +19,8 @@ extern int gfs2_releasepage(struct page *page, gfp_t gfp_mask);
  extern int gfs2_internal_read(struct gfs2_inode *ip,
                               struct file_ra_state *ra_state,
                               char *buf, loff_t *pos, unsigned size);
+extern void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+                                  unsigned int from, unsigned int to);
  extern void gfs2_set_aops(struct inode *inode);
  
  static inline int gfs2_is_stuffed(const struct gfs2_inode *ip)
@@ -80,6 +82,19 @@ static inline void gfs2_inum_out(const struct gfs2_inode *ip,
         dent->de_inum.no_addr = cpu_to_be64(ip->i_no_addr);
  }
  
+static inline int gfs2_check_internal_file_size(struct inode *inode,
+                                               u64 minsize, u64 maxsize)
+{
+       u64 size = i_size_read(inode);
+       if (size < minsize || size > maxsize)
+               goto err;
+       if (size & ((1 << inode->i_blkbits) - 1))
+               goto err;
+       return 0;
+err:
+       gfs2_consist_inode(GFS2_I(inode));
+       return -EIO;
+}
  
  extern void gfs2_set_iop(struct inode *inode);
  extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type, 
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 0e0470ed34c273a341aed5860191bfadf42b41e5..1c09425b45fd728ba52c1f5f49c3feac187640a2 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -42,9 +42,9 @@ static void gdlm_ast(void *arg)
                 ret |= LM_OUT_CANCELED;
                 goto out;
         case -EAGAIN: /* Try lock fails */
+       case -EDEADLK: /* Deadlock detected */
                 goto out;
-       case -EINVAL: /* Invalid */
-       case -ENOMEM: /* Out of memory */
+       case -ETIMEDOUT: /* Canceled due to timeout */
                 ret |= LM_OUT_ERROR;
                 goto out;
         case 0: /* Success */
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index b1e9630eb46a8d0338fef57ffa15caf23ab1cf0f..d7eb1e209aa899561f7168a6d2964460ae73a89f 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -24,6 +24,7 @@
  #include "glock.h"
  #include "quota.h"
  #include "recovery.h"
+#include "dir.h"
  
  static struct shrinker qd_shrinker = {
         .shrink = gfs2_shrink_qd_memory,
@@ -78,6 +79,9 @@ static int __init init_gfs2_fs(void)
  {
         int error;
  
+       gfs2_str2qstr(&gfs2_qdot, ".");
+       gfs2_str2qstr(&gfs2_qdotdot, "..");
+
         error = gfs2_sys_init();
         if (error)
                 return error;
@@ -140,7 +144,7 @@ static int __init init_gfs2_fs(void)
  
         error = -ENOMEM;
         gfs_recovery_wq = alloc_workqueue("gfs_recovery",
-                                         WQ_NON_REENTRANT | WQ_RESCUER, 0);
+                                         WQ_RESCUER | WQ_FREEZEABLE, 0);
         if (!gfs_recovery_wq)
                 goto fail_wq;
  
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 4d4b1e8ac64c02ef64ffd216a71ec801e2fdb625..aeafc233dc897fdb102df29bf052fc040b61faef 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -38,14 +38,6 @@
  #define DO 0
  #define UNDO 1
  
-static const u32 gfs2_old_fs_formats[] = {
-        0
-};
-
-static const u32 gfs2_old_multihost_formats[] = {
-        0
-};
-
  /**
   * gfs2_tune_init - Fill a gfs2_tune structure with default values
   * @gt: tune
@@ -135,8 +127,6 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
  
  static int gfs2_check_sb(struct gfs2_sbd *sdp, struct gfs2_sb_host *sb, int silent)
  {
-       unsigned int x;
-
         if (sb->sb_magic != GFS2_MAGIC ||
             sb->sb_type != GFS2_METATYPE_SB) {
                 if (!silent)
@@ -150,55 +140,9 @@ static int gfs2_check_sb(struct gfs2_sbd *sdp, struct gfs2_sb_host *sb, int sile
             sb->sb_multihost_format == GFS2_FORMAT_MULTI)
                 return 0;
  
-       if (sb->sb_fs_format != GFS2_FORMAT_FS) {
-               for (x = 0; gfs2_old_fs_formats[x]; x++)
-                       if (gfs2_old_fs_formats[x] == sb->sb_fs_format)
-                               break;
+       fs_warn(sdp, "Unknown on-disk format, unable to mount\n");
  
-               if (!gfs2_old_fs_formats[x]) {
-                       printk(KERN_WARNING
-                              "GFS2: code version (%u, %u) is incompatible "
-                              "with ondisk format (%u, %u)\n",
-                              GFS2_FORMAT_FS, GFS2_FORMAT_MULTI,
-                              sb->sb_fs_format, sb->sb_multihost_format);
-                       printk(KERN_WARNING
-                              "GFS2: I don't know how to upgrade this FS\n");
-                       return -EINVAL;
-               }
-       }
-
-       if (sb->sb_multihost_format != GFS2_FORMAT_MULTI) {
-               for (x = 0; gfs2_old_multihost_formats[x]; x++)
-                       if (gfs2_old_multihost_formats[x] ==
-                           sb->sb_multihost_format)
-                               break;
-
-               if (!gfs2_old_multihost_formats[x]) {
-                       printk(KERN_WARNING
-                              "GFS2: code version (%u, %u) is incompatible "
-                              "with ondisk format (%u, %u)\n",
-                              GFS2_FORMAT_FS, GFS2_FORMAT_MULTI,
-                              sb->sb_fs_format, sb->sb_multihost_format);
-                       printk(KERN_WARNING
-                              "GFS2: I don't know how to upgrade this FS\n");
-                       return -EINVAL;
-               }
-       }
-
-       if (!sdp->sd_args.ar_upgrade) {
-               printk(KERN_WARNING
-                      "GFS2: code version (%u, %u) is incompatible "
-                      "with ondisk format (%u, %u)\n",
-                      GFS2_FORMAT_FS, GFS2_FORMAT_MULTI,
-                      sb->sb_fs_format, sb->sb_multihost_format);
-               printk(KERN_INFO
-                      "GFS2: Use the \"upgrade\" mount option to upgrade "
-                      "the FS\n");
-               printk(KERN_INFO "GFS2: See the manual for more details\n");
-               return -EINVAL;
-       }
-
-       return 0;
+       return -EINVAL;
  }
  
  static void end_bio_io_page(struct bio *bio, int error)
@@ -586,7 +530,7 @@ static int map_journal_extents(struct gfs2_sbd *sdp)
  
         prev_db = 0;
  
-       for (lb = 0; lb < ip->i_disksize >> sdp->sd_sb.sb_bsize_shift; lb++) {
+       for (lb = 0; lb < i_size_read(jd->jd_inode) >> sdp->sd_sb.sb_bsize_shift; lb++) {
                 bh.b_state = 0;
                 bh.b_blocknr = 0;
                 bh.b_size = 1 << ip->i_inode.i_blkbits;
@@ -1022,7 +966,6 @@ static int gfs2_lm_mount(struct gfs2_sbd *sdp, int silent)
         if (!strcmp("lock_nolock", proto)) {
                 lm = &nolock_ops;
                 sdp->sd_args.ar_localflocks = 1;
-               sdp->sd_args.ar_localcaching = 1;
  #ifdef CONFIG_GFS2_FS_LOCKING_DLM
         } else if (!strcmp("lock_dlm", proto)) {
                 lm = &gfs2_dlm_ops;
@@ -1113,8 +1056,6 @@ static int gfs2_journalid_wait(void *word)
  
  static int wait_on_journal(struct gfs2_sbd *sdp)
  {
-       if (sdp->sd_args.ar_spectator)
-               return 0;
         if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
                 return 0;
  
@@ -1217,6 +1158,20 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
         if (error)
                 goto fail_sb;
  
+       /*
+        * If user space has failed to join the cluster or some similar
+        * failure has occurred, then the journal id will contain a
+        * negative (error) number. This will then be returned to the
+        * caller (of the mount syscall). We do this even for spectator
+        * mounts (which just write a jid of 0 to indicate "ok" even though
+        * the jid is unused in the spectator case)
+        */
+       if (sdp->sd_lockstruct.ls_jid < 0) {
+               error = sdp->sd_lockstruct.ls_jid;
+               sdp->sd_lockstruct.ls_jid = 0;
+               goto fail_sb;
+       }
+
         error = init_inodes(sdp, DO);
         if (error)
                 goto fail_sb;
diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 1009be2c9737687cdee8b48668a5f5d09752abeb..0534510200d5961887d3ee957d799e3b4669e76b 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -18,6 +18,8 @@
  #include <linux/gfs2_ondisk.h>
  #include <linux/crc32.h>
  #include <linux/fiemap.h>
+#include <linux/swap.h>
+#include <linux/falloc.h>
  #include <asm/uaccess.h>
  
  #include "gfs2.h"
@@ -217,7 +219,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir,
                         goto out_gunlock_q;
  
                 error = gfs2_trans_begin(sdp, sdp->sd_max_dirres +
-                                        al->al_rgd->rd_length +
+                                        gfs2_rg_blocks(al) +
                                          2 * RES_DINODE + RES_STATFS +
                                          RES_QUOTA, 0);
                 if (error)
@@ -406,7 +408,6 @@ static int gfs2_symlink(struct inode *dir, struct dentry *dentry,
  
         ip = ghs[1].gh_gl->gl_object;
  
-       ip->i_disksize = size;
         i_size_write(inode, size);
  
         error = gfs2_meta_inode_buffer(ip, &dibh);
@@ -461,7 +462,7 @@ static int gfs2_mkdir(struct inode *dir, struct dentry *dentry, int mode)
         ip = ghs[1].gh_gl->gl_object;
  
         ip->i_inode.i_nlink = 2;
-       ip->i_disksize = sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode);
+       i_size_write(inode, sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode));
         ip->i_diskflags |= GFS2_DIF_JDATA;
         ip->i_entries = 2;
  
@@ -470,18 +471,15 @@ static int gfs2_mkdir(struct inode *dir, struct dentry *dentry, int mode)
         if (!gfs2_assert_withdraw(sdp, !error)) {
                 struct gfs2_dinode *di = (struct gfs2_dinode *)dibh->b_data;
                 struct gfs2_dirent *dent = (struct gfs2_dirent *)(di+1);
-               struct qstr str;
  
-               gfs2_str2qstr(&str, ".");
                 gfs2_trans_add_bh(ip->i_gl, dibh, 1);
-               gfs2_qstr2dirent(&str, GFS2_DIRENT_SIZE(str.len), dent);
+               gfs2_qstr2dirent(&gfs2_qdot, GFS2_DIRENT_SIZE(gfs2_qdot.len), dent);
                 dent->de_inum = di->di_num; /* already GFS2 endian */
                 dent->de_type = cpu_to_be16(DT_DIR);
                 di->di_entries = cpu_to_be32(1);
  
-               gfs2_str2qstr(&str, "..");
                 dent = (struct gfs2_dirent *)((char*)dent + GFS2_DIRENT_SIZE(1));
-               gfs2_qstr2dirent(&str, dibh->b_size - GFS2_DIRENT_SIZE(1) - sizeof(struct gfs2_dinode), dent);
+               gfs2_qstr2dirent(&gfs2_qdotdot, dibh->b_size - GFS2_DIRENT_SIZE(1) - sizeof(struct gfs2_dinode), dent);
  
                 gfs2_inum_out(dip, dent);
                 dent->de_type = cpu_to_be16(DT_DIR);
@@ -522,7 +520,6 @@ static int gfs2_mkdir(struct inode *dir, struct dentry *dentry, int mode)
  static int gfs2_rmdiri(struct gfs2_inode *dip, const struct qstr *name,
                        struct gfs2_inode *ip)
  {
-       struct qstr dotname;
         int error;
  
         if (ip->i_entries != 2) {
@@ -539,13 +536,11 @@ static int gfs2_rmdiri(struct gfs2_inode *dip, const struct qstr *name,
         if (error)
                 return error;
  
-       gfs2_str2qstr(&dotname, ".");
-       error = gfs2_dir_del(ip, &dotname);
+       error = gfs2_dir_del(ip, &gfs2_qdot);
         if (error)
                 return error;
  
-       gfs2_str2qstr(&dotname, "..");
-       error = gfs2_dir_del(ip, &dotname);
+       error = gfs2_dir_del(ip, &gfs2_qdotdot);
         if (error)
                 return error;
  
@@ -694,11 +689,8 @@ static int gfs2_ok_to_move(struct gfs2_inode *this, struct gfs2_inode *to)
         struct inode *dir = &to->i_inode;
         struct super_block *sb = dir->i_sb;
         struct inode *tmp;
-       struct qstr dotdot;
         int error = 0;
  
-       gfs2_str2qstr(&dotdot, "..");
-
         igrab(dir);
  
         for (;;) {
@@ -711,7 +703,7 @@ static int gfs2_ok_to_move(struct gfs2_inode *this, struct gfs2_inode *to)
                         break;
                 }
  
-               tmp = gfs2_lookupi(dir, &dotdot, 1);
+               tmp = gfs2_lookupi(dir, &gfs2_qdotdot, 1);
                 if (IS_ERR(tmp)) {
                         error = PTR_ERR(tmp);
                         break;
@@ -744,7 +736,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
         struct gfs2_inode *ip = GFS2_I(odentry->d_inode);
         struct gfs2_inode *nip = NULL;
         struct gfs2_sbd *sdp = GFS2_SB(odir);
-       struct gfs2_holder ghs[5], r_gh = { .gh_gl = NULL, };
+       struct gfs2_holder ghs[5], r_gh = { .gh_gl = NULL, }, ri_gh;
         struct gfs2_rgrpd *nrgd;
         unsigned int num_gh;
         int dir_rename = 0;
@@ -758,6 +750,9 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
                         return 0;
         }
  
+       error = gfs2_rindex_hold(sdp, &ri_gh);
+       if (error)
+               return error;
  
         if (odip != ndip) {
                 error = gfs2_glock_nq_init(sdp->sd_rename_gl, LM_ST_EXCLUSIVE,
@@ -887,12 +882,12 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
  
                 al->al_requested = sdp->sd_max_dirres;
  
-               error = gfs2_inplace_reserve(ndip);
+               error = gfs2_inplace_reserve_ri(ndip);
                 if (error)
                         goto out_gunlock_q;
  
                 error = gfs2_trans_begin(sdp, sdp->sd_max_dirres +
-                                        al->al_rgd->rd_length +
+                                        gfs2_rg_blocks(al) +
                                          4 * RES_DINODE + 4 * RES_LEAF +
                                          RES_STATFS + RES_QUOTA + 4, 0);
                 if (error)
@@ -920,9 +915,6 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
         }
  
         if (dir_rename) {
-               struct qstr name;
-               gfs2_str2qstr(&name, "..");
-
                 error = gfs2_change_nlink(ndip, +1);
                 if (error)
                         goto out_end_trans;
@@ -930,7 +922,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
                 if (error)
                         goto out_end_trans;
  
-               error = gfs2_dir_mvino(ip, &name, ndip, DT_DIR);
+               error = gfs2_dir_mvino(ip, &gfs2_qdotdot, ndip, DT_DIR);
                 if (error)
                         goto out_end_trans;
         } else {
@@ -972,6 +964,7 @@ out_gunlock_r:
         if (r_gh.gh_gl)
                 gfs2_glock_dq_uninit(&r_gh);
  out:
+       gfs2_glock_dq_uninit(&ri_gh);
         return error;
  }
  
@@ -990,7 +983,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
         struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
         struct gfs2_holder i_gh;
         struct buffer_head *dibh;
-       unsigned int x;
+       unsigned int x, size;
         char *buf;
         int error;
  
@@ -1002,7 +995,8 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
                 return NULL;
         }
  
-       if (!ip->i_disksize) {
+       size = (unsigned int)i_size_read(&ip->i_inode);
+       if (size == 0) {
                 gfs2_consist_inode(ip);
                 buf = ERR_PTR(-EIO);
                 goto out;
@@ -1014,7 +1008,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
                 goto out;
         }
  
-       x = ip->i_disksize + 1;
+       x = size + 1;
         buf = kmalloc(x, GFP_NOFS);
         if (!buf)
                 buf = ERR_PTR(-ENOMEM);
@@ -1071,30 +1065,6 @@ int gfs2_permission(struct inode *inode, int mask)
         return error;
  }
  
-/*
- * XXX(truncate): the truncate_setsize calls should be moved to the end.
- */
-static int setattr_size(struct inode *inode, struct iattr *attr)
-{
-       struct gfs2_inode *ip = GFS2_I(inode);
-       struct gfs2_sbd *sdp = GFS2_SB(inode);
-       int error;
-
-       if (attr->ia_size != ip->i_disksize) {
-               error = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
-               if (error)
-                       return error;
-               truncate_setsize(inode, attr->ia_size);
-               gfs2_trans_end(sdp);
-       }
-
-       error = gfs2_truncatei(ip, attr->ia_size);
-       if (error && (inode->i_size != ip->i_disksize))
-               i_size_write(inode, ip->i_disksize);
-
-       return error;
-}
-
  static int setattr_chown(struct inode *inode, struct iattr *attr)
  {
         struct gfs2_inode *ip = GFS2_I(inode);
@@ -1195,7 +1165,7 @@ static int gfs2_setattr(struct dentry *dentry, struct iattr *attr)
                 goto out;
  
         if (attr->ia_valid & ATTR_SIZE)
-               error = setattr_size(inode, attr);
+               error = gfs2_setattr_size(inode, attr->ia_size);
         else if (attr->ia_valid & (ATTR_UID | ATTR_GID))
                 error = setattr_chown(inode, attr);
         else if ((attr->ia_valid & ATTR_MODE) && IS_POSIXACL(inode))
@@ -1301,6 +1271,257 @@ static int gfs2_removexattr(struct dentry *dentry, const char *name)
         return ret;
  }
  
+static void empty_write_end(struct page *page, unsigned from,
+                          unsigned to)
+{
+       struct gfs2_inode *ip = GFS2_I(page->mapping->host);
+
+       page_zero_new_buffers(page, from, to);
+       flush_dcache_page(page);
+       mark_page_accessed(page);
+
+       if (!gfs2_is_writeback(ip))
+               gfs2_page_add_databufs(ip, page, from, to);
+
+       block_commit_write(page, from, to);
+}
+
+
+static int write_empty_blocks(struct page *page, unsigned from, unsigned to)
+{
+       unsigned start, end, next;
+       struct buffer_head *bh, *head;
+       int error;
+
+       if (!page_has_buffers(page)) {
+               error = block_prepare_write(page, from, to, gfs2_block_map);
+               if (unlikely(error))
+                       return error;
+
+               empty_write_end(page, from, to);
+               return 0;
+       }
+
+       bh = head = page_buffers(page);
+       next = end = 0;
+       while (next < from) {
+               next += bh->b_size;
+               bh = bh->b_this_page;
+       }
+       start = next;
+       do {
+               next += bh->b_size;
+               if (buffer_mapped(bh)) {
+                       if (end) {
+                               error = block_prepare_write(page, start, end,
+                                                           gfs2_block_map);
+                               if (unlikely(error))
+                                       return error;
+                               empty_write_end(page, start, end);
+                               end = 0;
+                       }
+                       start = next;
+               }
+               else
+                       end = next;
+               bh = bh->b_this_page;
+       } while (next < to);
+
+       if (end) {
+               error = block_prepare_write(page, start, end, gfs2_block_map);
+               if (unlikely(error))
+                       return error;
+               empty_write_end(page, start, end);
+       }
+
+       return 0;
+}
+
+static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
+                          int mode)
+{
+       struct gfs2_inode *ip = GFS2_I(inode);
+       struct buffer_head *dibh;
+       int error;
+       u64 start = offset >> PAGE_CACHE_SHIFT;
+       unsigned int start_offset = offset & ~PAGE_CACHE_MASK;
+       u64 end = (offset + len - 1) >> PAGE_CACHE_SHIFT;
+       pgoff_t curr;
+       struct page *page;
+       unsigned int end_offset = (offset + len) & ~PAGE_CACHE_MASK;
+       unsigned int from, to;
+
+       if (!end_offset)
+               end_offset = PAGE_CACHE_SIZE;
+
+       error = gfs2_meta_inode_buffer(ip, &dibh);
+       if (unlikely(error))
+               goto out;
+
+       gfs2_trans_add_bh(ip->i_gl, dibh, 1);
+
+       if (gfs2_is_stuffed(ip)) {
+               error = gfs2_unstuff_dinode(ip, NULL);
+               if (unlikely(error))
+                       goto out;
+       }
+
+       curr = start;
+       offset = start << PAGE_CACHE_SHIFT;
+       from = start_offset;
+       to = PAGE_CACHE_SIZE;
+       while (curr <= end) {
+               page = grab_cache_page_write_begin(inode->i_mapping, curr,
+                                                  AOP_FLAG_NOFS);
+               if (unlikely(!page)) {
+                       error = -ENOMEM;
+                       goto out;
+               }
+
+               if (curr == end)
+                       to = end_offset;
+               error = write_empty_blocks(page, from, to);
+               if (!error && offset + to > inode->i_size &&
+                   !(mode & FALLOC_FL_KEEP_SIZE)) {
+                       i_size_write(inode, offset + to);
+               }
+               unlock_page(page);
+               page_cache_release(page);
+               if (error)
+                       goto out;
+               curr++;
+               offset += PAGE_CACHE_SIZE;
+               from = 0;
+       }
+
+       gfs2_dinode_out(ip, dibh->b_data);
+       mark_inode_dirty(inode);
+
+       brelse(dibh);
+
+out:
+       return error;
+}
+
+static void calc_max_reserv(struct gfs2_inode *ip, loff_t max, loff_t *len,
+                           unsigned int *data_blocks, unsigned int *ind_blocks)
+{
+       const struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+       unsigned int max_blocks = ip->i_alloc->al_rgd->rd_free_clone;
+       unsigned int tmp, max_data = max_blocks - 3 * (sdp->sd_max_height - 1);
+
+       for (tmp = max_data; tmp > sdp->sd_diptrs;) {
+               tmp = DIV_ROUND_UP(tmp, sdp->sd_inptrs);
+               max_data -= tmp;
+       }
+       /* This calculation isn't the exact reverse of gfs2_write_calc_reserve,
+          so it might end up with fewer data blocks */
+       if (max_data <= *data_blocks)
+               return;
+       *data_blocks = max_data;
+       *ind_blocks = max_blocks - max_data;
+       *len = ((loff_t)max_data - 3) << sdp->sd_sb.sb_bsize_shift;
+       if (*len > max) {
+               *len = max;
+               gfs2_write_calc_reserv(ip, max, data_blocks, ind_blocks);
+       }
+}
+
+static long gfs2_fallocate(struct inode *inode, int mode, loff_t offset,
+                          loff_t len)
+{
+       struct gfs2_sbd *sdp = GFS2_SB(inode);
+       struct gfs2_inode *ip = GFS2_I(inode);
+       unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
+       loff_t bytes, max_bytes;
+       struct gfs2_alloc *al;
+       int error;
+       loff_t next = (offset + len - 1) >> sdp->sd_sb.sb_bsize_shift;
+       next = (next + 1) << sdp->sd_sb.sb_bsize_shift;
+
+       offset = (offset >> sdp->sd_sb.sb_bsize_shift) <<
+                sdp->sd_sb.sb_bsize_shift;
+
+       len = next - offset;
+       bytes = sdp->sd_max_rg_data * sdp->sd_sb.sb_bsize / 2;
+       if (!bytes)
+               bytes = UINT_MAX;
+
+       gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
+       error = gfs2_glock_nq(&ip->i_gh);
+       if (unlikely(error))
+               goto out_uninit;
+
+       if (!gfs2_write_alloc_required(ip, offset, len))
+               goto out_unlock;
+
+       while (len > 0) {
+               if (len < bytes)
+                       bytes = len;
+               al = gfs2_alloc_get(ip);
+               if (!al) {
+                       error = -ENOMEM;
+                       goto out_unlock;
+               }
+
+               error = gfs2_quota_lock_check(ip);
+               if (error)
+                       goto out_alloc_put;
+
+retry:
+               gfs2_write_calc_reserv(ip, bytes, &data_blocks, &ind_blocks);
+
+               al->al_requested = data_blocks + ind_blocks;
+               error = gfs2_inplace_reserve(ip);
+               if (error) {
+                       if (error == -ENOSPC && bytes > sdp->sd_sb.sb_bsize) {
+                               bytes >>= 1;
+                               goto retry;
+                       }
+                       goto out_qunlock;
+               }
+               max_bytes = bytes;
+               calc_max_reserv(ip, len, &max_bytes, &data_blocks, &ind_blocks);
+               al->al_requested = data_blocks + ind_blocks;
+
+               rblocks = RES_DINODE + ind_blocks + RES_STATFS + RES_QUOTA +
+                         RES_RG_HDR + gfs2_rg_blocks(al);
+               if (gfs2_is_jdata(ip))
+                       rblocks += data_blocks ? data_blocks : 1;
+
+               error = gfs2_trans_begin(sdp, rblocks,
+                                        PAGE_CACHE_SIZE/sdp->sd_sb.sb_bsize);
+               if (error)
+                       goto out_trans_fail;
+
+               error = fallocate_chunk(inode, offset, max_bytes, mode);
+               gfs2_trans_end(sdp);
+
+               if (error)
+                       goto out_trans_fail;
+
+               len -= max_bytes;
+               offset += max_bytes;
+               gfs2_inplace_release(ip);
+               gfs2_quota_unlock(ip);
+               gfs2_alloc_put(ip);
+       }
+       goto out_unlock;
+
+out_trans_fail:
+       gfs2_inplace_release(ip);
+out_qunlock:
+       gfs2_quota_unlock(ip);
+out_alloc_put:
+       gfs2_alloc_put(ip);
+out_unlock:
+       gfs2_glock_dq(&ip->i_gh);
+out_uninit:
+       gfs2_holder_uninit(&ip->i_gh);
+       return error;
+}
+
+
  static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
                        u64 start, u64 len)
  {
@@ -1351,6 +1572,7 @@ const struct inode_operations gfs2_file_iops = {
         .getxattr = gfs2_getxattr,
         .listxattr = gfs2_listxattr,
         .removexattr = gfs2_removexattr,
+       .fallocate = gfs2_fallocate,
         .fiemap = gfs2_fiemap,
  };
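Note on the new gfs2_fallocate() above: before reserving space it widens the requested byte range to whole filesystem blocks, rounding the start down and the end up. The arithmetic can be sketched in standalone userspace C as follows. This is an illustration only: `struct block_range` and `align_to_blocks` are hypothetical names, and `bsize_shift` stands in for `sdp->sd_sb.sb_bsize_shift`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the widened range; not a GFS2 type. */
struct block_range {
	int64_t offset;	/* start, rounded down to a block boundary */
	int64_t len;	/* length, a whole number of blocks */
};

/*
 * Same rounding as the top of gfs2_fallocate(): round the start down and
 * the end up to a multiple of (1 << bsize_shift).
 * Assumption: the caller has already rejected offset < 0 and len <= 0.
 */
static struct block_range align_to_blocks(int64_t offset, int64_t len,
					  unsigned int bsize_shift)
{
	struct block_range r;
	int64_t next;

	assert(offset >= 0 && len > 0);

	/* index of the last block touched, then the byte just past it */
	next = (offset + len - 1) >> bsize_shift;
	next = (next + 1) << bsize_shift;

	r.offset = (offset >> bsize_shift) << bsize_shift;
	r.len = next - r.offset;
	return r;
}
```

With 4096-byte blocks (shift 12), a 100-byte request at offset 5000 becomes one full block starting at 4096, and a 2-byte request straddling offset 4096 becomes two blocks starting at 0.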
  
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 1bc6b5695e6dfb34870810b87bd09c819f25c93f..58a9b9998b42d0d9603c7a49ffc746ab4a22ef87 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -735,10 +735,8 @@ get_a_page:
                 goto out;
  
         size = loc + sizeof(struct gfs2_quota);
-       if (size > inode->i_size) {
-               ip->i_disksize = size;
+       if (size > inode->i_size)
                 i_size_write(inode, size);
-       }
         inode->i_mtime = inode->i_atime = CURRENT_TIME;
         gfs2_trans_add_bh(ip->i_gl, dibh, 1);
         gfs2_dinode_out(ip, dibh->b_data);
@@ -817,7 +815,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda)
                 goto out_alloc;
  
         if (nalloc)
-               blocks += al->al_rgd->rd_length + nalloc * ind_blocks + RES_STATFS;
+               blocks += gfs2_rg_blocks(al) + nalloc * ind_blocks + RES_STATFS;
  
         error = gfs2_trans_begin(sdp, blocks, 0);
         if (error)
@@ -1190,18 +1188,17 @@ static void gfs2_quota_change_in(struct gfs2_quota_change_host *qc, const void *
  int gfs2_quota_init(struct gfs2_sbd *sdp)
  {
         struct gfs2_inode *ip = GFS2_I(sdp->sd_qc_inode);
-       unsigned int blocks = ip->i_disksize >> sdp->sd_sb.sb_bsize_shift;
+       u64 size = i_size_read(sdp->sd_qc_inode);
+       unsigned int blocks = size >> sdp->sd_sb.sb_bsize_shift;
         unsigned int x, slot = 0;
         unsigned int found = 0;
         u64 dblock;
         u32 extlen = 0;
         int error;
  
-       if (!ip->i_disksize || ip->i_disksize > (64 << 20) ||
-           ip->i_disksize & (sdp->sd_sb.sb_bsize - 1)) {
-               gfs2_consist_inode(ip);
+       if (gfs2_check_internal_file_size(sdp->sd_qc_inode, 1, 64 << 20))
                 return -EIO;
-       }
+
         sdp->sd_quota_slots = blocks * sdp->sd_qc_per_block;
         sdp->sd_quota_chunks = DIV_ROUND_UP(sdp->sd_quota_slots, 8 * PAGE_SIZE);
  
@@ -1589,6 +1586,7 @@ static int gfs2_set_dqblk(struct super_block *sb, int type, qid_t id,
                 error = gfs2_inplace_reserve(ip);
                 if (error)
                         goto out_alloc;
+               blocks += gfs2_rg_blocks(al);
         }
  
         error = gfs2_trans_begin(sdp, blocks + RES_DINODE + 1, 0);
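Note on gfs2_rg_blocks(): several hunks in this commit replace `al->al_rgd->rd_length` with `gfs2_rg_blocks(al)`, a helper the commit adds to fs/gfs2/trans.h. Per its comment, it reserves either the requested block count plus one header block or the whole `rd_length`, whichever is smaller. A plain-integer restatement, under the assumption that the two fields can be modelled as unsigned ints (the function and parameter names here are illustrative, not kernel API):

```c
#include <assert.h>

/*
 * "requested" stands in for al->al_requested and "rd_length" for
 * al->al_rgd->rd_length. Assumption: rd_length is never zero.
 */
static unsigned int rg_blocks_sketch(unsigned int requested,
				     unsigned int rd_length)
{
	assert(rd_length > 0);

	/* requested blocks plus one header block, capped at rd_length */
	return (requested < rd_length) ? requested + 1 : rd_length;
}
```

A small request therefore reserves far less journal space than the old unconditional `rd_length`, while a request of `rd_length - 1` or more falls back to the old value.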
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index f7f89a94a5a4598a4b532016a0c16846a62cef2b..f2a02edcac8f43e9de1dd22fd4c72ac7347f39a5 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -455,11 +455,13 @@ void gfs2_recover_func(struct work_struct *work)
         int ro = 0;
         unsigned int pass;
         int error;
+       int jlocked = 0;
  
-       if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) {
+       if (sdp->sd_args.ar_spectator ||
+           (jd->jd_jid != sdp->sd_lockstruct.ls_jid)) {
                 fs_info(sdp, "jid=%u: Trying to acquire journal lock...\n",
                         jd->jd_jid);
-
+               jlocked = 1;
                 /* Acquire the journal lock so we can do recovery */
  
                 error = gfs2_glock_nq_num(sdp, jd->jd_jid, &gfs2_journal_glops,
@@ -554,13 +556,12 @@ void gfs2_recover_func(struct work_struct *work)
                         jd->jd_jid, t);
         }
  
-       if (jd->jd_jid != sdp->sd_lockstruct.ls_jid)
-               gfs2_glock_dq_uninit(&ji_gh);
-
         gfs2_recovery_done(sdp, jd->jd_jid, LM_RD_SUCCESS);
  
-       if (jd->jd_jid != sdp->sd_lockstruct.ls_jid)
+       if (jlocked) {
+               gfs2_glock_dq_uninit(&ji_gh);
                 gfs2_glock_dq_uninit(&j_gh);
+       }
  
         fs_info(sdp, "jid=%u: Done\n", jd->jd_jid);
         goto done;
@@ -568,7 +569,7 @@ void gfs2_recover_func(struct work_struct *work)
  fail_gunlock_tr:
         gfs2_glock_dq_uninit(&t_gh);
  fail_gunlock_ji:
-       if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) {
+       if (jlocked) {
                 gfs2_glock_dq_uninit(&ji_gh);
  fail_gunlock_j:
                 gfs2_glock_dq_uninit(&j_gh);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 171a744f8e45d172f4e43eb793ae4e08dba8ba23..fb67f593f40856b213f03887642c44a3e4c7ccc5 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -500,7 +500,7 @@ u64 gfs2_ri_total(struct gfs2_sbd *sdp)
         for (rgrps = 0;; rgrps++) {
                 loff_t pos = rgrps * sizeof(struct gfs2_rindex);
  
-               if (pos + sizeof(struct gfs2_rindex) >= ip->i_disksize)
+               if (pos + sizeof(struct gfs2_rindex) >= i_size_read(inode))
                         break;
                 error = gfs2_internal_read(ip, &ra_state, buf, &pos,
                                            sizeof(struct gfs2_rindex));
@@ -588,7 +588,9 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
         struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
         struct inode *inode = &ip->i_inode;
         struct file_ra_state ra_state;
-       u64 rgrp_count = ip->i_disksize;
+       u64 rgrp_count = i_size_read(inode);
+       struct gfs2_rgrpd *rgd;
+       unsigned int max_data = 0;
         int error;
  
         do_div(rgrp_count, sizeof(struct gfs2_rindex));
@@ -603,6 +605,10 @@ static int gfs2_ri_update(struct gfs2_inode *ip)
                 }
         }
  
+       list_for_each_entry(rgd, &sdp->sd_rindex_list, rd_list)
+               if (rgd->rd_data > max_data)
+                       max_data = rgd->rd_data;
+       sdp->sd_max_rg_data = max_data;
         sdp->sd_rindex_uptodate = 1;
         return 0;
  }
@@ -622,13 +628,15 @@ static int gfs2_ri_update_special(struct gfs2_inode *ip)
         struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
         struct inode *inode = &ip->i_inode;
         struct file_ra_state ra_state;
+       struct gfs2_rgrpd *rgd;
+       unsigned int max_data = 0;
         int error;
  
         file_ra_state_init(&ra_state, inode->i_mapping);
         for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) {
                 /* Ignore partials */
                 if ((sdp->sd_rgrps + 1) * sizeof(struct gfs2_rindex) >
-                   ip->i_disksize)
+                   i_size_read(inode))
                         break;
                 error = read_rindex_entry(ip, &ra_state);
                 if (error) {
@@ -636,6 +644,10 @@ static int gfs2_ri_update_special(struct gfs2_inode *ip)
                         return error;
                 }
         }
+       list_for_each_entry(rgd, &sdp->sd_rindex_list, rd_list)
+               if (rgd->rd_data > max_data)
+                       max_data = rgd->rd_data;
+       sdp->sd_max_rg_data = max_data;
  
         sdp->sd_rindex_uptodate = 1;
         return 0;
@@ -1188,7 +1200,8 @@ out:
   * Returns: errno
   */
  
-int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file, unsigned int line)
+int gfs2_inplace_reserve_i(struct gfs2_inode *ip, int hold_rindex,
+                          char *file, unsigned int line)
  {
         struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
         struct gfs2_alloc *al = ip->i_alloc;
@@ -1199,12 +1212,15 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file, unsigned int line)
                 return -EINVAL;
  
  try_again:
-       /* We need to hold the rindex unless the inode we're using is
-          the rindex itself, in which case it's already held. */
-       if (ip != GFS2_I(sdp->sd_rindex))
-               error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
-       else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: */
-               error = gfs2_ri_update_special(ip);
+       if (hold_rindex) {
+               /* We need to hold the rindex unless the inode we're using is
+                  the rindex itself, in which case it's already held. */
+               if (ip != GFS2_I(sdp->sd_rindex))
+                       error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
+               else if (!sdp->sd_rgrps) /* We may not have the rindex read
+                                           in, so: */
+                       error = gfs2_ri_update_special(ip);
+       }
  
         if (error)
                 return error;
@@ -1215,7 +1231,7 @@ try_again:
            try to free it, and try the allocation again. */
         error = get_local_rgrp(ip, &unlinked, &last_unlinked);
         if (error) {
-               if (ip != GFS2_I(sdp->sd_rindex))
+               if (hold_rindex && ip != GFS2_I(sdp->sd_rindex))
                         gfs2_glock_dq_uninit(&al->al_ri_gh);
                 if (error != -EAGAIN)
                         return error;
@@ -1257,7 +1273,7 @@ void gfs2_inplace_release(struct gfs2_inode *ip)
         al->al_rgd = NULL;
         if (al->al_rgd_gh.gh_gl)
                 gfs2_glock_dq_uninit(&al->al_rgd_gh);
-       if (ip != GFS2_I(sdp->sd_rindex))
+       if (ip != GFS2_I(sdp->sd_rindex) && al->al_ri_gh.gh_gl)
                 gfs2_glock_dq_uninit(&al->al_ri_gh);
  }
  
@@ -1496,11 +1512,19 @@ int gfs2_alloc_block(struct gfs2_inode *ip, u64 *bn, unsigned int *n)
         struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
         struct buffer_head *dibh;
         struct gfs2_alloc *al = ip->i_alloc;
-       struct gfs2_rgrpd *rgd = al->al_rgd;
+       struct gfs2_rgrpd *rgd;
         u32 goal, blk;
         u64 block;
         int error;
  
+       /* Only happens if there is a bug in gfs2, return something distinctive
+        * to ensure that it is noticed.
+        */
+       if (al == NULL)
+               return -ECANCELED;
+
+       rgd = al->al_rgd;
+
         if (rgrp_contains_block(rgd, ip->i_goal))
                 goal = ip->i_goal - rgd->rd_data0;
         else
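Note on sd_max_rg_data: both gfs2_ri_update() and gfs2_ri_update_special() above now walk `sd_rindex_list` and record the largest `rd_data` seen; gfs2_fallocate() in this same commit derives its chunk size from that value. The scan itself is an ordinary running maximum. A sketch over a plain array rather than a kernel list (the array form and the function name are assumptions made for illustration):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest element of rd_data[0..n-1], or 0 for an empty set,
 * matching the "max_data = 0" starting value used in rgrp.c.
 */
static unsigned int max_rg_data_sketch(const unsigned int *rd_data, size_t n)
{
	unsigned int max_data = 0;
	size_t i;

	assert(rd_data != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (rd_data[i] > max_data)
			max_data = rd_data[i];
	return max_data;
}
```

An empty list yields 0, which is why gfs2_fallocate() falls back to UINT_MAX when its computed `bytes` is zero.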
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index f07119d89557855fc9a8673b32e9119cad921570..0e35c0466f9a6c5979a3fe8c339def323bc37fad 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -39,10 +39,12 @@ static inline void gfs2_alloc_put(struct gfs2_inode *ip)
         ip->i_alloc = NULL;
  }
  
-extern int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file,
-                                 unsigned int line);
+extern int gfs2_inplace_reserve_i(struct gfs2_inode *ip, int hold_rindex,
+                                 char *file, unsigned int line);
  #define gfs2_inplace_reserve(ip) \
-gfs2_inplace_reserve_i((ip), __FILE__, __LINE__)
+       gfs2_inplace_reserve_i((ip), 1, __FILE__, __LINE__)
+#define gfs2_inplace_reserve_ri(ip) \
+       gfs2_inplace_reserve_i((ip), 0, __FILE__, __LINE__)
  
  extern void gfs2_inplace_release(struct gfs2_inode *ip);
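Note on the two reserve macros: gfs2_inplace_reserve() and gfs2_inplace_reserve_ri() differ only in the `hold_rindex` flag they pass, and both forward the call site through `__FILE__` and `__LINE__`. The pattern can be tried outside the kernel with a stand-in function that merely records its arguments. Everything named `*_sketch` below, and `struct reserve_call`, is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of what the wrapped function received. */
struct reserve_call {
	int hold_rindex;
	const char *file;
	unsigned int line;
};

static struct reserve_call last_call;

/* Stand-in for gfs2_inplace_reserve_i(); it only records its arguments. */
static int reserve_i_sketch(int hold_rindex, const char *file,
			    unsigned int line)
{
	assert(file != NULL);

	last_call.hold_rindex = hold_rindex;
	last_call.file = file;
	last_call.line = line;
	return 0;
}

/* Two front ends that differ only in the flag, as in rgrp.h above. */
#define reserve_sketch() \
	reserve_i_sketch(1, __FILE__, __LINE__)
#define reserve_ri_sketch() \
	reserve_i_sketch(0, __FILE__, __LINE__)
```

In the commit, gfs2_rename() takes the rindex lock itself via gfs2_rindex_hold() and then uses the `_ri` variant so that the reservation does not try to take it again.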
  
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 77cb9f830ee47eb51520bd8581ebc700b426455e..047d1176096c79d6f0ce50227b8c098755fb0150 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -85,6 +85,7 @@ static const match_table_t tokens = {
         {Opt_locktable, "locktable=%s"},
         {Opt_hostdata, "hostdata=%s"},
         {Opt_spectator, "spectator"},
+       {Opt_spectator, "norecovery"},
         {Opt_ignore_local_fs, "ignore_local_fs"},
         {Opt_localflocks, "localflocks"},
         {Opt_localcaching, "localcaching"},
@@ -159,13 +160,13 @@ int gfs2_mount_args(struct gfs2_args *args, char *options)
                         args->ar_spectator = 1;
                         break;
                 case Opt_ignore_local_fs:
-                       args->ar_ignore_local_fs = 1;
+                       /* Retained for backwards compat only */
                         break;
                 case Opt_localflocks:
                         args->ar_localflocks = 1;
                         break;
                 case Opt_localcaching:
-                       args->ar_localcaching = 1;
+                       /* Retained for backwards compat only */
                         break;
                 case Opt_debug:
                         if (args->ar_errors == GFS2_ERRORS_PANIC) {
@@ -179,7 +180,7 @@ int gfs2_mount_args(struct gfs2_args *args, char *options)
                         args->ar_debug = 0;
                         break;
                 case Opt_upgrade:
-                       args->ar_upgrade = 1;
+                       /* Retained for backwards compat only */
                         break;
                 case Opt_acl:
                         args->ar_posix_acl = 1;
@@ -342,15 +343,14 @@ int gfs2_jdesc_check(struct gfs2_jdesc *jd)
  {
         struct gfs2_inode *ip = GFS2_I(jd->jd_inode);
         struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
+       u64 size = i_size_read(jd->jd_inode);
  
-       if (ip->i_disksize < (8 << 20) || ip->i_disksize > (1 << 30) ||
-           (ip->i_disksize & (sdp->sd_sb.sb_bsize - 1))) {
-               gfs2_consist_inode(ip);
+       if (gfs2_check_internal_file_size(jd->jd_inode, 8 << 20, 1 << 30))
                 return -EIO;
-       }
-       jd->jd_blocks = ip->i_disksize >> sdp->sd_sb.sb_bsize_shift;
  
-       if (gfs2_write_alloc_required(ip, 0, ip->i_disksize)) {
+       jd->jd_blocks = size >> sdp->sd_sb.sb_bsize_shift;
+
+       if (gfs2_write_alloc_required(ip, 0, size)) {
                 gfs2_consist_inode(ip);
                 return -EIO;
         }
@@ -1129,9 +1129,7 @@ static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
  
         /* Some flags must not be changed */
         if (args_neq(&args, &sdp->sd_args, spectator) ||
-           args_neq(&args, &sdp->sd_args, ignore_local_fs) ||
             args_neq(&args, &sdp->sd_args, localflocks) ||
-           args_neq(&args, &sdp->sd_args, localcaching) ||
             args_neq(&args, &sdp->sd_args, meta))
                 return -EINVAL;
  
@@ -1234,16 +1232,10 @@ static int gfs2_show_options(struct seq_file *s, struct vfsmount *mnt)
                 seq_printf(s, ",hostdata=%s", args->ar_hostdata);
         if (args->ar_spectator)
                 seq_printf(s, ",spectator");
-       if (args->ar_ignore_local_fs)
-               seq_printf(s, ",ignore_local_fs");
         if (args->ar_localflocks)
                 seq_printf(s, ",localflocks");
-       if (args->ar_localcaching)
-               seq_printf(s, ",localcaching");
         if (args->ar_debug)
                 seq_printf(s, ",debug");
-       if (args->ar_upgrade)
-               seq_printf(s, ",upgrade");
         if (args->ar_posix_acl)
                 seq_printf(s, ",acl");
         if (args->ar_quota != GFS2_QUOTA_DEFAULT) {
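Note on the "norecovery" option: the super.c hunk adds it as a second pattern for the existing `Opt_spectator` token, so both spellings reach the same `case` in gfs2_mount_args(). A reduced sketch of that aliasing follows. It uses exact string comparison instead of the kernel's pattern matcher, and every identifier in it is made up for illustration.

```c
#include <assert.h>
#include <string.h>

enum opt_sketch { OPT_SPECTATOR, OPT_LOCALFLOCKS, OPT_UNKNOWN };

struct token_sketch {
	enum opt_sketch token;
	const char *pattern;
};

/* Two patterns may share one token; the second acts as an alias. */
static const struct token_sketch tokens_sketch[] = {
	{ OPT_SPECTATOR,   "spectator" },
	{ OPT_SPECTATOR,   "norecovery" },
	{ OPT_LOCALFLOCKS, "localflocks" },
};

/* Return the token for an exactly matching option, else OPT_UNKNOWN. */
static enum opt_sketch match_option_sketch(const char *opt)
{
	size_t i;

	assert(opt != NULL);

	for (i = 0; i < sizeof(tokens_sketch) / sizeof(tokens_sketch[0]); i++)
		if (strcmp(opt, tokens_sketch[i].pattern) == 0)
			return tokens_sketch[i].token;
	return OPT_UNKNOWN;
}
```

Because the alias maps to the same token, gfs2_show_options() still reports such a mount as "spectator".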
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index ccacffd2faaa6d65d1f116b9b2788ee732786f3f..748ccb557c18fc504c28951f13d634fe9f05fa0a 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -230,7 +230,10 @@ static ssize_t demote_rq_store(struct gfs2_sbd *sdp, const char *buf, size_t len
  
         if (gltype > LM_TYPE_JOURNAL)
                 return -EINVAL;
-       glops = gfs2_glops_list[gltype];
+       if (gltype == LM_TYPE_NONDISK && glnum == GFS2_TRANS_LOCK)
+               glops = &gfs2_trans_glops;
+       else
+               glops = gfs2_glops_list[gltype];
         if (glops == NULL)
                 return -EINVAL;
         if (!test_and_set_bit(SDF_DEMOTE, &sdp->sd_flags))
@@ -399,31 +402,32 @@ static ssize_t recover_status_show(struct gfs2_sbd *sdp, char *buf)
  
  static ssize_t jid_show(struct gfs2_sbd *sdp, char *buf)
  {
-       return sprintf(buf, "%u\n", sdp->sd_lockstruct.ls_jid);
+       return sprintf(buf, "%d\n", sdp->sd_lockstruct.ls_jid);
  }
  
  static ssize_t jid_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
  {
-        unsigned jid;
+        int jid;
         int rv;
  
-       rv = sscanf(buf, "%u", &jid);
+       rv = sscanf(buf, "%d", &jid);
         if (rv != 1)
                 return -EINVAL;
  
         spin_lock(&sdp->sd_jindex_spin);
         rv = -EINVAL;
-       if (sdp->sd_args.ar_spectator)
-               goto out;
         if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
                 goto out;
         rv = -EBUSY;
-       if (test_and_clear_bit(SDF_NOJOURNALID, &sdp->sd_flags) == 0)
+       if (test_bit(SDF_NOJOURNALID, &sdp->sd_flags) == 0)
                 goto out;
+       rv = 0;
+       if (sdp->sd_args.ar_spectator && jid > 0)
+               rv = jid = -EINVAL;
         sdp->sd_lockstruct.ls_jid = jid;
+       clear_bit(SDF_NOJOURNALID, &sdp->sd_flags);
         smp_mb__after_clear_bit();
         wake_up_bit(&sdp->sd_flags, SDF_NOJOURNALID);
-       rv = 0;
  out:
         spin_unlock(&sdp->sd_jindex_spin);
         return rv ? rv : len;
@@ -617,7 +621,7 @@ static int gfs2_uevent(struct kset *kset, struct kobject *kobj,
         add_uevent_var(env, "LOCKTABLE=%s", sdp->sd_table_name);
         add_uevent_var(env, "LOCKPROTO=%s", sdp->sd_proto_name);
         if (!test_bit(SDF_NOJOURNALID, &sdp->sd_flags))
-               add_uevent_var(env, "JOURNALID=%u", sdp->sd_lockstruct.ls_jid);
+               add_uevent_var(env, "JOURNALID=%d", sdp->sd_lockstruct.ls_jid);
         if (gfs2_uuid_valid(uuid))
                 add_uevent_var(env, "UUID=%pUB", uuid);
         return 0;
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index 148d55c14171dee39435c83d56c906c09dcd0cf8..cedb0bb96d968414d14e6eca3e5b934b68ee606e 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -39,7 +39,8 @@
         {(1UL << GLF_INVALIDATE_IN_PROGRESS),   "i" },          \
         {(1UL << GLF_REPLY_PENDING),            "r" },          \
         {(1UL << GLF_INITIAL),                  "I" },          \
-       {(1UL << GLF_FROZEN),                   "F" })
+       {(1UL << GLF_FROZEN),                   "F" },          \
+       {(1UL << GLF_QUEUED),                   "q" })
  
  #ifndef NUMPTY
  #define NUMPTY
diff --git a/fs/gfs2/trans.h b/fs/gfs2/trans.h
index edf9d4bd908ee2726991f12ac869bfeb60971381..fb56b783e028c8ce61b0b663e67e2df14e408b07 100644
--- a/fs/gfs2/trans.h
+++ b/fs/gfs2/trans.h
@@ -20,11 +20,20 @@ struct gfs2_glock;
  #define RES_JDATA      1
  #define RES_DATA       1
  #define RES_LEAF       1
+#define RES_RG_HDR     1
  #define RES_RG_BIT     2
  #define RES_EATTR      1
  #define RES_STATFS     1
  #define RES_QUOTA      2
  
+/* reserve either the number of blocks to be allocated plus the rg header
+ * block, or all of the blocks in the rg, whichever is smaller */
+static inline unsigned int gfs2_rg_blocks(const struct gfs2_alloc *al)
+{
+       return (al->al_requested < al->al_rgd->rd_length)?
+              al->al_requested + 1 : al->al_rgd->rd_length;
+}
+
  int gfs2_trans_begin(struct gfs2_sbd *sdp, unsigned int blocks,
                      unsigned int revokes);
  
diff --git a/fs/gfs2/xattr.c b/fs/gfs2/xattr.c
index 776af6eb4bcb1b193ecf5ef858ac09cb0535b95f..30b58f07c8a6b219fc964efe101ce5f861397885 100644
--- a/fs/gfs2/xattr.c
+++ b/fs/gfs2/xattr.c
@@ -734,7 +734,7 @@ static int ea_alloc_skeleton(struct gfs2_inode *ip, struct gfs2_ea_request *er,
                 goto out_gunlock_q;
  
         error = gfs2_trans_begin(GFS2_SB(&ip->i_inode),
-                                blks + al->al_rgd->rd_length +
+                                blks + gfs2_rg_blocks(al) +
                                  RES_DINODE + RES_STATFS + RES_QUOTA, 0);
         if (error)
                 goto out_ipres;
diff --git a/fs/hfsplus/bfind.c b/fs/hfsplus/bfind.c
index 5007a41f1be9d345ff11dd7420285ee6a79c08e2..d182438c7ae4bea8b4ae108ad910ee795417410a 100644
--- a/fs/hfsplus/bfind.c
+++ b/fs/hfsplus/bfind.c
@@ -23,7 +23,7 @@ int hfs_find_init(struct hfs_btree *tree, struct hfs_find_data *fd)
         fd->search_key = ptr;
         fd->key = ptr + tree->max_key_len + 2;
         dprint(DBG_BNODE_REFS, "find_init: %d (%p)\n", tree->cnid, __builtin_return_address(0));
-       down(&tree->tree_lock);
+       mutex_lock(&tree->tree_lock);
         return 0;
  }
  
@@ -32,7 +32,7 @@ void hfs_find_exit(struct hfs_find_data *fd)
         hfs_bnode_put(fd->bnode);
         kfree(fd->search_key);
         dprint(DBG_BNODE_REFS, "find_exit: %d (%p)\n", fd->tree->cnid, __builtin_return_address(0));
-       up(&fd->tree->tree_lock);
+       mutex_unlock(&fd->tree->tree_lock);
         fd->tree = NULL;
  }
  
@@ -52,6 +52,10 @@ int __hfs_brec_find(struct hfs_bnode *bnode, struct hfs_find_data *fd)
                 rec = (e + b) / 2;
                 len = hfs_brec_lenoff(bnode, rec, &off);
                 keylen = hfs_brec_keylen(bnode, rec);
+               if (keylen == 0) {
+                       res = -EINVAL;
+                       goto fail;
+               }
                 hfs_bnode_read(bnode, fd->key, off, keylen);
                 cmpval = bnode->tree->keycmp(fd->key, fd->search_key);
                 if (!cmpval) {
@@ -67,6 +71,10 @@ int __hfs_brec_find(struct hfs_bnode *bnode, struct hfs_find_data *fd)
         if (rec != e && e >= 0) {
                 len = hfs_brec_lenoff(bnode, e, &off);
                 keylen = hfs_brec_keylen(bnode, e);
+               if (keylen == 0) {
+                       res = -EINVAL;
+                       goto fail;
+               }
                 hfs_bnode_read(bnode, fd->key, off, keylen);
         }
  done:
@@ -75,6 +83,7 @@ done:
         fd->keylength = keylen;
         fd->entryoffset = off + keylen;
         fd->entrylength = len - keylen;
+fail:
         return res;
  }
  
@@ -198,6 +207,10 @@ int hfs_brec_goto(struct hfs_find_data *fd, int cnt)
  
         len = hfs_brec_lenoff(bnode, fd->record, &off);
         keylen = hfs_brec_keylen(bnode, fd->record);
+       if (keylen == 0) {
+               res = -EINVAL;
+               goto out;
+       }
         fd->keyoffset = off;
         fd->keylength = keylen;
         fd->entryoffset = off + keylen;
diff --git a/fs/hfsplus/bitmap.c b/fs/hfsplus/bitmap.c
index ea30afc2a03c774221cc17341c446d30a0cb341b..ad57f5991eb1f14e3ac24dafa207e440d6d74be3 100644
--- a/fs/hfsplus/bitmap.c
+++ b/fs/hfsplus/bitmap.c
@@ -17,6 +17,7 @@
  
  int hfsplus_block_allocate(struct super_block *sb, u32 size, u32 offset, u32 *max)
  {
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
         struct page *page;
         struct address_space *mapping;
         __be32 *pptr, *curr, *end;
@@ -29,8 +30,8 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size, u32 offset, u32 *ma
                 return size;
  
         dprint(DBG_BITMAP, "block_allocate: %u,%u,%u\n", size, offset, len);
-       mutex_lock(&HFSPLUS_SB(sb).alloc_file->i_mutex);
-       mapping = HFSPLUS_SB(sb).alloc_file->i_mapping;
+       mutex_lock(&sbi->alloc_mutex);
+       mapping = sbi->alloc_file->i_mapping;
         page = read_mapping_page(mapping, offset / PAGE_CACHE_BITS, NULL);
         if (IS_ERR(page)) {
                 start = size;
@@ -150,16 +151,17 @@ done:
         set_page_dirty(page);
         kunmap(page);
         *max = offset + (curr - pptr) * 32 + i - start;
-       HFSPLUS_SB(sb).free_blocks -= *max;
+       sbi->free_blocks -= *max;
         sb->s_dirt = 1;
         dprint(DBG_BITMAP, "-> %u,%u\n", start, *max);
  out:
-       mutex_unlock(&HFSPLUS_SB(sb).alloc_file->i_mutex);
+       mutex_unlock(&sbi->alloc_mutex);
         return start;
  }
  
  int hfsplus_block_free(struct super_block *sb, u32 offset, u32 count)
  {
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
         struct page *page;
         struct address_space *mapping;
         __be32 *pptr, *curr, *end;
@@ -172,11 +174,11 @@ int hfsplus_block_free(struct super_block *sb, u32 offset, u32 count)
  
         dprint(DBG_BITMAP, "block_free: %u,%u\n", offset, count);
         /* are all of the bits in range? */
-       if ((offset + count) > HFSPLUS_SB(sb).total_blocks)
+       if ((offset + count) > sbi->total_blocks)
                 return -2;
  
-       mutex_lock(&HFSPLUS_SB(sb).alloc_file->i_mutex);
-       mapping = HFSPLUS_SB(sb).alloc_file->i_mapping;
+       mutex_lock(&sbi->alloc_mutex);
+       mapping = sbi->alloc_file->i_mapping;
         pnr = offset / PAGE_CACHE_BITS;
         page = read_mapping_page(mapping, pnr, NULL);
         pptr = kmap(page);
@@ -224,9 +226,9 @@ done:
  out:
         set_page_dirty(page);
         kunmap(page);
-       HFSPLUS_SB(sb).free_blocks += len;
+       sbi->free_blocks += len;
         sb->s_dirt = 1;
-       mutex_unlock(&HFSPLUS_SB(sb).alloc_file->i_mutex);
+       mutex_unlock(&sbi->alloc_mutex);
  
         return 0;
  }
diff --git a/fs/hfsplus/brec.c b/fs/hfsplus/brec.c
index c88e5d72a402ae2d29a8905cdccf7b59ccd4337d..2f39d05443e1a374b197359f70d20337562aa9e8 100644
--- a/fs/hfsplus/brec.c
+++ b/fs/hfsplus/brec.c
@@ -42,10 +42,13 @@ u16 hfs_brec_keylen(struct hfs_bnode *node, u16 rec)
                 recoff = hfs_bnode_read_u16(node, node->tree->node_size - (rec + 1) * 2);
                 if (!recoff)
                         return 0;
-               if (node->tree->attributes & HFS_TREE_BIGKEYS)
-                       retval = hfs_bnode_read_u16(node, recoff) + 2;
-               else
-                       retval = (hfs_bnode_read_u8(node, recoff) | 1) + 1;
+
+               retval = hfs_bnode_read_u16(node, recoff) + 2;
+               if (retval > node->tree->max_key_len + 2) {
+                       printk(KERN_ERR "hfs: keylen %d too large\n",
+                               retval);
+                       retval = 0;
+               }
         }
         return retval;
  }
@@ -216,7 +219,7 @@ skip:
  static struct hfs_bnode *hfs_bnode_split(struct hfs_find_data *fd)
  {
         struct hfs_btree *tree;
-       struct hfs_bnode *node, *new_node;
+       struct hfs_bnode *node, *new_node, *next_node;
         struct hfs_bnode_desc node_desc;
         int num_recs, new_rec_off, new_off, old_rec_off;
         int data_start, data_end, size;
@@ -235,6 +238,17 @@ static struct hfs_bnode *hfs_bnode_split(struct hfs_find_data *fd)
         new_node->type = node->type;
         new_node->height = node->height;
  
+       if (node->next)
+               next_node = hfs_bnode_find(tree, node->next);
+       else
+               next_node = NULL;
+
+       if (IS_ERR(next_node)) {
+               hfs_bnode_put(node);
+               hfs_bnode_put(new_node);
+               return next_node;
+       }
+
         size = tree->node_size / 2 - node->num_recs * 2 - 14;
         old_rec_off = tree->node_size - 4;
         num_recs = 1;
@@ -248,6 +262,8 @@ static struct hfs_bnode *hfs_bnode_split(struct hfs_find_data *fd)
                 /* panic? */
                 hfs_bnode_put(node);
                 hfs_bnode_put(new_node);
+               if (next_node)
+                       hfs_bnode_put(next_node);
                 return ERR_PTR(-ENOSPC);
         }
  
@@ -302,8 +318,7 @@ static struct hfs_bnode *hfs_bnode_split(struct hfs_find_data *fd)
         hfs_bnode_write(node, &node_desc, 0, sizeof(node_desc));
  
         /* update next bnode header */
-       if (new_node->next) {
-               struct hfs_bnode *next_node = hfs_bnode_find(tree, new_node->next);
+       if (next_node) {
                 next_node->prev = new_node->this;
                 hfs_bnode_read(next_node, &node_desc, 0, sizeof(node_desc));
                 node_desc.prev = cpu_to_be32(next_node->prev);
diff --git a/fs/hfsplus/btree.c b/fs/hfsplus/btree.c
index e49fcee1e293f725786e84ea6126e408e5eda7c8..22e4d4e329999c3ba9848036a3639bc194598f74 100644
--- a/fs/hfsplus/btree.c
+++ b/fs/hfsplus/btree.c
@@ -30,7 +30,7 @@ struct hfs_btree *hfs_btree_open(struct super_block *sb, u32 id)
         if (!tree)
                 return NULL;
  
-       init_MUTEX(&tree->tree_lock);
+       mutex_init(&tree->tree_lock);
         spin_lock_init(&tree->hash_lock);
         tree->sb = sb;
         tree->cnid = id;
@@ -39,10 +39,16 @@ struct hfs_btree *hfs_btree_open(struct super_block *sb, u32 id)
                 goto free_tree;
         tree->inode = inode;
  
+       if (!HFSPLUS_I(tree->inode)->first_blocks) {
+               printk(KERN_ERR
+                      "hfs: invalid btree extent records (0 size).\n");
+               goto free_inode;
+       }
+
         mapping = tree->inode->i_mapping;
         page = read_mapping_page(mapping, 0, NULL);
         if (IS_ERR(page))
-               goto free_tree;
+               goto free_inode;
  
         /* Load the header */
         head = (struct hfs_btree_header_rec *)(kmap(page) + sizeof(struct hfs_bnode_desc));
@@ -57,27 +63,56 @@ struct hfs_btree *hfs_btree_open(struct super_block *sb, u32 id)
         tree->max_key_len = be16_to_cpu(head->max_key_len);
         tree->depth = be16_to_cpu(head->depth);
  
-       /* Set the correct compare function */
-       if (id == HFSPLUS_EXT_CNID) {
+       /* Verify the tree and set the correct compare function */
+       switch (id) {
+       case HFSPLUS_EXT_CNID:
+               if (tree->max_key_len != HFSPLUS_EXT_KEYLEN - sizeof(u16)) {
+                       printk(KERN_ERR "hfs: invalid extent max_key_len %d\n",
+                               tree->max_key_len);
+                       goto fail_page;
+               }
+               if (tree->attributes & HFS_TREE_VARIDXKEYS) {
+                       printk(KERN_ERR "hfs: invalid extent btree flag\n");
+                       goto fail_page;
+               }
+
                 tree->keycmp = hfsplus_ext_cmp_key;
-       } else if (id == HFSPLUS_CAT_CNID) {
-               if ((HFSPLUS_SB(sb).flags & HFSPLUS_SB_HFSX) &&
+               break;
+       case HFSPLUS_CAT_CNID:
+               if (tree->max_key_len != HFSPLUS_CAT_KEYLEN - sizeof(u16)) {
+                       printk(KERN_ERR "hfs: invalid catalog max_key_len %d\n",
+                               tree->max_key_len);
+                       goto fail_page;
+               }
+               if (!(tree->attributes & HFS_TREE_VARIDXKEYS)) {
+                       printk(KERN_ERR "hfs: invalid catalog btree flag\n");
+                       goto fail_page;
+               }
+
+               if (test_bit(HFSPLUS_SB_HFSX, &HFSPLUS_SB(sb)->flags) &&
                     (head->key_type == HFSPLUS_KEY_BINARY))
                         tree->keycmp = hfsplus_cat_bin_cmp_key;
                 else {
                         tree->keycmp = hfsplus_cat_case_cmp_key;
-                       HFSPLUS_SB(sb).flags |= HFSPLUS_SB_CASEFOLD;
+                       set_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
                 }
-       } else {
+               break;
+       default:
                 printk(KERN_ERR "hfs: unknown B*Tree requested\n");
                 goto fail_page;
         }
  
+       if (!(tree->attributes & HFS_TREE_BIGKEYS)) {
+               printk(KERN_ERR "hfs: invalid btree flag\n");
+               goto fail_page;
+       }
+
         size = tree->node_size;
         if (!is_power_of_2(size))
                 goto fail_page;
         if (!tree->node_count)
                 goto fail_page;
+
         tree->node_size_shift = ffs(size) - 1;
  
         tree->pages_per_bnode = (tree->node_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
@@ -87,10 +122,11 @@ struct hfs_btree *hfs_btree_open(struct super_block *sb, u32 id)
         return tree;
  
   fail_page:
-       tree->inode->i_mapping->a_ops = &hfsplus_aops;
         page_cache_release(page);
- free_tree:
+ free_inode:
+       tree->inode->i_mapping->a_ops = &hfsplus_aops;
         iput(tree->inode);
+ free_tree:
         kfree(tree);
         return NULL;
  }
@@ -192,17 +228,18 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
  
         while (!tree->free_nodes) {
                 struct inode *inode = tree->inode;
+               struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
                 u32 count;
                 int res;
  
                 res = hfsplus_file_extend(inode);
                 if (res)
                         return ERR_PTR(res);
-               HFSPLUS_I(inode).phys_size = inode->i_size =
-                               (loff_t)HFSPLUS_I(inode).alloc_blocks <<
-                               HFSPLUS_SB(tree->sb).alloc_blksz_shift;
-               HFSPLUS_I(inode).fs_blocks = HFSPLUS_I(inode).alloc_blocks <<
-                                            HFSPLUS_SB(tree->sb).fs_shift;
+               hip->phys_size = inode->i_size =
+                       (loff_t)hip->alloc_blocks <<
+                               HFSPLUS_SB(tree->sb)->alloc_blksz_shift;
+               hip->fs_blocks =
+                       hip->alloc_blocks << HFSPLUS_SB(tree->sb)->fs_shift;
                 inode_set_bytes(inode, inode->i_size);
                 count = inode->i_size >> tree->node_size_shift;
                 tree->free_nodes = count - tree->node_count;
diff --git a/fs/hfsplus/catalog.c b/fs/hfsplus/catalog.c
index f6874acb2cf2a3a81a242f1983600eb912b9f607..8af45fc5b051abb3353501441fa9ea9030d9395a 100644
--- a/fs/hfsplus/catalog.c
+++ b/fs/hfsplus/catalog.c
@@ -67,7 +67,7 @@ static void hfsplus_cat_build_key_uni(hfsplus_btree_key *key, u32 parent,
         key->key_len = cpu_to_be16(6 + ustrlen);
  }
  
-static void hfsplus_set_perms(struct inode *inode, struct hfsplus_perm *perms)
+void hfsplus_cat_set_perms(struct inode *inode, struct hfsplus_perm *perms)
  {
         if (inode->i_flags & S_IMMUTABLE)
                 perms->rootflags |= HFSPLUS_FLG_IMMUTABLE;
@@ -77,15 +77,24 @@ static void hfsplus_set_perms(struct inode *inode, struct hfsplus_perm *perms)
                 perms->rootflags |= HFSPLUS_FLG_APPEND;
         else
                 perms->rootflags &= ~HFSPLUS_FLG_APPEND;
-       HFSPLUS_I(inode).rootflags = perms->rootflags;
-       HFSPLUS_I(inode).userflags = perms->userflags;
+
+       perms->userflags = HFSPLUS_I(inode)->userflags;
         perms->mode = cpu_to_be16(inode->i_mode);
         perms->owner = cpu_to_be32(inode->i_uid);
         perms->group = cpu_to_be32(inode->i_gid);
+
+       if (S_ISREG(inode->i_mode))
+               perms->dev = cpu_to_be32(inode->i_nlink);
+       else if (S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode))
+               perms->dev = cpu_to_be32(inode->i_rdev);
+       else
+               perms->dev = 0;
  }
  
  static int hfsplus_cat_build_record(hfsplus_cat_entry *entry, u32 cnid, struct inode *inode)
  {
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
+
         if (S_ISDIR(inode->i_mode)) {
                 struct hfsplus_cat_folder *folder;
  
@@ -93,13 +102,13 @@ static int hfsplus_cat_build_record(hfsplus_cat_entry *entry, u32 cnid, struct i
                 memset(folder, 0, sizeof(*folder));
                 folder->type = cpu_to_be16(HFSPLUS_FOLDER);
                 folder->id = cpu_to_be32(inode->i_ino);
-               HFSPLUS_I(inode).create_date =
+               HFSPLUS_I(inode)->create_date =
                         folder->create_date =
                         folder->content_mod_date =
                         folder->attribute_mod_date =
                         folder->access_date = hfsp_now2mt();
-               hfsplus_set_perms(inode, &folder->permissions);
-               if (inode == HFSPLUS_SB(inode->i_sb).hidden_dir)
+               hfsplus_cat_set_perms(inode, &folder->permissions);
+               if (inode == sbi->hidden_dir)
                         /* invisible and namelocked */
                         folder->user_info.frFlags = cpu_to_be16(0x5000);
                 return sizeof(*folder);
@@ -111,19 +120,19 @@ static int hfsplus_cat_build_record(hfsplus_cat_entry *entry, u32 cnid, struct i
                 file->type = cpu_to_be16(HFSPLUS_FILE);
                 file->flags = cpu_to_be16(HFSPLUS_FILE_THREAD_EXISTS);
                 file->id = cpu_to_be32(cnid);
-               HFSPLUS_I(inode).create_date =
+               HFSPLUS_I(inode)->create_date =
                         file->create_date =
                         file->content_mod_date =
                         file->attribute_mod_date =
                         file->access_date = hfsp_now2mt();
                 if (cnid == inode->i_ino) {
-                       hfsplus_set_perms(inode, &file->permissions);
+                       hfsplus_cat_set_perms(inode, &file->permissions);
                         if (S_ISLNK(inode->i_mode)) {
                                 file->user_info.fdType = cpu_to_be32(HFSP_SYMLINK_TYPE);
                                 file->user_info.fdCreator = cpu_to_be32(HFSP_SYMLINK_CREATOR);
                         } else {
-                               file->user_info.fdType = cpu_to_be32(HFSPLUS_SB(inode->i_sb).type);
-                               file->user_info.fdCreator = cpu_to_be32(HFSPLUS_SB(inode->i_sb).creator);
+                               file->user_info.fdType = cpu_to_be32(sbi->type);
+                               file->user_info.fdCreator = cpu_to_be32(sbi->creator);
                         }
                         if ((file->permissions.rootflags | file->permissions.userflags) & HFSPLUS_FLG_IMMUTABLE)
                                 file->flags |= cpu_to_be16(HFSPLUS_FILE_LOCKED);
@@ -131,8 +140,8 @@ static int hfsplus_cat_build_record(hfsplus_cat_entry *entry, u32 cnid, struct i
                         file->user_info.fdType = cpu_to_be32(HFSP_HARDLINK_TYPE);
                         file->user_info.fdCreator = cpu_to_be32(HFSP_HFSPLUS_CREATOR);
                         file->user_info.fdFlags = cpu_to_be16(0x100);
-                       file->create_date = HFSPLUS_I(HFSPLUS_SB(inode->i_sb).hidden_dir).create_date;
-                       file->permissions.dev = cpu_to_be32(HFSPLUS_I(inode).dev);
+                       file->create_date = HFSPLUS_I(sbi->hidden_dir)->create_date;
+                       file->permissions.dev = cpu_to_be32(HFSPLUS_I(inode)->linkid);
                 }
                 return sizeof(*file);
         }
@@ -180,15 +189,14 @@ int hfsplus_find_cat(struct super_block *sb, u32 cnid,
  
  int hfsplus_create_cat(u32 cnid, struct inode *dir, struct qstr *str, struct inode *inode)
  {
+       struct super_block *sb = dir->i_sb;
         struct hfs_find_data fd;
-       struct super_block *sb;
         hfsplus_cat_entry entry;
         int entry_size;
         int err;
  
         dprint(DBG_CAT_MOD, "create_cat: %s,%u(%d)\n", str->name, cnid, inode->i_nlink);
-       sb = dir->i_sb;
-       hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+       hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
  
         hfsplus_cat_build_key(sb, fd.search_key, cnid, NULL);
         entry_size = hfsplus_fill_cat_thread(sb, &entry, S_ISDIR(inode->i_mode) ?
@@ -234,7 +242,7 @@ err2:
  
  int hfsplus_delete_cat(u32 cnid, struct inode *dir, struct qstr *str)
  {
-       struct super_block *sb;
+       struct super_block *sb = dir->i_sb;
         struct hfs_find_data fd;
         struct hfsplus_fork_raw fork;
         struct list_head *pos;
@@ -242,8 +250,7 @@ int hfsplus_delete_cat(u32 cnid, struct inode *dir, struct qstr *str)
         u16 type;
  
         dprint(DBG_CAT_MOD, "delete_cat: %s,%u\n", str ? str->name : NULL, cnid);
-       sb = dir->i_sb;
-       hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+       hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
  
         if (!str) {
                 int len;
@@ -279,7 +286,7 @@ int hfsplus_delete_cat(u32 cnid, struct inode *dir, struct qstr *str)
                 hfsplus_free_fork(sb, cnid, &fork, HFSPLUS_TYPE_RSRC);
         }
  
-       list_for_each(pos, &HFSPLUS_I(dir).open_dir_list) {
+       list_for_each(pos, &HFSPLUS_I(dir)->open_dir_list) {
                 struct hfsplus_readdir_data *rd =
                         list_entry(pos, struct hfsplus_readdir_data, list);
                 if (fd.tree->keycmp(fd.search_key, (void *)&rd->key) < 0)
@@ -312,7 +319,7 @@ int hfsplus_rename_cat(u32 cnid,
                        struct inode *src_dir, struct qstr *src_name,
                        struct inode *dst_dir, struct qstr *dst_name)
  {
-       struct super_block *sb;
+       struct super_block *sb = src_dir->i_sb;
         struct hfs_find_data src_fd, dst_fd;
         hfsplus_cat_entry entry;
         int entry_size, type;
@@ -320,8 +327,7 @@ int hfsplus_rename_cat(u32 cnid,
  
         dprint(DBG_CAT_MOD, "rename_cat: %u - %lu,%s - %lu,%s\n", cnid, src_dir->i_ino, src_name->name,
                 dst_dir->i_ino, dst_name->name);
-       sb = src_dir->i_sb;
-       hfs_find_init(HFSPLUS_SB(sb).cat_tree, &src_fd);
+       hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &src_fd);
         dst_fd = src_fd;
  
         /* find the old dir entry and read the data */
diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index 764fd1bdca882da08028e34b133930d9d155a364..d236d85ec9d73f703384ecaa9d6522fe7433c775 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -39,7 +39,7 @@ static struct dentry *hfsplus_lookup(struct inode *dir, struct dentry *dentry,
  
         dentry->d_op = &hfsplus_dentry_operations;
         dentry->d_fsdata = NULL;
-       hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+       hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
         hfsplus_cat_build_key(sb, fd.search_key, dir->i_ino, &dentry->d_name);
  again:
         err = hfs_brec_read(&fd, &entry, sizeof(entry));
@@ -68,9 +68,9 @@ again:
                 cnid = be32_to_cpu(entry.file.id);
                 if (entry.file.user_info.fdType == cpu_to_be32(HFSP_HARDLINK_TYPE) &&
                     entry.file.user_info.fdCreator == cpu_to_be32(HFSP_HFSPLUS_CREATOR) &&
-                   (entry.file.create_date == HFSPLUS_I(HFSPLUS_SB(sb).hidden_dir).create_date ||
-                    entry.file.create_date == HFSPLUS_I(sb->s_root->d_inode).create_date) &&
-                   HFSPLUS_SB(sb).hidden_dir) {
+                   (entry.file.create_date == HFSPLUS_I(HFSPLUS_SB(sb)->hidden_dir)->create_date ||
+                    entry.file.create_date == HFSPLUS_I(sb->s_root->d_inode)->create_date) &&
+                   HFSPLUS_SB(sb)->hidden_dir) {
                         struct qstr str;
                         char name[32];
  
@@ -86,7 +86,8 @@ again:
                                 linkid = be32_to_cpu(entry.file.permissions.dev);
                                 str.len = sprintf(name, "iNode%d", linkid);
                                 str.name = name;
-                               hfsplus_cat_build_key(sb, fd.search_key, HFSPLUS_SB(sb).hidden_dir->i_ino, &str);
+                               hfsplus_cat_build_key(sb, fd.search_key,
+                                       HFSPLUS_SB(sb)->hidden_dir->i_ino, &str);
                                 goto again;
                         }
                 } else if (!dentry->d_fsdata)
@@ -101,7 +102,7 @@ again:
         if (IS_ERR(inode))
                 return ERR_CAST(inode);
         if (S_ISREG(inode->i_mode))
-               HFSPLUS_I(inode).dev = linkid;
+               HFSPLUS_I(inode)->linkid = linkid;
  out:
         d_add(dentry, inode);
         return NULL;
@@ -124,7 +125,7 @@ static int hfsplus_readdir(struct file *filp, void *dirent, filldir_t filldir)
         if (filp->f_pos >= inode->i_size)
                 return 0;
  
-       hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+       hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
         hfsplus_cat_build_key(sb, fd.search_key, inode->i_ino, NULL);
         err = hfs_brec_find(&fd);
         if (err)
@@ -180,8 +181,9 @@ static int hfsplus_readdir(struct file *filp, void *dirent, filldir_t filldir)
                                 err = -EIO;
                                 goto out;
                         }
-                       if (HFSPLUS_SB(sb).hidden_dir &&
-                           HFSPLUS_SB(sb).hidden_dir->i_ino == be32_to_cpu(entry.folder.id))
+                       if (HFSPLUS_SB(sb)->hidden_dir &&
+                           HFSPLUS_SB(sb)->hidden_dir->i_ino ==
+                                       be32_to_cpu(entry.folder.id))
                                 goto next;
                         if (filldir(dirent, strbuf, len, filp->f_pos,
                                     be32_to_cpu(entry.folder.id), DT_DIR))
@@ -217,7 +219,7 @@ static int hfsplus_readdir(struct file *filp, void *dirent, filldir_t filldir)
                 }
                 filp->private_data = rd;
                 rd->file = filp;
-               list_add(&rd->list, &HFSPLUS_I(inode).open_dir_list);
+               list_add(&rd->list, &HFSPLUS_I(inode)->open_dir_list);
         }
         memcpy(&rd->key, fd.key, sizeof(struct hfsplus_cat_key));
  out:
@@ -229,38 +231,18 @@ static int hfsplus_dir_release(struct inode *inode, struct file *file)
  {
         struct hfsplus_readdir_data *rd = file->private_data;
         if (rd) {
+               mutex_lock(&inode->i_mutex);
                 list_del(&rd->list);
+               mutex_unlock(&inode->i_mutex);
                 kfree(rd);
         }
         return 0;
  }
  
-static int hfsplus_create(struct inode *dir, struct dentry *dentry, int mode,
-                         struct nameidata *nd)
-{
-       struct inode *inode;
-       int res;
-
-       inode = hfsplus_new_inode(dir->i_sb, mode);
-       if (!inode)
-               return -ENOSPC;
-
-       res = hfsplus_create_cat(inode->i_ino, dir, &dentry->d_name, inode);
-       if (res) {
-               inode->i_nlink = 0;
-               hfsplus_delete_inode(inode);
-               iput(inode);
-               return res;
-       }
-       hfsplus_instantiate(dentry, inode, inode->i_ino);
-       mark_inode_dirty(inode);
-       return 0;
-}
-
  static int hfsplus_link(struct dentry *src_dentry, struct inode *dst_dir,
                         struct dentry *dst_dentry)
  {
-       struct super_block *sb = dst_dir->i_sb;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(dst_dir->i_sb);
         struct inode *inode = src_dentry->d_inode;
         struct inode *src_dir = src_dentry->d_parent->d_inode;
         struct qstr str;
@@ -270,7 +252,10 @@ static int hfsplus_link(struct dentry *src_dentry, struct inode *dst_dir,
  
         if (HFSPLUS_IS_RSRC(inode))
                 return -EPERM;
+       if (!S_ISREG(inode->i_mode))
+               return -EPERM;
  
+       mutex_lock(&sbi->vh_mutex);
         if (inode->i_ino == (u32)(unsigned long)src_dentry->d_fsdata) {
                 for (;;) {
                         get_random_bytes(&id, sizeof(cnid));
@@ -279,40 +264,41 @@ static int hfsplus_link(struct dentry *src_dentry, struct inode *dst_dir,
                         str.len = sprintf(name, "iNode%d", id);
                         res = hfsplus_rename_cat(inode->i_ino,
                                                  src_dir, &src_dentry->d_name,
-                                                HFSPLUS_SB(sb).hidden_dir, &str);
+                                                sbi->hidden_dir, &str);
                         if (!res)
                                 break;
                         if (res != -EEXIST)
-                               return res;
+                               goto out;
                 }
-               HFSPLUS_I(inode).dev = id;
-               cnid = HFSPLUS_SB(sb).next_cnid++;
+               HFSPLUS_I(inode)->linkid = id;
+               cnid = sbi->next_cnid++;
                 src_dentry->d_fsdata = (void *)(unsigned long)cnid;
                 res = hfsplus_create_cat(cnid, src_dir, &src_dentry->d_name, inode);
                 if (res)
                         /* panic? */
-                       return res;
-               HFSPLUS_SB(sb).file_count++;
+                       goto out;
+               sbi->file_count++;
         }
-       cnid = HFSPLUS_SB(sb).next_cnid++;
+       cnid = sbi->next_cnid++;
         res = hfsplus_create_cat(cnid, dst_dir, &dst_dentry->d_name, inode);
         if (res)
-               return res;
+               goto out;
  
         inc_nlink(inode);
         hfsplus_instantiate(dst_dentry, inode, cnid);
         atomic_inc(&inode->i_count);
         inode->i_ctime = CURRENT_TIME_SEC;
         mark_inode_dirty(inode);
-       HFSPLUS_SB(sb).file_count++;
-       sb->s_dirt = 1;
-
-       return 0;
+       sbi->file_count++;
+       dst_dir->i_sb->s_dirt = 1;
+out:
+       mutex_unlock(&sbi->vh_mutex);
+       return res;
  }
  
  static int hfsplus_unlink(struct inode *dir, struct dentry *dentry)
  {
-       struct super_block *sb = dir->i_sb;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
         struct inode *inode = dentry->d_inode;
         struct qstr str;
         char name[32];
@@ -322,21 +308,22 @@ static int hfsplus_unlink(struct inode *dir, struct dentry *dentry)
         if (HFSPLUS_IS_RSRC(inode))
                 return -EPERM;
  
+       mutex_lock(&sbi->vh_mutex);
         cnid = (u32)(unsigned long)dentry->d_fsdata;
         if (inode->i_ino == cnid &&
-           atomic_read(&HFSPLUS_I(inode).opencnt)) {
+           atomic_read(&HFSPLUS_I(inode)->opencnt)) {
                 str.name = name;
                 str.len = sprintf(name, "temp%lu", inode->i_ino);
                 res = hfsplus_rename_cat(inode->i_ino,
                                          dir, &dentry->d_name,
-                                        HFSPLUS_SB(sb).hidden_dir, &str);
+                                        sbi->hidden_dir, &str);
                 if (!res)
                         inode->i_flags |= S_DEAD;
-               return res;
+               goto out;
         }
         res = hfsplus_delete_cat(cnid, dir, &dentry->d_name);
         if (res)
-               return res;
+               goto out;
  
         if (inode->i_nlink > 0)
                 drop_nlink(inode);
@@ -344,10 +331,10 @@ static int hfsplus_unlink(struct inode *dir, struct dentry *dentry)
                 clear_nlink(inode);
         if (!inode->i_nlink) {
                 if (inode->i_ino != cnid) {
-                       HFSPLUS_SB(sb).file_count--;
-                       if (!atomic_read(&HFSPLUS_I(inode).opencnt)) {
+                       sbi->file_count--;
+                       if (!atomic_read(&HFSPLUS_I(inode)->opencnt)) {
                                 res = hfsplus_delete_cat(inode->i_ino,
-                                                        HFSPLUS_SB(sb).hidden_dir,
+                                                        sbi->hidden_dir,
                                                          NULL);
                                 if (!res)
                                         hfsplus_delete_inode(inode);
@@ -356,107 +343,108 @@ static int hfsplus_unlink(struct inode *dir, struct dentry *dentry)
                 } else
                         hfsplus_delete_inode(inode);
         } else
-               HFSPLUS_SB(sb).file_count--;
+               sbi->file_count--;
         inode->i_ctime = CURRENT_TIME_SEC;
         mark_inode_dirty(inode);
-
+out:
+       mutex_unlock(&sbi->vh_mutex);
         return res;
  }
  
-static int hfsplus_mkdir(struct inode *dir, struct dentry *dentry, int mode)
-{
-       struct inode *inode;
-       int res;
-
-       inode = hfsplus_new_inode(dir->i_sb, S_IFDIR | mode);
-       if (!inode)
-               return -ENOSPC;
-
-       res = hfsplus_create_cat(inode->i_ino, dir, &dentry->d_name, inode);
-       if (res) {
-               inode->i_nlink = 0;
-               hfsplus_delete_inode(inode);
-               iput(inode);
-               return res;
-       }
-       hfsplus_instantiate(dentry, inode, inode->i_ino);
-       mark_inode_dirty(inode);
-       return 0;
-}
-
  static int hfsplus_rmdir(struct inode *dir, struct dentry *dentry)
  {
-       struct inode *inode;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
+       struct inode *inode = dentry->d_inode;
         int res;
  
-       inode = dentry->d_inode;
         if (inode->i_size != 2)
                 return -ENOTEMPTY;
+
+       mutex_lock(&sbi->vh_mutex);
         res = hfsplus_delete_cat(inode->i_ino, dir, &dentry->d_name);
         if (res)
-               return res;
+               goto out;
         clear_nlink(inode);
         inode->i_ctime = CURRENT_TIME_SEC;
         hfsplus_delete_inode(inode);
         mark_inode_dirty(inode);
-       return 0;
+out:
+       mutex_unlock(&sbi->vh_mutex);
+       return res;
  }
  
  static int hfsplus_symlink(struct inode *dir, struct dentry *dentry,
                            const char *symname)
  {
-       struct super_block *sb;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
         struct inode *inode;
-       int res;
+       int res = -ENOSPC;
  
-       sb = dir->i_sb;
-       inode = hfsplus_new_inode(sb, S_IFLNK | S_IRWXUGO);
+       mutex_lock(&sbi->vh_mutex);
+       inode = hfsplus_new_inode(dir->i_sb, S_IFLNK | S_IRWXUGO);
         if (!inode)
-               return -ENOSPC;
+               goto out;
  
         res = page_symlink(inode, symname, strlen(symname) + 1);
-       if (res) {
-               inode->i_nlink = 0;
-               hfsplus_delete_inode(inode);
-               iput(inode);
-               return res;
-       }
+       if (res)
+               goto out_err;
  
-       mark_inode_dirty(inode);
         res = hfsplus_create_cat(inode->i_ino, dir, &dentry->d_name, inode);
+       if (res)
+               goto out_err;
  
-       if (!res) {
-               hfsplus_instantiate(dentry, inode, inode->i_ino);
-               mark_inode_dirty(inode);
-       }
+       hfsplus_instantiate(dentry, inode, inode->i_ino);
+       mark_inode_dirty(inode);
+       goto out;
  
+out_err:
+       inode->i_nlink = 0;
+       hfsplus_delete_inode(inode);
+       iput(inode);
+out:
+       mutex_unlock(&sbi->vh_mutex);
         return res;
  }
  
  static int hfsplus_mknod(struct inode *dir, struct dentry *dentry,
                          int mode, dev_t rdev)
  {
-       struct super_block *sb;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
         struct inode *inode;
-       int res;
+       int res = -ENOSPC;
  
-       sb = dir->i_sb;
-       inode = hfsplus_new_inode(sb, mode);
+       mutex_lock(&sbi->vh_mutex);
+       inode = hfsplus_new_inode(dir->i_sb, mode);
         if (!inode)
-               return -ENOSPC;
+               goto out;
+
+       if (S_ISBLK(mode) || S_ISCHR(mode) || S_ISFIFO(mode) || S_ISSOCK(mode))
+               init_special_inode(inode, mode, rdev);
  
         res = hfsplus_create_cat(inode->i_ino, dir, &dentry->d_name, inode);
         if (res) {
                 inode->i_nlink = 0;
                 hfsplus_delete_inode(inode);
                 iput(inode);
-               return res;
+               goto out;
         }
-       init_special_inode(inode, mode, rdev);
+
         hfsplus_instantiate(dentry, inode, inode->i_ino);
         mark_inode_dirty(inode);
+out:
+       mutex_unlock(&sbi->vh_mutex);
+       return res;
+}
  
-       return 0;
+static int hfsplus_create(struct inode *dir, struct dentry *dentry, int mode,
+                         struct nameidata *nd)
+{
+       return hfsplus_mknod(dir, dentry, mode, 0);
+}
+
+static int hfsplus_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+       return hfsplus_mknod(dir, dentry, mode | S_IFDIR, 0);
  }
  
  static int hfsplus_rename(struct inode *old_dir, struct dentry *old_dentry,
@@ -466,7 +454,10 @@ static int hfsplus_rename(struct inode *old_dir, struct dentry *old_dentry,
  
         /* Unlink destination if it already exists */
         if (new_dentry->d_inode) {
-               res = hfsplus_unlink(new_dir, new_dentry);
+               if (S_ISDIR(new_dentry->d_inode->i_mode))
+                       res = hfsplus_rmdir(new_dir, new_dentry);
+               else
+                       res = hfsplus_unlink(new_dir, new_dentry);
                 if (res)
                         return res;
         }
diff --git a/fs/hfsplus/extents.c b/fs/hfsplus/extents.c
index 0022eec63cdacd97c2a438b8d9f623ff6be88dd4..0c9cb1820a523fae02c6bf2f37e6fbdc5c5dceeb 100644
--- a/fs/hfsplus/extents.c
+++ b/fs/hfsplus/extents.c
@@ -85,35 +85,49 @@ static u32 hfsplus_ext_lastblock(struct hfsplus_extent *ext)
  
  static void __hfsplus_ext_write_extent(struct inode *inode, struct hfs_find_data *fd)
  {
+       struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
         int res;
  
-       hfsplus_ext_build_key(fd->search_key, inode->i_ino, HFSPLUS_I(inode).cached_start,
-                             HFSPLUS_IS_RSRC(inode) ?  HFSPLUS_TYPE_RSRC : HFSPLUS_TYPE_DATA);
+       WARN_ON(!mutex_is_locked(&hip->extents_lock));
+
+       hfsplus_ext_build_key(fd->search_key, inode->i_ino, hip->cached_start,
+                             HFSPLUS_IS_RSRC(inode) ?
+                               HFSPLUS_TYPE_RSRC : HFSPLUS_TYPE_DATA);
+
         res = hfs_brec_find(fd);
-       if (HFSPLUS_I(inode).flags & HFSPLUS_FLG_EXT_NEW) {
+       if (hip->flags & HFSPLUS_FLG_EXT_NEW) {
                 if (res != -ENOENT)
                         return;
-               hfs_brec_insert(fd, HFSPLUS_I(inode).cached_extents, sizeof(hfsplus_extent_rec));
-               HFSPLUS_I(inode).flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
+               hfs_brec_insert(fd, hip->cached_extents,
+                               sizeof(hfsplus_extent_rec));
+               hip->flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
         } else {
                 if (res)
                         return;
-               hfs_bnode_write(fd->bnode, HFSPLUS_I(inode).cached_extents, fd->entryoffset, fd->entrylength);
-               HFSPLUS_I(inode).flags &= ~HFSPLUS_FLG_EXT_DIRTY;
+               hfs_bnode_write(fd->bnode, hip->cached_extents,
+                               fd->entryoffset, fd->entrylength);
+               hip->flags &= ~HFSPLUS_FLG_EXT_DIRTY;
         }
  }
  
-void hfsplus_ext_write_extent(struct inode *inode)
+static void hfsplus_ext_write_extent_locked(struct inode *inode)
  {
-       if (HFSPLUS_I(inode).flags & HFSPLUS_FLG_EXT_DIRTY) {
+       if (HFSPLUS_I(inode)->flags & HFSPLUS_FLG_EXT_DIRTY) {
                 struct hfs_find_data fd;
  
-               hfs_find_init(HFSPLUS_SB(inode->i_sb).ext_tree, &fd);
+               hfs_find_init(HFSPLUS_SB(inode->i_sb)->ext_tree, &fd);
                 __hfsplus_ext_write_extent(inode, &fd);
                 hfs_find_exit(&fd);
         }
  }
  
+void hfsplus_ext_write_extent(struct inode *inode)
+{
+       mutex_lock(&HFSPLUS_I(inode)->extents_lock);
+       hfsplus_ext_write_extent_locked(inode);
+       mutex_unlock(&HFSPLUS_I(inode)->extents_lock);
+}
+
  static inline int __hfsplus_ext_read_extent(struct hfs_find_data *fd,
                                             struct hfsplus_extent *extent,
                                             u32 cnid, u32 block, u8 type)
@@ -136,33 +150,39 @@ static inline int __hfsplus_ext_read_extent(struct hfs_find_data *fd,
  
  static inline int __hfsplus_ext_cache_extent(struct hfs_find_data *fd, struct inode *inode, u32 block)
  {
+       struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
         int res;
  
-       if (HFSPLUS_I(inode).flags & HFSPLUS_FLG_EXT_DIRTY)
+       WARN_ON(!mutex_is_locked(&hip->extents_lock));
+
+       if (hip->flags & HFSPLUS_FLG_EXT_DIRTY)
                 __hfsplus_ext_write_extent(inode, fd);
  
-       res = __hfsplus_ext_read_extent(fd, HFSPLUS_I(inode).cached_extents, inode->i_ino,
-                                       block, HFSPLUS_IS_RSRC(inode) ? HFSPLUS_TYPE_RSRC : HFSPLUS_TYPE_DATA);
+       res = __hfsplus_ext_read_extent(fd, hip->cached_extents, inode->i_ino,
+                                       block, HFSPLUS_IS_RSRC(inode) ?
+                                               HFSPLUS_TYPE_RSRC :
+                                               HFSPLUS_TYPE_DATA);
         if (!res) {
-               HFSPLUS_I(inode).cached_start = be32_to_cpu(fd->key->ext.start_block);
-               HFSPLUS_I(inode).cached_blocks = hfsplus_ext_block_count(HFSPLUS_I(inode).cached_extents);
+               hip->cached_start = be32_to_cpu(fd->key->ext.start_block);
+               hip->cached_blocks = hfsplus_ext_block_count(hip->cached_extents);
         } else {
-               HFSPLUS_I(inode).cached_start = HFSPLUS_I(inode).cached_blocks = 0;
-               HFSPLUS_I(inode).flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
+               hip->cached_start = hip->cached_blocks = 0;
+               hip->flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
         }
         return res;
  }
  
  static int hfsplus_ext_read_extent(struct inode *inode, u32 block)
  {
+       struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
         struct hfs_find_data fd;
         int res;
  
-       if (block >= HFSPLUS_I(inode).cached_start &&
-           block < HFSPLUS_I(inode).cached_start + HFSPLUS_I(inode).cached_blocks)
+       if (block >= hip->cached_start &&
+           block < hip->cached_start + hip->cached_blocks)
                 return 0;
  
-       hfs_find_init(HFSPLUS_SB(inode->i_sb).ext_tree, &fd);
+       hfs_find_init(HFSPLUS_SB(inode->i_sb)->ext_tree, &fd);
         res = __hfsplus_ext_cache_extent(&fd, inode, block);
         hfs_find_exit(&fd);
         return res;
@@ -172,21 +192,21 @@ static int hfsplus_ext_read_extent(struct inode *inode, u32 block)
  int hfsplus_get_block(struct inode *inode, sector_t iblock,
                       struct buffer_head *bh_result, int create)
  {
-       struct super_block *sb;
+       struct super_block *sb = inode->i_sb;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+       struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
         int res = -EIO;
         u32 ablock, dblock, mask;
         int shift;
  
-       sb = inode->i_sb;
-
         /* Convert inode block to disk allocation block */
-       shift = HFSPLUS_SB(sb).alloc_blksz_shift - sb->s_blocksize_bits;
-       ablock = iblock >> HFSPLUS_SB(sb).fs_shift;
+       shift = sbi->alloc_blksz_shift - sb->s_blocksize_bits;
+       ablock = iblock >> sbi->fs_shift;
  
-       if (iblock >= HFSPLUS_I(inode).fs_blocks) {
-               if (iblock > HFSPLUS_I(inode).fs_blocks || !create)
+       if (iblock >= hip->fs_blocks) {
+               if (iblock > hip->fs_blocks || !create)
                         return -EIO;
-               if (ablock >= HFSPLUS_I(inode).alloc_blocks) {
+               if (ablock >= hip->alloc_blocks) {
                         res = hfsplus_file_extend(inode);
                         if (res)
                                 return res;
@@ -194,33 +214,33 @@ int hfsplus_get_block(struct inode *inode, sector_t iblock,
         } else
                 create = 0;
  
-       if (ablock < HFSPLUS_I(inode).first_blocks) {
-               dblock = hfsplus_ext_find_block(HFSPLUS_I(inode).first_extents, ablock);
+       if (ablock < hip->first_blocks) {
+               dblock = hfsplus_ext_find_block(hip->first_extents, ablock);
                 goto done;
         }
  
         if (inode->i_ino == HFSPLUS_EXT_CNID)
                 return -EIO;
  
-       mutex_lock(&HFSPLUS_I(inode).extents_lock);
+       mutex_lock(&hip->extents_lock);
         res = hfsplus_ext_read_extent(inode, ablock);
         if (!res) {
-               dblock = hfsplus_ext_find_block(HFSPLUS_I(inode).cached_extents, ablock -
-                                            HFSPLUS_I(inode).cached_start);
+               dblock = hfsplus_ext_find_block(hip->cached_extents,
+                                               ablock - hip->cached_start);
         } else {
-               mutex_unlock(&HFSPLUS_I(inode).extents_lock);
+               mutex_unlock(&hip->extents_lock);
                 return -EIO;
         }
-       mutex_unlock(&HFSPLUS_I(inode).extents_lock);
+       mutex_unlock(&hip->extents_lock);
  
  done:
         dprint(DBG_EXTENT, "get_block(%lu): %llu - %u\n", inode->i_ino, (long long)iblock, dblock);
-       mask = (1 << HFSPLUS_SB(sb).fs_shift) - 1;
-       map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask));
+       mask = (1 << sbi->fs_shift) - 1;
+       map_bh(bh_result, sb, (dblock << sbi->fs_shift) + sbi->blockoffset + (iblock & mask));
         if (create) {
                 set_buffer_new(bh_result);
-               HFSPLUS_I(inode).phys_size += sb->s_blocksize;
-               HFSPLUS_I(inode).fs_blocks++;
+               hip->phys_size += sb->s_blocksize;
+               hip->fs_blocks++;
                 inode_add_bytes(inode, sb->s_blocksize);
                 mark_inode_dirty(inode);
         }
@@ -327,7 +347,7 @@ int hfsplus_free_fork(struct super_block *sb, u32 cnid, struct hfsplus_fork_raw
         if (total_blocks == blocks)
                 return 0;
  
-       hfs_find_init(HFSPLUS_SB(sb).ext_tree, &fd);
+       hfs_find_init(HFSPLUS_SB(sb)->ext_tree, &fd);
         do {
                 res = __hfsplus_ext_read_extent(&fd, ext_entry, cnid,
                                                 total_blocks, type);
@@ -348,29 +368,33 @@ int hfsplus_free_fork(struct super_block *sb, u32 cnid, struct hfsplus_fork_raw
  int hfsplus_file_extend(struct inode *inode)
  {
         struct super_block *sb = inode->i_sb;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+       struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
         u32 start, len, goal;
         int res;
  
-       if (HFSPLUS_SB(sb).alloc_file->i_size * 8 < HFSPLUS_SB(sb).total_blocks - HFSPLUS_SB(sb).free_blocks + 8) {
+       if (sbi->alloc_file->i_size * 8 <
+           sbi->total_blocks - sbi->free_blocks + 8) {
                 // extend alloc file
-               printk(KERN_ERR "hfs: extend alloc file! (%Lu,%u,%u)\n", HFSPLUS_SB(sb).alloc_file->i_size * 8,
-                       HFSPLUS_SB(sb).total_blocks, HFSPLUS_SB(sb).free_blocks);
+               printk(KERN_ERR "hfs: extend alloc file! (%Lu,%u,%u)\n",
+                               sbi->alloc_file->i_size * 8,
+                               sbi->total_blocks, sbi->free_blocks);
                 return -ENOSPC;
         }
  
-       mutex_lock(&HFSPLUS_I(inode).extents_lock);
-       if (HFSPLUS_I(inode).alloc_blocks == HFSPLUS_I(inode).first_blocks)
-               goal = hfsplus_ext_lastblock(HFSPLUS_I(inode).first_extents);
+       mutex_lock(&hip->extents_lock);
+       if (hip->alloc_blocks == hip->first_blocks)
+               goal = hfsplus_ext_lastblock(hip->first_extents);
         else {
-               res = hfsplus_ext_read_extent(inode, HFSPLUS_I(inode).alloc_blocks);
+               res = hfsplus_ext_read_extent(inode, hip->alloc_blocks);
                 if (res)
                         goto out;
-               goal = hfsplus_ext_lastblock(HFSPLUS_I(inode).cached_extents);
+               goal = hfsplus_ext_lastblock(hip->cached_extents);
         }
  
-       len = HFSPLUS_I(inode).clump_blocks;
-       start = hfsplus_block_allocate(sb, HFSPLUS_SB(sb).total_blocks, goal, &len);
-       if (start >= HFSPLUS_SB(sb).total_blocks) {
+       len = hip->clump_blocks;
+       start = hfsplus_block_allocate(sb, sbi->total_blocks, goal, &len);
+       if (start >= sbi->total_blocks) {
                 start = hfsplus_block_allocate(sb, goal, 0, &len);
                 if (start >= goal) {
                         res = -ENOSPC;
@@ -379,56 +403,56 @@ int hfsplus_file_extend(struct inode *inode)
         }
  
         dprint(DBG_EXTENT, "extend %lu: %u,%u\n", inode->i_ino, start, len);
-       if (HFSPLUS_I(inode).alloc_blocks <= HFSPLUS_I(inode).first_blocks) {
-               if (!HFSPLUS_I(inode).first_blocks) {
+
+       if (hip->alloc_blocks <= hip->first_blocks) {
+               if (!hip->first_blocks) {
                         dprint(DBG_EXTENT, "first extents\n");
                         /* no extents yet */
-                       HFSPLUS_I(inode).first_extents[0].start_block = cpu_to_be32(start);
-                       HFSPLUS_I(inode).first_extents[0].block_count = cpu_to_be32(len);
+                       hip->first_extents[0].start_block = cpu_to_be32(start);
+                       hip->first_extents[0].block_count = cpu_to_be32(len);
                         res = 0;
                 } else {
                         /* try to append to extents in inode */
-                       res = hfsplus_add_extent(HFSPLUS_I(inode).first_extents,
-                                                HFSPLUS_I(inode).alloc_blocks,
+                       res = hfsplus_add_extent(hip->first_extents,
+                                                hip->alloc_blocks,
                                                  start, len);
                         if (res == -ENOSPC)
                                 goto insert_extent;
                 }
                 if (!res) {
-                       hfsplus_dump_extent(HFSPLUS_I(inode).first_extents);
-                       HFSPLUS_I(inode).first_blocks += len;
+                       hfsplus_dump_extent(hip->first_extents);
+                       hip->first_blocks += len;
                 }
         } else {
-               res = hfsplus_add_extent(HFSPLUS_I(inode).cached_extents,
-                                        HFSPLUS_I(inode).alloc_blocks -
-                                        HFSPLUS_I(inode).cached_start,
+               res = hfsplus_add_extent(hip->cached_extents,
+                                        hip->alloc_blocks - hip->cached_start,
                                          start, len);
                 if (!res) {
-                       hfsplus_dump_extent(HFSPLUS_I(inode).cached_extents);
-                       HFSPLUS_I(inode).flags |= HFSPLUS_FLG_EXT_DIRTY;
-                       HFSPLUS_I(inode).cached_blocks += len;
+                       hfsplus_dump_extent(hip->cached_extents);
+                       hip->flags |= HFSPLUS_FLG_EXT_DIRTY;
+                       hip->cached_blocks += len;
                 } else if (res == -ENOSPC)
                         goto insert_extent;
         }
  out:
-       mutex_unlock(&HFSPLUS_I(inode).extents_lock);
+       mutex_unlock(&hip->extents_lock);
         if (!res) {
-               HFSPLUS_I(inode).alloc_blocks += len;
+               hip->alloc_blocks += len;
                 mark_inode_dirty(inode);
         }
         return res;
  
  insert_extent:
         dprint(DBG_EXTENT, "insert new extent\n");
-       hfsplus_ext_write_extent(inode);
+       hfsplus_ext_write_extent_locked(inode);
  
-       memset(HFSPLUS_I(inode).cached_extents, 0, sizeof(hfsplus_extent_rec));
-       HFSPLUS_I(inode).cached_extents[0].start_block = cpu_to_be32(start);
-       HFSPLUS_I(inode).cached_extents[0].block_count = cpu_to_be32(len);
-       hfsplus_dump_extent(HFSPLUS_I(inode).cached_extents);
-       HFSPLUS_I(inode).flags |= HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW;
-       HFSPLUS_I(inode).cached_start = HFSPLUS_I(inode).alloc_blocks;
-       HFSPLUS_I(inode).cached_blocks = len;
+       memset(hip->cached_extents, 0, sizeof(hfsplus_extent_rec));
+       hip->cached_extents[0].start_block = cpu_to_be32(start);
+       hip->cached_extents[0].block_count = cpu_to_be32(len);
+       hfsplus_dump_extent(hip->cached_extents);
+       hip->flags |= HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW;
+       hip->cached_start = hip->alloc_blocks;
+       hip->cached_blocks = len;
  
         res = 0;
         goto out;
@@ -437,13 +461,15 @@ insert_extent:
  void hfsplus_file_truncate(struct inode *inode)
  {
         struct super_block *sb = inode->i_sb;
+       struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
         struct hfs_find_data fd;
         u32 alloc_cnt, blk_cnt, start;
         int res;
  
-       dprint(DBG_INODE, "truncate: %lu, %Lu -> %Lu\n", inode->i_ino,
-              (long long)HFSPLUS_I(inode).phys_size, inode->i_size);
-       if (inode->i_size > HFSPLUS_I(inode).phys_size) {
+       dprint(DBG_INODE, "truncate: %lu, %Lu -> %Lu\n",
+               inode->i_ino, (long long)hip->phys_size, inode->i_size);
+
+       if (inode->i_size > hip->phys_size) {
                 struct address_space *mapping = inode->i_mapping;
                 struct page *page;
                 void *fsdata;
@@ -460,47 +486,48 @@ void hfsplus_file_truncate(struct inode *inode)
                         return;
                 mark_inode_dirty(inode);
                 return;
-       } else if (inode->i_size == HFSPLUS_I(inode).phys_size)
+       } else if (inode->i_size == hip->phys_size)
                 return;
  
-       blk_cnt = (inode->i_size + HFSPLUS_SB(sb).alloc_blksz - 1) >> HFSPLUS_SB(sb).alloc_blksz_shift;
-       alloc_cnt = HFSPLUS_I(inode).alloc_blocks;
+       blk_cnt = (inode->i_size + HFSPLUS_SB(sb)->alloc_blksz - 1) >>
+                       HFSPLUS_SB(sb)->alloc_blksz_shift;
+       alloc_cnt = hip->alloc_blocks;
         if (blk_cnt == alloc_cnt)
                 goto out;
  
-       mutex_lock(&HFSPLUS_I(inode).extents_lock);
-       hfs_find_init(HFSPLUS_SB(sb).ext_tree, &fd);
+       mutex_lock(&hip->extents_lock);
+       hfs_find_init(HFSPLUS_SB(sb)->ext_tree, &fd);
         while (1) {
-               if (alloc_cnt == HFSPLUS_I(inode).first_blocks) {
-                       hfsplus_free_extents(sb, HFSPLUS_I(inode).first_extents,
+               if (alloc_cnt == hip->first_blocks) {
+                       hfsplus_free_extents(sb, hip->first_extents,
                                              alloc_cnt, alloc_cnt - blk_cnt);
-                       hfsplus_dump_extent(HFSPLUS_I(inode).first_extents);
-                       HFSPLUS_I(inode).first_blocks = blk_cnt;
+                       hfsplus_dump_extent(hip->first_extents);
+                       hip->first_blocks = blk_cnt;
                         break;
                 }
                 res = __hfsplus_ext_cache_extent(&fd, inode, alloc_cnt);
                 if (res)
                         break;
-               start = HFSPLUS_I(inode).cached_start;
-               hfsplus_free_extents(sb, HFSPLUS_I(inode).cached_extents,
+               start = hip->cached_start;
+               hfsplus_free_extents(sb, hip->cached_extents,
                                      alloc_cnt - start, alloc_cnt - blk_cnt);
-               hfsplus_dump_extent(HFSPLUS_I(inode).cached_extents);
+               hfsplus_dump_extent(hip->cached_extents);
                 if (blk_cnt > start) {
-                       HFSPLUS_I(inode).flags |= HFSPLUS_FLG_EXT_DIRTY;
+                       hip->flags |= HFSPLUS_FLG_EXT_DIRTY;
                         break;
                 }
                 alloc_cnt = start;
-               HFSPLUS_I(inode).cached_start = HFSPLUS_I(inode).cached_blocks = 0;
-               HFSPLUS_I(inode).flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
+               hip->cached_start = hip->cached_blocks = 0;
+               hip->flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
                 hfs_brec_remove(&fd);
         }
         hfs_find_exit(&fd);
-       mutex_unlock(&HFSPLUS_I(inode).extents_lock);
+       mutex_unlock(&hip->extents_lock);
  
-       HFSPLUS_I(inode).alloc_blocks = blk_cnt;
+       hip->alloc_blocks = blk_cnt;
  out:
-       HFSPLUS_I(inode).phys_size = inode->i_size;
-       HFSPLUS_I(inode).fs_blocks = (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
-       inode_set_bytes(inode, HFSPLUS_I(inode).fs_blocks << sb->s_blocksize_bits);
+       hip->phys_size = inode->i_size;
+       hip->fs_blocks = (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
+       inode_set_bytes(inode, hip->fs_blocks << sb->s_blocksize_bits);
         mark_inode_dirty(inode);
  }
diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index dc856be3c2b010854c78da1049b86d2f7ce48147..cb3653efb57a2dcf2285a19fcb7262cb7a1ba509 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -62,7 +62,7 @@ struct hfs_btree {
         unsigned int depth;
  
         //unsigned int map1_size, map_size;
-       struct semaphore tree_lock;
+       struct mutex tree_lock;
  
         unsigned int pages_per_bnode;
         spinlock_t hash_lock;
@@ -121,16 +121,21 @@ struct hfsplus_sb_info {
         u32 sect_count;
         int fs_shift;
  
-       /* Stuff in host order from Vol Header */
+       /* immutable data from the volume header */
         u32 alloc_blksz;
         int alloc_blksz_shift;
         u32 total_blocks;
+       u32 data_clump_blocks, rsrc_clump_blocks;
+
+       /* mutable data from the volume header, protected by alloc_mutex */
         u32 free_blocks;
-       u32 next_alloc;
+       struct mutex alloc_mutex;
+
+       /* mutable data from the volume header, protected by vh_mutex */
         u32 next_cnid;
         u32 file_count;
         u32 folder_count;
-       u32 data_clump_blocks, rsrc_clump_blocks;
+       struct mutex vh_mutex;
  
         /* Config options */
         u32 creator;
@@ -143,40 +148,50 @@ struct hfsplus_sb_info {
         int part, session;
  
         unsigned long flags;
-
-       struct hlist_head rsrc_inodes;
  };
  
-#define HFSPLUS_SB_WRITEBACKUP 0x0001
-#define HFSPLUS_SB_NODECOMPOSE 0x0002
-#define HFSPLUS_SB_FORCE       0x0004
-#define HFSPLUS_SB_HFSX                0x0008
-#define HFSPLUS_SB_CASEFOLD    0x0010
+#define HFSPLUS_SB_WRITEBACKUP 0
+#define HFSPLUS_SB_NODECOMPOSE 1
+#define HFSPLUS_SB_FORCE       2
+#define HFSPLUS_SB_HFSX                3
+#define HFSPLUS_SB_CASEFOLD    4
  
  
  struct hfsplus_inode_info {
-       struct mutex extents_lock;
-       u32 clump_blocks, alloc_blocks;
-       sector_t fs_blocks;
-       /* Allocation extents from catalog record or volume header */
-       hfsplus_extent_rec first_extents;
-       u32 first_blocks;
-       hfsplus_extent_rec cached_extents;
-       u32 cached_start, cached_blocks;
         atomic_t opencnt;
  
-       struct inode *rsrc_inode;
+       /*
+        * Extent allocation information, protected by extents_lock.
+        */
+       u32 first_blocks;
+       u32 clump_blocks;
+       u32 alloc_blocks;
+       u32 cached_start;
+       u32 cached_blocks;
+       hfsplus_extent_rec first_extents;
+       hfsplus_extent_rec cached_extents;
         unsigned long flags;
+       struct mutex extents_lock;
  
+       /*
+        * Immutable data.
+        */
+       struct inode *rsrc_inode;
         __be32 create_date;
-       /* Device number in hfsplus_permissions in catalog */
-       u32 dev;
-       /* BSD system and user file flags */
-       u8 rootflags;
-       u8 userflags;
  
+       /*
+        * Protected by sbi->vh_mutex.
+        */
+       u32 linkid;
+
+       /*
+        * Protected by i_mutex.
+        */
+       sector_t fs_blocks;
+       u8 userflags;           /* BSD user file flags */
         struct list_head open_dir_list;
         loff_t phys_size;
+
         struct inode vfs_inode;
  };
  
@@ -184,8 +199,8 @@ struct hfsplus_inode_info {
  #define HFSPLUS_FLG_EXT_DIRTY  0x0002
  #define HFSPLUS_FLG_EXT_NEW    0x0004
  
-#define HFSPLUS_IS_DATA(inode)   (!(HFSPLUS_I(inode).flags & HFSPLUS_FLG_RSRC))
-#define HFSPLUS_IS_RSRC(inode)   (HFSPLUS_I(inode).flags & HFSPLUS_FLG_RSRC)
+#define HFSPLUS_IS_DATA(inode)   (!(HFSPLUS_I(inode)->flags & HFSPLUS_FLG_RSRC))
+#define HFSPLUS_IS_RSRC(inode)   (HFSPLUS_I(inode)->flags & HFSPLUS_FLG_RSRC)
  
  struct hfs_find_data {
         /* filled by caller */
@@ -311,6 +326,7 @@ int hfsplus_create_cat(u32, struct inode *, struct qstr *, struct inode *);
  int hfsplus_delete_cat(u32, struct inode *, struct qstr *);
  int hfsplus_rename_cat(u32, struct inode *, struct qstr *,
                        struct inode *, struct qstr *);
+void hfsplus_cat_set_perms(struct inode *inode, struct hfsplus_perm *perms);
  
  /* dir.c */
  extern const struct inode_operations hfsplus_dir_inode_operations;
@@ -372,26 +388,15 @@ int hfsplus_read_wrapper(struct super_block *);
  int hfs_part_find(struct super_block *, sector_t *, sector_t *);
  
  /* access macros */
-/*
  static inline struct hfsplus_sb_info *HFSPLUS_SB(struct super_block *sb)
  {
         return sb->s_fs_info;
  }
+
  static inline struct hfsplus_inode_info *HFSPLUS_I(struct inode *inode)
  {
         return list_entry(inode, struct hfsplus_inode_info, vfs_inode);
  }
-*/
-#define HFSPLUS_SB(super)      (*(struct hfsplus_sb_info *)(super)->s_fs_info)
-#define HFSPLUS_I(inode)       (*list_entry(inode, struct hfsplus_inode_info, vfs_inode))
-
-#if 1
-#define hfsplus_kmap(p)                ({ struct page *__p = (p); kmap(__p); })
-#define hfsplus_kunmap(p)      ({ struct page *__p = (p); kunmap(__p); __p; })
-#else
-#define hfsplus_kmap(p)                kmap(p)
-#define hfsplus_kunmap(p)      kunmap(p)
-#endif
  
  #define sb_bread512(sb, sec, data) ({                  \
         struct buffer_head *__bh;                       \
@@ -419,6 +424,4 @@ static inline struct hfsplus_inode_info *HFSPLUS_I(struct inode *inode)
  #define hfsp_ut2mt(t)          __hfsp_ut2mt((t).tv_sec)
  #define hfsp_now2mt()          __hfsp_ut2mt(get_seconds())
  
-#define kdev_t_to_nr(x)                (x)
-
  #endif
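Note on the accessor change above: HFSPLUS_SB() and HFSPLUS_I() switch from macros that expand to a dereferenced struct (hence the old ".field" syntax) to inline functions that return a pointer (hence "->field"). The following is a minimal userspace sketch of that pattern, not the kernel code itself. All demo_* and DEMO_* names are hypothetical stand-ins, and demo_container_of() only approximates what list_entry() does in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode and struct hfsplus_inode_info. */
struct demo_inode {
	unsigned long i_ino;
};

struct demo_inode_info {
	unsigned int flags;
	struct demo_inode vfs_inode;	/* embedded, not pointed to */
};

#define DEMO_FLG_RSRC 0x0001

/* Userspace approximation of the kernel's list_entry()/container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Pointer-returning accessor: callers write DEMO_I(inode)->flags. */
static inline struct demo_inode_info *DEMO_I(struct demo_inode *inode)
{
	return demo_container_of(inode, struct demo_inode_info, vfs_inode);
}

#define DEMO_IS_RSRC(inode) (DEMO_I(inode)->flags & DEMO_FLG_RSRC)

/* Returns 1 when the accessor recovers the enclosing structure. */
int demo_accessor_roundtrip(void)
{
	struct demo_inode_info info = { .flags = DEMO_FLG_RSRC };
	struct demo_inode *inode = &info.vfs_inode;

	assert(DEMO_I(inode) == &info);
	return DEMO_IS_RSRC(inode) ? 1 : 0;
}
```

A function gives the argument a checked type and lets callers cache the result in a local (the "hip" and "sbi" variables introduced throughout this series), which the dereferencing macro form did not encourage.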
diff --git a/fs/hfsplus/hfsplus_raw.h b/fs/hfsplus/hfsplus_raw.h

index fe99fe8db61a3cb279885cee2c73cf5743704604..6892899fd6fbbabce55d3fe1f2c8915fe3f33d89 100644
--- a/fs/hfsplus/hfsplus_raw.h
+++ b/fs/hfsplus/hfsplus_raw.h
@@ -200,6 +200,7 @@ struct hfsplus_cat_key {
         struct hfsplus_unistr name;
  } __packed;
  
+#define HFSPLUS_CAT_KEYLEN     (sizeof(struct hfsplus_cat_key))
  
  /* Structs from hfs.h */
  struct hfsp_point {
@@ -323,7 +324,7 @@ struct hfsplus_ext_key {
         __be32 start_block;
  } __packed;
  
-#define HFSPLUS_EXT_KEYLEN 12
+#define HFSPLUS_EXT_KEYLEN     sizeof(struct hfsplus_ext_key)
  
  /* HFS+ generic BTree key */
  typedef union {
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c

index c5a979d62c657a866685dac4743fa01515b1653a..78449280dae08471958a328afca7f225fe9d744f 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -36,7 +36,7 @@ static int hfsplus_write_begin(struct file *file, struct address_space *mapping,
         *pagep = NULL;
         ret = cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
                                 hfsplus_get_block,
-                               &HFSPLUS_I(mapping->host).phys_size);
+                               &HFSPLUS_I(mapping->host)->phys_size);
         if (unlikely(ret)) {
                 loff_t isize = mapping->host->i_size;
                 if (pos + len > isize)
@@ -62,13 +62,13 @@ static int hfsplus_releasepage(struct page *page, gfp_t mask)
  
         switch (inode->i_ino) {
         case HFSPLUS_EXT_CNID:
-               tree = HFSPLUS_SB(sb).ext_tree;
+               tree = HFSPLUS_SB(sb)->ext_tree;
                 break;
         case HFSPLUS_CAT_CNID:
-               tree = HFSPLUS_SB(sb).cat_tree;
+               tree = HFSPLUS_SB(sb)->cat_tree;
                 break;
         case HFSPLUS_ATTR_CNID:
-               tree = HFSPLUS_SB(sb).attr_tree;
+               tree = HFSPLUS_SB(sb)->attr_tree;
                 break;
         default:
                 BUG();
@@ -172,12 +172,13 @@ static struct dentry *hfsplus_file_lookup(struct inode *dir, struct dentry *dent
         struct hfs_find_data fd;
         struct super_block *sb = dir->i_sb;
         struct inode *inode = NULL;
+       struct hfsplus_inode_info *hip;
         int err;
  
         if (HFSPLUS_IS_RSRC(dir) || strcmp(dentry->d_name.name, "rsrc"))
                 goto out;
  
-       inode = HFSPLUS_I(dir).rsrc_inode;
+       inode = HFSPLUS_I(dir)->rsrc_inode;
         if (inode)
                 goto out;
  
@@ -185,12 +186,13 @@ static struct dentry *hfsplus_file_lookup(struct inode *dir, struct dentry *dent
         if (!inode)
                 return ERR_PTR(-ENOMEM);
  
+       hip = HFSPLUS_I(inode);
         inode->i_ino = dir->i_ino;
-       INIT_LIST_HEAD(&HFSPLUS_I(inode).open_dir_list);
-       mutex_init(&HFSPLUS_I(inode).extents_lock);
-       HFSPLUS_I(inode).flags = HFSPLUS_FLG_RSRC;
+       INIT_LIST_HEAD(&hip->open_dir_list);
+       mutex_init(&hip->extents_lock);
+       hip->flags = HFSPLUS_FLG_RSRC;
  
-       hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+       hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
         err = hfsplus_find_cat(sb, dir->i_ino, &fd);
         if (!err)
                 err = hfsplus_cat_read_inode(inode, &fd);
@@ -199,10 +201,18 @@ static struct dentry *hfsplus_file_lookup(struct inode *dir, struct dentry *dent
                 iput(inode);
                 return ERR_PTR(err);
         }
-       HFSPLUS_I(inode).rsrc_inode = dir;
-       HFSPLUS_I(dir).rsrc_inode = inode;
+       hip->rsrc_inode = dir;
+       HFSPLUS_I(dir)->rsrc_inode = inode;
         igrab(dir);
-       hlist_add_head(&inode->i_hash, &HFSPLUS_SB(sb).rsrc_inodes);
+
+       /*
+        * __mark_inode_dirty expects inodes to be hashed.  Since we don't
+        * want resource fork inodes in the regular inode space, we make them
+        * appear hashed, but do not put on any lists.  hlist_del()
+        * will work fine and require no locking.
+        */
+       inode->i_hash.pprev = &inode->i_hash.next;
+
         mark_inode_dirty(inode);
  out:
         d_add(dentry, inode);
@@ -211,30 +221,27 @@ out:
  
  static void hfsplus_get_perms(struct inode *inode, struct hfsplus_perm *perms, int dir)
  {
-       struct super_block *sb = inode->i_sb;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
         u16 mode;
  
         mode = be16_to_cpu(perms->mode);
  
         inode->i_uid = be32_to_cpu(perms->owner);
         if (!inode->i_uid && !mode)
-               inode->i_uid = HFSPLUS_SB(sb).uid;
+               inode->i_uid = sbi->uid;
  
         inode->i_gid = be32_to_cpu(perms->group);
         if (!inode->i_gid && !mode)
-               inode->i_gid = HFSPLUS_SB(sb).gid;
+               inode->i_gid = sbi->gid;
  
         if (dir) {
-               mode = mode ? (mode & S_IALLUGO) :
-                       (S_IRWXUGO & ~(HFSPLUS_SB(sb).umask));
+               mode = mode ? (mode & S_IALLUGO) : (S_IRWXUGO & ~(sbi->umask));
                 mode |= S_IFDIR;
         } else if (!mode)
-               mode = S_IFREG | ((S_IRUGO|S_IWUGO) &
-                       ~(HFSPLUS_SB(sb).umask));
+               mode = S_IFREG | ((S_IRUGO|S_IWUGO) & ~(sbi->umask));
         inode->i_mode = mode;
  
-       HFSPLUS_I(inode).rootflags = perms->rootflags;
-       HFSPLUS_I(inode).userflags = perms->userflags;
+       HFSPLUS_I(inode)->userflags = perms->userflags;
         if (perms->rootflags & HFSPLUS_FLG_IMMUTABLE)
                 inode->i_flags |= S_IMMUTABLE;
         else
@@ -245,30 +252,13 @@ static void hfsplus_get_perms(struct inode *inode, struct hfsplus_perm *perms, i
                 inode->i_flags &= ~S_APPEND;
  }
  
-static void hfsplus_set_perms(struct inode *inode, struct hfsplus_perm *perms)
-{
-       if (inode->i_flags & S_IMMUTABLE)
-               perms->rootflags |= HFSPLUS_FLG_IMMUTABLE;
-       else
-               perms->rootflags &= ~HFSPLUS_FLG_IMMUTABLE;
-       if (inode->i_flags & S_APPEND)
-               perms->rootflags |= HFSPLUS_FLG_APPEND;
-       else
-               perms->rootflags &= ~HFSPLUS_FLG_APPEND;
-       perms->userflags = HFSPLUS_I(inode).userflags;
-       perms->mode = cpu_to_be16(inode->i_mode);
-       perms->owner = cpu_to_be32(inode->i_uid);
-       perms->group = cpu_to_be32(inode->i_gid);
-       perms->dev = cpu_to_be32(HFSPLUS_I(inode).dev);
-}
-
  static int hfsplus_file_open(struct inode *inode, struct file *file)
  {
         if (HFSPLUS_IS_RSRC(inode))
-               inode = HFSPLUS_I(inode).rsrc_inode;
+               inode = HFSPLUS_I(inode)->rsrc_inode;
         if (!(file->f_flags & O_LARGEFILE) && i_size_read(inode) > MAX_NON_LFS)
                 return -EOVERFLOW;
-       atomic_inc(&HFSPLUS_I(inode).opencnt);
+       atomic_inc(&HFSPLUS_I(inode)->opencnt);
         return 0;
  }
  
@@ -277,12 +267,13 @@ static int hfsplus_file_release(struct inode *inode, struct file *file)
         struct super_block *sb = inode->i_sb;
  
         if (HFSPLUS_IS_RSRC(inode))
-               inode = HFSPLUS_I(inode).rsrc_inode;
-       if (atomic_dec_and_test(&HFSPLUS_I(inode).opencnt)) {
+               inode = HFSPLUS_I(inode)->rsrc_inode;
+       if (atomic_dec_and_test(&HFSPLUS_I(inode)->opencnt)) {
                 mutex_lock(&inode->i_mutex);
                 hfsplus_file_truncate(inode);
                 if (inode->i_flags & S_DEAD) {
-                       hfsplus_delete_cat(inode->i_ino, HFSPLUS_SB(sb).hidden_dir, NULL);
+                       hfsplus_delete_cat(inode->i_ino,
+                                          HFSPLUS_SB(sb)->hidden_dir, NULL);
                         hfsplus_delete_inode(inode);
                 }
                 mutex_unlock(&inode->i_mutex);
@@ -361,47 +352,52 @@ static const struct file_operations hfsplus_file_operations = {
  
  struct inode *hfsplus_new_inode(struct super_block *sb, int mode)
  {
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
         struct inode *inode = new_inode(sb);
+       struct hfsplus_inode_info *hip;
+
         if (!inode)
                 return NULL;
  
-       inode->i_ino = HFSPLUS_SB(sb).next_cnid++;
+       inode->i_ino = sbi->next_cnid++;
         inode->i_mode = mode;
         inode->i_uid = current_fsuid();
         inode->i_gid = current_fsgid();
         inode->i_nlink = 1;
         inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
-       INIT_LIST_HEAD(&HFSPLUS_I(inode).open_dir_list);
-       mutex_init(&HFSPLUS_I(inode).extents_lock);
-       atomic_set(&HFSPLUS_I(inode).opencnt, 0);
-       HFSPLUS_I(inode).flags = 0;
-       memset(HFSPLUS_I(inode).first_extents, 0, sizeof(hfsplus_extent_rec));
-       memset(HFSPLUS_I(inode).cached_extents, 0, sizeof(hfsplus_extent_rec));
-       HFSPLUS_I(inode).alloc_blocks = 0;
-       HFSPLUS_I(inode).first_blocks = 0;
-       HFSPLUS_I(inode).cached_start = 0;
-       HFSPLUS_I(inode).cached_blocks = 0;
-       HFSPLUS_I(inode).phys_size = 0;
-       HFSPLUS_I(inode).fs_blocks = 0;
-       HFSPLUS_I(inode).rsrc_inode = NULL;
+
+       hip = HFSPLUS_I(inode);
+       INIT_LIST_HEAD(&hip->open_dir_list);
+       mutex_init(&hip->extents_lock);
+       atomic_set(&hip->opencnt, 0);
+       hip->flags = 0;
+       memset(hip->first_extents, 0, sizeof(hfsplus_extent_rec));
+       memset(hip->cached_extents, 0, sizeof(hfsplus_extent_rec));
+       hip->alloc_blocks = 0;
+       hip->first_blocks = 0;
+       hip->cached_start = 0;
+       hip->cached_blocks = 0;
+       hip->phys_size = 0;
+       hip->fs_blocks = 0;
+       hip->rsrc_inode = NULL;
         if (S_ISDIR(inode->i_mode)) {
                 inode->i_size = 2;
-               HFSPLUS_SB(sb).folder_count++;
+               sbi->folder_count++;
                 inode->i_op = &hfsplus_dir_inode_operations;
                 inode->i_fop = &hfsplus_dir_operations;
         } else if (S_ISREG(inode->i_mode)) {
-               HFSPLUS_SB(sb).file_count++;
+               sbi->file_count++;
                 inode->i_op = &hfsplus_file_inode_operations;
                 inode->i_fop = &hfsplus_file_operations;
                 inode->i_mapping->a_ops = &hfsplus_aops;
-               HFSPLUS_I(inode).clump_blocks = HFSPLUS_SB(sb).data_clump_blocks;
+               hip->clump_blocks = sbi->data_clump_blocks;
         } else if (S_ISLNK(inode->i_mode)) {
-               HFSPLUS_SB(sb).file_count++;
+               sbi->file_count++;
                 inode->i_op = &page_symlink_inode_operations;
                 inode->i_mapping->a_ops = &hfsplus_aops;
-               HFSPLUS_I(inode).clump_blocks = 1;
+               hip->clump_blocks = 1;
         } else
-               HFSPLUS_SB(sb).file_count++;
+               sbi->file_count++;
         insert_inode_hash(inode);
         mark_inode_dirty(inode);
         sb->s_dirt = 1;
@@ -414,11 +410,11 @@ void hfsplus_delete_inode(struct inode *inode)
         struct super_block *sb = inode->i_sb;
  
         if (S_ISDIR(inode->i_mode)) {
-               HFSPLUS_SB(sb).folder_count--;
+               HFSPLUS_SB(sb)->folder_count--;
                 sb->s_dirt = 1;
                 return;
         }
-       HFSPLUS_SB(sb).file_count--;
+       HFSPLUS_SB(sb)->file_count--;
         if (S_ISREG(inode->i_mode)) {
                 if (!inode->i_nlink) {
                         inode->i_size = 0;
@@ -434,34 +430,39 @@ void hfsplus_delete_inode(struct inode *inode)
  void hfsplus_inode_read_fork(struct inode *inode, struct hfsplus_fork_raw *fork)
  {
         struct super_block *sb = inode->i_sb;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+       struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
         u32 count;
         int i;
  
-       memcpy(&HFSPLUS_I(inode).first_extents, &fork->extents,
-              sizeof(hfsplus_extent_rec));
+       memcpy(&hip->first_extents, &fork->extents, sizeof(hfsplus_extent_rec));
         for (count = 0, i = 0; i < 8; i++)
                 count += be32_to_cpu(fork->extents[i].block_count);
-       HFSPLUS_I(inode).first_blocks = count;
-       memset(HFSPLUS_I(inode).cached_extents, 0, sizeof(hfsplus_extent_rec));
-       HFSPLUS_I(inode).cached_start = 0;
-       HFSPLUS_I(inode).cached_blocks = 0;
-
-       HFSPLUS_I(inode).alloc_blocks = be32_to_cpu(fork->total_blocks);
-       inode->i_size = HFSPLUS_I(inode).phys_size = be64_to_cpu(fork->total_size);
-       HFSPLUS_I(inode).fs_blocks = (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
-       inode_set_bytes(inode, HFSPLUS_I(inode).fs_blocks << sb->s_blocksize_bits);
-       HFSPLUS_I(inode).clump_blocks = be32_to_cpu(fork->clump_size) >> HFSPLUS_SB(sb).alloc_blksz_shift;
-       if (!HFSPLUS_I(inode).clump_blocks)
-               HFSPLUS_I(inode).clump_blocks = HFSPLUS_IS_RSRC(inode) ? HFSPLUS_SB(sb).rsrc_clump_blocks :
-                               HFSPLUS_SB(sb).data_clump_blocks;
+       hip->first_blocks = count;
+       memset(hip->cached_extents, 0, sizeof(hfsplus_extent_rec));
+       hip->cached_start = 0;
+       hip->cached_blocks = 0;
+
+       hip->alloc_blocks = be32_to_cpu(fork->total_blocks);
+       hip->phys_size = inode->i_size = be64_to_cpu(fork->total_size);
+       hip->fs_blocks =
+               (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
+       inode_set_bytes(inode, hip->fs_blocks << sb->s_blocksize_bits);
+       hip->clump_blocks =
+               be32_to_cpu(fork->clump_size) >> sbi->alloc_blksz_shift;
+       if (!hip->clump_blocks) {
+               hip->clump_blocks = HFSPLUS_IS_RSRC(inode) ?
+                       sbi->rsrc_clump_blocks :
+                       sbi->data_clump_blocks;
+       }
  }
  
  void hfsplus_inode_write_fork(struct inode *inode, struct hfsplus_fork_raw *fork)
  {
-       memcpy(&fork->extents, &HFSPLUS_I(inode).first_extents,
+       memcpy(&fork->extents, &HFSPLUS_I(inode)->first_extents,
                sizeof(hfsplus_extent_rec));
         fork->total_size = cpu_to_be64(inode->i_size);
-       fork->total_blocks = cpu_to_be32(HFSPLUS_I(inode).alloc_blocks);
+       fork->total_blocks = cpu_to_be32(HFSPLUS_I(inode)->alloc_blocks);
  }
  
  int hfsplus_cat_read_inode(struct inode *inode, struct hfs_find_data *fd)
@@ -472,7 +473,7 @@ int hfsplus_cat_read_inode(struct inode *inode, struct hfs_find_data *fd)
  
         type = hfs_bnode_read_u16(fd->bnode, fd->entryoffset);
  
-       HFSPLUS_I(inode).dev = 0;
+       HFSPLUS_I(inode)->linkid = 0;
         if (type == HFSPLUS_FOLDER) {
                 struct hfsplus_cat_folder *folder = &entry.folder;
  
@@ -486,8 +487,8 @@ int hfsplus_cat_read_inode(struct inode *inode, struct hfs_find_data *fd)
                 inode->i_atime = hfsp_mt2ut(folder->access_date);
                 inode->i_mtime = hfsp_mt2ut(folder->content_mod_date);
                 inode->i_ctime = hfsp_mt2ut(folder->attribute_mod_date);
-               HFSPLUS_I(inode).create_date = folder->create_date;
-               HFSPLUS_I(inode).fs_blocks = 0;
+               HFSPLUS_I(inode)->create_date = folder->create_date;
+               HFSPLUS_I(inode)->fs_blocks = 0;
                 inode->i_op = &hfsplus_dir_inode_operations;
                 inode->i_fop = &hfsplus_dir_operations;
         } else if (type == HFSPLUS_FILE) {
@@ -518,7 +519,7 @@ int hfsplus_cat_read_inode(struct inode *inode, struct hfs_find_data *fd)
                 inode->i_atime = hfsp_mt2ut(file->access_date);
                 inode->i_mtime = hfsp_mt2ut(file->content_mod_date);
                 inode->i_ctime = hfsp_mt2ut(file->attribute_mod_date);
-               HFSPLUS_I(inode).create_date = file->create_date;
+               HFSPLUS_I(inode)->create_date = file->create_date;
         } else {
                 printk(KERN_ERR "hfs: bad catalog entry used to create inode\n");
                 res = -EIO;
@@ -533,12 +534,12 @@ int hfsplus_cat_write_inode(struct inode *inode)
         hfsplus_cat_entry entry;
  
         if (HFSPLUS_IS_RSRC(inode))
-               main_inode = HFSPLUS_I(inode).rsrc_inode;
+               main_inode = HFSPLUS_I(inode)->rsrc_inode;
  
         if (!main_inode->i_nlink)
                 return 0;
  
-       if (hfs_find_init(HFSPLUS_SB(main_inode->i_sb).cat_tree, &fd))
+       if (hfs_find_init(HFSPLUS_SB(main_inode->i_sb)->cat_tree, &fd))
                 /* panic? */
                 return -EIO;
  
@@ -554,7 +555,7 @@ int hfsplus_cat_write_inode(struct inode *inode)
                 hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
                                         sizeof(struct hfsplus_cat_folder));
                 /* simple node checks? */
-               hfsplus_set_perms(inode, &folder->permissions);
+               hfsplus_cat_set_perms(inode, &folder->permissions);
                 folder->access_date = hfsp_ut2mt(inode->i_atime);
                 folder->content_mod_date = hfsp_ut2mt(inode->i_mtime);
                 folder->attribute_mod_date = hfsp_ut2mt(inode->i_ctime);
@@ -576,11 +577,7 @@ int hfsplus_cat_write_inode(struct inode *inode)
                 hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
                                         sizeof(struct hfsplus_cat_file));
                 hfsplus_inode_write_fork(inode, &file->data_fork);
-               if (S_ISREG(inode->i_mode))
-                       HFSPLUS_I(inode).dev = inode->i_nlink;
-               if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode))
-                       HFSPLUS_I(inode).dev = kdev_t_to_nr(inode->i_rdev);
-               hfsplus_set_perms(inode, &file->permissions);
+               hfsplus_cat_set_perms(inode, &file->permissions);
                 if ((file->permissions.rootflags | file->permissions.userflags) & HFSPLUS_FLG_IMMUTABLE)
                         file->flags |= cpu_to_be16(HFSPLUS_FILE_LOCKED);
                 else
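Note on the resource fork inode in hfsplus_file_lookup() above: the new comment says the inode is made to "appear hashed" without being put on any list. The sketch below shows why pointing pprev at the node's own next field achieves that. It is a simplified userspace model under stated assumptions: the demo_* helpers are hypothetical and only imitate the shape of the kernel's hlist helpers (no poisoning, no memory-ordering annotations).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the kernel's struct hlist_node. */
struct demo_hlist_node {
	struct demo_hlist_node *next;
	struct demo_hlist_node **pprev;
};

/* A node counts as unhashed when pprev is NULL. */
int demo_hlist_unhashed(const struct demo_hlist_node *n)
{
	return !n->pprev;
}

/* Unlink: store next through pprev, then fix up the successor. */
void demo_hlist_del(struct demo_hlist_node *n)
{
	struct demo_hlist_node *next = n->next;
	struct demo_hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}

/* Returns 1 if a self-referencing node looks hashed and deletes safely. */
int demo_fake_hash(void)
{
	struct demo_hlist_node n = { NULL, NULL };

	assert(demo_hlist_unhashed(&n));

	n.pprev = &n.next;		/* same assignment as in the patch */
	if (demo_hlist_unhashed(&n))
		return 0;

	/* Deletion only writes n.next back into n.next; no list is touched. */
	demo_hlist_del(&n);
	return n.next == NULL;
}
```

Because the only memory the delete touches belongs to the node itself, no list lock is needed, which matches the claim in the patch comment.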
diff --git a/fs/hfsplus/ioctl.c b/fs/hfsplus/ioctl.c

index ac405f09902651838979e322931ba7a1f1441633..5b4667e08ef7789e49c274758a28d11ef86d5fde 100644
--- a/fs/hfsplus/ioctl.c
+++ b/fs/hfsplus/ioctl.c
@@ -17,83 +17,98 @@
  #include <linux/mount.h>
  #include <linux/sched.h>
  #include <linux/xattr.h>
-#include <linux/smp_lock.h>
  #include <asm/uaccess.h>
  #include "hfsplus_fs.h"
  
-long hfsplus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int hfsplus_ioctl_getflags(struct file *file, int __user *user_flags)
  {
-       struct inode *inode = filp->f_path.dentry->d_inode;
+       struct inode *inode = file->f_path.dentry->d_inode;
+       struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
+       unsigned int flags = 0;
+
+       if (inode->i_flags & S_IMMUTABLE)
+               flags |= FS_IMMUTABLE_FL;
+       if (inode->i_flags & S_APPEND)
+               flags |= FS_APPEND_FL;
+       if (hip->userflags & HFSPLUS_FLG_NODUMP)
+               flags |= FS_NODUMP_FL;
+
+       return put_user(flags, user_flags);
+}
+
+static int hfsplus_ioctl_setflags(struct file *file, int __user *user_flags)
+{
+       struct inode *inode = file->f_path.dentry->d_inode;
+       struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
         unsigned int flags;
+       int err = 0;
  
-       lock_kernel();
-       switch (cmd) {
-       case HFSPLUS_IOC_EXT2_GETFLAGS:
-               flags = 0;
-               if (HFSPLUS_I(inode).rootflags & HFSPLUS_FLG_IMMUTABLE)
-                       flags |= FS_IMMUTABLE_FL; /* EXT2_IMMUTABLE_FL */
-               if (HFSPLUS_I(inode).rootflags & HFSPLUS_FLG_APPEND)
-                       flags |= FS_APPEND_FL; /* EXT2_APPEND_FL */
-               if (HFSPLUS_I(inode).userflags & HFSPLUS_FLG_NODUMP)
-                       flags |= FS_NODUMP_FL; /* EXT2_NODUMP_FL */
-               return put_user(flags, (int __user *)arg);
-       case HFSPLUS_IOC_EXT2_SETFLAGS: {
-               int err = 0;
-               err = mnt_want_write(filp->f_path.mnt);
-               if (err) {
-                       unlock_kernel();
-                       return err;
-               }
+       err = mnt_want_write(file->f_path.mnt);
+       if (err)
+               goto out;
  
-               if (!is_owner_or_cap(inode)) {
-                       err = -EACCES;
-                       goto setflags_out;
-               }
-               if (get_user(flags, (int __user *)arg)) {
-                       err = -EFAULT;
-                       goto setflags_out;
-               }
-               if (flags & (FS_IMMUTABLE_FL|FS_APPEND_FL) ||
-                   HFSPLUS_I(inode).rootflags & (HFSPLUS_FLG_IMMUTABLE|HFSPLUS_FLG_APPEND)) {
-                       if (!capable(CAP_LINUX_IMMUTABLE)) {
-                               err = -EPERM;
-                               goto setflags_out;
-                       }
-               }
+       if (!is_owner_or_cap(inode)) {
+               err = -EACCES;
+               goto out_drop_write;
+       }
  
-               /* don't silently ignore unsupported ext2 flags */
-               if (flags & ~(FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NODUMP_FL)) {
-                       err = -EOPNOTSUPP;
-                       goto setflags_out;
-               }
-               if (flags & FS_IMMUTABLE_FL) { /* EXT2_IMMUTABLE_FL */
-                       inode->i_flags |= S_IMMUTABLE;
-                       HFSPLUS_I(inode).rootflags |= HFSPLUS_FLG_IMMUTABLE;
-               } else {
-                       inode->i_flags &= ~S_IMMUTABLE;
-                       HFSPLUS_I(inode).rootflags &= ~HFSPLUS_FLG_IMMUTABLE;
-               }
-               if (flags & FS_APPEND_FL) { /* EXT2_APPEND_FL */
-                       inode->i_flags |= S_APPEND;
-                       HFSPLUS_I(inode).rootflags |= HFSPLUS_FLG_APPEND;
-               } else {
-                       inode->i_flags &= ~S_APPEND;
-                       HFSPLUS_I(inode).rootflags &= ~HFSPLUS_FLG_APPEND;
+       if (get_user(flags, user_flags)) {
+               err = -EFAULT;
+               goto out_drop_write;
+       }
+
+       mutex_lock(&inode->i_mutex);
+
+       if ((flags & (FS_IMMUTABLE_FL|FS_APPEND_FL)) ||
+           inode->i_flags & (S_IMMUTABLE|S_APPEND)) {
+               if (!capable(CAP_LINUX_IMMUTABLE)) {
+                       err = -EPERM;
+                       goto out_unlock_inode;
                 }
-               if (flags & FS_NODUMP_FL) /* EXT2_NODUMP_FL */
-                       HFSPLUS_I(inode).userflags |= HFSPLUS_FLG_NODUMP;
-               else
-                       HFSPLUS_I(inode).userflags &= ~HFSPLUS_FLG_NODUMP;
-
-               inode->i_ctime = CURRENT_TIME_SEC;
-               mark_inode_dirty(inode);
-setflags_out:
-               mnt_drop_write(filp->f_path.mnt);
-               unlock_kernel();
-               return err;
         }
+
+       /* don't silently ignore unsupported ext2 flags */
+       if (flags & ~(FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NODUMP_FL)) {
+               err = -EOPNOTSUPP;
+               goto out_unlock_inode;
+       }
+
+       if (flags & FS_IMMUTABLE_FL)
+               inode->i_flags |= S_IMMUTABLE;
+       else
+               inode->i_flags &= ~S_IMMUTABLE;
+
+       if (flags & FS_APPEND_FL)
+               inode->i_flags |= S_APPEND;
+       else
+               inode->i_flags &= ~S_APPEND;
+
+       if (flags & FS_NODUMP_FL)
+               hip->userflags |= HFSPLUS_FLG_NODUMP;
+       else
+               hip->userflags &= ~HFSPLUS_FLG_NODUMP;
+
+       inode->i_ctime = CURRENT_TIME_SEC;
+       mark_inode_dirty(inode);
+
+out_unlock_inode:
+       mutex_unlock(&inode->i_mutex);
+out_drop_write:
+       mnt_drop_write(file->f_path.mnt);
+out:
+       return err;
+}
+
+long hfsplus_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+       void __user *argp = (void __user *)arg;
+
+       switch (cmd) {
+       case HFSPLUS_IOC_EXT2_GETFLAGS:
+               return hfsplus_ioctl_getflags(file, argp);
+       case HFSPLUS_IOC_EXT2_SETFLAGS:
+               return hfsplus_ioctl_setflags(file, argp);
         default:
-               unlock_kernel();
                 return -ENOTTY;
         }
  }
@@ -110,7 +125,7 @@ int hfsplus_setxattr(struct dentry *dentry, const char *name,
         if (!S_ISREG(inode->i_mode) || HFSPLUS_IS_RSRC(inode))
                 return -EOPNOTSUPP;
  
-       res = hfs_find_init(HFSPLUS_SB(inode->i_sb).cat_tree, &fd);
+       res = hfs_find_init(HFSPLUS_SB(inode->i_sb)->cat_tree, &fd);
         if (res)
                 return res;
         res = hfsplus_find_cat(inode->i_sb, inode->i_ino, &fd);
@@ -153,7 +168,7 @@ ssize_t hfsplus_getxattr(struct dentry *dentry, const char *name,
                 return -EOPNOTSUPP;
  
         if (size) {
-               res = hfs_find_init(HFSPLUS_SB(inode->i_sb).cat_tree, &fd);
+               res = hfs_find_init(HFSPLUS_SB(inode->i_sb)->cat_tree, &fd);
                 if (res)
                         return res;
                 res = hfsplus_find_cat(inode->i_sb, inode->i_ino, &fd);
@@ -177,7 +192,7 @@ ssize_t hfsplus_getxattr(struct dentry *dentry, const char *name,
                 } else
                         res = size ? -ERANGE : 4;
         } else
-               res = -ENODATA;
+               res = -EOPNOTSUPP;
  out:
         if (size)
                 hfs_find_exit(&fd);
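Note on the error handling in hfsplus_ioctl_setflags() above: the function takes write access to the mount, then the inode mutex, and its labels are meant to release them in reverse order. The following sketch models that unwind with plain counters so the balance can be checked. It is an illustration only: struct demo_ctx, the DEMO_* flag values and the counters are hypothetical, and no real locking is performed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values for the sketch. */
#define DEMO_IMMUTABLE_FL 0x10u
#define DEMO_APPEND_FL    0x20u
#define DEMO_NODUMP_FL    0x40u

struct demo_ctx {
	int write_refs;		/* stands in for mnt_want_write/mnt_drop_write */
	int lock_depth;		/* stands in for mutex_lock/mutex_unlock */
	int is_owner;
	int may_set_immutable;	/* stands in for capable(CAP_LINUX_IMMUTABLE) */
	unsigned int flags;
};

int demo_setflags(struct demo_ctx *c, unsigned int flags)
{
	int err = 0;

	c->write_refs++;			/* acquire #1 */

	if (!c->is_owner) {
		err = -EACCES;
		goto out_drop_write;
	}

	c->lock_depth++;			/* acquire #2 */

	if (((flags | c->flags) & (DEMO_IMMUTABLE_FL | DEMO_APPEND_FL)) &&
	    !c->may_set_immutable) {
		err = -EPERM;
		goto out_unlock;
	}

	/* Reject flags the sketch does not know about. */
	if (flags & ~(DEMO_IMMUTABLE_FL | DEMO_APPEND_FL | DEMO_NODUMP_FL)) {
		err = -EOPNOTSUPP;
		goto out_unlock;
	}

	c->flags = flags;

out_unlock:
	c->lock_depth--;			/* release #2 */
out_drop_write:
	c->write_refs--;			/* release #1 */
	assert(c->write_refs >= 0 && c->lock_depth >= 0);
	return err;
}

/* Returns 1 when every acquire has been matched by a release. */
int demo_balanced(const struct demo_ctx *c)
{
	return c->write_refs == 0 && c->lock_depth == 0;
}
```

Each label undoes exactly one acquisition, so a failure at any step jumps to the label for the most recent resource it holds and falls through the rest.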
diff --git a/fs/hfsplus/options.c b/fs/hfsplus/options.c

index 572628b4b07d23af08f98ca7b757c85acfbcbc0c..f9ab276a4d8de9e15d2acf49a358da1a9ff10fe6 100644
--- a/fs/hfsplus/options.c
+++ b/fs/hfsplus/options.c
@@ -143,13 +143,13 @@ int hfsplus_parse_options(char *input, struct hfsplus_sb_info *sbi)
                         kfree(p);
                         break;
                 case opt_decompose:
-                       sbi->flags &= ~HFSPLUS_SB_NODECOMPOSE;
+                       clear_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags);
                         break;
                 case opt_nodecompose:
-                       sbi->flags |= HFSPLUS_SB_NODECOMPOSE;
+                       set_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags);
                         break;
                 case opt_force:
-                       sbi->flags |= HFSPLUS_SB_FORCE;
+                       set_bit(HFSPLUS_SB_FORCE, &sbi->flags);
                         break;
                 default:
                         return 0;
@@ -171,7 +171,7 @@ done:
  
  int hfsplus_show_options(struct seq_file *seq, struct vfsmount *mnt)
  {
-       struct hfsplus_sb_info *sbi = &HFSPLUS_SB(mnt->mnt_sb);
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(mnt->mnt_sb);
  
         if (sbi->creator != HFSPLUS_DEF_CR_TYPE)
                 seq_printf(seq, ",creator=%.4s", (char *)&sbi->creator);
@@ -184,7 +184,7 @@ int hfsplus_show_options(struct seq_file *seq, struct vfsmount *mnt)
                 seq_printf(seq, ",session=%u", sbi->session);
         if (sbi->nls)
                 seq_printf(seq, ",nls=%s", sbi->nls->charset);
-       if (sbi->flags & HFSPLUS_SB_NODECOMPOSE)
+       if (test_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags))
                 seq_printf(seq, ",nodecompose");
         return 0;
  }
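Note on the sbi->flags change above: set_bit(), clear_bit() and test_bit() take a bit number, not a mask, so the HFSPLUS_SB_* constants presumably become bit indices in a header hunk outside this excerpt. The sketch below shows the difference in userspace. It assumes hypothetical bit numbers and uses non-atomic demo_* helpers; the kernel's set_bit() and clear_bit() are atomic, which this sketch does not reproduce.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bit numbers (indices, not masks). */
enum {
	DEMO_SB_NODECOMPOSE = 1,
	DEMO_SB_FORCE = 2,
};

/* Non-atomic stand-ins for the kernel bit operations. */
void demo_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

void demo_clear_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / DEMO_BITS_PER_LONG] &= ~(1UL << (nr % DEMO_BITS_PER_LONG));
}

int demo_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (int)((addr[nr / DEMO_BITS_PER_LONG] >>
		      (nr % DEMO_BITS_PER_LONG)) & 1UL);
}

/* Mirrors "nodecompose,force,decompose"; returns 1 if only FORCE remains. */
int demo_option_flags(void)
{
	unsigned long flags = 0;

	demo_set_bit(DEMO_SB_NODECOMPOSE, &flags);
	demo_set_bit(DEMO_SB_FORCE, &flags);
	demo_clear_bit(DEMO_SB_NODECOMPOSE, &flags);

	assert(!demo_test_bit(DEMO_SB_NODECOMPOSE, &flags));
	return demo_test_bit(DEMO_SB_FORCE, &flags) &&
	       flags == (1UL << DEMO_SB_FORCE);
}
```

Passing an old mask value such as 0x0002 to these helpers would address bit 2 rather than bit 1, which is why the constants and every user have to change together.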
diff --git a/fs/hfsplus/part_tbl.c b/fs/hfsplus/part_tbl.c

index 1528a6fd02992f1858ee254fe01039520da77bd4..208b16c645cc234c6f5ce1b15fbd5b49ba802ee8 100644
--- a/fs/hfsplus/part_tbl.c
+++ b/fs/hfsplus/part_tbl.c
@@ -74,6 +74,7 @@ struct old_pmap {
  int hfs_part_find(struct super_block *sb,
                   sector_t *part_start, sector_t *part_size)
  {
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
         struct buffer_head *bh;
         __be16 *data;
         int i, size, res;
@@ -95,7 +96,7 @@ int hfs_part_find(struct super_block *sb,
                 for (i = 0; i < size; p++, i++) {
                         if (p->pdStart && p->pdSize &&
                             p->pdFSID == cpu_to_be32(0x54465331)/*"TFS1"*/ &&
-                           (HFSPLUS_SB(sb).part < 0 || HFSPLUS_SB(sb).part == i)) {
+                           (sbi->part < 0 || sbi->part == i)) {
                                 *part_start += be32_to_cpu(p->pdStart);
                                 *part_size = be32_to_cpu(p->pdSize);
                                 res = 0;
@@ -111,7 +112,7 @@ int hfs_part_find(struct super_block *sb,
                 size = be32_to_cpu(pm->pmMapBlkCnt);
                 for (i = 0; i < size;) {
                         if (!memcmp(pm->pmPartType,"Apple_HFS", 9) &&
-                           (HFSPLUS_SB(sb).part < 0 || HFSPLUS_SB(sb).part == i)) {
+                           (sbi->part < 0 || sbi->part == i)) {
                                 *part_start += be32_to_cpu(pm->pmPyPartStart);
                                 *part_size = be32_to_cpu(pm->pmPartBlkCnt);
                                 res = 0;
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c

index 3b55c050c74274710fa95cad827edf6abd6b8316..9a88d7536103e2c1c70f3824b42d367d3eac848f 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -12,7 +12,6 @@
  #include <linux/pagemap.h>
  #include <linux/fs.h>
  #include <linux/slab.h>
-#include <linux/smp_lock.h>
  #include <linux/vfs.h>
  #include <linux/nls.h>
  
@@ -21,40 +20,11 @@ static void hfsplus_destroy_inode(struct inode *inode);
  
  #include "hfsplus_fs.h"
  
-struct inode *hfsplus_iget(struct super_block *sb, unsigned long ino)
+static int hfsplus_system_read_inode(struct inode *inode)
  {
-       struct hfs_find_data fd;
-       struct hfsplus_vh *vhdr;
-       struct inode *inode;
-       long err = -EIO;
-
-       inode = iget_locked(sb, ino);
-       if (!inode)
-               return ERR_PTR(-ENOMEM);
-       if (!(inode->i_state & I_NEW))
-               return inode;
+       struct hfsplus_vh *vhdr = HFSPLUS_SB(inode->i_sb)->s_vhdr;
  
-       INIT_LIST_HEAD(&HFSPLUS_I(inode).open_dir_list);
-       mutex_init(&HFSPLUS_I(inode).extents_lock);
-       HFSPLUS_I(inode).flags = 0;
-       HFSPLUS_I(inode).rsrc_inode = NULL;
-       atomic_set(&HFSPLUS_I(inode).opencnt, 0);
-
-       if (inode->i_ino >= HFSPLUS_FIRSTUSER_CNID) {
-       read_inode:
-               hfs_find_init(HFSPLUS_SB(inode->i_sb).cat_tree, &fd);
-               err = hfsplus_find_cat(inode->i_sb, inode->i_ino, &fd);
-               if (!err)
-                       err = hfsplus_cat_read_inode(inode, &fd);
-               hfs_find_exit(&fd);
-               if (err)
-                       goto bad_inode;
-               goto done;
-       }
-       vhdr = HFSPLUS_SB(inode->i_sb).s_vhdr;
-       switch(inode->i_ino) {
-       case HFSPLUS_ROOT_CNID:
-               goto read_inode;
+       switch (inode->i_ino) {
         case HFSPLUS_EXT_CNID:
                 hfsplus_inode_read_fork(inode, &vhdr->ext_file);
                 inode->i_mapping->a_ops = &hfsplus_btree_aops;
@@ -75,74 +45,101 @@ struct inode *hfsplus_iget(struct super_block *sb, unsigned long ino)
                 inode->i_mapping->a_ops = &hfsplus_btree_aops;
                 break;
         default:
-               goto bad_inode;
+               return -EIO;
+       }
+
+       return 0;
+}
+
+struct inode *hfsplus_iget(struct super_block *sb, unsigned long ino)
+{
+       struct hfs_find_data fd;
+       struct inode *inode;
+       int err;
+
+       inode = iget_locked(sb, ino);
+       if (!inode)
+               return ERR_PTR(-ENOMEM);
+       if (!(inode->i_state & I_NEW))
+               return inode;
+
+       INIT_LIST_HEAD(&HFSPLUS_I(inode)->open_dir_list);
+       mutex_init(&HFSPLUS_I(inode)->extents_lock);
+       HFSPLUS_I(inode)->flags = 0;
+       HFSPLUS_I(inode)->rsrc_inode = NULL;
+       atomic_set(&HFSPLUS_I(inode)->opencnt, 0);
+
+       if (inode->i_ino >= HFSPLUS_FIRSTUSER_CNID ||
+           inode->i_ino == HFSPLUS_ROOT_CNID) {
+               hfs_find_init(HFSPLUS_SB(inode->i_sb)->cat_tree, &fd);
+               err = hfsplus_find_cat(inode->i_sb, inode->i_ino, &fd);
+               if (!err)
+                       err = hfsplus_cat_read_inode(inode, &fd);
+               hfs_find_exit(&fd);
+       } else {
+               err = hfsplus_system_read_inode(inode);
+       }
+
+       if (err) {
+               iget_failed(inode);
+               return ERR_PTR(err);
         }
  
-done:
         unlock_new_inode(inode);
         return inode;
-
-bad_inode:
-       iget_failed(inode);
-       return ERR_PTR(err);
  }
  
-static int hfsplus_write_inode(struct inode *inode,
-               struct writeback_control *wbc)
+static int hfsplus_system_write_inode(struct inode *inode)
  {
-       struct hfsplus_vh *vhdr;
-       int ret = 0;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
+       struct hfsplus_vh *vhdr = sbi->s_vhdr;
+       struct hfsplus_fork_raw *fork;
+       struct hfs_btree *tree = NULL;
  
-       dprint(DBG_INODE, "hfsplus_write_inode: %lu\n", inode->i_ino);
-       hfsplus_ext_write_extent(inode);
-       if (inode->i_ino >= HFSPLUS_FIRSTUSER_CNID) {
-               return hfsplus_cat_write_inode(inode);
-       }
-       vhdr = HFSPLUS_SB(inode->i_sb).s_vhdr;
         switch (inode->i_ino) {
-       case HFSPLUS_ROOT_CNID:
-               ret = hfsplus_cat_write_inode(inode);
-               break;
         case HFSPLUS_EXT_CNID:
-               if (vhdr->ext_file.total_size != cpu_to_be64(inode->i_size)) {
-                       HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
-                       inode->i_sb->s_dirt = 1;
-               }
-               hfsplus_inode_write_fork(inode, &vhdr->ext_file);
-               hfs_btree_write(HFSPLUS_SB(inode->i_sb).ext_tree);
+               fork = &vhdr->ext_file;
+               tree = sbi->ext_tree;
                 break;
         case HFSPLUS_CAT_CNID:
-               if (vhdr->cat_file.total_size != cpu_to_be64(inode->i_size)) {
-                       HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
-                       inode->i_sb->s_dirt = 1;
-               }
-               hfsplus_inode_write_fork(inode, &vhdr->cat_file);
-               hfs_btree_write(HFSPLUS_SB(inode->i_sb).cat_tree);
+               fork = &vhdr->cat_file;
+               tree = sbi->cat_tree;
                 break;
         case HFSPLUS_ALLOC_CNID:
-               if (vhdr->alloc_file.total_size != cpu_to_be64(inode->i_size)) {
-                       HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
-                       inode->i_sb->s_dirt = 1;
-               }
-               hfsplus_inode_write_fork(inode, &vhdr->alloc_file);
+               fork = &vhdr->alloc_file;
                 break;
         case HFSPLUS_START_CNID:
-               if (vhdr->start_file.total_size != cpu_to_be64(inode->i_size)) {
-                       HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
-                       inode->i_sb->s_dirt = 1;
-               }
-               hfsplus_inode_write_fork(inode, &vhdr->start_file);
+               fork = &vhdr->start_file;
                 break;
         case HFSPLUS_ATTR_CNID:
-               if (vhdr->attr_file.total_size != cpu_to_be64(inode->i_size)) {
-                       HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
-                       inode->i_sb->s_dirt = 1;
-               }
-               hfsplus_inode_write_fork(inode, &vhdr->attr_file);
-               hfs_btree_write(HFSPLUS_SB(inode->i_sb).attr_tree);
-               break;
+               fork = &vhdr->attr_file;
+               tree = sbi->attr_tree;
+       default:
+               return -EIO;
+       }
+
+       if (fork->total_size != cpu_to_be64(inode->i_size)) {
+               set_bit(HFSPLUS_SB_WRITEBACKUP, &sbi->flags);
+               inode->i_sb->s_dirt = 1;
         }
-       return ret;
+       hfsplus_inode_write_fork(inode, fork);
+       if (tree)
+               hfs_btree_write(tree);
+       return 0;
+}
+
+static int hfsplus_write_inode(struct inode *inode,
+               struct writeback_control *wbc)
+{
+       dprint(DBG_INODE, "hfsplus_write_inode: %lu\n", inode->i_ino);
+
+       hfsplus_ext_write_extent(inode);
+
+       if (inode->i_ino >= HFSPLUS_FIRSTUSER_CNID ||
+           inode->i_ino == HFSPLUS_ROOT_CNID)
+               return hfsplus_cat_write_inode(inode);
+       else
+               return hfsplus_system_write_inode(inode);
  }
  
  static void hfsplus_evict_inode(struct inode *inode)
@@ -151,51 +148,53 @@ static void hfsplus_evict_inode(struct inode *inode)
         truncate_inode_pages(&inode->i_data, 0);
         end_writeback(inode);
         if (HFSPLUS_IS_RSRC(inode)) {
-               HFSPLUS_I(HFSPLUS_I(inode).rsrc_inode).rsrc_inode = NULL;
-               iput(HFSPLUS_I(inode).rsrc_inode);
+               HFSPLUS_I(HFSPLUS_I(inode)->rsrc_inode)->rsrc_inode = NULL;
+               iput(HFSPLUS_I(inode)->rsrc_inode);
         }
  }
  
  int hfsplus_sync_fs(struct super_block *sb, int wait)
  {
-       struct hfsplus_vh *vhdr = HFSPLUS_SB(sb).s_vhdr;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+       struct hfsplus_vh *vhdr = sbi->s_vhdr;
  
         dprint(DBG_SUPER, "hfsplus_write_super\n");
  
-       lock_super(sb);
+       mutex_lock(&sbi->vh_mutex);
+       mutex_lock(&sbi->alloc_mutex);
         sb->s_dirt = 0;
  
-       vhdr->free_blocks = cpu_to_be32(HFSPLUS_SB(sb).free_blocks);
-       vhdr->next_alloc = cpu_to_be32(HFSPLUS_SB(sb).next_alloc);
-       vhdr->next_cnid = cpu_to_be32(HFSPLUS_SB(sb).next_cnid);
-       vhdr->folder_count = cpu_to_be32(HFSPLUS_SB(sb).folder_count);
-       vhdr->file_count = cpu_to_be32(HFSPLUS_SB(sb).file_count);
+       vhdr->free_blocks = cpu_to_be32(sbi->free_blocks);
+       vhdr->next_cnid = cpu_to_be32(sbi->next_cnid);
+       vhdr->folder_count = cpu_to_be32(sbi->folder_count);
+       vhdr->file_count = cpu_to_be32(sbi->file_count);
  
-       mark_buffer_dirty(HFSPLUS_SB(sb).s_vhbh);
-       if (HFSPLUS_SB(sb).flags & HFSPLUS_SB_WRITEBACKUP) {
-               if (HFSPLUS_SB(sb).sect_count) {
+       mark_buffer_dirty(sbi->s_vhbh);
+       if (test_and_clear_bit(HFSPLUS_SB_WRITEBACKUP, &sbi->flags)) {
+               if (sbi->sect_count) {
                         struct buffer_head *bh;
                         u32 block, offset;
  
-                       block = HFSPLUS_SB(sb).blockoffset;
-                       block += (HFSPLUS_SB(sb).sect_count - 2) >> (sb->s_blocksize_bits - 9);
-                       offset = ((HFSPLUS_SB(sb).sect_count - 2) << 9) & (sb->s_blocksize - 1);
-                       printk(KERN_DEBUG "hfs: backup: %u,%u,%u,%u\n", HFSPLUS_SB(sb).blockoffset,
-                               HFSPLUS_SB(sb).sect_count, block, offset);
+                       block = sbi->blockoffset;
+                       block += (sbi->sect_count - 2) >> (sb->s_blocksize_bits - 9);
+                       offset = ((sbi->sect_count - 2) << 9) & (sb->s_blocksize - 1);
+                       printk(KERN_DEBUG "hfs: backup: %u,%u,%u,%u\n",
+                                         sbi->blockoffset, sbi->sect_count,
+                                         block, offset);
                         bh = sb_bread(sb, block);
                         if (bh) {
                                 vhdr = (struct hfsplus_vh *)(bh->b_data + offset);
                                 if (be16_to_cpu(vhdr->signature) == HFSPLUS_VOLHEAD_SIG) {
-                                       memcpy(vhdr, HFSPLUS_SB(sb).s_vhdr, sizeof(*vhdr));
+                                       memcpy(vhdr, sbi->s_vhdr, sizeof(*vhdr));
                                         mark_buffer_dirty(bh);
                                         brelse(bh);
                                 } else
                                         printk(KERN_WARNING "hfs: backup not found!\n");
                         }
                 }
-               HFSPLUS_SB(sb).flags &= ~HFSPLUS_SB_WRITEBACKUP;
         }
-       unlock_super(sb);
+       mutex_unlock(&sbi->alloc_mutex);
+       mutex_unlock(&sbi->vh_mutex);
         return 0;
  }
  
@@ -209,48 +208,48 @@ static void hfsplus_write_super(struct super_block *sb)
  
  static void hfsplus_put_super(struct super_block *sb)
  {
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+
         dprint(DBG_SUPER, "hfsplus_put_super\n");
+
         if (!sb->s_fs_info)
                 return;
  
-       lock_kernel();
-
         if (sb->s_dirt)
                 hfsplus_write_super(sb);
-       if (!(sb->s_flags & MS_RDONLY) && HFSPLUS_SB(sb).s_vhdr) {
-               struct hfsplus_vh *vhdr = HFSPLUS_SB(sb).s_vhdr;
+       if (!(sb->s_flags & MS_RDONLY) && sbi->s_vhdr) {
+               struct hfsplus_vh *vhdr = sbi->s_vhdr;
  
                 vhdr->modify_date = hfsp_now2mt();
                 vhdr->attributes |= cpu_to_be32(HFSPLUS_VOL_UNMNT);
                 vhdr->attributes &= cpu_to_be32(~HFSPLUS_VOL_INCNSTNT);
-               mark_buffer_dirty(HFSPLUS_SB(sb).s_vhbh);
-               sync_dirty_buffer(HFSPLUS_SB(sb).s_vhbh);
+               mark_buffer_dirty(sbi->s_vhbh);
+               sync_dirty_buffer(sbi->s_vhbh);
         }
  
-       hfs_btree_close(HFSPLUS_SB(sb).cat_tree);
-       hfs_btree_close(HFSPLUS_SB(sb).ext_tree);
-       iput(HFSPLUS_SB(sb).alloc_file);
-       iput(HFSPLUS_SB(sb).hidden_dir);
-       brelse(HFSPLUS_SB(sb).s_vhbh);
-       unload_nls(HFSPLUS_SB(sb).nls);
+       hfs_btree_close(sbi->cat_tree);
+       hfs_btree_close(sbi->ext_tree);
+       iput(sbi->alloc_file);
+       iput(sbi->hidden_dir);
+       brelse(sbi->s_vhbh);
+       unload_nls(sbi->nls);
         kfree(sb->s_fs_info);
         sb->s_fs_info = NULL;
-
-       unlock_kernel();
  }
  
  static int hfsplus_statfs(struct dentry *dentry, struct kstatfs *buf)
  {
         struct super_block *sb = dentry->d_sb;
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
         u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
  
         buf->f_type = HFSPLUS_SUPER_MAGIC;
         buf->f_bsize = sb->s_blocksize;
-       buf->f_blocks = HFSPLUS_SB(sb).total_blocks << HFSPLUS_SB(sb).fs_shift;
-       buf->f_bfree = HFSPLUS_SB(sb).free_blocks << HFSPLUS_SB(sb).fs_shift;
+       buf->f_blocks = sbi->total_blocks << sbi->fs_shift;
+       buf->f_bfree = sbi->free_blocks << sbi->fs_shift;
         buf->f_bavail = buf->f_bfree;
         buf->f_files = 0xFFFFFFFF;
-       buf->f_ffree = 0xFFFFFFFF - HFSPLUS_SB(sb).next_cnid;
+       buf->f_ffree = 0xFFFFFFFF - sbi->next_cnid;
         buf->f_fsid.val[0] = (u32)id;
         buf->f_fsid.val[1] = (u32)(id >> 32);
         buf->f_namelen = HFSPLUS_MAX_STRLEN;
@@ -263,11 +262,11 @@ static int hfsplus_remount(struct super_block *sb, int *flags, char *data)
         if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
                 return 0;
         if (!(*flags & MS_RDONLY)) {
-               struct hfsplus_vh *vhdr = HFSPLUS_SB(sb).s_vhdr;
+               struct hfsplus_vh *vhdr = HFSPLUS_SB(sb)->s_vhdr;
                 struct hfsplus_sb_info sbi;
  
                 memset(&sbi, 0, sizeof(struct hfsplus_sb_info));
-               sbi.nls = HFSPLUS_SB(sb).nls;
+               sbi.nls = HFSPLUS_SB(sb)->nls;
                 if (!hfsplus_parse_options(data, &sbi))
                         return -EINVAL;
  
@@ -276,7 +275,7 @@ static int hfsplus_remount(struct super_block *sb, int *flags, char *data)
                                "running fsck.hfsplus is recommended.  leaving read-only.\n");
                         sb->s_flags |= MS_RDONLY;
                         *flags |= MS_RDONLY;
-               } else if (sbi.flags & HFSPLUS_SB_FORCE) {
+               } else if (test_bit(HFSPLUS_SB_FORCE, &sbi.flags)) {
                         /* nothing */
                 } else if (vhdr->attributes & cpu_to_be32(HFSPLUS_VOL_SOFTLOCK)) {
                         printk(KERN_WARNING "hfs: filesystem is marked locked, leaving read-only.\n");
@@ -320,7 +319,8 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
                 return -ENOMEM;
  
         sb->s_fs_info = sbi;
-       INIT_HLIST_HEAD(&sbi->rsrc_inodes);
+       mutex_init(&sbi->alloc_mutex);
+       mutex_init(&sbi->vh_mutex);
         hfsplus_fill_defaults(sbi);
         if (!hfsplus_parse_options(data, sbi)) {
                 printk(KERN_ERR "hfs: unable to parse mount options\n");
@@ -344,7 +344,7 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
                 err = -EINVAL;
                 goto cleanup;
         }
-       vhdr = HFSPLUS_SB(sb).s_vhdr;
+       vhdr = sbi->s_vhdr;
  
         /* Copy parts of the volume header into the superblock */
         sb->s_magic = HFSPLUS_VOLHEAD_SIG;
@@ -353,18 +353,19 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
                 printk(KERN_ERR "hfs: wrong filesystem version\n");
                 goto cleanup;
         }
-       HFSPLUS_SB(sb).total_blocks = be32_to_cpu(vhdr->total_blocks);
-       HFSPLUS_SB(sb).free_blocks = be32_to_cpu(vhdr->free_blocks);
-       HFSPLUS_SB(sb).next_alloc = be32_to_cpu(vhdr->next_alloc);
-       HFSPLUS_SB(sb).next_cnid = be32_to_cpu(vhdr->next_cnid);
-       HFSPLUS_SB(sb).file_count = be32_to_cpu(vhdr->file_count);
-       HFSPLUS_SB(sb).folder_count = be32_to_cpu(vhdr->folder_count);
-       HFSPLUS_SB(sb).data_clump_blocks = be32_to_cpu(vhdr->data_clump_sz) >> HFSPLUS_SB(sb).alloc_blksz_shift;
-       if (!HFSPLUS_SB(sb).data_clump_blocks)
-               HFSPLUS_SB(sb).data_clump_blocks = 1;
-       HFSPLUS_SB(sb).rsrc_clump_blocks = be32_to_cpu(vhdr->rsrc_clump_sz) >> HFSPLUS_SB(sb).alloc_blksz_shift;
-       if (!HFSPLUS_SB(sb).rsrc_clump_blocks)
-               HFSPLUS_SB(sb).rsrc_clump_blocks = 1;
+       sbi->total_blocks = be32_to_cpu(vhdr->total_blocks);
+       sbi->free_blocks = be32_to_cpu(vhdr->free_blocks);
+       sbi->next_cnid = be32_to_cpu(vhdr->next_cnid);
+       sbi->file_count = be32_to_cpu(vhdr->file_count);
+       sbi->folder_count = be32_to_cpu(vhdr->folder_count);
+       sbi->data_clump_blocks =
+               be32_to_cpu(vhdr->data_clump_sz) >> sbi->alloc_blksz_shift;
+       if (!sbi->data_clump_blocks)
+               sbi->data_clump_blocks = 1;
+       sbi->rsrc_clump_blocks =
+               be32_to_cpu(vhdr->rsrc_clump_sz) >> sbi->alloc_blksz_shift;
+       if (!sbi->rsrc_clump_blocks)
+               sbi->rsrc_clump_blocks = 1;
  
         /* Set up operations so we can load metadata */
         sb->s_op = &hfsplus_sops;
@@ -374,7 +375,7 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
                 printk(KERN_WARNING "hfs: Filesystem was not cleanly unmounted, "
                        "running fsck.hfsplus is recommended.  mounting read-only.\n");
                 sb->s_flags |= MS_RDONLY;
-       } else if (sbi->flags & HFSPLUS_SB_FORCE) {
+       } else if (test_and_clear_bit(HFSPLUS_SB_FORCE, &sbi->flags)) {
                 /* nothing */
         } else if (vhdr->attributes & cpu_to_be32(HFSPLUS_VOL_SOFTLOCK)) {
                 printk(KERN_WARNING "hfs: Filesystem is marked locked, mounting read-only.\n");
@@ -384,16 +385,15 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
                        "use the force option at your own risk, mounting read-only.\n");
                 sb->s_flags |= MS_RDONLY;
         }
-       sbi->flags &= ~HFSPLUS_SB_FORCE;
  
         /* Load metadata objects (B*Trees) */
-       HFSPLUS_SB(sb).ext_tree = hfs_btree_open(sb, HFSPLUS_EXT_CNID);
-       if (!HFSPLUS_SB(sb).ext_tree) {
+       sbi->ext_tree = hfs_btree_open(sb, HFSPLUS_EXT_CNID);
+       if (!sbi->ext_tree) {
                 printk(KERN_ERR "hfs: failed to load extents file\n");
                 goto cleanup;
         }
-       HFSPLUS_SB(sb).cat_tree = hfs_btree_open(sb, HFSPLUS_CAT_CNID);
-       if (!HFSPLUS_SB(sb).cat_tree) {
+       sbi->cat_tree = hfs_btree_open(sb, HFSPLUS_CAT_CNID);
+       if (!sbi->cat_tree) {
                 printk(KERN_ERR "hfs: failed to load catalog file\n");
                 goto cleanup;
         }
@@ -404,7 +404,7 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
                 err = PTR_ERR(inode);
                 goto cleanup;
         }
-       HFSPLUS_SB(sb).alloc_file = inode;
+       sbi->alloc_file = inode;
  
         /* Load the root directory */
         root = hfsplus_iget(sb, HFSPLUS_ROOT_CNID);
@@ -423,7 +423,7 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
  
         str.len = sizeof(HFSP_HIDDENDIR_NAME) - 1;
         str.name = HFSP_HIDDENDIR_NAME;
-       hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+       hfs_find_init(sbi->cat_tree, &fd);
         hfsplus_cat_build_key(sb, fd.search_key, HFSPLUS_ROOT_CNID, &str);
         if (!hfs_brec_read(&fd, &entry, sizeof(entry))) {
                 hfs_find_exit(&fd);
@@ -434,7 +434,7 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
                         err = PTR_ERR(inode);
                         goto cleanup;
                 }
-               HFSPLUS_SB(sb).hidden_dir = inode;
+               sbi->hidden_dir = inode;
         } else
                 hfs_find_exit(&fd);
  
@@ -449,15 +449,19 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
         be32_add_cpu(&vhdr->write_count, 1);
         vhdr->attributes &= cpu_to_be32(~HFSPLUS_VOL_UNMNT);
         vhdr->attributes |= cpu_to_be32(HFSPLUS_VOL_INCNSTNT);
-       mark_buffer_dirty(HFSPLUS_SB(sb).s_vhbh);
-       sync_dirty_buffer(HFSPLUS_SB(sb).s_vhbh);
+       mark_buffer_dirty(sbi->s_vhbh);
+       sync_dirty_buffer(sbi->s_vhbh);
  
-       if (!HFSPLUS_SB(sb).hidden_dir) {
+       if (!sbi->hidden_dir) {
                 printk(KERN_DEBUG "hfs: create hidden dir...\n");
-               HFSPLUS_SB(sb).hidden_dir = hfsplus_new_inode(sb, S_IFDIR);
-               hfsplus_create_cat(HFSPLUS_SB(sb).hidden_dir->i_ino, sb->s_root->d_inode,
-                                  &str, HFSPLUS_SB(sb).hidden_dir);
-               mark_inode_dirty(HFSPLUS_SB(sb).hidden_dir);
+
+               mutex_lock(&sbi->vh_mutex);
+               sbi->hidden_dir = hfsplus_new_inode(sb, S_IFDIR);
+               hfsplus_create_cat(sbi->hidden_dir->i_ino, sb->s_root->d_inode,
+                                  &str, sbi->hidden_dir);
+               mutex_unlock(&sbi->vh_mutex);
+
+               mark_inode_dirty(sbi->hidden_dir);
         }
  out:
         unload_nls(sbi->nls);
@@ -486,7 +490,7 @@ static struct inode *hfsplus_alloc_inode(struct super_block *sb)
  
  static void hfsplus_destroy_inode(struct inode *inode)
  {
-       kmem_cache_free(hfsplus_inode_cachep, &HFSPLUS_I(inode));
+       kmem_cache_free(hfsplus_inode_cachep, HFSPLUS_I(inode));
  }
  
  #define HFSPLUS_INODE_SIZE     sizeof(struct hfsplus_inode_info)
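
The refactored hfsplus_system_write_inode() above reduces each special CNID to a choice of fork (and, for B-tree files, a tree), then runs the shared write-back steps once. A minimal userspace sketch of that dispatch pattern follows. All `sketch_` names, the CNID values, and the struct layouts are hypothetical stand-ins, not the kernel's definitions. Note that in the hunk as shown, the HFSPLUS_ATTR_CNID case has no `break` before `default:`, so as written it would fall through to `return -EIO`; the sketch ends every case with an explicit `break`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the special-file CNIDs used in the diff. */
enum sketch_cnid {
    SKETCH_EXT_CNID   = 3,
    SKETCH_CAT_CNID   = 4,
    SKETCH_ALLOC_CNID = 6,
    SKETCH_START_CNID = 7,
    SKETCH_ATTR_CNID  = 8,
};

/* Simplified fork record; the real structure carries extents as well. */
struct sketch_fork {
    unsigned long long total_size;
};

/* Simplified volume header holding one fork per special file. */
struct sketch_vh {
    struct sketch_fork ext_file, cat_file, alloc_file, start_file, attr_file;
};

/*
 * Select the fork for a special inode number and report whether a
 * B-tree is associated with it. Returns 0 on success, -EIO for an
 * unknown inode number.
 */
static int sketch_pick_fork(struct sketch_vh *vh, unsigned long ino,
                            struct sketch_fork **fork, int *has_tree)
{
    *fork = NULL;
    *has_tree = 0;

    switch (ino) {
    case SKETCH_EXT_CNID:
        *fork = &vh->ext_file;
        *has_tree = 1;
        break;
    case SKETCH_CAT_CNID:
        *fork = &vh->cat_file;
        *has_tree = 1;
        break;
    case SKETCH_ALLOC_CNID:
        *fork = &vh->alloc_file;
        break;
    case SKETCH_START_CNID:
        *fork = &vh->start_file;
        break;
    case SKETCH_ATTR_CNID:
        *fork = &vh->attr_file;
        *has_tree = 1;
        break;  /* without this break the case would reach default */
    default:
        return -EIO;
    }

    return 0;
}
```

A caller would compare the selected fork's size against the inode size, write the fork, and write the tree only when `has_tree` is set, mirroring the common tail of the function in the diff.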
diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c

index 628ccf6fa402500aa15d7d53969b0f62b6ea5188..b66d67de882c3d098d661f54cbc2b19983bab32b 100644
--- a/fs/hfsplus/unicode.c
+++ b/fs/hfsplus/unicode.c
@@ -121,7 +121,7 @@ static u16 *hfsplus_compose_lookup(u16 *p, u16 cc)
  int hfsplus_uni2asc(struct super_block *sb, const struct hfsplus_unistr *ustr, char *astr, int *len_p)
  {
         const hfsplus_unichr *ip;
-       struct nls_table *nls = HFSPLUS_SB(sb).nls;
+       struct nls_table *nls = HFSPLUS_SB(sb)->nls;
         u8 *op;
         u16 cc, c0, c1;
         u16 *ce1, *ce2;
@@ -132,7 +132,7 @@ int hfsplus_uni2asc(struct super_block *sb, const struct hfsplus_unistr *ustr, c
         ustrlen = be16_to_cpu(ustr->length);
         len = *len_p;
         ce1 = NULL;
-       compose = !(HFSPLUS_SB(sb).flags & HFSPLUS_SB_NODECOMPOSE);
+       compose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
  
         while (ustrlen > 0) {
                 c0 = be16_to_cpu(*ip++);
@@ -246,7 +246,7 @@ out:
  static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
                               wchar_t *uc)
  {
-       int size = HFSPLUS_SB(sb).nls->char2uni(astr, len, uc);
+       int size = HFSPLUS_SB(sb)->nls->char2uni(astr, len, uc);
         if (size <= 0) {
                 *uc = '?';
                 size = 1;
@@ -293,7 +293,7 @@ int hfsplus_asc2uni(struct super_block *sb, struct hfsplus_unistr *ustr,
         u16 *dstr, outlen = 0;
         wchar_t c;
  
-       decompose = !(HFSPLUS_SB(sb).flags & HFSPLUS_SB_NODECOMPOSE);
+       decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
         while (outlen < HFSPLUS_MAX_STRLEN && len > 0) {
                 size = asc2unichar(sb, astr, len, &c);
  
@@ -330,8 +330,8 @@ int hfsplus_hash_dentry(struct dentry *dentry, struct qstr *str)
         wchar_t c;
         u16 c2;
  
-       casefold = (HFSPLUS_SB(sb).flags & HFSPLUS_SB_CASEFOLD);
-       decompose = !(HFSPLUS_SB(sb).flags & HFSPLUS_SB_NODECOMPOSE);
+       casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
+       decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
         hash = init_name_hash();
         astr = str->name;
         len = str->len;
@@ -373,8 +373,8 @@ int hfsplus_compare_dentry(struct dentry *dentry, struct qstr *s1, struct qstr *
         u16 c1, c2;
         wchar_t c;
  
-       casefold = (HFSPLUS_SB(sb).flags & HFSPLUS_SB_CASEFOLD);
-       decompose = !(HFSPLUS_SB(sb).flags & HFSPLUS_SB_NODECOMPOSE);
+       casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
+       decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
         astr1 = s1->name;
         len1 = s1->len;
         astr2 = s2->name;
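
The hfsplus hunks above replace mask arithmetic on the superblock flags (`flags & HFSPLUS_SB_X`) with the kernel bit operations `set_bit()`, `test_bit()` and `test_and_clear_bit()`, which take a bit number rather than a mask. The userspace sketch below models those semantics with C11 atomics. The `sketch_` helpers and the bit numbers are assumptions for illustration, not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers (not masks) for two superblock flags. */
enum {
    SKETCH_SB_WRITEBACKUP = 0,
    SKETCH_SB_FORCE       = 1,
};

/* Atomically set bit nr in *word. */
static void sketch_set_bit(unsigned int nr, atomic_ulong *word)
{
    atomic_fetch_or(word, 1UL << nr);
}

/* Return whether bit nr is currently set in *word. */
static bool sketch_test_bit(unsigned int nr, atomic_ulong *word)
{
    return ((atomic_load(word) >> nr) & 1UL) != 0;
}

/* Atomically clear bit nr and return whether it was set beforehand. */
static bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;
    unsigned long old = atomic_fetch_and(word, ~mask);

    return (old & mask) != 0;
}
```

Combining the test and the clear in one atomic step is what lets hfsplus_sync_fs() drop the separate `flags &= ~HFSPLUS_SB_WRITEBACKUP` statement in the diff.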
diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c

index bed78ac8f6d1f5e530a93c0afac22e1cfdf2178a..8972c20b3216941a88eb3deffa591d89ea3c114a 100644
--- a/fs/hfsplus/wrapper.c
+++ b/fs/hfsplus/wrapper.c
@@ -65,8 +65,8 @@ static int hfsplus_get_last_session(struct super_block *sb,
         *start = 0;
         *size = sb->s_bdev->bd_inode->i_size >> 9;
  
-       if (HFSPLUS_SB(sb).session >= 0) {
-               te.cdte_track = HFSPLUS_SB(sb).session;
+       if (HFSPLUS_SB(sb)->session >= 0) {
+               te.cdte_track = HFSPLUS_SB(sb)->session;
                 te.cdte_format = CDROM_LBA;
                 res = ioctl_by_bdev(sb->s_bdev, CDROMREADTOCENTRY, (unsigned long)&te);
                 if (!res && (te.cdte_ctrl & CDROM_DATA_TRACK) == 4) {
@@ -87,6 +87,7 @@ static int hfsplus_get_last_session(struct super_block *sb,
  /* Takes in super block, returns true if good data read */
  int hfsplus_read_wrapper(struct super_block *sb)
  {
+       struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
         struct buffer_head *bh;
         struct hfsplus_vh *vhdr;
         struct hfsplus_wd wd;
@@ -122,7 +123,7 @@ int hfsplus_read_wrapper(struct super_block *sb)
                 if (vhdr->signature == cpu_to_be16(HFSPLUS_VOLHEAD_SIG))
                         break;
                 if (vhdr->signature == cpu_to_be16(HFSPLUS_VOLHEAD_SIGX)) {
-                       HFSPLUS_SB(sb).flags |= HFSPLUS_SB_HFSX;
+                       set_bit(HFSPLUS_SB_HFSX, &sbi->flags);
                         break;
                 }
                 brelse(bh);
@@ -143,11 +144,11 @@ int hfsplus_read_wrapper(struct super_block *sb)
         if (blocksize < HFSPLUS_SECTOR_SIZE ||
             ((blocksize - 1) & blocksize))
                 return -EINVAL;
-       HFSPLUS_SB(sb).alloc_blksz = blocksize;
-       HFSPLUS_SB(sb).alloc_blksz_shift = 0;
+       sbi->alloc_blksz = blocksize;
+       sbi->alloc_blksz_shift = 0;
         while ((blocksize >>= 1) != 0)
-               HFSPLUS_SB(sb).alloc_blksz_shift++;
-       blocksize = min(HFSPLUS_SB(sb).alloc_blksz, (u32)PAGE_SIZE);
+               sbi->alloc_blksz_shift++;
+       blocksize = min(sbi->alloc_blksz, (u32)PAGE_SIZE);
  
         /* align block size to block offset */
         while (part_start & ((blocksize >> HFSPLUS_SECTOR_SHIFT) - 1))
@@ -158,23 +159,26 @@ int hfsplus_read_wrapper(struct super_block *sb)
                 return -EINVAL;
         }
  
-       HFSPLUS_SB(sb).blockoffset = part_start >>
-                       (sb->s_blocksize_bits - HFSPLUS_SECTOR_SHIFT);
-       HFSPLUS_SB(sb).sect_count = part_size;
-       HFSPLUS_SB(sb).fs_shift = HFSPLUS_SB(sb).alloc_blksz_shift -
-                       sb->s_blocksize_bits;
+       sbi->blockoffset =
+               part_start >> (sb->s_blocksize_bits - HFSPLUS_SECTOR_SHIFT);
+       sbi->sect_count = part_size;
+       sbi->fs_shift = sbi->alloc_blksz_shift - sb->s_blocksize_bits;
  
         bh = sb_bread512(sb, part_start + HFSPLUS_VOLHEAD_SECTOR, vhdr);
         if (!bh)
                 return -EIO;
  
         /* should still be the same... */
-       if (vhdr->signature != (HFSPLUS_SB(sb).flags & HFSPLUS_SB_HFSX ?
-                               cpu_to_be16(HFSPLUS_VOLHEAD_SIGX) :
-                               cpu_to_be16(HFSPLUS_VOLHEAD_SIG)))
-               goto error;
-       HFSPLUS_SB(sb).s_vhbh = bh;
-       HFSPLUS_SB(sb).s_vhdr = vhdr;
+       if (test_bit(HFSPLUS_SB_HFSX, &sbi->flags)) {
+               if (vhdr->signature != cpu_to_be16(HFSPLUS_VOLHEAD_SIGX))
+                       goto error;
+       } else {
+               if (vhdr->signature != cpu_to_be16(HFSPLUS_VOLHEAD_SIG))
+                       goto error;
+       }
+
+       sbi->s_vhbh = bh;
+       sbi->s_vhdr = vhdr;
  
         return 0;
   error:
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h

index cdfb8c6a420674cbde75467a09d6a158977a4b61..c16f8d8331b5afcd80baec98cbdbc4594c66142d 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -196,8 +196,6 @@ fh_lock(struct svc_fh *fhp)
  static inline void
  fh_unlock(struct svc_fh *fhp)
  {
-       BUG_ON(!fhp->fh_dentry);
-
         if (fhp->fh_locked) {
                 fill_post_wcc(fhp);
                 mutex_unlock(&fhp->fh_dentry->d_inode->i_mutex);
diff --git a/fs/notify/Kconfig b/fs/notify/Kconfig

index 22c629eedd82d70425704ee86b4ddf816b7bd174..b388443c3a09b01574b78befaabbf7fe03f9eeb4 100644
--- a/fs/notify/Kconfig
+++ b/fs/notify/Kconfig
@@ -3,4 +3,4 @@ config FSNOTIFY
  
  source "fs/notify/dnotify/Kconfig"
  source "fs/notify/inotify/Kconfig"
-source "fs/notify/fanotify/Kconfig"
+#source "fs/notify/fanotify/Kconfig"
diff --git a/fs/ocfs2/symlink.c b/fs/ocfs2/symlink.c

index 32499d213fc4f80f37efde6f71196283236896bf..9975457c981f904ca18a512dd7bcbe0f092ac664 100644
--- a/fs/ocfs2/symlink.c
+++ b/fs/ocfs2/symlink.c
@@ -128,7 +128,7 @@ static void *ocfs2_fast_follow_link(struct dentry *dentry,
         }
  
         /* Fast symlinks can't be large */
-       len = strlen(target);
+       len = strnlen(target, ocfs2_fast_symlink_chars(inode->i_sb));
         link = kzalloc(len + 1, GFP_NOFS);
         if (!link) {
                 status = -ENOMEM;
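
The ocfs2 hunk above bounds the length scan of a fast symlink target by the number of characters the on-disk field can hold, presumably so that a target without a terminating NUL cannot drive the scan past the buffer. A minimal userspace sketch of the same idea follows. `sketch_bounded_len()` and `sketch_copy_target()` are hypothetical helpers, and `memchr()` is used here as an ISO C equivalent of `strnlen()`.

```c
#include <stdlib.h>
#include <string.h>

/* Length of s, scanning at most max bytes (same contract as strnlen). */
static size_t sketch_bounded_len(const char *s, size_t max)
{
    const char *nul = memchr(s, '\0', max);

    return nul ? (size_t)(nul - s) : max;
}

/*
 * Copy a possibly unterminated target of capacity max into a freshly
 * allocated, always NUL-terminated string. Returns NULL on allocation
 * failure; the caller frees the result.
 */
static char *sketch_copy_target(const char *target, size_t max)
{
    size_t len = sketch_bounded_len(target, max);
    char *link = calloc(len + 1, 1);  /* zeroed, so the last byte is NUL */

    if (!link)
        return NULL;
    memcpy(link, target, len);
    return link;
}
```

Because the allocation is sized from the bounded length, the copy stays inside both the source field and the destination buffer even when the source has no terminator.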
diff --git a/fs/proc/base.c b/fs/proc/base.c

index a1c43e7c8a7be4ce70c729f3338eae09097f4025..8e4addaa542458badbe0e62da644ec1c64194f99 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2675,7 +2675,7 @@ static const struct pid_entry tgid_base_stuff[] = {
         INF("auxv",       S_IRUSR, proc_pid_auxv),
         ONE("status",     S_IRUGO, proc_pid_status),
         ONE("personality", S_IRUSR, proc_pid_personality),
-       INF("limits",     S_IRUSR, proc_pid_limits),
+       INF("limits",     S_IRUGO, proc_pid_limits),
  #ifdef CONFIG_SCHED_DEBUG
         REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
  #endif
@@ -3011,7 +3011,7 @@ static const struct pid_entry tid_base_stuff[] = {
         INF("auxv",      S_IRUSR, proc_pid_auxv),
         ONE("status",    S_IRUGO, proc_pid_status),
         ONE("personality", S_IRUSR, proc_pid_personality),
-       INF("limits",    S_IRUSR, proc_pid_limits),
+       INF("limits",    S_IRUGO, proc_pid_limits),
  #ifdef CONFIG_SCHED_DEBUG
         REG("sched",     S_IRUGO|S_IWUSR, proc_pid_sched_operations),
  #endif
diff --git a/fs/reiserfs/ioctl.c b/fs/reiserfs/ioctl.c

index f53505de071217399e39bf2013304ba46bd0c7f5..5cbb81e134aca031b21e3c88661400ff1c1b174f 100644
--- a/fs/reiserfs/ioctl.c
+++ b/fs/reiserfs/ioctl.c
@@ -170,6 +170,7 @@ int reiserfs_prepare_write(struct file *f, struct page *page,
  int reiserfs_unpack(struct inode *inode, struct file *filp)
  {
         int retval = 0;
+       int depth;
         int index;
         struct page *page;
         struct address_space *mapping;
@@ -188,8 +189,8 @@ int reiserfs_unpack(struct inode *inode, struct file *filp)
         /* we need to make sure nobody is changing the file size beneath
          ** us
          */
-       mutex_lock(&inode->i_mutex);
-       reiserfs_write_lock(inode->i_sb);
+       reiserfs_mutex_lock_safe(&inode->i_mutex, inode->i_sb);
+       depth = reiserfs_write_lock_once(inode->i_sb);
  
         write_from = inode->i_size & (blocksize - 1);
         /* if we are on a block boundary, we are already unpacked.  */
@@ -224,6 +225,6 @@ int reiserfs_unpack(struct inode *inode, struct file *filp)
  
        out:
         mutex_unlock(&inode->i_mutex);
-       reiserfs_write_unlock(inode->i_sb);
+       reiserfs_write_unlock_once(inode->i_sb, depth);
         return retval;
  }
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c

index d59c4a65d492c9b6b0713accaec1ab1c2ba7ea5f..81976ffed7d6f031f1bef0d7a5995cbbebbed69b 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -668,14 +668,11 @@ xfs_inode_set_reclaim_tag(
         xfs_perag_put(pag);
  }
  
-void
-__xfs_inode_clear_reclaim_tag(
-       xfs_mount_t     *mp,
+STATIC void
+__xfs_inode_clear_reclaim(
         xfs_perag_t     *pag,
         xfs_inode_t     *ip)
  {
-       radix_tree_tag_clear(&pag->pag_ici_root,
-                       XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
         pag->pag_ici_reclaimable--;
         if (!pag->pag_ici_reclaimable) {
                 /* clear the reclaim tag from the perag radix tree */
@@ -689,6 +686,17 @@ __xfs_inode_clear_reclaim_tag(
         }
  }
  
+void
+__xfs_inode_clear_reclaim_tag(
+       xfs_mount_t     *mp,
+       xfs_perag_t     *pag,
+       xfs_inode_t     *ip)
+{
+       radix_tree_tag_clear(&pag->pag_ici_root,
+                       XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+       __xfs_inode_clear_reclaim(pag, ip);
+}
+
  /*
   * Inodes in different states need to be treated differently, and the return
   * value of xfs_iflush is not sufficient to get this right. The following table
@@ -838,6 +846,7 @@ reclaim:
         if (!radix_tree_delete(&pag->pag_ici_root,
                                 XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino)))
                 ASSERT(0);
+       __xfs_inode_clear_reclaim(pag, ip);
         write_unlock(&pag->pag_ici_lock);
  
         /*
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c

index ed575fb4b49597806200f676680ff787d5be9d12..7e206fc1fa362ed4bb761a8f5ad11ef39c2dd5c0 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -405,9 +405,15 @@ xlog_cil_push(
         new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_SLEEP|KM_NOFS);
         new_ctx->ticket = xlog_cil_ticket_alloc(log);
  
-       /* lock out transaction commit, but don't block on background push */
+       /*
+        * Lock out transaction commit, but don't block for background pushes
+        * unless we are well over the CIL space limit. See the definition of
+        * XLOG_CIL_HARD_SPACE_LIMIT() for the full explanation of the logic
+        * used here.
+        */
         if (!down_write_trylock(&cil->xc_ctx_lock)) {
-               if (!push_seq)
+               if (!push_seq &&
+                   cil->xc_ctx->space_used < XLOG_CIL_HARD_SPACE_LIMIT(log))
                         goto out_free_ticket;
                 down_write(&cil->xc_ctx_lock);
         }
@@ -422,7 +428,7 @@ xlog_cil_push(
                 goto out_skip;
  
         /* check for a previously pushed sequence */
-       if (push_seq < cil->xc_ctx->sequence)
+       if (push_seq && push_seq < cil->xc_ctx->sequence)
                 goto out_skip;
  
         /*
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h

index ced52b98b322e3eb1be0e0c7dfc70f6096d0cd80..edcdfe01617f673bc1047caca3acc243f66a0658 100644 (file)
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -426,13 +426,13 @@ struct xfs_cil {
  };
  
  /*
- * The amount of log space we should the CIL to aggregate is difficult to size.
- * Whatever we chose we have to make we can get a reservation for the log space
- * effectively, that it is large enough to capture sufficient relogging to
- * reduce log buffer IO significantly, but it is not too large for the log or
- * induces too much latency when writing out through the iclogs. We track both
- * space consumed and the number of vectors in the checkpoint context, so we
- * need to decide which to use for limiting.
+ * The amount of log space we allow the CIL to aggregate is difficult to size.
+ * Whatever we choose, we have to make sure we can get a reservation for the
+ * log space effectively, that it is large enough to capture sufficient
+ * relogging to reduce log buffer IO significantly, and that it is neither too
+ * large for the log nor a source of too much latency when writing out through
+ * the iclogs. We track both space consumed and the number of vectors in the
+ * checkpoint context, so we need to decide which to use for limiting.
   *
   * Every log buffer we write out during a push needs a header reserved, which
   * is at least one sector and more for v2 logs. Hence we need a reservation of
@@ -459,16 +459,21 @@ struct xfs_cil {
   * checkpoint transaction ticket is specific to the checkpoint context, rather
   * than the CIL itself.
   *
- * With dynamic reservations, we can basically make up arbitrary limits for the
- * checkpoint size so long as they don't violate any other size rules.  Hence
- * the initial maximum size for the checkpoint transaction will be set to a
- * quarter of the log or 8MB, which ever is smaller. 8MB is an arbitrary limit
- * right now based on the latency of writing out a large amount of data through
- * the circular iclog buffers.
+ * With dynamic reservations, we can effectively make up arbitrary limits for
+ * the checkpoint size so long as they don't violate any other size rules.
+ * Recovery imposes a rule that no transaction exceed half the log, so we are
+ * limited by that.  Furthermore, the log transaction reservation subsystem
+ * tries to keep 25% of the log free, so we need to keep below that limit or we
+ * risk running out of free log space to start any new transactions.
+ *
+ * In order to keep background CIL push efficient, we will set a lower
+ * threshold at which background pushing is attempted without blocking current
+ * transaction commits.  A separate, higher bound defines when CIL pushes are
+ * enforced to ensure we stay within our maximum checkpoint size bounds, yet
+ * still give us plenty of space for aggregation on large logs.
   */
-
-#define XLOG_CIL_SPACE_LIMIT(log)      \
-       (min((log->l_logsize >> 2), (8 * 1024 * 1024)))
+#define XLOG_CIL_SPACE_LIMIT(log)      (log->l_logsize >> 3)
+#define XLOG_CIL_HARD_SPACE_LIMIT(log) (3 * (log->l_logsize >> 4))
  
  /*
   * The reservation head lsn is not made up of a cycle number and block number.
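The two limits defined in this hunk, and the way xlog_cil_push() uses the hard limit when the context lock is contended, can be sketched in plain userspace C. This is only an illustration of the arithmetic and the decision, not the kernel code: the names cil_soft_limit, cil_hard_limit and cil_contended_action are hypothetical, and the log size is passed in directly instead of being read from log->l_logsize. For a 128 MiB log the soft limit works out to 16 MiB and the hard limit to 24 MiB.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors XLOG_CIL_SPACE_LIMIT(): one eighth of the log size. */
static inline uint32_t cil_soft_limit(uint32_t logsize)
{
    return logsize >> 3;
}

/* Mirrors XLOG_CIL_HARD_SPACE_LIMIT(): three sixteenths of the log size. */
static inline uint32_t cil_hard_limit(uint32_t logsize)
{
    return 3 * (logsize >> 4);
}

/* Hypothetical names for the two outcomes when the trylock fails. */
enum cil_push_action {
    CIL_PUSH_SKIP,   /* background push gives up, commits are not blocked */
    CIL_PUSH_BLOCK   /* wait for the lock and push anyway */
};

/*
 * Decision taken when down_write_trylock() fails in the patched code:
 * a background push (push_seq == 0) backs off only while the CIL is
 * still below the hard limit; everything else blocks on the lock.
 */
static inline enum cil_push_action
cil_contended_action(uint64_t push_seq, uint32_t space_used, uint32_t logsize)
{
    if (!push_seq && space_used < cil_hard_limit(logsize))
        return CIL_PUSH_SKIP;
    return CIL_PUSH_BLOCK;
}
```

The gap between the two limits is what lets background pushes stay non-blocking in the common case while still bounding how far the CIL can grow.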
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h

index c0786d446a00b88adf144ab97f05af06679cccf7..984cdc62e30bc52da4cef907f3dd5094aa5ffd71 100644 (file)
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -55,7 +55,7 @@
  extern u8 acpi_gbl_permanent_mmap;
  
  /*
- * Globals that are publically available, allowing for
+ * Globals that are publicly available, allowing for
   * run time configuration
   */
  extern u32 acpi_dbg_level;
diff --git a/include/drm/drmP.h b/include/drm/drmP.h

index 7809d230adee3f90c9537f6c00ec66bb53c1bd27..4c9461a4f9e67b4b3e67c5bb73192aed44695079 100644 (file)
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -612,7 +612,7 @@ struct drm_gem_object {
         struct kref refcount;
  
         /** Handle count of this object. Each handle also holds a reference */
-       struct kref handlecount;
+       atomic_t handle_count; /* number of handles on this object */
  
         /** Related drm device */
         struct drm_device *dev;
@@ -808,7 +808,6 @@ struct drm_driver {
          */
         int (*gem_init_object) (struct drm_gem_object *obj);
         void (*gem_free_object) (struct drm_gem_object *obj);
-       void (*gem_free_object_unlocked) (struct drm_gem_object *obj);
  
         /* vga arb irq handler */
         void (*vgaarb_irq)(struct drm_device *dev, bool state);
@@ -1175,6 +1174,7 @@ extern int drm_release(struct inode *inode, struct file *filp);
  extern int drm_mmap(struct file *filp, struct vm_area_struct *vma);
  extern int drm_mmap_locked(struct file *filp, struct vm_area_struct *vma);
  extern void drm_vm_open_locked(struct vm_area_struct *vma);
+extern void drm_vm_close_locked(struct vm_area_struct *vma);
  extern resource_size_t drm_core_get_map_ofs(struct drm_local_map * map);
  extern resource_size_t drm_core_get_reg_ofs(struct drm_device *dev);
  extern unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait);
@@ -1455,12 +1455,11 @@ int drm_gem_init(struct drm_device *dev);
  void drm_gem_destroy(struct drm_device *dev);
  void drm_gem_object_release(struct drm_gem_object *obj);
  void drm_gem_object_free(struct kref *kref);
-void drm_gem_object_free_unlocked(struct kref *kref);
  struct drm_gem_object *drm_gem_object_alloc(struct drm_device *dev,
                                             size_t size);
  int drm_gem_object_init(struct drm_device *dev,
                         struct drm_gem_object *obj, size_t size);
-void drm_gem_object_handle_free(struct kref *kref);
+void drm_gem_object_handle_free(struct drm_gem_object *obj);
  void drm_gem_vm_open(struct vm_area_struct *vma);
  void drm_gem_vm_close(struct vm_area_struct *vma);
  int drm_gem_mmap(struct file *filp, struct vm_area_struct *vma);
@@ -1483,8 +1482,12 @@ drm_gem_object_unreference(struct drm_gem_object *obj)
  static inline void
  drm_gem_object_unreference_unlocked(struct drm_gem_object *obj)
  {
-       if (obj != NULL)
-               kref_put(&obj->refcount, drm_gem_object_free_unlocked);
+       if (obj != NULL) {
+               struct drm_device *dev = obj->dev;
+               mutex_lock(&dev->struct_mutex);
+               kref_put(&obj->refcount, drm_gem_object_free);
+               mutex_unlock(&dev->struct_mutex);
+       }
  }
  
  int drm_gem_handle_create(struct drm_file *file_priv,
@@ -1495,7 +1498,7 @@ static inline void
  drm_gem_object_handle_reference(struct drm_gem_object *obj)
  {
         drm_gem_object_reference(obj);
-       kref_get(&obj->handlecount);
+       atomic_inc(&obj->handle_count);
  }
  
  static inline void
@@ -1504,12 +1507,15 @@ drm_gem_object_handle_unreference(struct drm_gem_object *obj)
         if (obj == NULL)
                 return;
  
+       if (atomic_read(&obj->handle_count) == 0)
+               return;
         /*
          * Must bump handle count first as this may be the last
          * ref, in which case the object would disappear before we
          * checked for a name
          */
-       kref_put(&obj->handlecount, drm_gem_object_handle_free);
+       if (atomic_dec_and_test(&obj->handle_count))
+               drm_gem_object_handle_free(obj);
         drm_gem_object_unreference(obj);
  }
  
@@ -1519,12 +1525,17 @@ drm_gem_object_handle_unreference_unlocked(struct drm_gem_object *obj)
         if (obj == NULL)
                 return;
  
+       if (atomic_read(&obj->handle_count) == 0)
+               return;
+
         /*
         * Must bump handle count first as this may be the last
         * ref, in which case the object would disappear before we
         * checked for a name
         */
-       kref_put(&obj->handlecount, drm_gem_object_handle_free);
+
+       if (atomic_dec_and_test(&obj->handle_count))
+               drm_gem_object_handle_free(obj);
         drm_gem_object_unreference_unlocked(obj);
  }
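The handle-count change above replaces a kref with a bare atomic counter and an explicit "was this the last handle?" test. A minimal userspace sketch of that pattern follows, using C11 <stdatomic.h> as a stand-in for the kernel's atomic_t helpers. The struct and function names (demo_obj, demo_handle_get, demo_handle_put, demo_handle_free) are hypothetical, and the sketch leaves out the object reference counting and locking that the real functions also perform.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of drm_gem_object touched here. */
struct demo_obj {
    atomic_int handle_count;  /* number of handles on this object */
    int name_released;        /* counts calls to demo_handle_free() */
};

/* Stand-in for drm_gem_object_handle_free(): runs once per last handle. */
static void demo_handle_free(struct demo_obj *obj)
{
    obj->name_released++;
}

static void demo_handle_get(struct demo_obj *obj)
{
    atomic_fetch_add(&obj->handle_count, 1);
}

static void demo_handle_put(struct demo_obj *obj)
{
    if (obj == NULL)
        return;

    /* Same early exit as the patch: nothing to drop if no handles exist. */
    if (atomic_load(&obj->handle_count) == 0)
        return;

    /*
     * atomic_fetch_sub() returns the value before the decrement, so a
     * result of 1 means this call dropped the last handle. This plays
     * the role of atomic_dec_and_test() in the kernel code.
     */
    if (atomic_fetch_sub(&obj->handle_count, 1) == 1)
        demo_handle_free(obj);
}
```

Note that the zero check and the decrement are two separate atomic operations, so the sketch, like the pattern it mirrors, relies on the caller's wider serialization to make the pair safe.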
  
diff --git a/include/drm/drm_pciids.h b/include/drm/drm_pciids.h

index 3a9940ef728bb5d2412c4cb54d84ed746e15d870..883c1d4398996d8ba807ca5d54940cec28c7809e 100644 (file)
--- a/include/drm/drm_pciids.h
+++ b/include/drm/drm_pciids.h
@@ -85,7 +85,6 @@
         {0x1002, 0x5460, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV380|RADEON_IS_MOBILITY}, \
         {0x1002, 0x5462, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV380|RADEON_IS_MOBILITY}, \
         {0x1002, 0x5464, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV380|RADEON_IS_MOBILITY}, \
-       {0x1002, 0x5657, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV380|RADEON_NEW_MEMMAP}, \
         {0x1002, 0x5548, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_R423|RADEON_NEW_MEMMAP}, \
         {0x1002, 0x5549, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_R423|RADEON_NEW_MEMMAP}, \
         {0x1002, 0x554A, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_R423|RADEON_NEW_MEMMAP}, \
@@ -103,6 +102,7 @@
         {0x1002, 0x564F, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV410|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
         {0x1002, 0x5652, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV410|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
         {0x1002, 0x5653, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV410|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
+       {0x1002, 0x5657, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV410|RADEON_NEW_MEMMAP}, \
         {0x1002, 0x5834, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RS300|RADEON_IS_IGP}, \
         {0x1002, 0x5835, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RS300|RADEON_IS_IGP|RADEON_IS_MOBILITY}, \
         {0x1002, 0x5954, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RS480|RADEON_IS_IGP|RADEON_IS_MOBILITY|RADEON_IS_IGPGART}, \
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h

index 267a86c74e2e5cc089dfb5c72a1f1da3dbdae721..2040e6c4f1729a7de3001d5077277b8521427fb9 100644 (file)
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -246,9 +246,11 @@ struct ttm_buffer_object {
  
         atomic_t reserved;
  
-
         /**
          * Members protected by the bo::lock
+        * In addition, setting sync_obj to anything other
+        * than NULL requires bo::reserved to be held. This allows for
+        * checking NULL while reserved but not holding bo::lock.
          */
  
         void *sync_obj_arg;
diff --git a/include/linux/Kbuild b/include/linux/Kbuild

index 626b629429ff2fc3f30fcf63b25fb52211f184c0..4e8ea8c8ec1e7f6fc04ebf992294b63cb23b1bbb 100644 (file)
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -118,7 +118,6 @@ header-y += eventpoll.h
  header-y += ext2_fs.h
  header-y += fadvise.h
  header-y += falloc.h
-header-y += fanotify.h
  header-y += fb.h
  header-y += fcntl.h
  header-y += fd.h
diff --git a/include/linux/ceph/auth.h b/include/linux/ceph/auth.h

new file mode 100644 (file)

index 0000000..7fff521
--- /dev/null
+++ b/include/linux/ceph/auth.h
@@ -0,0 +1,92 @@
+#ifndef _FS_CEPH_AUTH_H
+#define _FS_CEPH_AUTH_H
+
+#include <linux/ceph/types.h>
+#include <linux/ceph/buffer.h>
+
+/*
+ * Abstract interface for communicating with the authentication module.
+ * There is some handshake that takes place between us and the monitor
+ * to acquire the necessary keys.  These are used to generate an
+ * 'authorizer' that we use when connecting to a service (mds, osd).
+ */
+
+struct ceph_auth_client;
+struct ceph_authorizer;
+
+struct ceph_auth_client_ops {
+       const char *name;
+
+       /*
+        * true if we are authenticated and can connect to
+        * services.
+        */
+       int (*is_authenticated)(struct ceph_auth_client *ac);
+
+       /*
+        * true if we should (re)authenticate, e.g., when our tickets
+        * are getting old and crusty.
+        */
+       int (*should_authenticate)(struct ceph_auth_client *ac);
+
+       /*
+        * build requests and process replies during monitor
+        * handshake.  if handle_reply returns -EAGAIN, we build
+        * another request.
+        */
+       int (*build_request)(struct ceph_auth_client *ac, void *buf, void *end);
+       int (*handle_reply)(struct ceph_auth_client *ac, int result,
+                           void *buf, void *end);
+
+       /*
+        * Create authorizer for connecting to a service, and verify
+        * the response to authenticate the service.
+        */
+       int (*create_authorizer)(struct ceph_auth_client *ac, int peer_type,
+                                struct ceph_authorizer **a,
+                                void **buf, size_t *len,
+                                void **reply_buf, size_t *reply_len);
+       int (*verify_authorizer_reply)(struct ceph_auth_client *ac,
+                                      struct ceph_authorizer *a, size_t len);
+       void (*destroy_authorizer)(struct ceph_auth_client *ac,
+                                  struct ceph_authorizer *a);
+       void (*invalidate_authorizer)(struct ceph_auth_client *ac,
+                                     int peer_type);
+
+       /* reset when we (re)connect to a monitor */
+       void (*reset)(struct ceph_auth_client *ac);
+
+       void (*destroy)(struct ceph_auth_client *ac);
+};
+
+struct ceph_auth_client {
+       u32 protocol;           /* CEPH_AUTH_* */
+       void *private;          /* for use by protocol implementation */
+       const struct ceph_auth_client_ops *ops;  /* null iff protocol==0 */
+
+       bool negotiating;       /* true if negotiating protocol */
+       const char *name;       /* entity name */
+       u64 global_id;          /* our unique id in system */
+       const char *secret;     /* our secret key */
+       unsigned want_keys;     /* which services we want */
+};
+
+extern struct ceph_auth_client *ceph_auth_init(const char *name,
+                                              const char *secret);
+extern void ceph_auth_destroy(struct ceph_auth_client *ac);
+
+extern void ceph_auth_reset(struct ceph_auth_client *ac);
+
+extern int ceph_auth_build_hello(struct ceph_auth_client *ac,
+                                void *buf, size_t len);
+extern int ceph_handle_auth_reply(struct ceph_auth_client *ac,
+                                 void *buf, size_t len,
+                                 void *reply_buf, size_t reply_len);
+extern int ceph_entity_name_encode(const char *name, void **p, void *end);
+
+extern int ceph_build_auth(struct ceph_auth_client *ac,
+                   void *msg_buf, size_t msg_len);
+
+extern int ceph_auth_is_authenticated(struct ceph_auth_client *ac);
+
+#endif
diff --git a/include/linux/ceph/buffer.h b/include/linux/ceph/buffer.h

new file mode 100644 (file)

index 0000000..58d1901
--- /dev/null
+++ b/include/linux/ceph/buffer.h
@@ -0,0 +1,39 @@
+#ifndef __FS_CEPH_BUFFER_H
+#define __FS_CEPH_BUFFER_H
+
+#include <linux/kref.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/types.h>
+#include <linux/uio.h>
+
+/*
+ * a simple reference counted buffer.
+ *
+ * use kmalloc for small sizes (<= one page), vmalloc for larger
+ * sizes.
+ */
+struct ceph_buffer {
+       struct kref kref;
+       struct kvec vec;
+       size_t alloc_len;
+       bool is_vmalloc;
+};
+
+extern struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp);
+extern void ceph_buffer_release(struct kref *kref);
+
+static inline struct ceph_buffer *ceph_buffer_get(struct ceph_buffer *b)
+{
+       kref_get(&b->kref);
+       return b;
+}
+
+static inline void ceph_buffer_put(struct ceph_buffer *b)
+{
+       kref_put(&b->kref, ceph_buffer_release);
+}
+
+extern int ceph_decode_buffer(struct ceph_buffer **b, void **p, void *end);
+
+#endif
diff --git a/include/linux/ceph/ceph_debug.h b/include/linux/ceph/ceph_debug.h

new file mode 100644 (file)

index 0000000..aa2e191
--- /dev/null
+++ b/include/linux/ceph/ceph_debug.h
@@ -0,0 +1,38 @@
+#ifndef _FS_CEPH_DEBUG_H
+#define _FS_CEPH_DEBUG_H
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#ifdef CONFIG_CEPH_LIB_PRETTYDEBUG
+
+/*
+ * wrap pr_debug to include a filename:lineno prefix on each line.
+ * this incurs some overhead (kernel size and execution time) due to
+ * the extra function call at each call site.
+ */
+
+# if defined(DEBUG) || defined(CONFIG_DYNAMIC_DEBUG)
+extern const char *ceph_file_part(const char *s, int len);
+#  define dout(fmt, ...)                                               \
+       pr_debug("%.*s %12.12s:%-4d : " fmt,                            \
+                8 - (int)sizeof(KBUILD_MODNAME), "    ",               \
+                ceph_file_part(__FILE__, sizeof(__FILE__)),            \
+                __LINE__, ##__VA_ARGS__)
+# else
+/* faux printk call just to see any compiler warnings. */
+#  define dout(fmt, ...)       do {                            \
+               if (0)                                          \
+                       printk(KERN_DEBUG fmt, ##__VA_ARGS__);  \
+       } while (0)
+# endif
+
+#else
+
+/*
+ * or, just wrap pr_debug
+ */
+# define dout(fmt, ...)        pr_debug(" " fmt, ##__VA_ARGS__)
+
+#endif
+
+#endif
diff --git a/include/linux/ceph/ceph_frag.h b/include/linux/ceph/ceph_frag.h

new file mode 100644 (file)

index 0000000..5babb8e
--- /dev/null
+++ b/include/linux/ceph/ceph_frag.h
@@ -0,0 +1,109 @@
+#ifndef FS_CEPH_FRAG_H
+#define FS_CEPH_FRAG_H
+
+/*
+ * "Frags" are a way to describe a subset of a 32-bit number space,
+ * using a mask and a value to match against that mask.  Any given frag
+ * (subset of the number space) can be partitioned into 2^n sub-frags.
+ *
+ * Frags are encoded into a 32-bit word:
+ *   8 upper bits = "bits"
+ *  24 lower bits = "value"
+ * (We could go to 5+27 bits, but who cares.)
+ *
+ * We use the _most_ significant bits of the 24 bit value.  This makes
+ * values logically sort.
+ *
+ * Unfortunately, because the "bits" field is still in the high bits, we
+ * can't sort encoded frags numerically.  However, it does allow you
+ * to feed encoded frags as values into frag_contains_value.
+ */
+static inline __u32 ceph_frag_make(__u32 b, __u32 v)
+{
+       return (b << 24) |
+               (v & (0xffffffu << (24-b)) & 0xffffffu);
+}
+static inline __u32 ceph_frag_bits(__u32 f)
+{
+       return f >> 24;
+}
+static inline __u32 ceph_frag_value(__u32 f)
+{
+       return f & 0xffffffu;
+}
+static inline __u32 ceph_frag_mask(__u32 f)
+{
+       return (0xffffffu << (24-ceph_frag_bits(f))) & 0xffffffu;
+}
+static inline __u32 ceph_frag_mask_shift(__u32 f)
+{
+       return 24 - ceph_frag_bits(f);
+}
+
+static inline int ceph_frag_contains_value(__u32 f, __u32 v)
+{
+       return (v & ceph_frag_mask(f)) == ceph_frag_value(f);
+}
+static inline int ceph_frag_contains_frag(__u32 f, __u32 sub)
+{
+       /* is sub as specific as us, and contained by us? */
+       return ceph_frag_bits(sub) >= ceph_frag_bits(f) &&
+              (ceph_frag_value(sub) & ceph_frag_mask(f)) == ceph_frag_value(f);
+}
+
+static inline __u32 ceph_frag_parent(__u32 f)
+{
+       return ceph_frag_make(ceph_frag_bits(f) - 1,
+                        ceph_frag_value(f) & (ceph_frag_mask(f) << 1));
+}
+static inline int ceph_frag_is_left_child(__u32 f)
+{
+       return ceph_frag_bits(f) > 0 &&
+               (ceph_frag_value(f) & (0x1000000 >> ceph_frag_bits(f))) == 0;
+}
+static inline int ceph_frag_is_right_child(__u32 f)
+{
+       return ceph_frag_bits(f) > 0 &&
+               (ceph_frag_value(f) & (0x1000000 >> ceph_frag_bits(f))) != 0;
+}
+static inline __u32 ceph_frag_sibling(__u32 f)
+{
+       return ceph_frag_make(ceph_frag_bits(f),
+                     ceph_frag_value(f) ^ (0x1000000 >> ceph_frag_bits(f)));
+}
+static inline __u32 ceph_frag_left_child(__u32 f)
+{
+       return ceph_frag_make(ceph_frag_bits(f)+1, ceph_frag_value(f));
+}
+static inline __u32 ceph_frag_right_child(__u32 f)
+{
+       return ceph_frag_make(ceph_frag_bits(f)+1,
+             ceph_frag_value(f) | (0x1000000 >> (1+ceph_frag_bits(f))));
+}
+static inline __u32 ceph_frag_make_child(__u32 f, int by, int i)
+{
+       int newbits = ceph_frag_bits(f) + by;
+       return ceph_frag_make(newbits,
+                        ceph_frag_value(f) | (i << (24 - newbits)));
+}
+static inline int ceph_frag_is_leftmost(__u32 f)
+{
+       return ceph_frag_value(f) == 0;
+}
+static inline int ceph_frag_is_rightmost(__u32 f)
+{
+       return ceph_frag_value(f) == ceph_frag_mask(f);
+}
+static inline __u32 ceph_frag_next(__u32 f)
+{
+       return ceph_frag_make(ceph_frag_bits(f),
+                        ceph_frag_value(f) + (0x1000000 >> ceph_frag_bits(f)));
+}
+
+/*
+ * comparator to sort frags logically, as when traversing the
+ * number space in ascending order...
+ */
+int ceph_frag_compare(__u32 a, __u32 b);
+
+#endif
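The frag encoding described in the header comment (8 upper bits of prefix length, 24 lower bits of value, with the most significant value bits in use) can be tried out with a small userspace sketch. The helpers below are re-implementations under hypothetical names (frag_make, frag_bits and so on) with uint32_t in place of __u32. They assume a prefix length of at most 24, and the right-child test is written as a nonzero check on the masked bit, since comparing the masked value with 1 would only match when the tested bit is bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a frag: top 8 bits hold the prefix length, low 24 bits the value. */
static inline uint32_t frag_make(uint32_t bits, uint32_t value)
{
    return (bits << 24) | (value & (0xffffffu << (24 - bits)) & 0xffffffu);
}

static inline uint32_t frag_bits(uint32_t f)
{
    return f >> 24;
}

static inline uint32_t frag_value(uint32_t f)
{
    return f & 0xffffffu;
}

/* Mask selecting the 'bits' most significant bits of the 24-bit value. */
static inline uint32_t frag_mask(uint32_t f)
{
    return (0xffffffu << (24 - frag_bits(f))) & 0xffffffu;
}

/* Does the 24-bit value v fall inside the subset described by f? */
static inline int frag_contains_value(uint32_t f, uint32_t v)
{
    return (v & frag_mask(f)) == frag_value(f);
}

/* A frag is a right child when its lowest significant prefix bit is set. */
static inline int frag_is_right_child(uint32_t f)
{
    return frag_bits(f) > 0 &&
           (frag_value(f) & (0x1000000u >> frag_bits(f))) != 0;
}
```

For example, frag_make(1, 0x800000) describes the upper half of the 24-bit space and encodes as 0x01800000; the value 0x900000 lies inside it and 0x100000 does not.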
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h

new file mode 100644 (file)

index 0000000..c3c74ae
--- /dev/null
+++ b/include/linux/ceph/ceph_fs.h
@@ -0,0 +1,729 @@
+/*
+ * ceph_fs.h - Ceph constants and data types to share between kernel and
+ * user space.
+ *
+ * Most types in this file are defined as little-endian, and are
+ * primarily intended to describe data structures that pass over the
+ * wire or that are stored on disk.
+ *
+ * LGPL2
+ */
+
+#ifndef CEPH_FS_H
+#define CEPH_FS_H
+
+#include "msgr.h"
+#include "rados.h"
+
+/*
+ * subprotocol versions.  when specific message types or high-level
+ * protocols change, bump the affected components.  we rev the
+ * internal cluster protocols separately from the public,
+ * client-facing protocol.
+ */
+#define CEPH_OSD_PROTOCOL     8 /* cluster internal */
+#define CEPH_MDS_PROTOCOL    12 /* cluster internal */
+#define CEPH_MON_PROTOCOL     5 /* cluster internal */
+#define CEPH_OSDC_PROTOCOL   24 /* server/client */
+#define CEPH_MDSC_PROTOCOL   32 /* server/client */
+#define CEPH_MONC_PROTOCOL   15 /* server/client */
+
+
+#define CEPH_INO_ROOT  1
+#define CEPH_INO_CEPH  2        /* hidden .ceph dir */
+
+/* arbitrary limit on max # of monitors (cluster of 3 is typical) */
+#define CEPH_MAX_MON   31
+
+
+/*
+ * feature bits
+ */
+#define CEPH_FEATURE_UID            (1<<0)
+#define CEPH_FEATURE_NOSRCADDR      (1<<1)
+#define CEPH_FEATURE_MONCLOCKCHECK  (1<<2)
+#define CEPH_FEATURE_FLOCK          (1<<3)
+
+
+/*
+ * ceph_file_layout - describe data layout for a file/inode
+ */
+struct ceph_file_layout {
+       /* file -> object mapping */
+       __le32 fl_stripe_unit;     /* stripe unit, in bytes.  must be multiple
+                                     of page size. */
+       __le32 fl_stripe_count;    /* over this many objects */
+       __le32 fl_object_size;     /* until objects are this big, then move to
+                                     new objects */
+       __le32 fl_cas_hash;        /* 0 = none; 1 = sha256 */
+
+       /* pg -> disk layout */
+       __le32 fl_object_stripe_unit;  /* for per-object parity, if any */
+
+       /* object -> pg layout */
+       __le32 fl_pg_preferred; /* preferred primary for pg (-1 for none) */
+       __le32 fl_pg_pool;      /* namespace, crush ruleset, rep level */
+} __attribute__ ((packed));
+
+#define CEPH_MIN_STRIPE_UNIT 65536
+
+int ceph_file_layout_is_valid(const struct ceph_file_layout *layout);
+
+
+/* crypto algorithms */
+#define CEPH_CRYPTO_NONE 0x0
+#define CEPH_CRYPTO_AES  0x1
+
+#define CEPH_AES_IV "cephsageyudagreg"
+
+/* security/authentication protocols */
+#define CEPH_AUTH_UNKNOWN      0x0
+#define CEPH_AUTH_NONE         0x1
+#define CEPH_AUTH_CEPHX                0x2
+
+#define CEPH_AUTH_UID_DEFAULT ((__u64) -1)
+
+
+/*********************************************
+ * message layer
+ */
+
+/*
+ * message types
+ */
+
+/* misc */
+#define CEPH_MSG_SHUTDOWN               1
+#define CEPH_MSG_PING                   2
+
+/* client <-> monitor */
+#define CEPH_MSG_MON_MAP                4
+#define CEPH_MSG_MON_GET_MAP            5
+#define CEPH_MSG_STATFS                 13
+#define CEPH_MSG_STATFS_REPLY           14
+#define CEPH_MSG_MON_SUBSCRIBE          15
+#define CEPH_MSG_MON_SUBSCRIBE_ACK      16
+#define CEPH_MSG_AUTH                  17
+#define CEPH_MSG_AUTH_REPLY            18
+
+/* client <-> mds */
+#define CEPH_MSG_MDS_MAP                21
+
+#define CEPH_MSG_CLIENT_SESSION         22
+#define CEPH_MSG_CLIENT_RECONNECT       23
+
+#define CEPH_MSG_CLIENT_REQUEST         24
+#define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
+#define CEPH_MSG_CLIENT_REPLY           26
+#define CEPH_MSG_CLIENT_CAPS            0x310
+#define CEPH_MSG_CLIENT_LEASE           0x311
+#define CEPH_MSG_CLIENT_SNAP            0x312
+#define CEPH_MSG_CLIENT_CAPRELEASE      0x313
+
+/* pool ops */
+#define CEPH_MSG_POOLOP_REPLY           48
+#define CEPH_MSG_POOLOP                 49
+
+
+/* osd */
+#define CEPH_MSG_OSD_MAP          41
+#define CEPH_MSG_OSD_OP           42
+#define CEPH_MSG_OSD_OPREPLY      43
+
+/* pool operations */
+enum {
+  POOL_OP_CREATE                       = 0x01,
+  POOL_OP_DELETE                       = 0x02,
+  POOL_OP_AUID_CHANGE                  = 0x03,
+  POOL_OP_CREATE_SNAP                  = 0x11,
+  POOL_OP_DELETE_SNAP                  = 0x12,
+  POOL_OP_CREATE_UNMANAGED_SNAP                = 0x21,
+  POOL_OP_DELETE_UNMANAGED_SNAP                = 0x22,
+};
+
+struct ceph_mon_request_header {
+       __le64 have_version;
+       __le16 session_mon;
+       __le64 session_mon_tid;
+} __attribute__ ((packed));
+
+struct ceph_mon_statfs {
+       struct ceph_mon_request_header monhdr;
+       struct ceph_fsid fsid;
+} __attribute__ ((packed));
+
+struct ceph_statfs {
+       __le64 kb, kb_used, kb_avail;
+       __le64 num_objects;
+} __attribute__ ((packed));
+
+struct ceph_mon_statfs_reply {
+       struct ceph_fsid fsid;
+       __le64 version;
+       struct ceph_statfs st;
+} __attribute__ ((packed));
+
+const char *ceph_pool_op_name(int op);
+
+struct ceph_mon_poolop {
+       struct ceph_mon_request_header monhdr;
+       struct ceph_fsid fsid;
+       __le32 pool;
+       __le32 op;
+       __le64 auid;
+       __le64 snapid;
+       __le32 name_len;
+} __attribute__ ((packed));
+
+struct ceph_mon_poolop_reply {
+       struct ceph_mon_request_header monhdr;
+       struct ceph_fsid fsid;
+       __le32 reply_code;
+       __le32 epoch;
+       char has_data;
+       char data[0];
+} __attribute__ ((packed));
+
+struct ceph_mon_unmanaged_snap {
+       __le64 snapid;
+} __attribute__ ((packed));
+
+struct ceph_osd_getmap {
+       struct ceph_mon_request_header monhdr;
+       struct ceph_fsid fsid;
+       __le32 start;
+} __attribute__ ((packed));
+
+struct ceph_mds_getmap {
+       struct ceph_mon_request_header monhdr;
+       struct ceph_fsid fsid;
+} __attribute__ ((packed));
+
+struct ceph_client_mount {
+       struct ceph_mon_request_header monhdr;
+} __attribute__ ((packed));
+
+struct ceph_mon_subscribe_item {
+       __le64 have_version;    __le64 have;
+       __u8 onetime;
+} __attribute__ ((packed));
+
+struct ceph_mon_subscribe_ack {
+       __le32 duration;         /* seconds */
+       struct ceph_fsid fsid;
+} __attribute__ ((packed));
+
+/*
+ * mds states
+ *   > 0 -> in
+ *  <= 0 -> out
+ */
+#define CEPH_MDS_STATE_DNE          0  /* down, does not exist. */
+#define CEPH_MDS_STATE_STOPPED     -1  /* down, once existed, but no subtrees.
+                                         empty log. */
+#define CEPH_MDS_STATE_BOOT        -4  /* up, boot announcement. */
+#define CEPH_MDS_STATE_STANDBY     -5  /* up, idle.  waiting for assignment. */
+#define CEPH_MDS_STATE_CREATING    -6  /* up, creating MDS instance. */
+#define CEPH_MDS_STATE_STARTING    -7  /* up, starting previously stopped mds */
+#define CEPH_MDS_STATE_STANDBY_REPLAY -8 /* up, tailing active node's journal */
+
+#define CEPH_MDS_STATE_REPLAY       8  /* up, replaying journal. */
+#define CEPH_MDS_STATE_RESOLVE      9  /* up, disambiguating distributed
+                                         operations (import, rename, etc.) */
+#define CEPH_MDS_STATE_RECONNECT    10 /* up, reconnect to clients */
+#define CEPH_MDS_STATE_REJOIN       11 /* up, rejoining distributed cache */
+#define CEPH_MDS_STATE_CLIENTREPLAY 12 /* up, replaying client operations */
+#define CEPH_MDS_STATE_ACTIVE       13 /* up, active */
+#define CEPH_MDS_STATE_STOPPING     14 /* up, but exporting metadata */
+
+extern const char *ceph_mds_state_name(int s);
+
+
+/*
+ * metadata lock types.
+ *  - these are bitmasks.. we can compose them
+ *  - they also define the lock ordering by the MDS
+ *  - a few of these are internal to the mds
+ */
+#define CEPH_LOCK_DVERSION    1
+#define CEPH_LOCK_DN          2
+#define CEPH_LOCK_ISNAP       16
+#define CEPH_LOCK_IVERSION    32    /* mds internal */
+#define CEPH_LOCK_IFILE       64
+#define CEPH_LOCK_IAUTH       128
+#define CEPH_LOCK_ILINK       256
+#define CEPH_LOCK_IDFT        512   /* dir frag tree */
+#define CEPH_LOCK_INEST       1024  /* mds internal */
+#define CEPH_LOCK_IXATTR      2048
+#define CEPH_LOCK_IFLOCK      4096  /* advisory file locks */
+#define CEPH_LOCK_INO         8192  /* immutable inode bits; not a lock */
+
+/* client_session ops */
+enum {
+       CEPH_SESSION_REQUEST_OPEN,
+       CEPH_SESSION_OPEN,
+       CEPH_SESSION_REQUEST_CLOSE,
+       CEPH_SESSION_CLOSE,
+       CEPH_SESSION_REQUEST_RENEWCAPS,
+       CEPH_SESSION_RENEWCAPS,
+       CEPH_SESSION_STALE,
+       CEPH_SESSION_RECALL_STATE,
+};
+
+extern const char *ceph_session_op_name(int op);
+
+struct ceph_mds_session_head {
+       __le32 op;
+       __le64 seq;
+       struct ceph_timespec stamp;
+       __le32 max_caps, max_leases;
+} __attribute__ ((packed));
+
+/* client_request */
+/*
+ * metadata ops.
+ *  & 0x001000 -> write op
+ *  & 0x010000 -> follow symlink (e.g. stat(), not lstat()).
+ *  & 0x100000 -> use weird ino/path trace
+ */
+#define CEPH_MDS_OP_WRITE        0x001000
+enum {
+       CEPH_MDS_OP_LOOKUP     = 0x00100,
+       CEPH_MDS_OP_GETATTR    = 0x00101,
+       CEPH_MDS_OP_LOOKUPHASH = 0x00102,
+       CEPH_MDS_OP_LOOKUPPARENT = 0x00103,
+
+       CEPH_MDS_OP_SETXATTR   = 0x01105,
+       CEPH_MDS_OP_RMXATTR    = 0x01106,
+       CEPH_MDS_OP_SETLAYOUT  = 0x01107,
+       CEPH_MDS_OP_SETATTR    = 0x01108,
+       CEPH_MDS_OP_SETFILELOCK= 0x01109,
+       CEPH_MDS_OP_GETFILELOCK= 0x00110,
+       CEPH_MDS_OP_SETDIRLAYOUT=0x0110a,
+
+       CEPH_MDS_OP_MKNOD      = 0x01201,
+       CEPH_MDS_OP_LINK       = 0x01202,
+       CEPH_MDS_OP_UNLINK     = 0x01203,
+       CEPH_MDS_OP_RENAME     = 0x01204,
+       CEPH_MDS_OP_MKDIR      = 0x01220,
+       CEPH_MDS_OP_RMDIR      = 0x01221,
+       CEPH_MDS_OP_SYMLINK    = 0x01222,
+
+       CEPH_MDS_OP_CREATE     = 0x01301,
+       CEPH_MDS_OP_OPEN       = 0x00302,
+       CEPH_MDS_OP_READDIR    = 0x00305,
+
+       CEPH_MDS_OP_LOOKUPSNAP = 0x00400,
+       CEPH_MDS_OP_MKSNAP     = 0x01400,
+       CEPH_MDS_OP_RMSNAP     = 0x01401,
+       CEPH_MDS_OP_LSSNAP     = 0x00402,
+};
+
+extern const char *ceph_mds_op_name(int op);
+
+
+#define CEPH_SETATTR_MODE   1
+#define CEPH_SETATTR_UID    2
+#define CEPH_SETATTR_GID    4
+#define CEPH_SETATTR_MTIME  8
+#define CEPH_SETATTR_ATIME 16
+#define CEPH_SETATTR_SIZE  32
+#define CEPH_SETATTR_CTIME 64
+
+union ceph_mds_request_args {
+       struct {
+               __le32 mask;                 /* CEPH_CAP_* */
+       } __attribute__ ((packed)) getattr;
+       struct {
+               __le32 mode;
+               __le32 uid;
+               __le32 gid;
+               struct ceph_timespec mtime;
+               struct ceph_timespec atime;
+               __le64 size, old_size;       /* old_size needed by truncate */
+               __le32 mask;                 /* CEPH_SETATTR_* */
+       } __attribute__ ((packed)) setattr;
+       struct {
+               __le32 frag;                 /* which dir fragment */
+               __le32 max_entries;          /* how many dentries to grab */
+               __le32 max_bytes;
+       } __attribute__ ((packed)) readdir;
+       struct {
+               __le32 mode;
+               __le32 rdev;
+       } __attribute__ ((packed)) mknod;
+       struct {
+               __le32 mode;
+       } __attribute__ ((packed)) mkdir;
+       struct {
+               __le32 flags;
+               __le32 mode;
+               __le32 stripe_unit;          /* layout for newly created file */
+               __le32 stripe_count;         /* ... */
+               __le32 object_size;
+               __le32 file_replication;
+               __le32 preferred;
+       } __attribute__ ((packed)) open;
+       struct {
+               __le32 flags;
+       } __attribute__ ((packed)) setxattr;
+       struct {
+               struct ceph_file_layout layout;
+       } __attribute__ ((packed)) setlayout;
+       struct {
+               __u8 rule; /* currently fcntl or flock */
+               __u8 type; /* shared, exclusive, remove */
+               __le64 pid; /* process id requesting the lock */
+               __le64 pid_namespace;
+               __le64 start; /* initial location to lock */
+               __le64 length; /* num bytes to lock from start */
+               __u8 wait; /* will caller wait for lock to become available? */
+       } __attribute__ ((packed)) filelock_change;
+} __attribute__ ((packed));
+
+#define CEPH_MDS_FLAG_REPLAY        1  /* this is a replayed op */
+#define CEPH_MDS_FLAG_WANT_DENTRY   2  /* want dentry in reply */
+
+struct ceph_mds_request_head {
+       __le64 oldest_client_tid;
+       __le32 mdsmap_epoch;           /* on client */
+       __le32 flags;                  /* CEPH_MDS_FLAG_* */
+       __u8 num_retry, num_fwd;       /* count retry, fwd attempts */
+       __le16 num_releases;           /* # include cap/lease release records */
+       __le32 op;                     /* mds op code */
+       __le32 caller_uid, caller_gid;
+       __le64 ino;                    /* use this ino for openc, mkdir, mknod,
+                                         etc. (if replaying) */
+       union ceph_mds_request_args args;
+} __attribute__ ((packed));
+
+/* cap/lease release record */
+struct ceph_mds_request_release {
+       __le64 ino, cap_id;            /* ino and unique cap id */
+       __le32 caps, wanted;           /* new issued, wanted */
+       __le32 seq, issue_seq, mseq;
+       __le32 dname_seq;              /* if releasing a dentry lease, a */
+       __le32 dname_len;              /* string follows. */
+} __attribute__ ((packed));
+
+/* client reply */
+struct ceph_mds_reply_head {
+       __le32 op;
+       __le32 result;
+       __le32 mdsmap_epoch;
+       __u8 safe;                     /* true if committed to disk */
+       __u8 is_dentry, is_target;     /* true if dentry, target inode records
+                                         are included with reply */
+} __attribute__ ((packed));
+
+/* one for each node split */
+struct ceph_frag_tree_split {
+       __le32 frag;                   /* this frag splits... */
+       __le32 by;                     /* ...by this many bits */
+} __attribute__ ((packed));
+
+struct ceph_frag_tree_head {
+       __le32 nsplits;                /* num ceph_frag_tree_split records */
+       struct ceph_frag_tree_split splits[];
+} __attribute__ ((packed));
+
+/* capability issue, for bundling with mds reply */
+struct ceph_mds_reply_cap {
+       __le32 caps, wanted;           /* caps issued, wanted */
+       __le64 cap_id;
+       __le32 seq, mseq;
+       __le64 realm;                  /* snap realm */
+       __u8 flags;                    /* CEPH_CAP_FLAG_* */
+} __attribute__ ((packed));
+
+#define CEPH_CAP_FLAG_AUTH  1          /* cap is issued by auth mds */
+
+/* inode record, for bundling with mds reply */
+struct ceph_mds_reply_inode {
+       __le64 ino;
+       __le64 snapid;
+       __le32 rdev;
+       __le64 version;                /* inode version */
+       __le64 xattr_version;          /* version for xattr blob */
+       struct ceph_mds_reply_cap cap; /* caps issued for this inode */
+       struct ceph_file_layout layout;
+       struct ceph_timespec ctime, mtime, atime;
+       __le32 time_warp_seq;
+       __le64 size, max_size, truncate_size;
+       __le32 truncate_seq;
+       __le32 mode, uid, gid;
+       __le32 nlink;
+       __le64 files, subdirs, rbytes, rfiles, rsubdirs;  /* dir stats */
+       struct ceph_timespec rctime;
+       struct ceph_frag_tree_head fragtree;  /* (must be at end of struct) */
+} __attribute__ ((packed));
+/* followed by frag array, then symlink string, then xattr blob */
+
+/* reply_lease follows dname, and reply_inode */
+struct ceph_mds_reply_lease {
+       __le16 mask;            /* lease type(s) */
+       __le32 duration_ms;     /* lease duration */
+       __le32 seq;
+} __attribute__ ((packed));
+
+struct ceph_mds_reply_dirfrag {
+       __le32 frag;            /* fragment */
+       __le32 auth;            /* auth mds, if this is a delegation point */
+       __le32 ndist;           /* number of mds' this is replicated on */
+       __le32 dist[];
+} __attribute__ ((packed));
+
+#define CEPH_LOCK_FCNTL    1
+#define CEPH_LOCK_FLOCK    2
+
+#define CEPH_LOCK_SHARED   1
+#define CEPH_LOCK_EXCL     2
+#define CEPH_LOCK_UNLOCK   4
+
+struct ceph_filelock {
+       __le64 start; /* file offset to start lock at */
+       __le64 length; /* num bytes to lock; 0 for all following start */
+       __le64 client; /* which client holds the lock */
+       __le64 pid; /* process id holding the lock on the client */
+       __le64 pid_namespace;
+       __u8 type; /* shared lock, exclusive lock, or unlock */
+} __attribute__ ((packed));
+
+
+/* file access modes */
+#define CEPH_FILE_MODE_PIN        0
+#define CEPH_FILE_MODE_RD         1
+#define CEPH_FILE_MODE_WR         2
+#define CEPH_FILE_MODE_RDWR       3  /* RD | WR */
+#define CEPH_FILE_MODE_LAZY       4  /* lazy io */
+#define CEPH_FILE_MODE_NUM        8  /* because these are bit fields... mostly */
+
+int ceph_flags_to_mode(int flags);
+
+
+/* capability bits */
+#define CEPH_CAP_PIN         1  /* no specific capabilities beyond the pin */
+
+/* generic cap bits */
+#define CEPH_CAP_GSHARED     1  /* client can read */
+#define CEPH_CAP_GEXCL       2  /* client can read and update */
+#define CEPH_CAP_GCACHE      4  /* (file) client can cache reads */
+#define CEPH_CAP_GRD         8  /* (file) client can read */
+#define CEPH_CAP_GWR        16  /* (file) client can write */
+#define CEPH_CAP_GBUFFER    32  /* (file) client can buffer writes */
+#define CEPH_CAP_GWREXTEND  64  /* (file) client can extend EOF */
+#define CEPH_CAP_GLAZYIO   128  /* (file) client can perform lazy io */
+
+/* per-lock shift */
+#define CEPH_CAP_SAUTH      2
+#define CEPH_CAP_SLINK      4
+#define CEPH_CAP_SXATTR     6
+#define CEPH_CAP_SFILE      8
+#define CEPH_CAP_SFLOCK    20 
+
+#define CEPH_CAP_BITS       22
+
+/* composed values */
+#define CEPH_CAP_AUTH_SHARED  (CEPH_CAP_GSHARED  << CEPH_CAP_SAUTH)
+#define CEPH_CAP_AUTH_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SAUTH)
+#define CEPH_CAP_LINK_SHARED  (CEPH_CAP_GSHARED  << CEPH_CAP_SLINK)
+#define CEPH_CAP_LINK_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SLINK)
+#define CEPH_CAP_XATTR_SHARED (CEPH_CAP_GSHARED  << CEPH_CAP_SXATTR)
+#define CEPH_CAP_XATTR_EXCL    (CEPH_CAP_GEXCL     << CEPH_CAP_SXATTR)
+#define CEPH_CAP_FILE(x)    ((x) << CEPH_CAP_SFILE)
+#define CEPH_CAP_FILE_SHARED   (CEPH_CAP_GSHARED   << CEPH_CAP_SFILE)
+#define CEPH_CAP_FILE_EXCL     (CEPH_CAP_GEXCL     << CEPH_CAP_SFILE)
+#define CEPH_CAP_FILE_CACHE    (CEPH_CAP_GCACHE    << CEPH_CAP_SFILE)
+#define CEPH_CAP_FILE_RD       (CEPH_CAP_GRD       << CEPH_CAP_SFILE)
+#define CEPH_CAP_FILE_WR       (CEPH_CAP_GWR       << CEPH_CAP_SFILE)
+#define CEPH_CAP_FILE_BUFFER   (CEPH_CAP_GBUFFER   << CEPH_CAP_SFILE)
+#define CEPH_CAP_FILE_WREXTEND (CEPH_CAP_GWREXTEND << CEPH_CAP_SFILE)
+#define CEPH_CAP_FILE_LAZYIO   (CEPH_CAP_GLAZYIO   << CEPH_CAP_SFILE)
+#define CEPH_CAP_FLOCK_SHARED  (CEPH_CAP_GSHARED   << CEPH_CAP_SFLOCK)
+#define CEPH_CAP_FLOCK_EXCL    (CEPH_CAP_GEXCL     << CEPH_CAP_SFLOCK)
+
+
+/* cap masks (for getattr) */
+#define CEPH_STAT_CAP_INODE    CEPH_CAP_PIN
+#define CEPH_STAT_CAP_TYPE     CEPH_CAP_PIN  /* mode >> 12 */
+#define CEPH_STAT_CAP_SYMLINK  CEPH_CAP_PIN
+#define CEPH_STAT_CAP_UID      CEPH_CAP_AUTH_SHARED
+#define CEPH_STAT_CAP_GID      CEPH_CAP_AUTH_SHARED
+#define CEPH_STAT_CAP_MODE     CEPH_CAP_AUTH_SHARED
+#define CEPH_STAT_CAP_NLINK    CEPH_CAP_LINK_SHARED
+#define CEPH_STAT_CAP_LAYOUT   CEPH_CAP_FILE_SHARED
+#define CEPH_STAT_CAP_MTIME    CEPH_CAP_FILE_SHARED
+#define CEPH_STAT_CAP_SIZE     CEPH_CAP_FILE_SHARED
+#define CEPH_STAT_CAP_ATIME    CEPH_CAP_FILE_SHARED  /* fixme */
+#define CEPH_STAT_CAP_XATTR    CEPH_CAP_XATTR_SHARED
+#define CEPH_STAT_CAP_INODE_ALL (CEPH_CAP_PIN |                        \
+                                CEPH_CAP_AUTH_SHARED | \
+                                CEPH_CAP_LINK_SHARED | \
+                                CEPH_CAP_FILE_SHARED | \
+                                CEPH_CAP_XATTR_SHARED)
+
+#define CEPH_CAP_ANY_SHARED (CEPH_CAP_AUTH_SHARED |                    \
+                             CEPH_CAP_LINK_SHARED |                    \
+                             CEPH_CAP_XATTR_SHARED |                   \
+                             CEPH_CAP_FILE_SHARED)
+#define CEPH_CAP_ANY_RD   (CEPH_CAP_ANY_SHARED | CEPH_CAP_FILE_RD |    \
+                          CEPH_CAP_FILE_CACHE)
+
+#define CEPH_CAP_ANY_EXCL (CEPH_CAP_AUTH_EXCL |                \
+                          CEPH_CAP_LINK_EXCL |         \
+                          CEPH_CAP_XATTR_EXCL |        \
+                          CEPH_CAP_FILE_EXCL)
+#define CEPH_CAP_ANY_FILE_WR (CEPH_CAP_FILE_WR | CEPH_CAP_FILE_BUFFER |        \
+                             CEPH_CAP_FILE_EXCL)
+#define CEPH_CAP_ANY_WR   (CEPH_CAP_ANY_EXCL | CEPH_CAP_ANY_FILE_WR)
+#define CEPH_CAP_ANY      (CEPH_CAP_ANY_RD | CEPH_CAP_ANY_EXCL | \
+                          CEPH_CAP_ANY_FILE_WR | CEPH_CAP_FILE_LAZYIO | \
+                          CEPH_CAP_PIN)
+
+#define CEPH_CAP_LOCKS (CEPH_LOCK_IFILE | CEPH_LOCK_IAUTH | CEPH_LOCK_ILINK | \
+                       CEPH_LOCK_IXATTR)
+
+int ceph_caps_for_mode(int mode);
+
+enum {
+       CEPH_CAP_OP_GRANT,         /* mds->client grant */
+       CEPH_CAP_OP_REVOKE,        /* mds->client revoke */
+       CEPH_CAP_OP_TRUNC,         /* mds->client trunc notify */
+       CEPH_CAP_OP_EXPORT,        /* mds has exported the cap */
+       CEPH_CAP_OP_IMPORT,        /* mds has imported the cap */
+       CEPH_CAP_OP_UPDATE,        /* client->mds update */
+       CEPH_CAP_OP_DROP,          /* client->mds drop cap bits */
+       CEPH_CAP_OP_FLUSH,         /* client->mds cap writeback */
+       CEPH_CAP_OP_FLUSH_ACK,     /* mds->client flushed */
+       CEPH_CAP_OP_FLUSHSNAP,     /* client->mds flush snapped metadata */
+       CEPH_CAP_OP_FLUSHSNAP_ACK, /* mds->client flushed snapped metadata */
+       CEPH_CAP_OP_RELEASE,       /* client->mds release (clean) cap */
+       CEPH_CAP_OP_RENEW,         /* client->mds renewal request */
+};
+
+extern const char *ceph_cap_op_name(int op);
+
+/*
+ * caps message, used for capability callbacks, acks, requests, etc.
+ */
+struct ceph_mds_caps {
+       __le32 op;                  /* CEPH_CAP_OP_* */
+       __le64 ino, realm;
+       __le64 cap_id;
+       __le32 seq, issue_seq;
+       __le32 caps, wanted, dirty; /* latest issued/wanted/dirty */
+       __le32 migrate_seq;
+       __le64 snap_follows;
+       __le32 snap_trace_len;
+
+       /* authlock */
+       __le32 uid, gid, mode;
+
+       /* linklock */
+       __le32 nlink;
+
+       /* xattrlock */
+       __le32 xattr_len;
+       __le64 xattr_version;
+
+       /* filelock */
+       __le64 size, max_size, truncate_size;
+       __le32 truncate_seq;
+       struct ceph_timespec mtime, atime, ctime;
+       struct ceph_file_layout layout;
+       __le32 time_warp_seq;
+} __attribute__ ((packed));
+
+/* cap release msg head */
+struct ceph_mds_cap_release {
+       __le32 num;                /* number of cap_items that follow */
+} __attribute__ ((packed));
+
+struct ceph_mds_cap_item {
+       __le64 ino;
+       __le64 cap_id;
+       __le32 migrate_seq, seq;
+} __attribute__ ((packed));
+
+#define CEPH_MDS_LEASE_REVOKE           1  /*    mds  -> client */
+#define CEPH_MDS_LEASE_RELEASE          2  /* client  -> mds    */
+#define CEPH_MDS_LEASE_RENEW            3  /* client <-> mds    */
+#define CEPH_MDS_LEASE_REVOKE_ACK       4  /* client  -> mds    */
+
+extern const char *ceph_lease_op_name(int o);
+
+/* lease msg header */
+struct ceph_mds_lease {
+       __u8 action;            /* CEPH_MDS_LEASE_* */
+       __le16 mask;            /* which lease */
+       __le64 ino;
+       __le64 first, last;     /* snap range */
+       __le32 seq;
+       __le32 duration_ms;     /* duration of renewal */
+} __attribute__ ((packed));
+/* followed by a __le32+string for dname */
+
+/* client reconnect */
+struct ceph_mds_cap_reconnect {
+       __le64 cap_id;
+       __le32 wanted;
+       __le32 issued;
+       __le64 snaprealm;
+       __le64 pathbase;        /* base ino for our path to this ino */
+       __le32 flock_len;       /* size of flock state blob, if any */
+} __attribute__ ((packed));
+/* followed by flock blob */
+
+struct ceph_mds_cap_reconnect_v1 {
+       __le64 cap_id;
+       __le32 wanted;
+       __le32 issued;
+       __le64 size;
+       struct ceph_timespec mtime, atime;
+       __le64 snaprealm;
+       __le64 pathbase;        /* base ino for our path to this ino */
+} __attribute__ ((packed));
+
+struct ceph_mds_snaprealm_reconnect {
+       __le64 ino;     /* snap realm base */
+       __le64 seq;     /* snap seq for this snap realm */
+       __le64 parent;  /* parent realm */
+} __attribute__ ((packed));
+
+/*
+ * snaps
+ */
+enum {
+       CEPH_SNAP_OP_UPDATE,  /* CREATE or DESTROY */
+       CEPH_SNAP_OP_CREATE,
+       CEPH_SNAP_OP_DESTROY,
+       CEPH_SNAP_OP_SPLIT,
+};
+
+extern const char *ceph_snap_op_name(int o);
+
+/* snap msg header */
+struct ceph_mds_snap_head {
+       __le32 op;                /* CEPH_SNAP_OP_* */
+       __le64 split;             /* ino to split off, if any */
+       __le32 num_split_inos;    /* # inos belonging to new child realm */
+       __le32 num_split_realms;  /* # child realms under new child realm */
+       __le32 trace_len;         /* size of snap trace blob */
+} __attribute__ ((packed));
+/* followed by split ino list, then split realms, then the trace blob */
+
+/*
+ * encode info about a snaprealm, as viewed by a client
+ */
+struct ceph_mds_snap_realm {
+       __le64 ino;           /* ino */
+       __le64 created;       /* snap: when created */
+       __le64 parent;        /* ino: parent realm */
+       __le64 parent_since;  /* snap: same parent since */
+       __le64 seq;           /* snap: version */
+       __le32 num_snaps;
+       __le32 num_prior_parent_snaps;
+} __attribute__ ((packed));
+/* followed by my snap list, then prior parent snap list */
+
+#endif
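The capability masks in ceph_fs.h are built by shifting a generic bit (GSHARED, GRD, GWR, ...) into the field of a particular lock (SAUTH, SFILE, ...). The sketch below copies a few of those constants into a userspace program to show the arithmetic. The `FILE_FIELD_MASK` macro and the `cap_file_bits()` helper are hypothetical and not part of the header; the field width is only inferred from the gap between the SFILE and SFLOCK shifts.

```c
#include <assert.h>

/* generic cap bits, copied from ceph_fs.h */
#define CEPH_CAP_GSHARED     1
#define CEPH_CAP_GRD         8
#define CEPH_CAP_GWR        16

/* per-lock shifts, copied from ceph_fs.h */
#define CEPH_CAP_SAUTH      2
#define CEPH_CAP_SFILE      8
#define CEPH_CAP_SFLOCK    20

/* composed values, same form as in the header */
#define CEPH_CAP_AUTH_SHARED (CEPH_CAP_GSHARED << CEPH_CAP_SAUTH)
#define CEPH_CAP_FILE_RD     (CEPH_CAP_GRD     << CEPH_CAP_SFILE)
#define CEPH_CAP_FILE_WR     (CEPH_CAP_GWR     << CEPH_CAP_SFILE)

/*
 * Hypothetical (not in the header): mask covering the file-lock field,
 * assuming it spans the bits between the SFILE and SFLOCK shifts.
 */
#define FILE_FIELD_MASK ((1u << (CEPH_CAP_SFLOCK - CEPH_CAP_SFILE)) - 1)

/* Hypothetical helper: recover the generic bits held on the file lock. */
static unsigned cap_file_bits(unsigned caps)
{
	return (caps >> CEPH_CAP_SFILE) & FILE_FIELD_MASK;
}

/* Worked values: each composed cap is the generic bit moved to its field. */
static void cap_bits_examples(void)
{
	assert(CEPH_CAP_AUTH_SHARED == 4);      /* 1 << 2 */
	assert(CEPH_CAP_FILE_RD == 2048);       /* 8 << 8 */
	assert(CEPH_CAP_FILE_WR == 4096);       /* 16 << 8 */
}
```

Calling `cap_file_bits(CEPH_CAP_FILE_RD | CEPH_CAP_AUTH_SHARED)` yields `CEPH_CAP_GRD`, since the auth bits sit below the file field and are shifted out.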
diff --git a/include/linux/ceph/ceph_hash.h b/include/linux/ceph/ceph_hash.h

new file mode 100644 (file)

index 0000000..d099c3f
--- /dev/null
+++ b/include/linux/ceph/ceph_hash.h
@@ -0,0 +1,13 @@
+#ifndef FS_CEPH_HASH_H
+#define FS_CEPH_HASH_H
+
+#define CEPH_STR_HASH_LINUX      0x1  /* linux dcache hash */
+#define CEPH_STR_HASH_RJENKINS   0x2  /* robert jenkins' */
+
+extern unsigned ceph_str_hash_linux(const char *s, unsigned len);
+extern unsigned ceph_str_hash_rjenkins(const char *s, unsigned len);
+
+extern unsigned ceph_str_hash(int type, const char *s, unsigned len);
+extern const char *ceph_str_hash_name(int type);
+
+#endif
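ceph_hash.h only declares `ceph_str_hash()` and `ceph_str_hash_name()`; the definitions are not part of this header. The sketch below shows one plausible shape for a dispatcher keyed on the `CEPH_STR_HASH_*` type. Everything here is an assumption: the two `toy_hash_*` functions are placeholders and are NOT the Linux dcache or Robert Jenkins algorithms, and the names returned by `str_hash_name()` and the `-1` fallback are guesses.

```c
#include <assert.h>
#include <string.h>

#define CEPH_STR_HASH_LINUX      0x1  /* copied from ceph_hash.h */
#define CEPH_STR_HASH_RJENKINS   0x2

/* Placeholder hash: stands in for ceph_str_hash_linux(), not the real one. */
static unsigned toy_hash_a(const char *s, unsigned len)
{
	unsigned h = 0;

	while (len--)
		h = h * 31 + (unsigned char)*s++;
	return h;
}

/* Placeholder hash: stands in for ceph_str_hash_rjenkins(), not the real one. */
static unsigned toy_hash_b(const char *s, unsigned len)
{
	unsigned h = 2166136261u;

	while (len--) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/* Assumed dispatcher shape: pick the hash by type, fall back to -1. */
static unsigned str_hash(int type, const char *s, unsigned len)
{
	switch (type) {
	case CEPH_STR_HASH_LINUX:
		return toy_hash_a(s, len);
	case CEPH_STR_HASH_RJENKINS:
		return toy_hash_b(s, len);
	default:
		return (unsigned)-1;
	}
}

/* Assumed name table; the strings are illustrative. */
static const char *str_hash_name(int type)
{
	switch (type) {
	case CEPH_STR_HASH_LINUX:
		return "linux";
	case CEPH_STR_HASH_RJENKINS:
		return "rjenkins";
	default:
		return "unknown";
	}
}
```

Keeping the type in the call lets a directory or pool record which hash it was laid out with, so both sides pick the same function.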
diff --git a/include/linux/ceph/debugfs.h b/include/linux/ceph/debugfs.h

new file mode 100644 (file)

index 0000000..2a79702
--- /dev/null
+++ b/include/linux/ceph/debugfs.h
@@ -0,0 +1,33 @@
+#ifndef _FS_CEPH_DEBUGFS_H
+#define _FS_CEPH_DEBUGFS_H
+
+#include "ceph_debug.h"
+#include "types.h"
+
+#define CEPH_DEFINE_SHOW_FUNC(name)                                    \
+static int name##_open(struct inode *inode, struct file *file)         \
+{                                                                      \
+       struct seq_file *sf;                                            \
+       int ret;                                                        \
+                                                                       \
+       ret = single_open(file, name, NULL);                            \
+       sf = file->private_data;                                        \
+       sf->private = inode->i_private;                                 \
+       return ret;                                                     \
+}                                                                      \
+                                                                       \
+static const struct file_operations name##_fops = {                    \
+       .open           = name##_open,                                  \
+       .read           = seq_read,                                     \
+       .llseek         = seq_lseek,                                    \
+       .release        = single_release,                               \
+};
+
+/* debugfs.c */
+extern int ceph_debugfs_init(void);
+extern void ceph_debugfs_cleanup(void);
+extern int ceph_debugfs_client_init(struct ceph_client *client);
+extern void ceph_debugfs_client_cleanup(struct ceph_client *client);
+
+#endif
+
diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h

new file mode 100644 (file)

index 0000000..c5b6939
--- /dev/null
+++ b/include/linux/ceph/decode.h
@@ -0,0 +1,201 @@
+#ifndef __CEPH_DECODE_H
+#define __CEPH_DECODE_H
+
+#include <asm/unaligned.h>
+#include <linux/time.h>
+
+#include "types.h"
+
+/*
+ * in all cases,
+ *   void **p     pointer to position pointer
+ *   void *end    pointer to end of buffer (last byte + 1)
+ */
+
+static inline u64 ceph_decode_64(void **p)
+{
+       u64 v = get_unaligned_le64(*p);
+       *p += sizeof(u64);
+       return v;
+}
+static inline u32 ceph_decode_32(void **p)
+{
+       u32 v = get_unaligned_le32(*p);
+       *p += sizeof(u32);
+       return v;
+}
+static inline u16 ceph_decode_16(void **p)
+{
+       u16 v = get_unaligned_le16(*p);
+       *p += sizeof(u16);
+       return v;
+}
+static inline u8 ceph_decode_8(void **p)
+{
+       u8 v = *(u8 *)*p;
+       (*p)++;
+       return v;
+}
+static inline void ceph_decode_copy(void **p, void *pv, size_t n)
+{
+       memcpy(pv, *p, n);
+       *p += n;
+}
+
+/*
+ * bounds check input.
+ */
+#define ceph_decode_need(p, end, n, bad)               \
+       do {                                            \
+               if (unlikely(*(p) + (n) > (end)))       \
+                       goto bad;                       \
+       } while (0)
+
+#define ceph_decode_64_safe(p, end, v, bad)                    \
+       do {                                                    \
+               ceph_decode_need(p, end, sizeof(u64), bad);     \
+               v = ceph_decode_64(p);                          \
+       } while (0)
+#define ceph_decode_32_safe(p, end, v, bad)                    \
+       do {                                                    \
+               ceph_decode_need(p, end, sizeof(u32), bad);     \
+               v = ceph_decode_32(p);                          \
+       } while (0)
+#define ceph_decode_16_safe(p, end, v, bad)                    \
+       do {                                                    \
+               ceph_decode_need(p, end, sizeof(u16), bad);     \
+               v = ceph_decode_16(p);                          \
+       } while (0)
+#define ceph_decode_8_safe(p, end, v, bad)                     \
+       do {                                                    \
+               ceph_decode_need(p, end, sizeof(u8), bad);      \
+               v = ceph_decode_8(p);                           \
+       } while (0)
+
+#define ceph_decode_copy_safe(p, end, pv, n, bad)              \
+       do {                                                    \
+               ceph_decode_need(p, end, n, bad);               \
+               ceph_decode_copy(p, pv, n);                     \
+       } while (0)
+
+/*
+ * struct ceph_timespec <-> struct timespec
+ */
+static inline void ceph_decode_timespec(struct timespec *ts,
+                                       const struct ceph_timespec *tv)
+{
+       ts->tv_sec = le32_to_cpu(tv->tv_sec);
+       ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
+}
+static inline void ceph_encode_timespec(struct ceph_timespec *tv,
+                                       const struct timespec *ts)
+{
+       tv->tv_sec = cpu_to_le32(ts->tv_sec);
+       tv->tv_nsec = cpu_to_le32(ts->tv_nsec);
+}
+
+/*
+ * sockaddr_storage <-> ceph_sockaddr
+ */
+static inline void ceph_encode_addr(struct ceph_entity_addr *a)
+{
+       __be16 ss_family = htons(a->in_addr.ss_family);
+       a->in_addr.ss_family = *(__u16 *)&ss_family;
+}
+static inline void ceph_decode_addr(struct ceph_entity_addr *a)
+{
+       __be16 ss_family = *(__be16 *)&a->in_addr.ss_family;
+       a->in_addr.ss_family = ntohs(ss_family);
+       WARN_ON(a->in_addr.ss_family == 512);
+}
+
+/*
+ * encoders
+ */
+static inline void ceph_encode_64(void **p, u64 v)
+{
+       put_unaligned_le64(v, (__le64 *)*p);
+       *p += sizeof(u64);
+}
+static inline void ceph_encode_32(void **p, u32 v)
+{
+       put_unaligned_le32(v, (__le32 *)*p);
+       *p += sizeof(u32);
+}
+static inline void ceph_encode_16(void **p, u16 v)
+{
+       put_unaligned_le16(v, (__le16 *)*p);
+       *p += sizeof(u16);
+}
+static inline void ceph_encode_8(void **p, u8 v)
+{
+       *(u8 *)*p = v;
+       (*p)++;
+}
+static inline void ceph_encode_copy(void **p, const void *s, int len)
+{
+       memcpy(*p, s, len);
+       *p += len;
+}
+
+/*
+ * filepath, string encoders
+ */
+static inline void ceph_encode_filepath(void **p, void *end,
+                                       u64 ino, const char *path)
+{
+       u32 len = path ? strlen(path) : 0;
+       BUG_ON(*p + 1 + sizeof(ino) + sizeof(len) + len > end);
+       ceph_encode_8(p, 1);
+       ceph_encode_64(p, ino);
+       ceph_encode_32(p, len);
+       if (len)
+               memcpy(*p, path, len);
+       *p += len;
+}
+
+static inline void ceph_encode_string(void **p, void *end,
+                                     const char *s, u32 len)
+{
+       BUG_ON(*p + sizeof(len) + len > end);
+       ceph_encode_32(p, len);
+       if (len)
+               memcpy(*p, s, len);
+       *p += len;
+}
+
+#define ceph_encode_need(p, end, n, bad)               \
+       do {                                            \
+               if (unlikely(*(p) + (n) > (end)))       \
+                       goto bad;                       \
+       } while (0)
+
+#define ceph_encode_64_safe(p, end, v, bad)                    \
+       do {                                                    \
+               ceph_encode_need(p, end, sizeof(u64), bad);     \
+               ceph_encode_64(p, v);                           \
+       } while (0)
+#define ceph_encode_32_safe(p, end, v, bad)                    \
+       do {                                                    \
+               ceph_encode_need(p, end, sizeof(u32), bad);     \
+               ceph_encode_32(p, v);                   \
+       } while (0)
+#define ceph_encode_16_safe(p, end, v, bad)                    \
+       do {                                                    \
+               ceph_encode_need(p, end, sizeof(u16), bad);     \
+               ceph_encode_16(p, v);                   \
+       } while (0)
+
+#define ceph_encode_copy_safe(p, end, pv, n, bad)              \
+       do {                                                    \
+               ceph_encode_need(p, end, n, bad);               \
+               ceph_encode_copy(p, pv, n);                     \
+       } while (0)
+#define ceph_encode_string_safe(p, end, s, n, bad)             \
+       do {                                                    \
+               ceph_encode_need(p, end, sizeof(u32) + (n), bad); \
+               ceph_encode_string(p, end, s, n);               \
+       } while (0)
+
+
+#endif
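The decode helpers in decode.h share one pattern: a cursor (`void **p`) that advances as values are read, an `end` pointer one past the buffer, and a `_safe` variant that jumps to a caller-supplied label when too few bytes remain. A userspace sketch of that pattern follows. The names `get_le32`, `decode_32`, `decode_need` and `decode_string` are stand-ins written for this example, not kernel functions, and the bounds check compares the remaining length rather than computing `*p + n`, which is a deliberate deviation to avoid pointer overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for the kernel's get_unaligned_le32(). */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Same shape as ceph_decode_32(): read a value, then advance the cursor. */
static uint32_t decode_32(const unsigned char **p)
{
	uint32_t v = get_le32(*p);

	*p += sizeof(uint32_t);
	return v;
}

/* Same role as ceph_decode_need(): bail out to 'bad' on a short buffer. */
#define decode_need(p, end, n, bad)				\
	do {							\
		if ((size_t)((end) - *(p)) < (size_t)(n))	\
			goto bad;				\
	} while (0)

/*
 * Hypothetical helper: decode a little-endian u32 length followed by
 * that many bytes into out, NUL-terminated.  Returns 0 on success and
 * -1 if the input is truncated or out is too small.
 */
static int decode_string(const unsigned char **p, const unsigned char *end,
			 char *out, size_t outsz)
{
	uint32_t len;

	assert(*p <= end);	/* cursor must start inside the buffer */
	decode_need(p, end, sizeof(uint32_t), bad);
	len = decode_32(p);
	decode_need(p, end, len, bad);
	if (len >= outsz)
		goto bad;
	memcpy(out, *p, len);
	out[len] = '\0';
	*p += len;
	return 0;
bad:
	return -1;
}
```

Because every read goes through the same cursor, a caller can chain several decodes and share a single `bad:` label for all truncation errors.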
diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h

new file mode 100644 (file)

index 0000000..f22b2e9
--- /dev/null
+++ b/include/linux/ceph/libceph.h
@@ -0,0 +1,249 @@
+#ifndef _FS_CEPH_LIBCEPH_H
+#define _FS_CEPH_LIBCEPH_H
+
+#include "ceph_debug.h"
+
+#include <asm/unaligned.h>
+#include <linux/backing-dev.h>
+#include <linux/completion.h>
+#include <linux/exportfs.h>
+#include <linux/fs.h>
+#include <linux/mempool.h>
+#include <linux/pagemap.h>
+#include <linux/wait.h>
+#include <linux/writeback.h>
+#include <linux/slab.h>
+
+#include "types.h"
+#include "messenger.h"
+#include "msgpool.h"
+#include "mon_client.h"
+#include "osd_client.h"
+#include "ceph_fs.h"
+
+/*
+ * Supported features
+ */
+#define CEPH_FEATURE_SUPPORTED_DEFAULT CEPH_FEATURE_NOSRCADDR
+#define CEPH_FEATURE_REQUIRED_DEFAULT  CEPH_FEATURE_NOSRCADDR
+
+/*
+ * mount options
+ */
+#define CEPH_OPT_FSID             (1<<0)
+#define CEPH_OPT_NOSHARE          (1<<1) /* don't share client with other sbs */
+#define CEPH_OPT_MYIP             (1<<2) /* specified my ip */
+#define CEPH_OPT_NOCRC            (1<<3) /* no data crc on writes */
+
+#define CEPH_OPT_DEFAULT   (0)
+
+#define ceph_set_opt(client, opt) \
+       (client)->options->flags |= CEPH_OPT_##opt;
+#define ceph_test_opt(client, opt) \
+       (!!((client)->options->flags & CEPH_OPT_##opt))
+
+struct ceph_options {
+       int flags;
+       struct ceph_fsid fsid;
+       struct ceph_entity_addr my_addr;
+       int mount_timeout;
+       int osd_idle_ttl;
+       int osd_timeout;
+       int osd_keepalive_timeout;
+
+       /*
+        * any type that can't be simply compared or doesn't need
+        * to be compared should go beyond this point,
+        * ceph_compare_options() should be updated accordingly
+        */
+
+       struct ceph_entity_addr *mon_addr; /* should be the first
+                                             pointer type of args */
+       int num_mon;
+       char *name;
+       char *secret;
+};
+
+/*
+ * defaults
+ */
+#define CEPH_MOUNT_TIMEOUT_DEFAULT  60
+#define CEPH_OSD_TIMEOUT_DEFAULT    60  /* seconds */
+#define CEPH_OSD_KEEPALIVE_DEFAULT  5
+#define CEPH_OSD_IDLE_TTL_DEFAULT    60
+#define CEPH_MOUNT_RSIZE_DEFAULT    (512*1024) /* readahead */
+
+#define CEPH_MSG_MAX_FRONT_LEN (16*1024*1024)
+#define CEPH_MSG_MAX_DATA_LEN  (16*1024*1024)
+
+#define CEPH_AUTH_NAME_DEFAULT   "guest"
+
+/*
+ * Delay telling the MDS we no longer want caps, in case we reopen
+ * the file.  Delay a minimum amount of time, even if we send a cap
+ * message for some other reason.  Otherwise, take the opportunity to
+ * update the mds to avoid sending another message later.
+ */
+#define CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT      5  /* cap release delay */
+#define CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT     60  /* cap release delay */
+
+#define CEPH_CAP_RELEASE_SAFETY_DEFAULT        (CEPH_CAPS_PER_RELEASE * 4)
+
+/* mount state */
+enum {
+       CEPH_MOUNT_MOUNTING,
+       CEPH_MOUNT_MOUNTED,
+       CEPH_MOUNT_UNMOUNTING,
+       CEPH_MOUNT_UNMOUNTED,
+       CEPH_MOUNT_SHUTDOWN,
+};
+
+/*
+ * subtract jiffies
+ */
+static inline unsigned long time_sub(unsigned long a, unsigned long b)
+{
+       BUG_ON(time_after(b, a));
+       return (long)a - (long)b;
+}
+
+struct ceph_mds_client;
+
+/*
+ * per client state
+ *
+ * possibly shared by multiple mount points, if they are
+ * mounting the same ceph filesystem/cluster.
+ */
+struct ceph_client {
+       struct ceph_fsid fsid;
+       bool have_fsid;
+
+       void *private;
+
+       struct ceph_options *options;
+
+       struct mutex mount_mutex;      /* serialize mount attempts */
+       wait_queue_head_t auth_wq;
+       int auth_err;
+
+       int (*extra_mon_dispatch)(struct ceph_client *, struct ceph_msg *);
+
+       u32 supported_features;
+       u32 required_features;
+
+       struct ceph_messenger *msgr;   /* messenger instance */
+       struct ceph_mon_client monc;
+       struct ceph_osd_client osdc;
+
+#ifdef CONFIG_DEBUG_FS
+       struct dentry *debugfs_dir;
+       struct dentry *debugfs_monmap;
+       struct dentry *debugfs_osdmap;
+#endif
+};
+
+
+
+/*
+ * snapshots
+ */
+
+/*
+ * A "snap context" is the set of existing snapshots when we
+ * write data.  It is used by the OSD to guide its COW behavior.
+ *
+ * The ceph_snap_context is refcounted, and attached to each dirty
+ * page, indicating which context the dirty data belonged to when it was
+ * dirtied.
+ */
+struct ceph_snap_context {
+       atomic_t nref;
+       u64 seq;
+       int num_snaps;
+       u64 snaps[];
+};
+
+static inline struct ceph_snap_context *
+ceph_get_snap_context(struct ceph_snap_context *sc)
+{
+       /*
+       printk("get_snap_context %p %d -> %d\n", sc, atomic_read(&sc->nref),
+              atomic_read(&sc->nref)+1);
+       */
+       if (sc)
+               atomic_inc(&sc->nref);
+       return sc;
+}
+
+static inline void ceph_put_snap_context(struct ceph_snap_context *sc)
+{
+       if (!sc)
+               return;
+       /*
+       printk("put_snap_context %p %d -> %d\n", sc, atomic_read(&sc->nref),
+              atomic_read(&sc->nref)-1);
+       */
+       if (atomic_dec_and_test(&sc->nref)) {
+               /*printk(" deleting snap_context %p\n", sc);*/
+               kfree(sc);
+       }
+}
+
+/*
+ * calculate the number of pages a given length and offset map onto,
+ * if we align the data.
+ */
+static inline int calc_pages_for(u64 off, u64 len)
+{
+       return ((off+len+PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT) -
+               (off >> PAGE_CACHE_SHIFT);
+}
+
+/* ceph_common.c */
+extern const char *ceph_msg_type_name(int type);
+extern int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid);
+extern struct kmem_cache *ceph_inode_cachep;
+extern struct kmem_cache *ceph_cap_cachep;
+extern struct kmem_cache *ceph_dentry_cachep;
+extern struct kmem_cache *ceph_file_cachep;
+
+extern int ceph_parse_options(struct ceph_options **popt, char *options,
+                             const char *dev_name, const char *dev_name_end,
+                             int (*parse_extra_token)(char *c, void *private),
+                             void *private);
+extern void ceph_destroy_options(struct ceph_options *opt);
+extern int ceph_compare_options(struct ceph_options *new_opt,
+                               struct ceph_client *client);
+extern struct ceph_client *ceph_create_client(struct ceph_options *opt,
+                                             void *private);
+extern u64 ceph_client_id(struct ceph_client *client);
+extern void ceph_destroy_client(struct ceph_client *client);
+extern int __ceph_open_session(struct ceph_client *client,
+                              unsigned long started);
+extern int ceph_open_session(struct ceph_client *client);
+
+/* pagevec.c */
+extern void ceph_release_page_vector(struct page **pages, int num_pages);
+
+extern struct page **ceph_get_direct_page_vector(const char __user *data,
+                                           int num_pages,
+                                           loff_t off, size_t len);
+extern void ceph_put_page_vector(struct page **pages, int num_pages);
+extern void ceph_release_page_vector(struct page **pages, int num_pages);
+extern struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags);
+extern int ceph_copy_user_to_page_vector(struct page **pages,
+                                        const char __user *data,
+                                        loff_t off, size_t len);
+extern int ceph_copy_to_page_vector(struct page **pages,
+                                   const char *data,
+                                   loff_t off, size_t len);
+extern int ceph_copy_from_page_vector(struct page **pages,
+                                   char *data,
+                                   loff_t off, size_t len);
+extern int ceph_copy_page_vector_to_user(struct page **pages, char __user *data,
+                                   loff_t off, size_t len);
+extern void ceph_zero_page_vector_range(int off, int len, struct page **pages);
+
+
+#endif /* _FS_CEPH_SUPER_H */
diff --git a/include/linux/ceph/mdsmap.h b/include/linux/ceph/mdsmap.h

new file mode 100644 (file)

index 0000000..4c5cb08
--- /dev/null
+++ b/include/linux/ceph/mdsmap.h
@@ -0,0 +1,62 @@
+#ifndef _FS_CEPH_MDSMAP_H
+#define _FS_CEPH_MDSMAP_H
+
+#include "types.h"
+
+/*
+ * mds map - describe servers in the mds cluster.
+ *
+ * we limit fields to those the client actually cares about
+ */
+struct ceph_mds_info {
+       u64 global_id;
+       struct ceph_entity_addr addr;
+       s32 state;
+       int num_export_targets;
+       bool laggy;
+       u32 *export_targets;
+};
+
+struct ceph_mdsmap {
+       u32 m_epoch, m_client_epoch, m_last_failure;
+       u32 m_root;
+       u32 m_session_timeout;          /* seconds */
+       u32 m_session_autoclose;        /* seconds */
+       u64 m_max_file_size;
+       u32 m_max_mds;                  /* size of m_info array */
+       struct ceph_mds_info *m_info;
+
+       /* which object pools file data can be stored in */
+       int m_num_data_pg_pools;
+       u32 *m_data_pg_pools;
+       u32 m_cas_pg_pool;
+};
+
+static inline struct ceph_entity_addr *
+ceph_mdsmap_get_addr(struct ceph_mdsmap *m, int w)
+{
+       if (w >= m->m_max_mds)
+               return NULL;
+       return &m->m_info[w].addr;
+}
+
+static inline int ceph_mdsmap_get_state(struct ceph_mdsmap *m, int w)
+{
+       BUG_ON(w < 0);
+       if (w >= m->m_max_mds)
+               return CEPH_MDS_STATE_DNE;
+       return m->m_info[w].state;
+}
+
+static inline bool ceph_mdsmap_is_laggy(struct ceph_mdsmap *m, int w)
+{
+       if (w >= 0 && w < m->m_max_mds)
+               return m->m_info[w].laggy;
+       return false;
+}
+
+extern int ceph_mdsmap_get_random_mds(struct ceph_mdsmap *m);
+extern struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end);
+extern void ceph_mdsmap_destroy(struct ceph_mdsmap *m);
+
+#endif
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h

new file mode 100644 (file)

index 0000000..5956d62
--- /dev/null
+++ b/include/linux/ceph/messenger.h
@@ -0,0 +1,261 @@
+#ifndef __FS_CEPH_MESSENGER_H
+#define __FS_CEPH_MESSENGER_H
+
+#include <linux/kref.h>
+#include <linux/mutex.h>
+#include <linux/net.h>
+#include <linux/radix-tree.h>
+#include <linux/uio.h>
+#include <linux/version.h>
+#include <linux/workqueue.h>
+
+#include "types.h"
+#include "buffer.h"
+
+struct ceph_msg;
+struct ceph_connection;
+
+extern struct workqueue_struct *ceph_msgr_wq;       /* receive work queue */
+
+/*
+ * Ceph defines these callbacks for handling connection events.
+ */
+struct ceph_connection_operations {
+       struct ceph_connection *(*get)(struct ceph_connection *);
+       void (*put)(struct ceph_connection *);
+
+       /* handle an incoming message. */
+       void (*dispatch) (struct ceph_connection *con, struct ceph_msg *m);
+
+       /* authorize an outgoing connection */
+       int (*get_authorizer) (struct ceph_connection *con,
+                              void **buf, int *len, int *proto,
+                              void **reply_buf, int *reply_len, int force_new);
+       int (*verify_authorizer_reply) (struct ceph_connection *con, int len);
+       int (*invalidate_authorizer)(struct ceph_connection *con);
+
+       /* protocol version mismatch */
+       void (*bad_proto) (struct ceph_connection *con);
+
+       /* there was some error on the socket (disconnect, whatever) */
+       void (*fault) (struct ceph_connection *con);
+
+       /* a remote host has terminated a message exchange session, and messages
+        * we sent (or they tried to send us) may be lost. */
+       void (*peer_reset) (struct ceph_connection *con);
+
+       struct ceph_msg * (*alloc_msg) (struct ceph_connection *con,
+                                       struct ceph_msg_header *hdr,
+                                       int *skip);
+};
+
+/* use format string %s%d */
+#define ENTITY_NAME(n) ceph_entity_type_name((n).type), le64_to_cpu((n).num)
+
+struct ceph_messenger {
+       struct ceph_entity_inst inst;    /* my name+address */
+       struct ceph_entity_addr my_enc_addr;
+       struct page *zero_page;          /* used in certain error cases */
+
+       bool nocrc;
+
+       /*
+        * the global_seq counts connections i (attempt to) initiate
+        * in order to disambiguate certain connect race conditions.
+        */
+       u32 global_seq;
+       spinlock_t global_seq_lock;
+
+       u32 supported_features;
+       u32 required_features;
+};
+
+/*
+ * a single message.  it contains a header (src, dest, message type, etc.),
+ * footer (crc values, mainly), a "front" message body, and possibly a
+ * data payload (stored in some number of pages).
+ */
+struct ceph_msg {
+       struct ceph_msg_header hdr;     /* header */
+       struct ceph_msg_footer footer;  /* footer */
+       struct kvec front;              /* unaligned blobs of message */
+       struct ceph_buffer *middle;
+       struct page **pages;            /* data payload.  NOT OWNER. */
+       unsigned nr_pages;              /* size of page array */
+       struct ceph_pagelist *pagelist; /* instead of pages */
+       struct list_head list_head;
+       struct kref kref;
+       struct bio  *bio;               /* instead of pages/pagelist */
+       struct bio  *bio_iter;          /* bio iterator */
+       int bio_seg;                    /* current bio segment */
+       struct ceph_pagelist *trail;    /* the trailing part of the data */
+       bool front_is_vmalloc;
+       bool more_to_follow;
+       bool needs_out_seq;
+       int front_max;
+
+       struct ceph_msgpool *pool;
+};
+
+struct ceph_msg_pos {
+       int page, page_pos;  /* which page; offset in page */
+       int data_pos;        /* offset in data payload */
+       int did_page_crc;    /* true if we've calculated crc for current page */
+};
+
+/* ceph connection fault delay defaults, for exponential backoff */
+#define BASE_DELAY_INTERVAL    (HZ/2)
+#define MAX_DELAY_INTERVAL     (5 * 60 * HZ)
+
+/*
+ * ceph_connection state bit flags
+ *
+ * QUEUED and BUSY are used together to ensure that only a single
+ * thread is currently opening, reading or writing data to the socket.
+ */
+#define LOSSYTX         0  /* we can close channel or drop messages on errors */
+#define CONNECTING     1
+#define NEGOTIATING    2
+#define KEEPALIVE_PENDING      3
+#define WRITE_PENDING  4  /* we have data ready to send */
+#define QUEUED          5  /* there is work queued on this connection */
+#define BUSY            6  /* work is being done */
+#define STANDBY                8  /* no outgoing messages, socket closed.  we keep
+                           * the ceph_connection around to maintain shared
+                           * state with the peer. */
+#define CLOSED         10 /* we've closed the connection */
+#define SOCK_CLOSED    11 /* socket state changed to closed */
+#define OPENING         13 /* open connection w/ (possibly new) peer */
+#define DEAD            14 /* dead, about to kfree */
+
+/*
+ * A single connection with another host.
+ *
+ * We maintain a queue of outgoing messages, and some session state to
+ * ensure that we can preserve the lossless, ordered delivery of
+ * messages in the case of a TCP disconnect.
+ */
+struct ceph_connection {
+       void *private;
+       atomic_t nref;
+
+       const struct ceph_connection_operations *ops;
+
+       struct ceph_messenger *msgr;
+       struct socket *sock;
+       unsigned long state;    /* connection state (see flags above) */
+       const char *error_msg;  /* error message, if any */
+
+       struct ceph_entity_addr peer_addr; /* peer address */
+       struct ceph_entity_name peer_name; /* peer name */
+       struct ceph_entity_addr peer_addr_for_me;
+       unsigned peer_features;
+       u32 connect_seq;      /* identify the most recent connection
+                                attempt for this connection, client */
+       u32 peer_global_seq;  /* peer's global seq for this connection */
+
+       int auth_retry;       /* true if we need a newer authorizer */
+       void *auth_reply_buf;   /* where to put the authorizer reply */
+       int auth_reply_buf_len;
+
+       struct mutex mutex;
+
+       /* out queue */
+       struct list_head out_queue;
+       struct list_head out_sent;   /* sending or sent but unacked */
+       u64 out_seq;                 /* last message queued for send */
+       bool out_keepalive_pending;
+
+       u64 in_seq, in_seq_acked;  /* last message received, acked */
+
+       /* connection negotiation temps */
+       char in_banner[CEPH_BANNER_MAX_LEN];
+       union {
+               struct {  /* outgoing connection */
+                       struct ceph_msg_connect out_connect;
+                       struct ceph_msg_connect_reply in_reply;
+               };
+               struct {  /* incoming */
+                       struct ceph_msg_connect in_connect;
+                       struct ceph_msg_connect_reply out_reply;
+               };
+       };
+       struct ceph_entity_addr actual_peer_addr;
+
+       /* message out temps */
+       struct ceph_msg *out_msg;        /* sending message (== tail of
+                                           out_sent) */
+       bool out_msg_done;
+       struct ceph_msg_pos out_msg_pos;
+
+       struct kvec out_kvec[8],         /* sending header/footer data */
+               *out_kvec_cur;
+       int out_kvec_left;   /* kvec's left in out_kvec */
+       int out_skip;        /* skip this many bytes */
+       int out_kvec_bytes;  /* total bytes left */
+       bool out_kvec_is_msg; /* kvec refers to out_msg */
+       int out_more;        /* there is more data after the kvecs */
+       __le64 out_temp_ack; /* for writing an ack */
+
+       /* message in temps */
+       struct ceph_msg_header in_hdr;
+       struct ceph_msg *in_msg;
+       struct ceph_msg_pos in_msg_pos;
+       u32 in_front_crc, in_middle_crc, in_data_crc;  /* calculated crc */
+
+       char in_tag;         /* protocol control byte */
+       int in_base_pos;     /* bytes read */
+       __le64 in_temp_ack;  /* for reading an ack */
+
+       struct delayed_work work;           /* send|recv work */
+       unsigned long       delay;          /* current delay interval */
+};
+
+
+extern const char *ceph_pr_addr(const struct sockaddr_storage *ss);
+extern int ceph_parse_ips(const char *c, const char *end,
+                         struct ceph_entity_addr *addr,
+                         int max_count, int *count);
+
+
+extern int ceph_msgr_init(void);
+extern void ceph_msgr_exit(void);
+extern void ceph_msgr_flush(void);
+
+extern struct ceph_messenger *ceph_messenger_create(
+       struct ceph_entity_addr *myaddr,
+       u32 features, u32 required);
+extern void ceph_messenger_destroy(struct ceph_messenger *);
+
+extern void ceph_con_init(struct ceph_messenger *msgr,
+                         struct ceph_connection *con);
+extern void ceph_con_open(struct ceph_connection *con,
+                         struct ceph_entity_addr *addr);
+extern bool ceph_con_opened(struct ceph_connection *con);
+extern void ceph_con_close(struct ceph_connection *con);
+extern void ceph_con_send(struct ceph_connection *con, struct ceph_msg *msg);
+extern void ceph_con_revoke(struct ceph_connection *con, struct ceph_msg *msg);
+extern void ceph_con_revoke_message(struct ceph_connection *con,
+                                 struct ceph_msg *msg);
+extern void ceph_con_keepalive(struct ceph_connection *con);
+extern struct ceph_connection *ceph_con_get(struct ceph_connection *con);
+extern void ceph_con_put(struct ceph_connection *con);
+
+extern struct ceph_msg *ceph_msg_new(int type, int front_len, gfp_t flags);
+extern void ceph_msg_kfree(struct ceph_msg *m);
+
+
+static inline struct ceph_msg *ceph_msg_get(struct ceph_msg *msg)
+{
+       kref_get(&msg->kref);
+       return msg;
+}
+extern void ceph_msg_last_put(struct kref *kref);
+static inline void ceph_msg_put(struct ceph_msg *msg)
+{
+       kref_put(&msg->kref, ceph_msg_last_put);
+}
+
+extern void ceph_msg_dump(struct ceph_msg *msg);
+
+#endif
diff --git a/include/linux/ceph/mon_client.h b/include/linux/ceph/mon_client.h

new file mode 100644 (file)

index 0000000..545f859
--- /dev/null
+++ b/include/linux/ceph/mon_client.h
@@ -0,0 +1,122 @@
+#ifndef _FS_CEPH_MON_CLIENT_H
+#define _FS_CEPH_MON_CLIENT_H
+
+#include <linux/completion.h>
+#include <linux/kref.h>
+#include <linux/rbtree.h>
+
+#include "messenger.h"
+
+struct ceph_client;
+struct ceph_mount_args;
+struct ceph_auth_client;
+
+/*
+ * The monitor map enumerates the set of all monitors.
+ */
+struct ceph_monmap {
+       struct ceph_fsid fsid;
+       u32 epoch;
+       u32 num_mon;
+       struct ceph_entity_inst mon_inst[0];
+};
+
+struct ceph_mon_client;
+struct ceph_mon_generic_request;
+
+
+/*
+ * Generic mechanism for resending monitor requests.
+ */
+typedef void (*ceph_monc_request_func_t)(struct ceph_mon_client *monc,
+                                        int newmon);
+
+/* a pending monitor request */
+struct ceph_mon_request {
+       struct ceph_mon_client *monc;
+       struct delayed_work delayed_work;
+       unsigned long delay;
+       ceph_monc_request_func_t do_request;
+};
+
+/*
+ * ceph_mon_generic_request is being used for the statfs and poolop requests
+ * which are being done a bit differently because we need to get data back
+ * to the caller
+ */
+struct ceph_mon_generic_request {
+       struct kref kref;
+       u64 tid;
+       struct rb_node node;
+       int result;
+       void *buf;
+       int buf_len;
+       struct completion completion;
+       struct ceph_msg *request;  /* original request */
+       struct ceph_msg *reply;    /* and reply */
+};
+
+struct ceph_mon_client {
+       struct ceph_client *client;
+       struct ceph_monmap *monmap;
+
+       struct mutex mutex;
+       struct delayed_work delayed_work;
+
+       struct ceph_auth_client *auth;
+       struct ceph_msg *m_auth, *m_auth_reply, *m_subscribe, *m_subscribe_ack;
+       int pending_auth;
+
+       bool hunting;
+       int cur_mon;                       /* last monitor i contacted */
+       unsigned long sub_sent, sub_renew_after;
+       struct ceph_connection *con;
+       bool have_fsid;
+
+       /* pending generic requests */
+       struct rb_root generic_request_tree;
+       int num_generic_requests;
+       u64 last_tid;
+
+       /* mds/osd map */
+       int want_mdsmap;
+       int want_next_osdmap; /* 1 = want, 2 = want+asked */
+       u32 have_osdmap, have_mdsmap;
+
+#ifdef CONFIG_DEBUG_FS
+       struct dentry *debugfs_file;
+#endif
+};
+
+extern struct ceph_monmap *ceph_monmap_decode(void *p, void *end);
+extern int ceph_monmap_contains(struct ceph_monmap *m,
+                               struct ceph_entity_addr *addr);
+
+extern int ceph_monc_init(struct ceph_mon_client *monc, struct ceph_client *cl);
+extern void ceph_monc_stop(struct ceph_mon_client *monc);
+
+/*
+ * The model here is to indicate that we need a new map of at least
+ * epoch @want, and also call in when we receive a map.  We will
+ * periodically rerequest the map from the monitor cluster until we
+ * get what we want.
+ */
+extern int ceph_monc_got_mdsmap(struct ceph_mon_client *monc, u32 have);
+extern int ceph_monc_got_osdmap(struct ceph_mon_client *monc, u32 have);
+
+extern void ceph_monc_request_next_osdmap(struct ceph_mon_client *monc);
+
+extern int ceph_monc_do_statfs(struct ceph_mon_client *monc,
+                              struct ceph_statfs *buf);
+
+extern int ceph_monc_open_session(struct ceph_mon_client *monc);
+
+extern int ceph_monc_validate_auth(struct ceph_mon_client *monc);
+
+extern int ceph_monc_create_snapid(struct ceph_mon_client *monc,
+                                  u32 pool, u64 *snapid);
+
+extern int ceph_monc_delete_snapid(struct ceph_mon_client *monc,
+                                  u32 pool, u64 snapid);
+
+#endif
diff --git a/include/linux/ceph/msgpool.h b/include/linux/ceph/msgpool.h

new file mode 100644 (file)

index 0000000..a362605
--- /dev/null
+++ b/include/linux/ceph/msgpool.h
@@ -0,0 +1,25 @@
+#ifndef _FS_CEPH_MSGPOOL
+#define _FS_CEPH_MSGPOOL
+
+#include <linux/mempool.h>
+#include "messenger.h"
+
+/*
+ * we use memory pools for preallocating messages we may receive, to
+ * avoid unexpected OOM conditions.
+ */
+struct ceph_msgpool {
+       const char *name;
+       mempool_t *pool;
+       int front_len;          /* preallocated payload size */
+};
+
+extern int ceph_msgpool_init(struct ceph_msgpool *pool,
+                            int front_len, int size, bool blocking,
+                            const char *name);
+extern void ceph_msgpool_destroy(struct ceph_msgpool *pool);
+extern struct ceph_msg *ceph_msgpool_get(struct ceph_msgpool *,
+                                        int front_len);
+extern void ceph_msgpool_put(struct ceph_msgpool *, struct ceph_msg *);
+
+#endif
diff --git a/include/linux/ceph/msgr.h b/include/linux/ceph/msgr.h

new file mode 100644 (file)

index 0000000..680d3d6
--- /dev/null
+++ b/include/linux/ceph/msgr.h
@@ -0,0 +1,175 @@
+#ifndef CEPH_MSGR_H
+#define CEPH_MSGR_H
+
+/*
+ * Data types for message passing layer used by Ceph.
+ */
+
+#define CEPH_MON_PORT    6789  /* default monitor port */
+
+/*
+ * client-side processes will try to bind to ports in this
+ * range, simply for the benefit of tools like nmap or wireshark
+ * that would like to identify the protocol.
+ */
+#define CEPH_PORT_FIRST  6789
+#define CEPH_PORT_START  6800  /* non-monitors start here */
+#define CEPH_PORT_LAST   6900
+
+/*
+ * tcp connection banner.  include a protocol version, and adjust
+ * whenever the wire protocol changes.  try to keep this string length
+ * constant.
+ */
+#define CEPH_BANNER "ceph v027"
+#define CEPH_BANNER_MAX_LEN 30
+
+
+/*
+ * Rollover-safe type and comparator for 32-bit sequence numbers.
+ * Comparator returns a negative value, zero, or a positive value.
+ */
+typedef __u32 ceph_seq_t;
+
+static inline __s32 ceph_seq_cmp(__u32 a, __u32 b)
+{
+       return (__s32)a - (__s32)b;
+}
+
+
+/*
+ * entity_name -- logical name for a process participating in the
+ * network, e.g. 'mds0' or 'osd3'.
+ */
+struct ceph_entity_name {
+       __u8 type;      /* CEPH_ENTITY_TYPE_* */
+       __le64 num;
+} __attribute__ ((packed));
+
+#define CEPH_ENTITY_TYPE_MON    0x01
+#define CEPH_ENTITY_TYPE_MDS    0x02
+#define CEPH_ENTITY_TYPE_OSD    0x04
+#define CEPH_ENTITY_TYPE_CLIENT 0x08
+#define CEPH_ENTITY_TYPE_AUTH   0x20
+
+#define CEPH_ENTITY_TYPE_ANY    0xFF
+
+extern const char *ceph_entity_type_name(int type);
+
+/*
+ * entity_addr -- network address
+ */
+struct ceph_entity_addr {
+       __le32 type;
+       __le32 nonce;  /* unique id for process (e.g. pid) */
+       struct sockaddr_storage in_addr;
+} __attribute__ ((packed));
+
+struct ceph_entity_inst {
+       struct ceph_entity_name name;
+       struct ceph_entity_addr addr;
+} __attribute__ ((packed));
+
+
+/* used by message exchange protocol */
+#define CEPH_MSGR_TAG_READY         1  /* server->client: ready for messages */
+#define CEPH_MSGR_TAG_RESETSESSION  2  /* server->client: reset, try again */
+#define CEPH_MSGR_TAG_WAIT          3  /* server->client: wait for racing
+                                         incoming connection */
+#define CEPH_MSGR_TAG_RETRY_SESSION 4  /* server->client + cseq: try again
+                                         with higher cseq */
+#define CEPH_MSGR_TAG_RETRY_GLOBAL  5  /* server->client + gseq: try again
+                                         with higher gseq */
+#define CEPH_MSGR_TAG_CLOSE         6  /* closing pipe */
+#define CEPH_MSGR_TAG_MSG           7  /* message */
+#define CEPH_MSGR_TAG_ACK           8  /* message ack */
+#define CEPH_MSGR_TAG_KEEPALIVE     9  /* just a keepalive byte! */
+#define CEPH_MSGR_TAG_BADPROTOVER  10  /* bad protocol version */
+#define CEPH_MSGR_TAG_BADAUTHORIZER 11 /* bad authorizer */
+#define CEPH_MSGR_TAG_FEATURES      12 /* insufficient features */
+
+
+/*
+ * connection negotiation
+ */
+struct ceph_msg_connect {
+       __le64 features;     /* supported feature bits */
+       __le32 host_type;    /* CEPH_ENTITY_TYPE_* */
+       __le32 global_seq;   /* count connections initiated by this host */
+       __le32 connect_seq;  /* count connections initiated in this session */
+       __le32 protocol_version;
+       __le32 authorizer_protocol;
+       __le32 authorizer_len;
+       __u8  flags;         /* CEPH_MSG_CONNECT_* */
+} __attribute__ ((packed));
+
+struct ceph_msg_connect_reply {
+       __u8 tag;
+       __le64 features;     /* feature bits for this session */
+       __le32 global_seq;
+       __le32 connect_seq;
+       __le32 protocol_version;
+       __le32 authorizer_len;
+       __u8 flags;
+} __attribute__ ((packed));
+
+#define CEPH_MSG_CONNECT_LOSSY  1  /* messages i send may be safely dropped */
+
+
+/*
+ * message header
+ */
+struct ceph_msg_header_old {
+       __le64 seq;       /* message seq# for this session */
+       __le64 tid;       /* transaction id */
+       __le16 type;      /* message type */
+       __le16 priority;  /* priority.  higher value == higher priority */
+       __le16 version;   /* version of message encoding */
+
+       __le32 front_len; /* bytes in main payload */
+       __le32 middle_len;/* bytes in middle payload */
+       __le32 data_len;  /* bytes of data payload */
+       __le16 data_off;  /* sender: include full offset;
+                            receiver: mask against ~PAGE_MASK */
+
+       struct ceph_entity_inst src, orig_src;
+       __le32 reserved;
+       __le32 crc;       /* header crc32c */
+} __attribute__ ((packed));
+
+struct ceph_msg_header {
+       __le64 seq;       /* message seq# for this session */
+       __le64 tid;       /* transaction id */
+       __le16 type;      /* message type */
+       __le16 priority;  /* priority.  higher value == higher priority */
+       __le16 version;   /* version of message encoding */
+
+       __le32 front_len; /* bytes in main payload */
+       __le32 middle_len;/* bytes in middle payload */
+       __le32 data_len;  /* bytes of data payload */
+       __le16 data_off;  /* sender: include full offset;
+                            receiver: mask against ~PAGE_MASK */
+
+       struct ceph_entity_name src;
+       __le32 reserved;
+       __le32 crc;       /* header crc32c */
+} __attribute__ ((packed));
+
+#define CEPH_MSG_PRIO_LOW     64
+#define CEPH_MSG_PRIO_DEFAULT 127
+#define CEPH_MSG_PRIO_HIGH    196
+#define CEPH_MSG_PRIO_HIGHEST 255
+
+/*
+ * follows data payload
+ */
+struct ceph_msg_footer {
+       __le32 front_crc, middle_crc, data_crc;
+       __u8 flags;
+} __attribute__ ((packed));
+
+#define CEPH_MSG_FOOTER_COMPLETE  (1<<0)   /* msg wasn't aborted */
+#define CEPH_MSG_FOOTER_NOCRC     (1<<1)   /* no data crc */
+
+
+#endif
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h

new file mode 100644 (file)

index 0000000..6c91fb0
--- /dev/null
+++ b/include/linux/ceph/osd_client.h
@@ -0,0 +1,234 @@
+#ifndef _FS_CEPH_OSD_CLIENT_H
+#define _FS_CEPH_OSD_CLIENT_H
+
+#include <linux/completion.h>
+#include <linux/kref.h>
+#include <linux/mempool.h>
+#include <linux/rbtree.h>
+
+#include "types.h"
+#include "osdmap.h"
+#include "messenger.h"
+
+struct ceph_msg;
+struct ceph_snap_context;
+struct ceph_osd_request;
+struct ceph_osd_client;
+struct ceph_authorizer;
+struct ceph_pagelist;
+
+/*
+ * completion callback for async writepages
+ */
+typedef void (*ceph_osdc_callback_t)(struct ceph_osd_request *,
+                                    struct ceph_msg *);
+
+/* a given osd we're communicating with */
+struct ceph_osd {
+       atomic_t o_ref;
+       struct ceph_osd_client *o_osdc;
+       int o_osd;
+       int o_incarnation;
+       struct rb_node o_node;
+       struct ceph_connection o_con;
+       struct list_head o_requests;
+       struct list_head o_osd_lru;
+       struct ceph_authorizer *o_authorizer;
+       void *o_authorizer_buf, *o_authorizer_reply_buf;
+       size_t o_authorizer_buf_len, o_authorizer_reply_buf_len;
+       unsigned long lru_ttl;
+       int o_marked_for_keepalive;
+       struct list_head o_keepalive_item;
+};
+
+/* an in-flight request */
+struct ceph_osd_request {
+       u64             r_tid;              /* unique for this client */
+       struct rb_node  r_node;
+       struct list_head r_req_lru_item;
+       struct list_head r_osd_item;
+       struct ceph_osd *r_osd;
+       struct ceph_pg   r_pgid;
+       int              r_pg_osds[CEPH_PG_MAX_SIZE];
+       int              r_num_pg_osds;
+
+       struct ceph_connection *r_con_filling_msg;
+
+       struct ceph_msg  *r_request, *r_reply;
+       int               r_result;
+       int               r_flags;     /* any additional flags for the osd */
+       u32               r_sent;      /* >0 if r_request is sending/sent */
+       int               r_got_reply;
+
+       struct ceph_osd_client *r_osdc;
+       struct kref       r_kref;
+       bool              r_mempool;
+       struct completion r_completion, r_safe_completion;
+       ceph_osdc_callback_t r_callback, r_safe_callback;
+       struct ceph_eversion r_reassert_version;
+       struct list_head  r_unsafe_item;
+
+       struct inode *r_inode;                /* for use by callbacks */
+       void *r_priv;                         /* ditto */
+
+       char              r_oid[40];          /* object name */
+       int               r_oid_len;
+       unsigned long     r_stamp;            /* send OR check time */
+       bool              r_resend;           /* msg send failed, needs retry */
+
+       struct ceph_file_layout r_file_layout;
+       struct ceph_snap_context *r_snapc;    /* snap context for writes */
+       unsigned          r_num_pages;        /* size of page array (follows) */
+       struct page     **r_pages;            /* pages for data payload */
+       int               r_pages_from_pool;
+       int               r_own_pages;        /* if true, i own page list */
+#ifdef CONFIG_BLOCK
+       struct bio       *r_bio;              /* instead of pages */
+#endif
+
+       struct ceph_pagelist *r_trail;        /* trailing part of the data */
+};
+
+struct ceph_osd_client {
+       struct ceph_client     *client;
+
+       struct ceph_osdmap     *osdmap;       /* current map */
+       struct rw_semaphore    map_sem;
+       struct completion      map_waiters;
+       u64                    last_requested_map;
+
+       struct mutex           request_mutex;
+       struct rb_root         osds;          /* osds */
+       struct list_head       osd_lru;       /* idle osds */
+       u64                    timeout_tid;   /* tid of timeout triggering rq */
+       u64                    last_tid;      /* tid of last request */
+       struct rb_root         requests;      /* pending requests */
+       struct list_head       req_lru;       /* pending requests lru */
+       int                    num_requests;
+       struct delayed_work    timeout_work;
+       struct delayed_work    osds_timeout_work;
+#ifdef CONFIG_DEBUG_FS
+       struct dentry          *debugfs_file;
+#endif
+
+       mempool_t              *req_mempool;
+
+       struct ceph_msgpool     msgpool_op;
+       struct ceph_msgpool     msgpool_op_reply;
+};
+
+struct ceph_osd_req_op {
+       u16 op;           /* CEPH_OSD_OP_* */
+       u32 flags;        /* CEPH_OSD_FLAG_* */
+       union {
+               struct {
+                       u64 offset, length;
+                       u64 truncate_size;
+                       u32 truncate_seq;
+               } extent;
+               struct {
+                       const char *name;
+                       u32 name_len;
+                       const char  *val;
+                       u32 value_len;
+                       __u8 cmp_op;       /* CEPH_OSD_CMPXATTR_OP_* */
+                       __u8 cmp_mode;     /* CEPH_OSD_CMPXATTR_MODE_* */
+               } xattr;
+               struct {
+                       const char *class_name;
+                       __u8 class_len;
+                       const char *method_name;
+                       __u8 method_len;
+                       __u8 argc;
+                       const char *indata;
+                       u32 indata_len;
+               } cls;
+               struct {
+                       u64 cookie, count;
+               } pgls;
+               struct {
+                       u64 snapid;
+               } snap;
+       };
+       u32 payload_len;
+};
+
+extern int ceph_osdc_init(struct ceph_osd_client *osdc,
+                         struct ceph_client *client);
+extern void ceph_osdc_stop(struct ceph_osd_client *osdc);
+
+extern void ceph_osdc_handle_reply(struct ceph_osd_client *osdc,
+                                  struct ceph_msg *msg);
+extern void ceph_osdc_handle_map(struct ceph_osd_client *osdc,
+                                struct ceph_msg *msg);
+
+extern void ceph_calc_raw_layout(struct ceph_osd_client *osdc,
+                       struct ceph_file_layout *layout,
+                       u64 snapid,
+                       u64 off, u64 *plen, u64 *bno,
+                       struct ceph_osd_request *req,
+                       struct ceph_osd_req_op *op);
+
+extern struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
+                                              int flags,
+                                              struct ceph_snap_context *snapc,
+                                              struct ceph_osd_req_op *ops,
+                                              bool use_mempool,
+                                              gfp_t gfp_flags,
+                                              struct page **pages,
+                                              struct bio *bio);
+
+extern void ceph_osdc_build_request(struct ceph_osd_request *req,
+                                   u64 off, u64 *plen,
+                                   struct ceph_osd_req_op *src_ops,
+                                   struct ceph_snap_context *snapc,
+                                   struct timespec *mtime,
+                                   const char *oid,
+                                   int oid_len);
+
+extern struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *,
+                                     struct ceph_file_layout *layout,
+                                     struct ceph_vino vino,
+                                     u64 offset, u64 *len, int op, int flags,
+                                     struct ceph_snap_context *snapc,
+                                     int do_sync, u32 truncate_seq,
+                                     u64 truncate_size,
+                                     struct timespec *mtime,
+                                     bool use_mempool, int num_reply);
+
+static inline void ceph_osdc_get_request(struct ceph_osd_request *req)
+{
+       kref_get(&req->r_kref);
+}
+extern void ceph_osdc_release_request(struct kref *kref);
+static inline void ceph_osdc_put_request(struct ceph_osd_request *req)
+{
+       kref_put(&req->r_kref, ceph_osdc_release_request);
+}
+
+extern int ceph_osdc_start_request(struct ceph_osd_client *osdc,
+                                  struct ceph_osd_request *req,
+                                  bool nofail);
+extern int ceph_osdc_wait_request(struct ceph_osd_client *osdc,
+                                 struct ceph_osd_request *req);
+extern void ceph_osdc_sync(struct ceph_osd_client *osdc);
+
+extern int ceph_osdc_readpages(struct ceph_osd_client *osdc,
+                              struct ceph_vino vino,
+                              struct ceph_file_layout *layout,
+                              u64 off, u64 *plen,
+                              u32 truncate_seq, u64 truncate_size,
+                              struct page **pages, int nr_pages);
+
+extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
+                               struct ceph_vino vino,
+                               struct ceph_file_layout *layout,
+                               struct ceph_snap_context *sc,
+                               u64 off, u64 len,
+                               u32 truncate_seq, u64 truncate_size,
+                               struct timespec *mtime,
+                               struct page **pages, int nr_pages,
+                               int flags, int do_sync, bool nofail);
+
+#endif
+
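The ceph_osdc_get_request()/ceph_osdc_put_request() pair above is a plain kref wrapper: every holder takes a reference, and the release callback runs when the last one is dropped. A minimal userspace sketch of that lifecycle follows. The names `req_sketch`, `req_get`, `req_put` and `req_release` are hypothetical stand-ins, and the counter is a bare int rather than an atomic kref, so this only illustrates the ordering, not the kernel's concurrency guarantees.

```c
#include <assert.h>

/* Hypothetical stand-in for struct ceph_osd_request; only the count matters. */
struct req_sketch {
	int refcount;   /* plays the role of r_kref (NOT atomic in this sketch) */
	int released;   /* set by the release callback, for illustration only */
};

/* Stand-in for ceph_osdc_release_request(): runs once, on the last put. */
static void req_release(struct req_sketch *req)
{
	req->released = 1;
}

static void req_get(struct req_sketch *req)
{
	assert(req->refcount > 0);  /* taking a ref on a dead object is a bug */
	req->refcount++;
}

static void req_put(struct req_sketch *req)
{
	assert(req->refcount > 0);
	if (--req->refcount == 0)
		req_release(req);
}
```

Starting from a count of 1 (the creator's reference), one `req_get()` followed by two `req_put()` calls triggers the release exactly once, on the second put.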
diff --git a/include/linux/ceph/osdmap.h b/include/linux/ceph/osdmap.h
new file mode 100644
index 0000000..ba4c205
--- /dev/null
+++ b/include/linux/ceph/osdmap.h
@@ -0,0 +1,130 @@
+#ifndef _FS_CEPH_OSDMAP_H
+#define _FS_CEPH_OSDMAP_H
+
+#include <linux/rbtree.h>
+#include "types.h"
+#include "ceph_fs.h"
+#include <linux/crush/crush.h>
+
+/*
+ * The osd map describes the current membership of the osd cluster and
+ * specifies the mapping of objects to placement groups and placement
+ * groups to (sets of) osds.  That is, it completely specifies the
+ * (desired) distribution of all data objects in the system at some
+ * point in time.
+ *
+ * Each map version is identified by an epoch, which increases monotonically.
+ *
+ * The map can be updated either via an incremental map (diff) describing
+ * the change between two successive epochs, or as a fully encoded map.
+ */
+struct ceph_pg_pool_info {
+       struct rb_node node;
+       int id;
+       struct ceph_pg_pool v;
+       int pg_num_mask, pgp_num_mask, lpg_num_mask, lpgp_num_mask;
+       char *name;
+};
+
+struct ceph_pg_mapping {
+       struct rb_node node;
+       struct ceph_pg pgid;
+       int len;
+       int osds[];
+};
+
+struct ceph_osdmap {
+       struct ceph_fsid fsid;
+       u32 epoch;
+       u32 mkfs_epoch;
+       struct ceph_timespec created, modified;
+
+       u32 flags;         /* CEPH_OSDMAP_* */
+
+       u32 max_osd;       /* size of osd_state, _weight, _addr arrays */
+       u8 *osd_state;     /* CEPH_OSD_* */
+       u32 *osd_weight;   /* 0 = failed, 0x10000 = 100% normal */
+       struct ceph_entity_addr *osd_addr;
+
+       struct rb_root pg_temp;
+       struct rb_root pg_pools;
+       u32 pool_max;
+
+       /* the CRUSH map specifies the mapping of placement groups to
+        * the list of osds that store+replicate them. */
+       struct crush_map *crush;
+};
+
+/*
+ * file layout helpers
+ */
+#define ceph_file_layout_su(l) ((__s32)le32_to_cpu((l).fl_stripe_unit))
+#define ceph_file_layout_stripe_count(l) \
+       ((__s32)le32_to_cpu((l).fl_stripe_count))
+#define ceph_file_layout_object_size(l) ((__s32)le32_to_cpu((l).fl_object_size))
+#define ceph_file_layout_cas_hash(l) ((__s32)le32_to_cpu((l).fl_cas_hash))
+#define ceph_file_layout_object_su(l) \
+       ((__s32)le32_to_cpu((l).fl_object_stripe_unit))
+#define ceph_file_layout_pg_preferred(l) \
+       ((__s32)le32_to_cpu((l).fl_pg_preferred))
+#define ceph_file_layout_pg_pool(l) \
+       ((__s32)le32_to_cpu((l).fl_pg_pool))
+
+static inline unsigned ceph_file_layout_stripe_width(struct ceph_file_layout *l)
+{
+       return le32_to_cpu(l->fl_stripe_unit) *
+               le32_to_cpu(l->fl_stripe_count);
+}
+
+/* "period" == bytes before i start on a new set of objects */
+static inline unsigned ceph_file_layout_period(struct ceph_file_layout *l)
+{
+       return le32_to_cpu(l->fl_object_size) *
+               le32_to_cpu(l->fl_stripe_count);
+}
+
+
+static inline int ceph_osd_is_up(struct ceph_osdmap *map, int osd)
+{
+       return (osd < map->max_osd) && (map->osd_state[osd] & CEPH_OSD_UP);
+}
+
+static inline bool ceph_osdmap_flag(struct ceph_osdmap *map, int flag)
+{
+       return map && (map->flags & flag);
+}
+
+extern char *ceph_osdmap_state_str(char *str, int len, int state);
+
+static inline struct ceph_entity_addr *ceph_osd_addr(struct ceph_osdmap *map,
+                                                    int osd)
+{
+       if (osd >= map->max_osd)
+               return NULL;
+       return &map->osd_addr[osd];
+}
+
+extern struct ceph_osdmap *osdmap_decode(void **p, void *end);
+extern struct ceph_osdmap *osdmap_apply_incremental(void **p, void *end,
+                                           struct ceph_osdmap *map,
+                                           struct ceph_messenger *msgr);
+extern void ceph_osdmap_destroy(struct ceph_osdmap *map);
+
+/* calculate mapping of a file extent to an object */
+extern void ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
+                                         u64 off, u64 *plen,
+                                         u64 *bno, u64 *oxoff, u64 *oxlen);
+
+/* calculate mapping of object to a placement group */
+extern int ceph_calc_object_layout(struct ceph_object_layout *ol,
+                                  const char *oid,
+                                  struct ceph_file_layout *fl,
+                                  struct ceph_osdmap *osdmap);
+extern int ceph_calc_pg_acting(struct ceph_osdmap *osdmap, struct ceph_pg pgid,
+                              int *acting);
+extern int ceph_calc_pg_primary(struct ceph_osdmap *osdmap,
+                               struct ceph_pg pgid);
+
+extern int ceph_pg_poolid_by_name(struct ceph_osdmap *map, const char *name);
+
+#endif
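The two inline helpers in osdmap.h, ceph_file_layout_stripe_width() and ceph_file_layout_period(), are simple products of layout fields. The sketch below restates them in userspace C to make the arithmetic concrete. `layout_sketch` is a hypothetical host-endian stand-in for struct ceph_file_layout (the real fields are little-endian and go through le32_to_cpu()), and the example values are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: fields already converted to host byte order. */
struct layout_sketch {
	uint32_t stripe_unit;   /* bytes written to one object before moving on */
	uint32_t stripe_count;  /* number of objects striped across */
	uint32_t object_size;   /* maximum bytes stored in one object */
};

/* Bytes in one full stripe across all objects of the set. */
static unsigned layout_stripe_width(const struct layout_sketch *l)
{
	assert(l->stripe_count > 0);
	return l->stripe_unit * l->stripe_count;
}

/* Bytes of file data before a new set of objects is started. */
static unsigned layout_period(const struct layout_sketch *l)
{
	assert(l->stripe_count > 0);
	return l->object_size * l->stripe_count;
}
```

With a 64 KiB stripe unit, 4 objects per set and 4 MiB objects, the stripe width is 256 KiB and the period is 16 MiB. Both helpers return `unsigned`, as in the header, so the product can wrap for very large layouts.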
diff --git a/include/linux/ceph/pagelist.h b/include/linux/ceph/pagelist.h
new file mode 100644
index 0000000..9660d6b
--- /dev/null
+++ b/include/linux/ceph/pagelist.h
@@ -0,0 +1,75 @@
+#ifndef __FS_CEPH_PAGELIST_H
+#define __FS_CEPH_PAGELIST_H
+
+#include <linux/list.h>
+
+struct ceph_pagelist {
+       struct list_head head;
+       void *mapped_tail;
+       size_t length;
+       size_t room;
+       struct list_head free_list;
+       size_t num_pages_free;
+};
+
+struct ceph_pagelist_cursor {
+       struct ceph_pagelist *pl;   /* pagelist, for error checking */
+       struct list_head *page_lru; /* page in list */
+       size_t room;                /* room remaining to reset to */
+};
+
+static inline void ceph_pagelist_init(struct ceph_pagelist *pl)
+{
+       INIT_LIST_HEAD(&pl->head);
+       pl->mapped_tail = NULL;
+       pl->length = 0;
+       pl->room = 0;
+       INIT_LIST_HEAD(&pl->free_list);
+       pl->num_pages_free = 0;
+}
+
+extern int ceph_pagelist_release(struct ceph_pagelist *pl);
+
+extern int ceph_pagelist_append(struct ceph_pagelist *pl, const void *d, size_t l);
+
+extern int ceph_pagelist_reserve(struct ceph_pagelist *pl, size_t space);
+
+extern int ceph_pagelist_free_reserve(struct ceph_pagelist *pl);
+
+extern void ceph_pagelist_set_cursor(struct ceph_pagelist *pl,
+                                    struct ceph_pagelist_cursor *c);
+
+extern int ceph_pagelist_truncate(struct ceph_pagelist *pl,
+                                 struct ceph_pagelist_cursor *c);
+
+static inline int ceph_pagelist_encode_64(struct ceph_pagelist *pl, u64 v)
+{
+       __le64 ev = cpu_to_le64(v);
+       return ceph_pagelist_append(pl, &ev, sizeof(ev));
+}
+static inline int ceph_pagelist_encode_32(struct ceph_pagelist *pl, u32 v)
+{
+       __le32 ev = cpu_to_le32(v);
+       return ceph_pagelist_append(pl, &ev, sizeof(ev));
+}
+static inline int ceph_pagelist_encode_16(struct ceph_pagelist *pl, u16 v)
+{
+       __le16 ev = cpu_to_le16(v);
+       return ceph_pagelist_append(pl, &ev, sizeof(ev));
+}
+static inline int ceph_pagelist_encode_8(struct ceph_pagelist *pl, u8 v)
+{
+       return ceph_pagelist_append(pl, &v, 1);
+}
+static inline int ceph_pagelist_encode_string(struct ceph_pagelist *pl,
+                                             char *s, size_t len)
+{
+       int ret = ceph_pagelist_encode_32(pl, len);
+       if (ret)
+               return ret;
+       if (len)
+               return ceph_pagelist_append(pl, s, len);
+       return 0;
+}
+
+#endif
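The ceph_pagelist_encode_*() helpers above write fixed-width little-endian integers, and ceph_pagelist_encode_string() writes a 32-bit length followed by the raw bytes. The sketch below reproduces that wire layout against a flat buffer. `buf_sketch` and the `buf_*` functions are hypothetical and replace the page-backed list with a fixed array, so only the byte layout carries over, not the allocation behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat buffer standing in for struct ceph_pagelist. */
struct buf_sketch {
	unsigned char data[64];
	size_t length;          /* bytes used so far */
};

/* Append raw bytes; returns 0 on success, -1 if they do not fit. */
static int buf_append(struct buf_sketch *b, const void *d, size_t l)
{
	assert(b->length <= sizeof(b->data));
	if (l > sizeof(b->data) - b->length)
		return -1;
	memcpy(b->data + b->length, d, l);
	b->length += l;
	return 0;
}

/* Encode a 32-bit value as little-endian, independent of host order. */
static int buf_encode_32(struct buf_sketch *b, uint32_t v)
{
	unsigned char le[4];

	le[0] = (unsigned char)(v & 0xff);
	le[1] = (unsigned char)((v >> 8) & 0xff);
	le[2] = (unsigned char)((v >> 16) & 0xff);
	le[3] = (unsigned char)((v >> 24) & 0xff);
	return buf_append(b, le, sizeof(le));
}

/* Length-prefixed string: 32-bit length, then the bytes (no NUL). */
static int buf_encode_string(struct buf_sketch *b, const char *s, size_t len)
{
	int ret = buf_encode_32(b, (uint32_t)len);

	if (ret)
		return ret;
	if (len)
		return buf_append(b, s, len);
	return 0;
}
```

Encoding the 3-byte string "osd" into an empty buffer leaves 7 bytes: `03 00 00 00` followed by `o`, `s`, `d`.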
diff --git a/include/linux/ceph/rados.h b/include/linux/ceph/rados.h
new file mode 100644
index 0000000..6d5247f
--- /dev/null
+++ b/include/linux/ceph/rados.h
@@ -0,0 +1,405 @@
+#ifndef CEPH_RADOS_H
+#define CEPH_RADOS_H
+
+/*
+ * Data types for the Ceph distributed object storage layer RADOS
+ * (Reliable Autonomic Distributed Object Store).
+ */
+
+#include "msgr.h"
+
+/*
+ * osdmap encoding versions
+ */
+#define CEPH_OSDMAP_INC_VERSION     5
+#define CEPH_OSDMAP_INC_VERSION_EXT 5
+#define CEPH_OSDMAP_VERSION         5
+#define CEPH_OSDMAP_VERSION_EXT     5
+
+/*
+ * fs id
+ */
+struct ceph_fsid {
+       unsigned char fsid[16];
+};
+
+static inline int ceph_fsid_compare(const struct ceph_fsid *a,
+                                   const struct ceph_fsid *b)
+{
+       return memcmp(a, b, sizeof(*a));
+}
+
+/*
+ * ino, object, etc.
+ */
+typedef __le64 ceph_snapid_t;
+#define CEPH_SNAPDIR ((__u64)(-1))  /* reserved for hidden .snap dir */
+#define CEPH_NOSNAP  ((__u64)(-2))  /* "head", "live" revision */
+#define CEPH_MAXSNAP ((__u64)(-3))  /* largest valid snapid */
+
+struct ceph_timespec {
+       __le32 tv_sec;
+       __le32 tv_nsec;
+} __attribute__ ((packed));
+
+
+/*
+ * object layout - how objects are mapped into PGs
+ */
+#define CEPH_OBJECT_LAYOUT_HASH     1
+#define CEPH_OBJECT_LAYOUT_LINEAR   2
+#define CEPH_OBJECT_LAYOUT_HASHINO  3
+
+/*
+ * pg layout -- how PGs are mapped onto (sets of) OSDs
+ */
+#define CEPH_PG_LAYOUT_CRUSH  0
+#define CEPH_PG_LAYOUT_HASH   1
+#define CEPH_PG_LAYOUT_LINEAR 2
+#define CEPH_PG_LAYOUT_HYBRID 3
+
+#define CEPH_PG_MAX_SIZE      16  /* max # osds in a single pg */
+
+/*
+ * placement group.
+ * we encode this into one __le64.
+ */
+struct ceph_pg {
+       __le16 preferred; /* preferred primary osd */
+       __le16 ps;        /* placement seed */
+       __le32 pool;      /* object pool */
+} __attribute__ ((packed));
+
+/*
+ * pg_pool is a set of pgs storing a pool of objects
+ *
+ *  pg_num -- base number of pseudorandomly placed pgs
+ *
+ *  pgp_num -- effective number when calculating pg placement.  this
+ * is used for pg_num increases.  new pgs result in data being "split"
+ * into new pgs.  for this to proceed smoothly, new pgs are initially
+ * colocated with their parents; that is, pgp_num doesn't increase
+ * until the new pgs have successfully split.  only _then_ are the new
+ * pgs placed independently.
+ *
+ *  lpg_num -- localized pg count (per device).  replicas are randomly
+ * selected.
+ *
+ *  lpgp_num -- as above.
+ */
+#define CEPH_PG_TYPE_REP     1
+#define CEPH_PG_TYPE_RAID4   2
+#define CEPH_PG_POOL_VERSION 2
+struct ceph_pg_pool {
+       __u8 type;                /* CEPH_PG_TYPE_* */
+       __u8 size;                /* number of osds in each pg */
+       __u8 crush_ruleset;       /* crush placement rule */
+       __u8 object_hash;         /* hash mapping object name to ps */
+       __le32 pg_num, pgp_num;   /* number of pg's */
+       __le32 lpg_num, lpgp_num; /* number of localized pg's */
+       __le32 last_change;       /* most recent epoch changed */
+       __le64 snap_seq;          /* seq for per-pool snapshot */
+       __le32 snap_epoch;        /* epoch of last snap */
+       __le32 num_snaps;
+       __le32 num_removed_snap_intervals; /* if non-empty, NO per-pool snaps */
+       __le64 auid;               /* who owns the pg */
+} __attribute__ ((packed));
+
+/*
+ * stable_mod func is used to control number of placement groups.
+ * similar to straight-up modulo, but produces a stable mapping as b
+ * increases over time.  b is the number of bins, and bmask is the
+ * containing power of 2 minus 1.
+ *
+ * b <= bmask and bmask=(2**n)-1
+ * e.g., b=12 -> bmask=15, b=123 -> bmask=127
+ */
+static inline int ceph_stable_mod(int x, int b, int bmask)
+{
+       if ((x & bmask) < b)
+               return x & bmask;
+       else
+               return x & (bmask >> 1);
+}
+
+/*
+ * object layout - how a given object should be stored.
+ */
+struct ceph_object_layout {
+       struct ceph_pg ol_pgid;   /* raw pg, with _full_ ps precision. */
+       __le32 ol_stripe_unit;    /* for per-object parity, if any */
+} __attribute__ ((packed));
+
+/*
+ * compound epoch+version, used by storage layer to serialize mutations
+ */
+struct ceph_eversion {
+       __le32 epoch;
+       __le64 version;
+} __attribute__ ((packed));
+
+/*
+ * osd map bits
+ */
+
+/* status bits */
+#define CEPH_OSD_EXISTS 1
+#define CEPH_OSD_UP     2
+
+/* osd weights.  fixed point value: 0x10000 == 1.0 ("in"), 0 == "out" */
+#define CEPH_OSD_IN  0x10000
+#define CEPH_OSD_OUT 0
+
+
+/*
+ * osd map flag bits
+ */
+#define CEPH_OSDMAP_NEARFULL (1<<0)  /* sync writes (near ENOSPC) */
+#define CEPH_OSDMAP_FULL     (1<<1)  /* no data writes (ENOSPC) */
+#define CEPH_OSDMAP_PAUSERD  (1<<2)  /* pause all reads */
+#define CEPH_OSDMAP_PAUSEWR  (1<<3)  /* pause all writes */
+#define CEPH_OSDMAP_PAUSEREC (1<<4)  /* pause recovery */
+
+/*
+ * osd ops
+ */
+#define CEPH_OSD_OP_MODE       0xf000
+#define CEPH_OSD_OP_MODE_RD    0x1000
+#define CEPH_OSD_OP_MODE_WR    0x2000
+#define CEPH_OSD_OP_MODE_RMW   0x3000
+#define CEPH_OSD_OP_MODE_SUB   0x4000
+
+#define CEPH_OSD_OP_TYPE       0x0f00
+#define CEPH_OSD_OP_TYPE_LOCK  0x0100
+#define CEPH_OSD_OP_TYPE_DATA  0x0200
+#define CEPH_OSD_OP_TYPE_ATTR  0x0300
+#define CEPH_OSD_OP_TYPE_EXEC  0x0400
+#define CEPH_OSD_OP_TYPE_PG    0x0500
+
+enum {
+       /** data **/
+       /* read */
+       CEPH_OSD_OP_READ      = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_DATA | 1,
+       CEPH_OSD_OP_STAT      = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_DATA | 2,
+
+       /* fancy read */
+       CEPH_OSD_OP_MASKTRUNC = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_DATA | 4,
+
+       /* write */
+       CEPH_OSD_OP_WRITE     = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 1,
+       CEPH_OSD_OP_WRITEFULL = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 2,
+       CEPH_OSD_OP_TRUNCATE  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 3,
+       CEPH_OSD_OP_ZERO      = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 4,
+       CEPH_OSD_OP_DELETE    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 5,
+
+       /* fancy write */
+       CEPH_OSD_OP_APPEND    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 6,
+       CEPH_OSD_OP_STARTSYNC = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 7,
+       CEPH_OSD_OP_SETTRUNC  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 8,
+       CEPH_OSD_OP_TRIMTRUNC = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 9,
+
+       CEPH_OSD_OP_TMAPUP  = CEPH_OSD_OP_MODE_RMW | CEPH_OSD_OP_TYPE_DATA | 10,
+       CEPH_OSD_OP_TMAPPUT = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 11,
+       CEPH_OSD_OP_TMAPGET = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_DATA | 12,
+
+       CEPH_OSD_OP_CREATE  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 13,
+       CEPH_OSD_OP_ROLLBACK= CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_DATA | 14,
+
+       /** attrs **/
+       /* read */
+       CEPH_OSD_OP_GETXATTR  = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_ATTR | 1,
+       CEPH_OSD_OP_GETXATTRS = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_ATTR | 2,
+       CEPH_OSD_OP_CMPXATTR  = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_ATTR | 3,
+
+       /* write */
+       CEPH_OSD_OP_SETXATTR  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_ATTR | 1,
+       CEPH_OSD_OP_SETXATTRS = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_ATTR | 2,
+       CEPH_OSD_OP_RESETXATTRS = CEPH_OSD_OP_MODE_WR|CEPH_OSD_OP_TYPE_ATTR | 3,
+       CEPH_OSD_OP_RMXATTR   = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_ATTR | 4,
+
+       /** subop **/
+       CEPH_OSD_OP_PULL           = CEPH_OSD_OP_MODE_SUB | 1,
+       CEPH_OSD_OP_PUSH           = CEPH_OSD_OP_MODE_SUB | 2,
+       CEPH_OSD_OP_BALANCEREADS   = CEPH_OSD_OP_MODE_SUB | 3,
+       CEPH_OSD_OP_UNBALANCEREADS = CEPH_OSD_OP_MODE_SUB | 4,
+       CEPH_OSD_OP_SCRUB          = CEPH_OSD_OP_MODE_SUB | 5,
+
+       /** lock **/
+       CEPH_OSD_OP_WRLOCK    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 1,
+       CEPH_OSD_OP_WRUNLOCK  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 2,
+       CEPH_OSD_OP_RDLOCK    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 3,
+       CEPH_OSD_OP_RDUNLOCK  = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 4,
+       CEPH_OSD_OP_UPLOCK    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 5,
+       CEPH_OSD_OP_DNLOCK    = CEPH_OSD_OP_MODE_WR | CEPH_OSD_OP_TYPE_LOCK | 6,
+
+       /** exec **/
+       CEPH_OSD_OP_CALL    = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_EXEC | 1,
+
+       /** pg **/
+       CEPH_OSD_OP_PGLS      = CEPH_OSD_OP_MODE_RD | CEPH_OSD_OP_TYPE_PG | 1,
+};
+
+static inline int ceph_osd_op_type_lock(int op)
+{
+       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_LOCK;
+}
+static inline int ceph_osd_op_type_data(int op)
+{
+       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_DATA;
+}
+static inline int ceph_osd_op_type_attr(int op)
+{
+       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_ATTR;
+}
+static inline int ceph_osd_op_type_exec(int op)
+{
+       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_EXEC;
+}
+static inline int ceph_osd_op_type_pg(int op)
+{
+       return (op & CEPH_OSD_OP_TYPE) == CEPH_OSD_OP_TYPE_PG;
+}
+
+static inline int ceph_osd_op_mode_subop(int op)
+{
+       return (op & CEPH_OSD_OP_MODE) == CEPH_OSD_OP_MODE_SUB;
+}
+static inline int ceph_osd_op_mode_read(int op)
+{
+       return (op & CEPH_OSD_OP_MODE) == CEPH_OSD_OP_MODE_RD;
+}
+static inline int ceph_osd_op_mode_modify(int op)
+{
+       return (op & CEPH_OSD_OP_MODE) == CEPH_OSD_OP_MODE_WR;
+}
+
+/*
+ * note that the following tmap stuff is also defined in the ceph librados.h
+ * any modification here needs to be updated there
+ */
+#define CEPH_OSD_TMAP_HDR 'h'
+#define CEPH_OSD_TMAP_SET 's'
+#define CEPH_OSD_TMAP_RM  'r'
+
+extern const char *ceph_osd_op_name(int op);
+
+
+/*
+ * osd op flags
+ *
+ * An op may be READ, WRITE, or READ|WRITE.
+ */
+enum {
+       CEPH_OSD_FLAG_ACK = 1,          /* want (or is) "ack" ack */
+       CEPH_OSD_FLAG_ONNVRAM = 2,      /* want (or is) "onnvram" ack */
+       CEPH_OSD_FLAG_ONDISK = 4,       /* want (or is) "ondisk" ack */
+       CEPH_OSD_FLAG_RETRY = 8,        /* resend attempt */
+       CEPH_OSD_FLAG_READ = 16,        /* op may read */
+       CEPH_OSD_FLAG_WRITE = 32,       /* op may write */
+       CEPH_OSD_FLAG_ORDERSNAP = 64,   /* EOLDSNAP if snapc is out of order */
+       CEPH_OSD_FLAG_PEERSTAT = 128,   /* msg includes osd_peer_stat */
+       CEPH_OSD_FLAG_BALANCE_READS = 256,
+       CEPH_OSD_FLAG_PARALLELEXEC = 512, /* execute op in parallel */
+       CEPH_OSD_FLAG_PGOP = 1024,      /* pg op, no object */
+       CEPH_OSD_FLAG_EXEC = 2048,      /* op may exec */
+       CEPH_OSD_FLAG_EXEC_PUBLIC = 4096, /* op may exec (public) */
+};
+
+enum {
+       CEPH_OSD_OP_FLAG_EXCL = 1,      /* EXCL object create */
+};
+
+#define EOLDSNAPC    ERESTART  /* ORDERSNAP flag set; writer has old snapc*/
+#define EBLACKLISTED ESHUTDOWN /* blacklisted */
+
+/* xattr comparison */
+enum {
+       CEPH_OSD_CMPXATTR_OP_NOP = 0,
+       CEPH_OSD_CMPXATTR_OP_EQ  = 1,
+       CEPH_OSD_CMPXATTR_OP_NE  = 2,
+       CEPH_OSD_CMPXATTR_OP_GT  = 3,
+       CEPH_OSD_CMPXATTR_OP_GTE = 4,
+       CEPH_OSD_CMPXATTR_OP_LT  = 5,
+       CEPH_OSD_CMPXATTR_OP_LTE = 6
+};
+
+enum {
+       CEPH_OSD_CMPXATTR_MODE_STRING = 1,
+       CEPH_OSD_CMPXATTR_MODE_U64    = 2
+};
+
+/*
+ * an individual object operation.  each may be accompanied by some data
+ * payload
+ */
+struct ceph_osd_op {
+       __le16 op;           /* CEPH_OSD_OP_* */
+       __le32 flags;        /* CEPH_OSD_FLAG_* */
+       union {
+               struct {
+                       __le64 offset, length;
+                       __le64 truncate_size;
+                       __le32 truncate_seq;
+               } __attribute__ ((packed)) extent;
+               struct {
+                       __le32 name_len;
+                       __le32 value_len;
+                       __u8 cmp_op;       /* CEPH_OSD_CMPXATTR_OP_* */
+                       __u8 cmp_mode;     /* CEPH_OSD_CMPXATTR_MODE_* */
+               } __attribute__ ((packed)) xattr;
+               struct {
+                       __u8 class_len;
+                       __u8 method_len;
+                       __u8 argc;
+                       __le32 indata_len;
+               } __attribute__ ((packed)) cls;
+               struct {
+                       __le64 cookie, count;
+               } __attribute__ ((packed)) pgls;
+               struct {
+                       __le64 snapid;
+               } __attribute__ ((packed)) snap;
+       };
+       __le32 payload_len;
+} __attribute__ ((packed));
+
+/*
+ * osd request message header.  each request may include multiple
+ * ceph_osd_op object operations.
+ */
+struct ceph_osd_request_head {
+       __le32 client_inc;                 /* client incarnation */
+       struct ceph_object_layout layout;  /* pgid */
+       __le32 osdmap_epoch;               /* client's osdmap epoch */
+
+       __le32 flags;
+
+       struct ceph_timespec mtime;        /* for mutations only */
+       struct ceph_eversion reassert_version; /* if we are replaying op */
+
+       __le32 object_len;     /* length of object name */
+
+       __le64 snapid;         /* snapid to read */
+       __le64 snap_seq;       /* writer's snap context */
+       __le32 num_snaps;
+
+       __le16 num_ops;
+       struct ceph_osd_op ops[];  /* followed by ops[], obj, ticket, snaps */
+} __attribute__ ((packed));
+
+struct ceph_osd_reply_head {
+       __le32 client_inc;                /* client incarnation */
+       __le32 flags;
+       struct ceph_object_layout layout;
+       __le32 osdmap_epoch;
+       struct ceph_eversion reassert_version; /* for replaying uncommitted */
+
+       __le32 result;                    /* result code */
+
+       __le32 object_len;                /* length of object name */
+       __le32 num_ops;
+       struct ceph_osd_op ops[0];  /* ops[], object */
+} __attribute__ ((packed));
+
+
+#endif
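Two pieces of arithmetic in rados.h are easier to follow with concrete numbers: ceph_stable_mod(), and the way each CEPH_OSD_OP_* value is composed from a mode nibble, a type nibble and a small index. The sketch below copies the body of ceph_stable_mod() under the name `stable_mod` and uses shortened `OP_*` macros as local copies of the header's constants. The helper names `op_mode` and `op_type` are assumptions made for this illustration.

```c
#include <assert.h>

/* Local copies of the CEPH_OSD_OP_MODE* / CEPH_OSD_OP_TYPE* values above. */
#define OP_MODE      0xf000
#define OP_MODE_RD   0x1000
#define OP_MODE_WR   0x2000
#define OP_MODE_RMW  0x3000
#define OP_TYPE      0x0f00
#define OP_TYPE_DATA 0x0200

/* Same logic as ceph_stable_mod(): b bins, bmask = containing 2^n - 1. */
static int stable_mod(int x, int b, int bmask)
{
	assert(b > 0 && bmask >= 0);
	if ((x & bmask) < b)
		return x & bmask;
	else
		return x & (bmask >> 1);
}

/* Extract the mode and type fields of an op code. */
static int op_mode(int op)
{
	return op & OP_MODE;
}

static int op_type(int op)
{
	return op & OP_TYPE;
}
```

With b=12 and bmask=15, x=13 falls outside the 12 bins and folds to 13 & 7 = 5, while x=11 maps to itself. Raising b from 12 to 13 moves only inputs whose low four bits equal 12 (from bin 4 to bin 12); every other input keeps its bin, which is the stability the header comment describes. On the op side, CEPH_OSD_OP_READ works out to 0x1201 and CEPH_OSD_OP_TMAPUP to 0x320a; since the latter carries mode RMW, the equality tests in ceph_osd_op_mode_read() and ceph_osd_op_mode_modify() as written both return false for it.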
diff --git a/include/linux/ceph/types.h b/include/linux/ceph/types.h
new file mode 100644
index 0000000..28b35a0
--- /dev/null
+++ b/include/linux/ceph/types.h
@@ -0,0 +1,29 @@
+#ifndef _FS_CEPH_TYPES_H
+#define _FS_CEPH_TYPES_H
+
+/* needed before including ceph_fs.h */
+#include <linux/in.h>
+#include <linux/types.h>
+#include <linux/fcntl.h>
+#include <linux/string.h>
+
+#include "ceph_fs.h"
+#include "ceph_frag.h"
+#include "ceph_hash.h"
+
+/*
+ * Identify inodes by both their ino AND snapshot id (a u64).
+ */
+struct ceph_vino {
+       u64 ino;
+       u64 snap;
+};
+
+
+/* context for the caps reservation mechanism */
+struct ceph_cap_reservation {
+       int count;
+};
+
+
+#endif
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 8ba66a9d9022c7d0ddbbcb984b8839dabb2b91f8..ba4b85a6d9b8bc71853de37df2a36b865707b745 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -9,37 +9,7 @@
   * These are the only things you should do on a core-file: use only these
   * functions to write out all the necessary info.
   */
-static inline int dump_write(struct file *file, const void *addr, int nr)
-{
-       return file->f_op->write(file, addr, nr, &file->f_pos) == nr;
-}
-
-static inline int dump_seek(struct file *file, loff_t off)
-{
-       int ret = 1;
-
-       if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
-               if (file->f_op->llseek(file, off, SEEK_CUR) < 0)
-                       return 0;
-       } else {
-               char *buf = (char *)get_zeroed_page(GFP_KERNEL);
-
-               if (!buf)
-                       return 0;
-               while (off > 0) {
-                       unsigned long n = off;
-
-                       if (n > PAGE_SIZE)
-                               n = PAGE_SIZE;
-                       if (!dump_write(file, buf, n)) {
-                               ret = 0;
-                               break;
-                       }
-                       off -= n;
-               }
-               free_page((unsigned long)buf);
-       }
-       return ret;
-}
+extern int dump_write(struct file *file, const void *addr, int nr);
+extern int dump_seek(struct file *file, loff_t off);
  
  #endif /* _LINUX_COREDUMP_H */
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 36ca9721a0c28a0150b05153b4f442d2c558d6d4..1be416bbbb82540802a0a742ba2c22934a9a6659 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -53,6 +53,7 @@ struct cpuidle_state {
  #define CPUIDLE_FLAG_BALANCED  (0x40) /* medium latency, moderate savings */
  #define CPUIDLE_FLAG_DEEP      (0x80) /* high latency, large savings */
  #define CPUIDLE_FLAG_IGNORE    (0x100) /* ignore during this idle period */
+#define CPUIDLE_FLAG_TLB_FLUSHED (0x200) /* tlb will be flushed */
  
  #define CPUIDLE_DRIVER_FLAGS_MASK (0xFFFF0000)
  
diff --git a/include/linux/crush/crush.h b/include/linux/crush/crush.h
new file mode 100644
index 0000000..97e435b
--- /dev/null
+++ b/include/linux/crush/crush.h
@@ -0,0 +1,180 @@
+#ifndef CEPH_CRUSH_CRUSH_H
+#define CEPH_CRUSH_CRUSH_H
+
+#include <linux/types.h>
+
+/*
+ * CRUSH is a pseudo-random data distribution algorithm that
+ * efficiently distributes input values (typically, data objects)
+ * across a heterogeneous, structured storage cluster.
+ *
+ * The algorithm was originally described in detail in this paper
+ * (although the algorithm has evolved somewhat since then):
+ *
+ *     http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf
+ *
+ * LGPL2
+ */
+
+
+#define CRUSH_MAGIC 0x00010000ul   /* for detecting algorithm revisions */
+
+
+#define CRUSH_MAX_DEPTH 10  /* max crush hierarchy depth */
+#define CRUSH_MAX_SET   10  /* max size of a mapping result */
+
+
+/*
+ * CRUSH uses user-defined "rules" to describe how inputs should be
+ * mapped to devices.  A rule consists of a sequence of steps to perform
+ * to generate the set of output devices.
+ */
+struct crush_rule_step {
+       __u32 op;
+       __s32 arg1;
+       __s32 arg2;
+};
+
+/* step op codes */
+enum {
+       CRUSH_RULE_NOOP = 0,
+       CRUSH_RULE_TAKE = 1,          /* arg1 = value to start with */
+       CRUSH_RULE_CHOOSE_FIRSTN = 2, /* arg1 = num items to pick */
+                                     /* arg2 = type */
+       CRUSH_RULE_CHOOSE_INDEP = 3,  /* same */
+       CRUSH_RULE_EMIT = 4,          /* no args */
+       CRUSH_RULE_CHOOSE_LEAF_FIRSTN = 6,
+       CRUSH_RULE_CHOOSE_LEAF_INDEP = 7,
+};
+
+/*
+ * for specifying choose num (arg1) relative to the max parameter
+ * passed to do_rule
+ */
+#define CRUSH_CHOOSE_N            0
+#define CRUSH_CHOOSE_N_MINUS(x)   (-(x))
+
+/*
+ * The rule mask is used to describe what the rule is intended for.
+ * Given a ruleset and size of output set, we search through the
+ * rule list for a matching rule_mask.
+ */
+struct crush_rule_mask {
+       __u8 ruleset;
+       __u8 type;
+       __u8 min_size;
+       __u8 max_size;
+};
+
+struct crush_rule {
+       __u32 len;
+       struct crush_rule_mask mask;
+       struct crush_rule_step steps[0];
+};
+
+#define crush_rule_size(len) (sizeof(struct crush_rule) + \
+                             (len)*sizeof(struct crush_rule_step))
+
+
+
+/*
+ * A bucket is a named container of other items (either devices or
+ * other buckets).  Items within a bucket are chosen using one of a
+ * few different algorithms.  The table summarizes how the speed of
+ * each option measures up against mapping stability when items are
+ * added or removed.
+ *
+ *  Bucket Alg     Speed       Additions    Removals
+ *  ------------------------------------------------
+ *  uniform         O(1)       poor         poor
+ *  list            O(n)       optimal      poor
+ *  tree            O(log n)   good         good
+ *  straw           O(n)       optimal      optimal
+ */
+enum {
+       CRUSH_BUCKET_UNIFORM = 1,
+       CRUSH_BUCKET_LIST = 2,
+       CRUSH_BUCKET_TREE = 3,
+       CRUSH_BUCKET_STRAW = 4
+};
+extern const char *crush_bucket_alg_name(int alg);
+
+struct crush_bucket {
+       __s32 id;        /* this'll be negative */
+       __u16 type;      /* non-zero; type=0 is reserved for devices */
+       __u8 alg;        /* one of CRUSH_BUCKET_* */
+       __u8 hash;       /* which hash function to use, CRUSH_HASH_* */
+       __u32 weight;    /* 16-bit fixed point */
+       __u32 size;      /* num items */
+       __s32 *items;
+
+       /*
+        * cached random permutation: used for uniform bucket and for
+        * the linear search fallback for the other bucket types.
+        */
+       __u32 perm_x;  /* @x for which *perm is defined */
+       __u32 perm_n;  /* num elements of *perm that are permuted/defined */
+       __u32 *perm;
+};
+
+struct crush_bucket_uniform {
+       struct crush_bucket h;
+       __u32 item_weight;  /* 16-bit fixed point; all items equally weighted */
+};
+
+struct crush_bucket_list {
+       struct crush_bucket h;
+       __u32 *item_weights;  /* 16-bit fixed point */
+       __u32 *sum_weights;   /* 16-bit fixed point.  element i is sum
+                                of weights 0..i, inclusive */
+};
+
+struct crush_bucket_tree {
+       struct crush_bucket h;  /* note: h.size is _tree_ size, not number of
+                                  actual items */
+       __u8 num_nodes;
+       __u32 *node_weights;
+};
+
+struct crush_bucket_straw {
+       struct crush_bucket h;
+       __u32 *item_weights;   /* 16-bit fixed point */
+       __u32 *straws;         /* 16-bit fixed point */
+};
+
+
+
+/*
+ * CRUSH map includes all buckets, rules, etc.
+ */
+struct crush_map {
+       struct crush_bucket **buckets;
+       struct crush_rule **rules;
+
+       /*
+        * Parent pointers to identify the parent bucket of a device or
+        * bucket in the hierarchy.  If an item appears more than
+        * once, this is the _last_ time it appeared (where buckets
+        * are processed in bucket id order, from -1 on down to
+        * -max_buckets).
+        */
+       __u32 *bucket_parents;
+       __u32 *device_parents;
+
+       __s32 max_buckets;
+       __u32 max_rules;
+       __s32 max_devices;
+};
+
+
+/* crush.c */
+extern int crush_get_bucket_item_weight(struct crush_bucket *b, int pos);
+extern void crush_calc_parents(struct crush_map *map);
+extern void crush_destroy_bucket_uniform(struct crush_bucket_uniform *b);
+extern void crush_destroy_bucket_list(struct crush_bucket_list *b);
+extern void crush_destroy_bucket_tree(struct crush_bucket_tree *b);
+extern void crush_destroy_bucket_straw(struct crush_bucket_straw *b);
+extern void crush_destroy_bucket(struct crush_bucket *b);
+extern void crush_destroy(struct crush_map *map);
+
+#endif
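The bucket structures in crush.h store weights as 16-bit fixed point (0x10000 is 1.0), and struct crush_bucket_list keeps a running total in sum_weights, where element i is the sum of weights 0..i inclusive. The sketch below shows that bookkeeping in isolation. `FIXED_ONE`, `calc_sum_weights` and `fixed_to_double` are hypothetical names introduced here, not functions from the CRUSH sources, and the floating-point conversion is for display only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIXED_ONE 0x10000u  /* 1.0 in 16.16 fixed point */

/* Fill sum_weights[i] with item_weights[0] + ... + item_weights[i]. */
static void calc_sum_weights(const uint32_t *item_weights,
			     uint32_t *sum_weights, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	assert(item_weights && sum_weights);
	for (i = 0; i < n; i++) {
		sum += item_weights[i];
		sum_weights[i] = sum;
	}
}

/* Convert a fixed-point weight to a double for printing or inspection. */
static double fixed_to_double(uint32_t w)
{
	return (double)w / FIXED_ONE;
}
```

For item weights 1.0, 0.5 and 2.0 (0x10000, 0x8000, 0x20000), the running sums are 0x10000, 0x18000 and 0x38000, that is 1.0, 1.5 and 3.5.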
diff --git a/include/linux/crush/hash.h b/include/linux/crush/hash.h
new file mode 100644
index 0000000..91e8842
--- /dev/null
+++ b/include/linux/crush/hash.h
@@ -0,0 +1,17 @@
+#ifndef CEPH_CRUSH_HASH_H
+#define CEPH_CRUSH_HASH_H
+
+#define CRUSH_HASH_RJENKINS1   0
+
+#define CRUSH_HASH_DEFAULT CRUSH_HASH_RJENKINS1
+
+extern const char *crush_hash_name(int type);
+
+extern __u32 crush_hash32(int type, __u32 a);
+extern __u32 crush_hash32_2(int type, __u32 a, __u32 b);
+extern __u32 crush_hash32_3(int type, __u32 a, __u32 b, __u32 c);
+extern __u32 crush_hash32_4(int type, __u32 a, __u32 b, __u32 c, __u32 d);
+extern __u32 crush_hash32_5(int type, __u32 a, __u32 b, __u32 c, __u32 d,
+                           __u32 e);
+
+#endif
diff --git a/include/linux/crush/mapper.h b/include/linux/crush/mapper.h
new file mode 100644
index 0000000..c46b99c
--- /dev/null
+++ b/include/linux/crush/mapper.h
@@ -0,0 +1,20 @@
+#ifndef CEPH_CRUSH_MAPPER_H
+#define CEPH_CRUSH_MAPPER_H
+
+/*
+ * CRUSH functions for finding rules and then mapping an input to an
+ * output set.
+ *
+ * LGPL2
+ */
+
+#include "crush.h"
+
+extern int crush_find_rule(struct crush_map *map, int pool, int type, int size);
+extern int crush_do_rule(struct crush_map *map,
+                        int ruleno,
+                        int x, int *result, int result_max,
+                        int forcefeed,    /* -1 for none */
+                        __u32 *weights);
+
+#endif
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index c61d4ca27bcc26906699101b4f8eee7ce8e8525b..e2106495cc11383ad9a14d7ac119400b18a8f83a 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -548,7 +548,7 @@ static inline bool dma_dev_has_pq_continue(struct dma_device *dma)
         return (dma->max_pq & DMA_HAS_PQ_CONTINUE) == DMA_HAS_PQ_CONTINUE;
  }
  
-static unsigned short dma_dev_to_maxpq(struct dma_device *dma)
+static inline unsigned short dma_dev_to_maxpq(struct dma_device *dma)
  {
         return dma->max_pq & ~DMA_HAS_PQ_CONTINUE;
  }
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 926b50322a469c9d16c48ed35e10c4c12bd695f3..4fd978e7eb83ef8d689d0d313b5441b0631ea275 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -93,6 +93,7 @@ struct elevator_queue
         struct elevator_type *elevator_type;
         struct mutex sysfs_lock;
         struct hlist_head *hash;
+       unsigned int registered:1;
  };
  
  /*
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2b0a35e6bc691896609944c114328b414be37a12..1759ba5adce845fa08d4b1fd899920ac65436938 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -58,7 +58,18 @@ extern const char linux_proc_banner[];
  
  #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
  #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
-#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
+#define roundup(x, y) (                                        \
+{                                                      \
+       typeof(y) __y = y;                              \
+       (((x) + (__y - 1)) / __y) * __y;                \
+}                                                      \
+)
+#define rounddown(x, y) (                              \
+{                                                      \
+       typeof(x) __x = (x);                            \
+       __x - (__x % (y));                              \
+}                                                      \
+)
  #define DIV_ROUND_CLOSEST(x, divisor)(                 \
  {                                                      \
         typeof(divisor) __divisor = divisor;            \
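The reworked roundup() above copies y into a local through a GNU statement expression, so an argument with side effects is evaluated only once; rounddown() is new in this hunk. A minimal userspace sketch, assuming GCC or Clang (statement expressions and __typeof__ are extensions). next_align() and demo_roundup_calls() are hypothetical names used only for illustration:

```c
/* Userspace copies of the two macros, spelled with __typeof__ so they
 * also build with -std=c99 or -std=c11. */
#define roundup(x, y) (					\
{							\
	__typeof__(y) __y = y;				\
	(((x) + (__y - 1)) / __y) * __y;		\
}							\
)
#define rounddown(x, y) (				\
{							\
	__typeof__(x) __x = (x);			\
	__x - (__x % (y));				\
}							\
)

static int align_calls;	/* counts evaluations of the y argument */

/* Hypothetical helper with a visible side effect. */
static unsigned int next_align(void)
{
	align_calls++;
	return 4;
}

/* Rounds 10 up to a multiple of 4 and reports how often y was evaluated. */
int demo_roundup_calls(unsigned int *rounded)
{
	align_calls = 0;
	*rounded = roundup(10u, next_align());
	return align_calls;
}
```

With the old single-expression macro, y appeared three times in the expansion, so the same call would have invoked next_align() three times.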
diff --git a/include/linux/module.h b/include/linux/module.h
index 8a6b9fdc7ffae0c20335b8a2c73726396a934ee2..aace066bad8f067758cda4d341b9ce8941d2ae1c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -686,17 +686,16 @@ extern int module_sysfs_initialized;
  
  
  #ifdef CONFIG_GENERIC_BUG
-int  module_bug_finalize(const Elf_Ehdr *, const Elf_Shdr *,
+void module_bug_finalize(const Elf_Ehdr *, const Elf_Shdr *,
                          struct module *);
  void module_bug_cleanup(struct module *);
  
  #else  /* !CONFIG_GENERIC_BUG */
  
-static inline int  module_bug_finalize(const Elf_Ehdr *hdr,
+static inline void module_bug_finalize(const Elf_Ehdr *hdr,
                                         const Elf_Shdr *sechdrs,
                                         struct module *mod)
  {
-       return 0;
  }
  static inline void module_bug_cleanup(struct module *mod) {}
  #endif /* CONFIG_GENERIC_BUG */
diff --git a/include/linux/netfilter/nfnetlink_conntrack.h b/include/linux/netfilter/nfnetlink_conntrack.h
index 9ed534c991b9312d84876c9162ed63e5ef2b5c66..70cd0603911c97b865bc576a3f4490f523e2fc08 100644
--- a/include/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/linux/netfilter/nfnetlink_conntrack.h
@@ -39,8 +39,9 @@ enum ctattr_type {
         CTA_TUPLE_MASTER,
         CTA_NAT_SEQ_ADJ_ORIG,
         CTA_NAT_SEQ_ADJ_REPLY,
-       CTA_SECMARK,
+       CTA_SECMARK,            /* obsolete */
         CTA_ZONE,
+       CTA_SECCTX,
         __CTA_MAX
  };
  #define CTA_MAX (__CTA_MAX - 1)
@@ -172,4 +173,11 @@ enum ctattr_help {
  };
  #define CTA_HELP_MAX (__CTA_HELP_MAX - 1)
  
+enum ctattr_secctx {
+       CTA_SECCTX_UNSPEC,
+       CTA_SECCTX_NAME,
+       __CTA_SECCTX_MAX
+};
+#define CTA_SECCTX_MAX (__CTA_SECCTX_MAX - 1)
+
  #endif /* _IPCONNTRACK_NETLINK_H */
diff --git a/include/linux/netfilter/xt_SECMARK.h b/include/linux/netfilter/xt_SECMARK.h
index 6fcd3448b18631f04e081cde470f85218dd7f9b8..989092bd6274b44585ccc0faf4f005a0d7b909b7 100644
--- a/include/linux/netfilter/xt_SECMARK.h
+++ b/include/linux/netfilter/xt_SECMARK.h
@@ -11,18 +11,12 @@
   * packets are being marked for.
   */
  #define SECMARK_MODE_SEL       0x01            /* SELinux */
-#define SECMARK_SELCTX_MAX     256
-
-struct xt_secmark_target_selinux_info {
-       __u32 selsid;
-       char selctx[SECMARK_SELCTX_MAX];
-};
+#define SECMARK_SECCTX_MAX     256
  
  struct xt_secmark_target_info {
         __u8 mode;
-       union {
-               struct xt_secmark_target_selinux_info sel;
-       } u;
+       __u32 secid;
+       char secctx[SECMARK_SECCTX_MAX];
  };
  
  #endif /*_XT_SECMARK_H_target */
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9fbc54a2585d42cb9276adf2c2d168f53e883f63..83af1f8d8b746cde3b8369d7125f3cdfe07da6d3 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -454,7 +454,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
   * Makes rcu_dereference_check() do the dirty work.
   */
  #define rcu_dereference_bh(p) \
-               rcu_dereference_check(p, rcu_read_lock_bh_held())
+               rcu_dereference_check(p, rcu_read_lock_bh_held() || irqs_disabled())
  
  /**
   * rcu_dereference_sched - fetch RCU-protected pointer, checking for RCU-sched
diff --git a/include/linux/security.h b/include/linux/security.h
index a22219afff092952bbe276cb9da1d0509ddf196c..b8246a8df7d2dc2ecc864ac192d694b369dd12d6 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -74,7 +74,7 @@ extern int cap_file_mmap(struct file *file, unsigned long reqprot,
  extern int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags);
  extern int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
                           unsigned long arg4, unsigned long arg5);
-extern int cap_task_setscheduler(struct task_struct *p, int policy, struct sched_param *lp);
+extern int cap_task_setscheduler(struct task_struct *p);
  extern int cap_task_setioprio(struct task_struct *p, int ioprio);
  extern int cap_task_setnice(struct task_struct *p, int nice);
  extern int cap_syslog(int type, bool from_file);
@@ -959,6 +959,12 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
   *     Sets the new child socket's sid to the openreq sid.
   * @inet_conn_established:
   *     Sets the connection's peersid to the secmark on skb.
+ * @secmark_relabel_packet:
+ *     check if the process should be allowed to relabel packets to the given secid
+ * @secmark_refcount_inc:
+ *     tells the LSM to increment the number of secmark labeling rules loaded
+ * @secmark_refcount_dec:
+ *     tells the LSM to decrement the number of secmark labeling rules loaded
   * @req_classify_flow:
   *     Sets the flow's sid to the openreq sid.
   * @tun_dev_create:
@@ -1279,9 +1285,13 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
   *     Return 0 if permission is granted.
   *
   * @secid_to_secctx:
- *     Convert secid to security context.
+ *     Convert secid to security context.  If secdata is NULL the length of
+ *     the result will be returned in seclen, but no secdata will be returned.
+ *     This does mean that the length could change between calls to check the
+ *     length and the next call which actually allocates and returns the secdata.
   *     @secid contains the security ID.
   *     @secdata contains the pointer that stores the converted security context.
+ *     @seclen pointer which contains the length of the data
   * @secctx_to_secid:
   *     Convert security context to secid.
   *     @secid contains the pointer to the generated security ID.
@@ -1501,8 +1511,7 @@ struct security_operations {
         int (*task_getioprio) (struct task_struct *p);
         int (*task_setrlimit) (struct task_struct *p, unsigned int resource,
                         struct rlimit *new_rlim);
-       int (*task_setscheduler) (struct task_struct *p, int policy,
-                                 struct sched_param *lp);
+       int (*task_setscheduler) (struct task_struct *p);
         int (*task_getscheduler) (struct task_struct *p);
         int (*task_movememory) (struct task_struct *p);
         int (*task_kill) (struct task_struct *p,
@@ -1594,6 +1603,9 @@ struct security_operations {
                                   struct request_sock *req);
         void (*inet_csk_clone) (struct sock *newsk, const struct request_sock *req);
         void (*inet_conn_established) (struct sock *sk, struct sk_buff *skb);
+       int (*secmark_relabel_packet) (u32 secid);
+       void (*secmark_refcount_inc) (void);
+       void (*secmark_refcount_dec) (void);
         void (*req_classify_flow) (const struct request_sock *req, struct flowi *fl);
         int (*tun_dev_create)(void);
         void (*tun_dev_post_create)(struct sock *sk);
@@ -1752,8 +1764,7 @@ int security_task_setioprio(struct task_struct *p, int ioprio);
  int security_task_getioprio(struct task_struct *p);
  int security_task_setrlimit(struct task_struct *p, unsigned int resource,
                 struct rlimit *new_rlim);
-int security_task_setscheduler(struct task_struct *p,
-                               int policy, struct sched_param *lp);
+int security_task_setscheduler(struct task_struct *p);
  int security_task_getscheduler(struct task_struct *p);
  int security_task_movememory(struct task_struct *p);
  int security_task_kill(struct task_struct *p, struct siginfo *info,
@@ -2320,11 +2331,9 @@ static inline int security_task_setrlimit(struct task_struct *p,
         return 0;
  }
  
-static inline int security_task_setscheduler(struct task_struct *p,
-                                            int policy,
-                                            struct sched_param *lp)
+static inline int security_task_setscheduler(struct task_struct *p)
  {
-       return cap_task_setscheduler(p, policy, lp);
+       return cap_task_setscheduler(p);
  }
  
  static inline int security_task_getscheduler(struct task_struct *p)
@@ -2551,6 +2560,9 @@ void security_inet_csk_clone(struct sock *newsk,
                         const struct request_sock *req);
  void security_inet_conn_established(struct sock *sk,
                         struct sk_buff *skb);
+int security_secmark_relabel_packet(u32 secid);
+void security_secmark_refcount_inc(void);
+void security_secmark_refcount_dec(void);
  int security_tun_dev_create(void);
  void security_tun_dev_post_create(struct sock *sk);
  int security_tun_dev_attach(struct sock *sk);
@@ -2705,6 +2717,19 @@ static inline void security_inet_conn_established(struct sock *sk,
  {
  }
  
+static inline int security_secmark_relabel_packet(u32 secid)
+{
+       return 0;
+}
+
+static inline void security_secmark_refcount_inc(void)
+{
+}
+
+static inline void security_secmark_refcount_dec(void)
+{
+}
+
  static inline int security_tun_dev_create(void)
  {
         return 0;
diff --git a/include/linux/selinux.h b/include/linux/selinux.h
index 82e0f26a12996a1bcce6efddd07e734313162c19..44f4596126904f7d0357654c999ed2ae13258c82 100644
--- a/include/linux/selinux.h
+++ b/include/linux/selinux.h
@@ -20,75 +20,12 @@ struct kern_ipc_perm;
  
  #ifdef CONFIG_SECURITY_SELINUX
  
-/**
- *     selinux_string_to_sid - map a security context string to a security ID
- *     @str: the security context string to be mapped
- *     @sid: ID value returned via this.
- *
- *     Returns 0 if successful, with the SID stored in sid.  A value
- *     of zero for sid indicates no SID could be determined (but no error
- *     occurred).
- */
-int selinux_string_to_sid(char *str, u32 *sid);
-
-/**
- *     selinux_secmark_relabel_packet_permission - secmark permission check
- *     @sid: SECMARK ID value to be applied to network packet
- *
- *     Returns 0 if the current task is allowed to set the SECMARK label of
- *     packets with the supplied security ID.  Note that it is implicit that
- *     the packet is always being relabeled from the default unlabeled value,
- *     and that the access control decision is made in the AVC.
- */
-int selinux_secmark_relabel_packet_permission(u32 sid);
-
-/**
- *     selinux_secmark_refcount_inc - increments the secmark use counter
- *
- *     SELinux keeps track of the current SECMARK targets in use so it knows
- *     when to apply SECMARK label access checks to network packets.  This
- *     function incements this reference count to indicate that a new SECMARK
- *     target has been configured.
- */
-void selinux_secmark_refcount_inc(void);
-
-/**
- *     selinux_secmark_refcount_dec - decrements the secmark use counter
- *
- *     SELinux keeps track of the current SECMARK targets in use so it knows
- *     when to apply SECMARK label access checks to network packets.  This
- *     function decements this reference count to indicate that one of the
- *     existing SECMARK targets has been removed/flushed.
- */
-void selinux_secmark_refcount_dec(void);
-
  /**
   * selinux_is_enabled - is SELinux enabled?
   */
  bool selinux_is_enabled(void);
  #else
  
-static inline int selinux_string_to_sid(const char *str, u32 *sid)
-{
-       *sid = 0;
-       return 0;
-}
-
-static inline int selinux_secmark_relabel_packet_permission(u32 sid)
-{
-       return 0;
-}
-
-static inline void selinux_secmark_refcount_inc(void)
-{
-       return;
-}
-
-static inline void selinux_secmark_refcount_dec(void)
-{
-       return;
-}
-
  static inline bool selinux_is_enabled(void)
  {
         return false;
diff --git a/include/linux/types.h b/include/linux/types.h
index 01a082f56ef423065adb66f621b11db2bd1d88f1..357dbc19606f8920f0b060ad04fb1d4fb2dfca00 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -121,7 +121,15 @@ typedef            __u64           u_int64_t;
  typedef                __s64           int64_t;
  #endif
  
-/* this is a special 64bit data type that is 8-byte aligned */
+/*
+ * aligned_u64 should be used in defining kernel<->userspace ABIs to avoid
+ * common 32/64-bit compat problems.
+ * 64-bit values align to 4-byte boundaries on x86_32 (and possibly other
+ * architectures) and to 8-byte boundaries on 64-bit architectures.  The new
+ * aligned_u64 type enforces 8-byte alignment so that structs containing
+ * aligned_u64 values have the same alignment on 32-bit and 64-bit architectures.
+ * No conversions are necessary between 32-bit user-space and a 64-bit kernel.
+ */
  #define aligned_u64 __u64 __attribute__((aligned(8)))
  #define aligned_be64 __be64 __attribute__((aligned(8)))
  #define aligned_le64 __le64 __attribute__((aligned(8)))
@@ -178,6 +186,11 @@ typedef __u64 __bitwise __be64;
  typedef __u16 __bitwise __sum16;
  typedef __u32 __bitwise __wsum;
  
+/* this is a special 64bit data type that is 8-byte aligned */
+#define __aligned_u64 __u64 __attribute__((aligned(8)))
+#define __aligned_be64 __be64 __attribute__((aligned(8)))
+#define __aligned_le64 __le64 __attribute__((aligned(8)))
+
  #ifdef __KERNEL__
  typedef unsigned __bitwise__ gfp_t;
  typedef unsigned __bitwise__ fmode_t;
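The comment added above explains why aligned_u64 matters for kernel<->userspace ABIs. A small userspace sketch of the effect, assuming GCC or Clang for the aligned attribute; every demo_ name is hypothetical and stands in for the kernel definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in mirroring the kernel macro (demo_ name is hypothetical). */
#define demo_aligned_u64 uint64_t __attribute__((aligned(8)))

struct demo_plain {
	uint32_t id;
	uint64_t value;		/* offset 4 on x86_32, 8 on x86_64 */
};

struct demo_fixed {
	uint32_t id;
	demo_aligned_u64 value;	/* offset 8 on both */
};

/* Offset of the 64-bit member once 8-byte alignment is forced. */
size_t demo_fixed_offset(void)
{
	return offsetof(struct demo_fixed, value);
}

/* Total size of the struct with forced alignment. */
size_t demo_fixed_size(void)
{
	return sizeof(struct demo_fixed);
}
```

Because struct demo_fixed has the same layout for 32-bit and 64-bit builds, a 64-bit kernel could read it from a 32-bit process without a compat conversion, which is the point the comment makes.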
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0836ccc5712146f87d13e9e35a9480e980708392..3efc9f3f43a0862cad51aa325e08c9f24d129f8e 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -614,6 +614,7 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
                 (wait)->private = current;                              \
                 (wait)->func = autoremove_wake_function;                \
                 INIT_LIST_HEAD(&(wait)->task_list);                     \
+               (wait)->flags = 0;                                      \
         } while (0)
  
  /**
diff --git a/include/media/videobuf-dma-sg.h b/include/media/videobuf-dma-sg.h
index 97e07f46a0fae22f1cba1eb3073f5661cc97b95a..aa4ebb42a5652b2261b1a2dbbbe133cbd5fbfe51 100644
--- a/include/media/videobuf-dma-sg.h
+++ b/include/media/videobuf-dma-sg.h
@@ -48,6 +48,7 @@ struct videobuf_dmabuf {
  
         /* for userland buffer */
         int                 offset;
+       size_t              size;
         struct page         **pages;
  
         /* for kernel buffers */
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index 27a902d9b3a9a431c6b3162a4c6fe479aa99504c..30fce0128dd72fa0281795bc0ae03809b85bc9d0 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -161,12 +161,30 @@ static inline struct sk_buff *bt_skb_send_alloc(struct sock *sk, unsigned long l
  {
         struct sk_buff *skb;
  
+       release_sock(sk);
         if ((skb = sock_alloc_send_skb(sk, len + BT_SKB_RESERVE, nb, err))) {
                 skb_reserve(skb, BT_SKB_RESERVE);
                 bt_cb(skb)->incoming  = 0;
         }
+       lock_sock(sk);
+
+       if (!skb && *err)
+               return NULL;
+
+       *err = sock_error(sk);
+       if (*err)
+               goto out;
+
+       if (sk->sk_shutdown) {
+               *err = -ECONNRESET;
+               goto out;
+       }
  
         return skb;
+
+out:
+       kfree_skb(skb);
+       return NULL;
  }
  
  int bt_err(__u16 code);
diff --git a/ipc/sem.c b/ipc/sem.c
index 40a8f462a8224b298690cb07892f93afe8c15214..0e0d49bbb867f239be5690968227c53e7c0226c0 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -743,6 +743,8 @@ static unsigned long copy_semid_to_user(void __user *buf, struct semid64_ds *in,
             {
                 struct semid_ds out;
  
+               memset(&out, 0, sizeof(out));
+
                 ipc64_perm_to_ipc_perm(&in->sem_perm, &out.sem_perm);
  
                 out.sem_otime   = in->sem_otime;
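The memset() added above clears the on-stack struct before its fields are assigned, presumably so that padding and any unassigned members do not carry stale stack bytes out to userspace. A userspace sketch of the same pattern; struct demo_out and both functions are hypothetical, and the check relies on member stores leaving padding untouched, which holds with common compilers but is not guaranteed by the C standard:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct with padding between its members on most ABIs. */
struct demo_out {
	char tag;
	long value;
};

/* Zero the whole object first, then assign fields, as the patched code does. */
void demo_fill(struct demo_out *dst, char tag, long value)
{
	memset(dst, 0, sizeof(*dst));
	dst->tag = tag;
	dst->value = value;
}

/* Returns 1 if every padding byte between tag and value is zero. */
int demo_padding_is_zero(const struct demo_out *p)
{
	const unsigned char *bytes = (const unsigned char *)p;
	size_t i;

	for (i = sizeof(p->tag); i < offsetof(struct demo_out, value); i++)
		if (bytes[i])
			return 0;
	return 1;
}
```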
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index b23c0979bbe7212a748a9aac70697e92ee54f448..51b143e2a07a49603d9aa462728f6a45032c01d9 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1397,7 +1397,7 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
         if (tsk->flags & PF_THREAD_BOUND)
                 return -EINVAL;
  
-       ret = security_task_setscheduler(tsk, 0, NULL);
+       ret = security_task_setscheduler(tsk);
         if (ret)
                 return ret;
         if (threadgroup) {
@@ -1405,7 +1405,7 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
  
                 rcu_read_lock();
                 list_for_each_entry_rcu(c, &tsk->thread_group, thread_group) {
-                       ret = security_task_setscheduler(c, 0, NULL);
+                       ret = security_task_setscheduler(c);
                         if (ret) {
                                 rcu_read_unlock();
                                 return ret;
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 1decafbb6b1a28197b768021cc987e230d352b5b..72206cf5c6cf854898d889a6a645e44febdd526f 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -931,6 +931,7 @@ static inline int
  remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base)
  {
         if (hrtimer_is_queued(timer)) {
+               unsigned long state;
                 int reprogram;
  
                 /*
@@ -944,8 +945,13 @@ remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base)
                 debug_deactivate(timer);
                 timer_stats_hrtimer_clear_start_info(timer);
                 reprogram = base->cpu_base == &__get_cpu_var(hrtimer_bases);
-               __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE,
-                                reprogram);
+               /*
+                * We must preserve the CALLBACK state flag here,
+                * otherwise we could move the timer base in
+                * switch_hrtimer_base.
+                */
+               state = timer->state & HRTIMER_STATE_CALLBACK;
+               __remove_hrtimer(timer, base, state, reprogram);
                 return 1;
         }
         return 0;
@@ -1231,6 +1237,9 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
                 BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
                 enqueue_hrtimer(timer, base);
         }
+
+       WARN_ON_ONCE(!(timer->state & HRTIMER_STATE_CALLBACK));
+
         timer->state &= ~HRTIMER_STATE_CALLBACK;
  }
  
diff --git a/kernel/kfifo.c b/kernel/kfifo.c
index 6b5580c57644dc1804b02fd85648e7100d5d75dd..01a0700e873f53ca60084da3c0c1142bebf49b16 100644
--- a/kernel/kfifo.c
+++ b/kernel/kfifo.c
@@ -365,8 +365,6 @@ static unsigned int setup_sgl(struct __kfifo *fifo, struct scatterlist *sgl,
         n = setup_sgl_buf(sgl, fifo->data + off, nents, l);
         n += setup_sgl_buf(sgl + n, fifo->data, nents - n, len - l);
  
-       if (n)
-               sg_mark_end(sgl + n - 1);
         return n;
  }
  
diff --git a/kernel/module.c b/kernel/module.c
index d0b5f8db11b4a4183c229e2a44f433a99f58090a..ccd641991842f4990906946f895fd5062a0389c3 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1537,6 +1537,7 @@ static int __unlink_module(void *_mod)
  {
         struct module *mod = _mod;
         list_del(&mod->list);
+       module_bug_cleanup(mod);
         return 0;
  }
  
@@ -2625,6 +2626,7 @@ static struct module *load_module(void __user *umod,
         if (err < 0)
                 goto ddebug;
  
+       module_bug_finalize(info.hdr, info.sechdrs, mod);
         list_add_rcu(&mod->list, &modules);
         mutex_unlock(&module_mutex);
  
@@ -2650,6 +2652,8 @@ static struct module *load_module(void __user *umod,
         mutex_lock(&module_mutex);
         /* Unlink carefully: kallsyms could be walking list. */
         list_del_rcu(&mod->list);
+       module_bug_cleanup(mod);
+
   ddebug:
         if (!mod->taints)
                 dynamic_debug_remove(info.debug);
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index db5b56064687e453c0df1cc118b975ea047bdcae..b98bed3d8182685ffff2b840c0ff4809b4f65cb8 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2202,15 +2202,13 @@ static void perf_event_for_each(struct perf_event *event,
  static int perf_event_period(struct perf_event *event, u64 __user *arg)
  {
         struct perf_event_context *ctx = event->ctx;
-       unsigned long size;
         int ret = 0;
         u64 value;
  
         if (!event->attr.sample_period)
                 return -EINVAL;
  
-       size = copy_from_user(&value, arg, sizeof(value));
-       if (size != sizeof(value))
+       if (copy_from_user(&value, arg, sizeof(value)))
                 return -EFAULT;
  
         if (!value)
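The fix above depends on the copy_from_user() convention: it returns the number of bytes that could NOT be copied, so 0 means success and the old comparison against sizeof(value) failed every successful call. A userspace sketch of that convention; mock_copy_from_user() and demo_read_period() are hypothetical stand-ins, not kernel functions:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for copy_from_user(): returns bytes NOT copied. */
unsigned long mock_copy_from_user(void *to, const void *from, unsigned long n)
{
	if (!from)
		return n;	/* simulate a fault: nothing was copied */
	memcpy(to, from, n);
	return 0;		/* everything was copied */
}

/* Same check shape as the patched perf_event_period(). */
int demo_read_period(unsigned long long *value, const void *arg)
{
	if (mock_copy_from_user(value, arg, sizeof(*value)))
		return -EFAULT;
	return 0;
}
```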
diff --git a/kernel/sched.c b/kernel/sched.c
index dc85ceb908322cad7196339f4df8dd58c37b1cec..df6579d9b4dfe058a2287fee93dffc762b408770 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4645,7 +4645,7 @@ recheck:
         }
  
         if (user) {
-               retval = security_task_setscheduler(p, policy, param);
+               retval = security_task_setscheduler(p);
                 if (retval)
                         return retval;
         }
@@ -4887,7 +4887,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
         if (!check_same_owner(p) && !capable(CAP_SYS_NICE))
                 goto out_unlock;
  
-       retval = security_task_setscheduler(p, 0, NULL);
+       retval = security_task_setscheduler(p);
         if (retval)
                 goto out_unlock;
  
diff --git a/kernel/signal.c b/kernel/signal.c
index bded65187780f5f288bd779920a5c04c190528dc..919562c3d6b720d58ff246b2c412114d77c0b419 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2214,6 +2214,14 @@ int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from)
                 err |= __put_user(from->si_addr, &to->si_addr);
  #ifdef __ARCH_SI_TRAPNO
                 err |= __put_user(from->si_trapno, &to->si_trapno);
+#endif
+#ifdef BUS_MCEERR_AO
+               /*
+                * Other callers might not initialize the si_lsb field,
+                * so check explicitly for the right codes here.
+                */
+               if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
+                       err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
  #endif
                 break;
         case __SI_CHLD:
diff --git a/kernel/smp.c b/kernel/smp.c
index 75c970c715d399f1385d72e92e0c47c509e568dc..ed6aacfcb7efb307fe313ea798e7074f2c8f4f92 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -365,9 +365,10 @@ call:
  EXPORT_SYMBOL_GPL(smp_call_function_any);
  
  /**
- * __smp_call_function_single(): Run a function on another CPU
+ * __smp_call_function_single(): Run a function on a specific CPU
   * @cpu: The CPU to run on.
   * @data: Pre-allocated and setup data structure
+ * @wait: If true, wait until function has completed on specified CPU.
   *
   * Like smp_call_function_single(), but allow caller to pass in a
   * pre-allocated data structure. Useful for embedding @data inside
@@ -376,8 +377,10 @@ EXPORT_SYMBOL_GPL(smp_call_function_any);
  void __smp_call_function_single(int cpu, struct call_single_data *data,
                                 int wait)
  {
-       csd_lock(data);
+       unsigned int this_cpu;
+       unsigned long flags;
  
+       this_cpu = get_cpu();
         /*
          * Can deadlock when called with interrupts disabled.
          * We allow cpu's that are not yet online though, as no one else can
@@ -387,7 +390,15 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
         WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
                      && !oops_in_progress);
  
-       generic_exec_single(cpu, data, wait);
+       if (cpu == this_cpu) {
+               local_irq_save(flags);
+               data->func(data->info);
+               local_irq_restore(flags);
+       } else {
+               csd_lock(data);
+               generic_exec_single(cpu, data, wait);
+       }
+       put_cpu();
  }
  
  /**
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f88552c6d2275be1216187f07b1e0e1b22b93af2..3a45c224770fb82fa4bd76f9c7d4f2f989ee5aa9 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2485,7 +2485,7 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
                 kbuf[left] = 0;
         }
  
-       for (; left && vleft--; i++, min++, max++, first=0) {
+       for (; left && vleft--; i++, first = 0) {
                 unsigned long val;
  
                 if (write) {
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 04cdcf72c827e7601cdca63ab4c54a3f16473c50..10b90d8a03c48678258c6aaf3de353af7b06ed36 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -143,15 +143,6 @@ int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table)
                                 if (!table->maxlen)
                                         set_fail(&fail, table, "No maxlen");
                         }
-                       if ((table->proc_handler == proc_doulongvec_minmax) ||
-                           (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
-                               if (table->maxlen > sizeof (unsigned long)) {
-                                       if (!table->extra1)
-                                               set_fail(&fail, table, "No min");
-                                       if (!table->extra2)
-                                               set_fail(&fail, table, "No max");
-                               }
-                       }
  #ifdef CONFIG_PROC_SYSCTL
                         if (table->procname && !table->proc_handler)
                                 set_fail(&fail, table, "No proc_handler");
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 492197e2f86cda2792603186b59ad3fdd17c448d..bca96377fd4e8667df513dfd91ea56931e35578f 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -405,7 +405,7 @@ static inline int test_time_stamp(u64 delta)
  #define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
  
  /* Max number of timestamps that can fit on a page */
-#define RB_TIMESTAMPS_PER_PAGE (BUF_PAGE_SIZE / RB_LEN_TIME_STAMP)
+#define RB_TIMESTAMPS_PER_PAGE (BUF_PAGE_SIZE / RB_LEN_TIME_EXTEND)
  
  int ring_buffer_print_page_header(struct trace_seq *s)
  {
diff --git a/lib/bug.c b/lib/bug.c
index 7cdfad88128fa5d3d3076d552fbde276ef500d66..19552096d16b06bd2dac2a9b10212e482a4d0da3 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -72,8 +72,8 @@ static const struct bug_entry *module_find_bug(unsigned long bugaddr)
         return NULL;
  }
  
-int module_bug_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
-                       struct module *mod)
+void module_bug_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
+                        struct module *mod)
  {
         char *secstrings;
         unsigned int i;
@@ -97,8 +97,6 @@ int module_bug_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
          * could potentially lead to deadlock and thus be counter-productive.
          */
         list_add(&mod->bug_list, &module_bug_list);
-
-       return 0;
  }
  
  void module_bug_cleanup(struct module *mod)
diff --git a/lib/list_sort.c b/lib/list_sort.c
index 4b5cb794c38bb270b8b72b70a47de265ec05c210..a7616fa3162e844f5b3c7090c1911543c826147e 100644
--- a/lib/list_sort.c
+++ b/lib/list_sort.c
@@ -70,7 +70,7 @@ static void merge_and_restore_back_links(void *priv,
                  * element comparison is needed, so the client's cmp()
                  * routine can invoke cond_resched() periodically.
                  */
-               (*cmp)(priv, tail, tail);
+               (*cmp)(priv, tail->next, tail->next);
  
                 tail->next->prev = tail;
                 tail = tail->next;
diff --git a/mm/ksm.c b/mm/ksm.c
index b1873cf03ed986bcb062259da8f1c4093a97160b..65ab5c7067d994ad934c4f4bd5fd5809235a0756 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -712,7 +712,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
         if (!ptep)
                 goto out;
  
-       if (pte_write(*ptep)) {
+       if (pte_write(*ptep) || pte_dirty(*ptep)) {
                 pte_t entry;
  
                 swapped = PageSwapCache(page);
@@ -735,7 +735,9 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
                         set_pte_at(mm, addr, ptep, entry);
                         goto out_unlock;
                 }
-               entry = pte_wrprotect(entry);
+               if (pte_dirty(entry))
+                       set_page_dirty(page);
+               entry = pte_mkclean(pte_wrprotect(entry));
                 set_pte_at_notify(mm, addr, ptep, entry);
         }
         *orig_pte = *ptep;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3eed583895a6f31eb434a252697371c7daa6bbd4..9be3cf8a5da462d4b1b4103eef61f8d5a9a6e06c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3587,9 +3587,13 @@ unlock:
  
  static void mem_cgroup_threshold(struct mem_cgroup *memcg)
  {
-       __mem_cgroup_threshold(memcg, false);
-       if (do_swap_account)
-               __mem_cgroup_threshold(memcg, true);
+       while (memcg) {
+               __mem_cgroup_threshold(memcg, false);
+               if (do_swap_account)
+                       __mem_cgroup_threshold(memcg, true);
+
+               memcg = parent_mem_cgroup(memcg);
+       }
  }
  
  static int compare_thresholds(const void *a, const void *b)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9c26eeca13425886690cddaf6dd45954fd3f0097..757f6b0accfe84d959b7fe5899b5916ad0ed1f14 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -183,7 +183,7 @@ EXPORT_SYMBOL_GPL(hwpoison_filter);
   * signal.
   */
  static int kill_proc_ao(struct task_struct *t, unsigned long addr, int trapno,
-                       unsigned long pfn)
+                       unsigned long pfn, struct page *page)
  {
         struct siginfo si;
         int ret;
@@ -198,7 +198,7 @@ static int kill_proc_ao(struct task_struct *t, unsigned long addr, int trapno,
  #ifdef __ARCH_SI_TRAPNO
         si.si_trapno = trapno;
  #endif
-       si.si_addr_lsb = PAGE_SHIFT;
+       si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;
         /*
          * Don't use force here, it's convenient if the signal
          * can be temporarily blocked.
@@ -235,7 +235,7 @@ void shake_page(struct page *p, int access)
                 int nr;
                 do {
                         nr = shrink_slab(1000, GFP_KERNEL, 1000);
-                       if (page_count(p) == 0)
+                       if (page_count(p) == 1)
                                 break;
                 } while (nr > 10);
         }
@@ -327,7 +327,7 @@ static void add_to_kill(struct task_struct *tsk, struct page *p,
   * wrong earlier.
   */
  static void kill_procs_ao(struct list_head *to_kill, int doit, int trapno,
-                         int fail, unsigned long pfn)
+                         int fail, struct page *page, unsigned long pfn)
  {
         struct to_kill *tk, *next;
  
@@ -352,7 +352,7 @@ static void kill_procs_ao(struct list_head *to_kill, int doit, int trapno,
                          * process anyways.
                          */
                         else if (kill_proc_ao(tk->tsk, tk->addr, trapno,
-                                             pfn) < 0)
+                                             pfn, page) < 0)
                                 printk(KERN_ERR
                 "MCE %#lx: Cannot send advisory machine check signal to %s:%d\n",
                                         pfn, tk->tsk->comm, tk->tsk->pid);
@@ -928,7 +928,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
          * any accesses to the poisoned memory.
          */
         kill_procs_ao(&tokill, !!PageDirty(hpage), trapno,
-                     ret != SWAP_SUCCESS, pfn);
+                     ret != SWAP_SUCCESS, p, pfn);
  
         return ret;
  }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8cfa9cc6e86e5d6912a39bc3f5c9d18fb97b7cf..f12ad1836abe115b1b8e3bf4b9187c01249e8a30 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5182,9 +5182,9 @@ void *__init alloc_large_system_hash(const char *tablename,
         if (!table)
                 panic("Failed to allocate %s hash table\n", tablename);
  
-       printk(KERN_INFO "%s hash table entries: %d (order: %d, %lu bytes)\n",
+       printk(KERN_INFO "%s hash table entries: %ld (order: %d, %lu bytes)\n",
                tablename,
-              (1U << log2qty),
+              (1UL << log2qty),
                ilog2(size) - PAGE_SHIFT,
                size);
  
diff --git a/mm/rmap.c b/mm/rmap.c
index 9d2ba01bd4f91d3db707524cc9bafd5e81f623b4..92e6757f196ed4e3b3598c1f8b7214616a4cbe39 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -381,7 +381,13 @@ vma_address(struct page *page, struct vm_area_struct *vma)
  unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
  {
         if (PageAnon(page)) {
-               if (vma->anon_vma->root != page_anon_vma(page)->root)
+               struct anon_vma *page__anon_vma = page_anon_vma(page);
+               /*
+                * Note: swapoff's unuse_vma() is more efficient with this
+                * check, and needs it to match anon_vma when KSM is active.
+                */
+               if (!vma->anon_vma || !page__anon_vma ||
+                   vma->anon_vma->root != page__anon_vma->root)
                         return -EFAULT;
         } else if (page->mapping && !(vma->vm_flags & VM_NONLINEAR)) {
                 if (!vma->vm_file ||
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 01ddb0472f86c511f49ff8a602a726dd60550ff0..0eb96f7e44befb0155e364749f38eb37af0c2354 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -24,8 +24,11 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
  
         if (vlan_dev)
                 skb->dev = vlan_dev;
-       else if (vlan_id)
-               goto drop;
+       else if (vlan_id) {
+               if (!(skb->dev->flags & IFF_PROMISC))
+                       goto drop;
+               skb->pkt_type = PACKET_OTHERHOST;
+       }
  
         return (polling ? netif_receive_skb(skb) : netif_rx(skb));
  
@@ -102,8 +105,11 @@ vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
  
         if (vlan_dev)
                 skb->dev = vlan_dev;
-       else if (vlan_id)
-               goto drop;
+       else if (vlan_id) {
+               if (!(skb->dev->flags & IFF_PROMISC))
+                       goto drop;
+               skb->pkt_type = PACKET_OTHERHOST;
+       }
  
         for (p = napi->gro_list; p; p = p->next) {
                 NAPI_GRO_CB(p)->same_flow =
diff --git a/net/Kconfig b/net/Kconfig
index e926884c1675c04c3d150ddc83eaa89f91a0cb4a..55fd82e9ffd91e9fd48878147f3068923373ce16 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -293,6 +293,7 @@ source "net/wimax/Kconfig"
  source "net/rfkill/Kconfig"
  source "net/9p/Kconfig"
  source "net/caif/Kconfig"
+source "net/ceph/Kconfig"
  
  
  endif   # if NET
diff --git a/net/Makefile b/net/Makefile
index ea60fbce9b1ba3e623ee9f1ec9ce622169a3596d..6b7bfd7f1416d9950e90cb3ddd065d998c0e78ee 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -68,3 +68,4 @@ obj-$(CONFIG_SYSCTL)          += sysctl_net.o
  endif
  obj-$(CONFIG_WIMAX)            += wimax/
  obj-$(CONFIG_DNS_RESOLVER)     += dns_resolver/
+obj-$(CONFIG_CEPH_LIB)         += ceph/
diff --git a/net/atm/mpc.c b/net/atm/mpc.c
index 622b471e14e03dbc3752697851022a59aebffbe0..74bcc662c3dd8c5e7ec33a393560cc9bbf313b57 100644
--- a/net/atm/mpc.c
+++ b/net/atm/mpc.c
@@ -778,7 +778,7 @@ static void mpc_push(struct atm_vcc *vcc, struct sk_buff *skb)
         eg->packets_rcvd++;
         mpc->eg_ops->put(eg);
  
-       memset(ATM_SKB(skb), 0, sizeof(struct atm_skb_data));
+       memset(ATM_SKB(new_skb), 0, sizeof(struct atm_skb_data));
         netif_rx(new_skb);
  }
  
diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index fadf26b4ed7c432eba09800b4532683fc44cc02d..0b54b7dd84010a52147a54155c4f8db2b61752a7 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -1441,33 +1441,23 @@ static inline void l2cap_do_send(struct sock *sk, struct sk_buff *skb)
  
  static void l2cap_streaming_send(struct sock *sk)
  {
-       struct sk_buff *skb, *tx_skb;
+       struct sk_buff *skb;
         struct l2cap_pinfo *pi = l2cap_pi(sk);
         u16 control, fcs;
  
-       while ((skb = sk->sk_send_head)) {
-               tx_skb = skb_clone(skb, GFP_ATOMIC);
-
-               control = get_unaligned_le16(tx_skb->data + L2CAP_HDR_SIZE);
+       while ((skb = skb_dequeue(TX_QUEUE(sk)))) {
+               control = get_unaligned_le16(skb->data + L2CAP_HDR_SIZE);
                 control |= pi->next_tx_seq << L2CAP_CTRL_TXSEQ_SHIFT;
-               put_unaligned_le16(control, tx_skb->data + L2CAP_HDR_SIZE);
+               put_unaligned_le16(control, skb->data + L2CAP_HDR_SIZE);
  
                 if (pi->fcs == L2CAP_FCS_CRC16) {
-                       fcs = crc16(0, (u8 *)tx_skb->data, tx_skb->len - 2);
-                       put_unaligned_le16(fcs, tx_skb->data + tx_skb->len - 2);
+                       fcs = crc16(0, (u8 *)skb->data, skb->len - 2);
+                       put_unaligned_le16(fcs, skb->data + skb->len - 2);
                 }
  
-               l2cap_do_send(sk, tx_skb);
+               l2cap_do_send(sk, skb);
  
                 pi->next_tx_seq = (pi->next_tx_seq + 1) % 64;
-
-               if (skb_queue_is_last(TX_QUEUE(sk), skb))
-                       sk->sk_send_head = NULL;
-               else
-                       sk->sk_send_head = skb_queue_next(TX_QUEUE(sk), skb);
-
-               skb = skb_dequeue(TX_QUEUE(sk));
-               kfree_skb(skb);
         }
  }
  
@@ -1960,6 +1950,11 @@ static int l2cap_sock_setsockopt_old(struct socket *sock, int optname, char __us
  
         switch (optname) {
         case L2CAP_OPTIONS:
+               if (sk->sk_state == BT_CONNECTED) {
+                       err = -EINVAL;
+                       break;
+               }
+
                 opts.imtu     = l2cap_pi(sk)->imtu;
                 opts.omtu     = l2cap_pi(sk)->omtu;
                 opts.flush_to = l2cap_pi(sk)->flush_to;
@@ -2771,10 +2766,10 @@ static int l2cap_parse_conf_rsp(struct sock *sk, void *rsp, int len, void *data,
                 case L2CAP_CONF_MTU:
                         if (val < L2CAP_DEFAULT_MIN_MTU) {
                                 *result = L2CAP_CONF_UNACCEPT;
-                               pi->omtu = L2CAP_DEFAULT_MIN_MTU;
+                               pi->imtu = L2CAP_DEFAULT_MIN_MTU;
                         } else
-                               pi->omtu = val;
-                       l2cap_add_conf_opt(&ptr, L2CAP_CONF_MTU, 2, pi->omtu);
+                               pi->imtu = val;
+                       l2cap_add_conf_opt(&ptr, L2CAP_CONF_MTU, 2, pi->imtu);
                         break;
  
                 case L2CAP_CONF_FLUSH_TO:
@@ -3071,6 +3066,17 @@ static inline int l2cap_connect_rsp(struct l2cap_conn *conn, struct l2cap_cmd_hd
         return 0;
  }
  
+static inline void set_default_fcs(struct l2cap_pinfo *pi)
+{
+       /* FCS is enabled only in ERTM or streaming mode, if one or both
+        * sides request it.
+        */
+       if (pi->mode != L2CAP_MODE_ERTM && pi->mode != L2CAP_MODE_STREAMING)
+               pi->fcs = L2CAP_FCS_NONE;
+       else if (!(pi->conf_state & L2CAP_CONF_NO_FCS_RECV))
+               pi->fcs = L2CAP_FCS_CRC16;
+}
+
  static inline int l2cap_config_req(struct l2cap_conn *conn, struct l2cap_cmd_hdr *cmd, u16 cmd_len, u8 *data)
  {
         struct l2cap_conf_req *req = (struct l2cap_conf_req *) data;
@@ -3088,14 +3094,8 @@ static inline int l2cap_config_req(struct l2cap_conn *conn, struct l2cap_cmd_hdr
         if (!sk)
                 return -ENOENT;
  
-       if (sk->sk_state != BT_CONFIG) {
-               struct l2cap_cmd_rej rej;
-
-               rej.reason = cpu_to_le16(0x0002);
-               l2cap_send_cmd(conn, cmd->ident, L2CAP_COMMAND_REJ,
-                               sizeof(rej), &rej);
+       if (sk->sk_state == BT_DISCONN)
                 goto unlock;
-       }
  
         /* Reject if config buffer is too small. */
         len = cmd_len - sizeof(*req);
@@ -3135,9 +3135,7 @@ static inline int l2cap_config_req(struct l2cap_conn *conn, struct l2cap_cmd_hdr
                 goto unlock;
  
         if (l2cap_pi(sk)->conf_state & L2CAP_CONF_INPUT_DONE) {
-               if (!(l2cap_pi(sk)->conf_state & L2CAP_CONF_NO_FCS_RECV) ||
-                   l2cap_pi(sk)->fcs != L2CAP_FCS_NONE)
-                       l2cap_pi(sk)->fcs = L2CAP_FCS_CRC16;
+               set_default_fcs(l2cap_pi(sk));
  
                 sk->sk_state = BT_CONNECTED;
  
@@ -3225,9 +3223,7 @@ static inline int l2cap_config_rsp(struct l2cap_conn *conn, struct l2cap_cmd_hdr
         l2cap_pi(sk)->conf_state |= L2CAP_CONF_INPUT_DONE;
  
         if (l2cap_pi(sk)->conf_state & L2CAP_CONF_OUTPUT_DONE) {
-               if (!(l2cap_pi(sk)->conf_state & L2CAP_CONF_NO_FCS_RECV) ||
-                   l2cap_pi(sk)->fcs != L2CAP_FCS_NONE)
-                       l2cap_pi(sk)->fcs = L2CAP_FCS_CRC16;
+               set_default_fcs(l2cap_pi(sk));
  
                 sk->sk_state = BT_CONNECTED;
                 l2cap_pi(sk)->next_tx_seq = 0;
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 44a623275951e4b481abf1942fb2587867891dca..194b3a04cfd38a3b4a13817d5aecace4f355ea49 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -82,11 +82,14 @@ static void rfcomm_sk_data_ready(struct rfcomm_dlc *d, struct sk_buff *skb)
  static void rfcomm_sk_state_change(struct rfcomm_dlc *d, int err)
  {
         struct sock *sk = d->owner, *parent;
+       unsigned long flags;
+
         if (!sk)
                 return;
  
         BT_DBG("dlc %p state %ld err %d", d, d->state, err);
  
+       local_irq_save(flags);
         bh_lock_sock(sk);
  
         if (err)
@@ -108,6 +111,7 @@ static void rfcomm_sk_state_change(struct rfcomm_dlc *d, int err)
         }
  
         bh_unlock_sock(sk);
+       local_irq_restore(flags);
  
         if (parent && sock_flag(sk, SOCK_ZAPPED)) {
                 /* We have to drop DLC lock here, otherwise
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 8ce9047861166740a17a2448cb1d4f668fba1d21..4bf28f25f368b399a6ef220e06c08c0f5d2621f5 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -827,6 +827,7 @@ static int caif_connect(struct socket *sock, struct sockaddr *uaddr,
         long timeo;
         int err;
         int ifindex, headroom, tailroom;
+       unsigned int mtu;
         struct net_device *dev;
  
         lock_sock(sk);
@@ -896,15 +897,23 @@ static int caif_connect(struct socket *sock, struct sockaddr *uaddr,
                 cf_sk->sk.sk_state = CAIF_DISCONNECTED;
                 goto out;
         }
-       dev = dev_get_by_index(sock_net(sk), ifindex);
+
+       err = -ENODEV;
+       rcu_read_lock();
+       dev = dev_get_by_index_rcu(sock_net(sk), ifindex);
+       if (!dev) {
+               rcu_read_unlock();
+               goto out;
+       }
         cf_sk->headroom = LL_RESERVED_SPACE_EXTRA(dev, headroom);
+       mtu = dev->mtu;
+       rcu_read_unlock();
+
         cf_sk->tailroom = tailroom;
-       cf_sk->maxframe = dev->mtu - (headroom + tailroom);
-       dev_put(dev);
+       cf_sk->maxframe = mtu - (headroom + tailroom);
         if (cf_sk->maxframe < 1) {
-               pr_warning("CAIF: %s(): CAIF Interface MTU too small (%d)\n",
-                       __func__, dev->mtu);
-               err = -ENODEV;
+               pr_warning("CAIF: %s(): CAIF Interface MTU too small (%u)\n",
+                          __func__, mtu);
                 goto out;
         }
  
diff --git a/net/ceph/Kconfig b/net/ceph/Kconfig
new file mode 100644
index 0000000..ad42404
--- /dev/null
+++ b/net/ceph/Kconfig
@@ -0,0 +1,28 @@
+config CEPH_LIB
+        tristate "Ceph core library (EXPERIMENTAL)"
+       depends on INET && EXPERIMENTAL
+       select LIBCRC32C
+       select CRYPTO_AES
+       select CRYPTO
+       default n
+       help
+         Choose Y or M here to include cephlib, which provides
+         functionality common to both the Ceph filesystem and
+         the rados block device (rbd).
+
+         More information at http://ceph.newdream.net/.
+
+         If unsure, say N.
+
+config CEPH_LIB_PRETTYDEBUG
+       bool "Include file:line in ceph debug output"
+       depends on CEPH_LIB
+       default n
+       help
+         If you say Y here, debug output will include a filename and
+         line to aid debugging.  This increases kernel size and slows
+         execution slightly when debug call sites are enabled (e.g.,
+         via CONFIG_DYNAMIC_DEBUG).
+
+         If unsure, say N.
+
diff --git a/net/ceph/Makefile b/net/ceph/Makefile
new file mode 100644
index 0000000..aab1cab
--- /dev/null
+++ b/net/ceph/Makefile
@@ -0,0 +1,37 @@
+#
+# Makefile for CEPH filesystem.
+#
+
+ifneq ($(KERNELRELEASE),)
+
+obj-$(CONFIG_CEPH_LIB) += libceph.o
+
+libceph-objs := ceph_common.o messenger.o msgpool.o buffer.o pagelist.o \
+       mon_client.o \
+       osd_client.o osdmap.o crush/crush.o crush/mapper.o crush/hash.o \
+       debugfs.o \
+       auth.o auth_none.o \
+       crypto.o armor.o \
+       auth_x.o \
+       ceph_fs.o ceph_strings.o ceph_hash.o \
+       pagevec.o
+
+else
+# Otherwise we were called directly from the command
+# line; invoke the kernel build system.
+
+KERNELDIR ?= /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+
+default: all
+
+all:
+       $(MAKE) -C $(KERNELDIR) M=$(PWD) CONFIG_CEPH_LIB=m modules
+
+modules_install:
+       $(MAKE) -C $(KERNELDIR) M=$(PWD) CONFIG_CEPH_LIB=m modules_install
+
+clean:
+       $(MAKE) -C $(KERNELDIR) M=$(PWD) clean
+
+endif
diff --git a/net/ceph/armor.c b/net/ceph/armor.c
new file mode 100644
index 0000000..eb2a666
--- /dev/null
+++ b/net/ceph/armor.c
@@ -0,0 +1,103 @@
+
+#include <linux/errno.h>
+
+int ceph_armor(char *dst, const char *src, const char *end);
+int ceph_unarmor(char *dst, const char *src, const char *end);
+
+/*
+ * base64 encode/decode.
+ */
+
+static const char *pem_key =
+       "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+
+static int encode_bits(int c)
+{
+       return pem_key[c];
+}
+
+static int decode_bits(char c)
+{
+       if (c >= 'A' && c <= 'Z')
+               return c - 'A';
+       if (c >= 'a' && c <= 'z')
+               return c - 'a' + 26;
+       if (c >= '0' && c <= '9')
+               return c - '0' + 52;
+       if (c == '+')
+               return 62;
+       if (c == '/')
+               return 63;
+       if (c == '=')
+               return 0; /* just non-negative, please */
+       return -EINVAL;
+}
+
+int ceph_armor(char *dst, const char *src, const char *end)
+{
+       int olen = 0;
+       int line = 0;
+
+       while (src < end) {
+               unsigned char a, b, c;
+
+               a = *src++;
+               *dst++ = encode_bits(a >> 2);
+               if (src < end) {
+                       b = *src++;
+                       *dst++ = encode_bits(((a & 3) << 4) | (b >> 4));
+                       if (src < end) {
+                               c = *src++;
+                               *dst++ = encode_bits(((b & 15) << 2) |
+                                                    (c >> 6));
+                               *dst++ = encode_bits(c & 63);
+                       } else {
+                               *dst++ = encode_bits((b & 15) << 2);
+                               *dst++ = '=';
+                       }
+               } else {
+                       *dst++ = encode_bits(((a & 3) << 4));
+                       *dst++ = '=';
+                       *dst++ = '=';
+               }
+               olen += 4;
+               line += 4;
+               if (line == 64) {
+                       line = 0;
+                       *(dst++) = '\n';
+                       olen++;
+               }
+       }
+       return olen;
+}
+
+int ceph_unarmor(char *dst, const char *src, const char *end)
+{
+       int olen = 0;
+
+       while (src < end) {
+               int a, b, c, d;
+
+               if (src < end && src[0] == '\n')
+                       src++;
+               if (src + 4 > end)
+                       return -EINVAL;
+               a = decode_bits(src[0]);
+               b = decode_bits(src[1]);
+               c = decode_bits(src[2]);
+               d = decode_bits(src[3]);
+               if (a < 0 || b < 0 || c < 0 || d < 0)
+                       return -EINVAL;
+
+               *dst++ = (a << 2) | (b >> 4);
+               if (src[2] == '=')
+                       return olen + 1;
+               *dst++ = ((b & 15) << 4) | (c >> 2);
+               if (src[3] == '=')
+                       return olen + 2;
+               *dst++ = ((c & 3) << 6) | d;
+               olen += 3;
+               src += 4;
+       }
+       return olen;
+}
diff --git a/net/ceph/auth.c b/net/ceph/auth.c
new file mode 100644
index 0000000..549c1f4
--- /dev/null
+++ b/net/ceph/auth.c
@@ -0,0 +1,259 @@
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/module.h>
+#include <linux/err.h>
+#include <linux/slab.h>
+
+#include <linux/ceph/types.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/messenger.h>
+#include "auth_none.h"
+#include "auth_x.h"
+
+
+/*
+ * get protocol handler
+ */
+static u32 supported_protocols[] = {
+       CEPH_AUTH_NONE,
+       CEPH_AUTH_CEPHX
+};
+
+static int ceph_auth_init_protocol(struct ceph_auth_client *ac, int protocol)
+{
+       switch (protocol) {
+       case CEPH_AUTH_NONE:
+               return ceph_auth_none_init(ac);
+       case CEPH_AUTH_CEPHX:
+               return ceph_x_init(ac);
+       default:
+               return -ENOENT;
+       }
+}
+
+/*
+ * setup, teardown.
+ */
+struct ceph_auth_client *ceph_auth_init(const char *name, const char *secret)
+{
+       struct ceph_auth_client *ac;
+       int ret;
+
+       dout("auth_init name '%s' secret '%s'\n", name, secret);
+
+       ret = -ENOMEM;
+       ac = kzalloc(sizeof(*ac), GFP_NOFS);
+       if (!ac)
+               goto out;
+
+       ac->negotiating = true;
+       if (name)
+               ac->name = name;
+       else
+               ac->name = CEPH_AUTH_NAME_DEFAULT;
+       dout("auth_init name %s secret %s\n", ac->name, secret);
+       ac->secret = secret;
+       return ac;
+
+out:
+       return ERR_PTR(ret);
+}
+
+void ceph_auth_destroy(struct ceph_auth_client *ac)
+{
+       dout("auth_destroy %p\n", ac);
+       if (ac->ops)
+               ac->ops->destroy(ac);
+       kfree(ac);
+}
+
+/*
+ * Reset occurs when reconnecting to the monitor.
+ */
+void ceph_auth_reset(struct ceph_auth_client *ac)
+{
+       dout("auth_reset %p\n", ac);
+       if (ac->ops && !ac->negotiating)
+               ac->ops->reset(ac);
+       ac->negotiating = true;
+}
+
+int ceph_entity_name_encode(const char *name, void **p, void *end)
+{
+       int len = strlen(name);
+
+       if (*p + 2*sizeof(u32) + len > end)
+               return -ERANGE;
+       ceph_encode_32(p, CEPH_ENTITY_TYPE_CLIENT);
+       ceph_encode_32(p, len);
+       ceph_encode_copy(p, name, len);
+       return 0;
+}
+
+/*
+ * Initiate protocol negotiation with monitor.  Include entity name
+ * and list supported protocols.
+ */
+int ceph_auth_build_hello(struct ceph_auth_client *ac, void *buf, size_t len)
+{
+       struct ceph_mon_request_header *monhdr = buf;
+       void *p = monhdr + 1, *end = buf + len, *lenp;
+       int i, num;
+       int ret;
+
+       dout("auth_build_hello\n");
+       monhdr->have_version = 0;
+       monhdr->session_mon = cpu_to_le16(-1);
+       monhdr->session_mon_tid = 0;
+
+       ceph_encode_32(&p, 0);  /* no protocol, yet */
+
+       lenp = p;
+       p += sizeof(u32);
+
+       ceph_decode_need(&p, end, 1 + sizeof(u32), bad);
+       ceph_encode_8(&p, 1);
+       num = ARRAY_SIZE(supported_protocols);
+       ceph_encode_32(&p, num);
+       ceph_decode_need(&p, end, num * sizeof(u32), bad);
+       for (i = 0; i < num; i++)
+               ceph_encode_32(&p, supported_protocols[i]);
+
+       ret = ceph_entity_name_encode(ac->name, &p, end);
+       if (ret < 0)
+               return ret;
+       ceph_decode_need(&p, end, sizeof(u64), bad);
+       ceph_encode_64(&p, ac->global_id);
+
+       ceph_encode_32(&lenp, p - lenp - sizeof(u32));
+       return p - buf;
+
+bad:
+       return -ERANGE;
+}
+
+static int ceph_build_auth_request(struct ceph_auth_client *ac,
+                                  void *msg_buf, size_t msg_len)
+{
+       struct ceph_mon_request_header *monhdr = msg_buf;
+       void *p = monhdr + 1;
+       void *end = msg_buf + msg_len;
+       int ret;
+
+       monhdr->have_version = 0;
+       monhdr->session_mon = cpu_to_le16(-1);
+       monhdr->session_mon_tid = 0;
+
+       ceph_encode_32(&p, ac->protocol);
+
+       ret = ac->ops->build_request(ac, p + sizeof(u32), end);
+       if (ret < 0) {
+               pr_err("error %d building auth method %s request\n", ret,
+                      ac->ops->name);
+               return ret;
+       }
+       dout(" built request %d bytes\n", ret);
+       ceph_encode_32(&p, ret);
+       return p + ret - msg_buf;
+}
+
+/*
+ * Handle auth message from monitor.
+ */
+int ceph_handle_auth_reply(struct ceph_auth_client *ac,
+                          void *buf, size_t len,
+                          void *reply_buf, size_t reply_len)
+{
+       void *p = buf;
+       void *end = buf + len;
+       int protocol;
+       s32 result;
+       u64 global_id;
+       void *payload, *payload_end;
+       int payload_len;
+       char *result_msg;
+       int result_msg_len;
+       int ret = -EINVAL;
+
+       dout("handle_auth_reply %p %p\n", p, end);
+       ceph_decode_need(&p, end, sizeof(u32) * 3 + sizeof(u64), bad);
+       protocol = ceph_decode_32(&p);
+       result = ceph_decode_32(&p);
+       global_id = ceph_decode_64(&p);
+       payload_len = ceph_decode_32(&p);
+       payload = p;
+       p += payload_len;
+       ceph_decode_need(&p, end, sizeof(u32), bad);
+       result_msg_len = ceph_decode_32(&p);
+       result_msg = p;
+       p += result_msg_len;
+       if (p != end)
+               goto bad;
+
+       dout(" result %d '%.*s' gid %llu len %d\n", result, result_msg_len,
+            result_msg, global_id, payload_len);
+
+       payload_end = payload + payload_len;
+
+       if (global_id && ac->global_id != global_id) {
+               dout(" set global_id %lld -> %lld\n", ac->global_id, global_id);
+               ac->global_id = global_id;
+       }
+
+       if (ac->negotiating) {
+               /* server does not support our protocols? */
+               if (!protocol && result < 0) {
+                       ret = result;
+                       goto out;
+               }
+               /* set up (new) protocol handler? */
+               if (ac->protocol && ac->protocol != protocol) {
+                       ac->ops->destroy(ac);
+                       ac->protocol = 0;
+                       ac->ops = NULL;
+               }
+               if (ac->protocol != protocol) {
+                       ret = ceph_auth_init_protocol(ac, protocol);
+                       if (ret) {
+                               pr_err("error %d on auth protocol %d init\n",
+                                      ret, protocol);
+                               goto out;
+                       }
+               }
+
+               ac->negotiating = false;
+       }
+
+       ret = ac->ops->handle_reply(ac, result, payload, payload_end);
+       if (ret == -EAGAIN) {
+               return ceph_build_auth_request(ac, reply_buf, reply_len);
+       } else if (ret) {
+               pr_err("auth method '%s' error %d\n", ac->ops->name, ret);
+               return ret;
+       }
+       return 0;
+
+bad:
+       pr_err("failed to decode auth msg\n");
+out:
+       return ret;
+}
+
+int ceph_build_auth(struct ceph_auth_client *ac,
+                   void *msg_buf, size_t msg_len)
+{
+       if (!ac->protocol)
+               return ceph_auth_build_hello(ac, msg_buf, msg_len);
+       BUG_ON(!ac->ops);
+       if (ac->ops->should_authenticate(ac))
+               return ceph_build_auth_request(ac, msg_buf, msg_len);
+       return 0;
+}
+
+int ceph_auth_is_authenticated(struct ceph_auth_client *ac)
+{
+       if (!ac->ops)
+               return 0;
+       return ac->ops->is_authenticated(ac);
+}
diff --git a/net/ceph/auth_none.c b/net/ceph/auth_none.c
new file mode 100644
index 0000000..214c2bb
--- /dev/null
+++ b/net/ceph/auth_none.c
@@ -0,0 +1,132 @@
+
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+
+#include <linux/ceph/decode.h>
+#include <linux/ceph/auth.h>
+
+#include "auth_none.h"
+
+static void reset(struct ceph_auth_client *ac)
+{
+       struct ceph_auth_none_info *xi = ac->private;
+
+       xi->starting = true;
+       xi->built_authorizer = false;
+}
+
+static void destroy(struct ceph_auth_client *ac)
+{
+       kfree(ac->private);
+       ac->private = NULL;
+}
+
+static int is_authenticated(struct ceph_auth_client *ac)
+{
+       struct ceph_auth_none_info *xi = ac->private;
+
+       return !xi->starting;
+}
+
+static int should_authenticate(struct ceph_auth_client *ac)
+{
+       struct ceph_auth_none_info *xi = ac->private;
+
+       return xi->starting;
+}
+
+/*
+ * the generic auth code decodes the global_id, and we carry no actual
+ * authentication state, so nothing happens here.
+ */
+static int handle_reply(struct ceph_auth_client *ac, int result,
+                       void *buf, void *end)
+{
+       struct ceph_auth_none_info *xi = ac->private;
+
+       xi->starting = false;
+       return result;
+}
+
+/*
+ * build an 'authorizer' with our entity_name and global_id.  we can
+ * reuse a single static copy since it is identical for all services
+ * we connect to.
+ */
+static int ceph_auth_none_create_authorizer(
+       struct ceph_auth_client *ac, int peer_type,
+       struct ceph_authorizer **a,
+       void **buf, size_t *len,
+       void **reply_buf, size_t *reply_len)
+{
+       struct ceph_auth_none_info *ai = ac->private;
+       struct ceph_none_authorizer *au = &ai->au;
+       void *p, *end;
+       int ret;
+
+       if (!ai->built_authorizer) {
+               p = au->buf;
+               end = p + sizeof(au->buf);
+               ceph_encode_8(&p, 1);
+               ret = ceph_entity_name_encode(ac->name, &p, end - 8);
+               if (ret < 0)
+                       goto bad;
+               ceph_decode_need(&p, end, sizeof(u64), bad2);
+               ceph_encode_64(&p, ac->global_id);
+               au->buf_len = p - (void *)au->buf;
+               ai->built_authorizer = true;
+               dout("built authorizer len %d\n", au->buf_len);
+       }
+
+       *a = (struct ceph_authorizer *)au;
+       *buf = au->buf;
+       *len = au->buf_len;
+       *reply_buf = au->reply_buf;
+       *reply_len = sizeof(au->reply_buf);
+       return 0;
+
+bad2:
+       ret = -ERANGE;
+bad:
+       return ret;
+}
+
+static void ceph_auth_none_destroy_authorizer(struct ceph_auth_client *ac,
+                                     struct ceph_authorizer *a)
+{
+       /* nothing to do */
+}
+
+static const struct ceph_auth_client_ops ceph_auth_none_ops = {
+       .name = "none",
+       .reset = reset,
+       .destroy = destroy,
+       .is_authenticated = is_authenticated,
+       .should_authenticate = should_authenticate,
+       .handle_reply = handle_reply,
+       .create_authorizer = ceph_auth_none_create_authorizer,
+       .destroy_authorizer = ceph_auth_none_destroy_authorizer,
+};
+
+int ceph_auth_none_init(struct ceph_auth_client *ac)
+{
+       struct ceph_auth_none_info *xi;
+
+       dout("ceph_auth_none_init %p\n", ac);
+       xi = kzalloc(sizeof(*xi), GFP_NOFS);
+       if (!xi)
+               return -ENOMEM;
+
+       xi->starting = true;
+       xi->built_authorizer = false;
+
+       ac->protocol = CEPH_AUTH_NONE;
+       ac->private = xi;
+       ac->ops = &ceph_auth_none_ops;
+       return 0;
+}
+
diff --git a/net/ceph/auth_none.h b/net/ceph/auth_none.h
new file mode 100644
index 0000000..ed7d088
--- /dev/null
+++ b/net/ceph/auth_none.h
@@ -0,0 +1,29 @@
+#ifndef _FS_CEPH_AUTH_NONE_H
+#define _FS_CEPH_AUTH_NONE_H
+
+#include <linux/slab.h>
+#include <linux/ceph/auth.h>
+
+/*
+ * null security mode.
+ *
+ * we use a single static authorizer that simply encodes our entity name
+ * and global id.
+ */
+
+struct ceph_none_authorizer {
+       char buf[128];
+       int buf_len;
+       char reply_buf[0];
+};
+
+struct ceph_auth_none_info {
+       bool starting;
+       bool built_authorizer;
+       struct ceph_none_authorizer au;   /* we only need one; it's static */
+};
+
+extern int ceph_auth_none_init(struct ceph_auth_client *ac);
+
+#endif
+
diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
new file mode 100644
index 0000000..7fd5dfc
--- /dev/null
+++ b/net/ceph/auth_x.c
@@ -0,0 +1,688 @@
+
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+
+#include <linux/ceph/decode.h>
+#include <linux/ceph/auth.h>
+
+#include "crypto.h"
+#include "auth_x.h"
+#include "auth_x_protocol.h"
+
+#define TEMP_TICKET_BUF_LEN    256
+
+static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed);
+
+static int ceph_x_is_authenticated(struct ceph_auth_client *ac)
+{
+       struct ceph_x_info *xi = ac->private;
+       int need;
+
+       ceph_x_validate_tickets(ac, &need);
+       dout("ceph_x_is_authenticated want=%d need=%d have=%d\n",
+            ac->want_keys, need, xi->have_keys);
+       return (ac->want_keys & xi->have_keys) == ac->want_keys;
+}
+
+static int ceph_x_should_authenticate(struct ceph_auth_client *ac)
+{
+       struct ceph_x_info *xi = ac->private;
+       int need;
+
+       ceph_x_validate_tickets(ac, &need);
+       dout("ceph_x_should_authenticate want=%d need=%d have=%d\n",
+            ac->want_keys, need, xi->have_keys);
+       return need != 0;
+}
+
+static int ceph_x_encrypt_buflen(int ilen)
+{
+       return sizeof(struct ceph_x_encrypt_header) + ilen + 16 +
+               sizeof(u32);
+}
+
+static int ceph_x_encrypt(struct ceph_crypto_key *secret,
+                         void *ibuf, int ilen, void *obuf, size_t olen)
+{
+       struct ceph_x_encrypt_header head = {
+               .struct_v = 1,
+               .magic = cpu_to_le64(CEPHX_ENC_MAGIC)
+       };
+       size_t len = olen - sizeof(u32);
+       int ret;
+
+       ret = ceph_encrypt2(secret, obuf + sizeof(u32), &len,
+                           &head, sizeof(head), ibuf, ilen);
+       if (ret)
+               return ret;
+       ceph_encode_32(&obuf, len);
+       return len + sizeof(u32);
+}
+
+static int ceph_x_decrypt(struct ceph_crypto_key *secret,
+                         void **p, void *end, void *obuf, size_t olen)
+{
+       struct ceph_x_encrypt_header head;
+       size_t head_len = sizeof(head);
+       int len, ret;
+
+       len = ceph_decode_32(p);
+       if (*p + len > end)
+               return -EINVAL;
+
+       dout("ceph_x_decrypt len %d\n", len);
+       ret = ceph_decrypt2(secret, &head, &head_len, obuf, &olen,
+                           *p, len);
+       if (ret)
+               return ret;
+       if (head.struct_v != 1 || le64_to_cpu(head.magic) != CEPHX_ENC_MAGIC)
+               return -EPERM;
+       *p += len;
+       return olen;
+}
+
+/*
+ * get existing (or insert new) ticket handler
+ */
+static struct ceph_x_ticket_handler *
+get_ticket_handler(struct ceph_auth_client *ac, int service)
+{
+       struct ceph_x_ticket_handler *th;
+       struct ceph_x_info *xi = ac->private;
+       struct rb_node *parent = NULL, **p = &xi->ticket_handlers.rb_node;
+
+       while (*p) {
+               parent = *p;
+               th = rb_entry(parent, struct ceph_x_ticket_handler, node);
+               if (service < th->service)
+                       p = &(*p)->rb_left;
+               else if (service > th->service)
+                       p = &(*p)->rb_right;
+               else
+                       return th;
+       }
+
+       /* add it */
+       th = kzalloc(sizeof(*th), GFP_NOFS);
+       if (!th)
+               return ERR_PTR(-ENOMEM);
+       th->service = service;
+       rb_link_node(&th->node, parent, p);
+       rb_insert_color(&th->node, &xi->ticket_handlers);
+       return th;
+}
+
+static void remove_ticket_handler(struct ceph_auth_client *ac,
+                                 struct ceph_x_ticket_handler *th)
+{
+       struct ceph_x_info *xi = ac->private;
+
+       dout("remove_ticket_handler %p %d\n", th, th->service);
+       rb_erase(&th->node, &xi->ticket_handlers);
+       ceph_crypto_key_destroy(&th->session_key);
+       if (th->ticket_blob)
+               ceph_buffer_put(th->ticket_blob);
+       kfree(th);
+}
+
+static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
+                                   struct ceph_crypto_key *secret,
+                                   void *buf, void *end)
+{
+       struct ceph_x_info *xi = ac->private;
+       int num;
+       void *p = buf;
+       int ret;
+       char *dbuf;
+       char *ticket_buf;
+       u8 reply_struct_v;
+
+       dbuf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
+       if (!dbuf)
+               return -ENOMEM;
+
+       ret = -ENOMEM;
+       ticket_buf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
+       if (!ticket_buf)
+               goto out_dbuf;
+
+       ceph_decode_need(&p, end, 1 + sizeof(u32), bad);
+       reply_struct_v = ceph_decode_8(&p);
+       if (reply_struct_v != 1)
+               goto bad;
+       num = ceph_decode_32(&p);
+       dout("%d tickets\n", num);
+       while (num--) {
+               int type;
+               u8 tkt_struct_v, blob_struct_v;
+               struct ceph_x_ticket_handler *th;
+               void *dp, *dend;
+               int dlen;
+               char is_enc;
+               struct timespec validity;
+               struct ceph_crypto_key old_key;
+               void *tp, *tpend;
+               struct ceph_timespec new_validity;
+               struct ceph_crypto_key new_session_key;
+               struct ceph_buffer *new_ticket_blob;
+               unsigned long new_expires, new_renew_after;
+               u64 new_secret_id;
+
+               ceph_decode_need(&p, end, sizeof(u32) + 1, bad);
+
+               type = ceph_decode_32(&p);
+               dout(" ticket type %d %s\n", type, ceph_entity_type_name(type));
+
+               tkt_struct_v = ceph_decode_8(&p);
+               if (tkt_struct_v != 1)
+                       goto bad;
+
+               th = get_ticket_handler(ac, type);
+               if (IS_ERR(th)) {
+                       ret = PTR_ERR(th);
+                       goto out;
+               }
+
+               /* blob for me */
+               dlen = ceph_x_decrypt(secret, &p, end, dbuf,
+                                     TEMP_TICKET_BUF_LEN);
+               if (dlen <= 0) {
+                       ret = dlen;
+                       goto out;
+               }
+               dout(" decrypted %d bytes\n", dlen);
+               dend = dbuf + dlen;
+               dp = dbuf;
+
+               tkt_struct_v = ceph_decode_8(&dp);
+               if (tkt_struct_v != 1)
+                       goto bad;
+
+               memcpy(&old_key, &th->session_key, sizeof(old_key));
+               ret = ceph_crypto_key_decode(&new_session_key, &dp, dend);
+               if (ret)
+                       goto out;
+
+               ceph_decode_copy(&dp, &new_validity, sizeof(new_validity));
+               ceph_decode_timespec(&validity, &new_validity);
+               new_expires = get_seconds() + validity.tv_sec;
+               new_renew_after = new_expires - (validity.tv_sec / 4);
+               dout(" expires=%lu renew_after=%lu\n", new_expires,
+                    new_renew_after);
+
+               /* ticket blob for service */
+               ceph_decode_8_safe(&p, end, is_enc, bad);
+               tp = ticket_buf;
+               if (is_enc) {
+                       /* encrypted */
+                       dout(" encrypted ticket\n");
+                       dlen = ceph_x_decrypt(&old_key, &p, end, ticket_buf,
+                                             TEMP_TICKET_BUF_LEN);
+                       if (dlen < 0) {
+                               ret = dlen;
+                               goto out;
+                       }
+                       dlen = ceph_decode_32(&tp);
+               } else {
+                       /* unencrypted */
+                       ceph_decode_32_safe(&p, end, dlen, bad);
+                       ceph_decode_need(&p, end, dlen, bad);
+                       ceph_decode_copy(&p, ticket_buf, dlen);
+               }
+               tpend = tp + dlen;
+               dout(" ticket blob is %d bytes\n", dlen);
+               ceph_decode_need(&tp, tpend, 1 + sizeof(u64), bad);
+               blob_struct_v = ceph_decode_8(&tp);
+               new_secret_id = ceph_decode_64(&tp);
+               ret = ceph_decode_buffer(&new_ticket_blob, &tp, tpend);
+               if (ret)
+                       goto out;
+
+               /* all is well, update our ticket */
+               ceph_crypto_key_destroy(&th->session_key);
+               if (th->ticket_blob)
+                       ceph_buffer_put(th->ticket_blob);
+               th->session_key = new_session_key;
+               th->ticket_blob = new_ticket_blob;
+               th->validity = new_validity;
+               th->secret_id = new_secret_id;
+               th->expires = new_expires;
+               th->renew_after = new_renew_after;
+               dout(" got ticket service %d (%s) secret_id %lld len %d\n",
+                    type, ceph_entity_type_name(type), th->secret_id,
+                    (int)th->ticket_blob->vec.iov_len);
+               xi->have_keys |= th->service;
+       }
+
+       ret = 0;
+out:
+       kfree(ticket_buf);
+out_dbuf:
+       kfree(dbuf);
+       return ret;
+
+bad:
+       ret = -EINVAL;
+       goto out;
+}
+
+static int ceph_x_build_authorizer(struct ceph_auth_client *ac,
+                                  struct ceph_x_ticket_handler *th,
+                                  struct ceph_x_authorizer *au)
+{
+       int maxlen;
+       struct ceph_x_authorize_a *msg_a;
+       struct ceph_x_authorize_b msg_b;
+       void *p, *end;
+       int ret;
+       int ticket_blob_len =
+               (th->ticket_blob ? th->ticket_blob->vec.iov_len : 0);
+
+       dout("build_authorizer for %s %p\n",
+            ceph_entity_type_name(th->service), au);
+
+       maxlen = sizeof(*msg_a) + sizeof(msg_b) +
+               ceph_x_encrypt_buflen(ticket_blob_len);
+       dout("  need len %d\n", maxlen);
+       if (au->buf && au->buf->alloc_len < maxlen) {
+               ceph_buffer_put(au->buf);
+               au->buf = NULL;
+       }
+       if (!au->buf) {
+               au->buf = ceph_buffer_new(maxlen, GFP_NOFS);
+               if (!au->buf)
+                       return -ENOMEM;
+       }
+       au->service = th->service;
+
+       msg_a = au->buf->vec.iov_base;
+       msg_a->struct_v = 1;
+       msg_a->global_id = cpu_to_le64(ac->global_id);
+       msg_a->service_id = cpu_to_le32(th->service);
+       msg_a->ticket_blob.struct_v = 1;
+       msg_a->ticket_blob.secret_id = cpu_to_le64(th->secret_id);
+       msg_a->ticket_blob.blob_len = cpu_to_le32(ticket_blob_len);
+       if (ticket_blob_len) {
+               memcpy(msg_a->ticket_blob.blob, th->ticket_blob->vec.iov_base,
+                      th->ticket_blob->vec.iov_len);
+       }
+       dout(" th %p secret_id %lld %lld\n", th, th->secret_id,
+            le64_to_cpu(msg_a->ticket_blob.secret_id));
+
+       p = msg_a + 1;
+       p += ticket_blob_len;
+       end = au->buf->vec.iov_base + au->buf->vec.iov_len;
+
+       get_random_bytes(&au->nonce, sizeof(au->nonce));
+       msg_b.struct_v = 1;
+       msg_b.nonce = cpu_to_le64(au->nonce);
+       ret = ceph_x_encrypt(&th->session_key, &msg_b, sizeof(msg_b),
+                            p, end - p);
+       if (ret < 0)
+               goto out_buf;
+       p += ret;
+       au->buf->vec.iov_len = p - au->buf->vec.iov_base;
+       dout(" built authorizer nonce %llx len %d\n", au->nonce,
+            (int)au->buf->vec.iov_len);
+       BUG_ON(au->buf->vec.iov_len > maxlen);
+       return 0;
+
+out_buf:
+       ceph_buffer_put(au->buf);
+       au->buf = NULL;
+       return ret;
+}
+
+static int ceph_x_encode_ticket(struct ceph_x_ticket_handler *th,
+                               void **p, void *end)
+{
+       ceph_decode_need(p, end, 1 + sizeof(u64), bad);
+       ceph_encode_8(p, 1);
+       ceph_encode_64(p, th->secret_id);
+       if (th->ticket_blob) {
+               const char *buf = th->ticket_blob->vec.iov_base;
+               u32 len = th->ticket_blob->vec.iov_len;
+
+               ceph_encode_32_safe(p, end, len, bad);
+               ceph_encode_copy_safe(p, end, buf, len, bad);
+       } else {
+               ceph_encode_32_safe(p, end, 0, bad);
+       }
+
+       return 0;
+bad:
+       return -ERANGE;
+}
+
+static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed)
+{
+       int want = ac->want_keys;
+       struct ceph_x_info *xi = ac->private;
+       int service;
+
+       *pneed = ac->want_keys & ~(xi->have_keys);
+
+       for (service = 1; service <= want; service <<= 1) {
+               struct ceph_x_ticket_handler *th;
+
+               if (!(ac->want_keys & service))
+                       continue;
+
+               if (*pneed & service)
+                       continue;
+
+               th = get_ticket_handler(ac, service);
+
+               if (IS_ERR(th)) {
+                       *pneed |= service;
+                       continue;
+               }
+
+               if (get_seconds() >= th->renew_after)
+                       *pneed |= service;
+               if (get_seconds() >= th->expires)
+                       xi->have_keys &= ~service;
+       }
+}
+
+
+static int ceph_x_build_request(struct ceph_auth_client *ac,
+                               void *buf, void *end)
+{
+       struct ceph_x_info *xi = ac->private;
+       int need;
+       struct ceph_x_request_header *head = buf;
+       int ret;
+       struct ceph_x_ticket_handler *th =
+               get_ticket_handler(ac, CEPH_ENTITY_TYPE_AUTH);
+
+       if (IS_ERR(th))
+               return PTR_ERR(th);
+
+       ceph_x_validate_tickets(ac, &need);
+
+       dout("build_request want %x have %x need %x\n",
+            ac->want_keys, xi->have_keys, need);
+
+       if (need & CEPH_ENTITY_TYPE_AUTH) {
+               struct ceph_x_authenticate *auth = (void *)(head + 1);
+               void *p = auth + 1;
+               struct ceph_x_challenge_blob tmp;
+               char tmp_enc[40];
+               u64 *u;
+
+               if (p > end)
+                       return -ERANGE;
+
+               dout(" get_auth_session_key\n");
+               head->op = cpu_to_le16(CEPHX_GET_AUTH_SESSION_KEY);
+
+               /* encrypt and hash */
+               get_random_bytes(&auth->client_challenge, sizeof(u64));
+               tmp.client_challenge = auth->client_challenge;
+               tmp.server_challenge = cpu_to_le64(xi->server_challenge);
+               ret = ceph_x_encrypt(&xi->secret, &tmp, sizeof(tmp),
+                                    tmp_enc, sizeof(tmp_enc));
+               if (ret < 0)
+                       return ret;
+
+               auth->struct_v = 1;
+               auth->key = 0;
+               for (u = (u64 *)tmp_enc; u + 1 <= (u64 *)(tmp_enc + ret); u++)
+                       auth->key ^= *(__le64 *)u;
+               dout(" server_challenge %llx client_challenge %llx key %llx\n",
+                    xi->server_challenge, le64_to_cpu(auth->client_challenge),
+                    le64_to_cpu(auth->key));
+
+               /* now encode the old ticket if exists */
+               ret = ceph_x_encode_ticket(th, &p, end);
+               if (ret < 0)
+                       return ret;
+
+               return p - buf;
+       }
+
+       if (need) {
+               void *p = head + 1;
+               struct ceph_x_service_ticket_request *req;
+
+               if (p > end)
+                       return -ERANGE;
+               head->op = cpu_to_le16(CEPHX_GET_PRINCIPAL_SESSION_KEY);
+
+               ret = ceph_x_build_authorizer(ac, th, &xi->auth_authorizer);
+               if (ret)
+                       return ret;
+               ceph_encode_copy(&p, xi->auth_authorizer.buf->vec.iov_base,
+                                xi->auth_authorizer.buf->vec.iov_len);
+
+               req = p;
+               req->keys = cpu_to_le32(need);
+               p += sizeof(*req);
+               return p - buf;
+       }
+
+       return 0;
+}
+
+static int ceph_x_handle_reply(struct ceph_auth_client *ac, int result,
+                              void *buf, void *end)
+{
+       struct ceph_x_info *xi = ac->private;
+       struct ceph_x_reply_header *head = buf;
+       struct ceph_x_ticket_handler *th;
+       int len = end - buf;
+       int op;
+       int ret;
+
+       if (result)
+               return result;  /* XXX hmm? */
+
+       if (xi->starting) {
+               /* it's a hello */
+               struct ceph_x_server_challenge *sc = buf;
+
+               if (len != sizeof(*sc))
+                       return -EINVAL;
+               xi->server_challenge = le64_to_cpu(sc->server_challenge);
+               dout("handle_reply got server challenge %llx\n",
+                    xi->server_challenge);
+               xi->starting = false;
+               xi->have_keys &= ~CEPH_ENTITY_TYPE_AUTH;
+               return -EAGAIN;
+       }
+
+       op = le16_to_cpu(head->op);
+       result = le32_to_cpu(head->result);
+       dout("handle_reply op %d result %d\n", op, result);
+       switch (op) {
+       case CEPHX_GET_AUTH_SESSION_KEY:
+               /* verify auth key */
+               ret = ceph_x_proc_ticket_reply(ac, &xi->secret,
+                                              buf + sizeof(*head), end);
+               break;
+
+       case CEPHX_GET_PRINCIPAL_SESSION_KEY:
+               th = get_ticket_handler(ac, CEPH_ENTITY_TYPE_AUTH);
+               if (IS_ERR(th))
+                       return PTR_ERR(th);
+               ret = ceph_x_proc_ticket_reply(ac, &th->session_key,
+                                              buf + sizeof(*head), end);
+               break;
+
+       default:
+               return -EINVAL;
+       }
+       if (ret)
+               return ret;
+       if (ac->want_keys == xi->have_keys)
+               return 0;
+       return -EAGAIN;
+}
+
+static int ceph_x_create_authorizer(
+       struct ceph_auth_client *ac, int peer_type,
+       struct ceph_authorizer **a,
+       void **buf, size_t *len,
+       void **reply_buf, size_t *reply_len)
+{
+       struct ceph_x_authorizer *au;
+       struct ceph_x_ticket_handler *th;
+       int ret;
+
+       th = get_ticket_handler(ac, peer_type);
+       if (IS_ERR(th))
+               return PTR_ERR(th);
+
+       au = kzalloc(sizeof(*au), GFP_NOFS);
+       if (!au)
+               return -ENOMEM;
+
+       ret = ceph_x_build_authorizer(ac, th, au);
+       if (ret) {
+               kfree(au);
+               return ret;
+       }
+
+       *a = (struct ceph_authorizer *)au;
+       *buf = au->buf->vec.iov_base;
+       *len = au->buf->vec.iov_len;
+       *reply_buf = au->reply_buf;
+       *reply_len = sizeof(au->reply_buf);
+       return 0;
+}
+
+static int ceph_x_verify_authorizer_reply(struct ceph_auth_client *ac,
+                                         struct ceph_authorizer *a, size_t len)
+{
+       struct ceph_x_authorizer *au = (void *)a;
+       struct ceph_x_ticket_handler *th;
+       int ret = 0;
+       struct ceph_x_authorize_reply reply;
+       void *p = au->reply_buf;
+       void *end = p + sizeof(au->reply_buf);
+
+       th = get_ticket_handler(ac, au->service);
+       if (IS_ERR(th))
+               return PTR_ERR(th);
+       ret = ceph_x_decrypt(&th->session_key, &p, end, &reply, sizeof(reply));
+       if (ret < 0)
+               return ret;
+       if (ret != sizeof(reply))
+               return -EPERM;
+
+       if (au->nonce + 1 != le64_to_cpu(reply.nonce_plus_one))
+               ret = -EPERM;
+       else
+               ret = 0;
+       dout("verify_authorizer_reply nonce %llx got %llx ret %d\n",
+            au->nonce, le64_to_cpu(reply.nonce_plus_one), ret);
+       return ret;
+}
+
+static void ceph_x_destroy_authorizer(struct ceph_auth_client *ac,
+                                     struct ceph_authorizer *a)
+{
+       struct ceph_x_authorizer *au = (void *)a;
+
+       ceph_buffer_put(au->buf);
+       kfree(au);
+}
+
+
+static void ceph_x_reset(struct ceph_auth_client *ac)
+{
+       struct ceph_x_info *xi = ac->private;
+
+       dout("reset\n");
+       xi->starting = true;
+       xi->server_challenge = 0;
+}
+
+static void ceph_x_destroy(struct ceph_auth_client *ac)
+{
+       struct ceph_x_info *xi = ac->private;
+       struct rb_node *p;
+
+       dout("ceph_x_destroy %p\n", ac);
+       ceph_crypto_key_destroy(&xi->secret);
+
+       while ((p = rb_first(&xi->ticket_handlers)) != NULL) {
+               struct ceph_x_ticket_handler *th =
+                       rb_entry(p, struct ceph_x_ticket_handler, node);
+               remove_ticket_handler(ac, th);
+       }
+
+       if (xi->auth_authorizer.buf)
+               ceph_buffer_put(xi->auth_authorizer.buf);
+
+       kfree(ac->private);
+       ac->private = NULL;
+}
+
+static void ceph_x_invalidate_authorizer(struct ceph_auth_client *ac,
+                                  int peer_type)
+{
+       struct ceph_x_ticket_handler *th;
+
+       th = get_ticket_handler(ac, peer_type);
+       if (!IS_ERR(th))
+               remove_ticket_handler(ac, th);
+}
+
+
+static const struct ceph_auth_client_ops ceph_x_ops = {
+       .name = "x",
+       .is_authenticated = ceph_x_is_authenticated,
+       .should_authenticate = ceph_x_should_authenticate,
+       .build_request = ceph_x_build_request,
+       .handle_reply = ceph_x_handle_reply,
+       .create_authorizer = ceph_x_create_authorizer,
+       .verify_authorizer_reply = ceph_x_verify_authorizer_reply,
+       .destroy_authorizer = ceph_x_destroy_authorizer,
+       .invalidate_authorizer = ceph_x_invalidate_authorizer,
+       .reset =  ceph_x_reset,
+       .destroy = ceph_x_destroy,
+};
+
+
+int ceph_x_init(struct ceph_auth_client *ac)
+{
+       struct ceph_x_info *xi;
+       int ret;
+
+       dout("ceph_x_init %p\n", ac);
+       ret = -ENOMEM;
+       xi = kzalloc(sizeof(*xi), GFP_NOFS);
+       if (!xi)
+               goto out;
+
+       ret = -EINVAL;
+       if (!ac->secret) {
+               pr_err("no secret set (for auth_x protocol)\n");
+               goto out_nomem;
+       }
+
+       ret = ceph_crypto_key_unarmor(&xi->secret, ac->secret);
+       if (ret)
+               goto out_nomem;
+
+       xi->starting = true;
+       xi->ticket_handlers = RB_ROOT;
+
+       ac->protocol = CEPH_AUTH_CEPHX;
+       ac->private = xi;
+       ac->ops = &ceph_x_ops;
+       return 0;
+
+out_nomem:
+       kfree(xi);
+out:
+       return ret;
+}
+
+
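Note on ceph_x_validate_tickets() and the validity handling in ceph_x_proc_ticket_reply() above: a ticket is renewed once three quarters of its validity has elapsed, and it is dropped from have_keys once it has expired. The following is a minimal userspace sketch of that bookkeeping, not the kernel code. The names ticket_times, ticket_times_from_validity and compute_need are hypothetical, the rbtree lookup is replaced by a plain array indexed by bit position, and the current time is passed in instead of calling get_seconds().

```c
#include <assert.h>

/* Hypothetical stand-in for the timing fields of one ticket handler. */
struct ticket_times {
	unsigned long renew_after;	/* renew once now >= this (seconds) */
	unsigned long expires;		/* unusable once now >= this (seconds) */
};

/* As in the diff: expires = now + validity, renew at 3/4 of the validity. */
static struct ticket_times ticket_times_from_validity(unsigned long now,
						       unsigned long validity)
{
	struct ticket_times t;

	t.expires = now + validity;
	t.renew_after = t.expires - validity / 4;
	return t;
}

/*
 * Return the service bits that still need a ticket request.
 * tickets[i] describes the service with bit (1u << i).
 * *have is updated in place: expired services are cleared from it.
 */
static unsigned compute_need(unsigned want, unsigned *have,
			     const struct ticket_times *tickets,
			     unsigned nbits, unsigned long now)
{
	unsigned need = want & ~*have;
	unsigned i;

	for (i = 0; i < nbits; i++) {
		unsigned service = 1u << i;

		/* skip services we do not want or already know we need */
		if (!(want & service) || (need & service))
			continue;
		if (now >= tickets[i].renew_after)
			need |= service;
		if (now >= tickets[i].expires)
			*have &= ~service;
	}
	return need;
}
```

With a validity of 400 seconds starting at t=1000, the sketch asks for renewal from t=1300 onward and clears the service from the have mask at t=1400.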
diff --git a/net/ceph/auth_x.h b/net/ceph/auth_x.h
new file mode 100644
index 0000000..e02da7a
--- /dev/null
+++ b/net/ceph/auth_x.h
@@ -0,0 +1,50 @@
+#ifndef _FS_CEPH_AUTH_X_H
+#define _FS_CEPH_AUTH_X_H
+
+#include <linux/rbtree.h>
+
+#include <linux/ceph/auth.h>
+
+#include "crypto.h"
+#include "auth_x_protocol.h"
+
+/*
+ * Handle ticket for a single service.
+ */
+struct ceph_x_ticket_handler {
+       struct rb_node node;
+       unsigned service;
+
+       struct ceph_crypto_key session_key;
+       struct ceph_timespec validity;
+
+       u64 secret_id;
+       struct ceph_buffer *ticket_blob;
+
+       unsigned long renew_after, expires;
+};
+
+
+struct ceph_x_authorizer {
+       struct ceph_buffer *buf;
+       unsigned service;
+       u64 nonce;
+       char reply_buf[128];  /* big enough for encrypted blob */
+};
+
+struct ceph_x_info {
+       struct ceph_crypto_key secret;
+
+       bool starting;
+       u64 server_challenge;
+
+       unsigned have_keys;
+       struct rb_root ticket_handlers;
+
+       struct ceph_x_authorizer auth_authorizer;
+};
+
+extern int ceph_x_init(struct ceph_auth_client *ac);
+
+#endif
+
diff --git a/net/ceph/auth_x_protocol.h b/net/ceph/auth_x_protocol.h
new file mode 100644
index 0000000..671d305
--- /dev/null
+++ b/net/ceph/auth_x_protocol.h
@@ -0,0 +1,90 @@
+#ifndef __FS_CEPH_AUTH_X_PROTOCOL
+#define __FS_CEPH_AUTH_X_PROTOCOL
+
+#define CEPHX_GET_AUTH_SESSION_KEY      0x0100
+#define CEPHX_GET_PRINCIPAL_SESSION_KEY 0x0200
+#define CEPHX_GET_ROTATING_KEY          0x0400
+
+/* common bits */
+struct ceph_x_ticket_blob {
+       __u8 struct_v;
+       __le64 secret_id;
+       __le32 blob_len;
+       char blob[];
+} __attribute__ ((packed));
+
+
+/* common request/reply headers */
+struct ceph_x_request_header {
+       __le16 op;
+} __attribute__ ((packed));
+
+struct ceph_x_reply_header {
+       __le16 op;
+       __le32 result;
+} __attribute__ ((packed));
+
+
+/* authenticate handshake */
+
+/* initial hello (no reply header) */
+struct ceph_x_server_challenge {
+       __u8 struct_v;
+       __le64 server_challenge;
+} __attribute__ ((packed));
+
+struct ceph_x_authenticate {
+       __u8 struct_v;
+       __le64 client_challenge;
+       __le64 key;
+       /* ticket blob */
+} __attribute__ ((packed));
+
+struct ceph_x_service_ticket_request {
+       __u8 struct_v;
+       __le32 keys;
+} __attribute__ ((packed));
+
+struct ceph_x_challenge_blob {
+       __le64 server_challenge;
+       __le64 client_challenge;
+} __attribute__ ((packed));
+
+
+
+/* authorize handshake */
+
+/*
+ * The authorizer consists of two pieces:
+ *  a - service id, ticket blob
+ *  b - encrypted with session key
+ */
+struct ceph_x_authorize_a {
+       __u8 struct_v;
+       __le64 global_id;
+       __le32 service_id;
+       struct ceph_x_ticket_blob ticket_blob;
+} __attribute__ ((packed));
+
+struct ceph_x_authorize_b {
+       __u8 struct_v;
+       __le64 nonce;
+} __attribute__ ((packed));
+
+struct ceph_x_authorize_reply {
+       __u8 struct_v;
+       __le64 nonce_plus_one;
+} __attribute__ ((packed));
+
+
+/*
+ * encryption bundle
+ */
+#define CEPHX_ENC_MAGIC 0xff009cad8826aa55ull
+
+struct ceph_x_encrypt_header {
+       __u8 struct_v;
+       __le64 magic;
+} __attribute__ ((packed));
+
+#endif
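Note on struct ceph_x_authorize_b and struct ceph_x_authorize_reply above: both are packed, so the plaintext on the wire is one version byte followed by a 64-bit little-endian value, and the reply is accepted only if it carries nonce + 1. The sketch below illustrates that layout and check in portable userspace C. It deliberately leaves out the encryption step (ceph_x_encrypt/ceph_x_decrypt in the diff), and the helper names put_le64, get_le64, encode_authorize_b and check_authorize_reply are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AUTHORIZE_B_LEN 9	/* 1-byte struct_v + 8-byte little-endian nonce */

/* Store v as 8 little-endian bytes, independent of host byte order. */
static void put_le64(uint8_t *dst, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

/* Load 8 little-endian bytes into a host-order value. */
static uint64_t get_le64(const uint8_t *src)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)src[i] << (8 * i);
	return v;
}

/* Encode struct_v = 1 plus the nonce; returns bytes written, or -1 if no room. */
static int encode_authorize_b(uint8_t *buf, size_t buflen, uint64_t nonce)
{
	if (buflen < AUTHORIZE_B_LEN)
		return -1;
	buf[0] = 1;
	put_le64(buf + 1, nonce);
	return AUTHORIZE_B_LEN;
}

/*
 * Check a decrypted reply: 0 if it carries nonce + 1, -1 otherwise.
 * Like the verify function in the diff, this does not inspect struct_v.
 */
static int check_authorize_reply(const uint8_t *reply, size_t len,
				 uint64_t nonce)
{
	if (len != AUTHORIZE_B_LEN)
		return -1;
	return get_le64(reply + 1) == nonce + 1 ? 0 : -1;
}
```

Byte-by-byte encoding sidesteps struct padding and host endianness, which is what the packed attribute and the __le64 annotations achieve in the kernel headers.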
diff --git a/net/ceph/buffer.c b/net/ceph/buffer.c
new file mode 100644
index 0000000..53d8abf
--- /dev/null
+++ b/net/ceph/buffer.c
@@ -0,0 +1,68 @@
+
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#include <linux/ceph/buffer.h>
+#include <linux/ceph/decode.h>
+
+struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp)
+{
+       struct ceph_buffer *b;
+
+       b = kmalloc(sizeof(*b), gfp);
+       if (!b)
+               return NULL;
+
+       b->vec.iov_base = kmalloc(len, gfp | __GFP_NOWARN);
+       if (b->vec.iov_base) {
+               b->is_vmalloc = false;
+       } else {
+               b->vec.iov_base = __vmalloc(len, gfp, PAGE_KERNEL);
+               if (!b->vec.iov_base) {
+                       kfree(b);
+                       return NULL;
+               }
+               b->is_vmalloc = true;
+       }
+
+       kref_init(&b->kref);
+       b->alloc_len = len;
+       b->vec.iov_len = len;
+       dout("buffer_new %p\n", b);
+       return b;
+}
+EXPORT_SYMBOL(ceph_buffer_new);
+
+void ceph_buffer_release(struct kref *kref)
+{
+       struct ceph_buffer *b = container_of(kref, struct ceph_buffer, kref);
+
+       dout("buffer_release %p\n", b);
+       if (b->vec.iov_base) {
+               if (b->is_vmalloc)
+                       vfree(b->vec.iov_base);
+               else
+                       kfree(b->vec.iov_base);
+       }
+       kfree(b);
+}
+EXPORT_SYMBOL(ceph_buffer_release);
+
+int ceph_decode_buffer(struct ceph_buffer **b, void **p, void *end)
+{
+       size_t len;
+
+       ceph_decode_need(p, end, sizeof(u32), bad);
+       len = ceph_decode_32(p);
+       dout("decode_buffer len %d\n", (int)len);
+       ceph_decode_need(p, end, len, bad);
+       *b = ceph_buffer_new(len, GFP_NOFS);
+       if (!*b)
+               return -ENOMEM;
+       ceph_decode_copy(p, (*b)->vec.iov_base, len);
+       return 0;
+bad:
+       return -EINVAL;
+}
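Note on ceph_decode_buffer() above: it reads a 32-bit length, checks that the payload fits before the end pointer, and only then allocates and copies. A rough userspace equivalent of that bounds-checked, length-prefixed decode is sketched below. The function name decode_lp_buffer and its return codes are hypothetical, malloc stands in for ceph_buffer_new(), and the sketch assumes *p <= end on entry.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decode a 32-bit little-endian length followed by that many payload bytes.
 * On success *out is a malloc'd copy (caller frees), *out_len its size,
 * and *p is advanced past the payload. On failure *p is left unchanged.
 * Returns 0 on success, -1 on truncated input, -2 on allocation failure.
 */
static int decode_lp_buffer(const uint8_t **p, const uint8_t *end,
			    uint8_t **out, uint32_t *out_len)
{
	const uint8_t *cur = *p;
	uint32_t len;

	/* need room for the length field itself */
	if ((size_t)(end - cur) < 4)
		return -1;
	len = (uint32_t)cur[0] | (uint32_t)cur[1] << 8 |
	      (uint32_t)cur[2] << 16 | (uint32_t)cur[3] << 24;
	cur += 4;

	/* need room for the payload before touching it */
	if ((size_t)(end - cur) < len)
		return -1;

	*out = malloc(len ? len : 1);	/* avoid a zero-size allocation */
	if (!*out)
		return -2;
	memcpy(*out, cur, len);
	*out_len = len;
	*p = cur + len;
	return 0;
}
```

Validating the length against the remaining input before allocating keeps a corrupt or hostile length field from triggering a large allocation or an out-of-bounds read.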
diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
new file mode 100644
index 0000000..f3e4a13
--- /dev/null
+++ b/net/ceph/ceph_common.c
@@ -0,0 +1,529 @@
+
+#include <linux/ceph/ceph_debug.h>
+#include <linux/backing-dev.h>
+#include <linux/ctype.h>
+#include <linux/fs.h>
+#include <linux/inet.h>
+#include <linux/in6.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/parser.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/statfs.h>
+#include <linux/string.h>
+
+
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/debugfs.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/auth.h>
+
+
+
+/*
+ * find filename portion of a path (/foo/bar/baz -> baz)
+ */
+const char *ceph_file_part(const char *s, int len)
+{
+       const char *e = s + len;
+
+       while (e != s && *(e-1) != '/')
+               e--;
+       return e;
+}
+EXPORT_SYMBOL(ceph_file_part);
+
+const char *ceph_msg_type_name(int type)
+{
+       switch (type) {
+       case CEPH_MSG_SHUTDOWN: return "shutdown";
+       case CEPH_MSG_PING: return "ping";
+       case CEPH_MSG_AUTH: return "auth";
+       case CEPH_MSG_AUTH_REPLY: return "auth_reply";
+       case CEPH_MSG_MON_MAP: return "mon_map";
+       case CEPH_MSG_MON_GET_MAP: return "mon_get_map";
+       case CEPH_MSG_MON_SUBSCRIBE: return "mon_subscribe";
+       case CEPH_MSG_MON_SUBSCRIBE_ACK: return "mon_subscribe_ack";
+       case CEPH_MSG_STATFS: return "statfs";
+       case CEPH_MSG_STATFS_REPLY: return "statfs_reply";
+       case CEPH_MSG_MDS_MAP: return "mds_map";
+       case CEPH_MSG_CLIENT_SESSION: return "client_session";
+       case CEPH_MSG_CLIENT_RECONNECT: return "client_reconnect";
+       case CEPH_MSG_CLIENT_REQUEST: return "client_request";
+       case CEPH_MSG_CLIENT_REQUEST_FORWARD: return "client_request_forward";
+       case CEPH_MSG_CLIENT_REPLY: return "client_reply";
+       case CEPH_MSG_CLIENT_CAPS: return "client_caps";
+       case CEPH_MSG_CLIENT_CAPRELEASE: return "client_cap_release";
+       case CEPH_MSG_CLIENT_SNAP: return "client_snap";
+       case CEPH_MSG_CLIENT_LEASE: return "client_lease";
+       case CEPH_MSG_OSD_MAP: return "osd_map";
+       case CEPH_MSG_OSD_OP: return "osd_op";
+       case CEPH_MSG_OSD_OPREPLY: return "osd_opreply";
+       default: return "unknown";
+       }
+}
+EXPORT_SYMBOL(ceph_msg_type_name);
+
+/*
+ * Initially learn our fsid, or verify an fsid matches.
+ */
+int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid)
+{
+       if (client->have_fsid) {
+               if (ceph_fsid_compare(&client->fsid, fsid)) {
+                       pr_err("bad fsid, had %pU got %pU\n",
+                              &client->fsid, fsid);
+                       return -1;
+               }
+       } else {
+               pr_info("client%lld fsid %pU\n", ceph_client_id(client), fsid);
+               memcpy(&client->fsid, fsid, sizeof(*fsid));
+               ceph_debugfs_client_init(client);
+               client->have_fsid = true;
+       }
+       return 0;
+}
+EXPORT_SYMBOL(ceph_check_fsid);
+
+static int strcmp_null(const char *s1, const char *s2)
+{
+       if (!s1 && !s2)
+               return 0;
+       if (s1 && !s2)
+               return -1;
+       if (!s1 && s2)
+               return 1;
+       return strcmp(s1, s2);
+}
+
+int ceph_compare_options(struct ceph_options *new_opt,
+                        struct ceph_client *client)
+{
+       struct ceph_options *opt1 = new_opt;
+       struct ceph_options *opt2 = client->options;
+       int ofs = offsetof(struct ceph_options, mon_addr);
+       int i;
+       int ret;
+
+       ret = memcmp(opt1, opt2, ofs);
+       if (ret)
+               return ret;
+
+       ret = strcmp_null(opt1->name, opt2->name);
+       if (ret)
+               return ret;
+
+       ret = strcmp_null(opt1->secret, opt2->secret);
+       if (ret)
+               return ret;
+
+       /* any matching mon ip implies a match */
+       for (i = 0; i < opt1->num_mon; i++) {
+               if (ceph_monmap_contains(client->monc.monmap,
+                                &opt1->mon_addr[i]))
+                       return 0;
+       }
+       return -1;
+}
+EXPORT_SYMBOL(ceph_compare_options);
+
+
+static int parse_fsid(const char *str, struct ceph_fsid *fsid)
+{
+       int i = 0;
+       char tmp[3];
+       int err = -EINVAL;
+       int d;
+
+       dout("parse_fsid '%s'\n", str);
+       tmp[2] = 0;
+       while (*str && i < 16) {
+               if (ispunct(*str)) {
+                       str++;
+                       continue;
+               }
+               if (!isxdigit(str[0]) || !isxdigit(str[1]))
+                       break;
+               tmp[0] = str[0];
+               tmp[1] = str[1];
+               if (sscanf(tmp, "%x", &d) < 1)
+                       break;
+               fsid->fsid[i] = d & 0xff;
+               i++;
+               str += 2;
+       }
+
+       if (i == 16)
+               err = 0;
+       dout("parse_fsid ret %d got fsid %pU\n", err, fsid);
+       return err;
+}
+
+/*
+ * ceph options
+ */
+enum {
+       Opt_osdtimeout,
+       Opt_osdkeepalivetimeout,
+       Opt_mount_timeout,
+       Opt_osd_idle_ttl,
+       Opt_last_int,
+       /* int args above */
+       Opt_fsid,
+       Opt_name,
+       Opt_secret,
+       Opt_ip,
+       Opt_last_string,
+       /* string args above */
+       Opt_noshare,
+       Opt_nocrc,
+};
+
+static match_table_t opt_tokens = {
+       {Opt_osdtimeout, "osdtimeout=%d"},
+       {Opt_osdkeepalivetimeout, "osdkeepalive=%d"},
+       {Opt_mount_timeout, "mount_timeout=%d"},
+       {Opt_osd_idle_ttl, "osd_idle_ttl=%d"},
+       /* int args above */
+       {Opt_fsid, "fsid=%s"},
+       {Opt_name, "name=%s"},
+       {Opt_secret, "secret=%s"},
+       {Opt_ip, "ip=%s"},
+       /* string args above */
+       {Opt_noshare, "noshare"},
+       {Opt_nocrc, "nocrc"},
+       {-1, NULL}
+};
+
+void ceph_destroy_options(struct ceph_options *opt)
+{
+       dout("destroy_options %p\n", opt);
+       kfree(opt->name);
+       kfree(opt->secret);
+       kfree(opt->mon_addr);
+       kfree(opt);
+}
+EXPORT_SYMBOL(ceph_destroy_options);
+
+int ceph_parse_options(struct ceph_options **popt, char *options,
+                      const char *dev_name, const char *dev_name_end,
+                      int (*parse_extra_token)(char *c, void *private),
+                      void *private)
+{
+       struct ceph_options *opt;
+       const char *c;
+       int err = -ENOMEM;
+       substring_t argstr[MAX_OPT_ARGS];
+
+       opt = kzalloc(sizeof(*opt), GFP_KERNEL);
+       if (!opt)
+               return err;
+       opt->mon_addr = kcalloc(CEPH_MAX_MON, sizeof(*opt->mon_addr),
+                               GFP_KERNEL);
+       if (!opt->mon_addr)
+               goto out;
+
+       dout("parse_options %p options '%s' dev_name '%s'\n", opt, options,
+            dev_name);
+
+       /* start with defaults */
+       opt->flags = CEPH_OPT_DEFAULT;
+       opt->osd_timeout = CEPH_OSD_TIMEOUT_DEFAULT;
+       opt->osd_keepalive_timeout = CEPH_OSD_KEEPALIVE_DEFAULT;
+       opt->mount_timeout = CEPH_MOUNT_TIMEOUT_DEFAULT; /* seconds */
+       opt->osd_idle_ttl = CEPH_OSD_IDLE_TTL_DEFAULT;   /* seconds */
+
+       /* get mon ip(s) */
+       /* ip1[:port1][,ip2[:port2]...] */
+       err = ceph_parse_ips(dev_name, dev_name_end, opt->mon_addr,
+                            CEPH_MAX_MON, &opt->num_mon);
+       if (err < 0)
+               goto out;
+
+       /* parse mount options */
+       while ((c = strsep(&options, ",")) != NULL) {
+               int token, intval, ret;
+               if (!*c)
+                       continue;
+               err = -EINVAL;
+               token = match_token((char *)c, opt_tokens, argstr);
+               if (token < 0 && parse_extra_token) {
+                       /* extra? */
+                       err = parse_extra_token((char *)c, private);
+                       if (err < 0) {
+                               pr_err("bad option at '%s'\n", c);
+                               goto out;
+                       }
+                       continue;
+               }
+               if (token < Opt_last_int) {
+                       ret = match_int(&argstr[0], &intval);
+                       if (ret < 0) {
+                               pr_err("bad mount option arg (not int) "
+                                      "at '%s'\n", c);
+                               continue;
+                       }
+                       dout("got int token %d val %d\n", token, intval);
+               } else if (token > Opt_last_int && token < Opt_last_string) {
+                       dout("got string token %d val %s\n", token,
+                            argstr[0].from);
+               } else {
+                       dout("got token %d\n", token);
+               }
+               switch (token) {
+               case Opt_ip:
+                       err = ceph_parse_ips(argstr[0].from,
+                                            argstr[0].to,
+                                            &opt->my_addr,
+                                            1, NULL);
+                       if (err < 0)
+                               goto out;
+                       opt->flags |= CEPH_OPT_MYIP;
+                       break;
+
+               case Opt_fsid:
+                       err = parse_fsid(argstr[0].from, &opt->fsid);
+                       if (err == 0)
+                               opt->flags |= CEPH_OPT_FSID;
+                       break;
+               case Opt_name:
+                       opt->name = kstrndup(argstr[0].from,
+                                             argstr[0].to-argstr[0].from,
+                                             GFP_KERNEL);
+                       break;
+               case Opt_secret:
+                       opt->secret = kstrndup(argstr[0].from,
+                                               argstr[0].to-argstr[0].from,
+                                               GFP_KERNEL);
+                       break;
+
+                       /* misc */
+               case Opt_osdtimeout:
+                       opt->osd_timeout = intval;
+                       break;
+               case Opt_osdkeepalivetimeout:
+                       opt->osd_keepalive_timeout = intval;
+                       break;
+               case Opt_osd_idle_ttl:
+                       opt->osd_idle_ttl = intval;
+                       break;
+               case Opt_mount_timeout:
+                       opt->mount_timeout = intval;
+                       break;
+
+               case Opt_noshare:
+                       opt->flags |= CEPH_OPT_NOSHARE;
+                       break;
+
+               case Opt_nocrc:
+                       opt->flags |= CEPH_OPT_NOCRC;
+                       break;
+
+               default:
+                       BUG_ON(token);
+               }
+       }
+
+       /* success */
+       *popt = opt;
+       return 0;
+
+out:
+       ceph_destroy_options(opt);
+       return err;
+}
+EXPORT_SYMBOL(ceph_parse_options);
+
+u64 ceph_client_id(struct ceph_client *client)
+{
+       return client->monc.auth->global_id;
+}
+EXPORT_SYMBOL(ceph_client_id);
+
+/*
+ * create a fresh client instance
+ */
+struct ceph_client *ceph_create_client(struct ceph_options *opt, void *private)
+{
+       struct ceph_client *client;
+       int err = -ENOMEM;
+
+       client = kzalloc(sizeof(*client), GFP_KERNEL);
+       if (client == NULL)
+               return ERR_PTR(-ENOMEM);
+
+       client->private = private;
+       client->options = opt;
+
+       mutex_init(&client->mount_mutex);
+       init_waitqueue_head(&client->auth_wq);
+       client->auth_err = 0;
+
+       client->extra_mon_dispatch = NULL;
+       client->supported_features = CEPH_FEATURE_SUPPORTED_DEFAULT;
+       client->required_features = CEPH_FEATURE_REQUIRED_DEFAULT;
+
+       client->msgr = NULL;
+
+       /* subsystems */
+       err = ceph_monc_init(&client->monc, client);
+       if (err < 0)
+               goto fail;
+       err = ceph_osdc_init(&client->osdc, client);
+       if (err < 0)
+               goto fail_monc;
+
+       return client;
+
+fail_monc:
+       ceph_monc_stop(&client->monc);
+fail:
+       kfree(client);
+       return ERR_PTR(err);
+}
+EXPORT_SYMBOL(ceph_create_client);
+
+void ceph_destroy_client(struct ceph_client *client)
+{
+       dout("destroy_client %p\n", client);
+
+       /* unmount */
+       ceph_osdc_stop(&client->osdc);
+
+       /*
+        * make sure mds and osd connections close out before destroying
+        * the auth module, which is needed to free those connections'
+        * ceph_authorizers.
+        */
+       ceph_msgr_flush();
+
+       ceph_monc_stop(&client->monc);
+
+       ceph_debugfs_client_cleanup(client);
+
+       if (client->msgr)
+               ceph_messenger_destroy(client->msgr);
+
+       ceph_destroy_options(client->options);
+
+       kfree(client);
+       dout("destroy_client %p done\n", client);
+}
+EXPORT_SYMBOL(ceph_destroy_client);
+
+/*
+ * true if we have both the mon and osd maps (and have thus joined the cluster)
+ */
+static int have_mon_and_osd_map(struct ceph_client *client)
+{
+       return client->monc.monmap && client->monc.monmap->epoch &&
+              client->osdc.osdmap && client->osdc.osdmap->epoch;
+}
+
+/*
+ * mount: join the ceph cluster and wait for the mon and osd maps.
+ */
+int __ceph_open_session(struct ceph_client *client, unsigned long started)
+{
+       struct ceph_entity_addr *myaddr = NULL;
+       int err;
+       unsigned long timeout = client->options->mount_timeout * HZ;
+
+       /* initialize the messenger */
+       if (client->msgr == NULL) {
+               if (ceph_test_opt(client, MYIP))
+                       myaddr = &client->options->my_addr;
+               client->msgr = ceph_messenger_create(myaddr,
+                                       client->supported_features,
+                                       client->required_features);
+               if (IS_ERR(client->msgr)) {
+                       err = PTR_ERR(client->msgr);
+                       client->msgr = NULL;
+                       return err;
+               }
+               client->msgr->nocrc = ceph_test_opt(client, NOCRC);
+       }
+
+       /* open session, and wait for mon and osd maps */
+       err = ceph_monc_open_session(&client->monc);
+       if (err < 0)
+               return err;
+
+       while (!have_mon_and_osd_map(client)) {
+               err = -EIO;
+               if (timeout && time_after_eq(jiffies, started + timeout))
+                       return err;
+
+               /* wait */
+               dout("mount waiting for mon_map\n");
+               err = wait_event_interruptible_timeout(client->auth_wq,
+                       have_mon_and_osd_map(client) || (client->auth_err < 0),
+                       timeout);
+               if (err == -EINTR || err == -ERESTARTSYS)
+                       return err;
+               if (client->auth_err < 0)
+                       return client->auth_err;
+       }
+
+       return 0;
+}
+EXPORT_SYMBOL(__ceph_open_session);
+
+
+int ceph_open_session(struct ceph_client *client)
+{
+       int ret;
+       unsigned long started = jiffies;  /* note the start time */
+
+       dout("open_session start\n");
+       mutex_lock(&client->mount_mutex);
+
+       ret = __ceph_open_session(client, started);
+
+       mutex_unlock(&client->mount_mutex);
+       return ret;
+}
+EXPORT_SYMBOL(ceph_open_session);
+
+
+static int __init init_ceph_lib(void)
+{
+       int ret = 0;
+
+       ret = ceph_debugfs_init();
+       if (ret < 0)
+               goto out;
+
+       ret = ceph_msgr_init();
+       if (ret < 0)
+               goto out_debugfs;
+
+       pr_info("loaded (mon/osd proto %d/%d, osdmap %d/%d %d/%d)\n",
+               CEPH_MONC_PROTOCOL, CEPH_OSDC_PROTOCOL,
+               CEPH_OSDMAP_VERSION, CEPH_OSDMAP_VERSION_EXT,
+               CEPH_OSDMAP_INC_VERSION, CEPH_OSDMAP_INC_VERSION_EXT);
+
+       return 0;
+
+out_debugfs:
+       ceph_debugfs_cleanup();
+out:
+       return ret;
+}
+
+static void __exit exit_ceph_lib(void)
+{
+       dout("exit_ceph_lib\n");
+       ceph_msgr_exit();
+       ceph_debugfs_cleanup();
+}
+
+module_init(init_ceph_lib);
+module_exit(exit_ceph_lib);
+
+MODULE_AUTHOR("Sage Weil <sage@newdream.net>");
+MODULE_AUTHOR("Yehuda Sadeh <yehuda@hq.newdream.net>");
+MODULE_AUTHOR("Patience Warnick <patience@newdream.net>");
+MODULE_DESCRIPTION("Ceph filesystem for Linux");
+MODULE_LICENSE("GPL");
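Note on parse_fsid() in net/ceph/ceph_common.c above: the loop reads sixteen two-digit hex bytes and skips any punctuation between them, so both dashed UUID strings and plain hex runs are accepted. The following is a minimal userspace sketch of that loop, not the kernel code itself; the name parse_fsid_sketch and the plain 16-byte output buffer are assumptions made for illustration.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Userspace sketch of the fsid parsing loop (hypothetical name).
 * Reads 16 bytes written as hex digit pairs, skipping punctuation
 * such as '-' or ':'.  Returns 0 on success, -1 if fewer than 16
 * bytes could be parsed.
 */
int parse_fsid_sketch(const char *str, unsigned char out[16])
{
	int i = 0;

	while (*str && i < 16) {
		unsigned int d;
		char tmp[3];

		if (ispunct((unsigned char)*str)) {
			str++;
			continue;
		}
		/* str[1] may be the terminating NUL; isxdigit(0) is false */
		if (!isxdigit((unsigned char)str[0]) ||
		    !isxdigit((unsigned char)str[1]))
			break;
		tmp[0] = str[0];
		tmp[1] = str[1];
		tmp[2] = '\0';
		if (sscanf(tmp, "%x", &d) < 1)
			break;
		out[i++] = d & 0xff;
		str += 2;
	}
	return i == 16 ? 0 : -1;
}
```

A caller would pass a string such as "01234567-89ab-cdef-0123-456789abcdef" and check for a zero return before using the buffer.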
diff --git a/net/ceph/ceph_fs.c b/net/ceph/ceph_fs.c

new file mode 100644 (file)

index 0000000..a3a3a31
--- /dev/null
+++ b/net/ceph/ceph_fs.c
@@ -0,0 +1,75 @@
+/*
+ * Some non-inline ceph helpers
+ */
+#include <linux/module.h>
+#include <linux/ceph/types.h>
+
+/*
+ * return true if @layout appears to be valid
+ */
+int ceph_file_layout_is_valid(const struct ceph_file_layout *layout)
+{
+       __u32 su = le32_to_cpu(layout->fl_stripe_unit);
+       __u32 sc = le32_to_cpu(layout->fl_stripe_count);
+       __u32 os = le32_to_cpu(layout->fl_object_size);
+
+       /* stripe unit, object size must be non-zero multiples of 64k */
+       if (!su || (su & (CEPH_MIN_STRIPE_UNIT-1)))
+               return 0;
+       if (!os || (os & (CEPH_MIN_STRIPE_UNIT-1)))
+               return 0;
+       /* object size must be a multiple of stripe unit */
+       if (os < su || os % su)
+               return 0;
+       /* stripe count must be non-zero */
+       if (!sc)
+               return 0;
+       return 1;
+}
+
+
+int ceph_flags_to_mode(int flags)
+{
+       int mode;
+
+#ifdef O_DIRECTORY  /* fixme */
+       if ((flags & O_DIRECTORY) == O_DIRECTORY)
+               return CEPH_FILE_MODE_PIN;
+#endif
+       if ((flags & O_APPEND) == O_APPEND)
+               flags |= O_WRONLY;
+
+       if ((flags & O_ACCMODE) == O_RDWR)
+               mode = CEPH_FILE_MODE_RDWR;
+       else if ((flags & O_ACCMODE) == O_WRONLY)
+               mode = CEPH_FILE_MODE_WR;
+       else
+               mode = CEPH_FILE_MODE_RD;
+
+#ifdef O_LAZY
+       if (flags & O_LAZY)
+               mode |= CEPH_FILE_MODE_LAZY;
+#endif
+
+       return mode;
+}
+EXPORT_SYMBOL(ceph_flags_to_mode);
+
+int ceph_caps_for_mode(int mode)
+{
+       int caps = CEPH_CAP_PIN;
+
+       if (mode & CEPH_FILE_MODE_RD)
+               caps |= CEPH_CAP_FILE_SHARED |
+                       CEPH_CAP_FILE_RD | CEPH_CAP_FILE_CACHE;
+       if (mode & CEPH_FILE_MODE_WR)
+               caps |= CEPH_CAP_FILE_EXCL |
+                       CEPH_CAP_FILE_WR | CEPH_CAP_FILE_BUFFER |
+                       CEPH_CAP_AUTH_SHARED | CEPH_CAP_AUTH_EXCL |
+                       CEPH_CAP_XATTR_SHARED | CEPH_CAP_XATTR_EXCL;
+       if (mode & CEPH_FILE_MODE_LAZY)
+               caps |= CEPH_CAP_FILE_LAZYIO;
+
+       return caps;
+}
+EXPORT_SYMBOL(ceph_caps_for_mode);
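Note on ceph_file_layout_is_valid() in net/ceph/ceph_fs.c above: the checks can be tried outside the kernel with a small stand-in. This is a sketch under stated assumptions: struct layout_sketch is a hypothetical host-endian replacement for struct ceph_file_layout (so no le32_to_cpu() is needed), and MIN_STRIPE_UNIT is assumed to be 65536, matching the "64k" in the original comment.

```c
#include <stdint.h>

#define MIN_STRIPE_UNIT 65536u	/* assumed value of CEPH_MIN_STRIPE_UNIT */

/* Hypothetical host-endian stand-in for struct ceph_file_layout. */
struct layout_sketch {
	uint32_t stripe_unit;
	uint32_t stripe_count;
	uint32_t object_size;
};

/* Returns 1 if the layout passes the same checks as above, else 0. */
int layout_is_valid_sketch(const struct layout_sketch *l)
{
	uint32_t su = l->stripe_unit;
	uint32_t sc = l->stripe_count;
	uint32_t os = l->object_size;

	/* stripe unit and object size: non-zero multiples of 64k */
	if (!su || (su & (MIN_STRIPE_UNIT - 1)))
		return 0;
	if (!os || (os & (MIN_STRIPE_UNIT - 1)))
		return 0;
	/* object size must be a multiple of stripe unit */
	if (os < su || os % su)
		return 0;
	/* stripe count must be non-zero */
	if (!sc)
		return 0;
	return 1;
}
```

The mask test works because the minimum stripe unit is a power of two; a 4 MB object with a 64k stripe unit passes, while a 192k stripe unit fails the divisibility check for the same object size.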
diff --git a/net/ceph/ceph_hash.c b/net/ceph/ceph_hash.c

new file mode 100644 (file)

index 0000000..815ef88
--- /dev/null
+++ b/net/ceph/ceph_hash.c
@@ -0,0 +1,118 @@
+
+#include <linux/ceph/types.h>
+
+/*
+ * Robert Jenkins' hash function.
+ * http://burtleburtle.net/bob/hash/evahash.html
+ * This is in the public domain.
+ */
+#define mix(a, b, c)                                           \
+       do {                                                    \
+               a = a - b;  a = a - c;  a = a ^ (c >> 13);      \
+               b = b - c;  b = b - a;  b = b ^ (a << 8);       \
+               c = c - a;  c = c - b;  c = c ^ (b >> 13);      \
+               a = a - b;  a = a - c;  a = a ^ (c >> 12);      \
+               b = b - c;  b = b - a;  b = b ^ (a << 16);      \
+               c = c - a;  c = c - b;  c = c ^ (b >> 5);       \
+               a = a - b;  a = a - c;  a = a ^ (c >> 3);       \
+               b = b - c;  b = b - a;  b = b ^ (a << 10);      \
+               c = c - a;  c = c - b;  c = c ^ (b >> 15);      \
+       } while (0)
+
+unsigned ceph_str_hash_rjenkins(const char *str, unsigned length)
+{
+       const unsigned char *k = (const unsigned char *)str;
+       __u32 a, b, c;  /* the internal state */
+       __u32 len;      /* how many key bytes still need mixing */
+
+       /* Set up the internal state */
+       len = length;
+       a = 0x9e3779b9;      /* the golden ratio; an arbitrary value */
+       b = a;
+       c = 0;               /* variable initialization of internal state */
+
+       /* handle most of the key */
+       while (len >= 12) {
+               a = a + (k[0] + ((__u32)k[1] << 8) + ((__u32)k[2] << 16) +
+                        ((__u32)k[3] << 24));
+               b = b + (k[4] + ((__u32)k[5] << 8) + ((__u32)k[6] << 16) +
+                        ((__u32)k[7] << 24));
+               c = c + (k[8] + ((__u32)k[9] << 8) + ((__u32)k[10] << 16) +
+                        ((__u32)k[11] << 24));
+               mix(a, b, c);
+               k = k + 12;
+               len = len - 12;
+       }
+
+       /* handle the remaining bytes (at most 11) */
+       c = c + length;
+       switch (len) {            /* all the case statements fall through */
+       case 11:
+               c = c + ((__u32)k[10] << 24);
+       case 10:
+               c = c + ((__u32)k[9] << 16);
+       case 9:
+               c = c + ((__u32)k[8] << 8);
+               /* the first byte of c is reserved for the length */
+       case 8:
+               b = b + ((__u32)k[7] << 24);
+       case 7:
+               b = b + ((__u32)k[6] << 16);
+       case 6:
+               b = b + ((__u32)k[5] << 8);
+       case 5:
+               b = b + k[4];
+       case 4:
+               a = a + ((__u32)k[3] << 24);
+       case 3:
+               a = a + ((__u32)k[2] << 16);
+       case 2:
+               a = a + ((__u32)k[1] << 8);
+       case 1:
+               a = a + k[0];
+               /* case 0: nothing left to add */
+       }
+       mix(a, b, c);
+
+       return c;
+}
+
+/*
+ * linux dcache hash
+ */
+unsigned ceph_str_hash_linux(const char *str, unsigned length)
+{
+       unsigned long hash = 0;
+       unsigned char c;
+
+       while (length--) {
+               c = *str++;
+               hash = (hash + (c << 4) + (c >> 4)) * 11;
+       }
+       return hash;
+}
+
+
+unsigned ceph_str_hash(int type, const char *s, unsigned len)
+{
+       switch (type) {
+       case CEPH_STR_HASH_LINUX:
+               return ceph_str_hash_linux(s, len);
+       case CEPH_STR_HASH_RJENKINS:
+               return ceph_str_hash_rjenkins(s, len);
+       default:
+               return -1;
+       }
+}
+
+const char *ceph_str_hash_name(int type)
+{
+       switch (type) {
+       case CEPH_STR_HASH_LINUX:
+               return "linux";
+       case CEPH_STR_HASH_RJENKINS:
+               return "rjenkins";
+       default:
+               return "unknown";
+       }
+}
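Note on ceph_str_hash_linux() in net/ceph/ceph_hash.c above: the function folds each byte into the running value with two shifts and a multiply by 11. The following userspace copy is a sketch for experimenting with it; the name str_hash_linux_sketch is an assumption, and the loop body is taken from the function above.

```c
/*
 * Userspace copy of the dcache-style string hash (hypothetical name).
 * Each byte contributes (c << 4) + (c >> 4); the running value is
 * multiplied by 11 after every byte.
 */
unsigned str_hash_linux_sketch(const char *str, unsigned length)
{
	unsigned long hash = 0;
	unsigned char c;

	while (length--) {
		c = *str++;
		hash = (hash + (c << 4) + (c >> 4)) * 11;
	}
	return hash;
}
```

For the single byte 'a' (97) the arithmetic is (0 + 1552 + 6) * 11 = 17138, and an empty string hashes to 0.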
diff --git a/net/ceph/ceph_strings.c b/net/ceph/ceph_strings.c

new file mode 100644 (file)

index 0000000..3fbda04
--- /dev/null
+++ b/net/ceph/ceph_strings.c
@@ -0,0 +1,84 @@
+/*
+ * Ceph string constants
+ */
+#include <linux/module.h>
+#include <linux/ceph/types.h>
+
+const char *ceph_entity_type_name(int type)
+{
+       switch (type) {
+       case CEPH_ENTITY_TYPE_MDS: return "mds";
+       case CEPH_ENTITY_TYPE_OSD: return "osd";
+       case CEPH_ENTITY_TYPE_MON: return "mon";
+       case CEPH_ENTITY_TYPE_CLIENT: return "client";
+       case CEPH_ENTITY_TYPE_AUTH: return "auth";
+       default: return "unknown";
+       }
+}
+
+const char *ceph_osd_op_name(int op)
+{
+       switch (op) {
+       case CEPH_OSD_OP_READ: return "read";
+       case CEPH_OSD_OP_STAT: return "stat";
+
+       case CEPH_OSD_OP_MASKTRUNC: return "masktrunc";
+
+       case CEPH_OSD_OP_WRITE: return "write";
+       case CEPH_OSD_OP_DELETE: return "delete";
+       case CEPH_OSD_OP_TRUNCATE: return "truncate";
+       case CEPH_OSD_OP_ZERO: return "zero";
+       case CEPH_OSD_OP_WRITEFULL: return "writefull";
+       case CEPH_OSD_OP_ROLLBACK: return "rollback";
+
+       case CEPH_OSD_OP_APPEND: return "append";
+       case CEPH_OSD_OP_STARTSYNC: return "startsync";
+       case CEPH_OSD_OP_SETTRUNC: return "settrunc";
+       case CEPH_OSD_OP_TRIMTRUNC: return "trimtrunc";
+
+       case CEPH_OSD_OP_TMAPUP: return "tmapup";
+       case CEPH_OSD_OP_TMAPGET: return "tmapget";
+       case CEPH_OSD_OP_TMAPPUT: return "tmapput";
+
+       case CEPH_OSD_OP_GETXATTR: return "getxattr";
+       case CEPH_OSD_OP_GETXATTRS: return "getxattrs";
+       case CEPH_OSD_OP_SETXATTR: return "setxattr";
+       case CEPH_OSD_OP_SETXATTRS: return "setxattrs";
+       case CEPH_OSD_OP_RESETXATTRS: return "resetxattrs";
+       case CEPH_OSD_OP_RMXATTR: return "rmxattr";
+       case CEPH_OSD_OP_CMPXATTR: return "cmpxattr";
+
+       case CEPH_OSD_OP_PULL: return "pull";
+       case CEPH_OSD_OP_PUSH: return "push";
+       case CEPH_OSD_OP_BALANCEREADS: return "balance-reads";
+       case CEPH_OSD_OP_UNBALANCEREADS: return "unbalance-reads";
+       case CEPH_OSD_OP_SCRUB: return "scrub";
+
+       case CEPH_OSD_OP_WRLOCK: return "wrlock";
+       case CEPH_OSD_OP_WRUNLOCK: return "wrunlock";
+       case CEPH_OSD_OP_RDLOCK: return "rdlock";
+       case CEPH_OSD_OP_RDUNLOCK: return "rdunlock";
+       case CEPH_OSD_OP_UPLOCK: return "uplock";
+       case CEPH_OSD_OP_DNLOCK: return "dnlock";
+
+       case CEPH_OSD_OP_CALL: return "call";
+
+       case CEPH_OSD_OP_PGLS: return "pgls";
+       }
+       return "???";
+}
+
+
+const char *ceph_pool_op_name(int op)
+{
+       switch (op) {
+       case POOL_OP_CREATE: return "create";
+       case POOL_OP_DELETE: return "delete";
+       case POOL_OP_AUID_CHANGE: return "auid change";
+       case POOL_OP_CREATE_SNAP: return "create snap";
+       case POOL_OP_DELETE_SNAP: return "delete snap";
+       case POOL_OP_CREATE_UNMANAGED_SNAP: return "create unmanaged snap";
+       case POOL_OP_DELETE_UNMANAGED_SNAP: return "delete unmanaged snap";
+       }
+       return "???";
+}
diff --git a/net/ceph/crush/crush.c b/net/ceph/crush/crush.c

new file mode 100644 (file)

index 0000000..d6ebb13
--- /dev/null
+++ b/net/ceph/crush/crush.c
@@ -0,0 +1,151 @@
+
+#ifdef __KERNEL__
+# include <linux/slab.h>
+#else
+# include <stdlib.h>
+# include <assert.h>
+# define kfree(x) do { if (x) free(x); } while (0)
+# define BUG_ON(x) assert(!(x))
+#endif
+
+#include <linux/crush/crush.h>
+
+const char *crush_bucket_alg_name(int alg)
+{
+       switch (alg) {
+       case CRUSH_BUCKET_UNIFORM: return "uniform";
+       case CRUSH_BUCKET_LIST: return "list";
+       case CRUSH_BUCKET_TREE: return "tree";
+       case CRUSH_BUCKET_STRAW: return "straw";
+       default: return "unknown";
+       }
+}
+
+/**
+ * crush_get_bucket_item_weight - Get weight of an item in given bucket
+ * @b: bucket pointer
+ * @p: item index in bucket
+ */
+int crush_get_bucket_item_weight(struct crush_bucket *b, int p)
+{
+       if (p >= b->size)
+               return 0;
+
+       switch (b->alg) {
+       case CRUSH_BUCKET_UNIFORM:
+               return ((struct crush_bucket_uniform *)b)->item_weight;
+       case CRUSH_BUCKET_LIST:
+               return ((struct crush_bucket_list *)b)->item_weights[p];
+       case CRUSH_BUCKET_TREE:
+               if (p & 1)
+                       return ((struct crush_bucket_tree *)b)->node_weights[p];
+               return 0;
+       case CRUSH_BUCKET_STRAW:
+               return ((struct crush_bucket_straw *)b)->item_weights[p];
+       }
+       return 0;
+}
+
+/**
+ * crush_calc_parents - Calculate parent vectors for the given crush map.
+ * @map: crush_map pointer
+ */
+void crush_calc_parents(struct crush_map *map)
+{
+       int i, b, c;
+
+       for (b = 0; b < map->max_buckets; b++) {
+               if (map->buckets[b] == NULL)
+                       continue;
+               for (i = 0; i < map->buckets[b]->size; i++) {
+                       c = map->buckets[b]->items[i];
+                       BUG_ON(c >= map->max_devices ||
+                              c < -map->max_buckets);
+                       if (c >= 0)
+                               map->device_parents[c] = map->buckets[b]->id;
+                       else
+                               map->bucket_parents[-1-c] = map->buckets[b]->id;
+               }
+       }
+}
+
+void crush_destroy_bucket_uniform(struct crush_bucket_uniform *b)
+{
+       kfree(b->h.perm);
+       kfree(b->h.items);
+       kfree(b);
+}
+
+void crush_destroy_bucket_list(struct crush_bucket_list *b)
+{
+       kfree(b->item_weights);
+       kfree(b->sum_weights);
+       kfree(b->h.perm);
+       kfree(b->h.items);
+       kfree(b);
+}
+
+void crush_destroy_bucket_tree(struct crush_bucket_tree *b)
+{
+       kfree(b->node_weights);
+       kfree(b);
+}
+
+void crush_destroy_bucket_straw(struct crush_bucket_straw *b)
+{
+       kfree(b->straws);
+       kfree(b->item_weights);
+       kfree(b->h.perm);
+       kfree(b->h.items);
+       kfree(b);
+}
+
+void crush_destroy_bucket(struct crush_bucket *b)
+{
+       switch (b->alg) {
+       case CRUSH_BUCKET_UNIFORM:
+               crush_destroy_bucket_uniform((struct crush_bucket_uniform *)b);
+               break;
+       case CRUSH_BUCKET_LIST:
+               crush_destroy_bucket_list((struct crush_bucket_list *)b);
+               break;
+       case CRUSH_BUCKET_TREE:
+               crush_destroy_bucket_tree((struct crush_bucket_tree *)b);
+               break;
+       case CRUSH_BUCKET_STRAW:
+               crush_destroy_bucket_straw((struct crush_bucket_straw *)b);
+               break;
+       }
+}
+
+/**
+ * crush_destroy - Destroy a crush_map
+ * @map: crush_map pointer
+ */
+void crush_destroy(struct crush_map *map)
+{
+       int b;
+
+       /* buckets */
+       if (map->buckets) {
+               for (b = 0; b < map->max_buckets; b++) {
+                       if (map->buckets[b] == NULL)
+                               continue;
+                       crush_destroy_bucket(map->buckets[b]);
+               }
+               kfree(map->buckets);
+       }
+
+       /* rules */
+       if (map->rules) {
+               for (b = 0; b < map->max_rules; b++)
+                       kfree(map->rules[b]);
+               kfree(map->rules);
+       }
+
+       kfree(map->bucket_parents);
+       kfree(map->device_parents);
+       kfree(map);
+}
+
+
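Note on crush_calc_parents() in net/ceph/crush/crush.c above: items with an id >= 0 are devices and index device_parents directly, while items with a negative id are buckets and index bucket_parents at slot -1-id. The helpers below are a minimal sketch of that id convention and of the range check in the BUG_ON(); both function names are hypothetical.

```c
/*
 * Sketch of the CRUSH item id convention (hypothetical helpers).
 * Bucket ids are negative: id -1 maps to slot 0, id -2 to slot 1, ...
 */
int crush_bucket_slot_sketch(int id)
{
	return -1 - id;
}

/*
 * Mirrors the range check above: an id is acceptable if it is below
 * max_devices and not below -max_buckets.
 */
int crush_id_in_range_sketch(int id, int max_devices, int max_buckets)
{
	return id < max_devices && id >= -max_buckets;
}
```

With max_buckets = 4, bucket ids -1 through -4 are in range and map to slots 0 through 3; id -5 would trip the BUG_ON().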
diff --git a/net/ceph/crush/hash.c b/net/ceph/crush/hash.c

new file mode 100644 (file)

index 0000000..5bb63e3
--- /dev/null
+++ b/net/ceph/crush/hash.c
@@ -0,0 +1,149 @@
+
+#include <linux/types.h>
+#include <linux/crush/hash.h>
+
+/*
+ * Robert Jenkins' function for mixing 32-bit values
+ * http://burtleburtle.net/bob/hash/evahash.html
+ * a, b = random bits, c = input and output
+ */
+#define crush_hashmix(a, b, c) do {                    \
+               a = a-b;  a = a-c;  a = a^(c>>13);      \
+               b = b-c;  b = b-a;  b = b^(a<<8);       \
+               c = c-a;  c = c-b;  c = c^(b>>13);      \
+               a = a-b;  a = a-c;  a = a^(c>>12);      \
+               b = b-c;  b = b-a;  b = b^(a<<16);      \
+               c = c-a;  c = c-b;  c = c^(b>>5);       \
+               a = a-b;  a = a-c;  a = a^(c>>3);       \
+               b = b-c;  b = b-a;  b = b^(a<<10);      \
+               c = c-a;  c = c-b;  c = c^(b>>15);      \
+       } while (0)
+
+#define crush_hash_seed 1315423911
+
+static __u32 crush_hash32_rjenkins1(__u32 a)
+{
+       __u32 hash = crush_hash_seed ^ a;
+       __u32 b = a;
+       __u32 x = 231232;
+       __u32 y = 1232;
+       crush_hashmix(b, x, hash);
+       crush_hashmix(y, a, hash);
+       return hash;
+}
+
+static __u32 crush_hash32_rjenkins1_2(__u32 a, __u32 b)
+{
+       __u32 hash = crush_hash_seed ^ a ^ b;
+       __u32 x = 231232;
+       __u32 y = 1232;
+       crush_hashmix(a, b, hash);
+       crush_hashmix(x, a, hash);
+       crush_hashmix(b, y, hash);
+       return hash;
+}
+
+static __u32 crush_hash32_rjenkins1_3(__u32 a, __u32 b, __u32 c)
+{
+       __u32 hash = crush_hash_seed ^ a ^ b ^ c;
+       __u32 x = 231232;
+       __u32 y = 1232;
+       crush_hashmix(a, b, hash);
+       crush_hashmix(c, x, hash);
+       crush_hashmix(y, a, hash);
+       crush_hashmix(b, x, hash);
+       crush_hashmix(y, c, hash);
+       return hash;
+}
+
+static __u32 crush_hash32_rjenkins1_4(__u32 a, __u32 b, __u32 c, __u32 d)
+{
+       __u32 hash = crush_hash_seed ^ a ^ b ^ c ^ d;
+       __u32 x = 231232;
+       __u32 y = 1232;
+       crush_hashmix(a, b, hash);
+       crush_hashmix(c, d, hash);
+       crush_hashmix(a, x, hash);
+       crush_hashmix(y, b, hash);
+       crush_hashmix(c, x, hash);
+       crush_hashmix(y, d, hash);
+       return hash;
+}
+
+static __u32 crush_hash32_rjenkins1_5(__u32 a, __u32 b, __u32 c, __u32 d,
+                                     __u32 e)
+{
+       __u32 hash = crush_hash_seed ^ a ^ b ^ c ^ d ^ e;
+       __u32 x = 231232;
+       __u32 y = 1232;
+       crush_hashmix(a, b, hash);
+       crush_hashmix(c, d, hash);
+       crush_hashmix(e, x, hash);
+       crush_hashmix(y, a, hash);
+       crush_hashmix(b, x, hash);
+       crush_hashmix(y, c, hash);
+       crush_hashmix(d, x, hash);
+       crush_hashmix(y, e, hash);
+       return hash;
+}
+
+
+__u32 crush_hash32(int type, __u32 a)
+{
+       switch (type) {
+       case CRUSH_HASH_RJENKINS1:
+               return crush_hash32_rjenkins1(a);
+       default:
+               return 0;
+       }
+}
+
+__u32 crush_hash32_2(int type, __u32 a, __u32 b)
+{
+       switch (type) {
+       case CRUSH_HASH_RJENKINS1:
+               return crush_hash32_rjenkins1_2(a, b);
+       default:
+               return 0;
+       }
+}
+
+__u32 crush_hash32_3(int type, __u32 a, __u32 b, __u32 c)
+{
+       switch (type) {
+       case CRUSH_HASH_RJENKINS1:
+               return crush_hash32_rjenkins1_3(a, b, c);
+       default:
+               return 0;
+       }
+}
+
+__u32 crush_hash32_4(int type, __u32 a, __u32 b, __u32 c, __u32 d)
+{
+       switch (type) {
+       case CRUSH_HASH_RJENKINS1:
+               return crush_hash32_rjenkins1_4(a, b, c, d);
+       default:
+               return 0;
+       }
+}
+
+__u32 crush_hash32_5(int type, __u32 a, __u32 b, __u32 c, __u32 d, __u32 e)
+{
+       switch (type) {
+       case CRUSH_HASH_RJENKINS1:
+               return crush_hash32_rjenkins1_5(a, b, c, d, e);
+       default:
+               return 0;
+       }
+}
+
+const char *crush_hash_name(int type)
+{
+       switch (type) {
+       case CRUSH_HASH_RJENKINS1:
+               return "rjenkins1";
+       default:
+               return "unknown";
+       }
+}
diff --git a/net/ceph/crush/mapper.c b/net/ceph/crush/mapper.c

new file mode 100644 (file)

index 0000000..42599e3
--- /dev/null
+++ b/net/ceph/crush/mapper.c
@@ -0,0 +1,609 @@
+
+#ifdef __KERNEL__
+# include <linux/string.h>
+# include <linux/slab.h>
+# include <linux/bug.h>
+# include <linux/kernel.h>
+# ifndef dprintk
+#  define dprintk(args...)
+# endif
+#else
+# include <string.h>
+# include <stdio.h>
+# include <stdlib.h>
+# include <assert.h>
+# define BUG_ON(x) assert(!(x))
+# define dprintk(args...) /* printf(args) */
+# define kmalloc(x, f) malloc(x)
+# define kfree(x) free(x)
+#endif
+
+#include <linux/crush/crush.h>
+#include <linux/crush/hash.h>
+
+/*
+ * Implement the core CRUSH mapping algorithm.
+ */
+
+/**
+ * crush_find_rule - find a crush_rule id for a given ruleset, type, and size.
+ * @map: the crush_map
+ * @ruleset: the storage ruleset id (user defined)
+ * @type: storage ruleset type (user defined)
+ * @size: output set size
+ */
+int crush_find_rule(struct crush_map *map, int ruleset, int type, int size)
+{
+       int i;
+
+       for (i = 0; i < map->max_rules; i++) {
+               if (map->rules[i] &&
+                   map->rules[i]->mask.ruleset == ruleset &&
+                   map->rules[i]->mask.type == type &&
+                   map->rules[i]->mask.min_size <= size &&
+                   map->rules[i]->mask.max_size >= size)
+                       return i;
+       }
+       return -1;
+}
+
+
+/*
+ * bucket choose methods
+ *
+ * For each bucket algorithm, we have a "choose" method that, given a
+ * crush input @x and replica position (usually, position in output set) @r,
+ * will produce an item in the bucket.
+ */
+
+/*
+ * Choose based on a random permutation of the bucket.
+ *
+ * We used to use some prime number arithmetic to do this, but it
+ * wasn't very random, and had some other bad behaviors.  Instead, we
+ * calculate an actual random permutation of the bucket members.
+ * Since this is expensive, we optimize for the r=0 case, which
+ * captures the vast majority of calls.
+ */
+static int bucket_perm_choose(struct crush_bucket *bucket,
+                             int x, int r)
+{
+       unsigned pr = r % bucket->size;
+       unsigned i, s;
+
+       /* start a new permutation if @x has changed */
+       if (bucket->perm_x != x || bucket->perm_n == 0) {
+               dprintk("bucket %d new x=%d\n", bucket->id, x);
+               bucket->perm_x = x;
+
+               /* optimize common r=0 case */
+               if (pr == 0) {
+                       s = crush_hash32_3(bucket->hash, x, bucket->id, 0) %
+                               bucket->size;
+                       bucket->perm[0] = s;
+                       bucket->perm_n = 0xffff;   /* magic value, see below */
+                       goto out;
+               }
+
+               for (i = 0; i < bucket->size; i++)
+                       bucket->perm[i] = i;
+               bucket->perm_n = 0;
+       } else if (bucket->perm_n == 0xffff) {
+               /* clean up after the r=0 case above */
+               for (i = 1; i < bucket->size; i++)
+                       bucket->perm[i] = i;
+               bucket->perm[bucket->perm[0]] = 0;
+               bucket->perm_n = 1;
+       }
+
+       /* calculate permutation up to pr */
+       for (i = 0; i < bucket->perm_n; i++)
+               dprintk(" perm_choose have %d: %d\n", i, bucket->perm[i]);
+       while (bucket->perm_n <= pr) {
+               unsigned p = bucket->perm_n;
+               /* no point in swapping the final entry */
+               if (p < bucket->size - 1) {
+                       i = crush_hash32_3(bucket->hash, x, bucket->id, p) %
+                               (bucket->size - p);
+                       if (i) {
+                               unsigned t = bucket->perm[p + i];
+                               bucket->perm[p + i] = bucket->perm[p];
+                               bucket->perm[p] = t;
+                       }
+                       dprintk(" perm_choose swap %d with %d\n", p, p+i);
+               }
+               bucket->perm_n++;
+       }
+       for (i = 0; i < bucket->size; i++)
+               dprintk(" perm_choose  %d: %d\n", i, bucket->perm[i]);
+
+       s = bucket->perm[pr];
+out:
+       dprintk(" perm_choose %d sz=%d x=%d r=%d (%d) s=%d\n", bucket->id,
+               bucket->size, x, r, pr, s);
+       return bucket->items[s];
+}
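
The incremental shuffle in bucket_perm_choose() can be sketched outside the kernel as follows. This is a minimal illustration, not the kernel code: toy_hash3() is a hypothetical stand-in for crush_hash32_3(), lazy_perm_slot() is an invented name, and the r=0 shortcut with the 0xffff marker is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in mixer (FNV-1a style over words); NOT rjenkins1. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t in[3] = { a, b, c };
	uint32_t h = 2166136261u;

	for (int i = 0; i < 3; i++) {
		h ^= in[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Extend a partially built permutation of 0..size-1 until slot pr is
 * fixed, then return that slot.  On first use perm[] must hold the
 * identity and *perm_n must be 0.
 */
static unsigned lazy_perm_slot(unsigned *perm, unsigned *perm_n,
			       unsigned size, uint32_t x, unsigned pr)
{
	assert(size > 0 && pr < size);

	while (*perm_n <= pr) {
		unsigned p = *perm_n;

		/* no point in swapping the final entry */
		if (p < size - 1) {
			unsigned i = toy_hash3(x, 0, p) % (size - p);
			unsigned t = perm[p + i];

			perm[p + i] = perm[p];
			perm[p] = t;
		}
		(*perm_n)++;
	}
	return perm[pr];
}
```

Because step p only touches slots at index p or above, a slot that has been fixed never changes again. A caller that only ever asks for slot 0 therefore pays for a single hash and swap.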
+
+/* uniform */
+static int bucket_uniform_choose(struct crush_bucket_uniform *bucket,
+                                int x, int r)
+{
+       return bucket_perm_choose(&bucket->h, x, r);
+}
+
+/* list */
+static int bucket_list_choose(struct crush_bucket_list *bucket,
+                             int x, int r)
+{
+       int i;
+
+       for (i = bucket->h.size-1; i >= 0; i--) {
+               __u64 w = crush_hash32_4(bucket->h.hash,x, bucket->h.items[i],
+                                        r, bucket->h.id);
+               w &= 0xffff;
+               dprintk("list_choose i=%d x=%d r=%d item %d weight %x "
+                       "sw %x rand %llx",
+                       i, x, r, bucket->h.items[i], bucket->item_weights[i],
+                       bucket->sum_weights[i], w);
+               w *= bucket->sum_weights[i];
+               w = w >> 16;
+               /*dprintk(" scaled %llx\n", w);*/
+               if (w < bucket->item_weights[i])
+                       return bucket->h.items[i];
+       }
+
+       BUG_ON(1);
+       return 0;
+}
+
+
+/* (binary) tree */
+static int height(int n)
+{
+       int h = 0;
+       while ((n & 1) == 0) {
+               h++;
+               n = n >> 1;
+       }
+       return h;
+}
+
+static int left(int x)
+{
+       int h = height(x);
+       return x - (1 << (h-1));
+}
+
+static int right(int x)
+{
+       int h = height(x);
+       return x + (1 << (h-1));
+}
+
+static int terminal(int x)
+{
+       return x & 1;
+}
+
+static int bucket_tree_choose(struct crush_bucket_tree *bucket,
+                             int x, int r)
+{
+       int n, l;
+       __u32 w;
+       __u64 t;
+
+       /* start at root */
+       n = bucket->num_nodes >> 1;
+
+       while (!terminal(n)) {
+               /* pick point in [0, w) */
+               w = bucket->node_weights[n];
+               t = (__u64)crush_hash32_4(bucket->h.hash, x, n, r,
+                                         bucket->h.id) * (__u64)w;
+               t = t >> 32;
+
+               /* descend to the left or right? */
+               l = left(n);
+               if (t < bucket->node_weights[l])
+                       n = l;
+               else
+                       n = right(n);
+       }
+
+       return bucket->h.items[n >> 1];
+}
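
The node labelling used by the tree bucket is easier to follow with a small worked case. The sketch below assumes a four-item bucket (num_nodes = 8, root = 4, leaves at the odd nodes 1, 3, 5, 7). The draw callback is a hypothetical replacement for the per-node crush_hash32_4() call so that the walk can be traced by hand, and tree_pick() returns the item index rather than the item itself.

```c
#include <assert.h>
#include <stdint.h>

/* height of a node = number of trailing zero bits in its label */
static int height(int n)
{
	int h = 0;

	while ((n & 1) == 0) {
		h++;
		n >>= 1;
	}
	return h;
}

static int left(int x)  { return x - (1 << (height(x) - 1)); }
static int right(int x) { return x + (1 << (height(x) - 1)); }

/* Walk from the root to a leaf; draw(node) supplies a 32-bit value. */
static int tree_pick(const uint32_t *node_weights, int num_nodes,
		     uint32_t (*draw)(int node))
{
	int n = num_nodes >> 1;		/* root */

	assert(n > 0);			/* height(0) would never terminate */
	while (!(n & 1)) {
		/* scale the draw into [0, weight of this subtree) */
		uint64_t t = ((uint64_t)draw(n) * node_weights[n]) >> 32;
		int l = left(n);

		n = (t < node_weights[l]) ? l : right(n);
	}
	return n >> 1;			/* leaf label -> item index */
}

/* Hypothetical fixed draws, only for tracing the two extremes. */
static uint32_t draw_min(int node) { (void)node; return 0; }
static uint32_t draw_max(int node) { (void)node; return 0xffffffffu; }
```

With leaf weights 1, 2, 3, 4 the interior weights are 3 (node 2), 7 (node 6) and 10 (node 4). A draw of 0 descends 4, 2, 1 and yields index 0. A draw of 0xffffffff scales to 9 at the root and 6 at node 6, so it descends 4, 6, 7 and yields index 3.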
+
+
+/* straw */
+
+static int bucket_straw_choose(struct crush_bucket_straw *bucket,
+                              int x, int r)
+{
+       int i;
+       int high = 0;
+       __u64 high_draw = 0;
+       __u64 draw;
+
+       for (i = 0; i < bucket->h.size; i++) {
+               draw = crush_hash32_3(bucket->h.hash, x, bucket->h.items[i], r);
+               draw &= 0xffff;
+               draw *= bucket->straws[i];
+               if (i == 0 || draw > high_draw) {
+                       high = i;
+                       high_draw = draw;
+               }
+       }
+       return bucket->h.items[high];
+}
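
A userspace sketch of the straw draw may help when reading bucket_straw_choose(). It assumes a hypothetical mixer, toy_hash3(), in place of crush_hash32_3(), and straw_pick() is an invented name; only the selection logic mirrors the function above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in mixer (FNV-1a style over words); NOT rjenkins1. */
static uint32_t toy_hash3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t in[3] = { a, b, c };
	uint32_t h = 2166136261u;

	for (int i = 0; i < 3; i++) {
		h ^= in[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Every item draws a 16-bit value from (x, item, r) and scales it by
 * its straw length; the first item with the largest product wins.
 */
static int straw_pick(const int *items, const uint32_t *straws,
		      int size, uint32_t x, uint32_t r)
{
	int high = 0;
	uint64_t high_draw = 0;

	assert(size > 0);
	for (int i = 0; i < size; i++) {
		uint64_t draw = toy_hash3(x, (uint32_t)items[i], r) & 0xffff;

		draw *= straws[i];
		if (i == 0 || draw > high_draw) {
			high = i;
			high_draw = draw;
		}
	}
	return items[high];
}
```

Each draw depends only on the item, x and r, not on the other members of the bucket. In this sketch, dropping the last item therefore leaves the result unchanged for every input that did not map to it.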
+
+static int crush_bucket_choose(struct crush_bucket *in, int x, int r)
+{
+       dprintk(" crush_bucket_choose %d x=%d r=%d\n", in->id, x, r);
+       switch (in->alg) {
+       case CRUSH_BUCKET_UNIFORM:
+               return bucket_uniform_choose((struct crush_bucket_uniform *)in,
+                                         x, r);
+       case CRUSH_BUCKET_LIST:
+               return bucket_list_choose((struct crush_bucket_list *)in,
+                                         x, r);
+       case CRUSH_BUCKET_TREE:
+               return bucket_tree_choose((struct crush_bucket_tree *)in,
+                                         x, r);
+       case CRUSH_BUCKET_STRAW:
+               return bucket_straw_choose((struct crush_bucket_straw *)in,
+                                          x, r);
+       default:
+               BUG_ON(1);
+               return in->items[0];
+       }
+}
+
+/*
+ * true if device is marked "out" of the cluster
+ * (failed, fully offloaded)
+ */
+static int is_out(struct crush_map *map, __u32 *weight, int item, int x)
+{
+       if (weight[item] >= 0x10000)
+               return 0;
+       if (weight[item] == 0)
+               return 1;
+       if ((crush_hash32_2(CRUSH_HASH_RJENKINS1, x, item) & 0xffff)
+           < weight[item])
+               return 0;
+       return 1;
+}
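
The weight test in is_out() treats weights as 16.16 fixed point, where 0x10000 means fully in and 0 means out. The sketch below isolates that comparison. is_out_sketch() is a hypothetical name, and the hash value is passed in by the caller instead of being computed with crush_hash32_2().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if the device should be treated as out, 0 otherwise.
 * A weight between 0 and 0x10000 keeps the device for roughly
 * weight / 0x10000 of all inputs, decided by the low 16 hash bits.
 */
static int is_out_sketch(uint32_t weight, uint32_t hash)
{
	if (weight >= 0x10000)
		return 0;		/* fully in */
	if (weight == 0)
		return 1;		/* fully out */
	return (hash & 0xffff) < weight ? 0 : 1;
}
```

For a weight of 0x8000, a hash whose low 16 bits are 0x7fff keeps the device, while 0x8000 rejects it.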
+
+/**
+ * crush_choose - choose numrep distinct items of given type
+ * @map: the crush_map
+ * @bucket: the bucket we are choosing an item from
+ * @x: crush input value
+ * @numrep: the number of items to choose
+ * @type: the type of item to choose
+ * @out: pointer to output vector
+ * @outpos: our position in that vector
+ * @firstn: true if choosing "first n" items, false if choosing "indep"
+ * @recurse_to_leaf: true if we want one device under each item of given type
+ * @out2: second output vector for leaf items (if @recurse_to_leaf)
+ */
+static int crush_choose(struct crush_map *map,
+                       struct crush_bucket *bucket,
+                       __u32 *weight,
+                       int x, int numrep, int type,
+                       int *out, int outpos,
+                       int firstn, int recurse_to_leaf,
+                       int *out2)
+{
+       int rep;
+       int ftotal, flocal;
+       int retry_descent, retry_bucket, skip_rep;
+       struct crush_bucket *in = bucket;
+       int r;
+       int i;
+       int item = 0;
+       int itemtype;
+       int collide, reject;
+       const int orig_tries = 5; /* attempts before we fall back to search */
+
+       dprintk("CHOOSE%s bucket %d x %d outpos %d numrep %d\n", recurse_to_leaf ? "_LEAF" : "",
+               bucket->id, x, outpos, numrep);
+
+       for (rep = outpos; rep < numrep; rep++) {
+               /* keep trying until we get a non-out, non-colliding item */
+               ftotal = 0;
+               skip_rep = 0;
+               do {
+                       retry_descent = 0;
+                       in = bucket;               /* initial bucket */
+
+                       /* choose through intervening buckets */
+                       flocal = 0;
+                       do {
+                               collide = 0;
+                               retry_bucket = 0;
+                               r = rep;
+                               if (in->alg == CRUSH_BUCKET_UNIFORM) {
+                                       /* be careful */
+                                       if (firstn || numrep >= in->size)
+                                               /* r' = r + f_total */
+                                               r += ftotal;
+                                       else if (in->size % numrep == 0)
+                                               /* r'=r+(n+1)*f_local */
+                                               r += (numrep+1) *
+                                                       (flocal+ftotal);
+                                       else
+                                               /* r' = r + n*f_local */
+                                               r += numrep * (flocal+ftotal);
+                               } else {
+                                       if (firstn)
+                                               /* r' = r + f_total */
+                                               r += ftotal;
+                                       else
+                                               /* r' = r + n*f_local */
+                                               r += numrep * (flocal+ftotal);
+                               }
+
+                               /* bucket choose */
+                               if (in->size == 0) {
+                                       reject = 1;
+                                       goto reject;
+                               }
+                               if (flocal >= (in->size>>1) &&
+                                   flocal > orig_tries)
+                                       item = bucket_perm_choose(in, x, r);
+                               else
+                                       item = crush_bucket_choose(in, x, r);
+                               BUG_ON(item >= map->max_devices);
+
+                               /* desired type? */
+                               if (item < 0)
+                                       itemtype = map->buckets[-1-item]->type;
+                               else
+                                       itemtype = 0;
+                               dprintk("  item %d type %d\n", item, itemtype);
+
+                               /* keep going? */
+                               if (itemtype != type) {
+                                       BUG_ON(item >= 0 ||
+                                              (-1-item) >= map->max_buckets);
+                                       in = map->buckets[-1-item];
+                                       retry_bucket = 1;
+                                       continue;
+                               }
+
+                               /* collision? */
+                               for (i = 0; i < outpos; i++) {
+                                       if (out[i] == item) {
+                                               collide = 1;
+                                               break;
+                                       }
+                               }
+
+                               reject = 0;
+                               if (recurse_to_leaf) {
+                                       if (item < 0) {
+                                               if (crush_choose(map,
+                                                        map->buckets[-1-item],
+                                                        weight,
+                                                        x, outpos+1, 0,
+                                                        out2, outpos,
+                                                        firstn, 0,
+                                                        NULL) <= outpos)
+                                                       /* didn't get leaf */
+                                                       reject = 1;
+                                       } else {
+                                               /* we already have a leaf! */
+                                               out2[outpos] = item;
+                                       }
+                               }
+
+                               if (!reject) {
+                                       /* out? */
+                                       if (itemtype == 0)
+                                               reject = is_out(map, weight,
+                                                               item, x);
+                                       else
+                                               reject = 0;
+                               }
+
+reject:
+                               if (reject || collide) {
+                                       ftotal++;
+                                       flocal++;
+
+                                       if (collide && flocal < 3)
+                                               /* retry locally a few times */
+                                               retry_bucket = 1;
+                                       else if (flocal < in->size + orig_tries)
+                                               /* exhaustive bucket search */
+                                               retry_bucket = 1;
+                                       else if (ftotal < 20)
+                                               /* then retry descent */
+                                               retry_descent = 1;
+                                       else
+                                               /* else give up */
+                                               skip_rep = 1;
+                                       dprintk("  reject %d  collide %d  "
+                                               "ftotal %d  flocal %d\n",
+                                               reject, collide, ftotal,
+                                               flocal);
+                               }
+                       } while (retry_bucket);
+               } while (retry_descent);
+
+               if (skip_rep) {
+                       dprintk("skip rep\n");
+                       continue;
+               }
+
+               dprintk("CHOOSE got %d\n", item);
+               out[outpos] = item;
+               outpos++;
+       }
+
+       dprintk("CHOOSE returns %d\n", outpos);
+       return outpos;
+}
+
+
+/**
+ * crush_do_rule - calculate a mapping with the given input and rule
+ * @map: the crush_map
+ * @ruleno: the rule id
+ * @x: hash input
+ * @result: pointer to result vector
+ * @result_max: maximum result size
+ * @force: force initial replica choice; -1 for none
+ */
+int crush_do_rule(struct crush_map *map,
+                 int ruleno, int x, int *result, int result_max,
+                 int force, __u32 *weight)
+{
+       int result_len;
+       int force_context[CRUSH_MAX_DEPTH];
+       int force_pos = -1;
+       int a[CRUSH_MAX_SET];
+       int b[CRUSH_MAX_SET];
+       int c[CRUSH_MAX_SET];
+       int recurse_to_leaf;
+       int *w;
+       int wsize = 0;
+       int *o;
+       int osize;
+       int *tmp;
+       struct crush_rule *rule;
+       int step;
+       int i, j;
+       int numrep;
+       int firstn;
+       int rc = -1;
+
+       BUG_ON(ruleno >= map->max_rules);
+
+       rule = map->rules[ruleno];
+       result_len = 0;
+       w = a;
+       o = b;
+
+       /*
+        * determine hierarchical context of force, if any.  note
+        * that this may or may not correspond to the specific types
+        * referenced by the crush rule.
+        */
+       if (force >= 0) {
+               if (force >= map->max_devices ||
+                   map->device_parents[force] == 0) {
+                       /*dprintk("CRUSH: forcefed device dne\n");*/
+                       rc = -1;  /* force-fed device does not exist */
+                       goto out;
+               }
+               if (!is_out(map, weight, force, x)) {
+                       while (1) {
+                               force_context[++force_pos] = force;
+                               if (force >= 0)
+                                       force = map->device_parents[force];
+                               else
+                                       force = map->bucket_parents[-1-force];
+                               if (force == 0)
+                                       break;
+                       }
+               }
+       }
+
+       for (step = 0; step < rule->len; step++) {
+               firstn = 0;
+               switch (rule->steps[step].op) {
+               case CRUSH_RULE_TAKE:
+                       w[0] = rule->steps[step].arg1;
+                       if (force_pos >= 0) {
+                               BUG_ON(force_context[force_pos] != w[0]);
+                               force_pos--;
+                       }
+                       wsize = 1;
+                       break;
+
+               case CRUSH_RULE_CHOOSE_LEAF_FIRSTN:
+               case CRUSH_RULE_CHOOSE_FIRSTN:
+                       firstn = 1;
+               case CRUSH_RULE_CHOOSE_LEAF_INDEP:
+               case CRUSH_RULE_CHOOSE_INDEP:
+                       BUG_ON(wsize == 0);
+
+                       recurse_to_leaf =
+                               rule->steps[step].op ==
+                                CRUSH_RULE_CHOOSE_LEAF_FIRSTN ||
+                               rule->steps[step].op ==
+                               CRUSH_RULE_CHOOSE_LEAF_INDEP;
+
+                       /* reset output */
+                       osize = 0;
+
+                       for (i = 0; i < wsize; i++) {
+                               /*
+                                * see CRUSH_N, CRUSH_N_MINUS macros.
+                                * basically, numrep <= 0 means relative to
+                                * the provided result_max
+                                */
+                               numrep = rule->steps[step].arg1;
+                               if (numrep <= 0) {
+                                       numrep += result_max;
+                                       if (numrep <= 0)
+                                               continue;
+                               }
+                               j = 0;
+                               if (osize == 0 && force_pos >= 0) {
+                                       /* skip any intermediate types */
+                                       while (force_pos &&
+                                              force_context[force_pos] < 0 &&
+                                              rule->steps[step].arg2 !=
+                                              map->buckets[-1 -
+                                              force_context[force_pos]]->type)
+                                               force_pos--;
+                                       o[osize] = force_context[force_pos];
+                                       if (recurse_to_leaf)
+                                               c[osize] = force_context[0];
+                                       j++;
+                                       force_pos--;
+                               }
+                               osize += crush_choose(map,
+                                                     map->buckets[-1-w[i]],
+                                                     weight,
+                                                     x, numrep,
+                                                     rule->steps[step].arg2,
+                                                     o+osize, j,
+                                                     firstn,
+                                                     recurse_to_leaf, c+osize);
+                       }
+
+                       if (recurse_to_leaf)
+                               /* copy final _leaf_ values to output set */
+                               memcpy(o, c, osize*sizeof(*o));
+
+                       /* swap o and w arrays */
+                       tmp = o;
+                       o = w;
+                       w = tmp;
+                       wsize = osize;
+                       break;
+
+
+               case CRUSH_RULE_EMIT:
+                       for (i = 0; i < wsize && result_len < result_max; i++) {
+                               result[result_len] = w[i];
+                               result_len++;
+                       }
+                       wsize = 0;
+                       break;
+
+               default:
+                       BUG_ON(1);
+               }
+       }
+       rc = result_len;
+
+out:
+       return rc;
+}
+
+
diff --git a/net/ceph/crypto.c b/net/ceph/crypto.c
new file mode 100644
index 0000000..7b505b0
--- /dev/null
+++ b/net/ceph/crypto.c
@@ -0,0 +1,412 @@
+
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/err.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <crypto/hash.h>
+
+#include <linux/ceph/decode.h>
+#include "crypto.h"
+
+int ceph_crypto_key_encode(struct ceph_crypto_key *key, void **p, void *end)
+{
+       if (*p + sizeof(u16) + sizeof(key->created) +
+           sizeof(u16) + key->len > end)
+               return -ERANGE;
+       ceph_encode_16(p, key->type);
+       ceph_encode_copy(p, &key->created, sizeof(key->created));
+       ceph_encode_16(p, key->len);
+       ceph_encode_copy(p, key->key, key->len);
+       return 0;
+}
+
+int ceph_crypto_key_decode(struct ceph_crypto_key *key, void **p, void *end)
+{
+       ceph_decode_need(p, end, 2*sizeof(u16) + sizeof(key->created), bad);
+       key->type = ceph_decode_16(p);
+       ceph_decode_copy(p, &key->created, sizeof(key->created));
+       key->len = ceph_decode_16(p);
+       ceph_decode_need(p, end, key->len, bad);
+       key->key = kmalloc(key->len, GFP_NOFS);
+       if (!key->key)
+               return -ENOMEM;
+       ceph_decode_copy(p, key->key, key->len);
+       return 0;
+
+bad:
+       dout("failed to decode crypto key\n");
+       return -EINVAL;
+}
+
+int ceph_crypto_key_unarmor(struct ceph_crypto_key *key, const char *inkey)
+{
+       int inlen = strlen(inkey);
+       int blen = inlen * 3 / 4;
+       void *buf, *p;
+       int ret;
+
+       dout("crypto_key_unarmor %s\n", inkey);
+       buf = kmalloc(blen, GFP_NOFS);
+       if (!buf)
+               return -ENOMEM;
+       blen = ceph_unarmor(buf, inkey, inkey+inlen);
+       if (blen < 0) {
+               kfree(buf);
+               return blen;
+       }
+
+       p = buf;
+       ret = ceph_crypto_key_decode(key, &p, p + blen);
+       kfree(buf);
+       if (ret)
+               return ret;
+       dout("crypto_key_unarmor key %p type %d len %d\n", key,
+            key->type, key->len);
+       return 0;
+}
+
+
+
+#define AES_KEY_SIZE 16
+
+static struct crypto_blkcipher *ceph_crypto_alloc_cipher(void)
+{
+       return crypto_alloc_blkcipher("cbc(aes)", 0, CRYPTO_ALG_ASYNC);
+}
+
+static const u8 *aes_iv = (u8 *)CEPH_AES_IV;
+
+static int ceph_aes_encrypt(const void *key, int key_len,
+                           void *dst, size_t *dst_len,
+                           const void *src, size_t src_len)
+{
+       struct scatterlist sg_in[2], sg_out[1];
+       struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
+       struct blkcipher_desc desc = { .tfm = tfm, .flags = 0 };
+       int ret;
+       void *iv;
+       int ivsize;
+       size_t zero_padding = (0x10 - (src_len & 0x0f));
+       char pad[16];
+
+       if (IS_ERR(tfm))
+               return PTR_ERR(tfm);
+
+       memset(pad, zero_padding, zero_padding);
+
+       *dst_len = src_len + zero_padding;
+
+       crypto_blkcipher_setkey((void *)tfm, key, key_len);
+       sg_init_table(sg_in, 2);
+       sg_set_buf(&sg_in[0], src, src_len);
+       sg_set_buf(&sg_in[1], pad, zero_padding);
+       sg_init_table(sg_out, 1);
+       sg_set_buf(sg_out, dst, *dst_len);
+       iv = crypto_blkcipher_crt(tfm)->iv;
+       ivsize = crypto_blkcipher_ivsize(tfm);
+
+       memcpy(iv, aes_iv, ivsize);
+       /*
+       print_hex_dump(KERN_ERR, "enc key: ", DUMP_PREFIX_NONE, 16, 1,
+                      key, key_len, 1);
+       print_hex_dump(KERN_ERR, "enc src: ", DUMP_PREFIX_NONE, 16, 1,
+                       src, src_len, 1);
+       print_hex_dump(KERN_ERR, "enc pad: ", DUMP_PREFIX_NONE, 16, 1,
+                       pad, zero_padding, 1);
+       */
+       ret = crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
+                                    src_len + zero_padding);
+       crypto_free_blkcipher(tfm);
+       if (ret < 0)
+               pr_err("ceph_aes_crypt failed %d\n", ret);
+       /*
+       print_hex_dump(KERN_ERR, "enc out: ", DUMP_PREFIX_NONE, 16, 1,
+                      dst, *dst_len, 1);
+       */
+       return ret;
+}
+
+static int ceph_aes_encrypt2(const void *key, int key_len, void *dst,
+                            size_t *dst_len,
+                            const void *src1, size_t src1_len,
+                            const void *src2, size_t src2_len)
+{
+       struct scatterlist sg_in[3], sg_out[1];
+       struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
+       struct blkcipher_desc desc = { .tfm = tfm, .flags = 0 };
+       int ret;
+       void *iv;
+       int ivsize;
+       size_t zero_padding = (0x10 - ((src1_len + src2_len) & 0x0f));
+       char pad[16];
+
+       if (IS_ERR(tfm))
+               return PTR_ERR(tfm);
+
+       memset(pad, zero_padding, zero_padding);
+
+       *dst_len = src1_len + src2_len + zero_padding;
+
+       crypto_blkcipher_setkey((void *)tfm, key, key_len);
+       sg_init_table(sg_in, 3);
+       sg_set_buf(&sg_in[0], src1, src1_len);
+       sg_set_buf(&sg_in[1], src2, src2_len);
+       sg_set_buf(&sg_in[2], pad, zero_padding);
+       sg_init_table(sg_out, 1);
+       sg_set_buf(sg_out, dst, *dst_len);
+       iv = crypto_blkcipher_crt(tfm)->iv;
+       ivsize = crypto_blkcipher_ivsize(tfm);
+
+       memcpy(iv, aes_iv, ivsize);
+       /*
+       print_hex_dump(KERN_ERR, "enc  key: ", DUMP_PREFIX_NONE, 16, 1,
+                      key, key_len, 1);
+       print_hex_dump(KERN_ERR, "enc src1: ", DUMP_PREFIX_NONE, 16, 1,
+                       src1, src1_len, 1);
+       print_hex_dump(KERN_ERR, "enc src2: ", DUMP_PREFIX_NONE, 16, 1,
+                       src2, src2_len, 1);
+       print_hex_dump(KERN_ERR, "enc  pad: ", DUMP_PREFIX_NONE, 16, 1,
+                       pad, zero_padding, 1);
+       */
+       ret = crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
+                                    src1_len + src2_len + zero_padding);
+       crypto_free_blkcipher(tfm);
+       if (ret < 0)
+               pr_err("ceph_aes_crypt2 failed %d\n", ret);
+       /*
+       print_hex_dump(KERN_ERR, "enc  out: ", DUMP_PREFIX_NONE, 16, 1,
+                      dst, *dst_len, 1);
+       */
+       return ret;
+}
+
+static int ceph_aes_decrypt(const void *key, int key_len,
+                           void *dst, size_t *dst_len,
+                           const void *src, size_t src_len)
+{
+       struct scatterlist sg_in[1], sg_out[2];
+       struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
+       struct blkcipher_desc desc = { .tfm = tfm };
+       char pad[16];
+       void *iv;
+       int ivsize;
+       int ret;
+       int last_byte;
+
+       if (IS_ERR(tfm))
+               return PTR_ERR(tfm);
+
+       crypto_blkcipher_setkey((void *)tfm, key, key_len);
+       sg_init_table(sg_in, 1);
+       sg_init_table(sg_out, 2);
+       sg_set_buf(sg_in, src, src_len);
+       sg_set_buf(&sg_out[0], dst, *dst_len);
+       sg_set_buf(&sg_out[1], pad, sizeof(pad));
+
+       iv = crypto_blkcipher_crt(tfm)->iv;
+       ivsize = crypto_blkcipher_ivsize(tfm);
+
+       memcpy(iv, aes_iv, ivsize);
+
+       /*
+       print_hex_dump(KERN_ERR, "dec key: ", DUMP_PREFIX_NONE, 16, 1,
+                      key, key_len, 1);
+       print_hex_dump(KERN_ERR, "dec  in: ", DUMP_PREFIX_NONE, 16, 1,
+                      src, src_len, 1);
+       */
+
+       ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in, src_len);
+       crypto_free_blkcipher(tfm);
+       if (ret < 0) {
+               pr_err("ceph_aes_decrypt failed %d\n", ret);
+               return ret;
+       }
+
+       if (src_len <= *dst_len)
+               last_byte = ((char *)dst)[src_len - 1];
+       else
+               last_byte = pad[src_len - *dst_len - 1];
+       if (last_byte <= 16 && src_len >= last_byte) {
+               *dst_len = src_len - last_byte;
+       } else {
+               pr_err("ceph_aes_decrypt got bad padding %d on src len %d\n",
+                      last_byte, (int)src_len);
+               return -EPERM;  /* bad padding */
+       }
+       /*
+       print_hex_dump(KERN_ERR, "dec out: ", DUMP_PREFIX_NONE, 16, 1,
+                      dst, *dst_len, 1);
+       */
+       return 0;
+}
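
Despite the name zero_padding, the encrypt path fills the pad with the pad length itself, and the decrypt path recovers the plaintext length from the last byte. The sketch below isolates that arithmetic without any cipher. pad_len(), apply_pad() and unpadded_len() are hypothetical helper names, and the last byte is read as unsigned char here, which is an assumption of the sketch rather than what the code above does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pad length is 1..16; an already aligned input gets a full block. */
static size_t pad_len(size_t src_len)
{
	return 0x10 - (src_len & 0x0f);
}

/* Append the pad in place; buf must have room for src_len + 16 bytes. */
static size_t apply_pad(unsigned char *buf, size_t src_len)
{
	size_t n = pad_len(src_len);

	memset(buf + src_len, (int)n, n);
	return src_len + n;
}

/* Same acceptance test as the decrypt path: returns -1 on bad padding. */
static long unpadded_len(const unsigned char *buf, size_t total)
{
	unsigned last;

	assert(total > 0);
	last = buf[total - 1];
	if (last <= 16 && total >= last)
		return (long)(total - last);
	return -1;
}
```

A 5-byte input is padded with eleven bytes of value 11 to a 16-byte block, and unpadded_len() on that block gives 5 again.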
+
+static int ceph_aes_decrypt2(const void *key, int key_len,
+                            void *dst1, size_t *dst1_len,
+                            void *dst2, size_t *dst2_len,
+                            const void *src, size_t src_len)
+{
+       struct scatterlist sg_in[1], sg_out[3];
+       struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
+       struct blkcipher_desc desc = { .tfm = tfm };
+       char pad[16];
+       void *iv;
+       int ivsize;
+       int ret;
+       int last_byte;
+
+       if (IS_ERR(tfm))
+               return PTR_ERR(tfm);
+
+       sg_init_table(sg_in, 1);
+       sg_set_buf(sg_in, src, src_len);
+       sg_init_table(sg_out, 3);
+       sg_set_buf(&sg_out[0], dst1, *dst1_len);
+       sg_set_buf(&sg_out[1], dst2, *dst2_len);
+       sg_set_buf(&sg_out[2], pad, sizeof(pad));
+
+       crypto_blkcipher_setkey((void *)tfm, key, key_len);
+       iv = crypto_blkcipher_crt(tfm)->iv;
+       ivsize = crypto_blkcipher_ivsize(tfm);
+
+       memcpy(iv, aes_iv, ivsize);
+
+       /*
+       print_hex_dump(KERN_ERR, "dec  key: ", DUMP_PREFIX_NONE, 16, 1,
+                      key, key_len, 1);
+       print_hex_dump(KERN_ERR, "dec   in: ", DUMP_PREFIX_NONE, 16, 1,
+                      src, src_len, 1);
+       */
+
+       ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in, src_len);
+       crypto_free_blkcipher(tfm);
+       if (ret < 0) {
+               pr_err("ceph_aes_decrypt2 failed %d\n", ret);
+               return ret;
+       }
+
+       if (src_len <= *dst1_len)
+               last_byte = ((char *)dst1)[src_len - 1];
+       else if (src_len <= *dst1_len + *dst2_len)
+               last_byte = ((char *)dst2)[src_len - *dst1_len - 1];
+       else
+               last_byte = pad[src_len - *dst1_len - *dst2_len - 1];
+       if (last_byte <= 16 && src_len >= last_byte) {
+               src_len -= last_byte;
+       } else {
+               pr_err("ceph_aes_decrypt2 got bad padding %d on src len %d\n",
+                      last_byte, (int)src_len);
+               return -EPERM;  /* bad padding */
+       }
+
+       if (src_len < *dst1_len) {
+               *dst1_len = src_len;
+               *dst2_len = 0;
+       } else {
+               *dst2_len = src_len - *dst1_len;
+       }
+       /*
+       print_hex_dump(KERN_ERR, "dec  out1: ", DUMP_PREFIX_NONE, 16, 1,
+                      dst1, *dst1_len, 1);
+       print_hex_dump(KERN_ERR, "dec  out2: ", DUMP_PREFIX_NONE, 16, 1,
+                      dst2, *dst2_len, 1);
+       */
+
+       return 0;
+}
+
+
+int ceph_decrypt(struct ceph_crypto_key *secret, void *dst, size_t *dst_len,
+                const void *src, size_t src_len)
+{
+       switch (secret->type) {
+       case CEPH_CRYPTO_NONE:
+               if (*dst_len < src_len)
+                       return -ERANGE;
+               memcpy(dst, src, src_len);
+               *dst_len = src_len;
+               return 0;
+
+       case CEPH_CRYPTO_AES:
+               return ceph_aes_decrypt(secret->key, secret->len, dst,
+                                       dst_len, src, src_len);
+
+       default:
+               return -EINVAL;
+       }
+}
+
+int ceph_decrypt2(struct ceph_crypto_key *secret,
+                       void *dst1, size_t *dst1_len,
+                       void *dst2, size_t *dst2_len,
+                       const void *src, size_t src_len)
+{
+       size_t t;
+
+       switch (secret->type) {
+       case CEPH_CRYPTO_NONE:
+               if (*dst1_len + *dst2_len < src_len)
+                       return -ERANGE;
+               t = min(*dst1_len, src_len);
+               memcpy(dst1, src, t);
+               *dst1_len = t;
+               src += t;
+               src_len -= t;
+               if (src_len) {
+                       t = min(*dst2_len, src_len);
+                       memcpy(dst2, src, t);
+                       *dst2_len = t;
+               }
+               return 0;
+
+       case CEPH_CRYPTO_AES:
+               return ceph_aes_decrypt2(secret->key, secret->len,
+                                        dst1, dst1_len, dst2, dst2_len,
+                                        src, src_len);
+
+       default:
+               return -EINVAL;
+       }
+}
+
+int ceph_encrypt(struct ceph_crypto_key *secret, void *dst, size_t *dst_len,
+                const void *src, size_t src_len)
+{
+       switch (secret->type) {
+       case CEPH_CRYPTO_NONE:
+               if (*dst_len < src_len)
+                       return -ERANGE;
+               memcpy(dst, src, src_len);
+               *dst_len = src_len;
+               return 0;
+
+       case CEPH_CRYPTO_AES:
+               return ceph_aes_encrypt(secret->key, secret->len, dst,
+                                       dst_len, src, src_len);
+
+       default:
+               return -EINVAL;
+       }
+}
+
+int ceph_encrypt2(struct ceph_crypto_key *secret, void *dst, size_t *dst_len,
+                 const void *src1, size_t src1_len,
+                 const void *src2, size_t src2_len)
+{
+       switch (secret->type) {
+       case CEPH_CRYPTO_NONE:
+               if (*dst_len < src1_len + src2_len)
+                       return -ERANGE;
+               memcpy(dst, src1, src1_len);
+               memcpy(dst + src1_len, src2, src2_len);
+               *dst_len = src1_len + src2_len;
+               return 0;
+
+       case CEPH_CRYPTO_AES:
+               return ceph_aes_encrypt2(secret->key, secret->len, dst, dst_len,
+                                        src1, src1_len, src2, src2_len);
+
+       default:
+               return -EINVAL;
+       }
+}
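
Note on the padding check in ceph_aes_decrypt() and ceph_aes_decrypt2() above: after decryption, the last plaintext byte is read as the pad length and accepted only if it is at most 16 and no larger than the source length. The following is a minimal user-space sketch of that check, not code from the patch. The helper name strip_pad, the BLOCK_SIZE macro and the -1 return value are illustrative assumptions, and the zero-length guard is an addition. Reading the byte as unsigned char avoids depending on the signedness of plain char.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 16  /* AES block size; matches the pad[16] scratch buffer */

/*
 * Hypothetical helper (not part of the patch): given buf_len bytes of
 * decrypted data, read the trailing pad-length byte and report the
 * payload length in *out_len.  Returns 0 on success, -1 on bad padding.
 */
static int strip_pad(const unsigned char *buf, size_t buf_len, size_t *out_len)
{
        unsigned int pad;

        if (buf_len == 0)
                return -1;              /* nothing to inspect (added guard) */

        pad = buf[buf_len - 1];         /* last byte holds the pad length */
        if (pad > BLOCK_SIZE || pad > buf_len)
                return -1;              /* same bounds as the patch's check */

        *out_len = buf_len - pad;
        return 0;
}
```

A caller would pass the full decrypted length and use *out_len as the payload size, much as the kernel code stores src_len - last_byte in *dst_len.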
diff --git a/net/ceph/crypto.h b/net/ceph/crypto.h

new file mode 100644
index 0000000..f9eccac
--- /dev/null
+++ b/net/ceph/crypto.h
@@ -0,0 +1,48 @@
+#ifndef _FS_CEPH_CRYPTO_H
+#define _FS_CEPH_CRYPTO_H
+
+#include <linux/ceph/types.h>
+#include <linux/ceph/buffer.h>
+
+/*
+ * cryptographic secret
+ */
+struct ceph_crypto_key {
+       int type;
+       struct ceph_timespec created;
+       int len;
+       void *key;
+};
+
+static inline void ceph_crypto_key_destroy(struct ceph_crypto_key *key)
+{
+       kfree(key->key);
+}
+
+extern int ceph_crypto_key_encode(struct ceph_crypto_key *key,
+                                 void **p, void *end);
+extern int ceph_crypto_key_decode(struct ceph_crypto_key *key,
+                                 void **p, void *end);
+extern int ceph_crypto_key_unarmor(struct ceph_crypto_key *key, const char *in);
+
+/* crypto.c */
+extern int ceph_decrypt(struct ceph_crypto_key *secret,
+                       void *dst, size_t *dst_len,
+                       const void *src, size_t src_len);
+extern int ceph_encrypt(struct ceph_crypto_key *secret,
+                       void *dst, size_t *dst_len,
+                       const void *src, size_t src_len);
+extern int ceph_decrypt2(struct ceph_crypto_key *secret,
+                       void *dst1, size_t *dst1_len,
+                       void *dst2, size_t *dst2_len,
+                       const void *src, size_t src_len);
+extern int ceph_encrypt2(struct ceph_crypto_key *secret,
+                        void *dst, size_t *dst_len,
+                        const void *src1, size_t src1_len,
+                        const void *src2, size_t src2_len);
+
+/* armor.c */
+extern int ceph_armor(char *dst, const char *src, const char *end);
+extern int ceph_unarmor(char *dst, const char *src, const char *end);
+
+#endif
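
Note on the length arguments declared above: dst_len, dst1_len and dst2_len are in/out parameters. On entry they hold buffer capacities, and on return they hold the number of bytes written. The sketch below mirrors the CEPH_CRYPTO_NONE branch of ceph_decrypt2() in user space to illustrate the convention. The name copy_split and the -1 error value are assumptions for illustration only. Unlike the patch, this version always stores *dst2_len, including zero when nothing spills into the second buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space mirror of the CEPH_CRYPTO_NONE branch of
 * ceph_decrypt2().  On entry *dst1_len and *dst2_len are capacities;
 * on return they are the byte counts written to each buffer.
 * Returns 0 on success, -1 if the two buffers together are too small.
 */
static int copy_split(void *dst1, size_t *dst1_len,
                      void *dst2, size_t *dst2_len,
                      const void *src, size_t src_len)
{
        const unsigned char *p = src;
        size_t t;

        if (*dst1_len + *dst2_len < src_len)
                return -1;

        /* fill the first buffer as far as it goes */
        t = src_len < *dst1_len ? src_len : *dst1_len;
        memcpy(dst1, p, t);
        *dst1_len = t;
        p += t;
        src_len -= t;

        /* the remainder fits in dst2 because of the capacity check above */
        memcpy(dst2, p, src_len);
        *dst2_len = src_len;
        return 0;
}
```

Callers are expected to reset both length variables to the buffer capacities before each call, since a successful call overwrites them.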
diff --git a/net/ceph/debugfs.c b/net/ceph/debugfs.c

new file mode 100644
index 0000000..27d4ea3
--- /dev/null
+++ b/net/ceph/debugfs.c
@@ -0,0 +1,267 @@
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/ctype.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/debugfs.h>
+
+#ifdef CONFIG_DEBUG_FS
+
+/*
+ * Implement /sys/kernel/debug/ceph fun
+ *
+ * /sys/kernel/debug/ceph/client*  - an instance of the ceph client
+ *      .../osdmap      - current osdmap
+ *      .../monmap      - current monmap
+ *      .../osdc        - active osd requests
+ *      .../monc        - mon client state
+ *      .../dentry_lru  - dump contents of dentry lru
+ *      .../caps        - expose cap (reservation) stats
+ *      .../bdi         - symlink to ../../bdi/something
+ */
+
+static struct dentry *ceph_debugfs_dir;
+
+static int monmap_show(struct seq_file *s, void *p)
+{
+       int i;
+       struct ceph_client *client = s->private;
+
+       if (client->monc.monmap == NULL)
+               return 0;
+
+       seq_printf(s, "epoch %d\n", client->monc.monmap->epoch);
+       for (i = 0; i < client->monc.monmap->num_mon; i++) {
+               struct ceph_entity_inst *inst =
+                       &client->monc.monmap->mon_inst[i];
+
+               seq_printf(s, "\t%s%lld\t%s\n",
+                          ENTITY_NAME(inst->name),
+                          ceph_pr_addr(&inst->addr.in_addr));
+       }
+       return 0;
+}
+
+static int osdmap_show(struct seq_file *s, void *p)
+{
+       int i;
+       struct ceph_client *client = s->private;
+       struct rb_node *n;
+
+       if (client->osdc.osdmap == NULL)
+               return 0;
+       seq_printf(s, "epoch %d\n", client->osdc.osdmap->epoch);
+       seq_printf(s, "flags%s%s\n",
+                  (client->osdc.osdmap->flags & CEPH_OSDMAP_NEARFULL) ?
+                  " NEARFULL" : "",
+                  (client->osdc.osdmap->flags & CEPH_OSDMAP_FULL) ?
+                  " FULL" : "");
+       for (n = rb_first(&client->osdc.osdmap->pg_pools); n; n = rb_next(n)) {
+               struct ceph_pg_pool_info *pool =
+                       rb_entry(n, struct ceph_pg_pool_info, node);
+               seq_printf(s, "pg_pool %d pg_num %d / %d, lpg_num %d / %d\n",
+                          pool->id, pool->v.pg_num, pool->pg_num_mask,
+                          pool->v.lpg_num, pool->lpg_num_mask);
+       }
+       for (i = 0; i < client->osdc.osdmap->max_osd; i++) {
+               struct ceph_entity_addr *addr =
+                       &client->osdc.osdmap->osd_addr[i];
+               int state = client->osdc.osdmap->osd_state[i];
+               char sb[64];
+
+               seq_printf(s, "\tosd%d\t%s\t%3d%%\t(%s)\n",
+                          i, ceph_pr_addr(&addr->in_addr),
+                          ((client->osdc.osdmap->osd_weight[i]*100) >> 16),
+                          ceph_osdmap_state_str(sb, sizeof(sb), state));
+       }
+       return 0;
+}
+
+static int monc_show(struct seq_file *s, void *p)
+{
+       struct ceph_client *client = s->private;
+       struct ceph_mon_generic_request *req;
+       struct ceph_mon_client *monc = &client->monc;
+       struct rb_node *rp;
+
+       mutex_lock(&monc->mutex);
+
+       if (monc->have_mdsmap)
+               seq_printf(s, "have mdsmap %u\n", (unsigned)monc->have_mdsmap);
+       if (monc->have_osdmap)
+               seq_printf(s, "have osdmap %u\n", (unsigned)monc->have_osdmap);
+       if (monc->want_next_osdmap)
+               seq_printf(s, "want next osdmap\n");
+
+       for (rp = rb_first(&monc->generic_request_tree); rp; rp = rb_next(rp)) {
+               __u16 op;
+               req = rb_entry(rp, struct ceph_mon_generic_request, node);
+               op = le16_to_cpu(req->request->hdr.type);
+               if (op == CEPH_MSG_STATFS)
+                       seq_printf(s, "%lld statfs\n", req->tid);
+               else
+                       seq_printf(s, "%lld unknown\n", req->tid);
+       }
+
+       mutex_unlock(&monc->mutex);
+       return 0;
+}
+
+static int osdc_show(struct seq_file *s, void *pp)
+{
+       struct ceph_client *client = s->private;
+       struct ceph_osd_client *osdc = &client->osdc;
+       struct rb_node *p;
+
+       mutex_lock(&osdc->request_mutex);
+       for (p = rb_first(&osdc->requests); p; p = rb_next(p)) {
+               struct ceph_osd_request *req;
+               struct ceph_osd_request_head *head;
+               struct ceph_osd_op *op;
+               int num_ops;
+               int opcode, olen;
+               int i;
+
+               req = rb_entry(p, struct ceph_osd_request, r_node);
+
+               seq_printf(s, "%lld\tosd%d\t%d.%x\t", req->r_tid,
+                          req->r_osd ? req->r_osd->o_osd : -1,
+                          le32_to_cpu(req->r_pgid.pool),
+                          le16_to_cpu(req->r_pgid.ps));
+
+               head = req->r_request->front.iov_base;
+               op = (void *)(head + 1);
+
+               num_ops = le16_to_cpu(head->num_ops);
+               olen = le32_to_cpu(head->object_len);
+               seq_printf(s, "%.*s", olen,
+                          (const char *)(head->ops + num_ops));
+
+               if (req->r_reassert_version.epoch)
+                       seq_printf(s, "\t%u'%llu",
+                          (unsigned)le32_to_cpu(req->r_reassert_version.epoch),
+                          le64_to_cpu(req->r_reassert_version.version));
+               else
+                       seq_printf(s, "\t");
+
+               for (i = 0; i < num_ops; i++) {
+                       opcode = le16_to_cpu(op->op);
+                       seq_printf(s, "\t%s", ceph_osd_op_name(opcode));
+                       op++;
+               }
+
+               seq_printf(s, "\n");
+       }
+       mutex_unlock(&osdc->request_mutex);
+       return 0;
+}
+
+CEPH_DEFINE_SHOW_FUNC(monmap_show)
+CEPH_DEFINE_SHOW_FUNC(osdmap_show)
+CEPH_DEFINE_SHOW_FUNC(monc_show)
+CEPH_DEFINE_SHOW_FUNC(osdc_show)
+
+int ceph_debugfs_init(void)
+{
+       ceph_debugfs_dir = debugfs_create_dir("ceph", NULL);
+       if (!ceph_debugfs_dir)
+               return -ENOMEM;
+       return 0;
+}
+
+void ceph_debugfs_cleanup(void)
+{
+       debugfs_remove(ceph_debugfs_dir);
+}
+
+int ceph_debugfs_client_init(struct ceph_client *client)
+{
+       int ret = -ENOMEM;
+       char name[80];
+
+       snprintf(name, sizeof(name), "%pU.client%lld", &client->fsid,
+                client->monc.auth->global_id);
+
+       client->debugfs_dir = debugfs_create_dir(name, ceph_debugfs_dir);
+       if (!client->debugfs_dir)
+               goto out;
+
+       client->monc.debugfs_file = debugfs_create_file("monc",
+                                                     0600,
+                                                     client->debugfs_dir,
+                                                     client,
+                                                     &monc_show_fops);
+       if (!client->monc.debugfs_file)
+               goto out;
+
+       client->osdc.debugfs_file = debugfs_create_file("osdc",
+                                                     0600,
+                                                     client->debugfs_dir,
+                                                     client,
+                                                     &osdc_show_fops);
+       if (!client->osdc.debugfs_file)
+               goto out;
+
+       client->debugfs_monmap = debugfs_create_file("monmap",
+                                       0600,
+                                       client->debugfs_dir,
+                                       client,
+                                       &monmap_show_fops);
+       if (!client->debugfs_monmap)
+               goto out;
+
+       client->debugfs_osdmap = debugfs_create_file("osdmap",
+                                       0600,
+                                       client->debugfs_dir,
+                                       client,
+                                       &osdmap_show_fops);
+       if (!client->debugfs_osdmap)
+               goto out;
+
+       return 0;
+
+out:
+       ceph_debugfs_client_cleanup(client);
+       return ret;
+}
+
+void ceph_debugfs_client_cleanup(struct ceph_client *client)
+{
+       debugfs_remove(client->debugfs_osdmap);
+       debugfs_remove(client->debugfs_monmap);
+       debugfs_remove(client->osdc.debugfs_file);
+       debugfs_remove(client->monc.debugfs_file);
+       debugfs_remove(client->debugfs_dir);
+}
+
+#else  /* CONFIG_DEBUG_FS */
+
+int ceph_debugfs_init(void)
+{
+       return 0;
+}
+
+void ceph_debugfs_cleanup(void)
+{
+}
+
+int ceph_debugfs_client_init(struct ceph_client *client)
+{
+       return 0;
+}
+
+void ceph_debugfs_client_cleanup(struct ceph_client *client)
+{
+}
+
+#endif  /* CONFIG_DEBUG_FS */
+
+EXPORT_SYMBOL(ceph_debugfs_init);
+EXPORT_SYMBOL(ceph_debugfs_cleanup);
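
Note on the weight column printed by osdmap_show() above: the expression (osd_weight[i]*100) >> 16 treats the weight as a 16.16 fixed-point fraction and converts it to a whole percentage. The sketch below reproduces that arithmetic in user space. The names weight_to_percent and WEIGHT_FULL are illustrative, and the assumption that 0x10000 means "fully weighted" is inferred from the shift rather than stated in this file. The widening to 64 bits is an extra precaution that the kernel expression does not take.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 0x10000 is full weight, as the >> 16 in osdmap_show() suggests. */
#define WEIGHT_FULL 0x10000u

/*
 * Convert a 16.16 fixed-point weight to a whole percentage, rounding
 * down.  The multiply is done in 64 bits so large inputs cannot wrap.
 */
static unsigned int weight_to_percent(uint32_t weight)
{
        return (unsigned int)(((uint64_t)weight * 100) >> 16);
}
```

Under that assumption, a weight of 0x8000 prints as 50% and 0xC000 as 75%.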
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c

new file mode 100644
index 0000000..0e8157e
--- /dev/null
+++ b/net/ceph/messenger.c
@@ -0,0 +1,2453 @@
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/crc32c.h>
+#include <linux/ctype.h>
+#include <linux/highmem.h>
+#include <linux/inet.h>
+#include <linux/kthread.h>
+#include <linux/net.h>
+#include <linux/slab.h>
+#include <linux/socket.h>
+#include <linux/string.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <net/tcp.h>
+
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/pagelist.h>
+
+/*
+ * Ceph uses the messenger to exchange ceph_msg messages with other
+ * hosts in the system.  The messenger provides ordered and reliable
+ * delivery.  We tolerate TCP disconnects by reconnecting (with
+ * exponential backoff) in the case of a fault (disconnection, bad
+ * crc, protocol error).  Acks allow sent messages to be discarded by
+ * the sender.
+ */
+
+/* static tag bytes (protocol control messages) */
+static char tag_msg = CEPH_MSGR_TAG_MSG;
+static char tag_ack = CEPH_MSGR_TAG_ACK;
+static char tag_keepalive = CEPH_MSGR_TAG_KEEPALIVE;
+
+#ifdef CONFIG_LOCKDEP
+static struct lock_class_key socket_class;
+#endif
+
+
+static void queue_con(struct ceph_connection *con);
+static void con_work(struct work_struct *);
+static void ceph_fault(struct ceph_connection *con);
+
+/*
+ * nicely render a sockaddr as a string.
+ */
+#define MAX_ADDR_STR 20
+#define MAX_ADDR_STR_LEN 60
+static char addr_str[MAX_ADDR_STR][MAX_ADDR_STR_LEN];
+static DEFINE_SPINLOCK(addr_str_lock);
+static int last_addr_str;
+
+const char *ceph_pr_addr(const struct sockaddr_storage *ss)
+{
+       int i;
+       char *s;
+       struct sockaddr_in *in4 = (void *)ss;
+       struct sockaddr_in6 *in6 = (void *)ss;
+
+       spin_lock(&addr_str_lock);
+       i = last_addr_str++;
+       if (last_addr_str == MAX_ADDR_STR)
+               last_addr_str = 0;
+       spin_unlock(&addr_str_lock);
+       s = addr_str[i];
+
+       switch (ss->ss_family) {
+       case AF_INET:
+               snprintf(s, MAX_ADDR_STR_LEN, "%pI4:%u", &in4->sin_addr,
+                        (unsigned int)ntohs(in4->sin_port));
+               break;
+
+       case AF_INET6:
+               snprintf(s, MAX_ADDR_STR_LEN, "[%pI6c]:%u", &in6->sin6_addr,
+                        (unsigned int)ntohs(in6->sin6_port));
+               break;
+
+       default:
+               sprintf(s, "(unknown sockaddr family %d)", (int)ss->ss_family);
+       }
+
+       return s;
+}
+EXPORT_SYMBOL(ceph_pr_addr);
+
+static void encode_my_addr(struct ceph_messenger *msgr)
+{
+       memcpy(&msgr->my_enc_addr, &msgr->inst.addr, sizeof(msgr->my_enc_addr));
+       ceph_encode_addr(&msgr->my_enc_addr);
+}
+
+/*
+ * work queue for all reading and writing to/from the socket.
+ */
+struct workqueue_struct *ceph_msgr_wq;
+
+int ceph_msgr_init(void)
+{
+       ceph_msgr_wq = create_workqueue("ceph-msgr");
+       if (!ceph_msgr_wq) {
+               /* create_workqueue() returns NULL on failure,
+                * not an ERR_PTR */
+               pr_err("msgr_init failed to create workqueue\n");
+               return -ENOMEM;
+       }
+       return 0;
+}
+EXPORT_SYMBOL(ceph_msgr_init);
+
+void ceph_msgr_exit(void)
+{
+       destroy_workqueue(ceph_msgr_wq);
+}
+EXPORT_SYMBOL(ceph_msgr_exit);
+
+void ceph_msgr_flush(void)
+{
+       flush_workqueue(ceph_msgr_wq);
+}
+EXPORT_SYMBOL(ceph_msgr_flush);
+
+
+/*
+ * socket callback functions
+ */
+
+/* data available on socket, or listen socket received a connect */
+static void ceph_data_ready(struct sock *sk, int count_unused)
+{
+       struct ceph_connection *con =
+               (struct ceph_connection *)sk->sk_user_data;
+       if (sk->sk_state != TCP_CLOSE_WAIT) {
+               dout("ceph_data_ready on %p state = %lu, queueing work\n",
+                    con, con->state);
+               queue_con(con);
+       }
+}
+
+/* socket has buffer space for writing */
+static void ceph_write_space(struct sock *sk)
+{
+       struct ceph_connection *con =
+               (struct ceph_connection *)sk->sk_user_data;
+
+       /* only queue to workqueue if there is data we want to write. */
+       if (test_bit(WRITE_PENDING, &con->state)) {
+               dout("ceph_write_space %p queueing write work\n", con);
+               queue_con(con);
+       } else {
+               dout("ceph_write_space %p nothing to write\n", con);
+       }
+
+       /* since we have our own write_space, clear the SOCK_NOSPACE flag */
+       clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+}
+
+/* socket's state has changed */
+static void ceph_state_change(struct sock *sk)
+{
+       struct ceph_connection *con =
+               (struct ceph_connection *)sk->sk_user_data;
+
+       dout("ceph_state_change %p state = %lu sk_state = %u\n",
+            con, con->state, sk->sk_state);
+
+       if (test_bit(CLOSED, &con->state))
+               return;
+
+       switch (sk->sk_state) {
+       case TCP_CLOSE:
+               dout("ceph_state_change TCP_CLOSE\n");  /* fall through */
+       case TCP_CLOSE_WAIT:
+               dout("ceph_state_change TCP_CLOSE_WAIT\n");
+               if (test_and_set_bit(SOCK_CLOSED, &con->state) == 0) {
+                       if (test_bit(CONNECTING, &con->state))
+                               con->error_msg = "connection failed";
+                       else
+                               con->error_msg = "socket closed";
+                       queue_con(con);
+               }
+               break;
+       case TCP_ESTABLISHED:
+               dout("ceph_state_change TCP_ESTABLISHED\n");
+               queue_con(con);
+               break;
+       }
+}
+
+/*
+ * set up socket callbacks
+ */
+static void set_sock_callbacks(struct socket *sock,
+                              struct ceph_connection *con)
+{
+       struct sock *sk = sock->sk;
+       sk->sk_user_data = (void *)con;
+       sk->sk_data_ready = ceph_data_ready;
+       sk->sk_write_space = ceph_write_space;
+       sk->sk_state_change = ceph_state_change;
+}
+
+
+/*
+ * socket helpers
+ */
+
+/*
+ * initiate connection to a remote socket.
+ */
+static struct socket *ceph_tcp_connect(struct ceph_connection *con)
+{
+       struct sockaddr_storage *paddr = &con->peer_addr.in_addr;
+       struct socket *sock;
+       int ret;
+
+       BUG_ON(con->sock);
+       ret = sock_create_kern(con->peer_addr.in_addr.ss_family, SOCK_STREAM,
+                              IPPROTO_TCP, &sock);
+       if (ret)
+               return ERR_PTR(ret);
+       con->sock = sock;
+       sock->sk->sk_allocation = GFP_NOFS;
+
+#ifdef CONFIG_LOCKDEP
+       lockdep_set_class(&sock->sk->sk_lock, &socket_class);
+#endif
+
+       set_sock_callbacks(sock, con);
+
+       dout("connect %s\n", ceph_pr_addr(&con->peer_addr.in_addr));
+
+       ret = sock->ops->connect(sock, (struct sockaddr *)paddr, sizeof(*paddr),
+                                O_NONBLOCK);
+       if (ret == -EINPROGRESS) {
+               dout("connect %s EINPROGRESS sk_state = %u\n",
+                    ceph_pr_addr(&con->peer_addr.in_addr),
+                    sock->sk->sk_state);
+               ret = 0;
+       }
+       if (ret < 0) {
+               pr_err("connect %s error %d\n",
+                      ceph_pr_addr(&con->peer_addr.in_addr), ret);
+               sock_release(sock);
+               con->sock = NULL;
+               con->error_msg = "connect error";
+       }
+
+       if (ret < 0)
+               return ERR_PTR(ret);
+       return sock;
+}
+
+static int ceph_tcp_recvmsg(struct socket *sock, void *buf, size_t len)
+{
+       struct kvec iov = {buf, len};
+       struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL };
+
+       return kernel_recvmsg(sock, &msg, &iov, 1, len, msg.msg_flags);
+}
+
+/*
+ * write something.  @more is true if caller will be sending more data
+ * shortly.
+ */
+static int ceph_tcp_sendmsg(struct socket *sock, struct kvec *iov,
+                    size_t kvlen, size_t len, int more)
+{
+       struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL };
+
+       if (more)
+               msg.msg_flags |= MSG_MORE;
+       else
+               msg.msg_flags |= MSG_EOR;  /* superfluous, but what the hell */
+
+       return kernel_sendmsg(sock, &msg, iov, kvlen, len);
+}
+
+
+/*
+ * Shutdown/close the socket for the given connection.
+ */
+static int con_close_socket(struct ceph_connection *con)
+{
+       int rc;
+
+       dout("con_close_socket on %p sock %p\n", con, con->sock);
+       if (!con->sock)
+               return 0;
+       set_bit(SOCK_CLOSED, &con->state);
+       rc = con->sock->ops->shutdown(con->sock, SHUT_RDWR);
+       sock_release(con->sock);
+       con->sock = NULL;
+       clear_bit(SOCK_CLOSED, &con->state);
+       return rc;
+}
+
+/*
+ * Reset a connection.  Discard all incoming and outgoing messages
+ * and clear *_seq state.
+ */
+static void ceph_msg_remove(struct ceph_msg *msg)
+{
+       list_del_init(&msg->list_head);
+       ceph_msg_put(msg);
+}
+static void ceph_msg_remove_list(struct list_head *head)
+{
+       while (!list_empty(head)) {
+               struct ceph_msg *msg = list_first_entry(head, struct ceph_msg,
+                                                       list_head);
+               ceph_msg_remove(msg);
+       }
+}
+
+static void reset_connection(struct ceph_connection *con)
+{
+       /* reset connection: out_queue, out_sent, in/out seq and connect_seq */
+       /* discard the queued and already-sent outgoing messages */
+       ceph_msg_remove_list(&con->out_queue);
+       ceph_msg_remove_list(&con->out_sent);
+
+       if (con->in_msg) {
+               ceph_msg_put(con->in_msg);
+               con->in_msg = NULL;
+       }
+
+       con->connect_seq = 0;
+       con->out_seq = 0;
+       if (con->out_msg) {
+               ceph_msg_put(con->out_msg);
+               con->out_msg = NULL;
+       }
+       con->out_keepalive_pending = false;
+       con->in_seq = 0;
+       con->in_seq_acked = 0;
+}
+
+/*
+ * mark a peer down.  drop any open connections.
+ */
+void ceph_con_close(struct ceph_connection *con)
+{
+       dout("con_close %p peer %s\n", con,
+            ceph_pr_addr(&con->peer_addr.in_addr));
+       set_bit(CLOSED, &con->state);  /* in case there's queued work */
+       clear_bit(STANDBY, &con->state);  /* avoid connect_seq bump */
+       clear_bit(LOSSYTX, &con->state);  /* so we retry next connect */
+       clear_bit(KEEPALIVE_PENDING, &con->state);
+       clear_bit(WRITE_PENDING, &con->state);
+       mutex_lock(&con->mutex);
+       reset_connection(con);
+       con->peer_global_seq = 0;
+       cancel_delayed_work(&con->work);
+       mutex_unlock(&con->mutex);
+       queue_con(con);
+}
+EXPORT_SYMBOL(ceph_con_close);
+
+/*
+ * Reopen a closed connection, with a new peer address.
+ */
+void ceph_con_open(struct ceph_connection *con, struct ceph_entity_addr *addr)
+{
+       dout("con_open %p %s\n", con, ceph_pr_addr(&addr->in_addr));
+       set_bit(OPENING, &con->state);
+       clear_bit(CLOSED, &con->state);
+       memcpy(&con->peer_addr, addr, sizeof(*addr));
+       con->delay = 0;      /* reset backoff memory */
+       queue_con(con);
+}
+EXPORT_SYMBOL(ceph_con_open);
+
+/*
+ * return true if this connection ever successfully opened
+ */
+bool ceph_con_opened(struct ceph_connection *con)
+{
+       return con->connect_seq > 0;
+}
+
+/*
+ * generic get/put
+ */
+struct ceph_connection *ceph_con_get(struct ceph_connection *con)
+{
+       dout("con_get %p nref = %d -> %d\n", con,
+            atomic_read(&con->nref), atomic_read(&con->nref) + 1);
+       if (atomic_inc_not_zero(&con->nref))
+               return con;
+       return NULL;
+}
+
+void ceph_con_put(struct ceph_connection *con)
+{
+       dout("con_put %p nref = %d -> %d\n", con,
+            atomic_read(&con->nref), atomic_read(&con->nref) - 1);
+       BUG_ON(atomic_read(&con->nref) == 0);
+       if (atomic_dec_and_test(&con->nref)) {
+               BUG_ON(con->sock);
+               kfree(con);
+       }
+}
+
+/*
+ * initialize a new connection.
+ */
+void ceph_con_init(struct ceph_messenger *msgr, struct ceph_connection *con)
+{
+       dout("con_init %p\n", con);
+       memset(con, 0, sizeof(*con));
+       atomic_set(&con->nref, 1);
+       con->msgr = msgr;
+       mutex_init(&con->mutex);
+       INIT_LIST_HEAD(&con->out_queue);
+       INIT_LIST_HEAD(&con->out_sent);
+       INIT_DELAYED_WORK(&con->work, con_work);
+}
+EXPORT_SYMBOL(ceph_con_init);
+
+
+/*
+ * We maintain a global counter to order connection attempts.  Get
+ * a unique seq greater than @gt.
+ */
+static u32 get_global_seq(struct ceph_messenger *msgr, u32 gt)
+{
+       u32 ret;
+
+       spin_lock(&msgr->global_seq_lock);
+       if (msgr->global_seq < gt)
+               msgr->global_seq = gt;
+       ret = ++msgr->global_seq;
+       spin_unlock(&msgr->global_seq_lock);
+       return ret;
+}
+
+
+/*
+ * Prepare footer for currently outgoing message, and finish things
+ * off.  Assumes out_kvec* are already valid.. we just add on to the end.
+ */
+static void prepare_write_message_footer(struct ceph_connection *con, int v)
+{
+       struct ceph_msg *m = con->out_msg;
+
+       dout("prepare_write_message_footer %p\n", con);
+       con->out_kvec_is_msg = true;
+       con->out_kvec[v].iov_base = &m->footer;
+       con->out_kvec[v].iov_len = sizeof(m->footer);
+       con->out_kvec_bytes += sizeof(m->footer);
+       con->out_kvec_left++;
+       con->out_more = m->more_to_follow;
+       con->out_msg_done = true;
+}
+
+/*
+ * Prepare headers for the next outgoing message.
+ */
+static void prepare_write_message(struct ceph_connection *con)
+{
+       struct ceph_msg *m;
+       int v = 0;
+
+       con->out_kvec_bytes = 0;
+       con->out_kvec_is_msg = true;
+       con->out_msg_done = false;
+
+       /* Sneak an ack in there first?  If we can get it into the same
+        * TCP packet that's a good thing. */
+       if (con->in_seq > con->in_seq_acked) {
+               con->in_seq_acked = con->in_seq;
+               con->out_kvec[v].iov_base = &tag_ack;
+               con->out_kvec[v++].iov_len = 1;
+               con->out_temp_ack = cpu_to_le64(con->in_seq_acked);
+               con->out_kvec[v].iov_base = &con->out_temp_ack;
+               con->out_kvec[v++].iov_len = sizeof(con->out_temp_ack);
+               con->out_kvec_bytes = 1 + sizeof(con->out_temp_ack);
+       }
+
+       m = list_first_entry(&con->out_queue,
+                      struct ceph_msg, list_head);
+       con->out_msg = m;
+       if (test_bit(LOSSYTX, &con->state)) {
+               list_del_init(&m->list_head);
+       } else {
+               /* put message on sent list */
+               ceph_msg_get(m);
+               list_move_tail(&m->list_head, &con->out_sent);
+       }
+
+       /*
+        * only assign outgoing seq # if we haven't sent this message
+        * yet.  if it is requeued, resend with its original seq.
+        */
+       if (m->needs_out_seq) {
+               m->hdr.seq = cpu_to_le64(++con->out_seq);
+               m->needs_out_seq = false;
+       }
+
+       dout("prepare_write_message %p seq %lld type %d len %d+%d+%d %d pgs\n",
+            m, con->out_seq, le16_to_cpu(m->hdr.type),
+            le32_to_cpu(m->hdr.front_len), le32_to_cpu(m->hdr.middle_len),
+            le32_to_cpu(m->hdr.data_len),
+            m->nr_pages);
+       BUG_ON(le32_to_cpu(m->hdr.front_len) != m->front.iov_len);
+
+       /* tag + hdr + front + middle */
+       con->out_kvec[v].iov_base = &tag_msg;
+       con->out_kvec[v++].iov_len = 1;
+       con->out_kvec[v].iov_base = &m->hdr;
+       con->out_kvec[v++].iov_len = sizeof(m->hdr);
+       con->out_kvec[v++] = m->front;
+       if (m->middle)
+               con->out_kvec[v++] = m->middle->vec;
+       con->out_kvec_left = v;
+       con->out_kvec_bytes += 1 + sizeof(m->hdr) + m->front.iov_len +
+               (m->middle ? m->middle->vec.iov_len : 0);
+       con->out_kvec_cur = con->out_kvec;
+
+       /* fill in crc (except data pages), footer */
+       con->out_msg->hdr.crc =
+               cpu_to_le32(crc32c(0, (void *)&m->hdr,
+                                     sizeof(m->hdr) - sizeof(m->hdr.crc)));
+       con->out_msg->footer.flags = CEPH_MSG_FOOTER_COMPLETE;
+       con->out_msg->footer.front_crc =
+               cpu_to_le32(crc32c(0, m->front.iov_base, m->front.iov_len));
+       if (m->middle)
+               con->out_msg->footer.middle_crc =
+                       cpu_to_le32(crc32c(0, m->middle->vec.iov_base,
+                                          m->middle->vec.iov_len));
+       else
+               con->out_msg->footer.middle_crc = 0;
+       con->out_msg->footer.data_crc = 0;
+       dout("prepare_write_message front_crc %u middle_crc %u\n",
+            le32_to_cpu(con->out_msg->footer.front_crc),
+            le32_to_cpu(con->out_msg->footer.middle_crc));
+
+       /* is there a data payload? */
+       if (le32_to_cpu(m->hdr.data_len) > 0) {
+               /* initialize page iterator */
+               con->out_msg_pos.page = 0;
+               if (m->pages)
+                       con->out_msg_pos.page_pos =
+                               le16_to_cpu(m->hdr.data_off) & ~PAGE_MASK;
+               else
+                       con->out_msg_pos.page_pos = 0;
+               con->out_msg_pos.data_pos = 0;
+               con->out_msg_pos.did_page_crc = 0;
+               con->out_more = 1;  /* data + footer will follow */
+       } else {
+               /* no, queue up footer too and be done */
+               prepare_write_message_footer(con, v);
+       }
+
+       set_bit(WRITE_PENDING, &con->state);
+}
+
+/*
+ * Prepare an ack.
+ */
+static void prepare_write_ack(struct ceph_connection *con)
+{
+       dout("prepare_write_ack %p %llu -> %llu\n", con,
+            con->in_seq_acked, con->in_seq);
+       con->in_seq_acked = con->in_seq;
+
+       con->out_kvec[0].iov_base = &tag_ack;
+       con->out_kvec[0].iov_len = 1;
+       con->out_temp_ack = cpu_to_le64(con->in_seq_acked);
+       con->out_kvec[1].iov_base = &con->out_temp_ack;
+       con->out_kvec[1].iov_len = sizeof(con->out_temp_ack);
+       con->out_kvec_left = 2;
+       con->out_kvec_bytes = 1 + sizeof(con->out_temp_ack);
+       con->out_kvec_cur = con->out_kvec;
+       con->out_more = 1;  /* more will follow.. eventually.. */
+       set_bit(WRITE_PENDING, &con->state);
+}
+
+/*
+ * Prepare to write keepalive byte.
+ */
+static void prepare_write_keepalive(struct ceph_connection *con)
+{
+       dout("prepare_write_keepalive %p\n", con);
+       con->out_kvec[0].iov_base = &tag_keepalive;
+       con->out_kvec[0].iov_len = 1;
+       con->out_kvec_left = 1;
+       con->out_kvec_bytes = 1;
+       con->out_kvec_cur = con->out_kvec;
+       set_bit(WRITE_PENDING, &con->state);
+}
+
+/*
+ * Connection negotiation.
+ */
+
+static void prepare_connect_authorizer(struct ceph_connection *con)
+{
+       void *auth_buf;
+       int auth_len = 0;
+       int auth_protocol = 0;
+
+       mutex_unlock(&con->mutex);
+       if (con->ops->get_authorizer)
+               con->ops->get_authorizer(con, &auth_buf, &auth_len,
+                                        &auth_protocol, &con->auth_reply_buf,
+                                        &con->auth_reply_buf_len,
+                                        con->auth_retry);
+       mutex_lock(&con->mutex);
+
+       con->out_connect.authorizer_protocol = cpu_to_le32(auth_protocol);
+       con->out_connect.authorizer_len = cpu_to_le32(auth_len);
+
+       con->out_kvec[con->out_kvec_left].iov_base = auth_buf;
+       con->out_kvec[con->out_kvec_left].iov_len = auth_len;
+       con->out_kvec_left++;
+       con->out_kvec_bytes += auth_len;
+}
+
+/*
+ * We connected to a peer and are saying hello.
+ */
+static void prepare_write_banner(struct ceph_messenger *msgr,
+                                struct ceph_connection *con)
+{
+       int len = strlen(CEPH_BANNER);
+
+       con->out_kvec[0].iov_base = CEPH_BANNER;
+       con->out_kvec[0].iov_len = len;
+       con->out_kvec[1].iov_base = &msgr->my_enc_addr;
+       con->out_kvec[1].iov_len = sizeof(msgr->my_enc_addr);
+       con->out_kvec_left = 2;
+       con->out_kvec_bytes = len + sizeof(msgr->my_enc_addr);
+       con->out_kvec_cur = con->out_kvec;
+       con->out_more = 0;
+       set_bit(WRITE_PENDING, &con->state);
+}
+
+static void prepare_write_connect(struct ceph_messenger *msgr,
+                                 struct ceph_connection *con,
+                                 int after_banner)
+{
+       unsigned global_seq = get_global_seq(con->msgr, 0);
+       int proto;
+
+       switch (con->peer_name.type) {
+       case CEPH_ENTITY_TYPE_MON:
+               proto = CEPH_MONC_PROTOCOL;
+               break;
+       case CEPH_ENTITY_TYPE_OSD:
+               proto = CEPH_OSDC_PROTOCOL;
+               break;
+       case CEPH_ENTITY_TYPE_MDS:
+               proto = CEPH_MDSC_PROTOCOL;
+               break;
+       default:
+               BUG();
+       }
+
+       dout("prepare_write_connect %p cseq=%d gseq=%d proto=%d\n", con,
+            con->connect_seq, global_seq, proto);
+
+       con->out_connect.features = cpu_to_le64(msgr->supported_features);
+       con->out_connect.host_type = cpu_to_le32(CEPH_ENTITY_TYPE_CLIENT);
+       con->out_connect.connect_seq = cpu_to_le32(con->connect_seq);
+       con->out_connect.global_seq = cpu_to_le32(global_seq);
+       con->out_connect.protocol_version = cpu_to_le32(proto);
+       con->out_connect.flags = 0;
+
+       if (!after_banner) {
+               con->out_kvec_left = 0;
+               con->out_kvec_bytes = 0;
+       }
+       con->out_kvec[con->out_kvec_left].iov_base = &con->out_connect;
+       con->out_kvec[con->out_kvec_left].iov_len = sizeof(con->out_connect);
+       con->out_kvec_left++;
+       con->out_kvec_bytes += sizeof(con->out_connect);
+       con->out_kvec_cur = con->out_kvec;
+       con->out_more = 0;
+       set_bit(WRITE_PENDING, &con->state);
+
+       prepare_connect_authorizer(con);
+}
+
+
+/*
+ * write as much of pending kvecs to the socket as we can.
+ *  1 -> done
+ *  0 -> socket full, but more to do
+ * <0 -> error
+ */
+static int write_partial_kvec(struct ceph_connection *con)
+{
+       int ret;
+
+       dout("write_partial_kvec %p %d left\n", con, con->out_kvec_bytes);
+       while (con->out_kvec_bytes > 0) {
+               ret = ceph_tcp_sendmsg(con->sock, con->out_kvec_cur,
+                                      con->out_kvec_left, con->out_kvec_bytes,
+                                      con->out_more);
+               if (ret <= 0)
+                       goto out;
+               con->out_kvec_bytes -= ret;
+               if (con->out_kvec_bytes == 0)
+                       break;            /* done */
+               while (ret > 0) {
+                       if (ret >= con->out_kvec_cur->iov_len) {
+                               ret -= con->out_kvec_cur->iov_len;
+                               con->out_kvec_cur++;
+                               con->out_kvec_left--;
+                       } else {
+                               con->out_kvec_cur->iov_len -= ret;
+                               con->out_kvec_cur->iov_base += ret;
+                               ret = 0;
+                               break;
+                       }
+               }
+       }
+       con->out_kvec_left = 0;
+       con->out_kvec_is_msg = false;
+       ret = 1;
+out:
+       dout("write_partial_kvec %p %d left in %d kvecs ret = %d\n", con,
+            con->out_kvec_bytes, con->out_kvec_left, ret);
+       return ret;  /* done! */
+}
+
+#ifdef CONFIG_BLOCK
+static void init_bio_iter(struct bio *bio, struct bio **iter, int *seg)
+{
+       if (!bio) {
+               *iter = NULL;
+               *seg = 0;
+               return;
+       }
+       *iter = bio;
+       *seg = bio->bi_idx;
+}
+
+static void iter_bio_next(struct bio **bio_iter, int *seg)
+{
+       if (*bio_iter == NULL)
+               return;
+
+       BUG_ON(*seg >= (*bio_iter)->bi_vcnt);
+
+       (*seg)++;
+       if (*seg == (*bio_iter)->bi_vcnt)
+               init_bio_iter((*bio_iter)->bi_next, bio_iter, seg);
+}
+#endif
+
+/*
+ * Write as much message data payload as we can.  If we finish, queue
+ * up the footer.
+ *  1 -> done, footer is now queued in out_kvec[].
+ *  0 -> socket full, but more to do
+ * <0 -> error
+ */
+static int write_partial_msg_pages(struct ceph_connection *con)
+{
+       struct ceph_msg *msg = con->out_msg;
+       unsigned data_len = le32_to_cpu(msg->hdr.data_len);
+       size_t len;
+       int crc = !con->msgr->nocrc;  /* data crc is on unless nocrc is set */
+       int ret;
+       int total_max_write;
+       int in_trail = 0;
+       size_t trail_len = (msg->trail ? msg->trail->length : 0);
+
+       dout("write_partial_msg_pages %p msg %p page %d/%d offset %d\n",
+            con, con->out_msg, con->out_msg_pos.page, con->out_msg->nr_pages,
+            con->out_msg_pos.page_pos);
+
+#ifdef CONFIG_BLOCK
+       if (msg->bio && !msg->bio_iter)
+               init_bio_iter(msg->bio, &msg->bio_iter, &msg->bio_seg);
+#endif
+
+       while (data_len > con->out_msg_pos.data_pos) {
+               struct page *page = NULL;
+               void *kaddr = NULL;
+               int max_write = PAGE_SIZE;
+               int page_shift = 0;
+
+               total_max_write = data_len - trail_len -
+                       con->out_msg_pos.data_pos;
+
+               /*
+                * if we are calculating the data crc (the default), we need
+                * to map the page.  if our pages[] has been revoked, use the
+                * zero page.
+                */
+
+               /* have we reached the trail part of the data? */
+               if (con->out_msg_pos.data_pos >= data_len - trail_len) {
+                       in_trail = 1;
+
+                       total_max_write = data_len - con->out_msg_pos.data_pos;
+
+                       page = list_first_entry(&msg->trail->head,
+                                               struct page, lru);
+                       if (crc)
+                               kaddr = kmap(page);
+                       max_write = PAGE_SIZE;
+               } else if (msg->pages) {
+                       page = msg->pages[con->out_msg_pos.page];
+                       if (crc)
+                               kaddr = kmap(page);
+               } else if (msg->pagelist) {
+                       page = list_first_entry(&msg->pagelist->head,
+                                               struct page, lru);
+                       if (crc)
+                               kaddr = kmap(page);
+#ifdef CONFIG_BLOCK
+               } else if (msg->bio) {
+                       struct bio_vec *bv;
+
+                       bv = bio_iovec_idx(msg->bio_iter, msg->bio_seg);
+                       page = bv->bv_page;
+                       page_shift = bv->bv_offset;
+                       if (crc)
+                               kaddr = kmap(page) + page_shift;
+                       max_write = bv->bv_len;
+#endif
+               } else {
+                       page = con->msgr->zero_page;
+                       if (crc)
+                               kaddr = page_address(con->msgr->zero_page);
+               }
+               len = min_t(int, max_write - con->out_msg_pos.page_pos,
+                           total_max_write);
+
+               if (crc && !con->out_msg_pos.did_page_crc) {
+                       void *base = kaddr + con->out_msg_pos.page_pos;
+                       u32 tmpcrc = le32_to_cpu(con->out_msg->footer.data_crc);
+
+                       BUG_ON(kaddr == NULL);
+                       con->out_msg->footer.data_crc =
+                               cpu_to_le32(crc32c(tmpcrc, base, len));
+                       con->out_msg_pos.did_page_crc = 1;
+               }
+               ret = kernel_sendpage(con->sock, page,
+                                     con->out_msg_pos.page_pos + page_shift,
+                                     len,
+                                     MSG_DONTWAIT | MSG_NOSIGNAL |
+                                     MSG_MORE);
+
+               if (crc &&
+                   (msg->pages || msg->pagelist || msg->bio || in_trail))
+                       kunmap(page);
+
+               if (ret <= 0)
+                       goto out;
+
+               con->out_msg_pos.data_pos += ret;
+               con->out_msg_pos.page_pos += ret;
+               if (ret == len) {
+                       con->out_msg_pos.page_pos = 0;
+                       con->out_msg_pos.page++;
+                       con->out_msg_pos.did_page_crc = 0;
+                       if (in_trail)
+                               list_move_tail(&page->lru,
+                                              &msg->trail->head);
+                       else if (msg->pagelist)
+                               list_move_tail(&page->lru,
+                                              &msg->pagelist->head);
+#ifdef CONFIG_BLOCK
+                       else if (msg->bio)
+                               iter_bio_next(&msg->bio_iter, &msg->bio_seg);
+#endif
+               }
+       }
+
+       dout("write_partial_msg_pages %p msg %p done\n", con, msg);
+
+       /* prepare and queue up footer, too */
+       if (!crc)
+               con->out_msg->footer.flags |= CEPH_MSG_FOOTER_NOCRC;
+       con->out_kvec_bytes = 0;
+       con->out_kvec_left = 0;
+       con->out_kvec_cur = con->out_kvec;
+       prepare_write_message_footer(con, 0);
+       ret = 1;
+out:
+       return ret;
+}
+
+/*
+ * write some zeros
+ */
+static int write_partial_skip(struct ceph_connection *con)
+{
+       int ret;
+
+       while (con->out_skip > 0) {
+               struct kvec iov = {
+                       .iov_base = page_address(con->msgr->zero_page),
+                       .iov_len = min(con->out_skip, (int)PAGE_CACHE_SIZE)
+               };
+
+               ret = ceph_tcp_sendmsg(con->sock, &iov, 1, iov.iov_len, 1);
+               if (ret <= 0)
+                       goto out;
+               con->out_skip -= ret;
+       }
+       ret = 1;
+out:
+       return ret;
+}
+
+/*
+ * Prepare to read connection handshake, or an ack.
+ */
+static void prepare_read_banner(struct ceph_connection *con)
+{
+       dout("prepare_read_banner %p\n", con);
+       con->in_base_pos = 0;
+}
+
+static void prepare_read_connect(struct ceph_connection *con)
+{
+       dout("prepare_read_connect %p\n", con);
+       con->in_base_pos = 0;
+}
+
+static void prepare_read_ack(struct ceph_connection *con)
+{
+       dout("prepare_read_ack %p\n", con);
+       con->in_base_pos = 0;
+}
+
+static void prepare_read_tag(struct ceph_connection *con)
+{
+       dout("prepare_read_tag %p\n", con);
+       con->in_base_pos = 0;
+       con->in_tag = CEPH_MSGR_TAG_READY;
+}
+
+/*
+ * Prepare to read a message.
+ */
+static int prepare_read_message(struct ceph_connection *con)
+{
+       dout("prepare_read_message %p\n", con);
+       BUG_ON(con->in_msg != NULL);
+       con->in_base_pos = 0;
+       con->in_front_crc = con->in_middle_crc = con->in_data_crc = 0;
+       return 0;
+}
+
+
+static int read_partial(struct ceph_connection *con,
+                       int *to, int size, void *object)
+{
+       *to += size;
+       while (con->in_base_pos < *to) {
+               int left = *to - con->in_base_pos;
+               int have = size - left;
+               int ret = ceph_tcp_recvmsg(con->sock, object + have, left);
+               if (ret <= 0)
+                       return ret;
+               con->in_base_pos += ret;
+       }
+       return 1;
+}
+
+
+/*
+ * Read all or part of the connect-side handshake on a new connection
+ */
+static int read_partial_banner(struct ceph_connection *con)
+{
+       int ret, to = 0;
+
+       dout("read_partial_banner %p at %d\n", con, con->in_base_pos);
+
+       /* peer's banner */
+       ret = read_partial(con, &to, strlen(CEPH_BANNER), con->in_banner);
+       if (ret <= 0)
+               goto out;
+       ret = read_partial(con, &to, sizeof(con->actual_peer_addr),
+                          &con->actual_peer_addr);
+       if (ret <= 0)
+               goto out;
+       ret = read_partial(con, &to, sizeof(con->peer_addr_for_me),
+                          &con->peer_addr_for_me);
+       if (ret <= 0)
+               goto out;
+out:
+       return ret;
+}
+
+static int read_partial_connect(struct ceph_connection *con)
+{
+       int ret, to = 0;
+
+       dout("read_partial_connect %p at %d\n", con, con->in_base_pos);
+
+       ret = read_partial(con, &to, sizeof(con->in_reply), &con->in_reply);
+       if (ret <= 0)
+               goto out;
+       ret = read_partial(con, &to, le32_to_cpu(con->in_reply.authorizer_len),
+                          con->auth_reply_buf);
+       if (ret <= 0)
+               goto out;
+
+       dout("read_partial_connect %p tag %d, con_seq = %u, g_seq = %u\n",
+            con, (int)con->in_reply.tag,
+            le32_to_cpu(con->in_reply.connect_seq),
+            le32_to_cpu(con->in_reply.global_seq));
+out:
+       return ret;
+
+}
+
+/*
+ * Verify the hello banner looks okay.
+ */
+static int verify_hello(struct ceph_connection *con)
+{
+       if (memcmp(con->in_banner, CEPH_BANNER, strlen(CEPH_BANNER))) {
+               pr_err("connect to %s got bad banner\n",
+                      ceph_pr_addr(&con->peer_addr.in_addr));
+               con->error_msg = "protocol error, bad banner";
+               return -1;
+       }
+       return 0;
+}
+
+static bool addr_is_blank(struct sockaddr_storage *ss)
+{
+       switch (ss->ss_family) {
+       case AF_INET:
+               return ((struct sockaddr_in *)ss)->sin_addr.s_addr == 0;
+       case AF_INET6:
+               return
+                    ((struct sockaddr_in6 *)ss)->sin6_addr.s6_addr32[0] == 0 &&
+                    ((struct sockaddr_in6 *)ss)->sin6_addr.s6_addr32[1] == 0 &&
+                    ((struct sockaddr_in6 *)ss)->sin6_addr.s6_addr32[2] == 0 &&
+                    ((struct sockaddr_in6 *)ss)->sin6_addr.s6_addr32[3] == 0;
+       }
+       return false;
+}
+
+static int addr_port(struct sockaddr_storage *ss)
+{
+       switch (ss->ss_family) {
+       case AF_INET:
+               return ntohs(((struct sockaddr_in *)ss)->sin_port);
+       case AF_INET6:
+               return ntohs(((struct sockaddr_in6 *)ss)->sin6_port);
+       }
+       return 0;
+}
+
+static void addr_set_port(struct sockaddr_storage *ss, int p)
+{
+       switch (ss->ss_family) {
+       case AF_INET:
+               ((struct sockaddr_in *)ss)->sin_port = htons(p);
+               break;
+       case AF_INET6:
+               ((struct sockaddr_in6 *)ss)->sin6_port = htons(p);
+               break;
+       }
+}
+
+/*
+ * Parse an ip[:port] list into an addr array.  Use the default
+ * monitor port if a port isn't specified.
+ */
+int ceph_parse_ips(const char *c, const char *end,
+                  struct ceph_entity_addr *addr,
+                  int max_count, int *count)
+{
+       int i;
+       const char *p = c;
+
+       dout("parse_ips on '%.*s'\n", (int)(end-c), c);
+       for (i = 0; i < max_count; i++) {
+               const char *ipend;
+               struct sockaddr_storage *ss = &addr[i].in_addr;
+               struct sockaddr_in *in4 = (void *)ss;
+               struct sockaddr_in6 *in6 = (void *)ss;
+               int port;
+               char delim = ',';
+
+               if (*p == '[') {
+                       delim = ']';
+                       p++;
+               }
+
+               memset(ss, 0, sizeof(*ss));
+               if (in4_pton(p, end - p, (u8 *)&in4->sin_addr.s_addr,
+                            delim, &ipend))
+                       ss->ss_family = AF_INET;
+               else if (in6_pton(p, end - p, (u8 *)&in6->sin6_addr.s6_addr,
+                                 delim, &ipend))
+                       ss->ss_family = AF_INET6;
+               else
+                       goto bad;
+               p = ipend;
+
+               if (delim == ']') {
+                       if (*p != ']') {
+                               dout("missing matching ']'\n");
+                               goto bad;
+                       }
+                       p++;
+               }
+
+               /* port? */
+               if (p < end && *p == ':') {
+                       port = 0;
+                       p++;
+                       while (p < end && *p >= '0' && *p <= '9') {
+                               port = (port * 10) + (*p - '0');
+                               p++;
+                       }
+                       if (port > 65535 || port == 0)
+                               goto bad;
+               } else {
+                       port = CEPH_MON_PORT;
+               }
+
+               addr_set_port(ss, port);
+
+               dout("parse_ips got %s\n", ceph_pr_addr(ss));
+
+               if (p == end)
+                       break;
+               if (*p != ',')
+                       goto bad;
+               p++;
+       }
+
+       if (p != end)
+               goto bad;
+
+       if (count)
+               *count = i + 1;
+       return 0;
+
+bad:
+       pr_err("parse_ips bad ip '%.*s'\n", (int)(end - c), c);
+       return -EINVAL;
+}
+EXPORT_SYMBOL(ceph_parse_ips);
+
+static int process_banner(struct ceph_connection *con)
+{
+       dout("process_banner on %p\n", con);
+
+       if (verify_hello(con) < 0)
+               return -1;
+
+       ceph_decode_addr(&con->actual_peer_addr);
+       ceph_decode_addr(&con->peer_addr_for_me);
+
+       /*
+        * Make sure the other end is who we wanted.  note that the other
+        * end may not yet know their ip address, so if it's 0.0.0.0, give
+        * them the benefit of the doubt.
+        */
+       if (memcmp(&con->peer_addr, &con->actual_peer_addr,
+                  sizeof(con->peer_addr)) != 0 &&
+           !(addr_is_blank(&con->actual_peer_addr.in_addr) &&
+             con->actual_peer_addr.nonce == con->peer_addr.nonce)) {
+               pr_warning("wrong peer, want %s/%d, got %s/%d\n",
+                          ceph_pr_addr(&con->peer_addr.in_addr),
+                          (int)le32_to_cpu(con->peer_addr.nonce),
+                          ceph_pr_addr(&con->actual_peer_addr.in_addr),
+                          (int)le32_to_cpu(con->actual_peer_addr.nonce));
+               con->error_msg = "wrong peer at address";
+               return -1;
+       }
+
+       /*
+        * did we learn our address?
+        */
+       if (addr_is_blank(&con->msgr->inst.addr.in_addr)) {
+               int port = addr_port(&con->msgr->inst.addr.in_addr);
+
+               memcpy(&con->msgr->inst.addr.in_addr,
+                      &con->peer_addr_for_me.in_addr,
+                      sizeof(con->peer_addr_for_me.in_addr));
+               addr_set_port(&con->msgr->inst.addr.in_addr, port);
+               encode_my_addr(con->msgr);
+               dout("process_banner learned my addr is %s\n",
+                    ceph_pr_addr(&con->msgr->inst.addr.in_addr));
+       }
+
+       set_bit(NEGOTIATING, &con->state);
+       prepare_read_connect(con);
+       return 0;
+}
+
+static void fail_protocol(struct ceph_connection *con)
+{
+       reset_connection(con);
+       set_bit(CLOSED, &con->state);  /* in case there's queued work */
+
+       mutex_unlock(&con->mutex);
+       if (con->ops->bad_proto)
+               con->ops->bad_proto(con);
+       mutex_lock(&con->mutex);
+}
+
+static int process_connect(struct ceph_connection *con)
+{
+       u64 sup_feat = con->msgr->supported_features;
+       u64 req_feat = con->msgr->required_features;
+       u64 server_feat = le64_to_cpu(con->in_reply.features);
+
+       dout("process_connect on %p tag %d\n", con, (int)con->in_tag);
+
+       switch (con->in_reply.tag) {
+       case CEPH_MSGR_TAG_FEATURES:
+               pr_err("%s%lld %s feature set mismatch,"
+                      " my %llx < server's %llx, missing %llx\n",
+                      ENTITY_NAME(con->peer_name),
+                      ceph_pr_addr(&con->peer_addr.in_addr),
+                      sup_feat, server_feat, server_feat & ~sup_feat);
+               con->error_msg = "missing required protocol features";
+               fail_protocol(con);
+               return -1;
+
+       case CEPH_MSGR_TAG_BADPROTOVER:
+               pr_err("%s%lld %s protocol version mismatch,"
+                      " my %d != server's %d\n",
+                      ENTITY_NAME(con->peer_name),
+                      ceph_pr_addr(&con->peer_addr.in_addr),
+                      le32_to_cpu(con->out_connect.protocol_version),
+                      le32_to_cpu(con->in_reply.protocol_version));
+               con->error_msg = "protocol version mismatch";
+               fail_protocol(con);
+               return -1;
+
+       case CEPH_MSGR_TAG_BADAUTHORIZER:
+               con->auth_retry++;
+               dout("process_connect %p got BADAUTHORIZER attempt %d\n", con,
+                    con->auth_retry);
+               if (con->auth_retry == 2) {
+                       con->error_msg = "connect authorization failure";
+                       reset_connection(con);
+                       set_bit(CLOSED, &con->state);
+                       return -1;
+               }
+               con->auth_retry = 1;
+               prepare_write_connect(con->msgr, con, 0);
+               prepare_read_connect(con);
+               break;
+
+       case CEPH_MSGR_TAG_RESETSESSION:
+               /*
+                * If we connected with a large connect_seq but the peer
+                * has no record of a session with us (no connection, or
+                * connect_seq == 0), they will send RESETSESSION to indicate
+                * that they must have reset their session, and may have
+                * dropped messages.
+                */
+               dout("process_connect got RESET peer seq %u\n",
+                    le32_to_cpu(con->in_reply.connect_seq));
+               pr_err("%s%lld %s connection reset\n",
+                      ENTITY_NAME(con->peer_name),
+                      ceph_pr_addr(&con->peer_addr.in_addr));
+               reset_connection(con);
+               prepare_write_connect(con->msgr, con, 0);
+               prepare_read_connect(con);
+
+               /* Tell ceph about it. */
+               mutex_unlock(&con->mutex);
+               pr_info("reset on %s%lld\n", ENTITY_NAME(con->peer_name));
+               if (con->ops->peer_reset)
+                       con->ops->peer_reset(con);
+               mutex_lock(&con->mutex);
+               break;
+
+       case CEPH_MSGR_TAG_RETRY_SESSION:
+               /*
+                * If we sent a smaller connect_seq than the peer has, try
+                * again with a larger value.
+                */
+               dout("process_connect got RETRY my seq = %u, peer_seq = %u\n",
+                    le32_to_cpu(con->out_connect.connect_seq),
+                    le32_to_cpu(con->in_reply.connect_seq));
+               con->connect_seq = le32_to_cpu(con->in_reply.connect_seq);
+               prepare_write_connect(con->msgr, con, 0);
+               prepare_read_connect(con);
+               break;
+
+       case CEPH_MSGR_TAG_RETRY_GLOBAL:
+               /*
+                * If we sent a smaller global_seq than the peer has, try
+                * again with a larger value.
+                */
+               dout("process_connect got RETRY_GLOBAL my %u peer_gseq %u\n",
+                    con->peer_global_seq,
+                    le32_to_cpu(con->in_reply.global_seq));
+               get_global_seq(con->msgr,
+                              le32_to_cpu(con->in_reply.global_seq));
+               prepare_write_connect(con->msgr, con, 0);
+               prepare_read_connect(con);
+               break;
+
+       case CEPH_MSGR_TAG_READY:
+               if (req_feat & ~server_feat) {
+                       pr_err("%s%lld %s protocol feature mismatch,"
+                              " my required %llx > server's %llx, need %llx\n",
+                              ENTITY_NAME(con->peer_name),
+                              ceph_pr_addr(&con->peer_addr.in_addr),
+                              req_feat, server_feat, req_feat & ~server_feat);
+                       con->error_msg = "missing required protocol features";
+                       fail_protocol(con);
+                       return -1;
+               }
+               clear_bit(CONNECTING, &con->state);
+               con->peer_global_seq = le32_to_cpu(con->in_reply.global_seq);
+               con->connect_seq++;
+               con->peer_features = server_feat;
+               dout("process_connect got READY gseq %d cseq %d (%d)\n",
+                    con->peer_global_seq,
+                    le32_to_cpu(con->in_reply.connect_seq),
+                    con->connect_seq);
+               WARN_ON(con->connect_seq !=
+                       le32_to_cpu(con->in_reply.connect_seq));
+
+               if (con->in_reply.flags & CEPH_MSG_CONNECT_LOSSY)
+                       set_bit(LOSSYTX, &con->state);
+
+               prepare_read_tag(con);
+               break;
+
+       case CEPH_MSGR_TAG_WAIT:
+               /*
+                * If there is a connection race (we are opening
+                * connections to each other), one of us may just have
+                * to WAIT.  This shouldn't happen if we are the
+                * client.
+                */
+               pr_err("process_connect peer connecting WAIT\n");
+               /* fall through: WAIT is handled as a protocol error */
+       default:
+               pr_err("connect protocol error, will retry\n");
+               con->error_msg = "protocol error, garbage tag during connect";
+               return -1;
+       }
+       return 0;
+}
+
+
+/*
+ * read (part of) an ack
+ */
+static int read_partial_ack(struct ceph_connection *con)
+{
+       int to = 0;
+
+       return read_partial(con, &to, sizeof(con->in_temp_ack),
+                           &con->in_temp_ack);
+}
+
+
+/*
+ * We can finally discard anything that's been acked.
+ */
+static void process_ack(struct ceph_connection *con)
+{
+       struct ceph_msg *m;
+       u64 ack = le64_to_cpu(con->in_temp_ack);
+       u64 seq;
+
+       while (!list_empty(&con->out_sent)) {
+               m = list_first_entry(&con->out_sent, struct ceph_msg,
+                                    list_head);
+               seq = le64_to_cpu(m->hdr.seq);
+               if (seq > ack)
+                       break;
+               dout("got ack for seq %llu type %d at %p\n", seq,
+                    le16_to_cpu(m->hdr.type), m);
+               ceph_msg_remove(m);
+       }
+       prepare_read_tag(con);
+}
+
+
+
+
+static int read_partial_message_section(struct ceph_connection *con,
+                                       struct kvec *section,
+                                       unsigned int sec_len, u32 *crc)
+{
+       int ret, left;
+
+       BUG_ON(!section);
+
+       while (section->iov_len < sec_len) {
+               BUG_ON(section->iov_base == NULL);
+               left = sec_len - section->iov_len;
+               ret = ceph_tcp_recvmsg(con->sock, (char *)section->iov_base +
+                                      section->iov_len, left);
+               if (ret <= 0)
+                       return ret;
+               section->iov_len += ret;
+               if (section->iov_len == sec_len)
+                       *crc = crc32c(0, section->iov_base,
+                                     section->iov_len);
+       }
+
+       return 1;
+}
+
+static struct ceph_msg *ceph_alloc_msg(struct ceph_connection *con,
+                               struct ceph_msg_header *hdr,
+                               int *skip);
+
+
+static int read_partial_message_pages(struct ceph_connection *con,
+                                     struct page **pages,
+                                     unsigned data_len, int datacrc)
+{
+       void *p;
+       int ret;
+       int left;
+
+       left = min((int)(data_len - con->in_msg_pos.data_pos),
+                  (int)(PAGE_SIZE - con->in_msg_pos.page_pos));
+       /* (page) data */
+       BUG_ON(pages == NULL);
+       p = kmap(pages[con->in_msg_pos.page]);
+       ret = ceph_tcp_recvmsg(con->sock, p + con->in_msg_pos.page_pos,
+                              left);
+       if (ret > 0 && datacrc)
+               con->in_data_crc =
+                       crc32c(con->in_data_crc,
+                                 p + con->in_msg_pos.page_pos, ret);
+       kunmap(pages[con->in_msg_pos.page]);
+       if (ret <= 0)
+               return ret;
+       con->in_msg_pos.data_pos += ret;
+       con->in_msg_pos.page_pos += ret;
+       if (con->in_msg_pos.page_pos == PAGE_SIZE) {
+               con->in_msg_pos.page_pos = 0;
+               con->in_msg_pos.page++;
+       }
+
+       return ret;
+}
+
+#ifdef CONFIG_BLOCK
+static int read_partial_message_bio(struct ceph_connection *con,
+                                   struct bio **bio_iter, int *bio_seg,
+                                   unsigned data_len, int datacrc)
+{
+       struct bio_vec *bv = bio_iovec_idx(*bio_iter, *bio_seg);
+       void *p;
+       int ret, left;
+
+       if (IS_ERR(bv))
+               return PTR_ERR(bv);
+
+       left = min((int)(data_len - con->in_msg_pos.data_pos),
+                  (int)(bv->bv_len - con->in_msg_pos.page_pos));
+
+       p = kmap(bv->bv_page) + bv->bv_offset;
+
+       ret = ceph_tcp_recvmsg(con->sock, p + con->in_msg_pos.page_pos,
+                              left);
+       if (ret > 0 && datacrc)
+               con->in_data_crc =
+                       crc32c(con->in_data_crc,
+                                 p + con->in_msg_pos.page_pos, ret);
+       kunmap(bv->bv_page);
+       if (ret <= 0)
+               return ret;
+       con->in_msg_pos.data_pos += ret;
+       con->in_msg_pos.page_pos += ret;
+       if (con->in_msg_pos.page_pos == bv->bv_len) {
+               con->in_msg_pos.page_pos = 0;
+               iter_bio_next(bio_iter, bio_seg);
+       }
+
+       return ret;
+}
+#endif
+
+/*
+ * read (part of) a message.
+ */
+static int read_partial_message(struct ceph_connection *con)
+{
+       struct ceph_msg *m = con->in_msg;
+       int ret;
+       int to, left;
+       unsigned front_len, middle_len, data_len, data_off;
+       int datacrc = !con->msgr->nocrc;
+       int skip;
+       u64 seq;
+
+       dout("read_partial_message con %p msg %p\n", con, m);
+
+       /* header */
+       while (con->in_base_pos < sizeof(con->in_hdr)) {
+               left = sizeof(con->in_hdr) - con->in_base_pos;
+               ret = ceph_tcp_recvmsg(con->sock,
+                                      (char *)&con->in_hdr + con->in_base_pos,
+                                      left);
+               if (ret <= 0)
+                       return ret;
+               con->in_base_pos += ret;
+               if (con->in_base_pos == sizeof(con->in_hdr)) {
+                       u32 crc = crc32c(0, (void *)&con->in_hdr,
+                                sizeof(con->in_hdr) - sizeof(con->in_hdr.crc));
+                       if (crc != le32_to_cpu(con->in_hdr.crc)) {
+                               pr_err("read_partial_message bad hdr "
+                                      "crc %u != expected %u\n",
+                                      crc, le32_to_cpu(con->in_hdr.crc));
+                               return -EBADMSG;
+                       }
+               }
+       }
+       front_len = le32_to_cpu(con->in_hdr.front_len);
+       if (front_len > CEPH_MSG_MAX_FRONT_LEN)
+               return -EIO;
+       middle_len = le32_to_cpu(con->in_hdr.middle_len);
+       if (middle_len > CEPH_MSG_MAX_DATA_LEN)
+               return -EIO;
+       data_len = le32_to_cpu(con->in_hdr.data_len);
+       if (data_len > CEPH_MSG_MAX_DATA_LEN)
+               return -EIO;
+       data_off = le16_to_cpu(con->in_hdr.data_off);
+
+       /* verify seq# */
+       seq = le64_to_cpu(con->in_hdr.seq);
+       if ((s64)seq - (s64)con->in_seq < 1) {
+               pr_info("skipping %s%lld %s seq %lld, expected %lld\n",
+                       ENTITY_NAME(con->peer_name),
+                       ceph_pr_addr(&con->peer_addr.in_addr),
+                       seq, con->in_seq + 1);
+               con->in_base_pos = -front_len - middle_len - data_len -
+                       sizeof(m->footer);
+               con->in_tag = CEPH_MSGR_TAG_READY;
+               con->in_seq++;
+               return 0;
+       } else if ((s64)seq - (s64)con->in_seq > 1) {
+               pr_err("read_partial_message bad seq %lld expected %lld\n",
+                      seq, con->in_seq + 1);
+               con->error_msg = "bad message sequence # for incoming message";
+               return -EBADMSG;
+       }
+
+       /* allocate message? */
+       if (!con->in_msg) {
+               dout("got hdr type %d front %u data %u\n",
+                    le16_to_cpu(con->in_hdr.type), front_len, data_len);
+               skip = 0;
+               con->in_msg = ceph_alloc_msg(con, &con->in_hdr, &skip);
+               if (skip) {
+                       /* skip this message */
+                       dout("alloc_msg said skip message\n");
+                       BUG_ON(con->in_msg);
+                       con->in_base_pos = -front_len - middle_len - data_len -
+                               sizeof(m->footer);
+                       con->in_tag = CEPH_MSGR_TAG_READY;
+                       con->in_seq++;
+                       return 0;
+               }
+               if (!con->in_msg) {
+                       con->error_msg =
+                               "error allocating memory for incoming message";
+                       return -ENOMEM;
+               }
+               m = con->in_msg;
+               m->front.iov_len = 0;    /* haven't read it yet */
+               if (m->middle)
+                       m->middle->vec.iov_len = 0;
+
+               con->in_msg_pos.page = 0;
+               if (m->pages)
+                       con->in_msg_pos.page_pos = data_off & ~PAGE_MASK;
+               else
+                       con->in_msg_pos.page_pos = 0;
+               con->in_msg_pos.data_pos = 0;
+       }
+
+       /* front */
+       ret = read_partial_message_section(con, &m->front, front_len,
+                                          &con->in_front_crc);
+       if (ret <= 0)
+               return ret;
+
+       /* middle */
+       if (m->middle) {
+               ret = read_partial_message_section(con, &m->middle->vec,
+                                                  middle_len,
+                                                  &con->in_middle_crc);
+               if (ret <= 0)
+                       return ret;
+       }
+#ifdef CONFIG_BLOCK
+       if (m->bio && !m->bio_iter)
+               init_bio_iter(m->bio, &m->bio_iter, &m->bio_seg);
+#endif
+
+       /* (page) data */
+       while (con->in_msg_pos.data_pos < data_len) {
+               if (m->pages) {
+                       ret = read_partial_message_pages(con, m->pages,
+                                                data_len, datacrc);
+                       if (ret <= 0)
+                               return ret;
+#ifdef CONFIG_BLOCK
+               } else if (m->bio) {
+
+                       ret = read_partial_message_bio(con,
+                                                &m->bio_iter, &m->bio_seg,
+                                                data_len, datacrc);
+                       if (ret <= 0)
+                               return ret;
+#endif
+               } else {
+                       BUG_ON(1);
+               }
+       }
+
+       /* footer */
+       to = sizeof(m->hdr) + sizeof(m->footer);
+       while (con->in_base_pos < to) {
+               left = to - con->in_base_pos;
+               ret = ceph_tcp_recvmsg(con->sock, (char *)&m->footer +
+                                      (con->in_base_pos - sizeof(m->hdr)),
+                                      left);
+               if (ret <= 0)
+                       return ret;
+               con->in_base_pos += ret;
+       }
+       dout("read_partial_message got msg %p %d (%u) + %d (%u) + %d (%u)\n",
+            m, front_len, m->footer.front_crc, middle_len,
+            m->footer.middle_crc, data_len, m->footer.data_crc);
+
+       /* crc ok? */
+       if (con->in_front_crc != le32_to_cpu(m->footer.front_crc)) {
+               pr_err("read_partial_message %p front crc %u != exp. %u\n",
+                      m, con->in_front_crc, le32_to_cpu(m->footer.front_crc));
+               return -EBADMSG;
+       }
+       if (con->in_middle_crc != le32_to_cpu(m->footer.middle_crc)) {
+               pr_err("read_partial_message %p middle crc %u != exp. %u\n",
+                      m, con->in_middle_crc, le32_to_cpu(m->footer.middle_crc));
+               return -EBADMSG;
+       }
+       if (datacrc &&
+           (m->footer.flags & CEPH_MSG_FOOTER_NOCRC) == 0 &&
+           con->in_data_crc != le32_to_cpu(m->footer.data_crc)) {
+               pr_err("read_partial_message %p data crc %u != exp. %u\n", m,
+                      con->in_data_crc, le32_to_cpu(m->footer.data_crc));
+               return -EBADMSG;
+       }
+
+       return 1; /* done! */
+}
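The sequence-number check in read_partial_message() sorts an incoming header into three cases. The sketch below is a userspace illustration only, with hypothetical names (`sketch_seq_verdict`, `sketch_check_seq`). It computes the difference in unsigned arithmetic and then reinterprets it as signed, which is an assumption made here to keep the example free of signed overflow; the kernel code casts both operands to s64 first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three outcomes of the seq check. */
enum sketch_seq_verdict {
	SKETCH_SEQ_STALE,	/* already seen: skip the message body */
	SKETCH_SEQ_NEXT,	/* exactly in_seq + 1: accept */
	SKETCH_SEQ_GAP		/* something was lost: protocol error */
};

static enum sketch_seq_verdict sketch_check_seq(uint64_t seq, uint64_t in_seq)
{
	/* Signed distance between the header's seq and the last seq seen. */
	int64_t delta = (int64_t)(seq - in_seq);

	if (delta < 1)
		return SKETCH_SEQ_STALE;
	if (delta > 1)
		return SKETCH_SEQ_GAP;
	return SKETCH_SEQ_NEXT;
}
```

In the kernel function the stale case sets in_base_pos to a negative byte count so that try_read() discards the rest of the message, while the gap case returns -EBADMSG.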
+
+/*
+ * Process message.  This happens in the worker thread.  The callback should
+ * be careful not to do anything that waits on other incoming messages or it
+ * may deadlock.
+ */
+static void process_message(struct ceph_connection *con)
+{
+       struct ceph_msg *msg;
+
+       msg = con->in_msg;
+       con->in_msg = NULL;
+
+       /* if first message, set peer_name */
+       if (con->peer_name.type == 0)
+               con->peer_name = msg->hdr.src;
+
+       con->in_seq++;
+       mutex_unlock(&con->mutex);
+
+       dout("===== %p %llu from %s%lld %d=%s len %d+%d (%u %u %u) =====\n",
+            msg, le64_to_cpu(msg->hdr.seq),
+            ENTITY_NAME(msg->hdr.src),
+            le16_to_cpu(msg->hdr.type),
+            ceph_msg_type_name(le16_to_cpu(msg->hdr.type)),
+            le32_to_cpu(msg->hdr.front_len),
+            le32_to_cpu(msg->hdr.data_len),
+            con->in_front_crc, con->in_middle_crc, con->in_data_crc);
+       con->ops->dispatch(con, msg);
+
+       mutex_lock(&con->mutex);
+       prepare_read_tag(con);
+}
+
+
+/*
+ * Write something to the socket.  Called in a worker thread when the
+ * socket appears to be writeable and we have something ready to send.
+ */
+static int try_write(struct ceph_connection *con)
+{
+       struct ceph_messenger *msgr = con->msgr;
+       int ret = 1;
+
+       dout("try_write start %p state %lu nref %d\n", con, con->state,
+            atomic_read(&con->nref));
+
+more:
+       dout("try_write out_kvec_bytes %d\n", con->out_kvec_bytes);
+
+       /* open the socket first? */
+       if (con->sock == NULL) {
+               /*
+                * if we were STANDBY and are reconnecting _this_
+                * connection, bump connect_seq now.  Always bump
+                * global_seq.
+                */
+               if (test_and_clear_bit(STANDBY, &con->state))
+                       con->connect_seq++;
+
+               prepare_write_banner(msgr, con);
+               prepare_write_connect(msgr, con, 1);
+               prepare_read_banner(con);
+               set_bit(CONNECTING, &con->state);
+               clear_bit(NEGOTIATING, &con->state);
+
+               BUG_ON(con->in_msg);
+               con->in_tag = CEPH_MSGR_TAG_READY;
+               dout("try_write initiating connect on %p new state %lu\n",
+                    con, con->state);
+               con->sock = ceph_tcp_connect(con);
+               if (IS_ERR(con->sock)) {
+                       con->sock = NULL;
+                       con->error_msg = "connect error";
+                       ret = -1;
+                       goto out;
+               }
+       }
+
+more_kvec:
+       /* kvec data queued? */
+       if (con->out_skip) {
+               ret = write_partial_skip(con);
+               if (ret < 0) {
+                       dout("try_write write_partial_skip err %d\n", ret);
+                       goto done;
+               }
+               if (ret == 0)
+                       goto done;
+       }
+       if (con->out_kvec_left) {
+               ret = write_partial_kvec(con);
+               if (ret <= 0)
+                       goto done;
+       }
+
+       /* msg pages? */
+       if (con->out_msg) {
+               if (con->out_msg_done) {
+                       ceph_msg_put(con->out_msg);
+                       con->out_msg = NULL;   /* we're done with this one */
+                       goto do_next;
+               }
+
+               ret = write_partial_msg_pages(con);
+               if (ret == 1)
+                       goto more_kvec;  /* we need to send the footer, too! */
+               if (ret == 0)
+                       goto done;
+               if (ret < 0) {
+                       dout("try_write write_partial_msg_pages err %d\n",
+                            ret);
+                       goto done;
+               }
+       }
+
+do_next:
+       if (!test_bit(CONNECTING, &con->state)) {
+               /* is anything else pending? */
+               if (!list_empty(&con->out_queue)) {
+                       prepare_write_message(con);
+                       goto more;
+               }
+               if (con->in_seq > con->in_seq_acked) {
+                       prepare_write_ack(con);
+                       goto more;
+               }
+               if (test_and_clear_bit(KEEPALIVE_PENDING, &con->state)) {
+                       prepare_write_keepalive(con);
+                       goto more;
+               }
+       }
+
+       /* Nothing to do! */
+       clear_bit(WRITE_PENDING, &con->state);
+       dout("try_write nothing else to write.\n");
+done:
+       ret = 0;
+out:
+       dout("try_write done on %p\n", con);
+       return ret;
+}
+
+
+
+/*
+ * Read what we can from the socket.
+ */
+static int try_read(struct ceph_connection *con)
+{
+       int ret = -1;
+
+       if (!con->sock)
+               return 0;
+
+       if (test_bit(STANDBY, &con->state))
+               return 0;
+
+       dout("try_read start on %p\n", con);
+
+more:
+       dout("try_read tag %d in_base_pos %d\n", (int)con->in_tag,
+            con->in_base_pos);
+       if (test_bit(CONNECTING, &con->state)) {
+               if (!test_bit(NEGOTIATING, &con->state)) {
+                       dout("try_read connecting\n");
+                       ret = read_partial_banner(con);
+                       if (ret <= 0)
+                               goto done;
+                       if (process_banner(con) < 0) {
+                               ret = -1;
+                               goto out;
+                       }
+               }
+               ret = read_partial_connect(con);
+               if (ret <= 0)
+                       goto done;
+               if (process_connect(con) < 0) {
+                       ret = -1;
+                       goto out;
+               }
+               goto more;
+       }
+
+       if (con->in_base_pos < 0) {
+               /*
+                * skipping + discarding content.
+                *
+                * FIXME: there must be a better way to do this!
+                */
+               static char buf[1024];
+               int skip = min(1024, -con->in_base_pos);
+               dout("skipping %d / %d bytes\n", skip, -con->in_base_pos);
+               ret = ceph_tcp_recvmsg(con->sock, buf, skip);
+               if (ret <= 0)
+                       goto done;
+               con->in_base_pos += ret;
+               if (con->in_base_pos)
+                       goto more;
+       }
+       if (con->in_tag == CEPH_MSGR_TAG_READY) {
+               /*
+                * what's next?
+                */
+               ret = ceph_tcp_recvmsg(con->sock, &con->in_tag, 1);
+               if (ret <= 0)
+                       goto done;
+               dout("try_read got tag %d\n", (int)con->in_tag);
+               switch (con->in_tag) {
+               case CEPH_MSGR_TAG_MSG:
+                       prepare_read_message(con);
+                       break;
+               case CEPH_MSGR_TAG_ACK:
+                       prepare_read_ack(con);
+                       break;
+               case CEPH_MSGR_TAG_CLOSE:
+                       set_bit(CLOSED, &con->state);   /* fixme */
+                       goto done;
+               default:
+                       goto bad_tag;
+               }
+       }
+       if (con->in_tag == CEPH_MSGR_TAG_MSG) {
+               ret = read_partial_message(con);
+               if (ret <= 0) {
+                       switch (ret) {
+                       case -EBADMSG:
+                               con->error_msg = "bad crc";
+                               ret = -EIO;
+                               goto out;
+                       case -EIO:
+                               con->error_msg = "io error";
+                               goto out;
+                       default:
+                               goto done;
+                       }
+               }
+               if (con->in_tag == CEPH_MSGR_TAG_READY)
+                       goto more;
+               process_message(con);
+               goto more;
+       }
+       if (con->in_tag == CEPH_MSGR_TAG_ACK) {
+               ret = read_partial_ack(con);
+               if (ret <= 0)
+                       goto done;
+               process_ack(con);
+               goto more;
+       }
+
+done:
+       ret = 0;
+out:
+       dout("try_read done on %p\n", con);
+       return ret;
+
+bad_tag:
+       pr_err("try_read bad con->in_tag = %d\n", (int)con->in_tag);
+       con->error_msg = "protocol error, garbage tag";
+       ret = -1;
+       goto out;
+}
+
+
+/*
+ * Atomically queue work on a connection.  Bump @con reference to
+ * avoid races with connection teardown.
+ *
+ * There is some trickery going on with QUEUED and BUSY because we
+ * only want a _single_ thread operating on each connection at any
+ * point in time, but we want to use all available CPUs.
+ *
+ * The worker thread only proceeds if it can atomically set BUSY.  It
+ * clears QUEUED and does its thing.  When it thinks it's done, it
+ * clears BUSY, then rechecks QUEUED.  If QUEUED is set again, it loops
+ * (tries again to set BUSY).
+ *
+ * To queue work, we first set QUEUED, _then_ if BUSY isn't set, we
+ * try to queue work.  If that fails (work is already queued, or BUSY
+ * is set), we give up, since the work is already queued or in progress,
+ * but leave QUEUED set so that the worker thread will loop if necessary.
+ */
+static void queue_con(struct ceph_connection *con)
+{
+       if (test_bit(DEAD, &con->state)) {
+               dout("queue_con %p ignoring: DEAD\n",
+                    con);
+               return;
+       }
+
+       if (!con->ops->get(con)) {
+               dout("queue_con %p ref count 0\n", con);
+               return;
+       }
+
+       set_bit(QUEUED, &con->state);
+       if (test_bit(BUSY, &con->state)) {
+               dout("queue_con %p - already BUSY\n", con);
+               con->ops->put(con);
+       } else if (!queue_work(ceph_msgr_wq, &con->work.work)) {
+               dout("queue_con %p - already queued\n", con);
+               con->ops->put(con);
+       } else {
+               dout("queue_con %p\n", con);
+       }
+}
+
+/*
+ * Do some work on a connection.  Drop a connection ref when we're done.
+ */
+static void con_work(struct work_struct *work)
+{
+       struct ceph_connection *con = container_of(work, struct ceph_connection,
+                                                  work.work);
+       int backoff = 0;
+
+more:
+       if (test_and_set_bit(BUSY, &con->state) != 0) {
+               dout("con_work %p BUSY already set\n", con);
+               goto out;
+       }
+       dout("con_work %p start, clearing QUEUED\n", con);
+       clear_bit(QUEUED, &con->state);
+
+       mutex_lock(&con->mutex);
+
+       if (test_bit(CLOSED, &con->state)) { /* e.g. if we are replaced */
+               dout("con_work CLOSED\n");
+               con_close_socket(con);
+               goto done;
+       }
+       if (test_and_clear_bit(OPENING, &con->state)) {
+               /* reopen w/ new peer */
+               dout("con_work OPENING\n");
+               con_close_socket(con);
+       }
+
+       if (test_and_clear_bit(SOCK_CLOSED, &con->state) ||
+           try_read(con) < 0 ||
+           try_write(con) < 0) {
+               mutex_unlock(&con->mutex);
+               backoff = 1;
+               ceph_fault(con);     /* error/fault path */
+               goto done_unlocked;
+       }
+
+done:
+       mutex_unlock(&con->mutex);
+
+done_unlocked:
+       clear_bit(BUSY, &con->state);
+       dout("con->state=%lu\n", con->state);
+       if (test_bit(QUEUED, &con->state)) {
+               if (!backoff || test_bit(OPENING, &con->state)) {
+                       dout("con_work %p QUEUED reset, looping\n", con);
+                       goto more;
+               }
+               dout("con_work %p QUEUED reset, but just faulted\n", con);
+               clear_bit(QUEUED, &con->state);
+       }
+       dout("con_work %p done\n", con);
+
+out:
+       con->ops->put(con);
+}
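The QUEUED/BUSY handshake used by queue_con() and con_work() can be modelled in userspace with C11 atomics. This is a simplified sketch, not the kernel implementation: the names (`sketch_con`, `sketch_queue`, `sketch_work`) are hypothetical, two atomic booleans stand in for the bits in con->state, and reference counting, the mutex, and the backoff case are left out. The `requeue_during_first_pass` flag simulates a request arriving while the worker is mid-pass.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the QUEUED and BUSY bits of con->state. */
struct sketch_con {
	atomic_bool queued;
	atomic_bool busy;
	int passes;		/* how many times the work body ran */
};

static void sketch_init(struct sketch_con *c)
{
	atomic_init(&c->queued, false);
	atomic_init(&c->busy, false);
	c->passes = 0;
}

/* Producer side: returns true if the caller should schedule the worker. */
static bool sketch_queue(struct sketch_con *c)
{
	atomic_store(&c->queued, true);		/* set QUEUED first */
	return !atomic_load(&c->busy);		/* then skip scheduling if BUSY */
}

/* Worker side: claim BUSY, clear QUEUED, work, release, recheck QUEUED. */
static void sketch_work(struct sketch_con *c, bool requeue_during_first_pass)
{
	for (;;) {
		if (atomic_exchange(&c->busy, true))
			return;		/* another worker owns the connection */
		atomic_store(&c->queued, false);

		c->passes++;
		if (requeue_during_first_pass && c->passes == 1)
			(void)sketch_queue(c);	/* sees BUSY, leaves QUEUED set */

		atomic_store(&c->busy, false);
		if (!atomic_load(&c->queued))
			return;		/* nothing arrived meanwhile */
	}
}
```

Because the producer sets QUEUED before testing BUSY and the worker clears BUSY before rechecking QUEUED, a request that arrives mid-pass is picked up by the loop rather than lost, at the cost of an occasional redundant pass.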
+
+
+/*
+ * Generic error/fault handler.  Failed connections are retried with
+ * exponential backoff.
+ */
+static void ceph_fault(struct ceph_connection *con)
+{
+       pr_err("%s%lld %s %s\n", ENTITY_NAME(con->peer_name),
+              ceph_pr_addr(&con->peer_addr.in_addr), con->error_msg);
+       dout("fault %p state %lu to peer %s\n",
+            con, con->state, ceph_pr_addr(&con->peer_addr.in_addr));
+
+       if (test_bit(LOSSYTX, &con->state)) {
+               dout("fault on LOSSYTX channel\n");
+               goto out;
+       }
+
+       mutex_lock(&con->mutex);
+       if (test_bit(CLOSED, &con->state))
+               goto out_unlock;
+
+       con_close_socket(con);
+
+       if (con->in_msg) {
+               ceph_msg_put(con->in_msg);
+               con->in_msg = NULL;
+       }
+
+       /* Requeue anything that hasn't been acked */
+       list_splice_init(&con->out_sent, &con->out_queue);
+
+       /* If there are no messages in the queue, place the connection
+        * in a STANDBY state (i.e., don't try to reconnect just yet). */
+       if (list_empty(&con->out_queue) && !con->out_keepalive_pending) {
+               dout("fault setting STANDBY\n");
+               set_bit(STANDBY, &con->state);
+       } else {
+               /* retry after a delay. */
+               if (con->delay == 0)
+                       con->delay = BASE_DELAY_INTERVAL;
+               else if (con->delay < MAX_DELAY_INTERVAL)
+                       con->delay *= 2;
+               dout("fault queueing %p delay %lu\n", con, con->delay);
+               con->ops->get(con);
+               if (queue_delayed_work(ceph_msgr_wq, &con->work,
+                                      round_jiffies_relative(con->delay)) == 0)
+                       con->ops->put(con);
+       }
+
+out_unlock:
+       mutex_unlock(&con->mutex);
+out:
+       /*
+        * in case we faulted due to authentication, invalidate our
+        * current tickets so that we can get new ones.
+        */
+       if (con->auth_retry && con->ops->invalidate_authorizer) {
+               dout("calling invalidate_authorizer()\n");
+               con->ops->invalidate_authorizer(con);
+       }
+
+       if (con->ops->fault)
+               con->ops->fault(con);
+}
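The delay update in ceph_fault() can be isolated as a small pure function. The sketch below is illustrative only: `sketch_next_delay` is a hypothetical helper, and the two constants are assumed values chosen for readability, not the real BASE_DELAY_INTERVAL and MAX_DELAY_INTERVAL, which are defined outside this section.

```c
#include <assert.h>

/* Assumed values for illustration; the real constants live elsewhere. */
#define SKETCH_BASE_DELAY 1UL
#define SKETCH_MAX_DELAY  16UL

/* Start at the base delay, then double while still below the maximum. */
static unsigned long sketch_next_delay(unsigned long delay)
{
	if (delay == 0)
		return SKETCH_BASE_DELAY;
	if (delay < SKETCH_MAX_DELAY)
		return delay * 2;
	return delay;
}
```

Note that the cap is tested before doubling, so a delay just below the maximum is still doubled once and the final value can land above the maximum (but below twice it).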
+
+
+
+/*
+ * create a new messenger instance
+ */
+struct ceph_messenger *ceph_messenger_create(struct ceph_entity_addr *myaddr,
+                                            u32 supported_features,
+                                            u32 required_features)
+{
+       struct ceph_messenger *msgr;
+
+       msgr = kzalloc(sizeof(*msgr), GFP_KERNEL);
+       if (msgr == NULL)
+               return ERR_PTR(-ENOMEM);
+
+       msgr->supported_features = supported_features;
+       msgr->required_features = required_features;
+
+       spin_lock_init(&msgr->global_seq_lock);
+
+       /* the zero page is needed if a request is "canceled" while the message
+        * is being written over the socket */
+       msgr->zero_page = __page_cache_alloc(GFP_KERNEL | __GFP_ZERO);
+       if (!msgr->zero_page) {
+               kfree(msgr);
+               return ERR_PTR(-ENOMEM);
+       }
+       kmap(msgr->zero_page);
+
+       if (myaddr)
+               msgr->inst.addr = *myaddr;
+
+       /* select a random nonce */
+       msgr->inst.addr.type = 0;
+       get_random_bytes(&msgr->inst.addr.nonce, sizeof(msgr->inst.addr.nonce));
+       encode_my_addr(msgr);
+
+       dout("messenger_create %p\n", msgr);
+       return msgr;
+}
+EXPORT_SYMBOL(ceph_messenger_create);
+
+void ceph_messenger_destroy(struct ceph_messenger *msgr)
+{
+       dout("destroy %p\n", msgr);
+       kunmap(msgr->zero_page);
+       __free_page(msgr->zero_page);
+       kfree(msgr);
+       dout("destroyed messenger %p\n", msgr);
+}
+EXPORT_SYMBOL(ceph_messenger_destroy);
+
+/*
+ * Queue up an outgoing message on the given connection.
+ */
+void ceph_con_send(struct ceph_connection *con, struct ceph_msg *msg)
+{
+       if (test_bit(CLOSED, &con->state)) {
+               dout("con_send %p closed, dropping %p\n", con, msg);
+               ceph_msg_put(msg);
+               return;
+       }
+
+       /* set src */
+       msg->hdr.src = con->msgr->inst.name;
+
+       BUG_ON(msg->front.iov_len != le32_to_cpu(msg->hdr.front_len));
+
+       msg->needs_out_seq = true;
+
+       /* queue */
+       mutex_lock(&con->mutex);
+       BUG_ON(!list_empty(&msg->list_head));
+       list_add_tail(&msg->list_head, &con->out_queue);
+       dout("----- %p to %s%lld %d=%s len %d+%d+%d -----\n", msg,
+            ENTITY_NAME(con->peer_name), le16_to_cpu(msg->hdr.type),
+            ceph_msg_type_name(le16_to_cpu(msg->hdr.type)),
+            le32_to_cpu(msg->hdr.front_len),
+            le32_to_cpu(msg->hdr.middle_len),
+            le32_to_cpu(msg->hdr.data_len));
+       mutex_unlock(&con->mutex);
+
+       /* if there wasn't anything waiting to send before, queue
+        * new work */
+       if (test_and_set_bit(WRITE_PENDING, &con->state) == 0)
+               queue_con(con);
+}
+EXPORT_SYMBOL(ceph_con_send);
+
+/*
+ * Revoke a message that was previously queued for send
+ */
+void ceph_con_revoke(struct ceph_connection *con, struct ceph_msg *msg)
+{
+       mutex_lock(&con->mutex);
+       if (!list_empty(&msg->list_head)) {
+               dout("con_revoke %p msg %p - was on queue\n", con, msg);
+               list_del_init(&msg->list_head);
+               msg->hdr.seq = 0;
+               ceph_msg_put(msg);
+       }
+       if (con->out_msg == msg) {
+               dout("con_revoke %p msg %p - was sending\n", con, msg);
+               con->out_msg = NULL;
+               if (con->out_kvec_is_msg) {
+                       con->out_skip = con->out_kvec_bytes;
+                       con->out_kvec_is_msg = false;
+               }
+               msg->hdr.seq = 0;
+               ceph_msg_put(msg);
+       }
+       mutex_unlock(&con->mutex);
+}
+
+/*
+ * Revoke a message that we may be reading data into
+ */
+void ceph_con_revoke_message(struct ceph_connection *con, struct ceph_msg *msg)
+{
+       mutex_lock(&con->mutex);
+       if (con->in_msg && con->in_msg == msg) {
+               unsigned front_len = le32_to_cpu(con->in_hdr.front_len);
+               unsigned middle_len = le32_to_cpu(con->in_hdr.middle_len);
+               unsigned data_len = le32_to_cpu(con->in_hdr.data_len);
+
+               /* skip rest of message */
+               dout("con_revoke_message %p msg %p revoked\n", con, msg);
+               con->in_base_pos = con->in_base_pos -
+                       sizeof(struct ceph_msg_header) -
+                       front_len -
+                       middle_len -
+                       data_len -
+                       sizeof(struct ceph_msg_footer);
+               ceph_msg_put(con->in_msg);
+               con->in_msg = NULL;
+               con->in_tag = CEPH_MSGR_TAG_READY;
+               con->in_seq++;
+       } else {
+               dout("con_revoke_message %p in_msg %p msg %p no-op\n",
+                    con, con->in_msg, msg);
+       }
+       mutex_unlock(&con->mutex);
+}
+
+/*
+ * Queue a keepalive byte to ensure the TCP connection is alive.
+ */
+void ceph_con_keepalive(struct ceph_connection *con)
+{
+       if (test_and_set_bit(KEEPALIVE_PENDING, &con->state) == 0 &&
+           test_and_set_bit(WRITE_PENDING, &con->state) == 0)
+               queue_con(con);
+}
+EXPORT_SYMBOL(ceph_con_keepalive);
+
+
+/*
+ * Construct a new message with the given type and front size.
+ * The new msg has a ref count of 1.
+ */
+struct ceph_msg *ceph_msg_new(int type, int front_len, gfp_t flags)
+{
+       struct ceph_msg *m;
+
+       /* zeroed so that the out2 error path can safely ceph_msg_put() */
+       m = kzalloc(sizeof(*m), flags);
+       if (m == NULL)
+               goto out;
+       kref_init(&m->kref);
+       INIT_LIST_HEAD(&m->list_head);
+
+       m->hdr.tid = 0;
+       m->hdr.type = cpu_to_le16(type);
+       m->hdr.priority = cpu_to_le16(CEPH_MSG_PRIO_DEFAULT);
+       m->hdr.version = 0;
+       m->hdr.front_len = cpu_to_le32(front_len);
+       m->hdr.middle_len = 0;
+       m->hdr.data_len = 0;
+       m->hdr.data_off = 0;
+       m->hdr.reserved = 0;
+       m->footer.front_crc = 0;
+       m->footer.middle_crc = 0;
+       m->footer.data_crc = 0;
+       m->footer.flags = 0;
+       m->front_max = front_len;
+       m->front_is_vmalloc = false;
+       m->more_to_follow = false;
+       m->pool = NULL;
+
+       /* front */
+       if (front_len) {
+               if (front_len > PAGE_CACHE_SIZE) {
+                       m->front.iov_base = __vmalloc(front_len, flags,
+                                                     PAGE_KERNEL);
+                       m->front_is_vmalloc = true;
+               } else {
+                       m->front.iov_base = kmalloc(front_len, flags);
+               }
+               if (m->front.iov_base == NULL) {
+                       pr_err("msg_new can't allocate %d bytes\n",
+                            front_len);
+                       goto out2;
+               }
+       } else {
+               m->front.iov_base = NULL;
+       }
+       m->front.iov_len = front_len;
+
+       /* middle */
+       m->middle = NULL;
+
+       /* data */
+       m->nr_pages = 0;
+       m->pages = NULL;
+       m->pagelist = NULL;
+       m->bio = NULL;
+       m->bio_iter = NULL;
+       m->bio_seg = 0;
+       m->trail = NULL;
+
+       dout("ceph_msg_new %p front %d\n", m, front_len);
+       return m;
+
+out2:
+       ceph_msg_put(m);
+out:
+       pr_err("msg_new can't create type %d front %d\n", type, front_len);
+       return NULL;
+}
+EXPORT_SYMBOL(ceph_msg_new);
+
+/*
+ * Allocate "middle" portion of a message, if it is needed and wasn't
+ * allocated by alloc_msg.  This allows us to read a small fixed-size
+ * per-type header in the front and then gracefully fail (i.e.,
+ * propagate the error to the caller based on info in the front) when
+ * the middle is too large.
+ */
+static int ceph_alloc_middle(struct ceph_connection *con, struct ceph_msg *msg)
+{
+       int type = le16_to_cpu(msg->hdr.type);
+       int middle_len = le32_to_cpu(msg->hdr.middle_len);
+
+       dout("alloc_middle %p type %d %s middle_len %d\n", msg, type,
+            ceph_msg_type_name(type), middle_len);
+       BUG_ON(!middle_len);
+       BUG_ON(msg->middle);
+
+       msg->middle = ceph_buffer_new(middle_len, GFP_NOFS);
+       if (!msg->middle)
+               return -ENOMEM;
+       return 0;
+}
+
+/*
+ * Generic message allocator, for incoming messages.
+ */
+static struct ceph_msg *ceph_alloc_msg(struct ceph_connection *con,
+                               struct ceph_msg_header *hdr,
+                               int *skip)
+{
+       int type = le16_to_cpu(hdr->type);
+       int front_len = le32_to_cpu(hdr->front_len);
+       int middle_len = le32_to_cpu(hdr->middle_len);
+       struct ceph_msg *msg = NULL;
+       int ret;
+
+       if (con->ops->alloc_msg) {
+               mutex_unlock(&con->mutex);
+               msg = con->ops->alloc_msg(con, hdr, skip);
+               mutex_lock(&con->mutex);
+               if (!msg || *skip)
+                       return NULL;
+       }
+       if (!msg) {
+               *skip = 0;
+               msg = ceph_msg_new(type, front_len, GFP_NOFS);
+               if (!msg) {
+                       pr_err("unable to allocate msg type %d len %d\n",
+                              type, front_len);
+                       return NULL;
+               }
+       }
+       memcpy(&msg->hdr, &con->in_hdr, sizeof(con->in_hdr));
+
+       if (middle_len && !msg->middle) {
+               ret = ceph_alloc_middle(con, msg);
+               if (ret < 0) {
+                       ceph_msg_put(msg);
+                       return NULL;
+               }
+       }
+
+       return msg;
+}
+
+
+/*
+ * Free a generically kmalloc'd message.
+ */
+void ceph_msg_kfree(struct ceph_msg *m)
+{
+       dout("msg_kfree %p\n", m);
+       if (m->front_is_vmalloc)
+               vfree(m->front.iov_base);
+       else
+               kfree(m->front.iov_base);
+       kfree(m);
+}
+
+/*
+ * Drop a msg ref.  Destroy as needed.
+ */
+void ceph_msg_last_put(struct kref *kref)
+{
+       struct ceph_msg *m = container_of(kref, struct ceph_msg, kref);
+
+       dout("ceph_msg_put last one on %p\n", m);
+       WARN_ON(!list_empty(&m->list_head));
+
+       /* drop middle, data, if any */
+       if (m->middle) {
+               ceph_buffer_put(m->middle);
+               m->middle = NULL;
+       }
+       m->nr_pages = 0;
+       m->pages = NULL;
+
+       if (m->pagelist) {
+               ceph_pagelist_release(m->pagelist);
+               kfree(m->pagelist);
+               m->pagelist = NULL;
+       }
+
+       m->trail = NULL;
+
+       if (m->pool)
+               ceph_msgpool_put(m->pool, m);
+       else
+               ceph_msg_kfree(m);
+}
+EXPORT_SYMBOL(ceph_msg_last_put);
+
+void ceph_msg_dump(struct ceph_msg *msg)
+{
+       pr_debug("msg_dump %p (front_max %d nr_pages %d)\n", msg,
+                msg->front_max, msg->nr_pages);
+       print_hex_dump(KERN_DEBUG, "header: ",
+                      DUMP_PREFIX_OFFSET, 16, 1,
+                      &msg->hdr, sizeof(msg->hdr), true);
+       print_hex_dump(KERN_DEBUG, " front: ",
+                      DUMP_PREFIX_OFFSET, 16, 1,
+                      msg->front.iov_base, msg->front.iov_len, true);
+       if (msg->middle)
+               print_hex_dump(KERN_DEBUG, "middle: ",
+                              DUMP_PREFIX_OFFSET, 16, 1,
+                              msg->middle->vec.iov_base,
+                              msg->middle->vec.iov_len, true);
+       print_hex_dump(KERN_DEBUG, "footer: ",
+                      DUMP_PREFIX_OFFSET, 16, 1,
+                      &msg->footer, sizeof(msg->footer), true);
+}
+EXPORT_SYMBOL(ceph_msg_dump);
diff --git a/net/ceph/mon_client.c b/net/ceph/mon_client.c
new file mode 100644
index 0000000..8a07939
--- /dev/null
+++ b/net/ceph/mon_client.c
@@ -0,0 +1,1027 @@
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/random.h>
+#include <linux/sched.h>
+
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/decode.h>
+
+#include <linux/ceph/auth.h>
+
+/*
+ * Interact with Ceph monitor cluster.  Handle requests for new map
+ * versions, and periodically resend as needed.  Also implement
+ * statfs() and umount().
+ *
+ * A small cluster of Ceph "monitors" is responsible for managing critical
+ * cluster configuration and state information.  An odd number (e.g., 3, 5)
+ * of cmon daemons use a modified version of the Paxos part-time parliament
+ * algorithm to manage the MDS map (mds cluster membership), OSD map, and
+ * list of clients who have mounted the file system.
+ *
+ * We maintain an open, active session with a monitor at all times in order to
+ * receive timely MDSMap updates.  We periodically send a keepalive byte on the
+ * TCP socket to ensure we detect a failure.  If the connection does break, we
+ * randomly hunt for a new monitor.  Once the connection is reestablished, we
+ * resend any outstanding requests.
+ */
+
+static const struct ceph_connection_operations mon_con_ops;
+
+static int __validate_auth(struct ceph_mon_client *monc);
+
+/*
+ * Decode a monmap blob (e.g., during mount).
+ */
+struct ceph_monmap *ceph_monmap_decode(void *p, void *end)
+{
+       struct ceph_monmap *m = NULL;
+       int i, err = -EINVAL;
+       struct ceph_fsid fsid;
+       u32 epoch, num_mon;
+       u16 version;
+       u32 len;
+
+       ceph_decode_32_safe(&p, end, len, bad);
+       ceph_decode_need(&p, end, len, bad);
+
+       dout("monmap_decode %p %p len %d\n", p, end, (int)(end-p));
+
+       ceph_decode_16_safe(&p, end, version, bad);
+
+       ceph_decode_need(&p, end, sizeof(fsid) + 2*sizeof(u32), bad);
+       ceph_decode_copy(&p, &fsid, sizeof(fsid));
+       epoch = ceph_decode_32(&p);
+
+       num_mon = ceph_decode_32(&p);
+       ceph_decode_need(&p, end, num_mon*sizeof(m->mon_inst[0]), bad);
+
+       if (num_mon >= CEPH_MAX_MON)
+               goto bad;
+       m = kmalloc(sizeof(*m) + sizeof(m->mon_inst[0])*num_mon, GFP_NOFS);
+       if (m == NULL)
+               return ERR_PTR(-ENOMEM);
+       m->fsid = fsid;
+       m->epoch = epoch;
+       m->num_mon = num_mon;
+       ceph_decode_copy(&p, m->mon_inst, num_mon*sizeof(m->mon_inst[0]));
+       for (i = 0; i < num_mon; i++)
+               ceph_decode_addr(&m->mon_inst[i].addr);
+
+       dout("monmap_decode epoch %d, num_mon %d\n", m->epoch,
+            m->num_mon);
+       for (i = 0; i < m->num_mon; i++)
+               dout("monmap_decode  mon%d is %s\n", i,
+                    ceph_pr_addr(&m->mon_inst[i].addr.in_addr));
+       return m;
+
+bad:
+       dout("monmap_decode failed with %d\n", err);
+       kfree(m);
+       return ERR_PTR(err);
+}
+
+/*
+ * return true if *addr is included in the monmap.
+ */
+int ceph_monmap_contains(struct ceph_monmap *m, struct ceph_entity_addr *addr)
+{
+       int i;
+
+       for (i = 0; i < m->num_mon; i++)
+               if (memcmp(addr, &m->mon_inst[i].addr, sizeof(*addr)) == 0)
+                       return 1;
+       return 0;
+}
+
+/*
+ * Send an auth request.
+ */
+static void __send_prepared_auth_request(struct ceph_mon_client *monc, int len)
+{
+       monc->pending_auth = 1;
+       monc->m_auth->front.iov_len = len;
+       monc->m_auth->hdr.front_len = cpu_to_le32(len);
+       ceph_con_revoke(monc->con, monc->m_auth);
+       ceph_msg_get(monc->m_auth);  /* keep our ref */
+       ceph_con_send(monc->con, monc->m_auth);
+}
+
+/*
+ * Close monitor session, if any.
+ */
+static void __close_session(struct ceph_mon_client *monc)
+{
+       if (monc->con) {
+               dout("__close_session closing mon%d\n", monc->cur_mon);
+               ceph_con_revoke(monc->con, monc->m_auth);
+               ceph_con_close(monc->con);
+               monc->cur_mon = -1;
+               monc->pending_auth = 0;
+               ceph_auth_reset(monc->auth);
+       }
+}
+
+/*
+ * Open a session with a (new) monitor.
+ */
+static int __open_session(struct ceph_mon_client *monc)
+{
+       char r;
+       int ret;
+
+       if (monc->cur_mon < 0) {
+               get_random_bytes(&r, 1);
+               monc->cur_mon = r % monc->monmap->num_mon;
+               dout("open_session num=%d r=%d -> mon%d\n",
+                    monc->monmap->num_mon, r, monc->cur_mon);
+               monc->sub_sent = 0;
+               monc->sub_renew_after = jiffies;  /* i.e., expired */
+               monc->want_next_osdmap = !!monc->want_next_osdmap;
+
+               dout("open_session mon%d opening\n", monc->cur_mon);
+               monc->con->peer_name.type = CEPH_ENTITY_TYPE_MON;
+               monc->con->peer_name.num = cpu_to_le64(monc->cur_mon);
+               ceph_con_open(monc->con,
+                             &monc->monmap->mon_inst[monc->cur_mon].addr);
+
+               /* initiate authentication handshake */
+               ret = ceph_auth_build_hello(monc->auth,
+                                           monc->m_auth->front.iov_base,
+                                           monc->m_auth->front_max);
+               __send_prepared_auth_request(monc, ret);
+       } else {
+               dout("open_session mon%d already open\n", monc->cur_mon);
+       }
+       return 0;
+}
+
+static bool __sub_expired(struct ceph_mon_client *monc)
+{
+       return time_after_eq(jiffies, monc->sub_renew_after);
+}
+
+/*
+ * Reschedule delayed work timer.
+ */
+static void __schedule_delayed(struct ceph_mon_client *monc)
+{
+       unsigned delay;
+
+       if (monc->cur_mon < 0 || __sub_expired(monc))
+               delay = 10 * HZ;
+       else
+               delay = 20 * HZ;
+       dout("__schedule_delayed after %u\n", delay);
+       schedule_delayed_work(&monc->delayed_work, delay);
+}
+
+/*
+ * Send subscribe request for mdsmap and/or osdmap.
+ */
+static void __send_subscribe(struct ceph_mon_client *monc)
+{
+       dout("__send_subscribe sub_sent=%u exp=%u want_osd=%d\n",
+            (unsigned)monc->sub_sent, __sub_expired(monc),
+            monc->want_next_osdmap);
+       if ((__sub_expired(monc) && !monc->sub_sent) ||
+           monc->want_next_osdmap == 1) {
+               struct ceph_msg *msg = monc->m_subscribe;
+               struct ceph_mon_subscribe_item *i;
+               void *p, *end;
+               int num;
+
+               p = msg->front.iov_base;
+               end = p + msg->front_max;
+
+               num = 1 + !!monc->want_next_osdmap + !!monc->want_mdsmap;
+               ceph_encode_32(&p, num);
+
+               if (monc->want_next_osdmap) {
+                       dout("__send_subscribe to 'osdmap' %u\n",
+                            (unsigned)monc->have_osdmap);
+                       ceph_encode_string(&p, end, "osdmap", 6);
+                       i = p;
+                       i->have = cpu_to_le64(monc->have_osdmap);
+                       i->onetime = 1;
+                       p += sizeof(*i);
+                       monc->want_next_osdmap = 2;  /* requested */
+               }
+               if (monc->want_mdsmap) {
+                       dout("__send_subscribe to 'mdsmap' %u+\n",
+                            (unsigned)monc->have_mdsmap);
+                       ceph_encode_string(&p, end, "mdsmap", 6);
+                       i = p;
+                       i->have = cpu_to_le64(monc->have_mdsmap);
+                       i->onetime = 0;
+                       p += sizeof(*i);
+               }
+               ceph_encode_string(&p, end, "monmap", 6);
+               i = p;
+               i->have = 0;
+               i->onetime = 0;
+               p += sizeof(*i);
+
+               msg->front.iov_len = p - msg->front.iov_base;
+               msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
+               ceph_con_revoke(monc->con, msg);
+               ceph_con_send(monc->con, ceph_msg_get(msg));
+
+               monc->sub_sent = jiffies | 1;  /* never 0 */
+       }
+}
+
+static void handle_subscribe_ack(struct ceph_mon_client *monc,
+                                struct ceph_msg *msg)
+{
+       unsigned seconds;
+       struct ceph_mon_subscribe_ack *h = msg->front.iov_base;
+
+       if (msg->front.iov_len < sizeof(*h))
+               goto bad;
+       seconds = le32_to_cpu(h->duration);
+
+       mutex_lock(&monc->mutex);
+       if (monc->hunting) {
+               pr_info("mon%d %s session established\n",
+                       monc->cur_mon,
+                       ceph_pr_addr(&monc->con->peer_addr.in_addr));
+               monc->hunting = false;
+       }
+       dout("handle_subscribe_ack after %d seconds\n", seconds);
+       monc->sub_renew_after = monc->sub_sent + (seconds >> 1)*HZ - 1;
+       monc->sub_sent = 0;
+       mutex_unlock(&monc->mutex);
+       return;
+bad:
+       pr_err("got corrupt subscribe-ack msg\n");
+       ceph_msg_dump(msg);
+}
+
+/*
+ * Keep track of which maps we have
+ */
+int ceph_monc_got_mdsmap(struct ceph_mon_client *monc, u32 got)
+{
+       mutex_lock(&monc->mutex);
+       monc->have_mdsmap = got;
+       mutex_unlock(&monc->mutex);
+       return 0;
+}
+EXPORT_SYMBOL(ceph_monc_got_mdsmap);
+
+int ceph_monc_got_osdmap(struct ceph_mon_client *monc, u32 got)
+{
+       mutex_lock(&monc->mutex);
+       monc->have_osdmap = got;
+       monc->want_next_osdmap = 0;
+       mutex_unlock(&monc->mutex);
+       return 0;
+}
+
+/*
+ * Register interest in the next osdmap
+ */
+void ceph_monc_request_next_osdmap(struct ceph_mon_client *monc)
+{
+       dout("request_next_osdmap have %u\n", monc->have_osdmap);
+       mutex_lock(&monc->mutex);
+       if (!monc->want_next_osdmap)
+               monc->want_next_osdmap = 1;
+       if (monc->want_next_osdmap < 2)
+               __send_subscribe(monc);
+       mutex_unlock(&monc->mutex);
+}
+
+/*
+ * Open a monitor session, allocating the connection on first use.
+ */
+int ceph_monc_open_session(struct ceph_mon_client *monc)
+{
+       if (!monc->con) {
+               monc->con = kmalloc(sizeof(*monc->con), GFP_KERNEL);
+               if (!monc->con)
+                       return -ENOMEM;
+               ceph_con_init(monc->client->msgr, monc->con);
+               monc->con->private = monc;
+               monc->con->ops = &mon_con_ops;
+       }
+
+       mutex_lock(&monc->mutex);
+       __open_session(monc);
+       __schedule_delayed(monc);
+       mutex_unlock(&monc->mutex);
+       return 0;
+}
+EXPORT_SYMBOL(ceph_monc_open_session);
+
+/*
+ * The monitor responds with a mount ack to indicate mount success.  The
+ * included client ticket allows the client to talk to MDSs and OSDs.
+ */
+static void ceph_monc_handle_map(struct ceph_mon_client *monc,
+                                struct ceph_msg *msg)
+{
+       struct ceph_client *client = monc->client;
+       struct ceph_monmap *monmap = NULL, *old = monc->monmap;
+       void *p, *end;
+
+       mutex_lock(&monc->mutex);
+
+       dout("handle_monmap\n");
+       p = msg->front.iov_base;
+       end = p + msg->front.iov_len;
+
+       monmap = ceph_monmap_decode(p, end);
+       if (IS_ERR(monmap)) {
+               pr_err("problem decoding monmap, %d\n",
+                      (int)PTR_ERR(monmap));
+               goto out;
+       }
+
+       if (ceph_check_fsid(monc->client, &monmap->fsid) < 0) {
+               kfree(monmap);
+               goto out;
+       }
+
+       client->monc.monmap = monmap;
+       kfree(old);
+
+out:
+       mutex_unlock(&monc->mutex);
+       wake_up_all(&client->auth_wq);
+}
+
+/*
+ * generic requests (e.g., statfs, poolop)
+ */
+static struct ceph_mon_generic_request *__lookup_generic_req(
+       struct ceph_mon_client *monc, u64 tid)
+{
+       struct ceph_mon_generic_request *req;
+       struct rb_node *n = monc->generic_request_tree.rb_node;
+
+       while (n) {
+               req = rb_entry(n, struct ceph_mon_generic_request, node);
+               if (tid < req->tid)
+                       n = n->rb_left;
+               else if (tid > req->tid)
+                       n = n->rb_right;
+               else
+                       return req;
+       }
+       return NULL;
+}
+
+static void __insert_generic_request(struct ceph_mon_client *monc,
+                           struct ceph_mon_generic_request *new)
+{
+       struct rb_node **p = &monc->generic_request_tree.rb_node;
+       struct rb_node *parent = NULL;
+       struct ceph_mon_generic_request *req = NULL;
+
+       while (*p) {
+               parent = *p;
+               req = rb_entry(parent, struct ceph_mon_generic_request, node);
+               if (new->tid < req->tid)
+                       p = &(*p)->rb_left;
+               else if (new->tid > req->tid)
+                       p = &(*p)->rb_right;
+               else
+                       BUG();
+       }
+
+       rb_link_node(&new->node, parent, p);
+       rb_insert_color(&new->node, &monc->generic_request_tree);
+}
+
+static void release_generic_request(struct kref *kref)
+{
+       struct ceph_mon_generic_request *req =
+               container_of(kref, struct ceph_mon_generic_request, kref);
+
+       if (req->reply)
+               ceph_msg_put(req->reply);
+       if (req->request)
+               ceph_msg_put(req->request);
+
+       kfree(req);
+}
+
+static void put_generic_request(struct ceph_mon_generic_request *req)
+{
+       kref_put(&req->kref, release_generic_request);
+}
+
+static void get_generic_request(struct ceph_mon_generic_request *req)
+{
+       kref_get(&req->kref);
+}
+
+static struct ceph_msg *get_generic_reply(struct ceph_connection *con,
+                                        struct ceph_msg_header *hdr,
+                                        int *skip)
+{
+       struct ceph_mon_client *monc = con->private;
+       struct ceph_mon_generic_request *req;
+       u64 tid = le64_to_cpu(hdr->tid);
+       struct ceph_msg *m;
+
+       mutex_lock(&monc->mutex);
+       req = __lookup_generic_req(monc, tid);
+       if (!req) {
+               dout("get_generic_reply %lld dne\n", tid);
+               *skip = 1;
+               m = NULL;
+       } else {
+               dout("get_generic_reply %lld got %p\n", tid, req->reply);
+               m = ceph_msg_get(req->reply);
+               /*
+                * we don't need to track the connection reading into
+                * this reply because we only have one open connection
+                * at a time, ever.
+                */
+       }
+       mutex_unlock(&monc->mutex);
+       return m;
+}
+
+static int do_generic_request(struct ceph_mon_client *monc,
+                             struct ceph_mon_generic_request *req)
+{
+       int err;
+
+       /* register request */
+       mutex_lock(&monc->mutex);
+       req->tid = ++monc->last_tid;
+       req->request->hdr.tid = cpu_to_le64(req->tid);
+       __insert_generic_request(monc, req);
+       monc->num_generic_requests++;
+       ceph_con_send(monc->con, ceph_msg_get(req->request));
+       mutex_unlock(&monc->mutex);
+
+       err = wait_for_completion_interruptible(&req->completion);
+
+       mutex_lock(&monc->mutex);
+       rb_erase(&req->node, &monc->generic_request_tree);
+       monc->num_generic_requests--;
+       mutex_unlock(&monc->mutex);
+
+       if (!err)
+               err = req->result;
+       return err;
+}
+
+/*
+ * statfs
+ */
+static void handle_statfs_reply(struct ceph_mon_client *monc,
+                               struct ceph_msg *msg)
+{
+       struct ceph_mon_generic_request *req;
+       struct ceph_mon_statfs_reply *reply = msg->front.iov_base;
+       u64 tid = le64_to_cpu(msg->hdr.tid);
+
+       if (msg->front.iov_len != sizeof(*reply))
+               goto bad;
+       dout("handle_statfs_reply %p tid %llu\n", msg, tid);
+
+       mutex_lock(&monc->mutex);
+       req = __lookup_generic_req(monc, tid);
+       if (req) {
+               *(struct ceph_statfs *)req->buf = reply->st;
+               req->result = 0;
+               get_generic_request(req);
+       }
+       mutex_unlock(&monc->mutex);
+       if (req) {
+               complete_all(&req->completion);
+               put_generic_request(req);
+       }
+       return;
+
+bad:
+       pr_err("corrupt generic reply, tid %llu\n", tid);
+       ceph_msg_dump(msg);
+}
+
+/*
+ * Do a synchronous statfs().
+ */
+int ceph_monc_do_statfs(struct ceph_mon_client *monc, struct ceph_statfs *buf)
+{
+       struct ceph_mon_generic_request *req;
+       struct ceph_mon_statfs *h;
+       int err;
+
+       req = kzalloc(sizeof(*req), GFP_NOFS);
+       if (!req)
+               return -ENOMEM;
+
+       kref_init(&req->kref);
+       req->buf = buf;
+       req->buf_len = sizeof(*buf);
+       init_completion(&req->completion);
+
+       err = -ENOMEM;
+       req->request = ceph_msg_new(CEPH_MSG_STATFS, sizeof(*h), GFP_NOFS);
+       if (!req->request)
+               goto out;
+       req->reply = ceph_msg_new(CEPH_MSG_STATFS_REPLY, 1024, GFP_NOFS);
+       if (!req->reply)
+               goto out;
+
+       /* fill out request */
+       h = req->request->front.iov_base;
+       h->monhdr.have_version = 0;
+       h->monhdr.session_mon = cpu_to_le16(-1);
+       h->monhdr.session_mon_tid = 0;
+       h->fsid = monc->monmap->fsid;
+
+       err = do_generic_request(monc, req);
+
+out:
+       kref_put(&req->kref, release_generic_request);
+       return err;
+}
+EXPORT_SYMBOL(ceph_monc_do_statfs);
+
+/*
+ * pool ops
+ */
+static int get_poolop_reply_buf(const char *src, size_t src_len,
+                               char *dst, size_t dst_len)
+{
+       u32 buf_len;
+
+       if (src_len != sizeof(u32) + dst_len)
+               return -EINVAL;
+
+       buf_len = le32_to_cpu(*(u32 *)src);
+       if (buf_len != dst_len)
+               return -EINVAL;
+
+       memcpy(dst, src + sizeof(u32), dst_len);
+       return 0;
+}
+
+static void handle_poolop_reply(struct ceph_mon_client *monc,
+                               struct ceph_msg *msg)
+{
+       struct ceph_mon_generic_request *req;
+       struct ceph_mon_poolop_reply *reply = msg->front.iov_base;
+       u64 tid = le64_to_cpu(msg->hdr.tid);
+
+       if (msg->front.iov_len < sizeof(*reply))
+               goto bad;
+       dout("handle_poolop_reply %p tid %llu\n", msg, tid);
+
+       mutex_lock(&monc->mutex);
+       req = __lookup_generic_req(monc, tid);
+       if (req) {
+               if (req->buf_len &&
+                   get_poolop_reply_buf(msg->front.iov_base + sizeof(*reply),
+                                    msg->front.iov_len - sizeof(*reply),
+                                    req->buf, req->buf_len) < 0) {
+                       mutex_unlock(&monc->mutex);
+                       goto bad;
+               }
+               req->result = le32_to_cpu(reply->reply_code);
+               get_generic_request(req);
+       }
+       mutex_unlock(&monc->mutex);
+       if (req) {
+               complete(&req->completion);
+               put_generic_request(req);
+       }
+       return;
+
+bad:
+       pr_err("corrupt generic reply, tid %llu\n", tid);
+       ceph_msg_dump(msg);
+}
+
+/*
+ * Do a synchronous pool op.
+ */
+int ceph_monc_do_poolop(struct ceph_mon_client *monc, u32 op,
+                       u32 pool, u64 snapid,
+                       char *buf, int len)
+{
+       struct ceph_mon_generic_request *req;
+       struct ceph_mon_poolop *h;
+       int err;
+
+       req = kzalloc(sizeof(*req), GFP_NOFS);
+       if (!req)
+               return -ENOMEM;
+
+       kref_init(&req->kref);
+       req->buf = buf;
+       req->buf_len = len;
+       init_completion(&req->completion);
+
+       err = -ENOMEM;
+       req->request = ceph_msg_new(CEPH_MSG_POOLOP, sizeof(*h), GFP_NOFS);
+       if (!req->request)
+               goto out;
+       req->reply = ceph_msg_new(CEPH_MSG_POOLOP_REPLY, 1024, GFP_NOFS);
+       if (!req->reply)
+               goto out;
+
+       /* fill out request */
+       req->request->hdr.version = cpu_to_le16(2);
+       h = req->request->front.iov_base;
+       h->monhdr.have_version = 0;
+       h->monhdr.session_mon = cpu_to_le16(-1);
+       h->monhdr.session_mon_tid = 0;
+       h->fsid = monc->monmap->fsid;
+       h->pool = cpu_to_le32(pool);
+       h->op = cpu_to_le32(op);
+       h->auid = 0;
+       h->snapid = cpu_to_le64(snapid);
+       h->name_len = 0;
+
+       err = do_generic_request(monc, req);
+
+out:
+       kref_put(&req->kref, release_generic_request);
+       return err;
+}
+
+int ceph_monc_create_snapid(struct ceph_mon_client *monc,
+                           u32 pool, u64 *snapid)
+{
+       return ceph_monc_do_poolop(monc,  POOL_OP_CREATE_UNMANAGED_SNAP,
+                                  pool, 0, (char *)snapid, sizeof(*snapid));
+
+}
+EXPORT_SYMBOL(ceph_monc_create_snapid);
+
+int ceph_monc_delete_snapid(struct ceph_mon_client *monc,
+                           u32 pool, u64 snapid)
+{
+       return ceph_monc_do_poolop(monc,  POOL_OP_DELETE_UNMANAGED_SNAP,
+                                  pool, snapid, 0, 0);
+
+}
+
+/*
+ * Resend pending generic requests.
+ */
+static void __resend_generic_request(struct ceph_mon_client *monc)
+{
+       struct ceph_mon_generic_request *req;
+       struct rb_node *p;
+
+       for (p = rb_first(&monc->generic_request_tree); p; p = rb_next(p)) {
+               req = rb_entry(p, struct ceph_mon_generic_request, node);
+               ceph_con_revoke(monc->con, req->request);
+               ceph_con_send(monc->con, ceph_msg_get(req->request));
+       }
+}
+
+/*
+ * Delayed work.  If we haven't mounted yet, retry.  Otherwise,
+ * renew/retry subscription as needed (in case it is timing out, or we
+ * got an ENOMEM).  And keep the monitor connection alive.
+ */
+static void delayed_work(struct work_struct *work)
+{
+       struct ceph_mon_client *monc =
+               container_of(work, struct ceph_mon_client, delayed_work.work);
+
+       dout("monc delayed_work\n");
+       mutex_lock(&monc->mutex);
+       if (monc->hunting) {
+               __close_session(monc);
+               __open_session(monc);  /* continue hunting */
+       } else {
+               ceph_con_keepalive(monc->con);
+
+               __validate_auth(monc);
+
+               if (monc->auth->ops->is_authenticated(monc->auth))
+                       __send_subscribe(monc);
+       }
+       __schedule_delayed(monc);
+       mutex_unlock(&monc->mutex);
+}
+
+/*
+ * On startup, we build a temporary monmap populated with the IPs
+ * provided by mount(2).
+ */
+static int build_initial_monmap(struct ceph_mon_client *monc)
+{
+       struct ceph_options *opt = monc->client->options;
+       struct ceph_entity_addr *mon_addr = opt->mon_addr;
+       int num_mon = opt->num_mon;
+       int i;
+
+       /* build initial monmap */
+       monc->monmap = kzalloc(sizeof(*monc->monmap) +
+                              num_mon*sizeof(monc->monmap->mon_inst[0]),
+                              GFP_KERNEL);
+       if (!monc->monmap)
+               return -ENOMEM;
+       for (i = 0; i < num_mon; i++) {
+               monc->monmap->mon_inst[i].addr = mon_addr[i];
+               monc->monmap->mon_inst[i].addr.nonce = 0;
+               monc->monmap->mon_inst[i].name.type =
+                       CEPH_ENTITY_TYPE_MON;
+               monc->monmap->mon_inst[i].name.num = cpu_to_le64(i);
+       }
+       monc->monmap->num_mon = num_mon;
+       monc->have_fsid = false;
+       return 0;
+}
+
+int ceph_monc_init(struct ceph_mon_client *monc, struct ceph_client *cl)
+{
+       int err = 0;
+
+       dout("init\n");
+       memset(monc, 0, sizeof(*monc));
+       monc->client = cl;
+       monc->monmap = NULL;
+       mutex_init(&monc->mutex);
+
+       err = build_initial_monmap(monc);
+       if (err)
+               goto out;
+
+       monc->con = NULL;
+
+       /* authentication */
+       monc->auth = ceph_auth_init(cl->options->name,
+                                   cl->options->secret);
+       if (IS_ERR(monc->auth))
+               return PTR_ERR(monc->auth);
+       monc->auth->want_keys =
+               CEPH_ENTITY_TYPE_AUTH | CEPH_ENTITY_TYPE_MON |
+               CEPH_ENTITY_TYPE_OSD | CEPH_ENTITY_TYPE_MDS;
+
+       /* msgs */
+       err = -ENOMEM;
+       monc->m_subscribe_ack = ceph_msg_new(CEPH_MSG_MON_SUBSCRIBE_ACK,
+                                    sizeof(struct ceph_mon_subscribe_ack),
+                                    GFP_NOFS);
+       if (!monc->m_subscribe_ack)
+               goto out_monmap;
+
+       monc->m_subscribe = ceph_msg_new(CEPH_MSG_MON_SUBSCRIBE, 96, GFP_NOFS);
+       if (!monc->m_subscribe)
+               goto out_subscribe_ack;
+
+       monc->m_auth_reply = ceph_msg_new(CEPH_MSG_AUTH_REPLY, 4096, GFP_NOFS);
+       if (!monc->m_auth_reply)
+               goto out_subscribe;
+
+       monc->m_auth = ceph_msg_new(CEPH_MSG_AUTH, 4096, GFP_NOFS);
+       monc->pending_auth = 0;
+       if (!monc->m_auth)
+               goto out_auth_reply;
+
+       monc->cur_mon = -1;
+       monc->hunting = true;
+       monc->sub_renew_after = jiffies;
+       monc->sub_sent = 0;
+
+       INIT_DELAYED_WORK(&monc->delayed_work, delayed_work);
+       monc->generic_request_tree = RB_ROOT;
+       monc->num_generic_requests = 0;
+       monc->last_tid = 0;
+
+       monc->have_mdsmap = 0;
+       monc->have_osdmap = 0;
+       monc->want_next_osdmap = 1;
+       return 0;
+
+out_auth_reply:
+       ceph_msg_put(monc->m_auth_reply);
+out_subscribe:
+       ceph_msg_put(monc->m_subscribe);
+out_subscribe_ack:
+       ceph_msg_put(monc->m_subscribe_ack);
+out_monmap:
+       kfree(monc->monmap);
+out:
+       return err;
+}
+EXPORT_SYMBOL(ceph_monc_init);
+
+void ceph_monc_stop(struct ceph_mon_client *monc)
+{
+       dout("stop\n");
+       cancel_delayed_work_sync(&monc->delayed_work);
+
+       mutex_lock(&monc->mutex);
+       __close_session(monc);
+       if (monc->con) {
+               monc->con->private = NULL;
+               monc->con->ops->put(monc->con);
+               monc->con = NULL;
+       }
+       mutex_unlock(&monc->mutex);
+
+       ceph_auth_destroy(monc->auth);
+
+       ceph_msg_put(monc->m_auth);
+       ceph_msg_put(monc->m_auth_reply);
+       ceph_msg_put(monc->m_subscribe);
+       ceph_msg_put(monc->m_subscribe_ack);
+
+       kfree(monc->monmap);
+}
+EXPORT_SYMBOL(ceph_monc_stop);
+
+static void handle_auth_reply(struct ceph_mon_client *monc,
+                             struct ceph_msg *msg)
+{
+       int ret;
+       int was_auth = 0;
+
+       mutex_lock(&monc->mutex);
+       if (monc->auth->ops)
+               was_auth = monc->auth->ops->is_authenticated(monc->auth);
+       monc->pending_auth = 0;
+       ret = ceph_handle_auth_reply(monc->auth, msg->front.iov_base,
+                                    msg->front.iov_len,
+                                    monc->m_auth->front.iov_base,
+                                    monc->m_auth->front_max);
+       if (ret < 0) {
+               monc->client->auth_err = ret;
+               wake_up_all(&monc->client->auth_wq);
+       } else if (ret > 0) {
+               __send_prepared_auth_request(monc, ret);
+       } else if (!was_auth && monc->auth->ops->is_authenticated(monc->auth)) {
+               dout("authenticated, starting session\n");
+
+               monc->client->msgr->inst.name.type = CEPH_ENTITY_TYPE_CLIENT;
+               monc->client->msgr->inst.name.num =
+                                       cpu_to_le64(monc->auth->global_id);
+
+               __send_subscribe(monc);
+               __resend_generic_request(monc);
+       }
+       mutex_unlock(&monc->mutex);
+}
+
+static int __validate_auth(struct ceph_mon_client *monc)
+{
+       int ret;
+
+       if (monc->pending_auth)
+               return 0;
+
+       ret = ceph_build_auth(monc->auth, monc->m_auth->front.iov_base,
+                             monc->m_auth->front_max);
+       if (ret <= 0)
+               return ret; /* either an error, or no need to authenticate */
+       __send_prepared_auth_request(monc, ret);
+       return 0;
+}
+
+int ceph_monc_validate_auth(struct ceph_mon_client *monc)
+{
+       int ret;
+
+       mutex_lock(&monc->mutex);
+       ret = __validate_auth(monc);
+       mutex_unlock(&monc->mutex);
+       return ret;
+}
+EXPORT_SYMBOL(ceph_monc_validate_auth);
+
+/*
+ * handle incoming message
+ */
+static void dispatch(struct ceph_connection *con, struct ceph_msg *msg)
+{
+       struct ceph_mon_client *monc = con->private;
+       int type = le16_to_cpu(msg->hdr.type);
+
+       if (!monc)
+               return;
+
+       switch (type) {
+       case CEPH_MSG_AUTH_REPLY:
+               handle_auth_reply(monc, msg);
+               break;
+
+       case CEPH_MSG_MON_SUBSCRIBE_ACK:
+               handle_subscribe_ack(monc, msg);
+               break;
+
+       case CEPH_MSG_STATFS_REPLY:
+               handle_statfs_reply(monc, msg);
+               break;
+
+       case CEPH_MSG_POOLOP_REPLY:
+               handle_poolop_reply(monc, msg);
+               break;
+
+       case CEPH_MSG_MON_MAP:
+               ceph_monc_handle_map(monc, msg);
+               break;
+
+       case CEPH_MSG_OSD_MAP:
+               ceph_osdc_handle_map(&monc->client->osdc, msg);
+               break;
+
+       default:
+               /* can the chained handler handle it? */
+               if (monc->client->extra_mon_dispatch &&
+                   monc->client->extra_mon_dispatch(monc->client, msg) == 0)
+                       break;
+
+               pr_err("received unknown message type %d %s\n", type,
+                      ceph_msg_type_name(type));
+       }
+       ceph_msg_put(msg);
+}
+
+/*
+ * Allocate memory for incoming message
+ */
+static struct ceph_msg *mon_alloc_msg(struct ceph_connection *con,
+                                     struct ceph_msg_header *hdr,
+                                     int *skip)
+{
+       struct ceph_mon_client *monc = con->private;
+       int type = le16_to_cpu(hdr->type);
+       int front_len = le32_to_cpu(hdr->front_len);
+       struct ceph_msg *m = NULL;
+
+       *skip = 0;
+
+       switch (type) {
+       case CEPH_MSG_MON_SUBSCRIBE_ACK:
+               m = ceph_msg_get(monc->m_subscribe_ack);
+               break;
+       case CEPH_MSG_POOLOP_REPLY:
+       case CEPH_MSG_STATFS_REPLY:
+               return get_generic_reply(con, hdr, skip);
+       case CEPH_MSG_AUTH_REPLY:
+               m = ceph_msg_get(monc->m_auth_reply);
+               break;
+       case CEPH_MSG_MON_MAP:
+       case CEPH_MSG_MDS_MAP:
+       case CEPH_MSG_OSD_MAP:
+               m = ceph_msg_new(type, front_len, GFP_NOFS);
+               break;
+       }
+
+       if (!m) {
+               pr_info("alloc_msg unknown type %d\n", type);
+               *skip = 1;
+       }
+       return m;
+}
+
+/*
+ * If the monitor connection resets, pick a new monitor and resubmit
+ * any pending requests.
+ */
+static void mon_fault(struct ceph_connection *con)
+{
+       struct ceph_mon_client *monc = con->private;
+
+       if (!monc)
+               return;
+
+       dout("mon_fault\n");
+       mutex_lock(&monc->mutex);
+       if (!con->private)
+               goto out;
+
+       if (monc->con && !monc->hunting)
+               pr_info("mon%d %s session lost, "
+                       "hunting for new mon\n", monc->cur_mon,
+                       ceph_pr_addr(&monc->con->peer_addr.in_addr));
+
+       __close_session(monc);
+       if (!monc->hunting) {
+               /* start hunting */
+               monc->hunting = true;
+               __open_session(monc);
+       } else {
+               /* already hunting, let's wait a bit */
+               __schedule_delayed(monc);
+       }
+out:
+       mutex_unlock(&monc->mutex);
+}
+
+static const struct ceph_connection_operations mon_con_ops = {
+       .get = ceph_con_get,
+       .put = ceph_con_put,
+       .dispatch = dispatch,
+       .fault = mon_fault,
+       .alloc_msg = mon_alloc_msg,
+};
diff --git a/net/ceph/msgpool.c b/net/ceph/msgpool.c

new file mode 100644 (file)

index 0000000..d5f2d97
--- /dev/null
+++ b/net/ceph/msgpool.c
@@ -0,0 +1,64 @@
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/err.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+#include <linux/ceph/msgpool.h>
+
+static void *alloc_fn(gfp_t gfp_mask, void *arg)
+{
+       struct ceph_msgpool *pool = arg;
+       void *p;
+
+       p = ceph_msg_new(0, pool->front_len, gfp_mask);
+       if (!p)
+               pr_err("msgpool %s alloc failed\n", pool->name);
+       return p;
+}
+
+static void free_fn(void *element, void *arg)
+{
+       ceph_msg_put(element);
+}
+
+int ceph_msgpool_init(struct ceph_msgpool *pool,
+                     int front_len, int size, bool blocking, const char *name)
+{
+       pool->front_len = front_len;
+       pool->pool = mempool_create(size, alloc_fn, free_fn, pool);
+       if (!pool->pool)
+               return -ENOMEM;
+       pool->name = name;
+       return 0;
+}
+
+void ceph_msgpool_destroy(struct ceph_msgpool *pool)
+{
+       mempool_destroy(pool->pool);
+}
+
+struct ceph_msg *ceph_msgpool_get(struct ceph_msgpool *pool,
+                                 int front_len)
+{
+       if (front_len > pool->front_len) {
+               pr_err("msgpool_get pool %s need front %d, pool size is %d\n",
+                      pool->name, front_len, pool->front_len);
+               WARN_ON(1);
+
+               /* try to alloc a fresh message */
+               return ceph_msg_new(0, front_len, GFP_NOFS);
+       }
+
+       return mempool_alloc(pool->pool, GFP_NOFS);
+}
+
+void ceph_msgpool_put(struct ceph_msgpool *pool, struct ceph_msg *msg)
+{
+       /* reset msg front_len; user may have changed it */
+       msg->front.iov_len = pool->front_len;
+       msg->hdr.front_len = cpu_to_le32(pool->front_len);
+
+       kref_init(&msg->kref);  /* retake single ref */
+}
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c

new file mode 100644 (file)

index 0000000..7939199
--- /dev/null
+++ b/net/ceph/osd_client.c
@@ -0,0 +1,1773 @@
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/module.h>
+#include <linux/err.h>
+#include <linux/highmem.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#ifdef CONFIG_BLOCK
+#include <linux/bio.h>
+#endif
+
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/osd_client.h>
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/pagelist.h>
+
+#define OSD_OP_FRONT_LEN       4096
+#define OSD_OPREPLY_FRONT_LEN  512
+
+static const struct ceph_connection_operations osd_con_ops;
+static int __kick_requests(struct ceph_osd_client *osdc,
+                         struct ceph_osd *kickosd);
+
+static void kick_requests(struct ceph_osd_client *osdc, struct ceph_osd *osd);
+
+static int op_needs_trail(int op)
+{
+       switch (op) {
+       case CEPH_OSD_OP_GETXATTR:
+       case CEPH_OSD_OP_SETXATTR:
+       case CEPH_OSD_OP_CMPXATTR:
+       case CEPH_OSD_OP_CALL:
+               return 1;
+       default:
+               return 0;
+       }
+}
+
+static int op_has_extent(int op)
+{
+       return (op == CEPH_OSD_OP_READ ||
+               op == CEPH_OSD_OP_WRITE);
+}
+
+void ceph_calc_raw_layout(struct ceph_osd_client *osdc,
+                       struct ceph_file_layout *layout,
+                       u64 snapid,
+                       u64 off, u64 *plen, u64 *bno,
+                       struct ceph_osd_request *req,
+                       struct ceph_osd_req_op *op)
+{
+       struct ceph_osd_request_head *reqhead = req->r_request->front.iov_base;
+       u64 orig_len = *plen;
+       u64 objoff, objlen;    /* extent in object */
+
+       reqhead->snapid = cpu_to_le64(snapid);
+
+       /* object extent? */
+       ceph_calc_file_object_mapping(layout, off, plen, bno,
+                                     &objoff, &objlen);
+       if (*plen < orig_len)
+               dout(" skipping last %llu, final file extent %llu~%llu\n",
+                    orig_len - *plen, off, *plen);
+
+       if (op_has_extent(op->op)) {
+               op->extent.offset = objoff;
+               op->extent.length = objlen;
+       }
+       req->r_num_pages = calc_pages_for(off, *plen);
+       if (op->op == CEPH_OSD_OP_WRITE)
+               op->payload_len = *plen;
+
+       dout("calc_layout bno=%llx %llu~%llu (%d pages)\n",
+            *bno, objoff, objlen, req->r_num_pages);
+
+}
+EXPORT_SYMBOL(ceph_calc_raw_layout);
+
+/*
+ * Implement client access to distributed object storage cluster.
+ *
+ * All data objects are stored within a cluster/cloud of OSDs, or
+ * "object storage devices."  (Note that Ceph OSDs have _nothing_ to
+ * do with the T10 OSD extensions to SCSI.)  Ceph OSDs are simply
+ * remote daemons serving up and coordinating consistent and safe
+ * access to storage.
+ *
+ * Cluster membership and the mapping of data objects onto storage devices
+ * are described by the osd map.
+ *
+ * We keep track of pending OSD requests (read, write), resubmit
+ * requests to different OSDs when the cluster topology/data layout
+ * change, or retry the affected requests when the communications
+ * channel with an OSD is reset.
+ */
+
+/*
+ * calculate the mapping of a file extent onto an object, and fill out the
+ * request accordingly.  shorten extent as necessary if it crosses an
+ * object boundary.
+ *
+ * fill osd op in request message.
+ */
+static void calc_layout(struct ceph_osd_client *osdc,
+                       struct ceph_vino vino,
+                       struct ceph_file_layout *layout,
+                       u64 off, u64 *plen,
+                       struct ceph_osd_request *req,
+                       struct ceph_osd_req_op *op)
+{
+       u64 bno;
+
+       ceph_calc_raw_layout(osdc, layout, vino.snap, off,
+                            plen, &bno, req, op);
+
+       sprintf(req->r_oid, "%llx.%08llx", vino.ino, bno);
+       req->r_oid_len = strlen(req->r_oid);
+}
+
+/*
+ * requests
+ */
+void ceph_osdc_release_request(struct kref *kref)
+{
+       struct ceph_osd_request *req = container_of(kref,
+                                                   struct ceph_osd_request,
+                                                   r_kref);
+
+       if (req->r_request)
+               ceph_msg_put(req->r_request);
+       if (req->r_reply)
+               ceph_msg_put(req->r_reply);
+       if (req->r_con_filling_msg) {
+               dout("release_request revoking pages %p from con %p\n",
+                    req->r_pages, req->r_con_filling_msg);
+               ceph_con_revoke_message(req->r_con_filling_msg,
+                                     req->r_reply);
+               ceph_con_put(req->r_con_filling_msg);
+       }
+       if (req->r_own_pages)
+               ceph_release_page_vector(req->r_pages,
+                                        req->r_num_pages);
+#ifdef CONFIG_BLOCK
+       if (req->r_bio)
+               bio_put(req->r_bio);
+#endif
+       ceph_put_snap_context(req->r_snapc);
+       if (req->r_trail) {
+               ceph_pagelist_release(req->r_trail);
+               kfree(req->r_trail);
+       }
+       if (req->r_mempool)
+               mempool_free(req, req->r_osdc->req_mempool);
+       else
+               kfree(req);
+}
+EXPORT_SYMBOL(ceph_osdc_release_request);
+
+static int get_num_ops(struct ceph_osd_req_op *ops, int *needs_trail)
+{
+       int i = 0;
+
+       if (needs_trail)
+               *needs_trail = 0;
+       while (ops[i].op) {
+               if (needs_trail && op_needs_trail(ops[i].op))
+                       *needs_trail = 1;
+               i++;
+       }
+
+       return i;
+}
+
+struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
+                                              int flags,
+                                              struct ceph_snap_context *snapc,
+                                              struct ceph_osd_req_op *ops,
+                                              bool use_mempool,
+                                              gfp_t gfp_flags,
+                                              struct page **pages,
+                                              struct bio *bio)
+{
+       struct ceph_osd_request *req;
+       struct ceph_msg *msg;
+       int needs_trail;
+       int num_op = get_num_ops(ops, &needs_trail);
+       size_t msg_size = sizeof(struct ceph_osd_request_head);
+
+       msg_size += num_op*sizeof(struct ceph_osd_op);
+
+       if (use_mempool) {
+               req = mempool_alloc(osdc->req_mempool, gfp_flags);
+               memset(req, 0, sizeof(*req));
+       } else {
+               req = kzalloc(sizeof(*req), gfp_flags);
+       }
+       if (req == NULL)
+               return NULL;
+
+       req->r_osdc = osdc;
+       req->r_mempool = use_mempool;
+
+       kref_init(&req->r_kref);
+       init_completion(&req->r_completion);
+       init_completion(&req->r_safe_completion);
+       INIT_LIST_HEAD(&req->r_unsafe_item);
+       req->r_flags = flags;
+
+       WARN_ON((flags & (CEPH_OSD_FLAG_READ|CEPH_OSD_FLAG_WRITE)) == 0);
+
+       /* create reply message */
+       if (use_mempool)
+               msg = ceph_msgpool_get(&osdc->msgpool_op_reply, 0);
+       else
+               msg = ceph_msg_new(CEPH_MSG_OSD_OPREPLY,
+                                  OSD_OPREPLY_FRONT_LEN, gfp_flags);
+       if (!msg) {
+               ceph_osdc_put_request(req);
+               return NULL;
+       }
+       req->r_reply = msg;
+
+       /* allocate space for the trailing data */
+       if (needs_trail) {
+               req->r_trail = kmalloc(sizeof(struct ceph_pagelist), gfp_flags);
+               if (!req->r_trail) {
+                       ceph_osdc_put_request(req);
+                       return NULL;
+               }
+               ceph_pagelist_init(req->r_trail);
+       }
+       /* create request message; allow space for oid */
+       msg_size += 40;
+       if (snapc)
+               msg_size += sizeof(u64) * snapc->num_snaps;
+       if (use_mempool)
+               msg = ceph_msgpool_get(&osdc->msgpool_op, 0);
+       else
+               msg = ceph_msg_new(CEPH_MSG_OSD_OP, msg_size, gfp_flags);
+       if (!msg) {
+               ceph_osdc_put_request(req);
+               return NULL;
+       }
+
+       msg->hdr.type = cpu_to_le16(CEPH_MSG_OSD_OP);
+       memset(msg->front.iov_base, 0, msg->front.iov_len);
+
+       req->r_request = msg;
+       req->r_pages = pages;
+#ifdef CONFIG_BLOCK
+       if (bio) {
+               req->r_bio = bio;
+               bio_get(req->r_bio);
+       }
+#endif
+
+       return req;
+}
+EXPORT_SYMBOL(ceph_osdc_alloc_request);
+
+static void osd_req_encode_op(struct ceph_osd_request *req,
+                             struct ceph_osd_op *dst,
+                             struct ceph_osd_req_op *src)
+{
+       dst->op = cpu_to_le16(src->op);
+
+       switch (src->op) {
+       case CEPH_OSD_OP_READ:
+       case CEPH_OSD_OP_WRITE:
+               dst->extent.offset =
+                       cpu_to_le64(src->extent.offset);
+               dst->extent.length =
+                       cpu_to_le64(src->extent.length);
+               dst->extent.truncate_size =
+                       cpu_to_le64(src->extent.truncate_size);
+               dst->extent.truncate_seq =
+                       cpu_to_le32(src->extent.truncate_seq);
+               break;
+
+       case CEPH_OSD_OP_GETXATTR:
+       case CEPH_OSD_OP_SETXATTR:
+       case CEPH_OSD_OP_CMPXATTR:
+               BUG_ON(!req->r_trail);
+
+               dst->xattr.name_len = cpu_to_le32(src->xattr.name_len);
+               dst->xattr.value_len = cpu_to_le32(src->xattr.value_len);
+               dst->xattr.cmp_op = src->xattr.cmp_op;
+               dst->xattr.cmp_mode = src->xattr.cmp_mode;
+               ceph_pagelist_append(req->r_trail, src->xattr.name,
+                                    src->xattr.name_len);
+               ceph_pagelist_append(req->r_trail, src->xattr.val,
+                                    src->xattr.value_len);
+               break;
+       case CEPH_OSD_OP_CALL:
+               BUG_ON(!req->r_trail);
+
+               dst->cls.class_len = src->cls.class_len;
+               dst->cls.method_len = src->cls.method_len;
+               dst->cls.indata_len = cpu_to_le32(src->cls.indata_len);
+
+               ceph_pagelist_append(req->r_trail, src->cls.class_name,
+                                    src->cls.class_len);
+               ceph_pagelist_append(req->r_trail, src->cls.method_name,
+                                    src->cls.method_len);
+               ceph_pagelist_append(req->r_trail, src->cls.indata,
+                                    src->cls.indata_len);
+               break;
+       case CEPH_OSD_OP_ROLLBACK:
+               dst->snap.snapid = cpu_to_le64(src->snap.snapid);
+               break;
+       case CEPH_OSD_OP_STARTSYNC:
+               break;
+       default:
+               pr_err("unrecognized osd opcode %d\n", dst->op);
+               WARN_ON(1);
+               break;
+       }
+       dst->payload_len = cpu_to_le32(src->payload_len);
+}
+
+/*
+ * build new request AND message
+ *
+ */
+void ceph_osdc_build_request(struct ceph_osd_request *req,
+                            u64 off, u64 *plen,
+                            struct ceph_osd_req_op *src_ops,
+                            struct ceph_snap_context *snapc,
+                            struct timespec *mtime,
+                            const char *oid,
+                            int oid_len)
+{
+       struct ceph_msg *msg = req->r_request;
+       struct ceph_osd_request_head *head;
+       struct ceph_osd_req_op *src_op;
+       struct ceph_osd_op *op;
+       void *p;
+       int num_op = get_num_ops(src_ops, NULL);
+       size_t msg_size = sizeof(*head) + num_op*sizeof(*op);
+       int flags = req->r_flags;
+       u64 data_len = 0;
+       int i;
+
+       head = msg->front.iov_base;
+       op = (void *)(head + 1);
+       p = (void *)(op + num_op);
+
+       req->r_snapc = ceph_get_snap_context(snapc);
+
+       head->client_inc = cpu_to_le32(1); /* always, for now. */
+       head->flags = cpu_to_le32(flags);
+       if (flags & CEPH_OSD_FLAG_WRITE)
+               ceph_encode_timespec(&head->mtime, mtime);
+       head->num_ops = cpu_to_le16(num_op);
+
+
+       /* fill in oid */
+       head->object_len = cpu_to_le32(oid_len);
+       memcpy(p, oid, oid_len);
+       p += oid_len;
+
+       src_op = src_ops;
+       while (src_op->op) {
+               osd_req_encode_op(req, op, src_op);
+               src_op++;
+               op++;
+       }
+
+       if (req->r_trail)
+               data_len += req->r_trail->length;
+
+       if (snapc) {
+               head->snap_seq = cpu_to_le64(snapc->seq);
+               head->num_snaps = cpu_to_le32(snapc->num_snaps);
+               for (i = 0; i < snapc->num_snaps; i++) {
+                       put_unaligned_le64(snapc->snaps[i], p);
+                       p += sizeof(u64);
+               }
+       }
+
+       if (flags & CEPH_OSD_FLAG_WRITE) {
+               req->r_request->hdr.data_off = cpu_to_le16(off);
+               req->r_request->hdr.data_len = cpu_to_le32(*plen + data_len);
+       } else if (data_len) {
+               req->r_request->hdr.data_off = 0;
+               req->r_request->hdr.data_len = cpu_to_le32(data_len);
+       }
+
+       BUG_ON(p > msg->front.iov_base + msg->front.iov_len);
+       msg_size = p - msg->front.iov_base;
+       msg->front.iov_len = msg_size;
+       msg->hdr.front_len = cpu_to_le32(msg_size);
+       return;
+}
+EXPORT_SYMBOL(ceph_osdc_build_request);
+
+/*
+ * build new request AND message, calculate layout, and adjust file
+ * extent as needed.
+ *
+ * if the file was recently truncated, we include information about its
+ * old and new size so that the object can be updated appropriately.  (we
+ * avoid synchronously deleting truncated objects because it's slow.)
+ *
+ * if @do_sync, include a 'startsync' command so that the osd will flush
+ * data quickly.
+ */
+struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
+                                              struct ceph_file_layout *layout,
+                                              struct ceph_vino vino,
+                                              u64 off, u64 *plen,
+                                              int opcode, int flags,
+                                              struct ceph_snap_context *snapc,
+                                              int do_sync,
+                                              u32 truncate_seq,
+                                              u64 truncate_size,
+                                              struct timespec *mtime,
+                                              bool use_mempool, int num_reply)
+{
+       struct ceph_osd_req_op ops[3];
+       struct ceph_osd_request *req;
+
+       ops[0].op = opcode;
+       ops[0].extent.truncate_seq = truncate_seq;
+       ops[0].extent.truncate_size = truncate_size;
+       ops[0].payload_len = 0;
+
+       if (do_sync) {
+               ops[1].op = CEPH_OSD_OP_STARTSYNC;
+               ops[1].payload_len = 0;
+               ops[2].op = 0;
+       } else
+               ops[1].op = 0;
+
+       req = ceph_osdc_alloc_request(osdc, flags,
+                                        snapc, ops,
+                                        use_mempool,
+                                        GFP_NOFS, NULL, NULL);
+       if (!req)
+               return NULL;
+
+       /* calculate max write size */
+       calc_layout(osdc, vino, layout, off, plen, req, ops);
+       req->r_file_layout = *layout;  /* keep a copy */
+
+       ceph_osdc_build_request(req, off, plen, ops,
+                               snapc,
+                               mtime,
+                               req->r_oid, req->r_oid_len);
+
+       return req;
+}
+EXPORT_SYMBOL(ceph_osdc_new_request);
+
+/*
+ * We keep osd requests in an rbtree, sorted by ->r_tid.
+ */
+static void __insert_request(struct ceph_osd_client *osdc,
+                            struct ceph_osd_request *new)
+{
+       struct rb_node **p = &osdc->requests.rb_node;
+       struct rb_node *parent = NULL;
+       struct ceph_osd_request *req = NULL;
+
+       while (*p) {
+               parent = *p;
+               req = rb_entry(parent, struct ceph_osd_request, r_node);
+               if (new->r_tid < req->r_tid)
+                       p = &(*p)->rb_left;
+               else if (new->r_tid > req->r_tid)
+                       p = &(*p)->rb_right;
+               else
+                       BUG();
+       }
+
+       rb_link_node(&new->r_node, parent, p);
+       rb_insert_color(&new->r_node, &osdc->requests);
+}
+
+static struct ceph_osd_request *__lookup_request(struct ceph_osd_client *osdc,
+                                                u64 tid)
+{
+       struct ceph_osd_request *req;
+       struct rb_node *n = osdc->requests.rb_node;
+
+       while (n) {
+               req = rb_entry(n, struct ceph_osd_request, r_node);
+               if (tid < req->r_tid)
+                       n = n->rb_left;
+               else if (tid > req->r_tid)
+                       n = n->rb_right;
+               else
+                       return req;
+       }
+       return NULL;
+}
+
+static struct ceph_osd_request *
+__lookup_request_ge(struct ceph_osd_client *osdc,
+                   u64 tid)
+{
+       struct ceph_osd_request *req;
+       struct rb_node *n = osdc->requests.rb_node;
+
+       while (n) {
+               req = rb_entry(n, struct ceph_osd_request, r_node);
+               if (tid < req->r_tid) {
+                       if (!n->rb_left)
+                               return req;
+                       n = n->rb_left;
+               } else if (tid > req->r_tid) {
+                       n = n->rb_right;
+               } else {
+                       return req;
+               }
+       }
+       return NULL;
+}
+
+
+/*
+ * If the osd connection drops, we need to resubmit all requests.
+ */
+static void osd_reset(struct ceph_connection *con)
+{
+       struct ceph_osd *osd = con->private;
+       struct ceph_osd_client *osdc;
+
+       if (!osd)
+               return;
+       dout("osd_reset osd%d\n", osd->o_osd);
+       osdc = osd->o_osdc;
+       down_read(&osdc->map_sem);
+       kick_requests(osdc, osd);
+       up_read(&osdc->map_sem);
+}
+
+/*
+ * Track open sessions with osds.
+ */
+static struct ceph_osd *create_osd(struct ceph_osd_client *osdc)
+{
+       struct ceph_osd *osd;
+
+       osd = kzalloc(sizeof(*osd), GFP_NOFS);
+       if (!osd)
+               return NULL;
+
+       atomic_set(&osd->o_ref, 1);
+       osd->o_osdc = osdc;
+       INIT_LIST_HEAD(&osd->o_requests);
+       INIT_LIST_HEAD(&osd->o_osd_lru);
+       osd->o_incarnation = 1;
+
+       ceph_con_init(osdc->client->msgr, &osd->o_con);
+       osd->o_con.private = osd;
+       osd->o_con.ops = &osd_con_ops;
+       osd->o_con.peer_name.type = CEPH_ENTITY_TYPE_OSD;
+
+       INIT_LIST_HEAD(&osd->o_keepalive_item);
+       return osd;
+}
+
+static struct ceph_osd *get_osd(struct ceph_osd *osd)
+{
+       if (atomic_inc_not_zero(&osd->o_ref)) {
+               dout("get_osd %p %d -> %d\n", osd, atomic_read(&osd->o_ref)-1,
+                    atomic_read(&osd->o_ref));
+               return osd;
+       } else {
+               dout("get_osd %p FAIL\n", osd);
+               return NULL;
+       }
+}
+
+static void put_osd(struct ceph_osd *osd)
+{
+       dout("put_osd %p %d -> %d\n", osd, atomic_read(&osd->o_ref),
+            atomic_read(&osd->o_ref) - 1);
+       if (atomic_dec_and_test(&osd->o_ref)) {
+               struct ceph_auth_client *ac = osd->o_osdc->client->monc.auth;
+
+               if (osd->o_authorizer)
+                       ac->ops->destroy_authorizer(ac, osd->o_authorizer);
+               kfree(osd);
+       }
+}
+
+/*
+ * remove an osd from our map
+ */
+static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
+{
+       dout("__remove_osd %p\n", osd);
+       BUG_ON(!list_empty(&osd->o_requests));
+       rb_erase(&osd->o_node, &osdc->osds);
+       list_del_init(&osd->o_osd_lru);
+       ceph_con_close(&osd->o_con);
+       put_osd(osd);
+}
+
+static void __move_osd_to_lru(struct ceph_osd_client *osdc,
+                             struct ceph_osd *osd)
+{
+       dout("__move_osd_to_lru %p\n", osd);
+       BUG_ON(!list_empty(&osd->o_osd_lru));
+       list_add_tail(&osd->o_osd_lru, &osdc->osd_lru);
+       osd->lru_ttl = jiffies + osdc->client->options->osd_idle_ttl * HZ;
+}
+
+static void __remove_osd_from_lru(struct ceph_osd *osd)
+{
+       dout("__remove_osd_from_lru %p\n", osd);
+       if (!list_empty(&osd->o_osd_lru))
+               list_del_init(&osd->o_osd_lru);
+}
+
+static void remove_old_osds(struct ceph_osd_client *osdc, int remove_all)
+{
+       struct ceph_osd *osd, *nosd;
+
+       dout("__remove_old_osds %p\n", osdc);
+       mutex_lock(&osdc->request_mutex);
+       list_for_each_entry_safe(osd, nosd, &osdc->osd_lru, o_osd_lru) {
+               if (!remove_all && time_before(jiffies, osd->lru_ttl))
+                       break;
+               __remove_osd(osdc, osd);
+       }
+       mutex_unlock(&osdc->request_mutex);
+}
+
+/*
+ * reset osd connect
+ */
+static int __reset_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
+{
+       struct ceph_osd_request *req;
+       int ret = 0;
+
+       dout("__reset_osd %p osd%d\n", osd, osd->o_osd);
+       if (list_empty(&osd->o_requests)) {
+               __remove_osd(osdc, osd);
+       } else if (memcmp(&osdc->osdmap->osd_addr[osd->o_osd],
+                         &osd->o_con.peer_addr,
+                         sizeof(osd->o_con.peer_addr)) == 0 &&
+                  !ceph_con_opened(&osd->o_con)) {
+               dout(" osd addr hasn't changed and connection never opened,"
+                    " letting msgr retry\n");
+               /* touch each r_stamp for handle_timeout()'s benefit */
+               list_for_each_entry(req, &osd->o_requests, r_osd_item)
+                       req->r_stamp = jiffies;
+               ret = -EAGAIN;
+       } else {
+               ceph_con_close(&osd->o_con);
+               ceph_con_open(&osd->o_con, &osdc->osdmap->osd_addr[osd->o_osd]);
+               osd->o_incarnation++;
+       }
+       return ret;
+}
+
+static void __insert_osd(struct ceph_osd_client *osdc, struct ceph_osd *new)
+{
+       struct rb_node **p = &osdc->osds.rb_node;
+       struct rb_node *parent = NULL;
+       struct ceph_osd *osd = NULL;
+
+       while (*p) {
+               parent = *p;
+               osd = rb_entry(parent, struct ceph_osd, o_node);
+               if (new->o_osd < osd->o_osd)
+                       p = &(*p)->rb_left;
+               else if (new->o_osd > osd->o_osd)
+                       p = &(*p)->rb_right;
+               else
+                       BUG();
+       }
+
+       rb_link_node(&new->o_node, parent, p);
+       rb_insert_color(&new->o_node, &osdc->osds);
+}
+
+static struct ceph_osd *__lookup_osd(struct ceph_osd_client *osdc, int o)
+{
+       struct ceph_osd *osd;
+       struct rb_node *n = osdc->osds.rb_node;
+
+       while (n) {
+               osd = rb_entry(n, struct ceph_osd, o_node);
+               if (o < osd->o_osd)
+                       n = n->rb_left;
+               else if (o > osd->o_osd)
+                       n = n->rb_right;
+               else
+                       return osd;
+       }
+       return NULL;
+}
+
+static void __schedule_osd_timeout(struct ceph_osd_client *osdc)
+{
+       schedule_delayed_work(&osdc->timeout_work,
+                       osdc->client->options->osd_keepalive_timeout * HZ);
+}
+
+static void __cancel_osd_timeout(struct ceph_osd_client *osdc)
+{
+       cancel_delayed_work(&osdc->timeout_work);
+}
+
+/*
+ * Register request, assign tid.  If this is the first request, set up
+ * the timeout event.
+ */
+static void register_request(struct ceph_osd_client *osdc,
+                            struct ceph_osd_request *req)
+{
+       mutex_lock(&osdc->request_mutex);
+       req->r_tid = ++osdc->last_tid;
+       req->r_request->hdr.tid = cpu_to_le64(req->r_tid);
+       INIT_LIST_HEAD(&req->r_req_lru_item);
+
+       dout("register_request %p tid %lld\n", req, req->r_tid);
+       __insert_request(osdc, req);
+       ceph_osdc_get_request(req);
+       osdc->num_requests++;
+
+       if (osdc->num_requests == 1) {
+               dout(" first request, scheduling timeout\n");
+               __schedule_osd_timeout(osdc);
+       }
+       mutex_unlock(&osdc->request_mutex);
+}
+
+/*
+ * called under osdc->request_mutex
+ */
+static void __unregister_request(struct ceph_osd_client *osdc,
+                                struct ceph_osd_request *req)
+{
+       dout("__unregister_request %p tid %lld\n", req, req->r_tid);
+       rb_erase(&req->r_node, &osdc->requests);
+       osdc->num_requests--;
+
+       if (req->r_osd) {
+               /* make sure the original request isn't in flight. */
+               ceph_con_revoke(&req->r_osd->o_con, req->r_request);
+
+               list_del_init(&req->r_osd_item);
+               if (list_empty(&req->r_osd->o_requests))
+                       __move_osd_to_lru(osdc, req->r_osd);
+               req->r_osd = NULL;
+       }
+
+       ceph_osdc_put_request(req);
+
+       list_del_init(&req->r_req_lru_item);
+       if (osdc->num_requests == 0) {
+               dout(" no requests, canceling timeout\n");
+               __cancel_osd_timeout(osdc);
+       }
+}
+
+/*
+ * Cancel a previously queued request message
+ */
+static void __cancel_request(struct ceph_osd_request *req)
+{
+       if (req->r_sent && req->r_osd) {
+               ceph_con_revoke(&req->r_osd->o_con, req->r_request);
+               req->r_sent = 0;
+       }
+       list_del_init(&req->r_req_lru_item);
+}
+
+/*
+ * Pick an osd (the first 'up' osd in the pg), allocate the osd struct
+ * (as needed), and set the request r_osd appropriately.  If there is
+ * no up osd, set r_osd to NULL.
+ *
+ * Return 0 if unchanged, 1 if changed, or negative on error.
+ *
+ * Caller should hold map_sem for read and request_mutex.
+ */
+static int __map_osds(struct ceph_osd_client *osdc,
+                     struct ceph_osd_request *req)
+{
+       struct ceph_osd_request_head *reqhead = req->r_request->front.iov_base;
+       struct ceph_pg pgid;
+       int acting[CEPH_PG_MAX_SIZE];
+       int o = -1, num = 0;
+       int err;
+
+       dout("map_osds %p tid %lld\n", req, req->r_tid);
+       err = ceph_calc_object_layout(&reqhead->layout, req->r_oid,
+                                     &req->r_file_layout, osdc->osdmap);
+       if (err)
+               return err;
+       pgid = reqhead->layout.ol_pgid;
+       req->r_pgid = pgid;
+
+       err = ceph_calc_pg_acting(osdc->osdmap, pgid, acting);
+       if (err > 0) {
+               o = acting[0];
+               num = err;
+       }
+
+       if ((req->r_osd && req->r_osd->o_osd == o &&
+            req->r_sent >= req->r_osd->o_incarnation &&
+            req->r_num_pg_osds == num &&
+            memcmp(req->r_pg_osds, acting, sizeof(acting[0])*num) == 0) ||
+           (req->r_osd == NULL && o == -1))
+               return 0;  /* no change */
+
+       dout("map_osds tid %llu pgid %d.%x osd%d (was osd%d)\n",
+            req->r_tid, le32_to_cpu(pgid.pool), le16_to_cpu(pgid.ps), o,
+            req->r_osd ? req->r_osd->o_osd : -1);
+
+       /* record full pg acting set */
+       memcpy(req->r_pg_osds, acting, sizeof(acting[0]) * num);
+       req->r_num_pg_osds = num;
+
+       if (req->r_osd) {
+               __cancel_request(req);
+               list_del_init(&req->r_osd_item);
+               req->r_osd = NULL;
+       }
+
+       req->r_osd = __lookup_osd(osdc, o);
+       if (!req->r_osd && o >= 0) {
+               err = -ENOMEM;
+               req->r_osd = create_osd(osdc);
+               if (!req->r_osd)
+                       goto out;
+
+               dout("map_osds osd %p is osd%d\n", req->r_osd, o);
+               req->r_osd->o_osd = o;
+               req->r_osd->o_con.peer_name.num = cpu_to_le64(o);
+               __insert_osd(osdc, req->r_osd);
+
+               ceph_con_open(&req->r_osd->o_con, &osdc->osdmap->osd_addr[o]);
+       }
+
+       if (req->r_osd) {
+               __remove_osd_from_lru(req->r_osd);
+               list_add(&req->r_osd_item, &req->r_osd->o_requests);
+       }
+       err = 1;   /* osd or pg changed */
+
+out:
+       return err;
+}
+
+/*
+ * caller should hold map_sem (for read) and request_mutex
+ */
+static int __send_request(struct ceph_osd_client *osdc,
+                         struct ceph_osd_request *req)
+{
+       struct ceph_osd_request_head *reqhead;
+       int err;
+
+       err = __map_osds(osdc, req);
+       if (err < 0)
+               return err;
+       if (req->r_osd == NULL) {
+               dout("send_request %p no up osds in pg\n", req);
+               ceph_monc_request_next_osdmap(&osdc->client->monc);
+               return 0;
+       }
+
+       dout("send_request %p tid %llu to osd%d flags %d\n",
+            req, req->r_tid, req->r_osd->o_osd, req->r_flags);
+
+       reqhead = req->r_request->front.iov_base;
+       reqhead->osdmap_epoch = cpu_to_le32(osdc->osdmap->epoch);
+       reqhead->flags |= cpu_to_le32(req->r_flags);  /* e.g., RETRY */
+       reqhead->reassert_version = req->r_reassert_version;
+
+       req->r_stamp = jiffies;
+       list_move_tail(&req->r_req_lru_item, &osdc->req_lru);
+
+       ceph_msg_get(req->r_request); /* send consumes a ref */
+       ceph_con_send(&req->r_osd->o_con, req->r_request);
+       req->r_sent = req->r_osd->o_incarnation;
+       return 0;
+}
+
+/*
+ * Timeout callback, called every N seconds when 1 or more osd
+ * requests have been active for more than N seconds.  When this
+ * happens, we ping all OSDs with requests that have timed out to
+ * ensure any communications channel reset is detected.  Reset the
+ * request timeouts another N seconds in the future as we go.
+ * Reschedule the timeout event another N seconds in the future (unless
+ * there are no open requests).
+ */
+static void handle_timeout(struct work_struct *work)
+{
+       struct ceph_osd_client *osdc =
+               container_of(work, struct ceph_osd_client, timeout_work.work);
+       struct ceph_osd_request *req, *last_req = NULL;
+       struct ceph_osd *osd;
+       unsigned long timeout = osdc->client->options->osd_timeout * HZ;
+       unsigned long keepalive =
+               osdc->client->options->osd_keepalive_timeout * HZ;
+       unsigned long last_stamp = 0;
+       struct rb_node *p;
+       struct list_head slow_osds;
+
+       dout("timeout\n");
+       down_read(&osdc->map_sem);
+
+       ceph_monc_request_next_osdmap(&osdc->client->monc);
+
+       mutex_lock(&osdc->request_mutex);
+       for (p = rb_first(&osdc->requests); p; p = rb_next(p)) {
+               req = rb_entry(p, struct ceph_osd_request, r_node);
+
+               if (req->r_resend) {
+                       int err;
+
+                       dout("osdc resending prev failed %lld\n", req->r_tid);
+                       err = __send_request(osdc, req);
+                       if (err)
+                               dout("osdc failed again on %lld\n", req->r_tid);
+                       else
+                               req->r_resend = false;
+                       continue;
+               }
+       }
+
+       /*
+        * reset osds that appear to be _really_ unresponsive.  this
+        * is a failsafe measure; we really shouldn't be getting to
+        * this point if the system is working properly.  the monitors
+        * should mark the osd as failed and we should find out about
+        * it from an updated osd map.
+        */
+       while (timeout && !list_empty(&osdc->req_lru)) {
+               req = list_entry(osdc->req_lru.next, struct ceph_osd_request,
+                                r_req_lru_item);
+
+               if (time_before(jiffies, req->r_stamp + timeout))
+                       break;
+
+               BUG_ON(req == last_req && req->r_stamp == last_stamp);
+               last_req = req;
+               last_stamp = req->r_stamp;
+
+               osd = req->r_osd;
+               BUG_ON(!osd);
+               pr_warning(" tid %llu timed out on osd%d, will reset osd\n",
+                          req->r_tid, osd->o_osd);
+               __kick_requests(osdc, osd);
+       }
+
+       /*
+        * ping osds that are a bit slow.  this ensures that if there
+        * is a break in the TCP connection we will notice, and reopen
+        * a connection with that osd (from the fault callback).
+        */
+       INIT_LIST_HEAD(&slow_osds);
+       list_for_each_entry(req, &osdc->req_lru, r_req_lru_item) {
+               if (time_before(jiffies, req->r_stamp + keepalive))
+                       break;
+
+               osd = req->r_osd;
+               BUG_ON(!osd);
+               dout(" tid %llu is slow, will send keepalive on osd%d\n",
+                    req->r_tid, osd->o_osd);
+               list_move_tail(&osd->o_keepalive_item, &slow_osds);
+       }
+       while (!list_empty(&slow_osds)) {
+               osd = list_entry(slow_osds.next, struct ceph_osd,
+                                o_keepalive_item);
+               list_del_init(&osd->o_keepalive_item);
+               ceph_con_keepalive(&osd->o_con);
+       }
+
+       __schedule_osd_timeout(osdc);
+       mutex_unlock(&osdc->request_mutex);
+
+       up_read(&osdc->map_sem);
+}
+
+static void handle_osds_timeout(struct work_struct *work)
+{
+       struct ceph_osd_client *osdc =
+               container_of(work, struct ceph_osd_client,
+                            osds_timeout_work.work);
+       unsigned long delay =
+               osdc->client->options->osd_idle_ttl * HZ >> 2;
+
+       dout("osds timeout\n");
+       down_read(&osdc->map_sem);
+       remove_old_osds(osdc, 0);
+       up_read(&osdc->map_sem);
+
+       schedule_delayed_work(&osdc->osds_timeout_work,
+                             round_jiffies_relative(delay));
+}
+
+/*
+ * handle osd op reply.  either call the callback if it is specified,
+ * or do the completion to wake up the waiting thread.
+ */
+static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
+                        struct ceph_connection *con)
+{
+       struct ceph_osd_reply_head *rhead = msg->front.iov_base;
+       struct ceph_osd_request *req;
+       u64 tid;
+       int numops, object_len, flags;
+       s32 result;
+
+       tid = le64_to_cpu(msg->hdr.tid);
+       if (msg->front.iov_len < sizeof(*rhead))
+               goto bad;
+       numops = le32_to_cpu(rhead->num_ops);
+       object_len = le32_to_cpu(rhead->object_len);
+       result = le32_to_cpu(rhead->result);
+       if (msg->front.iov_len != sizeof(*rhead) + object_len +
+           numops * sizeof(struct ceph_osd_op))
+               goto bad;
+       dout("handle_reply %p tid %llu result %d\n", msg, tid, (int)result);
+
+       /* lookup */
+       mutex_lock(&osdc->request_mutex);
+       req = __lookup_request(osdc, tid);
+       if (req == NULL) {
+               dout("handle_reply tid %llu dne\n", tid);
+               mutex_unlock(&osdc->request_mutex);
+               return;
+       }
+       ceph_osdc_get_request(req);
+       flags = le32_to_cpu(rhead->flags);
+
+       /*
+        * if this connection filled our message, drop our reference now, to
+        * avoid a (safe but slower) revoke later.
+        */
+       if (req->r_con_filling_msg == con && req->r_reply == msg) {
+               dout(" dropping con_filling_msg ref %p\n", con);
+               req->r_con_filling_msg = NULL;
+               ceph_con_put(con);
+       }
+
+       if (!req->r_got_reply) {
+               unsigned bytes;
+
+               req->r_result = le32_to_cpu(rhead->result);
+               bytes = le32_to_cpu(msg->hdr.data_len);
+               dout("handle_reply result %d bytes %u\n", req->r_result,
+                    bytes);
+               if (req->r_result == 0)
+                       req->r_result = bytes;
+
+               /* in case this is a write and we need to replay */
+               req->r_reassert_version = rhead->reassert_version;
+
+               req->r_got_reply = 1;
+       } else if ((flags & CEPH_OSD_FLAG_ONDISK) == 0) {
+               dout("handle_reply tid %llu dup ack\n", tid);
+               mutex_unlock(&osdc->request_mutex);
+               goto done;
+       }
+
+       dout("handle_reply tid %llu flags %d\n", tid, flags);
+
+       /* either this is a read, or we got the safe response */
+       if (result < 0 ||
+           (flags & CEPH_OSD_FLAG_ONDISK) ||
+           ((flags & CEPH_OSD_FLAG_WRITE) == 0))
+               __unregister_request(osdc, req);
+
+       mutex_unlock(&osdc->request_mutex);
+
+       if (req->r_callback)
+               req->r_callback(req, msg);
+       else
+               complete_all(&req->r_completion);
+
+       if (flags & CEPH_OSD_FLAG_ONDISK) {
+               if (req->r_safe_callback)
+                       req->r_safe_callback(req, msg);
+               complete_all(&req->r_safe_completion);  /* fsync waiter */
+       }
+
+done:
+       ceph_osdc_put_request(req);
+       return;
+
+bad:
+       pr_err("corrupt osd_op_reply got %d %d expected %d\n",
+              (int)msg->front.iov_len, le32_to_cpu(msg->hdr.front_len),
+              (int)sizeof(*rhead));
+       ceph_msg_dump(msg);
+}
+
+
+static int __kick_requests(struct ceph_osd_client *osdc,
+                         struct ceph_osd *kickosd)
+{
+       struct ceph_osd_request *req;
+       struct rb_node *p, *n;
+       int needmap = 0;
+       int err;
+
+       dout("kick_requests osd%d\n", kickosd ? kickosd->o_osd : -1);
+       if (kickosd) {
+               err = __reset_osd(osdc, kickosd);
+               if (err == -EAGAIN)
+                       return 1;
+       } else {
+               for (p = rb_first(&osdc->osds); p; p = n) {
+                       struct ceph_osd *osd =
+                               rb_entry(p, struct ceph_osd, o_node);
+
+                       n = rb_next(p);
+                       if (!ceph_osd_is_up(osdc->osdmap, osd->o_osd) ||
+                           memcmp(&osd->o_con.peer_addr,
+                                  ceph_osd_addr(osdc->osdmap,
+                                                osd->o_osd),
+                                  sizeof(struct ceph_entity_addr)) != 0)
+                               __reset_osd(osdc, osd);
+               }
+       }
+
+       for (p = rb_first(&osdc->requests); p; p = rb_next(p)) {
+               req = rb_entry(p, struct ceph_osd_request, r_node);
+
+               if (req->r_resend) {
+                       dout(" r_resend set on tid %llu\n", req->r_tid);
+                       __cancel_request(req);
+                       goto kick;
+               }
+               if (req->r_osd && kickosd == req->r_osd) {
+                       __cancel_request(req);
+                       goto kick;
+               }
+
+               err = __map_osds(osdc, req);
+               if (err == 0)
+                       continue;  /* no change */
+               if (err < 0) {
+                       /*
+                        * FIXME: really, we should set the request
+                        * error and fail if this isn't a 'nofail'
+                        * request, but that's a fair bit more
+                        * complicated to do.  So retry!
+                        */
+                       dout(" setting r_resend on %llu\n", req->r_tid);
+                       req->r_resend = true;
+                       continue;
+               }
+               if (req->r_osd == NULL) {
+                       dout("tid %llu maps to no valid osd\n", req->r_tid);
+                       needmap++;  /* request a newer map */
+                       continue;
+               }
+
+kick:
+               dout("kicking %p tid %llu osd%d\n", req, req->r_tid,
+                    req->r_osd ? req->r_osd->o_osd : -1);
+               req->r_flags |= CEPH_OSD_FLAG_RETRY;
+               err = __send_request(osdc, req);
+               if (err) {
+                       dout(" setting r_resend on %llu\n", req->r_tid);
+                       req->r_resend = true;
+               }
+       }
+
+       return needmap;
+}
+
+/*
+ * Resubmit osd requests whose osd or osd address has changed.  Request
+ * a new osd map if osds are down, or we are otherwise unable to determine
+ * how to direct a request.
+ *
+ * Close connections to down osds.
+ *
+ * If @kickosd is specified, resubmit requests for that specific osd.
+ *
+ * Caller should hold map_sem for read; request_mutex is taken here.
+ */
+static void kick_requests(struct ceph_osd_client *osdc,
+                         struct ceph_osd *kickosd)
+{
+       int needmap;
+
+       mutex_lock(&osdc->request_mutex);
+       needmap = __kick_requests(osdc, kickosd);
+       mutex_unlock(&osdc->request_mutex);
+
+       if (needmap) {
+               dout("%d requests for down osds, need new map\n", needmap);
+               ceph_monc_request_next_osdmap(&osdc->client->monc);
+       }
+
+}
+/*
+ * Process updated osd map.
+ *
+ * The message contains any number of incremental and full maps, normally
+ * indicating some sort of topology change in the cluster.  Kick requests
+ * off to different OSDs as needed.
+ */
+void ceph_osdc_handle_map(struct ceph_osd_client *osdc, struct ceph_msg *msg)
+{
+       void *p, *end, *next;
+       u32 nr_maps, maplen;
+       u32 epoch;
+       struct ceph_osdmap *newmap = NULL, *oldmap;
+       int err;
+       struct ceph_fsid fsid;
+
+       dout("handle_map have %u\n", osdc->osdmap ? osdc->osdmap->epoch : 0);
+       p = msg->front.iov_base;
+       end = p + msg->front.iov_len;
+
+       /* verify fsid */
+       ceph_decode_need(&p, end, sizeof(fsid), bad);
+       ceph_decode_copy(&p, &fsid, sizeof(fsid));
+       if (ceph_check_fsid(osdc->client, &fsid) < 0)
+               return;
+
+       down_write(&osdc->map_sem);
+
+       /* incremental maps */
+       ceph_decode_32_safe(&p, end, nr_maps, bad);
+       dout(" %d inc maps\n", nr_maps);
+       while (nr_maps > 0) {
+               ceph_decode_need(&p, end, 2*sizeof(u32), bad);
+               epoch = ceph_decode_32(&p);
+               maplen = ceph_decode_32(&p);
+               ceph_decode_need(&p, end, maplen, bad);
+               next = p + maplen;
+               if (osdc->osdmap && osdc->osdmap->epoch+1 == epoch) {
+                       dout("applying incremental map %u len %d\n",
+                            epoch, maplen);
+                       newmap = osdmap_apply_incremental(&p, next,
+                                                         osdc->osdmap,
+                                                         osdc->client->msgr);
+                       if (IS_ERR(newmap)) {
+                               err = PTR_ERR(newmap);
+                               goto bad;
+                       }
+                       BUG_ON(!newmap);
+                       if (newmap != osdc->osdmap) {
+                               ceph_osdmap_destroy(osdc->osdmap);
+                               osdc->osdmap = newmap;
+                       }
+               } else {
+                       dout("ignoring incremental map %u len %d\n",
+                            epoch, maplen);
+               }
+               p = next;
+               nr_maps--;
+       }
+       if (newmap)
+               goto done;
+
+       /* full maps */
+       ceph_decode_32_safe(&p, end, nr_maps, bad);
+       dout(" %d full maps\n", nr_maps);
+       while (nr_maps) {
+               ceph_decode_need(&p, end, 2*sizeof(u32), bad);
+               epoch = ceph_decode_32(&p);
+               maplen = ceph_decode_32(&p);
+               ceph_decode_need(&p, end, maplen, bad);
+               if (nr_maps > 1) {
+                       dout("skipping non-latest full map %u len %d\n",
+                            epoch, maplen);
+               } else if (osdc->osdmap && osdc->osdmap->epoch >= epoch) {
+                       dout("skipping full map %u len %d, "
+                            "older than our %u\n", epoch, maplen,
+                            osdc->osdmap->epoch);
+               } else {
+                       dout("taking full map %u len %d\n", epoch, maplen);
+                       newmap = osdmap_decode(&p, p+maplen);
+                       if (IS_ERR(newmap)) {
+                               err = PTR_ERR(newmap);
+                               goto bad;
+                       }
+                       BUG_ON(!newmap);
+                       oldmap = osdc->osdmap;
+                       osdc->osdmap = newmap;
+                       if (oldmap)
+                               ceph_osdmap_destroy(oldmap);
+               }
+               p += maplen;
+               nr_maps--;
+       }
+
+done:
+       downgrade_write(&osdc->map_sem);
+       ceph_monc_got_osdmap(&osdc->client->monc, osdc->osdmap->epoch);
+       if (newmap)
+               kick_requests(osdc, NULL);
+       up_read(&osdc->map_sem);
+       wake_up_all(&osdc->client->auth_wq);
+       return;
+
+bad:
+       pr_err("osdc handle_map corrupt msg\n");
+       ceph_msg_dump(msg);
+       up_write(&osdc->map_sem);
+       return;
+}
+
+/*
+ * Register request, send initial attempt.
+ */
+int ceph_osdc_start_request(struct ceph_osd_client *osdc,
+                           struct ceph_osd_request *req,
+                           bool nofail)
+{
+       int rc = 0;
+
+       req->r_request->pages = req->r_pages;
+       req->r_request->nr_pages = req->r_num_pages;
+#ifdef CONFIG_BLOCK
+       req->r_request->bio = req->r_bio;
+#endif
+       req->r_request->trail = req->r_trail;
+
+       register_request(osdc, req);
+
+       down_read(&osdc->map_sem);
+       mutex_lock(&osdc->request_mutex);
+       /*
+        * a racing kick_requests() may have sent the message for us
+        * while we dropped request_mutex above, so only send now if
+        * the request hasn't been touched yet.
+        */
+       if (req->r_sent == 0) {
+               rc = __send_request(osdc, req);
+               if (rc) {
+                       if (nofail) {
+                               dout("osdc_start_request failed send, "
+                                    "marking %lld\n", req->r_tid);
+                               req->r_resend = true;
+                               rc = 0;
+                       } else {
+                               __unregister_request(osdc, req);
+                       }
+               }
+       }
+       mutex_unlock(&osdc->request_mutex);
+       up_read(&osdc->map_sem);
+       return rc;
+}
+EXPORT_SYMBOL(ceph_osdc_start_request);
+
+/*
+ * wait for a request to complete
+ */
+int ceph_osdc_wait_request(struct ceph_osd_client *osdc,
+                          struct ceph_osd_request *req)
+{
+       int rc;
+
+       rc = wait_for_completion_interruptible(&req->r_completion);
+       if (rc < 0) {
+               mutex_lock(&osdc->request_mutex);
+               __cancel_request(req);
+               __unregister_request(osdc, req);
+               mutex_unlock(&osdc->request_mutex);
+               dout("wait_request tid %llu canceled/timed out\n", req->r_tid);
+               return rc;
+       }
+
+       dout("wait_request tid %llu result %d\n", req->r_tid, req->r_result);
+       return req->r_result;
+}
+EXPORT_SYMBOL(ceph_osdc_wait_request);
+
+/*
+ * sync - wait for all in-flight requests to flush.  avoid starvation.
+ */
+void ceph_osdc_sync(struct ceph_osd_client *osdc)
+{
+       struct ceph_osd_request *req;
+       u64 last_tid, next_tid = 0;
+
+       mutex_lock(&osdc->request_mutex);
+       last_tid = osdc->last_tid;
+       while (1) {
+               req = __lookup_request_ge(osdc, next_tid);
+               if (!req)
+                       break;
+               if (req->r_tid > last_tid)
+                       break;
+
+               next_tid = req->r_tid + 1;
+               if ((req->r_flags & CEPH_OSD_FLAG_WRITE) == 0)
+                       continue;
+
+               ceph_osdc_get_request(req);
+               mutex_unlock(&osdc->request_mutex);
+               dout("sync waiting on tid %llu (last is %llu)\n",
+                    req->r_tid, last_tid);
+               wait_for_completion(&req->r_safe_completion);
+               mutex_lock(&osdc->request_mutex);
+               ceph_osdc_put_request(req);
+       }
+       mutex_unlock(&osdc->request_mutex);
+       dout("sync done (thru tid %llu)\n", last_tid);
+}
+EXPORT_SYMBOL(ceph_osdc_sync);
+
+/*
+ * init, shutdown
+ */
+int ceph_osdc_init(struct ceph_osd_client *osdc, struct ceph_client *client)
+{
+       int err;
+
+       dout("init\n");
+       osdc->client = client;
+       osdc->osdmap = NULL;
+       init_rwsem(&osdc->map_sem);
+       init_completion(&osdc->map_waiters);
+       osdc->last_requested_map = 0;
+       mutex_init(&osdc->request_mutex);
+       osdc->last_tid = 0;
+       osdc->osds = RB_ROOT;
+       INIT_LIST_HEAD(&osdc->osd_lru);
+       osdc->requests = RB_ROOT;
+       INIT_LIST_HEAD(&osdc->req_lru);
+       osdc->num_requests = 0;
+       INIT_DELAYED_WORK(&osdc->timeout_work, handle_timeout);
+       INIT_DELAYED_WORK(&osdc->osds_timeout_work, handle_osds_timeout);
+
+       schedule_delayed_work(&osdc->osds_timeout_work,
+          round_jiffies_relative(osdc->client->options->osd_idle_ttl * HZ));
+
+       err = -ENOMEM;
+       osdc->req_mempool = mempool_create_kmalloc_pool(10,
+                                       sizeof(struct ceph_osd_request));
+       if (!osdc->req_mempool)
+               goto out;
+
+       err = ceph_msgpool_init(&osdc->msgpool_op, OSD_OP_FRONT_LEN, 10, true,
+                               "osd_op");
+       if (err < 0)
+               goto out_mempool;
+       err = ceph_msgpool_init(&osdc->msgpool_op_reply,
+                               OSD_OPREPLY_FRONT_LEN, 10, true,
+                               "osd_op_reply");
+       if (err < 0)
+               goto out_msgpool;
+       return 0;
+
+out_msgpool:
+       ceph_msgpool_destroy(&osdc->msgpool_op);
+out_mempool:
+       mempool_destroy(osdc->req_mempool);
+out:
+       return err;
+}
+EXPORT_SYMBOL(ceph_osdc_init);
+
+void ceph_osdc_stop(struct ceph_osd_client *osdc)
+{
+       cancel_delayed_work_sync(&osdc->timeout_work);
+       cancel_delayed_work_sync(&osdc->osds_timeout_work);
+       if (osdc->osdmap) {
+               ceph_osdmap_destroy(osdc->osdmap);
+               osdc->osdmap = NULL;
+       }
+       remove_old_osds(osdc, 1);
+       mempool_destroy(osdc->req_mempool);
+       ceph_msgpool_destroy(&osdc->msgpool_op);
+       ceph_msgpool_destroy(&osdc->msgpool_op_reply);
+}
+EXPORT_SYMBOL(ceph_osdc_stop);
+
+/*
+ * Read some contiguous pages.  If we cross a stripe boundary, shorten
+ * *plen.  Return number of bytes read, or error.
+ */
+int ceph_osdc_readpages(struct ceph_osd_client *osdc,
+                       struct ceph_vino vino, struct ceph_file_layout *layout,
+                       u64 off, u64 *plen,
+                       u32 truncate_seq, u64 truncate_size,
+                       struct page **pages, int num_pages)
+{
+       struct ceph_osd_request *req;
+       int rc = 0;
+
+       dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
+            vino.snap, off, *plen);
+       req = ceph_osdc_new_request(osdc, layout, vino, off, plen,
+                                   CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
+                                   NULL, 0, truncate_seq, truncate_size, NULL,
+                                   false, 1);
+       if (!req)
+               return -ENOMEM;
+
+       /* it may be a short read due to an object boundary */
+       req->r_pages = pages;
+
+       dout("readpages  final extent is %llu~%llu (%d pages)\n",
+            off, *plen, req->r_num_pages);
+
+       rc = ceph_osdc_start_request(osdc, req, false);
+       if (!rc)
+               rc = ceph_osdc_wait_request(osdc, req);
+
+       ceph_osdc_put_request(req);
+       dout("readpages result %d\n", rc);
+       return rc;
+}
+EXPORT_SYMBOL(ceph_osdc_readpages);
+
+/*
+ * do a synchronous write on N pages
+ */
+int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_vino vino,
+                        struct ceph_file_layout *layout,
+                        struct ceph_snap_context *snapc,
+                        u64 off, u64 len,
+                        u32 truncate_seq, u64 truncate_size,
+                        struct timespec *mtime,
+                        struct page **pages, int num_pages,
+                        int flags, int do_sync, bool nofail)
+{
+       struct ceph_osd_request *req;
+       int rc = 0;
+
+       BUG_ON(vino.snap != CEPH_NOSNAP);
+       req = ceph_osdc_new_request(osdc, layout, vino, off, &len,
+                                   CEPH_OSD_OP_WRITE,
+                                   flags | CEPH_OSD_FLAG_ONDISK |
+                                           CEPH_OSD_FLAG_WRITE,
+                                   snapc, do_sync,
+                                   truncate_seq, truncate_size, mtime,
+                                   nofail, 1);
+       if (!req)
+               return -ENOMEM;
+
+       /* it may be a short write due to an object boundary */
+       req->r_pages = pages;
+       dout("writepages %llu~%llu (%d pages)\n", off, len,
+            req->r_num_pages);
+
+       rc = ceph_osdc_start_request(osdc, req, nofail);
+       if (!rc)
+               rc = ceph_osdc_wait_request(osdc, req);
+
+       ceph_osdc_put_request(req);
+       if (rc == 0)
+               rc = len;
+       dout("writepages result %d\n", rc);
+       return rc;
+}
+EXPORT_SYMBOL(ceph_osdc_writepages);
+
+/*
+ * handle incoming message
+ */
+static void dispatch(struct ceph_connection *con, struct ceph_msg *msg)
+{
+       struct ceph_osd *osd = con->private;
+       struct ceph_osd_client *osdc;
+       int type = le16_to_cpu(msg->hdr.type);
+
+       if (!osd)
+               goto out;
+       osdc = osd->o_osdc;
+
+       switch (type) {
+       case CEPH_MSG_OSD_MAP:
+               ceph_osdc_handle_map(osdc, msg);
+               break;
+       case CEPH_MSG_OSD_OPREPLY:
+               handle_reply(osdc, msg, con);
+               break;
+
+       default:
+               pr_err("received unknown message type %d %s\n", type,
+                      ceph_msg_type_name(type));
+       }
+out:
+       ceph_msg_put(msg);
+}
+
+/*
+ * lookup and return message for incoming reply.  set up reply message
+ * pages.
+ */
+static struct ceph_msg *get_reply(struct ceph_connection *con,
+                                 struct ceph_msg_header *hdr,
+                                 int *skip)
+{
+       struct ceph_osd *osd = con->private;
+       struct ceph_osd_client *osdc = osd->o_osdc;
+       struct ceph_msg *m;
+       struct ceph_osd_request *req;
+       int front = le32_to_cpu(hdr->front_len);
+       int data_len = le32_to_cpu(hdr->data_len);
+       u64 tid;
+
+       tid = le64_to_cpu(hdr->tid);
+       mutex_lock(&osdc->request_mutex);
+       req = __lookup_request(osdc, tid);
+       if (!req) {
+               *skip = 1;
+               m = NULL;
+               pr_info("get_reply unknown tid %llu from osd%d\n", tid,
+                       osd->o_osd);
+               goto out;
+       }
+
+       if (req->r_con_filling_msg) {
+               dout("get_reply revoking msg %p from old con %p\n",
+                    req->r_reply, req->r_con_filling_msg);
+               ceph_con_revoke_message(req->r_con_filling_msg, req->r_reply);
+               ceph_con_put(req->r_con_filling_msg);
+               req->r_con_filling_msg = NULL;
+       }
+
+       if (front > req->r_reply->front.iov_len) {
+               pr_warning("get_reply front %d > preallocated %d\n",
+                          front, (int)req->r_reply->front.iov_len);
+               m = ceph_msg_new(CEPH_MSG_OSD_OPREPLY, front, GFP_NOFS);
+               if (!m)
+                       goto out;
+               ceph_msg_put(req->r_reply);
+               req->r_reply = m;
+       }
+       m = ceph_msg_get(req->r_reply);
+
+       if (data_len > 0) {
+               unsigned data_off = le16_to_cpu(hdr->data_off);
+               int want = calc_pages_for(data_off & ~PAGE_MASK, data_len);
+
+               if (unlikely(req->r_num_pages < want)) {
+                       pr_warning("tid %llu reply %d > expected %d pages\n",
+                                  tid, want, req->r_num_pages);
+                       *skip = 1;
+                       ceph_msg_put(m);
+                       m = NULL;
+                       goto out;
+               }
+               m->pages = req->r_pages;
+               m->nr_pages = req->r_num_pages;
+#ifdef CONFIG_BLOCK
+               m->bio = req->r_bio;
+#endif
+       }
+       *skip = 0;
+       req->r_con_filling_msg = ceph_con_get(con);
+       dout("get_reply tid %lld %p\n", tid, m);
+
+out:
+       mutex_unlock(&osdc->request_mutex);
+       return m;
+
+}
+
+static struct ceph_msg *alloc_msg(struct ceph_connection *con,
+                                 struct ceph_msg_header *hdr,
+                                 int *skip)
+{
+       struct ceph_osd *osd = con->private;
+       int type = le16_to_cpu(hdr->type);
+       int front = le32_to_cpu(hdr->front_len);
+
+       switch (type) {
+       case CEPH_MSG_OSD_MAP:
+               return ceph_msg_new(type, front, GFP_NOFS);
+       case CEPH_MSG_OSD_OPREPLY:
+               return get_reply(con, hdr, skip);
+       default:
+               pr_info("alloc_msg unexpected msg type %d from osd%d\n", type,
+                       osd->o_osd);
+               *skip = 1;
+               return NULL;
+       }
+}
+
+/*
+ * Wrappers to refcount containing ceph_osd struct
+ */
+static struct ceph_connection *get_osd_con(struct ceph_connection *con)
+{
+       struct ceph_osd *osd = con->private;
+       if (get_osd(osd))
+               return con;
+       return NULL;
+}
+
+static void put_osd_con(struct ceph_connection *con)
+{
+       struct ceph_osd *osd = con->private;
+       put_osd(osd);
+}
+
+/*
+ * authentication
+ */
+static int get_authorizer(struct ceph_connection *con,
+                         void **buf, int *len, int *proto,
+                         void **reply_buf, int *reply_len, int force_new)
+{
+       struct ceph_osd *o = con->private;
+       struct ceph_osd_client *osdc = o->o_osdc;
+       struct ceph_auth_client *ac = osdc->client->monc.auth;
+       int ret = 0;
+
+       if (force_new && o->o_authorizer) {
+               ac->ops->destroy_authorizer(ac, o->o_authorizer);
+               o->o_authorizer = NULL;
+       }
+       if (o->o_authorizer == NULL) {
+               ret = ac->ops->create_authorizer(
+                       ac, CEPH_ENTITY_TYPE_OSD,
+                       &o->o_authorizer,
+                       &o->o_authorizer_buf,
+                       &o->o_authorizer_buf_len,
+                       &o->o_authorizer_reply_buf,
+                       &o->o_authorizer_reply_buf_len);
+               if (ret)
+                       return ret;
+       }
+
+       *proto = ac->protocol;
+       *buf = o->o_authorizer_buf;
+       *len = o->o_authorizer_buf_len;
+       *reply_buf = o->o_authorizer_reply_buf;
+       *reply_len = o->o_authorizer_reply_buf_len;
+       return 0;
+}
+
+
+static int verify_authorizer_reply(struct ceph_connection *con, int len)
+{
+       struct ceph_osd *o = con->private;
+       struct ceph_osd_client *osdc = o->o_osdc;
+       struct ceph_auth_client *ac = osdc->client->monc.auth;
+
+       return ac->ops->verify_authorizer_reply(ac, o->o_authorizer, len);
+}
+
+static int invalidate_authorizer(struct ceph_connection *con)
+{
+       struct ceph_osd *o = con->private;
+       struct ceph_osd_client *osdc = o->o_osdc;
+       struct ceph_auth_client *ac = osdc->client->monc.auth;
+
+       if (ac->ops->invalidate_authorizer)
+               ac->ops->invalidate_authorizer(ac, CEPH_ENTITY_TYPE_OSD);
+
+       return ceph_monc_validate_auth(&osdc->client->monc);
+}
+
+static const struct ceph_connection_operations osd_con_ops = {
+       .get = get_osd_con,
+       .put = put_osd_con,
+       .dispatch = dispatch,
+       .get_authorizer = get_authorizer,
+       .verify_authorizer_reply = verify_authorizer_reply,
+       .invalidate_authorizer = invalidate_authorizer,
+       .alloc_msg = alloc_msg,
+       .fault = osd_reset,
+};
diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
new file mode 100644
index 0000000..d73f3f6
--- /dev/null
+++ b/net/ceph/osdmap.c
@@ -0,0 +1,1128 @@
+
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <asm/div64.h>
+
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/osdmap.h>
+#include <linux/ceph/decode.h>
+#include <linux/crush/hash.h>
+#include <linux/crush/mapper.h>
+
+char *ceph_osdmap_state_str(char *str, int len, int state)
+{
+       int flag = 0;
+
+       if (!len)
+               goto done;
+
+       *str = '\0';
+       if (state) {
+               if (state & CEPH_OSD_EXISTS) {
+                       snprintf(str, len, "exists");
+                       flag = 1;
+               }
+               if (state & CEPH_OSD_UP) {
+                       /* don't pass str as both snprintf() source and dest */
+                       strlcat(str, flag ? ", up" : "up", len);
+                       flag = 1;
+               }
+       } else {
+               snprintf(str, len, "doesn't exist");
+       }
+done:
+       return str;
+}
+
+/* maps */
+
+static int calc_bits_of(unsigned t)
+{
+       int b = 0;
+       while (t) {
+               t = t >> 1;
+               b++;
+       }
+       return b;
+}
+
+/*
+ * the foo_mask is the smallest value 2^n-1 that is >= foo.
+ */
+static void calc_pg_masks(struct ceph_pg_pool_info *pi)
+{
+       pi->pg_num_mask = (1 << calc_bits_of(le32_to_cpu(pi->v.pg_num)-1)) - 1;
+       pi->pgp_num_mask =
+               (1 << calc_bits_of(le32_to_cpu(pi->v.pgp_num)-1)) - 1;
+       pi->lpg_num_mask =
+               (1 << calc_bits_of(le32_to_cpu(pi->v.lpg_num)-1)) - 1;
+       pi->lpgp_num_mask =
+               (1 << calc_bits_of(le32_to_cpu(pi->v.lpgp_num)-1)) - 1;
+}
+
+/*
+ * decode crush map
+ */
+static int crush_decode_uniform_bucket(void **p, void *end,
+                                      struct crush_bucket_uniform *b)
+{
+       dout("crush_decode_uniform_bucket %p to %p\n", *p, end);
+       ceph_decode_need(p, end, (1+b->h.size) * sizeof(u32), bad);
+       b->item_weight = ceph_decode_32(p);
+       return 0;
+bad:
+       return -EINVAL;
+}
+
+static int crush_decode_list_bucket(void **p, void *end,
+                                   struct crush_bucket_list *b)
+{
+       int j;
+       dout("crush_decode_list_bucket %p to %p\n", *p, end);
+       b->item_weights = kcalloc(b->h.size, sizeof(u32), GFP_NOFS);
+       if (b->item_weights == NULL)
+               return -ENOMEM;
+       b->sum_weights = kcalloc(b->h.size, sizeof(u32), GFP_NOFS);
+       if (b->sum_weights == NULL)
+               return -ENOMEM;
+       ceph_decode_need(p, end, 2 * b->h.size * sizeof(u32), bad);
+       for (j = 0; j < b->h.size; j++) {
+               b->item_weights[j] = ceph_decode_32(p);
+               b->sum_weights[j] = ceph_decode_32(p);
+       }
+       return 0;
+bad:
+       return -EINVAL;
+}
+
+static int crush_decode_tree_bucket(void **p, void *end,
+                                   struct crush_bucket_tree *b)
+{
+       int j;
+       dout("crush_decode_tree_bucket %p to %p\n", *p, end);
+       ceph_decode_32_safe(p, end, b->num_nodes, bad);
+       b->node_weights = kcalloc(b->num_nodes, sizeof(u32), GFP_NOFS);
+       if (b->node_weights == NULL)
+               return -ENOMEM;
+       ceph_decode_need(p, end, b->num_nodes * sizeof(u32), bad);
+       for (j = 0; j < b->num_nodes; j++)
+               b->node_weights[j] = ceph_decode_32(p);
+       return 0;
+bad:
+       return -EINVAL;
+}
+
+static int crush_decode_straw_bucket(void **p, void *end,
+                                    struct crush_bucket_straw *b)
+{
+       int j;
+       dout("crush_decode_straw_bucket %p to %p\n", *p, end);
+       b->item_weights = kcalloc(b->h.size, sizeof(u32), GFP_NOFS);
+       if (b->item_weights == NULL)
+               return -ENOMEM;
+       b->straws = kcalloc(b->h.size, sizeof(u32), GFP_NOFS);
+       if (b->straws == NULL)
+               return -ENOMEM;
+       ceph_decode_need(p, end, 2 * b->h.size * sizeof(u32), bad);
+       for (j = 0; j < b->h.size; j++) {
+               b->item_weights[j] = ceph_decode_32(p);
+               b->straws[j] = ceph_decode_32(p);
+       }
+       return 0;
+bad:
+       return -EINVAL;
+}
+
+static struct crush_map *crush_decode(void *pbyval, void *end)
+{
+       struct crush_map *c;
+       int err = -EINVAL;
+       int i, j;
+       void **p = &pbyval;
+       void *start = pbyval;
+       u32 magic;
+
+       dout("crush_decode %p to %p len %d\n", *p, end, (int)(end - *p));
+
+       c = kzalloc(sizeof(*c), GFP_NOFS);
+       if (c == NULL)
+               return ERR_PTR(-ENOMEM);
+
+       ceph_decode_need(p, end, 4*sizeof(u32), bad);
+       magic = ceph_decode_32(p);
+       if (magic != CRUSH_MAGIC) {
+               pr_err("crush_decode magic %x != current %x\n",
+                      (unsigned)magic, (unsigned)CRUSH_MAGIC);
+               goto bad;
+       }
+       c->max_buckets = ceph_decode_32(p);
+       c->max_rules = ceph_decode_32(p);
+       c->max_devices = ceph_decode_32(p);
+
+       c->device_parents = kcalloc(c->max_devices, sizeof(u32), GFP_NOFS);
+       if (c->device_parents == NULL)
+               goto badmem;
+       c->bucket_parents = kcalloc(c->max_buckets, sizeof(u32), GFP_NOFS);
+       if (c->bucket_parents == NULL)
+               goto badmem;
+
+       c->buckets = kcalloc(c->max_buckets, sizeof(*c->buckets), GFP_NOFS);
+       if (c->buckets == NULL)
+               goto badmem;
+       c->rules = kcalloc(c->max_rules, sizeof(*c->rules), GFP_NOFS);
+       if (c->rules == NULL)
+               goto badmem;
+
+       /* buckets */
+       for (i = 0; i < c->max_buckets; i++) {
+               int size = 0;
+               u32 alg;
+               struct crush_bucket *b;
+
+               ceph_decode_32_safe(p, end, alg, bad);
+               if (alg == 0) {
+                       c->buckets[i] = NULL;
+                       continue;
+               }
+               dout("crush_decode bucket %d off %x %p to %p\n",
+                    i, (int)(*p-start), *p, end);
+
+               switch (alg) {
+               case CRUSH_BUCKET_UNIFORM:
+                       size = sizeof(struct crush_bucket_uniform);
+                       break;
+               case CRUSH_BUCKET_LIST:
+                       size = sizeof(struct crush_bucket_list);
+                       break;
+               case CRUSH_BUCKET_TREE:
+                       size = sizeof(struct crush_bucket_tree);
+                       break;
+               case CRUSH_BUCKET_STRAW:
+                       size = sizeof(struct crush_bucket_straw);
+                       break;
+               default:
+                       err = -EINVAL;
+                       goto bad;
+               }
+               BUG_ON(size == 0);
+               b = c->buckets[i] = kzalloc(size, GFP_NOFS);
+               if (b == NULL)
+                       goto badmem;
+
+               ceph_decode_need(p, end, 4*sizeof(u32), bad);
+               b->id = ceph_decode_32(p);
+               b->type = ceph_decode_16(p);
+               b->alg = ceph_decode_8(p);
+               b->hash = ceph_decode_8(p);
+               b->weight = ceph_decode_32(p);
+               b->size = ceph_decode_32(p);
+
+               dout("crush_decode bucket size %d off %x %p to %p\n",
+                    b->size, (int)(*p-start), *p, end);
+
+               b->items = kcalloc(b->size, sizeof(__s32), GFP_NOFS);
+               if (b->items == NULL)
+                       goto badmem;
+               b->perm = kcalloc(b->size, sizeof(u32), GFP_NOFS);
+               if (b->perm == NULL)
+                       goto badmem;
+               b->perm_n = 0;
+
+               ceph_decode_need(p, end, b->size*sizeof(u32), bad);
+               for (j = 0; j < b->size; j++)
+                       b->items[j] = ceph_decode_32(p);
+
+               switch (b->alg) {
+               case CRUSH_BUCKET_UNIFORM:
+                       err = crush_decode_uniform_bucket(p, end,
+                                 (struct crush_bucket_uniform *)b);
+                       if (err < 0)
+                               goto bad;
+                       break;
+               case CRUSH_BUCKET_LIST:
+                       err = crush_decode_list_bucket(p, end,
+                              (struct crush_bucket_list *)b);
+                       if (err < 0)
+                               goto bad;
+                       break;
+               case CRUSH_BUCKET_TREE:
+                       err = crush_decode_tree_bucket(p, end,
+                               (struct crush_bucket_tree *)b);
+                       if (err < 0)
+                               goto bad;
+                       break;
+               case CRUSH_BUCKET_STRAW:
+                       err = crush_decode_straw_bucket(p, end,
+                               (struct crush_bucket_straw *)b);
+                       if (err < 0)
+                               goto bad;
+                       break;
+               }
+       }
+
+       /* rules */
+       dout("rule vec is %p\n", c->rules);
+       for (i = 0; i < c->max_rules; i++) {
+               u32 yes;
+               struct crush_rule *r;
+
+               ceph_decode_32_safe(p, end, yes, bad);
+               if (!yes) {
+                       dout("crush_decode NO rule %d off %x %p to %p\n",
+                            i, (int)(*p-start), *p, end);
+                       c->rules[i] = NULL;
+                       continue;
+               }
+
+               dout("crush_decode rule %d off %x %p to %p\n",
+                    i, (int)(*p-start), *p, end);
+
+               /* len */
+               ceph_decode_32_safe(p, end, yes, bad);
+#if BITS_PER_LONG == 32
+               err = -EINVAL;
+               if (yes > ULONG_MAX / sizeof(struct crush_rule_step))
+                       goto bad;
+#endif
+               r = c->rules[i] = kmalloc(sizeof(*r) +
+                                         yes*sizeof(struct crush_rule_step),
+                                         GFP_NOFS);
+               if (r == NULL)
+                       goto badmem;
+               dout(" rule %d is at %p\n", i, r);
+               r->len = yes;
+               ceph_decode_copy_safe(p, end, &r->mask, 4, bad); /* 4 u8's */
+               ceph_decode_need(p, end, r->len*3*sizeof(u32), bad);
+               for (j = 0; j < r->len; j++) {
+                       r->steps[j].op = ceph_decode_32(p);
+                       r->steps[j].arg1 = ceph_decode_32(p);
+                       r->steps[j].arg2 = ceph_decode_32(p);
+               }
+       }
+
+       /* ignore trailing name maps. */
+
+       dout("crush_decode success\n");
+       return c;
+
+badmem:
+       err = -ENOMEM;
+bad:
+       dout("crush_decode fail %d\n", err);
+       crush_destroy(c);
+       return ERR_PTR(err);
+}
+
+/*
+ * rbtree of pg_mapping for handling pg_temp (explicit mapping of pgid
+ * to a set of osds)
+ */
+static int pgid_cmp(struct ceph_pg l, struct ceph_pg r)
+{
+       u64 a = *(u64 *)&l;
+       u64 b = *(u64 *)&r;
+
+       if (a < b)
+               return -1;
+       if (a > b)
+               return 1;
+       return 0;
+}
+
+static int __insert_pg_mapping(struct ceph_pg_mapping *new,
+                              struct rb_root *root)
+{
+       struct rb_node **p = &root->rb_node;
+       struct rb_node *parent = NULL;
+       struct ceph_pg_mapping *pg = NULL;
+       int c;
+
+       while (*p) {
+               parent = *p;
+               pg = rb_entry(parent, struct ceph_pg_mapping, node);
+               c = pgid_cmp(new->pgid, pg->pgid);
+               if (c < 0)
+                       p = &(*p)->rb_left;
+               else if (c > 0)
+                       p = &(*p)->rb_right;
+               else
+                       return -EEXIST;
+       }
+
+       rb_link_node(&new->node, parent, p);
+       rb_insert_color(&new->node, root);
+       return 0;
+}
+
+static struct ceph_pg_mapping *__lookup_pg_mapping(struct rb_root *root,
+                                                  struct ceph_pg pgid)
+{
+       struct rb_node *n = root->rb_node;
+       struct ceph_pg_mapping *pg;
+       int c;
+
+       while (n) {
+               pg = rb_entry(n, struct ceph_pg_mapping, node);
+               c = pgid_cmp(pgid, pg->pgid);
+               if (c < 0)
+                       n = n->rb_left;
+               else if (c > 0)
+                       n = n->rb_right;
+               else
+                       return pg;
+       }
+       return NULL;
+}
+
+/*
+ * rbtree of pg pool info
+ */
+static int __insert_pg_pool(struct rb_root *root, struct ceph_pg_pool_info *new)
+{
+       struct rb_node **p = &root->rb_node;
+       struct rb_node *parent = NULL;
+       struct ceph_pg_pool_info *pi = NULL;
+
+       while (*p) {
+               parent = *p;
+               pi = rb_entry(parent, struct ceph_pg_pool_info, node);
+               if (new->id < pi->id)
+                       p = &(*p)->rb_left;
+               else if (new->id > pi->id)
+                       p = &(*p)->rb_right;
+               else
+                       return -EEXIST;
+       }
+
+       rb_link_node(&new->node, parent, p);
+       rb_insert_color(&new->node, root);
+       return 0;
+}
+
+static struct ceph_pg_pool_info *__lookup_pg_pool(struct rb_root *root, int id)
+{
+       struct ceph_pg_pool_info *pi;
+       struct rb_node *n = root->rb_node;
+
+       while (n) {
+               pi = rb_entry(n, struct ceph_pg_pool_info, node);
+               if (id < pi->id)
+                       n = n->rb_left;
+               else if (id > pi->id)
+                       n = n->rb_right;
+               else
+                       return pi;
+       }
+       return NULL;
+}
+
+int ceph_pg_poolid_by_name(struct ceph_osdmap *map, const char *name)
+{
+       struct rb_node *rbp;
+
+       for (rbp = rb_first(&map->pg_pools); rbp; rbp = rb_next(rbp)) {
+               struct ceph_pg_pool_info *pi =
+                       rb_entry(rbp, struct ceph_pg_pool_info, node);
+               if (pi->name && strcmp(pi->name, name) == 0)
+                       return pi->id;
+       }
+       return -ENOENT;
+}
+EXPORT_SYMBOL(ceph_pg_poolid_by_name);
+
+static void __remove_pg_pool(struct rb_root *root, struct ceph_pg_pool_info *pi)
+{
+       rb_erase(&pi->node, root);
+       kfree(pi->name);
+       kfree(pi);
+}
+
+static int __decode_pool(void **p, void *end, struct ceph_pg_pool_info *pi)
+{
+       unsigned n, m;
+
+       ceph_decode_copy(p, &pi->v, sizeof(pi->v));
+       calc_pg_masks(pi);
+
+       /* num_snaps * snap_info_t */
+       n = le32_to_cpu(pi->v.num_snaps);
+       while (n--) {
+               ceph_decode_need(p, end, sizeof(u64) + 1 + sizeof(u64) +
+                                sizeof(struct ceph_timespec), bad);
+               *p += sizeof(u64) +       /* key */
+                       1 + sizeof(u64) + /* u8, snapid */
+                       sizeof(struct ceph_timespec);
+               ceph_decode_32_safe(p, end, m, bad); /* snap name len */
+               *p += m;
+       }
+
+       *p += le32_to_cpu(pi->v.num_removed_snap_intervals) * sizeof(u64) * 2;
+       return 0;
+
+bad:
+       return -EINVAL;
+}
+
+static int __decode_pool_names(void **p, void *end, struct ceph_osdmap *map)
+{
+       struct ceph_pg_pool_info *pi;
+       u32 num, len, pool;
+
+       ceph_decode_32_safe(p, end, num, bad);
+       dout(" %d pool names\n", num);
+       while (num--) {
+               ceph_decode_32_safe(p, end, pool, bad);
+               ceph_decode_32_safe(p, end, len, bad);
+               dout("  pool %d len %d\n", pool, len);
+               pi = __lookup_pg_pool(&map->pg_pools, pool);
+               if (pi) {
+                       kfree(pi->name);
+                       pi->name = kmalloc(len + 1, GFP_NOFS);
+                       if (pi->name) {
+                               memcpy(pi->name, *p, len);
+                               pi->name[len] = '\0';
+                               dout("  name is %s\n", pi->name);
+                       }
+               }
+               *p += len;
+       }
+       return 0;
+
+bad:
+       return -EINVAL;
+}
+
+/*
+ * osd map
+ */
+void ceph_osdmap_destroy(struct ceph_osdmap *map)
+{
+       dout("osdmap_destroy %p\n", map);
+       if (map->crush)
+               crush_destroy(map->crush);
+       while (!RB_EMPTY_ROOT(&map->pg_temp)) {
+               struct ceph_pg_mapping *pg =
+                       rb_entry(rb_first(&map->pg_temp),
+                                struct ceph_pg_mapping, node);
+               rb_erase(&pg->node, &map->pg_temp);
+               kfree(pg);
+       }
+       while (!RB_EMPTY_ROOT(&map->pg_pools)) {
+               struct ceph_pg_pool_info *pi =
+                       rb_entry(rb_first(&map->pg_pools),
+                                struct ceph_pg_pool_info, node);
+               __remove_pg_pool(&map->pg_pools, pi);
+       }
+       kfree(map->osd_state);
+       kfree(map->osd_weight);
+       kfree(map->osd_addr);
+       kfree(map);
+}
+
+/*
+ * adjust max osd value.  reallocate arrays.
+ */
+static int osdmap_set_max_osd(struct ceph_osdmap *map, int max)
+{
+       u8 *state;
+       struct ceph_entity_addr *addr;
+       u32 *weight;
+
+       state = kcalloc(max, sizeof(*state), GFP_NOFS);
+       addr = kcalloc(max, sizeof(*addr), GFP_NOFS);
+       weight = kcalloc(max, sizeof(*weight), GFP_NOFS);
+       if (state == NULL || addr == NULL || weight == NULL) {
+               kfree(state);
+               kfree(addr);
+               kfree(weight);
+               return -ENOMEM;
+       }
+
+       /* copy old? */
+       if (map->osd_state) {
+               memcpy(state, map->osd_state, map->max_osd*sizeof(*state));
+               memcpy(addr, map->osd_addr, map->max_osd*sizeof(*addr));
+               memcpy(weight, map->osd_weight, map->max_osd*sizeof(*weight));
+               kfree(map->osd_state);
+               kfree(map->osd_addr);
+               kfree(map->osd_weight);
+       }
+
+       map->osd_state = state;
+       map->osd_weight = weight;
+       map->osd_addr = addr;
+       map->max_osd = max;
+       return 0;
+}
+
+/*
+ * decode a full map.
+ */
+struct ceph_osdmap *osdmap_decode(void **p, void *end)
+{
+       struct ceph_osdmap *map;
+       u16 version;
+       u32 len, max, i;
+       u8 ev;
+       int err = -EINVAL;
+       void *start = *p;
+       struct ceph_pg_pool_info *pi;
+
+       dout("osdmap_decode %p to %p len %d\n", *p, end, (int)(end - *p));
+
+       map = kzalloc(sizeof(*map), GFP_NOFS);
+       if (map == NULL)
+               return ERR_PTR(-ENOMEM);
+       map->pg_temp = RB_ROOT;
+
+       ceph_decode_16_safe(p, end, version, bad);
+       if (version > CEPH_OSDMAP_VERSION) {
+               pr_warning("got unknown v %d > %d of osdmap\n", version,
+                          CEPH_OSDMAP_VERSION);
+               goto bad;
+       }
+
+       ceph_decode_need(p, end, 2*sizeof(u64)+6*sizeof(u32), bad);
+       ceph_decode_copy(p, &map->fsid, sizeof(map->fsid));
+       map->epoch = ceph_decode_32(p);
+       ceph_decode_copy(p, &map->created, sizeof(map->created));
+       ceph_decode_copy(p, &map->modified, sizeof(map->modified));
+
+       ceph_decode_32_safe(p, end, max, bad);
+       while (max--) {
+               ceph_decode_need(p, end, 4 + 1 + sizeof(pi->v), bad);
+               pi = kzalloc(sizeof(*pi), GFP_NOFS);
+               if (!pi)
+                       goto bad;
+               pi->id = ceph_decode_32(p);
+               ev = ceph_decode_8(p); /* encoding version */
+               if (ev > CEPH_PG_POOL_VERSION) {
+                       pr_warning("got unknown v %d > %d of ceph_pg_pool\n",
+                                  ev, CEPH_PG_POOL_VERSION);
+                       kfree(pi);
+                       goto bad;
+               }
+               /* insert first so ceph_osdmap_destroy() frees pi on error */
+               __insert_pg_pool(&map->pg_pools, pi);
+               if (__decode_pool(p, end, pi) < 0)
+                       goto bad;
+       }
+
+       if (version >= 5 && __decode_pool_names(p, end, map) < 0)
+               goto bad;
+
+       ceph_decode_32_safe(p, end, map->pool_max, bad);
+
+       ceph_decode_32_safe(p, end, map->flags, bad);
+
+       ceph_decode_32_safe(p, end, max, bad);
+
+       /* (re)alloc osd arrays */
+       err = osdmap_set_max_osd(map, max);
+       if (err < 0)
+               goto bad;
+       dout("osdmap_decode max_osd = %d\n", map->max_osd);
+
+       /* osds */
+       err = -EINVAL;
+       ceph_decode_need(p, end, 3*sizeof(u32) +
+                        map->max_osd*(1 + sizeof(*map->osd_weight) +
+                                      sizeof(*map->osd_addr)), bad);
+       *p += 4; /* skip length field (should match max) */
+       ceph_decode_copy(p, map->osd_state, map->max_osd);
+
+       *p += 4; /* skip length field (should match max) */
+       for (i = 0; i < map->max_osd; i++)
+               map->osd_weight[i] = ceph_decode_32(p);
+
+       *p += 4; /* skip length field (should match max) */
+       ceph_decode_copy(p, map->osd_addr, map->max_osd*sizeof(*map->osd_addr));
+       for (i = 0; i < map->max_osd; i++)
+               ceph_decode_addr(&map->osd_addr[i]);
+
+       /* pg_temp */
+       ceph_decode_32_safe(p, end, len, bad);
+       for (i = 0; i < len; i++) {
+               int n, j;
+               struct ceph_pg pgid;
+               struct ceph_pg_mapping *pg;
+
+               ceph_decode_need(p, end, sizeof(u32) + sizeof(u64), bad);
+               ceph_decode_copy(p, &pgid, sizeof(pgid));
+               n = ceph_decode_32(p);
+               ceph_decode_need(p, end, n * sizeof(u32), bad);
+               err = -ENOMEM;
+               pg = kmalloc(sizeof(*pg) + n*sizeof(u32), GFP_NOFS);
+               if (!pg)
+                       goto bad;
+               pg->pgid = pgid;
+               pg->len = n;
+               for (j = 0; j < n; j++)
+                       pg->osds[j] = ceph_decode_32(p);
+
+               err = __insert_pg_mapping(pg, &map->pg_temp);
+               if (err)
+                       goto bad;
+               dout(" added pg_temp %llx len %d\n", *(u64 *)&pgid, n);
+       }
+
+       /* crush */
+       ceph_decode_32_safe(p, end, len, bad);
+       dout("osdmap_decode crush len %d from off 0x%x\n", len,
+            (int)(*p - start));
+       ceph_decode_need(p, end, len, bad);
+       map->crush = crush_decode(*p, end);
+       *p += len;
+       if (IS_ERR(map->crush)) {
+               err = PTR_ERR(map->crush);
+               map->crush = NULL;
+               goto bad;
+       }
+
+       /* ignore the rest of the map */
+       *p = end;
+
+       dout("osdmap_decode done %p %p\n", *p, end);
+       return map;
+
+bad:
+       dout("osdmap_decode fail\n");
+       ceph_osdmap_destroy(map);
+       return ERR_PTR(err);
+}
+
+/*
+ * decode and apply an incremental map update.
+ */
+struct ceph_osdmap *osdmap_apply_incremental(void **p, void *end,
+                                            struct ceph_osdmap *map,
+                                            struct ceph_messenger *msgr)
+{
+       struct crush_map *newcrush = NULL;
+       struct ceph_fsid fsid;
+       u32 epoch = 0;
+       struct ceph_timespec modified;
+       u32 len, pool;
+       __s32 new_pool_max, new_flags, max;
+       void *start = *p;
+       int err = -EINVAL;
+       u16 version;
+       struct rb_node *rbp;
+
+       ceph_decode_16_safe(p, end, version, bad);
+       if (version > CEPH_OSDMAP_INC_VERSION) {
+               pr_warning("got unknown v %d > %d of inc osdmap\n", version,
+                          CEPH_OSDMAP_INC_VERSION);
+               goto bad;
+       }
+
+       ceph_decode_need(p, end, sizeof(fsid)+sizeof(modified)+2*sizeof(u32),
+                        bad);
+       ceph_decode_copy(p, &fsid, sizeof(fsid));
+       epoch = ceph_decode_32(p);
+       BUG_ON(epoch != map->epoch+1);
+       ceph_decode_copy(p, &modified, sizeof(modified));
+       new_pool_max = ceph_decode_32(p);
+       new_flags = ceph_decode_32(p);
+
+       /* full map? */
+       ceph_decode_32_safe(p, end, len, bad);
+       if (len > 0) {
+               dout("apply_incremental full map len %d, %p to %p\n",
+                    len, *p, end);
+               return osdmap_decode(p, min(*p+len, end));
+       }
+
+       /* new crush? */
+       ceph_decode_32_safe(p, end, len, bad);
+       if (len > 0) {
+               dout("apply_incremental new crush map len %d, %p to %p\n",
+                    len, *p, end);
+               newcrush = crush_decode(*p, min(*p+len, end));
+               if (IS_ERR(newcrush))
+                       return ERR_CAST(newcrush);
+               *p += len;
+       }
+
+       /* new flags? */
+       if (new_flags >= 0)
+               map->flags = new_flags;
+       if (new_pool_max >= 0)
+               map->pool_max = new_pool_max;
+
+       ceph_decode_need(p, end, 5*sizeof(u32), bad);
+
+       /* new max? */
+       max = ceph_decode_32(p);
+       if (max >= 0) {
+               err = osdmap_set_max_osd(map, max);
+               if (err < 0)
+                       goto bad;
+       }
+
+       map->epoch++;
+       map->modified = modified;
+       if (newcrush) {
+               if (map->crush)
+                       crush_destroy(map->crush);
+               map->crush = newcrush;
+               newcrush = NULL;
+       }
+
+       /* new_pool */
+       ceph_decode_32_safe(p, end, len, bad);
+       while (len--) {
+               __u8 ev;
+               struct ceph_pg_pool_info *pi;
+
+               ceph_decode_32_safe(p, end, pool, bad);
+               ceph_decode_need(p, end, 1 + sizeof(pi->v), bad);
+               ev = ceph_decode_8(p);  /* encoding version */
+               if (ev > CEPH_PG_POOL_VERSION) {
+                       pr_warning("got unknown v %d > %d of ceph_pg_pool\n",
+                                  ev, CEPH_PG_POOL_VERSION);
+                       goto bad;
+               }
+               pi = __lookup_pg_pool(&map->pg_pools, pool);
+               if (!pi) {
+                       pi = kzalloc(sizeof(*pi), GFP_NOFS);
+                       if (!pi) {
+                               err = -ENOMEM;
+                               goto bad;
+                       }
+                       pi->id = pool;
+                       __insert_pg_pool(&map->pg_pools, pi);
+               }
+               err = __decode_pool(p, end, pi);
+               if (err < 0)
+                       goto bad;
+       }
+       if (version >= 5 && __decode_pool_names(p, end, map) < 0)
+               goto bad;
+
+       /* old_pool */
+       ceph_decode_32_safe(p, end, len, bad);
+       while (len--) {
+               struct ceph_pg_pool_info *pi;
+
+               ceph_decode_32_safe(p, end, pool, bad);
+               pi = __lookup_pg_pool(&map->pg_pools, pool);
+               if (pi)
+                       __remove_pg_pool(&map->pg_pools, pi);
+       }
+
+       /* new_up */
+       err = -EINVAL;
+       ceph_decode_32_safe(p, end, len, bad);
+       while (len--) {
+               u32 osd;
+               struct ceph_entity_addr addr;
+               ceph_decode_32_safe(p, end, osd, bad);
+               ceph_decode_copy_safe(p, end, &addr, sizeof(addr), bad);
+               ceph_decode_addr(&addr);
+               pr_info("osd%d up\n", osd);
+               BUG_ON(osd >= map->max_osd);
+               map->osd_state[osd] |= CEPH_OSD_UP;
+               map->osd_addr[osd] = addr;
+       }
+
+       /* new_down */
+       ceph_decode_32_safe(p, end, len, bad);
+       while (len--) {
+               u32 osd;
+               ceph_decode_32_safe(p, end, osd, bad);
+               (*p)++;  /* clean flag */
+               pr_info("osd%d down\n", osd);
+               if (osd < map->max_osd)
+                       map->osd_state[osd] &= ~CEPH_OSD_UP;
+       }
+
+       /* new_weight */
+       ceph_decode_32_safe(p, end, len, bad);
+       while (len--) {
+               u32 osd, off;
+               ceph_decode_need(p, end, sizeof(u32)*2, bad);
+               osd = ceph_decode_32(p);
+               off = ceph_decode_32(p);
+               pr_info("osd%d weight 0x%x %s\n", osd, off,
+                    off == CEPH_OSD_IN ? "(in)" :
+                    (off == CEPH_OSD_OUT ? "(out)" : ""));
+               if (osd < map->max_osd)
+                       map->osd_weight[osd] = off;
+       }
+
+       /* new_pg_temp */
+       rbp = rb_first(&map->pg_temp);
+       ceph_decode_32_safe(p, end, len, bad);
+       while (len--) {
+               struct ceph_pg_mapping *pg;
+               int j;
+               struct ceph_pg pgid;
+               u32 pglen;
+               ceph_decode_need(p, end, sizeof(u64) + sizeof(u32), bad);
+               ceph_decode_copy(p, &pgid, sizeof(pgid));
+               pglen = ceph_decode_32(p);
+
+               /* remove any? */
+               while (rbp && pgid_cmp(rb_entry(rbp, struct ceph_pg_mapping,
+                                               node)->pgid, pgid) <= 0) {
+                       struct ceph_pg_mapping *cur =
+                               rb_entry(rbp, struct ceph_pg_mapping, node);
+
+                       rbp = rb_next(rbp);
+                       dout(" removed pg_temp %llx\n", *(u64 *)&cur->pgid);
+                       rb_erase(&cur->node, &map->pg_temp);
+                       kfree(cur);
+               }
+
+               if (pglen) {
+                       /* insert */
+                       ceph_decode_need(p, end, pglen*sizeof(u32), bad);
+                       pg = kmalloc(sizeof(*pg) + sizeof(u32)*pglen, GFP_NOFS);
+                       if (!pg) {
+                               err = -ENOMEM;
+                               goto bad;
+                       }
+                       pg->pgid = pgid;
+                       pg->len = pglen;
+                       for (j = 0; j < pglen; j++)
+                               pg->osds[j] = ceph_decode_32(p);
+                       err = __insert_pg_mapping(pg, &map->pg_temp);
+                       if (err) {
+                               kfree(pg);
+                               goto bad;
+                       }
+                       dout(" added pg_temp %llx len %d\n", *(u64 *)&pgid,
+                            pglen);
+               }
+       }
+       while (rbp) {
+               struct ceph_pg_mapping *cur =
+                       rb_entry(rbp, struct ceph_pg_mapping, node);
+
+               rbp = rb_next(rbp);
+               dout(" removed pg_temp %llx\n", *(u64 *)&cur->pgid);
+               rb_erase(&cur->node, &map->pg_temp);
+               kfree(cur);
+       }
+
+       /* ignore the rest */
+       *p = end;
+       return map;
+
+bad:
+       pr_err("corrupt inc osdmap epoch %d off %d (%p of %p-%p)\n",
+              epoch, (int)(*p - start), *p, start, end);
+       print_hex_dump(KERN_DEBUG, "osdmap: ",
+                      DUMP_PREFIX_OFFSET, 16, 1,
+                      start, end - start, true);
+       if (newcrush)
+               crush_destroy(newcrush);
+       return ERR_PTR(err);
+}
+
+
+
+
+/*
+ * calculate file layout from given offset, length.
+ * fill in correct oid, logical length, and object extent
+ * offset, length.
+ *
+ * for now, we write only a single su, until we can
+ * pass a stride back to the caller.
+ */
+void ceph_calc_file_object_mapping(struct ceph_file_layout *layout,
+                                  u64 off, u64 *plen,
+                                  u64 *ono,
+                                  u64 *oxoff, u64 *oxlen)
+{
+       u32 osize = le32_to_cpu(layout->fl_object_size);
+       u32 su = le32_to_cpu(layout->fl_stripe_unit);
+       u32 sc = le32_to_cpu(layout->fl_stripe_count);
+       u32 bl, stripeno, stripepos, objsetno;
+       u32 su_per_object;
+       u64 t, su_offset;
+
+       dout("mapping %llu~%llu  osize %u fl_su %u\n", off, *plen,
+            osize, su);
+       su_per_object = osize / su;
+       dout("osize %u / su %u = su_per_object %u\n", osize, su,
+            su_per_object);
+
+       BUG_ON((su & ~PAGE_MASK) != 0);
+       /* bl = *off / su; */
+       t = off;
+       do_div(t, su);
+       bl = t;
+       dout("off %llu / su %u = bl %u\n", off, su, bl);
+
+       stripeno = bl / sc;
+       stripepos = bl % sc;
+       objsetno = stripeno / su_per_object;
+
+       *ono = objsetno * sc + stripepos;
+       dout("objset %u * sc %u = ono %u\n", objsetno, sc, (unsigned)*ono);
+
+       /* *oxoff = *off % layout->fl_stripe_unit;  # offset in su */
+       t = off;
+       su_offset = do_div(t, su);
+       *oxoff = su_offset + (stripeno % su_per_object) * su;
+
+       /*
+        * Calculate the length of the extent being written to the selected
+        * object. This is the minimum of the full length requested (plen) or
+        * the remainder of the current stripe being written to.
+        */
+       *oxlen = min_t(u64, *plen, su - su_offset);
+       *plen = *oxlen;
+
+       dout(" obj extent %llu~%llu\n", *oxoff, *oxlen);
+}
+EXPORT_SYMBOL(ceph_calc_file_object_mapping);
+
+/*
+ * calculate an object layout (i.e. pgid) from an oid,
+ * file_layout, and osdmap
+ */
+int ceph_calc_object_layout(struct ceph_object_layout *ol,
+                           const char *oid,
+                           struct ceph_file_layout *fl,
+                           struct ceph_osdmap *osdmap)
+{
+       unsigned num, num_mask;
+       struct ceph_pg pgid;
+       s32 preferred = (s32)le32_to_cpu(fl->fl_pg_preferred);
+       int poolid = le32_to_cpu(fl->fl_pg_pool);
+       struct ceph_pg_pool_info *pool;
+       unsigned ps;
+
+       BUG_ON(!osdmap);
+
+       pool = __lookup_pg_pool(&osdmap->pg_pools, poolid);
+       if (!pool)
+               return -EIO;
+       ps = ceph_str_hash(pool->v.object_hash, oid, strlen(oid));
+       if (preferred >= 0) {
+               ps += preferred;
+               num = le32_to_cpu(pool->v.lpg_num);
+               num_mask = pool->lpg_num_mask;
+       } else {
+               num = le32_to_cpu(pool->v.pg_num);
+               num_mask = pool->pg_num_mask;
+       }
+
+       pgid.ps = cpu_to_le16(ps);
+       pgid.preferred = cpu_to_le16(preferred);
+       pgid.pool = fl->fl_pg_pool;
+       if (preferred >= 0)
+               dout("calc_object_layout '%s' pgid %d.%xp%d\n", oid, poolid, ps,
+                    (int)preferred);
+       else
+               dout("calc_object_layout '%s' pgid %d.%x\n", oid, poolid, ps);
+
+       ol->ol_pgid = pgid;
+       ol->ol_stripe_unit = fl->fl_object_stripe_unit;
+       return 0;
+}
+EXPORT_SYMBOL(ceph_calc_object_layout);
+
+/*
+ * Calculate raw osd vector for the given pgid.  Return pointer to osd
+ * array, or NULL on failure.
+ */
+static int *calc_pg_raw(struct ceph_osdmap *osdmap, struct ceph_pg pgid,
+                       int *osds, int *num)
+{
+       struct ceph_pg_mapping *pg;
+       struct ceph_pg_pool_info *pool;
+       int ruleno;
+       unsigned poolid, ps, pps;
+       int preferred;
+
+       /* pg_temp? */
+       pg = __lookup_pg_mapping(&osdmap->pg_temp, pgid);
+       if (pg) {
+               *num = pg->len;
+               return pg->osds;
+       }
+
+       /* crush */
+       poolid = le32_to_cpu(pgid.pool);
+       ps = le16_to_cpu(pgid.ps);
+       preferred = (s16)le16_to_cpu(pgid.preferred);
+
+       /* don't forcefeed bad device ids to crush */
+       if (preferred >= osdmap->max_osd ||
+           preferred >= osdmap->crush->max_devices)
+               preferred = -1;
+
+       pool = __lookup_pg_pool(&osdmap->pg_pools, poolid);
+       if (!pool)
+               return NULL;
+       ruleno = crush_find_rule(osdmap->crush, pool->v.crush_ruleset,
+                                pool->v.type, pool->v.size);
+       if (ruleno < 0) {
+               pr_err("no crush rule pool %d ruleset %d type %d size %d\n",
+                      poolid, pool->v.crush_ruleset, pool->v.type,
+                      pool->v.size);
+               return NULL;
+       }
+
+       if (preferred >= 0)
+               pps = ceph_stable_mod(ps,
+                                     le32_to_cpu(pool->v.lpgp_num),
+                                     pool->lpgp_num_mask);
+       else
+               pps = ceph_stable_mod(ps,
+                                     le32_to_cpu(pool->v.pgp_num),
+                                     pool->pgp_num_mask);
+       pps += poolid;
+       *num = crush_do_rule(osdmap->crush, ruleno, pps, osds,
+                            min_t(int, pool->v.size, *num),
+                            preferred, osdmap->osd_weight);
+       return osds;
+}
+
+/*
+ * Return acting set for given pgid.
+ */
+int ceph_calc_pg_acting(struct ceph_osdmap *osdmap, struct ceph_pg pgid,
+                       int *acting)
+{
+       int rawosds[CEPH_PG_MAX_SIZE], *osds;
+       int i, o, num = CEPH_PG_MAX_SIZE;
+
+       osds = calc_pg_raw(osdmap, pgid, rawosds, &num);
+       if (!osds)
+               return -1;
+
+       /* primary is first up osd */
+       o = 0;
+       for (i = 0; i < num; i++)
+               if (ceph_osd_is_up(osdmap, osds[i]))
+                       acting[o++] = osds[i];
+       return o;
+}
+
+/*
+ * Return primary osd for given pgid, or -1 if none.
+ */
+int ceph_calc_pg_primary(struct ceph_osdmap *osdmap, struct ceph_pg pgid)
+{
+       int rawosds[CEPH_PG_MAX_SIZE], *osds;
+       int i, num = CEPH_PG_MAX_SIZE;
+
+       osds = calc_pg_raw(osdmap, pgid, rawosds, &num);
+       if (!osds)
+               return -1;
+
+       /* primary is first up osd */
+       for (i = 0; i < num; i++)
+               if (ceph_osd_is_up(osdmap, osds[i]))
+                       return osds[i];
+       return -1;
+}
+EXPORT_SYMBOL(ceph_calc_pg_primary);
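The striping arithmetic in ceph_calc_file_object_mapping() above can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct obj_extent` and `map_file_extent()` are hypothetical names introduced here, plain `/` and `%` stand in for `do_div()`, and the object number is widened to 64 bits before the multiply (the kernel computes it in 32 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the three output parameters. */
struct obj_extent {
	uint64_t ono;    /* object number */
	uint64_t oxoff;  /* offset within the object */
	uint64_t oxlen;  /* extent length within the object */
};

/*
 * Map a file offset/length onto a single object extent.
 * osize = object size, su = stripe unit, sc = stripe count.
 */
static struct obj_extent map_file_extent(uint32_t osize, uint32_t su,
					 uint32_t sc, uint64_t off,
					 uint64_t len)
{
	struct obj_extent e;
	uint32_t su_per_object, bl, stripeno, stripepos, objsetno;
	uint64_t su_offset, room;

	/* Sketch-only precondition; the kernel code does not check this. */
	assert(su != 0 && sc != 0 && osize >= su);

	su_per_object = osize / su;
	bl = (uint32_t)(off / su);        /* stripe unit index in the file */
	stripeno = bl / sc;               /* which stripe the unit is in */
	stripepos = bl % sc;              /* which object within the set */
	objsetno = stripeno / su_per_object;

	e.ono = (uint64_t)objsetno * sc + stripepos;

	su_offset = off % su;             /* offset inside the stripe unit */
	e.oxoff = su_offset + (uint64_t)(stripeno % su_per_object) * su;

	/* Never cross the end of the current stripe unit. */
	room = su - su_offset;
	e.oxlen = len < room ? len : room;
	return e;
}
```

With 4 MiB objects, a 1 MiB stripe unit and a stripe count of 2, an offset of 3 MiB + 100 falls in the fourth stripe unit of the file, which is the second unit of object 1, so the extent starts at 1 MiB + 100 within that object and is clipped at the end of the unit.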
diff --git a/net/ceph/pagelist.c b/net/ceph/pagelist.c
new file mode 100644
index 0000000..13cb409
--- /dev/null
+++ b/net/ceph/pagelist.c
@@ -0,0 +1,154 @@
+
+#include <linux/module.h>
+#include <linux/gfp.h>
+#include <linux/pagemap.h>
+#include <linux/highmem.h>
+#include <linux/ceph/pagelist.h>
+
+static void ceph_pagelist_unmap_tail(struct ceph_pagelist *pl)
+{
+       if (pl->mapped_tail) {
+               struct page *page = list_entry(pl->head.prev, struct page, lru);
+               kunmap(page);
+               pl->mapped_tail = NULL;
+       }
+}
+
+int ceph_pagelist_release(struct ceph_pagelist *pl)
+{
+       ceph_pagelist_unmap_tail(pl);
+       while (!list_empty(&pl->head)) {
+               struct page *page = list_first_entry(&pl->head, struct page,
+                                                    lru);
+               list_del(&page->lru);
+               __free_page(page);
+       }
+       ceph_pagelist_free_reserve(pl);
+       return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_release);
+
+static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
+{
+       struct page *page;
+
+       if (!pl->num_pages_free) {
+               page = __page_cache_alloc(GFP_NOFS);
+       } else {
+               page = list_first_entry(&pl->free_list, struct page, lru);
+               list_del(&page->lru);
+               --pl->num_pages_free;
+       }
+       if (!page)
+               return -ENOMEM;
+       pl->room += PAGE_SIZE;
+       ceph_pagelist_unmap_tail(pl);
+       list_add_tail(&page->lru, &pl->head);
+       pl->mapped_tail = kmap(page);
+       return 0;
+}
+
+int ceph_pagelist_append(struct ceph_pagelist *pl, const void *buf, size_t len)
+{
+       while (pl->room < len) {
+               size_t bit = pl->room;
+               int ret;
+
+               memcpy(pl->mapped_tail + (pl->length & ~PAGE_CACHE_MASK),
+                      buf, bit);
+               pl->length += bit;
+               pl->room -= bit;
+               buf += bit;
+               len -= bit;
+               ret = ceph_pagelist_addpage(pl);
+               if (ret)
+                       return ret;
+       }
+
+       memcpy(pl->mapped_tail + (pl->length & ~PAGE_CACHE_MASK), buf, len);
+       pl->length += len;
+       pl->room -= len;
+       return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_append);
+
+/**
+ * Allocate enough pages for a pagelist to append the given amount
+ * of data without allocating.
+ * Returns: 0 on success, -ENOMEM on error.
+ */
+int ceph_pagelist_reserve(struct ceph_pagelist *pl, size_t space)
+{
+       if (space <= pl->room)
+               return 0;
+       space -= pl->room;
+       space = (space + PAGE_SIZE - 1) >> PAGE_SHIFT;   /* conv to num pages */
+
+       while (space > pl->num_pages_free) {
+               struct page *page = __page_cache_alloc(GFP_NOFS);
+               if (!page)
+                       return -ENOMEM;
+               list_add_tail(&page->lru, &pl->free_list);
+               ++pl->num_pages_free;
+       }
+       return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_reserve);
+
+/**
+ * Free any pages that have been preallocated.
+ */
+int ceph_pagelist_free_reserve(struct ceph_pagelist *pl)
+{
+       while (!list_empty(&pl->free_list)) {
+               struct page *page = list_first_entry(&pl->free_list,
+                                                    struct page, lru);
+               list_del(&page->lru);
+               __free_page(page);
+               --pl->num_pages_free;
+       }
+       BUG_ON(pl->num_pages_free);
+       return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_free_reserve);
+
+/**
+ * Create a truncation point.
+ */
+void ceph_pagelist_set_cursor(struct ceph_pagelist *pl,
+                             struct ceph_pagelist_cursor *c)
+{
+       c->pl = pl;
+       c->page_lru = pl->head.prev;
+       c->room = pl->room;
+}
+EXPORT_SYMBOL(ceph_pagelist_set_cursor);
+
+/**
+ * Truncate a pagelist to the given point. Move extra pages to reserve.
+ * This won't sleep.
+ * Returns: 0 on success,
+ *          -EINVAL if the pagelist doesn't match the trunc point pagelist
+ */
+int ceph_pagelist_truncate(struct ceph_pagelist *pl,
+                          struct ceph_pagelist_cursor *c)
+{
+       struct page *page;
+
+       if (pl != c->pl)
+               return -EINVAL;
+       ceph_pagelist_unmap_tail(pl);
+       while (pl->head.prev != c->page_lru) {
+               page = list_entry(pl->head.prev, struct page, lru);
+               list_del(&page->lru);                /* remove from pagelist */
+               list_add_tail(&page->lru, &pl->free_list); /* add to reserve */
+               ++pl->num_pages_free;
+       }
+       pl->room = c->room;
+       if (!list_empty(&pl->head)) {
+               page = list_entry(pl->head.prev, struct page, lru);
+               pl->mapped_tail = kmap(page);
+       }
+       return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_truncate);
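The page-count rounding in ceph_pagelist_reserve() above is easy to get off by one, so here is a small userspace sketch of just that calculation. `pages_to_allocate()` and the `SKETCH_PAGE_*` constants are hypothetical names for illustration, and a 4096-byte page is an assumption (the kernel uses the architecture's PAGE_SIZE).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/*
 * How many pages still have to be allocated so that `space` bytes can
 * be appended without further allocation, given `room` bytes left in
 * the tail page and `num_free` pages already held in reserve.
 */
static size_t pages_to_allocate(size_t space, size_t room, size_t num_free)
{
	size_t pages;

	/* Sketch-only sanity check: room never exceeds one page. */
	assert(room <= SKETCH_PAGE_SIZE);

	if (space <= room)
		return 0;                 /* fits in the current tail page */
	space -= room;
	/* Round the remainder up to whole pages. */
	pages = (space + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	return pages > num_free ? pages - num_free : 0;
}
```

The kernel function reaches the same count by looping until `num_pages_free` is at least the rounded-up page count; the closed form here only makes the arithmetic visible.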
diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
new file mode 100644
index 0000000..54caf06
--- /dev/null
+++ b/net/ceph/pagevec.c
@@ -0,0 +1,223 @@
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/writeback.h>
+
+#include <linux/ceph/libceph.h>
+
+/*
+ * build a vector of user pages
+ */
+struct page **ceph_get_direct_page_vector(const char __user *data,
+                                                int num_pages,
+                                                loff_t off, size_t len)
+{
+       struct page **pages;
+       int rc;
+
+       pages = kmalloc(sizeof(*pages) * num_pages, GFP_NOFS);
+       if (!pages)
+               return ERR_PTR(-ENOMEM);
+
+       down_read(&current->mm->mmap_sem);
+       rc = get_user_pages(current, current->mm, (unsigned long)data,
+                           num_pages, 0, 0, pages, NULL);
+       up_read(&current->mm->mmap_sem);
+       if (rc < 0)
+               goto fail;
+       return pages;
+
+fail:
+       kfree(pages);
+       return ERR_PTR(rc);
+}
+EXPORT_SYMBOL(ceph_get_direct_page_vector);
+
+void ceph_put_page_vector(struct page **pages, int num_pages)
+{
+       int i;
+
+       for (i = 0; i < num_pages; i++)
+               put_page(pages[i]);
+       kfree(pages);
+}
+EXPORT_SYMBOL(ceph_put_page_vector);
+
+void ceph_release_page_vector(struct page **pages, int num_pages)
+{
+       int i;
+
+       for (i = 0; i < num_pages; i++)
+               __free_pages(pages[i], 0);
+       kfree(pages);
+}
+EXPORT_SYMBOL(ceph_release_page_vector);
+
+/*
+ * allocate a vector of new pages
+ */
+struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
+{
+       struct page **pages;
+       int i;
+
+       pages = kmalloc(sizeof(*pages) * num_pages, flags);
+       if (!pages)
+               return ERR_PTR(-ENOMEM);
+       for (i = 0; i < num_pages; i++) {
+               pages[i] = __page_cache_alloc(flags);
+               if (pages[i] == NULL) {
+                       ceph_release_page_vector(pages, i);
+                       return ERR_PTR(-ENOMEM);
+               }
+       }
+       return pages;
+}
+EXPORT_SYMBOL(ceph_alloc_page_vector);
+
+/*
+ * copy user data into a page vector
+ */
+int ceph_copy_user_to_page_vector(struct page **pages,
+                                        const char __user *data,
+                                        loff_t off, size_t len)
+{
+       int i = 0;
+       int po = off & ~PAGE_CACHE_MASK;
+       int left = len;
+       int l, bad;
+
+       while (left > 0) {
+               l = min_t(int, PAGE_CACHE_SIZE-po, left);
+               bad = copy_from_user(page_address(pages[i]) + po, data, l);
+               if (bad == l)
+                       return -EFAULT;
+               data += l - bad;
+               left -= l - bad;
+               po += l - bad;
+               if (po == PAGE_CACHE_SIZE) {
+                       po = 0;
+                       i++;
+               }
+       }
+       return len;
+}
+EXPORT_SYMBOL(ceph_copy_user_to_page_vector);
+
+int ceph_copy_to_page_vector(struct page **pages,
+                                   const char *data,
+                                   loff_t off, size_t len)
+{
+       int i = 0;
+       size_t po = off & ~PAGE_CACHE_MASK;
+       size_t left = len;
+       size_t l;
+
+       while (left > 0) {
+               l = min_t(size_t, PAGE_CACHE_SIZE-po, left);
+               memcpy(page_address(pages[i]) + po, data, l);
+               data += l;
+               left -= l;
+               po += l;
+               if (po == PAGE_CACHE_SIZE) {
+                       po = 0;
+                       i++;
+               }
+       }
+       return len;
+}
+EXPORT_SYMBOL(ceph_copy_to_page_vector);
+
+int ceph_copy_from_page_vector(struct page **pages,
+                                   char *data,
+                                   loff_t off, size_t len)
+{
+       int i = 0;
+       size_t po = off & ~PAGE_CACHE_MASK;
+       size_t left = len;
+       size_t l;
+
+       while (left > 0) {
+               l = min_t(size_t, PAGE_CACHE_SIZE-po, left);
+               memcpy(data, page_address(pages[i]) + po, l);
+               data += l;
+               left -= l;
+               po += l;
+               if (po == PAGE_CACHE_SIZE) {
+                       po = 0;
+                       i++;
+               }
+       }
+       return len;
+}
+EXPORT_SYMBOL(ceph_copy_from_page_vector);
+
+/*
+ * copy data from a page vector into a user pointer
+ */
+int ceph_copy_page_vector_to_user(struct page **pages,
+                                        char __user *data,
+                                        loff_t off, size_t len)
+{
+       int i = 0;
+       int po = off & ~PAGE_CACHE_MASK;
+       int left = len;
+       int l, bad;
+
+       while (left > 0) {
+               l = min_t(int, left, PAGE_CACHE_SIZE-po);
+               bad = copy_to_user(data, page_address(pages[i]) + po, l);
+               if (bad == l)
+                       return -EFAULT;
+               data += l - bad;
+               left -= l - bad;
+               if (po) {
+                       po += l - bad;
+                       if (po == PAGE_CACHE_SIZE)
+                               po = 0;
+               }
+               i++;
+       }
+       return len;
+}
+EXPORT_SYMBOL(ceph_copy_page_vector_to_user);
+
+/*
+ * Zero an extent within a page vector.  Offset is relative to the
+ * start of the first page.
+ */
+void ceph_zero_page_vector_range(int off, int len, struct page **pages)
+{
+       int i = off >> PAGE_CACHE_SHIFT;
+
+       off &= ~PAGE_CACHE_MASK;
+
+       dout("zero_page_vector_page %u~%u\n", off, len);
+
+       /* leading partial page? */
+       if (off) {
+               int end = min((int)PAGE_CACHE_SIZE, off + len);
+               dout("zeroing %d %p head from %d\n", i, pages[i],
+                    (int)off);
+               zero_user_segment(pages[i], off, end);
+               len -= (end - off);
+               i++;
+       }
+       while (len >= PAGE_CACHE_SIZE) {
+               dout("zeroing %d %p len=%d\n", i, pages[i], len);
+               zero_user_segment(pages[i], 0, PAGE_CACHE_SIZE);
+               len -= PAGE_CACHE_SIZE;
+               i++;
+       }
+       /* trailing partial page? */
+       if (len) {
+               dout("zeroing %d %p tail to %d\n", i, pages[i], (int)len);
+               zero_user_segment(pages[i], 0, len);
+       }
+}
+EXPORT_SYMBOL(ceph_zero_page_vector_range);
+
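The copy loop shared by ceph_copy_to_page_vector() and ceph_copy_from_page_vector() above walks a vector of fixed-size pages starting at an in-page offset. The sketch below reproduces that loop in userspace under stated assumptions: `copy_to_chunks()` is a hypothetical name, ordinary byte buffers stand in for `struct page`, and the "page" size is shrunk to 16 bytes so the page-boundary handling is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Deliberately tiny "page" size for this sketch; must be a power of 2. */
#define SKETCH_PAGE_SIZE 16u

/*
 * Copy len bytes from data into the page vector. As in the kernel
 * code, only the in-page part of off is used and copying starts in
 * pages[0].
 */
static size_t copy_to_chunks(unsigned char **pages,
			     const unsigned char *data,
			     size_t off, size_t len)
{
	size_t i = 0;
	size_t po = off & (SKETCH_PAGE_SIZE - 1);  /* offset in page */
	size_t left = len;

	assert(pages != NULL && data != NULL);

	while (left > 0) {
		size_t l = SKETCH_PAGE_SIZE - po;  /* room left in page */

		if (l > left)
			l = left;
		memcpy(pages[i] + po, data, l);
		data += l;
		left -= l;
		po += l;
		if (po == SKETCH_PAGE_SIZE) {      /* page full: next */
			po = 0;
			i++;
		}
	}
	return len;
}
```

Copying 26 bytes at offset 10 fills the last 6 bytes of the first chunk, all 16 bytes of the second, and the first 4 bytes of the third.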
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 7a85367b3c2f8010af24bbd6b6f4249698f9d78d..8451ab481095fc523c47fa01ccb11392b15dbc30 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -348,7 +348,7 @@ static noinline_for_stack int ethtool_get_rxnfc(struct net_device *dev,
         if (info.cmd == ETHTOOL_GRXCLSRLALL) {
                 if (info.rule_cnt > 0) {
                         if (info.rule_cnt <= KMALLOC_MAX_SIZE / sizeof(u32))
-                               rule_buf = kmalloc(info.rule_cnt * sizeof(u32),
+                               rule_buf = kzalloc(info.rule_cnt * sizeof(u32),
                                                    GFP_USER);
                         if (!rule_buf)
                                 return -ENOMEM;
@@ -397,7 +397,7 @@ static noinline_for_stack int ethtool_get_rxfh_indir(struct net_device *dev,
             (KMALLOC_MAX_SIZE - sizeof(*indir)) / sizeof(*indir->ring_index))
                 return -ENOMEM;
         full_size = sizeof(*indir) + sizeof(*indir->ring_index) * table_size;
-       indir = kmalloc(full_size, GFP_USER);
+       indir = kzalloc(full_size, GFP_USER);
         if (!indir)
                 return -ENOMEM;
  
@@ -538,7 +538,7 @@ static int ethtool_get_rx_ntuple(struct net_device *dev, void __user *useraddr)
  
         gstrings.len = ret;
  
-       data = kmalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
+       data = kzalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
         if (!data)
                 return -ENOMEM;
  
@@ -775,7 +775,7 @@ static int ethtool_get_regs(struct net_device *dev, char __user *useraddr)
         if (regs.len > reglen)
                 regs.len = reglen;
  
-       regbuf = kmalloc(reglen, GFP_USER);
+       regbuf = kzalloc(reglen, GFP_USER);
         if (!regbuf)
                 return -ENOMEM;
  
diff --git a/net/core/stream.c b/net/core/stream.c
index d959e0f41528ce71d69f4aafea0f6d028d72607a..f5df85dcd20bc7f790aec8f58967f55e02586444 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -141,10 +141,10 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
  
                 set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
                 sk->sk_write_pending++;
-               sk_wait_event(sk, &current_timeo, !sk->sk_err &&
-                                                 !(sk->sk_shutdown & SEND_SHUTDOWN) &&
-                                                 sk_stream_memory_free(sk) &&
-                                                 vm_wait);
+               sk_wait_event(sk, &current_timeo, sk->sk_err ||
+                                                 (sk->sk_shutdown & SEND_SHUTDOWN) ||
+                                                 (sk_stream_memory_free(sk) &&
+                                                 !vm_wait));
                 sk->sk_write_pending--;
  
                 if (vm_wait) {
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 571f8950ed06f585f4dca482037d4b7985c256af..7cd7760144f7dd1998276f4e2976e4a4a1c8135d 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -217,6 +217,7 @@ config NET_IPIP
  
  config NET_IPGRE
         tristate "IP: GRE tunnels over IP"
+       depends on IPV6 || IPV6=n
         help
           Tunneling means encapsulating data of one protocol type within
           another protocol and sending it over a channel that understands the
@@ -412,7 +413,7 @@ config INET_XFRM_MODE_BEET
           If unsure, say Y.
  
  config INET_LRO
-       bool "Large Receive Offload (ipv4/tcp)"
+       tristate "Large Receive Offload (ipv4/tcp)"
         default y
         ---help---
           Support for Large Receive Offload (ipv4/tcp).
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 1fdcacd36ce75aa5eb12d69c3c6239237258882b..2a4bb76f2132957da25326ce98653b249d9ccaaf 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -834,7 +834,7 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
         int                     mark = 0;
  
  
-       if (len == 8 || IGMP_V2_SEEN(in_dev)) {
+       if (len == 8) {
                 if (ih->code == 0) {
                         /* Alas, old v1 router presents here. */
  
@@ -856,6 +856,18 @@ static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
                 igmpv3_clear_delrec(in_dev);
         } else if (len < 12) {
                 return; /* ignore bogus packet; freed by caller */
+       } else if (IGMP_V1_SEEN(in_dev)) {
+               /* This is a v3 query with v1 queriers present */
+               max_delay = IGMP_Query_Response_Interval;
+               group = 0;
+       } else if (IGMP_V2_SEEN(in_dev)) {
+               /* this is a v3 query with v2 queriers present;
+                * Interpretation of the max_delay code is problematic here.
+                * A real v2 host would use ih_code directly, while v3 has a
+                * different encoding. We use the v3 encoding as more likely
+                * to be intended in a v3 query.
+                */
+               max_delay = IGMPV3_MRC(ih3->code)*(HZ/IGMP_TIMER_SCALE);
         } else { /* v3 */
                 if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)))
                         return;
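The igmp.c hunk above decodes the Max Resp Code of a v3 query with IGMPV3_MRC(). As a rough illustration of the encoding described in RFC 3376, section 4.1.1, the sketch below decodes a code into units of 1/10 second. `igmpv3_decode_mrc()` is a hypothetical userspace helper, not the kernel macro, and it omits the scaling to jiffies that the patch applies with `HZ/IGMP_TIMER_SCALE`.

```c
#include <assert.h>

/*
 * Decode an IGMPv3 Max Resp Code (one octet) into tenths of a second.
 * Codes below 128 are taken literally; codes of 128 and above are a
 * small floating-point value: 1 | exp (3 bits) | mant (4 bits).
 */
static unsigned int igmpv3_decode_mrc(unsigned int code)
{
	unsigned int mant, exp;

	assert(code <= 0xff);             /* the field is a single octet */

	if (code < 128)
		return code;
	mant = code & 0x0f;
	exp = (code >> 4) & 0x07;
	return (mant | 0x10) << (exp + 3);
}
```

A v2 host reading the same octet would use it directly, which is why the comment in the patch calls the interpretation problematic when v2 queriers are present: the two readings diverge for any code of 128 or above.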
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
index 244f7cb08d681d35f9a08744ca449ce90f5b8972..37f8adb68c79e8619d1a45a496ca6e1ef37c7c8a 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
@@ -11,6 +11,7 @@
  #include <linux/proc_fs.h>
  #include <linux/seq_file.h>
  #include <linux/percpu.h>
+#include <linux/security.h>
  #include <net/net_namespace.h>
  
  #include <linux/netfilter.h>
@@ -87,6 +88,29 @@ static void ct_seq_stop(struct seq_file *s, void *v)
         rcu_read_unlock();
  }
  
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+static int ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
+{
+       int ret;
+       u32 len;
+       char *secctx;
+
+       ret = security_secid_to_secctx(ct->secmark, &secctx, &len);
+       if (ret)
+               return ret;
+
+       ret = seq_printf(s, "secctx=%s ", secctx);
+
+       security_release_secctx(secctx, len);
+       return ret;
+}
+#else
+static inline int ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
+{
+       return 0;
+}
+#endif
+
  static int ct_seq_show(struct seq_file *s, void *v)
  {
         struct nf_conntrack_tuple_hash *hash = v;
@@ -148,10 +172,8 @@ static int ct_seq_show(struct seq_file *s, void *v)
                 goto release;
  #endif
  
-#ifdef CONFIG_NF_CONNTRACK_SECMARK
-       if (seq_printf(s, "secmark=%u ", ct->secmark))
+       if (ct_show_secctx(s, ct))
                 goto release;
-#endif
  
         if (seq_printf(s, "use=%u\n", atomic_read(&ct->ct_general.use)))
                 goto release;
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index c35b469e851c298814d69583bd593ec1c580dde5..74c54b30600f618522e07581c43130eea031f2db 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -135,13 +135,16 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
  
  /* This function calculates a "timeout" which is equivalent to the timeout of a
   * TCP connection after "boundary" unsuccessful, exponentially backed-off
- * retransmissions with an initial RTO of TCP_RTO_MIN.
+ * retransmissions with an initial RTO of TCP_RTO_MIN or TCP_TIMEOUT_INIT if
+ * syn_set flag is set.
   */
  static bool retransmits_timed_out(struct sock *sk,
-                                 unsigned int boundary)
+                                 unsigned int boundary,
+                                 bool syn_set)
  {
         unsigned int timeout, linear_backoff_thresh;
         unsigned int start_ts;
+       unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : TCP_RTO_MIN;
  
         if (!inet_csk(sk)->icsk_retransmits)
                 return false;
@@ -151,12 +154,12 @@ static bool retransmits_timed_out(struct sock *sk,
         else
                 start_ts = tcp_sk(sk)->retrans_stamp;
  
-       linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
+       linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
  
         if (boundary <= linear_backoff_thresh)
-               timeout = ((2 << boundary) - 1) * TCP_RTO_MIN;
+               timeout = ((2 << boundary) - 1) * rto_base;
         else
-               timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
+               timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
                           (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
  
         return (tcp_time_stamp - start_ts) >= timeout;
@@ -167,14 +170,15 @@ static int tcp_write_timeout(struct sock *sk)
  {
         struct inet_connection_sock *icsk = inet_csk(sk);
         int retry_until;
-       bool do_reset;
+       bool do_reset, syn_set = 0;
  
         if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
                 if (icsk->icsk_retransmits)
                         dst_negative_advice(sk);
                 retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
+               syn_set = 1;
         } else {
-               if (retransmits_timed_out(sk, sysctl_tcp_retries1)) {
+               if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0)) {
                         /* Black hole detection */
                         tcp_mtu_probing(icsk, sk);
  
@@ -187,14 +191,14 @@ static int tcp_write_timeout(struct sock *sk)
  
                         retry_until = tcp_orphan_retries(sk, alive);
                         do_reset = alive ||
-                                  !retransmits_timed_out(sk, retry_until);
+                                  !retransmits_timed_out(sk, retry_until, 0);
  
                         if (tcp_out_of_resources(sk, do_reset))
                                 return 1;
                 }
         }
  
-       if (retransmits_timed_out(sk, retry_until)) {
+       if (retransmits_timed_out(sk, retry_until, syn_set)) {
                 /* Has it gone just too far? */
                 tcp_write_err(sk);
                 return 1;
@@ -436,7 +440,7 @@ out_reset_timer:
                 icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
         }
         inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
-       if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
+       if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1, 0))
                 __sk_dst_reset(sk);
  
  out:;
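The effect of the new rto_base in retransmits_timed_out() above is easiest to see with numbers. The sketch below redoes the timeout formula in userspace, in milliseconds rather than jiffies. `retrans_timeout_ms()` and `ilog2_u()` are hypothetical names, and the values passed in the usage note (200 ms for TCP_RTO_MIN, 3 s for TCP_TIMEOUT_INIT, 120 s for TCP_RTO_MAX) are assumptions for illustration that should be checked against include/net/tcp.h for the tree in question.

```c
#include <assert.h>

/* Integer log2 (floor); stands in for the kernel's ilog2(). */
static unsigned int ilog2_u(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * Total time covered by `boundary` exponentially backed-off
 * retransmissions starting from rto_base_ms and capped at rto_max_ms.
 */
static unsigned int retrans_timeout_ms(unsigned int boundary,
				       unsigned int rto_base_ms,
				       unsigned int rto_max_ms)
{
	unsigned int thresh;

	assert(rto_base_ms != 0 && rto_base_ms <= rto_max_ms);

	/* Number of doublings before the RTO hits the cap. */
	thresh = ilog2_u(rto_max_ms / rto_base_ms);

	if (boundary <= thresh)
		return ((2u << boundary) - 1) * rto_base_ms;
	return ((2u << thresh) - 1) * rto_base_ms +
	       (boundary - thresh) * rto_max_ms;
}
```

Under those assumed values, five SYN retries give `retrans_timeout_ms(5, 3000, 120000)`, which is 189000 ms, whereas the old formula based on the 200 ms minimum gives only 12600 ms for the same boundary. That gap is the reason the patch passes syn_set for sockets in SYN_SENT or SYN_RECV.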
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8323136bdc54cc24f81acf73acf0d2c256ff8203..a275c6e1e25c23884d7d1859e46a2ee82c00acef 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1556,14 +1556,13 @@ out:
   *     i.e. Path MTU discovery
   */
  
-void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
-                       struct net_device *dev, u32 pmtu)
+static void rt6_do_pmtu_disc(struct in6_addr *daddr, struct in6_addr *saddr,
+                            struct net *net, u32 pmtu, int ifindex)
  {
         struct rt6_info *rt, *nrt;
-       struct net *net = dev_net(dev);
         int allfrag = 0;
  
-       rt = rt6_lookup(net, daddr, saddr, dev->ifindex, 0);
+       rt = rt6_lookup(net, daddr, saddr, ifindex, 0);
         if (rt == NULL)
                 return;
  
@@ -1631,6 +1630,27 @@ out:
         dst_release(&rt->dst);
  }
  
+void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
+                       struct net_device *dev, u32 pmtu)
+{
+       struct net *net = dev_net(dev);
+
+       /*
+        * RFC 1981 states that a node "MUST reduce the size of the packets it
+        * is sending along the path" that caused the Packet Too Big message.
+        * Since it's not possible in the general case to determine which
+        * interface was used to send the original packet, we update the MTU
+        * on the interface that will be used to send future packets. We also
+        * update the MTU on the interface that received the Packet Too Big in
+        * case the original packet was forced out that interface with
+        * SO_BINDTODEVICE or similar. This is the next best thing to the
+        * correct behaviour, which would be to update the MTU on all
+        * interfaces.
+        */
+       rt6_do_pmtu_disc(daddr, saddr, net, pmtu, 0);
+       rt6_do_pmtu_disc(daddr, saddr, net, pmtu, dev->ifindex);
+}
+
  /*
   *     Misc support functions
   */
diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index c893f236acea771076b5572b42c82162c3684913..8f23401832b7729d28c9e0791a78ce8acd797728 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -175,6 +175,8 @@ int ___ieee80211_stop_tx_ba_session(struct sta_info *sta, u16 tid,
  
         set_bit(HT_AGG_STATE_STOPPING, &tid_tx->state);
  
+       del_timer_sync(&tid_tx->addba_resp_timer);
+
         /*
          * After this packets are no longer handed right through
          * to the driver but are put onto tid_tx->pending instead,
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index fa0f37e4afe4901226b0ccd668ae6a88c136eeb2..28624282c5f36ad5bed8f74c0e8bf7df42ee2d6c 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2199,9 +2199,6 @@ static void ieee80211_rx_cooked_monitor(struct ieee80211_rx_data *rx,
         struct net_device *prev_dev = NULL;
         struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
  
-       if (status->flag & RX_FLAG_INTERNAL_CMTR)
-               goto out_free_skb;
-
         if (skb_headroom(skb) < sizeof(*rthdr) &&
             pskb_expand_head(skb, sizeof(*rthdr), 0, GFP_ATOMIC))
                 goto out_free_skb;
@@ -2260,7 +2257,6 @@ static void ieee80211_rx_cooked_monitor(struct ieee80211_rx_data *rx,
         } else
                 goto out_free_skb;
  
-       status->flag |= RX_FLAG_INTERNAL_CMTR;
         return;
  
   out_free_skb:
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 10caec5ea8fa7740d9617605adf1590eefa2a730..34da67995d94ae91c8982776fd3e9104c161079f 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -377,7 +377,7 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
                                 skb2 = skb_clone(skb, GFP_ATOMIC);
                                 if (skb2) {
                                         skb2->dev = prev_dev;
-                                       netif_receive_skb(skb2);
+                                       netif_rx(skb2);
                                 }
                         }
  
@@ -386,7 +386,7 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
         }
         if (prev_dev) {
                 skb->dev = prev_dev;
-               netif_receive_skb(skb);
+               netif_rx(skb);
                 skb = NULL;
         }
         rcu_read_unlock();
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 5bae1cd15eea93ee3f74cb51dab972c10c96d33c..146476c6441a9ea8894d78bc5a00c558c84a0874 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -22,6 +22,7 @@
  #include <linux/rculist_nulls.h>
  #include <linux/types.h>
  #include <linux/timer.h>
+#include <linux/security.h>
  #include <linux/skbuff.h>
  #include <linux/errno.h>
  #include <linux/netlink.h>
@@ -245,16 +246,31 @@ nla_put_failure:
  
  #ifdef CONFIG_NF_CONNTRACK_SECMARK
  static inline int
-ctnetlink_dump_secmark(struct sk_buff *skb, const struct nf_conn *ct)
+ctnetlink_dump_secctx(struct sk_buff *skb, const struct nf_conn *ct)
  {
-       NLA_PUT_BE32(skb, CTA_SECMARK, htonl(ct->secmark));
-       return 0;
+       struct nlattr *nest_secctx;
+       int len, ret;
+       char *secctx;
+
+       ret = security_secid_to_secctx(ct->secmark, &secctx, &len);
+       if (ret)
+               return ret;
+
+       ret = -1;
+       nest_secctx = nla_nest_start(skb, CTA_SECCTX | NLA_F_NESTED);
+       if (!nest_secctx)
+               goto nla_put_failure;
+
+       NLA_PUT_STRING(skb, CTA_SECCTX_NAME, secctx);
+       nla_nest_end(skb, nest_secctx);
  
+       ret = 0;
  nla_put_failure:
-       return -1;
+       security_release_secctx(secctx, len);
+       return ret;
  }
  #else
-#define ctnetlink_dump_secmark(a, b) (0)
+#define ctnetlink_dump_secctx(a, b) (0)
  #endif
  
  #define master_tuple(ct) &(ct->master->tuplehash[IP_CT_DIR_ORIGINAL].tuple)
@@ -391,7 +407,7 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 pid, u32 seq,
             ctnetlink_dump_protoinfo(skb, ct) < 0 ||
             ctnetlink_dump_helpinfo(skb, ct) < 0 ||
             ctnetlink_dump_mark(skb, ct) < 0 ||
-           ctnetlink_dump_secmark(skb, ct) < 0 ||
+           ctnetlink_dump_secctx(skb, ct) < 0 ||
             ctnetlink_dump_id(skb, ct) < 0 ||
             ctnetlink_dump_use(skb, ct) < 0 ||
             ctnetlink_dump_master(skb, ct) < 0 ||
@@ -437,6 +453,17 @@ ctnetlink_counters_size(const struct nf_conn *ct)
                ;
  }
  
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+static int ctnetlink_nlmsg_secctx_size(const struct nf_conn *ct)
+{
+       int len;
+
+       security_secid_to_secctx(ct->secmark, NULL, &len);
+
+       return sizeof(char) * len;
+}
+#endif
+
  static inline size_t
  ctnetlink_nlmsg_size(const struct nf_conn *ct)
  {
@@ -453,7 +480,8 @@ ctnetlink_nlmsg_size(const struct nf_conn *ct)
                + nla_total_size(0) /* CTA_HELP */
                + nla_total_size(NF_CT_HELPER_NAME_LEN) /* CTA_HELP_NAME */
  #ifdef CONFIG_NF_CONNTRACK_SECMARK
-              + nla_total_size(sizeof(u_int32_t)) /* CTA_SECMARK */
+              + nla_total_size(0) /* CTA_SECCTX */
+              + nla_total_size(ctnetlink_nlmsg_secctx_size(ct)) /* CTA_SECCTX_NAME */
  #endif
  #ifdef CONFIG_NF_NAT_NEEDED
                + 2 * nla_total_size(0) /* CTA_NAT_SEQ_ADJ_ORIG|REPL */
@@ -556,7 +584,7 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
  
  #ifdef CONFIG_NF_CONNTRACK_SECMARK
                 if ((events & (1 << IPCT_SECMARK) || ct->secmark)
-                   && ctnetlink_dump_secmark(skb, ct) < 0)
+                   && ctnetlink_dump_secctx(skb, ct) < 0)
                         goto nla_put_failure;
  #endif
  
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index eb973fcd67ab4273a3cdee566a5f4a4fb44a24f2..0fb65705b44b522e3ba4de6c02d06f119f43f2d9 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -15,6 +15,7 @@
  #include <linux/seq_file.h>
  #include <linux/percpu.h>
  #include <linux/netdevice.h>
+#include <linux/security.h>
  #include <net/net_namespace.h>
  #ifdef CONFIG_SYSCTL
  #include <linux/sysctl.h>
@@ -108,6 +109,29 @@ static void ct_seq_stop(struct seq_file *s, void *v)
         rcu_read_unlock();
  }
  
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+static int ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
+{
+       int ret;
+       u32 len;
+       char *secctx;
+
+       ret = security_secid_to_secctx(ct->secmark, &secctx, &len);
+       if (ret)
+               return ret;
+
+       ret = seq_printf(s, "secctx=%s ", secctx);
+
+       security_release_secctx(secctx, len);
+       return ret;
+}
+#else
+static inline int ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
+{
+       return 0;
+}
+#endif
+
  /* return 0 on success, 1 in case of error */
  static int ct_seq_show(struct seq_file *s, void *v)
  {
@@ -168,10 +192,8 @@ static int ct_seq_show(struct seq_file *s, void *v)
                 goto release;
  #endif
  
-#ifdef CONFIG_NF_CONNTRACK_SECMARK
-       if (seq_printf(s, "secmark=%u ", ct->secmark))
+       if (ct_show_secctx(s, ct))
                 goto release;
-#endif
  
  #ifdef CONFIG_NF_CONNTRACK_ZONES
         if (seq_printf(s, "zone=%u ", nf_ct_zone(ct)))
diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index 0cb6053f02fdf04723254bfe90d8ca9bff29edfc..782e51986a6f670ce6c033c1a1d99e5a26d46885 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -9,7 +9,6 @@
  #include <linux/module.h>
  #include <linux/gfp.h>
  #include <linux/skbuff.h>
-#include <linux/selinux.h>
  #include <linux/netfilter_ipv4/ip_tables.h>
  #include <linux/netfilter_ipv6/ip6_tables.h>
  #include <linux/netfilter/x_tables.h>
diff --git a/net/netfilter/xt_SECMARK.c b/net/netfilter/xt_SECMARK.c
index 23b2d6c486b573927dcefd00b575546b35376bfa..9faf5e050b796186b3204a02ece181726a26cb1a 100644
--- a/net/netfilter/xt_SECMARK.c
+++ b/net/netfilter/xt_SECMARK.c
@@ -14,8 +14,8 @@
   */
  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
  #include <linux/module.h>
+#include <linux/security.h>
  #include <linux/skbuff.h>
-#include <linux/selinux.h>
  #include <linux/netfilter/x_tables.h>
  #include <linux/netfilter/xt_SECMARK.h>
  
@@ -39,9 +39,8 @@ secmark_tg(struct sk_buff *skb, const struct xt_action_param *par)
  
         switch (mode) {
         case SECMARK_MODE_SEL:
-               secmark = info->u.sel.selsid;
+               secmark = info->secid;
                 break;
-
         default:
                 BUG();
         }
@@ -50,33 +49,33 @@ secmark_tg(struct sk_buff *skb, const struct xt_action_param *par)
         return XT_CONTINUE;
  }
  
-static int checkentry_selinux(struct xt_secmark_target_info *info)
+static int checkentry_lsm(struct xt_secmark_target_info *info)
  {
         int err;
-       struct xt_secmark_target_selinux_info *sel = &info->u.sel;
  
-       sel->selctx[SECMARK_SELCTX_MAX - 1] = '\0';
+       info->secctx[SECMARK_SECCTX_MAX - 1] = '\0';
+       info->secid = 0;
  
-       err = selinux_string_to_sid(sel->selctx, &sel->selsid);
+       err = security_secctx_to_secid(info->secctx, strlen(info->secctx),
+                                      &info->secid);
         if (err) {
                 if (err == -EINVAL)
-                       pr_info("invalid SELinux context \'%s\'\n",
-                               sel->selctx);
+                       pr_info("invalid security context \'%s\'\n", info->secctx);
                 return err;
         }
  
-       if (!sel->selsid) {
-               pr_info("unable to map SELinux context \'%s\'\n", sel->selctx);
+       if (!info->secid) {
+               pr_info("unable to map security context \'%s\'\n", info->secctx);
                 return -ENOENT;
         }
  
-       err = selinux_secmark_relabel_packet_permission(sel->selsid);
+       err = security_secmark_relabel_packet(info->secid);
         if (err) {
                 pr_info("unable to obtain relabeling permission\n");
                 return err;
         }
  
-       selinux_secmark_refcount_inc();
+       security_secmark_refcount_inc();
         return 0;
  }
  
@@ -100,16 +99,16 @@ static int secmark_tg_check(const struct xt_tgchk_param *par)
  
         switch (info->mode) {
         case SECMARK_MODE_SEL:
-               err = checkentry_selinux(info);
-               if (err <= 0)
-                       return err;
                 break;
-
         default:
                 pr_info("invalid mode: %hu\n", info->mode);
                 return -EINVAL;
         }
  
+       err = checkentry_lsm(info);
+       if (err)
+               return err;
+
         if (!mode)
                 mode = info->mode;
         return 0;
@@ -119,7 +118,7 @@ static void secmark_tg_destroy(const struct xt_tgdtor_param *par)
  {
         switch (mode) {
         case SECMARK_MODE_SEL:
-               selinux_secmark_refcount_dec();
+               security_secmark_refcount_dec();
         }
  }
  
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index b2a3ae6cad78e28324e23b857dc0a5773f569786..15003021f4f0a8706e540150425b4f995dc582a6 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -225,12 +225,13 @@ static void pipe_grant_credits(struct sock *sk)
  static int pipe_rcv_status(struct sock *sk, struct sk_buff *skb)
  {
         struct pep_sock *pn = pep_sk(sk);
-       struct pnpipehdr *hdr = pnp_hdr(skb);
+       struct pnpipehdr *hdr;
         int wake = 0;
  
         if (!pskb_may_pull(skb, sizeof(*hdr) + 4))
                 return -EINVAL;
  
+       hdr = pnp_hdr(skb);
         if (hdr->data[0] != PN_PEP_TYPE_COMMON) {
                 LIMIT_NETDEBUG(KERN_DEBUG"Phonet unknown PEP type: %u\n",
                                 (unsigned)hdr->data[0]);
diff --git a/net/rds/page.c b/net/rds/page.c
index 595a952d4b17f069c60a457701d6e207f68e621b..1dfbfea12e9bc82ba8482277b4331289effa275b 100644
--- a/net/rds/page.c
+++ b/net/rds/page.c
@@ -57,30 +57,17 @@ int rds_page_copy_user(struct page *page, unsigned long offset,
         unsigned long ret;
         void *addr;
  
-       if (to_user)
+       addr = kmap(page);
+       if (to_user) {
                 rds_stats_add(s_copy_to_user, bytes);
-       else
+               ret = copy_to_user(ptr, addr + offset, bytes);
+       } else {
                 rds_stats_add(s_copy_from_user, bytes);
-
-       addr = kmap_atomic(page, KM_USER0);
-       if (to_user)
-               ret = __copy_to_user_inatomic(ptr, addr + offset, bytes);
-       else
-               ret = __copy_from_user_inatomic(addr + offset, ptr, bytes);
-       kunmap_atomic(addr, KM_USER0);
-
-       if (ret) {
-               addr = kmap(page);
-               if (to_user)
-                       ret = copy_to_user(ptr, addr + offset, bytes);
-               else
-                       ret = copy_from_user(addr + offset, ptr, bytes);
-               kunmap(page);
-               if (ret)
-                       return -EFAULT;
+               ret = copy_from_user(addr + offset, ptr, bytes);
         }
+       kunmap(page);
  
-       return 0;
+       return ret ? -EFAULT : 0;
  }
  EXPORT_SYMBOL_GPL(rds_page_copy_user);
  
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 7416a5c73b2a993550991ac66eca7cc254c6f2e6..b0c2a82178afa032ce1d09b0e9f400afb2b578f5 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -137,7 +137,7 @@ next_knode:
                         int toff = off + key->off + (off2 & key->offmask);
                         __be32 *data, _data;
  
-                       if (skb_headroom(skb) + toff < 0)
+                       if (skb_headroom(skb) + toff > INT_MAX)
                                 goto out;
  
                         data = skb_header_pointer(skb, toff, 4, &_data);
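
Note on the cls_u32 hunk above: the changed condition depends on C's integer conversion rules. Assuming skb_headroom() yields an unsigned int, the signed toff is converted to unsigned before the addition, so a "< 0" test on the sum can never be true, while a sum that has wrapped below zero appears as a value above INT_MAX. A minimal user-space sketch of that arithmetic follows; offset_out_of_range and its parameters are hypothetical names used only for illustration, not kernel API.

```c
#include <limits.h>

/*
 * Hypothetical stand-in for the check in the hunk above.
 * headroom plays the role of skb_headroom(skb) (assumed unsigned int),
 * toff is the signed offset computed by the classifier.
 */
static int offset_out_of_range(unsigned int headroom, int toff)
{
	/*
	 * toff is converted to unsigned int before the addition, so the
	 * sum is never negative. A negative toff whose magnitude exceeds
	 * headroom wraps around to a value above INT_MAX instead, which
	 * is what this comparison detects.
	 */
	return headroom + toff > (unsigned int)INT_MAX;
}
```

With 16 bytes of headroom, an offset of -32 is reported as out of range, while offsets of -16 or 100 are accepted.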
diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index 86366390038a1cae0b536874fd037464f2a1678f..ddbbf7c81fa1d62adf50600b6c788d6769ff5d32 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -543,16 +543,20 @@ struct sctp_hmac *sctp_auth_asoc_get_hmac(const struct sctp_association *asoc)
                 id = ntohs(hmacs->hmac_ids[i]);
  
                 /* Check the id is in the supported range */
-               if (id > SCTP_AUTH_HMAC_ID_MAX)
+               if (id > SCTP_AUTH_HMAC_ID_MAX) {
+                       id = 0;
                         continue;
+               }
  
                 /* See is we support the id.  Supported IDs have name and
                  * length fields set, so that we can allocated and use
                  * them.  We can safely just check for name, for without the
                  * name, we can't allocate the TFM.
                  */
-               if (!sctp_hmac_list[id].hmac_name)
+               if (!sctp_hmac_list[id].hmac_name) {
+                       id = 0;
                         continue;
+               }
  
                 break;
         }
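
Note on the sctp/auth.c hunk above: both rejection paths now reset id to 0 before continuing. The code after the loop is not shown here, so the following is only a sketch under the assumption that the caller treats id 0 as "nothing usable found"; without the reset, a rejected value from the final iteration would remain in id once the loop ends. The table, HMAC_ID_MAX and pick_hmac_id are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical upper bound, standing in for SCTP_AUTH_HMAC_ID_MAX. */
#define HMAC_ID_MAX 3

/* Hypothetical table: entries without a name count as unsupported. */
static const char *const hmac_names[HMAC_ID_MAX + 1] = {
	NULL, "first", NULL, "second"
};

/*
 * Return the first supported id from ids[], or 0 if none qualifies.
 * Resetting id before each "continue" keeps a rejected value from the
 * last iteration from being returned by accident.
 */
static unsigned int pick_hmac_id(const unsigned int *ids, size_t n)
{
	unsigned int id = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		id = ids[i];

		/* Out of the supported range. */
		if (id > HMAC_ID_MAX) {
			id = 0;
			continue;
		}

		/* In range, but no implementation registered. */
		if (!hmac_names[id]) {
			id = 0;
			continue;
		}

		break;
	}

	return id;
}
```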
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ca44917872d2553b98f4e2f3601f95fe0f2c414f..fbb70770ad05d05807d25b5527e5616a621f58e8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -916,6 +916,11 @@ SCTP_STATIC int sctp_setsockopt_bindx(struct sock* sk,
         /* Walk through the addrs buffer and count the number of addresses. */
         addr_buf = kaddrs;
         while (walk_size < addrs_size) {
+               if (walk_size + sizeof(sa_family_t) > addrs_size) {
+                       kfree(kaddrs);
+                       return -EINVAL;
+               }
+
                 sa_addr = (struct sockaddr *)addr_buf;
                 af = sctp_get_af_specific(sa_addr->sa_family);
  
@@ -1002,9 +1007,13 @@ static int __sctp_connect(struct sock* sk,
         /* Walk through the addrs buffer and count the number of addresses. */
         addr_buf = kaddrs;
         while (walk_size < addrs_size) {
+               if (walk_size + sizeof(sa_family_t) > addrs_size) {
+                       err = -EINVAL;
+                       goto out_free;
+               }
+
                 sa_addr = (union sctp_addr *)addr_buf;
                 af = sctp_get_af_specific(sa_addr->sa.sa_family);
-               port = ntohs(sa_addr->v4.sin_port);
  
                 /* If the address family is not supported or if this address
                  * causes the address buffer to overflow return EINVAL.
@@ -1014,6 +1023,8 @@ static int __sctp_connect(struct sock* sk,
                         goto out_free;
                 }
  
+               port = ntohs(sa_addr->v4.sin_port);
+
                 /* Save current address so we can work with it */
                 memcpy(&to, sa_addr, af->sockaddr_len);
  
diff --git a/samples/kfifo/dma-example.c b/samples/kfifo/dma-example.c
index ee03a4f0b64f4361af8c850b3123b8d457bec0bf..06473791c08adb7c5b0a7080ea9600d927c09d94 100644
--- a/samples/kfifo/dma-example.c
+++ b/samples/kfifo/dma-example.c
@@ -24,6 +24,7 @@ static int __init example_init(void)
  {
         int                     i;
         unsigned int            ret;
+       unsigned int            nents;
         struct scatterlist      sg[10];
  
         printk(KERN_INFO "DMA fifo test start\n");
@@ -61,9 +62,9 @@ static int __init example_init(void)
          * byte at the beginning, after the kfifo_skip().
          */
         sg_init_table(sg, ARRAY_SIZE(sg));
-       ret = kfifo_dma_in_prepare(&fifo, sg, ARRAY_SIZE(sg), FIFO_SIZE);
-       printk(KERN_INFO "DMA sgl entries: %d\n", ret);
-       if (!ret) {
+       nents = kfifo_dma_in_prepare(&fifo, sg, ARRAY_SIZE(sg), FIFO_SIZE);
+       printk(KERN_INFO "DMA sgl entries: %d\n", nents);
+       if (!nents) {
                 /* fifo is full and no sgl was created */
                 printk(KERN_WARNING "error kfifo_dma_in_prepare\n");
                 return -EIO;
@@ -71,7 +72,7 @@ static int __init example_init(void)
  
         /* receive data */
         printk(KERN_INFO "scatterlist for receive:\n");
-       for (i = 0; i < ARRAY_SIZE(sg); i++) {
+       for (i = 0; i < nents; i++) {
                 printk(KERN_INFO
                 "sg[%d] -> "
                 "page_link 0x%.8lx offset 0x%.8x length 0x%.8x\n",
@@ -91,16 +92,16 @@ static int __init example_init(void)
         kfifo_dma_in_finish(&fifo, ret);
  
         /* Prepare to transmit data, example: 8 bytes */
-       ret = kfifo_dma_out_prepare(&fifo, sg, ARRAY_SIZE(sg), 8);
-       printk(KERN_INFO "DMA sgl entries: %d\n", ret);
-       if (!ret) {
+       nents = kfifo_dma_out_prepare(&fifo, sg, ARRAY_SIZE(sg), 8);
+       printk(KERN_INFO "DMA sgl entries: %d\n", nents);
+       if (!nents) {
                 /* no data was available and no sgl was created */
                 printk(KERN_WARNING "error kfifo_dma_out_prepare\n");
                 return -EIO;
         }
  
         printk(KERN_INFO "scatterlist for transmit:\n");
-       for (i = 0; i < ARRAY_SIZE(sg); i++) {
+       for (i = 0; i < nents; i++) {
                 printk(KERN_INFO
                 "sg[%d] -> "
                 "page_link 0x%.8lx offset 0x%.8x length 0x%.8x\n",
diff --git a/scripts/kconfig/conf.c b/scripts/kconfig/conf.c
index 5b7c86ea43a1e3f27484ccfc83f8d2992ad548a4..7ef429cd5cb38f9bd07889aad8f5ab8f3983e27b 100644
--- a/scripts/kconfig/conf.c
+++ b/scripts/kconfig/conf.c
@@ -427,7 +427,7 @@ static void check_conf(struct menu *menu)
                                 if (sym->name && !sym_is_choice_value(sym)) {
                                         printf("CONFIG_%s\n", sym->name);
                                 }
-                       } else {
+                       } else if (input_mode != oldnoconfig) {
                                 if (!conf_cnt++)
                                         printf(_("*\n* Restart config...\n*\n"));
                                 rootEntry = menu_get_parent_menu(menu);
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 6ee2e4fb148146ace18b2e5a7414e6a89d5d3e2f..170459c224a13d94be08b00b72b63c1503303ed1 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -165,7 +165,6 @@ struct menu {
         struct symbol *sym;
         struct property *prompt;
         struct expr *dep;
-       struct expr *dir_dep;
         unsigned int flags;
         char *help;
         struct file *file;
diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c
index 4fb590247f330bb75215fe88eb749d42da8b489e..edda8b49619d9e66ff70e16dd8ef68caf4a86bc6 100644
--- a/scripts/kconfig/menu.c
+++ b/scripts/kconfig/menu.c
@@ -107,7 +107,6 @@ static struct expr *menu_check_dep(struct expr *e)
  void menu_add_dep(struct expr *dep)
  {
         current_entry->dep = expr_alloc_and(current_entry->dep, menu_check_dep(dep));
-       current_entry->dir_dep = current_entry->dep;
  }
  
  void menu_set_type(int type)
@@ -291,10 +290,6 @@ void menu_finalize(struct menu *parent)
                 for (menu = parent->list; menu; menu = menu->next)
                         menu_finalize(menu);
         } else if (sym) {
-               /* ignore inherited dependencies for dir_dep */
-               sym->dir_dep.expr = expr_transform(expr_copy(parent->dir_dep));
-               sym->dir_dep.expr = expr_eliminate_dups(sym->dir_dep.expr);
-
                 basedep = parent->prompt ? parent->prompt->visible.expr : NULL;
                 basedep = expr_trans_compare(basedep, E_UNEQUAL, &symbol_no);
                 basedep = expr_eliminate_dups(expr_transform(basedep));
@@ -325,6 +320,8 @@ void menu_finalize(struct menu *parent)
                         parent->next = last_menu->next;
                         last_menu->next = NULL;
                 }
+
+               sym->dir_dep.expr = parent->dep;
         }
         for (menu = parent->list; menu; menu = menu->next) {
                 if (sym && sym_is_choice(sym) &&
diff --git a/scripts/kconfig/symbol.c b/scripts/kconfig/symbol.c
index 943712ca6c0a6a0eb5bbf97dbba2a2d20e495b47..1f8b305449db354b103c6eb7ced09095c801dc1b 100644
--- a/scripts/kconfig/symbol.c
+++ b/scripts/kconfig/symbol.c
@@ -350,6 +350,7 @@ void sym_calc_value(struct symbol *sym)
                                 }
                         }
                 calc_newval:
+#if 0
                         if (sym->dir_dep.tri == no && sym->rev_dep.tri != no) {
                                 fprintf(stderr, "warning: (");
                                 expr_fprint(sym->rev_dep.expr, stderr);
@@ -358,6 +359,7 @@ void sym_calc_value(struct symbol *sym)
                                 expr_fprint(sym->dir_dep.expr, stderr);
                                 fprintf(stderr, ")\n");
                         }
+#endif
                         newval.tri = EXPR_OR(newval.tri, sym->rev_dep.tri);
                 }
                 if (newval.tri == mod && sym_get_type(sym) == S_BOOLEAN)
diff --git a/security/apparmor/.gitignore b/security/apparmor/.gitignore
index 0a0a99f3b08331f78f7499c5e964929af20e96c9..4d995aeaebc0ad0a3232faebead627da4ebb40b9 100644
--- a/security/apparmor/.gitignore
+++ b/security/apparmor/.gitignore
@@ -3,3 +3,4 @@
  #
  af_names.h
  capability_names.h
+rlim_names.h
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 7320331b44aba5bd52eac6b2a97302ad21a4ca5f..544ff5837cb640623dadd738fe8879d823574eda 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -29,7 +29,7 @@
   * aa_simple_write_to_buffer - common routine for getting policy from user
   * @op: operation doing the user buffer copy
   * @userbuf: user buffer to copy data from  (NOT NULL)
- * @alloc_size: size of user buffer
+ * @alloc_size: size of user buffer (REQUIRES: @alloc_size >= @copy_size)
   * @copy_size: size of data to copy from user buffer
   * @pos: position write is at in the file (NOT NULL)
   *
@@ -42,6 +42,8 @@ static char *aa_simple_write_to_buffer(int op, const char __user *userbuf,
  {
         char *data;
  
+       BUG_ON(copy_size > alloc_size);
+
         if (*pos != 0)
                 /* only writes from pos 0, that is complete writes */
                 return ERR_PTR(-ESPIPE);
diff --git a/security/capability.c b/security/capability.c
index 95a6599a37bb3ae0737779d16d9a0d811d5bfc82..30ae00fbecd591591acb55c1431d62a1bbbac427 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -677,7 +677,18 @@ static void cap_inet_conn_established(struct sock *sk, struct sk_buff *skb)
  {
  }
  
+static int cap_secmark_relabel_packet(u32 secid)
+{
+       return 0;
+}
  
+static void cap_secmark_refcount_inc(void)
+{
+}
+
+static void cap_secmark_refcount_dec(void)
+{
+}
  
  static void cap_req_classify_flow(const struct request_sock *req,
                                   struct flowi *fl)
@@ -777,7 +788,8 @@ static int cap_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
  
  static int cap_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid)
  {
-       return -EOPNOTSUPP;
+       *secid = 0;
+       return 0;
  }
  
  static void cap_release_secctx(char *secdata, u32 seclen)
@@ -1018,6 +1030,9 @@ void __init security_fixup_ops(struct security_operations *ops)
         set_to_cap_if_null(ops, inet_conn_request);
         set_to_cap_if_null(ops, inet_csk_clone);
         set_to_cap_if_null(ops, inet_conn_established);
+       set_to_cap_if_null(ops, secmark_relabel_packet);
+       set_to_cap_if_null(ops, secmark_refcount_inc);
+       set_to_cap_if_null(ops, secmark_refcount_dec);
         set_to_cap_if_null(ops, req_classify_flow);
         set_to_cap_if_null(ops, tun_dev_create);
         set_to_cap_if_null(ops, tun_dev_post_create);
diff --git a/security/commoncap.c b/security/commoncap.c
index 9d172e6e330c9fd7906a8a2e5754713f80dfb433..5e632b4857e443d8031eaa17c0e2bd7e877b3d14 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -719,14 +719,11 @@ static int cap_safe_nice(struct task_struct *p)
  /**
   * cap_task_setscheduler - Detemine if scheduler policy change is permitted
   * @p: The task to affect
- * @policy: The policy to effect
- * @lp: The parameters to the scheduling policy
   *
   * Detemine if the requested scheduler policy change is permitted for the
   * specified task, returning 0 if permission is granted, -ve if denied.
   */
-int cap_task_setscheduler(struct task_struct *p, int policy,
-                          struct sched_param *lp)
+int cap_task_setscheduler(struct task_struct *p)
  {
         return cap_safe_nice(p);
  }
diff --git a/security/security.c b/security/security.c
index c53949f17d9e0dddc0601032576ef2922fb88f86..b50f472061a43c6ec7fb53781c084fb2cb487dcb 100644
--- a/security/security.c
+++ b/security/security.c
@@ -89,20 +89,12 @@ __setup("security=", choose_lsm);
   * Return true if:
   *     -The passed LSM is the one chosen by user at boot time,
   *     -or the passed LSM is configured as the default and the user did not
- *      choose an alternate LSM at boot time,
- *     -or there is no default LSM set and the user didn't specify a
- *      specific LSM and we're the first to ask for registration permission,
- *     -or the passed LSM is currently loaded.
+ *      choose an alternate LSM at boot time.
   * Otherwise, return false.
   */
  int __init security_module_enable(struct security_operations *ops)
  {
-       if (!*chosen_lsm)
-               strncpy(chosen_lsm, ops->name, SECURITY_NAME_MAX);
-       else if (strncmp(ops->name, chosen_lsm, SECURITY_NAME_MAX))
-               return 0;
-
-       return 1;
+       return !strcmp(ops->name, chosen_lsm);
  }
  
  /**
@@ -786,10 +778,9 @@ int security_task_setrlimit(struct task_struct *p, unsigned int resource,
         return security_ops->task_setrlimit(p, resource, new_rlim);
  }
  
-int security_task_setscheduler(struct task_struct *p,
-                               int policy, struct sched_param *lp)
+int security_task_setscheduler(struct task_struct *p)
  {
-       return security_ops->task_setscheduler(p, policy, lp);
+       return security_ops->task_setscheduler(p);
  }
  
  int security_task_getscheduler(struct task_struct *p)
@@ -1145,6 +1136,24 @@ void security_inet_conn_established(struct sock *sk,
         security_ops->inet_conn_established(sk, skb);
  }
  
+int security_secmark_relabel_packet(u32 secid)
+{
+       return security_ops->secmark_relabel_packet(secid);
+}
+EXPORT_SYMBOL(security_secmark_relabel_packet);
+
+void security_secmark_refcount_inc(void)
+{
+       security_ops->secmark_refcount_inc();
+}
+EXPORT_SYMBOL(security_secmark_refcount_inc);
+
+void security_secmark_refcount_dec(void)
+{
+       security_ops->secmark_refcount_dec();
+}
+EXPORT_SYMBOL(security_secmark_refcount_dec);
+
  int security_tun_dev_create(void)
  {
         return security_ops->tun_dev_create();
diff --git a/security/selinux/Makefile b/security/selinux/Makefile
index 58d80f3bd6f681f6d366f5b67becd7ff433ce2c7..ad5cd76ec231cd14f02b2fb15f07a3d8a069972f 100644
--- a/security/selinux/Makefile
+++ b/security/selinux/Makefile
@@ -2,25 +2,20 @@
  # Makefile for building the SELinux module as part of the kernel tree.
  #
  
-obj-$(CONFIG_SECURITY_SELINUX) := selinux.o ss/
-
-selinux-y := avc.o \
-            hooks.o \
-            selinuxfs.o \
-            netlink.o \
-            nlmsgtab.o \
-            netif.o \
-            netnode.o \
-            netport.o \
-            exports.o
+obj-$(CONFIG_SECURITY_SELINUX) := selinux.o
+
+selinux-y := avc.o hooks.o selinuxfs.o netlink.o nlmsgtab.o netif.o \
+            netnode.o netport.o exports.o \
+            ss/ebitmap.o ss/hashtab.o ss/symtab.o ss/sidtab.o ss/avtab.o \
+            ss/policydb.o ss/services.o ss/conditional.o ss/mls.o ss/status.o
  
  selinux-$(CONFIG_SECURITY_NETWORK_XFRM) += xfrm.o
  
  selinux-$(CONFIG_NETLABEL) += netlabel.o
  
-EXTRA_CFLAGS += -Isecurity/selinux -Isecurity/selinux/include
+ccflags-y := -Isecurity/selinux -Isecurity/selinux/include
  
-$(obj)/avc.o: $(obj)/flask.h
+$(addprefix $(obj)/,$(selinux-y)): $(obj)/flask.h
  
  quiet_cmd_flask = GEN     $(obj)/flask.h $(obj)/av_permissions.h
        cmd_flask = scripts/selinux/genheaders/genheaders $(obj)/flask.h $(obj)/av_permissions.h
diff --git a/security/selinux/exports.c b/security/selinux/exports.c
index c0a454aee1e03cb0e3825a2c5f965c941636c4d9..90664385dead0df01f6eec2b7479c83f4f021160 100644
--- a/security/selinux/exports.c
+++ b/security/selinux/exports.c
@@ -11,58 +11,9 @@
   * it under the terms of the GNU General Public License version 2,
   * as published by the Free Software Foundation.
   */
-#include <linux/types.h>
-#include <linux/kernel.h>
  #include <linux/module.h>
-#include <linux/selinux.h>
-#include <linux/fs.h>
-#include <linux/ipc.h>
-#include <asm/atomic.h>
  
  #include "security.h"
-#include "objsec.h"
-
-/* SECMARK reference count */
-extern atomic_t selinux_secmark_refcount;
-
-int selinux_string_to_sid(char *str, u32 *sid)
-{
-       if (selinux_enabled)
-               return security_context_to_sid(str, strlen(str), sid);
-       else {
-               *sid = 0;
-               return 0;
-       }
-}
-EXPORT_SYMBOL_GPL(selinux_string_to_sid);
-
-int selinux_secmark_relabel_packet_permission(u32 sid)
-{
-       if (selinux_enabled) {
-               const struct task_security_struct *__tsec;
-               u32 tsid;
-
-               __tsec = current_security();
-               tsid = __tsec->sid;
-
-               return avc_has_perm(tsid, sid, SECCLASS_PACKET,
-                                   PACKET__RELABELTO, NULL);
-       }
-       return 0;
-}
-EXPORT_SYMBOL_GPL(selinux_secmark_relabel_packet_permission);
-
-void selinux_secmark_refcount_inc(void)
-{
-       atomic_inc(&selinux_secmark_refcount);
-}
-EXPORT_SYMBOL_GPL(selinux_secmark_refcount_inc);
-
-void selinux_secmark_refcount_dec(void)
-{
-       atomic_dec(&selinux_secmark_refcount);
-}
-EXPORT_SYMBOL_GPL(selinux_secmark_refcount_dec);
  
  bool selinux_is_enabled(void)
  {
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c

index 4796ddd4e721ae454a02563d713aa235870ece02..d9154cf90ae19cd4eb5f40d65882abb60781da3d 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3354,11 +3354,11 @@ static int selinux_task_setrlimit(struct task_struct *p, unsigned int resource,
         return 0;
  }
  
-static int selinux_task_setscheduler(struct task_struct *p, int policy, struct sched_param *lp)
+static int selinux_task_setscheduler(struct task_struct *p)
  {
         int rc;
  
-       rc = cap_task_setscheduler(p, policy, lp);
+       rc = cap_task_setscheduler(p);
         if (rc)
                 return rc;
  
@@ -4279,6 +4279,27 @@ static void selinux_inet_conn_established(struct sock *sk, struct sk_buff *skb)
         selinux_skb_peerlbl_sid(skb, family, &sksec->peer_sid);
  }
  
+static int selinux_secmark_relabel_packet(u32 sid)
+{
+       const struct task_security_struct *__tsec;
+       u32 tsid;
+
+       __tsec = current_security();
+       tsid = __tsec->sid;
+
+       return avc_has_perm(tsid, sid, SECCLASS_PACKET, PACKET__RELABELTO, NULL);
+}
+
+static void selinux_secmark_refcount_inc(void)
+{
+       atomic_inc(&selinux_secmark_refcount);
+}
+
+static void selinux_secmark_refcount_dec(void)
+{
+       atomic_dec(&selinux_secmark_refcount);
+}
+
  static void selinux_req_classify_flow(const struct request_sock *req,
                                       struct flowi *fl)
  {
@@ -5533,6 +5554,9 @@ static struct security_operations selinux_ops = {
         .inet_conn_request =            selinux_inet_conn_request,
         .inet_csk_clone =               selinux_inet_csk_clone,
         .inet_conn_established =        selinux_inet_conn_established,
+       .secmark_relabel_packet =       selinux_secmark_relabel_packet,
+       .secmark_refcount_inc =         selinux_secmark_refcount_inc,
+       .secmark_refcount_dec =         selinux_secmark_refcount_dec,
         .req_classify_flow =            selinux_req_classify_flow,
         .tun_dev_create =               selinux_tun_dev_create,
         .tun_dev_post_create =          selinux_tun_dev_post_create,
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h

index b4c9eb4bd6f9127a506e2a4483c592362c8fcafa..8858d2b2d4b6ad1dd1b005b20a06afa4a3505d03 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -17,7 +17,7 @@ struct security_class_mapping secclass_map[] = {
           { "compute_av", "compute_create", "compute_member",
             "check_context", "load_policy", "compute_relabel",
             "compute_user", "setenforce", "setbool", "setsecparam",
-           "setcheckreqprot", NULL } },
+           "setcheckreqprot", "read_policy", NULL } },
         { "process",
           { "fork", "transition", "sigchld", "sigkill",
             "sigstop", "signull", "signal", "ptrace", "getsched", "setsched",
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h

index 1f7c2491d3dccbc54769a6ccaf509d50255cfe3f..671273eb1115c4e7f05983af071069aea7535650 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -9,6 +9,7 @@
  #define _SELINUX_SECURITY_H_
  
  #include <linux/magic.h>
+#include <linux/types.h>
  #include "flask.h"
  
  #define SECSID_NULL                    0x00000000 /* unspecified SID */
@@ -82,6 +83,8 @@ extern int selinux_policycap_openperm;
  int security_mls_enabled(void);
  
  int security_load_policy(void *data, size_t len);
+int security_read_policy(void **data, ssize_t *len);
+size_t security_policydb_len(void);
  
  int security_policycap_supported(unsigned int req_cap);
  
@@ -191,5 +194,25 @@ static inline int security_netlbl_sid_to_secattr(u32 sid,
  
  const char *security_get_initial_sid_context(u32 sid);
  
+/*
+ * status notifier using mmap interface
+ */
+extern struct page *selinux_kernel_status_page(void);
+
+#define SELINUX_KERNEL_STATUS_VERSION  1
+struct selinux_kernel_status {
+       u32     version;        /* version number of this structure */
+       u32     sequence;       /* sequence number of seqlock logic */
+       u32     enforcing;      /* current setting of enforcing mode */
+       u32     policyload;     /* number of policy reloads */
+       u32     deny_unknown;   /* current setting of deny_unknown */
+       /*
+        * The version > 0 supports above members.
+        */
+} __attribute__((packed));
+
+extern void selinux_status_update_setenforce(int enforcing);
+extern void selinux_status_update_policyload(int seqno);
+
  #endif /* _SELINUX_SECURITY_H_ */
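
The `sequence` member above is described as the "sequence number of seqlock logic". As a minimal userspace sketch of how a reader might take a consistent snapshot of such a page, assuming the usual seqlock convention that an odd sequence value means an update is in progress (the hunk itself does not state this), something like the following could work. The helper name `status_snapshot` and the `max_tries` parameter are hypothetical, and the struct is a userspace mirror using `uint32_t` in place of `u32`.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of struct selinux_kernel_status from the hunk above. */
struct selinux_kernel_status {
	uint32_t version;
	uint32_t sequence;
	uint32_t enforcing;
	uint32_t policyload;
	uint32_t deny_unknown;
} __attribute__((packed));

/*
 * Hypothetical helper: copy a consistent snapshot of *st into *out.
 * Returns 0 on success, or -1 if no stable snapshot was seen within
 * max_tries attempts.
 */
static int status_snapshot(const volatile struct selinux_kernel_status *st,
			   struct selinux_kernel_status *out, int max_tries)
{
	assert(st && out);

	for (int i = 0; i < max_tries; i++) {
		uint32_t seq = st->sequence;

		/* Assumed convention: odd means a writer is mid-update. */
		if (seq & 1)
			continue;

		__sync_synchronize();	/* order the field reads after seq */
		out->version = st->version;
		out->enforcing = st->enforcing;
		out->policyload = st->policyload;
		out->deny_unknown = st->deny_unknown;
		__sync_synchronize();	/* order the re-check after the reads */

		/* Unchanged sequence: no writer interleaved with the copy. */
		if (st->sequence == seq) {
			out->sequence = seq;
			return 0;
		}
	}
	return -1;
}
```

In real use `st` would point at a read-only mapping of the status file; here it can be any struct in memory, which keeps the sketch testable without selinuxfs.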
  
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c

index 79a1bb635662fbc7f65a306e10b5b67e534fcf1c..87e0556bae70ff977ea290b3cdfcc2c308d8edf5 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -68,6 +68,8 @@ static int *bool_pending_values;
  static struct dentry *class_dir;
  static unsigned long last_class_ino;
  
+static char policy_opened;
+
  /* global data for policy capabilities */
  static struct dentry *policycap_dir;
  
@@ -110,6 +112,8 @@ enum sel_inos {
         SEL_COMPAT_NET, /* whether to use old compat network packet controls */
         SEL_REJECT_UNKNOWN, /* export unknown reject handling to userspace */
         SEL_DENY_UNKNOWN, /* export unknown deny handling to userspace */
+       SEL_STATUS,     /* export current status using mmap() */
+       SEL_POLICY,     /* allow userspace to read the in kernel policy */
         SEL_INO_NEXT,   /* The next inode number to use */
  };
  
@@ -171,6 +175,7 @@ static ssize_t sel_write_enforce(struct file *file, const char __user *buf,
                 if (selinux_enforcing)
                         avc_ss_reset(0);
                 selnl_notify_setenforce(selinux_enforcing);
+               selinux_status_update_setenforce(selinux_enforcing);
         }
         length = count;
  out:
@@ -205,6 +210,59 @@ static const struct file_operations sel_handle_unknown_ops = {
         .llseek         = generic_file_llseek,
  };
  
+static int sel_open_handle_status(struct inode *inode, struct file *filp)
+{
+       struct page    *status = selinux_kernel_status_page();
+
+       if (!status)
+               return -ENOMEM;
+
+       filp->private_data = status;
+
+       return 0;
+}
+
+static ssize_t sel_read_handle_status(struct file *filp, char __user *buf,
+                                     size_t count, loff_t *ppos)
+{
+       struct page    *status = filp->private_data;
+
+       BUG_ON(!status);
+
+       return simple_read_from_buffer(buf, count, ppos,
+                                      page_address(status),
+                                      sizeof(struct selinux_kernel_status));
+}
+
+static int sel_mmap_handle_status(struct file *filp,
+                                 struct vm_area_struct *vma)
+{
+       struct page    *status = filp->private_data;
+       unsigned long   size = vma->vm_end - vma->vm_start;
+
+       BUG_ON(!status);
+
+       /* only allows one page from the head */
+       if (vma->vm_pgoff > 0 || size != PAGE_SIZE)
+               return -EIO;
+       /* disallow writable mapping */
+       if (vma->vm_flags & VM_WRITE)
+               return -EPERM;
+       /* prevent mprotect() from making the mapping writable */
+       vma->vm_flags &= ~VM_MAYWRITE;
+
+       return remap_pfn_range(vma, vma->vm_start,
+                              page_to_pfn(status),
+                              size, vma->vm_page_prot);
+}
+
+static const struct file_operations sel_handle_status_ops = {
+       .open           = sel_open_handle_status,
+       .read           = sel_read_handle_status,
+       .mmap           = sel_mmap_handle_status,
+       .llseek         = generic_file_llseek,
+};
+
  #ifdef CONFIG_SECURITY_SELINUX_DISABLE
  static ssize_t sel_write_disable(struct file *file, const char __user *buf,
                                  size_t count, loff_t *ppos)
@@ -296,6 +354,141 @@ static const struct file_operations sel_mls_ops = {
         .llseek         = generic_file_llseek,
  };
  
+struct policy_load_memory {
+       size_t len;
+       void *data;
+};
+
+static int sel_open_policy(struct inode *inode, struct file *filp)
+{
+       struct policy_load_memory *plm = NULL;
+       int rc;
+
+       BUG_ON(filp->private_data);
+
+       mutex_lock(&sel_mutex);
+
+       rc = task_has_security(current, SECURITY__READ_POLICY);
+       if (rc)
+               goto err;
+
+       rc = -EBUSY;
+       if (policy_opened)
+               goto err;
+
+       rc = -ENOMEM;
+       plm = kzalloc(sizeof(*plm), GFP_KERNEL);
+       if (!plm)
+               goto err;
+
+       if (i_size_read(inode) != security_policydb_len()) {
+               mutex_lock(&inode->i_mutex);
+               i_size_write(inode, security_policydb_len());
+               mutex_unlock(&inode->i_mutex);
+       }
+
+       rc = security_read_policy(&plm->data, &plm->len);
+       if (rc)
+               goto err;
+
+       policy_opened = 1;
+
+       filp->private_data = plm;
+
+       mutex_unlock(&sel_mutex);
+
+       return 0;
+err:
+       mutex_unlock(&sel_mutex);
+
+       if (plm)
+               vfree(plm->data);
+       kfree(plm);
+       return rc;
+}
+
+static int sel_release_policy(struct inode *inode, struct file *filp)
+{
+       struct policy_load_memory *plm = filp->private_data;
+
+       BUG_ON(!plm);
+
+       policy_opened = 0;
+
+       vfree(plm->data);
+       kfree(plm);
+
+       return 0;
+}
+
+static ssize_t sel_read_policy(struct file *filp, char __user *buf,
+                              size_t count, loff_t *ppos)
+{
+       struct policy_load_memory *plm = filp->private_data;
+       int ret;
+
+       mutex_lock(&sel_mutex);
+
+       ret = task_has_security(current, SECURITY__READ_POLICY);
+       if (ret)
+               goto out;
+
+       ret = simple_read_from_buffer(buf, count, ppos, plm->data, plm->len);
+out:
+       mutex_unlock(&sel_mutex);
+       return ret;
+}
+
+static int sel_mmap_policy_fault(struct vm_area_struct *vma,
+                                struct vm_fault *vmf)
+{
+       struct policy_load_memory *plm = vma->vm_file->private_data;
+       unsigned long offset;
+       struct page *page;
+
+       if (vmf->flags & (FAULT_FLAG_MKWRITE | FAULT_FLAG_WRITE))
+               return VM_FAULT_SIGBUS;
+
+       offset = vmf->pgoff << PAGE_SHIFT;
+       if (offset >= roundup(plm->len, PAGE_SIZE))
+               return VM_FAULT_SIGBUS;
+
+       page = vmalloc_to_page(plm->data + offset);
+       get_page(page);
+
+       vmf->page = page;
+
+       return 0;
+}
+
+static struct vm_operations_struct sel_mmap_policy_ops = {
+       .fault = sel_mmap_policy_fault,
+       .page_mkwrite = sel_mmap_policy_fault,
+};
+
+static int sel_mmap_policy(struct file *filp, struct vm_area_struct *vma)
+{
+       if (vma->vm_flags & VM_SHARED) {
+               /* do not allow mprotect to make mapping writable */
+               vma->vm_flags &= ~VM_MAYWRITE;
+
+               if (vma->vm_flags & VM_WRITE)
+                       return -EACCES;
+       }
+
+       vma->vm_flags |= VM_RESERVED;
+       vma->vm_ops = &sel_mmap_policy_ops;
+
+       return 0;
+}
+
+static const struct file_operations sel_policy_ops = {
+       .open           = sel_open_policy,
+       .read           = sel_read_policy,
+       .mmap           = sel_mmap_policy,
+       .release        = sel_release_policy,
+};
+
  static ssize_t sel_write_load(struct file *file, const char __user *buf,
                               size_t count, loff_t *ppos)
  
@@ -1612,6 +1805,8 @@ static int sel_fill_super(struct super_block *sb, void *data, int silent)
                 [SEL_CHECKREQPROT] = {"checkreqprot", &sel_checkreqprot_ops, S_IRUGO|S_IWUSR},
                 [SEL_REJECT_UNKNOWN] = {"reject_unknown", &sel_handle_unknown_ops, S_IRUGO},
                 [SEL_DENY_UNKNOWN] = {"deny_unknown", &sel_handle_unknown_ops, S_IRUGO},
+               [SEL_STATUS] = {"status", &sel_handle_status_ops, S_IRUGO},
+               [SEL_POLICY] = {"policy", &sel_policy_ops, S_IRUSR},
                 /* last one */ {""}
         };
         ret = simple_fill_super(sb, SELINUX_MAGIC, selinux_files);
diff --git a/security/selinux/ss/Makefile b/security/selinux/ss/Makefile

deleted file mode 100644

index 15d4e62..0000000
--- a/security/selinux/ss/Makefile
+++ /dev/null
@@ -1,9 +0,0 @@
-#
-# Makefile for building the SELinux security server as part of the kernel tree.
-#
-
-EXTRA_CFLAGS += -Isecurity/selinux -Isecurity/selinux/include
-obj-y := ss.o
-
-ss-y := ebitmap.o hashtab.o symtab.o sidtab.o avtab.o policydb.o services.o conditional.o mls.o
-
diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c

index 929480c6c4306e874eff82db107045b944cdd33b..a3dd9faa19c01eda269b13f7cfcd7ab6da6aa098 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -266,8 +266,8 @@ int avtab_alloc(struct avtab *h, u32 nrules)
         if (shift > 2)
                 shift = shift - 2;
         nslot = 1 << shift;
-       if (nslot > MAX_AVTAB_SIZE)
-               nslot = MAX_AVTAB_SIZE;
+       if (nslot > MAX_AVTAB_HASH_BUCKETS)
+               nslot = MAX_AVTAB_HASH_BUCKETS;
         mask = nslot - 1;
  
         h->htable = kcalloc(nslot, sizeof(*(h->htable)), GFP_KERNEL);
@@ -501,6 +501,48 @@ bad:
         goto out;
  }
  
+int avtab_write_item(struct policydb *p, struct avtab_node *cur, void *fp)
+{
+       __le16 buf16[4];
+       __le32 buf32[1];
+       int rc;
+
+       buf16[0] = cpu_to_le16(cur->key.source_type);
+       buf16[1] = cpu_to_le16(cur->key.target_type);
+       buf16[2] = cpu_to_le16(cur->key.target_class);
+       buf16[3] = cpu_to_le16(cur->key.specified);
+       rc = put_entry(buf16, sizeof(u16), 4, fp);
+       if (rc)
+               return rc;
+       buf32[0] = cpu_to_le32(cur->datum.data);
+       rc = put_entry(buf32, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+       return 0;
+}
+
+int avtab_write(struct policydb *p, struct avtab *a, void *fp)
+{
+       unsigned int i;
+       int rc = 0;
+       struct avtab_node *cur;
+       __le32 buf[1];
+
+       buf[0] = cpu_to_le32(a->nel);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+
+       for (i = 0; i < a->nslot; i++) {
+               for (cur = a->htable[i]; cur; cur = cur->next) {
+                       rc = avtab_write_item(p, cur, fp);
+                       if (rc)
+                               return rc;
+               }
+       }
+
+       return rc;
+}
  void avtab_cache_init(void)
  {
         avtab_node_cachep = kmem_cache_create("avtab_node",
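
The `avtab_write_item()` function added above emits each rule as four little-endian 16-bit key fields followed by one little-endian 32-bit datum, 12 bytes in total. A minimal userspace sketch of that record layout, under the assumption that the output is a plain byte buffer rather than the kernel's `put_entry()` target, might look like this. The names `struct av_item`, `put_le16`, `put_le32`, and `av_item_encode` are hypothetical stand-ins and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the key and datum fields written by
 * avtab_write_item(). */
struct av_item {
	uint16_t source_type;
	uint16_t target_type;
	uint16_t target_class;
	uint16_t specified;
	uint32_t data;
};

#define AV_ITEM_BYTES 12	/* 4 * sizeof(u16) + 1 * sizeof(u32) */

/* Store v at p in little-endian byte order, regardless of host order. */
static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Returns the number of bytes written, or 0 if buf is too small. */
static size_t av_item_encode(const struct av_item *it, uint8_t *buf,
			     size_t cap)
{
	assert(it && buf);

	if (cap < AV_ITEM_BYTES)
		return 0;

	put_le16(buf + 0, it->source_type);
	put_le16(buf + 2, it->target_type);
	put_le16(buf + 4, it->target_class);
	put_le16(buf + 6, it->specified);
	put_le32(buf + 8, it->data);
	return AV_ITEM_BYTES;
}
```

Writing bytes explicitly keeps the output identical on big-endian and little-endian hosts, which is the same goal the `cpu_to_le16()` and `cpu_to_le32()` conversions serve in the kernel code.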
diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h

index cd4f734e27499cf2f9e203f0edb2557cf02bc24c..dff0c75345c1642fd2b3181a6677035947834951 100644
--- a/security/selinux/ss/avtab.h
+++ b/security/selinux/ss/avtab.h
@@ -71,6 +71,8 @@ int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
                     void *p);
  
  int avtab_read(struct avtab *a, void *fp, struct policydb *pol);
+int avtab_write_item(struct policydb *p, struct avtab_node *cur, void *fp);
+int avtab_write(struct policydb *p, struct avtab *a, void *fp);
  
  struct avtab_node *avtab_insert_nonunique(struct avtab *h, struct avtab_key *key,
                                           struct avtab_datum *datum);
@@ -85,7 +87,6 @@ void avtab_cache_destroy(void);
  #define MAX_AVTAB_HASH_BITS 11
  #define MAX_AVTAB_HASH_BUCKETS (1 << MAX_AVTAB_HASH_BITS)
  #define MAX_AVTAB_HASH_MASK (MAX_AVTAB_HASH_BUCKETS-1)
-#define MAX_AVTAB_SIZE MAX_AVTAB_HASH_BUCKETS
  
  #endif /* _SS_AVTAB_H_ */
  
diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c

index c91e150c3087d78127eb8ca6cc6faccf04e9cd32..655fe1c6cc69dccd862b3142a37e41b6fb4010a7 100644
--- a/security/selinux/ss/conditional.c
+++ b/security/selinux/ss/conditional.c
@@ -490,6 +490,129 @@ err:
         return rc;
  }
  
+int cond_write_bool(void *vkey, void *datum, void *ptr)
+{
+       char *key = vkey;
+       struct cond_bool_datum *booldatum = datum;
+       struct policy_data *pd = ptr;
+       void *fp = pd->fp;
+       __le32 buf[3];
+       u32 len;
+       int rc;
+
+       len = strlen(key);
+       buf[0] = cpu_to_le32(booldatum->value);
+       buf[1] = cpu_to_le32(booldatum->state);
+       buf[2] = cpu_to_le32(len);
+       rc = put_entry(buf, sizeof(u32), 3, fp);
+       if (rc)
+               return rc;
+       rc = put_entry(key, 1, len, fp);
+       if (rc)
+               return rc;
+       return 0;
+}
+
+/*
+ * cond_write_av_list doesn't write out the av_list nodes.
+ * Instead it writes out the key/value pairs from the avtab. This
+ * is necessary because there is no way to uniquely identify rules
+ * in the avtab, so it is not possible to associate individual rules
+ * in the avtab with a conditional without saving them as part of
+ * the conditional. This means that the avtab with the conditional
+ * rules will not be saved but will be rebuilt on policy load.
+ */
+static int cond_write_av_list(struct policydb *p,
+                             struct cond_av_list *list, struct policy_file *fp)
+{
+       __le32 buf[1];
+       struct cond_av_list *cur_list;
+       u32 len;
+       int rc;
+
+       len = 0;
+       for (cur_list = list; cur_list != NULL; cur_list = cur_list->next)
+               len++;
+
+       buf[0] = cpu_to_le32(len);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+
+       if (len == 0)
+               return 0;
+
+       for (cur_list = list; cur_list != NULL; cur_list = cur_list->next) {
+               rc = avtab_write_item(p, cur_list->node, fp);
+               if (rc)
+                       return rc;
+       }
+
+       return 0;
+}
+
+int cond_write_node(struct policydb *p, struct cond_node *node,
+                   struct policy_file *fp)
+{
+       struct cond_expr *cur_expr;
+       __le32 buf[2];
+       int rc;
+       u32 len = 0;
+
+       buf[0] = cpu_to_le32(node->cur_state);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+
+       for (cur_expr = node->expr; cur_expr != NULL; cur_expr = cur_expr->next)
+               len++;
+
+       buf[0] = cpu_to_le32(len);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+
+       for (cur_expr = node->expr; cur_expr != NULL; cur_expr = cur_expr->next) {
+               buf[0] = cpu_to_le32(cur_expr->expr_type);
+               buf[1] = cpu_to_le32(cur_expr->bool);
+               rc = put_entry(buf, sizeof(u32), 2, fp);
+               if (rc)
+                       return rc;
+       }
+
+       rc = cond_write_av_list(p, node->true_list, fp);
+       if (rc)
+               return rc;
+       rc = cond_write_av_list(p, node->false_list, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+int cond_write_list(struct policydb *p, struct cond_node *list, void *fp)
+{
+       struct cond_node *cur;
+       u32 len;
+       __le32 buf[1];
+       int rc;
+
+       len = 0;
+       for (cur = list; cur != NULL; cur = cur->next)
+               len++;
+       buf[0] = cpu_to_le32(len);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+
+       for (cur = list; cur != NULL; cur = cur->next) {
+               rc = cond_write_node(p, cur, fp);
+               if (rc)
+                       return rc;
+       }
+
+       return 0;
+}
  /* Determine whether additional permissions are granted by the conditional
   * av table, and if so, add them to the result
   */
diff --git a/security/selinux/ss/conditional.h b/security/selinux/ss/conditional.h

index 53ddb013ae573f8bb053da9fa9416fbcf5ee9701..3f209c635295681f026874e332573577e743dea4 100644
--- a/security/selinux/ss/conditional.h
+++ b/security/selinux/ss/conditional.h
@@ -69,6 +69,8 @@ int cond_index_bool(void *key, void *datum, void *datap);
  
  int cond_read_bool(struct policydb *p, struct hashtab *h, void *fp);
  int cond_read_list(struct policydb *p, void *fp);
+int cond_write_bool(void *key, void *datum, void *ptr);
+int cond_write_list(struct policydb *p, struct cond_node *list, void *fp);
  
  void cond_compute_av(struct avtab *ctab, struct avtab_key *key, struct av_decision *avd);
  
diff --git a/security/selinux/ss/ebitmap.c b/security/selinux/ss/ebitmap.c

index 04b6145d767f96093423d5f733e6fe1b6a4e5687..d42951fcbe877355b08c16d5a63526c729f4355c 100644
--- a/security/selinux/ss/ebitmap.c
+++ b/security/selinux/ss/ebitmap.c
@@ -22,6 +22,8 @@
  #include "ebitmap.h"
  #include "policydb.h"
  
+#define BITS_PER_U64   (sizeof(u64) * 8)
+
  int ebitmap_cmp(struct ebitmap *e1, struct ebitmap *e2)
  {
         struct ebitmap_node *n1, *n2;
@@ -363,10 +365,10 @@ int ebitmap_read(struct ebitmap *e, void *fp)
         e->highbit = le32_to_cpu(buf[1]);
         count = le32_to_cpu(buf[2]);
  
-       if (mapunit != sizeof(u64) * 8) {
+       if (mapunit != BITS_PER_U64) {
                 printk(KERN_ERR "SELinux: ebitmap: map size %u does not "
                        "match my size %Zd (high bit was %d)\n",
-                      mapunit, sizeof(u64) * 8, e->highbit);
+                      mapunit, BITS_PER_U64, e->highbit);
                 goto bad;
         }
  
@@ -446,3 +448,78 @@ bad:
         ebitmap_destroy(e);
         goto out;
  }
+
+int ebitmap_write(struct ebitmap *e, void *fp)
+{
+       struct ebitmap_node *n;
+       u32 count;
+       __le32 buf[3];
+       u64 map;
+       int bit, last_bit, last_startbit, rc;
+
+       buf[0] = cpu_to_le32(BITS_PER_U64);
+
+       count = 0;
+       last_bit = 0;
+       last_startbit = -1;
+       ebitmap_for_each_positive_bit(e, n, bit) {
+               if (rounddown(bit, (int)BITS_PER_U64) > last_startbit) {
+                       count++;
+                       last_startbit = rounddown(bit, BITS_PER_U64);
+               }
+               last_bit = roundup(bit + 1, BITS_PER_U64);
+       }
+       buf[1] = cpu_to_le32(last_bit);
+       buf[2] = cpu_to_le32(count);
+
+       rc = put_entry(buf, sizeof(u32), 3, fp);
+       if (rc)
+               return rc;
+
+       map = 0;
+       last_startbit = INT_MIN;
+       ebitmap_for_each_positive_bit(e, n, bit) {
+               if (rounddown(bit, (int)BITS_PER_U64) > last_startbit) {
+                       __le64 buf64[1];
+
+                       /* this is the very first bit */
+                       if (!map) {
+                               last_startbit = rounddown(bit, BITS_PER_U64);
+                               map = (u64)1 << (bit - last_startbit);
+                               continue;
+                       }
+
+                       /* write the previous node */
+                       buf[0] = cpu_to_le32(last_startbit);
+                       rc = put_entry(buf, sizeof(u32), 1, fp);
+                       if (rc)
+                               return rc;
+
+                       buf64[0] = cpu_to_le64(map);
+                       rc = put_entry(buf64, sizeof(u64), 1, fp);
+                       if (rc)
+                               return rc;
+
+                       /* set up for the next node */
+                       map = 0;
+                       last_startbit = rounddown(bit, BITS_PER_U64);
+               }
+               map |= (u64)1 << (bit - last_startbit);
+       }
+       /* flush the final node, if any bits were set */
+       if (map) {
+               __le64 buf64[1];
+
+               /* write the last node */
+               buf[0] = cpu_to_le32(last_startbit);
+               rc = put_entry(buf, sizeof(u32), 1, fp);
+               if (rc)
+                       return rc;
+
+               buf64[0] = cpu_to_le64(map);
+               rc = put_entry(buf64, sizeof(u64), 1, fp);
+               if (rc)
+                       return rc;
+       }
+       return 0;
+}
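
The `ebitmap_write()` function added above groups the set bits into 64-bit maps, each tagged with the bit position where the map starts, and records a high bit rounded up to a multiple of 64. A simplified userspace sketch of that grouping, assuming the set bits are supplied as an ascending array rather than walked with `ebitmap_for_each_positive_bit()`, could be written as follows. The names `struct map_node` and `pack_bits` are hypothetical. Unlike the kernel function, which makes one pass to count nodes for the header and a second pass to emit them, this sketch fills an array in a single pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_U64 64u

/* One serialized node: the first bit covered and a 64-bit map. */
struct map_node {
	uint32_t startbit;
	uint64_t map;
};

/*
 * Hypothetical helper: pack ascending bit positions into nodes.
 * Returns the node count, or -1 if out[] is too small. *highbit
 * receives the position just past the last set bit, rounded up to a
 * multiple of 64 (0 when no bits are set).
 */
static int pack_bits(const uint32_t *bits, size_t nbits,
		     struct map_node *out, size_t cap, uint32_t *highbit)
{
	size_t n = 0;

	assert(out && highbit);
	*highbit = 0;

	for (size_t i = 0; i < nbits; i++) {
		/* Round the bit position down to its 64-bit boundary. */
		uint32_t start = bits[i] - (bits[i] % BITS_PER_U64);

		/* Open a new node when the bit falls in a new 64-bit unit. */
		if (n == 0 || out[n - 1].startbit != start) {
			if (n == cap)
				return -1;
			out[n].startbit = start;
			out[n].map = 0;
			n++;
		}
		out[n - 1].map |= (uint64_t)1 << (bits[i] - start);

		/* Same value as roundup(bit + 1, 64) in the kernel code. */
		*highbit = start + BITS_PER_U64;
	}
	return (int)n;
}
```

For the bits 0, 3, 64 and 200 this yields three nodes starting at 0, 64 and 192, with a high bit of 256; 64-bit units with no set bits produce no node.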
diff --git a/security/selinux/ss/ebitmap.h b/security/selinux/ss/ebitmap.h

index f283b4367f54d640e6a3b6fedb8f3180786883c3..1f4e93c2ae8695430c4a9ff33cfd31e7d36cec36 100644
--- a/security/selinux/ss/ebitmap.h
+++ b/security/selinux/ss/ebitmap.h
@@ -123,6 +123,7 @@ int ebitmap_get_bit(struct ebitmap *e, unsigned long bit);
  int ebitmap_set_bit(struct ebitmap *e, unsigned long bit, int value);
  void ebitmap_destroy(struct ebitmap *e);
  int ebitmap_read(struct ebitmap *e, void *fp);
+int ebitmap_write(struct ebitmap *e, void *fp);
  
  #ifdef CONFIG_NETLABEL
  int ebitmap_netlbl_export(struct ebitmap *ebmap,
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c

index 3a29704be8ce10f4409dd0a3d4f0bea8bb0d1086..94f630d93a5c5d0964a51e030646d26652619ee6 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -37,6 +37,7 @@
  #include "policydb.h"
  #include "conditional.h"
  #include "mls.h"
+#include "services.h"
  
  #define _DEBUG_HASHES
  
@@ -185,9 +186,19 @@ static u32 rangetr_hash(struct hashtab *h, const void *k)
  static int rangetr_cmp(struct hashtab *h, const void *k1, const void *k2)
  {
         const struct range_trans *key1 = k1, *key2 = k2;
-       return (key1->source_type != key2->source_type ||
-               key1->target_type != key2->target_type ||
-               key1->target_class != key2->target_class);
+       int v;
+
+       v = key1->source_type - key2->source_type;
+       if (v)
+               return v;
+
+       v = key1->target_type - key2->target_type;
+       if (v)
+               return v;
+
+       v = key1->target_class - key2->target_class;
+
+       return v;
  }
  
  /*
@@ -1624,11 +1635,11 @@ static int role_bounds_sanity_check(void *key, void *datum, void *datap)
  
  static int type_bounds_sanity_check(void *key, void *datum, void *datap)
  {
-       struct type_datum *upper, *type;
+       struct type_datum *upper;
         struct policydb *p = datap;
         int depth = 0;
  
-       upper = type = datum;
+       upper = datum;
         while (upper->bounds) {
                 if (++depth == POLICYDB_BOUNDS_MAXDEPTH) {
                         printk(KERN_ERR "SELinux: type %s: "
@@ -2306,3 +2317,843 @@ bad:
         policydb_destroy(p);
         goto out;
  }
+
+/*
+ * Write an MLS level structure to a policydb binary
+ * representation file.
+ */
+static int mls_write_level(struct mls_level *l, void *fp)
+{
+       __le32 buf[1];
+       int rc;
+
+       buf[0] = cpu_to_le32(l->sens);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+
+       rc = ebitmap_write(&l->cat, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+/*
+ * Write an MLS range structure to a policydb binary
+ * representation file.
+ */
+static int mls_write_range_helper(struct mls_range *r, void *fp)
+{
+       __le32 buf[3];
+       size_t items;
+       int rc, eq;
+
+       eq = mls_level_eq(&r->level[1], &r->level[0]);
+
+       if (eq)
+               items = 2;
+       else
+               items = 3;
+       buf[0] = cpu_to_le32(items-1);
+       buf[1] = cpu_to_le32(r->level[0].sens);
+       if (!eq)
+               buf[2] = cpu_to_le32(r->level[1].sens);
+
+       BUG_ON(items > (sizeof(buf)/sizeof(buf[0])));
+
+       rc = put_entry(buf, sizeof(u32), items, fp);
+       if (rc)
+               return rc;
+
+       rc = ebitmap_write(&r->level[0].cat, fp);
+       if (rc)
+               return rc;
+       if (!eq) {
+               rc = ebitmap_write(&r->level[1].cat, fp);
+               if (rc)
+                       return rc;
+       }
+
+       return 0;
+}
+
+static int sens_write(void *vkey, void *datum, void *ptr)
+{
+       char *key = vkey;
+       struct level_datum *levdatum = datum;
+       struct policy_data *pd = ptr;
+       void *fp = pd->fp;
+       __le32 buf[2];
+       size_t len;
+       int rc;
+
+       len = strlen(key);
+       buf[0] = cpu_to_le32(len);
+       buf[1] = cpu_to_le32(levdatum->isalias);
+       rc = put_entry(buf, sizeof(u32), 2, fp);
+       if (rc)
+               return rc;
+
+       rc = put_entry(key, 1, len, fp);
+       if (rc)
+               return rc;
+
+       rc = mls_write_level(levdatum->level, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+static int cat_write(void *vkey, void *datum, void *ptr)
+{
+       char *key = vkey;
+       struct cat_datum *catdatum = datum;
+       struct policy_data *pd = ptr;
+       void *fp = pd->fp;
+       __le32 buf[3];
+       size_t len;
+       int rc;
+
+       len = strlen(key);
+       buf[0] = cpu_to_le32(len);
+       buf[1] = cpu_to_le32(catdatum->value);
+       buf[2] = cpu_to_le32(catdatum->isalias);
+       rc = put_entry(buf, sizeof(u32), 3, fp);
+       if (rc)
+               return rc;
+
+       rc = put_entry(key, 1, len, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+static int role_trans_write(struct role_trans *r, void *fp)
+{
+       struct role_trans *tr;
+       __le32 buf[3];
+       size_t nel;
+       int rc;
+
+       nel = 0;
+       for (tr = r; tr; tr = tr->next)
+               nel++;
+       buf[0] = cpu_to_le32(nel);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+       for (tr = r; tr; tr = tr->next) {
+               buf[0] = cpu_to_le32(tr->role);
+               buf[1] = cpu_to_le32(tr->type);
+               buf[2] = cpu_to_le32(tr->new_role);
+               rc = put_entry(buf, sizeof(u32), 3, fp);
+               if (rc)
+                       return rc;
+       }
+
+       return 0;
+}
+
+static int role_allow_write(struct role_allow *r, void *fp)
+{
+       struct role_allow *ra;
+       __le32 buf[2];
+       size_t nel;
+       int rc;
+
+       nel = 0;
+       for (ra = r; ra; ra = ra->next)
+               nel++;
+       buf[0] = cpu_to_le32(nel);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+       for (ra = r; ra; ra = ra->next) {
+               buf[0] = cpu_to_le32(ra->role);
+               buf[1] = cpu_to_le32(ra->new_role);
+               rc = put_entry(buf, sizeof(u32), 2, fp);
+               if (rc)
+                       return rc;
+       }
+       return 0;
+}
+
+/*
+ * Write a security context structure
+ * to a policydb binary representation file.
+ */
+static int context_write(struct policydb *p, struct context *c,
+                        void *fp)
+{
+       int rc;
+       __le32 buf[3];
+
+       buf[0] = cpu_to_le32(c->user);
+       buf[1] = cpu_to_le32(c->role);
+       buf[2] = cpu_to_le32(c->type);
+
+       rc = put_entry(buf, sizeof(u32), 3, fp);
+       if (rc)
+               return rc;
+
+       rc = mls_write_range_helper(&c->range, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+/*
+ * The following *_write functions are used to
+ * write the symbol data to a policy database
+ * binary representation file.
+ */
+
+static int perm_write(void *vkey, void *datum, void *fp)
+{
+       char *key = vkey;
+       struct perm_datum *perdatum = datum;
+       __le32 buf[2];
+       size_t len;
+       int rc;
+
+       len = strlen(key);
+       buf[0] = cpu_to_le32(len);
+       buf[1] = cpu_to_le32(perdatum->value);
+       rc = put_entry(buf, sizeof(u32), 2, fp);
+       if (rc)
+               return rc;
+
+       rc = put_entry(key, 1, len, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+static int common_write(void *vkey, void *datum, void *ptr)
+{
+       char *key = vkey;
+       struct common_datum *comdatum = datum;
+       struct policy_data *pd = ptr;
+       void *fp = pd->fp;
+       __le32 buf[4];
+       size_t len;
+       int rc;
+
+       len = strlen(key);
+       buf[0] = cpu_to_le32(len);
+       buf[1] = cpu_to_le32(comdatum->value);
+       buf[2] = cpu_to_le32(comdatum->permissions.nprim);
+       buf[3] = cpu_to_le32(comdatum->permissions.table->nel);
+       rc = put_entry(buf, sizeof(u32), 4, fp);
+       if (rc)
+               return rc;
+
+       rc = put_entry(key, 1, len, fp);
+       if (rc)
+               return rc;
+
+       rc = hashtab_map(comdatum->permissions.table, perm_write, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+static int write_cons_helper(struct policydb *p, struct constraint_node *node,
+                            void *fp)
+{
+       struct constraint_node *c;
+       struct constraint_expr *e;
+       __le32 buf[3];
+       u32 nel;
+       int rc;
+
+       for (c = node; c; c = c->next) {
+               nel = 0;
+               for (e = c->expr; e; e = e->next)
+                       nel++;
+               buf[0] = cpu_to_le32(c->permissions);
+               buf[1] = cpu_to_le32(nel);
+               rc = put_entry(buf, sizeof(u32), 2, fp);
+               if (rc)
+                       return rc;
+               for (e = c->expr; e; e = e->next) {
+                       buf[0] = cpu_to_le32(e->expr_type);
+                       buf[1] = cpu_to_le32(e->attr);
+                       buf[2] = cpu_to_le32(e->op);
+                       rc = put_entry(buf, sizeof(u32), 3, fp);
+                       if (rc)
+                               return rc;
+
+                       switch (e->expr_type) {
+                       case CEXPR_NAMES:
+                               rc = ebitmap_write(&e->names, fp);
+                               if (rc)
+                                       return rc;
+                               break;
+                       default:
+                               break;
+                       }
+               }
+       }
+
+       return 0;
+}
+
+static int class_write(void *vkey, void *datum, void *ptr)
+{
+       char *key = vkey;
+       struct class_datum *cladatum = datum;
+       struct policy_data *pd = ptr;
+       void *fp = pd->fp;
+       struct policydb *p = pd->p;
+       struct constraint_node *c;
+       __le32 buf[6];
+       u32 ncons;
+       size_t len, len2;
+       int rc;
+
+       len = strlen(key);
+       if (cladatum->comkey)
+               len2 = strlen(cladatum->comkey);
+       else
+               len2 = 0;
+
+       ncons = 0;
+       for (c = cladatum->constraints; c; c = c->next)
+               ncons++;
+
+       buf[0] = cpu_to_le32(len);
+       buf[1] = cpu_to_le32(len2);
+       buf[2] = cpu_to_le32(cladatum->value);
+       buf[3] = cpu_to_le32(cladatum->permissions.nprim);
+       if (cladatum->permissions.table)
+               buf[4] = cpu_to_le32(cladatum->permissions.table->nel);
+       else
+               buf[4] = 0;
+       buf[5] = cpu_to_le32(ncons);
+       rc = put_entry(buf, sizeof(u32), 6, fp);
+       if (rc)
+               return rc;
+
+       rc = put_entry(key, 1, len, fp);
+       if (rc)
+               return rc;
+
+       if (cladatum->comkey) {
+               rc = put_entry(cladatum->comkey, 1, len2, fp);
+               if (rc)
+                       return rc;
+       }
+
+       rc = hashtab_map(cladatum->permissions.table, perm_write, fp);
+       if (rc)
+               return rc;
+
+       rc = write_cons_helper(p, cladatum->constraints, fp);
+       if (rc)
+               return rc;
+
+       /* write out the validatetrans rule */
+       ncons = 0;
+       for (c = cladatum->validatetrans; c; c = c->next)
+               ncons++;
+
+       buf[0] = cpu_to_le32(ncons);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+
+       rc = write_cons_helper(p, cladatum->validatetrans, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+static int role_write(void *vkey, void *datum, void *ptr)
+{
+       char *key = vkey;
+       struct role_datum *role = datum;
+       struct policy_data *pd = ptr;
+       void *fp = pd->fp;
+       struct policydb *p = pd->p;
+       __le32 buf[3];
+       size_t items, len;
+       int rc;
+
+       len = strlen(key);
+       items = 0;
+       buf[items++] = cpu_to_le32(len);
+       buf[items++] = cpu_to_le32(role->value);
+       if (p->policyvers >= POLICYDB_VERSION_BOUNDARY)
+               buf[items++] = cpu_to_le32(role->bounds);
+
+       BUG_ON(items > (sizeof(buf)/sizeof(buf[0])));
+
+       rc = put_entry(buf, sizeof(u32), items, fp);
+       if (rc)
+               return rc;
+
+       rc = put_entry(key, 1, len, fp);
+       if (rc)
+               return rc;
+
+       rc = ebitmap_write(&role->dominates, fp);
+       if (rc)
+               return rc;
+
+       rc = ebitmap_write(&role->types, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+static int type_write(void *vkey, void *datum, void *ptr)
+{
+       char *key = vkey;
+       struct type_datum *typdatum = datum;
+       struct policy_data *pd = ptr;
+       struct policydb *p = pd->p;
+       void *fp = pd->fp;
+       __le32 buf[4];
+       int rc;
+       size_t items, len;
+
+       len = strlen(key);
+       items = 0;
+       buf[items++] = cpu_to_le32(len);
+       buf[items++] = cpu_to_le32(typdatum->value);
+       if (p->policyvers >= POLICYDB_VERSION_BOUNDARY) {
+               u32 properties = 0;
+
+               if (typdatum->primary)
+                       properties |= TYPEDATUM_PROPERTY_PRIMARY;
+
+               if (typdatum->attribute)
+                       properties |= TYPEDATUM_PROPERTY_ATTRIBUTE;
+
+               buf[items++] = cpu_to_le32(properties);
+               buf[items++] = cpu_to_le32(typdatum->bounds);
+       } else {
+               buf[items++] = cpu_to_le32(typdatum->primary);
+       }
+       BUG_ON(items > (sizeof(buf) / sizeof(buf[0])));
+       rc = put_entry(buf, sizeof(u32), items, fp);
+       if (rc)
+               return rc;
+
+       rc = put_entry(key, 1, len, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+static int user_write(void *vkey, void *datum, void *ptr)
+{
+       char *key = vkey;
+       struct user_datum *usrdatum = datum;
+       struct policy_data *pd = ptr;
+       struct policydb *p = pd->p;
+       void *fp = pd->fp;
+       __le32 buf[3];
+       size_t items, len;
+       int rc;
+
+       len = strlen(key);
+       items = 0;
+       buf[items++] = cpu_to_le32(len);
+       buf[items++] = cpu_to_le32(usrdatum->value);
+       if (p->policyvers >= POLICYDB_VERSION_BOUNDARY)
+               buf[items++] = cpu_to_le32(usrdatum->bounds);
+       BUG_ON(items > (sizeof(buf) / sizeof(buf[0])));
+       rc = put_entry(buf, sizeof(u32), items, fp);
+       if (rc)
+               return rc;
+
+       rc = put_entry(key, 1, len, fp);
+       if (rc)
+               return rc;
+
+       rc = ebitmap_write(&usrdatum->roles, fp);
+       if (rc)
+               return rc;
+
+       rc = mls_write_range_helper(&usrdatum->range, fp);
+       if (rc)
+               return rc;
+
+       rc = mls_write_level(&usrdatum->dfltlevel, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+static int (*write_f[SYM_NUM]) (void *key, void *datum,
+                               void *datap) =
+{
+       common_write,
+       class_write,
+       role_write,
+       type_write,
+       user_write,
+       cond_write_bool,
+       sens_write,
+       cat_write,
+};
+
+static int ocontext_write(struct policydb *p, struct policydb_compat_info *info,
+                         void *fp)
+{
+       unsigned int i, j, rc;
+       size_t nel, len;
+       __le32 buf[3];
+       u32 nodebuf[8];
+       struct ocontext *c;
+       for (i = 0; i < info->ocon_num; i++) {
+               nel = 0;
+               for (c = p->ocontexts[i]; c; c = c->next)
+                       nel++;
+               buf[0] = cpu_to_le32(nel);
+               rc = put_entry(buf, sizeof(u32), 1, fp);
+               if (rc)
+                       return rc;
+               for (c = p->ocontexts[i]; c; c = c->next) {
+                       switch (i) {
+                       case OCON_ISID:
+                               buf[0] = cpu_to_le32(c->sid[0]);
+                               rc = put_entry(buf, sizeof(u32), 1, fp);
+                               if (rc)
+                                       return rc;
+                               rc = context_write(p, &c->context[0], fp);
+                               if (rc)
+                                       return rc;
+                               break;
+                       case OCON_FS:
+                       case OCON_NETIF:
+                               len = strlen(c->u.name);
+                               buf[0] = cpu_to_le32(len);
+                               rc = put_entry(buf, sizeof(u32), 1, fp);
+                               if (rc)
+                                       return rc;
+                               rc = put_entry(c->u.name, 1, len, fp);
+                               if (rc)
+                                       return rc;
+                               rc = context_write(p, &c->context[0], fp);
+                               if (rc)
+                                       return rc;
+                               rc = context_write(p, &c->context[1], fp);
+                               if (rc)
+                                       return rc;
+                               break;
+                       case OCON_PORT:
+                               buf[0] = cpu_to_le32(c->u.port.protocol);
+                               buf[1] = cpu_to_le32(c->u.port.low_port);
+                               buf[2] = cpu_to_le32(c->u.port.high_port);
+                               rc = put_entry(buf, sizeof(u32), 3, fp);
+                               if (rc)
+                                       return rc;
+                               rc = context_write(p, &c->context[0], fp);
+                               if (rc)
+                                       return rc;
+                               break;
+                       case OCON_NODE:
+                               nodebuf[0] = c->u.node.addr; /* network order */
+                               nodebuf[1] = c->u.node.mask; /* network order */
+                               rc = put_entry(nodebuf, sizeof(u32), 2, fp);
+                               if (rc)
+                                       return rc;
+                               rc = context_write(p, &c->context[0], fp);
+                               if (rc)
+                                       return rc;
+                               break;
+                       case OCON_FSUSE:
+                               buf[0] = cpu_to_le32(c->v.behavior);
+                               len = strlen(c->u.name);
+                               buf[1] = cpu_to_le32(len);
+                               rc = put_entry(buf, sizeof(u32), 2, fp);
+                               if (rc)
+                                       return rc;
+                               rc = put_entry(c->u.name, 1, len, fp);
+                               if (rc)
+                                       return rc;
+                               rc = context_write(p, &c->context[0], fp);
+                               if (rc)
+                                       return rc;
+                               break;
+                       case OCON_NODE6:
+                               for (j = 0; j < 4; j++)
+                                       nodebuf[j] = c->u.node6.addr[j]; /* network order */
+                               for (j = 0; j < 4; j++)
+                                       nodebuf[j + 4] = c->u.node6.mask[j]; /* network order */
+                               rc = put_entry(nodebuf, sizeof(u32), 8, fp);
+                               if (rc)
+                                       return rc;
+                               rc = context_write(p, &c->context[0], fp);
+                               if (rc)
+                                       return rc;
+                               break;
+                       }
+               }
+       }
+       return 0;
+}
+
+static int genfs_write(struct policydb *p, void *fp)
+{
+       struct genfs *genfs;
+       struct ocontext *c;
+       size_t len;
+       __le32 buf[1];
+       int rc;
+
+       len = 0;
+       for (genfs = p->genfs; genfs; genfs = genfs->next)
+               len++;
+       buf[0] = cpu_to_le32(len);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+       for (genfs = p->genfs; genfs; genfs = genfs->next) {
+               len = strlen(genfs->fstype);
+               buf[0] = cpu_to_le32(len);
+               rc = put_entry(buf, sizeof(u32), 1, fp);
+               if (rc)
+                       return rc;
+               rc = put_entry(genfs->fstype, 1, len, fp);
+               if (rc)
+                       return rc;
+               len = 0;
+               for (c = genfs->head; c; c = c->next)
+                       len++;
+               buf[0] = cpu_to_le32(len);
+               rc = put_entry(buf, sizeof(u32), 1, fp);
+               if (rc)
+                       return rc;
+               for (c = genfs->head; c; c = c->next) {
+                       len = strlen(c->u.name);
+                       buf[0] = cpu_to_le32(len);
+                       rc = put_entry(buf, sizeof(u32), 1, fp);
+                       if (rc)
+                               return rc;
+                       rc = put_entry(c->u.name, 1, len, fp);
+                       if (rc)
+                               return rc;
+                       buf[0] = cpu_to_le32(c->v.sclass);
+                       rc = put_entry(buf, sizeof(u32), 1, fp);
+                       if (rc)
+                               return rc;
+                       rc = context_write(p, &c->context[0], fp);
+                       if (rc)
+                               return rc;
+               }
+       }
+       return 0;
+}
+
+static int range_count(void *key, void *data, void *ptr)
+{
+       int *cnt = ptr;
+       *cnt = *cnt + 1;
+
+       return 0;
+}
+
+static int range_write_helper(void *key, void *data, void *ptr)
+{
+       __le32 buf[2];
+       struct range_trans *rt = key;
+       struct mls_range *r = data;
+       struct policy_data *pd = ptr;
+       void *fp = pd->fp;
+       struct policydb *p = pd->p;
+       int rc;
+
+       buf[0] = cpu_to_le32(rt->source_type);
+       buf[1] = cpu_to_le32(rt->target_type);
+       rc = put_entry(buf, sizeof(u32), 2, fp);
+       if (rc)
+               return rc;
+       if (p->policyvers >= POLICYDB_VERSION_RANGETRANS) {
+               buf[0] = cpu_to_le32(rt->target_class);
+               rc = put_entry(buf, sizeof(u32), 1, fp);
+               if (rc)
+                       return rc;
+       }
+       rc = mls_write_range_helper(r, fp);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+static int range_write(struct policydb *p, void *fp)
+{
+       int nel;
+       __le32 buf[1];
+       int rc;
+       struct policy_data pd;
+
+       pd.p = p;
+       pd.fp = fp;
+
+       /* count the number of entries in the hashtab */
+       nel = 0;
+       rc = hashtab_map(p->range_tr, range_count, &nel);
+       if (rc)
+               return rc;
+
+       buf[0] = cpu_to_le32(nel);
+       rc = put_entry(buf, sizeof(u32), 1, fp);
+       if (rc)
+               return rc;
+
+       /* actually write all of the entries */
+       rc = hashtab_map(p->range_tr, range_write_helper, &pd);
+       if (rc)
+               return rc;
+
+       return 0;
+}
+
+/*
+ * Write the configuration data in a policy database
+ * structure to a policy database binary representation
+ * file.
+ */
+int policydb_write(struct policydb *p, void *fp)
+{
+       unsigned int i, num_syms;
+       int rc;
+       __le32 buf[4];
+       u32 config;
+       size_t len;
+       struct policydb_compat_info *info;
+
+       /*
+        * Refuse to write policy older than the compressed avtab
+        * format to simplify the writer.  Other version tests are
+        * dropped because the writer code assumes this throughout.
+        * Be careful if you ever try to remove this restriction.
+        */
+       if (p->policyvers < POLICYDB_VERSION_AVTAB) {
+               printk(KERN_ERR "SELinux: refusing to write policy version %d"
+                      " because it is older than version %d\n", p->policyvers,
+                      POLICYDB_VERSION_AVTAB);
+               return -EINVAL;
+       }
+
+       config = 0;
+       if (p->mls_enabled)
+               config |= POLICYDB_CONFIG_MLS;
+
+       if (p->reject_unknown)
+               config |= REJECT_UNKNOWN;
+       if (p->allow_unknown)
+               config |= ALLOW_UNKNOWN;
+
+       /* Write the magic number and string identifiers. */
+       buf[0] = cpu_to_le32(POLICYDB_MAGIC);
+       len = strlen(POLICYDB_STRING);
+       buf[1] = cpu_to_le32(len);
+       rc = put_entry(buf, sizeof(u32), 2, fp);
+       if (rc)
+               return rc;
+       rc = put_entry(POLICYDB_STRING, 1, len, fp);
+       if (rc)
+               return rc;
+
+       /* Write the version, config, and table sizes. */
+       info = policydb_lookup_compat(p->policyvers);
+       if (!info) {
+               printk(KERN_ERR "SELinux: compatibility lookup failed for policy "
+                   "version %d\n", p->policyvers);
+               return -EINVAL;
+       }
+
+       buf[0] = cpu_to_le32(p->policyvers);
+       buf[1] = cpu_to_le32(config);
+       buf[2] = cpu_to_le32(info->sym_num);
+       buf[3] = cpu_to_le32(info->ocon_num);
+
+       rc = put_entry(buf, sizeof(u32), 4, fp);
+       if (rc)
+               return rc;
+
+       if (p->policyvers >= POLICYDB_VERSION_POLCAP) {
+               rc = ebitmap_write(&p->policycaps, fp);
+               if (rc)
+                       return rc;
+       }
+
+       if (p->policyvers >= POLICYDB_VERSION_PERMISSIVE) {
+               rc = ebitmap_write(&p->permissive_map, fp);
+               if (rc)
+                       return rc;
+       }
+
+       num_syms = info->sym_num;
+       for (i = 0; i < num_syms; i++) {
+               struct policy_data pd;
+
+               pd.fp = fp;
+               pd.p = p;
+
+               buf[0] = cpu_to_le32(p->symtab[i].nprim);
+               buf[1] = cpu_to_le32(p->symtab[i].table->nel);
+
+               rc = put_entry(buf, sizeof(u32), 2, fp);
+               if (rc)
+                       return rc;
+               rc = hashtab_map(p->symtab[i].table, write_f[i], &pd);
+               if (rc)
+                       return rc;
+       }
+
+       rc = avtab_write(p, &p->te_avtab, fp);
+       if (rc)
+               return rc;
+
+       rc = cond_write_list(p, p->cond_list, fp);
+       if (rc)
+               return rc;
+
+       rc = role_trans_write(p->role_tr, fp);
+       if (rc)
+               return rc;
+
+       rc = role_allow_write(p->role_allow, fp);
+       if (rc)
+               return rc;
+
+       rc = ocontext_write(p, info, fp);
+       if (rc)
+               return rc;
+
+       rc = genfs_write(p, fp);
+       if (rc)
+               return rc;
+
+       rc = range_write(p, fp);
+       if (rc)
+               return rc;
+
+       for (i = 0; i < p->p_types.nprim; i++) {
+               struct ebitmap *e = flex_array_get(p->type_attr_map_array, i);
+
+               BUG_ON(!e);
+               rc = ebitmap_write(e, fp);
+               if (rc)
+                       return rc;
+       }
+
+       return 0;
+}
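
The `*_write` helpers above share one record shape: a run of 32-bit little-endian fields, starting with the key length, followed by the key bytes with no terminating NUL. A minimal userspace sketch of the `perm_write()` layout, assuming a caller-supplied buffer; `store_le32` and `encode_perm` are hypothetical names for illustration, not kernel functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store v at dst in little-endian byte order,
 * independent of host endianness (stands in for cpu_to_le32 + copy). */
static void store_le32(unsigned char *dst, uint32_t v)
{
	dst[0] = (unsigned char)(v & 0xff);
	dst[1] = (unsigned char)((v >> 8) & 0xff);
	dst[2] = (unsigned char)((v >> 16) & 0xff);
	dst[3] = (unsigned char)((v >> 24) & 0xff);
}

/*
 * Hypothetical userspace analogue of perm_write(): emit
 *   le32 key length, le32 value, key bytes (no NUL).
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t encode_perm(unsigned char *out, size_t cap,
			  const char *key, uint32_t value)
{
	size_t len;
	size_t need;

	assert(out && key);
	len = strlen(key);
	need = 8 + len;

	if (need > cap)
		return 0;
	store_le32(out, (uint32_t)len);
	store_le32(out + 4, value);
	memcpy(out + 8, key, len);
	return need;
}
```

A reader of this format consumes the fixed-width fields first and then uses the decoded length to know how many key bytes follow.
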
diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h

index 310e94442cb8b3535a8774b952de45ba0180794e..95d3d7de361e628adcc53b974533cafb384ef851 100644 (file)
--- a/security/selinux/ss/policydb.h
+++ b/security/selinux/ss/policydb.h
@@ -254,6 +254,9 @@ struct policydb {
  
         struct ebitmap permissive_map;
  
+       /* length of this policy when it was loaded */
+       size_t len;
+
         unsigned int policyvers;
  
         unsigned int reject_unknown : 1;
@@ -270,6 +273,7 @@ extern int policydb_class_isvalid(struct policydb *p, unsigned int class);
  extern int policydb_type_isvalid(struct policydb *p, unsigned int type);
  extern int policydb_role_isvalid(struct policydb *p, unsigned int role);
  extern int policydb_read(struct policydb *p, void *fp);
+extern int policydb_write(struct policydb *p, void *fp);
  
  #define PERM_SYMTAB_SIZE 32
  
@@ -290,6 +294,11 @@ struct policy_file {
         size_t len;
  };
  
+struct policy_data {
+       struct policydb *p;
+       void *fp;
+};
+
  static inline int next_entry(void *buf, struct policy_file *fp, size_t bytes)
  {
         if (bytes > fp->len)
@@ -301,6 +310,17 @@ static inline int next_entry(void *buf, struct policy_file *fp, size_t bytes)
         return 0;
  }
  
+static inline int put_entry(void *buf, size_t bytes, int num, struct policy_file *fp)
+{
+       size_t len = bytes * num;
+
+       memcpy(fp->data, buf, len);
+       fp->data += len;
+       fp->len -= len;
+
+       return 0;
+}
+
  extern u16 string_to_security_class(struct policydb *p, const char *name);
  extern u32 string_to_av_perm(struct policydb *p, u16 tclass, const char *name);
  
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c

index 9ea2feca3cd4f7b572361543fdf1b265002cecfb..223c1ff6ef2324488eca915d2ba87bc4f6e27fa9 100644 (file)
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -51,6 +51,7 @@
  #include <linux/mutex.h>
  #include <linux/selinux.h>
  #include <linux/flex_array.h>
+#include <linux/vmalloc.h>
  #include <net/netlabel.h>
  
  #include "flask.h"
@@ -991,7 +992,8 @@ static int context_struct_to_string(struct context *context, char **scontext, u3
  {
         char *scontextp;
  
-       *scontext = NULL;
+       if (scontext)
+               *scontext = NULL;
         *scontext_len = 0;
  
         if (context->len) {
@@ -1008,6 +1010,9 @@ static int context_struct_to_string(struct context *context, char **scontext, u3
         *scontext_len += strlen(policydb.p_type_val_to_name[context->type - 1]) + 1;
         *scontext_len += mls_compute_context_len(context);
  
+       if (!scontext)
+               return 0;
+
         /* Allocate space for the context; caller must free this space. */
         scontextp = kmalloc(*scontext_len, GFP_ATOMIC);
         if (!scontextp)
@@ -1047,7 +1052,8 @@ static int security_sid_to_context_core(u32 sid, char **scontext,
         struct context *context;
         int rc = 0;
  
-       *scontext = NULL;
+       if (scontext)
+               *scontext = NULL;
         *scontext_len  = 0;
  
         if (!ss_initialized) {
@@ -1055,6 +1061,8 @@ static int security_sid_to_context_core(u32 sid, char **scontext,
                         char *scontextp;
  
                         *scontext_len = strlen(initial_sid_to_string[sid]) + 1;
+                       if (!scontext)
+                               goto out;
                         scontextp = kmalloc(*scontext_len, GFP_ATOMIC);
                         if (!scontextp) {
                                 rc = -ENOMEM;
@@ -1769,6 +1777,7 @@ int security_load_policy(void *data, size_t len)
                         return rc;
                 }
  
+               policydb.len = len;
                 rc = selinux_set_mapping(&policydb, secclass_map,
                                          &current_mapping,
                                          &current_mapping_size);
@@ -1791,6 +1800,7 @@ int security_load_policy(void *data, size_t len)
                 selinux_complete_init();
                 avc_ss_reset(seqno);
                 selnl_notify_policyload(seqno);
+               selinux_status_update_policyload(seqno);
                 selinux_netlbl_cache_invalidate();
                 selinux_xfrm_notify_policyload();
                 return 0;
@@ -1804,6 +1814,7 @@ int security_load_policy(void *data, size_t len)
         if (rc)
                 return rc;
  
+       newpolicydb.len = len;
         /* If switching between different policy types, log MLS status */
         if (policydb.mls_enabled && !newpolicydb.mls_enabled)
                 printk(KERN_INFO "SELinux: Disabling MLS support...\n");
@@ -1870,6 +1881,7 @@ int security_load_policy(void *data, size_t len)
  
         avc_ss_reset(seqno);
         selnl_notify_policyload(seqno);
+       selinux_status_update_policyload(seqno);
         selinux_netlbl_cache_invalidate();
         selinux_xfrm_notify_policyload();
  
@@ -1883,6 +1895,17 @@ err:
  
  }
  
+size_t security_policydb_len(void)
+{
+       size_t len;
+
+       read_lock(&policy_rwlock);
+       len = policydb.len;
+       read_unlock(&policy_rwlock);
+
+       return len;
+}
+
  /**
   * security_port_sid - Obtain the SID for a port.
   * @protocol: protocol number
@@ -2374,6 +2397,7 @@ out:
         if (!rc) {
                 avc_ss_reset(seqno);
                 selnl_notify_policyload(seqno);
+               selinux_status_update_policyload(seqno);
                 selinux_xfrm_notify_policyload();
         }
         return rc;
@@ -3129,3 +3153,38 @@ netlbl_sid_to_secattr_failure:
         return rc;
  }
  #endif /* CONFIG_NETLABEL */
+
+/**
+ * security_read_policy - serialize the loaded policy into a new buffer.
+ * @data: set to a vmalloc_user() buffer holding the binary policy
+ * @len: set to the number of bytes written to *@data
+ *
+ */
+int security_read_policy(void **data, ssize_t *len)
+{
+       int rc;
+       struct policy_file fp;
+
+       if (!ss_initialized)
+               return -EINVAL;
+
+       *len = security_policydb_len();
+
+       *data = vmalloc_user(*len);
+       if (!*data)
+               return -ENOMEM;
+
+       fp.data = *data;
+       fp.len = *len;
+
+       read_lock(&policy_rwlock);
+       rc = policydb_write(&policydb, &fp);
+       read_unlock(&policy_rwlock);
+
+       if (rc)
+               return rc;
+
+       *len = (unsigned long)fp.data - (unsigned long)*data;
+       return 0;
+
+}
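
The `context_struct_to_string()` and `security_sid_to_context_core()` hunks above let a caller pass a NULL `scontext` to learn only the required length. A small userspace sketch of that calling convention, assuming a hypothetical `label_to_string` function and plain `malloc` in place of `kmalloc`:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical example of the "NULL output means length only" idiom.
 * Always sets *out_len to the size needed, including the trailing NUL.
 * If out is non-NULL, also allocates a copy that the caller must free.
 * Returns 0 on success and -1 if the allocation fails.
 */
static int label_to_string(const char *label, char **out, size_t *out_len)
{
	char *copy;

	assert(label && out_len);
	if (out)
		*out = NULL;
	*out_len = strlen(label) + 1;

	if (!out)
		return 0;	/* caller only wanted the length */

	copy = malloc(*out_len);
	if (!copy)
		return -1;
	memcpy(copy, label, *out_len);
	*out = copy;
	return 0;
}
```

The length-only path never allocates, which is the point of the early return added before the `kmalloc()` call in the hunks above.
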
diff --git a/security/selinux/ss/status.c b/security/selinux/ss/status.c

new file mode 100644 (file)

index 0000000..d982365
--- /dev/null
+++ b/security/selinux/ss/status.c
@@ -0,0 +1,126 @@
+/*
+ * mmap based event notifications for SELinux
+ *
+ * Author: KaiGai Kohei <kaigai@ak.jp.nec.com>
+ *
+ * Copyright (C) 2010 NEC corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2,
+ * as published by the Free Software Foundation.
+ */
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include "avc.h"
+#include "services.h"
+
+/*
+ * The selinux_status_page is exposed to userspace applications through
+ * the mmap interface on /selinux/status.
+ * It lets the kernel notify applications of the few events that require
+ * a reset of the userspace access vector, without context switching.
+ *
+ * The selinux_kernel_status structure at the head of the status page is
+ * protected against concurrent access with seqlock logic, so userspace
+ * applications should read the status page according to the seqlock
+ * logic.
+ *
+ * Typically, an application checks status->sequence at the start of its
+ * access control routine. If the value is odd, the kernel is updating
+ * the status, so the application should retry shortly. If the value
+ * differs from the last sequence number seen, something has changed, so
+ * the application resets its userspace avc if needed.
+ * In most cases, the application can confirm that the kernel status is
+ * unchanged without invoking any system call.
+ */
+static struct page *selinux_status_page;
+static DEFINE_MUTEX(selinux_status_lock);
+
+/*
+ * selinux_kernel_status_page
+ *
+ * Returns a reference to selinux_status_page. If the status page has not
+ * been allocated yet, it tries to allocate it on the first call.
+ */
+struct page *selinux_kernel_status_page(void)
+{
+       struct selinux_kernel_status   *status;
+       struct page                    *result = NULL;
+
+       mutex_lock(&selinux_status_lock);
+       if (!selinux_status_page) {
+               selinux_status_page = alloc_page(GFP_KERNEL|__GFP_ZERO);
+
+               if (selinux_status_page) {
+                       status = page_address(selinux_status_page);
+
+                       status->version = SELINUX_KERNEL_STATUS_VERSION;
+                       status->sequence = 0;
+                       status->enforcing = selinux_enforcing;
+                       /*
+                        * NOTE: the next policyload event sets
+                        * status->policyload to a positive value,
+                        * not necessarily 1 but never zero, so an
+                        * application can tell that it was updated.
+                        */
+                       status->policyload = 0;
+                       status->deny_unknown = !security_get_allow_unknown();
+               }
+       }
+       result = selinux_status_page;
+       mutex_unlock(&selinux_status_lock);
+
+       return result;
+}
+
+/*
+ * selinux_status_update_setenforce
+ *
+ * Updates the current enforcing/permissive mode in the status page.
+ */
+void selinux_status_update_setenforce(int enforcing)
+{
+       struct selinux_kernel_status   *status;
+
+       mutex_lock(&selinux_status_lock);
+       if (selinux_status_page) {
+               status = page_address(selinux_status_page);
+
+               status->sequence++;
+               smp_wmb();
+
+               status->enforcing = enforcing;
+
+               smp_wmb();
+               status->sequence++;
+       }
+       mutex_unlock(&selinux_status_lock);
+}
+
+/*
+ * selinux_status_update_policyload
+ *
+ * Updates the policy reload sequence number (policyload) and the
+ * current setting of deny_unknown in the status page.
+ */
+void selinux_status_update_policyload(int seqno)
+{
+       struct selinux_kernel_status   *status;
+
+       mutex_lock(&selinux_status_lock);
+       if (selinux_status_page) {
+               status = page_address(selinux_status_page);
+
+               status->sequence++;
+               smp_wmb();
+
+               status->policyload = seqno;
+               status->deny_unknown = !security_get_allow_unknown();
+
+               smp_wmb();
+               status->sequence++;
+       }
+       mutex_unlock(&selinux_status_lock);
+}
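
The header comment in status.c describes how a userspace reader should treat the status page. A minimal sketch of that read loop, assuming the page has already been mapped from /selinux/status. The field order in `struct status_view` and the name `status_snapshot` are assumptions for illustration (the authoritative layout is `struct selinux_kernel_status`), and `__sync_synchronize()` is the GCC/Clang full-barrier builtin, used here in place of a lighter read barrier:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field set, taken from the fields status.c assigns; the
 * authoritative layout is struct selinux_kernel_status. */
struct status_view {
	uint32_t version;
	uint32_t sequence;
	uint32_t enforcing;
	uint32_t policyload;
	uint32_t deny_unknown;
};

/*
 * Copy a consistent snapshot out of the shared page. Retries while a
 * writer is active (odd sequence) or the sequence changed mid-copy.
 * Returns the sequence number the snapshot was taken at.
 */
static uint32_t status_snapshot(const volatile struct status_view *page,
				struct status_view *out)
{
	uint32_t seq;

	assert(page && out);
	do {
		seq = page->sequence;
		__sync_synchronize();	/* read sequence before the data */

		out->version = page->version;
		out->enforcing = page->enforcing;
		out->policyload = page->policyload;
		out->deny_unknown = page->deny_unknown;

		__sync_synchronize();	/* read data before re-checking */
	} while ((seq & 1) || seq != page->sequence);

	out->sequence = seq;
	return seq;
}
```

An application would compare the returned sequence with the last one it saw and reset its userspace AVC only when the two differ.
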
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c

index c448d57ae2b7721f72f17c5cf42e88f3f1bcba5e..bc39f4067af668874312af4187ad6f21dbdbb113 100644 (file)
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -1281,12 +1281,11 @@ static int smack_task_getioprio(struct task_struct *p)
   *
   * Return 0 if read access is permitted
   */
-static int smack_task_setscheduler(struct task_struct *p, int policy,
-                                  struct sched_param *lp)
+static int smack_task_setscheduler(struct task_struct *p)
  {
         int rc;
  
-       rc = cap_task_setscheduler(p, policy, lp);
+       rc = cap_task_setscheduler(p);
         if (rc == 0)
                 rc = smk_curacc_on_task(p, MAY_WRITE);
         return rc;
@@ -3005,7 +3004,8 @@ static int smack_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
  {
         char *sp = smack_from_secid(secid);
  
-       *secdata = sp;
+       if (secdata)
+               *secdata = sp;
         *seclen = strlen(sp);
         return 0;
  }
diff --git a/security/tomoyo/common.c b/security/tomoyo/common.c
index c668b447c72594494417f6be50795f3b49f860bf..7556315c197823e0baedcf15d0f0930c74345429 100644
--- a/security/tomoyo/common.c
+++ b/security/tomoyo/common.c
@@ -768,8 +768,10 @@ static bool tomoyo_select_one(struct tomoyo_io_buffer *head, const char *data)
                 return true; /* Do nothing if open(O_WRONLY). */
         memset(&head->r, 0, sizeof(head->r));
         head->r.print_this_domain_only = true;
-       head->r.eof = !domain;
-       head->r.domain = &domain->list;
+       if (domain)
+               head->r.domain = &domain->list;
+       else
+               head->r.eof = 1;
         tomoyo_io_printf(head, "# select %s\n", data);
         if (domain && domain->is_deleted)
                 tomoyo_io_printf(head, "# This is a deleted domain.\n");
@@ -2051,13 +2053,22 @@ void tomoyo_check_profile(void)
                 const u8 profile = domain->profile;
                 if (tomoyo_profile_ptr[profile])
                         continue;
+               printk(KERN_ERR "You need to define profile %u before using it.\n",
+                      profile);
+               printk(KERN_ERR "Please see http://tomoyo.sourceforge.jp/2.3/ "
+                      "for more information.\n");
                 panic("Profile %u (used by '%s') not defined.\n",
                       profile, domain->domainname->name);
         }
         tomoyo_read_unlock(idx);
-       if (tomoyo_profile_version != 20090903)
+       if (tomoyo_profile_version != 20090903) {
+               printk(KERN_ERR "You need to install userland programs for "
+                      "TOMOYO 2.3 and initialize policy configuration.\n");
+               printk(KERN_ERR "Please see http://tomoyo.sourceforge.jp/2.3/ "
+                      "for more information.\n");
                 panic("Profile version %u is not supported.\n",
                       tomoyo_profile_version);
+       }
         printk(KERN_INFO "TOMOYO: 2.3.0\n");
         printk(KERN_INFO "Mandatory Access Control activated.\n");
  }
diff --git a/sound/core/control.c b/sound/core/control.c
index 070aab4901914870a0af43faef27e82e761acc32..45a818002d990f664cffadd066488a21c76fedd7 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -31,6 +31,7 @@
  
  /* max number of user-defined controls */
  #define MAX_USER_CONTROLS      32
+#define MAX_CONTROL_COUNT      1028
  
  struct snd_kctl_ioctl {
         struct list_head list;          /* list of all ioctls */
@@ -195,6 +196,10 @@ static struct snd_kcontrol *snd_ctl_new(struct snd_kcontrol *control,
         
         if (snd_BUG_ON(!control || !control->count))
                 return NULL;
+
+       if (control->count > MAX_CONTROL_COUNT)
+               return NULL;
+
         kctl = kzalloc(sizeof(*kctl) + sizeof(struct snd_kcontrol_volatile) * control->count, GFP_KERNEL);
         if (kctl == NULL) {
                 snd_printk(KERN_ERR "Cannot allocate control instance\n");
diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c
index a7868ad4d530fd40b3a9bd608bd7860a7d35ce44..cbbed0db9e560315ae0559c7e5e97786387a5371 100644
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -535,13 +535,15 @@ static int snd_rawmidi_release(struct inode *inode, struct file *file)
  {
         struct snd_rawmidi_file *rfile;
         struct snd_rawmidi *rmidi;
+       struct module *module;
  
         rfile = file->private_data;
         rmidi = rfile->rmidi;
         rawmidi_release_priv(rfile);
         kfree(rfile);
+       module = rmidi->card->module;
         snd_card_file_remove(rmidi->card, file);
-       module_put(rmidi->card->module);
+       module_put(module);
         return 0;
  }
  
diff --git a/sound/i2c/other/ak4xxx-adda.c b/sound/i2c/other/ak4xxx-adda.c
index 1adb8a3c2b62db229f9ba0b580d930d288e74c14..42d7844ecd0bfa66ddf3db96527ccc18c79f7485 100644
--- a/sound/i2c/other/ak4xxx-adda.c
+++ b/sound/i2c/other/ak4xxx-adda.c
@@ -900,7 +900,7 @@ static int proc_init(struct snd_akm4xxx *ak)
         return 0;
  }
  #else /* !CONFIG_PROC_FS */
-static int proc_init(struct snd_akm4xxx *ak) {}
+static int proc_init(struct snd_akm4xxx *ak) { return 0; }
  #endif
  
  int snd_akm4xxx_build_controls(struct snd_akm4xxx *ak)
diff --git a/sound/oss/soundcard.c b/sound/oss/soundcard.c
index 92aa762ffb7e97c998db3ca1da3a5cddc2968139..07f803e6d203a41615db0923cf073f7eadc283b6 100644
--- a/sound/oss/soundcard.c
+++ b/sound/oss/soundcard.c
@@ -391,11 +391,11 @@ static long sound_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
         case SND_DEV_DSP:
         case SND_DEV_DSP16:
         case SND_DEV_AUDIO:
-               return audio_ioctl(dev, file, cmd, p);
+               ret = audio_ioctl(dev, file, cmd, p);
                 break;
  
         case SND_DEV_MIDIN:
-               return MIDIbuf_ioctl(dev, file, cmd, p);
+               ret = MIDIbuf_ioctl(dev, file, cmd, p);
                 break;
  
         }
diff --git a/sound/pci/hda/patch_sigmatel.c b/sound/pci/hda/patch_sigmatel.c
index 95148e58026cfb045793d3ba26d570f26108c284..c16c5ba0fda0fe61387d6924a6b84307b0dedcbb 100644
--- a/sound/pci/hda/patch_sigmatel.c
+++ b/sound/pci/hda/patch_sigmatel.c
@@ -1747,6 +1747,8 @@ static struct snd_pci_quirk stac92hd71bxx_cfg_tbl[] = {
                       "HP dv6", STAC_HP_DV5),
         SND_PCI_QUIRK(PCI_VENDOR_ID_HP, 0x3061,
                       "HP dv6", STAC_HP_DV5), /* HP dv6-1110ax */
+       SND_PCI_QUIRK(PCI_VENDOR_ID_HP, 0x363e,
+                     "HP DV6", STAC_HP_DV5),
         SND_PCI_QUIRK_MASK(PCI_VENDOR_ID_HP, 0xfff0, 0x7010,
                       "HP", STAC_HP_DV5),
         SND_PCI_QUIRK(PCI_VENDOR_ID_DELL, 0x0233,
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 4f1fa77c1feb0b7a854ab8a85bd21682cbc66377..1950e19af1cf6b23be56f0250bcf8cbb52b3d9f3 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -1017,7 +1017,7 @@ builtin-revert.o wt-status.o: wt-status.h
  # we compile into subdirectories. if the target directory is not the source directory, they might not exists. So
  # we depend the various files onto their directories.
  DIRECTORY_DEPS = $(LIB_OBJS) $(BUILTIN_OBJS) $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h
-$(DIRECTORY_DEPS): $(sort $(dir $(DIRECTORY_DEPS)))
+$(DIRECTORY_DEPS): | $(sort $(dir $(DIRECTORY_DEPS)))
  # In the second step, we make a rule to actually create these directories
  $(sort $(dir $(DIRECTORY_DEPS))):
         $(QUIET_MKDIR)$(MKDIR) -p $@ 2>/dev/null
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index ef7aa0a0c5265191e8120e76f9f134b5e241fa53..95aaf565c704fb6ea67cee78d49e177f0ba8f595 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -73,6 +73,18 @@ void get_term_dimensions(struct winsize *ws);
  #define cpu_relax()    asm volatile("":::"memory")
  #endif
  
+#ifdef __mips__
+#include "../../arch/mips/include/asm/unistd.h"
+#define rmb()          asm volatile(                                   \
+                               ".set   mips2\n\t"                      \
+                               "sync\n\t"                              \
+                               ".set   mips0"                          \
+                               : /* no output */                       \
+                               : /* no input */                        \
+                               : "memory")
+#define cpu_relax()    asm volatile("" ::: "memory")
+#endif
+
  #include <time.h>
  #include <unistd.h>
  #include <sys/types.h>
diff --git a/tools/perf/util/trace-event-scripting.c b/tools/perf/util/trace-event-scripting.c
index 7ea983acfaea521a425f7eb33250e319b0a3a960..f7af2fca965d5c206973f73ccd25b8b3607e02e5 100644
--- a/tools/perf/util/trace-event-scripting.c
+++ b/tools/perf/util/trace-event-scripting.c
@@ -97,7 +97,7 @@ void setup_python_scripting(void)
         register_python_scripting(&python_scripting_unsupported_ops);
  }
  #else
-struct scripting_ops python_scripting_ops;
+extern struct scripting_ops python_scripting_ops;
  
  void setup_python_scripting(void)
  {
@@ -158,7 +158,7 @@ void setup_perl_scripting(void)
         register_perl_scripting(&perl_scripting_unsupported_ops);
  }
  #else
-struct scripting_ops perl_scripting_ops;
+extern struct scripting_ops perl_scripting_ops;
  
  void setup_perl_scripting(void)
  {
diff --git a/tools/perf/util/ui/browsers/hists.c b/tools/perf/util/ui/browsers/hists.c
index dafdf6775d77f44d69abf1980b1a9cfe4ab053dc..6866aa4c41e09cfd0239559cc903506a351a549c 100644
--- a/tools/perf/util/ui/browsers/hists.c
+++ b/tools/perf/util/ui/browsers/hists.c
@@ -773,7 +773,7 @@ int hists__browse(struct hists *self, const char *helpline, const char *ev_name)
  
                         switch (key) {
                         case 'a':
-                               if (browser->selection->map == NULL &&
+                               if (browser->selection->map == NULL ||
                                     browser->selection->map->dso->annotate_warned)
                                         continue;
                                 goto do_annotate;