Maciej W. Rozycki <macro@mips.com> <macro@imgtec.com>
Marcin Nowakowski <marcin.nowakowski@mips.com> <marcin.nowakowski@imgtec.com>
Mark Brown <broonie@sirena.org.uk>
+Mark Yao <markyao0591@gmail.com> <mark.yao@rock-chips.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com>
Matthieu CASTET <castet.matthieu@free.fr>
IPV6 IPv6 support is enabled.
ISAPNP ISA PnP code is enabled.
ISDN Appropriate ISDN support is enabled.
+ ISOL CPU Isolation is enabled.
JOY Appropriate joystick support is enabled.
KGDB Kernel debugger support is enabled.
KVM Kernel Virtual Machine support is enabled.
not play well with APC CPU idle - disable it if you have
APC and your system crashes randomly.
- apic= [APIC,X86-32] Advanced Programmable Interrupt Controller
+ apic= [APIC,X86] Advanced Programmable Interrupt Controller
Change the output verbosity whilst booting
Format: { quiet (default) | verbose | debug }
Change the amount of debugging information output
when initialising the APIC and IO-APIC components.
+ For X86-32, this can also be used to specify an APIC
+ driver name.
+ Format: apic=driver_name
+ Examples: apic=bigsmp
apic_extnmi= [APIC,X86] External NMI delivery setting
Format: { bsp (default) | all | none }
isapnp= [ISAPNP]
Format: <RDP>,<reset>,<pci_scan>,<verbosity>
- isolcpus= [KNL,SMP] Isolate a given set of CPUs from disturbance.
+ isolcpus= [KNL,SMP,ISOL] Isolate a given set of CPUs from disturbance.
[Deprecated - use cpusets instead]
Format: [flag-list,]<cpu-list>
Valid arguments: on, off
Default: on
- nohz_full= [KNL,BOOT]
+ nohz_full= [KNL,BOOT,SMP,ISOL]
The argument is a cpu list, as described above.
In kernels built with CONFIG_NO_HZ_FULL=y, set
the specified list of CPUs whose tick will be stopped
steal time is computed, but won't influence scheduler
behaviour
+ nopti [X86-64] Disable kernel page table isolation
+
nolapic [X86-32,APIC] Do not enable or use the local APIC.
nolapic_timer [X86-32,APIC] Do not use the local APIC timer.
pt. [PARIDE]
See Documentation/blockdev/paride.txt.
+ pti= [X86_64]
+ Control user/kernel address space isolation:
+ on - enable
+ off - disable
+ auto - default setting
+
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
a sysfs attribute called "force_power".
For example the intel-wmi-thunderbolt driver exposes this attribute in:
- /sys/devices/platform/PNP0C14:00/wmi_bus/wmi_bus-PNP0C14:00/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power
+ /sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power
To force the power to on, write 1 to this attribute file.
To disable force power, write 0 to this attribute file.
| Qualcomm Tech. | Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 |
| Qualcomm Tech. | Falkor v1 | E1009 | QCOM_FALKOR_ERRATUM_1009 |
| Qualcomm Tech. | QDF2400 ITS | E0065 | QCOM_QDF2400_ERRATUM_0065 |
+| Qualcomm Tech. | Falkor v{1,2} | E1041 | QCOM_FALKOR_ERRATUM_1041 |
normal scheduling policy and absolute bandwidth allocation model for
realtime scheduling policy.
+WARNING: cgroup2 doesn't yet support control of realtime processes and
+the cpu controller can only be enabled when all RT processes are in
+the root cgroup. Be aware that system management software may already
+have placed RT processes into nonroot cgroups during the system boot
+process, and these processes may need to be moved to the root cgroup
+before the cpu controller can be enabled.
+
CPU Interface Files
~~~~~~~~~~~~~~~~~~~
at25df321a
at25df641
at26df081a
- en25s64
mr25h128
mr25h256
mr25h10
s25fl008k
s25fl064k
sst25vf040b
- sst25wf040b
m25p40
m25p80
m25p16
compatible = "dlg,da7218";
reg = <0x1a>;
interrupt-parent = <&gpio6>;
- interrupts = <11 IRQ_TYPE_LEVEL_HIGH>;
+ interrupts = <11 IRQ_TYPE_LEVEL_LOW>;
wakeup-source;
VDD-supply = <®_audio>;
reg = <0x1a>;
interrupt-parent = <&gpio6>;
- interrupts = <11 IRQ_TYPE_LEVEL_HIGH>;
+ interrupts = <11 IRQ_TYPE_LEVEL_LOW>;
VDD-supply = <®_audio>;
VDDMIC-supply = <®_audio>;
- "fsl,imx53-ecspi" for SPI compatible with the one integrated on i.MX53 and later Soc
- reg : Offset and length of the register set for the device
- interrupts : Should contain CSPI/eCSPI interrupt
-- cs-gpios : Specifies the gpio pins to be used for chipselects.
- clocks : Clock specifiers for both ipg and per clocks.
- clock-names : Clock names should include both "ipg" and "per"
See the clock consumer binding,
Documentation/devicetree/bindings/clock/clock-bindings.txt
-- dmas: DMA specifiers for tx and rx dma. See the DMA client binding,
- Documentation/devicetree/bindings/dma/dma.txt
-- dma-names: DMA request names should include "tx" and "rx" if present.
-Obsolete properties:
-- fsl,spi-num-chipselects : Contains the number of the chipselect
+Recommended properties:
+- cs-gpios : GPIOs to use as chip selects, see spi-bus.txt. While the native chip
+select lines can be used, they appear to always generate a pulse between each
+word of a transfer. Most use cases will require GPIO based chip selects to
+generate a valid transaction.
Optional properties:
+- num-cs : Number of total chip selects, see spi-bus.txt.
+- dmas: DMA specifiers for tx and rx dma. See the DMA client binding,
+Documentation/devicetree/bindings/dma/dma.txt.
+- dma-names: DMA request names, if present, should include "tx" and "rx".
- fsl,spi-rdy-drctl: Integer, representing the value of DRCTL, the register
controlling the SPI_READY handling. Note that to enable the DRCTL consideration,
the SPI_READY mode-flag needs to be set too.
Valid values are: 0 (disabled), 1 (edge-triggered burst) and 2 (level-triggered burst).
+Obsolete properties:
+- fsl,spi-num-chipselects : Contains the number of the chipselect
+
Example:
ecspi@70010000 {
root of the overlay. Finally the directory is moved to the new
location.
+There are several ways to tune the "redirect_dir" feature.
+
+Kernel config options:
+
+- OVERLAY_FS_REDIRECT_DIR:
+ If this is enabled, then redirect_dir is turned on by default.
+- OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW:
+ If this is enabled, then redirects are always followed by default. Enabling
+ this results in a less secure configuration. Enable this option only when
+ worried about backward compatibility with kernels that have the redirect_dir
+ feature and follow redirects even if turned off.
+
+Module options (can also be changed through /sys/module/overlay/parameters/*):
+
+- "redirect_dir=BOOL":
+ See OVERLAY_FS_REDIRECT_DIR kernel config option above.
+- "redirect_always_follow=BOOL":
+ See OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW kernel config option above.
+- "redirect_max=NUM":
+ The maximum number of bytes in an absolute redirect (default is 256).
+
+Mount options:
+
+- "redirect_dir=on":
+ Redirects are enabled.
+- "redirect_dir=follow":
+ Redirects are not created, but followed.
+- "redirect_dir=off":
+ Redirects are not created and only followed if "redirect_always_follow"
+ feature is enabled in the kernel/module config.
+- "redirect_dir=nofollow":
+ Redirects are not created and not followed (equivalent to "redirect_dir=off"
+ if "redirect_always_follow" feature is not enabled).
+
Non-directories
---------------
GuC-specific firmware loader
----------------------------
-.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c
- :doc: GuC-specific firmware loader
-
-.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c
+.. kernel-doc:: drivers/gpu/drm/i915/intel_guc_fw.c
:internal:
GuC-based command submission
+++ /dev/null
-Crossrelease
-============
-
-Started by Byungchul Park <byungchul.park@lge.com>
-
-Contents:
-
- (*) Background
-
- - What causes deadlock
- - How lockdep works
-
- (*) Limitation
-
- - Limit lockdep
- - Pros from the limitation
- - Cons from the limitation
- - Relax the limitation
-
- (*) Crossrelease
-
- - Introduce crossrelease
- - Introduce commit
-
- (*) Implementation
-
- - Data structures
- - How crossrelease works
-
- (*) Optimizations
-
- - Avoid duplication
- - Lockless for hot paths
-
- (*) APPENDIX A: What lockdep does to work aggresively
-
- (*) APPENDIX B: How to avoid adding false dependencies
-
-
-==========
-Background
-==========
-
-What causes deadlock
---------------------
-
-A deadlock occurs when a context is waiting for an event to happen,
-which is impossible because another (or the) context who can trigger the
-event is also waiting for another (or the) event to happen, which is
-also impossible due to the same reason.
-
-For example:
-
- A context going to trigger event C is waiting for event A to happen.
- A context going to trigger event A is waiting for event B to happen.
- A context going to trigger event B is waiting for event C to happen.
-
-A deadlock occurs when these three wait operations run at the same time,
-because event C cannot be triggered if event A does not happen, which in
-turn cannot be triggered if event B does not happen, which in turn
-cannot be triggered if event C does not happen. After all, no event can
-be triggered since any of them never meets its condition to wake up.
-
-A dependency might exist between two waiters and a deadlock might happen
-due to an incorrect releationship between dependencies. Thus, we must
-define what a dependency is first. A dependency exists between them if:
-
- 1. There are two waiters waiting for each event at a given time.
- 2. The only way to wake up each waiter is to trigger its event.
- 3. Whether one can be woken up depends on whether the other can.
-
-Each wait in the example creates its dependency like:
-
- Event C depends on event A.
- Event A depends on event B.
- Event B depends on event C.
-
- NOTE: Precisely speaking, a dependency is one between whether a
- waiter for an event can be woken up and whether another waiter for
- another event can be woken up. However from now on, we will describe
- a dependency as if it's one between an event and another event for
- simplicity.
-
-And they form circular dependencies like:
-
- -> C -> A -> B -
- / \
- \ /
- ----------------
-
- where 'A -> B' means that event A depends on event B.
-
-Such circular dependencies lead to a deadlock since no waiter can meet
-its condition to wake up as described.
-
-CONCLUSION
-
-Circular dependencies cause a deadlock.
-
-
-How lockdep works
------------------
-
-Lockdep tries to detect a deadlock by checking dependencies created by
-lock operations, acquire and release. Waiting for a lock corresponds to
-waiting for an event, and releasing a lock corresponds to triggering an
-event in the previous section.
-
-In short, lockdep does:
-
- 1. Detect a new dependency.
- 2. Add the dependency into a global graph.
- 3. Check if that makes dependencies circular.
- 4. Report a deadlock or its possibility if so.
-
-For example, consider a graph built by lockdep that looks like:
-
- A -> B -
- \
- -> E
- /
- C -> D -
-
- where A, B,..., E are different lock classes.
-
-Lockdep will add a dependency into the graph on detection of a new
-dependency. For example, it will add a dependency 'E -> C' when a new
-dependency between lock E and lock C is detected. Then the graph will be:
-
- A -> B -
- \
- -> E -
- / \
- -> C -> D - \
- / /
- \ /
- ------------------
-
- where A, B,..., E are different lock classes.
-
-This graph contains a subgraph which demonstrates circular dependencies:
-
- -> E -
- / \
- -> C -> D - \
- / /
- \ /
- ------------------
-
- where C, D and E are different lock classes.
-
-This is the condition under which a deadlock might occur. Lockdep
-reports it on detection after adding a new dependency. This is the way
-how lockdep works.
-
-CONCLUSION
-
-Lockdep detects a deadlock or its possibility by checking if circular
-dependencies were created after adding each new dependency.
-
-
-==========
-Limitation
-==========
-
-Limit lockdep
--------------
-
-Limiting lockdep to work on only typical locks e.g. spin locks and
-mutexes, which are released within the acquire context, the
-implementation becomes simple but its capacity for detection becomes
-limited. Let's check pros and cons in next section.
-
-
-Pros from the limitation
-------------------------
-
-Given the limitation, when acquiring a lock, locks in a held_locks
-cannot be released if the context cannot acquire it so has to wait to
-acquire it, which means all waiters for the locks in the held_locks are
-stuck. It's an exact case to create dependencies between each lock in
-the held_locks and the lock to acquire.
-
-For example:
-
- CONTEXT X
- ---------
- acquire A
- acquire B /* Add a dependency 'A -> B' */
- release B
- release A
-
- where A and B are different lock classes.
-
-When acquiring lock A, the held_locks of CONTEXT X is empty thus no
-dependency is added. But when acquiring lock B, lockdep detects and adds
-a new dependency 'A -> B' between lock A in the held_locks and lock B.
-They can be simply added whenever acquiring each lock.
-
-And data required by lockdep exists in a local structure, held_locks
-embedded in task_struct. Forcing to access the data within the context,
-lockdep can avoid racy problems without explicit locks while handling
-the local data.
-
-Lastly, lockdep only needs to keep locks currently being held, to build
-a dependency graph. However, relaxing the limitation, it needs to keep
-even locks already released, because a decision whether they created
-dependencies might be long-deferred.
-
-To sum up, we can expect several advantages from the limitation:
-
- 1. Lockdep can easily identify a dependency when acquiring a lock.
- 2. Races are avoidable while accessing local locks in a held_locks.
- 3. Lockdep only needs to keep locks currently being held.
-
-CONCLUSION
-
-Given the limitation, the implementation becomes simple and efficient.
-
-
-Cons from the limitation
-------------------------
-
-Given the limitation, lockdep is applicable only to typical locks. For
-example, page locks for page access or completions for synchronization
-cannot work with lockdep.
-
-Can we detect deadlocks below, under the limitation?
-
-Example 1:
-
- CONTEXT X CONTEXT Y CONTEXT Z
- --------- --------- ----------
- mutex_lock A
- lock_page B
- lock_page B
- mutex_lock A /* DEADLOCK */
- unlock_page B held by X
- unlock_page B
- mutex_unlock A
- mutex_unlock A
-
- where A and B are different lock classes.
-
-No, we cannot.
-
-Example 2:
-
- CONTEXT X CONTEXT Y
- --------- ---------
- mutex_lock A
- mutex_lock A
- wait_for_complete B /* DEADLOCK */
- complete B
- mutex_unlock A
- mutex_unlock A
-
- where A is a lock class and B is a completion variable.
-
-No, we cannot.
-
-CONCLUSION
-
-Given the limitation, lockdep cannot detect a deadlock or its
-possibility caused by page locks or completions.
-
-
-Relax the limitation
---------------------
-
-Under the limitation, things to create dependencies are limited to
-typical locks. However, synchronization primitives like page locks and
-completions, which are allowed to be released in any context, also
-create dependencies and can cause a deadlock. So lockdep should track
-these locks to do a better job. We have to relax the limitation for
-these locks to work with lockdep.
-
-Detecting dependencies is very important for lockdep to work because
-adding a dependency means adding an opportunity to check whether it
-causes a deadlock. The more lockdep adds dependencies, the more it
-thoroughly works. Thus Lockdep has to do its best to detect and add as
-many true dependencies into a graph as possible.
-
-For example, considering only typical locks, lockdep builds a graph like:
-
- A -> B -
- \
- -> E
- /
- C -> D -
-
- where A, B,..., E are different lock classes.
-
-On the other hand, under the relaxation, additional dependencies might
-be created and added. Assuming additional 'FX -> C' and 'E -> GX' are
-added thanks to the relaxation, the graph will be:
-
- A -> B -
- \
- -> E -> GX
- /
- FX -> C -> D -
-
- where A, B,..., E, FX and GX are different lock classes, and a suffix
- 'X' is added on non-typical locks.
-
-The latter graph gives us more chances to check circular dependencies
-than the former. However, it might suffer performance degradation since
-relaxing the limitation, with which design and implementation of lockdep
-can be efficient, might introduce inefficiency inevitably. So lockdep
-should provide two options, strong detection and efficient detection.
-
-Choosing efficient detection:
-
- Lockdep works with only locks restricted to be released within the
- acquire context. However, lockdep works efficiently.
-
-Choosing strong detection:
-
- Lockdep works with all synchronization primitives. However, lockdep
- suffers performance degradation.
-
-CONCLUSION
-
-Relaxing the limitation, lockdep can add additional dependencies giving
-additional opportunities to check circular dependencies.
-
-
-============
-Crossrelease
-============
-
-Introduce crossrelease
-----------------------
-
-In order to allow lockdep to handle additional dependencies by what
-might be released in any context, namely 'crosslock', we have to be able
-to identify those created by crosslocks. The proposed 'crossrelease'
-feature provoides a way to do that.
-
-Crossrelease feature has to do:
-
- 1. Identify dependencies created by crosslocks.
- 2. Add the dependencies into a dependency graph.
-
-That's all. Once a meaningful dependency is added into graph, then
-lockdep would work with the graph as it did. The most important thing
-crossrelease feature has to do is to correctly identify and add true
-dependencies into the global graph.
-
-A dependency e.g. 'A -> B' can be identified only in the A's release
-context because a decision required to identify the dependency can be
-made only in the release context. That is to decide whether A can be
-released so that a waiter for A can be woken up. It cannot be made in
-other than the A's release context.
-
-It's no matter for typical locks because each acquire context is same as
-its release context, thus lockdep can decide whether a lock can be
-released in the acquire context. However for crosslocks, lockdep cannot
-make the decision in the acquire context but has to wait until the
-release context is identified.
-
-Therefore, deadlocks by crosslocks cannot be detected just when it
-happens, because those cannot be identified until the crosslocks are
-released. However, deadlock possibilities can be detected and it's very
-worth. See 'APPENDIX A' section to check why.
-
-CONCLUSION
-
-Using crossrelease feature, lockdep can work with what might be released
-in any context, namely crosslock.
-
-
-Introduce commit
-----------------
-
-Since crossrelease defers the work adding true dependencies of
-crosslocks until they are actually released, crossrelease has to queue
-all acquisitions which might create dependencies with the crosslocks.
-Then it identifies dependencies using the queued data in batches at a
-proper time. We call it 'commit'.
-
-There are four types of dependencies:
-
-1. TT type: 'typical lock A -> typical lock B'
-
- Just when acquiring B, lockdep can see it's in the A's release
- context. So the dependency between A and B can be identified
- immediately. Commit is unnecessary.
-
-2. TC type: 'typical lock A -> crosslock BX'
-
- Just when acquiring BX, lockdep can see it's in the A's release
- context. So the dependency between A and BX can be identified
- immediately. Commit is unnecessary, too.
-
-3. CT type: 'crosslock AX -> typical lock B'
-
- When acquiring B, lockdep cannot identify the dependency because
- there's no way to know if it's in the AX's release context. It has
- to wait until the decision can be made. Commit is necessary.
-
-4. CC type: 'crosslock AX -> crosslock BX'
-
- When acquiring BX, lockdep cannot identify the dependency because
- there's no way to know if it's in the AX's release context. It has
- to wait until the decision can be made. Commit is necessary.
- But, handling CC type is not implemented yet. It's a future work.
-
-Lockdep can work without commit for typical locks, but commit step is
-necessary once crosslocks are involved. Introducing commit, lockdep
-performs three steps. What lockdep does in each step is:
-
-1. Acquisition: For typical locks, lockdep does what it originally did
- and queues the lock so that CT type dependencies can be checked using
- it at the commit step. For crosslocks, it saves data which will be
- used at the commit step and increases a reference count for it.
-
-2. Commit: No action is reauired for typical locks. For crosslocks,
- lockdep adds CT type dependencies using the data saved at the
- acquisition step.
-
-3. Release: No changes are required for typical locks. When a crosslock
- is released, it decreases a reference count for it.
-
-CONCLUSION
-
-Crossrelease introduces commit step to handle dependencies of crosslocks
-in batches at a proper time.
-
-
-==============
-Implementation
-==============
-
-Data structures
----------------
-
-Crossrelease introduces two main data structures.
-
-1. hist_lock
-
- This is an array embedded in task_struct, for keeping lock history so
- that dependencies can be added using them at the commit step. Since
- it's local data, it can be accessed locklessly in the owner context.
- The array is filled at the acquisition step and consumed at the
- commit step. And it's managed in circular manner.
-
-2. cross_lock
-
- One per lockdep_map exists. This is for keeping data of crosslocks
- and used at the commit step.
-
-
-How crossrelease works
-----------------------
-
-It's the key of how crossrelease works, to defer necessary works to an
-appropriate point in time and perform in at once at the commit step.
-Let's take a look with examples step by step, starting from how lockdep
-works without crossrelease for typical locks.
-
- acquire A /* Push A onto held_locks */
- acquire B /* Push B onto held_locks and add 'A -> B' */
- acquire C /* Push C onto held_locks and add 'B -> C' */
- release C /* Pop C from held_locks */
- release B /* Pop B from held_locks */
- release A /* Pop A from held_locks */
-
- where A, B and C are different lock classes.
-
- NOTE: This document assumes that readers already understand how
- lockdep works without crossrelease thus omits details. But there's
- one thing to note. Lockdep pretends to pop a lock from held_locks
- when releasing it. But it's subtly different from the original pop
- operation because lockdep allows other than the top to be poped.
-
-In this case, lockdep adds 'the top of held_locks -> the lock to acquire'
-dependency every time acquiring a lock.
-
-After adding 'A -> B', a dependency graph will be:
-
- A -> B
-
- where A and B are different lock classes.
-
-And after adding 'B -> C', the graph will be:
-
- A -> B -> C
-
- where A, B and C are different lock classes.
-
-Let's performs commit step even for typical locks to add dependencies.
-Of course, commit step is not necessary for them, however, it would work
-well because this is a more general way.
-
- acquire A
- /*
- * Queue A into hist_locks
- *
- * In hist_locks: A
- * In graph: Empty
- */
-
- acquire B
- /*
- * Queue B into hist_locks
- *
- * In hist_locks: A, B
- * In graph: Empty
- */
-
- acquire C
- /*
- * Queue C into hist_locks
- *
- * In hist_locks: A, B, C
- * In graph: Empty
- */
-
- commit C
- /*
- * Add 'C -> ?'
- * Answer the following to decide '?'
- * What has been queued since acquire C: Nothing
- *
- * In hist_locks: A, B, C
- * In graph: Empty
- */
-
- release C
-
- commit B
- /*
- * Add 'B -> ?'
- * Answer the following to decide '?'
- * What has been queued since acquire B: C
- *
- * In hist_locks: A, B, C
- * In graph: 'B -> C'
- */
-
- release B
-
- commit A
- /*
- * Add 'A -> ?'
- * Answer the following to decide '?'
- * What has been queued since acquire A: B, C
- *
- * In hist_locks: A, B, C
- * In graph: 'B -> C', 'A -> B', 'A -> C'
- */
-
- release A
-
- where A, B and C are different lock classes.
-
-In this case, dependencies are added at the commit step as described.
-
-After commits for A, B and C, the graph will be:
-
- A -> B -> C
-
- where A, B and C are different lock classes.
-
- NOTE: A dependency 'A -> C' is optimized out.
-
-We can see the former graph built without commit step is same as the
-latter graph built using commit steps. Of course the former way leads to
-earlier finish for building the graph, which means we can detect a
-deadlock or its possibility sooner. So the former way would be prefered
-when possible. But we cannot avoid using the latter way for crosslocks.
-
-Let's look at how commit steps work for crosslocks. In this case, the
-commit step is performed only on crosslock AX as real. And it assumes
-that the AX release context is different from the AX acquire context.
-
- BX RELEASE CONTEXT BX ACQUIRE CONTEXT
- ------------------ ------------------
- acquire A
- /*
- * Push A onto held_locks
- * Queue A into hist_locks
- *
- * In held_locks: A
- * In hist_locks: A
- * In graph: Empty
- */
-
- acquire BX
- /*
- * Add 'the top of held_locks -> BX'
- *
- * In held_locks: A
- * In hist_locks: A
- * In graph: 'A -> BX'
- */
-
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- It must be guaranteed that the following operations are seen after
- acquiring BX globally. It can be done by things like barrier.
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- acquire C
- /*
- * Push C onto held_locks
- * Queue C into hist_locks
- *
- * In held_locks: C
- * In hist_locks: C
- * In graph: 'A -> BX'
- */
-
- release C
- /*
- * Pop C from held_locks
- *
- * In held_locks: Empty
- * In hist_locks: C
- * In graph: 'A -> BX'
- */
- acquire D
- /*
- * Push D onto held_locks
- * Queue D into hist_locks
- * Add 'the top of held_locks -> D'
- *
- * In held_locks: A, D
- * In hist_locks: A, D
- * In graph: 'A -> BX', 'A -> D'
- */
- acquire E
- /*
- * Push E onto held_locks
- * Queue E into hist_locks
- *
- * In held_locks: E
- * In hist_locks: C, E
- * In graph: 'A -> BX', 'A -> D'
- */
-
- release E
- /*
- * Pop E from held_locks
- *
- * In held_locks: Empty
- * In hist_locks: D, E
- * In graph: 'A -> BX', 'A -> D'
- */
- release D
- /*
- * Pop D from held_locks
- *
- * In held_locks: A
- * In hist_locks: A, D
- * In graph: 'A -> BX', 'A -> D'
- */
- commit BX
- /*
- * Add 'BX -> ?'
- * What has been queued since acquire BX: C, E
- *
- * In held_locks: Empty
- * In hist_locks: D, E
- * In graph: 'A -> BX', 'A -> D',
- * 'BX -> C', 'BX -> E'
- */
-
- release BX
- /*
- * In held_locks: Empty
- * In hist_locks: D, E
- * In graph: 'A -> BX', 'A -> D',
- * 'BX -> C', 'BX -> E'
- */
- release A
- /*
- * Pop A from held_locks
- *
- * In held_locks: Empty
- * In hist_locks: A, D
- * In graph: 'A -> BX', 'A -> D',
- * 'BX -> C', 'BX -> E'
- */
-
- where A, BX, C,..., E are different lock classes, and a suffix 'X' is
- added on crosslocks.
-
-Crossrelease considers all acquisitions after acqiuring BX are
-candidates which might create dependencies with BX. True dependencies
-will be determined when identifying the release context of BX. Meanwhile,
-all typical locks are queued so that they can be used at the commit step.
-And then two dependencies 'BX -> C' and 'BX -> E' are added at the
-commit step when identifying the release context.
-
-The final graph will be, with crossrelease:
-
- -> C
- /
- -> BX -
- / \
- A - -> E
- \
- -> D
-
- where A, BX, C,..., E are different lock classes, and a suffix 'X' is
- added on crosslocks.
-
-However, the final graph will be, without crossrelease:
-
- A -> D
-
- where A and D are different lock classes.
-
-The former graph has three more dependencies, 'A -> BX', 'BX -> C' and
-'BX -> E' giving additional opportunities to check if they cause
-deadlocks. This way lockdep can detect a deadlock or its possibility
-caused by crosslocks.
-
-CONCLUSION
-
-We checked how crossrelease works with several examples.
-
-
-=============
-Optimizations
-=============
-
-Avoid duplication
------------------
-
-Crossrelease feature uses a cache like what lockdep already uses for
-dependency chains, but this time it's for caching CT type dependencies.
-Once that dependency is cached, the same will never be added again.
-
-
-Lockless for hot paths
-----------------------
-
-To keep all locks for later use at the commit step, crossrelease adopts
-a local array embedded in task_struct, which makes access to the data
-lockless by forcing it to happen only within the owner context. It's
-like how lockdep handles held_locks. Lockless implmentation is important
-since typical locks are very frequently acquired and released.
-
-
-=================================================
-APPENDIX A: What lockdep does to work aggresively
-=================================================
-
-A deadlock actually occurs when all wait operations creating circular
-dependencies run at the same time. Even though they don't, a potential
-deadlock exists if the problematic dependencies exist. Thus it's
-meaningful to detect not only an actual deadlock but also its potential
-possibility. The latter is rather valuable. When a deadlock occurs
-actually, we can identify what happens in the system by some means or
-other even without lockdep. However, there's no way to detect possiblity
-without lockdep unless the whole code is parsed in head. It's terrible.
-Lockdep does the both, and crossrelease only focuses on the latter.
-
-Whether or not a deadlock actually occurs depends on several factors.
-For example, what order contexts are switched in is a factor. Assuming
-circular dependencies exist, a deadlock would occur when contexts are
-switched so that all wait operations creating the dependencies run
-simultaneously. Thus to detect a deadlock possibility even in the case
-that it has not occured yet, lockdep should consider all possible
-combinations of dependencies, trying to:
-
-1. Use a global dependency graph.
-
- Lockdep combines all dependencies into one global graph and uses them,
- regardless of which context generates them or what order contexts are
- switched in. Aggregated dependencies are only considered so they are
- prone to be circular if a problem exists.
-
-2. Check dependencies between classes instead of instances.
-
- What actually causes a deadlock are instances of lock. However,
- lockdep checks dependencies between classes instead of instances.
- This way lockdep can detect a deadlock which has not happened but
- might happen in future by others but the same class.
-
-3. Assume all acquisitions lead to waiting.
-
- Although locks might be acquired without waiting which is essential
- to create dependencies, lockdep assumes all acquisitions lead to
- waiting since it might be true some time or another.
-
-CONCLUSION
-
-Lockdep detects not only an actual deadlock but also its possibility,
-and the latter is more valuable.
-
-
-==================================================
-APPENDIX B: How to avoid adding false dependencies
-==================================================
-
-Remind what a dependency is. A dependency exists if:
-
- 1. There are two waiters waiting for each event at a given time.
- 2. The only way to wake up each waiter is to trigger its event.
- 3. Whether one can be woken up depends on whether the other can.
-
-For example:
-
- acquire A
- acquire B /* A dependency 'A -> B' exists */
- release B
- release A
-
- where A and B are different lock classes.
-
-A depedency 'A -> B' exists since:
-
- 1. A waiter for A and a waiter for B might exist when acquiring B.
- 2. Only way to wake up each is to release what it waits for.
- 3. Whether the waiter for A can be woken up depends on whether the
- other can. IOW, TASK X cannot release A if it fails to acquire B.
-
-For another example:
-
- TASK X TASK Y
- ------ ------
- acquire AX
- acquire B /* A dependency 'AX -> B' exists */
- release B
- release AX held by Y
-
- where AX and B are different lock classes, and a suffix 'X' is added
- on crosslocks.
-
-Even in this case involving crosslocks, the same rule can be applied. A
-depedency 'AX -> B' exists since:
-
- 1. A waiter for AX and a waiter for B might exist when acquiring B.
- 2. Only way to wake up each is to release what it waits for.
- 3. Whether the waiter for AX can be woken up depends on whether the
- other can. IOW, TASK X cannot release AX if it fails to acquire B.
-
-Let's take a look at more complicated example:
-
- TASK X TASK Y
- ------ ------
- acquire B
- release B
- fork Y
- acquire AX
- acquire C /* A dependency 'AX -> C' exists */
- release C
- release AX held by Y
-
- where AX, B and C are different lock classes, and a suffix 'X' is
- added on crosslocks.
-
-Does a dependency 'AX -> B' exist? Nope.
-
-Two waiters are essential to create a dependency. However, waiters for
-AX and B to create 'AX -> B' cannot exist at the same time in this
-example. Thus the dependency 'AX -> B' cannot be created.
-
-It would be ideal if the full set of true ones can be considered. But
-we can ensure nothing but what actually happened. Relying on what
-actually happens at runtime, we can anyway add only true ones, though
-they might be a subset of true ones. It's similar to how lockdep works
-for typical locks. There might be more true dependencies than what
-lockdep has detected in runtime. Lockdep has no choice but to rely on
-what actually happens. Crossrelease also relies on it.
-
-CONCLUSION
-
-Relying on what actually happens, lockdep can avoid adding false
-dependencies.
batman-adv
kapi
z8530book
+ msg_zerocopy
.. only:: subproject
=======
* :ref:`genindex`
-
if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)))
error(1, errno, "setsockopt zerocopy");
+Setting the socket option only works when the socket is in its initial
+(TCP_CLOSED) state. Trying to set the option for a socket returned by accept(),
+for example, will lead to an EBUSY error. In this case, the option should be set
+to the listening socket and it will be inherited by the accepted sockets.
Transmission
------------
original compressor. Once all pages are removed from an old zpool, the zpool
and its compressor are freed.
+Some of the pages in zswap are same-value filled pages (i.e. contents of the
+page have same value or repetitive pattern). These pages include zero-filled
+pages and they are handled differently. During store operation, a page is
+checked if it is a same-value filled page before compressing it. If true, the
+compressed length of the page is set to zero and the pattern or same-filled
+value is stored.
+
+Same-value filled pages identification feature is enabled by default and can be
+disabled at boot time by setting the "same_filled_pages_enabled" attribute to 0,
+e.g. zswap.same_filled_pages_enabled=0. It can also be enabled and disabled at
+runtime using the sysfs "same_filled_pages_enabled" attribute, e.g.
+
+echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled
+
+When zswap same-filled page identification is disabled at runtime, it will stop
+checking for the same-value filled pages during store operation. However, the
+existing pages which are marked as same-value filled pages remain stored
+unchanged in zswap until they are either loaded or invalidated.
+
A debugfs interface is provided for various statistic about pool size, number
-of pages stored, and various counters for the reasons pages are rejected.
+of pages stored, same-value filled pages and various counters for the reasons
+pages are rejected.
-<previous description obsolete, deleted>
-
Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
... unused hole ...
ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
... unused hole ...
+ vaddr_end for KASLR
+fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space (variable)
-ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
+ffffffffa0000000 - [fixmap start] (~1526 MB) module mapping space (variable)
+[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
+ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
Virtual memory map with 5 level page tables:
hole caused by [56:63] sign extension
ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff91ffffffffffff (=49 bits) hole
-ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
+ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI
+ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
... unused hole ...
ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
+... unused hole ...
+ vaddr_end for KASLR
+fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
... unused hole ...
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
-ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space
-ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
+ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+[fixmap start] - ffffffffff5fffff kernel-internal fixmap range
+ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
Architecture defines a 64-bit virtual address. Implementations can support
less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
-through to the most-significant implemented bit are set to either all ones
-or all zero. This causes hole between user space and kernel addresses.
+through to the most-significant implemented bit are sign extended.
+This causes hole between user space and kernel addresses if you interpret them
+as unsigned.
The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
the processes using the page fault handler, with init_top_pgt as
reference.
-Current X86-64 implementations support up to 46 bits of address space (64 TB),
-which is our current limit. This expands into MBZ space in the page tables.
-
We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual
memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.
-The module mapping space size changes based on the CONFIG requirements for the
-following fixmap section.
-
Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
physical memory, vmalloc/ioremap space and virtual memory map are randomized.
Their order is preserved but their base will be offset early at boot time.
--Andi Kleen, Jul 2004
+Be very careful vs. KASLR when changing anything here. The KASLR address
+range must not overlap with anything except the KASAN shadow area, which is
+correct as KASAN disables KASLR.
F: include/uapi/linux/bfs_fs.h
BLACKFIN ARCHITECTURE
-M: Steven Miao <realmz6@gmail.com>
L: adi-buildroot-devel@lists.sourceforge.net (moderated for non-subscribers)
T: git git://git.code.sf.net/p/adi-linux/code
W: http://blackfin.uclinux.org
-S: Supported
+S: Orphan
F: arch/blackfin/
BLACKFIN EMAC DRIVER
L: adi-buildroot-devel@lists.sourceforge.net (moderated for non-subscribers)
W: http://blackfin.uclinux.org
-S: Supported
+S: Orphan
F: drivers/net/ethernet/adi/
BLACKFIN MEDIA DRIVER
-M: Scott Jiang <scott.jiang.linux@gmail.com>
L: adi-buildroot-devel@lists.sourceforge.net (moderated for non-subscribers)
W: http://blackfin.uclinux.org/
-S: Supported
+S: Orphan
F: drivers/media/platform/blackfin/
F: drivers/media/i2c/adv7183*
F: drivers/media/i2c/vs6624*
BLACKFIN RTC DRIVER
L: adi-buildroot-devel@lists.sourceforge.net (moderated for non-subscribers)
W: http://blackfin.uclinux.org
-S: Supported
+S: Orphan
F: drivers/rtc/rtc-bfin.c
BLACKFIN SDH DRIVER
L: adi-buildroot-devel@lists.sourceforge.net (moderated for non-subscribers)
W: http://blackfin.uclinux.org
-S: Supported
+S: Orphan
F: drivers/mmc/host/bfin_sdh.c
BLACKFIN SERIAL DRIVER
L: adi-buildroot-devel@lists.sourceforge.net (moderated for non-subscribers)
W: http://blackfin.uclinux.org
-S: Supported
+S: Orphan
F: drivers/tty/serial/bfin_uart.c
BLACKFIN WATCHDOG DRIVER
L: adi-buildroot-devel@lists.sourceforge.net (moderated for non-subscribers)
W: http://blackfin.uclinux.org
-S: Supported
+S: Orphan
F: drivers/watchdog/bfin_wdt.c
BLINKM RGB LED DRIVER
EFI TEST DRIVER
L: linux-efi@vger.kernel.org
M: Ivan Hu <ivan.hu@canonical.com>
-M: Matt Fleming <matt@codeblueprint.co.uk>
+M: Ard Biesheuvel <ard.biesheuvel@linaro.org>
S: Maintained
F: drivers/firmware/efi/test/
EFI VARIABLE FILESYSTEM
M: Matthew Garrett <matthew.garrett@nebula.com>
M: Jeremy Kerr <jk@ozlabs.org>
-M: Matt Fleming <matt@codeblueprint.co.uk>
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git
+M: Ard Biesheuvel <ard.biesheuvel@linaro.org>
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
L: linux-efi@vger.kernel.org
S: Maintained
F: fs/efivarfs/
F: security/integrity/evm/
EXTENSIBLE FIRMWARE INTERFACE (EFI)
-M: Matt Fleming <matt@codeblueprint.co.uk>
M: Ard Biesheuvel <ard.biesheuvel@linaro.org>
L: linux-efi@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
FCOE SUBSYSTEM (libfc, libfcoe, fcoe)
M: Johannes Thumshirn <jth@kernel.org>
-L: fcoe-devel@open-fcoe.org
+L: linux-scsi@vger.kernel.org
W: www.Open-FCoE.org
S: Supported
F: drivers/scsi/libfc/
F: drivers/irqchip/irq-or1k-*
OPENVSWITCH
-M: Pravin Shelar <pshelar@nicira.com>
+M: Pravin B Shelar <pshelar@ovn.org>
L: netdev@vger.kernel.org
L: dev@openvswitch.org
W: http://openvswitch.org
SYNOPSYS DESIGNWARE ENTERPRISE ETHERNET DRIVER
M: Jie Deng <jiedeng@synopsys.com>
+M: Jose Abreu <Jose.Abreu@synopsys.com>
L: netdev@vger.kernel.org
S: Supported
F: drivers/net/ethernet/synopsys/
M: Yehezkel Bernat <yehezkel.bernat@intel.com>
T: git git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git
S: Maintained
+F: Documentation/admin-guide/thunderbolt.rst
F: drivers/thunderbolt/
F: include/linux/thunderbolt.h
VERSION = 4
PATCHLEVEL = 15
SUBLEVEL = 0
-EXTRAVERSION = -rc3
+EXTRAVERSION = -rc7
NAME = Fearless Coyote
# *DOCUMENTATION*
# disable invalid "can't wrap" optimizations for signed / pointers
KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)
+# Make sure -fstack-check isn't enabled (like gentoo apparently did)
+KBUILD_CFLAGS += $(call cc-option,-fno-stack-check,)
+
# conserve stack if available
KBUILD_CFLAGS += $(call cc-option,-fconserve-stack)
reg = <0x80 0x10>, <0x100 0x10>;
#clock-cells = <0>;
clocks = <&input_clk>;
+
+ /*
+ * Set initial core pll output frequency to 90MHz.
+ * It will be applied at the core pll driver probing
+ * on early boot.
+ */
+ assigned-clocks = <&core_clk>;
+ assigned-clock-rates = <90000000>;
};
core_intc: archs-intc@cpu {
reg = <0x80 0x10>, <0x100 0x10>;
#clock-cells = <0>;
clocks = <&input_clk>;
+
+ /*
+ * Set initial core pll output frequency to 100MHz.
+ * It will be applied at the core pll driver probing
+ * on early boot.
+ */
+ assigned-clocks = <&core_clk>;
+ assigned-clock-rates = <100000000>;
};
core_intc: archs-intc@cpu {
reg = <0x00 0x10>, <0x14B8 0x4>;
#clock-cells = <0>;
clocks = <&input_clk>;
+
+ /*
+ * Set initial core pll output frequency to 1GHz.
+ * It will be applied at the core pll driver probing
+ * on early boot.
+ */
+ assigned-clocks = <&core_clk>;
+ assigned-clock-rates = <1000000000>;
};
serial: serial@5000 {
CONFIG_SERIAL_OF_PLATFORM=y
# CONFIG_HW_RANDOM is not set
# CONFIG_HWMON is not set
+CONFIG_DRM=y
+# CONFIG_DRM_FBDEV_EMULATION is not set
+CONFIG_DRM_UDL=y
CONFIG_FB=y
-CONFIG_FB_UDL=y
CONFIG_FRAMEBUFFER_CONSOLE=y
-CONFIG_USB=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_HCD_PLATFORM=y
CONFIG_USB_OHCI_HCD=y
return 0;
__asm__ __volatile__(
+ " mov lp_count, %5 \n"
" lp 3f \n"
"1: ldb.ab %3, [%2, 1] \n"
" breq.d %3, 0, 3f \n"
" .word 1b, 4b \n"
" .previous \n"
: "+r"(res), "+r"(dst), "+r"(src), "=r"(val)
- : "g"(-EFAULT), "l"(count)
- : "memory");
+ : "g"(-EFAULT), "r"(count)
+ : "lp_count", "lp_start", "lp_end", "memory");
return res;
}
unsigned int exec_ctrl;
READ_BCR(AUX_EXEC_CTRL, exec_ctrl);
- cpu->extn.dual_enb = exec_ctrl & 1;
+ cpu->extn.dual_enb = !(exec_ctrl & 1);
/* dual issue always present for this core */
cpu->extn.dual = 1;
*/
static int __print_sym(unsigned int address, void *unused)
{
- __print_symbol(" %s\n", address);
+ printk(" %pS\n", (void *)address);
return 0;
}
DO_ERROR_INFO(SIGBUS, "Invalid Mem Access", __weak do_memory_error, BUS_ADRERR)
DO_ERROR_INFO(SIGTRAP, "Breakpoint Set", trap_is_brkpt, TRAP_BRKPT)
DO_ERROR_INFO(SIGBUS, "Misaligned Access", do_misaligned_error, BUS_ADRALN)
+DO_ERROR_INFO(SIGSEGV, "gcc generated __builtin_trap", do_trap5_error, 0)
/*
* Entry Point for Misaligned Data access Exception, for emulating in software
* Thus TRAP_S <n> can be used for specific purpose
* -1 used for software breakpointing (gdb)
* -2 used by kprobes
+ * -5 __builtin_trap() generated by gcc (2018.03 onwards) for toggle such as
+ * -fno-isolate-erroneous-paths-dereference
*/
void do_non_swi_trap(unsigned long address, struct pt_regs *regs)
{
kgdb_trap(regs);
break;
+ case 5:
+ do_trap5_error(address, regs);
+ break;
default:
break;
}
insterror_is_error(address, regs);
}
+
+/*
+ * abort() call generated by older gcc for __builtin_trap()
+ */
+void abort(void)
+{
+ __asm__ __volatile__("trap_s 5\n");
+}
else
pr_cont("Bus Error, check PRM\n");
#endif
+ } else if (vec == ECR_V_TRAP) {
+ if (regs->ecr_param == 5)
+ pr_cont("gcc generated __builtin_trap\n");
} else {
pr_cont("Check Programmer's Manual\n");
}
* Instead of duplicating defconfig/DT for SMP/QUAD, add a small hack
* of fudging the freq in DT
*/
+#define AXS103_QUAD_CORE_CPU_FREQ_HZ 50000000
+
unsigned int num_cores = (read_aux_reg(ARC_REG_MCIP_BCR) >> 16) & 0x3F;
if (num_cores > 2) {
- u32 freq = 50, orig;
- /*
- * TODO: use cpu node "cpu-freq" param instead of platform-specific
- * "/cpu_card/core_clk" as it works only if we use fixed-clock for cpu.
- */
+ u32 freq;
int off = fdt_path_offset(initial_boot_params, "/cpu_card/core_clk");
const struct fdt_property *prop;
prop = fdt_get_property(initial_boot_params, off,
- "clock-frequency", NULL);
- orig = be32_to_cpu(*(u32*)(prop->data)) / 1000000;
+ "assigned-clock-rates", NULL);
+ freq = be32_to_cpu(*(u32 *)(prop->data));
/* Patching .dtb in-place with new core clock value */
- if (freq != orig ) {
- freq = cpu_to_be32(freq * 1000000);
+ if (freq != AXS103_QUAD_CORE_CPU_FREQ_HZ) {
+ freq = cpu_to_be32(AXS103_QUAD_CORE_CPU_FREQ_HZ);
fdt_setprop_inplace(initial_boot_params, off,
- "clock-frequency", &freq, sizeof(freq));
+ "assigned-clock-rates", &freq, sizeof(freq));
}
}
#endif
#define CREG_PAE (CREG_BASE + 0x180)
#define CREG_PAE_UPDATE (CREG_BASE + 0x194)
-#define CREG_CORE_IF_CLK_DIV (CREG_BASE + 0x4B8)
-#define CREG_CORE_IF_CLK_DIV_2 0x1
-#define CGU_BASE ARC_PERIPHERAL_BASE
-#define CGU_PLL_STATUS (ARC_PERIPHERAL_BASE + 0x4)
-#define CGU_PLL_CTRL (ARC_PERIPHERAL_BASE + 0x0)
-#define CGU_PLL_STATUS_LOCK BIT(0)
-#define CGU_PLL_STATUS_ERR BIT(1)
-#define CGU_PLL_CTRL_1GHZ 0x3A10
-#define HSDK_PLL_LOCK_TIMEOUT 500
-
-#define HSDK_PLL_LOCKED() \
- !!(ioread32((void __iomem *) CGU_PLL_STATUS) & CGU_PLL_STATUS_LOCK)
-
-#define HSDK_PLL_ERR() \
- !!(ioread32((void __iomem *) CGU_PLL_STATUS) & CGU_PLL_STATUS_ERR)
-
-static void __init hsdk_set_cpu_freq_1ghz(void)
-{
- u32 timeout = HSDK_PLL_LOCK_TIMEOUT;
-
- /*
- * As we set cpu clock which exceeds 500MHz, the divider for the interface
- * clock must be programmed to div-by-2.
- */
- iowrite32(CREG_CORE_IF_CLK_DIV_2, (void __iomem *) CREG_CORE_IF_CLK_DIV);
-
- /* Set cpu clock to 1GHz */
- iowrite32(CGU_PLL_CTRL_1GHZ, (void __iomem *) CGU_PLL_CTRL);
-
- while (!HSDK_PLL_LOCKED() && timeout--)
- cpu_relax();
-
- if (!HSDK_PLL_LOCKED() || HSDK_PLL_ERR())
- pr_err("Failed to setup CPU frequency to 1GHz!");
-}
-
#define SDIO_BASE (ARC_PERIPHERAL_BASE + 0xA000)
#define SDIO_UHS_REG_EXT (SDIO_BASE + 0x108)
#define SDIO_UHS_REG_EXT_DIV_2 (2 << 30)
* minimum possible div-by-2.
*/
iowrite32(SDIO_UHS_REG_EXT_DIV_2, (void __iomem *) SDIO_UHS_REG_EXT);
-
- /*
- * Setup CPU frequency to 1GHz.
- * TODO: remove it after smart hsdk pll driver will be introduced.
- */
- hsdk_set_cpu_freq_1ghz();
}
static const char *hsdk_compat[] __initconst = {
compatible = "aspeed,ast2400-vuart";
reg = <0x1e787000 0x40>;
reg-shift = <2>;
- interrupts = <10>;
+ interrupts = <8>;
clocks = <&clk_uart>;
no-loopback-test;
status = "disabled";
jc42@18 {
compatible = "nxp,se97b", "jedec,jc-42.4-temp";
reg = <0x18>;
+ smbus-timeout-disable;
};
dpot: mcp4651-104@28 {
*/
battery {
pinctrl-names = "default";
- pintctrl-0 = <&battery_pins>;
+ pinctrl-0 = <&battery_pins>;
compatible = "lego,ev3-battery";
io-channels = <&adc 4>, <&adc 3>;
io-channel-names = "voltage", "current";
batt_volt_en {
gpio-hog;
gpios = <6 GPIO_ACTIVE_HIGH>;
- output-low;
+ output-high;
};
};
status = "okay";
};
+&mixer {
+ status = "okay";
+};
+
/* eMMC flash */
&mmc_0 {
status = "okay";
reg = <0x2a>;
VDDA-supply = <®_3p3v>;
VDDIO-supply = <®_3p3v>;
- clocks = <&sys_mclk 1>;
+ clocks = <&sys_mclk>;
};
};
};
reg = <0x0a>;
VDDA-supply = <®_3p3v>;
VDDIO-supply = <®_3p3v>;
- clocks = <&sys_mclk 1>;
+ clocks = <&sys_mclk>;
};
};
};
};
+&cpu0 {
+ cpu0-supply = <&vdd_arm>;
+};
+
&i2c1 {
status = "okay";
clock-frequency = <400000>;
iep_mmu: iommu@ff900800 {
compatible = "rockchip,iommu";
reg = <0x0 0xff900800 0x0 0x40>;
- interrupts = <GIC_SPI 17 IRQ_TYPE_LEVEL_HIGH 0>;
+ interrupts = <GIC_SPI 17 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "iep_mmu";
#iommu-cells = <0>;
status = "disabled";
reg = <0x01c16000 0x1000>;
interrupts = <58>;
clocks = <&ccu CLK_AHB_HDMI0>, <&ccu CLK_HDMI>,
- <&ccu 9>,
- <&ccu 18>;
+ <&ccu CLK_PLL_VIDEO0_2X>,
+ <&ccu CLK_PLL_VIDEO1_2X>;
clock-names = "ahb", "mod", "pll-0", "pll-1";
dmas = <&dma SUN4I_DMA_NORMAL 16>,
<&dma SUN4I_DMA_NORMAL 16>,
reg = <0x01c16000 0x1000>;
interrupts = <58>;
clocks = <&ccu CLK_AHB_HDMI>, <&ccu CLK_HDMI>,
- <&ccu 9>,
- <&ccu 16>;
+ <&ccu CLK_PLL_VIDEO0_2X>,
+ <&ccu CLK_PLL_VIDEO1_2X>;
clock-names = "ahb", "mod", "pll-0", "pll-1";
dmas = <&dma SUN4I_DMA_NORMAL 16>,
<&dma SUN4I_DMA_NORMAL 16>,
interrupts = <GIC_SPI 88 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&ccu CLK_AHB1_HDMI>, <&ccu CLK_HDMI>,
<&ccu CLK_HDMI_DDC>,
- <&ccu 7>,
- <&ccu 13>;
+ <&ccu CLK_PLL_VIDEO0_2X>,
+ <&ccu CLK_PLL_VIDEO1_2X>;
clock-names = "ahb", "mod", "ddc", "pll-0", "pll-1";
resets = <&ccu RST_AHB1_HDMI>;
reset-names = "ahb";
reg = <0x01c16000 0x1000>;
interrupts = <GIC_SPI 58 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&ccu CLK_AHB_HDMI0>, <&ccu CLK_HDMI>,
- <&ccu 9>,
- <&ccu 18>;
+ <&ccu CLK_PLL_VIDEO0_2X>,
+ <&ccu CLK_PLL_VIDEO1_2X>;
clock-names = "ahb", "mod", "pll-0", "pll-1";
dmas = <&dma SUN4I_DMA_NORMAL 16>,
<&dma SUN4I_DMA_NORMAL 16>,
status = "okay";
axp81x: pmic@3a3 {
+ compatible = "x-powers,axp813";
reg = <0x3a3>;
interrupt-parent = <&r_intc>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
reg = <0x6e000 0x400>;
ranges = <0 0x6e000 0x400>;
interrupt-parent = <&gic>;
- interrupt-controller;
#address-cells = <1>;
#size-cells = <1>;
switch0port10: port@10 {
reg = <10>;
label = "dsa";
- phy-mode = "xgmii";
+ phy-mode = "xaui";
link = <&switch1port10>;
};
};
switch1port10: port@10 {
reg = <10>;
label = "dsa";
- phy-mode = "xgmii";
+ phy-mode = "xaui";
link = <&switch0port10>;
};
};
/* if that doesn't kill us, halt */
panic("Oops failed to kill thread");
}
-EXPORT_SYMBOL(abort);
void __init trap_init(void)
{
.pushsection .text.fixup,"ax"
.align 4
9001: mov r4, #-EFAULT
+#ifdef CONFIG_CPU_SW_DOMAIN_PAN
+ ldr r5, [sp, #9*4] @ *err_ptr
+#else
ldr r5, [sp, #8*4] @ *err_ptr
+#endif
str r4, [r5]
ldmia sp, {r1, r2} @ retrieve dst, len
add r2, r2, r1
{ "spi_davinci.0", "rx", EDMA_FILTER_PARAM(0, 17) },
{ "spi_davinci.3", "tx", EDMA_FILTER_PARAM(0, 18) },
{ "spi_davinci.3", "rx", EDMA_FILTER_PARAM(0, 19) },
- { "dm6441-mmc.0", "rx", EDMA_FILTER_PARAM(0, 26) },
- { "dm6441-mmc.0", "tx", EDMA_FILTER_PARAM(0, 27) },
- { "dm6441-mmc.1", "rx", EDMA_FILTER_PARAM(0, 30) },
- { "dm6441-mmc.1", "tx", EDMA_FILTER_PARAM(0, 31) },
+ { "da830-mmc.0", "rx", EDMA_FILTER_PARAM(0, 26) },
+ { "da830-mmc.0", "tx", EDMA_FILTER_PARAM(0, 27) },
+ { "da830-mmc.1", "rx", EDMA_FILTER_PARAM(0, 30) },
+ { "da830-mmc.1", "tx", EDMA_FILTER_PARAM(0, 31) },
};
static struct edma_soc_info dm365_edma_pdata = {
/* not using TC*_ERR */
};
-static struct platform_device dm365_edma_device = {
- .name = "edma",
- .id = 0,
- .dev.platform_data = &dm365_edma_pdata,
- .num_resources = ARRAY_SIZE(edma_resources),
- .resource = edma_resources,
+static const struct platform_device_info dm365_edma_device __initconst = {
+ .name = "edma",
+ .id = 0,
+ .dma_mask = DMA_BIT_MASK(32),
+ .res = edma_resources,
+ .num_res = ARRAY_SIZE(edma_resources),
+ .data = &dm365_edma_pdata,
+ .size_data = sizeof(dm365_edma_pdata),
};
static struct resource dm365_asp_resources[] = {
static int __init dm365_init_devices(void)
{
+ struct platform_device *edma_pdev;
int ret = 0;
if (!cpu_is_davinci_dm365())
return 0;
davinci_cfg_reg(DM365_INT_EDMA_CC);
- platform_device_register(&dm365_edma_device);
+ edma_pdev = platform_device_register_full(&dm365_edma_device);
+ if (IS_ERR(edma_pdev)) {
+ pr_warn("%s: Failed to register eDMA\n", __func__);
+ return PTR_ERR(edma_pdev);
+ }
platform_device_register(&dm365_mdio_device);
platform_device_register(&dm365_emac_device);
If unsure, say Y.
-
config SOCIONEXT_SYNQUACER_PREITS
bool "Socionext Synquacer: Workaround for GICv3 pre-ITS"
default y
a 128kB offset to be applied to the target address in this commands.
If unsure, say Y.
+
+config QCOM_FALKOR_ERRATUM_E1041
+ bool "Falkor E1041: Speculative instruction fetches might cause errant memory access"
+ default y
+ help
+ Falkor CPU may speculatively fetch instructions from an improper
+ memory location when MMU translation is changed from SCTLR_ELn[M]=1
+ to SCTLR_ELn[M]=0. Prefix an ISB instruction to fix the problem.
+
+ If unsure, say Y.
+
endmenu
pinctrl-0 = <&rgmii_pins>;
phy-mode = "rgmii";
phy-handle = <&ext_rgmii_phy>;
+ phy-supply = <®_dc1sw>;
status = "okay";
};
pinctrl-0 = <&rmii_pins>;
phy-mode = "rmii";
phy-handle = <&ext_rmii_phy1>;
+ phy-supply = <®_dc1sw>;
status = "okay";
};
pinctrl-0 = <&rgmii_pins>;
phy-mode = "rgmii";
phy-handle = <&ext_rgmii_phy>;
+ phy-supply = <®_dc1sw>;
status = "okay";
};
&mmc2 {
pinctrl-names = "default";
pinctrl-0 = <&mmc2_pins>;
- vmmc-supply = <®_vcc3v3>;
+ vmmc-supply = <®_dcdc1>;
vqmmc-supply = <®_vcc1v8>;
bus-width = <8>;
non-removable;
#include "sun50i-a64.dtsi"
-/ {
- reg_vcc3v3: vcc3v3 {
- compatible = "regulator-fixed";
- regulator-name = "vcc3v3";
- regulator-min-microvolt = <3300000>;
- regulator-max-microvolt = <3300000>;
- };
-};
-
&mmc0 {
pinctrl-names = "default";
pinctrl-0 = <&mmc0_pins>;
- vmmc-supply = <®_vcc3v3>;
+ vmmc-supply = <®_dcdc1>;
non-removable;
disable-wp;
bus-width = <4>;
pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin>;
vmmc-supply = <®_vcc3v3>;
bus-width = <4>;
- cd-gpios = <&pio 5 6 GPIO_ACTIVE_HIGH>;
+ cd-gpios = <&pio 5 6 GPIO_ACTIVE_LOW>;
status = "okay";
};
&avb {
pinctrl-0 = <&avb_pins>;
pinctrl-names = "default";
- renesas,no-ether-link;
phy-handle = <&phy0>;
status = "okay";
&avb {
pinctrl-0 = <&avb_pins>;
pinctrl-names = "default";
- renesas,no-ether-link;
phy-handle = <&phy0>;
status = "okay";
assigned-clocks = <&cru SCLK_MAC2IO>, <&cru SCLK_MAC2IO_EXT>;
assigned-clock-parents = <&gmac_clkin>, <&gmac_clkin>;
clock_in_out = "input";
+ /* shows instability at 1GBit right now */
+ max-speed = <100>;
phy-supply = <&vcc_io>;
phy-mode = "rgmii";
pinctrl-names = "default";
tsadc: tsadc@ff250000 {
compatible = "rockchip,rk3328-tsadc";
reg = <0x0 0xff250000 0x0 0x100>;
- interrupts = <GIC_SPI 58 IRQ_TYPE_LEVEL_HIGH 0>;
+ interrupts = <GIC_SPI 58 IRQ_TYPE_LEVEL_HIGH>;
assigned-clocks = <&cru SCLK_TSADC>;
assigned-clock-rates = <50000>;
clocks = <&cru SCLK_TSADC>, <&cru PCLK_TSADC>;
regulator-min-microvolt = <5000000>;
regulator-max-microvolt = <5000000>;
};
-
- vdd_log: vdd-log {
- compatible = "pwm-regulator";
- pwms = <&pwm2 0 25000 0>;
- regulator-name = "vdd_log";
- regulator-min-microvolt = <800000>;
- regulator-max-microvolt = <1400000>;
- regulator-always-on;
- regulator-boot-on;
- status = "okay";
- };
};
&cpu_b0 {
gpio-controller;
#gpio-cells = <2>;
gpio-ranges = <&pinctrl 0 0 0>,
- <&pinctrl 96 0 0>,
- <&pinctrl 160 0 0>;
+ <&pinctrl 104 0 0>,
+ <&pinctrl 168 0 0>;
gpio-ranges-group-names = "gpio_range0",
"gpio_range1",
"gpio_range2";
#endif
.endm
+/**
+ * Errata workaround prior to disable MMU. Insert an ISB immediately prior
+ * to executing the MSR that will change SCTLR_ELn[M] from a value of 1 to 0.
+ */
+ .macro pre_disable_mmu_workaround
+#ifdef CONFIG_QCOM_FALKOR_ERRATUM_E1041
+ isb
+#endif
+ .endm
+
#endif /* __ASM_ASSEMBLER_H */
#define FTR_VISIBLE true /* Feature visible to the user space */
#define FTR_HIDDEN false /* Feature is hidden from the user */
+#define FTR_VISIBLE_IF_IS_ENABLED(config) \
+ (IS_ENABLED(config) ? FTR_VISIBLE : FTR_HIDDEN)
+
struct arm64_ftr_bits {
bool sign; /* Value is signed ? */
bool visible;
#define BRCM_CPU_PART_VULCAN 0x516
#define QCOM_CPU_PART_FALKOR_V1 0x800
+#define QCOM_CPU_PART_FALKOR 0xC00
#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
#define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
#define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
#define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
#define MIDR_QCOM_FALKOR_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR_V1)
+#define MIDR_QCOM_FALKOR MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_FALKOR)
#ifndef __ASSEMBLY__
#include <asm/cmpxchg.h>
#include <asm/fixmap.h>
#include <linux/mmdebug.h>
+#include <linux/mm_types.h>
+#include <linux/sched.h>
extern void __pte_error(const char *file, int line, unsigned long val);
extern void __pmd_error(const char *file, int line, unsigned long val);
static inline pte_t pte_mkclean(pte_t pte)
{
- return clear_pte_bit(pte, __pgprot(PTE_DIRTY));
+ pte = clear_pte_bit(pte, __pgprot(PTE_DIRTY));
+ pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
+
+ return pte;
}
static inline pte_t pte_mkdirty(pte_t pte)
{
- return set_pte_bit(pte, __pgprot(PTE_DIRTY));
+ pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));
+
+ if (pte_write(pte))
+ pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
+
+ return pte;
}
static inline pte_t pte_mkold(pte_t pte)
}
}
-struct mm_struct;
-struct vm_area_struct;
-
extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
/*
* hardware updates of the pte (ptep_set_access_flags safely changes
* valid ptes without going through an invalid entry).
*/
- if (pte_valid(*ptep) && pte_valid(pte)) {
+ if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
+ (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
VM_WARN_ONCE(!pte_young(pte),
"%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
__func__, pte_val(*ptep), pte_val(pte));
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
/*
- * ptep_set_wrprotect - mark read-only while preserving the hardware update of
- * the Access Flag.
+ * ptep_set_wrprotect - mark read-only while trasferring potential hardware
+ * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit.
*/
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
{
pte_t old_pte, pte;
- /*
- * ptep_set_wrprotect() is only called on CoW mappings which are
- * private (!VM_SHARED) with the pte either read-only (!PTE_WRITE &&
- * PTE_RDONLY) or writable and software-dirty (PTE_WRITE &&
- * !PTE_RDONLY && PTE_DIRTY); see is_cow_mapping() and
- * protection_map[]. There is no race with the hardware update of the
- * dirty state: clearing of PTE_RDONLY when PTE_WRITE (a.k.a. PTE_DBM)
- * is set.
- */
- VM_WARN_ONCE(pte_write(*ptep) && !pte_dirty(*ptep),
- "%s: potential race with hardware DBM", __func__);
pte = READ_ONCE(*ptep);
do {
old_pte = pte;
+ /*
+ * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY
+ * clear), set the PTE_DIRTY bit.
+ */
+ if (pte_hw_dirty(pte))
+ pte = pte_mkdirty(pte);
pte = pte_wrprotect(pte);
pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
pte_val(old_pte), pte_val(pte));
mrs x12, sctlr_el1
ldr x13, =SCTLR_ELx_FLAGS
bic x12, x12, x13
+ pre_disable_mmu_workaround
msr sctlr_el1, x12
isb
};
static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
- ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
+ ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SVE),
+ FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_SVE_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_GIC_SHIFT, 4, 0),
S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
mrs x0, sctlr_el2
bic x0, x0, #1 << 0 // clear SCTLR.M
bic x0, x0, #1 << 2 // clear SCTLR.C
+ pre_disable_mmu_workaround
msr sctlr_el2, x0
isb
b 2f
mrs x0, sctlr_el1
bic x0, x0, #1 << 0 // clear SCTLR.M
bic x0, x0, #1 << 2 // clear SCTLR.C
+ pre_disable_mmu_workaround
msr sctlr_el1, x0
isb
2:
local_bh_disable();
- current->thread.fpsimd_state = *state;
+ current->thread.fpsimd_state.user_fpsimd = state->user_fpsimd;
if (system_supports_sve() && test_thread_flag(TIF_SVE))
fpsimd_to_sve(current);
* to take into account by discarding the current kernel mapping and
* creating a new one.
*/
+ pre_disable_mmu_workaround
msr sctlr_el1, x20 // disable the MMU
isb
bl __create_page_tables // recreate kernel mapping
#include <linux/perf_event.h>
#include <linux/ptrace.h>
#include <linux/smp.h>
+#include <linux/uaccess.h>
#include <asm/compat.h>
#include <asm/current.h>
#include <asm/traps.h>
#include <asm/cputype.h>
#include <asm/system_misc.h>
-#include <asm/uaccess.h>
/* Breakpoint currently in use for each BRP. */
static DEFINE_PER_CPU(struct perf_event *, bp_on_reg[ARM_MAX_BRP]);
mrs x0, sctlr_el2
ldr x1, =SCTLR_ELx_FLAGS
bic x0, x0, x1
+ pre_disable_mmu_workaround
msr sctlr_el2, x0
isb
1:
mrs x5, sctlr_el2
ldr x6, =SCTLR_ELx_FLAGS
bic x5, x5, x6 // Clear SCTL_M and etc
+ pre_disable_mmu_workaround
msr sctlr_el2, x5
isb
{
u64 reg;
+ /* Clear pmscr in case of early return */
+ *pmscr_el1 = 0;
+
/* SPE present on this CPU? */
if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
ID_AA64DFR0_PMSVER_SHIFT))
.check_wx = true,
};
- walk_pgd(&st, &init_mm, 0);
+ walk_pgd(&st, &init_mm, VA_START);
note_page(&st, 0, 0, 0);
if (st.wx_pages || st.uxn_pages)
pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found, %lu non-UXN pages found\n",
{
struct siginfo info;
const struct fault_info *inf;
- int ret = 0;
inf = esr_to_fault_info(esr);
pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
if (interrupts_enabled(regs))
nmi_enter();
- ret = ghes_notify_sea();
+ ghes_notify_sea();
if (interrupts_enabled(regs))
nmi_exit();
info.si_addr = (void __user *)addr;
arm64_notify_die("", regs, &info, esr);
- return ret;
+ return 0;
}
static const struct fault_info fault_info[] = {
reserve_elfcorehdr();
+ high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
+
dma_contiguous_reserve(arm64_dma_phys_limit);
memblock_allow_resize();
sparse_init();
zone_sizes_init(min, max);
- high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
memblock_dump_all();
}
/* if that doesn't kill us, halt */
panic("Oops failed to kill thread");
}
-EXPORT_SYMBOL(abort);
void __init trap_init(void)
{
has_mt t0, 3f
.set push
+ .set MIPS_ISA_LEVEL_RAW
.set mt
/* Only allow 1 TC per VPE to execute... */
#elif defined(CONFIG_MIPS_MT)
.set push
+ .set MIPS_ISA_LEVEL_RAW
.set mt
/* If the core doesn't support MT then return */
struct task_struct *t;
int max_users;
+ /* If nothing to change, return right away, successfully. */
+ if (value == mips_get_process_fp_mode(task))
+ return 0;
+
+ /* Only accept a mode change if 64-bit FP enabled for o32. */
+ if (!IS_ENABLED(CONFIG_MIPS_O32_FP64_SUPPORT))
+ return -EOPNOTSUPP;
+
+ /* And only for o32 tasks. */
+ if (IS_ENABLED(CONFIG_64BIT) && !test_thread_flag(TIF_32BIT_REGS))
+ return -EOPNOTSUPP;
+
/* Check the value is valid */
if (value & ~known_bits)
return -EOPNOTSUPP;
#endif /* CONFIG_64BIT */
+/*
+ * Copy the floating-point context to the supplied NT_PRFPREG buffer,
+ * !CONFIG_CPU_HAS_MSA variant. FP context's general register slots
+ * correspond 1:1 to buffer slots. Only general registers are copied.
+ */
+static int fpr_get_fpa(struct task_struct *target,
+ unsigned int *pos, unsigned int *count,
+ void **kbuf, void __user **ubuf)
+{
+ return user_regset_copyout(pos, count, kbuf, ubuf,
+ &target->thread.fpu,
+ 0, NUM_FPU_REGS * sizeof(elf_fpreg_t));
+}
+
+/*
+ * Copy the floating-point context to the supplied NT_PRFPREG buffer,
+ * CONFIG_CPU_HAS_MSA variant. Only lower 64 bits of FP context's
+ * general register slots are copied to buffer slots. Only general
+ * registers are copied.
+ */
+static int fpr_get_msa(struct task_struct *target,
+ unsigned int *pos, unsigned int *count,
+ void **kbuf, void __user **ubuf)
+{
+ unsigned int i;
+ u64 fpr_val;
+ int err;
+
+ BUILD_BUG_ON(sizeof(fpr_val) != sizeof(elf_fpreg_t));
+ for (i = 0; i < NUM_FPU_REGS; i++) {
+ fpr_val = get_fpr64(&target->thread.fpu.fpr[i], 0);
+ err = user_regset_copyout(pos, count, kbuf, ubuf,
+ &fpr_val, i * sizeof(elf_fpreg_t),
+ (i + 1) * sizeof(elf_fpreg_t));
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+/*
+ * Copy the floating-point context to the supplied NT_PRFPREG buffer.
+ * Choose the appropriate helper for general registers, and then copy
+ * the FCSR register separately.
+ */
static int fpr_get(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
- unsigned i;
+ const int fcr31_pos = NUM_FPU_REGS * sizeof(elf_fpreg_t);
int err;
- u64 fpr_val;
- /* XXX fcr31 */
+ if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
+ err = fpr_get_fpa(target, &pos, &count, &kbuf, &ubuf);
+ else
+ err = fpr_get_msa(target, &pos, &count, &kbuf, &ubuf);
+ if (err)
+ return err;
- if (sizeof(target->thread.fpu.fpr[i]) == sizeof(elf_fpreg_t))
- return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpu,
- 0, sizeof(elf_fpregset_t));
+ err = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.fpu.fcr31,
+ fcr31_pos, fcr31_pos + sizeof(u32));
- for (i = 0; i < NUM_FPU_REGS; i++) {
- fpr_val = get_fpr64(&target->thread.fpu.fpr[i], 0);
- err = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &fpr_val, i * sizeof(elf_fpreg_t),
- (i + 1) * sizeof(elf_fpreg_t));
+ return err;
+}
+
+/*
+ * Copy the supplied NT_PRFPREG buffer to the floating-point context,
+ * !CONFIG_CPU_HAS_MSA variant. Buffer slots correspond 1:1 to FP
+ * context's general register slots. Only general registers are copied.
+ */
+static int fpr_set_fpa(struct task_struct *target,
+ unsigned int *pos, unsigned int *count,
+ const void **kbuf, const void __user **ubuf)
+{
+ return user_regset_copyin(pos, count, kbuf, ubuf,
+ &target->thread.fpu,
+ 0, NUM_FPU_REGS * sizeof(elf_fpreg_t));
+}
+
+/*
+ * Copy the supplied NT_PRFPREG buffer to the floating-point context,
+ * CONFIG_CPU_HAS_MSA variant. Buffer slots are copied to lower 64
+ * bits only of FP context's general register slots. Only general
+ * registers are copied.
+ */
+static int fpr_set_msa(struct task_struct *target,
+ unsigned int *pos, unsigned int *count,
+ const void **kbuf, const void __user **ubuf)
+{
+ unsigned int i;
+ u64 fpr_val;
+ int err;
+
+ BUILD_BUG_ON(sizeof(fpr_val) != sizeof(elf_fpreg_t));
+ for (i = 0; i < NUM_FPU_REGS && *count > 0; i++) {
+ err = user_regset_copyin(pos, count, kbuf, ubuf,
+ &fpr_val, i * sizeof(elf_fpreg_t),
+ (i + 1) * sizeof(elf_fpreg_t));
if (err)
return err;
+ set_fpr64(&target->thread.fpu.fpr[i], 0, fpr_val);
}
return 0;
}
+/*
+ * Copy the supplied NT_PRFPREG buffer to the floating-point context.
+ * Choose the appropriate helper for general registers, and then copy
+ * the FCSR register separately.
+ *
+ * We optimize for the case where `count % sizeof(elf_fpreg_t) == 0',
+ * which is supposed to have been guaranteed by the kernel before
+ * calling us, e.g. in `ptrace_regset'. We enforce that requirement,
+ * so that we can safely avoid preinitializing temporaries for
+ * partial register writes.
+ */
static int fpr_set(struct task_struct *target,
const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- unsigned i;
+ const int fcr31_pos = NUM_FPU_REGS * sizeof(elf_fpreg_t);
+ u32 fcr31;
int err;
- u64 fpr_val;
- /* XXX fcr31 */
+ BUG_ON(count % sizeof(elf_fpreg_t));
+
+ if (pos + count > sizeof(elf_fpregset_t))
+ return -EIO;
init_fp_ctx(target);
- if (sizeof(target->thread.fpu.fpr[i]) == sizeof(elf_fpreg_t))
- return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpu,
- 0, sizeof(elf_fpregset_t));
+ if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
+ err = fpr_set_fpa(target, &pos, &count, &kbuf, &ubuf);
+ else
+ err = fpr_set_msa(target, &pos, &count, &kbuf, &ubuf);
+ if (err)
+ return err;
- BUILD_BUG_ON(sizeof(fpr_val) != sizeof(elf_fpreg_t));
- for (i = 0; i < NUM_FPU_REGS && count >= sizeof(elf_fpreg_t); i++) {
+ if (count > 0) {
err = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &fpr_val, i * sizeof(elf_fpreg_t),
- (i + 1) * sizeof(elf_fpreg_t));
+ &fcr31,
+ fcr31_pos, fcr31_pos + sizeof(u32));
if (err)
return err;
- set_fpr64(&target->thread.fpu.fpr[i], 0, fpr_val);
+
+ ptrace_setfcr31(target, fcr31);
}
- return 0;
+ return err;
}
enum mips_regset {
while ((nuline = strchr(s, '\n')) != NULL) {
if (nuline != s)
pdc_iodc_print(s, nuline - s);
- pdc_iodc_print("\r\n", 2);
- s = nuline + 1;
+ pdc_iodc_print("\r\n", 2);
+ s = nuline + 1;
}
if (*s != '\0')
pdc_iodc_print(s, strlen(s));
for the semaphore. */
#define __PA_LDCW_ALIGNMENT 16
+#define __PA_LDCW_ALIGN_ORDER 4
#define __ldcw_align(a) ({ \
unsigned long __ret = (unsigned long) &(a)->lock[0]; \
__ret = (__ret + __PA_LDCW_ALIGNMENT - 1) \
ldcd). */
#define __PA_LDCW_ALIGNMENT 4
+#define __PA_LDCW_ALIGN_ORDER 2
#define __ldcw_align(a) (&(a)->slock)
#define __LDCW "ldcw,co"
/* thread information allocation */
+#ifdef CONFIG_IRQSTACKS
+#define THREAD_SIZE_ORDER 2 /* PA-RISC requires at least 16k stack */
+#else
#define THREAD_SIZE_ORDER 3 /* PA-RISC requires at least 32k stack */
+#endif
+
/* Be sure to hunt all references to this down when you change the size of
* the kernel stack */
#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
static int count;
print_pa_hwpath(dev, hw_path);
- printk(KERN_INFO "%d. %s at 0x%p [%s] { %d, 0x%x, 0x%.3x, 0x%.5x }",
+ printk(KERN_INFO "%d. %s at 0x%px [%s] { %d, 0x%x, 0x%.3x, 0x%.5x }",
++count, dev->name, (void*) dev->hpa.start, hw_path, dev->id.hw_type,
dev->id.hversion_rev, dev->id.hversion, dev->id.sversion);
#include <asm/pgtable.h>
#include <asm/signal.h>
#include <asm/unistd.h>
+#include <asm/ldcw.h>
#include <asm/thread_info.h>
#include <linux/linkage.h>
#endif
.import pa_tlb_lock,data
+ .macro load_pa_tlb_lock reg
+#if __PA_LDCW_ALIGNMENT > 4
+ load32 PA(pa_tlb_lock) + __PA_LDCW_ALIGNMENT-1, \reg
+ depi 0,31,__PA_LDCW_ALIGN_ORDER, \reg
+#else
+ load32 PA(pa_tlb_lock), \reg
+#endif
+ .endm
/* space_to_prot macro creates a prot id from a space id */
.macro tlb_lock spc,ptp,pte,tmp,tmp1,fault
#ifdef CONFIG_SMP
cmpib,COND(=),n 0,\spc,2f
- load32 PA(pa_tlb_lock),\tmp
+ load_pa_tlb_lock \tmp
1: LDCW 0(\tmp),\tmp1
cmpib,COND(=) 0,\tmp1,1b
nop
/* Release pa_tlb_lock lock. */
.macro tlb_unlock1 spc,tmp
#ifdef CONFIG_SMP
- load32 PA(pa_tlb_lock),\tmp
+ load_pa_tlb_lock \tmp
tlb_unlock0 \spc,\tmp
#endif
.endm
STREG %r19,PT_SR7(%r16)
intr_return:
- /* NOTE: Need to enable interrupts incase we schedule. */
- ssm PSW_SM_I, %r0
-
/* check for reschedule */
mfctl %cr30,%r1
LDREG TI_FLAGS(%r1),%r19 /* sched.h: TIF_NEED_RESCHED */
LDREG PT_IASQ1(%r16), %r20
cmpib,COND(=),n 0,%r20,intr_restore /* backward */
+ /* NOTE: We need to enable interrupts if we have to deliver
+ * signals. We used to do this earlier but it caused kernel
+ * stack overflows. */
+ ssm PSW_SM_I, %r0
+
copy %r0, %r25 /* long in_syscall = 0 */
#ifdef CONFIG_64BIT
ldo -16(%r30),%r29 /* Reference param save area */
cmpib,COND(=) 0, %r20, intr_do_preempt
nop
+ /* NOTE: We need to enable interrupts if we schedule. We used
+ * to do this earlier but it caused kernel stack overflows. */
+ ssm PSW_SM_I, %r0
+
#ifdef CONFIG_64BIT
ldo -16(%r30),%r29 /* Reference param save area */
#endif
__INITRODATA
+ .align 4
.export os_hpmc_size
os_hpmc_size:
.word .os_hpmc_end-.os_hpmc
#include <asm/assembly.h>
#include <asm/pgtable.h>
#include <asm/cache.h>
+#include <asm/ldcw.h>
#include <linux/linkage.h>
.text
.macro tlb_lock la,flags,tmp
#ifdef CONFIG_SMP
- ldil L%pa_tlb_lock,%r1
- ldo R%pa_tlb_lock(%r1),\la
+#if __PA_LDCW_ALIGNMENT > 4
+ load32 pa_tlb_lock + __PA_LDCW_ALIGNMENT-1, \la
+ depi 0,31,__PA_LDCW_ALIGN_ORDER, \la
+#else
+ load32 pa_tlb_lock, \la
+#endif
rsm PSW_SM_I,\flags
1: LDCW 0(\la),\tmp
cmpib,<>,n 0,\tmp,3f
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/fs.h>
+#include <linux/cpu.h>
#include <linux/module.h>
#include <linux/personality.h>
#include <linux/ptrace.h>
return 1;
}
+/*
+ * Idle thread support
+ *
+ * Detect when running on QEMU with SeaBIOS PDC Firmware and let
+ * QEMU idle the host too.
+ */
+
+int running_on_qemu __read_mostly;
+
+void __cpuidle arch_cpu_idle_dead(void)
+{
+ /* nop on real hardware, qemu will offline CPU. */
+ asm volatile("or %%r31,%%r31,%%r31\n":::);
+}
+
+void __cpuidle arch_cpu_idle(void)
+{
+ local_irq_enable();
+
+ /* nop on real hardware, qemu will idle sleep. */
+ asm volatile("or %%r10,%%r10,%%r10\n":::);
+}
+
+static int __init parisc_idle_init(void)
+{
+ const char *marker;
+
+ /* check QEMU/SeaBIOS marker in PAGE0 */
+ marker = (char *) &PAGE0->pad0;
+ running_on_qemu = (memcmp(marker, "SeaBIOS", 8) == 0);
+
+ if (!running_on_qemu)
+ cpu_idle_poll_ctrl(1);
+
+ return 0;
+}
+arch_initcall(parisc_idle_init);
+
/*
* Copy architecture-specific thread state
*/
#include <linux/slab.h>
#include <linux/kallsyms.h>
#include <linux/sort.h>
-#include <linux/sched.h>
#include <linux/uaccess.h>
#include <asm/assembly.h>
#include <linux/preempt.h>
#include <linux/init.h>
-#include <asm/processor.h>
#include <asm/delay.h>
-
#include <asm/special_insns.h> /* for mfctl() */
#include <asm/processor.h> /* for boot_cpu_data */
mem_init_print_info(NULL);
#ifdef CONFIG_DEBUG_KERNEL /* double-sanity-check paranoia */
printk("virtual kernel memory layout:\n"
- " vmalloc : 0x%p - 0x%p (%4ld MB)\n"
- " memory : 0x%p - 0x%p (%4ld MB)\n"
- " .init : 0x%p - 0x%p (%4ld kB)\n"
- " .data : 0x%p - 0x%p (%4ld kB)\n"
- " .text : 0x%p - 0x%p (%4ld kB)\n",
+ " vmalloc : 0x%px - 0x%px (%4ld MB)\n"
+ " memory : 0x%px - 0x%px (%4ld MB)\n"
+ " .init : 0x%px - 0x%px (%4ld kB)\n"
+ " .data : 0x%px - 0x%px (%4ld kB)\n"
+ " .text : 0x%px - 0x%px (%4ld kB)\n",
(void*)VMALLOC_START, (void*)VMALLOC_END,
(VMALLOC_END - VMALLOC_START) >> 20,
#endif
}
-static inline void arch_dup_mmap(struct mm_struct *oldmm,
- struct mm_struct *mm)
+static inline int arch_dup_mmap(struct mm_struct *oldmm,
+ struct mm_struct *mm)
{
+ return 0;
}
#ifndef CONFIG_PPC_BOOK3S_64
printk("NIP: "REG" LR: "REG" CTR: "REG"\n",
regs->nip, regs->link, regs->ctr);
- printk("REGS: %p TRAP: %04lx %s (%s)\n",
+ printk("REGS: %px TRAP: %04lx %s (%s)\n",
regs, regs->trap, print_tainted(), init_utsname()->release);
printk("MSR: "REG" ", regs->msr);
print_msr_bits(regs->msr);
/* Return the per-cpu state for state saving/migration */
return (u64)xc->cppr << KVM_REG_PPC_ICP_CPPR_SHIFT |
- (u64)xc->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT;
+ (u64)xc->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT |
+ (u64)0xff << KVM_REG_PPC_ICP_PPRI_SHIFT;
}
int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
/*
* Restore P and Q. If the interrupt was pending, we
- * force both P and Q, which will trigger a resend.
+ * force Q and !P, which will trigger a resend.
*
* That means that a guest that had both an interrupt
* pending (queued) and Q set will restore with only
* is perfectly fine as coalescing interrupts that haven't
* been presented yet is always allowed.
*/
- if (val & KVM_XICS_PRESENTED || val & KVM_XICS_PENDING)
+ if (val & KVM_XICS_PRESENTED && !(val & KVM_XICS_PENDING))
state->old_p = true;
if (val & KVM_XICS_QUEUED || val & KVM_XICS_PENDING)
state->old_q = true;
return __bad_area(regs, address, SEGV_MAPERR);
}
+static noinline int bad_access(struct pt_regs *regs, unsigned long address)
+{
+ return __bad_area(regs, address, SEGV_ACCERR);
+}
+
static int do_sigbus(struct pt_regs *regs, unsigned long address,
unsigned int fault)
{
good_area:
if (unlikely(access_error(is_write, is_exec, vma)))
- return bad_area(regs, address);
+ return bad_access(regs, address);
/*
* If for any reason at all we couldn't handle the fault,
func = (u8 *) __bpf_call_base + imm;
/* Save skb pointer if we need to re-cache skb data */
- if (bpf_helper_changes_pkt_data(func))
+ if ((ctx->seen & SEEN_SKB) &&
+ bpf_helper_changes_pkt_data(func))
PPC_BPF_STL(3, 1, bpf_jit_stack_local(ctx));
bpf_jit_emit_func_call(image, ctx, (u64)func);
PPC_MR(b2p[BPF_REG_0], 3);
/* refresh skb cache */
- if (bpf_helper_changes_pkt_data(func)) {
+ if ((ctx->seen & SEEN_SKB) &&
+ bpf_helper_changes_pkt_data(func)) {
/* reload skb pointer to r3 */
PPC_BPF_LL(3, 1, bpf_jit_stack_local(ctx));
bpf_jit_emit_skb_loads(image, ctx);
int ret;
__u64 target;
- if (is_kernel_addr(addr))
- return branch_target((unsigned int *)addr);
+ if (is_kernel_addr(addr)) {
+ if (probe_kernel_read(&instr, (void *)addr, sizeof(instr)))
+ return 0;
+
+ return branch_target(&instr);
+ }
/* Userspace: need copy instruction here then translate it */
pagefault_disable();
if (!cpumask_test_and_clear_cpu(cpu, &nest_imc_cpumask))
return 0;
+ /*
+ * Check whether nest_imc is registered. We could end up here if the
+ * cpuhotplug callback registration fails. i.e, callback invokes the
+ * offline path for all successfully registered nodes. At this stage,
+ * nest_imc pmu will not be registered and we should return here.
+ *
+ * We return with a zero since this is not an offline failure. And
+ * cpuhp_setup_state() returns the actual failure reason to the caller,
+ * which in turn will call the cleanup routine.
+ */
+ if (!nest_pmus)
+ return 0;
+
/*
* Now that this cpu is one of the designated,
* find a next cpu a) which is online and b) in same chip.
if (nest_pmus == 1) {
cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE);
kfree(nest_imc_refc);
+ kfree(per_nest_pmu_arr);
}
if (nest_pmus > 0)
kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);
kfree(pmu_ptr);
- kfree(per_nest_pmu_arr);
return;
}
ret = nest_pmu_cpumask_init();
if (ret) {
mutex_unlock(&nest_init_lock);
+ kfree(nest_imc_refc);
+ kfree(per_nest_pmu_arr);
goto err_free;
}
}
}
static struct lock_class_key fsl_msi_irq_class;
+static struct lock_class_key fsl_msi_irq_request_class;
static int fsl_msi_setup_hwirq(struct fsl_msi *msi, struct platform_device *dev,
int offset, int irq_index)
dev_err(&dev->dev, "No memory for MSI cascade data\n");
return -ENOMEM;
}
- irq_set_lockdep_class(virt_msir, &fsl_msi_irq_class);
+ irq_set_lockdep_class(virt_msir, &fsl_msi_irq_class,
+ &fsl_msi_irq_request_class);
cascade_data->index = offset;
cascade_data->msi_data = msi;
cascade_data->virq = virt_msir;
+CONFIG_SMP=y
+CONFIG_PCI=y
+CONFIG_PCIE_XILINX=y
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_CGROUPS=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_CFS_BANDWIDTH=y
+CONFIG_CGROUP_BPF=y
+CONFIG_NAMESPACES=y
+CONFIG_USER_NS=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_EXPERT=y
+CONFIG_CHECKPOINT_RESTORE=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_IP_PNP_RARP=y
+CONFIG_NETLINK_DIAG=y
+CONFIG_DEVTMPFS=y
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_VIRTIO_BLK=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_BLK_DEV_SR=y
+CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+CONFIG_SATA_AHCI_PLATFORM=y
+CONFIG_NETDEVICES=y
+CONFIG_VIRTIO_NET=y
+CONFIG_MACB=y
+CONFIG_E1000E=y
+CONFIG_R8169=y
+CONFIG_MICROSEMI_PHY=y
+CONFIG_INPUT_MOUSEDEV=y
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_OF_PLATFORM=y
+# CONFIG_PTP_1588_CLOCK is not set
+CONFIG_DRM=y
+CONFIG_DRM_RADEON=y
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_USB=y
+CONFIG_USB_XHCI_HCD=y
+CONFIG_USB_XHCI_PLATFORM=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_EHCI_HCD_PLATFORM=y
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_OHCI_HCD_PLATFORM=y
+CONFIG_USB_STORAGE=y
+CONFIG_USB_UAS=y
+CONFIG_VIRTIO_MMIO=y
+CONFIG_RAS=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_AUTOFS4_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_NFS_FS=y
+CONFIG_NFS_V4=y
+CONFIG_NFS_V4_1=y
+CONFIG_NFS_V4_2=y
+CONFIG_ROOT_NFS=y
+# CONFIG_RCU_TRACE is not set
+CONFIG_CRYPTO_USER_API_HASH=y
#define smp_rmb() RISCV_FENCE(r,r)
#define smp_wmb() RISCV_FENCE(w,w)
+/*
+ * This is a very specific barrier: it's currently only used in two places in
+ * the kernel, both in the scheduler. See include/linux/spinlock.h for the two
+ * orderings it guarantees, but the "critical section is RCsc" guarantee
+ * mandates a barrier on RISC-V. The sequence looks like:
+ *
+ * lr.aq lock
+ * sc lock <= LOCKED
+ * smp_mb__after_spinlock()
+ * // critical section
+ * lr lock
+ * sc.rl lock <= UNLOCKED
+ *
+ * The AQ/RL pair provides a RCpc critical section, but there's not really any
+ * way we can take advantage of that here because the ordering is only enforced
+ * on that one lock. Thus, we're just doing a full fence.
+ */
+#define smp_mb__after_spinlock() RISCV_FENCE(rw,rw)
+
#include <asm-generic/barrier.h>
#endif /* __ASSEMBLY__ */
#include <linux/const.h>
/* Status register flags */
-#define SR_IE _AC(0x00000002, UL) /* Interrupt Enable */
-#define SR_PIE _AC(0x00000020, UL) /* Previous IE */
-#define SR_PS _AC(0x00000100, UL) /* Previously Supervisor */
-#define SR_SUM _AC(0x00040000, UL) /* Supervisor may access User Memory */
+#define SR_SIE _AC(0x00000002, UL) /* Supervisor Interrupt Enable */
+#define SR_SPIE _AC(0x00000020, UL) /* Previous Supervisor IE */
+#define SR_SPP _AC(0x00000100, UL) /* Previously Supervisor */
+#define SR_SUM _AC(0x00040000, UL) /* Supervisor may access User Memory */
#define SR_FS _AC(0x00006000, UL) /* Floating-point Status */
#define SR_FS_OFF _AC(0x00000000, UL)
#include <linux/types.h>
-#ifdef CONFIG_MMU
-
extern void __iomem *ioremap(phys_addr_t offset, unsigned long size);
/*
extern void iounmap(volatile void __iomem *addr);
-#endif /* CONFIG_MMU */
-
/* Generic IO read/write. These perform native-endian accesses. */
#define __raw_writeb __raw_writeb
static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
/* unconditionally enable interrupts */
static inline void arch_local_irq_enable(void)
{
- csr_set(sstatus, SR_IE);
+ csr_set(sstatus, SR_SIE);
}
/* unconditionally disable interrupts */
static inline void arch_local_irq_disable(void)
{
- csr_clear(sstatus, SR_IE);
+ csr_clear(sstatus, SR_SIE);
}
/* get status and disable interrupts */
static inline unsigned long arch_local_irq_save(void)
{
- return csr_read_clear(sstatus, SR_IE);
+ return csr_read_clear(sstatus, SR_SIE);
}
/* test flags */
static inline int arch_irqs_disabled_flags(unsigned long flags)
{
- return !(flags & SR_IE);
+ return !(flags & SR_SIE);
}
/* test hardware interrupt enable bit */
/* set interrupt enabled status */
static inline void arch_local_irq_restore(unsigned long flags)
{
- csr_set(sstatus, flags & SR_IE);
+ csr_set(sstatus, flags & SR_SIE);
}
#endif /* _ASM_RISCV_IRQFLAGS_H */
#ifndef __ASSEMBLY__
-#ifdef CONFIG_MMU
-
/* Page Upper Directory not used in RISC-V */
#include <asm-generic/pgtable-nopud.h>
#include <asm/page.h>
/* No page table caches to initialize */
}
-#endif /* CONFIG_MMU */
-
#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
#define VMALLOC_END (PAGE_OFFSET - 1)
#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
#define REG_FMT "%08lx"
#endif
-#define user_mode(regs) (((regs)->sstatus & SR_PS) == 0)
+#define user_mode(regs) (((regs)->sstatus & SR_SPP) == 0)
/* Helpers for working with the instruction pointer */
#ifndef _ASM_RISCV_TLBFLUSH_H
#define _ASM_RISCV_TLBFLUSH_H
-#ifdef CONFIG_MMU
-
#include <linux/mm_types.h>
/*
flush_tlb_all();
}
-#endif /* CONFIG_MMU */
-
#endif /* _ASM_RISCV_TLBFLUSH_H */
* call.
*/
-#ifdef CONFIG_MMU
#define __get_user_asm(insn, x, ptr, err) \
do { \
uintptr_t __tmp; \
__disable_user_access(); \
(x) = __x; \
} while (0)
-#endif /* CONFIG_MMU */
#ifdef CONFIG_64BIT
#define __get_user_8(x, ptr, err) \
__get_user_asm("ld", x, ptr, err)
#else /* !CONFIG_64BIT */
-#ifdef CONFIG_MMU
#define __get_user_8(x, ptr, err) \
do { \
u32 __user *__ptr = (u32 __user *)(ptr); \
(x) = (__typeof__(x))((__typeof__((x)-(x)))( \
(((u64)__hi << 32) | __lo))); \
} while (0)
-#endif /* CONFIG_MMU */
#endif /* CONFIG_64BIT */
((x) = 0, -EFAULT); \
})
-
-#ifdef CONFIG_MMU
#define __put_user_asm(insn, x, ptr, err) \
do { \
uintptr_t __tmp; \
: "rJ" (__x), "i" (-EFAULT)); \
__disable_user_access(); \
} while (0)
-#endif /* CONFIG_MMU */
-
#ifdef CONFIG_64BIT
#define __put_user_8(x, ptr, err) \
__put_user_asm("sd", x, ptr, err)
#else /* !CONFIG_64BIT */
-#ifdef CONFIG_MMU
#define __put_user_8(x, ptr, err) \
do { \
u32 __user *__ptr = (u32 __user *)(ptr); \
: "rJ" (__x), "rJ" (__x >> 32), "i" (-EFAULT)); \
__disable_user_access(); \
} while (0)
-#endif /* CONFIG_MMU */
#endif /* CONFIG_64BIT */
* will set "err" to -EFAULT, while successful accesses return the previous
* value.
*/
-#ifdef CONFIG_MMU
#define __cmpxchg_user(ptr, old, new, err, size, lrb, scb) \
({ \
__typeof__(ptr) __ptr = (ptr); \
(err) = __err; \
__ret; \
})
-#endif /* CONFIG_MMU */
#endif /* _ASM_RISCV_UACCESS_H */
#define __ARCH_HAVE_MMU
#define __ARCH_WANT_SYS_CLONE
#include <uapi/asm/unistd.h>
+#include <uapi/asm/syscalls.h>
+++ /dev/null
-/*
- * Copyright (C) 2017 SiFive
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef _ASM_RISCV_VDSO_SYSCALLS_H
-#define _ASM_RISCV_VDSO_SYSCALLS_H
-
-#ifdef CONFIG_SMP
-
-/* These syscalls are only used by the vDSO and are not in the uapi. */
-#define __NR_riscv_flush_icache (__NR_arch_specific_syscall + 15)
-__SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache)
-
-#endif
-
-#endif /* _ASM_RISCV_VDSO_H */
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2017 SiFive
+ */
+
+#ifndef _ASM__UAPI__SYSCALLS_H
+#define _ASM__UAPI__SYSCALLS_H
+
+/*
+ * Allows the instruction cache to be flushed from userspace. Despite RISC-V
+ * having a direct 'fence.i' instruction available to userspace (which we
+ * can't trap!), that's not actually viable when running on Linux because the
+ * kernel might schedule a process on another hart. There is no way for
+ * userspace to handle this without invoking the kernel (as it doesn't know the
+ * thread->hart mappings), so we've defined a RISC-V specific system call to
+ * flush the instruction cache.
+ *
+ * __NR_riscv_flush_icache is defined to flush the instruction cache over an
+ * address range, with the flush applying to either all threads or just the
+ * caller. We don't currently do anything with the address range, that's just
+ * in there for forwards compatibility.
+ */
+#define __NR_riscv_flush_icache (__NR_arch_specific_syscall + 15)
+__SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache)
+
+#endif
addi s2, s2, 0x4
REG_S s2, PT_SEPC(sp)
/* System calls run with interrupts enabled */
- csrs sstatus, SR_IE
+ csrs sstatus, SR_SIE
/* Trace syscalls, but only if requested by the user. */
REG_L t0, TASK_TI_FLAGS(tp)
andi t0, t0, _TIF_SYSCALL_TRACE
ret_from_exception:
REG_L s0, PT_SSTATUS(sp)
- csrc sstatus, SR_IE
- andi s0, s0, SR_PS
+ csrc sstatus, SR_SIE
+ andi s0, s0, SR_SPP
bnez s0, restore_all
resume_userspace:
bnez s1, work_resched
work_notifysig:
/* Handle pending signals and notify-resume requests */
- csrs sstatus, SR_IE /* Enable interrupts for do_notify_resume() */
+ csrs sstatus, SR_SIE /* Enable interrupts for do_notify_resume() */
move a0, sp /* pt_regs */
move a1, s0 /* current_thread_info->flags */
tail do_notify_resume
void start_thread(struct pt_regs *regs, unsigned long pc,
unsigned long sp)
{
- regs->sstatus = SR_PIE /* User mode, irqs on */ | SR_FS_INITIAL;
+ regs->sstatus = SR_SPIE /* User mode, irqs on */ | SR_FS_INITIAL;
regs->sepc = pc;
regs->sp = sp;
set_fs(USER_DS);
const register unsigned long gp __asm__ ("gp");
memset(childregs, 0, sizeof(struct pt_regs));
childregs->gp = gp;
- childregs->sstatus = SR_PS | SR_PIE; /* Supervisor, irqs on */
+ childregs->sstatus = SR_SPP | SR_SPIE; /* Supervisor, irqs on */
p->thread.ra = (unsigned long)ret_from_kernel_thread;
p->thread.s[0] = usp; /* fn */
#include <asm/tlbflush.h>
#include <asm/thread_info.h>
-#ifdef CONFIG_HVC_RISCV_SBI
-#include <asm/hvc_riscv_sbi.h>
-#endif
-
#ifdef CONFIG_DUMMY_CONSOLE
struct screen_info screen_info = {
.orig_video_lines = 30,
void __init setup_arch(char **cmdline_p)
{
-#if defined(CONFIG_HVC_RISCV_SBI)
- if (likely(early_console == NULL)) {
- early_console = &riscv_sbi_early_console_dev;
- register_console(early_console);
- }
-#endif
-
#ifdef CONFIG_CMDLINE_BOOL
#ifdef CONFIG_CMDLINE_OVERRIDE
strlcpy(boot_command_line, builtin_cmdline, COMMAND_LINE_SIZE);
bool local = (flags & SYS_RISCV_FLUSH_ICACHE_LOCAL) != 0;
/* Check the reserved flags. */
- if (unlikely(flags & !SYS_RISCV_FLUSH_ICACHE_ALL))
+ if (unlikely(flags & ~SYS_RISCV_FLUSH_ICACHE_ALL))
return -EINVAL;
flush_icache_mm(mm, local);
void *sys_call_table[__NR_syscalls] = {
[0 ... __NR_syscalls - 1] = sys_ni_syscall,
#include <asm/unistd.h>
-#include <asm/vdso-syscalls.h>
};
#include <linux/linkage.h>
#include <asm/unistd.h>
-#include <asm/vdso-syscalls.h>
.text
/* int __vdso_flush_icache(void *start, void *end, unsigned long flags); */
goto vmalloc_fault;
/* Enable interrupts if they were enabled in the parent context. */
- if (likely(regs->sstatus & SR_PIE))
+ if (likely(regs->sstatus & SR_SPIE))
local_irq_enable();
/*
return pud;
}
-#define pud_write pud_write
-static inline int pud_write(pud_t pud)
-{
- return (pud_val(pud) & _REGION3_ENTRY_WRITE) != 0;
-}
-
static inline pud_t pud_mkclean(pud_t pud)
{
if (pud_large(pud)) {
return retval;
}
+ groups_sort(group_info);
retval = set_current_groups(group_info);
put_group_info(group_info);
if (kvm->arch.use_cmma) {
/*
- * Get the last slot. They should be sorted by base_gfn, so the
- * last slot is also the one at the end of the address space.
- * We have verified above that at least one slot is present.
+ * Get the first slot. They are reverse sorted by base_gfn, so
+ * the first slot is also the one at the end of the address
+ * space. We have verified above that at least one slot is
+ * present.
*/
- ms = slots->memslots + slots->used_slots - 1;
+ ms = slots->memslots;
/* round up so we only use full longs */
ram_pages = roundup(ms->base_gfn + ms->npages, BITS_PER_LONG);
/* allocate enough bytes to store all the bits */
cbrlo[entries] = gfn << PAGE_SHIFT;
}
- if (orc) {
+ if (orc && gfn < ms->bitmap_size) {
/* increment only if we are really flipping the bit to 1 */
if (!test_and_set_bit(gfn, ms->pgste_bitmap))
atomic64_inc(&ms->dirty_pages);
void disable_sacf_uaccess(mm_segment_t old_fs)
{
+ current->thread.mm_segment = old_fs;
if (old_fs == USER_DS && test_facility(27)) {
__ctl_load(S390_lowcore.user_asce, 1, 1);
clear_cpu_flag(CIF_ASCE_PRIMARY);
}
- current->thread.mm_segment = old_fs;
}
EXPORT_SYMBOL(disable_sacf_uaccess);
#define SEEN_LITERAL 8 /* code uses literals */
#define SEEN_FUNC 16 /* calls C functions */
#define SEEN_TAIL_CALL 32 /* code uses tail calls */
-#define SEEN_SKB_CHANGE 64 /* code changes skb data */
-#define SEEN_REG_AX 128 /* code uses constant blinding */
+#define SEEN_REG_AX 64 /* code uses constant blinding */
#define SEEN_STACK (SEEN_FUNC | SEEN_MEM | SEEN_SKB)
/*
EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0,
REG_15, 152);
}
- if (jit->seen & SEEN_SKB)
+ if (jit->seen & SEEN_SKB) {
emit_load_skb_data_hlen(jit);
- if (jit->seen & SEEN_SKB_CHANGE)
/* stg %b1,ST_OFF_SKBP(%r0,%r15) */
EMIT6_DISP_LH(0xe3000000, 0x0024, BPF_REG_1, REG_0, REG_15,
STK_OFF_SKBP);
+ }
}
/*
EMIT2(0x0d00, REG_14, REG_W1);
/* lgr %b0,%r2: load return value into %b0 */
EMIT4(0xb9040000, BPF_REG_0, REG_2);
- if (bpf_helper_changes_pkt_data((void *)func)) {
- jit->seen |= SEEN_SKB_CHANGE;
+ if ((jit->seen & SEEN_SKB) &&
+ bpf_helper_changes_pkt_data((void *)func)) {
/* lg %b1,ST_OFF_SKBP(%r15) */
EMIT6_DISP_LH(0xe3000000, 0x0004, BPF_REG_1, REG_0,
REG_15, STK_OFF_SKBP);
static int __dma_purge_tlb(struct zpci_dev *zdev, dma_addr_t dma_addr,
size_t size, int flags)
{
+ unsigned long irqflags;
+ int ret;
+
/*
* With zdev->tlb_refresh == 0, rpcit is not required to establish new
* translations when previously invalid translation-table entries are
return 0;
}
- return zpci_refresh_trans((u64) zdev->fh << 32, dma_addr,
- PAGE_ALIGN(size));
+ ret = zpci_refresh_trans((u64) zdev->fh << 32, dma_addr,
+ PAGE_ALIGN(size));
+ if (ret == -ENOMEM && !s390_iommu_strict) {
+ /* enable the hypervisor to free some resources */
+ if (zpci_refresh_global(zdev))
+ goto out;
+
+ spin_lock_irqsave(&zdev->iommu_bitmap_lock, irqflags);
+ bitmap_andnot(zdev->iommu_bitmap, zdev->iommu_bitmap,
+ zdev->lazy_bitmap, zdev->iommu_pages);
+ bitmap_zero(zdev->lazy_bitmap, zdev->iommu_pages);
+ spin_unlock_irqrestore(&zdev->iommu_bitmap_lock, irqflags);
+ ret = 0;
+ }
+out:
+ return ret;
}
static int dma_update_trans(struct zpci_dev *zdev, unsigned long pa,
if (cc)
zpci_err_insn(cc, status, addr, range);
+ if (cc == 1 && (status == 4 || status == 16))
+ return -ENOMEM;
+
return (cc) ? -EIO : 0;
}
*/
#include <linux/init.h>
#include <linux/platform_device.h>
+#include <linux/sh_eth.h>
#include <mach-se/mach/se.h>
#include <mach-se/mach/mrshpc.h>
#include <asm/machvec.h>
#if defined(CONFIG_CPU_SUBTYPE_SH7710) ||\
defined(CONFIG_CPU_SUBTYPE_SH7712)
/* SH771X Ethernet driver */
+static struct sh_eth_plat_data sh_eth_plat = {
+ .phy = PHY_ID,
+ .phy_interface = PHY_INTERFACE_MODE_MII,
+};
+
static struct resource sh_eth0_resources[] = {
[0] = {
.start = SH_ETH0_BASE,
- .end = SH_ETH0_BASE + 0x1B8,
+ .end = SH_ETH0_BASE + 0x1B8 - 1,
.flags = IORESOURCE_MEM,
},
[1] = {
+ .start = SH_TSU_BASE,
+ .end = SH_TSU_BASE + 0x200 - 1,
+ .flags = IORESOURCE_MEM,
+ },
+ [2] = {
.start = SH_ETH0_IRQ,
.end = SH_ETH0_IRQ,
.flags = IORESOURCE_IRQ,
.name = "sh771x-ether",
.id = 0,
.dev = {
- .platform_data = PHY_ID,
+ .platform_data = &sh_eth_plat,
},
.num_resources = ARRAY_SIZE(sh_eth0_resources),
.resource = sh_eth0_resources,
static struct resource sh_eth1_resources[] = {
[0] = {
.start = SH_ETH1_BASE,
- .end = SH_ETH1_BASE + 0x1B8,
+ .end = SH_ETH1_BASE + 0x1B8 - 1,
.flags = IORESOURCE_MEM,
},
[1] = {
+ .start = SH_TSU_BASE,
+ .end = SH_TSU_BASE + 0x200 - 1,
+ .flags = IORESOURCE_MEM,
+ },
+ [2] = {
.start = SH_ETH1_IRQ,
.end = SH_ETH1_IRQ,
.flags = IORESOURCE_IRQ,
.name = "sh771x-ether",
.id = 1,
.dev = {
- .platform_data = PHY_ID,
+ .platform_data = &sh_eth_plat,
},
.num_resources = ARRAY_SIZE(sh_eth1_resources),
.resource = sh_eth1_resources,
/* Base address */
#define SH_ETH0_BASE 0xA7000000
#define SH_ETH1_BASE 0xA7000400
+#define SH_TSU_BASE 0xA7000800
/* PHY ID */
#if defined(CONFIG_CPU_SUBTYPE_SH7710)
# define PHY_ID 0x00
.previous
ENTRY(__arch_hweight64)
- sethi %hi(__sw_hweight16), %g1
- jmpl %g1 + %lo(__sw_hweight16), %g0
+ sethi %hi(__sw_hweight64), %g1
+ jmpl %g1 + %lo(__sw_hweight64), %g0
nop
ENDPROC(__arch_hweight64)
EXPORT_SYMBOL(__arch_hweight64)
if (!printk_ratelimit())
return;
- printk("%s%s[%d]: segfault at %lx ip %p (rpc %p) sp %p error %x",
+ printk("%s%s[%d]: segfault at %lx ip %px (rpc %px) sp %px error %x",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address,
(void *)regs->pc, (void *)regs->u_regs[UREG_I7],
if (!printk_ratelimit())
return;
- printk("%s%s[%d]: segfault at %lx ip %p (rpc %p) sp %p error %x",
+ printk("%s%s[%d]: segfault at %lx ip %px (rpc %px) sp %px error %x",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address,
(void *)regs->tpc, (void *)regs->u_regs[UREG_I7],
if (!(pmd_val(pmd) & _PAGE_VALID))
return 0;
- if (!pmd_access_permitted(pmd, write))
+ if (write && !pmd_write(pmd))
return 0;
refs = 0;
if (!(pud_val(pud) & _PAGE_VALID))
return 0;
- if (!pud_access_permitted(pud, write))
+ if (write && !pud_write(pud))
return 0;
refs = 0;
u8 *func = ((u8 *)__bpf_call_base) + imm;
ctx->saw_call = true;
+ if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func))
+ emit_reg_move(bpf2sparc[BPF_REG_1], L7, ctx);
emit_call((u32 *)func, ctx);
emit_nop(ctx);
emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx);
- if (bpf_helper_changes_pkt_data(func) && ctx->saw_ld_abs_ind)
- load_skb_regs(ctx, bpf2sparc[BPF_REG_6]);
+ if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func))
+ load_skb_regs(ctx, L7);
break;
}
generic-y += barrier.h
+generic-y += bpf_perf_event.h
generic-y += bug.h
generic-y += clkdev.h
generic-y += current.h
/*
* Needed since we do not use the asm-generic/mm_hooks.h:
*/
-static inline void arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
+static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
{
uml_setup_stubs(mm);
+ return 0;
}
extern void arch_exit_mmap(struct mm_struct *mm);
static inline void arch_unmap(struct mm_struct *mm,
if (!printk_ratelimit())
return;
- printk("%s%s[%d]: segfault at %lx ip %p sp %p error %x",
+ printk("%s%s[%d]: segfault at %lx ip %px sp %px error %x",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), FAULT_ADDRESS(*fi),
(void *)UPT_IP(regs), (void *)UPT_SP(regs),
} \
} while (0)
-static inline void arch_dup_mmap(struct mm_struct *oldmm,
- struct mm_struct *mm)
+static inline int arch_dup_mmap(struct mm_struct *oldmm,
+ struct mm_struct *mm)
{
+ return 0;
}
static inline void arch_unmap(struct mm_struct *mm,
/* if that doesn't kill us, halt */
panic("Oops failed to kill thread");
}
-EXPORT_SYMBOL(abort);
void __init trap_init(void)
{
config NR_CPUS
int "Maximum number of CPUs" if SMP && !MAXSMP
range 2 8 if SMP && X86_32 && !X86_BIGSMP
- range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK
+ range 2 64 if SMP && X86_32 && X86_BIGSMP
+ range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK && X86_64
range 2 8192 if SMP && !MAXSMP && CPUMASK_OFFSTACK && X86_64
default "1" if !SMP
default "8192" if MAXSMP
config UNWINDER_GUESS
bool "Guess unwinder"
depends on EXPERT
+ depends on !STACKDEPOT
---help---
This option enables the "guess" unwinder for unwinding kernel stack
traces. It scans the stack and reports every kernel text address it
ifdef CONFIG_X86_64
vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o
vmlinux-objs-y += $(obj)/mem_encrypt.o
+ vmlinux-objs-y += $(obj)/pgtable_64.o
endif
$(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
leaq boot_stack_end(%rbx), %rsp
#ifdef CONFIG_X86_5LEVEL
- /* Check if 5-level paging has already enabled */
- movq %cr4, %rax
- testl $X86_CR4_LA57, %eax
- jnz lvl5
+ /*
+ * Check if we need to enable 5-level paging.
+ * RSI holds real mode data and need to be preserved across
+ * a function call.
+ */
+ pushq %rsi
+ call l5_paging_required
+ popq %rsi
+
+ /* If l5_paging_required() returned zero, we're done here. */
+ cmpq $0, %rax
+ je lvl5
/*
* At this point we are in long mode with 4-level paging enabled,
}
}
+static bool l5_supported(void)
+{
+ /* Check if leaf 7 is supported. */
+ if (native_cpuid_eax(0) < 7)
+ return 0;
+
+ /* Check if la57 is supported. */
+ return native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31));
+}
+
#if CONFIG_X86_NEED_RELOCS
static void handle_relocations(void *output, unsigned long output_len,
unsigned long virt_addr)
console_init();
debug_putstr("early console in extract_kernel\n");
+ if (IS_ENABLED(CONFIG_X86_5LEVEL) && !l5_supported()) {
+ error("This linux kernel as configured requires 5-level paging\n"
+ "This CPU does not support the required 'cr4.la57' feature\n"
+ "Unable to boot - please use a kernel appropriate for your CPU\n");
+ }
+
free_mem_ptr = heap; /* Heap */
free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
*/
#undef CONFIG_AMD_MEM_ENCRYPT
+/* No PAGE_TABLE_ISOLATION support needed either: */
+#undef CONFIG_PAGE_TABLE_ISOLATION
+
#include "misc.h"
/* These actually do the work of building the kernel identity maps. */
--- /dev/null
+#include <asm/processor.h>
+
+/*
+ * __force_order is used by special_insns.h asm code to force instruction
+ * serialization.
+ *
+ * It is not referenced from the code, but GCC < 5 with -fPIE would fail
+ * due to an undefined symbol. Define it to make these ancient GCCs work.
+ */
+unsigned long __force_order;
+
+int l5_paging_required(void)
+{
+ /* Check if leaf 7 is supported. */
+
+ if (native_cpuid_eax(0) < 7)
+ return 0;
+
+ /* Check if la57 is supported. */
+ if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+ return 0;
+
+ /* Check if 5-level paging has already been enabled. */
+ if (native_read_cr4() & X86_CR4_LA57)
+ return 0;
+
+ return 1;
+}
# Make sure the files actually exist
verify "$FBZIMAGE"
-verify "$MTOOLSRC"
genbzdisk() {
+ verify "$MTOOLSRC"
mformat a:
syslinux $FIMAGE
echo "$KCMDLINE" | mcopy - a:syslinux.cfg
}
genfdimage144() {
+ verify "$MTOOLSRC"
dd if=/dev/zero of=$FIMAGE bs=1024 count=1440 2> /dev/null
mformat v:
syslinux $FIMAGE
}
genfdimage288() {
+ verify "$MTOOLSRC"
dd if=/dev/zero of=$FIMAGE bs=1024 count=2880 2> /dev/null
mformat w:
syslinux $FIMAGE
mcopy $FBZIMAGE w:linux
}
-genisoimage() {
+geniso() {
tmp_dir=`dirname $FIMAGE`/isoimage
rm -rf $tmp_dir
mkdir $tmp_dir
- for i in lib lib64 share end ; do
+ for i in lib lib64 share ; do
for j in syslinux ISOLINUX ; do
if [ -f /usr/$i/$j/isolinux.bin ] ; then
isolinux=/usr/$i/$j/isolinux.bin
- cp $isolinux $tmp_dir
fi
done
for j in syslinux syslinux/modules/bios ; do
if [ -f /usr/$i/$j/ldlinux.c32 ]; then
ldlinux=/usr/$i/$j/ldlinux.c32
- cp $ldlinux $tmp_dir
fi
done
if [ -n "$isolinux" -a -n "$ldlinux" ] ; then
break
fi
- if [ $i = end -a -z "$isolinux" ] ; then
- echo 'Need an isolinux.bin file, please install syslinux/isolinux.'
- exit 1
- fi
done
+ if [ -z "$isolinux" ] ; then
+ echo 'Need an isolinux.bin file, please install syslinux/isolinux.'
+ exit 1
+ fi
+ if [ -z "$ldlinux" ] ; then
+ echo 'Need an ldlinux.c32 file, please install syslinux/isolinux.'
+ exit 1
+ fi
+ cp $isolinux $tmp_dir
+ cp $ldlinux $tmp_dir
cp $FBZIMAGE $tmp_dir/linux
echo "$KCMDLINE" > $tmp_dir/isolinux.cfg
if [ -f "$FDINITRD" ] ; then
cp "$FDINITRD" $tmp_dir/initrd.img
fi
- mkisofs -J -r -input-charset=utf-8 -quiet -o $FIMAGE -b isolinux.bin \
- -c boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table \
- $tmp_dir
+ genisoimage -J -r -input-charset=utf-8 -quiet -o $FIMAGE \
+ -b isolinux.bin -c boot.cat -no-emul-boot -boot-load-size 4 \
+ -boot-info-table $tmp_dir
isohybrid $FIMAGE 2>/dev/null || true
rm -rf $tmp_dir
}
bzdisk) genbzdisk;;
fdimage144) genfdimage144;;
fdimage288) genfdimage288;;
- isoimage) genisoimage;;
+ isoimage) geniso;;
*) echo 'Unknown image format'; exit 1;
esac
salsa20_ivsetup(ctx, walk.iv);
- if (likely(walk.nbytes == nbytes))
- {
- salsa20_encrypt_bytes(ctx, walk.src.virt.addr,
- walk.dst.virt.addr, nbytes);
- return blkcipher_walk_done(desc, &walk, 0);
- }
-
while (walk.nbytes >= 64) {
salsa20_encrypt_bytes(ctx, walk.src.virt.addr,
walk.dst.virt.addr,
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/jump_label.h>
#include <asm/unwind_hints.h>
+#include <asm/cpufeatures.h>
+#include <asm/page_types.h>
+#include <asm/percpu.h>
+#include <asm/asm-offsets.h>
+#include <asm/processor-flags.h>
/*
#endif
.endm
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
+/*
+ * PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two
+ * halves:
+ */
+#define PTI_SWITCH_PGTABLES_MASK (1<<PAGE_SHIFT)
+#define PTI_SWITCH_MASK (PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))
+
+.macro SET_NOFLUSH_BIT reg:req
+ bts $X86_CR3_PCID_NOFLUSH_BIT, \reg
+.endm
+
+.macro ADJUST_KERNEL_CR3 reg:req
+ ALTERNATIVE "", "SET_NOFLUSH_BIT \reg", X86_FEATURE_PCID
+ /* Clear PCID and "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
+ andq $(~PTI_SWITCH_MASK), \reg
+.endm
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+ mov %cr3, \scratch_reg
+ ADJUST_KERNEL_CR3 \scratch_reg
+ mov \scratch_reg, %cr3
+.Lend_\@:
+.endm
+
+#define THIS_CPU_user_pcid_flush_mask \
+ PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
+
+.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+ mov %cr3, \scratch_reg
+
+ ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
+
+ /*
+ * Test if the ASID needs a flush.
+ */
+ movq \scratch_reg, \scratch_reg2
+ andq $(0x7FF), \scratch_reg /* mask ASID */
+ bt \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ jnc .Lnoflush_\@
+
+ /* Flush needed, clear the bit */
+ btr \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ movq \scratch_reg2, \scratch_reg
+ jmp .Lwrcr3_\@
+
+.Lnoflush_\@:
+ movq \scratch_reg2, \scratch_reg
+ SET_NOFLUSH_BIT \scratch_reg
+
+.Lwrcr3_\@:
+ /* Flip the PGD and ASID to the user version */
+ orq $(PTI_SWITCH_MASK), \scratch_reg
+ mov \scratch_reg, %cr3
+.Lend_\@:
+.endm
+
+.macro SWITCH_TO_USER_CR3_STACK scratch_reg:req
+ pushq %rax
+ SWITCH_TO_USER_CR3_NOSTACK scratch_reg=\scratch_reg scratch_reg2=%rax
+ popq %rax
+.endm
+
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+ ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI
+ movq %cr3, \scratch_reg
+ movq \scratch_reg, \save_reg
+ /*
+ * Is the "switch mask" all zero? That means that both of
+ * these are zero:
+ *
+ * 1. The user/kernel PCID bit, and
+ * 2. The user/kernel "bit" that points CR3 to the
+ * bottom half of the 8k PGD
+ *
+ * That indicates a kernel CR3 value, not a user CR3.
+ */
+ testq $(PTI_SWITCH_MASK), \scratch_reg
+ jz .Ldone_\@
+
+ ADJUST_KERNEL_CR3 \scratch_reg
+ movq \scratch_reg, %cr3
+
+.Ldone_\@:
+.endm
+
+.macro RESTORE_CR3 scratch_reg:req save_reg:req
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+ ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
+
+ /*
+ * KERNEL pages can always resume with NOFLUSH as we do
+ * explicit flushes.
+ */
+ bt $X86_CR3_PTI_SWITCH_BIT, \save_reg
+ jnc .Lnoflush_\@
+
+ /*
+ * Check if there's a pending flush for the user ASID we're
+ * about to set.
+ */
+ movq \save_reg, \scratch_reg
+ andq $(0x7FF), \scratch_reg
+ bt \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ jnc .Lnoflush_\@
+
+ btr \scratch_reg, THIS_CPU_user_pcid_flush_mask
+ jmp .Lwrcr3_\@
+
+.Lnoflush_\@:
+ SET_NOFLUSH_BIT \save_reg
+
+.Lwrcr3_\@:
+ /*
+ * The CR3 write could be avoided when not changing its value,
+ * but would require a CR3 read *and* a scratch register.
+ */
+ movq \save_reg, %cr3
+.Lend_\@:
+.endm
+
+#else /* CONFIG_PAGE_TABLE_ISOLATION=n: */
+
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+.endm
+.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
+.endm
+.macro SWITCH_TO_USER_CR3_STACK scratch_reg:req
+.endm
+.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
+.endm
+.macro RESTORE_CR3 scratch_reg:req save_reg:req
+.endm
+
+#endif
+
#endif /* CONFIG_X86_64 */
/*
movl %esp, %eax # pt_regs pointer
/* Are we currently on the SYSENTER stack? */
- PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx)
- subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */
- cmpl $SIZEOF_SYSENTER_stack, %ecx
+ movl PER_CPU_VAR(cpu_entry_area), %ecx
+ addl $CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
+ subl %eax, %ecx /* ecx = (end of entry_stack) - esp */
+ cmpl $SIZEOF_entry_stack, %ecx
jb .Ldebug_from_sysenter_stack
TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
/* Are we currently on the SYSENTER stack? */
- PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx)
- subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */
- cmpl $SIZEOF_SYSENTER_stack, %ecx
+ movl PER_CPU_VAR(cpu_entry_area), %ecx
+ addl $CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
+ subl %eax, %ecx /* ecx = (end of entry_stack) - esp */
+ cmpl $SIZEOF_entry_stack, %ecx
jb .Lnmi_from_sysenter_stack
/* Not on SYSENTER stack. */
#include <asm/segment.h>
#include <asm/cache.h>
#include <asm/errno.h>
-#include "calling.h"
#include <asm/asm-offsets.h>
#include <asm/msr.h>
#include <asm/unistd.h>
#include <asm/frame.h>
#include <linux/err.h>
+#include "calling.h"
+
.code64
.section .entry.text, "ax"
* with them due to bugs in both AMD and Intel CPUs.
*/
+ .pushsection .entry_trampoline, "ax"
+
+/*
+ * The code in here gets remapped into cpu_entry_area's trampoline. This means
+ * that the assembler and linker have the wrong idea as to where this code
+ * lives (and, in fact, it's mapped more than once, so it's not even at a
+ * fixed address). So we can't reference any symbols outside the entry
+ * trampoline and expect it to work.
+ *
+ * Instead, we carefully abuse %rip-relative addressing.
+ * _entry_trampoline(%rip) refers to the start of the remapped) entry
+ * trampoline. We can thus find cpu_entry_area with this macro:
+ */
+
+#define CPU_ENTRY_AREA \
+ _entry_trampoline - CPU_ENTRY_AREA_entry_trampoline(%rip)
+
+/* The top word of the SYSENTER stack is hot and is usable as scratch space. */
+#define RSP_SCRATCH CPU_ENTRY_AREA_entry_stack + \
+ SIZEOF_entry_stack - 8 + CPU_ENTRY_AREA
+
+ENTRY(entry_SYSCALL_64_trampoline)
+ UNWIND_HINT_EMPTY
+ swapgs
+
+ /* Stash the user RSP. */
+ movq %rsp, RSP_SCRATCH
+
+ /* Note: using %rsp as a scratch reg. */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
+ /* Load the top of the task stack into RSP */
+ movq CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp
+
+ /* Start building the simulated IRET frame. */
+ pushq $__USER_DS /* pt_regs->ss */
+ pushq RSP_SCRATCH /* pt_regs->sp */
+ pushq %r11 /* pt_regs->flags */
+ pushq $__USER_CS /* pt_regs->cs */
+ pushq %rcx /* pt_regs->ip */
+
+ /*
+ * x86 lacks a near absolute jump, and we can't jump to the real
+ * entry text with a relative jump. We could push the target
+ * address and then use retq, but this destroys the pipeline on
+ * many CPUs (wasting over 20 cycles on Sandy Bridge). Instead,
+ * spill RDI and restore it in a second-stage trampoline.
+ */
+ pushq %rdi
+ movq $entry_SYSCALL_64_stage2, %rdi
+ jmp *%rdi
+END(entry_SYSCALL_64_trampoline)
+
+ .popsection
+
+ENTRY(entry_SYSCALL_64_stage2)
+ UNWIND_HINT_EMPTY
+ popq %rdi
+ jmp entry_SYSCALL_64_after_hwframe
+END(entry_SYSCALL_64_stage2)
+
ENTRY(entry_SYSCALL_64)
UNWIND_HINT_EMPTY
/*
*/
swapgs
+ /*
+ * This path is not taken when PAGE_TABLE_ISOLATION is disabled so it
+ * is not required to switch CR3.
+ */
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
popq %rsi /* skip rcx */
popq %rdx
popq %rsi
+
+ /*
+ * Now all regs are restored except RSP and RDI.
+ * Save old stack pointer and switch to trampoline stack.
+ */
+ movq %rsp, %rdi
+ movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
+
+ pushq RSP-RDI(%rdi) /* RSP */
+ pushq (%rdi) /* RDI */
+
+ /*
+ * We are on the trampoline stack. All regs except RDI are live.
+ * We can do future final exit work right here.
+ */
+ SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
+
popq %rdi
- movq RSP-ORIG_RAX(%rsp), %rsp
+ popq %rsp
USERGS_SYSRET64
END(entry_SYSCALL_64)
.macro DEBUG_ENTRY_ASSERT_IRQS_OFF
#ifdef CONFIG_DEBUG_ENTRY
- pushfq
- testl $X86_EFLAGS_IF, (%rsp)
+ pushq %rax
+ SAVE_FLAGS(CLBR_RAX)
+ testl $X86_EFLAGS_IF, %eax
jz .Lokay_\@
ud2
.Lokay_\@:
- addq $8, %rsp
+ popq %rax
#endif
.endm
/* 0(%rsp): ~(interrupt number) */
.macro interrupt func
cld
+
+ testb $3, CS-ORIG_RAX(%rsp)
+ jz 1f
+ SWAPGS
+ call switch_to_thread_stack
+1:
+
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
jz 1f
/*
- * IRQ from user mode. Switch to kernel gsbase and inform context
- * tracking that we're in kernel mode.
- */
- SWAPGS
-
- /*
+ * IRQ from user mode.
+ *
* We need to tell lockdep that IRQs are off. We can't do this until
* we fix gsbase, and we should do it before enter_from_user_mode
* (which can take locks). Since TRACE_IRQS_OFF idempotent,
ud2
1:
#endif
- SWAPGS
POP_EXTRA_REGS
- POP_C_REGS
- addq $8, %rsp /* skip regs->orig_ax */
+ popq %r11
+ popq %r10
+ popq %r9
+ popq %r8
+ popq %rax
+ popq %rcx
+ popq %rdx
+ popq %rsi
+
+ /*
+ * The stack is now user RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS.
+ * Save old stack pointer and switch to trampoline stack.
+ */
+ movq %rsp, %rdi
+ movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
+
+ /* Copy the IRET frame to the trampoline stack. */
+ pushq 6*8(%rdi) /* SS */
+ pushq 5*8(%rdi) /* RSP */
+ pushq 4*8(%rdi) /* EFLAGS */
+ pushq 3*8(%rdi) /* CS */
+ pushq 2*8(%rdi) /* RIP */
+
+ /* Push user RDI on the trampoline stack. */
+ pushq (%rdi)
+
+ /*
+ * We are on the trampoline stack. All regs except RDI are live.
+ * We can do future final exit work right here.
+ */
+
+ SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
+
+ /* Restore RDI. */
+ popq %rdi
+ SWAPGS
INTERRUPT_RETURN
*/
pushq %rdi /* Stash user RDI */
- SWAPGS
+ SWAPGS /* to kernel GS */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi /* to kernel CR3 */
+
movq PER_CPU_VAR(espfix_waddr), %rdi
movq %rax, (0*8)(%rdi) /* user RAX */
movq (1*8)(%rsp), %rax /* user RIP */
/* Now RAX == RSP. */
andl $0xffff0000, %eax /* RAX = (RSP & 0xffff0000) */
- popq %rdi /* Restore user RDI */
/*
* espfix_stack[31:16] == 0. The page tables are set up such that
* still points to an RO alias of the ESPFIX stack.
*/
orq PER_CPU_VAR(espfix_stack), %rax
- SWAPGS
+
+ SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
+ SWAPGS /* to user GS */
+ popq %rdi /* Restore user RDI */
+
movq %rax, %rsp
UNWIND_HINT_IRET_REGS offset=8
/*
* Exception entry points.
*/
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8)
+
+/*
+ * Switch to the thread stack. This is called with the IRET frame and
+ * orig_ax on the stack. (That is, RDI..R12 are not on the stack and
+ * space has not been allocated for them.)
+ */
+ENTRY(switch_to_thread_stack)
+ UNWIND_HINT_FUNC
+
+ pushq %rdi
+ /* Need to switch before accessing the thread stack. */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
+ movq %rsp, %rdi
+ movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+ UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI
+
+ pushq 7*8(%rdi) /* regs->ss */
+ pushq 6*8(%rdi) /* regs->rsp */
+ pushq 5*8(%rdi) /* regs->eflags */
+ pushq 4*8(%rdi) /* regs->cs */
+ pushq 3*8(%rdi) /* regs->ip */
+ pushq 2*8(%rdi) /* regs->orig_ax */
+ pushq 8(%rdi) /* return address */
+ UNWIND_HINT_FUNC
+
+ movq (%rdi), %rdi
+ ret
+END(switch_to_thread_stack)
.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
ENTRY(\sym)
ALLOC_PT_GPREGS_ON_STACK
- .if \paranoid
- .if \paranoid == 1
+ .if \paranoid < 2
testb $3, CS(%rsp) /* If coming from userspace, switch stacks */
- jnz 1f
+ jnz .Lfrom_usermode_switch_stack_\@
.endif
+
+ .if \paranoid
call paranoid_entry
.else
call error_entry
jmp error_exit
.endif
- .if \paranoid == 1
+ .if \paranoid < 2
/*
- * Paranoid entry from userspace. Switch stacks and treat it
+ * Entry from userspace. Switch stacks and treat it
* as a normal entry. This means that paranoid handlers
* run in real process context if user_mode(regs).
*/
-1:
+.Lfrom_usermode_switch_stack_\@:
call error_entry
-
- movq %rsp, %rdi /* pt_regs pointer */
- call sync_regs
- movq %rax, %rsp /* switch stack */
-
movq %rsp, %rdi /* pt_regs pointer */
.if \has_error_code
js 1f /* negative -> in kernel */
SWAPGS
xorl %ebx, %ebx
-1: ret
+
+1:
+ SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
+
+ ret
END(paranoid_entry)
/*
testl %ebx, %ebx /* swapgs needed? */
jnz .Lparanoid_exit_no_swapgs
TRACE_IRQS_IRETQ
+ RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
SWAPGS_UNSAFE_STACK
jmp .Lparanoid_exit_restore
.Lparanoid_exit_no_swapgs:
* from user mode due to an IRET fault.
*/
SWAPGS
+ /* We have user CR3. Change to kernel CR3. */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
.Lerror_entry_from_usermode_after_swapgs:
+ /* Put us onto the real thread stack. */
+ popq %r12 /* save return addr in %12 */
+ movq %rsp, %rdi /* arg0 = pt_regs pointer */
+ call sync_regs
+ movq %rax, %rsp /* switch stack */
+ ENCODE_FRAME_POINTER
+ pushq %r12
+
/*
* We need to tell lockdep that IRQs are off. We can't do this until
* we fix gsbase, and we should do it before enter_from_user_mode
* .Lgs_change's error handler with kernel gsbase.
*/
SWAPGS
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
jmp .Lerror_entry_done
.Lbstep_iret:
.Lerror_bad_iret:
/*
- * We came from an IRET to user mode, so we have user gsbase.
- * Switch to kernel gsbase:
+ * We came from an IRET to user mode, so we have user
+ * gsbase and CR3. Switch to kernel gsbase and CR3:
*/
SWAPGS
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
/*
* Pretend that the exception came from user mode: set up pt_regs
/*
* Runs on exception stack. Xen PV does not go through this path at all,
* so we can use real assembly here.
+ *
+ * Registers:
+ * %r14: Used to save/restore the CR3 of the interrupted context
+ * when PAGE_TABLE_ISOLATION is in use. Do not clobber.
*/
ENTRY(nmi)
UNWIND_HINT_IRET_REGS
swapgs
cld
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
UNWIND_HINT_IRET_REGS base=%rdx offset=8
movq $-1, %rsi
call do_nmi
+ RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
+
testl %ebx, %ebx /* swapgs needed? */
jnz nmi_restore
nmi_swapgs:
*/
ENTRY(entry_SYSENTER_compat)
/* Interrupts are off on entry. */
- SWAPGS_UNSAFE_STACK
+ SWAPGS
+
+ /* We are about to clobber %rsp anyway, clobbering here is OK */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
/*
/* Interrupts are off on entry. */
swapgs
- /* Stash user ESP and switch to the kernel stack. */
+ /* Stash user ESP */
movl %esp, %r8d
+
+ /* Use %rsp as scratch reg. User ESP is stashed in r8 */
+ SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
+
+ /* Switch to the kernel stack */
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
/* Construct struct pt_regs on stack */
* when the system call started, which is already known to user
* code. We zero R8-R10 to avoid info leaks.
*/
+ movq RSP-ORIG_RAX(%rsp), %rsp
+
+ /*
+ * The original userspace %rsp (RSP-ORIG_RAX(%rsp)) is stored
+ * on the process stack which is not mapped to userspace and
+ * not readable after we SWITCH_TO_USER_CR3. Delay the CR3
+ * switch until after after the last reference to the process
+ * stack.
+ *
+ * %r8/%r9 are zeroed before the sysret, thus safe to clobber.
+ */
+ SWITCH_TO_USER_CR3_NOSTACK scratch_reg=%r8 scratch_reg2=%r9
+
xorq %r8, %r8
xorq %r9, %r9
xorq %r10, %r10
- movq RSP-ORIG_RAX(%rsp), %rsp
swapgs
sysretl
END(entry_SYSCALL_compat)
*/
movl %eax, %eax
- /* Construct struct pt_regs on stack (iret frame is already on stack) */
pushq %rax /* pt_regs->orig_ax */
+
+ /* switch to thread stack expects orig_ax to be pushed */
+ call switch_to_thread_stack
+
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
pushq %rdx /* pt_regs->dx */
#include <asm/unistd.h>
#include <asm/fixmap.h>
#include <asm/traps.h>
+#include <asm/paravirt.h>
#define CREATE_TRACE_POINTS
#include "vsyscall_trace.h"
WARN_ON_ONCE(address != regs->ip);
+ /* This should be unreachable in NATIVE mode. */
+ if (WARN_ON(vsyscall_mode == NATIVE))
+ return false;
+
if (vsyscall_mode == NONE) {
warn_bad_vsyscall(KERN_INFO, regs,
"vsyscall attempted with vsyscall=none");
return vsyscall_mode != NONE && (addr & PAGE_MASK) == VSYSCALL_ADDR;
}
+/*
+ * The VSYSCALL page is the only user-accessible page in the kernel address
+ * range. Normally, the kernel page tables can have _PAGE_USER clear, but
+ * the tables covering VSYSCALL_ADDR need _PAGE_USER set if vsyscalls
+ * are enabled.
+ *
+ * Some day we may create a "minimal" vsyscall mode in which we emulate
+ * vsyscalls but leave the page not present. If so, we skip calling
+ * this.
+ */
+void __init set_vsyscall_pgtable_user_bits(pgd_t *root)
+{
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ pgd = pgd_offset_pgd(root, VSYSCALL_ADDR);
+ set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER));
+ p4d = p4d_offset(pgd, VSYSCALL_ADDR);
+#if CONFIG_PGTABLE_LEVELS >= 5
+ p4d->p4d |= _PAGE_USER;
+#endif
+ pud = pud_offset(p4d, VSYSCALL_ADDR);
+ set_pud(pud, __pud(pud_val(*pud) | _PAGE_USER));
+ pmd = pmd_offset(pud, VSYSCALL_ADDR);
+ set_pmd(pmd, __pmd(pmd_val(*pmd) | _PAGE_USER));
+}
+
void __init map_vsyscall(void)
{
extern char __vsyscall_page;
unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
- if (vsyscall_mode != NONE)
+ if (vsyscall_mode != NONE) {
__set_fixmap(VSYSCALL_PAGE, physaddr_vsyscall,
vsyscall_mode == NATIVE
? PAGE_KERNEL_VSYSCALL
: PAGE_KERNEL_VVAR);
+ set_vsyscall_pgtable_user_bits(swapper_pg_dir);
+ }
BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
(unsigned long)VSYSCALL_ADDR);
__init int intel_pmu_init(void)
{
+ struct attribute **extra_attr = NULL;
+ struct attribute **to_free = NULL;
union cpuid10_edx edx;
union cpuid10_eax eax;
union cpuid10_ebx ebx;
unsigned int unused;
struct extra_reg *er;
int version, i;
- struct attribute **extra_attr = NULL;
char *name;
if (!cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
hsw_format_attr : nhm_format_attr;
extra_attr = merge_attr(extra_attr, skl_format_attr);
+ to_free = extra_attr;
x86_pmu.cpu_events = get_hsw_events_attrs();
intel_pmu_pebs_data_source_skl(
boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
pr_cont("full-width counters, ");
}
+ kfree(to_free);
return 0;
}
#include <linux/types.h>
#include <linux/slab.h>
+#include <asm/cpu_entry_area.h>
#include <asm/perf_event.h>
+#include <asm/tlbflush.h>
#include <asm/insn.h>
#include "../perf_event.h"
+/* Waste a full page so it can be mapped into the cpu_entry_area */
+DEFINE_PER_CPU_PAGE_ALIGNED(struct debug_store, cpu_debug_store);
+
/* The size of a BTS record in bytes: */
#define BTS_RECORD_SIZE 24
-#define BTS_BUFFER_SIZE (PAGE_SIZE << 4)
-#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4)
#define PEBS_FIXUP_SIZE PAGE_SIZE
/*
static DEFINE_PER_CPU(void *, insn_buffer);
-static int alloc_pebs_buffer(int cpu)
+static void ds_update_cea(void *cea, void *addr, size_t size, pgprot_t prot)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
+ unsigned long start = (unsigned long)cea;
+ phys_addr_t pa;
+ size_t msz = 0;
+
+ pa = virt_to_phys(addr);
+
+ preempt_disable();
+ for (; msz < size; msz += PAGE_SIZE, pa += PAGE_SIZE, cea += PAGE_SIZE)
+ cea_set_pte(cea, pa, prot);
+
+ /*
+ * This is a cross-CPU update of the cpu_entry_area, we must shoot down
+ * all TLB entries for it.
+ */
+ flush_tlb_kernel_range(start, start + size);
+ preempt_enable();
+}
+
+static void ds_clear_cea(void *cea, size_t size)
+{
+ unsigned long start = (unsigned long)cea;
+ size_t msz = 0;
+
+ preempt_disable();
+ for (; msz < size; msz += PAGE_SIZE, cea += PAGE_SIZE)
+ cea_set_pte(cea, 0, PAGE_NONE);
+
+ flush_tlb_kernel_range(start, start + size);
+ preempt_enable();
+}
+
+static void *dsalloc_pages(size_t size, gfp_t flags, int cpu)
+{
+ unsigned int order = get_order(size);
int node = cpu_to_node(cpu);
- int max;
- void *buffer, *ibuffer;
+ struct page *page;
+
+ page = __alloc_pages_node(node, flags | __GFP_ZERO, order);
+ return page ? page_address(page) : NULL;
+}
+
+static void dsfree_pages(const void *buffer, size_t size)
+{
+ if (buffer)
+ free_pages((unsigned long)buffer, get_order(size));
+}
+
+static int alloc_pebs_buffer(int cpu)
+{
+ struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
+ struct debug_store *ds = hwev->ds;
+ size_t bsiz = x86_pmu.pebs_buffer_size;
+ int max, node = cpu_to_node(cpu);
+ void *buffer, *ibuffer, *cea;
if (!x86_pmu.pebs)
return 0;
- buffer = kzalloc_node(x86_pmu.pebs_buffer_size, GFP_KERNEL, node);
+ buffer = dsalloc_pages(bsiz, GFP_KERNEL, cpu);
if (unlikely(!buffer))
return -ENOMEM;
if (x86_pmu.intel_cap.pebs_format < 2) {
ibuffer = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node);
if (!ibuffer) {
- kfree(buffer);
+ dsfree_pages(buffer, bsiz);
return -ENOMEM;
}
per_cpu(insn_buffer, cpu) = ibuffer;
}
-
- max = x86_pmu.pebs_buffer_size / x86_pmu.pebs_record_size;
-
- ds->pebs_buffer_base = (u64)(unsigned long)buffer;
+ hwev->ds_pebs_vaddr = buffer;
+ /* Update the cpu entry area mapping */
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
+ ds->pebs_buffer_base = (unsigned long) cea;
+ ds_update_cea(cea, buffer, bsiz, PAGE_KERNEL);
ds->pebs_index = ds->pebs_buffer_base;
- ds->pebs_absolute_maximum = ds->pebs_buffer_base +
- max * x86_pmu.pebs_record_size;
-
+ max = x86_pmu.pebs_record_size * (bsiz / x86_pmu.pebs_record_size);
+ ds->pebs_absolute_maximum = ds->pebs_buffer_base + max;
return 0;
}
static void release_pebs_buffer(int cpu)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
+ struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
+ struct debug_store *ds = hwev->ds;
+ void *cea;
if (!ds || !x86_pmu.pebs)
return;
kfree(per_cpu(insn_buffer, cpu));
per_cpu(insn_buffer, cpu) = NULL;
- kfree((void *)(unsigned long)ds->pebs_buffer_base);
+ /* Clear the fixmap */
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer;
+ ds_clear_cea(cea, x86_pmu.pebs_buffer_size);
ds->pebs_buffer_base = 0;
+ dsfree_pages(hwev->ds_pebs_vaddr, x86_pmu.pebs_buffer_size);
+ hwev->ds_pebs_vaddr = NULL;
}
static int alloc_bts_buffer(int cpu)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
- int node = cpu_to_node(cpu);
- int max, thresh;
- void *buffer;
+ struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
+ struct debug_store *ds = hwev->ds;
+ void *buffer, *cea;
+ int max;
if (!x86_pmu.bts)
return 0;
- buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
+ buffer = dsalloc_pages(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, cpu);
if (unlikely(!buffer)) {
WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__);
return -ENOMEM;
}
-
- max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
- thresh = max / 16;
-
- ds->bts_buffer_base = (u64)(unsigned long)buffer;
+ hwev->ds_bts_vaddr = buffer;
+ /* Update the fixmap */
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.bts_buffer;
+ ds->bts_buffer_base = (unsigned long) cea;
+ ds_update_cea(cea, buffer, BTS_BUFFER_SIZE, PAGE_KERNEL);
ds->bts_index = ds->bts_buffer_base;
- ds->bts_absolute_maximum = ds->bts_buffer_base +
- max * BTS_RECORD_SIZE;
- ds->bts_interrupt_threshold = ds->bts_absolute_maximum -
- thresh * BTS_RECORD_SIZE;
-
+ max = BTS_RECORD_SIZE * (BTS_BUFFER_SIZE / BTS_RECORD_SIZE);
+ ds->bts_absolute_maximum = ds->bts_buffer_base + max;
+ ds->bts_interrupt_threshold = ds->bts_absolute_maximum - (max / 16);
return 0;
}
static void release_bts_buffer(int cpu)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
+ struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu);
+ struct debug_store *ds = hwev->ds;
+ void *cea;
if (!ds || !x86_pmu.bts)
return;
- kfree((void *)(unsigned long)ds->bts_buffer_base);
+ /* Clear the fixmap */
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.bts_buffer;
+ ds_clear_cea(cea, BTS_BUFFER_SIZE);
ds->bts_buffer_base = 0;
+ dsfree_pages(hwev->ds_bts_vaddr, BTS_BUFFER_SIZE);
+ hwev->ds_bts_vaddr = NULL;
}
static int alloc_ds_buffer(int cpu)
{
- int node = cpu_to_node(cpu);
- struct debug_store *ds;
-
- ds = kzalloc_node(sizeof(*ds), GFP_KERNEL, node);
- if (unlikely(!ds))
- return -ENOMEM;
+ struct debug_store *ds = &get_cpu_entry_area(cpu)->cpu_debug_store;
+ memset(ds, 0, sizeof(*ds));
per_cpu(cpu_hw_events, cpu).ds = ds;
-
return 0;
}
static void release_ds_buffer(int cpu)
{
- struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
-
- if (!ds)
- return;
-
per_cpu(cpu_hw_events, cpu).ds = NULL;
- kfree(ds);
}
void release_ds_buffers(void)
#include <linux/perf_event.h>
+#include <asm/intel_ds.h>
+
/* To enable MSR tracing please use the generic trace points. */
/*
struct event_constraint event_constraints[X86_PMC_IDX_MAX];
};
-/* The maximal number of PEBS events: */
-#define MAX_PEBS_EVENTS 8
#define PEBS_COUNTER_MASK ((1ULL << MAX_PEBS_EVENTS) - 1)
/*
PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR | \
PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER)
-/*
- * A debug store configuration.
- *
- * We only support architectures that use 64bit fields.
- */
-struct debug_store {
- u64 bts_buffer_base;
- u64 bts_index;
- u64 bts_absolute_maximum;
- u64 bts_interrupt_threshold;
- u64 pebs_buffer_base;
- u64 pebs_index;
- u64 pebs_absolute_maximum;
- u64 pebs_interrupt_threshold;
- u64 pebs_event_reset[MAX_PEBS_EVENTS];
-};
-
#define PEBS_REGS \
(PERF_REG_X86_AX | \
PERF_REG_X86_BX | \
* Intel DebugStore bits
*/
struct debug_store *ds;
+ void *ds_pebs_vaddr;
+ void *ds_bts_vaddr;
u64 pebs_enabled;
int n_pebs;
int n_large_pebs;
".popsection\n" \
".pushsection .altinstr_replacement, \"ax\"\n" \
ALTINSTR_REPLACEMENT(newinstr, feature, 1) \
- ".popsection"
+ ".popsection\n"
#define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
OLDINSTR_2(oldinstr, 1, 2) \
".pushsection .altinstr_replacement, \"ax\"\n" \
ALTINSTR_REPLACEMENT(newinstr1, feature1, 1) \
ALTINSTR_REPLACEMENT(newinstr2, feature2, 2) \
- ".popsection"
+ ".popsection\n"
/*
* Alternative instructions for different CPU types or capabilities.
#endif
#ifndef __ASSEMBLY__
+#ifndef __BPF__
/*
* This output constraint should be used for any inline asm which has a "call"
* instruction. Otherwise the asm may be inserted before the frame pointer
register unsigned long current_stack_pointer asm(_ASM_SP);
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
#endif
+#endif
#endif /* _ASM_X86_ASM_H */
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef _ASM_X86_CPU_ENTRY_AREA_H
+#define _ASM_X86_CPU_ENTRY_AREA_H
+
+#include <linux/percpu-defs.h>
+#include <asm/processor.h>
+#include <asm/intel_ds.h>
+
+/*
+ * cpu_entry_area is a percpu region that contains things needed by the CPU
+ * and early entry/exit code. Real types aren't used for all fields here
+ * to avoid circular header dependencies.
+ *
+ * Every field is a virtual alias of some other allocated backing store.
+ * There is no direct allocation of a struct cpu_entry_area.
+ */
+struct cpu_entry_area {
+ char gdt[PAGE_SIZE];
+
+ /*
+ * The GDT is just below entry_stack and thus serves (on x86_64) as
+ * a a read-only guard page.
+ */
+ struct entry_stack_page entry_stack_page;
+
+ /*
+ * On x86_64, the TSS is mapped RO. On x86_32, it's mapped RW because
+ * we need task switches to work, and task switches write to the TSS.
+ */
+ struct tss_struct tss;
+
+ char entry_trampoline[PAGE_SIZE];
+
+#ifdef CONFIG_X86_64
+ /*
+ * Exception stacks used for IST entries.
+ *
+ * In the future, this should have a separate slot for each stack
+ * with guard pages between them.
+ */
+ char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ];
+#endif
+#ifdef CONFIG_CPU_SUP_INTEL
+ /*
+ * Per CPU debug store for Intel performance monitoring. Wastes a
+ * full page at the moment.
+ */
+ struct debug_store cpu_debug_store;
+ /*
+ * The actual PEBS/BTS buffers must be mapped to user space
+ * Reserve enough fixmap PTEs.
+ */
+ struct debug_store_buffers cpu_debug_buffers;
+#endif
+};
+
+#define CPU_ENTRY_AREA_SIZE (sizeof(struct cpu_entry_area))
+#define CPU_ENTRY_AREA_TOT_SIZE (CPU_ENTRY_AREA_SIZE * NR_CPUS)
+
+DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
+
+extern void setup_cpu_entry_areas(void);
+extern void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags);
+
+#define CPU_ENTRY_AREA_RO_IDT CPU_ENTRY_AREA_BASE
+#define CPU_ENTRY_AREA_PER_CPU (CPU_ENTRY_AREA_RO_IDT + PAGE_SIZE)
+
+#define CPU_ENTRY_AREA_RO_IDT_VADDR ((void *)CPU_ENTRY_AREA_RO_IDT)
+
+#define CPU_ENTRY_AREA_MAP_SIZE \
+ (CPU_ENTRY_AREA_PER_CPU + CPU_ENTRY_AREA_TOT_SIZE - CPU_ENTRY_AREA_BASE)
+
+extern struct cpu_entry_area *get_cpu_entry_area(int cpu);
+
+static inline struct entry_stack *cpu_entry_stack(int cpu)
+{
+ return &get_cpu_entry_area(cpu)->entry_stack_page.stack;
+}
+
+#endif
set_bit(bit, (unsigned long *)cpu_caps_set); \
} while (0)
+#define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit)
+
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_X86_FAST_FEATURE_TESTS)
/*
* Static testing of CPU features. Used the same as boot_cpu_has().
#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */
#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */
#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */
+#define X86_FEATURE_INVPCID_SINGLE ( 7*32+ 7) /* Effectively INVPCID && CR4.PCIDE=1 */
#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */
-
+#define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
#define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */
#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */
#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */
#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */
+#define X86_BUG_CPU_MELTDOWN X86_BUG(14) /* CPU is affected by meltdown attack and needs kernel page table isolation */
#endif /* _ASM_X86_CPUFEATURES_H */
#include <asm/mmu.h>
#include <asm/fixmap.h>
#include <asm/irq_vectors.h>
+#include <asm/cpu_entry_area.h>
#include <linux/smp.h>
#include <linux/percpu.h>
desc->type = (info->read_exec_only ^ 1) << 1;
desc->type |= info->contents << 2;
+ /* Set the ACCESS bit so it can be mapped RO */
+ desc->type |= 1;
desc->s = 1;
desc->dpl = 0x3;
return this_cpu_ptr(&gdt_page)->gdt;
}
-/* Get the fixmap index for a specific processor */
-static inline unsigned int get_cpu_gdt_ro_index(int cpu)
-{
- return FIX_GDT_REMAP_BEGIN + cpu;
-}
-
/* Provide the fixmap address of the remapped GDT */
static inline struct desc_struct *get_cpu_gdt_ro(int cpu)
{
- unsigned int idx = get_cpu_gdt_ro_index(cpu);
- return (struct desc_struct *)__fix_to_virt(idx);
+ return (struct desc_struct *)&get_cpu_entry_area(cpu)->gdt;
}
/* Provide the current read-only GDT */
#endif
}
-static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr)
+static inline void __set_tss_desc(unsigned cpu, unsigned int entry, struct x86_hw_tss *addr)
{
struct desc_struct *d = get_cpu_gdt_rw(cpu);
tss_desc tss;
# define DISABLE_LA57 (1<<(X86_FEATURE_LA57 & 31))
#endif
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define DISABLE_PTI 0
+#else
+# define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
#define DISABLED_MASK4 (DISABLE_PCID)
#define DISABLED_MASK5 0
#define DISABLED_MASK6 0
-#define DISABLED_MASK7 0
+#define DISABLED_MASK7 (DISABLE_PTI)
#define DISABLED_MASK8 0
#define DISABLED_MASK9 (DISABLE_MPX)
#define DISABLED_MASK10 0
#ifndef _ASM_X86_ESPFIX_H
#define _ASM_X86_ESPFIX_H
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_ESPFIX64
#include <asm/percpu.h>
extern void init_espfix_bsp(void);
extern void init_espfix_ap(int cpu);
-
-#endif /* CONFIG_X86_64 */
+#else
+static inline void init_espfix_ap(int cpu) { }
+#endif
#endif /* _ASM_X86_ESPFIX_H */
PAGE_SIZE)
#endif
-
/*
* Here we define all the compile-time 'special' virtual
* addresses. The point is to have a constant address at
FIX_IO_APIC_BASE_0,
FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS - 1,
#endif
- FIX_RO_IDT, /* Virtual mapping for read-only IDT */
#ifdef CONFIG_X86_32
FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
#ifdef CONFIG_X86_INTEL_MID
FIX_LNW_VRTC,
#endif
- /* Fixmap entries to remap the GDTs, one per processor. */
- FIX_GDT_REMAP_BEGIN,
- FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
#ifdef CONFIG_ACPI_APEI_GHES
/* Used for GHES mapping from assorted contexts */
extern void reserve_top_address(unsigned long reserve);
#define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
-#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
+#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
extern int fixmaps_set;
#ifndef _ASM_X86_HYPERVISOR_H
#define _ASM_X86_HYPERVISOR_H
-#ifdef CONFIG_HYPERVISOR_GUEST
-
-#include <asm/kvm_para.h>
-#include <asm/x86_init.h>
-#include <asm/xen/hypervisor.h>
-
-/*
- * x86 hypervisor information
- */
-
+/* x86 hypervisor types */
enum x86_hypervisor_type {
X86_HYPER_NATIVE = 0,
X86_HYPER_VMWARE,
X86_HYPER_KVM,
};
+#ifdef CONFIG_HYPERVISOR_GUEST
+
+#include <asm/kvm_para.h>
+#include <asm/x86_init.h>
+#include <asm/xen/hypervisor.h>
+
struct hypervisor_x86 {
/* Hypervisor name */
const char *name;
extern enum x86_hypervisor_type x86_hyper_type;
extern void init_hypervisor_platform(void);
+static inline bool hypervisor_is_type(enum x86_hypervisor_type type)
+{
+ return x86_hyper_type == type;
+}
#else
static inline void init_hypervisor_platform(void) { }
+static inline bool hypervisor_is_type(enum x86_hypervisor_type type)
+{
+ return type == X86_HYPER_NATIVE;
+}
#endif /* CONFIG_HYPERVISOR_GUEST */
#endif /* _ASM_X86_HYPERVISOR_H */
--- /dev/null
+#ifndef _ASM_INTEL_DS_H
+#define _ASM_INTEL_DS_H
+
+#include <linux/percpu-defs.h>
+
+#define BTS_BUFFER_SIZE (PAGE_SIZE << 4)
+#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4)
+
+/* The maximal number of PEBS events: */
+#define MAX_PEBS_EVENTS 8
+
+/*
+ * A debug store configuration.
+ *
+ * We only support architectures that use 64bit fields.
+ */
+struct debug_store {
+ u64 bts_buffer_base;
+ u64 bts_index;
+ u64 bts_absolute_maximum;
+ u64 bts_interrupt_threshold;
+ u64 pebs_buffer_base;
+ u64 pebs_index;
+ u64 pebs_absolute_maximum;
+ u64 pebs_interrupt_threshold;
+ u64 pebs_event_reset[MAX_PEBS_EVENTS];
+} __aligned(PAGE_SIZE);
+
+DECLARE_PER_CPU_PAGE_ALIGNED(struct debug_store, cpu_debug_store);
+
+struct debug_store_buffers {
+ char bts_buffer[BTS_BUFFER_SIZE];
+ char pebs_buffer[PEBS_BUFFER_SIZE];
+};
+
+#endif
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_INVPCID
+#define _ASM_X86_INVPCID
+
+static inline void __invpcid(unsigned long pcid, unsigned long addr,
+ unsigned long type)
+{
+ struct { u64 d[2]; } desc = { { pcid, addr } };
+
+ /*
+ * The memory clobber is because the whole point is to invalidate
+ * stale TLB entries and, especially if we're flushing global
+ * mappings, we don't want the compiler to reorder any subsequent
+ * memory accesses before the TLB flush.
+ *
+ * The hex opcode is invpcid (%ecx), %eax in 32-bit mode and
+ * invpcid (%rcx), %rax in long mode.
+ */
+ asm volatile (".byte 0x66, 0x0f, 0x38, 0x82, 0x01"
+ : : "m" (desc), "a" (type), "c" (&desc) : "memory");
+}
+
+#define INVPCID_TYPE_INDIV_ADDR 0
+#define INVPCID_TYPE_SINGLE_CTXT 1
+#define INVPCID_TYPE_ALL_INCL_GLOBAL 2
+#define INVPCID_TYPE_ALL_NON_GLOBAL 3
+
+/* Flush all mappings for a given pcid and addr, not including globals. */
+static inline void invpcid_flush_one(unsigned long pcid,
+ unsigned long addr)
+{
+ __invpcid(pcid, addr, INVPCID_TYPE_INDIV_ADDR);
+}
+
+/* Flush all mappings for a given PCID, not including globals. */
+static inline void invpcid_flush_single_context(unsigned long pcid)
+{
+ __invpcid(pcid, 0, INVPCID_TYPE_SINGLE_CTXT);
+}
+
+/* Flush all mappings, including globals, for all PCIDs. */
+static inline void invpcid_flush_all(void)
+{
+ __invpcid(0, 0, INVPCID_TYPE_ALL_INCL_GLOBAL);
+}
+
+/* Flush all mappings for all PCIDs except globals. */
+static inline void invpcid_flush_all_nonglobals(void)
+{
+ __invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL);
+}
+
+#endif /* _ASM_X86_INVPCID */
extern void mp_irqdomain_free(struct irq_domain *domain, unsigned int virq,
unsigned int nr_irqs);
extern int mp_irqdomain_activate(struct irq_domain *domain,
- struct irq_data *irq_data, bool early);
+ struct irq_data *irq_data, bool reserve);
extern void mp_irqdomain_deactivate(struct irq_domain *domain,
struct irq_data *irq_data);
extern int mp_irqdomain_ioapic_idx(struct irq_domain *domain);
swapgs; \
sysretl
+#ifdef CONFIG_DEBUG_ENTRY
+#define SAVE_FLAGS(x) pushfq; popq %rax
+#endif
#else
#define INTERRUPT_RETURN iret
#define ENABLE_INTERRUPTS_SYSEXIT sti; sysexit
extern int __must_check __die(const char *, struct pt_regs *, long);
extern void show_stack_regs(struct pt_regs *regs);
extern void __show_regs(struct pt_regs *regs, int all);
+extern void show_iret_regs(struct pt_regs *regs);
extern unsigned long oops_begin(void);
extern void oops_end(unsigned long, struct pt_regs *, int signr);
#define _ASM_X86_MMU_H
#include <linux/spinlock.h>
+#include <linux/rwsem.h>
#include <linux/mutex.h>
#include <linux/atomic.h>
atomic64_t tlb_gen;
#ifdef CONFIG_MODIFY_LDT_SYSCALL
- struct ldt_struct *ldt;
+ struct rw_semaphore ldt_usr_sem;
+ struct ldt_struct *ldt;
#endif
#ifdef CONFIG_X86_64
* call gates. On native, we could merge the ldt_struct and LDT
* allocations, but it's not worth trying to optimize.
*/
- struct desc_struct *entries;
- unsigned int nr_entries;
+ struct desc_struct *entries;
+ unsigned int nr_entries;
+
+ /*
+ * If PTI is in use, then the entries array is not mapped while we're
+ * in user mode. The whole array will be aliased at the addressed
+ * given by ldt_slot_va(slot). We use two slots so that we can allocate
+ * and map, and enable a new LDT without invalidating the mapping
+ * of an older, still-in-use LDT.
+ *
+ * slot will be -1 if this LDT doesn't have an alias mapping.
+ */
+ int slot;
};
+/* This is a multiple of PAGE_SIZE. */
+#define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
+
+static inline void *ldt_slot_va(int slot)
+{
+#ifdef CONFIG_X86_64
+ return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
+#else
+ BUG();
+#endif
+}
+
/*
* Used for LDT copy/destruction.
*/
-int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm);
+static inline void init_new_context_ldt(struct mm_struct *mm)
+{
+ mm->context.ldt = NULL;
+ init_rwsem(&mm->context.ldt_usr_sem);
+}
+int ldt_dup_context(struct mm_struct *oldmm, struct mm_struct *mm);
void destroy_context_ldt(struct mm_struct *mm);
+void ldt_arch_exit_mmap(struct mm_struct *mm);
#else /* CONFIG_MODIFY_LDT_SYSCALL */
-static inline int init_new_context_ldt(struct task_struct *tsk,
- struct mm_struct *mm)
+static inline void init_new_context_ldt(struct mm_struct *mm) { }
+static inline int ldt_dup_context(struct mm_struct *oldmm,
+ struct mm_struct *mm)
{
return 0;
}
-static inline void destroy_context_ldt(struct mm_struct *mm) {}
+static inline void destroy_context_ldt(struct mm_struct *mm) { }
+static inline void ldt_arch_exit_mmap(struct mm_struct *mm) { }
#endif
static inline void load_mm_ldt(struct mm_struct *mm)
* that we can see.
*/
- if (unlikely(ldt))
- set_ldt(ldt->entries, ldt->nr_entries);
- else
+ if (unlikely(ldt)) {
+ if (static_cpu_has(X86_FEATURE_PTI)) {
+ if (WARN_ON_ONCE((unsigned long)ldt->slot > 1)) {
+ /*
+ * Whoops -- either the new LDT isn't mapped
+ * (if slot == -1) or is mapped into a bogus
+ * slot (if slot > 1).
+ */
+ clear_LDT();
+ return;
+ }
+
+ /*
+ * If page table isolation is enabled, ldt->entries
+ * will not be mapped in the userspace pagetables.
+ * Tell the CPU to access the LDT through the alias
+ * at ldt_slot_va(ldt->slot).
+ */
+ set_ldt(ldt_slot_va(ldt->slot), ldt->nr_entries);
+ } else {
+ set_ldt(ldt->entries, ldt->nr_entries);
+ }
+ } else {
clear_LDT();
+ }
#else
clear_LDT();
#endif
static inline int init_new_context(struct task_struct *tsk,
struct mm_struct *mm)
{
+ mutex_init(&mm->context.lock);
+
mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
atomic64_set(&mm->context.tlb_gen, 0);
- #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
/* pkey 0 is the default and always allocated */
mm->context.pkey_allocation_map = 0x1;
/* -1 means unallocated or invalid */
mm->context.execute_only_pkey = -1;
}
- #endif
- return init_new_context_ldt(tsk, mm);
+#endif
+ init_new_context_ldt(mm);
+ return 0;
}
static inline void destroy_context(struct mm_struct *mm)
{
} while (0)
#endif
-static inline void arch_dup_mmap(struct mm_struct *oldmm,
- struct mm_struct *mm)
+static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
{
paravirt_arch_dup_mmap(oldmm, mm);
+ return ldt_dup_context(oldmm, mm);
}
static inline void arch_exit_mmap(struct mm_struct *mm)
{
paravirt_arch_exit_mmap(mm);
+ ldt_arch_exit_mmap(mm);
}
#ifdef CONFIG_X86_64
return __pkru_allows_pkey(vma_pkey(vma), write);
}
-/*
- * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID
- * bits. This serves two purposes. It prevents a nasty situation in
- * which PCID-unaware code saves CR3, loads some other value (with PCID
- * == 0), and then restores CR3, thus corrupting the TLB for ASID 0 if
- * the saved ASID was nonzero. It also means that any bugs involving
- * loading a PCID-enabled CR3 with CR4.PCIDE off will trigger
- * deterministically.
- */
-
-static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid)
-{
- if (static_cpu_has(X86_FEATURE_PCID)) {
- VM_WARN_ON_ONCE(asid > 4094);
- return __sme_pa(mm->pgd) | (asid + 1);
- } else {
- VM_WARN_ON_ONCE(asid != 0);
- return __sme_pa(mm->pgd);
- }
-}
-
-static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid)
-{
- VM_WARN_ON_ONCE(asid > 4094);
- return __sme_pa(mm->pgd) | (asid + 1) | CR3_NOFLUSH;
-}
-
/*
* This can be used from process context to figure out what the value of
* CR3 is without needing to do a (slow) __read_cr3().
*/
static inline unsigned long __get_current_cr3_fast(void)
{
- unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm),
+ unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
this_cpu_read(cpu_tlbstate.loaded_mm_asid));
/* For now, be very restrictive about when this can be called. */
PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret64), \
CLBR_NONE, \
jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_usergs_sysret64))
+
+#ifdef CONFIG_DEBUG_ENTRY
+#define SAVE_FLAGS(clobbers) \
+ PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_save_fl), clobbers, \
+ PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \
+ call PARA_INDIRECT(pv_irq_ops+PV_IRQ_save_fl); \
+ PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
+#endif
+
#endif /* CONFIG_X86_32 */
#endif /* __ASSEMBLY__ */
*/
extern gfp_t __userpte_alloc_gfp;
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Instead of one PGD, we acquire two PGDs. Being order-1, it is
+ * both 8k in size and 8k-aligned. That lets us just flip bit 12
+ * in a pointer to swap between the two 4k halves.
+ */
+#define PGD_ALLOCATION_ORDER 1
+#else
+#define PGD_ALLOCATION_ORDER 0
+#endif
+
/*
* Allocate and free page tables.
*/
int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user);
void ptdump_walk_pgd_level_checkwx(void);
#ifdef CONFIG_DEBUG_WX
static inline int p4d_bad(p4d_t p4d)
{
- return (p4d_flags(p4d) & ~(_KERNPG_TABLE | _PAGE_USER)) != 0;
+ unsigned long ignore_flags = _KERNPG_TABLE | _PAGE_USER;
+
+ if (IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+ ignore_flags |= _PAGE_NX;
+
+ return (p4d_flags(p4d) & ~ignore_flags) != 0;
}
#endif /* CONFIG_PGTABLE_LEVELS > 3 */
static inline int pgd_bad(pgd_t pgd)
{
- return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
+ unsigned long ignore_flags = _PAGE_USER;
+
+ if (IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+ ignore_flags |= _PAGE_NX;
+
+ return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
}
static inline int pgd_none(pgd_t pgd)
* pgd_offset() returns a (pgd_t *)
* pgd_index() is used get the offset into the pgd page's array of pgd_t's;
*/
-#define pgd_offset(mm, address) ((mm)->pgd + pgd_index((address)))
+#define pgd_offset_pgd(pgd, address) (pgd + pgd_index((address)))
+/*
+ * a shortcut to get a pgd_t in a given mm
+ */
+#define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address))
/*
* a shortcut which implies the use of the kernel's pgd, instead
* of a process's
*/
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
- memcpy(dst, src, count * sizeof(pgd_t));
+ memcpy(dst, src, count * sizeof(pgd_t));
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+ /* Clone the user space pgd as well */
+ memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
+ count * sizeof(pgd_t));
+#endif
}
#define PTE_SHIFT ilog2(PTRS_PER_PTE)
#define LAST_PKMAP 1024
#endif
-#define PKMAP_BASE ((FIXADDR_START - PAGE_SIZE * (LAST_PKMAP + 1)) \
- & PMD_MASK)
+/*
+ * Define this here and validate with BUILD_BUG_ON() in pgtable_32.c
+ * to avoid include recursion hell
+ */
+#define CPU_ENTRY_AREA_PAGES (NR_CPUS * 40)
+
+#define CPU_ENTRY_AREA_BASE \
+ ((FIXADDR_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK)
+
+#define PKMAP_BASE \
+ ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
#ifdef CONFIG_HIGHMEM
# define VMALLOC_END (PKMAP_BASE - 2 * PAGE_SIZE)
#else
-# define VMALLOC_END (FIXADDR_START - 2 * PAGE_SIZE)
+# define VMALLOC_END (CPU_ENTRY_AREA_BASE - 2 * PAGE_SIZE)
#endif
#define MODULES_VADDR VMALLOC_START
#endif
}
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
+ * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and
+ * the user one is in the last 4k. To switch between them, you
+ * just need to flip the 12th bit in their addresses.
+ */
+#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT
+
+/*
+ * This generates better code than the inline assembly in
+ * __set_bit().
+ */
+static inline void *ptr_set_bit(void *ptr, int bit)
+{
+ unsigned long __ptr = (unsigned long)ptr;
+
+ __ptr |= BIT(bit);
+ return (void *)__ptr;
+}
+static inline void *ptr_clear_bit(void *ptr, int bit)
+{
+ unsigned long __ptr = (unsigned long)ptr;
+
+ __ptr &= ~BIT(bit);
+ return (void *)__ptr;
+}
+
+static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
+{
+ return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
+{
+ return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
+{
+ return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
+{
+ return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
+/*
+ * Page table pages are page-aligned. The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ *
+ * Returns true for parts of the PGD that map userspace and
+ * false for the parts that map the kernel.
+ */
+static inline bool pgdp_maps_userspace(void *__ptr)
+{
+ unsigned long ptr = (unsigned long)__ptr;
+
+ return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);
+}
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
+
+/*
+ * Take a PGD location (pgdp) and a pgd value that needs to be set there.
+ * Populates the user and returns the resulting PGD that must be set in
+ * the kernel copy of the page tables.
+ */
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return pgd;
+ return __pti_set_user_pgd(pgdp, pgd);
+}
+#else
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ return pgd;
+}
+#endif
+
static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
{
+#if defined(CONFIG_PAGE_TABLE_ISOLATION) && !defined(CONFIG_X86_5LEVEL)
+ p4dp->pgd = pti_set_user_pgd(&p4dp->pgd, p4d.pgd);
+#else
*p4dp = p4d;
+#endif
}
static inline void native_p4d_clear(p4d_t *p4d)
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ *pgdp = pti_set_user_pgd(pgdp, pgd);
+#else
*pgdp = pgd;
+#endif
}
static inline void native_pgd_clear(pgd_t *pgd)
#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE - 1))
-/* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
-#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
+/*
+ * See Documentation/x86/x86_64/mm.txt for a description of the memory map.
+ *
+ * Be very careful vs. KASLR when changing anything here. The KASLR address
+ * range must not overlap with anything except the KASAN shadow area, which
+ * is correct as KASAN disables KASLR.
+ */
+#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
+
#ifdef CONFIG_X86_5LEVEL
-#define VMALLOC_SIZE_TB _AC(16384, UL)
-#define __VMALLOC_BASE _AC(0xff92000000000000, UL)
-#define __VMEMMAP_BASE _AC(0xffd4000000000000, UL)
+# define VMALLOC_SIZE_TB _AC(12800, UL)
+# define __VMALLOC_BASE _AC(0xffa0000000000000, UL)
+# define __VMEMMAP_BASE _AC(0xffd4000000000000, UL)
+# define LDT_PGD_ENTRY _AC(-112, UL)
+# define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
#else
-#define VMALLOC_SIZE_TB _AC(32, UL)
-#define __VMALLOC_BASE _AC(0xffffc90000000000, UL)
-#define __VMEMMAP_BASE _AC(0xffffea0000000000, UL)
+# define VMALLOC_SIZE_TB _AC(32, UL)
+# define __VMALLOC_BASE _AC(0xffffc90000000000, UL)
+# define __VMEMMAP_BASE _AC(0xffffea0000000000, UL)
+# define LDT_PGD_ENTRY _AC(-3, UL)
+# define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
#endif
+
#ifdef CONFIG_RANDOMIZE_MEMORY
-#define VMALLOC_START vmalloc_base
-#define VMEMMAP_START vmemmap_base
+# define VMALLOC_START vmalloc_base
+# define VMEMMAP_START vmemmap_base
#else
-#define VMALLOC_START __VMALLOC_BASE
-#define VMEMMAP_START __VMEMMAP_BASE
+# define VMALLOC_START __VMALLOC_BASE
+# define VMEMMAP_START __VMEMMAP_BASE
#endif /* CONFIG_RANDOMIZE_MEMORY */
-#define VMALLOC_END (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
-#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
+
+#define VMALLOC_END (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
+
+#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
/* The module sections ends with the start of the fixmap */
-#define MODULES_END __fix_to_virt(__end_of_fixed_addresses + 1)
-#define MODULES_LEN (MODULES_END - MODULES_VADDR)
-#define ESPFIX_PGD_ENTRY _AC(-2, UL)
-#define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT)
-#define EFI_VA_START ( -4 * (_AC(1, UL) << 30))
-#define EFI_VA_END (-68 * (_AC(1, UL) << 30))
+#define MODULES_END _AC(0xffffffffff000000, UL)
+#define MODULES_LEN (MODULES_END - MODULES_VADDR)
+
+#define ESPFIX_PGD_ENTRY _AC(-2, UL)
+#define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT)
+
+#define CPU_ENTRY_AREA_PGD _AC(-4, UL)
+#define CPU_ENTRY_AREA_BASE (CPU_ENTRY_AREA_PGD << P4D_SHIFT)
+
+#define EFI_VA_START ( -4 * (_AC(1, UL) << 30))
+#define EFI_VA_END (-68 * (_AC(1, UL) << 30))
#define EARLY_DYNAMIC_PAGE_TABLES 64
#define CR3_ADDR_MASK __sme_clr(0x7FFFFFFFFFFFF000ull)
#define CR3_PCID_MASK 0xFFFull
#define CR3_NOFLUSH BIT_ULL(63)
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define X86_CR3_PTI_SWITCH_BIT 11
+#endif
+
#else
/*
* CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
extern struct cpuinfo_x86 boot_cpu_data;
extern struct cpuinfo_x86 new_cpu_data;
-extern struct tss_struct doublefault_tss;
-extern __u32 cpu_caps_cleared[NCAPINTS];
-extern __u32 cpu_caps_set[NCAPINTS];
+extern struct x86_hw_tss doublefault_tss;
+extern __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
+extern __u32 cpu_caps_set[NCAPINTS + NBUGINTS];
#ifdef CONFIG_SMP
DECLARE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
write_cr3(__sme_pa(pgdir));
}
+/*
+ * Note that while the legacy 'TSS' name comes from 'Task State Segment',
+ * on modern x86 CPUs the TSS also holds information important to 64-bit mode,
+ * unrelated to the task-switch mechanism:
+ */
#ifdef CONFIG_X86_32
/* This is the TSS defined by the hardware. */
struct x86_hw_tss {
struct x86_hw_tss {
u32 reserved1;
u64 sp0;
+
+ /*
+ * We store cpu_current_top_of_stack in sp1 so it's always accessible.
+ * Linux does not use ring 1, so sp1 is not otherwise needed.
+ */
u64 sp1;
+
u64 sp2;
u64 reserved2;
u64 ist[7];
#define IO_BITMAP_BITS 65536
#define IO_BITMAP_BYTES (IO_BITMAP_BITS/8)
#define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long))
-#define IO_BITMAP_OFFSET offsetof(struct tss_struct, io_bitmap)
+#define IO_BITMAP_OFFSET (offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss))
#define INVALID_IO_BITMAP_OFFSET 0x8000
+struct entry_stack {
+ unsigned long words[64];
+};
+
+struct entry_stack_page {
+ struct entry_stack stack;
+} __aligned(PAGE_SIZE);
+
struct tss_struct {
/*
- * The hardware state:
+ * The fixed hardware portion. This must not cross a page boundary
+ * at risk of violating the SDM's advice and potentially triggering
+ * errata.
*/
struct x86_hw_tss x86_tss;
* be within the limit.
*/
unsigned long io_bitmap[IO_BITMAP_LONGS + 1];
+} __aligned(PAGE_SIZE);
-#ifdef CONFIG_X86_32
- /*
- * Space for the temporary SYSENTER stack.
- */
- unsigned long SYSENTER_stack_canary;
- unsigned long SYSENTER_stack[64];
-#endif
-
-} ____cacheline_aligned;
-
-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss);
+DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw);
/*
* sizeof(unsigned long) coming from an extra "long" at the end
#ifdef CONFIG_X86_32
DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);
+#else
+/* The RO copy can't be accessed with this_cpu_xyz(), so use the RW copy. */
+#define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1
#endif
/*
static inline void
native_load_sp0(unsigned long sp0)
{
- this_cpu_write(cpu_tss.x86_tss.sp0, sp0);
+ this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0);
}
static inline void native_swapgs(void)
static inline unsigned long current_top_of_stack(void)
{
-#ifdef CONFIG_X86_64
- return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
-#else
- /* sp0 on x86_32 is special in and around vm86 mode. */
+ /*
+ * We can't read directly from tss.sp0: sp0 on x86_32 is special in
+ * and around vm86 mode and sp0 on x86_64 is special because of the
+ * entry trampoline.
+ */
return this_cpu_read_stable(cpu_current_top_of_stack);
-#endif
}
static inline bool on_thread_stack(void)
#else
/*
- * User space process size. 47bits minus one guard page. The guard
- * page is necessary on Intel CPUs: if a SYSCALL instruction is at
- * the highest possible canonical userspace address, then that
- * syscall will enter the kernel with a non-canonical return
- * address, and SYSRET will explode dangerously. We avoid this
- * particular problem by preventing anything from being mapped
- * at the maximum canonical address.
+ * User space process size. This is the first address outside the user range.
+ * There are a few constraints that determine this:
+ *
+ * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
+ * address, then that syscall will enter the kernel with a
+ * non-canonical return address, and SYSRET will explode dangerously.
+ * We avoid this particular problem by preventing anything executable
+ * from being mapped at the maximum canonical address.
+ *
+ * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
+ * CPUs malfunction if they execute code from the highest canonical page.
+ * They'll speculate right off the end of the canonical space, and
+ * bad things happen. This is worked around in the same way as the
+ * Intel problem.
+ *
+ * With page table isolation enabled, we map the LDT in ... [stay tuned]
*/
#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _ASM_X86_PTI_H
+#define _ASM_X86_PTI_H
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+extern void pti_init(void);
+extern void pti_check_boottime_disable(void);
+#else
+static inline void pti_check_boottime_disable(void) { }
+#endif
+
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_PTI_H */
STACK_TYPE_TASK,
STACK_TYPE_IRQ,
STACK_TYPE_SOFTIRQ,
+ STACK_TYPE_ENTRY,
STACK_TYPE_EXCEPTION,
STACK_TYPE_EXCEPTION_LAST = STACK_TYPE_EXCEPTION + N_EXCEPTION_STACKS-1,
};
bool in_task_stack(unsigned long *stack, struct task_struct *task,
struct stack_info *info);
+bool in_entry_stack(unsigned long *stack, struct stack_info *info);
+
int get_stack_info(unsigned long *stack, struct task_struct *task,
struct stack_info *info, unsigned long *visit_mask);
/* image of the saved processor state */
struct saved_context {
- u16 es, fs, gs, ss;
+ /*
+ * On x86_32, all segment registers, with the possible exception of
+ * gs, are saved at kernel entry in pt_regs.
+ */
+#ifdef CONFIG_X86_32_LAZY_GS
+ u16 gs;
+#endif
unsigned long cr0, cr2, cr3, cr4;
u64 misc_enable;
bool misc_enable_saved;
*/
struct saved_context {
struct pt_regs regs;
- u16 ds, es, fs, gs, ss;
- unsigned long gs_base, gs_kernel_base, fs_base;
+
+ /*
+ * User CS and SS are saved in current_pt_regs(). The rest of the
+ * segment selectors need to be saved and restored here.
+ */
+ u16 ds, es, fs, gs;
+
+ /*
+ * Usermode FSBASE and GSBASE may not match the fs and gs selectors,
+ * so we save them separately. We save the kernelmode GSBASE to
+ * restore percpu access after resume.
+ */
+ unsigned long kernelmode_gs_base, usermode_gs_base, fs_base;
+
unsigned long cr0, cr2, cr3, cr4, cr8;
u64 misc_enable;
bool misc_enable_saved;
u16 gdt_pad; /* Unused */
struct desc_ptr gdt_desc;
u16 idt_pad;
- u16 idt_limit;
- unsigned long idt_base;
+ struct desc_ptr idt;
u16 ldt;
u16 tss;
unsigned long tr;
struct tss_struct *tss);
/* This runs runs on the previous thread's stack. */
-static inline void prepare_switch_to(struct task_struct *prev,
- struct task_struct *next)
+static inline void prepare_switch_to(struct task_struct *next)
{
#ifdef CONFIG_VMAP_STACK
/*
#define switch_to(prev, next, last) \
do { \
- prepare_switch_to(prev, next); \
+ prepare_switch_to(next); \
\
((last) = __switch_to_asm((prev), (next))); \
} while (0)
static inline void refresh_sysenter_cs(struct thread_struct *thread)
{
/* Only happens when SEP is enabled, no need to test "SEP"arately: */
- if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs))
+ if (unlikely(this_cpu_read(cpu_tss_rw.x86_tss.ss1) == thread->sysenter_cs))
return;
- this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs);
+ this_cpu_write(cpu_tss_rw.x86_tss.ss1, thread->sysenter_cs);
wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
}
#endif
/* This is used when switching tasks or entering/exiting vm86 mode. */
static inline void update_sp0(struct task_struct *task)
{
+ /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */
#ifdef CONFIG_X86_32
load_sp0(task->thread.sp0);
#else
- load_sp0(task_top_of_stack(task));
+ if (static_cpu_has(X86_FEATURE_XENPV))
+ load_sp0(task_top_of_stack(task));
#endif
}
#else /* !__ASSEMBLY__ */
#ifdef CONFIG_X86_64
-# define cpu_current_top_of_stack (cpu_tss + TSS_sp0)
+# define cpu_current_top_of_stack (cpu_tss_rw + TSS_sp1)
#endif
#endif
#include <asm/cpufeature.h>
#include <asm/special_insns.h>
#include <asm/smp.h>
+#include <asm/invpcid.h>
+#include <asm/pti.h>
+#include <asm/processor-flags.h>
-static inline void __invpcid(unsigned long pcid, unsigned long addr,
- unsigned long type)
-{
- struct { u64 d[2]; } desc = { { pcid, addr } };
+/*
+ * The x86 feature is called PCID (Process Context IDentifier). It is similar
+ * to what is traditionally called ASID on the RISC processors.
+ *
+ * We don't use the traditional ASID implementation, where each process/mm gets
+ * its own ASID and flush/restart when we run out of ASID space.
+ *
+ * Instead we have a small per-cpu array of ASIDs and cache the last few mm's
+ * that came by on this CPU, allowing cheaper switch_mm between processes on
+ * this CPU.
+ *
+ * We end up with different spaces for different things. To avoid confusion we
+ * use different names for each of them:
+ *
+ * ASID - [0, TLB_NR_DYN_ASIDS-1]
+ * the canonical identifier for an mm
+ *
+ * kPCID - [1, TLB_NR_DYN_ASIDS]
+ * the value we write into the PCID part of CR3; corresponds to the
+ * ASID+1, because PCID 0 is special.
+ *
+ * uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS]
+ * for KPTI each mm has two address spaces and thus needs two
+ * PCID values, but we can still do with a single ASID denomination
+ * for each mm. Corresponds to kPCID + 2048.
+ *
+ */
- /*
- * The memory clobber is because the whole point is to invalidate
- * stale TLB entries and, especially if we're flushing global
- * mappings, we don't want the compiler to reorder any subsequent
- * memory accesses before the TLB flush.
- *
- * The hex opcode is invpcid (%ecx), %eax in 32-bit mode and
- * invpcid (%rcx), %rax in long mode.
- */
- asm volatile (".byte 0x66, 0x0f, 0x38, 0x82, 0x01"
- : : "m" (desc), "a" (type), "c" (&desc) : "memory");
-}
+/* There are 12 bits of space for ASIDS in CR3 */
+#define CR3_HW_ASID_BITS 12
-#define INVPCID_TYPE_INDIV_ADDR 0
-#define INVPCID_TYPE_SINGLE_CTXT 1
-#define INVPCID_TYPE_ALL_INCL_GLOBAL 2
-#define INVPCID_TYPE_ALL_NON_GLOBAL 3
+/*
+ * When enabled, PAGE_TABLE_ISOLATION consumes a single bit for
+ * user/kernel switches
+ */
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define PTI_CONSUMED_PCID_BITS 1
+#else
+# define PTI_CONSUMED_PCID_BITS 0
+#endif
-/* Flush all mappings for a given pcid and addr, not including globals. */
-static inline void invpcid_flush_one(unsigned long pcid,
- unsigned long addr)
-{
- __invpcid(pcid, addr, INVPCID_TYPE_INDIV_ADDR);
-}
+#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS)
+
+/*
+ * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account
+ * for them being zero-based. Another -1 is because PCID 0 is reserved for
+ * use by non-PCID-aware users.
+ */
+#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_PCID_BITS) - 2)
+
+/*
+ * 6 because 6 should be plenty and struct tlb_state will fit in two cache
+ * lines.
+ */
+#define TLB_NR_DYN_ASIDS 6
-/* Flush all mappings for a given PCID, not including globals. */
-static inline void invpcid_flush_single_context(unsigned long pcid)
+/*
+ * Given @asid, compute kPCID
+ */
+static inline u16 kern_pcid(u16 asid)
{
- __invpcid(pcid, 0, INVPCID_TYPE_SINGLE_CTXT);
+ VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ /*
+ * Make sure that the dynamic ASID space does not confict with the
+ * bit we are using to switch between user and kernel ASIDs.
+ */
+ BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_SWITCH_BIT));
+
+ /*
+ * The ASID being passed in here should have respected the
+ * MAX_ASID_AVAILABLE and thus never have the switch bit set.
+ */
+ VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_SWITCH_BIT));
+#endif
+ /*
+ * The dynamically-assigned ASIDs that get passed in are small
+ * (<TLB_NR_DYN_ASIDS). They never have the high switch bit set,
+ * so do not bother to clear it.
+ *
+ * If PCID is on, ASID-aware code paths put the ASID+1 into the
+ * PCID bits. This serves two purposes. It prevents a nasty
+ * situation in which PCID-unaware code saves CR3, loads some other
+ * value (with PCID == 0), and then restores CR3, thus corrupting
+ * the TLB for ASID 0 if the saved ASID was nonzero. It also means
+ * that any bugs involving loading a PCID-enabled CR3 with
+ * CR4.PCIDE off will trigger deterministically.
+ */
+ return asid + 1;
}
-/* Flush all mappings, including globals, for all PCIDs. */
-static inline void invpcid_flush_all(void)
+/*
+ * Given @asid, compute uPCID
+ */
+static inline u16 user_pcid(u16 asid)
{
- __invpcid(0, 0, INVPCID_TYPE_ALL_INCL_GLOBAL);
+ u16 ret = kern_pcid(asid);
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ ret |= 1 << X86_CR3_PTI_SWITCH_BIT;
+#endif
+ return ret;
}
-/* Flush all mappings for all PCIDs except globals. */
-static inline void invpcid_flush_all_nonglobals(void)
+struct pgd_t;
+static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
{
- __invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL);
+ if (static_cpu_has(X86_FEATURE_PCID)) {
+ return __sme_pa(pgd) | kern_pcid(asid);
+ } else {
+ VM_WARN_ON_ONCE(asid != 0);
+ return __sme_pa(pgd);
+ }
}
-static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
+static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
{
- u64 new_tlb_gen;
-
- /*
- * Bump the generation count. This also serves as a full barrier
- * that synchronizes with switch_mm(): callers are required to order
- * their read of mm_cpumask after their writes to the paging
- * structures.
- */
- smp_mb__before_atomic();
- new_tlb_gen = atomic64_inc_return(&mm->context.tlb_gen);
- smp_mb__after_atomic();
-
- return new_tlb_gen;
+ VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+ VM_WARN_ON_ONCE(!this_cpu_has(X86_FEATURE_PCID));
+ return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
}
#ifdef CONFIG_PARAVIRT
return !static_cpu_has(X86_FEATURE_PCID);
}
-/*
- * 6 because 6 should be plenty and struct tlb_state will fit in
- * two cache lines.
- */
-#define TLB_NR_DYN_ASIDS 6
-
struct tlb_context {
u64 ctx_id;
u64 tlb_gen;
*/
bool is_lazy;
+ /*
+ * If set we changed the page tables in such a way that we
+ * needed an invalidation of all contexts (aka. PCIDs / ASIDs).
+ * This tells us to go invalidate all the non-loaded ctxs[]
+ * on the next context switch.
+ *
+ * The current ctx was kept up-to-date as it ran and does not
+ * need to be invalidated.
+ */
+ bool invalidate_other;
+
+ /*
+ * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate
+ * the corresponding user PCID needs a flush next time we
+ * switch to it; see SWITCH_TO_USER_CR3.
+ */
+ unsigned short user_pcid_flush_mask;
+
/*
* Access to this CR4 shadow and to H/W CR4 is protected by
* disabling interrupts when modifying either one.
return this_cpu_read(cpu_tlbstate.cr4);
}
+/*
+ * Mark all other ASIDs as invalid, preserves the current.
+ */
+static inline void invalidate_other_asid(void)
+{
+ this_cpu_write(cpu_tlbstate.invalidate_other, true);
+}
+
/*
* Save some of cr4 feature set we're using (e.g. Pentium 4MB
* enable and PPro Global page enable), so that any CPU's that boot
extern void initialize_tlbstate_and_flush(void);
-static inline void __native_flush_tlb(void)
+/*
+ * Given an ASID, flush the corresponding user ASID. We can delay this
+ * until the next time we switch to it.
+ *
+ * See SWITCH_TO_USER_CR3.
+ */
+static inline void invalidate_user_asid(u16 asid)
{
+ /* There is no user ASID if address space separation is off */
+ if (!IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
+ return;
+
/*
- * If current->mm == NULL then we borrow a mm which may change during a
- * task switch and therefore we must not be preempted while we write CR3
- * back:
+ * We only have a single ASID if PCID is off and the CR3
+ * write will have flushed it.
*/
- preempt_disable();
- native_write_cr3(__native_read_cr3());
- preempt_enable();
+ if (!cpu_feature_enabled(X86_FEATURE_PCID))
+ return;
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ __set_bit(kern_pcid(asid),
+ (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
}
-static inline void __native_flush_tlb_global_irq_disabled(void)
+/*
+ * flush the entire current user mapping
+ */
+static inline void __native_flush_tlb(void)
{
- unsigned long cr4;
+ /*
+ * Preemption or interrupts must be disabled to protect the access
+ * to the per CPU variable and to prevent being preempted between
+ * read_cr3() and write_cr3().
+ */
+ WARN_ON_ONCE(preemptible());
- cr4 = this_cpu_read(cpu_tlbstate.cr4);
- /* clear PGE */
- native_write_cr4(cr4 & ~X86_CR4_PGE);
- /* write old PGE again and flush TLBs */
- native_write_cr4(cr4);
+ invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+
+ /* If current->mm == NULL then the read_cr3() "borrows" an mm */
+ native_write_cr3(__native_read_cr3());
}
+/*
+ * flush everything
+ */
static inline void __native_flush_tlb_global(void)
{
- unsigned long flags;
+ unsigned long cr4, flags;
if (static_cpu_has(X86_FEATURE_INVPCID)) {
/*
* Using INVPCID is considerably faster than a pair of writes
* to CR4 sandwiched inside an IRQ flag save/restore.
+ *
+ * Note, this works with CR4.PCIDE=0 or 1.
*/
invpcid_flush_all();
return;
*/
raw_local_irq_save(flags);
- __native_flush_tlb_global_irq_disabled();
+ cr4 = this_cpu_read(cpu_tlbstate.cr4);
+ /* toggle PGE */
+ native_write_cr4(cr4 ^ X86_CR4_PGE);
+ /* write old PGE again and flush TLBs */
+ native_write_cr4(cr4);
raw_local_irq_restore(flags);
}
+/*
+ * flush one page in the user mapping
+ */
static inline void __native_flush_tlb_single(unsigned long addr)
{
+ u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ /*
+ * Some platforms #GP if we call invpcid(type=1/2) before CR4.PCIDE=1.
+ * Just use invalidate_user_asid() in case we are called early.
+ */
+ if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE))
+ invalidate_user_asid(loaded_mm_asid);
+ else
+ invpcid_flush_one(user_pcid(loaded_mm_asid), addr);
}
+/*
+ * flush everything
+ */
static inline void __flush_tlb_all(void)
{
- if (boot_cpu_has(X86_FEATURE_PGE))
+ if (boot_cpu_has(X86_FEATURE_PGE)) {
__flush_tlb_global();
- else
+ } else {
+ /*
+ * !PGE -> !PCID (setup_pcid()), thus every flush is total.
+ */
__flush_tlb();
-
- /*
- * Note: if we somehow had PCID but not PGE, then this wouldn't work --
- * we'd end up flushing kernel translations for the current ASID but
- * we might fail to flush kernel translations for other cached ASIDs.
- *
- * To avoid this issue, we force PCID off if PGE is off.
- */
+ }
}
+/*
+ * flush one page in the kernel mapping
+ */
static inline void __flush_tlb_one(unsigned long addr)
{
count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE);
__flush_tlb_single(addr);
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ /*
+ * __flush_tlb_single() will have cleared the TLB entry for this ASID,
+ * but since kernel space is replicated across all, we must also
+ * invalidate all others.
+ */
+ invalidate_other_asid();
}
#define TLB_FLUSH_ALL -1UL
void native_flush_tlb_others(const struct cpumask *cpumask,
const struct flush_tlb_info *info);
+static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
+{
+ /*
+ * Bump the generation count. This also serves as a full barrier
+ * that synchronizes with switch_mm(): callers are required to order
+ * their read of mm_cpumask after their writes to the paging
+ * structures.
+ */
+ return atomic64_inc_return(&mm->context.tlb_gen);
+}
+
static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
struct mm_struct *mm)
{
DECLARE_EVENT_CLASS(vector_activate,
TP_PROTO(unsigned int irq, bool is_managed, bool can_reserve,
- bool early),
+ bool reserve),
- TP_ARGS(irq, is_managed, can_reserve, early),
+ TP_ARGS(irq, is_managed, can_reserve, reserve),
TP_STRUCT__entry(
__field( unsigned int, irq )
__field( bool, is_managed )
__field( bool, can_reserve )
- __field( bool, early )
+ __field( bool, reserve )
),
TP_fast_assign(
__entry->irq = irq;
__entry->is_managed = is_managed;
__entry->can_reserve = can_reserve;
- __entry->early = early;
+ __entry->reserve = reserve;
),
- TP_printk("irq=%u is_managed=%d can_reserve=%d early=%d",
+ TP_printk("irq=%u is_managed=%d can_reserve=%d reserve=%d",
__entry->irq, __entry->is_managed, __entry->can_reserve,
- __entry->early)
+ __entry->reserve)
);
#define DEFINE_IRQ_VECTOR_ACTIVATE_EVENT(name) \
DEFINE_EVENT_FN(vector_activate, name, \
TP_PROTO(unsigned int irq, bool is_managed, \
- bool can_reserve, bool early), \
- TP_ARGS(irq, is_managed, can_reserve, early), NULL, NULL); \
+ bool can_reserve, bool reserve), \
+ TP_ARGS(irq, is_managed, can_reserve, reserve), NULL, NULL); \
DEFINE_IRQ_VECTOR_ACTIVATE_EVENT(vector_activate);
DEFINE_IRQ_VECTOR_ACTIVATE_EVENT(vector_deactivate);
dotraplinkage void do_stack_segment(struct pt_regs *, long);
#ifdef CONFIG_X86_64
dotraplinkage void do_double_fault(struct pt_regs *, long);
-asmlinkage struct pt_regs *sync_regs(struct pt_regs *);
#endif
dotraplinkage void do_general_protection(struct pt_regs *, long);
dotraplinkage void do_page_fault(struct pt_regs *, unsigned long);
#include <asm/ptrace.h>
#include <asm/stacktrace.h>
+#define IRET_FRAME_OFFSET (offsetof(struct pt_regs, ip))
+#define IRET_FRAME_SIZE (sizeof(struct pt_regs) - IRET_FRAME_OFFSET)
+
struct unwind_state {
struct stack_info stack_info;
unsigned long stack_mask;
}
#if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
-static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+/*
+ * If 'partial' returns true, only the iret frame registers are valid.
+ */
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state,
+ bool *partial)
{
if (unwind_done(state))
return NULL;
+ if (partial) {
+#ifdef CONFIG_UNWINDER_ORC
+ *partial = !state->full_regs;
+#else
+ *partial = false;
+#endif
+ }
+
return state->regs;
}
#else
-static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state,
+ bool *partial)
{
return NULL;
}
#ifdef CONFIG_X86_VSYSCALL_EMULATION
extern void map_vsyscall(void);
+extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
/*
* Called on instruction fetch fault in vsyscall page.
#define X86_CR3_PWT _BITUL(X86_CR3_PWT_BIT)
#define X86_CR3_PCD_BIT 4 /* Page Cache Disable */
#define X86_CR3_PCD _BITUL(X86_CR3_PCD_BIT)
-#define X86_CR3_PCID_MASK _AC(0x00000fff,UL) /* PCID Mask */
+
+#define X86_CR3_PCID_BITS 12
+#define X86_CR3_PCID_MASK (_AC((1UL << X86_CR3_PCID_BITS) - 1, UL))
+
+#define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
+#define X86_CR3_PCID_NOFLUSH _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
/*
* Intel CPU features in CR4
apic_verbosity = APIC_DEBUG;
else if (strcmp("verbose", arg) == 0)
apic_verbosity = APIC_VERBOSE;
+#ifdef CONFIG_X86_64
else {
pr_warning("APIC Verbosity level %s not recognised"
" use apic=verbose or apic=debug\n", arg);
return -EINVAL;
}
+#endif
return 0;
}
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = flat_apic_id_registered,
- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
.irq_dest_mode = 1, /* logical */
.disable_esr = 0,
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = noop_apic_id_registered,
- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
/* logical delivery broadcast to all CPUs: */
.irq_dest_mode = 1,
}
int mp_irqdomain_activate(struct irq_domain *domain,
- struct irq_data *irq_data, bool early)
+ struct irq_data *irq_data, bool reserve)
{
unsigned long flags;
((apic->irq_dest_mode == 0) ?
MSI_ADDR_DEST_MODE_PHYSICAL :
MSI_ADDR_DEST_MODE_LOGICAL) |
- ((apic->irq_delivery_mode != dest_LowestPrio) ?
- MSI_ADDR_REDIRECTION_CPU :
- MSI_ADDR_REDIRECTION_LOWPRI) |
+ MSI_ADDR_REDIRECTION_CPU |
MSI_ADDR_DEST_ID(cfg->dest_apicid);
msg->data =
MSI_DATA_TRIGGER_EDGE |
MSI_DATA_LEVEL_ASSERT |
- ((apic->irq_delivery_mode != dest_LowestPrio) ?
- MSI_DATA_DELIVERY_FIXED :
- MSI_DATA_DELIVERY_LOWPRI) |
+ MSI_DATA_DELIVERY_FIXED |
MSI_DATA_VECTOR(cfg->vector);
}
.apic_id_valid = default_apic_id_valid,
.apic_id_registered = default_apic_id_registered,
- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
/* logical delivery broadcast to all CPUs: */
.irq_dest_mode = 1,
irq_matrix_reserve(vector_matrix);
apicd->can_reserve = true;
apicd->has_reserved = true;
+ irqd_set_can_reserve(irqd);
trace_vector_reserve(irqd->irq, 0);
vector_assign_managed_shutdown(irqd);
}
int ret;
ret = assign_irq_vector_any_locked(irqd);
- if (!ret)
+ if (!ret) {
apicd->has_reserved = false;
+ /*
+ * Core might have disabled reservation mode after
+ * allocating the irq descriptor. Ideally this should
+ * happen before allocation time, but that would require
+ * completely convoluted ways of transporting that
+ * information.
+ */
+ if (!irqd_can_reserve(irqd))
+ apicd->can_reserve = false;
+ }
return ret;
}
}
static int x86_vector_activate(struct irq_domain *dom, struct irq_data *irqd,
- bool early)
+ bool reserve)
{
struct apic_chip_data *apicd = apic_chip_data(irqd);
unsigned long flags;
int ret = 0;
trace_vector_activate(irqd->irq, apicd->is_managed,
- apicd->can_reserve, early);
+ apicd->can_reserve, reserve);
/* Nothing to do for fixed assigned vectors */
if (!apicd->can_reserve && !apicd->is_managed)
return 0;
raw_spin_lock_irqsave(&vector_lock, flags);
- if (early || irqd_is_managed_and_shutdown(irqd))
+ if (reserve || irqd_is_managed_and_shutdown(irqd))
vector_assign_managed_shutdown(irqd);
else if (apicd->is_managed)
ret = activate_managed(irqd);
} else {
/* Release the vector */
apicd->can_reserve = true;
+ irqd_set_can_reserve(irqd);
clear_irq_vector(irqd);
realloc = true;
}
.apic_id_valid = x2apic_apic_id_valid,
.apic_id_registered = x2apic_apic_id_registered,
- .irq_delivery_mode = dest_LowestPrio,
+ .irq_delivery_mode = dest_Fixed,
.irq_dest_mode = 1, /* logical */
.disable_esr = 0,
#include <asm/sigframe.h>
#include <asm/bootparam.h>
#include <asm/suspend.h>
+#include <asm/tlbflush.h>
#ifdef CONFIG_XEN
#include <xen/interface/xen.h>
BLANK();
DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));
+
+ /* TLB state for the entry code */
+ OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask);
+
+ /* Layout info for cpu_entry_area */
+ OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss);
+ OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
+ OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
+ DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
}
BLANK();
/* Offset from the sysenter stack to tss.sp0 */
- DEFINE(TSS_sysenter_sp0, offsetof(struct tss_struct, x86_tss.sp0) -
- offsetofend(struct tss_struct, SYSENTER_stack));
-
- /* Offset from cpu_tss to SYSENTER_stack */
- OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack);
- /* Size of SYSENTER_stack */
- DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
+ DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+ offsetofend(struct cpu_entry_area, entry_stack_page.stack));
#ifdef CONFIG_CC_STACKPROTECTOR
BLANK();
#ifdef CONFIG_PARAVIRT
OFFSET(PV_CPU_usergs_sysret64, pv_cpu_ops, usergs_sysret64);
OFFSET(PV_CPU_swapgs, pv_cpu_ops, swapgs);
+#ifdef CONFIG_DEBUG_ENTRY
+ OFFSET(PV_IRQ_save_fl, pv_irq_ops, save_fl);
+#endif
BLANK();
#endif
OFFSET(TSS_ist, tss_struct, x86_tss.ist);
OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
+ OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
BLANK();
#ifdef CONFIG_CC_STACKPROTECTOR
return NULL; /* Not found */
}
-__u32 cpu_caps_cleared[NCAPINTS];
-__u32 cpu_caps_set[NCAPINTS];
+__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
+__u32 cpu_caps_set[NCAPINTS + NBUGINTS];
void load_percpu_segment(int cpu)
{
load_stack_canary_segment();
}
-/* Setup the fixmap mapping only once per-processor */
-static inline void setup_fixmap_gdt(int cpu)
-{
-#ifdef CONFIG_X86_64
- /* On 64-bit systems, we use a read-only fixmap GDT. */
- pgprot_t prot = PAGE_KERNEL_RO;
-#else
- /*
- * On native 32-bit systems, the GDT cannot be read-only because
- * our double fault handler uses a task gate, and entering through
- * a task gate needs to change an available TSS to busy. If the GDT
- * is read-only, that will triple fault.
- *
- * On Xen PV, the GDT must be read-only because the hypervisor requires
- * it.
- */
- pgprot_t prot = boot_cpu_has(X86_FEATURE_XENPV) ?
- PAGE_KERNEL_RO : PAGE_KERNEL;
+#ifdef CONFIG_X86_32
+/* The 32-bit entry code needs to find cpu_entry_area. */
+DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
#endif
- __set_fixmap(get_cpu_gdt_ro_index(cpu), get_cpu_gdt_paddr(cpu), prot);
-}
+#ifdef CONFIG_X86_64
+/*
+ * Special IST stacks which the CPU switches to when it calls
+ * an IST-marked descriptor entry. Up to 7 stacks (hardware
+ * limit), all of them are 4K, except the debug stack which
+ * is 8K.
+ */
+static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
+ [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ,
+ [DEBUG_STACK - 1] = DEBUG_STKSZ
+};
+#endif
/* Load the original GDT from the per-cpu structure */
void load_direct_gdt(int cpu)
{
int i;
- for (i = 0; i < NCAPINTS; i++) {
+ for (i = 0; i < NCAPINTS + NBUGINTS; i++) {
c->x86_capability[i] &= ~cpu_caps_cleared[i];
c->x86_capability[i] |= cpu_caps_set[i];
}
}
setup_force_cpu_cap(X86_FEATURE_ALWAYS);
+
+ if (c->x86_vendor != X86_VENDOR_AMD)
+ setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);
+
fpu__init_system(c);
#ifdef CONFIG_X86_32
return;
cpu = get_cpu();
- tss = &per_cpu(cpu_tss, cpu);
+ tss = &per_cpu(cpu_tss_rw, cpu);
/*
* We cache MSR_IA32_SYSENTER_CS's value in the TSS's ss1 field --
tss->x86_tss.ss1 = __KERNEL_CS;
wrmsr(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1, 0);
-
- wrmsr(MSR_IA32_SYSENTER_ESP,
- (unsigned long)tss + offsetofend(struct tss_struct, SYSENTER_stack),
- 0);
-
+ wrmsr(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_entry_stack(cpu) + 1), 0);
wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32, 0);
put_cpu();
DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
EXPORT_PER_CPU_SYMBOL(__preempt_count);
-/*
- * Special IST stacks which the CPU switches to when it calls
- * an IST-marked descriptor entry. Up to 7 stacks (hardware
- * limit), all of them are 4K, except the debug stack which
- * is 8K.
- */
-static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
- [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ,
- [DEBUG_STACK - 1] = DEBUG_STKSZ
-};
-
-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
- [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
-
/* May not be marked __init: used by software suspend */
void syscall_init(void)
{
+ extern char _entry_trampoline[];
+ extern char entry_SYSCALL_64_trampoline[];
+
+ int cpu = smp_processor_id();
+ unsigned long SYSCALL64_entry_trampoline =
+ (unsigned long)get_cpu_entry_area(cpu)->entry_trampoline +
+ (entry_SYSCALL_64_trampoline - _entry_trampoline);
+
wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
- wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
+ if (static_cpu_has(X86_FEATURE_PTI))
+ wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline);
+ else
+ wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
#ifdef CONFIG_IA32_EMULATION
wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
* AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit).
*/
wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
- wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
+ wrmsrl_safe(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_entry_stack(cpu) + 1));
wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
#else
wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret);
if (cpu)
load_ucode_ap();
- t = &per_cpu(cpu_tss, cpu);
+ t = &per_cpu(cpu_tss_rw, cpu);
oist = &per_cpu(orig_ist, cpu);
#ifdef CONFIG_NUMA
* set up and load the per-CPU TSS
*/
if (!oist->ist[0]) {
- char *estacks = per_cpu(exception_stacks, cpu);
+ char *estacks = get_cpu_entry_area(cpu)->exception_stacks;
for (v = 0; v < N_EXCEPTION_STACKS; v++) {
estacks += exception_stack_sizes[v];
}
}
- t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
+ t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
/*
* <= is required because the CPU will access up to
enter_lazy_tlb(&init_mm, me);
/*
- * Initialize the TSS. Don't bother initializing sp0, as the initial
- * task never enters user mode.
+ * Initialize the TSS. sp0 points to the entry trampoline stack
+ * regardless of what task is running.
*/
- set_tss_desc(cpu, t);
+ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
load_TR_desc();
+ load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1));
load_mm_ldt(&init_mm);
if (is_uv_system())
uv_cpu_init();
- setup_fixmap_gdt(cpu);
load_fixmap_gdt(cpu);
}
{
int cpu = smp_processor_id();
struct task_struct *curr = current;
- struct tss_struct *t = &per_cpu(cpu_tss, cpu);
+ struct tss_struct *t = &per_cpu(cpu_tss_rw, cpu);
wait_for_master_cpu(cpu);
* Initialize the TSS. Don't bother initializing sp0, as the initial
* task never enters user mode.
*/
- set_tss_desc(cpu, t);
+ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
load_TR_desc();
load_mm_ldt(&init_mm);
- t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
+ t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
#ifdef CONFIG_DOUBLEFAULT
/* Set up doublefault TSS pointer in the GDT */
fpu__init_cpu();
- setup_fixmap_gdt(cpu);
load_fixmap_gdt(cpu);
}
#endif
}
#else
-/*
- * Flush global tlb. We only do this in x86_64 where paging has been enabled
- * already and PGE should be enabled as well.
- */
-static inline void flush_tlb_early(void)
-{
- __native_flush_tlb_global_irq_disabled();
-}
-
static inline void print_ucode(struct ucode_cpu_info *uci)
{
struct microcode_intel *mc;
if (rev != mc->hdr.rev)
return -1;
-#ifdef CONFIG_X86_64
- /* Flush global tlb. This is precaution. */
- flush_tlb_early();
-#endif
uci->cpu_sig.rev = rev;
if (early)
cpu_relax();
}
-struct tss_struct doublefault_tss __cacheline_aligned = {
- .x86_tss = {
- .sp0 = STACK_START,
- .ss0 = __KERNEL_DS,
- .ldt = 0,
- .io_bitmap_base = INVALID_IO_BITMAP_OFFSET,
-
- .ip = (unsigned long) doublefault_fn,
- /* 0x2 bit is always set */
- .flags = X86_EFLAGS_SF | 0x2,
- .sp = STACK_START,
- .es = __USER_DS,
- .cs = __KERNEL_CS,
- .ss = __KERNEL_DS,
- .ds = __USER_DS,
- .fs = __KERNEL_PERCPU,
-
- .__cr3 = __pa_nodebug(swapper_pg_dir),
- }
+struct x86_hw_tss doublefault_tss __cacheline_aligned = {
+ .sp0 = STACK_START,
+ .ss0 = __KERNEL_DS,
+ .ldt = 0,
+ .io_bitmap_base = INVALID_IO_BITMAP_OFFSET,
+
+ .ip = (unsigned long) doublefault_fn,
+ /* 0x2 bit is always set */
+ .flags = X86_EFLAGS_SF | 0x2,
+ .sp = STACK_START,
+ .es = __USER_DS,
+ .cs = __KERNEL_CS,
+ .ss = __KERNEL_DS,
+ .ds = __USER_DS,
+ .fs = __KERNEL_PERCPU,
+
+ .__cr3 = __pa_nodebug(swapper_pg_dir),
};
/* dummy for do_double_fault() call */
#include <linux/nmi.h>
#include <linux/sysfs.h>
+#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
#include <asm/unwind.h>
return true;
}
+bool in_entry_stack(unsigned long *stack, struct stack_info *info)
+{
+ struct entry_stack *ss = cpu_entry_stack(smp_processor_id());
+
+ void *begin = ss;
+ void *end = ss + 1;
+
+ if ((void *)stack < begin || (void *)stack >= end)
+ return false;
+
+ info->type = STACK_TYPE_ENTRY;
+ info->begin = begin;
+ info->end = end;
+ info->next_sp = NULL;
+
+ return true;
+}
+
static void printk_stack_address(unsigned long address, int reliable,
char *log_lvl)
{
printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
}
+void show_iret_regs(struct pt_regs *regs)
+{
+ printk(KERN_DEFAULT "RIP: %04x:%pS\n", (int)regs->cs, (void *)regs->ip);
+ printk(KERN_DEFAULT "RSP: %04x:%016lx EFLAGS: %08lx", (int)regs->ss,
+ regs->sp, regs->flags);
+}
+
+static void show_regs_if_on_stack(struct stack_info *info, struct pt_regs *regs,
+ bool partial)
+{
+ /*
+ * These on_stack() checks aren't strictly necessary: the unwind code
+ * has already validated the 'regs' pointer. The checks are done for
+ * ordering reasons: if the registers are on the next stack, we don't
+ * want to print them out yet. Otherwise they'll be shown as part of
+ * the wrong stack. Later, when show_trace_log_lvl() switches to the
+ * next stack, this function will be called again with the same regs so
+ * they can be printed in the right context.
+ */
+ if (!partial && on_stack(info, regs, sizeof(*regs))) {
+ __show_regs(regs, 0);
+
+ } else if (partial && on_stack(info, (void *)regs + IRET_FRAME_OFFSET,
+ IRET_FRAME_SIZE)) {
+ /*
+ * When an interrupt or exception occurs in entry code, the
+ * full pt_regs might not have been saved yet. In that case
+ * just print the iret frame.
+ */
+ show_iret_regs(regs);
+ }
+}
+
void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, char *log_lvl)
{
struct stack_info stack_info = {0};
unsigned long visit_mask = 0;
int graph_idx = 0;
+ bool partial;
printk("%sCall Trace:\n", log_lvl);
unwind_start(&state, task, regs, stack);
stack = stack ? : get_stack_pointer(task, regs);
+ regs = unwind_get_entry_regs(&state, &partial);
/*
* Iterate through the stacks, starting with the current stack pointer.
* - task stack
* - interrupt stack
* - HW exception stacks (double fault, nmi, debug, mce)
+ * - entry stack
*
- * x86-32 can have up to three stacks:
+ * x86-32 can have up to four stacks:
* - task stack
* - softirq stack
* - hardirq stack
+ * - entry stack
*/
- for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+ for ( ; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
const char *stack_name;
- /*
- * If we overflowed the task stack into a guard page, jump back
- * to the bottom of the usable stack.
- */
- if (task_stack_page(task) - (void *)stack < PAGE_SIZE)
- stack = task_stack_page(task);
-
- if (get_stack_info(stack, task, &stack_info, &visit_mask))
- break;
+ if (get_stack_info(stack, task, &stack_info, &visit_mask)) {
+ /*
+ * We weren't on a valid stack. It's possible that
+ * we overflowed a valid stack into a guard page.
+ * See if the next page up is valid so that we can
+ * generate some kind of backtrace if this happens.
+ */
+ stack = (unsigned long *)PAGE_ALIGN((unsigned long)stack);
+ if (get_stack_info(stack, task, &stack_info, &visit_mask))
+ break;
+ }
stack_name = stack_type_name(stack_info.type);
if (stack_name)
printk("%s <%s>\n", log_lvl, stack_name);
- if (regs && on_stack(&stack_info, regs, sizeof(*regs)))
- __show_regs(regs, 0);
+ if (regs)
+ show_regs_if_on_stack(&stack_info, regs, partial);
/*
* Scan the stack, printing any text addresses we find. At the
/*
* Don't print regs->ip again if it was already printed
- * by __show_regs() below.
+ * by show_regs_if_on_stack().
*/
if (regs && stack == ®s->ip)
goto next;
unwind_next_frame(&state);
/* if the frame has entry regs, print them */
- regs = unwind_get_entry_regs(&state);
- if (regs && on_stack(&stack_info, regs, sizeof(*regs)))
- __show_regs(regs, 0);
+ regs = unwind_get_entry_regs(&state, &partial);
+ if (regs)
+ show_regs_if_on_stack(&stack_info, regs, partial);
}
if (stack_name)
unsigned long sp;
#endif
printk(KERN_DEFAULT
- "%s: %04lx [#%d]%s%s%s%s\n", str, err & 0xffff, ++die_counter,
+ "%s: %04lx [#%d]%s%s%s%s%s\n", str, err & 0xffff, ++die_counter,
IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "",
IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",
- IS_ENABLED(CONFIG_KASAN) ? " KASAN" : "");
+ IS_ENABLED(CONFIG_KASAN) ? " KASAN" : "",
+ IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION) ?
+ (boot_cpu_has(X86_FEATURE_PTI) ? " PTI" : " NOPTI") : "");
if (notify_die(DIE_OOPS, str, regs, err,
current->thread.trap_nr, SIGSEGV) == NOTIFY_STOP)
if (type == STACK_TYPE_SOFTIRQ)
return "SOFTIRQ";
+ if (type == STACK_TYPE_ENTRY)
+ return "ENTRY_TRAMPOLINE";
+
return NULL;
}
if (task != current)
goto unknown;
+ if (in_entry_stack(stack, info))
+ goto recursion_check;
+
if (in_hardirq_stack(stack, info))
goto recursion_check;
if (type == STACK_TYPE_IRQ)
return "IRQ";
+ if (type == STACK_TYPE_ENTRY) {
+ /*
+ * On 64-bit, we have a generic entry stack that we
+ * use for all the kernel entry points, including
+ * SYSENTER.
+ */
+ return "ENTRY_TRAMPOLINE";
+ }
+
if (type >= STACK_TYPE_EXCEPTION && type <= STACK_TYPE_EXCEPTION_LAST)
return exception_stack_names[type - STACK_TYPE_EXCEPTION];
if (in_irq_stack(stack, info))
goto recursion_check;
+ if (in_entry_stack(stack, info))
+ goto recursion_check;
+
goto unknown;
recursion_check:
.balign PAGE_SIZE; \
GLOBAL(name)
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * Each PGD needs to be 8k long and 8k aligned. We do not
+ * ever go out to userspace with these, so we do not
+ * strictly *need* the second page, but this allows us to
+ * have a single set_pgd() implementation that does not
+ * need to worry about whether it has 4k or 8k to work
+ * with.
+ *
+ * This ensures PGDs are 8k long:
+ */
+#define PTI_USER_PGD_FILL 512
+/* This ensures they are 8k-aligned: */
+#define NEXT_PGD_PAGE(name) \
+ .balign 2 * PAGE_SIZE; \
+GLOBAL(name)
+#else
+#define NEXT_PGD_PAGE(name) NEXT_PAGE(name)
+#define PTI_USER_PGD_FILL 0
+#endif
+
/* Automate the creation of 1 to 1 mapping pmd entries */
#define PMDS(START, PERM, COUNT) \
i = 0 ; \
.endr
__INITDATA
-NEXT_PAGE(early_top_pgt)
+NEXT_PGD_PAGE(early_top_pgt)
.fill 511,8,0
#ifdef CONFIG_X86_5LEVEL
.quad level4_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
#else
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
#endif
+ .fill PTI_USER_PGD_FILL,8,0
NEXT_PAGE(early_dynamic_pgts)
.fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0
.data
#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH)
-NEXT_PAGE(init_top_pgt)
+NEXT_PGD_PAGE(init_top_pgt)
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.org init_top_pgt + PGD_PAGE_OFFSET*8, 0
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.org init_top_pgt + PGD_START_KERNEL*8, 0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+ .fill PTI_USER_PGD_FILL,8,0
NEXT_PAGE(level3_ident_pgt)
.quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
*/
PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
#else
-NEXT_PAGE(init_top_pgt)
+NEXT_PGD_PAGE(init_top_pgt)
.fill 512,8,0
+ .fill PTI_USER_PGD_FILL,8,0
#endif
#ifdef CONFIG_X86_5LEVEL
* because the ->io_bitmap_max value must match the bitmap
* contents:
*/
- tss = &per_cpu(cpu_tss, get_cpu());
+ tss = &per_cpu(cpu_tss_rw, get_cpu());
if (turn_on)
bitmap_clear(t->io_bitmap_ptr, from, num);
/* high bit used in ret_from_ code */
unsigned vector = ~regs->orig_ax;
- /*
- * NB: Unlike exception entries, IRQ entries do not reliably
- * handle context tracking in the low-level entry code. This is
- * because syscall entries execute briefly with IRQs on before
- * updating context tracking state, so we can take an IRQ from
- * kernel mode with CONTEXT_USER. The low-level entry code only
- * updates the context if we came from user mode, so we won't
- * switch to CONTEXT_KERNEL. We'll fix that once the syscall
- * code is cleaned up enough that we can cleanly defer enabling
- * IRQs.
- */
-
entering_irq();
/* entering_irq() tells RCU that we're not quiescent. Check it. */
if (regs->sp >= estack_top && regs->sp <= estack_bottom)
return;
- WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx,irq stk top-bottom:%Lx-%Lx,exception stk top-bottom:%Lx-%Lx)\n",
+ WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx,irq stk top-bottom:%Lx-%Lx,exception stk top-bottom:%Lx-%Lx,ip:%pF)\n",
current->comm, curbase, regs->sp,
irq_stack_top, irq_stack_bottom,
- estack_top, estack_bottom);
+ estack_top, estack_bottom, (void *)regs->ip);
if (sysctl_panic_on_stackoverflow)
panic("low stack detected by irq handler - check messages\n");
* Copyright (C) 2002 Andi Kleen
*
* This handles calls from both 32bit and 64bit mode.
+ *
+ * Lock order:
+ * contex.ldt_usr_sem
+ * mmap_sem
+ * context.lock
*/
#include <linux/errno.h>
#include <linux/uaccess.h>
#include <asm/ldt.h>
+#include <asm/tlb.h>
#include <asm/desc.h>
#include <asm/mmu_context.h>
#include <asm/syscalls.h>
#endif
}
-/* context.lock is held for us, so we don't need any locking. */
+/* context.lock is held by the task which issued the smp function call */
static void flush_ldt(void *__mm)
{
struct mm_struct *mm = __mm;
- mm_context_t *pc;
if (this_cpu_read(cpu_tlbstate.loaded_mm) != mm)
return;
- pc = &mm->context;
- set_ldt(pc->ldt->entries, pc->ldt->nr_entries);
+ load_mm_ldt(mm);
refresh_ldt_segments();
}
return NULL;
}
+ /* The new LDT isn't aliased for PTI yet. */
+ new_ldt->slot = -1;
+
new_ldt->nr_entries = num_entries;
return new_ldt;
}
+/*
+ * If PTI is enabled, this maps the LDT into the kernelmode and
+ * usermode tables for the given mm.
+ *
+ * There is no corresponding unmap function. Even if the LDT is freed, we
+ * leave the PTEs around until the slot is reused or the mm is destroyed.
+ * This is harmless: the LDT is always in ordinary memory, and no one will
+ * access the freed slot.
+ *
+ * If we wanted to unmap freed LDTs, we'd also need to do a flush to make
+ * it useful, and the flush would slow down modify_ldt().
+ */
+static int
+map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ bool is_vmalloc, had_top_level_entry;
+ unsigned long va;
+ spinlock_t *ptl;
+ pgd_t *pgd;
+ int i;
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return 0;
+
+ /*
+ * Any given ldt_struct should have map_ldt_struct() called at most
+ * once.
+ */
+ WARN_ON(ldt->slot != -1);
+
+ /*
+ * Did we already have the top level entry allocated? We can't
+ * use pgd_none() for this because it doens't do anything on
+ * 4-level page table kernels.
+ */
+ pgd = pgd_offset(mm, LDT_BASE_ADDR);
+ had_top_level_entry = (pgd->pgd != 0);
+
+ is_vmalloc = is_vmalloc_addr(ldt->entries);
+
+ for (i = 0; i * PAGE_SIZE < ldt->nr_entries * LDT_ENTRY_SIZE; i++) {
+ unsigned long offset = i << PAGE_SHIFT;
+ const void *src = (char *)ldt->entries + offset;
+ unsigned long pfn;
+ pte_t pte, *ptep;
+
+ va = (unsigned long)ldt_slot_va(slot) + offset;
+ pfn = is_vmalloc ? vmalloc_to_pfn(src) :
+ page_to_pfn(virt_to_page(src));
+ /*
+ * Treat the PTI LDT range as a *userspace* range.
+ * get_locked_pte() will allocate all needed pagetables
+ * and account for them in this mm.
+ */
+ ptep = get_locked_pte(mm, va, &ptl);
+ if (!ptep)
+ return -ENOMEM;
+ /*
+ * Map it RO so the easy to find address is not a primary
+ * target via some kernel interface which misses a
+ * permission check.
+ */
+ pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL));
+ set_pte_at(mm, va, ptep, pte);
+ pte_unmap_unlock(ptep, ptl);
+ }
+
+ if (mm->context.ldt) {
+ /*
+ * We already had an LDT. The top-level entry should already
+ * have been allocated and synchronized with the usermode
+ * tables.
+ */
+ WARN_ON(!had_top_level_entry);
+ if (static_cpu_has(X86_FEATURE_PTI))
+ WARN_ON(!kernel_to_user_pgdp(pgd)->pgd);
+ } else {
+ /*
+ * This is the first time we're mapping an LDT for this process.
+ * Sync the pgd to the usermode tables.
+ */
+ WARN_ON(had_top_level_entry);
+ if (static_cpu_has(X86_FEATURE_PTI)) {
+ WARN_ON(kernel_to_user_pgdp(pgd)->pgd);
+ set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+ }
+ }
+
+ va = (unsigned long)ldt_slot_va(slot);
+ flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, 0);
+
+ ldt->slot = slot;
+#endif
+ return 0;
+}
+
+static void free_ldt_pgtables(struct mm_struct *mm)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ struct mmu_gather tlb;
+ unsigned long start = LDT_BASE_ADDR;
+ unsigned long end = start + (1UL << PGDIR_SHIFT);
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ tlb_gather_mmu(&tlb, mm, start, end);
+ free_pgd_range(&tlb, start, end, start, end);
+ tlb_finish_mmu(&tlb, start, end);
+#endif
+}
+
/* After calling this, the LDT is immutable. */
static void finalize_ldt_struct(struct ldt_struct *ldt)
{
paravirt_alloc_ldt(ldt->entries, ldt->nr_entries);
}
-/* context.lock is held */
-static void install_ldt(struct mm_struct *current_mm,
- struct ldt_struct *ldt)
+static void install_ldt(struct mm_struct *mm, struct ldt_struct *ldt)
{
+ mutex_lock(&mm->context.lock);
+
/* Synchronizes with READ_ONCE in load_mm_ldt. */
- smp_store_release(¤t_mm->context.ldt, ldt);
+ smp_store_release(&mm->context.ldt, ldt);
- /* Activate the LDT for all CPUs using current_mm. */
- on_each_cpu_mask(mm_cpumask(current_mm), flush_ldt, current_mm, true);
+ /* Activate the LDT for all CPUs using currents mm. */
+ on_each_cpu_mask(mm_cpumask(mm), flush_ldt, mm, true);
+
+ mutex_unlock(&mm->context.lock);
}
static void free_ldt_struct(struct ldt_struct *ldt)
}
/*
- * we do not have to muck with descriptors here, that is
- * done in switch_mm() as needed.
+ * Called on fork from arch_dup_mmap(). Just copy the current LDT state,
+ * the new task is not running, so nothing can be installed.
*/
-int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm)
+int ldt_dup_context(struct mm_struct *old_mm, struct mm_struct *mm)
{
struct ldt_struct *new_ldt;
- struct mm_struct *old_mm;
int retval = 0;
- mutex_init(&mm->context.lock);
- old_mm = current->mm;
- if (!old_mm) {
- mm->context.ldt = NULL;
+ if (!old_mm)
return 0;
- }
mutex_lock(&old_mm->context.lock);
- if (!old_mm->context.ldt) {
- mm->context.ldt = NULL;
+ if (!old_mm->context.ldt)
goto out_unlock;
- }
new_ldt = alloc_ldt_struct(old_mm->context.ldt->nr_entries);
if (!new_ldt) {
new_ldt->nr_entries * LDT_ENTRY_SIZE);
finalize_ldt_struct(new_ldt);
+ retval = map_ldt_struct(mm, new_ldt, 0);
+ if (retval) {
+ free_ldt_pgtables(mm);
+ free_ldt_struct(new_ldt);
+ goto out_unlock;
+ }
mm->context.ldt = new_ldt;
out_unlock:
mm->context.ldt = NULL;
}
+void ldt_arch_exit_mmap(struct mm_struct *mm)
+{
+ free_ldt_pgtables(mm);
+}
+
static int read_ldt(void __user *ptr, unsigned long bytecount)
{
struct mm_struct *mm = current->mm;
unsigned long entries_size;
int retval;
- mutex_lock(&mm->context.lock);
+ down_read(&mm->context.ldt_usr_sem);
if (!mm->context.ldt) {
retval = 0;
retval = bytecount;
out_unlock:
- mutex_unlock(&mm->context.lock);
+ up_read(&mm->context.ldt_usr_sem);
return retval;
}
ldt.avl = 0;
}
- mutex_lock(&mm->context.lock);
+ if (down_write_killable(&mm->context.ldt_usr_sem))
+ return -EINTR;
old_ldt = mm->context.ldt;
old_nr_entries = old_ldt ? old_ldt->nr_entries : 0;
new_ldt->entries[ldt_info.entry_number] = ldt;
finalize_ldt_struct(new_ldt);
+ /*
+ * If we are using PTI, map the new LDT into the userspace pagetables.
+ * If there is already an LDT, use the other slot so that other CPUs
+ * will continue to use the old LDT until install_ldt() switches
+ * them over to the new LDT.
+ */
+ error = map_ldt_struct(mm, new_ldt, old_ldt ? !old_ldt->slot : 0);
+ if (error) {
+ /*
+ * This only can fail for the first LDT setup. If an LDT is
+ * already installed then the PTE page is already
+ * populated. Mop up a half populated page table.
+ */
+ if (!WARN_ON_ONCE(old_ldt))
+ free_ldt_pgtables(mm);
+ free_ldt_struct(new_ldt);
+ goto out_unlock;
+ }
+
install_ldt(mm, new_ldt);
free_ldt_struct(old_ldt);
error = 0;
out_unlock:
- mutex_unlock(&mm->context.lock);
+ up_write(&mm->context.ldt_usr_sem);
out:
return error;
}
"\tmovl $"STR(__KERNEL_DS)",%%eax\n"
"\tmovl %%eax,%%ds\n"
"\tmovl %%eax,%%es\n"
- "\tmovl %%eax,%%fs\n"
- "\tmovl %%eax,%%gs\n"
"\tmovl %%eax,%%ss\n"
: : : "eax", "memory");
#undef STR
* The gdt & idt are now invalid.
* If you want to load them you must set up your own idt & gdt.
*/
- set_gdt(phys_to_virt(0), 0);
idt_invalidate(phys_to_virt(0));
+ set_gdt(phys_to_virt(0), 0);
/* now call it */
image->start = relocate_kernel_ptr((unsigned long)image->head,
DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
-DEF_NATIVE(pv_mmu_ops, flush_tlb_single, "invlpg (%rdi)");
DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd");
DEF_NATIVE(pv_cpu_ops, usergs_sysret64, "swapgs; sysretq");
PATCH_SITE(pv_mmu_ops, read_cr2);
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
- PATCH_SITE(pv_mmu_ops, flush_tlb_single);
PATCH_SITE(pv_cpu_ops, wbinvd);
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
* section. Since TSS's are completely CPU-local, we want them
* on exact cacheline boundaries, to eliminate cacheline ping-pong.
*/
-__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
+__visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
.x86_tss = {
/*
* .sp0 is only used when entering ring 0 from a lower
* Poison it.
*/
.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
+
+#ifdef CONFIG_X86_64
+ /*
+ * .sp1 is cpu_current_top_of_stack. The init task never
+ * runs user code, but cpu_current_top_of_stack should still
+ * be well defined before the first context switch.
+ */
+ .sp1 = TOP_OF_INIT_STACK,
+#endif
+
#ifdef CONFIG_X86_32
.ss0 = __KERNEL_DS,
.ss1 = __KERNEL_CS,
*/
.io_bitmap = { [0 ... IO_BITMAP_LONGS] = ~0 },
#endif
-#ifdef CONFIG_X86_32
- .SYSENTER_stack_canary = STACK_END_MAGIC,
-#endif
};
-EXPORT_PER_CPU_SYMBOL(cpu_tss);
+EXPORT_PER_CPU_SYMBOL(cpu_tss_rw);
DEFINE_PER_CPU(bool, __tss_limit_invalid);
EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid);
struct fpu *fpu = &t->fpu;
if (bp) {
- struct tss_struct *tss = &per_cpu(cpu_tss, get_cpu());
+ struct tss_struct *tss = &per_cpu(cpu_tss_rw, get_cpu());
t->io_bitmap_ptr = NULL;
clear_thread_flag(TIF_IO_BITMAP);
struct fpu *prev_fpu = &prev->fpu;
struct fpu *next_fpu = &next->fpu;
int cpu = smp_processor_id();
- struct tss_struct *tss = &per_cpu(cpu_tss, cpu);
+ struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu);
/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
unsigned int fsindex, gsindex;
unsigned int ds, cs, es;
- printk(KERN_DEFAULT "RIP: %04lx:%pS\n", regs->cs, (void *)regs->ip);
- printk(KERN_DEFAULT "RSP: %04lx:%016lx EFLAGS: %08lx", regs->ss,
- regs->sp, regs->flags);
+ show_iret_regs(regs);
+
if (regs->orig_ax != -1)
pr_cont(" ORIG_RAX: %016lx\n", regs->orig_ax);
else
printk(KERN_DEFAULT "R13: %016lx R14: %016lx R15: %016lx\n",
regs->r13, regs->r14, regs->r15);
+ if (!all)
+ return;
+
asm("movl %%ds,%0" : "=r" (ds));
asm("movl %%cs,%0" : "=r" (cs));
asm("movl %%es,%0" : "=r" (es));
rdmsrl(MSR_GS_BASE, gs);
rdmsrl(MSR_KERNEL_GS_BASE, shadowgs);
- if (!all)
- return;
-
cr0 = read_cr0();
cr2 = read_cr2();
cr3 = __read_cr3();
struct fpu *prev_fpu = &prev->fpu;
struct fpu *next_fpu = &next->fpu;
int cpu = smp_processor_id();
- struct tss_struct *tss = &per_cpu(cpu_tss, cpu);
+ struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu);
WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
this_cpu_read(irq_count) != -1);
* Switch the PDA and FPU contexts.
*/
this_cpu_write(current_task, next_p);
+ this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
/* Reload sp0. */
update_sp0(next_p);
set_bit(EFI_BOOT, &efi.flags);
set_bit(EFI_64BIT, &efi.flags);
}
-
- if (efi_enabled(EFI_BOOT))
- efi_memblock_x86_reserve_range();
#endif
x86_init.oem.arch_setup();
parse_early_param();
+ if (efi_enabled(EFI_BOOT))
+ efi_memblock_x86_reserve_range();
#ifdef CONFIG_MEMORY_HOTPLUG
/*
* Memory used by the kernel cannot be hot-removed because Linux
static unsigned int logical_packages __read_mostly;
/* Maximum number of SMT threads on any online core */
-int __max_smt_threads __read_mostly;
+int __read_mostly __max_smt_threads = 1;
/* Flag to indicate if a complete sched domain rebuild is required */
bool x86_topology_update;
spin_lock_irqsave(&rtc_lock, flags);
CMOS_WRITE(0xa, 0xf);
spin_unlock_irqrestore(&rtc_lock, flags);
- local_flush_tlb();
- pr_debug("1.\n");
*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) =
start_eip >> 4;
- pr_debug("2.\n");
*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) =
start_eip & 0xf;
- pr_debug("3.\n");
}
static inline void smpboot_restore_warm_reset_vector(void)
{
unsigned long flags;
- /*
- * Install writable page 0 entry to set BIOS data area.
- */
- local_flush_tlb();
-
/*
* Paranoid: Set warm reset code and vector here back
* to default values.
initial_code = (unsigned long)start_secondary;
initial_stack = idle->thread.sp;
- /*
- * Enable the espfix hack for this CPU
- */
-#ifdef CONFIG_X86_ESPFIX64
+ /* Enable the espfix hack for this CPU */
init_espfix_ap(cpu);
-#endif
/* So we see what's up */
announce_cpu(cpu, apicid);
* Today neither Intel nor AMD support heterogenous systems so
* extrapolate the boot cpu's data to all packages.
*/
- ncpus = cpu_data(0).booted_cores * smp_num_siblings;
+ ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
pr_info("Max logical packages: %u\n", __max_logical_packages);
for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
unwind_next_frame(&state)) {
- regs = unwind_get_entry_regs(&state);
+ regs = unwind_get_entry_regs(&state, NULL);
if (regs) {
/*
* Kernel mode registers on the stack indicate an
{
int ret;
+ /*
+ * If the task doesn't have a stack (e.g., a zombie), the stack is
+ * "reliably" empty.
+ */
if (!try_get_task_stack(tsk))
- return -EINVAL;
+ return 0;
ret = __save_stack_trace_reliable(trace, tsk);
cpu = get_cpu();
while (n-- > 0) {
- if (LDT_empty(info) || LDT_zero(info)) {
+ if (LDT_empty(info) || LDT_zero(info))
memset(desc, 0, sizeof(*desc));
- } else {
+ else
fill_ldt(desc, info);
-
- /*
- * Always set the accessed bit so that the CPU
- * doesn't try to write to the (read-only) GDT.
- */
- desc->type |= 1;
- }
++info;
++desc;
}
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
/*
* If IRET takes a non-IST fault on the espfix64 stack, then we
- * end up promoting it to a doublefault. In that case, modify
- * the stack to make it look like we just entered the #GP
- * handler from user space, similar to bad_iret.
+ * end up promoting it to a doublefault. In that case, take
+ * advantage of the fact that we're not using the normal (TSS.sp0)
+ * stack right now. We can write a fake #GP(0) frame at TSS.sp0
+ * and then modify our own IRET frame so that, when we return,
+ * we land directly at the #GP(0) vector with the stack already
+ * set up according to its expectations.
+ *
+ * The net result is that our #GP handler will think that we
+ * entered from usermode with the bad user context.
*
* No need for ist_enter here because we don't use RCU.
*/
- if (((long)regs->sp >> PGDIR_SHIFT) == ESPFIX_PGD_ENTRY &&
+ if (((long)regs->sp >> P4D_SHIFT) == ESPFIX_PGD_ENTRY &&
regs->cs == __KERNEL_CS &&
regs->ip == (unsigned long)native_irq_return_iret)
{
- struct pt_regs *normal_regs = task_pt_regs(current);
+ struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
- /* Fake a #GP(0) from userspace. */
- memmove(&normal_regs->ip, (void *)regs->sp, 5*8);
- normal_regs->orig_ax = 0; /* Missing (lost) #GP error code */
+ /*
+ * regs->sp points to the failing IRET frame on the
+ * ESPFIX64 stack. Copy it to the entry stack. This fills
+ * in gpregs->ss through gpregs->ip.
+ *
+ */
+ memmove(&gpregs->ip, (void *)regs->sp, 5*8);
+ gpregs->orig_ax = 0; /* Missing (lost) #GP error code */
+
+ /*
+ * Adjust our frame so that we return straight to the #GP
+ * vector with the expected RSP value. This is safe because
+ * we won't enable interupts or schedule before we invoke
+ * general_protection, so nothing will clobber the stack
+ * frame we just set up.
+ */
regs->ip = (unsigned long)general_protection;
- regs->sp = (unsigned long)&normal_regs->orig_ax;
+ regs->sp = (unsigned long)&gpregs->orig_ax;
return;
}
*
* Processors update CR2 whenever a page fault is detected. If a
* second page fault occurs while an earlier page fault is being
- * deliv- ered, the faulting linear address of the second fault will
+ * delivered, the faulting linear address of the second fault will
* overwrite the contents of CR2 (replacing the previous
* address). These updates to CR2 occur even if the page fault
* results in a double fault or occurs during the delivery of a
#ifdef CONFIG_X86_64
/*
- * Help handler running on IST stack to switch off the IST stack if the
- * interrupted code was in user mode. The actual stack switch is done in
- * entry_64.S
+ * Help handler running on a per-cpu (IST or entry trampoline) stack
+ * to switch to the normal thread stack if the interrupted code was in
+ * user mode. The actual stack switch is done in entry_64.S
*/
asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
{
- struct pt_regs *regs = task_pt_regs(current);
- *regs = *eregs;
+ struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
+ if (regs != eregs)
+ *regs = *eregs;
return regs;
}
NOKPROBE_SYMBOL(sync_regs);
/*
* This is called from entry_64.S early in handling a fault
* caused by a bad iret to user mode. To handle the fault
- * correctly, we want move our stack frame to task_pt_regs
- * and we want to pretend that the exception came from the
- * iret target.
+ * correctly, we want to move our stack frame to where it would
+ * be had we entered directly on the entry stack (rather than
+ * just below the IRET frame) and we want to pretend that the
+ * exception came from the IRET target.
*/
struct bad_iret_stack *new_stack =
- container_of(task_pt_regs(current),
- struct bad_iret_stack, regs);
+ (struct bad_iret_stack *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
/* Copy the IRET target to the new stack. */
memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5*8);
debug_stack_usage_dec();
exit:
-#if defined(CONFIG_X86_32)
- /*
- * This is the most likely code path that involves non-trivial use
- * of the SYSENTER stack. Check that we haven't overrun it.
- */
- WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC,
- "Overran or corrupted SYSENTER stack\n");
-#endif
ist_exit(regs);
}
NOKPROBE_SYMBOL(do_debug);
void __init trap_init(void)
{
+ /* Init cpu_entry_area before IST entries are set up */
+ setup_cpu_entry_areas();
+
idt_setup_traps();
/*
* "sidt" instruction will not leak the location of the kernel, and
* to defend the IDT against arbitrary memory write vulnerabilities.
* It will be reloaded in cpu_init() */
- __set_fixmap(FIX_RO_IDT, __pa_symbol(idt_table), PAGE_KERNEL_RO);
- idt_descr.address = fix_to_virt(FIX_RO_IDT);
+ cea_set_pte(CPU_ENTRY_AREA_RO_IDT_VADDR, __pa_symbol(idt_table),
+ PAGE_KERNEL_RO);
+ idt_descr.address = CPU_ENTRY_AREA_RO_IDT;
/*
* Should be a barrier for any external CPU state:
return NULL;
}
-static bool stack_access_ok(struct unwind_state *state, unsigned long addr,
+static bool stack_access_ok(struct unwind_state *state, unsigned long _addr,
size_t len)
{
struct stack_info *info = &state->stack_info;
+ void *addr = (void *)_addr;
- /*
- * If the address isn't on the current stack, switch to the next one.
- *
- * We may have to traverse multiple stacks to deal with the possibility
- * that info->next_sp could point to an empty stack and the address
- * could be on a subsequent stack.
- */
- while (!on_stack(info, (void *)addr, len))
- if (get_stack_info(info->next_sp, state->task, info,
- &state->stack_mask))
- return false;
+ if (!on_stack(info, addr, len) &&
+ (get_stack_info(addr, state->task, info, &state->stack_mask)))
+ return false;
return true;
}
return true;
}
-#define REGS_SIZE (sizeof(struct pt_regs))
-#define SP_OFFSET (offsetof(struct pt_regs, sp))
-#define IRET_REGS_SIZE (REGS_SIZE - offsetof(struct pt_regs, ip))
-#define IRET_SP_OFFSET (SP_OFFSET - offsetof(struct pt_regs, ip))
-
static bool deref_stack_regs(struct unwind_state *state, unsigned long addr,
- unsigned long *ip, unsigned long *sp, bool full)
+ unsigned long *ip, unsigned long *sp)
{
- size_t regs_size = full ? REGS_SIZE : IRET_REGS_SIZE;
- size_t sp_offset = full ? SP_OFFSET : IRET_SP_OFFSET;
- struct pt_regs *regs = (struct pt_regs *)(addr + regs_size - REGS_SIZE);
-
- if (IS_ENABLED(CONFIG_X86_64)) {
- if (!stack_access_ok(state, addr, regs_size))
- return false;
+ struct pt_regs *regs = (struct pt_regs *)addr;
- *ip = regs->ip;
- *sp = regs->sp;
+ /* x86-32 support will be more complicated due to the ®s->sp hack */
+ BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_32));
- return true;
- }
-
- if (!stack_access_ok(state, addr, sp_offset))
+ if (!stack_access_ok(state, addr, sizeof(struct pt_regs)))
return false;
*ip = regs->ip;
+ *sp = regs->sp;
+ return true;
+}
- if (user_mode(regs)) {
- if (!stack_access_ok(state, addr + sp_offset,
- REGS_SIZE - SP_OFFSET))
- return false;
+static bool deref_stack_iret_regs(struct unwind_state *state, unsigned long addr,
+ unsigned long *ip, unsigned long *sp)
+{
+ struct pt_regs *regs = (void *)addr - IRET_FRAME_OFFSET;
- *sp = regs->sp;
- } else
- *sp = (unsigned long)®s->sp;
+ if (!stack_access_ok(state, addr, IRET_FRAME_SIZE))
+ return false;
+ *ip = regs->ip;
+ *sp = regs->sp;
return true;
}
unsigned long ip_p, sp, orig_ip, prev_sp = state->sp;
enum stack_type prev_type = state->stack_info.type;
struct orc_entry *orc;
- struct pt_regs *ptregs;
bool indirect = false;
if (unwind_done(state))
break;
case ORC_TYPE_REGS:
- if (!deref_stack_regs(state, sp, &state->ip, &state->sp, true)) {
+ if (!deref_stack_regs(state, sp, &state->ip, &state->sp)) {
orc_warn("can't dereference registers at %p for ip %pB\n",
(void *)sp, (void *)orig_ip);
goto done;
break;
case ORC_TYPE_REGS_IRET:
- if (!deref_stack_regs(state, sp, &state->ip, &state->sp, false)) {
+ if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
orc_warn("can't dereference iret registers at %p for ip %pB\n",
(void *)sp, (void *)orig_ip);
goto done;
}
- ptregs = container_of((void *)sp, struct pt_regs, ip);
- if ((unsigned long)ptregs >= prev_sp &&
- on_stack(&state->stack_info, ptregs, REGS_SIZE)) {
- state->regs = ptregs;
- state->full_regs = false;
- } else
- state->regs = NULL;
-
+ state->regs = (void *)sp - IRET_FRAME_OFFSET;
+ state->full_regs = false;
state->signal = true;
break;
}
if (get_stack_info((unsigned long *)state->sp, state->task,
- &state->stack_info, &state->stack_mask))
- return;
+ &state->stack_info, &state->stack_mask)) {
+ /*
+ * We weren't on a valid stack. It's possible that
+ * we overflowed a valid stack into a guard page.
+ * See if the next page up is valid so that we can
+ * generate some kind of backtrace if this happens.
+ */
+ void *next_page = (void *)PAGE_ALIGN((unsigned long)state->sp);
+ if (get_stack_info(next_page, state->task, &state->stack_info,
+ &state->stack_mask))
+ return;
+ }
/*
* The caller can provide the address of the first frame directly
. = ALIGN(HPAGE_SIZE); \
__end_rodata_hpage_align = .;
+#define ALIGN_ENTRY_TEXT_BEGIN . = ALIGN(PMD_SIZE);
+#define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE);
+
#else
#define X64_ALIGN_RODATA_BEGIN
#define X64_ALIGN_RODATA_END
+#define ALIGN_ENTRY_TEXT_BEGIN
+#define ALIGN_ENTRY_TEXT_END
+
#endif
PHDRS {
CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
+ ALIGN_ENTRY_TEXT_BEGIN
ENTRY_TEXT
IRQENTRY_TEXT
+ ALIGN_ENTRY_TEXT_END
SOFTIRQENTRY_TEXT
*(.fixup)
*(.gnu.warning)
+
+#ifdef CONFIG_X86_64
+ . = ALIGN(PAGE_SIZE);
+ _entry_trampoline = .;
+ *(.entry_trampoline)
+ . = ALIGN(PAGE_SIZE);
+ ASSERT(. - _entry_trampoline == PAGE_SIZE, "entry trampoline is too big");
+#endif
+
/* End of text section */
_etext = .;
} :text = 0x9090
}
static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
- u64 cr0, u64 cr4)
+ u64 cr0, u64 cr3, u64 cr4)
{
int bad;
+ u64 pcid;
+
+ /* In order to later set CR4.PCIDE, CR3[11:0] must be zero. */
+ pcid = 0;
+ if (cr4 & X86_CR4_PCIDE) {
+ pcid = cr3 & 0xfff;
+ cr3 &= ~0xfff;
+ }
+
+ bad = ctxt->ops->set_cr(ctxt, 3, cr3);
+ if (bad)
+ return X86EMUL_UNHANDLEABLE;
/*
* First enable PAE, long mode needs it before CR0.PG = 1 is set.
bad = ctxt->ops->set_cr(ctxt, 4, cr4);
if (bad)
return X86EMUL_UNHANDLEABLE;
+ if (pcid) {
+ bad = ctxt->ops->set_cr(ctxt, 3, cr3 | pcid);
+ if (bad)
+ return X86EMUL_UNHANDLEABLE;
+ }
+
}
return X86EMUL_CONTINUE;
struct desc_struct desc;
struct desc_ptr dt;
u16 selector;
- u32 val, cr0, cr4;
+ u32 val, cr0, cr3, cr4;
int i;
cr0 = GET_SMSTATE(u32, smbase, 0x7ffc);
- ctxt->ops->set_cr(ctxt, 3, GET_SMSTATE(u32, smbase, 0x7ff8));
+ cr3 = GET_SMSTATE(u32, smbase, 0x7ff8);
ctxt->eflags = GET_SMSTATE(u32, smbase, 0x7ff4) | X86_EFLAGS_FIXED;
ctxt->_eip = GET_SMSTATE(u32, smbase, 0x7ff0);
ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smbase, 0x7ef8));
- return rsm_enter_protected_mode(ctxt, cr0, cr4);
+ return rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
}
static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
{
struct desc_struct desc;
struct desc_ptr dt;
- u64 val, cr0, cr4;
+ u64 val, cr0, cr3, cr4;
u32 base3;
u16 selector;
int i, r;
ctxt->ops->set_dr(ctxt, 7, (val & DR7_VOLATILE) | DR7_FIXED_1);
cr0 = GET_SMSTATE(u64, smbase, 0x7f58);
- ctxt->ops->set_cr(ctxt, 3, GET_SMSTATE(u64, smbase, 0x7f50));
+ cr3 = GET_SMSTATE(u64, smbase, 0x7f50);
cr4 = GET_SMSTATE(u64, smbase, 0x7f48);
ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smbase, 0x7f00));
val = GET_SMSTATE(u64, smbase, 0x7ed0);
dt.address = GET_SMSTATE(u64, smbase, 0x7e68);
ctxt->ops->set_gdt(ctxt, &dt);
- r = rsm_enter_protected_mode(ctxt, cr0, cr4);
+ r = rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
if (r != X86EMUL_CONTINUE)
return r;
spin_lock(&vcpu->kvm->mmu_lock);
if(make_mmu_pages_available(vcpu) < 0) {
spin_unlock(&vcpu->kvm->mmu_lock);
- return 1;
+ return -ENOSPC;
}
sp = kvm_mmu_get_page(vcpu, 0, 0,
vcpu->arch.mmu.shadow_root_level, 1, ACC_ALL);
spin_lock(&vcpu->kvm->mmu_lock);
if (make_mmu_pages_available(vcpu) < 0) {
spin_unlock(&vcpu->kvm->mmu_lock);
- return 1;
+ return -ENOSPC;
}
sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
spin_lock(&vcpu->kvm->mmu_lock);
if (make_mmu_pages_available(vcpu) < 0) {
spin_unlock(&vcpu->kvm->mmu_lock);
- return 1;
+ return -ENOSPC;
}
sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
vcpu->arch.mmu.shadow_root_level, 0, ACC_ALL);
spin_lock(&vcpu->kvm->mmu_lock);
if (make_mmu_pages_available(vcpu) < 0) {
spin_unlock(&vcpu->kvm->mmu_lock);
- return 1;
+ return -ENOSPC;
}
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
0, ACC_ALL);
"mov %%r13, %c[r13](%[svm]) \n\t"
"mov %%r14, %c[r14](%[svm]) \n\t"
"mov %%r15, %c[r15](%[svm]) \n\t"
+#endif
+ /*
+ * Clear host registers marked as clobbered to prevent
+ * speculative use.
+ */
+ "xor %%" _ASM_BX ", %%" _ASM_BX " \n\t"
+ "xor %%" _ASM_CX ", %%" _ASM_CX " \n\t"
+ "xor %%" _ASM_DX ", %%" _ASM_DX " \n\t"
+ "xor %%" _ASM_SI ", %%" _ASM_SI " \n\t"
+ "xor %%" _ASM_DI ", %%" _ASM_DI " \n\t"
+#ifdef CONFIG_X86_64
+ "xor %%r8, %%r8 \n\t"
+ "xor %%r9, %%r9 \n\t"
+ "xor %%r10, %%r10 \n\t"
+ "xor %%r11, %%r11 \n\t"
+ "xor %%r12, %%r12 \n\t"
+ "xor %%r13, %%r13 \n\t"
+ "xor %%r14, %%r14 \n\t"
+ "xor %%r15, %%r15 \n\t"
#endif
"pop %%" _ASM_BP
:
* processors. See 22.2.4.
*/
vmcs_writel(HOST_TR_BASE,
- (unsigned long)this_cpu_ptr(&cpu_tss));
+ (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss);
vmcs_writel(HOST_GDTR_BASE, (unsigned long)gdt); /* 22.2.4 */
/*
/* Save guest registers, load host registers, keep flags */
"mov %0, %c[wordsize](%%" _ASM_SP ") \n\t"
"pop %0 \n\t"
+ "setbe %c[fail](%0)\n\t"
"mov %%" _ASM_AX ", %c[rax](%0) \n\t"
"mov %%" _ASM_BX ", %c[rbx](%0) \n\t"
__ASM_SIZE(pop) " %c[rcx](%0) \n\t"
"mov %%r13, %c[r13](%0) \n\t"
"mov %%r14, %c[r14](%0) \n\t"
"mov %%r15, %c[r15](%0) \n\t"
+ "xor %%r8d, %%r8d \n\t"
+ "xor %%r9d, %%r9d \n\t"
+ "xor %%r10d, %%r10d \n\t"
+ "xor %%r11d, %%r11d \n\t"
+ "xor %%r12d, %%r12d \n\t"
+ "xor %%r13d, %%r13d \n\t"
+ "xor %%r14d, %%r14d \n\t"
+ "xor %%r15d, %%r15d \n\t"
#endif
"mov %%cr2, %%" _ASM_AX " \n\t"
"mov %%" _ASM_AX ", %c[cr2](%0) \n\t"
+ "xor %%eax, %%eax \n\t"
+ "xor %%ebx, %%ebx \n\t"
+ "xor %%esi, %%esi \n\t"
+ "xor %%edi, %%edi \n\t"
"pop %%" _ASM_BP "; pop %%" _ASM_DX " \n\t"
- "setbe %c[fail](%0) \n\t"
".pushsection .rodata \n\t"
".global vmx_return \n\t"
"vmx_return: " _ASM_PTR " 2b \n\t"
addr, n, v))
&& kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, n, v))
break;
- trace_kvm_mmio(KVM_TRACE_MMIO_READ, n, addr, *(u64 *)v);
+ trace_kvm_mmio(KVM_TRACE_MMIO_READ, n, addr, v);
handled += n;
addr += n;
len -= n;
{
if (vcpu->mmio_read_completed) {
trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes,
- vcpu->mmio_fragments[0].gpa, *(u64 *)val);
+ vcpu->mmio_fragments[0].gpa, val);
vcpu->mmio_read_completed = 0;
return 1;
}
static int write_mmio(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes, void *val)
{
- trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+ trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, val);
return vcpu_mmio_write(vcpu, gpa, bytes, val);
}
static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
void *val, int bytes)
{
- trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0);
+ trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, NULL);
return X86EMUL_IO_NEEDED;
}
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
- struct fpu *fpu = ¤t->thread.fpu;
int r;
- fpu__initialize(fpu);
-
kvm_sigset_activate(vcpu);
+ kvm_load_guest_fpu(vcpu);
+
if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
if (kvm_run->immediate_exit) {
r = -EINTR;
}
}
- kvm_load_guest_fpu(vcpu);
-
if (unlikely(vcpu->arch.complete_userspace_io)) {
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
vcpu->arch.complete_userspace_io = NULL;
r = cui(vcpu);
if (r <= 0)
- goto out_fpu;
+ goto out;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
else
r = vcpu_run(vcpu);
-out_fpu:
- kvm_put_guest_fpu(vcpu);
out:
+ kvm_put_guest_fpu(vcpu);
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
#endif
kvm_rip_write(vcpu, regs->rip);
- kvm_set_rflags(vcpu, regs->rflags);
+ kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
vcpu->arch.exception.pending = false;
}
EXPORT_SYMBOL_GPL(kvm_task_switch);
+int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG_BIT)) {
+ /*
+ * When EFER.LME and CR0.PG are set, the processor is in
+ * 64-bit mode (though maybe in a 32-bit code segment).
+ * CR4.PAE and EFER.LMA must be set.
+ */
+ if (!(sregs->cr4 & X86_CR4_PAE_BIT)
+ || !(sregs->efer & EFER_LMA))
+ return -EINVAL;
+ } else {
+ /*
+ * Not in 64-bit mode: EFER.LMA is clear and the code
+ * segment cannot be 64-bit.
+ */
+ if (sregs->efer & EFER_LMA || sregs->cs.l)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
struct kvm_sregs *sregs)
{
(sregs->cr4 & X86_CR4_OSXSAVE))
return -EINVAL;
+ if (kvm_valid_sregs(vcpu, sregs))
+ return -EINVAL;
+
apic_base_msr.data = sregs->apic_base;
apic_base_msr.host_initiated = true;
if (kvm_set_apic_base(vcpu, &apic_base_msr))
delay = min_t(u64, MWAITX_MAX_LOOPS, loops);
/*
- * Use cpu_tss as a cacheline-aligned, seldomly
+ * Use cpu_tss_rw as a cacheline-aligned, seldomly
* accessed per-cpu variable as the monitor target.
*/
- __monitorx(raw_cpu_ptr(&cpu_tss), 0, 0);
+ __monitorx(raw_cpu_ptr(&cpu_tss_rw), 0, 0);
/*
* AMD, like Intel, supports the EAX hint and EAX=0xf
fc: paddb Pq,Qq | vpaddb Vx,Hx,Wx (66),(v1)
fd: paddw Pq,Qq | vpaddw Vx,Hx,Wx (66),(v1)
fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
-ff:
+ff: UD0
EndTable
Table: 3-byte opcode 1 (0x0f 0x38)
7e: vpermt2d/q Vx,Hx,Wx (66),(ev)
7f: vpermt2ps/d Vx,Hx,Wx (66),(ev)
80: INVEPT Gy,Mdq (66)
-81: INVPID Gy,Mdq (66)
+81: INVVPID Gy,Mdq (66)
82: INVPCID Gy,Mdq (66)
83: vpmultishiftqb Vx,Hx,Wx (66),(ev)
88: vexpandps/d Vpd,Wpd (66),(ev)
EndTable
GrpTable: Grp10
+# all are UD1
+0: UD1
+1: UD1
+2: UD1
+3: UD1
+4: UD1
+5: UD1
+6: UD1
+7: UD1
EndTable
# Grp11A and Grp11B are expressed as Grp11 in Intel SDM
endif
obj-y := init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
- pat.o pgtable.o physaddr.o setup_nx.o tlb.o
+ pat.o pgtable.o physaddr.o setup_nx.o tlb.o cpu_entry_area.o
# Make sure __phys_addr has no stackprotector
nostackp := $(call cc-option, -fno-stack-protector)
obj-$(CONFIG_ACPI_NUMA) += srat.o
obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
-obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
-obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
-obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
+obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
--- /dev/null
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/spinlock.h>
+#include <linux/percpu.h>
+
+#include <asm/cpu_entry_area.h>
+#include <asm/pgtable.h>
+#include <asm/fixmap.h>
+#include <asm/desc.h>
+
+static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage);
+
+#ifdef CONFIG_X86_64
+static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
+ [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]);
+#endif
+
+struct cpu_entry_area *get_cpu_entry_area(int cpu)
+{
+ unsigned long va = CPU_ENTRY_AREA_PER_CPU + cpu * CPU_ENTRY_AREA_SIZE;
+ BUILD_BUG_ON(sizeof(struct cpu_entry_area) % PAGE_SIZE != 0);
+
+ return (struct cpu_entry_area *) va;
+}
+EXPORT_SYMBOL(get_cpu_entry_area);
+
+void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags)
+{
+ unsigned long va = (unsigned long) cea_vaddr;
+
+ set_pte_vaddr(va, pfn_pte(pa >> PAGE_SHIFT, flags));
+}
+
+static void __init
+cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
+{
+ for ( ; pages; pages--, cea_vaddr+= PAGE_SIZE, ptr += PAGE_SIZE)
+ cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
+}
+
+static void percpu_setup_debug_store(int cpu)
+{
+#ifdef CONFIG_CPU_SUP_INTEL
+ int npages;
+ void *cea;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ return;
+
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_store;
+ npages = sizeof(struct debug_store) / PAGE_SIZE;
+ BUILD_BUG_ON(sizeof(struct debug_store) % PAGE_SIZE != 0);
+ cea_map_percpu_pages(cea, &per_cpu(cpu_debug_store, cpu), npages,
+ PAGE_KERNEL);
+
+ cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers;
+ /*
+ * Force the population of PMDs for not yet allocated per cpu
+ * memory like debug store buffers.
+ */
+ npages = sizeof(struct debug_store_buffers) / PAGE_SIZE;
+ for (; npages; npages--, cea += PAGE_SIZE)
+ cea_set_pte(cea, 0, PAGE_NONE);
+#endif
+}
+
+/* Setup the fixmap mappings only once per-processor */
+static void __init setup_cpu_entry_area(int cpu)
+{
+#ifdef CONFIG_X86_64
+ extern char _entry_trampoline[];
+
+ /* On 64-bit systems, we use a read-only fixmap GDT and TSS. */
+ pgprot_t gdt_prot = PAGE_KERNEL_RO;
+ pgprot_t tss_prot = PAGE_KERNEL_RO;
+#else
+ /*
+ * On native 32-bit systems, the GDT cannot be read-only because
+ * our double fault handler uses a task gate, and entering through
+ * a task gate needs to change an available TSS to busy. If the
+ * GDT is read-only, that will triple fault. The TSS cannot be
+ * read-only because the CPU writes to it on task switches.
+ *
+ * On Xen PV, the GDT must be read-only because the hypervisor
+ * requires it.
+ */
+ pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ?
+ PAGE_KERNEL_RO : PAGE_KERNEL;
+ pgprot_t tss_prot = PAGE_KERNEL;
+#endif
+
+ cea_set_pte(&get_cpu_entry_area(cpu)->gdt, get_cpu_gdt_paddr(cpu),
+ gdt_prot);
+
+ cea_map_percpu_pages(&get_cpu_entry_area(cpu)->entry_stack_page,
+ per_cpu_ptr(&entry_stack_storage, cpu), 1,
+ PAGE_KERNEL);
+
+ /*
+ * The Intel SDM says (Volume 3, 7.2.1):
+ *
+ * Avoid placing a page boundary in the part of the TSS that the
+ * processor reads during a task switch (the first 104 bytes). The
+ * processor may not correctly perform address translations if a
+ * boundary occurs in this area. During a task switch, the processor
+ * reads and writes into the first 104 bytes of each TSS (using
+ * contiguous physical addresses beginning with the physical address
+ * of the first byte of the TSS). So, after TSS access begins, if
+ * part of the 104 bytes is not physically contiguous, the processor
+ * will access incorrect information without generating a page-fault
+ * exception.
+ *
+ * There are also a lot of errata involving the TSS spanning a page
+ * boundary. Assert that we're not doing that.
+ */
+ BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^
+ offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK);
+ BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0);
+ cea_map_percpu_pages(&get_cpu_entry_area(cpu)->tss,
+ &per_cpu(cpu_tss_rw, cpu),
+ sizeof(struct tss_struct) / PAGE_SIZE, tss_prot);
+
+#ifdef CONFIG_X86_32
+ per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu);
+#endif
+
+#ifdef CONFIG_X86_64
+ BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0);
+ BUILD_BUG_ON(sizeof(exception_stacks) !=
+ sizeof(((struct cpu_entry_area *)0)->exception_stacks));
+ cea_map_percpu_pages(&get_cpu_entry_area(cpu)->exception_stacks,
+ &per_cpu(exception_stacks, cpu),
+ sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL);
+
+ cea_set_pte(&get_cpu_entry_area(cpu)->entry_trampoline,
+ __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX);
+#endif
+ percpu_setup_debug_store(cpu);
+}
+
+static __init void setup_cpu_entry_area_ptes(void)
+{
+#ifdef CONFIG_X86_32
+ unsigned long start, end;
+
+ BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);
+ BUG_ON(CPU_ENTRY_AREA_BASE & ~PMD_MASK);
+
+ start = CPU_ENTRY_AREA_BASE;
+ end = start + CPU_ENTRY_AREA_MAP_SIZE;
+
+ /* Careful here: start + PMD_SIZE might wrap around */
+ for (; start < end && start >= CPU_ENTRY_AREA_BASE; start += PMD_SIZE)
+ populate_extra_pte(start);
+#endif
+}
+
+void __init setup_cpu_entry_areas(void)
+{
+ unsigned int cpu;
+
+ setup_cpu_entry_area_ptes();
+
+ for_each_possible_cpu(cpu)
+ setup_cpu_entry_area(cpu);
+}
static int ptdump_show(struct seq_file *m, void *v)
{
- ptdump_walk_pgd_level(m, NULL);
+ ptdump_walk_pgd_level_debugfs(m, NULL, false);
return 0;
}
.release = single_release,
};
-static struct dentry *pe;
+static int ptdump_show_curknl(struct seq_file *m, void *v)
+{
+ if (current->mm->pgd) {
+ down_read(¤t->mm->mmap_sem);
+ ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false);
+ up_read(¤t->mm->mmap_sem);
+ }
+ return 0;
+}
+
+static int ptdump_open_curknl(struct inode *inode, struct file *filp)
+{
+ return single_open(filp, ptdump_show_curknl, NULL);
+}
+
+static const struct file_operations ptdump_curknl_fops = {
+ .owner = THIS_MODULE,
+ .open = ptdump_open_curknl,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static struct dentry *pe_curusr;
+
+static int ptdump_show_curusr(struct seq_file *m, void *v)
+{
+ if (current->mm->pgd) {
+ down_read(¤t->mm->mmap_sem);
+ ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true);
+ up_read(¤t->mm->mmap_sem);
+ }
+ return 0;
+}
+
+static int ptdump_open_curusr(struct inode *inode, struct file *filp)
+{
+ return single_open(filp, ptdump_show_curusr, NULL);
+}
+
+static const struct file_operations ptdump_curusr_fops = {
+ .owner = THIS_MODULE,
+ .open = ptdump_open_curusr,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+#endif
+
+static struct dentry *dir, *pe_knl, *pe_curknl;
static int __init pt_dump_debug_init(void)
{
- pe = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL,
- &ptdump_fops);
- if (!pe)
+ dir = debugfs_create_dir("page_tables", NULL);
+ if (!dir)
return -ENOMEM;
+ pe_knl = debugfs_create_file("kernel", 0400, dir, NULL,
+ &ptdump_fops);
+ if (!pe_knl)
+ goto err;
+
+ pe_curknl = debugfs_create_file("current_kernel", 0400,
+ dir, NULL, &ptdump_curknl_fops);
+ if (!pe_curknl)
+ goto err;
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ pe_curusr = debugfs_create_file("current_user", 0400,
+ dir, NULL, &ptdump_curusr_fops);
+ if (!pe_curusr)
+ goto err;
+#endif
return 0;
+err:
+ debugfs_remove_recursive(dir);
+ return -ENOMEM;
}
static void __exit pt_dump_debug_exit(void)
{
- debugfs_remove_recursive(pe);
+ debugfs_remove_recursive(dir);
}
module_init(pt_dump_debug_init);
unsigned long max_lines;
};
-/* indices for address_markers; keep sync'd w/ address_markers below */
+/* Address space markers hints */
+
+#ifdef CONFIG_X86_64
+
enum address_markers_idx {
USER_SPACE_NR = 0,
-#ifdef CONFIG_X86_64
KERNEL_SPACE_NR,
LOW_KERNEL_NR,
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && defined(CONFIG_X86_5LEVEL)
+ LDT_NR,
+#endif
VMALLOC_START_NR,
VMEMMAP_START_NR,
#ifdef CONFIG_KASAN
KASAN_SHADOW_START_NR,
KASAN_SHADOW_END_NR,
#endif
-# ifdef CONFIG_X86_ESPFIX64
+ CPU_ENTRY_AREA_NR,
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && !defined(CONFIG_X86_5LEVEL)
+ LDT_NR,
+#endif
+#ifdef CONFIG_X86_ESPFIX64
ESPFIX_START_NR,
-# endif
+#endif
+#ifdef CONFIG_EFI
+ EFI_END_NR,
+#endif
HIGH_KERNEL_NR,
MODULES_VADDR_NR,
MODULES_END_NR,
-#else
+ FIXADDR_START_NR,
+ END_OF_SPACE_NR,
+};
+
+static struct addr_marker address_markers[] = {
+ [USER_SPACE_NR] = { 0, "User Space" },
+ [KERNEL_SPACE_NR] = { (1UL << 63), "Kernel Space" },
+ [LOW_KERNEL_NR] = { 0UL, "Low Kernel Mapping" },
+ [VMALLOC_START_NR] = { 0UL, "vmalloc() Area" },
+ [VMEMMAP_START_NR] = { 0UL, "Vmemmap" },
+#ifdef CONFIG_KASAN
+ [KASAN_SHADOW_START_NR] = { KASAN_SHADOW_START, "KASAN shadow" },
+ [KASAN_SHADOW_END_NR] = { KASAN_SHADOW_END, "KASAN shadow end" },
+#endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+ [LDT_NR] = { LDT_BASE_ADDR, "LDT remap" },
+#endif
+ [CPU_ENTRY_AREA_NR] = { CPU_ENTRY_AREA_BASE,"CPU entry Area" },
+#ifdef CONFIG_X86_ESPFIX64
+ [ESPFIX_START_NR] = { ESPFIX_BASE_ADDR, "ESPfix Area", 16 },
+#endif
+#ifdef CONFIG_EFI
+ [EFI_END_NR] = { EFI_VA_END, "EFI Runtime Services" },
+#endif
+ [HIGH_KERNEL_NR] = { __START_KERNEL_map, "High Kernel Mapping" },
+ [MODULES_VADDR_NR] = { MODULES_VADDR, "Modules" },
+ [MODULES_END_NR] = { MODULES_END, "End Modules" },
+ [FIXADDR_START_NR] = { FIXADDR_START, "Fixmap Area" },
+ [END_OF_SPACE_NR] = { -1, NULL }
+};
+
+#else /* CONFIG_X86_64 */
+
+enum address_markers_idx {
+ USER_SPACE_NR = 0,
KERNEL_SPACE_NR,
VMALLOC_START_NR,
VMALLOC_END_NR,
-# ifdef CONFIG_HIGHMEM
+#ifdef CONFIG_HIGHMEM
PKMAP_BASE_NR,
-# endif
- FIXADDR_START_NR,
#endif
+ CPU_ENTRY_AREA_NR,
+ FIXADDR_START_NR,
+ END_OF_SPACE_NR,
};
-/* Address space markers hints */
static struct addr_marker address_markers[] = {
- { 0, "User Space" },
-#ifdef CONFIG_X86_64
- { 0x8000000000000000UL, "Kernel Space" },
- { 0/* PAGE_OFFSET */, "Low Kernel Mapping" },
- { 0/* VMALLOC_START */, "vmalloc() Area" },
- { 0/* VMEMMAP_START */, "Vmemmap" },
-#ifdef CONFIG_KASAN
- { KASAN_SHADOW_START, "KASAN shadow" },
- { KASAN_SHADOW_END, "KASAN shadow end" },
+ [USER_SPACE_NR] = { 0, "User Space" },
+ [KERNEL_SPACE_NR] = { PAGE_OFFSET, "Kernel Mapping" },
+ [VMALLOC_START_NR] = { 0UL, "vmalloc() Area" },
+ [VMALLOC_END_NR] = { 0UL, "vmalloc() End" },
+#ifdef CONFIG_HIGHMEM
+ [PKMAP_BASE_NR] = { 0UL, "Persistent kmap() Area" },
#endif
-# ifdef CONFIG_X86_ESPFIX64
- { ESPFIX_BASE_ADDR, "ESPfix Area", 16 },
-# endif
-# ifdef CONFIG_EFI
- { EFI_VA_END, "EFI Runtime Services" },
-# endif
- { __START_KERNEL_map, "High Kernel Mapping" },
- { MODULES_VADDR, "Modules" },
- { MODULES_END, "End Modules" },
-#else
- { PAGE_OFFSET, "Kernel Mapping" },
- { 0/* VMALLOC_START */, "vmalloc() Area" },
- { 0/*VMALLOC_END*/, "vmalloc() End" },
-# ifdef CONFIG_HIGHMEM
- { 0/*PKMAP_BASE*/, "Persistent kmap() Area" },
-# endif
- { 0/*FIXADDR_START*/, "Fixmap Area" },
-#endif
- { -1, NULL } /* End of list */
+ [CPU_ENTRY_AREA_NR] = { 0UL, "CPU entry area" },
+ [FIXADDR_START_NR] = { 0UL, "Fixmap area" },
+ [END_OF_SPACE_NR] = { -1, NULL }
};
+#endif /* !CONFIG_X86_64 */
+
/* Multipliers for offsets within the PTEs */
#define PTE_LEVEL_MULT (PAGE_SIZE)
#define PMD_LEVEL_MULT (PTRS_PER_PTE * PTE_LEVEL_MULT)
static const char * const level_name[] =
{ "cr3", "pgd", "p4d", "pud", "pmd", "pte" };
- if (!pgprot_val(prot)) {
+ if (!(pr & _PAGE_PRESENT)) {
/* Not present */
pt_dump_cont_printf(m, dmsg, " ");
} else {
}
static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
- bool checkwx)
+ bool checkwx, bool dmesg)
{
#ifdef CONFIG_X86_64
pgd_t *start = (pgd_t *) &init_top_pgt;
if (pgd) {
start = pgd;
- st.to_dmesg = true;
+ st.to_dmesg = dmesg;
}
st.check_wx = checkwx;
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
{
- ptdump_walk_pgd_level_core(m, pgd, false);
+ ptdump_walk_pgd_level_core(m, pgd, false, true);
+}
+
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ if (user && static_cpu_has(X86_FEATURE_PTI))
+ pgd = kernel_to_user_pgdp(pgd);
+#endif
+ ptdump_walk_pgd_level_core(m, pgd, false, false);
+}
+EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
+
+static void ptdump_walk_user_pgd_level_checkwx(void)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+ pgd_t *pgd = (pgd_t *) &init_top_pgt;
+
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ pr_info("x86/mm: Checking user space page tables\n");
+ pgd = kernel_to_user_pgdp(pgd);
+ ptdump_walk_pgd_level_core(NULL, pgd, true, false);
+#endif
}
-EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level);
void ptdump_walk_pgd_level_checkwx(void)
{
- ptdump_walk_pgd_level_core(NULL, NULL, true);
+ ptdump_walk_pgd_level_core(NULL, NULL, true, false);
+ ptdump_walk_user_pgd_level_checkwx();
}
static int __init pt_dump_init(void)
address_markers[PKMAP_BASE_NR].start_address = PKMAP_BASE;
# endif
address_markers[FIXADDR_START_NR].start_address = FIXADDR_START;
+ address_markers[CPU_ENTRY_AREA_NR].start_address = CPU_ENTRY_AREA_BASE;
#endif
-
return 0;
}
__initcall(pt_dump_init);
if (!printk_ratelimit())
return;
- printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
+ printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address,
(void *)regs->ip, (void *)regs->sp, error_code);
#include <asm/kaslr.h>
#include <asm/hypervisor.h>
#include <asm/cpufeature.h>
+#include <asm/pti.h>
/*
* We need to define the tracepoints somewhere, and tlb.c
static int page_size_mask;
+static void enable_global_pages(void)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ __supported_pte_mask |= _PAGE_GLOBAL;
+}
+
static void __init probe_page_size_mask(void)
{
/*
cr4_set_bits_and_update_boot(X86_CR4_PSE);
/* Enable PGE if available */
+ __supported_pte_mask &= ~_PAGE_GLOBAL;
if (boot_cpu_has(X86_FEATURE_PGE)) {
cr4_set_bits_and_update_boot(X86_CR4_PGE);
- __supported_pte_mask |= _PAGE_GLOBAL;
- } else
- __supported_pte_mask &= ~_PAGE_GLOBAL;
+ enable_global_pages();
+ }
/* Enable 1 GB linear kernel mappings if available: */
if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) {
static void setup_pcid(void)
{
-#ifdef CONFIG_X86_64
- if (boot_cpu_has(X86_FEATURE_PCID)) {
- if (boot_cpu_has(X86_FEATURE_PGE)) {
- /*
- * This can't be cr4_set_bits_and_update_boot() --
- * the trampoline code can't handle CR4.PCIDE and
- * it wouldn't do any good anyway. Despite the name,
- * cr4_set_bits_and_update_boot() doesn't actually
- * cause the bits in question to remain set all the
- * way through the secondary boot asm.
- *
- * Instead, we brute-force it and set CR4.PCIDE
- * manually in start_secondary().
- */
- cr4_set_bits(X86_CR4_PCIDE);
- } else {
- /*
- * flush_tlb_all(), as currently implemented, won't
- * work if PCID is on but PGE is not. Since that
- * combination doesn't exist on real hardware, there's
- * no reason to try to fully support it, but it's
- * polite to avoid corrupting data if we're on
- * an improperly configured VM.
- */
- setup_clear_cpu_cap(X86_FEATURE_PCID);
- }
+ if (!IS_ENABLED(CONFIG_X86_64))
+ return;
+
+ if (!boot_cpu_has(X86_FEATURE_PCID))
+ return;
+
+ if (boot_cpu_has(X86_FEATURE_PGE)) {
+ /*
+ * This can't be cr4_set_bits_and_update_boot() -- the
+ * trampoline code can't handle CR4.PCIDE and it wouldn't
+ * do any good anyway. Despite the name,
+ * cr4_set_bits_and_update_boot() doesn't actually cause
+ * the bits in question to remain set all the way through
+ * the secondary boot asm.
+ *
+ * Instead, we brute-force it and set CR4.PCIDE manually in
+ * start_secondary().
+ */
+ cr4_set_bits(X86_CR4_PCIDE);
+
+ /*
+ * INVPCID's single-context modes (2/3) only work if we set
+ * X86_CR4_PCIDE, *and* we INVPCID support. It's unusable
+ * on systems that have X86_CR4_PCIDE clear, or that have
+ * no INVPCID support at all.
+ */
+ if (boot_cpu_has(X86_FEATURE_INVPCID))
+ setup_force_cpu_cap(X86_FEATURE_INVPCID_SINGLE);
+ } else {
+ /*
+ * flush_tlb_all(), as currently implemented, won't work if
+ * PCID is on but PGE is not. Since that combination
+ * doesn't exist on real hardware, there's no reason to try
+ * to fully support it, but it's polite to avoid corrupting
+ * data if we're on an improperly configured VM.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_PCID);
}
-#endif
}
#ifdef CONFIG_X86_32
{
unsigned long end;
+ pti_check_boottime_disable();
probe_page_size_mask();
setup_pcid();
free_area_init_nodes(max_zone_pfns);
}
-DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
+__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
.loaded_mm = &init_mm,
.next_asid = 1,
.cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */
};
-EXPORT_SYMBOL_GPL(cpu_tlbstate);
+EXPORT_PER_CPU_SYMBOL(cpu_tlbstate);
void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache)
{
#include <asm/setup.h>
#include <asm/set_memory.h>
#include <asm/page_types.h>
+#include <asm/cpu_entry_area.h>
#include <asm/init.h>
#include "mm_internal.h"
mem_init_print_info(NULL);
printk(KERN_INFO "virtual kernel memory layout:\n"
" fixmap : 0x%08lx - 0x%08lx (%4ld kB)\n"
+ " cpu_entry : 0x%08lx - 0x%08lx (%4ld kB)\n"
#ifdef CONFIG_HIGHMEM
" pkmap : 0x%08lx - 0x%08lx (%4ld kB)\n"
#endif
FIXADDR_START, FIXADDR_TOP,
(FIXADDR_TOP - FIXADDR_START) >> 10,
+ CPU_ENTRY_AREA_BASE,
+ CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE,
+ CPU_ENTRY_AREA_MAP_SIZE >> 10,
+
#ifdef CONFIG_HIGHMEM
PKMAP_BASE, PKMAP_BASE+LAST_PKMAP*PAGE_SIZE,
(LAST_PKMAP*PAGE_SIZE) >> 10,
return;
}
+ mmiotrace_iounmap(addr);
+
addr = (volatile void __iomem *)
(PAGE_MASK & (unsigned long __force)addr);
- mmiotrace_iounmap(addr);
-
/* Use the vm area unlocked, assuming the caller
ensures there isn't another iounmap for the same address
in parallel. Reuse of the virtual address is prevented by
#include <asm/tlbflush.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
+#include <asm/cpu_entry_area.h>
extern struct range pfn_mapped[E820_MAX_ENTRIES];
void __init kasan_init(void)
{
int i;
+ void *shadow_cpu_entry_begin, *shadow_cpu_entry_end;
#ifdef CONFIG_KASAN_INLINE
register_die_notifier(&kasan_die_notifier);
map_range(&pfn_mapped[i]);
}
+ shadow_cpu_entry_begin = (void *)CPU_ENTRY_AREA_BASE;
+ shadow_cpu_entry_begin = kasan_mem_to_shadow(shadow_cpu_entry_begin);
+ shadow_cpu_entry_begin = (void *)round_down((unsigned long)shadow_cpu_entry_begin,
+ PAGE_SIZE);
+
+ shadow_cpu_entry_end = (void *)(CPU_ENTRY_AREA_BASE +
+ CPU_ENTRY_AREA_MAP_SIZE);
+ shadow_cpu_entry_end = kasan_mem_to_shadow(shadow_cpu_entry_end);
+ shadow_cpu_entry_end = (void *)round_up((unsigned long)shadow_cpu_entry_end,
+ PAGE_SIZE);
+
kasan_populate_zero_shadow(
kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
- kasan_mem_to_shadow((void *)__START_KERNEL_map));
+ shadow_cpu_entry_begin);
+
+ kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin,
+ (unsigned long)shadow_cpu_entry_end, 0);
+
+ kasan_populate_zero_shadow(shadow_cpu_entry_end,
+ kasan_mem_to_shadow((void *)__START_KERNEL_map));
kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext),
(unsigned long)kasan_mem_to_shadow(_end),
early_pfn_to_nid(__pa(_stext)));
kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END),
- (void *)KASAN_SHADOW_END);
+ (void *)KASAN_SHADOW_END);
load_cr3(init_top_pgt);
__flush_tlb_all();
#define TB_SHIFT 40
/*
- * Virtual address start and end range for randomization. The end changes base
- * on configuration to have the highest amount of space for randomization.
- * It increases the possible random position for each randomized region.
+ * Virtual address start and end range for randomization.
*
- * You need to add an if/def entry if you introduce a new memory region
- * compatible with KASLR. Your entry must be in logical order with memory
- * layout. For example, ESPFIX is before EFI because its virtual address is
- * before. You also need to add a BUILD_BUG_ON() in kernel_randomize_memory() to
- * ensure that this order is correct and won't be changed.
+ * The end address could depend on more configuration options to make the
+ * highest amount of space for randomization available, but that's too hard
+ * to keep straight and caused issues already.
*/
static const unsigned long vaddr_start = __PAGE_OFFSET_BASE;
-
-#if defined(CONFIG_X86_ESPFIX64)
-static const unsigned long vaddr_end = ESPFIX_BASE_ADDR;
-#elif defined(CONFIG_EFI)
-static const unsigned long vaddr_end = EFI_VA_END;
-#else
-static const unsigned long vaddr_end = __START_KERNEL_map;
-#endif
+static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
/* Default values */
unsigned long page_offset_base = __PAGE_OFFSET_BASE;
unsigned long remain_entropy;
/*
- * All these BUILD_BUG_ON checks ensures the memory layout is
- * consistent with the vaddr_start/vaddr_end variables.
+ * These BUILD_BUG_ON checks ensure the memory layout is consistent
+ * with the vaddr_start/vaddr_end variables. These checks are very
+ * limited....
*/
BUILD_BUG_ON(vaddr_start >= vaddr_end);
- BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_ESPFIX64) &&
- vaddr_end >= EFI_VA_END);
- BUILD_BUG_ON((IS_ENABLED(CONFIG_X86_ESPFIX64) ||
- IS_ENABLED(CONFIG_EFI)) &&
- vaddr_end >= __START_KERNEL_map);
+ BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE);
BUILD_BUG_ON(vaddr_end > __START_KERNEL_map);
if (!kaslr_memory_enabled())
unsigned long flags;
int ret = 0;
unsigned long size = 0;
+ unsigned long addr = p->addr & PAGE_MASK;
const unsigned long size_lim = p->len + (p->addr & ~PAGE_MASK);
unsigned int l;
pte_t *pte;
spin_lock_irqsave(&kmmio_lock, flags);
- if (get_kmmio_probe(p->addr)) {
+ if (get_kmmio_probe(addr)) {
ret = -EEXIST;
goto out;
}
- pte = lookup_address(p->addr, &l);
+ pte = lookup_address(addr, &l);
if (!pte) {
ret = -EINVAL;
goto out;
kmmio_count++;
list_add_rcu(&p->list, &kmmio_probes);
while (size < size_lim) {
- if (add_kmmio_fault_page(p->addr + size))
+ if (add_kmmio_fault_page(addr + size))
pr_err("Unable to set page fault.\n");
size += page_level_size(l);
}
{
unsigned long flags;
unsigned long size = 0;
+ unsigned long addr = p->addr & PAGE_MASK;
const unsigned long size_lim = p->len + (p->addr & ~PAGE_MASK);
struct kmmio_fault_page *release_list = NULL;
struct kmmio_delayed_release *drelease;
unsigned int l;
pte_t *pte;
- pte = lookup_address(p->addr, &l);
+ pte = lookup_address(addr, &l);
if (!pte)
return;
spin_lock_irqsave(&kmmio_lock, flags);
while (size < size_lim) {
- release_kmmio_fault_page(p->addr + size, &release_list);
+ release_kmmio_fault_page(addr + size, &release_list);
size += page_level_size(l);
}
list_del_rcu(&p->list);
{
return sme_me_mask && !sev_enabled;
}
-EXPORT_SYMBOL_GPL(sme_active);
+EXPORT_SYMBOL(sme_active);
bool sev_active(void)
{
return sme_me_mask && sev_enabled;
}
-EXPORT_SYMBOL_GPL(sev_active);
+EXPORT_SYMBOL(sev_active);
static const struct dma_map_ops sev_dma_ops = {
.alloc = sev_alloc,
kmem_cache_free(pgd_cache, pgd);
}
#else
+
static inline pgd_t *_pgd_alloc(void)
{
- return (pgd_t *)__get_free_page(PGALLOC_GFP);
+ return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER);
}
static inline void _pgd_free(pgd_t *pgd)
{
- free_page((unsigned long)pgd);
+ free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
}
#endif /* CONFIG_X86_PAE */
#include <linux/pagemap.h>
#include <linux/spinlock.h>
+#include <asm/cpu_entry_area.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/fixmap.h>
--- /dev/null
+/*
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * This code is based in part on work published here:
+ *
+ * https://github.com/IAIK/KAISER
+ *
+ * The original work was written by and and signed off by for the Linux
+ * kernel by:
+ *
+ * Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
+ * Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
+ * Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
+ * Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
+ *
+ * Major changes to the original code by: Dave Hansen <dave.hansen@intel.com>
+ * Mostly rewritten by Thomas Gleixner <tglx@linutronix.de> and
+ * Andy Lutomirsky <luto@amacapital.net>
+ */
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+
+#include <asm/cpufeature.h>
+#include <asm/hypervisor.h>
+#include <asm/vsyscall.h>
+#include <asm/cmdline.h>
+#include <asm/pti.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
+#include <asm/desc.h>
+
+#undef pr_fmt
+#define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt
+
+/* Backporting helper */
+#ifndef __GFP_NOTRACK
+#define __GFP_NOTRACK 0
+#endif
+
+static void __init pti_print_if_insecure(const char *reason)
+{
+ if (boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
+ pr_info("%s\n", reason);
+}
+
+static void __init pti_print_if_secure(const char *reason)
+{
+ if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
+ pr_info("%s\n", reason);
+}
+
+void __init pti_check_boottime_disable(void)
+{
+ char arg[5];
+ int ret;
+
+ if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
+ pti_print_if_insecure("disabled on XEN PV.");
+ return;
+ }
+
+ ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
+ if (ret > 0) {
+ if (ret == 3 && !strncmp(arg, "off", 3)) {
+ pti_print_if_insecure("disabled on command line.");
+ return;
+ }
+ if (ret == 2 && !strncmp(arg, "on", 2)) {
+ pti_print_if_secure("force enabled on command line.");
+ goto enable;
+ }
+ if (ret == 4 && !strncmp(arg, "auto", 4))
+ goto autosel;
+ }
+
+ if (cmdline_find_option_bool(boot_command_line, "nopti")) {
+ pti_print_if_insecure("disabled on command line.");
+ return;
+ }
+
+autosel:
+ if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
+ return;
+enable:
+ setup_force_cpu_cap(X86_FEATURE_PTI);
+}
+
+pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ /*
+ * Changes to the high (kernel) portion of the kernelmode page
+ * tables are not automatically propagated to the usermode tables.
+ *
+ * Users should keep in mind that, unlike the kernelmode tables,
+ * there is no vmalloc_fault equivalent for the usermode tables.
+ * Top-level entries added to init_mm's usermode pgd after boot
+ * will not be automatically propagated to other mms.
+ */
+ if (!pgdp_maps_userspace(pgdp))
+ return pgd;
+
+ /*
+ * The user page tables get the full PGD, accessible from
+ * userspace:
+ */
+ kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd;
+
+ /*
+ * If this is normal user memory, make it NX in the kernel
+ * pagetables so that, if we somehow screw up and return to
+ * usermode with the kernel CR3 loaded, we'll get a page fault
+ * instead of allowing user code to execute with the wrong CR3.
+ *
+ * As exceptions, we don't set NX if:
+ * - _PAGE_USER is not set. This could be an executable
+ * EFI runtime mapping or something similar, and the kernel
+ * may execute from it
+ * - we don't have NX support
+ * - we're clearing the PGD (i.e. the new pgd is not present).
+ */
+ if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) &&
+ (__supported_pte_mask & _PAGE_NX))
+ pgd.pgd |= _PAGE_NX;
+
+ /* return the copy of the PGD we want the kernel to use: */
+ return pgd;
+}
+
+/*
+ * Walk the user copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down.
+ *
+ * Returns a pointer to a P4D on success, or NULL on failure.
+ */
+static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
+{
+ pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address));
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+
+ if (address < PAGE_OFFSET) {
+ WARN_ONCE(1, "attempt to walk user address\n");
+ return NULL;
+ }
+
+ if (pgd_none(*pgd)) {
+ unsigned long new_p4d_page = __get_free_page(gfp);
+ if (!new_p4d_page)
+ return NULL;
+
+ if (pgd_none(*pgd)) {
+ set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
+ new_p4d_page = 0;
+ }
+ if (new_p4d_page)
+ free_page(new_p4d_page);
+ }
+ BUILD_BUG_ON(pgd_large(*pgd) != 0);
+
+ return p4d_offset(pgd, address);
+}
+
+/*
+ * Walk the user copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down.
+ *
+ * Returns a pointer to a PMD on success, or NULL on failure.
+ */
+static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
+{
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+ p4d_t *p4d = pti_user_pagetable_walk_p4d(address);
+ pud_t *pud;
+
+ BUILD_BUG_ON(p4d_large(*p4d) != 0);
+ if (p4d_none(*p4d)) {
+ unsigned long new_pud_page = __get_free_page(gfp);
+ if (!new_pud_page)
+ return NULL;
+
+ if (p4d_none(*p4d)) {
+ set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
+ new_pud_page = 0;
+ }
+ if (new_pud_page)
+ free_page(new_pud_page);
+ }
+
+ pud = pud_offset(p4d, address);
+ /* The user page tables do not use large mappings: */
+ if (pud_large(*pud)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ if (pud_none(*pud)) {
+ unsigned long new_pmd_page = __get_free_page(gfp);
+ if (!new_pmd_page)
+ return NULL;
+
+ if (pud_none(*pud)) {
+ set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
+ new_pmd_page = 0;
+ }
+ if (new_pmd_page)
+ free_page(new_pmd_page);
+ }
+
+ return pmd_offset(pud, address);
+}
+
+#ifdef CONFIG_X86_VSYSCALL_EMULATION
+/*
+ * Walk the shadow copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down. Does not support large pages.
+ *
+ * Note: this is only used when mapping *new* kernel data into the
+ * user/shadow page tables. It is never used for userspace data.
+ *
+ * Returns a pointer to a PTE on success, or NULL on failure.
+ */
+static __init pte_t *pti_user_pagetable_walk_pte(unsigned long address)
+{
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+ pmd_t *pmd = pti_user_pagetable_walk_pmd(address);
+ pte_t *pte;
+
+ /* We can't do anything sensible if we hit a large mapping. */
+ if (pmd_large(*pmd)) {
+ WARN_ON(1);
+ return NULL;
+ }
+
+ if (pmd_none(*pmd)) {
+ unsigned long new_pte_page = __get_free_page(gfp);
+ if (!new_pte_page)
+ return NULL;
+
+ if (pmd_none(*pmd)) {
+ set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+ new_pte_page = 0;
+ }
+ if (new_pte_page)
+ free_page(new_pte_page);
+ }
+
+ pte = pte_offset_kernel(pmd, address);
+ if (pte_flags(*pte) & _PAGE_USER) {
+ WARN_ONCE(1, "attempt to walk to user pte\n");
+ return NULL;
+ }
+ return pte;
+}
+
+static void __init pti_setup_vsyscall(void)
+{
+ pte_t *pte, *target_pte;
+ unsigned int level;
+
+ pte = lookup_address(VSYSCALL_ADDR, &level);
+ if (!pte || WARN_ON(level != PG_LEVEL_4K) || pte_none(*pte))
+ return;
+
+ target_pte = pti_user_pagetable_walk_pte(VSYSCALL_ADDR);
+ if (WARN_ON(!target_pte))
+ return;
+
+ *target_pte = *pte;
+ set_vsyscall_pgtable_user_bits(kernel_to_user_pgdp(swapper_pg_dir));
+}
+#else
+static void __init pti_setup_vsyscall(void) { }
+#endif
+
+static void __init
+pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
+{
+ unsigned long addr;
+
+ /*
+ * Clone the populated PMDs which cover start to end. These PMD areas
+ * can have holes.
+ */
+ for (addr = start; addr < end; addr += PMD_SIZE) {
+ pmd_t *pmd, *target_pmd;
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+
+ pgd = pgd_offset_k(addr);
+ if (WARN_ON(pgd_none(*pgd)))
+ return;
+ p4d = p4d_offset(pgd, addr);
+ if (WARN_ON(p4d_none(*p4d)))
+ return;
+ pud = pud_offset(p4d, addr);
+ if (pud_none(*pud))
+ continue;
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ continue;
+
+ target_pmd = pti_user_pagetable_walk_pmd(addr);
+ if (WARN_ON(!target_pmd))
+ return;
+
+ /*
+ * Copy the PMD. That is, the kernelmode and usermode
+ * tables will share the last-level page tables of this
+ * address range
+ */
+ *target_pmd = pmd_clear_flags(*pmd, clear);
+ }
+}
+
+/*
+ * Clone a single p4d (i.e. a top-level entry on 4-level systems and a
+ * next-level entry on 5-level systems.
+ */
+static void __init pti_clone_p4d(unsigned long addr)
+{
+ p4d_t *kernel_p4d, *user_p4d;
+ pgd_t *kernel_pgd;
+
+ user_p4d = pti_user_pagetable_walk_p4d(addr);
+ kernel_pgd = pgd_offset_k(addr);
+ kernel_p4d = p4d_offset(kernel_pgd, addr);
+ *user_p4d = *kernel_p4d;
+}
+
+/*
+ * Clone the CPU_ENTRY_AREA into the user space visible page table.
+ */
+static void __init pti_clone_user_shared(void)
+{
+ pti_clone_p4d(CPU_ENTRY_AREA_BASE);
+}
+
+/*
+ * Clone the ESPFIX P4D into the user space visinble page table
+ */
+static void __init pti_setup_espfix64(void)
+{
+#ifdef CONFIG_X86_ESPFIX64
+ pti_clone_p4d(ESPFIX_BASE_ADDR);
+#endif
+}
+
+/*
+ * Clone the populated PMDs of the entry and irqentry text and force it RO.
+ */
+static void __init pti_clone_entry_text(void)
+{
+ pti_clone_pmds((unsigned long) __entry_text_start,
+ (unsigned long) __irqentry_text_end,
+ _PAGE_RW | _PAGE_GLOBAL);
+}
+
+/*
+ * Initialize kernel page table isolation
+ */
+void __init pti_init(void)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ pr_info("enabled\n");
+
+ pti_clone_user_shared();
+ pti_clone_entry_text();
+ pti_setup_espfix64();
+ pti_setup_vsyscall();
+}
* Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi
*/
+/*
+ * We get here when we do something requiring a TLB invalidation
+ * but could not go invalidate all of the contexts. We do the
+ * necessary invalidation by clearing out the 'ctx_id' which
+ * forces a TLB flush when the context is loaded.
+ */
+void clear_asid_other(void)
+{
+ u16 asid;
+
+ /*
+ * This is only expected to be set if we have disabled
+ * kernel _PAGE_GLOBAL pages.
+ */
+ if (!static_cpu_has(X86_FEATURE_PTI)) {
+ WARN_ON_ONCE(1);
+ return;
+ }
+
+ for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
+ /* Do not need to flush the current asid */
+ if (asid == this_cpu_read(cpu_tlbstate.loaded_mm_asid))
+ continue;
+ /*
+ * Make sure the next time we go to switch to
+ * this asid, we do a flush:
+ */
+ this_cpu_write(cpu_tlbstate.ctxs[asid].ctx_id, 0);
+ }
+ this_cpu_write(cpu_tlbstate.invalidate_other, false);
+}
+
atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1);
return;
}
+ if (this_cpu_read(cpu_tlbstate.invalidate_other))
+ clear_asid_other();
+
for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) !=
next->context.ctx_id)
*need_flush = true;
}
+static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
+{
+ unsigned long new_mm_cr3;
+
+ if (need_flush) {
+ invalidate_user_asid(new_asid);
+ new_mm_cr3 = build_cr3(pgdir, new_asid);
+ } else {
+ new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
+ }
+
+ /*
+ * Caution: many callers of this function expect
+ * that load_cr3() is serializing and orders TLB
+ * fills with respect to the mm_cpumask writes.
+ */
+ write_cr3(new_mm_cr3);
+}
+
void leave_mm(int cpu)
{
struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
* isn't free.
*/
#ifdef CONFIG_DEBUG_VM
- if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev, prev_asid))) {
+ if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
/*
* If we were to BUG here, we'd be very likely to kill
* the system so hard that we don't see the call trace.
if (need_flush) {
this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
- write_cr3(build_cr3(next, new_asid));
+ load_new_mm_cr3(next->pgd, new_asid, true);
/*
* NB: This gets called via leave_mm() in the idle path
trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
} else {
/* The new ASID is already up to date. */
- write_cr3(build_cr3_noflush(next, new_asid));
+ load_new_mm_cr3(next->pgd, new_asid, false);
/* See above wrt _rcuidle. */
trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
!(cr4_read_shadow() & X86_CR4_PCIDE));
/* Force ASID 0 and force a TLB flush. */
- write_cr3(build_cr3(mm, 0));
+ write_cr3(build_cr3(mm->pgd, 0));
/* Reinitialize tlbstate. */
this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
/* flush range by one by one 'invlpg' */
for (addr = f->start; addr < f->end; addr += PAGE_SIZE)
- __flush_tlb_single(addr);
+ __flush_tlb_one(addr);
}
void flush_tlb_kernel_range(unsigned long start, unsigned long end)
unsigned i;
u32 base, limit, high;
struct resource *res, *conflict;
+ struct pci_dev *other;
+
+ /* Check that we are the only device of that type */
+ other = pci_get_device(dev->vendor, dev->device, NULL);
+ if (other != dev ||
+ (other = pci_get_device(dev->vendor, dev->device, other))) {
+ /* This is a multi-socket system, don't touch it for now */
+ pci_dev_put(other);
+ return;
+ }
for (i = 0; i < 8; i++) {
pci_read_config_dword(dev, AMD_141b_MMIO_BASE(i), &base);
res->end = 0xfd00000000ull - 1;
/* Just grab the free area behind system memory for this */
- while ((conflict = request_resource_conflict(&iomem_resource, res)))
+ while ((conflict = request_resource_conflict(&iomem_resource, res))) {
+ if (conflict->end >= res->end) {
+ kfree(res);
+ return;
+ }
res->start = conflict->end + 1;
+ }
dev_info(&dev->dev, "adding root bus resource %pR\n", res);
pci_bus_add_resource(dev->bus, res, 0);
}
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1401, pci_amd_enable_64bit_bar);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x141b, pci_amd_enable_64bit_bar);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1571, pci_amd_enable_64bit_bar);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x15b1, pci_amd_enable_64bit_bar);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1601, pci_amd_enable_64bit_bar);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1401, pci_amd_enable_64bit_bar);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x141b, pci_amd_enable_64bit_bar);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1571, pci_amd_enable_64bit_bar);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15b1, pci_amd_enable_64bit_bar);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1601, pci_amd_enable_64bit_bar);
#endif
* because we want to avoid inserting EFI region mappings (EFI_VA_END
* to EFI_VA_START) into the standard kernel page tables. Everything
* else can be shared, see efi_sync_low_kernel_mappings().
+ *
+ * We don't want the pgd on the pgd_list and cannot use pgd_alloc() for the
+ * allocation.
*/
int __init efi_alloc_page_tables(void)
{
return 0;
gfp_mask = GFP_KERNEL | __GFP_ZERO;
- efi_pgd = (pgd_t *)__get_free_page(gfp_mask);
+ efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER);
if (!efi_pgd)
return -ENOMEM;
/*
* Update the first page pointer to skip over the CSH header.
*/
- cap_info->pages[0] += csh->headersize;
+ cap_info->phys[0] += csh->headersize;
+
+ /*
+ * cap_info->capsule should point at a virtual mapping of the entire
+ * capsule, starting at the capsule header. Our image has the Quark
+ * security header prepended, so we cannot rely on the default vmap()
+ * mapping created by the generic capsule code.
+ * Given that the Quark firmware does not appear to care about the
+ * virtual mapping, let's just point cap_info->capsule at our copy
+ * of the capsule header.
+ */
+ cap_info->capsule = &cap_info->header;
return 1;
}
local_flush_tlb();
stat->d_alltlb++;
} else {
- __flush_tlb_one(msg->address);
+ __flush_tlb_single(msg->address);
stat->d_onetlb++;
}
stat->d_requestee++;
* on the specified blade to allow the sending of MSIs to the specified CPU.
*/
static int uv_domain_activate(struct irq_domain *domain,
- struct irq_data *irq_data, bool early)
+ struct irq_data *irq_data, bool reserve)
{
uv_program_mmr(irqd_cfg(irq_data), irq_data->chip_data);
return 0;
/*
* descriptor tables
*/
-#ifdef CONFIG_X86_32
store_idt(&ctxt->idt);
-#else
-/* CONFIG_X86_64 */
- store_idt((struct desc_ptr *)&ctxt->idt_limit);
-#endif
+
/*
* We save it here, but restore it only in the hibernate case.
* For ACPI S3 resume, this is loaded via 'early_gdt_desc' in 64-bit
/*
* segment registers
*/
-#ifdef CONFIG_X86_32
- savesegment(es, ctxt->es);
- savesegment(fs, ctxt->fs);
+#ifdef CONFIG_X86_32_LAZY_GS
savesegment(gs, ctxt->gs);
- savesegment(ss, ctxt->ss);
-#else
-/* CONFIG_X86_64 */
- asm volatile ("movw %%ds, %0" : "=m" (ctxt->ds));
- asm volatile ("movw %%es, %0" : "=m" (ctxt->es));
- asm volatile ("movw %%fs, %0" : "=m" (ctxt->fs));
- asm volatile ("movw %%gs, %0" : "=m" (ctxt->gs));
- asm volatile ("movw %%ss, %0" : "=m" (ctxt->ss));
+#endif
+#ifdef CONFIG_X86_64
+ savesegment(gs, ctxt->gs);
+ savesegment(fs, ctxt->fs);
+ savesegment(ds, ctxt->ds);
+ savesegment(es, ctxt->es);
rdmsrl(MSR_FS_BASE, ctxt->fs_base);
- rdmsrl(MSR_GS_BASE, ctxt->gs_base);
- rdmsrl(MSR_KERNEL_GS_BASE, ctxt->gs_kernel_base);
+ rdmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
+ rdmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
mtrr_save_fixed_ranges(NULL);
rdmsrl(MSR_EFER, ctxt->efer);
static void fix_processor_context(void)
{
int cpu = smp_processor_id();
- struct tss_struct *t = &per_cpu(cpu_tss, cpu);
#ifdef CONFIG_X86_64
struct desc_struct *desc = get_cpu_gdt_rw(cpu);
tss_desc tss;
#endif
- set_tss_desc(cpu, t); /*
- * This just modifies memory; should not be
- * necessary. But... This is necessary, because
- * 386 hardware has concept of busy TSS or some
- * similar stupidity.
- */
+
+ /*
+ * We need to reload TR, which requires that we change the
+ * GDT entry to indicate "available" first.
+ *
+ * XXX: This could probably all be replaced by a call to
+ * force_reload_TR().
+ */
+ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
#ifdef CONFIG_X86_64
memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc));
write_gdt_entry(desc, GDT_ENTRY_TSS, &tss, DESC_TSS);
syscall_init(); /* This sets MSR_*STAR and related */
+#else
+ if (boot_cpu_has(X86_FEATURE_SEP))
+ enable_sep_cpu();
#endif
load_TR_desc(); /* This does ltr */
load_mm_ldt(current->active_mm); /* This does lldt */
}
/**
- * __restore_processor_state - restore the contents of CPU registers saved
- * by __save_processor_state()
- * @ctxt - structure to load the registers contents from
+ * __restore_processor_state - restore the contents of CPU registers saved
+ * by __save_processor_state()
+ * @ctxt - structure to load the registers contents from
+ *
+ * The asm code that gets us here will have restored a usable GDT, although
+ * it will be pointing to the wrong alias.
*/
static void notrace __restore_processor_state(struct saved_context *ctxt)
{
write_cr2(ctxt->cr2);
write_cr0(ctxt->cr0);
+ /* Restore the IDT. */
+ load_idt(&ctxt->idt);
+
/*
- * now restore the descriptor tables to their proper values
- * ltr is done i fix_processor_context().
+ * Just in case the asm code got us here with the SS, DS, or ES
+ * out of sync with the GDT, update them.
*/
-#ifdef CONFIG_X86_32
- load_idt(&ctxt->idt);
-#else
-/* CONFIG_X86_64 */
- load_idt((const struct desc_ptr *)&ctxt->idt_limit);
-#endif
+ loadsegment(ss, __KERNEL_DS);
+ loadsegment(ds, __USER_DS);
+ loadsegment(es, __USER_DS);
-#ifdef CONFIG_X86_64
/*
- * We need GSBASE restored before percpu access can work.
- * percpu access can happen in exception handlers or in complicated
- * helpers like load_gs_index().
+ * Restore percpu access. Percpu access can happen in exception
+ * handlers or in complicated helpers like load_gs_index().
*/
- wrmsrl(MSR_GS_BASE, ctxt->gs_base);
+#ifdef CONFIG_X86_64
+ wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
+#else
+ loadsegment(fs, __KERNEL_PERCPU);
+ loadsegment(gs, __KERNEL_STACK_CANARY);
#endif
+ /* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
fix_processor_context();
/*
- * Restore segment registers. This happens after restoring the GDT
- * and LDT, which happen in fix_processor_context().
+ * Now that we have descriptor tables fully restored and working
+ * exception handling, restore the usermode segments.
*/
-#ifdef CONFIG_X86_32
+#ifdef CONFIG_X86_64
+ loadsegment(ds, ctxt->es);
loadsegment(es, ctxt->es);
loadsegment(fs, ctxt->fs);
- loadsegment(gs, ctxt->gs);
- loadsegment(ss, ctxt->ss);
-
- /*
- * sysenter MSRs
- */
- if (boot_cpu_has(X86_FEATURE_SEP))
- enable_sep_cpu();
-#else
-/* CONFIG_X86_64 */
- asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
- asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
- asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
load_gs_index(ctxt->gs);
- asm volatile ("movw %0, %%ss" :: "r" (ctxt->ss));
/*
- * Restore FSBASE and user GSBASE after reloading the respective
- * segment selectors.
+ * Restore FSBASE and GSBASE after restoring the selectors, since
+ * restoring the selectors clobbers the bases. Keep in mind
+ * that MSR_KERNEL_GS_BASE is horribly misnamed.
*/
wrmsrl(MSR_FS_BASE, ctxt->fs_base);
- wrmsrl(MSR_KERNEL_GS_BASE, ctxt->gs_kernel_base);
+ wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
+#elif defined(CONFIG_X86_32_LAZY_GS)
+ loadsegment(gs, ctxt->gs);
#endif
do_fpu_end();
return 0;
if (reg == APIC_LVR)
- return 0x10;
+ return 0x14;
#ifdef CONFIG_X86_32
if (reg == APIC_LDR)
return SET_APIC_LOGICAL_ID(1UL << smp_processor_id());
+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
+#include <linux/bootmem.h>
+#endif
#include <linux/cpu.h>
#include <linux/kexec.h>
#include <xen/features.h>
#include <xen/page.h>
+#include <xen/interface/memory.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/hypervisor.h>
}
EXPORT_SYMBOL(xen_arch_unregister_cpu);
#endif
+
+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
+void __init arch_xen_balloon_init(struct resource *hostmem_resource)
+{
+ struct xen_memory_map memmap;
+ int rc;
+ unsigned int i, last_guest_ram;
+ phys_addr_t max_addr = PFN_PHYS(max_pfn);
+ struct e820_table *xen_e820_table;
+ const struct e820_entry *entry;
+ struct resource *res;
+
+ if (!xen_initial_domain())
+ return;
+
+ xen_e820_table = kmalloc(sizeof(*xen_e820_table), GFP_KERNEL);
+ if (!xen_e820_table)
+ return;
+
+ memmap.nr_entries = ARRAY_SIZE(xen_e820_table->entries);
+ set_xen_guest_handle(memmap.buffer, xen_e820_table->entries);
+ rc = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
+ if (rc) {
+ pr_warn("%s: Can't read host e820 (%d)\n", __func__, rc);
+ goto out;
+ }
+
+ last_guest_ram = 0;
+ for (i = 0; i < memmap.nr_entries; i++) {
+ if (xen_e820_table->entries[i].addr >= max_addr)
+ break;
+ if (xen_e820_table->entries[i].type == E820_TYPE_RAM)
+ last_guest_ram = i;
+ }
+
+ entry = &xen_e820_table->entries[last_guest_ram];
+ if (max_addr >= entry->addr + entry->size)
+ goto out; /* No unallocated host RAM. */
+
+ hostmem_resource->start = max_addr;
+ hostmem_resource->end = entry->addr + entry->size;
+
+ /*
+ * Mark non-RAM regions between the end of dom0 RAM and end of host RAM
+ * as unavailable. The rest of that region can be used for hotplug-based
+ * ballooning.
+ */
+ for (; i < memmap.nr_entries; i++) {
+ entry = &xen_e820_table->entries[i];
+
+ if (entry->type == E820_TYPE_RAM)
+ continue;
+
+ if (entry->addr >= hostmem_resource->end)
+ break;
+
+ res = kzalloc(sizeof(*res), GFP_KERNEL);
+ if (!res)
+ goto out;
+
+ res->name = "Unavailable host RAM";
+ res->start = entry->addr;
+ res->end = (entry->addr + entry->size < hostmem_resource->end) ?
+ entry->addr + entry->size : hostmem_resource->end;
+ rc = insert_resource(hostmem_resource, res);
+ if (rc) {
+ pr_warn("%s: Can't insert [%llx - %llx) (%d)\n",
+ __func__, res->start, res->end, rc);
+ kfree(res);
+ goto out;
+ }
+ }
+
+ out:
+ kfree(xen_e820_table);
+}
+#endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */
#include "multicalls.h"
#include "pmu.h"
+#include "../kernel/cpu/cpu.h" /* get_cpu_cap() */
+
void *xen_initial_gdt;
static int xen_cpu_up_prepare_pv(unsigned int cpu);
mcs = xen_mc_entry(0);
MULTI_stack_switch(mcs.mc, __KERNEL_DS, sp0);
xen_mc_issue(PARAVIRT_LAZY_CPU);
- this_cpu_write(cpu_tss.x86_tss.sp0, sp0);
+ this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0);
}
void xen_set_iopl_mask(unsigned mask)
__userpte_alloc_gfp &= ~__GFP_HIGHMEM;
/* Work out if we support NX */
+ get_cpu_cap(&boot_cpu_data);
x86_configure_nx();
/* Get mfn list */
/* Graft it onto L4[511][510] */
copy_page(level2_kernel_pgt, l2);
+ /*
+ * Zap execute permission from the ident map. Due to the sharing of
+ * L1 entries we need to do this in the L2.
+ */
+ if (__supported_pte_mask & _PAGE_NX) {
+ for (i = 0; i < PTRS_PER_PMD; ++i) {
+ if (pmd_none(level2_ident_pgt[i]))
+ continue;
+ level2_ident_pgt[i] = pmd_set_flags(level2_ident_pgt[i], _PAGE_NX);
+ }
+ }
+
/* Copy the initial P->M table mappings if necessary. */
i = pgd_index(xen_start_info->mfn_list);
if (i && i < pgd_index(__START_KERNEL_map))
switch (idx) {
case FIX_BTMAP_END ... FIX_BTMAP_BEGIN:
- case FIX_RO_IDT:
#ifdef CONFIG_X86_32
case FIX_WP_TEST:
# ifdef CONFIG_HIGHMEM
#endif
case FIX_TEXT_POKE0:
case FIX_TEXT_POKE1:
- case FIX_GDT_REMAP_BEGIN ... FIX_GDT_REMAP_END:
/* All local page mappings */
pte = pfn_pte(phys, prot);
break;
addr = xen_e820_table.entries[0].addr;
size = xen_e820_table.entries[0].size;
while (i < xen_e820_table.nr_entries) {
- bool discard = false;
chunk_size = size;
type = xen_e820_table.entries[i].type;
xen_add_extra_mem(pfn_s, n_pfns);
xen_max_p2m_pfn = pfn_s + n_pfns;
} else
- discard = true;
+ type = E820_TYPE_UNUSABLE;
}
- if (!discard)
- xen_align_and_add_e820_region(addr, chunk_size, type);
+ xen_align_and_add_e820_region(addr, chunk_size, type);
addr += chunk_size;
size -= chunk_size;
bio->bi_disk = bio_src->bi_disk;
bio->bi_partno = bio_src->bi_partno;
bio_set_flag(bio, BIO_CLONED);
+ if (bio_flagged(bio_src, BIO_THROTTLED))
+ bio_set_flag(bio, BIO_THROTTLED);
bio->bi_opf = bio_src->bi_opf;
bio->bi_write_hint = bio_src->bi_write_hint;
bio->bi_iter = bio_src->bi_iter;
}
}
+void blk_drain_queue(struct request_queue *q)
+{
+ spin_lock_irq(q->queue_lock);
+ __blk_drain_queue(q, true);
+ spin_unlock_irq(q->queue_lock);
+}
+
/**
* blk_queue_bypass_start - enter queue bypass mode
* @q: queue of interest
*/
blk_freeze_queue(q);
spin_lock_irq(lock);
- if (!q->mq_ops)
- __blk_drain_queue(q, true);
queue_flag_set(QUEUE_FLAG_DEAD, q);
spin_unlock_irq(lock);
#include "blk.h"
/*
- * Append a bio to a passthrough request. Only works can be merged into
- * the request based on the driver constraints.
+ * Append a bio to a passthrough request. Only works if the bio can be merged
+ * into the request based on the driver constraints.
*/
-int blk_rq_append_bio(struct request *rq, struct bio *bio)
+int blk_rq_append_bio(struct request *rq, struct bio **bio)
{
- blk_queue_bounce(rq->q, &bio);
+ struct bio *orig_bio = *bio;
+
+ blk_queue_bounce(rq->q, bio);
if (!rq->bio) {
- blk_rq_bio_prep(rq->q, rq, bio);
+ blk_rq_bio_prep(rq->q, rq, *bio);
} else {
- if (!ll_back_merge_fn(rq->q, rq, bio))
+ if (!ll_back_merge_fn(rq->q, rq, *bio)) {
+ if (orig_bio != *bio) {
+ bio_put(*bio);
+ *bio = orig_bio;
+ }
return -EINVAL;
+ }
- rq->biotail->bi_next = bio;
- rq->biotail = bio;
- rq->__data_len += bio->bi_iter.bi_size;
+ rq->biotail->bi_next = *bio;
+ rq->biotail = *bio;
+ rq->__data_len += (*bio)->bi_iter.bi_size;
}
return 0;
* We link the bounce buffer in and could have to traverse it
* later so we have to get a ref to prevent it from being freed
*/
- ret = blk_rq_append_bio(rq, bio);
- bio_get(bio);
+ ret = blk_rq_append_bio(rq, &bio);
if (ret) {
- bio_endio(bio);
__blk_rq_unmap_user(orig_bio);
- bio_put(bio);
return ret;
}
+ bio_get(bio);
return 0;
}
int reading = rq_data_dir(rq) == READ;
unsigned long addr = (unsigned long) kbuf;
int do_copy = 0;
- struct bio *bio;
+ struct bio *bio, *orig_bio;
int ret;
if (len > (queue_max_hw_sectors(q) << 9))
if (do_copy)
rq->rq_flags |= RQF_COPY_USER;
- ret = blk_rq_append_bio(rq, bio);
+ orig_bio = bio;
+ ret = blk_rq_append_bio(rq, &bio);
if (unlikely(ret)) {
/* request is too big */
- bio_put(bio);
+ bio_put(orig_bio);
return ret;
}
* exported to drivers as the only user for unfreeze is blk_mq.
*/
blk_freeze_queue_start(q);
+ if (!q->mq_ops)
+ blk_drain_queue(q);
blk_mq_freeze_queue_wait(q);
}
out_unlock:
spin_unlock_irq(q->queue_lock);
out:
- /*
- * As multiple blk-throtls may stack in the same issue path, we
- * don't want bios to leave with the flag set. Clear the flag if
- * being issued.
- */
- if (!throttled)
- bio_clear_flag(bio, BIO_THROTTLED);
+ bio_set_flag(bio, BIO_THROTTLED);
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
if (throttled || !td->track_bio_latency)
}
#endif /* CONFIG_BOUNCE */
+extern void blk_drain_queue(struct request_queue *q);
+
#endif /* BLK_INTERNAL_H */
unsigned i = 0;
bool bounce = false;
int sectors = 0;
+ bool passthrough = bio_is_passthrough(*bio_orig);
bio_for_each_segment(from, *bio_orig, iter) {
if (i++ < BIO_MAX_PAGES)
if (!bounce)
return;
- if (sectors < bio_sectors(*bio_orig)) {
+ if (!passthrough && sectors < bio_sectors(*bio_orig)) {
bio = bio_split(*bio_orig, sectors, GFP_NOIO, bounce_bio_split);
bio_chain(bio, *bio_orig);
generic_make_request(*bio_orig);
*bio_orig = bio;
}
- bio = bio_clone_bioset(*bio_orig, GFP_NOIO, bounce_bio_set);
+ bio = bio_clone_bioset(*bio_orig, GFP_NOIO, passthrough ? NULL :
+ bounce_bio_set);
bio_for_each_segment_all(to, bio, i) {
struct page *page = to->bv_page;
unsigned int cur_domain;
unsigned int batching;
wait_queue_entry_t domain_wait[KYBER_NUM_DOMAINS];
+ struct sbq_wait_state *domain_ws[KYBER_NUM_DOMAINS];
atomic_t wait_index[KYBER_NUM_DOMAINS];
};
+static int kyber_domain_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
+ void *key);
+
static int rq_sched_domain(const struct request *rq)
{
unsigned int op = rq->cmd_flags;
for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
INIT_LIST_HEAD(&khd->rqs[i]);
+ init_waitqueue_func_entry(&khd->domain_wait[i],
+ kyber_domain_wake);
+ khd->domain_wait[i].private = hctx;
INIT_LIST_HEAD(&khd->domain_wait[i].entry);
atomic_set(&khd->wait_index[i], 0);
}
int nr;
nr = __sbitmap_queue_get(domain_tokens);
- if (nr >= 0)
- return nr;
/*
* If we failed to get a domain token, make sure the hardware queue is
* run when one becomes available. Note that this is serialized on
* khd->lock, but we still need to be careful about the waker.
*/
- if (list_empty_careful(&wait->entry)) {
- init_waitqueue_func_entry(wait, kyber_domain_wake);
- wait->private = hctx;
+ if (nr < 0 && list_empty_careful(&wait->entry)) {
ws = sbq_wait_ptr(domain_tokens,
&khd->wait_index[sched_domain]);
+ khd->domain_ws[sched_domain] = ws;
add_wait_queue(&ws->wait, wait);
/*
* Try again in case a token was freed before we got on the wait
- * queue. The waker may have already removed the entry from the
- * wait queue, but list_del_init() is okay with that.
+ * queue.
*/
nr = __sbitmap_queue_get(domain_tokens);
- if (nr >= 0) {
- unsigned long flags;
+ }
- spin_lock_irqsave(&ws->wait.lock, flags);
- list_del_init(&wait->entry);
- spin_unlock_irqrestore(&ws->wait.lock, flags);
- }
+ /*
+ * If we got a token while we were on the wait queue, remove ourselves
+ * from the wait queue to ensure that all wake ups make forward
+ * progress. It's possible that the waker already deleted the entry
+ * between the !list_empty_careful() check and us grabbing the lock, but
+ * list_del_init() is okay with that.
+ */
+ if (nr >= 0 && !list_empty_careful(&wait->entry)) {
+ ws = khd->domain_ws[sched_domain];
+ spin_lock_irq(&ws->wait.lock);
+ list_del_init(&wait->entry);
+ spin_unlock_irq(&ws->wait.lock);
}
+
return nr;
}
}
tsgl = areq->tsgl;
- for_each_sg(tsgl, sg, areq->tsgl_entries, i) {
- if (!sg_page(sg))
- continue;
- put_page(sg_page(sg));
- }
+ if (tsgl) {
+ for_each_sg(tsgl, sg, areq->tsgl_entries, i) {
+ if (!sg_page(sg))
+ continue;
+ put_page(sg_page(sg));
+ }
- if (areq->tsgl && areq->tsgl_entries)
sock_kfree_s(sk, tsgl, areq->tsgl_entries * sizeof(*tsgl));
+ }
}
EXPORT_SYMBOL_GPL(af_alg_free_areq_sgls);
struct aead_tfm *tfm = private;
crypto_free_aead(tfm->aead);
+ crypto_put_default_null_skcipher2();
kfree(tfm);
}
unsigned int ivlen = crypto_aead_ivsize(tfm);
af_alg_pull_tsgl(sk, ctx->used, NULL, 0);
- crypto_put_default_null_skcipher2();
sock_kzfree_s(sk, ctx->iv, ivlen);
sock_kfree_s(sk, ctx, ctx->len);
af_alg_release_parent(sk);
salg = shash_attr_alg(tb[1], 0, 0);
if (IS_ERR(salg))
return PTR_ERR(salg);
+ alg = &salg->base;
+ /* The underlying hash algorithm must be unkeyed */
err = -EINVAL;
+ if (crypto_shash_alg_has_setkey(salg))
+ goto out_put_alg;
+
ds = salg->digestsize;
ss = salg->statesize;
- alg = &salg->base;
if (ds > alg->cra_blocksize ||
ss < alg->cra_blocksize)
goto out_put_alg;
return -EINVAL;
if (fips_enabled) {
- while (!*ptr && n_sz) {
+ while (n_sz && !*ptr) {
ptr++;
n_sz--;
}
salsa20_ivsetup(ctx, walk.iv);
- if (likely(walk.nbytes == nbytes))
- {
- salsa20_encrypt_bytes(ctx, walk.dst.virt.addr,
- walk.src.virt.addr, nbytes);
- return blkcipher_walk_done(desc, &walk, 0);
- }
-
while (walk.nbytes >= 64) {
salsa20_encrypt_bytes(ctx, walk.dst.virt.addr,
walk.src.virt.addr,
static const struct crypto_type crypto_shash_type;
-static int shash_no_setkey(struct crypto_shash *tfm, const u8 *key,
- unsigned int keylen)
+int shash_no_setkey(struct crypto_shash *tfm, const u8 *key,
+ unsigned int keylen)
{
return -ENOSYS;
}
+EXPORT_SYMBOL_GPL(shash_no_setkey);
static int shash_setkey_unaligned(struct crypto_shash *tfm, const u8 *key,
unsigned int keylen)
/* The record may be cleared by others, try read next record */
if (len == -ENOENT)
goto skip;
- else if (len < sizeof(*rcd)) {
+ else if (len < 0 || len < sizeof(*rcd)) {
rc = -EIO;
goto out;
}
struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
struct cpc_register_resource *desired_reg;
int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
- struct cppc_pcc_data *pcc_ss_data = pcc_data[pcc_ss_id];
+ struct cppc_pcc_data *pcc_ss_data;
int ret = 0;
if (!cpc_desc || pcc_ss_id < 0) {
* skip all of the subsequent "thaw" callbacks for the device.
*/
if (dev_pm_smart_suspend_and_suspended(dev)) {
- dev->power.direct_complete = true;
+ dev_pm_skip_next_resume_phases(dev);
return 0;
}
dev_name(&adev_dimm->dev));
return -ENXIO;
}
+ /*
+ * Record nfit_mem for the notification path to track back to
+ * the nfit sysfs attributes for this dimm device object.
+ */
+ dev_set_drvdata(&adev_dimm->dev, nfit_mem);
/*
* Until standardization materializes we need to consider 4
sysfs_put(nfit_mem->flags_attr);
nfit_mem->flags_attr = NULL;
}
- if (adev_dimm)
+ if (adev_dimm) {
acpi_remove_notify_handler(adev_dimm->handle,
ACPI_DEVICE_NOTIFY, acpi_nvdimm_notify);
+ dev_set_drvdata(&adev_dimm->dev, NULL);
+ }
}
mutex_unlock(&acpi_desc->init_mutex);
}
* @tsk task_struct for group_leader of process
* (invariant after initialized)
* @files files_struct for process
- * (invariant after initialized)
+ * (protected by @files_lock)
+ * @files_lock mutex to protect @files
* @deferred_work_node: element for binder_deferred_list
* (protected by binder_deferred_lock)
* @deferred_work: bitmap of deferred work to perform
int pid;
struct task_struct *tsk;
struct files_struct *files;
+ struct mutex files_lock;
struct hlist_node deferred_work_node;
int deferred_work;
bool is_dead;
static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
{
- struct files_struct *files = proc->files;
unsigned long rlim_cur;
unsigned long irqs;
+ int ret;
- if (files == NULL)
- return -ESRCH;
-
- if (!lock_task_sighand(proc->tsk, &irqs))
- return -EMFILE;
-
+ mutex_lock(&proc->files_lock);
+ if (proc->files == NULL) {
+ ret = -ESRCH;
+ goto err;
+ }
+ if (!lock_task_sighand(proc->tsk, &irqs)) {
+ ret = -EMFILE;
+ goto err;
+ }
rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
unlock_task_sighand(proc->tsk, &irqs);
- return __alloc_fd(files, 0, rlim_cur, flags);
+ ret = __alloc_fd(proc->files, 0, rlim_cur, flags);
+err:
+ mutex_unlock(&proc->files_lock);
+ return ret;
}
/*
static void task_fd_install(
struct binder_proc *proc, unsigned int fd, struct file *file)
{
+ mutex_lock(&proc->files_lock);
if (proc->files)
__fd_install(proc->files, fd, file);
+ mutex_unlock(&proc->files_lock);
}
/*
{
int retval;
- if (proc->files == NULL)
- return -ESRCH;
-
+ mutex_lock(&proc->files_lock);
+ if (proc->files == NULL) {
+ retval = -ESRCH;
+ goto err;
+ }
retval = __close_fd(proc->files, fd);
/* can't restart close syscall because file table entry was cleared */
if (unlikely(retval == -ERESTARTSYS ||
retval == -ERESTARTNOHAND ||
retval == -ERESTART_RESTARTBLOCK))
retval = -EINTR;
-
+err:
+ mutex_unlock(&proc->files_lock);
return retval;
}
ret = binder_alloc_mmap_handler(&proc->alloc, vma);
if (ret)
return ret;
+ mutex_lock(&proc->files_lock);
proc->files = get_files_struct(current);
+ mutex_unlock(&proc->files_lock);
return 0;
err_bad_arg:
spin_lock_init(&proc->outer_lock);
get_task_struct(current->group_leader);
proc->tsk = current->group_leader;
+ mutex_init(&proc->files_lock);
INIT_LIST_HEAD(&proc->todo);
proc->default_priority = task_nice(current);
binder_dev = container_of(filp->private_data, struct binder_device,
files = NULL;
if (defer & BINDER_DEFERRED_PUT_FILES) {
+ mutex_lock(&proc->files_lock);
files = proc->files;
if (files)
proc->files = NULL;
+ mutex_unlock(&proc->files_lock);
}
if (defer & BINDER_DEFERRED_FLUSH)
/*
- * MeidaTek AHCI SATA driver
+ * MediaTek AHCI SATA driver
*
* Copyright (c) 2017 MediaTek Inc.
* Author: Ryder Lee <ryder.lee@mediatek.com>
#include <linux/reset.h>
#include "ahci.h"
-#define DRV_NAME "ahci"
+#define DRV_NAME "ahci-mtk"
#define SYS_CFG 0x14
#define SYS_CFG_SATA_MSK GENMASK(31, 30)
};
module_platform_driver(mtk_ahci_driver);
-MODULE_DESCRIPTION("MeidaTek SATA AHCI Driver");
+MODULE_DESCRIPTION("MediaTek SATA AHCI Driver");
MODULE_LICENSE("GPL v2");
/* port register default value */
#define AHCI_PORT_PHY_1_CFG 0xa003fffe
+#define AHCI_PORT_PHY2_CFG 0x28184d1f
+#define AHCI_PORT_PHY3_CFG 0x0e081509
#define AHCI_PORT_TRANS_CFG 0x08000029
#define AHCI_PORT_AXICC_CFG 0x3fffffff
writel(readl(qpriv->ecc_addr) | ECC_DIS_ARMV8_CH2,
qpriv->ecc_addr);
writel(AHCI_PORT_PHY_1_CFG, reg_base + PORT_PHY1);
+ writel(AHCI_PORT_PHY2_CFG, reg_base + PORT_PHY2);
+ writel(AHCI_PORT_PHY3_CFG, reg_base + PORT_PHY3);
writel(AHCI_PORT_TRANS_CFG, reg_base + PORT_TRANS);
if (qpriv->is_dmacoherent)
writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC);
case AHCI_LS2080A:
writel(AHCI_PORT_PHY_1_CFG, reg_base + PORT_PHY1);
+ writel(AHCI_PORT_PHY2_CFG, reg_base + PORT_PHY2);
+ writel(AHCI_PORT_PHY3_CFG, reg_base + PORT_PHY3);
writel(AHCI_PORT_TRANS_CFG, reg_base + PORT_TRANS);
if (qpriv->is_dmacoherent)
writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC);
writel(readl(qpriv->ecc_addr) | ECC_DIS_ARMV8_CH2,
qpriv->ecc_addr);
writel(AHCI_PORT_PHY_1_CFG, reg_base + PORT_PHY1);
+ writel(AHCI_PORT_PHY2_CFG, reg_base + PORT_PHY2);
+ writel(AHCI_PORT_PHY3_CFG, reg_base + PORT_PHY3);
writel(AHCI_PORT_TRANS_CFG, reg_base + PORT_TRANS);
if (qpriv->is_dmacoherent)
writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC);
writel(readl(qpriv->ecc_addr) | ECC_DIS_LS1088A,
qpriv->ecc_addr);
writel(AHCI_PORT_PHY_1_CFG, reg_base + PORT_PHY1);
+ writel(AHCI_PORT_PHY2_CFG, reg_base + PORT_PHY2);
+ writel(AHCI_PORT_PHY3_CFG, reg_base + PORT_PHY3);
writel(AHCI_PORT_TRANS_CFG, reg_base + PORT_TRANS);
if (qpriv->is_dmacoherent)
writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC);
case AHCI_LS2088A:
writel(AHCI_PORT_PHY_1_CFG, reg_base + PORT_PHY1);
+ writel(AHCI_PORT_PHY2_CFG, reg_base + PORT_PHY2);
+ writel(AHCI_PORT_PHY3_CFG, reg_base + PORT_PHY3);
writel(AHCI_PORT_TRANS_CFG, reg_base + PORT_TRANS);
if (qpriv->is_dmacoherent)
writel(AHCI_PORT_AXICC_CFG, reg_base + PORT_AXICC);
bit = fls(mask) - 1;
mask &= ~(1 << bit);
- /* Mask off all speeds higher than or equal to the current
- * one. Force 1.5Gbps if current SPD is not available.
+ /*
+ * Mask off all speeds higher than or equal to the current one. At
+ * this point, if current SPD is not available and we previously
+ * recorded the link speed from SStatus, the driver has already
+ * masked off the highest bit so mask should already be 1 or 0.
+ * Otherwise, we should not force 1.5Gbps on a link where we have
+ * not previously recorded speed from SStatus. Just return in this
+ * case.
*/
if (spd > 1)
mask &= (1 << (spd - 1)) - 1;
else
- mask &= 1;
+ return -EINVAL;
/* were we already at the bottom? */
if (!mask)
* is issued to the device. However, if the controller clock is 133MHz,
* the following tables must be used.
*/
-static struct pdc2027x_pio_timing {
+static const struct pdc2027x_pio_timing {
u8 value0, value1, value2;
} pdc2027x_pio_timing_tbl[] = {
{ 0xfb, 0x2b, 0xac }, /* PIO mode 0 */
{ 0x23, 0x09, 0x25 }, /* PIO mode 4, IORDY on, Prefetch off */
};
-static struct pdc2027x_mdma_timing {
+static const struct pdc2027x_mdma_timing {
u8 value0, value1;
} pdc2027x_mdma_timing_tbl[] = {
{ 0xdf, 0x5f }, /* MDMA mode 0 */
{ 0x69, 0x25 }, /* MDMA mode 2 */
};
-static struct pdc2027x_udma_timing {
+static const struct pdc2027x_udma_timing {
u8 value0, value1, value2;
} pdc2027x_udma_timing_tbl[] = {
{ 0x4a, 0x0f, 0xd5 }, /* UDMA mode 0 */
* @host: target ATA host
* @board_idx: board identifier
*/
-static int pdc_hardware_init(struct ata_host *host, unsigned int board_idx)
+static void pdc_hardware_init(struct ata_host *host, unsigned int board_idx)
{
long pll_clock;
/* Adjust PLL control register */
pdc_adjust_pll(host, pll_clock, board_idx);
-
- return 0;
}
/**
//pci_enable_intx(pdev);
/* initialize adapter */
- if (pdc_hardware_init(host, board_idx) != 0)
- return -EIO;
+ pdc_hardware_init(host, board_idx);
pci_set_master(pdev);
return ata_host_activate(host, pdev->irq, ata_bmdma_interrupt,
else
board_idx = PDC_UDMA_133;
- if (pdc_hardware_init(host, board_idx))
- return -EIO;
+ pdc_hardware_init(host, board_idx);
ata_host_resume(host);
return 0;
this_leaf->ways_of_associativity = (size / nr_sets) / line_size;
}
+static bool cache_node_is_unified(struct cacheinfo *this_leaf)
+{
+ return of_property_read_bool(this_leaf->of_node, "cache-unified");
+}
+
static void cache_of_override_properties(unsigned int cpu)
{
int index;
for (index = 0; index < cache_leaves(cpu); index++) {
this_leaf = this_cpu_ci->info_list + index;
+ /*
+ * init_cache_level must setup the cache level correctly
+ * overriding the architecturally specified levels, so
+ * if type is NONE at this stage, it should be unified
+ */
+ if (this_leaf->type == CACHE_TYPE_NOCACHE &&
+ cache_node_is_unified(this_leaf))
+ this_leaf->type = CACHE_TYPE_UNIFIED;
cache_size(this_leaf);
cache_get_line_size(this_leaf);
cache_nr_sets(this_leaf);
/*------------------------- Resume routines -------------------------*/
+/**
+ * dev_pm_skip_next_resume_phases - Skip next system resume phases for device.
+ * @dev: Target device.
+ *
+ * Make the core skip the "early resume" and "resume" phases for @dev.
+ *
+ * This function can be called by middle-layer code during the "noirq" phase of
+ * system resume if necessary, but not by device drivers.
+ */
+void dev_pm_skip_next_resume_phases(struct device *dev)
+{
+ dev->power.is_late_suspended = false;
+ dev->power.is_suspended = false;
+}
+
/**
* device_resume_noirq - Execute a "noirq resume" callback for given device.
* @dev: Device to handle.
return err;
}
-static void lo_release(struct gendisk *disk, fmode_t mode)
+static void __lo_release(struct loop_device *lo)
{
- struct loop_device *lo = disk->private_data;
int err;
if (atomic_dec_return(&lo->lo_refcnt))
mutex_unlock(&lo->lo_ctl_mutex);
}
+static void lo_release(struct gendisk *disk, fmode_t mode)
+{
+ mutex_lock(&loop_index_mutex);
+ __lo_release(disk->private_data);
+ mutex_unlock(&loop_index_mutex);
+}
+
static const struct block_device_operations lo_fops = {
.owner = THIS_MODULE,
.open = lo_open,
struct nullb_cmd {
struct list_head list;
struct llist_node ll_list;
- call_single_data_t csd;
+ struct __call_single_data csd;
struct request *rq;
struct bio *bio;
unsigned int tag;
+ blk_status_t error;
struct nullb_queue *nq;
struct hrtimer timer;
- blk_status_t error;
};
struct nullb_queue {
mutex_unlock(&rbd_dev->watch_mutex);
}
+static void __rbd_lock(struct rbd_device *rbd_dev, const char *cookie)
+{
+ struct rbd_client_id cid = rbd_get_cid(rbd_dev);
+
+ strcpy(rbd_dev->lock_cookie, cookie);
+ rbd_set_owner_cid(rbd_dev, &cid);
+ queue_work(rbd_dev->task_wq, &rbd_dev->acquired_lock_work);
+}
+
/*
* lock_rwsem must be held for write
*/
static int rbd_lock(struct rbd_device *rbd_dev)
{
struct ceph_osd_client *osdc = &rbd_dev->rbd_client->client->osdc;
- struct rbd_client_id cid = rbd_get_cid(rbd_dev);
char cookie[32];
int ret;
return ret;
rbd_dev->lock_state = RBD_LOCK_STATE_LOCKED;
- strcpy(rbd_dev->lock_cookie, cookie);
- rbd_set_owner_cid(rbd_dev, &cid);
- queue_work(rbd_dev->task_wq, &rbd_dev->acquired_lock_work);
+ __rbd_lock(rbd_dev, cookie);
return 0;
}
queue_delayed_work(rbd_dev->task_wq,
&rbd_dev->lock_dwork, 0);
} else {
- strcpy(rbd_dev->lock_cookie, cookie);
+ __rbd_lock(rbd_dev, cookie);
}
}
segment_size = rbd_obj_bytes(&rbd_dev->header);
blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
q->limits.max_sectors = queue_max_hw_sectors(q);
- blk_queue_max_segments(q, segment_size / SECTOR_SIZE);
+ blk_queue_max_segments(q, USHRT_MAX);
blk_queue_max_segment_size(q, segment_size);
blk_queue_io_min(q, segment_size);
blk_queue_io_opt(q, segment_size);
.match = sunxi_rsb_device_match,
.probe = sunxi_rsb_device_probe,
.remove = sunxi_rsb_device_remove,
+ .uevent = of_device_uevent_modalias,
};
static void sunxi_rsb_dev_release(struct device *dev)
/* The timer for this si. */
struct timer_list si_timer;
+ /* This flag is set, if the timer can be set */
+ bool timer_can_start;
+
/* This flag is set, if the timer is running (timer_pending() isn't enough) */
bool timer_running;
static void smi_mod_timer(struct smi_info *smi_info, unsigned long new_val)
{
+ if (!smi_info->timer_can_start)
+ return;
smi_info->last_timeout_jiffies = jiffies;
mod_timer(&smi_info->si_timer, new_val);
smi_info->timer_running = true;
smi_info->handlers->start_transaction(smi_info->si_sm, msg, size);
}
-static void start_check_enables(struct smi_info *smi_info, bool start_timer)
+static void start_check_enables(struct smi_info *smi_info)
{
unsigned char msg[2];
msg[0] = (IPMI_NETFN_APP_REQUEST << 2);
msg[1] = IPMI_GET_BMC_GLOBAL_ENABLES_CMD;
- if (start_timer)
- start_new_msg(smi_info, msg, 2);
- else
- smi_info->handlers->start_transaction(smi_info->si_sm, msg, 2);
+ start_new_msg(smi_info, msg, 2);
smi_info->si_state = SI_CHECKING_ENABLES;
}
-static void start_clear_flags(struct smi_info *smi_info, bool start_timer)
+static void start_clear_flags(struct smi_info *smi_info)
{
unsigned char msg[3];
msg[1] = IPMI_CLEAR_MSG_FLAGS_CMD;
msg[2] = WDT_PRE_TIMEOUT_INT;
- if (start_timer)
- start_new_msg(smi_info, msg, 3);
- else
- smi_info->handlers->start_transaction(smi_info->si_sm, msg, 3);
+ start_new_msg(smi_info, msg, 3);
smi_info->si_state = SI_CLEARING_FLAGS;
}
* Note that we cannot just use disable_irq(), since the interrupt may
* be shared.
*/
-static inline bool disable_si_irq(struct smi_info *smi_info, bool start_timer)
+static inline bool disable_si_irq(struct smi_info *smi_info)
{
if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) {
smi_info->interrupt_disabled = true;
- start_check_enables(smi_info, start_timer);
+ start_check_enables(smi_info);
return true;
}
return false;
{
if ((smi_info->io.irq) && (smi_info->interrupt_disabled)) {
smi_info->interrupt_disabled = false;
- start_check_enables(smi_info, true);
+ start_check_enables(smi_info);
return true;
}
return false;
msg = ipmi_alloc_smi_msg();
if (!msg) {
- if (!disable_si_irq(smi_info, true))
+ if (!disable_si_irq(smi_info))
smi_info->si_state = SI_NORMAL;
} else if (enable_si_irq(smi_info)) {
ipmi_free_smi_msg(msg);
/* Watchdog pre-timeout */
smi_inc_stat(smi_info, watchdog_pretimeouts);
- start_clear_flags(smi_info, true);
+ start_clear_flags(smi_info);
smi_info->msg_flags &= ~WDT_PRE_TIMEOUT_INT;
if (smi_info->intf)
ipmi_smi_watchdog_pretimeout(smi_info->intf);
* disable and messages disabled.
*/
if (smi_info->supports_event_msg_buff || smi_info->io.irq) {
- start_check_enables(smi_info, true);
+ start_check_enables(smi_info);
} else {
smi_info->curr_msg = alloc_msg_handle_irq(smi_info);
if (!smi_info->curr_msg)
/* Set up the timer that drives the interface. */
timer_setup(&new_smi->si_timer, smi_timeout, 0);
+ new_smi->timer_can_start = true;
smi_mod_timer(new_smi, jiffies + SI_TIMEOUT_JIFFIES);
/* Try to claim any interrupts. */
check_set_rcv_irq(smi_info);
}
-static inline void wait_for_timer_and_thread(struct smi_info *smi_info)
+static inline void stop_timer_and_thread(struct smi_info *smi_info)
{
if (smi_info->thread != NULL)
kthread_stop(smi_info->thread);
+
+ smi_info->timer_can_start = false;
if (smi_info->timer_running)
del_timer_sync(&smi_info->si_timer);
}
* Start clearing the flags before we enable interrupts or the
* timer to avoid racing with the timer.
*/
- start_clear_flags(new_smi, false);
+ start_clear_flags(new_smi);
/*
* IRQ is defined to be set when non-zero. req_events will
dev_set_drvdata(new_smi->io.dev, NULL);
out_err_stop_timer:
- wait_for_timer_and_thread(new_smi);
+ stop_timer_and_thread(new_smi);
out_err:
new_smi->interrupt_disabled = true;
*/
if (to_clean->io.irq_cleanup)
to_clean->io.irq_cleanup(&to_clean->io);
- wait_for_timer_and_thread(to_clean);
+ stop_timer_and_thread(to_clean);
/*
* Timeouts are stopped, now make sure the interrupts are off
schedule_timeout_uninterruptible(1);
}
if (to_clean->handlers)
- disable_si_irq(to_clean, false);
+ disable_si_irq(to_clean);
while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
poll(to_clean);
schedule_timeout_uninterruptible(1);
{
struct si_sm_io io;
+ memset(&io, 0, sizeof(io));
+
io.si_type = SI_KCS;
io.addr_source = SI_DEVICETREE;
io.addr_type = IPMI_MEM_ADDR_SPACE;
io.addr_source_cleanup = ipmi_pci_cleanup;
io.addr_source_data = pdev;
- if (pci_resource_flags(pdev, 0) & IORESOURCE_IO)
+ if (pci_resource_flags(pdev, 0) & IORESOURCE_IO) {
io.addr_type = IPMI_IO_ADDR_SPACE;
- else
+ io.io_setup = ipmi_si_port_setup;
+ } else {
io.addr_type = IPMI_MEM_ADDR_SPACE;
+ io.io_setup = ipmi_si_mem_setup;
+ }
io.addr_data = pci_resource_start(pdev, 0);
io.regspacing = ipmi_pci_probe_regspacing(&io);
ret = core->ops->is_enabled(core->hw);
done:
- clk_pm_runtime_put(core);
+ if (core->dev)
+ pm_runtime_put(core->dev);
return ret;
}
best_parent_rate = core->parent->rate;
}
+ if (clk_pm_runtime_get(core))
+ return;
+
if (core->flags & CLK_SET_RATE_UNGATE) {
unsigned long flags;
/* handle the new child who might not be in core->children yet */
if (core->new_child)
clk_change_rate(core->new_child);
+
+ clk_pm_runtime_put(core);
}
static int clk_core_set_rate_nolock(struct clk_core *core,
#include <linux/clk.h>
#include <linux/clk-provider.h>
+#include <linux/delay.h>
#include <linux/init.h>
#include <linux/of.h>
#include <linux/of_device.h>
return 0;
}
+static int sun9i_mmc_reset_reset(struct reset_controller_dev *rcdev,
+ unsigned long id)
+{
+ sun9i_mmc_reset_assert(rcdev, id);
+ udelay(10);
+ sun9i_mmc_reset_deassert(rcdev, id);
+
+ return 0;
+}
+
static const struct reset_control_ops sun9i_mmc_reset_ops = {
.assert = sun9i_mmc_reset_assert,
.deassert = sun9i_mmc_reset_deassert,
+ .reset = sun9i_mmc_reset_reset,
};
static int sun9i_a80_mmc_config_clk_probe(struct platform_device *pdev)
#include "cpufreq_governor.h"
+#define CPUFREQ_DBS_MIN_SAMPLING_INTERVAL (2 * TICK_NSEC / NSEC_PER_USEC)
+
static DEFINE_PER_CPU(struct cpu_dbs_info, cpu_dbs);
static DEFINE_MUTEX(gov_dbs_data_mutex);
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
struct policy_dbs_info *policy_dbs;
+ unsigned int sampling_interval;
int ret;
- ret = sscanf(buf, "%u", &dbs_data->sampling_rate);
- if (ret != 1)
+
+ ret = sscanf(buf, "%u", &sampling_interval);
+ if (ret != 1 || sampling_interval < CPUFREQ_DBS_MIN_SAMPLING_INTERVAL)
return -EINVAL;
+ dbs_data->sampling_rate = sampling_interval;
+
/*
* We are operating under dbs_data->mutex and so the list and its
* entries can't be freed concurrently.
if (ret)
goto free_policy_dbs_info;
- dbs_data->sampling_rate = cpufreq_policy_transition_delay_us(policy);
+ /*
+ * The sampling interval should not be less than the transition latency
+ * of the CPU and it also cannot be too small for dbs_update() to work
+ * correctly.
+ */
+ dbs_data->sampling_rate = max_t(unsigned int,
+ CPUFREQ_DBS_MIN_SAMPLING_INTERVAL,
+ cpufreq_policy_transition_delay_us(policy));
if (!have_governor_per_policy())
gov->gdbs_data = dbs_data;
val >>= OCOTP_CFG3_SPEED_SHIFT;
val &= 0x3;
- if ((val != OCOTP_CFG3_SPEED_1P2GHZ) &&
- of_machine_is_compatible("fsl,imx6q"))
- if (dev_pm_opp_disable(dev, 1200000000))
- dev_warn(dev, "failed to disable 1.2GHz OPP\n");
if (val < OCOTP_CFG3_SPEED_996MHZ)
if (dev_pm_opp_disable(dev, 996000000))
dev_warn(dev, "failed to disable 996MHz OPP\n");
- if (of_machine_is_compatible("fsl,imx6q")) {
+
+ if (of_machine_is_compatible("fsl,imx6q") ||
+ of_machine_is_compatible("fsl,imx6qp")) {
if (val != OCOTP_CFG3_SPEED_852MHZ)
if (dev_pm_opp_disable(dev, 852000000))
dev_warn(dev, "failed to disable 852MHz OPP\n");
+ if (val != OCOTP_CFG3_SPEED_1P2GHZ)
+ if (dev_pm_opp_disable(dev, 1200000000))
+ dev_warn(dev, "failed to disable 1.2GHz OPP\n");
}
iounmap(base);
put_node:
unsigned long flags)
{
struct at_dma_chan *atchan = to_at_dma_chan(chan);
- struct data_chunk *first = xt->sgl;
+ struct data_chunk *first;
struct at_desc *desc = NULL;
size_t xfer_count;
unsigned int dwidth;
if (unlikely(!xt || xt->numf != 1 || !xt->frame_size))
return NULL;
+ first = xt->sgl;
+
dev_info(chan2dev(chan),
"%s: src=%pad, dest=%pad, numf=%d, frame_size=%d, flags=0x%lx\n",
__func__, &xt->src_start, &xt->dst_start, xt->numf,
ret = dma_async_device_register(dd);
if (ret)
- return ret;
+ goto err_clk;
irq = platform_get_irq(pdev, 0);
ret = request_irq(irq, jz4740_dma_irq, 0, dev_name(&pdev->dev), dmadev);
err_unregister:
dma_async_device_unregister(dd);
+err_clk:
+ clk_disable_unprepare(dmadev->clk);
return ret;
}
#define PATTERN_COUNT_MASK 0x1f
#define PATTERN_MEMSET_IDX 0x01
+/* poor man's completion - we want to use wait_event_freezable() on it */
+struct dmatest_done {
+ bool done;
+ wait_queue_head_t *wait;
+};
+
struct dmatest_thread {
struct list_head node;
struct dmatest_info *info;
u8 **dsts;
u8 **udsts;
enum dma_transaction_type type;
+ wait_queue_head_t done_wait;
+ struct dmatest_done test_done;
bool done;
};
return error_count;
}
-/* poor man's completion - we want to use wait_event_freezable() on it */
-struct dmatest_done {
- bool done;
- wait_queue_head_t *wait;
-};
static void dmatest_callback(void *arg)
{
struct dmatest_done *done = arg;
-
- done->done = true;
- wake_up_all(done->wait);
+ struct dmatest_thread *thread =
+ container_of(arg, struct dmatest_thread, done_wait);
+ if (!thread->done) {
+ done->done = true;
+ wake_up_all(done->wait);
+ } else {
+ /*
+ * If thread->done, it means that this callback occurred
+ * after the parent thread has cleaned up. This can
+ * happen in the case that driver doesn't implement
+ * the terminate_all() functionality and a dma operation
+ * did not occur within the timeout period
+ */
+ WARN(1, "dmatest: Kernel memory may be corrupted!!\n");
+ }
}
static unsigned int min_odd(unsigned int x, unsigned int y)
*/
static int dmatest_func(void *data)
{
- DECLARE_WAIT_QUEUE_HEAD_ONSTACK(done_wait);
struct dmatest_thread *thread = data;
- struct dmatest_done done = { .wait = &done_wait };
+ struct dmatest_done *done = &thread->test_done;
struct dmatest_info *info;
struct dmatest_params *params;
struct dma_chan *chan;
continue;
}
- done.done = false;
+ done->done = false;
tx->callback = dmatest_callback;
- tx->callback_param = &done;
+ tx->callback_param = done;
cookie = tx->tx_submit(tx);
if (dma_submit_error(cookie)) {
}
dma_async_issue_pending(chan);
- wait_event_freezable_timeout(done_wait, done.done,
+ wait_event_freezable_timeout(thread->done_wait, done->done,
msecs_to_jiffies(params->timeout));
status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
- if (!done.done) {
- /*
- * We're leaving the timed out dma operation with
- * dangling pointer to done_wait. To make this
- * correct, we'll need to allocate wait_done for
- * each test iteration and perform "who's gonna
- * free it this time?" dancing. For now, just
- * leave it dangling.
- */
- WARN(1, "dmatest: Kernel stack may be corrupted!!\n");
+ if (!done->done) {
dmaengine_unmap_put(um);
result("test timed out", total_tests, src_off, dst_off,
len, 0);
dmatest_KBs(runtime, total_len), ret);
/* terminate all transfers on specified channels */
- if (ret)
+ if (ret || failed_tests)
dmaengine_terminate_all(chan);
thread->done = true;
thread->info = info;
thread->chan = dtc->chan;
thread->type = type;
+ thread->test_done.wait = &thread->done_wait;
+ init_waitqueue_head(&thread->done_wait);
smp_wmb();
thread->task = kthread_create(dmatest_func, thread, "%s-%s%u",
dma_chan_name(chan), op, i);
}
}
-static void fsl_disable_clocks(struct fsl_edma_engine *fsl_edma)
+static void fsl_disable_clocks(struct fsl_edma_engine *fsl_edma, int nr_clocks)
{
int i;
- for (i = 0; i < DMAMUX_NR; i++)
+ for (i = 0; i < nr_clocks; i++)
clk_disable_unprepare(fsl_edma->muxclk[i]);
}
res = platform_get_resource(pdev, IORESOURCE_MEM, 1 + i);
fsl_edma->muxbase[i] = devm_ioremap_resource(&pdev->dev, res);
- if (IS_ERR(fsl_edma->muxbase[i]))
+ if (IS_ERR(fsl_edma->muxbase[i])) {
+ /* on error: disable all previously enabled clks */
+ fsl_disable_clocks(fsl_edma, i);
return PTR_ERR(fsl_edma->muxbase[i]);
+ }
sprintf(clkname, "dmamux%d", i);
fsl_edma->muxclk[i] = devm_clk_get(&pdev->dev, clkname);
if (IS_ERR(fsl_edma->muxclk[i])) {
dev_err(&pdev->dev, "Missing DMAMUX block clock.\n");
+ /* on error: disable all previously enabled clks */
+ fsl_disable_clocks(fsl_edma, i);
return PTR_ERR(fsl_edma->muxclk[i]);
}
ret = clk_prepare_enable(fsl_edma->muxclk[i]);
- if (ret) {
- /* disable only clks which were enabled on error */
- for (; i >= 0; i--)
- clk_disable_unprepare(fsl_edma->muxclk[i]);
-
- dev_err(&pdev->dev, "DMAMUX clk block failed.\n");
- return ret;
- }
+ if (ret)
+ /* on error: disable all previously enabled clks */
+ fsl_disable_clocks(fsl_edma, i);
}
if (ret) {
dev_err(&pdev->dev,
"Can't register Freescale eDMA engine. (%d)\n", ret);
- fsl_disable_clocks(fsl_edma);
+ fsl_disable_clocks(fsl_edma, DMAMUX_NR);
return ret;
}
dev_err(&pdev->dev,
"Can't register Freescale eDMA of_dma. (%d)\n", ret);
dma_async_device_unregister(&fsl_edma->dma_dev);
- fsl_disable_clocks(fsl_edma);
+ fsl_disable_clocks(fsl_edma, DMAMUX_NR);
return ret;
}
fsl_edma_cleanup_vchan(&fsl_edma->dma_dev);
of_dma_controller_free(np);
dma_async_device_unregister(&fsl_edma->dma_dev);
- fsl_disable_clocks(fsl_edma);
+ fsl_disable_clocks(fsl_edma, DMAMUX_NR);
return 0;
}
if (memcmp(src, dest, IOAT_TEST_SIZE)) {
dev_err(dev, "Self-test copy failed compare, disabling\n");
err = -ENODEV;
- goto free_resources;
+ goto unmap_dma;
}
unmap_dma:
#define NO_FURTHER_WRITE_ACTION -1
-#ifndef phys_to_page
-#define phys_to_page(x) pfn_to_page((x) >> PAGE_SHIFT)
-#endif
-
/**
* efi_free_all_buff_pages - free all previous allocated buffer pages
* @cap_info: pointer to current instance of capsule_info structure
static void efi_free_all_buff_pages(struct capsule_info *cap_info)
{
while (cap_info->index > 0)
- __free_page(phys_to_page(cap_info->pages[--cap_info->index]));
+ __free_page(cap_info->pages[--cap_info->index]);
cap_info->index = NO_FURTHER_WRITE_ACTION;
}
cap_info->pages = temp_page;
+ temp_page = krealloc(cap_info->phys,
+ pages_needed * sizeof(phys_addr_t *),
+ GFP_KERNEL | __GFP_ZERO);
+ if (!temp_page)
+ return -ENOMEM;
+
+ cap_info->phys = temp_page;
+
return 0;
}
**/
static ssize_t efi_capsule_submit_update(struct capsule_info *cap_info)
{
+ bool do_vunmap = false;
int ret;
- ret = efi_capsule_update(&cap_info->header, cap_info->pages);
+ /*
+ * cap_info->capsule may have been assigned already by a quirk
+ * handler, so only overwrite it if it is NULL
+ */
+ if (!cap_info->capsule) {
+ cap_info->capsule = vmap(cap_info->pages, cap_info->index,
+ VM_MAP, PAGE_KERNEL);
+ if (!cap_info->capsule)
+ return -ENOMEM;
+ do_vunmap = true;
+ }
+
+ ret = efi_capsule_update(cap_info->capsule, cap_info->phys);
+ if (do_vunmap)
+ vunmap(cap_info->capsule);
if (ret) {
pr_err("capsule update failed\n");
return ret;
goto failed;
}
- cap_info->pages[cap_info->index++] = page_to_phys(page);
+ cap_info->pages[cap_info->index] = page;
+ cap_info->phys[cap_info->index] = page_to_phys(page);
cap_info->page_bytes_remain = PAGE_SIZE;
+ cap_info->index++;
} else {
- page = phys_to_page(cap_info->pages[cap_info->index - 1]);
+ page = cap_info->pages[cap_info->index - 1];
}
kbuff = kmap(page);
struct capsule_info *cap_info = file->private_data;
kfree(cap_info->pages);
+ kfree(cap_info->phys);
kfree(file->private_data);
file->private_data = NULL;
return 0;
return -ENOMEM;
}
+ cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL);
+ if (!cap_info->phys) {
+ kfree(cap_info->pages);
+ kfree(cap_info);
+ return -ENOMEM;
+ }
+
file->private_data = cap_info;
return 0;
* category than their parents, so it won't report false recursion.
*/
static struct lock_class_key gpio_lock_class;
+static struct lock_class_key gpio_request_class;
static int bcm_kona_gpio_irq_map(struct irq_domain *d, unsigned int irq,
irq_hw_number_t hwirq)
ret = irq_set_chip_data(irq, d->host_data);
if (ret < 0)
return ret;
- irq_set_lockdep_class(irq, &gpio_lock_class);
+ irq_set_lockdep_class(irq, &gpio_lock_class, &gpio_request_class);
irq_set_chip_and_handler(irq, &bcm_gpio_irq_chip, handle_simple_irq);
irq_set_noprobe(irq);
* category than their parents, so it won't report false recursion.
*/
static struct lock_class_key brcmstb_gpio_irq_lock_class;
+static struct lock_class_key brcmstb_gpio_irq_request_class;
static int brcmstb_gpio_irq_map(struct irq_domain *d, unsigned int irq,
ret = irq_set_chip_data(irq, &bank->gc);
if (ret < 0)
return ret;
- irq_set_lockdep_class(irq, &brcmstb_gpio_irq_lock_class);
+ irq_set_lockdep_class(irq, &brcmstb_gpio_irq_lock_class,
+ &brcmstb_gpio_irq_request_class);
irq_set_chip_and_handler(irq, &priv->irq_chip, handle_level_irq);
irq_set_noprobe(irq);
return 0;
struct gpio_reg *r = to_gpio_reg(gc);
int irq = r->irqs[offset];
- if (irq >= 0 && r->irq.domain)
- irq = irq_find_mapping(r->irq.domain, irq);
+ if (irq >= 0 && r->irqdomain)
+ irq = irq_find_mapping(r->irqdomain, irq);
return irq;
}
* than their parents, so it won't report false recursion.
*/
static struct lock_class_key gpio_lock_class;
+static struct lock_class_key gpio_request_class;
static int tegra_gpio_probe(struct platform_device *pdev)
{
bank = &tgi->bank_info[GPIO_BANK(gpio)];
- irq_set_lockdep_class(irq, &gpio_lock_class);
+ irq_set_lockdep_class(irq, &gpio_lock_class,
+ &gpio_request_class);
irq_set_chip_data(irq, bank);
irq_set_chip_and_handler(irq, &tgi->ic, handle_simple_irq);
}
static int xgene_gpio_sb_domain_activate(struct irq_domain *d,
struct irq_data *irq_data,
- bool early)
+ bool reserve)
{
struct xgene_gpio_sb *priv = d->host_data;
u32 gpio = HWIRQ_TO_GPIO(priv, irq_data->hwirq);
}
if (!chip->names)
- devprop_gpiochip_set_names(chip);
+ devprop_gpiochip_set_names(chip, dev_fwnode(chip->parent));
acpi_gpiochip_request_regions(acpi_gpio);
acpi_gpiochip_scan_gpios(acpi_gpio);
/**
* devprop_gpiochip_set_names - Set GPIO line names using device properties
* @chip: GPIO chip whose lines should be named, if possible
+ * @fwnode: Property Node containing the gpio-line-names property
*
* Looks for device property "gpio-line-names" and if it exists assigns
* GPIO line names for the chip. The memory allocated for the assigned
* names belong to the underlying firmware node and should not be released
* by the caller.
*/
-void devprop_gpiochip_set_names(struct gpio_chip *chip)
+void devprop_gpiochip_set_names(struct gpio_chip *chip,
+ const struct fwnode_handle *fwnode)
{
struct gpio_device *gdev = chip->gpiodev;
const char **names;
int ret, i;
- if (!chip->parent) {
- dev_warn(&gdev->dev, "GPIO chip parent is NULL\n");
- return;
- }
-
- ret = device_property_read_string_array(chip->parent, "gpio-line-names",
+ ret = fwnode_property_read_string_array(fwnode, "gpio-line-names",
NULL, 0);
if (ret < 0)
return;
if (ret != gdev->ngpio) {
- dev_warn(chip->parent,
+ dev_warn(&gdev->dev,
"names %d do not match number of GPIOs %d\n", ret,
gdev->ngpio);
return;
if (!names)
return;
- ret = device_property_read_string_array(chip->parent, "gpio-line-names",
+ ret = fwnode_property_read_string_array(fwnode, "gpio-line-names",
names, gdev->ngpio);
if (ret < 0) {
- dev_warn(chip->parent, "failed to read GPIO line names\n");
+ dev_warn(&gdev->dev, "failed to read GPIO line names\n");
kfree(names);
return;
}
/* If the chip defines names itself, these take precedence */
if (!chip->names)
- devprop_gpiochip_set_names(chip);
+ devprop_gpiochip_set_names(chip,
+ of_fwnode_handle(chip->of_node));
of_node_get(chip->of_node);
static void gpiochip_free_hogs(struct gpio_chip *chip);
static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
- struct lock_class_key *key);
+ struct lock_class_key *lock_key,
+ struct lock_class_key *request_key);
static void gpiochip_irqchip_remove(struct gpio_chip *gpiochip);
static int gpiochip_irqchip_init_valid_mask(struct gpio_chip *gpiochip);
static void gpiochip_irqchip_free_valid_mask(struct gpio_chip *gpiochip);
}
int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data,
- struct lock_class_key *key)
+ struct lock_class_key *lock_key,
+ struct lock_class_key *request_key)
{
unsigned long flags;
int status = 0;
if (status)
goto err_remove_from_list;
- status = gpiochip_add_irqchip(chip, key);
+ status = gpiochip_add_irqchip(chip, lock_key, request_key);
if (status)
goto err_remove_chip;
* This lock class tells lockdep that GPIO irqs are in a different
* category than their parents, so it won't report false recursion.
*/
- irq_set_lockdep_class(irq, chip->irq.lock_key);
+ irq_set_lockdep_class(irq, chip->irq.lock_key, chip->irq.request_key);
irq_set_chip_and_handler(irq, chip->irq.chip, chip->irq.handler);
/* Chips that use nested thread handlers have them marked */
if (chip->irq.threaded)
/**
* gpiochip_add_irqchip() - adds an IRQ chip to a GPIO chip
* @gpiochip: the GPIO chip to add the IRQ chip to
- * @lock_key: lockdep class
+ * @lock_key: lockdep class for IRQ lock
+ * @request_key: lockdep class for IRQ request
*/
static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
- struct lock_class_key *lock_key)
+ struct lock_class_key *lock_key,
+ struct lock_class_key *request_key)
{
struct irq_chip *irqchip = gpiochip->irq.chip;
const struct irq_domain_ops *ops;
gpiochip->to_irq = gpiochip_to_irq;
gpiochip->irq.default_type = type;
gpiochip->irq.lock_key = lock_key;
+ gpiochip->irq.request_key = request_key;
if (gpiochip->irq.domain_ops)
ops = gpiochip->irq.domain_ops;
* @type: the default type for IRQs on this irqchip, pass IRQ_TYPE_NONE
* to have the core avoid setting up any default type in the hardware.
* @threaded: whether this irqchip uses a nested thread handler
- * @lock_key: lockdep class
+ * @lock_key: lockdep class for IRQ lock
+ * @request_key: lockdep class for IRQ request
*
* This function closely associates a certain irqchip with a certain
* gpiochip, providing an irq domain to translate the local IRQs to
irq_flow_handler_t handler,
unsigned int type,
bool threaded,
- struct lock_class_key *lock_key)
+ struct lock_class_key *lock_key,
+ struct lock_class_key *request_key)
{
struct device_node *of_node;
gpiochip->irq.default_type = type;
gpiochip->to_irq = gpiochip_to_irq;
gpiochip->irq.lock_key = lock_key;
+ gpiochip->irq.request_key = request_key;
gpiochip->irq.domain = irq_domain_add_simple(of_node,
gpiochip->ngpio, first_irq,
&gpiochip_domain_ops, gpiochip);
#else /* CONFIG_GPIOLIB_IRQCHIP */
static inline int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
- struct lock_class_key *key)
+ struct lock_class_key *lock_key,
+ struct lock_class_key *request_key)
{
return 0;
}
}
EXPORT_SYMBOL_GPL(gpiod_set_raw_value);
+/**
+ * gpiod_set_value_nocheck() - set a GPIO line value without checking
+ * @desc: the descriptor to set the value on
+ * @value: value to set
+ *
+ * This sets the value of a GPIO line backing a descriptor, applying
+ * different semantic quirks like active low and open drain/source
+ * handling.
+ */
+static void gpiod_set_value_nocheck(struct gpio_desc *desc, int value)
+{
+ if (test_bit(FLAG_ACTIVE_LOW, &desc->flags))
+ value = !value;
+ if (test_bit(FLAG_OPEN_DRAIN, &desc->flags))
+ gpio_set_open_drain_value_commit(desc, value);
+ else if (test_bit(FLAG_OPEN_SOURCE, &desc->flags))
+ gpio_set_open_source_value_commit(desc, value);
+ else
+ gpiod_set_raw_value_commit(desc, value);
+}
+
/**
* gpiod_set_value() - assign a gpio's value
* @desc: gpio whose value will be assigned
void gpiod_set_value(struct gpio_desc *desc, int value)
{
VALIDATE_DESC_VOID(desc);
- /* Should be using gpiod_set_value_cansleep() */
WARN_ON(desc->gdev->chip->can_sleep);
- if (test_bit(FLAG_ACTIVE_LOW, &desc->flags))
- value = !value;
- if (test_bit(FLAG_OPEN_DRAIN, &desc->flags))
- gpio_set_open_drain_value_commit(desc, value);
- else if (test_bit(FLAG_OPEN_SOURCE, &desc->flags))
- gpio_set_open_source_value_commit(desc, value);
- else
- gpiod_set_raw_value_commit(desc, value);
+ gpiod_set_value_nocheck(desc, value);
}
EXPORT_SYMBOL_GPL(gpiod_set_value);
{
might_sleep_if(extra_checks);
VALIDATE_DESC_VOID(desc);
- if (test_bit(FLAG_ACTIVE_LOW, &desc->flags))
- value = !value;
- gpiod_set_raw_value_commit(desc, value);
+ gpiod_set_value_nocheck(desc, value);
}
EXPORT_SYMBOL_GPL(gpiod_set_value_cansleep);
return desc - &desc->gdev->descs[0];
}
-void devprop_gpiochip_set_names(struct gpio_chip *chip);
+void devprop_gpiochip_set_names(struct gpio_chip *chip,
+ const struct fwnode_handle *fwnode);
/* With descriptor prefix */
PACKET3_MAP_QUEUES_PIPE(ring->pipe) |
PACKET3_MAP_QUEUES_ME((ring->me == 1 ? 0 : 1)) |
PACKET3_MAP_QUEUES_QUEUE_TYPE(0) | /*queue_type: normal compute queue */
- PACKET3_MAP_QUEUES_ALLOC_FORMAT(1) | /* alloc format: all_on_one_pipe */
+ PACKET3_MAP_QUEUES_ALLOC_FORMAT(0) | /* alloc format: all_on_one_pipe */
PACKET3_MAP_QUEUES_ENGINE_SEL(0) | /* engine_sel: compute */
PACKET3_MAP_QUEUES_NUM_QUEUES(1)); /* num_queues: must be 1 */
amdgpu_ring_write(kiq_ring, PACKET3_MAP_QUEUES_DOORBELL_OFFSET(ring->doorbell_index));
const struct dm_connector_state *dm_state)
{
struct drm_display_mode *preferred_mode = NULL;
- const struct drm_connector *drm_connector;
+ struct drm_connector *drm_connector;
struct dc_stream_state *stream = NULL;
struct drm_display_mode mode = *drm_mode;
bool native_mode_found = false;
if (!aconnector->dc_sink) {
/*
- * Exclude MST from creating fake_sink
- * TODO: need to enable MST into fake_sink feature
+ * Create dc_sink when necessary to MST
+ * Don't apply fake_sink to MST
*/
- if (aconnector->mst_port)
- goto stream_create_fail;
+ if (aconnector->mst_port) {
+ dm_dp_mst_dc_sink_create(drm_connector);
+ goto mst_dc_sink_create_done;
+ }
if (create_fake_sink(aconnector))
goto stream_create_fail;
stream_create_fail:
dm_state_null:
drm_connector_null:
+mst_dc_sink_create_done:
return stream;
}
struct mutex hpd_lock;
bool fake_enable;
+
+ bool mst_connected;
};
#define to_amdgpu_dm_connector(x) container_of(x, struct amdgpu_dm_connector, base)
return ret;
}
+void dm_dp_mst_dc_sink_create(struct drm_connector *connector)
+{
+ struct amdgpu_dm_connector *aconnector = to_amdgpu_dm_connector(connector);
+ struct edid *edid;
+ struct dc_sink *dc_sink;
+ struct dc_sink_init_data init_params = {
+ .link = aconnector->dc_link,
+ .sink_signal = SIGNAL_TYPE_DISPLAY_PORT_MST };
+
+ edid = drm_dp_mst_get_edid(connector, &aconnector->mst_port->mst_mgr, aconnector->port);
+
+ if (!edid) {
+ drm_mode_connector_update_edid_property(
+ &aconnector->base,
+ NULL);
+ return;
+ }
+
+ aconnector->edid = edid;
+
+ dc_sink = dc_link_add_remote_sink(
+ aconnector->dc_link,
+ (uint8_t *)aconnector->edid,
+ (aconnector->edid->extensions + 1) * EDID_LENGTH,
+ &init_params);
+
+ dc_sink->priv = aconnector;
+ aconnector->dc_sink = dc_sink;
+
+ amdgpu_dm_add_sink_to_freesync_module(
+ connector, aconnector->edid);
+
+ drm_mode_connector_update_edid_property(
+ &aconnector->base, aconnector->edid);
+}
+
static int dm_dp_mst_get_modes(struct drm_connector *connector)
{
struct amdgpu_dm_connector *aconnector = to_amdgpu_dm_connector(connector);
drm_mode_connector_set_path_property(connector, pathprop);
drm_connector_list_iter_end(&conn_iter);
+ aconnector->mst_connected = true;
return &aconnector->base;
}
}
*/
amdgpu_dm_connector_funcs_reset(connector);
+ aconnector->mst_connected = true;
+
DRM_INFO("DM_MST: added connector: %p [id: %d] [master: %p]\n",
aconnector, connector->base.id, aconnector->mst_port);
drm_mode_connector_update_edid_property(
&aconnector->base,
NULL);
+
+ aconnector->mst_connected = false;
}
static void dm_dp_mst_hotplug(struct drm_dp_mst_topology_mgr *mgr)
drm_kms_helper_hotplug_event(dev);
}
+static void dm_dp_mst_link_status_reset(struct drm_connector *connector)
+{
+ mutex_lock(&connector->dev->mode_config.mutex);
+ drm_mode_connector_set_link_status_property(connector, DRM_MODE_LINK_STATUS_BAD);
+ mutex_unlock(&connector->dev->mode_config.mutex);
+}
+
static void dm_dp_mst_register_connector(struct drm_connector *connector)
{
struct drm_device *dev = connector->dev;
struct amdgpu_device *adev = dev->dev_private;
+ struct amdgpu_dm_connector *aconnector = to_amdgpu_dm_connector(connector);
if (adev->mode_info.rfbdev)
drm_fb_helper_add_one_connector(&adev->mode_info.rfbdev->helper, connector);
drm_connector_register(connector);
+ if (aconnector->mst_connected)
+ dm_dp_mst_link_status_reset(connector);
}
static const struct drm_dp_mst_topology_cbs dm_mst_cbs = {
void amdgpu_dm_initialize_dp_connector(struct amdgpu_display_manager *dm,
struct amdgpu_dm_connector *aconnector);
+void dm_dp_mst_dc_sink_create(struct drm_connector *connector);
#endif
v->override_vta_ps[input_idx] = pipe->plane_res.scl_data.taps.v_taps;
v->override_hta_pschroma[input_idx] = pipe->plane_res.scl_data.taps.h_taps_c;
v->override_vta_pschroma[input_idx] = pipe->plane_res.scl_data.taps.v_taps_c;
+ /*
+ * Spreadsheet doesn't handle taps_c is one properly,
+ * need to force Chroma to always be scaled to pass
+ * bandwidth validation.
+ */
+ if (v->override_hta_pschroma[input_idx] == 1)
+ v->override_hta_pschroma[input_idx] = 2;
+ if (v->override_vta_pschroma[input_idx] == 1)
+ v->override_vta_pschroma[input_idx] = 2;
v->source_scan[input_idx] = (pipe->plane_state->rotation % 2) ? dcn_bw_vert : dcn_bw_hor;
}
if (v->is_line_buffer_bpp_fixed == dcn_bw_yes)
link->link_enc->funcs->disable_output(link->link_enc, signal, link);
}
-bool dp_active_dongle_validate_timing(
+static bool dp_active_dongle_validate_timing(
const struct dc_crtc_timing *timing,
const struct dc_dongle_caps *dongle_caps)
{
/* Check Color Depth and Pixel Clock */
if (timing->pixel_encoding == PIXEL_ENCODING_YCBCR420)
required_pix_clk /= 2;
+ else if (timing->pixel_encoding == PIXEL_ENCODING_YCBCR422)
+ required_pix_clk = required_pix_clk * 2 / 3;
switch (timing->display_color_depth) {
case COLOR_DEPTH_666:
int num_planes,
struct dc_state *context)
{
- int i, be_idx;
+ int i;
if (num_planes == 0)
return;
- be_idx = -1;
for (i = 0; i < dc->res_pool->pipe_count; i++) {
- if (stream == context->res_ctx.pipe_ctx[i].stream) {
- be_idx = context->res_ctx.pipe_ctx[i].stream_res.tg->inst;
- break;
+ struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[i];
+ struct pipe_ctx *old_pipe_ctx = &dc->current_state->res_ctx.pipe_ctx[i];
+
+ if (stream == pipe_ctx->stream) {
+ if (!pipe_ctx->top_pipe &&
+ (pipe_ctx->plane_state || old_pipe_ctx->plane_state))
+ dc->hwss.pipe_control_lock(dc, pipe_ctx, true);
}
}
context->stream_count);
dce110_program_front_end_for_pipe(dc, pipe_ctx);
+
+ dc->hwss.update_plane_addr(dc, pipe_ctx);
+
program_surface_visibility(dc, pipe_ctx);
}
+
+ for (i = 0; i < dc->res_pool->pipe_count; i++) {
+ struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[i];
+ struct pipe_ctx *old_pipe_ctx = &dc->current_state->res_ctx.pipe_ctx[i];
+
+ if ((stream == pipe_ctx->stream) &&
+ (!pipe_ctx->top_pipe) &&
+ (pipe_ctx->plane_state || old_pipe_ctx->plane_state))
+ dc->hwss.pipe_control_lock(dc, pipe_ctx, false);
+ }
}
static void dce110_power_down_fe(struct dc *dc, int fe_idx)
scl_data->taps.h_taps = 1;
if (IDENTITY_RATIO(scl_data->ratios.vert))
scl_data->taps.v_taps = 1;
- /*
- * Spreadsheet doesn't handle taps_c is one properly,
- * need to force Chroma to always be scaled to pass
- * bandwidth validation.
- */
+ if (IDENTITY_RATIO(scl_data->ratios.horz_c))
+ scl_data->taps.h_taps_c = 1;
+ if (IDENTITY_RATIO(scl_data->ratios.vert_c))
+ scl_data->taps.v_taps_c = 1;
}
return true;
void dpp1_cm_set_output_csc_default(
struct dpp *dpp_base,
- const struct default_adjustment *default_adjust);
+ enum dc_color_space colorspace);
void dpp1_cm_set_gamut_remap(
struct dpp *dpp,
void dpp1_cm_set_output_csc_default(
struct dpp *dpp_base,
- const struct default_adjustment *default_adjust)
+ enum dc_color_space colorspace)
{
struct dcn10_dpp *dpp = TO_DCN10_DPP(dpp_base);
uint32_t ocsc_mode = 0;
- if (default_adjust != NULL) {
- switch (default_adjust->out_color_space) {
+ switch (colorspace) {
case COLOR_SPACE_SRGB:
case COLOR_SPACE_2020_RGB_FULLRANGE:
ocsc_mode = 0;
case COLOR_SPACE_UNKNOWN:
default:
break;
- }
}
REG_SET(CM_OCSC_CONTROL, 0, CM_OCSC_MODE, ocsc_mode);
tbl_entry.color_space = color_space;
//tbl_entry.regval = matrix;
pipe_ctx->plane_res.dpp->funcs->opp_set_csc_adjustment(pipe_ctx->plane_res.dpp, &tbl_entry);
+ } else {
+ pipe_ctx->plane_res.dpp->funcs->opp_set_csc_default(pipe_ctx->plane_res.dpp, colorspace);
}
}
static bool is_lower_pipe_tree_visible(struct pipe_ctx *pipe_ctx)
void (*opp_set_csc_default)(
struct dpp *dpp,
- const struct default_adjustment *default_adjust);
+ enum dc_color_space colorspace);
void (*opp_set_csc_adjustment)(
struct dpp *dpp,
void armada_drm_plane_calc_addrs(u32 *addrs, struct drm_framebuffer *fb,
int x, int y)
{
+ const struct drm_format_info *format = fb->format;
+ unsigned int num_planes = format->num_planes;
u32 addr = drm_fb_obj(fb)->dev_addr;
- int num_planes = fb->format->num_planes;
int i;
if (num_planes > 3)
num_planes = 3;
- for (i = 0; i < num_planes; i++)
+ addrs[0] = addr + fb->offsets[0] + y * fb->pitches[0] +
+ x * format->cpp[0];
+
+ y /= format->vsub;
+ x /= format->hsub;
+
+ for (i = 1; i < num_planes; i++)
addrs[i] = addr + fb->offsets[i] + y * fb->pitches[i] +
- x * fb->format->cpp[i];
+ x * format->cpp[i];
for (; i < 3; i++)
addrs[i] = 0;
}
if (plane->fb)
drm_framebuffer_put(plane->fb);
- /* Power down the Y/U/V FIFOs */
- sram_para1 = CFG_PDWN16x66 | CFG_PDWN32x66;
-
/* Power down most RAMs and FIFOs if this is the primary plane */
if (plane->type == DRM_PLANE_TYPE_PRIMARY) {
- sram_para1 |= CFG_PDWN256x32 | CFG_PDWN256x24 | CFG_PDWN256x8 |
- CFG_PDWN32x32 | CFG_PDWN64x66;
+ sram_para1 = CFG_PDWN256x32 | CFG_PDWN256x24 | CFG_PDWN256x8 |
+ CFG_PDWN32x32 | CFG_PDWN64x66;
dma_ctrl0_mask = CFG_GRA_ENA;
} else {
+ /* Power down the Y/U/V FIFOs */
+ sram_para1 = CFG_PDWN16x66 | CFG_PDWN32x66;
dma_ctrl0_mask = CFG_DMA_ENA;
}
ret = devm_request_irq(dev, irq, armada_drm_irq, 0, "armada_drm_crtc",
dcrtc);
- if (ret < 0) {
- kfree(dcrtc);
- return ret;
- }
+ if (ret < 0)
+ goto err_crtc;
if (dcrtc->variant->init) {
ret = dcrtc->variant->init(dcrtc, dev);
- if (ret) {
- kfree(dcrtc);
- return ret;
- }
+ if (ret)
+ goto err_crtc;
}
/* Ensure AXI pipeline is enabled */
dcrtc->crtc.port = port;
primary = kzalloc(sizeof(*primary), GFP_KERNEL);
- if (!primary)
- return -ENOMEM;
+ if (!primary) {
+ ret = -ENOMEM;
+ goto err_crtc;
+ }
ret = armada_drm_plane_init(primary);
if (ret) {
kfree(primary);
- return ret;
+ goto err_crtc;
}
ret = drm_universal_plane_init(drm, &primary->base, 0,
DRM_PLANE_TYPE_PRIMARY, NULL);
if (ret) {
kfree(primary);
- return ret;
+ goto err_crtc;
}
ret = drm_crtc_init_with_planes(drm, &dcrtc->crtc, &primary->base, NULL,
err_crtc_init:
primary->base.funcs->destroy(&primary->base);
+err_crtc:
+ kfree(dcrtc);
+
return ret;
}
};
struct armada_plane_state {
+ u16 src_x;
+ u16 src_y;
u32 src_hw;
u32 dst_hw;
u32 dst_yx;
{
struct armada_ovl_plane *dplane = drm_to_armada_ovl_plane(plane);
struct armada_crtc *dcrtc = drm_to_armada_crtc(crtc);
+ const struct drm_format_info *format;
struct drm_rect src = {
.x1 = src_x,
.y1 = src_y,
};
uint32_t val, ctrl0;
unsigned idx = 0;
- bool visible;
+ bool visible, fb_changed;
int ret;
trace_armada_ovl_plane_update(plane, crtc, fb,
if (!visible)
ctrl0 &= ~CFG_DMA_ENA;
+ /*
+ * Shifting a YUV packed format image by one pixel causes the U/V
+ * planes to swap. Compensate for it by also toggling the UV swap.
+ */
+ format = fb->format;
+ if (format->num_planes == 1 && src.x1 >> 16 & (format->hsub - 1))
+ ctrl0 ^= CFG_DMA_MOD(CFG_SWAPUV);
+
+ fb_changed = plane->fb != fb ||
+ dplane->base.state.src_x != src.x1 >> 16 ||
+ dplane->base.state.src_y != src.y1 >> 16;
+
if (!dcrtc->plane) {
dcrtc->plane = plane;
armada_ovl_update_attr(&dplane->prop, dcrtc);
/* FIXME: overlay on an interlaced display */
/* Just updating the position/size? */
- if (plane->fb == fb && dplane->base.state.ctrl0 == ctrl0) {
+ if (!fb_changed && dplane->base.state.ctrl0 == ctrl0) {
val = (drm_rect_height(&src) & 0xffff0000) |
drm_rect_width(&src) >> 16;
dplane->base.state.src_hw = val;
if (armada_drm_plane_work_wait(&dplane->base, HZ / 25) == 0)
armada_drm_plane_work_cancel(dcrtc, &dplane->base);
- if (plane->fb != fb) {
- u32 addrs[3], pixel_format;
- int num_planes, hsub;
+ if (fb_changed) {
+ u32 addrs[3];
/*
* Take a reference on the new framebuffer - we want to
if (plane->fb)
armada_ovl_retire_fb(dplane, plane->fb);
- src_y = src.y1 >> 16;
- src_x = src.x1 >> 16;
+ dplane->base.state.src_y = src_y = src.y1 >> 16;
+ dplane->base.state.src_x = src_x = src.x1 >> 16;
armada_drm_plane_calc_addrs(addrs, fb, src_x, src_y);
- pixel_format = fb->format->format;
- hsub = drm_format_horz_chroma_subsampling(pixel_format);
- num_planes = fb->format->num_planes;
-
- /*
- * Annoyingly, shifting a YUYV-format image by one pixel
- * causes the U/V planes to toggle. Toggle the UV swap.
- * (Unfortunately, this causes momentary colour flickering.)
- */
- if (src_x & (hsub - 1) && num_planes == 1)
- ctrl0 ^= CFG_DMA_MOD(CFG_SWAPUV);
-
armada_reg_queue_set(dplane->vbl.regs, idx, addrs[0],
LCD_SPU_DMA_START_ADDR_Y0);
armada_reg_queue_set(dplane->vbl.regs, idx, addrs[1],
connector->funcs->destroy(connector);
}
-static void drm_connector_free_work_fn(struct work_struct *work)
+void drm_connector_free_work_fn(struct work_struct *work)
{
- struct drm_connector *connector =
- container_of(work, struct drm_connector, free_work);
- struct drm_device *dev = connector->dev;
+ struct drm_connector *connector, *n;
+ struct drm_device *dev =
+ container_of(work, struct drm_device, mode_config.connector_free_work);
+ struct drm_mode_config *config = &dev->mode_config;
+ unsigned long flags;
+ struct llist_node *freed;
- drm_mode_object_unregister(dev, &connector->base);
- connector->funcs->destroy(connector);
+ spin_lock_irqsave(&config->connector_list_lock, flags);
+ freed = llist_del_all(&config->connector_free_list);
+ spin_unlock_irqrestore(&config->connector_list_lock, flags);
+
+ llist_for_each_entry_safe(connector, n, freed, free_node) {
+ drm_mode_object_unregister(dev, &connector->base);
+ connector->funcs->destroy(connector);
+ }
}
/**
if (ret)
return ret;
- INIT_WORK(&connector->free_work, drm_connector_free_work_fn);
-
connector->base.properties = &connector->properties;
connector->dev = dev;
connector->funcs = funcs;
* actually release the connector when dropping our final reference.
*/
static void
-drm_connector_put_safe(struct drm_connector *conn)
+__drm_connector_put_safe(struct drm_connector *conn)
{
- if (refcount_dec_and_test(&conn->base.refcount.refcount))
- schedule_work(&conn->free_work);
+ struct drm_mode_config *config = &conn->dev->mode_config;
+
+ lockdep_assert_held(&config->connector_list_lock);
+
+ if (!refcount_dec_and_test(&conn->base.refcount.refcount))
+ return;
+
+ llist_add(&conn->free_node, &config->connector_free_list);
+ schedule_work(&config->connector_free_work);
}
/**
/* loop until it's not a zombie connector */
} while (!kref_get_unless_zero(&iter->conn->base.refcount));
- spin_unlock_irqrestore(&config->connector_list_lock, flags);
if (old_conn)
- drm_connector_put_safe(old_conn);
+ __drm_connector_put_safe(old_conn);
+ spin_unlock_irqrestore(&config->connector_list_lock, flags);
return iter->conn;
}
*/
void drm_connector_list_iter_end(struct drm_connector_list_iter *iter)
{
+ struct drm_mode_config *config = &iter->dev->mode_config;
+ unsigned long flags;
+
iter->dev = NULL;
- if (iter->conn)
- drm_connector_put_safe(iter->conn);
+ if (iter->conn) {
+ spin_lock_irqsave(&config->connector_list_lock, flags);
+ __drm_connector_put_safe(iter->conn);
+ spin_unlock_irqrestore(&config->connector_list_lock, flags);
+ }
lock_release(&connector_list_iter_dep_map, 0, _RET_IP_);
}
EXPORT_SYMBOL(drm_connector_list_iter_end);
if (edid)
size = EDID_LENGTH * (1 + edid->extensions);
+ /* Set the display info, using edid if available, otherwise
+ * reseting the values to defaults. This duplicates the work
+ * done in drm_add_edid_modes, but that function is not
+ * consistently called before this one in all drivers and the
+ * computation is cheap enough that it seems better to
+ * duplicate it rather than attempt to ensure some arbitrary
+ * ordering of calls.
+ */
+ if (edid)
+ drm_add_display_info(connector, edid);
+ else
+ drm_reset_display_info(connector);
+
drm_object_property_set_value(&connector->base,
dev->mode_config.non_desktop_property,
connector->display_info.non_desktop);
uint64_t value);
int drm_connector_create_standard_properties(struct drm_device *dev);
const char *drm_get_connector_force_name(enum drm_connector_force force);
+void drm_connector_free_work_fn(struct work_struct *work);
/* IOCTL */
int drm_mode_connector_property_set_ioctl(struct drm_device *dev,
*
* Returns true if @vendor is in @edid, false otherwise
*/
-static bool edid_vendor(struct edid *edid, const char *vendor)
+static bool edid_vendor(const struct edid *edid, const char *vendor)
{
char edid_vendor[3];
*
* This tells subsequent routines what fixes they need to apply.
*/
-static u32 edid_get_quirks(struct edid *edid)
+static u32 edid_get_quirks(const struct edid *edid)
{
const struct edid_quirk *quirk;
int i;
/*
* Search EDID for CEA extension block.
*/
-static u8 *drm_find_edid_extension(struct edid *edid, int ext_id)
+static u8 *drm_find_edid_extension(const struct edid *edid, int ext_id)
{
u8 *edid_ext = NULL;
int i;
return edid_ext;
}
-static u8 *drm_find_cea_extension(struct edid *edid)
+static u8 *drm_find_cea_extension(const struct edid *edid)
{
return drm_find_edid_extension(edid, CEA_EXT);
}
-static u8 *drm_find_displayid_extension(struct edid *edid)
+static u8 *drm_find_displayid_extension(const struct edid *edid)
{
return drm_find_edid_extension(edid, DISPLAYID_EXT);
}
}
static void drm_parse_cea_ext(struct drm_connector *connector,
- struct edid *edid)
+ const struct edid *edid)
{
struct drm_display_info *info = &connector->display_info;
const u8 *edid_ext;
}
}
-static void drm_add_display_info(struct drm_connector *connector,
- struct edid *edid, u32 quirks)
+/* A connector has no EDID information, so we've got no EDID to compute quirks from. Reset
+ * all of the values which would have been set from EDID
+ */
+void
+drm_reset_display_info(struct drm_connector *connector)
{
struct drm_display_info *info = &connector->display_info;
+ info->width_mm = 0;
+ info->height_mm = 0;
+
+ info->bpc = 0;
+ info->color_formats = 0;
+ info->cea_rev = 0;
+ info->max_tmds_clock = 0;
+ info->dvi_dual = false;
+
+ info->non_desktop = 0;
+}
+EXPORT_SYMBOL_GPL(drm_reset_display_info);
+
+u32 drm_add_display_info(struct drm_connector *connector, const struct edid *edid)
+{
+ struct drm_display_info *info = &connector->display_info;
+
+ u32 quirks = edid_get_quirks(edid);
+
info->width_mm = edid->width_cm * 10;
info->height_mm = edid->height_cm * 10;
info->non_desktop = !!(quirks & EDID_QUIRK_NON_DESKTOP);
+ DRM_DEBUG_KMS("non_desktop set to %d\n", info->non_desktop);
+
if (edid->revision < 3)
- return;
+ return quirks;
if (!(edid->input & DRM_EDID_INPUT_DIGITAL))
- return;
+ return quirks;
drm_parse_cea_ext(connector, edid);
/* Only defined for 1.4 with digital displays */
if (edid->revision < 4)
- return;
+ return quirks;
switch (edid->input & DRM_EDID_DIGITAL_DEPTH_MASK) {
case DRM_EDID_DIGITAL_DEPTH_6:
info->color_formats |= DRM_COLOR_FORMAT_YCRCB444;
if (edid->features & DRM_EDID_FEATURE_RGB_YCRCB422)
info->color_formats |= DRM_COLOR_FORMAT_YCRCB422;
+ return quirks;
}
+EXPORT_SYMBOL_GPL(drm_add_display_info);
static int validate_displayid(u8 *displayid, int length, int idx)
{
return 0;
}
- quirks = edid_get_quirks(edid);
-
/*
* CEA-861-F adds ycbcr capability map block, for HDMI 2.0 sinks.
* To avoid multiple parsing of same block, lets parse that map
* from sink info, before parsing CEA modes.
*/
- drm_add_display_info(connector, edid, quirks);
+ quirks = drm_add_display_info(connector, edid);
/*
* EDID spec says modes should be preferred in this order:
mutex_lock(&dev->mode_config.idr_mutex);
- /* Insert the new lessee into the tree */
- id = idr_alloc(&(drm_lease_owner(lessor)->lessee_idr), lessee, 1, 0, GFP_KERNEL);
- if (id < 0) {
- error = id;
- goto out_lessee;
- }
-
- lessee->lessee_id = id;
- lessee->lessor = drm_master_get(lessor);
- list_add_tail(&lessee->lessee_list, &lessor->lessees);
-
idr_for_each_entry(leases, entry, object) {
error = 0;
if (!idr_find(&dev->mode_config.crtc_idr, object))
}
}
+ /* Insert the new lessee into the tree */
+ id = idr_alloc(&(drm_lease_owner(lessor)->lessee_idr), lessee, 1, 0, GFP_KERNEL);
+ if (id < 0) {
+ error = id;
+ goto out_lessee;
+ }
+
+ lessee->lessee_id = id;
+ lessee->lessor = drm_master_get(lessor);
+ list_add_tail(&lessee->lessee_list, &lessor->lessees);
+
/* Move the leases over */
lessee->leases = *leases;
DRM_DEBUG_LEASE("new lessee %d %p, lessor %d %p\n", lessee->lessee_id, lessee, lessor->lessee_id, lessor);
return lessee;
out_lessee:
- drm_master_put(&lessee);
-
mutex_unlock(&dev->mode_config.idr_mutex);
+ drm_master_put(&lessee);
+
return ERR_PTR(error);
}
*/
void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new)
{
+ struct drm_mm *mm = old->mm;
+
DRM_MM_BUG_ON(!old->allocated);
*new = *old;
list_replace(&old->node_list, &new->node_list);
- rb_replace_node(&old->rb, &new->rb, &old->mm->interval_tree.rb_root);
+ rb_replace_node_cached(&old->rb, &new->rb, &mm->interval_tree);
if (drm_mm_hole_follows(old)) {
list_replace(&old->hole_stack, &new->hole_stack);
rb_replace_node(&old->rb_hole_size,
&new->rb_hole_size,
- &old->mm->holes_size);
+ &mm->holes_size);
rb_replace_node(&old->rb_hole_addr,
&new->rb_hole_addr,
- &old->mm->holes_addr);
+ &mm->holes_addr);
}
old->allocated = false;
ida_init(&dev->mode_config.connector_ida);
spin_lock_init(&dev->mode_config.connector_list_lock);
+ init_llist_head(&dev->mode_config.connector_free_list);
+ INIT_WORK(&dev->mode_config.connector_free_work, drm_connector_free_work_fn);
+
drm_mode_create_standard_properties(dev);
/* Just to be sure */
}
drm_connector_list_iter_end(&conn_iter);
/* connector_iter drops references in a work item. */
- flush_scheduled_work();
+ flush_work(&dev->mode_config.connector_free_work);
if (WARN_ON(!list_empty(&dev->mode_config.connector_list))) {
drm_connector_list_iter_begin(dev, &conn_iter);
drm_for_each_connector_iter(connector, &conn_iter)
}
/*
- * setplane_internal - setplane handler for internal callers
+ * __setplane_internal - setplane handler for internal callers
*
- * Note that we assume an extra reference has already been taken on fb. If the
- * update fails, this reference will be dropped before return; if it succeeds,
- * the previous framebuffer (if any) will be unreferenced instead.
+ * This function will take a reference on the new fb for the plane
+ * on success.
*
* src_{x,y,w,h} are provided in 16.16 fixed point format
*/
if (!ret) {
plane->crtc = crtc;
plane->fb = fb;
- fb = NULL;
+ drm_framebuffer_get(plane->fb);
} else {
plane->old_fb = NULL;
}
out:
- if (fb)
- drm_framebuffer_put(fb);
if (plane->old_fb)
drm_framebuffer_put(plane->old_fb);
plane->old_fb = NULL;
struct drm_plane *plane;
struct drm_crtc *crtc = NULL;
struct drm_framebuffer *fb = NULL;
+ int ret;
if (!drm_core_check_feature(dev, DRIVER_MODESET))
return -EINVAL;
}
}
- /*
- * setplane_internal will take care of deref'ing either the old or new
- * framebuffer depending on success.
- */
- return setplane_internal(plane, crtc, fb,
- plane_req->crtc_x, plane_req->crtc_y,
- plane_req->crtc_w, plane_req->crtc_h,
- plane_req->src_x, plane_req->src_y,
- plane_req->src_w, plane_req->src_h);
+ ret = setplane_internal(plane, crtc, fb,
+ plane_req->crtc_x, plane_req->crtc_y,
+ plane_req->crtc_w, plane_req->crtc_h,
+ plane_req->src_x, plane_req->src_y,
+ plane_req->src_w, plane_req->src_h);
+
+ if (fb)
+ drm_framebuffer_put(fb);
+
+ return ret;
}
static int drm_mode_cursor_universal(struct drm_crtc *crtc,
src_h = fb->height << 16;
}
- /*
- * setplane_internal will take care of deref'ing either the old or new
- * framebuffer depending on success.
- */
ret = __setplane_internal(crtc->cursor, crtc, fb,
- crtc_x, crtc_y, crtc_w, crtc_h,
- 0, 0, src_w, src_h, ctx);
+ crtc_x, crtc_y, crtc_w, crtc_h,
+ 0, 0, src_w, src_h, ctx);
+
+ if (fb)
+ drm_framebuffer_put(fb);
/* Update successful; save new cursor position, if necessary */
if (ret == 0 && req->flags & DRM_MODE_CURSOR_MOVE) {
.release = drm_syncobj_file_release,
};
-static int drm_syncobj_alloc_file(struct drm_syncobj *syncobj)
-{
- struct file *file = anon_inode_getfile("syncobj_file",
- &drm_syncobj_file_fops,
- syncobj, 0);
- if (IS_ERR(file))
- return PTR_ERR(file);
-
- drm_syncobj_get(syncobj);
- if (cmpxchg(&syncobj->file, NULL, file)) {
- /* lost the race */
- fput(file);
- }
-
- return 0;
-}
-
int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd)
{
- int ret;
+ struct file *file;
int fd;
fd = get_unused_fd_flags(O_CLOEXEC);
if (fd < 0)
return fd;
- if (!syncobj->file) {
- ret = drm_syncobj_alloc_file(syncobj);
- if (ret) {
- put_unused_fd(fd);
- return ret;
- }
+ file = anon_inode_getfile("syncobj_file",
+ &drm_syncobj_file_fops,
+ syncobj, 0);
+ if (IS_ERR(file)) {
+ put_unused_fd(fd);
+ return PTR_ERR(file);
}
- fd_install(fd, syncobj->file);
+
+ drm_syncobj_get(syncobj);
+ fd_install(fd, file);
+
*p_fd = fd;
return 0;
}
return ret;
}
-static struct drm_syncobj *drm_syncobj_fdget(int fd)
-{
- struct file *file = fget(fd);
-
- if (!file)
- return NULL;
- if (file->f_op != &drm_syncobj_file_fops)
- goto err;
-
- return file->private_data;
-err:
- fput(file);
- return NULL;
-};
-
static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
int fd, u32 *handle)
{
- struct drm_syncobj *syncobj = drm_syncobj_fdget(fd);
+ struct drm_syncobj *syncobj;
+ struct file *file;
int ret;
- if (!syncobj)
+ file = fget(fd);
+ if (!file)
return -EINVAL;
+ if (file->f_op != &drm_syncobj_file_fops) {
+ fput(file);
+ return -EINVAL;
+ }
+
/* take a reference to put in the idr */
+ syncobj = file->private_data;
drm_syncobj_get(syncobj);
idr_preload(GFP_KERNEL);
spin_unlock(&file_private->syncobj_table_lock);
idr_preload_end();
- if (ret < 0) {
- fput(syncobj->file);
- return ret;
- }
- *handle = ret;
- return 0;
+ if (ret > 0) {
+ *handle = ret;
+ ret = 0;
+ } else
+ drm_syncobj_put(syncobj);
+
+ fput(file);
+ return ret;
}
static int drm_syncobj_import_sync_file_fence(struct drm_file *file_private,
}
static struct cmd_info *find_cmd_entry_any_ring(struct intel_gvt *gvt,
- unsigned int opcode, int rings)
+ unsigned int opcode, unsigned long rings)
{
struct cmd_info *info = NULL;
unsigned int ring;
- for_each_set_bit(ring, (unsigned long *)&rings, I915_NUM_ENGINES) {
+ for_each_set_bit(ring, &rings, I915_NUM_ENGINES) {
info = find_cmd_entry(gvt, opcode, ring);
if (info)
break;
/* Clear host CRT status, so guest couldn't detect this host CRT. */
if (IS_BROADWELL(dev_priv))
vgpu_vreg(vgpu, PCH_ADPA) &= ~ADPA_CRT_HOTPLUG_MONITOR_MASK;
+
+ vgpu_vreg(vgpu, PIPECONF(PIPE_A)) |= PIPECONF_ENABLE;
}
static void clean_virtual_dp_monitor(struct intel_vgpu *vgpu, int port_num)
static int setup_virtual_dp_monitor(struct intel_vgpu *vgpu, int port_num,
int type, unsigned int resolution)
{
- struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
struct intel_vgpu_port *port = intel_vgpu_port(vgpu, port_num);
if (WARN_ON(resolution >= GVT_EDID_NUM))
port->type = type;
emulate_monitor_status_change(vgpu);
- vgpu_vreg(vgpu, PIPECONF(PIPE_A)) |= PIPECONF_ENABLE;
+
return 0;
}
return ret;
} else {
if (!test_bit(index, spt->post_shadow_bitmap)) {
+ int type = spt->shadow_page.type;
+
ppgtt_get_shadow_entry(spt, &se, index);
ret = ppgtt_handle_guest_entry_removal(gpt, &se, index);
if (ret)
return ret;
+ ops->set_pfn(&se, vgpu->gtt.scratch_pt[type].page_mfn);
+ ppgtt_set_shadow_entry(spt, &se, index);
}
-
ppgtt_set_post_shadow(spt, index);
}
*/
struct workqueue_struct *wq;
+ /* ordered wq for modesets */
+ struct workqueue_struct *modeset_wq;
+
/* Display functions */
struct drm_i915_display_funcs display;
* must wait for all rendering to complete to the object (as unbinding
* must anyway), and retire the requests.
*/
- ret = i915_gem_object_wait(obj,
- I915_WAIT_INTERRUPTIBLE |
- I915_WAIT_LOCKED |
- I915_WAIT_ALL,
- MAX_SCHEDULE_TIMEOUT,
- NULL);
+ ret = i915_gem_object_set_to_cpu_domain(obj, false);
if (ret)
return ret;
- i915_gem_retire_requests(to_i915(obj->base.dev));
-
while ((vma = list_first_entry_or_null(&obj->vma_list,
struct i915_vma,
obj_link))) {
struct drm_i915_gem_request *rq;
struct intel_engine_cs *engine;
- if (!dma_fence_is_i915(fence))
+ if (dma_fence_is_signaled(fence) || !dma_fence_is_i915(fence))
return;
rq = to_request(fence);
#define RESET_PCH_HANDSHAKE_ENABLE (1<<4)
#define GEN8_CHICKEN_DCPR_1 _MMIO(0x46430)
+#define SKL_SELECT_ALTERNATE_DC_EXIT (1<<30)
#define MASK_WAKEMEM (1<<13)
#define SKL_DFSM _MMIO(0x51000)
#define GEN9_SLICE_COMMON_ECO_CHICKEN0 _MMIO(0x7308)
#define DISABLE_PIXEL_MASK_CAMMING (1<<14)
+#define GEN9_SLICE_COMMON_ECO_CHICKEN1 _MMIO(0x731c)
+
#define GEN7_L3SQCREG1 _MMIO(0xB010)
#define VLV_B0_WA_L3SQCREG1_VALUE 0x00D30000
#define BXT_CDCLK_CD2X_DIV_SEL_2 (2<<22)
#define BXT_CDCLK_CD2X_DIV_SEL_4 (3<<22)
#define BXT_CDCLK_CD2X_PIPE(pipe) ((pipe)<<20)
+#define CDCLK_DIVMUX_CD_OVERRIDE (1<<19)
#define BXT_CDCLK_CD2X_PIPE_NONE BXT_CDCLK_CD2X_PIPE(3)
#define BXT_CDCLK_SSA_PRECHARGE_ENABLE (1<<16)
#define CDCLK_FREQ_DECIMAL_MASK (0x7ff)
struct dma_fence *dma;
struct timer_list timer;
struct irq_work work;
+ struct rcu_head rcu;
};
static void timer_i915_sw_fence_wake(struct timer_list *t)
del_timer_sync(&cb->timer);
dma_fence_put(cb->dma);
- kfree(cb);
+ kfree_rcu(cb, rcu);
}
int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
struct intel_wait *wait, *n, *first;
if (!b->irq_armed)
- return;
+ goto wakeup_signaler;
/* We only disarm the irq when we are idle (all requests completed),
* so if the bottom-half remains asleep, it missed the request
b->waiters = RB_ROOT;
spin_unlock_irq(&b->rb_lock);
+
+ /*
+ * The signaling thread may be asleep holding a reference to a request,
+ * that had its signaling cancelled prior to being preempted. We need
+ * to kick the signaler, just in case, to release any such reference.
+ */
+wakeup_signaler:
+ wake_up_process(b->signaler);
}
static bool use_fake_irq(const struct intel_breadcrumbs *b)
}
if (unlikely(do_schedule)) {
- DEFINE_WAIT(exec);
-
if (kthread_should_park())
kthread_parkme();
- if (kthread_should_stop()) {
- GEM_BUG_ON(request);
+ if (unlikely(kthread_should_stop())) {
+ i915_gem_request_put(request);
break;
}
- if (request)
- add_wait_queue(&request->execute, &exec);
-
schedule();
-
- if (request)
- remove_wait_queue(&request->execute, &exec);
}
i915_gem_request_put(request);
} while (1);
static void skl_dpll0_enable(struct drm_i915_private *dev_priv, int vco)
{
- int min_cdclk = skl_calc_cdclk(0, vco);
u32 val;
WARN_ON(vco != 8100000 && vco != 8640000);
- /* select the minimum CDCLK before enabling DPLL 0 */
- val = CDCLK_FREQ_337_308 | skl_cdclk_decimal(min_cdclk);
- I915_WRITE(CDCLK_CTL, val);
- POSTING_READ(CDCLK_CTL);
-
/*
* We always enable DPLL0 with the lowest link rate possible, but still
* taking into account the VCO required to operate the eDP panel at the
{
int cdclk = cdclk_state->cdclk;
int vco = cdclk_state->vco;
- u32 freq_select, pcu_ack;
+ u32 freq_select, pcu_ack, cdclk_ctl;
int ret;
WARN_ON((cdclk == 24000) != (vco == 0));
return;
}
- /* set CDCLK_CTL */
+ /* Choose frequency for this cdclk */
switch (cdclk) {
case 450000:
case 432000:
dev_priv->cdclk.hw.vco != vco)
skl_dpll0_disable(dev_priv);
+ cdclk_ctl = I915_READ(CDCLK_CTL);
+
+ if (dev_priv->cdclk.hw.vco != vco) {
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl &= ~(CDCLK_FREQ_SEL_MASK | CDCLK_FREQ_DECIMAL_MASK);
+ cdclk_ctl |= freq_select | skl_cdclk_decimal(cdclk);
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+ }
+
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl |= CDCLK_DIVMUX_CD_OVERRIDE;
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+ POSTING_READ(CDCLK_CTL);
+
if (dev_priv->cdclk.hw.vco != vco)
skl_dpll0_enable(dev_priv, vco);
- I915_WRITE(CDCLK_CTL, freq_select | skl_cdclk_decimal(cdclk));
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl &= ~(CDCLK_FREQ_SEL_MASK | CDCLK_FREQ_DECIMAL_MASK);
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+
+ cdclk_ctl |= freq_select | skl_cdclk_decimal(cdclk);
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl &= ~CDCLK_DIVMUX_CD_OVERRIDE;
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
POSTING_READ(CDCLK_CTL);
/* inform PCU of the change */
if (WARN_ON(!pll))
return;
+ mutex_lock(&dev_priv->dpll_lock);
+
if (IS_CANNONLAKE(dev_priv)) {
/* Configure DPCLKA_CFGCR0 to map the DPLL to the DDI. */
val = I915_READ(DPCLKA_CFGCR0);
} else if (INTEL_INFO(dev_priv)->gen < 9) {
I915_WRITE(PORT_CLK_SEL(port), hsw_pll_to_ddi_pll_sel(pll));
}
+
+ mutex_unlock(&dev_priv->dpll_lock);
}
static void intel_ddi_clk_disable(struct intel_encoder *encoder)
}
ret = intel_modeset_setup_plane_state(state, crtc, mode, fb, 0, 0);
+ drm_framebuffer_put(fb);
if (ret)
goto fail;
- drm_framebuffer_put(fb);
-
ret = drm_atomic_set_mode_for_crtc(&crtc_state->base, mode);
if (ret)
goto fail;
INIT_WORK(&state->commit_work, intel_atomic_commit_work);
i915_sw_fence_commit(&intel_state->commit_ready);
- if (nonblock)
+ if (nonblock && intel_state->modeset) {
+ queue_work(dev_priv->modeset_wq, &state->commit_work);
+ } else if (nonblock) {
queue_work(system_unbound_wq, &state->commit_work);
- else
+ } else {
+ if (intel_state->modeset)
+ flush_workqueue(dev_priv->modeset_wq);
intel_atomic_commit_tail(state);
-
+ }
return 0;
}
primary->frontbuffer_bit = INTEL_FRONTBUFFER_PRIMARY(pipe);
primary->check_plane = intel_check_primary_plane;
- if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv)) {
+ if (INTEL_GEN(dev_priv) >= 10) {
intel_primary_formats = skl_primary_formats;
num_formats = ARRAY_SIZE(skl_primary_formats);
modifiers = skl_format_modifiers_ccs;
enum pipe pipe;
struct intel_crtc *crtc;
+ dev_priv->modeset_wq = alloc_ordered_workqueue("i915_modeset", 0);
+
drm_mode_config_init(dev);
dev->mode_config.min_width = 0;
intel_cleanup_gt_powersave(dev_priv);
intel_teardown_gmbus(dev_priv);
+
+ destroy_workqueue(dev_priv->modeset_wq);
}
void intel_connector_attach_encoder(struct intel_connector *connector,
if (ret)
return ret;
+ /* WA #0862: Userspace has to set "Barrier Mode" to avoid hangs. */
+ ret = wa_ring_whitelist_reg(engine, GEN9_SLICE_COMMON_ECO_CHICKEN1);
+ if (ret)
+ return ret;
+
/* WaToEnableHwFixForPushConstHWBug:glk */
WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
GEN8_SBE_DISABLE_REPLAY_BUF_OPTIMIZATION);
};
if (!pci_dev_present(atom_hdaudio_ids)) {
- DRM_INFO("%s\n", "HDaudio controller not detected, using LPE audio instead\n");
+ DRM_INFO("HDaudio controller not detected, using LPE audio instead\n");
lpe_present = true;
}
}
GEM_BUG_ON(prio == I915_PRIORITY_INVALID);
+ if (i915_gem_request_completed(request))
+ return;
+
if (prio <= READ_ONCE(request->priotree.priority))
return;
struct drm_i915_private *dev_priv = to_i915(dev);
if (dev_priv->psr.active) {
- i915_reg_t psr_ctl;
+ i915_reg_t psr_status;
u32 psr_status_mask;
if (dev_priv->psr.aux_frame_sync)
0);
if (dev_priv->psr.psr2_support) {
- psr_ctl = EDP_PSR2_CTL;
+ psr_status = EDP_PSR2_STATUS_CTL;
psr_status_mask = EDP_PSR2_STATUS_STATE_MASK;
- I915_WRITE(psr_ctl,
- I915_READ(psr_ctl) &
+ I915_WRITE(EDP_PSR2_CTL,
+ I915_READ(EDP_PSR2_CTL) &
~(EDP_PSR2_ENABLE | EDP_SU_TRACK_ENABLE));
} else {
- psr_ctl = EDP_PSR_STATUS_CTL;
+ psr_status = EDP_PSR_STATUS_CTL;
psr_status_mask = EDP_PSR_STATUS_STATE_MASK;
- I915_WRITE(psr_ctl,
- I915_READ(psr_ctl) & ~EDP_PSR_ENABLE);
+ I915_WRITE(EDP_PSR_CTL,
+ I915_READ(EDP_PSR_CTL) & ~EDP_PSR_ENABLE);
}
/* Wait till PSR is idle */
if (intel_wait_for_register(dev_priv,
- psr_ctl, psr_status_mask, 0,
+ psr_status, psr_status_mask, 0,
2000))
DRM_ERROR("Timed out waiting for PSR Idle State\n");
DRM_DEBUG_KMS("Enabling DC5\n");
+ /* Wa Display #1183: skl,kbl,cfl */
+ if (IS_GEN9_BC(dev_priv))
+ I915_WRITE(GEN8_CHICKEN_DCPR_1, I915_READ(GEN8_CHICKEN_DCPR_1) |
+ SKL_SELECT_ALTERNATE_DC_EXIT);
+
gen9_set_dc_state(dev_priv, DC_STATE_EN_UPTO_DC5);
}
{
DRM_DEBUG_KMS("Disabling DC6\n");
+ /* Wa Display #1183: skl,kbl,cfl */
+ if (IS_GEN9_BC(dev_priv))
+ I915_WRITE(GEN8_CHICKEN_DCPR_1, I915_READ(GEN8_CHICKEN_DCPR_1) |
+ SKL_SELECT_ALTERNATE_DC_EXIT);
+
gen9_set_dc_state(dev_priv, DC_STATE_DISABLE);
}
GLK_DISPLAY_POWERWELL_2_POWER_DOMAINS | \
BIT_ULL(POWER_DOMAIN_MODESET) | \
BIT_ULL(POWER_DOMAIN_AUX_A) | \
+ BIT_ULL(POWER_DOMAIN_GMBUS) | \
BIT_ULL(POWER_DOMAIN_INIT))
#define CNL_DISPLAY_POWERWELL_2_POWER_DOMAINS ( \
/* Determine if we can get a cache-coherent map, forcing
* uncached mapping if we can't.
*/
- if (mmu->type[drm->ttm.type_host].type & NVIF_MEM_UNCACHED)
+ if (!nouveau_drm_use_coherent_gpu_mapping(drm))
nvbo->force_coherent = true;
}
if (cli->device.info.family > NV_DEVICE_INFO_V0_CURIE &&
(flags & TTM_PL_FLAG_VRAM) && !vmm->page[i].vram)
continue;
- if ((flags & TTM_PL_FLAG_TT ) && !vmm->page[i].host)
+ if ((flags & TTM_PL_FLAG_TT) &&
+ (!vmm->page[i].host || vmm->page[i].shift > PAGE_SHIFT))
continue;
/* Select this page size if it's the first that supports
work->cli = cli;
mutex_lock(&cli->lock);
list_add_tail(&work->head, &cli->worker);
- mutex_unlock(&cli->lock);
if (dma_fence_add_callback(fence, &work->cb, nouveau_cli_work_fence))
nouveau_cli_work_fence(fence, &work->cb);
+ mutex_unlock(&cli->lock);
}
static void
struct nvif_object copy;
int mtrr;
int type_vram;
- int type_host;
- int type_ncoh;
+ int type_host[2];
+ int type_ncoh[2];
} ttm;
/* GEM interface support */
return dev->dev_private;
}
+static inline bool
+nouveau_drm_use_coherent_gpu_mapping(struct nouveau_drm *drm)
+{
+ struct nvif_mmu *mmu = &drm->client.mmu;
+ return !(mmu->type[drm->ttm.type_host[0]].type & NVIF_MEM_UNCACHED);
+}
+
int nouveau_pmops_suspend(struct device *);
int nouveau_pmops_resume(struct device *);
bool nouveau_pmops_runtime(void);
drm_fb_helper_unregister_fbi(&fbcon->helper);
drm_fb_helper_fini(&fbcon->helper);
- if (nouveau_fb->nvbo) {
+ if (nouveau_fb && nouveau_fb->nvbo) {
nouveau_vma_del(&nouveau_fb->vma);
nouveau_bo_unmap(nouveau_fb->nvbo);
nouveau_bo_unpin(nouveau_fb->nvbo);
u8 type;
int ret;
- if (mmu->type[drm->ttm.type_host].type & NVIF_MEM_UNCACHED)
- type = drm->ttm.type_ncoh;
+ if (!nouveau_drm_use_coherent_gpu_mapping(drm))
+ type = drm->ttm.type_ncoh[!!mem->kind];
else
- type = drm->ttm.type_host;
+ type = drm->ttm.type_host[0];
if (mem->kind && !(mmu->type[type].type & NVIF_MEM_KIND))
mem->comp = mem->kind = 0;
drm->ttm.mem_global_ref.release = NULL;
}
-int
-nouveau_ttm_init(struct nouveau_drm *drm)
+static int
+nouveau_ttm_init_host(struct nouveau_drm *drm, u8 kind)
{
- struct nvkm_device *device = nvxx_device(&drm->client.device);
- struct nvkm_pci *pci = device->pci;
struct nvif_mmu *mmu = &drm->client.mmu;
- struct drm_device *dev = drm->dev;
- int typei, ret;
+ int typei;
typei = nvif_mmu_type(mmu, NVIF_MEM_HOST | NVIF_MEM_MAPPABLE |
- NVIF_MEM_COHERENT);
+ kind | NVIF_MEM_COHERENT);
if (typei < 0)
return -ENOSYS;
- drm->ttm.type_host = typei;
+ drm->ttm.type_host[!!kind] = typei;
- typei = nvif_mmu_type(mmu, NVIF_MEM_HOST | NVIF_MEM_MAPPABLE);
+ typei = nvif_mmu_type(mmu, NVIF_MEM_HOST | NVIF_MEM_MAPPABLE | kind);
if (typei < 0)
return -ENOSYS;
- drm->ttm.type_ncoh = typei;
+ drm->ttm.type_ncoh[!!kind] = typei;
+ return 0;
+}
+
+int
+nouveau_ttm_init(struct nouveau_drm *drm)
+{
+ struct nvkm_device *device = nvxx_device(&drm->client.device);
+ struct nvkm_pci *pci = device->pci;
+ struct nvif_mmu *mmu = &drm->client.mmu;
+ struct drm_device *dev = drm->dev;
+ int typei, ret;
+
+ ret = nouveau_ttm_init_host(drm, 0);
+ if (ret)
+ return ret;
+
+ if (drm->client.device.info.family >= NV_DEVICE_INFO_V0_TESLA &&
+ drm->client.device.info.chipset != 0x50) {
+ ret = nouveau_ttm_init_host(drm, NVIF_MEM_KIND);
+ if (ret)
+ return ret;
+ }
if (drm->client.device.info.platform != NV_DEVICE_INFO_V0_SOC &&
drm->client.device.info.family >= NV_DEVICE_INFO_V0_TESLA) {
nvif_vmm_put(&vma->vmm->vmm, &tmp);
}
list_del(&vma->head);
- *pvma = NULL;
kfree(*pvma);
+ *pvma = NULL;
}
}
.imem = gk20a_instmem_new,
.ltc = gp100_ltc_new,
.mc = gp10b_mc_new,
- .mmu = gf100_mmu_new,
+ .mmu = gp10b_mmu_new,
.secboot = gp10b_secboot_new,
.pmu = gm20b_pmu_new,
.timer = gk20a_timer_new,
.links = gf119_sor_dp_links,
.power = g94_sor_dp_power,
.pattern = gf119_sor_dp_pattern,
+ .drive = gf119_sor_dp_drive,
.vcpi = gf119_sor_dp_vcpi,
.audio = gf119_sor_dp_audio,
.audio_sym = gf119_sor_dp_audio_sym,
if (data) {
*ver = nvbios_rd08(bios, data + 0x00);
switch (*ver) {
+ case 0x20:
case 0x21:
case 0x30:
case 0x40:
if (data && idx < *cnt) {
u16 outp = nvbios_rd16(bios, data + *hdr + idx * *len);
switch (*ver * !!outp) {
+ case 0x20:
case 0x21:
case 0x30:
*hdr = nvbios_rd08(bios, data + 0x04);
info->type = nvbios_rd16(bios, data + 0x00);
info->mask = nvbios_rd16(bios, data + 0x02);
switch (*ver) {
+ case 0x20:
+ info->mask |= 0x00c0; /* match any link */
+ /* fall-through */
case 0x21:
case 0x30:
info->flags = nvbios_rd08(bios, data + 0x05);
info->script[0] = nvbios_rd16(bios, data + 0x06);
info->script[1] = nvbios_rd16(bios, data + 0x08);
- info->lnkcmp = nvbios_rd16(bios, data + 0x0a);
+ if (*len >= 0x0c)
+ info->lnkcmp = nvbios_rd16(bios, data + 0x0a);
if (*len >= 0x0f) {
info->script[2] = nvbios_rd16(bios, data + 0x0c);
info->script[3] = nvbios_rd16(bios, data + 0x0e);
memset(info, 0x00, sizeof(*info));
if (data) {
switch (*ver) {
+ case 0x20:
case 0x21:
info->dc = nvbios_rd08(bios, data + 0x02);
info->pe = nvbios_rd08(bios, data + 0x03);
iobj->base.memory.ptrs = &nv50_instobj_fast;
else
iobj->base.memory.ptrs = &nv50_instobj_slow;
- refcount_inc(&iobj->maps);
+ refcount_set(&iobj->maps, 1);
}
mutex_unlock(&imem->subdev.mutex);
return ret;
pci->irq = pdev->irq;
+
+ /* Ensure MSI interrupts are armed, for the case where there are
+ * already interrupts pending (for whatever reason) at load time.
+ */
+ if (pci->msi)
+ pci->func->msi_rearm(pci);
+
return ret;
}
/* then read the message */
msg.len = cnt & 0xf;
+ if (msg.len > CEC_MAX_MSG_SIZE - 2)
+ msg.len = CEC_MAX_MSG_SIZE - 2;
msg.msg[0] = hdmi_read_reg(core->base,
HDMI_CEC_RX_CMD_HEADER);
msg.msg[1] = hdmi_read_reg(core->base,
}
}
-static void hdmi_cec_transmit_fifo_empty(struct hdmi_core_data *core, u32 stat1)
-{
- if (stat1 & 2) {
- u32 dbg3 = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);
-
- cec_transmit_done(core->adap,
- CEC_TX_STATUS_NACK |
- CEC_TX_STATUS_MAX_RETRIES,
- 0, (dbg3 >> 4) & 7, 0, 0);
- } else if (stat1 & 1) {
- cec_transmit_done(core->adap,
- CEC_TX_STATUS_ARB_LOST |
- CEC_TX_STATUS_MAX_RETRIES,
- 0, 0, 0, 0);
- } else if (stat1 == 0) {
- cec_transmit_done(core->adap, CEC_TX_STATUS_OK,
- 0, 0, 0, 0);
- }
-}
-
void hdmi4_cec_irq(struct hdmi_core_data *core)
{
u32 stat0 = hdmi_read_reg(core->base, HDMI_CEC_INT_STATUS_0);
hdmi_write_reg(core->base, HDMI_CEC_INT_STATUS_0, stat0);
hdmi_write_reg(core->base, HDMI_CEC_INT_STATUS_1, stat1);
- if (stat0 & 0x40)
+ if (stat0 & 0x20) {
+ cec_transmit_done(core->adap, CEC_TX_STATUS_OK,
+ 0, 0, 0, 0);
REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
- else if (stat0 & 0x24)
- hdmi_cec_transmit_fifo_empty(core, stat1);
- if (stat1 & 2) {
+ } else if (stat1 & 0x02) {
u32 dbg3 = hdmi_read_reg(core->base, HDMI_CEC_DBG_3);
cec_transmit_done(core->adap,
CEC_TX_STATUS_NACK |
CEC_TX_STATUS_MAX_RETRIES,
0, (dbg3 >> 4) & 7, 0, 0);
- } else if (stat1 & 1) {
- cec_transmit_done(core->adap,
- CEC_TX_STATUS_ARB_LOST |
- CEC_TX_STATUS_MAX_RETRIES,
- 0, 0, 0, 0);
+ REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
}
if (stat0 & 0x02)
hdmi_cec_received_msg(core);
- if (stat1 & 0x3)
- REG_FLD_MOD(core->base, HDMI_CEC_DBG_3, 0x1, 7, 7);
}
static bool hdmi_cec_clear_tx_fifo(struct cec_adapter *adap)
/*
* Enable CEC interrupts:
* Transmit Buffer Full/Empty Change event
- * Transmitter FIFO Empty event
* Receiver FIFO Not Empty event
*/
- hdmi_write_reg(core->base, HDMI_CEC_INT_ENABLE_0, 0x26);
+ hdmi_write_reg(core->base, HDMI_CEC_INT_ENABLE_0, 0x22);
/*
* Enable CEC interrupts:
- * RX FIFO Overrun Error event
- * Short Pulse Detected event
* Frame Retransmit Count Exceeded event
- * Start Bit Irregularity event
*/
- hdmi_write_reg(core->base, HDMI_CEC_INT_ENABLE_1, 0x0f);
+ hdmi_write_reg(core->base, HDMI_CEC_INT_ENABLE_1, 0x02);
/* cec calibration enable (self clearing) */
hdmi_write_reg(core->base, HDMI_CEC_SETUP, 0x03);
writel(val, hdmi->base + SUN4I_HDMI_VID_TIMING_POL_REG);
}
+static enum drm_mode_status sun4i_hdmi_mode_valid(struct drm_encoder *encoder,
+ const struct drm_display_mode *mode)
+{
+ struct sun4i_hdmi *hdmi = drm_encoder_to_sun4i_hdmi(encoder);
+ unsigned long rate = mode->clock * 1000;
+ unsigned long diff = rate / 200; /* +-0.5% allowed by HDMI spec */
+ long rounded_rate;
+
+ /* 165 MHz is the typical max pixelclock frequency for HDMI <= 1.2 */
+ if (rate > 165000000)
+ return MODE_CLOCK_HIGH;
+ rounded_rate = clk_round_rate(hdmi->tmds_clk, rate);
+ if (rounded_rate > 0 &&
+ max_t(unsigned long, rounded_rate, rate) -
+ min_t(unsigned long, rounded_rate, rate) < diff)
+ return MODE_OK;
+ return MODE_NOCLOCK;
+}
+
static const struct drm_encoder_helper_funcs sun4i_hdmi_helper_funcs = {
.atomic_check = sun4i_hdmi_atomic_check,
.disable = sun4i_hdmi_disable,
.enable = sun4i_hdmi_enable,
.mode_set = sun4i_hdmi_mode_set,
+ .mode_valid = sun4i_hdmi_mode_valid,
};
static const struct drm_encoder_funcs sun4i_hdmi_funcs = {
if (IS_ERR(tcon->crtc)) {
dev_err(dev, "Couldn't create our CRTC\n");
ret = PTR_ERR(tcon->crtc);
- goto err_free_clocks;
+ goto err_free_dotclock;
}
ret = sun4i_rgb_init(drm, tcon);
if (ret < 0)
- goto err_free_clocks;
+ goto err_free_dotclock;
if (tcon->quirks->needs_de_be_mux) {
/*
name, err);
goto remove;
}
+ } else {
+ /* fall back to the module clock on SOR0 (eDP/LVDS only) */
+ sor->clk_out = sor->clk;
}
sor->clk_parent = devm_clk_get(&pdev->dev, "parent");
freed += (nr_free_pool - shrink_pages) << pool->order;
if (freed >= sc->nr_to_scan)
break;
+ shrink_pages <<= pool->order;
}
mutex_unlock(&lock);
return freed;
int r = 0;
unsigned i, j, cpages;
unsigned npages = 1 << order;
- unsigned max_cpages = min(count, (unsigned)NUM_PAGES_TO_ALLOC);
+ unsigned max_cpages = min(count << order, (unsigned)NUM_PAGES_TO_ALLOC);
/* allocate array for page caching change */
caching_array = kmalloc(max_cpages*sizeof(struct page *), GFP_KERNEL);
pr_info("Initializing pool allocator\n");
_manager = kzalloc(sizeof(*_manager), GFP_KERNEL);
+ if (!_manager)
+ return -ENOMEM;
ttm_page_pool_init_locked(&_manager->wc_pool, GFP_HIGHUSER, "wc", 0);
/* If we got force-completed because of GPU reset rather than
* through our IRQ handler, signal the fence now.
*/
- if (exec->fence)
+ if (exec->fence) {
dma_fence_signal(exec->fence);
+ dma_fence_put(exec->fence);
+ }
if (exec->bo) {
for (i = 0; i < exec->bo_count; i++) {
list_move_tail(&exec->head, &vc4->job_done_list);
if (exec->fence) {
dma_fence_signal_locked(exec->fence);
+ dma_fence_put(exec->fence);
exec->fence = NULL;
}
vc4_submit_next_render_job(dev);
{
struct vc4_dev *vc4 = to_vc4_dev(dev);
- /* Undo the effects of a previous vc4_irq_uninstall. */
- enable_irq(dev->irq);
-
/* Enable both the render done and out of memory interrupts. */
V3D_WRITE(V3D_INTENA, V3D_DRIVER_IRQS);
return ret;
vc4_v3d_init_hw(vc4->dev);
+
+ /* We disabled the IRQ as part of vc4_irq_uninstall in suspend. */
+ enable_irq(vc4->dev->irq);
vc4_irq_postinstall(vc4->dev);
return 0;
}
view_type = vmw_view_cmd_to_type(header->id);
+ if (view_type == vmw_view_max)
+ return -EINVAL;
cmd = container_of(header, typeof(*cmd), header);
ret = vmw_cmd_res_check(dev_priv, sw_context, vmw_res_surface,
user_surface_converter,
vps->pinned = 0;
/* Mapping is managed by prepare_fb/cleanup_fb */
- memset(&vps->guest_map, 0, sizeof(vps->guest_map));
memset(&vps->host_map, 0, sizeof(vps->host_map));
vps->cpp = 0;
/* Should have been freed by cleanup_fb */
- if (vps->guest_map.virtual) {
- DRM_ERROR("Guest mapping not freed\n");
- ttm_bo_kunmap(&vps->guest_map);
- }
-
if (vps->host_map.virtual) {
DRM_ERROR("Host mapping not freed\n");
ttm_bo_kunmap(&vps->host_map);
int pinned;
/* For CPU Blit */
- struct ttm_bo_kmap_obj host_map, guest_map;
+ struct ttm_bo_kmap_obj host_map;
unsigned int cpp;
};
bool defined;
/* For CPU Blit */
- struct ttm_bo_kmap_obj host_map, guest_map;
+ struct ttm_bo_kmap_obj host_map;
unsigned int cpp;
};
s32 src_pitch, dst_pitch;
u8 *src, *dst;
bool not_used;
-
+ struct ttm_bo_kmap_obj guest_map;
+ int ret;
if (!dirty->num_hits)
return;
if (width == 0 || height == 0)
return;
+ ret = ttm_bo_kmap(&ddirty->buf->base, 0, ddirty->buf->base.num_pages,
+ &guest_map);
+ if (ret) {
+ DRM_ERROR("Failed mapping framebuffer for blit: %d\n",
+ ret);
+ goto out_cleanup;
+ }
/* Assume we are blitting from Host (display_srf) to Guest (dmabuf) */
src_pitch = stdu->display_srf->base_size.width * stdu->cpp;
src += ddirty->top * src_pitch + ddirty->left * stdu->cpp;
dst_pitch = ddirty->pitch;
- dst = ttm_kmap_obj_virtual(&stdu->guest_map, ¬_used);
+ dst = ttm_kmap_obj_virtual(&guest_map, ¬_used);
dst += ddirty->fb_top * dst_pitch + ddirty->fb_left * stdu->cpp;
vmw_fifo_commit(dev_priv, sizeof(*cmd));
}
+ ttm_bo_kunmap(&guest_map);
out_cleanup:
ddirty->left = ddirty->top = ddirty->fb_left = ddirty->fb_top = S32_MAX;
ddirty->right = ddirty->bottom = S32_MIN;
{
struct vmw_plane_state *vps = vmw_plane_state_to_vps(old_state);
- if (vps->guest_map.virtual)
- ttm_bo_kunmap(&vps->guest_map);
-
if (vps->host_map.virtual)
ttm_bo_kunmap(&vps->host_map);
*/
if (vps->content_fb_type == SEPARATE_DMA &&
!(dev_priv->capabilities & SVGA_CAP_3D)) {
-
- struct vmw_framebuffer_dmabuf *new_vfbd;
-
- new_vfbd = vmw_framebuffer_to_vfbd(new_fb);
-
- ret = ttm_bo_reserve(&new_vfbd->buffer->base, false, false,
- NULL);
- if (ret)
- goto out_srf_unpin;
-
- ret = ttm_bo_kmap(&new_vfbd->buffer->base, 0,
- new_vfbd->buffer->base.num_pages,
- &vps->guest_map);
-
- ttm_bo_unreserve(&new_vfbd->buffer->base);
-
- if (ret) {
- DRM_ERROR("Failed to map content buffer to CPU\n");
- goto out_srf_unpin;
- }
-
ret = ttm_bo_kmap(&vps->surf->res.backup->base, 0,
vps->surf->res.backup->base.num_pages,
&vps->host_map);
if (ret) {
DRM_ERROR("Failed to map display buffer to CPU\n");
- ttm_bo_kunmap(&vps->guest_map);
goto out_srf_unpin;
}
stdu->display_srf = vps->surf;
stdu->content_fb_type = vps->content_fb_type;
stdu->cpp = vps->cpp;
- memcpy(&stdu->guest_map, &vps->guest_map, sizeof(vps->guest_map));
memcpy(&stdu->host_map, &vps->host_map, sizeof(vps->host_map));
if (!stdu->defined)
ret = hid_add_field(parser, HID_FEATURE_REPORT, data);
break;
default:
- hid_err(parser->device, "unknown main item tag 0x%x\n", item->tag);
+ hid_warn(parser->device, "unknown main item tag 0x%x\n", item->tag);
ret = 0;
}
(u8 *)&word, 2);
break;
case I2C_SMBUS_I2C_BLOCK_DATA:
- size = I2C_SMBUS_BLOCK_DATA;
- /* fallthrough */
+ if (read_write == I2C_SMBUS_READ) {
+ read_length = data->block[0];
+ count = cp2112_write_read_req(buf, addr, read_length,
+ command, NULL, 0);
+ } else {
+ count = cp2112_write_req(buf, addr, command,
+ data->block + 1,
+ data->block[0]);
+ }
+ break;
case I2C_SMBUS_BLOCK_DATA:
if (I2C_SMBUS_READ == read_write) {
count = cp2112_write_read_req(buf, addr,
case I2C_SMBUS_WORD_DATA:
data->word = le16_to_cpup((__le16 *)buf);
break;
+ case I2C_SMBUS_I2C_BLOCK_DATA:
+ memcpy(data->block + 1, buf, read_length);
+ break;
case I2C_SMBUS_BLOCK_DATA:
if (read_length > I2C_SMBUS_BLOCK_MAX) {
ret = -EPROTO;
#ifdef CONFIG_HOLTEK_FF
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Anssi Hannula <anssi.hannula@iki.fi>");
-MODULE_DESCRIPTION("Force feedback support for Holtek On Line Grip based devices");
-
/*
* These commands and parameters are currently known:
*
.probe = holtek_probe,
};
module_hid_driver(holtek_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Anssi Hannula <anssi.hannula@iki.fi>");
+MODULE_DESCRIPTION("Force feedback support for Holtek On Line Grip based devices");
pr_debug("child device %s unregistered\n",
dev_name(&device_obj->device));
+ kset_unregister(device_obj->channels_kset);
+
/*
* Kick off the process of unregistering the device.
* This will call vmbus_remove() and eventually vmbus_device_release()
struct hwmon_device *hwdev, int index)
{
struct hwmon_thermal_data *tdata;
+ struct thermal_zone_device *tzd;
tdata = devm_kzalloc(dev, sizeof(*tdata), GFP_KERNEL);
if (!tdata)
tdata->hwdev = hwdev;
tdata->index = index;
- devm_thermal_zone_of_sensor_register(&hwdev->dev, index, tdata,
- &hwmon_thermal_ops);
+ tzd = devm_thermal_zone_of_sensor_register(&hwdev->dev, index, tdata,
+ &hwmon_thermal_ops);
+ /*
+ * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV,
+ * so ignore that error but forward any other error.
+ */
+ if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV))
+ return PTR_ERR(tzd);
return 0;
}
if (!chip->ops->is_visible(drvdata, hwmon_temp,
hwmon_temp_input, j))
continue;
- if (info[i]->config[j] & HWMON_T_INPUT)
- hwmon_thermal_add_sensor(dev, hwdev, j);
+ if (info[i]->config[j] & HWMON_T_INPUT) {
+ err = hwmon_thermal_add_sensor(dev,
+ hwdev, j);
+ if (err)
+ goto free_device;
+ }
}
}
}
return hdev;
+free_device:
+ device_unregister(hdev);
free_hwmon:
kfree(hwdev);
ida_remove:
* @len: length of the data packet
*/
static void notrace
-stm_ftrace_write(const void *buf, unsigned int len)
+stm_ftrace_write(struct trace_export *export, const void *buf, unsigned int len)
{
- stm_source_write(&stm_ftrace.data, STM_FTRACE_CHAN, buf, len);
+ struct stm_ftrace *stm = container_of(export, struct stm_ftrace, ftrace);
+
+ stm_source_write(&stm->data, STM_FTRACE_CHAN, buf, len);
}
static int stm_ftrace_link(struct stm_source_data *data)
return 0;
}
-static struct platform_device_id cht_wc_i2c_adap_id_table[] = {
+static const struct platform_device_id cht_wc_i2c_adap_id_table[] = {
{ .name = "cht_wcove_ext_chgr" },
{},
};
if (adapdata->smba) {
i2c_del_adapter(adap);
- if (adapdata->port == (0 << 1)) {
+ if (adapdata->port == (0 << piix4_port_shift_sb800)) {
release_region(adapdata->smba, SMBIOSIZE);
if (adapdata->sb800_main)
release_region(SB800_PIIX4_SMB_IDX, 2);
+// SPDX-License-Identifier: GPL-2.0
/*
* i2c-stm32.h
*
* Copyright (C) M'boumba Cedric Madianga 2017
+ * Copyright (C) STMicroelectronics 2017
* Author: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
*
- * License terms: GNU General Public License (GPL), version 2
*/
#ifndef _I2C_STM32_H
+// SPDX-License-Identifier: GPL-2.0
/*
* Driver for STMicroelectronics STM32 I2C controller
*
* http://www.st.com/resource/en/reference_manual/DM00031020.pdf
*
* Copyright (C) M'boumba Cedric Madianga 2016
+ * Copyright (C) STMicroelectronics 2017
* Author: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
*
* This driver is based on i2c-st.c
*
- * License terms: GNU General Public License (GPL), version 2
*/
#include <linux/clk.h>
+// SPDX-License-Identifier: GPL-2.0
/*
* Driver for STMicroelectronics STM32F7 I2C controller
*
* http://www.st.com/resource/en/reference_manual/dm00124865.pdf
*
* Copyright (C) M'boumba Cedric Madianga 2017
+ * Copyright (C) STMicroelectronics 2017
* Author: M'boumba Cedric Madianga <cedric.madianga@gmail.com>
*
* This driver is based on i2c-stm32f4.c
*
- * License terms: GNU General Public License (GPL), version 2
*/
#include <linux/clk.h>
#include <linux/delay.h>
return skb->len;
}
-static const struct rdma_nl_cbs cma_cb_table[] = {
+static const struct rdma_nl_cbs cma_cb_table[RDMA_NL_RDMA_CM_NUM_OPS] = {
[RDMA_NL_RDMA_CM_ID_STATS] = { .dump = cma_get_id_stats},
};
}
#endif
-struct ib_device *__ib_device_get_by_index(u32 ifindex);
+struct ib_device *ib_device_get_by_index(u32 ifindex);
/* RDMA device netlink */
void nldev_init(void);
void nldev_exit(void);
return 0;
}
-struct ib_device *__ib_device_get_by_index(u32 index)
+static struct ib_device *__ib_device_get_by_index(u32 index)
{
struct ib_device *device;
return NULL;
}
+/*
+ * Caller is responsible to return refrerence count by calling put_device()
+ */
+struct ib_device *ib_device_get_by_index(u32 index)
+{
+ struct ib_device *device;
+
+ down_read(&lists_rwsem);
+ device = __ib_device_get_by_index(index);
+ if (device)
+ get_device(&device->dev);
+
+ up_read(&lists_rwsem);
+ return device;
+}
+
static struct ib_device *__ib_device_get_by_name(const char *name)
{
struct ib_device *device;
}
EXPORT_SYMBOL(ib_get_net_dev_by_params);
-static const struct rdma_nl_cbs ibnl_ls_cb_table[] = {
+static const struct rdma_nl_cbs ibnl_ls_cb_table[RDMA_NL_LS_NUM_OPS] = {
[RDMA_NL_LS_OP_RESOLVE] = {
.doit = ib_nl_handle_resolve_resp,
.flags = RDMA_NL_ADMIN_PERM,
}
EXPORT_SYMBOL(iwcm_reject_msg);
-static struct rdma_nl_cbs iwcm_nl_cb_table[] = {
+static struct rdma_nl_cbs iwcm_nl_cb_table[RDMA_NL_IWPM_NUM_OPS] = {
[RDMA_NL_IWPM_REG_PID] = {.dump = iwpm_register_pid_cb},
[RDMA_NL_IWPM_ADD_MAPPING] = {.dump = iwpm_add_mapping_cb},
[RDMA_NL_IWPM_QUERY_MAPPING] = {.dump = iwpm_add_and_query_mapping_cb},
index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
- device = __ib_device_get_by_index(index);
+ device = ib_device_get_by_index(index);
if (!device)
return -EINVAL;
msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
- if (!msg)
- return -ENOMEM;
+ if (!msg) {
+ err = -ENOMEM;
+ goto err;
+ }
nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET),
0, 0);
err = fill_dev_info(msg, device);
- if (err) {
- nlmsg_free(msg);
- return err;
- }
+ if (err)
+ goto err_free;
nlmsg_end(msg, nlh);
+ put_device(&device->dev);
return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_free:
+ nlmsg_free(msg);
+err:
+ put_device(&device->dev);
+ return err;
}
static int _nldev_get_dumpit(struct ib_device *device,
return -EINVAL;
index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
- device = __ib_device_get_by_index(index);
+ device = ib_device_get_by_index(index);
if (!device)
return -EINVAL;
port = nla_get_u32(tb[RDMA_NLDEV_ATTR_PORT_INDEX]);
- if (!rdma_is_port_valid(device, port))
- return -EINVAL;
+ if (!rdma_is_port_valid(device, port)) {
+ err = -EINVAL;
+ goto err;
+ }
msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
- if (!msg)
- return -ENOMEM;
+ if (!msg) {
+ err = -ENOMEM;
+ goto err;
+ }
nlh = nlmsg_put(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_GET),
0, 0);
err = fill_port_info(msg, device, port);
- if (err) {
- nlmsg_free(msg);
- return err;
- }
+ if (err)
+ goto err_free;
nlmsg_end(msg, nlh);
+ put_device(&device->dev);
return rdma_nl_unicast(msg, NETLINK_CB(skb).portid);
+
+err_free:
+ nlmsg_free(msg);
+err:
+ put_device(&device->dev);
+ return err;
}
static int nldev_port_get_dumpit(struct sk_buff *skb,
return -EINVAL;
ifindex = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]);
- device = __ib_device_get_by_index(ifindex);
+ device = ib_device_get_by_index(ifindex);
if (!device)
return -EINVAL;
nlmsg_end(skb, nlh);
}
-out: cb->args[0] = idx;
+out:
+ put_device(&device->dev);
+ cb->args[0] = idx;
return skb->len;
}
-static const struct rdma_nl_cbs nldev_cb_table[] = {
+static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
[RDMA_NLDEV_CMD_GET] = {
.doit = nldev_get_doit,
.dump = nldev_get_dumpit,
if (ret)
return ret;
+ if (!qp->qp_sec)
+ return 0;
+
mutex_lock(&real_qp->qp_sec->mutex);
ret = check_qp_port_pkey_settings(real_qp->qp_sec->ports_pkeys,
qp->qp_sec);
if (!rdma_protocol_ib(map->agent.device, map->agent.port_num))
return 0;
- if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed)
- return -EACCES;
+ if (map->agent.qp->qp_type == IB_QPT_SMI) {
+ if (!map->agent.smp_allowed)
+ return -EACCES;
+ return 0;
+ }
return ib_security_pkey_access(map->agent.device,
map->agent.port_num,
goto release_qp;
}
+ if ((cmd->base.attr_mask & IB_QP_ALT_PATH) &&
+ !rdma_is_port_valid(qp->device, cmd->base.alt_port_num)) {
+ ret = -EINVAL;
+ goto release_qp;
+ }
+
attr->qp_state = cmd->base.qp_state;
attr->cur_qp_state = cmd->base.cur_qp_state;
attr->path_mtu = cmd->base.path_mtu;
return -EOPNOTSUPP;
if (ucore->inlen > sizeof(cmd)) {
- if (ib_is_udata_cleared(ucore, sizeof(cmd),
- ucore->inlen - sizeof(cmd)))
+ if (!ib_is_udata_cleared(ucore, sizeof(cmd),
+ ucore->inlen - sizeof(cmd)))
return -EOPNOTSUPP;
}
spin_unlock_irqrestore(&real_qp->device->event_handler_lock, flags);
atomic_dec(&real_qp->usecnt);
- ib_close_shared_qp_security(qp->qp_sec);
+ if (qp->qp_sec)
+ ib_close_shared_qp_security(qp->qp_sec);
kfree(qp);
return 0;
static int cqe_completes_wr(struct t4_cqe *cqe, struct t4_wq *wq)
{
+ if (DRAIN_CQE(cqe)) {
+ WARN_ONCE(1, "Unexpected DRAIN CQE qp id %u!\n", wq->sq.qid);
+ return 0;
+ }
+
if (CQE_OPCODE(cqe) == FW_RI_TERMINATE)
return 0;
/*
* Special cqe for drain WR completions...
*/
- if (CQE_OPCODE(hw_cqe) == C4IW_DRAIN_OPCODE) {
+ if (DRAIN_CQE(hw_cqe)) {
*cookie = CQE_DRAIN_COOKIE(hw_cqe);
*cqe = *hw_cqe;
goto skip_cqe;
ret = -EAGAIN;
goto skip_cqe;
}
- if (unlikely((CQE_WRID_MSN(hw_cqe) != (wq->rq.msn)))) {
+ if (unlikely(!CQE_STATUS(hw_cqe) &&
+ CQE_WRID_MSN(hw_cqe) != wq->rq.msn)) {
t4_set_wq_in_error(wq);
- hw_cqe->header |= htonl(CQE_STATUS_V(T4_ERR_MSN));
- goto proc_cqe;
+ hw_cqe->header |= cpu_to_be32(CQE_STATUS_V(T4_ERR_MSN));
}
goto proc_cqe;
}
c4iw_invalidate_mr(qhp->rhp,
CQE_WRID_FR_STAG(&cqe));
break;
- case C4IW_DRAIN_OPCODE:
- wc->opcode = IB_WC_SEND;
- break;
default:
pr_err("Unexpected opcode %d in the CQE received for QPID=0x%0x\n",
CQE_OPCODE(&cqe), CQE_QPID(&cqe));
return IB_QPS_ERR;
}
-#define C4IW_DRAIN_OPCODE FW_RI_SGE_EC_CR_RETURN
-
static inline u32 c4iw_ib_to_tpt_access(int a)
{
return (a & IB_ACCESS_REMOTE_WRITE ? FW_RI_MEM_ACCESS_REM_WRITE : 0) |
return 0;
}
-static void complete_sq_drain_wr(struct c4iw_qp *qhp, struct ib_send_wr *wr)
+static int ib_to_fw_opcode(int ib_opcode)
+{
+ int opcode;
+
+ switch (ib_opcode) {
+ case IB_WR_SEND_WITH_INV:
+ opcode = FW_RI_SEND_WITH_INV;
+ break;
+ case IB_WR_SEND:
+ opcode = FW_RI_SEND;
+ break;
+ case IB_WR_RDMA_WRITE:
+ opcode = FW_RI_RDMA_WRITE;
+ break;
+ case IB_WR_RDMA_READ:
+ case IB_WR_RDMA_READ_WITH_INV:
+ opcode = FW_RI_READ_REQ;
+ break;
+ case IB_WR_REG_MR:
+ opcode = FW_RI_FAST_REGISTER;
+ break;
+ case IB_WR_LOCAL_INV:
+ opcode = FW_RI_LOCAL_INV;
+ break;
+ default:
+ opcode = -EINVAL;
+ }
+ return opcode;
+}
+
+static int complete_sq_drain_wr(struct c4iw_qp *qhp, struct ib_send_wr *wr)
{
struct t4_cqe cqe = {};
struct c4iw_cq *schp;
unsigned long flag;
struct t4_cq *cq;
+ int opcode;
schp = to_c4iw_cq(qhp->ibqp.send_cq);
cq = &schp->cq;
+ opcode = ib_to_fw_opcode(wr->opcode);
+ if (opcode < 0)
+ return opcode;
+
cqe.u.drain_cookie = wr->wr_id;
cqe.header = cpu_to_be32(CQE_STATUS_V(T4_ERR_SWFLUSH) |
- CQE_OPCODE_V(C4IW_DRAIN_OPCODE) |
+ CQE_OPCODE_V(opcode) |
CQE_TYPE_V(1) |
CQE_SWCQE_V(1) |
+ CQE_DRAIN_V(1) |
CQE_QPID_V(qhp->wq.sq.qid));
spin_lock_irqsave(&schp->lock, flag);
schp->ibcq.cq_context);
spin_unlock_irqrestore(&schp->comp_handler_lock, flag);
}
+ return 0;
+}
+
+static int complete_sq_drain_wrs(struct c4iw_qp *qhp, struct ib_send_wr *wr,
+ struct ib_send_wr **bad_wr)
+{
+ int ret = 0;
+
+ while (wr) {
+ ret = complete_sq_drain_wr(qhp, wr);
+ if (ret) {
+ *bad_wr = wr;
+ break;
+ }
+ wr = wr->next;
+ }
+ return ret;
}
static void complete_rq_drain_wr(struct c4iw_qp *qhp, struct ib_recv_wr *wr)
cqe.u.drain_cookie = wr->wr_id;
cqe.header = cpu_to_be32(CQE_STATUS_V(T4_ERR_SWFLUSH) |
- CQE_OPCODE_V(C4IW_DRAIN_OPCODE) |
+ CQE_OPCODE_V(FW_RI_SEND) |
CQE_TYPE_V(0) |
CQE_SWCQE_V(1) |
+ CQE_DRAIN_V(1) |
CQE_QPID_V(qhp->wq.sq.qid));
spin_lock_irqsave(&rchp->lock, flag);
}
}
+static void complete_rq_drain_wrs(struct c4iw_qp *qhp, struct ib_recv_wr *wr)
+{
+ while (wr) {
+ complete_rq_drain_wr(qhp, wr);
+ wr = wr->next;
+ }
+}
+
int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
struct ib_send_wr **bad_wr)
{
qhp = to_c4iw_qp(ibqp);
spin_lock_irqsave(&qhp->lock, flag);
- if (t4_wq_in_error(&qhp->wq)) {
+
+ /*
+ * If the qp has been flushed, then just insert a special
+ * drain cqe.
+ */
+ if (qhp->wq.flushed) {
spin_unlock_irqrestore(&qhp->lock, flag);
- complete_sq_drain_wr(qhp, wr);
+ err = complete_sq_drain_wrs(qhp, wr, bad_wr);
return err;
}
num_wrs = t4_sq_avail(&qhp->wq);
qhp = to_c4iw_qp(ibqp);
spin_lock_irqsave(&qhp->lock, flag);
- if (t4_wq_in_error(&qhp->wq)) {
+
+ /*
+ * If the qp has been flushed, then just insert a special
+ * drain cqe.
+ */
+ if (qhp->wq.flushed) {
spin_unlock_irqrestore(&qhp->lock, flag);
- complete_rq_drain_wr(qhp, wr);
+ complete_rq_drain_wrs(qhp, wr);
return err;
}
num_wrs = t4_rq_avail(&qhp->wq);
spin_unlock_irqrestore(&rchp->lock, flag);
if (schp == rchp) {
- if (t4_clear_cq_armed(&rchp->cq) &&
- (rq_flushed || sq_flushed)) {
+ if ((rq_flushed || sq_flushed) &&
+ t4_clear_cq_armed(&rchp->cq)) {
spin_lock_irqsave(&rchp->comp_handler_lock, flag);
(*rchp->ibcq.comp_handler)(&rchp->ibcq,
rchp->ibcq.cq_context);
spin_unlock_irqrestore(&rchp->comp_handler_lock, flag);
}
} else {
- if (t4_clear_cq_armed(&rchp->cq) && rq_flushed) {
+ if (rq_flushed && t4_clear_cq_armed(&rchp->cq)) {
spin_lock_irqsave(&rchp->comp_handler_lock, flag);
(*rchp->ibcq.comp_handler)(&rchp->ibcq,
rchp->ibcq.cq_context);
spin_unlock_irqrestore(&rchp->comp_handler_lock, flag);
}
- if (t4_clear_cq_armed(&schp->cq) && sq_flushed) {
+ if (sq_flushed && t4_clear_cq_armed(&schp->cq)) {
spin_lock_irqsave(&schp->comp_handler_lock, flag);
(*schp->ibcq.comp_handler)(&schp->ibcq,
schp->ibcq.cq_context);
#define CQE_SWCQE_G(x) ((((x) >> CQE_SWCQE_S)) & CQE_SWCQE_M)
#define CQE_SWCQE_V(x) ((x)<<CQE_SWCQE_S)
+#define CQE_DRAIN_S 10
+#define CQE_DRAIN_M 0x1
+#define CQE_DRAIN_G(x) ((((x) >> CQE_DRAIN_S)) & CQE_DRAIN_M)
+#define CQE_DRAIN_V(x) ((x)<<CQE_DRAIN_S)
+
#define CQE_STATUS_S 5
#define CQE_STATUS_M 0x1F
#define CQE_STATUS_G(x) ((((x) >> CQE_STATUS_S)) & CQE_STATUS_M)
#define CQE_OPCODE_V(x) ((x)<<CQE_OPCODE_S)
#define SW_CQE(x) (CQE_SWCQE_G(be32_to_cpu((x)->header)))
+#define DRAIN_CQE(x) (CQE_DRAIN_G(be32_to_cpu((x)->header)))
#define CQE_QPID(x) (CQE_QPID_G(be32_to_cpu((x)->header)))
#define CQE_TYPE(x) (CQE_TYPE_G(be32_to_cpu((x)->header)))
#define SQ_TYPE(x) (CQE_TYPE((x)))
u16 pcie_lnkctl;
u16 pcie_devctl2;
u32 pci_msix0;
- u32 pci_lnkctl3;
u32 pci_tph2;
/*
if (ret)
goto error;
- ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
- dd->pci_lnkctl3);
- if (ret)
- goto error;
-
- ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2, dd->pci_tph2);
- if (ret)
- goto error;
-
+ if (pci_find_ext_capability(dd->pcidev, PCI_EXT_CAP_ID_TPH)) {
+ ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2,
+ dd->pci_tph2);
+ if (ret)
+ goto error;
+ }
return 0;
error:
if (ret)
goto error;
- ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
- &dd->pci_lnkctl3);
- if (ret)
- goto error;
-
- ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2, &dd->pci_tph2);
- if (ret)
- goto error;
-
+ if (pci_find_ext_capability(dd->pcidev, PCI_EXT_CAP_ID_TPH)) {
+ ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2,
+ &dd->pci_tph2);
+ if (ret)
+ goto error;
+ }
return 0;
error:
goto err_free_mr;
mr->max_pages = max_num_sg;
-
err = mlx4_mr_enable(dev->dev, &mr->mmr);
if (err)
goto err_free_pl;
return &mr->ibmr;
err_free_pl:
+ mr->ibmr.device = pd->device;
mlx4_free_priv_pages(mr);
err_free_mr:
(void) mlx4_mr_free(dev->dev, &mr->mmr);
return (-EOPNOTSUPP);
}
+ if (ucmd->rx_hash_fields_mask & ~(MLX4_IB_RX_HASH_SRC_IPV4 |
+ MLX4_IB_RX_HASH_DST_IPV4 |
+ MLX4_IB_RX_HASH_SRC_IPV6 |
+ MLX4_IB_RX_HASH_DST_IPV6 |
+ MLX4_IB_RX_HASH_SRC_PORT_TCP |
+ MLX4_IB_RX_HASH_DST_PORT_TCP |
+ MLX4_IB_RX_HASH_SRC_PORT_UDP |
+ MLX4_IB_RX_HASH_DST_PORT_UDP)) {
+ pr_debug("RX Hash fields_mask has unsupported mask (0x%llx)\n",
+ ucmd->rx_hash_fields_mask);
+ return (-EOPNOTSUPP);
+ }
+
if ((ucmd->rx_hash_fields_mask & MLX4_IB_RX_HASH_SRC_IPV4) &&
(ucmd->rx_hash_fields_mask & MLX4_IB_RX_HASH_DST_IPV4)) {
rss_ctx->flags = MLX4_RSS_IPV4;
return (-EOPNOTSUPP);
}
- if (rss_ctx->flags & MLX4_RSS_IPV4) {
+ if (rss_ctx->flags & MLX4_RSS_IPV4)
rss_ctx->flags |= MLX4_RSS_UDP_IPV4;
- } else if (rss_ctx->flags & MLX4_RSS_IPV6) {
+ if (rss_ctx->flags & MLX4_RSS_IPV6)
rss_ctx->flags |= MLX4_RSS_UDP_IPV6;
- } else {
+ if (!(rss_ctx->flags & (MLX4_RSS_IPV6 | MLX4_RSS_IPV4))) {
pr_debug("RX Hash fields_mask is not supported - UDP must be set with IPv4 or IPv6\n");
return (-EOPNOTSUPP);
}
if ((ucmd->rx_hash_fields_mask & MLX4_IB_RX_HASH_SRC_PORT_TCP) &&
(ucmd->rx_hash_fields_mask & MLX4_IB_RX_HASH_DST_PORT_TCP)) {
- if (rss_ctx->flags & MLX4_RSS_IPV4) {
+ if (rss_ctx->flags & MLX4_RSS_IPV4)
rss_ctx->flags |= MLX4_RSS_TCP_IPV4;
- } else if (rss_ctx->flags & MLX4_RSS_IPV6) {
+ if (rss_ctx->flags & MLX4_RSS_IPV6)
rss_ctx->flags |= MLX4_RSS_TCP_IPV6;
- } else {
+ if (!(rss_ctx->flags & (MLX4_RSS_IPV6 | MLX4_RSS_IPV4))) {
pr_debug("RX Hash fields_mask is not supported - TCP must be set with IPv4 or IPv6\n");
return (-EOPNOTSUPP);
}
-
} else if ((ucmd->rx_hash_fields_mask & MLX4_IB_RX_HASH_SRC_PORT_TCP) ||
(ucmd->rx_hash_fields_mask & MLX4_IB_RX_HASH_DST_PORT_TCP)) {
pr_debug("RX Hash fields_mask is not supported - both TCP SRC and DST must be set\n");
return err;
}
-int mlx5_cmd_query_cong_counter(struct mlx5_core_dev *dev,
- bool reset, void *out, int out_size)
-{
- u32 in[MLX5_ST_SZ_DW(query_cong_statistics_in)] = { };
-
- MLX5_SET(query_cong_statistics_in, in, opcode,
- MLX5_CMD_OP_QUERY_CONG_STATISTICS);
- MLX5_SET(query_cong_statistics_in, in, clear, reset);
- return mlx5_cmd_exec(dev, in, sizeof(in), out, out_size);
-}
-
int mlx5_cmd_query_cong_params(struct mlx5_core_dev *dev, int cong_point,
void *out, int out_size)
{
#include <linux/mlx5/driver.h>
int mlx5_cmd_null_mkey(struct mlx5_core_dev *dev, u32 *null_mkey);
-int mlx5_cmd_query_cong_counter(struct mlx5_core_dev *dev,
- bool reset, void *out, int out_size);
int mlx5_cmd_query_cong_params(struct mlx5_core_dev *dev, int cong_point,
void *out, int out_size);
int mlx5_cmd_modify_cong_params(struct mlx5_core_dev *mdev,
}
INIT_LIST_HEAD(&context->vma_private_list);
+ mutex_init(&context->vma_private_list_mutex);
INIT_LIST_HEAD(&context->db_page_list);
mutex_init(&context->db_page_mutex);
* mlx5_ib_disassociate_ucontext().
*/
mlx5_ib_vma_priv_data->vma = NULL;
+ mutex_lock(mlx5_ib_vma_priv_data->vma_private_list_mutex);
list_del(&mlx5_ib_vma_priv_data->list);
+ mutex_unlock(mlx5_ib_vma_priv_data->vma_private_list_mutex);
kfree(mlx5_ib_vma_priv_data);
}
return -ENOMEM;
vma_prv->vma = vma;
+ vma_prv->vma_private_list_mutex = &ctx->vma_private_list_mutex;
vma->vm_private_data = vma_prv;
vma->vm_ops = &mlx5_ib_vm_ops;
+ mutex_lock(&ctx->vma_private_list_mutex);
list_add(&vma_prv->list, vma_head);
+ mutex_unlock(&ctx->vma_private_list_mutex);
return 0;
}
* mlx5_ib_vma_close.
*/
down_write(&owning_mm->mmap_sem);
+ mutex_lock(&context->vma_private_list_mutex);
list_for_each_entry_safe(vma_private, n, &context->vma_private_list,
list) {
vma = vma_private->vma;
list_del(&vma_private->list);
kfree(vma_private);
}
+ mutex_unlock(&context->vma_private_list_mutex);
up_write(&owning_mm->mmap_sem);
mmput(owning_mm);
put_task_struct(owning_process);
return ret;
}
-static int mlx5_ib_query_cong_counters(struct mlx5_ib_dev *dev,
- struct mlx5_ib_port *port,
- struct rdma_hw_stats *stats)
-{
- int outlen = MLX5_ST_SZ_BYTES(query_cong_statistics_out);
- void *out;
- int ret, i;
- int offset = port->cnts.num_q_counters;
-
- out = kvzalloc(outlen, GFP_KERNEL);
- if (!out)
- return -ENOMEM;
-
- ret = mlx5_cmd_query_cong_counter(dev->mdev, false, out, outlen);
- if (ret)
- goto free;
-
- for (i = 0; i < port->cnts.num_cong_counters; i++) {
- stats->value[i + offset] =
- be64_to_cpup((__be64 *)(out +
- port->cnts.offsets[i + offset]));
- }
-
-free:
- kvfree(out);
- return ret;
-}
-
static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
struct rdma_hw_stats *stats,
u8 port_num, int index)
num_counters = port->cnts.num_q_counters;
if (MLX5_CAP_GEN(dev->mdev, cc_query_allowed)) {
- ret = mlx5_ib_query_cong_counters(dev, port, stats);
+ ret = mlx5_lag_query_cong_counters(dev->mdev,
+ stats->value +
+ port->cnts.num_q_counters,
+ port->cnts.num_cong_counters,
+ port->cnts.offsets +
+ port->cnts.num_q_counters);
if (ret)
return ret;
num_counters += port->cnts.num_cong_counters;
struct mlx5_ib_vma_private_data {
struct list_head list;
struct vm_area_struct *vma;
+ /* protect vma_private_list add/del */
+ struct mutex *vma_private_list_mutex;
};
struct mlx5_ib_ucontext {
/* Transport Domain number */
u32 tdn;
struct list_head vma_private_list;
+ /* protect vma_private_list add/del */
+ struct mutex vma_private_list_mutex;
unsigned long upd_xlt_page;
/* protect ODP/KSM */
MLX5_SET(mkc, mkc, access_mode, mr->access_mode);
MLX5_SET(mkc, mkc, umr_en, 1);
+ mr->ibmr.device = pd->device;
err = mlx5_core_create_mkey(dev->mdev, &mr->mmkey, in, inlen);
if (err)
goto err_destroy_psv;
u32 cq_handle;
bool is_kernel;
atomic_t refcnt;
- wait_queue_head_t wait;
+ struct completion free;
};
struct pvrdma_id_table {
u32 srq_handle;
int npages;
refcount_t refcnt;
- wait_queue_head_t wait;
+ struct completion free;
};
struct pvrdma_qp {
bool is_kernel;
struct mutex mutex; /* QP state mutex. */
atomic_t refcnt;
- wait_queue_head_t wait;
+ struct completion free;
};
struct pvrdma_dev {
pvrdma_page_dir_insert_umem(&cq->pdir, cq->umem, 0);
atomic_set(&cq->refcnt, 1);
- init_waitqueue_head(&cq->wait);
+ init_completion(&cq->free);
spin_lock_init(&cq->cq_lock);
memset(cmd, 0, sizeof(*cmd));
static void pvrdma_free_cq(struct pvrdma_dev *dev, struct pvrdma_cq *cq)
{
- atomic_dec(&cq->refcnt);
- wait_event(cq->wait, !atomic_read(&cq->refcnt));
+ if (atomic_dec_and_test(&cq->refcnt))
+ complete(&cq->free);
+ wait_for_completion(&cq->free);
if (!cq->is_kernel)
ib_umem_release(cq->umem);
ibqp->event_handler(&e, ibqp->qp_context);
}
if (qp) {
- atomic_dec(&qp->refcnt);
- if (atomic_read(&qp->refcnt) == 0)
- wake_up(&qp->wait);
+ if (atomic_dec_and_test(&qp->refcnt))
+ complete(&qp->free);
}
}
ibcq->event_handler(&e, ibcq->cq_context);
}
if (cq) {
- atomic_dec(&cq->refcnt);
- if (atomic_read(&cq->refcnt) == 0)
- wake_up(&cq->wait);
+ if (atomic_dec_and_test(&cq->refcnt))
+ complete(&cq->free);
}
}
}
if (srq) {
if (refcount_dec_and_test(&srq->refcnt))
- wake_up(&srq->wait);
+ complete(&srq->free);
}
}
if (cq && cq->ibcq.comp_handler)
cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
if (cq) {
- atomic_dec(&cq->refcnt);
- if (atomic_read(&cq->refcnt))
- wake_up(&cq->wait);
+ if (atomic_dec_and_test(&cq->refcnt))
+ complete(&cq->free);
}
pvrdma_idx_ring_inc(&ring->cons_head, ring_slots);
}
spin_lock_init(&qp->rq.lock);
mutex_init(&qp->mutex);
atomic_set(&qp->refcnt, 1);
- init_waitqueue_head(&qp->wait);
+ init_completion(&qp->free);
qp->state = IB_QPS_RESET;
pvrdma_unlock_cqs(scq, rcq, &scq_flags, &rcq_flags);
- atomic_dec(&qp->refcnt);
- wait_event(qp->wait, !atomic_read(&qp->refcnt));
+ if (atomic_dec_and_test(&qp->refcnt))
+ complete(&qp->free);
+ wait_for_completion(&qp->free);
+
+ if (!qp->is_kernel) {
+ if (qp->rumem)
+ ib_umem_release(qp->rumem);
+ if (qp->sumem)
+ ib_umem_release(qp->sumem);
+ }
pvrdma_page_dir_cleanup(dev, &qp->pdir);
spin_lock_init(&srq->lock);
refcount_set(&srq->refcnt, 1);
- init_waitqueue_head(&srq->wait);
+ init_completion(&srq->free);
dev_dbg(&dev->pdev->dev,
"create shared receive queue from user space\n");
dev->srq_tbl[srq->srq_handle] = NULL;
spin_unlock_irqrestore(&dev->srq_tbl_lock, flags);
- refcount_dec(&srq->refcnt);
- wait_event(srq->wait, !refcount_read(&srq->refcnt));
+ if (refcount_dec_and_test(&srq->refcnt))
+ complete(&srq->free);
+ wait_for_completion(&srq->free);
/* There is no support for kernel clients, so this is safe. */
ib_umem_release(srq->umem);
noio_flag = memalloc_noio_save();
p->tx_ring = vzalloc(ipoib_sendq_size * sizeof(*p->tx_ring));
if (!p->tx_ring) {
+ memalloc_noio_restore(noio_flag);
ret = -ENOMEM;
goto err_tx;
}
ipoib_ib_dev_down(dev);
if (level == IPOIB_FLUSH_HEAVY) {
- rtnl_lock();
if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
ipoib_ib_dev_stop(dev);
- result = ipoib_ib_dev_open(dev);
- rtnl_unlock();
- if (result)
+ if (ipoib_ib_dev_open(dev))
return;
if (netif_queue_stopped(dev))
struct ipoib_dev_priv *priv =
container_of(work, struct ipoib_dev_priv, flush_heavy);
+ rtnl_lock();
__ipoib_ib_dev_flush(priv, IPOIB_FLUSH_HEAVY, 0);
+ rtnl_unlock();
}
void ipoib_ib_dev_cleanup(struct net_device *dev)
return 0;
}
-static void neigh_add_path(struct sk_buff *skb, u8 *daddr,
- struct net_device *dev)
+static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr,
+ struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
struct rdma_netdev *rn = netdev_priv(dev);
spin_unlock_irqrestore(&priv->lock, flags);
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
- return;
+ return NULL;
+ }
+
+ /* To avoid race condition, make sure that the
+ * neigh will be added only once.
+ */
+ if (unlikely(!list_empty(&neigh->list))) {
+ spin_unlock_irqrestore(&priv->lock, flags);
+ return neigh;
}
path = __path_find(dev, daddr + 4);
path->ah->last_send = rn->send(dev, skb, path->ah->ah,
IPOIB_QPN(daddr));
ipoib_neigh_put(neigh);
- return;
+ return NULL;
}
} else {
neigh->ah = NULL;
spin_unlock_irqrestore(&priv->lock, flags);
ipoib_neigh_put(neigh);
- return;
+ return NULL;
err_path:
ipoib_neigh_free(neigh);
spin_unlock_irqrestore(&priv->lock, flags);
ipoib_neigh_put(neigh);
+
+ return NULL;
}
static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
case htons(ETH_P_TIPC):
neigh = ipoib_neigh_get(dev, phdr->hwaddr);
if (unlikely(!neigh)) {
- neigh_add_path(skb, phdr->hwaddr, dev);
- return NETDEV_TX_OK;
+ neigh = neigh_add_path(skb, phdr->hwaddr, dev);
+ if (likely(!neigh))
+ return NETDEV_TX_OK;
}
break;
case htons(ETH_P_ARP):
spin_lock_irqsave(&priv->lock, flags);
if (!neigh) {
neigh = ipoib_neigh_alloc(daddr, dev);
- if (neigh) {
+ /* Make sure that the neigh will be added only
+ * once to mcast list.
+ */
+ if (neigh && list_empty(&neigh->list)) {
kref_get(&mcast->ah->ref);
neigh->ah = mcast->ah;
list_add_tail(&neigh->list, &mcast->neigh_list);
return -ENOMEM;
attr->qp_state = IB_QPS_INIT;
- attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ |
- IB_ACCESS_REMOTE_WRITE;
+ attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE;
attr->port_num = ch->sport->port;
attr->pkey_index = 0;
goto destroy_ib;
}
- guid = (__be16 *)¶m->primary_path->sgid.global.interface_id;
+ guid = (__be16 *)¶m->primary_path->dgid.global.interface_id;
snprintf(ch->ini_guid, sizeof(ch->ini_guid), "%04x:%04x:%04x:%04x",
be16_to_cpu(guid[0]), be16_to_cpu(guid[1]),
be16_to_cpu(guid[2]), be16_to_cpu(guid[3]));
#define GET_TIME(x) do { x = (unsigned int)rdtsc(); } while (0)
#define DELTA(x,y) ((y)-(x))
#define TIME_NAME "TSC"
-#elif defined(__alpha__) || defined(CONFIG_MN10300) || defined(CONFIG_ARM) || defined(CONFIG_ARM64) || defined(CONFIG_TILE)
+#elif defined(__alpha__) || defined(CONFIG_MN10300) || defined(CONFIG_ARM) || defined(CONFIG_ARM64) || defined(CONFIG_RISCV) || defined(CONFIG_TILE)
#define GET_TIME(x) do { x = get_cycles(); } while (0)
#define DELTA(x,y) ((y)-(x))
#define TIME_NAME "get_cycles"
return union_desc;
dev_err(&intf->dev,
- "Union descriptor to short (%d vs %zd\n)",
+ "Union descriptor too short (%d vs %zd)\n",
union_desc->bLength, sizeof(*union_desc));
return NULL;
}
0, width, 0, 0);
input_set_abs_params(mtouch, ABS_MT_POSITION_Y,
0, height, 0, 0);
- input_set_abs_params(mtouch, ABS_MT_PRESSURE,
- 0, 255, 0, 0);
ret = input_mt_init_slots(mtouch, num_cont, INPUT_MT_DIRECT);
if (ret) {
case 5:
etd->hw_version = 3;
break;
- case 6 ... 14:
+ case 6 ... 15:
etd->hw_version = 4;
break;
default:
#include <linux/module.h>
#include <linux/input.h>
#include <linux/interrupt.h>
+#include <linux/irq.h>
#include <linux/platform_device.h>
#include <linux/async.h>
#include <linux/i2c.h>
}
/*
- * Systems using device tree should set up interrupt via DTS,
- * the rest will use the default falling edge interrupts.
+ * Platform code (ACPI, DTS) should normally set up interrupt
+ * for us, but in case it did not let's fall back to using falling
+ * edge to be compatible with older Chromebooks.
*/
- irqflags = client->dev.of_node ? 0 : IRQF_TRIGGER_FALLING;
+ irqflags = irq_get_trigger_type(client->irq);
+ if (!irqflags)
+ irqflags = IRQF_TRIGGER_FALLING;
error = devm_request_threaded_irq(&client->dev, client->irq,
NULL, elants_i2c_irq,
#include <linux/of.h>
#include <linux/firmware.h>
#include <linux/delay.h>
-#include <linux/gpio.h>
-#include <linux/gpio/machine.h>
+#include <linux/gpio/consumer.h>
#include <linux/i2c.h>
#include <linux/acpi.h>
#include <linux/interrupt.h>
struct irq_cfg *cfg);
static int irq_remapping_activate(struct irq_domain *domain,
- struct irq_data *irq_data, bool early)
+ struct irq_data *irq_data, bool reserve)
{
struct amd_ir_data *data = irq_data->chip_data;
struct irq_2_irte *irte_info = &data->irq_2_irte;
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
domain->geometry.aperture_end = (1UL << ias) - 1;
domain->geometry.force_aperture = true;
- smmu_domain->pgtbl_ops = pgtbl_ops;
ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
- if (ret < 0)
+ if (ret < 0) {
free_io_pgtable_ops(pgtbl_ops);
+ return ret;
+ }
- return ret;
+ smmu_domain->pgtbl_ops = pgtbl_ops;
+ return 0;
}
static __le64 *arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
{
- int i;
+ int i, j;
struct arm_smmu_master_data *master = fwspec->iommu_priv;
struct arm_smmu_device *smmu = master->smmu;
u32 sid = fwspec->ids[i];
__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
+ /* Bridged PCI devices may end up with duplicated IDs */
+ for (j = 0; j < i; j++)
+ if (fwspec->ids[j] == sid)
+ break;
+ if (j < i)
+ continue;
+
arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste);
}
}
}
static int intel_irq_remapping_activate(struct irq_domain *domain,
- struct irq_data *irq_data, bool early)
+ struct irq_data *irq_data, bool reserve)
{
intel_ir_reconfigure_irte(irq_data, true);
return 0;
}
static int its_irq_domain_activate(struct irq_domain *domain,
- struct irq_data *d, bool early)
+ struct irq_data *d, bool reserve)
{
struct its_device *its_dev = irq_data_get_irq_chip_data(d);
u32 event = its_get_event_id(d);
}
static int its_vpe_irq_domain_activate(struct irq_domain *domain,
- struct irq_data *d, bool early)
+ struct irq_data *d, bool reserve)
{
struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
struct its_node *its;
*/
static struct lock_class_key intc_irqpin_irq_lock_class;
+/* And this is for the request mutex */
+static struct lock_class_key intc_irqpin_irq_request_class;
+
static int intc_irqpin_irq_domain_map(struct irq_domain *h, unsigned int virq,
irq_hw_number_t hw)
{
intc_irqpin_dbg(&p->irq[hw], "map");
irq_set_chip_data(virq, h->host_data);
- irq_set_lockdep_class(virq, &intc_irqpin_irq_lock_class);
+ irq_set_lockdep_class(virq, &intc_irqpin_irq_lock_class,
+ &intc_irqpin_irq_request_class);
irq_set_chip_and_handler(virq, &p->irq_chip, handle_level_irq);
return 0;
}
{
del_timer_sync(&led_cdev->blink_timer);
+ clear_bit(LED_BLINK_SW, &led_cdev->work_flags);
clear_bit(LED_BLINK_ONESHOT, &led_cdev->work_flags);
clear_bit(LED_BLINK_ONESHOT_STOP, &led_cdev->work_flags);
int l;
struct dm_buffer *b, *tmp;
unsigned long freed = 0;
- unsigned long count = nr_to_scan;
+ unsigned long count = c->n_buffers[LIST_CLEAN] +
+ c->n_buffers[LIST_DIRTY];
unsigned long retain_target = get_retain_buffers(c);
for (l = 0; l < LIST_SIZE; l++) {
dm_bufio_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
struct dm_bufio_client *c = container_of(shrink, struct dm_bufio_client, shrinker);
+ unsigned long count = READ_ONCE(c->n_buffers[LIST_CLEAN]) +
+ READ_ONCE(c->n_buffers[LIST_DIRTY]);
+ unsigned long retain_target = get_retain_buffers(c);
- return READ_ONCE(c->n_buffers[LIST_CLEAN]) + READ_ONCE(c->n_buffers[LIST_DIRTY]);
+ return (count < retain_target) ? 0 : (count - retain_target);
}
/*
{
int r;
- r = dm_register_target(&cache_target);
- if (r) {
- DMERR("cache target registration failed: %d", r);
- return r;
- }
-
migration_cache = KMEM_CACHE(dm_cache_migration, 0);
if (!migration_cache) {
dm_unregister_target(&cache_target);
return -ENOMEM;
}
+ r = dm_register_target(&cache_target);
+ if (r) {
+ DMERR("cache target registration failed: %d", r);
+ return r;
+ }
+
return 0;
}
dm_noflush_suspending((m)->ti)); \
} while (0)
+/*
+ * Check whether bios must be queued in the device-mapper core rather
+ * than here in the target.
+ *
+ * If MPATHF_QUEUE_IF_NO_PATH and MPATHF_SAVED_QUEUE_IF_NO_PATH hold
+ * the same value then we are not between multipath_presuspend()
+ * and multipath_resume() calls and we have no need to check
+ * for the DMF_NOFLUSH_SUSPENDING flag.
+ */
+static bool __must_push_back(struct multipath *m, unsigned long flags)
+{
+ return ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &flags) !=
+ test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &flags)) &&
+ dm_noflush_suspending(m->ti));
+}
+
+/*
+ * Following functions use READ_ONCE to get atomic access to
+ * all m->flags to avoid taking spinlock
+ */
+static bool must_push_back_rq(struct multipath *m)
+{
+ unsigned long flags = READ_ONCE(m->flags);
+ return test_bit(MPATHF_QUEUE_IF_NO_PATH, &flags) || __must_push_back(m, flags);
+}
+
+static bool must_push_back_bio(struct multipath *m)
+{
+ unsigned long flags = READ_ONCE(m->flags);
+ return __must_push_back(m, flags);
+}
+
/*
* Map cloned requests (request-based multipath)
*/
pgpath = choose_pgpath(m, nr_bytes);
if (!pgpath) {
- if (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags))
+ if (must_push_back_rq(m))
return DM_MAPIO_DELAY_REQUEUE;
dm_report_EIO(m); /* Failed */
return DM_MAPIO_KILL;
}
if (!pgpath) {
- if (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags))
+ if (must_push_back_bio(m))
return DM_MAPIO_REQUEUE;
dm_report_EIO(m);
return DM_MAPIO_KILL;
assign_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags,
(save_old_value && test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) ||
(!save_old_value && queue_if_no_path));
- assign_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags,
- queue_if_no_path || dm_noflush_suspending(m->ti));
+ assign_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags, queue_if_no_path);
spin_unlock_irqrestore(&m->lock, flags);
if (!queue_if_no_path) {
fail_path(pgpath);
if (atomic_read(&m->nr_valid_paths) == 0 &&
- !test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
+ !must_push_back_rq(m)) {
if (error == BLK_STS_IOERR)
dm_report_EIO(m);
/* complete with the original error */
if (atomic_read(&m->nr_valid_paths) == 0 &&
!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
- dm_report_EIO(m);
- *error = BLK_STS_IOERR;
+ if (must_push_back_bio(m)) {
+ r = DM_ENDIO_REQUEUE;
+ } else {
+ dm_report_EIO(m);
+ *error = BLK_STS_IOERR;
+ }
goto done;
}
{
int r;
- r = dm_register_target(&multipath_target);
- if (r < 0) {
- DMERR("request-based register failed %d", r);
- r = -EINVAL;
- goto bad_register_target;
- }
-
kmultipathd = alloc_workqueue("kmpathd", WQ_MEM_RECLAIM, 0);
if (!kmultipathd) {
DMERR("failed to create workqueue kmpathd");
goto bad_alloc_kmpath_handlerd;
}
+ r = dm_register_target(&multipath_target);
+ if (r < 0) {
+ DMERR("request-based register failed %d", r);
+ r = -EINVAL;
+ goto bad_register_target;
+ }
+
return 0;
+bad_register_target:
+ destroy_workqueue(kmpath_handlerd);
bad_alloc_kmpath_handlerd:
destroy_workqueue(kmultipathd);
bad_alloc_kmultipathd:
- dm_unregister_target(&multipath_target);
-bad_register_target:
return r;
}
return r;
}
- r = dm_register_target(&snapshot_target);
- if (r < 0) {
- DMERR("snapshot target register failed %d", r);
- goto bad_register_snapshot_target;
- }
-
- r = dm_register_target(&origin_target);
- if (r < 0) {
- DMERR("Origin target register failed %d", r);
- goto bad_register_origin_target;
- }
-
- r = dm_register_target(&merge_target);
- if (r < 0) {
- DMERR("Merge target register failed %d", r);
- goto bad_register_merge_target;
- }
-
r = init_origin_hash();
if (r) {
DMERR("init_origin_hash failed.");
goto bad_pending_cache;
}
+ r = dm_register_target(&snapshot_target);
+ if (r < 0) {
+ DMERR("snapshot target register failed %d", r);
+ goto bad_register_snapshot_target;
+ }
+
+ r = dm_register_target(&origin_target);
+ if (r < 0) {
+ DMERR("Origin target register failed %d", r);
+ goto bad_register_origin_target;
+ }
+
+ r = dm_register_target(&merge_target);
+ if (r < 0) {
+ DMERR("Merge target register failed %d", r);
+ goto bad_register_merge_target;
+ }
+
return 0;
-bad_pending_cache:
- kmem_cache_destroy(exception_cache);
-bad_exception_cache:
- exit_origin_hash();
-bad_origin_hash:
- dm_unregister_target(&merge_target);
bad_register_merge_target:
dm_unregister_target(&origin_target);
bad_register_origin_target:
dm_unregister_target(&snapshot_target);
bad_register_snapshot_target:
+ kmem_cache_destroy(pending_cache);
+bad_pending_cache:
+ kmem_cache_destroy(exception_cache);
+bad_exception_cache:
+ exit_origin_hash();
+bad_origin_hash:
dm_exception_store_exit();
return r;
refcount_set(&dd->count, 1);
list_add(&dd->list, &t->devices);
+ goto out;
} else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
r = upgrade_mode(dd, mode, t->md);
if (r)
return r;
- refcount_inc(&dd->count);
}
-
+ refcount_inc(&dd->count);
+out:
*result = dd->dm_dev;
return 0;
}
static int __init dm_thin_init(void)
{
- int r;
+ int r = -ENOMEM;
pool_table_init();
+ _new_mapping_cache = KMEM_CACHE(dm_thin_new_mapping, 0);
+ if (!_new_mapping_cache)
+ return r;
+
r = dm_register_target(&thin_target);
if (r)
- return r;
+ goto bad_new_mapping_cache;
r = dm_register_target(&pool_target);
if (r)
- goto bad_pool_target;
-
- r = -ENOMEM;
-
- _new_mapping_cache = KMEM_CACHE(dm_thin_new_mapping, 0);
- if (!_new_mapping_cache)
- goto bad_new_mapping_cache;
+ goto bad_thin_target;
return 0;
-bad_new_mapping_cache:
- dm_unregister_target(&pool_target);
-bad_pool_target:
+bad_thin_target:
dm_unregister_target(&thin_target);
+bad_new_mapping_cache:
+ kmem_cache_destroy(_new_mapping_cache);
return r;
}
};
static struct lock_class_key arizona_irq_lock_class;
+static struct lock_class_key arizona_irq_request_class;
static int arizona_irq_map(struct irq_domain *h, unsigned int virq,
irq_hw_number_t hw)
struct arizona *data = h->host_data;
irq_set_chip_data(virq, data);
- irq_set_lockdep_class(virq, &arizona_irq_lock_class);
+ irq_set_lockdep_class(virq, &arizona_irq_lock_class,
+ &arizona_irq_request_class);
irq_set_chip_and_handler(virq, &arizona_irq_chip, handle_simple_irq);
irq_set_nested_thread(virq, 1);
irq_set_noprobe(virq);
u8 *ptr;
u8 *rx_buf;
u8 sum;
+ u8 rx_byte;
int ret = 0, final_ret;
len = cros_ec_prepare_tx(ec_dev, ec_msg);
if (!ret) {
/* Verify that EC can process command */
for (i = 0; i < len; i++) {
- switch (rx_buf[i]) {
- case EC_SPI_PAST_END:
- case EC_SPI_RX_BAD_DATA:
- case EC_SPI_NOT_READY:
- ret = -EAGAIN;
- ec_msg->result = EC_RES_IN_PROGRESS;
- default:
+ rx_byte = rx_buf[i];
+ if (rx_byte == EC_SPI_PAST_END ||
+ rx_byte == EC_SPI_RX_BAD_DATA ||
+ rx_byte == EC_SPI_NOT_READY) {
+ ret = -EREMOTEIO;
break;
}
- if (ret)
- break;
}
- if (!ret)
- ret = cros_ec_spi_receive_packet(ec_dev,
- ec_msg->insize + sizeof(*response));
- } else {
- dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
}
+ if (!ret)
+ ret = cros_ec_spi_receive_packet(ec_dev,
+ ec_msg->insize + sizeof(*response));
+ else
+ dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
+
final_ret = terminate_request(ec_dev);
spi_bus_unlock(ec_spi->spi->master);
int i, len;
u8 *ptr;
u8 *rx_buf;
+ u8 rx_byte;
int sum;
int ret = 0, final_ret;
if (!ret) {
/* Verify that EC can process command */
for (i = 0; i < len; i++) {
- switch (rx_buf[i]) {
- case EC_SPI_PAST_END:
- case EC_SPI_RX_BAD_DATA:
- case EC_SPI_NOT_READY:
- ret = -EAGAIN;
- ec_msg->result = EC_RES_IN_PROGRESS;
- default:
+ rx_byte = rx_buf[i];
+ if (rx_byte == EC_SPI_PAST_END ||
+ rx_byte == EC_SPI_RX_BAD_DATA ||
+ rx_byte == EC_SPI_NOT_READY) {
+ ret = -EREMOTEIO;
break;
}
- if (ret)
- break;
}
- if (!ret)
- ret = cros_ec_spi_receive_response(ec_dev,
- ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
- } else {
- dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
}
+ if (!ret)
+ ret = cros_ec_spi_receive_response(ec_dev,
+ ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
+ else
+ dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
+
final_ret = terminate_request(ec_dev);
spi_bus_unlock(ec_spi->spi->master);
sizeof(struct ec_response_get_protocol_info);
ec_dev->dout_size = sizeof(struct ec_host_request);
+ ec_spi->last_transfer_ns = ktime_get_ns();
err = cros_ec_register(ec_dev);
if (err) {
rtsx_pci_power_off(pcr, HOST_ENTER_S1);
pci_disable_device(pcidev);
+ free_irq(pcr->irq, (void *)pcr);
+ if (pcr->msi_en)
+ pci_disable_msi(pcr->pci);
}
#else /* CONFIG_PM */
EXPORT_SYMBOL_GPL(twl4030_audio_get_mclk);
static bool twl4030_audio_has_codec(struct twl4030_audio_data *pdata,
- struct device_node *node)
+ struct device_node *parent)
{
+ struct device_node *node;
+
if (pdata && pdata->codec)
return true;
- if (of_find_node_by_name(node, "codec"))
+ node = of_get_child_by_name(parent, "codec");
+ if (node) {
+ of_node_put(node);
return true;
+ }
return false;
}
};
-static bool twl6040_has_vibra(struct device_node *node)
+static bool twl6040_has_vibra(struct device_node *parent)
{
-#ifdef CONFIG_OF
- if (of_find_node_by_name(node, "vibra"))
+ struct device_node *node;
+
+ node = of_get_child_by_name(parent, "vibra");
+ if (node) {
+ of_node_put(node);
return true;
-#endif
+ }
+
return false;
}
static int at24_read(void *priv, unsigned int off, void *val, size_t count)
{
struct at24_data *at24 = priv;
- struct i2c_client *client;
+ struct device *dev = &at24->client[0]->dev;
char *buf = val;
int ret;
if (off + count > at24->chip.byte_len)
return -EINVAL;
- client = at24_translate_offset(at24, &off);
-
- ret = pm_runtime_get_sync(&client->dev);
+ ret = pm_runtime_get_sync(dev);
if (ret < 0) {
- pm_runtime_put_noidle(&client->dev);
+ pm_runtime_put_noidle(dev);
return ret;
}
status = at24->read_func(at24, buf, off, count);
if (status < 0) {
mutex_unlock(&at24->lock);
- pm_runtime_put(&client->dev);
+ pm_runtime_put(dev);
return status;
}
buf += status;
mutex_unlock(&at24->lock);
- pm_runtime_put(&client->dev);
+ pm_runtime_put(dev);
return 0;
}
static int at24_write(void *priv, unsigned int off, void *val, size_t count)
{
struct at24_data *at24 = priv;
- struct i2c_client *client;
+ struct device *dev = &at24->client[0]->dev;
char *buf = val;
int ret;
if (off + count > at24->chip.byte_len)
return -EINVAL;
- client = at24_translate_offset(at24, &off);
-
- ret = pm_runtime_get_sync(&client->dev);
+ ret = pm_runtime_get_sync(dev);
if (ret < 0) {
- pm_runtime_put_noidle(&client->dev);
+ pm_runtime_put_noidle(dev);
return ret;
}
status = at24->write_func(at24, buf, off, count);
if (status < 0) {
mutex_unlock(&at24->lock);
- pm_runtime_put(&client->dev);
+ pm_runtime_put(dev);
return status;
}
buf += status;
mutex_unlock(&at24->lock);
- pm_runtime_put(&client->dev);
+ pm_runtime_put(dev);
return 0;
}
at24->nvmem_config.reg_read = at24_read;
at24->nvmem_config.reg_write = at24_write;
at24->nvmem_config.priv = at24;
- at24->nvmem_config.stride = 4;
+ at24->nvmem_config.stride = 1;
at24->nvmem_config.word_size = 1;
at24->nvmem_config.size = chip.byte_len;
#include <linux/pci.h>
#include <linux/mutex.h>
#include <linux/miscdevice.h>
-#include <linux/pti.h>
+#include <linux/intel-pti.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
#define EXT_CSD_REV_ANY (-1u)
#define CID_MANFID_SANDISK 0x2
+#define CID_MANFID_ATP 0x9
#define CID_MANFID_TOSHIBA 0x11
#define CID_MANFID_MICRON 0x13
#define CID_MANFID_SAMSUNG 0x15
+#define CID_MANFID_APACER 0x27
#define CID_MANFID_KINGSTON 0x70
#define CID_MANFID_HYNIX 0x90
static void mmc_select_driver_type(struct mmc_card *card)
{
- int card_drv_type, drive_strength, drv_type;
+ int card_drv_type, drive_strength, drv_type = 0;
int fixed_drv_type = card->host->fixed_drv_type;
card_drv_type = card->ext_csd.raw_driver_strength |
MMC_FIXUP("MMC32G", CID_MANFID_TOSHIBA, CID_OEMID_ANY, add_quirk_mmc,
MMC_QUIRK_BLK_NO_CMD23),
+ /*
+ * Some SD cards lockup while using CMD23 multiblock transfers.
+ */
+ MMC_FIXUP("AF SD", CID_MANFID_ATP, CID_OEMID_ANY, add_quirk_sd,
+ MMC_QUIRK_BLK_NO_CMD23),
+ MMC_FIXUP("APUSD", CID_MANFID_APACER, 0x5048, add_quirk_sd,
+ MMC_QUIRK_BLK_NO_CMD23),
+
/*
* Some MMC cards need longer data read timeout than indicated in CSD.
*/
#include <linux/kernel.h>
#include <linux/clk.h>
#include <linux/slab.h>
+#include <linux/module.h>
#include <linux/of_device.h>
#include <linux/platform_device.h>
#include <linux/mmc/host.h>
return 0;
}
EXPORT_SYMBOL_GPL(renesas_sdhi_remove);
+
+MODULE_LICENSE("GPL v2");
struct s3cmci_reg {
unsigned short addr;
unsigned char *name;
-} debug_regs[] = {
+};
+
+static const struct s3cmci_reg debug_regs[] = {
DBG_REG(CON),
DBG_REG(PRE),
DBG_REG(CMDARG),
static int s3cmci_regs_show(struct seq_file *seq, void *v)
{
struct s3cmci_host *host = seq->private;
- struct s3cmci_reg *rptr = debug_regs;
+ const struct s3cmci_reg *rptr = debug_regs;
for (; rptr->name; rptr++)
seq_printf(seq, "SDI%s\t=0x%08x\n", rptr->name,
if (!ops->oobbuf)
ops->ooblen = 0;
- if (offs < 0 || offs + ops->len >= mtd->size)
+ if (offs < 0 || offs + ops->len > mtd->size)
return -EINVAL;
if (ops->ooblen) {
err = brcmstb_nand_verify_erased_page(mtd, chip, buf,
addr);
/* erased page bitflips corrected */
- if (err > 0)
+ if (err >= 0)
return err;
}
goto out_ce;
}
- gpiomtd->nwp = devm_gpiod_get(dev, "ale", GPIOD_OUT_LOW);
- if (IS_ERR(gpiomtd->nwp)) {
- ret = PTR_ERR(gpiomtd->nwp);
+ gpiomtd->ale = devm_gpiod_get(dev, "ale", GPIOD_OUT_LOW);
+ if (IS_ERR(gpiomtd->ale)) {
+ ret = PTR_ERR(gpiomtd->ale);
goto out_ce;
}
return ret;
}
- /* handle the block mark swapping */
- block_mark_swapping(this, payload_virt, auxiliary_virt);
-
/* Loop over status bytes, accumulating ECC status. */
status = auxiliary_virt + nfc_geo->auxiliary_status_offset;
max_bitflips = max_t(unsigned int, max_bitflips, *status);
}
+ /* handle the block mark swapping */
+ block_mark_swapping(this, buf, auxiliary_virt);
+
if (oob_required) {
/*
* It's time to deliver the OOB bytes. See gpmi_ecc_read_oob()
switch (command) {
case NAND_CMD_READ0:
+ case NAND_CMD_READOOB:
case NAND_CMD_PAGEPROG:
info->use_ecc = 1;
break;
data = be32_to_cpup((__be32 *)&cf->data[0]);
flexcan_write(data, &priv->tx_mb->data[0]);
}
- if (cf->can_dlc > 3) {
+ if (cf->can_dlc > 4) {
data = be32_to_cpup((__be32 *)&cf->data[4]);
flexcan_write(data, &priv->tx_mb->data[1]);
}
if (dev->can.state == CAN_STATE_ERROR_WARNING ||
dev->can.state == CAN_STATE_ERROR_PASSIVE) {
+ cf->can_id |= CAN_ERR_CRTL;
cf->data[1] = (txerr > rxerr) ?
CAN_ERR_CRTL_TX_PASSIVE : CAN_ERR_CRTL_RX_PASSIVE;
}
dev_err(netdev->dev.parent, "Couldn't set bittimings (err=%d)",
rc);
- return rc;
+ return (rc > 0) ? 0 : rc;
}
static void gs_usb_xmit_callback(struct urb *urb)
tbp = peer_tb;
}
- if (tbp[IFLA_IFNAME]) {
+ if (ifmp && tbp[IFLA_IFNAME]) {
nla_strlcpy(ifname, tbp[IFLA_IFNAME], IFNAMSIZ);
name_assign_type = NET_NAME_USER;
} else {
{
struct b53_device *dev = ds->priv;
- /* Older models support a different tag format that we do not
- * support in net/dsa/tag_brcm.c yet.
+ /* Older models (5325, 5365) support a different tag format that we do
+ * not support in net/dsa/tag_brcm.c yet. 539x and 531x5 require managed
+ * mode to be turned on which means we need to specifically manage ARL
+ * misses on multicast addresses (TBD).
*/
- if (is5325(dev) || is5365(dev) || !b53_can_enable_brcm_tags(ds, port))
+ if (is5325(dev) || is5365(dev) || is539x(dev) || is531x5(dev) ||
+ !b53_can_enable_brcm_tags(ds, port))
return DSA_TAG_PROTO_NONE;
/* Broadcom BCM58xx chips have a flow accelerator on Port 8
cmode = MV88E6XXX_PORT_STS_CMODE_2500BASEX;
break;
case PHY_INTERFACE_MODE_XGMII:
+ case PHY_INTERFACE_MODE_XAUI:
cmode = MV88E6XXX_PORT_STS_CMODE_XAUI;
break;
case PHY_INTERFACE_MODE_RXAUI:
struct sk_buff* rx_skbuff[RX_RING_SIZE];
struct sk_buff* tx_skbuff[TX_RING_SIZE];
unsigned int cur_rx, cur_tx; /* The next free ring entry */
- unsigned int dirty_rx, dirty_tx; /* The ring entries to be free()ed. */
+ unsigned int dirty_tx; /* The ring entries to be free()ed. */
struct vortex_extra_stats xstats; /* NIC-specific extra stats */
struct sk_buff *tx_skb; /* Packet being eaten by bus master ctrl. */
dma_addr_t tx_skb_dma; /* Allocated DMA address for bus master ctrl DMA. */
/* The remainder are related to chip state, mostly media selection. */
struct timer_list timer; /* Media selection timer. */
- struct timer_list rx_oom_timer; /* Rx skb allocation retry timer */
int options; /* User-settable misc. driver options. */
unsigned int media_override:4, /* Passed-in media type. */
default_media:4, /* Read from the EEPROM/Wn3_Config. */
static int mdio_read(struct net_device *dev, int phy_id, int location);
static void mdio_write(struct net_device *vp, int phy_id, int location, int value);
static void vortex_timer(struct timer_list *t);
-static void rx_oom_timer(struct timer_list *t);
static netdev_tx_t vortex_start_xmit(struct sk_buff *skb,
struct net_device *dev);
static netdev_tx_t boomerang_start_xmit(struct sk_buff *skb,
timer_setup(&vp->timer, vortex_timer, 0);
mod_timer(&vp->timer, RUN_AT(media_tbl[dev->if_port].wait));
- timer_setup(&vp->rx_oom_timer, rx_oom_timer, 0);
if (vortex_debug > 1)
pr_debug("%s: Initial media type %s.\n",
window_write16(vp, 0x0040, 4, Wn4_NetDiag);
if (vp->full_bus_master_rx) { /* Boomerang bus master. */
- vp->cur_rx = vp->dirty_rx = 0;
+ vp->cur_rx = 0;
/* Initialize the RxEarly register as recommended. */
iowrite16(SetRxThreshold + (1536>>2), ioaddr + EL3_CMD);
iowrite32(0x0020, ioaddr + PktStatus);
struct vortex_private *vp = netdev_priv(dev);
int i;
int retval;
+ dma_addr_t dma;
/* Use the now-standard shared IRQ implementation. */
if ((retval = request_irq(dev->irq, vp->full_bus_master_rx ?
break; /* Bad news! */
skb_reserve(skb, NET_IP_ALIGN); /* Align IP on 16 byte boundaries */
- vp->rx_ring[i].addr = cpu_to_le32(pci_map_single(VORTEX_PCI(vp), skb->data, PKT_BUF_SZ, PCI_DMA_FROMDEVICE));
+ dma = pci_map_single(VORTEX_PCI(vp), skb->data,
+ PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
+ if (dma_mapping_error(&VORTEX_PCI(vp)->dev, dma))
+ break;
+ vp->rx_ring[i].addr = cpu_to_le32(dma);
}
if (i != RX_RING_SIZE) {
pr_emerg("%s: no memory for rx ring\n", dev->name);
int len = (skb->len + 3) & ~3;
vp->tx_skb_dma = pci_map_single(VORTEX_PCI(vp), skb->data, len,
PCI_DMA_TODEVICE);
+ if (dma_mapping_error(&VORTEX_PCI(vp)->dev, vp->tx_skb_dma)) {
+ dev_kfree_skb_any(skb);
+ dev->stats.tx_dropped++;
+ return NETDEV_TX_OK;
+ }
+
spin_lock_irq(&vp->window_lock);
window_set(vp, 7);
iowrite32(vp->tx_skb_dma, ioaddr + Wn7_MasterAddr);
int entry = vp->cur_rx % RX_RING_SIZE;
void __iomem *ioaddr = vp->ioaddr;
int rx_status;
- int rx_work_limit = vp->dirty_rx + RX_RING_SIZE - vp->cur_rx;
+ int rx_work_limit = RX_RING_SIZE;
if (vortex_debug > 5)
pr_debug("boomerang_rx(): status %4.4x\n", ioread16(ioaddr+EL3_STATUS));
} else {
/* The packet length: up to 4.5K!. */
int pkt_len = rx_status & 0x1fff;
- struct sk_buff *skb;
+ struct sk_buff *skb, *newskb;
+ dma_addr_t newdma;
dma_addr_t dma = le32_to_cpu(vp->rx_ring[entry].addr);
if (vortex_debug > 4)
pci_dma_sync_single_for_device(VORTEX_PCI(vp), dma, PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
vp->rx_copy++;
} else {
+ /* Pre-allocate the replacement skb. If it or its
+ * mapping fails then recycle the buffer thats already
+ * in place
+ */
+ newskb = netdev_alloc_skb_ip_align(dev, PKT_BUF_SZ);
+ if (!newskb) {
+ dev->stats.rx_dropped++;
+ goto clear_complete;
+ }
+ newdma = pci_map_single(VORTEX_PCI(vp), newskb->data,
+ PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
+ if (dma_mapping_error(&VORTEX_PCI(vp)->dev, newdma)) {
+ dev->stats.rx_dropped++;
+ consume_skb(newskb);
+ goto clear_complete;
+ }
+
/* Pass up the skbuff already on the Rx ring. */
skb = vp->rx_skbuff[entry];
- vp->rx_skbuff[entry] = NULL;
+ vp->rx_skbuff[entry] = newskb;
+ vp->rx_ring[entry].addr = cpu_to_le32(newdma);
skb_put(skb, pkt_len);
pci_unmap_single(VORTEX_PCI(vp), dma, PKT_BUF_SZ, PCI_DMA_FROMDEVICE);
vp->rx_nocopy++;
netif_rx(skb);
dev->stats.rx_packets++;
}
- entry = (++vp->cur_rx) % RX_RING_SIZE;
- }
- /* Refill the Rx ring buffers. */
- for (; vp->cur_rx - vp->dirty_rx > 0; vp->dirty_rx++) {
- struct sk_buff *skb;
- entry = vp->dirty_rx % RX_RING_SIZE;
- if (vp->rx_skbuff[entry] == NULL) {
- skb = netdev_alloc_skb_ip_align(dev, PKT_BUF_SZ);
- if (skb == NULL) {
- static unsigned long last_jif;
- if (time_after(jiffies, last_jif + 10 * HZ)) {
- pr_warn("%s: memory shortage\n",
- dev->name);
- last_jif = jiffies;
- }
- if ((vp->cur_rx - vp->dirty_rx) == RX_RING_SIZE)
- mod_timer(&vp->rx_oom_timer, RUN_AT(HZ * 1));
- break; /* Bad news! */
- }
- vp->rx_ring[entry].addr = cpu_to_le32(pci_map_single(VORTEX_PCI(vp), skb->data, PKT_BUF_SZ, PCI_DMA_FROMDEVICE));
- vp->rx_skbuff[entry] = skb;
- }
+clear_complete:
vp->rx_ring[entry].status = 0; /* Clear complete bit. */
iowrite16(UpUnstall, ioaddr + EL3_CMD);
+ entry = (++vp->cur_rx) % RX_RING_SIZE;
}
return 0;
}
-/*
- * If we've hit a total OOM refilling the Rx ring we poll once a second
- * for some memory. Otherwise there is no way to restart the rx process.
- */
-static void
-rx_oom_timer(struct timer_list *t)
-{
- struct vortex_private *vp = from_timer(vp, t, rx_oom_timer);
- struct net_device *dev = vp->mii.dev;
-
- spin_lock_irq(&vp->lock);
- if ((vp->cur_rx - vp->dirty_rx) == RX_RING_SIZE) /* This test is redundant, but makes me feel good */
- boomerang_rx(dev);
- if (vortex_debug > 1) {
- pr_debug("%s: rx_oom_timer %s\n", dev->name,
- ((vp->cur_rx - vp->dirty_rx) != RX_RING_SIZE) ? "succeeded" : "retrying");
- }
- spin_unlock_irq(&vp->lock);
-}
-
static void
vortex_down(struct net_device *dev, int final_down)
{
netdev_reset_queue(dev);
netif_stop_queue(dev);
- del_timer_sync(&vp->rx_oom_timer);
del_timer_sync(&vp->timer);
/* Turn off statistics ASAP. We update dev->stats below. */
MODULE_DEVICE_TABLE(pci, ena_pci_tbl);
static int ena_rss_init_default(struct ena_adapter *adapter);
+static void check_for_admin_com_state(struct ena_adapter *adapter);
+static void ena_destroy_device(struct ena_adapter *adapter);
+static int ena_restore_device(struct ena_adapter *adapter);
static void ena_tx_timeout(struct net_device *dev)
{
static int ena_up_complete(struct ena_adapter *adapter)
{
- int rc, i;
+ int rc;
rc = ena_rss_configure(adapter);
if (rc)
ena_napi_enable_all(adapter);
- /* Enable completion queues interrupt */
- for (i = 0; i < adapter->num_queues; i++)
- ena_unmask_interrupt(&adapter->tx_ring[i],
- &adapter->rx_ring[i]);
-
- /* schedule napi in case we had pending packets
- * from the last time we disable napi
- */
- for (i = 0; i < adapter->num_queues; i++)
- napi_schedule(&adapter->ena_napi[i].napi);
-
return 0;
}
static int ena_up(struct ena_adapter *adapter)
{
- int rc;
+ int rc, i;
netdev_dbg(adapter->netdev, "%s\n", __func__);
set_bit(ENA_FLAG_DEV_UP, &adapter->flags);
+ /* Enable completion queues interrupt */
+ for (i = 0; i < adapter->num_queues; i++)
+ ena_unmask_interrupt(&adapter->tx_ring[i],
+ &adapter->rx_ring[i]);
+
+ /* schedule napi in case we had pending packets
+ * from the last time we disable napi
+ */
+ for (i = 0; i < adapter->num_queues; i++)
+ napi_schedule(&adapter->ena_napi[i].napi);
+
return rc;
err_up:
if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
ena_down(adapter);
+ /* Check for device status and issue reset if needed*/
+ check_for_admin_com_state(adapter);
+ if (unlikely(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) {
+ netif_err(adapter, ifdown, adapter->netdev,
+ "Destroy failure, restarting device\n");
+ ena_dump_stats_to_dmesg(adapter);
+ /* rtnl lock already obtained in dev_ioctl() layer */
+ ena_destroy_device(adapter);
+ ena_restore_device(adapter);
+ }
+
return 0;
}
ena_com_set_admin_running_state(ena_dev, false);
- ena_close(netdev);
+ if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
+ ena_down(adapter);
/* Before releasing the ENA resources, a device reset is required.
* (to prevent the device from accessing them).
- * In case the reset flag is set and the device is up, ena_close
+ * In case the reset flag is set and the device is up, ena_down()
* already perform the reset, so it can be skipped.
*/
if (!(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags) && dev_up))
#define AQ_CFG_PCI_FUNC_MSIX_IRQS 9U
#define AQ_CFG_PCI_FUNC_PORTS 2U
-#define AQ_CFG_SERVICE_TIMER_INTERVAL (2 * HZ)
+#define AQ_CFG_SERVICE_TIMER_INTERVAL (1 * HZ)
#define AQ_CFG_POLLING_TIMER_INTERVAL ((unsigned int)(2 * HZ))
#define AQ_CFG_SKB_FRAGS_MAX 32U
#define AQ_CFG_DRV_VERSION __stringify(NIC_MAJOR_DRIVER_VERSION)"."\
__stringify(NIC_MINOR_DRIVER_VERSION)"."\
__stringify(NIC_BUILD_DRIVER_VERSION)"."\
- __stringify(NIC_REVISION_DRIVER_VERSION)
+ __stringify(NIC_REVISION_DRIVER_VERSION) \
+ AQ_CFG_DRV_VERSION_SUFFIX
#endif /* AQ_CFG_H */
"OutUCast",
"OutMCast",
"OutBCast",
- "InUCastOctects",
- "OutUCastOctects",
- "InMCastOctects",
- "OutMCastOctects",
- "InBCastOctects",
- "OutBCastOctects",
- "InOctects",
- "OutOctects",
+ "InUCastOctets",
+ "OutUCastOctets",
+ "InMCastOctets",
+ "OutMCastOctets",
+ "InBCastOctets",
+ "OutBCastOctets",
+ "InOctets",
+ "OutOctets",
"InPacketsDma",
"OutPacketsDma",
"InOctetsDma",
unsigned int mbps;
};
+struct aq_stats_s {
+ u64 uprc;
+ u64 mprc;
+ u64 bprc;
+ u64 erpt;
+ u64 uptc;
+ u64 mptc;
+ u64 bptc;
+ u64 erpr;
+ u64 mbtc;
+ u64 bbtc;
+ u64 mbrc;
+ u64 bbrc;
+ u64 ubrc;
+ u64 ubtc;
+ u64 dpc;
+ u64 dma_pkt_rc;
+ u64 dma_pkt_tc;
+ u64 dma_oct_rc;
+ u64 dma_oct_tc;
+};
+
#define AQ_HW_IRQ_INVALID 0U
#define AQ_HW_IRQ_LEGACY 1U
#define AQ_HW_IRQ_MSI 2U
void (*destroy)(struct aq_hw_s *self);
int (*get_hw_caps)(struct aq_hw_s *self,
- struct aq_hw_caps_s *aq_hw_caps);
+ struct aq_hw_caps_s *aq_hw_caps,
+ unsigned short device,
+ unsigned short subsystem_device);
int (*hw_ring_tx_xmit)(struct aq_hw_s *self, struct aq_ring_s *aq_ring,
unsigned int frags);
int (*hw_update_stats)(struct aq_hw_s *self);
- int (*hw_get_hw_stats)(struct aq_hw_s *self, u64 *data,
- unsigned int *p_count);
+ struct aq_stats_s *(*hw_get_hw_stats)(struct aq_hw_s *self);
int (*hw_get_fw_version)(struct aq_hw_s *self, u32 *fw_version);
module_param_named(aq_itr_rx, aq_itr_rx, uint, 0644);
MODULE_PARM_DESC(aq_itr_rx, "RX interrupt throttle rate");
+static void aq_nic_update_ndev_stats(struct aq_nic_s *self);
+
static void aq_nic_rss_init(struct aq_nic_s *self, unsigned int num_rss_queues)
{
struct aq_nic_cfg_s *cfg = &self->aq_nic_cfg;
static void aq_nic_service_timer_cb(struct timer_list *t)
{
struct aq_nic_s *self = from_timer(self, t, service_timer);
- struct net_device *ndev = aq_nic_get_ndev(self);
+ int ctimer = AQ_CFG_SERVICE_TIMER_INTERVAL;
int err = 0;
- unsigned int i = 0U;
- struct aq_ring_stats_rx_s stats_rx;
- struct aq_ring_stats_tx_s stats_tx;
if (aq_utils_obj_test(&self->header.flags, AQ_NIC_FLAGS_IS_NOT_READY))
goto err_exit;
if (self->aq_hw_ops.hw_update_stats)
self->aq_hw_ops.hw_update_stats(self->aq_hw);
- memset(&stats_rx, 0U, sizeof(struct aq_ring_stats_rx_s));
- memset(&stats_tx, 0U, sizeof(struct aq_ring_stats_tx_s));
- for (i = AQ_DIMOF(self->aq_vec); i--;) {
- if (self->aq_vec[i])
- aq_vec_add_stats(self->aq_vec[i], &stats_rx, &stats_tx);
- }
+ aq_nic_update_ndev_stats(self);
- ndev->stats.rx_packets = stats_rx.packets;
- ndev->stats.rx_bytes = stats_rx.bytes;
- ndev->stats.rx_errors = stats_rx.errors;
- ndev->stats.tx_packets = stats_tx.packets;
- ndev->stats.tx_bytes = stats_tx.bytes;
- ndev->stats.tx_errors = stats_tx.errors;
+ /* If no link - use faster timer rate to detect link up asap */
+ if (!netif_carrier_ok(self->ndev))
+ ctimer = max(ctimer / 2, 1);
err_exit:
- mod_timer(&self->service_timer,
- jiffies + AQ_CFG_SERVICE_TIMER_INTERVAL);
+ mod_timer(&self->service_timer, jiffies + ctimer);
}
static void aq_nic_polling_timer_cb(struct timer_list *t)
struct aq_nic_s *aq_nic_alloc_cold(const struct net_device_ops *ndev_ops,
const struct ethtool_ops *et_ops,
- struct device *dev,
+ struct pci_dev *pdev,
struct aq_pci_func_s *aq_pci_func,
unsigned int port,
const struct aq_hw_ops *aq_hw_ops)
ndev->netdev_ops = ndev_ops;
ndev->ethtool_ops = et_ops;
- SET_NETDEV_DEV(ndev, dev);
+ SET_NETDEV_DEV(ndev, &pdev->dev);
ndev->if_port = port;
self->ndev = ndev;
self->aq_hw = self->aq_hw_ops.create(aq_pci_func, self->port,
&self->aq_hw_ops);
- err = self->aq_hw_ops.get_hw_caps(self->aq_hw, &self->aq_hw_caps);
+ err = self->aq_hw_ops.get_hw_caps(self->aq_hw, &self->aq_hw_caps,
+ pdev->device, pdev->subsystem_device);
if (err < 0)
goto err_exit;
void aq_nic_get_stats(struct aq_nic_s *self, u64 *data)
{
- struct aq_vec_s *aq_vec = NULL;
unsigned int i = 0U;
unsigned int count = 0U;
- int err = 0;
+ struct aq_vec_s *aq_vec = NULL;
+ struct aq_stats_s *stats = self->aq_hw_ops.hw_get_hw_stats(self->aq_hw);
- err = self->aq_hw_ops.hw_get_hw_stats(self->aq_hw, data, &count);
- if (err < 0)
+ if (!stats)
goto err_exit;
- data += count;
+ data[i] = stats->uprc + stats->mprc + stats->bprc;
+ data[++i] = stats->uprc;
+ data[++i] = stats->mprc;
+ data[++i] = stats->bprc;
+ data[++i] = stats->erpt;
+ data[++i] = stats->uptc + stats->mptc + stats->bptc;
+ data[++i] = stats->uptc;
+ data[++i] = stats->mptc;
+ data[++i] = stats->bptc;
+ data[++i] = stats->ubrc;
+ data[++i] = stats->ubtc;
+ data[++i] = stats->mbrc;
+ data[++i] = stats->mbtc;
+ data[++i] = stats->bbrc;
+ data[++i] = stats->bbtc;
+ data[++i] = stats->ubrc + stats->mbrc + stats->bbrc;
+ data[++i] = stats->ubtc + stats->mbtc + stats->bbtc;
+ data[++i] = stats->dma_pkt_rc;
+ data[++i] = stats->dma_pkt_tc;
+ data[++i] = stats->dma_oct_rc;
+ data[++i] = stats->dma_oct_tc;
+ data[++i] = stats->dpc;
+
+ i++;
+
+ data += i;
count = 0U;
for (i = 0U, aq_vec = self->aq_vec[0];
}
err_exit:;
- (void)err;
+}
+
+static void aq_nic_update_ndev_stats(struct aq_nic_s *self)
+{
+ struct net_device *ndev = self->ndev;
+ struct aq_stats_s *stats = self->aq_hw_ops.hw_get_hw_stats(self->aq_hw);
+
+ ndev->stats.rx_packets = stats->uprc + stats->mprc + stats->bprc;
+ ndev->stats.rx_bytes = stats->ubrc + stats->mbrc + stats->bbrc;
+ ndev->stats.rx_errors = stats->erpr;
+ ndev->stats.tx_packets = stats->uptc + stats->mptc + stats->bptc;
+ ndev->stats.tx_bytes = stats->ubtc + stats->mbtc + stats->bbtc;
+ ndev->stats.tx_errors = stats->erpt;
+ ndev->stats.multicast = stats->mprc;
}
void aq_nic_get_link_ksettings(struct aq_nic_s *self,
struct aq_nic_s *aq_nic_alloc_cold(const struct net_device_ops *ndev_ops,
const struct ethtool_ops *et_ops,
- struct device *dev,
+ struct pci_dev *pdev,
struct aq_pci_func_s *aq_pci_func,
unsigned int port,
const struct aq_hw_ops *aq_hw_ops);
pci_set_drvdata(pdev, self);
self->pdev = pdev;
- err = aq_hw_ops->get_hw_caps(NULL, &self->aq_hw_caps);
+ err = aq_hw_ops->get_hw_caps(NULL, &self->aq_hw_caps, pdev->device,
+ pdev->subsystem_device);
if (err < 0)
goto err_exit;
for (port = 0; port < self->ports; ++port) {
struct aq_nic_s *aq_nic = aq_nic_alloc_cold(ndev_ops, eth_ops,
- &pdev->dev, self,
+ pdev, self,
port, aq_hw_ops);
if (!aq_nic) {
#include "hw_atl_a0_internal.h"
static int hw_atl_a0_get_hw_caps(struct aq_hw_s *self,
- struct aq_hw_caps_s *aq_hw_caps)
+ struct aq_hw_caps_s *aq_hw_caps,
+ unsigned short device,
+ unsigned short subsystem_device)
{
memcpy(aq_hw_caps, &hw_atl_a0_hw_caps_, sizeof(*aq_hw_caps));
+
+ if (device == HW_ATL_DEVICE_ID_D108 && subsystem_device == 0x0001)
+ aq_hw_caps->link_speed_msk &= ~HW_ATL_A0_RATE_10G;
+
+ if (device == HW_ATL_DEVICE_ID_D109 && subsystem_device == 0x0001) {
+ aq_hw_caps->link_speed_msk &= ~HW_ATL_A0_RATE_10G;
+ aq_hw_caps->link_speed_msk &= ~HW_ATL_A0_RATE_5G;
+ }
+
return 0;
}
hw_atl_a0_hw_rss_set(self, &aq_nic_cfg->aq_rss);
hw_atl_a0_hw_rss_hash_set(self, &aq_nic_cfg->aq_rss);
+ /* Reset link status and read out initial hardware counters */
+ self->aq_link_status.mbps = 0;
+ hw_atl_utils_update_stats(self);
+
err = aq_hw_err_from_flags(self);
if (err < 0)
goto err_exit;
#include "hw_atl_utils.h"
#include "hw_atl_llh.h"
#include "hw_atl_b0_internal.h"
+#include "hw_atl_llh_internal.h"
static int hw_atl_b0_get_hw_caps(struct aq_hw_s *self,
- struct aq_hw_caps_s *aq_hw_caps)
+ struct aq_hw_caps_s *aq_hw_caps,
+ unsigned short device,
+ unsigned short subsystem_device)
{
memcpy(aq_hw_caps, &hw_atl_b0_hw_caps_, sizeof(*aq_hw_caps));
+
+ if (device == HW_ATL_DEVICE_ID_D108 && subsystem_device == 0x0001)
+ aq_hw_caps->link_speed_msk &= ~HW_ATL_B0_RATE_10G;
+
+ if (device == HW_ATL_DEVICE_ID_D109 && subsystem_device == 0x0001) {
+ aq_hw_caps->link_speed_msk &= ~HW_ATL_B0_RATE_10G;
+ aq_hw_caps->link_speed_msk &= ~HW_ATL_B0_RATE_5G;
+ }
+
return 0;
}
};
int err = 0;
+ u32 val;
self->aq_nic_cfg = aq_nic_cfg;
hw_atl_b0_hw_rss_set(self, &aq_nic_cfg->aq_rss);
hw_atl_b0_hw_rss_hash_set(self, &aq_nic_cfg->aq_rss);
+ /* Force limit MRRS on RDM/TDM to 2K */
+ val = aq_hw_read_reg(self, pci_reg_control6_adr);
+ aq_hw_write_reg(self, pci_reg_control6_adr, (val & ~0x707) | 0x404);
+
+ /* TX DMA total request limit. B0 hardware is not capable to
+ * handle more than (8K-MRRS) incoming DMA data.
+ * Value 24 in 256byte units
+ */
+ aq_hw_write_reg(self, tx_dma_total_req_limit_adr, 24);
+
+ /* Reset link status and read out initial hardware counters */
+ self->aq_link_status.mbps = 0;
+ hw_atl_utils_update_stats(self);
+
err = aq_hw_err_from_flags(self);
if (err < 0)
goto err_exit;
#define tx_dma_desc_base_addrmsw_adr(descriptor) \
(0x00007c04u + (descriptor) * 0x40)
+/* tx dma total request limit */
+#define tx_dma_total_req_limit_adr 0x00007b20u
+
/* tx interrupt moderation control register definitions
* Preprocessor definitions for TX Interrupt Moderation Control Register
* Base Address: 0x00008980
/* default value of bitfield reg_res_dsbl */
#define pci_reg_res_dsbl_default 0x1
+/* PCI core control register */
+#define pci_reg_control6_adr 0x1014u
+
/* global microprocessor scratch pad definitions */
#define glb_cpu_scratch_scp_adr(scratch_scp) (0x00000300u + (scratch_scp) * 0x4)
struct hw_atl_s *hw_self = PHAL_ATLANTIC;
struct hw_aq_atl_utils_mbox mbox;
- if (!self->aq_link_status.mbps)
- return 0;
-
hw_atl_utils_mpi_read_stats(self, &mbox);
#define AQ_SDELTA(_N_) (hw_self->curr_stats._N_ += \
mbox.stats._N_ - hw_self->last_stats._N_)
-
- AQ_SDELTA(uprc);
- AQ_SDELTA(mprc);
- AQ_SDELTA(bprc);
- AQ_SDELTA(erpt);
-
- AQ_SDELTA(uptc);
- AQ_SDELTA(mptc);
- AQ_SDELTA(bptc);
- AQ_SDELTA(erpr);
-
- AQ_SDELTA(ubrc);
- AQ_SDELTA(ubtc);
- AQ_SDELTA(mbrc);
- AQ_SDELTA(mbtc);
- AQ_SDELTA(bbrc);
- AQ_SDELTA(bbtc);
- AQ_SDELTA(dpc);
-
+ if (self->aq_link_status.mbps) {
+ AQ_SDELTA(uprc);
+ AQ_SDELTA(mprc);
+ AQ_SDELTA(bprc);
+ AQ_SDELTA(erpt);
+
+ AQ_SDELTA(uptc);
+ AQ_SDELTA(mptc);
+ AQ_SDELTA(bptc);
+ AQ_SDELTA(erpr);
+
+ AQ_SDELTA(ubrc);
+ AQ_SDELTA(ubtc);
+ AQ_SDELTA(mbrc);
+ AQ_SDELTA(mbtc);
+ AQ_SDELTA(bbrc);
+ AQ_SDELTA(bbtc);
+ AQ_SDELTA(dpc);
+ }
#undef AQ_SDELTA
+ hw_self->curr_stats.dma_pkt_rc = stats_rx_dma_good_pkt_counterlsw_get(self);
+ hw_self->curr_stats.dma_pkt_tc = stats_tx_dma_good_pkt_counterlsw_get(self);
+ hw_self->curr_stats.dma_oct_rc = stats_rx_dma_good_octet_counterlsw_get(self);
+ hw_self->curr_stats.dma_oct_tc = stats_tx_dma_good_octet_counterlsw_get(self);
memcpy(&hw_self->last_stats, &mbox.stats, sizeof(mbox.stats));
return 0;
}
-int hw_atl_utils_get_hw_stats(struct aq_hw_s *self,
- u64 *data, unsigned int *p_count)
+struct aq_stats_s *hw_atl_utils_get_hw_stats(struct aq_hw_s *self)
{
- struct hw_atl_s *hw_self = PHAL_ATLANTIC;
- struct hw_atl_stats_s *stats = &hw_self->curr_stats;
- int i = 0;
-
- data[i] = stats->uprc + stats->mprc + stats->bprc;
- data[++i] = stats->uprc;
- data[++i] = stats->mprc;
- data[++i] = stats->bprc;
- data[++i] = stats->erpt;
- data[++i] = stats->uptc + stats->mptc + stats->bptc;
- data[++i] = stats->uptc;
- data[++i] = stats->mptc;
- data[++i] = stats->bptc;
- data[++i] = stats->ubrc;
- data[++i] = stats->ubtc;
- data[++i] = stats->mbrc;
- data[++i] = stats->mbtc;
- data[++i] = stats->bbrc;
- data[++i] = stats->bbtc;
- data[++i] = stats->ubrc + stats->mbrc + stats->bbrc;
- data[++i] = stats->ubtc + stats->mbtc + stats->bbtc;
- data[++i] = stats_rx_dma_good_pkt_counterlsw_get(self);
- data[++i] = stats_tx_dma_good_pkt_counterlsw_get(self);
- data[++i] = stats_rx_dma_good_octet_counterlsw_get(self);
- data[++i] = stats_tx_dma_good_octet_counterlsw_get(self);
- data[++i] = stats->dpc;
-
- if (p_count)
- *p_count = ++i;
-
- return 0;
+ return &PHAL_ATLANTIC->curr_stats;
}
static const u32 hw_atl_utils_hw_mac_regs[] = {
struct __packed hw_atl_s {
struct aq_hw_s base;
struct hw_atl_stats_s last_stats;
- struct hw_atl_stats_s curr_stats;
+ struct aq_stats_s curr_stats;
u64 speed;
unsigned int chip_features;
u32 fw_ver_actual;
int hw_atl_utils_update_stats(struct aq_hw_s *self);
-int hw_atl_utils_get_hw_stats(struct aq_hw_s *self,
- u64 *data,
- unsigned int *p_count);
+struct aq_stats_s *hw_atl_utils_get_hw_stats(struct aq_hw_s *self);
#endif /* HW_ATL_UTILS_H */
#define VER_H
#define NIC_MAJOR_DRIVER_VERSION 1
-#define NIC_MINOR_DRIVER_VERSION 5
-#define NIC_BUILD_DRIVER_VERSION 345
+#define NIC_MINOR_DRIVER_VERSION 6
+#define NIC_BUILD_DRIVER_VERSION 13
#define NIC_REVISION_DRIVER_VERSION 0
+#define AQ_CFG_DRV_VERSION_SUFFIX "-kern"
+
#endif /* VER_H */
unsigned int link;
unsigned int duplex;
unsigned int speed;
+
+ unsigned int rx_missed_errors;
};
/**
#include "emac.h"
+static void arc_emac_restart(struct net_device *ndev);
+
/**
* arc_emac_tx_avail - Return the number of available slots in the tx ring.
* @priv: Pointer to ARC EMAC private data structure.
continue;
}
- pktlen = info & LEN_MASK;
- stats->rx_packets++;
- stats->rx_bytes += pktlen;
- skb = rx_buff->skb;
- skb_put(skb, pktlen);
- skb->dev = ndev;
- skb->protocol = eth_type_trans(skb, ndev);
-
- dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
- dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
-
- /* Prepare the BD for next cycle */
- rx_buff->skb = netdev_alloc_skb_ip_align(ndev,
- EMAC_BUFFER_SIZE);
- if (unlikely(!rx_buff->skb)) {
+ /* Prepare the BD for next cycle. netif_receive_skb()
+ * only if new skb was allocated and mapped to avoid holes
+ * in the RX fifo.
+ */
+ skb = netdev_alloc_skb_ip_align(ndev, EMAC_BUFFER_SIZE);
+ if (unlikely(!skb)) {
+ if (net_ratelimit())
+ netdev_err(ndev, "cannot allocate skb\n");
+ /* Return ownership to EMAC */
+ rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
stats->rx_errors++;
- /* Because receive_skb is below, increment rx_dropped */
stats->rx_dropped++;
continue;
}
- /* receive_skb only if new skb was allocated to avoid holes */
- netif_receive_skb(skb);
-
- addr = dma_map_single(&ndev->dev, (void *)rx_buff->skb->data,
+ addr = dma_map_single(&ndev->dev, (void *)skb->data,
EMAC_BUFFER_SIZE, DMA_FROM_DEVICE);
if (dma_mapping_error(&ndev->dev, addr)) {
if (net_ratelimit())
- netdev_err(ndev, "cannot dma map\n");
- dev_kfree_skb(rx_buff->skb);
+ netdev_err(ndev, "cannot map dma buffer\n");
+ dev_kfree_skb(skb);
+ /* Return ownership to EMAC */
+ rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
stats->rx_errors++;
+ stats->rx_dropped++;
continue;
}
+
+ /* unmap previosly mapped skb */
+ dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
+ dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
+
+ pktlen = info & LEN_MASK;
+ stats->rx_packets++;
+ stats->rx_bytes += pktlen;
+ skb_put(rx_buff->skb, pktlen);
+ rx_buff->skb->dev = ndev;
+ rx_buff->skb->protocol = eth_type_trans(rx_buff->skb, ndev);
+
+ netif_receive_skb(rx_buff->skb);
+
+ rx_buff->skb = skb;
dma_unmap_addr_set(rx_buff, addr, addr);
dma_unmap_len_set(rx_buff, len, EMAC_BUFFER_SIZE);
return work_done;
}
+/**
+ * arc_emac_rx_miss_handle - handle R_MISS register
+ * @ndev: Pointer to the net_device structure.
+ */
+static void arc_emac_rx_miss_handle(struct net_device *ndev)
+{
+ struct arc_emac_priv *priv = netdev_priv(ndev);
+ struct net_device_stats *stats = &ndev->stats;
+ unsigned int miss;
+
+ miss = arc_reg_get(priv, R_MISS);
+ if (miss) {
+ stats->rx_errors += miss;
+ stats->rx_missed_errors += miss;
+ priv->rx_missed_errors += miss;
+ }
+}
+
+/**
+ * arc_emac_rx_stall_check - check RX stall
+ * @ndev: Pointer to the net_device structure.
+ * @budget: How many BDs requested to process on 1 call.
+ * @work_done: How many BDs processed
+ *
+ * Under certain conditions EMAC stop reception of incoming packets and
+ * continuously increment R_MISS register instead of saving data into
+ * provided buffer. This function detect that condition and restart
+ * EMAC.
+ */
+static void arc_emac_rx_stall_check(struct net_device *ndev,
+ int budget, unsigned int work_done)
+{
+ struct arc_emac_priv *priv = netdev_priv(ndev);
+ struct arc_emac_bd *rxbd;
+
+ if (work_done)
+ priv->rx_missed_errors = 0;
+
+ if (priv->rx_missed_errors && budget) {
+ rxbd = &priv->rxbd[priv->last_rx_bd];
+ if (le32_to_cpu(rxbd->info) & FOR_EMAC) {
+ arc_emac_restart(ndev);
+ priv->rx_missed_errors = 0;
+ }
+ }
+}
+
/**
* arc_emac_poll - NAPI poll handler.
* @napi: Pointer to napi_struct structure.
unsigned int work_done;
arc_emac_tx_clean(ndev);
+ arc_emac_rx_miss_handle(ndev);
work_done = arc_emac_rx(ndev, budget);
if (work_done < budget) {
arc_reg_or(priv, R_ENABLE, RXINT_MASK | TXINT_MASK);
}
+ arc_emac_rx_stall_check(ndev, budget, work_done);
+
return work_done;
}
if (status & MSER_MASK) {
stats->rx_missed_errors += 0x100;
stats->rx_errors += 0x100;
+ priv->rx_missed_errors += 0x100;
+ napi_schedule(&priv->napi);
}
if (status & RXCR_MASK) {
}
+/**
+ * arc_emac_restart - Restart EMAC
+ * @ndev: Pointer to net_device structure.
+ *
+ * This function do hardware reset of EMAC in order to restore
+ * network packets reception.
+ */
+static void arc_emac_restart(struct net_device *ndev)
+{
+ struct arc_emac_priv *priv = netdev_priv(ndev);
+ struct net_device_stats *stats = &ndev->stats;
+ int i;
+
+ if (net_ratelimit())
+ netdev_warn(ndev, "restarting stalled EMAC\n");
+
+ netif_stop_queue(ndev);
+
+ /* Disable interrupts */
+ arc_reg_clr(priv, R_ENABLE, RXINT_MASK | TXINT_MASK | ERR_MASK);
+
+ /* Disable EMAC */
+ arc_reg_clr(priv, R_CTRL, EN_MASK);
+
+ /* Return the sk_buff to system */
+ arc_free_tx_queue(ndev);
+
+ /* Clean Tx BD's */
+ priv->txbd_curr = 0;
+ priv->txbd_dirty = 0;
+ memset(priv->txbd, 0, TX_RING_SZ);
+
+ for (i = 0; i < RX_BD_NUM; i++) {
+ struct arc_emac_bd *rxbd = &priv->rxbd[i];
+ unsigned int info = le32_to_cpu(rxbd->info);
+
+ if (!(info & FOR_EMAC)) {
+ stats->rx_errors++;
+ stats->rx_dropped++;
+ }
+ /* Return ownership to EMAC */
+ rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
+ }
+ priv->last_rx_bd = 0;
+
+ /* Make sure info is visible to EMAC before enable */
+ wmb();
+
+ /* Enable interrupts */
+ arc_reg_set(priv, R_ENABLE, RXINT_MASK | TXINT_MASK | ERR_MASK);
+
+ /* Enable EMAC */
+ arc_reg_or(priv, R_CTRL, EN_MASK);
+
+ netif_start_queue(ndev);
+}
+
static const struct net_device_ops arc_emac_netdev_ops = {
.ndo_open = arc_emac_open,
.ndo_stop = arc_emac_stop,
/* RMII interface needs always a rate of 50MHz */
err = clk_set_rate(priv->refclk, 50000000);
- if (err)
+ if (err) {
dev_err(dev,
"failed to change reference clock rate (%d)\n", err);
+ goto out_regulator_disable;
+ }
if (priv->soc_data->need_div_macclk) {
priv->macclk = devm_clk_get(dev, "macclk");
err = arc_emac_probe(ndev, interface);
if (err) {
dev_err(dev, "failed to probe arc emac (%d)\n", err);
- goto out_regulator_disable;
+ goto out_clk_disable_macclk;
}
return 0;
+
out_clk_disable_macclk:
- clk_disable_unprepare(priv->macclk);
+ if (priv->soc_data->need_div_macclk)
+ clk_disable_unprepare(priv->macclk);
out_regulator_disable:
if (priv->regulator)
regulator_disable(priv->regulator);
del_timer_sync(&bp->timer);
- if (IS_PF(bp)) {
+ if (IS_PF(bp) && !BP_NOMCP(bp)) {
/* Set ALWAYS_ALIVE bit in shmem */
bp->fw_drv_pulse_wr_seq |= DRV_PULSE_ALWAYS_ALIVE;
bnx2x_drv_pulse(bp);
bp->cnic_loaded = false;
/* Clear driver version indication in shmem */
- if (IS_PF(bp))
+ if (IS_PF(bp) && !BP_NOMCP(bp))
bnx2x_update_mng_version(bp);
/* Check if there are pending parity attentions. If there are - set
do {
bp->common.shmem_base = REG_RD(bp, MISC_REG_SHARED_MEM_ADDR);
+
+ /* If we read all 0xFFs, means we are in PCI error state and
+ * should bail out to avoid crashes on adapter's FW reads.
+ */
+ if (bp->common.shmem_base == 0xFFFFFFFF) {
+ bp->flags |= NO_MCP_FLAG;
+ return -ENODEV;
+ }
+
if (bp->common.shmem_base) {
val = SHMEM_RD(bp, validity_map[BP_PORT(bp)]);
if (val & SHR_MEM_VALIDITY_MB)
BNX2X_ERR("IO slot reset --> driver unload\n");
/* MCP should have been reset; Need to wait for validity */
- bnx2x_init_shmem(bp);
+ if (bnx2x_init_shmem(bp)) {
+ rtnl_unlock();
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
if (IS_PF(bp) && SHMEM2_HAS(bp, drv_capabilities_flag)) {
u32 v;
netdev_err(bp->dev, "vf ndo called though sriov is disabled\n");
return -EINVAL;
}
- if (vf_id >= bp->pf.max_vfs) {
+ if (vf_id >= bp->pf.active_vfs) {
netdev_err(bp->dev, "Invalid VF id %d\n", vf_id);
return -EINVAL;
}
}
/* If all IP and L4 fields are wildcarded then this is an L2 flow */
- if (is_wildcard(&l3_mask, sizeof(l3_mask)) &&
+ if (is_wildcard(l3_mask, sizeof(*l3_mask)) &&
is_wildcard(&flow->l4_mask, sizeof(flow->l4_mask))) {
flow_flags |= CFA_FLOW_ALLOC_REQ_FLAGS_FLOWTYPE_L2;
} else {
* Copyright (C) 2001, 2002, 2003, 2004 David S. Miller (davem@redhat.com)
* Copyright (C) 2001, 2002, 2003 Jeff Garzik (jgarzik@pobox.com)
* Copyright (C) 2004 Sun Microsystems Inc.
- * Copyright (C) 2005-2014 Broadcom Corporation.
+ * Copyright (C) 2005-2016 Broadcom Corporation.
+ * Copyright (C) 2016-2017 Broadcom Limited.
*
* Firmware is:
* Derived from proprietary unpublished source code,
- * Copyright (C) 2000-2003 Broadcom Corporation.
+ * Copyright (C) 2000-2016 Broadcom Corporation.
+ * Copyright (C) 2016-2017 Broadcom Ltd.
*
* Permission is hereby granted for the distribution of this firmware
* data in hexadecimal or equivalent format, provided this copyright
tw32(GRC_MODE, tp->grc_mode | val);
+ /* On one of the AMD platform, MRRS is restricted to 4000 because of
+ * south bridge limitation. As a workaround, Driver is setting MRRS
+ * to 2048 instead of default 4096.
+ */
+ if (tp->pdev->subsystem_vendor == PCI_VENDOR_ID_DELL &&
+ tp->pdev->subsystem_device == TG3PCI_SUBDEVICE_ID_DELL_5762) {
+ val = tr32(TG3PCI_DEV_STATUS_CTRL) & ~MAX_READ_REQ_MASK;
+ tw32(TG3PCI_DEV_STATUS_CTRL, val | MAX_READ_REQ_SIZE_2048);
+ }
+
/* Setup the timer prescalar register. Clock is always 66Mhz. */
val = tr32(GRC_MISC_CFG);
val &= ~0xff;
/* Reset PHY, otherwise the read DMA engine will be in a mode that
* breaks all requests to 256 bytes.
*/
- if (tg3_asic_rev(tp) == ASIC_REV_57766)
+ if (tg3_asic_rev(tp) == ASIC_REV_57766 ||
+ tg3_asic_rev(tp) == ASIC_REV_5717 ||
+ tg3_asic_rev(tp) == ASIC_REV_5719 ||
+ tg3_asic_rev(tp) == ASIC_REV_5720)
reset_phy = true;
err = tg3_restart_hw(tp, reset_phy);
* Copyright (C) 2001, 2002, 2003, 2004 David S. Miller (davem@redhat.com)
* Copyright (C) 2001 Jeff Garzik (jgarzik@pobox.com)
* Copyright (C) 2004 Sun Microsystems Inc.
- * Copyright (C) 2007-2014 Broadcom Corporation.
+ * Copyright (C) 2007-2016 Broadcom Corporation.
+ * Copyright (C) 2016-2017 Broadcom Limited.
*/
#ifndef _T3_H
#define TG3PCI_SUBDEVICE_ID_DELL_JAGUAR 0x0106
#define TG3PCI_SUBDEVICE_ID_DELL_MERLOT 0x0109
#define TG3PCI_SUBDEVICE_ID_DELL_SLIM_MERLOT 0x010a
+#define TG3PCI_SUBDEVICE_ID_DELL_5762 0x07f0
#define TG3PCI_SUBVENDOR_ID_COMPAQ PCI_VENDOR_ID_COMPAQ
#define TG3PCI_SUBDEVICE_ID_COMPAQ_BANSHEE 0x007c
#define TG3PCI_SUBDEVICE_ID_COMPAQ_BANSHEE_2 0x009a
#define TG3PCI_STD_RING_PROD_IDX 0x00000098 /* 64-bit */
#define TG3PCI_RCV_RET_RING_CON_IDX 0x000000a0 /* 64-bit */
/* 0xa8 --> 0xb8 unused */
+#define TG3PCI_DEV_STATUS_CTRL 0x000000b4
+#define MAX_READ_REQ_SIZE_2048 0x00004000
+#define MAX_READ_REQ_MASK 0x00007000
#define TG3PCI_DUAL_MAC_CTRL 0x000000b8
#define DUAL_MAC_CTRL_CH_MASK 0x00000003
#define DUAL_MAC_CTRL_ID 0x00000004
unsigned int sf_size; /* serial flash size in bytes */
unsigned int sf_nsec; /* # of flash sectors */
- unsigned int sf_fw_start; /* start of FW image in flash */
unsigned int fw_vers; /* firmware version */
unsigned int bs_vers; /* bootstrap version */
SF_RD_DATA_FAST = 0xb, /* read flash */
SF_RD_ID = 0x9f, /* read ID */
SF_ERASE_SECTOR = 0xd8, /* erase sector */
-
- FW_MAX_SIZE = 16 * SF_SEC_SIZE,
};
/**
const __be32 *p = (const __be32 *)fw_data;
const struct fw_hdr *hdr = (const struct fw_hdr *)fw_data;
unsigned int sf_sec_size = adap->params.sf_size / adap->params.sf_nsec;
- unsigned int fw_img_start = adap->params.sf_fw_start;
- unsigned int fw_start_sec = fw_img_start / sf_sec_size;
+ unsigned int fw_start_sec = FLASH_FW_START_SEC;
+ unsigned int fw_size = FLASH_FW_MAX_SIZE;
+ unsigned int fw_start = FLASH_FW_START;
if (!size) {
dev_err(adap->pdev_dev, "FW image has no data\n");
"FW image size differs from size in FW header\n");
return -EINVAL;
}
- if (size > FW_MAX_SIZE) {
+ if (size > fw_size) {
dev_err(adap->pdev_dev, "FW image too large, max is %u bytes\n",
- FW_MAX_SIZE);
+ fw_size);
return -EFBIG;
}
if (!t4_fw_matches_chip(adap, hdr))
*/
memcpy(first_page, fw_data, SF_PAGE_SIZE);
((struct fw_hdr *)first_page)->fw_ver = cpu_to_be32(0xffffffff);
- ret = t4_write_flash(adap, fw_img_start, SF_PAGE_SIZE, first_page);
+ ret = t4_write_flash(adap, fw_start, SF_PAGE_SIZE, first_page);
if (ret)
goto out;
- addr = fw_img_start;
+ addr = fw_start;
for (size -= SF_PAGE_SIZE; size; size -= SF_PAGE_SIZE) {
addr += SF_PAGE_SIZE;
fw_data += SF_PAGE_SIZE;
}
ret = t4_write_flash(adap,
- fw_img_start + offsetof(struct fw_hdr, fw_ver),
+ fw_start + offsetof(struct fw_hdr, fw_ver),
sizeof(hdr->fw_ver), (const u8 *)&hdr->fw_ver);
out:
if (ret)
for (i = 0; i < txq->bd.ring_size; i++) {
/* Initialize the BD for every fragment in the page. */
bdp->cbd_sc = cpu_to_fec16(0);
+ if (bdp->cbd_bufaddr &&
+ !IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr)))
+ dma_unmap_single(&fep->pdev->dev,
+ fec32_to_cpu(bdp->cbd_bufaddr),
+ fec16_to_cpu(bdp->cbd_datlen),
+ DMA_TO_DEVICE);
if (txq->tx_skbuff[i]) {
dev_kfree_skb_any(txq->tx_skbuff[i]);
txq->tx_skbuff[i] = NULL;
goto failed_regulator;
}
} else {
+ if (PTR_ERR(fep->reg_phy) == -EPROBE_DEFER) {
+ ret = -EPROBE_DEFER;
+ goto failed_regulator;
+ }
fep->reg_phy = NULL;
}
failed_clk:
if (of_phy_is_fixed_link(np))
of_phy_deregister_fixed_link(np);
-failed_phy:
of_node_put(phy_node);
+failed_phy:
+ dev_id--;
failed_ioremap:
free_netdev(ndev);
now = tmr_cnt_read(etsects);
now += delta;
tmr_cnt_write(etsects, now);
+ set_fipers(etsects);
spin_unlock_irqrestore(&etsects->lock, flags);
- set_fipers(etsects);
-
return 0;
}
enum e1000_state_t {
__E1000_TESTING,
__E1000_RESETTING,
- __E1000_DOWN
+ __E1000_DOWN,
+ __E1000_DISABLED
};
#undef pr_fmt
static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
struct net_device *netdev;
- struct e1000_adapter *adapter;
+ struct e1000_adapter *adapter = NULL;
struct e1000_hw *hw;
static int cards_found;
u16 tmp = 0;
u16 eeprom_apme_mask = E1000_EEPROM_APME;
int bars, need_ioport;
+ bool disable_dev = false;
/* do not allocate ioport bars when not needed */
need_ioport = e1000_is_need_ioport(pdev);
iounmap(hw->ce4100_gbe_mdio_base_virt);
iounmap(hw->hw_addr);
err_ioremap:
+ disable_dev = !test_and_set_bit(__E1000_DISABLED, &adapter->flags);
free_netdev(netdev);
err_alloc_etherdev:
pci_release_selected_regions(pdev, bars);
err_pci_reg:
- pci_disable_device(pdev);
+ if (!adapter || disable_dev)
+ pci_disable_device(pdev);
return err;
}
struct net_device *netdev = pci_get_drvdata(pdev);
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
+ bool disable_dev;
e1000_down_and_stop(adapter);
e1000_release_manageability(adapter);
iounmap(hw->flash_address);
pci_release_selected_regions(pdev, adapter->bars);
+ disable_dev = !test_and_set_bit(__E1000_DISABLED, &adapter->flags);
free_netdev(netdev);
- pci_disable_device(pdev);
+ if (disable_dev)
+ pci_disable_device(pdev);
}
/**
if (netif_running(netdev))
e1000_free_irq(adapter);
- pci_disable_device(pdev);
+ if (!test_and_set_bit(__E1000_DISABLED, &adapter->flags))
+ pci_disable_device(pdev);
return 0;
}
pr_err("Cannot enable PCI device from suspend\n");
return err;
}
+
+ /* flush memory to make sure state is correct */
+ smp_mb__before_atomic();
+ clear_bit(__E1000_DISABLED, &adapter->flags);
pci_set_master(pdev);
pci_enable_wake(pdev, PCI_D3hot, 0);
if (netif_running(netdev))
e1000_down(adapter);
- pci_disable_device(pdev);
+
+ if (!test_and_set_bit(__E1000_DISABLED, &adapter->flags))
+ pci_disable_device(pdev);
/* Request a slot slot reset. */
return PCI_ERS_RESULT_NEED_RESET;
pr_err("Cannot re-enable PCI device after reset.\n");
return PCI_ERS_RESULT_DISCONNECT;
}
+
+ /* flush memory to make sure state is correct */
+ smp_mb__before_atomic();
+ clear_bit(__E1000_DISABLED, &adapter->flags);
pci_set_master(pdev);
pci_enable_wake(pdev, PCI_D3hot, 0);
* Checks to see of the link status of the hardware has changed. If a
* change in link status has been detected, then we read the PHY registers
* to get the current speed/duplex if link exists.
+ *
+ * Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
+ * up).
**/
static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
{
* Change or Rx Sequence Error interrupt.
*/
if (!mac->get_link_status)
- return 0;
+ return 1;
/* First we want to see if the MII Status Register reports
* link. If so, then we want to get the current speed/duplex
* different link partner.
*/
ret_val = e1000e_config_fc_after_link_up(hw);
- if (ret_val)
+ if (ret_val) {
e_dbg("Error configuring flow control\n");
+ return ret_val;
+ }
- return ret_val;
+ return 1;
}
static s32 e1000_get_variants_ich8lan(struct e1000_adapter *adapter)
else
netdev_info(netdev, "set new mac address %pM\n", addr->sa_data);
+ /* Copy the address first, so that we avoid a possible race with
+ * .set_rx_mode(). If we copy after changing the address in the filter
+ * list, we might open ourselves to a narrow race window where
+ * .set_rx_mode could delete our dev_addr filter and prevent traffic
+ * from passing.
+ */
+ ether_addr_copy(netdev->dev_addr, addr->sa_data);
+
spin_lock_bh(&vsi->mac_filter_hash_lock);
i40e_del_mac_filter(vsi, netdev->dev_addr);
i40e_add_mac_filter(vsi, addr->sa_data);
spin_unlock_bh(&vsi->mac_filter_hash_lock);
- ether_addr_copy(netdev->dev_addr, addr->sa_data);
if (vsi->type == I40E_VSI_MAIN) {
i40e_status ret;
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
+ /* Under some circumstances, we might receive a request to delete
+ * our own device address from our uc list. Because we store the
+ * device address in the VSI's MAC/VLAN filter list, we need to ignore
+ * such requests and not delete our device address from this list.
+ */
+ if (ether_addr_equal(addr, netdev->dev_addr))
+ return 0;
+
i40e_del_mac_filter(vsi, addr);
return 0;
/* Set Bit 7 to be valid */
mode = I40E_AQ_SET_SWITCH_BIT7_VALID;
- /* Set L4type to both TCP and UDP support */
- mode |= I40E_AQ_SET_SWITCH_L4_TYPE_BOTH;
+ /* Set L4type for TCP support */
+ mode |= I40E_AQ_SET_SWITCH_L4_TYPE_TCP;
/* Set cloud filter mode */
mode |= I40E_AQ_SET_SWITCH_MODE_NON_TUNNEL;
is_valid_ether_addr(filter->src_mac)) ||
(is_multicast_ether_addr(filter->dst_mac) &&
is_multicast_ether_addr(filter->src_mac)))
- return -EINVAL;
+ return -EOPNOTSUPP;
- /* Make sure port is specified, otherwise bail out, for channel
- * specific cloud filter needs 'L4 port' to be non-zero
+ /* Big buffer cloud filter needs 'L4 port' to be non-zero. Also, UDP
+ * ports are not supported via big buffer now.
*/
- if (!filter->dst_port)
- return -EINVAL;
+ if (!filter->dst_port || filter->ip_proto == IPPROTO_UDP)
+ return -EOPNOTSUPP;
/* adding filter using src_port/src_ip is not supported at this stage */
if (filter->src_port || filter->src_ipv4 ||
!ipv6_addr_any(&filter->ip.v6.src_ip6))
- return -EINVAL;
+ return -EOPNOTSUPP;
/* copy element needed to add cloud filter from filter */
i40e_set_cld_element(filter, &cld_filter.element);
is_multicast_ether_addr(filter->src_mac)) {
/* MAC + IP : unsupported mode */
if (filter->dst_ipv4)
- return -EINVAL;
+ return -EOPNOTSUPP;
/* since we validated that L4 port must be valid before
* we get here, start with respective "flags" value
if (tc < 0) {
dev_err(&vsi->back->pdev->dev, "Invalid traffic class\n");
- return -EINVAL;
+ return -EOPNOTSUPP;
}
if (test_bit(__I40E_RESET_RECOVERY_PENDING, pf->state) ||
/* Walk through fragments adding latest fragment, testing it, and
* then removing stale fragments from the sum.
*/
- stale = &skb_shinfo(skb)->frags[0];
- for (;;) {
+ for (stale = &skb_shinfo(skb)->frags[0];; stale++) {
+ int stale_size = skb_frag_size(stale);
+
sum += skb_frag_size(frag++);
+ /* The stale fragment may present us with a smaller
+ * descriptor than the actual fragment size. To account
+ * for that we need to remove all the data on the front and
+ * figure out what the remainder would be in the last
+ * descriptor associated with the fragment.
+ */
+ if (stale_size > I40E_MAX_DATA_PER_TXD) {
+ int align_pad = -(stale->page_offset) &
+ (I40E_MAX_READ_REQ_SIZE - 1);
+
+ sum -= align_pad;
+ stale_size -= align_pad;
+
+ do {
+ sum -= I40E_MAX_DATA_PER_TXD_ALIGNED;
+ stale_size -= I40E_MAX_DATA_PER_TXD_ALIGNED;
+ } while (stale_size > I40E_MAX_DATA_PER_TXD);
+ }
+
/* if sum is negative we failed to make sufficient progress */
if (sum < 0)
return true;
if (!nr_frags--)
break;
- sum -= skb_frag_size(stale++);
+ sum -= stale_size;
}
return false;
/* Walk through fragments adding latest fragment, testing it, and
* then removing stale fragments from the sum.
*/
- stale = &skb_shinfo(skb)->frags[0];
- for (;;) {
+ for (stale = &skb_shinfo(skb)->frags[0];; stale++) {
+ int stale_size = skb_frag_size(stale);
+
sum += skb_frag_size(frag++);
+ /* The stale fragment may present us with a smaller
+ * descriptor than the actual fragment size. To account
+ * for that we need to remove all the data on the front and
+ * figure out what the remainder would be in the last
+ * descriptor associated with the fragment.
+ */
+ if (stale_size > I40E_MAX_DATA_PER_TXD) {
+ int align_pad = -(stale->page_offset) &
+ (I40E_MAX_READ_REQ_SIZE - 1);
+
+ sum -= align_pad;
+ stale_size -= align_pad;
+
+ do {
+ sum -= I40E_MAX_DATA_PER_TXD_ALIGNED;
+ stale_size -= I40E_MAX_DATA_PER_TXD_ALIGNED;
+ } while (stale_size > I40E_MAX_DATA_PER_TXD);
+ }
+
/* if sum is negative we failed to make sufficient progress */
if (sum < 0)
return true;
if (!nr_frags--)
break;
- sum -= skb_frag_size(stale++);
+ sum -= stale_size;
}
return false;
val &= ~MVNETA_GMAC0_PORT_ENABLE;
mvreg_write(pp, MVNETA_GMAC_CTRL_0, val);
+ pp->link = 0;
+ pp->duplex = -1;
+ pp->speed = 0;
+
udelay(200);
}
if (!mvneta_rxq_desc_is_first_last(rx_status) ||
(rx_status & MVNETA_RXD_ERR_SUMMARY)) {
+ mvneta_rx_error(pp, rx_desc);
err_drop_frame:
dev->stats.rx_errors++;
- mvneta_rx_error(pp, rx_desc);
/* leave the descriptor untouched */
continue;
}
{
int queue;
- for (queue = 0; queue < txq_number; queue++)
+ for (queue = 0; queue < rxq_number; queue++)
mvneta_rxq_deinit(pp, &pp->rxqs[queue]);
}
if (hw->ports > 1) {
skge_write32(hw, B0_IMSK, 0);
skge_read32(hw, B0_IMSK);
- free_irq(pdev->irq, hw);
}
spin_unlock_irq(&hw->hw_lock);
/* set GE2 TUNE */
regmap_write(eth->pctl, GPIO_BIAS_CTRL, 0x0);
- /* GE1, Force 1000M/FD, FC ON */
- mtk_w32(eth, MAC_MCR_FIXED_LINK, MTK_MAC_MCR(0));
-
- /* GE2, Force 1000M/FD, FC ON */
- mtk_w32(eth, MAC_MCR_FIXED_LINK, MTK_MAC_MCR(1));
+ /* Set linkdown as the default for each GMAC. Its own MCR would be set
+ * up with the more appropriate value when mtk_phy_link_adjust call is
+ * being invoked.
+ */
+ for (i = 0; i < MTK_MAC_COUNT; i++)
+ mtk_w32(eth, 0, MTK_MAC_MCR(i));
/* Indicates CDM to parse the MTK special tag from CPU
* which also is working out for untag packets.
struct net_device *dev = mdev->pndev[port];
struct mlx4_en_priv *priv = netdev_priv(dev);
struct net_device_stats *stats = &dev->stats;
- struct mlx4_cmd_mailbox *mailbox;
+ struct mlx4_cmd_mailbox *mailbox, *mailbox_priority;
u64 in_mod = reset << 8 | port;
int err;
int i, counter_index;
mailbox = mlx4_alloc_cmd_mailbox(mdev->dev);
if (IS_ERR(mailbox))
return PTR_ERR(mailbox);
+
+ mailbox_priority = mlx4_alloc_cmd_mailbox(mdev->dev);
+ if (IS_ERR(mailbox_priority)) {
+ mlx4_free_cmd_mailbox(mdev->dev, mailbox);
+ return PTR_ERR(mailbox_priority);
+ }
+
err = mlx4_cmd_box(mdev->dev, 0, mailbox->dma, in_mod, 0,
MLX4_CMD_DUMP_ETH_STATS, MLX4_CMD_TIME_CLASS_B,
MLX4_CMD_NATIVE);
mlx4_en_stats = mailbox->buf;
+ memset(&tmp_counter_stats, 0, sizeof(tmp_counter_stats));
+ counter_index = mlx4_get_default_counter_index(mdev->dev, port);
+ err = mlx4_get_counter_stats(mdev->dev, counter_index,
+ &tmp_counter_stats, reset);
+
+ /* 0xffs indicates invalid value */
+ memset(mailbox_priority->buf, 0xff,
+ sizeof(*flowstats) * MLX4_NUM_PRIORITIES);
+
+ if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_FLOWSTATS_EN) {
+ memset(mailbox_priority->buf, 0,
+ sizeof(*flowstats) * MLX4_NUM_PRIORITIES);
+ err = mlx4_cmd_box(mdev->dev, 0, mailbox_priority->dma,
+ in_mod | MLX4_DUMP_ETH_STATS_FLOW_CONTROL,
+ 0, MLX4_CMD_DUMP_ETH_STATS,
+ MLX4_CMD_TIME_CLASS_B, MLX4_CMD_NATIVE);
+ if (err)
+ goto out;
+ }
+
+ flowstats = mailbox_priority->buf;
+
spin_lock_bh(&priv->stats_lock);
mlx4_en_fold_software_stats(dev);
priv->pkstats.tx_prio[8][0] = be64_to_cpu(mlx4_en_stats->TTOT_novlan);
priv->pkstats.tx_prio[8][1] = be64_to_cpu(mlx4_en_stats->TOCT_novlan);
- spin_unlock_bh(&priv->stats_lock);
-
- memset(&tmp_counter_stats, 0, sizeof(tmp_counter_stats));
- counter_index = mlx4_get_default_counter_index(mdev->dev, port);
- err = mlx4_get_counter_stats(mdev->dev, counter_index,
- &tmp_counter_stats, reset);
-
- /* 0xffs indicates invalid value */
- memset(mailbox->buf, 0xff, sizeof(*flowstats) * MLX4_NUM_PRIORITIES);
-
- if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_FLOWSTATS_EN) {
- memset(mailbox->buf, 0,
- sizeof(*flowstats) * MLX4_NUM_PRIORITIES);
- err = mlx4_cmd_box(mdev->dev, 0, mailbox->dma,
- in_mod | MLX4_DUMP_ETH_STATS_FLOW_CONTROL,
- 0, MLX4_CMD_DUMP_ETH_STATS,
- MLX4_CMD_TIME_CLASS_B, MLX4_CMD_NATIVE);
- if (err)
- goto out;
- }
-
- flowstats = mailbox->buf;
-
- spin_lock_bh(&priv->stats_lock);
-
if (tmp_counter_stats.counter_mode == 0) {
priv->pf_stats.rx_bytes = be64_to_cpu(tmp_counter_stats.rx_bytes);
priv->pf_stats.tx_bytes = be64_to_cpu(tmp_counter_stats.tx_bytes);
out:
mlx4_free_cmd_mailbox(mdev->dev, mailbox);
+ mlx4_free_cmd_mailbox(mdev->dev, mailbox_priority);
return err;
}
if (priv->mdev->dev->caps.flags &
MLX4_DEV_CAP_FLAG_UC_LOOPBACK) {
buf[3] = mlx4_en_test_registers(priv);
- if (priv->port_up)
+ if (priv->port_up && dev->mtu >= MLX4_SELFTEST_LB_MIN_MTU)
buf[4] = mlx4_en_test_loopback(priv);
}
#define SMALL_PACKET_SIZE (256 - NET_IP_ALIGN)
#define HEADER_COPY_SIZE (128 - NET_IP_ALIGN)
#define MLX4_LOOPBACK_TEST_PAYLOAD (HEADER_COPY_SIZE - ETH_HLEN)
+#define PREAMBLE_LEN 8
+#define MLX4_SELFTEST_LB_MIN_MTU (MLX4_LOOPBACK_TEST_PAYLOAD + NET_IP_ALIGN + \
+ ETH_HLEN + PREAMBLE_LEN)
#define MLX4_EN_MIN_MTU 46
/* VLAN_HLEN is added twice,to support skb vlan tagged with multiple
MLX4_MAX_PORTS;
else
res_alloc->guaranteed[t] = 0;
- res_alloc->res_free -= res_alloc->guaranteed[t];
break;
default:
break;
case MLX5_CMD_OP_QUERY_VPORT_COUNTER:
case MLX5_CMD_OP_ALLOC_Q_COUNTER:
case MLX5_CMD_OP_QUERY_Q_COUNTER:
- case MLX5_CMD_OP_SET_RATE_LIMIT:
+ case MLX5_CMD_OP_SET_PP_RATE_LIMIT:
case MLX5_CMD_OP_QUERY_RATE_LIMIT:
case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT:
case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT:
MLX5_COMMAND_STR_CASE(ALLOC_Q_COUNTER);
MLX5_COMMAND_STR_CASE(DEALLOC_Q_COUNTER);
MLX5_COMMAND_STR_CASE(QUERY_Q_COUNTER);
- MLX5_COMMAND_STR_CASE(SET_RATE_LIMIT);
+ MLX5_COMMAND_STR_CASE(SET_PP_RATE_LIMIT);
MLX5_COMMAND_STR_CASE(QUERY_RATE_LIMIT);
MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT);
MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT);
max_t(u32, MLX5_MPWRQ_MIN_LOG_STRIDE_SZ(mdev), req)
#define MLX5_MPWRQ_DEF_LOG_STRIDE_SZ(mdev) MLX5_MPWRQ_LOG_STRIDE_SZ(mdev, 6)
#define MLX5_MPWRQ_CQE_CMPRS_LOG_STRIDE_SZ(mdev) MLX5_MPWRQ_LOG_STRIDE_SZ(mdev, 8)
+#define MLX5E_MPWQE_STRIDE_SZ(mdev, cqe_cmprs) \
+ (cqe_cmprs ? MLX5_MPWRQ_CQE_CMPRS_LOG_STRIDE_SZ(mdev) : \
+ MLX5_MPWRQ_DEF_LOG_STRIDE_SZ(mdev))
#define MLX5_MPWRQ_LOG_WQE_SZ 18
#define MLX5_MPWRQ_WQE_PAGE_ORDER (MLX5_MPWRQ_LOG_WQE_SZ - PAGE_SHIFT > 0 ? \
struct mlx5_core_dev *mdev;
struct hwtstamp_config *tstamp;
int ix;
+ int cpu;
};
struct mlx5e_channels {
u8 cq_period_mode);
void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
u8 cq_period_mode);
-void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev,
- struct mlx5e_params *params, u8 rq_type);
+void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
+ struct mlx5e_params *params,
+ u8 rq_type);
static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev)
{
static int mlx5e_dbcnl_validate_ets(struct net_device *netdev,
struct ieee_ets *ets)
{
+ bool have_ets_tc = false;
int bw_sum = 0;
int i;
}
/* Validate Bandwidth Sum */
- for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
- if (ets->tc_tsa[i] == IEEE_8021QAZ_TSA_ETS)
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+ if (ets->tc_tsa[i] == IEEE_8021QAZ_TSA_ETS) {
+ have_ets_tc = true;
bw_sum += ets->tc_tx_bw[i];
+ }
+ }
- if (bw_sum != 0 && bw_sum != 100) {
+ if (have_ets_tc && bw_sum != 100) {
netdev_err(netdev,
"Failed to validate ETS: BW sum is illegal\n");
return -EINVAL;
new_channels.params = priv->channels.params;
MLX5E_SET_PFLAG(&new_channels.params, MLX5E_PFLAG_RX_CQE_COMPRESS, new_val);
- mlx5e_set_rq_type_params(priv->mdev, &new_channels.params,
- new_channels.params.rq_wq_type);
+ new_channels.params.mpwqe_log_stride_sz =
+ MLX5E_MPWQE_STRIDE_SZ(priv->mdev, new_val);
+ new_channels.params.mpwqe_log_num_strides =
+ MLX5_MPWRQ_LOG_WQE_SZ - new_channels.params.mpwqe_log_stride_sz;
if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
priv->channels.params = new_channels.params;
return err;
mlx5e_switch_priv_channels(priv, &new_channels, NULL);
+ mlx5e_dbg(DRV, priv, "MLX5E: RxCqeCmprss was turned %s\n",
+ MLX5E_GET_PFLAG(&priv->channels.params,
+ MLX5E_PFLAG_RX_CQE_COMPRESS) ? "ON" : "OFF");
+
return 0;
}
struct mlx5e_cq_param icosq_cq;
};
-static int mlx5e_get_node(struct mlx5e_priv *priv, int ix)
-{
- return pci_irq_get_node(priv->mdev->pdev, MLX5_EQ_VEC_COMP_BASE + ix);
-}
-
static bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev)
{
return MLX5_CAP_GEN(mdev, striding_rq) &&
MLX5_CAP_ETH(mdev, reg_umr_sq);
}
-void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev,
- struct mlx5e_params *params, u8 rq_type)
+void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
+ struct mlx5e_params *params, u8 rq_type)
{
params->rq_wq_type = rq_type;
params->lro_wqe_sz = MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ;
params->log_rq_size = is_kdump_kernel() ?
MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE_MPW :
MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE_MPW;
- params->mpwqe_log_stride_sz =
- MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS) ?
- MLX5_MPWRQ_CQE_CMPRS_LOG_STRIDE_SZ(mdev) :
- MLX5_MPWRQ_DEF_LOG_STRIDE_SZ(mdev);
+ params->mpwqe_log_stride_sz = MLX5E_MPWQE_STRIDE_SZ(mdev,
+ MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS));
params->mpwqe_log_num_strides = MLX5_MPWRQ_LOG_WQE_SZ -
params->mpwqe_log_stride_sz;
break;
MLX5E_GET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS));
}
-static void mlx5e_set_rq_params(struct mlx5_core_dev *mdev, struct mlx5e_params *params)
+static void mlx5e_set_rq_params(struct mlx5_core_dev *mdev,
+ struct mlx5e_params *params)
{
u8 rq_type = mlx5e_check_fragmented_striding_rq_cap(mdev) &&
!params->xdp_prog && !MLX5_IPSEC_DEV(mdev) ?
MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ :
MLX5_WQ_TYPE_LINKED_LIST;
- mlx5e_set_rq_type_params(mdev, params, rq_type);
+ mlx5e_init_rq_type_params(mdev, params, rq_type);
}
static void mlx5e_update_carrier(struct mlx5e_priv *priv)
int wq_sz = mlx5_wq_ll_get_size(&rq->wq);
int mtt_sz = mlx5e_get_wqe_mtt_sz();
int mtt_alloc = mtt_sz + MLX5_UMR_ALIGN - 1;
- int node = mlx5e_get_node(c->priv, c->ix);
int i;
rq->mpwqe.info = kzalloc_node(wq_sz * sizeof(*rq->mpwqe.info),
- GFP_KERNEL, node);
+ GFP_KERNEL, cpu_to_node(c->cpu));
if (!rq->mpwqe.info)
goto err_out;
/* We allocate more than mtt_sz as we will align the pointer */
- rq->mpwqe.mtt_no_align = kzalloc_node(mtt_alloc * wq_sz,
- GFP_KERNEL, node);
+ rq->mpwqe.mtt_no_align = kzalloc_node(mtt_alloc * wq_sz, GFP_KERNEL,
+ cpu_to_node(c->cpu));
if (unlikely(!rq->mpwqe.mtt_no_align))
goto err_free_wqe_info;
int err;
int i;
- rqp->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ rqp->wq.db_numa_node = cpu_to_node(c->cpu);
err = mlx5_wq_ll_create(mdev, &rqp->wq, rqc_wq, &rq->wq,
&rq->wq_ctrl);
default: /* MLX5_WQ_TYPE_LINKED_LIST */
rq->wqe.frag_info =
kzalloc_node(wq_sz * sizeof(*rq->wqe.frag_info),
- GFP_KERNEL,
- mlx5e_get_node(c->priv, c->ix));
+ GFP_KERNEL, cpu_to_node(c->cpu));
if (!rq->wqe.frag_info) {
err = -ENOMEM;
goto err_rq_wq_destroy;
sq->uar_map = mdev->mlx5e_res.bfreg.map;
sq->min_inline_mode = params->tx_min_inline_mode;
- param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ param->wq.db_numa_node = cpu_to_node(c->cpu);
err = mlx5_wq_cyc_create(mdev, ¶m->wq, sqc_wq, &sq->wq, &sq->wq_ctrl);
if (err)
return err;
sq->wq.db = &sq->wq.db[MLX5_SND_DBR];
- err = mlx5e_alloc_xdpsq_db(sq, mlx5e_get_node(c->priv, c->ix));
+ err = mlx5e_alloc_xdpsq_db(sq, cpu_to_node(c->cpu));
if (err)
goto err_sq_wq_destroy;
sq->channel = c;
sq->uar_map = mdev->mlx5e_res.bfreg.map;
- param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ param->wq.db_numa_node = cpu_to_node(c->cpu);
err = mlx5_wq_cyc_create(mdev, ¶m->wq, sqc_wq, &sq->wq, &sq->wq_ctrl);
if (err)
return err;
sq->wq.db = &sq->wq.db[MLX5_SND_DBR];
- err = mlx5e_alloc_icosq_db(sq, mlx5e_get_node(c->priv, c->ix));
+ err = mlx5e_alloc_icosq_db(sq, cpu_to_node(c->cpu));
if (err)
goto err_sq_wq_destroy;
if (MLX5_IPSEC_DEV(c->priv->mdev))
set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state);
- param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ param->wq.db_numa_node = cpu_to_node(c->cpu);
err = mlx5_wq_cyc_create(mdev, ¶m->wq, sqc_wq, &sq->wq, &sq->wq_ctrl);
if (err)
return err;
sq->wq.db = &sq->wq.db[MLX5_SND_DBR];
- err = mlx5e_alloc_txqsq_db(sq, mlx5e_get_node(c->priv, c->ix));
+ err = mlx5e_alloc_txqsq_db(sq, cpu_to_node(c->cpu));
if (err)
goto err_sq_wq_destroy;
struct mlx5_core_dev *mdev = c->priv->mdev;
int err;
- param->wq.buf_numa_node = mlx5e_get_node(c->priv, c->ix);
- param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix);
+ param->wq.buf_numa_node = cpu_to_node(c->cpu);
+ param->wq.db_numa_node = cpu_to_node(c->cpu);
param->eq_ix = c->ix;
err = mlx5e_alloc_cq_common(mdev, param, cq);
mlx5e_free_cq(cq);
}
+static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix)
+{
+ return cpumask_first(priv->mdev->priv.irq_info[ix].mask);
+}
+
static int mlx5e_open_tx_cqs(struct mlx5e_channel *c,
struct mlx5e_params *params,
struct mlx5e_channel_param *cparam)
{
struct mlx5e_cq_moder icocq_moder = {0, 0};
struct net_device *netdev = priv->netdev;
+ int cpu = mlx5e_get_cpu(priv, ix);
struct mlx5e_channel *c;
unsigned int irq;
int err;
int eqn;
- c = kzalloc_node(sizeof(*c), GFP_KERNEL, mlx5e_get_node(priv, ix));
+ c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));
if (!c)
return -ENOMEM;
c->mdev = priv->mdev;
c->tstamp = &priv->tstamp;
c->ix = ix;
+ c->cpu = cpu;
c->pdev = &priv->mdev->pdev->dev;
c->netdev = priv->netdev;
c->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.mkey.key);
for (tc = 0; tc < c->num_tc; tc++)
mlx5e_activate_txqsq(&c->sq[tc]);
mlx5e_activate_rq(&c->rq);
- netif_set_xps_queue(c->netdev,
- mlx5_get_vector_affinity(c->priv->mdev, c->ix), c->ix);
+ netif_set_xps_queue(c->netdev, get_cpu_mask(c->cpu), c->ix);
}
static void mlx5e_deactivate_channel(struct mlx5e_channel *c)
struct sk_buff *skb,
netdev_features_t features)
{
+ unsigned int offset = 0;
struct udphdr *udph;
u8 proto;
u16 port;
proto = ip_hdr(skb)->protocol;
break;
case htons(ETH_P_IPV6):
- proto = ipv6_hdr(skb)->nexthdr;
+ proto = ipv6_find_hdr(skb, &offset, -1, NULL, NULL);
break;
default:
goto out;
break;
case MLX5_EVENT_TYPE_CQ_ERROR:
cqn = be32_to_cpu(eqe->data.cq_err.cqn) & 0xffffff;
- mlx5_core_warn(dev, "CQ error on CQN 0x%x, syndrom 0x%x\n",
+ mlx5_core_warn(dev, "CQ error on CQN 0x%x, syndrome 0x%x\n",
cqn, eqe->data.cq_err.syndrome);
mlx5_cq_event(dev, cqn, eqe->type);
break;
return err;
}
-int mlx5_stop_eqs(struct mlx5_core_dev *dev)
+void mlx5_stop_eqs(struct mlx5_core_dev *dev)
{
struct mlx5_eq_table *table = &dev->priv.eq_table;
int err;
if (MLX5_CAP_GEN(dev, pg)) {
err = mlx5_destroy_unmap_eq(dev, &table->pfault_eq);
if (err)
- return err;
+ mlx5_core_err(dev, "failed to destroy page fault eq, err(%d)\n",
+ err);
}
#endif
err = mlx5_destroy_unmap_eq(dev, &table->pages_eq);
if (err)
- return err;
+ mlx5_core_err(dev, "failed to destroy pages eq, err(%d)\n",
+ err);
- mlx5_destroy_unmap_eq(dev, &table->async_eq);
+ err = mlx5_destroy_unmap_eq(dev, &table->async_eq);
+ if (err)
+ mlx5_core_err(dev, "failed to destroy async eq, err(%d)\n",
+ err);
mlx5_cmd_use_polling(dev);
err = mlx5_destroy_unmap_eq(dev, &table->cmd_eq);
if (err)
- mlx5_cmd_use_events(dev);
-
- return err;
+ mlx5_core_err(dev, "failed to destroy command eq, err(%d)\n",
+ err);
}
int mlx5_core_eq_query(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
u8 actual_size;
int err;
+ if (!size)
+ return -EINVAL;
+
if (!fdev->mdev)
return -ENOTCONN;
u8 actual_size;
int err;
+ if (!size)
+ return -EINVAL;
+
if (!fdev->mdev)
return -ENOTCONN;
static void del_sw_flow_table(struct fs_node *node);
static void del_sw_flow_group(struct fs_node *node);
static void del_sw_fte(struct fs_node *node);
+static void del_sw_prio(struct fs_node *node);
+static void del_sw_ns(struct fs_node *node);
/* Delete rule (destination) is special case that
* requires to lock the FTE for all the deletion process.
*/
return NULL;
}
+static void del_sw_ns(struct fs_node *node)
+{
+ kfree(node);
+}
+
+static void del_sw_prio(struct fs_node *node)
+{
+ kfree(node);
+}
+
static void del_hw_flow_table(struct fs_node *node)
{
struct mlx5_flow_table *ft;
return ERR_PTR(-ENOMEM);
fs_prio->node.type = FS_TYPE_PRIO;
- tree_init_node(&fs_prio->node, NULL, NULL);
+ tree_init_node(&fs_prio->node, NULL, del_sw_prio);
tree_add_node(&fs_prio->node, &ns->node);
fs_prio->num_levels = num_levels;
fs_prio->prio = prio;
return ERR_PTR(-ENOMEM);
fs_init_namespace(ns);
- tree_init_node(&ns->node, NULL, NULL);
+ tree_init_node(&ns->node, NULL, del_sw_ns);
tree_add_node(&ns->node, &prio->node);
list_add_tail(&ns->node.list, &prio->node.children);
u32 fw;
int i;
- /* If the syndrom is 0, the device is OK and no need to print buffer */
+ /* If the syndrome is 0, the device is OK and no need to print buffer */
if (!ioread8(&h->synd))
return;
struct mlx5e_params *params)
{
/* Override RQ params as IPoIB supports only LINKED LIST RQ for now */
- mlx5e_set_rq_type_params(mdev, params, MLX5_WQ_TYPE_LINKED_LIST);
+ mlx5e_init_rq_type_params(mdev, params, MLX5_WQ_TYPE_LINKED_LIST);
/* RQ size in ipoib by default is 512 */
params->log_rq_size = is_kdump_kernel() ?
}
EXPORT_SYMBOL(mlx5_cmd_destroy_vport_lag);
+static int mlx5_cmd_query_cong_counter(struct mlx5_core_dev *dev,
+ bool reset, void *out, int out_size)
+{
+ u32 in[MLX5_ST_SZ_DW(query_cong_statistics_in)] = { };
+
+ MLX5_SET(query_cong_statistics_in, in, opcode,
+ MLX5_CMD_OP_QUERY_CONG_STATISTICS);
+ MLX5_SET(query_cong_statistics_in, in, clear, reset);
+ return mlx5_cmd_exec(dev, in, sizeof(in), out, out_size);
+}
+
static struct mlx5_lag *mlx5_lag_dev_get(struct mlx5_core_dev *dev)
{
return dev->priv.lag;
/* If bonded, we do not add an IB device for PF1. */
return false;
}
+
+int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
+ u64 *values,
+ int num_counters,
+ size_t *offsets)
+{
+ int outlen = MLX5_ST_SZ_BYTES(query_cong_statistics_out);
+ struct mlx5_core_dev *mdev[MLX5_MAX_PORTS];
+ struct mlx5_lag *ldev;
+ int num_ports;
+ int ret, i, j;
+ void *out;
+
+ out = kvzalloc(outlen, GFP_KERNEL);
+ if (!out)
+ return -ENOMEM;
+
+ memset(values, 0, sizeof(*values) * num_counters);
+
+ mutex_lock(&lag_mutex);
+ ldev = mlx5_lag_dev_get(dev);
+ if (ldev && mlx5_lag_is_bonded(ldev)) {
+ num_ports = MLX5_MAX_PORTS;
+ mdev[0] = ldev->pf[0].dev;
+ mdev[1] = ldev->pf[1].dev;
+ } else {
+ num_ports = 1;
+ mdev[0] = dev;
+ }
+
+ for (i = 0; i < num_ports; ++i) {
+ ret = mlx5_cmd_query_cong_counter(mdev[i], false, out, outlen);
+ if (ret)
+ goto unlock;
+
+ for (j = 0; j < num_counters; ++j)
+ values[j] += be64_to_cpup((__be64 *)(out + offsets[j]));
+ }
+
+unlock:
+ mutex_unlock(&lag_mutex);
+ kvfree(out);
+ return ret;
+}
+EXPORT_SYMBOL(mlx5_lag_query_cong_counters);
{
struct mlx5_priv *priv = &dev->priv;
struct mlx5_eq_table *table = &priv->eq_table;
- struct irq_affinity irqdesc = {
- .pre_vectors = MLX5_EQ_VEC_COMP_BASE,
- };
int num_eqs = 1 << MLX5_CAP_GEN(dev, log_max_eq);
int nvec;
if (!priv->irq_info)
goto err_free_msix;
- nvec = pci_alloc_irq_vectors_affinity(dev->pdev,
+ nvec = pci_alloc_irq_vectors(dev->pdev,
MLX5_EQ_VEC_COMP_BASE + 1, nvec,
- PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
- &irqdesc);
+ PCI_IRQ_MSIX);
if (nvec < 0)
return nvec;
return (u64)timer_l | (u64)timer_h1 << 32;
}
+static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i)
+{
+ struct mlx5_priv *priv = &mdev->priv;
+ int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
+
+ if (!zalloc_cpumask_var(&priv->irq_info[i].mask, GFP_KERNEL)) {
+ mlx5_core_warn(mdev, "zalloc_cpumask_var failed");
+ return -ENOMEM;
+ }
+
+ cpumask_set_cpu(cpumask_local_spread(i, priv->numa_node),
+ priv->irq_info[i].mask);
+
+ if (IS_ENABLED(CONFIG_SMP) &&
+ irq_set_affinity_hint(irq, priv->irq_info[i].mask))
+ mlx5_core_warn(mdev, "irq_set_affinity_hint failed, irq 0x%.4x", irq);
+
+ return 0;
+}
+
+static void mlx5_irq_clear_affinity_hint(struct mlx5_core_dev *mdev, int i)
+{
+ struct mlx5_priv *priv = &mdev->priv;
+ int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);
+
+ irq_set_affinity_hint(irq, NULL);
+ free_cpumask_var(priv->irq_info[i].mask);
+}
+
+static int mlx5_irq_set_affinity_hints(struct mlx5_core_dev *mdev)
+{
+ int err;
+ int i;
+
+ for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++) {
+ err = mlx5_irq_set_affinity_hint(mdev, i);
+ if (err)
+ goto err_out;
+ }
+
+ return 0;
+
+err_out:
+ for (i--; i >= 0; i--)
+ mlx5_irq_clear_affinity_hint(mdev, i);
+
+ return err;
+}
+
+static void mlx5_irq_clear_affinity_hints(struct mlx5_core_dev *mdev)
+{
+ int i;
+
+ for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++)
+ mlx5_irq_clear_affinity_hint(mdev, i);
+}
+
int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn,
unsigned int *irqn)
{
goto err_stop_eqs;
}
+ err = mlx5_irq_set_affinity_hints(dev);
+ if (err) {
+ dev_err(&pdev->dev, "Failed to alloc affinity hint cpumask\n");
+ goto err_affinity_hints;
+ }
+
err = mlx5_init_fs(dev);
if (err) {
dev_err(&pdev->dev, "Failed to init flow steering\n");
mlx5_cleanup_fs(dev);
err_fs:
+ mlx5_irq_clear_affinity_hints(dev);
+
+err_affinity_hints:
free_comp_eqs(dev);
err_stop_eqs:
mlx5_sriov_detach(dev);
mlx5_cleanup_fs(dev);
+ mlx5_irq_clear_affinity_hints(dev);
free_comp_eqs(dev);
mlx5_stop_eqs(dev);
mlx5_put_uars_page(dev, priv->uar);
err_cmd:
memset(din, 0, sizeof(din));
memset(dout, 0, sizeof(dout));
- MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
- MLX5_SET(destroy_qp_in, in, qpn, qp->qpn);
+ MLX5_SET(destroy_qp_in, din, opcode, MLX5_CMD_OP_DESTROY_QP);
+ MLX5_SET(destroy_qp_in, din, qpn, qp->qpn);
mlx5_cmd_exec(dev, din, sizeof(din), dout, sizeof(dout));
return err;
}
return ret_entry;
}
-static int mlx5_set_rate_limit_cmd(struct mlx5_core_dev *dev,
+static int mlx5_set_pp_rate_limit_cmd(struct mlx5_core_dev *dev,
u32 rate, u16 index)
{
- u32 in[MLX5_ST_SZ_DW(set_rate_limit_in)] = {0};
- u32 out[MLX5_ST_SZ_DW(set_rate_limit_out)] = {0};
+ u32 in[MLX5_ST_SZ_DW(set_pp_rate_limit_in)] = {0};
+ u32 out[MLX5_ST_SZ_DW(set_pp_rate_limit_out)] = {0};
- MLX5_SET(set_rate_limit_in, in, opcode,
- MLX5_CMD_OP_SET_RATE_LIMIT);
- MLX5_SET(set_rate_limit_in, in, rate_limit_index, index);
- MLX5_SET(set_rate_limit_in, in, rate_limit, rate);
+ MLX5_SET(set_pp_rate_limit_in, in, opcode,
+ MLX5_CMD_OP_SET_PP_RATE_LIMIT);
+ MLX5_SET(set_pp_rate_limit_in, in, rate_limit_index, index);
+ MLX5_SET(set_pp_rate_limit_in, in, rate_limit, rate);
return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
}
entry->refcount++;
} else {
/* new rate limit */
- err = mlx5_set_rate_limit_cmd(dev, rate, entry->index);
+ err = mlx5_set_pp_rate_limit_cmd(dev, rate, entry->index);
if (err) {
mlx5_core_err(dev, "Failed configuring rate: %u (%d)\n",
rate, err);
entry->refcount--;
if (!entry->refcount) {
/* need to remove rate */
- mlx5_set_rate_limit_cmd(dev, 0, entry->index);
+ mlx5_set_pp_rate_limit_cmd(dev, 0, entry->index);
entry->rate = 0;
}
/* Clear all configured rates */
for (i = 0; i < table->max_size; i++)
if (table->rl_entry[i].rate)
- mlx5_set_rate_limit_cmd(dev, 0,
- table->rl_entry[i].index);
+ mlx5_set_pp_rate_limit_cmd(dev, 0,
+ table->rl_entry[i].index);
kfree(dev->priv.rl_table.rl_entry);
}
struct mlx5e_vxlan_db *vxlan_db = &priv->vxlan;
struct mlx5e_vxlan *vxlan;
- spin_lock(&vxlan_db->lock);
+ spin_lock_bh(&vxlan_db->lock);
vxlan = radix_tree_lookup(&vxlan_db->tree, port);
- spin_unlock(&vxlan_db->lock);
+ spin_unlock_bh(&vxlan_db->lock);
return vxlan;
}
struct mlx5e_vxlan *vxlan;
int err;
- if (mlx5e_vxlan_lookup_port(priv, port))
+ mutex_lock(&priv->state_lock);
+ vxlan = mlx5e_vxlan_lookup_port(priv, port);
+ if (vxlan) {
+ atomic_inc(&vxlan->refcount);
goto free_work;
+ }
if (mlx5e_vxlan_core_add_port_cmd(priv->mdev, port))
goto free_work;
goto err_delete_port;
vxlan->udp_port = port;
+ atomic_set(&vxlan->refcount, 1);
- spin_lock_irq(&vxlan_db->lock);
+ spin_lock_bh(&vxlan_db->lock);
err = radix_tree_insert(&vxlan_db->tree, vxlan->udp_port, vxlan);
- spin_unlock_irq(&vxlan_db->lock);
+ spin_unlock_bh(&vxlan_db->lock);
if (err)
goto err_free;
err_delete_port:
mlx5e_vxlan_core_del_port_cmd(priv->mdev, port);
free_work:
+ mutex_unlock(&priv->state_lock);
kfree(vxlan_work);
}
-static void __mlx5e_vxlan_core_del_port(struct mlx5e_priv *priv, u16 port)
+static void mlx5e_vxlan_del_port(struct work_struct *work)
{
+ struct mlx5e_vxlan_work *vxlan_work =
+ container_of(work, struct mlx5e_vxlan_work, work);
+ struct mlx5e_priv *priv = vxlan_work->priv;
struct mlx5e_vxlan_db *vxlan_db = &priv->vxlan;
+ u16 port = vxlan_work->port;
struct mlx5e_vxlan *vxlan;
+ bool remove = false;
- spin_lock_irq(&vxlan_db->lock);
- vxlan = radix_tree_delete(&vxlan_db->tree, port);
- spin_unlock_irq(&vxlan_db->lock);
-
+ mutex_lock(&priv->state_lock);
+ spin_lock_bh(&vxlan_db->lock);
+ vxlan = radix_tree_lookup(&vxlan_db->tree, port);
if (!vxlan)
- return;
-
- mlx5e_vxlan_core_del_port_cmd(priv->mdev, vxlan->udp_port);
-
- kfree(vxlan);
-}
+ goto out_unlock;
-static void mlx5e_vxlan_del_port(struct work_struct *work)
-{
- struct mlx5e_vxlan_work *vxlan_work =
- container_of(work, struct mlx5e_vxlan_work, work);
- struct mlx5e_priv *priv = vxlan_work->priv;
- u16 port = vxlan_work->port;
+ if (atomic_dec_and_test(&vxlan->refcount)) {
+ radix_tree_delete(&vxlan_db->tree, port);
+ remove = true;
+ }
- __mlx5e_vxlan_core_del_port(priv, port);
+out_unlock:
+ spin_unlock_bh(&vxlan_db->lock);
+ if (remove) {
+ mlx5e_vxlan_core_del_port_cmd(priv->mdev, port);
+ kfree(vxlan);
+ }
+ mutex_unlock(&priv->state_lock);
kfree(vxlan_work);
}
struct mlx5e_vxlan *vxlan;
unsigned int port = 0;
- spin_lock_irq(&vxlan_db->lock);
+ /* Lockless since we are the only radix-tree consumers, wq is disabled */
while (radix_tree_gang_lookup(&vxlan_db->tree, (void **)&vxlan, port, 1)) {
port = vxlan->udp_port;
- spin_unlock_irq(&vxlan_db->lock);
- __mlx5e_vxlan_core_del_port(priv, (u16)port);
- spin_lock_irq(&vxlan_db->lock);
+ radix_tree_delete(&vxlan_db->tree, port);
+ mlx5e_vxlan_core_del_port_cmd(priv->mdev, port);
+ kfree(vxlan);
}
- spin_unlock_irq(&vxlan_db->lock);
}
#include "en.h"
struct mlx5e_vxlan {
+ atomic_t refcount;
u16 udp_port;
};
return 0;
}
- wmb(); /* reset needs to be written before we read control register */
+ /* Reset needs to be written before we read control register, and
+ * we must wait for the HW to become responsive once again
+ */
+ wmb();
+ msleep(MLXSW_PCI_SW_RESET_WAIT_MSECS);
+
end = jiffies + msecs_to_jiffies(MLXSW_PCI_SW_RESET_TIMEOUT_MSECS);
do {
u32 val = mlxsw_pci_read32(mlxsw_pci, FW_READY);
#define MLXSW_PCI_SW_RESET 0xF0010
#define MLXSW_PCI_SW_RESET_RST_BIT BIT(0)
#define MLXSW_PCI_SW_RESET_TIMEOUT_MSECS 5000
+#define MLXSW_PCI_SW_RESET_WAIT_MSECS 100
#define MLXSW_PCI_FW_READY 0xA1844
#define MLXSW_PCI_FW_READY_MASK 0xFFFF
#define MLXSW_PCI_FW_READY_MAGIC 0x5E
static int mlxsw_sp_port_ovs_join(struct mlxsw_sp_port *mlxsw_sp_port)
{
+ u16 vid = 1;
int err;
err = mlxsw_sp_port_vp_mode_set(mlxsw_sp_port, true);
true, false);
if (err)
goto err_port_vlan_set;
+
+ for (; vid <= VLAN_N_VID - 1; vid++) {
+ err = mlxsw_sp_port_vid_learning_set(mlxsw_sp_port,
+ vid, false);
+ if (err)
+ goto err_vid_learning_set;
+ }
+
return 0;
+err_vid_learning_set:
+ for (vid--; vid >= 1; vid--)
+ mlxsw_sp_port_vid_learning_set(mlxsw_sp_port, vid, true);
err_port_vlan_set:
mlxsw_sp_port_stp_set(mlxsw_sp_port, false);
err_port_stp_set:
static void mlxsw_sp_port_ovs_leave(struct mlxsw_sp_port *mlxsw_sp_port)
{
+ u16 vid;
+
+ for (vid = VLAN_N_VID - 1; vid >= 1; vid--)
+ mlxsw_sp_port_vid_learning_set(mlxsw_sp_port,
+ vid, true);
+
mlxsw_sp_port_vlan_set(mlxsw_sp_port, 2, VLAN_N_VID - 1,
false, false);
mlxsw_sp_port_stp_set(mlxsw_sp_port, false);
}
if (!info->linking)
break;
- if (netdev_has_any_upper_dev(upper_dev)) {
+ if (netdev_has_any_upper_dev(upper_dev) &&
+ (!netif_is_bridge_master(upper_dev) ||
+ !mlxsw_sp_bridge_device_is_offloaded(mlxsw_sp,
+ upper_dev))) {
NL_SET_ERR_MSG(extack,
"spectrum: Enslaving a port to a device that already has an upper device is not supported");
return -EINVAL;
u16 vid)
{
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+ struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
struct netdev_notifier_changeupper_info *info = ptr;
struct netlink_ext_ack *extack;
struct net_device *upper_dev;
}
if (!info->linking)
break;
- if (netdev_has_any_upper_dev(upper_dev)) {
+ if (netdev_has_any_upper_dev(upper_dev) &&
+ (!netif_is_bridge_master(upper_dev) ||
+ !mlxsw_sp_bridge_device_is_offloaded(mlxsw_sp,
+ upper_dev))) {
NL_SET_ERR_MSG(extack, "spectrum: Enslaving a port to a device that already has an upper device is not supported");
return -EINVAL;
}
void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
struct net_device *brport_dev,
struct net_device *br_dev);
+bool mlxsw_sp_bridge_device_is_offloaded(const struct mlxsw_sp *mlxsw_sp,
+ const struct net_device *br_dev);
/* spectrum.c */
int mlxsw_sp_port_ets_set(struct mlxsw_sp_port *mlxsw_sp_port,
int tclass_num, u32 min, u32 max,
u32 probability, bool is_ecn)
{
- char cwtp_cmd[max_t(u8, MLXSW_REG_CWTP_LEN, MLXSW_REG_CWTPM_LEN)];
+ char cwtpm_cmd[MLXSW_REG_CWTPM_LEN];
+ char cwtp_cmd[MLXSW_REG_CWTP_LEN];
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
int err;
if (err)
return err;
- mlxsw_reg_cwtpm_pack(cwtp_cmd, mlxsw_sp_port->local_port, tclass_num,
+ mlxsw_reg_cwtpm_pack(cwtpm_cmd, mlxsw_sp_port->local_port, tclass_num,
MLXSW_REG_CWTP_DEFAULT_PROFILE, true, is_ecn);
- return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(cwtpm), cwtp_cmd);
+ return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(cwtpm), cwtpm_cmd);
}
static int
rhashtable_destroy(&mlxsw_sp->router->neigh_ht);
}
-static int mlxsw_sp_neigh_rif_flush(struct mlxsw_sp *mlxsw_sp,
- const struct mlxsw_sp_rif *rif)
-{
- char rauht_pl[MLXSW_REG_RAUHT_LEN];
-
- mlxsw_reg_rauht_pack(rauht_pl, MLXSW_REG_RAUHT_OP_WRITE_DELETE_ALL,
- rif->rif_index, rif->addr);
- return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rauht), rauht_pl);
-}
-
static void mlxsw_sp_neigh_rif_gone_sync(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_rif *rif)
{
struct mlxsw_sp_neigh_entry *neigh_entry, *tmp;
- mlxsw_sp_neigh_rif_flush(mlxsw_sp, rif);
list_for_each_entry_safe(neigh_entry, tmp, &rif->neigh_list,
- rif_list_node)
+ rif_list_node) {
+ mlxsw_sp_neigh_entry_update(mlxsw_sp, neigh_entry, false);
mlxsw_sp_neigh_entry_destroy(mlxsw_sp, neigh_entry);
+ }
}
enum mlxsw_sp_nexthop_type {
{
if (!removing)
nh->should_offload = 1;
- else if (nh->offloaded)
+ else
nh->should_offload = 0;
nh->update = 1;
}
return NULL;
}
+bool mlxsw_sp_bridge_device_is_offloaded(const struct mlxsw_sp *mlxsw_sp,
+ const struct net_device *br_dev)
+{
+ return !!mlxsw_sp_bridge_device_find(mlxsw_sp->bridge, br_dev);
+}
+
static struct mlxsw_sp_bridge_device *
mlxsw_sp_bridge_device_create(struct mlxsw_sp_bridge *bridge,
struct net_device *br_dev)
return nfp_net_ebpf_capable(nn) ? "BPF" : "";
}
+static int
+nfp_bpf_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, unsigned int id)
+{
+ int err;
+
+ nn->app_priv = kzalloc(sizeof(struct nfp_bpf_vnic), GFP_KERNEL);
+ if (!nn->app_priv)
+ return -ENOMEM;
+
+ err = nfp_app_nic_vnic_alloc(app, nn, id);
+ if (err)
+ goto err_free_priv;
+
+ return 0;
+err_free_priv:
+ kfree(nn->app_priv);
+ return err;
+}
+
static void nfp_bpf_vnic_free(struct nfp_app *app, struct nfp_net *nn)
{
+ struct nfp_bpf_vnic *bv = nn->app_priv;
+
if (nn->dp.bpf_offload_xdp)
nfp_bpf_xdp_offload(app, nn, NULL);
+ WARN_ON(bv->tc_prog);
+ kfree(bv);
}
static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
{
struct tc_cls_bpf_offload *cls_bpf = type_data;
struct nfp_net *nn = cb_priv;
+ struct bpf_prog *oldprog;
+ struct nfp_bpf_vnic *bv;
+ int err;
if (type != TC_SETUP_CLSBPF ||
!tc_can_offload(nn->dp.netdev) ||
cls_bpf->common.protocol != htons(ETH_P_ALL) ||
cls_bpf->common.chain_index)
return -EOPNOTSUPP;
- if (nn->dp.bpf_offload_xdp)
- return -EBUSY;
/* Only support TC direct action */
if (!cls_bpf->exts_integrated ||
return -EOPNOTSUPP;
}
- switch (cls_bpf->command) {
- case TC_CLSBPF_REPLACE:
- return nfp_net_bpf_offload(nn, cls_bpf->prog, true);
- case TC_CLSBPF_ADD:
- return nfp_net_bpf_offload(nn, cls_bpf->prog, false);
- case TC_CLSBPF_DESTROY:
- return nfp_net_bpf_offload(nn, NULL, true);
- default:
+ if (cls_bpf->command != TC_CLSBPF_OFFLOAD)
return -EOPNOTSUPP;
+
+ bv = nn->app_priv;
+ oldprog = cls_bpf->oldprog;
+
+ /* Don't remove if oldprog doesn't match driver's state */
+ if (bv->tc_prog != oldprog) {
+ oldprog = NULL;
+ if (!cls_bpf->prog)
+ return 0;
}
+
+ err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog);
+ if (err)
+ return err;
+
+ bv->tc_prog = cls_bpf->prog;
+ return 0;
}
static int nfp_bpf_setup_tc_block(struct net_device *netdev,
.extra_cap = nfp_bpf_extra_cap,
- .vnic_alloc = nfp_app_nic_vnic_alloc,
+ .vnic_alloc = nfp_bpf_vnic_alloc,
.vnic_free = nfp_bpf_vnic_free,
.setup_tc = nfp_bpf_setup_tc,
struct list_head insns;
};
+/**
+ * struct nfp_bpf_vnic - per-vNIC BPF priv structure
+ * @tc_prog: currently loaded cls_bpf program
+ */
+struct nfp_bpf_vnic {
+ struct bpf_prog *tc_prog;
+};
+
int nfp_bpf_jit(struct nfp_prog *prog);
extern const struct bpf_ext_analyzer_ops nfp_bpf_analyzer_ops;
return err;
}
nn_writeb(nn, ctrl_offset, entry->entry);
+ nfp_net_irq_unmask(nn, entry->entry);
return 0;
}
unsigned int vector_idx)
{
nn_writeb(nn, ctrl_offset, 0xff);
+ nn_pci_flush(nn);
free_irq(nn->irq_entries[vector_idx].vector, nn);
}
#define MDIO_CLK_25_28 7
#define MDIO_WAIT_TIMES 1000
+#define MDIO_STATUS_DELAY_TIME 1
static int emac_mdio_read(struct mii_bus *bus, int addr, int regnum)
{
if (readl_poll_timeout(adpt->base + EMAC_MDIO_CTRL, reg,
!(reg & (MDIO_START | MDIO_BUSY)),
- 100, MDIO_WAIT_TIMES * 100))
+ MDIO_STATUS_DELAY_TIME, MDIO_WAIT_TIMES * 100))
return -EIO;
return (reg >> MDIO_DATA_SHFT) & MDIO_DATA_BMSK;
writel(reg, adpt->base + EMAC_MDIO_CTRL);
if (readl_poll_timeout(adpt->base + EMAC_MDIO_CTRL, reg,
- !(reg & (MDIO_START | MDIO_BUSY)), 100,
- MDIO_WAIT_TIMES * 100))
+ !(reg & (MDIO_START | MDIO_BUSY)),
+ MDIO_STATUS_DELAY_TIME, MDIO_WAIT_TIMES * 100))
return -EIO;
return 0;
return ret;
}
- ret = emac_mac_up(adpt);
+ ret = adpt->phy.open(adpt);
if (ret) {
emac_mac_rx_tx_rings_free_all(adpt);
free_irq(irq->irq, irq);
return ret;
}
- ret = adpt->phy.open(adpt);
+ ret = emac_mac_up(adpt);
if (ret) {
- emac_mac_down(adpt);
emac_mac_rx_tx_rings_free_all(adpt);
free_irq(irq->irq, irq);
+ adpt->phy.close(adpt);
return ret;
}
struct ravb_private *priv = netdev_priv(ndev);
int ret = 0;
- if (priv->wol_enabled) {
- /* Reduce the usecount of the clock to zero and then
- * restore it to its original value. This is done to force
- * the clock to be re-enabled which is a workaround
- * for renesas-cpg-mssr driver which do not enable clocks
- * when resuming from PSCI suspend/resume.
- *
- * Without this workaround the driver fails to communicate
- * with the hardware if WoL was enabled when the system
- * entered PSCI suspend. This is due to that if WoL is enabled
- * we explicitly keep the clock from being turned off when
- * suspending, but in PSCI sleep power is cut so the clock
- * is disabled anyhow, the clock driver is not aware of this
- * so the clock is not turned back on when resuming.
- *
- * TODO: once the renesas-cpg-mssr suspend/resume is working
- * this clock dance should be removed.
- */
- clk_disable(priv->clk);
- clk_disable(priv->clk);
- clk_enable(priv->clk);
- clk_enable(priv->clk);
-
- /* Set reset mode to rearm the WoL logic */
+ /* If WoL is enabled set reset mode to rearm the WoL logic */
+ if (priv->wol_enabled)
ravb_write(ndev, CCC_OPC_RESET, CCC);
- }
/* All register have been reset to default values.
* Restore all registers which where setup at probe time and
[FWNLCR0] = 0x0090,
[FWALCR0] = 0x0094,
[TXNLCR1] = 0x00a0,
- [TXALCR1] = 0x00a0,
+ [TXALCR1] = 0x00a4,
[RXNLCR1] = 0x00a8,
[RXALCR1] = 0x00ac,
[FWNLCR1] = 0x00b0,
[FWNLCR0] = 0x0090,
[FWALCR0] = 0x0094,
[TXNLCR1] = 0x00a0,
- [TXALCR1] = 0x00a0,
+ [TXALCR1] = 0x00a4,
[RXNLCR1] = 0x00a8,
[RXALCR1] = 0x00ac,
[FWNLCR1] = 0x00b0,
return PTR_ERR(phydev);
}
+ /* mask with MAC supported features */
+ if (mdp->cd->register_type != SH_ETH_REG_GIGABIT) {
+ int err = phy_set_max_speed(phydev, SPEED_100);
+ if (err) {
+ netdev_err(ndev, "failed to limit PHY to 100 Mbit/s\n");
+ phy_disconnect(phydev);
+ return err;
+ }
+ }
+
phy_attached_info(phydev);
return 0;
/* ioremap the TSU registers */
if (mdp->cd->tsu) {
struct resource *rtsu;
+
rtsu = platform_get_resource(pdev, IORESOURCE_MEM, 1);
- mdp->tsu_addr = devm_ioremap_resource(&pdev->dev, rtsu);
- if (IS_ERR(mdp->tsu_addr)) {
- ret = PTR_ERR(mdp->tsu_addr);
+ if (!rtsu) {
+ dev_err(&pdev->dev, "no TSU resource\n");
+ ret = -ENODEV;
+ goto out_release;
+ }
+ /* We can only request the TSU region for the first port
+ * of the two sharing this TSU for the probe to succeed...
+ */
+ if (devno % 2 == 0 &&
+ !devm_request_mem_region(&pdev->dev, rtsu->start,
+ resource_size(rtsu),
+ dev_name(&pdev->dev))) {
+ dev_err(&pdev->dev, "can't request TSU resource.\n");
+ ret = -EBUSY;
+ goto out_release;
+ }
+ mdp->tsu_addr = devm_ioremap(&pdev->dev, rtsu->start,
+ resource_size(rtsu));
+ if (!mdp->tsu_addr) {
+ dev_err(&pdev->dev, "TSU region ioremap() failed.\n");
+ ret = -ENOMEM;
goto out_release;
}
mdp->port = devno % 2;
ndev->features = NETIF_F_HW_VLAN_CTAG_FILTER;
}
- /* initialize first or needed device */
- if (!devno || pd->needs_init) {
+ /* Need to init only the first port of the two sharing a TSU */
+ if (devno % 2 == 0) {
if (mdp->cd->chip_reset)
mdp->cd->chip_reset(ndev);
/* get timestamp value */
u64(*get_timestamp) (void *desc, u32 ats);
/* get rx timestamp status */
- int (*get_rx_timestamp_status) (void *desc, u32 ats);
+ int (*get_rx_timestamp_status)(void *desc, void *next_desc, u32 ats);
/* Display ring */
void (*display_ring)(void *head, unsigned int size, bool rx);
/* set MSS via context descriptor */
return ret;
}
-static int dwmac4_wrback_get_rx_timestamp_status(void *desc, u32 ats)
+static int dwmac4_wrback_get_rx_timestamp_status(void *desc, void *next_desc,
+ u32 ats)
{
struct dma_desc *p = (struct dma_desc *)desc;
int ret = -EINVAL;
/* Check if timestamp is OK from context descriptor */
do {
- ret = dwmac4_rx_check_timestamp(desc);
+ ret = dwmac4_rx_check_timestamp(next_desc);
if (ret < 0)
goto exit;
i++;
return ns;
}
-static int enh_desc_get_rx_timestamp_status(void *desc, u32 ats)
+static int enh_desc_get_rx_timestamp_status(void *desc, void *next_desc,
+ u32 ats)
{
if (ats) {
struct dma_extended_desc *p = (struct dma_extended_desc *)desc;
return ns;
}
-static int ndesc_get_rx_timestamp_status(void *desc, u32 ats)
+static int ndesc_get_rx_timestamp_status(void *desc, void *next_desc, u32 ats)
{
struct dma_desc *p = (struct dma_desc *)desc;
{
u32 value = readl(ioaddr + PTP_TCR);
unsigned long data;
+ u32 reg_value;
/* For GMAC3.x, 4.x versions, convert the ptp_clock to nano second
* formula = (1/ptp_clock) * 1000000000
data &= PTP_SSIR_SSINC_MASK;
+ reg_value = data;
if (gmac4)
- data = data << GMAC4_PTP_SSIR_SSINC_SHIFT;
+ reg_value <<= GMAC4_PTP_SSIR_SSINC_SHIFT;
- writel(data, ioaddr + PTP_SSIR);
+ writel(reg_value, ioaddr + PTP_SSIR);
return data;
}
bool stmmac_eee_init(struct stmmac_priv *priv)
{
struct net_device *ndev = priv->dev;
+ int interface = priv->plat->interface;
unsigned long flags;
bool ret = false;
+ if ((interface != PHY_INTERFACE_MODE_MII) &&
+ (interface != PHY_INTERFACE_MODE_GMII) &&
+ !phy_interface_mode_is_rgmii(interface))
+ goto out;
+
/* Using PCS we cannot dial with the phy registers at this stage
* so we do not support extra feature like EEE.
*/
desc = np;
/* Check if timestamp is available */
- if (priv->hw->desc->get_rx_timestamp_status(desc, priv->adv_ts)) {
+ if (priv->hw->desc->get_rx_timestamp_status(p, np, priv->adv_ts)) {
ns = priv->hw->desc->get_timestamp(desc, priv->adv_ts);
netdev_dbg(priv->dev, "get valid RX hw timestamp %llu\n", ns);
shhwtstamp = skb_hwtstamps(skb);
if (IS_ERR(rt))
return PTR_ERR(rt);
+ if (skb_dst(skb)) {
+ int mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr) -
+ GENEVE_BASE_HLEN - info->options_len - 14;
+
+ skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ }
+
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
if (geneve->collect_md) {
tos = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
if (IS_ERR(dst))
return PTR_ERR(dst);
+ if (skb_dst(skb)) {
+ int mtu = dst_mtu(dst) - sizeof(struct ipv6hdr) -
+ GENEVE_BASE_HLEN - info->options_len - 14;
+
+ skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+ }
+
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
if (geneve->collect_md) {
prio = ip_tunnel_ecn_encap(key->tos, ip_hdr(skb), skb);
rrpriv->info_dma);
rrpriv->info = NULL;
- free_irq(pdev->irq, dev);
spin_unlock_irqrestore(&rrpriv->lock, flags);
+ free_irq(pdev->irq, dev);
return 0;
}
return 0;
unregister_netdev:
+ /* macvlan_uninit would free the macvlan port */
unregister_netdevice(dev);
+ return err;
destroy_macvlan_port:
- if (create)
+ /* the macvlan port may be freed by macvlan_uninit when fail to register.
+ * so we destroy the macvlan port only when it's valid.
+ */
+ if (create && macvlan_port_get_rtnl(dev))
macvlan_port_destroy(port->dev);
return err;
}
{
int value;
- mutex_lock(&phydev->lock);
-
value = phy_read(phydev, MII_BMCR);
value &= ~(BMCR_PDOWN | BMCR_ISOLATE);
phy_write(phydev, MII_BMCR, value);
- mutex_unlock(&phydev->lock);
-
return 0;
}
if (err < 0)
goto error;
+ /* Do not touch the fiber page if we're in copper->sgmii mode */
+ if (phydev->interface == PHY_INTERFACE_MODE_SGMII)
+ return 0;
+
/* Then the fiber link */
err = marvell_set_page(phydev, MII_MARVELL_FIBER_PAGE);
if (err < 0)
/* SGMII-to-Copper mode initialization */
if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+ u32 pause;
+
/* Select page 18 */
err = marvell_set_page(phydev, 18);
if (err < 0)
err = marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
if (err < 0)
return err;
+
+ /* There appears to be a bug in the 88e1512 when used in
+ * SGMII to copper mode, where the AN advertisment register
+ * clears the pause bits each time a negotiation occurs.
+ * This means we can never be truely sure what was advertised,
+ * so disable Pause support.
+ */
+ pause = SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+ phydev->supported &= ~pause;
+ phydev->advertising &= ~pause;
}
return m88e1121_config_init(phydev);
.flags = PHY_HAS_INTERRUPT,
.probe = marvell_probe,
.config_init = &m88e1145_config_init,
- .config_aneg = &marvell_config_aneg,
+ .config_aneg = &m88e1101_config_aneg,
.read_status = &genphy_read_status,
.ack_interrupt = &marvell_ack_interrupt,
.config_intr = &marvell_config_intr,
data->regulator = devm_regulator_get(&pdev->dev, "phy");
if (IS_ERR(data->regulator)) {
- if (PTR_ERR(data->regulator) == -EPROBE_DEFER)
- return -EPROBE_DEFER;
+ if (PTR_ERR(data->regulator) == -EPROBE_DEFER) {
+ ret = -EPROBE_DEFER;
+ goto err_out_free_mdiobus;
+ }
dev_info(&pdev->dev, "no regulator found\n");
data->regulator = NULL;
}
ret = xgene_enet_ecc_init(pdata);
- if (ret)
+ if (ret) {
+ if (pdata->dev->of_node)
+ clk_disable_unprepare(pdata->clk);
return ret;
+ }
xgene_gmac_reset(pdata);
return 0;
return ret;
mdio_bus = mdiobus_alloc();
- if (!mdio_bus)
- return -ENOMEM;
+ if (!mdio_bus) {
+ ret = -ENOMEM;
+ goto out_clk;
+ }
mdio_bus->name = "APM X-Gene MDIO bus";
mdio_bus->phy_mask = ~0;
ret = mdiobus_register(mdio_bus);
if (ret)
- goto out;
+ goto out_mdiobus;
acpi_walk_namespace(ACPI_TYPE_DEVICE, ACPI_HANDLE(dev), 1,
acpi_register_phy, NULL, mdio_bus, NULL);
}
if (ret)
- goto out;
+ goto out_mdiobus;
pdata->mdio_bus = mdio_bus;
xgene_mdio_status = true;
return 0;
-out:
+out_mdiobus:
mdiobus_free(mdio_bus);
+out_clk:
+ if (dev->of_node)
+ clk_disable_unprepare(pdata->clk);
+
return ret;
}
if (addr == mdiodev->addr) {
dev->of_node = child;
+ dev->fwnode = of_fwnode_handle(child);
return;
}
}
#include <linux/ethtool.h>
#include <linux/phy.h>
#include <linux/netdevice.h>
+#include <linux/bitfield.h>
static int meson_gxl_config_init(struct phy_device *phydev)
{
return 0;
}
+/* This function is provided to cope with the possible failures of this phy
+ * during aneg process. When aneg fails, the PHY reports that aneg is done
+ * but the value found in MII_LPA is wrong:
+ * - Early failures: MII_LPA is just 0x0001. if MII_EXPANSION reports that
+ * the link partner (LP) supports aneg but the LP never acked our base
+ * code word, it is likely that we never sent it to begin with.
+ * - Late failures: MII_LPA is filled with a value which seems to make sense
+ * but it actually is not what the LP is advertising. It seems that we
+ * can detect this using a magic bit in the WOL bank (reg 12 - bit 12).
+ * If this particular bit is not set when aneg is reported being done,
+ * it means MII_LPA is likely to be wrong.
+ *
+ * In both case, forcing a restart of the aneg process solve the problem.
+ * When this failure happens, the first retry is usually successful but,
+ * in some cases, it may take up to 6 retries to get a decent result
+ */
+static int meson_gxl_read_status(struct phy_device *phydev)
+{
+ int ret, wol, lpa, exp;
+
+ if (phydev->autoneg == AUTONEG_ENABLE) {
+ ret = genphy_aneg_done(phydev);
+ if (ret < 0)
+ return ret;
+ else if (!ret)
+ goto read_status_continue;
+
+ /* Need to access WOL bank, make sure the access is open */
+ ret = phy_write(phydev, 0x14, 0x0000);
+ if (ret)
+ return ret;
+ ret = phy_write(phydev, 0x14, 0x0400);
+ if (ret)
+ return ret;
+ ret = phy_write(phydev, 0x14, 0x0000);
+ if (ret)
+ return ret;
+ ret = phy_write(phydev, 0x14, 0x0400);
+ if (ret)
+ return ret;
+
+ /* Request LPI_STATUS WOL register */
+ ret = phy_write(phydev, 0x14, 0x8D80);
+ if (ret)
+ return ret;
+
+ /* Read LPI_STATUS value */
+ wol = phy_read(phydev, 0x15);
+ if (wol < 0)
+ return wol;
+
+ lpa = phy_read(phydev, MII_LPA);
+ if (lpa < 0)
+ return lpa;
+
+ exp = phy_read(phydev, MII_EXPANSION);
+ if (exp < 0)
+ return exp;
+
+ if (!(wol & BIT(12)) ||
+ ((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) {
+ /* Looks like aneg failed after all */
+ phydev_dbg(phydev, "LPA corruption - aneg restart\n");
+ return genphy_restart_aneg(phydev);
+ }
+ }
+
+read_status_continue:
+ return genphy_read_status(phydev);
+}
+
static struct phy_driver meson_gxl_phy[] = {
{
.phy_id = 0x01814400,
.config_init = meson_gxl_config_init,
.config_aneg = genphy_config_aneg,
.aneg_done = genphy_aneg_done,
- .read_status = genphy_read_status,
+ .read_status = meson_gxl_read_status,
.suspend = genphy_suspend,
.resume = genphy_resume,
},
phydev->link = 0;
if (phydev->drv->config_intr && phy_interrupt_is_valid(phydev))
phydev->drv->config_intr(phydev);
+ return genphy_config_aneg(phydev);
}
return 0;
*/
void phy_start(struct phy_device *phydev)
{
- bool do_resume = false;
int err = 0;
mutex_lock(&phydev->lock);
phydev->state = PHY_UP;
break;
case PHY_HALTED:
+ /* if phy was suspended, bring the physical link up again */
+ phy_resume(phydev);
+
/* make sure interrupts are re-enabled for the PHY */
if (phydev->irq != PHY_POLL) {
err = phy_enable_interrupts(phydev);
}
phydev->state = PHY_RESUMING;
- do_resume = true;
break;
default:
break;
}
mutex_unlock(&phydev->lock);
- /* if phy was suspended, bring the physical link up again */
- if (do_resume)
- phy_resume(phydev);
-
phy_trigger_machine(phydev, true);
}
EXPORT_SYMBOL(phy_start);
if (!mdio_bus_phy_may_suspend(phydev))
goto no_resume;
+ mutex_lock(&phydev->lock);
ret = phy_resume(phydev);
+ mutex_unlock(&phydev->lock);
if (ret < 0)
return ret;
if (err)
goto error;
+ mutex_lock(&phydev->lock);
phy_resume(phydev);
+ mutex_unlock(&phydev->lock);
phy_led_triggers_register(phydev);
return err;
struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
int ret = 0;
+ WARN_ON(!mutex_is_locked(&phydev->lock));
+
if (phydev->drv && phydrv->resume)
ret = phydrv->resume(phydev);
{
int value;
- mutex_lock(&phydev->lock);
-
value = phy_read(phydev, MII_BMCR);
phy_write(phydev, MII_BMCR, value & ~BMCR_PDOWN);
- mutex_unlock(&phydev->lock);
-
return 0;
}
EXPORT_SYMBOL(genphy_resume);
pl->link_config.pause = MLO_PAUSE_AN;
pl->link_config.speed = SPEED_UNKNOWN;
pl->link_config.duplex = DUPLEX_UNKNOWN;
+ pl->link_config.an_enabled = true;
pl->ops = ops;
__set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
mutex_lock(&pl->state_mutex);
/* Configure the MAC to match the new settings */
linkmode_copy(pl->link_config.advertising, our_kset.link_modes.advertising);
+ pl->link_config.interface = config.interface;
pl->link_config.speed = our_kset.base.speed;
pl->link_config.duplex = our_kset.base.duplex;
pl->link_config.an_enabled = our_kset.base.autoneg != AUTONEG_DISABLE;
switch (cmd) {
case SIOCGMIIPHY:
mii->phy_id = pl->phydev->mdio.addr;
+ /* fall through */
case SIOCGMIIREG:
ret = phylink_phy_read(pl, mii->phy_id, mii->reg_num);
switch (cmd) {
case SIOCGMIIPHY:
mii->phy_id = 0;
+ /* fall through */
case SIOCGMIIREG:
ret = phylink_mii_read(pl, mii->phy_id, mii->reg_num);
WARN_ON(!lockdep_rtnl_is_held());
set_bit(PHYLINK_DISABLE_LINK, &pl->phylink_disable_state);
+ queue_work(system_power_efficient_wq, &pl->resolve);
flush_work(&pl->resolve);
-
- netif_carrier_off(pl->netdev);
}
static void phylink_sfp_link_up(void *upstream)
void sfp_unregister_upstream(struct sfp_bus *bus)
{
rtnl_lock();
- sfp_unregister_bus(bus);
+ if (bus->sfp)
+ sfp_unregister_bus(bus);
bus->upstream = NULL;
bus->netdev = NULL;
rtnl_unlock();
void sfp_unregister_socket(struct sfp_bus *bus)
{
rtnl_lock();
- sfp_unregister_bus(bus);
+ if (bus->netdev)
+ sfp_unregister_bus(bus);
bus->sfp_dev = NULL;
bus->sfp = NULL;
bus->socket_ops = NULL;
{QMI_FIXED_INTF(0x05c6, 0x9084, 4)},
{QMI_FIXED_INTF(0x05c6, 0x920d, 0)},
{QMI_FIXED_INTF(0x05c6, 0x920d, 5)},
+ {QMI_QUIRK_SET_DTR(0x05c6, 0x9625, 4)}, /* YUGA CLM920-NC5 */
{QMI_FIXED_INTF(0x0846, 0x68a2, 8)},
{QMI_FIXED_INTF(0x12d1, 0x140c, 1)}, /* Huawei E173 */
{QMI_FIXED_INTF(0x12d1, 0x14ac, 1)}, /* Huawei E1820 */
{QMI_FIXED_INTF(0x1199, 0x9079, 10)}, /* Sierra Wireless EM74xx */
{QMI_FIXED_INTF(0x1199, 0x907b, 8)}, /* Sierra Wireless EM74xx */
{QMI_FIXED_INTF(0x1199, 0x907b, 10)}, /* Sierra Wireless EM74xx */
+ {QMI_FIXED_INTF(0x1199, 0x9091, 8)}, /* Sierra Wireless EM7565 */
{QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */
{QMI_FIXED_INTF(0x1bbb, 0x0203, 2)}, /* Alcatel L800MA */
{QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */
{QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */
{QMI_QUIRK_SET_DTR(0x1bc7, 0x1040, 2)}, /* Telit LE922A */
{QMI_FIXED_INTF(0x1bc7, 0x1100, 3)}, /* Telit ME910 */
+ {QMI_FIXED_INTF(0x1bc7, 0x1101, 3)}, /* Telit ME910 dual modem */
{QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */
{QMI_QUIRK_SET_DTR(0x1bc7, 0x1201, 2)}, /* Telit LE920, LE920A4 */
{QMI_FIXED_INTF(0x1c9e, 0x9801, 3)}, /* Telewell TW-3G HSPA+ */
}
ndst = &rt->dst;
+ if (skb_dst(skb)) {
+ int mtu = dst_mtu(ndst) - VXLAN_HEADROOM;
+
+ skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
+ skb, mtu);
+ }
+
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
err = vxlan_build_skb(skb, ndst, sizeof(struct iphdr),
goto out_unlock;
}
+ if (skb_dst(skb)) {
+ int mtu = dst_mtu(ndst) - VXLAN6_HEADROOM;
+
+ skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
+ skb, mtu);
+ }
+
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
ttl = ttl ? : ip6_dst_hoplimit(ndst);
skb_scrub_packet(skb, xnet);
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
VXLAN_HEADROOM);
+ if (max_mtu < ETH_MIN_MTU)
+ max_mtu = ETH_MIN_MTU;
+
+ if (!changelink && !conf->mtu)
+ dev->mtu = max_mtu;
}
if (dev->mtu > max_mtu)
}
}
+ if (changed & IEEE80211_CONF_CHANGE_PS) {
+ list_for_each_entry(tmp, &wcn->vif_list, list) {
+ vif = wcn36xx_priv_to_vif(tmp);
+ if (hw->conf.flags & IEEE80211_CONF_PS) {
+ if (vif->bss_conf.ps) /* ps allowed ? */
+ wcn36xx_pmc_enter_bmps_state(wcn, vif);
+ } else {
+ wcn36xx_pmc_exit_bmps_state(wcn, vif);
+ }
+ }
+ }
+
mutex_unlock(&wcn->conf_mutex);
return 0;
vif_priv->dtim_period = bss_conf->dtim_period;
}
- if (changed & BSS_CHANGED_PS) {
- wcn36xx_dbg(WCN36XX_DBG_MAC,
- "mac bss PS set %d\n",
- bss_conf->ps);
- if (bss_conf->ps) {
- wcn36xx_pmc_enter_bmps_state(wcn, vif);
- } else {
- wcn36xx_pmc_exit_bmps_state(wcn, vif);
- }
- }
-
if (changed & BSS_CHANGED_BSSID) {
wcn36xx_dbg(WCN36XX_DBG_MAC, "mac bss changed_bssid %pM\n",
bss_conf->bssid);
struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
if (WCN36XX_BMPS != vif_priv->pw_state) {
- wcn36xx_err("Not in BMPS mode, no need to exit from BMPS mode!\n");
- return -EINVAL;
+ /* Unbalanced call or last BMPS enter failed */
+ wcn36xx_dbg(WCN36XX_DBG_PMC,
+ "Not in BMPS mode, no need to exit\n");
+ return -EALREADY;
}
wcn36xx_smd_exit_bmps(wcn, vif);
vif_priv->pw_state = WCN36XX_FULL_POWER;
return index & (q->n_window - 1);
}
-static inline void *iwl_pcie_get_tfd(struct iwl_trans_pcie *trans_pcie,
+static inline void *iwl_pcie_get_tfd(struct iwl_trans *trans,
struct iwl_txq *txq, int idx)
{
- return txq->tfds + trans_pcie->tfd_size * iwl_pcie_get_cmd_index(txq,
- idx);
+ struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
+
+ if (trans->cfg->use_tfh)
+ idx = iwl_pcie_get_cmd_index(txq, idx);
+
+ return txq->tfds + trans_pcie->tfd_size * idx;
}
static inline void iwl_enable_rfkill_int(struct iwl_trans *trans)
static void iwl_pcie_gen2_free_tfd(struct iwl_trans *trans, struct iwl_txq *txq)
{
- struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
-
/* rd_ptr is bounded by TFD_QUEUE_SIZE_MAX and
* idx is bounded by n_window
*/
lockdep_assert_held(&txq->lock);
iwl_pcie_gen2_tfd_unmap(trans, &txq->entries[idx].meta,
- iwl_pcie_get_tfd(trans_pcie, txq, idx));
+ iwl_pcie_get_tfd(trans, txq, idx));
/* free SKB */
if (txq->entries) {
struct sk_buff *skb,
struct iwl_cmd_meta *out_meta)
{
- struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
int idx = iwl_pcie_get_cmd_index(txq, txq->write_ptr);
- struct iwl_tfh_tfd *tfd =
- iwl_pcie_get_tfd(trans_pcie, txq, idx);
+ struct iwl_tfh_tfd *tfd = iwl_pcie_get_tfd(trans, txq, idx);
dma_addr_t tb_phys;
bool amsdu;
int i, len, tb1_len, tb2_len, hdr_len;
u8 group_id = iwl_cmd_groupid(cmd->id);
const u8 *cmddata[IWL_MAX_CMD_TBS_PER_TFD];
u16 cmdlen[IWL_MAX_CMD_TBS_PER_TFD];
- struct iwl_tfh_tfd *tfd =
- iwl_pcie_get_tfd(trans_pcie, txq, txq->write_ptr);
+ struct iwl_tfh_tfd *tfd = iwl_pcie_get_tfd(trans, txq, txq->write_ptr);
memset(tfd, 0, sizeof(*tfd));
{
struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
int i, num_tbs;
- void *tfd = iwl_pcie_get_tfd(trans_pcie, txq, index);
+ void *tfd = iwl_pcie_get_tfd(trans, txq, index);
/* Sanity check on number of chunks */
num_tbs = iwl_pcie_tfd_get_num_tbs(trans, tfd);
}
trace_iwlwifi_dev_tx(trans->dev, skb,
- iwl_pcie_get_tfd(trans_pcie, txq, txq->write_ptr),
+ iwl_pcie_get_tfd(trans, txq, txq->write_ptr),
trans_pcie->tfd_size,
&dev_cmd->hdr, IWL_FIRST_TB_SIZE + tb1_len,
hdr_len);
IEEE80211_CCMP_HDR_LEN : 0;
trace_iwlwifi_dev_tx(trans->dev, skb,
- iwl_pcie_get_tfd(trans_pcie, txq, txq->write_ptr),
+ iwl_pcie_get_tfd(trans, txq, txq->write_ptr),
trans_pcie->tfd_size,
&dev_cmd->hdr, IWL_FIRST_TB_SIZE + tb1_len, 0);
memcpy(&txq->first_tb_bufs[txq->write_ptr], &dev_cmd->hdr,
IWL_FIRST_TB_SIZE);
- tfd = iwl_pcie_get_tfd(trans_pcie, txq, txq->write_ptr);
+ tfd = iwl_pcie_get_tfd(trans, txq, txq->write_ptr);
/* Set up entry for this TFD in Tx byte-count array */
iwl_pcie_txq_update_byte_cnt_tbl(trans, txq, le16_to_cpu(tx_cmd->len),
iwl_pcie_tfd_get_num_tbs(trans, tfd));
hdr = skb_put(skb, sizeof(*hdr) - ETH_ALEN);
hdr->frame_control = cpu_to_le16(IEEE80211_FTYPE_DATA |
IEEE80211_STYPE_NULLFUNC |
+ IEEE80211_FCTL_TODS |
(ps ? IEEE80211_FCTL_PM : 0));
hdr->duration_id = cpu_to_le16(0);
memcpy(hdr->addr1, vp->bssid, ETH_ALEN);
if (!net_eq(wiphy_net(data->hw->wiphy), genl_info_net(info)))
continue;
- skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
if (!skb) {
res = -ENOMEM;
goto out_err;
netif_carrier_off(netdev);
+ xenbus_switch_state(dev, XenbusStateInitialising);
return netdev;
exit:
return ret;
}
-static int btt_log_read_pair(struct arena_info *arena, u32 lane,
- struct log_entry *ent)
+static int btt_log_group_read(struct arena_info *arena, u32 lane,
+ struct log_group *log)
{
return arena_read_bytes(arena,
- arena->logoff + (2 * lane * LOG_ENT_SIZE), ent,
- 2 * LOG_ENT_SIZE, 0);
+ arena->logoff + (lane * LOG_GRP_SIZE), log,
+ LOG_GRP_SIZE, 0);
}
static struct dentry *debugfs_root;
debugfs_create_x64("logoff", S_IRUGO, d, &a->logoff);
debugfs_create_x64("info2off", S_IRUGO, d, &a->info2off);
debugfs_create_x32("flags", S_IRUGO, d, &a->flags);
+ debugfs_create_u32("log_index_0", S_IRUGO, d, &a->log_index[0]);
+ debugfs_create_u32("log_index_1", S_IRUGO, d, &a->log_index[1]);
}
static void btt_debugfs_init(struct btt *btt)
}
}
+static u32 log_seq(struct log_group *log, int log_idx)
+{
+ return le32_to_cpu(log->ent[log_idx].seq);
+}
+
/*
* This function accepts two log entries, and uses the
* sequence number to find the 'older' entry.
*
* TODO The logic feels a bit kludge-y. make it better..
*/
-static int btt_log_get_old(struct log_entry *ent)
+static int btt_log_get_old(struct arena_info *a, struct log_group *log)
{
+ int idx0 = a->log_index[0];
+ int idx1 = a->log_index[1];
int old;
/*
* the next time, the following logic works out to put this
* (next) entry into [1]
*/
- if (ent[0].seq == 0) {
- ent[0].seq = cpu_to_le32(1);
+ if (log_seq(log, idx0) == 0) {
+ log->ent[idx0].seq = cpu_to_le32(1);
return 0;
}
- if (ent[0].seq == ent[1].seq)
+ if (log_seq(log, idx0) == log_seq(log, idx1))
return -EINVAL;
- if (le32_to_cpu(ent[0].seq) + le32_to_cpu(ent[1].seq) > 5)
+ if (log_seq(log, idx0) + log_seq(log, idx1) > 5)
return -EINVAL;
- if (le32_to_cpu(ent[0].seq) < le32_to_cpu(ent[1].seq)) {
- if (le32_to_cpu(ent[1].seq) - le32_to_cpu(ent[0].seq) == 1)
+ if (log_seq(log, idx0) < log_seq(log, idx1)) {
+ if ((log_seq(log, idx1) - log_seq(log, idx0)) == 1)
old = 0;
else
old = 1;
} else {
- if (le32_to_cpu(ent[0].seq) - le32_to_cpu(ent[1].seq) == 1)
+ if ((log_seq(log, idx0) - log_seq(log, idx1)) == 1)
old = 1;
else
old = 0;
{
int ret;
int old_ent, ret_ent;
- struct log_entry log[2];
+ struct log_group log;
- ret = btt_log_read_pair(arena, lane, log);
+ ret = btt_log_group_read(arena, lane, &log);
if (ret)
return -EIO;
- old_ent = btt_log_get_old(log);
+ old_ent = btt_log_get_old(arena, &log);
if (old_ent < 0 || old_ent > 1) {
dev_err(to_dev(arena),
"log corruption (%d): lane %d seq [%d, %d]\n",
- old_ent, lane, log[0].seq, log[1].seq);
+ old_ent, lane, log.ent[arena->log_index[0]].seq,
+ log.ent[arena->log_index[1]].seq);
/* TODO set error state? */
return -EIO;
}
ret_ent = (old_flag ? old_ent : (1 - old_ent));
if (ent != NULL)
- memcpy(ent, &log[ret_ent], LOG_ENT_SIZE);
+ memcpy(ent, &log.ent[arena->log_index[ret_ent]], LOG_ENT_SIZE);
return ret_ent;
}
u32 sub, struct log_entry *ent, unsigned long flags)
{
int ret;
- /*
- * Ignore the padding in log_entry for calculating log_half.
- * The entry is 'committed' when we write the sequence number,
- * and we want to ensure that that is the last thing written.
- * We don't bother writing the padding as that would be extra
- * media wear and write amplification
- */
- unsigned int log_half = (LOG_ENT_SIZE - 2 * sizeof(u64)) / 2;
- u64 ns_off = arena->logoff + (((2 * lane) + sub) * LOG_ENT_SIZE);
+ u32 group_slot = arena->log_index[sub];
+ unsigned int log_half = LOG_ENT_SIZE / 2;
void *src = ent;
+ u64 ns_off;
+ ns_off = arena->logoff + (lane * LOG_GRP_SIZE) +
+ (group_slot * LOG_ENT_SIZE);
/* split the 16B write into atomic, durable halves */
ret = arena_write_bytes(arena, ns_off, src, log_half, flags);
if (ret)
{
size_t logsize = arena->info2off - arena->logoff;
size_t chunk_size = SZ_4K, offset = 0;
- struct log_entry log;
+ struct log_entry ent;
void *zerobuf;
int ret;
u32 i;
}
for (i = 0; i < arena->nfree; i++) {
- log.lba = cpu_to_le32(i);
- log.old_map = cpu_to_le32(arena->external_nlba + i);
- log.new_map = cpu_to_le32(arena->external_nlba + i);
- log.seq = cpu_to_le32(LOG_SEQ_INIT);
- ret = __btt_log_write(arena, i, 0, &log, 0);
+ ent.lba = cpu_to_le32(i);
+ ent.old_map = cpu_to_le32(arena->external_nlba + i);
+ ent.new_map = cpu_to_le32(arena->external_nlba + i);
+ ent.seq = cpu_to_le32(LOG_SEQ_INIT);
+ ret = __btt_log_write(arena, i, 0, &ent, 0);
if (ret)
goto free;
}
return 0;
}
+static bool ent_is_padding(struct log_entry *ent)
+{
+ return (ent->lba == 0) && (ent->old_map == 0) && (ent->new_map == 0)
+ && (ent->seq == 0);
+}
+
+/*
+ * Detecting valid log indices: We read a log group (see the comments in btt.h
+ * for a description of a 'log_group' and its 'slots'), and iterate over its
+ * four slots. We expect that a padding slot will be all-zeroes, and use this
+ * to detect a padding slot vs. an actual entry.
+ *
+ * If a log_group is in the initial state, i.e. hasn't been used since the
+ * creation of this BTT layout, it will have three of the four slots with
+ * zeroes. We skip over these log_groups for the detection of log_index. If
+ * all log_groups are in the initial state (i.e. the BTT has never been
+ * written to), it is safe to assume the 'new format' of log entries in slots
+ * (0, 1).
+ */
+static int log_set_indices(struct arena_info *arena)
+{
+ bool idx_set = false, initial_state = true;
+ int ret, log_index[2] = {-1, -1};
+ u32 i, j, next_idx = 0;
+ struct log_group log;
+ u32 pad_count = 0;
+
+ for (i = 0; i < arena->nfree; i++) {
+ ret = btt_log_group_read(arena, i, &log);
+ if (ret < 0)
+ return ret;
+
+ for (j = 0; j < 4; j++) {
+ if (!idx_set) {
+ if (ent_is_padding(&log.ent[j])) {
+ pad_count++;
+ continue;
+ } else {
+ /* Skip if index has been recorded */
+ if ((next_idx == 1) &&
+ (j == log_index[0]))
+ continue;
+ /* valid entry, record index */
+ log_index[next_idx] = j;
+ next_idx++;
+ }
+ if (next_idx == 2) {
+ /* two valid entries found */
+ idx_set = true;
+ } else if (next_idx > 2) {
+ /* too many valid indices */
+ return -ENXIO;
+ }
+ } else {
+ /*
+ * once the indices have been set, just verify
+ * that all subsequent log groups are either in
+ * their initial state or follow the same
+ * indices.
+ */
+ if (j == log_index[0]) {
+ /* entry must be 'valid' */
+ if (ent_is_padding(&log.ent[j]))
+ return -ENXIO;
+ } else if (j == log_index[1]) {
+ ;
+ /*
+ * log_index[1] can be padding if the
+ * lane never got used and it is still
+ * in the initial state (three 'padding'
+ * entries)
+ */
+ } else {
+ /* entry must be invalid (padding) */
+ if (!ent_is_padding(&log.ent[j]))
+ return -ENXIO;
+ }
+ }
+ }
+ /*
+ * If any of the log_groups have more than one valid,
+ * non-padding entry, then the we are no longer in the
+ * initial_state
+ */
+ if (pad_count < 3)
+ initial_state = false;
+ pad_count = 0;
+ }
+
+ if (!initial_state && !idx_set)
+ return -ENXIO;
+
+ /*
+ * If all the entries in the log were in the initial state,
+ * assume new padding scheme
+ */
+ if (initial_state)
+ log_index[1] = 1;
+
+ /*
+ * Only allow the known permutations of log/padding indices,
+ * i.e. (0, 1), and (0, 2)
+ */
+ if ((log_index[0] == 0) && ((log_index[1] == 1) || (log_index[1] == 2)))
+ ; /* known index possibilities */
+ else {
+ dev_err(to_dev(arena), "Found an unknown padding scheme\n");
+ return -ENXIO;
+ }
+
+ arena->log_index[0] = log_index[0];
+ arena->log_index[1] = log_index[1];
+ dev_dbg(to_dev(arena), "log_index_0 = %d\n", log_index[0]);
+ dev_dbg(to_dev(arena), "log_index_1 = %d\n", log_index[1]);
+ return 0;
+}
+
static int btt_rtt_init(struct arena_info *arena)
{
arena->rtt = kcalloc(arena->nfree, sizeof(u32), GFP_KERNEL);
available -= 2 * BTT_PG_SIZE;
/* The log takes a fixed amount of space based on nfree */
- logsize = roundup(2 * arena->nfree * sizeof(struct log_entry),
- BTT_PG_SIZE);
+ logsize = roundup(arena->nfree * LOG_GRP_SIZE, BTT_PG_SIZE);
available -= logsize;
/* Calculate optimal split between map and data area */
arena->mapoff = arena->dataoff + datasize;
arena->logoff = arena->mapoff + mapsize;
arena->info2off = arena->logoff + logsize;
+
+ /* Default log indices are (0,1) */
+ arena->log_index[0] = 0;
+ arena->log_index[1] = 1;
return arena;
}
arena->external_lba_start = cur_nlba;
parse_arena_meta(arena, super, cur_off);
+ ret = log_set_indices(arena);
+ if (ret) {
+ dev_err(to_dev(arena),
+ "Unable to deduce log/padding indices\n");
+ goto out;
+ }
+
mutex_init(&arena->err_lock);
ret = btt_freelist_init(arena);
if (ret)
#define MAP_ERR_MASK (1 << MAP_ERR_SHIFT)
#define MAP_LBA_MASK (~((1 << MAP_TRIM_SHIFT) | (1 << MAP_ERR_SHIFT)))
#define MAP_ENT_NORMAL 0xC0000000
+#define LOG_GRP_SIZE sizeof(struct log_group)
#define LOG_ENT_SIZE sizeof(struct log_entry)
#define ARENA_MIN_SIZE (1UL << 24) /* 16 MB */
#define ARENA_MAX_SIZE (1ULL << 39) /* 512 GB */
INIT_READY
};
+/*
+ * A log group represents one log 'lane', and consists of four log entries.
+ * Two of the four entries are valid entries, and the remaining two are
+ * padding. Due to an old bug in the padding location, we need to perform a
+ * test to determine the padding scheme being used, and use that scheme
+ * thereafter.
+ *
+ * In kernels prior to 4.15, 'log group' would have actual log entries at
+ * indices (0, 2) and padding at indices (1, 3), where as the correct/updated
+ * format has log entries at indices (0, 1) and padding at indices (2, 3).
+ *
+ * Old (pre 4.15) format:
+ * +-----------------+-----------------+
+ * | ent[0] | ent[1] |
+ * | 16B | 16B |
+ * | lba/old/new/seq | pad |
+ * +-----------------------------------+
+ * | ent[2] | ent[3] |
+ * | 16B | 16B |
+ * | lba/old/new/seq | pad |
+ * +-----------------+-----------------+
+ *
+ * New format:
+ * +-----------------+-----------------+
+ * | ent[0] | ent[1] |
+ * | 16B | 16B |
+ * | lba/old/new/seq | lba/old/new/seq |
+ * +-----------------------------------+
+ * | ent[2] | ent[3] |
+ * | 16B | 16B |
+ * | pad | pad |
+ * +-----------------+-----------------+
+ *
+ * We detect during start-up which format is in use, and set
+ * arena->log_index[(0, 1)] with the detected format.
+ */
+
struct log_entry {
__le32 lba;
__le32 old_map;
__le32 new_map;
__le32 seq;
- __le64 padding[2];
+};
+
+struct log_group {
+ struct log_entry ent[4];
};
struct btt_sb {
* @list: List head for list of arenas
* @debugfs_dir: Debugfs dentry
* @flags: Arena flags - may signify error states.
+ * @err_lock: Mutex for synchronizing error clearing.
+ * @log_index: Indices of the valid log entries in a log_group
*
* arena_info is a per-arena handle. Once an arena is narrowed down for an
* IO, this struct is passed around for the duration of the IO.
/* Arena flags */
u32 flags;
struct mutex err_lock;
+ int log_index[2];
};
/**
* @init_lock: Mutex used for the BTT initialization
* @init_state: Flag describing the initialization state for the BTT
* @num_arenas: Number of arenas in the BTT instance
+ * @phys_bb: Pointer to the namespace's badblocks structure
*/
struct btt {
struct gendisk *btt_disk;
int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
{
u64 checksum, offset;
- unsigned long align;
enum nd_pfn_mode mode;
struct nd_namespace_io *nsio;
+ unsigned long align, start_pad;
struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
struct nd_namespace_common *ndns = nd_pfn->ndns;
const u8 *parent_uuid = nd_dev_to_uuid(&ndns->dev);
align = le32_to_cpu(pfn_sb->align);
offset = le64_to_cpu(pfn_sb->dataoff);
+ start_pad = le32_to_cpu(pfn_sb->start_pad);
if (align == 0)
align = 1UL << ilog2(offset);
mode = le32_to_cpu(pfn_sb->mode);
return -EBUSY;
}
- if ((align && !IS_ALIGNED(offset, align))
+ if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align))
|| !IS_ALIGNED(offset, PAGE_SIZE)) {
dev_err(&nd_pfn->dev,
"bad offset: %#llx dax disabled align: %#lx\n",
return altmap;
}
+static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys)
+{
+ return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys),
+ ALIGN_DOWN(phys, nd_pfn->align));
+}
+
static int nd_pfn_init(struct nd_pfn *nd_pfn)
{
u32 dax_label_reserve = is_nd_dax(&nd_pfn->dev) ? SZ_128K : 0;
start = nsio->res.start;
size = PHYS_SECTION_ALIGN_UP(start + size) - start;
if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
- IORES_DESC_NONE) == REGION_MIXED) {
+ IORES_DESC_NONE) == REGION_MIXED
+ || !IS_ALIGNED(start + resource_size(&nsio->res),
+ nd_pfn->align)) {
size = resource_size(&nsio->res);
- end_trunc = start + size - PHYS_SECTION_ALIGN_DOWN(start + size);
+ end_trunc = start + size - phys_pmem_align_down(nd_pfn,
+ start + size);
}
if (start_pad + end_trunc)
- dev_info(&nd_pfn->dev, "%s section collision, truncate %d bytes\n",
+ dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n",
dev_name(&ndns->dev), start_pad + end_trunc);
/*
BUILD_BUG_ON(PAGE_SIZE / sizeof(struct nvme_dsm_range) <
NVME_DSM_MAX_RANGES);
- queue->limits.discard_alignment = size;
+ queue->limits.discard_alignment = 0;
queue->limits.discard_granularity = size;
blk_queue_max_discard_sectors(queue, UINT_MAX);
struct nvme_ns *ns, struct nvme_id_ns *id)
{
sector_t capacity = le64_to_cpup(&id->nsze) << (ns->lba_shift - 9);
+ unsigned short bs = 1 << ns->lba_shift;
unsigned stream_alignment = 0;
if (ns->ctrl->nr_streams && ns->sws && ns->sgs)
blk_mq_freeze_queue(disk->queue);
blk_integrity_unregister(disk);
- blk_queue_logical_block_size(disk->queue, 1 << ns->lba_shift);
+ blk_queue_logical_block_size(disk->queue, bs);
+ blk_queue_physical_block_size(disk->queue, bs);
+ blk_queue_io_min(disk->queue, bs);
+
if (ns->ms && !ns->ext &&
(ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
nvme_init_integrity(disk, ns->ms, ns->pi_type);
blk_queue_max_hw_sectors(q, ctrl->max_hw_sectors);
blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
}
- if (ctrl->quirks & NVME_QUIRK_STRIPE_SIZE)
+ if ((ctrl->quirks & NVME_QUIRK_STRIPE_SIZE) &&
+ is_power_of_2(ctrl->max_hw_sectors))
blk_queue_chunk_sectors(q, ctrl->max_hw_sectors);
blk_queue_virt_boundary(q, ctrl->page_size - 1);
if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
nvme_set_queue_limits(ctrl, ns->queue);
- nvme_setup_streams_ns(ctrl, ns);
id = nvme_identify_ns(ctrl, nsid);
if (!id)
if (nvme_init_ns_head(ns, nsid, id, &new))
goto out_free_id;
+ nvme_setup_streams_ns(ctrl, ns);
#ifdef CONFIG_NVME_MULTIPATH
/*
return;
if (ns->disk && ns->disk->flags & GENHD_FL_UP) {
- if (blk_get_integrity(ns->disk))
- blk_integrity_unregister(ns->disk);
nvme_mpath_remove_disk_links(ns);
sysfs_remove_group(&disk_to_dev(ns->disk)->kobj,
&nvme_ns_id_attr_group);
nvme_nvm_unregister_sysfs(ns);
del_gendisk(ns->disk);
blk_cleanup_queue(ns->queue);
+ if (blk_get_integrity(ns->disk))
+ blk_integrity_unregister(ns->disk);
}
mutex_lock(&ns->ctrl->subsys->lock);
mutex_unlock(&ns->ctrl->namespaces_mutex);
synchronize_srcu(&ns->head->srcu);
+ nvme_mpath_check_last_path(ns);
nvme_put_ns(ns);
}
/* initiate nvme ctrl ref counting teardown */
nvme_uninit_ctrl(&ctrl->ctrl);
- nvme_put_ctrl(&ctrl->ctrl);
/* Remove core ctrl ref. */
nvme_put_ctrl(&ctrl->ctrl);
rcu_assign_pointer(head->current_path, NULL);
}
struct nvme_ns *nvme_find_path(struct nvme_ns_head *head);
+
+static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
+{
+ struct nvme_ns_head *head = ns->head;
+
+ if (head->disk && list_empty(&head->list))
+ kblockd_schedule_work(&head->requeue_work);
+}
+
#else
static inline void nvme_failover_req(struct request *req)
{
static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
{
}
+static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
+{
+}
#endif /* CONFIG_NVME_MULTIPATH */
#ifdef CONFIG_NVM
return (void **)(iod->sg + blk_rq_nr_phys_segments(req));
}
+static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
+{
+ struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+ unsigned int avg_seg_size;
+
+ avg_seg_size = DIV_ROUND_UP(blk_rq_payload_bytes(req),
+ blk_rq_nr_phys_segments(req));
+
+ if (!(dev->ctrl.sgls & ((1 << 0) | (1 << 1))))
+ return false;
+ if (!iod->nvmeq->qid)
+ return false;
+ if (!sgl_threshold || avg_seg_size < sgl_threshold)
+ return false;
+ return true;
+}
+
static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(rq);
int nseg = blk_rq_nr_phys_segments(rq);
unsigned int size = blk_rq_payload_bytes(rq);
+ iod->use_sgl = nvme_pci_use_sgls(dev, rq);
+
if (nseg > NVME_INT_PAGES || size > NVME_INT_BYTES(dev)) {
size_t alloc_size = nvme_pci_iod_alloc_size(dev, size, nseg,
iod->use_sgl);
dma_addr_t prp_dma;
int nprps, i;
- iod->use_sgl = false;
-
length -= (page_size - offset);
if (length <= 0) {
iod->first_dma = 0;
int entries = iod->nents, i = 0;
dma_addr_t sgl_dma;
- iod->use_sgl = true;
-
/* setting the transfer type as SGL */
cmd->flags = NVME_CMD_SGL_METABUF;
return BLK_STS_OK;
}
-static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
-{
- struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
- unsigned int avg_seg_size;
-
- avg_seg_size = DIV_ROUND_UP(blk_rq_payload_bytes(req),
- blk_rq_nr_phys_segments(req));
-
- if (!(dev->ctrl.sgls & ((1 << 0) | (1 << 1))))
- return false;
- if (!iod->nvmeq->qid)
- return false;
- if (!sgl_threshold || avg_seg_size < sgl_threshold)
- return false;
- return true;
-}
-
static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
struct nvme_command *cmnd)
{
DMA_ATTR_NO_WARN))
goto out;
- if (nvme_pci_use_sgls(dev, req))
+ if (iod->use_sgl)
ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw);
else
ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
nvme_start_queues(&ctrl->ctrl);
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
+ /* state change failure should never happen */
+ WARN_ON_ONCE(1);
+ return;
+ }
+
nvme_rdma_reconnect_or_remove(ctrl);
}
static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
{
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING))
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
return;
queue_work(nvme_wq, &ctrl->err_work);
nvme_stop_ctrl(&ctrl->ctrl);
nvme_rdma_shutdown_ctrl(ctrl, false);
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
+ /* state change failure should never happen */
+ WARN_ON_ONCE(1);
+ return;
+ }
+
ret = nvme_rdma_configure_admin_queue(ctrl, false);
if (ret)
goto out_fail;
const char *buf, size_t count)
{
struct fcloop_nport *nport = NULL, *tmpport;
- struct fcloop_tport *tport;
+ struct fcloop_tport *tport = NULL;
u64 nodename, portname;
unsigned long flags;
int ret;
MESON_MX_EFUSE_CNTL1_AUTO_RD_ENABLE,
MESON_MX_EFUSE_CNTL1_AUTO_RD_ENABLE);
- for (i = offset; i < offset + bytes; i += efuse->config.word_size) {
- addr = i / efuse->config.word_size;
+ for (i = 0; i < bytes; i += efuse->config.word_size) {
+ addr = (offset + i) / efuse->config.word_size;
err = meson_mx_efuse_read_addr(efuse, addr, &tmp);
if (err)
* can be looked up later */
of_node_get(child);
phy->mdio.dev.of_node = child;
+ phy->mdio.dev.fwnode = of_fwnode_handle(child);
/* All data is now stored in the phy struct;
* register it */
*/
of_node_get(child);
mdiodev->dev.of_node = child;
+ mdiodev->dev.fwnode = of_fwnode_handle(child);
/* All data is now stored in the mdiodev struct; register it. */
rc = mdio_device_register(mdiodev);
mdio->phy_mask = ~0;
mdio->dev.of_node = np;
+ mdio->dev.fwnode = of_fwnode_handle(np);
/* Get bus level PHY reset GPIO details */
mdio->reset_delay_us = DEFAULT_GPIO_RESET_DELAY;
rc = of_mdiobus_register_phy(mdio, child, addr);
else
rc = of_mdiobus_register_device(mdio, child, addr);
- if (rc)
+
+ if (rc == -ENODEV)
+ dev_err(&mdio->dev,
+ "MDIO device at address %d is missing.\n",
+ addr);
+ else if (rc)
goto unregister;
}
if (of_mdiobus_child_is_phy(child)) {
rc = of_mdiobus_register_phy(mdio, child, addr);
- if (rc)
+ if (rc && rc != -ENODEV)
goto unregister;
}
}
struct dino_device *dino_dev = irq_data_get_irq_chip_data(d);
int local_irq = gsc_find_local_irq(d->irq, dino_dev->global_irq, DINO_LOCAL_IRQS);
- DBG(KERN_WARNING "%s(0x%p, %d)\n", __func__, dino_dev, d->irq);
+ DBG(KERN_WARNING "%s(0x%px, %d)\n", __func__, dino_dev, d->irq);
/* Clear the matching bit in the IMR register */
dino_dev->imr &= ~(DINO_MASK_IRQ(local_irq));
int local_irq = gsc_find_local_irq(d->irq, dino_dev->global_irq, DINO_LOCAL_IRQS);
u32 tmp;
- DBG(KERN_WARNING "%s(0x%p, %d)\n", __func__, dino_dev, d->irq);
+ DBG(KERN_WARNING "%s(0x%px, %d)\n", __func__, dino_dev, d->irq);
/*
** clear pending IRQ bits
if (mask) {
if (--ilr_loop > 0)
goto ilr_again;
- printk(KERN_ERR "Dino 0x%p: stuck interrupt %d\n",
+ printk(KERN_ERR "Dino 0x%px: stuck interrupt %d\n",
dino_dev->hba.base_addr, mask);
return IRQ_NONE;
}
struct pci_dev *dev;
struct dino_device *dino_dev = DINO_DEV(parisc_walk_tree(bus->bridge));
- DBG(KERN_WARNING "%s(0x%p) bus %d platform_data 0x%p\n",
+ DBG(KERN_WARNING "%s(0x%px) bus %d platform_data 0x%px\n",
__func__, bus, bus->busn_res.start,
bus->bridge->platform_data);
res->flags = IORESOURCE_IO; /* do not mark it busy ! */
if (request_resource(&ioport_resource, res) < 0) {
printk(KERN_ERR "%s: request I/O Port region failed "
- "0x%lx/%lx (hpa 0x%p)\n",
+ "0x%lx/%lx (hpa 0x%px)\n",
name, (unsigned long)res->start, (unsigned long)res->end,
dino_dev->hba.base_addr);
return 1;
return retval;
}
- printk(KERN_INFO "EISA EEPROM at 0x%p\n", eisa_eeprom_addr);
+ printk(KERN_INFO "EISA EEPROM at 0x%px\n", eisa_eeprom_addr);
return 0;
}
iounmap(base_addr);
}
+
+/*
+ * The design of the Diva management card in rp34x0 machines (rp3410, rp3440)
+ * seems rushed, so that many built-in components simply don't work.
+ * The following quirks disable the serial AUX port and the built-in ATI RV100
+ * Radeon 7000 graphics card which both don't have any external connectors and
+ * thus are useless, and even worse, e.g. the AUX port occupies ttyS0 and as
+ * such makes those machines the only PARISC machines on which we can't use
+ * ttyS0 as boot console.
+ */
+static void quirk_diva_ati_card(struct pci_dev *dev)
+{
+ if (dev->subsystem_vendor != PCI_VENDOR_ID_HP ||
+ dev->subsystem_device != 0x1292)
+ return;
+
+ dev_info(&dev->dev, "Hiding Diva built-in ATI card");
+ dev->device = 0;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_RADEON_QY,
+ quirk_diva_ati_card);
+
+static void quirk_diva_aux_disable(struct pci_dev *dev)
+{
+ if (dev->subsystem_vendor != PCI_VENDOR_ID_HP ||
+ dev->subsystem_device != 0x1291)
+ return;
+
+ dev_info(&dev->dev, "Hiding Diva built-in AUX serial device");
+ dev->device = 0;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_DIVA_AUX,
+ quirk_diva_aux_disable);
int_pkt->wslot.slot = slot;
int_pkt->int_desc.vector = vector;
int_pkt->int_desc.vector_count = 1;
- int_pkt->int_desc.delivery_mode =
- (apic->irq_delivery_mode == dest_LowestPrio) ?
- dest_LowestPrio : dest_Fixed;
+ int_pkt->int_desc.delivery_mode = dest_Fixed;
/*
* Create MSI w/ dummy vCPU set, overwritten by subsequent retarget in
int_pkt->wslot.slot = slot;
int_pkt->int_desc.vector = vector;
int_pkt->int_desc.vector_count = 1;
- int_pkt->int_desc.delivery_mode =
- (apic->irq_delivery_mode == dest_LowestPrio) ?
- dest_LowestPrio : dest_Fixed;
+ int_pkt->int_desc.delivery_mode = dest_Fixed;
/*
* Create MSI w/ dummy vCPU set targeting just one vCPU, overwritten
err = rcar_pcie_get_resources(pcie);
if (err < 0) {
dev_err(dev, "failed to request resources: %d\n", err);
- goto err_free_bridge;
+ goto err_free_resource_list;
}
err = rcar_pcie_parse_map_dma_ranges(pcie, dev->of_node);
if (err)
- goto err_free_bridge;
+ goto err_free_resource_list;
pm_runtime_enable(dev);
err = pm_runtime_get_sync(dev);
err_pm_disable:
pm_runtime_disable(dev);
-err_free_bridge:
- pci_free_host_bridge(bridge);
+err_free_resource_list:
pci_free_resource_list(&pcie->resources);
+ pci_free_host_bridge(bridge);
return err;
}
* the subsequent "thaw" callbacks for the device.
*/
if (dev_pm_smart_suspend_and_suspended(dev)) {
- dev->power.direct_complete = true;
+ dev_pm_skip_next_resume_phases(dev);
return 0;
}
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_resume_early(dev);
- pci_update_current_state(pci_dev, PCI_D0);
+ /*
+ * pci_restore_state() requires the device to be in D0 (because of MSI
+ * restoration among other things), so force it into D0 in case the
+ * driver's "freeze" callbacks put it into a low-power state directly.
+ */
+ pci_set_power_state(pci_dev, PCI_D0);
pci_restore_state(pci_dev);
if (drv && drv->pm && drv->pm->thaw_noirq)
int irq, error;
irq = platform_get_irq_byname(pdev, name);
- if (!irq)
+ if (irq < 0)
return -ENODEV;
error = devm_request_threaded_irq(ddata->dev, irq, NULL,
tristate "Renesas R-Car generation 3 USB 2.0 PHY driver"
depends on ARCH_RENESAS
depends on EXTCON
+ depends on USB_SUPPORT
select GENERIC_PHY
+ select USB_COMMON
help
Support for USB 2.0 PHY found on Renesas R-Car generation 3 SoCs.
if (IS_ERR(phy)) {
dev_err(dev, "failed to create phy: %s\n",
child_np->name);
+ pm_runtime_disable(dev);
return PTR_ERR(phy);
}
phy_provider = devm_of_phy_provider_register(dev, of_phy_simple_xlate);
if (IS_ERR(phy_provider)) {
dev_err(dev, "Failed to register phy provider\n");
+ pm_runtime_disable(dev);
return PTR_ERR(phy_provider);
}
static struct device_node *
tegra_xusb_find_pad_node(struct tegra_xusb_padctl *padctl, const char *name)
{
- /*
- * of_find_node_by_name() drops a reference, so make sure to grab one.
- */
- struct device_node *np = of_node_get(padctl->dev->of_node);
+ struct device_node *pads, *np;
+
+ pads = of_get_child_by_name(padctl->dev->of_node, "pads");
+ if (!pads)
+ return NULL;
- np = of_find_node_by_name(np, "pads");
- if (np)
- np = of_find_node_by_name(np, name);
+ np = of_get_child_by_name(pads, name);
+ of_node_put(pads);
return np;
}
static struct device_node *
tegra_xusb_pad_find_phy_node(struct tegra_xusb_pad *pad, unsigned int index)
{
- /*
- * of_find_node_by_name() drops a reference, so make sure to grab one.
- */
- struct device_node *np = of_node_get(pad->dev.of_node);
+ struct device_node *np, *lanes;
- np = of_find_node_by_name(np, "lanes");
- if (!np)
+ lanes = of_get_child_by_name(pad->dev.of_node, "lanes");
+ if (!lanes)
return NULL;
- return of_find_node_by_name(np, pad->soc->lanes[index].name);
+ np = of_get_child_by_name(lanes, pad->soc->lanes[index].name);
+ of_node_put(lanes);
+
+ return np;
}
static int
unsigned int i;
int err;
- children = of_find_node_by_name(pad->dev.of_node, "lanes");
+ children = of_get_child_by_name(pad->dev.of_node, "lanes");
if (!children)
return -ENODEV;
tegra_xusb_find_port_node(struct tegra_xusb_padctl *padctl, const char *type,
unsigned int index)
{
- /*
- * of_find_node_by_name() drops a reference, so make sure to grab one.
- */
- struct device_node *np = of_node_get(padctl->dev->of_node);
+ struct device_node *ports, *np;
+ char *name;
- np = of_find_node_by_name(np, "ports");
- if (np) {
- char *name;
+ ports = of_get_child_by_name(padctl->dev->of_node, "ports");
+ if (!ports)
+ return NULL;
- name = kasprintf(GFP_KERNEL, "%s-%u", type, index);
- if (!name)
- return ERR_PTR(-ENOMEM);
- np = of_find_node_by_name(np, name);
- kfree(name);
+ name = kasprintf(GFP_KERNEL, "%s-%u", type, index);
+ if (!name) {
+ of_node_put(ports);
+ return ERR_PTR(-ENOMEM);
}
+ np = of_get_child_by_name(ports, name);
+ kfree(name);
+ of_node_put(ports);
return np;
}
static int tegra_xusb_padctl_probe(struct platform_device *pdev)
{
- struct device_node *np = of_node_get(pdev->dev.of_node);
+ struct device_node *np = pdev->dev.of_node;
const struct tegra_xusb_padctl_soc *soc;
struct tegra_xusb_padctl *padctl;
const struct of_device_id *match;
int err;
/* for backwards compatibility with old device trees */
- np = of_find_node_by_name(np, "pads");
+ np = of_get_child_by_name(np, "pads");
if (!np) {
dev_warn(&pdev->dev, "deprecated DT, using legacy driver\n");
return tegra_xusb_padctl_legacy_probe(pdev);
clear_bit(i, chip->irq.valid_mask);
}
+ /*
+ * The same set of machines in chv_no_valid_mask[] have incorrectly
+ * configured GPIOs that generate spurious interrupts so we use
+ * this same list to apply another quirk for them.
+ *
+ * See also https://bugzilla.kernel.org/show_bug.cgi?id=197953.
+ */
+ if (!need_valid_mask) {
+ /*
+ * Mask all interrupts the community is able to generate
+ * but leave the ones that can only generate GPEs unmasked.
+ */
+ chv_writel(GENMASK(31, pctrl->community->nirqs),
+ pctrl->regs + CHV_INTMASK);
+ }
+
/* Clear all interrupts */
chv_writel(0xffff, pctrl->regs + CHV_INTSTAT);
*/
static struct lock_class_key pcs_lock_class;
+/* Class for the IRQ request mutex */
+static struct lock_class_key pcs_request_class;
+
/*
* REVISIT: Reads and writes could eventually use regmap or something
* generic. But at least on omaps, some mux registers are performance
irq_set_chip_data(irq, pcs_soc);
irq_set_chip_and_handler(irq, &pcs->chip,
handle_level_irq);
- irq_set_lockdep_class(irq, &pcs_lock_class);
+ irq_set_lockdep_class(irq, &pcs_lock_class, &pcs_request_class);
irq_set_noprobe(irq);
return 0;
}
static int stm32_gpio_domain_activate(struct irq_domain *d,
- struct irq_data *irq_data, bool early)
+ struct irq_data *irq_data, bool reserve)
{
struct stm32_gpio_bank *bank = d->host_data;
struct stm32_pinctrl *pctl = dev_get_drvdata(bank->gpio_chip.parent);
return;
}
input_report_key(data->idev, KEY_RFKILL, 1);
+ input_sync(data->idev);
input_report_key(data->idev, KEY_RFKILL, 0);
input_sync(data->idev);
}
struct quirk_entry {
u8 touchpad_led;
+ u8 kbd_led_levels_off_1;
int needs_kbd_timeouts;
/*
.kbd_timeouts = { 0, 5, 15, 60, 5 * 60, 15 * 60, -1 },
};
+static struct quirk_entry quirk_dell_latitude_e6410 = {
+ .kbd_led_levels_off_1 = 1,
+};
+
static struct platform_driver platform_driver = {
.driver = {
.name = "dell-laptop",
},
.driver_data = &quirk_dell_xps13_9333,
},
+ {
+ .callback = dmi_matched,
+ .ident = "Dell Latitude E6410",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Latitude E6410"),
+ },
+ .driver_data = &quirk_dell_latitude_e6410,
+ },
{ }
};
units = (buffer->output[2] >> 8) & 0xFF;
info->levels = (buffer->output[2] >> 16) & 0xFF;
+ if (quirks && quirks->kbd_led_levels_off_1 && info->levels)
+ info->levels--;
+
if (units & BIT(0))
info->seconds = (buffer->output[3] >> 0) & 0xFF;
if (units & BIT(1))
int ret;
buffer = kzalloc(sizeof(struct calling_interface_buffer), GFP_KERNEL);
+ if (!buffer)
+ return -ENOMEM;
buffer->cmd_class = CLASS_INFO;
buffer->cmd_select = SELECT_APP_REGISTRATION;
buffer->input[0] = 0x10000;
class_unregister(&wmi_bus_class);
}
-subsys_initcall(acpi_wmi_init);
+subsys_initcall_sync(acpi_wmi_init);
module_exit(acpi_wmi_exit);
erp = dasd_3990_erp_handle_match_erp(cqr, erp);
}
+
+ /*
+ * For path verification work we need to stick with the path that was
+ * originally chosen so that the per path configuration data is
+ * assigned correctly.
+ */
+ if (test_bit(DASD_CQR_VERIFY_PATH, &erp->flags) && cqr->lpm) {
+ erp->lpm = cqr->lpm;
+ }
+
if (device->features & DASD_FEATURE_ERPLOG) {
/* print current erp_chain */
dev_err(&device->cdev->dev,
CFLAGS_sclp_early_core.o += -march=z900
endif
+CFLAGS_sclp_early_core.o += -D__NO_FORTIFY
+
obj-y += ctrlchar.o keyboard.o defkeymap.o sclp.o sclp_rw.o sclp_quiesce.o \
sclp_cmd.o sclp_config.o sclp_cpi_sys.o sclp_ocf.o sclp_ctl.o \
sclp_early.o sclp_early_core.o
};
struct qeth_ipato {
- int enabled;
- int invert4;
- int invert6;
+ bool enabled;
+ bool invert4;
+ bool invert6;
struct list_head entries;
};
qeth_set_intial_options(card);
/* IP address takeover */
INIT_LIST_HEAD(&card->ipato.entries);
- card->ipato.enabled = 0;
- card->ipato.invert4 = 0;
- card->ipato.invert6 = 0;
+ card->ipato.enabled = false;
+ card->ipato.invert4 = false;
+ card->ipato.invert6 = false;
/* init QDIO stuff */
qeth_init_qdio_info(card);
INIT_DELAYED_WORK(&card->buffer_reclaim_work, qeth_buffer_reclaim_work);
}
EXPORT_SYMBOL_GPL(qeth_poll);
+static int qeth_setassparms_inspect_rc(struct qeth_ipa_cmd *cmd)
+{
+ if (!cmd->hdr.return_code)
+ cmd->hdr.return_code = cmd->data.setassparms.hdr.return_code;
+ return cmd->hdr.return_code;
+}
+
int qeth_setassparms_cb(struct qeth_card *card,
struct qeth_reply *reply, unsigned long data)
{
(struct qeth_checksum_cmd *)reply->param;
QETH_CARD_TEXT(card, 4, "chkdoccb");
- if (cmd->hdr.return_code)
+ if (qeth_setassparms_inspect_rc(cmd))
return 0;
memset(chksum_cb, 0, sizeof(*chksum_cb));
int qeth_l3_add_rxip(struct qeth_card *, enum qeth_prot_versions, const u8 *);
void qeth_l3_del_rxip(struct qeth_card *card, enum qeth_prot_versions,
const u8 *);
-int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *, struct qeth_ipaddr *);
+void qeth_l3_update_ipato(struct qeth_card *card);
struct qeth_ipaddr *qeth_l3_get_addr_buffer(enum qeth_prot_versions);
int qeth_l3_add_ip(struct qeth_card *, struct qeth_ipaddr *);
int qeth_l3_delete_ip(struct qeth_card *, struct qeth_ipaddr *);
}
}
-int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card,
- struct qeth_ipaddr *addr)
+static bool qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card,
+ struct qeth_ipaddr *addr)
{
struct qeth_ipato_entry *ipatoe;
u8 addr_bits[128] = {0, };
if (!card->ipato.enabled)
return 0;
+ if (addr->type != QETH_IP_TYPE_NORMAL)
+ return 0;
qeth_l3_convert_addr_to_bits((u8 *) &addr->u, addr_bits,
(addr->proto == QETH_PROT_IPV4)? 4:16);
memcpy(addr, tmp_addr, sizeof(struct qeth_ipaddr));
addr->ref_counter = 1;
- if (addr->type == QETH_IP_TYPE_NORMAL &&
- qeth_l3_is_addr_covered_by_ipato(card, addr)) {
+ if (qeth_l3_is_addr_covered_by_ipato(card, addr)) {
QETH_CARD_TEXT(card, 2, "tkovaddr");
addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
}
/*
* IP address takeover related functions
*/
+
+/**
+ * qeth_l3_update_ipato() - Update 'takeover' property, for all NORMAL IPs.
+ *
+ * Caller must hold ip_lock.
+ */
+void qeth_l3_update_ipato(struct qeth_card *card)
+{
+ struct qeth_ipaddr *addr;
+ unsigned int i;
+
+ hash_for_each(card->ip_htable, i, addr, hnode) {
+ if (addr->type != QETH_IP_TYPE_NORMAL)
+ continue;
+ if (qeth_l3_is_addr_covered_by_ipato(card, addr))
+ addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG;
+ else
+ addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG;
+ }
+}
+
static void qeth_l3_clear_ipato_list(struct qeth_card *card)
{
struct qeth_ipato_entry *ipatoe, *tmp;
kfree(ipatoe);
}
+ qeth_l3_update_ipato(card);
spin_unlock_bh(&card->ip_lock);
}
}
}
- if (!rc)
+ if (!rc) {
list_add_tail(&new->entry, &card->ipato.entries);
+ qeth_l3_update_ipato(card);
+ }
spin_unlock_bh(&card->ip_lock);
(proto == QETH_PROT_IPV4)? 4:16) &&
(ipatoe->mask_bits == mask_bits)) {
list_del(&ipatoe->entry);
+ qeth_l3_update_ipato(card);
kfree(ipatoe);
}
}
struct device_attribute *attr, const char *buf, size_t count)
{
struct qeth_card *card = dev_get_drvdata(dev);
- struct qeth_ipaddr *addr;
- int i, rc = 0;
+ bool enable;
+ int rc = 0;
if (!card)
return -EINVAL;
}
if (sysfs_streq(buf, "toggle")) {
- card->ipato.enabled = (card->ipato.enabled)? 0 : 1;
- } else if (sysfs_streq(buf, "1")) {
- card->ipato.enabled = 1;
- hash_for_each(card->ip_htable, i, addr, hnode) {
- if ((addr->type == QETH_IP_TYPE_NORMAL) &&
- qeth_l3_is_addr_covered_by_ipato(card, addr))
- addr->set_flags |=
- QETH_IPA_SETIP_TAKEOVER_FLAG;
- }
- } else if (sysfs_streq(buf, "0")) {
- card->ipato.enabled = 0;
- hash_for_each(card->ip_htable, i, addr, hnode) {
- if (addr->set_flags &
- QETH_IPA_SETIP_TAKEOVER_FLAG)
- addr->set_flags &=
- ~QETH_IPA_SETIP_TAKEOVER_FLAG;
- }
- } else
+ enable = !card->ipato.enabled;
+ } else if (kstrtobool(buf, &enable)) {
rc = -EINVAL;
+ goto out;
+ }
+
+ if (card->ipato.enabled != enable) {
+ card->ipato.enabled = enable;
+ spin_lock_bh(&card->ip_lock);
+ qeth_l3_update_ipato(card);
+ spin_unlock_bh(&card->ip_lock);
+ }
out:
mutex_unlock(&card->conf_mutex);
return rc ? rc : count;
const char *buf, size_t count)
{
struct qeth_card *card = dev_get_drvdata(dev);
+ bool invert;
int rc = 0;
if (!card)
return -EINVAL;
mutex_lock(&card->conf_mutex);
- if (sysfs_streq(buf, "toggle"))
- card->ipato.invert4 = (card->ipato.invert4)? 0 : 1;
- else if (sysfs_streq(buf, "1"))
- card->ipato.invert4 = 1;
- else if (sysfs_streq(buf, "0"))
- card->ipato.invert4 = 0;
- else
+ if (sysfs_streq(buf, "toggle")) {
+ invert = !card->ipato.invert4;
+ } else if (kstrtobool(buf, &invert)) {
rc = -EINVAL;
+ goto out;
+ }
+
+ if (card->ipato.invert4 != invert) {
+ card->ipato.invert4 = invert;
+ spin_lock_bh(&card->ip_lock);
+ qeth_l3_update_ipato(card);
+ spin_unlock_bh(&card->ip_lock);
+ }
+out:
mutex_unlock(&card->conf_mutex);
return rc ? rc : count;
}
struct device_attribute *attr, const char *buf, size_t count)
{
struct qeth_card *card = dev_get_drvdata(dev);
+ bool invert;
int rc = 0;
if (!card)
return -EINVAL;
mutex_lock(&card->conf_mutex);
- if (sysfs_streq(buf, "toggle"))
- card->ipato.invert6 = (card->ipato.invert6)? 0 : 1;
- else if (sysfs_streq(buf, "1"))
- card->ipato.invert6 = 1;
- else if (sysfs_streq(buf, "0"))
- card->ipato.invert6 = 0;
- else
+ if (sysfs_streq(buf, "toggle")) {
+ invert = !card->ipato.invert6;
+ } else if (kstrtobool(buf, &invert)) {
rc = -EINVAL;
+ goto out;
+ }
+
+ if (card->ipato.invert6 != invert) {
+ card->ipato.invert6 = invert;
+ spin_lock_bh(&card->ip_lock);
+ qeth_l3_update_ipato(card);
+ spin_unlock_bh(&card->ip_lock);
+ }
+out:
mutex_unlock(&card->conf_mutex);
return rc ? rc : count;
}
#define FIB_CONTEXT_FLAG_NATIVE_HBA (0x00000010)
#define FIB_CONTEXT_FLAG_NATIVE_HBA_TMF (0x00000020)
#define FIB_CONTEXT_FLAG_SCSI_CMD (0x00000040)
+#define FIB_CONTEXT_FLAG_EH_RESET (0x00000080)
/*
* Define the command values
/* Synchronize our watches */
if (((NSEC_PER_SEC - (NSEC_PER_SEC / HZ)) > now.tv_nsec)
&& (now.tv_nsec > (NSEC_PER_SEC / HZ)))
- difference = (((NSEC_PER_SEC - now.tv_nsec) * HZ)
- + NSEC_PER_SEC / 2) / NSEC_PER_SEC;
+ difference = HZ + HZ / 2 -
+ now.tv_nsec / (NSEC_PER_SEC / HZ);
else {
if (now.tv_nsec > NSEC_PER_SEC / 2)
++now.tv_sec;
if (kthread_should_stop())
break;
+ /*
+ * we probably want usleep_range() here instead of the
+ * jiffies computation
+ */
schedule_timeout(difference);
if (kthread_should_stop())
info = &aac->hba_map[bus][cid];
if (bus >= AAC_MAX_BUSES || cid >= AAC_MAX_TARGETS ||
info->devtype != AAC_DEVTYPE_NATIVE_RAW) {
- fib->flags |= FIB_CONTEXT_FLAG_TIMED_OUT;
+ fib->flags |= FIB_CONTEXT_FLAG_EH_RESET;
cmd->SCp.phase = AAC_OWNER_ERROR_HANDLER;
}
}
struct fc_bsg_request *bsg_request = job->request;
struct fc_bsg_reply *bsg_reply = job->reply;
uint32_t vendor_cmd = bsg_request->rqst_data.h_vendor.vendor_cmd[0];
- struct bfad_im_port_s *im_port = shost_priv(fc_bsg_to_shost(job));
+ struct Scsi_Host *shost = fc_bsg_to_shost(job);
+ struct bfad_im_port_s *im_port = bfad_get_im_port(shost);
struct bfad_s *bfad = im_port->bfad;
void *payload_kbuf;
int rc = -EINVAL;
bfad_im_bsg_els_ct_request(struct bsg_job *job)
{
struct bfa_bsg_data *bsg_data;
- struct bfad_im_port_s *im_port = shost_priv(fc_bsg_to_shost(job));
+ struct Scsi_Host *shost = fc_bsg_to_shost(job);
+ struct bfad_im_port_s *im_port = bfad_get_im_port(shost);
struct bfad_s *bfad = im_port->bfad;
bfa_bsg_fcpt_t *bsg_fcpt;
struct bfad_fcxp *drv_fcxp;
bfad_im_scsi_host_alloc(struct bfad_s *bfad, struct bfad_im_port_s *im_port,
struct device *dev)
{
+ struct bfad_im_port_pointer *im_portp;
int error = 1;
mutex_lock(&bfad_mutex);
goto out_free_idr;
}
- im_port->shost->hostdata[0] = (unsigned long)im_port;
+ im_portp = shost_priv(im_port->shost);
+ im_portp->p = im_port;
im_port->shost->unique_id = im_port->idr_id;
im_port->shost->this_id = -1;
im_port->shost->max_id = MAX_FCP_TARGET;
sht->sg_tablesize = bfad->cfg_data.io_max_sge;
- return scsi_host_alloc(sht, sizeof(unsigned long));
+ return scsi_host_alloc(sht, sizeof(struct bfad_im_port_pointer));
}
void
struct fc_vport *fc_vport;
};
+struct bfad_im_port_pointer {
+ struct bfad_im_port_s *p;
+};
+
+static inline struct bfad_im_port_s *bfad_get_im_port(struct Scsi_Host *host)
+{
+ struct bfad_im_port_pointer *im_portp = shost_priv(host);
+ return im_portp->p;
+}
+
enum bfad_itnim_state {
ITNIM_STATE_NONE,
ITNIM_STATE_ONLINE,
case ELS_FLOGI:
if (!lport->point_to_multipoint)
fc_lport_recv_flogi_req(lport, fp);
+ else
+ fc_rport_recv_req(lport, fp);
break;
case ELS_LOGO:
if (fc_frame_sid(fp) == FC_FID_FLOGI)
fc_lport_recv_logo_req(lport, fp);
+ else
+ fc_rport_recv_req(lport, fp);
break;
case ELS_RSCN:
lport->tt.disc_recv_req(lport, fp);
struct sas_rphy *rphy)
{
struct domain_device *dev;
- unsigned int reslen = 0;
+ unsigned int rcvlen = 0;
int ret = -EINVAL;
/* no rphy means no smp target support (ie aic94xx host) */
ret = smp_execute_task_sg(dev, job->request_payload.sg_list,
job->reply_payload.sg_list);
- if (ret > 0) {
- /* positive number is the untransferred residual */
- reslen = ret;
+ if (ret >= 0) {
+ /* bsg_job_done() requires the length received */
+ rcvlen = job->reply_payload.payload_len - ret;
ret = 0;
}
out:
- bsg_job_done(job, ret, reslen);
+ bsg_job_done(job, ret, rcvlen);
}
drqe.address_hi = putPaddrHigh(rqb_entry->dbuf.phys);
rc = lpfc_sli4_rq_put(rqb_entry->hrq, rqb_entry->drq, &hrqe, &drqe);
if (rc < 0) {
- (rqbp->rqb_free_buffer)(phba, rqb_entry);
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"6409 Cannot post to RQ %d: %x %x\n",
rqb_entry->hrq->queue_id,
rqb_entry->hrq->host_index,
rqb_entry->hrq->hba_index);
+ (rqbp->rqb_free_buffer)(phba, rqb_entry);
} else {
list_add_tail(&rqb_entry->hbuf.list, &rqbp->rqb_buffer_list);
rqbp->buffer_count++;
return req;
for_each_bio(bio) {
- ret = blk_rq_append_bio(req, bio);
+ struct bio *bounce_bio = bio;
+
+ ret = blk_rq_append_bio(req, &bounce_bio);
if (ret)
return ERR_PTR(ret);
}
{
struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
- char buf[80];
+ const u8 *const cdb = READ_ONCE(cmd->cmnd);
+ char buf[80] = "(?)";
- __scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
+ if (cdb)
+ __scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
cmd->retries, msecs / 1000, msecs % 1000);
}
};
-static const char spaces[] = " "; /* 16 of them */
static blist_flags_t scsi_default_dev_flags;
static LIST_HEAD(scsi_dev_info_list);
static char scsi_dev_flags[256];
size_t from_length;
from_length = strlen(from);
- strncpy(to, from, min(to_length, from_length));
- if (from_length < to_length) {
- if (compatible) {
- /*
- * NUL terminate the string if it is short.
- */
- to[from_length] = '\0';
- } else {
- /*
- * space pad the string if it is short.
- */
- strncpy(&to[from_length], spaces,
- to_length - from_length);
- }
+ /* This zero-pads the destination */
+ strncpy(to, from, to_length);
+ if (from_length < to_length && !compatible) {
+ /*
+ * space pad the string if it is short.
+ */
+ memset(&to[from_length], ' ', to_length - from_length);
}
if (from_length > to_length)
printk(KERN_WARNING "%s: %s string '%s' is too long\n",
model, compatible);
if (strflags)
- devinfo->flags = simple_strtoul(strflags, NULL, 0);
- else
- devinfo->flags = flags;
-
+ flags = (__force blist_flags_t)simple_strtoul(strflags, NULL, 0);
+ devinfo->flags = flags;
devinfo->compatible = compatible;
if (compatible)
/*
* vendor strings must be an exact match
*/
- if (vmax != strlen(devinfo->vendor) ||
+ if (vmax != strnlen(devinfo->vendor,
+ sizeof(devinfo->vendor)) ||
memcmp(devinfo->vendor, vskip, vmax))
continue;
* @model specifies the full string, and
* must be larger or equal to devinfo->model
*/
- mlen = strlen(devinfo->model);
+ mlen = strnlen(devinfo->model, sizeof(devinfo->model));
if (mmax < mlen || memcmp(devinfo->model, mskip, mlen))
continue;
return devinfo;
out_put_device:
put_device(&sdev->sdev_gendev);
out:
+ if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
+ blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
return false;
}
* SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized
**/
static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
- int *bflags, int async)
+ blist_flags_t *bflags, int async)
{
int ret;
* - SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized
**/
static int scsi_probe_and_add_lun(struct scsi_target *starget,
- u64 lun, int *bflagsp,
+ u64 lun, blist_flags_t *bflagsp,
struct scsi_device **sdevp,
enum scsi_scan_mode rescan,
void *hostdata)
{
struct scsi_device *sdev;
unsigned char *result;
- int bflags, res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
+ blist_flags_t bflags;
+ int res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
/*
* Modifies sdevscan->lun.
**/
static void scsi_sequential_lun_scan(struct scsi_target *starget,
- int bflags, int scsi_level,
+ blist_flags_t bflags, int scsi_level,
enum scsi_scan_mode rescan)
{
uint max_dev_lun;
* 0: scan completed (or no memory, so further scanning is futile)
* 1: could not scan with REPORT LUN
**/
-static int scsi_report_lun_scan(struct scsi_target *starget, int bflags,
+static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t bflags,
enum scsi_scan_mode rescan)
{
unsigned char scsi_cmd[MAX_COMMAND_SIZE];
unsigned int id, u64 lun, enum scsi_scan_mode rescan)
{
struct Scsi_Host *shost = dev_to_shost(parent);
- int bflags = 0;
+ blist_flags_t bflags = 0;
int res;
struct scsi_target *starget;
}
static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
-#define BLIST_FLAG_NAME(name) [ilog2(BLIST_##name)] = #name
+#define BLIST_FLAG_NAME(name) \
+ [ilog2((__force unsigned int)BLIST_##name)] = #name
static const char *const sdev_bflags_name[] = {
#include "scsi_devinfo_tbl.c"
};
for (i = 0; i < sizeof(sdev->sdev_bflags) * BITS_PER_BYTE; i++) {
const char *name = NULL;
- if (!(sdev->sdev_bflags & BIT(i)))
+ if (!(sdev->sdev_bflags & (__force blist_flags_t)BIT(i)))
continue;
if (i < ARRAY_SIZE(sdev_bflags_name) && sdev_bflags_name[i])
name = sdev_bflags_name[i];
* check.
*/
if (sdev->channel != starget->channel ||
- sdev->id != starget->id ||
+ sdev->id != starget->id)
+ continue;
+ if (sdev->sdev_state == SDEV_DEL ||
+ sdev->sdev_state == SDEV_CANCEL ||
!get_device(&sdev->sdev_gendev))
continue;
spin_unlock_irqrestore(shost->host_lock, flags);
/* Our blacklist flags */
enum {
- SPI_BLIST_NOIUS = 0x1,
+ SPI_BLIST_NOIUS = (__force blist_flags_t)0x1,
};
/* blacklist table, modelled on scsi_devinfo.c */
static struct {
char *vendor;
char *model;
- unsigned flags;
+ blist_flags_t flags;
} spi_static_device_list[] __initdata = {
{"HP", "Ultrium 3-SCSI", SPI_BLIST_NOIUS },
{"IBM", "ULTRIUM-TD3", SPI_BLIST_NOIUS },
{
struct scsi_device *sdev = to_scsi_device(dev);
struct scsi_target *starget = sdev->sdev_target;
- unsigned bflags = scsi_get_device_flags_keyed(sdev, &sdev->inquiry[8],
- &sdev->inquiry[16],
- SCSI_DEVINFO_SPI);
+ blist_flags_t bflags;
+
+ bflags = scsi_get_device_flags_keyed(sdev, &sdev->inquiry[8],
+ &sdev->inquiry[16],
+ SCSI_DEVINFO_SPI);
/* Populate the target capability fields with the values
* gleaned from the device inquiry */
static void sd_uninit_command(struct scsi_cmnd *SCpnt)
{
struct request *rq = SCpnt->request;
+ u8 *cmnd;
if (SCpnt->flags & SCMD_ZONE_WRITE_LOCK)
sd_zbc_write_unlock_zone(SCpnt);
__free_page(rq->special_vec.bv_page);
if (SCpnt->cmnd != scsi_req(rq)->cmd) {
- mempool_free(SCpnt->cmnd, sd_cdb_pool);
+ cmnd = SCpnt->cmnd;
SCpnt->cmnd = NULL;
SCpnt->cmd_len = 0;
+ mempool_free(cmnd, sd_cdb_pool);
}
}
case TEST_UNIT_READY:
break;
default:
- set_host_byte(scmnd, DID_TARGET_FAILURE);
+ set_host_byte(scmnd, DID_ERROR);
}
break;
case SRB_STATUS_INVALID_LUN:
+ set_host_byte(scmnd, DID_NO_CONNECT);
do_work = true;
process_err_fn = storvsc_remove_lun;
break;
#define A3700_SPI_BYTE_LEN BIT(5)
#define A3700_SPI_CLK_PRESCALE BIT(0)
#define A3700_SPI_CLK_PRESCALE_MASK (0x1f)
+#define A3700_SPI_CLK_EVEN_OFFS (0x10)
#define A3700_SPI_WFIFO_THRS_BIT 28
#define A3700_SPI_RFIFO_THRS_BIT 24
prescale = DIV_ROUND_UP(clk_get_rate(a3700_spi->clk), speed_hz);
+ /* For prescaler values over 15, we can only set it by steps of 2.
+ * Starting from A3700_SPI_CLK_EVEN_OFFS, we set values from 0 up to
+ * 30. We only use this range from 16 to 30.
+ */
+ if (prescale > 15)
+ prescale = A3700_SPI_CLK_EVEN_OFFS + DIV_ROUND_UP(prescale, 2);
+
val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG);
val = val & ~A3700_SPI_CLK_PRESCALE_MASK;
pm_runtime_get_sync(&pdev->dev);
/* reset the hardware and block queue progress */
- spin_lock_irq(&as->lock);
if (as->use_dma) {
atmel_spi_stop_dma(master);
atmel_spi_release_dma(master);
}
+ spin_lock_irq(&as->lock);
spi_writel(as, CR, SPI_BIT(SWRST));
spi_writel(as, CR, SPI_BIT(SWRST)); /* AT91SAM9263 Rev B workaround */
spi_readl(as, SR);
/* Sets SPCMD */
rspi_write16(rspi, rspi->spcmd, RSPI_SPCMD0);
- /* Enables SPI function in master mode */
- rspi_write8(rspi, SPCR_SPE | SPCR_MSTR, RSPI_SPCR);
+ /* Sets RSPI mode */
+ rspi_write8(rspi, SPCR_MSTR, RSPI_SPCR);
return 0;
}
static int sun4i_spi_remove(struct platform_device *pdev)
{
- pm_runtime_disable(&pdev->dev);
+ pm_runtime_force_suspend(&pdev->dev);
return 0;
}
while (remaining_words) {
int n_words, tx_words, rx_words;
u32 sr;
+ int stalled;
n_words = min(remaining_words, xspi->buffer_size);
/* Read out all the data from the Rx FIFO */
rx_words = n_words;
+ stalled = 10;
while (rx_words) {
+ if (rx_words == n_words && !(stalled--) &&
+ !(sr & XSPI_SR_TX_EMPTY_MASK) &&
+ (sr & XSPI_SR_RX_EMPTY_MASK)) {
+ dev_err(&spi->dev,
+ "Detected stall. Check C_SPI_MODE and C_SPI_MEMORY\n");
+ xspi_init_hw(xspi);
+ return -EIO;
+ }
+
if ((sr & XSPI_SR_TX_EMPTY_MASK) && (rx_words > 1)) {
xilinx_spi_rx(xspi);
rx_words--;
config ION_CMA_HEAP
bool "Ion CMA heap support"
- depends on ION && CMA
+ depends on ION && DMA_CMA
help
Choose this option to enable CMA heaps with Ion. This heap is backed
by the Contiguous Memory Allocator (CMA). If your system has these
mutex_lock(&buffer->lock);
list_for_each_entry(a, &buffer->attachments, list) {
dma_sync_sg_for_cpu(a->dev, a->table->sgl, a->table->nents,
- DMA_BIDIRECTIONAL);
+ direction);
}
mutex_unlock(&buffer->lock);
mutex_lock(&buffer->lock);
list_for_each_entry(a, &buffer->attachments, list) {
dma_sync_sg_for_device(a->dev, a->table->sgl, a->table->nents,
- DMA_BIDIRECTIONAL);
+ direction);
}
mutex_unlock(&buffer->lock);
struct ion_cma_heap *cma_heap = to_cma_heap(heap);
struct sg_table *table;
struct page *pages;
+ unsigned long size = PAGE_ALIGN(len);
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ unsigned long align = get_order(size);
int ret;
- pages = cma_alloc(cma_heap->cma, len, 0, GFP_KERNEL);
+ if (align > CONFIG_CMA_ALIGNMENT)
+ align = CONFIG_CMA_ALIGNMENT;
+
+ pages = cma_alloc(cma_heap->cma, nr_pages, align, GFP_KERNEL);
if (!pages)
return -ENOMEM;
if (ret)
goto free_mem;
- sg_set_page(table->sgl, pages, len, 0);
+ sg_set_page(table->sgl, pages, size, 0);
buffer->priv_virt = pages;
buffer->sg_table = table;
free_mem:
kfree(table);
err:
- cma_release(cma_heap->cma, pages, buffer->size);
+ cma_release(cma_heap->cma, pages, nr_pages);
return -ENOMEM;
}
{
struct ion_cma_heap *cma_heap = to_cma_heap(buffer->heap);
struct page *pages = buffer->priv_virt;
+ unsigned long nr_pages = PAGE_ALIGN(buffer->size) >> PAGE_SHIFT;
/* release memory */
- cma_release(cma_heap->cma, pages, buffer->size);
+ cma_release(cma_heap->cma, pages, nr_pages);
/* release sg table */
sg_free_table(buffer->sg_table);
kfree(buffer->sg_table);
struct device *dev = drvdata_to_dev(ctx->drvdata);
struct ahash_req_ctx *state = ahash_request_ctx(req);
u32 tmp;
- int rc;
+ int rc = 0;
memcpy(&tmp, in, sizeof(u32));
if (tmp != CC_EXPORT_MAGIC) {
ksocknal_nid2peerlist(id.nid));
}
- route2 = NULL;
list_for_each_entry(route2, &peer->ksnp_routes, ksnr_list) {
- if (route2->ksnr_ipaddr == ipaddr)
- break;
-
- route2 = NULL;
- }
- if (!route2) {
- ksocknal_add_route_locked(peer, route);
- route->ksnr_share_count++;
- } else {
- ksocknal_route_decref(route);
- route2->ksnr_share_count++;
+ if (route2->ksnr_ipaddr == ipaddr) {
+ /* Route already exists, use the old one */
+ ksocknal_route_decref(route);
+ route2->ksnr_share_count++;
+ goto out;
+ }
}
-
+ /* Route doesn't already exist, add the new one */
+ ksocknal_add_route_locked(peer, route);
+ route->ksnr_share_count++;
+out:
write_unlock_bh(&ksocknal_data.ksnd_global_lock);
return 0;
currentValue = READ_REG(REG_DATAMODUL);
- switch (currentValue & MASK_DATAMODUL_MODULATION_TYPE >> 3) { // TODO improvement: change 3 to define
+ switch (currentValue & MASK_DATAMODUL_MODULATION_TYPE) {
case DATAMODUL_MODULATION_TYPE_OOK: return OOK;
case DATAMODUL_MODULATION_TYPE_FSK: return FSK;
default: return undefined;
" %d i: %d bio: %p, allocating another"
" bio\n", bio->bi_vcnt, i, bio);
- rc = blk_rq_append_bio(req, bio);
+ rc = blk_rq_append_bio(req, &bio);
if (rc) {
pr_err("pSCSI: failed to append bio\n");
goto fail;
}
if (bio) {
- rc = blk_rq_append_bio(req, bio);
+ rc = blk_rq_append_bio(req, &bio);
if (rc) {
pr_err("pSCSI: failed to append bio\n");
goto fail;
return;
if (ring->start_poll) {
- __ring_interrupt_mask(ring, false);
+ __ring_interrupt_mask(ring, true);
ring->start_poll(ring->poll_data);
} else {
schedule_work(&ring->work);
{
struct n_tty_data *ldata = tty->disc_data;
- if (!old || (old->c_lflag ^ tty->termios.c_lflag) & ICANON) {
+ if (!old || (old->c_lflag ^ tty->termios.c_lflag) & (ICANON | EXTPROC)) {
bitmap_zero(ldata->read_flags, N_TTY_BUF_SIZE);
ldata->line_start = ldata->read_tail;
if (!L_ICANON(tty) || !read_cnt(ldata)) {
return put_user(tty_chars_in_buffer(tty), (int __user *) arg);
case TIOCINQ:
down_write(&tty->termios_rwsem);
- if (L_ICANON(tty))
+ if (L_ICANON(tty) && !L_EXTPROC(tty))
retval = inq_canon(ldata);
else
retval = read_cnt(ldata);
if (ret)
goto err_mux;
- ulpi_node = of_find_node_by_name(of_node_get(pdev->dev.of_node), "ulpi");
+ ulpi_node = of_get_child_by_name(pdev->dev.of_node, "ulpi");
if (ulpi_node) {
phy_node = of_get_next_available_child(ulpi_node, NULL);
ci->hsic = of_device_is_compatible(phy_node, "qcom,usb-hsic-phy");
unsigned iad_num = 0;
memcpy(&config->desc, buffer, USB_DT_CONFIG_SIZE);
+ nintf = nintf_orig = config->desc.bNumInterfaces;
+ config->desc.bNumInterfaces = 0; // Adjusted later
+
if (config->desc.bDescriptorType != USB_DT_CONFIG ||
config->desc.bLength < USB_DT_CONFIG_SIZE ||
config->desc.bLength > size) {
buffer += config->desc.bLength;
size -= config->desc.bLength;
- nintf = nintf_orig = config->desc.bNumInterfaces;
if (nintf > USB_MAXINTERFACES) {
dev_warn(ddev, "config %d has too many interfaces: %d, "
"using maximum allowed: %d\n",
case USB_SSP_CAP_TYPE:
ssp_cap = (struct usb_ssp_cap_descriptor *)buffer;
ssac = (le32_to_cpu(ssp_cap->bmAttributes) &
- USB_SSP_SUBLINK_SPEED_ATTRIBS) + 1;
+ USB_SSP_SUBLINK_SPEED_ATTRIBS);
if (length >= USB_DT_USB_SSP_CAP_SIZE(ssac))
dev->bos->ssp_cap = ssp_cap;
break;
/* Microsoft LifeCam-VX700 v2.0 */
{ USB_DEVICE(0x045e, 0x0770), .driver_info = USB_QUIRK_RESET_RESUME },
- /* Logitech HD Pro Webcams C920, C920-C and C930e */
+ /* Logitech HD Pro Webcams C920, C920-C, C925e and C930e */
{ USB_DEVICE(0x046d, 0x082d), .driver_info = USB_QUIRK_DELAY_INIT },
{ USB_DEVICE(0x046d, 0x0841), .driver_info = USB_QUIRK_DELAY_INIT },
{ USB_DEVICE(0x046d, 0x0843), .driver_info = USB_QUIRK_DELAY_INIT },
+ { USB_DEVICE(0x046d, 0x085b), .driver_info = USB_QUIRK_DELAY_INIT },
/* Logitech ConferenceCam CC3000e */
{ USB_DEVICE(0x046d, 0x0847), .driver_info = USB_QUIRK_DELAY_INIT },
/* Genesys Logic hub, internally used by KY-688 USB 3.1 Type-C Hub */
{ USB_DEVICE(0x05e3, 0x0612), .driver_info = USB_QUIRK_NO_LPM },
+ /* ELSA MicroLink 56K */
+ { USB_DEVICE(0x05cc, 0x2267), .driver_info = USB_QUIRK_RESET_RESUME },
+
/* Genesys Logic hub, internally used by Moshi USB to Ethernet Adapter */
{ USB_DEVICE(0x05e3, 0x0616), .driver_info = USB_QUIRK_NO_LPM },
* 2 - Internal DMA
* @power_optimized Are power optimizations enabled?
* @num_dev_ep Number of device endpoints available
+ * @num_dev_in_eps Number of device IN endpoints available
* @num_dev_perio_in_ep Number of device periodic IN endpoints
* available
* @dev_token_q_depth Device Mode IN Token Sequence Learning Queue
* 2 - 8 or 16 bits
* @snpsid: Value from SNPSID register
* @dev_ep_dirs: Direction of device endpoints (GHWCFG1)
+ * @g_tx_fifo_size[] Power-on values of TxFIFO sizes
*/
struct dwc2_hw_params {
unsigned op_mode:3;
unsigned fs_phy_type:2;
unsigned i2c_enable:1;
unsigned num_dev_ep:4;
+ unsigned num_dev_in_eps : 4;
unsigned num_dev_perio_in_ep:4;
unsigned total_fifo_size:16;
unsigned power_optimized:1;
unsigned utmi_phy_data_width:2;
u32 snpsid;
u32 dev_ep_dirs;
+ u32 g_tx_fifo_size[MAX_EPS_CHANNELS];
};
/* Size of control and EP0 buffers */
{
if (hsotg->hw_params.en_multiple_tx_fifo)
/* In dedicated FIFO mode we need count of IN EPs */
- return (dwc2_readl(hsotg->regs + GHWCFG4) &
- GHWCFG4_NUM_IN_EPS_MASK) >> GHWCFG4_NUM_IN_EPS_SHIFT;
+ return hsotg->hw_params.num_dev_in_eps;
else
/* In shared FIFO mode we need count of Periodic IN EPs */
return hsotg->hw_params.num_dev_perio_in_ep;
}
-/**
- * dwc2_hsotg_ep_info_size - return Endpoint Info Control block size in DWORDs
- */
-static int dwc2_hsotg_ep_info_size(struct dwc2_hsotg *hsotg)
-{
- int val = 0;
- int i;
- u32 ep_dirs;
-
- /*
- * Don't need additional space for ep info control registers in
- * slave mode.
- */
- if (!using_dma(hsotg)) {
- dev_dbg(hsotg->dev, "Buffer DMA ep info size 0\n");
- return 0;
- }
-
- /*
- * Buffer DMA mode - 1 location per endpoit
- * Descriptor DMA mode - 4 locations per endpoint
- */
- ep_dirs = hsotg->hw_params.dev_ep_dirs;
-
- for (i = 0; i <= hsotg->hw_params.num_dev_ep; i++) {
- val += ep_dirs & 3 ? 1 : 2;
- ep_dirs >>= 2;
- }
-
- if (using_desc_dma(hsotg))
- val = val * 4;
-
- return val;
-}
-
/**
* dwc2_hsotg_tx_fifo_total_depth - return total FIFO depth available for
* device mode TX FIFOs
*/
int dwc2_hsotg_tx_fifo_total_depth(struct dwc2_hsotg *hsotg)
{
- int ep_info_size;
int addr;
int tx_addr_max;
u32 np_tx_fifo_size;
hsotg->params.g_np_tx_fifo_size);
/* Get Endpoint Info Control block size in DWORDs. */
- ep_info_size = dwc2_hsotg_ep_info_size(hsotg);
- tx_addr_max = hsotg->hw_params.total_fifo_size - ep_info_size;
+ tx_addr_max = hsotg->hw_params.total_fifo_size;
addr = hsotg->params.g_rx_fifo_size + np_tx_fifo_size;
if (tx_addr_max <= addr)
}
for (fifo = 1; fifo <= fifo_count; fifo++) {
- dptxfszn = (dwc2_readl(hsotg->regs + DPTXFSIZN(fifo)) &
- FIFOSIZE_DEPTH_MASK) >> FIFOSIZE_DEPTH_SHIFT;
+ dptxfszn = hsotg->hw_params.g_tx_fifo_size[fifo];
if (hsotg->params.g_tx_fifo_size[fifo] < min ||
hsotg->params.g_tx_fifo_size[fifo] > dptxfszn) {
struct dwc2_hw_params *hw = &hsotg->hw_params;
bool forced;
u32 gnptxfsiz;
+ int fifo, fifo_count;
if (hsotg->dr_mode == USB_DR_MODE_HOST)
return;
gnptxfsiz = dwc2_readl(hsotg->regs + GNPTXFSIZ);
+ fifo_count = dwc2_hsotg_tx_fifo_count(hsotg);
+
+ for (fifo = 1; fifo <= fifo_count; fifo++) {
+ hw->g_tx_fifo_size[fifo] =
+ (dwc2_readl(hsotg->regs + DPTXFSIZN(fifo)) &
+ FIFOSIZE_DEPTH_MASK) >> FIFOSIZE_DEPTH_SHIFT;
+ }
+
if (forced)
dwc2_clear_force_mode(hsotg);
hwcfg4 = dwc2_readl(hsotg->regs + GHWCFG4);
grxfsiz = dwc2_readl(hsotg->regs + GRXFSIZ);
- /*
- * Host specific hardware parameters. Reading these parameters
- * requires the controller to be in host mode. The mode will
- * be forced, if necessary, to read these values.
- */
- dwc2_get_host_hwparams(hsotg);
- dwc2_get_dev_hwparams(hsotg);
-
/* hwcfg1 */
hw->dev_ep_dirs = hwcfg1;
hw->en_multiple_tx_fifo = !!(hwcfg4 & GHWCFG4_DED_FIFO_EN);
hw->num_dev_perio_in_ep = (hwcfg4 & GHWCFG4_NUM_DEV_PERIO_IN_EP_MASK) >>
GHWCFG4_NUM_DEV_PERIO_IN_EP_SHIFT;
+ hw->num_dev_in_eps = (hwcfg4 & GHWCFG4_NUM_IN_EPS_MASK) >>
+ GHWCFG4_NUM_IN_EPS_SHIFT;
hw->dma_desc_enable = !!(hwcfg4 & GHWCFG4_DESC_DMA);
hw->power_optimized = !!(hwcfg4 & GHWCFG4_POWER_OPTIMIZ);
hw->utmi_phy_data_width = (hwcfg4 & GHWCFG4_UTMI_PHY_DATA_WIDTH_MASK) >>
/* fifo sizes */
hw->rx_fifo_size = (grxfsiz & GRXFSIZ_DEPTH_MASK) >>
GRXFSIZ_DEPTH_SHIFT;
+ /*
+ * Host specific hardware parameters. Reading these parameters
+ * requires the controller to be in host mode. The mode will
+ * be forced, if necessary, to read these values.
+ */
+ dwc2_get_host_hwparams(hsotg);
+ dwc2_get_dev_hwparams(hsotg);
return 0;
}
clk = of_clk_get(np, i);
if (IS_ERR(clk)) {
- while (--i >= 0)
+ while (--i >= 0) {
+ clk_disable_unprepare(simple->clks[i]);
clk_put(simple->clks[i]);
+ }
return PTR_ERR(clk);
}
.driver = {
.name = "dwc3-of-simple",
.of_match_table = of_dwc3_simple_match,
+ .pm = &dwc3_of_simple_dev_pm_ops,
},
};
{
const struct usb_endpoint_descriptor *desc = dep->endpoint.desc;
struct dwc3 *dwc = dep->dwc;
- u32 timeout = 500;
+ u32 timeout = 1000;
u32 reg;
int cmd_status = 0;
*/
if (speed == USB_SPEED_HIGH) {
struct usb_ep *ep = &dep->endpoint;
- unsigned int mult = ep->mult - 1;
+ unsigned int mult = 2;
unsigned int maxp = usb_endpoint_maxp(ep->desc);
if (length <= (2 * maxp))
controller, and the relevant drivers for each function declared
by the device.
-endchoice
-
source "drivers/usb/gadget/legacy/Kconfig"
+endchoice
+
endif # USB_GADGET
# both kinds of controller can also support "USB On-the-Go" (CONFIG_USB_OTG).
#
-menuconfig USB_GADGET_LEGACY
- bool "Legacy USB Gadget Support"
- help
- Legacy USB gadgets are USB gadgets that do not use the USB gadget
- configfs interface.
-
-if USB_GADGET_LEGACY
-
config USB_ZERO
tristate "Gadget Zero (DEVELOPMENT)"
select USB_LIBCOMPOSITE
# or video class gadget drivers), or specific hardware, here.
config USB_G_WEBCAM
tristate "USB Webcam Gadget"
- depends on VIDEO_DEV
+ depends on VIDEO_V4L2
select USB_LIBCOMPOSITE
select VIDEOBUF2_VMALLOC
select USB_F_UVC
Say "y" to link the driver statically, or "m" to build a
dynamically linked module called "g_webcam".
-
-endif
static int xhci_ring_enqueue_show(struct seq_file *s, void *unused)
{
dma_addr_t dma;
- struct xhci_ring *ring = s->private;
+ struct xhci_ring *ring = *(struct xhci_ring **)s->private;
dma = xhci_trb_virt_to_dma(ring->enq_seg, ring->enqueue);
seq_printf(s, "%pad\n", &dma);
static int xhci_ring_dequeue_show(struct seq_file *s, void *unused)
{
dma_addr_t dma;
- struct xhci_ring *ring = s->private;
+ struct xhci_ring *ring = *(struct xhci_ring **)s->private;
dma = xhci_trb_virt_to_dma(ring->deq_seg, ring->dequeue);
seq_printf(s, "%pad\n", &dma);
static int xhci_ring_cycle_show(struct seq_file *s, void *unused)
{
- struct xhci_ring *ring = s->private;
+ struct xhci_ring *ring = *(struct xhci_ring **)s->private;
seq_printf(s, "%d\n", ring->cycle_state);
}
static struct dentry *xhci_debugfs_create_ring_dir(struct xhci_hcd *xhci,
- struct xhci_ring *ring,
+ struct xhci_ring **ring,
const char *name,
struct dentry *parent)
{
snprintf(epriv->name, sizeof(epriv->name), "ep%02d", ep_index);
epriv->root = xhci_debugfs_create_ring_dir(xhci,
- dev->eps[ep_index].new_ring,
+ &dev->eps[ep_index].new_ring,
epriv->name,
spriv->root);
spriv->eps[ep_index] = epriv;
priv->dev = dev;
dev->debugfs_private = priv;
- xhci_debugfs_create_ring_dir(xhci, dev->eps[0].ring,
+ xhci_debugfs_create_ring_dir(xhci, &dev->eps[0].ring,
"ep00", priv->root);
xhci_debugfs_create_context_files(xhci, priv->root, slot_id);
ARRAY_SIZE(xhci_extcap_dbc),
"reg-ext-dbc");
- xhci_debugfs_create_ring_dir(xhci, xhci->cmd_ring,
+ xhci_debugfs_create_ring_dir(xhci, &xhci->cmd_ring,
"command-ring",
xhci->debugfs_root);
- xhci_debugfs_create_ring_dir(xhci, xhci->event_ring,
+ xhci_debugfs_create_ring_dir(xhci, &xhci->event_ring,
"event-ring",
xhci->debugfs_root);
return 0;
}
- xhci->devs[slot_id] = kzalloc(sizeof(*xhci->devs[slot_id]), flags);
- if (!xhci->devs[slot_id])
+ dev = kzalloc(sizeof(*dev), flags);
+ if (!dev)
return 0;
- dev = xhci->devs[slot_id];
/* Allocate the (output) device context that will be used in the HC. */
dev->out_ctx = xhci_alloc_container_ctx(xhci, XHCI_CTX_TYPE_DEVICE, flags);
trace_xhci_alloc_virt_device(dev);
+ xhci->devs[slot_id] = dev;
+
return 1;
fail:
- xhci_free_virt_device(xhci, slot_id);
+
+ if (dev->in_ctx)
+ xhci_free_container_ctx(xhci, dev->in_ctx);
+ if (dev->out_ctx)
+ xhci_free_container_ctx(xhci, dev->out_ctx);
+ kfree(dev);
+
return 0;
}
xhci->quirks |= XHCI_TRUST_TX_LENGTH;
xhci->quirks |= XHCI_BROKEN_STREAMS;
}
+ if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
+ pdev->device == 0x0014)
+ xhci->quirks |= XHCI_TRUST_TX_LENGTH;
if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
pdev->device == 0x0015)
xhci->quirks |= XHCI_RESET_ON_RESUME;
{
u32 maxp, total_packet_count;
- /* MTK xHCI is mostly 0.97 but contains some features from 1.0 */
+ /* MTK xHCI 0.96 contains some features from 1.0 */
if (xhci->hci_version < 0x100 && !(xhci->quirks & XHCI_MTK_HOST))
return ((td_total_len - transferred) >> 10);
trb_buff_len == td_total_len)
return 0;
- /* for MTK xHCI, TD size doesn't include this TRB */
- if (xhci->quirks & XHCI_MTK_HOST)
+ /* for MTK xHCI 0.96, TD size include this TRB, but not in 1.x */
+ if ((xhci->quirks & XHCI_MTK_HOST) && (xhci->hci_version < 0x100))
trb_buff_len = 0;
maxp = usb_endpoint_maxp(&urb->ep->desc);
struct xhci_slot_ctx *slot_ctx;
int i, ret;
- xhci_debugfs_remove_slot(xhci, udev->slot_id);
-
#ifndef CONFIG_USB_DEFAULT_PERSIST
/*
* We called pm_runtime_get_noresume when the device was attached.
}
ret = xhci_disable_slot(xhci, udev->slot_id);
- if (ret)
+ if (ret) {
+ xhci_debugfs_remove_slot(xhci, udev->slot_id);
xhci_free_virt_device(xhci, udev->slot_id);
+ }
}
int xhci_disable_slot(struct xhci_hcd *xhci, u32 slot_id)
musb->xceiv->otg->state = OTG_STATE_A_WAIT_VRISE;
portstate(musb->port1_status |= USB_PORT_STAT_POWER);
del_timer(&musb->dev_timer);
- } else {
+ } else if (!(musb->int_usb & MUSB_INTR_BABBLE)) {
+ /*
+ * When babble condition happens, drvvbus interrupt
+ * is also generated. Ignore this drvvbus interrupt
+ * and let babble interrupt handler recovers the
+ * controller; otherwise, the host-mode flag is lost
+ * due to the MUSB_DEV_MODE() call below and babble
+ * recovery logic will not be called.
+ */
musb->is_active = 0;
MUSB_DEV_MODE(musb);
otg->default_a = 0;
.driver_info = (kernel_ulong_t)&ftdi_jtag_quirk },
{ USB_DEVICE(CYPRESS_VID, CYPRESS_WICED_BT_USB_PID) },
{ USB_DEVICE(CYPRESS_VID, CYPRESS_WICED_WL_USB_PID) },
+ { USB_DEVICE(AIRBUS_DS_VID, AIRBUS_DS_P8GR) },
{ } /* Terminating entry */
};
#define ICPDAS_I7561U_PID 0x0104
#define ICPDAS_I7563U_PID 0x0105
+/*
+ * Airbus Defence and Space
+ */
+#define AIRBUS_DS_VID 0x1e8e /* Vendor ID */
+#define AIRBUS_DS_P8GR 0x6001 /* Tetra P8GR */
+
/*
* RT Systems programming cables for various ham radios
*/
/* These Quectel products use Qualcomm's vendor ID */
#define QUECTEL_PRODUCT_UC20 0x9003
#define QUECTEL_PRODUCT_UC15 0x9090
+/* These Yuga products use Qualcomm's vendor ID */
+#define YUGA_PRODUCT_CLM920_NC5 0x9625
#define QUECTEL_VENDOR_ID 0x2c7c
/* These Quectel products use Quectel's vendor ID */
#define TELIT_PRODUCT_LE922_USBCFG3 0x1043
#define TELIT_PRODUCT_LE922_USBCFG5 0x1045
#define TELIT_PRODUCT_ME910 0x1100
+#define TELIT_PRODUCT_ME910_DUAL_MODEM 0x1101
#define TELIT_PRODUCT_LE920 0x1200
#define TELIT_PRODUCT_LE910 0x1201
#define TELIT_PRODUCT_LE910_USBCFG4 0x1206
.reserved = BIT(1) | BIT(3),
};
+static const struct option_blacklist_info telit_me910_dual_modem_blacklist = {
+ .sendsetup = BIT(0),
+ .reserved = BIT(3),
+};
+
static const struct option_blacklist_info telit_le910_blacklist = {
.sendsetup = BIT(0),
.reserved = BIT(1) | BIT(2),
.reserved = BIT(4) | BIT(5),
};
+static const struct option_blacklist_info yuga_clm920_nc5_blacklist = {
+ .reserved = BIT(1) | BIT(4),
+};
+
static const struct usb_device_id option_ids[] = {
{ USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_COLT) },
{ USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_RICOLA) },
{ USB_DEVICE(QUALCOMM_VENDOR_ID, QUECTEL_PRODUCT_UC15)},
{ USB_DEVICE(QUALCOMM_VENDOR_ID, QUECTEL_PRODUCT_UC20),
.driver_info = (kernel_ulong_t)&net_intf4_blacklist },
+ /* Yuga products use Qualcomm vendor ID */
+ { USB_DEVICE(QUALCOMM_VENDOR_ID, YUGA_PRODUCT_CLM920_NC5),
+ .driver_info = (kernel_ulong_t)&yuga_clm920_nc5_blacklist },
/* Quectel products using Quectel vendor ID */
{ USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC21),
.driver_info = (kernel_ulong_t)&net_intf4_blacklist },
.driver_info = (kernel_ulong_t)&telit_le922_blacklist_usbcfg0 },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910),
.driver_info = (kernel_ulong_t)&telit_me910_blacklist },
+ { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910_DUAL_MODEM),
+ .driver_info = (kernel_ulong_t)&telit_me910_dual_modem_blacklist },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE910),
.driver_info = (kernel_ulong_t)&telit_le910_blacklist },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE910_USBCFG4),
{DEVICE_SWI(0x1199, 0x9079)}, /* Sierra Wireless EM74xx */
{DEVICE_SWI(0x1199, 0x907a)}, /* Sierra Wireless EM74xx QDL */
{DEVICE_SWI(0x1199, 0x907b)}, /* Sierra Wireless EM74xx */
+ {DEVICE_SWI(0x1199, 0x9090)}, /* Sierra Wireless EM7565 QDL */
+ {DEVICE_SWI(0x1199, 0x9091)}, /* Sierra Wireless EM7565 */
{DEVICE_SWI(0x413c, 0x81a2)}, /* Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card */
{DEVICE_SWI(0x413c, 0x81a3)}, /* Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card */
{DEVICE_SWI(0x413c, 0x81a4)}, /* Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card */
break;
case 2:
dev_dbg(dev, "NMEA GPS interface found\n");
+ sendsetup = true;
break;
case 3:
dev_dbg(dev, "Modem port found\n");
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BROKEN_FUA ),
+/* Reported by David Kozub <zub@linux.fjfi.cvut.cz> */
+UNUSUAL_DEV(0x152d, 0x0578, 0x0000, 0x9999,
+ "JMicron",
+ "JMS567",
+ USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+ US_FL_BROKEN_FUA),
+
/*
* Reported by Alexandre Oliva <oliva@lsd.ic.unicamp.br>
* JMicron responds to USN and several other SCSI ioctls with a
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BROKEN_FUA | US_FL_NO_REPORT_OPCODES),
+/* Reported-by: David Kozub <zub@linux.fjfi.cvut.cz> */
+UNUSUAL_DEV(0x152d, 0x0578, 0x0000, 0x9999,
+ "JMicron",
+ "JMS567",
+ USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+ US_FL_BROKEN_FUA),
+
/* Reported-by: Hans de Goede <hdegoede@redhat.com> */
UNUSUAL_DEV(0x2109, 0x0711, 0x0000, 0x9999,
"VIA",
* step 1?
*/
if (ud->tcp_socket) {
- dev_dbg(&sdev->udev->dev, "shutdown tcp_socket %p\n",
- ud->tcp_socket);
+ dev_dbg(&sdev->udev->dev, "shutdown sockfd %d\n", ud->sockfd);
kernel_sock_shutdown(ud->tcp_socket, SHUT_RDWR);
}
struct stub_priv *priv;
struct urb *urb;
- dev_dbg(&sdev->udev->dev, "free sdev %p\n", sdev);
+ dev_dbg(&sdev->udev->dev, "Stub device cleaning up urbs\n");
while ((priv = stub_priv_pop(sdev))) {
urb = priv->urb;
- dev_dbg(&sdev->udev->dev, "free urb %p\n", urb);
+ dev_dbg(&sdev->udev->dev, "free urb seqnum %lu\n",
+ priv->seqnum);
usb_kill_urb(urb);
kmem_cache_free(stub_priv_cache, priv);
if (priv->seqnum != pdu->u.cmd_unlink.seqnum)
continue;
- dev_info(&priv->urb->dev->dev, "unlink urb %p\n",
- priv->urb);
-
/*
* This matched urb is not completed yet (i.e., be in
* flight in usb hcd hardware/driver). Now we are
ret = usb_unlink_urb(priv->urb);
if (ret != -EINPROGRESS)
dev_err(&priv->urb->dev->dev,
- "failed to unlink a urb %p, ret %d\n",
- priv->urb, ret);
+ "failed to unlink a urb # %lu, ret %d\n",
+ priv->seqnum, ret);
return 0;
}
return priv;
}
-static int get_pipe(struct stub_device *sdev, int epnum, int dir)
+static int get_pipe(struct stub_device *sdev, struct usbip_header *pdu)
{
struct usb_device *udev = sdev->udev;
struct usb_host_endpoint *ep;
struct usb_endpoint_descriptor *epd = NULL;
+ int epnum = pdu->base.ep;
+ int dir = pdu->base.direction;
+
+ if (epnum < 0 || epnum > 15)
+ goto err_ret;
if (dir == USBIP_DIR_IN)
ep = udev->ep_in[epnum & 0x7f];
else
ep = udev->ep_out[epnum & 0x7f];
- if (!ep) {
- dev_err(&sdev->udev->dev, "no such endpoint?, %d\n",
- epnum);
- BUG();
- }
+ if (!ep)
+ goto err_ret;
epd = &ep->desc;
+
if (usb_endpoint_xfer_control(epd)) {
if (dir == USBIP_DIR_OUT)
return usb_sndctrlpipe(udev, epnum);
}
if (usb_endpoint_xfer_isoc(epd)) {
+ /* validate packet size and number of packets */
+ unsigned int maxp, packets, bytes;
+
+ maxp = usb_endpoint_maxp(epd);
+ maxp *= usb_endpoint_maxp_mult(epd);
+ bytes = pdu->u.cmd_submit.transfer_buffer_length;
+ packets = DIV_ROUND_UP(bytes, maxp);
+
+ if (pdu->u.cmd_submit.number_of_packets < 0 ||
+ pdu->u.cmd_submit.number_of_packets > packets) {
+ dev_err(&sdev->udev->dev,
+ "CMD_SUBMIT: isoc invalid num packets %d\n",
+ pdu->u.cmd_submit.number_of_packets);
+ return -1;
+ }
if (dir == USBIP_DIR_OUT)
return usb_sndisocpipe(udev, epnum);
else
return usb_rcvisocpipe(udev, epnum);
}
+err_ret:
/* NOT REACHED */
- dev_err(&sdev->udev->dev, "get pipe, epnum %d\n", epnum);
- return 0;
+ dev_err(&sdev->udev->dev, "CMD_SUBMIT: invalid epnum %d\n", epnum);
+ return -1;
}
static void masking_bogus_flags(struct urb *urb)
struct stub_priv *priv;
struct usbip_device *ud = &sdev->ud;
struct usb_device *udev = sdev->udev;
- int pipe = get_pipe(sdev, pdu->base.ep, pdu->base.direction);
+ int pipe = get_pipe(sdev, pdu);
+
+ if (pipe == -1)
+ return;
priv = stub_priv_alloc(sdev, pdu);
if (!priv)
/* link a urb to the queue of tx. */
spin_lock_irqsave(&sdev->priv_lock, flags);
if (sdev->ud.tcp_socket == NULL) {
- usbip_dbg_stub_tx("ignore urb for closed connection %p", urb);
+ usbip_dbg_stub_tx("ignore urb for closed connection\n");
/* It will be freed in stub_device_cleanup_urbs(). */
} else if (priv->unlinking) {
stub_enqueue_ret_unlink(sdev, priv->seqnum, urb->status);
memset(&pdu_header, 0, sizeof(pdu_header));
memset(&msg, 0, sizeof(msg));
+ if (urb->actual_length > 0 && !urb->transfer_buffer) {
+ dev_err(&sdev->udev->dev,
+ "urb: actual_length %d transfer_buffer null\n",
+ urb->actual_length);
+ return -1;
+ }
+
if (usb_pipetype(urb->pipe) == PIPE_ISOCHRONOUS)
iovnum = 2 + urb->number_of_packets;
else
/* 1. setup usbip_header */
setup_ret_submit_pdu(&pdu_header, urb);
- usbip_dbg_stub_tx("setup txdata seqnum: %d urb: %p\n",
- pdu_header.base.seqnum, urb);
+ usbip_dbg_stub_tx("setup txdata seqnum: %d\n",
+ pdu_header.base.seqnum);
usbip_header_correct_endian(&pdu_header, 1);
iov[iovnum].iov_base = &pdu_header;
struct msghdr msg = {.msg_flags = MSG_NOSIGNAL};
int total = 0;
+ if (!sock || !buf || !size)
+ return -EINVAL;
+
iov_iter_kvec(&msg.msg_iter, READ|ITER_KVEC, &iov, 1, size);
usbip_dbg_xmit("enter\n");
- if (!sock || !buf || !size) {
- pr_err("invalid arg, sock %p buff %p size %d\n", sock, buf,
- size);
- return -EINVAL;
- }
-
do {
- int sz = msg_data_left(&msg);
+ msg_data_left(&msg);
sock->sk->sk_allocation = GFP_NOIO;
result = sock_recvmsg(sock, &msg, MSG_WAITALL);
- if (result <= 0) {
- pr_debug("receive sock %p buf %p size %u ret %d total %d\n",
- sock, buf + total, sz, result, total);
+ if (result <= 0)
goto err;
- }
total += result;
} while (msg_data_left(&msg));
/* lock for status */
spinlock_t lock;
+ int sockfd;
struct socket *tcp_socket;
struct task_struct *tcp_rx;
struct vhci_device *vdev;
unsigned long flags;
- usbip_dbg_vhci_hc("enter, usb_hcd %p urb %p mem_flags %d\n",
- hcd, urb, mem_flags);
-
if (portnum > VHCI_HC_PORTS) {
pr_err("invalid port number %d\n", portnum);
return -ENODEV;
struct vhci_device *vdev;
unsigned long flags;
- pr_info("dequeue a urb %p\n", urb);
-
spin_lock_irqsave(&vhci->lock, flags);
priv = urb->hcpriv;
/* tcp connection is closed */
spin_lock(&vdev->priv_lock);
- pr_info("device %p seems to be disconnected\n", vdev);
list_del(&priv->list);
kfree(priv);
urb->hcpriv = NULL;
* vhci_rx will receive RET_UNLINK and give back the URB.
* Otherwise, we give back it here.
*/
- pr_info("gives back urb %p\n", urb);
-
usb_hcd_unlink_urb_from_ep(hcd, urb);
spin_unlock_irqrestore(&vhci->lock, flags);
unlink->unlink_seqnum = priv->seqnum;
- pr_info("device %p seems to be still connected\n", vdev);
-
/* send cmd_unlink and try to cancel the pending URB in the
* peer */
list_add_tail(&unlink->list, &vdev->unlink_tx);
/* need this? see stub_dev.c */
if (ud->tcp_socket) {
- pr_debug("shutdown tcp_socket %p\n", ud->tcp_socket);
+ pr_debug("shutdown tcp_socket %d\n", ud->sockfd);
kernel_sock_shutdown(ud->tcp_socket, SHUT_RDWR);
}
urb = priv->urb;
status = urb->status;
- usbip_dbg_vhci_rx("find urb %p vurb %p seqnum %u\n",
- urb, priv, seqnum);
+ usbip_dbg_vhci_rx("find urb seqnum %u\n", seqnum);
switch (status) {
case -ENOENT:
/* fall through */
case -ECONNRESET:
- dev_info(&urb->dev->dev,
- "urb %p was unlinked %ssynchronuously.\n", urb,
- status == -ENOENT ? "" : "a");
+ dev_dbg(&urb->dev->dev,
+ "urb seq# %u was unlinked %ssynchronuously\n",
+ seqnum, status == -ENOENT ? "" : "a");
break;
case -EINPROGRESS:
/* no info output */
break;
default:
- dev_info(&urb->dev->dev,
- "urb %p may be in a error, status %d\n", urb,
- status);
+ dev_dbg(&urb->dev->dev,
+ "urb seq# %u may be in a error, status %d\n",
+ seqnum, status);
}
list_del(&priv->list);
spin_unlock_irqrestore(&vdev->priv_lock, flags);
if (!urb) {
- pr_err("cannot find a urb of seqnum %u\n", pdu->base.seqnum);
- pr_info("max seqnum %d\n",
+ pr_err("cannot find a urb of seqnum %u max seqnum %d\n",
+ pdu->base.seqnum,
atomic_read(&vhci_hcd->seqnum));
usbip_event_add(ud, VDEV_EVENT_ERROR_TCP);
return;
if (usbip_dbg_flag_vhci_rx)
usbip_dump_urb(urb);
- usbip_dbg_vhci_rx("now giveback urb %p\n", urb);
+ usbip_dbg_vhci_rx("now giveback urb %u\n", pdu->base.seqnum);
spin_lock_irqsave(&vhci->lock, flags);
usb_hcd_unlink_urb_from_ep(vhci_hcd_to_hcd(vhci_hcd), urb);
pr_info("the urb (seqnum %d) was already given back\n",
pdu->base.seqnum);
} else {
- usbip_dbg_vhci_rx("now giveback urb %p\n", urb);
+ usbip_dbg_vhci_rx("now giveback urb %d\n", pdu->base.seqnum);
/* If unlink is successful, status is -ECONNRESET */
urb->status = pdu->u.ret_unlink.status;
/*
* output example:
- * hub port sta spd dev socket local_busid
- * hs 0000 004 000 00000000 c5a7bb80 1-2.3
+ * hub port sta spd dev sockfd local_busid
+ * hs 0000 004 000 00000000 3 1-2.3
* ................................................
- * ss 0008 004 000 00000000 d8cee980 2-3.4
+ * ss 0008 004 000 00000000 4 2-3.4
* ................................................
*
- * IP address can be retrieved from a socket pointer address by looking
- * up /proc/net/{tcp,tcp6}. Also, a userland program may remember a
- * port number and its peer IP address.
+ * Output includes socket fd instead of socket pointer address to avoid
+ * leaking kernel memory address in:
+ * /sys/devices/platform/vhci_hcd.0/status and in debug output.
+ * The socket pointer address is not used at the moment and it was made
+ * visible as a convenient way to find IP address from socket pointer
+ * address by looking up /proc/net/{tcp,tcp6}. As this opens a security
+ * hole, the change is made to use sockfd instead.
+ *
*/
static void port_show_vhci(char **out, int hub, int port, struct vhci_device *vdev)
{
if (vdev->ud.status == VDEV_ST_USED) {
*out += sprintf(*out, "%03u %08x ",
vdev->speed, vdev->devid);
- *out += sprintf(*out, "%16p %s",
- vdev->ud.tcp_socket,
+ *out += sprintf(*out, "%u %s",
+ vdev->ud.sockfd,
dev_name(&vdev->udev->dev));
} else {
char *s = out;
/*
- * Half the ports are for SPEED_HIGH and half for SPEED_SUPER, thus the * 2.
+ * Half the ports are for SPEED_HIGH and half for SPEED_SUPER,
+ * thus the * 2.
*/
out += sprintf(out, "%d\n", VHCI_PORTS * vhci_num_controllers);
return out - s;
vdev->devid = devid;
vdev->speed = speed;
+ vdev->ud.sockfd = sockfd;
vdev->ud.tcp_socket = socket;
vdev->ud.status = VDEV_ST_NOTASSIGNED;
memset(&msg, 0, sizeof(msg));
memset(&iov, 0, sizeof(iov));
- usbip_dbg_vhci_tx("setup txdata urb %p\n", urb);
+ usbip_dbg_vhci_tx("setup txdata urb seqnum %lu\n",
+ priv->seqnum);
/* 1. setup usbip_header */
setup_cmd_submit_pdu(&pdu_header, urb);
return -EBUSY;
vm_dev = devm_kzalloc(&pdev->dev, sizeof(*vm_dev), GFP_KERNEL);
- if (!vm_dev) {
- rc = -ENOMEM;
- goto free_mem;
- }
+ if (!vm_dev)
+ return -ENOMEM;
vm_dev->vdev.dev.parent = &pdev->dev;
vm_dev->vdev.dev.release = virtio_mmio_release_dev;
spin_lock_init(&vm_dev->lock);
vm_dev->base = devm_ioremap(&pdev->dev, mem->start, resource_size(mem));
- if (vm_dev->base == NULL) {
- rc = -EFAULT;
- goto free_vmdev;
- }
+ if (vm_dev->base == NULL)
+ return -EFAULT;
/* Check magic value */
magic = readl(vm_dev->base + VIRTIO_MMIO_MAGIC_VALUE);
if (magic != ('v' | 'i' << 8 | 'r' << 16 | 't' << 24)) {
dev_warn(&pdev->dev, "Wrong magic value 0x%08lx!\n", magic);
- rc = -ENODEV;
- goto unmap;
+ return -ENODEV;
}
/* Check device version */
if (vm_dev->version < 1 || vm_dev->version > 2) {
dev_err(&pdev->dev, "Version %ld not supported!\n",
vm_dev->version);
- rc = -ENXIO;
- goto unmap;
+ return -ENXIO;
}
vm_dev->vdev.id.device = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_ID);
* virtio-mmio device with an ID 0 is a (dummy) placeholder
* with no function. End probing now with no error reported.
*/
- rc = -ENODEV;
- goto unmap;
+ return -ENODEV;
}
vm_dev->vdev.id.vendor = readl(vm_dev->base + VIRTIO_MMIO_VENDOR_ID);
platform_set_drvdata(pdev, vm_dev);
rc = register_virtio_device(&vm_dev->vdev);
- if (rc) {
- iounmap(vm_dev->base);
- devm_release_mem_region(&pdev->dev, mem->start,
- resource_size(mem));
+ if (rc)
put_device(&vm_dev->vdev.dev);
- }
- return rc;
-unmap:
- iounmap(vm_dev->base);
-free_mem:
- devm_release_mem_region(&pdev->dev, mem->start,
- resource_size(mem));
-free_vmdev:
- devm_kfree(&pdev->dev, vm_dev);
+
return rc;
}
static int virtio_mmio_remove(struct platform_device *pdev)
{
struct virtio_mmio_device *vm_dev = platform_get_drvdata(pdev);
- struct resource *mem;
-
- iounmap(vm_dev->base);
- mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- if (mem)
- devm_release_mem_region(&pdev->dev, mem->start,
- resource_size(mem));
unregister_virtio_device(&vm_dev->vdev);
return 0;
config XEN_ACPI_PROCESSOR
tristate "Xen ACPI processor"
- depends on XEN && X86 && ACPI_PROCESSOR && CPU_FREQ
+ depends on XEN && XEN_DOM0 && X86 && ACPI_PROCESSOR && CPU_FREQ
default m
help
This ACPI processor uploads Power Management information to the Xen
kfree(resource);
}
+/*
+ * Host memory not allocated to dom0. We can use this range for hotplug-based
+ * ballooning.
+ *
+ * It's a type-less resource. Setting IORESOURCE_MEM will make resource
+ * management algorithms (arch_remove_reservations()) look into guest e820,
+ * which we don't want.
+ */
+static struct resource hostmem_resource = {
+ .name = "Host RAM",
+};
+
+void __attribute__((weak)) __init arch_xen_balloon_init(struct resource *res)
+{}
+
static struct resource *additional_memory_resource(phys_addr_t size)
{
- struct resource *res;
- int ret;
+ struct resource *res, *res_hostmem;
+ int ret = -ENOMEM;
res = kzalloc(sizeof(*res), GFP_KERNEL);
if (!res)
res->name = "System RAM";
res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
- ret = allocate_resource(&iomem_resource, res,
- size, 0, -1,
- PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
- if (ret < 0) {
- pr_err("Cannot allocate new System RAM resource\n");
- kfree(res);
- return NULL;
+ res_hostmem = kzalloc(sizeof(*res), GFP_KERNEL);
+ if (res_hostmem) {
+ /* Try to grab a range from hostmem */
+ res_hostmem->name = "Host memory";
+ ret = allocate_resource(&hostmem_resource, res_hostmem,
+ size, 0, -1,
+ PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
+ }
+
+ if (!ret) {
+ /*
+ * Insert this resource into iomem. Because hostmem_resource
+ * tracks portion of guest e820 marked as UNUSABLE noone else
+ * should try to use it.
+ */
+ res->start = res_hostmem->start;
+ res->end = res_hostmem->end;
+ ret = insert_resource(&iomem_resource, res);
+ if (ret < 0) {
+ pr_err("Can't insert iomem_resource [%llx - %llx]\n",
+ res->start, res->end);
+ release_memory_resource(res_hostmem);
+ res_hostmem = NULL;
+ res->start = res->end = 0;
+ }
+ }
+
+ if (ret) {
+ ret = allocate_resource(&iomem_resource, res,
+ size, 0, -1,
+ PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
+ if (ret < 0) {
+ pr_err("Cannot allocate new System RAM resource\n");
+ kfree(res);
+ return NULL;
+ }
}
#ifdef CONFIG_SPARSEMEM
pr_err("New System RAM resource outside addressable RAM (%lu > %lu)\n",
pfn, limit);
release_memory_resource(res);
+ release_memory_resource(res_hostmem);
return NULL;
}
}
set_online_page_callback(&xen_online_page);
register_memory_notifier(&xen_memory_nb);
register_sysctl_table(xen_root);
+
+ arch_xen_balloon_init(&hostmem_resource);
#endif
#ifdef CONFIG_XEN_PV
pvcalls_exit();
return ret;
}
- map2 = kzalloc(sizeof(*map2), GFP_KERNEL);
+ map2 = kzalloc(sizeof(*map2), GFP_ATOMIC);
if (map2 == NULL) {
clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
(void *)&map->passive.flags);
* However, if we didn't have a callback promise outstanding, or it was
* outstanding on a different server, then it won't break it either...
*/
-static int afs_dir_remove_link(struct dentry *dentry, struct key *key)
+static int afs_dir_remove_link(struct dentry *dentry, struct key *key,
+ unsigned long d_version_before,
+ unsigned long d_version_after)
{
+ bool dir_valid;
int ret = 0;
+ /* There were no intervening changes on the server if the version
+ * number we got back was incremented by exactly 1.
+ */
+ dir_valid = (d_version_after == d_version_before + 1);
+
if (d_really_is_positive(dentry)) {
struct afs_vnode *vnode = AFS_FS_I(d_inode(dentry));
- if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
- kdebug("AFS_VNODE_DELETED");
- clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags);
-
- ret = afs_validate(vnode, key);
- if (ret == -ESTALE)
+ if (dir_valid) {
+ drop_nlink(&vnode->vfs_inode);
+ if (vnode->vfs_inode.i_nlink == 0) {
+ set_bit(AFS_VNODE_DELETED, &vnode->flags);
+ clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags);
+ }
ret = 0;
+ } else {
+ clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags);
+
+ if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
+ kdebug("AFS_VNODE_DELETED");
+
+ ret = afs_validate(vnode, key);
+ if (ret == -ESTALE)
+ ret = 0;
+ }
_debug("nlink %d [val %d]", vnode->vfs_inode.i_nlink, ret);
}
struct afs_fs_cursor fc;
struct afs_vnode *dvnode = AFS_FS_I(dir), *vnode;
struct key *key;
+ unsigned long d_version = (unsigned long)dentry->d_fsdata;
int ret;
_enter("{%x:%u},{%pd}",
afs_vnode_commit_status(&fc, dvnode, fc.cb_break);
ret = afs_end_vnode_operation(&fc);
if (ret == 0)
- ret = afs_dir_remove_link(dentry, key);
+ ret = afs_dir_remove_link(
+ dentry, key, d_version,
+ (unsigned long)dvnode->status.data_version);
}
error_key:
}
read_sequnlock_excl(&vnode->cb_lock);
+
+ if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
+ clear_nlink(&vnode->vfs_inode);
+
if (valid)
goto valid;
{
struct afs_net *net = call->net;
enum afs_call_state state;
- u32 remote_abort;
+ u32 remote_abort = 0;
int ret;
_enter("{%s,%zu},,%zu,%d",
ret = afs_fill_page(vnode, key, pos + copied,
len - copied, page);
if (ret < 0)
- return ret;
+ goto out;
}
SetPageUptodate(page);
}
set_page_dirty(page);
if (PageDirty(page))
_debug("dirtied");
+ ret = copied;
+
+out:
unlock_page(page);
put_page(page);
-
- return copied;
+ return ret;
}
/*
mutex_unlock(&sbi->wq_mutex);
- if (autofs4_write(sbi, pipe, &pkt, pktsz))
switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
case 0:
break;
spin_lock(&root->inode_lock);
node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
+
if (node) {
if (btrfs_inode->delayed_node) {
refcount_inc(&node->refs); /* can be accessed */
spin_unlock(&root->inode_lock);
return node;
}
- btrfs_inode->delayed_node = node;
- /* can be accessed and cached in the inode */
- refcount_add(2, &node->refs);
+
+ /*
+ * It's possible that we're racing into the middle of removing
+ * this node from the radix tree. In this case, the refcount
+ * was zero and it should never go back to one. Just return
+ * NULL like it was never in the radix at all; our release
+ * function is in the process of removing it.
+ *
+ * Some implementations of refcount_inc refuse to bump the
+ * refcount once it has hit zero. If we don't do this dance
+ * here, refcount_inc() may decide to just WARN_ONCE() instead
+ * of actually bumping the refcount.
+ *
+ * If this node is properly in the radix, we want to bump the
+ * refcount twice, once for the inode and once for this get
+ * operation.
+ */
+ if (refcount_inc_not_zero(&node->refs)) {
+ refcount_inc(&node->refs);
+ btrfs_inode->delayed_node = node;
+ } else {
+ node = NULL;
+ }
+
spin_unlock(&root->inode_lock);
return node;
}
mutex_unlock(&delayed_node->mutex);
if (refcount_dec_and_test(&delayed_node->refs)) {
- bool free = false;
struct btrfs_root *root = delayed_node->root;
+
spin_lock(&root->inode_lock);
- if (refcount_read(&delayed_node->refs) == 0) {
- radix_tree_delete(&root->delayed_nodes_tree,
- delayed_node->inode_id);
- free = true;
- }
+ /*
+ * Once our refcount goes to zero, nobody is allowed to bump it
+ * back up. We can delete it now.
+ */
+ ASSERT(refcount_read(&delayed_node->refs) == 0);
+ radix_tree_delete(&root->delayed_nodes_tree,
+ delayed_node->inode_id);
spin_unlock(&root->inode_lock);
- if (free)
- kmem_cache_free(delayed_node_cache, delayed_node);
+ kmem_cache_free(delayed_node_cache, delayed_node);
}
}
kfree(dev);
return ERR_PTR(-ENOMEM);
}
- bio_get(dev->flush_bio);
INIT_LIST_HEAD(&dev->dev_list);
INIT_LIST_HEAD(&dev->dev_alloc_list);
return request_close_session(mdsc, session);
}
+static bool drop_negative_children(struct dentry *dentry)
+{
+ struct dentry *child;
+ bool all_negative = true;
+
+ if (!d_is_dir(dentry))
+ goto out;
+
+ spin_lock(&dentry->d_lock);
+ list_for_each_entry(child, &dentry->d_subdirs, d_child) {
+ if (d_really_is_positive(child)) {
+ all_negative = false;
+ break;
+ }
+ }
+ spin_unlock(&dentry->d_lock);
+
+ if (all_negative)
+ shrink_dcache_parent(dentry);
+out:
+ return all_negative;
+}
+
/*
* Trim old(er) caps.
*
if ((used | wanted) & ~oissued & mine)
goto out; /* we need these caps */
- session->s_trim_caps--;
if (oissued) {
/* we aren't the only cap.. just remove us */
__ceph_remove_cap(cap, true);
+ session->s_trim_caps--;
} else {
+ struct dentry *dentry;
/* try dropping referring dentries */
spin_unlock(&ci->i_ceph_lock);
- d_prune_aliases(inode);
- dout("trim_caps_cb %p cap %p pruned, count now %d\n",
- inode, cap, atomic_read(&inode->i_count));
+ dentry = d_find_any_alias(inode);
+ if (dentry && drop_negative_children(dentry)) {
+ int count;
+ dput(dentry);
+ d_prune_aliases(inode);
+ count = atomic_read(&inode->i_count);
+ if (count == 1)
+ session->s_trim_caps--;
+ dout("trim_caps_cb %p cap %p pruned, count now %d\n",
+ inode, cap, count);
+ } else {
+ dput(dentry);
+ }
return 0;
}
} while (rc == -EAGAIN);
if (rc) {
- cifs_dbg(VFS, "ioctl error in smb2_get_dfs_refer rc=%d\n", rc);
+ if (rc != -ENOENT)
+ cifs_dbg(VFS, "ioctl error in smb2_get_dfs_refer rc=%d\n", rc);
goto out;
}
cifs_small_buf_release(req);
rsp = (struct smb2_read_rsp *)rsp_iov.iov_base;
- shdr = get_sync_hdr(rsp);
- if (shdr->Status == STATUS_END_OF_FILE) {
+ if (rc) {
+ if (rc != -ENODATA) {
+ cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE);
+ cifs_dbg(VFS, "Send error in read = %d\n", rc);
+ }
free_rsp_buf(resp_buftype, rsp_iov.iov_base);
- return 0;
+ return rc == -ENODATA ? 0 : rc;
}
- if (rc) {
- cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE);
- cifs_dbg(VFS, "Send error in read = %d\n", rc);
- } else {
- *nbytes = le32_to_cpu(rsp->DataLength);
- if ((*nbytes > CIFS_MAX_MSGSIZE) ||
- (*nbytes > io_parms->length)) {
- cifs_dbg(FYI, "bad length %d for count %d\n",
- *nbytes, io_parms->length);
- rc = -EIO;
- *nbytes = 0;
- }
+ *nbytes = le32_to_cpu(rsp->DataLength);
+ if ((*nbytes > CIFS_MAX_MSGSIZE) ||
+ (*nbytes > io_parms->length)) {
+ cifs_dbg(FYI, "bad length %d for count %d\n",
+ *nbytes, io_parms->length);
+ rc = -EIO;
+ *nbytes = 0;
}
+ shdr = get_sync_hdr(rsp);
+
if (*buf) {
memcpy(*buf, (char *)shdr + rsp->DataOffset, *nbytes);
free_rsp_buf(resp_buftype, rsp_iov.iov_base);
config CRAMFS_MTD
bool "Support CramFs image directly mapped in physical memory"
depends on CRAMFS && MTD
+ depends on CRAMFS=m || MTD=y
default y if !CRAMFS_BLOCKDEV
help
This option allows the CramFs driver to load data directly from
if (pfn != pmd_pfn(*pmdp))
goto unlock_pmd;
- if (!pmd_dirty(*pmdp)
- && !pmd_access_permitted(*pmdp, WRITE))
+ if (!pmd_dirty(*pmdp) && !pmd_write(*pmdp))
goto unlock_pmd;
flush_cache_page(vma, address, pfn);
return -EAGAIN;
}
-char *get_task_comm(char *buf, struct task_struct *tsk)
+char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
{
- /* buf must be at least sizeof(tsk->comm) in size */
task_lock(tsk);
- strncpy(buf, tsk->comm, sizeof(tsk->comm));
+ strncpy(buf, tsk->comm, buf_size);
task_unlock(tsk);
return buf;
}
-EXPORT_SYMBOL_GPL(get_task_comm);
+EXPORT_SYMBOL_GPL(__get_task_comm);
/*
* These functions flushes out all traces of the currently running executable
* avoid bad behavior from the prior rlimits. This has to
* happen before arch_pick_mmap_layout(), which examines
* RLIMIT_STACK, but after the point of no return to avoid
- * races from other threads changing the limits. This also
- * must be protected from races with prlimit() calls.
+ * needing to clean up the change on failure.
*/
- task_lock(current->group_leader);
if (current->signal->rlim[RLIMIT_STACK].rlim_cur > _STK_LIM)
current->signal->rlim[RLIMIT_STACK].rlim_cur = _STK_LIM;
- if (current->signal->rlim[RLIMIT_STACK].rlim_max > _STK_LIM)
- current->signal->rlim[RLIMIT_STACK].rlim_max = _STK_LIM;
- task_unlock(current->group_leader);
}
arch_pick_mmap_layout(current->mm);
current->sas_ss_sp = current->sas_ss_size = 0;
- /* Figure out dumpability. */
+ /*
+ * Figure out dumpability. Note that this checking only of current
+ * is wrong, but userspace depends on it. This should be testing
+ * bprm->secureexec instead.
+ */
if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
- bprm->secureexec)
+ !(uid_eq(current_euid(), current_uid()) &&
+ gid_eq(current_egid(), current_gid())))
set_dumpable(current->mm, suid_dumpable);
else
set_dumpable(current->mm, SUID_DUMP_USER);
EXT4_INODE_EOFBLOCKS);
}
ext4_mark_inode_dirty(handle, inode);
+ ext4_update_inode_fsync_trans(handle, inode, 1);
ret2 = ext4_journal_stop(handle);
if (ret2)
break;
#ifdef CONFIG_EXT4_FS_POSIX_ACL
struct posix_acl *p = get_acl(dir, ACL_TYPE_DEFAULT);
+ if (IS_ERR(p))
+ return ERR_CAST(p);
if (p) {
int acl_size = p->a_count * sizeof(ext4_acl_entry);
*/
int ext4_inode_is_fast_symlink(struct inode *inode)
{
+ if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)) {
+ int ea_blocks = EXT4_I(inode)->i_file_acl ?
+ EXT4_CLUSTER_SIZE(inode->i_sb) >> 9 : 0;
+
+ if (ext4_has_inline_data(inode))
+ return 0;
+
+ return (S_ISLNK(inode->i_mode) && inode->i_blocks - ea_blocks == 0);
+ }
return S_ISLNK(inode->i_mode) && inode->i_size &&
(inode->i_size < EXT4_N_BLOCKS * 4);
}
"falling back\n"));
}
nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb);
+ if (!nblocks) {
+ ret = NULL;
+ goto cleanup_and_exit;
+ }
start = EXT4_I(dir)->i_dir_start_lookup;
if (start >= nblocks)
start = 0;
SB_DIRSYNC |
SB_SILENT |
SB_POSIXACL |
+ SB_LAZYTIME |
SB_I_VERSION);
if (flags & MS_REMOUNT)
const struct sockaddr *sap = data->addr;
struct nfs_net *nn = net_generic(data->net, nfs_net_id);
+again:
list_for_each_entry(clp, &nn->nfs_client_list, cl_share_link) {
const struct sockaddr *clap = (struct sockaddr *)&clp->cl_addr;
/* Don't match clients that failed to initialise properly */
if (clp->cl_cons_state < 0)
continue;
+ /* If a client is still initializing then we need to wait */
+ if (clp->cl_cons_state > NFS_CS_READY) {
+ refcount_inc(&clp->cl_count);
+ spin_unlock(&nn->nfs_client_lock);
+ nfs_wait_client_init_complete(clp);
+ nfs_put_client(clp);
+ spin_lock(&nn->nfs_client_lock);
+ goto again;
+ }
+
/* Different NFS versions cannot share the same nfs_client */
if (clp->rpc_ops != data->nfs_mod->rpc_ops)
continue;
if (error < 0)
goto error;
- if (!nfs4_has_session(clp))
- nfs_mark_client_ready(clp, NFS_CS_READY);
-
error = nfs4_discover_server_trunking(clp, &old);
if (error < 0)
goto error;
- if (clp != old)
+ if (clp != old) {
clp->cl_preserve_clid = true;
+ /*
+ * Mark the client as having failed initialization so other
+ * processes walking the nfs_client_list in nfs_match_client()
+ * won't try to use it.
+ */
+ nfs_mark_client_ready(clp, -EPERM);
+ }
nfs_put_client(clp);
clear_bit(NFS_CS_TSM_POSSIBLE, &clp->cl_flags);
return old;
spin_lock(&nn->nfs_client_lock);
list_for_each_entry(pos, &nn->nfs_client_list, cl_share_link) {
+ if (pos == new)
+ goto found;
+
status = nfs4_match_client(pos, new, &prev, nn);
if (status < 0)
goto out_unlock;
* way that a SETCLIENTID_CONFIRM to pos can succeed is
* if new and pos point to the same server:
*/
+found:
refcount_inc(&pos->cl_count);
spin_unlock(&nn->nfs_client_lock);
case 0:
nfs4_swap_callback_idents(pos, new);
pos->cl_confirm = new->cl_confirm;
+ nfs_mark_client_ready(pos, NFS_CS_READY);
prev = NULL;
*result = pos;
if (res)
error = nfs_generic_commit_list(inode, &head, how, &cinfo);
nfs_commit_end(cinfo.mds);
+ if (res == 0)
+ return res;
if (error < 0)
goto out_error;
if (!may_wait)
gi->gid[i] = exp->ex_anon_gid;
else
gi->gid[i] = rqgi->gid[i];
+
+ /* Each thread allocates its own gi, no race */
+ groups_sort(gi);
}
} else {
gi = get_group_info(rqgi);
an overlay which has redirects on a kernel that doesn't support this
feature will have unexpected results.
+config OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW
+ bool "Overlayfs: follow redirects even if redirects are turned off"
+ default y
+ depends on OVERLAY_FS
+ help
+ Disable this to get a possibly more secure configuration, but that
+ might not be backward compatible with previous kernels.
+
+ For more information, see Documentation/filesystems/overlayfs.txt
+
config OVERLAY_FS_INDEX
bool "Overlayfs: turn on inodes index feature by default"
depends on OVERLAY_FS
spin_unlock(&dentry->d_lock);
} else {
kfree(redirect);
- pr_warn_ratelimited("overlay: failed to set redirect (%i)\n", err);
+ pr_warn_ratelimited("overlayfs: failed to set redirect (%i)\n",
+ err);
/* Fall back to userspace copy-up */
err = -EXDEV;
}
/* Check if index is orphan and don't warn before cleaning it */
if (d_inode(index)->i_nlink == 1 &&
- ovl_get_nlink(index, origin.dentry, 0) == 0)
+ ovl_get_nlink(origin.dentry, index, 0) == 0)
err = -ENOENT;
dput(origin.dentry);
if (d.stop)
break;
+ /*
+ * Following redirects can have security consequences: it's like
+ * a symlink into the lower layer without the permission checks.
+ * This is only a problem if the upper layer is untrusted (e.g
+ * comes from an USB drive). This can allow a non-readable file
+ * or directory to become readable.
+ *
+ * Only following redirects when redirects are enabled disables
+ * this attack vector when not necessary.
+ */
+ err = -EPERM;
+ if (d.redirect && !ofs->config.redirect_follow) {
+ pr_warn_ratelimited("overlay: refusing to follow redirect for (%pd2)\n", dentry);
+ goto out_put;
+ }
+
if (d.redirect && d.redirect[0] == '/' && poe != roe) {
poe = roe;
static inline struct dentry *ovl_do_tmpfile(struct dentry *dentry, umode_t mode)
{
struct dentry *ret = vfs_tmpfile(dentry, mode, 0);
- int err = IS_ERR(ret) ? PTR_ERR(ret) : 0;
+ int err = PTR_ERR_OR_ZERO(ret);
pr_debug("tmpfile(%pd2, 0%o) = %i\n", dentry, mode, err);
return ret;
char *workdir;
bool default_permissions;
bool redirect_dir;
+ bool redirect_follow;
+ const char *redirect_mode;
bool index;
};
return err;
fail:
- pr_warn_ratelimited("overlay: failed to look up (%s) for ino (%i)\n",
+ pr_warn_ratelimited("overlayfs: failed to look up (%s) for ino (%i)\n",
p->name, err);
goto out;
}
return PTR_ERR(rdt.cache);
}
- return iterate_dir(od->realfile, &rdt.ctx);
+ err = iterate_dir(od->realfile, &rdt.ctx);
+ ctx->pos = rdt.ctx.pos;
+
+ return err;
}
MODULE_PARM_DESC(ovl_redirect_dir_def,
"Default to on or off for the redirect_dir feature");
+static bool ovl_redirect_always_follow =
+ IS_ENABLED(CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW);
+module_param_named(redirect_always_follow, ovl_redirect_always_follow,
+ bool, 0644);
+MODULE_PARM_DESC(ovl_redirect_always_follow,
+ "Follow redirects even if redirect_dir feature is turned off");
+
static bool ovl_index_def = IS_ENABLED(CONFIG_OVERLAY_FS_INDEX);
module_param_named(index, ovl_index_def, bool, 0644);
MODULE_PARM_DESC(ovl_index_def,
kfree(ofs->config.lowerdir);
kfree(ofs->config.upperdir);
kfree(ofs->config.workdir);
+ kfree(ofs->config.redirect_mode);
if (ofs->creator_cred)
put_cred(ofs->creator_cred);
kfree(ofs);
ovl_free_fs(ofs);
}
+/* Sync real dirty inodes in upper filesystem (if it exists) */
static int ovl_sync_fs(struct super_block *sb, int wait)
{
struct ovl_fs *ofs = sb->s_fs_info;
if (!ofs->upper_mnt)
return 0;
- upper_sb = ofs->upper_mnt->mnt_sb;
- if (!upper_sb->s_op->sync_fs)
+
+ /*
+ * If this is a sync(2) call or an emergency sync, all the super blocks
+ * will be iterated, including upper_sb, so no need to do anything.
+ *
+ * If this is a syncfs(2) call, then we do need to call
+ * sync_filesystem() on upper_sb, but enough if we do it when being
+ * called with wait == 1.
+ */
+ if (!wait)
return 0;
- /* real inodes have already been synced by sync_filesystem(ovl_sb) */
+ upper_sb = ofs->upper_mnt->mnt_sb;
+
down_read(&upper_sb->s_umount);
- ret = upper_sb->s_op->sync_fs(upper_sb, wait);
+ ret = sync_filesystem(upper_sb);
up_read(&upper_sb->s_umount);
+
return ret;
}
return (!ofs->upper_mnt || !ofs->workdir);
}
+static const char *ovl_redirect_mode_def(void)
+{
+ return ovl_redirect_dir_def ? "on" : "off";
+}
+
/**
* ovl_show_options
*
}
if (ofs->config.default_permissions)
seq_puts(m, ",default_permissions");
- if (ofs->config.redirect_dir != ovl_redirect_dir_def)
- seq_printf(m, ",redirect_dir=%s",
- ofs->config.redirect_dir ? "on" : "off");
+ if (strcmp(ofs->config.redirect_mode, ovl_redirect_mode_def()) != 0)
+ seq_printf(m, ",redirect_dir=%s", ofs->config.redirect_mode);
if (ofs->config.index != ovl_index_def)
- seq_printf(m, ",index=%s",
- ofs->config.index ? "on" : "off");
+ seq_printf(m, ",index=%s", ofs->config.index ? "on" : "off");
return 0;
}
OPT_UPPERDIR,
OPT_WORKDIR,
OPT_DEFAULT_PERMISSIONS,
- OPT_REDIRECT_DIR_ON,
- OPT_REDIRECT_DIR_OFF,
+ OPT_REDIRECT_DIR,
OPT_INDEX_ON,
OPT_INDEX_OFF,
OPT_ERR,
{OPT_UPPERDIR, "upperdir=%s"},
{OPT_WORKDIR, "workdir=%s"},
{OPT_DEFAULT_PERMISSIONS, "default_permissions"},
- {OPT_REDIRECT_DIR_ON, "redirect_dir=on"},
- {OPT_REDIRECT_DIR_OFF, "redirect_dir=off"},
+ {OPT_REDIRECT_DIR, "redirect_dir=%s"},
{OPT_INDEX_ON, "index=on"},
{OPT_INDEX_OFF, "index=off"},
{OPT_ERR, NULL}
return sbegin;
}
+static int ovl_parse_redirect_mode(struct ovl_config *config, const char *mode)
+{
+ if (strcmp(mode, "on") == 0) {
+ config->redirect_dir = true;
+ /*
+ * Does not make sense to have redirect creation without
+ * redirect following.
+ */
+ config->redirect_follow = true;
+ } else if (strcmp(mode, "follow") == 0) {
+ config->redirect_follow = true;
+ } else if (strcmp(mode, "off") == 0) {
+ if (ovl_redirect_always_follow)
+ config->redirect_follow = true;
+ } else if (strcmp(mode, "nofollow") != 0) {
+ pr_err("overlayfs: bad mount option \"redirect_dir=%s\"\n",
+ mode);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int ovl_parse_opt(char *opt, struct ovl_config *config)
{
char *p;
+ config->redirect_mode = kstrdup(ovl_redirect_mode_def(), GFP_KERNEL);
+ if (!config->redirect_mode)
+ return -ENOMEM;
+
while ((p = ovl_next_opt(&opt)) != NULL) {
int token;
substring_t args[MAX_OPT_ARGS];
config->default_permissions = true;
break;
- case OPT_REDIRECT_DIR_ON:
- config->redirect_dir = true;
- break;
-
- case OPT_REDIRECT_DIR_OFF:
- config->redirect_dir = false;
+ case OPT_REDIRECT_DIR:
+ kfree(config->redirect_mode);
+ config->redirect_mode = match_strdup(&args[0]);
+ if (!config->redirect_mode)
+ return -ENOMEM;
break;
case OPT_INDEX_ON:
config->workdir = NULL;
}
- return 0;
+ return ovl_parse_redirect_mode(config, config->redirect_mode);
}
#define OVL_WORKDIR_NAME "work"
if (!cred)
goto out_err;
- ofs->config.redirect_dir = ovl_redirect_dir_def;
ofs->config.index = ovl_index_def;
err = ovl_parse_opt((char *) data, &ofs->config);
if (err)
INIT_LIST_HEAD(&s->s_mounts);
s->s_user_ns = get_user_ns(user_ns);
+ init_rwsem(&s->s_umount);
+ lockdep_set_class(&s->s_umount, &type->s_umount_key);
+ /*
+ * sget() can have s_umount recursion.
+ *
+ * When it cannot find a suitable sb, it allocates a new
+ * one (this one), and tries again to find a suitable old
+ * one.
+ *
+ * In case that succeeds, it will acquire the s_umount
+ * lock of the old one. Since these are clearly distrinct
+ * locks, and this object isn't exposed yet, there's no
+ * risk of deadlocks.
+ *
+ * Annotate this by putting this lock in a different
+ * subclass.
+ */
+ down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING);
if (security_sb_alloc(s))
goto fail;
goto fail;
if (list_lru_init_memcg(&s->s_inode_lru))
goto fail;
-
- init_rwsem(&s->s_umount);
- lockdep_set_class(&s->s_umount, &type->s_umount_key);
- /*
- * sget() can have s_umount recursion.
- *
- * When it cannot find a suitable sb, it allocates a new
- * one (this one), and tries again to find a suitable old
- * one.
- *
- * In case that succeeds, it will acquire the s_umount
- * lock of the old one. Since these are clearly distrinct
- * locks, and this object isn't exposed yet, there's no
- * risk of deadlocks.
- *
- * Annotate this by putting this lock in a different
- * subclass.
- */
- down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING);
s->s_count = 1;
atomic_set(&s->s_active, 1);
mutex_init(&s->s_vfs_rename_mutex);
hlist_add_head(&s->s_instances, &type->fs_supers);
spin_unlock(&sb_lock);
get_filesystem(type);
- register_shrinker(&s->s_shrink);
+ err = register_shrinker(&s->s_shrink);
+ if (err) {
+ deactivate_locked_super(s);
+ s = ERR_PTR(err);
+ }
return s;
}
static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
struct userfaultfd_wait_queue *ewq)
{
+ struct userfaultfd_ctx *release_new_ctx;
+
if (WARN_ON_ONCE(current->flags & PF_EXITING))
goto out;
ewq->ctx = ctx;
init_waitqueue_entry(&ewq->wq, current);
+ release_new_ctx = NULL;
spin_lock(&ctx->event_wqh.lock);
/*
new = (struct userfaultfd_ctx *)
(unsigned long)
ewq->msg.arg.reserved.reserved1;
-
- userfaultfd_ctx_put(new);
+ release_new_ctx = new;
}
break;
}
__set_current_state(TASK_RUNNING);
spin_unlock(&ctx->event_wqh.lock);
+ if (release_new_ctx) {
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = release_new_ctx->mm;
+
+ /* the various vma->vm_userfaultfd_ctx still points to it */
+ down_write(&mm->mmap_sem);
+ for (vma = mm->mmap; vma; vma = vma->vm_next)
+ if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx)
+ vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
+ up_write(&mm->mmap_sem);
+
+ userfaultfd_ctx_put(release_new_ctx);
+ }
+
/*
* ctx may go away after this if the userfault pseudo fd is
* already released.
ASSERT(args->agbno % args->alignment == 0);
/* if not file data, insert new block into the reverse map btree */
- if (args->oinfo.oi_owner != XFS_RMAP_OWN_UNKNOWN) {
+ if (!xfs_rmap_should_skip_owner_update(&args->oinfo)) {
error = xfs_rmap_alloc(args->tp, args->agbp, args->agno,
args->agbno, args->len, &args->oinfo);
if (error)
bno_cur = cnt_cur = NULL;
mp = tp->t_mountp;
- if (oinfo->oi_owner != XFS_RMAP_OWN_UNKNOWN) {
+ if (!xfs_rmap_should_skip_owner_update(oinfo)) {
error = xfs_rmap_free(tp, agbp, agno, bno, len, oinfo);
if (error)
goto error0;
int flags)
{
struct xfs_mount *mp = dp->i_mount;
+ struct xfs_buf *leaf_bp = NULL;
struct xfs_da_args args;
struct xfs_defer_ops dfops;
struct xfs_trans_res tres;
* GROT: another possible req'mt for a double-split btree op.
*/
xfs_defer_init(args.dfops, args.firstblock);
- error = xfs_attr_shortform_to_leaf(&args);
+ error = xfs_attr_shortform_to_leaf(&args, &leaf_bp);
if (error)
goto out_defer_cancel;
+ /*
+ * Prevent the leaf buffer from being unlocked so that a
+ * concurrent AIL push cannot grab the half-baked leaf
+ * buffer and run into problems with the write verifier.
+ */
+ xfs_trans_bhold(args.trans, leaf_bp);
+ xfs_defer_bjoin(args.dfops, leaf_bp);
xfs_defer_ijoin(args.dfops, dp);
error = xfs_defer_finish(&args.trans, args.dfops);
if (error)
/*
* Commit the leaf transformation. We'll need another (linked)
- * transaction to add the new attribute to the leaf.
+ * transaction to add the new attribute to the leaf, which
+ * means that we have to hold & join the leaf buffer here too.
*/
-
error = xfs_trans_roll_inode(&args.trans, dp);
if (error)
goto out;
-
+ xfs_trans_bjoin(args.trans, leaf_bp);
+ leaf_bp = NULL;
}
if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
out_defer_cancel:
xfs_defer_cancel(&dfops);
- args.trans = NULL;
out:
+ if (leaf_bp)
+ xfs_trans_brelse(args.trans, leaf_bp);
if (args.trans)
xfs_trans_cancel(args.trans);
xfs_iunlock(dp, XFS_ILOCK_EXCL);
}
/*
- * Convert from using the shortform to the leaf.
+ * Convert from using the shortform to the leaf. On success, return the
+ * buffer so that we can keep it locked until we're totally done with it.
*/
int
-xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
+xfs_attr_shortform_to_leaf(
+ struct xfs_da_args *args,
+ struct xfs_buf **leaf_bp)
{
xfs_inode_t *dp;
xfs_attr_shortform_t *sf;
sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
}
error = 0;
-
+ *leaf_bp = bp;
out:
kmem_free(tmpbuffer);
return error;
void xfs_attr_shortform_add(struct xfs_da_args *args, int forkoff);
int xfs_attr_shortform_lookup(struct xfs_da_args *args);
int xfs_attr_shortform_getvalue(struct xfs_da_args *args);
-int xfs_attr_shortform_to_leaf(struct xfs_da_args *args);
+int xfs_attr_shortform_to_leaf(struct xfs_da_args *args,
+ struct xfs_buf **leaf_bp);
int xfs_attr_shortform_remove(struct xfs_da_args *args);
int xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp);
int xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes);
* blowing out the transaction with a mix of EFIs and reflink
* adjustments.
*/
- if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK)
+ if (tp && xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK)
max_len = min(len, xfs_refcount_max_unmap(tp->t_log_res));
else
max_len = len;
for (i = 0; i < XFS_DEFER_OPS_NR_INODES && dop->dop_inodes[i]; i++)
xfs_trans_log_inode(*tp, dop->dop_inodes[i], XFS_ILOG_CORE);
+ /* Hold the (previously bjoin'd) buffer locked across the roll. */
+ for (i = 0; i < XFS_DEFER_OPS_NR_BUFS && dop->dop_bufs[i]; i++)
+ xfs_trans_dirty_buf(*tp, dop->dop_bufs[i]);
+
trace_xfs_defer_trans_roll((*tp)->t_mountp, dop);
/* Roll the transaction. */
for (i = 0; i < XFS_DEFER_OPS_NR_INODES && dop->dop_inodes[i]; i++)
xfs_trans_ijoin(*tp, dop->dop_inodes[i], 0);
+ /* Rejoin the buffers and dirty them so the log moves forward. */
+ for (i = 0; i < XFS_DEFER_OPS_NR_BUFS && dop->dop_bufs[i]; i++) {
+ xfs_trans_bjoin(*tp, dop->dop_bufs[i]);
+ xfs_trans_bhold(*tp, dop->dop_bufs[i]);
+ }
+
return error;
}
}
}
+ ASSERT(0);
+ return -EFSCORRUPTED;
+}
+
+/*
+ * Add this buffer to the deferred op. Each joined buffer is relogged
+ * each time we roll the transaction.
+ */
+int
+xfs_defer_bjoin(
+ struct xfs_defer_ops *dop,
+ struct xfs_buf *bp)
+{
+ int i;
+
+ for (i = 0; i < XFS_DEFER_OPS_NR_BUFS; i++) {
+ if (dop->dop_bufs[i] == bp)
+ return 0;
+ else if (dop->dop_bufs[i] == NULL) {
+ dop->dop_bufs[i] = bp;
+ return 0;
+ }
+ }
+
+ ASSERT(0);
return -EFSCORRUPTED;
}
struct xfs_defer_ops *dop,
xfs_fsblock_t *fbp)
{
- dop->dop_committed = false;
- dop->dop_low = false;
- memset(&dop->dop_inodes, 0, sizeof(dop->dop_inodes));
+ memset(dop, 0, sizeof(struct xfs_defer_ops));
*fbp = NULLFSBLOCK;
INIT_LIST_HEAD(&dop->dop_intake);
INIT_LIST_HEAD(&dop->dop_pending);
};
#define XFS_DEFER_OPS_NR_INODES 2 /* join up to two inodes */
+#define XFS_DEFER_OPS_NR_BUFS 2 /* join up to two buffers */
struct xfs_defer_ops {
bool dop_committed; /* did any trans commit? */
struct list_head dop_intake; /* unlogged pending work */
struct list_head dop_pending; /* logged pending work */
- /* relog these inodes with each roll */
+ /* relog these with each roll */
struct xfs_inode *dop_inodes[XFS_DEFER_OPS_NR_INODES];
+ struct xfs_buf *dop_bufs[XFS_DEFER_OPS_NR_BUFS];
};
void xfs_defer_add(struct xfs_defer_ops *dop, enum xfs_defer_ops_type type,
void xfs_defer_init(struct xfs_defer_ops *dop, xfs_fsblock_t *fbp);
bool xfs_defer_has_unfinished_work(struct xfs_defer_ops *dop);
int xfs_defer_ijoin(struct xfs_defer_ops *dop, struct xfs_inode *ip);
+int xfs_defer_bjoin(struct xfs_defer_ops *dop, struct xfs_buf *bp);
/* Description of a deferred type. */
struct xfs_defer_op_type {
xfs_ialloc_ag_select(
xfs_trans_t *tp, /* transaction pointer */
xfs_ino_t parent, /* parent directory inode number */
- umode_t mode, /* bits set to indicate file type */
- int okalloc) /* ok to allocate more space */
+ umode_t mode) /* bits set to indicate file type */
{
xfs_agnumber_t agcount; /* number of ag's in the filesystem */
xfs_agnumber_t agno; /* current ag number */
return agno;
}
- if (!okalloc)
- goto nextag;
-
if (!pag->pagf_init) {
error = xfs_alloc_pagf_init(mp, tp, agno, flags);
if (error)
struct xfs_trans *tp,
xfs_ino_t parent,
umode_t mode,
- int okalloc,
struct xfs_buf **IO_agbp,
xfs_ino_t *inop)
{
int noroom = 0;
xfs_agnumber_t start_agno;
struct xfs_perag *pag;
+ int okalloc = 1;
if (*IO_agbp) {
/*
* We do not have an agbp, so select an initial allocation
* group for inode allocation.
*/
- start_agno = xfs_ialloc_ag_select(tp, parent, mode, okalloc);
+ start_agno = xfs_ialloc_ag_select(tp, parent, mode);
if (start_agno == NULLAGNUMBER) {
*inop = NULLFSINO;
return 0;
struct xfs_trans *tp, /* transaction pointer */
xfs_ino_t parent, /* parent inode (directory) */
umode_t mode, /* mode bits for new inode */
- int okalloc, /* ok to allocate more space */
struct xfs_buf **agbp, /* buf for a.g. inode header */
xfs_ino_t *inop); /* inode number allocated */
struct xfs_iext_leaf *new = NULL;
int nr_entries, i;
- trace_xfs_iext_insert(ip, cur, state, _RET_IP_);
-
if (ifp->if_height == 0)
xfs_iext_alloc_root(ifp, cur);
else if (ifp->if_height == 1)
xfs_iext_set(cur_rec(cur), irec);
ifp->if_bytes += sizeof(struct xfs_iext_rec);
+ trace_xfs_iext_insert(ip, cur, state, _RET_IP_);
+
if (new)
xfs_iext_insert_node(ifp, xfs_iext_leaf_key(new, 0), new, 2);
}
xfs_extlen_t aglen,
struct xfs_defer_ops *dfops)
{
- int error;
-
trace_xfs_refcount_cow_increase(rcur->bc_mp, rcur->bc_private.a.agno,
agbno, aglen);
/* Add refcount btree reservation */
- error = xfs_refcount_adjust_cow(rcur, agbno, aglen,
+ return xfs_refcount_adjust_cow(rcur, agbno, aglen,
XFS_REFCOUNT_ADJUST_COW_ALLOC, dfops);
- if (error)
- return error;
-
- /* Add rmap entry */
- if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) {
- error = xfs_rmap_alloc_extent(rcur->bc_mp, dfops,
- rcur->bc_private.a.agno,
- agbno, aglen, XFS_RMAP_OWN_COW);
- if (error)
- return error;
- }
-
- return error;
}
/*
xfs_extlen_t aglen,
struct xfs_defer_ops *dfops)
{
- int error;
-
trace_xfs_refcount_cow_decrease(rcur->bc_mp, rcur->bc_private.a.agno,
agbno, aglen);
/* Remove refcount btree reservation */
- error = xfs_refcount_adjust_cow(rcur, agbno, aglen,
+ return xfs_refcount_adjust_cow(rcur, agbno, aglen,
XFS_REFCOUNT_ADJUST_COW_FREE, dfops);
- if (error)
- return error;
-
- /* Remove rmap entry */
- if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) {
- error = xfs_rmap_free_extent(rcur->bc_mp, dfops,
- rcur->bc_private.a.agno,
- agbno, aglen, XFS_RMAP_OWN_COW);
- if (error)
- return error;
- }
-
- return error;
}
/* Record a CoW staging extent in the refcount btree. */
xfs_fsblock_t fsb,
xfs_extlen_t len)
{
+ int error;
+
if (!xfs_sb_version_hasreflink(&mp->m_sb))
return 0;
- return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW,
+ error = __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW,
fsb, len);
+ if (error)
+ return error;
+
+ /* Add rmap entry */
+ return xfs_rmap_alloc_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb),
+ XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
}
/* Forget a CoW staging event in the refcount btree. */
xfs_fsblock_t fsb,
xfs_extlen_t len)
{
+ int error;
+
if (!xfs_sb_version_hasreflink(&mp->m_sb))
return 0;
+ /* Remove rmap entry */
+ error = xfs_rmap_free_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb),
+ XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
+ if (error)
+ return error;
+
return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_FREE_COW,
fsb, len);
}
return error;
}
+/*
+ * Perform all the relevant owner checks for a removal op. If we're doing an
+ * unknown-owner removal then we have no owner information to check.
+ */
+static int
+xfs_rmap_free_check_owner(
+ struct xfs_mount *mp,
+ uint64_t ltoff,
+ struct xfs_rmap_irec *rec,
+ xfs_fsblock_t bno,
+ xfs_filblks_t len,
+ uint64_t owner,
+ uint64_t offset,
+ unsigned int flags)
+{
+ int error = 0;
+
+ if (owner == XFS_RMAP_OWN_UNKNOWN)
+ return 0;
+
+ /* Make sure the unwritten flag matches. */
+ XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) ==
+ (rec->rm_flags & XFS_RMAP_UNWRITTEN), out);
+
+ /* Make sure the owner matches what we expect to find in the tree. */
+ XFS_WANT_CORRUPTED_GOTO(mp, owner == rec->rm_owner, out);
+
+ /* Check the offset, if necessary. */
+ if (XFS_RMAP_NON_INODE_OWNER(owner))
+ goto out;
+
+ if (flags & XFS_RMAP_BMBT_BLOCK) {
+ XFS_WANT_CORRUPTED_GOTO(mp, rec->rm_flags & XFS_RMAP_BMBT_BLOCK,
+ out);
+ } else {
+ XFS_WANT_CORRUPTED_GOTO(mp, rec->rm_offset <= offset, out);
+ XFS_WANT_CORRUPTED_GOTO(mp,
+ ltoff + rec->rm_blockcount >= offset + len,
+ out);
+ }
+
+out:
+ return error;
+}
+
/*
* Find the extent in the rmap btree and remove it.
*
goto out_done;
}
- /* Make sure the unwritten flag matches. */
- XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) ==
- (ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error);
+ /*
+ * If we're doing an unknown-owner removal for EFI recovery, we expect
+ * to find the full range in the rmapbt or nothing at all. If we
+ * don't find any rmaps overlapping either end of the range, we're
+ * done. Hopefully this means that the EFI creator already queued
+ * (and finished) a RUI to remove the rmap.
+ */
+ if (owner == XFS_RMAP_OWN_UNKNOWN &&
+ ltrec.rm_startblock + ltrec.rm_blockcount <= bno) {
+ struct xfs_rmap_irec rtrec;
+
+ error = xfs_btree_increment(cur, 0, &i);
+ if (error)
+ goto out_error;
+ if (i == 0)
+ goto out_done;
+ error = xfs_rmap_get_rec(cur, &rtrec, &i);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+ if (rtrec.rm_startblock >= bno + len)
+ goto out_done;
+ }
/* Make sure the extent we found covers the entire freeing range. */
XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno &&
- ltrec.rm_startblock + ltrec.rm_blockcount >=
- bno + len, out_error);
+ ltrec.rm_startblock + ltrec.rm_blockcount >=
+ bno + len, out_error);
- /* Make sure the owner matches what we expect to find in the tree. */
- XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner ||
- XFS_RMAP_NON_INODE_OWNER(owner), out_error);
-
- /* Check the offset, if necessary. */
- if (!XFS_RMAP_NON_INODE_OWNER(owner)) {
- if (flags & XFS_RMAP_BMBT_BLOCK) {
- XFS_WANT_CORRUPTED_GOTO(mp,
- ltrec.rm_flags & XFS_RMAP_BMBT_BLOCK,
- out_error);
- } else {
- XFS_WANT_CORRUPTED_GOTO(mp,
- ltrec.rm_offset <= offset, out_error);
- XFS_WANT_CORRUPTED_GOTO(mp,
- ltoff + ltrec.rm_blockcount >= offset + len,
- out_error);
- }
- }
+ /* Check owner information. */
+ error = xfs_rmap_free_check_owner(mp, ltoff, <rec, bno, len, owner,
+ offset, flags);
+ if (error)
+ goto out_error;
if (ltrec.rm_startblock == bno && ltrec.rm_blockcount == len) {
/* exact match, simply remove the record from rmap tree */
flags |= XFS_RMAP_UNWRITTEN;
trace_xfs_rmap_map(mp, cur->bc_private.a.agno, bno, len,
unwritten, oinfo);
+ ASSERT(!xfs_rmap_should_skip_owner_update(oinfo));
/*
* For the initial lookup, look for an exact match or the left-adjacent
xfs_rmap_skip_owner_update(
struct xfs_owner_info *oi)
{
- oi->oi_owner = XFS_RMAP_OWN_UNKNOWN;
+ xfs_rmap_ag_owner(oi, XFS_RMAP_OWN_NULL);
+}
+
+static inline bool
+xfs_rmap_should_skip_owner_update(
+ struct xfs_owner_info *oi)
+{
+ return oi->oi_owner == XFS_RMAP_OWN_NULL;
+}
+
+static inline void
+xfs_rmap_any_owner_update(
+ struct xfs_owner_info *oi)
+{
+ xfs_rmap_ag_owner(oi, XFS_RMAP_OWN_UNKNOWN);
}
/* Reverse mapping functions. */
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
-#include "scrub/scrub.h"
#include "scrub/btree.h"
/*
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_da_format.h"
-#include "xfs_defer.h"
#include "xfs_inode.h"
#include "xfs_btree.h"
#include "xfs_trans.h"
(ip->i_df.if_flags & XFS_IFEXTENTS));
ASSERT(offset <= mp->m_super->s_maxbytes);
- if ((xfs_ufsize_t)offset + count > mp->m_super->s_maxbytes)
+ if (offset > mp->m_super->s_maxbytes - count)
count = mp->m_super->s_maxbytes - offset;
end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
offset_fsb = XFS_B_TO_FSBT(mp, offset);
lockmode = xfs_ilock_data_map_shared(ip);
ASSERT(offset <= mp->m_super->s_maxbytes);
- if ((xfs_ufsize_t)offset + size > mp->m_super->s_maxbytes)
+ if (offset > mp->m_super->s_maxbytes - size)
size = mp->m_super->s_maxbytes - offset;
end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + size);
offset_fsb = XFS_B_TO_FSBT(mp, offset);
return error;
efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
- xfs_rmap_skip_owner_update(&oinfo);
+ xfs_rmap_any_owner_update(&oinfo);
for (i = 0; i < efip->efi_format.efi_nextents; i++) {
extp = &efip->efi_format.efi_extents[i];
error = xfs_trans_free_extent(tp, efdp, extp->ext_start,
* this doesn't actually exist in the rmap btree.
*/
xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL);
+ error = xfs_rmap_free(tp, bp, agno,
+ be32_to_cpu(agf->agf_length) - new,
+ new, &oinfo);
+ if (error)
+ goto error0;
error = xfs_free_extent(tp,
XFS_AGB_TO_FSB(mp, agno,
be32_to_cpu(agf->agf_length) - new),
* based on the 'speculative_cow_prealloc_lifetime' tunable (5m by default).
* (We'll just piggyback on the post-EOF prealloc space workqueue.)
*/
-STATIC void
+void
xfs_queue_cowblocks(
struct xfs_mount *mp)
{
return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_eofblocks);
}
+static inline unsigned long
+xfs_iflag_for_tag(
+ int tag)
+{
+ switch (tag) {
+ case XFS_ICI_EOFBLOCKS_TAG:
+ return XFS_IEOFBLOCKS;
+ case XFS_ICI_COWBLOCKS_TAG:
+ return XFS_ICOWBLOCKS;
+ default:
+ ASSERT(0);
+ return 0;
+ }
+}
+
static void
-__xfs_inode_set_eofblocks_tag(
+__xfs_inode_set_blocks_tag(
xfs_inode_t *ip,
void (*execute)(struct xfs_mount *mp),
void (*set_tp)(struct xfs_mount *mp, xfs_agnumber_t agno,
* Don't bother locking the AG and looking up in the radix trees
* if we already know that we have the tag set.
*/
- if (ip->i_flags & XFS_IEOFBLOCKS)
+ if (ip->i_flags & xfs_iflag_for_tag(tag))
return;
spin_lock(&ip->i_flags_lock);
- ip->i_flags |= XFS_IEOFBLOCKS;
+ ip->i_flags |= xfs_iflag_for_tag(tag);
spin_unlock(&ip->i_flags_lock);
pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
xfs_inode_t *ip)
{
trace_xfs_inode_set_eofblocks_tag(ip);
- return __xfs_inode_set_eofblocks_tag(ip, xfs_queue_eofblocks,
+ return __xfs_inode_set_blocks_tag(ip, xfs_queue_eofblocks,
trace_xfs_perag_set_eofblocks,
XFS_ICI_EOFBLOCKS_TAG);
}
static void
-__xfs_inode_clear_eofblocks_tag(
+__xfs_inode_clear_blocks_tag(
xfs_inode_t *ip,
void (*clear_tp)(struct xfs_mount *mp, xfs_agnumber_t agno,
int error, unsigned long caller_ip),
struct xfs_perag *pag;
spin_lock(&ip->i_flags_lock);
- ip->i_flags &= ~XFS_IEOFBLOCKS;
+ ip->i_flags &= ~xfs_iflag_for_tag(tag);
spin_unlock(&ip->i_flags_lock);
pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
xfs_inode_t *ip)
{
trace_xfs_inode_clear_eofblocks_tag(ip);
- return __xfs_inode_clear_eofblocks_tag(ip,
+ return __xfs_inode_clear_blocks_tag(ip,
trace_xfs_perag_clear_eofblocks, XFS_ICI_EOFBLOCKS_TAG);
}
xfs_inode_t *ip)
{
trace_xfs_inode_set_cowblocks_tag(ip);
- return __xfs_inode_set_eofblocks_tag(ip, xfs_queue_cowblocks,
+ return __xfs_inode_set_blocks_tag(ip, xfs_queue_cowblocks,
trace_xfs_perag_set_cowblocks,
XFS_ICI_COWBLOCKS_TAG);
}
xfs_inode_t *ip)
{
trace_xfs_inode_clear_cowblocks_tag(ip);
- return __xfs_inode_clear_eofblocks_tag(ip,
+ return __xfs_inode_clear_blocks_tag(ip,
trace_xfs_perag_clear_cowblocks, XFS_ICI_COWBLOCKS_TAG);
}
int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *);
int xfs_inode_free_quota_cowblocks(struct xfs_inode *ip);
void xfs_cowblocks_worker(struct work_struct *);
+void xfs_queue_cowblocks(struct xfs_mount *);
int xfs_inode_ag_iterator(struct xfs_mount *mp,
int (*execute)(struct xfs_inode *ip, int flags, void *args),
xfs_nlink_t nlink,
dev_t rdev,
prid_t prid,
- int okalloc,
xfs_buf_t **ialloc_context,
xfs_inode_t **ipp)
{
* Call the space management code to pick
* the on-disk inode to be allocated.
*/
- error = xfs_dialloc(tp, pip ? pip->i_ino : 0, mode, okalloc,
+ error = xfs_dialloc(tp, pip ? pip->i_ino : 0, mode,
ialloc_context, &ino);
if (error)
return error;
xfs_nlink_t nlink,
dev_t rdev,
prid_t prid, /* project id */
- int okalloc, /* ok to allocate new space */
xfs_inode_t **ipp, /* pointer to inode; it will be
locked. */
int *committed)
* transaction commit so that no other process can steal
* the inode(s) that we've just allocated.
*/
- code = xfs_ialloc(tp, dp, mode, nlink, rdev, prid, okalloc,
- &ialloc_context, &ip);
+ code = xfs_ialloc(tp, dp, mode, nlink, rdev, prid, &ialloc_context,
+ &ip);
/*
* Return an error if we were unable to allocate a new inode.
* this call should always succeed.
*/
code = xfs_ialloc(tp, dp, mode, nlink, rdev, prid,
- okalloc, &ialloc_context, &ip);
+ &ialloc_context, &ip);
/*
* If we get an error at this point, return to the caller
xfs_flush_inodes(mp);
error = xfs_trans_alloc(mp, tres, resblks, 0, 0, &tp);
}
- if (error == -ENOSPC) {
- /* No space at all so try a "no-allocation" reservation */
- resblks = 0;
- error = xfs_trans_alloc(mp, tres, 0, 0, 0, &tp);
- }
if (error)
goto out_release_inode;
if (error)
goto out_trans_cancel;
- if (!resblks) {
- error = xfs_dir_canenter(tp, dp, name);
- if (error)
- goto out_trans_cancel;
- }
-
/*
* A newly created regular or special file just has one directory
* entry pointing to them, but a directory also the "." entry
* pointing to itself.
*/
- error = xfs_dir_ialloc(&tp, dp, mode, is_dir ? 2 : 1, rdev,
- prid, resblks > 0, &ip, NULL);
+ error = xfs_dir_ialloc(&tp, dp, mode, is_dir ? 2 : 1, rdev, prid, &ip,
+ NULL);
if (error)
goto out_trans_cancel;
tres = &M_RES(mp)->tr_create_tmpfile;
error = xfs_trans_alloc(mp, tres, resblks, 0, 0, &tp);
- if (error == -ENOSPC) {
- /* No space at all so try a "no-allocation" reservation */
- resblks = 0;
- error = xfs_trans_alloc(mp, tres, 0, 0, 0, &tp);
- }
if (error)
goto out_release_inode;
if (error)
goto out_trans_cancel;
- error = xfs_dir_ialloc(&tp, dp, mode, 1, 0,
- prid, resblks > 0, &ip, NULL);
+ error = xfs_dir_ialloc(&tp, dp, mode, 1, 0, prid, &ip, NULL);
if (error)
goto out_trans_cancel;
return error;
}
+/* Clear the reflink flag and the cowblocks tag if possible. */
+static void
+xfs_itruncate_clear_reflink_flags(
+ struct xfs_inode *ip)
+{
+ struct xfs_ifork *dfork;
+ struct xfs_ifork *cfork;
+
+ if (!xfs_is_reflink_inode(ip))
+ return;
+ dfork = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+ cfork = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+ if (dfork->if_bytes == 0 && cfork->if_bytes == 0)
+ ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+ if (cfork->if_bytes == 0)
+ xfs_inode_clear_cowblocks_tag(ip);
+}
+
/*
* Free up the underlying blocks past new_size. The new size must be smaller
* than the current size. This routine can be used both for the attribute and
if (error)
goto out;
- /*
- * Clear the reflink flag if there are no data fork blocks and
- * there are no extents staged in the cow fork.
- */
- if (xfs_is_reflink_inode(ip) && ip->i_cnextents == 0) {
- if (ip->i_d.di_nblocks == 0)
- ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
- xfs_inode_clear_cowblocks_tag(ip);
- }
+ xfs_itruncate_clear_reflink_flags(ip);
/*
* Always re-log the inode so that our permanent transaction can keep
* log recovery to replay a bmap operation on the inode.
*/
#define XFS_IRECOVERY (1 << 11)
+#define XFS_ICOWBLOCKS (1 << 12)/* has the cowblocks tag set */
/*
* Per-lifetime flags need to be reset when re-using a reclaimable inode during
xfs_extlen_t xfs_get_cowextsz_hint(struct xfs_inode *ip);
int xfs_dir_ialloc(struct xfs_trans **, struct xfs_inode *, umode_t,
- xfs_nlink_t, dev_t, prid_t, int,
+ xfs_nlink_t, dev_t, prid_t,
struct xfs_inode **, int *);
/* from xfs_file.c */
}
ASSERT(offset <= mp->m_super->s_maxbytes);
- if ((xfs_fsize_t)offset + length > mp->m_super->s_maxbytes)
+ if (offset > mp->m_super->s_maxbytes - length)
length = mp->m_super->s_maxbytes - offset;
offset_fsb = XFS_B_TO_FSBT(mp, offset);
end_fsb = XFS_B_TO_FSB(mp, offset + length);
ASSERT(ip->i_d.di_aformat != XFS_DINODE_FMT_LOCAL);
error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
- &nimaps, XFS_BMAPI_ENTIRE | XFS_BMAPI_ATTRFORK);
+ &nimaps, XFS_BMAPI_ATTRFORK);
out_unlock:
xfs_iunlock(ip, lockmode);
STATIC int xfs_qm_init_quotainos(xfs_mount_t *);
STATIC int xfs_qm_init_quotainfo(xfs_mount_t *);
-
+STATIC void xfs_qm_destroy_quotainos(xfs_quotainfo_t *qi);
STATIC void xfs_qm_dqfree_one(struct xfs_dquot *dqp);
/*
* We use the batch lookup interface to iterate over the dquots as it
qinf->qi_shrinker.scan_objects = xfs_qm_shrink_scan;
qinf->qi_shrinker.seeks = DEFAULT_SEEKS;
qinf->qi_shrinker.flags = SHRINKER_NUMA_AWARE;
- register_shrinker(&qinf->qi_shrinker);
+
+ error = register_shrinker(&qinf->qi_shrinker);
+ if (error)
+ goto out_free_inos;
+
return 0;
+out_free_inos:
+ mutex_destroy(&qinf->qi_quotaofflock);
+ mutex_destroy(&qinf->qi_tree_lock);
+ xfs_qm_destroy_quotainos(qinf);
out_free_lru:
list_lru_destroy(&qinf->qi_lru);
out_free_qinf:
return error;
}
-
/*
* Gets called when unmounting a filesystem or when all quotas get
* turned off.
unregister_shrinker(&qi->qi_shrinker);
list_lru_destroy(&qi->qi_lru);
-
- if (qi->qi_uquotaip) {
- IRELE(qi->qi_uquotaip);
- qi->qi_uquotaip = NULL; /* paranoia */
- }
- if (qi->qi_gquotaip) {
- IRELE(qi->qi_gquotaip);
- qi->qi_gquotaip = NULL;
- }
- if (qi->qi_pquotaip) {
- IRELE(qi->qi_pquotaip);
- qi->qi_pquotaip = NULL;
- }
+ xfs_qm_destroy_quotainos(qi);
+ mutex_destroy(&qi->qi_tree_lock);
mutex_destroy(&qi->qi_quotaofflock);
kmem_free(qi);
mp->m_quotainfo = NULL;
return error;
if (need_alloc) {
- error = xfs_dir_ialloc(&tp, NULL, S_IFREG, 1, 0, 0, 1, ip,
- &committed);
+ error = xfs_dir_ialloc(&tp, NULL, S_IFREG, 1, 0, 0, ip,
+ &committed);
if (error) {
xfs_trans_cancel(tp);
return error;
return error;
}
+STATIC void
+xfs_qm_destroy_quotainos(
+ xfs_quotainfo_t *qi)
+{
+ if (qi->qi_uquotaip) {
+ IRELE(qi->qi_uquotaip);
+ qi->qi_uquotaip = NULL; /* paranoia */
+ }
+ if (qi->qi_gquotaip) {
+ IRELE(qi->qi_gquotaip);
+ qi->qi_gquotaip = NULL;
+ }
+ if (qi->qi_pquotaip) {
+ IRELE(qi->qi_pquotaip);
+ qi->qi_pquotaip = NULL;
+ }
+}
+
STATIC void
xfs_qm_dqfree_one(
struct xfs_dquot *dqp)
#include "xfs_alloc.h"
#include "xfs_quota_defs.h"
#include "xfs_quota.h"
-#include "xfs_btree.h"
-#include "xfs_bmap_btree.h"
#include "xfs_reflink.h"
#include "xfs_iomap.h"
#include "xfs_rmap_btree.h"
if (error)
goto out_bmap_cancel;
+ xfs_inode_set_cowblocks_tag(ip);
+
/* Finish up. */
error = xfs_defer_finish(&tp, &dfops);
if (error)
struct xfs_iext_cursor icur;
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
- ASSERT(xfs_is_reflink_inode(ip));
+ if (!xfs_is_reflink_inode(ip))
+ return false;
offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
return false;
/* Remove the mapping from the CoW fork. */
xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
+ } else {
+ /* Didn't do anything, push cursor back. */
+ xfs_iext_prev(ifp, &icur);
}
next_extent:
if (!xfs_iext_get_extent(ifp, &icur, &got))
(unsigned int)(end_fsb - offset_fsb),
XFS_DATA_FORK);
error = xfs_trans_alloc(ip->i_mount, &M_RES(ip->i_mount)->tr_write,
- resblks, 0, 0, &tp);
+ resblks, 0, XFS_TRANS_RESERVE, &tp);
if (error)
goto out;
trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out);
+ /*
+ * Clear out post-eof preallocations because we don't have page cache
+ * backing the delayed allocations and they'll never get freed on
+ * their own.
+ */
+ if (xfs_can_free_eofblocks(dest, true)) {
+ ret = xfs_free_eofblocks(dest);
+ if (ret)
+ goto out_unlock;
+ }
+
/* Set flags and remap blocks. */
ret = xfs_reflink_set_inode_flag(src, dest);
if (ret)
xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
return error;
}
+ xfs_queue_cowblocks(mp);
/* Create the per-AG metadata reservation pool .*/
error = xfs_fs_reserve_ag_blocks(mp);
/* rw -> ro */
if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & SB_RDONLY)) {
+ /* Get rid of any leftover CoW reservations... */
+ cancel_delayed_work_sync(&mp->m_cowblocks_work);
+ error = xfs_icache_free_cowblocks(mp, NULL);
+ if (error) {
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+ return error;
+ }
+
/* Free the per-AG metadata reservation pool. */
error = xfs_fs_unreserve_ag_blocks(mp);
if (error) {
resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
error = xfs_trans_alloc(mp, &M_RES(mp)->tr_symlink, resblks, 0, 0, &tp);
- if (error == -ENOSPC && fs_blocks == 0) {
- resblks = 0;
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_symlink, 0, 0, 0,
- &tp);
- }
if (error)
goto out_release_inode;
if (error)
goto out_trans_cancel;
- /*
- * Check for ability to enter directory entry, if no space reserved.
- */
- if (!resblks) {
- error = xfs_dir_canenter(tp, dp, link_name);
- if (error)
- goto out_trans_cancel;
- }
/*
* Initialize the bmap freelist prior to calling either
* bmapi or the directory create code.
* Allocate an inode for the symlink.
*/
error = xfs_dir_ialloc(&tp, dp, S_IFLNK | (mode & ~S_IFMT), 1, 0,
- prid, resblks > 0, &ip, NULL);
+ prid, &ip, NULL);
if (error)
goto out_trans_cancel;
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_da_format.h"
-#include "xfs_defer.h"
#include "xfs_inode.h"
#include "xfs_btree.h"
#include "xfs_da_btree.h"
#ifndef _ASM_GENERIC_MM_HOOKS_H
#define _ASM_GENERIC_MM_HOOKS_H
-static inline void arch_dup_mmap(struct mm_struct *oldmm,
- struct mm_struct *mm)
+static inline int arch_dup_mmap(struct mm_struct *oldmm,
+ struct mm_struct *mm)
{
+ return 0;
}
static inline void arch_exit_mmap(struct mm_struct *mm)
struct file;
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot);
+
+#ifndef CONFIG_X86_ESPFIX64
+static inline void init_espfix_bsp(void) { }
+#endif
+
#endif /* !__ASSEMBLY__ */
#ifndef io_remap_pfn_range
struct ahash_instance *inst);
void ahash_free_instance(struct crypto_instance *inst);
+int shash_no_setkey(struct crypto_shash *tfm, const u8 *key,
+ unsigned int keylen);
+
+static inline bool crypto_shash_alg_has_setkey(struct shash_alg *alg)
+{
+ return alg->setkey != shash_no_setkey;
+}
+
int crypto_init_ahash_spawn(struct crypto_ahash_spawn *spawn,
struct hash_alg_common *alg,
struct crypto_instance *inst);
#define __DRM_CONNECTOR_H__
#include <linux/list.h>
+#include <linux/llist.h>
#include <linux/ctype.h>
#include <linux/hdmi.h>
#include <drm/drm_mode_object.h>
uint16_t tile_h_size, tile_v_size;
/**
- * @free_work:
+ * @free_node:
*
- * Work used only by &drm_connector_iter to be able to clean up a
- * connector from any context.
+ * List used only by &drm_connector_iter to be able to clean up a
+ * connector from any context, in conjunction with
+ * &drm_mode_config.connector_free_work.
*/
- struct work_struct free_work;
+ struct llist_node free_node;
};
#define obj_to_connector(x) container_of(x, struct drm_connector, base)
struct edid *drm_get_edid_switcheroo(struct drm_connector *connector,
struct i2c_adapter *adapter);
struct edid *drm_edid_duplicate(const struct edid *edid);
+void drm_reset_display_info(struct drm_connector *connector);
+u32 drm_add_display_info(struct drm_connector *connector, const struct edid *edid);
int drm_add_edid_modes(struct drm_connector *connector, struct edid *edid);
u8 drm_match_cea_mode(const struct drm_display_mode *to_match);
#include <linux/types.h>
#include <linux/idr.h>
#include <linux/workqueue.h>
+#include <linux/llist.h>
#include <drm/drm_modeset_lock.h>
/**
* @connector_list_lock: Protects @num_connector and
- * @connector_list.
+ * @connector_list and @connector_free_list.
*/
spinlock_t connector_list_lock;
/**
* &struct drm_connector_list_iter to walk this list.
*/
struct list_head connector_list;
+ /**
+ * @connector_free_list:
+ *
+ * List of connector objects linked with &drm_connector.free_head.
+ * Protected by @connector_list_lock. Used by
+ * drm_for_each_connector_iter() and
+ * &struct drm_connector_list_iter to savely free connectors using
+ * @connector_free_work.
+ */
+ struct llist_head connector_free_list;
+ /**
+ * @connector_free_work: Work to clean up @connector_free_list.
+ */
+ struct work_struct connector_free_work;
+
/**
* @num_encoder:
*
bool enabled;
};
-int kvm_timer_hyp_init(void);
+int kvm_timer_hyp_init(bool);
int kvm_timer_enable(struct kvm_vcpu *vcpu);
int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
#define bio_set_dev(bio, bdev) \
do { \
+ if ((bio)->bi_disk != (bdev)->bd_disk) \
+ bio_clear_flag(bio, BIO_THROTTLED);\
(bio)->bi_disk = (bdev)->bd_disk; \
(bio)->bi_partno = (bdev)->bd_partno; \
} while (0)
struct bio {
struct bio *bi_next; /* request queue link */
struct gendisk *bi_disk;
- u8 bi_partno;
- blk_status_t bi_status;
unsigned int bi_opf; /* bottom bits req flags,
* top bits REQ_OP. Use
* accessors.
unsigned short bi_flags; /* status, etc and bvec pool number */
unsigned short bi_ioprio;
unsigned short bi_write_hint;
-
- struct bvec_iter bi_iter;
+ blk_status_t bi_status;
+ u8 bi_partno;
/* Number of segments in this BIO after
* physical address coalescing is performed.
unsigned int bi_seg_front_size;
unsigned int bi_seg_back_size;
- atomic_t __bi_remaining;
+ struct bvec_iter bi_iter;
+ atomic_t __bi_remaining;
bio_end_io_t *bi_end_io;
void *bi_private;
struct request {
struct list_head queuelist;
union {
- call_single_data_t csd;
+ struct __call_single_data csd;
u64 fifo_time;
};
struct request *next_rq;
};
+static inline bool blk_op_is_scsi(unsigned int op)
+{
+ return op == REQ_OP_SCSI_IN || op == REQ_OP_SCSI_OUT;
+}
+
+static inline bool blk_op_is_private(unsigned int op)
+{
+ return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT;
+}
+
static inline bool blk_rq_is_scsi(struct request *rq)
{
- return req_op(rq) == REQ_OP_SCSI_IN || req_op(rq) == REQ_OP_SCSI_OUT;
+ return blk_op_is_scsi(req_op(rq));
}
static inline bool blk_rq_is_private(struct request *rq)
{
- return req_op(rq) == REQ_OP_DRV_IN || req_op(rq) == REQ_OP_DRV_OUT;
+ return blk_op_is_private(req_op(rq));
}
static inline bool blk_rq_is_passthrough(struct request *rq)
return blk_rq_is_scsi(rq) || blk_rq_is_private(rq);
}
+static inline bool bio_is_passthrough(struct bio *bio)
+{
+ unsigned op = bio_op(bio);
+
+ return blk_op_is_scsi(op) || blk_op_is_private(op);
+}
+
static inline unsigned short req_get_ioprio(struct request *req)
{
return req->ioprio;
extern void blk_rq_unprep_clone(struct request *rq);
extern blk_status_t blk_insert_cloned_request(struct request_queue *q,
struct request *rq);
-extern int blk_rq_append_bio(struct request *rq, struct bio *bio);
+extern int blk_rq_append_bio(struct request *rq, struct bio **bio);
extern void blk_delay_queue(struct request_queue *, unsigned long);
extern void blk_queue_split(struct request_queue *, struct bio **);
extern void blk_recount_segments(struct request_queue *, struct bio *);
};
struct bpf_map {
- atomic_t refcnt;
+ /* 1st cacheline with read-mostly members of which some
+ * are also accessed in fast-path (e.g. ops, max_entries).
+ */
+ const struct bpf_map_ops *ops ____cacheline_aligned;
+ struct bpf_map *inner_map_meta;
+#ifdef CONFIG_SECURITY
+ void *security;
+#endif
enum bpf_map_type map_type;
u32 key_size;
u32 value_size;
u32 pages;
u32 id;
int numa_node;
- struct user_struct *user;
- const struct bpf_map_ops *ops;
- struct work_struct work;
+ bool unpriv_array;
+ /* 7 bytes hole */
+
+ /* 2nd cacheline with misc members to avoid false sharing
+ * particularly with refcounting.
+ */
+ struct user_struct *user ____cacheline_aligned;
+ atomic_t refcnt;
atomic_t usercnt;
- struct bpf_map *inner_map_meta;
+ struct work_struct work;
char name[BPF_OBJ_NAME_LEN];
-#ifdef CONFIG_SECURITY
- void *security;
-#endif
};
/* function argument constraints */
struct bpf_array {
struct bpf_map map;
u32 elem_size;
+ u32 index_mask;
/* 'ownership' of prog_array is claimed by the first program that
* is going to use this map or by the first program which FD is stored
* in the map to make sure that all callers and callees have the same
attr->numa_node : NUMA_NO_NODE;
}
+struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type);
+
#else /* !CONFIG_BPF_SYSCALL */
static inline struct bpf_prog *bpf_prog_get(u32 ufd)
{
{
return 0;
}
+
+static inline struct bpf_prog *bpf_prog_get_type_path(const char *name,
+ enum bpf_prog_type type)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
#endif /* CONFIG_BPF_SYSCALL */
static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
return bpf_prog_get_type_dev(ufd, type, false);
}
+bool bpf_prog_get_ok(struct bpf_prog *, enum bpf_prog_type *, bool);
+
int bpf_prog_offload_compile(struct bpf_prog *prog);
void bpf_prog_offload_destroy(struct bpf_prog *prog);
* In practice this is far bigger than any realistic pointer offset; this limit
* ensures that umax_value + (int)off + (int)size cannot overflow a u64.
*/
-#define BPF_MAX_VAR_OFF (1ULL << 31)
+#define BPF_MAX_VAR_OFF (1 << 29)
/* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO]. This ensures
* that converting umax_value to int cannot overflow.
*/
-#define BPF_MAX_VAR_SIZ INT_MAX
+#define BPF_MAX_VAR_SIZ (1 << 29)
/* Liveness marks, used for registers and spilled-regs (in stack slots).
* Read marks propagate upwards until they find a write mark; they record that
/*
* Prevent the compiler from merging or refetching reads or writes. The
* compiler is also forbidden from reordering successive instances of
- * READ_ONCE, WRITE_ONCE and ACCESS_ONCE (see below), but only when the
- * compiler is aware of some particular ordering. One way to make the
- * compiler aware of ordering is to put the two invocations of READ_ONCE,
- * WRITE_ONCE or ACCESS_ONCE() in different C statements.
+ * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
+ * particular ordering. One way to make the compiler aware of ordering is to
+ * put the two invocations of READ_ONCE or WRITE_ONCE in different C
+ * statements.
*
- * In contrast to ACCESS_ONCE these two macros will also work on aggregate
- * data types like structs or unions. If the size of the accessed data
- * type exceeds the word size of the machine (e.g., 32 bits or 64 bits)
- * READ_ONCE() and WRITE_ONCE() will fall back to memcpy(). There's at
- * least two memcpy()s: one for the __builtin_memcpy() and then one for
- * the macro doing the copy of variable - '__u' allocated on the stack.
+ * These two macros will also work on aggregate data types like structs or
+ * unions. If the size of the accessed data type exceeds the word size of
+ * the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
+ * fall back to memcpy(). There's at least two memcpy()s: one for the
+ * __builtin_memcpy() and then one for the macro doing the copy of variable
+ * - '__u' allocated on the stack.
*
* Their two major use cases are: (1) Mediating communication between
* process-level code and irq/NMI handlers, all running on the same CPU,
- * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
+ * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
* mutilate accesses that either do not require ordering or that interact
* with an explicit memory barrier or atomic instruction that provides the
* required ordering.
compiletime_assert(__native_word(t), \
"Need native word sized stores/loads for atomicity.")
-/*
- * Prevent the compiler from merging or refetching accesses. The compiler
- * is also forbidden from reordering successive instances of ACCESS_ONCE(),
- * but only when the compiler is aware of some particular ordering. One way
- * to make the compiler aware of ordering is to put the two invocations of
- * ACCESS_ONCE() in different C statements.
- *
- * ACCESS_ONCE will only work on scalar types. For union types, ACCESS_ONCE
- * on a union member will work as long as the size of the member matches the
- * size of the union and the size is smaller than word size.
- *
- * The major use cases of ACCESS_ONCE used to be (1) Mediating communication
- * between process-level code and irq/NMI handlers, all running on the same CPU,
- * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
- * mutilate accesses that either do not require ordering or that interact
- * with an explicit memory barrier or atomic instruction that provides the
- * required ordering.
- *
- * If possible use READ_ONCE()/WRITE_ONCE() instead.
- */
-#define __ACCESS_ONCE(x) ({ \
- __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \
- (volatile typeof(x) *)&(x); })
-#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
-
#endif /* __LINUX_COMPILER_H */
*/
#include <linux/wait.h>
-#ifdef CONFIG_LOCKDEP_COMPLETIONS
-#include <linux/lockdep.h>
-#endif
/*
* struct completion - structure used to maintain state for a "completion"
struct completion {
unsigned int done;
wait_queue_head_t wait;
-#ifdef CONFIG_LOCKDEP_COMPLETIONS
- struct lockdep_map_cross map;
-#endif
};
-#ifdef CONFIG_LOCKDEP_COMPLETIONS
-static inline void complete_acquire(struct completion *x)
-{
- lock_acquire_exclusive((struct lockdep_map *)&x->map, 0, 0, NULL, _RET_IP_);
-}
-
-static inline void complete_release(struct completion *x)
-{
- lock_release((struct lockdep_map *)&x->map, 0, _RET_IP_);
-}
-
-static inline void complete_release_commit(struct completion *x)
-{
- lock_commit_crosslock((struct lockdep_map *)&x->map);
-}
-
-#define init_completion_map(x, m) \
-do { \
- lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \
- (m)->name, (m)->key, 0); \
- __init_completion(x); \
-} while (0)
-
-#define init_completion(x) \
-do { \
- static struct lock_class_key __key; \
- lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \
- "(completion)" #x, \
- &__key, 0); \
- __init_completion(x); \
-} while (0)
-#else
#define init_completion_map(x, m) __init_completion(x)
#define init_completion(x) __init_completion(x)
static inline void complete_acquire(struct completion *x) {}
static inline void complete_release(struct completion *x) {}
static inline void complete_release_commit(struct completion *x) {}
-#endif
-#ifdef CONFIG_LOCKDEP_COMPLETIONS
-#define COMPLETION_INITIALIZER(work) \
- { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
- STATIC_CROSS_LOCKDEP_MAP_INIT("(completion)" #work, &(work)) }
-#else
#define COMPLETION_INITIALIZER(work) \
{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
-#endif
#define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
(*({ init_completion_map(&(work), &(map)); &(work); }))
CPUHP_MM_ZSWP_POOL_PREPARE,
CPUHP_KVM_PPC_BOOK3S_PREPARE,
CPUHP_ZCOMP_PREPARE,
- CPUHP_TIMERS_DEAD,
+ CPUHP_TIMERS_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
CPUHP_BP_PREPARE_DYN,
CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20,
extern void set_groups(struct cred *, struct group_info *);
extern int groups_search(const struct group_info *, kgid_t);
extern bool may_setgroups(void);
+extern void groups_sort(struct group_info *);
/*
* The security context of a task
struct capsule_info {
efi_capsule_header_t header;
+ efi_capsule_header_t *capsule;
int reset_type;
long index;
size_t count;
size_t total_size;
- phys_addr_t *pages;
+ struct page **pages;
+ phys_addr_t *phys;
size_t page_bytes_remain;
};
{
if (fscache_cookie_valid(cookie) && PageFsCache(page))
return __fscache_maybe_release_page(cookie, page, gfp);
- return false;
+ return true;
}
/**
/**
* @lock_key:
*
- * Per GPIO IRQ chip lockdep class.
+ * Per GPIO IRQ chip lockdep classes.
*/
struct lock_class_key *lock_key;
+ struct lock_class_key *request_key;
/**
* @parent_handler:
/* add/remove chips */
extern int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data,
- struct lock_class_key *lock_key);
+ struct lock_class_key *lock_key,
+ struct lock_class_key *request_key);
/**
* gpiochip_add_data() - register a gpio_chip
*/
#ifdef CONFIG_LOCKDEP
#define gpiochip_add_data(chip, data) ({ \
- static struct lock_class_key key; \
- gpiochip_add_data_with_key(chip, data, &key); \
+ static struct lock_class_key lock_key; \
+ static struct lock_class_key request_key; \
+ gpiochip_add_data_with_key(chip, data, &lock_key, \
+ &request_key); \
})
#else
-#define gpiochip_add_data(chip, data) gpiochip_add_data_with_key(chip, data, NULL)
+#define gpiochip_add_data(chip, data) gpiochip_add_data_with_key(chip, data, NULL, NULL)
#endif
static inline int gpiochip_add(struct gpio_chip *chip)
irq_flow_handler_t handler,
unsigned int type,
bool threaded,
- struct lock_class_key *lock_key);
+ struct lock_class_key *lock_key,
+ struct lock_class_key *request_key);
#ifdef CONFIG_LOCKDEP
irq_flow_handler_t handler,
unsigned int type)
{
- static struct lock_class_key key;
+ static struct lock_class_key lock_key;
+ static struct lock_class_key request_key;
return gpiochip_irqchip_add_key(gpiochip, irqchip, first_irq,
- handler, type, false, &key);
+ handler, type, false,
+ &lock_key, &request_key);
}
static inline int gpiochip_irqchip_add_nested(struct gpio_chip *gpiochip,
unsigned int type)
{
- static struct lock_class_key key;
+ static struct lock_class_key lock_key;
+ static struct lock_class_key request_key;
return gpiochip_irqchip_add_key(gpiochip, irqchip, first_irq,
- handler, type, true, &key);
+ handler, type, true,
+ &lock_key, &request_key);
}
#else
static inline int gpiochip_irqchip_add(struct gpio_chip *gpiochip,
unsigned int type)
{
return gpiochip_irqchip_add_key(gpiochip, irqchip, first_irq,
- handler, type, false, NULL);
+ handler, type, false, NULL, NULL);
}
static inline int gpiochip_irqchip_add_nested(struct gpio_chip *gpiochip,
unsigned int type)
{
return gpiochip_irqchip_add_key(gpiochip, irqchip, first_irq,
- handler, type, true, NULL);
+ handler, type, true, NULL, NULL);
}
#endif /* CONFIG_LOCKDEP */
#include <linux/radix-tree.h>
#include <linux/gfp.h>
#include <linux/percpu.h>
+#include <linux/bug.h>
struct idr {
struct radix_tree_root idr_rt;
--- /dev/null
+/*
+ * Copyright (C) Intel 2011
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * The PTI (Parallel Trace Interface) driver directs trace data routed from
+ * various parts in the system out through the Intel Penwell PTI port and
+ * out of the mobile device for analysis with a debugging tool
+ * (Lauterbach, Fido). This is part of a solution for the MIPI P1149.7,
+ * compact JTAG, standard.
+ *
+ * This header file will allow other parts of the OS to use the
+ * interface to write out it's contents for debugging a mobile system.
+ */
+
+#ifndef LINUX_INTEL_PTI_H_
+#define LINUX_INTEL_PTI_H_
+
+/* offset for last dword of any PTI message. Part of MIPI P1149.7 */
+#define PTI_LASTDWORD_DTS 0x30
+
+/* basic structure used as a write address to the PTI HW */
+struct pti_masterchannel {
+ u8 master;
+ u8 channel;
+};
+
+/* the following functions are defined in misc/pti.c */
+void pti_writedata(struct pti_masterchannel *mc, u8 *buf, int count);
+struct pti_masterchannel *pti_request_masterchannel(u8 type,
+ const char *thread_name);
+void pti_release_masterchannel(struct pti_masterchannel *mc);
+
+#endif /* LINUX_INTEL_PTI_H_ */
* 100: prefer care-of address
*/
dontfrag:1,
- autoflowlabel:1;
+ autoflowlabel:1,
+ autoflowlabel_set:1;
__u8 min_hopcount;
__u8 tclass;
__be32 rcv_flowinfo;
* mask. Applies only to affinity managed irqs.
* IRQD_SINGLE_TARGET - IRQ allows only a single affinity target
* IRQD_DEFAULT_TRIGGER_SET - Expected trigger already been set
+ * IRQD_CAN_RESERVE - Can use reservation mode
*/
enum {
IRQD_TRIGGER_MASK = 0xf,
IRQD_MANAGED_SHUTDOWN = (1 << 23),
IRQD_SINGLE_TARGET = (1 << 24),
IRQD_DEFAULT_TRIGGER_SET = (1 << 25),
+ IRQD_CAN_RESERVE = (1 << 26),
};
#define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
return __irqd_to_state(d) & IRQD_MANAGED_SHUTDOWN;
}
+static inline void irqd_set_can_reserve(struct irq_data *d)
+{
+ __irqd_to_state(d) |= IRQD_CAN_RESERVE;
+}
+
+static inline void irqd_clr_can_reserve(struct irq_data *d)
+{
+ __irqd_to_state(d) &= ~IRQD_CAN_RESERVE;
+}
+
+static inline bool irqd_can_reserve(struct irq_data *d)
+{
+ return __irqd_to_state(d) & IRQD_CAN_RESERVE;
+}
+
#undef __irqd_to_state
static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
}
static inline void
-irq_set_lockdep_class(unsigned int irq, struct lock_class_key *class)
+irq_set_lockdep_class(unsigned int irq, struct lock_class_key *lock_class,
+ struct lock_class_key *request_class)
{
struct irq_desc *desc = irq_to_desc(irq);
- if (desc)
- lockdep_set_class(&desc->lock, class);
+ if (desc) {
+ lockdep_set_class(&desc->lock, lock_class);
+ lockdep_set_class(&desc->request_mutex, request_class);
+ }
}
#ifdef CONFIG_IRQ_PREFLOW_FASTEOI
unsigned int nr_irqs, void *arg);
void (*free)(struct irq_domain *d, unsigned int virq,
unsigned int nr_irqs);
- int (*activate)(struct irq_domain *d, struct irq_data *irqd, bool early);
+ int (*activate)(struct irq_domain *d, struct irq_data *irqd, bool reserve);
void (*deactivate)(struct irq_domain *d, struct irq_data *irq_data);
int (*translate)(struct irq_domain *d, struct irq_fwspec *fwspec,
unsigned long *out_hwirq, unsigned int *out_type);
int cpu;
unsigned long ip;
#endif
-#ifdef CONFIG_LOCKDEP_CROSSRELEASE
- /*
- * Whether it's a crosslock.
- */
- int cross;
-#endif
};
static inline void lockdep_copy_map(struct lockdep_map *to,
unsigned int hardirqs_off:1;
unsigned int references:12; /* 32 bits */
unsigned int pin_count;
-#ifdef CONFIG_LOCKDEP_CROSSRELEASE
- /*
- * Generation id.
- *
- * A value of cross_gen_id will be stored when holding this,
- * which is globally increased whenever each crosslock is held.
- */
- unsigned int gen_id;
-#endif
-};
-
-#ifdef CONFIG_LOCKDEP_CROSSRELEASE
-#define MAX_XHLOCK_TRACE_ENTRIES 5
-
-/*
- * This is for keeping locks waiting for commit so that true dependencies
- * can be added at commit step.
- */
-struct hist_lock {
- /*
- * Id for each entry in the ring buffer. This is used to
- * decide whether the ring buffer was overwritten or not.
- *
- * For example,
- *
- * |<----------- hist_lock ring buffer size ------->|
- * pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
- * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
- *
- * where 'p' represents an acquisition in process
- * context, 'i' represents an acquisition in irq
- * context.
- *
- * In this example, the ring buffer was overwritten by
- * acquisitions in irq context, that should be detected on
- * rollback or commit.
- */
- unsigned int hist_id;
-
- /*
- * Seperate stack_trace data. This will be used at commit step.
- */
- struct stack_trace trace;
- unsigned long trace_entries[MAX_XHLOCK_TRACE_ENTRIES];
-
- /*
- * Seperate hlock instance. This will be used at commit step.
- *
- * TODO: Use a smaller data structure containing only necessary
- * data. However, we should make lockdep code able to handle the
- * smaller one first.
- */
- struct held_lock hlock;
};
-/*
- * To initialize a lock as crosslock, lockdep_init_map_crosslock() should
- * be called instead of lockdep_init_map().
- */
-struct cross_lock {
- /*
- * When more than one acquisition of crosslocks are overlapped,
- * we have to perform commit for them based on cross_gen_id of
- * the first acquisition, which allows us to add more true
- * dependencies.
- *
- * Moreover, when no acquisition of a crosslock is in progress,
- * we should not perform commit because the lock might not exist
- * any more, which might cause incorrect memory access. So we
- * have to track the number of acquisitions of a crosslock.
- */
- int nr_acquire;
-
- /*
- * Seperate hlock instance. This will be used at commit step.
- *
- * TODO: Use a smaller data structure containing only necessary
- * data. However, we should make lockdep code able to handle the
- * smaller one first.
- */
- struct held_lock hlock;
-};
-
-struct lockdep_map_cross {
- struct lockdep_map map;
- struct cross_lock xlock;
-};
-#endif
-
/*
* Initialization, self-test and debugging-output methods:
*/
XHLOCK_CTX_NR,
};
-#ifdef CONFIG_LOCKDEP_CROSSRELEASE
-extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
- const char *name,
- struct lock_class_key *key,
- int subclass);
-extern void lock_commit_crosslock(struct lockdep_map *lock);
-
-/*
- * What we essencially have to initialize is 'nr_acquire'. Other members
- * will be initialized in add_xlock().
- */
-#define STATIC_CROSS_LOCK_INIT() \
- { .nr_acquire = 0,}
-
-#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
- { .map.name = (_name), .map.key = (void *)(_key), \
- .map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
-
-/*
- * To initialize a lockdep_map statically use this macro.
- * Note that _name must not be NULL.
- */
-#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
- { .name = (_name), .key = (void *)(_key), .cross = 0, }
-
-extern void crossrelease_hist_start(enum xhlock_context_t c);
-extern void crossrelease_hist_end(enum xhlock_context_t c);
-extern void lockdep_invariant_state(bool force);
-extern void lockdep_init_task(struct task_struct *task);
-extern void lockdep_free_task(struct task_struct *task);
-#else /* !CROSSRELEASE */
#define lockdep_init_map_crosslock(m, n, k, s) do {} while (0)
/*
* To initialize a lockdep_map statically use this macro.
static inline void lockdep_invariant_state(bool force) {}
static inline void lockdep_init_task(struct task_struct *task) {}
static inline void lockdep_free_task(struct task_struct *task) {}
-#endif /* CROSSRELEASE */
#ifdef CONFIG_LOCK_STAT
#define LTR_L1SS_PWR_GATE_CHECK_CARD_EN BIT(6)
enum dev_aspm_mode {
- DEV_ASPM_DISABLE = 0,
DEV_ASPM_DYNAMIC,
DEV_ASPM_BACKDOOR,
DEV_ASPM_STATIC,
+ DEV_ASPM_DISABLE,
};
/*
};
struct mlx5_irq_info {
+ cpumask_var_t mask;
char name[MLX5_MAX_IRQ_NAME];
};
enum mlx5_eq_type type);
int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
int mlx5_start_eqs(struct mlx5_core_dev *dev);
-int mlx5_stop_eqs(struct mlx5_core_dev *dev);
+void mlx5_stop_eqs(struct mlx5_core_dev *dev);
int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn,
unsigned int *irqn);
int mlx5_core_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn);
int mlx5_cmd_destroy_vport_lag(struct mlx5_core_dev *dev);
bool mlx5_lag_is_active(struct mlx5_core_dev *dev);
struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev);
+int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
+ u64 *values,
+ int num_counters,
+ size_t *offsets);
struct mlx5_uars_page *mlx5_get_uars_page(struct mlx5_core_dev *mdev);
void mlx5_put_uars_page(struct mlx5_core_dev *mdev, struct mlx5_uars_page *up);
MLX5_CMD_OP_ALLOC_Q_COUNTER = 0x771,
MLX5_CMD_OP_DEALLOC_Q_COUNTER = 0x772,
MLX5_CMD_OP_QUERY_Q_COUNTER = 0x773,
- MLX5_CMD_OP_SET_RATE_LIMIT = 0x780,
+ MLX5_CMD_OP_SET_PP_RATE_LIMIT = 0x780,
MLX5_CMD_OP_QUERY_RATE_LIMIT = 0x781,
MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT = 0x782,
MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT = 0x783,
u8 vxlan_udp_port[0x10];
};
-struct mlx5_ifc_set_rate_limit_out_bits {
+struct mlx5_ifc_set_pp_rate_limit_out_bits {
u8 status[0x8];
u8 reserved_at_8[0x18];
u8 reserved_at_40[0x40];
};
-struct mlx5_ifc_set_rate_limit_in_bits {
+struct mlx5_ifc_set_pp_rate_limit_in_bits {
u8 opcode[0x10];
u8 reserved_at_10[0x10];
u8 reserved_at_60[0x20];
u8 rate_limit[0x20];
+
+ u8 reserved_at_a0[0x160];
};
struct mlx5_ifc_access_register_out_bits {
return tsk->signal->oom_mm;
}
+/*
+ * Use this helper if tsk->mm != mm and the victim mm needs a special
+ * handling. This is guaranteed to stay true after once set.
+ */
+static inline bool mm_is_oom_victim(struct mm_struct *mm)
+{
+ return test_bit(MMF_OOM_VICTIM, &mm->flags);
+}
+
/*
* Checks whether a page fault on the given mm is still reliable.
* This is no longer true if the oom reaper started to reap the
static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus,
unsigned int devfn)
{ return NULL; }
+static inline struct pci_dev *pci_get_domain_bus_and_slot(int domain,
+ unsigned int bus, unsigned int devfn)
+{ return NULL; }
static inline int pci_domain_nr(struct pci_bus *bus) { return 0; }
static inline struct pci_dev *pci_dev_get(struct pci_dev *dev) { return NULL; }
extern int pm_generic_poweroff(struct device *dev);
extern void pm_generic_complete(struct device *dev);
+extern void dev_pm_skip_next_resume_phases(struct device *dev);
extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
#else /* !CONFIG_PM_SLEEP */
-/*
- * Copyright (C) Intel 2011
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- *
- * The PTI (Parallel Trace Interface) driver directs trace data routed from
- * various parts in the system out through the Intel Penwell PTI port and
- * out of the mobile device for analysis with a debugging tool
- * (Lauterbach, Fido). This is part of a solution for the MIPI P1149.7,
- * compact JTAG, standard.
- *
- * This header file will allow other parts of the OS to use the
- * interface to write out it's contents for debugging a mobile system.
- */
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _INCLUDE_PTI_H
+#define _INCLUDE_PTI_H
-#ifndef PTI_H_
-#define PTI_H_
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#include <asm/pti.h>
+#else
+static inline void pti_init(void) { }
+#endif
-/* offset for last dword of any PTI message. Part of MIPI P1149.7 */
-#define PTI_LASTDWORD_DTS 0x30
-
-/* basic structure used as a write address to the PTI HW */
-struct pti_masterchannel {
- u8 master;
- u8 channel;
-};
-
-/* the following functions are defined in misc/pti.c */
-void pti_writedata(struct pti_masterchannel *mc, u8 *buf, int count);
-struct pti_masterchannel *pti_request_masterchannel(u8 type,
- const char *thread_name);
-void pti_release_masterchannel(struct pti_masterchannel *mc);
-
-#endif /*PTI_H_*/
+#endif
/* Note: callers invoking this in a loop must use a compiler barrier,
* for example cpu_relax(). Callers must hold producer_lock.
+ * Callers are responsible for making sure pointer that is being queued
+ * points to a valid data.
*/
static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
{
if (unlikely(!r->size) || r->queue[r->producer])
return -ENOSPC;
+ /* Make sure the pointer we are storing points to a valid data. */
+ /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
+ smp_wmb();
+
r->queue[r->producer++] = ptr;
if (unlikely(r->producer >= r->size))
r->producer = 0;
if (ptr)
__ptr_ring_discard_one(r);
+ /* Make sure anyone accessing data through the pointer is up to date. */
+ /* Pairs with smp_wmb in __ptr_ring_produce. */
+ smp_read_barrier_depends();
return ptr;
}
struct rb_root *root);
extern void rb_replace_node_rcu(struct rb_node *victim, struct rb_node *new,
struct rb_root *root);
+extern void rb_replace_node_cached(struct rb_node *victim, struct rb_node *new,
+ struct rb_root_cached *root);
static inline void rb_link_node(struct rb_node *node, struct rb_node *parent,
struct rb_node **rb_link)
*/
typedef struct {
arch_rwlock_t raw_lock;
-#ifdef CONFIG_GENERIC_LOCKBREAK
- unsigned int break_lock;
-#endif
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu;
void *owner;
struct held_lock held_locks[MAX_LOCK_DEPTH];
#endif
-#ifdef CONFIG_LOCKDEP_CROSSRELEASE
-#define MAX_XHLOCKS_NR 64UL
- struct hist_lock *xhlocks; /* Crossrelease history locks */
- unsigned int xhlock_idx;
- /* For restoring at history boundaries */
- unsigned int xhlock_idx_hist[XHLOCK_CTX_NR];
- unsigned int hist_id;
- /* For overwrite check at each context exit */
- unsigned int hist_id_save[XHLOCK_CTX_NR];
-#endif
-
#ifdef CONFIG_UBSAN
unsigned int in_ubsan;
#endif
__set_task_comm(tsk, from, false);
}
-extern char *get_task_comm(char *to, struct task_struct *tsk);
+extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk);
+#define get_task_comm(buf, tsk) ({ \
+ BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN); \
+ __get_task_comm(buf, sizeof(buf), tsk); \
+})
#ifdef CONFIG_SMP
void scheduler_ipi(void);
#define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */
#define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */
#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
+#define MMF_OOM_VICTIM 25 /* mm is the oom victim */
#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
unsigned char mac_addr[ETH_ALEN];
unsigned no_ether_link:1;
unsigned ether_link_active_low:1;
- unsigned needs_init:1;
};
#endif
* for that name. This appears in the sysfs "modalias" attribute
* for driver coldplugging, and in uevents used for hotplugging
* @cs_gpio: gpio number of the chipselect line (optional, -ENOENT when
- * when not using a GPIO line)
+ * not using a GPIO line)
*
* @statistics: statistics for the spi_device
*
#define raw_spin_is_locked(lock) arch_spin_is_locked(&(lock)->raw_lock)
-#ifdef CONFIG_GENERIC_LOCKBREAK
-#define raw_spin_is_contended(lock) ((lock)->break_lock)
-#else
-
#ifdef arch_spin_is_contended
#define raw_spin_is_contended(lock) arch_spin_is_contended(&(lock)->raw_lock)
#else
#define raw_spin_is_contended(lock) (((void)(lock), 0))
#endif /*arch_spin_is_contended*/
-#endif
/*
* This barrier must provide two things:
typedef struct raw_spinlock {
arch_spinlock_t raw_lock;
-#ifdef CONFIG_GENERIC_LOCKBREAK
- unsigned int break_lock;
-#endif
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu;
void *owner;
{
__kernel_size_t ret;
size_t p_size = __builtin_object_size(p, 0);
- if (p_size == (size_t)-1)
+
+ /* Work around gcc excess stack consumption issue */
+ if (p_size == (size_t)-1 ||
+ (__builtin_constant_p(p[p_size - 1]) && p[p_size - 1] == '\0'))
return __builtin_strlen(p);
ret = strnlen(p, p_size);
if (p_size <= ret)
extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
extern unsigned long tick_nohz_get_idle_calls(void);
+extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
#else /* !CONFIG_NO_HZ_COMMON */
unsigned long round_jiffies_up_relative(unsigned long j);
#ifdef CONFIG_HOTPLUG_CPU
+int timers_prepare_cpu(unsigned int cpu);
int timers_dead_cpu(unsigned int cpu);
#else
-#define timers_dead_cpu NULL
+#define timers_prepare_cpu NULL
+#define timers_dead_cpu NULL
#endif
#endif
*/
struct trace_export {
struct trace_export __rcu *next;
- void (*write)(const void *, unsigned int);
+ void (*write)(struct trace_export *, const void *, unsigned int);
};
int register_ftrace_export(struct trace_export *export);
* @WIPHY_FLAG_IBSS_RSN: The device supports IBSS RSN.
* @WIPHY_FLAG_MESH_AUTH: The device supports mesh authentication by routing
* auth frames to userspace. See @NL80211_MESH_SETUP_USERSPACE_AUTH.
- * @WIPHY_FLAG_SUPPORTS_SCHED_SCAN: The device supports scheduled scans.
* @WIPHY_FLAG_SUPPORTS_FW_ROAM: The device supports roaming feature in the
* firmware.
* @WIPHY_FLAG_AP_UAPSD: The device supports uapsd on AP.
#else
#error "Please fix <asm/byteorder.h>"
#endif
- __u8 proto_ctype;
- __u16 flags;
+ __u8 proto_ctype;
+ __be16 flags;
};
- __u32 word;
+ __be32 word;
};
};
* if there is an unknown standard or private flags, or the options length for
* the flags exceeds the options length specific in hlen of the GUE header.
*/
-static inline int validate_gue_flags(struct guehdr *guehdr,
- size_t optlen)
+static inline int validate_gue_flags(struct guehdr *guehdr, size_t optlen)
{
+ __be16 flags = guehdr->flags;
size_t len;
- __be32 flags = guehdr->flags;
if (flags & ~GUE_FLAGS_ALL)
return 1;
/* Private flags are last four bytes accounted in
* guehdr_flags_len
*/
- flags = *(__be32 *)((void *)&guehdr[1] + len - GUE_LEN_PRIV);
+ __be32 pflags = *(__be32 *)((void *)&guehdr[1] +
+ len - GUE_LEN_PRIV);
- if (flags & ~GUE_PFLAGS_ALL)
+ if (pflags & ~GUE_PFLAGS_ALL)
return 1;
- len += guehdr_priv_flags_len(flags);
+ len += guehdr_priv_flags_len(pflags);
if (len > optlen)
return 1;
}
#include <net/flow_dissector.h>
#define IPV4_MAX_PMTU 65535U /* RFC 2675, Section 5.1 */
+#define IPV4_MIN_MTU 68 /* RFC 791 */
struct sock;
};
enum tc_clsbpf_command {
- TC_CLSBPF_ADD,
- TC_CLSBPF_REPLACE,
- TC_CLSBPF_DESTROY,
+ TC_CLSBPF_OFFLOAD,
TC_CLSBPF_STATS,
};
enum tc_clsbpf_command command;
struct tcf_exts *exts;
struct bpf_prog *prog;
+ struct bpf_prog *oldprog;
const char *name;
bool exts_integrated;
u32 gen_flags;
* qdisc_tree_decrease_qlen() should stop.
*/
#define TCQ_F_INVISIBLE 0x80 /* invisible by default in dump */
+#define TCQ_F_OFFLOADED 0x200 /* qdisc is offloaded to HW */
u32 limit;
const struct Qdisc_ops *ops;
struct qdisc_size_table __rcu *stab;
void sctp_transport_burst_reset(struct sctp_transport *);
unsigned long sctp_transport_timeout(struct sctp_transport *);
void sctp_transport_reset(struct sctp_transport *t);
-void sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu);
+bool sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu);
void sctp_transport_immediate_rtx(struct sctp_transport *);
void sctp_transport_dst_release(struct sctp_transport *t);
void sctp_transport_dst_confirm(struct sctp_transport *t);
return sk->sk_lock.owned;
}
+static inline bool sock_owned_by_user_nocheck(const struct sock *sk)
+{
+ return sk->sk_lock.owned;
+}
+
/* no reclassification while locks are held */
static inline bool sock_allow_reclassification(const struct sock *csk)
{
np_applied:1,
instance_applied:1,
version:2,
-reserved_flags2:2;
+ reserved_flags2:2;
#elif defined(__BIG_ENDIAN_BITFIELD)
u8 reserved_flags2:2,
version:2,
int xfrm_prepare_input(struct xfrm_state *x, struct sk_buff *skb);
int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
int xfrm_input_resume(struct sk_buff *skb, int nexthdr);
+int xfrm_trans_queue(struct sk_buff *skb,
+ int (*finish)(struct net *, struct sock *,
+ struct sk_buff *));
int xfrm_output_resume(struct sk_buff *skb, int err);
int xfrm_output(struct sock *sk, struct sk_buff *skb);
int xfrm_inner_extract_output(struct xfrm_state *x, struct sk_buff *skb);
TP_STRUCT__entry(
__string( name, core->name )
- __string( pname, parent->name )
+ __string( pname, parent ? parent->name : "none" )
),
TP_fast_assign(
__assign_str(name, core->name);
- __assign_str(pname, parent->name);
+ __assign_str(pname, parent ? parent->name : "none");
),
TP_printk("%s %s", __get_str(name), __get_str(pname))
{ KVM_TRACE_MMIO_WRITE, "write" }
TRACE_EVENT(kvm_mmio,
- TP_PROTO(int type, int len, u64 gpa, u64 val),
+ TP_PROTO(int type, int len, u64 gpa, void *val),
TP_ARGS(type, len, gpa, val),
TP_STRUCT__entry(
__entry->type = type;
__entry->len = len;
__entry->gpa = gpa;
- __entry->val = val;
+ __entry->val = 0;
+ if (val)
+ memcpy(&__entry->val, val,
+ min_t(u32, sizeof(__entry->val), len));
),
TP_printk("mmio %s len %u gpa 0x%llx val 0x%llx",
#include <trace/define_trace.h>
-#else /* !CONFIG_PREEMPTIRQ_EVENTS */
+#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
+#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
#define trace_irq_enable(...)
#define trace_irq_disable(...)
-#define trace_preempt_enable(...)
-#define trace_preempt_disable(...)
#define trace_irq_enable_rcuidle(...)
#define trace_irq_disable_rcuidle(...)
+#endif
+
+#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
+#define trace_preempt_enable(...)
+#define trace_preempt_disable(...)
#define trace_preempt_enable_rcuidle(...)
#define trace_preempt_disable_rcuidle(...)
-
#endif
tcp_state_name(TCP_CLOSING), \
tcp_state_name(TCP_NEW_SYN_RECV))
+#define TP_STORE_V4MAPPED(__entry, saddr, daddr) \
+ do { \
+ struct in6_addr *pin6; \
+ \
+ pin6 = (struct in6_addr *)__entry->saddr_v6; \
+ ipv6_addr_set_v4mapped(saddr, pin6); \
+ pin6 = (struct in6_addr *)__entry->daddr_v6; \
+ ipv6_addr_set_v4mapped(daddr, pin6); \
+ } while (0)
+
+#if IS_ENABLED(CONFIG_IPV6)
+#define TP_STORE_ADDRS(__entry, saddr, daddr, saddr6, daddr6) \
+ do { \
+ if (sk->sk_family == AF_INET6) { \
+ struct in6_addr *pin6; \
+ \
+ pin6 = (struct in6_addr *)__entry->saddr_v6; \
+ *pin6 = saddr6; \
+ pin6 = (struct in6_addr *)__entry->daddr_v6; \
+ *pin6 = daddr6; \
+ } else { \
+ TP_STORE_V4MAPPED(__entry, saddr, daddr); \
+ } \
+ } while (0)
+#else
+#define TP_STORE_ADDRS(__entry, saddr, daddr, saddr6, daddr6) \
+ TP_STORE_V4MAPPED(__entry, saddr, daddr)
+#endif
+
/*
* tcp event with arguments sk and skb
*
TP_fast_assign(
struct inet_sock *inet = inet_sk(sk);
- struct in6_addr *pin6;
__be32 *p32;
__entry->skbaddr = skb;
p32 = (__be32 *) __entry->daddr;
*p32 = inet->inet_daddr;
-#if IS_ENABLED(CONFIG_IPV6)
- if (sk->sk_family == AF_INET6) {
- pin6 = (struct in6_addr *)__entry->saddr_v6;
- *pin6 = sk->sk_v6_rcv_saddr;
- pin6 = (struct in6_addr *)__entry->daddr_v6;
- *pin6 = sk->sk_v6_daddr;
- } else
-#endif
- {
- pin6 = (struct in6_addr *)__entry->saddr_v6;
- ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
- pin6 = (struct in6_addr *)__entry->daddr_v6;
- ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
- }
+ TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
+ sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
),
TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
TP_fast_assign(
struct inet_sock *inet = inet_sk(sk);
- struct in6_addr *pin6;
__be32 *p32;
__entry->skaddr = sk;
p32 = (__be32 *) __entry->daddr;
*p32 = inet->inet_daddr;
-#if IS_ENABLED(CONFIG_IPV6)
- if (sk->sk_family == AF_INET6) {
- pin6 = (struct in6_addr *)__entry->saddr_v6;
- *pin6 = sk->sk_v6_rcv_saddr;
- pin6 = (struct in6_addr *)__entry->daddr_v6;
- *pin6 = sk->sk_v6_daddr;
- } else
-#endif
- {
- pin6 = (struct in6_addr *)__entry->saddr_v6;
- ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
- pin6 = (struct in6_addr *)__entry->daddr_v6;
- ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
- }
+ TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
+ sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
),
TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
TP_fast_assign(
struct inet_sock *inet = inet_sk(sk);
- struct in6_addr *pin6;
__be32 *p32;
__entry->skaddr = sk;
p32 = (__be32 *) __entry->daddr;
*p32 = inet->inet_daddr;
-#if IS_ENABLED(CONFIG_IPV6)
- if (sk->sk_family == AF_INET6) {
- pin6 = (struct in6_addr *)__entry->saddr_v6;
- *pin6 = sk->sk_v6_rcv_saddr;
- pin6 = (struct in6_addr *)__entry->daddr_v6;
- *pin6 = sk->sk_v6_daddr;
- } else
-#endif
- {
- pin6 = (struct in6_addr *)__entry->saddr_v6;
- ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
- pin6 = (struct in6_addr *)__entry->daddr_v6;
- ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
- }
+ TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
+ sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
),
TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
TP_fast_assign(
struct inet_request_sock *ireq = inet_rsk(req);
- struct in6_addr *pin6;
__be32 *p32;
__entry->skaddr = sk;
p32 = (__be32 *) __entry->daddr;
*p32 = ireq->ir_rmt_addr;
-#if IS_ENABLED(CONFIG_IPV6)
- if (sk->sk_family == AF_INET6) {
- pin6 = (struct in6_addr *)__entry->saddr_v6;
- *pin6 = ireq->ir_v6_loc_addr;
- pin6 = (struct in6_addr *)__entry->daddr_v6;
- *pin6 = ireq->ir_v6_rmt_addr;
- } else
-#endif
- {
- pin6 = (struct in6_addr *)__entry->saddr_v6;
- ipv6_addr_set_v4mapped(ireq->ir_loc_addr, pin6);
- pin6 = (struct in6_addr *)__entry->daddr_v6;
- ipv6_addr_set_v4mapped(ireq->ir_rmt_addr, pin6);
- }
+ TP_STORE_ADDRS(__entry, ireq->ir_loc_addr, ireq->ir_rmt_addr,
+ ireq->ir_v6_loc_addr, ireq->ir_v6_rmt_addr);
),
TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
#define _UAPI_LINUX_IF_ETHER_H
#include <linux/types.h>
+#include <linux/libc-compat.h>
/*
* IEEE 802.3 Ethernet magic constants. The frame sizes omit the preamble
* This is an Ethernet frame header.
*/
+#if __UAPI_DEF_ETHHDR
struct ethhdr {
unsigned char h_dest[ETH_ALEN]; /* destination eth addr */
unsigned char h_source[ETH_ALEN]; /* source ether addr */
__be16 h_proto; /* packet type ID field */
} __attribute__((packed));
+#endif
#endif /* _UAPI_LINUX_IF_ETHER_H */
/* If we did not see any headers from any supported C libraries,
* or we are being included in the kernel, then define everything
- * that we need. */
+ * that we need. Check for previous __UAPI_* definitions to give
+ * unsupported C libraries a way to opt out of any kernel definition. */
#else /* !defined(__GLIBC__) */
/* Definitions for if.h */
+#ifndef __UAPI_DEF_IF_IFCONF
#define __UAPI_DEF_IF_IFCONF 1
+#endif
+#ifndef __UAPI_DEF_IF_IFMAP
#define __UAPI_DEF_IF_IFMAP 1
+#endif
+#ifndef __UAPI_DEF_IF_IFNAMSIZ
#define __UAPI_DEF_IF_IFNAMSIZ 1
+#endif
+#ifndef __UAPI_DEF_IF_IFREQ
#define __UAPI_DEF_IF_IFREQ 1
+#endif
/* Everything up to IFF_DYNAMIC, matches net/if.h until glibc 2.23 */
+#ifndef __UAPI_DEF_IF_NET_DEVICE_FLAGS
#define __UAPI_DEF_IF_NET_DEVICE_FLAGS 1
+#endif
/* For the future if glibc adds IFF_LOWER_UP, IFF_DORMANT and IFF_ECHO */
+#ifndef __UAPI_DEF_IF_NET_DEVICE_FLAGS_LOWER_UP_DORMANT_ECHO
#define __UAPI_DEF_IF_NET_DEVICE_FLAGS_LOWER_UP_DORMANT_ECHO 1
+#endif
/* Definitions for in.h */
+#ifndef __UAPI_DEF_IN_ADDR
#define __UAPI_DEF_IN_ADDR 1
+#endif
+#ifndef __UAPI_DEF_IN_IPPROTO
#define __UAPI_DEF_IN_IPPROTO 1
+#endif
+#ifndef __UAPI_DEF_IN_PKTINFO
#define __UAPI_DEF_IN_PKTINFO 1
+#endif
+#ifndef __UAPI_DEF_IP_MREQ
#define __UAPI_DEF_IP_MREQ 1
+#endif
+#ifndef __UAPI_DEF_SOCKADDR_IN
#define __UAPI_DEF_SOCKADDR_IN 1
+#endif
+#ifndef __UAPI_DEF_IN_CLASS
#define __UAPI_DEF_IN_CLASS 1
+#endif
/* Definitions for in6.h */
+#ifndef __UAPI_DEF_IN6_ADDR
#define __UAPI_DEF_IN6_ADDR 1
+#endif
+#ifndef __UAPI_DEF_IN6_ADDR_ALT
#define __UAPI_DEF_IN6_ADDR_ALT 1
+#endif
+#ifndef __UAPI_DEF_SOCKADDR_IN6
#define __UAPI_DEF_SOCKADDR_IN6 1
+#endif
+#ifndef __UAPI_DEF_IPV6_MREQ
#define __UAPI_DEF_IPV6_MREQ 1
+#endif
+#ifndef __UAPI_DEF_IPPROTO_V6
#define __UAPI_DEF_IPPROTO_V6 1
+#endif
+#ifndef __UAPI_DEF_IPV6_OPTIONS
#define __UAPI_DEF_IPV6_OPTIONS 1
+#endif
+#ifndef __UAPI_DEF_IN6_PKTINFO
#define __UAPI_DEF_IN6_PKTINFO 1
+#endif
+#ifndef __UAPI_DEF_IP6_MTUINFO
#define __UAPI_DEF_IP6_MTUINFO 1
+#endif
/* Definitions for ipx.h */
+#ifndef __UAPI_DEF_SOCKADDR_IPX
#define __UAPI_DEF_SOCKADDR_IPX 1
+#endif
+#ifndef __UAPI_DEF_IPX_ROUTE_DEFINITION
#define __UAPI_DEF_IPX_ROUTE_DEFINITION 1
+#endif
+#ifndef __UAPI_DEF_IPX_INTERFACE_DEFINITION
#define __UAPI_DEF_IPX_INTERFACE_DEFINITION 1
+#endif
+#ifndef __UAPI_DEF_IPX_CONFIG_DATA
#define __UAPI_DEF_IPX_CONFIG_DATA 1
+#endif
+#ifndef __UAPI_DEF_IPX_ROUTE_DEF
#define __UAPI_DEF_IPX_ROUTE_DEF 1
+#endif
/* Definitions for xattr.h */
+#ifndef __UAPI_DEF_XATTR
#define __UAPI_DEF_XATTR 1
+#endif
#endif /* __GLIBC__ */
+/* Definitions for if_ether.h */
+/* allow libcs like musl to deactivate this, glibc does not implement this. */
+#ifndef __UAPI_DEF_ETHHDR
+#define __UAPI_DEF_ETHHDR 1
+#endif
+
#endif /* _UAPI_LIBC_COMPAT_H */
#define NF_CT_STATE_INVALID_BIT (1 << 0)
#define NF_CT_STATE_BIT(ctinfo) (1 << ((ctinfo) % IP_CT_IS_REPLY + 1))
-#define NF_CT_STATE_UNTRACKED_BIT (1 << (IP_CT_UNTRACKED + 1))
+#define NF_CT_STATE_UNTRACKED_BIT (1 << 6)
/* Bitset representing status of connection. */
enum ip_conntrack_status {
#define TC_RED_ECN 1
#define TC_RED_HARDDROP 2
#define TC_RED_ADAPTATIVE 4
-#define TC_RED_OFFLOADED 8
};
struct tc_red_xstats {
TCA_PAD,
TCA_DUMP_INVISIBLE,
TCA_CHAIN,
+ TCA_HW_OFFLOAD,
__TCA_MAX
};
{
}
#endif
+
+#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
+struct resource;
+void arch_xen_balloon_init(struct resource *hostmem_resource);
+#endif
config CPU_ISOLATION
bool "CPU isolation"
+ default y
help
Make sure that CPUs running critical tasks are not disturbed by
any source of "noise" such as unbound workqueues, timers, kthreads...
- Unbound jobs get offloaded to housekeeping CPUs.
+ Unbound jobs get offloaded to housekeeping CPUs. This is driven by
+ the "isolcpus=" boot parameter.
+
+ Say Y if unsure.
source "kernel/rcu/Kconfig"
Enable the bpf() system call that allows to manipulate eBPF
programs and maps via file descriptors.
+config BPF_JIT_ALWAYS_ON
+ bool "Permanently enable BPF JIT and remove BPF interpreter"
+ depends on BPF_SYSCALL && HAVE_EBPF_JIT && BPF_JIT
+ help
+ Enables BPF JIT and removes BPF interpreter to avoid
+ speculative execution of BPF instructions by the interpreter
+
config USERFAULTFD
bool "Enable userfaultfd() system call"
select ANON_INODES
#include <linux/slab.h>
#include <linux/perf_event.h>
#include <linux/ptrace.h>
+#include <linux/pti.h>
#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/sched_clock.h>
pgtable_init();
vmalloc_init();
ioremap_huge_init();
+ /* Should be run before the first non-init thread is created */
+ init_espfix_bsp();
+ /* Should be run after espfix64 is set up. */
+ pti_init();
}
asmlinkage __visible void __init start_kernel(void)
local_irq_disable();
radix_tree_init();
+ /*
+ * Set up housekeeping before setting up workqueues to allow the unbound
+ * workqueue to take non-housekeeping into account.
+ */
+ housekeeping_init();
+
/*
* Allow workqueue creation and work item queueing/cancelling
* early. Work item execution depends on kthreads and starts after
early_irq_init();
init_IRQ();
tick_init();
- housekeeping_init();
rcu_init_nohz();
init_timers();
hrtimers_init();
#ifdef CONFIG_X86
if (efi_enabled(EFI_RUNTIME_SERVICES))
efi_enter_virtual_mode();
-#endif
-#ifdef CONFIG_X86_ESPFIX64
- /* Should be run before the first non-init thread is created */
- init_espfix_bsp();
#endif
thread_stack_cache_init();
cred_init();
{
struct kstatfs sbuf;
- if (time_is_before_jiffies(acct->needcheck))
+ if (time_is_after_jiffies(acct->needcheck))
goto out;
/* May block */
{
bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
int numa_node = bpf_map_attr_numa_node(attr);
+ u32 elem_size, index_mask, max_entries;
+ bool unpriv = !capable(CAP_SYS_ADMIN);
struct bpf_array *array;
u64 array_size;
- u32 elem_size;
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
elem_size = round_up(attr->value_size, 8);
+ max_entries = attr->max_entries;
+ index_mask = roundup_pow_of_two(max_entries) - 1;
+
+ if (unpriv)
+ /* round up array size to nearest power of 2,
+ * since cpu will speculate within index_mask limits
+ */
+ max_entries = index_mask + 1;
+
array_size = sizeof(*array);
if (percpu)
- array_size += (u64) attr->max_entries * sizeof(void *);
+ array_size += (u64) max_entries * sizeof(void *);
else
- array_size += (u64) attr->max_entries * elem_size;
+ array_size += (u64) max_entries * elem_size;
/* make sure there is no u32 overflow later in round_up() */
if (array_size >= U32_MAX - PAGE_SIZE)
array = bpf_map_area_alloc(array_size, numa_node);
if (!array)
return ERR_PTR(-ENOMEM);
+ array->index_mask = index_mask;
+ array->map.unpriv_array = unpriv;
/* copy mandatory map attributes */
array->map.map_type = attr->map_type;
if (unlikely(index >= array->map.max_entries))
return NULL;
- return array->value + array->elem_size * index;
+ return array->value + array->elem_size * (index & array->index_mask);
}
/* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
+ struct bpf_array *array = container_of(map, struct bpf_array, map);
struct bpf_insn *insn = insn_buf;
u32 elem_size = round_up(map->value_size, 8);
const int ret = BPF_REG_0;
*insn++ = BPF_ALU64_IMM(BPF_ADD, map_ptr, offsetof(struct bpf_array, value));
*insn++ = BPF_LDX_MEM(BPF_W, ret, index, 0);
- *insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 3);
+ if (map->unpriv_array) {
+ *insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 4);
+ *insn++ = BPF_ALU32_IMM(BPF_AND, ret, array->index_mask);
+ } else {
+ *insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 3);
+ }
if (is_power_of_2(elem_size)) {
*insn++ = BPF_ALU64_IMM(BPF_LSH, ret, ilog2(elem_size));
if (unlikely(index >= array->map.max_entries))
return NULL;
- return this_cpu_ptr(array->pptrs[index]);
+ return this_cpu_ptr(array->pptrs[index & array->index_mask]);
}
int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
*/
size = round_up(map->value_size, 8);
rcu_read_lock();
- pptr = array->pptrs[index];
+ pptr = array->pptrs[index & array->index_mask];
for_each_possible_cpu(cpu) {
bpf_long_memcpy(value + off, per_cpu_ptr(pptr, cpu), size);
off += size;
return -EEXIST;
if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
- memcpy(this_cpu_ptr(array->pptrs[index]),
+ memcpy(this_cpu_ptr(array->pptrs[index & array->index_mask]),
value, map->value_size);
else
- memcpy(array->value + array->elem_size * index,
+ memcpy(array->value +
+ array->elem_size * (index & array->index_mask),
value, map->value_size);
return 0;
}
*/
size = round_up(map->value_size, 8);
rcu_read_lock();
- pptr = array->pptrs[index];
+ pptr = array->pptrs[index & array->index_mask];
for_each_possible_cpu(cpu) {
bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value + off, size);
off += size;
static u32 array_of_map_gen_lookup(struct bpf_map *map,
struct bpf_insn *insn_buf)
{
+ struct bpf_array *array = container_of(map, struct bpf_array, map);
u32 elem_size = round_up(map->value_size, 8);
struct bpf_insn *insn = insn_buf;
const int ret = BPF_REG_0;
*insn++ = BPF_ALU64_IMM(BPF_ADD, map_ptr, offsetof(struct bpf_array, value));
*insn++ = BPF_LDX_MEM(BPF_W, ret, index, 0);
- *insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 5);
+ if (map->unpriv_array) {
+ *insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 6);
+ *insn++ = BPF_ALU32_IMM(BPF_AND, ret, array->index_mask);
+ } else {
+ *insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 5);
+ }
if (is_power_of_2(elem_size))
*insn++ = BPF_ALU64_IMM(BPF_LSH, ret, ilog2(elem_size));
else
}
EXPORT_SYMBOL_GPL(__bpf_call_base);
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
/**
* __bpf_prog_run - run eBPF program on a given context
* @ctx: is the data we are operating on
EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
};
+#else
+static unsigned int __bpf_prog_ret0(const void *ctx,
+ const struct bpf_insn *insn)
+{
+ return 0;
+}
+#endif
+
bool bpf_prog_array_compatible(struct bpf_array *array,
const struct bpf_prog *fp)
{
*/
struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
{
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1);
fp->bpf_func = interpreters[(round_up(stack_depth, 32) / 32) - 1];
+#else
+ fp->bpf_func = __bpf_prog_ret0;
+#endif
/* eBPF JITs can rewrite the program in case constant
* blinding is active. However, in case of error during
*/
if (!bpf_prog_is_dev_bound(fp->aux)) {
fp = bpf_int_jit_compile(fp);
+#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+ if (!fp->jited) {
+ *err = -ENOTSUPP;
+ return fp;
+ }
+#endif
} else {
*err = bpf_prog_offload_compile(fp);
if (*err)
pptr = htab_elem_get_ptr(get_htab_elem(htab, i),
htab->map.key_size);
free_percpu(pptr);
+ cond_resched();
}
free_elems:
bpf_map_area_free(htab->elems);
goto free_elems;
htab_elem_set_ptr(get_htab_elem(htab, i), htab->map.key_size,
pptr);
+ cond_resched();
}
skip_percpu_elems:
putname(pname);
return ret;
}
-EXPORT_SYMBOL_GPL(bpf_obj_get_user);
+
+static struct bpf_prog *__get_prog_inode(struct inode *inode, enum bpf_prog_type type)
+{
+ struct bpf_prog *prog;
+ int ret = inode_permission(inode, MAY_READ | MAY_WRITE);
+ if (ret)
+ return ERR_PTR(ret);
+
+ if (inode->i_op == &bpf_map_iops)
+ return ERR_PTR(-EINVAL);
+ if (inode->i_op != &bpf_prog_iops)
+ return ERR_PTR(-EACCES);
+
+ prog = inode->i_private;
+
+ ret = security_bpf_prog(prog);
+ if (ret < 0)
+ return ERR_PTR(ret);
+
+ if (!bpf_prog_get_ok(prog, &type, false))
+ return ERR_PTR(-EINVAL);
+
+ return bpf_prog_inc(prog);
+}
+
+struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type)
+{
+ struct bpf_prog *prog;
+ struct path path;
+ int ret = kern_path(name, LOOKUP_FOLLOW, &path);
+ if (ret)
+ return ERR_PTR(ret);
+ prog = __get_prog_inode(d_backing_inode(path.dentry), type);
+ if (!IS_ERR(prog))
+ touch_atime(&path);
+ path_put(&path);
+ return prog;
+}
+EXPORT_SYMBOL(bpf_prog_get_type_path);
static void bpf_evict_inode(struct inode *inode)
{
write_lock_bh(&sock->sk_callback_lock);
psock = smap_psock_sk(sock);
- smap_list_remove(psock, &stab->sock_map[i]);
- smap_release_sock(psock, sock);
+ /* This check handles a racing sock event that can get the
+ * sk_callback_lock before this case but after xchg happens
+ * causing the refcnt to hit zero and sock user data (psock)
+ * to be null and queued for garbage collection.
+ */
+ if (likely(psock)) {
+ smap_list_remove(psock, &stab->sock_map[i]);
+ smap_release_sock(psock, sock);
+ }
write_unlock_bh(&sock->sk_callback_lock);
}
rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(bpf_prog_inc_not_zero);
-static bool bpf_prog_get_ok(struct bpf_prog *prog,
+bool bpf_prog_get_ok(struct bpf_prog *prog,
enum bpf_prog_type *attach_type, bool attach_drv)
{
/* not an attachment, just a refcount inc, always allow */
break;
case PTR_TO_STACK:
pointer_desc = "stack ";
+ /* The stack spill tracking logic in check_stack_write()
+ * and check_stack_read() relies on stack accesses being
+ * aligned.
+ */
+ strict = true;
break;
default:
break;
strict);
}
+/* truncate register to smaller size (in bytes)
+ * must be called with size < BPF_REG_SIZE
+ */
+static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
+{
+ u64 mask;
+
+ /* clear high bits in bit representation */
+ reg->var_off = tnum_cast(reg->var_off, size);
+
+ /* fix arithmetic bounds */
+ mask = ((u64)1 << (size * 8)) - 1;
+ if ((reg->umin_value & ~mask) == (reg->umax_value & ~mask)) {
+ reg->umin_value &= mask;
+ reg->umax_value &= mask;
+ } else {
+ reg->umin_value = 0;
+ reg->umax_value = mask;
+ }
+ reg->smin_value = reg->umin_value;
+ reg->smax_value = reg->umax_value;
+}
+
/* check whether memory at (regno + off) is accessible for t = (read | write)
* if t==write, value_regno is a register which value is stored into memory
* if t==read, value_regno is a register which will receive the value from memory
if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
regs[value_regno].type == SCALAR_VALUE) {
/* b/h/w load zero-extends, mark upper bits as known 0 */
- regs[value_regno].var_off =
- tnum_cast(regs[value_regno].var_off, size);
- __update_reg_bounds(®s[value_regno]);
+ coerce_reg_to_size(®s[value_regno], size);
}
return err;
}
tnum_strn(tn_buf, sizeof(tn_buf), regs[regno].var_off);
verbose(env, "invalid variable stack read R%d var_off=%s\n",
regno, tn_buf);
+ return -EACCES;
}
off = regs[regno].off + regs[regno].var_off.value;
if (off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 ||
return -EINVAL;
}
+ /* With LD_ABS/IND some JITs save/restore skb from r1. */
changes_data = bpf_helper_changes_pkt_data(fn->func);
+ if (changes_data && fn->arg1_type != ARG_PTR_TO_CTX) {
+ verbose(env, "kernel subsystem misconfigured func %s#%d: r1 != ctx\n",
+ func_id_name(func_id), func_id);
+ return -EINVAL;
+ }
memset(&meta, 0, sizeof(meta));
meta.pkt_access = fn->pkt_access;
err = check_func_arg(env, BPF_REG_2, fn->arg2_type, &meta);
if (err)
return err;
+ if (func_id == BPF_FUNC_tail_call) {
+ if (meta.map_ptr == NULL) {
+ verbose(env, "verifier bug\n");
+ return -EINVAL;
+ }
+ env->insn_aux_data[insn_idx].map_ptr = meta.map_ptr;
+ }
err = check_func_arg(env, BPF_REG_3, fn->arg3_type, &meta);
if (err)
return err;
return 0;
}
-static void coerce_reg_to_32(struct bpf_reg_state *reg)
-{
- /* clear high 32 bits */
- reg->var_off = tnum_cast(reg->var_off, 4);
- /* Update bounds */
- __update_reg_bounds(reg);
-}
-
static bool signed_add_overflows(s64 a, s64 b)
{
/* Do the add in u64, where overflow is well-defined */
return res > a;
}
+static bool check_reg_sane_offset(struct bpf_verifier_env *env,
+ const struct bpf_reg_state *reg,
+ enum bpf_reg_type type)
+{
+ bool known = tnum_is_const(reg->var_off);
+ s64 val = reg->var_off.value;
+ s64 smin = reg->smin_value;
+
+ if (known && (val >= BPF_MAX_VAR_OFF || val <= -BPF_MAX_VAR_OFF)) {
+ verbose(env, "math between %s pointer and %lld is not allowed\n",
+ reg_type_str[type], val);
+ return false;
+ }
+
+ if (reg->off >= BPF_MAX_VAR_OFF || reg->off <= -BPF_MAX_VAR_OFF) {
+ verbose(env, "%s pointer offset %d is not allowed\n",
+ reg_type_str[type], reg->off);
+ return false;
+ }
+
+ if (smin == S64_MIN) {
+ verbose(env, "math between %s pointer and register with unbounded min value is not allowed\n",
+ reg_type_str[type]);
+ return false;
+ }
+
+ if (smin >= BPF_MAX_VAR_OFF || smin <= -BPF_MAX_VAR_OFF) {
+ verbose(env, "value %lld makes %s pointer be out of bounds\n",
+ smin, reg_type_str[type]);
+ return false;
+ }
+
+ return true;
+}
+
/* Handles arithmetic on a pointer and a scalar: computes new min/max and var_off.
* Caller should also handle BPF_MOV case separately.
* If we return -EACCES, caller may want to try again treating pointer as a
if (BPF_CLASS(insn->code) != BPF_ALU64) {
/* 32-bit ALU ops on pointers produce (meaningless) scalars */
- if (!env->allow_ptr_leaks)
- verbose(env,
- "R%d 32-bit pointer arithmetic prohibited\n",
- dst);
+ verbose(env,
+ "R%d 32-bit pointer arithmetic prohibited\n",
+ dst);
return -EACCES;
}
if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
- if (!env->allow_ptr_leaks)
- verbose(env, "R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
- dst);
+ verbose(env, "R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
+ dst);
return -EACCES;
}
if (ptr_reg->type == CONST_PTR_TO_MAP) {
- if (!env->allow_ptr_leaks)
- verbose(env, "R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n",
- dst);
+ verbose(env, "R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n",
+ dst);
return -EACCES;
}
if (ptr_reg->type == PTR_TO_PACKET_END) {
- if (!env->allow_ptr_leaks)
- verbose(env, "R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n",
- dst);
+ verbose(env, "R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n",
+ dst);
return -EACCES;
}
dst_reg->type = ptr_reg->type;
dst_reg->id = ptr_reg->id;
+ if (!check_reg_sane_offset(env, off_reg, ptr_reg->type) ||
+ !check_reg_sane_offset(env, ptr_reg, ptr_reg->type))
+ return -EINVAL;
+
switch (opcode) {
case BPF_ADD:
/* We can take a fixed offset as long as it doesn't overflow
case BPF_SUB:
if (dst_reg == off_reg) {
/* scalar -= pointer. Creates an unknown scalar */
- if (!env->allow_ptr_leaks)
- verbose(env, "R%d tried to subtract pointer from scalar\n",
- dst);
+ verbose(env, "R%d tried to subtract pointer from scalar\n",
+ dst);
return -EACCES;
}
/* We don't allow subtraction from FP, because (according to
* be able to deal with it.
*/
if (ptr_reg->type == PTR_TO_STACK) {
- if (!env->allow_ptr_leaks)
- verbose(env, "R%d subtraction from stack pointer prohibited\n",
- dst);
+ verbose(env, "R%d subtraction from stack pointer prohibited\n",
+ dst);
return -EACCES;
}
if (known && (ptr_reg->off - smin_val ==
case BPF_AND:
case BPF_OR:
case BPF_XOR:
- /* bitwise ops on pointers are troublesome, prohibit for now.
- * (However, in principle we could allow some cases, e.g.
- * ptr &= ~3 which would reduce min_value by 3.)
- */
- if (!env->allow_ptr_leaks)
- verbose(env, "R%d bitwise operator %s on pointer prohibited\n",
- dst, bpf_alu_string[opcode >> 4]);
+ /* bitwise ops on pointers are troublesome, prohibit. */
+ verbose(env, "R%d bitwise operator %s on pointer prohibited\n",
+ dst, bpf_alu_string[opcode >> 4]);
return -EACCES;
default:
/* other operators (e.g. MUL,LSH) produce non-pointer results */
- if (!env->allow_ptr_leaks)
- verbose(env, "R%d pointer arithmetic with %s operator prohibited\n",
- dst, bpf_alu_string[opcode >> 4]);
+ verbose(env, "R%d pointer arithmetic with %s operator prohibited\n",
+ dst, bpf_alu_string[opcode >> 4]);
return -EACCES;
}
+ if (!check_reg_sane_offset(env, dst_reg, ptr_reg->type))
+ return -EINVAL;
+
__update_reg_bounds(dst_reg);
__reg_deduce_bounds(dst_reg);
__reg_bound_offset(dst_reg);
return 0;
}
+/* WARNING: This function does calculations on 64-bit values, but the actual
+ * execution may occur on 32-bit values. Therefore, things like bitshifts
+ * need extra checks in the 32-bit case.
+ */
static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
struct bpf_insn *insn,
struct bpf_reg_state *dst_reg,
bool src_known, dst_known;
s64 smin_val, smax_val;
u64 umin_val, umax_val;
+ u64 insn_bitness = (BPF_CLASS(insn->code) == BPF_ALU64) ? 64 : 32;
- if (BPF_CLASS(insn->code) != BPF_ALU64) {
- /* 32-bit ALU ops are (32,32)->64 */
- coerce_reg_to_32(dst_reg);
- coerce_reg_to_32(&src_reg);
- }
smin_val = src_reg.smin_value;
smax_val = src_reg.smax_value;
umin_val = src_reg.umin_value;
src_known = tnum_is_const(src_reg.var_off);
dst_known = tnum_is_const(dst_reg->var_off);
+ if (!src_known &&
+ opcode != BPF_ADD && opcode != BPF_SUB && opcode != BPF_AND) {
+ __mark_reg_unknown(dst_reg);
+ return 0;
+ }
+
switch (opcode) {
case BPF_ADD:
if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
__update_reg_bounds(dst_reg);
break;
case BPF_LSH:
- if (umax_val > 63) {
- /* Shifts greater than 63 are undefined. This includes
- * shifts by a negative number.
+ if (umax_val >= insn_bitness) {
+ /* Shifts greater than 31 or 63 are undefined.
+ * This includes shifts by a negative number.
*/
mark_reg_unknown(env, regs, insn->dst_reg);
break;
__update_reg_bounds(dst_reg);
break;
case BPF_RSH:
- if (umax_val > 63) {
- /* Shifts greater than 63 are undefined. This includes
- * shifts by a negative number.
+ if (umax_val >= insn_bitness) {
+ /* Shifts greater than 31 or 63 are undefined.
+ * This includes shifts by a negative number.
*/
mark_reg_unknown(env, regs, insn->dst_reg);
break;
}
- /* BPF_RSH is an unsigned shift, so make the appropriate casts */
- if (dst_reg->smin_value < 0) {
- if (umin_val) {
- /* Sign bit will be cleared */
- dst_reg->smin_value = 0;
- } else {
- /* Lost sign bit information */
- dst_reg->smin_value = S64_MIN;
- dst_reg->smax_value = S64_MAX;
- }
- } else {
- dst_reg->smin_value =
- (u64)(dst_reg->smin_value) >> umax_val;
- }
+ /* BPF_RSH is an unsigned shift. If the value in dst_reg might
+ * be negative, then either:
+ * 1) src_reg might be zero, so the sign bit of the result is
+ * unknown, so we lose our signed bounds
+ * 2) it's known negative, thus the unsigned bounds capture the
+ * signed bounds
+ * 3) the signed bounds cross zero, so they tell us nothing
+ * about the result
+ * If the value in dst_reg is known nonnegative, then again the
+ * unsigned bounts capture the signed bounds.
+ * Thus, in all cases it suffices to blow away our signed bounds
+ * and rely on inferring new ones from the unsigned bounds and
+ * var_off of the result.
+ */
+ dst_reg->smin_value = S64_MIN;
+ dst_reg->smax_value = S64_MAX;
if (src_known)
dst_reg->var_off = tnum_rshift(dst_reg->var_off,
umin_val);
break;
}
+ if (BPF_CLASS(insn->code) != BPF_ALU64) {
+ /* 32-bit ALU ops are (32,32)->32 */
+ coerce_reg_to_size(dst_reg, 4);
+ coerce_reg_to_size(&src_reg, 4);
+ }
+
__reg_deduce_bounds(dst_reg);
__reg_bound_offset(dst_reg);
return 0;
struct bpf_reg_state *regs = cur_regs(env), *dst_reg, *src_reg;
struct bpf_reg_state *ptr_reg = NULL, off_reg = {0};
u8 opcode = BPF_OP(insn->code);
- int rc;
dst_reg = ®s[insn->dst_reg];
src_reg = NULL;
if (src_reg->type != SCALAR_VALUE) {
if (dst_reg->type != SCALAR_VALUE) {
/* Combining two pointers by any ALU op yields
- * an arbitrary scalar.
+ * an arbitrary scalar. Disallow all math except
+ * pointer subtraction
*/
- if (!env->allow_ptr_leaks) {
- verbose(env, "R%d pointer %s pointer prohibited\n",
- insn->dst_reg,
- bpf_alu_string[opcode >> 4]);
- return -EACCES;
+ if (opcode == BPF_SUB){
+ mark_reg_unknown(env, regs, insn->dst_reg);
+ return 0;
}
- mark_reg_unknown(env, regs, insn->dst_reg);
- return 0;
+ verbose(env, "R%d pointer %s pointer prohibited\n",
+ insn->dst_reg,
+ bpf_alu_string[opcode >> 4]);
+ return -EACCES;
} else {
/* scalar += pointer
* This is legal, but we have to reverse our
* src/dest handling in computing the range
*/
- rc = adjust_ptr_min_max_vals(env, insn,
- src_reg, dst_reg);
- if (rc == -EACCES && env->allow_ptr_leaks) {
- /* scalar += unknown scalar */
- __mark_reg_unknown(&off_reg);
- return adjust_scalar_min_max_vals(
- env, insn,
- dst_reg, off_reg);
- }
- return rc;
+ return adjust_ptr_min_max_vals(env, insn,
+ src_reg, dst_reg);
}
} else if (ptr_reg) {
/* pointer += scalar */
- rc = adjust_ptr_min_max_vals(env, insn,
- dst_reg, src_reg);
- if (rc == -EACCES && env->allow_ptr_leaks) {
- /* unknown scalar += scalar */
- __mark_reg_unknown(dst_reg);
- return adjust_scalar_min_max_vals(
- env, insn, dst_reg, *src_reg);
- }
- return rc;
+ return adjust_ptr_min_max_vals(env, insn,
+ dst_reg, src_reg);
}
} else {
/* Pretend the src is a reg with a known value, since we only
off_reg.type = SCALAR_VALUE;
__mark_reg_known(&off_reg, insn->imm);
src_reg = &off_reg;
- if (ptr_reg) { /* pointer += K */
- rc = adjust_ptr_min_max_vals(env, insn,
- ptr_reg, src_reg);
- if (rc == -EACCES && env->allow_ptr_leaks) {
- /* unknown scalar += K */
- __mark_reg_unknown(dst_reg);
- return adjust_scalar_min_max_vals(
- env, insn, dst_reg, off_reg);
- }
- return rc;
- }
+ if (ptr_reg) /* pointer += K */
+ return adjust_ptr_min_max_vals(env, insn,
+ ptr_reg, src_reg);
}
/* Got here implies adding two SCALAR_VALUEs */
return -EACCES;
}
mark_reg_unknown(env, regs, insn->dst_reg);
- /* high 32 bits are known zero. */
- regs[insn->dst_reg].var_off = tnum_cast(
- regs[insn->dst_reg].var_off, 4);
- __update_reg_bounds(®s[insn->dst_reg]);
+ coerce_reg_to_size(®s[insn->dst_reg], 4);
}
} else {
/* case: R = imm
* remember the value we stored into this reg
*/
regs[insn->dst_reg].type = SCALAR_VALUE;
- __mark_reg_known(regs + insn->dst_reg, insn->imm);
+ if (BPF_CLASS(insn->code) == BPF_ALU64) {
+ __mark_reg_known(regs + insn->dst_reg,
+ insn->imm);
+ } else {
+ __mark_reg_known(regs + insn->dst_reg,
+ (u32)insn->imm);
+ }
}
} else if (opcode > BPF_END) {
return range_within(rold, rcur) &&
tnum_in(rold->var_off, rcur->var_off);
} else {
- /* if we knew anything about the old value, we're not
- * equal, because we can't know anything about the
- * scalar value of the pointer in the new value.
+ /* We're trying to use a pointer in place of a scalar.
+ * Even if the scalar was unbounded, this could lead to
+ * pointer leaks because scalars are allowed to leak
+ * while pointers are not. We could make this safe in
+ * special cases if root is calling us, but it's
+ * probably not worth the hassle.
*/
- return rold->umin_value == 0 &&
- rold->umax_value == U64_MAX &&
- rold->smin_value == S64_MIN &&
- rold->smax_value == S64_MAX &&
- tnum_is_unknown(rold->var_off);
+ return false;
}
case PTR_TO_MAP_VALUE:
/* If the new min/max/var_off satisfy the old ones and
*/
insn->imm = 0;
insn->code = BPF_JMP | BPF_TAIL_CALL;
+
+ /* instead of changing every JIT dealing with tail_call
+ * emit two extra insns:
+ * if (index >= max_entries) goto out;
+ * index &= array->index_mask;
+ * to avoid out-of-bounds cpu speculation
+ */
+ map_ptr = env->insn_aux_data[i + delta].map_ptr;
+ if (map_ptr == BPF_MAP_PTR_POISON) {
+ verbose(env, "tail_call obusing map_ptr\n");
+ return -EINVAL;
+ }
+ if (!map_ptr->unpriv_array)
+ continue;
+ insn_buf[0] = BPF_JMP_IMM(BPF_JGE, BPF_REG_3,
+ map_ptr->max_entries, 2);
+ insn_buf[1] = BPF_ALU32_IMM(BPF_AND, BPF_REG_3,
+ container_of(map_ptr,
+ struct bpf_array,
+ map)->index_mask);
+ insn_buf[2] = *insn;
+ cnt = 3;
+ new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+
+ delta += cnt - 1;
+ env->prog = prog = new_prog;
+ insn = new_prog->insnsi + i + delta;
continue;
}
*/
do {
css_task_iter_start(&from->self, 0, &it);
- task = css_task_iter_next(&it);
+
+ do {
+ task = css_task_iter_next(&it);
+ } while (task && (task->flags & PF_EXITING));
+
if (task)
get_task_struct(task);
css_task_iter_end(&it);
cgroup_on_dfl(cgrp) ? ss->name : ss->legacy_name,
cft->name);
else
- strncpy(buf, cft->name, CGROUP_FILE_NAME_MAX);
+ strlcpy(buf, cft->name, CGROUP_FILE_NAME_MAX);
return buf;
}
root->flags = opts->flags;
if (opts->release_agent)
- strcpy(root->release_agent_path, opts->release_agent);
+ strlcpy(root->release_agent_path, opts->release_agent, PATH_MAX);
if (opts->name)
- strcpy(root->name, opts->name);
+ strlcpy(root->name, opts->name, MAX_CGROUP_ROOT_NAMELEN);
if (opts->cpuset_clone_children)
set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
}
static void css_task_iter_advance(struct css_task_iter *it)
{
- struct list_head *l = it->task_pos;
+ struct list_head *next;
lockdep_assert_held(&css_set_lock);
- WARN_ON_ONCE(!l);
-
repeat:
/*
* Advance iterator to find next entry. cset->tasks is consumed
* first and then ->mg_tasks. After ->mg_tasks, we move onto the
* next cset.
*/
- l = l->next;
+ next = it->task_pos->next;
- if (l == it->tasks_head)
- l = it->mg_tasks_head->next;
+ if (next == it->tasks_head)
+ next = it->mg_tasks_head->next;
- if (l == it->mg_tasks_head)
+ if (next == it->mg_tasks_head)
css_task_iter_advance_css_set(it);
else
- it->task_pos = l;
+ it->task_pos = next;
/* if PROCS, skip over tasks which aren't group leaders */
if ((it->flags & CSS_TASK_ITER_PROCS) && it->task_pos &&
spin_lock_irq(&css_set_lock);
rcu_read_lock();
- cset = rcu_dereference(current->cgroups);
+ cset = task_css_set(current);
refcnt = refcount_read(&cset->refcount);
seq_printf(seq, "css_set %pK %d", cset, refcnt);
if (refcnt > cset->nr_tasks)
spin_lock_irq(&css_set_lock);
rcu_read_lock();
- cset = rcu_dereference(current->cgroups);
+ cset = task_css_set(current);
list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
struct cgroup *c = link->cgrp;
}
/* ->updated_children list is self terminated */
- for_each_possible_cpu(cpu)
- cgroup_cpu_stat(cgrp, cpu)->updated_children = cgrp;
+ for_each_possible_cpu(cpu) {
+ struct cgroup_cpu_stat *cstat = cgroup_cpu_stat(cgrp, cpu);
+
+ cstat->updated_children = cgrp;
+ u64_stats_init(&cstat->sync);
+ }
prev_cputime_init(&cgrp->stat.prev_cputime);
STATIC_LOCKDEP_MAP_INIT("cpuhp_state-down", &cpuhp_state_down_map);
-static void inline cpuhp_lock_acquire(bool bringup)
+static inline void cpuhp_lock_acquire(bool bringup)
{
lock_map_acquire(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
}
-static void inline cpuhp_lock_release(bool bringup)
+static inline void cpuhp_lock_release(bool bringup)
{
lock_map_release(bringup ? &cpuhp_state_up_map : &cpuhp_state_down_map);
}
#else
-static void inline cpuhp_lock_acquire(bool bringup) { }
-static void inline cpuhp_lock_release(bool bringup) { }
+static inline void cpuhp_lock_acquire(bool bringup) { }
+static inline void cpuhp_lock_release(bool bringup) { }
#endif
* before blk_mq_queue_reinit_notify() from notify_dead(),
* otherwise a RCU stall occurs.
*/
- [CPUHP_TIMERS_DEAD] = {
+ [CPUHP_TIMERS_PREPARE] = {
.name = "timers:dead",
- .startup.single = NULL,
+ .startup.single = timers_prepare_cpu,
.teardown.single = timers_dead_cpu,
},
/* Kicks the plugged cpu into life */
return -EFAULT;
}
#endif
+
+__weak void abort(void)
+{
+ BUG();
+
+ /* if that doesn't kill us, halt */
+ panic("Oops failed to kill thread");
+}
+EXPORT_SYMBOL(abort);
goto out;
}
/* a new mm has just been created */
- arch_dup_mmap(oldmm, mm);
- retval = 0;
+ retval = arch_dup_mmap(oldmm, mm);
out:
up_write(&mm->mmap_sem);
flush_tlb_mm(oldmm);
return gid_gt(a, b) - gid_lt(a, b);
}
-static void groups_sort(struct group_info *group_info)
+void groups_sort(struct group_info *group_info)
{
sort(group_info->gid, group_info->ngroups, sizeof(*group_info->gid),
gid_cmp, NULL);
}
+EXPORT_SYMBOL(groups_sort);
/* a simple bsearch */
int groups_search(const struct group_info *group_info, kgid_t grp)
void set_groups(struct cred *new, struct group_info *group_info)
{
put_group_info(new->group_info);
- groups_sort(group_info);
get_group_info(group_info);
new->group_info = group_info;
}
return retval;
}
+ groups_sort(group_info);
retval = set_current_groups(group_info);
put_group_info(group_info);
static inline void print_irq_desc(unsigned int irq, struct irq_desc *desc)
{
+ static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 5);
+
+ if (!__ratelimit(&ratelimit))
+ return;
+
printk("irq %d, desc: %p, depth: %d, count: %d, unhandled: %d\n",
irq, desc, desc->depth, desc->irq_count, desc->irqs_unhandled);
printk("->handle_irq(): %p, ", desc->handle_irq);
BIT_MASK_DESCR(IRQD_SETAFFINITY_PENDING),
BIT_MASK_DESCR(IRQD_AFFINITY_MANAGED),
BIT_MASK_DESCR(IRQD_MANAGED_SHUTDOWN),
+ BIT_MASK_DESCR(IRQD_CAN_RESERVE),
BIT_MASK_DESCR(IRQD_FORWARDED_TO_VCPU),
EXPORT_SYMBOL_GPL(irq_get_domain_generic_chip);
/*
- * Separate lockdep class for interrupt chip which can nest irq_desc
- * lock.
+ * Separate lockdep classes for interrupt chip which can nest irq_desc
+ * lock and request mutex.
*/
static struct lock_class_key irq_nested_lock_class;
+static struct lock_class_key irq_nested_request_class;
/*
* irq_map_generic_chip - Map a generic chip for an irq domain
set_bit(idx, &gc->installed);
if (dgc->gc_flags & IRQ_GC_INIT_NESTED_LOCK)
- irq_set_lockdep_class(virq, &irq_nested_lock_class);
+ irq_set_lockdep_class(virq, &irq_nested_lock_class,
+ &irq_nested_request_class);
if (chip->irq_calc_mask)
chip->irq_calc_mask(data);
continue;
if (flags & IRQ_GC_INIT_NESTED_LOCK)
- irq_set_lockdep_class(i, &irq_nested_lock_class);
+ irq_set_lockdep_class(i, &irq_nested_lock_class,
+ &irq_nested_request_class);
if (!(flags & IRQ_GC_NO_MASK)) {
struct irq_data *d = irq_get_irq_data(i);
#endif /* !CONFIG_GENERIC_PENDING_IRQ */
#if !defined(CONFIG_IRQ_DOMAIN) || !defined(CONFIG_IRQ_DOMAIN_HIERARCHY)
-static inline int irq_domain_activate_irq(struct irq_data *data, bool early)
+static inline int irq_domain_activate_irq(struct irq_data *data, bool reserve)
{
irqd_set_activated(data);
return 0;
}
}
-static int __irq_domain_activate_irq(struct irq_data *irqd, bool early)
+static int __irq_domain_activate_irq(struct irq_data *irqd, bool reserve)
{
int ret = 0;
if (irqd->parent_data)
ret = __irq_domain_activate_irq(irqd->parent_data,
- early);
+ reserve);
if (!ret && domain->ops->activate) {
- ret = domain->ops->activate(domain, irqd, early);
+ ret = domain->ops->activate(domain, irqd, reserve);
/* Rollback in case of error */
if (ret && irqd->parent_data)
__irq_domain_deactivate_irq(irqd->parent_data);
/**
* irq_domain_activate_irq - Call domain_ops->activate recursively to activate
* interrupt
- * @irq_data: outermost irq_data associated with interrupt
+ * @irq_data: Outermost irq_data associated with interrupt
+ * @reserve: If set only reserve an interrupt vector instead of assigning one
*
* This is the second step to call domain_ops->activate to program interrupt
* controllers, so the interrupt could actually get delivered.
*/
-int irq_domain_activate_irq(struct irq_data *irq_data, bool early)
+int irq_domain_activate_irq(struct irq_data *irq_data, bool reserve)
{
int ret = 0;
if (!irqd_is_activated(irq_data))
- ret = __irq_domain_activate_irq(irq_data, early);
+ ret = __irq_domain_activate_irq(irq_data, reserve);
if (!ret)
irqd_set_activated(irq_data);
return ret;
return ret;
}
+/*
+ * Carefully check whether the device can use reservation mode. If
+ * reservation mode is enabled then the early activation will assign a
+ * dummy vector to the device. If the PCI/MSI device does not support
+ * masking of the entry then this can result in spurious interrupts when
+ * the device driver is not absolutely careful. But even then a malfunction
+ * of the hardware could result in a spurious interrupt on the dummy vector
+ * and render the device unusable. If the entry can be masked then the core
+ * logic will prevent the spurious interrupt and reservation mode can be
+ * used. For now reservation mode is restricted to PCI/MSI.
+ */
+static bool msi_check_reservation_mode(struct irq_domain *domain,
+ struct msi_domain_info *info,
+ struct device *dev)
+{
+ struct msi_desc *desc;
+
+ if (domain->bus_token != DOMAIN_BUS_PCI_MSI)
+ return false;
+
+ if (!(info->flags & MSI_FLAG_MUST_REACTIVATE))
+ return false;
+
+ if (IS_ENABLED(CONFIG_PCI_MSI) && pci_msi_ignore_mask)
+ return false;
+
+ /*
+ * Checking the first MSI descriptor is sufficient. MSIX supports
+ * masking and MSI does so when the maskbit is set.
+ */
+ desc = first_msi_entry(dev);
+ return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
+}
+
/**
* msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
* @domain: The domain to allocate from
{
struct msi_domain_info *info = domain->host_data;
struct msi_domain_ops *ops = info->ops;
- msi_alloc_info_t arg;
+ struct irq_data *irq_data;
struct msi_desc *desc;
+ msi_alloc_info_t arg;
int i, ret, virq;
+ bool can_reserve;
ret = msi_domain_prepare_irqs(domain, dev, nvec, &arg);
if (ret)
if (ops->msi_finish)
ops->msi_finish(&arg, 0);
+ can_reserve = msi_check_reservation_mode(domain, info, dev);
+
for_each_msi_entry(desc, dev) {
virq = desc->irq;
if (desc->nvec_used == 1)
* the MSI entries before the PCI layer enables MSI in the
* card. Otherwise the card latches a random msi message.
*/
- if (info->flags & MSI_FLAG_ACTIVATE_EARLY) {
- struct irq_data *irq_data;
+ if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY))
+ continue;
+ irq_data = irq_domain_get_irq_data(domain, desc->irq);
+ if (!can_reserve)
+ irqd_clr_can_reserve(irq_data);
+ ret = irq_domain_activate_irq(irq_data, can_reserve);
+ if (ret)
+ goto cleanup;
+ }
+
+ /*
+ * If these interrupts use reservation mode, clear the activated bit
+ * so request_irq() will assign the final vector.
+ */
+ if (can_reserve) {
+ for_each_msi_entry(desc, dev) {
irq_data = irq_domain_get_irq_data(domain, desc->irq);
- ret = irq_domain_activate_irq(irq_data, true);
- if (ret)
- goto cleanup;
- if (info->flags & MSI_FLAG_MUST_REACTIVATE)
- irqd_clr_activated(irq_data);
+ irqd_clr_activated(irq_data);
}
}
return 0;
}
EXPORT_SYMBOL(__sanitizer_cov_trace_cmp2);
-void notrace __sanitizer_cov_trace_cmp4(u16 arg1, u16 arg2)
+void notrace __sanitizer_cov_trace_cmp4(u32 arg1, u32 arg2)
{
write_comp_data(KCOV_CMP_SIZE(2), arg1, arg2, _RET_IP_);
}
}
EXPORT_SYMBOL(__sanitizer_cov_trace_const_cmp2);
-void notrace __sanitizer_cov_trace_const_cmp4(u16 arg1, u16 arg2)
+void notrace __sanitizer_cov_trace_const_cmp4(u32 arg1, u32 arg2)
{
write_comp_data(KCOV_CMP_SIZE(2) | KCOV_CMP_CONST, arg1, arg2,
_RET_IP_);
#define CREATE_TRACE_POINTS
#include <trace/events/lock.h>
-#ifdef CONFIG_LOCKDEP_CROSSRELEASE
-#include <linux/slab.h>
-#endif
-
#ifdef CONFIG_PROVE_LOCKING
int prove_locking = 1;
module_param(prove_locking, int, 0644);
#define lock_stat 0
#endif
-#ifdef CONFIG_BOOTPARAM_LOCKDEP_CROSSRELEASE_FULLSTACK
-static int crossrelease_fullstack = 1;
-#else
-static int crossrelease_fullstack;
-#endif
-static int __init allow_crossrelease_fullstack(char *str)
-{
- crossrelease_fullstack = 1;
- return 0;
-}
-
-early_param("crossrelease_fullstack", allow_crossrelease_fullstack);
-
/*
* lockdep_lock: protects the lockdep graph, the hashes and the
* class/list/hash allocators.
return is_static || static_obj(lock->key) ? NULL : ERR_PTR(-EINVAL);
}
-#ifdef CONFIG_LOCKDEP_CROSSRELEASE
-static void cross_init(struct lockdep_map *lock, int cross);
-static int cross_lock(struct lockdep_map *lock);
-static int lock_acquire_crosslock(struct held_lock *hlock);
-static int lock_release_crosslock(struct lockdep_map *lock);
-#else
-static inline void cross_init(struct lockdep_map *lock, int cross) {}
-static inline int cross_lock(struct lockdep_map *lock) { return 0; }
-static inline int lock_acquire_crosslock(struct held_lock *hlock) { return 2; }
-static inline int lock_release_crosslock(struct lockdep_map *lock) { return 2; }
-#endif
-
/*
* Register a lock's class in the hash-table, if the class is not present
* yet. Otherwise we look it up. We cache the result in the lock object
printk(KERN_CONT "\n\n");
}
- if (cross_lock(tgt->instance)) {
- printk(" Possible unsafe locking scenario by crosslock:\n\n");
- printk(" CPU0 CPU1\n");
- printk(" ---- ----\n");
- printk(" lock(");
- __print_lock_name(parent);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(target);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(source);
- printk(KERN_CONT ");\n");
- printk(" unlock(");
- __print_lock_name(target);
- printk(KERN_CONT ");\n");
- printk("\n *** DEADLOCK ***\n\n");
- } else {
- printk(" Possible unsafe locking scenario:\n\n");
- printk(" CPU0 CPU1\n");
- printk(" ---- ----\n");
- printk(" lock(");
- __print_lock_name(target);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(parent);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(target);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(source);
- printk(KERN_CONT ");\n");
- printk("\n *** DEADLOCK ***\n\n");
- }
+ printk(" Possible unsafe locking scenario:\n\n");
+ printk(" CPU0 CPU1\n");
+ printk(" ---- ----\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(parent);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(source);
+ printk(KERN_CONT ");\n");
+ printk("\n *** DEADLOCK ***\n\n");
}
/*
curr->comm, task_pid_nr(curr));
print_lock(check_src);
- if (cross_lock(check_tgt->instance))
- pr_warn("\nbut now in release context of a crosslock acquired at the following:\n");
- else
- pr_warn("\nbut task is already holding lock:\n");
+ pr_warn("\nbut task is already holding lock:\n");
print_lock(check_tgt);
pr_warn("\nwhich lock already depends on the new lock.\n\n");
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;
- if (cross_lock(check_tgt->instance))
- this->trace = *trace;
- else if (!save_trace(&this->trace))
+ if (!save_trace(&this->trace))
return 0;
depth = get_lock_depth(target);
if (nest)
return 2;
- if (cross_lock(prev->instance))
- continue;
-
return print_deadlock_bug(curr, prev, next);
}
return 1;
for (;;) {
int distance = curr->lockdep_depth - depth + 1;
hlock = curr->held_locks + depth - 1;
+
/*
- * Only non-crosslock entries get new dependencies added.
- * Crosslock entries will be added by commit later:
+ * Only non-recursive-read entries get new dependencies
+ * added:
*/
- if (!cross_lock(hlock->instance)) {
+ if (hlock->read != 2 && hlock->check) {
+ int ret = check_prev_add(curr, hlock, next, distance, &trace, save_trace);
+ if (!ret)
+ return 0;
+
/*
- * Only non-recursive-read entries get new dependencies
- * added:
+ * Stop after the first non-trylock entry,
+ * as non-trylock entries have added their
+ * own direct dependencies already, so this
+ * lock is connected to them indirectly:
*/
- if (hlock->read != 2 && hlock->check) {
- int ret = check_prev_add(curr, hlock, next,
- distance, &trace, save_trace);
- if (!ret)
- return 0;
-
- /*
- * Stop after the first non-trylock entry,
- * as non-trylock entries have added their
- * own direct dependencies already, so this
- * lock is connected to them indirectly:
- */
- if (!hlock->trylock)
- break;
- }
+ if (!hlock->trylock)
+ break;
}
+
depth--;
/*
* End of lock-stack?
void lockdep_init_map(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass)
{
- cross_init(lock, 0);
__lockdep_init_map(lock, name, key, subclass);
}
EXPORT_SYMBOL_GPL(lockdep_init_map);
-#ifdef CONFIG_LOCKDEP_CROSSRELEASE
-void lockdep_init_map_crosslock(struct lockdep_map *lock, const char *name,
- struct lock_class_key *key, int subclass)
-{
- cross_init(lock, 1);
- __lockdep_init_map(lock, name, key, subclass);
-}
-EXPORT_SYMBOL_GPL(lockdep_init_map_crosslock);
-#endif
-
struct lock_class_key __lockdep_no_validate__;
EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
int chain_head = 0;
int class_idx;
u64 chain_key;
- int ret;
if (unlikely(!debug_locks))
return 0;
class_idx = class - lock_classes + 1;
- /* TODO: nest_lock is not implemented for crosslock yet. */
- if (depth && !cross_lock(lock)) {
+ if (depth) {
hlock = curr->held_locks + depth - 1;
if (hlock->class_idx == class_idx && nest_lock) {
if (hlock->references) {
if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
return 0;
- ret = lock_acquire_crosslock(hlock);
- /*
- * 2 means normal acquire operations are needed. Otherwise, it's
- * ok just to return with '0:fail, 1:success'.
- */
- if (ret != 2)
- return ret;
-
curr->curr_chain_key = chain_key;
curr->lockdep_depth++;
check_chain_key(curr);
struct task_struct *curr = current;
struct held_lock *hlock;
unsigned int depth;
- int ret, i;
+ int i;
if (unlikely(!debug_locks))
return 0;
- ret = lock_release_crosslock(lock);
- /*
- * 2 means normal release operations are needed. Otherwise, it's
- * ok just to return with '0:fail, 1:success'.
- */
- if (ret != 2)
- return ret;
-
depth = curr->lockdep_depth;
/*
* So we're all set to release this lock.. wait what lock? We don't
dump_stack();
}
EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
-
-#ifdef CONFIG_LOCKDEP_CROSSRELEASE
-
-/*
- * Crossrelease works by recording a lock history for each thread and
- * connecting those historic locks that were taken after the
- * wait_for_completion() in the complete() context.
- *
- * Task-A Task-B
- *
- * mutex_lock(&A);
- * mutex_unlock(&A);
- *
- * wait_for_completion(&C);
- * lock_acquire_crosslock();
- * atomic_inc_return(&cross_gen_id);
- * |
- * | mutex_lock(&B);
- * | mutex_unlock(&B);
- * |
- * | complete(&C);
- * `-- lock_commit_crosslock();
- *
- * Which will then add a dependency between B and C.
- */
-
-#define xhlock(i) (current->xhlocks[(i) % MAX_XHLOCKS_NR])
-
-/*
- * Whenever a crosslock is held, cross_gen_id will be increased.
- */
-static atomic_t cross_gen_id; /* Can be wrapped */
-
-/*
- * Make an entry of the ring buffer invalid.
- */
-static inline void invalidate_xhlock(struct hist_lock *xhlock)
-{
- /*
- * Normally, xhlock->hlock.instance must be !NULL.
- */
- xhlock->hlock.instance = NULL;
-}
-
-/*
- * Lock history stacks; we have 2 nested lock history stacks:
- *
- * HARD(IRQ)
- * SOFT(IRQ)
- *
- * The thing is that once we complete a HARD/SOFT IRQ the future task locks
- * should not depend on any of the locks observed while running the IRQ. So
- * what we do is rewind the history buffer and erase all our knowledge of that
- * temporal event.
- */
-
-void crossrelease_hist_start(enum xhlock_context_t c)
-{
- struct task_struct *cur = current;
-
- if (!cur->xhlocks)
- return;
-
- cur->xhlock_idx_hist[c] = cur->xhlock_idx;
- cur->hist_id_save[c] = cur->hist_id;
-}
-
-void crossrelease_hist_end(enum xhlock_context_t c)
-{
- struct task_struct *cur = current;
-
- if (cur->xhlocks) {
- unsigned int idx = cur->xhlock_idx_hist[c];
- struct hist_lock *h = &xhlock(idx);
-
- cur->xhlock_idx = idx;
-
- /* Check if the ring was overwritten. */
- if (h->hist_id != cur->hist_id_save[c])
- invalidate_xhlock(h);
- }
-}
-
-/*
- * lockdep_invariant_state() is used to annotate independence inside a task, to
- * make one task look like multiple independent 'tasks'.
- *
- * Take for instance workqueues; each work is independent of the last. The
- * completion of a future work does not depend on the completion of a past work
- * (in general). Therefore we must not carry that (lock) dependency across
- * works.
- *
- * This is true for many things; pretty much all kthreads fall into this
- * pattern, where they have an invariant state and future completions do not
- * depend on past completions. Its just that since they all have the 'same'
- * form -- the kthread does the same over and over -- it doesn't typically
- * matter.
- *
- * The same is true for system-calls, once a system call is completed (we've
- * returned to userspace) the next system call does not depend on the lock
- * history of the previous system call.
- *
- * They key property for independence, this invariant state, is that it must be
- * a point where we hold no locks and have no history. Because if we were to
- * hold locks, the restore at _end() would not necessarily recover it's history
- * entry. Similarly, independence per-definition means it does not depend on
- * prior state.
- */
-void lockdep_invariant_state(bool force)
-{
- /*
- * We call this at an invariant point, no current state, no history.
- * Verify the former, enforce the latter.
- */
- WARN_ON_ONCE(!force && current->lockdep_depth);
- if (current->xhlocks)
- invalidate_xhlock(&xhlock(current->xhlock_idx));
-}
-
-static int cross_lock(struct lockdep_map *lock)
-{
- return lock ? lock->cross : 0;
-}
-
-/*
- * This is needed to decide the relationship between wrapable variables.
- */
-static inline int before(unsigned int a, unsigned int b)
-{
- return (int)(a - b) < 0;
-}
-
-static inline struct lock_class *xhlock_class(struct hist_lock *xhlock)
-{
- return hlock_class(&xhlock->hlock);
-}
-
-static inline struct lock_class *xlock_class(struct cross_lock *xlock)
-{
- return hlock_class(&xlock->hlock);
-}
-
-/*
- * Should we check a dependency with previous one?
- */
-static inline int depend_before(struct held_lock *hlock)
-{
- return hlock->read != 2 && hlock->check && !hlock->trylock;
-}
-
-/*
- * Should we check a dependency with next one?
- */
-static inline int depend_after(struct held_lock *hlock)
-{
- return hlock->read != 2 && hlock->check;
-}
-
-/*
- * Check if the xhlock is valid, which would be false if,
- *
- * 1. Has not used after initializaion yet.
- * 2. Got invalidated.
- *
- * Remind hist_lock is implemented as a ring buffer.
- */
-static inline int xhlock_valid(struct hist_lock *xhlock)
-{
- /*
- * xhlock->hlock.instance must be !NULL.
- */
- return !!xhlock->hlock.instance;
-}
-
-/*
- * Record a hist_lock entry.
- *
- * Irq disable is only required.
- */
-static void add_xhlock(struct held_lock *hlock)
-{
- unsigned int idx = ++current->xhlock_idx;
- struct hist_lock *xhlock = &xhlock(idx);
-
-#ifdef CONFIG_DEBUG_LOCKDEP
- /*
- * This can be done locklessly because they are all task-local
- * state, we must however ensure IRQs are disabled.
- */
- WARN_ON_ONCE(!irqs_disabled());
-#endif
-
- /* Initialize hist_lock's members */
- xhlock->hlock = *hlock;
- xhlock->hist_id = ++current->hist_id;
-
- xhlock->trace.nr_entries = 0;
- xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
- xhlock->trace.entries = xhlock->trace_entries;
-
- if (crossrelease_fullstack) {
- xhlock->trace.skip = 3;
- save_stack_trace(&xhlock->trace);
- } else {
- xhlock->trace.nr_entries = 1;
- xhlock->trace.entries[0] = hlock->acquire_ip;
- }
-}
-
-static inline int same_context_xhlock(struct hist_lock *xhlock)
-{
- return xhlock->hlock.irq_context == task_irq_context(current);
-}
-
-/*
- * This should be lockless as far as possible because this would be
- * called very frequently.
- */
-static void check_add_xhlock(struct held_lock *hlock)
-{
- /*
- * Record a hist_lock, only in case that acquisitions ahead
- * could depend on the held_lock. For example, if the held_lock
- * is trylock then acquisitions ahead never depends on that.
- * In that case, we don't need to record it. Just return.
- */
- if (!current->xhlocks || !depend_before(hlock))
- return;
-
- add_xhlock(hlock);
-}
-
-/*
- * For crosslock.
- */
-static int add_xlock(struct held_lock *hlock)
-{
- struct cross_lock *xlock;
- unsigned int gen_id;
-
- if (!graph_lock())
- return 0;
-
- xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
-
- /*
- * When acquisitions for a crosslock are overlapped, we use
- * nr_acquire to perform commit for them, based on cross_gen_id
- * of the first acquisition, which allows to add additional
- * dependencies.
- *
- * Moreover, when no acquisition of a crosslock is in progress,
- * we should not perform commit because the lock might not exist
- * any more, which might cause incorrect memory access. So we
- * have to track the number of acquisitions of a crosslock.
- *
- * depend_after() is necessary to initialize only the first
- * valid xlock so that the xlock can be used on its commit.
- */
- if (xlock->nr_acquire++ && depend_after(&xlock->hlock))
- goto unlock;
-
- gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
- xlock->hlock = *hlock;
- xlock->hlock.gen_id = gen_id;
-unlock:
- graph_unlock();
- return 1;
-}
-
-/*
- * Called for both normal and crosslock acquires. Normal locks will be
- * pushed on the hist_lock queue. Cross locks will record state and
- * stop regular lock_acquire() to avoid being placed on the held_lock
- * stack.
- *
- * Return: 0 - failure;
- * 1 - crosslock, done;
- * 2 - normal lock, continue to held_lock[] ops.
- */
-static int lock_acquire_crosslock(struct held_lock *hlock)
-{
- /*
- * CONTEXT 1 CONTEXT 2
- * --------- ---------
- * lock A (cross)
- * X = atomic_inc_return(&cross_gen_id)
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- * Y = atomic_read_acquire(&cross_gen_id)
- * lock B
- *
- * atomic_read_acquire() is for ordering between A and B,
- * IOW, A happens before B, when CONTEXT 2 see Y >= X.
- *
- * Pairs with atomic_inc_return() in add_xlock().
- */
- hlock->gen_id = (unsigned int)atomic_read_acquire(&cross_gen_id);
-
- if (cross_lock(hlock->instance))
- return add_xlock(hlock);
-
- check_add_xhlock(hlock);
- return 2;
-}
-
-static int copy_trace(struct stack_trace *trace)
-{
- unsigned long *buf = stack_trace + nr_stack_trace_entries;
- unsigned int max_nr = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
- unsigned int nr = min(max_nr, trace->nr_entries);
-
- trace->nr_entries = nr;
- memcpy(buf, trace->entries, nr * sizeof(trace->entries[0]));
- trace->entries = buf;
- nr_stack_trace_entries += nr;
-
- if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) {
- if (!debug_locks_off_graph_unlock())
- return 0;
-
- print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
- dump_stack();
-
- return 0;
- }
-
- return 1;
-}
-
-static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
-{
- unsigned int xid, pid;
- u64 chain_key;
-
- xid = xlock_class(xlock) - lock_classes;
- chain_key = iterate_chain_key((u64)0, xid);
- pid = xhlock_class(xhlock) - lock_classes;
- chain_key = iterate_chain_key(chain_key, pid);
-
- if (lookup_chain_cache(chain_key))
- return 1;
-
- if (!add_chain_cache_classes(xid, pid, xhlock->hlock.irq_context,
- chain_key))
- return 0;
-
- if (!check_prev_add(current, &xlock->hlock, &xhlock->hlock, 1,
- &xhlock->trace, copy_trace))
- return 0;
-
- return 1;
-}
-
-static void commit_xhlocks(struct cross_lock *xlock)
-{
- unsigned int cur = current->xhlock_idx;
- unsigned int prev_hist_id = xhlock(cur).hist_id;
- unsigned int i;
-
- if (!graph_lock())
- return;
-
- if (xlock->nr_acquire) {
- for (i = 0; i < MAX_XHLOCKS_NR; i++) {
- struct hist_lock *xhlock = &xhlock(cur - i);
-
- if (!xhlock_valid(xhlock))
- break;
-
- if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
- break;
-
- if (!same_context_xhlock(xhlock))
- break;
-
- /*
- * Filter out the cases where the ring buffer was
- * overwritten and the current entry has a bigger
- * hist_id than the previous one, which is impossible
- * otherwise:
- */
- if (unlikely(before(prev_hist_id, xhlock->hist_id)))
- break;
-
- prev_hist_id = xhlock->hist_id;
-
- /*
- * commit_xhlock() returns 0 with graph_lock already
- * released if fail.
- */
- if (!commit_xhlock(xlock, xhlock))
- return;
- }
- }
-
- graph_unlock();
-}
-
-void lock_commit_crosslock(struct lockdep_map *lock)
-{
- struct cross_lock *xlock;
- unsigned long flags;
-
- if (unlikely(!debug_locks || current->lockdep_recursion))
- return;
-
- if (!current->xhlocks)
- return;
-
- /*
- * Do commit hist_locks with the cross_lock, only in case that
- * the cross_lock could depend on acquisitions after that.
- *
- * For example, if the cross_lock does not have the 'check' flag
- * then we don't need to check dependencies and commit for that.
- * Just skip it. In that case, of course, the cross_lock does
- * not depend on acquisitions ahead, either.
- *
- * WARNING: Don't do that in add_xlock() in advance. When an
- * acquisition context is different from the commit context,
- * invalid(skipped) cross_lock might be accessed.
- */
- if (!depend_after(&((struct lockdep_map_cross *)lock)->xlock.hlock))
- return;
-
- raw_local_irq_save(flags);
- check_flags(flags);
- current->lockdep_recursion = 1;
- xlock = &((struct lockdep_map_cross *)lock)->xlock;
- commit_xhlocks(xlock);
- current->lockdep_recursion = 0;
- raw_local_irq_restore(flags);
-}
-EXPORT_SYMBOL_GPL(lock_commit_crosslock);
-
-/*
- * Return: 0 - failure;
- * 1 - crosslock, done;
- * 2 - normal lock, continue to held_lock[] ops.
- */
-static int lock_release_crosslock(struct lockdep_map *lock)
-{
- if (cross_lock(lock)) {
- if (!graph_lock())
- return 0;
- ((struct lockdep_map_cross *)lock)->xlock.nr_acquire--;
- graph_unlock();
- return 1;
- }
- return 2;
-}
-
-static void cross_init(struct lockdep_map *lock, int cross)
-{
- if (cross)
- ((struct lockdep_map_cross *)lock)->xlock.nr_acquire = 0;
-
- lock->cross = cross;
-
- /*
- * Crossrelease assumes that the ring buffer size of xhlocks
- * is aligned with power of 2. So force it on build.
- */
- BUILD_BUG_ON(MAX_XHLOCKS_NR & (MAX_XHLOCKS_NR - 1));
-}
-
-void lockdep_init_task(struct task_struct *task)
-{
- int i;
-
- task->xhlock_idx = UINT_MAX;
- task->hist_id = 0;
-
- for (i = 0; i < XHLOCK_CTX_NR; i++) {
- task->xhlock_idx_hist[i] = UINT_MAX;
- task->hist_id_save[i] = 0;
- }
-
- task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
- GFP_KERNEL);
-}
-
-void lockdep_free_task(struct task_struct *task)
-{
- if (task->xhlocks) {
- void *tmp = task->xhlocks;
- /* Diable crossrelease for current */
- task->xhlocks = NULL;
- kfree(tmp);
- }
-}
-#endif
break; \
preempt_enable(); \
\
- if (!(lock)->break_lock) \
- (lock)->break_lock = 1; \
- while ((lock)->break_lock) \
- arch_##op##_relax(&lock->raw_lock); \
+ arch_##op##_relax(&lock->raw_lock); \
} \
- (lock)->break_lock = 0; \
} \
\
unsigned long __lockfunc __raw_##op##_lock_irqsave(locktype##_t *lock) \
local_irq_restore(flags); \
preempt_enable(); \
\
- if (!(lock)->break_lock) \
- (lock)->break_lock = 1; \
- while ((lock)->break_lock) \
- arch_##op##_relax(&lock->raw_lock); \
+ arch_##op##_relax(&lock->raw_lock); \
} \
- (lock)->break_lock = 0; \
+ \
return flags; \
} \
\
}
if (unlikely(is_child_reaper(pid))) {
- if (pid_ns_prepare_proc(ns)) {
- disable_pid_allocation(ns);
+ if (pid_ns_prepare_proc(ns))
goto out_free;
- }
}
get_pid_ns(ns);
while (++i <= ns->level)
idr_remove(&ns->idr, (pid->numbers + i)->nr);
+ /* On failure to allocate the first pid, reset the state */
+ if (ns->pid_allocated == PIDNS_ADDING)
+ idr_set_cursor(&ns->idr, 0);
+
spin_unlock_irq(&pidmap_lock);
kmem_cache_free(ns->pid_cachep, pid);
return ret;
}
-/**
- * sys_sched_rr_get_interval - return the default timeslice of a process.
- * @pid: pid of the process.
- * @interval: userspace pointer to the timeslice value.
- *
- * this syscall writes the default timeslice value of a given process
- * into the user-space timespec buffer. A value of '0' means infinity.
- *
- * Return: On success, 0 and the timeslice is in @interval. Otherwise,
- * an error code.
- */
static int sched_rr_get_interval(pid_t pid, struct timespec64 *t)
{
struct task_struct *p;
return retval;
}
+/**
+ * sys_sched_rr_get_interval - return the default timeslice of a process.
+ * @pid: pid of the process.
+ * @interval: userspace pointer to the timeslice value.
+ *
+ * this syscall writes the default timeslice value of a given process
+ * into the user-space timespec buffer. A value of '0' means infinity.
+ *
+ * Return: On success, 0 and the timeslice is in @interval. Otherwise,
+ * an error code.
+ */
SYSCALL_DEFINE2(sched_rr_get_interval, pid_t, pid,
struct timespec __user *, interval)
{
#ifdef CONFIG_NO_HZ_COMMON
static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
{
- unsigned long idle_calls = tick_nohz_get_idle_calls();
+ unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
bool ret = idle_calls == sg_cpu->saved_idle_calls;
sg_cpu->saved_idle_calls = idle_calls;
bool resched = false;
struct task_struct *p;
struct rq *src_rq;
+ int rt_overload_count = rt_overloaded(this_rq);
- if (likely(!rt_overloaded(this_rq)))
+ if (likely(!rt_overload_count))
return;
/*
*/
smp_rmb();
+ /* If we are the only overloaded CPU do nothing */
+ if (rt_overload_count == 1 &&
+ cpumask_test_cpu(this_rq->cpu, this_rq->rd->rto_mask))
+ return;
+
#ifdef HAVE_RT_PUSH_IPI
if (sched_feat(RT_PUSH_IPI)) {
tell_cpu_to_push(this_rq);
select RCU_NOCB_CPU
select VIRT_CPU_ACCOUNTING_GEN
select IRQ_WORK
+ select CPU_ISOLATION
help
Adaptively try to shutdown the tick whenever possible, even when
the CPU is running tasks. Typically this requires running a single
{
struct task_struct *rtn = current->group_leader;
- if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
- (!(rtn = find_task_by_vpid(event->sigev_notify_thread_id)) ||
- !same_thread_group(rtn, current) ||
- (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
+ switch (event->sigev_notify) {
+ case SIGEV_SIGNAL | SIGEV_THREAD_ID:
+ rtn = find_task_by_vpid(event->sigev_notify_thread_id);
+ if (!rtn || !same_thread_group(rtn, current))
+ return NULL;
+ /* FALLTHRU */
+ case SIGEV_SIGNAL:
+ case SIGEV_THREAD:
+ if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
+ return NULL;
+ /* FALLTHRU */
+ case SIGEV_NONE:
+ return task_pid(rtn);
+ default:
return NULL;
-
- if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
- ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
- return NULL;
-
- return task_pid(rtn);
+ }
}
static struct k_itimer * alloc_posix_timer(void)
struct timespec64 ts64;
bool sig_none;
- sig_none = (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE;
+ sig_none = timr->it_sigev_notify == SIGEV_NONE;
iv = timr->it_interval;
/* interval timer ? */
timr->it_interval = timespec64_to_ktime(new_setting->it_interval);
expires = timespec64_to_ktime(new_setting->it_value);
- sigev_none = (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE;
+ sigev_none = timr->it_sigev_notify == SIGEV_NONE;
kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none);
timr->it_active = !sigev_none;
ts->next_tick = 0;
}
+static inline bool local_timer_softirq_pending(void)
+{
+ return local_softirq_pending() & TIMER_SOFTIRQ;
+}
+
static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
ktime_t now, int cpu)
{
} while (read_seqretry(&jiffies_lock, seq));
ts->last_jiffies = basejiff;
- if (rcu_needs_cpu(basemono, &next_rcu) ||
- arch_needs_cpu() || irq_work_needs_cpu()) {
+ /*
+ * Keep the periodic tick, when RCU, architecture or irq_work
+ * requests it.
+ * Aside of that check whether the local timer softirq is
+ * pending. If so its a bad idea to call get_next_timer_interrupt()
+ * because there is an already expired timer, so it will request
+ * immeditate expiry, which rearms the hardware timer with a
+ * minimal delta which brings us back to this place
+ * immediately. Lather, rinse and repeat...
+ */
+ if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
+ irq_work_needs_cpu() || local_timer_softirq_pending()) {
next_tick = basemono + TICK_NSEC;
} else {
/*
return ts->sleep_length;
}
+/**
+ * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value
+ * for a particular CPU.
+ *
+ * Called from the schedutil frequency scaling governor in scheduler context.
+ */
+unsigned long tick_nohz_get_idle_calls_cpu(int cpu)
+{
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
+
+ return ts->idle_calls;
+}
+
/**
* tick_nohz_get_idle_calls - return the current idle calls counter value
*
struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu);
/*
- * If the timer is deferrable and nohz is active then we need to use
- * the deferrable base.
+ * If the timer is deferrable and NO_HZ_COMMON is set then we need
+ * to use the deferrable base.
*/
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active &&
- (tflags & TIMER_DEFERRABLE))
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
return base;
}
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
/*
- * If the timer is deferrable and nohz is active then we need to use
- * the deferrable base.
+ * If the timer is deferrable and NO_HZ_COMMON is set then we need
+ * to use the deferrable base.
*/
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active &&
- (tflags & TIMER_DEFERRABLE))
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
base = this_cpu_ptr(&timer_bases[BASE_DEF]);
return base;
}
if (!ret && (options & MOD_TIMER_PENDING_ONLY))
goto out_unlock;
- debug_activate(timer, expires);
-
new_base = get_target_base(base, timer->flags);
if (base != new_base) {
}
}
+ debug_activate(timer, expires);
+
timer->expires = expires;
/*
* If 'idx' was calculated above and the base time did not advance
base->must_forward_clk = false;
__run_timers(base);
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
+ if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
}
}
}
+int timers_prepare_cpu(unsigned int cpu)
+{
+ struct timer_base *base;
+ int b;
+
+ for (b = 0; b < NR_BASES; b++) {
+ base = per_cpu_ptr(&timer_bases[b], cpu);
+ base->clk = jiffies;
+ base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
+ base->is_idle = false;
+ base->must_forward_clk = true;
+ }
+ return 0;
+}
+
int timers_dead_cpu(unsigned int cpu)
{
struct timer_base *old_base;
bool "Enable trace events for preempt and irq disable/enable"
select TRACE_IRQFLAGS
depends on DEBUG_PREEMPT || !PROVE_LOCKING
+ depends on TRACING
default n
help
Enable tracing of disable and enable events for preemption and irqs.
.arg4_type = ARG_CONST_SIZE,
};
-static DEFINE_PER_CPU(struct perf_sample_data, bpf_sd);
+static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd);
static __always_inline u64
__bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
- u64 flags, struct perf_raw_record *raw)
+ u64 flags, struct perf_sample_data *sd)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
- struct perf_sample_data *sd = this_cpu_ptr(&bpf_sd);
unsigned int cpu = smp_processor_id();
u64 index = flags & BPF_F_INDEX_MASK;
struct bpf_event_entry *ee;
if (unlikely(event->oncpu != cpu))
return -EOPNOTSUPP;
- perf_sample_data_init(sd, 0, 0);
- sd->raw = raw;
perf_event_output(event, sd, regs);
return 0;
}
BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags, void *, data, u64, size)
{
+ struct perf_sample_data *sd = this_cpu_ptr(&bpf_trace_sd);
struct perf_raw_record raw = {
.frag = {
.size = size,
if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
return -EINVAL;
- return __bpf_perf_event_output(regs, map, flags, &raw);
+ perf_sample_data_init(sd, 0, 0);
+ sd->raw = &raw;
+
+ return __bpf_perf_event_output(regs, map, flags, sd);
}
static const struct bpf_func_proto bpf_perf_event_output_proto = {
};
static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs);
+static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd);
u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy)
{
+ struct perf_sample_data *sd = this_cpu_ptr(&bpf_misc_sd);
struct pt_regs *regs = this_cpu_ptr(&bpf_pt_regs);
struct perf_raw_frag frag = {
.copy = ctx_copy,
};
perf_fetch_caller_regs(regs);
+ perf_sample_data_init(sd, 0, 0);
+ sd->raw = &raw;
- return __bpf_perf_event_output(regs, map, flags, &raw);
+ return __bpf_perf_event_output(regs, map, flags, sd);
}
BPF_CALL_0(bpf_get_current_task)
/* Missed count stored at end */
#define RB_MISSED_STORED (1 << 30)
+#define RB_MISSED_FLAGS (RB_MISSED_EVENTS|RB_MISSED_STORED)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
*/
size_t ring_buffer_page_len(void *page)
{
- return local_read(&((struct buffer_data_page *)page)->commit)
+ struct buffer_data_page *bpage = page;
+
+ return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
+ BUF_PAGE_HDR_SIZE;
}
}
EXPORT_SYMBOL_GPL(ring_buffer_change_overwrite);
-static __always_inline void *
-__rb_data_page_index(struct buffer_data_page *bpage, unsigned index)
-{
- return bpage->data + index;
-}
-
static __always_inline void *__rb_page_index(struct buffer_page *bpage, unsigned index)
{
return bpage->page->data + index;
{
struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
struct buffer_data_page *bpage = data;
+ struct page *page = virt_to_page(bpage);
unsigned long flags;
+ /* If the page is still in use someplace else, we can't reuse it */
+ if (page_ref_count(page) > 1)
+ goto out;
+
local_irq_save(flags);
arch_spin_lock(&cpu_buffer->lock);
arch_spin_unlock(&cpu_buffer->lock);
local_irq_restore(flags);
+ out:
free_page((unsigned long)bpage);
}
EXPORT_SYMBOL_GPL(ring_buffer_free_read_page);
}
/**
- * trace_pid_filter_add_remove - Add or remove a task from a pid_list
+ * trace_pid_filter_add_remove_task - Add or remove a task from a pid_list
* @pid_list: The list to modify
* @self: The current task for fork or NULL for exit
* @task: The task to add or remove
}
/**
- * trace_snapshot - take a snapshot of the current buffer.
+ * tracing_snapshot - take a snapshot of the current buffer.
*
* This causes a swap between the snapshot buffer and the current live
* tracing buffer. You can use this to take snapshots of the live
EXPORT_SYMBOL_GPL(tracing_alloc_snapshot);
/**
- * trace_snapshot_alloc - allocate and take a snapshot of the current buffer.
+ * tracing_snapshot_alloc - allocate and take a snapshot of the current buffer.
*
- * This is similar to trace_snapshot(), but it will allocate the
+ * This is similar to tracing_snapshot(), but it will allocate the
* snapshot buffer if it isn't already allocated. Use this only
* where it is safe to sleep, as the allocation may sleep.
*
/*
* Copy the new maximum trace into the separate maximum-trace
* structure. (this way the maximum trace is permanently saved,
- * for later retrieval via /sys/kernel/debug/tracing/latency_trace)
+ * for later retrieval via /sys/kernel/tracing/tracing_max_latency)
*/
static void
__update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
entry = ring_buffer_event_data(event);
size = ring_buffer_event_length(event);
- export->write(entry, size);
+ export->write(export, entry, size);
}
static DEFINE_MUTEX(ftrace_export_lock);
.llseek = seq_lseek,
};
-/*
- * The tracer itself will not take this lock, but still we want
- * to provide a consistent cpumask to user-space:
- */
-static DEFINE_MUTEX(tracing_cpumask_update_lock);
-
-/*
- * Temporary storage for the character representation of the
- * CPU bitmask (and one more byte for the newline):
- */
-static char mask_str[NR_CPUS + 1];
-
static ssize_t
tracing_cpumask_read(struct file *filp, char __user *ubuf,
size_t count, loff_t *ppos)
{
struct trace_array *tr = file_inode(filp)->i_private;
+ char *mask_str;
int len;
- mutex_lock(&tracing_cpumask_update_lock);
+ len = snprintf(NULL, 0, "%*pb\n",
+ cpumask_pr_args(tr->tracing_cpumask)) + 1;
+ mask_str = kmalloc(len, GFP_KERNEL);
+ if (!mask_str)
+ return -ENOMEM;
- len = snprintf(mask_str, count, "%*pb\n",
+ len = snprintf(mask_str, len, "%*pb\n",
cpumask_pr_args(tr->tracing_cpumask));
if (len >= count) {
count = -EINVAL;
goto out_err;
}
- count = simple_read_from_buffer(ubuf, count, ppos, mask_str, NR_CPUS+1);
+ count = simple_read_from_buffer(ubuf, count, ppos, mask_str, len);
out_err:
- mutex_unlock(&tracing_cpumask_update_lock);
+ kfree(mask_str);
return count;
}
if (err)
goto err_unlock;
- mutex_lock(&tracing_cpumask_update_lock);
-
local_irq_disable();
arch_spin_lock(&tr->max_lock);
for_each_tracing_cpu(cpu) {
local_irq_enable();
cpumask_copy(tr->tracing_cpumask, tracing_cpumask_new);
-
- mutex_unlock(&tracing_cpumask_update_lock);
free_cpumask_var(tracing_cpumask_new);
return count;
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
- int entries, size, i;
+ int entries, i;
ssize_t ret = 0;
#ifdef CONFIG_TRACER_MAX_TRACE
break;
}
- /*
- * zero out any left over data, this is going to
- * user land.
- */
- size = ring_buffer_page_len(ref->page);
- if (size < PAGE_SIZE)
- memset(ref->page + size, 0, PAGE_SIZE - size);
-
page = virt_to_page(ref->page);
spd.pages[i] = page;
buf->data = alloc_percpu(struct trace_array_cpu);
if (!buf->data) {
ring_buffer_free(buf->buffer);
+ buf->buffer = NULL;
return -ENOMEM;
}
allocate_snapshot ? size : 1);
if (WARN_ON(ret)) {
ring_buffer_free(tr->trace_buffer.buffer);
+ tr->trace_buffer.buffer = NULL;
free_percpu(tr->trace_buffer.data);
+ tr->trace_buffer.data = NULL;
return -ENOMEM;
}
tr->allocated_snapshot = allocate_snapshot;
if (__this_cpu_read(disable_stack_tracer) != 1)
goto out;
+ /* If rcu is not watching, then save stack trace can fail */
+ if (!rcu_is_watching())
+ goto out;
+
ip += MCOUNT_INSN_SIZE;
check_stack(ip, &stack);
return retval;
}
+ groups_sort(group_info);
retval = set_current_groups(group_info);
put_group_info(group_info);
#include <linux/hardirq.h>
#include <linux/mempolicy.h>
#include <linux/freezer.h>
-#include <linux/kallsyms.h>
#include <linux/debug_locks.h>
#include <linux/lockdep.h>
#include <linux/idr.h>
#include <linux/nodemask.h>
#include <linux/moduleparam.h>
#include <linux/uaccess.h>
+#include <linux/sched/isolation.h>
#include "workqueue_internal.h"
mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
/*
- * Sanity check nr_running. Because wq_unbind_fn() releases
+ * Sanity check nr_running. Because unbind_workers() releases
* pool->lock between setting %WORKER_UNBOUND and zapping
* nr_running, the warning may trigger spuriously. Check iff
* unbind is not in progress.
* cpu comes back online.
*/
-static void wq_unbind_fn(struct work_struct *work)
+static void unbind_workers(int cpu)
{
- int cpu = smp_processor_id();
struct worker_pool *pool;
struct worker *worker;
spin_lock_irq(&pool->lock);
- /*
- * XXX: CPU hotplug notifiers are weird and can call DOWN_FAILED
- * w/o preceding DOWN_PREPARE. Work around it. CPU hotplug is
- * being reworked and this can go away in time.
- */
- if (!(pool->flags & POOL_DISASSOCIATED)) {
- spin_unlock_irq(&pool->lock);
- return;
- }
-
pool->flags &= ~POOL_DISASSOCIATED;
for_each_pool_worker(worker, pool) {
int workqueue_offline_cpu(unsigned int cpu)
{
- struct work_struct unbind_work;
struct workqueue_struct *wq;
/* unbinding per-cpu workers should happen on the local CPU */
- INIT_WORK_ONSTACK(&unbind_work, wq_unbind_fn);
- queue_work_on(cpu, system_highpri_wq, &unbind_work);
+ if (WARN_ON(cpu != smp_processor_id()))
+ return -1;
+
+ unbind_workers(cpu);
/* update NUMA affinity of unbound workqueues */
mutex_lock(&wq_pool_mutex);
wq_update_unbound_numa(wq, cpu, false);
mutex_unlock(&wq_pool_mutex);
- /* wait for per-cpu unbinding to finish */
- flush_work(&unbind_work);
- destroy_work_on_stack(&unbind_work);
return 0;
}
if (!zalloc_cpumask_var(&saved_cpumask, GFP_KERNEL))
return -ENOMEM;
+ /*
+ * Not excluding isolated cpus on purpose.
+ * If the user wishes to include them, we allow that.
+ */
cpumask_and(cpumask, cpumask, cpu_possible_mask);
if (!cpumask_empty(cpumask)) {
apply_wqattrs_lock();
WARN_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
- cpumask_copy(wq_unbound_cpumask, cpu_possible_mask);
+ cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_FLAG_DOMAIN));
pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
select DEBUG_MUTEXES
select DEBUG_RT_MUTEXES if RT_MUTEXES
select DEBUG_LOCK_ALLOC
- select LOCKDEP_CROSSRELEASE
- select LOCKDEP_COMPLETIONS
select TRACE_IRQFLAGS
default n
help
CONFIG_LOCK_STAT defines "contended" and "acquired" lock events.
(CONFIG_LOCKDEP defines "acquire" and "release" events.)
-config LOCKDEP_CROSSRELEASE
- bool
- help
- This makes lockdep work for crosslock which is a lock allowed to
- be released in a different context from the acquisition context.
- Normally a lock must be released in the context acquiring the lock.
- However, relexing this constraint helps synchronization primitives
- such as page locks or completions can use the lock correctness
- detector, lockdep.
-
-config LOCKDEP_COMPLETIONS
- bool
- help
- A deadlock caused by wait_for_completion() and complete() can be
- detected by lockdep using crossrelease feature.
-
-config BOOTPARAM_LOCKDEP_CROSSRELEASE_FULLSTACK
- bool "Enable the boot parameter, crossrelease_fullstack"
- depends on LOCKDEP_CROSSRELEASE
- default n
- help
- The lockdep "cross-release" feature needs to record stack traces
- (of calling functions) for all acquisitions, for eventual later
- use during analysis. By default only a single caller is recorded,
- because the unwind operation can be very expensive with deeper
- stack chains.
-
- However a boot parameter, crossrelease_fullstack, was
- introduced since sometimes deeper traces are required for full
- analysis. This option turns on the boot parameter.
-
config DEBUG_LOCKDEP
bool "Lock dependency engine debugging"
depends on DEBUG_KERNEL && LOCKDEP
static void zap_modalias_env(struct kobj_uevent_env *env)
{
static const char modalias_prefix[] = "MODALIAS=";
- int i;
+ size_t len;
+ int i, j;
for (i = 0; i < env->envp_idx;) {
if (strncmp(env->envp[i], modalias_prefix,
continue;
}
- if (i != env->envp_idx - 1)
- memmove(&env->envp[i], &env->envp[i + 1],
- sizeof(env->envp[i]) * env->envp_idx - 1);
+ len = strlen(env->envp[i]) + 1;
+
+ if (i != env->envp_idx - 1) {
+ memmove(env->envp[i], env->envp[i + 1],
+ env->buflen - len);
+
+ for (j = i; j < env->envp_idx - 1; j++)
+ env->envp[j] = env->envp[j + 1] - len;
+ }
env->envp_idx--;
+ env->buflen -= len;
}
}
}
EXPORT_SYMBOL(rb_replace_node);
+void rb_replace_node_cached(struct rb_node *victim, struct rb_node *new,
+ struct rb_root_cached *root)
+{
+ rb_replace_node(victim, new, &root->rb_root);
+
+ if (root->rb_leftmost == victim)
+ root->rb_leftmost = new;
+}
+EXPORT_SYMBOL(rb_replace_node_cached);
+
void rb_replace_node_rcu(struct rb_node *victim, struct rb_node *new,
struct rb_root *root)
{
return 0;
}
+static int bpf_fill_ld_abs_vlan_push_pop2(struct bpf_test *self)
+{
+ struct bpf_insn *insn;
+
+ insn = kmalloc_array(16, sizeof(*insn), GFP_KERNEL);
+ if (!insn)
+ return -ENOMEM;
+
+ /* Due to func address being non-const, we need to
+ * assemble this here.
+ */
+ insn[0] = BPF_MOV64_REG(R6, R1);
+ insn[1] = BPF_LD_ABS(BPF_B, 0);
+ insn[2] = BPF_LD_ABS(BPF_H, 0);
+ insn[3] = BPF_LD_ABS(BPF_W, 0);
+ insn[4] = BPF_MOV64_REG(R7, R6);
+ insn[5] = BPF_MOV64_IMM(R6, 0);
+ insn[6] = BPF_MOV64_REG(R1, R7);
+ insn[7] = BPF_MOV64_IMM(R2, 1);
+ insn[8] = BPF_MOV64_IMM(R3, 2);
+ insn[9] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ bpf_skb_vlan_push_proto.func - __bpf_call_base);
+ insn[10] = BPF_MOV64_REG(R6, R7);
+ insn[11] = BPF_LD_ABS(BPF_B, 0);
+ insn[12] = BPF_LD_ABS(BPF_H, 0);
+ insn[13] = BPF_LD_ABS(BPF_W, 0);
+ insn[14] = BPF_MOV64_IMM(R0, 42);
+ insn[15] = BPF_EXIT_INSN();
+
+ self->u.ptr.insns = insn;
+ self->u.ptr.len = 16;
+
+ return 0;
+}
+
static int bpf_fill_jump_around_ld_abs(struct bpf_test *self)
{
unsigned int len = BPF_MAXINSNS;
{},
{ {0x1, 0x42 } },
},
+ {
+ "LD_ABS with helper changing skb data",
+ { },
+ INTERNAL,
+ { 0x34 },
+ { { ETH_HLEN, 42 } },
+ .fill_helper = bpf_fill_ld_abs_vlan_push_pop2,
+ },
};
static struct net_device dev;
return NULL;
}
}
- /* We don't expect to fail. */
if (*err) {
- pr_cont("FAIL to attach err=%d len=%d\n",
+ pr_cont("FAIL to prog_create err=%d len=%d\n",
*err, fprog.len);
return NULL;
}
* checks.
*/
fp = bpf_prog_select_runtime(fp, err);
+ if (*err) {
+ pr_cont("FAIL to select_runtime err=%d\n", *err);
+ return NULL;
+ }
break;
}
pass_cnt++;
continue;
}
-
- return err;
+ err_cnt++;
+ continue;
}
pr_cont("jited:%u ", fp->jited);
* @head: head of timerqueue
* @node: timer node to be added
*
- * Adds the timer node to the timerqueue, sorted by the
- * node's expires value.
+ * Adds the timer node to the timerqueue, sorted by the node's expires
+ * value. Returns true if the newly added timer is the first expiring timer in
+ * the queue.
*/
bool timerqueue_add(struct timerqueue_head *head, struct timerqueue_node *node)
{
* @head: head of timerqueue
* @node: timer node to be removed
*
- * Removes the timer node from the timerqueue.
+ * Removes the timer node from the timerqueue. Returns true if the queue is
+ * not empty after the remove.
*/
bool timerqueue_del(struct timerqueue_head *head, struct timerqueue_node *node)
{
if (IS_ERR(dev))
return PTR_ERR(dev);
- if (bdi_debug_register(bdi, dev_name(dev))) {
- device_destroy(bdi_class, dev->devt);
- return -ENOMEM;
- }
cgwb_bdi_register(bdi);
bdi->dev = dev;
+ bdi_debug_register(bdi, dev_name(dev));
set_bit(WB_registered, &bdi->wb.state);
spin_lock_bh(&bdi_lock);
*/
int mapcount = PageSlab(page) ? 0 : page_mapcount(page);
- pr_emerg("page:%p count:%d mapcount:%d mapping:%p index:%#lx",
+ pr_emerg("page:%px count:%d mapcount:%d mapping:%px index:%#lx",
page, page_ref_count(page), mapcount,
page->mapping, page_to_pgoff(page));
if (PageCompound(page))
#ifdef CONFIG_MEMCG
if (page->mem_cgroup)
- pr_alert("page->mem_cgroup:%p\n", page->mem_cgroup);
+ pr_alert("page->mem_cgroup:%px\n", page->mem_cgroup);
#endif
}
void dump_vma(const struct vm_area_struct *vma)
{
- pr_emerg("vma %p start %p end %p\n"
- "next %p prev %p mm %p\n"
- "prot %lx anon_vma %p vm_ops %p\n"
- "pgoff %lx file %p private_data %p\n"
+ pr_emerg("vma %px start %px end %px\n"
+ "next %px prev %px mm %px\n"
+ "prot %lx anon_vma %px vm_ops %px\n"
+ "pgoff %lx file %px private_data %px\n"
"flags: %#lx(%pGv)\n",
vma, (void *)vma->vm_start, (void *)vma->vm_end, vma->vm_next,
vma->vm_prev, vma->vm_mm,
void dump_mm(const struct mm_struct *mm)
{
- pr_emerg("mm %p mmap %p seqnum %d task_size %lu\n"
+ pr_emerg("mm %px mmap %px seqnum %d task_size %lu\n"
#ifdef CONFIG_MMU
- "get_unmapped_area %p\n"
+ "get_unmapped_area %px\n"
#endif
"mmap_base %lu mmap_legacy_base %lu highest_vm_end %lu\n"
- "pgd %p mm_users %d mm_count %d pgtables_bytes %lu map_count %d\n"
+ "pgd %px mm_users %d mm_count %d pgtables_bytes %lu map_count %d\n"
"hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
"pinned_vm %lx data_vm %lx exec_vm %lx stack_vm %lx\n"
"start_code %lx end_code %lx start_data %lx end_data %lx\n"
"start_brk %lx brk %lx start_stack %lx\n"
"arg_start %lx arg_end %lx env_start %lx env_end %lx\n"
- "binfmt %p flags %lx core_state %p\n"
+ "binfmt %px flags %lx core_state %px\n"
#ifdef CONFIG_AIO
- "ioctx_table %p\n"
+ "ioctx_table %px\n"
#endif
#ifdef CONFIG_MEMCG
- "owner %p "
+ "owner %px "
#endif
- "exe_file %p\n"
+ "exe_file %px\n"
#ifdef CONFIG_MMU_NOTIFIER
- "mmu_notifier_mm %p\n"
+ "mmu_notifier_mm %px\n"
#endif
#ifdef CONFIG_NUMA_BALANCING
"numa_next_scan %lu numa_scan_offset %lu numa_scan_seq %d\n"
enum fixed_addresses idx;
int i, slot;
- WARN_ON(system_state != SYSTEM_BOOTING);
+ WARN_ON(system_state >= SYSTEM_RUNNING);
slot = -1;
for (i = 0; i < FIX_BTMAPS_SLOTS; i++) {
* get_user_pages_longterm() and disallow it for filesystem-dax
* mappings.
*/
- if (vma_is_fsdax(vma))
- return -EOPNOTSUPP;
+ if (vma_is_fsdax(vma)) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
vec->got_ref = true;
*/
static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
{
- return pte_access_permitted(pte, WRITE) ||
+ return pte_write(pte) ||
((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
}
if (pmd_protnone(pmd))
return hmm_vma_walk_clear(start, end, walk);
- if (!pmd_access_permitted(pmd, write_fault))
+ if (write_fault && !pmd_write(pmd))
return hmm_vma_walk_clear(start, end, walk);
pfn = pmd_pfn(pmd) + pte_index(addr);
- flag |= pmd_access_permitted(pmd, WRITE) ? HMM_PFN_WRITE : 0;
+ flag |= pmd_write(pmd) ? HMM_PFN_WRITE : 0;
for (; addr < end; addr += PAGE_SIZE, i++, pfn++)
pfns[i] = hmm_pfn_t_from_pfn(pfn) | flag;
return 0;
continue;
}
- if (!pte_access_permitted(pte, write_fault))
+ if (write_fault && !pte_write(pte))
goto fault;
pfns[i] = hmm_pfn_t_from_pfn(pte_pfn(pte)) | flag;
- pfns[i] |= pte_access_permitted(pte, WRITE) ? HMM_PFN_WRITE : 0;
+ pfns[i] |= pte_write(pte) ? HMM_PFN_WRITE : 0;
continue;
fault:
*/
WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set");
- if (!pmd_access_permitted(*pmd, flags & FOLL_WRITE))
+ if (flags & FOLL_WRITE && !pmd_write(*pmd))
return NULL;
if (pmd_present(*pmd) && pmd_devmap(*pmd))
assert_spin_locked(pud_lockptr(mm, pud));
- if (!pud_access_permitted(*pud, flags & FOLL_WRITE))
+ if (flags & FOLL_WRITE && !pud_write(*pud))
return NULL;
if (pud_present(*pud) && pud_devmap(*pud))
*/
static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
{
- return pmd_access_permitted(pmd, WRITE) ||
+ return pmd_write(pmd) ||
((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
}
if (page_count(page) == 0)
continue;
scan_block(page, page + 1, NULL);
- if (!(pfn % (MAX_SCAN_SIZE / sizeof(*page))))
+ if (!(pfn & 63))
cond_resched();
}
}
return VM_FAULT_FALLBACK;
}
-static int wp_huge_pmd(struct vm_fault *vmf, pmd_t orig_pmd)
+/* `inline' is required to avoid gcc 4.1.2 build error */
+static inline int wp_huge_pmd(struct vm_fault *vmf, pmd_t orig_pmd)
{
if (vma_is_anonymous(vmf->vma))
return do_huge_pmd_wp_page(vmf, orig_pmd);
if (unlikely(!pte_same(*vmf->pte, entry)))
goto unlock;
if (vmf->flags & FAULT_FLAG_WRITE) {
- if (!pte_access_permitted(entry, WRITE))
+ if (!pte_write(entry))
return do_wp_page(vmf);
entry = pte_mkdirty(entry);
}
/* NUMA case for anonymous PUDs would go here */
- if (dirty && !pud_access_permitted(orig_pud, WRITE)) {
+ if (dirty && !pud_write(orig_pud)) {
ret = wp_huge_pud(&vmf, orig_pud);
if (!(ret & VM_FAULT_FALLBACK))
return ret;
if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
return do_huge_pmd_numa_page(&vmf, orig_pmd);
- if (dirty && !pmd_access_permitted(orig_pmd, WRITE)) {
+ if (dirty && !pmd_write(orig_pmd)) {
ret = wp_huge_pmd(&vmf, orig_pmd);
if (!(ret & VM_FAULT_FALLBACK))
return ret;
goto out;
pte = *ptep;
- if (!pte_access_permitted(pte, flags & FOLL_WRITE))
+ if ((flags & FOLL_WRITE) && !pte_write(pte))
goto unlock;
*prot = pgprot_val(pte_pgprot(pte));
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
- set_bit(MMF_OOM_SKIP, &mm->flags);
- if (unlikely(tsk_is_oom_victim(current))) {
+ if (unlikely(mm_is_oom_victim(mm))) {
/*
* Wait for oom_reap_task() to stop working on this
* mm. Because MMF_OOM_SKIP is already set before
* calling down_read(), oom_reap_task() will not run
* on this "mm" post up_write().
*
- * tsk_is_oom_victim() cannot be set from under us
- * either because current->mm is already set to NULL
+ * mm_is_oom_victim() cannot be set from under us
+ * either because victim->mm is already set to NULL
* under task_lock before calling mmput and oom_mm is
- * set not NULL by the OOM killer only if current->mm
+ * set not NULL by the OOM killer only if victim->mm
* is found not NULL while holding the task_lock.
*/
+ set_bit(MMF_OOM_SKIP, &mm->flags);
down_write(&mm->mmap_sem);
up_write(&mm->mmap_sem);
}
next = pmd_addr_end(addr, end);
if (!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)
&& pmd_none_or_clear_bad(pmd))
- continue;
+ goto next;
/* invoke the mmu notifier if the pmd is populated */
if (!mni_start) {
}
/* huge pmd was handled */
- continue;
+ goto next;
}
}
/* fall through, the trans huge pmd just split */
this_pages = change_pte_range(vma, pmd, addr, next, newprot,
dirty_accountable, prot_numa);
pages += this_pages;
+next:
+ cond_resched();
} while (pmd++, addr = next, addr != end);
if (mni_start)
return;
/* oom_mm is bound to the signal struct life time. */
- if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
+ if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
mmgrab(tsk->signal->oom_mm);
+ set_bit(MMF_OOM_VICTIM, &mm->flags);
+ }
/*
* Make sure that the task is woken up from uninterruptible sleep
{
struct page *page, *next;
unsigned long flags, pfn;
+ int batch_count = 0;
/* Prepare pages for freeing */
list_for_each_entry_safe(page, next, list, lru) {
set_page_private(page, 0);
trace_mm_page_free_batched(page);
free_unref_page_commit(page, pfn);
+
+ /*
+ * Guard against excessive IRQ disabled times when we get
+ * a large list of pages to free.
+ */
+ if (++batch_count == SWAP_CLUSTER_MAX) {
+ local_irq_restore(flags);
+ batch_count = 0;
+ local_irq_save(flags);
+ }
}
local_irq_restore(flags);
}
pgcnt = 0;
for_each_resv_unavail_range(i, &start, &end) {
for (pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++) {
+ if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
+ continue;
mm_zero_struct_page(pfn_to_page(pfn));
pgcnt++;
}
if (pcpu_setup_first_chunk(ai, fc) < 0)
panic("Failed to initialize percpu areas.");
+#ifdef CONFIG_CRIS
+#warning "the CRIS architecture has physical and virtual addresses confused"
+#else
pcpu_free_alloc_info(ai);
+#endif
}
#endif /* CONFIG_SMP */
*dbg_redzone2(cachep, objp));
}
- if (cachep->flags & SLAB_STORE_USER) {
- pr_err("Last user: [<%p>](%pSR)\n",
- *dbg_userword(cachep, objp),
- *dbg_userword(cachep, objp));
- }
+ if (cachep->flags & SLAB_STORE_USER)
+ pr_err("Last user: (%pSR)\n", *dbg_userword(cachep, objp));
realobj = (char *)objp + obj_offset(cachep);
size = cachep->object_size;
for (i = 0; i < size && lines; i += 16, lines--) {
/* Mismatch ! */
/* Print header */
if (lines == 0) {
- pr_err("Slab corruption (%s): %s start=%p, len=%d\n",
+ pr_err("Slab corruption (%s): %s start=%px, len=%d\n",
print_tainted(), cachep->name,
realobj, size);
print_objinfo(cachep, objp, 0);
if (objnr) {
objp = index_to_obj(cachep, page, objnr - 1);
realobj = (char *)objp + obj_offset(cachep);
- pr_err("Prev obj: start=%p, len=%d\n", realobj, size);
+ pr_err("Prev obj: start=%px, len=%d\n", realobj, size);
print_objinfo(cachep, objp, 2);
}
if (objnr + 1 < cachep->num) {
objp = index_to_obj(cachep, page, objnr + 1);
realobj = (char *)objp + obj_offset(cachep);
- pr_err("Next obj: start=%p, len=%d\n", realobj, size);
+ pr_err("Next obj: start=%px, len=%d\n", realobj, size);
print_objinfo(cachep, objp, 2);
}
}
/* Verify double free bug */
for (i = page->active; i < cachep->num; i++) {
if (get_free_obj(page, i) == objnr) {
- pr_err("slab: double free detected in cache '%s', objp %p\n",
+ pr_err("slab: double free detected in cache '%s', objp %px\n",
cachep->name, objp);
BUG();
}
else
slab_error(cache, "memory outside object was overwritten");
- pr_err("%p: redzone 1:0x%llx, redzone 2:0x%llx\n",
+ pr_err("%px: redzone 1:0x%llx, redzone 2:0x%llx\n",
obj, redzone1, redzone2);
}
if (*dbg_redzone1(cachep, objp) != RED_INACTIVE ||
*dbg_redzone2(cachep, objp) != RED_INACTIVE) {
slab_error(cachep, "double free, or memory outside object was overwritten");
- pr_err("%p: redzone 1:0x%llx, redzone 2:0x%llx\n",
+ pr_err("%px: redzone 1:0x%llx, redzone 2:0x%llx\n",
objp, *dbg_redzone1(cachep, objp),
*dbg_redzone2(cachep, objp));
}
cachep->ctor(objp);
if (ARCH_SLAB_MINALIGN &&
((unsigned long)objp & (ARCH_SLAB_MINALIGN-1))) {
- pr_err("0x%p: not aligned to ARCH_SLAB_MINALIGN=%d\n",
+ pr_err("0x%px: not aligned to ARCH_SLAB_MINALIGN=%d\n",
objp, (int)ARCH_SLAB_MINALIGN);
}
return objp;
return;
}
#endif
- seq_printf(m, "%p", (void *)address);
+ seq_printf(m, "%px", (void *)address);
}
static int leaks_show(struct seq_file *m, void *p)
if (unlikely(!mem_section)) {
unsigned long size, align;
- size = sizeof(struct mem_section) * NR_SECTION_ROOTS;
+ size = sizeof(struct mem_section*) * NR_SECTION_ROOTS;
align = 1 << (INTERNODE_CACHE_SHIFT);
mem_section = memblock_virt_alloc(size, align);
}
*/
void unregister_shrinker(struct shrinker *shrinker)
{
+ if (!shrinker->nr_deferred)
+ return;
down_write(&shrinker_rwsem);
list_del(&shrinker->list);
up_write(&shrinker_rwsem);
kfree(shrinker->nr_deferred);
+ shrinker->nr_deferred = NULL;
}
EXPORT_SYMBOL(unregister_shrinker);
#include <linux/mount.h>
#include <linux/migrate.h>
#include <linux/pagemap.h>
+#include <linux/fs.h>
#define ZSPAGE_MAGIC 0x58
vlan_gvrp_uninit_applicant(real_dev);
}
- /* Take it out of our own structures, but be sure to interlock with
- * HW accelerating devices or SW vlan input packet processing if
- * VLAN is not 0 (leave it there for 802.1p).
- */
- if (vlan_id)
- vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
+ vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
/* Get rid of the vlan's reference to real_dev */
dev_put(real_dev);
orig_node->last_seen = jiffies;
/* find packet count of corresponding one hop neighbor */
- spin_lock_bh(&orig_node->bat_iv.ogm_cnt_lock);
+ spin_lock_bh(&orig_neigh_node->bat_iv.ogm_cnt_lock);
if_num = if_incoming->if_num;
orig_eq_count = orig_neigh_node->bat_iv.bcast_own_sum[if_num];
neigh_ifinfo = batadv_neigh_ifinfo_new(neigh_node, if_outgoing);
} else {
neigh_rq_count = 0;
}
- spin_unlock_bh(&orig_node->bat_iv.ogm_cnt_lock);
+ spin_unlock_bh(&orig_neigh_node->bat_iv.ogm_cnt_lock);
/* pay attention to not get a value bigger than 100 % */
if (orig_eq_count > neigh_rq_count)
}
orig_gw = batadv_gw_node_get(bat_priv, orig_node);
- if (!orig_node)
+ if (!orig_gw)
goto out;
if (batadv_v_gw_throughput_get(orig_gw, &orig_throughput) < 0)
*/
if (skb->priority >= 256 && skb->priority <= 263)
frag_header.priority = skb->priority - 256;
+ else
+ frag_header.priority = 0;
ether_addr_copy(frag_header.orig, primary_if->net_dev->dev_addr);
ether_addr_copy(frag_header.dest, orig_node->orig);
/**
* batadv_tp_sender_timeout - timer that fires in case of packet loss
- * @arg: address of the related tp_vars
+ * @t: address to timer_list inside tp_vars
*
* If fired it means that there was packet loss.
* Switch to Slow Start, set the ss_threshold to half of the current cwnd and
/**
* batadv_tp_receiver_shutdown - stop a tp meter receiver when timeout is
* reached without received ack
- * @arg: address of the related tp_vars
+ * @t: address to timer_list inside tp_vars
*/
static void batadv_tp_receiver_shutdown(struct timer_list *t)
{
struct net_bridge *br = netdev_priv(dev);
int err;
+ err = register_netdevice(dev);
+ if (err)
+ return err;
+
if (tb[IFLA_ADDRESS]) {
spin_lock_bh(&br->lock);
br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS]));
spin_unlock_bh(&br->lock);
}
- err = register_netdevice(dev);
- if (err)
- return err;
-
err = br_changelink(dev, tb, data, extack);
if (err)
- unregister_netdevice(dev);
+ br_dev_delete(dev, NULL);
+
return err;
}
mutex_lock(&caifdevs->lock);
list_add_rcu(&caifd->list, &caifdevs->list);
- strncpy(caifd->layer.name, dev->name,
- sizeof(caifd->layer.name) - 1);
- caifd->layer.name[sizeof(caifd->layer.name) - 1] = 0;
+ strlcpy(caifd->layer.name, dev->name,
+ sizeof(caifd->layer.name));
caifd->layer.transmit = transmit;
cfcnfg_add_phy_layer(cfg,
dev,
dev_add_pack(&caif_usb_type);
pack_added = true;
- strncpy(layer->name, dev->name,
- sizeof(layer->name) - 1);
- layer->name[sizeof(layer->name) - 1] = 0;
+ strlcpy(layer->name, dev->name, sizeof(layer->name));
return 0;
}
case CAIFPROTO_RFM:
l->linktype = CFCTRL_SRV_RFM;
l->u.datagram.connid = s->sockaddr.u.rfm.connection_id;
- strncpy(l->u.rfm.volume, s->sockaddr.u.rfm.volume,
- sizeof(l->u.rfm.volume)-1);
- l->u.rfm.volume[sizeof(l->u.rfm.volume)-1] = 0;
+ strlcpy(l->u.rfm.volume, s->sockaddr.u.rfm.volume,
+ sizeof(l->u.rfm.volume));
break;
case CAIFPROTO_UTIL:
l->linktype = CFCTRL_SRV_UTIL;
l->endpoint = 0x00;
l->chtype = 0x00;
- strncpy(l->u.utility.name, s->sockaddr.u.util.service,
- sizeof(l->u.utility.name)-1);
- l->u.utility.name[sizeof(l->u.utility.name)-1] = 0;
+ strlcpy(l->u.utility.name, s->sockaddr.u.util.service,
+ sizeof(l->u.utility.name));
caif_assert(sizeof(l->u.utility.name) > 10);
l->u.utility.paramlen = s->param.size;
if (l->u.utility.paramlen > sizeof(l->u.utility.params))
tmp16 = cpu_to_le16(param->u.utility.fifosize_bufs);
cfpkt_add_body(pkt, &tmp16, 2);
memset(utility_name, 0, sizeof(utility_name));
- strncpy(utility_name, param->u.utility.name,
- UTILITY_NAME_LENGTH - 1);
+ strlcpy(utility_name, param->u.utility.name,
+ UTILITY_NAME_LENGTH);
cfpkt_add_body(pkt, utility_name, UTILITY_NAME_LENGTH);
tmp8 = param->u.utility.paramlen;
cfpkt_add_body(pkt, &tmp8, 1);
int dev_get_valid_name(struct net *net, struct net_device *dev,
const char *name)
{
- return dev_alloc_name_ns(net, dev, name);
+ BUG_ON(!net);
+
+ if (!dev_valid_name(name))
+ return -EINVAL;
+
+ if (strchr(name, '%'))
+ return dev_alloc_name_ns(net, dev, name);
+ else if (__dev_get_by_name(net, name))
+ return -EEXIST;
+ else if (dev->name != name)
+ strlcpy(dev->name, name, IFNAMSIZ);
+
+ return 0;
}
EXPORT_SYMBOL(dev_get_valid_name);
hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0,
troom > 0 ? troom + 128 : 0, GFP_ATOMIC))
goto do_drop;
- if (troom > 0 && __skb_linearize(skb))
+ if (skb_linearize(skb))
goto do_drop;
}
return dev->ethtool_ops->set_link_ksettings(dev, &link_ksettings);
}
-static void
-warn_incomplete_ethtool_legacy_settings_conversion(const char *details)
-{
- char name[sizeof(current->comm)];
-
- pr_info_once("warning: `%s' uses legacy ethtool link settings API, %s\n",
- get_task_comm(name, current), details);
-}
-
/* Query device for its ethtool_cmd settings.
*
* Backward compatibility note: for compatibility with legacy ethtool,
&link_ksettings);
if (err < 0)
return err;
- if (!convert_link_ksettings_to_legacy_settings(&cmd,
- &link_ksettings))
- warn_incomplete_ethtool_legacy_settings_conversion(
- "link modes are only partially reported");
+ convert_link_ksettings_to_legacy_settings(&cmd,
+ &link_ksettings);
/* send a sensible cmd tag back to user */
cmd.cmd = ETHTOOL_GSET;
*/
goto out_err_free;
- /* We are guaranteed to never error here with cBPF to eBPF
- * transitions, since there's no issue with type compatibility
- * checks on program arrays.
- */
fp = bpf_prog_select_runtime(fp, &err);
+ if (err)
+ goto out_err_free;
kfree(old_prog);
return fp;
spin_lock_bh(&net->nsid_lock);
peer = idr_find(&net->netns_ids, id);
if (peer)
- get_net(peer);
+ peer = maybe_get_net(peer);
spin_unlock_bh(&net->nsid_lock);
rcu_read_unlock();
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/types.h>
-#include <linux/module.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/skbuff.h>
return false;
}
-static struct net *get_target_net(struct sk_buff *skb, int netnsid)
+static struct net *get_target_net(struct sock *sk, int netnsid)
{
struct net *net;
- net = get_net_ns_by_id(sock_net(skb->sk), netnsid);
+ net = get_net_ns_by_id(sock_net(sk), netnsid);
if (!net)
return ERR_PTR(-EINVAL);
/* For now, the caller is required to have CAP_NET_ADMIN in
* the user namespace owning the target net ns.
*/
- if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
+ if (!sk_ns_capable(sk, net->user_ns, CAP_NET_ADMIN)) {
put_net(net);
return ERR_PTR(-EACCES);
}
ifla_policy, NULL) >= 0) {
if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
- tgt_net = get_target_net(skb, netnsid);
+ tgt_net = get_target_net(skb->sk, netnsid);
if (IS_ERR(tgt_net)) {
tgt_net = net;
netnsid = -1;
if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
- tgt_net = get_target_net(skb, netnsid);
+ tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid);
if (IS_ERR(tgt_net))
return PTR_ERR(tgt_net);
}
int i, new_frags;
u32 d_off;
- if (!num_frags)
- return 0;
-
if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
return -EINVAL;
+ if (!num_frags)
+ goto release;
+
new_frags = (__skb_pagelen(skb) + PAGE_SIZE - 1) >> PAGE_SHIFT;
for (i = 0; i < new_frags; i++) {
page = alloc_page(gfp_mask);
__skb_fill_page_desc(skb, new_frags - 1, head, 0, d_off);
skb_shinfo(skb)->nr_frags = new_frags;
+release:
skb_zcopy_clear(skb, false);
return 0;
}
skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
SKBTX_SHARED_FRAG;
- if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
- goto err;
while (pos < offset + len) {
if (i >= nfrags) {
if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
goto err;
+ if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+ goto err;
*nskb_frag = *frag;
__skb_frag_ref(nskb_frag);
struct sock *sk = skb->sk;
if (!skb_may_tx_timestamp(sk, false))
- return;
+ goto err;
/* Take a reference to prevent skb_orphan() from freeing the socket,
* but only if the socket refcount is not zero.
*skb_hwtstamps(skb) = *hwtstamps;
__skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND, false);
sock_put(sk);
+ return;
}
+
+err:
+ kfree_skb(skb);
}
EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
case SKNLGRP_INET6_UDP_DESTROY:
if (!sock_diag_handlers[AF_INET6])
request_module("net-pf-%d-proto-%d-type-%d", PF_NETLINK,
- NETLINK_SOCK_DIAG, AF_INET);
+ NETLINK_SOCK_DIAG, AF_INET6);
break;
}
return 0;
.data = &bpf_jit_enable,
.maxlen = sizeof(int),
.mode = 0644,
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
.proc_handler = proc_dointvec
+#else
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &one,
+ .extra2 = &one,
+#endif
},
# ifdef CONFIG_HAVE_EBPF_JIT
{
#include <linux/of_net.h>
#include <linux/of_mdio.h>
#include <linux/mdio.h>
-#include <linux/list.h>
#include <net/rtnetlink.h>
#include <net/pkt_cls.h>
#include <net/tc_act/tc_mirred.h>
static bool inetdev_valid_mtu(unsigned int mtu)
{
- return mtu >= 68;
+ return mtu >= IPV4_MIN_MTU;
}
static void inetdev_send_gratuitous_arp(struct net_device *dev,
static void ip_fib_net_exit(struct net *net)
{
- unsigned int i;
+ int i;
rtnl_lock();
#ifdef CONFIG_IP_MULTIPLE_TABLES
RCU_INIT_POINTER(net->ipv4.fib_main, NULL);
RCU_INIT_POINTER(net->ipv4.fib_default, NULL);
#endif
- for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
+ /* Destroy the tables in reverse order to guarantee that the
+ * local table, ID 255, is destroyed before the main table, ID
+ * 254. This is necessary as the local table may contain
+ * references to data contained in the main table.
+ */
+ for (i = FIB_TABLE_HASHSZ - 1; i >= 0; i--) {
struct hlist_head *head = &net->ipv4.fib_table_hash[i];
struct hlist_node *tmp;
struct fib_table *tb;
nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
int type = nla_type(nla);
- u32 val;
+ u32 fi_val, val;
if (!type)
continue;
val = nla_get_u32(nla);
}
- if (fi->fib_metrics->metrics[type - 1] != val)
+ fi_val = fi->fib_metrics->metrics[type - 1];
+ if (type == RTAX_FEATURES)
+ fi_val &= ~DST_FEATURE_ECN_CA;
+
+ if (fi_val != val)
return false;
}
#include <linux/rtnetlink.h>
#include <linux/times.h>
#include <linux/pkt_sched.h>
+#include <linux/byteorder/generic.h>
#include <net/net_namespace.h>
#include <net/arp.h>
return scount;
}
+/* source address selection per RFC 3376 section 4.2.13 */
+static __be32 igmpv3_get_srcaddr(struct net_device *dev,
+ const struct flowi4 *fl4)
+{
+ struct in_device *in_dev = __in_dev_get_rcu(dev);
+
+ if (!in_dev)
+ return htonl(INADDR_ANY);
+
+ for_ifa(in_dev) {
+ if (inet_ifa_match(fl4->saddr, ifa))
+ return fl4->saddr;
+ } endfor_ifa(in_dev);
+
+ return htonl(INADDR_ANY);
+}
+
static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu)
{
struct sk_buff *skb;
pip->frag_off = htons(IP_DF);
pip->ttl = 1;
pip->daddr = fl4.daddr;
- pip->saddr = fl4.saddr;
+ pip->saddr = igmpv3_get_srcaddr(dev, &fl4);
pip->protocol = IPPROTO_IGMP;
pip->tot_len = 0; /* filled in later */
ip_select_ident(net, skb, NULL);
}
static struct sk_buff *add_grhead(struct sk_buff *skb, struct ip_mc_list *pmc,
- int type, struct igmpv3_grec **ppgr)
+ int type, struct igmpv3_grec **ppgr, unsigned int mtu)
{
struct net_device *dev = pmc->interface->dev;
struct igmpv3_report *pih;
struct igmpv3_grec *pgr;
- if (!skb)
- skb = igmpv3_newpack(dev, dev->mtu);
- if (!skb)
- return NULL;
+ if (!skb) {
+ skb = igmpv3_newpack(dev, mtu);
+ if (!skb)
+ return NULL;
+ }
pgr = skb_put(skb, sizeof(struct igmpv3_grec));
pgr->grec_type = type;
pgr->grec_auxwords = 0;
struct igmpv3_grec *pgr = NULL;
struct ip_sf_list *psf, *psf_next, *psf_prev, **psf_list;
int scount, stotal, first, isquery, truncate;
+ unsigned int mtu;
if (pmc->multiaddr == IGMP_ALL_HOSTS)
return skb;
if (ipv4_is_local_multicast(pmc->multiaddr) && !net->ipv4.sysctl_igmp_llm_reports)
return skb;
+ mtu = READ_ONCE(dev->mtu);
+ if (mtu < IPV4_MIN_MTU)
+ return skb;
+
isquery = type == IGMPV3_MODE_IS_INCLUDE ||
type == IGMPV3_MODE_IS_EXCLUDE;
truncate = type == IGMPV3_MODE_IS_EXCLUDE ||
AVAILABLE(skb) < grec_size(pmc, type, gdeleted, sdeleted)) {
if (skb)
igmpv3_sendpack(skb);
- skb = igmpv3_newpack(dev, dev->mtu);
+ skb = igmpv3_newpack(dev, mtu);
}
}
first = 1;
pgr->grec_nsrcs = htons(scount);
if (skb)
igmpv3_sendpack(skb);
- skb = igmpv3_newpack(dev, dev->mtu);
+ skb = igmpv3_newpack(dev, mtu);
first = 1;
scount = 0;
}
if (first) {
- skb = add_grhead(skb, pmc, type, &pgr);
+ skb = add_grhead(skb, pmc, type, &pgr, mtu);
first = 0;
}
if (!skb)
igmpv3_sendpack(skb);
skb = NULL; /* add_grhead will get a new one */
}
- skb = add_grhead(skb, pmc, type, &pgr);
+ skb = add_grhead(skb, pmc, type, &pgr, mtu);
}
}
if (pgr)
len = gre_hdr_len + sizeof(*ershdr);
if (unlikely(!pskb_may_pull(skb, len)))
- return -ENOMEM;
+ return PACKET_REJECT;
iph = ip_hdr(skb);
ershdr = (struct erspanhdr *)(skb->data + gre_hdr_len);
static void ipgre_tap_setup(struct net_device *dev)
{
ether_setup(dev);
+ dev->max_mtu = 0;
dev->netdev_ops = &gre_tap_netdev_ops;
dev->priv_flags &= ~IFF_TX_SKB_SHARING;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
dev->needed_headroom = t_hlen + hlen;
mtu -= (dev->hard_header_len + t_hlen);
- if (mtu < 68)
- mtu = 68;
+ if (mtu < IPV4_MIN_MTU)
+ mtu = IPV4_MIN_MTU;
return mtu;
}
if (!xt_find_jump_offset(offsets, newpos,
newinfo->number))
return 0;
- e = entry0 + newpos;
} else {
/* ... this is a fallthru */
newpos = pos + e->next_offset;
if (!xt_find_jump_offset(offsets, newpos,
newinfo->number))
return 0;
- e = entry0 + newpos;
} else {
/* ... this is a fallthru */
newpos = pos + e->next_offset;
static void clusterip_net_exit(struct net *net)
{
-#ifdef CONFIG_PROC_FS
struct clusterip_net *cn = net_generic(net, clusterip_net_id);
+#ifdef CONFIG_PROC_FS
proc_remove(cn->procdir);
cn->procdir = NULL;
#endif
nf_unregister_net_hook(net, &cip_arp_ops);
+ WARN_ON_ONCE(!list_empty(&cn->configs));
}
static struct pernet_operations clusterip_net_ops = {
int err;
struct ip_options_data opt_copy;
struct raw_frag_vec rfv;
+ int hdrincl;
err = -EMSGSIZE;
if (len > 0xFFFF)
goto out;
+ /* hdrincl should be READ_ONCE(inet->hdrincl)
+ * but READ_ONCE() doesn't work with bit fields.
+ * Doing this indirectly yields the same result.
+ */
+ hdrincl = inet->hdrincl;
+ hdrincl = READ_ONCE(hdrincl);
/*
* Check the flags.
*/
/* Linux does not mangle headers on raw sockets,
* so that IP options + IP_HDRINCL is non-sense.
*/
- if (inet->hdrincl)
+ if (hdrincl)
goto done;
if (ipc.opt->opt.srr) {
if (!daddr)
flowi4_init_output(&fl4, ipc.oif, sk->sk_mark, tos,
RT_SCOPE_UNIVERSE,
- inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol,
+ hdrincl ? IPPROTO_RAW : sk->sk_protocol,
inet_sk_flowi_flags(sk) |
- (inet->hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
+ (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0, sk->sk_uid);
- if (!inet->hdrincl) {
+ if (!hdrincl) {
rfv.msg = msg;
rfv.hlen = 0;
goto do_confirm;
back_from_confirm:
- if (inet->hdrincl)
+ if (hdrincl)
err = raw_send_hdrinc(sk, &fl4, msg, len,
&rt, msg->msg_flags, &ipc.sockc);
u32 new_sample = tp->rcv_rtt_est.rtt_us;
long m = sample;
- if (m == 0)
- m = 1;
-
if (new_sample != 0) {
/* If we sample in larger samples in the non-timestamp
* case, we could grossly overestimate the RTT especially
if (before(tp->rcv_nxt, tp->rcv_rtt_est.seq))
return;
delta_us = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcv_rtt_est.time);
+ if (!delta_us)
+ delta_us = 1;
tcp_rcv_rtt_update(tp, delta_us, 1);
new_measure:
(TCP_SKB_CB(skb)->end_seq -
TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss)) {
u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
- u32 delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
+ u32 delta_us;
+ if (!delta)
+ delta = 1;
+ delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
tcp_rcv_rtt_update(tp, delta_us, 0);
}
}
tcp_time_stamp_raw() + tcp_rsk(req)->ts_off,
req->ts_recent,
0,
- tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->daddr,
+ tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->saddr,
AF_INET),
inet_rsk(req)->no_srccheck ? IP_REPLY_ARG_NOSRCCHECK : 0,
ip_hdr(skb)->tos);
icsk->icsk_ack.pingpong = 0;
icsk->icsk_ack.ato = TCP_ATO_MIN;
}
+ tcp_mstamp_refresh(tcp_sk(sk));
tcp_send_ack(sk);
__NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKS);
}
goto out;
}
+ tcp_mstamp_refresh(tp);
if (sk->sk_state == TCP_FIN_WAIT2 && sock_flag(sk, SOCK_DEAD)) {
if (tp->linger2 >= 0) {
const int tmo = tcp_fin_time(sk) - TCP_TIMEWAIT_LEN;
return xfrm4_extract_header(skb);
}
+static int xfrm4_rcv_encap_finish2(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+{
+ return dst_input(skb);
+}
+
static inline int xfrm4_rcv_encap_finish(struct net *net, struct sock *sk,
struct sk_buff *skb)
{
iph->tos, skb->dev))
goto drop;
}
- return dst_input(skb);
+
+ if (xfrm_trans_queue(skb, xfrm4_rcv_encap_finish2))
+ goto drop;
+
+ return 0;
drop:
kfree_skb(skb);
return NET_RX_DROP;
np->mcast_hops = IPV6_DEFAULT_MCASTHOPS;
np->mc_loop = 1;
np->pmtudisc = IPV6_PMTUDISC_WANT;
- np->autoflowlabel = ip6_default_np_autolabel(net);
np->repflow = net->ipv6.sysctl.flowlabel_reflect;
sk->sk_ipv6only = net->ipv6.sysctl.bindv6only;
sr_phdr->segments[0] = **addr_p;
*addr_p = &sr_ihdr->segments[sr_ihdr->segments_left];
+ if (sr_ihdr->hdrlen > hops * 2) {
+ int tlvs_offset, tlvs_length;
+
+ tlvs_offset = (1 + hops * 2) << 3;
+ tlvs_length = (sr_ihdr->hdrlen - hops * 2) << 3;
+ memcpy((char *)sr_phdr + tlvs_offset,
+ (char *)sr_ihdr + tlvs_offset, tlvs_length);
+ }
+
#ifdef CONFIG_IPV6_SEG6_HMAC
if (sr_has_hmac(sr_phdr)) {
struct net *net = NULL;
if (!(fn->fn_flags & RTN_RTINFO)) {
RCU_INIT_POINTER(fn->leaf, NULL);
rt6_release(leaf);
+ /* remove null_entry in the root node */
+ } else if (fn->fn_flags & RTN_TL_ROOT &&
+ rcu_access_pointer(fn->leaf) ==
+ net->ipv6.ip6_null_entry) {
+ RCU_INIT_POINTER(fn->leaf, NULL);
}
return fn;
* If fib6_add_1 has cleared the old leaf pointer in the
* super-tree leaf node we have to find a new one for it.
*/
- struct rt6_info *pn_leaf = rcu_dereference_protected(pn->leaf,
- lockdep_is_held(&table->tb6_lock));
- if (pn != fn && pn_leaf == rt) {
- pn_leaf = NULL;
- RCU_INIT_POINTER(pn->leaf, NULL);
- atomic_dec(&rt->rt6i_ref);
- }
- if (pn != fn && !pn_leaf && !(pn->fn_flags & RTN_RTINFO)) {
- pn_leaf = fib6_find_prefix(info->nl_net, table, pn);
-#if RT6_DEBUG >= 2
- if (!pn_leaf) {
- WARN_ON(!pn_leaf);
- pn_leaf = info->nl_net->ipv6.ip6_null_entry;
+ if (pn != fn) {
+ struct rt6_info *pn_leaf =
+ rcu_dereference_protected(pn->leaf,
+ lockdep_is_held(&table->tb6_lock));
+ if (pn_leaf == rt) {
+ pn_leaf = NULL;
+ RCU_INIT_POINTER(pn->leaf, NULL);
+ atomic_dec(&rt->rt6i_ref);
}
+ if (!pn_leaf && !(pn->fn_flags & RTN_RTINFO)) {
+ pn_leaf = fib6_find_prefix(info->nl_net, table,
+ pn);
+#if RT6_DEBUG >= 2
+ if (!pn_leaf) {
+ WARN_ON(!pn_leaf);
+ pn_leaf =
+ info->nl_net->ipv6.ip6_null_entry;
+ }
#endif
- atomic_inc(&pn_leaf->rt6i_ref);
- rcu_assign_pointer(pn->leaf, pn_leaf);
+ atomic_inc(&pn_leaf->rt6i_ref);
+ rcu_assign_pointer(pn->leaf, pn_leaf);
+ }
}
#endif
goto failure;
return err;
failure:
- /* fn->leaf could be NULL if fn is an intermediate node and we
- * failed to add the new route to it in both subtree creation
- * failure and fib6_add_rt2node() failure case.
- * In both cases, fib6_repair_tree() should be called to fix
- * fn->leaf.
+ /* fn->leaf could be NULL and fib6_repair_tree() needs to be called if:
+ * 1. fn is an intermediate node and we failed to add the new
+ * route to it in both subtree creation failure and fib6_add_rt2node()
+ * failure case.
+ * 2. fn is the root node in the table and we fail to add the first
+ * default route to it.
*/
- if (fn && !(fn->fn_flags & (RTN_RTINFO|RTN_ROOT)))
+ if (fn &&
+ (!(fn->fn_flags & (RTN_RTINFO|RTN_ROOT)) ||
+ (fn->fn_flags & RTN_TL_ROOT &&
+ !rcu_access_pointer(fn->leaf))))
fib6_repair_tree(info->nl_net, table, fn);
/* Always release dst as dst->__refcnt is guaranteed
* to be taken before entering this function
struct fib6_walker *w;
int iter = 0;
+ /* Set fn->leaf to null_entry for root node. */
+ if (fn->fn_flags & RTN_TL_ROOT) {
+ rcu_assign_pointer(fn->leaf, net->ipv6.ip6_null_entry);
+ return fn;
+ }
+
for (;;) {
struct fib6_node *fn_r = rcu_dereference_protected(fn->right,
lockdep_is_held(&table->tb6_lock));
}
read_unlock(&net->ipv6.fib6_walker_lock);
- /* If it was last route, expunge its radix tree node */
+ /* If it was last route, call fib6_repair_tree() to:
+ * 1. For root node, put back null_entry as how the table was created.
+ * 2. For other nodes, expunge its radix tree node.
+ */
if (!rcu_access_pointer(fn->leaf)) {
- fn->fn_flags &= ~RTN_RTINFO;
- net->ipv6.rt6_stats->fib_route_nodes--;
+ if (!(fn->fn_flags & RTN_TL_ROOT)) {
+ fn->fn_flags &= ~RTN_RTINFO;
+ net->ipv6.rt6_stats->fib_route_nodes--;
+ }
fn = fib6_repair_tree(net, table, fn);
}
eth_random_addr(dev->perm_addr);
}
+#define GRE6_FEATURES (NETIF_F_SG | \
+ NETIF_F_FRAGLIST | \
+ NETIF_F_HIGHDMA | \
+ NETIF_F_HW_CSUM)
+
+static void ip6gre_tnl_init_features(struct net_device *dev)
+{
+ struct ip6_tnl *nt = netdev_priv(dev);
+
+ dev->features |= GRE6_FEATURES;
+ dev->hw_features |= GRE6_FEATURES;
+
+ if (!(nt->parms.o_flags & TUNNEL_SEQ)) {
+ /* TCP offload with GRE SEQ is not supported, nor
+ * can we support 2 levels of outer headers requiring
+ * an update.
+ */
+ if (!(nt->parms.o_flags & TUNNEL_CSUM) ||
+ nt->encap.type == TUNNEL_ENCAP_NONE) {
+ dev->features |= NETIF_F_GSO_SOFTWARE;
+ dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+ }
+
+ /* Can use a lockless transmit, unless we generate
+ * output sequences
+ */
+ dev->features |= NETIF_F_LLTX;
+ }
+}
+
static int ip6gre_tunnel_init_common(struct net_device *dev)
{
struct ip6_tnl *tunnel;
if (!(tunnel->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
dev->mtu -= 8;
+ ip6gre_tnl_init_features(dev);
+
return 0;
}
.ndo_get_iflink = ip6_tnl_get_iflink,
};
-#define GRE6_FEATURES (NETIF_F_SG | \
- NETIF_F_FRAGLIST | \
- NETIF_F_HIGHDMA | \
- NETIF_F_HW_CSUM)
-
static void ip6gre_tap_setup(struct net_device *dev)
{
ether_setup(dev);
+ dev->max_mtu = 0;
dev->netdev_ops = &ip6gre_tap_netdev_ops;
dev->needs_free_netdev = true;
dev->priv_destructor = ip6gre_dev_free;
nt->net = dev_net(dev);
ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
- dev->features |= GRE6_FEATURES;
- dev->hw_features |= GRE6_FEATURES;
-
- if (!(nt->parms.o_flags & TUNNEL_SEQ)) {
- /* TCP offload with GRE SEQ is not supported, nor
- * can we support 2 levels of outer headers requiring
- * an update.
- */
- if (!(nt->parms.o_flags & TUNNEL_CSUM) ||
- (nt->encap.type == TUNNEL_ENCAP_NONE)) {
- dev->features |= NETIF_F_GSO_SOFTWARE;
- dev->hw_features |= NETIF_F_GSO_SOFTWARE;
- }
-
- /* Can use a lockless transmit, unless we generate
- * output sequences
- */
- dev->features |= NETIF_F_LLTX;
- }
-
err = register_netdevice(dev);
if (err)
goto out;
!(IP6CB(skb)->flags & IP6SKB_REROUTED));
}
+static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
+{
+ if (!np->autoflowlabel_set)
+ return ip6_default_np_autolabel(net);
+ else
+ return np->autoflowlabel;
+}
+
/*
* xmit an sk_buff (used by TCP, SCTP and DCCP)
* Note : socket lock is not held for SYNACK packets, but might be modified
hlimit = ip6_dst_hoplimit(dst);
ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
- np->autoflowlabel, fl6));
+ ip6_autoflowlabel(net, np), fl6));
hdr->payload_len = htons(seg_len);
hdr->nexthdr = proto;
ip6_flow_hdr(hdr, v6_cork->tclass,
ip6_make_flowlabel(net, skb, fl6->flowlabel,
- np->autoflowlabel, fl6));
+ ip6_autoflowlabel(net, np), fl6));
hdr->hop_limit = v6_cork->hop_limit;
hdr->nexthdr = proto;
hdr->saddr = fl6->saddr;
cork.base.opt = NULL;
v6_cork.opt = NULL;
err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6);
- if (err)
+ if (err) {
+ ip6_cork_release(&cork, &v6_cork);
return ERR_PTR(err);
-
+ }
if (ipc6->dontfrag < 0)
ipc6->dontfrag = inet6_sk(sk)->dontfrag;
memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
neigh_release(neigh);
}
- } else if (!(t->parms.flags &
- (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) {
- /* enable the cache only only if the routing decision does
- * not depend on the current inner header value
+ } else if (t->parms.proto != 0 && !(t->parms.flags &
+ (IP6_TNL_F_USE_ORIG_TCLASS |
+ IP6_TNL_F_USE_ORIG_FWMARK))) {
+ /* enable the cache only if neither the outer protocol nor the
+ * routing decision depends on the current inner header value
*/
use_cache = true;
}
max_headroom += 8;
mtu -= 8;
}
- if (mtu < IPV6_MIN_MTU)
- mtu = IPV6_MIN_MTU;
+ if (skb->protocol == htons(ETH_P_IPV6)) {
+ if (mtu < IPV6_MIN_MTU)
+ mtu = IPV6_MIN_MTU;
+ } else if (mtu < 576) {
+ mtu = 576;
+ }
+
if (skb_dst(skb) && !t->parms.collect_md)
skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
if (skb->len - t->tun_hlen - eth_hlen > mtu && !skb_is_gso(skb)) {
{
struct ip6_tnl *tnl = netdev_priv(dev);
- if (tnl->parms.proto == IPPROTO_IPIP) {
- if (new_mtu < ETH_MIN_MTU)
+ if (tnl->parms.proto == IPPROTO_IPV6) {
+ if (new_mtu < IPV6_MIN_MTU)
return -EINVAL;
} else {
- if (new_mtu < IPV6_MIN_MTU)
+ if (new_mtu < ETH_MIN_MTU)
return -EINVAL;
}
if (new_mtu > 0xFFF8 - dev->hard_header_len)
break;
case IPV6_AUTOFLOWLABEL:
np->autoflowlabel = valbool;
+ np->autoflowlabel_set = 1;
retv = 0;
break;
case IPV6_RECVFRAGSIZE:
}
static struct sk_buff *add_grhead(struct sk_buff *skb, struct ifmcaddr6 *pmc,
- int type, struct mld2_grec **ppgr)
+ int type, struct mld2_grec **ppgr, unsigned int mtu)
{
- struct net_device *dev = pmc->idev->dev;
struct mld2_report *pmr;
struct mld2_grec *pgr;
- if (!skb)
- skb = mld_newpack(pmc->idev, dev->mtu);
- if (!skb)
- return NULL;
+ if (!skb) {
+ skb = mld_newpack(pmc->idev, mtu);
+ if (!skb)
+ return NULL;
+ }
pgr = skb_put(skb, sizeof(struct mld2_grec));
pgr->grec_type = type;
pgr->grec_auxwords = 0;
struct mld2_grec *pgr = NULL;
struct ip6_sf_list *psf, *psf_next, *psf_prev, **psf_list;
int scount, stotal, first, isquery, truncate;
+ unsigned int mtu;
if (pmc->mca_flags & MAF_NOREPORT)
return skb;
+ mtu = READ_ONCE(dev->mtu);
+ if (mtu < IPV6_MIN_MTU)
+ return skb;
+
isquery = type == MLD2_MODE_IS_INCLUDE ||
type == MLD2_MODE_IS_EXCLUDE;
truncate = type == MLD2_MODE_IS_EXCLUDE ||
AVAILABLE(skb) < grec_size(pmc, type, gdeleted, sdeleted)) {
if (skb)
mld_sendpack(skb);
- skb = mld_newpack(idev, dev->mtu);
+ skb = mld_newpack(idev, mtu);
}
}
first = 1;
pgr->grec_nsrcs = htons(scount);
if (skb)
mld_sendpack(skb);
- skb = mld_newpack(idev, dev->mtu);
+ skb = mld_newpack(idev, mtu);
first = 1;
scount = 0;
}
if (first) {
- skb = add_grhead(skb, pmc, type, &pgr);
+ skb = add_grhead(skb, pmc, type, &pgr, mtu);
first = 0;
}
if (!skb)
mld_sendpack(skb);
skb = NULL; /* add_grhead will get a new one */
}
- skb = add_grhead(skb, pmc, type, &pgr);
+ skb = add_grhead(skb, pmc, type, &pgr, mtu);
}
}
if (pgr)
if (!xt_find_jump_offset(offsets, newpos,
newinfo->number))
return 0;
- e = entry0 + newpos;
} else {
/* ... this is a fallthru */
newpos = pos + e->next_offset;
if (range->flags & NF_NAT_RANGE_MAP_IPS)
return -EINVAL;
- return 0;
+ return nf_ct_netns_get(par->net, par->family);
+}
+
+static void masquerade_tg6_destroy(const struct xt_tgdtor_param *par)
+{
+ nf_ct_netns_put(par->net, par->family);
}
static struct xt_target masquerade_tg6_reg __read_mostly = {
.name = "MASQUERADE",
.family = NFPROTO_IPV6,
.checkentry = masquerade_tg6_checkentry,
+ .destroy = masquerade_tg6_destroy,
.target = masquerade_tg6,
.targetsize = sizeof(struct nf_nat_range),
.table = "nat",
}
rt->dst.flags |= DST_HOST;
+ rt->dst.input = ip6_input;
rt->dst.output = ip6_output;
rt->rt6i_gateway = fl6->daddr;
rt->rt6i_dst.addr = fl6->daddr;
if (!ipv6_addr_any(&fl6.saddr))
flags |= RT6_LOOKUP_F_HAS_SADDR;
- if (!fibmatch)
- dst = ip6_route_input_lookup(net, dev, &fl6, flags);
- else
- dst = ip6_route_lookup(net, &fl6, 0);
+ dst = ip6_route_input_lookup(net, dev, &fl6, flags);
rcu_read_unlock();
} else {
fl6.flowi6_oif = oif;
- if (!fibmatch)
- dst = ip6_route_output(net, NULL, &fl6);
- else
- dst = ip6_route_lookup(net, &fl6, 0);
+ dst = ip6_route_output(net, NULL, &fl6);
}
goto errout;
}
+ if (fibmatch && rt->dst.from) {
+ struct rt6_info *ort = container_of(rt->dst.from,
+ struct rt6_info, dst);
+
+ dst_hold(&ort->dst);
+ ip6_rt_put(rt);
+ rt = ort;
+ }
+
skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb) {
ip6_rt_put(rt);
req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
tcp_time_stamp_raw() + tcp_rsk(req)->ts_off,
req->ts_recent, sk->sk_bound_dev_if,
- tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
+ tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr),
0, 0);
}
}
EXPORT_SYMBOL(xfrm6_rcv_spi);
+static int xfrm6_transport_finish2(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+{
+ if (xfrm_trans_queue(skb, ip6_rcv_finish))
+ __kfree_skb(skb);
+ return -1;
+}
+
int xfrm6_transport_finish(struct sk_buff *skb, int async)
{
struct xfrm_offload *xo = xfrm_offload(skb);
NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
dev_net(skb->dev), NULL, skb, skb->dev, NULL,
- ip6_rcv_finish);
+ xfrm6_transport_finish2);
return -1;
}
int i;
mutex_lock(&sta->ampdu_mlme.mtx);
- for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++)
___ieee80211_stop_rx_ba_session(sta, i, WLAN_BACK_RECIPIENT,
WLAN_REASON_QSTA_LEAVE_QBSS,
reason != AGG_STOP_DESTROY_STA &&
reason != AGG_STOP_PEER_REQUEST);
- }
- mutex_unlock(&sta->ampdu_mlme.mtx);
for (i = 0; i < IEEE80211_NUM_TIDS; i++)
___ieee80211_stop_tx_ba_session(sta, i, reason);
+ mutex_unlock(&sta->ampdu_mlme.mtx);
/* stopping might queue the work again - so cancel only afterwards */
cancel_work_sync(&sta->ampdu_mlme.work);
}
return true;
case NL80211_IFTYPE_MESH_POINT:
+ if (ether_addr_equal(sdata->vif.addr, hdr->addr2))
+ return false;
if (multicast)
return true;
return ether_addr_equal(sdata->vif.addr, hdr->addr1);
#define INC_BIT(bs) if((++(bs)->bit)>7){(bs)->cur++;(bs)->bit=0;}
#define INC_BITS(bs,b) if(((bs)->bit+=(b))>7){(bs)->cur+=(bs)->bit>>3;(bs)->bit&=7;}
#define BYTE_ALIGN(bs) if((bs)->bit){(bs)->cur++;(bs)->bit=0;}
-#define CHECK_BOUND(bs,n) if((bs)->cur+(n)>(bs)->end)return(H323_ERROR_BOUND)
static unsigned int get_len(struct bitstr *bs);
static unsigned int get_bit(struct bitstr *bs);
static unsigned int get_bits(struct bitstr *bs, unsigned int b);
return v;
}
+static int nf_h323_error_boundary(struct bitstr *bs, size_t bytes, size_t bits)
+{
+ bits += bs->bit;
+ bytes += bits / BITS_PER_BYTE;
+ if (bits % BITS_PER_BYTE > 0)
+ bytes++;
+
+ if (*bs->cur + bytes > *bs->end)
+ return 1;
+
+ return 0;
+}
+
/****************************************************************************/
static unsigned int get_bit(struct bitstr *bs)
{
PRINT("%*.s%s\n", level * TAB_SIZE, " ", f->name);
INC_BIT(bs);
-
- CHECK_BOUND(bs, 0);
+ if (nf_h323_error_boundary(bs, 0, 0))
+ return H323_ERROR_BOUND;
return H323_ERROR_NONE;
}
PRINT("%*.s%s\n", level * TAB_SIZE, " ", f->name);
BYTE_ALIGN(bs);
- CHECK_BOUND(bs, 1);
+ if (nf_h323_error_boundary(bs, 1, 0))
+ return H323_ERROR_BOUND;
+
len = *bs->cur++;
bs->cur += len;
+ if (nf_h323_error_boundary(bs, 0, 0))
+ return H323_ERROR_BOUND;
- CHECK_BOUND(bs, 0);
return H323_ERROR_NONE;
}
bs->cur += 2;
break;
case CONS: /* 64K < Range < 4G */
+ if (nf_h323_error_boundary(bs, 0, 2))
+ return H323_ERROR_BOUND;
len = get_bits(bs, 2) + 1;
BYTE_ALIGN(bs);
if (base && (f->attr & DECODE)) { /* timeToLive */
break;
case UNCO:
BYTE_ALIGN(bs);
- CHECK_BOUND(bs, 2);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
len = get_len(bs);
bs->cur += len;
break;
PRINT("\n");
- CHECK_BOUND(bs, 0);
+ if (nf_h323_error_boundary(bs, 0, 0))
+ return H323_ERROR_BOUND;
return H323_ERROR_NONE;
}
INC_BITS(bs, f->sz);
}
- CHECK_BOUND(bs, 0);
+ if (nf_h323_error_boundary(bs, 0, 0))
+ return H323_ERROR_BOUND;
return H323_ERROR_NONE;
}
len = f->lb;
break;
case WORD: /* 2-byte length */
- CHECK_BOUND(bs, 2);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
len = (*bs->cur++) << 8;
len += (*bs->cur++) + f->lb;
break;
case SEMI:
- CHECK_BOUND(bs, 2);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
len = get_len(bs);
break;
default:
bs->cur += len >> 3;
bs->bit = len & 7;
- CHECK_BOUND(bs, 0);
+ if (nf_h323_error_boundary(bs, 0, 0))
+ return H323_ERROR_BOUND;
return H323_ERROR_NONE;
}
PRINT("%*.s%s\n", level * TAB_SIZE, " ", f->name);
/* 2 <= Range <= 255 */
+ if (nf_h323_error_boundary(bs, 0, f->sz))
+ return H323_ERROR_BOUND;
len = get_bits(bs, f->sz) + f->lb;
BYTE_ALIGN(bs);
INC_BITS(bs, (len << 2));
- CHECK_BOUND(bs, 0);
+ if (nf_h323_error_boundary(bs, 0, 0))
+ return H323_ERROR_BOUND;
return H323_ERROR_NONE;
}
break;
case BYTE: /* Range == 256 */
BYTE_ALIGN(bs);
- CHECK_BOUND(bs, 1);
+ if (nf_h323_error_boundary(bs, 1, 0))
+ return H323_ERROR_BOUND;
len = (*bs->cur++) + f->lb;
break;
case SEMI:
BYTE_ALIGN(bs);
- CHECK_BOUND(bs, 2);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
len = get_len(bs) + f->lb;
break;
default: /* 2 <= Range <= 255 */
+ if (nf_h323_error_boundary(bs, 0, f->sz))
+ return H323_ERROR_BOUND;
len = get_bits(bs, f->sz) + f->lb;
BYTE_ALIGN(bs);
break;
PRINT("\n");
- CHECK_BOUND(bs, 0);
+ if (nf_h323_error_boundary(bs, 0, 0))
+ return H323_ERROR_BOUND;
return H323_ERROR_NONE;
}
switch (f->sz) {
case BYTE: /* Range == 256 */
BYTE_ALIGN(bs);
- CHECK_BOUND(bs, 1);
+ if (nf_h323_error_boundary(bs, 1, 0))
+ return H323_ERROR_BOUND;
len = (*bs->cur++) + f->lb;
break;
default: /* 2 <= Range <= 255 */
+ if (nf_h323_error_boundary(bs, 0, f->sz))
+ return H323_ERROR_BOUND;
len = get_bits(bs, f->sz) + f->lb;
BYTE_ALIGN(bs);
break;
bs->cur += len << 1;
- CHECK_BOUND(bs, 0);
+ if (nf_h323_error_boundary(bs, 0, 0))
+ return H323_ERROR_BOUND;
return H323_ERROR_NONE;
}
base = (base && (f->attr & DECODE)) ? base + f->offset : NULL;
/* Extensible? */
+ if (nf_h323_error_boundary(bs, 0, 1))
+ return H323_ERROR_BOUND;
ext = (f->attr & EXT) ? get_bit(bs) : 0;
/* Get fields bitmap */
+ if (nf_h323_error_boundary(bs, 0, f->sz))
+ return H323_ERROR_BOUND;
bmp = get_bitmap(bs, f->sz);
if (base)
*(unsigned int *)base = bmp;
/* Decode */
if (son->attr & OPEN) { /* Open field */
- CHECK_BOUND(bs, 2);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
len = get_len(bs);
- CHECK_BOUND(bs, len);
+ if (nf_h323_error_boundary(bs, len, 0))
+ return H323_ERROR_BOUND;
if (!base || !(son->attr & DECODE)) {
PRINT("%*.s%s\n", (level + 1) * TAB_SIZE,
" ", son->name);
return H323_ERROR_NONE;
/* Get the extension bitmap */
+ if (nf_h323_error_boundary(bs, 0, 7))
+ return H323_ERROR_BOUND;
bmp2_len = get_bits(bs, 7) + 1;
- CHECK_BOUND(bs, (bmp2_len + 7) >> 3);
+ if (nf_h323_error_boundary(bs, 0, bmp2_len))
+ return H323_ERROR_BOUND;
bmp2 = get_bitmap(bs, bmp2_len);
bmp |= bmp2 >> f->sz;
if (base)
for (opt = 0; opt < bmp2_len; opt++, i++, son++) {
/* Check Range */
if (i >= f->ub) { /* Newer Version? */
- CHECK_BOUND(bs, 2);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
len = get_len(bs);
- CHECK_BOUND(bs, len);
+ if (nf_h323_error_boundary(bs, len, 0))
+ return H323_ERROR_BOUND;
bs->cur += len;
continue;
}
if (!((0x80000000 >> opt) & bmp2)) /* Not present */
continue;
- CHECK_BOUND(bs, 2);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
len = get_len(bs);
- CHECK_BOUND(bs, len);
+ if (nf_h323_error_boundary(bs, len, 0))
+ return H323_ERROR_BOUND;
if (!base || !(son->attr & DECODE)) {
PRINT("%*.s%s\n", (level + 1) * TAB_SIZE, " ",
son->name);
switch (f->sz) {
case BYTE:
BYTE_ALIGN(bs);
- CHECK_BOUND(bs, 1);
+ if (nf_h323_error_boundary(bs, 1, 0))
+ return H323_ERROR_BOUND;
count = *bs->cur++;
break;
case WORD:
BYTE_ALIGN(bs);
- CHECK_BOUND(bs, 2);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
count = *bs->cur++;
count <<= 8;
count += *bs->cur++;
break;
case SEMI:
BYTE_ALIGN(bs);
- CHECK_BOUND(bs, 2);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
count = get_len(bs);
break;
default:
+ if (nf_h323_error_boundary(bs, 0, f->sz))
+ return H323_ERROR_BOUND;
count = get_bits(bs, f->sz);
break;
}
for (i = 0; i < count; i++) {
if (son->attr & OPEN) {
BYTE_ALIGN(bs);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
len = get_len(bs);
- CHECK_BOUND(bs, len);
+ if (nf_h323_error_boundary(bs, len, 0))
+ return H323_ERROR_BOUND;
if (!base || !(son->attr & DECODE)) {
PRINT("%*.s%s\n", (level + 1) * TAB_SIZE,
" ", son->name);
base = (base && (f->attr & DECODE)) ? base + f->offset : NULL;
/* Decode the choice index number */
+ if (nf_h323_error_boundary(bs, 0, 1))
+ return H323_ERROR_BOUND;
if ((f->attr & EXT) && get_bit(bs)) {
ext = 1;
+ if (nf_h323_error_boundary(bs, 0, 7))
+ return H323_ERROR_BOUND;
type = get_bits(bs, 7) + f->lb;
} else {
ext = 0;
+ if (nf_h323_error_boundary(bs, 0, f->sz))
+ return H323_ERROR_BOUND;
type = get_bits(bs, f->sz);
if (type >= f->lb)
return H323_ERROR_RANGE;
/* Check Range */
if (type >= f->ub) { /* Newer version? */
BYTE_ALIGN(bs);
+ if (nf_h323_error_boundary(bs, 2, 0))
+ return H323_ERROR_BOUND;
len = get_len(bs);
- CHECK_BOUND(bs, len);
+ if (nf_h323_error_boundary(bs, len, 0))
+ return H323_ERROR_BOUND;
bs->cur += len;
return H323_ERROR_NONE;
}
if (ext || (son->attr & OPEN)) {
BYTE_ALIGN(bs);
+ if (nf_h323_error_boundary(bs, len, 0))
+ return H323_ERROR_BOUND;
len = get_len(bs);
- CHECK_BOUND(bs, len);
+ if (nf_h323_error_boundary(bs, len, 0))
+ return H323_ERROR_BOUND;
if (!base || !(son->attr & DECODE)) {
PRINT("%*.s%s\n", (level + 1) * TAB_SIZE, " ",
son->name);
#include <net/netfilter/nf_conntrack_zones.h>
#include <net/netfilter/nf_conntrack_timestamp.h>
#include <net/netfilter/nf_conntrack_labels.h>
-#include <net/netfilter/nf_conntrack_seqadj.h>
#include <net/netfilter/nf_conntrack_synproxy.h>
#ifdef CONFIG_NF_NAT_NEEDED
#include <net/netfilter/nf_nat_core.h>
static int ctnetlink_change_timeout(struct nf_conn *ct,
const struct nlattr * const cda[])
{
- u_int32_t timeout = ntohl(nla_get_be32(cda[CTA_TIMEOUT]));
+ u64 timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
- ct->timeout = nfct_time_stamp + timeout * HZ;
+ if (timeout > INT_MAX)
+ timeout = INT_MAX;
+ ct->timeout = nfct_time_stamp + (u32)timeout;
if (test_bit(IPS_DYING_BIT, &ct->status))
return -ETIME;
int err = -EINVAL;
struct nf_conntrack_helper *helper;
struct nf_conn_tstamp *tstamp;
+ u64 timeout;
ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_ATOMIC);
if (IS_ERR(ct))
if (!cda[CTA_TIMEOUT])
goto err1;
- ct->timeout = nfct_time_stamp + ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
+ timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
+ if (timeout > INT_MAX)
+ timeout = INT_MAX;
+ ct->timeout = (u32)timeout + nfct_time_stamp;
rcu_read_lock();
if (cda[CTA_HELP]) {
IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED &&
timeouts[new_state] > timeouts[TCP_CONNTRACK_UNACK])
timeout = timeouts[TCP_CONNTRACK_UNACK];
+ else if (ct->proto.tcp.last_win == 0 &&
+ timeouts[new_state] > timeouts[TCP_CONNTRACK_RETRANS])
+ timeout = timeouts[TCP_CONNTRACK_RETRANS];
else
timeout = timeouts[new_state];
spin_unlock_bh(&ct->lock);
continue;
list_for_each_entry_rcu(chain, &table->chains, list) {
- if (ctx && ctx->chain[0] &&
+ if (ctx && ctx->chain &&
strcmp(ctx->chain, chain->name) != 0)
continue;
{
struct nft_obj_filter *filter = cb->data;
- kfree(filter->table);
- kfree(filter);
+ if (filter) {
+ kfree(filter->table);
+ kfree(filter);
+ }
return 0;
}
return 0;
}
+static void __net_exit nf_tables_exit_net(struct net *net)
+{
+ WARN_ON_ONCE(!list_empty(&net->nft.af_info));
+ WARN_ON_ONCE(!list_empty(&net->nft.commit_list));
+}
+
int __nft_release_basechain(struct nft_ctx *ctx)
{
struct nft_rule *rule, *nr;
static struct pernet_operations nf_tables_net_ops = {
.init = nf_tables_init_net,
+ .exit = nf_tables_exit_net,
};
static int __init nf_tables_module_init(void)
#include <linux/types.h>
#include <linux/list.h>
#include <linux/errno.h>
+#include <linux/capability.h>
#include <net/netlink.h>
#include <net/sock.h>
struct nfnl_cthelper *nlcth;
int ret = 0;
+ if (!capable(CAP_NET_ADMIN))
+ return -EPERM;
+
if (!tb[NFCTH_NAME] || !tb[NFCTH_TUPLE])
return -EINVAL;
struct nfnl_cthelper *nlcth;
bool tuple_set = false;
+ if (!capable(CAP_NET_ADMIN))
+ return -EPERM;
+
if (nlh->nlmsg_flags & NLM_F_DUMP) {
struct netlink_dump_control c = {
.dump = nfnl_cthelper_dump_table,
struct nfnl_cthelper *nlcth, *n;
int j = 0, ret;
+ if (!capable(CAP_NET_ADMIN))
+ return -EPERM;
+
if (tb[NFCTH_NAME])
helper_name = nla_data(tb[NFCTH_NAME]);
static void __net_exit nfnl_log_net_exit(struct net *net)
{
+ struct nfnl_log_net *log = nfnl_log_pernet(net);
+ unsigned int i;
+
#ifdef CONFIG_PROC_FS
remove_proc_entry("nfnetlink_log", net->nf.proc_netfilter);
#endif
nf_log_unset(net, &nfulnl_logger);
+ for (i = 0; i < INSTANCE_BUCKETS; i++)
+ WARN_ON_ONCE(!hlist_empty(&log->instance_table[i]));
}
static struct pernet_operations nfnl_log_net_ops = {
static void __net_exit nfnl_queue_net_exit(struct net *net)
{
+ struct nfnl_queue_net *q = nfnl_queue_pernet(net);
+ unsigned int i;
+
nf_unregister_queue_handler(net);
#ifdef CONFIG_PROC_FS
remove_proc_entry("nfnetlink_queue", net->nf.proc_netfilter);
#endif
+ for (i = 0; i < INSTANCE_BUCKETS; i++)
+ WARN_ON_ONCE(!hlist_empty(&q->instance_table[i]));
}
static void nfnl_queue_net_exit_batch(struct list_head *net_exit_list)
[NFTA_EXTHDR_OFFSET] = { .type = NLA_U32 },
[NFTA_EXTHDR_LEN] = { .type = NLA_U32 },
[NFTA_EXTHDR_FLAGS] = { .type = NLA_U32 },
+ [NFTA_EXTHDR_OP] = { .type = NLA_U32 },
+ [NFTA_EXTHDR_SREG] = { .type = NLA_U32 },
};
static int nft_exthdr_init(const struct nft_ctx *ctx,
return 0;
}
+static void __net_exit xt_net_exit(struct net *net)
+{
+ int i;
+
+ for (i = 0; i < NFPROTO_NUMPROTO; i++)
+ WARN_ON_ONCE(!list_empty(&net->xt.tables[i]));
+}
+
static struct pernet_operations xt_net_ops = {
.init = xt_net_init,
+ .exit = xt_net_exit,
};
static int __init xt_init(void)
{
struct sock_fprog_kern program;
+ if (len > XT_BPF_MAX_NUM_INSTR)
+ return -EINVAL;
+
program.len = len;
program.filter = insns;
static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
{
- mm_segment_t oldfs = get_fs();
- int retval, fd;
-
- set_fs(KERNEL_DS);
- fd = bpf_obj_get_user(path, 0);
- set_fs(oldfs);
- if (fd < 0)
- return fd;
-
- retval = __bpf_mt_check_fd(fd, ret);
- sys_close(fd);
- return retval;
+ if (strnlen(path, XT_BPF_PATH_MAX) == XT_BPF_PATH_MAX)
+ return -EINVAL;
+
+ *ret = bpf_prog_get_type_path(path, BPF_PROG_TYPE_SOCKET_FILTER);
+ return PTR_ERR_OR_ZERO(*ret);
}
static int bpf_mt_check(const struct xt_mtchk_param *par)
#include <linux/module.h>
#include <linux/kernel.h>
+#include <linux/capability.h>
#include <linux/if.h>
#include <linux/inetdevice.h>
#include <linux/ip.h>
struct xt_osf_finger *kf = NULL, *sf;
int err = 0;
+ if (!capable(CAP_NET_ADMIN))
+ return -EPERM;
+
if (!osf_attrs[OSF_ATTR_FINGER])
return -EINVAL;
struct xt_osf_finger *sf;
int err = -ENOENT;
+ if (!capable(CAP_NET_ADMIN))
+ return -EPERM;
+
if (!osf_attrs[OSF_ATTR_FINGER])
return -EINVAL;
struct sock *sk = skb->sk;
int ret = -ENOMEM;
+ if (!net_eq(dev_net(dev), sock_net(sk)))
+ return 0;
+
dev_hold(dev);
if (is_vmalloc_addr(skb->head))
return -EINVAL;
skb_reset_network_header(skb);
+ key->eth.type = skb->protocol;
} else {
eth = eth_hdr(skb);
ether_addr_copy(key->eth.src, eth->h_source);
if (unlikely(parse_vlan(skb, key)))
return -ENOMEM;
- skb->protocol = parse_ethertype(skb);
- if (unlikely(skb->protocol == htons(0)))
+ key->eth.type = parse_ethertype(skb);
+ if (unlikely(key->eth.type == htons(0)))
return -ENOMEM;
+ /* Multiple tagged packets need to retain TPID to satisfy
+ * skb_vlan_pop(), which will later shift the ethertype into
+ * skb->protocol.
+ */
+ if (key->eth.cvlan.tci & htons(VLAN_TAG_PRESENT))
+ skb->protocol = key->eth.cvlan.tpid;
+ else
+ skb->protocol = key->eth.type;
+
skb_reset_network_header(skb);
__skb_push(skb, skb->data - skb_mac_header(skb));
}
skb_reset_mac_len(skb);
- key->eth.type = skb->protocol;
/* Network layer. */
if (key->eth.type == htons(ETH_P_IP)) {
local_vec = (struct rds_iovec __user *)(unsigned long) args->local_vec_addr;
+ if (args->nr_local == 0)
+ return -EINVAL;
+
/* figure out the number of pages in the vector */
for (i = 0; i < args->nr_local; i++) {
if (copy_from_user(&vec, &local_vec[i],
err:
if (page)
put_page(page);
+ rm->atomic.op_active = 0;
kfree(rm->atomic.op_notifier);
return ret;
continue;
if (cmsg->cmsg_type == RDS_CMSG_RDMA_ARGS) {
+ if (cmsg->cmsg_len <
+ CMSG_LEN(sizeof(struct rds_rdma_args)))
+ return -EINVAL;
args = CMSG_DATA(cmsg);
*rdma_bytes += args->remote_vec.bytes;
}
if (action == TC_ACT_SHOT)
this_cpu_ptr(gact->common.cpu_qstats)->drops += packets;
- tm->lastuse = lastuse;
+ tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}
static int tcf_gact_dump(struct sk_buff *skb, struct tc_action *a,
#include <net/pkt_sched.h>
#include <uapi/linux/tc_act/tc_ife.h>
#include <net/tc_act/tc_ife.h>
-#include <linux/rtnetlink.h>
static int skbmark_encode(struct sk_buff *skb, void *skbdata,
struct tcf_meta_info *e)
#include <net/pkt_sched.h>
#include <uapi/linux/tc_act/tc_ife.h>
#include <net/tc_act/tc_ife.h>
-#include <linux/rtnetlink.h>
static int skbtcindex_encode(struct sk_buff *skb, void *skbdata,
struct tcf_meta_info *e)
struct tcf_t *tm = &m->tcf_tm;
_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
- tm->lastuse = lastuse;
+ tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}
static int tcf_mirred_dump(struct sk_buff *skb, struct tc_action *a, int bind,
#include <linux/skbuff.h>
#include <linux/init.h>
#include <linux/kmod.h>
-#include <linux/err.h>
#include <linux/slab.h>
#include <net/net_namespace.h>
#include <net/sock.h>
{
struct tcf_chain *chain;
+ if (!block)
+ return;
/* Hold a refcnt for all chains, except 0, so that they don't disappear
* while we are iterating.
*/
struct list_head link;
struct tcf_result res;
bool exts_integrated;
- bool offloaded;
u32 gen_flags;
struct tcf_exts exts;
u32 handle;
}
static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
- enum tc_clsbpf_command cmd)
+ struct cls_bpf_prog *oldprog)
{
- bool addorrep = cmd == TC_CLSBPF_ADD || cmd == TC_CLSBPF_REPLACE;
struct tcf_block *block = tp->chain->block;
- bool skip_sw = tc_skip_sw(prog->gen_flags);
struct tc_cls_bpf_offload cls_bpf = {};
+ struct cls_bpf_prog *obj;
+ bool skip_sw;
int err;
+ skip_sw = prog && tc_skip_sw(prog->gen_flags);
+ obj = prog ?: oldprog;
+
tc_cls_common_offload_init(&cls_bpf.common, tp);
- cls_bpf.command = cmd;
- cls_bpf.exts = &prog->exts;
- cls_bpf.prog = prog->filter;
- cls_bpf.name = prog->bpf_name;
- cls_bpf.exts_integrated = prog->exts_integrated;
- cls_bpf.gen_flags = prog->gen_flags;
+ cls_bpf.command = TC_CLSBPF_OFFLOAD;
+ cls_bpf.exts = &obj->exts;
+ cls_bpf.prog = prog ? prog->filter : NULL;
+ cls_bpf.oldprog = oldprog ? oldprog->filter : NULL;
+ cls_bpf.name = obj->bpf_name;
+ cls_bpf.exts_integrated = obj->exts_integrated;
+ cls_bpf.gen_flags = obj->gen_flags;
err = tc_setup_cb_call(block, NULL, TC_SETUP_CLSBPF, &cls_bpf, skip_sw);
- if (addorrep) {
+ if (prog) {
if (err < 0) {
- cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY);
+ cls_bpf_offload_cmd(tp, oldprog, prog);
return err;
} else if (err > 0) {
prog->gen_flags |= TCA_CLS_FLAGS_IN_HW;
}
}
- if (addorrep && skip_sw && !(prog->gen_flags & TCA_CLS_FLAGS_IN_HW))
+ if (prog && skip_sw && !(prog->gen_flags & TCA_CLS_FLAGS_IN_HW))
return -EINVAL;
return 0;
static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
struct cls_bpf_prog *oldprog)
{
- struct cls_bpf_prog *obj = prog;
- enum tc_clsbpf_command cmd;
- bool skip_sw;
- int ret;
-
- skip_sw = tc_skip_sw(prog->gen_flags) ||
- (oldprog && tc_skip_sw(oldprog->gen_flags));
-
- if (oldprog && oldprog->offloaded) {
- if (!tc_skip_hw(prog->gen_flags)) {
- cmd = TC_CLSBPF_REPLACE;
- } else if (!tc_skip_sw(prog->gen_flags)) {
- obj = oldprog;
- cmd = TC_CLSBPF_DESTROY;
- } else {
- return -EINVAL;
- }
- } else {
- if (tc_skip_hw(prog->gen_flags))
- return skip_sw ? -EINVAL : 0;
- cmd = TC_CLSBPF_ADD;
- }
-
- ret = cls_bpf_offload_cmd(tp, obj, cmd);
- if (ret)
- return ret;
+ if (prog && oldprog && prog->gen_flags != oldprog->gen_flags)
+ return -EINVAL;
- obj->offloaded = true;
- if (oldprog)
- oldprog->offloaded = false;
+ if (prog && tc_skip_hw(prog->gen_flags))
+ prog = NULL;
+ if (oldprog && tc_skip_hw(oldprog->gen_flags))
+ oldprog = NULL;
+ if (!prog && !oldprog)
+ return 0;
- return 0;
+ return cls_bpf_offload_cmd(tp, prog, oldprog);
}
static void cls_bpf_stop_offload(struct tcf_proto *tp,
{
int err;
- if (!prog->offloaded)
- return;
-
- err = cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY);
- if (err) {
+ err = cls_bpf_offload_cmd(tp, NULL, prog);
+ if (err)
pr_err("Stopping hardware offload failed: %d\n", err);
- return;
- }
-
- prog->offloaded = false;
}
static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
struct cls_bpf_prog *prog)
{
- if (!prog->offloaded)
- return;
+ struct tcf_block *block = tp->chain->block;
+ struct tc_cls_bpf_offload cls_bpf = {};
+
+ tc_cls_common_offload_init(&cls_bpf.common, tp);
+ cls_bpf.command = TC_CLSBPF_STATS;
+ cls_bpf.exts = &prog->exts;
+ cls_bpf.prog = prog->filter;
+ cls_bpf.name = prog->bpf_name;
+ cls_bpf.exts_integrated = prog->exts_integrated;
+ cls_bpf.gen_flags = prog->gen_flags;
- cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_STATS);
+ tc_setup_cb_call(block, NULL, TC_SETUP_CLSBPF, &cls_bpf, false);
}
static int cls_bpf_init(struct tcf_proto *tp)
#include <net/netlink.h>
#include <net/act_api.h>
#include <net/pkt_cls.h>
-#include <linux/netdevice.h>
#include <linux/idr.h>
struct tc_u_knode {
tcm->tcm_info = refcount_read(&q->refcnt);
if (nla_put_string(skb, TCA_KIND, q->ops->id))
goto nla_put_failure;
+ if (nla_put_u8(skb, TCA_HW_OFFLOAD, !!(q->flags & TCQ_F_OFFLOADED)))
+ goto nla_put_failure;
if (q->ops->dump && q->ops->dump(q, skb) < 0)
goto nla_put_failure;
qlen = q->q.qlen;
if (!tp_head) {
RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
+ /* Wait for flying RCU callback before it is freed. */
+ rcu_barrier_bh();
return;
}
rcu_assign_pointer(*miniqp->p_miniq, miniq);
if (miniq_old)
- /* This is counterpart of the rcu barrier above. We need to
+ /* This is counterpart of the rcu barriers above. We need to
* block potential new user of miniq_old until all readers
* are not seeing it.
*/
struct net_device *dev = qdisc_dev(sch);
int err;
+ net_inc_ingress_queue();
+
mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
q->block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
if (err)
return err;
- net_inc_ingress_queue();
sch->flags |= TCQ_F_CPUSTATS;
return 0;
struct net_device *dev = qdisc_dev(sch);
int err;
+ net_inc_ingress_queue();
+ net_inc_egress_queue();
+
mini_qdisc_pair_init(&q->miniqp_ingress, sch, &dev->miniq_ingress);
q->ingress_block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
err = tcf_block_get_ext(&q->egress_block, sch, &q->egress_block_info);
if (err)
- goto err_egress_block_get;
-
- net_inc_ingress_queue();
- net_inc_egress_queue();
+ return err;
sch->flags |= TCQ_F_CPUSTATS;
return 0;
-
-err_egress_block_get:
- tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
- return err;
}
static void clsact_destroy(struct Qdisc *sch)
.handle = sch->handle,
.parent = sch->parent,
};
+ int err;
if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
return -EOPNOTSUPP;
opt.command = TC_RED_DESTROY;
}
- return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED, &opt);
+ err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED, &opt);
+
+ if (!err && enable)
+ sch->flags |= TCQ_F_OFFLOADED;
+ else
+ sch->flags &= ~TCQ_F_OFFLOADED;
+
+ return err;
}
static void red_destroy(struct Qdisc *sch)
return red_change(sch, opt);
}
-static int red_dump_offload(struct Qdisc *sch, struct tc_red_qopt *opt)
+static int red_dump_offload_stats(struct Qdisc *sch, struct tc_red_qopt *opt)
{
struct net_device *dev = qdisc_dev(sch);
struct tc_red_qopt_offload hw_stats = {
.stats.qstats = &sch->qstats,
},
};
- int err;
- opt->flags &= ~TC_RED_OFFLOADED;
- if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
- return 0;
-
- err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
- &hw_stats);
- if (err == -EOPNOTSUPP)
+ if (!(sch->flags & TCQ_F_OFFLOADED))
return 0;
- if (!err)
- opt->flags |= TC_RED_OFFLOADED;
-
- return err;
+ return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
+ &hw_stats);
}
static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
int err;
sch->qstats.backlog = q->qdisc->qstats.backlog;
- err = red_dump_offload(sch, &opt);
+ err = red_dump_offload_stats(sch, &opt);
if (err)
goto nla_put_failure;
.marked = q->stats.prob_mark + q->stats.forced_mark,
};
- if (tc_can_offload(dev) && dev->netdev_ops->ndo_setup_tc) {
+ if (sch->flags & TCQ_F_OFFLOADED) {
struct red_stats hw_stats = {0};
struct tc_red_qopt_offload hw_stats_request = {
.command = TC_RED_XSTATS,
case SCTP_CID_AUTH:
return "AUTH";
+ case SCTP_CID_RECONF:
+ return "RECONF";
+
default:
break;
}
return;
}
- if (t->param_flags & SPP_PMTUD_ENABLE) {
- /* Update transports view of the MTU */
- sctp_transport_update_pmtu(t, pmtu);
-
- /* Update association pmtu. */
- sctp_assoc_sync_pmtu(asoc);
- }
+ if (!(t->param_flags & SPP_PMTUD_ENABLE))
+ /* We can't allow retransmitting in such case, as the
+ * retransmission would be sized just as before, and thus we
+ * would get another icmp, and retransmit again.
+ */
+ return;
- /* Retransmit with the new pmtu setting.
- * Normally, if PMTU discovery is disabled, an ICMP Fragmentation
- * Needed will never be sent, but if a message was sent before
- * PMTU discovery was disabled that was larger than the PMTU, it
- * would not be fragmented, so it must be re-transmitted fragmented.
+ /* Update transports view of the MTU. Return if no update was needed.
+ * If an update wasn't needed/possible, it also doesn't make sense to
+ * try to retransmit now.
*/
+ if (!sctp_transport_update_pmtu(t, pmtu))
+ return;
+
+ /* Update association pmtu. */
+ sctp_assoc_sync_pmtu(asoc);
+
+ /* Retransmit with the new pmtu setting. */
sctp_retransmit(&asoc->outqueue, t, SCTP_RTXR_PMTUD);
}
if (asoc && sctp_outq_is_empty(&asoc->outqueue)) {
event = sctp_ulpevent_make_sender_dry_event(asoc,
- GFP_ATOMIC);
+ GFP_USER | __GFP_NOWARN);
if (!event)
return -ENOMEM;
if (optlen < sizeof(struct sctp_hmacalgo))
return -EINVAL;
+ optlen = min_t(unsigned int, optlen, sizeof(struct sctp_hmacalgo) +
+ SCTP_AUTH_NUM_HMACS * sizeof(u16));
hmacs = memdup_user(optval, optlen);
if (IS_ERR(hmacs))
if (optlen <= sizeof(struct sctp_authkey))
return -EINVAL;
+ /* authkey->sca_keylength is u16, so optlen can't be bigger than
+ * this.
+ */
+ optlen = min_t(unsigned int, optlen, USHRT_MAX +
+ sizeof(struct sctp_authkey));
authkey = memdup_user(optval, optlen);
if (IS_ERR(authkey))
struct sctp_association *asoc;
int retval = -EINVAL;
- if (optlen < sizeof(struct sctp_reset_streams))
+ if (optlen < sizeof(*params))
return -EINVAL;
+ /* srs_number_streams is u16, so optlen can't be bigger than this. */
+ optlen = min_t(unsigned int, optlen, USHRT_MAX +
+ sizeof(__u16) * sizeof(*params));
params = memdup_user(optval, optlen);
if (IS_ERR(params))
return PTR_ERR(params);
+ if (params->srs_number_streams * sizeof(__u16) >
+ optlen - sizeof(*params))
+ goto out;
+
asoc = sctp_id2assoc(sk, params->srs_assoc_id);
if (!asoc)
goto out;
SCTP_DBG_OBJCNT_INC(sock);
local_bh_disable();
- percpu_counter_inc(&sctp_sockets_allocated);
+ sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);
/* Nothing can fail after this block, otherwise
}
sctp_endpoint_free(sp->ep);
local_bh_disable();
- percpu_counter_dec(&sctp_sockets_allocated);
+ sk_sockets_allocated_dec(sk);
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
local_bh_enable();
}
len = sizeof(int);
if (put_user(len, optlen))
return -EFAULT;
- if (copy_to_user(optval, &sctp_sk(sk)->autoclose, sizeof(int)))
+ if (copy_to_user(optval, &sctp_sk(sk)->autoclose, len))
return -EFAULT;
return 0;
}
err = -EFAULT;
goto out;
}
+ /* XXX: We should have accounted for sizeof(struct sctp_getaddrs) too,
+ * but we can't change it anymore.
+ */
if (put_user(bytes_copied, optlen))
err = -EFAULT;
out:
params.assoc_id = 0;
} else if (len >= sizeof(struct sctp_assoc_value)) {
len = sizeof(struct sctp_assoc_value);
- if (copy_from_user(¶ms, optval, sizeof(params)))
+ if (copy_from_user(¶ms, optval, len))
return -EFAULT;
} else
return -EINVAL;
if (len < sizeof(struct sctp_authkeyid))
return -EINVAL;
- if (copy_from_user(&val, optval, sizeof(struct sctp_authkeyid)))
+
+ len = sizeof(struct sctp_authkeyid);
+ if (copy_from_user(&val, optval, len))
return -EFAULT;
asoc = sctp_id2assoc(sk, val.scact_assoc_id);
else
val.scact_keynumber = ep->active_key_id;
- len = sizeof(struct sctp_authkeyid);
if (put_user(len, optlen))
return -EFAULT;
if (copy_to_user(optval, &val, len))
if (len < sizeof(struct sctp_authchunks))
return -EINVAL;
- if (copy_from_user(&val, optval, sizeof(struct sctp_authchunks)))
+ if (copy_from_user(&val, optval, sizeof(val)))
return -EFAULT;
to = p->gauth_chunks;
if (len < sizeof(struct sctp_authchunks))
return -EINVAL;
- if (copy_from_user(&val, optval, sizeof(struct sctp_authchunks)))
+ if (copy_from_user(&val, optval, sizeof(val)))
return -EFAULT;
to = p->gauth_chunks;
sctp_stream_outq_migrate(stream, NULL, outcnt);
sched->sched_all(stream);
- i = sctp_stream_alloc_out(stream, outcnt, gfp);
- if (i)
- return i;
+ ret = sctp_stream_alloc_out(stream, outcnt, gfp);
+ if (ret)
+ goto out;
stream->outcnt = outcnt;
for (i = 0; i < stream->outcnt; i++)
if (!incnt)
goto out;
- i = sctp_stream_alloc_in(stream, incnt, gfp);
- if (i) {
- ret = -ENOMEM;
- goto free;
+ ret = sctp_stream_alloc_in(stream, incnt, gfp);
+ if (ret) {
+ sched->free(stream);
+ kfree(stream->out);
+ stream->out = NULL;
+ stream->outcnt = 0;
+ goto out;
}
stream->incnt = incnt;
- goto out;
-free:
- sched->free(stream);
- kfree(stream->out);
- stream->out = NULL;
out:
return ret;
}
transport->pathmtu = SCTP_DEFAULT_MAXSEGMENT;
}
-void sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu)
+bool sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu)
{
struct dst_entry *dst = sctp_transport_dst_check(t);
+ bool change = true;
if (unlikely(pmtu < SCTP_DEFAULT_MINSEGMENT)) {
- pr_warn("%s: Reported pmtu %d too low, using default minimum of %d\n",
- __func__, pmtu, SCTP_DEFAULT_MINSEGMENT);
- /* Use default minimum segment size and disable
- * pmtu discovery on this transport.
- */
- t->pathmtu = SCTP_DEFAULT_MINSEGMENT;
- } else {
- t->pathmtu = pmtu;
+ pr_warn_ratelimited("%s: Reported pmtu %d too low, using default minimum of %d\n",
+ __func__, pmtu, SCTP_DEFAULT_MINSEGMENT);
+ /* Use default minimum segment instead */
+ pmtu = SCTP_DEFAULT_MINSEGMENT;
}
+ pmtu = SCTP_TRUNC4(pmtu);
if (dst) {
dst->ops->update_pmtu(dst, t->asoc->base.sk, NULL, pmtu);
dst = sctp_transport_dst_check(t);
}
- if (!dst)
+ if (!dst) {
t->af_specific->get_dst(t, &t->saddr, &t->fl, t->asoc->base.sk);
+ dst = t->dst;
+ }
+
+ if (dst) {
+ /* Re-fetch, as under layers may have a higher minimum size */
+ pmtu = SCTP_TRUNC4(dst_mtu(dst));
+ change = t->pathmtu != pmtu;
+ }
+ t->pathmtu = pmtu;
+
+ return change;
}
/* Caches the dst entry and source address for a transport's destination
void sctp_ulpq_renege(struct sctp_ulpq *ulpq, struct sctp_chunk *chunk,
gfp_t gfp)
{
- struct sctp_association *asoc;
- __u16 needed, freed;
-
- asoc = ulpq->asoc;
+ struct sctp_association *asoc = ulpq->asoc;
+ __u32 freed = 0;
+ __u16 needed;
- if (chunk) {
- needed = ntohs(chunk->chunk_hdr->length);
- needed -= sizeof(struct sctp_data_chunk);
- } else
- needed = SCTP_DEFAULT_MAXWINDOW;
-
- freed = 0;
+ needed = ntohs(chunk->chunk_hdr->length) -
+ sizeof(struct sctp_data_chunk);
if (skb_queue_empty(&asoc->base.sk->sk_receive_queue)) {
freed = sctp_ulpq_renege_order(ulpq, needed);
- if (freed < needed) {
+ if (freed < needed)
freed += sctp_ulpq_renege_frags(ulpq, needed - freed);
- }
}
/* If able to free enough room, accept this chunk. */
- if (chunk && (freed >= needed)) {
- int retval;
- retval = sctp_ulpq_tail_data(ulpq, chunk, gfp);
+ if (freed >= needed) {
+ int retval = sctp_ulpq_tail_data(ulpq, chunk, gfp);
/*
* Enter partial delivery if chunk has not been
* delivered; otherwise, drain the reassembly queue.
{
struct file *newfile;
int fd = get_unused_fd_flags(flags);
- if (unlikely(fd < 0))
+ if (unlikely(fd < 0)) {
+ sock_release(sock);
return fd;
+ }
newfile = sock_alloc_file(sock, flags, NULL);
if (likely(!IS_ERR(newfile))) {
core_initcall(sock_init); /* early initcall */
+static int __init jit_init(void)
+{
+#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+ bpf_jit_enable = 1;
+#endif
+ return 0;
+}
+pure_initcall(jit_init);
+
#ifdef CONFIG_PROC_FS
void socket_seq_show(struct seq_file *seq)
{
* allows a thread in BH context to safely check if the process
* lock is held. In this case, if the lock is held, queue work.
*/
- if (sock_owned_by_user(strp->sk)) {
+ if (sock_owned_by_user_nocheck(strp->sk)) {
queue_work(strp_wq, &strp->work);
return;
}
goto out_free_groups;
creds->cr_group_info->gid[i] = kgid;
}
+ groups_sort(creds->cr_group_info);
return 0;
out_free_groups:
goto out;
rsci.cred.cr_group_info->gid[i] = kgid;
}
+ groups_sort(rsci.cred.cr_group_info);
/* mech name */
len = qword_get(&mesg, buf, mlen);
ug.gi->gid[i] = kgid;
}
+ groups_sort(ug.gi);
ugp = unix_gid_lookup(cd, uid);
if (ugp) {
struct cache_head *ch;
kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv));
cred->cr_group_info->gid[i] = kgid;
}
+ groups_sort(cred->cr_group_info);
if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {
*authp = rpc_autherr_badverf;
return SVC_DENIED;
{
struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;
+ unsigned int connect_cookie;
int status, numreqs;
dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
} else if (!req->rq_bytes_sent)
return;
+ connect_cookie = xprt->connect_cookie;
req->rq_xtime = ktime_get();
status = xprt->ops->send_request(task);
trace_xprt_transmit(xprt, req->rq_xid, status);
xprt->stat.bklog_u += xprt->backlog.qlen;
xprt->stat.sending_u += xprt->sending.qlen;
xprt->stat.pending_u += xprt->pending.qlen;
+ spin_unlock_bh(&xprt->transport_lock);
- /* Don't race with disconnect */
- if (!xprt_connected(xprt))
- task->tk_status = -ENOTCONN;
- else {
+ req->rq_connect_cookie = connect_cookie;
+ if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
/*
- * Sleep on the pending queue since
- * we're expecting a reply.
+ * Sleep on the pending queue if we're expecting a reply.
+ * The spinlock ensures atomicity between the test of
+ * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on().
*/
- if (!req->rq_reply_bytes_recvd && rpc_reply_expected(task))
+ spin_lock(&xprt->recv_lock);
+ if (!req->rq_reply_bytes_recvd) {
rpc_sleep_on(&xprt->pending, task, xprt_timer);
- req->rq_connect_cookie = xprt->connect_cookie;
+ /*
+ * Send an extra queue wakeup call if the
+ * connection was dropped in case the call to
+ * rpc_sleep_on() raced.
+ */
+ if (!xprt_connected(xprt))
+ xprt_wake_pending_tasks(xprt, -ENOTCONN);
+ }
+ spin_unlock(&xprt->recv_lock);
}
- spin_unlock_bh(&xprt->transport_lock);
}
static void xprt_add_backlog(struct rpc_xprt *xprt, struct rpc_task *task)
dprintk("RPC: %s: reply %p completes request %p (xid 0x%08x)\n",
__func__, rep, req, be32_to_cpu(rep->rr_xid));
- if (list_empty(&req->rl_registered) &&
- !test_bit(RPCRDMA_REQ_F_TX_RESOURCES, &req->rl_flags))
- rpcrdma_complete_rqst(rep);
- else
- queue_work(rpcrdma_receive_wq, &rep->rr_work);
+ queue_work_on(req->rl_cpu, rpcrdma_receive_wq, &rep->rr_work);
return;
out_badstatus:
#include <linux/slab.h>
#include <linux/seq_file.h>
#include <linux/sunrpc/addr.h>
+#include <linux/smp.h>
#include "xprt_rdma.h"
task->tk_pid, __func__, rqst->rq_callsize,
rqst->rq_rcvsize, req);
+ req->rl_cpu = smp_processor_id();
req->rl_connect_cookie = 0; /* our reserved value */
rpcrdma_set_xprtdata(rqst, req);
rqst->rq_buffer = req->rl_sendbuf->rg_base;
struct workqueue_struct *recv_wq;
recv_wq = alloc_workqueue("xprtrdma_receive",
- WQ_MEM_RECLAIM | WQ_UNBOUND | WQ_HIGHPRI,
+ WQ_MEM_RECLAIM | WQ_HIGHPRI,
0);
if (!recv_wq)
return -ENOMEM;
struct rpcrdma_buffer;
struct rpcrdma_req {
struct list_head rl_list;
+ int rl_cpu;
unsigned int rl_connect_cookie;
struct rpcrdma_buffer *rl_buffer;
struct rpcrdma_rep *rl_reply;
if (res) {
pr_warn("Bearer <%s> rejected, enable failure (%d)\n",
name, -res);
+ kfree(b);
return -EINVAL;
}
if (skb)
tipc_bearer_xmit_skb(net, bearer_id, skb, &b->bcast_addr);
- if (tipc_mon_create(net, bearer_id))
+ if (tipc_mon_create(net, bearer_id)) {
+ bearer_disable(net, b);
return -ENOMEM;
+ }
pr_info("Enabled bearer <%s>, discovery domain %s, priority %u\n",
name,
static void tipc_group_decr_active(struct tipc_group *grp,
struct tipc_member *m)
{
- if (m->state == MBR_ACTIVE || m->state == MBR_RECLAIMING)
+ if (m->state == MBR_ACTIVE || m->state == MBR_RECLAIMING ||
+ m->state == MBR_REMITTED)
grp->active_cnt--;
}
if (m->window >= ADV_IDLE)
return;
- if (!list_empty(&m->congested))
- return;
+ list_del_init(&m->congested);
/* Sort member into congested members' list */
list_for_each_entry_safe(_m, tmp, &grp->congested, congested) {
u16 prev = grp->bc_snd_nxt - 1;
struct tipc_member *m;
struct rb_node *n;
+ u16 ackers = 0;
for (n = rb_first(&grp->members); n; n = rb_next(n)) {
m = container_of(n, struct tipc_member, tree_node);
if (tipc_group_is_enabled(m)) {
tipc_group_update_member(m, len);
m->bc_acked = prev;
+ ackers++;
}
}
/* Mark number of acknowledges to expect, if any */
if (ack)
- grp->bc_ackers = grp->member_cnt;
+ grp->bc_ackers = ackers;
grp->bc_snd_nxt++;
}
int max_active = grp->max_active;
int reclaim_limit = max_active * 3 / 4;
int active_cnt = grp->active_cnt;
- struct tipc_member *m, *rm;
+ struct tipc_member *m, *rm, *pm;
m = tipc_group_find_member(grp, node, port);
if (!m)
pr_warn_ratelimited("Rcv unexpected msg after REMIT\n");
tipc_group_proto_xmit(grp, m, GRP_ADV_MSG, xmitq);
}
+ grp->active_cnt--;
+ list_del_init(&m->list);
+ if (list_empty(&grp->pending))
+ return;
+
+ /* Set oldest pending member to active and advertise */
+ pm = list_first_entry(&grp->pending, struct tipc_member, list);
+ pm->state = MBR_ACTIVE;
+ list_move_tail(&pm->list, &grp->active);
+ grp->active_cnt++;
+ tipc_group_proto_xmit(grp, pm, GRP_ADV_MSG, xmitq);
break;
case MBR_RECLAIMING:
case MBR_DISCOVERED:
} else if (mtyp == GRP_REMIT_MSG) {
msg_set_grp_remitted(hdr, m->window);
}
+ msg_set_dest_droppable(hdr, true);
__skb_queue_tail(xmitq, skb);
}
msg_set_grp_bc_seqno(ehdr, m->bc_syncpt);
__skb_queue_tail(inputq, m->event_msg);
}
- if (m->window < ADV_IDLE)
- tipc_group_update_member(m, 0);
- else
- list_del_init(&m->congested);
+ list_del_init(&m->congested);
+ tipc_group_update_member(m, 0);
return;
case GRP_LEAVE_MSG:
if (!m)
return;
m->bc_syncpt = msg_grp_bc_syncpt(hdr);
+ list_del_init(&m->list);
+ list_del_init(&m->congested);
+ *usr_wakeup = true;
/* Wait until WITHDRAW event is received */
if (m->state != MBR_LEAVING) {
ehdr = buf_msg(m->event_msg);
msg_set_grp_bc_seqno(ehdr, m->bc_syncpt);
__skb_queue_tail(inputq, m->event_msg);
- *usr_wakeup = true;
- list_del_init(&m->congested);
return;
case GRP_ADV_MSG:
if (!m)
if (!m || m->state != MBR_RECLAIMING)
return;
- list_del_init(&m->list);
- grp->active_cnt--;
remitted = msg_grp_remitted(hdr);
/* Messages preceding the REMIT still in receive queue */
if (m->advertised > remitted) {
m->state = MBR_REMITTED;
in_flight = m->advertised - remitted;
+ m->advertised = ADV_IDLE + in_flight;
+ return;
}
/* All messages preceding the REMIT have been read */
if (m->advertised <= remitted) {
tipc_group_proto_xmit(grp, m, GRP_ADV_MSG, xmitq);
m->advertised = ADV_IDLE + in_flight;
+ grp->active_cnt--;
+ list_del_init(&m->list);
/* Set oldest pending member to active and advertise */
if (list_empty(&grp->pending))
*usr_wakeup = true;
m->usr_pending = false;
node_up = tipc_node_is_up(net, node);
-
- /* Hold back event if more messages might be expected */
- if (m->state != MBR_LEAVING && node_up) {
- m->event_msg = skb;
- tipc_group_decr_active(grp, m);
- m->state = MBR_LEAVING;
- } else {
- if (node_up)
+ m->event_msg = NULL;
+
+ if (node_up) {
+ /* Hold back event if a LEAVE msg should be expected */
+ if (m->state != MBR_LEAVING) {
+ m->event_msg = skb;
+ tipc_group_decr_active(grp, m);
+ m->state = MBR_LEAVING;
+ } else {
msg_set_grp_bc_seqno(hdr, m->bc_syncpt);
- else
+ __skb_queue_tail(inputq, skb);
+ }
+ } else {
+ if (m->state != MBR_LEAVING) {
+ tipc_group_decr_active(grp, m);
+ m->state = MBR_LEAVING;
msg_set_grp_bc_seqno(hdr, m->bc_rcv_nxt);
+ } else {
+ msg_set_grp_bc_seqno(hdr, m->bc_syncpt);
+ }
__skb_queue_tail(inputq, skb);
}
+ list_del_init(&m->list);
list_del_init(&m->congested);
}
*sk_rcvbuf = tipc_group_rcvbuf_limit(grp);
{
struct tipc_net *tn = tipc_net(net);
struct tipc_monitor *mon = tipc_monitor(net, bearer_id);
- struct tipc_peer *self = get_self(net, bearer_id);
+ struct tipc_peer *self;
struct tipc_peer *peer, *tmp;
+ if (!mon)
+ return;
+
+ self = get_self(net, bearer_id);
write_lock_bh(&mon->lock);
tn->monitors[bearer_id] = NULL;
list_for_each_entry_safe(peer, tmp, &self->list, list) {
switch (sk->sk_state) {
case TIPC_ESTABLISHED:
+ case TIPC_CONNECTING:
if (!tsk->cong_link_cnt && !tsk_conn_cong(tsk))
revents |= POLLOUT;
/* fall thru' */
case TIPC_LISTEN:
- case TIPC_CONNECTING:
if (!skb_queue_empty(&sk->sk_receive_queue))
revents |= POLLIN | POLLRDNORM;
break;
__skb_dequeue(arrvq);
__skb_queue_tail(inputq, skb);
}
- refcount_dec(&skb->users);
+ kfree_skb(skb);
spin_unlock_bh(&inputq->lock);
continue;
}
cfg80211-y += extra-certs.o
endif
-$(obj)/shipped-certs.c: $(wildcard $(srctree)/$(src)/certs/*.x509)
+$(obj)/shipped-certs.c: $(wildcard $(srctree)/$(src)/certs/*.hex)
@$(kecho) " GEN $@"
- @echo '#include "reg.h"' > $@
- @echo 'const u8 shipped_regdb_certs[] = {' >> $@
- @for f in $^ ; do hexdump -v -e '1/1 "0x%.2x," "\n"' < $$f >> $@ ; done
- @echo '};' >> $@
- @echo 'unsigned int shipped_regdb_certs_len = sizeof(shipped_regdb_certs);' >> $@
+ @(echo '#include "reg.h"'; \
+ echo 'const u8 shipped_regdb_certs[] = {'; \
+ cat $^ ; \
+ echo '};'; \
+ echo 'unsigned int shipped_regdb_certs_len = sizeof(shipped_regdb_certs);'; \
+ ) > $@
$(obj)/extra-certs.c: $(CONFIG_CFG80211_EXTRA_REGDB_KEYDIR:"%"=%) \
$(wildcard $(CONFIG_CFG80211_EXTRA_REGDB_KEYDIR:"%"=%)/*.x509)
@$(kecho) " GEN $@"
- @echo '#include "reg.h"' > $@
- @echo 'const u8 extra_regdb_certs[] = {' >> $@
- @for f in $^ ; do test -f $$f && hexdump -v -e '1/1 "0x%.2x," "\n"' < $$f >> $@ || true ; done
- @echo '};' >> $@
- @echo 'unsigned int extra_regdb_certs_len = sizeof(extra_regdb_certs);' >> $@
+ @(set -e; \
+ allf=""; \
+ for f in $^ ; do \
+ # similar to hexdump -v -e '1/1 "0x%.2x," "\n"' \
+ thisf=$$(od -An -v -tx1 < $$f | \
+ sed -e 's/ /\n/g' | \
+ sed -e 's/^[0-9a-f]\+$$/\0/;t;d' | \
+ sed -e 's/^/0x/;s/$$/,/'); \
+ # file should not be empty - maybe command substitution failed? \
+ test ! -z "$$thisf";\
+ allf=$$allf$$thisf;\
+ done; \
+ ( \
+ echo '#include "reg.h"'; \
+ echo 'const u8 extra_regdb_certs[] = {'; \
+ echo "$$allf"; \
+ echo '};'; \
+ echo 'unsigned int extra_regdb_certs_len = sizeof(extra_regdb_certs);'; \
+ ) > $@)
+
+clean-files += shipped-certs.c extra-certs.c
--- /dev/null
+/* Seth Forshee's regdb certificate */
+0x30, 0x82, 0x02, 0xa4, 0x30, 0x82, 0x01, 0x8c,
+0x02, 0x09, 0x00, 0xb2, 0x8d, 0xdf, 0x47, 0xae,
+0xf9, 0xce, 0xa7, 0x30, 0x0d, 0x06, 0x09, 0x2a,
+0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x01, 0x0b,
+0x05, 0x00, 0x30, 0x13, 0x31, 0x11, 0x30, 0x0f,
+0x06, 0x03, 0x55, 0x04, 0x03, 0x0c, 0x08, 0x73,
+0x66, 0x6f, 0x72, 0x73, 0x68, 0x65, 0x65, 0x30,
+0x20, 0x17, 0x0d, 0x31, 0x37, 0x31, 0x30, 0x30,
+0x36, 0x31, 0x39, 0x34, 0x30, 0x33, 0x35, 0x5a,
+0x18, 0x0f, 0x32, 0x31, 0x31, 0x37, 0x30, 0x39,
+0x31, 0x32, 0x31, 0x39, 0x34, 0x30, 0x33, 0x35,
+0x5a, 0x30, 0x13, 0x31, 0x11, 0x30, 0x0f, 0x06,
+0x03, 0x55, 0x04, 0x03, 0x0c, 0x08, 0x73, 0x66,
+0x6f, 0x72, 0x73, 0x68, 0x65, 0x65, 0x30, 0x82,
+0x01, 0x22, 0x30, 0x0d, 0x06, 0x09, 0x2a, 0x86,
+0x48, 0x86, 0xf7, 0x0d, 0x01, 0x01, 0x01, 0x05,
+0x00, 0x03, 0x82, 0x01, 0x0f, 0x00, 0x30, 0x82,
+0x01, 0x0a, 0x02, 0x82, 0x01, 0x01, 0x00, 0xb5,
+0x40, 0xe3, 0x9c, 0x28, 0x84, 0x39, 0x03, 0xf2,
+0x39, 0xd7, 0x66, 0x2c, 0x41, 0x38, 0x15, 0xac,
+0x7e, 0xa5, 0x83, 0x71, 0x25, 0x7e, 0x90, 0x7c,
+0x68, 0xdd, 0x6f, 0x3f, 0xd9, 0xd7, 0x59, 0x38,
+0x9f, 0x7c, 0x6a, 0x52, 0xc2, 0x03, 0x2a, 0x2d,
+0x7e, 0x66, 0xf4, 0x1e, 0xb3, 0x12, 0x70, 0x20,
+0x5b, 0xd4, 0x97, 0x32, 0x3d, 0x71, 0x8b, 0x3b,
+0x1b, 0x08, 0x17, 0x14, 0x6b, 0x61, 0xc4, 0x57,
+0x8b, 0x96, 0x16, 0x1c, 0xfd, 0x24, 0xd5, 0x0b,
+0x09, 0xf9, 0x68, 0x11, 0x84, 0xfb, 0xca, 0x51,
+0x0c, 0xd1, 0x45, 0x19, 0xda, 0x10, 0x44, 0x8a,
+0xd9, 0xfe, 0x76, 0xa9, 0xfd, 0x60, 0x2d, 0x18,
+0x0b, 0x28, 0x95, 0xb2, 0x2d, 0xea, 0x88, 0x98,
+0xb8, 0xd1, 0x56, 0x21, 0xf0, 0x53, 0x1f, 0xf1,
+0x02, 0x6f, 0xe9, 0x46, 0x9b, 0x93, 0x5f, 0x28,
+0x90, 0x0f, 0xac, 0x36, 0xfa, 0x68, 0x23, 0x71,
+0x57, 0x56, 0xf6, 0xcc, 0xd3, 0xdf, 0x7d, 0x2a,
+0xd9, 0x1b, 0x73, 0x45, 0xeb, 0xba, 0x27, 0x85,
+0xef, 0x7a, 0x7f, 0xa5, 0xcb, 0x80, 0xc7, 0x30,
+0x36, 0xd2, 0x53, 0xee, 0xec, 0xac, 0x1e, 0xe7,
+0x31, 0xf1, 0x36, 0xa2, 0x9c, 0x63, 0xc6, 0x65,
+0x5b, 0x7f, 0x25, 0x75, 0x68, 0xa1, 0xea, 0xd3,
+0x7e, 0x00, 0x5c, 0x9a, 0x5e, 0xd8, 0x20, 0x18,
+0x32, 0x77, 0x07, 0x29, 0x12, 0x66, 0x1e, 0x36,
+0x73, 0xe7, 0x97, 0x04, 0x41, 0x37, 0xb1, 0xb1,
+0x72, 0x2b, 0xf4, 0xa1, 0x29, 0x20, 0x7c, 0x96,
+0x79, 0x0b, 0x2b, 0xd0, 0xd8, 0xde, 0xc8, 0x6c,
+0x3f, 0x93, 0xfb, 0xc5, 0xee, 0x78, 0x52, 0x11,
+0x15, 0x1b, 0x7a, 0xf6, 0xe2, 0x68, 0x99, 0xe7,
+0xfb, 0x46, 0x16, 0x84, 0xe3, 0xc7, 0xa1, 0xe6,
+0xe0, 0xd2, 0x46, 0xd5, 0xe1, 0xc4, 0x5f, 0xa0,
+0x66, 0xf4, 0xda, 0xc4, 0xff, 0x95, 0x1d, 0x02,
+0x03, 0x01, 0x00, 0x01, 0x30, 0x0d, 0x06, 0x09,
+0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x01,
+0x0b, 0x05, 0x00, 0x03, 0x82, 0x01, 0x01, 0x00,
+0x87, 0x03, 0xda, 0xf2, 0x82, 0xc2, 0xdd, 0xaf,
+0x7c, 0x44, 0x2f, 0x86, 0xd3, 0x5f, 0x4c, 0x93,
+0x48, 0xb9, 0xfe, 0x07, 0x17, 0xbb, 0x21, 0xf7,
+0x25, 0x23, 0x4e, 0xaa, 0x22, 0x0c, 0x16, 0xb9,
+0x73, 0xae, 0x9d, 0x46, 0x7c, 0x75, 0xd9, 0xc3,
+0x49, 0x57, 0x47, 0xbf, 0x33, 0xb7, 0x97, 0xec,
+0xf5, 0x40, 0x75, 0xc0, 0x46, 0x22, 0xf0, 0xa0,
+0x5d, 0x9c, 0x79, 0x13, 0xa1, 0xff, 0xb8, 0xa3,
+0x2f, 0x7b, 0x8e, 0x06, 0x3f, 0xc8, 0xb6, 0xe4,
+0x6a, 0x28, 0xf2, 0x34, 0x5c, 0x23, 0x3f, 0x32,
+0xc0, 0xe6, 0xad, 0x0f, 0xac, 0xcf, 0x55, 0x74,
+0x47, 0x73, 0xd3, 0x01, 0x85, 0xb7, 0x0b, 0x22,
+0x56, 0x24, 0x7d, 0x9f, 0x09, 0xa9, 0x0e, 0x86,
+0x9e, 0x37, 0x5b, 0x9c, 0x6d, 0x02, 0xd9, 0x8c,
+0xc8, 0x50, 0x6a, 0xe2, 0x59, 0xf3, 0x16, 0x06,
+0xea, 0xb2, 0x42, 0xb5, 0x58, 0xfe, 0xba, 0xd1,
+0x81, 0x57, 0x1a, 0xef, 0xb2, 0x38, 0x88, 0x58,
+0xf6, 0xaa, 0xc4, 0x2e, 0x8b, 0x5a, 0x27, 0xe4,
+0xa5, 0xe8, 0xa4, 0xca, 0x67, 0x5c, 0xac, 0x72,
+0x67, 0xc3, 0x6f, 0x13, 0xc3, 0x2d, 0x35, 0x79,
+0xd7, 0x8a, 0xe7, 0xf5, 0xd4, 0x21, 0x30, 0x4a,
+0xd5, 0xf6, 0xa3, 0xd9, 0x79, 0x56, 0xf2, 0x0f,
+0x10, 0xf7, 0x7d, 0xd0, 0x51, 0x93, 0x2f, 0x47,
+0xf8, 0x7d, 0x4b, 0x0a, 0x84, 0x55, 0x12, 0x0a,
+0x7d, 0x4e, 0x3b, 0x1f, 0x2b, 0x2f, 0xfc, 0x28,
+0xb3, 0x69, 0x34, 0xe1, 0x80, 0x80, 0xbb, 0xe2,
+0xaf, 0xb9, 0xd6, 0x30, 0xf1, 0x1d, 0x54, 0x87,
+0x23, 0x99, 0x9f, 0x51, 0x03, 0x4c, 0x45, 0x7d,
+0x02, 0x65, 0x73, 0xab, 0xfd, 0xcf, 0x94, 0xcc,
+0x0d, 0x3a, 0x60, 0xfd, 0x3c, 0x14, 0x2f, 0x16,
+0x33, 0xa9, 0x21, 0x1f, 0xcb, 0x50, 0xb1, 0x8f,
+0x03, 0xee, 0xa0, 0x66, 0xa9, 0x16, 0x79, 0x14,
case NL80211_IFTYPE_AP:
if (wdev->ssid_len &&
nla_put(msg, NL80211_ATTR_SSID, wdev->ssid_len, wdev->ssid))
- goto nla_put_failure;
+ goto nla_put_failure_locked;
break;
case NL80211_IFTYPE_STATION:
case NL80211_IFTYPE_P2P_CLIENT:
if (!ssid_ie)
break;
if (nla_put(msg, NL80211_ATTR_SSID, ssid_ie[1], ssid_ie + 2))
- goto nla_put_failure;
+ goto nla_put_failure_locked;
break;
}
default:
genlmsg_end(msg, hdr);
return 0;
+ nla_put_failure_locked:
+ wdev_unlock(wdev);
nla_put_failure:
genlmsg_cancel(msg, hdr);
return -EMSGSIZE;
break;
case NL80211_NAN_FUNC_FOLLOW_UP:
if (!tb[NL80211_NAN_FUNC_FOLLOW_UP_ID] ||
- !tb[NL80211_NAN_FUNC_FOLLOW_UP_REQ_ID]) {
+ !tb[NL80211_NAN_FUNC_FOLLOW_UP_REQ_ID] ||
+ !tb[NL80211_NAN_FUNC_FOLLOW_UP_DEST]) {
err = -EINVAL;
goto out;
}
*
*/
+#include <linux/bottom_half.h>
+#include <linux/interrupt.h>
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/netdevice.h>
+#include <linux/percpu.h>
#include <net/dst.h>
#include <net/ip.h>
#include <net/xfrm.h>
#include <net/ip_tunnels.h>
#include <net/ip6_tunnel.h>
+struct xfrm_trans_tasklet {
+ struct tasklet_struct tasklet;
+ struct sk_buff_head queue;
+};
+
+struct xfrm_trans_cb {
+ int (*finish)(struct net *net, struct sock *sk, struct sk_buff *skb);
+};
+
+#define XFRM_TRANS_SKB_CB(__skb) ((struct xfrm_trans_cb *)&((__skb)->cb[0]))
+
static struct kmem_cache *secpath_cachep __read_mostly;
static DEFINE_SPINLOCK(xfrm_input_afinfo_lock);
static struct gro_cells gro_cells;
static struct net_device xfrm_napi_dev;
+static DEFINE_PER_CPU(struct xfrm_trans_tasklet, xfrm_trans_tasklet);
+
int xfrm_input_register_afinfo(const struct xfrm_input_afinfo *afinfo)
{
int err = 0;
xfrm_address_t *daddr;
struct xfrm_mode *inner_mode;
u32 mark = skb->mark;
- unsigned int family;
+ unsigned int family = AF_UNSPEC;
int decaps = 0;
int async = 0;
bool xfrm_gro = false;
if (encap_type < 0) {
x = xfrm_input_state(skb);
+
+ if (unlikely(x->km.state != XFRM_STATE_VALID)) {
+ if (x->km.state == XFRM_STATE_ACQ)
+ XFRM_INC_STATS(net, LINUX_MIB_XFRMACQUIREERROR);
+ else
+ XFRM_INC_STATS(net,
+ LINUX_MIB_XFRMINSTATEINVALID);
+ goto drop;
+ }
+
family = x->outer_mode->afinfo->family;
/* An encap_type of -1 indicates async resumption. */
}
EXPORT_SYMBOL(xfrm_input_resume);
+static void xfrm_trans_reinject(unsigned long data)
+{
+ struct xfrm_trans_tasklet *trans = (void *)data;
+ struct sk_buff_head queue;
+ struct sk_buff *skb;
+
+ __skb_queue_head_init(&queue);
+ skb_queue_splice_init(&trans->queue, &queue);
+
+ while ((skb = __skb_dequeue(&queue)))
+ XFRM_TRANS_SKB_CB(skb)->finish(dev_net(skb->dev), NULL, skb);
+}
+
+int xfrm_trans_queue(struct sk_buff *skb,
+ int (*finish)(struct net *, struct sock *,
+ struct sk_buff *))
+{
+ struct xfrm_trans_tasklet *trans;
+
+ trans = this_cpu_ptr(&xfrm_trans_tasklet);
+
+ if (skb_queue_len(&trans->queue) >= netdev_max_backlog)
+ return -ENOBUFS;
+
+ XFRM_TRANS_SKB_CB(skb)->finish = finish;
+ skb_queue_tail(&trans->queue, skb);
+ tasklet_schedule(&trans->tasklet);
+ return 0;
+}
+EXPORT_SYMBOL(xfrm_trans_queue);
+
void __init xfrm_input_init(void)
{
int err;
+ int i;
init_dummy_netdev(&xfrm_napi_dev);
err = gro_cells_init(&gro_cells, &xfrm_napi_dev);
sizeof(struct sec_path),
0, SLAB_HWCACHE_ALIGN|SLAB_PANIC,
NULL);
+
+ for_each_possible_cpu(i) {
+ struct xfrm_trans_tasklet *trans;
+
+ trans = &per_cpu(xfrm_trans_tasklet, i);
+ __skb_queue_head_init(&trans->queue);
+ tasklet_init(&trans->tasklet, xfrm_trans_reinject,
+ (unsigned long)trans);
+ }
}
again:
pol = rcu_dereference(sk->sk_policy[dir]);
if (pol != NULL) {
- bool match = xfrm_selector_match(&pol->selector, fl, family);
+ bool match;
int err = 0;
+ if (pol->family != family) {
+ pol = NULL;
+ goto out;
+ }
+
+ match = xfrm_selector_match(&pol->selector, fl, family);
if (match) {
if ((sk->sk_mark & pol->mark.m) != pol->mark.v) {
pol = NULL;
sizeof(struct xfrm_policy *) * num_pols) == 0 &&
xfrm_xdst_can_reuse(xdst, xfrm, err)) {
dst_hold(&xdst->u.dst);
+ xfrm_pols_put(pols, num_pols);
while (err > 0)
xfrm_state_put(xfrm[--err]);
return xdst;
if (orig->aead) {
x->aead = xfrm_algo_aead_clone(orig->aead);
+ x->geniv = orig->geniv;
if (!x->aead)
goto error;
}
static int validate_tmpl(int nr, struct xfrm_user_tmpl *ut, u16 family)
{
+ u16 prev_family;
int i;
if (nr > XFRM_MAX_DEPTH)
return -EINVAL;
+ prev_family = family;
+
for (i = 0; i < nr; i++) {
/* We never validated the ut->family value, so many
* applications simply leave it at zero. The check was
if (!ut[i].family)
ut[i].family = family;
+ if ((ut[i].mode == XFRM_MODE_TRANSPORT) &&
+ (ut[i].family != prev_family))
+ return -EINVAL;
+
+ prev_family = ut[i].family;
+
switch (ut[i].family) {
case AF_INET:
break;
default:
return -EINVAL;
}
+
+ switch (ut[i].id.proto) {
+ case IPPROTO_AH:
+ case IPPROTO_ESP:
+ case IPPROTO_COMP:
+#if IS_ENABLED(CONFIG_IPV6)
+ case IPPROTO_ROUTING:
+ case IPPROTO_DSTOPTS:
+#endif
+ case IPSEC_PROTO_ANY:
+ break;
+ default:
+ return -EINVAL;
+ }
+
}
return 0;
[XFRMA_PROTO] = { .type = NLA_U8 },
[XFRMA_ADDRESS_FILTER] = { .len = sizeof(struct xfrm_address_filter) },
[XFRMA_OFFLOAD_DEV] = { .len = sizeof(struct xfrm_user_offload) },
- [XFRMA_OUTPUT_MARK] = { .len = NLA_U32 },
+ [XFRMA_OUTPUT_MARK] = { .type = NLA_U32 },
};
static const struct nla_policy xfrma_spd_policy[XFRMA_SPD_MAX+1] = {
}
}
-# whine about ACCESS_ONCE
- if ($^V && $^V ge 5.10.0 &&
- $line =~ /\bACCESS_ONCE\s*$balanced_parens\s*(=(?!=))?\s*($FuncArg)?/) {
- my $par = $1;
- my $eq = $2;
- my $fun = $3;
- $par =~ s/^\(\s*(.*)\s*\)$/$1/;
- if (defined($eq)) {
- if (WARN("PREFER_WRITE_ONCE",
- "Prefer WRITE_ONCE(<FOO>, <BAR>) over ACCESS_ONCE(<FOO>) = <BAR>\n" . $herecurr) &&
- $fix) {
- $fixed[$fixlinenr] =~ s/\bACCESS_ONCE\s*\(\s*\Q$par\E\s*\)\s*$eq\s*\Q$fun\E/WRITE_ONCE($par, $fun)/;
- }
- } else {
- if (WARN("PREFER_READ_ONCE",
- "Prefer READ_ONCE(<FOO>) over ACCESS_ONCE(<FOO>)\n" . $herecurr) &&
- $fix) {
- $fixed[$fixlinenr] =~ s/\bACCESS_ONCE\s*\(\s*\Q$par\E\s*\)/READ_ONCE($par)/;
- }
- }
- }
-
# check for mutex_trylock_recursive usage
if ($line =~ /mutex_trylock_recursive/) {
ERROR("LOCKING",
set -o errexit
set -o nounset
-READELF="${CROSS_COMPILE}readelf"
-ADDR2LINE="${CROSS_COMPILE}addr2line"
-SIZE="${CROSS_COMPILE}size"
-NM="${CROSS_COMPILE}nm"
+READELF="${CROSS_COMPILE:-}readelf"
+ADDR2LINE="${CROSS_COMPILE:-}addr2line"
+SIZE="${CROSS_COMPILE:-}size"
+NM="${CROSS_COMPILE:-}nm"
command -v awk >/dev/null 2>&1 || die "awk isn't installed"
command -v ${READELF} >/dev/null 2>&1 || die "readelf isn't installed"
implement socket and networking access controls.
If you are unsure how to answer this question, answer N.
+config PAGE_TABLE_ISOLATION
+ bool "Remove the kernel mapping in user mode"
+ default y
+ depends on X86_64 && !UML
+ help
+ This feature reduces the number of hardware side channels by
+ ensuring that the majority of kernel addresses are not mapped
+ into userspace.
+
+ See Documentation/x86/pagetable-isolation.txt for more details.
+
config SECURITY_INFINIBAND
bool "Infiniband Security Hooks"
depends on SECURITY && INFINIBAND
AA_BUG(!mntpath);
AA_BUG(!buffer);
+ if (!PROFILE_MEDIATES(profile, AA_CLASS_MOUNT))
+ return 0;
+
error = aa_path_name(mntpath, path_flags(profile, mntpath), buffer,
&mntpnt, &info, profile->disconnected);
if (error)
AA_BUG(!profile);
AA_BUG(devpath && !devbuffer);
+ if (!PROFILE_MEDIATES(profile, AA_CLASS_MOUNT))
+ return 0;
+
if (devpath) {
error = aa_path_name(devpath, path_flags(profile, devpath),
devbuffer, &devname, &info,
AA_BUG(!profile);
AA_BUG(!path);
+ if (!PROFILE_MEDIATES(profile, AA_CLASS_MOUNT))
+ return 0;
+
error = aa_path_name(path, path_flags(profile, path), buffer, &name,
&info, profile->disconnected);
if (error)
AA_BUG(!new_path);
AA_BUG(!old_path);
- if (profile_unconfined(profile))
+ if (profile_unconfined(profile) ||
+ !PROFILE_MEDIATES(profile, AA_CLASS_MOUNT))
return aa_get_newest_label(&profile->label);
error = aa_path_name(old_path, path_flags(profile, old_path),
return m & ~VFS_CAP_FLAGS_EFFECTIVE;
}
-static bool is_v2header(size_t size, __le32 magic)
+static bool is_v2header(size_t size, const struct vfs_cap_data *cap)
{
- __u32 m = le32_to_cpu(magic);
if (size != XATTR_CAPS_SZ_2)
return false;
- return sansflags(m) == VFS_CAP_REVISION_2;
+ return sansflags(le32_to_cpu(cap->magic_etc)) == VFS_CAP_REVISION_2;
}
-static bool is_v3header(size_t size, __le32 magic)
+static bool is_v3header(size_t size, const struct vfs_cap_data *cap)
{
- __u32 m = le32_to_cpu(magic);
-
if (size != XATTR_CAPS_SZ_3)
return false;
- return sansflags(m) == VFS_CAP_REVISION_3;
+ return sansflags(le32_to_cpu(cap->magic_etc)) == VFS_CAP_REVISION_3;
}
/*
fs_ns = inode->i_sb->s_user_ns;
cap = (struct vfs_cap_data *) tmpbuf;
- if (is_v2header((size_t) ret, cap->magic_etc)) {
+ if (is_v2header((size_t) ret, cap)) {
/* If this is sizeof(vfs_cap_data) then we're ok with the
* on-disk value, so return that. */
if (alloc)
else
kfree(tmpbuf);
return ret;
- } else if (!is_v3header((size_t) ret, cap->magic_etc)) {
+ } else if (!is_v3header((size_t) ret, cap)) {
kfree(tmpbuf);
return -EINVAL;
}
return make_kuid(task_ns, rootid);
}
-static bool validheader(size_t size, __le32 magic)
+static bool validheader(size_t size, const struct vfs_cap_data *cap)
{
- return is_v2header(size, magic) || is_v3header(size, magic);
+ return is_v2header(size, cap) || is_v3header(size, cap);
}
/*
if (!*ivalue)
return -EINVAL;
- if (!validheader(size, cap->magic_etc))
+ if (!validheader(size, cap))
return -EINVAL;
if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
return -EPERM;
v = snd_pcm_hw_param_last(pcm, params, var, dir);
else
v = snd_pcm_hw_param_first(pcm, params, var, dir);
- snd_BUG_ON(v < 0);
return v;
}
if ((tmp = snd_pcm_oss_make_ready(substream)) < 0)
return tmp;
- mutex_lock(&runtime->oss.params_lock);
while (bytes > 0) {
+ if (mutex_lock_interruptible(&runtime->oss.params_lock)) {
+ tmp = -ERESTARTSYS;
+ break;
+ }
if (bytes < runtime->oss.period_bytes || runtime->oss.buffer_used > 0) {
tmp = bytes;
if (tmp + runtime->oss.buffer_used > runtime->oss.period_bytes)
xfer += tmp;
if ((substream->f_flags & O_NONBLOCK) != 0 &&
tmp != runtime->oss.period_bytes)
- break;
+ tmp = -EAGAIN;
}
- }
- mutex_unlock(&runtime->oss.params_lock);
- return xfer;
-
err:
- mutex_unlock(&runtime->oss.params_lock);
+ mutex_unlock(&runtime->oss.params_lock);
+ if (tmp < 0)
+ break;
+ if (signal_pending(current)) {
+ tmp = -ERESTARTSYS;
+ break;
+ }
+ tmp = 0;
+ }
return xfer > 0 ? (snd_pcm_sframes_t)xfer : tmp;
}
if ((tmp = snd_pcm_oss_make_ready(substream)) < 0)
return tmp;
- mutex_lock(&runtime->oss.params_lock);
while (bytes > 0) {
+ if (mutex_lock_interruptible(&runtime->oss.params_lock)) {
+ tmp = -ERESTARTSYS;
+ break;
+ }
if (bytes < runtime->oss.period_bytes || runtime->oss.buffer_used > 0) {
if (runtime->oss.buffer_used == 0) {
tmp = snd_pcm_oss_read2(substream, runtime->oss.buffer, runtime->oss.period_bytes, 1);
bytes -= tmp;
xfer += tmp;
}
- }
- mutex_unlock(&runtime->oss.params_lock);
- return xfer;
-
err:
- mutex_unlock(&runtime->oss.params_lock);
+ mutex_unlock(&runtime->oss.params_lock);
+ if (tmp < 0)
+ break;
+ if (signal_pending(current)) {
+ tmp = -ERESTARTSYS;
+ break;
+ }
+ tmp = 0;
+ }
return xfer > 0 ? (snd_pcm_sframes_t)xfer : tmp;
}
snd_pcm_sframes_t frames = size;
plugin = snd_pcm_plug_first(plug);
- while (plugin && frames > 0) {
+ while (plugin) {
+ if (frames <= 0)
+ return frames;
if ((next = plugin->next) != NULL) {
snd_pcm_sframes_t frames1 = frames;
- if (plugin->dst_frames)
+ if (plugin->dst_frames) {
frames1 = plugin->dst_frames(plugin, frames);
+ if (frames1 <= 0)
+ return frames1;
+ }
if ((err = next->client_channels(next, frames1, &dst_channels)) < 0) {
return err;
}
if (err != frames1) {
frames = err;
- if (plugin->src_frames)
+ if (plugin->src_frames) {
frames = plugin->src_frames(plugin, frames1);
+ if (frames <= 0)
+ return frames;
+ }
}
} else
dst_channels = NULL;
return changed;
if (params->rmask) {
int err = snd_pcm_hw_refine(pcm, params);
- if (snd_BUG_ON(err < 0))
+ if (err < 0)
return err;
}
return snd_pcm_hw_param_value(params, var, dir);
return changed;
if (params->rmask) {
int err = snd_pcm_hw_refine(pcm, params);
- if (snd_BUG_ON(err < 0))
+ if (err < 0)
return err;
}
return snd_pcm_hw_param_value(params, var, dir);
return ret < 0 ? ret : frames;
}
-/* decrease the appl_ptr; returns the processed frames or a negative error */
+/* decrease the appl_ptr; returns the processed frames or zero for error */
static snd_pcm_sframes_t rewind_appl_ptr(struct snd_pcm_substream *substream,
snd_pcm_uframes_t frames,
snd_pcm_sframes_t avail)
if (appl_ptr < 0)
appl_ptr += runtime->boundary;
ret = pcm_lib_apply_appl_ptr(substream, appl_ptr);
- return ret < 0 ? ret : frames;
+ /* NOTE: we return zero for errors because PulseAudio gets depressed
+ * upon receiving an error from rewind ioctl and stops processing
+ * any longer. Returning zero means that no rewind is done, so
+ * it's not absolutely wrong to answer like that.
+ */
+ return ret < 0 ? 0 : frames;
}
static snd_pcm_sframes_t snd_pcm_playback_rewind(struct snd_pcm_substream *substream,
return 0;
}
-int snd_rawmidi_info_select(struct snd_card *card, struct snd_rawmidi_info *info)
+static int __snd_rawmidi_info_select(struct snd_card *card,
+ struct snd_rawmidi_info *info)
{
struct snd_rawmidi *rmidi;
struct snd_rawmidi_str *pstr;
struct snd_rawmidi_substream *substream;
- mutex_lock(®ister_mutex);
rmidi = snd_rawmidi_search(card, info->device);
- mutex_unlock(®ister_mutex);
if (!rmidi)
return -ENXIO;
if (info->stream < 0 || info->stream > 1)
}
return -ENXIO;
}
+
+int snd_rawmidi_info_select(struct snd_card *card, struct snd_rawmidi_info *info)
+{
+ int ret;
+
+ mutex_lock(®ister_mutex);
+ ret = __snd_rawmidi_info_select(card, info);
+ mutex_unlock(®ister_mutex);
+ return ret;
+}
EXPORT_SYMBOL(snd_rawmidi_info_select);
static int snd_rawmidi_info_select_user(struct snd_card *card,
#include <sound/core.h>
#include <sound/control.h>
#include <sound/pcm.h>
+#include <sound/pcm_params.h>
#include <sound/info.h>
#include <sound/initval.h>
return 0;
}
-static void params_change_substream(struct loopback_pcm *dpcm,
- struct snd_pcm_runtime *runtime)
-{
- struct snd_pcm_runtime *dst_runtime;
-
- if (dpcm == NULL || dpcm->substream == NULL)
- return;
- dst_runtime = dpcm->substream->runtime;
- if (dst_runtime == NULL)
- return;
- dst_runtime->hw = dpcm->cable->hw;
-}
-
static void params_change(struct snd_pcm_substream *substream)
{
struct snd_pcm_runtime *runtime = substream->runtime;
cable->hw.rate_max = runtime->rate;
cable->hw.channels_min = runtime->channels;
cable->hw.channels_max = runtime->channels;
- params_change_substream(cable->streams[SNDRV_PCM_STREAM_PLAYBACK],
- runtime);
- params_change_substream(cable->streams[SNDRV_PCM_STREAM_CAPTURE],
- runtime);
}
static int loopback_prepare(struct snd_pcm_substream *substream)
static int rule_format(struct snd_pcm_hw_params *params,
struct snd_pcm_hw_rule *rule)
{
+ struct loopback_pcm *dpcm = rule->private;
+ struct loopback_cable *cable = dpcm->cable;
+ struct snd_mask m;
- struct snd_pcm_hardware *hw = rule->private;
- struct snd_mask *maskp = hw_param_mask(params, rule->var);
-
- maskp->bits[0] &= (u_int32_t)hw->formats;
- maskp->bits[1] &= (u_int32_t)(hw->formats >> 32);
- memset(maskp->bits + 2, 0, (SNDRV_MASK_MAX-64) / 8); /* clear rest */
- if (! maskp->bits[0] && ! maskp->bits[1])
- return -EINVAL;
- return 0;
+ snd_mask_none(&m);
+ mutex_lock(&dpcm->loopback->cable_lock);
+ m.bits[0] = (u_int32_t)cable->hw.formats;
+ m.bits[1] = (u_int32_t)(cable->hw.formats >> 32);
+ mutex_unlock(&dpcm->loopback->cable_lock);
+ return snd_mask_refine(hw_param_mask(params, rule->var), &m);
}
static int rule_rate(struct snd_pcm_hw_params *params,
struct snd_pcm_hw_rule *rule)
{
- struct snd_pcm_hardware *hw = rule->private;
+ struct loopback_pcm *dpcm = rule->private;
+ struct loopback_cable *cable = dpcm->cable;
struct snd_interval t;
- t.min = hw->rate_min;
- t.max = hw->rate_max;
+ mutex_lock(&dpcm->loopback->cable_lock);
+ t.min = cable->hw.rate_min;
+ t.max = cable->hw.rate_max;
+ mutex_unlock(&dpcm->loopback->cable_lock);
t.openmin = t.openmax = 0;
t.integer = 0;
return snd_interval_refine(hw_param_interval(params, rule->var), &t);
static int rule_channels(struct snd_pcm_hw_params *params,
struct snd_pcm_hw_rule *rule)
{
- struct snd_pcm_hardware *hw = rule->private;
+ struct loopback_pcm *dpcm = rule->private;
+ struct loopback_cable *cable = dpcm->cable;
struct snd_interval t;
- t.min = hw->channels_min;
- t.max = hw->channels_max;
+ mutex_lock(&dpcm->loopback->cable_lock);
+ t.min = cable->hw.channels_min;
+ t.max = cable->hw.channels_max;
+ mutex_unlock(&dpcm->loopback->cable_lock);
t.openmin = t.openmax = 0;
t.integer = 0;
return snd_interval_refine(hw_param_interval(params, rule->var), &t);
}
+static void free_cable(struct snd_pcm_substream *substream)
+{
+ struct loopback *loopback = substream->private_data;
+ int dev = get_cable_index(substream);
+ struct loopback_cable *cable;
+
+ cable = loopback->cables[substream->number][dev];
+ if (!cable)
+ return;
+ if (cable->streams[!substream->stream]) {
+ /* other stream is still alive */
+ cable->streams[substream->stream] = NULL;
+ } else {
+ /* free the cable */
+ loopback->cables[substream->number][dev] = NULL;
+ kfree(cable);
+ }
+}
+
static int loopback_open(struct snd_pcm_substream *substream)
{
struct snd_pcm_runtime *runtime = substream->runtime;
struct loopback *loopback = substream->private_data;
struct loopback_pcm *dpcm;
- struct loopback_cable *cable;
+ struct loopback_cable *cable = NULL;
int err = 0;
int dev = get_cable_index(substream);
if (!cable) {
cable = kzalloc(sizeof(*cable), GFP_KERNEL);
if (!cable) {
- kfree(dpcm);
err = -ENOMEM;
goto unlock;
}
/* are cached -> they do not reflect the actual state */
err = snd_pcm_hw_rule_add(runtime, 0,
SNDRV_PCM_HW_PARAM_FORMAT,
- rule_format, &runtime->hw,
+ rule_format, dpcm,
SNDRV_PCM_HW_PARAM_FORMAT, -1);
if (err < 0)
goto unlock;
err = snd_pcm_hw_rule_add(runtime, 0,
SNDRV_PCM_HW_PARAM_RATE,
- rule_rate, &runtime->hw,
+ rule_rate, dpcm,
SNDRV_PCM_HW_PARAM_RATE, -1);
if (err < 0)
goto unlock;
err = snd_pcm_hw_rule_add(runtime, 0,
SNDRV_PCM_HW_PARAM_CHANNELS,
- rule_channels, &runtime->hw,
+ rule_channels, dpcm,
SNDRV_PCM_HW_PARAM_CHANNELS, -1);
if (err < 0)
goto unlock;
else
runtime->hw = cable->hw;
unlock:
+ if (err < 0) {
+ free_cable(substream);
+ kfree(dpcm);
+ }
mutex_unlock(&loopback->cable_lock);
return err;
}
{
struct loopback *loopback = substream->private_data;
struct loopback_pcm *dpcm = substream->runtime->private_data;
- struct loopback_cable *cable;
- int dev = get_cable_index(substream);
loopback_timer_stop(dpcm);
mutex_lock(&loopback->cable_lock);
- cable = loopback->cables[substream->number][dev];
- if (cable->streams[!substream->stream]) {
- /* other stream is still alive */
- cable->streams[substream->stream] = NULL;
- } else {
- /* free the cable */
- loopback->cables[substream->number][dev] = NULL;
- kfree(cable);
- }
+ free_cable(substream);
mutex_unlock(&loopback->cable_lock);
return 0;
}
*/
int snd_hdac_i915_register_notifier(const struct i915_audio_component_audio_ops *aops)
{
- if (WARN_ON(!hdac_acomp))
+ if (!hdac_acomp)
return -ENODEV;
hdac_acomp->audio_ops = aops;
CXT_FIXUP_HP_SPECTRE,
CXT_FIXUP_HP_GATE_MIC,
CXT_FIXUP_MUTE_LED_GPIO,
+ CXT_FIXUP_HEADSET_MIC,
+ CXT_FIXUP_HP_MIC_NO_PRESENCE,
};
/* for hda_fixup_thinkpad_acpi() */
}
}
+static void cxt_fixup_headset_mic(struct hda_codec *codec,
+ const struct hda_fixup *fix, int action)
+{
+ struct conexant_spec *spec = codec->spec;
+
+ switch (action) {
+ case HDA_FIXUP_ACT_PRE_PROBE:
+ spec->parse_flags |= HDA_PINCFG_HEADSET_MIC;
+ break;
+ }
+}
+
/* OPLC XO 1.5 fixup */
/* OLPC XO-1.5 supports DC input mode (e.g. for use with analog sensors)
.type = HDA_FIXUP_FUNC,
.v.func = cxt_fixup_mute_led_gpio,
},
+ [CXT_FIXUP_HEADSET_MIC] = {
+ .type = HDA_FIXUP_FUNC,
+ .v.func = cxt_fixup_headset_mic,
+ },
+ [CXT_FIXUP_HP_MIC_NO_PRESENCE] = {
+ .type = HDA_FIXUP_PINS,
+ .v.pins = (const struct hda_pintbl[]) {
+ { 0x1a, 0x02a1113c },
+ { }
+ },
+ .chained = true,
+ .chain_id = CXT_FIXUP_HEADSET_MIC,
+ },
};
static const struct snd_pci_quirk cxt5045_fixups[] = {
SND_PCI_QUIRK(0x103c, 0x8115, "HP Z1 Gen3", CXT_FIXUP_HP_GATE_MIC),
SND_PCI_QUIRK(0x103c, 0x814f, "HP ZBook 15u G3", CXT_FIXUP_MUTE_LED_GPIO),
SND_PCI_QUIRK(0x103c, 0x822e, "HP ProBook 440 G4", CXT_FIXUP_MUTE_LED_GPIO),
+ SND_PCI_QUIRK(0x103c, 0x8299, "HP 800 G3 SFF", CXT_FIXUP_HP_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x103c, 0x829a, "HP 800 G3 DM", CXT_FIXUP_HP_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1043, 0x138d, "Asus", CXT_FIXUP_HEADPHONE_MIC_PIN),
SND_PCI_QUIRK(0x152d, 0x0833, "OLPC XO-1.5", CXT_FIXUP_OLPC_XO),
SND_PCI_QUIRK(0x17aa, 0x20f2, "Lenovo T400", CXT_PINCFG_LENOVO_TP410),
#define is_kabylake(codec) ((codec)->core.vendor_id == 0x8086280b)
#define is_geminilake(codec) (((codec)->core.vendor_id == 0x8086280d) || \
((codec)->core.vendor_id == 0x80862800))
+#define is_cannonlake(codec) ((codec)->core.vendor_id == 0x8086280c)
#define is_haswell_plus(codec) (is_haswell(codec) || is_broadwell(codec) \
|| is_skylake(codec) || is_broxton(codec) \
- || is_kabylake(codec)) || is_geminilake(codec)
-
+ || is_kabylake(codec)) || is_geminilake(codec) \
+ || is_cannonlake(codec)
#define is_valleyview(codec) ((codec)->core.vendor_id == 0x80862882)
#define is_cherryview(codec) ((codec)->core.vendor_id == 0x80862883)
#define is_valleyview_plus(codec) (is_valleyview(codec) || is_cherryview(codec))
HDA_CODEC_ENTRY(0x80862809, "Skylake HDMI", patch_i915_hsw_hdmi),
HDA_CODEC_ENTRY(0x8086280a, "Broxton HDMI", patch_i915_hsw_hdmi),
HDA_CODEC_ENTRY(0x8086280b, "Kabylake HDMI", patch_i915_hsw_hdmi),
+HDA_CODEC_ENTRY(0x8086280c, "Cannonlake HDMI", patch_i915_glk_hdmi),
HDA_CODEC_ENTRY(0x8086280d, "Geminilake HDMI", patch_i915_glk_hdmi),
HDA_CODEC_ENTRY(0x80862800, "Geminilake HDMI", patch_i915_glk_hdmi),
HDA_CODEC_ENTRY(0x80862880, "CedarTrail HDMI", patch_generic_hdmi),
case 0x10ec0292:
alc_update_coef_idx(codec, 0x4, 1<<15, 0);
break;
- case 0x10ec0215:
case 0x10ec0225:
+ case 0x10ec0295:
+ case 0x10ec0299:
+ alc_update_coef_idx(codec, 0x67, 0xf000, 0x3000);
+ /* fallthrough */
+ case 0x10ec0215:
case 0x10ec0233:
case 0x10ec0236:
case 0x10ec0255:
case 0x10ec0286:
case 0x10ec0288:
case 0x10ec0285:
- case 0x10ec0295:
case 0x10ec0298:
case 0x10ec0289:
- case 0x10ec0299:
alc_update_coef_idx(codec, 0x10, 1<<9, 0);
break;
case 0x10ec0275:
}
}
+/* Forcibly assign NID 0x03 to HP/LO while NID 0x02 to SPK for EQ */
+static void alc274_fixup_bind_dacs(struct hda_codec *codec,
+ const struct hda_fixup *fix, int action)
+{
+ struct alc_spec *spec = codec->spec;
+ static hda_nid_t preferred_pairs[] = {
+ 0x21, 0x03, 0x1b, 0x03, 0x16, 0x02,
+ 0
+ };
+
+ if (action != HDA_FIXUP_ACT_PRE_PROBE)
+ return;
+
+ spec->gen.preferred_dacs = preferred_pairs;
+}
+
/* for hda_fixup_thinkpad_acpi() */
#include "thinkpad_helper.c"
ALC233_FIXUP_LENOVO_MULTI_CODECS,
ALC294_FIXUP_LENOVO_MIC_LOCATION,
ALC700_FIXUP_INTEL_REFERENCE,
+ ALC274_FIXUP_DELL_BIND_DACS,
+ ALC274_FIXUP_DELL_AIO_LINEOUT_VERB,
};
static const struct hda_fixup alc269_fixups[] = {
{}
}
},
+ [ALC274_FIXUP_DELL_BIND_DACS] = {
+ .type = HDA_FIXUP_FUNC,
+ .v.func = alc274_fixup_bind_dacs,
+ .chained = true,
+ .chain_id = ALC269_FIXUP_DELL1_MIC_NO_PRESENCE
+ },
+ [ALC274_FIXUP_DELL_AIO_LINEOUT_VERB] = {
+ .type = HDA_FIXUP_PINS,
+ .v.pins = (const struct hda_pintbl[]) {
+ { 0x1b, 0x0401102f },
+ { }
+ },
+ .chained = true,
+ .chain_id = ALC274_FIXUP_DELL_BIND_DACS
+ },
};
static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x17aa, 0x30bb, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY),
SND_PCI_QUIRK(0x17aa, 0x30e2, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY),
SND_PCI_QUIRK(0x17aa, 0x310c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION),
+ SND_PCI_QUIRK(0x17aa, 0x313c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION),
SND_PCI_QUIRK(0x17aa, 0x3112, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY),
SND_PCI_QUIRK(0x17aa, 0x3902, "Lenovo E50-80", ALC269_FIXUP_DMIC_THINKPAD_ACPI),
SND_PCI_QUIRK(0x17aa, 0x3977, "IdeaPad S210", ALC283_FIXUP_INT_MIC),
SND_HDA_PIN_QUIRK(0x10ec0255, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
{0x1b, 0x01011020},
{0x21, 0x02211010}),
+ SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
+ {0x12, 0x90a60130},
+ {0x14, 0x90170110},
+ {0x1b, 0x01011020},
+ {0x21, 0x0221101f}),
SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
{0x12, 0x90a60160},
{0x14, 0x90170120},
{0x14, 0x90170110},
{0x1b, 0x90a70130},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC274_FIXUP_DELL_AIO_LINEOUT_VERB,
{0x12, 0xb7a60130},
{0x13, 0xb8a61140},
{0x16, 0x90170110},
struct resource *res;
const u32 *pdata = pdev->dev.platform_data;
+ if (!pdata) {
+ dev_err(&pdev->dev, "Missing platform data\n");
+ return -ENODEV;
+ }
+
audio_drv_data = devm_kzalloc(&pdev->dev, sizeof(struct audio_drv_data),
GFP_KERNEL);
if (audio_drv_data == NULL)
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
audio_drv_data->acp_mmio = devm_ioremap_resource(&pdev->dev, res);
+ if (IS_ERR(audio_drv_data->acp_mmio))
+ return PTR_ERR(audio_drv_data->acp_mmio);
/* The following members gets populated in device 'open'
* function. Till then interrupts are disabled in 'acp_init'
config SND_ATMEL_SOC_CLASSD
tristate "Atmel ASoC driver for boards using CLASSD"
depends on ARCH_AT91 || COMPILE_TEST
- select SND_ATMEL_SOC_DMA
+ select SND_SOC_GENERIC_DMAENGINE_PCM
select REGMAP_MMIO
help
Say Y if you want to add support for Atmel ASoC driver for boards using
}
if (da7218->dev_id == DA7218_DEV_ID) {
- hpldet_np = of_find_node_by_name(np, "da7218_hpldet");
+ hpldet_np = of_get_child_by_name(np, "da7218_hpldet");
if (!hpldet_np)
return pdata;
#define MSM8916_WCD_ANALOG_RATES (SNDRV_PCM_RATE_8000 | SNDRV_PCM_RATE_16000 |\
SNDRV_PCM_RATE_32000 | SNDRV_PCM_RATE_48000)
#define MSM8916_WCD_ANALOG_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\
- SNDRV_PCM_FMTBIT_S24_LE)
+ SNDRV_PCM_FMTBIT_S32_LE)
static int btn_mask = SND_JACK_BTN_0 | SND_JACK_BTN_1 |
SND_JACK_BTN_2 | SND_JACK_BTN_3 | SND_JACK_BTN_4;
SNDRV_PCM_RATE_32000 | \
SNDRV_PCM_RATE_48000)
#define MSM8916_WCD_DIGITAL_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\
- SNDRV_PCM_FMTBIT_S24_LE)
+ SNDRV_PCM_FMTBIT_S32_LE)
struct msm8916_wcd_digital_priv {
struct clk *ahbclk, *mclk;
RX_I2S_CTL_RX_I2S_MODE_MASK,
RX_I2S_CTL_RX_I2S_MODE_16);
break;
- case SNDRV_PCM_FORMAT_S24_LE:
+ case SNDRV_PCM_FORMAT_S32_LE:
snd_soc_update_bits(dai->codec, LPASS_CDC_CLK_TX_I2S_CTL,
TX_I2S_CTL_TX_I2S_MODE_MASK,
TX_I2S_CTL_TX_I2S_MODE_32);
switch (event) {
case SND_SOC_DAPM_POST_PMU:
+ msleep(125);
regmap_update_bits(nau8825->regmap, NAU8825_REG_ENA_CTRL,
NAU8825_ENABLE_ADC, NAU8825_ENABLE_ADC);
break;
dev_err(&rt5514_spi->dev,
"%s Failed to reguest IRQ: %d\n", __func__,
ret);
+ else
+ device_init_wakeup(rt5514_dsp->dev, true);
}
return 0;
return ret;
}
- device_init_wakeup(&spi->dev, true);
-
return 0;
}
if (device_may_wakeup(dev))
disable_irq_wake(irq);
- if (rt5514_dsp->substream) {
- rt5514_spi_burst_read(RT5514_IRQ_CTRL, (u8 *)&buf, sizeof(buf));
- if (buf[0] & RT5514_IRQ_STATUS_BIT)
- rt5514_schedule_copy(rt5514_dsp);
+ if (rt5514_dsp) {
+ if (rt5514_dsp->substream) {
+ rt5514_spi_burst_read(RT5514_IRQ_CTRL, (u8 *)&buf,
+ sizeof(buf));
+ if (buf[0] & RT5514_IRQ_STATUS_BIT)
+ rt5514_schedule_copy(rt5514_dsp);
+ }
}
return 0;
SND_SOC_DAPM_PGA("DMIC1", SND_SOC_NOPM, 0, 0, NULL, 0),
SND_SOC_DAPM_PGA("DMIC2", SND_SOC_NOPM, 0, 0, NULL, 0),
- SND_SOC_DAPM_SUPPLY("DMIC CLK", SND_SOC_NOPM, 0, 0,
+ SND_SOC_DAPM_SUPPLY_S("DMIC CLK", 1, SND_SOC_NOPM, 0, 0,
rt5514_set_dmic_clk, SND_SOC_DAPM_PRE_PMU),
SND_SOC_DAPM_SUPPLY("ADC CLK", RT5514_CLK_CTRL1,
regmap_read(regmap, RT5645_VENDOR_ID, &val);
rt5645->v_id = val & 0xff;
+ regmap_write(rt5645->regmap, RT5645_AD_DA_MIXER, 0x8080);
+
ret = regmap_register_patch(rt5645->regmap, init_list,
ARRAY_SIZE(init_list));
if (ret != 0)
RT5663_IRQ_POW_SAV_MASK, RT5663_IRQ_POW_SAV_EN);
snd_soc_update_bits(codec, RT5663_IRQ_1,
RT5663_EN_IRQ_JD1_MASK, RT5663_EN_IRQ_JD1_EN);
+ snd_soc_update_bits(codec, RT5663_EM_JACK_TYPE_1,
+ RT5663_EM_JD_MASK, RT5663_EM_JD_RST);
+ snd_soc_update_bits(codec, RT5663_EM_JACK_TYPE_1,
+ RT5663_EM_JD_MASK, RT5663_EM_JD_NOR);
while (true) {
regmap_read(rt5663->regmap, RT5663_INT_ST_2, &val);
#define RT5663_POL_EXT_JD_SHIFT 10
#define RT5663_POL_EXT_JD_EN (0x1 << 10)
#define RT5663_POL_EXT_JD_DIS (0x0 << 10)
+#define RT5663_EM_JD_MASK (0x1 << 7)
+#define RT5663_EM_JD_SHIFT 7
+#define RT5663_EM_JD_NOR (0x1 << 7)
+#define RT5663_EM_JD_RST (0x0 << 7)
/* DACREF LDO Control (0x0112)*/
#define RT5663_PWR_LDO_DACREFL_MASK (0x1 << 9)
/* INT2 interrupt control */
#define AIC31XX_INT2CTRL AIC31XX_REG(0, 49)
/* GPIO1 control */
-#define AIC31XX_GPIO1 AIC31XX_REG(0, 50)
+#define AIC31XX_GPIO1 AIC31XX_REG(0, 51)
#define AIC31XX_DACPRB AIC31XX_REG(0, 60)
/* ADC Instruction Set Register */
struct twl4030_codec_data *pdata = dev_get_platdata(codec->dev);
struct device_node *twl4030_codec_node = NULL;
- twl4030_codec_node = of_find_node_by_name(codec->dev->parent->of_node,
+ twl4030_codec_node = of_get_child_by_name(codec->dev->parent->of_node,
"codec");
if (!pdata && twl4030_codec_node) {
GFP_KERNEL);
if (!pdata) {
dev_err(codec->dev, "Can not allocate memory\n");
+ of_node_put(twl4030_codec_node);
return NULL;
}
twl4030_setup_pdata_of(pdata, twl4030_codec_node);
+ of_node_put(twl4030_codec_node);
}
return pdata;
le64_to_cpu(footer->timestamp));
while (pos < firmware->size &&
- pos - firmware->size > sizeof(*region)) {
+ sizeof(*region) < firmware->size - pos) {
region = (void *)&(firmware->data[pos]);
region_name = "Unknown";
reg = 0;
regions, le32_to_cpu(region->len), offset,
region_name);
- if ((pos + le32_to_cpu(region->len) + sizeof(*region)) >
- firmware->size) {
+ if (le32_to_cpu(region->len) >
+ firmware->size - pos - sizeof(*region)) {
adsp_err(dsp,
"%s.%d: %s region len %d bytes exceeds file length %zu\n",
file, regions, region_name,
blocks = 0;
while (pos < firmware->size &&
- pos - firmware->size > sizeof(*blk)) {
+ sizeof(*blk) < firmware->size - pos) {
blk = (void *)(&firmware->data[pos]);
type = le16_to_cpu(blk->type);
}
if (reg) {
- if ((pos + le32_to_cpu(blk->len) + sizeof(*blk)) >
- firmware->size) {
+ if (le32_to_cpu(blk->len) >
+ firmware->size - pos - sizeof(*blk)) {
adsp_err(dsp,
"%s.%d: %s region len %d bytes exceeds file length %zu\n",
file, blocks, region_name,
#define ASRFSTi_OUTPUT_FIFO_SHIFT 12
#define ASRFSTi_OUTPUT_FIFO_MASK (((1 << ASRFSTi_OUTPUT_FIFO_WIDTH) - 1) << ASRFSTi_OUTPUT_FIFO_SHIFT)
#define ASRFSTi_IAEi_SHIFT 11
-#define ASRFSTi_IAEi_MASK (1 << ASRFSTi_OAFi_SHIFT)
-#define ASRFSTi_IAEi (1 << ASRFSTi_OAFi_SHIFT)
+#define ASRFSTi_IAEi_MASK (1 << ASRFSTi_IAEi_SHIFT)
+#define ASRFSTi_IAEi (1 << ASRFSTi_IAEi_SHIFT)
#define ASRFSTi_INPUT_FIFO_WIDTH 7
#define ASRFSTi_INPUT_FIFO_SHIFT 0
#define ASRFSTi_INPUT_FIFO_MASK ((1 << ASRFSTi_INPUT_FIFO_WIDTH) - 1)
#include <linux/ctype.h>
#include <linux/device.h>
#include <linux/delay.h>
+#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/of.h>
u32 fifo_watermark;
u32 dma_maxburst;
+
+ struct mutex ac97_reg_lock;
};
/*
if (reg > 0x7f)
return;
+ mutex_lock(&fsl_ac97_data->ac97_reg_lock);
+
ret = clk_prepare_enable(fsl_ac97_data->clk);
if (ret) {
pr_err("ac97 write clk_prepare_enable failed: %d\n",
ret);
- return;
+ goto ret_unlock;
}
lreg = reg << 12;
udelay(100);
clk_disable_unprepare(fsl_ac97_data->clk);
+
+ret_unlock:
+ mutex_unlock(&fsl_ac97_data->ac97_reg_lock);
}
static unsigned short fsl_ssi_ac97_read(struct snd_ac97 *ac97,
{
struct regmap *regs = fsl_ac97_data->regs;
- unsigned short val = -1;
+ unsigned short val = 0;
u32 reg_val;
unsigned int lreg;
int ret;
+ mutex_lock(&fsl_ac97_data->ac97_reg_lock);
+
ret = clk_prepare_enable(fsl_ac97_data->clk);
if (ret) {
pr_err("ac97 read clk_prepare_enable failed: %d\n",
ret);
- return -1;
+ goto ret_unlock;
}
lreg = (reg & 0x7f) << 12;
clk_disable_unprepare(fsl_ac97_data->clk);
+ret_unlock:
+ mutex_unlock(&fsl_ac97_data->ac97_reg_lock);
return val;
}
sizeof(fsl_ssi_ac97_dai));
fsl_ac97_data = ssi_private;
-
- ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
- if (ret) {
- dev_err(&pdev->dev, "could not set AC'97 ops\n");
- return ret;
- }
} else {
/* Initialize this copy of the CPU DAI driver structure */
memcpy(&ssi_private->cpu_dai_drv, &fsl_ssi_dai_template,
return ret;
}
+ if (fsl_ssi_is_ac97(ssi_private)) {
+ mutex_init(&ssi_private->ac97_reg_lock);
+ ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev);
+ if (ret) {
+ dev_err(&pdev->dev, "could not set AC'97 ops\n");
+ goto error_ac97_ops;
+ }
+ }
+
ret = devm_snd_soc_register_component(&pdev->dev, &fsl_ssi_component,
&ssi_private->cpu_dai_drv, 1);
if (ret) {
fsl_ssi_debugfs_remove(&ssi_private->dbg_stats);
error_asoc_register:
+ if (fsl_ssi_is_ac97(ssi_private))
+ snd_soc_set_ac97_ops(NULL);
+
+error_ac97_ops:
+ if (fsl_ssi_is_ac97(ssi_private))
+ mutex_destroy(&ssi_private->ac97_reg_lock);
+
if (ssi_private->soc->imx)
fsl_ssi_imx_clean(pdev, ssi_private);
if (ssi_private->soc->imx)
fsl_ssi_imx_clean(pdev, ssi_private);
- if (fsl_ssi_is_ac97(ssi_private))
+ if (fsl_ssi_is_ac97(ssi_private)) {
snd_soc_set_ac97_ops(NULL);
+ mutex_destroy(&ssi_private->ac97_reg_lock);
+ }
return 0;
}
{ "ssp0 Tx", NULL, "spk_out" },
{ "AIF Playback", NULL, "ssp1 Tx" },
- { "ssp1 Tx", NULL, "hs_out" },
+ { "ssp1 Tx", NULL, "codec1_out" },
{ "hs_in", NULL, "ssp1 Rx" },
{ "ssp1 Rx", NULL, "AIF Capture" },
{ "ssp0 Tx", NULL, "spk_out" },
{ "AIF Playback", NULL, "ssp1 Tx" },
- { "ssp1 Tx", NULL, "hs_out" },
+ { "ssp1 Tx", NULL, "codec1_out" },
{ "hs_in", NULL, "ssp1 Rx" },
{ "ssp1 Rx", NULL, "AIF Capture" },
if ((epnt->virtual_bus_id == instance_id) &&
(epnt->linktype == link_type) &&
- (epnt->direction == dirn) &&
- (epnt->device_type == dev_type))
- return true;
- else
- return false;
+ (epnt->direction == dirn)) {
+ /* do not check dev_type for DMIC link type */
+ if (epnt->linktype == NHLT_LINK_DMIC)
+ return true;
+
+ if (epnt->device_type == dev_type)
+ return true;
+ }
+
+ return false;
}
struct nhlt_specific_cfg
break;
default:
- dev_warn(bus->dev, "Control load not supported %d:%d:%d\n",
+ dev_dbg(bus->dev, "Control load not supported %d:%d:%d\n",
hdr->ops.get, hdr->ops.put, hdr->ops.info);
break;
}
spdif->mclk = devm_clk_get(&pdev->dev, "mclk");
if (IS_ERR(spdif->mclk)) {
dev_err(&pdev->dev, "Can't retrieve rk_spdif master clock\n");
- return PTR_ERR(spdif->mclk);
+ ret = PTR_ERR(spdif->mclk);
+ goto err_disable_hclk;
}
ret = clk_prepare_enable(spdif->mclk);
if (ret) {
dev_err(spdif->dev, "clock enable failed %d\n", ret);
- return ret;
+ goto err_disable_clocks;
}
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
regs = devm_ioremap_resource(&pdev->dev, res);
- if (IS_ERR(regs))
- return PTR_ERR(regs);
+ if (IS_ERR(regs)) {
+ ret = PTR_ERR(regs);
+ goto err_disable_clocks;
+ }
spdif->regmap = devm_regmap_init_mmio_clk(&pdev->dev, "hclk", regs,
&rk_spdif_regmap_config);
if (IS_ERR(spdif->regmap)) {
dev_err(&pdev->dev,
"Failed to initialise managed register map\n");
- return PTR_ERR(spdif->regmap);
+ ret = PTR_ERR(spdif->regmap);
+ goto err_disable_clocks;
}
spdif->playback_dma_data.addr = res->start + SPDIF_SMPDR;
err_pm_runtime:
pm_runtime_disable(&pdev->dev);
+err_disable_clocks:
+ clk_disable_unprepare(spdif->mclk);
+err_disable_hclk:
+ clk_disable_unprepare(spdif->hclk);
return ret;
}
NULL, &val, NULL);
val = val << shift;
- mask = 0xffff << shift;
+ mask = 0x0f1f << shift;
rsnd_mod_bset(adg_mod, CMDOUT_TIMSEL, mask, val);
in = in << shift;
out = out << shift;
- mask = 0xffff << shift;
+ mask = 0x0f1f << shift;
switch (id / 2) {
case 0:
ckr = 0x80000000;
}
- rsnd_mod_bset(adg_mod, BRGCKR, 0x80FF0000, adg->ckr | ckr);
+ rsnd_mod_bset(adg_mod, BRGCKR, 0x80770000, adg->ckr | ckr);
rsnd_mod_write(adg_mod, BRRA, adg->rbga);
rsnd_mod_write(adg_mod, BRRB, adg->rbgb);
return snd_pcm_lib_preallocate_pages_for_all(
rtd->pcm,
- SNDRV_DMA_TYPE_CONTINUOUS,
- snd_dma_continuous_data(GFP_KERNEL),
+ SNDRV_DMA_TYPE_DEV,
+ rtd->card->snd_card->dev,
PREALLOC_BUFFER, PREALLOC_BUFFER_MAX);
}
struct rsnd_dmaen {
struct dma_chan *chan;
dma_cookie_t cookie;
- dma_addr_t dma_buf;
unsigned int dma_len;
- unsigned int dma_period;
- unsigned int dma_cnt;
};
struct rsnd_dmapp {
/*
* Audio DMAC
*/
-#define rsnd_dmaen_sync(dmaen, io, i) __rsnd_dmaen_sync(dmaen, io, i, 1)
-#define rsnd_dmaen_unsync(dmaen, io, i) __rsnd_dmaen_sync(dmaen, io, i, 0)
-static void __rsnd_dmaen_sync(struct rsnd_dmaen *dmaen, struct rsnd_dai_stream *io,
- int i, int sync)
-{
- struct device *dev = dmaen->chan->device->dev;
- enum dma_data_direction dir;
- int is_play = rsnd_io_is_play(io);
- dma_addr_t buf;
- int len, max;
- size_t period;
-
- len = dmaen->dma_len;
- period = dmaen->dma_period;
- max = len / period;
- i = i % max;
- buf = dmaen->dma_buf + (period * i);
-
- dir = is_play ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
-
- if (sync)
- dma_sync_single_for_device(dev, buf, period, dir);
- else
- dma_sync_single_for_cpu(dev, buf, period, dir);
-}
-
static void __rsnd_dmaen_complete(struct rsnd_mod *mod,
struct rsnd_dai_stream *io)
{
struct rsnd_priv *priv = rsnd_mod_to_priv(mod);
- struct rsnd_dma *dma = rsnd_mod_to_dma(mod);
- struct rsnd_dmaen *dmaen = rsnd_dma_to_dmaen(dma);
bool elapsed = false;
unsigned long flags;
*/
spin_lock_irqsave(&priv->lock, flags);
- if (rsnd_io_is_working(io)) {
- rsnd_dmaen_unsync(dmaen, io, dmaen->dma_cnt);
-
- /*
- * Next period is already started.
- * Let's sync Next Next period
- * see
- * rsnd_dmaen_start()
- */
- rsnd_dmaen_sync(dmaen, io, dmaen->dma_cnt + 2);
-
+ if (rsnd_io_is_working(io))
elapsed = true;
- dmaen->dma_cnt++;
- }
-
spin_unlock_irqrestore(&priv->lock, flags);
if (elapsed)
struct rsnd_dma *dma = rsnd_mod_to_dma(mod);
struct rsnd_dmaen *dmaen = rsnd_dma_to_dmaen(dma);
- if (dmaen->chan) {
- int is_play = rsnd_io_is_play(io);
-
+ if (dmaen->chan)
dmaengine_terminate_all(dmaen->chan);
- dma_unmap_single(dmaen->chan->device->dev,
- dmaen->dma_buf, dmaen->dma_len,
- is_play ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
- }
return 0;
}
struct device *dev = rsnd_priv_to_dev(priv);
struct dma_async_tx_descriptor *desc;
struct dma_slave_config cfg = {};
- dma_addr_t buf;
- size_t len;
- size_t period;
int is_play = rsnd_io_is_play(io);
- int i;
int ret;
cfg.direction = is_play ? DMA_MEM_TO_DEV : DMA_DEV_TO_MEM;
if (ret < 0)
return ret;
- len = snd_pcm_lib_buffer_bytes(substream);
- period = snd_pcm_lib_period_bytes(substream);
- buf = dma_map_single(dmaen->chan->device->dev,
- substream->runtime->dma_area,
- len,
- is_play ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
- if (dma_mapping_error(dmaen->chan->device->dev, buf)) {
- dev_err(dev, "dma map failed\n");
- return -EIO;
- }
-
desc = dmaengine_prep_dma_cyclic(dmaen->chan,
- buf, len, period,
+ substream->runtime->dma_addr,
+ snd_pcm_lib_buffer_bytes(substream),
+ snd_pcm_lib_period_bytes(substream),
is_play ? DMA_MEM_TO_DEV : DMA_DEV_TO_MEM,
DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
desc->callback = rsnd_dmaen_complete;
desc->callback_param = rsnd_mod_get(dma);
- dmaen->dma_buf = buf;
- dmaen->dma_len = len;
- dmaen->dma_period = period;
- dmaen->dma_cnt = 0;
-
- /*
- * synchronize this and next period
- * see
- * __rsnd_dmaen_complete()
- */
- for (i = 0; i < 2; i++)
- rsnd_dmaen_sync(dmaen, io, i);
+ dmaen->dma_len = snd_pcm_lib_buffer_bytes(substream);
dmaen->cookie = dmaengine_submit(desc);
if (dmaen->cookie < 0) {
int byte)
{
struct rsnd_ssi *ssi = rsnd_mod_to_ssi(mod);
+ bool ret = false;
+ int byte_pos;
- ssi->byte_pos += byte;
+ byte_pos = ssi->byte_pos + byte;
- if (ssi->byte_pos >= ssi->next_period_byte) {
+ if (byte_pos >= ssi->next_period_byte) {
struct snd_pcm_runtime *runtime = rsnd_io_to_runtime(io);
ssi->period_pos++;
ssi->next_period_byte += ssi->byte_per_period;
if (ssi->period_pos >= runtime->periods) {
- ssi->byte_pos = 0;
+ byte_pos = 0;
ssi->period_pos = 0;
ssi->next_period_byte = ssi->byte_per_period;
}
- return true;
+ ret = true;
}
- return false;
+ WRITE_ONCE(ssi->byte_pos, byte_pos);
+
+ return ret;
}
/*
struct rsnd_ssi *ssi = rsnd_mod_to_ssi(mod);
struct snd_pcm_runtime *runtime = rsnd_io_to_runtime(io);
- *pointer = bytes_to_frames(runtime, ssi->byte_pos);
+ *pointer = bytes_to_frames(runtime, READ_ONCE(ssi->byte_pos));
return 0;
}
{
int hdmi = rsnd_ssi_hdmi_port(io);
int ret;
+ u32 mode = 0;
ret = rsnd_ssiu_init(mod, io, priv);
if (ret < 0)
* see
* rsnd_ssi_config_init()
*/
- rsnd_mod_write(mod, SSI_MODE, 0x1);
+ mode = 0x1;
}
+ rsnd_mod_write(mod, SSI_MODE, mode);
+
if (rsnd_ssi_use_busif(io)) {
rsnd_mod_write(mod, SSI_BUSIF_ADINR,
rsnd_get_adinr_bit(mod, io) |
kctl->private_value = (unsigned long)namelist;
kctl->private_free = usb_mixer_selector_elem_free;
- nameid = uac_selector_unit_iSelector(desc);
+ /* check the static mapping table at first */
len = check_mapped_name(map, kctl->id.name, sizeof(kctl->id.name));
- if (len)
- ;
- else if (nameid)
- len = snd_usb_copy_string_desc(state, nameid, kctl->id.name,
- sizeof(kctl->id.name));
- else
- len = get_term_name(state, &state->oterm,
- kctl->id.name, sizeof(kctl->id.name), 0);
-
if (!len) {
- strlcpy(kctl->id.name, "USB", sizeof(kctl->id.name));
+ /* no mapping ? */
+ /* if iSelector is given, use it */
+ nameid = uac_selector_unit_iSelector(desc);
+ if (nameid)
+ len = snd_usb_copy_string_desc(state, nameid,
+ kctl->id.name,
+ sizeof(kctl->id.name));
+ /* ... or pick up the terminal name at next */
+ if (!len)
+ len = get_term_name(state, &state->oterm,
+ kctl->id.name, sizeof(kctl->id.name), 0);
+ /* ... or use the fixed string "USB" as the last resort */
+ if (!len)
+ strlcpy(kctl->id.name, "USB", sizeof(kctl->id.name));
+ /* and add the proper suffix */
if (desc->bDescriptorSubtype == UAC2_CLOCK_SELECTOR)
append_ctl_name(kctl, " Clock Source");
else if ((state->oterm.type & 0xff00) == 0x0100)
/* TEAC UD-501/UD-503/NT-503 USB DACs need a vendor cmd to switch
* between PCM/DOP and native DSD mode
*/
-static bool is_teac_50X_dac(unsigned int id)
+static bool is_teac_dsd_dac(unsigned int id)
{
switch (id) {
case USB_ID(0x0644, 0x8043): /* TEAC UD-501/UD-503/NT-503 */
+ case USB_ID(0x0644, 0x8044): /* Esoteric D-05X */
return true;
}
return false;
break;
}
mdelay(20);
- } else if (is_teac_50X_dac(subs->stream->chip->usb_id)) {
+ } else if (is_teac_dsd_dac(subs->stream->chip->usb_id)) {
/* Vendor mode switch cmd is required. */
switch (fmt->altsetting) {
case 3: /* DSD mode (DSD_U32) requested */
}
/* TEAC devices with USB DAC functionality */
- if (is_teac_50X_dac(chip->usb_id)) {
+ if (is_teac_dsd_dac(chip->usb_id)) {
if (fp->altsetting == 3)
return SNDRV_PCM_FMTBIT_DSD_U32_BE;
}
#ifndef _UAPI__ASM_BPF_PERF_EVENT_H__
#define _UAPI__ASM_BPF_PERF_EVENT_H__
-#include <asm/ptrace.h>
+#include "ptrace.h"
typedef user_pt_regs bpf_user_pt_regs_t;
--- /dev/null
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _ASM_S390_PERF_REGS_H
+#define _ASM_S390_PERF_REGS_H
+
+enum perf_event_s390_regs {
+ PERF_REG_S390_R0,
+ PERF_REG_S390_R1,
+ PERF_REG_S390_R2,
+ PERF_REG_S390_R3,
+ PERF_REG_S390_R4,
+ PERF_REG_S390_R5,
+ PERF_REG_S390_R6,
+ PERF_REG_S390_R7,
+ PERF_REG_S390_R8,
+ PERF_REG_S390_R9,
+ PERF_REG_S390_R10,
+ PERF_REG_S390_R11,
+ PERF_REG_S390_R12,
+ PERF_REG_S390_R13,
+ PERF_REG_S390_R14,
+ PERF_REG_S390_R15,
+ PERF_REG_S390_FP0,
+ PERF_REG_S390_FP1,
+ PERF_REG_S390_FP2,
+ PERF_REG_S390_FP3,
+ PERF_REG_S390_FP4,
+ PERF_REG_S390_FP5,
+ PERF_REG_S390_FP6,
+ PERF_REG_S390_FP7,
+ PERF_REG_S390_FP8,
+ PERF_REG_S390_FP9,
+ PERF_REG_S390_FP10,
+ PERF_REG_S390_FP11,
+ PERF_REG_S390_FP12,
+ PERF_REG_S390_FP13,
+ PERF_REG_S390_FP14,
+ PERF_REG_S390_FP15,
+ PERF_REG_S390_MASK,
+ PERF_REG_S390_PC,
+
+ PERF_REG_S390_MAX
+};
+
+#endif /* _ASM_S390_PERF_REGS_H */
/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
#define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */
#define X86_FEATURE_IRPERF (13*32+ 1) /* Instructions Retired Count */
+#define X86_FEATURE_XSAVEERPTR (13*32+ 2) /* Always save/restore FP error pointers */
/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
break;
p_err("can't get next map: %s%s", strerror(errno),
errno == EINVAL ? " -- kernel too old?" : "");
- return -1;
+ break;
}
fd = bpf_map_get_fd_by_id(id);
if (fd < 0) {
+ if (errno == ENOENT)
+ continue;
p_err("can't get map by id (%u): %s",
id, strerror(errno));
- return -1;
+ break;
}
err = bpf_obj_get_info_by_fd(fd, &info, &len);
if (err) {
p_err("can't get map info: %s", strerror(errno));
close(fd);
- return -1;
+ break;
}
if (json_output)
fd = bpf_prog_get_fd_by_id(id);
if (fd < 0) {
+ if (errno == ENOENT)
+ continue;
p_err("can't get prog by id (%u): %s",
id, strerror(errno));
err = -1;
#define uninitialized_var(x) x = *(&(x))
-#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
-
#include <linux/types.h>
/*
/*
* Prevent the compiler from merging or refetching reads or writes. The
* compiler is also forbidden from reordering successive instances of
- * READ_ONCE, WRITE_ONCE and ACCESS_ONCE (see below), but only when the
- * compiler is aware of some particular ordering. One way to make the
- * compiler aware of ordering is to put the two invocations of READ_ONCE,
- * WRITE_ONCE or ACCESS_ONCE() in different C statements.
+ * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
+ * particular ordering. One way to make the compiler aware of ordering is to
+ * put the two invocations of READ_ONCE or WRITE_ONCE in different C
+ * statements.
*
- * In contrast to ACCESS_ONCE these two macros will also work on aggregate
- * data types like structs or unions. If the size of the accessed data
- * type exceeds the word size of the machine (e.g., 32 bits or 64 bits)
- * READ_ONCE() and WRITE_ONCE() will fall back to memcpy and print a
- * compile-time warning.
+ * These two macros will also work on aggregate data types like structs or
+ * unions. If the size of the accessed data type exceeds the word size of
+ * the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
+ * fall back to memcpy and print a compile-time warning.
*
* Their two major use cases are: (1) Mediating communication between
* process-level code and irq/NMI handlers, all running on the same CPU,
- * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
+ * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
* mutilate accesses that either do not require ordering or that interact
* with an explicit memory barrier or atomic instruction that provides the
* required ordering.
#define printk(...) dprintf(STDOUT_FILENO, __VA_ARGS__)
#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
#define pr_warn pr_err
+#define pr_cont pr_err
#define list_del_rcu list_del
--- /dev/null
+#if defined(__aarch64__)
+#include "../../arch/arm64/include/uapi/asm/bpf_perf_event.h"
+#elif defined(__s390__)
+#include "../../arch/s390/include/uapi/asm/bpf_perf_event.h"
+#else
+#include <uapi/asm-generic/bpf_perf_event.h>
+#endif
struct kvm_s390_irq_state {
__u64 buf;
- __u32 flags;
+ __u32 flags; /* will stay unused for compatibility reasons */
__u32 len;
- __u32 reserved[4];
+ __u32 reserved[4]; /* will stay unused for compatibility reasons */
};
/* for KVM_SET_GUEST_DEBUG */
@staticmethod
def is_field_wanted(fields_filter, field):
"""Indicate whether field is valid according to fields_filter."""
- if not fields_filter or fields_filter == "help":
+ if not fields_filter:
return True
return re.match(fields_filter, field) is not None
def update_fields(self, fields_filter):
"""Refresh fields, applying fields_filter"""
- self._fields = [field for field in self.get_available_fields()
- if self.is_field_wanted(fields_filter, field)]
+ self.fields = [field for field in self.get_available_fields()
+ if self.is_field_wanted(fields_filter, field)]
@staticmethod
def get_online_cpus():
curses.nocbreak()
curses.endwin()
- def get_all_gnames(self):
+ @staticmethod
+ def get_all_gnames():
"""Returns a list of (pid, gname) tuples of all running guests"""
res = []
try:
# perform a sanity check before calling the more expensive
# function to possibly extract the guest name
if ' -name ' in line[1]:
- res.append((line[0], self.get_gname_from_pid(line[0])))
+ res.append((line[0], Tui.get_gname_from_pid(line[0])))
child.stdout.close()
return res
except Exception:
self.screen.addstr(row + 1, 2, 'Not available')
- def get_pid_from_gname(self, gname):
+ @staticmethod
+ def get_pid_from_gname(gname):
"""Fuzzy function to convert guest name to QEMU process pid.
Returns a list of potential pids, can be empty if no match found.
"""
pids = []
- for line in self.get_all_gnames():
+ for line in Tui.get_all_gnames():
if gname == line[1]:
pids.append(int(line[0]))
# sort by totals
return (0, -stats[x][0])
total = 0.
- for val in stats.values():
- total += val[0]
+ for key in stats.keys():
+ if key.find('(') is -1:
+ total += stats[key][0]
if self._sorting == SORT_DEFAULT:
sortkey = sortCurAvg
else:
sortkey = sortTotal
+ tavg = 0
for key in sorted(stats.keys(), key=sortkey):
-
- if row >= self.screen.getmaxyx()[0]:
+ if row >= self.screen.getmaxyx()[0] - 1:
break
values = stats[key]
if not values[0] and not values[1]:
self.screen.addstr(row, 1, '%-40s %10d%7.1f %8s' %
(key, values[0], values[0] * 100 / total,
cur))
+ if cur is not '' and key.find('(') is -1:
+ tavg += cur
row += 1
if row == 3:
self.screen.addstr(4, 1, 'No matching events reported yet')
+ else:
+ self.screen.addstr(row, 1, '%-40s %10d %8s' %
+ ('Total', total, tavg if tavg else ''),
+ curses.A_BOLD)
self.screen.refresh()
def show_msg(self, text):
if char == 'x':
self.update_drilldown()
# prevents display of current values on next refresh
- self.stats.get()
+ self.stats.get(self._display_guests)
except KeyboardInterrupt:
break
except curses.error:
try:
pids = Tui.get_pid_from_gname(val)
except:
- raise optparse.OptionValueError('Error while searching for guest '
- '"{}", use "-p" to specify a pid '
- 'instead'.format(val))
+ sys.exit('Error while searching for guest "{}". Use "-p" to '
+ 'specify a pid instead?'.format(val))
if len(pids) == 0:
- raise optparse.OptionValueError('No guest by the name "{}" '
- 'found'.format(val))
+ sys.exit('Error: No guest by the name "{}" found'.format(val))
if len(pids) > 1:
- raise optparse.OptionValueError('Multiple processes found (pids: '
- '{}) - use "-p" to specify a pid '
- 'instead'.format(" ".join(pids)))
+ sys.exit('Error: Multiple processes found (pids: {}). Use "-p" '
+ 'to specify the desired pid'.format(" ".join(pids)))
parser.values.pid = pids[0]
optparser = optparse.OptionParser(description=description_text,
help='restrict statistics to guest by name',
callback=cb_guest_to_pid,
)
- (options, _) = optparser.parse_args(sys.argv)
+ options, unkn = optparser.parse_args(sys.argv)
+ if len(unkn) != 1:
+ sys.exit('Error: Extra argument(s): ' + ' '.join(unkn[1:]))
+ try:
+ # verify that we were passed a valid regex up front
+ re.compile(options.fields)
+ except re.error:
+ sys.exit('Error: "' + options.fields + '" is not a valid regular '
+ 'expression')
+
return options
stats = Stats(options)
- if options.fields == "help":
- event_list = "\n"
- s = stats.get()
- for key in s.keys():
- if key.find('(') != -1:
- key = key[0:key.find('(')]
- if event_list.find('\n' + key + '\n') == -1:
- event_list += key + '\n'
- sys.stdout.write(event_list)
- return ""
+ if options.fields == 'help':
+ stats.fields_filter = None
+ event_list = []
+ for key in stats.get().keys():
+ event_list.append(key.split('(', 1)[0])
+ sys.stdout.write(' ' + '\n '.join(sorted(set(event_list))) + '\n')
+ sys.exit(0)
if options.log:
log(stats)
*s*:: set update interval
*x*:: toggle reporting of stats for child trace events
+ :: *Note*: The stats for the parents summarize the respective child trace
+ events
Press any other key to refresh statistics immediately.
-f<fields>::
--fields=<fields>::
- fields to display (regex)
+ fields to display (regex), "-f help" for a list of available events
-h::
--help::
*type = INSN_STACK;
op->src.type = OP_SRC_ADD;
op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
- op->dest.type = OP_SRC_REG;
+ op->dest.type = OP_DEST_REG;
op->dest.reg = CFI_SP;
}
break;
fc: paddb Pq,Qq | vpaddb Vx,Hx,Wx (66),(v1)
fd: paddw Pq,Qq | vpaddw Vx,Hx,Wx (66),(v1)
fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
-ff:
+ff: UD0
EndTable
Table: 3-byte opcode 1 (0x0f 0x38)
7e: vpermt2d/q Vx,Hx,Wx (66),(ev)
7f: vpermt2ps/d Vx,Hx,Wx (66),(ev)
80: INVEPT Gy,Mdq (66)
-81: INVPID Gy,Mdq (66)
+81: INVVPID Gy,Mdq (66)
82: INVPCID Gy,Mdq (66)
83: vpmultishiftqb Vx,Hx,Wx (66),(ev)
88: vexpandps/d Vpd,Wpd (66),(ev)
GrpTable: Grp3_1
0: TEST Eb,Ib
-1:
+1: TEST Eb,Ib
2: NOT Eb
3: NEG Eb
4: MUL AL,Eb
EndTable
GrpTable: Grp10
+# all are UD1
+0: UD1
+1: UD1
+2: UD1
+3: UD1
+4: UD1
+5: UD1
+6: UD1
+7: UD1
EndTable
# Grp11A and Grp11B are expressed as Grp11 in Intel SDM
const char *objname;
argc--; argv++;
+ if (argc <= 0)
+ usage_with_options(orc_usage, check_options);
+
if (!strncmp(argv[0], "gen", 3)) {
argc = parse_options(argc, argv, check_options, orc_usage, 0);
if (argc != 1)
objname = argv[0];
return check(objname, no_fp, no_unreachable, true);
-
}
if (!strcmp(argv[0], "dump")) {
/* create .orc_unwind_ip and .rela.orc_unwind_ip sections */
sec = elf_create_section(file->elf, ".orc_unwind_ip", sizeof(int), idx);
+ if (!sec)
+ return -1;
ip_relasec = elf_create_rela_section(file->elf, sec);
if (!ip_relasec)
PYTHON_EMBED_LDFLAGS := $(call strip-libs,$(PYTHON_EMBED_LDOPTS))
PYTHON_EMBED_LIBADD := $(call grep-libs,$(PYTHON_EMBED_LDOPTS)) -lutil
PYTHON_EMBED_CCOPTS := $(shell $(PYTHON_CONFIG_SQ) --cflags 2>/dev/null)
- ifeq ($(CC_NO_CLANG), 1)
- PYTHON_EMBED_CCOPTS := $(filter-out -specs=%,$(PYTHON_EMBED_CCOPTS))
- endif
+ PYTHON_EMBED_CCOPTS := $(filter-out -specs=%,$(PYTHON_EMBED_CCOPTS))
FLAGS_PYTHON_EMBED := $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS)
endif
endif
endif
-
ifdef NO_LIBPERL
CFLAGS += -DNO_LIBPERL
else
PERL_EMBED_LDOPTS = $(shell perl -MExtUtils::Embed -e ldopts 2>/dev/null)
PERL_EMBED_LDFLAGS = $(call strip-libs,$(PERL_EMBED_LDOPTS))
PERL_EMBED_LIBADD = $(call grep-libs,$(PERL_EMBED_LDOPTS))
- PERL_EMBED_CCOPTS = `perl -MExtUtils::Embed -e ccopts 2>/dev/null`
+ PERL_EMBED_CCOPTS = $(shell perl -MExtUtils::Embed -e ccopts 2>/dev/null)
+ PERL_EMBED_CCOPTS := $(filter-out -specs=%,$(PERL_EMBED_CCOPTS))
+ PERL_EMBED_LDOPTS := $(filter-out -specs=%,$(PERL_EMBED_LDOPTS))
FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
ifneq ($(feature-libperl), 1)
#include <stdlib.h>
#include <linux/types.h>
-#include <../../../../arch/s390/include/uapi/asm/perf_regs.h>
+#include <asm/perf_regs.h>
void perf_regs_load(u64 *regs);
arch/arm/include/uapi/asm/perf_regs.h
arch/arm64/include/uapi/asm/perf_regs.h
arch/powerpc/include/uapi/asm/perf_regs.h
+arch/s390/include/uapi/asm/perf_regs.h
arch/x86/include/uapi/asm/perf_regs.h
arch/x86/include/uapi/asm/kvm.h
arch/x86/include/uapi/asm/kvm_perf.h
}
int
-jvmti_write_debug_info(void *agent, uint64_t code, const char *file,
- jvmti_line_info_t *li, int nr_lines)
+jvmti_write_debug_info(void *agent, uint64_t code,
+ int nr_lines, jvmti_line_info_t *li,
+ const char * const * file_names)
{
struct jr_code_debug_info rec;
- size_t sret, len, size, flen;
+ size_t sret, len, size, flen = 0;
uint64_t addr;
- const char *fn = file;
FILE *fp = agent;
int i;
return -1;
}
- flen = strlen(file) + 1;
+ for (i = 0; i < nr_lines; ++i) {
+ flen += strlen(file_names[i]) + 1;
+ }
rec.p.id = JIT_CODE_DEBUG_INFO;
size = sizeof(rec);
* file[] : source file name
*/
size += nr_lines * sizeof(struct debug_entry);
- size += flen * nr_lines;
+ size += flen;
rec.p.total_size = size;
/*
if (sret != 1)
goto error;
- sret = fwrite_unlocked(fn, flen, 1, fp);
+ sret = fwrite_unlocked(file_names[i], strlen(file_names[i]) + 1, 1, fp);
if (sret != 1)
goto error;
}
unsigned long pc;
int line_number;
int discrim; /* discriminator -- 0 for now */
+ jmethodID methodID;
} jvmti_line_info_t;
void *jvmti_open(void);
uint64_t vma, void const *code,
const unsigned int code_size);
-int jvmti_write_debug_info(void *agent,
- uint64_t code,
- const char *file,
+int jvmti_write_debug_info(void *agent, uint64_t code, int nr_lines,
jvmti_line_info_t *li,
- int nr_lines);
+ const char * const * file_names);
#if defined(__cplusplus)
}
tab[lines].pc = (unsigned long)pc;
tab[lines].line_number = loc_tab[i].line_number;
tab[lines].discrim = 0; /* not yet used */
+ tab[lines].methodID = m;
lines++;
} else {
break;
return JVMTI_ERROR_NONE;
}
+static void
+copy_class_filename(const char * class_sign, const char * file_name, char * result, size_t max_length)
+{
+ /*
+ * Assume path name is class hierarchy, this is a common practice with Java programs
+ */
+ if (*class_sign == 'L') {
+ int j, i = 0;
+ char *p = strrchr(class_sign, '/');
+ if (p) {
+ /* drop the 'L' prefix and copy up to the final '/' */
+ for (i = 0; i < (p - class_sign); i++)
+ result[i] = class_sign[i+1];
+ }
+ /*
+ * append file name, we use loops and not string ops to avoid modifying
+ * class_sign which is used later for the symbol name
+ */
+ for (j = 0; i < (max_length - 1) && file_name && j < strlen(file_name); j++, i++)
+ result[i] = file_name[j];
+
+ result[i] = '\0';
+ } else {
+ /* fallback case */
+ size_t file_name_len = strlen(file_name);
+ strncpy(result, file_name, file_name_len < max_length ? file_name_len : max_length);
+ }
+}
+
+static jvmtiError
+get_source_filename(jvmtiEnv *jvmti, jmethodID methodID, char ** buffer)
+{
+ jvmtiError ret;
+ jclass decl_class;
+ char *file_name = NULL;
+ char *class_sign = NULL;
+ char fn[PATH_MAX];
+ size_t len;
+
+ ret = (*jvmti)->GetMethodDeclaringClass(jvmti, methodID, &decl_class);
+ if (ret != JVMTI_ERROR_NONE) {
+ print_error(jvmti, "GetMethodDeclaringClass", ret);
+ return ret;
+ }
+
+ ret = (*jvmti)->GetSourceFileName(jvmti, decl_class, &file_name);
+ if (ret != JVMTI_ERROR_NONE) {
+ print_error(jvmti, "GetSourceFileName", ret);
+ return ret;
+ }
+
+ ret = (*jvmti)->GetClassSignature(jvmti, decl_class, &class_sign, NULL);
+ if (ret != JVMTI_ERROR_NONE) {
+ print_error(jvmti, "GetClassSignature", ret);
+ goto free_file_name_error;
+ }
+
+ copy_class_filename(class_sign, file_name, fn, PATH_MAX);
+ len = strlen(fn);
+ *buffer = malloc((len + 1) * sizeof(char));
+ if (!*buffer) {
+ print_error(jvmti, "GetClassSignature", ret);
+ ret = JVMTI_ERROR_OUT_OF_MEMORY;
+ goto free_class_sign_error;
+ }
+ strcpy(*buffer, fn);
+ ret = JVMTI_ERROR_NONE;
+
+free_class_sign_error:
+ (*jvmti)->Deallocate(jvmti, (unsigned char *)class_sign);
+free_file_name_error:
+ (*jvmti)->Deallocate(jvmti, (unsigned char *)file_name);
+
+ return ret;
+}
+
+static jvmtiError
+fill_source_filenames(jvmtiEnv *jvmti, int nr_lines,
+ const jvmti_line_info_t * line_tab,
+ char ** file_names)
+{
+ int index;
+ jvmtiError ret;
+
+ for (index = 0; index < nr_lines; ++index) {
+ ret = get_source_filename(jvmti, line_tab[index].methodID, &(file_names[index]));
+ if (ret != JVMTI_ERROR_NONE)
+ return ret;
+ }
+
+ return JVMTI_ERROR_NONE;
+}
+
static void JNICALL
compiled_method_load_cb(jvmtiEnv *jvmti,
jmethodID method,
const void *compile_info)
{
jvmti_line_info_t *line_tab = NULL;
+ char ** line_file_names = NULL;
jclass decl_class;
char *class_sign = NULL;
char *func_name = NULL;
char *func_sign = NULL;
- char *file_name= NULL;
+ char *file_name = NULL;
char fn[PATH_MAX];
uint64_t addr = (uint64_t)(uintptr_t)code_addr;
jvmtiError ret;
int nr_lines = 0; /* in line_tab[] */
size_t len;
+ int output_debug_info = 0;
ret = (*jvmti)->GetMethodDeclaringClass(jvmti, method,
&decl_class);
if (ret != JVMTI_ERROR_NONE) {
warnx("jvmti: cannot get line table for method");
nr_lines = 0;
+ } else if (nr_lines > 0) {
+ line_file_names = malloc(sizeof(char*) * nr_lines);
+ if (!line_file_names) {
+ warnx("jvmti: cannot allocate space for line table method names");
+ } else {
+ memset(line_file_names, 0, sizeof(char*) * nr_lines);
+ ret = fill_source_filenames(jvmti, nr_lines, line_tab, line_file_names);
+ if (ret != JVMTI_ERROR_NONE) {
+ warnx("jvmti: fill_source_filenames failed");
+ } else {
+ output_debug_info = 1;
+ }
+ }
}
}
goto error;
}
- /*
- * Assume path name is class hierarchy, this is a common practice with Java programs
- */
- if (*class_sign == 'L') {
- int j, i = 0;
- char *p = strrchr(class_sign, '/');
- if (p) {
- /* drop the 'L' prefix and copy up to the final '/' */
- for (i = 0; i < (p - class_sign); i++)
- fn[i] = class_sign[i+1];
- }
- /*
- * append file name, we use loops and not string ops to avoid modifying
- * class_sign which is used later for the symbol name
- */
- for (j = 0; i < (PATH_MAX - 1) && file_name && j < strlen(file_name); j++, i++)
- fn[i] = file_name[j];
- fn[i] = '\0';
- } else {
- /* fallback case */
- strcpy(fn, file_name);
- }
+ copy_class_filename(class_sign, file_name, fn, PATH_MAX);
+
/*
* write source line info record if we have it
*/
- if (jvmti_write_debug_info(jvmti_agent, addr, fn, line_tab, nr_lines))
- warnx("jvmti: write_debug_info() failed");
+ if (output_debug_info)
+ if (jvmti_write_debug_info(jvmti_agent, addr, nr_lines, line_tab, (const char * const *) line_file_names))
+ warnx("jvmti: write_debug_info() failed");
len = strlen(func_name) + strlen(class_sign) + strlen(func_sign) + 2;
{
(*jvmti)->Deallocate(jvmti, (unsigned char *)class_sign);
(*jvmti)->Deallocate(jvmti, (unsigned char *)file_name);
free(line_tab);
+ while (line_file_names && (nr_lines > 0)) {
+ if (line_file_names[nr_lines - 1]) {
+ free(line_file_names[nr_lines - 1]);
+ }
+ nr_lines -= 1;
+ }
+ free(line_file_names);
}
static void JNICALL
fc: paddb Pq,Qq | vpaddb Vx,Hx,Wx (66),(v1)
fd: paddw Pq,Qq | vpaddw Vx,Hx,Wx (66),(v1)
fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
-ff:
+ff: UD0
EndTable
Table: 3-byte opcode 1 (0x0f 0x38)
7e: vpermt2d/q Vx,Hx,Wx (66),(ev)
7f: vpermt2ps/d Vx,Hx,Wx (66),(ev)
80: INVEPT Gy,Mdq (66)
-81: INVPID Gy,Mdq (66)
+81: INVVPID Gy,Mdq (66)
82: INVPCID Gy,Mdq (66)
83: vpmultishiftqb Vx,Hx,Wx (66),(ev)
88: vexpandps/d Vpd,Wpd (66),(ev)
EndTable
GrpTable: Grp10
+# all are UD1
+0: UD1
+1: UD1
+2: UD1
+3: UD1
+4: UD1
+5: UD1
+6: UD1
+7: UD1
EndTable
# Grp11A and Grp11B are expressed as Grp11 in Intel SDM
static inline u64 perf_mmap__read_head(struct perf_mmap *mm)
{
struct perf_event_mmap_page *pc = mm->base;
- u64 head = ACCESS_ONCE(pc->data_head);
+ u64 head = READ_ONCE(pc->data_head);
rmb();
return head;
}
# SPDX-License-Identifier: GPL-2.0
-ifeq ($(srctree),)
-srctree := $(patsubst %/,%,$(dir $(CURDIR)))
-srctree := $(patsubst %/,%,$(dir $(srctree)))
-srctree := $(patsubst %/,%,$(dir $(srctree)))
-srctree := $(patsubst %/,%,$(dir $(srctree)))
-endif
-include $(srctree)/tools/scripts/Makefile.arch
-
-$(call detected_var,SRCARCH)
-
LIBDIR := ../../../lib
BPFDIR := $(LIBDIR)/bpf
APIDIR := ../../../include/uapi
-ASMDIR:= ../../../arch/$(ARCH)/include/uapi
GENDIR := ../../../../include/generated
GENHDR := $(GENDIR)/autoconf.h
GENFLAGS := -DHAVE_GENHDR
endif
-CFLAGS += -Wall -O2 -I$(APIDIR) -I$(ASMDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include
-LDLIBS += -lcap -lelf
+CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include
+LDLIBS += -lcap -lelf -lrt
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
test_align test_verifier_log test_dev_cgroup
CLANG ?= clang
LLC ?= llc
-PROBE := $(shell llc -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
+PROBE := $(shell $(LLC) -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
# Let newer LLVM versions transparently probe the kernel for availability
# of full BPF instruction set.
.result = REJECT,
.matches = {
{4, "R5=pkt(id=0,off=0,r=0,imm=0)"},
- /* ptr & 0x40 == either 0 or 0x40 */
- {5, "R5=inv(id=0,umax_value=64,var_off=(0x0; 0x40))"},
- /* ptr << 2 == unknown, (4n) */
- {7, "R5=inv(id=0,smax_value=9223372036854775804,umax_value=18446744073709551612,var_off=(0x0; 0xfffffffffffffffc))"},
- /* (4n) + 14 == (4n+2). We blow our bounds, because
- * the add could overflow.
- */
- {8, "R5=inv(id=0,var_off=(0x2; 0xfffffffffffffffc))"},
- /* Checked s>=0 */
- {10, "R5=inv(id=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"},
- /* packet pointer + nonnegative (4n+2) */
- {12, "R6=pkt(id=1,off=0,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"},
- {14, "R4=pkt(id=1,off=4,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"},
- /* NET_IP_ALIGN + (4n+2) == (4n), alignment is fine.
- * We checked the bounds, but it might have been able
- * to overflow if the packet pointer started in the
- * upper half of the address space.
- * So we did not get a 'range' on R6, and the access
- * attempt will fail.
- */
- {16, "R6=pkt(id=1,off=0,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"},
+ /* R5 bitwise operator &= on pointer prohibited */
}
},
{
info_len != sizeof(struct bpf_map_info) ||
strcmp((char *)map_infos[i].name, expected_map_name),
"get-map-info(fd)",
- "err %d errno %d type %d(%d) info_len %u(%lu) key_size %u value_size %u max_entries %u map_flags %X name %s(%s)\n",
+ "err %d errno %d type %d(%d) info_len %u(%Zu) key_size %u value_size %u max_entries %u map_flags %X name %s(%s)\n",
err, errno,
map_infos[i].type, BPF_MAP_TYPE_ARRAY,
info_len, sizeof(struct bpf_map_info),
*(int *)prog_infos[i].map_ids != map_infos[i].id ||
strcmp((char *)prog_infos[i].name, expected_prog_name),
"get-prog-info(fd)",
- "err %d errno %d i %d type %d(%d) info_len %u(%lu) jit_enabled %d jited_prog_len %u xlated_prog_len %u jited_prog %d xlated_prog %d load_time %lu(%lu) uid %u(%u) nr_map_ids %u(%u) map_id %u(%u) name %s(%s)\n",
+ "err %d errno %d i %d type %d(%d) info_len %u(%Zu) jit_enabled %d jited_prog_len %u xlated_prog_len %u jited_prog %d xlated_prog %d load_time %lu(%lu) uid %u(%u) nr_map_ids %u(%u) map_id %u(%u) name %s(%s)\n",
err, errno, i,
prog_infos[i].type, BPF_PROG_TYPE_SOCKET_FILTER,
info_len, sizeof(struct bpf_prog_info),
memcmp(&prog_info, &prog_infos[i], info_len) ||
*(int *)prog_info.map_ids != saved_map_id,
"get-prog-info(next_id->fd)",
- "err %d errno %d info_len %u(%lu) memcmp %d map_id %u(%u)\n",
+ "err %d errno %d info_len %u(%Zu) memcmp %d map_id %u(%u)\n",
err, errno, info_len, sizeof(struct bpf_prog_info),
memcmp(&prog_info, &prog_infos[i], info_len),
*(int *)prog_info.map_ids, saved_map_id);
memcmp(&map_info, &map_infos[i], info_len) ||
array_value != array_magic_value,
"check get-map-info(next_id->fd)",
- "err %d errno %d info_len %u(%lu) memcmp %d array_value %llu(%llu)\n",
+ "err %d errno %d info_len %u(%Zu) memcmp %d array_value %llu(%llu)\n",
err, errno, info_len, sizeof(struct bpf_map_info),
memcmp(&map_info, &map_infos[i], info_len),
array_value, array_magic_value);
BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
- .errstr_unpriv = "R1 subtraction from stack pointer",
- .result_unpriv = REJECT,
- .errstr = "R1 invalid mem access",
+ .errstr = "R1 subtraction from stack pointer",
.result = REJECT,
},
{
},
.errstr = "misaligned stack access",
.result = REJECT,
- .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"invalid map_fd for function call",
},
.result = REJECT,
.errstr = "misaligned stack access off (0x0; 0x0)+-8+2 size 8",
- .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"PTR_TO_STACK store/load - bad alignment on reg",
},
.result = REJECT,
.errstr = "misaligned stack access off (0x0; 0x0)+-10+8 size 8",
- .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"PTR_TO_STACK store/load - out of bounds low",
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
- .result = ACCEPT,
- .result_unpriv = REJECT,
- .errstr_unpriv = "R1 pointer += pointer",
+ .result = REJECT,
+ .errstr = "R1 pointer += pointer",
},
{
"unpriv: neg pointer",
BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
offsetof(struct __sk_buff, data)),
BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_4),
- BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+ offsetof(struct __sk_buff, len)),
BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 49),
BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 49),
BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_2),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
- .errstr = "invalid access to packet",
+ .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
BPF_EXIT_INSN(),
},
.fixup_map2 = { 3, 11 },
- .errstr_unpriv = "R0 pointer += pointer",
- .errstr = "R0 invalid mem access 'inv'",
- .result_unpriv = REJECT,
+ .errstr = "R0 pointer += pointer",
.result = REJECT,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
BPF_EXIT_INSN(),
},
.fixup_map1 = { 4 },
- .errstr = "R4 invalid mem access",
+ .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS
},
BPF_EXIT_INSN(),
},
.fixup_map1 = { 4 },
- .errstr = "R4 invalid mem access",
+ .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS
},
BPF_EXIT_INSN(),
},
.fixup_map1 = { 4 },
- .errstr = "R4 invalid mem access",
+ .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS
},
BPF_EXIT_INSN(),
},
.fixup_map2 = { 3 },
- .errstr_unpriv = "R0 bitwise operator &= on pointer",
- .errstr = "invalid mem access 'inv'",
+ .errstr = "R0 bitwise operator &= on pointer",
.result = REJECT,
- .result_unpriv = REJECT,
},
{
"map element value illegal alu op, 2",
BPF_EXIT_INSN(),
},
.fixup_map2 = { 3 },
- .errstr_unpriv = "R0 32-bit pointer arithmetic prohibited",
- .errstr = "invalid mem access 'inv'",
+ .errstr = "R0 32-bit pointer arithmetic prohibited",
.result = REJECT,
- .result_unpriv = REJECT,
},
{
"map element value illegal alu op, 3",
BPF_EXIT_INSN(),
},
.fixup_map2 = { 3 },
- .errstr_unpriv = "R0 pointer arithmetic with /= operator",
- .errstr = "invalid mem access 'inv'",
+ .errstr = "R0 pointer arithmetic with /= operator",
.result = REJECT,
- .result_unpriv = REJECT,
},
{
"map element value illegal alu op, 4",
BPF_EXIT_INSN(),
},
.fixup_map_in_map = { 3 },
- .errstr = "R1 type=inv expected=map_ptr",
- .errstr_unpriv = "R1 pointer arithmetic on CONST_PTR_TO_MAP prohibited",
+ .errstr = "R1 pointer arithmetic on CONST_PTR_TO_MAP prohibited",
.result = REJECT,
},
{
},
.result = ACCEPT,
},
+ {
+ "ld_abs: tests on r6 and skb data reload helper",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_LD_ABS(BPF_B, 0),
+ BPF_LD_ABS(BPF_H, 0),
+ BPF_LD_ABS(BPF_W, 0),
+ BPF_MOV64_REG(BPF_REG_7, BPF_REG_6),
+ BPF_MOV64_IMM(BPF_REG_6, 0),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+ BPF_MOV64_IMM(BPF_REG_2, 1),
+ BPF_MOV64_IMM(BPF_REG_3, 2),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_skb_vlan_push),
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_7),
+ BPF_LD_ABS(BPF_B, 0),
+ BPF_LD_ABS(BPF_H, 0),
+ BPF_LD_ABS(BPF_W, 0),
+ BPF_MOV64_IMM(BPF_REG_0, 42),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ },
{
"ld_ind: check calling conv, r1",
.insns = {
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R8 invalid mem access 'inv'",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R8 invalid mem access 'inv'",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_JMP_IMM(BPF_JA, 0, 0, -7),
},
.fixup_map1 = { 4 },
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
},
{
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
- .errstr_unpriv = "R0 pointer comparison prohibited",
- .errstr = "R0 min value is negative",
+ .errstr = "unbounded min value",
.result = REJECT,
.result_unpriv = REJECT,
},
.errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
.result = REJECT,
},
+ {
+ "bounds check based on zero-extended MOV",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+ /* r2 = 0x0000'0000'ffff'ffff */
+ BPF_MOV32_IMM(BPF_REG_2, 0xffffffff),
+ /* r2 = 0 */
+ BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32),
+ /* no-op */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2),
+ /* access at offset 0 */
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
+ /* exit */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .result = ACCEPT
+ },
+ {
+ "bounds check based on sign-extended MOV. test1",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+ /* r2 = 0xffff'ffff'ffff'ffff */
+ BPF_MOV64_IMM(BPF_REG_2, 0xffffffff),
+ /* r2 = 0xffff'ffff */
+ BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32),
+ /* r0 = <oob pointer> */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2),
+ /* access to OOB pointer */
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
+ /* exit */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .errstr = "map_value pointer and 4294967295",
+ .result = REJECT
+ },
+ {
+ "bounds check based on sign-extended MOV. test2",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+ /* r2 = 0xffff'ffff'ffff'ffff */
+ BPF_MOV64_IMM(BPF_REG_2, 0xffffffff),
+ /* r2 = 0xfff'ffff */
+ BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 36),
+ /* r0 = <oob pointer> */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2),
+ /* access to OOB pointer */
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
+ /* exit */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .errstr = "R0 min value is outside of the array range",
+ .result = REJECT
+ },
+ {
+ "bounds check based on reg_off + var_off + insn_off. test1",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+ offsetof(struct __sk_buff, mark)),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_6, 1),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, (1 << 29) - 1),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, (1 << 29) - 1),
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 3),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 4 },
+ .errstr = "value_size=8 off=1073741825",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ },
+ {
+ "bounds check based on reg_off + var_off + insn_off. test2",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+ offsetof(struct __sk_buff, mark)),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_6, 1),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, (1 << 30) - 1),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, (1 << 29) - 1),
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 3),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 4 },
+ .errstr = "value 1073741823",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ },
+ {
+ "bounds check after truncation of non-boundary-crossing range",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
+ /* r1 = [0x00, 0xff] */
+ BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+ BPF_MOV64_IMM(BPF_REG_2, 1),
+ /* r2 = 0x10'0000'0000 */
+ BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 36),
+ /* r1 = [0x10'0000'0000, 0x10'0000'00ff] */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2),
+ /* r1 = [0x10'7fff'ffff, 0x10'8000'00fe] */
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff),
+ /* r1 = [0x00, 0xff] */
+ BPF_ALU32_IMM(BPF_SUB, BPF_REG_1, 0x7fffffff),
+ /* r1 = 0 */
+ BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8),
+ /* no-op */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+ /* access at offset 0 */
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
+ /* exit */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .result = ACCEPT
+ },
+ {
+ "bounds check after truncation of boundary-crossing range (1)",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
+ /* r1 = [0x00, 0xff] */
+ BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1),
+ /* r1 = [0xffff'ff80, 0x1'0000'007f] */
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1),
+ /* r1 = [0xffff'ff80, 0xffff'ffff] or
+ * [0x0000'0000, 0x0000'007f]
+ */
+ BPF_ALU32_IMM(BPF_ADD, BPF_REG_1, 0),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1),
+ /* r1 = [0x00, 0xff] or
+ * [0xffff'ffff'0000'0080, 0xffff'ffff'ffff'ffff]
+ */
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1),
+ /* r1 = 0 or
+ * [0x00ff'ffff'ff00'0000, 0x00ff'ffff'ffff'ffff]
+ */
+ BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8),
+ /* no-op or OOB pointer computation */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+ /* potentially OOB access */
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
+ /* exit */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ /* not actually fully unbounded, but the bound is very high */
+ .errstr = "R0 unbounded memory access",
+ .result = REJECT
+ },
+ {
+ "bounds check after truncation of boundary-crossing range (2)",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9),
+ /* r1 = [0x00, 0xff] */
+ BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1),
+ /* r1 = [0xffff'ff80, 0x1'0000'007f] */
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1),
+ /* r1 = [0xffff'ff80, 0xffff'ffff] or
+ * [0x0000'0000, 0x0000'007f]
+ * difference to previous test: truncation via MOV32
+ * instead of ALU32.
+ */
+ BPF_MOV32_REG(BPF_REG_1, BPF_REG_1),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1),
+ /* r1 = [0x00, 0xff] or
+ * [0xffff'ffff'0000'0080, 0xffff'ffff'ffff'ffff]
+ */
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1),
+ /* r1 = 0 or
+ * [0x00ff'ffff'ff00'0000, 0x00ff'ffff'ffff'ffff]
+ */
+ BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8),
+ /* no-op or OOB pointer computation */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+ /* potentially OOB access */
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
+ /* exit */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ /* not actually fully unbounded, but the bound is very high */
+ .errstr = "R0 unbounded memory access",
+ .result = REJECT
+ },
+ {
+ "bounds check after wrapping 32-bit addition",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+ /* r1 = 0x7fff'ffff */
+ BPF_MOV64_IMM(BPF_REG_1, 0x7fffffff),
+ /* r1 = 0xffff'fffe */
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff),
+ /* r1 = 0 */
+ BPF_ALU32_IMM(BPF_ADD, BPF_REG_1, 2),
+ /* no-op */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+ /* access at offset 0 */
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
+ /* exit */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .result = ACCEPT
+ },
+ {
+ "bounds check after shift with oversized count operand",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+ BPF_MOV64_IMM(BPF_REG_2, 32),
+ BPF_MOV64_IMM(BPF_REG_1, 1),
+ /* r1 = (u32)1 << (u32)32 = ? */
+ BPF_ALU32_REG(BPF_LSH, BPF_REG_1, BPF_REG_2),
+ /* r1 = [0x0000, 0xffff] */
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 0xffff),
+ /* computes unknown pointer, potentially OOB */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+ /* potentially OOB access */
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
+ /* exit */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .errstr = "R0 max value is outside of the array range",
+ .result = REJECT
+ },
+ {
+ "bounds check after right shift of maybe-negative number",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+ /* r1 = [0x00, 0xff] */
+ BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+ /* r1 = [-0x01, 0xfe] */
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 1),
+ /* r1 = 0 or 0xff'ffff'ffff'ffff */
+ BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8),
+ /* r1 = 0 or 0xffff'ffff'ffff */
+ BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8),
+ /* computes unknown pointer, potentially OOB */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+ /* potentially OOB access */
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
+ /* exit */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .errstr = "R0 unbounded memory access",
+ .result = REJECT
+ },
+ {
+ "bounds check map access with off+size signed 32bit overflow. test1",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x7ffffffe),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+ BPF_JMP_A(0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .errstr = "map_value pointer and 2147483646",
+ .result = REJECT
+ },
+ {
+ "bounds check map access with off+size signed 32bit overflow. test2",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x1fffffff),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x1fffffff),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x1fffffff),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+ BPF_JMP_A(0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .errstr = "pointer offset 1073741822",
+ .result = REJECT
+ },
+ {
+ "bounds check map access with off+size signed 32bit overflow. test3",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 0x1fffffff),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 0x1fffffff),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 2),
+ BPF_JMP_A(0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .errstr = "pointer offset -1073741822",
+ .result = REJECT
+ },
+ {
+ "bounds check map access with off+size signed 32bit overflow. test4",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+ BPF_EXIT_INSN(),
+ BPF_MOV64_IMM(BPF_REG_1, 1000000),
+ BPF_ALU64_IMM(BPF_MUL, BPF_REG_1, 1000000),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 2),
+ BPF_JMP_A(0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .errstr = "map_value pointer and 1000000000000",
+ .result = REJECT
+ },
+ {
+ "pointer/scalar confusion in state equality check (way 1)",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+ BPF_JMP_A(1),
+ BPF_MOV64_REG(BPF_REG_0, BPF_REG_10),
+ BPF_JMP_A(0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .result = ACCEPT,
+ .result_unpriv = REJECT,
+ .errstr_unpriv = "R0 leaks addr as return value"
+ },
+ {
+ "pointer/scalar confusion in state equality check (way 2)",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV64_REG(BPF_REG_0, BPF_REG_10),
+ BPF_JMP_A(1),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .result = ACCEPT,
+ .result_unpriv = REJECT,
+ .errstr_unpriv = "R0 leaks addr as return value"
+ },
{
"variable-offset ctx access",
.insns = {
.result = REJECT,
.prog_type = BPF_PROG_TYPE_LWT_IN,
},
+ {
+ "indirect variable-offset stack access",
+ .insns = {
+ /* Fill the top 8 bytes of the stack */
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ /* Get an unknown value */
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+ /* Make it small and 4-byte aligned */
+ BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
+ BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 8),
+ /* add it to fp. We now have either fp-4 or fp-8, but
+ * we don't know which
+ */
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
+ /* dereference it indirectly */
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 5 },
+ .errstr = "variable stack read R2",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_LWT_IN,
+ },
+ {
+ "direct stack access with 32-bit wraparound. test1",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff),
+ BPF_MOV32_IMM(BPF_REG_0, 0),
+ BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+ BPF_EXIT_INSN()
+ },
+ .errstr = "fp pointer and 2147483647",
+ .result = REJECT
+ },
+ {
+ "direct stack access with 32-bit wraparound. test2",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x3fffffff),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x3fffffff),
+ BPF_MOV32_IMM(BPF_REG_0, 0),
+ BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+ BPF_EXIT_INSN()
+ },
+ .errstr = "fp pointer and 1073741823",
+ .result = REJECT
+ },
+ {
+ "direct stack access with 32-bit wraparound. test3",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x1fffffff),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x1fffffff),
+ BPF_MOV32_IMM(BPF_REG_0, 0),
+ BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+ BPF_EXIT_INSN()
+ },
+ .errstr = "fp pointer offset 1073741822",
+ .result = REJECT
+ },
{
"liveness pruning and write screening",
.insns = {
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
+ {
+ "pkt_end - pkt_start is allowed",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct __sk_buff, data_end)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+ offsetof(struct __sk_buff, data)),
+ BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_2),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ },
{
"XDP pkt read, pkt_end mangling, bad access 1",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
- .errstr = "R1 offset is outside of the packet",
+ .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_XDP,
},
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
- .errstr = "R1 offset is outside of the packet",
+ .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_XDP,
},
CONFIG_USER_NS=y
CONFIG_BPF_SYSCALL=y
CONFIG_TEST_BPF=m
+CONFIG_NUMA=y
* NB: Different Linux versions do different things with the
* accessed bit in set_thread_area().
*/
- if (ar != expected_ar &&
- (ldt || ar != (expected_ar | AR_ACCESSED))) {
+ if (ar != expected_ar && ar != (expected_ar | AR_ACCESSED)) {
printf("[FAIL]\t%s entry %hu has AR 0x%08X but expected 0x%08X\n",
(ldt ? "LDT" : "GDT"), index, ar, expected_ar);
nerrs++;
static int finish_exec_test(void)
{
/*
- * In a sensible world, this would be check_invalid_segment(0, 1);
- * For better or for worse, though, the LDT is inherited across exec.
- * We can probably change this safely, but for now we test it.
+ * Older kernel versions did inherit the LDT on exec() which is
+ * wrong because exec() starts from a clean state.
*/
- check_valid_segment(0, 1,
- AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB,
- 42, true);
+ check_invalid_segment(0, 1);
return nerrs ? 1 : 0;
}
while (*c != '\0') {
int port, status, speed, devid;
- unsigned long socket;
+ int sockfd;
char lbusid[SYSFS_BUS_ID_SIZE];
struct usbip_imported_device *idev;
char hub[3];
- ret = sscanf(c, "%2s %d %d %d %x %lx %31s\n",
+ ret = sscanf(c, "%2s %d %d %d %x %u %31s\n",
hub, &port, &status, &speed,
- &devid, &socket, lbusid);
+ &devid, &sockfd, lbusid);
if (ret < 5) {
dbg("sscanf failed: %d", ret);
dbg("hub %s port %d status %d speed %d devid %x",
hub, port, status, speed, devid);
- dbg("socket %lx lbusid %s", socket, lbusid);
+ dbg("sockfd %u lbusid %s", sockfd, lbusid);
/* if a device is connected, look at it */
idev = &vhci_driver->idev[port];
return 0;
}
-#define MAX_STATUS_NAME 16
+#define MAX_STATUS_NAME 18
static int refresh_imported_device_list(void)
{
char command[SYSFS_BUS_ID_SIZE + 4];
char match_busid_attr_path[SYSFS_PATH_MAX];
int rc;
+ int cmd_size;
snprintf(match_busid_attr_path, sizeof(match_busid_attr_path),
"%s/%s/%s/%s/%s/%s", SYSFS_MNT_PATH, SYSFS_BUS_NAME,
attr_name);
if (add)
- snprintf(command, SYSFS_BUS_ID_SIZE + 4, "add %s", busid);
+ cmd_size = snprintf(command, SYSFS_BUS_ID_SIZE + 4, "add %s",
+ busid);
else
- snprintf(command, SYSFS_BUS_ID_SIZE + 4, "del %s", busid);
+ cmd_size = snprintf(command, SYSFS_BUS_ID_SIZE + 4, "del %s",
+ busid);
rc = write_sysfs_attribute(match_busid_attr_path, command,
- sizeof(command));
+ cmd_size);
if (rc < 0) {
dbg("failed to write match_busid: %s", strerror(errno));
return -1;
#define unlikely(x) (__builtin_expect(!!(x), 0))
#define likely(x) (__builtin_expect(!!(x), 1))
#define ALIGN(x, a) (((x) + (a) - 1) / (a) * (a))
+#define SIZE_MAX (~(size_t)0)
+
typedef pthread_spinlock_t spinlock_t;
typedef int gfp_t;
-static void *kmalloc(unsigned size, gfp_t gfp)
-{
- return memalign(64, size);
-}
+#define __GFP_ZERO 0x1
-static void *kzalloc(unsigned size, gfp_t gfp)
+static void *kmalloc(unsigned size, gfp_t gfp)
{
void *p = memalign(64, size);
if (!p)
return p;
- memset(p, 0, size);
+ if (gfp & __GFP_ZERO)
+ memset(p, 0, size);
return p;
}
+static inline void *kzalloc(unsigned size, gfp_t flags)
+{
+ return kmalloc(size, flags | __GFP_ZERO);
+}
+
+static inline void *kmalloc_array(size_t n, size_t size, gfp_t flags)
+{
+ if (size != 0 && n > SIZE_MAX / size)
+ return NULL;
+ return kmalloc(n * size, flags);
+}
+
+static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
+{
+ return kmalloc_array(n, size, flags | __GFP_ZERO);
+}
+
static void kfree(void *p)
{
if (p)
-#!/bin/sh
+#!/bin/bash
# Sergey Senozhatsky, 2015
# sergey.senozhatsky.work@gmail.com
{
struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
struct arch_timer_context *vtimer;
+ u32 cnt_ctl;
- if (!vcpu) {
- pr_warn_once("Spurious arch timer IRQ on non-VCPU thread\n");
- return IRQ_NONE;
- }
- vtimer = vcpu_vtimer(vcpu);
+ /*
+ * We may see a timer interrupt after vcpu_put() has been called which
+ * sets the CPU's vcpu pointer to NULL, because even though the timer
+ * has been disabled in vtimer_save_state(), the hardware interrupt
+ * signal may not have been retired from the interrupt controller yet.
+ */
+ if (!vcpu)
+ return IRQ_HANDLED;
+ vtimer = vcpu_vtimer(vcpu);
if (!vtimer->irq.level) {
- vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
- if (kvm_timer_irq_can_fire(vtimer))
+ cnt_ctl = read_sysreg_el0(cntv_ctl);
+ cnt_ctl &= ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_STAT |
+ ARCH_TIMER_CTRL_IT_MASK;
+ if (cnt_ctl == (ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_STAT))
kvm_timer_update_irq(vcpu, true, vtimer);
}
/* Disable the virtual timer */
write_sysreg_el0(0, cntv_ctl);
+ isb();
vtimer->loaded = false;
out:
return 0;
}
-int kvm_timer_hyp_init(void)
+int kvm_timer_hyp_init(bool has_gic)
{
struct arch_timer_kvm_info *info;
int err;
return err;
}
- err = irq_set_vcpu_affinity(host_vtimer_irq, kvm_get_running_vcpus());
- if (err) {
- kvm_err("kvm_arch_timer: error setting vcpu affinity\n");
- goto out_free_irq;
+ if (has_gic) {
+ err = irq_set_vcpu_affinity(host_vtimer_irq,
+ kvm_get_running_vcpus());
+ if (err) {
+ kvm_err("kvm_arch_timer: error setting vcpu affinity\n");
+ goto out_free_irq;
+ }
}
kvm_info("virtual timer IRQ%d\n", host_vtimer_irq);
no_vgic:
preempt_disable();
timer->enabled = 1;
- if (!irqchip_in_kernel(vcpu->kvm))
- kvm_timer_vcpu_load_user(vcpu);
- else
- kvm_timer_vcpu_load_vgic(vcpu);
+ kvm_timer_vcpu_load(vcpu);
preempt_enable();
return 0;
/*
* Init HYP architected timer support
*/
- err = kvm_timer_hyp_init();
+ err = kvm_timer_hyp_init(vgic_present);
if (err)
goto out;
}
trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
- data);
+ &data);
data = vcpu_data_host_to_guest(vcpu, data, len);
vcpu_set_reg(vcpu, vcpu->arch.mmio_decode.rt, data);
}
data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
len);
- trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, data);
+ trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, &data);
kvm_mmio_write_buf(data_buf, len, data);
ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, fault_ipa, len,
data_buf);
} else {
trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, len,
- fault_ipa, 0);
+ fault_ipa, NULL);
ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, fault_ipa, len,
data_buf);
*/
void free_hyp_pgds(void)
{
- unsigned long addr;
-
mutex_lock(&kvm_hyp_pgd_mutex);
if (boot_hyp_pgd) {
if (hyp_pgd) {
unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
- for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
- unmap_hyp_range(hyp_pgd, kern_hyp_va(addr), PGDIR_SIZE);
- for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE)
- unmap_hyp_range(hyp_pgd, kern_hyp_va(addr), PGDIR_SIZE);
+ unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
+ (uintptr_t)high_memory - PAGE_OFFSET);
+ unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
+ VMALLOC_END - VMALLOC_START);
free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
hyp_pgd = NULL;