Surprise hotplug in the PowerNV PCI hotplug driver is driven by an interrupt.
In the interrupt handler, pnv_php_interrupt(), we wrongly bail out when
pnv_pci_get_presence_state() returns zero, so the presence change event is
always ignored.
Fix the issue by bailing out only when pnv_pci_get_presence_state() returns
an error (a non-zero value).
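For illustration, a minimal sketch of the corrected check (variable names and
the error path shown are illustrative; the real handler does more work around
this call):

  uint8_t presence;
  int ret;

  /* Bail only if OPAL reports a failure; a zero return means success. */
  ret = pnv_pci_get_presence_state(php_slot->id, &presence);
  if (ret)
          return IRQ_HANDLED;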
Fixes: 360aebd85a4 ("drivers/pci/hotplug: Support surprise hotplug in powernv driver") Cc: stable@vger.kernel.org #v4.9+ Reported-by: Hank Chang <hankmax0000@gmail.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Willie Liauw <williel@supermicro.com.tw> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
After updating ppc-dis.c, ppc-opc.c and ppc.h, the following changes were
made to get xmon to compile and work:
1. Remove all disassembler_info
2. Use xmon's printf/print_address to output data and addresses
respectively.
3. All bfd_* types and casts have been removed.
4. Optimizations related to opcd_indices have been removed.
5. The dialect is set based on cpu features.
6. PPC_OPCODE_CLASSIC is no longer supported in the new
disassembler.
7. VLE opcode parsing and printing have been stripped.
8. The coding style conventions used in those routines have been
retained, even though they do not match our CodingStyle.
9. The highest supported dialect is POWER9.
10. Defined ATTRIBUTE_UNUSED in ppc-dis.c.
11. Defined _(x) in ppc-dis.c.
Finally, we remove the dependency on BROKEN so that XMON_DISASSEMBLY can
be enabled again.
Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Balbir Singh [Thu, 2 Feb 2017 05:03:43 +0000 (10:33 +0530)]
powerpc/xmon: Apply binutils changes to upgrade disassembly
The following commit-ids from the binutils project were applied on the
xmon branch and relicensed with the permission of the authors under
GPLv2 for the following files:
ppc-opc.c
ppc-dis.c
ppc.h
Working off of binutils commit 65b650b4c746 we have now moved up to
binutils commit a5721ba270dd.
Some commit logs have been taken verbatim, some are summarized for ease
of understanding.
Here is a summary of the commits:
33e8d5ac613d PPC7450 New. (powerpc_opcodes): Use it in dcba.
c3d65c1ced61 New opcodes and mask
8dbcd839b1bb Instruction Sorting
91eb7075e370 (powerpc_opcodes): Fix the first two operands of dquaiq.
548b1dcfcbab ppc-opc.c (powerpc_opcodes): Remove the dcffix and dcffix.
930bb4cfae30 Support optional L form mtmsr.
de866fccd87d (powerpc_opcodes): Order and format.
19a6653ce8c6 ppc e500mc support
fa452fa6833c (ppc_cpu_t): New typedef.
c8187e1509b2 (parse_cpu): Handle -m464.
081ba1b3c08b Define. (PPC_OPERAND_FSL, PPC_OPERAND_FCR, PPC_OPERAND_UDI)
9b4e57660d38 Rename altivec_or_spe to retain_flags. Handle -mvsx and -mpower7.
899d85beadd0 (powerpc_opcodes): Enable rfci, mfpmr, mtpmr for e300.
e1c93c699b7d (extract_sprg): Correct operand range check.
2f3bb96af796 (powerpc_init_dialect): Do not set PPC_OPCODE_BOOKE
1cb0a7674666 (ppc_setup_opcodes): Remove PPC_OPCODE_NOPOWER4 test
21169fcfadfa (print_insn_powerpc): Skip insn if it is deprecated
80890a619b85 ("dcbt", "dcbtst")
0e55be1624c2 ("lfdepx", "stfdepx")
066be9f7bd8e (parse_cpu): Extend -mpower7 to accept power7 and isel instructions.
c72ab5f2c55d (powerpc_opcodes): Reorder the opcode table so that instructions
69fe9ce501f5 (ppc_parse_cpu): New function. (powerpc_init_dialect)
e401b04ca7cd (powerpc_opcodes) <"dcbzl">: Merge the POWER4 and E500MC entries.
70dc4e324b9a (powerpc_init_dialect): Do not choose a default dialect due to -many/-Many.
858d7a6db20b (powerpc_opcodes) <"tlbilxlpid", "tlbilxpid", "tlbilxva", "tlbilx"
bdc7fcfe59f1 (powerpc_macros <extrdi>): Allow n+b of 64
e0d602ecffb0 (md_show_usage): Document -mpcca2
b961e85b6ebe (ppc_cpu_t): Typedef to uint64_t
8765b5569284 (powerpc_opcodes): Remove support for the the "lxsdux", "lxvd2ux"
634b50f2a623 Rename "ppca2" to "a2"
9fe54b1ca1c0 (md_show_usage): Document -m476
0dc9305793c8 Add bfd_mach_ppc_e500mc64
ce3d2015b21b Define. bfd/ * archures.c (bfd_mach_ppc_titan)
cdc51b0748c4 Add -mpwr4, -mpwr5, -mpwr5x, -mpwr6 and -mpwr7
63d0fa4e9e57 Add PPC_OPCODE_E500MC for "e500mc64"
cee62821d472 New Define. ("dccci"): Enable for PPCA2
85d4ac0b3c0b Correct wclr encoding.
51b5d4a8c5e5 (powerpc_opcodes): Enable divdeu, devweu, divde, divwe, divdeuo
e01d869a3be2 (md_assemble): Emit APUinfo section for PPC_OPCODE_E500
09a8ad8d8f56 (powerpc_opcodes): Revert deprecation of mfocrf, mtcrf and mtocrf on EFS.
f2bae120dcef (PPC_OPCODE_COMMON): Expand comment.
81a0b7e2ae09 (PPCPWR2): Add PPC_OPCODE_COMMON. (powerpc_opcodes): Add "subc"
bdc70b4a03fd (PPC_OPCODE_32, PPC_OPCODE_BOOKE64, PPC_OPCODE_CLASSIC)
7102e95e4943 (ppc_set_cpu): Cast PPC_OPCODE_xxx to ppc_cpu_t before inverting
f383de6633cb (powerpc_opcodes) [lswx,lswi,stswx,stswi]: Deprecate on E500 and E500MC
6b069ee70de3 Remove PPC_OPCODE_PPCPS
2f7f77101279 (powerpc_opcodes): Enable icswx for POWER7
989993d80a97 (insert_nbi, insert_rbx, FRAp, FRBp, FRSp, FRTp, NBI, RAX, RBX)
a08fc94222d1 <drrndq, drrndq., dtstexq, dctqpq, dctqpq., dctfixq, dctfixq.
8ebac3aae962 (ISA_V2): Define and use for relevant BO field tests
aea77599d0db Add PPC_OPCODE_ALTIVEC2, PPC_OPCODE_E6500, PPC_OPCODE_TMR
b240011aba98 (disassemble_init_for_target): Handle ppc init.
d668828207c2 (powerpc_opcd_indices): Bump array size
b9c361e0ad33 Add support for PowerPC VLE.
e1dad58d73dc (has_tls_reloc, has_tls_get_addr_call, has_vle_insns, is_ppc_vle)
df7b86aa4cb6 Add check that sysdep.h has been included before
98c76446ea6b (extract_sprg): Use ALLOW8_SPRG to include VLE.
a4ebc835cbcb (powerpc_macros): Add entries for e_extlwi to e_clrlslwi
94caa966375d (has_vle_insns, is_ppc_vle): Delete
c7a8dbf91f37 Change RA to RA0
d908c8af5a1d Add necessary casts for printing integer values
03edbe3bfb93 Add/remove PPCVLE for some 32-bit insns
9f6a6cc022e1 <xnop, yield, mdoio, mdoom>: New extended mnemonics
588925d06545 <RSQ, RTQ>: Use PPC_OPERAND_GPR
8baf7b78b5d9 <"lswx">: Use RAX for the second and RBX for the third operand
e67ed0e885d6 Changed opcode for vabsdub, vabsduh, vabsduw, mviwsplt
fb048c26f19f (UIMM4, UIMM3, UIMM2, VXVA_MASK, VXVB_MASK, VXVAVB_MASK, VXVDVA_MASK
382c72e90441 (VXASHB_MASK): New define
c7a5aa9c64fc (ppc_opts) <altivec>: Use PPC_OPCODE_ALTIVEC2
ab4437c3224f <vcfpsxws>: Fix opcode spelling
62082a42b9cd "lfdp" and "stfdp" use DS offset.
776fc41826bb (ppc_parse_cpu): Update prototype
943d398f4c52 (insert_sci8, extract_sci8): Rewrite.
5817ffd1f81c New define (PPC_OPCODE_HTM/POWER8)
9f0682fe89d9 (extract_vlesi): Properly sign extend
c0637f3af686 (powerpc_init_dialect): Set default dialect to power8.
58ae08f29af8 (powerpc_opcodes): Add tdui, twui, tdu, twu, tui, tu
4f6ffcd38d90 (powerpc_init_dialect): Use ppc_parse_cpu() to set dialect
4b95cf5c0c75 Update copyright years
a47622ac1bad Allow both signed and unsigned fields in PowerPC cmpli insn
12e87fac5c76 ppc: enable msgclr and msgsnd on Power8
8514e4db84cc Don't deprecate powerpc mftb insn
db76a70026ab Power4 should treat mftb as extended mfspr mnemonic
b90efa5b79ac ChangeLog rotatation and copyright year update
c4e676f19656 powerpc: Add slbfee. instruction
27c49e9a8fc0 powerpc: Only initialise opcode indices once
4fff86c517ab DCBT_EO): New define
4bc0608a8b69 Fix some PPC assembler errors
dc302c00611b Add hwsync extended mnemonic
99a2c5612124 Remove unused MTMSRD_L macro and re-add accidentally deleted comment
11a0cf2ec0ed Allow for optional operands with non-zero default values
7b9341139a69 PPC sync instruction accepts invalid and incompatible operands
ef5a96d564a2 Remove ppc860, ppc750cl, ppc7450 insns from common ppc
43e65147c07b Remove trailing spaces in opcodes
6dca4fd141fd Add dscr and ctrl SPR mnemonics
b6518b387185 Fix compile time warnings generated when compiling with clang
36f7a9411dcd Patches for illegal ppc 500 instructions
a680de9a980e Add assembler, disassembler and linker support for power9
dd2887fc3de4 Reorder some power9 insns
b817670b52b7 Enable 2 operand form of powerpc mfcr with -many
6f2750feaf28 Copyright update for binutils
afa8d4054b8e Delete opcodes that have been removed from ISA 3.0
1178da445ad5 Accept valid one byte signed and unsigned values for the IMM8 operand
e43de63c8fd1 Fix powerpc subis range
514e58b72633 Correct "Fix powerpc subis range"
19dfcc89e8d9 Add support for new POWER ISA 3.0 instructions
1fe0971e41a4 add more extern C
026122a67044 Re-add support for lbarx, lharx, stbcx. and sthcx. insns back to the E6500 cpu
14b57c7c6a53 PowerPC VLE
6fd3a02da554 Add support for yet some more new ISA 3.0 instructions
dfdaec14b0db Fix some PowerPC VLE BFD issues and add some PowerPC VLE instructions
fd486b633e87 Modify POWER9 support to match final ISA 3.0 documentation
a5721ba270dd Disallow 3-operand cmp[l][i] for ppc64
This updates the disassembly capabilities to add support for newer
processors.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Reformat commit list for brevity] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
That is the last version of those files that were licensed under GPLv2.
This leaves the code in a state that does not compile, because the
binutils code needs to be tweaked to work in the kernel. We don't fix
that in this commit, because we want to import more binutils changes in
subsequent commits. So for now we mark XMON_DISASSEMBLY as BROKEN, so it
can't be built.
Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/mm/radix: Skip ptesync in pte update helpers
We do a ptesync at the start of the tlb flush, and we know that a pte update
will always be followed by a tlbflush. Hence we can skip the ptesync in the
pte update helpers.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/mm/radix: Update pte update sequence for pte clear case
In the kernel we do follow the below sequence in different code paths.
pte = ptep_get_clear(ptep)
....
set_pte_at(ptep, pte)
We do that for mremap, autonuma protection updates and softdirty clearing. This
implies that our optimization of skipping a tlb flush when clearing a pte is
not valid, because on a DD1 system that follow-up set_pte_at() will be done
without the required tlbflush. Fix that by always doing the DD1-style pte
update irrespective of the new_pte value. In a later patch we will optimize the
application exit case.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/mm: Update PROTFAULT handling in the page fault path
With radix, we can get a page fault with DSISR_PROTFAULT set in the case of a
PROT_NONE or autonuma mapping. The PROT_NONE case is handled by the vma check,
where we consider the access bad. For autonuma we should fall through and fix
up the access mask correctly.
Without this patch we trigger the WARN_ON() on radix. This change moves that
WARN_ON() within a radix_enabled() check. I also moved the WARN_ON() outside
the if condition, making it apply to all types of faults (exec/write/read). It
is also conditionalized for book3s, because BOOK3E can also get a PROTFAULT, to
handle the D/I cache sync.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ravi Bangoria [Tue, 22 Nov 2016 09:25:59 +0000 (14:55 +0530)]
powerpc/xmon: Fix data-breakpoint
Currently the xmon data-breakpoint feature is broken.
Whenever a watchpoint match occurs, hw_breakpoint_handler() is called by
do_break() via the notifier chain mechanism. If the watchpoint was registered
by xmon, hw_breakpoint_handler() won't find any associated perf_event and
returns immediately with NOTIFY_STOP. Consequently, do_break() also returns
without notifying xmon.
Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not
find any perf_event associated with matched watchpoint, rather than
NOTIFY_STOP, which tells the core code to continue calling the other
breakpoint handlers including the xmon one.
Cc: stable@vger.kernel.org Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 14 Feb 2017 02:44:05 +0000 (13:44 +1100)]
powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=y
Currently the build breaks if CMA=n and SPAPR_TCE_IOMMU=y:
arch/powerpc/mm/mmu_context_iommu.c: In function ‘mm_iommu_get’:
arch/powerpc/mm/mmu_context_iommu.c:193:42: error: ‘MIGRATE_CMA’ undeclared (first use in this function)
if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
^~~~~~~~~~~
Fix it by using the existing is_migrate_cma_page(), which evaluates to
false when CMA=n.
Fixes: 2e5bbb5461f1 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 14 Feb 2017 02:11:38 +0000 (13:11 +1100)]
powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n
If we enable RADIX but disable HUGETLBFS, the build breaks with:
arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of function 'pmd_huge'
arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of function 'pud_huge'
Fix it by stubbing those functions when HUGETLBFS=n.
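A sketch of the kind of stubs this implies (the exact header and placement are
assumed here):

  #ifndef CONFIG_HUGETLB_PAGE
  static inline int pmd_huge(pmd_t pmd) { return 0; }
  static inline int pud_huge(pud_t pud) { return 0; }
  #endif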
Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Naveen N. Rao [Tue, 7 Feb 2017 19:54:14 +0000 (01:24 +0530)]
kprobes: Introduce weak variant of kprobe_exceptions_notify()
kprobe_exceptions_notify() is not used on some of the architectures such
as arm[64] and powerpc anymore. Introduce a weak variant for such
architectures.
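A sketch of such a weak fallback (the exact body is assumed; it simply lets the
notifier chain continue):

  int __weak kprobe_exceptions_notify(struct notifier_block *self,
                                      unsigned long val, void *data)
  {
          return NOTIFY_DONE;
  }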
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Fri, 10 Feb 2017 01:16:59 +0000 (12:16 +1100)]
powerpc/ftrace: Fix confusing help text for DISABLE_MPROFILE_KERNEL
The final paragraph of the help text is reversed. We want to enable
this option by default, and disable it if the toolchain has a working
-mprofile-kernel.
Fixes: 8c50b72a3b4f ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel") Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This is because we incorrectly load the opcode into r0 before calling
__trace_opal_exit(), whereas it expects the opcode in r3 (first function
parameter). In fact we are leaving the retval in r3, so opcode and
retval will always show the same value.
Anju T [Wed, 8 Feb 2017 09:50:51 +0000 (15:20 +0530)]
powerpc/kprobes: Implement Optprobes
The current kprobe infrastructure uses an unconditional trap instruction
to probe a running kernel. Optprobes allow kprobes to replace the trap
with a branch instruction to a detour buffer. The detour buffer contains
instructions to create an in-memory pt_regs, plus a call to
optimized_callback(), which in turn calls the pre_handler(). After the
pre-handler has executed, a call is made for instruction emulation. The
NIP is determined in advance through dummy instruction emulation, and a
branch instruction to that NIP is placed at the end of the trampoline.
To address the range limitation of the branch instruction in the POWER
architecture, the detour buffer slot is allocated from a reserved area. For
the time being, 64KB is reserved in memory for this purpose.
Instructions which can be emulated using analyse_instr() are the candidates
for optimization. Before optimizing, ensure that the address range between
the allocated detour buffer and the instruction being probed is within
+/- 32MB.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Naveen N. Rao [Wed, 8 Feb 2017 08:57:31 +0000 (14:27 +0530)]
powerpc/kprobes: Fixes for kprobe_lookup_name() on BE
Fix two issues with kprobes.h on BE which were exposed with the
optprobes work:
- one, having to do with a missing include for linux/module.h for
MODULE_NAME_LEN -- this didn't show up previously since the only
users of kprobe_lookup_name were in kprobes.c, which included
linux/module.h through other headers, and
- two, with a missing const qualifier for a local variable which ends
up referring to a string literal. Again, this is unique to how
kprobe_lookup_name is being invoked in optprobes.c.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anju T [Wed, 8 Feb 2017 08:57:30 +0000 (14:27 +0530)]
powerpc: Add helper to check if offset is within relative branch range
To permit the use of a relative branch instruction on powerpc, the target
address has to be relatively nearby, since the address is specified in an
immediate field (a 24-bit field) in the instruction opcode itself. Here
nearby refers to 32MB on either side of the current instruction.
This patch adds a helper that verifies whether the target address is within
the +/- 32MB range.
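A sketch of such a helper, assuming the usual encoding where the 24-bit LI
field is shifted left by two bits, giving a signed 26-bit byte offset:

  bool is_offset_in_branch_range(long offset)
  {
          /* +/- 32MB, and the offset must be word aligned. */
          return (offset >= -0x2000000 && offset <= 0x1fffffc &&
                  !(offset & 0x3));
  }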
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Naveen N. Rao [Wed, 8 Feb 2017 08:57:29 +0000 (14:27 +0530)]
powerpc/bpf: Introduce __PPC_SH64()
Introduce __PPC_SH64() as a 64-bit variant to encode shift field in some
of the shift and rotate instructions operating on double-words. Convert
some of the BPF instruction macros to use the same.
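A sketch of the encoding, assuming the existing __PPC_SH() places the low five
shift bits at their usual position and the 64-bit instruction forms carry the
sixth shift bit separately:

  #define __PPC_SH(s)     (((s) & 0x1f) << 11)
  #define __PPC_SH64(s)   (__PPC_SH(s) | (((s) & 0x20) >> 4))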
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
David Gibson [Fri, 9 Dec 2016 00:07:38 +0000 (11:07 +1100)]
powerpc/pseries: Automatically resize HPT for memory hot add/remove
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.
This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed. This tries to always keep the HPT at
a reasonable size for our current memory size.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
David Gibson [Fri, 9 Dec 2016 00:07:37 +0000 (11:07 +1100)]
powerpc/pseries: Advertise HPT resizing support via CAS
The hypervisor needs to know that a guest is capable of using the HPT resizing
PAPR extension in order to take full advantage of it for memory hotplug.
If the hypervisor knows the guest is HPT-resize aware, it can size the
initial HPT based on the initial guest RAM size, relying on the guest to
resize the HPT when more memory is hot-added. Without this, the hypervisor
must size the HPT for the maximum possible guest RAM, which can lead to
a huge waste of space if the guest never actually expands to that maximum
size.
This patch advertises the guest's support for HPT resizing via the
ibm,client-architecture-support OF interface. We use bit 5 of byte 6 of
option vector 5 for this purpose, as defined in the PAPR ACR "HPT
resizing option".
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
David Gibson [Fri, 9 Dec 2016 00:07:36 +0000 (11:07 +1100)]
powerpc/pseries: Add support for hash table resizing
This adds support for using two hypercalls to change the size of the
main hash page table while running as a PAPR guest. For now these
hypercalls are only in experimental qemu versions.
The interface is two part: first H_RESIZE_HPT_PREPARE is used to
allocate and prepare the new hash table. This may be slow, but can be
done asynchronously. Then, H_RESIZE_HPT_COMMIT is used to switch to the
new hash table. This requires that no CPUs be concurrently updating the
HPT, and so must be run under stop_machine().
This also adds a debugfs file which can be used to manually control
HPT resizing, for testing purposes.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Paul Mackerras <paulus@samba.org>
[mpe: Rename the debugfs file to "hpt_order"] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
David Gibson [Fri, 9 Dec 2016 00:07:35 +0000 (11:07 +1100)]
powerpc/pseries: Add hypercall wrappers for hash page table resizing
This adds the hypercall numbers and wrapper functions for the hash page
table resizing hypercalls.
These hypercall numbers are defined in the PAPR ACR "HPT resizing
option".
It also adds a new firmware feature flag to track the presence of the
HPT resizing calls.
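A sketch of what such wrappers could look like (the hcall numbers come from the
PAPR ACR and are not shown here; the names follow the commit text):

  static inline long plpar_resize_hpt_prepare(unsigned long flags,
                                              unsigned long shift)
  {
          return plpar_hcall_norets(H_RESIZE_HPT_PREPARE, flags, shift);
  }

  static inline long plpar_resize_hpt_commit(unsigned long flags,
                                             unsigned long shift)
  {
          return plpar_hcall_norets(H_RESIZE_HPT_COMMIT, flags, shift);
  }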
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Chris Packham [Fri, 3 Feb 2017 00:43:16 +0000 (13:43 +1300)]
Documentation: powerpc/fsl: Update compatible for l2cache binding
List all the current valid compatible strings for the l2cache binding.
This should stop checkpatch.pl from complaining and will hopefully save
someone from having to debug a typo in their dts.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/opal-irqchip: Use interrupt names if present
Recent versions of OPAL can provide names for the various OPAL interrupts,
so let's use them. This also modernises the code that fetches the
interrupt array to use the helpers provided by the generic code instead
of hand-parsing the property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Free irqs on error, check allocation of names, consolidate error
handling, whitespace.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/powernv: Display the correct error info for CAPP errors.
On some CAPP errors we see console messages that print "unknown HMI" even
though CAPI recovery is in progress. This patch fixes this by printing the
correct error info for HMIs generated due to CAPP recovery.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
m68k/mac: Replace via-maciisi driver with via-cuda driver
Change the device probe test in the via-cuda.c driver so it will load on
Egret-based machines too. Remove the now redundant via-maciisi.c driver.
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
via-cuda: Add support for Egret system controller
The Egret system controller was the predecessor to the Cuda and the
differences are minor.
On Cuda, byte acknowledgement requires one transition of the TACK
signal; on Egret two are needed. On Cuda, TIP is active low; on Egret
it is active high. And Cuda raises certain interrupts that Egret omits.
Accommodating these differences complicates the Cuda driver slightly
but avoids a lot of duplication (see next patch).
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
via-cuda: Initialize data_index early and increment consistently
Initialize data_index where appropriate to improve readability and
assist debugging. This change doesn't affect driver behaviour.
I prefer to see
current_req->data[data_index++]
in place of
current_req->data[0]
or
current_req->data[1]
inasmuch as it becomes obvious what the data_index variable does.
Moreover, the actual value of data_index when examined at any given moment
tells me something about prior events, which did prove helpful.
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
via-cuda: Use spin_lock_irqsave/restore instead of enable/disable_irq
The cuda_start() function uses spin_lock_irqsave/restore for mutual
exclusion. Let's have cuda_poll() do the same when polling the VIA
interrupt.
The benefit to disabling local irqs when the interrupt is being polled
is that the interrupt handler now has the same timing properties
regardless of whether it is invoked normally or from cuda_poll().
This driver was written back when local irqs remained enabled during
execution of interrupt handlers and cuda_poll() was probably trying
to achieve the same effect by use of enable/disable_irq.
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
via-cuda: Avoid TREQ race condition
When a read transaction completes, one of several things will happen:
a new transfer is started by the driver, a new transfer request
is raised by the Cuda (i.e. TREQ asserted), or both happen at once.
When both happen at once, there is a race condition between the TREQ test
in the read_done state and the same test in cuda_start(). Moreover, the
former test uses a stale TREQ value.
Theoretically, this can result in the undesirable outcome that the
interrupt handler completes with the state machine 'idle' when it should
instead start the next transaction.
Avoid this race by calling cuda_start() first and then confirming that it
succeeded. If not, test the current TREQ value before entering the
'reading' state.
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
via-cuda: Fix re-initialization of reply_ptr and reading_reply
When reading_reply is set, reply_ptr points into an adb_request struct.
Conversely, when reply_ptr instead points into the global cuda_rbuf,
reading_reply must be false.
Unfortunately, this rule can be violated because re-initialization
of reply_ptr and reading_reply presently depends on the TREQ input.
Fix this by re-initializing reply_ptr and reading_reply as soon as they
are known to be invalid.
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
via-cuda: Prevent read buffer overflow
If the Cuda driver does not enter the 'read_done' state for some
reason, it may continue in the 'reading' state until the buffer
overflows. Add a bounds check to prevent this.
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
via-cuda: Add TREQ, TIP and TACK signal helpers
Introduce some helpers for handling the signalling between VIA and
Cuda. This abstraction will be used to add support for Egret devices,
which utilize slightly different signalling.
Don't invert the sense of the Cuda's active-low signals when storing
them in the 'status' variable. Just assert, negate and test those
signals using the helpers.
The state machine does not need to test its own output signals to
figure out what to do next: the next state depends on the Cuda's TREQ
output. Just call the TREQ_asserted() helper function to test for that.
Similarly, there is no need to store pin directions in the 'status'
variable. That was only useful for debugging messages.
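A sketch of the sort of helpers this describes (bit names and polarity are the
Cuda ones; the Egret variants differ as described in the Egret support patch):

  static inline bool TREQ_asserted(u8 portb)
  {
          return !(portb & TREQ);                 /* TREQ is active low */
  }

  static inline void assert_TIP(void)
  {
          out_8(&via[B], in_8(&via[B]) & ~TIP);   /* Cuda: TIP active low */
  }

  static inline void negate_TIP(void)
  {
          out_8(&via[B], in_8(&via[B]) | TIP);
  }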
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
via-cuda: Remove redundant temporary variable
There is no possibility that current_req can change during execution of
cuda_start(). This can be confirmed by inspection: cuda_lock is always
held whenever cuda_start() is called or current_req is modified.
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Sun, 1 Jan 2017 00:56:26 +0000 (19:56 -0500)]
via-cuda: Cleanup printk calls
Add missing log message severity, remove old debug messages and
replace printk() loop with print_hex_dump() call.
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/powernv: Remove separate entry for OPAL real mode calls
All entry points already read the MSR so they can easily do
the right thing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 27 Jan 2017 04:24:33 +0000 (14:24 +1000)]
powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts
The branch from hmi_exception_early to hmi_exception_realmode must use
a "relocatable-style" branch, because it is branching from unrelocated
exception code to beyond __end_interrupts.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 6 Dec 2016 01:41:12 +0000 (11:41 +1000)]
powerpc/64s: Use (start, size) rather than (start, end) for exception handlers
(start, size) has the benefit of being easier to search for ((start, end)
usually gives you the vector preceding the one you want as the first
result).
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 6 Dec 2016 01:40:15 +0000 (11:40 +1000)]
powerpc/64s: Tidy up after exception handler rework
Somewhere along the line, search/replace left some naming garbled,
and untidy alignment (aka. mpe stuffed it up). Might as well fix them
all up now while git blame history doesn't extend too far.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds AUX vectors for the L1I, L1D, L2 and L3 cache levels, providing
for each cache level the size of the cache in bytes and the geometry
(line size and number of ways).
We chose not to use the existing alpha/sh definition, which packs all the
information in a single entry per cache level, as it is too restricted to
represent some of the geometries used on POWER.
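A userspace sketch of consuming the new entries (the constant names are the
ones added by this series via the uapi auxvec header; the packing shown for
the geometry word, line size in the low 16 bits and associativity above, is
an assumption):

  #include <stdio.h>
  #include <sys/auxv.h>
  #include <linux/auxvec.h>       /* for the AT_*_CACHE* constants if libc lacks them */

  int main(void)
  {
          unsigned long size = getauxval(AT_L1D_CACHESIZE);
          unsigned long geo  = getauxval(AT_L1D_CACHEGEOMETRY);

          printf("L1D: %lu bytes, line size %lu, %lu-way\n",
                 size, geo & 0xffff, geo >> 16);
          return 0;
  }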
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/64: Clean up ppc64_caches using a struct per cache
We have two sets of identical struct members for the I and D sides, and
mostly identical bunches of code to parse the device-tree to populate them.
Instead, make a ppc_cache_info structure with one copy for I and one for D.
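A sketch of the resulting shape (the field list is abridged and partly
assumed):

  struct ppc_cache_info {
          u32 size;               /* total size in bytes */
          u32 line_size;          /* actual cache line size */
          u32 block_size;         /* size used by cache management insns */
          u32 log_block_size;
          u32 assoc;              /* number of ways */
  };

  struct ppc64_caches {
          struct ppc_cache_info l1d;
          struct ppc_cache_info l1i;
  };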
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/64: Fix naming of cache block vs. cache line
In a number of places we called "cache line size" what is actually the
cache block size, which, in the powerpc architecture, means the effective
size to use with cache management instructions (it can be different from
the actual cache line size).
We fix the naming across the board and properly retrieve both
pieces of information when available in the device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add an additional version check in gcc-plugins-check to advise users to
upgrade to gcc 5.2+ on powerpc to avoid issues with header files (gcc <=
4.6) or missing copies of rs6000-cpus.def (4.8 to 5.1 on 64-bit
targets).
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc: Correctly disable latent entropy GCC plugin on prom_init.o
Commit 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") excludes
certain powerpc early boot code from the latent entropy plugin by adding
appropriate CFLAGS. It looks like this was supposed to cover
prom_init.o, but ended up saying init.o (which doesn't exist) instead.
Fix the typo.
Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
gcc-plugins: Fix definition of DISABLE_LATENT_ENTROPY_PLUGIN
The variable DISABLE_LATENT_ENTROPY_PLUGIN is defined when
CONFIG_PAX_LATENT_ENTROPY is set, but that config symbol is a leftover from
the original PaX version of the plugin code and doesn't actually exist in the
kernel. Change the condition to depend on CONFIG_GCC_PLUGIN_LATENT_ENTROPY
instead.
Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Stub out the debugfs functions so that the build doesn't break when
CONFIG_DEBUG_FS=n.
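The usual pattern, sketched with illustrative names (the real function names
belong to the driver's debugfs code):

  #ifdef CONFIG_DEBUG_FS
  int foo_debugfs_init(void);
  void foo_debugfs_exit(void);
  #else
  static inline int foo_debugfs_init(void) { return 0; }
  static inline void foo_debugfs_exit(void) { }
  #endif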
Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nathan Fontenot [Wed, 11 Jan 2017 17:00:58 +0000 (12:00 -0500)]
powerpc/pseries: Report DLPAR capabilities
As we add the ability to do DLPAR of additional devices through
the sysfs interface we need to know which devices are supported.
This adds reporting of the supported devices as a comma-separated
list in the existing /sys/kernel/dlpar file.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
John Allen [Fri, 6 Jan 2017 19:28:54 +0000 (13:28 -0600)]
powerpc/pseries: Update affinity for memory and cpus specified in a PRRN event
Extend the existing PRRN infrastructure to perform the actual affinity
updating for cpus and memory in addition to the device tree updating.
For cpus, dynamic affinity updating already appears to exist in the
kernel in the form of arch_update_cpu_topology(). For memory, we must
place a READD operation on the hotplug queue for any phandle included in
the PRRN event that is determined to be an LMB.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
John Allen [Fri, 6 Jan 2017 19:25:53 +0000 (13:25 -0600)]
powerpc/pseries: Make the acquire/release of the drc for memory a seperate step
When adding and removing LMBs we should make the acquire/release of
the DRC a separate step, to allow for a few improvements. First,
this ensures that LMBs removed during a remove-by-count operation
are all available to add back should an error occur: by removing all
the LMBs from the kernel before releasing their DRCs, the LMBs remain
available to add back if the operation has to be unwound.
Also, this allows for faster re-add operations of memory for
PRRN event handling, since we can skip the unneeded step of having
to release the DRC and then acquire it back.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Thu, 12 Jan 2017 10:17:34 +0000 (21:17 +1100)]
powerpc/64: Add BPF_JIT to powernv and pseries defconfigs
Commit db9112173b18 ("powerpc: Turn on BPF_JIT in ppc64_defconfig")
only added BPF_JIT to the ppc64 defconfig. Add it to our powernv
and pseries defconfigs too.
Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Thu, 12 Jan 2017 10:17:33 +0000 (21:17 +1100)]
powerpc/64: Move HAVE_CONTEXT_TRACKING from pseries to common Kconfig
We added support for HAVE_CONTEXT_TRACKING, but placed the option inside
PPC_PSERIES.
This has the undesirable effect that NO_HZ_FULL can be enabled on a
kernel with both powernv and pseries support, but cannot be enabled on a
kernel with only powernv support.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Mon, 30 Jan 2017 06:41:55 +0000 (17:41 +1100)]
powerpc/sparse: Constify the address pointer in __get_user_nosleep()
In __get_user_nosleep, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and its scope is limited to the __get_user_nosleep macro.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Mon, 30 Jan 2017 06:41:54 +0000 (17:41 +1100)]
powerpc/sparse: Constify the address pointer in __get_user_nocheck()
In __get_user_nocheck, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and its scope is limited to the __get_user_nocheck macro.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Daniel Axtens [Mon, 30 Jan 2017 06:41:53 +0000 (17:41 +1100)]
powerpc/sparse: Constify the address pointer in __get_user_check()
In __get_user_check, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and its scope is limited to the __get_user_check macro.
Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:53 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Enable radix guest support
This adds a few last pieces of the support for radix guests:
* Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
KVM_PPC_GET_RMMU_INFO ioctls for radix guests
* On POWER9, allow secondary threads to be on/off-lined while guests
are running.
* Set up LPCR and the partition table entry for radix guests.
* Don't allocate the rmap array in the kvm_memory_slot structure
on radix.
* Don't try to initialize the HPT for radix guests, since they don't
have an HPT.
* Take out the code that prevents the HV KVM module from
initializing on radix hosts.
At this stage, we only support radix guests if the host is running
in radix mode, and only support HPT guests if the host is running in
HPT mode. Thus a guest cannot switch from one mode to the other,
which enables some simplifications.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:52 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Invalidate ERAT on guest entry/exit for POWER9 DD1
On POWER9 DD1, we need to invalidate the ERAT (effective to real
address translation cache) when changing the PIDR register, which
we do as part of guest entry and exit.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:51 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Allow guest exit path to have MMU on
If we allow LPCR[AIL] to be set for radix guests, then interrupts from
the guest to the host can be delivered by the hardware with relocation
on, and thus the code path starting at kvmppc_interrupt_hv can be
executed in virtual mode (MMU on) for radix guests (previously it was
only ever executed in real mode).
Most of the code is indifferent to whether the MMU is on or off, but
the calls to OPAL that use the real-mode OPAL entry code need to
be switched to use the virtual-mode code instead. The affected
calls are the calls to the OPAL XICS emulation functions in
kvmppc_read_one_intr() and related functions. We test the MSR[IR]
bit to detect whether we are in real or virtual mode, and call the
opal_rm_* or opal_* function as appropriate.
The other place that depends on the MMU being off is the optimization
where the guest exit code jumps to the external interrupt vector or
hypervisor doorbell interrupt vector, or returns to its caller (which
is __kvmppc_vcore_entry). If the MMU is on and we are returning to
the caller, then we don't need to use an rfid instruction since the
MMU is already on; a simple blr suffices. If there is an external
or hypervisor doorbell interrupt to handle, we branch to the
relocation-on version of the interrupt vector.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:50 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement
With radix, the guest can do TLB invalidations itself using the tlbie
(global) and tlbiel (local) TLB invalidation instructions. Linux guests
use local TLB invalidations for translations that have only ever been
accessed on one vcpu. However, that doesn't mean that the translations
have only been accessed on one physical cpu (pcpu) since vcpus can move
around from one pcpu to another. Thus a tlbiel might leave behind stale
TLB entries on a pcpu where the vcpu previously ran, and if that task
then moves back to that previous pcpu, it could see those stale TLB
entries and thus access memory incorrectly. The usual symptom of this
is random segfaults in userspace programs in the guest.
To cope with this, we detect when a vcpu is about to start executing on
a thread in a core that is a different core from the last time it
executed. If that is the case, then we mark the core as needing a
TLB flush and then send an interrupt to any thread in the core that is
currently running a vcpu from the same guest. This will get those vcpus
out of the guest, and the first one to re-enter the guest will do the
TLB flush. The reason for interrupting the vcpus executing on the old
core is to cope with the following scenario:
    CPU 0                    CPU 1                    CPU 4
    (core 0)                 (core 0)                 (core 1)

    VCPU 0 runs task X       VCPU 1 runs
    core 0 TLB gets
    entries from task X
    VCPU 0 moves to CPU 4
                                                      VCPU 0 runs task X
                                                      Unmap pages of task X
                                                      tlbiel
                             (still VCPU 1)           task X moves to VCPU 1
                             task X runs
                             task X sees stale TLB
                             entries
That is, as soon as the VCPU starts executing on the new core, it
could unmap and tlbiel some page table entries, and then the task
could migrate to one of the VCPUs running on the old core and
potentially see stale TLB entries.
Since the TLB is shared between all the threads in a core, we only
use the bit of kvm->arch.need_tlb_flush corresponding to the first
thread in the core. To ensure that we don't have a window where we
can miss a flush, this moves the clearing of the bit from before the
actual flush to after it. This way, two threads might both do the
flush, but we prevent the situation where one thread can enter the
guest before the flush is finished.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:49 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode
If the guest is in radix mode, then it doesn't have a hashed page
table (HPT), so all of the hypercalls that manipulate the HPT can't
work and should return an error. This adds checks to make them
return H_FUNCTION ("function not supported").
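The check, roughly, at the top of each HPT hypercall handler (kvm_is_radix()
is the helper added by the basic radix infrastructure patch in this series):

  if (kvm_is_radix(vcpu->kvm))
          return H_FUNCTION;      /* no HPT to operate on */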
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds code to keep track of dirty pages when requested (that is,
when memslot->dirty_bitmap is non-NULL) for radix guests. We use the
dirty bits in the PTEs in the second-level (partition-scoped) page
tables, together with a bitmap of pages that were dirty when their
PTE was invalidated (e.g., when the page was paged out). This bitmap
is stored in the first half of the memslot->dirty_bitmap area, and
kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
bitmap that gets returned to userspace.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:47 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests
This adapts our implementations of the MMU notifier callbacks
(unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
to call radix functions when the guest is using radix. These
implementations are much simpler than for HPT guests because we
have only one PTE to deal with, so we don't need to traverse
rmap chains.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:46 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Page table construction and page faults for radix guests
This adds the code to construct the second-level ("partition-scoped" in
architecturese) page tables for guests using the radix MMU. Apart from
the PGD level, which is allocated when the guest is created, the rest
of the tree is all constructed in response to hypervisor page faults.
As well as hypervisor page faults for missing pages, we also get faults
for reference/change (RC) bits needing to be set, as well as various
other error conditions. For now, we only set the R or C bit in the
guest page table if the same bit is set in the host PTE for the
backing page.
This code can take advantage of the guest being backed with either
transparent or ordinary 2MB huge pages, and insert 2MB page entries
into the guest page tables. There is no support for 1GB huge pages
yet.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds code to branch around the parts that radix guests don't
need - clearing and loading the SLB with the guest SLB contents,
saving the guest SLB contents on exit, and restoring the host SLB
contents.
Since the host is now using radix, we need to save and restore the
host value for the PID register.
On hypervisor data/instruction storage interrupts, we don't do the
guest HPT lookup on radix, but just save the guest physical address
for the fault (from the ASDR register) in the vcpu struct.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:44 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Add basic infrastructure for radix guests
This adds a field in struct kvm_arch and an inline helper to
indicate whether a guest is a radix guest or not, plus a new file
to contain the radix MMU code, which currently contains just a
translate function which knows how to traverse the guest page
tables to translate an address.
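A sketch of the flag and helper (the field name is an assumption):

  /* in struct kvm_arch */
  u8 radix;                       /* 1 if this is a radix guest */

  static inline bool kvm_is_radix(struct kvm *kvm)
  {
          return kvm->arch.radix;
  }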
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:43 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9
POWER9 adds a register called ASDR (Access Segment Descriptor
Register), which is set by hypervisor data/instruction storage
interrupts to contain the segment descriptor for the address
being accessed, assuming the guest is using HPT translation.
(For radix guests, it contains the guest real address of the
access.)
Thus, for HPT guests on POWER9, we can use this register rather
than looking up the SLB with the slbfee. instruction.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:42 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9
This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
for HPT guests on POWER9. With this, we can return 1 for the
KVM_CAP_PPC_MMU_HASH_V3 capability.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:41 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest. The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables. (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).
The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.
The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU. It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.
Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.
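A userspace sketch of how the pieces fit together once fully wired up (the
ioctl names are those in the text; the capability name and config struct
layout are assumptions, and error handling is elided):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int enable_radix(int vm_fd, __u64 process_table)
  {
          struct kvm_ppc_mmuv3_cfg cfg = {
                  .flags         = KVM_PPC_MMUV3_RADIX,
                  .process_table = process_table,
          };

          if (!ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_MMU_RADIX))
                  return -1;      /* radix guests not supported */

          return ioctl(vm_fd, KVM_PPC_CONFIGURE_V3_MMU, &cfg);
  }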
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:40 +0000 (21:21 +1100)]
powerpc/64: Allow for relocation-on interrupts from guest to host
With host and guest both using radix translation, it is feasible
for the host to take interrupts that come from the guest with
relocation on, and that is in fact what the POWER9 hardware will
do when LPCR[AIL] = 3. All such interrupts use HSRR0/1 not SRR0/1
except for system call with LEV=1 (hcall).
Therefore this adds the KVM tests to the _HV variants of the
relocation-on interrupt handlers, and adds the KVM test to the
relocation-on system call entry point.
We also instantiate the relocation-on versions of the hypervisor
data storage and instruction interrupt handlers, since these can
occur with relocation on in radix guests.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:39 +0000 (21:21 +1100)]
powerpc/64: Make type of partition table flush depend on partition type
When changing a partition table entry on POWER9, we do a particular
form of the tlbie instruction which flushes all TLBs and caches of
the partition table for a given logical partition ID (LPID).
This instruction has a field in the instruction word, labelled R
(radix), which should be 1 if the partition was previously a radix
partition and 0 if it was a HPT partition. This implements that
logic.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:37 +0000 (21:21 +1100)]
powerpc/64: More definitions for POWER9
This adds definitions for bits in the DSISR register which are used
by POWER9 for various translation-related exception conditions, and
for some more bits in the partition table entry that will be needed
by KVM.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:36 +0000 (21:21 +1100)]
powerpc/64: Enable use of radix MMU under hypervisor on POWER9
To use radix as a guest, we first need to tell the hypervisor via
the ibm,client-architecture-support call that we support POWER9 and
architecture v3.00, that we can do either radix or hash, and that we
would like to choose later using an hcall (the H_REGISTER_PROC_TBL
hcall).
Then we need to check whether the hypervisor agreed to us using
radix. We need to do this very early on in the kernel boot process
before any of the MMU initialization is done. If the hypervisor
doesn't agree, we can't use radix and therefore clear the radix
MMU feature bit.
Later, when we have set up our process table, which points to the
radix tree for each process, we need to install that using the
H_REGISTER_PROC_TBL hcall.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:35 +0000 (21:21 +1100)]
powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options
This fixes the byte index values for some of the option bits in
the "ibm,architecture-vec-5" property. The "platform facilities options"
bits are in byte 17 not byte 14, so the upper 8 bits of their
definitions need to be 0x11 not 0x0E. The "sub processor support" option
is in byte 21 not byte 15.
Note none of these options are actually looked up in
"ibm,architecture-vec-5" at this time, so there is no bug.
When checking whether option bits are set, we should check that
the offset of the byte being checked is less than the vector
length that we got from the hypervisor.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Mon, 30 Jan 2017 10:21:34 +0000 (21:21 +1100)]
powerpc/64: Don't try to use radix MMU under a hypervisor
Currently, if the kernel is running on a POWER9 processor under a
hypervisor, it will try to use the radix MMU even though it doesn't have
the necessary code to use radix under a hypervisor (it doesn't negotiate
use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The
result is that the guest kernel will crash when it tries to turn on the
MMU.
This fixes it by looking for the /chosen/ibm,architecture-vec-5
property, and if it exists, clears the radix MMU feature bit, before we
decide whether to initialize for radix or HPT. This property is created
by the hypervisor as a result of the guest calling the
ibm,client-architecture-support method to indicate its capabilities, so
it will indicate whether the hypervisor agreed to us using radix.
Systems without a hypervisor may have this property also (for example,
skiboot creates it), so we check the HV bit in the MSR to see whether we
are running as a guest or not. If we are in hypervisor mode, then we can
do whatever we like including using the radix MMU.
The reason for using this property is that in future, when we have
support for using radix under a hypervisor, we will need to check this
property to see whether the hypervisor agreed to us using radix.
Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 27 Jan 2017 04:00:34 +0000 (14:00 +1000)]
KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts
64-bit Book3S exception handlers must find the dynamic kernel base
to add to the target address when branching beyond __end_interrupts,
in order to support kernel running at non-0 physical address.
Support this in KVM by branching with CTR, similarly to regular
interrupt handlers. The guest CTR is saved in HSTATE_SCRATCH1 and
restored after the branch.
Without this, the host kernel hangs and crashes randomly when it is
running at a non-0 address and a KVM guest is started.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use remove_pagetable() and friends for radix vmemmap removal.
We do not require the special-case handling of vmemmap done in the x86
versions of these functions. This is because vmemmap_free() has already
freed the mapped pages, and calls us with an aligned address range.
So, add a few failsafe WARNs, but otherwise the code to remove physical
mappings is already sufficient for vmemmap.
Reza Arbab [Mon, 16 Jan 2017 19:07:45 +0000 (13:07 -0600)]
powerpc/mm: add radix__remove_section_mapping()
Tear down and free the four-level page tables of physical mappings
during memory hotremove.
Borrow the basic structure of remove_pagetable() and friends from the
identically-named x86 functions. Reduce the frequency of tlb flushes and
page_table_lock spinlocks by only doing them in the outermost function.
There was some question as to whether the locking is needed at all.
Leave it for now, but we could consider dropping it.
Memory must be offline to be removed, thus not in use. So there
shouldn't be the sort of concurrent page walking activity here that
might prompt us to use RCU.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>