[PATCH v3 0/9] Enable STRICT_KERNEL_RWX

Balbir Singh
Enable STRICT_KERNEL_RWX for PPC64/BOOK3S

These patches enable RX mappings of kernel text.
As a trade-off, rodata is mapped RX as well; there
are more details in the patch descriptions.

As a prerequisite for R/O text, patch_instruction
is moved over to using a separate mapping that
allows writes to kernel text. xmon/ftrace/kprobes
have been moved over to work with patch_instruction.

There is a bug fix: the updatepp and updateboltedpp
(pseries) providers did not use the flags as described
in PAPR.

There are patches for 32-bit support from Christophe Leroy
at http://patchwork.ozlabs.org/patch/768257/. The patches
converting map_page to map_kernel_page are a prerequisite
for applying this series.

Another build failure was reported because, instead
of using ARCH_HAS_SET_MEMORY as a gate for set_memory.h
inclusion, some of the infrastructure in the core kernel
uses CONFIG_STRICT_KERNEL_RWX. I've sent a fix for the
latter.

Changelog v3:
        Support radix
        Drop ptdump patch, already picked from v2
Changelog v2:
        Support optprobes via patch_instruction

Balbir Singh (9):
  powerpc/lib/code-patching: Enhance code patching
  powerpc/kprobes: Move kprobes over to patch_instruction
  powerpc/kprobes/optprobes: Move over to patch_instruction
  powerpc/xmon: Add patch_instruction support for xmon
  powerpc/vmlinux.lds: Align __init_begin to 16M
  powerpc/platform/pseries/lpar: Fix updatepp and updateboltedpp
  powerpc/mm/hash: Implement mark_rodata_ro() for hash
  powerpc/Kconfig: Enable STRICT_KERNEL_RWX
  powerpc/mm/radix: Implement mark_rodata_ro() for radix

 arch/powerpc/Kconfig                       |   1 +
 arch/powerpc/include/asm/book3s/64/hash.h  |   3 +
 arch/powerpc/include/asm/book3s/64/radix.h |   4 +
 arch/powerpc/kernel/kprobes.c              |   4 +-
 arch/powerpc/kernel/optprobes.c            |  58 +++++++-----
 arch/powerpc/kernel/vmlinux.lds.S          |  10 ++-
 arch/powerpc/lib/code-patching.c           | 140 ++++++++++++++++++++++++++++-
 arch/powerpc/mm/pgtable-hash64.c           |  35 ++++++++
 arch/powerpc/mm/pgtable-radix.c            |  73 ++++++++++++++-
 arch/powerpc/mm/pgtable_64.c               |   9 ++
 arch/powerpc/platforms/pseries/lpar.c      |  13 ++-
 arch/powerpc/xmon/xmon.c                   |   7 +-
 12 files changed, 323 insertions(+), 34 deletions(-)

--
2.9.4

[PATCH v3 1/9] powerpc/lib/code-patching: Enhance code patching

Balbir Singh
Today our patching happens via direct copy and
patch_instruction. The patching code is well
contained in the sense that the copying bits are limited.

While considering the implementation of CONFIG_STRICT_KERNEL_RWX,
the first requirement is to create another mapping
that will allow for patching. We create the window using
text_poke_area, allocated via get_vm_area(), which might
be overkill. text_poke_area is per CPU to avoid locking.
Other arches do similar things, but use fixmaps. The reason
for not using fixmaps is to make use of any randomization in
the future. The code also relies on set_pte_at and pte_clear
to do the appropriate TLB flushing.

Signed-off-by: Balbir Singh <[hidden email]>
---
 arch/powerpc/lib/code-patching.c | 140 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 136 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 500b0f6..c0a0834 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -12,23 +12,154 @@
 #include <linux/vmalloc.h>
 #include <linux/init.h>
 #include <linux/mm.h>
+#include <linux/cpuhotplug.h>
 #include <asm/page.h>
 #include <asm/code-patching.h>
 #include <linux/uaccess.h>
 #include <linux/kprobes.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
 
+static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
+static unsigned int text_area_patch_avail;
 
-int patch_instruction(unsigned int *addr, unsigned int instr)
+static int text_area_cpu_up(unsigned int cpu)
+{
+ struct vm_struct *area;
+
+ area = get_vm_area(PAGE_SIZE, VM_ALLOC);
+ if (!area) {
+ WARN_ONCE(1, "Failed to create text area for cpu %d\n",
+ cpu);
+ return -1;
+ }
+ this_cpu_write(text_poke_area, area);
+ return 0;
+}
+
+static int text_area_cpu_down(unsigned int cpu)
+{
+ free_vm_area(this_cpu_read(text_poke_area));
+ return 0;
+}
+
+/*
+ * This is an early_initcall and early_initcalls happen at the right time
+ * for us, after slab is enabled and before we mark ro pages R/O. In the
+ * future if get_vm_area is randomized, this will be more flexible than
+ * fixmap
+ */
+static int __init setup_text_poke_area(void)
 {
+ struct vm_struct *area;
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ area = get_vm_area(PAGE_SIZE, VM_ALLOC);
+ if (!area) {
+ WARN_ONCE(1, "Failed to create text area for cpu %d\n",
+ cpu);
+ /* Should we disable strict rwx? */
+ continue;
+ }
+ this_cpu_write(text_poke_area, area);
+ }
+ cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "powerpc/text_poke:online", text_area_cpu_up,
+ text_area_cpu_down);
+ text_area_patch_avail = 1;
+ /*
+ * The barrier here ensures the write is visible to
+ * patch_instruction()
+ */
+ smp_wmb();
+ pr_info("text_poke area ready...\n");
+ return 0;
+}
+
+/*
+ * This can be called for kernel text or a module.
+ */
+static int kernel_map_addr(void *addr)
+{
+ unsigned long pfn;
  int err;
 
- __put_user_size(instr, addr, 4, err);
+ if (is_vmalloc_addr(addr))
+ pfn = vmalloc_to_pfn(addr);
+ else
+ pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+
+ err = map_kernel_page(
+ (unsigned long)__this_cpu_read(text_poke_area)->addr,
+ (pfn << PAGE_SHIFT), pgprot_val(PAGE_KERNEL));
+ pr_devel("Mapped addr %p with pfn %lx\n",
+ __this_cpu_read(text_poke_area)->addr, pfn);
  if (err)
- return err;
- asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r" (addr));
+ return -1;
  return 0;
 }
 
+static inline void kernel_unmap_addr(void *addr)
+{
+ pte_t *pte;
+ unsigned long kaddr = (unsigned long)addr;
+
+ pte = pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(kaddr),
+ kaddr), kaddr), kaddr);
+ pr_devel("clearing mm %p, pte %p, kaddr %lx\n", &init_mm, pte, kaddr);
+ pte_clear(&init_mm, kaddr, pte);
+ flush_tlb_kernel_range(kaddr, kaddr + PAGE_SIZE);
+}
+
+int patch_instruction(unsigned int *addr, unsigned int instr)
+{
+ int err;
+ unsigned int *dest = NULL;
+ unsigned long flags;
+ unsigned long kaddr = (unsigned long)addr;
+
+ /*
+ * Make sure we can see any write of text_area_patch_avail
+ */
+ smp_rmb();
+
+ /*
+ * During early early boot patch_instruction is called
+ * when text_poke_area is not ready, but we still need
+ * to allow patching. We just do the plain old patching
+ * We use text_area_patch_avail, since per cpu read
+ * via __this_cpu_read of text_poke_area might not
+ * yet be available.
+ * TODO: Make text_area_patch_avail per cpu?
+ */
+ if (!text_area_patch_avail) {
+ __put_user_size(instr, addr, 4, err);
+ if (err)
+ return err;
+ asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (addr));
+ return 0;
+ }
+
+ local_irq_save(flags);
+ if (kernel_map_addr(addr)) {
+ err = -1;
+ goto out;
+ }
+
+ dest = (unsigned int *)(__this_cpu_read(text_poke_area)->addr) +
+ ((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
+ __put_user_size(instr, dest, 4, err);
+ if (!err)
+ asm ("dcbst 0, %0; sync; icbi 0,%0; icbi 0,%1; sync; isync"
+ ::"r" (dest), "r"(addr));
+ kernel_unmap_addr(__this_cpu_read(text_poke_area)->addr);
+out:
+ local_irq_restore(flags);
+ return err;
+}
+NOKPROBE_SYMBOL(patch_instruction);
+
 int patch_branch(unsigned int *addr, unsigned long target, int flags)
 {
  return patch_instruction(addr, create_branch(addr, target, flags));
@@ -514,3 +645,4 @@ static int __init test_code_patching(void)
 late_initcall(test_code_patching);
 
 #endif /* CONFIG_CODE_PATCHING_SELFTEST */
+early_initcall(setup_text_poke_area);
--
2.9.4

[PATCH v3 2/9] powerpc/kprobes: Move kprobes over to patch_instruction

Balbir Singh
arch_arm_kprobe()/arch_disarm_kprobe() use direct assignment
for copying instructions; replace them with patch_instruction().

Signed-off-by: Balbir Singh <[hidden email]>
---
 arch/powerpc/kernel/kprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index fc43435..b49f8f0 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -158,7 +158,7 @@ NOKPROBE_SYMBOL(arch_prepare_kprobe);
 
 void arch_arm_kprobe(struct kprobe *p)
 {
- *p->addr = BREAKPOINT_INSTRUCTION;
+ patch_instruction(p->addr, BREAKPOINT_INSTRUCTION);
  flush_icache_range((unsigned long) p->addr,
    (unsigned long) p->addr + sizeof(kprobe_opcode_t));
 }
@@ -166,7 +166,7 @@ NOKPROBE_SYMBOL(arch_arm_kprobe);
 
 void arch_disarm_kprobe(struct kprobe *p)
 {
- *p->addr = p->opcode;
+ patch_instruction(p->addr, p->opcode);
  flush_icache_range((unsigned long) p->addr,
    (unsigned long) p->addr + sizeof(kprobe_opcode_t));
 }
--
2.9.4

[PATCH v3 3/9] powerpc/kprobes/optprobes: Move over to patch_instruction

Balbir Singh
With text moving to read-only, migrate optprobes to
the patch_instruction infrastructure. Without this,
optprobes will fail and complain.

Signed-off-by: Balbir Singh <[hidden email]>
---
 arch/powerpc/kernel/optprobes.c | 58 ++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index ec60ed0..1c7326c 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -158,12 +158,13 @@ void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
 void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
 {
  /* addis r4,0,(insn)@h */
- *addr++ = PPC_INST_ADDIS | ___PPC_RT(4) |
-  ((val >> 16) & 0xffff);
+ patch_instruction((unsigned int *)addr, PPC_INST_ADDIS | ___PPC_RT(4) |
+ ((val >> 16) & 0xffff));
+ addr++;
 
  /* ori r4,r4,(insn)@l */
- *addr = PPC_INST_ORI | ___PPC_RA(4) | ___PPC_RS(4) |
- (val & 0xffff);
+ patch_instruction((unsigned int *)addr, PPC_INST_ORI | ___PPC_RA(4) |
+ ___PPC_RS(4) | (val & 0xffff));
 }
 
 /*
@@ -173,24 +174,28 @@ void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
 void patch_imm64_load_insns(unsigned long val, kprobe_opcode_t *addr)
 {
  /* lis r3,(op)@highest */
- *addr++ = PPC_INST_ADDIS | ___PPC_RT(3) |
-  ((val >> 48) & 0xffff);
+ patch_instruction((unsigned int *)addr, PPC_INST_ADDIS | ___PPC_RT(3) |
+ ((val >> 48) & 0xffff));
+ addr++;
 
  /* ori r3,r3,(op)@higher */
- *addr++ = PPC_INST_ORI | ___PPC_RA(3) | ___PPC_RS(3) |
-  ((val >> 32) & 0xffff);
+ patch_instruction((unsigned int *)addr, PPC_INST_ORI | ___PPC_RA(3) |
+ ___PPC_RS(3) | ((val >> 32) & 0xffff));
+ addr++;
 
  /* rldicr r3,r3,32,31 */
- *addr++ = PPC_INST_RLDICR | ___PPC_RA(3) | ___PPC_RS(3) |
-  __PPC_SH64(32) | __PPC_ME64(31);
+ patch_instruction((unsigned int *)addr, PPC_INST_RLDICR | ___PPC_RA(3) |
+ ___PPC_RS(3) | __PPC_SH64(32) | __PPC_ME64(31));
+ addr++;
 
  /* oris r3,r3,(op)@h */
- *addr++ = PPC_INST_ORIS | ___PPC_RA(3) | ___PPC_RS(3) |
-  ((val >> 16) & 0xffff);
+ patch_instruction((unsigned int *)addr, PPC_INST_ORIS | ___PPC_RA(3) |
+ ___PPC_RS(3) | ((val >> 16) & 0xffff));
+ addr++;
 
  /* ori r3,r3,(op)@l */
- *addr = PPC_INST_ORI | ___PPC_RA(3) | ___PPC_RS(3) |
- (val & 0xffff);
+ patch_instruction((unsigned int *)addr, PPC_INST_ORI | ___PPC_RA(3) |
+ ___PPC_RS(3) | (val & 0xffff));
 }
 
 int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
@@ -198,7 +203,8 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
  kprobe_opcode_t *buff, branch_op_callback, branch_emulate_step;
  kprobe_opcode_t *op_callback_addr, *emulate_step_addr;
  long b_offset;
- unsigned long nip;
+ unsigned long nip, size;
+ int rc, i;
 
  kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
 
@@ -231,8 +237,15 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
  goto error;
 
  /* Setup template */
- memcpy(buff, optprobe_template_entry,
- TMPL_END_IDX * sizeof(kprobe_opcode_t));
+ /* We can optimize this via patch_instruction_window later */
+ size = (TMPL_END_IDX * sizeof(kprobe_opcode_t)) / sizeof(int);
+ pr_devel("Copying template to %p, size %lu\n", buff, size);
+ for (i = 0; i < size; i++) {
+ rc = patch_instruction((unsigned int *)buff + i,
+ *((unsigned int *)(optprobe_template_entry) + i));
+ if (rc < 0)
+ goto error;
+ }
 
  /*
  * Fixup the template with instructions to:
@@ -261,8 +274,10 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
  if (!branch_op_callback || !branch_emulate_step)
  goto error;
 
- buff[TMPL_CALL_HDLR_IDX] = branch_op_callback;
- buff[TMPL_EMULATE_IDX] = branch_emulate_step;
+ patch_instruction((unsigned int *)buff + TMPL_CALL_HDLR_IDX,
+ branch_op_callback);
+ patch_instruction((unsigned int *)buff + TMPL_EMULATE_IDX,
+ branch_emulate_step);
 
  /*
  * 3. load instruction to be emulated into relevant register, and
@@ -272,8 +287,9 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
  /*
  * 4. branch back from trampoline
  */
- buff[TMPL_RET_IDX] = create_branch((unsigned int *)buff + TMPL_RET_IDX,
- (unsigned long)nip, 0);
+ patch_instruction((unsigned int *)buff + TMPL_RET_IDX,
+ create_branch((unsigned int *)buff +
+ TMPL_RET_IDX, (unsigned long)nip, 0));
 
  flush_icache_range((unsigned long)buff,
    (unsigned long)(&buff[TMPL_END_IDX]));
--
2.9.4

[PATCH v3 4/9] powerpc/xmon: Add patch_instruction support for xmon

Balbir Singh
Move xmon from mwrite() to patch_instruction() for
breakpoint addition and removal.

Signed-off-by: Balbir Singh <[hidden email]>
---
 arch/powerpc/xmon/xmon.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a728e19..08e367e 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -53,6 +53,7 @@
 #include <asm/xive.h>
 #include <asm/opal.h>
 #include <asm/firmware.h>
+#include <asm/code-patching.h>
 
 #ifdef CONFIG_PPC64
 #include <asm/hvcall.h>
@@ -837,7 +838,8 @@ static void insert_bpts(void)
  store_inst(&bp->instr[0]);
  if (bp->enabled & BP_CIABR)
  continue;
- if (mwrite(bp->address, &bpinstr, 4) != 4) {
+ if (patch_instruction((unsigned int *)bp->address,
+ bpinstr) != 0) {
  printf("Couldn't write instruction at %lx, "
        "disabling breakpoint there\n", bp->address);
  bp->enabled &= ~BP_TRAP;
@@ -874,7 +876,8 @@ static void remove_bpts(void)
  continue;
  if (mread(bp->address, &instr, 4) == 4
     && instr == bpinstr
-    && mwrite(bp->address, &bp->instr, 4) != 4)
+    && patch_instruction(
+ (unsigned int *)bp->address, bp->instr[0]) != 0)
  printf("Couldn't remove breakpoint at %lx\n",
        bp->address);
  else
--
2.9.4

[PATCH v3 5/9] powerpc/vmlinux.lds: Align __init_begin to 16M

Balbir Singh
For CONFIG_STRICT_KERNEL_RWX align __init_begin to 16M.
We use 16M since it's the larger of 2M on radix and 16M
on hash for our linear mapping. The plan is to have
.text, .rodata and everything up to __init_begin marked
as RX. Note that we still have executable read-only data.
We could further align the read-only data to another 16M
boundary. Keeping text plus rodata read-only-executable
is a trade-off against making text read-only-executable
and rodata read-only.

We don't use multiple PT_LOAD entries in PHDRS because
we are not sure all bootloaders support them.

Signed-off-by: Balbir Singh <[hidden email]>
---
 arch/powerpc/kernel/vmlinux.lds.S | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index ace6b65..b1a2505 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -8,6 +8,12 @@
 #include <asm/cache.h>
 #include <asm/thread_info.h>
 
+#ifdef CONFIG_STRICT_KERNEL_RWX
+#define STRICT_ALIGN_SIZE (1 << 24)
+#else
+#define STRICT_ALIGN_SIZE PAGE_SIZE
+#endif
+
 ENTRY(_stext)
 
 PHDRS {
@@ -123,7 +129,7 @@ SECTIONS
  PROVIDE32 (etext = .);
 
  /* Read-only data */
- RODATA
+ RO_DATA(PAGE_SIZE)
 
  EXCEPTION_TABLE(0)
 
@@ -140,7 +146,7 @@ SECTIONS
 /*
  * Init sections discarded at runtime
  */
- . = ALIGN(PAGE_SIZE);
+ . = ALIGN(STRICT_ALIGN_SIZE);
  __init_begin = .;
  INIT_TEXT_SECTION(PAGE_SIZE) :kernel
 
--
2.9.4

[PATCH v3 6/9] powerpc/platform/pseries/lpar: Fix updatepp and updateboltedpp

Balbir Singh
PAPR has pp0 in bit 55; currently we assumed that
pp0 is bit 0 (all bits in IBM order). This patch fixes
the pp0 bits for both these routines that use H_PROTECT.

Signed-off-by: Balbir Singh <[hidden email]>
---
 arch/powerpc/platforms/pseries/lpar.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 6541d0b..83db643 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -301,7 +301,7 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
        int ssize, unsigned long inv_flags)
 {
  unsigned long lpar_rc;
- unsigned long flags = (newpp & 7) | H_AVPN;
+ unsigned long flags;
  unsigned long want_v;
 
  want_v = hpte_encode_avpn(vpn, psize, ssize);
@@ -309,6 +309,11 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
  pr_devel("    update: avpnv=%016lx, hash=%016lx, f=%lx, psize: %d ...",
  want_v, slot, flags, psize);
 
+ /*
+ * Move pp0 and set the mask, pp0 is bit 55
+ * We ignore the keys for now.
+ */
+ flags = ((newpp & HPTE_R_PP0) >> 55) | (newpp & 7) | H_AVPN;
  lpar_rc = plpar_pte_protect(flags, slot, want_v);
 
  if (lpar_rc == H_NOT_FOUND) {
@@ -379,7 +384,11 @@ static void pSeries_lpar_hpte_updateboltedpp(unsigned long newpp,
  slot = pSeries_lpar_hpte_find(vpn, psize, ssize);
  BUG_ON(slot == -1);
 
- flags = newpp & 7;
+ /*
+ * Move pp0 and set the mask, pp0 is bit 55
+ * We ignore the keys for now.
+ */
+ flags = ((newpp & HPTE_R_PP0) >> 55) | (newpp & 7);
  lpar_rc = plpar_pte_protect(flags, slot, 0);
 
  BUG_ON(lpar_rc != H_SUCCESS);
--
2.9.4

[PATCH v3 7/9] powerpc/mm/hash: Implement mark_rodata_ro() for hash

Balbir Singh
With hash we update the bolted PTE to mark it read-only. We rely
on MMU_FTR_KERNEL_RO to generate the correct permissions
for read-only text. The radix implementation just prints a warning
for now.

Signed-off-by: Balbir Singh <[hidden email]>
---
 arch/powerpc/include/asm/book3s/64/hash.h  |  3 +++
 arch/powerpc/include/asm/book3s/64/radix.h |  4 ++++
 arch/powerpc/mm/pgtable-hash64.c           | 35 ++++++++++++++++++++++++++++++
 arch/powerpc/mm/pgtable-radix.c            |  7 ++++++
 arch/powerpc/mm/pgtable_64.c               |  9 ++++++++
 5 files changed, 58 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index 4e957b0..0ce513f 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -89,6 +89,9 @@ static inline int hash__pgd_bad(pgd_t pgd)
 {
  return (pgd_val(pgd) == 0);
 }
+#ifdef CONFIG_STRICT_KERNEL_RWX
+extern void hash__mark_rodata_ro(void);
+#endif
 
 extern void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
     pte_t *ptep, unsigned long pte, int huge);
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index ac16d19..368cb54 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -116,6 +116,10 @@
 #define RADIX_PUD_TABLE_SIZE (sizeof(pud_t) << RADIX_PUD_INDEX_SIZE)
 #define RADIX_PGD_TABLE_SIZE (sizeof(pgd_t) << RADIX_PGD_INDEX_SIZE)
 
+#ifdef CONFIG_STRICT_KERNEL_RWX
+extern void radix__mark_rodata_ro(void);
+#endif
+
 static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr,
        unsigned long set)
 {
diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c
index 8b85a14..dda808b 100644
--- a/arch/powerpc/mm/pgtable-hash64.c
+++ b/arch/powerpc/mm/pgtable-hash64.c
@@ -11,8 +11,12 @@
 
 #include <linux/sched.h>
 #include <linux/mm_types.h>
+#include <linux/mm.h>
 
 #include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/sections.h>
+#include <asm/mmu.h>
 #include <asm/tlb.h>
 
 #include "mmu_decl.h"
@@ -342,3 +346,34 @@ int hash__has_transparent_hugepage(void)
  return 1;
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+#ifdef CONFIG_STRICT_KERNEL_RWX
+void hash__mark_rodata_ro(void)
+{
+ unsigned long start = (unsigned long)_stext;
+ unsigned long end = (unsigned long)__init_begin;
+ unsigned long idx;
+ unsigned int step, shift;
+ unsigned long newpp = PP_RXXX;
+
+ if (!mmu_has_feature(MMU_FTR_KERNEL_RO)) {
+ pr_info("R/O rodata not supported\n");
+ return;
+ }
+
+ shift = mmu_psize_defs[mmu_linear_psize].shift;
+ step = 1 << shift;
+
+ start = ((start + step - 1) >> shift) << shift;
+ end = (end >> shift) << shift;
+
+ pr_devel("marking ro start %lx, end %lx, step %x\n",
+ start, end, step);
+
+ for (idx = start; idx < end; idx += step)
+ /* Not sure if we can do much with the return value */
+ mmu_hash_ops.hpte_updateboltedpp(newpp, idx, mmu_linear_psize,
+ mmu_kernel_ssize);
+
+}
+#endif
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index c28165d..8f42309 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -108,6 +108,13 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
  return 0;
 }
 
+#ifdef CONFIG_STRICT_KERNEL_RWX
+void radix__mark_rodata_ro(void)
+{
+ pr_warn("Not yet implemented for radix\n");
+}
+#endif
+
 static inline void __meminit print_mapping(unsigned long start,
    unsigned long end,
    unsigned long size)
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index db93cf7..a769e4f 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -479,3 +479,12 @@ void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
 }
 EXPORT_SYMBOL_GPL(mmu_partition_table_set_entry);
 #endif /* CONFIG_PPC_BOOK3S_64 */
+
+#ifdef CONFIG_STRICT_KERNEL_RWX
+void mark_rodata_ro(void)
+{
+ if (radix_enabled())
+ return radix__mark_rodata_ro();
+ return hash__mark_rodata_ro();
+}
+#endif
--
2.9.4

[PATCH v3 8/9] powerpc/Kconfig: Enable STRICT_KERNEL_RWX

Balbir Singh
We have the basic support in place: patching of R/O
text sections and linker-script changes extending the
alignment of text and data. We've also got mark_rodata_ro().

Signed-off-by: Balbir Singh <[hidden email]>
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a81460b..4803792 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -171,6 +171,7 @@ config PPC
  select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
  select HAVE_ARCH_SECCOMP_FILTER
  select HAVE_ARCH_TRACEHOOK
+ select ARCH_HAS_STRICT_KERNEL_RWX if PPC64 && PPC_BOOK3S
  select HAVE_CBPF_JIT if !PPC64
  select HAVE_CONTEXT_TRACKING if PPC64
  select HAVE_DEBUG_KMEMLEAK
--
2.9.4

[PATCH v3 9/9] powerpc/mm/radix: Implement mark_rodata_ro() for radix

Balbir Singh
This patch splits the linear page mapping so that the
regions containing kernel text are mapped with 2M pages
and the rest with the largest possible size, 1G. The downside
is that we split one 1G mapping into 512 2M mappings
for the kernel, but without that we cannot support
R/O areas at 1G granularity: the kernel is much smaller,
and using 1G as the granularity would waste a lot of space
for the sake of optimizing the TLB. The text itself should
fit into about 6-8 mappings, so the effect should not be all that bad.

Signed-off-by: Balbir Singh <[hidden email]>
---
 arch/powerpc/mm/pgtable-radix.c | 68 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 8f42309..7c46dbc 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -11,6 +11,7 @@
 #include <linux/sched/mm.h>
 #include <linux/memblock.h>
 #include <linux/of_fdt.h>
+#include <linux/mm.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -19,9 +20,12 @@
 #include <asm/mmu.h>
 #include <asm/firmware.h>
 #include <asm/powernv.h>
+#include <asm/sections.h>
 
 #include <trace/events/thp.h>
 
+int mmu_radix_linear_psize = PAGE_SIZE;
+
 static int native_register_process_table(unsigned long base, unsigned long pg_sz,
  unsigned long table_size)
 {
@@ -111,7 +115,52 @@ int radix__map_kernel_page(unsigned long ea, unsigned long pa,
 #ifdef CONFIG_STRICT_KERNEL_RWX
 void radix__mark_rodata_ro(void)
 {
- pr_warn("Not yet implemented for radix\n");
+ unsigned long start = (unsigned long)_stext;
+ unsigned long end = (unsigned long)__init_begin;
+ unsigned long idx;
+ unsigned int step, shift;
+ pgd_t *pgdp;
+ pud_t *pudp;
+ pmd_t *pmdp;
+ pte_t *ptep;
+
+ if (!mmu_has_feature(MMU_FTR_KERNEL_RO)) {
+ pr_info("R/O rodata not supported\n");
+ return;
+ }
+
+ shift = mmu_psize_defs[mmu_radix_linear_psize].shift;
+ step = 1 << shift;
+
+ start = ((start + step - 1) >> shift) << shift;
+ end = (end >> shift) << shift;
+
+ pr_devel("marking ro start %lx, end %lx, step %x\n",
+ start, end, step);
+
+ for (idx = start; idx < end; idx += step) {
+ pgdp = pgd_offset_k(idx);
+ pudp = pud_alloc(&init_mm, pgdp, idx);
+ if (!pudp)
+ continue;
+ if (pud_huge(*pudp)) {
+ ptep = (pte_t *)pudp;
+ goto update_the_pte;
+ }
+ pmdp = pmd_alloc(&init_mm, pudp, idx);
+ if (!pmdp)
+ continue;
+ if (pmd_huge(*pmdp)) {
+ ptep = pmdp_ptep(pmdp);
+ goto update_the_pte;
+ }
+ ptep = pte_alloc_kernel(pmdp, idx);
+ if (!ptep)
+ continue;
+update_the_pte:
+ pte_update(&init_mm, idx, ptep, _PAGE_WRITE, 0, 0);
+ }
+
 }
 #endif
 
@@ -129,6 +178,7 @@ static int __meminit create_physical_mapping(unsigned long start,
      unsigned long end)
 {
  unsigned long addr, mapping_size = 0;
+ unsigned long max_mapping_size;
 
  start = _ALIGN_UP(start, PAGE_SIZE);
  for (addr = start; addr < end; addr += mapping_size) {
@@ -137,9 +187,12 @@ static int __meminit create_physical_mapping(unsigned long start,
 
  gap = end - addr;
  previous_size = mapping_size;
+ max_mapping_size = PUD_SIZE;
 
+retry:
  if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
-    mmu_psize_defs[MMU_PAGE_1G].shift)
+    mmu_psize_defs[MMU_PAGE_1G].shift &&
+    PUD_SIZE <= max_mapping_size)
  mapping_size = PUD_SIZE;
  else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE &&
  mmu_psize_defs[MMU_PAGE_2M].shift)
@@ -147,6 +200,17 @@ static int __meminit create_physical_mapping(unsigned long start,
  else
  mapping_size = PAGE_SIZE;
 
+ if (mapping_size == PUD_SIZE &&
+ addr <= __pa_symbol(__init_begin) &&
+ (addr + mapping_size) >= __pa_symbol(_stext)) {
+ max_mapping_size = PMD_SIZE;
+ goto retry;
+ }
+
+ if (addr <= __pa_symbol(__init_begin) &&
+ (addr + mapping_size) >= __pa_symbol(_stext))
+ mmu_radix_linear_psize = mapping_size;
+
  if (mapping_size != previous_size) {
  print_mapping(start, addr, previous_size);
  start = addr;
--
2.9.4

Re: [PATCH v3 2/9] powerpc/kprobes: Move kprobes over to patch_instruction

Naveen N. Rao
Hi Balbir,

On 2017/06/06 02:29PM, Balbir Singh wrote:

> arch_arm/disarm_probe use direct assignment for copying
> instructions, replace them with patch_instruction
>
> Signed-off-by: Balbir Singh <[hidden email]>
> ---
>  arch/powerpc/kernel/kprobes.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index fc43435..b49f8f0 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -158,7 +158,7 @@ NOKPROBE_SYMBOL(arch_prepare_kprobe);
>
>  void arch_arm_kprobe(struct kprobe *p)
>  {
> - *p->addr = BREAKPOINT_INSTRUCTION;
> + patch_instruction(p->addr, BREAKPOINT_INSTRUCTION);
>   flush_icache_range((unsigned long) p->addr,
>     (unsigned long) p->addr + sizeof(kprobe_opcode_t));

Do we still need flush_icache_range() after patch_instruction()?

- Naveen

>  }
> @@ -166,7 +166,7 @@ NOKPROBE_SYMBOL(arch_arm_kprobe);
>
>  void arch_disarm_kprobe(struct kprobe *p)
>  {
> - *p->addr = p->opcode;
> + patch_instruction(p->addr, p->opcode);
>   flush_icache_range((unsigned long) p->addr,
>     (unsigned long) p->addr + sizeof(kprobe_opcode_t));
>  }
> --
> 2.9.4
>

Re: [PATCH v3 3/9] powerpc/kprobes/optprobes: Move over to patch_instruction

Naveen N. Rao
In reply to this post by Balbir Singh
On 2017/06/06 02:29PM, Balbir Singh wrote:

> With text moving to read-only migrate optprobes to using
> the patch_instruction infrastructure. Without this optprobes
> will fail and complain.
>
> Signed-off-by: Balbir Singh <[hidden email]>
> ---
>  arch/powerpc/kernel/optprobes.c | 58 ++++++++++++++++++++++++++---------------
>  1 file changed, 37 insertions(+), 21 deletions(-)
>
> diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
> index ec60ed0..1c7326c 100644
> --- a/arch/powerpc/kernel/optprobes.c
> +++ b/arch/powerpc/kernel/optprobes.c
> @@ -158,12 +158,13 @@ void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
>  void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
>  {
>   /* addis r4,0,(insn)@h */
> - *addr++ = PPC_INST_ADDIS | ___PPC_RT(4) |
> -  ((val >> 16) & 0xffff);
> + patch_instruction((unsigned int *)addr, PPC_INST_ADDIS | ___PPC_RT(4) |

We can probably get rid of those casts, seeing as we're not using it in
kprobes.c.

> + ((val >> 16) & 0xffff));
> + addr++;
>
>   /* ori r4,r4,(insn)@l */
> - *addr = PPC_INST_ORI | ___PPC_RA(4) | ___PPC_RS(4) |
> - (val & 0xffff);
> + patch_instruction((unsigned int *)addr, PPC_INST_ORI | ___PPC_RA(4) |
> + ___PPC_RS(4) | (val & 0xffff));
>  }
>
>  /*
> @@ -173,24 +174,28 @@ void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
>  void patch_imm64_load_insns(unsigned long val, kprobe_opcode_t *addr)
>  {
>   /* lis r3,(op)@highest */
> - *addr++ = PPC_INST_ADDIS | ___PPC_RT(3) |
> -  ((val >> 48) & 0xffff);
> + patch_instruction((unsigned int *)addr, PPC_INST_ADDIS | ___PPC_RT(3) |
> + ((val >> 48) & 0xffff));
> + addr++;
>
>   /* ori r3,r3,(op)@higher */
> - *addr++ = PPC_INST_ORI | ___PPC_RA(3) | ___PPC_RS(3) |
> -  ((val >> 32) & 0xffff);
> + patch_instruction((unsigned int *)addr, PPC_INST_ORI | ___PPC_RA(3) |
> + ___PPC_RS(3) | ((val >> 32) & 0xffff));
> + addr++;
>
>   /* rldicr r3,r3,32,31 */
> - *addr++ = PPC_INST_RLDICR | ___PPC_RA(3) | ___PPC_RS(3) |
> -  __PPC_SH64(32) | __PPC_ME64(31);
> + patch_instruction((unsigned int *)addr, PPC_INST_RLDICR | ___PPC_RA(3) |
> + ___PPC_RS(3) | __PPC_SH64(32) | __PPC_ME64(31));
> + addr++;
>
>   /* oris r3,r3,(op)@h */
> - *addr++ = PPC_INST_ORIS | ___PPC_RA(3) | ___PPC_RS(3) |
> -  ((val >> 16) & 0xffff);
> + patch_instruction((unsigned int *)addr, PPC_INST_ORIS | ___PPC_RA(3) |
> + ___PPC_RS(3) | ((val >> 16) & 0xffff));
> + addr++;
>
>   /* ori r3,r3,(op)@l */
> - *addr = PPC_INST_ORI | ___PPC_RA(3) | ___PPC_RS(3) |
> - (val & 0xffff);
> + patch_instruction((unsigned int *)addr, PPC_INST_ORI | ___PPC_RA(3) |
> + ___PPC_RS(3) | (val & 0xffff));
>  }
>
>  int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
> @@ -198,7 +203,8 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
>   kprobe_opcode_t *buff, branch_op_callback, branch_emulate_step;
>   kprobe_opcode_t *op_callback_addr, *emulate_step_addr;
>   long b_offset;
> - unsigned long nip;
> + unsigned long nip, size;
> + int rc, i;
>
>   kprobe_ppc_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
>
> @@ -231,8 +237,15 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
>   goto error;
>
>   /* Setup template */
> - memcpy(buff, optprobe_template_entry,
> - TMPL_END_IDX * sizeof(kprobe_opcode_t));
> + /* We can optimize this via patch_instruction_window later */

This probably needs a TODO just so it's clear. I do think this would be
good to add since we copy many instructions while setting up the
optprobe, so this is quite slow as it exists today.

> + size = (TMPL_END_IDX * sizeof(kprobe_opcode_t)) / sizeof(int);

That's just TMPL_END_IDX.

Thanks,
Naveen

> + pr_devel("Copying template to %p, size %lu\n", buff, size);
> + for (i = 0; i < size; i++) {
> + rc = patch_instruction((unsigned int *)buff + i,
> + *((unsigned int *)(optprobe_template_entry) + i));
> + if (rc < 0)
> + goto error;
> + }
>
>   /*
>   * Fixup the template with instructions to:
> @@ -261,8 +274,10 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
>   if (!branch_op_callback || !branch_emulate_step)
>   goto error;
>
> - buff[TMPL_CALL_HDLR_IDX] = branch_op_callback;
> - buff[TMPL_EMULATE_IDX] = branch_emulate_step;
> + patch_instruction((unsigned int *)buff + TMPL_CALL_HDLR_IDX,
> + branch_op_callback);
> + patch_instruction((unsigned int *)buff + TMPL_EMULATE_IDX,
> + branch_emulate_step);
>
>   /*
>   * 3. load instruction to be emulated into relevant register, and
> @@ -272,8 +287,9 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
>   /*
>   * 4. branch back from trampoline
>   */
> - buff[TMPL_RET_IDX] = create_branch((unsigned int *)buff + TMPL_RET_IDX,
> - (unsigned long)nip, 0);
> + patch_instruction((unsigned int *)buff + TMPL_RET_IDX,
> + create_branch((unsigned int *)buff +
> + TMPL_RET_IDX, (unsigned long)nip, 0));
>
>   flush_icache_range((unsigned long)buff,
>     (unsigned long)(&buff[TMPL_END_IDX]));
> --
> 2.9.4
>


Re: [PATCH v3 3/9] powerpc/kprobes/optprobes: Move over to patch_instruction

Balbir Singh
On Wed, 2017-06-07 at 00:42 +0530, Naveen N. Rao wrote:
> On 2017/06/06 02:29PM, Balbir Singh wrote:
> > With text moving to read-only, migrate optprobes to
> > the patch_instruction infrastructure. Without this, optprobes
> > will fail and complain.
> >
<snip>
> > + /* We can optimize this via patch_instruction_window later */
>
> This probably needs a TODO just so it's clear. I do think this would be
> good to add since we copy many instructions while setting up the
> optprobe, so this is quite slow as it exists today.

I made it read like a TODO, with a TODO:

>
> > + size = (TMPL_END_IDX * sizeof(kprobe_opcode_t)) / sizeof(int);
>
> That's just TMPL_END_IDX.
>

The entire kprobe_opcode_t and int types thing is a bit messy. The size
calculation assumes nothing about the sizes for now, but I guess
patch_instruction does. Another TODO?

Thanks for the review
Balbir Singh.


Re: [PATCH v3 2/9] powerpc/kprobes: Move kprobes over to patch_instruction

Balbir Singh
In reply to this post by Naveen N. Rao
On Wed, 2017-06-07 at 00:35 +0530, Naveen N. Rao wrote:

> Hi Balbir,
>
> On 2017/06/06 02:29PM, Balbir Singh wrote:
> > arch_arm/disarm_probe use direct assignment for copying
> > instructions; replace them with patch_instruction
> >
> > Signed-off-by: Balbir Singh <[hidden email]>
> > ---
> >  arch/powerpc/kernel/kprobes.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> > index fc43435..b49f8f0 100644
> > --- a/arch/powerpc/kernel/kprobes.c
> > +++ b/arch/powerpc/kernel/kprobes.c
> > @@ -158,7 +158,7 @@ NOKPROBE_SYMBOL(arch_prepare_kprobe);
> >
> >  void arch_arm_kprobe(struct kprobe *p)
> >  {
> > - *p->addr = BREAKPOINT_INSTRUCTION;
> > + patch_instruction(p->addr, BREAKPOINT_INSTRUCTION);
> >   flush_icache_range((unsigned long) p->addr,
> >     (unsigned long) p->addr + sizeof(kprobe_opcode_t));
>
> Do we still need flush_icache_range() after patch_instruction()?
>

Good catch! No, we don't

Balbir Singh.


Re: [PATCH v3 3/9] powerpc/kprobes/optprobes: Move over to patch_instruction

Naveen N. Rao
In reply to this post by Balbir Singh
On 2017/06/07 03:46PM, Balbir Singh wrote:

> On Wed, 2017-06-07 at 00:42 +0530, Naveen N. Rao wrote:
> > On 2017/06/06 02:29PM, Balbir Singh wrote:
> > > With text moving to read-only, migrate optprobes to
> > > the patch_instruction infrastructure. Without this, optprobes
> > > will fail and complain.
> > >
> <snip>
> > > + /* We can optimize this via patch_instruction_window later */
> >
> > This probably needs a TODO just so it's clear. I do think this would be
> > good to add since we copy many instructions while setting up the
> > optprobe, so this is quite slow as it exists today.
>
> I made it read like a TODO, with a TODO:

Yes, just a nit that it would be good to have an explicit 'TODO:' there,
without which it looks like any other comment...

But, thinking about this more, we could probably simplify this by having
optprobes set things up in a local buffer before copying it into the
instruction slot. That will also make it easier for the subsequent
support to patch many instructions at once.

>
> >
> > > + size = (TMPL_END_IDX * sizeof(kprobe_opcode_t)) / sizeof(int);
> >
> > That's just TMPL_END_IDX.
> >
>
> The entire kprobe_opcode_t and int types thing is a bit messy. The size

Right - it's an artifact of kprobes generic infrastructure. For powerpc,
it's always an integer.

> calculation assumes nothing about the sizes for now, but I guess
> patch_instruction does. Another TODO?

Not sure I follow...


- Naveen


Re: [PATCH v3 2/9] powerpc/kprobes: Move kprobes over to patch_instruction

Michael Ellerman-2
In reply to this post by Balbir Singh
Balbir Singh <[hidden email]> writes:

> On Wed, 2017-06-07 at 00:35 +0530, Naveen N. Rao wrote:
>> Hi Balbir,
>>
>> On 2017/06/06 02:29PM, Balbir Singh wrote:
>> > arch_arm/disarm_probe use direct assignment for copying
>> > instructions; replace them with patch_instruction
>> >
>> > Signed-off-by: Balbir Singh <[hidden email]>
>> > ---
>> >  arch/powerpc/kernel/kprobes.c | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
>> > index fc43435..b49f8f0 100644
>> > --- a/arch/powerpc/kernel/kprobes.c
>> > +++ b/arch/powerpc/kernel/kprobes.c
>> > @@ -158,7 +158,7 @@ NOKPROBE_SYMBOL(arch_prepare_kprobe);
>> >
>> >  void arch_arm_kprobe(struct kprobe *p)
>> >  {
>> > - *p->addr = BREAKPOINT_INSTRUCTION;
>> > + patch_instruction(p->addr, BREAKPOINT_INSTRUCTION);
>> >   flush_icache_range((unsigned long) p->addr,
>> >     (unsigned long) p->addr + sizeof(kprobe_opcode_t));
>>
>> Do we still need flush_icache_range() after patch_instruction()?
>
> Good catch! No, we don't

I picked this up independently.

cheers

Re: [PATCH v3 1/9] powerpc/lib/code-patching: Enhance code patching

Michael Ellerman-2
In reply to this post by Balbir Singh
Balbir Singh <[hidden email]> writes:

> Today our patching happens via direct copy and
> patch_instruction. The patching code is well
> contained, in the sense that the copying bits are limited.
>
> While considering the implementation of CONFIG_STRICT_RWX,
> the first requirement is to create another mapping
> that will allow for patching. We create the window using
> text_poke_area, allocated via get_vm_area(), which might
> be overkill. text_poke_area is per CPU to avoid locking.
> Other arches do similar things, but use fixmaps. The reason
> for not using fixmaps is to make use of any randomization in
> the future. The code also relies on set_pte_at and pte_clear
> to do the appropriate tlb flushing.
>
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index 500b0f6..c0a0834 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -12,23 +12,154 @@
>  #include <linux/vmalloc.h>
>  #include <linux/init.h>
>  #include <linux/mm.h>
> +#include <linux/cpuhotplug.h>
>  #include <asm/page.h>
>  #include <asm/code-patching.h>
>  #include <linux/uaccess.h>
>  #include <linux/kprobes.h>
> +#include <asm/pgtable.h>
> +#include <asm/tlbflush.h>
>  
> +static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> +static unsigned int text_area_patch_avail;

All of this should be under #ifdef STRICT_RWX. So that when STRICT_RWX=n
we basically use the old code.

> -int patch_instruction(unsigned int *addr, unsigned int instr)
> +static int text_area_cpu_up(unsigned int cpu)
> +{
> + struct vm_struct *area;
> +
> + area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> + if (!area) {
> + WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> + cpu);
> + return -1;

This is good, it will block bringing up a CPU if we can't get the VM
area, which is the safe option.

> + }
> + this_cpu_write(text_poke_area, area);
> + return 0;
> +}
> +
> +static int text_area_cpu_down(unsigned int cpu)
> +{
> + free_vm_area(this_cpu_read(text_poke_area));
> + return 0;
> +}
> +
> +/*
> + * This is an early_initcall and early_initcalls happen at the right time
> + * for us, after slab is enabled and before we mark ro pages R/O. In the
> + * future if get_vm_area is randomized, this will be more flexible than
> + * fixmap
> + */
> +static int __init setup_text_poke_area(void)
>  {
> + struct vm_struct *area;
> + int cpu;
> +
> + for_each_online_cpu(cpu) {
> + area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> + if (!area) {
> + WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> + cpu);
> + /* Should we disable strict rwx? */
> + continue;
> + }
> + this_cpu_write(text_poke_area, area);
> + }

So I think skip this.

> + cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> + "powerpc/text_poke:online", text_area_cpu_up,
> + text_area_cpu_down);

Use the cpuhp_setup_state() version, which will call it on the boot CPU.
And then just BUG_ON() if it fails.

Also switch this to a late_initcall(), so that if we do BUG_ON() it's
nice and late and the kernel is mostly up.

> + text_area_patch_avail = 1;

Instead of this global flag, we should just check that each CPU's
text_poke_area is non-NULL before using it ...

> + /*
> + * The barrier here ensures the write is visible to
> + * patch_instruction()
> + */
> + smp_wmb();
> + pr_info("text_poke area ready...\n");
> + return 0;
> +}
> +
> +/*
> + * This can be called for kernel text or a module.
> + */
> +static int kernel_map_addr(void *addr)

map_patch_area() ?

> +{
> + unsigned long pfn;
>   int err;
>  
> - __put_user_size(instr, addr, 4, err);
> + if (is_vmalloc_addr(addr))
> + pfn = vmalloc_to_pfn(addr);
> + else
> + pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +

.. in here.

ie.     if (!this_cpu_read(text_poke_area))
                return -Exxx;

> + err = map_kernel_page(
> + (unsigned long)__this_cpu_read(text_poke_area)->addr,
> + (pfn << PAGE_SHIFT), pgprot_val(PAGE_KERNEL));

Use a local or two to make that less gross.

> + pr_devel("Mapped addr %p with pfn %lx\n",
> + __this_cpu_read(text_poke_area)->addr, pfn);

err?

>   if (err)
> - return err;
> - asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r" (addr));
> + return -1;
>   return 0;
>  }
>  
> +static inline void kernel_unmap_addr(void *addr)
> +{
> + pte_t *pte;
> + unsigned long kaddr = (unsigned long)addr;
> +
> + pte = pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(kaddr),
> + kaddr), kaddr), kaddr);

This is pretty fragile; I'd rather you checked that each level returned
something sane.

> + pr_devel("clearing mm %p, pte %p, kaddr %lx\n", &init_mm, pte, kaddr);
> + pte_clear(&init_mm, kaddr, pte);
> + flush_tlb_kernel_range(kaddr, kaddr + PAGE_SIZE);
> +}
> +
> +int patch_instruction(unsigned int *addr, unsigned int instr)
> +{
> + int err;
> + unsigned int *dest = NULL;
> + unsigned long flags;
> + unsigned long kaddr = (unsigned long)addr;
> +
> + /*
> + * Make sure we can see any write of text_area_patch_avail
> + */
> + smp_rmb();
> +
> + /*
> + * During early early boot patch_instruction is called
> + * when text_poke_area is not ready, but we still need
> + * to allow patching. We just do the plain old patching
> + * We use text_area_patch_avail, since per cpu read
> + * via __this_cpu_read of text_poke_area might not
> + * yet be available.
> + * TODO: Make text_area_patch_avail per cpu?
> + */
> + if (!text_area_patch_avail) {

This should check for text_poke_area != NULL.

> + __put_user_size(instr, addr, 4, err);
> + if (err)
> + return err;
> + asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (addr));
> + return 0;

> + }
> +
> + local_irq_save(flags);
> + if (kernel_map_addr(addr)) {
> + err = -1;
> + goto out;
> + }
> +
> + dest = (unsigned int *)(__this_cpu_read(text_poke_area)->addr) +
> + ((kaddr & ~PAGE_MASK) / sizeof(unsigned int));

!!!

> + __put_user_size(instr, dest, 4, err);

Can you just write a comment explaining why we use __put_user() - in
short, so a fault becomes an error return, not an actual crash.

> + if (!err)
> + asm ("dcbst 0, %0; sync; icbi 0,%0; icbi 0,%1; sync; isync"
> + ::"r" (dest), "r"(addr));
> + kernel_unmap_addr(__this_cpu_read(text_poke_area)->addr);
> +out:
> + local_irq_restore(flags);
> + return err;
> +}
> +NOKPROBE_SYMBOL(patch_instruction);
> +
>  int patch_branch(unsigned int *addr, unsigned long target, int flags)
>  {
>   return patch_instruction(addr, create_branch(addr, target, flags));
> @@ -514,3 +645,4 @@ static int __init test_code_patching(void)
>  late_initcall(test_code_patching);
>  
>  #endif /* CONFIG_CODE_PATCHING_SELFTEST */
> +early_initcall(setup_text_poke_area);

As mentioned before, this should be late_initcall().

cheers

Re: [PATCH v3 3/9] powerpc/kprobes/optprobes: Move over to patch_instruction

Michael Ellerman-2
In reply to this post by Balbir Singh
Balbir Singh <[hidden email]> writes:

> With text moving to read-only, migrate optprobes to
> the patch_instruction infrastructure. Without this, optprobes
> will fail and complain.
>
> Signed-off-by: Balbir Singh <[hidden email]>
> ---
>  arch/powerpc/kernel/optprobes.c | 58 ++++++++++++++++++++++++++---------------
>  1 file changed, 37 insertions(+), 21 deletions(-)

I picked this up too.

cheers