[PATCH 11/14] llvm: kCFI pointer stuff

Posted by Peter Zijlstra 2 months ago
Quick hack to extend the Clang-kCFI function meta-data (u32 hash) with
a u8 bitmask of pointer arguments. This should really be under a new
compiler flag, dependent on both x86_64 and kCFI.

Per the comment, the bitmask represents the register-based arguments
as the first 6 bits, and bit 6 is used to cover all stack-based
arguments. The high bit is used for invalid values.

The purpose is to put a store dependency on the set registers, thereby
blocking speculation paths that would otherwise exploit their value.


Note1:

This implementation simply sets the bit for any pointer type. A better
implementation would only set the bit for any argument that is
dereferenced in the function body.

This better implementation would also capture things like:

  void foo(unsigned long addr, void *args)
  {
    u32 t = *(u32 *)addr;
    bar(t, args);
  }

Which would set bit0 while leaving bit1 unset -- the exact opposite
of the implementation below.

Notably, addr *is* dereferenced, even though it is not a pointer on
entry, while args is a pointer but is not dereferenced, only passed on
to bar -- if bar uses it, it gets to deal with it.

Note2:

Do we want to make this a u32 to keep room for all registers? AFAICT
the current use is only concerned with the argument registers and
those are limited to 6 for the C ABI, but custom (assembly) functions
could use things outside of that.
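The encoding described above, together with Note1's caveat, can be mirrored in a small plain-C sketch (the helper name and boolean-array interface here are mine, for illustration; the actual compiler-side code is in the diff below):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the patch's encoding: bit i is set when argument i is a
 * pointer, with i clamped to 6 so bit6 covers all stack-based args.
 * 0x7f marks vararg/unknown configurations; bit7 stays free for
 * error states. */
static uint8_t kcfi_ptr_mask(const bool *arg_is_ptr, int nargs, bool vararg)
{
	uint8_t val = 0;
	int i;

	if (vararg)
		return 0x7f;

	for (i = 0; i < nargs; i++) {
		if (arg_is_ptr[i])
			val |= 1 << (i < 6 ? i : 6);
	}
	return val;
}
```

For the foo(unsigned long addr, void *args) example in Note1 this yields 0x02 -- only bit1 set -- matching the point that the mask tracks pointer types, not actual dereferences.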

---
diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp
index 73c745062096..42dcbc40ab4b 100644
--- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
+++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
@@ -143,11 +143,28 @@ void X86AsmPrinter::EmitKCFITypePadding(const MachineFunction &MF,
   // one. Otherwise, just pad with nops. The X86::MOV32ri instruction emitted
   // in X86AsmPrinter::emitKCFITypeId is 5 bytes long.
   if (HasType)
-    PrefixBytes += 5;
+    PrefixBytes += 7;
 
   emitNops(offsetToAlignment(PrefixBytes, MF.getAlignment()));
 }
 
+static uint8_t getKCFIPointerArgs(const Function &F)
+{
+  uint8_t val = 0;
+
+  if (F.isVarArg())
+    return 0x7f;
+
+  for (int i = 0; i < F.arg_size() ; i++) {
+    Argument *A = F.getArg(i);
+    Type *T = A->getType();
+    if (T->getTypeID() == Type::PointerTyID)
+      val |= 1 << std::min(i, 6);
+  }
+
+  return val;
+}
+
 /// emitKCFITypeId - Emit the KCFI type information in architecture specific
 /// format.
 void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) {
@@ -183,6 +200,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) {
                               .addReg(X86::EAX)
                               .addImm(MaskKCFIType(Type->getZExtValue())));
 
+  // Extend the kCFI meta-data with a byte that has a bit set for each argument
+  // register that contains a pointer. Specifically for x86_64, which has 6
+  // argument registers:
+  //
+  //   bit0 - rdi
+  //   bit1 - rsi
+  //   bit2 - rdx
+  //   bit3 - rcx
+  //   bit4 - r8
+  //   bit5 - r9
+  //
+  // bit6 will denote any pointer on stack (%rsp), and all 7 bits set will
+  // indicate vararg or any other 'unknown' configuration. Leaving bit7 for
+  // error states.
+  //
+  // XXX: should be conditional on some new x86_64 specific 'bhi' argument.
+  EmitAndCountInstruction(MCInstBuilder(X86::MOV8ri)
+		  .addReg(X86::AL)
+		  .addImm(getKCFIPointerArgs(F)));
+
   if (MAI->hasDotTypeDotSizeDirective()) {
     MCSymbol *EndSym = OutContext.createTempSymbol("cfi_func_end");
     OutStreamer->emitLabel(EndSym);
diff --git a/llvm/lib/Target/X86/X86MCInstLower.cpp b/llvm/lib/Target/X86/X86MCInstLower.cpp
index cbb012161524..c0776ef78153 100644
--- a/llvm/lib/Target/X86/X86MCInstLower.cpp
+++ b/llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -897,7 +897,7 @@ void X86AsmPrinter::LowerKCFI_CHECK(const MachineInstr &MI) {
                               .addReg(AddrReg)
                               .addImm(1)
                               .addReg(X86::NoRegister)
-                              .addImm(-(PrefixNops + 4))
+                              .addImm(-(PrefixNops + 6))
                               .addReg(X86::NoRegister));
 
   MCSymbol *Pass = OutContext.createTempSymbol();
RE: [PATCH 11/14] llvm: kCFI pointer stuff
Posted by Constable, Scott D 4 weeks, 1 day ago
> Quick hack to extend the Clang-kCFI function meta-data (u32 hash) with a u8 bitmask of pointer arguments. This should really be under a new compiler flag, dependent on both x86_64 and kCFI.
> 
> Per the comment, the bitmask represents the register based arguments as the first 6 bits and bit 6 is used to cover all stack based arguments. The high bit is used for invalid values.
> 
> The purpose is to put a store dependency on the set registers, thereby blocking speculation paths that would otherwise exploit their value.

Given the ongoing discussion on [PATCH 13/14] where there is a growing consensus that all arguments (not just pointers) should be poisoned after a misprediction, a different encoding scheme would be needed. I believe there are 8 possibilities, which correspond to the function's arity:

0: Function takes 0 args
1: Function takes 1 arg
2: Function takes 2 args
3: Function takes 3 args
4: Function takes 4 args
5: Function takes 5 args
6: Function takes 6 args
7: Function takes >6 args

These possibilities can be encoded with 3 bits. I suspect that it might actually be beneficial to steal 3 bits from the u32 kCFI hash (either by using a smaller 29-bit hash or by truncating the 32-bit hash down to 29 bits). This scheme would arguably strengthen both kCFI and FineIBT by partitioning CFI edges such that a j-arity function cannot call a k-arity function unless j=k (or unless j>6 and k>6); the current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from calling a 3-arity target if the kCFI hashes collide. The disadvantage of the 29-bit hash is that it would increase the probability of collisions within each arity, but on the other hand the total number of functions of each given arity is much smaller than the total number of functions of all arities.
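A hedged sketch of what this packing could look like (the helper names and the exact bit layout are assumptions for illustration; the proposal above only specifies stealing 3 bits for arity):

```c
#include <stdint.h>

/* Pack a 29-bit kCFI hash with a 3-bit arity: 0-6 encode the exact
 * argument count, 7 means "more than 6 args". Putting the hash in the
 * high bits is an arbitrary choice here. */
static uint32_t kcfi_pack_arity(uint32_t hash29, unsigned int nargs)
{
	uint32_t arity = nargs > 6 ? 7 : nargs;

	return (hash29 & 0x1fffffff) << 3 | arity;
}

static unsigned int kcfi_arity(uint32_t id)
{
	return id & 7;
}

static uint32_t kcfi_hash(uint32_t id)
{
	return id >> 3;
}
```

With a layout like this, a 2-arity caller can never match a 3-arity target even on a full hash collision, because the low 3 bits always differ.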

Regards,

Scott Constable
RE: [PATCH 11/14] llvm: kCFI pointer stuff
Posted by Constable, Scott D 4 weeks ago
> > Quick hack to extend the Clang-kCFI function meta-data (u32 hash) with a u8 bitmask of pointer arguments. This should really be under a new compiler flag, dependent on both x86_64 and kCFI.
> > 
> > Per the comment, the bitmask represents the register based arguments as the first 6 bits and bit 6 is used to cover all stack based arguments. The high bit is used for invalid values.
> > 
> > The purpose is to put a store dependency on the set registers, thereby blocking speculation paths that would otherwise exploit their value.
> 
> Given the ongoing discussion on [PATCH 13/14] where there is a growing consensus that all arguments (not just pointers) should be poisoned after a misprediction, a different encoding scheme would be needed. I believe there are 8 possibilities, which correspond to the function's arity:
> 
> 0: Function takes 0 args
> 1: Function takes 1 arg
> 2: Function takes 2 args
> 3: Function takes 3 args
> 4: Function takes 4 args
> 5: Function takes 5 args
> 6: Function takes 6 args
> 7: Function takes >6 args
>
> These possibilities can be encoded with 3 bits. I suspect that it might actually be beneficial to steal 3 bits from the u32 kCFI hash (either by using a smaller 29-bit hash or by truncating the 32-bit hash down to 29 bits). This scheme would arguably strengthen both kCFI and FineIBT by partitioning CFI edges such that a j-arity function cannot call a k-arity function unless j=k (or unless j>6 and k>6); the current 32-bit kCFI hash does not prevent, for example, a 2-arity fptr from calling a 3-arity target if the kCFI hashes collide. The disadvantage of the 29-bit hash is that it would increase the probability of collisions within each arity, but on the other hand the total number of functions of each given arity is much smaller than the total number of functions of all arities.

I have done some additional analysis on my Noble kernel, which suggests that the proposed 29-bit hash with 3-bit arity will be strictly more secure than the existing 32-bit hash. Consider: my kernel has 141,617 total indirect call targets, with 10,903 unique function types. With a 32-bit kCFI hash, the expected number of collisions is 2^-32 * (10903 C 2) = 0.01383765. Then I scanned the kernel to identify the number of unique function types for each arity, and computed the corresponding expected number of collisions within each arity, assuming a 29-bit hash:

# Args  Total targets  Unique types  Expected collisions
0       12682          32            0.00000092
1       42981          2492          0.00578125
2       37657          3775          0.01326841
3       29436          2547          0.00603931
4       12343          1169          0.00127162
5       4137           519           0.00025038
6       1700           221           0.00004528
>6      681            148           0.00002026

(Sorry if the formatting became weird after copying from Excel)

Hence, even the arity (2) with the largest number of unique function types (3775) has a lower expected value for 29-bit collisions (0.01326841) than the expected value for 32-bit collisions (0.01383765).

Regards,

Scott Constable
Re: [PATCH 11/14] llvm: kCFI pointer stuff
Posted by Alexei Starovoitov 2 months ago
On Fri, Sep 27, 2024 at 12:50 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Quick hack to extend the Clang-kCFI function meta-data (u32 hash) with
> a u8 bitmask of pointer arguments. This should really be under a new
> compiler flag, dependent on both x86_64 and kCFI.
>
> Per the comment, the bitmask represents the register based arguments
> as the first 6 bits and bit 6 is used to cover all stack based
> arguments. The high bit is used for invalid values.
>
> The purpose is to put a store dependency on the set registers, thereby
> blocking speculation paths that would otherwise exploit their value.
>
>
> Note1:
>
> This implementation simply sets the bit for any pointer type. A better
> implementation would only set the bit for any argument that is
> dereferenced in the function body.
>
> This better implementation would also capture things like:
>
>   void foo(unsigned long addr, void *args)
>   {
>     u32 t = *(u32 *)addr;
>     bar(t, args);
>   }
>
> Which, in contrast to the implementation below, would set bit0 while
> leaving bit1 unset -- the exact opposite of this implementation.
>
> Notably, addr *is* dereferenced, even though it is not a pointer on
> entry, while args is a pointer, but is not dereferenced but passed on
> to bar -- if bar uses it, it gets to deal with it.
>
> Note2:
>
> Do we want to make this a u32 to keep room for all registers? AFAICT
> the current use is only concerned with the argument registers and
> those are limited to 6 for the C ABI, but custom (assembly) functions
> could use things outside of that.
>
> ---
> diff --git a/llvm/lib/Target/X86/X86AsmPrinter.cpp b/llvm/lib/Target/X86/X86AsmPrinter.cpp
> index 73c745062096..42dcbc40ab4b 100644
> --- a/llvm/lib/Target/X86/X86AsmPrinter.cpp
> +++ b/llvm/lib/Target/X86/X86AsmPrinter.cpp
> @@ -143,11 +143,28 @@ void X86AsmPrinter::EmitKCFITypePadding(const MachineFunction &MF,
>    // one. Otherwise, just pad with nops. The X86::MOV32ri instruction emitted
>    // in X86AsmPrinter::emitKCFITypeId is 5 bytes long.
>    if (HasType)
> -    PrefixBytes += 5;
> +    PrefixBytes += 7;
>
>    emitNops(offsetToAlignment(PrefixBytes, MF.getAlignment()));
>  }
>
> +static uint8_t getKCFIPointerArgs(const Function &F)
> +{
> +  uint8_t val = 0;
> +
> +  if (F.isVarArg())
> +    return 0x7f;
> +
> +  for (int i = 0; i < F.arg_size() ; i++) {
> +    Argument *A = F.getArg(i);
> +    Type *T = A->getType();
> +    if (T->getTypeID() == Type::PointerTyID)
> +      val |= 1 << std::min(i, 6);
> +  }
> +
> +  return val;
> +}
> +
>  /// emitKCFITypeId - Emit the KCFI type information in architecture specific
>  /// format.
>  void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) {
> @@ -183,6 +200,26 @@ void X86AsmPrinter::emitKCFITypeId(const MachineFunction &MF) {
>                                .addReg(X86::EAX)
>                                .addImm(MaskKCFIType(Type->getZExtValue())));
>
> +  // Extend the kCFI meta-data with a byte that has a bit set for each argument
> +  // register that contains a pointer. Specifically for x86_64, which has 6
> +  // argument registers:
> +  //
> +  //   bit0 - rdi
> +  //   bit1 - rsi
> +  //   bit2 - rdx
> +  //   bit3 - rcx
> +  //   bit4 - r8
> +  //   bit5 - r9
> +  //
> +  // bit6 will denote any pointer on stack (%rsp), and all 7 bits set will
> +  // indicate vararg or any other 'unknown' configuration. Leaving bit7 for
> +  // error states.
> +  //
> +  // XXX: should be conditional on some new x86_64 specific 'bhi' argument.
> +  EmitAndCountInstruction(MCInstBuilder(X86::MOV8ri)
> +                 .addReg(X86::AL)
> +                 .addImm(getKCFIPointerArgs(F)));

If I'm reading this correctly it will be an 8-bit move which
doesn't clear upper bits.
If consumer is in assembly it's ok-ish,
but it's an argument to __bhi_args_foo functions,
so should be properly zero extended per call convention.
Re: [PATCH 11/14] llvm: kCFI pointer stuff
Posted by Peter Zijlstra 1 month, 4 weeks ago
On Sun, Sep 29, 2024 at 10:53:05AM -0700, Alexei Starovoitov wrote:

> > +  // Extend the kCFI meta-data with a byte that has a bit set for each argument
> > +  // register that contains a pointer. Specifically for x86_64, which has 6
> > +  // argument registers:
> > +  //
> > +  //   bit0 - rdi
> > +  //   bit1 - rsi
> > +  //   bit2 - rdx
> > +  //   bit3 - rcx
> > +  //   bit4 - r8
> > +  //   bit5 - r9
> > +  //
> > +  // bit6 will denote any pointer on stack (%rsp), and all 7 bits set will
> > +  // indicate vararg or any other 'unknown' configuration. Leaving bit7 for
> > +  // error states.
> > +  //
> > +  // XXX: should be conditional on some new x86_64 specific 'bhi' argument.
> > +  EmitAndCountInstruction(MCInstBuilder(X86::MOV8ri)
> > +                 .addReg(X86::AL)
> > +                 .addImm(getKCFIPointerArgs(F)));
> 
> If I'm reading this correctly it will be an 8-bit move which
> doesn't clear upper bits.
> If consumer is in assembly it's ok-ish,
> but it's an argument to __bhi_args_foo functions,
> so should be properly zero extended per call convention.

These kCFI 'instructions' are never executed. Their sole purpose is to
encode the immediates. They are instructions because they live in .text
and having them this way makes disassembly work nicely. As such, we've
taken to using the 1 byte move instruction to carry them with the least
amount of bytes.

The consumer is the kernel instruction decoder, we take the immediate
and use that.
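A hedged sketch of that consumer side (names and layout checks are mine, not kernel code): with the patch applied, the __cfi_foo prefix is `movl $hash, %eax` (opcode 0xB8 + imm32, 5 bytes) followed by `movb $mask, %al` (opcode 0xB0 + imm8, 2 bytes), so the decoder only has to pull the two immediates back out:

```c
#include <stdint.h>

struct kcfi_meta {
	uint32_t hash;    /* kCFI signature hash */
	uint8_t ptr_mask; /* pointer-argument bitmask */
};

/* Decode the two immediates from a __cfi_foo prefix. The real kernel
 * uses its generic instruction decoder; this hand-rolled opcode check
 * is an illustration only. */
static int kcfi_decode_prefix(const uint8_t *p, struct kcfi_meta *m)
{
	if (p[0] != 0xb8 || p[5] != 0xb0)
		return -1;	/* not the expected movl/movb pair */

	m->hash = (uint32_t)p[1] | (uint32_t)p[2] << 8 |
		  (uint32_t)p[3] << 16 | (uint32_t)p[4] << 24;
	m->ptr_mask = p[6];
	return 0;
}
```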
Re: [PATCH 11/14] llvm: kCFI pointer stuff
Posted by Alexei Starovoitov 1 month, 4 weeks ago
On Mon, Sep 30, 2024 at 1:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sun, Sep 29, 2024 at 10:53:05AM -0700, Alexei Starovoitov wrote:
>
> > > +  // Extend the kCFI meta-data with a byte that has a bit set for each argument
> > > +  // register that contains a pointer. Specifically for x86_64, which has 6
> > > +  // argument registers:
> > > +  //
> > > +  //   bit0 - rdi
> > > +  //   bit1 - rsi
> > > +  //   bit2 - rdx
> > > +  //   bit3 - rcx
> > > +  //   bit4 - r8
> > > +  //   bit5 - r9
> > > +  //
> > > +  // bit6 will denote any pointer on stack (%rsp), and all 7 bits set will
> > > +  // indicate vararg or any other 'unknown' configuration. Leaving bit7 for
> > > +  // error states.
> > > +  //
> > > +  // XXX: should be conditional on some new x86_64 specific 'bhi' argument.
> > > +  EmitAndCountInstruction(MCInstBuilder(X86::MOV8ri)
> > > +                 .addReg(X86::AL)
> > > +                 .addImm(getKCFIPointerArgs(F)));
> >
> > If I'm reading this correctly it will be an 8-bit move which
> > doesn't clear upper bits.
> > If consumer is in assembly it's ok-ish,
> > but it's an argument to __bhi_args_foo functions,
> > so should be properly zero extended per call convention.
>
> These kCFI 'instructions' are never executed. Their sole purpose is to
> encode the immediates. They are instructions because they live in .text
> and having them this way makes disassembly work nicely. As such, we've
> taken to using the 1 byte move instruction to carry them with the least
> amount of bytes.
>
> The consumer is the kernel instruction decoder, we take the immediate
> and use that.

I see... and after decoding imm bits in mov %al insn the kernel will
insert a call to corresponding __bhi_args_* stub that will use
cmovne on corresponding register(s) to sanitize the value?
That was difficult to grasp.
A design doc would have helped.

I wonder whether this whole complexity is worth it vs
always calling __bhi_args_all()
Re: [PATCH 11/14] llvm: kCFI pointer stuff
Posted by Peter Zijlstra 1 month, 4 weeks ago
On Mon, Sep 30, 2024 at 09:59:11AM -0700, Alexei Starovoitov wrote:
> On Mon, Sep 30, 2024 at 1:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Sun, Sep 29, 2024 at 10:53:05AM -0700, Alexei Starovoitov wrote:
> >
> > > > +  // Extend the kCFI meta-data with a byte that has a bit set for each argument
> > > > +  // register that contains a pointer. Specifically for x86_64, which has 6
> > > > +  // argument registers:
> > > > +  //
> > > > +  //   bit0 - rdi
> > > > +  //   bit1 - rsi
> > > > +  //   bit2 - rdx
> > > > +  //   bit3 - rcx
> > > > +  //   bit4 - r8
> > > > +  //   bit5 - r9
> > > > +  //
> > > > +  // bit6 will denote any pointer on stack (%rsp), and all 7 bits set will
> > > > +  // indicate vararg or any other 'unknown' configuration. Leaving bit7 for
> > > > +  // error states.
> > > > +  //
> > > > +  // XXX: should be conditional on some new x86_64 specific 'bhi' argument.
> > > > +  EmitAndCountInstruction(MCInstBuilder(X86::MOV8ri)
> > > > +                 .addReg(X86::AL)
> > > > +                 .addImm(getKCFIPointerArgs(F)));
> > >
> > > If I'm reading this correctly it will be an 8-bit move which
> > > doesn't clear upper bits.
> > > If consumer is in assembly it's ok-ish,
> > > but it's an argument to __bhi_args_foo functions,
> > > so should be properly zero extended per call convention.
> >
> > These kCFI 'instructions' are never executed. Their sole purpose is to
> > encode the immediates. They are instructions because they live in .text
> > and having them this way makes disassembly work nicely. As such, we've
> > taken to using the 1 byte move instruction to carry them with the least
> > amount of bytes.
> >
> > The consumer is the kernel instruction decoder, we take the immediate
> > and use that.
> 
> I see... and after decoding imm bits in mov %al insn the kernel will
> insert a call to corresponding __bhi_args_* stub that will use
> cmovne on corresponding register(s) to sanitize the value?
> That was difficult to grasp.
> A design doc would have helped.

Does something like this help?

diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h
index 31d19c815f99..b6e7e79e79c6 100644
--- a/arch/x86/include/asm/cfi.h
+++ b/arch/x86/include/asm/cfi.h
@@ -44,11 +44,28 @@
  *   call *%r11
  *
  *
+ * IBT+:
+ *
+ * foo:
+ *   endbr64 / ud1 0(%eax), %edx
+ *   ... code here ...
+ *   ret
+ *
+ * direct caller:
+ *   call foo+4
+ *
+ * indirect caller:
+ *   lea foo(%rip), %r11
+ *   ...
+ *   call *%r11
+ *
+ *
  * kCFI:
  *
  * __cfi_foo:
  *   movl $0x12345678, %eax	# kCFI signature hash
- *				# 11 nops when CONFIG_CALL_PADDING
+ *   movb $0x12, %al		# kCFI pointer argument mask
+ *				# 9 nops when CONFIG_CALL_PADDING
  * foo:
  *   endbr64			# when IBT
  *   ... code here ...
@@ -91,6 +108,57 @@
  *   nop4
  *   call *%r11
  *
+ *
+ * FineIBT+:
+ *
+ * __cfi_foo:
+ *   endbr64
+ *   subl 0x12345678, %r10d
+ *   jz   foo
+ *   ud2
+ *   nop
+ * foo:
+ *   ud1 0(%eax), %edx		# was endbr64
+ * foo_4:
+ *   ... code here ...
+ *   ret
+ *
+ * direct caller:
+ *   call foo+4
+ *
+ * indirect caller:
+ *   lea foo(%rip), %r11
+ *   ...
+ *   movl $0x12345678, %r10d
+ *   subl $16, %r11
+ *   nop4
+ *   call *%r11
+ *
+ *
+ * FineIBT-BHI:
+ *
+ * __cfi_foo:
+ *   endbr64
+ *   subl 0x12345678, %r10d
+ *   jz   foo-1
+ *   ud2
+ * foo-1:
+ *   call __bhi_args_XXX	# depends on kCFI pointer argument mask
+ * foo+4:
+ *   ... code here ...
+ *   ret
+ *
+ * direct caller:
+ *   call foo+4
+ *
+ * indirect caller:
+ *   lea foo(%rip), %r11
+ *   ...
+ *   movl $0x12345678, %r10d
+ *   subl $16, %r11
+ *   nop4
+ *   call *%r11
+ *
  */
 enum cfi_mode {
 	CFI_AUTO,	/* FineIBT if hardware has IBT, otherwise kCFI */




> I wonder whether this whole complexity is worth it vs
> always calling __bhi_args_all()

That's one for Scott to answer; I think always doing _all will hurt
especially badly because it includes rsp.


Re: [PATCH 11/14] llvm: kCFI pointer stuff
Posted by Alexei Starovoitov 1 month, 3 weeks ago
On Tue, Oct 1, 2024 at 3:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
>
> Does something like this help?

Yep. Thanks.

> diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h
> index 31d19c815f99..b6e7e79e79c6 100644
> --- a/arch/x86/include/asm/cfi.h
> +++ b/arch/x86/include/asm/cfi.h
> @@ -44,11 +44,28 @@
>   *   call *%r11
>   *
>   *
> + * IBT+:
> + *
> + * foo:
> + *   endbr64 / ud1 0(%eax), %edx
> + *   ... code here ...
> + *   ret
> + *
> + * direct caller:
> + *   call foo+4
> + *
> + * indirect caller:
> + *   lea foo(%rip), %r11
> + *   ...
> + *   call *%r11
> + *
> + *
>   * kCFI:
>   *
>   * __cfi_foo:
>   *   movl $0x12345678, %eax    # kCFI signature hash
> - *                             # 11 nops when CONFIG_CALL_PADDING
> + *   movb $0x12, %al           # kCFI pointer argument mask
> + *                             # 9 nops when CONFIG_CALL_PADDING
>   * foo:
>   *   endbr64                   # when IBT
>   *   ... code here ...
> @@ -91,6 +108,57 @@
>   *   nop4
>   *   call *%r11
>   *
> + *
> + * FineIBT+:
> + *
> + * __cfi_foo:
> + *   endbr64
> + *   subl 0x12345678, %r10d
> + *   jz   foo

should it be 'jz foo_4' ?
Otherwise it will trap after endbr64 sealing.

> + *   ud2
> + *   nop
> + * foo:
> + *   ud1 0(%eax), %edx         # was endbr64
> + * foo_4:
> + *   ... code here ...
> + *   ret
> + *
> + * direct caller:
> + *   call foo+4
> + *
> + * indirect caller:
> + *   lea foo(%rip), %r11
> + *   ...
> + *   movl $0x12345678, %r10d
> + *   subl $16, %r11
> + *   nop4
> + *   call *%r11
> + *
> + *
> + * FineIBT-BHI:
> + *
> + * __cfi_foo:
> + *   endbr64
> + *   subl 0x12345678, %r10d
> + *   jz   foo-1
> + *   ud2
> + * foo-1:
> + *   call __bhi_args_XXX       # depends on kCFI pointer argument mask
> + * foo+4:
> + *   ... code here ...
> + *   ret
> + *
> + * direct caller:
> + *   call foo+4
> + *
> + * indirect caller:
> + *   lea foo(%rip), %r11
> + *   ...
> + *   movl $0x12345678, %r10d
> + *   subl $16, %r11
> + *   nop4
> + *   call *%r11
> + *
>   */
>  enum cfi_mode {
>         CFI_AUTO,       /* FineIBT if hardware has IBT, otherwise kCFI */
>
>
>
>
> > I wonder whether this whole complexity is worth it vs
> > always calling __bhi_args_all()
>
> That's one for Scott to answer; I think always doing _all will hurt
> especially bad because it includes rsp.

But why cmovne %rsp ?
Because some pointers are passed on the stack ?
but %rsp itself isn't involved in speculation.
load/store from stack can still speculate regardless of cmovne %rsp ?
Or is it acting like a barrier for all subsequent access from stack
including things like 'push rbp' in the function prologue?

Overall it feels like a lot of complexity for a 'security checkbox'.
If it hurts perf so much regardless, the extra % here and there will
be ignored by security obsessed people, and folks who care about
performance won't be enabling it anyway.

btw the alternative to hacking compilers is to get information
about pointers in function arguments from BTF.
It's all there. No need to encode it via movb $0x12, %al.

$ bpftool btf dump file vmlinux format c|grep pick_next_task
    struct task_struct * (*pick_next_task)(struct rq *, struct task_struct *);