Date: Wed, 19 Feb 2025 12:46:07 +0000 (GMT)
From: "Maciej W. Rozycki"
To: Richard Henderson, Ivan Kokshaysky, Matt Turner
cc: Arnd Bergmann, John Paul Adrian Glaubitz, Magnus Lindholm,
    "Paul E. McKenney", Linus Torvalds, Al Viro,
    linux-alpha@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] Alpha: Emulate unaligned LDx_L/STx_C for data consistency

Complementing compiler support for the `-msafe-bwa' and `-msafe-partial'
code generation options slated to land in GCC 15, implement emulation of
unaligned LDx_L and STx_C operations for the unlikely case where an
alignment violation resulting from improperly written code has caused
these operations to trap in the atomic RMW memory access sequences
emitted to provide data consistency for non-BWX byte and word write
operations and for writes to unaligned data objects causing partial
memory updates.

The principle of operation is as follows (a simplified sketch in plain C
follows the list):

1. A trapping unaligned LDx_L operation results in the pair of adjacent
   aligned whole data quantities spanned being read and stored for
   reference by a subsequent STx_C operation, along with the width of
   the data accessed, its virtual address, and the referring task (or
   NULL for the kernel).  The validity marker is set.

2. Regular memory load operations are used to retrieve the data, because
   no atomicity is needed at this stage.  The instruction used matches
   the width accessed, either LDQ_U or LDL, even though the latter
   requires extra operations, so as to avoid the case where an unaligned
   longword located entirely within an aligned quadword would complicate
   handling.

3. Data is masked, shifted, and merged appropriately and returned in the
   intended register as the result of the trapping LDx_L instruction.

4. A trapping unaligned STx_C operation results in the validity marker
   being checked for being true, and in the width of the data accessed,
   the virtual address, and the referring task (or the kernel) being
   checked for a match.  The pair of whole data quantities previously
   read by LDx_L emulation is retrieved and the validity marker is
   cleared.

5. If the checks succeeded, then in an atomic loop the location of the
   first whole data quantity is reread and the data retrieved is
   compared with the value previously obtained.  If there is no match,
   the loop is aborted, 0 is returned in the intended register as the
   result of the trapping STx_C instruction, and emulation completes.
   Otherwise new data obtained from the source operand of STx_C is
   combined with the data retrieved, replacing the part intended by byte
   insertion, and an atomic write of this new data is attempted.  If it
   fails, the loop restarts from the beginning.  Otherwise processing
   proceeds to the next step.

6. The same operations are performed on the second whole data quantity.

7. At this point both whole data quantities have been written, ensuring
   that no intervening third-party write has changed them, at the point
   of each write, from the values held at the previous LDx_L.  Therefore
   1 is returned in the intended register as the result of the trapping
   STx_C instruction.

8. No user accesses are permitted in traps from kernel mode, as the only
   LDx_L/STx_C accesses made to user memory locations by the kernel are
   supposed to be those from handcrafted code, which has to be written
   so as not to trap.
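In plain C terms, the recorded state and the STx_C-side matching amount
to roughly the following sketch (illustrative only: the actual code
below keeps this state under `ll_lock' and does the byte-level masking
and merging in inline assembly):

	/* Sketch of the emulation state; names mirror the patch.  */
	static struct task_struct *ll_task;	/* Referring task, NULL for kernel.  */
	static unsigned long ll_data[2];	/* Aligned quantities spanned.  */
	static unsigned long ll_va;		/* Virtual address accessed.  */
	static bool ll_quad;			/* Quadword vs longword access.  */
	static bool ll_bit;			/* Validity marker.  */

	/* Steps 1-3: record state for a trapping LDx_L.  */
	static void ldx_l_record(unsigned long va, bool quad,
				 struct task_struct *task,
				 unsigned long lo, unsigned long hi)
	{
		ll_va = va;
		ll_task = task;
		ll_data[0] = lo;
		ll_data[1] = hi;
		ll_quad = quad;
		ll_bit = true;
	}

	/* Step 4: check state for a trapping STx_C; the marker is
	   consumed whether or not the checks succeed.  */
	static bool stx_c_match(unsigned long va, bool quad,
				struct task_struct *task)
	{
		bool match = ll_bit;

		match &= ll_quad == quad;
		match &= ll_task == task;
		match &= ll_va == va;
		ll_bit = false;
		return match;
	}

If the checks succeed, each of the two quantities spanned is then
rewritten in its own LL/SC loop (steps 5-7), succeeding only if it
still holds the data recorded.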
Since atomic loops are used for the data updates the approach works
equally well in both UP and SMP environments.  No data atomicity is
guaranteed, but data consistency is, that is concurrent RMW accesses
won't clobber each other.  However if the same data is concurrently
rewritten with the value already there by a regular write between the
emulated LDx_L and STx_C, then STx_C will still succeed; likewise if
data is modified, but then restored before STx_C has had a chance to
run.

This fulfils the consistency requirements and guarantees that data
outside the quantity written has not changed between the emulated LDx_L
and STx_C.

Signed-off-by: Maciej W. Rozycki
Tested-by: Magnus Lindholm
---
Hi,

 This has cleared the pair of `-msafe-bwa -msafe-partial' regressions
observed in GCC verification (the third one was a Modula 2 frontend
bug, now fixed in the compiler).  I have verified individual
misalignments with a small program by hand as well, for both the data
retrieved by emulated LDx_L and the data stored by emulated STx_C.
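 Along these lines a minimal single check might look like the sketch
below (hand-written for this posting rather than taken from that
program; it assumes the emulation is in place, as without it the
trapping LDQ_L is not fixed up and the process gets a fatal signal
instead):

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		unsigned char buf[16] __attribute__ ((aligned (8)));
		unsigned long old, new;
		unsigned char *p = buf + 3;	/* Deliberately misaligned.  */

		memset(buf, 0xaa, sizeof(buf));
		/* OR 0x55 into the low byte of the unaligned quadword at P;
		   both the LDQ_L and the STQ_C trap and get emulated.  */
		__asm__ __volatile__(
		"1:	ldq_l	%0,%2\n"
		"	or	%0,%3,%1\n"
		"	stq_c	%1,%2\n"
		"	beq	%1,1b\n"
			: "=&r" (old), "=&r" (new),
			  "+m" (*(unsigned long *) p)
			: "r" (0x55UL));
		/* The byte written must have changed and its neighbours,
		   inside and outside the quadword spanned, must not have.  */
		if (buf[3] != 0xff || buf[2] != 0xaa ||
		    buf[4] != 0xaa || buf[11] != 0xaa)
			puts("FAIL");
		else
			puts("PASS");
		return 0;
	}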
 The kernel itself built with `-mcpu=ev4 -msafe-bwa -msafe-partial'
boots and has passed GCC verification, and has triggered no extra
unaligned traps.  Full verification was run with 6.3.0-rc5 and Ivan's
stack alignment fixes applied, simply because I was already confident
that version works correctly.

 Interestingly enough no kernel mode traps have triggered with a kernel
built with GCC 12 (and with most user traps coming from GCC
verification):

kernel unaligned acc	: 0 (pc=0,va=0)
user unaligned acc	: 1766720 (pc=20000053064,va=120020189)

but with GCC 15 a small quantity happened (even before I ran GCC
testing):

kernel unaligned acc	: 78 (pc=fffffc0000ad5194,va=fffffc0002db5784)
user unaligned acc	: 883452 (pc=20000053064,va=120020189)

It seems to be a compiler regression worth checking -- the trap recorded
was in `icmp6_dst_alloc', with a pair of quadword writes to
`rt->rt6i_dst.addr', which however by its type (`struct in6_addr') is
only longword-aligned and indeed starts at offset 148 from the outermost
struct.  I have a sneaking suspicion one of my earlier GCC changes might
be at fault.  At least I now have a test case to experiment with.
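 For reference, the shape involved reduces to roughly the following (a
made-up reconstruction for illustration, with invented names, rather
than the actual test case):

	/* A 16-byte object that is only longword-aligned, placed at an
	   offset that is not quadword-aligned.  */
	struct in6_addr_like {
		unsigned int s6_addr32[4];
	};

	struct rt6_info_like {
		char pad[148];		/* `addr' lands at offset 148.  */
		struct in6_addr_like addr;
	};

	void set_addr(struct rt6_info_like *rt, const struct in6_addr_like *a)
	{
		/* May be expanded into a pair of quadword writes even
		   though `rt->addr' is not guaranteed quadword-aligned.  */
		rt->addr = *a;
	}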
 I've also built and booted 6.9.0-rc3 as at commit 82c525bfafb4 ("alpha:
trim the unused stuff from asm-offsets.c"), the last one before support
for my system was axed.  It has passed verification with my small
program (available by request; I'm not sure if it's worth turning into
a kernel selftest).

 NB I'm going to ignore the 72 errors checkpatch.pl issues for the EXC
usage.  The coding style of the new additions is consistent with the
rest of the file and any change to that would best be made separately
(but I fail to see the point).

 Questions, comments, concerns?  Otherwise please apply, and I'll
proceed with the rest of the GCC effort, followed by cleaning up
handwritten assembly that uses STQ_U in our port and in glibc.

  Maciej
---
 arch/alpha/kernel/traps.c |  409 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 400 insertions(+), 9 deletions(-)

linux-alpha-llsc-unaligned.diff
Index: linux-macro/arch/alpha/kernel/traps.c
===================================================================
--- linux-macro.orig/arch/alpha/kernel/traps.c
+++ linux-macro/arch/alpha/kernel/traps.c
@@ -368,6 +368,13 @@ struct unaligned_stat {
 	unsigned long count, va, pc;
 } unaligned[2];
 
+/* Unaligned LDx_L/STx_C emulation state.  */
+static DEFINE_RAW_SPINLOCK(ll_lock);
+static struct task_struct *ll_task;
+static unsigned long ll_data[2];
+static unsigned long ll_va;
+static bool ll_quad;
+static bool ll_bit;
 
 /* Macro for exception fixup code to access integer registers.  */
 #define una_reg(r)	(_regs[(r) >= 16 && (r) <= 18 ? (r)+19 : (r)])
@@ -381,6 +388,9 @@ do_entUna(void * va, unsigned long opcod
 	unsigned long pc = regs->pc - 4;
 	unsigned long *_regs = regs->regs;
 	const struct exception_table_entry *fixup;
+	unsigned long flags;
+	unsigned long la;
+	bool ll_match;
 
 	unaligned[0].count++;
 	unaligned[0].va = (unsigned long) va;
@@ -439,6 +449,65 @@ do_entUna(void * va, unsigned long opcod
 		una_reg(reg) = tmp1|tmp2;
 		return;
 
+	case 0x2a: /* ldl_l */
+		la = (unsigned long)va;
+		if (la < TASK_SIZE)
+			break;
+		__asm__ __volatile__(
+		"1:	ldl %3,0(%5)\n"
+		"2:	ldl %4,4(%5)\n"
+		"	srl %3,%6,%1\n"
+		"	zapnot %1,%8,%1\n"
+		"	sll %4,%7,%2\n"
+		"	zapnot %2,15,%2\n"
+		"3:\n"
+		EXC(1b,3b,%1,%0)
+		EXC(2b,3b,%2,%0)
+			: "=r"(error),
+			  "=&r"(tmp1), "=r"(tmp2), "=&r"(tmp3), "=&r"(tmp4)
+			: "r"(la & ~3ul),
+			  "r"((la & 3) * 8), "r"((4 - (la & 3)) * 8),
+			  "r"(15 >> (la & 3)), "0"(0));
+		if (error)
+			goto got_exception;
+		raw_spin_lock_irqsave(&ll_lock, flags);
+		ll_va = la;
+		ll_task = NULL;
+		ll_data[0] = tmp3;
+		ll_data[1] = tmp4;
+		ll_quad = false;
+		ll_bit = true;
+		raw_spin_unlock_irqrestore(&ll_lock, flags);
+		una_reg(reg) = (int)(tmp1|tmp2);
+		return;
+
+	case 0x2b: /* ldq_l */
+		la = (unsigned long)va;
+		if (la < TASK_SIZE)
+			break;
+		__asm__ __volatile__(
+		"1:	ldq_u %3,0(%5)\n"
+		"2:	ldq_u %4,7(%5)\n"
+		"	extql %3,%5,%1\n"
+		"	extqh %4,%5,%2\n"
+		"3:\n"
+		EXC(1b,3b,%1,%0)
+		EXC(2b,3b,%2,%0)
+			: "=r"(error),
+			  "=&r"(tmp1), "=r"(tmp2), "=&r"(tmp3), "=&r"(tmp4)
+			: "r"(va), "0"(0));
+		if (error)
+			goto got_exception;
+		raw_spin_lock_irqsave(&ll_lock, flags);
+		ll_va = la;
+		ll_task = NULL;
+		ll_data[0] = tmp3;
+		ll_data[1] = tmp4;
+		ll_quad = true;
+		ll_bit = true;
+		raw_spin_unlock_irqrestore(&ll_lock, flags);
+		una_reg(reg) = tmp1|tmp2;
+		return;
+
 	/* Note that the store sequences do not indicate that they change
 	   memory because it _should_ be affecting nothing in this context.
 	   (Otherwise we have other, much larger, problems.)  */
@@ -513,6 +582,134 @@ do_entUna(void * va, unsigned long opcod
 		if (error)
 			goto got_exception;
 		return;
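+
+	/*
+	 * Emulated STx_C: check the state recorded by the emulated
+	 * LDx_L and, if it matches, rewrite each of the two aligned
+	 * quantities spanned in its own LL/SC loop, succeeding only
+	 * if both still hold the data previously seen.
+	 */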
+	case 0x2e: /* stl_c */
+		la = (unsigned long)va;
+		if (la < TASK_SIZE)
+			break;
+		raw_spin_lock_irqsave(&ll_lock, flags);
+		ll_match = ll_bit;
+		ll_match &= !ll_quad;
+		ll_match &= ll_task == NULL;
+		ll_match &= ll_va == la;
+		tmp3 = ll_data[0];
+		tmp4 = ll_data[1];
+		ll_bit = false;
+		raw_spin_unlock_irqrestore(&ll_lock, flags);
+		if (ll_match) {
+			__asm__ __volatile__(
+			"	srl %6,%5,%3\n"
+			"	zapnot %3,%8,%3\n"
+			"1:	ldl_l %2,4(%4)\n"
+			"	cmpeq %7,%2,%1\n"
+			"	beq %1,4f\n"
+			"	zap %2,%8,%2\n"
+			"	or %2,%3,%1\n"
+			"2:	stl_c %1,4(%4)\n"
+			"	beq %1,3f\n"
+			"	.subsection 2\n"
+			"3:	br 1b\n"
+			"	.previous\n"
+			"4:\n"
+			EXC(1b,4b,%2,%0)
+			EXC(2b,4b,%1,%0)
+				: "=r"(error), "=&r"(ll_match),
+				  "=&r"(tmp1), "=&r"(tmp2)
+				: "r"(la & ~3ul), "r"((4 - (la & 3)) * 8),
+				  "r"(una_reg(reg)), "r"(tmp4),
+				  "r"((15 >> (4 - (la & 3))) & 0xf), "0"(0));
+			if (error)
+				goto got_exception;
+		}
+		if (ll_match) {
+			__asm__ __volatile__(
+			"	sll %6,%5,%3\n"
+			"	zapnot %3,%8,%3\n"
+			"1:	ldl_l %2,0(%4)\n"
+			"	cmpeq %7,%2,%1\n"
+			"	beq %1,4f\n"
+			"	zap %2,%8,%2\n"
+			"	or %2,%3,%1\n"
+			"2:	stl_c %1,0(%4)\n"
+			"	beq %1,3f\n"
+			"	.subsection 2\n"
+			"3:	br 1b\n"
+			"	.previous\n"
+			"4:\n"
+			EXC(1b,4b,%2,%0)
+			EXC(2b,4b,%1,%0)
+				: "=r"(error), "=&r"(ll_match),
+				  "=&r"(tmp1), "=&r"(tmp2)
+				: "r"(la & ~3ul), "r"((la & 3) * 8),
+				  "r"(una_reg(reg)), "r"(tmp3),
+				  "r"((15 << (la & 3)) & 0xf), "0"(0));
+			if (error)
+				goto got_exception;
+		}
+		una_reg(reg) = ll_match;
+		return;
+
+	case 0x2f: /* stq_c */
+		la = (unsigned long)va;
+		if (la < TASK_SIZE)
+			break;
+		raw_spin_lock_irqsave(&ll_lock, flags);
+		ll_match = ll_bit;
+		ll_match &= ll_quad;
+		ll_match &= ll_task == NULL;
+		ll_match &= ll_va == la;
+		tmp3 = ll_data[0];
+		tmp4 = ll_data[1];
+		ll_bit = false;
+		raw_spin_unlock_irqrestore(&ll_lock, flags);
+		if (ll_match) {
+			__asm__ __volatile__(
+			"	insqh %6,%4,%3\n"
+			"1:	ldq_l %2,8(%5)\n"
+			"	cmpeq %7,%2,%1\n"
+			"	beq %1,4f\n"
+			"	mskqh %2,%4,%2\n"
+			"	or %2,%3,%1\n"
+			"2:	stq_c %1,8(%5)\n"
+			"	beq %1,3f\n"
+			"	.subsection 2\n"
+			"3:	br 1b\n"
+			"	.previous\n"
+			"4:\n"
+			EXC(1b,4b,%2,%0)
+			EXC(2b,4b,%1,%0)
+				: "=r"(error), "=&r"(ll_match),
+				  "=&r"(tmp1), "=&r"(tmp2)
+				: "r"(va), "r"(la & ~7ul),
+				  "r"(una_reg(reg)), "r"(tmp4), "0"(0));
+			if (error)
+				goto got_exception;
+		}
+		if (ll_match) {
+			__asm__ __volatile__(
+			"	insql %6,%4,%3\n"
+			"1:	ldq_l %2,0(%5)\n"
+			"	cmpeq %7,%2,%1\n"
+			"	beq %1,4f\n"
+			"	mskql %2,%4,%2\n"
+			"	or %2,%3,%1\n"
+			"2:	stq_c %1,0(%5)\n"
+			"	beq %1,3f\n"
+			"	.subsection 2\n"
+			"3:	br 1b\n"
+			"	.previous\n"
+			"4:\n"
+			EXC(1b,4b,%2,%0)
+			EXC(2b,4b,%1,%0)
+				: "=r"(error), "=&r"(ll_match),
+				  "=&r"(tmp1), "=&r"(tmp2)
+				: "r"(va), "r"(la & ~7ul),
+				  "r"(una_reg(reg)), "r"(tmp3), "0"(0));
+			if (error)
+				goto got_exception;
+		}
+		una_reg(reg) = ll_match;
+		return;
 	}
 
 	printk("Bad unaligned kernel access at %016lx: %p %lx %lu\n",
@@ -624,24 +821,33 @@ s_reg_to_mem (unsigned long s_reg)
  * so finding the appropriate registers is a little more difficult
  * than in the kernel case.
  *
- * Finally, we handle regular integer load/stores only.  In
- * particular, load-linked/store-conditionally and floating point
- * load/stores are not supported.  The former make no sense with
- * unaligned faults (they are guaranteed to fail) and I don't think
- * the latter will occur in any decent program.
+ * We have three classes of operations to handle:
  *
- * Sigh.  We *do* have to handle some FP operations, because GCC will
- * uses them as temporary storage for integer memory to memory copies.
- * However, we need to deal with stt/ldt and sts/lds only.
+ * - We handle regular integer load/stores transparently to faulting
+ *   code, preserving the semantics of the triggering instruction.
+ *
+ * - We handle some FP operations as well, because GCC will use them as
+ *   temporary storage for integer memory to memory copies.  However,
+ *   we need to deal with stt/ldt and sts/lds only.
+ *
+ * - We handle load-locked/store-conditional operations by maintaining
+ *   data consistency only, within the two adjacent longwords or
+ *   quadwords partially spanned.  This is sufficient to guarantee an
+ *   unaligned RMW sequence using these operations won't clobber data
+ *   *outside* the location intended but does *not* guarantee atomicity
+ *   for the data quantity itself.
  */
 
 #define OP_INT_MASK	( 1L << 0x28 | 1L << 0x2c   /* ldl stl */	\
+			| 1L << 0x2a | 1L << 0x2e   /* ldl_l stl_c */	\
 			| 1L << 0x29 | 1L << 0x2d   /* ldq stq */	\
+			| 1L << 0x2b | 1L << 0x2f   /* ldq_l stq_c */	\
 			| 1L << 0x0c | 1L << 0x0d   /* ldwu stw */	\
 			| 1L << 0x0a | 1L << 0x0e ) /* ldbu stb */
 
 #define OP_WRITE_MASK	( 1L << 0x26 | 1L << 0x27   /* sts stt */	\
 			| 1L << 0x2c | 1L << 0x2d   /* stl stq */	\
+			| 1L << 0x2e | 1L << 0x2f   /* stl_c stq_c */	\
 			| 1L << 0x0d | 1L << 0x0e ) /* stw stb */
 
 #define R(x)	((size_t) &((struct pt_regs *)0)->x)
@@ -666,6 +872,9 @@ do_entUnaUser(void __user * va, unsigned
 
 	unsigned long tmp1, tmp2, tmp3, tmp4;
 	unsigned long fake_reg, *reg_addr = &fake_reg;
+	unsigned long flags;
+	unsigned long la;
+	bool ll_match;
 	int si_code;
 	long error;
 
@@ -794,6 +1003,61 @@ do_entUnaUser(void __user * va, unsigned
 		*reg_addr = tmp1|tmp2;
 		break;
 
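+	/*
+	 * Emulated LDx_L: record the two adjacent aligned quantities
+	 * spanned, along with the access details, for a subsequent
+	 * STx_C; regular loads suffice as no atomicity is needed at
+	 * this stage.
+	 */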
+	case 0x2a: /* ldl_l */
+		la = (unsigned long)va;
+		__asm__ __volatile__(
+		"1:	ldl %3,0(%5)\n"
+		"2:	ldl %4,4(%5)\n"
+		"	srl %3,%6,%1\n"
+		"	zapnot %1,%8,%1\n"
+		"	sll %4,%7,%2\n"
+		"	zapnot %2,15,%2\n"
+		"3:\n"
+		EXC(1b,3b,%1,%0)
+		EXC(2b,3b,%2,%0)
+			: "=r"(error),
+			  "=&r"(tmp1), "=r"(tmp2), "=&r"(tmp3), "=&r"(tmp4)
+			: "r"(la & ~3ul),
+			  "r"((la & 3) * 8), "r"((4 - (la & 3)) * 8),
+			  "r"(15 >> (la & 3)), "0"(0));
+		if (error)
+			goto give_sigsegv;
+		raw_spin_lock_irqsave(&ll_lock, flags);
+		ll_va = la;
+		ll_task = current;
+		ll_data[0] = tmp3;
+		ll_data[1] = tmp4;
+		ll_quad = false;
+		ll_bit = true;
+		raw_spin_unlock_irqrestore(&ll_lock, flags);
+		*reg_addr = (int)(tmp1|tmp2);
+		break;
+
+	case 0x2b: /* ldq_l */
+		la = (unsigned long)va;
+		__asm__ __volatile__(
+		"1:	ldq_u %3,0(%5)\n"
+		"2:	ldq_u %4,7(%5)\n"
+		"	extql %3,%5,%1\n"
+		"	extqh %4,%5,%2\n"
+		"3:\n"
+		EXC(1b,3b,%1,%0)
+		EXC(2b,3b,%2,%0)
+			: "=r"(error),
+			  "=&r"(tmp1), "=r"(tmp2), "=&r"(tmp3), "=&r"(tmp4)
+			: "r"(va), "0"(0));
+		if (error)
+			goto give_sigsegv;
+		raw_spin_lock_irqsave(&ll_lock, flags);
+		ll_va = la;
+		ll_task = current;
+		ll_data[0] = tmp3;
+		ll_data[1] = tmp4;
+		ll_quad = true;
+		ll_bit = true;
+		raw_spin_unlock_irqrestore(&ll_lock, flags);
+		*reg_addr = tmp1|tmp2;
+		break;
+
 	/* Note that the store sequences do not indicate that they change
 	   memory because it _should_ be affecting nothing in this context.
 	   (Otherwise we have other, much larger, problems.)  */
@@ -877,12 +1141,139 @@ do_entUnaUser(void __user * va, unsigned
 			goto give_sigsegv;
 		return;
 
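+	/*
+	 * Emulated STx_C: succeed, updating memory, only if both of
+	 * the quantities spanned still hold the values recorded by
+	 * the emulated LDx_L; data outside the bytes stored is left
+	 * intact.
+	 */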
+	case 0x2e: /* stl_c */
+		la = (unsigned long)va;
+		raw_spin_lock_irqsave(&ll_lock, flags);
+		ll_match = ll_bit;
+		ll_match &= !ll_quad;
+		ll_match &= ll_task == current;
+		ll_match &= ll_va == la;
+		tmp3 = ll_data[0];
+		tmp4 = ll_data[1];
+		ll_bit = false;
+		raw_spin_unlock_irqrestore(&ll_lock, flags);
+		if (ll_match) {
+			__asm__ __volatile__(
+			"	srl %6,%5,%3\n"
+			"	zapnot %3,%8,%3\n"
+			"1:	ldl_l %2,4(%4)\n"
+			"	cmpeq %7,%2,%1\n"
+			"	beq %1,4f\n"
+			"	zap %2,%8,%2\n"
+			"	or %2,%3,%1\n"
+			"2:	stl_c %1,4(%4)\n"
+			"	beq %1,3f\n"
+			"	.subsection 2\n"
+			"3:	br 1b\n"
+			"	.previous\n"
+			"4:\n"
+			EXC(1b,4b,%2,%0)
+			EXC(2b,4b,%1,%0)
+				: "=r"(error), "=&r"(ll_match),
+				  "=&r"(tmp1), "=&r"(tmp2)
+				: "r"(la & ~3ul), "r"((4 - (la & 3)) * 8),
+				  "r"(*reg_addr), "r"(tmp4),
+				  "r"((15 >> (4 - (la & 3))) & 0xf), "0"(0));
+			if (error)
+				goto give_sigsegv;
+		}
+		if (ll_match) {
+			__asm__ __volatile__(
+			"	sll %6,%5,%3\n"
+			"	zapnot %3,%8,%3\n"
+			"1:	ldl_l %2,0(%4)\n"
+			"	cmpeq %7,%2,%1\n"
+			"	beq %1,4f\n"
+			"	zap %2,%8,%2\n"
+			"	or %2,%3,%1\n"
+			"2:	stl_c %1,0(%4)\n"
+			"	beq %1,3f\n"
+			"	.subsection 2\n"
+			"3:	br 1b\n"
+			"	.previous\n"
+			"4:\n"
+			EXC(1b,4b,%2,%0)
+			EXC(2b,4b,%1,%0)
+				: "=r"(error), "=&r"(ll_match),
+				  "=&r"(tmp1), "=&r"(tmp2)
+				: "r"(la & ~3ul), "r"((la & 3) * 8),
+				  "r"(*reg_addr), "r"(tmp3),
+				  "r"((15 << (la & 3)) & 0xf), "0"(0));
+			if (error)
+				goto give_sigsegv;
+		}
+		*reg_addr = ll_match;
+		break;
+
+	case 0x2f: /* stq_c */
+		la = (unsigned long)va;
+		raw_spin_lock_irqsave(&ll_lock, flags);
+		ll_match = ll_bit;
+		ll_match &= ll_quad;
+		ll_match &= ll_task == current;
+		ll_match &= ll_va == la;
+		tmp3 = ll_data[0];
+		tmp4 = ll_data[1];
+		ll_bit = false;
+		raw_spin_unlock_irqrestore(&ll_lock, flags);
+		if (ll_match) {
+			__asm__ __volatile__(
+			"	insqh %6,%4,%3\n"
+			"1:	ldq_l %2,8(%5)\n"
+			"	cmpeq %7,%2,%1\n"
+			"	beq %1,4f\n"
+			"	mskqh %2,%4,%2\n"
+			"	or %2,%3,%1\n"
+			"2:	stq_c %1,8(%5)\n"
+			"	beq %1,3f\n"
+			"	.subsection 2\n"
+			"3:	br 1b\n"
+			"	.previous\n"
+			"4:\n"
+			EXC(1b,4b,%2,%0)
+			EXC(2b,4b,%1,%0)
+				: "=r"(error), "=&r"(ll_match),
+				  "=&r"(tmp1), "=&r"(tmp2)
+				: "r"(va), "r"(la & ~7ul),
+				  "r"(*reg_addr), "r"(tmp4), "0"(0));
+			if (error)
+				goto give_sigsegv;
+		}
+		if (ll_match) {
+			__asm__ __volatile__(
+			"	insql %6,%4,%3\n"
+			"1:	ldq_l %2,0(%5)\n"
+			"	cmpeq %7,%2,%1\n"
+			"	beq %1,4f\n"
+			"	mskql %2,%4,%2\n"
+			"	or %2,%3,%1\n"
+			"2:	stq_c %1,0(%5)\n"
+			"	beq %1,3f\n"
+			"	.subsection 2\n"
+			"3:	br 1b\n"
+			"	.previous\n"
+			"4:\n"
+			EXC(1b,4b,%2,%0)
+			EXC(2b,4b,%1,%0)
+				: "=r"(error), "=&r"(ll_match),
+				  "=&r"(tmp1), "=&r"(tmp2)
+				: "r"(va), "r"(la & ~7ul),
+				  "r"(*reg_addr), "r"(tmp3), "0"(0));
+			if (error)
+				goto give_sigsegv;
+		}
+		*reg_addr = ll_match;
+		break;
+
 	default:
 		/* What instruction were you trying to use, exactly?  */
 		goto give_sigbus;
 	}
 
-	/* Only integer loads should get here; everyone else returns early. */
+	/*
+	 * Only integer loads and stores conditional should get here;
+	 * everyone else returns early.
+	 */
 	if (reg == 30)
 		wrusp(fake_reg);
 	return;