From: Howard Chu
To: peterz@infradead.org
Cc: mingo@redhat.com, acme@kernel.org, namhyung@kernel.org,
    mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
    jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
    kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org, james.clark@linaro.org,
    alan.maguire@oracle.com, Howard Chu
Subject: [PATCH v2 2/2] perf trace: Rewrite BPF code to pass the verifier
Date: Thu, 10 Oct 2024 19:14:02 -0700
Message-ID: <20241011021403.4089793-3-howardchu95@gmail.com>
In-Reply-To: <20241011021403.4089793-1-howardchu95@gmail.com>
References: <20241011021403.4089793-1-howardchu95@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Rewrite the code to add more memory bounds checking so that it passes
the BPF verifier. No logic is changed.

This rewrite is centered around two main ideas:

- Always use a variable instead of an expression in an if block's
  condition, so the BPF verifier keeps track of the correct register.
- Delay the check as late as possible, just before the BPF function
  call.

Things that can still be done better:

- Instead of allowing a theoretical maximum of a 6-argument augmentation
  payload, reduce the payload to a smaller fixed size.

Signed-off-by: Howard Chu
---
 .../bpf_skel/augmented_raw_syscalls.bpf.c     | 122 ++++++++++--------
 1 file changed, 67 insertions(+), 55 deletions(-)

diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index b2f17cca014b..9ae459faac4b 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -157,9 +157,9 @@ static inline int augmented__output(void *ctx, struct augmented_args_payload *ar
 	return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, args, len);
 }
 
-static inline int augmented__beauty_output(void *ctx, void *data, int len)
+static inline int augmented__beauty_output(void *ctx, struct beauty_payload_enter *args, int len)
 {
-	return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, data, len);
+	return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, args, len);
 }
 
 static inline
@@ -277,25 +277,31 @@ int sys_enter_rename(struct syscall_enter_args *args)
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *oldpath_arg = (const void *)args->args[0],
 		   *newpath_arg = (const void *)args->args[1];
-	unsigned int len = sizeof(augmented_args->args), oldpath_len, newpath_len;
+	unsigned int len = sizeof(augmented_args->args), oldpath_len, newpath_len, aligned_size;
 
 	if (augmented_args == NULL)
-		return 1; /* Failure: don't filter */
+		goto failure;
 
 	len += 2 * sizeof(u64); // The overhead of size and err, just before the payload...
 
 	oldpath_len = augmented_arg__read_str(&augmented_args->arg, oldpath_arg, sizeof(augmented_args->arg.value));
-	augmented_args->arg.size = PERF_ALIGN(oldpath_len + 1, sizeof(u64));
-	len += augmented_args->arg.size;
+	aligned_size = PERF_ALIGN(oldpath_len + 1, sizeof(u64));
+	augmented_args->arg.size = aligned_size;
+	len += aligned_size;
+
+	/* Every read from userspace is limited to value size */
+	if (aligned_size > sizeof(augmented_args->arg.value))
+		goto failure;
 
-	struct augmented_arg *arg2 = (void *)&augmented_args->arg.value + augmented_args->arg.size;
+	struct augmented_arg *arg2 = (void *)&augmented_args->arg.value + aligned_size;
 
 	newpath_len = augmented_arg__read_str(arg2, newpath_arg, sizeof(augmented_args->arg.value));
 	arg2->size = newpath_len;
-
 	len += newpath_len;
 
 	return augmented__output(args, augmented_args, len);
+failure:
+	return 1; /* Failure: don't filter */
 }
 
 SEC("tp/syscalls/sys_enter_renameat2")
@@ -304,25 +310,31 @@ int sys_enter_renameat2(struct syscall_enter_args *args)
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *oldpath_arg = (const void *)args->args[1],
 		   *newpath_arg = (const void *)args->args[3];
-	unsigned int len = sizeof(augmented_args->args), oldpath_len, newpath_len;
+	unsigned int len = sizeof(augmented_args->args), oldpath_len, newpath_len, aligned_size;
 
 	if (augmented_args == NULL)
-		return 1; /* Failure: don't filter */
+		goto failure;
 
 	len += 2 * sizeof(u64); // The overhead of size and err, just before the payload...
 
 	oldpath_len = augmented_arg__read_str(&augmented_args->arg, oldpath_arg, sizeof(augmented_args->arg.value));
-	augmented_args->arg.size = PERF_ALIGN(oldpath_len + 1, sizeof(u64));
-	len += augmented_args->arg.size;
+	aligned_size = PERF_ALIGN(oldpath_len + 1, sizeof(u64));
+	augmented_args->arg.size = aligned_size;
+	len += aligned_size;
 
-	struct augmented_arg *arg2 = (void *)&augmented_args->arg.value + augmented_args->arg.size;
+	/* Every read from userspace is limited to value size */
+	if (aligned_size > sizeof(augmented_args->arg.value))
+		goto failure;
+
+	struct augmented_arg *arg2 = (void *)&augmented_args->arg.value + aligned_size;
 
 	newpath_len = augmented_arg__read_str(arg2, newpath_arg, sizeof(augmented_args->arg.value));
 	arg2->size = newpath_len;
-
 	len += newpath_len;
 
 	return augmented__output(args, augmented_args, len);
+failure:
+	return 1; /* Failure: don't filter */
 }
 
 #define PERF_ATTR_SIZE_VER0	64	/* sizeof first published struct */
@@ -422,12 +434,12 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
 
 static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 {
-	bool augmented, do_output = false;
-	int zero = 0, size, aug_size, index, output = 0,
-	    value_size = sizeof(struct augmented_arg) - offsetof(struct augmented_arg, value);
+	bool do_augment = false;
+	int zero = 0, value_size = sizeof(struct augmented_arg) - sizeof(u64);
 	unsigned int nr, *beauty_map;
 	struct beauty_payload_enter *payload;
-	void *arg, *payload_offset;
+	void *payload_offset, *value_offset;
+	u64 len = 0; /* has to be u64, otherwise it won't pass the verifier */
 
 	/* fall back to do predefined tail call */
 	if (args == NULL)
@@ -436,16 +448,18 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 	/* use syscall number to get beauty_map entry */
 	nr = (__u32)args->syscall_nr;
 	beauty_map = bpf_map_lookup_elem(&beauty_map_enter, &nr);
+	if (beauty_map == NULL)
+		return 1;
 
 	/* set up payload for output */
 	payload = bpf_map_lookup_elem(&beauty_payload_enter_map, &zero);
 	payload_offset = (void *)&payload->aug_args;
-
-	if (beauty_map == NULL || payload == NULL)
+	if (payload == NULL)
 		return 1;
 
 	/* copy the sys_enter header, which has the syscall_nr */
 	__builtin_memcpy(&payload->args, args, sizeof(struct syscall_enter_args));
+	len += sizeof(struct syscall_enter_args);
 
 	/*
 	 * Determine what type of argument and how many bytes to read from user space, using the
@@ -457,52 +471,50 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 	 * buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_MAX_BUF)
 	 */
 	for (int i = 0; i < 6; i++) {
-		arg = (void *)args->args[i];
-		augmented = false;
-		size = beauty_map[i];
-		aug_size = size; /* size of the augmented data read from user space */
+		int augment_size = beauty_map[i];
+		unsigned int augment_size_with_header;
+		void *addr = (void *)args->args[i];
+		bool is_augmented = false;
 
-		if (size == 0 || arg == NULL)
+		if (augment_size == 0 || addr == NULL)
 			continue;
 
-		if (size == 1) { /* string */
-			aug_size = bpf_probe_read_user_str(((struct augmented_arg *)payload_offset)->value, value_size, arg);
-			/* minimum of 0 to pass the verifier */
-			if (aug_size < 0)
-				aug_size = 0;
-
-			augmented = true;
-		} else if (size > 0 && size <= value_size) { /* struct */
-			if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, size, arg))
-				augmented = true;
-		} else if (size < 0 && size >= -6) { /* buffer */
-			index = -(size + 1);
-			aug_size = args->args[index];
-
-			if (aug_size > TRACE_AUG_MAX_BUF)
-				aug_size = TRACE_AUG_MAX_BUF;
-
-			if (aug_size > 0) {
-				if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, aug_size, arg))
-					augmented = true;
-			}
+		value_offset = ((struct augmented_arg *)payload_offset)->value;
+
+		if (augment_size == 1) { /* string */
+			augment_size = bpf_probe_read_user_str(value_offset, value_size, addr);
+			is_augmented = true;
+		} else if (augment_size > 1 && augment_size <= value_size) { /* struct */
+			if (!bpf_probe_read_user(value_offset, value_size, addr))
+				is_augmented = true;
+		} else if (augment_size < 0 && augment_size >= -6) { /* buffer */
+			int index = -(augment_size + 1);
+
+			augment_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index];
+			if (!bpf_probe_read_user(value_offset, augment_size, addr))
+				is_augmented = true;
 		}
 
-		/* write data to payload */
-		if (augmented) {
-			int written = offsetof(struct augmented_arg, value) + aug_size;
+		/* Augmented data size is limited to value size */
+		if (augment_size > value_size)
+			augment_size = value_size;
+
+		/* Explicitly define this variable to pass the verifier */
+		augment_size_with_header = sizeof(u64) + augment_size;
 
-			((struct augmented_arg *)payload_offset)->size = aug_size;
-			output += written;
-			payload_offset += written;
-			do_output = true;
+		/* Write data to payload */
+		if (is_augmented && augment_size_with_header <= sizeof(struct augmented_arg)) {
+			((struct augmented_arg *)payload_offset)->size = augment_size;
+			do_augment = true;
+			len += augment_size_with_header;
+			payload_offset += augment_size_with_header;
 		}
 	}
 
-	if (!do_output)
+	if (!do_augment || len > sizeof(struct beauty_payload_enter))
 		return 1;
 
-	return augmented__beauty_output(ctx, payload, sizeof(struct syscall_enter_args) + output);
+	return augmented__beauty_output(ctx, payload, len);
 }
 
 SEC("tp/raw_syscalls/sys_enter")
-- 
2.43.0
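
For readers who want to try the two ideas from the commit message outside the kernel tree, here is a minimal userspace C sketch. It is not BPF and not part of the patch. It only mirrors the shape of the sys_enter_rename change: the aligned size is kept in its own variable, and that same variable is checked just before it is used as an offset. The names record, read_str, pack_two, ALIGN_UP and VALUE_MAX are hypothetical and do not exist in the perf sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VALUE_MAX 64u /* hypothetical per-string capacity */
/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Hypothetical layout: room for two strings stored back to back. */
struct record {
	uint64_t size;             /* aligned size of the first string */
	char value[2 * VALUE_MAX]; /* first string, then the second one */
};

/* Copy at most cap - 1 bytes plus a NUL; return the copied length. */
static unsigned int read_str(char *dst, const char *src, unsigned int cap)
{
	unsigned int n = 0;

	while (n + 1 < cap && src[n] != '\0') {
		dst[n] = src[n];
		n++;
	}
	dst[n] = '\0';
	return n;
}

/* Return the number of bytes that would be emitted, or 0 on failure. */
static unsigned int pack_two(struct record *rec, const char *a, const char *b)
{
	unsigned int len = sizeof(rec->size), a_len, b_len, aligned_size;

	assert(rec != NULL);

	a_len = read_str(rec->value, a, VALUE_MAX);

	/* Idea 1: keep the computed size in a variable of its own. */
	aligned_size = ALIGN_UP(a_len + 1, (unsigned int)sizeof(uint64_t));
	rec->size = aligned_size;
	len += aligned_size;

	/* Idea 2: check that variable right before it is used as an offset. */
	if (aligned_size > VALUE_MAX)
		goto failure;

	b_len = read_str(rec->value + aligned_size, b, VALUE_MAX);
	len += b_len;

	return len;
failure:
	return 0;
}
```

In this sketch the check can never fire, because read_str already caps the first string at VALUE_MAX - 1 bytes. It only restates the bound next to the pointer arithmetic, which appears to be the kind of restatement the commit message describes as being for the verifier's benefit.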