From nobody Wed Nov 27 23:40:38 2024 Received: from relay2-d.mail.gandi.net (relay2-d.mail.gandi.net [217.70.183.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF7511DC055; Mon, 7 Oct 2024 16:10:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.183.194 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728317461; cv=none; b=CIH5vcIVYuh+z6bfAudNYP8KayycJ4tYRK4OQNo2aSk/QByKay2fY8N/0IWcFJm4zHPqmjZ+jkjGVM9X4gMYkcKyBekVKvK/Vk57ZIMfKukhpm10YdDnOZayiSmcK0B8u1H2z+JPOo050B43Vkz1F0UsPj/C1dahhVTon0CbLtU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728317461; c=relaxed/simple; bh=/Y9VMlVfq5UJNHiiDgUCeQG11WYbql4tu9XB1HLK2cA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=qOPRQr3/jd9VPzxV1n737GKp6zt7u52yHVkph1ZJDJk1Oz08TrB1easmlGFWOr4AaHo0z3gMoQGuApfk/LKLsiBSxmiVM3tvxB0IUc7BdcR9+cPNXpWfEqAIvhz9QFOWcM9dsvmSLcIMc7k7rJCZfs31yhaDnE068s6tss5oW98= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com; spf=pass smtp.mailfrom=bootlin.com; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b=FD4ctT9R; arc=none smtp.client-ip=217.70.183.194 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bootlin.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b="FD4ctT9R" Received: by mail.gandi.net (Postfix) with ESMTPSA id 2EF364000C; Mon, 7 Oct 2024 16:10:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bootlin.com; s=gm1; t=1728317456; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6MQnfX9Qj8kXj7/eV4BSo8hbAMVGNYYB5iM9DWW1nkg=; b=FD4ctT9RSrzw/yk4MqMkYguL07TUVqKsDtW10kpVQQxJZBG1HFiaLodrx+22lOvQ6rlq2J NsDtqQfeABO16PMNuceaZHICv5njVIHgtRcR0WBG9cC+h0f9DQwfPGxhKy+uM6vGhafu2e BIf3n06GIxXAKgo9rRkvEPGQK+NNEhngCXqRNKcaW9i/WXhwdV+7xUJN2DgLaBSvHLSGda dXIUy04iCD2/o7d6YGsUnpfBaoWKXm20n+hPFhB1zVhoFQCmfwWZ9X6d/pN51HaoOoS2JJ hpmIBJRhbCeJOF0lY6SCi3rOLNmd9nlrCq78ME1N5dfrfsncrWXkHZ78LwPBLg== From: Louis Chauvet Date: Mon, 07 Oct 2024 18:10:42 +0200 Subject: [PATCH v12 08/15] drm/vkms: Re-introduce line-per-line composition algorithm Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20241007-yuv-v12-8-01c1ada6fec8@bootlin.com> References: <20241007-yuv-v12-0-01c1ada6fec8@bootlin.com> In-Reply-To: <20241007-yuv-v12-0-01c1ada6fec8@bootlin.com> To: Rodrigo Siqueira , Melissa Wen , =?utf-8?q?Ma=C3=ADra_Canal?= , Haneen Mohammed , Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Jonathan Corbet , Louis Chauvet , Simona Vetter , rdunlap@infradead.org, arthurgrillo@riseup.net, pekka.paalanen@haloniitty.fi, Simona Vetter Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, thomas.petazzoni@bootlin.com, jeremie.dautheribes@bootlin.com, miquel.raynal@bootlin.com, seanpaul@google.com, marcheu@google.com, nicolejadeyee@google.com, Pekka Paalanen X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=30285; i=louis.chauvet@bootlin.com; h=from:subject:message-id; bh=/Y9VMlVfq5UJNHiiDgUCeQG11WYbql4tu9XB1HLK2cA=; b=owEBbQKS/ZANAwAIASCtLsZbECziAcsmYgBnBAgCeg6iDRtJ1XF8LPUCLaxgVugQy9Vjxc7wL uhVTVFuC1iJAjMEAAEIAB0WIQRPj7g/vng8MQxQWQQgrS7GWxAs4gUCZwQIAgAKCRAgrS7GWxAs 4jpAD/sHN7rgyxVkjP6B6nbL3UZix80qIwCXBMHXBbmdLJTf1rPHi038ANR3L3DaL+z9kjpGUix 3r1mGgRK0fSGIl+J6NPdqH+rhEuUnAhnR5QFpx1WzArT1C9FuQRk3DPC5dClkuXUcgQk7SUJzLK QS6W9hpVsUe95YunpXWWy721iUa4PTu1DoLIdPp9XU2YimRxuE9VfZ+SU5UQUcta4CCPu180vGl 1PN7qtRTErHk466rumYZrskH3TMyFBYhkyc9l0VzGZeF+xrTCja74a1hux60bzBz5NNGH0JftT/ U7/Z4/UI2hH9EqfZp0mrQC9DVvua9Xmnp5nKC0B5GlhvO1xds0PoW9m3o//ZmqbT3TKlK/yUamE lthNZPSm8/rLGnrq0IXcpzsoGC+L8yEHsupvT6fYr5GD2n+3GwnkpPJGAIz3xnU3wn+9bLx3unQ 4fMzGcruCEzisMPQBiIxP61D3naqngaaTgGYzmx5+SJhEer0pV+lUu4ugxeKqzL4Zk1rlOTfJyL ZbhVMuuZizpCI9d7aZHZPKnTmyiA6edSkJcEJ2YYyQyDgCJoV9ew8mLtvyb6Azaasy8Vp1R6Hem 8ok1N8mHt61M1yEVVI/0AqMsD44A1OOjSDVL0dq+ID7vhPyfQAg0+89uTqq1B8+jjkUqgwwg4/4 l976pA045KxFtWg== X-Developer-Key: i=louis.chauvet@bootlin.com; a=openpgp; fpr=8B7104AE9A272D6693F527F2EC1883F55E0B40A5 X-GND-Sasl: louis.chauvet@bootlin.com Re-introduce a line-by-line composition algorithm for each pixel format. This allows more performance by not requiring an indirection per pixel read. This patch is focused on readability of the code. Line-by-line composition was introduced by [1] but rewritten back to pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact on performance, and it was merged. This patch is almost a revert of [2], but in addition efforts have been made to increase readability and maintainability of the rotation handling. The blend function is now divided in two parts: - Transformation of coordinates from the output referential to the source referential - Line conversion and blending Most of the complexity of the rotation management is avoided by using drm_rect_* helpers. The remaining complexity is around the clipping, to avoid reading/writing outside source/destination buffers. The pixel conversion is now done line-by-line, so the read_pixel_t was replaced with read_pixel_line_t callback. This way the indirection is only required once per line and per plane, instead of once per pixel and per plane. The read_line_t callbacks are very similar for most pixel format, but it is required to avoid performance impact. Some helpers for color conversion were introduced to avoid code repetition: - *_to_argb_u16: perform colors conversion. They should be inlined by the compiler, and they are used to avoid repetition between multiple variants of the same format (argb/xrgb and maybe in the future for formats like bgr formats). This new algorithm was tested with: - kms_plane (for color conversions) - kms_rotation_crc (for rotations of planes) - kms_cursor_crc (for translations of planes) - kms_rotation (for all rotations and formats combinations) [3] The performance gain was mesured with kms_fb_stress [4] with some modification to fix the writeback format. The performance improvement is around 5 to 10%. [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept new formats") https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail= .com/ [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion functionality") https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/ [3]: https://lore.kernel.org/igt-dev/20240313-new_rotation-v2-0-6230fd5cae5= 9@bootlin.com/ [4]: https://lore.kernel.org/all/20240422-kms_fb_stress-dev-v5-0-0c577163dc= 88@riseup.net/ Signed-off-by: Louis Chauvet Acked-by: Pekka Paalanen # Conflicts: # drivers/gpu/drm/vkms/vkms_composer.c --- drivers/gpu/drm/vkms/vkms_composer.c | 234 ++++++++++++++++++++++++++++---= ---- drivers/gpu/drm/vkms/vkms_drv.h | 28 +++-- drivers/gpu/drm/vkms/vkms_formats.c | 224 ++++++++++++++++++++-----------= -- drivers/gpu/drm/vkms/vkms_formats.h | 2 +- drivers/gpu/drm/vkms/vkms_plane.c | 5 +- 5 files changed, 344 insertions(+), 149 deletions(-) diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vk= ms_composer.c index 601e33431b45..7a3e47b895a7 100644 --- a/drivers/gpu/drm/vkms/vkms_composer.c +++ b/drivers/gpu/drm/vkms/vkms_composer.c @@ -29,8 +29,8 @@ static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 al= pha) * @x_start: The start offset * @pixel_count: The number of pixels to blend * - * The pixels [0;@pixel_count) in stage_buffer are blended at [@x_start;@x= _start+@pixel_count) in - * output_buffer. + * The pixels [@x_start;@x_start+@pixel_count) in stage_buffer are blended= at + * [@x_start;@x_start+@pixel_count) in output_buffer. * * The current DRM assumption is that pixel color values have been already * pre-multiplied with the alpha channel values. See more @@ -41,7 +41,7 @@ static void pre_mul_alpha_blend(const struct line_buffer = *stage_buffer, struct line_buffer *output_buffer, int x_start, int pixel_count) { struct pixel_argb_u16 *out =3D &output_buffer->pixels[x_start]; - const struct pixel_argb_u16 *in =3D stage_buffer->pixels; + const struct pixel_argb_u16 *in =3D &stage_buffer->pixels[x_start]; =20 for (int i =3D 0; i < pixel_count; i++) { out[i].a =3D (u16)0xffff; @@ -51,33 +51,6 @@ static void pre_mul_alpha_blend(const struct line_buffer= *stage_buffer, } } =20 -static int get_y_pos(struct vkms_frame_info *frame_info, int y) -{ - if (frame_info->rotation & DRM_MODE_REFLECT_Y) - return drm_rect_height(&frame_info->rotated) - y - 1; - - switch (frame_info->rotation & DRM_MODE_ROTATE_MASK) { - case DRM_MODE_ROTATE_90: - return frame_info->rotated.x2 - y - 1; - case DRM_MODE_ROTATE_270: - return y + frame_info->rotated.x1; - default: - return y; - } -} - -static bool check_limit(struct vkms_frame_info *frame_info, int pos) -{ - if (drm_rotation_90_or_270(frame_info->rotation)) { - if (pos >=3D 0 && pos < drm_rect_width(&frame_info->rotated)) - return true; - } else { - if (pos >=3D frame_info->rotated.y1 && pos < frame_info->rotated.y2) - return true; - } - - return false; -} =20 static void fill_background(const struct pixel_argb_u16 *background_color, struct line_buffer *output_buffer) @@ -203,6 +176,182 @@ static enum pixel_read_direction direction_for_rotati= on(unsigned int rotation) return READ_LEFT_TO_RIGHT; } =20 +/** + * clamp_line_coordinates() - Compute and clamp the coordinate to read and= write during the blend + * process. + * + * @direction: direction of the reading + * @current_plane: current plane blended + * @src_line: source line of the reading. Only the top-left coordinate is = used. This rectangle + * must be rotated and have a shape of 1*pixel_count if @direction is vert= ical and a shape of + * pixel_count*1 if @direction is horizontal. + * @src_x_start: x start coordinate for the line reading + * @src_y_start: y start coordinate for the line reading + * @dst_x_start: x coordinate to blend the read line + * @pixel_count: number of pixels to blend + * + * This function is mainly a safety net to avoid reading outside the sourc= e buffer. As the + * userspace should never ask to read outside the source plane, all the ca= ses covered here should + * be dead code. + */ +static void clamp_line_coordinates(enum pixel_read_direction direction, + const struct vkms_plane_state *current_plane, + const struct drm_rect *src_line, int *src_x_start, + int *src_y_start, int *dst_x_start, int *pixel_count) +{ + /* By default the start points are correct */ + *src_x_start =3D src_line->x1; + *src_y_start =3D src_line->y1; + *dst_x_start =3D current_plane->frame_info->dst.x1; + + /* Get the correct number of pixel to blend, it depends of the direction = */ + switch (direction) { + case READ_LEFT_TO_RIGHT: + case READ_RIGHT_TO_LEFT: + *pixel_count =3D drm_rect_width(src_line); + break; + case READ_BOTTOM_TO_TOP: + case READ_TOP_TO_BOTTOM: + *pixel_count =3D drm_rect_height(src_line); + break; + } + + /* + * Clamp the coordinates to avoid reading outside the buffer + * + * This is mainly a security check to avoid reading outside the buffer, t= he userspace + * should never request to read outside the source buffer. + */ + switch (direction) { + case READ_LEFT_TO_RIGHT: + case READ_RIGHT_TO_LEFT: + if (*src_x_start < 0) { + *pixel_count +=3D *src_x_start; + *dst_x_start -=3D *src_x_start; + *src_x_start =3D 0; + } + if (*src_x_start + *pixel_count > current_plane->frame_info->fb->width) + *pixel_count =3D max(0, (int)current_plane->frame_info->fb->width - + *src_x_start); + break; + case READ_BOTTOM_TO_TOP: + case READ_TOP_TO_BOTTOM: + if (*src_y_start < 0) { + *pixel_count +=3D *src_y_start; + *dst_x_start -=3D *src_y_start; + *src_y_start =3D 0; + } + if (*src_y_start + *pixel_count > current_plane->frame_info->fb->height) + *pixel_count =3D max(0, (int)current_plane->frame_info->fb->height - + *src_y_start); + break; + } +} + +/** + * blend_line() - Blend a line from a plane to the output buffer + * + * @current_plane: current plane to work on + * @y: line to write in the output buffer + * @crtc_x_limit: width of the output buffer + * @stage_buffer: temporary buffer to convert the pixel line from the sour= ce buffer + * @output_buffer: buffer to blend the read line into. + */ +static void blend_line(struct vkms_plane_state *current_plane, int y, + int crtc_x_limit, struct line_buffer *stage_buffer, + struct line_buffer *output_buffer) +{ + int src_x_start, src_y_start, dst_x_start, pixel_count; + struct drm_rect dst_line, tmp_src, src_line; + + /* Avoid rendering useless lines */ + if (y < current_plane->frame_info->dst.y1 || + y >=3D current_plane->frame_info->dst.y2) + return; + + /* + * dst_line is the line to copy. The initial coordinates are inside the + * destination framebuffer, and then drm_rect_* helpers are used to + * compute the correct position into the source framebuffer. + */ + dst_line =3D DRM_RECT_INIT(current_plane->frame_info->dst.x1, y, + drm_rect_width(¤t_plane->frame_info->dst), + 1); + + drm_rect_fp_to_int(&tmp_src, ¤t_plane->frame_info->src); + + /* + * [1]: Clamping src_line to the crtc_x_limit to avoid writing outside of + * the destination buffer + */ + dst_line.x1 =3D max_t(int, dst_line.x1, 0); + dst_line.x2 =3D min_t(int, dst_line.x2, crtc_x_limit); + /* The destination is completely outside of the crtc. */ + if (dst_line.x2 <=3D dst_line.x1) + return; + + src_line =3D dst_line; + + /* + * Transform the coordinate x/y from the crtc to coordinates into + * coordinates for the src buffer. + * + * - Cancel the offset of the dst buffer. + * - Invert the rotation. This assumes that + * dst =3D drm_rect_rotate(src, rotation) (dst and src have the + * same size, but can be rotated). + * - Apply the offset of the source rectangle to the coordinate. + */ + drm_rect_translate(&src_line, -current_plane->frame_info->dst.x1, + -current_plane->frame_info->dst.y1); + drm_rect_rotate_inv(&src_line, drm_rect_width(&tmp_src), + drm_rect_height(&tmp_src), + current_plane->frame_info->rotation); + drm_rect_translate(&src_line, tmp_src.x1, tmp_src.y1); + + /* Get the correct reading direction in the source buffer. */ + + enum pixel_read_direction direction =3D + direction_for_rotation(current_plane->frame_info->rotation); + + /* [2]: Compute and clamp the number of pixel to read */ + clamp_line_coordinates(direction, current_plane, &src_line, &src_x_start,= &src_y_start, + &dst_x_start, &pixel_count); + + if (pixel_count <=3D 0) { + /* Nothing to read, so avoid multiple function calls */ + return; + } + + /* + * Modify the starting point to take in account the rotation + * + * src_line is the top-left corner, so when reading READ_RIGHT_TO_LEFT or + * READ_BOTTOM_TO_TOP, it must be changed to the top-right/bottom-left + * corner. + */ + if (direction =3D=3D READ_RIGHT_TO_LEFT) { + // src_x_start is now the right point + src_x_start +=3D pixel_count - 1; + } else if (direction =3D=3D READ_BOTTOM_TO_TOP) { + // src_y_start is now the bottom point + src_y_start +=3D pixel_count - 1; + } + + /* + * Perform the conversion and the blending + * + * Here we know that the read line (x_start, y_start, pixel_count) is + * inside the source buffer [2] and we don't write outside the stage + * buffer [1]. + */ + current_plane->pixel_read_line(current_plane, src_x_start, src_y_start, d= irection, + pixel_count, &stage_buffer->pixels[dst_x_start]); + + pre_mul_alpha_blend(stage_buffer, output_buffer, + dst_x_start, pixel_count); +} + /** * blend - blend the pixels from all planes and compute crc * @wb: The writeback frame buffer metadata @@ -223,34 +372,25 @@ static void blend(struct vkms_writeback_job *wb, { struct vkms_plane_state **plane =3D crtc_state->active_planes; u32 n_active_planes =3D crtc_state->num_active_planes; - int y_pos, x_dst, pixel_count; =20 const struct pixel_argb_u16 background_color =3D { .a =3D 0xffff }; =20 - size_t crtc_y_limit =3D crtc_state->base.crtc->mode.vdisplay; + int crtc_y_limit =3D crtc_state->base.crtc->mode.vdisplay; + int crtc_x_limit =3D crtc_state->base.crtc->mode.hdisplay; =20 /* * The planes are composed line-by-line to avoid heavy memory usage. It i= s a necessary * complexity to avoid poor blending performance. * - * The function vkms_compose_row() is used to read a line, pixel-by-pixel= , into the staging - * buffer. + * The function pixel_read_line callback is used to read a line, using an= efficient + * algorithm for a specific format, into the staging buffer. */ - for (size_t y =3D 0; y < crtc_y_limit; y++) { + for (int y =3D 0; y < crtc_y_limit; y++) { fill_background(&background_color, output_buffer); =20 /* The active planes are composed associatively in z-order. */ for (size_t i =3D 0; i < n_active_planes; i++) { - x_dst =3D plane[i]->frame_info->dst.x1; - pixel_count =3D min_t(int, drm_rect_width(&plane[i]->frame_info->dst), - (int)stage_buffer->n_pixels); - y_pos =3D get_y_pos(plane[i]->frame_info, y); - - if (!check_limit(plane[i]->frame_info, y_pos)) - continue; - - vkms_compose_row(stage_buffer, plane[i], y_pos); - pre_mul_alpha_blend(stage_buffer, output_buffer, x_dst, pixel_count); + blend_line(plane[i], y, crtc_x_limit, stage_buffer, output_buffer); } =20 apply_lut(crtc_state, output_buffer); @@ -258,7 +398,7 @@ static void blend(struct vkms_writeback_job *wb, *crc32 =3D crc32_le(*crc32, (void *)output_buffer->pixels, row_size); =20 if (wb) - vkms_writeback_row(wb, output_buffer, y_pos); + vkms_writeback_row(wb, output_buffer, y); } } =20 @@ -269,7 +409,7 @@ static int check_format_funcs(struct vkms_crtc_state *c= rtc_state, u32 n_active_planes =3D crtc_state->num_active_planes; =20 for (size_t i =3D 0; i < n_active_planes; i++) - if (!planes[i]->pixel_read) + if (!planes[i]->pixel_read_line) return -1; =20 if (active_wb && !active_wb->pixel_write) diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_dr= v.h index 777b7bd91f27..067a4797f7a0 100644 --- a/drivers/gpu/drm/vkms/vkms_drv.h +++ b/drivers/gpu/drm/vkms/vkms_drv.h @@ -39,7 +39,6 @@ struct vkms_frame_info { struct drm_framebuffer *fb; struct drm_rect src, dst; - struct drm_rect rotated; struct iosys_map map[DRM_FORMAT_MAX_PLANES]; unsigned int rotation; }; @@ -80,26 +79,38 @@ enum pixel_read_direction { READ_LEFT_TO_RIGHT }; =20 +struct vkms_plane_state; + /** - * typedef pixel_read_t - These functions are used to read a pixel in the = source frame, + * typedef pixel_read_line_t - These functions are used to read a pixel li= ne in the source frame, * convert it to `struct pixel_argb_u16` and write it to @out_pixel. * - * @in_pixel: pointer to the pixel to read - * @out_pixel: pointer to write the converted pixel + * @plane: plane used as source for the pixel value + * @x_start: X (width) coordinate of the first pixel to copy. The caller m= ust ensure that x_start + * is non-negative and smaller than @plane->frame_info->fb->width. + * @y_start: Y (height) coordinate of the first pixel to copy. The caller = must ensure that y_start + * is non-negative and smaller than @plane->frame_info->fb->height. + * @direction: direction to use for the copy, starting at @x_start/@y_start + * @count: number of pixels to copy + * @out_pixel: pointer where to write the pixel values. They will be writt= en from @out_pixel[0] + * (included) to @out_pixel[@count] (excluded). The caller must ensure tha= t out_pixel have a + * length of at least @count. */ -typedef void (*pixel_read_t)(const u8 *in_pixel, struct pixel_argb_u16 *ou= t_pixel); +typedef void (*pixel_read_line_t)(const struct vkms_plane_state *plane, in= t x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]); =20 /** * struct vkms_plane_state - Driver specific plane state * @base: base plane state * @frame_info: data required for composing computation - * @pixel_read: function to read a pixel in this plane. The creator of a s= truct vkms_plane_state - * must ensure that this pointer is valid + * @pixel_read_line: function to read a pixel line in this plane. The crea= tor of a + * struct vkms_plane_state must ensure that this pointer is valid */ struct vkms_plane_state { struct drm_shadow_plane_state base; struct vkms_frame_info *frame_info; - pixel_read_t pixel_read; + pixel_read_line_t pixel_read_line; }; =20 struct vkms_plane { @@ -265,7 +276,6 @@ int vkms_verify_crc_source(struct drm_crtc *crtc, const= char *source_name, /* Composer Support */ void vkms_composer_worker(struct work_struct *work); void vkms_set_composer(struct vkms_output *out, bool enabled); -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_= state *plane, int y); void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_b= uffer *src_buffer, int y); =20 /* Writeback */ diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkm= s_formats.c index d0e7dfc1f0d3..0f6678420a11 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.c +++ b/drivers/gpu/drm/vkms/vkms_formats.c @@ -140,83 +140,51 @@ static void packed_pixels_addr_1x1(const struct vkms_= frame_info *frame_info, *addr =3D (u8 *)frame_info->map[0].vaddr + offset; } =20 -static void *get_packed_src_addr(const struct vkms_frame_info *frame_info,= int y, - int plane_index) -{ - int x_src =3D frame_info->src.x1 >> 16; - int y_src =3D y - frame_info->rotated.y1 + (frame_info->src.y1 >> 16); - u8 *addr; - int rem_x, rem_y; - - WARN_ONCE(drm_format_info_block_width(frame_info->fb->format, plane_index= ) !=3D 1, - "%s() only support formats with block_w =3D=3D 1", __func__); - WARN_ONCE(drm_format_info_block_height(frame_info->fb->format, plane_inde= x) !=3D 1, - "%s() only support formats with block_h =3D=3D 1", __func__); - - packed_pixels_addr(frame_info, x_src, y_src, plane_index, &addr, &rem_x, = &rem_y); - - return addr; -} - -static int get_x_position(const struct vkms_frame_info *frame_info, int li= mit, int x) -{ - if (frame_info->rotation & (DRM_MODE_REFLECT_X | DRM_MODE_ROTATE_270)) - return limit - x - 1; - return x; -} - /* - * The following functions take pixel data from the buffer and convert the= m to the format - * ARGB16161616 in @out_pixel. + * The following functions take pixel data (a, r, g, b, pixel, ...) and co= nvert them to + * &struct pixel_argb_u16 * - * They are used in the vkms_compose_row() function to handle multiple for= mats. + * They are used in the `read_line`s functions to avoid duplicate work for= some pixel formats. */ =20 -static void ARGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16= *out_pixel) +static struct pixel_argb_u16 argb_u16_from_u8888(u8 a, u8 r, u8 g, u8 b) { + struct pixel_argb_u16 out_pixel; /* * The 257 is the "conversion ratio". This number is obtained by the * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get * the best color value in a pixel format with more possibilities. * A similar idea applies to others RGB color conversions. */ - out_pixel->a =3D (u16)in_pixel[3] * 257; - out_pixel->r =3D (u16)in_pixel[2] * 257; - out_pixel->g =3D (u16)in_pixel[1] * 257; - out_pixel->b =3D (u16)in_pixel[0] * 257; -} + out_pixel.a =3D (u16)a * 257; + out_pixel.r =3D (u16)r * 257; + out_pixel.g =3D (u16)g * 257; + out_pixel.b =3D (u16)b * 257; =20 -static void XRGB8888_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16= *out_pixel) -{ - out_pixel->a =3D (u16)0xffff; - out_pixel->r =3D (u16)in_pixel[2] * 257; - out_pixel->g =3D (u16)in_pixel[1] * 257; - out_pixel->b =3D (u16)in_pixel[0] * 257; + return out_pixel; } =20 -static void ARGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb= _u16 *out_pixel) +static struct pixel_argb_u16 argb_u16_from_u16161616(u16 a, u16 r, u16 g, = u16 b) { - __le16 *pixel =3D (__le16 *)in_pixel; + struct pixel_argb_u16 out_pixel; + + out_pixel.a =3D a; + out_pixel.r =3D r; + out_pixel.g =3D g; + out_pixel.b =3D b; =20 - out_pixel->a =3D le16_to_cpu(pixel[3]); - out_pixel->r =3D le16_to_cpu(pixel[2]); - out_pixel->g =3D le16_to_cpu(pixel[1]); - out_pixel->b =3D le16_to_cpu(pixel[0]); + return out_pixel; } =20 -static void XRGB16161616_to_argb_u16(const u8 *in_pixel, struct pixel_argb= _u16 *out_pixel) +static struct pixel_argb_u16 argb_u16_from_le16161616(__le16 a, __le16 r, = __le16 g, __le16 b) { - __le16 *pixel =3D (__le16 *)in_pixel; - - out_pixel->a =3D (u16)0xffff; - out_pixel->r =3D le16_to_cpu(pixel[2]); - out_pixel->g =3D le16_to_cpu(pixel[1]); - out_pixel->b =3D le16_to_cpu(pixel[0]); + return argb_u16_from_u16161616(le16_to_cpu(a), le16_to_cpu(r), le16_to_cp= u(g), + le16_to_cpu(b)); } =20 -static void RGB565_to_argb_u16(const u8 *in_pixel, struct pixel_argb_u16 *= out_pixel) +static struct pixel_argb_u16 argb_u16_from_RGB565(const __le16 *pixel) { - __le16 *pixel =3D (__le16 *)in_pixel; + struct pixel_argb_u16 out_pixel; =20 s64 fp_rb_ratio =3D drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(31)); s64 fp_g_ratio =3D drm_fixp_div(drm_int2fixp(65535), drm_int2fixp(63)); @@ -226,40 +194,120 @@ static void RGB565_to_argb_u16(const u8 *in_pixel, s= truct pixel_argb_u16 *out_pi s64 fp_g =3D drm_int2fixp((rgb_565 >> 5) & 0x3f); s64 fp_b =3D drm_int2fixp(rgb_565 & 0x1f); =20 - out_pixel->a =3D (u16)0xffff; - out_pixel->r =3D drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio)); - out_pixel->g =3D drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio)); - out_pixel->b =3D drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio)); + out_pixel.a =3D (u16)0xffff; + out_pixel.r =3D drm_fixp2int_round(drm_fixp_mul(fp_r, fp_rb_ratio)); + out_pixel.g =3D drm_fixp2int_round(drm_fixp_mul(fp_g, fp_g_ratio)); + out_pixel.b =3D drm_fixp2int_round(drm_fixp_mul(fp_b, fp_rb_ratio)); + + return out_pixel; } =20 -/** - * vkms_compose_row - compose a single row of a plane - * @stage_buffer: output line with the composed pixels - * @plane: state of the plane that is being composed - * @y: y coordinate of the row +/* + * The following functions are read_line function for each pixel format su= pported by VKMS. + * + * They read a line starting at the point @x_start,@y_start following the = @direction. The result + * is stored in @out_pixel and in the format ARGB16161616. + * + * These functions are very repetitive, but the innermost pixel loops must= be kept inside these + * functions for performance reasons. Some benchmarking was done in [1] wh= ere having the innermost + * loop factored out of these functions showed a slowdown by a factor of t= hree. * - * This function composes a single row of a plane. It gets the source pixe= ls - * through the y coordinate (see get_packed_src_addr()) and goes linearly - * through the source pixel, reading the pixels and converting it to - * ARGB16161616 (see the pixel_read() callback). For rotate-90 and rotate-= 270, - * the source pixels are not traversed linearly. The source pixels are que= ried - * on each iteration in order to traverse the pixels vertically. + * [1]: https://lore.kernel.org/dri-devel/d258c8dc-78e9-4509-9037-a98f7f33= b3a3@riseup.net/ */ -void vkms_compose_row(struct line_buffer *stage_buffer, struct vkms_plane_= state *plane, int y) + +static void ARGB8888_read_line(const struct vkms_plane_state *plane, int x= _start, int y_start, + enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) { - struct pixel_argb_u16 *out_pixels =3D stage_buffer->pixels; - struct vkms_frame_info *frame_info =3D plane->frame_info; - u8 *src_pixels =3D get_packed_src_addr(frame_info, y, 0); - int limit =3D min_t(size_t, drm_rect_width(&frame_info->dst), stage_buffe= r->n_pixels); + struct pixel_argb_u16 *end =3D out_pixel + count; + u8 *src_pixels; + + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixel= s); + + int step =3D get_block_step_bytes(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + u8 *px =3D (u8 *)src_pixels; + *out_pixel =3D argb_u16_from_u8888(px[3], px[2], px[1], px[0]); + out_pixel +=3D 1; + src_pixels +=3D step; + } +} + +static void XRGB8888_read_line(const struct vkms_plane_state *plane, int x= _start, int y_start, + enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end =3D out_pixel + count; + u8 *src_pixels; + + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixel= s); + + int step =3D get_block_step_bytes(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + u8 *px =3D (u8 *)src_pixels; + *out_pixel =3D argb_u16_from_u8888(255, px[2], px[1], px[0]); + out_pixel +=3D 1; + src_pixels +=3D step; + } +} + +static void ARGB16161616_read_line(const struct vkms_plane_state *plane, i= nt x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end =3D out_pixel + count; + u8 *src_pixels; + + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixel= s); + + int step =3D get_block_step_bytes(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + u16 *px =3D (u16 *)src_pixels; + *out_pixel =3D argb_u16_from_u16161616(px[3], px[2], px[1], px[0]); + out_pixel +=3D 1; + src_pixels +=3D step; + } +} + +static void XRGB16161616_read_line(const struct vkms_plane_state *plane, i= nt x_start, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end =3D out_pixel + count; + u8 *src_pixels; + + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixel= s); + + int step =3D get_block_step_bytes(plane->frame_info->fb, direction, 0); + + while (out_pixel < end) { + __le16 *px =3D (__le16 *)src_pixels; + *out_pixel =3D argb_u16_from_le16161616(cpu_to_le16(0xFFFF), px[2], px[1= ], px[0]); + out_pixel +=3D 1; + src_pixels +=3D step; + } +} + +static void RGB565_read_line(const struct vkms_plane_state *plane, int x_s= tart, + int y_start, enum pixel_read_direction direction, int count, + struct pixel_argb_u16 out_pixel[]) +{ + struct pixel_argb_u16 *end =3D out_pixel + count; + u8 *src_pixels; + + packed_pixels_addr_1x1(plane->frame_info, x_start, y_start, 0, &src_pixel= s); =20 - for (size_t x =3D 0; x < limit; x++, src_pixels +=3D frame_info->fb->form= at->cpp[0]) { - int x_pos =3D get_x_position(frame_info, limit, x); + int step =3D get_block_step_bytes(plane->frame_info->fb, direction, 0); =20 - if (drm_rotation_90_or_270(frame_info->rotation)) - src_pixels =3D get_packed_src_addr(frame_info, x + frame_info->rotated.= y1, 0) - + frame_info->fb->format->cpp[0] * y; + while (out_pixel < end) { + __le16 *px =3D (__le16 *)src_pixels; =20 - plane->pixel_read(src_pixels, &out_pixels[x_pos]); + *out_pixel =3D argb_u16_from_RGB565(px); + out_pixel +=3D 1; + src_pixels +=3D step; } } =20 @@ -359,25 +407,25 @@ void vkms_writeback_row(struct vkms_writeback_job *wb, } =20 /** - * get_pixel_read_function() - Retrieve the correct read_pixel function fo= r a specific + * get_pixel_read_line_function() - Retrieve the correct read_line functio= n for a specific * format. The returned pointer is NULL for unsupported pixel formats. The= caller must ensure that * the pointer is valid before using it in a vkms_plane_state. * * @format: DRM_FORMAT_* value for which to obtain a conversion function (= see [drm_fourcc.h]) */ -pixel_read_t get_pixel_read_function(u32 format) +pixel_read_line_t get_pixel_read_line_function(u32 format) { switch (format) { case DRM_FORMAT_ARGB8888: - return &ARGB8888_to_argb_u16; + return &ARGB8888_read_line; case DRM_FORMAT_XRGB8888: - return &XRGB8888_to_argb_u16; + return &XRGB8888_read_line; case DRM_FORMAT_ARGB16161616: - return &ARGB16161616_to_argb_u16; + return &ARGB16161616_read_line; case DRM_FORMAT_XRGB16161616: - return &XRGB16161616_to_argb_u16; + return &XRGB16161616_read_line; case DRM_FORMAT_RGB565: - return &RGB565_to_argb_u16; + return &RGB565_read_line; default: /* * This is a bug in vkms_plane_atomic_check(). All the supported diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkm= s_formats.h index 3ecea4563254..8d2bef95ff79 100644 --- a/drivers/gpu/drm/vkms/vkms_formats.h +++ b/drivers/gpu/drm/vkms/vkms_formats.h @@ -5,7 +5,7 @@ =20 #include "vkms_drv.h" =20 -pixel_read_t get_pixel_read_function(u32 format); +pixel_read_line_t get_pixel_read_line_function(u32 format); =20 pixel_write_t get_pixel_write_function(u32 format); =20 diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_= plane.c index 10e9b23dab28..8875bed76410 100644 --- a/drivers/gpu/drm/vkms/vkms_plane.c +++ b/drivers/gpu/drm/vkms/vkms_plane.c @@ -112,7 +112,6 @@ static void vkms_plane_atomic_update(struct drm_plane *= plane, frame_info =3D vkms_plane_state->frame_info; memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect)); memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect)); - memcpy(&frame_info->rotated, &new_state->dst, sizeof(struct drm_rect)); frame_info->fb =3D fb; memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->ma= p)); drm_framebuffer_get(frame_info->fb); @@ -122,10 +121,8 @@ static void vkms_plane_atomic_update(struct drm_plane = *plane, DRM_MODE_REFLECT_X | DRM_MODE_REFLECT_Y); =20 - drm_rect_rotate(&frame_info->rotated, drm_rect_width(&frame_info->rotated= ), - drm_rect_height(&frame_info->rotated), frame_info->rotation); =20 - vkms_plane_state->pixel_read =3D get_pixel_read_function(fmt); + vkms_plane_state->pixel_read_line =3D get_pixel_read_line_function(fmt); } =20 static int vkms_plane_atomic_check(struct drm_plane *plane, --=20 2.46.2