Qt slow to connect to PVR using Weston

Hi, I’m wondering if anyone else has this problem; I suspect it’s in Qt, but any tips would be appreciated. This has been reproduced in Qt 6.4.2, 6.8.3, and 6.9.0. I’m sticking with 6.8.3 for now because I can confirm that RHI support is working (WAYLAND_DEBUG shows calls to the wl_drm buffer, Mesa debug calls use PVR, and the application is registered in PVRTune as using the PVR libraries).

Weston with EGL is working fine. I am working on a fairly large screen (1280x720) with an older processor, so fullscreen weston-simple-egl runs at around 30 FPS and utilizes the SGX 530v5 pretty evenly (only 10 FPS when PVRTune is running).

On the other hand, even the simplest Qt apps seem to have problems. I have QT_WIDGETS_RHI_BACKEND=opengl set, and the UI is slow to respond (about half a second when PVRTune isn’t running). The first spike seen in PVRTune seems to correspond with the input time, and the second with the GUI response time.

The test here is the fullscreen QTBase calculator widget.

I do see that there is still a wl_shm pool being created at the beginning, so maybe that connection is being used to share data rather than the DRM memory?
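One way to check which path the buffers actually take is to capture a WAYLAND_DEBUG=1 log and count buffer-creation requests per interface. The helper below is only a rough sketch: the function name count_buffer_paths is made up for illustration, and it assumes the usual libwayland debug format in which outgoing requests appear as "-> interface@id.request(...)". Note that a small wl_shm pool at startup may simply be the cursor theme being loaded, so its presence alone does not prove that window contents go through shared memory.

```shell
#!/bin/sh
# Hypothetical helper: count buffer-creation requests in a WAYLAND_DEBUG log,
# grouped by the interface that created them.
# Usage: count_buffer_paths /path/to/wayland-debug.log
count_buffer_paths() {
    log="$1"
    # Outgoing requests are marked with "->"; object ids follow "@"
    # ("#" is also accepted in case the libwayland in use prints that instead).
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    shm=$(grep -c -E -e '-> wl_shm_pool[@#][0-9]+\.create_buffer\(' "$log" || true)
    drm=$(grep -c -E -e '-> wl_drm[@#][0-9]+\.create_(prime_|planar_)?buffer\(' "$log" || true)
    dmabuf=$(grep -c -E -e '-> zwp_linux_buffer_params_v1[@#][0-9]+\.create(_immed)?\(' "$log" || true)
    echo "wl_shm=$shm wl_drm=$drm linux_dmabuf=$dmabuf"
}
```

WAYLAND_DEBUG output goes to stderr, so a log can be captured by redirecting stderr of the application to a file and passing that file to the function. If the wl_shm count keeps growing while interacting with the window, the widget contents are probably being pushed through shared memory.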

This looks to me like we aren’t properly utilizing the PowerVR for rendering in this case, so any tips would be appreciated. Thanks!

Also, my configuration is as follows. I’ve tested the different pixel formats, and I can see that RHI is using the PowerVR renderer via the output:

qt.rhi.general: OpenGL VENDOR: Imagination Technologies RENDERER: PowerVR SGX 530 VERSION: OpenGL ES 2.0 build 1.17@4948957

# ./sgx_check.sh
WSEGL settings
[default]
WindowSystem=libpvrDRMWSEGL.so
DefaultPixelFormat=RGB888
#DefaultPixelFormat=RGB565

------
ARM CPU information
processor       : 0
model name      : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 597.60
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc08
CPU revision    : 2

Hardware        : Generic OMAP36xx (Flattened Device Tree)
Revision        : 0000
Serial          : 0000000000000000
------
SGX driver information
Version SGX_DDK sgxddk 1.17@4948957 (release) dm37xx_linux
System Version String: SGX revision = 125
------
Framebuffer settings

mode "1280x720"
    geometry 1280 720 1280 720 32
    timings 0 0 0 0 0 0 0
    accel true
    rgba 8/16,8/8,8/0,0/0
endmode

Frame buffer device information:
    Name        : omapdrmdrmfb
    Address     : (nil)
    Size        : 3686400
    Type        : PACKED PIXELS
    Visual      : TRUECOLOR
    XPanStep    : 1
    YPanStep    : 1
    YWrapStep   : 0
    LineLength  : 5120
    Accelerator : No
------
Rotation settings
0
------
PVR Module information
Module                  Size  Used by
pvrsrvkm              393216  2
------
Boot settings
console=ttyO0,115200n8 rootwait=1 rw ubi.mtd=7,512 rootfstype=ubifs root=ubi0:compu-XXXX mtdoops.mtddev=omap2.nand earlyprintk=ttyO0,115200n8 nohlt omapfb.rotate=0  vram=40M omapfb.vram=20M,1:1M,2:1M omapfb.vrfb=y cma=64MB 5
------
Linux Kernel version
Linux compu-XXXX 5.10.168-1-ctx-g991c5ce91e #1 SMP PREEMPT Fri Apr 7 09:34:04 UTC 2023 armv7l GNU/Linux
------
Weston.ini
[core]
require-input=false
idle-timeout=0
gbm-format=xrgb8888
#gbm-format=rgb565

[output]
name=DPI-1

[libinput]
touchscreen_calibrator=true
calibration_helper=/bin/echo

[shell]
locking=false
animation=none
panel-position=none
close-animation=none
startup-animation=none
focus-animation=none
------
/etc/profile.d/qt_env.sh
#!/bin/sh

### QT Environment Variables ###
# export QT_QPA_EVDEV_TOUCHSCREEN_PARAMETERS="rotate=180"
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
export QT_QPA_EGLFS_KMS_CONFIG=/etc/qt6/eglfs_kms_cfg.json
#export QT_QPA_EGLFS_INTEGRATION=eglfs_kms
export QT_QPA_EGLFS_ALWAYS_SET_MODE=1
export QT_WAYLAND_SHELL_INTEGRATION=xdg-shell

# SECCOMP-BPF Sandbox does not work due to unexpected FUTEX_UNLOCK_PI call
# from the pthread implementation. Disable this feature temporarily until
# those issues are resolved.
export QTWEBENGINE_CHROMIUM_FLAGS="--disable-seccomp-filter-sandbox"

export QT_QPA_EGLFS_INTEGRATION=none
export QSG_RHI_PREFER_SOFTWARE_RENDERER=0
export QT_WIDGETS_RHI_BACKEND=opengl
export QT_WIDGETS_HIGHDPI_DOWNSCALE=1
export QT_WIDGETS_RHI=1
export QT_OPENGL_NO_SANITY_CHECK=1


export QT_QPA_PLATFORM="wayland-egl"
export QT_WAYLAND_CLIENT_BUFFER_INTEGRATION="linux-dmabuf-unstable-v1"
export QT_WAYLAND_HARDWARE_INTEGRATION="linux-dmabuf-unstable-v1"
export QT_WAYLAND_SERVER_BUFFER_INTEGRATION="linux-dmabuf-unstable-v1"
export QT_WAYLAND_SHELL_INTEGRATION="xdg-shell"
export QT_WAYLAND_TEXT_INPUT_PROTOCOL="zwp_text_input_v1"

Note that I have tried swapping out all of the QT_WAYLAND parameters; I think these are the fastest.
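Since the script above exports QT_WAYLAND_SHELL_INTEGRATION twice, and the last assignment is the one that takes effect, it can help to list any variable that is exported more than once before comparing settings. This is a minimal sketch; duplicate_exports is a hypothetical name, and it only recognizes plain "export NAME=value" lines.

```shell
#!/bin/sh
# Hypothetical helper: print every variable name that is exported more than
# once in a profile script. Commented-out lines are ignored.
# Usage: duplicate_exports /etc/profile.d/qt_env.sh
duplicate_exports() {
    # Keep only active "export NAME=..." lines, reduce each to NAME,
    # then print the names that occur more than once.
    sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" |
        sort | uniq -d
}
```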

Sway is much faster, with no noticeable delays. The Qt fancy compositor is marginally faster.

EDIT: This is because we are NOT utilizing the GPU in sway in this case, so 2D applications are much faster.

I did a larger analysis of the performance in the comments here: QtWayland compositor very bad performance | Qt Forum

Sway does not properly connect to EGL. Weston does, and I can see a Weston EGL test app connect, but the Qt → Compositor → Weston connection is very slow when the Qt window is larger than 640 px.

I can actually see this in PVRTune as an initial spike, likely as the compositor determines the window interaction information. Then there is a delay, presumably as Qt does its rendering, and then a series of spikes when the result is changed via the compositor and the buffer is swapped. It is also notable that when Qt has a change to make (say, pressing clear while there are numbers in the calculator line) there is a longer delay between the initial spike and the response than when there is no change to make. This is more noticeable with repeated presses: pressing “clear” 5 times resolves in ~5 seconds, while pressing a number 5 times resolves in ~14 seconds.
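To put numbers on the delay without relying on PVRTune, the same WAYLAND_DEBUG=1 log can be used to measure the time from each input event to the client’s next surface commit. The sketch below is hypothetical (input_to_commit_ms is a made-up name) and makes a few assumptions: the log timestamps are in milliseconds inside the leading brackets, input arrives as wl_pointer button or wl_touch down events, and any wl_surface commit counts as the response, including a cursor surface commit. A press and a release can each produce their own entry.

```shell
#!/bin/sh
# Hypothetical helper: for each pointer-button or touch-down event in a
# WAYLAND_DEBUG log, print the delay in ms until the next outgoing
# wl_surface.commit request.
# Usage: input_to_commit_ms /path/to/wayland-debug.log
input_to_commit_ms() {
    awk '
        # Pull the millisecond timestamp out of the leading "[ 1234.567]".
        function stamp(line,    s) {
            s = line
            sub(/^\[ */, "", s)
            sub(/\].*$/, "", s)
            return s + 0
        }
        # Input event: remember the first one seen since the last commit.
        /wl_pointer[@#][0-9]+\.button\(/ || /wl_touch[@#][0-9]+\.down\(/ {
            if (!pending) { t_in = stamp($0); pending = 1 }
            next
        }
        # First commit after a pending input: report the elapsed time.
        pending && /-> wl_surface[@#][0-9]+\.commit\(/ {
            printf "%.3f\n", stamp($0) - t_in
            pending = 0
        }
    ' "$1"
}
```

Comparing the printed delays for the “clear” case against the number-press case should show whether the extra time is spent before the client commits its new buffer or after it.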

I will also add this to my Qt debugging post, as I’m not 100% certain whether this comes from the interaction with the PowerVR EGL libraries or from Qt having rendering issues (Qt on EGLFS without a compositor is fast and does not have this problem). It’s not just due to a compositor connection or to window rendering interaction, because I can have two weston-simple-egl windows running at 40+ FPS alongside a smaller Qt window accepting input, and I still see pretty good GPU input (slower with PVRTune running).

I’ll be investigating why there is such a slowdown between Qt and the compositor, but I’m highly suspicious of the shared memory between them right now.

Hi nbaldy,

Thanks for your message,

The information provided seems to point towards possible Qt performance issues on the target hardware, which should be reported to Qt (as you are doing in QtWayland compositor very bad performance | Qt Forum). Hopefully you will get information from the Qt side.

Best regards,
Alejandro