NB: The build will fail until #338 lands, as it links to that chapter.
With special thanks to @spencer-lunarg
Co-authored-by: Spencer Fricke <115671160+spencer-lunarg@users.noreply.github.com>
SaschaWillems
left a comment
I don't have that much experience with embedded programming outside some experiments with Raspberry PIs and Android TVs, but this looks good to me. Only a few minor remarks.
Having a few more links inside the text (e.g. to spec chapters) would make this a bit easier to follow.
| * **Control Lists (CL)**: The VideoCore GPU doesn't use standard command buffers the way a desktop GPU does. Instead, the driver generates "Control Lists" that the hardware's V3D unit executes. |
| * **Contiguous Memory Allocator (CMA)**: On Linux, the GPU requires physically contiguous memory, which is managed by the kernel's CMA pool. If your application crashes or fails to allocate memory despite plenty of RAM being available, you may need to increase the CMA size in `/boot/config.txt`. |
| ** Example: `dtoverlay=vc4-kms-v3d,cma-512` reserves a 512MB CMA pool for the GPU. |
| * **Performance Tipping Points**: The `v3dv` driver is very efficient, but it has specific "tipping points" where it must flush the tile buffer to RAM (a "resolve"). To avoid this, ensure your render passes are structured to fit within the tile buffer limits (which vary based on the number of samples and the format of the attachments). |
Is there a way to query this limit? If so, can this be added?
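To make the CMA remark above concrete, here is a hedged sketch of the relevant `/boot/config.txt` entries (the 512MB figure is purely illustrative; the right pool size depends on your workload and total RAM):

```
# /boot/config.txt -- enable the KMS/V3D stack and reserve a 512MB CMA pool
# (512 is an example value, not a recommendation)
dtoverlay=vc4-kms-v3d,cma-512
```

After a reboot, on kernels built with CMA support, `grep Cma /proc/meminfo` (the `CmaTotal`/`CmaFree` lines) can confirm the pool that was actually reserved.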
| * **Memory Alignment**: Alignment requirements for certain resources (like `minStorageBufferOffsetAlignment`) can be much larger on embedded GPUs than on desktop counterparts. Always check the limits in `VkPhysicalDeviceProperties`. |
| * **Fragmented Memory**: In systems with long uptimes (like industrial controllers), memory fragmentation can lead to allocation failures even when "free" memory appears available. Reusing allocations or using a robust allocator like the Vulkan Memory Allocator (VMA) is highly recommended. |
| == The Direct-to-Display Workflow (VK_KHR_display) |
This should link to https://docs.vulkan.org/refpages/latest/refpages/source/VK_KHR_display.html somewhere
| * **Subgroup Operations**: Use subgroup operations (core in Vulkan 1.1, exposed in GLSL via `GL_KHR_shader_subgroup`) to share data between shader invocations. For example, if you need to calculate an average of pixels in a neighborhood, use subgroup arithmetic instead of writing to and reading from shared memory (`shared` variables). This keeps the data within the GPU's register file, saving significant power. |
| * **Reduced Precision**: Many embedded GPUs are roughly twice as fast at 16-bit arithmetic as at 32-bit. Use `VK_KHR_shader_float16_int8` to access half-precision types. On such hardware this not only doubles throughput but also reduces the number of registers the shader uses, allowing more workgroups to run in parallel. |
| * **Circular Display Optimization**: Since many smartwatches use circular displays within square memory buffers, the corners represent approximately 21.5% of the total area (the geometric difference between a square and its inscribed circle). While Vulkan renders to rectangular surfaces, you can use `discard` or `VK_EXT_discard_rectangles` (if supported) to avoid fragment processing in these non-visible regions, significantly reducing GPU ALU load and power consumption. |
Isn't discard considered expensive? Would using scissors instead be a more viable/faster option?
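As a sketch of the subgroup-arithmetic idea quoted above (GLSL; assumes a Vulkan 1.1 device, and the mapping of one neighborhood to one subgroup-sized workgroup is purely illustrative):

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : enable
#extension GL_KHR_shader_subgroup_arithmetic : enable

// Illustrative: workgroup size chosen to match an assumed subgroup size of 32.
layout(local_size_x = 32) in;

layout(std430, set = 0, binding = 0) buffer Pixels { float luma[]; };
layout(std430, set = 0, binding = 1) buffer Result { float avg[]; };

void main() {
    float v = luma[gl_GlobalInvocationID.x];
    // Sum across the subgroup entirely in registers -- no round trip
    // through shared memory.
    float sum = subgroupAdd(v);
    if (subgroupElect()) {
        avg[gl_WorkGroupID.x] = sum / float(gl_SubgroupSize);
    }
}
```

Note that real code should query `VkPhysicalDeviceSubgroupProperties::subgroupSize` rather than hard-coding 32.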
It looked like we were debating where to put generic embedded information in the TBR chapter. That made me realize we have a gap, so I'm attempting to fill it in.
NB: This should be accepted AFTER the TBR chapter in PR #338 lands.