Games in the GPU shaders


This is not a tutorial. I only describe my own way of building game logic, along with some useful tips and code.

Useful shader- and Shadertoy-related links and tips are in my previous blog post.


  1. Useful code for data saving/loading.

How it works:

Shadertoy's implementation of “buffer logic” lets a buffer read its own data from the previous frame (BufferA reads itself when it is bound as one of its iChannels).

Basic logic: in the buffer, initialize the data on frame 0, and on the following frames execute the game logic while reading the previously saved state.

The Image shader then displays the state of the logic as game elements and UI.

Creating a game in a GLSL shader is much the same as in pure C: the result is a very large block of linear, straightforward logic.
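A minimal sketch of this frame loop for Buffer A, assuming Buffer A is bound to its own iChannel0. The two-pixel layout and what each pixel stores are hypothetical, chosen only to show the pattern, and are not taken from my game:

```glsl
// Buffer A, with iChannel0 = Buffer A (assumption of this sketch).
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    ivec2 px = ivec2(fragCoord);

    // Only the first two pixels of row 0 store state; skip everything else.
    if (px.y > 0 || px.x > 1) { fragColor = vec4(0.); return; }

    // Frame 0: initialize the saved data.
    if (iFrame == 0) { fragColor = vec4(0.); return; }

    // Later frames: read what this pixel stored on the previous frame.
    vec4 prev = texelFetch(iChannel0, px, 0);

    if (px.x == 0) {
        // Hypothetical state: count the frames since init in .r
        prev.r += 1.;
    } else {
        // Hypothetical state: remember the mouse position while a button is down.
        if (iMouse.z > 0.) prev.rg = iMouse.xy;
    }
    fragColor = prev;
}
```

Each pixel runs the same shader, so the branch on px.x is what gives every stored value its own piece of logic.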

Example of the game in the shaders

This example shows how I use 28 pixels to save data (state of map and logic).

Pixels (each pixel is a vec4(r,g,b,a)):
0: game init status
1 to 26: used to store the 10x10 map, 25 pixels * 4 (r,g,b,a) values = 100
27: (r,g) player position, (b) direction of player rotation, (a) time the movement started
28: (r,g) bullet position, (b) direction of player rotation when the bullet was launched, (a) time of the bullet launch

The player and bullet positions are the indices of the tiles where they are currently located.
The timers are used to move the player or bullet to the next tile in its direction.
The Image shader animates the movement using the timer value.
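The timer-based animation can be sketched like this. It is only an illustration: the helper name, the MOVE_TIME value, and passing the direction as a vec2 step are assumptions of this sketch, not the code of my shader:

```glsl
// Hypothetical helper for the Image shader: the drawn position of an object
// that started moving from `tile` in direction `dir` at time `startTime`.
const float MOVE_TIME = 0.25; // seconds per tile, an assumed value

vec2 animatedPos(vec2 tile, vec2 dir, float startTime)
{
    // 0 at the start of the move, 1 when the next tile is reached.
    float t = clamp((iTime - startTime) / MOVE_TIME, 0., 1.);
    return tile + dir * t;
}
```

The buffer logic only stores the current tile and the start time; the in-between positions exist only in the Image shader.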

Each tile stores the ID of the tile (water/block/free/etc.), and the Image shader draws the graphics based on the tile ID.
I use the whole 32-bit value to save a single tile ID. In real use a single tile can contain much more information than just a tile ID, and in that case the unused part of the 32-bit value can be used. Look below for functions to store data.

Useful code for data saving/loading:

To read pixel data always use texelFetch.

Shadertoy buffers are GL_RGBA32F, 32 bits per value, so 128 bits of data can be saved per “thread” (pixel).
GLSL has the functions floatBitsToUint and uintBitsToFloat to convert float32 from/to uint32, so a large 32-bit uint can be stored in a single float.
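A short sketch of both points, assuming the data buffer is bound to iChannel0 and the state lives in row 0. The helper names are hypothetical:

```glsl
// Read one saved pixel from the buffer bound to iChannel0 (assumed binding).
vec4 loadPixel(int index)
{
    return texelFetch(iChannel0, ivec2(index, 0), 0);
}

// Store and restore a full 32-bit uint through one float channel.
float storeUint(uint v) { return uintBitsToFloat(v); }
uint  loadUint(float f) { return floatBitsToUint(f); }
```

One thing to test on your target GPUs: as far as I know, the GLSL specification leaves the result of uintBitsToFloat unspecified when the bit pattern encodes a NaN, so not every uint value is guaranteed to survive the round trip on every driver.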

To optimize data, store more than 4 values per pixel (thread).
These functions can be used:


uint in 0–255 range
float in -1.0 to 1.0 range

_udata32: pack/unpack four 8-bit uints to/from one 32-bit uint,
[0xF1][0xF2][0xF3][0xF4] to/from 0xF1F2F3F4
_float4x8: four 8-bit floats
_float3x10: three 10-bit floats
bits: an 8-bit int in the range 0–255 presented as bits
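A minimal sketch of what such functions can look like. The full function names and the exact rounding are assumptions of this sketch; only the _udata32 and _float4x8 suffixes and the byte order come from the description above:

```glsl
// Four 8-bit uints <-> one 32-bit uint.
// [0xF1][0xF2][0xF3][0xF4] <-> 0xF1F2F3F4
uint pack_udata32(uvec4 v)
{
    v &= 0xFFu; // keep only the low 8 bits of every component
    return (v.x << 24) | (v.y << 16) | (v.z << 8) | v.w;
}

uvec4 unpack_udata32(uint u)
{
    return uvec4(u >> 24, u >> 16, u >> 8, u) & 0xFFu;
}

// Four floats in -1.0..1.0, quantized to 8 bits each.
uint pack_float4x8(vec4 f)
{
    uvec4 q = uvec4(round((clamp(f, -1., 1.) * 0.5 + 0.5) * 255.));
    return pack_udata32(q);
}

vec4 unpack_float4x8(uint u)
{
    return vec4(unpack_udata32(u)) / 255. * 2. - 1.;
}
```

The packed uint then goes into one channel of the pixel with uintBitsToFloat, and comes back with floatBitsToUint on the next frame.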

Usage:

When game logic needs a large number of booleans to switch states, the data to save/load can be represented as bits, which allows saving 32 booleans per value and 128 booleans per pixel.

The same goes for other types of data. When integers do not need the full 32-bit range, they can be packed into a smaller range. The same applies to floats: timer-like values, for example, are better stored in at least 10 bits, because 8 bits gives a step that is far too large and is noticeable in smooth movement (you can see this in the preview shader).
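Both cases can be sketched as follows. The helper names are hypothetical. For values in -1.0 to 1.0, 8 bits gives a step of 2/255 (about 0.0078), while 10 bits gives 2/1023 (about 0.002):

```glsl
// 32 booleans in one uint; i is the bit index, 0..31.
bool getFlag(uint flags, int i)
{
    return ((flags >> uint(i)) & 1u) != 0u;
}

uint setFlag(uint flags, int i, bool on)
{
    uint mask = 1u << uint(i);
    return on ? (flags | mask) : (flags & ~mask);
}

// Three floats in -1.0..1.0, quantized to 10 bits each (30 of 32 bits used).
uint pack_float3x10(vec3 f)
{
    uvec3 q = uvec3(round((clamp(f, -1., 1.) * 0.5 + 0.5) * 1023.));
    return (q.x << 20) | (q.y << 10) | q.z;
}

vec3 unpack_float3x10(uint u)
{
    uvec3 q = uvec3(u >> 20, u >> 10, u) & 0x3FFu;
    return vec3(q) / 1023. * 2. - 1.;
}
```

A timer that does not naturally fit -1.0 to 1.0 has to be rescaled into that range before packing, which is a design choice left to the game logic.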

2D drawing/creating graphics:

Left: SDF antialiasing. Right: no antialiasing.


  1. With hardware derivatives:
    Filterable procedurals: square patterns.
    Nonsquare: concentric rings and isovalues. More information is on the Shadertoy Unofficial.
    Another way is to move the procedural pattern to its own buffer (BufA), generate mipmaps for it every frame, and use the texture function in the main buffer to draw graphics from the texture generated in that buffer.
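void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec4 tcol = vec4(0.);
    const int AA = 4; // supersampling: AA x AA sub-samples per pixel
    for (int mx = 0; mx < AA; mx++)
    for (int nx = 0; nx < AA; nx++)
    {
        vec2 o = vec2(float(mx), float(nx)) / float(AA) - 0.5; // sub-pixel offset
        tcol += render(fragCoord + o); // render() is a placeholder for your own scene function
    }
    fragColor = tcol / float(AA * AA); // average of all sub-samples
}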
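The hardware-derivative approach to SDF antialiasing can be sketched like this, assuming a simple circle SDF as a stand-in for your own shape. fwidth estimates how much the distance changes across one pixel, and that value is used as the width of the smoothed edge:

```glsl
float sdCircle(vec2 p, float r) { return length(p) - r; }

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec2 p = (fragCoord - 0.5 * iResolution.xy) / iResolution.y;
    float d = sdCircle(p, 0.3);

    // About one pixel in SDF units, from hardware derivatives.
    // The max() keeps the smoothstep edges from collapsing to the same value.
    float w = max(fwidth(d), 1e-5);

    // 1 inside the shape, 0 outside, with an edge about one pixel wide.
    float shape = 1. - smoothstep(-0.5 * w, 0.5 * w, d);
    fragColor = vec4(vec3(shape), 1.);
}
```

Unlike supersampling, this evaluates the SDF once per pixel, so it is usually the cheaper option for 2D shapes.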
Space ship-like 2D object created using SDF.

SDF Functions:

2D SDF functions and 3D SDF from Inigo Quilez.
hg_sdf — 3D SDF library.

A short YouTube video shows the creation of the shader in the screenshot.

User interface:

To detect UI clicks, the basic idea that is used everywhere is to call the SDF function that generates that part of your UI, but pass the mouse position instead of the UV:

float sdCircle( vec2 p, float r )
{
    return length(p) - r;
}

#define re iResolution
#define SS smoothstep

void mainImage( out vec4 c, in vec2 fc )
{
    vec2 res = 0.5*re.xy/re.y;
    vec2 p = fc/re.y - res;
    float d = SS(0., 1.5/re.y, sdCircle(p,0.1));

    // mouse position in the same proportions as uv
    vec2 im = iMouse.xy/re.y - res;
    // SDF evaluated at the mouse position
    float d_im = SS(0., 1.5/re.y, sdCircle(im,0.1));
    // mouse is inside the selected range
    if(d_im < 1.) d = 1.-d;
    c = vec4(d);
}

Game UI usually does not need many elements, so it has a very small performance impact.
The UI can also be drawn in its own buffer, so that the shader code size does not explode.

A complex UI example can be found in this shader — 2D Vector Graphics Library.

Display text using Font texture:

Shader template, ASCII character encoding. To display a number, do the same as in ASCII: add 48 to each digit, where the digit is in the range 0 to 9.
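A small sketch of the idea, assuming iChannel0 is bound to Shadertoy's font texture, that the texture is a 16x16 grid of glyphs in ASCII order, and that the glyph shape is in the red channel (that is how I understand the texture layout). The function names are hypothetical:

```glsl
// `uv` is the position inside the character cell, 0..1 on both axes.
float drawChar(int c, vec2 uv)
{
    // Nothing to draw outside of the cell.
    if (uv.x < 0. || uv.x > 1. || uv.y < 0. || uv.y > 1.) return 0.;
    // Column and row of the glyph in the 16x16 grid (row 0 is at the top).
    vec2 cell = vec2(float(c % 16), float(15 - c / 16));
    return texture(iChannel0, (uv + cell) / 16.).r;
}

// A single digit 0..9: add 48 (the ASCII code of '0') to get its character code.
float drawDigit(int n, vec2 uv)
{
    return drawChar(48 + n, uv);
}
```

For a multi-digit number, split it into digits with integer division and modulo, and call drawDigit once per digit with a shifted uv.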

3D Navigation:

I use this shader as a template.


Shadertoy audio texture explained here. Example audio shader.

Keyboard, Audio and other features full information:

Special Shadertoy features.

Optimization and debugging:

An unrolled version of my shader, 55k lines of unrolled code.

Performance optimization depends on the specific shader logic, and the topic is way too big to discuss here.

Optimize performance based on the multithreading concept. Do not put too much logic into a single “thread”; it is better to split large logic blocks across a few separate threads (pixels).

As for conditional logic, you can assume that it does not matter how many branches you have: modern GPUs are way too fast anyway.

WebGL shaders optimizations:

Web browsers use ANGLE to pre-compile shaders and translate them to the native graphics API layer (Metal/DX/Vulkan/OpenGL/etc).

ANGLE is a problem: it has lots of bugs and may generate completely broken shader code that may even crash the shader compiler in the driver.
Lots of useful information related to ANGLE can be found in Avoiding compiler crash and Compatibility issues in Shadertoy and webGLSL.

The most common ANGLE optimizations against code size explosion caused by unrolling:

  1. To prevent loop unrolling, write the loop as for (int i=0; i<N+min(0,iFrame); i++)
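In context, the trick looks like this. N and the loop body are placeholders for this sketch:

```glsl
const int N = 64; // hypothetical iteration count

float sumLoop(vec2 p)
{
    float acc = 0.;
    // min(0, iFrame) is always 0 at run time, but the compiler cannot know
    // that, so the loop bound is no longer a compile-time constant.
    for (int i = 0; i < N + min(0, iFrame); i++)
    {
        acc += sin(float(i) * p.x + p.y);
    }
    return acc;
}
```

The loop still runs exactly N times. Only the compiler's view of the bound changes, which is what stops it from unrolling the body N times into the generated code.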

OpenGL/Vulkan shaders optimization:

As in WebGL, const arrays are inlined at every use by some driver compilers.

Do not use #define for anything complex, because it will explode the code size or lead to unpredictable bugs.
An example of complex #define usage: Debug heatmap script.
It leads to these bugs in some compilers: Define bug.
Another example of a Define bug in the OpenGL GLSL compiler.

For full-screen postprocessing or pure logic shaders in buffers, having many buffers with short render times is better than a single buffer with one shader that does everything. Split the logic and the long-running rendering into parts.

Pre-calculated vs GPU real-time values.

Shaders and constant expression:

How a constant expression is evaluated depends on the GLSL compiler. In many cases the result is not equal to the expected value or to the GPU-side result.


  1. #define PI (4.0 * atan(1.)) or #define E exp(1.) may or may not be evaluated at compile time, and if it is, the result may not be the expected PI or E value.
    Define constant numbers as const values, not as constant expressions.
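For example, instead of relying on the compiler to evaluate the expression, write the literal value:

```glsl
// Fragile: may be folded at compile time or evaluated on the GPU,
// and the two results are not guaranteed to match.
// #define PI (4.0 * atan(1.))
// #define E  exp(1.)

// Predictable: plain literal constants.
const float PI = 3.141592653589793;
const float E  = 2.718281828459045;
```

The literals carry more digits than a float32 can hold, which is harmless: the compiler rounds them to the nearest representable value.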
Debug heatmap script


The most useful approach I have found is a minimal application that saves textures to a file or allows frame-by-frame debugging. I use Shadertoy only to debug WebGL shaders.

My Debug heatmap script may also be useful. It displays the number of function/logic calls per pixel, but in most cases that is obvious without a script.

I have not found a useful GLSL emulation on the CPU.
Chrome and Mesa software GLSL emulation have way too many bugs; in most cases they cannot even load my shaders.
Other emulators, for example SPIRV-VM, also have way too many bugs, and the result they calculate for a complex shader does not match the result on the GPU.

The edge of the Games in GPU:

Error in some GPUs on using a large shader.


Avoiding compiler crash and Compatibility issues in Shadertoy and webGLSL.


Hardware limitations — depends on GPU:

  1. A loop inside a loop inside another loop, with complex logic on every level.
    Shaders do have a complexity limit, and a shader that uses many nested loops may not work on many low-end GPUs, such as integrated or mobile GPUs.
    An example bug report of this case in Vulkan.

Games in the shaders, links:

From my shader-game Sgame.

My games in the shaders:

  1. Sgame (youtube video link): my first attempt at creating a game in a shader, with only the physics calculated on the CPU.
    Launch link. Warning: clicking this link may crash your browser.
    Shader source code.

Other games on the Shadertoy:

List of 151 Playable games in Shadertoy!

Export template:

I use only my own Vulkan-Shadertoy-Launcher (source code there).
I do not recommend OpenGL applications because there are way too many bugs in OpenGL.

For other export templates, look at my previous blog post Into Shadertoy and Shaders useful links and tips (Offline Shadertoy related applications section).

Thanks for reading!
