Float precision on GPU, bugs/features

Expected 123 everywhere, screenshot from ANGLE DX11, Nvidia https://www.shadertoy.com/view/tlfBRB

Contents:

  1. Reading float bits from a texture and losing float bits (first screenshot).
  2. Precompiled floats: CPU vs GPU.
  3. More about float bits and int/uint, float can be equal to 0. but

Reading float bits from a texture and losing float bits:

Screenshot above. Test shader: shadertoy link. Expected: 123 everywhere.

Numbers on the left side of the screenshot:

Blue number: the result of this operation (line 29 in the Shadertoy code):

uint value_2 = floatBitsToUint(uintBitsToFloat(uint(value+iZero))+fZero);

Result: either 0 or value, depending on the GPU and the shader language/API.
Reason: uintBitsToFloat(uint(value+iZero)) stores value as float bits, but the test uses the small uint value 123. Its exponent bits are all zero, so as a float it is a denormal (subnormal) number.
The sum float(<from uint 123>) + 0. is processed as a float operation by the GPU's float units, and GLSL allows denormals to be flushed to zero, so uint(0) can be the result.

Conclusion: adding floats to a float_bits value may lose those bits.

Red number: can be 0 or 123 depending on the CPU-side shader compiler (it can be buggy and produce 0).
Green number: should be 123 everywhere.
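The blue-number case can be reproduced on the CPU. The C sketch below is an illustration, not the shader's code: uint_bits_to_float and float_bits_to_uint stand in for the GLSL built-ins, and ftz/add_ftz are a hypothetical model of a float unit that flushes denormals to zero. It assumes the C build itself does not enable flush-to-zero (for example via -ffast-math).

```c
#include <stdint.h>
#include <string.h>
#include <math.h>

/* CPU-side stand-ins for GLSL uintBitsToFloat / floatBitsToUint. */
float uint_bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

uint32_t float_bits_to_uint(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical model of a GPU float unit that flushes denormal
   (subnormal) values to zero, keeping the sign. */
float ftz(float f) {
    return fpclassify(f) == FP_SUBNORMAL ? copysignf(0.0f, f) : f;
}

float add_ftz(float a, float b) {
    return ftz(ftz(a) + ftz(b));
}

/* value -> float bits -> "+ 0." -> bits again, with IEEE denormals. */
uint32_t roundtrip_ieee(uint32_t value) {
    return float_bits_to_uint(uint_bits_to_float(value) + 0.0f);
}

/* The same round trip on the hypothetical flush-to-zero unit. */
uint32_t roundtrip_ftz(uint32_t value) {
    return float_bits_to_uint(add_ftz(uint_bits_to_float(value), 0.0f));
}
```

With IEEE denormal support, roundtrip_ieee(123) keeps 123; roundtrip_ftz(123) returns 0, while a normal pattern such as 0x3F800000 (1.0) survives either way.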

Right side of the screenshot, texture reading:

Do not expect to read valid bits with texture or textureLod when the FBO texture filtering is set to Linear or Mipmap.
Only texelFetch returns valid bits.
texture and textureLod may return valid bits only when filtering is set to Nearest.
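A CPU-side sketch of why filtering destroys stored bits. Texture1D, fetch_bits and linear_bits are hypothetical names, and linear_bits models only the blend of two neighbouring texels, not real sampler addressing or mipmapping.

```c
#include <stdint.h>
#include <string.h>

float bits_as_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }
uint32_t float_as_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* A tiny 1D "texture" whose texels hold raw uint bits, e.g. object IDs. */
typedef struct {
    const uint32_t *texels;
    int width;
} Texture1D;

/* texelFetch-like: returns the stored bits untouched. */
uint32_t fetch_bits(Texture1D t, int i) {
    return t.texels[i];
}

/* Linear-filter-like: blends texels i and i+1 as floats with weight w
   in [0, 1], then reinterprets the blended float as bits. */
uint32_t linear_bits(Texture1D t, int i, float w) {
    float a = bits_as_float(t.texels[i]);
    float b = bits_as_float(t.texels[i + 1]);
    return float_as_bits(a * (1.0f - w) + b * w);
}
```

Blending the IDs 100 and 200 with weight 0.25 yields the bit pattern 125, an ID that was never stored. On a GPU that flushes denormals, such small bit patterns may even read back as 0.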

Precompiled floats:

https://www.shadertoy.com/view/wdXGW8 Nvidia

The results of functions such as trigonometric functions (sin, etc.), pow, sqrt and others computed on the CPU may not equal the results on the GPU.

Test shader: shadertoy link. Expected: the left side of the screenshot equals the right side.

On the left side, iTime is a constant 32., so the shader code is precalculated on the CPU.
On the right side, iTime is 32. + GPU(0), a zero known only at runtime, which forces the code to execute on the GPU.

First line on the screenshot: the result of a trigonometric, sin-based random. It is not the same on CPU and GPU.

Also remember that the result of a sin-based hash/random is not the same from GPU to GPU; only a uint-based hash/random gives the same result on every GPU and CPU.

sin, sqrt and float precision patterns: https://www.shadertoy.com/view/NsBBDW

Myths About Floating-Point Numbers by Adam Sawicki says this:

The reason random numbers are generated on NVIDIA cards and not on AMD is that sine instruction on AMD GPU architectures actually has period of 1, not 2*PI. But it is still fully deterministic in regards to input value. It just returns different results between different platforms.

The next lines, with results 1 or 2 and 243 or 242, differ because pow on the CPU is not equal to pow on the GPU.

Also remember:

Functions like smoothstep may follow the spec text literally when evaluated by the CPU-side shader compiler:

smoothstep returns 0.0 if x ≤ edge0 and 1.0 if x ≥ edge1.
Results are undefined if edge0 ≥ edge1.

Test shader: shadertoy link. The result of smoothstep(1., 0.9, 0.) is 0 on the CPU and 1 on the GPU.
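Both readings of smoothstep in C: smoothstep_formula is the clamp-and-Hermite formula from the GLSL spec, and smoothstep_literal is a hypothetical evaluator that applies the spec's "returns 0.0 if x ≤ edge0" sentence before the formula.

```c
/* The formula given in the GLSL spec:
   t = clamp((x - edge0) / (edge1 - edge0), 0, 1); t*t*(3 - 2*t). */
float smoothstep_formula(float edge0, float edge1, float x) {
    float t = (x - edge0) / (edge1 - edge0);
    t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t); /* clamp(t, 0., 1.) */
    return t * t * (3.0f - 2.0f * t);
}

/* Hypothetical evaluator that applies the spec's prose literally:
   0.0 if x <= edge0, 1.0 if x >= edge1, the formula otherwise. */
float smoothstep_literal(float edge0, float edge1, float x) {
    if (x <= edge0) return 0.0f;
    if (x >= edge1) return 1.0f;
    return smoothstep_formula(edge0, edge1, x);
}
```

For valid edges the two agree (both give 0.5 for smoothstep(0., 1., 0.5)). For smoothstep(1., 0.9, 0.) the formula computes t = 10, clamps it to 1 and returns 1, while the literal reading returns 0, matching the CPU/GPU split in the test.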

Linear interpolation depends on GPU/API:

https://www.shadertoy.com/view/ftXcW7 texture pixel linear interpolation

The shader compiler may use 32-bit or 64-bit floats to precompile static code:

https://www.shadertoy.com/view/sllXW8 Nvidia

Test shader: shadertoy link. Expected: 3248488448 (0xC1A00000, the bits of -20.0) and -20 everywhere.

First line on the screenshot: val0, the result precompiled on the CPU, 3248488448.
Second line: val1, with 0.3 changed from a constant to (0.3+min(iTime,0.)).
This changes the result in Vulkan to 3248488447.
Last line: val2, with 0.4 changed from a constant to (0.4+min(iTime,0.)).
This changes the result in OpenGL and Vulkan to 3248488447 or 3248488449.
These are the two floats one ulp away from -20.0 (about -19.999998 and -20.000002).

The Vulkan values may differ this much because of RelaxedPrecision, but that is only a guess.
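A minimal C illustration of why 32-bit versus 64-bit constant folding can change the last bits. The expression a*a + c and its inputs are hypothetical, not the test shader's code, and were picked so the difference is exact: a = 1 + 2^-12 and c = -(1 + 2^-11) are both exact floats.

```c
#include <math.h>

/* Every step rounded to 32-bit float, like runtime GPU math.
   The volatile store keeps the C compiler from fusing a*a + c
   into one fma or holding a*a at higher precision. */
float eval_float32(float a, float c) {
    volatile float prod = a * a; /* 1 + 2^-11 + 2^-24 rounds (ties-to-even) to 1 + 2^-11 */
    return prod + c;             /* cancels exactly: 0 */
}

/* Whole expression in 64-bit double, rounded to float once at the end,
   like a compiler that folds constants at higher precision. */
float eval_float64(float a, float c) {
    double prod = (double)a * (double)a; /* exact in double */
    return (float)(prod + (double)c);    /* 2^-24 */
}
```

Rounding every step to float gives 0, while evaluating in double and rounding once gives 2^-24. The same kind of single-rounding difference is enough to move a result one ulp.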

GPU precision never 0, but it can be 0:

https://www.shadertoy.com/view/ftXSWB Nvidia

Test shader: shadertoy link. Expected: 0 for each of val0 to val3.

In the shader and on the screenshot:

val_const is equal to val_dyn in its uint bit representation on the GPU.
But their difference may not be 0. on the GPU.

val2 = val_const - val_dyn; results in a very small positive value that may not be equal to 0. (see the testing example below).

val3 = val_dyn - val_const; results in a very small negative value that also may not be equal to 0. (see the testing example below).

The result of an operation is 0. only when the “source of the variable” is the same for every variable in the operation,
like GPU_value minus GPU_value.

Testing this behavior:

Open my test shader (shadertoy link).
Take this line of code and add it at the line number given in each test below:

if(val3==0.){fragColor=vec4(1.);return;}

First test:
Add the line at line 27
and press Compile on Shadertoy.
Result: (val3==0.) is true.

Second test:
Add the line at line 41
and press Compile on Shadertoy.
Result: (val3==0.) is false.

If val3 in this line is changed to val0 or val1, the result is true anywhere in the shader.
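One possible mechanism, an assumption not confirmed for the shader above: the compiler keeps val_const at higher precision (here a C double) inside the subtraction, while val_dyn has already been rounded to a 32-bit float. Pair and make_pair are hypothetical names used only for this sketch.

```c
#include <stdint.h>
#include <string.h>

uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }

/* Hypothetical model: val_const stays a 64-bit compile-time constant,
   val_dyn is the same number after being stored as a 32-bit float. */
typedef struct {
    double val_const;
    float val_dyn;
} Pair;

Pair make_pair(double literal) {
    Pair p = { literal, (float)literal };
    return p;
}

/* Bits compared after both sides are rounded to 32-bit float. */
int same_bits(Pair p) {
    return float_bits((float)p.val_const) == float_bits(p.val_dyn);
}

/* Subtractions done before the constant is rounded to float. */
double val2(Pair p) { return p.val_const - (double)p.val_dyn; }
double val3(Pair p) { return (double)p.val_dyn - p.val_const; }
```

With the literal 0.7 (whose float rounds down), the float bits match, yet val2 is about +1.2e-8 and val3 about -1.2e-8, the same signs as in the test. Subtracting val_dyn from itself, a value with a single "source", gives exactly 0.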

Undefined behavior on CPU and GPU:

The OpenGL and Vulkan GLSL specs say:

The resulting value is undefined if <condition>

Undefined does not mean the result is guaranteed to be NaN or INF; it can be anything.
This is also a source of errors and bugs, and it makes shader debugging much harder, because the results of operations are not the same on CPU and GPU.

I wrote this article as a note for myself.
I hope this is not completely useless info for you.

Also, I made a list of other GPU bugs: link to list of bugs.

Thanks for reading!

GLSL and usual coding

Danil
