A collection of (graphics) programming reference documents
Shaders are easy to understand at a high level, but in practice they involve many small details that are easy to overlook and that can lead to performance issues. Understanding how to write a proper shader and how to profile it is crucial for any performance-critical graphics application.
Shader compilation works quite differently from how a lot of people think. As with CPU code, every graphics card has its own set of instructions that can be executed on the GPU within a shader. This is often called the ISA (instruction set architecture). The ISA not only differs between IHVs (independent hardware vendors), it often even differs between architectures from the same vendor. For example, AMD’s GCN uses a different ISA than RDNA, though they are quite similar.
float4 PSMain(float4 color : COLOR) : SV_TARGET
{
    return color;
}
However, when compiling our shaders for PC or mobile, we often don’t know which hardware we are compiling for, or the ISA isn’t publicly available (not even to the compiler maintainers). As a result, our shaders get compiled to an intermediate language, such as DXIL or SPIR-V. For example, the open-source Microsoft DXC compiler can compile HLSL to either DXIL or SPIR-V. This compiled intermediate language is what is stored on disk in our binary blobs.
define void @PSMain() {
  %1 = call float @dx.op.loadInput.f32(i32 4, i32 0, i32 0, i8 0, i32 undef) ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %2 = call float @dx.op.loadInput.f32(i32 4, i32 0, i32 0, i8 1, i32 undef) ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %3 = call float @dx.op.loadInput.f32(i32 4, i32 0, i32 0, i8 2, i32 undef) ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  %4 = call float @dx.op.loadInput.f32(i32 4, i32 0, i32 0, i8 3, i32 undef) ; LoadInput(inputSigId,rowIndex,colIndex,gsVertexAxis)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 0, float %1) ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 1, float %2) ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 2, float %3) ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 3, float %4) ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  ret void
}
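The binary blobs stored on disk can usually be told apart by their first four bytes. The following is a minimal C++ sketch of that check, not part of any official tooling: `BlobKind` and `identifyBlob` are hypothetical names introduced here for illustration. It relies on two facts: DXIL is stored in a container that begins with the characters "DXBC", and a SPIR-V module begins with the magic number 0x07230203 (in either byte order).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classification of a shader blob loaded from disk.
enum class BlobKind { DxContainer, SpirV, Unknown };

// Hypothetical helper: classify a blob by looking at its first four bytes only.
BlobKind identifyBlob(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size < 4) {
        return BlobKind::Unknown;
    }

    // DXIL is wrapped in a container that starts with "DXBC".
    // Legacy DXBC bytecode uses the same header, so this check alone
    // cannot distinguish the two.
    if (data[0] == 'D' && data[1] == 'X' && data[2] == 'B' && data[3] == 'C') {
        return BlobKind::DxContainer;
    }

    // SPIR-V modules start with the 32-bit magic number 0x07230203.
    // The module may be stored in either byte order, so test both.
    const std::uint32_t b0 = data[0], b1 = data[1], b2 = data[2], b3 = data[3];
    const std::uint32_t littleEndian = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    const std::uint32_t bigEndian = (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    const std::uint32_t spirvMagic = 0x07230203u;
    if (littleEndian == spirvMagic || bigEndian == spirvMagic) {
        return BlobKind::SpirV;
    }

    return BlobKind::Unknown;
}
```

A check like this only inspects the header. It says nothing about whether the rest of the blob is valid, which is still up to the driver or a validator to decide.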
When we load our shaders at runtime, the graphics driver is responsible for compiling the intermediate-language shaders to the hardware-specific ISA. This is done behind the scenes, though you can often tell when it happens. Exactly when it happens differs between graphics APIs and even between drivers. In older graphics APIs, shaders are often only compiled when they are used for the first time during a draw. This is why a lot of older games used to do a bunch of dummy draws to “warm the shader cache”. In more modern graphics APIs, such as DirectX 12 or Vulkan, shader compilation often occurs when a pipeline is created.
s_mov_b32 m0, s2
v_interp_p1_f32 v2, v0, attr0.x
v_interp_p2_f32 v2, v1, attr0.x
v_interp_p1_f32 v3, v0, attr0.y
v_interp_p2_f32 v3, v1, attr0.y
v_interp_p1_f32 v4, v0, attr0.z
v_interp_p2_f32 v4, v1, attr0.z
v_interp_p1_f32 v0, v0, attr0.w
v_interp_p2_f32 v0, v1, attr0.w
v_cvt_pkrtz_f16_f32 v1, v2, v3
v_cvt_pkrtz_f16_f32 v0, v4, v0
exp mrt0, v1, v1, v0, v0 done compr vm
s_endpgm
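The difference between compiling on first draw and compiling at pipeline creation can be illustrated with a toy model. The C++ sketch below is not how any real driver works: `ToyDriver`, `compile`, `draw` and `createPipeline` are hypothetical names, and the "compiler" only counts how often it is invoked. It assumes the driver caches the compiled ISA per shader, which is what makes warming the cache with dummy draws effective.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy stand-in for a graphics driver. Shaders are identified by a string
// that represents their intermediate-language blob.
struct ToyDriver {
    int compileCount = 0;                                   // number of IL -> ISA compiles
    std::unordered_map<std::string, std::string> isaCache;  // IL -> fake "ISA"

    // Compile the IL to fake ISA unless it is already cached.
    // Returns true if an (expensive) compile actually happened.
    bool compile(const std::string& il) {
        assert(!il.empty());
        if (isaCache.find(il) != isaCache.end()) {
            return false;
        }
        ++compileCount;
        isaCache.emplace(il, "isa(" + il + ")");
        return true;
    }

    // Older-API style: the shader is compiled lazily on the first draw
    // that uses it. Returns true if this draw caused a compile (a hitch).
    bool draw(const std::string& il) { return compile(il); }

    // Modern-API style: the shader is compiled up front when the
    // pipeline is created, so later draws find it in the cache.
    void createPipeline(const std::string& il) { compile(il); }
};
```

In this model, a first call to `draw("PSMain")` on a fresh `ToyDriver` triggers a compile, while the same call after `createPipeline("PSMain")` does not. A dummy draw at load time has the same effect as `createPipeline` here, which is the idea behind warming the shader cache.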
Last modified on Thursday 10 March 2022 at 21:53:35