Per-pixel linked lists for single-pass A-buffer

What's new

Good news! All race-condition issues are now solved, and the latest version of the code no longer needs any hacks to run properly. The figure below gives the latest performance measured on our GeForce Titan using drivers 337.50 (beta). The results remain similar to those reported previously.

Results on a GeForce Titan with drivers 337.50, rendering 2.5M fragments with varying average depth complexity.

These results still emulate a 64-bit atomicMax using a 32-bit atomicCAS. As soon as 64-bit atomicMax is natively supported in OpenGL, we will update the results. The performance of prelin, preopen and postopen is likely to improve.
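A 32-bit compare-and-swap cannot swap a 64-bit word in one step, so one common way to emulate a 64-bit atomicMax is to use the 32-bit CAS as a per-slot lock that guards the 64-bit value. The sketch below shows this idea in C11 atomics. It is an illustration under stated assumptions, not necessarily how our shader code does it: the `slot64` type and the `slot64_init` / `emulated_atomic_max64` functions are hypothetical names, and on the GPU the lock word and value would live in a buffer and use GLSL's `atomicCompSwap` instead. The critical section is placed inside the retry loop rather than after it, a structure often recommended for GPU spin locks so that threads of the same warp cannot starve the lock holder.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-pixel slot: a 64-bit value guarded by a 32-bit lock word. */
typedef struct {
    atomic_uint lock;   /* 0 = free, 1 = held; only touched through 32-bit CAS/store */
    uint64_t value;     /* 64-bit payload, read and written only while the lock is held */
} slot64;

/* Initialize a slot with an unlocked lock word and a starting value. */
static void slot64_init(slot64 *s, uint64_t initial) {
    atomic_init(&s->lock, 0u);
    s->value = initial;
}

/* Emulated 64-bit atomicMax built only from a 32-bit compare-and-swap.
   Returns the value stored before the update, like GLSL atomicMax. */
static uint64_t emulated_atomic_max64(slot64 *s, uint64_t candidate) {
    uint64_t previous = 0;
    for (;;) {
        unsigned expected = 0u;
        /* Try to flip the lock word from 0 (free) to 1 (held). */
        if (atomic_compare_exchange_weak_explicit(&s->lock, &expected, 1u,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
            /* Critical section kept inside the loop (GPU-friendly form). */
            previous = s->value;
            if (candidate > previous) {
                s->value = candidate;
            }
            /* Publish the new value and free the lock. */
            atomic_store_explicit(&s->lock, 0u, memory_order_release);
            break;
        }
        /* CAS failed (lock held or spurious failure): retry. */
    }
    return previous;
}
```

A native 64-bit atomicMax would replace the whole lock/update/unlock sequence with a single hardware operation, which is why we expect the variants relying on it to get faster once it is exposed in OpenGL.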

