OpenCL DCT Denoise
I've just checked in an OpenCL implementation of the DCT de-noising algorithm I mentioned previously. I've only done the mono version so far.
It's not terribly fast - 10ms wall-clock for a 512x512 mono image, and given that it requires 64 DCT's per 8x8 block and needs to accumulate the results, it probably never will be.
The kernel source.Update: Colour version implemented now.