- 23 Jul, 2012 1 commit
-
-
Heinz-Bernd Eggenstein authored
Added file comment headers to state that this is now a derived work and not the original Apple source code. The original Apple comment headers with (c) and license info are retained.
-
- 22 Jun, 2012 1 commit
-
-
Heinz-Bernd Eggenstein authored
Experimental:
- Added an alternative method for twiddle factor calculation, using a smaller LUT (256 * float2) and a Taylor series to 3rd order. It seems to be almost as accurate as the method with 2 bigger LUTs, but faster.
- Improved the method with 2 bigger LUTs to use LUTs of float2.
- Improved the method using the slow sin/cos functions (it now uses the combined sincos function); still slow.
- Prepared the plan struct so that the method is switchable at plan creation time.
TODO: load the smaller LUT for the Taylor series approximation into shared memory.
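The small-LUT approach described in this commit can be illustrated with a minimal sketch. It is not the code from the commit: it is written in Python with double precision rather than OpenCL float2, and the names `LUT_SIZE`, `LUT`, and `twiddle_taylor` are hypothetical. It assumes the table holds (cos, sin) pairs on a coarse grid of 256 angles, and that the residual angle to the nearest grid point is handled with a 3rd-order Taylor expansion via the angle addition formulas.

```python
import math

LUT_SIZE = 256  # assumed to correspond to the "256 * float2" table in the commit message

# Each entry is a (cos, sin) pair at the coarse grid angle 2*pi*j/LUT_SIZE.
LUT = [(math.cos(2.0 * math.pi * j / LUT_SIZE),
        math.sin(2.0 * math.pi * j / LUT_SIZE)) for j in range(LUT_SIZE)]


def twiddle_taylor(k, n):
    """Approximate (cos, sin) of the twiddle angle -2*pi*k/n."""
    frac = (-k / n) % 1.0            # angle as a fraction of a full turn, in [0, 1]
    pos = frac * LUT_SIZE            # position on the coarse grid
    j = int(round(pos))              # nearest grid point (may equal LUT_SIZE)
    d = (pos - j) * (2.0 * math.pi / LUT_SIZE)  # residual angle, |d| <= pi/LUT_SIZE
    c0, s0 = LUT[j % LUT_SIZE]       # wrap j == LUT_SIZE back to entry 0

    # Taylor series to 3rd order: cos(d) ~ 1 - d^2/2, sin(d) ~ d - d^3/6
    d2 = d * d
    cos_d = 1.0 - 0.5 * d2
    sin_d = d * (1.0 - d2 / 6.0)

    # Angle addition: cos(x+d) = cos x cos d - sin x sin d, and similarly for sin
    return (c0 * cos_d - s0 * sin_d, s0 * cos_d + c0 * sin_d)
```

With 256 grid points the residual angle is at most pi/256, so the truncation error of the expansion is bounded by roughly (pi/256)^4 / 24, which is about 1e-9. That is below single-precision resolution, which is consistent with the commit's remark that the method seems almost as accurate as the two-LUT variant.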
-
- 08 Jun, 2012 1 commit
-
-
Heinz-Bernd Eggenstein authored
Still experimental: replace calls to native_sin in clFFT.
This change explores the performance impact of using a set of LUTs, precomputed on the CPU, to evaluate sin(x_i) and cos(x_i) on a grid x_i = +/- 2*pi*i/N with N fixed. On a 6770M, this code is still ca. 3% slower than the original native_sin/native_cos variant for a BRP4-like transform. This variant should have very high accuracy; versions with lower accuracy but higher performance should be explored next. Eventually the method should be selectable by a parameter to the plan creator, as suggested by Bernd.
TODO:
- Remove some diagnostic code.
- Optimize the total size of the LUTs, perhaps by using cos(x) = sin(x+pi/2), so there is no need to keep separate LUTs for sin and cos, just one slightly longer LUT with an additional alias pointer.
- Try caching the LUTs in shared memory (using constant memory didn't help).
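The TODO item about sharing one table between sin and cos can be sketched as follows. This is an illustration only, written in Python rather than OpenCL, and the names `build_sin_table` and `sincos_from_table` are hypothetical. It assumes N is divisible by 4, so that a shift of pi/2 corresponds to a whole number of table entries (N/4), and it stores N + N/4 sine values so the cosine lookup never has to wrap around.

```python
import math


def build_sin_table(n):
    """Precompute sin(2*pi*i/n) for i in [0, n + n/4); n must be divisible by 4."""
    if n <= 0 or n % 4 != 0:
        raise ValueError("n must be a positive multiple of 4")
    return [math.sin(2.0 * math.pi * i / n) for i in range(n + n // 4)]


def sincos_from_table(table, n, i, sign=1):
    """Return (sin, cos) of sign * 2*pi*i/n using the single shared table."""
    i %= n
    s = table[i]              # sin(x_i)
    c = table[i + n // 4]     # cos(x_i) = sin(x_i + pi/2), the "alias" offset
    # sin is odd and cos is even, so a negative angle only flips the sine
    return (sign * s, c)
```

A call such as `sincos_from_table(build_sin_table(1024), 1024, 3, sign=-1)` would return the values for the angle -2*pi*3/1024. Compared with two separate tables of N entries each, this layout needs 1.25 * N entries in total; in a kernel, the `n // 4` offset would play the role of the alias pointer mentioned in the TODO.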
-
- 18 Mar, 2011 2 commits
-
-
Oliver Bock authored
-
Oliver Bock authored
-
- 17 Mar, 2011 1 commit
-
-
Oliver Bock authored
-
- 16 Mar, 2011 1 commit
-
-
Oliver Bock authored
-
- 12 Jan, 2011 4 commits
-
-
Oliver Bock authored
-
Oliver Bock authored
* Added dedicated Makefiles
* Added support for Mac OS 10.6
* TODO: proper Linux and Windows support
* Note: we need to add licensing headers!
-
Oliver Bock authored
-
Gaurav Khanna authored
* Based on Apple's OpenCL FFT implementation
* Derivative work done by Gaurav Khanna
-