Bug #1608: clFFT use of native_sin, native_cos can cause validation problems
Still experimental: replace calls to native_sin in clFFT

This change explores the performance impact of using a set of LUTs, precomputed on the CPU, to evaluate sin(x_i) and cos(x_i) on a grid x_i = +/- 2*pi*i/N, with N fixed. On a 6770M, this code is still about 3% slower than the original native_sin/native_cos variant for a BRP4-like transform.

This variant should have very high accuracy; versions with lower accuracy but higher performance should be explored next. Eventually the method should be selectable by a parameter to the plan creator, as suggested by Bernd.

TODO:
- remove some diagnostic code
- optimize the total size of the LUTs, perhaps by using cos(x) = sin(x+pi/2), so there is no need to keep separate LUTs for sin and cos, just one slightly longer LUT with an additional alias pointer
- try caching the LUTs in shared memory (using constant memory didn't help)
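The single-table idea from the TODO list can be illustrated on the host side. The following is only a minimal sketch under stated assumptions, not the code from this commit: the names `TwiddleLut`, `make_twiddle_lut`, `sin_at` and `cos_at` are hypothetical, N is assumed to be a multiple of 4, and the table holds sin(2*pi*i/N) for N + N/4 entries so that the cosine can be read at an offset of N/4 (a phase shift of pi/2).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of clFFT: one table serves both sin and cos.
struct TwiddleLut {
    std::vector<float> table;  // sin(2*pi*i/N) for i in [0, N + N/4)
    std::size_t n;             // grid size N
    std::size_t cos_offset;    // N/4, i.e. a phase shift of pi/2

    // sin(2*pi*i/N); the index is wrapped into [0, N).
    float sin_at(std::size_t i) const { return table[i % n]; }

    // cos(2*pi*i/N) = sin(2*pi*(i + N/4)/N), read from the same table.
    float cos_at(std::size_t i) const { return table[(i % n) + cos_offset]; }
};

// Precompute the table on the CPU in double precision, then store as float.
// Assumption: n is a positive multiple of 4 so that N/4 is an exact offset.
TwiddleLut make_twiddle_lut(std::size_t n) {
    assert(n > 0 && n % 4 == 0);
    const double two_pi = 6.283185307179586476925286766559;
    TwiddleLut lut;
    lut.n = n;
    lut.cos_offset = n / 4;
    lut.table.resize(n + n / 4);
    for (std::size_t i = 0; i < lut.table.size(); ++i) {
        const double x = two_pi * static_cast<double>(i) / static_cast<double>(n);
        lut.table[i] = static_cast<float>(std::sin(x));
    }
    return lut;
}
```

For the negative half of the grid, the symmetries sin(-x) = -sin(x) and cos(-x) = cos(x) mean no extra entries are needed. In a real kernel the table would be uploaded as a buffer and the cosine "alias" would simply be the same pointer advanced by N/4 elements.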
Showing 5 changed files:
- src/fft_base_kernels.h: 16 additions, 0 deletions
- src/fft_execute.cpp: 19 additions, 0 deletions
- src/fft_internal.h: 12 additions, 1 deletion
- src/fft_kernelstring.cpp: 117 additions, 8 deletions
- src/fft_setup.cpp: 168 additions, 0 deletions