- Apr 25, 2023
-
-
Heinz-Bernd Eggenstein authored
Allow overriding the default OpenCL compile options by defining the CLFFT_COMPILE_OPTIONS macro. See merge request !6
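A minimal sketch of how such an override is commonly wired up in C, assuming the library wraps its default option string in an `#ifndef` guard. Only the macro name CLFFT_COMPILE_OPTIONS comes from the commit message; the default string and the helper `fft_build_options()` are hypothetical.

```c
/* Assumed default: the actual default option string is not shown in this log.
 * -cl-mad-enable is a standard OpenCL build option, used here for illustration. */
#ifndef CLFFT_COMPILE_OPTIONS
#define CLFFT_COMPILE_OPTIONS "-cl-mad-enable"
#endif

/* Hypothetical helper: returns the string that would be handed to
 * clBuildProgram() as its options argument. */
static const char *fft_build_options(void)
{
    return CLFFT_COMPILE_OPTIONS;
}
```

With a guard like this, compiling with something like `-DCLFFT_COMPILE_OPTIONS='"-cl-fast-relaxed-math"'` would replace the default without touching the source.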
-
- Aug 10, 2022
-
-
Bernd Machenschalk authored
-
- Mar 04, 2021
-
-
Bernd Machenschalk authored
Add clFFT_GetSize() for getting the estimated size of a plan. See merge request !5
-
- Mar 02, 2021
-
-
Maximillian Bensch authored
- similar to cufftGetSize()
-
- Dec 03, 2019
-
-
Bernd Machenschalk authored
Remove GPU constraint. See merge request !4
-
- Aug 21, 2019
-
-
Maximillian Bensch authored
-
- Aug 12, 2019
-
-
Bernd Machenschalk authored
Makefile improvements. See merge request !3
-
Bernd Machenschalk authored
-
- Jun 17, 2019
-
-
Bernd Machenschalk authored
-
- Jun 14, 2019
-
-
Bernd Machenschalk authored
-
Bernd Machenschalk authored
-
Bernd Machenschalk authored
-
Bernd Machenschalk authored
-
- Jun 13, 2019
-
-
Bernd Machenschalk authored
-
- Feb 21, 2019
-
-
Bernd Machenschalk authored
- allow building shared and static versions separately
- fix OSX shared build
-
- Feb 20, 2019
-
-
Bernd Machenschalk authored
add a shared library version. See merge request !2
-
- Feb 18, 2019
-
-
Maximillian Bensch authored
-
- Apr 23, 2018
-
-
Oliver Bock authored
renamed library and header to 'eclfft' to avoid conflicts with clFFT. See merge request !1
-
Bernd Machenschalk authored
- this installs the header in $PREFIX/include/eclfft and the lib in $PREFIX/lib/eclfft.a
-
- Jun 07, 2016
-
-
Oliver Bock authored
-
- Sep 21, 2012
-
-
Heinz-Bernd Eggenstein authored
-
- Jul 26, 2012
-
-
Oliver Bock authored
-
Oliver Bock authored
Switching to archive files (OpenCL.lib for 64 bit seems to be compiled to a different/incompatible format)
-
- Jul 25, 2012
-
-
Oliver Bock authored
-
Oliver Bock authored
-
Heinz-Bernd Eggenstein authored
fixed previous commit for C99 compliant float printf format
-
Heinz-Bernd Eggenstein authored
- fixed compilation warning
- fixed problem of using the "%a" printf format, which is only supported in C99 and later and cannot be assumed for mingw cross compiles for Windows. Now uses this format only conditionally if supported, otherwise falls back to %f for generated float literals
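A rough sketch of a conditional format selection of this kind. How the actual commit detects "%a" support is not shown in this log; the `__STDC_VERSION__` check, the macro `FLOAT_LITERAL_FMT`, and the helper `format_float_literal()` are assumptions made for illustration.

```c
#include <stdio.h>

/* "%a" (hexadecimal float) is only guaranteed from C99 on, so fall back
 * to "%f" otherwise. The trailing 'f' marks the literal as single precision. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define FLOAT_LITERAL_FMT "%af"
#else
#define FLOAT_LITERAL_FMT "%ff"
#endif

/* Hypothetical helper: writes v as a float literal into buf, which must
 * hold at least 64 bytes. Returns the number of characters written. */
static int format_float_literal(char *buf, float v)
{
    return sprintf(buf, FLOAT_LITERAL_FMT, (double)v);
}
```

The hexadecimal form reproduces the float bit-for-bit in the generated source, while the "%f" fallback keeps only six decimal places, so the two paths are not equally accurate.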
-
- Jul 24, 2012
-
-
Oliver Bock authored
-
- Jul 23, 2012
-
-
Heinz-Bernd Eggenstein authored
added file comment headers to express that this is now derived work and not the original Apple source code. The original Apple comment headers with (c) and license info are retained.
-
- Jul 13, 2012
-
-
Heinz-Bernd Eggenstein authored
Prevent integer overflows for long transforms in the Taylor approximation of sin/cos. Still to do: check all uses of mad24 etc. in generated code where overflows could occur as well.
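To illustrate the kind of check the to-do item refers to: OpenCL's `mad24(x, y, z)` only uses the low 24 bits of `x` and `y`, so index arithmetic for long transforms can silently leave the valid range. The host-side helper below, `fits_mad24()`, is a hypothetical sketch for unsigned operands and is not taken from the repository.

```c
#include <stdint.h>

/* Returns 1 if mad24(x, y, z) on unsigned operands stays well defined:
 * both factors fit in 24 bits and x*y + z fits in 32 bits. */
static int fits_mad24(uint32_t x, uint32_t y, uint32_t z)
{
    const uint32_t max24 = (1u << 24) - 1u;
    uint64_t wide;

    if (x > max24 || y > max24)
        return 0;                        /* a factor needs more than 24 bits */
    wide = (uint64_t)x * y + z;          /* exact result in 64-bit arithmetic */
    return wide <= UINT32_MAX;
}
```

For example, `fits_mad24(4096, 4096, 0)` passes, while a factor of `1 << 24` or a product such as `(1 << 20) * (1 << 20)` is rejected.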
-
- Jul 07, 2012
-
-
Heinz-Bernd Eggenstein authored
fix: use compiler flag to globally convert all double constants to floats
-
- Jun 26, 2012
-
-
Heinz-Bernd Eggenstein authored
added plan class creation method that allows setting flags to direct code generation. Currently limited to selecting among 4 methods to compute twiddle factors:
- native_sin, native_cos functions
- sincos() function
- set of two LUTs in global memory
- Taylor series approx via a smaller LUT in shared memory
-
- Jun 25, 2012
-
-
Heinz-Bernd Eggenstein authored
experimental: improved Taylor series approx by copying the LUT to shared mem. TODO: clean up, expose sin/cos method on plan creation interface, do proper calculation of available shared mem for sin/cos LUT
-
- Jun 22, 2012
-
-
Heinz-Bernd Eggenstein authored
experimental:
- added alternative method for twiddle factor calculation, using a smaller LUT (256 * float2) with a Taylor series to 3rd order; it seems to be almost as accurate as the method with 2 bigger LUTs, but faster
- improved method with 2 bigger LUTs to use LUTs of float2
- improved method using slow sin/cos functions (now uses the combined sincos function), still slow
- prepared plan struct to have the method switchable at plan creation time
TODO: load the smaller LUT for the Taylor series approximation into shared mem.
-
- Jun 08, 2012
-
-
Heinz-Bernd Eggenstein authored
Still experimental: replace calls to native_sin in clFFT.
This change explores the performance impact of using a set of LUTs, precomputed on the CPU, to evaluate sin(x_i) and cos(x_i) on a grid x_i = +/- 2*pi*i/N with N fixed. On a 6770M, this code is still about 3% slower than the original native_sin/native_cos variant for a BRP4-like transform. This variant should have very high accuracy; versions with lower accuracy but higher performance should be explored next. Eventually the method should be selectable by a parameter to the plan creator, as suggested by Bernd.
TODO:
- remove some diagnostic code
- optimize the total size of the LUTs, perhaps by using cos(x) = sin(x+pi/2), so there is no need to keep separate LUTs for sin and cos, just one slightly longer LUT with an additional alias pointer
- try caching the LUTs in shared memory (using constant memory didn't help)
-
- Oct 20, 2011
-
-
Oliver Bock authored
-
- Oct 17, 2011
-
-
Oliver Bock authored
-
- Sep 13, 2011
-
-
Oliver Bock authored
-
- May 20, 2011
-
-
Oliver Bock authored
-
Oliver Bock authored
* Supported targets: linux (default), macos, win32, clean
-