1. 03 Dec, 2019 1 commit
  2. 21 Aug, 2019 1 commit
  3. 12 Aug, 2019 2 commits
  4. 17 Jun, 2019 1 commit
  5. 14 Jun, 2019 4 commits
  6. 13 Jun, 2019 1 commit
  7. 21 Feb, 2019 1 commit
  8. 20 Feb, 2019 1 commit
  9. 18 Feb, 2019 1 commit
  10. 23 Apr, 2018 2 commits
  11. 07 Jun, 2016 1 commit
  12. 21 Sep, 2012 1 commit
  13. 26 Jul, 2012 2 commits
  14. 25 Jul, 2012 4 commits
  15. 24 Jul, 2012 1 commit
  16. 23 Jul, 2012 1 commit
  17. 13 Jul, 2012 1 commit
  18. 07 Jul, 2012 1 commit
  19. 26 Jun, 2012 1 commit
  20. 25 Jun, 2012 1 commit
  21. 22 Jun, 2012 1 commit
    • Bug #1608: clFFT use of native_sin, native_cos can cause validation problems · 20314512
      Heinz-Bernd Eggenstein authored
      experimental: - added an alternative method for twiddle factor calculation, using a smaller LUT (256 * float2)
                      via Taylor series to 3rd order; seems to be almost as accurate as the method with 2 bigger LUTs, but faster.
                    - improved method w/ 2 bigger LUTs to use LUTs of float2
                    - improved method using slow sin/cos functions (now uses the combined sincos function), still slow
                    - prepared plan struct to have the method switchable at plan creation time.
      
                    TODO: load smaller LUT for Taylor series approx into shared mem.
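
The small-LUT method described above can be illustrated with a minimal sketch. This is not the clFFT kernel code: it is a Python model that assumes the 256-entry table stores (cos, sin) pairs on an even grid over one full period and that the correction is a Taylor expansion to 3rd order around the nearest grid point. The names `LUT`, `STEP` and `sincos_taylor` are hypothetical, and the sketch runs in double precision, whereas the commit refers to float2 tables on the GPU.

```python
import math

LUT_SIZE = 256                      # table size named in the commit message
STEP = 2.0 * math.pi / LUT_SIZE     # assumed: even grid over one full period
# Assumed layout: one (cos, sin) pair per grid point, like a float2 entry.
LUT = [(math.cos(j * STEP), math.sin(j * STEP)) for j in range(LUT_SIZE)]


def sincos_taylor(theta):
    """Approximate (sin(theta), cos(theta)) from the LUT plus a 3rd-order Taylor correction."""
    j = round(theta / STEP)         # nearest grid point, so |delta| <= STEP / 2
    delta = theta - j * STEP
    c, s = LUT[j % LUT_SIZE]        # periodic wrap-around, also for negative j
    d2 = delta * delta / 2.0
    d3 = delta * delta * delta / 6.0
    # Taylor series of sin and cos around the grid point, truncated after delta^3.
    sin_t = s + c * delta - s * d2 - c * d3
    cos_t = c - s * delta - c * d2 + s * d3
    return sin_t, cos_t
```

With |delta| at most pi/256, the truncation error of this expansion is bounded by delta^4/24, roughly 1e-9, which is below single-precision resolution for values near 1. That is consistent with the observation above that the small table is almost as accurate as the larger ones.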
  22. 08 Jun, 2012 1 commit
    • Bug #1608: clFFT use of native_sin, native_cos can cause validation problems · 48a3c019
      Heinz-Bernd Eggenstein authored
      Still experimental: replace calls to native_sin in clFFT
      This change explores the performance impact of using a set of LUTs, precomputed on the CPU,
      to evaluate sin(x_i) and cos(x_i) on a grid x_i = +/- 2*pi*i/N, N fixed.
      
      On a 6770M, this code is still ca. 3% slower than the original native_sin/native_cos variant
      for a BRP4-like transform.
      
      This variant should have very high accuracy; versions with lower accuracy but
      higher performance should be explored next. Eventually the method should be selectable
      by a parameter to the plan creator, as suggested by Bernd.
      
      TODO: - remove some diagnostic code,
            - optimize total size of LUTs, perhaps by using
              cos(x) = sin(x+pi/2), so no need to keep separate LUTs for sin and cos, just one slightly longer with
              an additional alias pointer
            - try caching the LUTs in shared memory (using constant memory didn't help)
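
The table-sharing idea in the TODO list can be sketched as follows. This is a Python model, not the clFFT code: it assumes N is divisible by 4, so that the pi/2 shift is a whole number of grid steps, and it models the "alias pointer" as an index offset of N/4 into a single sine table extended by a quarter period. The names `build_sin_lut` and `twiddle` are hypothetical.

```python
import math


def build_sin_lut(n):
    """Build one sine table over the grid x_i = 2*pi*i/n, extended by a quarter period."""
    if n % 4 != 0:
        raise ValueError("n must be divisible by 4 so that pi/2 falls on the grid")
    quarter = n // 4
    # n + n/4 entries: the extra quarter lets the cosine view read past index n - 1
    # without a modulo operation.
    lut = [math.sin(2.0 * math.pi * i / n) for i in range(n + quarter)]
    return lut, quarter


def twiddle(lut, quarter, i, sign=-1):
    """Return (cos, sin) of sign * 2*pi*i/n for 0 <= i < n, using the single table."""
    c = lut[i + quarter]    # cos(x) = sin(x + pi/2): the "alias pointer" view
    s = sign * lut[i]       # sin(-x) = -sin(x), cos(-x) = cos(x)
    return c, s
```

For N = 1024 this stores 1280 values instead of 2048 for separate sine and cosine tables, at the cost of one extra index addition per lookup.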
  23. 20 Oct, 2011 1 commit
  24. 17 Oct, 2011 1 commit
  25. 13 Sep, 2011 1 commit
  26. 20 May, 2011 6 commits