
    Heinz-Bernd Eggenstein authored
    Still experimental: replace calls to native_sin in clFFT
    This change explores the performance impact of replacing native_sin/native_cos with a set of
    lookup tables (LUTs), precomputed on the CPU, that hold sin(x_i) and cos(x_i) on the grid
    x_i = +/- 2*pi*i/N for a fixed N.
    
    On a 6770M, this code is still about 3% slower than the original native_sin/native_cos variant
    for a BRP4-like transform.
    
    This variant should have very high accuracy; versions with lower accuracy but
    higher performance should be explored next. Eventually the method should be selectable
    via a parameter to the plan creator, as suggested by Bernd.
    
    TODO: - remove some diagnostic code
          - optimize the total size of the LUTs, perhaps by using
            cos(x) = sin(x + pi/2): instead of separate LUTs for sin and cos, keep a single,
            slightly longer sin LUT plus an additional alias pointer for cos
          - try caching the LUTs in shared memory (using constant memory didn't help)
    48a3c019