pyCUDA version of transient F-stat
This adds a pyCUDA implementation of lalpulsar's ComputeTransientFstatMap function, to be used for now only from the TransientGridSearch() class but in principle portable to other search classes too.
There are two separate kernel files (.cu) installed as 'package_data', and a source file tcw_fstat_map_funcs.py that includes both the direct wrappers for those GPU kernels and some setup acrobatics.
@GregAshton Could you have a preliminary look and first let me know if you think this is all in scope for PyFstat in the first place, or if you'd prefer the kernels and wrappers to live in an external package?
Then here's a (non-exhaustive) list of hacks or possibly controversial implementation details:
- Since pycuda won't be installed by default on many systems, I'm doing optional imports in init_transient_fstat_map_features(), for which there might be a more elegant and pythonic solution.
- I've had to add an explicit `__del__` destructor to the ComputeFstat class. PyCUDA has a nice autoinit feature which would do the cleanup too, but for multi-GPU hosts (e.g. CIT head nodes) I need to manually work with what it calls "contexts" and then clean those up at the end, too.
- The lal and pycuda versions currently use slightly different FstatMap objects, which I could unify in future if the if-elses look too ugly.
- The last fixup commit adds explicit defaults for the new ComputeFstat attributes also to the derived SemiCoherentSearch and SemiCoherentGlitchSearch classes; I was hoping they could somehow inherit the defaults from their base class, but I guess that would require explicitly calling its `__init__` from the child...?
- The commit history is a bit messy, e.g. the "choose from multiple GPUs with CUDA_DEVICE environment variable" is superseded by later commits, but if it's not too much of a bother I'd like to keep this history, because if any issues arise it might become necessary to return to that simpler implementation.
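
On the optional-import point, one common pattern is to attempt each import and record whether it succeeded, so that the rest of the code can branch on a feature table. The following is only a minimal sketch of that idea: `optional_import` and `init_features` are hypothetical names made up for illustration, not the actual contents of init_transient_fstat_map_features(), and the module names passed in are assumptions.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Covers both "package not installed" and "package present but
        # its compiled extension fails to load".
        return None


def init_features(candidates=("pycuda.driver", "pycuda.compiler")):
    """Try every optional module and report which ones are usable.

    Hypothetical helper for illustration only.
    """
    modules = {name: optional_import(name) for name in candidates}
    features = {name: mod is not None for name, mod in modules.items()}
    return features, modules


features, modules = init_features()
have_pycuda = all(features.values())
print("pycuda available:", have_pycuda)
```

Callers can then fall back to the lal code path whenever `have_pycuda` is false, without ever hitting an ImportError at package import time.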
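
On inheriting defaults from the base class, there are two options I'm aware of in plain Python: declare the defaults as class-level attributes (which subclasses see even if they never call the base `__init__`), or forward the options through `super().__init__()`. The sketch below shows both; all class and attribute names (`BaseSearch`, `fstat_map_version`, `cuda_device_name`, `nsegs`) are hypothetical stand-ins, not the real PyFstat signatures.

```python
class BaseSearch:
    """Hypothetical stand-in for a base class such as ComputeFstat."""

    # Class-level defaults: attribute lookup on any subclass instance
    # falls back to these when the instance has no value of its own.
    fstat_map_version = "lal"
    cuda_device_name = None

    def __init__(self, fstat_map_version="lal", cuda_device_name=None):
        self.fstat_map_version = fstat_map_version
        self.cuda_device_name = cuda_device_name


class ChildWithoutSuper(BaseSearch):
    """Overrides __init__ and never calls the base-class one."""

    def __init__(self, nsegs):
        self.nsegs = nsegs


class ChildWithSuper(BaseSearch):
    """Forwards base-class options explicitly via super().__init__()."""

    def __init__(self, nsegs, **base_kwargs):
        super().__init__(**base_kwargs)
        self.nsegs = nsegs


a = ChildWithoutSuper(nsegs=2)
b = ChildWithSuper(nsegs=2, fstat_map_version="pycuda")
print(a.fstat_map_version, b.fstat_map_version)
```

The class-attribute route needs no changes to the child constructors, while the `super().__init__()` route also lets users override the options per instance.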
I'm also still chasing some larger-than-expected differences between CPU and GPU versions for exponential windows, though those could just be due to Reinhard's "FastExp" functions. Which is why this is marked `WIP:` and self-assigned for now. Feedback already much appreciated, though!