Commit ba48f909 authored by Gregory Ashton's avatar Gregory Ashton
Browse files

Minor update to the readme and remove docs

parent 3a841ce9
# PyFstat
This is a python package providing an interface to perform F-statistic based
continuous gravitational wave (CW) searches. At its core, this is a simple
wrapper of selected tools in
['lalpulsar'](http://software.ligo.org/docs/lalsuite/lalpulsar/). The general
idea is to allow easy scripting of new search pipelines, we focus
primarily on interfacing the CW routines with
[emcee](http://dan.iel.fm/emcee/current/) a python MCMC sampler.
continuous gravitational wave (CW) searches.
## Examples
We include a variety of example search scripts [here](examples), for each
example there is also a more descriptive write-up containing examples of the
output which we list below. Before running any of the search examples, be sure
to have run the [script to generate fake data](examples/make_fake_data.py).
* [Making fake data with and without glitches](docs/make_fake_data.md)
* [Fully-coherent MCMC search](docs/fully_coherent_search_using_MCMC.md)
* [Fully-coherent MCMC search on data containing a single glitch](docs/fully_coherent_search_using_MCMC_on_glitching_data.md)
* [Semi-coherent MCMC glitch-search on data containing a single glitch](docs/semi_coherent_glitch_search_using_MCMC_on_glitching_data.md)
* [Semi-coherent MCMC glitch-search on data containing two glitches](docs/semi_coherent_glitch_search_with_two_glitches_using_MCMC_on_glitching_data.md)
* [Semi-coherent Follow-Up MCMC search (dynamically changing the coherence time)](docs/follow_up.md)
For documentation, please use the [wiki](https://gitlab.aei.uni-hannover.de/GregAshton/PyFstat/wikis/home).
## Installation
......@@ -35,8 +17,27 @@ the stripped down [miniconda](http://conda.pydata.org/miniconda.html)
installation, or the full-featured
[anaconda](https://www.continuum.io/downloads) (these are essentially the
same, but the `anaconda` version installs a variety of useful packages such as
`numpy` and `scipy` by default). Instructions to install miniconda/anaconda
are provided in the links.
`numpy` and `scipy` by default).
The fastest/easiest method is to follow your OS instructions
[here](https://conda.io/docs/install/quick.html) which will install Miniconda.
For the rest of this tutorial, we will make use of `pip` to install modules (
not all packages can be installed with `conda` and for those using alternatives
to `conda`, `pip` is more universal).
This can be installed with
```
$ conda install pip
```
### Clone the repository
In a terminal, to clone the directory:
```
$ git clone git@gitlab.aei.uni-hannover.de:GregAshton/PyFstat.git
```
### Dependencies
......@@ -57,8 +58,6 @@ For an introduction to installing modules see
```
$ pip install -r /PATH/TO/THIS/DIRECTORY/requirements.txt
```
If you have installed python from conda then `pip` itself can be installed via
`conda install pip`.
In addition to these modules, you also need a working **swig-enabled**
[`lalapps`](http://software.ligo.org/docs/lalsuite/lalsuite/) with
......@@ -72,7 +71,7 @@ $ ./configure --prefix=${HOME}/lalsuite-install --disable-all-lal --enable-lalpu
### `pyfstat` installation
The script can be installed system wide, assuming you are in this directory, via
The script can be installed system wide, assuming you are in the source directory, via
```
$ python setup.py install
```
......
# File formats used by PyFstat
This page documents the various file formats and how they are used.
## MCMC Searches:
### Data output
* `outdir/label.log`: On importing `pyfstat`, two logging streams are setup:
one stream which is written to the terminal, and a second which is saved to a
file `outdir/label.log` where the `outdir` and `label` are given when
initialising the search. While the terminal output can be suppressed with the
`-q` flag, the file is always written with a log-level set to `INFO`. This file
is never overwritten, so can be used to search for changes in the setup or old
results.
* `outdir/label.par`: A parameter file containing the maximum detection stat.
value, and estimates of the best-fit parameters. This can be read in with
`read_par()` and written with `write_par()`.
* `outdir/label_saved_data.p`: Upon succesful completion of an MCMC search, the
results will be saved to a `python` `pickle` file. This pickle file can
subsequently be read back and contains many useful outputs such as the
`sampler` object, the `lnprobs` and `lnlikes` from the run, and of course the
`chains` themselves. Rerunning a script with different parameters, the pickle
is overwritten once the simulation completes, however, a backup is saved with
an appended `.old` label.
### Image output
* `outdir/label_walkers.png`: The position of all temperature 0 walkers
during the burn-in + production stage. In addition, the final panel plots a
histogram of the detection statistic from all temperature 0 walkers; if
a burn-in period is defined this is computed separately and colored red.
* `outdir/label_init_i_walkers.png`: The same as the `walkers`, but for the
`ith` initialisation stage.
* `outdir/label_corner.png`: A corner plot of the production samples using
the [corner](https://github.com/dfm/corner.py) package. This file is
generated by `plot_corner()`.
* `outdir/label_prior_posterior.png`: A plot showing the prior and a KDE of
the posterior, generated by `plot_prior_posterior()`
## Grid searches
### Data output
* `outdir/label_grid_FS.txt`: Upon succesful completion of a grid search, the
grid points are saved in plain text format. The order is set by
`get_input_data_array` with an additional column being the output detection
statistic.
### Image output
* `outdir/label_1D.png`
* `outdir/label_2D.png`: A 2D contour plot of the detection statitistic over
the range of parameters; options exist to flatten higher dimension
searches. Generated using `plot_2D()`.
# Semi-coherent Follow-Up MCMC search (dynamically changing the coherence time)
Here, we will show the set-up for using the `MCMCFollowUp` class. The basic
idea is to start the MCMC chains searching on a likelihood with a short coherence
time; once the MCMC chains converge to the solution, the coherence time is
extended effectively narrowing the peak, afterwhich the chains again converge
to this narrower peak. The advantages of such a method are:
1. Dynamically shows the evolution
2. Able to handle multiple peaks and hence can result in a multi-modal posterior
The plots here are produced by [follow_up.py](../example/follow_up.py). We
will run the search on the `basic` data generated in the
[make_fake_data](make_fake_data.md) example. The basic script is here:
```python
import pyfstat
F0 = 30.0
F1 = -1e-10
F2 = 0
Alpha = 5e-3
Delta = 6e-2
tref = 362750407.0
tstart = 1000000000
duration = 100*86400
tend = tstart + duration
theta_prior = {'F0': {'type': 'unif', 'lower': F0*(1-1e-6), 'upper': F0*(1+1e-5)},
'F1': {'type': 'unif', 'lower': F1*(1+1e-2), 'upper': F1*(1-1e-2)},
'F2': F2,
'Alpha': Alpha,
'Delta': Delta
}
ntemps = 1
log10temperature_min = -1
nwalkers = 100
run_setup = [(1000, 50), (1000, 25), (1000, 1, False),
((500, 500), 1, True)]
mcmc = pyfstat.MCMCFollowUpSearch(
label='follow_up', outdir='data',
sftfilepath='data/*basic*sft', theta_prior=theta_prior, tref=tref,
minStartTime=tstart, maxStartTime=tend, nwalkers=nwalkers,
ntemps=ntemps, log10temperature_min=log10temperature_min)
mcmc.run(run_setup)
mcmc.plot_corner(add_prior=True)
mcmc.print_summary()
```
Note that, We use the `MCMCFOllowUpSearch class. Rather than using the `nsteps`
parameter to define how long the chains are run for, this class uses
`run_setup`. This is an `nstage` length list (or array) which determines the
number of steps, how many steps should be considered burn-in, the number of
segments to use, and whether to re-initialise the walkers. Each element of the
list is a 3-tuple of the form `((nburn, nprod), nsegs, resetp0)`. However, each
element can be given as a short form: either ommiting the `nsteps as `(nburn,
nsegs, resetp0)` or omiting the `resetp0` as `((nburn, nsteps), nsegs)`, or
a combination of the two. For example, above we used
```python
run_setup = [(1000, 50), (1000, 25), (1000, 1, False),
((500, 500), 1, True)]
```
Here we run the first two steps with 1000 burn-in steps (such that they will
be discarded) and changing the number of segments, then one 1000 burn-in
steps fully coherently and finally a run with 500 burn-in and 500 production
samples and a reset of the parameters at the begining. The output is
demonstrated here:
![](img/follow_up_walkers.png)
Some things to note:
* The `resetp0` is useful to clean-up straggling walkers, but will remove all
multimodality potentially loosing information.
* On the first axis the coherence time is displayed for each section
* In this example the signal is quite strong and in Gaussian noise
* The `twoF` distribution plotted at the bottom is taken only from the production
run
* This plot is generated after each stage of the run - this can be useful to
check it is converging before continuing the simulation
# Fully coherent search using MCMC
In this example, we will show the basics of setting up and running a
fully-coherent MCMC search. This is based on the example
[fully_coherent_search_using_MCMC.py](../example/fully_coherent_search_using_MCMC.py).
We will run the search on the `basic` data generated in the
[make_fake_data](make_fake_data.md) example.
First, we need to import the search tool, in this example we will use the
`MCMCSearch`, but one could equally use `MCMCGlitchSearch` with `nglitch=0`.
To import this,
```python
from pyfstat import MCMCSearch
```
Next, we define some variables defining the exact parameters of the signal
in the data, and the start and end times:
```python
F0 = 30.0
F1 = -1e-10
F2 = 0
Alpha = np.radians(83.6292)
Delta = np.radians(22.0144)
tref = 362750407.0
tstart = 1000000000
duration = 100*86400
tend = tstart = duration
```
Now, we need to specify our prior. This is a dictionary containing keys for
each variable (in the `MCMCSearch` these are `F0`, `F1`, `F2`, `Alpha`, and
`Delta`). In this example, we choose a uniform box in `F0` and `F1`:
```python
theta_prior = {'F0': {'type': 'unif', 'lower': F0*(1-1e-6), 'upper': F0*(1+1e-6)},
'F1': {'type': 'unif', 'lower': F1*(1+1e-2), 'upper': F1*(1-1e-2)},
'F2': F2,
'Alpha': Alpha,
'Delta': Delta
}
```
Each key and value of the `theta_prior` contains an instruction to the MCMC
search. If the value is a scalar, the MCMC search holds these fixed (as is the
case for `F2`, `Alpha`, and `Delta` here). If instead the value is a dictionary
describing a distribution, this is taken as the prior and the variable is
simulated in the MCMC search (as is the case for `F0` and `F1`). Note that
for `MCMCSearch`, `theta_prior` must contain at least all of the variables
given here (even if they are zero), and if `binary=True`, it must also contain
the binary parameters.
Next, we define the parameters of the MCMC search:
```python
ntemps = 4
log10temperature_min = -1
nwalkers = 100
nsteps = [1000, 1000]
```
These can be considered the *tuning parameters* of the search. A complete
discussion of these can be found [here](tuning_parameters.md).
Passing all this to the MCMC search, we also need to give it a label, a
directory to save the data, and provide `sftfilepath`, a string matching
the data to use in the search
```python
mcmc = MCMCSearch(label='fully_coherent_search_using_MCMC', outdir='data',
sftfilepath='data/*basic*sft', theta_prior=theta_prior,
tref=tref, tstart=tstart, tend=tend, nsteps=nsteps,
nwalkers=nwalkers, ntemps=ntemps,
log10temperature_min=log10temperature_min)
```
To run the simulation, we call
```python
mcmc.run()
```
This produces two `.png` images. The first is the position of the *walkers*
during the simulation:
![](img/fully_coherent_search_using_MCMC_walkers.png)
This shows (in red) the position of the walkers during the burn-in stage. They
are initially defuse (they start from positions randomly picked from the prior),
but eventually converge to a single stable solution. The black is the production
period from which posterior estimates are made. The bottom panel is a histogram
of `twoF`, split for the production period. Note that, early on there are
multiple modes corresponding to other peaks, by using the parallel tempering,
we allow the walkers to explore all of these peaks and opt for the strong
central candidate.
To get posteriors, we call
```python
mcmc.plot_corner()
```
which produces a corner plot
![](img/fully_coherent_search_using_MCMC_corner.png)
illustrating the tightly constrained posteriors on `F0` and `F1` and their
covariance. Furthermore, one may wish to get a summary which can be printed
to the terminal via
```python
mcmc.print_summary()
```
which gives the maximum twoF value, median and standard-deviation, in this case
this is
```
Summary:
theta0 index: 0
Max twoF: 1771.50622559 with parameters:
F0 = 2.999999874e+01
F1 = -9.999802960e-11
Median +/- std for production values
F0 = 2.999999873e+01 +/- 6.004803009e-07
F1 = -9.999801583e-11 +/- 9.359959909e-16
```
# Fully coherent search on glitching data using MCMC
This example applies the basic [fully coherent
search using MCMC](fully_coherent_search_using_MCMC.md), to the glitching signal data set created in
[make fake data](make_fake_data.md]). The aim here is to illustrate the effect
such a signal can have on a fully-coherent search. The complete script for this
example canbe found
[here](../example/fully_coherent_search_using_MCMC_on_glitching_data.py).
After importing `pyfstat`, We setup a flat prior on `F0` and `F1`, based on the
values used to generate the signal:
```
from pyfstat import MCMCSearch
F0 = 30.0
F1 = -1e-10
F2 = 0
Alpha = 5e-3
Delta = 6e-2
tref = 362750407.0
tstart = 1000000000
duration = 100*86400
tend = tstart + duration
theta_prior = {'F0': {'type': 'unif', 'lower': F0-1e-4, 'upper': F0+1e-4},
'F1': {'type': 'unif', 'lower': F1*(1+1e-3), 'upper': F1*(1-1e-3)},
'F2': F2,
'Alpha': Alpha,
'Delta': Delta
}
```
In this search, we will use paralllel tempering (to help the walkers move
between the different peaks in the posterior).
```
ntemps = 2
log10temperature_min = -0.01
nwalkers = 100
nsteps = [500, 500]
mcmc = MCMCSearch('fully_coherent_search_using_MCMC_on_glitching_data', 'data',
sftfilepath='data/*_glitch*.sft',
theta_prior=theta_prior, tref=tref, tstart=tstart, tend=tend,
nsteps=nsteps, nwalkers=nwalkers, ntemps=ntemps,
log10temperature_min=log10temperature_min)
mcmc.run()
mcmc.plot_corner(add_prior=True)
```
Running this example, we obtain traces of the walkers like this:
![](img/fully_coherent_search_using_MCMC_on_glitching_data_walkers.png)
Although it is not obvious at first, the large widths of these traces in fact
show that the walkers are jumping between multiple peaks (for both `F0` and
`F1): this is possible, due to the tuning of the parallel tempering. To see this
clearly, we also plot the corner plot:
![](img/fully_coherent_search_using_MCMC_on_glitching_data_corner.png)
From this corner plot, we that unlike the in the [single glitch fully-coherent
search](full_coherent_search_using_MCMC.md), the posterior, even after a large
number of steps, is multimodal. However, these two peaks do **not** correspond
exactly to the two frequencies before and after the glitch, which would be
`30` and `30+4e5` (to see this, see how the data is
[generated](../examples/make_dake_data.py)). This is partly due to the noise
and partly due to the fact that the maximum detection statistic in the case
of glitches can occur at point *in between* the two frequencies. Moreover, we
see bimodality in `F1`, which did does not change during the glitch.
```
>>> mcmc.print_summary()
Max twoF: 422.97
```
That is, compared to the basic search (on a smooth signal) which had a twoF of
`~1764` (in agreement with the predicted twoF), we have lost a large
fraction of the SNR due to the glitch.
# Making fake data
Here, we describe the steps required to generate fake data which will be used
throughout the other examples. We will generate data based on the properties of
the Crab pulsar, first as a smooth CW signal, then as a CW signal which
contains one glitch, and finally as a signal with two glitches. This document
is based on the file [make_fake_data.py](../examples/make_fake_data.py).
## Smooth signal
In the following code segment, we import the `Writer` class used to generate
fake data, define the Crab parameters and create an instant of the `Writer`
called `data`
```python
import numpy as np
from pyfstat import Writer
# Define parameters of the Crab pulsar
F0 = 30.0
F1 = -1e-10
F2 = 0
Alpha = np.radians(83.6292)
Delta = np.radians(22.0144)
tref = 362750407.0
# Signal strength
h0 = 1e-23
# Properties of the GW data
sqrtSX = 1e-22
tstart = 1000000000
duration = 100*86400
tend = tstart+duration
data = Writer(
label='basic', outdir='data', tref=tref, tstart=tstart, F0=F0, F1=F1,
F2=F2, duration=duration, Alpha=Alpha, Delta=Delta, h0=h0, sqrtSX=sqrtSX)
```
We can now use the `data` object to create `.sft` files which contain a smooth
signal in Gaussian noise. In detail, the process consists first in calling
```python
data.make_cff()
```
which generates a file `data/basic.cff` (notice the directory and file name
are defined by the `outdir` and `label` arguments given to `Writer`). This
file contains instructions which will be passed to `lalapps_MakeFakedata_v5`,
namely
```
[TS0]
Alpha = 5.000000000000000104e-03
Delta = 5.999999999999999778e-02
h0 = 9.999999999999999604e-24
cosi = 0.000000000000000000e+00
psi = 0.000000000000000000e+00
phi0 = 0.000000000000000000e+00
Freq = 3.000000000000000000e+01
f1dot = -1.000000000000000036e-10
f2dot = 0.000000000000000000e+00
refTime = 362750407.000000
transientWindowType=rect
transientStartTime=1000000000.000
transientTauDays=100.000
```
Finally, we generate the `.sft` files by calling
```python
data.run_makefakedata()
```
In fact, the previous two commands are wrapped together by a single call to
`data.make_data()` which we will use from now on.
## Glitching signal
We now want to generate a set of data which contains a *glitching signal*. We
start with a simple case in which the glitch occurs half way through the
observation span. We define the properties of this signal, create
another `Writer` instance called `glitch_data`, and then run `make_data()`
```python
dtglitch = duration/2.0
delta_F0 = 0.4e-5
delta_F1 = 0
glitch_data = Writer(
label='glitch', outdir='data', tref=tref, tstart=tstart, F0=F0, F1=F1,
F2=F2, duration=duration, Alpha=Alpha, Delta=Delta, h0=h0, sqrtSX=sqrtSX,
dtglitch=dtglitch, delta_F0=delta_F0, delta_F1=delta_F1)
glitch_data.make_data()
```
It is worth noting the difference between the config file for the non-glitching
signal and this config file (`data/glitch.cff`) which reads
```
[TS0]
Alpha = 5.000000000000000104e-03
Delta = 5.999999999999999778e-02
h0 = 9.999999999999999604e-24
cosi = 0.000000000000000000e+00
psi = 0.000000000000000000e+00
phi0 = 0.000000000000000000e+00
Freq = 3.000000000000000000e+01
f1dot = -1.000000000000000036e-10
f2dot = 0.000000000000000000e+00
refTime = 362750407.000000
transientWindowType=rect
transientStartTime=1000000000.000
transientTauDays=50.000
[TS1]
Alpha = 5.000000000000000104e-03
Delta = 5.999999999999999778e-02
h0 = 9.999999999999999604e-24
cosi = 0.000000000000000000e+00
psi = 0.000000000000000000e+00
phi0 = -1.612440256772935390e+04
Freq = 3.000000400000000056e+01
f1dot = -1.000000000000000036e-10
f2dot = 0.000000000000000000e+00
refTime = 362750407.000000
transientWindowType=rect
transientStartTime=1004320000.000
transientTauDays=50.000
```
The glitch config file uses transient windows to create two non-overlapping,
but continuous signals.
## Expected twoF
Finally, the `Writer` class also provides a wrapper of `lalapps_PredictFstat`.
So calling
```python
>>> print data.predict_fstat()
1721.1
```
Notice that the predicted value will be the same for both sets of data.
## Making data with multiple glitches
Finally, one can also use the `Writer` to generate data with multiple glitches.
To do this, simply pass in `dtglitch`, `delta_phi`, `delta_F0`, `delta_F1`, and
`delta_F2` as arrays (with a length equal to the number of glitches). Note
that all these must be of equal length. Moreover, the glitches are applied
sequentially and additively as implemented
`pyfstat.BaseSearchClass.calculate_thetas`. Here is an example with two
glitches, one a quarter of the way through and the other a fifth from the end.
```python
dtglitch = [duration/4.0, 4*duration/5.0]
delta_phi = [0, 0]
delta_F0 = [0.4e-5, 0.3e-6]
delta_F1 = [0, 0]
delta_F2 = [0, 0]
two_glitch_data = Writer(
label='two_glitch', outdir='data', tref=tref, tstart=tstart, F0=F0, F1=F1,
F2=F2, duration=duration, Alpha=Alpha, Delta=Delta, h0=h0, sqrtSX=sqrtSX,
dtglitch=dtglitch, delta_phi=delta_phi, delta_F0=delta_F0,
delta_F1=delta_F1, delta_F2=delta_F2)
two_glitch_data.make_data()
```
So, having run `$ python make_fake_data.py` (from the `examples` directory), we
will see that in the sub-directory `examples/data/` there are three `.sft`
files. These will be used throughout the other examples.
# Semi-coherent glitch search on data with a single glitch using MCMC