Commit ad4abc08 authored by Gregory Ashton

Update readme and improve the documentation and examples

parent e65bca6f
# PyFstat
This is a python package containing basic wrappers of the `lalpulsar` module
with capabilities to perform a variety of searches, primarily focusing on
semi-coherent glitch searches.
## Examples
We include a variety of example search scripts [here](examples); for each
example there is also a more descriptive write-up, containing examples of the
output, which we list below. Before running any of the search examples, be sure
to have run the [script to generate fake data](examples/make_fake_data.py).
* [Making fake data with and without glitches](docs/make_fake_data.md)
* [Fully coherent MCMC search](docs/fully_coherent_search.md)
## Installation

The package can be installed system-wide via
```
$ python setup.py install
```
or simply add this directory to your python path. To check that the installation
was successful, run
```
$ python -c 'import pyfstat'
```
If no error message is output, then you have installed `pyfstat`.
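If you prefer not to install the package at all, here is a minimal sketch of making it importable from within a script instead (the checkout path shown is hypothetical):

```python
import sys

# Hypothetical location of your PyFstat checkout; adjust to your own path
sys.path.append('/home/<USER>/PyFstat')

import pyfstat
```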
### Ephemeris installation
The scripts require a path to ephemeris files in order to use the
`lalpulsar.ComputeFstat` module. This can either be specified when initialising
each search, or more simply by placing a file `~/pyfstat.conf` into your home
directory which looks like
```
earth_ephem = '/home/<USER>/lalsuite-install/share/lalpulsar/earth00-19-DE421.dat.gz'
sun_ephem = '/home/<USER>/lalsuite-install/share/lalpulsar/sun00-19-DE421.dat.gz'
```
Here, we use the default ephemeris files provided with `lalsuite`.
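For reference, the file is just a set of `key = 'value'` lines; below is a minimal sketch of parsing it, purely to illustrate the format (this is not necessarily how `pyfstat` reads it internally):

```python
import os

def read_pyfstat_conf(path=os.path.expanduser('~/pyfstat.conf')):
    """Parse simple key = 'value' lines into a dictionary."""
    params = {}
    with open(path) as f:
        for line in f:
            if '=' in line:
                key, value = line.split('=', 1)
                params[key.strip()] = value.strip().strip("'\"")
    return params

# e.g. read_pyfstat_conf()['earth_ephem'] gives the earth ephemeris path
```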
### Dependencies
* swig-enabled `lalpulsar`; a minimal configuration is given by
```
$ ./configure --prefix=${HOME}/lalsuite-install --disable-all-lal --enable-lalpulsar --enable-lalapps --enable-swig
```
* [emcee](http://dan.iel.fm/emcee/current/)[^1]
* [corner](https://pypi.python.org/pypi/corner/)[^1]
* [dill](https://pypi.python.org/pypi/dill)[^1]
* [tqdm](https://pypi.python.org/pypi/tqdm)[^1] (optional): if installed, this
provides a useful progress bar and an estimate of the remaining run-time.
[^1]: Most easily installed using either `conda` or `pip`.
# Fully coherent search using MCMC
In this example, we will show the basics of setting up and running a fully
coherent MCMC search. This is based on the example
[fully_coherent_search.py](../examples/fully_coherent_search.py). We will run
the search on the `basic` data generated in the
[make_fake_data](make_fake_data.md) example.
First, we need to import the search tool; in this example we will use
`MCMCSearch`, but one could equally use `MCMCGlitchSearch` with `nglitch=0`.
To import this:
```python
from pyfstat import MCMCSearch
```
Next, we define variables specifying the exact parameters of the signal
in the data, and the start and end times:
```python
F0 = 30.0
F1 = -1e-10
F2 = 0
tstart = 1000000000
duration = 100*86400
tend = tstart + duration
```
Now, we need to specify our prior. This is a dictionary containing keys for
each variable (in `MCMCSearch` these are `F0`, `F1`, `F2`, `Alpha`, and
`Delta`). In this example, we choose a uniform box in `F0` and `F1`:
```python
theta_prior = {'F0': {'type': 'unif', 'lower': F0*(1-1e-6), 'upper': F0*(1+1e-6)},
               'F1': {'type': 'unif', 'lower': F1*(1+1e-2), 'upper': F1*(1-1e-2)},
               'F2': F2,
               'Alpha': Alpha,
               'Delta': Delta
               }
```
Each key and value of `theta_prior` contains an instruction to the MCMC
search. If the value is a scalar, the MCMC search holds that parameter fixed
(as is the case here for `F2`, `Alpha`, and `Delta`); if instead the value is a
dictionary, it defines a prior distribution and the parameter is searched over.
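To make this convention concrete, here is a small sketch, independent of the `pyfstat` internals, of drawing a single sample from such a prior specification; the handling of the `unif` and `norm` types is an assumption based on their keys:

```python
import numpy as np

def sample_prior(spec):
    """Draw one point from a theta_prior-style dictionary."""
    sample = {}
    for key, value in spec.items():
        if np.isscalar(value):
            sample[key] = value  # scalar entries are held fixed
        elif value['type'] == 'unif':
            sample[key] = np.random.uniform(value['lower'], value['upper'])
        elif value['type'] == 'norm':
            sample[key] = np.random.normal(value['loc'], value['scale'])
    return sample

print(sample_prior(theta_prior))
```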
Next, we define the parameters of the MCMC search:
```python
ntemps = 4
log10temperature_min = -1
nwalkers = 100
nsteps = [1000, 1000]
```
These can be considered the *tuning parameters* of the search; a complete
discussion of these can be found [here](tuning_parameters.md).

Passing all this to the MCMC search, we also need to give it a label, a
directory in which to save the data, and `sftfilepath`, a string matching
the data to use in the search:
```
mcmc = MCMCSearch('fully_coherent', 'data', sftfilepath='data/*basic*sft',
theta_prior=theta_prior, tref=tref, tstart=tstart, tend=tend,
nsteps=nsteps, nwalkers=nwalkers, ntemps=ntemps,
scatter_val=1e-10)
```python
mcmc = MCMCSearch(label='fully_coherent_search_using_MCMC', outdir='data',
sftfilepath='data/*basic*sft', theta_prior=theta_prior,
tref=tref, tstart=tstart, tend=tend, nsteps=nsteps,
nwalkers=nwalkers, ntemps=ntemps,
log10temperature_min=log10temperature_min)
```
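Before running, it can be worth checking that the `sftfilepath` pattern actually matches the generated data; a quick check, assuming the pattern is interpreted as a standard file glob:

```python
import glob

# Should list the .sft files produced by make_fake_data.py
print(glob.glob('data/*basic*sft'))
```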
To run the simulation, we call
```python
mcmc.run()
```
This produces two `.png` images. The first shows the position of the *walkers*
during the simulation:
![](img/fully_coherent_using_MCMC_walkers.png)
This shows (in red) the position of the walkers during the burn-in stage. They
are initially diffuse (they start from positions randomly picked from the prior),
but eventually converge to a single stable solution. The black points show the
production period from which posterior estimates are made. The bottom panel is
a histogram of `twoF`, split into burn-in and production. Note that early on
there are multiple modes corresponding to other peaks in the likelihood; by
using parallel tempering, we allow the walkers to explore all of these peaks
and settle on the strong central candidate.
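As an aside, `ntemps` and `log10temperature_min` control the parallel-tempering ladder. A rough sketch of the inverse temperatures this implies, assuming they are spaced log-uniformly between 1 and `10**log10temperature_min` (the convention used internally may differ):

```python
import numpy as np

ntemps = 4
log10temperature_min = -1

# Assumed convention: inverse temperatures spaced log-uniformly between
# 1 (the target posterior) and 10**log10temperature_min
betas = np.logspace(0, log10temperature_min, ntemps)
print(betas)  # [1.0, ~0.46, ~0.22, 0.1]
```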
To get posteriors, we call
```python
mcmc.plot_corner()
```
which produces a corner plot
![](img/fully_coherent_using_MCMC_corner.png)
illustrating the tightly constrained posteriors on `F0` and `F1` and their
covariance. Furthermore, a summary can be printed to the terminal via
```python
mcmc.print_summary()
```
which gives the maximum twoF value and its parameters, together with the median
and standard deviation of the production-stage samples; in this case the output is
```
Summary:
theta0 index: 0
Max twoF: 1771.50622559 with parameters:
F0 = 2.999999874e+01
F1 = -9.999802960e-11
Median +/- std for production values
F0 = 2.999999873e+01 +/- 6.004803009e-07
F1 = -9.999801583e-11 +/- 9.359959909e-16
```
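As a quick sanity check on these numbers, the quoted standard deviations correspond to very small fractional uncertainties on the recovered parameters:

```python
F0_median, F0_std = 2.999999873e+01, 6.004803009e-07
F1_median, F1_std = -9.999801583e-11, 9.359959909e-16

print(F0_std / F0_median)       # ~2e-8 fractional uncertainty on F0
print(abs(F1_std / F1_median))  # ~1e-5 fractional uncertainty on F1
```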
# Making fake data
Here, we describe the steps required to generate fake data which will be used
throughout the other examples. We will generate data based on the properties of
the Crab pulsar, first as a smooth CW signal, then as a CW signal which
contains one glitch, and finally as a signal with two glitches. This document
is based on the file [make_fake_data.py](../examples/make_fake_data.py).
## Smooth signal
In the following code segment, we import the `Writer` class used to generate
fake data, define the Crab parameters, and create an instance of the `Writer`
called `data`:
```python
from pyfstat import Writer
# Define parameters of the Crab pulsar
data = Writer(...)
```
We can now use the `data` object to create `.sft` files which contain a smooth
signal in Gaussian noise. In detail, the process first consists of calling
```python
data.make_cff()
```
which generates a file `data/basic.cff` (notice that the directory and file
name reflect the output directory and label given to the `Writer`).
Finally, we generate the `.sft` files by calling
```python
data.run_makefakedata()
```
To simulate a signal containing a glitch, we start with a simple case in which
the glitch occurs half way through the observation span. We define the
properties of this signal, create another `Writer` instance called
`glitch_data`, and then run `make_data()`:
```python
dtglitch = duration/2.0
delta_F0 = 0.4e-5
delta_F1 = 0
glitch_data = Writer(...)
glitch_data.make_data()
```
Finally, the `Writer` class also provides a wrapper of `lalapps_PredictFstat`.
So calling
```python
>>> print data.predict_fstat()
1721.1
```
Glitch parameters can also be given as lists in order to simulate multiple
glitches; these are applied sequentially and additively, as implemented in
`pyfstat.BaseSearchClass.calculate_thetas`. Here is an example with two
glitches, one a quarter of the way through the observation and the other a
fifth from the end:
```python
dtglitch = [duration/4.0, 4*duration/5.0]
delta_phi = [0, 0]
delta_F0 = [0.4e-5, 0.3e-6]
```
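To make the sequential, additive convention concrete, here is a small sketch (not using `pyfstat` itself) of the frequency in each segment implied by the two-glitch parameters above:

```python
F0 = 30.0
delta_F0 = [0.4e-5, 0.3e-6]

# Each glitch offset is added on top of the previous segment's frequency
F0_segments = [F0]
for dF0 in delta_F0:
    F0_segments.append(F0_segments[-1] + dF0)

print(F0_segments)  # frequency before, between, and after the two glitches
```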
# Tuning parameters
This page needs to be completed with a description of the tuning parameters,
how they should be set, and the diagnostic tools which can be used in doing so.
tstart = 1000000000
duration = 100*86400
tend = tstart + duration
theta_prior = {'F0': {'type': 'unif', 'lower': F0*(1-1e-6), 'upper': F0*(1+1e-6)},
               'F1': {'type': 'unif', 'lower': F1*(1+1e-2), 'upper': F1*(1-1e-2)},
               'F2': F2,
               'Alpha': Alpha,
               'Delta': Delta
               }
ntemps = 4
log10temperature_min = -1
nwalkers = 100
nsteps = [1000, 1000]
mcmc = MCMCSearch(label='fully_coherent_search_using_MCMC', outdir='data',
                  sftfilepath='data/*basic*sft', theta_prior=theta_prior,
                  tref=tref, tstart=tstart, tend=tend, nsteps=nsteps,
                  nwalkers=nwalkers, ntemps=ntemps,
                  log10temperature_min=log10temperature_min)
mcmc.run()
mcmc.plot_corner(add_prior=True)
mcmc.print_summary()
from pyfstat import Writer
# First, we generate data with a reasonably strong smooth signal
# Define parameters of the Crab pulsar as an example
F0 = 30.0
F1 = -1e-10
F2 = 0
glitch_data = Writer(...)
glitch_data.make_data()
# The predicted twoF, given by lalapps_predictFstat, can be accessed by
print data.predict_fstat()