Commit edad8ef3 authored by Gregory Ashton

Various improvements to the paper

parent 4cac23fe
......@@ -28,10 +28,10 @@ data = pyfstat.Writer(
Delta=Delta, h0=h0, sqrtSX=sqrtSX)
data.make_data()
m = 0.001
m = 1
dF0 = np.sqrt(12*m)/(np.pi*duration)
DeltaF0 = 800*dF0
F0s = [F0-DeltaF0/2., F0+DeltaF0/2., dF0]
DeltaF0 = 30*dF0
F0s = [F0-DeltaF0/2., F0+DeltaF0/2., 1e-2*dF0]
F1s = [F1]
F2s = [F2]
Alphas = [Alpha]
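For reference, the spacing used above follows from the one-dimensional frequency metric, dF0 = sqrt(12 m)/(pi T). A minimal sketch of the calculation (the 100-day coherence time is an assumed value):

import numpy as np

# Metric-based frequency spacing for a chosen template-bank mismatch m:
# the one-dimensional frequency metric gives dF0 = sqrt(12 m) / (pi T).
def frequency_spacing(mismatch, duration):
    return np.sqrt(12 * mismatch) / (np.pi * duration)

duration = 100 * 86400                     # assumed 100-day coherence time [s]
print(frequency_spacing(1.0, duration))    # ~1.3e-7 Hz at mismatch m = 1
print(frequency_spacing(0.001, duration))  # ~4.0e-9 Hz at m = 0.001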
......@@ -47,18 +47,19 @@ frequencies = np.unique(search.data[:, xidx])
twoF = search.data[:, -1]
#mismatch = np.sign(x-F0)*(duration * np.pi * (x - F0))**2 / 12.0
ax.plot(frequencies, twoF, 'k', lw=1)
ax.plot(frequencies, twoF, 'k', lw=0.8)
DeltaF = frequencies - F0
sinc = np.sin(np.pi*DeltaF*duration)/(np.pi*DeltaF*duration)
A = np.abs((np.max(twoF)-4)*sinc**2 + 4)
ax.plot(frequencies, A, '-r', lw=1)
ax.plot(frequencies, A, 'r', lw=1.2, dashes=(0.2, 0.2))
ax.set_ylabel('$\widetilde{2\mathcal{F}}$')
ax.set_xlabel('Frequency')
ax.set_xlim(F0s[0], F0s[1])
dF0 = np.sqrt(12*1)/(np.pi*duration)
xticks = [F0-10*dF0, F0, F0+10*dF0]
ax.set_xticks(xticks)
xticklabels = ['$f_0 {-} 10\Delta f$', '$f_0$', '$f_0 {+} 10\Delta f$']
xticklabels = [r'$f_0 {-} 10\Delta f(\tilde{\mu}=1)$', '$f_0$',
r'$f_0 {+} 10\Delta f(\tilde{\mu}=1)$']
ax.set_xticklabels(xticklabels)
plt.tight_layout()
fig.savefig('{}/{}_1D.png'.format(search.outdir, search.label), dpi=300)
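The red curve plotted above is the standard expectation for the fully-coherent statistic at a frequency offset; a sketch of the underlying relations (assuming the usual non-central $\chi^2_4$ results for $\widetilde{2\mathcal{F}}$):

\begin{align}
\mathrm{E}\big[\widetilde{2\mathcal{F}}\big] = 4 + \tilde{\rho}^{2},
\qquad
\tilde{\rho}^{2}(\Delta f) \approx \tilde{\rho}^{2}(0)\,
\mathrm{sinc}^{2}(\pi\, \Delta f\, T),
\end{align}

which is why the code models the peak as $|(\max\widetilde{2\mathcal{F}} - 4)\,\mathrm{sinc}^{2}(\pi \Delta f T) + 4|$.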
......@@ -16,7 +16,7 @@ data_tstart = tstart - duration
data_tend = data_tstart + 3*duration
tref = .5*(data_tstart+data_tend)
h0 = 5e-24
h0 = 3e-24
sqrtSX = 1e-22
transient = pyfstat.Writer(
......@@ -494,3 +494,62 @@ year = {2015}
year={1998},
publisher={Taylor \& Francis}
}
@Article{ hobbs2010,
	author = {Hobbs, G. and Lyne, A.~G. and Kramer, M.},
	doi = {10.1111/j.1365-2966.2009.15938.x},
	issn = {0035-8711},
	journal = {Monthly Notices of the Royal Astronomical Society},
	month = feb,
	number = {2},
	pages = {1027-1048},
	title = {{An analysis of the timing irregularities for 366
	pulsars}},
	url = {http://doi.wiley.com/10.1111/j.1365-2966.2009.15938.x},
	volume = {402},
	year = {2010}
}
@article{ashton2017,
title = {{Statistical characterization of pulsar glitches and their
potential impact on searches for continuous gravitational
waves}},
author = {Ashton, G. and Prix, R. and Jones, D. I.},
note = {In prep},
year = {2017}
}
@article{Papa2016,
title = {{Hierarchical follow-up of subthreshold candidates of an all-sky Einstein@Home search for continuous gravitational waves on LIGO sixth science run data}},
author = {{Papa}, M.~A. and {Eggenstein}, H-B. and {Walsh}, S. and {Di Palma}, I. and {Allen}, B. and {Astone}, P. and {Bock}, O. and {Creighton}, T.~D. and {Keitel}, D. and {Machenschalk}, B. and {Prix}, R. and {Siemens}, X. and {Singh}, A. and {Zhu}, S.~J. and {Schutz}, B.~F.},
journal = {Phys. Rev. D},
volume = {94},
issue = {12},
pages = {122006},
numpages = {14},
year = {2016},
month = {Dec},
publisher = {American Physical Society},
doi = {10.1103/PhysRevD.94.122006},
url = {http://link.aps.org/doi/10.1103/PhysRevD.94.122006}
}
@article{dupuis2005,
title = {Bayesian estimation of pulsar parameters from gravitational wave data},
author = {Dupuis, R\'ejean J. and Woan, Graham},
journal = {Phys. Rev. D},
volume = {72},
issue = {10},
pages = {102002},
numpages = {9},
year = {2005},
month = {Nov},
publisher = {American Physical Society},
doi = {10.1103/PhysRevD.72.102002},
url = {http://link.aps.org/doi/10.1103/PhysRevD.72.102002}
}
......@@ -9,7 +9,7 @@
\newcommand{\tstart}{t_{\rm start}}
\newcommand{\tend}{t_{\rm end}}
\newcommand{\Nglitches}{N_{\rm g}}
\newcommand{\Tspan}{T_{\rm span}}
\newcommand{\Tspan}{T}
\newcommand{\Tcoh}{T_{\rm coh}}
\newcommand{\tref}{t_{\rm ref}}
\newcommand{\Nseg}{N_{\rm seg}}
......@@ -23,7 +23,7 @@
\newcommand{\ho}{h_0}
\newcommand{\homax}{\ho^{\rm max}}
\newcommand{\Bsn}{B_{\rm S/N}}
\newcommand{\Pic}{\Pi_{\rm c}}
\newcommand{\AmplitudePrior}{\Pi_\mathcal{A}}
\newcommand{\mutilde}{\tilde{\mu}}
\newcommand{\Sn}{S_{\rm n}}
\newcommand{\V}{\mathcal{V}}
Paper/grided_frequency_search_1D.png (image updated: 42.1 KB → 61.2 KB)
......@@ -12,7 +12,6 @@
\def\BasicExamplehzero{1.0{\times}10^{-24}}
\def\BasicExamplenburn{50.0}
\def\BasicExamplenprod{50.0}
\def\DirectedMC{}GlitchNoiseN{970.0}
\def\DirectedMCNoiseN{1.0{\times}10^{4}}
\def\DirectedMCNoiseOnlyMaximum{52.4}
\def\DirectedMConeGlitchNoiseN{10000}
......@@ -51,7 +51,7 @@
\begin{document}
\title{MCMC follow-up methods for continuous gravitational wave candidates}
\title{MCMC follow-up of continuous gravitational wave candidates}
\author{G. Ashton}
\email[E-mail: ]{gregory.ashton@ligo.org}
......@@ -67,18 +67,15 @@
\begin{abstract}
We detail methods to follow-up potential CW signals (as identified by
wide-parameter space semi-coherent searches) leveraging MCMC optimisation of
the $\mathcal{F}$-statistic. Such a framework provides a unique advantage when
used during the `zoom' (in which the coherence time is increased aiming to
localise the fully-coherent candidate) in that several candidates can be
effeciently followed up simultaneously. We describe an automated method to
define the number of zoom stages and verify such methods work in a Monte-Carlo
study. More, MCMC optimisation naturally produces parameter estimation for the
final fully-coherent candidate. Finally, we show that with minor modifications
the follow-up may allow for the CW waveform to be transient or undergo
glitches; this may allow the discovery of signals which would otherwise go
underdetected.
We describe methods to follow up potential continuous gravitational wave
candidates (as identified by wide-parameter space semi-coherent searches)
leveraging MCMC optimisation of the $\mathcal{F}$-statistic. Such a framework
provides a unique advantage both in following up standard signals (i.e. during
the `zoom stage', in which the coherence time is increased with the aim of
localising the fully-coherent candidate, and in parameter estimation) and in
the follow-up of non-standard waveforms. In particular, we provide and
demonstrate a glitch-robust statistic for the follow-up of glitching
continuous gravitational wave signals.
\end{abstract}
......@@ -91,8 +88,8 @@ underdetected.
\section{Introduction}
A possible target for the advanced gravitational wave detector network of LIGO
and Virgo are long-lived periodic sources called continuous waves (CWs).
Among the targets for the advanced gravitational wave detector network of LIGO
and Virgo are long-lived periodic sources called continuous gravitational waves (CWs).
Rapidly rotating nonaxisymmetric neutron stars are potentially capable of
producing detectable CWs which last much longer than typical observation spans.
There exist three well-known sources of the nonaxisymmetry: `mountains',
......@@ -107,49 +104,51 @@ nonaxisymmetric source produces a strain in the detector $h(t, \A, \blambda)$;
where $\A{=}\{h_0, \cos\iota, \psi, \phi_0\}$ is a vector of the four
\emph{amplitude-parameters} (expressed in `physical coordinates') and
$\blambda$ is a vector of the \emph{Doppler-parameters} consisting of the
sky-location, frequency $f$, and any spindown terms required by the search.
sky-location, frequency $f$, and any spindown terms $\{\dot{f}, \ddot{f},
\ldots\}$.
CW searches typically use a fully-coherent matched-filtering methods whereby a
template (the signal model at some specific set of parameters) is convolved
against the data resulting in a detection statistic. Since the signal
CW searches often use fully-coherent matched-filtering methods whereby a
\emph{template} (the signal model at some specific set of parameters) is
convolved against the data, resulting in a detection statistic (in this work, we
will consider the $\F$-statistic, defined in \citet{jks1998}). Since the signal
parameters are unknown, it is usual to perform this matched filtering over a
grid of points. Three search categories can be identified: \emph{targeted}
searches for a signal from a known electromagnetic pulsar where the Doppler
parameters are considered `known'; \emph{directed} searches in which the
location is known, but not the frequency and spin-down (i.e. searching for the
neutron star in a supernova remnant which does not have a known pulsar); and
\emph{all-sky} searches where none of the parameters are considered known.
Searching over more parameters amounts to an increase in the dimension of the
search space. Since the density of grid points required to resolve a signal
scales inversely with total coherence time $\Tcoh$ (the span of data used in
a fully-coherent matched filter), wide-parameter searches (such as the all-sky)
with many search dimensions over long durations of data are computationally
demanding.
grid of points to maximise the detection statistic. Three search categories
can be identified: \emph{targeted} searches for a signal from a known
electromagnetic pulsar where the Doppler parameters are considered `known';
\emph{directed} searches in which the location is known, but not the frequency
and spin-down (e.g. searching for a neutron star in a supernova remnant
which does not have a known pulsar); and \emph{all-sky} searches where none of
the parameters are considered known. Searching over more parameters amounts to
an increase in the dimension of the search space. Since the density of grid
points required to resolve a signal scales inversely with total coherence time
$\Tcoh$ (the span of data used in a fully-coherent matched filter),
wide-parameter searches (such as the all-sky) with many search dimensions over
long durations of data are computationally infeasible.
At a fixed computing cost, it has been shown (see for example \citep{brady1998,
prix2012}) that a semi-coherent search is more sensitive for unknown signals
than a fully-coherent search. While the exact details of how the search works
depends on the implementation, semi-coherent search work by splitting the total
observation span $\Tspan$ into $\Nseg$ segments (each lasting for $\Tcoh$) and
in each segment computes the fully-coherent detection statistic; the
semi-coherent detection statistic is then computed by some combination of all
segments summed at the same point in parameter space. Fundamentally, this gain
in sensitivity is because the width of a peak in the detection statistic due to
a signal is inversely proportional to the coherence time: shorter coherence times
make the peak wider and hence the a lower density of templates. This idea was
first proposed by \citet{brady2000} along with the first implementation, the
`Stack-slide' search. Since then, several modifications such as the
`Hough-transform' \citep{krishnan2004, astone2014}, and the `Powerflux' method
(first described in \citet{allyskyS42008}) have been proposed, implemented and
applied to gravitational wave searches.
depends on the implementation, a semi-coherent search involves dividing the
total observation span $\Tspan$ into $\Nseg$ segments (each lasting for
$\Tcoh$) and computing the fully-coherent detection statistic in each segment;
the semi-coherent detection statistic is then computed by some combination of
all segments summed at the same point in parameter space. Fundamentally, this
gain in sensitivity is because the width of a peak in the detection statistic
due to a signal is inversely proportional to the coherence time: shorter
coherence times make the peak wider and hence a lower density of templates is
required. This idea was first proposed by \citet{brady2000} along with the
first implementation, the `Stack-slide' search. Since then, several
modifications such as the `Hough-transform' \citep{krishnan2004, astone2014},
and the `Powerflux' method (first described in \citet{allyskyS42008}) have been
proposed, implemented and applied to gravitational wave searches.
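To make the combination step concrete, here is a minimal sketch assuming statistically independent segments and the simple sum of per-segment $2\F$ values as the combination:

import numpy as np
from scipy.stats import chi2, ncx2

Nseg = 10
rho2_per_segment = 4.0  # assumed per-segment non-centrality (squared SNR)

# Per-segment fully-coherent 2F values: non-central chi^2 with 4 dof.
twoF_segments = ncx2.rvs(df=4, nc=rho2_per_segment, size=Nseg)

# Semi-coherent statistic: sum the segments at the same parameter-space point.
twoF_hat = twoF_segments.sum()

# In pure Gaussian noise the sum is chi^2 with 4*Nseg dof, which gives the
# false-alarm probability of observing such a value.
print(twoF_hat, chi2.sf(twoF_hat, df=4 * Nseg))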
Wide parameter space searches produce a list of candidates with an associated
detection statistic which passes some threshold. In order to verify these
candidates, they are subjected to a \emph{followed-up}: a process of increasing
candidates, they are subjected to a \emph{follow-up}: a process of increasing
the coherence time, eventually aiming to calculate a fully-coherent detection
statistic over the maximal span of data. In essence, the semi-coherent search
is powerful as it spreads the significance of a candidate over a wider area of
parameter space, so a follow-up attempts to reverse this process and recover
parameter space; the follow-up attempts to reverse this process and recover
the maximum significance and tightly constrain the candidate parameters. The
original hierarchical follow-up of \citet{brady2000} proposed a two-stage method
(an initial semi-coherent stage followed directly by a fully-coherent search).
......@@ -161,35 +160,48 @@ computational cost.
The first implementation of a two-stage follow-up was given by
\citet{shaltev2013} and used the Mesh Adaptive Direct Search algorithm for
optimisation. At the time of publication, this method was limited to two stages
and could not handle binary parameters \comment{(I think these limitations have
now been removed, but I can't find a publication)}, however these are practical
limitations which can \comment{(have?)} be overcome.
\comment{Add something on multiple modes?}
optimisation. A multi-stage gridded approach was presented in \citet{Papa2016}.
In this paper, we propose an alternative hierarchical follow-up procedure using
Markov-Chain Monte-Carlo (MCMC) as the optimisation tool. In terms of the
semi-coherent to follow-up procedure, an MCMC tool is advantages due to it's
ability to trace the evolution multiple modes simultaneously through the
follow-up procedure and allow the optimisation to decide between them without
arbitrary cuts. In addition, MCMC methods also provide two further
advantages: they can calculate directly calculate Bayes factors, the significance
of a candidate and because they are `grid-less' one can arbitrarily vary the
waveform model without requiring an understanding of the underlying topology.
We will exploit this latter property to propose an additional step in the
follow-up procedure which allows for the CW signal to be either a transient-CW
(a periodic signal lasting $\mathcal{O}(\textrm{hours-weeks})$) or to undergo
glitches (as seen in pulsars).
Markov-Chain Monte-Carlo (MCMC) as the optimisation tool. An MCMC tool is
advantageous due to its ability to trace the evolution of multiple modes
simultaneously through the follow-up procedure, allowing the optimisation to
decide between them without arbitrary cuts.
The follow-up of canonical CW signals is a well studied problem and, provided
they behave as expected, the maximum loss of detection statistic (i.e. the
template bank mismatch) can be calculated. However, nature does not always
behave as expected and it is plausible that the CW signals in the data may
differ from the templates used in wide-parameter space searches. There are of
course an unlimited number of ways this may manifest given our ignorance of
neutron stars, but from the study of pulsars three obvious mechanisms present
themselves: \emph{transient-}, \emph{glitching-}, and \emph{noisy-} signals.
A search method for long-duration transient signals (i.e. signals which
last from hours to weeks) was first presented by
\citet{prix2011}; in Sec.~\ref{sec_transients} we will describe how MCMC
optimisation can be used to perform parameter estimation using this method.
In \citet{ashton2017}, we noted the need for a glitch-robust follow-up method
(i.e. a search capable of finding signals which undergo sudden changes in the
rotation frequency, as seen in pulsars); in response, in Sec.~\ref{sec_glitches}
we provide such a statistic, demonstrate its use on simulated glitching
signals, and discuss significance estimation. For noisy signals, specifically
motivated by the observation of so-called timing noise in radio pulsars (see
\citet{hobbs2010} for a review), we determined in \citet{ashton2015} that these
were unlikely to be of immediate concern.
The methods introduced in this paper have been implemented in the
\texttt{python} package \texttt{pyfstat}. All examples in this work, along
with tutorials, can be found at
\url{https://gitlab.aei.uni-hannover.de/GregAshton/PyFstat}.
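For orientation, a minimal sketch of how such an MCMC follow-up might be invoked with \texttt{pyfstat} (the label, data paths, prior ranges, and sampler settings here are illustrative assumptions; the package tutorials document the exact interface):

import pyfstat

# Illustrative prior: uniform in frequency about a candidate, with the
# remaining Doppler parameters held fixed (all values assumed).
theta_prior = {
    'F0': {'type': 'unif', 'lower': 29.9999, 'upper': 30.0001},
    'F1': -1e-10, 'F2': 0, 'Alpha': 1.0, 'Delta': 0.5,
}

mcmc = pyfstat.MCMCSearch(
    label='follow_up_example', outdir='data',
    sftfilepattern='data/*example*.sft',   # assumed path to the SFT data
    theta_prior=theta_prior, tref=1000000000,
    minStartTime=1000000000, maxStartTime=1010000000,
    nwalkers=100, nsteps=[300, 300],       # [burn-in, production] steps
    ntemps=2,
)
mcmc.run()
mcmc.print_summary()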
We begin in Section~\ref{sec_hypothesis_testing} with a review of search
methods from a Bayesian perspective. Then in
Section~\ref{sec_MCMC_and_the_F_statistic} we introduce the MCMC optimisation
procedure and give details of our particular implementation. In
Section~\ref{sec_follow_up} we will illustrate applications of the method and
provide a prescription for choosing the setup. In Sections~\ref{sec_transients}
and \ref{sec_glitches} we demonstrate how searches can be performed for either
transient or glitches CWs before we finally conclude in
Section~\ref{sec_conclusion}.
Section~\ref{sec_follow_up} we describe the zoom follow-up procedure and test
its efficiency. In Sections~\ref{sec_transients} and \ref{sec_glitches} we
demonstrate how searches can be performed for either transient or glitching CWs
before we finally conclude in Section~\ref{sec_conclusion}.
\section{Hypothesis testing}
\label{sec_hypothesis_testing}
......@@ -204,13 +216,13 @@ way to write the odds as
O_{\rm S/N} \equiv \frac{P(\Hs| x, I)}{P(\Hn| x, I)} =
\Bsn(x| I) \frac{P(\Hs| I)}{P(\Hn | I)},
\end{equation}
where the second factor is the prior odds while the first factor is the
where the second factor is the \emph{prior odds} while the first is the
\emph{Bayes factor}:
\begin{equation}
\Bsn(x| I) = \frac{P(x| \Hs, I)}{P(x| \Hn, I)}.
\end{equation}
Typically, we set the prior odds to unity such that it is the Bayes factor
which determines our confidence in the signal hypothesis. In this work we will
which determines our confidence in the signal hypothesis. In this work, we will
therefore discuss the Bayes factor with the implied assumption this is
equivalent to the odds, unless we have a good reason to change the prior odds.
......@@ -236,17 +248,18 @@ where
is the \emph{likelihood-ratio}.
At this point, we can appreciate the problems of searching for unknown signals:
one has four amplitude parameters and several Doppler parameters (three plus
the number of spin-down and binary parameters) over which this integral must be
performed. If a single signal exists in the data, this corresponds to a single
peak in the likelihood-ratio, but at an unknown location. Therefore, one must
must first search for peaks (candidates), and then subsequently analyse their
significance. If one has prior knowledge this can be used; for example in
targeted searches for gravitational waves from known pulsars emitting CWs from
a mountain the Doppler parameters are considered known collapsing their
integral to a single evaluation.
\subsection{The $\F$-statistic}
one has four amplitude parameters and several Doppler parameters over which
this integral must be performed. If a single signal exists in the data, this
corresponds to a single peak in the likelihood-ratio, but at an unknown
location. Therefore, one must first search for peaks (candidates), and then
subsequently analyse their significance. If one has prior knowledge this can
be used; for example, in targeted searches for gravitational waves from known
pulsars emitting CWs from a mountain, the Doppler parameters are considered
known, collapsing their integral to a single evaluation (see for example the
method proposed by \citet{dupuis2005}).
\comment{Should we cite other known-pulsar methodologies?}
\subsection{The $\F$-statistic: a frequentist perspective}
For directed and all-sky searches, a common method introduced by
\citet{jks1998} to reduce the parameter space is the maximum-likelihood
......@@ -257,9 +270,11 @@ this likelihood-ratio with respect to the four amplitude parameters results
(c.f.~\citet{prix2009}) in a maximised log-likelihood given by $\F(x|
\blambda)$: the so-called $\F$-statistic. Picking a particular set of Doppler
parameters $\blambda$ (the template) one can then compute a detection statistic
(typically $2\F$ is used) which can be used to quantify the significance of the
template. Usually this is done by calculating a corresponding false alarm rate,
the probability of seeing such a detection statistic in Gaussian noise.
(typically $\twoFtilde$ is used\footnote{Note that the `tilde' denotes that it
is a fully-coherent quantity, while we use a `hat' to denote a semi-coherent
quantity.}) which can be used to quantify the significance of the template.
Usually this is done by calculating a corresponding false alarm rate: the
probability of seeing such a detection statistic in Gaussian noise.
Calculations of the significance are possible due to the properties of the
detection statistic: in Gaussian noise, it can be shown \citep{jks1998,
......@@ -281,60 +296,52 @@ where $(h|h)$ is the inner product of the signal with itself (see for example
the detector and $\mathcal{N}$ is the number of detectors.
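These distributional properties make the false alarm rate straightforward to evaluate; a minimal sketch, using the standard results that $\twoFtilde$ is $\chi^2_4$-distributed in Gaussian noise and non-centrally $\chi^2_4$-distributed with non-centrality $\rho^2$ in the presence of a signal (the threshold and SNR values are assumed):

from scipy.stats import chi2, ncx2

threshold = 60.0  # assumed detection threshold on 2F

# In Gaussian noise, 2F follows a central chi^2 with 4 degrees of freedom.
p_false_alarm = chi2.sf(threshold, df=4)

# With a signal of squared SNR rho2 present, it follows a non-central chi^2.
rho2 = 70.0  # assumed squared SNR
p_detection = ncx2.sf(threshold, df=4, nc=rho2)

print(p_false_alarm, p_detection)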
\subsection{Using the $\F$-statistic to compute a Bayes factor} At first, it
\subsection{The $\F$-statistic: a Bayesian perspective}
At first, it
appears that the $\F$-statistic is independent of the Bayesian framework since
it was first derived directly from the likelihood. However, it was shown by
\citet{prix2009} that if we marginalise over the four amplitude parameters of
Equation~\eqref{eqn_full_bayes}, choosing a prior $\Pic$ such that
\begin{equation}
P(\A| \Hs, \Pic, I) \equiv \left\{
\begin{array}{ll}
C & \textrm{ for } \ho < \homax \\
0 & \textrm{ otherwise}
\end{array}
\right..
\end{equation}
it was first derived directly from the likelihood. However, it was shown in
\citet{prix2009, prix2011} that if we marginalise over the four amplitude parameters of
Equation~\eqref{eqn_full_bayes} (using the ``$\F$-statistic prior'' \citep{prix2011},
which we denote by $\AmplitudePrior$)
then the integral, when $\homax \gg 1$, is a Gaussian integral and can be
computed analytically as
\begin{align}
\Bsn(x| \Pic, \blambda, I) & \equiv
\begin{split}
\Bsn(x| \blambda, \AmplitudePrior, I) & \equiv
\int
\mathcal{L}(x ;\A, \blambda)
P(\A| \Hs, I) d\A
P(\A| \Hs, \AmplitudePrior, I) d\A
\label{eqn_B_S_N}
\\
& = \frac{C (2\pi)^{2} e^{\F(x| \blambda)}}
{\sqrt{\textrm{det} \mathcal{M}}},
& \propto e^{\tilde{\F}(x| \blambda)}
\end{split}
\end{align}
where $C$ is a normalisation constant, $\textrm{det}\mathcal{M}$ is an antenna
pattern factor dependent on the sky-position and observation period, and
$\mathcal{F}$ is the frequentist log-likelihood of \citet{jks1998}. This result
where $\mathcal{F}$ is exactly the $\F$-statistic, first derived
as a maximised log-likelihood \citep{jks1998}. This result
demonstrates that the $\mathcal{F}$-statistic is proportional to the log-Bayes
factor when calculated with a uniform prior on the amplitude parameters and
fixed Doppler parameters.
As such, we can define the Bayes-factor of Equation~\eqref{eqn_full_bayes} as
\begin{equation}
\Bsn(x| \Pic, I) = \int
\Bsn(x| \Pic, \blambda, I) P(\blambda| \Hs, I)
\Bsn(x| \AmplitudePrior, I) = \int
\Bsn(x| \AmplitudePrior, \blambda, I) P(\blambda| \Hs, I)
d\blambda,
\end{equation}
or neglecting the constants
\begin{equation}
\Bsn(x| \Pic, I) \propto \int
e^{\F(x| \blambda)} P(\blambda| \Hs, I)
\Bsn(x| \AmplitudePrior, I) \propto \int
e^{\tilde{\F}(x| \blambda)} P(\blambda| \Hs, I)
d\blambda.
\label{eqn_bayes_over_F}
\end{equation}
Formulating the significance of a CW candidate in this way is pragmatic in that
there exists a wealth of well-tested tools \citep{lalsuite} capable of
computing the $\mathcal{F}$-statistic for CW signals, transient-CWs, and CW
signals from binary systems; these can be leveraged to compute
Equation~\eqref{eqn_bayes_over_F}, or adding in the constant
$\Bsn(x| \Pic)$ itself. The disadvantage to this method is that
we are forced to use the prior $\Pic$, which was shown by \citet{prix2009} to
be unphysical.
there exists a wealth of well-tested tools (e.g. \texttt{LALSuite}
\citep{lalsuite}) capable of computing the $\mathcal{F}$-statistic for CW
signals, transient-CWs, and CW signals from binary systems; these can be
leveraged to compute Equation~\eqref{eqn_bayes_over_F}.
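As a toy illustration of Equation~\eqref{eqn_bayes_over_F}, the marginalisation can be carried out by direct quadrature over a one-dimensional frequency prior; the Gaussian-shaped $\twoFtilde$ profile below is an assumed stand-in for a real $\F$-statistic computation:

import numpy as np

# Assumed stand-in for 2F(f): a noise floor of 4 plus a signal peak.
f0, rho2, width = 30.0, 60.0, 1e-6
freqs = np.linspace(f0 - 5e-5, f0 + 5e-5, 10001)
twoF = 4 + rho2 * np.exp(-0.5 * ((freqs - f0) / width) ** 2)

# Uniform prior P(lambda|Hs) over the searched band.
prior = np.ones_like(freqs) / (freqs[-1] - freqs[0])

# B propto int exp(F) P(lambda) dlambda; use log-sum-exp for stability.
F = twoF / 2.0
logB = np.log(np.trapz(np.exp(F - F.max()) * prior, freqs)) + F.max()
print(logB)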
\section{MCMC and the $\mathcal{F}$-statistic}
\label{sec_MCMC_and_the_F_statistic}
......@@ -346,23 +353,22 @@ given given some data $x$ for some hypothesis $\H$. Namely, Bayes rule
P(\theta| x, \H, I) \propto P(x| \theta, \H, I)P(\theta| \H, I),
\label{eqn_bayes_for_theta}
\end{equation}
is used to evaluate proposed jumps from one point in parameter to other points;
jumps which increase this probably are accepted with some probability. The
algorithm, proceeding in this way, is highly efficient at resolving peaks in
high-dimension parameter spaces.
is used to evaluate proposed steps in parameter space. After some
initialisation, the samples (points visited during the stepping process) can be
used as samples of the posterior distribution. The algorithm is highly
efficient at resolving peaks in high-dimensional parameter spaces.
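A minimal Metropolis--Hastings sketch of this accept/reject logic on a toy one-dimensional posterior (the \texttt{emcee} sampler introduced below uses a more sophisticated affine-invariant ensemble scheme, but the principle is the same):

import numpy as np

def log_posterior(x):
    # Toy un-normalised log-posterior, standing in for F + log-prior.
    return -0.5 * x ** 2

rng = np.random.default_rng(0)
x, chain = 0.0, []
for _ in range(10000):
    # Propose a step from a symmetric (Gaussian) proposal density...
    x_new = x + rng.normal(scale=0.5)
    # ...and accept it with probability min(1, P(new)/P(old)).
    if np.log(rng.uniform()) < log_posterior(x_new) - log_posterior(x):
        x = x_new
    chain.append(x)

# Discard the initialisation (burn-in); the rest sample the posterior.
samples = np.array(chain[2000:])
print(samples.mean(), samples.std())  # ~0 and ~1 for this toy target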
At this point, we note the equivalence of Equation~\eqref{eqn_bayes_for_theta}
to the integrand of Equation~\eqref{eqn_bayes_over_F}:
\begin{equation}
P(\blambda | x, \Pic, \Hs, I)
%=\Bsn(x| \Pic, \blambda) P(\blambda| \Hs I).
P(\blambda | x, \AmplitudePrior, \Hs, I)
%=\Bsn(x| \AmplitudePrior, \blambda) P(\blambda| \Hs I).
\propto e^{\F(x| \blambda)} P(\blambda| \Hs I),
\label{eqn_lambda_posterior}
\end{equation}
where $e^{\F}$ is the likelihood.
In this work, we will focus on using MCMC methods to sample this, the
posterior distribution of the Doppler parameters and moreover compute the
final Bayes factor.
posterior distribution of the Doppler parameters.
\subsection{The \texttt{emcee} sampler}
......@@ -390,11 +396,11 @@ modification, the parallel-tempered ensemble sampler. A parallel
tempered MCMC simulation, first proposed by \citet{swendsen1986}, runs
$\Ntemps$ simulations in parallel with the likelihood in the $i^{\rm th}$
parallel simulation is raised to a power of $1/T_{i}$ where $T_i$ is referred
to as the temperature. As such, Equation~\eqref{eqn_lambda_posterior} is
to as the \emph{temperature}. As such, Equation~\eqref{eqn_lambda_posterior} is
written as
\begin{equation}
P(\blambda | T_i, x, \Pic, \Hs, I)
%=\Bsn(x| \Pic, \blambda)^{T_i} P(\blambda| \Hs I).
P(\blambda | T_i, x, \AmplitudePrior, \Hs, I)
%=\Bsn(x| \AmplitudePrior, \blambda)^{1/T_i} P(\blambda| \Hs I).
\propto (e^{\F(x| \blambda)})^{1/T_i} P(\blambda| \Hs I).
\end{equation}
Setting $T_0=1$ with $T_i > T_0 \; \forall \; i > 1$, such that the lowest
......@@ -403,25 +409,25 @@ temperatures the likelihood is broadened (for a Gaussian likelihood, the
standard deviation is larger by a factor of $\sqrt{T_i}$). Periodically, the
algorithm swaps the position of the walkers between the different
temperatures. This allows the $T_0$ chain (from which we draw samples of the
posterior) to efficiently sample from multi-modal posteriors. This introduces
posterior) to efficiently sample from multi-modal posteriors. This method introduces
two additional tuning parameters, the number and range of the set of
temperatures $\{T_i\}$; we will discuss their significance when relevant.
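A toy sketch of the scheme, assuming a bimodal likelihood, Metropolis updates within each temperature, and the standard swap acceptance rule between neighbouring temperatures (all numerical settings assumed):

import numpy as np

rng = np.random.default_rng(1)

def log_like(x):
    # Toy bimodal log-likelihood, standing in for F(x|lambda).
    return np.logaddexp(-0.5 * ((x - 4) / 0.1) ** 2,
                        -0.5 * ((x + 4) / 0.1) ** 2)

betas = 1.0 / np.array([1.0, 4.0, 16.0])  # T_0 = 1 targets the posterior
xs = rng.uniform(-10, 10, size=betas.size)
samples = []

for step in range(20000):
    # Metropolis update within each temperature on L(x)^(1/T_i)
    # (a flat prior is assumed here).
    for i, beta in enumerate(betas):
        prop = xs[i] + rng.normal(scale=1.0)
        if np.log(rng.uniform()) < beta * (log_like(prop) - log_like(xs[i])):
            xs[i] = prop
    # Periodically propose swaps between neighbouring temperatures.
    if step % 10 == 0:
        for i in range(betas.size - 1):
            dlog = (betas[i] - betas[i + 1]) * (log_like(xs[i + 1]) - log_like(xs[i]))
            if np.log(rng.uniform()) < dlog:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    samples.append(xs[0])

# The T_0 chain should visit both modes roughly equally.
print(np.mean(np.array(samples[5000:]) > 0))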
\subsection{Parallel tempering: estimating the Bayes factor}
\comment{Greg: I don't think we actually need this since we aren't using the Bayes factor}
In addition, parallel-tempering also offers a robust method to estimate the
Bayes factor of Equation~\eqref{eqn_bayes_over_F}. If we define
$\beta\equiv1/T$, the inverse temperature and $Z(\beta)\equiv \Bsn(x| \Pi_{\rm
c}, I)$, then as noted by \citet{goggans2004} for the general case, we may
$\beta\equiv1/T$, the inverse temperature and $Z(\beta)\equiv \Bsn(x| \AmplitudePrior, I)$, then as noted by \citet{goggans2004} for the general case, we may
write
\begin{align}
\frac{1}{Z} \frac{\partial Z}{\partial \beta}=
\frac{
\int \Bsn(x| \Pic, \blambda)^{\beta}
\log(\Bsn(x| \Pic, \blambda))P(\blambda| I)
\int \Bsn(x| \AmplitudePrior, \blambda)^{\beta}
\log(\Bsn(x| \AmplitudePrior, \blambda))P(\blambda| I)
d\blambda
}
{
\int \Bsn(x| \Pic, \blambda)^{\beta})P(\blambda| I)
\int \Bsn(x| \AmplitudePrior, \blambda)^{\beta}P(\blambda| I)
d\blambda
}
\end{align}
......@@ -429,20 +435,20 @@ The right-hand-side expresses the average of the log-likelihood at $\beta$. As
such, we have
\begin{align}
\frac{\partial \log Z}{\partial \beta} =
\langle \log(\Bsn(x| \Pic, \blambda) \rangle_{\beta}
\langle \log \Bsn(x| \AmplitudePrior, \blambda) \rangle_{\beta}
\end{align}
The log-likelihood are a calculated during the MCMC sampling. As such, one
The log-likelihood values are calculated during the MCMC sampling. As such, one
can numerically integrate to get the Bayes factor, i.e.
\begin{align}
\log \Bsn(x| \Pic, I) = \log Z = \int_{0}^{1}
\langle \log(\Bsn(x| \Pic, \blambda) \rangle_{\beta} d\beta.
\log \Bsn(x| \AmplitudePrior, I) = \log Z = \int_{0}^{1}
\langle \log \Bsn(x| \AmplitudePrior, \blambda) \rangle_{\beta} d\beta.
\end{align}
In practice, we use a simple numerical quadrature over a finite ladder of
$\beta_i$ with the smallest chosen such that choosing a smaller value does not
change the result beyond other numerical uncertainties. Typically, getting
accurate results for the Bayes factor requires a substantially larger number of
temperatures than are required for efficiently sampling multi-modal
distributions. Therefore, it is recommended that one uses a small number of
distributions. Therefore, it is recommended that one uses a few
temperatures during the search stage, and subsequently a larger number of
temperatures (suitably initialised close to the target peak) when estimating
the Bayes factor.
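A minimal sketch of this quadrature, assuming the per-temperature mean log-likelihoods have already been collected from a parallel-tempered run (the numerical values below are illustrative only):

import numpy as np

# Mean log-likelihood <log B>_beta at each rung of the inverse-temperature
# ladder, ordered from beta = 0 to beta = 1 (assumed toy values).
betas = np.array([0.0, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0])
mean_logl = np.array([0.0, 2.1, 8.7, 14.2, 21.5, 26.0, 29.3])

# log Z = int_0^1 <log B>_beta dbeta, by trapezoidal quadrature.
print('log Bayes factor estimate:', np.trapz(mean_logl, betas))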
......@@ -455,9 +461,9 @@ poorly if the posterior has multiple small maxima with only a small number of
large maxima which occupy a small fraction of the prior volume. Later on, in
Section~\ref{sec_topology} we will show that for a signal in Gaussian noise our
log-likelihood $\F$ will have a single peak due to the signal which is
surrounded by other peaks particular to given noise realisation. The relative
size of the these peaks to the signal peak depends on the SNR. Howeever,
regardless of the strengt of the signal peak, if the width of the signal peak
surrounded by other smaller peaks particular to the given noise realisation.
The relative size of these peaks to the signal peak depends on the SNR. However,
regardless of the strength of the signal peak, if the width of the signal peak
is small compared to the prior volume, the sampler will get `stuck' on the
local maxima and be inefficient at finding the global maxima. This problem is
exacerbated in higher-dimensional search spaces where the volume fraction of
......@@ -467,14 +473,14 @@ the signal scales with the exponent of the number of dimensions.
In a traditional CW search which uses a grid of templates (also known as a
template bank), the spacings of the grid are chosen such that the loss of
signal to noise ratio (SNR) is bounded to less than $\mu$, the
SNR is bounded to less than $\mu$, the
\emph{template-bank mismatch}. The optimal choice of grid then consists of
minimising the computing cost while respecting this minimum template-bank
mismatch or vice versa (for examples see \citet{pletsch2010, prix2012,
wette2013, wette2015}). We will now discuss how the work on setting up these
grids can be applied to the problem of determining whether the setup is
appropriate for an MCMC method: i.e. given the prior volume do we expect a
signal to occupy a non-negligible volume?
signal to occupy a non-negligible fraction of the volume?
For a fully-coherent $\F$-statistic search on data containing Gaussian noise
and a signal with Doppler parameters $\blambdaSignal$, the template-bank
......@@ -489,23 +495,23 @@ parameter (c.f. Equation~\ref{eqn_twoF_expectation}) at $\blambda_l$, given
that the signal is at $\blambdaSignal$; as such $\tilde{\rho}(\blambdaSignal;
\blambdaSignal)$ is the perfectly-matched non-centrality parameter. For a
fully-coherent search, this non-centrality parameter is equivalent to
fully-coherent matched-filter signal to noise ratio (SNR). However, as noted by
the fully-coherent matched-filter SNR. However, as noted by
\citet{leaci2015}, this is true for the fully-coherent case only. Therefore,
we choose to use the non-centrality parameter which easily generalises to the
semi-coherent case.
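For reference, the template-bank mismatch referred to here (whose defining equation falls outside the diff context above) is conventionally written in terms of the non-centrality parameter as

\begin{equation}
\tilde{\mu}(\blambda_l; \blambdaSignal) \equiv
\frac{\tilde{\rho}^{2}(\blambdaSignal; \blambdaSignal) -
      \tilde{\rho}^{2}(\blambda_l; \blambdaSignal)}
     {\tilde{\rho}^{2}(\blambdaSignal; \blambdaSignal)},
\end{equation}

a sketch of the standard convention (c.f. \citet{prix2012, wette2013}); the paper's own equation may differ in notation.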