Gregory Ashton / PyFstat · Commits

Commit 65b75b77, authored May 11, 2017 by Gregory Ashton

Update to Bayes factor intro

parent 36fa6859
Changes: 3

Paper/bibliography.bib @ 65b75b77
...
...
@@ -553,3 +553,9 @@ year = {2015}
url = {http://link.aps.org/doi/10.1103/PhysRevD.72.102002}
}

@book{mackay2003information,
  title = {Information theory, inference and learning algorithms},
  author = {MacKay, David JC},
  year = {2003},
  publisher = {Cambridge University Press}
}
Paper/definitions.tex @ 65b75b77
...
...
@@ -31,4 +31,5 @@
\newcommand{\Vpe}{\V_{\rm PE}}
\newcommand{\smax}{s_{\textrm{max}}}
\newcommand{\fmax}{f_{\textrm{max}}}
\newcommand{\rhohatmax}{\hat{\rho}_{\mathrm{max}}}
Paper/paper_cw_mcmc.tex @ 65b75b77
...
...
@@ -203,172 +203,217 @@ its efficient. In Sections~\ref{sec_transients} and \ref{sec_glitches} we
demonstrate how searches can be performed for either transient or glitching CWs
before we finally conclude in Section~\ref{sec_conclusion}.
\section{Hypothesis testing}
\label{sec_hypothesis_testing}

\section{MCMC and the $\F$-statistic}
\label{sec_MCMC_and_the_F_statistic}
\subsection{Bayes factors}

Given some data $x$ and a set of background assumptions $I$, we formulate
two hypotheses: $\Hn$, that the data contains solely Gaussian noise, and
$\Hs$, that the data contains an additive mixture of noise and a signal
$h(t; \A, \blambda)$.
In order to make a quantitative comparison, we use Bayes' theorem in the usual
way to write the odds as
\begin{equation}
O_{\rm S/N} \equiv \frac{P(\Hs| x, I)}{P(\Hn| x, I)} =
\Bsn(x| I) \frac{P(\Hs| I)}{P(\Hn| I)},
\end{equation}
where the second factor is the \emph{prior odds} while the first is the
\emph{Bayes factor}:
\begin{equation}
\Bsn(x| I) = \frac{P(x| \Hs, I)}{P(x| \Hn, I)}.
\end{equation}
Typically, we set the prior odds to unity such that it is the Bayes factor
which determines our confidence in the signal hypothesis. In this work, we will
therefore discuss the Bayes factor with the implied assumption that this is
equivalent to the odds, unless we have a good reason to change the prior odds.

In this section, we will give a brief introduction to the use of the
$\F$-statistic in a hypothesis testing framework, introduce conceptual ideas
about the use of Markov-Chain Monte-Carlo (MCMC) algorithms in this context,
and develop the idea of using MCMC routines in CW searches.
We can rewrite the Bayes factor in terms of the two sets of signal parameters
as
\begin{equation}
\Bsn(x| I) = \frac{P(x, \A, \blambda| \Hs, I)}
{P(\A| \Hs, I)P(\blambda| \Hs, I)P(x| \Hn, I)}.
\end{equation}
Marginalising over the two sets of parameters, we find that
\begin{equation}
\Bsn(x| I)= \iint
\mathcal{L}(x; \A, \blambda)
P(\A| \Hs, I)P(\blambda| \Hs, I)
\, d\blambda \, d\A
\label{eqn_full_bayes}
\end{equation}
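As a quick sanity check of this marginalisation (not from the paper), Equation~\eqref{eqn_full_bayes} can be approximated by a direct grid sum for a toy one-dimensional amplitude and one-dimensional Doppler parameter. The Gaussian-bump likelihood, its width, and the flat priors below are illustrative assumptions, chosen so the integral has a known analytic value:

```python
import math

# Toy stand-in for the likelihood-ratio L(x; A, lambda): a narrow 2-D Gaussian
# bump (peak location and width SIGMA are arbitrary, illustrative choices).
SIGMA = 0.3

def likelihood_ratio(a, lam):
    return math.exp(-0.5 * (((a - 2.0) / SIGMA) ** 2
                            + ((lam - 2.0) / SIGMA) ** 2))

# Flat, normalised priors P(A|Hs, I) and P(lambda|Hs, I) on [0, 4].
lo, hi, n = 0.0, 4.0, 400
da = dlam = (hi - lo) / n
prior = 1.0 / (hi - lo)

# Midpoint-rule approximation of
# B_S/N = iint L(x; A, lambda) P(A|Hs, I) P(lambda|Hs, I) dA dlambda.
bsn = 0.0
for i in range(n):
    a = lo + (i + 0.5) * da
    for j in range(n):
        lam = lo + (j + 0.5) * dlam
        bsn += likelihood_ratio(a, lam) * prior * prior * da * dlam

# For this toy model the double integral is analytic: 2*pi*SIGMA^2 * prior^2.
analytic = 2.0 * math.pi * SIGMA ** 2 * prior ** 2
print(bsn, analytic)
```

The grid sum agrees with the analytic Gaussian integral to high precision, since the peak sits well inside the prior range.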
where
\begin{equation}
\mathcal{L}(x; \A, \blambda) \equiv \frac{P(x| \A, \blambda, \Hs, I)}{P(x| \Hn, I)},
\label{eqn_likelihood}
\end{equation}
is the \emph{likelihood-ratio}.

\subsection{Hypothesis testing framework}
\label{sec_hypothesis_testing}

In \citet{prix2009}, a framework was introduced demonstrating the use of the
$\F$-statistic (first discussed in \citet{jks1998}) in defining the \emph{Bayes
factor} $\Bsn(\boldsymbol{x})$ between the signal hypothesis and the noise
hypothesis computed on some data $\boldsymbol{x}$. It was shown that, for an
unphysical uniform prior on the amplitude parameters, the log-Bayes factor was
proportional to the $\F$-statistic. In this work, we will use the modification
proposed in \citet{prix2011}, which removed the antenna-pattern weighting
factor; let us now briefly review this framework.

The Bayes-factor between the signal and noise hypotheses, given the
`$\F$-statistic prior', is
\begin{equation}
\Bsn(\boldsymbol{x}) = \int \Bsn(\boldsymbol{x}| \blambda)P(\blambda)\, d\blambda.
\label{eqn_Bsn_x}
\end{equation}
At this point, we can appreciate the problems of searching for unknown signals:
one has four amplitude parameters and several Doppler parameters over which
this integral must be performed. If a single signal exists in the data, this
corresponds to a single peak in the likelihood-ratio, but at an unknown
location. Therefore, one must first search for peaks (candidates), and
then subsequently analyse their significance. If one has prior knowledge, this
can be used; for example, in targeted searches for gravitational waves from
known pulsars emitting CWs from a mountain, the Doppler parameters are
considered known, collapsing their integral to a single evaluation (see, for
example, the method proposed by \citet{dupuis2005}).
\comment{Should we cite other known-pulsar methodologies?}
\subsection{The $\F$-statistic: a frequentist perspective}

For directed and all-sky searches, a common method introduced by
\citet{jks1998} to reduce the parameter space is the maximum-likelihood
approach. In this approach (often referred to as `frequentist'), one starts by
defining the likelihood-ratio, Equation~\eqref{eqn_likelihood}, which in this
context is a \emph{matched-filtering} amplitude. Then, analytically maximising
this likelihood-ratio with respect to the four amplitude parameters results
(c.f.~\citet{prix2009}) in a maximised log-likelihood given by
$\F(x| \blambda)$: the so-called $\F$-statistic. Picking a particular set of
Doppler parameters $\blambda$ (the template), one can then compute a detection
statistic (typically $\twoFtilde$ is used\footnote{Note that the `tilde'
denotes that it is a fully-coherent quantity, while we use a `hat' to denote a
semi-coherent quantity.}) which can be used to quantify the significance of the
template. Usually this is done by calculating a corresponding false alarm rate:
the probability of seeing such a detection statistic in Gaussian noise.
Calculations of the significance are possible due to the properties of the
detection statistic: in Gaussian noise, it can be shown \citep{jks1998,
cutlershutz2005} that $\widetilde{2\F}$ follows a chi-squared distribution with
4 degrees of freedom. In the presence of a signal, $\widetilde{2\F}$ is
still chi-squared with 4 degrees of freedom, but has a non-centrality parameter
$\tilde{\rho}^{2}$ such that its expectation value is
\begin{equation}
\textrm{E}[\widetilde{2\F}(x; \blambda)] = 4 + \tilde{\rho}(x; \blambda)^2.
\label{eqn_twoF_expectation}
\end{equation}
The non-centrality parameter in this context is the SNR of the matched-filter,
given by
\begin{equation}
\rho^{2} = (h | h) \propto \frac{h_0^2}{\Sn}\Tcoh \mathcal{N},
\end{equation}
where $(h|h)$ is the inner product of the signal with itself (see, for example,
\citet{prix2009}), $\Sn$ is a (frequency-dependent) measure of the noise in
the detector and $\mathcal{N}$ is the number of detectors.

In Equation~\eqref{eqn_Bsn_x}, $\Bsn(\boldsymbol{x}| \blambda)$ is the
targeted Bayes factor (i.e.\ computed at a set of Doppler parameters
$\blambda$):
\begin{equation}
\Bsn(\boldsymbol{x}| \blambda) = \frac{70}{\rhohatmax^{4}}
e^{\F(\boldsymbol{x}; \blambda)}.
\label{eqn_CW_likelihood}
\end{equation}
Here, $\rhohatmax$ is a maximum cutoff SNR required to normalise the prior.
The original calculation was done for a transient-CW; in
Eqn~\eqref{eqn_CW_likelihood} we constrain the system to the special case of
standard CW signals, for which the marginalisation over the transient
parameters yields unity.
Furthermore, \citet{prix2011} also demonstrated that the semi-coherent Bayes
factor naturally arises by splitting the likelihood into $N$ independent
\emph{segments}, resulting in
\begin{equation}
\Bsn^{\textrm{SC}}(\boldsymbol{x}| \blambda) =
\left(\frac{70}{\rhohatmax^{4}}\right)^{N}
e^{\sum_{\ell=1}^{N} \F(\boldsymbol{x}_{(\ell)}; \blambda)}.
\label{eqn_SC_CW_likelihood}
\end{equation}
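The expectation value in Equation~\eqref{eqn_twoF_expectation} is easy to verify with a short Monte-Carlo sketch: in Gaussian noise plus a signal, $\widetilde{2\F}$ is a non-central chi-squared variable with 4 degrees of freedom, i.e.\ a sum of four squared unit-variance Gaussians whose means set $\tilde{\rho}^2$. The particular means below are arbitrary illustrative values, not quantities from the paper:

```python
import math
import random

random.seed(0)

# Split rho^2 (arbitrarily, for illustration) across the four degrees of
# freedom as the squared means of four unit-variance Gaussians.
means = [1.2, -0.7, 0.4, 2.0]
rho_sq = sum(m * m for m in means)

# Draw 2F ~ non-central chi-squared with 4 d.o.f.: the sum of four squared
# Gaussians with unit variance and the means above.
n_trials = 200_000
total = 0.0
for _ in range(n_trials):
    total += sum(random.gauss(m, 1.0) ** 2 for m in means)
mean_2F = total / n_trials

# Expectation from eqn_twoF_expectation: E[2F] = 4 + rho^2.
expected = 4.0 + rho_sq
print(mean_2F, expected)
```

The sample mean lands within Monte-Carlo error of $4 + \tilde{\rho}^2$ regardless of how the non-centrality is split across the four components.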
\subsection{The $\F$-statistic: a Bayesian perspective}

At first, it appears that the $\F$-statistic is independent of the Bayesian
framework, since it was first derived directly from the likelihood. However,
it was shown in \citet{prix2009, prix2011} that if we marginalise over the
four amplitude parameters of Equation~\eqref{eqn_full_bayes} (using the
``$\F$-statistic prior'' \citep{prix2011}, which we denote by
$\AmplitudePrior$), then the integral, when $\homax \gg 1$, is a Gaussian
integral and can be computed analytically as
\begin{align}
\begin{split}
\Bsn(x| \blambda, \AmplitudePrior, I) & \equiv
\int
\mathcal{L}(x; \A, \blambda)
P(\A| \Hs, \AmplitudePrior, I)\, d\A
\label{eqn_B_S_N}
\\
& \propto e^{\tilde{\F}(x| \blambda)}
\end{split}
\end{align}
where $\mathcal{F}$ is exactly the $\F$-statistic, first derived
as a maximised log-likelihood \citep{jks1998}. This result
demonstrates that the $\mathcal{F}$-statistic is proportional to the log-Bayes
factor when calculated with a uniform prior on the amplitude parameters and
fixed Doppler parameters.
In Equation~\eqref{eqn_SC_CW_likelihood}, $\boldsymbol{x}_{(\ell)}$ refers to
the data in the $\ell^{\mathrm{th}}$ segment; as such, this can replace the
relevant term in Equation~\eqref{eqn_Bsn_x}.

We can therefore define the Bayes-factor of Equation~\eqref{eqn_full_bayes} as
\begin{equation}
\Bsn(x| \AmplitudePrior, I) = \int
\Bsn(x| \AmplitudePrior, \blambda, I) P(\blambda| \Hs, I)
\, d\blambda,
\end{equation}
or, neglecting the constants,
\begin{equation}
\Bsn(x| \AmplitudePrior, I) \propto \int
e^{\tilde{\F}(x| \blambda)} P(\blambda| \Hs, I)
\, d\blambda.
\label{eqn_bayes_over_F}
\end{equation}
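One practical point when evaluating Equation~\eqref{eqn_bayes_over_F} numerically: $e^{\tilde{\F}}$ overflows floating point for loud signals, so a log-sum-exp accumulation is the standard workaround. The quadratic toy $\tilde{\F}$ surface and flat prior below are illustrative assumptions:

```python
import math

# Hypothetical F-statistic over a 1-D Doppler parameter grid: a quadratic peak
# of height f_max = 800, large enough that exp(F) overflows a double.
f_max, lam0, width = 800.0, 0.5, 0.01
lams = [i / 1000.0 for i in range(1001)]  # grid over lambda in [0, 1]
F = [f_max - 0.5 * ((lam - lam0) / width) ** 2 for lam in lams]

prior = 1.0   # flat prior P(lambda|Hs, I) on [0, 1]
dlam = lams[1] - lams[0]

# A naive sum over exp(F) would overflow; log-sum-exp keeps it finite:
# log B = max(F) + log( sum_i exp(F_i - max(F)) ) + log(prior * dlam).
m = max(F)
log_bsn = m + math.log(sum(math.exp(f - m) for f in F)) + math.log(prior * dlam)

# For a Gaussian-shaped peak well inside the prior range, the integral is
# analytic: log B ~= f_max + log( width * sqrt(2*pi) * prior ).
log_analytic = f_max + math.log(width * math.sqrt(2.0 * math.pi) * prior)
print(log_bsn, log_analytic)
```

Working throughout in the log domain also makes the overall proportionality constant irrelevant, which is why the constants can be neglected as above.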
Formulating the significance of a CW candidate in this way is pragmatic in that
there exists a wealth of well-tested tools (i.e.\ \texttt{LALSuite}
\citep{lalsuite}) capable of computing the $\mathcal{F}$-statistic for CW
signals, transient-CWs, and CW signals from binary systems; these can be
leveraged to compute Equation~\eqref{eqn_bayes_over_F}.
\section{MCMC and the $\mathcal{F}$-statistic}
\label{sec_MCMC_and_the_F_statistic}
The MCMC class of optimisation tools is formulated to solve the problem of
inferring the posterior distribution of some general model parameters $\theta$
given some data $x$ for some hypothesis $\H$. We will not give a full
description here, but refer the reader to the literature, e.g.
\citet{mackay2003information}. Namely, Bayes' rule
\begin{equation}
P(\theta| x, \H, I) \propto P(x| \theta, \H, I)P(\theta| \H, I),
\label{eqn_bayes_for_theta}
\end{equation}
is used to evaluate proposed steps in parameter space. After some
initialisations, the samples (points visited during the stepping process) can
be used as samples of the posterior distribution. The algorithm is highly
efficient at resolving peaks in high-dimension parameter spaces.

In this case, we will be concerned with the posterior distribution of
$\blambda$, the Doppler parameters. From Bayes' theorem, we see that this is
proportional to the integrand of Eqn.~\eqref{eqn_Bsn_x}:
\begin{equation}
\Bsn(\blambda| \boldsymbol{x}) \propto \Bsn(\boldsymbol{x}| \blambda)P(\blambda).
\end{equation}
At this point, we note the equivalence of Equation~\eqref{eqn_bayes_for_theta}
to the integrand of Equation~\eqref{eqn_bayes_over_F}:
\begin{equation}
P(\blambda| x, \AmplitudePrior, \Hs, I)
%=\Bsn(x| \AmplitudePrior, \blambda) P(\blambda| \Hs, I).
\propto e^{\F(x| \blambda)} P(\blambda| \Hs, I),
\label{eqn_lambda_posterior}
\end{equation}
where $e^{\F}$ is the likelihood.
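A minimal sketch of sampling Equation~\eqref{eqn_lambda_posterior} with a plain Metropolis-Hastings random walk (the ensemble sampler discussed later is more sophisticated, but the accept/reject logic is the same); the quadratic toy $\F$ surface and the flat prior on $[0, 1]$ are illustrative assumptions, not the paper's search setup:

```python
import math
import random

random.seed(1)

# Hypothetical F-statistic surface over a single Doppler parameter: a
# quadratic peak at lam0, so exp(F) is Gaussian-shaped with width `width`.
lam0, width = 0.3, 0.05

def log_posterior(lam):
    if not 0.0 <= lam <= 1.0:  # flat prior P(lambda|Hs, I) on [0, 1]
        return -math.inf
    # F(x|lambda) up to an additive constant (constants cancel in the ratio).
    return -0.5 * ((lam - lam0) / width) ** 2

# Metropolis-Hastings random walk over lambda: accept a proposed step with
# probability min(1, posterior_ratio), evaluated in the log domain.
lam, samples = 0.5, []
for step in range(50_000):
    proposal = lam + random.gauss(0.0, 0.05)
    if math.log(random.random()) < log_posterior(proposal) - log_posterior(lam):
        lam = proposal
    if step >= 5_000:  # discard burn-in steps
        samples.append(lam)

mean = sum(samples) / len(samples)
print(mean)
```

After burn-in, the retained samples are draws from the posterior, and their mean recovers the injected peak location.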
In this work, we will focus on using MCMC methods to sample this, the
posterior distribution of the Doppler parameters.

The likelihood function (either for the fully-coherent or semi-coherent case)
and the prior distributions can be passed to any optimisation routine in order
to maximise over the unknown signal parameters. In this paper, we will discuss
the use of an ensemble MCMC optimisation routine; this has the advantages of
being efficient in high-dimensional spaces, the ability to sample multi-modal
posteriors, and the ability to calculate the marginalised Bayes factor (i.e.
Eqn~\ref{eqn_Bsn_x}).
\subsection{The \texttt{emcee} sampler}
...
...