David Keitel / PyFstat / Commits

Commit d0708d9a
authored Nov 21, 2016 by Gregory Ashton

Updates to the paper and initial example plots

parent f89e9cce
Changes 7
Paper/definitions.tex

 \newcommand{\fdot}{\dot{f}}
 \newcommand{\F}{{\mathcal{F}}}
 \newcommand{\A}{\boldsymbol{\mathcal{A}}}
 \newcommand{\blambda}{\boldsymbol{\mathbf{\lambda}}}
...

Paper/fully_coherent_search_using_MCMC_walkers.png
(image replaced: 246 KB → 99.7 KB)
Paper/paper_cw_mcmc.tex

...
@@ -50,8 +50,7 @@
 \begin{document}
-\title{MCMC follow-up methods for continuous gravitational wave candidates
-including glitching and transient waveforms}
+\title{MCMC follow-up methods for continuous gravitational wave candidates}
 \author{G. Ashton}
 \email[Email: ]{gregory.ashton@ligo.org}
...
@@ -197,20 +196,20 @@ data contains an additive mixture of noise and a signal $h(t; \A, \blambda)$.
 In order to make a quantitative comparison, we use Bayes theorem in the usual
 way to write the odds as
 \begin{equation}
-O_{\rm S/N} \equiv \frac{P(\Hs| x I)}{P(\Hn| x I)} =
+O_{\rm S/N} \equiv \frac{P(\Hs| x, I)}{P(\Hn| x, I)} =
 \Bsn(x| I) \frac{P(\Hs| I)}{P(\Hn| I)},
 \end{equation}
 where the second factor is the prior odds while the first factor is the
 \emph{Bayes factor}:
 \begin{equation}
-\Bsn(x| I) = \frac{P(x| \Hs I)}{P(x| \Hn I)}.
+\Bsn(x| I) = \frac{P(x| \Hs, I)}{P(x| \Hn, I)}.
 \end{equation}
 Typically, we set the prior odds to unity such that it is the Bayes factor
 which determines our confidence in the signal hypothesis. In this work we will
-therefore discuss the Bayes factor with the impplied assumption this is
+therefore discuss the Bayes factor with the implied assumption this is
 equivalent to the odds, unless we have a good reason to change the prior odds.
-We can rewrite this Bayes factor in terms of the two sets of signal parameters
+We can rewrite the Bayes factor in terms of the two sets of signal parameters
 as
 \begin{equation}
 \Bsn(x| I) = \frac{P(x, \A, \blambda| \Hs, I)}
...
@@ -277,13 +276,13 @@ where $(h|h)$ is the inner product of the signal with itself (see for example
 the detector and $\mathcal{N}$ is the number of detectors.

 \subsection{Using the $\F$-statistic to compute a Bayes factor}
-At first, it appeared that the $\F$-statistic was independent of the Bayesian
-framework since it was first derived directly from the likelihood. However, it
-was shown by \citet{prix2009} that if we marginalise over the four amplitude
-parameters of Equation~\eqref{eqn_full_bayes}, choosing a prior
-$\Pi_{\rm c}$ such that
+At first, it appears that the $\F$-statistic is independent of the Bayesian
+framework. However, it was shown by \citet{prix2009} that if we marginalise
+over the four amplitude parameters of Equation~\eqref{eqn_full_bayes},
+choosing a prior $\Pic$ such that
 \begin{equation}
-P(\A| \Hs, \Pi_{\rm c}, I) \equiv \left\{
+P(\A| \Hs, \Pic, I) \equiv \left\{
 \begin{array}{ll}
 C & \textrm{ for } \ho < \homax \\
 0 & \textrm{ otherwise}
...
@@ -293,7 +292,7 @@ C & \textrm{ for } \ho < \homax \\
 then the integral, when $\homax \gg 1$, is a Gaussian integral and can be
 computed analytically as
 \begin{align}
-B_{\rm S/N}(x| \Pi_{\rm c}, \blambda) & \equiv \int
+\Bsn(x| \Pic, \blambda, I) & \equiv \int
 \mathcal{L}(x; \A, \blambda)
 P(\A| \Hs, I) d\A
...
@@ -310,13 +309,13 @@ fixed Doppler parameters.
 As such, we can define the Bayes factor of Equation~\eqref{eqn_full_bayes} as
 \begin{equation}
-B_{\rm S/N}(x| \Pi_{\rm c}, I) = \int
-B_{\rm S/N}(x| \Pi_{\rm c}, \blambda) P(\blambda| \Hs, I)
+\Bsn(x| \Pic, I) = \int
+\Bsn(x| \Pic, \blambda, I) P(\blambda| \Hs, I)
 d\blambda,
 \end{equation}
 or neglecting the constants
 \begin{equation}
-B_{\rm S/N}(x| \Pi_{\rm c}, I) \propto \int
+\Bsn(x| \Pic, I) \propto \int
 e^{\F(x| \blambda)} P(\blambda| \Hs, I)
 d\blambda.
 \label{eqn_bayes_over_F}
...
@@ -327,7 +326,7 @@ there exists a wealth of well-tested tools \citep{lalsuite} capable of
 computing the $\mathcal{F}$-statistic for CW signals, transient-CWs, and CW
 signals from binary systems; these can be leveraged to compute
 Equation~\eqref{eqn_bayes_over_F}, or adding in the constant
-$B_{\rm S/N}(x| \Pi_{\rm c})$ itself. The disadvantage to this method is that
+$\Bsn(x| \Pic)$ itself. The disadvantage to this method is that
 we are forced to use the prior $\Pic$, which was shown by \citet{prix2009} to
 be unphysical.
...
@@ -343,14 +342,14 @@ P(\theta| x, \H, I) \propto P(x| \theta, \H, I)P(\theta| \H, I),
 \end{equation}
 is used to evaluate proposed jumps from one point in parameter space to other
 points; jumps which increase this probability are accepted with some
-probability. The algorithm, proceeding in this way, is highly effective at
+probability. The algorithm, proceeding in this way, is highly efficient at
 resolving peaks in the high-dimension parameter spaces.

 At this point, we note the equivalence of Equation~\eqref{eqn_bayes_for_theta}
 to the integrand of Equation~\eqref{eqn_bayes_over_F}:
 \begin{equation}
-P(\blambda| x, \Pi_{\rm c}, \Hs, I)
-%= B_{\rm S/N}(x| \Pi_{\rm c}, \blambda) P(\blambda| \Hs I).
+P(\blambda| x, \Pic, \Hs, I)
+%= \Bsn(x| \Pic, \blambda) P(\blambda| \Hs I).
 \propto e^{\F(x| \blambda)} P(\blambda| \Hs I),
 \label{eqn_lambda_posterior}
 \end{equation}
...
@@ -371,47 +370,52 @@ proposal distribution must be `tuned' so that the MCMC sampler efficiently
 walks the parameter space without either jumping too far off the peak, or
 taking such small steps that it takes a long period of time to traverse the
 peak. The \texttt{emcee} sampler addresses this by using an ensemble, a large
-number (${\sim}100$) of parallel `walkers', in which the proposal for each
-walker is generated from the current distribution of the other walkers.
-Moreover, by applying an affine transformation, the efficiency of the
-algorithm is not diminished when the parameter space is highly anisotropic. As
-such, this sampler requires little in the way of tuning: a single proposal
-scale and the number of steps to take.
+number (${\sim}100$) of parallel \emph{walkers}, in which the proposal for
+each walker is generated from the current distribution of the other walkers.
+Moreover, by applying an affine transformation, the efficiency of the
+algorithm is not diminished when the parameter space is highly anisotropic. As
+such, this sampler requires little in the way of tuning: a single proposal
+scale and the number of steps to take.

-Beyond the standard ensemble sampler, we will often use one further
-modification, namely the parallel-tempered ensemble sampler. A parallel
+\subsection{Parallel tempering: sampling multi-modal posteriors}
+Beyond the standard ensemble sampler, we will also use one further
+modification, the parallel-tempered ensemble sampler. A parallel
 tempered MCMC simulation, first proposed by \citet{swendsen1986}, runs
 $\Ntemps$ simulations in parallel, in which the likelihood in the $i^{\rm th}$
-parallel simulation is raised to a power of $1/T_{i}$ (where $T_i$ is referred
-to as the temperature) such that Equation~\eqref{eqn_lambda_posterior} becomes
+parallel simulation is raised to a power of $1/T_{i}$, where $T_i$ is referred
+to as the temperature. As such, Equation~\eqref{eqn_lambda_posterior} is
+written as
 \begin{equation}
-P(\blambda| T_i, x, \Pi_{\rm c}, \Hs, I)
-%= B_{\rm S/N}(x| \Pi_{\rm c}, \blambda)^{T_i} P(\blambda| \Hs I).
+P(\blambda| T_i, x, \Pic, \Hs, I)
+%= \Bsn(x| \Pic, \blambda)^{T_i} P(\blambda| \Hs I).
 \propto (e^{\F(x| \blambda)})^{T_i} P(\blambda| \Hs I).
 \end{equation}
-Setting $T_0 = 1$ with $T_i > T_0 \forall i > 1$, such that the lowest
+Setting $T_0 = 1$ with $T_i > T_0 \; \forall \; i > 1$, such that the lowest
 temperature recovers Equation~\eqref{eqn_lambda_posterior} while for higher
 temperatures the likelihood is broadened (for a Gaussian likelihood, the
 standard deviation is larger by a factor of $\sqrt{T_i}$). Periodically, the
-different temperatures swap elements. This allows the $T_0$ chain (from which
-we draw samples of the posterior) to efficiently sample from multi-modal
-posteriors. This does however introduce two additional tuning parameters, the
-number and range of the set of temperatures $\{T_i\}$.
+algorithm swaps the position of the walkers between the different
+temperatures. This allows the $T_0$ chain (from which we draw samples of the
+posterior) to efficiently sample from multi-modal posteriors. This introduces
+two additional tuning parameters, the number and range of the set of
+temperatures $\{T_i\}$; we will discuss their significance when relevant.
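The temperature-swap mechanism described above can be illustrated with a toy sketch. This is not PyFstat's implementation (which relies on the affine-invariant, parallel-tempered ensemble sampler from `emcee`); it is a minimal single-walker Metropolis version on a hypothetical bimodal likelihood with a uniform prior on [-10, 10], showing how tempered chains plus periodic swaps let the T_0 = 1 chain move between widely separated modes.

```python
import math
import random

def log_like(x):
    # Hypothetical bimodal log-likelihood: Gaussian peaks at x = -5 and x = +5
    return math.log(math.exp(-0.5 * (x + 5)**2) + math.exp(-0.5 * (x - 5)**2))

def pt_sample(temperatures, nsteps, seed=1):
    rng = random.Random(seed)
    pos = [rng.uniform(-10, 10) for _ in temperatures]  # one chain per T_i
    samples = []
    for step in range(nsteps):
        # Within-chain Metropolis update with the likelihood raised to 1/T_i
        for i, T in enumerate(temperatures):
            prop = pos[i] + rng.gauss(0, 1)
            if -10 <= prop <= 10:  # uniform prior on [-10, 10]
                log_a = (log_like(prop) - log_like(pos[i])) / T
                if rng.random() < math.exp(min(0.0, log_a)):
                    pos[i] = prop
        # Periodically propose swapping adjacent-temperature chain positions
        if step % 10 == 0:
            for i in range(len(temperatures) - 1):
                dbeta = 1.0 / temperatures[i] - 1.0 / temperatures[i + 1]
                log_a = dbeta * (log_like(pos[i + 1]) - log_like(pos[i]))
                if rng.random() < math.exp(min(0.0, log_a)):
                    pos[i], pos[i + 1] = pos[i + 1], pos[i]
        samples.append(pos[0])  # the T_0 = 1 chain carries the posterior
    return samples

samples = pt_sample(temperatures=[1.0, 3.0, 10.0], nsteps=5000)
print(len(samples), min(samples) >= -10, max(samples) <= 10)
```

The swap move uses the standard acceptance ratio, min(1, exp[(beta_i - beta_{i+1})(logL_{i+1} - logL_i)]), so detailed balance holds at every temperature.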
-\subsection{Parallel tempering}
+\subsection{Parallel tempering: estimating the Bayes factor}
 In addition, parallel-tempering also offers a robust method to estimate the
 Bayes factor of Equation~\eqref{eqn_bayes_over_F}. If we define
 $\beta \equiv 1/T$, the inverse temperature, and
-$Z(\beta) \equiv B_{\rm S/N}(x| \Pi_{\rm c}, I)$, then as noted by
+$Z(\beta) \equiv \Bsn(x| \Pi_{\rm c}, I)$, then as noted by
 \citet{goggans2004} for the general case, we may write
 \begin{align}
 \frac{1}{Z}\frac{\partial Z}{\partial \beta} =
 \frac{
-\int B_{\rm S/N}(x| \Pi_{\rm c}, \blambda)^{\beta}
-\log(B_{\rm S/N}(x| \Pi_{\rm c}, \blambda)) P(\blambda| I) d\blambda
+\int \Bsn(x| \Pic, \blambda)^{\beta}
+\log(\Bsn(x| \Pic, \blambda)) P(\blambda| I) d\blambda
 }{
-\int B_{\rm S/N}(x| \Pi_{\rm c}, \blambda)^{\beta} P(\blambda| I) d\blambda
+\int \Bsn(x| \Pic, \blambda)^{\beta} P(\blambda| I) d\blambda
 }
 \end{align}
 The right-hand-side expresses the average of the log-likelihood at $\beta$. As
...
@@ -427,7 +431,7 @@ can numerically integrate to get the Bayes factor, i.e.
 \langle \log(\Bsn(x| \Pic, \blambda)) \rangle_{\beta} d\beta.
 \end{align}
 In practice, we use a simple numerical quadrature over a finite ladder of
-$\beta_i$, with the smallest chosen such that extending closer to zero does not
+$\beta_i$, with the smallest chosen such that choosing a smaller value does not
 change the result beyond other numerical uncertainties. Typically, getting
 accurate results for the Bayes factor requires a substantially larger number of
 temperatures than are required for efficiently sampling multi-modal
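The quadrature over a finite beta ladder can be checked on a toy problem where the evidence is available directly. The sketch below is not PyFstat code: it uses a one-dimensional Gaussian toy likelihood, evaluates the beta-averaged log-likelihood by deterministic quadrature instead of MCMC, integrates over the ladder, and compares against the directly computed log-evidence.

```python
import numpy as np

def trapz(y, x):
    # Simple trapezoidal rule (self-contained, no numpy version dependence)
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Toy model: logL(x) = -x^2/2 with a normalised uniform prior on [-5, 5],
# so the evidence Z = int exp(logL) * prior dx is computable directly.
x = np.linspace(-5, 5, 2001)
prior = np.full_like(x, 1.0 / 10.0)
logL = -0.5 * x**2

def mean_logL(beta):
    # Average of logL under the tempered posterior ~ exp(beta*logL) * prior
    w = np.exp(beta * logL) * prior
    return trapz(w * logL, x) / trapz(w, x)

# Thermodynamic integration: log Z = int_0^1 <logL>_beta dbeta, evaluated
# by quadrature over a finite ladder of beta_i as in the text
betas = np.linspace(0.0, 1.0, 101)
log_Z_ti = trapz([mean_logL(b) for b in betas], betas)

# Direct evaluation of the evidence for comparison
log_Z_direct = np.log(trapz(np.exp(logL) * prior, x))

print(abs(log_Z_ti - log_Z_direct) < 1e-2)
```

Because the prior is normalised, Z(0) = 1, so integrating the beta-averaged log-likelihood from 0 to 1 recovers the full log-evidence.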
...
@@ -437,22 +441,20 @@ temperatures (suitably initialised close to the target peak) when estimating
 the Bayes factor.

 \subsection{The topology of the likelihood}
-As discussed, we intend to use the $\F$-statistic as our log-likelihood in the
-MCMC simulations. Before continuing, it is worthwhile to understand the
-behaviour of the log-likelihood. As shown in
-Equation~\eqref{eqn_twoF_expectation}, $\widetilde{2\F}$ has an expectation
-value of 4 (corresponding to the 4 degrees of freedom of the underlying
-chi-square distribution) in Gaussian noise, but in the presence of a signal
-larger values are expected, proportional to the squared SNR. To illustrate
-this, let us consider $\widetilde{2\F}$ (the log-likelihood) as a function of
-$f$ (the template frequency) if there exists a signal in the data with
-frequency $f_0$. We will assume that all other Doppler parameters are
-perfectly matched. Such an example can be calculated analytically, taking the
+We intend to use the $\F$-statistic as our log-likelihood in MCMC simulations,
+but before continuing, it is worthwhile to acquaint ourselves with the typical
+behaviour of the log-likelihood by considering a specific example. As shown in
+Equation~\eqref{eqn_twoF_expectation}, the expectation of $\widetilde{2\F}$ is
+4 in Gaussian noise alone, but proportional to the square of the SNR in the
+presence of a signal. To illustrate this, let us consider $\widetilde{2\F}$ as
+a function of $f$ (the template frequency) if there exists a signal in the
+data with frequency $f_0$. We will assume that all other Doppler parameters
+are perfectly matched. This can be calculated analytically, taking the
 matched-filtering amplitude (Equation~(11) of \citep{prix2005}) with
 $\Delta\Phi(t) = 2\pi(f - f_0)t$; the expectation of $\widetilde{2\F}$ as a
 function of the template frequency $f$ is given by
 \begin{equation}
 \textrm{E}[\widetilde{2\F}](f) = 4 +
 (\textrm{E}[\widetilde{2\F_0}] - 4)\,\textrm{sinc}^{2}(\pi(f - f_0)\Tcoh)
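The mismatch curve above is straightforward to evaluate numerically. The sketch below assumes an illustrative perfectly-matched expectation E[2F_0] = 1000 (not a value from the text) and the 100-day coherence time used in the later example; note that numpy's `sinc` is the normalised sin(pi*y)/(pi*y).

```python
import numpy as np

# Expectation of twoF versus template frequency f for a signal at f0, with
# all other Doppler parameters perfectly matched.
def expected_twoF(f, f0, twoF0, Tcoh):
    # np.sinc(y) = sin(pi y)/(pi y), so passing (f - f0)*Tcoh yields the
    # text's sinc^2(pi (f - f0) Tcoh)
    return 4 + (twoF0 - 4) * np.sinc((f - f0) * Tcoh)**2

Tcoh = 100 * 86400                       # 100 days of coherent integration
f0 = 30.0                                # signal frequency in Hz
f = f0 + np.linspace(-5e-7, 5e-7, 1001)  # template frequencies near f0
curve = expected_twoF(f, f0, twoF0=1000.0, Tcoh=Tcoh)

# The curve peaks at E[2F_0] for f = f0 and decays towards the noise value 4
print(abs(curve.max() - 1000.0) < 1e-6, curve[0] < 10)
```

The width of the central sinc^2 lobe, of order 1/Tcoh, is what sets the frequency resolution of the search.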
...
@@ -624,6 +626,7 @@ As such, the volume can be decomposed as
 \sqrt{\textrm{det} g^{\rm Sky}} \frac{\Delta\Omega}{2} \times
 \sqrt{\textrm{det} g^{\rm PE}} \prod_{s=0}^{\smax} \Delta f^{(s)} \\
 & = \Vsky \times \Vpe.
+\label{eqn_metric_volume}
 \end{align}
 Moreover, if $\tref$ is in the middle of the observation span, the diagonal
 nature of $g^{\rm PE}$ means that one can further identify
...
@@ -633,48 +636,90 @@ nature of $g^{\rm PE}$ means that one can further identify
 \end{align}
 This decomposition may be useful in setting up MCMC searches.

-\subsection{An example}
-In order to familiarise the reader with the features of the search, we will
-now describe a simple directed search (over $f$ and $\dot{f}$) for a strong
-simulated signal in Gaussian noise. The setup of the search consists in
-defining the following:
-\begin{itemize}
-\item The prior for each search parameter. Typically, we recommend either a
-uniform prior bounding the area of interest, or a normal distribution centered
-on the target and with some well-defined width. In this example we will use a
-uniform prior.
-\item The initialisation of the walkers. If the whole prior volume is to be
-explored, the walkers should be initialised from the prior (i.e. randomly
-drawn from the prior distributions) as we will do here. However, it is
-possible that only a small region of parameter space requires exploration,
-therefore we provide functionality to initialise the walkers subject to an
-independent distribution if needed.
-\item The number of burn-in and production steps to take. This is a tuning
-parameter of the MCMC algorithm. First we allow the walkers to run for a
-number of `burn-in' steps which are discarded, and then a number of
-`production' steps are taken from which one makes estimates of the posterior.
-\item The parallel tempering setup. If used, one must specify the number of
-temperatures and their arrangement. Typically, we use 3 or so temperatures
-arranged linearly in log-space from some minimum to some maximum temperature.
-\end{itemize}
+\subsection{Example: signal in noise}
+In order to familiarise the reader with the features of an MCMC search, we
+will now describe a simple directed search (over $f$ and $\dot{f}$) for a
+simulated signal in Gaussian noise. The signal will have a frequency of
+$30$~Hz and a spin-down of $-1{\times}10^{-10}$~Hz/s; all other Doppler
+parameters are `known' and so are irrelevant. Moreover, the signal has an
+amplitude $h_0 = 10^{-24}$~Hz$^{-1/2}$ while the Gaussian noise has
+$\Sn = 10^{-23}$~Hz$^{-1/2}$ such that the signal has a depth of 10.
+
+First, we must define a prior for each search parameter. Typically, we
+recommend either a uniform prior bounding the area of interest, or a normal
+distribution centered on the target and with some well-defined width.
+However, to ensure that the MCMC simulation has a reasonable chance at
+finding a peak, one should consider the corresponding metric-volume given in
+Equation~\eqref{eqn_metric_volume}. For this example, we will use a uniform
+prior with a frequency range of $\Delta f = 10^{-7}$~Hz and a spin-down range
+of $\Delta\fdot = 10^{-13}$~Hz/s, both centered on the simulated signal
+frequency and spin-down rate. We set the reference time to coincide with the
+middle of the data span, therefore the metric volume can be decomposed into
+the frequency contribution and spin-down contribution:
+\begin{align}
+\Vpe^{(0)} = \frac{(\pi\Tcoh\Delta f)^{2}}{3} \approx 2.46
+\end{align}
+and
+\begin{align}
+\Vpe^{(1)} = \frac{4(\pi\Delta\fdot)^{2}\Tcoh^{4}}{45} \approx 48.9
+\end{align}
+such that $\V \approx 120$ (note that $\Vsky$ does not contribute since we do
+not search over the sky parameters). This metric volume indicates that the
+signal will occupy about 1\% of the prior volume, therefore the MCMC is
+expected to work. Alternative priors will need careful thought about how to
+translate them into a metric volume: for example, using a Gaussian one could
+use the standard deviation as a proxy for the allowed search region.
+
+In addition to defining the prior, one must also consider how to
+\emph{initialise} the walkers. If the prior genuinely represents the stated
+prior knowledge, the usual solution is to initialise the walkers from the
+prior: that is, the starting position is drawn from the prior. However,
+instances do occur when one would like to initialise the walkers from a
+different distribution. For example, if one only needs to estimate the
+evidence (given a particular prior), but is aware from previous searches that
+the only significant peak lies in a small area of parameter space, one could
+initialise the walkers in a small cluster close to that area. In this example,
+we initialise the walkers from the prior such that they have the chance to
+explore the entire prior volume.
+
+Having defined the prior, the final setup step is to define the number of
+\emph{burn-in} and \emph{production} steps the sampler should take and the
+number of walkers; this is a tuning parameter of the MCMC algorithm. The
+number of walkers should typically be a few hundred; the greater the number,
+the more samples will be taken, resulting in improved posterior estimates.
+The burn-in steps refer to an initial set of steps which are discarded as
+they are taken whilst the walkers converge. After they have converged, the
+steps are known as production steps since they are used to produce posterior
+estimates and calculate the marginal likelihood.

 Using these choices, the simulation is run. To illustrate the full MCMC
 process, in Figure~\ref{fig_MCMC_simple_example} we plot the progress of all
 the individual walkers (each represented by an individual line) as a function
-of the total number of steps. The red portion of steps are `burn-in' and hence
+of the total number of steps. The red portion of steps are burn-in and hence
 discarded; from this plot we see why: the walkers are initialised from the
 uniform prior and initially spend some time exploring the whole parameter
-space before converging. The production samples, colored black, are only
-taken once the sampler has converged - these can be used to generate
-posterior plots.
+space before converging. The fact that they converge to a single unique point
+is due to the strength of the signal (substantially elevating the likelihood
+above that of Gaussian fluctuations) and the tight prior, which was
+quantified through the metric volume $\V$. The production samples, colored
+black, are only taken once the sampler has converged - these can be used to
+generate posterior plots.
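The quoted volumes follow directly from the two expressions above. A quick numerical check, using the values stated in the text (Tcoh = 100 days, DeltaF0 = 1e-7 Hz, DeltaF1 = 1e-13 Hz/s):

```python
import numpy as np

# Metric-volume estimates for the example directed search
Tcoh = 100 * 86400   # 100 days of coherent integration, in seconds
DeltaF0 = 1e-7       # frequency prior range (Hz)
DeltaF1 = 1e-13      # spin-down prior range (Hz/s)

Vpe0 = (np.pi * Tcoh * DeltaF0)**2 / 3.0          # frequency contribution
Vpe1 = 4 * (np.pi * DeltaF1)**2 * Tcoh**4 / 45.0  # spin-down contribution
V = Vpe0 * Vpe1                                   # Vsky does not contribute

print(round(Vpe0, 2), round(Vpe1, 1), round(V))
```

This reproduces the values in the text: roughly 2.46 and 48.9 for the two contributions, and a total metric volume of about 120.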
 \begin{figure}[htb]
 \centering
 \includegraphics[width=0.5\textwidth]{fully_coherent_search_using_MCMC_walkers}
-\caption{}
-\label{fig:}
+\caption{The progress of the MCMC simulation for a simulated signal in
+Gaussian noise, searching over the frequency and spin-down. The upper two
+panels show the position of all walkers as a function of the number of steps
+for the frequency and spin-down; when they are colored red the samples are
+discarded as burn-in (the first 100 steps), while when they are colored black
+they are used as production samples. The bottom panel shows the distribution
+of $\widetilde{2\F}$ taken from the production samples.}
+\label{fig_MCMC_simple_example}
 \end{figure}
+\subsection{Example: noise-only}
+
 \section{Follow-up}
 \label{sec_follow_up}
...
@@ -686,8 +731,8 @@ fully-coherently. We begin by rewriting Equation~\eqref{eqn_lambda_posterior},
 the posterior distribution of the Doppler parameters, with the explicit
 dependence on the coherence time $\Tcoh$:
 \begin{equation}
-P(\blambda| \Tcoh, x, \Pi_{\rm c}, \Hs, I)
-%= B_{\rm S/N}(x| \Tcoh, \Pi_{\rm c}, \blambda) P(\blambda| \Hs I).
+P(\blambda| \Tcoh, x, \Pic, \Hs, I)
+%= \Bsn(x| \Tcoh, \Pic, \blambda) P(\blambda| \Hs I).
 \propto e^{\hat{\F}(x| \Tcoh, \blambda)} P(\blambda| \Hs I).
 \end{equation}
...
Paper/weak_signal_follow_up_run_setup.tex

 \begin{tabular}{ccccccc}
 Stage & $\Nseg$ & $\Tcoh^{\rm days}$ & $\Nsteps$ & $\V$ & $\Vsky$ & $\Vpe$ \\ \hline
-0 & 80 & 1.25 & 100 & 20.0 & 2.0 & 10.0 \\
-1 & 40 & 2.5 & 100 & $2{\times}10^{2}$ & 7.0 & 20.0 \\
+0 & 93 & 1.1 & 100 & 10.0 & 1.0 & 10.0 \\
+1 & 43 & 2.3 & 100 & $1{\times}10^{2}$ & 6.0 & 20.0 \\
 2 & 20 & 5.0 & 100 & $1{\times}10^{3}$ & 30.0 & 50.0 \\
-3 & 10 & 10.0 & 100 & $1{\times}10^{4}$ & $1{\times}10^{2}$ & 90.0 \\
-4 & 5 & 20.0 & 100 & $7{\times}10^{4}$ & $4{\times}10^{2}$ & $2{\times}10^{2}$ \\
+3 & 9 & 11.1 & 100 & $1{\times}10^{4}$ & $1{\times}10^{2}$ & $1{\times}10^{2}$ \\
+4 & 4 & 25.0 & 100 & $1{\times}10^{5}$ & $6{\times}10^{2}$ & $2{\times}10^{2}$ \\
 5 & 1 & 100.0 & 100,100 & $1{\times}10^{6}$ & $1{\times}10^{3}$ & $9{\times}10^{2}$ \\
 \end{tabular}
Paper/weak_signal_follow_up_walkers.png
(image replaced: 1.32 MB → 540 KB)
examples/fully_coherent_search_using_MCMC.py

-from pyfstat import MCMCSearch
+import pyfstat
+import numpy as np
+
+# Properties of the GW data
+sqrtSX = 1e-23
+tstart = 1000000000
+duration = 100*86400
+tend = tstart + duration

+# Properties of the signal
 F0 = 30.0
 F1 = -1e-10
 F2 = 0
 Alpha = 5e-3
 Delta = 6e-2
-tref = 362750407.0
-
-tstart = 1000000000
-duration = 100*86400
-tend = tstart + duration
+tref = .5*(tstart + tend)
+
+depth = 10
+h0 = sqrtSX / depth
+data_label = 'fully_coherent_search_using_MCMC'
+
+data = pyfstat.Writer(
+    label=data_label, outdir='data', tref=tref, tstart=tstart, F0=F0, F1=F1,
+    F2=F2, duration=duration, Alpha=Alpha, Delta=Delta, h0=h0, sqrtSX=sqrtSX)
+data.make_data()
+
+# The predicted twoF, given by lalapps_predictFstat can be accessed by
+twoF = data.predict_fstat()
+print 'Predicted twoF value: {}\n'.format(twoF)

-theta_prior = {'F0': {'type': 'unif', 'lower': F0*(1-1e-6),
-                      'upper': F0*(1+1e-6)},
-               'F1': {'type': 'unif', 'lower': F1*(1+1e-2),
-                      'upper': F1*(1-1e-2)},
+DeltaF0 = 1e-7
+DeltaF1 = 1e-13
+VF0 = (np.pi * duration * DeltaF0)**2 / 3.0
+VF1 = (np.pi * duration**2 * DeltaF1)**2 * 4/45.
+print '\nV={:1.2e}, VF0={:1.2e}, VF1={:1.2e}\n'.format(VF0*VF1, VF0, VF1)
+
+theta_prior = {'F0': {'type': 'unif', 'lower': F0-DeltaF0/2.,
+                      'upper': F0+DeltaF0/2.},
+               'F1': {'type': 'unif', 'lower': F1-DeltaF1/2.,
+                      'upper': F1+DeltaF1/2.},
                'F2': F2,
                'Alpha': Alpha,
                'Delta': Delta
                }

-ntemps = 3
+ntemps = 1
 log10temperature_min = -1
-nwalkers = 100
-nsteps = [1000, 1000]
+nwalkers = 1000
+nsteps = [50, 50]

-mcmc = MCMCSearch(label='fully_coherent_search_using_MCMC', outdir='data',
-                  sftfilepath='data/*basic*sft', theta_prior=theta_prior,
-                  tref=tref, tstart=tstart, tend=tend, nsteps=nsteps,
-                  nwalkers=nwalkers, ntemps=ntemps,
-                  log10temperature_min=log10temperature_min)
-mcmc.run()
+mcmc = pyfstat.MCMCSearch(
+    label='fully_coherent_search_using_MCMC', outdir='data',
+    sftfilepath='data/*'+data_label+'*sft', theta_prior=theta_prior,
+    tref=tref, minStartTime=tstart, maxStartTime=tend, nsteps=nsteps,
+    nwalkers=nwalkers, ntemps=ntemps,
+    log10temperature_min=log10temperature_min)
+mcmc.run(context='paper', subtractions=[30, -1e-10])
 mcmc.plot_corner(add_prior=True)
 mcmc.print_summary()
pyfstat.py

...
@@ -1102,6 +1102,7 @@ class MCMCSearch(BaseSearchClass):
                     .format(sampler.tswap_acceptance_fraction))
             fig, axes = self.plot_walkers(sampler, symbols=self.theta_symbols,
                                           **kwargs)
+            fig.tight_layout()
             fig.savefig('{}/{}_init_{}_walkers.png'.format(
                 self.outdir, self.label, j), dpi=200)
...
@@ -1126,6 +1127,7 @@ class MCMCSearch(BaseSearchClass):
         fig, axes = self.plot_walkers(sampler, symbols=self.theta_symbols,
                                       burnin_idx=nburn, **kwargs)
+        fig.tight_layout()
         fig.savefig('{}/{}_walkers.png'.format(self.outdir, self.label),
                     dpi=200)
...
@@ -1370,7 +1372,7 @@ class MCMCSearch(BaseSearchClass):
     def plot_walkers(self, sampler, symbols=None, alpha=0.4, color="k", temp=0,
                      lw=0.1, burnin_idx=None, add_det_stat_burnin=False,