diff --git a/Paper/bibliography.bib b/Paper/bibliography.bib index f3bddf1969aef51226358edd10167f90bd46901e..7fba00b6fbe1503c2221d75c12828bc91ff19b79 100644 --- a/Paper/bibliography.bib +++ b/Paper/bibliography.bib @@ -553,3 +553,9 @@ year = {2015} url = {http://link.aps.org/doi/10.1103/PhysRevD.72.102002} } +@book{mackay2003information, + title={Information Theory, Inference and Learning Algorithms}, + author={MacKay, David J. C.}, + year={2003}, + publisher={Cambridge University Press} +} diff --git a/Paper/definitions.tex b/Paper/definitions.tex index e5a06ab9e8aa4d4a05b050502dd6095b63ea967b..decc3ee8621b2f800498523b7047a43a4e6b3e19 100644 --- a/Paper/definitions.tex +++ b/Paper/definitions.tex @@ -31,4 +31,5 @@ \newcommand{\Vpe}{\V_{\rm PE}} \newcommand{\smax}{s_{\textrm{max}}} \newcommand{\fmax}{f_{\textrm{max}}} +\newcommand{\rhohatmax}{\hat{\rho}_{\mathrm{max}}} diff --git a/Paper/paper_cw_mcmc.tex b/Paper/paper_cw_mcmc.tex index 9fd766ac18fbff7bc796e63a8cb9fddf097a5200..8512a596d3108ffaf85e57f97b2b398c2d6bfe43 100644 --- a/Paper/paper_cw_mcmc.tex +++ b/Paper/paper_cw_mcmc.tex @@ -203,172 +203,217 @@ its efficient. In Sections~\ref{sec_transients} and \ref{sec_glitches} we demonstrate how searches can be performed for either transient or glitching CWs before we finally conclude in Section~\ref{sec_conclusion}. -\section{Hypothesis testing} -\label{sec_hypothesis_testing} -\subsection{Bayes factors} -Given some data $x$ and a set of background assumptions $I$, we formulate -two hypotheses: $\Hn$, the data contains solely Gaussian noise and $\Hs$, the -data contains an additive mixture of noise and a signal $h(t; \A, \blambda)$.
-In order to make a quantitative comparison, we use Bayes theorem in the usual -way to write the odds as -\begin{equation} -O_{\rm S/N} \equiv \frac{P(\Hs| x, I)}{P(\Hn| x, I)} = -\Bsn(x| I) \frac{P(\Hs| I)}{P(\Hn | I)}, -\end{equation} -where the second factor is the \emph{prior odds} while the first is the -\emph{Bayes factor}: -\begin{equation} -\Bsn(x| I) = \frac{P(x| \Hs, I)}{P(x| \Hn, I)}. -\end{equation} -Typically, we set the prior odds to unity such that it is the Bayes factor -which determines our confidence in the signal hypothesis. In this work, we will -therefore discuss the Bayes factor with the implied assumption this is -equivalent to the odds, unless we have a good reason to change the prior odds. +In this section, we will give a brief introduction to the use of the +$\F$-statistic in a hypothesis testing framework, introduce conceptual ideas +about the use of Markov-Chain Monte-Carlo (MCMC) algorithms in this context, +and develop the idea of using MCMC routines in CW searches. -We can rewrite the Bayes factor in terms of the two sets of signal parameters -as -\begin{equation} -\Bsn(x| I) = \frac{P(x, \A, \blambda|\Hs, I)} -{P(\A| \Hs, I)P(\blambda| \Hs, I)P(x| \Hn, I)}. -\end{equation} -Marginalising over the two sets of parameters we find that -\begin{equation} -\Bsn(x| I)= \iint -\mathcal{L}(x; \A, \blambda) -P(\A| \Hs, I)P(\blambda| \Hs, I) -d\blambda d\A -\label{eqn_full_bayes} -\end{equation} -where +\subsection{Hypothesis testing framework} +\label{sec_hypothesis_testing} + +In \citet{prix2009}, a framework was introduced demonstrating the use of the +$\F$-statistic (first discussed in \citet{jks1998}) in defining the \emph{Bayes +factor} $\Bsn(\boldsymbol{x})$ between the signal hypothesis and the noise +hypothesis computed on some data $\boldsymbol{x}$. It was shown that for an +unphysical uniform prior on the amplitude parameters, the log-Bayes factor was +proportional to the $\F$-statistic. 
In this work, we will use the modification +proposed in \citet{prix2011}, which removed the antenna-pattern weighting +factor; let us now briefly review this framework. + +The Bayes factor between the signal and noise hypotheses, given the +`$\F$-statistic prior', is \begin{equation} -\mathcal{L}(x; \A, \blambda) \equiv \frac{P(x |\A, \blambda, \Hs, I)}{P(x| \Hn, I)}, -\label{eqn_likelihood} +\Bsn(\boldsymbol{x}) = \int \Bsn(\boldsymbol{x}| \blambda)P(\blambda) d\blambda +\label{eqn_Bsn_x} \end{equation} -is the \emph{likelihood-ratio}. - -At this point, we can appreciate the problems of searching for unknown signals: -one has four amplitude parameters and several Doppler parameters over which -this integral must be performed. If a single signal exists in the data, this -corresponds to a single peak in the likelihood-ratio, but at an unknown -location. Therefore, one must must first search for peaks (candidates), and -then subsequently analyse their significance. If one has prior knowledge this -can be used; for example in targeted searches for gravitational waves from -known pulsars emitting CWs from a mountain the Doppler parameters are -considered known collapsing their integral to a single evaluation (see for -example the method proposed by \citet{dupuis2005}). -\comment{Should we cite other known-pulsar methodologies?} - -\subsection{The $\F$-statistic: a frequentist perspective} - -For directed and all-sky searches, a common method introduced by -\citet{jks1998} to reduce the parameter space is the maximum-likelihood -approach. In this approach (often referred to as `frequentist'), one starts by -defining the likelihood-ratio, Equation~\eqref{eqn_likelihood}, which in this -context is a \emph{matched-filtering} amplitude. Then, analytically maximising -this likelihood-ratio with respect to the four amplitude parameters results -(c.f.~\citet{prix2009}) in a maximised log-likelihood given by $\F(x| -\blambda)$: the so-called $\F$-statistic.
Picking a particular set of Doppler -parameters $\blambda$ (the template) one can then compute a detection statistic -(typically $\twoFtilde$ is used\footnote{Note that the `tilde' denote that it -is a fully-coherent quantity while we use a `hat' to denote a semi-coherent -quantity}) which can be used to quantify the significance of the template. -Usually this is done by calculating a corresponding false alarm rate, the -probability of seeing such a detection statistic in Gaussian noise. - -Calculations of the significance are possible due to the properties of the -detection statistic: in Gaussian noise, it can be shown \citep{jks1998, -cutlershutz2005} that $\widetilde{2\F}$ follows a chi-squared distribution with -4 degrees of freedom. In the presence of a signal, $\widetilde{2\F}$ is -still chi-squared with 4 degrees of freedom, but has a non-centrality parameter -$\tilde{\rho}^{2}$ such that its expectation value is +where $\Bsn(\boldsymbol{x}| \blambda)$ is the +targeted Bayes factor (i.e. computed at a set of +Doppler parameters $\blambda$): \begin{equation} -\textrm{E}[\widetilde{2\F}(x; \blambda)] = 4 + \tilde{\rho}(x; \blambda)^2. -\label{eqn_twoF_expectation} +\Bsn(\boldsymbol{x}|\blambda) = \frac{70}{\rhohatmax^{4}} +e^{\F(\boldsymbol{x}; \blambda)}. +\label{eqn_CW_likelihood} \end{equation} -The non-centrality parameter in this context is the SNR of the matched-filter -given by +Here, $\rhohatmax$ is a maximum cutoff SNR required to normalise the prior. +The original calculation was done for a transient CW; in Eqn.~\eqref{eqn_CW_likelihood} +we restrict to the special case of standard CW signals, for which the +marginalisation over the transient parameters yields unity.
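Since Eqn.~\eqref{eqn_CW_likelihood} involves $e^{\F}$, which overflows in double precision for loud signals, the targeted Bayes factor is best evaluated in log space. A minimal numerical sketch (the function name and example values are ours, for illustration only; the input is the usual detection statistic $2\F$):

```python
import numpy as np

def log_bayes_factor_targeted(twoF, rhohat_max):
    """Log of the targeted Bayes factor (70 / rhohat_max^4) * exp(F).

    Works in log space since exp(F) overflows in double precision
    once 2F exceeds roughly 1400; `twoF` is the detection statistic 2F.
    """
    return np.log(70.0) - 4.0 * np.log(rhohat_max) + 0.5 * twoF
```

For example, `log_bayes_factor_targeted(60.0, 10.0)` gives $\ln 70 - 4\ln 10 + 30 \approx 25.04$.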
+ +Furthermore, \citet{prix2011} also demonstrated that the semi-coherent Bayes +factor naturally arises by splitting the likelihood into $N$ independent +\emph{segments}, resulting in \begin{equation} -\rho^{2} = (h | h) \propto \frac{h_0^2}{\Sn}\Tcoh \mathcal{N} +\Bsn^{\textrm{SC}}(\boldsymbol{x}|\blambda) = +\left(\frac{70}{\rhohatmax^{4}}\right)^{N} +e^{\sum_{\ell=1}^{N}\F(\boldsymbol{x}_{(\ell)}; \blambda)}, +\label{eqn_SC_CW_likelihood} \end{equation} -where $(h|h)$ is the inner product of the signal with itself (see for example -\citet{prix2009}), $\Sn$ is a (frequency-dependent) measure of the noise in -the detector and $\mathcal{N}$ is the number of detectors. - - -\subsection{The $\F$-statistic: a Bayesian perspective} - -At first, it -appears that the $\F$-statistic is independent of the Bayesian framework since -it was first derived directly from the likelihood. However, it was shown in -\citet{prix2009, prix2011} that if we marginalise over the four amplitude parameters of -Equation~\eqref{eqn_full_bayes} (using the ``$\F$-statistic prior'' \citep{prix2011}, -which we denote by $\AmplitudePrior$) -then the integral, when $\homax \gg 1$, is a Gaussian integral and can be -computed analytically as -\begin{align} -\begin{split} -\Bsn(x| \blambda, \AmplitudePrior, I) & \equiv -\int -\mathcal{L}(x ;\A, \blambda) -P(\A| \Hs, \AmplitudePrior, I) d\A -\label{eqn_B_S_N} -\\ -& \propto e^{\tilde{\F}(x| \blambda)} -\end{split} -\end{align} -where $\mathcal{F}$ is exactly the $\F$-statistic, first derived -as a maximised log-likelihood \citep{jks1998}. This result -demonstrates that the $\mathcal{F}$-statistic is proportional to the log-Bayes -factors when calculated with a uniform prior on the amplitude parameters and -fixed Doppler parameters. +where $\boldsymbol{x}_{(\ell)}$ refers to the data in the $\ell^{\mathrm{th}}$ +segment.
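In log space, Eqn.~\eqref{eqn_SC_CW_likelihood} reduces to the sum of the per-segment $\F$-statistics plus $N$ copies of the prior normalisation. A hedged numerical sketch (names and example values are illustrative, not from the paper):

```python
import numpy as np

def log_bayes_factor_semicoherent(twoF_segments, rhohat_max):
    """Log of the semi-coherent Bayes factor of Eqn. (eqn_SC_CW_likelihood).

    The prior normalisation (70 / rhohat_max^4) enters once per segment,
    while the per-segment F-statistics (supplied as the usual 2F values)
    simply add.
    """
    twoF = np.asarray(twoF_segments, dtype=float)
    n_segments = twoF.size
    return n_segments * (np.log(70.0) - 4.0 * np.log(rhohat_max)) + 0.5 * twoF.sum()
```

With a single segment this reduces to the fully-coherent (targeted) expression, as expected from the derivation.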
As such, this can replace the relevant term in Eqn.~\eqref{eqn_Bsn_x} when computing the semi-coherent Bayes factor. -As such, we can define the Bayes-factor of Equation~\eqref{eqn_full_bayes} as -\begin{equation} -\Bsn(x| \AmplitudePrior, I) = \int -\Bsn(x| \AmplitudePrior, \blambda, I) P(\blambda| \Hs, I) - d\blambda, -\end{equation} -or neglecting the constants -\begin{equation} -\Bsn(x| \AmplitudePrior, I) \propto \int -e^{\tilde{\F}(x| \blambda)} P(\blambda| \Hs, I) - d\blambda. -\label{eqn_bayes_over_F} -\end{equation} -Formulating the significance of a CW candidate in this way is pragmatic in that -there exists a wealth of well-tested tools (i.e. \texttt{LALSuite} -\citep{lalsuite}) capable of computing the $\mathcal{F}$-statistic for CW -signals, transient-CWs, and CW signals from binary systems; these can be -leveraged to compute Equation~\eqref{eqn_bayes_over_F}. -\section{MCMC and the $\mathcal{F}$-statistic} -\label{sec_MCMC_and_the_F_statistic} +%\section{Hypothesis testing} +%\label{sec_hypothesis_testing} +% +%\subsection{Bayes factors} +%Given some data $x$ and a set of background assumptions $I$, we formulate +%two hypotheses: $\Hn$, the data contains solely Gaussian noise and $\Hs$, the +%data contains an additive mixture of noise and a signal $h(t; \A, \blambda)$. +%In order to make a quantitative comparison, we use Bayes theorem in the usual +%way to write the odds as +%\begin{equation} +%O_{\rm S/N} \equiv \frac{P(\Hs| x, I)}{P(\Hn| x, I)} = +%\Bsn(x| I) \frac{P(\Hs| I)}{P(\Hn | I)}, +%\end{equation} +%where the second factor is the \emph{prior odds} while the first is the +%\emph{Bayes factor}: +%\begin{equation} +%\Bsn(x| I) = \frac{P(x| \Hs, I)}{P(x| \Hn, I)}. +%\end{equation} +%Typically, we set the prior odds to unity such that it is the Bayes factor +%which determines our confidence in the signal hypothesis. In this work, we will +%therefore discuss the Bayes factor with the implied assumption this is +%equivalent to the odds, unless we have a good reason to change the prior odds.
+% +%We can rewrite the Bayes factor in terms of the two sets of signal parameters +%as +%\begin{equation} +%\Bsn(x| I) = \frac{P(x, \A, \blambda|\Hs, I)} +%{P(\A| \Hs, I)P(\blambda| \Hs, I)P(x| \Hn, I)}. +%\end{equation} +%Marginalising over the two sets of parameters we find that +%\begin{equation} +%\Bsn(x| I)= \iint +%\mathcal{L}(x; \A, \blambda) +%P(\A| \Hs, I)P(\blambda| \Hs, I) +%d\blambda d\A +%\label{eqn_full_bayes} +%\end{equation} +%where +%\begin{equation} +%\mathcal{L}(x; \A, \blambda) \equiv \frac{P(x |\A, \blambda, \Hs, I)}{P(x| \Hn, I)}, +%\label{eqn_likelihood} +%\end{equation} +%is the \emph{likelihood-ratio}. +% +%At this point, we can appreciate the problems of searching for unknown signals: +%one has four amplitude parameters and several Doppler parameters over which +%this integral must be performed. If a single signal exists in the data, this +%corresponds to a single peak in the likelihood-ratio, but at an unknown +%location. Therefore, one must must first search for peaks (candidates), and +%then subsequently analyse their significance. If one has prior knowledge this +%can be used; for example in targeted searches for gravitational waves from +%known pulsars emitting CWs from a mountain the Doppler parameters are +%considered known collapsing their integral to a single evaluation (see for +%example the method proposed by \citet{dupuis2005}). +%\comment{Should we cite other known-pulsar methodologies?} +% +%\subsection{The $\F$-statistic: a frequentist perspective} +% +%For directed and all-sky searches, a common method introduced by +%\citet{jks1998} to reduce the parameter space is the maximum-likelihood +%approach. In this approach (often referred to as `frequentist'), one starts by +%defining the likelihood-ratio, Equation~\eqref{eqn_likelihood}, which in this +%context is a \emph{matched-filtering} amplitude. 
Then, analytically maximising +%this likelihood-ratio with respect to the four amplitude parameters results +%(c.f.~\citet{prix2009}) in a maximised log-likelihood given by $\F(x| +%\blambda)$: the so-called $\F$-statistic. Picking a particular set of Doppler +%parameters $\blambda$ (the template) one can then compute a detection statistic +%(typically $\twoFtilde$ is used\footnote{Note that the `tilde' denote that it +%is a fully-coherent quantity while we use a `hat' to denote a semi-coherent +%quantity}) which can be used to quantify the significance of the template. +%Usually this is done by calculating a corresponding false alarm rate, the +%probability of seeing such a detection statistic in Gaussian noise. +% +%Calculations of the significance are possible due to the properties of the +%detection statistic: in Gaussian noise, it can be shown \citep{jks1998, +%cutlershutz2005} that $\widetilde{2\F}$ follows a chi-squared distribution with +%4 degrees of freedom. In the presence of a signal, $\widetilde{2\F}$ is +%still chi-squared with 4 degrees of freedom, but has a non-centrality parameter +%$\tilde{\rho}^{2}$ such that its expectation value is +%\begin{equation} +%\textrm{E}[\widetilde{2\F}(x; \blambda)] = 4 + \tilde{\rho}(x; \blambda)^2. +%\label{eqn_twoF_expectation} +%\end{equation} +%The non-centrality parameter in this context is the SNR of the matched-filter +%given by +%\begin{equation} +%\rho^{2} = (h | h) \propto \frac{h_0^2}{\Sn}\Tcoh \mathcal{N} +%\end{equation} +%where $(h|h)$ is the inner product of the signal with itself (see for example +%\citet{prix2009}), $\Sn$ is a (frequency-dependent) measure of the noise in +%the detector and $\mathcal{N}$ is the number of detectors. +% +% +%\subsection{The $\F$-statistic: a Bayesian perspective} +% +%At first, it +%appears that the $\F$-statistic is independent of the Bayesian framework since +%it was first derived directly from the likelihood. 
However, it was shown in +%\citet{prix2009, prix2011} that if we marginalise over the four amplitude parameters of +%Equation~\eqref{eqn_full_bayes} (using the ``$\F$-statistic prior'' \citep{prix2011}, +%which we denote by $\AmplitudePrior$) +%then the integral, when $\homax \gg 1$, is a Gaussian integral and can be +%computed analytically as +%\begin{align} +%\begin{split} +%\Bsn(x| \blambda, \AmplitudePrior, I) & \equiv +%\int +%\mathcal{L}(x ;\A, \blambda) +%P(\A| \Hs, \AmplitudePrior, I) d\A +%\label{eqn_B_S_N} +%\\ +%& \propto e^{\tilde{\F}(x| \blambda)} +%\end{split} +%\end{align} +%where $\mathcal{F}$ is exactly the $\F$-statistic, first derived +%as a maximised log-likelihood \citep{jks1998}. This result +%demonstrates that the $\mathcal{F}$-statistic is proportional to the log-Bayes +%factors when calculated with a uniform prior on the amplitude parameters and +%fixed Doppler parameters. +% +%As such, we can define the Bayes-factor of Equation~\eqref{eqn_full_bayes} as +%\begin{equation} +%\Bsn(x| \AmplitudePrior, I) = \int +%\Bsn(x| \AmplitudePrior, \blambda, I) P(\blambda| \Hs, I) +% d\blambda, +%\end{equation} +%or neglecting the constants +%\begin{equation} +%\Bsn(x| \AmplitudePrior, I) \propto \int +%e^{\tilde{\F}(x| \blambda)} P(\blambda| \Hs, I) +% d\blambda. +%\label{eqn_bayes_over_F} +%\end{equation} +% +%Formulating the significance of a CW candidate in this way is pragmatic in that +%there exists a wealth of well-tested tools (i.e. \texttt{LALSuite} +%\citep{lalsuite}) capable of computing the $\mathcal{F}$-statistic for CW +%signals, transient-CWs, and CW signals from binary systems; these can be +%leveraged to compute Equation~\eqref{eqn_bayes_over_F}. The MCMC class of optimisation tools are formulated to solve the problem of inferring the posterior distribution of some general model parameters $\theta$ -given given some data $x$ for some hypothesis $\H$. Namely, Bayes rule +given given some data $x$. 
We will not give a full description here, but +refer the reader to the literature, e.g. \citet{mackay2003information}. +In this case, we will be concerned with the posterior +distribution of $\blambda$, the Doppler parameters. From Bayes theorem, we +see that this is proportional to the integrand of Eqn.~\eqref{eqn_Bsn_x}: \begin{equation} -P(\theta| x, \H, I) \propto P(x| \theta, \H, I)P(\theta| \H, I), -\label{eqn_bayes_for_theta} +\Bsn(\blambda | \boldsymbol{x}) \propto \Bsn(\boldsymbol{x}| \blambda)P(\blambda). \end{equation} -is used to evaluate proposed steps in parameter space. After some -initilisations, the samples (points visited during the stepping process) can be -used as samples of the posterior distribution. The algorithm is highly -efficient at resolving peaks in high-dimension parameter spaces. -At this point, we note the equivalence of Equation~\eqref{eqn_bayes_for_theta} -to the integrand of Equation~\eqref{eqn_bayes_over_F}: -\begin{equation} -P(\blambda | x, \AmplitudePrior, \Hs, I) -%=\Bsn(x| \AmplitudePrior, \blambda) P(\blambda| \Hs I). -\propto e^{\F(x| \blambda)} P(\blambda| \Hs I), -\label{eqn_lambda_posterior} -\end{equation} -where $e^{\F}$ is the likelihood. -In this work, we will focus on using MCMC methods to sample this, the -posterior distribution of the Doppler parameters. +The likelihood function (either for the fully-coherent of semi-coherent case) +and the prior distributions can be passed to any optimisation routine in order +to maximise over the unknown signal parameters. In this paper, we will discuss +the use of an ensemble MCMC optimization routine, this has the advantage of +being efficient in high-dimensional spaces, the ability to sampler multi-modal +posteriors, and to calculate the marginalised Bayes factor (i.e. +Eqn~\ref{eqn_Bsn_x}). \subsection{The \texttt{emcee} sampler}