Stochastic filtering for multiscale stochastic reaction networks based on hybrid approximations

In the past few decades, the development of fluorescent technologies and microscopic techniques has greatly improved scientists' ability to observe real-time single-cell activities. In this paper, we consider the filtering problem associate with these advanced technologies, i.e., how to estimate latent dynamic states of an intracellular multiscale stochastic reaction network from time-course measurements of fluorescent reporters. A good solution to this problem can further improve scientists' ability to extract information about intracellular systems from time-course experiments. A straightforward approach to this filtering problem is to use a particle filter where particles are generated by simulation of the full model and weighted according to observations. However, the exact simulation of the full dynamic model usually takes an impractical amount of computational time and prevents this type of particle filters from being used for real-time applications, such as transcription regulation networks. Inspired by the recent development of hybrid approximations to multiscale chemical reaction networks, we approach the filtering problem in an alternative way. We first prove that accurate solutions to the filtering problem can be constructed by solving the filtering problem for a reduced model that represents the dynamics as a hybrid process. The model reduction is based on exploiting the time-scale separations in the original network and, therefore, can greatly reduce the computational effort required to simulate the dynamics. As a result, we are able to develop efficient particle filters to solve the filtering problem for the original model by applying particle filters to the reduced model. We illustrate the accuracy and the computational efficiency of our approach using several numerical examples.


Introduction
In the past few decades, scientists' ability to look into the dynamic behaviors of a living cell has been greatly improved by the fast development of fluorescent technologies [1] and advances in microscopic techniques [2,3].Despite this big success, light intensity signals observed in a microscope can only report the dynamics of a small number of species in a cell, such as fluorescent proteins and mRNAs, and, therefore, leave other dynamic states of interest, e.g., gene (on/off) state, transcription factor abundance, or enzyme levels, indirectly observable.Consequently, it is important to establish efficient stochastic filters to estimate latent states of intracellular biochemical reaction systems from these partial observations.The filtering theory is a topic of active research in both control and statistics communities, which dates back to early works by Wiener [4] and Kalman [5].It is acknowledged that apart from some particular systems, e.g., linear systems with Gaussian noise and Markov chains with finite states, most nonlinear systems result in infinitedimensional filters [6]; therefore, obtaining numerical solutions to the filtering problem for nonlinear systems is a difficult task.A powerful algorithm, called the particle filter (or sequential Monte Carlo method), was introduced in [7], which represent posterior distributions using a population of samples (also called particles) generated by importance sampling and resampling (see also [8]).Thanks to the ever-increasing computational power, a particle filter can solve the filtering problem efficiently in nonlinear non-Gaussian scenarios, and its convergence to the exact solution of the filtering problem is guaranteed under some mild conditions [9,10,11,12,13,14,15,16].
An intracellular reaction system is better modeled by a continuous-time Markov chain rather than an ordinary differential equation due to the presence of low copy number species, and it usually results in an infinite-dimensional filter which is not easy to calculate in a straightforward fashion.In this scenario, the particle filter is a good candidate for solving the filtering problem.The recent work in [17] is good evidence of this point, where a particle filter was established based on exact partial state observations and shown to work well for many biological processes.Despite the success of particle filters in solving the filtering problem for intracellular systems, the required heavy computational effort to simulate the system in the sampling step usually prevents it from being used for real-time applications.Especially for multi-scale reaction systems where reactions fire at different timescales, the computational complexity of exact simulation methods (e.g., Gillespie stochastic simulation algorithm [18,19] and the next reaction method [20]) is proportional to the rate of the fastest reaction, and, hence, the algorithm takes a very long time to output an estimate.A typical way to mitigate this type of problems is to replace the underlying system by a more tractable approximation so that a computationally efficient filter can be constructed based on the reduced model while the filter's accuracy is still preserved (see [21,Section 2]).Following this idea, a method was proposed to obtain computationally efficient filters for biological systems by approximating the underlying Markov jump process by a diffusion process [22,23,24,25].This method was shown to be efficient for broad classes of biochemical reaction systems; however, its drawback is also clear -the diffusion process strategy loses its validity in low copy number scenarios.Unfortunately, multi-scale reaction systems, which are common in systems biology [26], usually involve species with low copy numbers; this fact precludes the usage of the diffusion approximation to these multi-scale systems and requires researchers to build new solutions.
In this paper, we propose another strategy to obtain efficient particle filters based on hybrid approximations of multi-scale stochastic reaction networks.The hybrid approximation technique treats the firing of fast reactions as continuous processes (see [26]) or removes them by quasi-stationary approximations [27,28], as a consequence of which the computational complexity can be greatly reduced.A systematic approach to constructing such an approximation was provided in [26], and strategies for efficient simulation of these approximations were introduced in [29,30,31].Such a reduced model is called a hybrid approximation as some species exhibit jumping processes due to their low copy numbers, while others evolve continuously.Therefore, this model is also called a piecewise deterministic Markov process as it is Markovian and governed by a deterministic differential equation between two neighboring jumping points.Besides its application in running simulations, the hybrid approximation method can also be used to do sensitivity analysis for multi-scale stochastic reaction networks [32,33], which implies that this method has potential for broad applications.In this paper, we prove that the solution to the filtering problem for the original system can be constructed by solving the filtering problem for the hybrid approximation if some mild conditions are satisfied.Notably, while the result seems intuitive, the mathematical analysis is non-trivial because the filtering problem focuses on the conditional probability rather than the unconditional probability (i.e., the probability of the full system), and the convergence of the latter does not necessarily imply the convergence of the former.The obtained result then enables us to develop efficient particle filters to infer system parameters (reaction constants) and latent dynamic states of the original model instantaneously by applying particle filters to the reduced model.
Compared with our previous work [34], where system parameters are known and only discrete-time observations are considered, here we additionally consider the parameter uncertainty and the case of continuous-time observations.We show that in the presence of parameter uncertainties, the proposed filter can efficiently solve the filtering problem in both the continuous-time observation case and the discrete-time observation case.Also, several numerical examples of gene expression networks are presented to illustrate our approach.
It is worth noting that the idea of applying time-scale separation techniques to developing computationally man-ageable particle filters has already been proposed in [35,36,37] for diffusion type stochastic models, and its efficiency is proven in the literature [38].Compared with these works, our paper considers a different type of underlying models, the Markov jump process, and provides a much-simplified proof for the convergence of the approximate filters (based on the framework by [21]).All these references and our paper show the efficiency of applying the time-scale separation technique to the filtering problem for multi-scale systems.The rest of this paper is organized as follows.In Section 2, we first introduce the filtering problem for a chemical reaction network system and, also, review the hybrid model reduction technique and the theory of particle filters.Then, our main results are presented in Section 3, stating that the filtering problem for a multi-scale reaction system can be accurately and computationally efficiently solved by a particle filter to its hybrid approximation.Also, we provide the workflow of our approach in Section 3 to guide practitioners to solve their specific problems.To improve the readability of this paper, we put proofs of our main results in the appendices.In Section 4, two numerical examples are presented to illustrate our approach.Finally, Section 5 concludes this paper.

Notations
In this paper, we denote the natural filtered probability space by (Ω, F , {F t } t≥0 , P), where Ω is the sample space, F is the σ-algebra, {F t } t≥0 is the filtration, and P is the natural probability.Also, we term N >0 as the set of positive integers, R n with n being a positive integer as the space of n-dimensional real vectors, • as the Euclidean norm, | • | as the absolute value notation, and s ∧ t (for any s, t ∈ R) as min(s, t).For any positive integer n and any t > 0, we term D R n [0, t] as the Skorokhod space that consists of all R n valued cadlag functions on [0, t].Here, cadlag functions refer to the functions that are right continuous and have a left limit.

Chemical reaction networks and their filtering problems
In this paper, we consider an intracellular system undergoing reactions where S i (i = 1, . . ., n) are distinguished species in the systems, v i, j and v i, j are non-negative integers, called stoichiometric coefficients, k j are the reaction constants, and r is the number of reactions.Also, we name the linear combination of species (e.g, v 1, j S 1 + • • • + v n, j S n ) as a complex.Let X(t) = (X 1 (t), X 2 (t), . . ., X n (t)) be the numbers of molecules of these species at time t, then the system's dynamics following mass-action kinetics can be expressed as [39] X where R j (t) ( j = 1, . . ., r) are mutually independent unit rate Poisson processes, λ j (x) k j n i=1 x i !(xi−vi,j)! 1 {x i ≥v i, j } are propensities with 1 {•} the indicator function, and the initial condition X(0) has a particular known distribution.
In a practical biochemical reaction system, different species can vary a lot in abundance, and rate constants can also vary over several orders of magnitude.To emphasize this phenomenon, we reformulate the dynamical equation as follows using the normalized quantities of the aforementioned variables together with their scales.Following [26], we first choose a large number N (viewed as the scaling factor) and then normalize all the quantities by a power of N. Specifically, we term the variable as the magnitude of species S i , the variable β j (for j = 1, . . ., r) satisfying k j k j N −β j = O(1) as the magnitude of the reaction constant k j , the variable γ as the time scale of interest, and X N,γ (t) X N (tN γ ) as the normalized state in the time scale of interest.The choice of the scaling factor N is problem-specific; a suitable N should clearly separate the scales of species numbers and rate constants.Though such a scaling factor might not always exists, it often does for multiscale reaction systems.Also, in this paper, we only consider problems in a finite time window where the scales of the species and rate constants do not change over time.Finally, in the normalized coordinate, the dynamical equation can be expressed as where Λ N diag(N −α1 , . . ., N −α s ), ρ j β j + n i=1 v i j α i is the time scales of reactions, and } is the normalized propensity and has the constant order.Usually, the precise values of the initial condition X N (0) and reaction constants K k 1 , . . ., k r are unknown to researchers due to the variability of these parameters from cell to cell.Therefore, in this paper, we consider these parameters to be subject to some probability distributions.For fixed K, the infinitesimal generator of (1), denoted as L N,γ K , satisfies 1 for any bounded continuous function f on R n .We assume that some species in the system are fluorescent reporters, and m channels of light intensity signals of these reporters can be observed from a microscopic platform (e.g., the one in [40]).Mainly, two types of observations, continuous-time observations and discrete-time observations, are present in practical experiments depending on hardware properties.We denote the continuous-time observations by ẎN,γ c (•) and the discrete-time observations by Y N,γ d (•) and, moreover, assume them to satisfy where h is an m-dimensional bounded Lipschitz continuous function indicating the relation between the observation and the reaction process, B(t) is an m-vector of independent standard Brownian motions, {t i } i∈N >0 is a strictly increasing sequence of time points at which the discrete observation comes, and {W (t i )} i∈N >0 are mutually independent m-variate Gaussian random variables whose covariance matrices are equal to the identity matrix.In ( 2) and ( 3), the derivative ẎN,γ c (t) = h X N,γ (t) + Ḃ(t) and Y N,γ d (t i ) model the light signals measured by a microscope in the continuous-time setting and the discrete-time setting, respectively.Since Ḃ(t) is mathematically tricky, we write the continuous-time observations using the integral form in (2) and use the accumulative quantity Y N,γ c (t) later for inference.For these observations, we also assume that these Brownian motions and Gaussian random variables are independent of the Poisson processes R j (•) ( j = 1, . . ., r) and system parameters K and X N (0); in other words, the noise in observations is independent of the underlying reaction system.Also, we term Y N,γ c,t as the filtration generated by the continuous observation {Y N,γ c (s)} 0<s≤t and Y N,γ d,t i as the filtration generated by the discrete observation {Y N,γ d (t j )} 0< j≤i .The goal of the filtering problem is to infer latent states of the underlying system, e.g., φ(K, X N,γ (t)) with φ a known measurable function from φ : R r × R n to R, based on the collected observations.Specifically, it requires one to compute the conditional expectation π N,γ c,t (φ) based on continuous-time observations, or the conditional expectation π N,γ d,t i (φ) E P φ K, X N,γ (t i ) Y N,γ d,t i based on discrete-time observations.Also, we call these conditional expectations as true filters of the chemical reaction network system.For a well-behaved function φ, the conditional expectation π N,γ c,t (φ) satisfies the Kushner-Stratonovich equation [6] π The equation is usually infinite-dimensional and has no explicit solution.In the discrete-time scenario, the conditional distribution satisfies a recursive expression [6,Proposition 10.6], which also has no explicit solution in most cases.As a result, we intend to find efficient algorithms that can quickly and accurately solve these filtering problems.
Before introducing our main results, we review the setup of hybrid approximations of multi-scale reaction systems and the theory of particle filters.

Hybrid approximations at the first timescale
By fixing α i (i = 1, . . ., n) and β j ( j = 1, . . ., r) and changing the scaling factor N, we obtain a one-parameter family of models X N,γ (•).To derive reduced models, we first assume the initial conditions of these models satisfy " lim N→∞ X N (0) exists P-almost surely, and its i-th entry is positive if α i > 0." (4) which can be simply achieved by setting their initial conditions to be the same.Following the notations in [26], we term is the timescale of the i-th species, i.e., the minimum γ such that X N,γ i (t) starts to evolve at rate of O(1).Moreover, we term γ 1 min i α i − max j∈Γ + i ∪Γ − i ρ j as the parameter of the fastest timescale of the system and D α diag(1 {α 1 = α} , . . ., 1 {α n = α} ) as a diagonal matrix indicating whether a species is at the scale of N α.
By neglecting all slow reactions ( ρ j +γ 1 < 0) and approximate fast reactions ( ρ j +γ 1 > 0) by a continuous process, one can arrive at a simplified dynamic model as follows.
Proof.The proof follows easily from [26,Theorem 4.1] Note that the convergence result does not always hold on the infinite time interval; one intuitive explanation is that the limit model may preclude bi-stability or some other phenomenon that the full model has (see [41,Section VI]).
Remark 1.The computational complexity to simulate X γ 1 (•) can be greatly lower than the complexity to simulate X N,γ 1 (•), because the former avoids the exact simulation of fast reactions (γ + β j + v • j α > 0), which consume a lot of computational resources to update the system at a rate proportional to N γ 1 + ρ j .

Hybrid model at the second timescale
The second level of model reduction corresponds to the balance of complexes.Similar to the first timescale case, for any θ ∈ R n , we term To avoid the degeneracy in the limit, we only consider those cases satisfying max which prevents the concentration of the complex from exploding or diminishing to zero as the system scale grows [26].Notably, the above condition always holds for γ = γ 1 (see [26]).If there exists a constant, γ 2 , such that then we term γ 2 as the second timescale for the system.Let e i be a unit vector of R n with i-th element being 1 and the rest being 0. We denote L 1 as the space spanned by S 1 e i ∃ j s.t. e j D r 1 + ρ j (v • j − v • j ) 0 and L 2 as the space spanned by Note that the subspaces L 1 and L 2 are not necessarily orthogonal, because their intersection can contain non-zero vectors.At the second timescale, one can easily see that the species correspond to S 1 fluctuate dramatically due to the reactions of the first timescale (i.e., D γ 1 + ρ j being non-zero), however, these reactions do not influence the slow complex corresponding to S 2 .
We further denote the projection operator onto L 2 by Π 2 , and the identity operator by I.For any fixed for any bounded continuously differentiable function f on R n .Also, we assume that for any positive x 2 ∈ L 2 and almost every K with respect to its probability measure, there holds the condition that This assumption requires the fast varying species to be stable and, therefore, makes it possible to conclude their dynamic effects using a random measure.We further term a random measure (dx 1 )ds.Then, at the second timescale, a reduced model on the state space L 2 can be derived as follows.
Remark 2. Compared with the original model (1), the reduced model (10) not only replaces some fast reactions (γ 2 + ρ j > 0 and D γ 2 + ρ j 0 n×n3 ) with a continuous process but also removes some fast reactions (γ 2 + ρ j > 0 and D γ 2 + ρ j = 0 n×n ) from the model.As a result, the reduced model (10) can further accelerate the simulation of the target system.
For the validity of the hybrid model at the second time scale (see 10), we need to introduce some additional assumptions.The first one is an ergodicity assumption on the fast-varying subsystem.For a fixed K, we define a random measure on where 1 A is the indicator function.Also, we require that The above convergence of random measures means that for almost every fixed K, any bounded continuous function f (•), and all t > 0, there holds Since the set of continuous functions with compact support is a subset of bounded functions, the above condition also suggests the convergence of these random measures in distribution (see [42,Theorem 4.5]).The other assumptions are technical and can be stated as follows.Let We assume that for each c there exists a function and Finally, under the above assumptions and non-explosivity of X N,γ 2 and X γ 2 , i.e., = ∞, and lim where τ γ 2 c inf{t X γ 2 (t) ≥ c}, the process X N,γ 2 converges in distribution to X γ 2 on any finite time interval.The details are listed in the following proposition.
Proposition 2 (Adapted from [26]).If conditions (4), ( 8), ( 9), ( 11), ( 12), (13), and (14) are satisfied, then on any finite time interval [0, T ], there holds K, Π 2 X N,γ 2 (•) ⇒ (K, X γ 2 (•)) in the sense of the Skorokhod topology.Among all these assumptions in the above result, ( 11), (12), and ( 13) are the most demanding ones.Verifying them usually requires one to construct proper Lyapunov functions for the fast dynamics (see [33,Remark 4.19]) and the full dynamics (see [26,Lemma 5.3]).Moreover, since system parameters are random (but fixed over time) in our problem, the construction of Lyapunov functions should be robust to the choice of system parameters so that the criteria can work for almost every K.The method proposed in [43] where the vector norm serves as the Lyapunov function can be a candidate for solving such a problem.

Hybrid approximations at a higher timescale
The key to generating a hybrid approximation of a multiscale system is to assume quasi-equilibrium for the fast sub-network [33,Section 2.3].Therefore, following this rule, one can construct a hierarchy of reduced models to depict dynamic behaviors at different timescales.For our main results, we assume ourselves in the situation of Proposition 1 and Proposition 2 only, i.e., the first and second timescales.However, these results can be straightforwardly extended to a higher timescale using the same proof scheme shown in the appendices.

Observations for reduced models
We can also define filtering problems for the reduced models ( 5) and ( 10) as follows.We first define artificial readouts for these reduced models by and term Y γ c,t and Y γ d,t i ( = 1, 2) as the filtrations generated by {Y γ c (s)} 0<s≤t and {Y γ d (t j )} 0< j≤i , respectively.Then, for reduced models ( 5) and (10), the filtering problem requires one to compute the conditional expectations for = 1, 2 based on the corresponding continuous-time and discrete-time observations, respectively.Notably, since the reduced model does not exist in reality, one can never collect these "imaginary" readouts Therefore, solving the filtering problems for the reduced models has no much practical meaning.The value of these artificial filtering problems lies in helping us construct an efficient particle filter to solve the filtering problems for the original model, which will be discussed in-depth in Section 3.

Particle filters
The particle filter, also known as the sequential Monte Carlo method, is a protocol used to infer latent dynamic states and system parameters by generating samples that mimic dynamic behaviors of the underlying systems.In the sequel, we separately review continuous-time particle filters and discrete-time ones, including their derivations and specific algorithms.

Continuous-time particle filters
By denoting a Girsanov random variable whose reciprocal is a martingale under P [6, Lemma 3.9], and a reference probability dP N,γ , one can arrive at the Kallianpur-Striebel formula [6, Proposition 3.16] (also see [44]) which transforms the filtering problem under the natural probability into the filtering problem under the reference probability.Since under the reference probability, the observations are independent of the underlying system and become an m-vector of independent standard Brownian motions [6, Proposition 3.13], the computation of conditional expectations under the reference probability is much simpler than the computation under the original probability P.
We can also construct reference probabilities for reduced models, which orthogonalizes 4 the observations and the underlying systems and provide Kallianpur-Striebel formulas for them.The details are shown in Appendix A.
Based on (15), a continuous-time sequential importance resampling (SIR) particle filter can be constructed as Algorithm 1, which generates samples under the reference probability and uses Monte Carlo method to solve the filtering problem.Specifically, in Algorithm 1, the weights w j (t) (for j = 1, . . ., M) mimic normalized Z N,γ c (t), and particles (κ j (t), x j (t)) mimic the state (K, X N,γ (t)); thus, the empirical sum of φ(κ j (t), x j (t)) with respect to weights w j (t), approximates the true filter by the Kallianpur-Striebel formula (15).Notably, resampling is executed in each iteration to delete non-significant particles and mitigate the long-term sample impoverishment at the cost of adding additional noise at the current step [8].The residual resampling scheme is usually preferred in the resampling step, as it generally outperforms other schemes, e.g., multinomial resampling scheme, in the sense of asymptotic variances [12].

Discrete-time particle filters
For discrete-time observations, we can also define a random variable , and whose reciprocal is a martingale under P. 5 Then, a reference probability, P N,γ d , can be constructed by dP , under which the underlying system and the discrete-time observation are independent of each other, and {Y N,γ d (t i )} i∈N >0 are mutually independent m-variate Gaussian random variables with covariance matrices being the identity matrix.Using the Kallianpur-Striebel formula [44], we can express the filter like the following which transforms the original filtering problem into the filtering problem under the reference probability.For the reduced models, we can also construct such reference probabilities and Kallianpur-Striebel formulas, which is discussed in detail in Appendix A.
Based on ( 16), a discrete-time SIR particle filter can be constructed as Algorithm 2 to solve the filtering problem numerically.Algorithm 2 can be viewed as a discrete-time analog of Algorithm 1, where the only difference is substituting the discrete-time arguments in Algorithm 2 for the continuous ones in Algorithm 1.

Algorithm 2
The discrete-time SIR particle filter [7,8] 1: Input observations {Y(t i )} i∈N >0 , a dynamical model, and a initial distribution; 2: Initialization: The same as the initialization step in Algorithm 1; Also, t 0 = 0. 3: while t i does not exceed the terminal time of the observations do 4: Sampling: simulate x j (•) ( j = 1, . . ., M) from time t i−1 to t i according to the underlying model with parameters κ j (t i−1 ) and initial conditions x j (t i−1 ); Set κ j (t i ) = κ j (t i−1 ); Calculate weights w j (t i ) ∝ z j (t i ) H(x j (t i ), Y(t i )).

6:
Resampling: The same as the resampling in Algorithm 1 (change iδ to t i ); 7: end while

Main results
The goal of this paper is to construct computationally efficient algorithms to solve filtering problems for multiscale reaction network systems, i.e., calculating π N,γ c,t (φ) and π N,γ d,t i (φ).A straightforward idea to solve this problem is to use the particle filters that utilize the full dynamic model (1).However, the computational cost of simulating (1) is extremely expensive (see Remark 1), and, therefore, these particle filters are computationally inefficient.Consequently, we need to figure out a smarter way to approach this problem.
We first demonstrate that solutions to the filtering problems for original models can be constructed by the solutions to the filtering problems for the reduced model.Note that where Y N,γ c,0:t is the trajectory of Y N,γ c (•) from time 0 to t, and In this paper, we name these functions filtering maps, as they map the observations to the solutions of filtering problems.Based on these functions, we define artificial filters like the following which is constructed by plugging the observations of the full model into the filter maps of the reduced models.We can show that under some mild conditions, these artificial filters can be arbitrarily close to the exact filters of the original model as N goes to infinity.The rigorous statement of this result is presented as follows.
1. Assume that (4) and (6) are satisfied.Then, for and any bounded continuous function φ, there hold 2. Assume that conditions (4), ( 8), ( 9), ( 11), ( 12), (13), and ( 14) are satisfied, and, moreover, h(x) = h(Π 2 x) ∀x ∈ R n .Then, for any bounded continuous function φ such that φ(κ, Proof.The proof is shown in Appendix B. The above theorem indicates that the filtering problem for the original model can be transformed to the problem of computing artificial filters πN,γ 1 c,t (φ), πN,γ 1 d,t i (φ), πN,γ 2 c,t (φ), and πN,γ 2 d,t i (φ).Note that these artificial filters plug observations of the original models into filter maps of the reduced models.Therefore, one idea of calculating them is to construct particle filters where the observations of original models and the dynamics of the hybrid approximations are inserted.The corresponding particle filters are defined as follows.
• Let πN,γ M,d,t i (φ) ( = 1, 2) be the output of the particle filter Algorithm 2 where Y N,γ d (•), the dynamic model of X γ (•), and the joint distribution of K, lim N→∞ X N (0 Under some mild conditions, the above-defined particle filters converge to the corresponding artificial filters (see Theorem 2), and, therefore, can accurately approximate the true solutions to the filtering problems for the original model (see Theorem 3).Theorem 2. Let the SIR particle filters defined in Definition 1 utilize the residual resampling scheme in the resampling step.If the process X γ (•) ( ∈ {1, 2}) is almost surely nonexplosive, then for any bounded continuous function φ(κ, x), there hold where ct and dt i are time dependent scalars.
Proof.The convergence of discrete-time particle filters follows immediately from [12,Theorem 1].The convergence of continuous-time particle filters is shown in Appendix C.
Theorem 3. Let the SIR particle filters defined in Definition 1 utilize the residual resampling scheme in the resampling step.
1.If (4) and (6) are satisfied, then for any bounded continuous function φ(κ, x), there hold 2. Assume that conditions (4), ( 8), ( 9), ( 11), ( 12), (13), and ( 14) are satisfied, and, moreover, h(x) = h(Π 2 x) for all x ∈ R n .Then, for any bounded continuous function φ(κ, x) such that φ(κ, Proof.It follows immediately from Theorem 1 and Theorem 2. Theorem 3 states that errors between the proposed particle filters (see Definition 1) and the corresponding solutions to the target filtering problems are very likely to be small, provided a large scaling factor N and a large particle population size M. Therefore, the proposed particle filters can solve the filtering problems of the original model accurately.Apart from the accuracy, these SIR particle filters consume much less computational resources than the ones constructed by the full dynamical model, as the former ones only require the simulation of a reduced model in the sampling step, which is computationally much cheaper than the simulation of the full model (see Remark 1 and Remark 2).Both facts suggest that the particle filters proposed in Definition 1 based on hybrid approximations are efficient in solving the target filtering problems.
Compared with the model reduction technique via time-scale separations (Proposition 1 and Proposition 2), our main result for the filtering problem (Theorem 3) requires two more additional conditions, first that the function φ(•) should be bounded, and second that functions h(•) and φ(•) should be irrelevant to the fast subnetwork (for the second time scale only).The boundedness of φ(•) is a result of the boundedness condition required for the convergence of filters (see Theorem 1 and Theorem 2).This condition is a very common requirement in filtering theory, as it guarantees the true filter to be well-defined and simplifies analyses.However, it is not generally satisfied for biological applications, where concentrations of most target species have no theoretical upper bounds apart from some exceptions, e.g., gene copies.To tackle this problem, we can truncate the target quantity by a large number beyond which the conditional probability is comparatively low, and the tail event contributes little to the conditional expectation.Then, an estimate of the truncated quantity generated by a particle filter proposed in Definition 1 can provide an accurate approximation to the conditional expectation of the target quantity.
The other additional condition that h(•) and φ(•) are irrelevant to the fast subnetwork does not place many obstacles to applying the proposed particle filters because of the following reasons.First, most fluorescent reporters are large protein molecules which typically have slower dynamics than smaller molecules, such as enzymes and amino acids.Consequently, the function h(•), which represents the abundance of fluorescent reporters, is usually irrelevant to the fast fluctuating subnetwork.Meanwhile, researchers are inclined to estimate the state of large molecules, e.g., proteins or DNAs, rather than small molecules, as a result of which the function φ(•) is also independent of the fast subnetwork in most cases.
We conjecture that both of the additional conditions in Theorem 3 can be relaxed, and, therefore, the proposed method can cover a much broader class of multi-scale reaction systems.In the proof (see Appendix B), we frequently use the portmanteau lemma to show the convergence of expectations, which leads to the boundedness condition of φ.Therefore, by considering the convergence rate of the hybrid approximation (see [45]), it could be possible to extend the portmanteau lemma to work for unbounded functions and, thus, relax the boundedness condition of φ.On the other hand, the fast subnetwork state is very much shaped by the state of the slow subnetwork (see (11)).Therefore, it is reasonable to think that we can instantaneously get the knowledge of the fast subnetwork by inferring the slow one, which can potentially relax the constraint that h(•) and φ(•) are irrelevant to the fast subnetwork.However, to rigorously justify all these points, we need to have more elaborate analyses of the problem, because of which we leave it for future work.
We end this section with a simple algorithm that summarizes the procedures to estimate latent states of a multiscale reaction system using the proposed SIR particle filters.

Algorithm 3
The procedure to estimate latent states of a multi-scale reaction network system using SIR particle filters based on hybrid approximations 1: Describe the multi-scale reaction system in terms of scaling parameters α i -s and β j -s and choose the time scale of interest (see Section 2.2).Preferably, the chosen time scale is γ 1 or γ 2 .2: Construct a hybrid approximation of the target system using ( 5) or (10).3: Check whether the quantity of interest, φ(K, X N,γ (t)), has a theoretical bound.If not, truncate the function by a large number beyond which the system is not likely to reach.4: Based on the type of the observation and the chosen timescale, pick an appropriate particle filter in Definition 1 to solve the filtering problem.The performance of this filter is guaranteed by Theorem 3.

Numerical examples
In this section, we illustrate our approach using two numerical examples, a simple gene expression model and a transcriptional regulation network.Such gene expression models are prevalent in systems and synthetic biology and captures key biophysical phenomena observed in lab experiments.Since the burst kinetics in gene expression networks are highly gene specific [46], the identifiability is usually guaranteed in these systems.Meanwhile, a hybrid approximation based approach has already been successfully applied to the parameter inference problem for gene expression systems [47].Thus, it is natural to think that our approach that also utilizes hybrid approximations should be applicable to the filtering problem for these gene models.
All algorithms mentioned in this section are implemented on Matlab and executed on a high performance computing cluster with 15-core, 2.25GHz AMD EPYC 7742 processors.All code applied to perform the analysis is available at "github.com/ZhouFang92/Particle-filters-for-multi-scale-CRNs-v.1".We first consider a gene expression model (see Figure 1 (A)) that contains the basic patterns in the central dogma of molecular biology.This model is also known as a telegraph model, which has been widely used in the study of gene transcriptional dynamics; also, it can capture important biophysical phenomena (e.g., transcriptional bursting [46,40]).The model involves four species: a gene having on and off states (denoted respectively as S 2 and S 1 ), the mRNA it transcribes (denoted as S 3 ), and a fluorescent protein it expresses (denoted as S 4 ); besides, it has six reactions:

A simple gene expression model
For this system, we take N = 100, α 4 = 1, and α i = 0 for 1 ≤ i ≤ 3, i.e., the cellular system consists of hundreds of fluorescent protein molecules but very few copies of other molecules.The values of reaction constants and their scaling exponents are shown in Table 1.With this setting, we can easily calculate that γ 1 = 0, and the second time scale does not exist.For initial conditions, we assume X N 1 (0) to have a binary distribution with mean 1/3, X N 2 (0) to satisfy to have a Poisson distribution with mean 2, and X N 4 (0) to also have a Poisson distribution with mean 2. Also, we assume that all reaction constants and initial conditions except X 2 (0) are independent of each other.
Reaction constants (minute −1 ) Exponents Scaled rates (minute −1 ) Reaction scale Table 1.Scaling exponents for reaction rates of the gene expression model: U is the notation for the uniform distribution.
By (1), we can easily derive the full dynamical model at the fastest time scale γ 1 as follows.
In this example, we also assume that light intensity signals can be observed continuously or discretely from a microscope, satisfying (2) or (3) where h(x) = (10 × x 4 ) ∧ 10 3 with 10 3 being the measurement range, B(t) is a Brownian motion, t i = 2i (i.e., discrete-time observations come every two minutes), and {W(t i )} i∈N >0 is a sequence of mutually independent standard Gaussian random variables.the true parameter so that all the true parameters are rescaled to 1.The rest of the panels compare performances of different particle filters in estimating dynamic states, where black lines are the true values of the underlying system, red lines are the estimates by the particle filter using the full dynamic model, and the blue lines are the estimates by the particle filter using the reduced dynamic model.Specifically, (C), (E), and (G) show the estimates of the conditional means of Activated DNAs, mRNAs, and fluorescent proteins, respectively; (D), (F), and (H) present the estimates of the conditional standard deviations of these quantities.
In this numerical example, we first randomly chose a set of system parameters, simulated the system for 90 minutes, and generated observations for both continuous-time and discrete-time scenarios.Then, based on these observations, we used particle filters that applies the hybrid approximations to infer dynamic states and reaction constants.Throughout this experiment, we set particle population to be 100,000.Meanwhile, we took the particle filter that applies the full dynamic model as a benchmark.
Numerical simulation results are presented in Figure 1, Figure 2, and Figure 3. From Figure 1, we see that particle filters applying reduced models is much faster than particle filters applying full dynamical models.Moreover, both kinds of filters provide very consistent estimates to the dynamical states in both continuous-time and discretetime scenarios (see Figure 2 and Figure 3); their relative L 1 distances in estimating different parts of the system are summarized in Table 2. Besides, we also observe from Figure 2 and Figure 3 that both particle filters accurately track the trajectories of the mRNA and protein, and they also successfully identify the change of the gene state after a short time period.The relatively slow convergence of the estimate to the gene state is attributed to the long time delay between the DNA dynamics and the observation processes, which makes the observation less informative about the DNA dynamics.Particularly, in the beginning, the filters are not certain about the gene state until they identify the Fig. 3. Simulation results for the simple gene expression model with discrete-time observations.The meaning of each panel is the same as the corresponding one in Figure 2, except that panel (A) draws the raw data of the observation instead of the accumulative observation signal.
growth of mRNA copies at about the sixth minute, which suggests that the gene has already been activated.This delay also occurs at the final time when the gene is deactivated -the filters can not be sure of the gene state until they identify a sharp decrease of the mRNA copies at a time point when the gene has been shut down for a while.
Moreover, both filters also provide consistent and accurate estimates to model parameters (see Figure 2(B) and Figure 3(B)).Despite this success, in these filters, we still observe a well-known phenomenon, called sample degeneracy, that the particle diversity in parameters drops quickly over time (see the second row of Figure 4).This degeneracy usually has adverse effects on parameter inference.Fortunately, in this case study, there are still hundreds of distinguishable particles left at the final time.Therefore, sample degeneracy only slightly affects the estimate to the posterior of model parameters (see the first row of Figure 4) and does not affect their mean estimates very much (see Figure 2(B) and Figure 3(B)).We leave the problem of how to combat sample degeneracy for future work.
Finally, we investigate the performance of our filter when the observation noise is non-Gaussian.In real experiments, though the scale of observation noise can be reliably quantified, identifying its probability distribution can be difficult.When the Gaussian-noise assumption is violated, the weight updating step which utilizes the Gaussian likelihood might be problematic and introduce extra errors to the estimate.To consider this factor, we replaced Gaussian noise with several other types of noise (shown in Table 3) and then tested our filter (which applies the reduced model and updates weights by the Gaussian likelihood) in these settings.In these numerical experiments, we restricted ourselves to the discrete-time observation case and took the particle filter applying the full model and updating weights The first two columns show the results for the continuous-time observation case, and the last two columns show the results for the discrete-time observation case.The first row shows particle distributions in k 3 -k 4 plane at the final time, where the black dot shows the true value, and the colored dots show the particles from the filters.The transparency of a colored dot indicates the number of particles located in this specific site; the more transparent it is, the fewer particles are located.The second row shows the time evolution of the particle diversity in parameters (i.e., the number of distinguishable particles for estimating model parameters).This figure tells that for both observation types and both particle filters, the particle diversity in parameters decays dramatically over time.Finally, hundreds of distinguishable particles are left, and their distributions are not very consistent.

Activated
. The results show that these two kinds of particle filters provide very consistent estimate to the dynamical states with relative errors no greater than 8%.
by the exact distribution of observation noise as a benchmark.Numerical results are presented in Table 3, which show the relative L 1 distance between these two filters.Note that the observation noise directly affects our belief about on the fluorescent proteins -the stronger the noise intensity is, the larger conditional variance will be.The shape of the observation noise also matters.Therefore, it is not surprising to see that the non-Gaussian noise mainly affects the performance of our filter in estimating the conditional standard deviation of fluorescent proteins (see Table 3).However, our filter still provides accurate estimate for the mean dynamics of proteins with relative errors no greater than 4%.Moreover, our filter is still consistent with the benchmark in estimating the DNA and mRNAs with relative errors no greater than 11%.Notably, this consistency between our filter and the benchmark holds for both white noise (the t-distribution and Laplace distribution) and colored noise (the log-normal distribution).To conclude, our method is still accurate for this gene expression model when the Gaussian-noise assumption is violated.
In summary, our particle filter applying the reduced model is both accurate and computationally efficient in solving filtering problems for this multiscale gene expression model.Moreover, when the Gaussian-noise assumption is violated, our filter can still provide reliable estimates to dynamical states regardless of whether the observation noise is white or colored.

A transcription regulation network
We then consider a transcription regulation network (called Goutsias' model [48]) where the gene product affects its own gene activation process (see Figure 5 (A)).Such an autoregulation mechanism (where the gene product auto-regulates its own gene expression) is very common in living cells, which occurs over 40% of known E. coli's transcription factors [49].Moreover, it has various biological functions, including speeding up the response time of gene expression [50], inducing genetic oscillators [51], and achieving adaptation to periodic external stimuli [52]   model considered in this subsection involves five species: the protein monomer (denoted by S 1 ), the dimer (also the transcription factor, denoted by S 2 ), the mRNA (S 3 ), the unbound DNA (S 4 ), and the bound DNA (also the activated state, denoted by S 5 ); moreover, it has eight reactions: Reaction constants (minute −1 ) Exponents Scaled rates (minute −1 ) Reaction scale Table 4. Scaling exponents for reaction rates of the gene expression model: U is the notation for the uniform distribution We take N = 100, α 1 = α 2 = 1 and α i = 0 for 3 ≤ i ≤ 5; in other words, the cellular system consists of hundreds of protein molecules but very few copies of other molecules.The values of reaction constants and their scaling exponents are shown in Table 4.With this setting, we can easily check that Also, it is easy to check that the operator L N,γ 1 K,y for y ∈ L 2 admits a unique stationary distribution where Therefore, by (10), the reduced model at the second time scale satisfies Let S 1 and S 2 be both fluorescent reports, and light intensity signals satisfy (2) or (3) where h(x) = (x 1 + 2x 2 )∧10 3 with 10 3 being the measurement range, B(t) is a Brownian motion, t i = 2i (i.e., discrete-time observations come every 200 seconds or 3.33 minutes), and {W(t i )} i∈N >0 is a sequence of mutually independent standard Gaussian random variables.
In this example, we also randomly chose a set of system parameters, simulate the system for 2.5 hours (9000 seconds), and generate observations for both continuous-time and discrete-time scenarios.Then, we use SIR particle filters that apply the reduced model at the second time scale to infer non-fast fluctuating dynamic states and reaction constants K. Throughout this numerical example, we set the population of particles to be 10,000.Finally, we take the particle filter that applies the full dynamic model as a benchmark.
Numerical simulation results are presented in Figure 5, Figure 6, and Figure 7. From Figure 5, we can observe that particle filters applying the reduced dynamics consumes far less computational time than particle filters applying the full dynamic model.Meanwhile, we can learn from Figure 6 and Figure 7 that both particle filters can follow the trend of true dynamic states, and their trajectories almost merge together (whose relative L 1 distances are presented in Table 5).These observations imply that these filters perform quite the same in estimating hidden dynamic states.Moreover, from Figure 6(B), and Figure 7 . The results show that these two kinds of particle filters provide very consistent estimate to the dynamical states with relative errors no greater than 9%.
estimates to model parameters.However, sample degeneracy still occurs in this case, which makes the estimated posterior of model parameters not very consistent (see Figure 8).
Finally, we investigate the performance of our filters when the observation noise is non-Gaussian.Similar to the previous case study, we restricted ourselves to the discrete-time observation case and replaced the Gaussian noise with three other types of noise (shown in the first column of Table 6).In these numerical experiments, we compared our filter that uses the reduced model and updates weights by the Gaussian likelihood with a benchmark filter, which uses the full dynamical model and updates weight by the exact distribution of observation noise.Their relative L 1 distances are presented in Table 6.From the result, we observe that in this case, our method still provides a relatively accurate estimate to the DNA state and mRNAs with relative L 1 errors less than 16%.The non-Gaussian noise mainly affects the estimation of the conditional standard deviation of fluorescent proteins; however, our filter still provides accurate estimates to the conditional mean of proteins with relative errors no greater than 4%.To conclude, when the Gaussian noise assumption does not hold, our filter is still reliable for this transcription regulation network, and this result holds for both white noise (the t-distribution and Laplace distribution) and colored noise (the log-normal distribution).In summary, our particle filter applying the reduced model is both accurate and computationally efficient in solving filtering problems for this transcription regulation model.Moreover, when the Gaussian-noise assumption does not hold, our filter can still provide reliable estimates to dynamical states no matter whether the observation noise is white or colored.the true parameter so that all the true parameters are rescaled to 1.The rest of the panels compare performances of different particle filters in estimating dynamic states, where black lines are the true values of the underlying system (the panel (G) shows 1 minute moving average of the true value), red lines are the estimates by the particle filter using the full dynamic model, and the blue lines are the estimates by the particle filter using the reduced dynamic model.Specifically, (C), (E), and (G) show the estimates of the conditional means of the total mass of proteins (S 1 + 2S 2 ), mRNAs, and the activated Genes, respectively; (D), (F), and (H) present the estimates of the conditional standard deviations of these quantities.

Conclusion
In this paper, we established efficient particle filters to solve filtering problems for multiscale stochastic reaction network systems by using the time-scale separation technique.We first showed that the solution of the filtering problem for the original system can be accurately approximated by the solution of the filtering problem for a reduced model that represents the dynamics by a hybrid approximation (see theorem 1).The reduced model is based on exploiting the time-scale separation in the original network and can greatly reduce the required computational effort to simulate the dynamics.Consequently, these results enabled us to develop efficient particle filters to solve the  7. Simulation results for the transcription regulation network with discrete-time observations.The meaning of each panel is the same as the corresponding one in Figure 6, except that panel (A) draws the raw data of the observation instead of the accumulative observation signal.
original filtering problem by applying the particle filter to the reduced model (see theorem 3).Finally, we used two numerical examples to illustrate our approach.Both examples show that the constructed filter can accurately and computationally efficiently solve the corresponding filtering problems and, therefore, imply that our approach can improve scientists' ability to extract dynamic information about intracellular systems from time-course experiments.
There are a few topics deserving further investigation in future work.First, it is worthwhile to further improve the performance of particle filters in estimating system parameters.From numerical experiments, we observed that the SIR particle filter is less accurate in estimating the posterior of system parameters due to sample degeneracy.A standard method to mitigate this problem is to use regularized particle filters instead of SIR ones [53].Usually, to guarantee the performance of a regularized particle filter, one needs to show some regularity conditions of the transition kernel of the dynamic model (see [11,Lemma 2.38]), which requires researchers to do more elaborate analyses of the underlying system.As our first attempt to solve this problem, we provided in a follow-up paper [54] several regularity conditions for SRNs using parameter sensitivity analysis and proved the convergence of regularized particle filters under these conditions.Another direction for future research is to utilize adaptive model reduction algorithms (e.g., [30]) to speed up the particle filters further.It will be very helpful to systems where magnitudes of concentration levels vary over time, such as toggle switches [55] and repressilator [56].Finally, one can also implement our algorithms on a Cyberloop platform [40] and apply them to biological studies, e.g., identifying intracellular reaction mechanisms and guiding cell differentiation.In this scenario, one need to deal with hundreds of cellular systems un- The first two columns show the results for the continuoustime observation case, and the last two columns show the results for the discrete-time observation case.The first row shows particle distributions in k 3 -k 4 plane at the final time, where the black dot shows the true value, and the colored dots show the particles from the filters.The transparency of a colored dot indicates the number of particles located in this specific site; the more transparent it is, the fewer particles are located.The second row shows the time evolution of the particle diversity in parameters (i.e., the number of distinguishable particles for estimating model parameters).This figure tells that for both observation types and both particle filters, the particle diversity in parameters decays dramatically over time.Finally, about a hundred distinguishable particles are left, and their distributions are not very consistent.For instance, (C) has very few particles in the far north, whereas (D) has many particles in that region.6. Performance of our filter when Gaussian-noise assumption is violated (for the transcription regulation model).This table shows the relative L 1 distance between our filter and the benchmark, when the observation noise has non-Gaussian distributions shown in the first column.In these experiments, our filter utilizes the reduced model and updates weights as if the noise is Gaussian; in contrast, the benchmark filter utilizes the original model and updates weights according to the exact distribution of the observation noise.

Bound
der the microscope and, therefore, solve hundreds of filtering problems simultaneously, which requires the algorithm to be parallelized efficiently.
Appendix A. Change of measure methods for reduced models In this section, we introduce reference probabilities and Kallianpur-Striebel formulas for the reduced models, which will be helpful in later analysis.

Appendix A.1. Continuous-time observations
We first consider the scenario where observations are continuous with respect to time.For reduced models ( 5) and (10), we define Girsanov's random variables, Z γ c (t) ( = 1, 2), like the following whose reciprocals are martingales with respect to F t under P [6, Lemma 3.9].Then, reference probabilities for reduced models ( 5) and (10) are respectively defined by Under the reference probability P γ c ( = 1, 2), the observation Y γ c (•) is independent of the underlying system (K, X γ ) and becomes an m-vector of independent standard Brownian motions [6,Proposition 3.13].For these reduced models, Kallianpur-Striebel formulas are expressed as [6,Proposition 3.16] (also see [44])  Then, we consider the case where the observations are discrete with respect to time.For reduced models ( 5) and ( 10), we define random variables where H(x, y) = exp h (x)y − 1 2 h(x) 2 , and whose reciprocals are martingales under P. 6 Hence, the reference probabilities for reduced models ( 5) and ( 10) can be respectively defined by Under the reference probability P γ d ( = 1, 2), the observations Y γ d (t i ) are independent of the underlying system (K, X γ (•)) and are mutually independent m-variate Gaussian random variables whose coefficient matrices are the identity matrix. 7Then, for these reduced models, Kallianpur-Striebel formulas are expressed as where the equality follows from (A.1.1).Finally, the last line in the above formula tends to zero by (A. and Kγ 1 , Xγ 1 (•) have the same laws as K, X N,γ 1 (•) and (K, X γ 1 (•)), respectively, and 2. If conditions (4), ( 8), ( 9), ( 11), ( 12), (13), and (14) are satisfied, then for any t > 0, there exist R r >0 × D R n [0, t]-valued random variables K N,γ  • For = 1 (the first time scale), assume that (4) and (6) hold.Then, for any t > 0 and any bounded continuous function φ, the random variables and probability space defined in Appendix B.2 satisfy (A.1.3)and (A.2.3).
• For = 2 (the second time scale), assume that the conditions of the second part of Theorem 1 hold.Then, for any t > 0 and any bounded continuous function φ such that φ(κ,

Appendix B.4. The proof of the target theorem
With these preparations, we can finally prove Theorem 1.

Appendix C. The proof of the continuous-time part of Theorem 2
This section contributes to showing the convergence of the proposed continuous-time SIR particle filters, i.e., the continuous-time part of Theorem 2. For diffusion systems, the convergence of continuous-time SIR particle filters is shown in [6, Chapter 9] by using backward Zakai's equation.In this section, we use a similar proof scheme to extend the result to piecewise deterministic Markov processes.Though we still follow the main idea of backward stochastic differential equations, we present the proof in the forward fashion so that notations can be simplified, and the discussion about the existence of the solution of the backward stochastic differential equation can be avoided.Throughout this section, we assume that there hold the non-explosive condition (6) for the first time scale and ( 14) for the second time scale.Also, the scaling factor N is fixed in this section, so we drop the argument N for most variables that have a dependence on N to simplify the notation.
In general, the proof follows from the law of large numbers.We first show the filter generated by these particles can accurately approximate the true filter under some mild conditions (see Theorem 5), which require • the particle filter to accurately approximate the true filter at the initial time (C.C.1), • the error between the particle filter and the true one to grow mildly over time (C.C.2), • the resampling to perturb the particles mildly (C.C.3).
Fortunately, these conditions can be easily verified by the law of large numbers and the nature of the residual resampling method, which finally proves the result (see the proof of Theorem 3 at the end of this section).
Notably, when analyzing the propagation of the particle filter's error over time (i.e., (C.C.2)), we need to use particles obtained at an earlier time to predict the particle filter's performance at a later time.The particle filter at the later time contains some information about the observation that is unavailable at the earlier time.As a result, we need to extend the capability of the particle filter so that it can generate an estimate of a random variable that is measurable with respect to a σ-algebra at a later time.This fact leads to some complicated notations, which we introduce as follows.
We first present a theorem adapted from [9].
Theorem 5 (adapted from [9]).We assume that X γ ( ∈ {1, 2}) is almost surely non-explosive.Then, for any t > 0 and any bounded measurable function φ, there holds the relation where ct is a time dependent number, if the following conditions are satisfied.where c 3 is a constant.
Proof.The proof is almost the same as the one of [9,Theorem 2.3.1];therefore, we only single out the proof framework and leave the details for readers.In principle, the proof is shown by mathematical induction.Specifically, the condition (C.C.1) guarantees the particle filter to accurately approximate the true filter at the initial time.The condition (C.C.2) suggests the error between the particle filter and the true filter to grow mildly over time.The condition (C.C.3) suggests that resampling perturbs the particles mildly so that the particle filter's accuracy will not be influenced too much.Notably, compared with [9, Theorem 2.3.1],this theorem considers the boundedness of ψ t,τ in the sense of (C.1) rather than the supremum norm.Consequently, all the discussions concerning boundedness in the supremum norm in the proof of [9,Theorem 2.3.1] are replaced by the argument of (C.1).
Finally, by verifying all the conditions in the above theorem, we can prove the continuous-time part of Theorem 2.
The proof of the continuous-time part of Theorem 2. To prove the result, we need to verify all the conditions in Theorem 5.The condition (C.C.1) follows immediately from the central limit theorem.where the equality is due to the independence of the motion of the particles, and the inequality follows from (C.7).
Finally, if we use the multinomial resampling, then we can have the relation

Fig. 1 .
Fig. 1. (A) shows the biological circuit of the considered network.(B) shows the computational time consumed by different particle filters.The filters were run on a 2.25GHz AMD processors with 15 cores.

Fig. 2 .
Fig. 2. Simulation results for the simple gene expression model with continuous-time observations: Panel (A) shows the accumulative observation signal Y N,γ c (t), which is the integral of the time-course measurement from time 0 to time t.Panel (B) shows the performance of both filters in estimating reaction constants, where error bars represent 95% confidence intervals.In this panel, we draw the relative values of these estimates the estimate

Fig. 4 .
Fig.4.Particle distributions in the simple gene expression example.The first two columns show the results for the continuous-time observation case, and the last two columns show the results for the discrete-time observation case.The first row shows particle distributions in k 3 -k 4 plane at the final time, where the black dot shows the true value, and the colored dots show the particles from the filters.The transparency of a colored dot indicates the number of particles located in this specific site; the more transparent it is, the fewer particles are located.The second row shows the time evolution of the particle diversity in parameters (i.e., the number of distinguishable particles for estimating model parameters).This figure tells that for both observation types and both particle filters, the particle diversity in parameters decays dramatically over time.Finally, hundreds of distinguishable particles are left, and their distributions are not very consistent.

Fig. 5 .
Fig. 5. (A) shows the biological circuit of the considered network.(B) shows the computational time consumed by different particle filters.Notably, y-axis tick values grow exponentially rather than linearly.The filters were run on a 2.25GHz AMD processors with 15 cores.

Fig. 6 .
Fig. 6.Simulation results for the transcription regulation network with continuous-time observations: Panel (A) shows the accumulative observation signal Y N,γ c (t), which is the integral of the time-course measurement from time 0 to time t.Panel (B) shows the performance of both filters in estimating reaction constants, where error bars represent 95% confidence intervals.Here we draw the relative values of these estimates the estimate Fig.7.Simulation results for the transcription regulation network with discrete-time observations.The meaning of each panel is the same as the corresponding one in Figure6, except that panel (A) draws the raw data of the observation instead of the accumulative observation signal.

,Fig. 8 .
Fig.8.Particle distributions in the transcription regulation network example.The first two columns show the results for the continuoustime observation case, and the last two columns show the results for the discrete-time observation case.The first row shows particle distributions in k 3 -k 4 plane at the final time, where the black dot shows the true value, and the colored dots show the particles from the filters.The transparency of a colored dot indicates the number of particles located in this specific site; the more transparent it is, the fewer particles are located.The second row shows the time evolution of the particle diversity in parameters (i.e., the number of distinguishable particles for estimating model parameters).This figure tells that for both observation types and both particle filters, the particle diversity in parameters decays dramatically over time.Finally, about a hundred distinguishable particles are left, and their distributions are not very consistent.For instance, (C) has very few particles in the far north, whereas (D) has many particles in that region.

Table 2
. Relative L 1 distance between both kinds of particle filters in estimating the simple gene expression system.The relative L 1 distance between two signals f 1 (t) and f 2 (t) is defined by

Table 3 .
Performance of our filter when Gaussian-noise assumption is violated (for the simple gene expression model).This table shows the relative L 1 distance between our filter and the benchmark, when the observation noise has non-Gaussian distributions shown in the first column.In these experiments, our filter utilizes the reduced model and updates weights as if the noise is Gaussian; in contrast, the benchmark filter utilizes the original model and updates weights according to the exact distribution of the observation noise.
(B), we can observe that both filters also provide consistent and accurate

Table 5 .
Relative L 1 distance between both kinds of particle filters in estimating the transcription regulation network.The relative L 1 distance between two signals f 1 (t) and f 2 (t) is defined by The proofs of these two results are in the same spirit.Therefore, we only show the proof of the first result.Let us construct a probability measure Pγ t on the measurable space Ωγ t , F γ Appendix B.2. Constructing auxiliary random variablesThis subsection contributes to constructing random variables and probability spaces that satisfy (A.1.1),(A.1.2),(A.2.1) and (A.2.2).We first apply Skorokhod's representation theorem to construct the required random variables.If (4) and (6) hold, then for any t > 0, there exist R r >0 2 , XN,γ 2 (•) and Kγ 2 , Xγ 2 (•) defined on a common probability space Ωγ 2 t,1 , F γ 2 t,1 , Qγ 2 t,1 , such that K N,γ 2 , XN,γ 2 (•) and Kγ 2 , Xγ 2 (•) have the same laws as K, X N,γ 2 (•) and (K, X γ 2 (•)), [57,erefore, by Skorokhod's representation theorem[57, Theorem 1.8in Chapter 3], the result is proven.Qγ t -a.s. for any t > 0, = 1, 2, and any continuous function φ given in the lemma.In the above formulas, Ỹγ c,t is the filtration generated by the process { Ỹγ c (t)} t>0 .Therefore, by Jensen's inequality and the triangle inequality, we can further arrive at N,γ , XN,γ (t) − φ Kγ , Xγ (t) where φ ∞ is the supremum norm of φ, and the measure Pγ t is defined by (t).Note that Pγ t and Qγ t are equivalent due to the almost sure positiveness of Zγ c (t).Finally, terms on the right hand side of the second inequality converge to zero by Lemma 4, Lemma 1, and the dominant convergence theorem.By Proposition 3, we can verify conditions (A.1.3)and (A.2.3).