|
Academic Open Internet Journal |
Volume 15, 2005 |
STOCHASTIC APPROXIMATION FOR ESTIMATION
OF BIOLOGICAL MODELS
Trakian University – St. Zagora, Yambol
BULGARIA
Abtract: Stochastic approximation for estimation (SAE) is a class of optimization algorithms which compute an approximation of the gradient and/or the Hessian of the objective function by varying all the elements of the parameter vector simultaneously and therefore, require only a few objective function evaluations to obtain first or second-order information. Consequently, these algorithms are particularly well suited to problems involving a large number of design parameters. In this study, their potentialities are assessed in the context of nonlinear (NN) system identification. To this aim, a challenging modeling application is considered, i.e. dynamic modeling of batch animal cell cultures from sets of experimental data. The performance of the optimization algorithms are discussed in terms of efficiency, accuracy and ease of use.
Key words: - nonlinear system identification; stochastic approximation; biotechnology
1. INTRODUCTION
The process of modeling requires an estimation of several unknown parameters from noisy measurement data. To this aim, a least-squares or maximum –likelihood cost function (depending on the assumptions of the measurement noise) is usually minimized using a gradient-based optimization method.
Several techniques for computing the gradient of the cost function are available, including finite difference approximation and analytic differentiation. This latter technique leads to backpropagation in neural networks or several sensitivity equations in the case of conventional first-principles models.
In the above-mentioned techniques, the computational expense required to estimate the current gradient direction is directly proportional to the number of unknown model parameters, which becomes an issue for models involving a large number of parameters. This is typically the case in nonlinear (NN) modeling, but can also occur when estimating parameters and initial conditions in first-principles models.
In contrast to standard finite differences which approximate the gradient by varying the parameters one at a time, the simultaneous perturbation (SP) approximation of the gradient proposed by Spall [5] makes use of a very efficient technique based on a simultaneous (random) perturbation in all the parameters. Hence, one gradient evaluation requires only two evaluations of the cost function. This approach has first been applied to gradient estimation in a first-order stochastic approximation (SA) algorithm [5], and more recently to Hessian estimation in an accelerated second-order stochastic approximation for estimation (SAE) algorithm [6].
In previous works [3, 7], the authors applied the above–mentioned first- and second- order SA algorithms (1SAE and 2SAE) to weights and biases estimation in NNs, and proposed several variations of the 1SAE algorithm. These simulation studies were limited to relatively simple examples, but demonstrated the efficiency and modest computational costs of 1SAE. The objective of this paper is to extend these studies by evaluating:
· variants of 1SAE/2SAE algorithms, in which scaling of the gradient/Hessian estimates is introduced to avoid potential large variations in the course of the optimization process;
· the performance of first- and second- order algorithms as applied to a challenging parameter estimation problem, namely identification of unknown parameters in a macroscopic model of batch animal cell cultures from experimental measurements of biomass, glucose, glutamine and lactate concentrations.
This paper is organized as follows. Section 2 introduces the basic principles of the first- and second- order SA algorithms used throughout this study. In section 3, the algorithms are applied to the maximum likelihood estimation of kinetic parameters and initial conditions of a bioprocess model from experimental measurements of several macroscopic component concentrations. Direct and cross-validation results demonstrate the good model agreement. Finally, section 4 is devoted to discussions and concluding remarks.
2. SAE ALGORITMS
Consider the problem of minimizing a, possibly noisy, objective
function
with
respect to a vector
of unknown parameters.
1
SAE is given by the following core recursion for the parameter vector
[3; 7]. This
is shown by equation:
(1)
,
in which
is a non-negative scalar gain
coefficient, and
is an approximation of the criterion
gradient obtained by varying all the elements of
simultaneously, i.e.
(2)
,
where:
is a positive scalar and
with
symmetrically Bernouilli distributed random variables
.
In its original formulation, 1SAE makes use
of decaying gain sequences
and
in the form
(3)
,
,
which ensure asymptotic convergence results. However performance in finite samples can be different, and numerical experiments suggest that an adaptive gain sequence for parameter updating [1, 3, 5] can enhance convergence and stability (this is particularly true when solving a non convex parameter identification problem), i.e.
(4)
,
(5)
.
In addition to gain attenuation when the
value of the criterion becomes worse, “locking” mechanisms [5, 6] are also
applied , i.e. the current step is rejected and, starting from the previous
parameter estimate, a new step is accomplished (with a new gradient evaluation
and a reduced updating gain). The parameter
in equation (4) and (5)
represents permissible increase in the criterion, before step rejection and
gain attenuation occur.
A constant gain sequence
can be used for gradient
approximation, the value
being selected so as to overcome the
influence of (numerical or experimental) noise. In the optimum neighborhood,
however, a decaying sequence in the form (3) is required to evaluate the
gradient with enough accuracy and avoid an amplification of the “slowing down”
effect as an optimum is approached (note that this phenomenon is even more
pronounced in the case of SP techniques since the gradient information is more
delicate to “extract” in the – usually rather “flat” – neighborhood of the
optimum).
Finally, a gradient smoothing (GS) procedure is implemented, i.e., gradient approximations are averaged across iterations in the following way
(6)
,
where
is decreased in a way similar to
equation (4) and (5) when step rejection occurs (i.e.
with
) and is
reset to its initial value
after a successful step.
The use of these numerical artifices, i.e. adaptive gain sequences, step rejection procedure and gradient smoothing, significantly improves the effective practical performance of the algorithm (which, in the following, is denoted “adaptive1SP-GS”) [1, 2, 4].
As relatively large excursions in the
parameter space can be achieved, convergence can also be enhanced through
scaling of the gradient estimate (2) at each iteration. This new feature is
implemented here by normalizing each direction of the gradient vector
with respect
to its largest component (infinity norm scaling)
(7)
.
This latter version is denoted 1SP - GSS (Gradient Smoothing and Scaling).
Inequality constraints can also be taken into account by a projection algorithm introduced in [4], i.e. the current parameter estimate is projected onto a closed set included in the admissible region in such a way that no function evaluation is required outside this latter region. In this study, bound constraints (e.g., positivity constraints) are handled in this way.
The second-order algorithms 2SAE are based
on the following two core recursions, one for the parameter vector
, the second
for the Hessian
of the criterion [6]:
(8)
,
(9)
,
(10)
,
where:
is a per-iteration symmetric estimate of
the Hessian matrix, which is computed from gradient approximations (or direct
evaluations) using a simultaneous perturbation approach,
is a simple sample
mean, and
is
a mapping designed to cope with possible non-positive –definiteness of
.
Again, the algorithm requires only a small number of function evaluations – at least four criterion evaluations to contract the gradient and Hessian estimates – independent of the number of unknown parameters.
Several variants of the mapping
have been
considered in the literature:
a) regularization through addition of a diagonal perturbation matrix with small positive elements [5];
b)
a more elaborate
regularization technique recently proposed in which the eigenvalue matrix
of
is first
“corrected”, i.e. negative elements are replaced by a descending series of
small positive eigenvalues, and a new matrix
is defined. Then, the
orthogonal matrix
of eigenvectors is used to define the
mapping
;
c)
a simplified version of
the preceding approach in which the ”corrected” eigenvalue matrix
is replaced
by a constant diagonal matrix defined by the geometric mean of all the
eigenvalues.
Mapping (a) is easy to implement, but
relatively delicate to tune in practical situations (selection of the elements
of the perturbation matrix). Mappings are potentially more efficient, but more
complex to implement. In addition, some tuning is still required (to select the
small positive eigenvalues that are substituted to the negative elements of
). In this
study, a simple, tuning-free, Hessian estimate is considered. Following an idea
originally introduced in [2], a diagonal approximation of the Hessian is
built,
(11)
,
where the notation indicates a component wise division of two vectors (in analogy with Matlab programming).
The gradients
are obtained by one sided
approximation (in order to limit the number of function evaluations)
(12)
,
where
is a positive scalar (the sequence
can be chosen
in similar way as
, e.g. equation (3)) and
with
symmetrically Bernouilli distributed random variables
(independent of
in equation
(3)).
In the same spirit as Eq. (7), an infinity-norm scaling is introduced, i.e.
(13)
,
(14)
,
where:
is a regularization in which the absolute
value of each of the (diagonal) elements of
is computed and
represents the
largest of these elements.
This latter algorithm is denoted “adaptive
2SP-DHS” (
-
order Simultaneous Perturbation algorithm with Diagonal Hessian estimation and
Scaling).
3. MODELING OF ANIMAL CELL CULTURES
In [8] the authors propose a deterministic model of growth with constraints:
(15)
,
where:
is
the intensity
of the growth of the cellular number, and
is
a coefficient with the sense of specific number growth.
The solution of this equation gives on exponential growth of the number of cells in the system:
(16)
.
Modifications of the proposed
equation are possible. The restricting factors on the process dynamics are
usually accounted by variation of the coefficient
. If with
is denoted the degree
of influence of a given restricting factor, this equation can be presented as:
(17)
.
Consider batch animal cell cultures described by a simple macroscopic reaction scheme growth [6]:
(18)
,
and scheme maintenance:
(19)
,
where:
,
,
and
represent biomass, glucose,
glutamine and lactate, respectively, and
,
and
are pseudo-stoechiometric
coefficients. The symbol “
” means that the growth reaction is auto-catalyzed
by X and the presence of “
” in both sides of the maintenance
reaction means that
catalyzes this latter reaction.
The growth rate
and the maintenance rate
are described
by a general kinetic model structure proposed in [6]:
(20)
,
(21)
.
Simple mass balances allow the following dynamic model to be derived:
(22)
,
,
(23)
,
,
(24)
,
,
(25)
,
,
where:
and
denote the respective component
concentrations.
Identification of bioprocess models is a delicate task and in [6], a systematic procedure is proposed, which allows the pseudo-stoechiometric coefficients to be estimated independently of the kinetic coefficients by minimizing a maximum-likelihood criterion. This procedure also considers the estimation of the most likely initial conditions (since the concentration measurement are corrupted by noisy at each sampling time, including the initial one).
In this study, it is assumed that the pseudostoechiometric coefficients have already been estimated following the above-mentioned procedure and that only the kinetic coefficients and the initial component concentrations have to be inferred from rare and asynchronous measurements of biomas, glucose, glutamine and lactate concentrations.
The measurement equation is given by
(26)
,
,
where:
,
and
are the state,
measurement and noise vectors at time
, respectively. The measurement errors are
assumed to be normally distributed, white noises with zero mean and variance
matrix
.
Data are collected from seven batch experiments corresponding to different initial glucose and glutamine concentrations. Five of these experiments are used for parameter estimation, the two remaining ones being used for cross-validation tests.
The 28 unknown parameters (8 kinetic coefficients and 20 initial concentrations) are estimated by minimizing a maximum likelihood cost function taking into account the measurement noises, i.e.
(27)
,
where
and
are the measurement vector, the
measurement error covariance matrix and the state estimate obtained by
integration of the model equations (21-24) with the parameters
at time
,
respectively.
The tuning parameters of 1SP-GS are selected
as follows:
(a very slowly decaying sequence
is used for
gradient evaluation),
(no relative increase in the criterion is
allowed),
.
For 1SP-GSS, the same parameters are used, expert
. Starting with the measured
initial concentrations (which are effected by measurement errors) and in
initial guess for the kinetic parameters corresponding to a criterian value
, the
minimization problem (26) is repeated 10 times with both algorithms.
4. CONCLUSION
As a conclusion it could be done the following inferences:
1. The simultaneous perturbation approach developed by Spall [3; 5] is a very powerful technique, which allows an approximation of the gradient of the objective function to be computed by effecting simultaneous random perturbations in all the parameters.
2. Therefore, this approach is particularly well-suited to problems involving a relatively large number of design parameters.
3. In this study, variants of first- and second- order SP algorithms are considered and applied to the identification of the kinetic parameters and the initial conditions of a bioprocess model from experimental measurements of a few macroscopic components.
[1]. Renotte C., A. Vande Wouwer, M. Remy. Neural Modeling and Control of a Heat Exchanger based on SPSA Techniques. Proceedings of the Workshop “ACC”, 2000.
[2]. Wouwer A. Vande, C. Renotte, M. Remy. On the Use of Simultaneous Perturbation Stochastic Approximation for Neural Network Training. Proceedings of the Workshop “ACC”, 1999.
[3]. Spall J. Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation. IEEE Trans. Automat. Contr. 45, 2000.
[4]. Sadegh P. Constrained Optimization via Stochastic Approximation with a Simultaneous Perturbation Gradient Approximation. Automatica 33 (1997).
[5]. Spall J. Adaptive Stochastic Approximation by the Simultaneous Perturbation Method. IEEE Trans. Automat. Contr. 45, 2000.
[6]. Bogaerts Ph., R. Hanus. Macroscopic Modeling of Bioprocesses with a View to Engineering Applications. In: Focus on Biotechnology, Vol. 4, Kluver, 2000.
[7]. Petrov N. Use Reliability of Risk Technical Systems. Tracian University, St. Zagora, Yambol, “Uchkov”, ISBN 954-9978-26-5, 2002.
[8]. Kalchev I., S. Yordanova. Biometric ecomonitoring: mathematical models, estimates and predictions, CPS”96, Volume 1, Bratislava, Slovak Republic, 14-15 May 1996, pp. 480-483.
Technical College - Bourgas,
All rights reserved,
© March, 2000