Academic Open Internet Journal

www.acadjournal.com

Volume 15, 2005

 

STOCHASTIC APPROXIMATION FOR ESTIMATION

OF  BIOLOGICAL  MODELS

Nikolay Iv. Petrov

Trakian University – St. Zagora, Yambol

BULGARIA

nikipetrov@lycos.com

 

Abtract: Stochastic approximation for estimation (SAE)  is  a  class  of optimization algorithms which compute an approximation of the gradient and/or the Hessian of the objective function by varying all the elements of the parameter vector simultaneously and therefore, require only a few objective function evaluations to obtain first or second-order information. Consequently, these algorithms are particularly well suited to problems involving a large number of design parameters. In this study, their potentialities are assessed in the context of nonlinear (NN) system identification. To this aim, a challenging modeling application is considered, i.e. dynamic modeling of batch animal cell cultures from sets of experimental data. The performance of the optimization algorithms are discussed in terms of efficiency, accuracy and ease of use.

 

Key words: - nonlinear system identification; stochastic approximation; biotechnology

 

1. INTRODUCTION

          The process of modeling requires an estimation of several unknown parameters from noisy measurement data. To this aim, a least-squares or maximum –likelihood cost function (depending on the assumptions of the measurement noise)  is usually minimized using a gradient-based optimization method.

          Several techniques for computing the gradient of the cost function are available, including finite difference approximation and analytic differentiation. This latter technique leads to backpropagation in neural networks or several sensitivity equations in the case of conventional first-principles models.

          In the above-mentioned techniques, the computational expense required to estimate the current gradient direction is directly proportional to the number of unknown model parameters, which becomes an issue for models involving a large number of parameters. This is typically the case in nonlinear (NN) modeling, but can also occur when estimating parameters and initial conditions in first-principles models.

          In contrast to standard finite differences which approximate the gradient by varying the parameters one at a time, the simultaneous perturbation (SP)  approximation of the gradient proposed by Spall [5]  makes use of a very efficient technique based on a simultaneous (random)  perturbation in all the parameters. Hence, one gradient evaluation requires only two evaluations of the cost function. This approach has first been applied to gradient estimation in a first-order stochastic approximation (SA) algorithm [5], and more recently to Hessian estimation in an accelerated second-order stochastic approximation for estimation (SAE) algorithm [6].

          In previous works [3, 7], the authors applied the above–mentioned first- and second- order SA algorithms (1SAE and 2SAE) to weights and biases estimation in NNs, and proposed several variations of the 1SAE  algorithm. These simulation studies were limited to relatively simple examples, but demonstrated the efficiency and modest computational costs of 1SAE. The objective of this paper is to extend these studies by evaluating:

·        variants of  1SAE/2SAE algorithms, in which scaling of the gradient/Hessian estimates is introduced to avoid potential large variations in the course of the optimization process;

·        the performance of first- and second- order algorithms as applied to a challenging parameter estimation problem, namely identification of unknown parameters in a macroscopic model of batch animal cell cultures from experimental measurements of biomass, glucose, glutamine and lactate concentrations.

This paper is organized as follows. Section 2 introduces the basic principles of the first- and second- order SA algorithms used throughout this study. In section 3, the algorithms are applied to the maximum likelihood estimation of kinetic parameters and initial conditions of a bioprocess model from experimental measurements of several macroscopic component concentrations. Direct and cross-validation results demonstrate the good model agreement.  Finally,  section  4  is devoted to discussions and concluding remarks.

 

2. SAE  ALGORITMS

          Consider the problem of minimizing a, possibly noisy, objective function  with respect to a vector  of unknown  parameters.

          1 SAE is given by the following  core recursion for the parameter vector  [3; 7]. This is shown by equation:          

 

(1)                                               ,

 

in which  is a non-negative scalar gain coefficient, and  is an approximation of the criterion gradient obtained by varying all the elements of  simultaneously, i.e.

 

(2)                     ,

 

where:  is a positive scalar and  with symmetrically Bernouilli distributed random variables

 

          In its original formulation, 1SAE makes use of decaying gain sequences and in the form

 

(3)                             ,         ,

 

which ensure asymptotic convergence results. However performance in finite samples can be different, and numerical experiments suggest that an adaptive gain sequence for parameter updating [1, 3, 5] can enhance convergence and stability (this is particularly true when solving a non convex parameter identification problem), i.e.

 

(4)                             ,

(5)                             .

 

          In addition to gain attenuation when the value of the criterion becomes worse, “locking” mechanisms [5, 6] are also applied , i.e. the current step is rejected and, starting from the previous parameter estimate, a new step is accomplished (with a new gradient evaluation and a reduced updating gain). The parameter  in equation (4) and (5) represents permissible increase in the criterion, before step rejection and gain attenuation occur.

          A constant gain sequence  can be used for gradient approximation, the value  being selected so as to overcome the influence of (numerical or experimental) noise. In the optimum neighborhood, however, a decaying sequence in the form (3) is required to evaluate the gradient with enough accuracy and avoid an amplification of the “slowing down” effect as an optimum is approached (note that this phenomenon is even more pronounced in the case of  SP techniques since the gradient information is more delicate to “extract” in the – usually rather “flat” – neighborhood of the optimum).

          Finally, a gradient smoothing (GS)  procedure is implemented, i.e., gradient approximations are averaged across iterations in the following way

 

(6)                       ,

 

where  is decreased in a way similar to  equation (4) and (5) when step rejection occurs (i.e.  with )  and is reset to its initial value  after a successful step.

          The use of these numerical artifices, i.e. adaptive gain sequences, step rejection procedure and gradient smoothing, significantly improves the effective practical performance of the algorithm (which, in the following, is denoted “adaptive1SP-GS”) [1, 2, 4].

          As relatively large excursions in the parameter space can be achieved, convergence can also be enhanced through scaling of the gradient estimate (2) at each iteration. This new feature is implemented here by normalizing each direction of the gradient vector  with respect to its largest component (infinity norm scaling)

 

(7)                                       .

 

          This latter version is denoted  1SP - GSS  (Gradient Smoothing and Scaling).

Inequality constraints can also be taken into account by a projection algorithm introduced in [4], i.e. the current parameter estimate is projected onto a closed  set included in the admissible region in such a way that no function evaluation is required outside this latter region. In this study, bound constraints (e.g., positivity constraints) are handled in this way.

          The second-order algorithms 2SAE are based on the following two core recursions, one for the parameter vector , the second for the Hessian  of the criterion [6]:

 

(8)                                     ,

(9)                                                  ,

(10)                                       ,

 

where:  is a per-iteration symmetric estimate of the Hessian matrix, which is computed from gradient approximations (or direct evaluations)  using a simultaneous perturbation approach,   is a simple sample mean, and  is a mapping designed to cope with possible non-positive –definiteness of  .

          Again, the algorithm requires only a small number of function evaluations – at least four criterion evaluations to contract the gradient and  Hessian  estimates – independent of the number of unknown parameters.

          Several variants of the mapping  have been considered in the literature:

a)     regularization through addition of a diagonal perturbation matrix with small positive elements [5];

b)    a more elaborate regularization technique recently proposed in which the eigenvalue matrix  of is first “corrected”, i.e. negative elements are replaced by a descending series of small positive eigenvalues, and a new matrix  is defined. Then, the orthogonal matrix  of eigenvectors is used to define the mapping ;

c)     a simplified version of the preceding approach in which the ”corrected” eigenvalue matrix  is replaced by a constant diagonal matrix defined by the geometric mean of all the  eigenvalues.

 

Mapping (a) is easy to implement, but relatively delicate to tune in practical situations (selection of the elements of the perturbation matrix). Mappings are potentially more efficient, but more complex to implement. In addition, some tuning is still required (to select the small positive eigenvalues that are substituted to the negative elements of  ). In this study, a simple, tuning-free, Hessian estimate is considered. Following an idea originally introduced in [2], a diagonal approximation of the  Hessian is built,

(11)                    ,

 

where the notation indicates a component wise  division of two vectors (in analogy with Matlab programming).

          The gradients  are obtained by one sided  approximation (in order to limit the number of function evaluations)

 

(12)                 ,

 

where  is a positive scalar (the sequence can be chosen in similar way as , e.g. equation (3)) and with symmetrically Bernouilli distributed random variables  (independent of  in equation (3)).

 

          In the same spirit as Eq. (7), an infinity-norm scaling is introduced, i.e.

 

(13)                                       ,

(14)                                              ,

 

where:  is a regularization in which the absolute value of each of the (diagonal) elements of  is computed and represents the largest of these elements.

 

          This latter algorithm is denoted “adaptive 2SP-DHS” (- order Simultaneous Perturbation algorithm with Diagonal Hessian  estimation and Scaling).

 

3. MODELING  OF  ANIMAL  CELL  CULTURES

          In [8] the authors propose a deterministic model of growth with constraints:

 

(15)                                                   ,

 

where:  is the intensity  of the growth of the cellular number, and  is a coefficient with the sense of specific number growth.

 

          The solution of this equation gives on exponential growth of the number of cells in the system:

 

(16)                                              .

 

          Modifications of the proposed equation are possible. The restricting factors on the  process dynamics are usually accounted by variation of the coefficient  . If with  is denoted the degree of influence of a given restricting factor,  this equation can be presented as:

 

(17)                                              .

 

          Consider batch animal cell cultures described by a simple macroscopic reaction scheme growth [6]:

 

(18)                                               ,

 

and scheme maintenance:

 

(19)                                        ,

 

where: , ,  and  represent biomass, glucose, glutamine and lactate, respectively, and ,  and   are pseudo-stoechiometric  coefficients. The symbol “” means that the growth reaction is auto-catalyzed by X and the presence of “” in both sides of the maintenance reaction means that  catalyzes this latter reaction.

 

          The growth rate  and the  maintenance rate   are described by a general kinetic model structure proposed in [6]:

 

(20)                             ,

 

(21)                                     .

 

          Simple mass balances allow the following dynamic model to be derived:

 

(22)                             ,            ,

(23)                             ,                   ,

(24)                             , ,

(25)                              ,                 ,

 

where:  and  denote the respective component concentrations.

 

          Identification of bioprocess models is a delicate task and in [6], a systematic procedure is proposed, which allows the pseudo-stoechiometric coefficients to be estimated independently of the kinetic coefficients by minimizing a maximum-likelihood criterion. This procedure also considers the estimation of the most likely initial conditions (since the concentration measurement are corrupted by noisy at each sampling time, including the initial one).

 

          In this study, it is assumed that the pseudostoechiometric coefficients have already been estimated following the above-mentioned procedure and that only the kinetic coefficients and the initial component concentrations have to be inferred from rare and asynchronous measurements of biomas, glucose, glutamine and  lactate concentrations.

 

          The measurement equation is given by

                  

(26)                                 , ,

where:  and  are the state, measurement  and noise vectors at time , respectively. The measurement errors are assumed to be normally distributed, white noises with zero mean and variance matrix .

 

          Data are collected from seven batch experiments corresponding to different initial glucose and glutamine concentrations. Five of these experiments are used for parameter estimation, the two remaining ones being used for cross-validation tests.

          The 28 unknown parameters (8 kinetic coefficients and 20 initial concentrations) are estimated by minimizing a maximum likelihood cost function taking into account the measurement noises, i.e.

 

(27)                      ,

 

where and  are the measurement vector, the measurement error covariance matrix and the state estimate obtained by integration of the model equations (21-24) with the parameters  at time , respectively.

 

          The tuning parameters of 1SP-GS are selected as follows: (a very slowly decaying sequence  is used for gradient evaluation),  (no relative increase in the criterion is allowed), . For 1SP-GSS, the same parameters are used, expert  . Starting with the measured initial concentrations (which are effected by measurement errors) and in initial guess for the kinetic parameters corresponding to a criterian value , the minimization problem (26) is repeated 10 times with both algorithms. 

 

4. CONCLUSION

As a conclusion it could be done the following inferences:

1. The simultaneous perturbation approach developed by Spall [3; 5]  is a very powerful technique, which allows an approximation of the gradient of the objective function to be computed by effecting simultaneous random perturbations in all the parameters.

 

2. Therefore, this approach is particularly well-suited to problems involving a relatively large number of design parameters.

3. In this study, variants of first- and second- order SP algorithms are considered and applied to the identification of the kinetic parameters and the initial conditions of a bioprocess model from experimental measurements of a few macroscopic components.            

 

 

REFERENCES

          [1]. Renotte C., A. Vande Wouwer, M. Remy. Neural Modeling and Control of a Heat Exchanger based on SPSA Techniques. Proceedings of the Workshop “ACC”, 2000.

          [2]. Wouwer A. Vande, C. Renotte, M. Remy. On the Use of Simultaneous Perturbation Stochastic Approximation for Neural Network Training. Proceedings of the Workshop “ACC”, 1999.

          [3]. Spall J. Multivariate Stochastic  Approximation Using a Simultaneous Perturbation Gradient Approximation. IEEE Trans. Automat. Contr. 45, 2000.

          [4]. Sadegh P. Constrained Optimization via Stochastic Approximation with a Simultaneous Perturbation Gradient  Approximation. Automatica  33 (1997).

          [5]. Spall J. Adaptive Stochastic  Approximation by the Simultaneous Perturbation Method. IEEE Trans. Automat. Contr. 45, 2000.

          [6]. Bogaerts Ph., R. Hanus. Macroscopic Modeling of Bioprocesses with  a View to Engineering Applications. In: Focus on Biotechnology, Vol. 4, Kluver, 2000.

            [7]. Petrov N. Use Reliability of Risk Technical Systems. Tracian University, St. Zagora, Yambol, “Uchkov”, ISBN 954-9978-26-5, 2002.

 [8]. Kalchev I., S. Yordanova. Biometric ecomonitoring: mathematical models, estimates and predictions, CPS”96, Volume 1, Bratislava, Slovak Republic, 14-15 May 1996, pp. 480-483.  

 

 

Technical College - Bourgas,

All rights reserved, © March, 2000