Information About

Expectation-maximization Algorithm





HISTORY

The EM algorithm was explained and given its name in a classic 1977 paper by Arthur Dempster , Nan Laird , and Donald Rubin in the Journal of the Royal Statistical Society (see reference below). They pointed out that method had been "proposed many times in special circumstances" by other authors, but the 1977 paper generalized the method and developed the theory behind it.


APPLICATIONS


EM is frequently used for Data Clustering in Machine Learning and Computer Vision .

In Psychometrics , EM is almost indispensable for estimating item parameters and latent abilities of Item Response Theory models.

With the ability to deal with missing data and observe unidentified variables, EM is becoming a useful tool to price and manage risk of a portfolio.

The EM algorithm (and its faster variant OS-EM ) is also widely used in Medical Image reconstruction, especially in Positron Emission Tomography and Single Photon Emission Computed Tomography .


DEMONSTRATIONS AND ACTIVITIES

Various 1D, 2D and 3D demonstrations of EM together with Mixture Modeling are provided as part of the paired SOCR activities and applets. These applets and activities show empirically the properties of the EM algorithm for parameter estimation in diverse settings.


SPECIFICATION OF THE EM PROCEDURE


Let extbf{y} denote incomplete data consisting of values of observable variables, and let extbf{z} denote the missing data. Together, extbf{z} and extbf{y} form the complete data. extbf{z} can either be actual missing measurements or a hidden variable that would make the problem easier if its value were known. For instance, in Mixture Model s, the likelihood formula would be much more convenient if mixture components that "generated" the samples were known (see example below).

Let p\, be the joint , that is, it can be thought of as a function of heta.
Further, note that the Conditional Distribution of the missing data given the observed can be expressed as:

  By Using The "http://wwwinformationdelightinfo/information/entry/Bayes'_theorem" class="copylinks">Bayes Rule and the Law Of Total Probability (This formulation only requires knowledge of the observation likelihood given the unobservable data <math>p(\mathbf y\mathbf z, heta)</math> and the probability of the unobservable data <math>p(\mathbf z heta)</math>)
  E {\mathbf Z} \! \! \left "" class="copylinks" target="_blank">\log p \left(\mathbf y, \mathbf z \,\, heta ight) \Big \mathbf y ight ,
  E {\mathbf Z} \! \! \left "" class="copylinks" target="_blank">\log p \left(\mathbf y, \mathbf z \,\, heta ight) \Big \mathbf y ight


  P(z J i\mathbf y_j, heta_t)
  rac{p(z_j=i, \mathbf y_j heta_t)}{p(\mathbf y_j heta_t)}
  rac{p(\mathbf y_jz_j=i, heta_t) p(z_j=i heta_t)}{\sum_{k=1}^n p(\mathbf y_j z_j=k, heta_t) p(z_j=k heta_t)}
  & E_{z} \left \ln \prod_{j=1}^m p \left(\mathbf y_j, \mathbf z heta ight) \Big \mathbf y_j ight \
  & E_{z} \left \sum_{j=1}^m \ln p \left(\mathbf y_j, \mathbf z heta ight) \Big \mathbf y_j ight \
  & \sum_{j=1}^m E_{z} \left \ln p \left(\mathbf y_j, \mathbf z heta ight) \Big \mathbf y_j ight \
  & \sum_{j=1}^m \sum_{i=1}^n p \left(z_j=i \mathbf y_j, heta_t ight) \ln p\left(z_j=i, \mathbf y_j heta ight) \
  \sum_{j=1}^m \sum_{i=1}^n p(z_j=i \mathbf y_j, heta_t) \ln \left( p(\mathbf y_j z_j=i, heta) p(z_j=i heta) ight)
  \sum {i 1}^{n} p(z_j=i heta) = 1
  \left( \sum_{j=1}^m \sum_{i=1}^n p(z_j=i \mathbf y_j, heta_t) \left( - rac{1}{2} \ln (2\pi) - rac{1}{2} \ln \left \sigma_i ight - rac{1}{2}(\mathbf y_j - \mathbf \mu_i)^T \sigma_i^{-1} (\mathbf y_j - \mathbf \mu_i) + \ln p(z_j=i heta) ight) ight) - \lambda \left( \sum_{i=1}^{n} p(z_j=i heta) - 1 ight)
  & \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \left( - rac{\partial}{\partial \mu_i} rac{1}{2}(\mathbf y_j - \mathbf \mu_i)^T \sigma_i^{-1} (\mathbf y_j - \mathbf \mu_i) ight) \
  & \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \left( - rac{1}{2}(\sigma_i^{-1} +\sigma_i^{-T})(\mathbf y_j - \mathbf \mu_i)(-1) ight) \
  & \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \left( \sigma_i^{-1}(\mathbf y_j - \mathbf \mu_i) ight) \
  \sum {j 1}^m p(z_j=i \mathbf y_j, heta_t) \sigma_i^{-1} \mathbf \mu_i
  & \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \sigma_i^{-1} \mathbf y_j \
  \mu I \sum {j 1}^m p(z_j=i \mathbf y_j, heta_t)
  & \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \mathbf y_j \
  & rac{\sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \mathbf y_j}{\sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t)} \
  & \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \left( - rac{\partial}{\partial \sigma_i} rac{1}{2} \ln \left \sigma_i ight - rac{\partial}{\partial \sigma_i} rac{1}{2}(\mathbf y_j - \mathbf \mu_i)^T \sigma_i^{-1} (\mathbf y_j - \mathbf \mu_i) ight) \
  & \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \left( - rac{1}{2} \sigma_i^{-T} + rac{1}{2} \sigma_i^{-T}(\mathbf y_j - \mathbf \mu_i) (\mathbf y_j - \mathbf \mu_i)^T \sigma_i^{-T} ight) \
  \sum {j 1}^m p(z_j=i \mathbf y_j, heta_t) \sigma_i^{-1}
  & \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \sigma_i^{-1} (\mathbf y_j - \mathbf \mu_i) (\mathbf y_j - \mathbf \mu_i)^T \sigma_i^{-1} \
  \sum {j 1}^m p(z_j=i \mathbf y_j, heta_t)
  & \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \sigma_i^{-1} (\mathbf y_j - \mathbf \mu_i) (\mathbf y_j - \mathbf \mu_i)^T \
  & rac{\sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) (\mathbf y_j - \mathbf \mu_i) (\mathbf y_j - \mathbf \mu_i)^T}{\sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t)} \
  Rac{\partial \mathcal{L}( Heta)}{\partial P(z J i heta)}
  & \left( \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) rac{\partial \ln p(z_j=i heta)}{\partial p(z_j=i heta)} ight) - \lambda \left( rac{\partial p(z_j=i heta)}{\partial p(z_j=i heta)} ight) \
  & \left( \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) rac{1}{p(z_j=i heta)} ight) - \lambda \
  \sum {j 1}^m p(z_j=i \mathbf y_j, heta_t) rac{1}{p(z_j=i heta)}
  P(z J i heta)
  & rac{1}{\lambda} \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \
  \sum {i 1}^{n} p(z_j=i heta)
  & \sum_{i=1}^{n} rac{1}{\lambda} \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \
  & \sum_{i=1}^{n} \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \
  P(z J i heta)
  & rac{1}{\lambda} \sum_{j=1}^m p(z_j=i \mathbf y_j, heta_t) \