Diagnosis of Diseases in Newborn Infants by Analysis of Cry Signals
H. Farsaie Alaie
Crying is the first sound a baby makes on entering the world outside
the mother’s womb, and it is a very positive sign of a new, healthy
life. Adults can talk, but a newborn infant is not yet able to do so;
crying is all a baby can do to express any discomfort it feels. The
first question that comes to mind is why the cry is such an important
aspect of health care for newborn infants.
Although the study of infant cries was pioneered in the late 1960s, for
a long time it did not occur to researchers that sick infants might be
identified from their cries. Statistical reports by the World Health
Organization state that congenital anomalies, or birth defects, affect
approximately 1 in 33 infants born every year, and that almost all of
the world’s infant deaths occur in developing countries. It is
therefore imperative to provide an inexpensive health care system, with
no need for complex and advanced technology, for poor mothers with
newborn babies in low-income countries, so that more babies survive
beyond the first months of life. Although many maternal issues can
raise the risk of complications and anomalies in newborn infants, we
examine whether the information concealed in the infant’s cry is
sufficient on its own to reveal the infant’s physiological and
psychological condition. The idea behind such a non-invasive diagnostic
system rests on evidence from past research studies on the potential of
the infant’s cry to distinguish between healthy and sick infants. This
innovative idea can help address key global health and development
problems.
The purpose of this study is to develop a newborn cry-based diagnostic
system that classifies healthy infants and sick infants with different
pathological conditions. First, an informed choice of pathological
states is required, together with the collection of the infant cry
database, which is still in progress. In many of today’s application
domains, data with high dimensionality and small sample size are often
unavoidable. Both the small sample size problem and dimensionality
reduction methods have been studied extensively, but the combination of
imbalanced data and small sample size presents a new challenge to the
community. In this situation, learning algorithms often fail to
generalize inductive rules over the sample space. In fact, the
combination of small sample size and high dimensionality hinders
learning because of the difficulty of forming conjunctions over a large
number of features with limited samples. The next part of the system is
data preprocessing: pathologically informed features are selected and
extracted with the best possible precision and then quantified for each
pathological condition without any human intervention. To obtain the
full benefit of the information embedded in the cry signal,
Mel-frequency cepstral coefficient (MFCC) analysis is performed
separately on the expiratory and inspiratory cry vocalizations. An
ideal cry-based diagnostic system requires automatic labeling of cry
signals, so that no human effort is needed to mark the segment
boundaries in the corpus. However, to simplify the task in this study,
segmentation has so far been performed manually.
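
The MFCC analysis mentioned above can be illustrated for a single
frame with the following minimal sketch. It is not the feature
extractor used in this work: the frame length, sampling rate, number of
filters and number of coefficients are assumptions chosen for
illustration, the function names are hypothetical, and a naive DFT
stands in for an FFT so that the example needs only the standard
library.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """One-sided power spectrum of a Hamming-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising slope
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope, 1.0 at the centre
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=12):
    """MFCCs of one frame: log mel filterbank energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(f, spec)) + 1e-10)
                    for f in bank]
    # Coefficient 0 (overall energy) is skipped, as is common practice.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energies))
            for c in range(1, n_coeffs + 1)]


# Synthetic stand-in for one frame of a cry segment (assumed 8 kHz, 256 samples).
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
```

In the setting described here, such a routine would be run separately
on frames taken from expiratory segments and from inspiratory segments,
giving two feature streams per recording.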
Among the available approaches to modeling and classification, finite
mixtures are a flexible and powerful probabilistic tool for modeling
univariate and multivariate data. In this regard, we adopt Gaussian
Mixture Models (GMMs), which can be viewed as a special case of Hidden
Markov Models (HMMs) with a single state, as a new representation of
cry signals based on the extracted feature streams. The next part of
this thesis is dedicated to improving the learning of GMMs, which are
usually trained with the iterative Expectation Maximization (EM)
algorithm. However, given the risk of overfitting caused by the small
training sample size for some pathological conditions, and the fact
that the number of mixtures is fixed in the traditional EM-based
re-estimation algorithm, a new learning method based on boosting is
introduced to learn growing mixture models in an incremental and
recursive manner.
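
For reference, the conventional EM baseline with a fixed number of
mixtures can be sketched as follows. This is only an illustration of
standard EM re-estimation for a one-dimensional GMM, not the boosted
learning method proposed in this thesis; the function names, the
quantile-based initialization, the variance floor and the synthetic
data are all assumptions made for the example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, n_components, n_iter=50, var_floor=1e-3):
    """Fit a 1-D GMM with a fixed number of components by EM."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: means at evenly spaced quantiles of the data.
    means = [ordered[(2 * k + 1) * n // (2 * n_components)]
             for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(var_floor, overall_var)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2
                        for r, x in zip(resp, data)) / nk
            # The floor guards against collapsing variances on small samples.
            variances[k] = max(var_floor, var_k)
    return weights, means, variances


# Synthetic two-cluster data standing in for a one-dimensional feature.
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(10.0, 1.0) for _ in range(200)])
weights, means, variances = em_gmm_1d(data, n_components=2)
```

Note that `n_components` must be chosen before training and never
changes, which is the limitation that motivates learning growing
mixture models incrementally.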
The idea of the Universal Background Model (UBM), used in speaker
recognition and verification systems, is employed to represent the
general feature characteristics of infant cry signals. Then, a variant
of the boosted mixture learning (BML) method is employed to derive
subclass models for each enrolled disease from the GMM-UBM by
adaptation of the GMM parameters. The crux of the design is to fuse two
subsystems, one based on the expiratory sounds and one on the
inspiratory sounds in baby cry recordings, into a single effective
system. Such systems are expected to be more reliable because they draw
on multiple, (fairly) independent pieces of evidence. We use
log-likelihood ratio score fusion, which avoids the problems of feature
compatibility and rigid fusion.
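
The score-level fusion described above can be sketched as follows. The
example assumes one-dimensional features, toy model parameters and an
equal-weight linear combination of the two subsystem scores; none of
these choices is taken from this work, and the function names are
hypothetical.

```python
import math


def gmm_log_density(x, gmm):
    """Log density of scalar x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) - 0.5 * math.log(2.0 * math.pi * v)
             - 0.5 * (x - m) ** 2 / v
             for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_llr(features, target_gmm, ubm):
    """Average per-frame log-likelihood ratio of a target model against the UBM."""
    return sum(gmm_log_density(x, target_gmm) - gmm_log_density(x, ubm)
               for x in features) / len(features)


def fuse_scores(llr_expiratory, llr_inspiratory, weight_expiratory=0.5):
    """Linear fusion of the two subsystem scores (the weight is an assumption)."""
    return (weight_expiratory * llr_expiratory
            + (1.0 - weight_expiratory) * llr_inspiratory)


# Toy models: a shared UBM and one pathology model per subsystem.
ubm = [(0.5, 0.0, 4.0), (0.5, 5.0, 4.0)]
pathology_expiratory = [(1.0, 5.0, 1.0)]
pathology_inspiratory = [(1.0, 2.0, 1.0)]

# Toy feature streams for one recording.
expiratory_features = [4.8, 5.1, 5.3]     # close to the expiratory pathology model
inspiratory_features = [-1.0, 0.0, 0.5]   # better explained by the UBM

llr_exp = average_llr(expiratory_features, pathology_expiratory, ubm)
llr_insp = average_llr(inspiratory_features, pathology_inspiratory, ubm)
fused = fuse_scores(llr_exp, llr_insp)
```

Because each subsystem contributes only a scalar score, the expiratory
and inspiratory feature streams never need to share a dimensionality or
frame alignment; a decision would then be made by comparing the fused
score with a threshold.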
Apart from the modeling and learning methods described above, our work
differs from previous work in that other systems usually deal with
binary classification between healthy infants and sick infants with
only one specific disorder, whereas our cry-based diagnostic system has
a hierarchical scheme that addresses the multi-pathology classification
problem through a combination of individual classifiers. Moreover, it
is worth mentioning that the chosen diseases have not been previously
studied.
Keywords: Gaussian mixture
model; Universal background model; Mel-frequency Cepstral Coefficient;
Likelihood ratio scores; Newborn infant cries; Expiratory sound;
Inspiratory sound.