Diagnosis of Diseases in Newborn Infants by Analysis of Cry Signals

H. Farsaie Alaie

Crying is the first sound a baby makes on entering the world outside the mother's womb, and it is a positive sign of a new, healthy life. Adults can talk, but a newborn infant cannot yet do so; crying is the only means a baby has to express discomfort. A natural first question is why the cry is such an important aspect of health care for newborn infants. Although the study of infant cries was pioneered in the late 1960s, it did not initially occur to researchers that sick infants might be identified from their cries. Statistical reports by the World Health Organization state that congenital anomalies, or birth defects, affect approximately 1 in 33 infants born every year, and that almost all of the world's infant deaths occur in developing countries. It is therefore imperative to provide an inexpensive health care system that requires no complex or advanced technology, so that more babies born to poor mothers in low-income countries survive beyond the first months of life. Although many maternal factors can raise the risk of complications and anomalies in newborn infants, we examine how far the information concealed in the infant's cry alone can reveal the infant's physiological and psychological condition. The idea behind such a non-invasive diagnostic system rests on evidence from past research studies that the infant's cry can distinguish healthy from sick infants. This idea can help tackle key global health and development problems.

The purpose of this study is to develop a newborn cry-based diagnostic system that classifies healthy infants and sick infants with different pathological conditions. First, an informed choice of pathological states and the collection of an infant cry database are necessary; data collection is still in progress. In many of today's application domains, data with high dimensionality and small sample size are often unavoidable. Both the small sample size problem and dimensionality reduction methods have been studied extensively, but the combination of imbalanced data and small sample size presents a new challenge to the community. In this situation, learning algorithms often fail to generalize inductive rules over the sample space. Moreover, the combination of small sample size and high dimensionality hinders learning because it is difficult to form conjunctions over a large number of features with limited samples. The next part of the system is data preprocessing: pathologically informed features are selected and extracted as precisely as possible, and then quantified for each pathological condition without human intervention. To obtain the full benefit of the information embedded in the cry signal, Mel-Frequency Cepstral Coefficient (MFCC) analysis is performed separately on expiratory and inspiratory cry vocalizations. An ideal cry-based diagnostic system requires automatic labeling of cry signals, so that no human effort is needed to mark the segment boundaries in the corpus. In this study, however, segmentation has so far been performed manually to simplify the task.
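As an illustration of the MFCC analysis mentioned above, the following is a minimal sketch of the coefficients for a single frame, written with the Python standard library only. It is not the configuration used in this study: the frame length, the Hamming window, the number of mel filters, the number of coefficients, and the particular mel-scale formula are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    # One common mel-scale convention (an assumption, not taken from the study).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    # Hamming-windowed frame followed by a naive DFT (fine for short frames).
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sample_rate) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_ceps=12):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log filterbank energies, floored to avoid log(0).
    log_e = [math.log(max(sum(w * p for w, p in zip(filt, spec)), 1e-10))
             for filt in bank]
    # DCT-II of the log energies; the 0th (energy) coefficient is skipped.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(1, n_ceps + 1)]

# Toy usage: one 256-sample frame of a 1 kHz tone sampled at 8 kHz.
frame = [math.sin(2.0 * math.pi * 1000.0 * i / 8000.0) for i in range(256)]
coeffs = mfcc(frame, 8000)
```

In a system of the kind described here, such a routine would be run on every frame of the expiratory segments and, separately, on every frame of the inspiratory segments, yielding two feature streams.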

Among the available approaches to modeling and classification, finite mixtures are a flexible and powerful probabilistic tool for modeling univariate and multivariate data. In this regard, we adopt Gaussian Mixture Models (GMMs), which can be viewed as a special case of Hidden Markov Models (HMMs) with a single state, as a new representation of cry signals based on the extracted feature streams. The next part of this thesis is dedicated to improving the learning of GMMs, which are usually trained with the iterative Expectation Maximization (EM) algorithm. However, given the risk of overfitting caused by the small training sample size for some pathological conditions, and the fact that the number of mixture components is fixed in the traditional EM-based re-estimation algorithm, a new learning method based on boosting is introduced to learn growing mixture models in an incremental and recursive manner.
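For reference, the standard GMM density and the conventional EM re-estimation step referred to above can be written as follows. The notation is an assumption introduced here: $x_t$ is the $t$-th feature vector of $T$, $M$ is the fixed number of components, and $\lambda = \{w_m, \mu_m, \Sigma_m\}$ are the model parameters. These are the textbook EM updates, not the boosting-based method introduced in this thesis.

```latex
% GMM density with M components
p(x \mid \lambda) = \sum_{m=1}^{M} w_m \, \mathcal{N}(x; \mu_m, \Sigma_m),
\qquad \sum_{m=1}^{M} w_m = 1

% E-step: posterior responsibility of component m for frame x_t
\gamma_t(m) = \frac{w_m \, \mathcal{N}(x_t; \mu_m, \Sigma_m)}
                   {\sum_{k=1}^{M} w_k \, \mathcal{N}(x_t; \mu_k, \Sigma_k)}

% M-step: re-estimated parameters
\hat{w}_m = \frac{1}{T} \sum_{t=1}^{T} \gamma_t(m), \qquad
\hat{\mu}_m = \frac{\sum_{t=1}^{T} \gamma_t(m) \, x_t}{\sum_{t=1}^{T} \gamma_t(m)}, \qquad
\hat{\Sigma}_m = \frac{\sum_{t=1}^{T} \gamma_t(m) \, (x_t - \hat{\mu}_m)(x_t - \hat{\mu}_m)^{\top}}
                      {\sum_{t=1}^{T} \gamma_t(m)}
```

Because $M$ stays fixed throughout these updates, a model with many components can overfit when $T$ is small, which is the limitation that motivates learning growing mixtures incrementally.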

The idea of the Universal Background Model (UBM), used in speaker recognition and verification systems, is employed to represent the general feature characteristics of infant cry signals. A variant of the boosted mixture learning (BML) method is then used to derive subclass models for each enrolled disease from the GMM-UBM by adapting the GMM parameters. The crux of the design is to fuse two subsystems, based respectively on the expiratory and inspiratory sounds in baby cry recordings, into a single effective system. Such systems are expected to be more reliable because they draw on multiple, (fairly) independent pieces of evidence. We perform fusion at the level of log-likelihood ratio scores, which avoids concerns about feature compatibility and rigid fusion.
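The score-level fusion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system built in this thesis: the GMMs use diagonal covariances, each subsystem score is the average per-frame log-likelihood ratio between a disease model and the UBM, and the two scores are combined by a weighted sum with a hypothetical weight `w_exp`. The source does not specify the fusion rule, and all function names and toy parameters are invented for the example.

```python
import math

def log_gauss_diag(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at feature vector x.
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_avg_loglik(frames, gmm):
    # gmm is a list of (weight, mean, var) tuples; returns the average
    # per-frame log-likelihood, computed with the log-sum-exp trick.
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)

def llr_score(frames, class_gmm, ubm):
    # Log-likelihood ratio of a class model against the background model.
    return gmm_avg_loglik(frames, class_gmm) - gmm_avg_loglik(frames, ubm)

def fuse_llr(exp_score, insp_score, w_exp=0.5):
    # Weighted-sum fusion of the two subsystem scores (an assumed rule).
    return w_exp * exp_score + (1.0 - w_exp) * insp_score

# Toy one-dimensional models (hypothetical values, for illustration only).
ubm = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
disease = [(1.0, [4.0], [1.0])]

exp_frames = [[3.9], [4.1], [4.0]]    # stand-in expiratory feature frames
insp_frames = [[4.2], [3.8], [4.0]]   # stand-in inspiratory feature frames

fused = fuse_llr(llr_score(exp_frames, disease, ubm),
                 llr_score(insp_frames, disease, ubm))
```

A positive fused score favours the disease model over the background model. Because only scalar scores are combined, the two subsystems are free to use different feature sets or model sizes.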

Beyond the modeling and learning methods described above, our work differs from previous work in its scope: other systems usually address a binary classification task between healthy infants and sick infants with only one specific disorder, whereas our cry-based diagnostic system uses a hierarchical scheme that addresses the multi-pathology classification problem by combining individual classifiers. Moreover, it is worth mentioning that the chosen diseases have not been previously studied.

Keywords: Gaussian mixture model; Universal background model; Mel-frequency Cepstral Coefficient; Likelihood ratio scores; Newborn infant cries; Expiratory sound; Inspiratory sound.