A Review of AutoDietary: An Acoustic-Signal-Based Sensor System
Abstract
Nutrition-related diseases are today among the most common threats to human health and pose great challenges for medical care. A crucial step toward solving this problem is to monitor a person's daily food intake precisely. To this end, AutoDietary, a wearable system, was developed to track food intake in the daily routine. An embedded hardware prototype with a sensor node collects food-intake data; its core is a throat microphone that records acoustic signals during eating. The acoustic data are preprocessed and sent to a smartphone via Bluetooth, where hidden Markov models detect chewing and swallowing events and a decision-tree-based algorithm identifies the type of food. A smartphone application is also developed to suggest healthier eating habits and present the results in a user-friendly way.
Keywords: AutoDietary, Food consumption recognition, embedded system, acoustic signal, hidden Markov model
I. Introduction
A key factor in sustaining a healthy life is balancing energy intake and expenditure. Abnormalities in this balance can lead to diseases such as anorexia and obesity, which may further develop into chronic conditions if not treated properly [1]. It is therefore highly desirable to build an accurate, easy-to-use system that can monitor the food eaten and estimate energy intake. Various methods exist, but they are either too complex or too inaccurate [2]. To address this problem, AutoDietary was proposed to monitor food types. It consists of two parts: an embedded hardware system and a smartphone application. The embedded hardware prototype contains a sensor node to collect food-intake data, centered on a throat microphone that records acoustic signals during eating. The acoustic data are transmitted via Bluetooth to a smartphone, where food items are recognized. The smartphone application not only reports the recognition results but also provides guidance on healthier eating.
The recognition of food items consists of several steps. The acoustic signals are first framed, and the sound frames are processed with hidden Markov models (HMMs) [3] based on Mel-frequency cepstrum coefficients [4] to detect chewing events, as well as swallowing events that indicate fluid intake. Each event carries both frequency-/time-domain and non-linear information. A tree-based algorithm is developed to recognize the type of food intake. The current design is acceptable to most users. AutoDietary provides suggestions for healthier eating habits and can be used in the therapy of nutrition-related diseases.
II. Related Work
The research community has shown great interest in monitoring and recognizing food types during food intake [5]. Sazonov et al. developed a methodology to study ingestive behavior by non-invasive monitoring of chewing and swallowing [6]. The main objectives are to study behavioral patterns of food consumption and to produce volumetric and weight estimates of energy intake. The drawback is that the composite microphone system reduces wearing comfort.
Amft [7] presented an acoustic ear-pad sensor device to capture air-conducted vibrations of food chewing, reporting 86.6% accuracy on 375 recorded chewing sequences covering 4 different food items in total. The drawback of this system is that it requires multiple microphones, some of which are placed in the ear canal.
Päßler and Wolff [8] proposed a system in which sound signals are recorded by a microphone in the outer ear canal. Hidden Markov models detect single chewing or swallowing events, decoded with the Viterbi algorithm [9]. The drawback is the same as in the previous system: a microphone placed in the ear canal is less comfortable to wear.
Radio-frequency identification (RFID) tags on food packages have been used to detect and distinguish what and how people eat [10]. The amount of food eaten is recorded by a dining table with an integrated weight scale, which records bite weights and assigns them to each person at the table. The major drawback of this approach is its restriction to a single location, which makes it impractical for monitoring under free-living conditions.
Researchers at Clemson University invented the Bite Counter, which identifies food-intake gestures, swallowing, and chewing to provide food category and timing information [11]. A watch-like sensor configuration continuously observes wrist motion throughout the day, such as eating with a fork, knife, or spoon, drinking from a glass, or hand-to-mouth movement, and detects periods of eating.
Lester et al. [12] developed a method using a smart cup that combines pH, conductivity, and light-spectrum measurements to fingerprint different liquids, allowing long-term fluid intake monitoring.
Video fluoroscopy and electromyography (EMG) are considered the gold standard in the study of deglutition [13]. Video fluoroscopy depends on bulky, unsafe equipment, while EMG is too invasive to avoid interference from the neck muscles.
Gao et al. [14] used video analysis to observe people's eating behavior. The drawback is that this system is restricted to the locations where cameras are installed.
Wu and Yang [15] recorded food intake using a camera, identifying food by frequently taking images and comparing them with images in a database. The drawback is that relevant objects may be hidden behind other objects such as people or furniture.
Zhou et al. developed a smart table to support nutrition monitoring [16]. Food-intake-related actions such as cutting, stirring, poking, and scooping are detected and recognized by a smart tablecloth equipped with a weight tablet and a fine-grained pressure-sensing textile matrix. Based on these actions, the weight and content of the food are estimated. The main drawbacks are that different foods of the same type produce similar actions and that estimating food amount from cutting force is not accurate.
III. System Architecture
AutoDietary consists of two components: an embedded system and an application running on a smartphone. The overall architecture of the system is shown in Fig. 1.
A. Acoustic Sensors
A high-precision, high-fidelity throat microphone is employed to pick up acoustic signals during eating. The microphone is worn around the user's neck, close to the jaw, and converts vibration signals from the skin surface into acoustic signals. This principle yields very high-quality signals that capture chewing and swallowing sounds, while a throat microphone remains comfortable to wear and is better accepted by users. The effect of wearing the throat microphone is depicted in Fig. 2(a).
B. Hardware Board
The hardware board is designed for data pre-processing and transmission, as shown in Fig. 2(b). Acoustic signals received from the throat microphone through the Mic In input are amplified and filtered to improve signal quality, and the analog signals are then converted into digital signals. The amplifier is an LM358 [17], featuring a high common-mode rejection ratio, low noise, and a high gain of about 250. The A/D converter is a TLV2541 [18] with a sampling rate of 8000 Hz and 12-bit resolution.
These digital signals are sent to the microcontroller via the I2C interface. The microcontroller is the ultra-low-power MSP430F5438 [19], designed for energy-constrained electronics and portable medical equipment. Its main task is to frame and perform admission control on the raw signals from the throat microphone. The frames are then sent to a Bluetooth module through the UART/SPI interface and forwarded to the smartphone over the Bluetooth SPP profile at a data rate of 150 kbit/s. Transmission takes place within a distance of 10 m, and Bluetooth is responsible for the reliability of the wireless data transfer. The whole hardware is powered by a rechargeable LiPo battery.
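As a concrete illustration of this data path, the sketch below frames a one-second sample stream, captured at the TLV2541's 8000 Hz rate, into overlapping frames ready for transmission. The frame length and hop size are illustrative assumptions; the text does not specify the framing parameters the firmware uses.

```python
import numpy as np

def frame_signal(samples, frame_len=256, hop=128):
    """Split a 1-D sample stream into overlapping frames.

    frame_len and hop are illustrative values, not taken
    from the paper.
    """
    n = 1 + max(0, (len(samples) - frame_len) // hop)
    return np.stack([samples[i * hop : i * hop + frame_len]
                     for i in range(n)])

# 1 second of audio at the TLV2541's 8000 Hz sampling rate
audio = np.zeros(8000)
frames = frame_signal(audio)
print(frames.shape)  # (61, 256)
```

Each row of `frames` would then be queued for Bluetooth transfer; 50% overlap keeps event boundaries from being split across frames.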
C. Smartphone Application
The smartphone application has two major roles: it performs food-type recognition by implementing the recognition algorithms, and it serves as a data manager providing an interface to the user. Fig. 3 shows screenshots of the user interface. When the user starts to eat, the system performs food-type recognition and stores all details in the application's database. The user can then check detailed records (as shown in Fig. 3(b)) and browse suggestions for healthier eating habits. The eating guidance currently includes: a) more regular and balanced diets, b) alerts on excessive snacking in a day, c) alerts on abnormal chewing speed, d) suggested intervals between meals, and e) suggestions on hydration intake. Developers can further extend the application with new features for personal health management.
IV. Food Type Recognition
The food type is recognized in three consecutive steps, shown in Fig. 4. The first step uses hidden Markov models based on Mel-frequency cepstrum coefficients to detect swallowing and chewing events in the continuous sound frames; frames not involved in any event are discarded. In the second step, each event is processed to extract the key features that best distinguish different food types. The last step takes the feature values of each event and evaluates them against a decision tree to predict the food type corresponding to the event. All results are stored in the database for future analysis. Discarding non-event frames early reduces computation and memory usage, which prolongs battery lifetime.
A. Event Detection
For event detection we use hidden Markov models (HMMs) to automatically detect chewing and swallowing events in a continuous recording sample. HMMs have been widely used in many fields, and in recent decades many HMM-based acoustic classification/detection methods have been proposed [20]-[22].
A recording sample is first divided into frames. The goal is to find the frame-label sequence W = (W1, W2, ..., WM) that maximizes the posterior probability given the observation sequence O = (O1, O2, ..., OT):

W* = arg max_W P(W | O)    (1)
P(W | O) is evaluated with HMMs for acoustic events and silence periods, each with 4 emitting states and left-to-right state transitions. For both event and silence sequences, each observation consists of 32 Mel-frequency cepstrum coefficients. The Viterbi algorithm computes the posterior probability of every observation under the event and silence HMMs; if the probability under the event HMM is larger, the frame is assigned to an event, otherwise to silence.
We label each frame belonging to an event with bit 1 and each non-event frame with bit 0.
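The per-frame decision can be illustrated with a simplified stand-in: instead of 4-state HMMs scored by the Viterbi algorithm, the sketch below compares each frame's log-likelihood under a single diagonal event Gaussian against a single silence Gaussian and emits the 1/0 labels described above. All model parameters here are toy values, not the paper's trained models.

```python
import numpy as np

def label_frames(features, event_mean, event_var, sil_mean, sil_var):
    """Label each frame 1 (event) or 0 (silence) by comparing
    per-frame Gaussian log-likelihoods -- a simplified stand-in
    for the paper's 4-state event/silence HMMs.

    features: (n_frames, n_coeffs) array, e.g. 32 MFCCs per frame.
    """
    def loglik(x, mean, var):
        # diagonal-covariance Gaussian log-density, summed over coeffs
        return -0.5 * np.sum(np.log(2 * np.pi * var)
                             + (x - mean) ** 2 / var, axis=1)

    return (loglik(features, event_mean, event_var) >
            loglik(features, sil_mean, sil_var)).astype(int)

# toy example: 'event' frames have higher energy in every coefficient
d = 32
ev_mean, sil_mean, var = np.full(d, 5.0), np.zeros(d), np.ones(d)
frames = np.vstack([np.full(d, 4.8), np.full(d, 0.1)])
print(label_frames(frames, ev_mean, var, sil_mean, var))  # [1 0]
```

A real implementation would replace the single Gaussians with the trained left-to-right HMMs and decode whole frame sequences rather than scoring frames independently.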
B. Feature Extraction
The accuracy of food-type recognition heavily depends on selecting event features that can distinguish different food types. In the present work, frequency-domain, time-domain, and non-linear features are extracted for each event, listed in TABLEs I, II, and III respectively.
We first discuss the time-domain features, which are computed for each chewing event: high peak value, low peak value, mean value, standard deviation, and variance of the signal. Most of these features have been used intensively in pattern recognition and related studies [6], [23], [24]. Zero-crossing rate, skewness, interquartile range, and kurtosis are also used to better represent the geometric characteristics of the signals.
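The time-domain features named above can be sketched directly in NumPy. The formulas are the standard statistical definitions, not taken from the paper, so treat this as an illustration of the feature set rather than the authors' exact implementation.

```python
import numpy as np

def time_domain_features(x):
    """Compute the time-domain features listed in the text for one
    chewing-event signal x (1-D array). Standard definitions."""
    mean = x.mean()
    std = x.std()
    q75, q25 = np.percentile(x, [75, 25])
    centered = x - mean
    return {
        "high_peak": x.max(),
        "low_peak": x.min(),
        "mean": mean,
        "std": std,
        "variance": x.var(),
        # fraction of adjacent sample pairs whose sign differs
        "zero_crossing_rate": np.mean(np.abs(np.diff(np.sign(x))) > 0),
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,
        "interquartile_range": q75 - q25,
    }

feats = time_domain_features(np.array([1.0, -1.0, 2.0, -2.0, 0.5]))
print(feats["high_peak"], feats["low_peak"])  # 2.0 -2.0
```

In practice each detected event is passed through such a routine and the resulting dictionary becomes part of the 34-dimensional feature vector.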
Frequency-domain features describe the distribution of the signal over a given range of frequencies. In the present work, the power spectral density (PSD) of the signal in each segment is estimated using Welch's method with a Hamming window [25]. From the PSD, the maximal power (Pmax) and mean power (Pmean) at specific frequencies are computed. Wavelet packet decomposition (WPD) [26] is also used to extract frequency-domain features: WPD divides the signal into different time-frequency sub-bands, and the power of the signal, which is proportional to the integral of the squared amplitude, is calculated per sub-band [27]. A db3 wavelet is used to decompose the signal into 3 levels, and Shannon entropy is used to calculate the sub-band energy.
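Welch's method, as cited above, averages modified periodograms of overlapping windowed segments. The pure-NumPy sketch below shows the idea (libraries such as SciPy provide `scipy.signal.welch` for production use); the segment length and 50% overlap are common defaults, not values given in the paper.

```python
import numpy as np

def welch_psd(x, fs=8000, seg_len=256):
    """Welch PSD estimate: average |FFT|^2 of 50%-overlapping,
    Hamming-windowed segments. seg_len is an illustrative choice."""
    win = np.hamming(seg_len)
    scale = fs * np.sum(win ** 2)   # density normalization
    hop = seg_len // 2
    n_seg = 1 + (len(x) - seg_len) // hop
    psd = np.zeros(seg_len // 2 + 1)
    for i in range(n_seg):
        seg = x[i * hop : i * hop + seg_len] * win
        psd += np.abs(np.fft.rfft(seg)) ** 2 / scale
    psd /= n_seg
    freqs = np.arange(seg_len // 2 + 1) * fs / seg_len
    return freqs, psd

# Pmax location and Pmean for a 1 kHz test tone
t = np.arange(8000) / 8000
freqs, psd = welch_psd(np.sin(2 * np.pi * 1000 * t))
print(freqs[np.argmax(psd)])  # peak bin of the 1 kHz tone
print(psd.mean())             # Pmean over the whole band
```

Pmax is then the maximum of `psd` and Pmean its average, matching the two PSD-derived features named in the text.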
Non-linear techniques can describe the dynamics of signals more effectively [28]. The following 5 non-linear features are added to describe each event: detrended fluctuation analysis (DetrenFlu), approximate entropy (AppEn), fractal dimension (FraDimen), Hurst exponent (Hurst-E), and correlation dimension (CorDimen), all of which have been shown useful for describing a signal [28], [29].
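To make one of these concrete, here is a compact implementation of approximate entropy (AppEn). The embedding dimension m = 2 and tolerance r = 0.2·std are the customary defaults; the paper does not state which parameter values it uses.

```python
import numpy as np

def approximate_entropy(x, m=2, r=0.2):
    """Approximate entropy of a 1-D signal: the (negative) change in
    log template-match frequency when the embedding dimension grows
    from m to m+1. Lower values indicate a more regular signal."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def phi(m):
        n = len(x) - m + 1
        # embed the signal as n overlapping templates of length m
        emb = np.array([x[i:i + m] for i in range(n)])
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        c = np.mean(dist <= tol, axis=1)  # match fraction per template
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)

# a regular signal should score lower than white noise
regular = np.sin(np.linspace(0, 8 * np.pi, 200))
noisy = np.random.default_rng(0).normal(size=200)
print(approximate_entropy(regular) < approximate_entropy(noisy))  # True
```

The other four features (DFA, fractal dimension, Hurst exponent, correlation dimension) follow the same pattern: each reduces an event signal to a single scalar describing its regularity or self-similarity.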
In total, 34 features are extracted for each event to represent its acoustic characteristics. Fig. 5 shows the signals and corresponding spectrograms of 4 food-intake events, each corresponding to a different food item.
C. Recognition and Classification of Food Types
In this study, prior knowledge is obtained from thousands of pre-recorded events for different known food types. A decision tree [30], an approach widely used in activity recognition [31] and text classification [32], is used to represent the key information in this knowledge.
Fig. 6 shows an example of a decision tree used by AutoDietary; it is a fragment of the whole tree that distinguishes cookie from water. The process starts from the root node: the Max_peak node is checked first, and the branch satisfying the constraint is taken. The process proceeds, and more feature values may be calculated at the intermediate nodes. Once a leaf node is reached, a final decision on the food type is returned.
Fig. 6. Part of decision tree to recognize cookie and water
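The walk through the tree fragment of Fig. 6 can be sketched as straight-line code. The feature names follow the text, but the threshold values and the secondary features used below the root are hypothetical, since the paper does not publish the learned tree's constraints.

```python
def classify(features):
    """Walk a hand-coded fragment of a cookie-vs-water decision
    tree in the style of Fig. 6. Thresholds are illustrative
    assumptions, not the values learned by AutoDietary."""
    if features["max_peak"] > 0.5:              # root node: Max_peak check
        if features["zero_crossing_rate"] > 0.3:
            return "cookie"                     # crisp food: high-freq chewing
        return "unknown"
    else:
        if features["event_duration_s"] < 1.0:
            return "water"                      # short, low-amplitude swallow
        return "unknown"

print(classify({"max_peak": 0.8, "zero_crossing_rate": 0.4,
                "event_duration_s": 0.5}))      # cookie
print(classify({"max_peak": 0.2, "zero_crossing_rate": 0.1,
                "event_duration_s": 0.4}))      # water
```

The full system uses a much larger tree learned from the thousands of pre-recorded events, but each classification is exactly this kind of root-to-leaf walk over feature thresholds.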
V. Discussion and Future Work
AutoDietary enables trustworthy suggestions on proper eating, and its current design is acceptable to most users. It can now be used in medical and physiotherapy studies of healthier food-intake behavior, and it may help reduce bowel disorders caused by improper chewing and swallowing speed. AutoDietary can be especially useful for disabled and severely ill people, for whom daily food-intake monitoring is particularly important.
In the future, AutoDietary can be improved in several ways: by adding new capabilities to the present system, such as estimating the volume and weight of the food intake; by extending the experiments beyond the low-noise lab environment to settings with head movement and speaking; and by handling a broader range of food types. Last but not least, the system can be improved by reducing the size of the microphone and the embedded unit to optimize the user experience. One target is an embedded unit the size of a USB key, which could be worn like a pendant or clipped to the chest like a pen. All these improvements will be validated in long-term real-life scenarios.
VI. Conclusion
An embedded hardware prototype with a sensor node, centered on a throat microphone that records acoustic signals during eating, is used to collect food-intake data. The acoustic data are preprocessed and sent to a smartphone via Bluetooth, where hidden Markov models detect chewing and swallowing events and decision-tree-based algorithms identify the type of food. A smartphone application is also developed to suggest healthier eating and present results in a user-friendly way. The present system is acceptable to most users in daily life.
References
[1] Obesity and Overweight: What are Overweight and Obesity, Fact Sheet no. 311, World Health Organization, Geneva, Switzerland, 2006.
[2] L. E. Burke et al., "Self-monitoring dietary intake: Current and future practices," J. Renal Nutrition, vol. 15, no. 3, pp. 281-290, 2005.
[3] J. Beh, D. K. Han, R. Durasiwami, and H. Ko, "Hidden Markov model on a unit hypersphere space for gesture trajectory recognition," Pattern Recognit. Lett., vol. 36, pp. 144-153, Jan. 2014.
[4] S. R. M. S. Baki, Z. M. A. Mohd, I. M. Yassin, A. H. Hasliza, and A. Zabidi, "Non-destructive classification of watermelon ripeness using mel-frequency cepstrum coefficients and multilayer perceptrons," in Proc. IJCNN, Jul. 2010, pp. 1-6.
[5] O. Amft and G. Troster, "On-body sensing solutions for automatic dietary monitoring," IEEE Pervasive Comput., vol. 8, no. 2, pp. 62-70, Apr./Jun. 2009.
[6] E. Sazonov et al., "Non-invasive monitoring of chewing and swallowing for objective quantification of ingestive behavior," Physiol. Meas., vol. 29, no. 5, pp. 525-531, 2008.
[7] O. Amft, "A wearable earpad sensor for chewing monitoring," IEEE Sensors, vol. 1, no. 4, pp. 222-227, Nov. 2010.
[8] S. Päßler, M. Wolff, and W.-J. Fischer, "Food intake monitoring: An acoustical approach to automated food intake activity detection and classification of consumed food," Physiol. Meas., vol. 33, no. 6, pp. 1073-1093, 2012.
[9] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
[10] K.-H. Chang et al., "The diet-aware dining table: Observing dietary behaviors over a tabletop surface," in Pervasive Computing. Berlin, Germany: Springer-Verlag, 2006, pp. 366-382.
[11] Y. Dong, J. Scisco, M. Wilson, E. Muth, and A. Hoover, "Detecting periods of eating during free-living by tracking wrist motion," IEEE J. Biomed. Health Inform., vol. 18, no. 4, pp. 1253-1260, Jul. 2013.
[12] J. Lester, D. Tan, and S. Patel, "Automatic classification of daily fluid intake," in Proc. IEEE 4th Int. Conf. Pervas. Comput. Technol. Healthcare (PervasiveHealth), Mar. 2010, pp. 1-8.
[13] K. Sato and T. Nakashima, "Human adult deglutition during sleep," Ann. Otol., Rhinol. Laryngol., vol. 115, no. 5, pp. 334-339, 2006.
[14] J. Gao, A. G. Hauptmann, A. Bharucha, and H. D. Wactlar, "Dining activity analysis using a hidden Markov model," in Proc. 17th ICPR, vol. 2, 2004, pp. 915-918.
[15] W. Wu and J. Yang, "Fast food recognition from videos of eating for calorie estimation," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), 2009.
[16] B. Zhou et al., "Smart table surface: A novel approach to pervasive dining monitoring," in Proc. IEEE Int. Conf. Pervasive Comput. Commun., Mar. 2015, pp. 155-162.
[17] Texas Instruments. TI Homepage: LM358. [Online]. Available: http://www.ti.com/product/lm358, accessed Sep. 24, 2014.
[18] Texas Instruments. TI Homepage: TLV2541. [Online]. Available: http://www.ti.com/product/tlv2541, accessed Sep. 24, 2014.
[19] Texas Instruments. TI Homepage: MSP430. [Online]. Available: http://www.ti.com.cn/product/MSP430F169, accessed Sep. 30, 2014.
[20] X. Zhou et al., "HMM-based acoustic event detection with AdaBoost feature selection," in Multimodal Technologies for Perception of Humans. Berlin, Germany: Springer-Verlag, 2007.
[21] C. Zieger, "An HMM based system for acoustic event detection," in Multimodal Technologies for Perception of Humans. Berlin, Germany: Springer-Verlag, 2008, pp. 338-344.
[22] A. Temko, R. Malkin, C. Zieger, D. Macho, C. Nadeu, and M. Omologo, "Acoustic event detection and classification in smart-room environments: Evaluation of CHIL project systems," Cough, vol. 65, no. 48, p. 5, 2006.
[23] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," in Pervasive Computing. Berlin, Germany: Springer-Verlag, 2004, pp. 1-17.
[24] T. Huynh and B. Schiele, "Analyzing features for activity recognition," in Proc. Joint Conf. Smart Objects Ambient Intell., Innov. Context-Aware Services, 2005, pp. 159-163.
[25] P. D. Welch, "The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms," IEEE Trans. Audio Electroacoust., vol. 15, no. 2, pp. 70-73, Jun. 1967.
[26] A. Yadollahi and Z. Moussavi, "Feature selection for swallowing sounds classification," in Proc. 29th Annu. Int. Conf. IEEE EMBC, Aug. 2007, pp. 3172-3175.
[27] E. S. Sazonov, O. Makeyev, S. Schuckers, P. Lopez-Meyer, E. L. Melanson, and M. R. Neuman, "Automatic detection of swallowing events by acoustical means for applications of monitoring of ingestive behavior," IEEE Trans. Biomed. Eng., vol. 57, no. 3, pp. 626-633, Mar. 2010.
[28] A. Metin, Nonlinear Biomedical Signal Processing, Volume II: Dynamic Analysis and Modeling. New York, NY, USA: Wiley, 2000, pp. 83-92.
[29] M. E. Cohen, D. L. Hudson, and P. C. Deedwania, "Applying continuous chaotic modeling to cardiac signal analysis," IEEE Eng. Med. Biol. Mag., vol. 15, no. 5, pp. 97-102, Sep./Oct. 1996.
[30] U. Kumar et al., "Mining land cover information using multilayer perceptron and decision tree from MODIS data," J. Indian Soc. Remote Sens., vol. 38, no. 4, pp. 592-602, Dec. 2010.
[31] C. Chien and G. J. Pottie, "A universal hybrid decision tree classifier design for human activity classification," in Proc. 34th Annu. Int. Conf. IEEE EMBS, San Diego, CA, USA, Aug./Sep. 2012, pp. 1065-1068.
[32] Y. Sakakibara, K. Misue, and T. Koshiba, "Text classification and keyword extraction by learning decision trees," in Proc. 9th Conf. Artif. Intell. Appl., Mar. 1993, p. 466.