Vocal pathologies arise from accident, disease, misuse of the voice, or surgery affecting the vocal folds, and they have a profound impact on patients’ lives. The modeling of normal and pathological voice sources and the analysis of healthy and pathological voices have gained increasing interest recently. Among the most interesting works are those concerned with Parkinson’s disease (PD) and Multiple Sclerosis (MS), neurodegenerative diseases that affect patients’ speech.
Most voice-related pathologies are due to irregular masses in the larynx, such as vocal fold nodules or polyps, which are located on the vocal folds and interfere with their normal, regular vibration. This causes a decrease in voice quality, which is usually the first symptom of this type of disorder. In the past, the only way to measure voice quality was by perceptual measurements denoting the existence or absence of several voice characteristics. The speech signal can now be analyzed to estimate numerous parameters that indicate amplitude and frequency perturbations, the level of air leakage, the degree of turbulence, and so forth.
Voice Quality Analysis
Voice production starts with the vibration of the vocal folds, which can be more or less stretched to achieve higher or lower pitch. Under normal conditions, and in spite of this ability to vary pitch, phonation is considered stable and regular. Any change in the vocal fold tissue can cause an irregular, nonperiodic vibration that changes the shape of the glottal source signal from one period to the next, introducing jitter. The same problem can occur in amplitude. If, for instance, the vocal folds are too stiff, they need a higher subglottal pressure to vibrate. The glottal cycle can thus also be irregularly disturbed in amplitude, giving rise to shimmer. Partial closure of the vocal folds lets air leak through the glottis, which produces turbulence heard as high-frequency noise, especially during the closed phase of the glottal cycle. All these phenomena affect the glottal source signal, but this signal cannot be accessed directly; only the sound pressure radiated at the lips is available. Estimating the glottal source signal from the voice signal is not a simple task. Research in this field shows that it is reasonable to approximate the influence of the vocal tract by a linear filter. Using this approximation, the voice signal can be filtered by the inverse of this filter to obtain an estimate of the glottal source signal. In this work a non-interactive approach is taken, without considering the influence of the supraglottal vocal tract or of the subglottal cavities on the glottal flow. As a consequence, the source and filter parameters are assumed to be independent.
Details of the vocal tract and the utterances with respect to different articulators are indicated below.
Figure 7.1 Vocal Tract indicating Place of different categories of Speech Production
7.2 Frequency Perturbation
Fundamental frequency has been shown to reflect the vocal fold status and how the variables interact and perform. Fundamental frequency is defined as follows.
The vocal fundamental frequency reflects the biomechanical characteristics of the vocal folds as they interact with subglottal pressure. The biomechanical properties are determined by laryngeal structure and applied muscle forces. Adjustment of the latter, in turn, is a function of reflexive, affective and learned voluntary behaviors.
The small cycle-to-cycle changes in frequency are called frequency perturbation, or jitter. Irregularities in the vibration of the vocal folds are commonly estimated by the jitter parameter. Jitter measures the irregularities in a quasi-periodic signal and can account for variations in one or more of its features, such as period, amplitude and shape. Even a sustained vowel produced by a professional speaker cannot be considered a periodic signal. Hence, the jitter of a voiced speech signal is usually taken as a measure of the change in the duration of consecutive glottal cycles.
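As a rough illustration of jitter as a change in the duration of consecutive glottal cycles, the sketch below computes a commonly used "local" jitter: the mean absolute difference between consecutive periods, divided by the mean period. The function name and the period values are hypothetical, and this is not necessarily the exact formula used by the tools referred to in this chapter.

```python
def local_jitter_percent(periods):
    """Mean absolute difference between consecutive glottal periods,
    relative to the mean period, expressed in percent."""
    if len(periods) < 2:
        raise ValueError("need at least two glottal periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


# Hypothetical period durations in seconds (roughly 100 Hz phonation).
steady = [0.0100, 0.0100, 0.0100, 0.0100]
perturbed = [0.0100, 0.0110, 0.0100, 0.0110]

print(local_jitter_percent(steady))     # perfectly regular cycles
print(local_jitter_percent(perturbed))  # alternating long and short cycles
```

A perfectly periodic sequence gives zero jitter, while alternating long and short cycles give a clearly non-zero value.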
Jitter, the instability of vocal fold vibration, reflects biomechanical and neuromuscular control. Perturbation values can vary in several ways. Pitch perturbation varies inversely with frequency: when the fundamental frequency increases, jitter decreases. Pitch perturbation is also expected to change with the degree of tension present in the vocal folds, where high tension gives lower perturbation values and low tension gives higher perturbation values.
Vocal intensity and the type of speech sample used have been shown to affect perturbation values. With respect to intensity, higher jitter ratios were related to low pitch and low intensity of sustained vowels.
Pitch perturbation measures were found to be more useful than amplitude perturbation measures. Nevertheless, pitch perturbation measures in sustained phonation are more valuable for detecting laryngeal pathologies when combined with amplitude perturbation measures, which could increase the accuracy of targeting vocal pathologies.
The greater the force, the greater the amplitude of a sound wave. Intensity is an objective measure related to loudness and depends on sound pressure, the amount of force per unit of area. As pressure changes, the perception of loudness also changes. Changes of frequency can also affect the perception of loudness. Vocal intensity depends on the interaction of subglottal pressure with the adjustment status and aerodynamics at the level of the vocal folds, as well as on vocal tract status.
Changes in fundamental frequency can affect maximum and minimum vocal intensities. Speakers raise their fundamental frequency when asked to speak louder.
Reduced vocal intensity can also be an identifying factor in some speech disorders, especially those involving the central nervous system. Amplitude perturbation, or “shimmer”, is analogous to jitter. Shimmer measures the cycle-to-cycle variations in the amplitude of the vocal signal, and so also captures the short-term variability of the signal. Amplitude perturbation measures are based on the maximal peak amplitude of each cycle, measured in millivolts or millimeters. The forces of vocal fold tension, mass, length and subglottic pressure can affect shimmer as well as jitter scores. Shimmer is also better analyzed in prolongations of single sounds than in spontaneous or connected speech. Connected speech contains silences, its intensity varies between syllables and with word stress, and each phoneme has its own acoustic pattern.
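The cycle-to-cycle amplitude variation described above can be sketched as a shimmer value in decibels: the mean absolute level difference between the peak amplitudes of consecutive cycles. This is one common definition and is offered as an assumption; the function name and the amplitude values are illustrative only.

```python
import math


def shimmer_db(peak_amplitudes):
    """Mean absolute level difference, in dB, between the peak
    amplitudes of consecutive glottal cycles."""
    if len(peak_amplitudes) < 2:
        raise ValueError("need at least two cycle amplitudes")
    if any(a <= 0 for a in peak_amplitudes):
        raise ValueError("peak amplitudes must be positive")
    steps = [
        abs(20.0 * math.log10(b / a))
        for a, b in zip(peak_amplitudes, peak_amplitudes[1:])
    ]
    return sum(steps) / len(steps)


# Hypothetical peak amplitudes (arbitrary units, e.g. millivolts).
print(shimmer_db([1.0, 1.0, 1.0]))    # constant amplitude
print(shimmer_db([1.0, 10.0, 1.0]))   # extreme alternation, for illustration
```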
The vocal folds are driven by the cranial nerves originating within this system. If the cranial nerves that innervate the speech articulators are destroyed or damaged, transmission to the vocal folds is disrupted. This may in turn change the vibratory pattern of the vocal folds, as happens in subjects with Multiple Sclerosis (MS). Jitter and shimmer values represent the small cycle-to-cycle changes during vocal fold oscillation. Since the vocal folds play an active role in speech production, it seems logical that such damage would be reflected in significant changes in the jitter and shimmer measures.
A speech portion of large duration (10 ms) is used to calculate frequency, jitter and shimmer. The pathological subjects have higher jitter values than the normal subjects for the utterances analyzed.
Vocal Tract Model
The vocal tract is responsible for changing the spectral balance of the glottal source signal. By changing the vocal tract shape the speaker can modify its resonance frequencies to produce a wide variety of different sounds. Humans use the evolution in time of the resonance frequencies to produce speech. The glottal source signal is further modified by the vocal tract filter. The contribution of the vocal tract resonances can be removed from the speech signal by inverse filtering.
Noise to Harmonic Ratio (NHR) is another useful measure of hoarseness. For a signal that can be assumed to be periodic (e.g. a sustained vowel), the signal-to-noise ratio is equal to the harmonics-to-noise ratio (HNR). The Praat documentation states that a healthy voice phonating /a/ or /i/ should have an HNR of around 20 dB, and around 40 dB for the vowel /u/. The pathological subjects have lower HNR values than the normal subjects for the same utterances, and hence the intelligibility of the vowels and consonants they produce is very low.
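The 20 dB figure can be related to the share of signal energy that is periodic: if a fraction r of the energy is in the periodic part, then HNR = 10 log10(r / (1 - r)), so r = 0.99 gives about 20 dB. The sketch below applies this relation and estimates r from the normalized autocorrelation at the pitch lag. It is a simplified illustration under these assumptions, not Praat's actual algorithm; the function names and the synthetic signal are hypothetical.

```python
import math


def hnr_db(periodic_fraction):
    """HNR in dB when `periodic_fraction` of the energy is periodic."""
    # Clamp to avoid log of zero or division by zero at the extremes.
    r = min(max(periodic_fraction, 1e-12), 1.0 - 1e-12)
    return 10.0 * math.log10(r / (1.0 - r))


def normalized_autocorrelation(signal, lag):
    """Normalized correlation between the signal and itself shifted by `lag`."""
    a, b = signal[:-lag], signal[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


# Synthetic 100 Hz sine sampled at 8 kHz: one period is 80 samples.
fs, f0 = 8000, 100
tone = [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(800)]
r_tone = normalized_autocorrelation(tone, fs // f0)

print(hnr_db(0.99))    # 99% periodic energy
print(hnr_db(r_tone))  # noise-free tone: very high HNR
```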
Formant Frequency Analysis
In real-time continuous speech, even sustained vowel phonation contains a random component, mainly due to turbulence of the airflow through the glottis (anterior and/or posterior) and to pitch perturbations. During vowel pronunciation, the first formant (F1) lies in the range from 250 Hz to 1000 Hz. The closer the tongue is to the hard palate, the lower the value of F1. The second formant (F2) can vary from 550 Hz up to 2700 Hz and depends on the front or back position of the tongue. A lower value of F2 can be achieved by rounding the lips. The third formant (F3) is important for the quality and clarity of the pronounced phoneme.
7.3 System Implementation
The system implemented for Vocal Folds speech disorder is described below.
The present work attempts to find a small set of parameters in pathological speech that confirm Vocal Folds disorder. Researchers have used a large number of parameters; we try to reduce the number of parameters and hence the computational cost. The present work is based on a study of children and some adult male and female subjects whose mother tongue is Marathi. Speech data were collected from normal and pathological children of the same age group, between 3 and 10 years. The children were trained to utter similar words before recording. Speech data were also collected from normal and pathological adult male and female subjects of the same age group, between 20 and 50 years.
A standard database is not available. Speech data were collected from 11 Vocal Folds disordered speakers, each of whom uttered more than 100 words. The speech database consists of isolated words, connected words, fast uttered sentences and songs, e.g. Prarthana (school prayer), the national anthem and pledge, nursery rhymes and famous film songs. The speech data were recorded in digital form using a Sony Intelligent Portable Ocular Device (IPOD) and the recording facility of the COLEA freeware. The recording was carried out in a pleasant atmosphere, keeping the children and other subjects in an environment free of tension and stress. The recorded signal was converted into a ‘.wav’ file using the GOLDWAVE software. The data were collected at Chetana Vikas Mandir, a special school in Kolhapur, India, established to educate children with intellectual disabilities as well as children with various disorders. Data were also collected from patients under the treatment of speech therapists and ENT specialists in Kolhapur city. The database was labelled as Vocal Folds disorder speech data in consultation with the doctors.
7.3.3 Evaluation of Speech
The present work attempts to confirm the Vocal Folds disorder from the speech signal by extracting only the important segmental and suprasegmental acoustic indices. The important indices, considered as Diagnostic Markers, are as follows.
1. Fundamental frequency fo, jitter, shimmer and HNR are evaluated in the training phase to analyze harshness and breathiness in the voice and to confirm that the speech belongs to the pathological category. Voice regularity is evaluated to analyze overall motor control during speech production. Both evaluations are made with reference to the parameter threshold ranges specified in Table 4.1.
2. Fundamental Frequency Analysis - The fo mean value lies in the high range between 180 Hz and 440 Hz for adult male and female speakers and children.
3. Glottal frequency variations - the variations of the source frequency fo with respect to percentile values from 0% to 100%. Whether the characteristic is linear or nonlinear is important. Vocal Folds disorder is confirmed by a low gradient index in the 0% to 50% fo percentile range, and by a high gradient index and a nonlinear characteristic in the 50% to 100% fo percentile range.
4. The % Close Quotient (CQ) graph simulates the Laryngograph and indicates the closed phase of the glottis pulse signal, i.e. of the vocal fold vibrational cycle. The mean CQ, the range of variation of CQ and the variation of CQ over the total duration of the speech sample are important parameters.
5. The All ‘Tx’ graph is a histogram of all pitch cycles over the total duration of the speech sample. The Regular ‘Tx’ graph is a histogram of the regular pitch periods, which vary within +/- 10% with respect to the adjacent pitch periods. The similarity between the two graphs and the mean fo value of the person are important parameters.
7.4 System Block Schematic
The diagram of the system implemented for evaluation of Vocal Folds disorder is shown in Figure 7.2 below.
The present work attempts to confirm the Vocal Folds disorder from the speech signal by extracting various segmental and suprasegmental acoustic indices. The important indices are considered as Diagnostic Markers. The acoustic indices are evaluated for all isolated-word and continuous-sentence speech data samples (above 100 words by each subject) uttered by every pathological and normal subject.
The SFS tool generates an ‘Ls’ signal from the input speech data; this signal is compatible with the Laryngograph signal, which is not accessible to us. The SFS tool provides Qx, a histogram of the closed quotient values found in the recording. The closed quotient is an estimate of the percentage of time the vocal folds remain closed in each pitch period, and it is found from the Laryngograph signal. The SFS tool also provides Dx1, a histogram of all the pitch periods found in the recording, distributed according to their fundamental frequency, and Dx2, a histogram of all the regular pitch periods found in the recording, distributed according to their fundamental frequency. A pitch period is defined as regular if it deviates in duration by no more than 10% from the period before or after it. Using the acoustic indices, the relationship between the Electroglottograph (EGG) measures and the physical movements of the vocal folds is expressed as ratios between the temporal measures of the open phase and the closed phase of vocal fold movement, and between the different phases of movement and the full glottal period.
Ready-to-use software tools are used for the development phase. We have used the tool ‘Praat’ to verify the intensity, pitch, formant frequencies f1, f2, f3 and HNR that we calculated for the speech samples using our MATLAB code. We have used the tool ‘SFS’ to extract voice regularity, jitter, shimmer, and the fo mean and mode values. The Laryngeal Quality Analysis and the Glottis Pulse Analysis are performed using the SFS software.
SFS reports show that the ‘all Tx’ and ‘regular Tx’ glottis pulse graphs are exactly similar for normal subjects, whereas for pathological subjects very few ‘regular Tx’ pulses are produced; they are insufficient in the time domain and variable in the frequency domain.
7.5 System Development
The system is developed using two modes: training mode and testing mode.
Training Mode - In training mode, 50 speech samples are used to train the system. Two plots are produced for these 50 speech samples to confirm the Vocal Folds disorder characteristics: the Laryngograph, comprising % Close Quotient (CQ) with respect to time, which indicates the closed phase of the glottis pulse signal or vocal fold vibrational cycle; and a comparison of the histogram of all ‘Tx’ (all pitch periods) with the histogram of regular ‘Tx’ pitch periods. The observations for Tx are as follows.
1. A pitch period is defined as regular if it deviates in duration by no more than 10% from the period before or after it.
2. In case of normal persons, regular Tx graph matches with all Tx graph.
3. For Vocal Folds disordered persons, regular Tx covers only about 50% of the frequency range of all Tx.
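The regularity rule in the observations above can be sketched as follows. A period is marked regular if it lies within 10% of the period before it or the period after it. Measuring the deviation relative to the neighbouring period is an assumption, and the function names and sample values are hypothetical.

```python
def regular_period_mask(periods, tolerance=0.10):
    """Mark each pitch period as regular if it is within `tolerance`
    of the period before it or the period after it."""
    def close(p, q):
        return abs(p - q) <= tolerance * q

    mask = []
    for i, p in enumerate(periods):
        before = i > 0 and close(p, periods[i - 1])
        after = i + 1 < len(periods) and close(p, periods[i + 1])
        mask.append(before or after)
    return mask


def regular_fraction(periods, tolerance=0.10):
    """Fraction of pitch periods that count as regular."""
    mask = regular_period_mask(periods, tolerance)
    return sum(mask) / len(mask) if mask else 0.0


# Hypothetical pitch periods in milliseconds with one irregular cycle.
tx = [10.0, 10.0, 10.0, 20.0, 10.0, 10.0]
print(regular_period_mask(tx))
print(regular_fraction(tx))
```

Keeping only the periods whose mask entry is true corresponds to building the regular Tx histogram from the all Tx data.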
Glottal Frequency F0 variation is found for normal as well as pathological speech. It is evaluated in transformed percentile domain. The fo values for a speech data file are calculated by using Framing and Windowing Algorithm in MATLAB. Then the Normalization routine is developed in Microsoft Excel. The algorithm for evaluation of Normalized fo Variation in Percentile Domain is as follows.
1. The maximum f0 level is taken as the 100% percentile.
2. From the f0 values of the speech sample, the 0%, 5%, 10%, 15%, ..., 95% and 100% percentile values are calculated for every speech sample.
3. The f0 frequency variations are plotted against the percentile values. This graph provides a very good measure of fundamental frequency analysis for differentiating between normal and pathological speech, and it also confirms Vocal Folds disorder. For Vocal Folds disordered speakers the graph is linear over a certain range, with high or low gradient, and curved over the rest of the fo range.
For normal speech, the percentile fo track graphs are linear over the 5% to 95% range, with a very low gradient of 0.05 to 0.3.
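The percentile-domain steps above can be sketched as follows. The source does not define how the gradient index is computed, so the least-squares slope of fo (in Hz) against percentile used here is an assumption, as are the function names, the linear interpolation between sorted values, and the synthetic fo values.

```python
import math


def percentile_value(sorted_values, p):
    """Value at percentile p (0 to 100), interpolating linearly."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def percentile_track(f0_values, step=5):
    """List of (percentile, fo) pairs at 0%, 5%, ..., 100%."""
    ordered = sorted(f0_values)
    return [(p, percentile_value(ordered, p)) for p in range(0, 101, step)]


def gradient_index(track, p_lo, p_hi):
    """Least-squares slope of fo against percentile within [p_lo, p_hi]."""
    pts = [(p, f) for p, f in track if p_lo <= p <= p_hi]
    if len(pts) < 2:
        raise ValueError("need at least two track points in the range")
    mean_p = sum(p for p, _ in pts) / len(pts)
    mean_f = sum(f for _, f in pts) / len(pts)
    num = sum((p - mean_p) * (f - mean_f) for p, f in pts)
    den = sum((p - mean_p) ** 2 for p, _ in pts)
    return num / den


# Synthetic fo values rising evenly from 100 Hz to 200 Hz.
f0 = [100.0 + i for i in range(101)]
track = percentile_track(f0)
print(gradient_index(track, 5, 95))
```

Comparing the gradient over the lower range (for example 5% to 50%) with the gradient over the upper range gives a simple numerical view of the linear and curved sections described in the text.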
The observations for CQ are as follows.
1. The CQ graph plots CQ against time. It indicates that the CQ values lie in the range 15% to 75% for Vocal Folds disordered speakers.
For normal speech, CQ variation is observed in the range 15% to 60%.
The observations for Regular Tx pulses are as follows.
1. The regular Tx pulses cover less than 20% of the frequency spectrum covered by all Tx pulses, or are even negligible, indicating a loss of regular pitch periods.
7.6 Graphs of some of the Diagnostic Markers
Figure 7.3 to Figure 7.5 show the speech parameters extracted for 3 different speakers. The observations are indicated in the notes.
Diagnostic Markers - dudh1.wav
Figure 7.3a. % Regular Glottal Pulses dudh1
Notes - The regular Tx pulses are produced in the 110 Hz to 140 Hz and 258 Hz to 278 Hz pitch spectrum.
The time duration of regular Tx pulses is almost 50% of the All Tx graph.
Figure 7.3b. % Close Quotient wrt Time dudh1
Note - The % Qx graph indicates a set of short Qx pulses spread over 20% to 75% CQ, with a mean %CQ of 54.7.
Figure 7.3c. % Percentile fo track variations dudh1
Note - The % percentile fo track indicates an initial linear section from the 5% to 50% percentile range with a low gradient index of 0.07, a second nonlinear curved section from the 55% to 95% percentile range, and a very high gradient index of 4.31 from the 95% to 100% range.
Figure 7.4a. % Regular Glottal Pulses - dr1
Notes - 1. The regular Tx pulses are produced in the 120 Hz to 140 Hz and 500 Hz to 510 Hz pitch spectrum.
2. The time duration of regular Tx pulses is almost 30% of the All Tx graph.
Figure 7.4b. % Close Quotient wrt Time - dr1
Note - The % Qx graph indicates a set of short Qx pulses spread over 15% to 72% CQ, with a mean %CQ of 44.5.
Figure 7.4c. % Percentile fo track variations dr1
Note - The % percentile fo track indicates a linear graph from the 5% to 50% percentile range with a low gradient index of 0.12, and a nonlinear curved graph from the 55% to 95% percentile range with a very high gradient.
Diagnostic Markers - par 16.wav
Figure 7.5a. % Regular Glottal Pulses - par 16
Note - The regular Tx pulses are not produced anywhere in the 60 Hz to 600 Hz pitch spectrum.
Figure 7.5b. % Close Quotient wrt Time - par 16
Note - The % CQ graph indicates a Qx pulse 18% to 77% wide, with a mean %CQ of 46.2.
Figure 7.5c. % Percentile fo track variations - par16
Note - The glottal frequency fo characteristic in the percentile domain indicates a continuously rising, locally nonlinear graph over the 0% to 100% percentile range, with a high gradient of 0.55.
Testing Mode - In testing mode, the remaining 50 speech samples are used for confirmation of Vocal Folds disorder. The testing mode checks the Laryngograph characteristics, the CQ graph and the F0 track to confirm Vocal Folds disorder. The following observations are made.
1. The fundamental frequency fo mean is in the range 120 Hz to 440 Hz, depending on the category of speaker (adult male, adult female, child or elderly).
2. The Laryngograph regular Tx pulses cover less than 20% of the frequency spectrum covered by all Tx, or are even negligible.
3. The time vs. closed quotient graph indicates a closed quotient range more than 50% wide, with mean %CQ within 42% to 48%.
4. The fo track has a mixed linear and nonlinear, curved nature.
More than 90% of the speech samples followed the above pattern of Vocal Folds disorder characteristics.
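The testing-mode observations above can be read as a simple rule check, sketched below. The function and key names are hypothetical and the source does not say how the observations are combined, so requiring all four to hold is an assumption. The example uses the CQ values reported for par 16 in Figure 7.5b together with an assumed fo mean.

```python
def matches_disorder_pattern(f0_mean_hz, regular_tx_fraction,
                             cq_min, cq_max, cq_mean, f0_track_is_mixed):
    """Check one sample against the four testing-mode observations.

    Returns (all_checks_passed, individual_checks).
    """
    checks = {
        # Observation 1: fo mean between 120 Hz and 440 Hz.
        "f0_mean_in_range": 120.0 <= f0_mean_hz <= 440.0,
        # Observation 2: regular Tx below 20% of all Tx.
        "few_regular_tx": regular_tx_fraction < 0.20,
        # Observation 3: CQ range over 50% wide, mean %CQ within 42 to 48.
        "wide_cq_range": (cq_max - cq_min) > 50.0 and 42.0 <= cq_mean <= 48.0,
        # Observation 4: fo track mixes linear and curved sections.
        "mixed_f0_track": bool(f0_track_is_mixed),
    }
    return all(checks.values()), checks


# CQ figures as reported for par 16; the fo mean of 200 Hz is assumed.
confirmed, details = matches_disorder_pattern(
    f0_mean_hz=200.0, regular_tx_fraction=0.0,
    cq_min=18.0, cq_max=77.0, cq_mean=46.2, f0_track_is_mixed=True)
print(confirmed, details)
```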
7.7 System Developed for correction
We have used E-System, a MATLAB-based open source development tool compatible with the SFS software, to try methods for correcting Vocal Folds disordered speech. The tools COLEA and Adobe Audition are used for preprocessing the speech samples. In preprocessing, the silence zones and the audible breathing voice segments are removed. Using E-System, the following processing blocks can be designed.
• Amplifier/Attenuator - Design specifications are gain and bandwidth.
• Low Pass Filter - Design specification is the cut-off frequency.
• High Pass Filter - Design specification is the cut-off frequency.
• Band Pass Filter - Design specifications are the lower and upper cut-off frequencies.
• Band Stop Filter - Design specifications are the lower and upper cut-off frequencies.
• Vocal Tract Filter - Design specifications are the f1, f2 and f3 formant frequencies.
• Resonator - Design specifications are the resonating frequency and bandwidth.
The system applied for correction is developed with the help of the following filters.
Band Pass Filter - The lower cut-off frequency is in the range 10 Hz to 100 Hz, and the upper cut-off frequency is selected so that the second formant frequency f2 lies in the pass band. Hence it is selected as 1500 Hz, 2000 Hz or 2500 Hz for male, female or child speakers respectively, based on the pitch frequency range.
Resonator - The resonating center frequency is selected so that the second formant frequency f2 lies in the pass band. Hence it is selected as 1500 Hz, 2000 Hz or 2500 Hz for male, female or child speakers respectively, based on the pitch frequency range.
Vocal Tract Filter - It is realized as a cascade of three resonators, one for each of the three formant frequencies. The standard adult male formant frequencies are 500 Hz, 1500 Hz and 2500 Hz. The first formant frequency is amplified by 20 dB, the second by 10 dB, and the third is kept at 0 dB. Hence this filter boosts the input speech spectrum at the formant frequencies. In pathological speech the amplitudes of the upper formants are degraded, so the VTF is a better solution for lifting the second formant spectrum.
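A cascade of three formant resonators can be sketched with a standard two-pole digital resonator of the kind used in Klatt-style formant synthesis. This is a substitute illustration, not E-System's implementation. The 100 Hz bandwidth and 16 kHz sampling rate are assumptions, and the per-formant gains of 20 dB, 10 dB and 0 dB mentioned above are not modelled, since each resonator here is normalized to unity gain at 0 Hz.

```python
import cmath
import math


def resonator_coeffs(center_hz, bandwidth_hz, fs):
    """Coefficients (a, b, c) of y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * center_hz * t))
    a = 1.0 - b - c  # gives unity gain at 0 Hz
    return a, b, c


def resonate(x, coeffs):
    """Run one resonator over the sample sequence x."""
    a, b, c = coeffs
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vocal_tract_filter(x, formants_hz=(500.0, 1500.0, 2500.0),
                       bandwidth_hz=100.0, fs=16000.0):
    """Cascade one resonator per formant (bandwidth and fs are assumed)."""
    y = list(x)
    for f in formants_hz:
        y = resonate(y, resonator_coeffs(f, bandwidth_hz, fs))
    return y


def resonator_gain(coeffs, freq_hz, fs):
    """Magnitude response of a single resonator at freq_hz."""
    a, b, c = coeffs
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)
    return abs(a / (1.0 - b * z1 - c * z1 * z1))


first = resonator_coeffs(500.0, 100.0, 16000.0)
print(resonator_gain(first, 500.0, 16000.0))   # at the formant
print(resonator_gain(first, 1000.0, 16000.0))  # away from the formant
```

Each resonator emphasizes energy near its center frequency, which is the boosting behaviour described for the vocal tract filter.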
7.7.1 System Applied for Correction of Vocal Folds disorder
During the training mode, the Band Pass Filter was found to perform better than the Vocal Tract Filter and the Resonator. Hence the BPF is applied for correction of Vocal Folds disordered speech. The BPF is designed with an upper cut-off frequency such that the second formant frequency f2 lies in the pass band. Hence it is selected as 1500 Hz, 2000 Hz or 2500 Hz for male, female or child speakers respectively, based on the pitch frequency range.
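A band pass filter with these cut-off frequencies can be sketched as a windowed-sinc FIR design. The source does not state the filter type or order used in E-System, so the FIR structure, the 101 taps, the Hamming window, the 16 kHz sampling rate and the function names are all assumptions made for illustration.

```python
import math


def bandpass_fir(low_hz, high_hz, fs, num_taps=101):
    """Hamming-windowed sinc band pass filter taps."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            h = 2.0 * (high_hz - low_hz) / fs
        else:
            # Ideal band pass = low pass at high_hz minus low pass at low_hz.
            h = (math.sin(2.0 * math.pi * high_hz * k / fs)
                 - math.sin(2.0 * math.pi * low_hz * k / fs)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(h * window)
    return taps


def fir_filter(taps, x):
    """Direct-form convolution of the signal x with the filter taps."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out


def gain_at(taps, freq_hz, fs):
    """Magnitude of the filter's frequency response at freq_hz."""
    re = sum(h * math.cos(2.0 * math.pi * freq_hz * n / fs)
             for n, h in enumerate(taps))
    im = sum(h * math.sin(2.0 * math.pi * freq_hz * n / fs)
             for n, h in enumerate(taps))
    return math.hypot(re, im)


# Example: 100 Hz lower and 2000 Hz upper cut-off at an assumed 16 kHz rate.
taps = bandpass_fir(100.0, 2000.0, 16000.0)
print(gain_at(taps, 1000.0, 16000.0))  # inside the pass band
print(gain_at(taps, 4000.0, 16000.0))  # well above the upper cut-off
```

With so low a lower cut-off, a short FIR filter has a wide transition band near 0 Hz, so a longer filter or a different design would be needed for sharp rejection of very low frequencies.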
After applying the BPF to Vocal Folds disordered speech, the following observations are made.
1. The Laryngograph regular Tx pulses graph shows improvement and covers more than 50% of the frequency spectrum covered by all Tx.
2. The time vs. closed quotient graph indicates a closed quotient range more than 50% wide, with mean %CQ within 42% to 48%.
3. The F0 track has a low gradient (slope) of 0.17 to 0.24 in the 10% to 50% percentile range and a high gradient (slope) of 0.55 to 1.94 in the 55% to 100% percentile range. The nonlinear curved section is improved and becomes approximately linear, with comparatively low gradient levels.
Improvement in the diagnostic markers due to application of BPF Filter is indicated below with the help of Regular Tx graph, % Close Quotient (CQ) graph and % percentile fo track graphs.
Diagnostic Markers-dudh 1BPF.wav
Figure 7.6a1. % Regular Glottal Pulses-dudh1 BPF processed
Figure 7.6a2. % Regular Glottal Pulses-dudh1 Original
The regular Tx pulses are produced in the 110 Hz to 140 Hz and 260 Hz to 275 Hz pitch spectrum in the original, whereas in the BPF processed graph they are produced only in the 110 Hz to 160 Hz pitch spectrum.
Figure 7.6b1. % Close Quotient wrt Time dudh1 BPF Processed
Figure 7.6b2. % Close Quotient wrt Time dudh1 Original
The % CQ graph lies within the 20% to 75% range in the original, whereas it lies within the 15% to 60% range in the BPF processed graph.
Figure 7.6c1. % Percentile fo track variations - dudh1 Original
Figure 7.6c2. % Percentile fo track variations - dudh1 BPF Processed
The original % percentile fo track indicates a linear graph from the 5% to 50% percentile range with a low gradient index of 0.077. In the BPF processed track, the nonlinear curved graph from the 55% to 85% percentile range is smoothed, and the track is approximately linear from the 5% to 85% percentile range with a very low gradient of 0.06.
fo Track Comparison dudh1 Vocal Folds Disorder
Figure 7.7. Comparative Response of the Correction System for Vocal Folds Disorder
It is evident from the graph that the BPF performs better than the Resonator and the VTF. The BPF processed graph extends the linear range up to the 85% fo percentile and further reduces the gradient of both sections. Hence it is preferred as the correction system.
7.8 Segmental and Suprasegmental Acoustic Indices Analysis
The segmental and suprasegmental acoustic indices were analyzed for particular isolated words and for continuous speech data. The isolated-word data, above 100 words uttered by each of 25 normal subjects and 12 Vocal Folds disabled subjects, were analyzed, and a reference (threshold) level was established for each isolated word. Various cases of misarticulation were analyzed for the pathological subjects. The spectrograms were studied for formant analysis. Fast uttered words and continuous sentences exhibit greater complexity with respect to speech intelligibility.
Based on the observations for % CQ variation, Tx variation and F0 track variation, the system is designed for confirmation of Vocal Folds disorder. Results for the training system are shown in Table 7.1.
Table 7.1 Range of diagnostic markers for 50% of the Vocal Folds disordered speech samples and 50% of the normal speech samples
Diagnostic marker | Range of values for Vocal Folds disordered speech | Range of values for normal speech
Time vs. % CQ | 15% to 75% | 10% to 60%
Time periods vs. frequency Tx graph | Regular Tx has less than 20% or a negligible frequency range as compared to the all Tx graph, with intermittent pulses | Regular Tx matches all Tx for more than 90%
Percentile glottal frequency fo track variation | fo track is mixed: linear in the initial range and nonlinear, curved in the higher range of the percentile domain | fo track is linear over the entire range of the percentile domain, with low gradient
Results for testing of remaining 50 % samples are shown in Table 7.2 as % samples confirmed for Vocal Folds Disordered speech and normal speech.
Table 7.2 % confirmation for 50% of the Vocal Folds disordered speech samples and normal speech samples
Parameter used | % samples confirmed for Vocal Folds disordered speech | % samples confirmed for normal speech
Time periods vs. frequency Tx graph | 98% | 100%
Percentile F0 track variation | 100% | 100%
All 3 parameters (% CQ, Tx, F0) | 97% | 100%
Existing speech enhancement algorithms such as spectral subtraction do not help in enhancing pathological speech. Pathological speech due to Vocal Folds disorder suffers from the following conditions.
• Breathing voice segments are audible in the speech because the subjects are under stress when they speak. The breathing voice segment is generally heard when the speaker pauses between the utterances of two successive words.
• The minimum intensity level does not drop much, because the breathing voice segments leave no silence region.
• The speakers have to put more effort into the motor movements of the articulators. Hence the utterance of different words is not appropriate.
• Due to low HNR levels, below the pathological threshold of 12 dB, the speech sounds harsh.
• The formants f1, f2, f3 are seen to be spread in pathological speech relative to normal speech. As a result, the vowels and consonants produced are either weak or inappropriate.
7.10 Our Contribution to present work
A Vocal Folds disordered speech database is not available, so we had our database labeled by doctors. We have evaluated and analyzed the speech of people with Vocal Folds disorder using a few segmental and suprasegmental acoustic indices: fo mean, the percentile glottal frequency fo track characteristics, Laryngeal Quality represented by the % Close Quotient characteristics, the All ‘Tx’ and Regular ‘Tx’ time-frequency histograms, and % voice regularity. To our knowledge, this is the first time Vocal Folds disordered speech has been evaluated and confirmed using the present approach.
7.11 Concluding Remarks
Vocal Folds disorder was identified by evaluating the speech of 25 normal and 12 pathological subjects. Of the 12 pathological subjects, 11 were confirmed to have a Vocal Folds disability. The disorder can be evaluated on the basis of the following segmental and suprasegmental acoustic indices.
• The speakers exhibit variability in different acoustic indices from time to time.
• The range of % Voice Regularity is observed to be 15% to 40%.
• The range of % CQ mean is observed to be 25% to 49%, with a standard deviation of 13% to 20%.
• The percentage of regular Tx pulses is very low with respect to all Tx pulses. The glottal pulses exist for only 20% to 50% of the duration of the speech data and are effectively present in a particular frequency range, 140 Hz to 300 Hz. The disabled speakers cannot produce glottal pulses in the low frequency range or in the 400 Hz to 600 Hz spectrum.
• The percentile glottal frequency characteristics are linear with low gradient up to the 50% fo percentile and nonlinear, curved with very high gradient up to the 100% fo percentile. In some cases they are nonlinear with very high gradient throughout.
• The minimum intensity levels attained are in the range 28 dB to 40 dB, because breathing voice segments are consistently present.