As their model crawls through a subject’s data, it’s programmed to locate voicing segments, which make up only about 10 percent of the data. For each of these voicing windows, the model computes a spectrogram, a visual representation of the spectrum of frequencies varying over time, which is often used for speech processing tasks. The spectrograms are then stored as large matrices of thousands of values.
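To make the idea of a spectrogram concrete, here is a minimal sketch in plain Python. It is not the researchers' code: the function name `spectrogram`, the frame length, the hop size, and the synthetic 200 Hz tone are all assumptions chosen for illustration. It slices a signal into overlapping windows and measures how much energy each window holds at each frequency.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len//2) per overlapping frame."""
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive discrete Fourier transform: slow, but fine for illustration.
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Synthetic 200 Hz tone sampled at 1600 Hz (made-up values, not real voice data).
sample_rate = 1600
tone = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)

# The strongest frequency bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time slice, which is why a full recording quickly turns into a matrix with thousands of values.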
But those matrices are huge and difficult to process. So, an autoencoder — a neural network optimized to generate efficient data encodings from large amounts of data — first compresses the spectrogram into an encoding of 30 values. It then decompresses that encoding into a separate spectrogram.

Basically, the model must ensure that the decompressed spectrogram closely resembles the original spectrogram input. In doing so, it’s forced to learn the compressed representation of every spectrogram segment input over each subject’s entire time-series data. The compressed representations are the features that help train machine-learning models to make predictions.
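The compress-then-reconstruct objective can be sketched with a toy autoencoder. Everything here is an assumption made for illustration rather than the researchers' architecture: the layers are purely linear, the input has 4 values instead of thousands, the encoding has 2 values instead of 30, the data is synthetic, and training uses plain stochastic gradient descent.

```python
import random

random.seed(0)
IN_DIM, CODE_DIM = 4, 2  # toy sizes; the article's model compresses to 30 values

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

enc = rand_matrix(CODE_DIM, IN_DIM)  # encoder: compresses 4 values to 2
dec = rand_matrix(IN_DIM, CODE_DIM)  # decoder: expands 2 values back to 4

# Toy stand-ins for spectrograms: vectors [a, b, a, b], fully described by 2 numbers.
data = []
for _ in range(20):
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    data.append([a, b, a, b])

def mean_loss():
    """Average squared difference between each input and its reconstruction."""
    total = 0.0
    for x in data:
        recon = matvec(dec, matvec(enc, x))
        total += sum((r - xi) ** 2 for r, xi in zip(recon, x))
    return total / len(data)

initial_loss = mean_loss()
lr = 0.02
for _ in range(300):
    for x in data:
        code = matvec(enc, x)        # compressed representation (the "features")
        recon = matvec(dec, code)    # decompressed output
        err = [r - xi for r, xi in zip(recon, x)]
        # Push the reconstruction error back through the decoder to the code.
        code_grad = [sum(dec[i][j] * err[i] for i in range(IN_DIM))
                     for j in range(CODE_DIM)]
        for i in range(IN_DIM):
            for j in range(CODE_DIM):
                dec[i][j] -= lr * err[i] * code[j]
        for j in range(CODE_DIM):
            for k in range(IN_DIM):
                enc[j][k] -= lr * code_grad[j] * x[k]
final_loss = mean_loss()
```

After training, `matvec(enc, x)` plays the role of the compressed features: the network is only rewarded for reconstructing its input, so the short encoding has to carry whatever information matters most.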
Mapping normal and abnormal features
In training, the model learns to map those features to “patients” or “controls.” Patients will have more voicing patterns than will controls. In testing on previously unseen subjects, the model similarly condenses all spectrogram segments into a reduced set of features. Then, it’s majority rules: If a subject has mostly abnormal voicing segments, they’re classified as a patient; if they have mostly normal ones, they’re classified as a control.
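The majority-rules step is simple enough to sketch directly. The function name `classify_subject` and the label strings are hypothetical, and the article does not say how ties are handled, so sending ties to "control" is an assumption of this sketch.

```python
from collections import Counter

def classify_subject(segment_labels):
    """Label a subject from per-segment labels by majority vote."""
    counts = Counter(segment_labels)
    # Assumption: a tie counts as "control"; the article does not specify this.
    if counts["abnormal"] > counts["normal"]:
        return "patient"
    return "control"

# Made-up per-segment predictions for two hypothetical subjects.
subject_a = ["abnormal", "abnormal", "normal", "abnormal"]
subject_b = ["normal", "abnormal", "normal", "normal"]
result_a = classify_subject(subject_a)
result_b = classify_subject(subject_b)
```

Voting across many segments means a handful of misjudged segments is unlikely to flip the label for the whole subject.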
In experiments, the model performed as accurately as state-of-the-art models that require manual feature engineering. Importantly, the researchers’ model performed accurately in both training and testing, indicating it’s learning clinically relevant patterns from the data, not subject-specific information.
Next, the researchers want to monitor how various treatments, such as surgery and vocal therapy, impact vocal behavior. If patients’ behaviors move from abnormal to normal over time, they’re most likely improving. They also hope to use a similar technique on electrocardiogram data, which is used to track muscular functions of the heart.