Voice Forensics: Rita Singh | 2019 Wharton People Analytics Conference

Wharton School

15 May 2019 | 16:22

Summary

TLDR: In this talk on voice profiling, the speaker demonstrates how a person's physical and psychological traits can be deduced from their voice. Using artificial intelligence, the technology analyzes minute voice features to predict age, health, and environment, and even to reconstruct a person's physical appearance. The presentation showcases the technology's current capabilities and its potential to improve how well machines understand individuals.

Takeaways

🕵️‍♂️ Profiling humans from their voice involves analyzing various aspects of a person's voice to deduce information about their identity, background, and even environment.
📞 The talk opens with a recorded bomb threat call to illustrate how law enforcement can use voice profiling to identify criminals.
🔍 Voice profiling can reveal details such as a person's ethnicity, height, weight, age, and even their state of health or intoxication.
🏠 The environment in which a person is speaking can leave traces in their voice, which can be analyzed to infer the room's materials and dimensions.
🗣️ The human voice is a complex signal that carries a wealth of information, including physiological parameters, personality traits, and emotional states.
🎶 Voice profiling uses artificial intelligence to extract micro-features from the voice that are not easily discernible by the human ear.
📉 The talk explains the voice production process, highlighting how the vocal tract acts as a resonance chamber and how its shape affects the sound produced.
🎛️ Different aspects of the voice, such as pitch, harmonics, and resonance, can be visualized using spectrograms, which show the frequency content over time.
👵 Aging and certain medical conditions like Parkinson's can affect the voice and leave detectable patterns that can be identified through voice analysis.
🎨 The talk demonstrates the potential of voice profiling to reconstruct physical features such as the face, and even the skeletal structure, from a person's voice.
🚀 The technology behind voice profiling is advancing, with live demonstrations and applications like recreating historical voices, indicating a promising future for this field.
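
As a rough illustration of the takeaway about the vocal tract acting as a resonance chamber, the tract can be approximated by a uniform tube that is closed at the vocal folds and open at the lips. This is a standard textbook simplification, not the method described in the talk. Under that assumption the resonances fall at odd multiples of c / (4L), so a shorter tube resonates at higher frequencies. The function name `tube_resonances`, the example lengths (0.17 m and 0.14 m), and the speed of sound (343 m/s) are illustrative assumptions.

```python
def tube_resonances(length_m, count=3, speed_of_sound=343.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Simplified stand-in for a vocal tract: F_k = (2k - 1) * c / (4 * L).
    All parameter values here are illustrative assumptions.
    """
    return [
        (2 * k - 1) * speed_of_sound / (4.0 * length_m)
        for k in range(1, count + 1)
    ]


# Two hypothetical tract lengths, chosen only to show the trend.
longer = tube_resonances(0.17)
shorter = tube_resonances(0.14)

for name, values in (("0.17 m", longer), ("0.14 m", shorter)):
    print(name, [round(v) for v in values])
```

For the 0.17 m tube the first resonance comes out at roughly 500 Hz, and every resonance of the shorter tube is higher than its counterpart in the longer one. A real vocal tract is not uniform, so its resonances shift as the tongue, jaw, and lips change the tube's shape.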

Q & A

What is the main subject of the talk?
-The main subject of the talk is profiling humans from their voice, which includes identifying various characteristics and information about a person based on their voice patterns.
Why would someone record a threatening voice and take it to the police?
-Someone would record a threatening voice and take it to the police to help in identifying the perpetrator of a crime, such as bomb threats, harassment, extortion, or hoax calls.
What does the speaker do to demonstrate the concept of voice profiling?
-The speaker plays a recording of a bomb threat and asks the audience to form an opinion about the caller from the voice alone. The speaker then reveals detailed information about the caller's characteristics, all of it deduced from the voice.
What are some of the personal characteristics that can be inferred from a person's voice?
-Some personal characteristics that can be inferred from a person's voice include age, gender, ethnicity, height, weight, health status, background, personality, and even the environment they are in.
How does the human vocal tract function as a resonance chamber?
-The human vocal tract functions as a resonance chamber by producing echoes and resonances when air passes through it. The shape and dimensions of the vocal tract affect the nature of these resonances, which change the sound produced.
What is a spectrogram, and how does it represent voice information?
-A spectrogram is a visual representation of how the frequency spectrum of a signal varies with time. For a voice signal, time runs along the x-axis, frequency along the y-axis, and color represents the energy at each frequency at a given time.
How does the voicing onset time differ between individuals and sounds?
-Voicing onset time is the time it takes for the vocal folds to go from a state of rest to a state of motion. It differs between individuals due to the unique physical properties of their vocal tracts and also varies for the same person when producing different combinations of sounds.
What is an example of how voice analysis can reveal a person's environment?
-Voice analysis can reveal a person's environment by identifying reverberation and smear in the voice signal caused by surrounding materials, such as glass, which can indicate the presence of a glass enclosure and its dimensions.
How can the structure of a person's skull and face be deduced from their voice?
-The structure of a person's skull and face can be deduced from their voice because the skull and face are closely related to the vocal chambers. The contours of the face and the structure of the skull can affect the resonance and frequencies produced by the voice.
What was the outcome of the live demonstration of voice profiling technology in Tianjin in 2018?
-In the live demonstration in Tianjin, about a thousand people tried out the technology, which included a virtual reality demo where participants' faces were recreated in 3D based on their voice, allowing them to pick up and examine their virtual faces.
What was the significance of recreating Rembrandt's voice from his facial portraits?
-Recreating Rembrandt's voice from his facial portraits demonstrated the advanced capabilities of voice profiling technology, showing that it is possible to reverse-engineer a person's voice from their physical features, even for historical figures.
What does the future hold for voice profiling technology according to the speaker?
-The future of voice profiling technology holds the potential for machines to know individuals better, possibly even better than they know themselves, indicating a deeper integration of voice analysis in various aspects of life and technology.
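
The spectrogram described in the Q&A above can be sketched in a few lines. This is a minimal illustration using only the Python standard library and a synthetic two-tone signal, not the analysis pipeline from the talk. The function name `spectrogram`, the frame length, the hop size, and the 8000 Hz sample rate are all assumptions chosen to keep the example small.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Short-time magnitude spectrum: one list of bin magnitudes per frame."""
    # Hann window to reduce leakage at the frame edges.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        # Naive DFT over the non-negative frequency bins (fine for tiny inputs).
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


SAMPLE_RATE = 8000  # Hz, assumed for this synthetic example
FRAME_LEN = 64

# Synthetic "voice": a 1000 Hz tone that switches to 2000 Hz halfway through.
signal = [
    math.sin(2 * math.pi * (1000 if t < 256 else 2000) * t / SAMPLE_RATE)
    for t in range(512)
]

spec = spectrogram(signal, frame_len=FRAME_LEN, hop=32)
bin_hz = SAMPLE_RATE / FRAME_LEN  # width of one frequency bin in Hz


def peak_frequency(frame):
    """Frequency (Hz) of the strongest bin in one frame."""
    return max(range(len(frame)), key=frame.__getitem__) * bin_hz


peak_first = peak_frequency(spec[0])
peak_last = peak_frequency(spec[-1])
print(peak_first, peak_last)
```

Each inner list in `spec` is one column of the spectrogram (one moment in time), and each value is the energy in one frequency bin. Plotting those values as colors, with frames along the x-axis and bins along the y-axis, gives the picture described above. In practice a library FFT would replace the naive DFT loop.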