Voice Forensics: Rita Singh | 2019 Wharton People Analytics Conference
Summary
TLDR: This talk delves into the fascinating realm of voice profiling, where the speaker demonstrates the potential to deduce a person's physical and psychological traits from their voice. Utilizing artificial intelligence, the technology can analyze minute voice features to predict age, health, environment, and even reconstruct a person's physical appearance. The presentation showcases the technology's current capabilities and its future potential to enhance machine understanding of individuals.
Takeaways
- 🕵️ Profiling humans from their voice involves analyzing various aspects of a person's voice to deduce information about their identity, background, and even environment.
- 📞 The talk opens with a dramatic example of a bomb threat call to illustrate how voice profiling can be used in law enforcement to identify criminals.
- 🔍 Voice profiling can reveal details such as a person's ethnicity, height, weight, age, and even their state of health or intoxication.
- 🏠 The environment in which a person is speaking can leave traces in their voice, which can be analyzed to infer the room's materials and dimensions.
- 🗣️ The human voice is a complex signal that carries a wealth of information, including physiological parameters, personality traits, and emotional states.
- 🎶 Voice profiling uses artificial intelligence to extract micro-features from the voice that are not easily discernible by the human ear.
- 📉 The talk explains the voice production process, highlighting how the vocal tract acts as a resonance chamber and how its shape affects the sound produced.
- 🎛️ Different aspects of the voice, such as pitch, harmonics, and resonance, can be visualized using spectrograms, which show the frequency content over time.
- 👵 Aging and certain medical conditions like Parkinson's can affect the voice and leave detectable patterns that can be identified through voice analysis.
- 🎨 The talk presents the potential of voice profiling to reconstruct physical features like the face and even the skeletal structure from a person's voice.
- 🚀 The technology behind voice profiling is advancing, with live demonstrations and applications like recreating historical voices, indicating a promising future for this field.
Q & A
What is the main subject of the talk?
-The main subject of the talk is profiling humans from their voice, which includes identifying various characteristics and information about a person based on their voice patterns.
Why would someone record a threatening voice and take it to the police?
-Someone would record a threatening voice and take it to the police to help in identifying the perpetrator of a crime, such as bomb threats, harassment, extortion, or hoax calls.
What does the speaker do to demonstrate the concept of voice profiling?
-The speaker plays out a bomb threat recording and asks the audience to form an opinion about the speaker based on the voice. Afterward, the speaker reveals detailed information about the speaker's characteristics, which were deduced from the voice.
What are some of the personal characteristics that can be inferred from a person's voice?
-Some personal characteristics that can be inferred from a person's voice include age, gender, ethnicity, height, weight, health status, background, personality, and even the environment they are in.
How does the human vocal tract function as a resonance chamber?
-The human vocal tract functions as a resonance chamber by producing echoes and resonances when air passes through it. The shape and dimensions of the vocal tract affect the nature of these resonances, which change the sound produced.
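As a rough illustration of why the tract's dimensions matter, a common textbook simplification (not something stated in the talk) treats the vocal tract as a uniform tube closed at the vocal folds and open at the lips, with resonances at odd multiples of c / 4L. The tube lengths and speed of sound below are assumed illustrative values, and `tube_resonances` is a hypothetical helper name.

```python
def tube_resonances(length_m, n=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Assumption: quarter-wavelength resonator model, F_k = (2k - 1) * c / (4 * L).
    Real vocal tracts are not uniform tubes, so this is only a first approximation.
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m) for k in range(1, n + 1)]


# Illustrative lengths only: a longer and a shorter tube.
long_tube = tube_resonances(0.175)
short_tube = tube_resonances(0.12)
print([round(f) for f in long_tube])
print([round(f) for f in short_tube])
```

With a 17.5 cm tube the model puts the first three resonances near 500, 1500, and 2500 Hz, and a shorter tube shifts all of them upward, which is the sense in which changing the chamber's dimensions changes the resonances.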
What is a spectrogram, and how does it represent voice information?
-A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. It shows the frequency content of a voice signal, with time on the x-axis and frequency on the y-axis, color representing the energy in each frequency at a given time.
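A minimal sketch of how such a picture can be computed, assuming a short-time Fourier transform with a Hann window (the talk does not say which analysis was used). The function name, frame length, and hop size are illustrative choices, and the test tone stands in for a real voice recording.

```python
import numpy as np


def spectrogram(x, fs, frame_len=512, hop=128):
    """Return (freqs, times, power) where power[f, t] is the energy at
    frequency freqs[f] and time times[t]."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames (one row per frame).
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Energy per frequency bin for each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    return freqs, times, power.T  # rows: frequency (y-axis), columns: time (x-axis)


# Synthetic one-second 1 kHz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
freqs, times, power = spectrogram(tone, fs)
```

For display, the power values are usually converted to a logarithmic scale and mapped to color, which matches the description of color as energy per frequency at a given time.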
How does the voicing onset time differ between individuals and sounds?
-Voicing onset time is the time it takes for the vocal folds to go from a state of rest to a state of motion. It differs between individuals due to the unique physical properties of their vocal tracts and also varies for the same person when producing different combinations of sounds.
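One way to picture this measurement is as the gap between the release of a consonant and the first frame in which the signal becomes periodic. The sketch below is a simplified assumption, not the speaker's method: it finds the release with an amplitude threshold and the start of voicing with a normalized autocorrelation threshold. The function names, thresholds, and the synthetic signal are all hypothetical.

```python
import numpy as np


def estimate_vot(x, fs, frame_s=0.025, hop_s=0.005, f0_range=(60.0, 400.0),
                 burst_thresh=0.05, voicing_thresh=0.7):
    """Return (burst_time, voicing_time, vot) in seconds, or None if not found."""
    x = np.asarray(x, dtype=float)
    above = np.flatnonzero(np.abs(x) > burst_thresh)
    if above.size == 0:
        return None
    burst = int(above[0])  # first sample louder than the threshold
    frame = int(round(frame_s * fs))
    hop = int(round(hop_s * fs))
    min_lag = int(fs / f0_range[1])
    max_lag = int(fs / f0_range[0])
    for start in range(burst, len(x) - frame + 1, hop):
        seg = x[start:start + frame]
        best = 0.0
        # Strongest normalized autocorrelation over plausible pitch periods.
        for lag in range(min_lag, max_lag + 1):
            a, b = seg[:-lag], seg[lag:]
            denom = np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12
            best = max(best, float(np.dot(a, b) / denom))
        if best > voicing_thresh:  # frame is periodic enough to call voiced
            return burst / fs, start / fs, (start - burst) / fs
    return None


def synthetic_stop_vowel(fs=16000, silence_s=0.05, vot_s=0.06, vowel_s=0.2,
                         f0=120.0, seed=0):
    """Silence, then an unvoiced noise burst, then a periodic 'vowel' (toy signal)."""
    rng = np.random.default_rng(seed)
    silence = np.zeros(int(round(silence_s * fs)))
    burst = rng.normal(0.0, 0.3, int(round(vot_s * fs)))
    t = np.arange(int(round(vowel_s * fs))) / fs
    vowel = np.sin(2 * np.pi * f0 * t)
    return np.concatenate([silence, burst, vowel])


result = estimate_vot(synthetic_stop_vowel(), 16000)
```

Because the voicing decision is made per frame, the estimate is only accurate to within roughly one frame length; real speech would need a more careful detector.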
What is an example of how voice analysis can reveal a person's environment?
-Voice analysis can reveal a person's environment by identifying reverberation and smear in the voice signal caused by surrounding materials, such as glass, which can indicate the presence of a glass enclosure and its dimensions.
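The link between room materials, room size, and reverberation can be sketched with Sabine's classical formula, RT60 = 0.161 V / A, where V is the room volume in cubic meters and A is the total absorption (surface area times absorption coefficient, summed over surfaces). This shows only the forward relation; the talk describes going the other way, from the smear back to the room, and does not say how. The room size and absorption coefficients below are made-up placeholders, not measured values.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine reverberation time in seconds.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 3 m x 4 m x 2.5 m room: volume 30 m^3, total surface area 59 m^2.
# The coefficients are placeholders for a hard, reflective finish and a soft one.
reflective_room = sabine_rt60(30.0, [(59.0, 0.03)])
absorbent_room = sabine_rt60(30.0, [(59.0, 0.30)])
print(round(reflective_room, 2), round(absorbent_room, 2))
```

Harder, more reflective surfaces absorb less, so the same room rings for longer, which is consistent with the longer smear described for a glass enclosure.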
How can the structure of a person's skull and face be deduced from their voice?
-The structure of a person's skull and face can be deduced from their voice because the skull and face are closely related to the vocal chambers. The contours of the face and the structure of the skull can affect the resonance and frequencies produced by the voice.
What was the outcome of the live demonstration of voice profiling technology in Tianjin in 2018?
-In the live demonstration in Tianjin, about a thousand people tried out the technology, which included a virtual reality demo where participants' faces were recreated in 3D based on their voice, allowing them to pick up and examine their virtual faces.
What was the significance of recreating Rembrandt's voice from his facial portraits?
-Recreating Rembrandt's voice from his facial portraits demonstrated the advanced capabilities of voice profiling technology, showing that it is possible to reverse-engineer a person's voice based on their physical features, even from historical figures.
What does the future hold for voice profiling technology according to the speaker?
-The future of voice profiling technology holds the potential for machines to know individuals better, possibly even better than they know themselves, indicating a deeper integration of voice analysis in various aspects of life and technology.
Outlines
🕵️ Profiling Humans from Voice Analysis
The speaker introduces the concept of profiling humans based on their voice, using a hypothetical scenario where an individual receives a threatening call and brings it to the police. The police, in turn, use voice profiling to deduce characteristics about the caller, such as age, physical attributes, and environment. The speaker demonstrates this by playing a recording of a bomb threat and then revealing specific details about the speaker, including his physical description, mental state, and the environment he is in. The talk emphasizes that humans naturally make judgments about others from their voices, and this process can be enhanced with artificial intelligence to extract more detailed information.
🎙️ Understanding Voice Production
This paragraph delves into the science behind voice production, comparing the human vocal tract to a resonance chamber that creates echoes and resonances. The speaker explains how the vocal cords vibrate to produce sound, which is then shaped by the movement of articulators such as the tongue, lips, and jaw. The resulting sound wave is a complex pattern of frequencies over time, visualized through a spectrogram. The speaker uses examples like a Ferrari's acceleration to illustrate the uniqueness of each person's voice, highlighting the differences in voicing onset time, which is a distinctive feature for each individual.
🎼 The Complexity of Voice in Time and Frequency
The speaker continues to explore the intricacies of voice, focusing on the micro features present in both time and frequency. Using a song and a spectrogram, the speaker shows the complexity of the human voice and attempts to recreate a snippet of it, emphasizing the difficulty of replicating the unique details. The paragraph also includes examples of voices from different individuals, including a singer with a distinctively sweet voice, to illustrate the vast differences in vocal characteristics. The speaker mentions the use of artificial intelligence to discover and extract these micro features, which are key to understanding the identity and uniqueness of a person's voice.
🏛️ Applications and Future of Voice Profiling
In this final paragraph, the speaker discusses the practical applications of voice profiling, such as recreating faces in 3D from voice recordings and even reviving the voice of historical figures like Rembrandt from his facial portraits. The speaker also touches on the potential future of this technology, suggesting that it could enable machines to understand individuals better than they understand themselves. The talk concludes with a mention of a forthcoming book by the speaker, which will elaborate on the technology of voice profiling in detail.
Keywords
💡Profiling
💡Voice Analysis
💡Artificial Intelligence (AI)
💡Vocal Tract
💡Resonance
💡Spectrogram
💡Voicing Onset Time
💡Micro-features
💡Reverberation
💡Physical Form Reconstruction
💡Virtual Reality Demo
Highlights
Profiling humans from their voice is possible and can reveal personal characteristics and environment.
Voice is used in numerous crimes such as threats, harassment, extortion, and hoax calls.
Listeners can form opinions about a speaker's characteristics based on their voice.
Profiling involves detailed analysis of a person's voice to predict their physical and psychological attributes.
Voice carries information about age, health, background, personality, and environment.
Human vocal tract acts as a resonance chamber, affecting the nature of sound produced.
The process of voice production is complex and involves the coordination of various muscles and structures.
Voice analysis can reveal the size and materials of a room from the resonances in the voice signal.
Artificial intelligence is used to extract micro-features from the voice for profiling.
Different voice representations, such as spectrograms, can reveal unique characteristics of a speaker.
The Queen of England's voice has shown changes over 50 years, indicating the effects of aging.
Voice analysis can potentially trace the structure of a person's skull and skeleton from the sound produced.
Technology demonstrated in Tianjin allowed for 3D facial reconstruction from a person's voice.
Rembrandt's voice was recreated from his facial portraits using advanced voice profiling techniques.
The future of voice profiling technology may enable machines to know individuals better than they know themselves.
A forthcoming book by the speaker, to be published by Springer, will detail the technology of voice profiling in approximately 400 pages.
Transcripts
everyone thank you for being here for my
talk the subject of my talk today is
profiling humans from their voice and I
would like to introduce the subject to
you through an example what if someone
called you out of the blue and
threatened to kill you and they did that
repeatedly you've never heard the voice
what would you do you would probably
record the voice and take it to the
police what would the police do that was
a bomb threat there are hundreds of
crimes that happen every day around the
world through the medium of voice
threats harassment extortion ransom
calls hoax calls and a lot more what I'm
going to do now is to play out a bomb
threat to you I want you to listen to
the voice very carefully and try to form
an opinion about this speaker and then I
will tell you a little bit about the
speaker and let's match our notes here
you kind of share fosters better off to
knock off you hello sir can you hear me
fine yes how can I help you
all right listen to me very carefully
okay all right listen very carefully and
don't interrupt me I've got seven pipe
bombs surrounding these beep are seven
pipe bombs the blast radius will go off
within a 500-meter radius telling
everybody within it seven pipe bombs are
located in on this close location and I
see the point I'm not afraid of blow
them up I have an ak-47
if the bomb don't detonate in one hour I
want to run in with me ak-47 killing
everybody in the arc
do you understand
I'm sure you've formed some opinion
about the person what if I told you that
this person is white he's Caucasian he's
brought up in America probably in the
northwest
he's about 170 centimeters tall he
weighs about 72 kilograms
he's about 38 years old he's high on
cocaine he's a heavy smoker he's in a
small room the room has a wooden floor
the room has gypsum walls there is a
large glass window behind him he's using
a laptop to make the call probably an
IBM ThinkPad there's a ceiling fan in
the room and so on what if I told you
that this is what he looks like probably
what if I went one step ahead and told
you that this is what he looks like this
is what I call profiling humans from
their voice how can we do this it turns
out we do this all the time we make
judgments about other people from their
voices all the time
how many times have you met a friend or
heard their voice over the phone and
told that friend you sound sad you
sound depressed you sound happy and so
on we make these judgments all the time
here he sounded like a man who had slept
well and then Oh too much money and we
make these judgments so easily that we
don't even realize it a remarkable ride
only here in America I was born in
there's a lineup of people up there did
you get who did you did you understand
or get who spoke might have spoken that
yes you did
right in that two seconds you judged the
person's gender their age their
ethnicity their state of mental health
their state of physical health and we
can do this because voice carries
information we just don't realize how
much information voice carries it carries
information about your age your your
physiological parameters your physical
stature your height your
weight your health your background your
personality even about your environment
if you're in a room there are signatures
of the room in your voice and we'll see
examples of that
how big is a room what's the ceiling
made of what's the floor made of and so
on and so forth it's possible to extract
all that and the science of profiling
uses artificial intelligence to find
this information in the voice and to
make predictions from it except that we
hope the machines will do it much better
because human hearing is not all that
good so we come to information now the
voice carries information but where is
that information in order to understand
where that information is we need to
understand a little bit about the voice
production process the human vocal tract
or the vocal chamber is a is a is a
resonance chamber and if you imagine a
building that looked like that and a
little guy shouting into the building
what do you think you would hear you
would hear echoes you would hear
resonances right if I change the shape
of the that building or the dimensions
what would happen the nature of those
echoes and resonances would change so
when we speak
air comes out of the lungs and it goes
through two vocal cords in our larynx
here they're not really cords they're
folds and they vibrate in response to
the air creating a sound and this sound
resonates in your vocal chambers we we
produce different sounds as we speak by
changing the dimensions or the shape of
the vocal chambers by moving our
articulators our tongue lips
jaw and so forth and as we speak we
produce thousands of frequencies we
produce frequencies in the range of 50
to 6800 Hertz what comes out as a result
of this process is a pressure wave that
looks like that the signal on the top in
time
and the picture below is its frequency
content on the y-axis you have frequency
the thousands of frequencies that we
produce on the x-axis you have time and
the color at any pixel is the energy in
that frequency at that time and these
high-energy patterns that you see
throughout the picture are the
resonances of your vocal chambers so
this is one representation of the speech
signal where is the information in this
one representation and there are many
representations possible in this one
representation the information is in
time in time frequency and in frequency
and we'll see some examples of that so
I'll give you an example I'd start with
an example of information in time and
I'd like I like to start with this
example so this is the Ferrari and it
goes from 0 to 60 miles per hour in 5
seconds flat now at one time it was
touted as a car that was too fast to
race but then other cars came along and
these these these high-end cars go from
from zero to 60 miles per hour in other
times 2.1 seconds 2.2 seconds and so
on and so forth why are these times
different for these different cars well
the answer is simple these are complex
machines each one is designed
differently it turns out our vocal
production process is even more complex
here is an example of let me play this three
three three
it's a nine-year-old boy saying three three three
when we say a word like three the first
sound is th we produce the sound by
creating an obstruction in our vocal
tract building up air pressure behind it
and releasing it suddenly the vocal
folds are not vibrating the very next
sound is r and that sound requires
your vocal folds to vibrate at full
potential your vocal tract has muscles
they have a certain inertia everyone's
vocal tract has
inertia and for the vocal folds
to go from a state of complete rest to a
state of complete motion takes a small a
very small amount of time and that time
is called the voicing onset time it is
different for different people it is not
only different for different people it
is different for the same person for
different combinations of sounds that
makes this characteristic very unique
this is a characteristic in time so
let's go on ahead and see some of the
examples of information in frequency and
time frequency I'm going to play out
this very beautiful song to you
[Music]
Oh
beautiful let's look at the spectrogram
how complex is this how complex is this
if I wanted to artificially reproduce it
I would not be able to do it and I'll
show you some of my best efforts I take
a small snippet of this and by the way
all these fine details that you see
are in time and frequency so this is
information in time and frequency in
this one representation so here are my
attempts to recreate a small sliver of
this sound I start with this
mark knows getting close this is a real
one this is the closest I could get
mathematically and this is just a small
snippet of the sound produced by this
person every person's voice is unique
every person's voice is unique let's
look at another example
this is a singer from India at one time
she
I think held the record for having the
sweetest voice in the world
and this is her voice
[Music]
Media
[Music]
if you compare the two voices
side-by-side nothing would match
nothing would match there there there's
such detail in these voices and
incidentally what she is singing and
this with this song was recorded in 1977
she's saying my name will be lost this
face will change my voice is my only
identity the information in voice is in
the micro features and we use artificial
intelligence to discover and extract
these micro features I was talking about
representations so I'll show you
information and a couple of other
representations very interesting these
this is the Queen of England and these
are examples of her voice 50 years apart
when she was very young at age 35 years
ago my grandfather broadcast the first
of these Christmas messages one of the
features of growing old is a heightened
awareness of change can you hear the
difference so this representation is
called the constant-Q spectrogram and
the arrows point to the same word spoken
50 years apart and you can see the
differences the pitch is shifted the
harmonics are smeared and there are
other differences here's yet another
representation these are pitch pulses of
three different people saying the word
and do you see the difference in pattern
between the ones on the sides the people
on the sides and the one in the center
the people in the person in the center
has Parkinson's so and the the markers
show up on this representation that's
Hitler in 1935 addressing the Nazi Party
and what do you see in his voice in the
same representation people suspected
from video evidence that he had
Parkinson's this shows that he might
have had Parkinson's indeed
your environment leaves signatures on
your voice this is an example if you're
surrounded by glass
it causes reverberation and causes a
smear in the voice signal you can see
this smear on the spectrogram and by
backtracking from that smear you can
actually trace the materials of the
enclosure around the person and also the
dimensions of the person the room so and
a lot more is possible using voice
analysis your skull is closely related
to your face your skull is also very
closely related to your vocal chambers
so is your face it is therefore possible
to trace the contours of your face from
the voice it is possible to deduce the
structure of your skull from voice your
skull is connected to your skeleton it
is therefore possible to deduce the
structure of your skeleton from the
voice it's possible to get your height
and weight I can get your BMI I can fill
in your skeletal structure it is then
possible to reconstruct your stature or
your physical form from voice and one
day we hope we're going to be perfect at
doing that or near-perfect at doing that
where are we today
last year in 2018 in September in
Tianjin this technology was demonstrated
live about a thousand people tried it
out and one part of the technology was a
virtual reality demo where people were
wearing a headset and saying something
and their face was recreated in 3d you
could pick it up and examine it this
year in February we reversed the
technology and we recreated Rembrandt's
voice from his facial portraits this was
done in collaboration with J. Walter
Thompson the Rijksmuseum and ING in the
Netherlands what's the future of this
technology your voice will help machines
know you better perhaps better than even
you can know yourself this is a book
that I have just finished writing it's
going to be published very soon by
Springer it spells out the technology in
about 400 pages and hopefully it'll be
out very soon in a couple of months
thank you very much