Jennifer Golbeck: The curly fry conundrum: Why social media "likes" say more than you might think

TED
3 Apr 2014 · 09:56

Summary

TL;DR: The speaker explores how social media transformed the web from a static space into an interactive platform driven by user-generated content. They discuss the vast amount of personal data shared online, often unknowingly, and how predictive models can reveal hidden traits. Using examples like Target's pregnancy score and Facebook 'likes,' the speaker highlights the power of data analysis and users' lack of control over it. They propose future solutions, including science-driven tools for user empowerment, while acknowledging the challenge of balancing privacy with the data that powers online interactions.

Takeaways

  • 🌐 The early web was static, with most content created by organizations or tech-savvy individuals.
  • 📱 The rise of social media has transformed the web into an interactive platform where average users generate most of the content.
  • 👥 Social media has enabled users to create online personas easily, leading to the widespread sharing of personal data.
  • 🔍 Large companies, like Target, use behavioral patterns from users' data to predict personal attributes and events, such as pregnancy.
  • 📊 Machine learning models can predict hidden user traits, like political preferences, intelligence, and even personality, based on subtle patterns in behavior and online actions.
  • 🤖 Likes on platforms like Facebook, such as the example of 'curly fries,' can predict personal attributes due to patterns in social networks.
  • 🛑 Users often lack control or understanding of how their data is used and what it reveals about them.
  • ⚖️ Solutions to regain user control include policy and law changes, but these are unlikely due to the business models of social media companies and the slow political process.
  • 🔐 Technological solutions, such as data encryption and transparent risk warnings, could give users more control over their data sharing decisions.
  • 🔬 Scientists aim to empower users with tools that help them make informed choices about their data, focusing on improving online interactions rather than exploiting personal information.

Q & A

  • How did the web evolve from the first decade to the era of social media?

In the first decade, the web was mostly static, with content created by organizations or tech-savvy individuals. The rise of social media in the early 2000s transformed the web into an interactive space where average users contribute content through platforms like YouTube, blogs, and social media posts.

  • What is homophily, and how does it relate to predicting traits online?

Homophily is a sociological theory that suggests people are friends with others who are like them. In the context of online behavior, it helps explain how people with similar attributes, like intelligence, cluster together, allowing models to predict traits based on patterns of behavior.

  • What example is given to illustrate how companies predict personal traits based on seemingly irrelevant data?

An example is Target predicting a teenage girl's pregnancy before she informed her parents. By analyzing her purchase history, including buying more vitamins or a large handbag, Target computed a 'pregnancy score' and used it to predict her due date.

  • How does liking something like curly fries on Facebook relate to predicting intelligence?

Liking the curly fries page is indicative of high intelligence, not because of the content itself, but due to homophily. A smart person may have initially liked the page, and through their network of similarly smart friends, this action propagated, making it a signal of intelligence.

  • Why do companies like Facebook hold so much data about users, and what are the concerns around this?

Social media companies rely on user data as their primary asset for generating revenue. The concern is that users often don't understand how their data is used or shared, and they have little control over it. This leads to privacy risks and potential misuse of personal information.

  • What are the potential dangers of using predictive models on social media data?

These models can predict sensitive personal attributes, such as political preferences, sexual orientation, and drug use, without users' knowledge or consent. Such information could be exploited for non-altruistic purposes, like by employers making hiring decisions based on these predictions.

  • What challenges exist in addressing user control over their personal data?

There are two primary challenges: the slow political process in enacting laws to protect data rights and the reliance of social media companies on user data for revenue. This makes it unlikely for companies to voluntarily give users full control over their data.

  • How can science help users regain control over their online data?

Research can develop tools that inform users about the risks of their actions, such as liking a page or sharing information. This could help users make informed decisions about what data they share. Scientists could also work on encrypting data to protect user privacy.

  • What is the speaker’s perspective on balancing data prediction models and user control?

The speaker believes that users should have control over how their data is used. Although predictive models are useful for enhancing online interactions, users should have the option to prevent data collection if they wish, even if that diminishes the models' effectiveness.

  • Why does the speaker think law reform on user data control is unlikely in the near future?

The speaker observes that the political process is slow and unlikely to prioritize or effectively address the complexities of data privacy laws. Additionally, social media companies have a vested interest in maintaining control over user data for profit.

Outlines

00:00

🌐 The Evolution of the Web: From Static Pages to Social Media

The early internet was a static space where content was mostly created by tech-savvy individuals or organizations. The rise of social media in the early 2000s transformed it into an interactive platform where average users now generate the majority of content, such as YouTube videos, blog posts, and social media updates. Facebook, with 1.2 billion monthly users, illustrates this change, allowing users to create online personas without much technical skill. However, the massive amounts of personal data shared have led to unprecedented data collection on behavior, preferences, and demographics. While this data allows computer scientists to build predictive models for various hidden attributes, it raises concerns about users' lack of understanding and control over the information they unknowingly share.

05:01

🎯 Target's Predictive Data Insights and the Power of Subtle Behavior Patterns

The famous Target anecdote illustrates the predictive power of subtle behavior patterns. Target predicted a teenage girl's pregnancy using her purchase history to calculate a 'pregnancy score,' not by tracking obvious purchases like baby products but by analyzing minor changes in behavior, such as increased vitamin purchases or buying a larger handbag. This incident highlights how seemingly insignificant behaviors, when viewed in large datasets, can reveal hidden attributes. Similarly, social media platforms predict personal details like political preferences, personality traits, and even intelligence by analyzing patterns in user behavior, such as Facebook 'likes.' These predictive models demonstrate the growing capability of data analytics to infer sensitive personal information from seemingly trivial actions.
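The scoring idea behind the Target anecdote can be sketched as a tiny naive-Bayes log-odds model. Everything here is invented for illustration: the item names and per-group purchase rates are hypothetical, not Target's actual features or numbers.

```python
import math

# Hypothetical per-item purchase rates: P(item | pregnant) vs. P(item | not),
# as if estimated from historical baskets. All numbers are made up.
rates = {
    #  item                 P(item | preg)  P(item | not preg)
    "unscented_lotion":     (0.30,          0.05),
    "vitamin_supplements":  (0.40,          0.10),
    "large_handbag":        (0.20,          0.08),
    "cotton_balls":         (0.25,          0.12),
}

def pregnancy_score(basket):
    """Naive-Bayes log-odds that a basket came from the 'pregnant' group."""
    score = 0.0
    for item, (p_yes, p_no) in rates.items():
        if item in basket:
            score += math.log(p_yes / p_no)              # present: evidence for
        else:
            score += math.log((1 - p_yes) / (1 - p_no))  # absent: evidence against
    return score

print(pregnancy_score({"vitamin_supplements", "large_handbag"}))  # positive
print(pregnancy_score(set()))                                      # negative
```

The point the talk makes survives even in this toy: no single item is decisive, but the sum of many weak signals pushes the log-odds well away from zero.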

🍟 Curly Fries and the Science of Homophily: How Social Networks Reveal Hidden Traits

A study on Facebook 'likes' shows that seemingly irrelevant content, like liking a page for curly fries, can reveal surprising insights about users' intelligence. This phenomenon is explained by 'homophily,' the sociological theory that people tend to associate with others like themselves—smart people have smart friends. Over time, behaviors like liking a page can propagate through a network of similar individuals, creating patterns that reflect back shared traits, even if the content itself seems unrelated. This illustrates the complexity of understanding how online actions reflect personal characteristics, and the difficulty for users to grasp the hidden meaning behind their digital footprints.
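The homophily-plus-contagion mechanism described above can be simulated in a few lines. This is a minimal sketch under assumed parameters: a synthetic population with a latent trait, a friendship graph biased toward similar trait values, and a like seeded at one high-trait person; none of it comes from the study's real data.

```python
import random

random.seed(42)

# Toy population: each person has a latent trait score (e.g. an IQ-like value).
N = 300
trait = [random.gauss(50, 15) for _ in range(N)]

# Homophilous friendship graph (assumed toy model): pairs with similar
# trait values are far more likely to become friends.
friends = {i: set() for i in range(N)}
for i in range(N):
    for j in range(i + 1, N):
        similarity = max(0.0, 1 - abs(trait[i] - trait[j]) / 100)
        if random.random() < 0.03 * similarity ** 4:
            friends[i].add(j)
            friends[j].add(i)

# Seed a "curly fries"-style like at the highest-trait person, then let it
# spread: each round, every friend of a liker also likes it with prob 0.3.
likers = {max(range(N), key=lambda i: trait[i])}
for _ in range(3):
    for person in list(likers):
        for f in friends[person]:
            if f not in likers and random.random() < 0.3:
                likers.add(f)

pop_mean = sum(trait) / N
liker_mean = sum(trait[i] for i in likers) / len(likers)
print(f"population mean trait: {pop_mean:.1f}")
print(f"likers' mean trait:    {liker_mean:.1f}")
```

Because the like only travels along homophilous edges from a high-trait seed, the likers end up with an above-average trait even though the page content says nothing about the trait, which is exactly the curly-fries effect.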

🔐 The Problem of Data Control: Users' Limited Power Over Their Personal Data

Despite the complex nature of online data prediction, users have little control over how their data is used. The speaker presents the ethical dilemma of how data could be exploited by companies to predict traits such as drug use or team-working ability, and sell this information to HR departments without user consent. While policy and legal reform could help, the speaker is skeptical that political systems will enact necessary changes to give users control over their data. Moreover, many social media platforms rely on user data for revenue, making it unlikely that companies will willingly cede control to users.

🧪 Science as a Solution: Empowering Users Through Better Data Awareness

The speaker advocates for scientific research to empower users by giving them better control over their data. Science has developed the methods for predicting user traits, and similar research could create tools to inform users of the risks associated with sharing certain data online. By understanding how their actions contribute to data collection, users can make more informed decisions about what to share. The speaker also suggests research into encrypting user data, allowing people to control who can access their information. This approach offers a promising path toward balancing data collection with user autonomy, creating a more educated and empowered user base in the digital world.
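One way to picture the "here's the risk of that action" tool the speaker proposes is a single Bayes update: how much would one more like sharpen an observer's guess about a private trait? The probabilities below are invented for illustration; the threshold and wording are hypothetical design choices, not anything from the talk.

```python
# Hypothetical warning tool: before a user clicks "like", estimate how much
# that one signal would move an observer's belief about a private trait.
p_trait = 0.30                 # prior: P(user has the trait)
p_like_given_trait = 0.60      # P(likes the page | has trait)
p_like_given_no_trait = 0.20   # P(likes the page | lacks trait)

def posterior_after_like():
    """Bayes update: P(trait | liked the page)."""
    num = p_like_given_trait * p_trait
    den = num + p_like_given_no_trait * (1 - p_trait)
    return num / den

def warn(threshold=0.15):
    """Describe the belief shift; nudge the user if it is large."""
    shift = posterior_after_like() - p_trait
    msg = (f"Liking this page moves an observer's estimate of the trait "
           f"from {p_trait:.0%} to {posterior_after_like():.0%}.")
    if shift > threshold:
        msg += " Consider keeping this one private."
    return msg

print(warn())
```

A real tool would estimate these conditional probabilities from the same models used for prediction, which is why the speaker notes the research needed is "very similar" to the research that built the models in the first place.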

Keywords

💡Static Web

The static web refers to the early era of the internet, where web pages were primarily static and unchanging. These pages were created by organizations or tech-savvy individuals and lacked user interaction. In the script, the speaker highlights this period as a contrast to today's dynamic, user-generated content-driven web.

💡Social Media

Social media refers to platforms that allow users to create, share, and interact with content online. The rise of social media in the early 2000s changed the web from a static place to a dynamic, interactive space. The video emphasizes how platforms like Facebook enabled users to share personal data and engage with others, leading to both positive and concerning outcomes.

💡User-Generated Content

User-generated content is material, such as videos, blog posts, reviews, and social media updates, created by everyday users rather than organizations. This shift in content creation, brought on by social media, allowed for a more interactive web experience. The script mentions that most of the web’s content today is generated by users, marking a significant change from the early web.

💡Data Privacy

Data privacy refers to the control users have over their personal data and how it's collected, stored, and shared online. The speaker warns that most users don’t understand how much of their personal information is shared on platforms like Facebook, or how that data can be exploited for purposes beyond their control.

💡Behavioral Data

Behavioral data includes the patterns of users' actions online, such as what they purchase, like, or share on social media. This data can be analyzed to infer personal attributes like preferences, political views, or even if someone is pregnant, as illustrated by the Target anecdote in the script.

💡Homophily

Homophily is a sociological theory that explains how people tend to form relationships with others who are similar to them in terms of traits like intelligence, age, or interests. The speaker uses this theory to explain how liking a seemingly random Facebook page like 'curly fries' can be indicative of intelligence based on the user’s network of friends who also liked it.

💡Predictive Models

Predictive models are algorithms used to forecast hidden attributes about individuals based on their online behavior. The speaker discusses how these models can accurately predict things like political preference or even personal characteristics such as trustworthiness or intelligence by analyzing users' actions, such as their likes on Facebook.

💡Big Data

Big data refers to the vast amount of information generated from users’ interactions on social media and other digital platforms. Companies like Target use big data to analyze purchasing behaviors, which allows them to predict customer needs, such as identifying a pregnancy based on subtle changes in buying patterns.

💡Online Persona

An online persona is the image or identity that users create for themselves on social media platforms. The speaker points out that social networks allow people to create these personas with little technical knowledge, but in doing so, users often share personal information that can be used for commercial purposes.

💡Control over Data

Control over data refers to the ability of users to manage how their personal information is collected and used by platforms. The speaker argues that while current models benefit corporations, the future should focus on giving control back to users, allowing them to decide who can access their data and for what purposes.

Highlights

The web in its early days was mostly static, with content put up by tech-savvy individuals or organizations.

Social media revolutionized the web by allowing average users to create most of the content, making it much more interactive.

Facebook has 1.2 billion monthly users, representing half of the Earth's internet population.

Massive amounts of personal data are now available online due to social media, leading to unprecedented levels of behavioral and demographic insights.

Computer scientists can now build models that predict hidden user attributes, such as political preference, personality, and even intelligence.

Companies like Target use predictive algorithms, such as pregnancy scores, based on seemingly unrelated purchases, to make accurate inferences about customers.

Small, seemingly insignificant behaviors on social media, like Facebook likes, can reveal hidden traits due to patterns in behavior across large groups.

A study found that liking a Facebook page for curly fries was one of the strongest indicators of high intelligence, due to the sociological concept of homophily.

The spread of likes or viral content follows similar patterns to how diseases spread through social networks.

Users currently lack control over how their data is used, and it can be exploited in ways they may not be aware of.

There is a potential threat of data being used for purposes like employment screenings, where attributes like drug use or team compatibility could be predicted.

Legal and policy changes are unlikely to happen quickly due to the complexity and the revenue models of social media companies relying on user data.

There is a need for scientific research to develop tools that give users better control over the data they share online.

One potential solution is encryption, where data is made invisible to third parties but accessible to chosen users.

The speaker advocates for informed consent in online data usage, ensuring users have control over how their data is utilized in the future.

Transcripts

00:12

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time.

00:27

And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

00:54

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online.

01:17

So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

02:02

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?"

02:29

It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights.

03:06

So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things. So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

03:44

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted?

04:28

And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen.

05:09

So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.

05:48

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

06:13

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

06:50

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data. We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

07:45

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether.

08:24

We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.

08:49

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

09:24

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

09:45

Thank you. (Applause)
