The Internet: How Search Works

Code.org
13 Jun 2017 · 05:12

Summary

TLDR: This script delves into the inner workings of search engines, emphasizing the responsibility they hold in providing accurate answers to a diverse range of queries. It explains how search engines use spiders to index the web and algorithms, like Google's PageRank, to rank results. The script also touches on the challenges of spam and the evolution of search engines to understand context and meaning through machine learning, ensuring that relevant information remains easily accessible.

Takeaways

  • 🔍 Search engines are continuously scanning the web in advance to provide faster search results, rather than searching in real-time.
  • 🕷️ A 'Spider' program is used to crawl web pages and collect information, which is then stored in a search index.
  • 📚 When a search is performed, the engine looks for the query terms in the search index to generate a list of relevant web pages.
  • 🤔 Search engines use algorithms to rank pages, guessing what the user is looking for based on various factors like the presence of search terms in the page title.
  • 🔑 Google's PageRank algorithm determines the relevance of pages by considering the number of other web pages linking to a given page.
  • 🛡️ Search engines regularly update their algorithms to combat spam and ensure that untrustworthy sites do not rank highly.
  • 👀 Users should remain vigilant and check the reliability of sources by examining web addresses.
  • 📈 Modern search engines use machine learning to understand the context and meaning of words beyond just their presence on a page.
  • 📍 Search engines can provide personalized results, such as showing nearby dog parks even if the user's location was not specified.
  • 🧠 Machine learning allows search algorithms to understand the underlying meaning of words, distinguishing between different uses, like 'fast pitcher' for an athlete versus 'large pitcher' for a kitchen item.
  • 🌐 Despite the exponential growth of the internet, effective search engine design aims to keep relevant information easily accessible.

Q & A

  • Who is John and what is his role at Google?

    -John is the leader of the search and machine learning teams at Google, responsible for providing the best answers to users' search queries.

  • What is Akshaya's position and her team's focus at Bing?

    -Akshaya works on the Bing search team. She explains that when the team explores artificial intelligence and machine learning, it has to consider how users will actually use these features, because the goal is to make an impact on society.

  • Why doesn't a search engine search the web in real time when a user makes a query?

    -Searching the web in real time would be too slow due to the vast number of websites. Instead, search engines use pre-indexed information to provide faster results.

  • What is the role of a Spider in a search engine's operation?

    -A Spider is a program that crawls through web pages, following hyperlinks and collecting information to be stored in a search index for future searches.

  • How does a search engine determine the most relevant results for a user's query?

    -Search engines use algorithms to rank pages based on various factors, such as the presence of search terms in the page title or the proximity of words, to determine the most relevant results.

  • What is the PageRank algorithm, and who is it named after?

    -PageRank is Google's algorithm for ranking search results based on the number of other web pages that link to a given page. It is named after its inventor, Larry Page, a founder of Google.

  • Why do search engines need to regularly update their algorithms?

    -Search engines update their algorithms to prevent spammers from manipulating search results and to ensure that fake or untrustworthy sites do not appear at the top of search results.

  • How can users identify untrustworthy pages in search results?

    -Users can identify untrustworthy pages by examining the web address and ensuring it comes from a reliable source.

  • How do modern search engines use information not explicitly provided by the user to improve search results?

    -Modern search engines use location data and other contextual information to provide more personalized and relevant results, such as showing nearby dog parks even if the user did not specify their location.

  • What role does machine learning play in improving search engine results?

    -Machine learning helps search engines understand the underlying meaning of words on a page, allowing them to provide more accurate and contextually relevant results.

  • How does the exponential growth of the internet affect the design of search engines?

    -The design of search engines must evolve to handle the exponential growth of the internet, ensuring that the information users want can still be quickly and easily accessed.
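The dog-park answer above says results can use the user's location without it being typed in. As a minimal sketch, assuming the engine already has a candidate list of matching parks and the user's approximate latitude and longitude, it could keep only nearby results and sort them by great-circle (haversine) distance. The park names, coordinates, the `nearby` helper, and the 25 km cutoff are all invented for illustration; real engines combine location with many other signals.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical "dog parks" results with invented names and coordinates.
DOG_PARKS = [
    ("Riverside Dog Park", 47.61, -122.33),
    ("Hilltop Off-Leash Area", 45.52, -122.68),
    ("Harbor Dog Run", 47.60, -122.34),
]


def nearby(results, user_lat, user_lon, max_km=25.0):
    """Keep results within max_km of the user, nearest first (hypothetical helper)."""
    scored = sorted(
        (haversine_km(user_lat, user_lon, lat, lon), name) for name, lat, lon in results
    )
    return [name for dist, name in scored if dist <= max_km]


# Assumption: the user's location comes from the device or connection, not from the query text.
print(nearby(DOG_PARKS, 47.61, -122.33))  # ['Riverside Dog Park', 'Harbor Dog Run']
```

The query itself never mentions a place; the location only acts as an extra filter and sort key on results that already matched "dog parks".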

Outlines

00:00

🔍 Search Engine Fundamentals and Responsibility

The script introduces John from Google and Akshaya from Bing, emphasizing the significant role search engines play in providing answers to a wide range of questions. It underscores the responsibility these platforms have in delivering the best possible answers. The script then poses a question about traveling to Mars to illustrate the search process, explaining that search engines do not perform searches in real time across the entire web due to its vastness. Instead, they use pre-emptive scanning to create a searchable index, which is achieved through the use of a 'Spider' program that traverses web pages and collects data, storing it in a search index for quick retrieval.

Keywords

💡Search Engine

A search engine is a software system that is designed to search for information on the World Wide Web. It is the core technology behind platforms like Google and Bing, which index and rank web pages to provide users with the most relevant results for their queries. In the video, search engines are portrayed as having a huge responsibility to deliver the best answers to users' questions, whether they are trivial or incredibly important.

💡Machine Learning

Machine learning is a subset of artificial intelligence that provides systems the ability to learn and improve from experience without being explicitly programmed. In the context of the video, machine learning is used to enhance search algorithms, allowing them to understand the underlying meaning of words and provide more accurate search results. It is a key component in the evolution of search engines.
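To make the "fast pitcher" versus "large pitcher" example concrete, here is a toy sketch using a swapped-in technique: a multinomial naive Bayes classifier with add-one smoothing, trained on a handful of invented phrases. It is not how Google or Bing actually model meaning; the training data, the labels, and the `train`/`classify` helpers are hypothetical. It only illustrates the core idea that a model learns from examples which neighbouring words signal which sense of a word.

```python
import math
from collections import Counter

# Hypothetical labelled phrases, invented for illustration.
TRAINING = [
    ("fast pitcher throws strikes", "athlete"),
    ("pitcher threw a fast ball in the game", "athlete"),
    ("baseball pitcher with a strong arm", "athlete"),
    ("large glass pitcher for water", "kitchen"),
    ("pitcher holds lemonade in the kitchen", "kitchen"),
    ("large ceramic pitcher with handle", "kitchen"),
]


def train(examples):
    """Count words per label and labels overall (multinomial naive Bayes)."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(query, model):
    """Pick the label with the highest log-probability for the query words."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = math.log(label_counts[label] / total)  # log prior
        for w in query.lower().split():
            score += math.log((counts[w] + 1) / denom)  # add-one (Laplace) smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAINING)
print(classify("fast pitcher", MODEL))   # athlete
print(classify("large pitcher", MODEL))  # kitchen
```

The word "pitcher" appears equally often under both labels, so the decision rests entirely on the neighbouring word ("fast" or "large"), which is the kind of context the video describes.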

💡Spider

In the script, a 'spider' refers to a program or algorithm that systematically browses the internet to collect data about web pages. This process, known as 'crawling,' is essential for search engines to build and maintain their search index. The spider follows hyperlinks from page to page, gathering information that will later be used to deliver search results.
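A spider's link-following can be sketched as a breadth-first traversal of a link graph. To keep the example self-contained, this sketch assumes a hypothetical in-memory `TOY_WEB` dictionary in place of real HTTP fetching and HTML parsing; the page names are made up. It also shows why a page that nothing links to is never found.

```python
from collections import deque

# Hypothetical toy web standing in for real pages: each URL maps to the URLs it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "d.example"],
    "d.example": [],
    "orphan.example": ["a.example"],  # nothing links here, so a crawl from a.example never reaches it
}


def crawl(start, links):
    """Breadth-first crawl: follow each hyperlink until no unvisited pages remain."""
    visited = []          # pages in the order the spider reached them
    seen = {start}        # pages already queued, so each is visited only once
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)  # a real spider would fetch the page and record it in the index here
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("a.example", TOY_WEB))
# ['a.example', 'b.example', 'c.example', 'd.example']
```

The `seen` set matters because the web is full of cycles (here `c.example` links back to `a.example`); without it the spider would revisit pages forever.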

💡Search Index

A search index is a database that stores information about a large number of web pages that have been visited by a search engine's spider. It contains data extracted from web pages, which is used to facilitate quick and relevant search results. In the video, the search index is mentioned as the resource that search engines refer to when processing user queries.
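A common way to build a search index is an inverted index: a map from each word to the pages containing it, so answering a query becomes a few dictionary lookups instead of a scan of every page. The sketch below assumes a tiny hypothetical `PAGES` corpus and treats a query as an AND query (a page must contain every word); the video does not say exactly how the index is structured or how the word lists are combined.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what the spider recorded (URL -> page text).
PAGES = {
    "space.example/mars": "How long does it take to travel to Mars?",
    "space.example/moon": "Travel to the Moon takes about three days.",
    "kitchen.example/pitcher": "A large pitcher for lemonade.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the URLs that contain every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(PAGES)
print(lookup(INDEX, "travel to Mars"))  # {'space.example/mars'}
print(sorted(lookup(INDEX, "travel")))  # ['space.example/mars', 'space.example/moon']
```

The index is built once, ahead of time; each query then only touches the entries for its own words, which is why pre-indexing makes search fast.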

💡Ranking Algorithm

A ranking algorithm is a set of rules used by search engines to determine the order in which search results are presented to users. These algorithms take into account various factors, such as the presence of search terms in the page title or the proximity of words, to estimate the relevance of a page to a user's query. The video explains that each search engine has its own algorithm to rank pages and provide the best matches first.
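The two example signals mentioned in the video, query words appearing in the page title and all query words appearing next to each other, can be combined into a simple score. This is a hypothetical scoring function: the `Page` type and the weights (2 per title hit, 1 per body hit, 5 for the exact phrase) are arbitrary assumptions, not any search engine's real values.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:  # hypothetical page record
    url: str
    title: str
    body: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, query):
    """Toy relevance score; the weights are arbitrary assumptions."""
    q = tokenize(query)
    title = set(tokenize(page.title))
    body = tokenize(page.body)
    s = 2.0 * sum(1 for w in q if w in title)  # query words found in the title
    s += 1.0 * sum(1 for w in q if w in body)  # query words found anywhere in the body
    n = len(q)
    # Bonus when all query words appear next to each other, in order, in the body.
    if n and any(body[i:i + n] == q for i in range(len(body) - n + 1)):
        s += 5.0
    return s


def rank(pages, query):
    """Order pages from highest to lowest score."""
    return sorted(pages, key=lambda p: score(p, query), reverse=True)


pages = [
    Page("news.example/red-planet", "Space news", "Mars is red. Travel plans vary."),
    Page("guide.example/mars", "Mars travel guide", "How long to travel to Mars"),
]
for p in rank(pages, "travel to Mars"):
    print(p.url, score(p, "travel to Mars"))
# guide.example/mars 12.0
# news.example/red-planet 2.0
```

Both pages mention "travel" and "Mars", so a plain index lookup cannot separate them; the extra signals push the page whose title and exact phrasing match to the top.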

💡PageRank

PageRank is a specific algorithm invented by Larry Page, one of the founders of Google, which ranks web pages based on the number of other pages that link to them. The underlying assumption is that if a page is linked to by many other reputable pages, it is likely to be of high quality and relevant to users. The video uses PageRank as an example of how search engines determine the importance of a web page.
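The video describes PageRank in terms of how many pages link to a given page. The published formulation refines this by also weighting each link by the rank of the page it comes from. Below is a minimal power-iteration sketch of that idea in the normalized form where all ranks sum to 1, assuming the commonly cited damping factor of 0.85 and a fixed number of iterations rather than a convergence test; the four-page link graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump" term).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to each page it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no outgoing links: spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all link to c; c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
for page, value in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(value, 3))
```

Page c, with three inbound links, ends up first; page a has only one inbound link but still ranks second because that link comes from the highly ranked c.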

💡Spammers

Spammers are individuals or entities that attempt to manipulate search engine algorithms to increase the visibility of their web pages, often promoting scams or low-quality content. The video notes the motive: a website often makes money when people visit it. Spammers are presented as a challenge for search engines, which must regularly update their algorithms to prevent such pages from ranking highly.

💡Hyperlinks

Hyperlinks are a fundamental component of the internet, allowing users to navigate from one web page to another by clicking on highlighted text or images. In the context of search engines, hyperlinks are used by spiders to traverse the web and collect data for the search index. The script mentions that search engines follow hyperlinks to visit every page they can find on the internet.

💡Artificial Intelligence

Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. In the video, AI is discussed in the context of machine learning, which enables search engines to understand not just the words on a page but also their meanings, thus improving the accuracy of search results.

💡Relevance

Relevance, in the context of search engines, refers to the degree to which the search results match the user's query. The video emphasizes the importance of search engines providing relevant results, as it is crucial for users to find the information they seek efficiently. The ranking algorithms and machine learning techniques are employed to enhance the relevance of search results.

💡Information Retrieval

Information retrieval is the process of searching for and obtaining information from a database or a collection of documents; for a search engine, that collection is the web. The video script discusses how search engines use advanced techniques like machine learning to improve information retrieval, ensuring that users can quickly access the information they are looking for.

Highlights

John leads the search and machine learning teams at Google, emphasizing the importance of providing the best answers to search queries.

Akshaya from the Bing search team explains that work on AI and machine learning must account for how users will actually use it, since the goal is to make an impact on society.

A simple question about the travel time to Mars is used to illustrate how search engines process and provide results.

Search engines do not search the web in real time but use pre-indexed information to speed up the search process.

Over a billion websites exist on the internet, with hundreds more being created every minute, necessitating pre-indexed search results.

Search engines use a program called a Spider to crawl web pages and collect information for the search index.

The Spider follows hyperlinks to visit every page it can find on the internet and records information for search purposes.

When a user searches for a term, the search engine looks for that term in the search index to generate a list of relevant pages.

Search engines need to determine the best matches to show first, often guessing what the user is looking for.

Each search engine uses its own algorithm to rank pages based on relevance to the user's search query.

Google's PageRank algorithm considers the number of other web pages linking to a given page as a measure of relevance.

Spammers constantly try to manipulate search algorithms to rank higher, prompting regular updates to prevent this.

Users are advised to check the web address and source reliability to avoid untrustworthy pages.

Search engines are continually evolving to improve algorithms for faster and better result delivery.

Modern search engines use implicit information, like location, to provide more relevant search results.

Search engines now understand the meaning of words beyond their literal sense to match user intent more accurately.

Machine learning enables search algorithms to understand the underlying meaning of words for better search results.

Although the internet is growing exponentially, search engine teams aim to keep the information users want just a few keystrokes away.

Transcripts

00:06

Hi, my name's John.

00:07

I lead the search and machine learning teams at Google.

00:12

I think it's amazingly inspiring

00:14

that people all over the world

00:16

turn to search engines to ask trivial questions

00:19

and incredibly important questions.

00:20

So it's a huge responsibility to give them

00:23

the best answers that we can.

00:26

Hi, my name's Akshaya and I work on the Bing search team.

00:30

There are many times where we will start looking

00:33

into artificial intelligence and machine learning,

00:35

but we have to address how the users are going to use this,

00:39

because at the end of the day, we want to make an impact on society.

00:43

Let's ask a simple question.

00:45

How long does it take to travel to Mars?

00:49

Where did these results come from

00:51

and why was this listed before the other one?

00:55

Okay, let's dive in and see how the search engine

00:58

turned your request into a result.

01:00

The first thing you need to know is when you do a search,

01:03

the search engine isn't actually going out to the World Wide Web

01:06

to run your search in real time.

01:08

And that's because there's over a billion websites

01:10

on the internet and hundreds more are being created every single minute.

01:14

So if the search engine had to look through

01:16

every single site to find the one you wanted,

01:18

it would just take forever.

01:20

So to make your search faster,

01:21

search engines are constantly scanning the web in advance

01:25

to record the information that might help with your search later.

01:28

That way, when you search about travel to Mars,

01:31

the search engine already has what it needs

01:33

to give you an answer in real time.

01:36

Here's how it works.

01:37

The internet is a web of pages connected to each other by hyperlinks.

01:42

Search engines are constantly running a program

01:44

called a Spider that crawls through these web pages

01:47

to collect information about them.

01:49

Each time it finds a hyperlink,

01:52

it follows it until it has visited every page

01:55

it can find on the entire internet.

01:57

For each page the spider visits,

01:59

it records any information it might need for a search

02:02

by adding it to a special database called a search index.

02:07

Now, let's go back to that search from earlier

02:09

and see if we can figure out how the search engine

02:11

came up with the results.

02:13

When you ask how long does it take to travel to Mars,

02:16

the search engine looks up each of those words

02:18

in the search index to immediately get a list

02:21

of all the pages on the internet containing those words.

02:24

But just looking for these search terms

02:26

could return millions of pages,

02:28

so the search engine needs to be able to determine

02:31

the best matches to show you first.

02:33

This is where it gets tricky because the search engine

02:36

may need to guess what you're looking for.

02:38

Each search engine uses its own algorithm

02:41

to rank the pages based on what it thinks you want.

02:44

The search engine's ranking algorithm might check

02:47

if your search term shows up in the page title,

02:50

it might check if all of the words show up next to each other,

02:54

or any number of other calculations

02:57

that help it better determine

02:58

which pages you'll want to see and which you won't.

03:02

Google invented the most famous algorithm

03:04

for choosing the most relevant results for a search by taking into account

03:08

how many other web pages linked to a given page.

03:11

The idea is that if lots of websites think

03:14

that a web page is interesting,

03:15

then it's probably the one you're looking for.

03:18

This algorithm is called PageRank,

03:20

not because it ranks web pages,

03:22

but because it was named after its inventor, Larry Page,

03:25

who's one of the founders of Google.

03:27

Because a website often makes money when you visit it,

03:30

spammers are constantly trying to find ways

03:32

to game the search algorithm so that their pages

03:35

are listed higher in the results.

03:38

Search engines regularly update their algorithms

03:40

to prevent fake or untrustworthy sites from reaching the top.

03:44

Ultimately, it's up to you to keep an eye out

03:47

for these pages that are untrustworthy

03:49

by looking at the web address and making sure it's a reliable source.

03:53

Search programs are always evolving

03:55

to improve the algorithms so they return better results,

03:58

faster results than their competitors.

04:01

Today's search engines even use information

04:03

that you haven't explicitly provided to help you narrow down your search.

04:07

So, for example, if you did a search for dog parks,

04:10

many search engines would give you results

04:12

for all the dog parks nearby,

04:14

even though you didn't type in your location.

04:17

Modern search engines also understand more

04:20

than just the words on a page,

04:22

but what they actually mean in order to find the best one

04:24

that matches what you're looking for.

04:27

For example, if you search for fast pitcher,

04:30

it will know you're looking for an athlete.

04:32

But if you search for large pitcher,

04:34

it will find you options for your kitchen.

04:38

To understand the words better, we use something called machine learning,

04:41

a type of artificial intelligence.

04:43

It enables search algorithms to search out

04:46

not just individual letters or words in the page,

04:48

but understand the underlying meaning of the words.

04:53

The internet is growing exponentially,

04:56

but if the teams that design search engines do our jobs right,

05:00

the information you want should always be just a few keystrokes away.
