PageRank Algorithm - Example

Global Software Support
22 May 201710:11

Summary

TLDRThis lecture explains the concept of calculating PageRank for a network of web pages using a directed graph. The lecturer walks through an example with four web pages (A, B, C, D), showing how the PageRank algorithm updates iteratively based on incoming links and the number of outgoing links. It emphasizes how PageRank reflects a page's importance in a network and why creating multiple low-quality pages won’t boost rankings. The video also touches on how Google prevents manipulation of its search results using this algorithm.

Takeaways

  • 🖥️ The PageRank algorithm calculates the importance of web pages in a network using a directed graph where links between pages represent connections.
  • 📊 Each web page initially starts with an equal PageRank value of 1/n, where n is the total number of web pages in the network.
  • 🔄 PageRank is updated iteratively by considering the ranks of pages linking to it, divided by the number of their outgoing links.
  • 🔗 In the first iteration, PageRank for each node is updated based on the PageRanks of incoming linked pages from the previous iteration.
  • 📈 The sum of all PageRank values in any iteration will always equal 1, ensuring a balanced distribution.
  • 🌐 Higher PageRank indicates a more important page within the network, determined by the number and quality of incoming links.
  • 🤔 Counterintuitively, less important pages can have higher PageRank if they are linked by highly important pages.
  • 🔍 Google's PageRank ensures that simply creating many low-quality web pages won't significantly boost a page's rank.
  • 🚫 Low-ranked websites pointing to another website won’t inflate its PageRank, countering attempts to 'hack' the ranking system.
  • 📖 To have a higher PageRank, a website needs to be linked by other high-ranked, important websites.

Q & A

  • What is the primary goal of the lecture?

    -The primary goal of the lecture is to explain how to calculate PageRanks for a network of websites using a directed graph.

  • How is the network of websites modeled in this example?

    -The network of websites is modeled using a directed graph, where the nodes represent websites, and the edges represent links between the websites.

  • What does the term 'PageRank' refer to in the context of the lecture?

    -PageRank refers to a numerical value that represents the importance of a website in the network based on the number of links pointing to it and the importance of the linking websites.

  • How are the initial PageRanks for the websites calculated?

    -The initial PageRanks are set to 1 divided by the total number of websites in the network. In this example, with four websites, each PageRank is initialized to 1/4 or 0.25.

  • What is the process for calculating the PageRank for a website in subsequent iterations?

    -In subsequent iterations, the PageRank of a website is calculated by summing the PageRanks of all websites pointing to it, divided by their number of outgoing links.

  • What happens in the first iteration when calculating the PageRank for website A?

    -In the first iteration, the PageRank of website A is influenced by website C, which points to A. The PageRank of C from the previous iteration is divided by the number of C's outgoing links (3), resulting in a PageRank of 1/12 for A.

  • How does the PageRank of website B evolve in the first iteration?

    -In the first iteration, the PageRank of website B is calculated by considering links from websites A and C. The PageRank of A is divided by its two outgoing links, and the PageRank of C is divided by its three outgoing links, resulting in a combined PageRank of 2.5/12 for B.

  • What is a key observation about the sum of PageRanks in any iteration?

    -The sum of the PageRanks of all websites in the network always equals 1, no matter the iteration.

  • Why is website D considered to have a relatively high PageRank despite not being a prominent site?

    -Website D has a relatively high PageRank because a very important website, C, is pointing to it. This demonstrates that the importance of the linking website can influence the PageRank.

  • Why can't someone artificially inflate the PageRank of a website by creating many low-quality websites that link to it?

    -Google's PageRank algorithm ensures that the PageRank of a website is not simply based on the number of incoming links but on the quality and importance of the websites linking to it. Low-quality websites with low PageRanks will not significantly boost the target website's PageRank.

Outlines

00:00

🔗 Introduction to PageRank and Initial Setup

In this lecture, we explore how to calculate PageRank for a network of websites modeled as a directed graph. The connections between websites are represented by links. Initially, each website (A, B, C, D) is assigned a PageRank of 1 divided by the total number of websites. Iterative calculations will be used to update these ranks, factoring in links from other sites. The formula considers both the PageRank from the previous iteration and the number of outgoing links from each connected page.

05:01

🖩 First Iteration of PageRank Calculation

The first iteration calculates updated PageRank values for each website. For instance, website A's new PageRank is determined by the PageRank of website C (which points to A) from the previous iteration, divided by C's outgoing links. Similarly, for website B, the PageRanks of websites A and C are considered, factoring in their outgoing links. The summation of PageRanks remains equal to 1 after every iteration, ensuring the calculation is balanced across the network.

10:02

🔄 Second Iteration and Insights

In the second iteration, PageRanks are recalculated based on updated values from the first iteration. Website C, which has the highest PageRank, influences the rankings of other websites it points to, such as D. This iteration reveals a critical insight: even if a website (like D) has fewer direct links, its rank can still be high if an important website (like C) points to it. This shows the impact of influential sites in PageRank calculations.

🔍 Google's PageRank Algorithm and Importance of High-Quality Links

This section explains the underlying importance of Google's PageRank algorithm and why it prevents manipulation. Creating many low-value websites to artificially boost a site's rank won’t work because the algorithm prioritizes links from high-PageRank websites. For example, having a highly-ranked website link to a lesser-known page will raise its rank, but creating numerous low-quality pages linking to a site won’t significantly improve its position in search results.

✅ Conclusion and Final Thoughts

The video concludes by summarizing the importance of understanding how PageRank works, emphasizing that higher PageRanks come from being linked by important websites. The takeaway is that it's not possible to manipulate PageRank by creating numerous low-value sites. Instead, earning links from credible, high-ranked websites is the key to improving a website's visibility and ranking in search engines.

Mindmap

Keywords

💡PageRank

PageRank is an algorithm used by search engines like Google to rank web pages based on their importance. It calculates the significance of a page by considering the number of links pointing to it and the quality of those links. In the script, PageRank is applied to a network of websites (A, B, C, D) to determine their relevance, showing how websites with higher-quality links gain more importance in search results.

💡Directed Graph

A directed graph is a set of nodes connected by directed edges, representing one-way relationships between them. In the video, the network of websites (A, B, C, D) is modeled as a directed graph, where a link from one website to another indicates a one-way hyperlink. For instance, website A points to B, meaning that A has a link leading to B.

💡Iteration

Iteration refers to the repeated process of recalculating the PageRank values of websites across multiple rounds. In the video, several iterations are required to stabilize the PageRank values, as the initial ranks are uniformly distributed, and with each iteration, the algorithm refines the rankings based on incoming and outgoing links.

💡Outgoing Links

Outgoing links are the hyperlinks that point from one webpage to another. In the script, outgoing links are crucial in calculating PageRank because the algorithm divides the PageRank of a site by the number of its outgoing links. For example, site C has three outgoing links, so its PageRank contribution to other sites is split among them.

💡Incoming Links

Incoming links, also known as backlinks, are hyperlinks directed to a webpage from other websites. The number and quality of these links affect the PageRank of a site. For instance, the video highlights how site C's PageRank is higher because it has important incoming links, which boosts its importance in the network.

💡Initialization

Initialization refers to the process of assigning an initial value to each website's PageRank at the start of the algorithm. As per the video, all websites in the network are initialized with an equal PageRank of 1/n, where n is the total number of websites. This serves as the starting point for the iterative process of refining the rankings.

💡Normalization

Normalization ensures that the sum of all PageRank values in the network equals 1 at each iteration. This concept is highlighted in the video as the lecturer explains that after each iteration, the total PageRank of all websites combined remains equal to 1, ensuring consistency in the algorithm's calculations.

💡Link Spam

Link spam refers to the practice of creating multiple low-quality or irrelevant websites with the sole purpose of artificially boosting a site's PageRank. The video explains that Google's PageRank algorithm is designed to prevent manipulation through link spam by considering not just the quantity but also the quality of incoming links.

💡High-Quality Links

High-quality links are backlinks from authoritative or important websites, which greatly influence the PageRank of the receiving site. The script emphasizes that having incoming links from significant websites, like site C in the network, increases the PageRank of the linked website, demonstrating that quality matters more than quantity.

💡Google Search Ranking

Google search ranking is the process of ordering search results based on relevance, largely influenced by the PageRank algorithm. The video connects PageRank to Google's ranking system, explaining that higher PageRank means better visibility in search results, which is crucial for websites aiming to attract visitors. The speaker illustrates this with an example of how his website ranks lower for certain search queries.

Highlights

Introduction to calculating PageRank for a network of websites using a directed graph.

Each website is represented as a node, with connections representing links between pages.

The initial PageRank for each page is set to 1/n, where n is the total number of websites.

PageRank calculation for the first iteration involves considering pages that point to the given page.

PageRank of a node in the next iteration is based on the PageRank of nodes pointing to it, divided by the number of outgoing links.

In the first iteration, PageRank of page A is calculated as 1/12, considering C pointing to A.

PageRank of page B is calculated as 2.5/12 by considering links from A and C.

PageRank of page C considers links from A and D, with its new value being updated accordingly.

PageRank of page D is calculated considering links from B and C, with a value of 4/12.

Sum of PageRanks across all nodes remains 1, ensuring consistency across iterations.

In the second iteration, calculations continue similarly, updating PageRank based on incoming links.

A higher PageRank indicates a more important website in the network, but it's not always intuitive.

Important websites pointing to a page can increase that page's PageRank, even if the page itself isn't significant.

Google's PageRank system prevents manipulation by ensuring low-quality websites pointing to a site don't falsely boost its ranking.

A website becomes important when high PageRank sites link to it, ensuring higher relevance in search results.

Transcripts

play00:01

hi in this given lecture we are going to

play00:04

talk about a concrete example how to

play00:07

calculate page ranks for a given network

play00:10

of websites or web pages in this case we

play00:14

have four web pages a b c and d as you

play00:19

can see we are able to model the network

play00:22

with a directed graph basically the

play00:25

connections represents the links so a is

play00:28

pointing to B what does it mean that

play00:31

website a contains a link pointing to B

play00:36

Okay C is pointing to b b is pointing to

play00:40

D and so on so what do we have to do

play00:43

basically we just have to use this

play00:45

equation so the page rank of a given

play00:48

site in the next iteration is the page

play00:51

rank of the given site in the previous

play00:54

iteration and basically we just have to

play00:57

consider pages that are pointing to that

play01:00

given page and we have to divide it by

play01:03

the number of outgoing links okay so

play01:07

what about the zero iteration basically

play01:10

this is how we initialize our system so

play01:13

we just have to Define all the nodes or

play01:15

web pages in our Network and basically

play01:18

we are going to initialize the page

play01:21

ranks for every single website to be

play01:24

equals to 1 / n where n is the number of

play01:29

total pag or total websites 1 2 3 4 so

play01:34

that's why at the beginning all of the

play01:36

web pages will have page rank 1 / by 4

play01:40

okay of course we have to make several

play01:43

iterations because these values are not

play01:46

the valid page ranks we just have to

play01:48

initialize these values to be 1 / by n

play01:52

because we don't know the page ranks by

play01:54

default at the beginning okay what about

play01:57

the first iteration okay so let's

play02:00

consider node a what are the nodes

play02:03

pointing to node a basically we just

play02:05

have a single node the node C so website

play02:09

C is pointing to website a so we just

play02:12

have to calculate the page rank of this

play02:15

website that's pointing to node a in the

play02:18

previous iteration what is the page rank

play02:21

of node C in the previous iteration 1 /

play02:25

4 and we just have to divide it by the

play02:28

number of outgoing links as far as node

play02:31

C is concerned 1 2 three as you can see

play02:36

so that's why one divid 4 / by 3 so the

play02:41

page rank of a in the next iteration is

play02:44

going to be 1 / 12 what about B we have

play02:48

to consider web pages pointing to this

play02:52

given website so website b 1 / by 4 is

play02:56

the page rank of a in the previous

play02:59

iteration how many outgoing links are

play03:02

there as far as website a is concerned

play03:05

one two as you can see that's why we

play03:08

have to divide it by two plus we have to

play03:11

consider node c c has a page Rank one/

play03:15

by four in the previous iteration and we

play03:18

have to divide it by the number of

play03:20

outgoing links how many outgoing links

play03:23

are there for note c one as you can see

play03:26

two and three so 1 / by 4 IDE by 3 it

play03:32

has something to do with node a it has

play03:35

something to do with node C okay so 2.5

play03:38

/ 12 is going to be the page rank of

play03:42

website B in the first iteration what

play03:45

about C we just have to consider all the

play03:48

websites pointing in the direction of C

play03:51

okay 1 / 4 / by two that's all about the

play03:55

node a and 1/ 4 / 1 that's all about

play04:00

node D because node D just have a single

play04:04

outgoing link as you can see okay what

play04:07

about node D basically we have to

play04:10

consider all the websites that are

play04:12

pointing to node D so node C and node B

play04:16

what's the page rank of node C in the

play04:19

previous iteration 1/ by 4 divide by 3

play04:23

because C has three outgoing links plus

play04:27

1 / 4 is the page rank of B in the

play04:30

previous iteration divide by one because

play04:33

B has just a single outgoing link so

play04:36

that's why the overall page rank for

play04:38

website D in the first iteration is

play04:41

going to be 4 / by 12 as you can see if

play04:44

we sum up all the page ranks for all the

play04:48

websites in the network we are going to

play04:50

end up with one in all of the iterations

play04:54

1/ 4 + 1/ 4 + 1/ 4 + 1/ 4 is equals to 1

play05:01

1 / 12 2.5 / 12 4.5 / 12 4 / 12 is

play05:08

equals to 1 again so no matter what

play05:11

iteration we are considering the sum of

play05:14

the page ranks is going to be one we

play05:17

just have to consider every single

play05:19

website in the network and if we sum up

play05:22

the page ranks we end up with one what

play05:25

about the second iteration okay a is 4.5

play05:29

/ by 12 / 3 why because C is pointing to

play05:34

node a what's the page rank of C in the

play05:38

previous iteration 4.5 / 12 IDE by 3

play05:42

because website C has three outgoing

play05:45

links what about B we have node a and

play05:49

node C that are pointing to node B so we

play05:53

just have to consider 1 / 12 / 2 because

play05:57

this is the page rank of a in the

play05:59

previous iteration divided by the number

play06:01

of outgoing links plus 4.5 / 12 is the

play06:05

page rank of C in the previous iteration

play06:08

divide by three because note C has three

play06:11

outgoing links and so on this is how we

play06:14

are able to calculate the page ranks of

play06:17

every single website or node in our

play06:20

Network so for example let's suppose the

play06:22

situation that we just make two

play06:25

iterations what do we know first of all

play06:27

that if we sum up the page ranks of all

play06:30

the websites in the network we are going

play06:32

to end up with one as you can see again

play06:35

as far as iteration two is concerned the

play06:38

sum of the page ranks is equals to one

play06:41

but what does it mean that basically we

play06:44

are going to have the page ranks the

play06:47

higher the better 4.5 / by 12 is the

play06:51

greatest number Within These values so

play06:54

we can say that note C is the most

play06:57

important website within this Network

play07:00

then node D and as far as I am concerned

play07:04

it is quite counterintuitive because as

play07:06

you can see D is not a very important

play07:09

website then why is it going to have

play07:12

page rank value three because a very

play07:15

important website is pointing to that

play07:18

given website C is a very important

play07:20

website in this given Network and it's

play07:23

pointing to note D so basically it's

play07:25

very important to understand the basic

play07:28

concept behind Google Google's page rank

play07:30

so why is it important to Define this

play07:33

equation like this because I have

play07:36

created a website three months ago and

play07:39

basically no one is interested in this

play07:41

website okay there are several blog

play07:44

articles for example in this website but

play07:46

whenever someone is looking for integer

play07:49

nepsac problem in Java so let's type it

play07:53

integer napsack problem in Java

play07:56

basically my website is not going to be

play07:59

in the first results I guess it's going

play08:02

to be in the 10th page or something like

play08:04

this so what does it mean that my

play08:06

website is not considered to be an

play08:09

important website by Google and

play08:12

basically this equation is going to make

play08:15

sure that I am not able to hack Google

play08:18

in the sense that basically I just have

play08:20

to make several web pages for example I

play08:24

create 10,000 web pages and those web

play08:27

pages are going to point to this given

play08:30

website Global software support what do

play08:32

it mean that lots of lots of websites

play08:35

lots of lots of nodes are pointing in

play08:38

the direction of this given website

play08:40

Global software support so we may have

play08:43

the intuition that Global software

play08:45

support is very important and basically

play08:48

Global software support is going to be

play08:51

in one of the first search results so

play08:53

basically I just have hacked Google but

play08:56

it is not going to be feasible because

play08:58

Google's page rank algorithm doesn't

play09:01

work like that if we create lots of lots

play09:04

of website with very very low page ranks

play09:07

basically it doesn't matter that given

play09:10

website is pointing to several other

play09:12

websites basically this is what I would

play09:14

like to show you that if we have a

play09:16

website without any outgoing links but a

play09:20

very important website is pointing to

play09:22

that given website it means that D will

play09:26

have a higher page rank so we are not

play09:29

able to hack Google it doesn't matter

play09:32

that we have several low page ranked

play09:34

websites pointing to Global software

play09:37

support.com that website is not going to

play09:40

be more popular it's not going to have

play09:42

better page rank hands it's not going to

play09:45

be in one of the first search results

play09:48

when we search for the topics included

play09:50

in the website we have to make sure that

play09:53

several websites with high page ranks

play09:56

are pointing to a given website this is

play09:59

when a given website is going to be

play10:02

important and this is when a website is

play10:04

going to have a higher page rank so

play10:07

that's all about this little example

play10:09

thanks for watching

Rate This

5.0 / 5 (0 votes)

Связанные теги
PageRankSEO BasicsGoogle AlgorithmWebsite RankingLink AnalysisDirected GraphsIterative MethodsSearch EnginePageRank CalculationWeb Network
Вам нужно краткое изложение на английском?