The World Wide Web: Crash Course Computer Science #30

CrashCourse
4 Oct 2017 · 11:36

Summary

TLDR: In this CrashCourse Computer Science episode, Carrie Anne explains the distinction between the Internet and the World Wide Web, emphasizing that the Web operates on top of the Internet. She discusses the foundational elements of the Web, including web pages, hyperlinks, and the importance of hyperlinked information, which was first conceptualized by Vannevar Bush in 1945. The episode covers how web browsers use URLs, DNS lookups, TCP connections, and HTTP to retrieve and display web pages. It also touches on the evolution of web browsers, the development of HTML, and the significance of search engines and Net Neutrality in shaping the modern Web experience.

Takeaways

  • 🌐 The World Wide Web is a distributed application that runs on top of the Internet; the Internet itself is the underlying infrastructure that carries the data.
  • 🔗 Hyperlinks, or links to other pages, are the fundamental building blocks of the web, allowing users to navigate between pages.
  • 📄 Web pages are documents that contain content and links to other pages; they are the most common type of hypertext document and are retrieved and rendered by web browsers.
  • 🌐 The concept of hyperlinked information was first conceptualized by Vannevar Bush in 1945 with his Memex machine idea.
  • 📑 A Uniform Resource Locator (URL) is used to specify the unique address of each hypertext page on the web.
  • 💻 When a web page is requested, a DNS lookup is performed to translate the domain name into an IP address, and then a TCP connection is opened to the web server.
  • 🌐 HTTP (Hypertext Transfer Protocol) is used to communicate between the web browser and the web server, with the initial version (HTTP 0.9) having only one command, 'GET'.
  • 📝 Hypertext Markup Language (HTML) was developed to 'mark up' text files with hypertext elements, with the first version (HTML 0.a) created in 1990.
  • 🛠️ Web browsers are applications that request pages from web servers and render the content, with the first web browser and server created by Tim Berners-Lee in 1990.
  • 🔍 Search engines were developed to help users find information on the web, with early engines like JumpStation using web crawlers, indexes, and search algorithms.
  • 🏛️ Net Neutrality is a principle that advocates for equal treatment of all internet traffic, preventing ISPs from favoring certain data streams over others.

Q & A

  • What is the primary difference between the Internet and the World Wide Web?

    -The World Wide Web is a distributed application that runs on top of the Internet, which is the underlying infrastructure that conveys data for all applications. The Web is accessed through web browsers and consists of interconnected pages linked by hyperlinks.

  • What is a hyperlink and how does it function within the World Wide Web?

    -A hyperlink is a reference within a web page that points to another web page or resource. It allows users to navigate from one page to another by clicking on text or images that are designated as hyperlinks, creating a web of interconnected information.

  • Who conceptualized the value of hyperlinked information and what was his hypothetical machine called?

    -Vannevar Bush conceptualized the value of hyperlinked information in 1945. He described a hypothetical machine called a Memex, which was intended to create an associative indexing system where items could be automatically linked and retrieved.

  • What is the full form of HTML and why was it developed?

    -HTML stands for Hypertext Markup Language. It was developed to provide a way to 'mark up' text files with hypertext elements, allowing for the creation of web pages with links, headings, lists, and other content that could be interpreted and rendered by web browsers.

  • What is a URL and how does it relate to accessing web pages?

    -A URL, or Uniform Resource Locator, is a unique address for each hypertext page on the web. It specifies the location of a web page and is used by browsers to request pages from web servers using the HTTP protocol.

  • How does the Hypertext Transfer Protocol (HTTP) facilitate communication between a web browser and a web server?

    -HTTP is a protocol used by web browsers to request pages from web servers. When a user requests a page, the browser sends an HTTP command, such as 'GET', to the server, which then responds with the requested hypertext page.

  • What is the significance of the '200 OK' status code in HTTP?

    -The '200 OK' status code in HTTP indicates that the server successfully processed the request and the requested page is being returned. It is a confirmation that the operation was successful and the client can proceed to display the received content.

  • What is the role of a web browser in accessing the World Wide Web?

    -A web browser is an application that enables users to request, retrieve, and render web pages from web servers. It interprets HTML, processes other web technologies like CSS and JavaScript, and displays the content on the user's device.

  • Who created the first web browser and web server, and what were the fundamental web standards developed at the same time?

    -Sir Tim Berners-Lee created the first web browser and web server in 1990 while working at CERN. He also developed the fundamental web standards including URLs, HTML, and HTTP.

  • What is Net Neutrality and why is it a topic of debate?

    -Net Neutrality is the principle that all internet traffic should be treated equally, without any discrimination or preference given to certain types of data. It is a topic of debate because some argue that prioritizing certain types of content could lead to an exploitative business model and stifle innovation, while others believe that market forces and technical requirements might justify different treatment for certain data types.

  • How did the early search engines like JumpStation work, and what was the basis for their search algorithms?

    -Early search engines like JumpStation operated using a web crawler that followed links to gather pages, an index that recorded text terms and their locations, and a search algorithm that consulted the index to return relevant pages based on user queries. The basis for their search algorithms was often the frequency of search terms on a page, which later evolved to include more sophisticated metrics like backlinks from other sites.

Outlines

00:00

🌐 Introduction to the World Wide Web

Carrie Anne introduces the concept of the World Wide Web, distinguishing it from the Internet. She explains that the Web is an application that runs on top of the Internet, similar to how Skype or Instagram operate. The Web is accessed through web browsers and is built on millions of servers worldwide. The fundamental building block of the Web is a single page, which can contain hyperlinks to other pages. Hyperlinks enable users to navigate from one related topic to another, a concept first envisioned by Vannevar Bush in 1945 with his Memex machine. The script also covers the basics of how web pages are retrieved and rendered by browsers, including DNS lookups, TCP connections, and the use of HTTP to request pages from web servers.

05:01

🔗 Hyperlinks and the Evolution of Web Pages

This section delves into the importance of hyperlinks in web pages, which allow for an interconnected web of information. It contrasts the pre-hyperlink era, where users had to manually search for information, with the current system where hyperlinks facilitate easy navigation. The script then explains the technical aspects of web page creation, including the use of HTML tags to structure content and the significance of URLs for locating specific pages. It also touches on the history of web browsers, starting with Tim Berners-Lee's creation of the first browser and server in 1990, and the subsequent development of more sophisticated browsers like Mosaic and the rise of search engines to help users find content on the growing Web.

10:01

🌐 Net Neutrality and the Future of the Web

The final paragraph discusses the concept of Net Neutrality, which is the principle that all data on the Internet should be treated equally. It outlines the debate surrounding this principle, with proponents arguing for equal treatment to prevent ISPs from favoring certain content and opponents suggesting that market forces could regulate any potential abuse. The paragraph also touches on the potential impact of Net Neutrality on innovation and the concerns about ISPs acting as gatekeepers to content. The script concludes by encouraging viewers to learn more about Net Neutrality due to its complex and far-reaching implications.

Keywords

💡World Wide Web

The World Wide Web, often abbreviated as the Web, is a system of interlinked hypertext documents that are accessed via the Internet. It was invented by Tim Berners-Lee in 1989 and is a platform that allows for the sharing of information globally. In the video, the World Wide Web is distinguished from the Internet, which is the broader network that enables the Web to function. The Web is described as a 'huge distributed application running on millions of servers worldwide,' highlighting its vast scale and reach.

💡Hyperlinks

Hyperlinks, also known as links, are a fundamental part of the Web. They are elements within a webpage that allow users to navigate to another page or section by clicking on them. The video explains that with hyperlinks 'you can easily flow from one related topic to another,' which was a revolutionary concept when introduced, as it allowed for a more dynamic and interconnected way of accessing information.

💡Vannevar Bush

Vannevar Bush was an American scientist and engineer who conceptualized the idea of hyperlinked information in his 1945 essay 'As We May Think.' He proposed a device called the Memex, which would allow for 'associative indexing' where items could be linked together. This concept is foundational to the development of the Web, as it predates the actual technology but laid the groundwork for the idea of interconnected information.

💡Hypertext

Hypertext refers to text that contains hyperlinks, which allow for non-linear navigation through a body of information. The video mentions that 'text containing hyperlinks is so powerful, it got an equally awesome name: hypertext!' Web pages are the most common type of hypertext document today, and they are retrieved and rendered by web browsers.

💡Uniform Resource Locator (URL)

A URL is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. The video provides an example of a web page URL, 'thecrashcourse.com/courses,' and explains that each hypertext page needs a unique address to be accessible on the Web.

💡Domain Name System (DNS)

DNS is a system that translates human-friendly domain names, such as 'thecrashcourse.com,' into IP addresses that computers use to identify each other on the network. The video describes the process of DNS lookup, which is the first step when a user requests a site, converting a domain name into the corresponding computer's IP address.

💡Hypertext Transfer Protocol (HTTP)

HTTP is the protocol used for transmitting hypertext on the Internet. The video explains that HTTP is used to 'ask that web server for the “courses” hypertext page' by sending a command such as 'GET /courses.' This protocol is fundamental to how web browsers request and receive web pages from servers.

💡Hypertext Markup Language (HTML)

HTML is the standard markup language used to create web pages. The video describes how HTML allows for the 'marking up' of a text file with hypertext elements, such as headings, hyperlinks, and lists. It also mentions the evolution of HTML from version 0.a with 18 commands to version 5 with over a hundred different tags.

💡Web Browser

A web browser is a software application used to retrieve, present, and browse through information resources on the World Wide Web. The video discusses the role of web browsers in requesting pages from web servers and rendering the content. It also touches on the history of web browsers, including the first browser created by Tim Berners-Lee.

💡Net Neutrality

Net Neutrality is the principle that all data on the Internet should be treated equally, without discrimination or preference given to certain types of content by Internet Service Providers (ISPs). The video discusses the debate around Net Neutrality, emphasizing the potential for ISPs to act as gatekeepers and the implications this could have for innovation and competition on the Web.

Highlights

The World Wide Web is a distributed application running on millions of servers worldwide, accessed via web browsers.

Hyperlinks allow for easy navigation between related information, a concept first proposed by Vannevar Bush in 1945.

Web pages are documents containing content and links to other pages, forming a vast web of interconnected information.

Hypertext, such as web pages, is text containing hyperlinks, retrieved and rendered by web browsers.

Each hypertext page requires a unique address, specified by a Uniform Resource Locator (URL).

When requesting a site, DNS lookup translates a domain name into an IP address, which is then used to open a TCP connection.

HTTP is the protocol used to request and transfer hypertext pages over the internet.

HTML, or Hypertext Markup Language, is used to structure content on web pages with tags for headings, links, and lists.

The first web browser and server were created by Tim Berners-Lee in 1990, establishing foundational web standards.

Mosaic, released in 1993, was the first browser to display graphics alongside text and introduced features like bookmarks.

Search engines like Google use complex algorithms to rank search results based on relevance and backlinks from reputable sites.

Net Neutrality is the principle that all internet traffic should be treated equally, without preferential treatment for certain data.

The debate over Net Neutrality involves concerns about ISPs potentially acting as gatekeepers and stifling innovation.

The rise of the World Wide Web has led to the development of various web browsers and servers, shaping the modern internet landscape.

HTML has evolved significantly, with HTML5 introducing over a hundred tags for a wide range of content and interactive elements.

Additional web technologies like CSS and JavaScript enhance the presentation and functionality of web pages.

Early web search engines used simple metrics for search ranking, which led to manipulation and the need for more sophisticated algorithms.

Transcripts

play00:03

Hi, I’m Carrie Anne, and welcome to CrashCourse Computer Science.

play00:05

Over the past two episodes, we’ve delved into the wires, signals, switches, packets,

play00:10

routers and protocols that make up the internet.

play00:12

Today we’re going to move up yet another level of abstraction and talk about the World

play00:16

Wide Web. This is not the same thing as the Internet, even though people often use the

play00:20

two terms interchangeably in everyday language.

play00:21

The World Wide Web runs on top of the internet, in the same way that Skype, Minecraft or Instagram do.

play00:27

The Internet is the underlying plumbing that conveys the data for all these different applications.

play00:31

And The World Wide Web is the biggest of them all – a huge distributed application running

play00:35

on millions of servers worldwide, accessed using a special program called a web browser.

play00:40

We’re going to learn about that, and much more, in today’s episode.

play00:43

INTRO

play00:53

The fundamental building block of the World Wide Web – or web for short – is a single

play00:57

page.

play00:58

This is a document, containing content, which can include links to other pages.

play01:01

These are called hyperlinks.

play01:03

You all know what these look like: text or images that you can click, and they jump you

play01:06

to another page.

play01:08

These hyperlinks form a huge web of interconnected information, which is where the whole thing

play01:12

gets its name.

play01:13

This seems like such an obvious idea.

play01:15

But before hyperlinks were implemented, every time you wanted to switch to another piece

play01:18

of information on a computer, you had to rummage through the file system to find it, or type

play01:22

it into a search box.

play01:24

With hyperlinks, you can easily flow from one related topic to another.

play01:28

The value of hyperlinked information was conceptualized by Vannevar Bush way back in 1945.

play01:33

He published an article describing a hypothetical machine called a Memex, which we discussed

play01:37

in Episode 24.

play01:39

Bush described it as "associative indexing ... whereby any item may be caused at will

play01:44

to select another immediately and automatically."

play01:47

He elaborated: "The process of tying two things together is the important thing...thereafter,

play01:52

at any time, when one of those items is in view, the other [item] can be instantly recalled

play01:57

merely by tapping a button."

play01:59

In 1945, computers didn’t even have screens, so this idea was way ahead of its time!

play02:04

Text containing hyperlinks is so powerful, it got an equally awesome name: hypertext!

play02:09

Web pages are the most common type of hypertext document today.

play02:12

They’re retrieved and rendered by web browsers which we'll get to in a few minutes.

play02:15

In order for pages to link to one another, each hypertext page needs a unique address.

play02:20

On the web, this is specified by a Uniform Resource Locator, or URL for short.

play02:25

An example web page URL is thecrashcourse.com/courses.
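
As a rough sketch of how a URL breaks into pieces, the Python snippet below splits the episode's example address into a host name (the part DNS will look up) and a page path (the part the web server will be asked for). The helper name `split_url` and the assumed `http://` prefix are illustrative additions; the episode only gives thecrashcourse.com/courses.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return (host, path) for a URL. Hypothetical helper for illustration."""
    # urlsplit only recognizes the host when a scheme is present,
    # so assume plain HTTP if none was written.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # An empty path means the site's root page.
    return parts.hostname or "", parts.path or "/"

host, path = split_url("thecrashcourse.com/courses")
print(host)  # thecrashcourse.com
print(path)  # /courses
```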

play02:29

Like we discussed last episode, when you request a site, the first thing your computer does

play02:33

is a DNS lookup.

play02:34

This takes a domain name as input – like “the crash course dot com” – and replies

play02:38

back with the corresponding computer’s IP address.
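
The lookup step can be sketched in Python with the standard `socket` module, which asks the operating system's resolver to do the DNS query. The function name `lookup_ipv4` is made up for this example, and restricting the answer to IPv4 is a simplifying assumption.

```python
import socket

def lookup_ipv4(domain):
    """Return the IPv4 addresses the resolver reports for a domain name."""
    infos = socket.getaddrinfo(
        domain, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip, port) pair.
    return sorted({info[4][0] for info in infos})
```

On a machine with network access, calling `lookup_ipv4("thecrashcourse.com")` should return one or more addresses; which ones depends on the DNS records at the time you run it.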

play02:40

Now, armed with the IP address of the computer you want, your web browser opens a TCP connection

play02:45

to a computer that’s running a special piece of software called a web server.

play02:49

The standard port number for web servers is port 80.

play02:52

At this point, all your computer has done is connect to the web server at the address

play02:55

thecrashcourse.com

play02:57

The next step is to ask that web server for the “courses” hypertext page.

play03:01

To do this, it uses the aptly named Hypertext Transfer Protocol, or HTTP.

play03:05

The very first documented version of this spec, HTTP 0.9, created in 1991, only had

play03:11

one command – “GET”.

play03:13

Fortunately, that’s pretty much all you need.

play03:15

Because we’re trying to get the “courses” page, we send the server the following command

play03:19

– GET /courses.

play03:21

This command is sent as raw ASCII text to the web server, which then replies back with

play03:25

the web page hypertext we requested.
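
To make the exchange concrete, here is a minimal Python sketch of the two steps just described: open a TCP connection to port 80, then send the GET command as ASCII text. The function names are hypothetical, and the one-line request follows the early HTTP 0.9 style from the episode; many present-day servers expect a fuller HTTP/1.1 request with extra header lines, so a real site may not answer this usefully.

```python
import socket

def build_get_request(path):
    """Build an HTTP 0.9-style request: GET, a path, and a line ending."""
    return f"GET {path}\r\n".encode("ascii")

def fetch(host, path, port=80):
    """Connect over TCP, send the GET command, read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(build_get_request(path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

print(build_get_request("/courses"))  # b'GET /courses\r\n'
```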

play03:27

This is interpreted by your computer's web browser and rendered to your screen.

play03:31

If the user follows a link to another page, the computer just issues another GET request.

play03:35

And this goes on and on as you surf around the website.

play03:38

In later versions, HTTP added status codes, which prefixed any hypertext that was sent

play03:43

following a GET request.

play03:45

For example, status code 200 means OK – I’ve got the page and here it is!

play03:49

Status codes in the four hundreds are for client errors.

play03:51

Like, if a user asks the web server for a page that doesn’t exist, that’s the dreaded

play03:56

404 error!
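
A small Python sketch of how a client might read a status code. The helper names are invented for illustration, and the example status lines are typical of later HTTP versions. The episode mentions only 200 and the four hundreds; the other categories in the table come from the HTTP standard.

```python
def parse_status_line(line):
    """Split a line such as 'HTTP/1.0 404 Not Found' into (code, reason)."""
    _version, code, reason = line.strip().split(" ", 2)
    return int(code), reason

def status_class(code):
    """Name the broad category given by the first digit of the code."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")

code, reason = parse_status_line("HTTP/1.0 404 Not Found")
print(code, reason, "->", status_class(code))  # 404 Not Found -> client error
```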

play03:57

Web page hypertext is stored and sent as plain old text, for example, encoded in ASCII or

play04:01

UTF-16, which we talked about in Episodes 4 and 20.
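
For a sense of what "plain old text" means on the wire, this short Python sketch encodes the same scrap of markup both ways. The sample string is made up, and the little-endian UTF-16 variant without a byte-order mark is chosen only to keep the byte count easy to read.

```python
text = "<h1>Hi</h1>"

ascii_bytes = text.encode("ascii")      # one byte per character
utf16_bytes = text.encode("utf-16-le")  # two bytes per character here

print(len(text))         # 11
print(len(ascii_bytes))  # 11
print(len(utf16_bytes))  # 22
```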

play04:05

Because plain text files don’t have a way to specify what’s a link and what’s not,

play04:09

it was necessary to develop a way to “mark up” a text file with hypertext elements.

play04:13

For this, the Hypertext Markup Language was developed.

play04:16

The very first version of HTML, version 0.a, created in 1990, provided 18 HTML commands

play04:22

to mark up pages.

play04:23

That’s it!

play04:24

Let’s build a webpage with these!

play04:25

First, let’s give our web page a big heading.

play04:28

To do this, we type in the letters “H 1”, which indicates the start of a first level

play04:32

heading, and we surround that in angle brackets.

play04:35

This is one example of an HTML tag.

play04:38

Then, we enter whatever heading text we want.

play04:40

We don’t want the whole page to be a heading.

play04:42

So, we need to “close” the “h1” tag like so, with a little slash in the front.

play04:45

Now let's add some content.

play04:47

Visitors may not know what Klingons are, so let’s make that word a hyperlink to the

play04:51

Klingon Language Institute for more information.

play04:53

We do this with an “A” tag, inside of which we include an attribute that specifies

play04:57

a hyperlink reference.

play04:58

That’s the page to jump to if the link is clicked.

play05:00

And finally, we need to close the A tag.

play05:03

Now let's add a second level heading, which uses an “h2” tag.

play05:06

HTML also provides tags to create lists.

play05:09

We start this by adding the tag for an ordered list.

play05:12

Then we can add as many items as we want, surrounded in “L i” tags, which stands

play05:16

for list item.

play05:17

People may not know what a bat'leth is, so let’s make that a hyperlink too.

play05:21

Lastly, for good form, we need to close the ordered list tag.

play05:24

And we’re done – that’s a very simple web page!

play05:27

If you save this text into notepad or textedit, and name it something like “test.html”,

play05:31

you should be able to open it by dragging it into your computer’s web browser.
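
The page built in this walkthrough might look roughly like the markup in the Python sketch below, which also shows how a program can pick out the hyperlink references, much as a browser must when it renders the page. The heading text, sentences, and both link addresses are placeholders chosen for this example, since the transcript does not spell out the exact page contents.

```python
from html.parser import HTMLParser

# Placeholder page using the tags from the walkthrough: h1, a, h2, ol, li.
PAGE = """<h1>Klingon Teahouse</h1>
We serve tea to <a href="http://www.kli.org">Klingons</a>.
<h2>Rules</h2>
<ol>
<li>Do not spill the tea.</li>
<li>Leave your <a href="http://example.com/batleth">bat'leth</a> at the door.</li>
</ol>
"""

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
# ['http://www.kli.org', 'http://example.com/batleth']
```

Saving the contents of `PAGE` to a file such as test.html gives a page you can open in a browser, as described above.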

play05:35

Of course, today’s web pages are a tad more sophisticated.

play05:38

The newest version of HTML, version 5, has over a hundred different tags – for things

play05:42

like images, tables, forms and buttons.

play05:44

And there are other technologies we’re not going to discuss, like Cascading Style Sheets

play05:48

or CSS and JavaScript, which can be embedded into HTML pages and do even fancier things.

play05:54

That brings us back to web browsers.

play05:56

This is the application on your computer that lets you talk with all these web servers.

play06:00

Browsers not only request pages and media, but also render the content that’s being

play06:03

returned.

play06:04

The first web browser, and web server, was written by (now Sir) Tim Berners-Lee over

play06:09

the course of two months in 1990.

play06:10

At the time, he was working at CERN in Switzerland.

play06:13

To pull this feat off, he simultaneously created several of the fundamental web standards we

play06:18

discussed today: URLs, HTML and HTTP.

play06:21

Not bad for two months work!

play06:23

Although to be fair, he’d been researching hypertext systems for over a decade.

play06:27

After initially circulating his software amongst colleagues at CERN, it was released to the

play06:30

public in 1991.

play06:32

The World Wide Web was born.

play06:34

Importantly, the web was an open standard, making it possible for anyone to develop new

play06:38

web servers and browsers.

play06:39

This allowed a team at the University of Illinois at Urbana-Champaign to create the Mosaic web

play06:43

browser in 1993.

play06:45

It was the first browser that allowed graphics to be embedded alongside text; previous browsers

play06:50

displayed graphics in separate windows.

play06:52

It also introduced new features like bookmarks, and had a friendly GUI interface, which made

play06:56

it popular.

play06:57

Even though it looks pretty crusty, it’s recognizable as the web we know today!

play07:01

By the end of the 1990s, there were many web browsers in use, like Netscape Navigator,

play07:05

Internet Explorer, Opera, OmniWeb and Mozilla.

play07:08

Many web servers were also developed, like Apache and Microsoft’s Internet Information

play07:11

Services (IIS).

play07:13

New websites popped up daily, and web mainstays like Amazon and eBay were founded in the mid-1990s.

play07:18

A golden era!

play07:19

The web was flourishing and people increasingly needed ways to find things.

play07:23

If you knew the web address of where you wanted to go – like ebay.com – you could just

play07:27

type it into the browser.

play07:28

But what if you didn’t know where to go?

play07:30

Like, you only knew that you wanted pictures of cute cats.

play07:33

Right now!

play07:34

Where do you go?

play07:35

At first, people maintained web pages which served as directories hyperlinking to other

play07:39

websites.

play07:40

Most famous among these was "Jerry and David's guide to the World Wide Web", renamed Yahoo

play07:44

in 1994.

play07:45

As the web grew, these human-edited directories started to get unwieldy, and so search engines

play07:50

were developed.

play07:51

Let’s go to the thought bubble!

play07:52

The earliest web search engine that operated like the ones we use today, was JumpStation,

play07:57

created by Jonathon Fletcher in 1993 at the University of Stirling.

play08:01

This consisted of three pieces of software that worked together.

play08:04

The first was a web crawler, software that followed all the links it could find on the

play08:07

web; anytime it followed a link to a page that had new links, it would add those to

play08:11

its list.

play08:12

The second component was an ever enlarging index, recording what text terms appeared

play08:16

on what pages the crawler had visited.

play08:18

The final piece was a search algorithm that consulted the index; for example, if I typed

play08:22

the word “cat” into JumpStation, every webpage where the word “cat” appeared

play08:26

would come up in a list.
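
The three pieces can be sketched in a few lines of Python. This is a toy under stated assumptions, not JumpStation's actual code: the "web" is a small made-up dictionary instead of real pages fetched over HTTP, and all names are invented for illustration.

```python
from collections import deque

# A made-up miniature web: URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("cat pictures and cat facts", ["b.example", "c.example"]),
    "b.example": ("dog pictures", ["a.example"]),
    "c.example": ("cat care guide", []),
}

def crawl(start):
    """Crawler: follow links from start, adding unseen pages to the list."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in TOY_WEB[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls):
    """Index: record which terms appear on which crawled pages."""
    index = {}
    for url in urls:
        for term in TOY_WEB[url][0].split():
            index.setdefault(term, set()).add(url)
    return index

def search(index, term):
    """Search: consult the index and list every page containing the term."""
    return sorted(index.get(term, set()))

index = build_index(crawl("a.example"))
print(search(index, "cat"))  # ['a.example', 'c.example']
```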

play08:28

Early search engines used very simple metrics to rank order their search results, most often

play08:32

just the number of times a search term appeared on a page.

play08:35

This worked okay, until people started gaming the system, like by writing “cat” hundreds

play08:40

of times on their web pages just to steer traffic their way.
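
A quick Python sketch shows why counting terms is so easy to game. The page names and text are invented for the example, and the ranking function is a simplified stand-in for what early engines did.

```python
# Made-up pages: one genuine, one stuffed with the keyword, one unrelated.
PAGES = {
    "cat-authority.example": "a guide to cat breeds and cat health",
    "spam.example": "cat " * 200,
    "dogs.example": "all about dogs",
}

def rank_by_term_count(pages, term):
    """Rank pages by how many times the term appears, highest first."""
    counts = {url: text.split().count(term) for url, text in pages.items()}
    hits = [(url, n) for url, n in counts.items() if n > 0]
    return sorted(hits, key=lambda item: item[1], reverse=True)

print(rank_by_term_count(PAGES, "cat"))
# [('spam.example', 200), ('cat-authority.example', 2)]
```

The keyword-stuffed page wins easily, even though it tells the reader nothing about cats.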

play08:43

Google’s rise to fame was in large part due to a clever algorithm that sidestepped

play08:47

this issue.

play08:48

Instead of trusting the content on a web page, they looked at how other websites linked to

play08:52

that page.

play08:53

If it was a spam page with the word cat over and over again, no site would link to it.

play08:57

But if the webpage was an authority on cats, then other sites would likely link to it.

play09:01

So the number of what are called “backlinks”, especially from reputable sites, was often

play09:05

a good sign of quality.
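
The backlink idea can be sketched as a simple count in Python. This is only an illustration with made-up sites: it treats every link as equal, whereas the approach described here also gives more weight to links from reputable sites, so it should not be read as Google's actual algorithm.

```python
# Made-up link graph: page -> pages it links to.
LINKS = {
    "blog.example": ["cat-authority.example"],
    "news.example": ["cat-authority.example", "blog.example"],
    "cat-authority.example": [],
    "spam.example": ["spam.example"],
}

def count_backlinks(links):
    """Count how many other pages link to each page."""
    counts = {url: 0 for url in links}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # a page linking to itself earns nothing
                counts[target] = counts.get(target, 0) + 1
    return counts

counts = count_backlinks(LINKS)
ranking = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking[0])  # ('cat-authority.example', 2)
```

However many times the spam page repeats a word, or links to itself, its backlink count stays at zero.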

play09:07

This started as a research project called BackRub at Stanford University in 1996, before

play09:12

being spun out, two years later, into the Google we know today.

play09:15

Thanks thought bubble!

play09:16

Finally, I want to take a second to talk about a term you’ve probably heard a lot recently,

play09:20

“Net Neutrality”.

play09:21

Now that you’ve built an understanding of packets, internet routing, and the World Wide

play09:25

Web, you know enough to understand the essence – at least the technical essence – of

play09:29

this big debate.

play09:30

In short, network neutrality is the principle that all packets on the internet should be

play09:34

treated equally.

play09:35

It doesn’t matter if the packets are my email or you streaming this video, they should

play09:38

all chug along at the same speed and priority.

play09:41

But many companies would prefer that their data arrive to you preferentially.

play09:45

Take for example, Comcast, a large ISP that also owns many TV channels, like NBC and The

play09:50

Weather Channel, which are streamed online.

play09:52

Not to pick on Comcast, but in the absence of Net Neutrality rules, they could for example say that

play09:57

they want their content to be delivered silky smooth, with high priority.

play10:01

But other streaming videos are going to get throttled, that is, intentionally given less

play10:04

bandwidth and lower priority. Again, I just want to reiterate here that this is just conjecture.

play10:09

At a high level, Net Neutrality advocates argue that giving internet providers this

play10:13

ability to essentially set up tolls on the internet – to provide premium packet delivery

play10:17

– plants the seeds for an exploitative business model.

play10:20

ISPs could be gatekeepers to content, with strong incentives to not play nice with competitors.

play10:25

Also, if big companies like Netflix and Google can pay to get special treatment, small companies,

play10:30

like start-ups, will be at a disadvantage, stifling innovation.

play10:34

On the other hand, there are good technical reasons why you might want different types

play10:37

of data to flow at different speeds.

play10:39

That Skype call needs high priority, but it’s not a big deal if an email comes in a few

play10:43

seconds late.
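
The technical difference between the two treatments can be sketched with a queue in Python. The packet labels and priority numbers are assumptions made up for the example; real routers use far more involved scheduling.

```python
import heapq

def delivery_order(packets, prioritized):
    """Return packet kinds in the order they are sent.

    packets is a list of (kind, priority) in arrival order,
    where a lower number means more urgent.
    """
    if not prioritized:
        # Neutral treatment: first come, first served.
        return [kind for kind, _ in packets]
    # Prioritized treatment: most urgent first, arrival order breaks ties.
    heap = [(priority, arrival, kind)
            for arrival, (kind, priority) in enumerate(packets)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

arrivals = [("email", 2), ("call", 1), ("email", 2), ("call", 1)]
print(delivery_order(arrivals, prioritized=False))
# ['email', 'call', 'email', 'call']
print(delivery_order(arrivals, prioritized=True))
# ['call', 'call', 'email', 'email']
```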

play10:44

Net-neutrality opponents also argue that market forces and competition would discourage bad

play10:49

behavior, because customers would leave ISPs that are throttling sites they like.

play10:53

This debate will rage on for a while yet, and as we always encourage on Crash Course,

play10:57

you should go out and learn more because the implications of Net Neutrality are complex

play11:01

and wide-reaching.

play11:02

I’ll see you next week.

Related Tags
Internet History, Web Development, Hyperlinks, HTML Basics, HTTP Protocol, Web Browsers, Search Engines, Net Neutrality, CERN, Web Standards