The World Wide Web: Crash Course Computer Science #30
Summary
TLDR: This video script explores the difference between the Internet and the World Wide Web, and how the web runs on top of the Internet. The World Wide Web is a huge distributed application, accessed through web pages hosted on millions of servers worldwide; those pages are documents connected to one another by hyperlinks. The video introduces the concept of hypertext, and how Uniform Resource Locators (URLs) and the Hypertext Transfer Protocol (HTTP) are used to locate and request pages. It also covers the development of the Hypertext Markup Language (HTML) and how it is used to create and link pages. It then reviews the origins of web browsers, including the work of Tim Berners-Lee, creator of the first web browser and web server, and the many browsers and servers that followed. Finally, the video discusses Net Neutrality, the important debate over whether all packets on the internet should be treated equally, and that principle's potential impact on innovation and technological development.
Takeaways
- 🌐 The Internet and the World Wide Web are two different things; the web is built on top of the Internet.
- 🔗 Hyperlinks are the web's fundamental connective element, letting users jump easily from one document to another.
- 📄 The web's documents are called web pages; they exist as hypertext and are retrieved and rendered by web browsers.
- 🌐 Every web page needs a unique address, a Uniform Resource Locator (URL), used to locate the resource on the internet.
- 📡 When a user requests a page, the computer first performs a DNS lookup to resolve the domain name to an IP address, then opens a TCP connection to the target server.
- 🗨️ The Hypertext Transfer Protocol (HTTP) is the standard protocol for requesting pages from a server; the first version, HTTP 0.9, had only one command, "GET".
- 📝 The Hypertext Markup Language (HTML) is the markup language used to create web pages, defining their structure and content.
- 🔍 Search engines made finding information far easier; early engines indexed pages with crawler programs, while modern engines such as Google use sophisticated algorithms to improve result quality.
- 🏗️ The first web browser and web server were written by Tim Berners-Lee in 1990; he also created the fundamental web standards URL, HTML, and HTTP.
- 🌟 Net Neutrality is the important principle that all packets on the internet should be treated equally, with no differences in transmission speed or priority based on their source.
- 🚀 The web's open standards fueled innovation, allowing anyone to develop new web servers and browsers, a key factor in the web's rapid growth.
- 📈 As the web grew rapidly, many browsers and servers appeared, along with new websites and services such as Amazon and eBay.
Q & A
What is the difference between the Internet and the World Wide Web?
-The Internet is the underlying infrastructure that transmits data, while the World Wide Web is the largest distributed application running on top of it, accessed through a web browser. The Internet handles data transport; the web provides a way to browse information through hyperlinks.
How did hyperlinks change the way information is browsed?
-Before hyperlinks, users had to rummage through the file system or type into a search box to find information. Hyperlinks let users jump from one related topic to another simply by clicking text or images.
What is the fundamental building block of the World Wide Web?
-The fundamental building block of the web is a single page: a document containing content, which can include links to other pages. These links are called hyperlinks.
What concept did Vannevar Bush propose in 1945?
-In 1945, Vannevar Bush conceptualized the value of hyperlinked information and described a hypothetical machine, the Memex, which used "associative indexing" to tie items of information together, letting any item select and automatically retrieve another at the tap of a button.
How are web pages uniquely identified by Uniform Resource Locators (URLs)?
-For pages to link to one another, each hypertext page needs a unique address, which on the web is specified by a Uniform Resource Locator (URL). For example, a page's URL might be thecrashcourse.com/courses.
When a user types a web address into a browser, what does the computer do first?
-The computer first performs a DNS lookup, translating the domain name (such as 'thecrashcourse.com') into the corresponding computer's IP address.
What functionality did the first version of HTTP provide?
-The first version of HTTP, HTTP 0.9, contained only one command: 'GET'. It was used to request pages, which was enough for basic web retrieval.
How many markup commands did the first version of HTML provide?
-The very first version of HTML, created in 1990, provided 18 HTML commands for marking up pages.
How do modern web pages compare with early ones?
-Modern pages are far more sophisticated. The newest version of HTML, HTML5, offers over a hundred different tags for things like images, tables, forms, and buttons. Technologies such as CSS and JavaScript can also be embedded in HTML pages to do even fancier things.
Who wrote the first web browser and web server?
-The first web browser and web server were written by (now Sir) Tim Berners-Lee over the course of two months in 1990, while he was working at CERN in Switzerland. He simultaneously created the fundamental web standards URL, HTML, and HTTP.
How do search engines work?
-A search engine consists of three parts: a web crawler, an index, and a search algorithm. The crawler follows every link it finds on the web, the index records which text terms appear on the pages the crawler has visited, and the search algorithm consults the index to produce results.
What is Net Neutrality?
-Net Neutrality is the principle that all packets on the internet should be treated equally: whether it's email or streaming video, everything should travel at the same speed and priority. The debate concerns whether internet service providers (ISPs) should be allowed to deliver some data preferentially, and what that could mean for small companies and innovation.
Outlines
🌐 The Internet vs. the World Wide Web
This section explains the difference between the Internet and the World Wide Web. The Internet is the underlying infrastructure responsible for transporting data, while the web is a distributed application built on top of it and accessed through a browser. The web consists of pages connected by hyperlinks, forming a vast network of information. The idea of hyperlinks was first conceptualized in 1945 by Vannevar Bush, who described a hypothetical machine called the Memex capable of associative indexing. The section also introduces page addresses (URLs) and how a page is requested via a DNS lookup, a TCP connection, and the HTTP protocol.
📄 HTML and Building Web Pages
This section covers the basics of HTML (the Hypertext Markup Language), the markup language used to create web pages. HTML documents use tags to define the structure of their content, such as headings, links, and lists. It walks through building a simple page: creating a heading, adding content, making hyperlinks, and creating an ordered list. It also traces HTML's growth from the original 18 commands to HTML5's more than one hundred tags, and mentions technologies like CSS and JavaScript, which can be embedded in HTML pages for more sophisticated functionality.
🏛 Web Browsers and the Origins of the Web
This section recounts the history of web browsers and the origins of the World Wide Web. The first browser and server were written by Tim Berners-Lee in 1990; while working at CERN, he created the fundamental web standards URL, HTML, and HTTP. Browser and server software was then released and evolved quickly, including the Mosaic browser and a variety of web servers. The section also covers the development of search engines, from the early JumpStation to Google's algorithm, which judges a page's authority by examining how other sites link to it.
🚀 The Importance of Net Neutrality
This section examines the concept of, and controversy around, Net Neutrality: the principle that all packets on the internet should be treated equally, whether email or streaming video. Some companies, such as ISPs, might prefer their own data to be delivered preferentially. That could put small companies and startups at an unfair competitive disadvantage, since they may be unable to pay extra for priority service. The section also presents the view of Net Neutrality's opponents, who argue that market forces and competition would discourage bad behavior by ISPs. The issue is complex and far-reaching, and worth learning more about.
Keywords
💡Internet
💡World Wide Web
💡Hyperlink
💡Hypertext
💡Uniform Resource Locator (URL)
💡Domain Name System (DNS)
💡Hypertext Transfer Protocol (HTTP)
💡HTML
💡Web browser
💡Search engine
💡Net Neutrality
Highlights
The Internet and the World Wide Web are two different things, even though people often use the terms interchangeably in everyday language.
The World Wide Web runs on top of the internet, just like applications such as Skype, Minecraft, or Instagram.
The Internet is the underlying plumbing that conveys data for all these applications; the World Wide Web is the biggest distributed application of them all.
The fundamental building block of the web is a single page: a document containing content that can link to other pages.
Hyperlinks are text or images that connect pieces of information; users click them to jump to another page.
Vannevar Bush conceptualized the value of hyperlinked information back in 1945, describing a hypothetical machine called the Memex.
Hypertext is text containing hyperlinks, which makes it easy to flow from one related topic to another.
Each hypertext page needs a unique address, a Uniform Resource Locator (URL).
When you request a site, your computer first performs a DNS lookup, translating the domain name into an IP address.
The browser sends the server a GET request over HTTP to fetch the page.
Later versions of HTTP added status codes, such as 200 for OK; codes in the four hundreds indicate client errors, like 404.
Web page hypertext is stored and sent as plain text, encoded for example in ASCII or UTF-16.
The Hypertext Markup Language (HTML) was developed to mark up hypertext elements in a text file.
The first version of HTML provided 18 commands to mark up pages, while HTML5 has over 100 different tags.
Cascading Style Sheets (CSS) and JavaScript are other technologies that can be embedded in HTML pages to do even fancier things.
Tim Berners-Lee wrote the first web browser and server in 1990, simultaneously creating the fundamental web standards URL, HTML, and HTTP.
Mosaic, created in 1993 by a team at the University of Illinois at Urbana-Champaign, was the first browser to allow graphics to be embedded alongside text.
As the web grew, people needed new ways to find information, which led to the development of search engines.
Google's success was due in part to an algorithm that judged a page's quality by examining how other websites linked to it.
Net Neutrality is the principle that all packets on the internet should be treated equally.
The Net Neutrality debate involves complex technical and business questions, with far-reaching implications for innovation and market competition.
Transcripts
Hi, I’m Carrie Anne, and welcome to CrashCourse Computer Science.
Over the past two episodes, we’ve delved into the wires, signals, switches, packets,
routers and protocols that make up the internet.
Today we’re going to move up yet another level of abstraction and talk about the World
Wide Web. This is not the same thing as the Internet, even though people often use the
two terms interchangeably in everyday language.
The World Wide Web runs on top of the internet, in the same way that Skype, Minecraft or Instagram do.
The Internet is the underlying plumbing that conveys the data for all these different applications.
And The World Wide Web is the biggest of them all – a huge distributed application running
on millions of servers worldwide, accessed using a special program called a web browser.
We’re going to learn about that, and much more, in today’s episode.
INTRO
The fundamental building block of the World Wide Web – or web for short – is a single
page.
This is a document, containing content, which can include links to other pages.
These are called hyperlinks.
You all know what these look like: text or images that you can click, and they jump you
to another page.
These hyperlinks form a huge web of interconnected information, which is where the whole thing
gets its name.
This seems like such an obvious idea.
But before hyperlinks were implemented, every time you wanted to switch to another piece
of information on a computer, you had to rummage through the file system to find it, or type
it into a search box.
With hyperlinks, you can easily flow from one related topic to another.
The value of hyperlinked information was conceptualized by Vannevar Bush way back in 1945.
He published an article describing a hypothetical machine called a Memex, which we discussed
in Episode 24.
Bush described it as "associative indexing ... whereby any item may be caused at will
to select another immediately and automatically."
He elaborated: "The process of tying two things together is the important thing...thereafter,
at any time, when one of those items is in view, the other [item] can be instantly recalled
merely by tapping a button."
In 1945, computers didn’t even have screens, so this idea was way ahead of its time!
Text containing hyperlinks is so powerful, it got an equally awesome name: hypertext!
Web pages are the most common type of hypertext document today.
They’re retrieved and rendered by web browsers which we'll get to in a few minutes.
In order for pages to link to one another, each hypertext page needs a unique address.
On the web, this is specified by a Uniform Resource Locator, or URL for short.
An example web page URL is thecrashcourse.com/courses.
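To see how such an address breaks into parts, Python's standard library can split one for us. A minimal sketch (the `http://` scheme is added explicitly, since the example above omits it):

```python
from urllib.parse import urlparse

# The example URL from above, with an explicit scheme added
url = "http://thecrashcourse.com/courses"
parts = urlparse(url)

print(parts.scheme)  # protocol to use: "http"
print(parts.netloc)  # domain name to resolve via DNS: "thecrashcourse.com"
print(parts.path)    # page to request from the server: "/courses"
```

The domain-name part is what the DNS lookup described below operates on, and the path is what gets sent to the web server.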
Like we discussed last episode, when you request a site, the first thing your computer does
is a DNS lookup.
This takes a domain name as input – like “the crash course dot com” – and replies
back with the corresponding computer’s IP address.
Now, armed with the IP address of the computer you want, your web browser opens a TCP connection
to a computer that’s running a special piece of software called a web server.
The standard port number for web servers is port 80.
At this point, all your computer has done is connect to the web server at the address
thecrashcourse.com
The next step is to ask that web server for the “courses” hypertext page.
To do this, it uses the aptly named Hypertext Transfer Protocol, or HTTP.
The very first documented version of this spec, HTTP 0.9, created in 1991, only had
one command – “GET”.
Fortunately, that’s pretty much all you need.
Because we’re trying to get the “courses” page, we send the server the following command
– GET /courses.
This command is sent as raw ASCII text to the web server, which then replies back with
the web page hypertext we requested.
This is interpreted by your computer's web browser and rendered to your screen.
If the user follows a link to another page, the computer just issues another GET request.
And this goes on and on as you surf around the website.
In later versions, HTTP added status codes, which prefixed any hypertext that was sent
following a GET request.
For example, status code 200 means OK – I’ve got the page and here it is!
Status codes in the four hundreds are for client errors.
Like, if a user asks the web server for a page that doesn’t exist, that’s the dreaded
404 error!
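The exchange above can be sketched in a few lines of Python. This is an illustration, not a real HTTP client: it builds a raw ASCII GET request in the later HTTP/1.0 style (the original 0.9 request was just `GET /courses`, with no version or headers, and the reply had no status line at all), and parses the status line of a sample response:

```python
def build_get_request(host, path):
    # An HTTP/1.0-style request, sent as plain ASCII text over the TCP connection
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")

def parse_status_line(line):
    # e.g. "HTTP/1.0 200 OK" -> (200, "OK")
    version, code, reason = line.split(" ", 2)
    return int(code), reason

request = build_get_request("thecrashcourse.com", "/courses")
# The server replies with a status line, followed by the page's hypertext
status, reason = parse_status_line("HTTP/1.0 200 OK")
```

If the page doesn't exist, `parse_status_line("HTTP/1.0 404 Not Found")` yields the dreaded 404.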
Web page hypertext is stored and sent as plain old text, for example, encoded in ASCII or
UTF-16, which we talked about in Episodes 4 and 20.
Because plain text files don’t have a way to specify what’s a link and what’s not,
it was necessary to develop a way to “mark up” a text file with hypertext elements.
For this, the Hypertext Markup Language was developed.
The very first version of HTML version 0.a, created in 1990, provided 18 HTML commands
to markup pages.
That’s it!
Let’s build a webpage with these!
First, let’s give our web page a big heading.
To do this, we type in the letters “H 1”, which indicates the start of a first level
heading, and we surround that in angle brackets.
This is one example of an HTML tag.
Then, we enter whatever heading text we want.
We don’t want the whole page to be a heading.
So, we need to “close” the “h1” tag like so, with a little slash in the front.
Now let's add some content.
Visitors may not know what Klingons are, so let’s make that word a hyperlink to the
Klingon Language Institute for more information.
We do this with an “A” tag, inside of which we include an attribute that specifies
a hyperlink reference.
That’s the page to jump to if the link is clicked.
And finally, we need to close the A tag.
Now let's add a second level heading, which uses an “h2” tag.
HTML also provides tags to create lists.
We start this by adding the tag for an ordered list.
Then we can add as many items as we want, surrounded in “L i” tags, which stands
for list item.
People may not know what a bat'leth is, so let’s make that a hyperlink too.
Lastly, for good form, we need to close the ordered list tag.
And we’re done – that’s a very simple web page!
If you save this text into Notepad or TextEdit, and name it something like “test.html”,
you should be able to open it by dragging it into your computer’s web browser.
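Putting the steps above together, the page might look something like this. The heading text, list items, and link URLs here are illustrative guesses, not taken verbatim from the video:

```html
<h1>Klingon Phrases</h1>
Useful phrases in <a href="https://www.kli.org">Klingon</a> for your next trip.
<h2>Top Phrases</h2>
<ol>
  <li>Where is the bathroom?</li>
  <li>Today is a good day to die!</li>
  <li>Where can I buy a <a href="https://www.kli.org">bat'leth</a>?</li>
</ol>
```

Note how every opening tag has a matching closing tag with a slash, just as described above.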
Of course, today’s web pages are a tad more sophisticated.
The newest version of HTML, version 5, has over a hundred different tags – for things
like images, tables, forms and buttons.
And there are other technologies we’re not going to discuss, like Cascading Style Sheets
or CSS and JavaScript, which can be embedded into HTML pages and do even fancier things.
That brings us back to web browsers.
This is the application on your computer that lets you talk with all these web servers.
Browsers not only request pages and media, but also render the content that’s being
returned.
The first web browser, and web server, was written by (now Sir) Tim Berners-Lee over
the course of two months in 1990.
At the time, he was working at CERN in Switzerland.
To pull this feat off, he simultaneously created several of the fundamental web standards we
discussed today: URLs, HTML and HTTP.
Not bad for two months work!
Although to be fair, he’d been researching hypertext systems for over a decade.
After initially circulating his software amongst colleagues at CERN, it was released to the
public in 1991.
The World Wide Web was born.
Importantly, the web was an open standard, making it possible for anyone to develop new
web servers and browsers.
This allowed a team at the University of Illinois at Urbana-Champaign to create the Mosaic web
browser in 1993.
It was the first browser that allowed graphics to be embedded alongside text; previous browsers
displayed graphics in separate windows.
It also introduced new features like bookmarks, and had a friendly GUI interface, which made
it popular.
Even though it looks pretty crusty, it’s recognizable as the web we know today!
By the end of the 1990s, there were many web browsers in use, like Netscape Navigator,
Internet Explorer, Opera, OmniWeb and Mozilla.
Many web servers were also developed, like Apache and Microsoft’s Internet Information
Services (IIS).
New websites popped up daily, and web mainstays like Amazon and eBay were founded in the mid-1990s.
A golden era!
The web was flourishing and people increasingly needed ways to find things.
If you knew the web address of where you wanted to go – like ebay.com – you could just
type it into the browser.
But what if you didn’t know where to go?
Like, you only knew that you wanted pictures of cute cats.
Right now!
Where do you go?
At first, people maintained web pages which served as directories hyperlinking to other
websites.
Most famous among these was "Jerry and David's guide to the World Wide Web", renamed Yahoo
in 1994.
As the web grew, these human-edited directories started to get unwieldy, and so search engines
were developed.
Let’s go to the thought bubble!
The earliest web search engine that operated like the ones we use today, was JumpStation,
created by Jonathon Fletcher in 1993 at the University of Stirling.
This consisted of three pieces of software that worked together.
The first was a web crawler, software that followed all the links it could find on the
web; anytime it followed a link to a page that had new links, it would add those to
its list.
The second component was an ever enlarging index, recording what text terms appeared
on what pages the crawler had visited.
The final piece was a search algorithm that consulted the index; for example, if I typed
the word “cat” into JumpStation, every webpage where the word “cat” appeared
would come up in a list.
Early search engines used very simple metrics to rank order their search results, most often
just the number of times a search term appeared on a page.
This worked okay, until people started gaming the system, like by writing “cat” hundreds
of times on their web pages just to steer traffic their way.
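The three pieces described above can be sketched in miniature. In this toy version (the page contents and URLs are made up), a crawler's results feed an index, and search ranks by raw term count, exactly the metric that proved so easy to game:

```python
from collections import defaultdict

# Hypothetical "crawled" pages: URL -> page text
pages = {
    "cats.example/guide": "cat care cat food cat health",
    "spam.example": "cat " * 100,   # gaming the metric
    "dogs.example": "dog training dog food",
}

# Inverted index: term -> {url: number of occurrences on that page}
index = defaultdict(dict)
for url, text in pages.items():
    for word in text.split():
        index[word][url] = index[word].get(url, 0) + 1

def search(term):
    # Rank by raw term count, the naive metric early engines used
    hits = index.get(term, {})
    return sorted(hits, key=hits.get, reverse=True)

print(search("cat"))  # the spam page wins under this metric
```

The spam page comes out on top, which is precisely the weakness described next.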
Google’s rise to fame was in large part due to a clever algorithm that sidestepped
this issue.
Instead of trusting the content on a web page, they looked at how other websites linked to
that page.
If it was a spam page with the word cat over and over again, no site would link to it.
But if the webpage was an authority on cats, then other sites would likely link to it.
So the number of what are called “backlinks”, especially from reputable sites, was often
a good sign of quality.
This started as a research project called BackRub at Stanford University in 1996, before
being spun out, two years later, into the Google we know today.
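The backlink idea can be sketched with a toy link graph (invented for illustration; the real PageRank algorithm also weights each link by the linking page's own importance, which this simple count ignores):

```python
# Hypothetical link graph: page -> pages it links to
links = {
    "news.example": ["cats.example/guide"],
    "vet.example":  ["cats.example/guide", "dogs.example"],
    "blog.example": ["cats.example/guide"],
    # nobody links to the keyword-stuffed spam page
}

def backlink_counts(link_graph):
    # Count how many pages link TO each target page
    counts = {}
    for source, targets in link_graph.items():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

counts = backlink_counts(links)
# cats.example/guide earns 3 backlinks; the spam page earns none
```

Pages with many backlinks from other sites rank well; a spam page that nobody links to simply never appears.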
Thanks thought bubble!
Finally, I want to take a second to talk about a term you’ve probably heard a lot recently,
“Net Neutrality”.
Now that you’ve built an understanding of packets, internet routing, and the World Wide
Web, you know enough to understand the essence – at least the technical essence – of
this big debate.
In short, network neutrality is the principle that all packets on the internet should be
treated equally.
It doesn’t matter if the packets are my email or you streaming this video, they should
all chug along at the same speed and priority.
But many companies would prefer that their data arrive to you preferentially.
Take for example, Comcast, a large ISP that also owns many TV channels, like NBC and The
Weather Channel, which are streamed online.
Not to pick on Comcast, but in the absence of Net Neutrality rules, they could for example say that
they want their content to be delivered silky smooth, with high priority…
But other streaming videos are going to get throttled, that is, intentionally given less
bandwidth and lower priority. Again, I just want to reiterate that this is conjecture.
At a high level, Net Neutrality advocates argue that giving internet providers this
ability to essentially set up tolls on the internet – to provide premium packet delivery
– plants the seeds for an exploitative business model.
ISPs could be gatekeepers to content, with strong incentives to not play nice with competitors.
Also, if big companies like Netflix and Google can pay to get special treatment, small companies,
like start-ups, will be at a disadvantage, stifling innovation.
On the other hand, there are good technical reasons why you might want different types
of data to flow at different speeds.
That skype call needs high priority, but it’s not a big deal if an email comes in a few
seconds late.
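That idea, that different kinds of traffic could reasonably get different priority, can be illustrated with a toy scheduler. The traffic classes and priority numbers here are invented for illustration, not any real ISP policy:

```python
import heapq

# Invented priority classes: lower number = sent first
PRIORITY = {"call": 0, "video": 1, "email": 2}

def send_order(packets):
    # packets: list of (kind, payload) tuples in arrival order.
    # The sequence number keeps ordering stable for equal priorities.
    heap = [(PRIORITY[kind], seq, kind, payload)
            for seq, (kind, payload) in enumerate(packets)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, kind, payload = heapq.heappop(heap)
        order.append((kind, payload))
    return order
```

Here a voice-call packet always jumps ahead of an email packet, regardless of arrival order; the Net Neutrality debate is about who, if anyone, should be allowed to set those priorities.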
Net-neutrality opponents also argue that market forces and competition would discourage bad
behavior, because customers would leave ISPs that are throttling sites they like.
This debate will rage on for a while yet, and as we always encourage on Crash Course,
you should go out and learn more because the implications of Net Neutrality are complex
and wide-reaching.
I’ll see you next week.