The Internet: Crash Course Computer Science #29
Summary
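TLDR: This video covers the basics of computer networking, focusing on how the internet works. It first explains how the internet, a large distributed network, sends data around the world as packets. It then follows a packet from the user's local area network (LAN), through a series of routers, up to the internet backbone, and on to the destination server. The video goes on to cover the Internet Protocol (IP) and two transport-layer protocols, the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP): UDP can detect corrupted data, while TCP also ensures reliable, ordered delivery. It also introduces the Domain Name System (DNS), which translates human-readable domain names into machine-readable IP addresses. Finally, it reviews the bottom five layers of the OSI model (physical, data link, network, transport, and session), which together form the foundation of network communication.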
Takeaways
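- 🌐 The internet is a huge distributed network that moves data around as packets.
- 📡 Your computer first connects to a local area network (LAN), then reaches the internet through a wide area network (WAN) run by your internet service provider (ISP).
- 📶 The internet backbone is made up of gigantic routers with super-high-bandwidth connections between them.
- 📚 The traceroute program shows the route your data takes across the internet.
- 📦 The Internet Protocol (IP) sets the standard for internet packets, much as the postal system requires every letter to carry a unique and legible address.
- 📬 UDP (User Datagram Protocol) is a more advanced protocol that sits on top of IP and adds extra information such as a port number and a checksum.
- 🔄 TCP (Transmission Control Protocol) adds sequence numbers, acknowledgments, and retransmission to make delivery reliable.
- 🚦 Together, TCP/IP provides in-order delivery, error detection and recovery, and a transmission rate that adapts to network congestion.
- 🕹️ For time-critical applications such as multiplayer first-person shooters, UDP's simplicity and speed can matter more than TCP's robustness.
- 🌐 DNS (Domain Name System) is the internet's "phone book", mapping easy-to-remember domain names to IP addresses.
- 📈 The OSI model is a conceptual framework that splits networking into 7 layers, each solving a different problem.
- 🔗 The next episode covers the last two OSI layers, the presentation layer and the application layer, including web browsers, Skype, HTML decoding, and streaming media.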
Q & A
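How does the internet connect into a global distributed network?
-The internet links the world through an ever-enlarging web of interconnected devices. Your computer first connects to a local area network (LAN), which might be every device in your house connected to your Wi-Fi router. That connects to a wide area network (WAN), usually run by your internet service provider (ISP), such as Comcast, AT&T, or Verizon. At first this is a regional router, such as one serving your neighborhood, which then connects to a bigger WAN, perhaps one for your whole city or town. After a few more hops, you reach the internet backbone, made up of gigantic routers with super-high-bandwidth connections between them.
What happens if a packet is lost in transit?
-With UDP, there is no mechanism to repair data or request a new copy, and the sender gets no confirmation that a packet arrived. The receiving program is alerted when data is corrupted but typically just discards the packet. TCP, by contrast, uses sequence numbers and acknowledgments (ACKs): if no acknowledgment arrives within a certain time, whether because the packet or its ACK was lost, the sender retransmits the same packet.
What is IP, and how does it relate to UDP and TCP?
-IP, the Internet Protocol, is the standard that internet packets must conform to. It works much like the postal system: every packet needs a unique, legible destination address. IP is a low-level protocol whose header contains little more than that destination address. UDP and TCP are more advanced protocols that sit on top of IP and add features such as port numbers, sequence numbers, and error detection and recovery.
Why do some applications use UDP instead of TCP?
-UDP is simple and fast, which suits real-time applications that can tolerate lost or corrupted packets, such as Skype video chat. TCP provides more reliable delivery, with ordering guarantees, error detection, and retransmission, which suits applications that need the whole message to arrive correctly, such as email.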
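What role do DNS servers play on the internet?
-DNS (Domain Name System) servers are the internet's "phone book". They map human-readable domain names (such as google.com) to IP addresses (such as 172.217.7.238). When you type an address into your browser, the browser first asks a DNS server for the IP address that matches the domain name, and only then sends a request to that address for the website's data.
How can you use the traceroute program to see the route data takes across the internet?
-On a computer running Windows, MacOS, or Linux, you can use the traceroute program to trace the route data takes across the internet. It lists each router a packet passes through on its way from your computer to the destination server.
What is the OSI model, and which layers does it contain?
-The OSI model, the Open Systems Interconnection reference model, is a conceptual framework for compartmentalizing different network processes. It has 7 layers: physical, data link, network, transport, session, presentation, and application. Each layer has its own functions and protocols, from the electrical characteristics of the physical medium up to data representation and access for end-user applications.
What are the main differences between TCP and UDP?
-TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream transport-layer protocol. It provides in-order delivery, error detection, error recovery, and flow control. UDP (User Datagram Protocol) is a simple, connectionless protocol. It does not guarantee packet order, does not repair errors, and does no flow control, so its header overhead is small and it is fast, which suits applications with strict real-time requirements.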
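How should we understand the internet's layered structure?
-Layering breaks the complex job of network communication into several levels, each handling different tasks and problems. The physical layer carries electrical signals, the data link layer transmits frames, and the network layer handles routing. The transport layer's protocols (such as TCP and UDP) handle point-to-point data transfer between computers, the session layer manages communication sessions, the presentation layer handles data representation and security, and the application layer provides the interface to network services and application software.
Why does TCP need sequence numbers and acknowledgments?
-TCP needs sequence numbers and acknowledgments to make data transfer reliable. Sequence numbers let the receiving computer reassemble packets in the correct order even if they arrive at different times. Acknowledgments tell the sending computer whether a packet arrived successfully. If no acknowledgment comes back, the sender can retransmit the packet.
Why is the internet backbone made up of gigantic routers with super-high-bandwidth connections?
-The backbone is the core of the global internet and carries large volumes of data between regions and countries. It therefore needs very large routers with super-high-bandwidth connections to keep data moving quickly and efficiently and to handle the heavy exchange of traffic between networks.
How do network latency and packet loss affect real-time communication applications?
-Latency is the time data takes to travel from sender to receiver, and packet loss refers to packets that go missing in transit. For real-time applications such as video chat or online games, latency and packet loss degrade quality: audio or video may stutter, lag, or drop out entirely. UDP is commonly used for these applications because it is lightweight and low-latency, even though it does not guarantee reliable delivery.
Why is TCP/IP called the foundational protocol suite of the internet?
-TCP/IP is foundational because the two protocols together form the core mechanism for moving data across the internet. IP routes packets from source to destination, while TCP makes delivery reliable through ordering guarantees, error detection, and retransmission. This combination lets TCP/IP adapt to a wide range of network conditions and application needs, which is why it became the bedrock of internet communication.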
Outlines
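🌐 Internet infrastructure and data transmission
The first section explains how the internet works: how a computer connects through its local area network (LAN), a wide area network (WAN), and the routers of its internet service provider (ISP) to reach the internet backbone. It walks through a packet's whole journey from request to delivery, shows how the traceroute program reveals that path, and explains how the Internet Protocol (IP) and the User Datagram Protocol (UDP) get a packet to the right computer and the right program. It also covers what can go wrong in transit, such as lost or corrupted packets, and how UDP uses a checksum to verify data integrity.
📨 TCP/IP and DNS
The second section looks at the differences between the Transmission Control Protocol (TCP) and UDP, and at how TCP works together with IP, the combination known as TCP/IP. TCP provides in-order delivery, acknowledgments, and retransmission, which suits applications that need reliable delivery, such as email. UDP is simpler and faster, which suits applications that can tolerate lost data, such as Skype video calls. The section also introduces the Domain Name System (DNS), which translates easy-to-remember domain names into IP addresses and so simplifies access to the network.
🔄 The OSI model and network layering
The third section reviews the bottom five layers of the OSI model: the physical layer's electrical and radio signals, the data link layer's MAC addresses and collision detection, the network layer's routing technologies, the transport layer's UDP and TCP protocols, and the session layer's connection management. It also previews the top two layers, the presentation layer and the application layer, which handle how data is represented and how applications access the network. It closes by stressing the importance of abstraction, which lets engineers and computer scientists improve different layers of the network stack at the same time without being overwhelmed by the full complexity.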
Keywords
💡Internet
💡Packet
💡IP
💡UDP
💡TCP
💡DNS
💡Port number
💡Checksum
💡Congestion control
💡OSI model
💡Traceroute
Highlights
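The internet is an ever-enlarging distributed network that connects devices around the world.
A computer connects first to a local area network (LAN), then to a wide area network (WAN) run by an internet service provider (ISP).
The internet backbone is made up of gigantic routers with super-high-bandwidth connections between them.
The traceroute program traces the route data takes across the internet.
The Internet Protocol (IP) sets the standard for internet packets, much like the addressing rules of the postal system.
The User Datagram Protocol (UDP) adds port numbers so that a packet is handed to the right program running on a computer.
UDP includes a checksum for verifying that data is correct, but offers no mechanism to repair data or resend it.
The Transmission Control Protocol (TCP) adds sequence numbers and acknowledgments to keep data complete and in order.
TCP/IP is the common protocol combination on the internet, with TCP responsible for reliable delivery.
TCP adjusts its transmission rate according to network congestion, a mechanism called congestion control.
UDP's simplicity and speed suit applications with strict real-time requirements, such as multiplayer first-person shooters.
The Domain Name System (DNS) is the internet's phone book, mapping domain names to IP addresses.
A DNS lookup turns a user-friendly domain name into an IP address that computers can use.
DNS uses a tree data structure to organize and store more than 300 million registered domain names.
The bottom five layers of the OSI model are the physical, data link, network, transport, and session layers, each responsible for different network functions.
Abstraction lets computer scientists and engineers improve different layers of the network stack at the same time without being overwhelmed by the full complexity.
The upper layers of the OSI model are the presentation layer and the application layer, which involve things like web browsers, Skype, HTML decoding, and streaming media.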
Transcripts
Hi, I’m Carrie Anne, and welcome to CrashCourse Computer Science!
As we talked about last episode, your computer is connected to a large, distributed network,
called The Internet.
I know this because you’re watching a YouTube video, which is being streamed over that very
internet.
It’s arranged as an ever-enlarging web of interconnected devices.
For your computer to get this video, the first connection is to your local area network,
or LAN, which might be every device in your house that’s connected to your wifi router.
This then connects to a Wide Area Network, or WAN, which is likely to be a router run
by your Internet Service Provider, or ISP – companies like Comcast, AT&T or Verizon.
At first, this will be a regional router, like one for your neighborhood, and then that
router connects to an even bigger WAN, maybe one for your whole city or town.
There might be a couple more hops, but ultimately you’ll connect to the backbone of the internet
made up of gigantic routers with super high-bandwidth connections running between them.
To request this video file from YouTube, a packet had to work its way up to the backbone,
travel along that for a bit, and then work its way back down to a YouTube server that
had the file.
That might be four hops up, two hops across the backbone, and four hops down, for a total
of ten hops.
If you’re running Windows, MacOS or Linux, you can see the route data takes to different
places on the internet by using the traceroute program on your computer.
Instructions in the Doobly Doo.
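As a rough illustration of the step above, here is a minimal Python sketch that picks the route-tracing command for the current operating system and runs it. The helper names `traceroute_command` and `trace` are made up for this example. It assumes the tool is installed and on the PATH; on Windows the program is named `tracert`, while macOS and Linux call it `traceroute`.

```python
import platform
import subprocess


def traceroute_command(host, system=None):
    """Return the argument list for tracing the route to `host`."""
    system = system or platform.system()
    # Windows ships the tool as `tracert`; macOS and Linux call it `traceroute`.
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, host]


def trace(host):
    """Run the trace and return its text output (requires network access)."""
    result = subprocess.run(
        traceroute_command(host), capture_output=True, text=True
    )
    return result.stdout
```

Calling `print(trace("dftba.com"))` would list one line per hop, though the exact output format differs between operating systems.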
For us here at the Chad & Stacey Emigholz Studio in Indianapolis, the route to the DFTBA
server in California goes through 11 stops.
We start at 192.168.0.1 -- that’s the IP address for my computer on our LAN.
Then there’s the wifi router here at the studio, then a series of regional routers,
then we get onto the backbone, and then we start working back down to the computer hosting
“DFTBA dot com”, which has the IP address 104.24.109.186.
But how does a packet actually get there?
What happens if a packet gets lost along the way?
If I type “DFTBA dot com” into my web browser, how does it know the server’s address?
Those are our topics for today!
INTRO
As we discussed last episode, the internet is a huge distributed network that sends data
around as little packets.
If your data is big enough, like an email attachment, it might get broken up into many
packets.
For example, this video stream is arriving to your computer right now as a series of
packets, and not one gigantic file.
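The idea of breaking data into packets can be sketched in a few lines of Python. The function name `split_into_packets` and the payload size of 1000 bytes are assumptions chosen for illustration, not values from the episode.

```python
def split_into_packets(data: bytes, payload_size: int = 1000) -> list[bytes]:
    """Cut `data` into consecutive chunks of at most `payload_size` bytes."""
    return [
        data[start:start + payload_size]
        for start in range(0, len(data), payload_size)
    ]


message = b"x" * 2500                  # stand-in for a large email attachment
packets = split_into_packets(message)  # three chunks: 1000, 1000 and 500 bytes
reassembled = b"".join(packets)        # joining them in order restores the data
```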
Internet packets have to conform to a standard called the Internet Protocol, or IP.
It’s a lot like sending physical mail through the postal system – every letter needs a
unique and legible address written on it, and there are limits to the size and weight
of packages.
Violate this, and your letter won’t get through.
IP packets are very similar.
However, IP is a very low level protocol – there isn’t much more than a destination address
in a packet’s header, which is the metadata that’s stored in front of the data payload.
This means that a packet can show up at a computer, but the computer may not know which
application to give the data to; Skype or Call of Duty.
For this reason, more advanced protocols were developed that sit on top of IP.
One of the simplest and most common is the User Datagram Protocol, or UDP.
UDP has its own header, which sits inside the data payload.
Inside of the UDP header is some useful, extra information.
One of them is a port number.
Every program wanting to access the internet will ask its host computer’s Operating System
to be given a unique port.
Like Skype might ask for port number 3478.
When a packet arrives to the computer, the Operating System will look inside the UDP
header and read the port number.
Then, if it sees, for example, 3478, it will give the packet to Skype.
So to review, IP gets the packet to the right computer, but UDP gets the packet to the right
program running on that computer.
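A hedged Python sketch of that hand-off: the operating system reads the destination port from the UDP header and passes the payload to whichever program registered that port. A real UDP header is 8 bytes (source port, destination port, length, checksum, each 16 bits). The `handlers` table, the helper names, and the source port 50000 are hypothetical, and the checksum field is simply left at 0 here.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_datagram(src_port, dst_port, payload: bytes) -> bytes:
    """Prefix `payload` with a UDP header (checksum left as 0 in this sketch)."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def deliver(datagram: bytes, handlers: dict):
    """Read the destination port and hand the payload to the matching program."""
    _src, dst_port, length, _checksum = UDP_HEADER.unpack(
        datagram[:UDP_HEADER.size]
    )
    payload = datagram[UDP_HEADER.size:length]
    handler = handlers.get(dst_port)
    if handler is None:
        return None  # no program asked for this port
    return handler(payload)


# Hypothetical table: Skype asked the operating system for port 3478.
handlers = {3478: lambda payload: f"Skype got {payload!r}"}
result = deliver(build_datagram(50000, 3478, b"hello"), handlers)
```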
UDP headers also include something called a checksum, which allows the data to be verified
for correctness.
As the name suggests, it does this by checking the sum of the data.
Here’s a simplified version of how this works.
Let’s imagine the raw data in our UDP packet is 89 111 33 32 58 and 41.
Before the packet is sent, the transmitting computer calculates the checksum by adding
all the data together: 89 plus 111 plus 33 and so on.
In our example, this adds up to a checksum of 364.
In UDP, the checksum value is stored in 16 bits.
If the sum exceeds the maximum possible value, the upper-most bits overflow, and only the
lower bits are used.
Now, when the receiving computer gets this packet, it repeats the process, adding up
all the data.
89 plus 111 plus 33 and so on.
If that sum is the same as the checksum sent in the header, all is well.
But, if the numbers don’t match, you know that the data got corrupted at some point
in transit, maybe because of a power fluctuation or faulty cable.
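The simplified checksum described above can be sketched in Python as follows. This mirrors the episode's simplification (a plain sum kept to 16 bits). Real UDP computes a ones' complement sum over 16-bit words, so treat this as an illustration rather than the actual algorithm. The function names are made up for the example.

```python
def simple_checksum(values):
    """Add the data together and keep only the lower 16 bits."""
    return sum(values) % 2**16


def is_intact(received, checksum):
    """Receiver side: repeat the sum and compare it with the header value."""
    return simple_checksum(received) == checksum


data = [89, 111, 33, 32, 58, 41]
sent_checksum = simple_checksum(data)    # 364, as in the example above

corrupted = [89, 111, 33, 32, 58, 40]    # one value changed in transit
ok = is_intact(data, sent_checksum)      # sums match
bad = is_intact(corrupted, sent_checksum)  # sums differ, corruption detected
```

A plain sum cannot catch every error. For instance, two values that change by opposite amounts would cancel out and go unnoticed.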
Unfortunately, UDP doesn’t offer any mechanisms to fix the data, or request a new copy – receiving
programs are alerted to the corruption, but typically just discard the packet.
Also, UDP provides no mechanisms to know if packets are getting through – a sending
computer shoots the UDP packet off, but has no confirmation it ever gets to its destination
successfully.
Both of these properties sound pretty catastrophic, but some applications are ok with this, because
UDP is also really simple and fast.
Skype, for example, which uses UDP for video chat, can handle corrupt or missing packets.
That’s why sometimes if you’re on a bad internet connection,
Skype gets all glitchy – only some of the UDP packets are making it to your computer.
Skype does the best it can with the data it does receive correctly.
But this approach doesn’t work for many other types of data transmission.
Like, it doesn’t really work if you send an email, and it shows up with the middle
missing.
The whole message really needs to get there correctly!
When it “absolutely, positively needs to get there”, programs use the Transmission
Control Protocol, or TCP, which like UDP, rides inside the data payload of IP packets.
For this reason, people refer to this combination of protocols as TCP/IP.
Like UDP, the TCP header contains a destination port and checksum.
But, it also contains fancier features, and we’ll focus on the key ones.
First off, TCP packets are given sequential numbers.
So packet 15 is followed by packet 16, which is followed by 17, and so on... for potentially
millions of packets sent during that session.
These sequence numbers allow a receiving computer to put the packets into the correct order,
even if they arrive at different times across the network.
So if an email comes in all scrambled, the TCP implementation in your computer’s operating
system will piece it all together correctly.
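As a small sketch of that reordering step, assume each packet is modelled as a `(sequence_number, payload)` pair. That is a simplification for illustration and the function name is made up.

```python
def reassemble(packets):
    """Sort packets by sequence number and join their payloads."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _seq, payload in ordered)


# Packets 15, 16 and 17 arrive scrambled.
arrived = [(17, b"world"), (15, b"Hello"), (16, b", ")]
message = reassemble(arrived)
```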
Second, TCP requires that once a computer has correctly received a packet – and the
data passes the checksum – that it send back an acknowledgement, or “ACK” as the
cool kids say, to the sending computer.
Knowing the packet made it successfully, the sender can now transmit the next packet.
But this time, let’s say, it waits, and doesn’t get an acknowledgement packet back.
Something must be wrong. If enough time elapses, the sender will go ahead and just retransmit
the same packet.
It’s worth noting that the original packet might have actually gotten there, but the
acknowledgment is just really delayed.
Or perhaps it was the acknowledgment that was lost.
Either way, it doesn’t matter, because the receiver has those sequence numbers, and if
a duplicate packet arrives, it can be discarded.
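The acknowledge-and-retransmit loop above can be simulated with a short Python sketch. It makes strong simplifying assumptions: the sender waits for each ACK before moving on, data packets always arrive, and only ACKs can go missing. The `ack_arrives` callback is a hypothetical stand-in for the network.

```python
def send_reliably(packets, ack_arrives, max_tries=5):
    """Send (seq, payload) pairs, retransmitting until each one is ACKed.

    `ack_arrives(seq, attempt)` returns True if the ACK for that
    transmission made it back to the sender before the timeout.
    """
    delivered = {}      # what the receiver has accepted, keyed by sequence number
    transmissions = 0   # every packet the sender put on the wire
    duplicates = 0      # copies the receiver discarded
    for seq, payload in packets:
        for attempt in range(1, max_tries + 1):
            transmissions += 1
            if seq in delivered:
                duplicates += 1          # receiver has this sequence number already
            else:
                delivered[seq] = payload
            if ack_arrives(seq, attempt):
                break                    # sender moves on to the next packet
        else:
            raise TimeoutError(f"no ACK for packet {seq}")
    return delivered, transmissions, duplicates


# The ACK for packet 16 is lost on the first attempt only.
lossy = lambda seq, attempt: not (seq == 16 and attempt == 1)
delivered, transmissions, duplicates = send_reliably(
    [(15, b"a"), (16, b"b"), (17, b"c")], lossy
)
```

Here the sender transmits four packets for three pieces of data, and the receiver throws away the one duplicate of packet 16.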
Also, TCP isn’t limited to a back and forth conversation – it can send many packets,
and have many outstanding ACKs, which increases bandwidth significantly, since you aren’t
wasting time waiting for acknowledgment packets to return.
Interestingly, the success rate of ACKs, and also the round trip time between sending and
acknowledging, can be used to infer network congestion.
TCP uses this information to adjust how aggressively it sends packets – a mechanism for congestion
control.
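One well-known way to adjust sending aggressiveness is additive increase, multiplicative decrease (AIMD). The episode does not name a specific algorithm, so the Python sketch below is an assumption used for illustration: it grows a hypothetical window by one packet per successful ACK and halves it when an ACK goes missing.

```python
def next_window(window, ack_received, minimum=1):
    """Grow the window by one on success; halve it when an ACK is missing."""
    if ack_received:
        return window + 1
    return max(minimum, window // 2)


window = 1
history = []
for ack in [True, True, True, False, True]:
    window = next_window(window, ack)
    history.append(window)
# history is now [2, 3, 4, 2, 3]
```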
So, basically, TCP can handle out-of-order packet delivery, dropped packets – including
retransmission – and even throttle its transmission rate according to available bandwidth.
Pretty awesome!
You might wonder why anyone would use UDP when TCP has all these nifty features.
The single biggest downside is all those acknowledgment packets – it doubles the
number of messages on the network, and yet, you're not transmitting any more data.
That overhead, including associated delays, is sometimes not worth the improved robustness,
especially for time-critical applications, like Multiplayer First Person Shooters.
And if it’s you getting lag-fragged you’ll definitely agree!
When your computer wants to make a connection to a website, you need two things - an IP
address and a port.
Like port 80, at 172.217.7.238.
This example is the IP address and port for the Google web server.
In fact, you can enter this into your browser’s address bar, like so, and you’ll end up
on the google homepage.
This gets you to the right destination, but remembering that long string of digits would
be really annoying.
It’s much easier to remember: google.com.
So the internet has a special service that maps these domain names to addresses.
It’s like the phone book for the internet.
And it’s called the Domain Name System, or DNS for short.
You can probably guess how it works.
When you type something like “youtube.com” into your web browser, it goes and asks a
DNS server – usually one provided by your ISP – to lookup the address.
DNS consults its huge registry, and replies with the address... if one exists.
In fact, if you try mashing your keyboard, adding “.com”, and then hit enter in your
browser, you’ll likely be presented with an error that says DNS failed.
That’s because that site doesn’t exist, so DNS couldn’t give your browser an address.
But, if DNS returns a valid address, which it should for “youtube.com”, then your
browser shoots off a request over TCP for the website’s data.
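A program can ask for the same lookup through the operating system's resolver. This minimal Python sketch uses the standard library call `socket.gethostbyname`. The wrapper name `resolve` is made up, and a real lookup needs network access.

```python
import socket


def resolve(hostname):
    """Return the IPv4 address for `hostname`, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror, a subclass of OSError, is raised when the
        # name cannot be resolved.
        return None
```

For example, `resolve("youtube.com")` should return an address string, while a keyboard-mashed name would typically come back as `None`.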
There’s over 300 million registered domain names, so to make that DNS Lookup a little
more manageable, it’s not stored as one gigantically long list, but rather in a tree
data structure.
What are called Top Level Domains, or TLDs, are at the very top.
These are huge categories like .com and .gov.
Then, there are lower level domains that sit below that, called second level domains; examples
under .com include google.com and dftba.com.
Then, there are even lower level domains, called subdomains, like images.google.com and
store.dftba.com.
And this tree is absolutely HUGE!
Like I said, more than 300 million domain names, and that's just second level domain
names, not all the sub domains.
For this reason, this data is distributed across many DNS servers, which are authorities
for different parts of the tree.
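The tree described above can be sketched as nested Python dictionaries, walked from the top level domain down. The addresses for google.com and dftba.com are the ones quoted in this episode. The subdomain addresses are made-up placeholders from the 192.0.2.0/24 documentation range, and the `node` and `lookup` helpers are hypothetical. Real DNS spreads this tree across many servers instead of holding it in one structure.

```python
def node(address=None, **children):
    """A tree node: an optional address plus child domains keyed by label."""
    return {"address": address, "children": children}


root = node(
    com=node(
        google=node("172.217.7.238", images=node("192.0.2.10")),  # placeholder
        dftba=node("104.24.109.186", store=node("192.0.2.20")),   # placeholder
    ),
    gov=node(),
)


def lookup(name):
    """Walk the tree from the top level domain down to the full name."""
    current = root
    for label in reversed(name.lower().split(".")):
        current = current["children"].get(label)
        if current is None:
            return None  # no such domain in the tree
    return current["address"]
```

`lookup("store.dftba.com")` visits `com`, then `dftba`, then `store`, so each step only searches one small branch.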
Okay, I know you’ve been waiting for it...
We’ve reached a new level of abstraction!
Over the past two episodes, we’ve worked up from electrical signals on wires, or radio
signals transmitted through the air in the case of wireless networks.
This is called the Physical Layer.
MAC addresses, collision detection, exponential backoff and similar low level protocols that
mediate access to the physical layer are part of the Data Link Layer.
Above this is the Network Layer, which is where all the switching and routing technologies
that we discussed operate.
And today, we mostly covered the Transport layer, protocols like UDP and TCP, which are
responsible for point to point data transfer between computers, and also things like error
detection and recovery when possible.
We’ve also grazed the Session Layer – where protocols like TCP and UDP are used to open
a connection, pass information back and forth, and then close the connection when finished
– what’s called a session.
This is exactly what happens when you, for example, do a DNS Lookup, or request a webpage.
These are the bottom five layers of the Open Systems Interconnection (OSI) model, a conceptual
framework for compartmentalizing all these different network processes.
Each level has different things to worry about and solve, and it would be impossible to build
one huge networking implementation.
As we’ve talked about all series, abstraction allows computer scientists and engineers to
be improving all these different levels of the stack simultaneously, without being overwhelmed
by the full complexity.
And amazingly, we’re not quite done yet…
The OSI model has two more layers, the Presentation Layer and the Application Layer, which include
things like web browsers, Skype, HTML decoding, streaming movies and more.
Which we’ll talk about next week.
See you then.