Running out of TCP source ports

The Backend Engineering Show
26 Aug 2024 · 18:33

Summary

TL;DR: In this episode of The Backend Engineering Show, Hussein Nasser discusses a peculiar bug in which a web server stopped responding after a few thousand requests. The root cause was ephemeral port exhaustion: the web server opened a new connection to the message broker for each request instead of reusing an existing one. The connections piled up until no source ports were left, and the system eventually failed. Nasser explains the technical details of TCP connections, the role of ephemeral ports, and the importance of efficient connection management in software engineering.
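To make the role of ephemeral ports concrete, here is a minimal Python sketch, not code from the episode. It opens a few connections to a throwaway local listener and records the source port the operating system picked for each one. The function name `open_connections` and the use of `127.0.0.1` are assumptions made purely for illustration.

```python
import socket


def open_connections(count):
    """Open `count` client connections to a throwaway local listener.

    Returns one (source_ip, source_port, dest_ip, dest_port) tuple per connection.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(count)           # handshakes complete in the backlog
    dest_ip, dest_port = listener.getsockname()

    clients, tuples = [], []
    try:
        for _ in range(count):
            client = socket.create_connection((dest_ip, dest_port))
            clients.append(client)  # keep it open so its source port stays taken
            src_ip, src_port = client.getsockname()
            tuples.append((src_ip, src_port, dest_ip, dest_port))
    finally:
        for client in clients:
            client.close()
        listener.close()
    return tuples


if __name__ == "__main__":
    for four_tuple in open_connections(5):
        print(four_tuple)
```

Every connection shares the same destination IP and port, so the source port is the only part of the tuple that can differ. Each connection held open therefore takes one port out of the ephemeral range.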

Takeaways

  • 🐛 The speaker discusses the inevitability of bugs in software engineering and the process of encountering, addressing, and sometimes working around them.
  • 📡 The video focuses on a specific bug related to TCP connections that the speaker had not previously encountered.
  • 🌐 The bug occurred in a system involving a web server and a message broker, where the web server was not responding after a certain number of requests.
  • 🔍 After several thousand requests, the system's behavior changed, leading to client and proxy timeouts, which initially obscured the root cause.
  • 📈 The speaker discovered an unusually high number of connections from the web server to the message broker, which should have been limited to one or two.
  • 🚀 The issue was traced to the web server running out of ephemeral ports, which are temporary ports assigned by the operating system for outgoing connections.
  • 🛠️ The problem was caused by inefficient code that created a new connection for every request instead of reusing existing connections, leading to a resource leak.
  • 🔄 The web server was also keeping connections alive with a custom keep-alive mechanism, which exacerbated the issue by maintaining unnecessary connections.
  • 💡 Once the bug was identified and fixed by ensuring connections were not unnecessarily recreated, the system returned to normal operation with a single, efficient connection.
  • 🔗 The speaker also explored whether this was a common issue and referenced a blog post about running out of ephemeral ports, indicating it's a potential pitfall for developers.
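The bug and its fix described in the takeaways can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the episode: `Publisher`, `simulate`, and the local listener standing in for the message broker are all hypothetical names.

```python
import socket


class Publisher:
    """Hypothetical stand-in for the web server's message broker client."""

    def __init__(self, broker_address, reuse_connection):
        self.broker_address = broker_address
        self.reuse_connection = reuse_connection
        self.open_sockets = []  # every socket here holds one ephemeral port

    def _get_connection(self):
        if self.reuse_connection and self.open_sockets:
            return self.open_sockets[0]  # the fix: reuse the existing connection
        # The bug when reuse_connection is False: a new connection per request,
        # kept open (as a keep-alive would) and never closed.
        conn = socket.create_connection(self.broker_address)
        self.open_sockets.append(conn)
        return conn

    def publish(self, payload):
        self._get_connection().sendall(payload)

    def ports_in_use(self):
        return {conn.getsockname()[1] for conn in self.open_sockets}

    def close(self):
        for conn in self.open_sockets:
            conn.close()
        self.open_sockets.clear()


def simulate(requests, reuse_connection):
    """Handle `requests` fake requests; return how many source ports were held."""
    broker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    broker.bind(("127.0.0.1", 0))
    broker.listen(128)  # connections complete in the backlog; nothing reads them
    publisher = Publisher(broker.getsockname(), reuse_connection)
    try:
        for i in range(requests):
            publisher.publish(b"job %d\n" % i)
        return len(publisher.ports_in_use())
    finally:
        publisher.close()
        broker.close()


if __name__ == "__main__":
    print("new connection per request:", simulate(50, reuse_connection=False))
    print("single reused connection:  ", simulate(50, reuse_connection=True))
```

With 50 simulated requests, the leaky variant holds 50 source ports while the reusing variant holds one. Scaled up to real traffic, the first pattern is what drains the ephemeral range.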

Q & A

  • What is the main issue discussed in the video script?

    -The main issue discussed is a software bug related to running out of ephemeral ports in TCP connections, which leads to the web server not being able to handle new requests properly.

  • Why are bugs considered an integral part of a software engineer's experience?

    -Bugs are considered an integral part of a software engineer's experience because they represent challenges that engineers encounter, diagnose, and overcome, which ultimately leads to learning and improvement in their skills.

  • What is the significance of the number 65,000 in the context of the script?

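    -The number 65,000 is a rough figure for the size of the TCP port space. Port numbers are 16 bits wide, so there are at most 65,535 usable ports, and the ephemeral range the operating system draws source ports from is a configurable subset of that. The limit applies to connections from one source IP to the same destination IP and port, and it can be reached if the system is not managing connections efficiently.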

  • What is the role of the message broker in the system described?

    -The message broker in the system is responsible for receiving messages or jobs submitted by the web server and processing them or passing them on to downstream services for further handling.

  • What happens after a few thousand requests in the described scenario?

    -After a few thousand requests, the web server stops responding, leading to client timeouts and, in some cases, proxy timeouts if a proxy is involved.

  • Why did the investigation focus on the number of connections from the web server to the message broker?

    -The investigation focused on the number of connections because an unusually high number of outgoing connections from the web server to the message broker was observed, which indicated a potential issue with how connections were being managed.

  • What is the significance of the source port in TCP connections?

    -The source port is significant in TCP connections because it, along with the source IP, destination IP, and destination port, forms the tuple that uniquely identifies a connection. The operating system assigns a random source port from the ephemeral port range to ensure this uniqueness.

  • Why did the web server start creating excessive connections?

    -The web server started creating excessive connections due to a bug that caused it to create a new connection for every request instead of reusing existing connections, leading to a rapid depletion of available ephemeral ports.

  • What is the 'pingpong' protocol mentioned in the script?

    -The 'pingpong' protocol mentioned is an application layer keep-alive mechanism similar to WebSocket's ping/pong, where the application sends periodic messages to maintain an active connection, even if there is no actual data to transmit.

  • How was the issue of running out of ephemeral ports resolved in the script?

    -The issue was resolved by fixing the bug in the web server's logic that incorrectly created a new connection for every request instead of reusing existing connections, thus preventing the exhaustion of ephemeral ports.
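To put the 65,000 figure from the Q&A above into numbers, the following sketch computes how many ports a given range contains and how many requests a leak could serve before using them all. The function names are hypothetical, and the ranges shown are common defaults (the Linux `net.ipv4.ip_local_port_range` default and the IANA dynamic range), not values taken from the episode; the range on a real system depends on its configuration.

```python
def port_count(low, high):
    """Number of ports in the inclusive range [low, high]."""
    if not (0 <= low <= high <= 65535):
        raise ValueError("ports must satisfy 0 <= low <= high <= 65535")
    return high - low + 1


def requests_until_exhaustion(low, high, connections_per_request=1):
    """Requests served before leaking `connections_per_request` each uses every port."""
    if connections_per_request < 1:
        raise ValueError("connections_per_request must be at least 1")
    return port_count(low, high) // connections_per_request


if __name__ == "__main__":
    print(port_count(0, 65535))      # every 16-bit port number: 65536
    print(port_count(32768, 60999))  # common Linux default range: 28232
    print(port_count(49152, 65535))  # IANA dynamic/private range: 16384
    print(requests_until_exhaustion(32768, 60999, connections_per_request=2))  # 14116
```

These counts apply per destination IP and port from a single source IP. Connections that are closed can also hold their port for a while in the TIME_WAIT state, so a system may hit the limit somewhat earlier than the raw arithmetic suggests.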
