In our business model, backend services connect to the group's market gateway over TCP. On each connection, a service must first send an authorization request and then keep sending heartbeat packets to keep the connection alive. One day, however, we received an alert indicating that a service had been disconnected. After carefully checking the logs, we found that the backend service was still sending heartbeat packets and the other side was not responding, yet the connection was never closed.
A brief description of the incident
I was working overtime at the office to push the project forward when an alert suddenly popped up in the work group chat. At first glance, I assumed it was the usual issue: a network timeout causing heartbeat failures and disconnecting the service. After carefully checking the logs, however, I found that this was not what had happened. The backend had sent the authorization login message but received no response. Meanwhile, heartbeats kept being sent, yet the other side never replied with any heartbeat data. A closer analysis of the logs revealed the following key issues:
- The authorization message received no response: most likely the other side's system was restarting, so the authorization message was not processed in time
- Heartbeats were sent even though authorization had failed: we found a flaw in the program logic. The heartbeat-sending function checks only the connection status and ignores the authorization status
- If the connection had been closed, the reconnection mechanism would have been triggered and the authorization message resent
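The fix for the second issue can be sketched as follows. This is a minimal illustration, not the service's actual code: the class GatewaySession, its method names, and the 5-second authorization timeout are all hypothetical. It gates heartbeats on both the connection status and the authorization status, and treats a missing authorization reply as a reason to drop the connection so that the reconnection path resends the authorization request.

```python
from typing import Optional


class GatewaySession:
    """Hypothetical connection state for a TCP link to the market gateway."""

    def __init__(self, auth_timeout: float = 5.0) -> None:
        self.auth_timeout = auth_timeout  # assumed: seconds to wait for an auth reply
        self.connected = False
        self.authorized = False
        self.auth_sent_at: Optional[float] = None

    def on_connected(self, now: float) -> None:
        # The authorization request is sent right after the TCP connection opens.
        self.connected = True
        self.authorized = False
        self.auth_sent_at = now

    def on_auth_response(self, ok: bool) -> None:
        self.authorized = ok

    def should_send_heartbeat(self) -> bool:
        # The flawed version checked only `connected`; both conditions must hold.
        return self.connected and self.authorized

    def should_reconnect(self, now: float) -> bool:
        # No authorization reply within the timeout: close the link so the
        # reconnection mechanism resends the authorization request.
        return (
            self.connected
            and not self.authorized
            and self.auth_sent_at is not None
            and now - self.auth_sent_at >= self.auth_timeout
        )


session = GatewaySession(auth_timeout=5.0)
session.on_connected(now=0.0)
print(session.should_send_heartbeat())    # False: not authorized yet
print(session.should_reconnect(now=6.0))  # True: no auth reply after 5 s
```

Passing `now` in explicitly, instead of reading a clock inside the class, keeps the state checks easy to test.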
That leaves one last urgent question: why was the connection never closed? Answering it requires more in-depth troubleshooting.
Analyzing network packets
tcpdump is a powerful packet capture tool. By analyzing the captured packets, we can see the details of network communication much more directly. Here, we use tcpdump to capture the traffic for further analysis.
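When reading such a capture in tcpdump's default text output, the TCP flags appear in brackets, using the letters documented in the tcpdump man page (S, F, P, R, U, W, E, and "." for ACK). The sketch below, assuming that default one-line format, extracts the flags and payload length from a line. parse_line is a hypothetical helper, and the two sample lines are made up for illustration (documentation IP addresses, arbitrary ports); they are not taken from this incident.

```python
import re
from typing import Set, Tuple

# Flag letters used in tcpdump's output (see the tcpdump man page); "." is ACK.
TCPDUMP_FLAGS = {"S": "SYN", "F": "FIN", "P": "PSH", "R": "RST",
                 "U": "URG", "W": "CWR", "E": "ECE", ".": "ACK"}

FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")
LENGTH_RE = re.compile(r"length (\d+)")


def parse_line(line: str) -> Tuple[Set[str], int]:
    """Return (flag names, TCP payload length) for one tcpdump text line."""
    flags: Set[str] = set()
    match = FLAGS_RE.search(line)
    if match and match.group(1) != "none":
        flags = {TCPDUMP_FLAGS[c] for c in match.group(1) if c in TCPDUMP_FLAGS}
    # Take the last "length N" in case an IP-level length also appears on the line.
    lengths = LENGTH_RE.findall(line)
    return flags, int(lengths[-1]) if lengths else 0


# Hypothetical lines: our heartbeat (PSH+ACK with data), then the peer's bare ACK.
sample = [
    "10:00:00.000000 IP 192.0.2.10.40000 > 198.51.100.20.9000: "
    "Flags [P.], seq 1:33, ack 1, win 502, length 32",
    "10:00:00.002000 IP 198.51.100.20.9000 > 192.0.2.10.40000: "
    "Flags [.], ack 33, win 509, length 0",
]
for line in sample:
    flags, length = parse_line(line)
    print(sorted(flags), length)
# ['ACK', 'PSH'] 32
# ['ACK'] 0
```

A long run of "[P.] with data" from our side answered only by "[.] length 0" from the peer is exactly the pattern described next.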
Analyzing the capture in the graph, I could see that heartbeats were being sent consistently and the other server never responded with any data, yet it did reply with an ACK for each heartbeat. Because every segment is acknowledged at the TCP level, the connection looks healthy to TCP and is never torn down on its own.
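Since these ACKs are generated by the peer's TCP stack rather than by its application, and our application never sees them (the socket API only reports that a send succeeded), liveness has to be judged at the application layer from heartbeat replies. Enabling TCP keepalive would not help here either, because the peer's TCP stack would acknowledge the keepalive probes as well. Below is a minimal sketch of such a check, assuming a fixed heartbeat interval and a hypothetical tolerance of three missed replies; HeartbeatWatchdog is an illustrative name, not part of the original service.

```python
from typing import Optional


class HeartbeatWatchdog:
    """Hypothetical application-level liveness check based on heartbeat replies.

    TCP ACKs come from the peer's kernel and are invisible to our application,
    so only replies produced by the peer's application count as proof of life.
    """

    def __init__(self, interval: float, max_missed: int = 3) -> None:
        self.interval = interval      # assumed heartbeat period in seconds
        self.max_missed = max_missed  # assumed number of missed replies tolerated
        self.last_reply: Optional[float] = None

    def start(self, now: float) -> None:
        self.last_reply = now

    def on_heartbeat_reply(self, now: float) -> None:
        self.last_reply = now

    def is_dead(self, now: float) -> bool:
        # Dead once more than max_missed intervals pass without any reply.
        if self.last_reply is None:
            return False
        return now - self.last_reply > self.interval * self.max_missed


watchdog = HeartbeatWatchdog(interval=10.0, max_missed=3)
watchdog.start(now=0.0)
print(watchdog.is_dead(now=25.0))  # False: still within 30 s
print(watchdog.is_dead(now=31.0))  # True: close the socket and reconnect
```

Once is_dead returns True, closing the socket lets the existing reconnection mechanism take over and resend the authorization message.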
Common TCP Flags Explained
In the TCP protocol, PSH (Push) and ACK (Acknowledgment) are two important flags used to control data delivery and acknowledgment. Their functions are as follows:
1. PSH (Push Flag)
- Features: The PSH flag asks the receiver to push the data from its buffer to the upper-layer application immediately (instead of waiting for the buffer to fill up). Once a segment with the PSH flag is received, the receiver passes the data to the application as quickly as possible rather than holding it in the operating system buffer.
- Typical scenarios:
  - HTTP/HTTPS requests: when a client sends a request (such as GET /index.html), the segment carrying it has the PSH flag set so that the server can respond immediately
  - The SSH protocol: each keystroke triggers a PSH, ensuring that typed characters are transmitted in real time
  - Real-time communication: low-latency scenarios such as video streams and online games may use PSH to reduce latency
- Note:
  - PSH is not mandatory; the receiver may ignore the flag (but must still process the data normally)
  - The sender may not set PSH at all, in which case the receiver decides when to push data based on its own buffering strategy
2. ACK (Acknowledgment Flag)
- Features: The ACK flag indicates that the acknowledgment number field is valid. The acknowledgment number is the sequence number of the next byte the receiver expects, which also confirms that every byte before it has been received. This cumulative acknowledgment is a core mechanism for reliable transmission in TCP.
- Working principle:
  - When the sender transmits a data segment, it expects the receiver to reply with ACK = sequence number + data length
  - Upon receiving data, the receiver sends an ACK segment whose acknowledgment number confirms the data received so far
  - The sender keeps unacknowledged data buffered and retransmits it if the corresponding ACK does not arrive in time (for example, when the retransmission timer expires)
- Example:
  - If the sender sends a segment covering sequence numbers 100~199, the receiver is expected to reply with ACK=200
  - If the receiver gets only bytes 100~149 (for example, the data was split into two segments and the second one was lost), it replies with ACK=150, and the sender eventually retransmits starting from byte 150
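The cumulative acknowledgment rule in the example can be checked with a short simulation. This is a simplified sketch: cumulative_ack is a hypothetical helper, and it ignores sequence-number wraparound, SACK, and receive-window limits.

```python
from typing import List, Tuple


def cumulative_ack(start_seq: int, received: List[Tuple[int, int]]) -> int:
    """Return the ACK number a receiver would send (simplified model).

    `received` holds (sequence number, length) for segments that arrived,
    in any order. The ACK is the first byte not yet received contiguously
    from start_seq; data arriving after a gap does not advance it.
    Wraparound, SACK, and window limits are ignored.
    """
    expected = start_seq
    for seq, length in sorted(received):
        if seq > expected:  # gap: later data stays unacknowledged for now
            break
        expected = max(expected, seq + length)
    return expected


print(cumulative_ack(100, [(100, 100)]))  # 200: bytes 100~199 all arrived
print(cumulative_ack(100, [(100, 50)]))   # 150: bytes 150~199 were lost
print(cumulative_ack(100, [(150, 50)]))   # 100: the first half was lost
```

The max() call also covers overlapping retransmissions, where a resent segment partly repeats bytes that were already received.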
The combination of PSH and ACK
In TCP packets, PSH and ACK can appear together, commonly in the following scenarios:
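- HTTP request and response: when the client sends a POST request (with data), the segment has both PSH and ACK set (the ACK acknowledges data previously received from the server). Using relative sequence numbers:
  - Client ↔ Server: SYN / SYN-ACK / ACK → establish the connection (three-way handshake)
  - Client → Server: PSH, ACK=1, data → send the request data
  - Server → Client: PSH, ACK=data length+1 → return the response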
- Sending commands after the SSH handshake: after the user enters a command, the client sends a segment with PSH and ACK set so that the command is transmitted and processed by the server immediately
Other related flags
Flag | Name | Brief Description |
---|---|---|
SYN | Synchronize | Initialize a connection (three-way handshake) |
FIN | Finish | Gracefully close the connection |
RST | Reset | Forcibly terminate the connection (abnormal situations) |
URG | Urgent | Mark the urgent pointer as valid (rarely used) |
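To see how these flags, together with PSH and ACK, are encoded on the wire, the sketch below decodes the flags byte of a raw TCP header. The bit values follow the TCP specification (RFC 9293); the helper names decode_flags and parse_tcp_header and the sample header values (ports, sequence numbers, window) are made up for illustration.

```python
import struct
from typing import Dict, List, Union

# TCP control-flag bits in byte 13 of the header (RFC 9293).
FLAG_BITS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"), (0x08, "PSH"),
             (0x10, "ACK"), (0x20, "URG"), (0x40, "ECE"), (0x80, "CWR")]


def decode_flags(flags: int) -> List[str]:
    """Return the names of the flags set in a TCP flags byte."""
    return [name for bit, name in FLAG_BITS if flags & bit]


def parse_tcp_header(raw: bytes) -> Dict[str, Union[int, List[str]]]:
    """Parse the fixed 20-byte part of a TCP header (network byte order)."""
    src, dst, seq, ack, off_flags, window, _checksum, _urgent = struct.unpack(
        "!HHIIHHHH", raw[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # data offset is in 32-bit words
        "flags": decode_flags(off_flags & 0xFF),
        "window": window,
    }


# Hypothetical header: data offset 5 (20 bytes), flags 0x18 = PSH + ACK.
header = struct.pack("!HHIIHHHH", 40000, 9000, 1, 1, (5 << 12) | 0x18, 502, 0, 0)
print(parse_tcp_header(header)["flags"])  # ['PSH', 'ACK']
print(decode_flags(0x12))                 # ['SYN', 'ACK']
print(decode_flags(0x11))                 # ['FIN', 'ACK']
```

Because each flag is a single bit, any combination (such as PSH+ACK on a heartbeat, or a bare ACK in reply) is just the sum of the individual bit values.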
Summary
- PSH focuses on getting data to the application layer as quickly as possible, reducing latency
- ACK focuses on reliable delivery, letting the sender detect lost data and retransmit it
Together, they balance the efficiency and reliability of the TCP protocol