Troubleshooting a TCP Communication Anomaly in a Backend Service

Business scenario: The backend service connects to the group's market data gateway over TCP. After each connection is established, it must first send an authorization request and then keep sending heartbeat packets to keep the connection alive.

One day, however, we received an alert saying the service had been disconnected. A careful look at the logs showed that the backend service kept sending heartbeat packets and the other side never responded at all, yet the connection stayed open.

Incident Summary

I was working in the office, pushing a project forward, when an alert suddenly popped up in the company chat group. At first glance I assumed it was a familiar problem: a network timeout causing heartbeat failures and dropping the connection. A closer look at the logs told a different story. The backend had sent the authorization (login) message but never received a response; meanwhile it kept sending heartbeat packets, and the other side never replied with any heartbeat data. Digging deeper into the logs exposed several key issues:

  1. No response to the authorization message: most likely the peer system was restarting at the time and could not process the authorization message promptly.
  2. Heartbeats sent without successful authorization: this turned out to be a logic flaw in our program. The check guarding heartbeat sending only verified the connection status and did not verify the authorization status.
  3. The service did not disconnect: had it disconnected, the reconnection mechanism would have kicked in and re-sent the authorization message.
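Issue 2 comes down to one missing condition. Below is a minimal sketch of a guarded check, assuming a simple in-memory session object; `GatewaySession`, its methods and the 10-second `AUTH_TIMEOUT_S` are hypothetical names and values, not the original service's code. It also shows one possible answer to issue 3: treating an unanswered authorization request as a reason to close the connection so that reconnection and re-authorization happen.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value: how long to wait for the authorization response.
AUTH_TIMEOUT_S = 10.0


@dataclass
class GatewaySession:
    """Hypothetical connection state for the market data gateway client."""
    connected: bool = False
    authorized: bool = False
    auth_sent_at: Optional[float] = None

    def on_connected(self) -> None:
        # Every new TCP connection starts out unauthorized.
        self.connected = True
        self.authorized = False
        self.auth_sent_at = None

    def on_auth_sent(self, now: float) -> None:
        self.auth_sent_at = now

    def on_auth_response(self, ok: bool) -> None:
        self.authorized = self.connected and ok

    def on_disconnected(self) -> None:
        self.connected = False
        self.authorized = False
        self.auth_sent_at = None

    def can_send_heartbeat(self) -> bool:
        # The defective check looked only at `connected`;
        # authorization must have succeeded as well.
        return self.connected and self.authorized

    def auth_timed_out(self, now: float) -> bool:
        # True if the auth request has gone unanswered for too long. The caller
        # should then close the socket so the reconnect path re-sends auth.
        return (
            self.connected
            and not self.authorized
            and self.auth_sent_at is not None
            and now - self.auth_sent_at > AUTH_TIMEOUT_S
        )


if __name__ == "__main__":
    s = GatewaySession()
    s.on_connected()
    s.on_auth_sent(now=0.0)
    print(s.can_send_heartbeat())      # False: not authorized yet
    print(s.auth_timed_out(now=15.0))  # True: 15 s > 10 s, so reconnect
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the state logic easy to test; a real service would feed it something like a monotonic clock.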

That leaves one critical open question: why didn't the service disconnect? Answering it required deeper and more detailed troubleshooting.

Analyzing Network Packets

tcpdump is a powerful packet capture tool, and analyzing the captured packets gives a much more concrete view of what actually happens on the wire. Here we used tcpdump to capture the traffic for further analysis. The capture showed that the heartbeats were going out normally and the peer server sent back no data at all, but it did answer every heartbeat with an ACK. Because each segment was acknowledged, our side's TCP stack considered the data delivered: there was nothing to retransmit and nothing to time out, so the connection was never torn down. A TCP-level ACK only proves that the peer's operating system received the bytes, not that the peer application processed them.
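Since those ACKs come from the peer's kernel, only an application-level check can notice that the gateway itself has stopped answering. The following is a minimal sketch of such a watchdog, assuming the gateway is expected to send something back within a bounded time; `HeartbeatWatchdog`, `heartbeat_tick` and the 30-second `REPLY_TIMEOUT_S` are hypothetical choices, not the service's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed value: maximum silence tolerated from the gateway application.
REPLY_TIMEOUT_S = 30.0


@dataclass
class HeartbeatWatchdog:
    """Hypothetical tracker of the last *application-level* reply.

    TCP ACKs are handled by the kernel and never reach user code, so only
    payload actually read from the socket should call on_reply().
    """
    last_reply_at: float

    def on_reply(self, now: float) -> None:
        self.last_reply_at = now

    def is_stale(self, now: float) -> bool:
        return now - self.last_reply_at > REPLY_TIMEOUT_S


def heartbeat_tick(
    watchdog: HeartbeatWatchdog,
    now: float,
    send_heartbeat: Callable[[], None],
    close_connection: Callable[[], None],
) -> str:
    """Run once per heartbeat interval; returns what it did."""
    if watchdog.is_stale(now):
        # Closing the socket hands control to the reconnect + re-auth path.
        close_connection()
        return "closed"
    send_heartbeat()
    return "sent"
```

Whether only heartbeat replies or any inbound payload (such as market data) should reset the watchdog depends on the gateway protocol; the sketch leaves that decision to whoever calls `on_reply()`.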

Common TCP Flags Explained

In the TCP protocol, PSH (Push) and ACK (Acknowledgment) are two important flags, used respectively to control data delivery and to acknowledge received data. Their roles are as follows:

1. PSH (Push Flag)

  • Function: PSH asks the receiver to push the data in its buffer to the upper-layer application immediately, rather than waiting for the buffer to fill. When a segment with PSH set arrives, the receiver should deliver it to the application as quickly as possible instead of holding it in the operating system's buffer.
  • Typical Scenarios:
    • HTTP/HTTPS requests: the client's TCP stack typically sets PSH on the segment carrying the request (e.g., GET /index.html) so the server application receives it without delay.
    • SSH: each keystroke is sent in a segment with PSH set, so typed characters are transmitted in real time.
    • Real-Time Communication: Low-latency scenarios like video streaming or online games may utilize PSH to reduce latency.
  • Note:
    • PSH is not mandatory; the receiver can choose to ignore this flag (but still process the data normally).
    • If the sender does not set PSH, the receiver decides when to deliver data to the application according to its own buffering strategy.

2. ACK (Acknowledgment Flag)

  • Function: The ACK flag indicates that the Acknowledgment Number field is valid. The acknowledgment number is the sequence number of the next byte the receiver expects, which confirms that every byte before it has been received correctly. It is the core mechanism behind TCP's reliable delivery.
  • Working Principle:
    • After sending a data segment, the sender expects the receiver to acknowledge it with ACK = Sequence Number + Data Length.
    • Upon receiving the data, the receiver generates an ACK message to confirm the received byte sequence number.
    • Acknowledged data can be released from the send buffer; data that is still unacknowledged when the retransmission timer expires (or after several duplicate ACKs) is retransmitted.
  • Example:
    • If the sender sends a data segment covering sequence numbers 100 to 199 (100 bytes), the receiver should reply with ACK = 200.
    • If the receiver has only received bytes 100 to 149, it replies with ACK = 150, telling the sender that data from byte 150 onward is still missing and must be retransmitted.
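The acknowledgment arithmetic in these examples can be checked with a small sketch. It models only cumulative ACKs, ignoring selective acknowledgment (SACK) and 32-bit sequence-number wraparound; the function names are illustrative.

```python
from typing import Iterable, Tuple


def expected_ack(seq: int, length: int) -> int:
    """ACK number that confirms a segment of `length` bytes starting at `seq`."""
    return seq + length


def cumulative_ack(initial_seq: int, received: Iterable[Tuple[int, int]]) -> int:
    """ACK number a receiver would send for the bytes it holds.

    `received` lists half-open byte ranges [start, end) that have arrived,
    in any order. The result is the first byte not yet received contiguously
    from `initial_seq`; bytes after a gap are not covered by it.
    Simplifications: no SACK, no 32-bit sequence-number wraparound.
    """
    ack = initial_seq
    for start, end in sorted(received):
        if start > ack:
            break  # gap: everything from `ack` onward counts as missing
        ack = max(ack, end)
    return ack


if __name__ == "__main__":
    print(expected_ack(100, 100))                         # 200
    print(cumulative_ack(100, [(100, 150)]))              # 150
    print(cumulative_ack(100, [(150, 200), (100, 150)]))  # 200
```

Because the ACK is cumulative, bytes that arrive after a gap stay unacknowledged by this number until the gap is filled, which is why the sender resumes from byte 150 in the second example.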

3. Combination of PSH and ACK

In the TCP header, PSH and ACK can appear simultaneously, commonly seen in the following scenarios:

  • HTTP request/response: when a client sends a POST request with a body, the segment carries both PSH and ACK. ACK is set because every segment after the initial SYN carries a valid acknowledgment number, here acknowledging the data received from the server so far.
  • Command transfer in an established SSH session: after the user enters a command, the client sends a segment with PSH and ACK set so that the server receives and processes the command immediately.
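In a raw TCP header these flags are individual bits of a single byte at offset 13, so PSH together with ACK shows up as 0x18; tcpdump prints that combination as "[P.]". A small decoding sketch, with illustrative function names:

```python
from typing import List

# Bit masks of the TCP flags byte (offset 13 of the TCP header, RFC 9293).
TCP_FLAGS = {
    "CWR": 0x80,
    "ECE": 0x40,
    "URG": 0x20,
    "ACK": 0x10,
    "PSH": 0x08,
    "RST": 0x04,
    "SYN": 0x02,
    "FIN": 0x01,
}


def decode_flags(flags_byte: int) -> List[str]:
    """Names of the flags set in `flags_byte`, high bit first."""
    return [name for name, mask in TCP_FLAGS.items() if flags_byte & mask]


def flags_from_header(tcp_header: bytes) -> List[str]:
    """Decode the flags of a raw TCP header (at least 20 bytes)."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    return decode_flags(tcp_header[13])


if __name__ == "__main__":
    print(decode_flags(0x18))  # ['ACK', 'PSH']
    print(decode_flags(0x12))  # ['ACK', 'SYN']
```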

4. Other Related Flags

Flag | Name | Brief Description
SYN | Synchronize | Initiates a connection (three-way handshake)
FIN | Finish | Graceful connection closure
RST | Reset | Forcefully terminates the connection (abnormal situations)
URG | Urgent | Marks the urgent pointer as valid (rarely used)

Summary

  • PSH focuses on data arriving at the application layer as quickly as possible, reducing latency.
  • ACK focuses on reliable data transmission, letting the sender detect lost data and retransmit it so the receiver gets a complete, in-order byte stream.

The two work together to balance TCP protocol efficiency and reliability.

A financial IT programmer's tinkering and daily life musings