In software development and operations, hung processes are a common problem. A hung process can degrade performance or make a service unavailable. This article shows how to use the pstack tool to troubleshoot a hung process: capture the process's stack information, analyze it to identify the root cause, and fix it.
Background: A sub-service of the risk control system hung, which made the risk control service unavailable. Because there was no availability monitoring for the service, the hang was not detected in time, which led to system unavailability.
Text
A hung process is a process that has stopped responding but has not exited. This can have various causes, such as deadlocks, resource exhaustion, or exceptions. To resolve these issues, we can use the pstack tool to analyze the process's stack information and identify the root cause.
Steps
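Install pstack: pstack is a commonly used tool, often provided alongside gdb (GNU Debugger). On Debian or Ubuntu you can install gdb with the following command (package contents vary by distribution, so check afterwards that the pstack command is available):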
sudo apt-get install gdb
Obtain Process ID: First, we need to obtain the process ID (PID) of the hung process. We can use the ps command to list all processes and find the PID of the process we want to investigate.
Use the pstack tool to analyze the process stack: Once you have the process ID, you can use pstack to retrieve the stack information for that process. Run the following command:
pstack <PID>
This will output the stack information of the process, displaying the sequence of function calls currently being executed. By analyzing this information, you can identify where the process is stuck and subsequently pinpoint the problem.
Analyze Stack Information: By examining the stack information, you can find out why the process hung. You may discover a deadlock, an infinite loop, or another abnormal condition. Take appropriate measures based on the specific situation, such as releasing locks or fixing the code logic.
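As an illustration of "fixing the code logic" for the deadlock case, here is a minimal sketch. It assumes a hypothetical program in which two threads take the same two mutexes in opposite orders; in such a deadlock, the stack output would typically show both threads blocked inside a mutex-lock call. The names mutex_a, mutex_b, shared_counter, and run_workers are invented for this example and do not come from the risk control service. The sketch avoids the deadlock by acquiring both mutexes with std::lock, which uses a deadlock-avoidance algorithm.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared state guarded by two mutexes (illustrative names).
std::mutex mutex_a;
std::mutex mutex_b;
long shared_counter = 0;

// Deadlock-prone pattern, shown only as a comment:
//   thread 1: mutex_a.lock(); mutex_b.lock();
//   thread 2: mutex_b.lock(); mutex_a.lock();
// Each thread can end up holding one mutex while waiting for the other.

// std::lock acquires both mutexes without deadlocking, regardless of the
// order in which they are passed. The lock_guards adopt the already-held
// mutexes so they are released when the scope ends.
void add_a_then_b(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        std::lock(mutex_a, mutex_b);
        std::lock_guard<std::mutex> guard_a(mutex_a, std::adopt_lock);
        std::lock_guard<std::mutex> guard_b(mutex_b, std::adopt_lock);
        ++shared_counter;
    }
}

void add_b_then_a(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        std::lock(mutex_b, mutex_a);  // opposite argument order is still safe
        std::lock_guard<std::mutex> guard_b(mutex_b, std::adopt_lock);
        std::lock_guard<std::mutex> guard_a(mutex_a, std::adopt_lock);
        ++shared_counter;
    }
}

// Runs both workers to completion and returns the final counter value.
long run_workers(int rounds) {
    assert(rounds >= 0);
    shared_counter = 0;
    std::thread t1(add_a_then_b, rounds);
    std::thread t2(add_b_then_a, rounds);
    t1.join();
    t2.join();
    return shared_counter;
}
```

Calling run_workers(10000) from main should return once both threads finish, with the counter equal to twice the number of rounds. With C++17, std::scoped_lock offers the same behavior in a single statement.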
Case Study
A simple demo: after the main function starts, it creates a child thread and waits for it. The thread function enters an infinite loop, so the program can never terminate normally and appears hung.
CMakeLists.txt:

cmake_minimum_required(VERSION 3.0.0)
project(pstack_main VERSION 0.1.0 LANGUAGES C CXX)

include(CTest)
enable_testing()

# Find the Threads library
find_package(Threads REQUIRED)

add_executable(pstack_main main.cpp)

# Link with the Threads library
target_link_libraries(pstack_main PRIVATE Threads::Threads)

set(CPACK_PROJECT_NAME ${PROJECT_NAME})
set(CPACK_PROJECT_VERSION ${PROJECT_VERSION})
include(CPack)

main.cpp:

#include <iostream>
#include <thread>
#include <chrono>

void infiniteLoop() {
    while (true) {
        // Child thread spins here forever; the loop has no exit condition
    }
}

int main() {
    std::thread thread(infiniteLoop); // Create a child thread that runs the infinite loop function
    thread.join(); // Block until the child thread ends (it never does)
    return 0;
}

Run the program and examine the pstack results:
Thread 2 (Thread 0x7eff3619b700 (LWP 1315017)):
#0 infiniteLoop () at /root/pstack/main.cpp:6
#1 0x0000000000402ca9 in std::__invoke_impl<void, void (*)()> (__f=@0x2260eb8: 0x4029a6 <infiniteLoop()>) at /usr/include/c++/8/bits/invoke.h:60
#2 0x0000000000402b02 in std::__invoke<void (*)()> (__fn=@0x2260eb8: 0x4029a6 <infiniteLoop()>) at /usr/include/c++/8/bits/invoke.h:95
#3 0x0000000000403150 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x2260eb8) at /usr/include/c++/8/thread:244
#4 0x0000000000403126 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x2260eb8) at /usr/include/c++/8/thread:253
#5 0x000000000040310a in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x2260eb0) at /usr/include/c++/8/thread:196
#6 0x00007eff36bceb23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#7 0x00007eff36ea91ca in start_thread () from /lib64/libpthread.so.0
#8 0x00007eff361d58d3 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7eff372e1740 (LWP 1315016)):
#0 0x00007eff36eaa6cd in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
#1 0x00007eff36bceda7 in std::thread::join() () from /lib64/libstdc++.so.6
#2 0x00000000004029d2 in main () at /root/pstack/main.cpp:13
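The output shows why the program appears hung. The child thread (Thread 2) is stuck in the infinite loop in infiniteLoop() at main.cpp:6, and the main thread (Thread 1) is blocked in std::thread::join() at main.cpp:13, waiting for the child thread, which cannot finish. As a result, the process never exits.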
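One possible way to fix this kind of hang is to give the loop an exit condition that another thread can trigger. The following is a minimal sketch under that assumption, not the fix applied to the risk control service. The names stoppableLoop, runWorkerFor, and stop_requested are invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical replacement for infiniteLoop(): runs until asked to stop
// and returns how many iterations it completed.
unsigned long stoppableLoop(const std::atomic<bool>& stop_requested) {
    unsigned long iterations = 0;
    while (!stop_requested.load(std::memory_order_acquire)) {
        ++iterations;
        // Stand-in for real work; also keeps the loop from burning a CPU core.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return iterations;
}

// Starts the worker, lets it run for run_for, then requests a stop and joins.
unsigned long runWorkerFor(std::chrono::milliseconds run_for) {
    assert(run_for.count() >= 0);
    std::atomic<bool> stop_requested{false};
    unsigned long iterations = 0;

    std::thread worker([&] { iterations = stoppableLoop(stop_requested); });

    std::this_thread::sleep_for(run_for);
    stop_requested.store(true, std::memory_order_release);
    worker.join();  // returns once the loop observes the flag
    return iterations;
}
```

With this structure, a call such as runWorkerFor(std::chrono::milliseconds(20)) returns shortly after the stop request, so join() in the caller no longer blocks forever.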