Poor performance when processing large string data in Linux backend services

In one of our C++ projects, we used a custom communication protocol whose payload is organized as a two-dimensional array. When processing large amounts of data, the protocol layer has to traverse and serialize these arrays internally to generate logs. The serialization was so inefficient that the system showed noticeable lag under high load, which the business department reported to us.

Problem identification

While troubleshooting the issue, we first profiled the system and found that CPU utilization rose sharply when large amounts of data were being processed, and response times grew accordingly. The system logs showed a large number of serialization operations, and these were inefficient when handling two-dimensional arrays, dragging down overall performance.

Capturing thread stacks for the service with the pstack tool revealed that the logging thread spends most of its time on string concatenation.

Here is the key point of this post: different ways of accumulating a string can differ enormously in efficiency. The legacy code uses the + operator, which keeps creating temporary string objects and is very inefficient. You may know it is inefficient, but you probably don't realize just how inefficient it is.
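
To make that cost concrete, here is a minimal sketch (separate from the project code, with made-up sizes) contrasting the two accumulation styles:

// Minimal sketch, not project code: why 's = s + piece' is so much slower than 's += piece'.
#include <string>

int main()
{
    const std::string piece = "This is a test string ";
    const int n = 10000; // kept small so the slow loop still finishes quickly

    std::string slow;
    for (int i = 0; i < n; ++i)
    {
        // 'slow + piece' builds a brand-new temporary string, copying everything
        // accumulated so far before appending 'piece'; the assignment then replaces
        // 'slow' with that temporary. Every pass re-copies the whole buffer, so the
        // total work grows quadratically with the amount of data appended.
        slow = slow + piece;
    }

    std::string fast;
    for (int i = 0; i < n; ++i)
    {
        // '+=' appends in place; the buffer grows geometrically, so the amortized
        // cost per append is constant and the total work grows linearly.
        fast += piece;
    }

    return 0;
}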

Demo verification

Based on the project code, we extracted the business logic into a simple demo to verify the efficiency of the different string concatenation methods. The demo is compiled and run in Release mode with the VS2022 compiler on Windows and with GCC 8.5 on Linux, and the results are compared.

Key points explanation

The project uses Method 4. Before looking at the test data, take a moment to guess which method is the most efficient and which is the least efficient. I was still quite surprised when I saw the results.

  1. Method 1 (concatenation with +=): each field is appended directly to the string with +=
  2. Method 2 (std::ostringstream concatenation): each field is written to a stream (std::ostringstream); this is reasonably efficient, especially when concatenating large amounts of data
  3. Method 3 (pre-allocated memory with +=): sufficient memory is reserved for the string up front with reserve, which avoids repeated reallocations and improves performance
  4. Method 4 (bodys = bodys + body + "\n"): every concatenation creates a new temporary string object, which hurts performance badly at scale because each concatenation involves a fresh allocation and a full copy

According to the results, the project had chosen the least efficient method.
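
A natural follow-up is what the fix looks like. Below is a minimal sketch based on Method 3 from the demo; the helper name serializeBody is hypothetical, not the project's actual interface:

// Hypothetical helper distilled from Method 3 of the demo (not the project's real code):
// reserve once, then append with '+='.
#include <string>
#include <vector>

typedef std::vector<std::string> DataRow;
typedef std::vector<DataRow> DataGroup;

std::string serializeBody(const DataGroup& rows)
{
    std::string bodys;
    // Rough upper bound on the final size (~50 fields of ~22 bytes per row);
    // avoids repeated reallocations while appending.
    bodys.reserve(rows.size() * 50 * 24);
    for (const auto& row : rows)
    {
        for (const auto& item : row)
        {
            bodys += item;
            bodys += ' ';
        }
        bodys += '\n';
    }
    return bodys;
}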

Going a step further, let's look at how well the compilers on each platform optimize this code. Microsoft's Visual Studio is consistently strong here, optimizing string operations very effectively, while GCC lags somewhat in this area.
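
One possible contributor is how each standard library grows the std::string buffer under repeated appends. If you want to check the growth policy of your own toolchain, a small probe like the following (not part of the demo) prints capacity() each time it changes:

// Standalone probe, not part of the demo: observe std::string capacity growth under '+='.
#include <iostream>
#include <string>

int main()
{
    std::string s;
    size_t last_capacity = s.capacity();
    std::cout << "initial capacity: " << last_capacity << "\n";
    for (int i = 0; i < 100000; ++i)
    {
        s += "This is a test string ";
        if (s.capacity() != last_capacity)
        {
            // Report each reallocation so the growth factor is visible.
            last_capacity = s.capacity();
            std::cout << "size " << s.size() << " -> capacity " << last_capacity << "\n";
        }
    }
    return 0;
}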

The two runs were done on different machines, so the two sets of numbers are not directly comparable to each other; compare the differences between the concatenation methods within each run instead.

VS2022 compiler on Windows

----------------------------------------
Data Generation Time: 0.054 seconds.
----------------------------------------

----------------------------------------
Data Merging Performance:
----------------------------------------
+ Data merging (+=) took: 0.053 seconds.
+ ostringstream Data merging took: 0.054 seconds.
+ Pre-reserved Data merging took: 0.045 seconds.
+ Data merging (bodys = bodys + body + "\n") took: 16.108 seconds.

----------------------------------------
Data Merging Complete.
----------------------------------------

Program finished.

GCC 8.5 compiler on Linux

----------------------------------------
Data Generation Time: 0.108 seconds.
----------------------------------------

----------------------------------------
Data Merging Performance:
----------------------------------------
+ Data merging (+=) took: 0.100 seconds.
+ ostringstream Data merging took: 0.083 seconds.
+ Pre-reserved Data merging took: 0.057 seconds.
+ Data merging (bodys = bodys + body + "\n") took: 29.298 seconds.

----------------------------------------
Data Merging Complete.
----------------------------------------

Program finished.

Complete code

#include <iostream>
#include <string>
#include <vector>
#include <random>
#include <chrono>
#include <sstream>
#include <iomanip>

typedef std::vector<std::string> DataRow;
typedef std::vector<DataRow> DataGroup;

struct ResponsePackage
{
    std::string ErrorInfo;
    DataRow Head;
    std::string ClientId;
    std::string UUID;
    std::string MsgID;
    std::string SessionID;
    std::string ExtraInfo1;
    std::string ExtraInfo2;
    DataGroup DataBody;
};

// Generate specified length of random string
std::string generateRandomString(size_t length)
{
    const char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    const size_t max_index = sizeof(charset) - 2; // exclude the trailing '\0' from the index range
    std::string random_string;
    random_string.reserve(length);

    std::random_device rd;
    std::mt19937 generator(rd());
    std::uniform_int_distribution<> distribution(0, max_index);

    for (size_t i = 0; i < length; ++i)
    {
        random_string += charset[distribution(generator)];
    }

    return random_string;
}

void create_large_string()
{
    // Example request package with 50 fields
    ResponsePackage requestPackage;

    requestPackage.Head = {
        "Field1", "Field2", "Field3", "Field4", "Field5",
        "Field6", "Field7", "Field8", "Field9", "Field10",
        "Field11", "Field12", "Field13", "Field14", "Field15",
        "Field16", "Field17", "Field18", "Field19", "Field20",
        "Field21", "Field22", "Field23", "Field24", "Field25",
        "Field26", "Field27", "Field28", "Field29", "Field30",
        "Field31", "Field32", "Field33", "Field34", "Field35",
        "Field36", "Field37", "Field38", "Field39", "Field40",
        "Field41", "Field42", "Field43", "Field44", "Field45",
        "Field46", "Field47", "Field48", "Field49", "Field50"
    };

    requestPackage.ClientId = "ClientID";
    requestPackage.UUID = "UUID";
    requestPackage.MsgID = "MsgID";
    requestPackage.SessionID = "SessionID";
    requestPackage.ExtraInfo1 = "ExtraInfo1";
    requestPackage.ExtraInfo2 = "ExtraInfo2";

    // Start timing for data generation
    auto start_gen = std::chrono::high_resolution_clock::now();

    // Generate 10,000 rows of data, each with 50 fields
    for (size_t i = 0; i < 10000; ++i)
    {
        DataRow dataRow(50, "This is a test string");
        requestPackage.DataBody.push_back(dataRow);
    }

    // End timing for data generation
    auto end_gen = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> duration_gen = end_gen - start_gen;

    // Display result generation time
    std::cout << "\n----------------------------------------\n";
    std::cout << "Data Generation Time: " << std::fixed << std::setprecision(3) << duration_gen.count() << " seconds.\n";
    std::cout << "----------------------------------------\n";

    // Data merging using different methods
    std::cout << "\n----------------------------------------\n";
    std::cout << "Data Merging Performance:\n";
    std::cout << "----------------------------------------\n";

    {
        // Method 1: Using '+=' string concatenation
        auto start_merge = std::chrono::high_resolution_clock::now();
        std::string bodys("");
        for (auto& vec : requestPackage.DataBody)
        {
            std::string body("This is a test string");
            for (auto& item : vec)
            {
                body += item + " ";
            }
            bodys += body + "\n";
        }
        auto end_merge = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double> duration_merge = end_merge - start_merge;
        std::cout << "+ Data merging (+=) took: " << std::fixed << std::setprecision(3) << duration_merge.count() << " seconds.\n";
    }

    {
        // Method 2: Using ostringstream
        auto start_merge = std::chrono::high_resolution_clock::now();
        std::ostringstream bodys;
        for (auto& vec : requestPackage.DataBody)
        {
            std::ostringstream body;
            body << "This is a test string";
            for (auto& item : vec)
            {
                body << item << " ";
            }
            bodys << body.str() << "\n";
        }
        auto end_merge = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double> duration_merge = end_merge - start_merge;
        std::cout << "+ ostringstream Data merging took: " << std::fixed << std::setprecision(3) << duration_merge.count() << " seconds.\n";
    }

    {
        // Method 3: Pre-allocated memory
        auto start_merge = std::chrono::high_resolution_clock::now();
        std::string bodys;
        bodys.reserve(10000 * 50 * 24); // Pre-allocate roughly enough for 10,000 rows of 50 fields (~22 bytes each)
        for (auto& vec : requestPackage.DataBody)
        {
            std::string body("This is a test string");
            body.reserve(50 * 24); // Pre-allocate enough for one row (~22 bytes per field plus separators)
            for (auto& item : vec)
            {
                body += item + " ";
            }
            bodys += body + "\n";
        }
        auto end_merge = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double> duration_merge = end_merge - start_merge;
        std::cout << "+ Pre-reserved Data merging took: " << std::fixed << std::setprecision(3) << duration_merge.count() << " seconds.\n";
    }

    {
        // Method 4: Using 'bodys = bodys + body + "\n"'
        auto start_merge = std::chrono::high_resolution_clock::now();
        std::string bodys("");
        for (auto& vec : requestPackage.DataBody)
        {
            std::string body("This is a test string");
            for (auto& item : vec)
            {
                body = body + item + " "; // Note the use of 'body = body + item'
            }
            bodys = bodys + body + "\n"; // Again, using 'bodys = bodys + body'
        }
        auto end_merge = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double> duration_merge = end_merge - start_merge;
        std::cout << "+ Data merging (bodys = bodys + body + \"\\n\") took: " << std::fixed << std::setprecision(3) << duration_merge.count() << " seconds.\n";
    }

    std::cout << "\n----------------------------------------\n";
    std::cout << "Data Merging Complete.\n";
    std::cout << "----------------------------------------\n";
}

int main()
{
    try
    {
        create_large_string();
    }
    catch (const std::exception& e)
    {
        std::cerr << "Caught exception: " << e.what() << std::endl;
    }

    std::cout << "\nProgram finished.\n";
    return 0;
}