Compiler, callback function, performance testing

Last year, we designed a module responsible for handling event encapsulation and providing a class interface. During service initialization, the caller implements the corresponding class and passes the object pointer to the module. Using function object callbacks offers greater flexibility compared to defining interfaces with pure virtual functions The question arose: which grammar is faster in terms of performance? Without understanding compiler principles, let’s just try some code

Preface

An online URL that allows you to select different compilers, compilation parameters, and run code on a specified platform, or view the corresponding assembly code

  • Sometimes, doing technical validations with small code snippets on webpages is very convenient
  • Using different colors to distinguish code associated with different assemblies makes debugging easier than using a local debugger

Main body

The standards committee defines the grammar rules; how to implement them at the compilation level depends on each compiler. Microsoft’s compilers are quite impressive. Syntax sugar isn’t all-powerful, and with fewer callback interfaces, using INLINE_CODE_0 is more convenient and eliminates the need for empty callback function interfaces. When there are many types of callback interfaces, traditional virtual functions are better suited for unifying business interface definitions.

  • The platforms have similar performance with little difference
  • Comparison, increased by 1.35 ns per cycle

In typical business system development, this level of performance loss is negligible. Signals and slots, logging, monitoring, business logic 1, and business logic 2 are completely decoupled.

Code

Counter: 1000000
Time: 3966us
Counter: 1000000
Time: 5316us
#include <iostream>
#include <chrono>
#include <memory>
#include <functional>
#include <atomic>
#include <string>

std::atomic_int64_t counter = 0;

// 定义回调接口
class UserInterface
{
public:
    virtual void name() = 0;
    virtual void full_name() = 0;
};

class User : public UserInterface
{
public:
    void name() {}
    void full_name() { counter++; }
};

void to_string(UserInterface* user)
{
    user->name();
    user->full_name();
}

using name_handler = std::function<void()>;
using full_name_handler = std::function<void()>;

class Test
{
    name_handler name_;
    full_name_handler full_name_;

public:
    void set_name_handler(name_handler name)
    {
        name_ = name;
    }

    void set_full_name_handler(full_name_handler full_name)
    {
        full_name_ = full_name;
    }

    void to_string()
    {
        name_();
        full_name_();
    }
};

int main()
{
    User user;

    auto start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < 1000000; i++)
    {
        to_string(&user);
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Counter: " << counter << std::endl;
    std::cout << "Time: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us" << std::endl;

    counter = 0;
    auto name = []() {};
    auto full_name = []() { counter++; };

    Test test;
    test.set_name_handler(name);
    test.set_full_name_handler(full_name);

    start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < 1000000; i++)
    {
        test.to_string();
    }

    end = std::chrono::high_resolution_clock::now();
    std::cout << "Counter: " << counter << std::endl;
    std::cout << "Time: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us" << std::endl;

    return 0;
}

Afterword

While searching for information, I came across similar code snippets

#include <iostream>
#include <chrono>
#include <memory>
#include <functional>

using namespace std;
using namespace std::chrono;

class Base
{
public:
	Base(){}
	virtual ~Base(){}
	virtual int func(int i) = 0;
};

class Derived : public Base
{
public:
	Derived(int base = 10) : base{base}
	{

	}
	~Derived(){}

	virtual int func(int i)
	{
		return i*base;
	}
private:
	int base;
};

struct Func
{
	int base;
	int operator()(int i)
	{
		return i*base;
	}
	Func(int base) : base {base}
	{

	}
};
const int base = 10;
int calculate(int i)
{
	return base*i;
}

int main()
{
	const int num = 10000;
	Base *p = new Derived{10};
	int total = 0;
	auto start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += p->func(i);
	}
	auto end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nvirtual call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += calculate(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\ndirect function call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	Func functor{10};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += functor(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nfunctor call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	int base = 10;
	function<int(int)> lambda = [base](int i)
	{
		return i*base;
	};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += lambda(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nlambda call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	return 0;
}

/*
test on mac mini i7 2.7GHz
clang++ -std=c++11 chronotest.cpp -O0
output:
result: 499950000
virtual call elapsed: 	43171 nanoseconds.

result: 499950000
direct function call elapsed: 	31379 nanoseconds.

result: 499950000
functor call elapsed: 	41497 nanoseconds.

result: 499950000
lambda call elapsed: 	207416 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O1
output:
result: 499950000
virtual call elapsed: 	26144 nanoseconds.

result: 499950000
direct function call elapsed: 	22384 nanoseconds.

result: 499950000
functor call elapsed: 	33477 nanoseconds.

result: 499950000
lambda call elapsed: 	55799 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O2
result: 499950000
virtual call elapsed: 	22284 nanoseconds.

result: 499950000
direct function call elapsed: 	36 nanoseconds.

result: 499950000
functor call elapsed: 	30 nanoseconds.

result: 499950000
lambda call elapsed: 	28292 nanoseconds.

===================================================
clang++ -std=c++11 chronotest.cpp -O3
result: 499950000
virtual call elapsed: 	18975 nanoseconds.

result: 499950000
direct function call elapsed: 	29 nanoseconds.

result: 499950000
functor call elapsed: 	30 nanoseconds.

result: 499950000
lambda call elapsed: 	22542 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O4

result: 499950000
virtual call elapsed: 	22141 nanoseconds.

result: 499950000
direct function call elapsed: 	30 nanoseconds.

result: 499950000
functor call elapsed: 	30 nanoseconds.

result: 499950000
lambda call elapsed: 	22584 nanoseconds.
*/

Here are two new modes: ordinary functions and emulated functions. There’s a significant performance difference—on the order of magnitude—between using interface callbacks and direct calls. Emulated function performance is close to that of regular functions, sometimes even better. This area (compiler principles) is a knowledge gap for me; my guess is that it’s due to variables and functions being located adjacent in memory, which benefits INLINE processing.

Attached is the execution result

result: 499950000
virtual call elapsed: 6143 nanoseconds.

result: 499950000
direct function call elapsed: 30 nanoseconds.

result: 499950000
functor call elapsed: 31 nanoseconds.

result: 499950000
lambda call elapsed: 15134 nanoseconds.
Licensed under CC BY-NC-SA 4.0
Last updated on May 28, 2025 09:47
A financial IT programmer's tinkering and daily life musings
Built with Hugo
Theme Stack designed by Jimmy