- Compiler - Callback Function - Performance Testing

  • Last year, an SDK was designed to handle encapsulating certain events and provide a class interface externally. During service initialization, the caller implements the corresponding classes and passes the object pointer to the module.
  • Familiarity with C11 piqued my curiosity, leading me to explore what would happen if these interfaces were implemented using lambda function objects instead of pure virtual function definitions. Compared to the traditional interface definition method, it seemed more flexible.
  • The question arose: with two different syntaxes, which one is faster from a performance perspective? Not understanding compiler principles, I decided to try out some code to find out.

Introduction

Online website, allowing you to select different compilers, compilation parameters, run code on the linux platform, or view corresponding assembly code.

  • https://wandbox.org/ : Sometimes useful for technical validation; executing small code snippets in a web browser is very convenient.
  • https://godbolt.org/ : Using different colors to distinguish the corresponding assembly code for each line, it’s more convenient than using a local debugger.

Body

The Standards Committee established grammatical rules, and at the compilation level, how it’s implemented depends on each compiler vendor. It’s worth noting that Microsoft’s compiler is quite powerful here. Syntax sugar isn’t a panacea; callback interfaces aren’t abundant, using lambda expressions is more convenient and eliminates the need to define empty callback function interfaces. When callback interfaces become numerous, traditional virtual functions are advantageous for unifying business interface definitions.

  • On the windows platform, performance is close, with little difference.
  • On the linux platform, a comparison of virtual functions and lambda reveals a 1.35ns overhead per call. In typical business system development, this level of performance loss can be ignored; introducing lambda brings greater convenience in design. This is particularly noticeable when dealing with multi-signal processing – the underlying event triggering mechanism, along with handling functions for logging objects when events are triggered. When more business processing interfaces are needed, lambda objects are stored in a vector, and they’re iterated through sequentially during event triggering, similar to Qt’s signals and slots; logging, monitoring, business 1, and business 2 are completely decoupled.

Code

Counter: 1000000
Time: 3966us
Counter: 1000000
Time: 5316us
#include <iostream>
#include <chrono>
#include <memory>
#include <functional>
#include <atomic>
#include <string>

std::atomic_int64_t counter = 0;

// Define the callback interface
class UserInterface
{
public:
    virtual void name() = 0;
    virtual void full_name() = 0;
};

class User : public UserInterface
{
public:
    void name() {}
    void full_name() { counter++; }
};

void to_string(UserInterface* user)
{
    user->name();
    user->full_name();
}

using name_handler = std::function<void()>;
using full_name_handler = std::function<void()>;

class Test
{
    name_handler name_;
    full_name_handler full_name_;

public:
    void set_name_handler(name_handler name)
    {
        name_ = name;
    }

    void set_full_name_handler(full_name_handler full_name)
    {
        full_name_ = full_name;
    }

    void to_string()
    {
        name_();
        full_name_();
    }
};

int main()
{
    User user;

    auto start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < 1000000; i++)
    {
        to_string(&user);
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Counter: " << counter << std::endl;
    std::cout << "Time: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us" << std::endl;

    counter = 0;
    auto name = []() {};
    auto full_name = []() { counter++; };

    Test test;
    test.set_name_handler(name);
    test.set_full_name_handler(full_name);

    start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < 1000000; i++)
    {
        test.to_string();
    }

    end = std::chrono::high_resolution_clock::now();
    std::cout << "Counter: " << counter << std::endl;
    std::cout << "Time: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us" << std::endl;

    return 0;
}

Epilogue

While researching, I came across similar code snippets functionperformance.cpp

#include <iostream>
#include <chrono>
#include <memory>
#include <functional>

using namespace std;
using namespace std::chrono;

class Base
{
public:
	Base(){}
	~Base(){}
	virtual int func(int i) = 0;
};

class Derived : public Base
{
public:
	Derived(int base = 10) : base{base}
	{

	}
	~Derived(){}

	virtual int func(int i)
	{
		return i*base;
	}
private:
	int base;
};

struct Func
{
	int base;
	int operator()(int i)
	{
		return i*base;
	}
	Func(int base) : base {base}
	{

	}
};
const int base = 10;
int calculate(int i)
{
	return base*i;
}

int main()
{
	const int num = 10000;
	Base *p = new Derived{10};
	int total = 0;
	auto start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += p->func(i);
	}
	auto end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nvirtual call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += calculate(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\ndirect function call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	Func functor{10};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += functor(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nfunctor call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	int base = 10;
	function<int(int)> lambda = [base](int i)
	{
		return i*base;
	};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += lambda(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nlambda call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	return 0;
}

/*
test on mac mini i7 2.7GHz
clang++ -std=c++11 chronotest.cpp -O0
output:
result: 499950000
virtual call elapsed: 	43171 nanoseconds.

result: 499950000
direct function call elapsed: 	31379 nanoseconds.

result: 499950000
functor call elapsed: 	41497 nanoseconds.

result: 499950000
lambda call elapsed: 	207416 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O1
output:
result: 499950000
virtual call ```
/*
Here are two modes that have been added: a regular function and a functor. They provide an interface callback mechanism and comparison with direct calls, resulting in a qualitative difference in performance loss. Functors perform close to functions, and sometimes even outperform them. The compilation principle is somewhat of a knowledge blind spot; I suspect it's due to the addresses of accessed variables and the proximity of functions, which benefits `CPU` processing.

Attached is the output from running with `wandbox`.
*/
``` ## Epilogue
While searching for materials, I came across a similar code snippet [functionperformance.cpp](https://gist.githubusercontent.com/benloong/8050171/raw/fa577ec923b460862078b8b40233a42a1c619eeb/functionperformance.cpp)

```c++
#include <iostream>
#include <chrono>
#include <memory>
#include <functional>

using namespace std;
using namespace std::chrono;

class Base
{
public:
	Base(){}
	virtual ~Base(){}
	virtual int func(int i) = 0;
};

class Derived : public Base
{
public:
	Derived(int base = 10) : base{base}
	{

	}
	~Derived(){}

	virtual int func(int i)
	{
		return i*base;
	}
private:
	int base;
};

struct Func
{
	int base;
	int operator()(int i)
	{
		return i*base;
	}
	Func(int base) : base {base}
	{

	}
};
const int base = 10;
int calculate(int i)
{
	return base*i;
}

int main()
{
	const int num = 10000;
	Base *p = new Derived{10};
	int total = 0;
	auto start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += p->func(i);
	}
	auto end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nvirtual call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += calculate(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\ndirect function call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	Func functor{10};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += functor(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nfunctor call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	int base = 10;
	function<int(int)> lambda = [base](int i)
	{
		return i*base;
	};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += lambda(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nlambda call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	return 0;
}

/*
test on mac mini i7 2.7GHz
clang++ -std=c++11 chronotest.cpp -O0
output:
result: 499950000
virtual call elapsed: 	43171 nanoseconds.

result: 499950000
direct function call elapsed: 	31379 nanoseconds.

result: 499950000
functor call elapsed: 	41497 nanoseconds.

result: 499950000
lambda call elapsed: 	207416 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O1
output:
result: 49995000

## Epilogue

```shell
result: 499950000
virtual call elapsed: 6143 nanoseconds.

result: 499950000
direct function call elapsed: 30 nanoseconds.

result: 499950000
functor call elapsed: 31 nanoseconds.

result: 499950000
lambda call elapsed: 15134 nanoseconds.
Licensed under CC BY-NC-SA 4.0
Last updated on Jun 02, 2025 20:54
A financial IT programmer's tinkering and daily life musings
Built with Hugo
Theme Stack designed by Jimmy