Compilers, Callback Functions, and Performance Testing

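Last year I designed a module that handles event capture and exposes a class interface to the outside. When the service initializes, the caller implements that class and passes an object pointer to the module. Curiosity killed the cat: I started wondering whether these interfaces could all be `std::function` callbacks instead, which is more flexible than defining the interface with pure virtual functions. That raised a question: of the two forms, which is faster? I do not know compiler internals, so I wrote a little code to check.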

Preface

An online compiler site lets you choose from a variety of compilers and compiler options, then run the code on the selected platform or view the corresponding assembly.

For a quick technical check, running a short snippet in a web page is convenient.
Matching source lines to their assembly by color is far more convenient than using a local debugger.

Main Text

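The standards committee defines the language rules; how they are implemented at compile time is up to each compiler. On this point, Microsoft's compiler is quite strong. Syntactic sugar is not a cure-all. When there are only a few callback interfaces, `std::function` is more convenient, and there is no need to define empty callback implementations. When there are many kinds of callback interfaces, traditional virtual functions do a better job of keeping the business interface definitions uniform.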
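The point about not needing empty callback implementations can be sketched as follows. This is a minimal illustration, not the module described above: `EventSource`, `set_on_open`, `set_on_close`, `fire`, and `demo_optional_callbacks` are hypothetical names invented for the example.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event source: every callback is optional.
class EventSource
{
    std::function<void(const std::string&)> on_open_;   // may stay empty
    std::function<void(const std::string&)> on_close_;  // may stay empty

public:
    void set_on_open(std::function<void(const std::string&)> h) { on_open_ = std::move(h); }
    void set_on_close(std::function<void(const std::string&)> h) { on_close_ = std::move(h); }

    void fire(const std::string& name)
    {
        // An empty std::function converts to false; invoking it would
        // throw std::bad_function_call, so check before calling.
        if (on_open_)  on_open_(name);
        if (on_close_) on_close_(name);
    }
};

// Registers only one of the two handlers and returns what it recorded.
std::vector<std::string> demo_optional_callbacks()
{
    std::vector<std::string> seen;
    EventSource source;
    source.set_on_open([&seen](const std::string& n) { seen.push_back("open:" + n); });
    source.fire("a");
    source.fire("b");
    return seen;
}
```

With a pure virtual interface, the caller would have to implement every member even when it cares about only one event. Here the unused `on_close_` handler is simply left unset.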

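On the platform tested, the two approaches perform similarly, with no large difference.
Compared with the pure virtual interface, the `std::function` version costs about 1.35 ns more per loop iteration: (5316 µs - 3966 µs) / 1,000,000 iterations.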

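In ordinary business-system development a loss of this size is negligible, and `std::function` brings more convenience to the design. This is especially clear when handling multiple signals: the lower layer raises an event, and if the event needs to be logged, you pass in the logging object's handler function. When more business handlers are needed, the lower layer can keep several callbacks (lambdas, for example), much like signals and slots, so that logging, monitoring, business 1, and business 2 are completely decoupled from one another.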
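The multi-handler arrangement described here might look roughly like the sketch below. It is an assumption about the shape of such a design, not the actual module: `EventSignal`, `connect`, `notify`, and `demo_signal` are hypothetical names, and the "log" and "monitor" handlers are stand-ins.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal "signal": stores handlers and calls each one in order.
class EventSignal
{
    std::vector<std::function<void(const std::string&)>> slots_;

public:
    void connect(std::function<void(const std::string&)> slot)
    {
        slots_.push_back(std::move(slot));
    }

    void notify(const std::string& event) const
    {
        for (const auto& slot : slots_)
            slot(event);
    }
};

// Connects three independent handlers and returns a trace of what ran.
std::vector<std::string> demo_signal()
{
    std::vector<std::string> trace;  // stands in for a real log sink
    int monitored = 0;               // stands in for a metrics counter

    EventSignal signal;
    signal.connect([&trace](const std::string& e) { trace.push_back("log:" + e); });
    signal.connect([&monitored](const std::string&) { ++monitored; });
    signal.connect([&trace](const std::string& e) { trace.push_back("business:" + e); });

    signal.notify("key_down");
    trace.push_back("monitored:" + std::to_string(monitored));
    return trace;
}
```

Each handler knows only about the event, not about the other handlers, so adding or removing one does not touch the rest.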

Code

Output of the program below (the virtual-interface loop first, then the `std::function` loop):

Counter: 1000000
Time: 3966us
Counter: 1000000
Time: 5316us

#include <iostream>
#include <chrono>
#include <memory>
#include <functional>
#include <atomic>
#include <string>

std::atomic_int64_t counter{0}; // brace-init also compiles before C++17

// Define the callback interface
class UserInterface
{
public:
    virtual void name() = 0;
    virtual void full_name() = 0;
};

class User : public UserInterface
{
public:
    void name() override {}
    void full_name() override { counter++; }
};

void to_string(UserInterface* user)
{
    user->name();
    user->full_name();
}

using name_handler = std::function<void()>;
using full_name_handler = std::function<void()>;

class Test
{
    name_handler name_;
    full_name_handler full_name_;

public:
    void set_name_handler(name_handler name)
    {
        name_ = name;
    }

    void set_full_name_handler(full_name_handler full_name)
    {
        full_name_ = full_name;
    }

    void to_string()
    {
        name_();
        full_name_();
    }
};

int main()
{
    User user;

    auto start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < 1000000; i++)
    {
        to_string(&user);
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Counter: " << counter << std::endl;
    std::cout << "Time: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us" << std::endl;

    counter = 0;
    auto name = []() {};
    auto full_name = []() { counter++; };

    Test test;
    test.set_name_handler(name);
    test.set_full_name_handler(full_name);

    start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < 1000000; i++)
    {
        test.to_string();
    }

    end = std::chrono::high_resolution_clock::now();
    std::cout << "Counter: " << counter << std::endl;
    std::cout << "Time: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us" << std::endl;

    return 0;
}

Afterword

While looking for material I found a similar code snippet: functionperformance.cpp

#include <iostream>
#include <chrono>
#include <memory>
#include <functional>

using namespace std;
using namespace std::chrono;

class Base
{
public:
	Base(){}
	virtual ~Base(){}
	virtual int func(int i) = 0;
};

class Derived : public Base
{
public:
	Derived(int base = 10) : base{base}
	{

	}
	~Derived(){}

	virtual int func(int i)
	{
		return i*base;
	}
private:
	int base;
};

struct Func
{
	int base;
	int operator()(int i)
	{
		return i*base;
	}
	Func(int base) : base {base}
	{

	}
};
const int base = 10;
int calculate(int i)
{
	return base*i;
}

int main()
{
	const int num = 10000;
	Base *p = new Derived{10};
	int total = 0;
	auto start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += p->func(i);
	}
	auto end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nvirtual call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += calculate(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\ndirect function call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	Func functor{10};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += functor(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nfunctor call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	int base = 10;
	function<int(int)> lambda = [base](int i)
	{
		return i*base;
	};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += lambda(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nlambda call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	delete p; // release the object allocated with new above
	return 0;
}

/*
test on mac mini i7 2.7GHz
clang++ -std=c++11 chronotest.cpp -O0
output:
result: 499950000
virtual call elapsed: 	43171 nanoseconds.

result: 499950000
direct function call elapsed: 	31379 nanoseconds.

result: 499950000
functor call elapsed: 	41497 nanoseconds.

result: 499950000
lambda call elapsed: 	207416 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O1
output:
result: 499950000
virtual call elapsed: 	26144 nanoseconds.

result: 499950000
direct function call elapsed: 	22384 nanoseconds.

result: 499950000
functor call elapsed: 	33477 nanoseconds.

result: 499950000
lambda call elapsed: 	55799 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O2
result: 499950000
virtual call elapsed: 	22284 nanoseconds.

result: 499950000
direct function call elapsed: 	36 nanoseconds.

result: 499950000
functor call elapsed: 	30 nanoseconds.

result: 499950000
lambda call elapsed: 	28292 nanoseconds.

===================================================
clang++ -std=c++11 chronotest.cpp -O3
result: 499950000
virtual call elapsed: 	18975 nanoseconds.

result: 499950000
direct function call elapsed: 	29 nanoseconds.

result: 499950000
functor call elapsed: 	30 nanoseconds.

result: 499950000
lambda call elapsed: 	22542 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O4

result: 499950000
virtual call elapsed: 	22141 nanoseconds.

result: 499950000
direct function call elapsed: 	30 nanoseconds.

result: 499950000
functor call elapsed: 	30 nanoseconds.

result: 499950000
lambda call elapsed: 	22584 nanoseconds.
*/

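This snippet adds two more modes, a plain function and a functor. Comparing the callback approaches (virtual call, `std::function`) with the direct calls, the performance loss differs by orders of magnitude once optimization is enabled: at -O2 and above, tens of nanoseconds versus roughly 20,000 ns. The functor performs close to the plain function and is sometimes faster. Given my limited knowledge of compiler internals, my guess is that the variable it accesses sits right next to the function, which makes inlining more effective.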
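One detail in the listing above is that its "lambda" case stores the lambda in a `function<int(int)>`, so that timing measures a type-erased call rather than a bare lambda. The sketch below contrasts the two ways of passing the same lambda. The helper names (`sum_with`, `sum_with_function`, `demo_direct`, `demo_erased`) are made up for the example, and it demonstrates only that both forms compute the same result, not any particular timing.

```cpp
#include <functional>

// Generic caller: the callable's concrete type is visible to the compiler,
// so the call can usually be inlined at higher optimization levels.
template <typename F>
int sum_with(F&& f, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += f(i);
    return total;
}

// Type-erased caller: each call goes through std::function's indirection.
int sum_with_function(const std::function<int(int)>& f, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += f(i);
    return total;
}

// Passes the lambda directly as a template argument.
int demo_direct()
{
    int base = 10;
    return sum_with([base](int i) { return i * base; }, 10000);
}

// Wraps the same lambda in std::function, as the listing above does.
int demo_erased()
{
    int base = 10;
    std::function<int(int)> f = [base](int i) { return i * base; };
    return sum_with_function(f, 10000);
}
```

Both functions return 499950000, the same total the benchmark prints. Timing `demo_direct` against `demo_erased` under each optimization level would be one way to separate the cost of the lambda itself from the cost of the `std::function` wrapper.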

Attached run results

result: 499950000
virtual call elapsed: 6143 nanoseconds.

result: 499950000
direct function call elapsed: 30 nanoseconds.

result: 499950000
functor call elapsed: 31 nanoseconds.

result: 499950000
lambda call elapsed: 15134 nanoseconds.