与win32 CRITICAL_SECTION相比,std :: mutex的性能

ura*_*ray 35 c++ synchronization mutex stl thread-safety

性能如何std::mutex相比CRITICAL_SECTION?它是否相提并论?

我需要轻量级同步对象(不需要是一个进程间对象)是否有任何接近CRITICAL_SECTION其他的STL类std::mutex

wal*_*dez 30

请在答案结尾处查看我的更新,自Visual Studio 2015以来情况发生了巨大变化.原始答案如下.

我做了一个非常简单的测试,根据我的测量结果,std::mutex它比它慢50-70倍CRITICAL_SECTION.

std::mutex:       18140574us
CRITICAL_SECTION: 296874us
Run Code Online (Sandbox Code Playgroud)

编辑:经过一些更多的测试后,它发现它取决于线程数(拥塞)和CPU核心数.一般来说,std::mutex它更慢,但多少,这取决于使用.以下是更新的测试结果(在带有Core i5-4258U,Windows 10,Bootcamp的MacBook Pro上测试):

Iterations: 1000000
Thread count: 1
std::mutex:       78132us
CRITICAL_SECTION: 31252us
Thread count: 2
std::mutex:       687538us
CRITICAL_SECTION: 140648us
Thread count: 4
std::mutex:       1031277us
CRITICAL_SECTION: 703180us
Thread count: 8
std::mutex:       86779418us
CRITICAL_SECTION: 1634123us
Thread count: 16
std::mutex:       172916124us
CRITICAL_SECTION: 3390895us
Run Code Online (Sandbox Code Playgroud)

以下是生成此输出的代码.使用Visual Studio 2012编译,默认项目设置,Win32发布配置.请注意,这个测试可能不是完全正确的,但它让我使用我的转换代码前三思CRITICAL_SECTIONstd::mutex.

#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>

const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;

double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;

void sharedFunc( int i )
{
    if ( i % 2 == 0 )
        g_shmem = sqrt(g_shmem);
    else
        g_shmem *= g_shmem;
}

void threadFuncCritSec() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        EnterCriticalSection( &g_critSec );
        sharedFunc(i);
        LeaveCriticalSection( &g_critSec );
    }
}

void threadFuncMutex() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        g_mutex.lock();
        sharedFunc(i);
        g_mutex.unlock();
    }
}

void testRound(int threadCount)
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncMutex ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
    std::cout << "us \n\r";

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncCritSec ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
    std::cout << "us \n\r";
}

int _tmain(int argc, _TCHAR* argv[]) {
    InitializeCriticalSection( &g_critSec );

    std::cout << "Iterations: " << g_cRepeatCount << "\n\r";

    for (int i = 1; i <= g_cThreadCount; i = i*2) {
        std::cout << "Thread count: " << i << "\n\r";
        testRound(i);
        Sleep(1000);
    }

    DeleteCriticalSection( &g_critSec );

    // Added 10/27/2017 to try to prevent the compiler to completely
    // optimize out the code around g_shmem if it wouldn't be used anywhere.
    std::cout << "Shared variable value: " << g_shmem << std::endl;
    getchar();
    return 0;
}
Run Code Online (Sandbox Code Playgroud)

更新10/27/2017(1):一些答案表明这不是一个现实的测试或不代表"现实世界"的情况.这是真的,这个测试试图衡量开销std::mutex,它没有试图证明两者的差异是可以忽略不计的应用程序99%.

更新10/27/2017(2):似乎std::mutex自Visual Studio 2015(VC140)以来情况发生了变化.我使用了VS2017 IDE,与上面完全相同的代码,x64版本配置,禁用了优化,我只是为每个测试切换了"Platform Toolset".结果非常令人惊讶,我真的很好奇在VC140中绞死了什么.

通过平台实现8个线程的性能

  • std::mutex 在 VC 2015 中使用 SRW 锁实现。 (3认同)
  • 希望我们可以做得更好:) (2认同)
  • 使用wchar_t绝不是没有意义的,它是Windows上Unicode API的"本机"类型.如果您的应用仅限Windows,它是目前最好的选择.使这样的代码可移植到像Linux这样的UTF-8平台可能会很痛苦,而32位wchar_t使它更糟糕. (2认同)

Sup*_*Jon 25

waldez在这里的测试是不现实的,它基本上模拟了100%的争用.通常,这正是您在多线程代码中不想要的.下面是一个修改过的测试,可以进行一些共享计算 我用这段代码得到的结果是不同的:

Tasks: 160000
Thread count: 1
std::mutex:       12096ms
CRITICAL_SECTION: 12060ms
Thread count: 2
std::mutex:       5206ms
CRITICAL_SECTION: 5110ms
Thread count: 4
std::mutex:       2643ms
CRITICAL_SECTION: 2625ms
Thread count: 8
std::mutex:       1632ms
CRITICAL_SECTION: 1702ms
Thread count: 12
std::mutex:       1227ms
CRITICAL_SECTION: 1244ms
Run Code Online (Sandbox Code Playgroud)

你可以在这里看到(对于我来说)(使用VS2013)std :: mutex和CRITICAL_SECTION之间的数字非常接近.请注意,此代码执行固定数量的任务(160,000),这就是为什么性能通常随着更多线程而改善的原因.我在这里有12个核心,这就是为什么我在12点停下来.

与其他测试相比,我并不是说这是对还是错,但它确实强调时序问题通常是特定于域的.

#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>

const int tastCount = 160000;
int numThreads;
const int MAX_THREADS = 16;

double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;

void sharedFunc(int i, double &data)
{
    for (int j = 0; j < 100; j++)
    {
        if (j % 2 == 0)
            data = sqrt(data);
        else
            data *= data;
    }
}

void threadFuncCritSec() {
    double lMem = 8;
    int iterations = tastCount / numThreads;
    for (int i = 0; i < iterations; ++i) {
        for (int j = 0; j < 100; j++)
            sharedFunc(j, lMem);
        EnterCriticalSection(&g_critSec);
        sharedFunc(i, g_shmem);
        LeaveCriticalSection(&g_critSec);
    }
    printf("results: %f\n", lMem);
}

void threadFuncMutex() {
    double lMem = 8;
    int iterations = tastCount / numThreads;
    for (int i = 0; i < iterations; ++i) {
        for (int j = 0; j < 100; j++)
            sharedFunc(j, lMem);
        g_mutex.lock();
        sharedFunc(i, g_shmem);
        g_mutex.unlock();
    }
    printf("results: %f\n", lMem);
}

void testRound()
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < numThreads; ++i)
        threads.push_back(std::thread(threadFuncMutex));
    for (std::thread& thd : threads)
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(endMutex - startMutex).count();
    std::cout << "ms \n\r";

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < numThreads; ++i)
        threads.push_back(std::thread(threadFuncCritSec));
    for (std::thread& thd : threads)
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(endCritSec - startCritSec).count();
    std::cout << "ms \n\r";
}

int _tmain(int argc, _TCHAR* argv[]) {
    InitializeCriticalSection(&g_critSec);

    std::cout << "Tasks: " << tastCount << "\n\r";

    for (numThreads = 1; numThreads <= MAX_THREADS; numThreads = numThreads * 2) {
        if (numThreads == 16)
            numThreads = 12;
        Sleep(100);
        std::cout << "Thread count: " << numThreads << "\n\r";
        testRound();
    }

    DeleteCriticalSection(&g_critSec);
    return 0;
}
Run Code Online (Sandbox Code Playgroud)

  • 当没有争用时,您将得到相同的结果。如果不是,那么您使用的同步原语就会被破坏,或者只是测量不准确。 (2认同)