Ken*_*eth (score: -1) | tags: c++, mutex, lock-free
My test shows that using a lock-free atomic fetch_add is much slower than using a mutex. I am running 10 parallel threads, each adding 100 million to the shared counter, and I am using relaxed memory order. Did I do the atomic wrong?
Output from the test:
fetch_add elapsed time: 4.14048 secs
fetch_add elapsed time: 5.37776 secs
fetch_add elapsed time: 5.37771 secs
fetch_add elapsed time: 5.37791 secs
fetch_add elapsed time: 5.37865 secs
fetch_add elapsed time: 7.85113 secs
fetch_add elapsed time: 7.93542 secs
fetch_add elapsed time: 7.96275 secs
fetch_add elapsed time: 7.97306 secs
Result from fetch_add:1000000000
Lock elapsed time: 0.214009 secs
Lock elapsed time: 0.418662 secs
Lock elapsed time: 0.633993 secs
Lock elapsed time: 0.836704 secs
Lock elapsed time: 1.04225 secs
Lock elapsed time: 1.24322 secs
Lock elapsed time: 1.44627 secs
Lock elapsed time: 1.64898 secs
Lock elapsed time: 1.8539 secs
Lock elapsed time: 2.05816 secs
Result from lock:1000000000
Testing code:
#include <atomic>
#include <cstdlib>
#include <time.h> /* clock_gettime */
#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>

const int BILLION = 1000000000L;

std::mutex my_mutex;
int global_val = 0;                          // counter protected by my_mutex
std::atomic<unsigned long> global_a_val(0);  // lock-free counter

// Increment the atomic counter `length` times with relaxed fetch_add.
void LockFree(int length) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < length; i++)
        global_a_val.fetch_add(1, std::memory_order_relaxed);
    clock_gettime(CLOCK_MONOTONIC, &end);
    auto diff = BILLION * (end.tv_sec - start.tv_sec) + end.tv_nsec - start.tv_nsec;
    std::cout << "fetch_add elapsed time: " << (double)diff / BILLION << " secs\n";
}

// Increment the plain counter `length` times while holding the mutex.
void Lock(int length) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    {
        const std::lock_guard<std::mutex> my_lock(my_mutex);
        for (int i = 0; i < length; i++) {
            auto pre = ++global_val;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    auto diff = BILLION * (end.tv_sec - start.tv_sec) + end.tv_nsec - start.tv_nsec;
    std::cout << "Lock elapsed time: " << (double)diff / BILLION << " secs\n";
}

int main() {
    const int thread_count = 10;
    const int count = 100000000;  // 100 million increments per thread

    std::thread t[thread_count];
    for (int i = 0; i < thread_count; i++) {
        t[i] = std::thread(LockFree, count);
    }
    for (int i = 0; i < thread_count; i++) {
        t[i].join();
    }
    std::cout << "Result from fetch_add:" << global_a_val << "\n";

    std::thread t2[thread_count];
    for (int i = 0; i < thread_count; i++) {
        t2[i] = std::thread(Lock, count);
    }
    for (int i = 0; i < thread_count; i++) {
        t2[i].join();
    }
    std::cout << "Result from lock:" << global_val << "\n";
}
Of course the fetch_add version is slower: in your mutex test you lock the mutex only once. Look at this code:
{
    const std::lock_guard<std::mutex> my_lock(my_mutex);
    for (int i = 0; i < length; i++) {
        auto pre = ++global_val;
    }
}
You lock the mutex once and then run a plain loop of ordinary increments, whereas in the fetch_add version an atomic read-modify-write happens on every iteration. A plain add is obviously cheaper than an atomic add, so the comparison is not fair. If you want to compare them correctly, you need code like this:
{
    for (int i = 0; i < length; i++) {
        const std::lock_guard<std::mutex> my_lock(my_mutex);
        auto pre = ++global_val;
    }
}
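For reference, here is a minimal sketch of what the whole Lock function from the question looks like with the lock_guard moved inside the loop (same globals and timing code as the original):

void Lock(int length) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < length; i++) {
        // Acquire and release the mutex on every iteration, so each
        // increment pays the full locking cost.
        const std::lock_guard<std::mutex> my_lock(my_mutex);
        ++global_val;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    auto diff = BILLION * (end.tv_sec - start.tv_sec) + end.tv_nsec - start.tv_nsec;
    std::cout << "Lock elapsed time: " << (double)diff / BILLION << " secs\n";
}

Every iteration now performs a lock and an unlock, each of which involves at least one atomic operation of its own (plus possible kernel waits under contention), so this version cannot beat a single relaxed fetch_add per iteration.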
For comparison, your original code gives the following times on my VM:
fetch_add elapsed time: 11.5873 secs
fetch_add elapsed time: 12.0809 secs
fetch_add elapsed time: 12.2435 secs
fetch_add elapsed time: 12.5315 secs
fetch_add elapsed time: 12.6408 secs
fetch_add elapsed time: 12.7246 secs
fetch_add elapsed time: 12.8112 secs
fetch_add elapsed time: 12.813 secs
fetch_add elapsed time: 12.8739 secs
fetch_add elapsed time: 12.9213 secs
Result from fetch_add:1000000000
Lock elapsed time: 0.259349 secs
Lock elapsed time: 0.51883 secs
Lock elapsed time: 0.779507 secs
Lock elapsed time: 1.03916 secs
Lock elapsed time: 1.3016 secs
Lock elapsed time: 1.56413 secs
Lock elapsed time: 1.82421 secs
Lock elapsed time: 2.08337 secs
Lock elapsed time: 2.34295 secs
Lock elapsed time: 2.60248 secs
Result from lock:1000000000
After the change described above, the results look like this:
fetch_add elapsed time: 11.4242 secs
fetch_add elapsed time: 11.9451 secs
fetch_add elapsed time: 11.9901 secs
fetch_add elapsed time: 12.0678 secs
fetch_add elapsed time: 12.1317 secs
fetch_add elapsed time: 12.505 secs
fetch_add elapsed time: 12.5591 secs
fetch_add elapsed time: 12.7394 secs
fetch_add elapsed time: 12.8966 secs
fetch_add elapsed time: 12.9199 secs
Result from fetch_add:1000000000
Lock elapsed time: 70.9624 secs
Lock elapsed time: 70.9993 secs
Lock elapsed time: 71.0727 secs
Lock elapsed time: 71.0932 secs
Lock elapsed time: 71.0966 secs
Lock elapsed time: 71.1221 secs
Lock elapsed time: 71.143 secs
Lock elapsed time: 71.1462 secs
Lock elapsed time: 71.158 secs
Lock elapsed time: 71.1588 secs
Result from lock:1000000000
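If what you actually want is the speed of your original single-lock loop without holding a mutex for the whole run, the usual approach is to count in a plain local variable and touch the shared atomic only once per thread. A minimal sketch, reusing global_a_val from the question:

// Each thread adds locally with ordinary (non-atomic) increments and
// publishes its total with a single relaxed fetch_add at the end.
void LockFreeBatched(int length) {
    unsigned long local = 0;
    for (int i = 0; i < length; i++)
        local += 1;
    global_a_val.fetch_add(local, std::memory_order_relaxed);
}

This is essentially what your single-lock Lock() was measuring anyway: a loop of plain adds plus one synchronization step, which is why it looked so much faster than per-iteration fetch_add.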