Gar*_*ell 22 c++ multithreading
我有一个程序产生多个线程,每个线程执行一个长期运行的任务.然后主线程等待所有工作线程加入,收集结果并退出.
如果其中一个工作程序发生错误,我希望其余的工作程序正常停止,以便主线程可以在不久之后退出.
我的问题是如何最好地执行此操作,当长期运行的任务的实现由我的代码无法修改的库提供时.
这是系统的简单草图,没有错误处理:
void threadFunc()
{
// Do long-running stuff
}
void mainFunc()
{
std::vector<std::thread> threads;
for (int i = 0; i < 3; ++i) {
threads.push_back(std::thread(&threadFunc));
}
for (auto &t : threads) {
t.join();
}
}
Run Code Online (Sandbox Code Playgroud)
如果长时间运行的函数执行循环并且我可以访问代码,那么只需通过检查每次迭代顶部的共享"keep on running"标志就可以中止执行.
std::mutex mutex;
bool error;
void threadFunc()
{
try {
for (...) {
{
std::unique_lock<std::mutex> lock(mutex);
if (error) {
break;
}
}
}
} catch (std::exception &) {
std::unique_lock<std::mutex> lock(mutex);
error = true;
}
}
Run Code Online (Sandbox Code Playgroud)
现在考虑一下库提供长时间运行的情况:
std::mutex mutex;
bool error;
class Task
{
public:
// Blocks until completion, error, or stop() is called
void run();
void stop();
};
void threadFunc(Task &task)
{
try {
task.run();
} catch (std::exception &) {
std::unique_lock<std::mutex> lock(mutex);
error = true;
}
}
Run Code Online (Sandbox Code Playgroud)
在这种情况下,主线程必须处理错误,并调用stop()
仍在运行的任务.因此,它不能join()
像原始实现那样简单地等待每个工作者
.
到目前为止我使用的方法是在主线程和每个worker之间共享以下结构:
struct SharedData
{
std::mutex mutex;
std::condition_variable condVar;
bool error;
int running;
}
Run Code Online (Sandbox Code Playgroud)
当工人成功完成时,它会减少running
计数.如果捕获到异常,则worker将设置该error
标志.在这两种情况下,它都会调用condVar.notify_one()
.
然后主线程等待条件变量,如果error
设置或running
达到零则唤醒
.在唤醒时,主线程调用stop()
所有任务(如果error
已设置).
这种方法有效,但我觉得应该有一个更清晰的解决方案,使用标准并发库中的一些更高级的原语.有人可以建议改进实施吗?
以下是我当前解决方案的完整代码:
// main.cpp
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>
#include "utils.h"
// Class which encapsulates long-running task, and provides a mechanism for aborting it
class Task
{
public:
Task(int tidx, bool fail)
: tidx(tidx)
, fail(fail)
, m_run(true)
{
}
void run()
{
static const int NUM_ITERATIONS = 10;
for (int iter = 0; iter < NUM_ITERATIONS; ++iter) {
{
std::unique_lock<std::mutex> lock(m_mutex);
if (!m_run) {
out() << "thread " << tidx << " aborting";
break;
}
}
out() << "thread " << tidx << " iter " << iter;
std::this_thread::sleep_for(std::chrono::milliseconds(100));
if (fail) {
throw std::exception();
}
}
}
void stop()
{
std::unique_lock<std::mutex> lock(m_mutex);
m_run = false;
}
const int tidx;
const bool fail;
private:
std::mutex m_mutex;
bool m_run;
};
// Data shared between all threads
struct SharedData
{
std::mutex mutex;
std::condition_variable condVar;
bool error;
int running;
SharedData(int count)
: error(false)
, running(count)
{
}
};
void threadFunc(Task &task, SharedData &shared)
{
try {
out() << "thread " << task.tidx << " starting";
task.run(); // Blocks until task completes or is aborted by main thread
out() << "thread " << task.tidx << " ended";
} catch (std::exception &) {
out() << "thread " << task.tidx << " failed";
std::unique_lock<std::mutex> lock(shared.mutex);
shared.error = true;
}
{
std::unique_lock<std::mutex> lock(shared.mutex);
--shared.running;
}
shared.condVar.notify_one();
}
int main(int argc, char **argv)
{
static const int NUM_THREADS = 3;
std::vector<std::unique_ptr<Task>> tasks(NUM_THREADS);
std::vector<std::thread> threads(NUM_THREADS);
SharedData shared(NUM_THREADS);
for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
const bool fail = (tidx == 1);
tasks[tidx] = std::make_unique<Task>(tidx, fail);
threads[tidx] = std::thread(&threadFunc, std::ref(*tasks[tidx]), std::ref(shared));
}
{
std::unique_lock<std::mutex> lock(shared.mutex);
// Wake up when either all tasks have completed, or any one has failed
shared.condVar.wait(lock, [&shared](){
return shared.error || !shared.running;
});
if (shared.error) {
out() << "error occurred - terminating remaining tasks";
for (auto &t : tasks) {
t->stop();
}
}
}
for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
out() << "waiting for thread " << tidx << " to join";
threads[tidx].join();
out() << "thread " << tidx << " joined";
}
out() << "program complete";
return 0;
}
Run Code Online (Sandbox Code Playgroud)
这里定义了一些实用程序函数:
// utils.h
#include <iostream>
#include <mutex>
#include <thread>
#ifndef UTILS_H
#define UTILS_H
#if __cplusplus <= 201103L
// Backport std::make_unique from C++14
#include <memory>
namespace std {
template<typename T, typename ...Args>
std::unique_ptr<T> make_unique(
Args&& ...args)
{
return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
} // namespace std
#endif // __cplusplus <= 201103L
// Thread-safe wrapper around std::cout
class ThreadSafeStdOut
{
public:
ThreadSafeStdOut()
: m_lock(m_mutex)
{
}
~ThreadSafeStdOut()
{
std::cout << std::endl;
}
template <typename T>
ThreadSafeStdOut &operator<<(const T &obj)
{
std::cout << obj;
return *this;
}
private:
static std::mutex m_mutex;
std::unique_lock<std::mutex> m_lock;
};
std::mutex ThreadSafeStdOut::m_mutex;
// Convenience function for performing thread-safe output
ThreadSafeStdOut out()
{
return ThreadSafeStdOut();
}
#endif // UTILS_H
Run Code Online (Sandbox Code Playgroud)
我一直在思考你的情况,这也许对你有帮助。您可能可以尝试采用几种不同的方法来实现您的目标。有 2-3 个选项可以使用,也可以同时使用这三个选项。我至少会展示第一个选项,因为我仍在学习并尝试掌握模板专业化以及使用 Lambda 的概念。
Manager 类的伪代码如下所示:
class ThreadManager {
private:
std::unique_ptr<MainThread> mainThread_;
std::list<std::shared_ptr<WorkerThread> lWorkers_; // List to hold finished workers
std::queue<std::shared_ptr<WorkerThread> qWorkers_; // Queue to hold inactive and waiting threads.
std::map<unsigned, std::shared_ptr<WorkerThread> mThreadIds_; // Map to associate a WorkerThread with an ID value.
std::map<unsigned, bool> mFinishedThreads_; // A map to keep track of finished and unfinished threads.
bool threadError_; // Not needed if using exception handling
public:
explicit ThreadManager( const MainThread& main_thread );
void shutdownThread( const unsigned& threadId );
void shutdownAllThreads();
void addWorker( const WorkerThread& worker_thread );
bool isThreadDone( const unsigned& threadId );
void spawnMainThread() const; // Method to start main thread's work.
void spawnWorkerThread( unsigned threadId, bool& error );
bool getThreadError( unsigned& threadID ); // Returns True If Thread Encountered An Error and passes the ID of that thread,
};
Run Code Online (Sandbox Code Playgroud)
仅出于演示目的,我使用 bool 值来确定线程是否失败,以简化结构,当然,如果您更喜欢使用异常或无效的无符号值等,则可以将其替换为您的类似值。
现在使用这种类型的类将是这样的: 另请注意,如果这种类型的类是 Singleton 类型对象,那么它会被认为更好,因为您使用的是共享指针,因此您不需要超过 1 个 ManagerClass 。
SomeClass::SomeClass( ... ) {
// This class could contain a private static smart pointer of this Manager Class
// Initialize the smart pointer giving it new memory for the Manager Class and by passing it a pointer of the Main Thread object
threadManager_ = new ThreadManager( main_thread ); // Wouldn't actually use raw pointers here unless if you had a need to, but just shown for simplicity
}
SomeClass::addThreads( ... ) {
for ( unsigned u = 1, u <= threadCount; u++ ) {
threadManager_->addWorker( some_worker_thread );
}
}
SomeClass::someFunctionThatSpawnsThreads( ... ) {
threadManager_->spawnMainThread();
bool error = false;
for ( unsigned u = 1; u <= threadCount; u++ ) {
threadManager_->spawnWorkerThread( u, error );
if ( error ) { // This Thread Failed To Start, Shutdown All Threads
threadManager->shutdownAllThreads();
}
}
// If all threads spawn successfully we can do a while loop here to listen if one fails.
unsigned threadId;
while ( threadManager_->getThreadError( threadId ) ) {
// If the function passed to this while loop returns true and we end up here, it will pass the id value of the failed thread.
// We can now go through a for loop and stop all active threads.
for ( unsigned u = threadID + 1; u <= threadCount; u++ ) {
threadManager_->shutdownThread( u );
}
// We have successfully shutdown all threads
break;
}
}
Run Code Online (Sandbox Code Playgroud)
我喜欢管理器类的设计,因为我在其他项目中使用过它们,并且它们经常派上用场,特别是在使用包含许多资源的代码库时,例如具有许多资产(例如 Sprites)的工作游戏引擎,纹理、音频文件、地图、游戏项目等。使用管理器类有助于跟踪和维护所有资源。同样的概念可以应用于“管理”活动、非活动、等待线程,并且知道如何直观地正确处理和关闭所有线程。如果您的代码库和库支持异常以及线程安全异常处理,我建议使用 ExceptionHandler,而不是传递和使用 bool 来处理错误。另外,拥有一个 Logger 类也有好处,它可以写入日志文件和/或控制台窗口,以给出明确的消息,说明抛出异常的函数以及导致异常的原因,日志消息可能如下所示:
Exception Thrown: someFunctionNamedThis in ThisFile on Line# (x)
threadID 021342 failed to execute.
Run Code Online (Sandbox Code Playgroud)
通过这种方式,您可以查看日志文件并快速找出导致异常的线程,而不是使用传递的 bool 变量。