Boost find method on shared memory gets stuck in a C++ multi-process project

Question

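I'm using Boost's IPC library to store complex objects, including images, in shared memory that is used by multiple processes. Let's call this object MyImage. The shared memory is a circular buffer that holds several MyImage objects at a time.

In my code, two (or more) processes write to a segment in shared memory and another process reads from it. This flow works as expected, but after the reader process finishes or crashes, when it tries to open the same object in shared memory again it gets stuck in the find method, while the writer processes keep running fine.

I tried to work out which race condition could cause this, but couldn't find any explanation in my code or in Boost's documentation.

Here is simple code that reproduces the problem in my project:

The Writer process: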

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/ipc/message_queue.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/circular_buffer.hpp>

using namespace std;
namespace bip = boost::interprocess;

static const char *const PLACE_SHM_NAME = "PlaceInShm";
static const char *const OBJECT_SHM_NAME = "ObjectInShm";
static const char *const PUSH_POP_LOCK = "push_pop_image_lock";
static const int IMAGES_IN_BUFFER = 20;
static const int OBJECT_SIZE_IN_SHM = 91243520;

class MyImage;

typedef bip::managed_shared_memory::segment_manager SegmentManagerType;
typedef bip::allocator<void, SegmentManagerType> MyImageVoidAllocator;
typedef bip::deleter<MyImage, SegmentManagerType> MyImageDeleter;
typedef bip::shared_ptr<MyImage, MyImageVoidAllocator, MyImageDeleter> MyImageSharedPtr;

typedef bip::allocator<MyImageSharedPtr, bip::managed_shared_memory::segment_manager> MyImageShmemAllocator;
typedef boost::circular_buffer<MyImageSharedPtr, MyImageShmemAllocator> MyImageContainer;

MyImageSharedPtr GetMyImage() {
    // some implementation
    return nullptr;
}

int main(int argc, char *argv[]) {

    MyImageContainer *my_image_data_container;
    try {
        bip::named_mutex open_lock{bip::open_or_create, OPEN_SHM_LOCK};
        bip::managed_shared_memory image_segment = bip::managed_shared_memory(bip::open_or_create, PLACE_SHM_NAME, OBJECT_SIZE_IN_SHM);
        my_image_data_container = image_segment.find_or_construct<MyImageContainer>(OBJECT_SHM_NAME)(IMAGES_IN_BUFFER, image_segment.get_segment_manager());
    } catch (boost::interprocess::interprocess_exception &e) {
        exit(1);
    }
    boost::interprocess::named_mutex my_image_mutex_ptr(boost::interprocess::open_or_create, PUSH_POP_LOCK);

    while (true) {
        MyImageSharedPtr img = GetMyImage();
        my_image_mutex_ptr.lock();
        my_image_data_container->push_back(img);
        my_image_mutex_ptr.unlock();
        usleep(1000);
    }
}

The Reader process:

int main(int argc, char *argv[]) {

    MyImageContainer *my_image_data_container;
    try {
        bip::named_mutex open_lock{bip::open_only, OPEN_SHM_LOCK};
        bip::scoped_lock<bip::named_mutex> lock(open_lock, bip::try_to_lock);
        bip::managed_shared_memory image_segment = bip::managed_shared_memory(bip::open_only, PLACE_SHM_NAME);
        my_image_data_container = image_segment.find<MyImageContainer>(OBJECT_SHM_NAME).first;
    } catch (boost::interprocess::interprocess_exception &e) {
        exit(1);
    }
    boost::interprocess::named_mutex my_image_mutex_ptr(boost::interprocess::open_or_create, PUSH_POP_LOCK);

    while (true) {
        if (my_image_data_container->size() == 0) {
            continue;
        }
        MyImage *img;
        my_image_mutex_ptr.lock();
        img = &(*my_image_data_container->at(0));
        my_image_data_container->pop_front();
        my_image_mutex_ptr.unlock();
        // do stuff with img
        usleep(1000);
    }
}

Steps to reproduce the bug:

Run two processes with the Writer code.
Run one process with the Reader code.
Kill the Reader process.
Run the Reader process again.

On the second run, the process gets stuck at image_segment.find<MyImageContainer>(OBJECT_SHM_NAME).first; while the Writer processes keep working fine.

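It's worth mentioning that each Writer process has a unique id and writes images only to its own part of the buffer in shared memory, starting at index id * int(IMAGES_IN_BUFFER / NUMBER_OF_WRITERS). For example, with two Writers (id 0 and id 1) and IMAGES_IN_BUFFER=20, Writer 0 writes to indices 0-9 and Writer 1 to indices 10-19.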
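A rough sketch of that partitioning, for illustration only. `WriterSlice` and `writer_slice` are hypothetical names that do not appear in the project, and the sketch assumes equal, contiguous slices as in the two-Writer example:

```cpp
// Hypothetical helper type: inclusive index range owned by one Writer.
struct WriterSlice {
    int first;
    int last;
};

// Assumption: every Writer owns a contiguous slice of equal size.
WriterSlice writer_slice(int writer_id, int number_of_writers, int images_in_buffer) {
    const int slice_size = images_in_buffer / number_of_writers; // integer division
    const int first      = writer_id * slice_size;
    return WriterSlice{first, first + slice_size - 1};
}
```

With IMAGES_IN_BUFFER=20 and two Writers, `writer_slice(1, 2, 20)` gives the range 10-19.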

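Some of my debugging so far:

I tried opening the shared memory in a separate thread, using a future object with a timeout of a few seconds. The whole process still gets stuck.
Once it has hung, killing the process and rerunning it never succeeds again, unless I remove the object from shared memory and rerun all the processes, including the Writers.
When running with a single Writer I usually can't reproduce the bug, but I can't say that for certain.
It is not consistent: I can't tell when it will get stuck and when it won't.
Maybe the object in shared memory somehow got corrupted when the Reader process crashed, and reopening it then fails. In that case I'd expect Boost to throw an exception rather than hang.
It can also happen when the process exits normally (exit code 0).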

I'd appreciate any ideas about what could cause the process to get stuck. Thanks in advance!

Answer 1

seh*_*ehe 6

I see many issues with your code. Beyond that, there is also a known limitation, so let's start with that.

Robustness of Interprocess Mutexes

First off, it's a well-known issue with the library that there are no robust interprocess mutexes (portably):

How do I take ownership of an abandoned boost::interprocess::interprocess_mutex?
Boost interprocess mutexes and checking for abandonment etc.
But also see https://www.boost.org/doc/libs/1_76_0/boost/interprocess/detail/robust_emulation.hpp I think there are alternatives using file locks on some platforms/filesystems.

So, the best you can probably do is indeed to do a timed wait and have a "forced clear" option that you can manually engage when you know it is safe to do so.
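To make the "timed wait plus forced clear" idea concrete, here is a minimal sketch. It deliberately uses a POSIX named semaphore as a stand-in for `bip::named_mutex` (which offers `timed_lock` and a static `remove` for the same purposes), so it only applies to POSIX systems such as Linux. `TimedNamedLock`, `lock_for` and `force_clear` are hypothetical names invented for this illustration.

```cpp
#include <cerrno>       // errno, EINTR
#include <ctime>        // timespec, clock_gettime (POSIX)
#include <fcntl.h>      // O_CREAT
#include <semaphore.h>  // sem_open, sem_timedwait, sem_post, sem_close, sem_unlink

// Hypothetical wrapper: a named semaphore with initial value 1 used as a
// cross-process mutex. The name must start with '/'.
class TimedNamedLock {
  public:
    explicit TimedNamedLock(const char* name) : name_(name) { open(); }
    ~TimedNamedLock() { if (valid()) sem_close(sem_); }
    TimedNamedLock(const TimedNamedLock&)            = delete;
    TimedNamedLock& operator=(const TimedNamedLock&) = delete;

    bool valid() const { return sem_ != SEM_FAILED; }

    // Waits at most timeout_ms; returns false on timeout or error instead of hanging.
    bool lock_for(long timeout_ms) {
        if (!valid()) return false;
        timespec deadline{};
        clock_gettime(CLOCK_REALTIME, &deadline); // sem_timedwait takes an absolute time
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        int rc;
        do {
            rc = sem_timedwait(sem_, &deadline);
        } while (rc == -1 && errno == EINTR); // retry if interrupted by a signal
        return rc == 0;
    }

    void unlock() { if (valid()) sem_post(sem_); }

    // "Forced clear": drop the name and recreate it unlocked.
    // Only safe when no live process still holds or uses the old object.
    void force_clear() {
        if (valid()) sem_close(sem_);
        sem_unlink(name_);
        open();
    }

  private:
    void open() { sem_ = sem_open(name_, O_CREAT, 0644, 1); }

    const char* name_;
    sem_t*      sem_ = SEM_FAILED;
};
```

A process that fails `lock_for` can report the problem and exit instead of hanging. Note that processes which opened the old object before a forced clear keep using the old one, so in practice every participant has to be restarted, which matches the observation in the question.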

Code Issues and Review

That said, there are some issues with the code, and some things you may improve to get things less brittle.

You mentioned crashing. This goes without saying: crashes are going to break invariants, avoid them.
This may include having a proper interrupt signal handler to make sure that you properly shut down.
You never lock the open lock in the writer path. This is an obvious problem that should be fixed.
In the reader path you issue a try_lock but never seem to verify that it succeeded.
In the code as shown, you're using my_image_data_container after the shared memory segment is destructed. This, by definition, will always be Undefined Behaviour.
You are not using a RAII-enabled lock guard for the push/pop mutex (my_image_mutex_ptr). This means that it is not exception safe and will - again - cause locks to get stuck on exceptions.
In general, you seem to be confusing lockable primitives with locks. I'd suggest renaming the objects (open_lock -> open_mutex, my_image_mutex_ptr (?!) -> modify_mutex) to avoid such confusion.
I would probably suggest using the same mutex for both opening and modification (after all, creating the segment is not really allowed during modification, is it?). Alternatively, consider using an unnamed interprocess mutex inside the shared segment to remove unnecessary SHM namespace pollution. One less lock that could potentially be stuck, even after removing the shared memory segment itself!).
The emptiness check (size() == 0) is a data race:
- it doesn't lock the modify-mutex, so the container might be written to concurrently
- because it doesn't hold that same lock, the result is no longer reliable right after the call returns, as something may have been modified in the meantime
In C++, a data race also invokes Undefined Behaviour.
There are big problems here:
```
img = &(*container->at(0));
```
This dereferences the shared pointer, retaining only the raw pointer. However, the next line pop_front()s from the container, so the shared pointer is removed, potentially (likely, given the code shown) destroying the image.

Just don't lose the refcount, and use the shared pointer.
Many names can be improved for readability. You'd typically think "computers don't care", but humans do. And all the micro-confusions compound, which probably explains 50% of the bugs uncovered in this post.
Some loose ends (exceptions are best caught by const&; you should check the bool on find<>(); container->at(0) can be spelled as container->front(); etc.)
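Several of the points above (checking a try-lock, RAII lock guards, checking emptiness under the lock, keeping the shared pointer) can be sketched in isolation. This is an in-process illustration only: `std::mutex`, `std::shared_ptr` and `std::deque` stand in for `bip::named_mutex`, `bip::shared_ptr` and the circular buffer, and `Image`, `pop_image` and `with_try_lock` are hypothetical names.

```cpp
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

struct Image { int id; };                      // stand-in for MyImage
using SharedImage = std::shared_ptr<Image>;    // stand-in for bip::shared_ptr
using Container   = std::deque<SharedImage>;   // stand-in for the circular buffer

// Pops the oldest image, or returns an empty pointer if there is none.
SharedImage pop_image(Container& container, std::mutex& modify_mutex) {
    // RAII guard: released on every exit path, including exceptions.
    std::lock_guard<std::mutex> lk(modify_mutex);

    // The emptiness check happens while the lock is held, so it cannot race
    // with a concurrent push or pop that uses the same mutex.
    if (container.empty())
        return {};

    // Take over ownership before pop_front(), so the image outlives
    // the container slot instead of leaving a dangling raw pointer.
    SharedImage img = std::move(container.front());
    container.pop_front();
    return img;
}

// Runs `action` only if the mutex could be acquired; reports whether it ran.
template <typename Mutex, typename Action>
bool with_try_lock(Mutex& mutex, Action action) {
    std::unique_lock<Mutex> lk(mutex, std::try_to_lock);
    if (!lk.owns_lock()) // a try-lock can fail, so its result must be checked
        return false;
    action();
    return true;
}
```

Returning the shared pointer by value keeps the reference count alive for as long as the caller works with the image.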

Counter Demo

Here's a version of the code reviewed for the above and written in a somewhat more modern C++ style. The writer and reader are in a single main now (which you can switch using an arbitrary command line argument).

Live On Coliru

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/ipc/message_queue.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/interprocess/smart_ptr/shared_ptr.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/circular_buffer.hpp>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>

namespace bip = boost::interprocess;
using namespace std::chrono_literals;
constexpr char const* OPEN_SHM_LOCK = "OPEN_SHM_LOCK";

static const char* const SHM_NAME         = "PlaceInShm";
static const char* const OBJ_NAME         = "ObjectInShm";
static const char* const PUSH_POP_LOCK    = "push_pop_image_lock";
static const int         BUF_CAPACITY = 20;
static const int         SHM_SIZE         = 91243520;

using Segment = bip::managed_shared_memory;
using Mgr     = Segment::segment_manager;

class MyImage{};

template <typename T> using Alloc = bip::allocator<T, Mgr>;
using MyDeleter                   = bip::deleter<MyImage, Mgr>;
using SharedImage = bip::shared_ptr<MyImage, Alloc<MyImage>, MyDeleter>;

using Container = boost::circular_buffer<SharedImage, Alloc<SharedImage>>;

SharedImage GetMyImage() {
    // some implementation
    return {};
}

int main(int argc, char**) try {
    bool const isWriter = (argc == 1);
    std::cout << (isWriter? "Writer":"Reader") << std::endl;

    // extract variable part to reduce code duplication
    auto find_container = [isWriter](Segment& smt) {
        if (isWriter)
            return smt.find_or_construct<Container>(OBJ_NAME)(
                BUF_CAPACITY, smt.get_segment_manager());

        auto [container, ok] = smt.find<Container>(OBJ_NAME);
        assert(ok); // TODO proper error handling?

        return container;
    };

    bip::named_mutex open_mutex{bip::open_or_create, OPEN_SHM_LOCK};
    if (std::unique_lock open_lk{open_mutex, std::try_to_lock}) {
        Segment smt(bip::open_or_create, SHM_NAME, SHM_SIZE);
        auto container = find_container(smt);

        open_lk.unlock();

        bip::named_mutex modify_mutex(bip::open_or_create, PUSH_POP_LOCK);

        while (isWriter) {
            SharedImage img = GetMyImage();

            {
                std::unique_lock lk(modify_mutex);
                container->push_back(img);
            }
            std::cout << "Pushed" << std::endl;
            std::this_thread::sleep_for(1s);
        }

        while (not isWriter) {
            SharedImage img;

            if (std::unique_lock lk(modify_mutex); !container->empty()) {
                img = std::move(container->front());
                container->pop_front();
            } else {
                continue;
            }

            // if (img)
            {
                // do stuff with img
                std::cout << "Popped" << std::endl;
            }

            std::this_thread::sleep_for(1s);
        }
    } else {
        std::cout << "Failed to acquire open lock" << std::endl;
    }
} catch (bip::interprocess_exception const& e) {
    std::cerr << "Error: " << e.what() << std::endl;
    exit(1);
}


Which on my system works well enough. I left as an exercise for the reader: replacing the modify-lock and adding signal handlers for shutdown.
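For the signal-handler part of that exercise, one possible shape is sketched below. It is only a sketch: `on_stop_signal`, `install_stop_handlers`, `stop_requested` and `run_until_stopped` are hypothetical names, and the handler does nothing except set a flag that the main loop polls.

```cpp
#include <csignal>

// Written from the handler, read from the main loop.
// sig_atomic_t is the type the standard allows a handler to write to.
volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void on_stop_signal(int) { g_stop_requested = 1; }

// Route Ctrl-C (SIGINT) and a plain `kill` (SIGTERM) to the flag.
void install_stop_handlers() {
    std::signal(SIGINT, on_stop_signal);
    std::signal(SIGTERM, on_stop_signal);
}

bool stop_requested() { return g_stop_requested != 0; }

// Calls `step` until a stop signal has been seen, then returns normally so
// that destructors (lock guards, the segment object) run as usual.
template <typename Step>
int run_until_stopped(Step step) {
    int iterations = 0;
    while (!stop_requested()) {
        step();
        ++iterations;
    }
    return iterations;
}
```

In the demo, the `while (isWriter)` and `while (not isWriter)` conditions would additionally test `!stop_requested()`. This cannot help against SIGKILL or a hard crash, which is where the timed wait remains necessary.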

Recorded Demo

Here's a simple recorded demo that demonstrates it working as expected on my system, using various numbers of readers/writers:

Added a screen recording of the demo run for reference. (2 upvotes)
