How do I safely pass objects, especially STL objects, to and from a DLL?

Question

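How do I pass class objects, especially STL objects, to and from a C++ DLL?

My application has to interact with third-party plugins in the form of DLL files, and I can't control what compiler these plugins are built with. I'm aware that there's no guaranteed ABI for STL objects, and I'm concerned about causing instability in my application.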

Answer 1

The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.

Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.
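
As a rough illustration of what such an interface can look like, here is a minimal sketch. The names `plugin_sum` and `PLUGIN_API` are made up for this example and are not part of any real plugin API; the point is only that the exported function has C linkage and takes nothing but fixed-size integers and raw pointers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical export macro: on Windows builds it marks the function as a
// DLL export; elsewhere it expands to nothing so the sketch still compiles.
#if defined(_WIN32)
#  define PLUGIN_API __declspec(dllexport)
#else
#  define PLUGIN_API
#endif

// extern "C" gives the function C linkage, so its exported name is not
// mangled. Only fixed-size integers and a raw pointer cross the boundary.
extern "C" PLUGIN_API std::int32_t plugin_sum(const std::int32_t* values,
                                               std::uint32_t count)
{
    // A null buffer is only acceptable when there is nothing to read.
    assert(values != nullptr || count == 0);

    std::int32_t total = 0;
    for (std::uint32_t i = 0; i < count; ++i)
    {
        total += values[i];
    }
    return total;
}
```

A caller built with any compiler can then resolve the plain name "plugin_sum" with GetProcAddress and call it through a matching function pointer.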

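If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:

Data packing/alignment

Within a given class, individual data members will usually be specially placed in memory so their addresses correspond to a multiple of the type's size. For example, an int might be aligned to a 4-byte boundary.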

If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.

You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no packing will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.
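
To make the effect concrete, here is a small sketch (the struct names are invented for illustration). It declares the same two members with default packing and with one-byte packing, and checks the packed layout at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Default packing: the compiler is free to insert padding after 'tag' so
// that 'value' lands on its natural alignment. The amount is up to the
// compiler, which is exactly the problem.
struct DefaultPacked
{
    char         tag;
    std::int32_t value;
};

// One-byte packing: no padding is inserted between members.
#pragma pack(push, 1)
struct BytePacked
{
    char         tag;
    std::int32_t value;
};
#pragma pack(pop)

// With pack(1) the layout no longer depends on the compiler's defaults.
static_assert(sizeof(BytePacked) == 5, "char + int32_t with no padding");
static_assert(offsetof(BytePacked, value) == 1, "value directly follows tag");

// Runtime form of the same check, e.g. for a start-up sanity test.
inline bool layout_is_packed()
{
    return sizeof(BytePacked) == 5 && offsetof(BytePacked, value) == 1;
}
```

DefaultPacked is commonly 8 bytes on mainstream compilers, but nothing guarantees that, so the sketch deliberately makes no assertion about it.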

Member reordering

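If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any data rearranging can cause incompatibilities between compilers. Passing data back and forth to a DLL will therefore require standard-layout classes.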
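
One way to catch a violation early is to let the compiler check the property. This is only a sketch with invented type names; `std::is_standard_layout` is the C++11 type trait for it.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// All data members share one access level, there are no virtual functions
// and no base classes, so this type is standard-layout.
struct PluginInfo
{
    std::int32_t version;
    std::int32_t flags;
};

// Mixing public and private data members breaks standard layout.
class MixedAccess
{
public:
    MixedAccess() : a(0), b(0) {}
    std::int32_t hidden() const { return b; }

    std::int32_t a;

private:
    std::int32_t b;
};

// Fails to compile if someone later turns PluginInfo into a
// non-standard-layout type.
static_assert(std::is_standard_layout<PluginInfo>::value,
              "PluginInfo must stay standard-layout to cross the boundary");
static_assert(!std::is_standard_layout<MixedAccess>::value,
              "mixed access control is not standard-layout");
```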

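Calling convention

There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? In what order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?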

It's important you maintain a standard calling convention; if you declare a function as __cdecl and try to call it using __stdcall, bad things will happen. __cdecl is the default calling convention for C++ functions, however, so this is one thing that won't break unless you deliberately break it by specifying __stdcall in one place and __cdecl in another.
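
If you'd rather not rely on the default, you can spell the convention out once and use it on both sides. The sketch below assumes a made-up `PLUGIN_CALL` macro and function names; `__cdecl` itself is MSVC/MinGW syntax, so the macro expands to nothing on other targets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro naming the convention in exactly one place.
#if defined(_WIN32)
#  define PLUGIN_CALL __cdecl
#else
#  define PLUGIN_CALL
#endif

// The function-pointer type the EXE uses...
typedef std::int32_t (PLUGIN_CALL* add_fn)(std::int32_t, std::int32_t);

// ...and the definition the DLL provides both name the same convention.
std::int32_t PLUGIN_CALL plugin_add(std::int32_t a, std::int32_t b)
{
    return a + b;
}

// Calls through the pointer the way an EXE would after GetProcAddress.
inline std::int32_t call_through_pointer(add_fn fn, std::int32_t a, std::int32_t b)
{
    assert(fn != nullptr);
    return fn(a, b);
}
```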

Datatype size

According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.
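
A short sketch of what that looks like in practice (the `Message` struct is invented for illustration). The static_asserts document the sizes the interface relies on and stop the build if a compiler disagrees.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Every member has an explicit width, so its size is the same in both
// modules no matter how wide each compiler makes int or long.
struct Message
{
    std::int32_t  id;
    std::uint16_t kind;
    std::uint16_t length;
};

static_assert(CHAR_BIT == 8, "the fixed-width types assume 8-bit bytes");
static_assert(sizeof(std::int32_t) == 4, "int32_t is four bytes");
static_assert(sizeof(std::uint16_t) == 2, "uint16_t is two bytes");
// 4 + 2 + 2 with no padding needed; true on mainstream ABIs, and the
// build fails loudly on any compiler where it is not.
static_assert(sizeof(Message) == 8, "unexpected padding in Message");
```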

Heap issues

If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.

To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.

STL issues

The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.
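
For a contiguous container, unpacking can be as simple as handing over a pointer and an element count. This is a sketch with invented names (`Int32Span`, `unpack`, `repack`), not the wrapper developed later in this answer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Plain description of a contiguous buffer: just a pointer and a count,
// both of which have a fixed, compiler-independent representation.
struct Int32Span
{
    const std::int32_t* data;
    std::uint32_t       count;
};

// Sending side: describe the vector's contents without exposing the vector.
// The span borrows the vector's memory and is only valid while it lives.
inline Int32Span unpack(const std::vector<std::int32_t>& values)
{
    Int32Span span;
    span.data  = values.empty() ? nullptr : values.data();
    span.count = static_cast<std::uint32_t>(values.size());
    return span;
}

// Receiving side: copy the elements into this module's own vector type.
inline std::vector<std::int32_t> repack(const Int32Span& span)
{
    assert(span.data != nullptr || span.count == 0);

    if (span.count == 0)
    {
        return std::vector<std::int32_t>();
    }
    return std::vector<std::int32_t>(span.data, span.data + span.count);
}
```

Because the receiving side copies the elements, each module only ever allocates and frees its own vector storage.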

Name mangling

Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL@@YAPAUCCDLL_v1@@XZ in MSVC.

You already won't be able to guarantee static linking to your DLL, since a DLL produced with GCC won't produce a .lib file and statically linking a DLL in MSVC requires one. Dynamically linking seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.

You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:

GCC:

EXPORTS
    GetCCDLL=_Z8GetCCDLLv @1

MSVC:

EXPORTS
    GetCCDLL=?GetCCDLL@@YAPAUCCDLL_v1@@XZ @1

Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.

This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.

Passing class objects to a function

This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.
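
In other words, a signature like `Point midpoint(Point a, Point b)` becomes something like the following sketch (the `Point` type and function name are made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

struct Point
{
    std::int32_t x;
    std::int32_t y;
};

// Inputs and the result all travel through pointers; the caller owns the
// output buffer. Returns 1 on success and 0 if any pointer is null.
inline std::int32_t midpoint(const Point* a, const Point* b, Point* out)
{
    if (a == nullptr || b == nullptr || out == nullptr)
    {
        return 0;
    }

    // Widen before adding so the sum cannot overflow 32 bits.
    out->x = static_cast<std::int32_t>((static_cast<std::int64_t>(a->x) + b->x) / 2);
    out->y = static_cast<std::int32_t>((static_cast<std::int64_t>(a->y) + b->y) / 2);
    return 1;
}
```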

Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.

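//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries
#include <windows.h> //GetProcessHeap, HeapAlloc, HeapFree
#include <cstddef>   //size_t
#include <cstdint>   //fixed-size integer types used by the specializations
#include <new>       //placement new
#include <utility>   //std::swap

//define malloc/free replacements to make use of Windows heap APIs
//(inline, so that including this header from several source files doesn't produce duplicate definitions)
namespace pod_helpers
{
  inline void* pod_malloc(size_t size)
  {
    HANDLE heapHandle = GetProcessHeap();

    if (heapHandle == nullptr)
    {
      return nullptr;
    }

    //HeapAlloc returns a pointer to the allocated block, or nullptr on failure
    return HeapAlloc(heapHandle, 0, size);
  }

  inline void pod_free(void* ptr)
  {
    if (ptr == nullptr)
    {
      return;
    }

    HANDLE heapHandle = GetProcessHeap();
    if (heapHandle == nullptr)
    {
      return;
    }

    HeapFree(heapHandle, 0, ptr);
  }
}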

//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
  pod();
  pod(const T& value);
  pod(const pod& copy);
  ~pod();

  pod<T>& operator=(pod<T> value);
  operator T() const;

  T get() const;
  void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)

//POD_basic_types.h: holds pod specializations for basic datatypes.
#pragma pack(push, 1)
template<>
class pod<int> //must match original_type below, since the members refer to this class as pod<original_type>
{
  //these are a couple of convenience typedefs that make the class easier to specialize and understand, since the behind-the-scenes logic is almost entirely the same except for the underlying datatypes in each specialization.
  typedef int original_type;
  typedef std::int32_t safe_type;

public:
  pod() : data(nullptr) {}

  pod(const original_type& value)
  {
    set_from(value);
  }

  pod(const pod<original_type>& copyVal)
  {
    original_type copyData = copyVal.get();
    set_from(copyData);
  }

  ~pod()
  {
    release();
  }

  pod<original_type>& operator=(pod<original_type> value)
  {
    swap(*this, value);

    return *this;
  }

  operator original_type() const
  {
    return get();
  }

protected:
  safe_type* data;

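  original_type get() const
  {
    //a default-constructed pod (or one whose allocation failed) holds no buffer, so return a value-initialized result instead of dereferencing a null pointer
    if (data == nullptr)
    {
      return original_type();
    }

    return static_cast<original_type>(*data);
  }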

  void set_from(const original_type& value)
  {
    data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type))); //note the pod_malloc call here - we want our memory buffer to go in the process heap, not the possibly-isolated DLL heap.

    if (data == nullptr)
    {
      return;
    }

    new(data) safe_type (value);
  }

  void release()
  {
    if (data)
    {
      pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
      data = nullptr;
    }
  }

  void swap(pod<original_type>& first, pod<original_type>& second)
  {
    using std::swap;

    swap(first.data, second.data);
  }
};
#pragma pack(pop)

The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, uint will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded assignment and conversion operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic, since it's converted to an int8_t and then the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).
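
The omitted bool specialization isn't reproduced here, but the conversion it is described as doing can be sketched in isolation. The helper names below are assumptions made for this example; they are not taken from the actual specialization.

```cpp
#include <cassert>
#include <cstdint>

// bool -> int8_t: store exactly 0 or 1 in a type whose size is fixed.
inline std::int8_t to_safe_bool(bool value)
{
    return static_cast<std::int8_t>(value ? 1 : 0);
}

// int8_t -> bool: compare against 0, so any nonzero byte reads as true.
inline bool from_safe_bool(std::int8_t stored)
{
    return stored != 0;
}
```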

We can also wrap STL types in this way, although it requires a little extra work:

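#include <algorithm> //std::copy
#include <iterator>  //std::back_inserter
#include <string>    //std::basic_string
#pragma pack(push, 1)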
template<typename charT>
class pod<std::basic_string<charT>> //double template ftw. We're specializing pod for std::basic_string, but we're making this specialization able to be specialized for different types; this way we can support all the basic_string types without needing to create four specializations of pod.
{
  //more comfort typedefs
  typedef std::basic_string<charT> original_type;
  typedef charT safe_type;

public:
  pod() : data(nullptr), dataSize(0) {} //initialize dataSize too, so an empty pod never carries an indeterminate size

  pod(const original_type& value)
  {
    set_from(value);
  }

  pod(const charT* charValue)
  {
    original_type temp(charValue);
    set_from(temp);
  }

  pod(const pod<original_type>& copyVal)
  {
    original_type copyData = copyVal.get();
    set_from(copyData);
  }

  ~pod()
  {
    release();
  }

  pod<original_type>& operator=(pod<original_type> value)
  {
    swap(*this, value);

    return *this;
  }

  operator original_type() const
  {
    return get();
  }

protected:
  //this is almost the same as a basic type specialization, but we have to keep track of the number of elements being stored within the basic_string as well as the elements themselves.
  safe_type* data;
  typename original_type::size_type dataSize;

  original_type get() const
  {
    original_type result;

    //an empty pod has no buffer to copy from
    if (data == nullptr)
    {
      return result;
    }

    result.reserve(dataSize);

    std::copy(data, data + dataSize, std::back_inserter(result));

    return result;
  }

  void set_from(const original_type& value)
  {
    dataSize = value.size();

    data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));

    if (data == nullptr)
    {
      dataSize = 0; //keep the stored size consistent with the missing buffer
      return;
    }

    //figure out where the data to copy starts and stops, then loop through the basic_string and copy each element to our buffer.
    safe_type* dataIterPtr = data;
    safe_type* dataEndPtr = data + dataSize;
    typename original_type::const_iterator iter = value.begin();

    for (; dataIterPtr != dataEndPtr;)
    {
      new(dataIterPtr++) safe_type(*iter++);
    }
  }

  void release()
  {
    if (data)
    {
      pod_helpers::pod_free(data);
      data = nullptr;
      dataSize = 0;
    }
  }

  void swap(pod<original_type>& first, pod<original_type>& second)
  {
    using std::swap;

    swap(first.data, second.data);
    swap(first.dataSize, second.dataSize);
  }
};
#pragma pack(pop)

Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one method to figure out mangling for.

//CCDLL.h: defines a DLL interface for a pod-based DLL
struct CCDLL_v1
{
  virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};

CCDLL_v1* GetCCDLL();

This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:

struct CCDLL_v1_implementation: CCDLL_v1
{
  virtual void ShowMessage(const pod<std::wstring>* message) override;
};

CCDLL_v1* GetCCDLL()
{
  static CCDLL_v1_implementation* CCDLL = nullptr;

  if (!CCDLL)
  {
    CCDLL = new CCDLL_v1_implementation;
  }

  return CCDLL;
}

And now let's implement the ShowMessage function:

#include "CCDLL_implementation.h"
void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
  std::wstring workingMessage = *message;

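  //use the wide-character API and a wide literal explicitly: workingMessage is a wstring, and MessageBox/TEXT() are only wide when UNICODE is defined
  MessageBoxW(NULL, workingMessage.c_str(), L"This is a cross-compiler message", MB_OK);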
}

Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a messagebox. After all, this is just a POC, not a full utility library.

Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)

Now for an EXE to call the DLL:

//main.cpp
#include <windows.h>
#include <string>
#include "../CCDLL/CCDLL.h"

typedef CCDLL_v1*(__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;

int main()
{
  HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll")); //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.
  if (ccdll == NULL)
  {
    return 1; //the DLL could not be loaded
  }

  Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, "GetCCDLL");
  if (Ptr_GetCCDLL == NULL)
  {
    FreeLibrary(ccdll);
    return 1; //the unmangled alias is not among the DLL's exports
  }

  CCDLL_v1* CCDLL_lib = Ptr_GetCCDLL(); //This calls the DLL's GetCCDLL method, which is an alias to the mangled function. By dynamically loading the DLL like this, we're completely bypassing the name mangling, exactly as expected.

  pod<std::wstring> message = L"Hello world!"; //a wide literal, since TEXT() is only wide when UNICODE is defined

  CCDLL_lib->ShowMessage(&message);

  FreeLibrary(ccdll); //unload the library when we're done with it

  return 0;
}

And here are the results. Our DLL works. We've successfully reached past STL ABI issues, past C++ ABI issues, past mangling issues, and our MSVC DLL is working with a GCC EXE.

In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.

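@DavidHeffernan Right. But this is the result of several weeks of research for me, so I thought it would be worthwhile to document what I've learned so that others don't need to do the same research and make the same attempts at hacking together a working solution. All the more so since this seems to be a semi-common question. (12 upvotes)
@computerfreaker Most major C++ compilers (GCC, Clang, ICC, EDG, etc.) follow the Itanium C++ ABI. MSVC does not. So yes, these ABI issues are mostly specific to MSVC, though not exclusively: even C compilers on Unix platforms (and even different versions of the same compiler!) suffer from less-than-perfect interoperability. They're usually close enough, though, that I wouldn't be at all surprised to find that you _could_ successfully link a Clang-built DLL with a GCC-built executable. (3 upvotes)
@πάνταῥεῖ _These specific ABI restrictions don't apply to toolchains other than MSVC. This should even be mentioned..._ I'm not sure I understand this correctly. Are you saying that these ABI issues are exclusive to MSVC and that, say, a DLL built with clang will work successfully with an EXE built with GCC? I'm a bit confused, since this seems to contradict all my research... (2 upvotes)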

Answer 2

Ben*_*igt 17

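@computerfreaker has written a great explanation of why the lack of an ABI prevents passing C++ objects across DLL boundaries in the general case, even when the type definitions are under user control and the exact same token sequence is used in both programs. (There are two cases which do work: standard-layout classes and pure interfaces.)

For object types defined in the C++ Standard (including those adapted from the Standard Template Library), the situation is far, far worse. The tokens defining these types are NOT the same across multiple compilers, as the C++ Standard does not provide a complete type definition, only minimum requirements. In addition, name lookup of the identifiers that appear in these type definitions doesn't resolve to the same names. Even on systems where there is a C++ ABI, attempting to share such types across module boundaries results in massive undefined behavior due to One Definition Rule violations.

This is something that Linux programmers weren't accustomed to dealing with, because g++'s libstdc++ was a de facto standard and virtually all programs used it, thus satisfying the ODR. clang's libc++ broke that assumption, and then C++11 came along with mandatory changes to nearly all Standard library types.

Just don't share Standard library types between modules. It's undefined behavior.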

Answer 3

Ph0*_*t0n 15

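Some of the answers here make passing C++ classes sound really scary, but I'd like to share an alternate point of view. The pure virtual C++ method mentioned in some of the other responses actually turns out to be cleaner than you might think. I've built an entire plugin system around the concept and it's been working very well for years. I have a "PluginManager" class that dynamically loads the dlls from a specified directory using LoadLibrary() and GetProcAddress() (and the Linux equivalents, to make the executable cross-platform).

Believe it or not, this method is forgiving even if you do some wacky stuff like add a new function at the end of your pure virtual interface and try to load dlls compiled against the interface without that new function - they'll load just fine. Of course... you'll have to check a version number to make sure your executable only calls the new function for newer dlls that implement the function. But the good news is: it works! So in a way, you have a crude method for evolving your interface over time.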
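
Here is roughly what that version check can look like. Everything in this sketch (the `IPlugin` interface, its functions, and the version numbers) is invented for illustration and isn't taken from my actual plugin system.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plugin interface. Version 2 appended Multiply() at the very
// end, so the vtable slots that version-1 plugins filled in keep their place.
struct IPlugin
{
    virtual std::int32_t GetVersion() const = 0;
    virtual std::int32_t Add(std::int32_t a, std::int32_t b) = 0;
    virtual std::int32_t Multiply(std::int32_t a, std::int32_t b) = 0; // new in v2

protected:
    ~IPlugin() = default; // plugins are never deleted through the interface
};

class PluginV2 final : public IPlugin
{
public:
    std::int32_t GetVersion() const override { return 2; }
    std::int32_t Add(std::int32_t a, std::int32_t b) override { return a + b; }
    std::int32_t Multiply(std::int32_t a, std::int32_t b) override { return a * b; }
};

// Stand-in for a plugin built against version 1. A real one would have no
// Multiply slot at all, which is why the host must never call it.
class ReportsV1 final : public IPlugin
{
public:
    std::int32_t GetVersion() const override { return 1; }
    std::int32_t Add(std::int32_t a, std::int32_t b) override { return a + b; }
    std::int32_t Multiply(std::int32_t, std::int32_t) override { return 0; }
};

// Host side: only touch the new function when the plugin says it has it.
inline bool TryMultiply(IPlugin* plugin, std::int32_t a, std::int32_t b,
                        std::int32_t* out)
{
    assert(out != nullptr);

    if (plugin == nullptr || plugin->GetVersion() < 2)
    {
        return false;
    }
    *out = plugin->Multiply(a, b);
    return true;
}
```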

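Another cool thing about pure virtual interfaces - you can inherit as many interfaces as you want and you'll never run into the diamond problem!

I would say the biggest downside to this approach is that you have to be very careful about what types you pass as parameters. No classes or STL objects without wrapping them with pure virtual interfaces first. No structs (without going through the pragma pack voodoo). Just primitive types and pointers to other interfaces. Also, you can't overload functions, which is an inconvenience, but not a show-stopper.

The good news is that with a handful of lines of code you can make reusable generic classes and interfaces to wrap STL strings, vectors, and other container classes. Alternatively, you can add functions to your interface like GetCount() and GetVal(n) to let people loop through lists.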
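
As a sketch of the GetCount()/GetVal(n) idea: the interface and class names below are made up for this example, and the exact signatures are an assumption on my part.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Pure virtual view of a list: only primitive types and pointers appear in
// the signatures, and nothing is overloaded.
struct IInt32List
{
    virtual std::uint32_t GetCount() const = 0;
    virtual bool GetVal(std::uint32_t index, std::int32_t* out) const = 0;

protected:
    ~IInt32List() = default; // never deleted through the interface
};

// Lives entirely on one side of the boundary; the vector never crosses it.
class VectorList final : public IInt32List
{
public:
    explicit VectorList(std::vector<std::int32_t> values)
        : values_(std::move(values)) {}

    std::uint32_t GetCount() const override
    {
        return static_cast<std::uint32_t>(values_.size());
    }

    bool GetVal(std::uint32_t index, std::int32_t* out) const override
    {
        if (out == nullptr || index >= values_.size())
        {
            return false;
        }
        *out = values_[index];
        return true;
    }

private:
    std::vector<std::int32_t> values_;
};

// What the other side does: loop through the list via the interface only.
inline std::int64_t SumAll(const IInt32List& list)
{
    std::int64_t total = 0;
    for (std::uint32_t i = 0; i < list.GetCount(); ++i)
    {
        std::int32_t value = 0;
        if (list.GetVal(i, &value))
        {
            total += value;
        }
    }
    return total;
}
```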

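People building plugins for us find it quite easy. They don't have to be experts on the ABI boundary or anything - they just inherit the interfaces they're interested in, code up the functions they support, and return false for the ones they don't.

The technology that makes all this work isn't based on any standard as far as I know. From what I gather, Microsoft decided to do their virtual tables that way so they could make COM, and other compiler writers decided to follow suit. This includes GCC, Intel, Borland, and most other major C++ compilers. If you're planning on using an obscure embedded compiler then this approach probably won't work for you. Theoretically any compiler company could change their virtual tables at any time and break things, but considering the massive amount of code written over the years that depends on this technology, I would be very surprised if any of the major players decided to break rank.

So the moral of the story is... With the exception of a few extreme circumstances, you need one person in charge of the interfaces who can make sure the ABI boundary stays clean with primitive types and avoids overloading. If you are OK with that stipulation, then I wouldn't be afraid to share interfaces to classes in DLLs/SOs between compilers. Sharing classes directly == trouble, but sharing pure virtual interfaces isn't so bad.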

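Hey, this is a great answer, thanks! What would make it even better, in my opinion, is some links to further reading that show examples (or even some code) of the things you mention, e.g. wrapping STL classes. Otherwise I'm reading this answer but I'm a bit lost on what these things would actually look like and how to search for them. (2 upvotes)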

Answer 4

Mr.*_*C64 8

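You cannot safely pass STL objects across DLL boundaries unless all the modules (.EXE and .DLLs) are built with the same C++ compiler version and the same settings and flavors of the CRT, which is highly constraining and clearly not your case.

If you want to expose an object-oriented interface from your DLL, you should expose C++ pure interfaces (which is similar to what COM does). Consider reading this interesting article on CodeProject:

HowTo: Export C++ classes from a DLL

You may also want to consider exposing a pure C interface at the DLL boundary, and then building a C++ wrapper at the caller site.
This is similar to what happens in Win32: Win32 implementation code is almost C++, but lots of Win32 APIs expose a pure C interface (there are also APIs that expose COM interfaces). Then ATL/WTL and MFC wrap these pure C interfaces with C++ classes and objects.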
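
A minimal sketch of that pattern follows. The `counter_*` functions and the `Counter` wrapper are hypothetical names invented for this example; in a real project the three C functions would be implemented inside the DLL, and only their declarations would be visible to the caller.

```cpp
#include <cassert>
#include <cstdint>

// ---- C interface the DLL would export: an opaque handle plus functions ----
struct counter_t; // callers never see the layout

extern "C" counter_t*   counter_create(std::int32_t start);
extern "C" void         counter_destroy(counter_t* counter);
extern "C" std::int32_t counter_increment(counter_t* counter);

// ---- Stand-in implementation so the sketch is self-contained ----
// The same module both allocates and frees the object.
struct counter_t
{
    std::int32_t value;
};

extern "C" counter_t* counter_create(std::int32_t start)
{
    return new counter_t{start};
}

extern "C" void counter_destroy(counter_t* counter)
{
    delete counter;
}

extern "C" std::int32_t counter_increment(counter_t* counter)
{
    assert(counter != nullptr);
    return ++counter->value;
}

// ---- C++ wrapper built at the caller site, compiled by the caller ----
class Counter
{
public:
    explicit Counter(std::int32_t start) : handle_(counter_create(start)) {}
    ~Counter() { counter_destroy(handle_); }

    // The handle has a single owner, so copying is disabled.
    Counter(const Counter&) = delete;
    Counter& operator=(const Counter&) = delete;

    std::int32_t increment() { return counter_increment(handle_); }

private:
    counter_t* handle_;
};
```

Since the wrapper class is compiled entirely by the caller's own compiler, no C++ object layout ever has to agree between the two modules.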

Archived: 11 years, 10 months ago
Views: 26662
Last activity: 7 years, 8 months ago