Multithreaded C Lua module causes a segfault in a Lua script

ran*_*jak 8 c linux lua multithreading pthreads


I wrote a very simple C library for Lua. It consists of a single function that starts a thread, and that thread does nothing but loop:

#include "lua.h"
#include "lauxlib.h"
#include <pthread.h>
#include <stdio.h>

pthread_t handle;
void* mythread(void* args)
{
    printf("In the thread !\n");
    while(1);
    pthread_exit(NULL);
}

int start_mythread()
{
    return pthread_create(&handle, NULL, mythread, NULL);
}

int start_mythread_lua(lua_State* L)
{
    lua_pushnumber(L, start_mythread());
    return 1;
}

static const luaL_Reg testlib[] = {
    {"start_mythread", start_mythread_lua},
    {NULL, NULL}
};

int luaopen_test(lua_State* L)
{
/*
    //for lua 5.2
    luaL_newlib(L, testlib);
    lua_setglobal(L, "test");
*/
    luaL_register(L, "test", testlib);
    return 1;
}


Now, if I write a very simple Lua script that does only this:

require("test")
test.start_mythread()

Running the script with lua myscript.lua sometimes causes a segfault. Here is what GDB reports for the core dump:

Program terminated with signal 11, Segmentation fault.
#0  0xb778b75c in ?? ()
(gdb) thread apply all bt

Thread 2 (Thread 0xb751c940 (LWP 29078)):
#0  0xb75b3715 in _int_free () at malloc.c:4087
#1  0x08058ab9 in l_alloc ()
#2  0x080513a2 in luaM_realloc_ ()
#3  0x0805047b in sweeplist ()
#4  0x080510ef in luaC_freeall ()
#5  0x080545db in close_state ()
#6  0x0804acba in main () at lua.c:389

Thread 1 (Thread 0xb74efb40 (LWP 29080)):
#0  0xb778b75c in ?? ()
#1  0xb74f6efb in start_thread () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#2  0xb7629dfe in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:129

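The main thread's stack varies somewhat from one run to the next.
It looks as if start_thread wants to jump to some address (0xb778b75c in this example), and that address sometimes happens to lie in inaccessible memory.
Edit:
I also have Valgrind output: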

==642== Memcheck, a memory error detector
==642== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==642== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==642== Command: lua5.1 go.lua
==642== 
In the thread !
In the thread !
==642== Thread 2:
==642== Jump to the invalid address stated on the next line
==642==    at 0x403677C: ???
==642==    by 0x46BEEFA: start_thread (pthread_create.c:309)
==642==    by 0x41C1DFD: clone (clone.S:129)
==642==  Address 0x403677c is not stack'd, malloc'd or (recently) free'd
==642== 
==642== 
==642== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==642==  Access not within mapped region at address 0x403677C
==642==    at 0x403677C: ???
==642==    by 0x46BEEFA: start_thread (pthread_create.c:309)
==642==    by 0x41C1DFD: clone (clone.S:129)
==642==  If you believe this happened as a result of a stack
==642==  overflow in your program's main thread (unlikely but
==642==  possible), you can try to increase the size of the
==642==  main thread stack using the --main-stacksize= flag.
==642==  The main thread stack size used in this run was 8388608.
==642== 
==642== HEAP SUMMARY:
==642==     in use at exit: 1,296 bytes in 6 blocks
==642==   total heap usage: 515 allocs, 509 frees, 31,750 bytes allocated
==642== 
==642== LEAK SUMMARY:
==642==    definitely lost: 0 bytes in 0 blocks
==642==    indirectly lost: 0 bytes in 0 blocks
==642==      possibly lost: 136 bytes in 1 blocks
==642==    still reachable: 1,160 bytes in 5 blocks
==642==         suppressed: 0 bytes in 0 blocks
==642== Rerun with --leak-check=full to see details of leaked memory
==642== 
==642== For counts of detected and suppressed errors, rerun with: -v
==642== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Killed


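However, so far everything has worked fine when I simply open the Lua interpreter and type the same instructions by hand, one after the other.
Also, a C program that does the same thing using the same lib: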

int start_mythread();

int main()
{
    int ret = start_mythread();
    return ret;
}

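has never failed in my tests.
I tried both Lua 5.1 and 5.2, to no avail.
Edit: I should point out that I tested this on a single-core eeePC running 32-bit Debian Wheezy (Linux 3.2).
I have just tested it on my main machine (4 cores, 64-bit Arch Linux), and there launching the script with lua myscript.lua segfaults every time... Typing the commands at the interpreter prompt works fine, as does the C program above.

The reason I wrote this tiny lib is that I am writing a bigger library in which I first ran into this problem. After several hours of fruitless debugging, including removing every shared structure and variable one by one (yes, I was that desperate), I boiled it down to this piece of code.

So my guess is that I am doing something wrong with Lua, but what could it be? I searched for this problem as much as I could, but mostly found people having trouble using the Lua API from several threads (which is not what I am trying to do here).

If you have an idea, any help would be much appreciated.

Edit: To be more precise, I would like to know whether I should take extra precautions with threads when writing a C lib meant to be used from Lua scripts. Does Lua need threads created in a dynamically loaded library to be terminated when it "unloads" the library?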






rya*_*son 2

Why does the Lua module segfault?

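Your Lua script exits before the thread has finished, and that causes the segfault. During a normal interpreter shutdown the Lua module is unloaded with dlclose(), so the thread's instructions are removed from memory and the thread segfaults when it fetches its next instruction.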

What are the options?

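Any solution that stops the thread before the module is unloaded will work. Calling pthread_join() in the main thread waits for the thread to finish (you may want to terminate a long-running thread with pthread_cancel() first). Calling pthread_exit() in the main thread before the module is unloaded also prevents the crash (because it prevents the dlclose()), but it also aborts the Lua interpreter's normal cleanup/shutdown process.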
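As a rough illustration of the cancel-then-join option, here is a minimal sketch in plain pthreads, without any Lua binding. The helper name stop_mythread is hypothetical and not part of the original code. One detail worth noting: with the default deferred cancellation, a bare while(1); loop contains no cancellation point, so this sketch assumes the worker calls pthread_testcancel() inside its loop.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_t handle;

/* Worker that loops forever but can still be cancelled:
   pthread_testcancel() is an explicit cancellation point. */
static void *mythread(void *args)
{
    (void)args;
    for (;;)
        pthread_testcancel();
    return NULL;   /* not reached */
}

int start_mythread(void)
{
    return pthread_create(&handle, NULL, mythread, NULL);
}

/* Hypothetical helper: request cancellation, then wait for the thread.
   Returns 0 on success; *was_canceled is set to 1 if the thread ended
   because of the cancellation request. */
int stop_mythread(int *was_canceled)
{
    void *result = NULL;
    int rc = pthread_cancel(handle);
    if (rc != 0)
        return rc;
    rc = pthread_join(handle, &result);
    if (rc == 0 && was_canceled != NULL)
        *was_canceled = (result == PTHREAD_CANCELED);
    return rc;
}
```

A Lua binding could call stop_mythread() from a function that the script invokes before it ends, so that no thread is still executing module code when the library is unloaded.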

Here are some examples that work:

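/* These definitions extend or replace the ones in the question's source file. */
#include <math.h>   /* pow(); link with -lm if your toolchain needs it */

/* Exit only the main thread; the worker keeps running.
   Note that this skips the interpreter's normal shutdown. */
int pexit(lua_State* L)
{
    pthread_exit(NULL);
    return 0;   /* not reached */
}

/* Block until the worker thread has finished. */
int join(lua_State* L)
{
    pthread_join(handle, NULL);
    return 0;
}

static const luaL_Reg testlib[] = {
    {"start_mythread", start_mythread_lua},
    {"join", join},
    {"exit", pexit},
    {NULL, NULL}
};

/* Worker that eventually terminates, unlike the while(1) loop in the question. */
void* mythread(void* args)
{
    int i, j, k;
    printf("In the thread !\n");
    for (i = 0; i < 10000; ++i) {
        for (j = 0; j < 10000; ++j) {
            for (k = 0; k < 10; ++k) {
                pow(1, i);   /* busy work */
            }
        }
    }
    pthread_exit(NULL);
}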

Now the script exits cleanly:

require('test')
test.start_mythread()
print("launched thread")
test.join() -- or test.exit()
print("thread joined")

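To do this automatically, you can hook into the garbage collector, because every object in the module is released before the shared object is unloaded (as greatwolf suggested).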

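Discussion on calling pthread_exit() from main(): There is a definite problem if main() finishes before the threads it spawned and does not call pthread_exit() explicitly. All of the threads it created will terminate, because main() is done and no longer exists to support the threads. By having main() explicitly call pthread_exit() as the last thing it does, main() will block and be kept alive to support the threads it created until they are done.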

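(This quote is somewhat misleading: returning from main() is roughly equivalent to calling exit(), which ends the process, including all running threads. That may or may not be the behavior you want. Calling pthread_exit() in the main thread, on the other hand, ends only the main thread and leaves all other threads running until they stop on their own or something else kills them. Again, that may or may not be the behavior you want. There is no problem unless you pick the wrong option for your use case.)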
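The difference between the two ways of leaving the main thread can be observed with a small experiment, sketched here under a few assumptions. The function run_child and the exit code 7 are hypothetical, chosen only for this illustration. The sketch forks a child process whose worker thread sleeps briefly and then ends the process with status 7. _exit(0) stands in for returning from main(), since both end the whole process immediately.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker: pretends to work for 100 ms, then ends the process with
   status 7. It only gets that far if the process is still alive. */
static void *worker(void *arg)
{
    (void)arg;
    usleep(100 * 1000);
    _exit(7);
}

/* Hypothetical experiment: returns the child's exit status,
   or -1 if the experiment itself failed. */
int run_child(int use_pthread_exit)
{
    int status = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child process: this is its main thread. */
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            _exit(99);
        if (use_pthread_exit)
            pthread_exit(NULL);   /* main thread ends, worker keeps running */
        _exit(0);                 /* whole process ends, worker is killed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Under these assumptions run_child(0) reports status 0, because the worker never finishes, while run_child(1) reports status 7, because the worker outlives the main thread. In the Lua case, the second behavior is what keeps the module's code mapped, at the price of skipping the interpreter's shutdown.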