foo*_*o64 5 android mutex pthreads android-ndk
I've been porting a cross-platform C++ engine to Android, and noticed that it will inexplicably (and inconsistently) block when calling pthread_mutex_lock. This engine has been working for many years on several platforms, and the problematic code hasn't changed in years, so I doubt it's a deadlock or otherwise buggy code. It must be my port to Android.
So far there are several places in the code that block on pthread_mutex_lock. It isn't entirely reproducible either. When it hangs, there's no suspicious output in LogCat.
I modified the mutex code like this (edited for brevity... real code checks all return values):
void MutexCreate( Mutex* m )
{
#ifdef WINDOWS
    InitializeCriticalSection( m );
#else /* ANDROID */
    pthread_mutex_init( m, NULL );
#endif
}

void MutexDestroy( Mutex* m )
{
#ifdef WINDOWS
    DeleteCriticalSection( m );
#else /* ANDROID */
    pthread_mutex_destroy( m, NULL );
#endif
}

void MutexLock( Mutex* m )
{
#ifdef WINDOWS
    EnterCriticalSection( m );
#else /* ANDROID */
    pthread_mutex_lock( m );
#endif
}

void MutexUnlock( Mutex* m )
{
#ifdef WINDOWS
    LeaveCriticalSection( m );
#else /* ANDROID */
    pthread_mutex_unlock( m );
#endif
}
I tried modifying MutexCreate to make error-checking and recursive mutexes, but it didn't matter. I wasn't even getting errors or log output either, so either that means my mutex code is just fine, or the errors/logs weren't being shown. How exactly does the OS notify you of bad mutex usage?
The engine makes heavy use of static variables, including mutexes. I can't see how, but is that a problem? I doubt it because I modified lots of mutexes to be allocated on the heap instead, and the same behavior occurred. But that may be because I missed some static mutexes. I'm probably grasping at straws here.
I read several references including:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_init.html
http://www.embedded-linux.co.uk/tutorial/mutex_mutandis
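The "errorcheck" mutexes will check a couple of things (like attempts to use a non-recursive mutex recursively), but nothing spectacular.
You said "real code checks all return values", so presumably your code blows up if any pthread call returns a nonzero value. (Not sure why your pthread_mutex_destroy takes two arguments; I'm assuming a copy-and-paste error.)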
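The pthread code is widely used within Android and has no known hangs, so the issue is not likely to be in the pthread implementation itself.
The current implementation of mutexes fits in 32 bits, so if you print *(pthread_mutex_t* mut) as an integer you should be able to figure out what state it's in (technically, what state it was in at some point in the past). The definition in bionic/libc/bionic/pthread.c is: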
/* a mutex is implemented as a 32-bit integer holding the following fields
 *
 * bits:     name     description
 * 31-16     tid      owner thread's kernel id (recursive and errorcheck only)
 * 15-14     type     mutex type
 * 13        shared   process-shared flag
 * 12-2      counter  counter of recursive mutexes
 * 1-0       state    lock state (0, 1 or 2)
 */
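"Fast" mutexes have a type of 0 and don't set the tid field. In fact, a generic mutex will have a value of 0 (not held), 1 (held), or 2 (held, with contention). If you ever see a fast mutex whose value isn't one of those, chances are something came along and stomped on it.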
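The field layout quoted from pthread.c can be unpacked with a few shifts and masks. This is only a sketch: it assumes the 32-bit layout shown in that comment, which is an implementation detail of that bionic version and may differ in other releases. MutexFields, decode_mutex_value and is_plausible_fast_value are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical struct mirroring the fields listed in the bionic comment. */
typedef struct {
    unsigned tid;      /* bits 31-16: owner thread's kernel id */
    unsigned type;     /* bits 15-14: mutex type (0 = fast)    */
    unsigned shared;   /* bit  13:    process-shared flag      */
    unsigned counter;  /* bits 12-2:  recursion counter        */
    unsigned state;    /* bits 1-0:   lock state (0, 1 or 2)   */
} MutexFields;

/* Split a raw 32-bit mutex value into its fields.
 * ASSUMPTION: the layout is exactly the one quoted from pthread.c. */
static MutexFields decode_mutex_value(uint32_t v)
{
    MutexFields f;
    f.tid     = (v >> 16) & 0xFFFFu;  /* 16 bits */
    f.type    = (v >> 14) & 0x3u;     /*  2 bits */
    f.shared  = (v >> 13) & 0x1u;     /*  1 bit  */
    f.counter = (v >> 2)  & 0x7FFu;   /* 11 bits */
    f.state   =  v        & 0x3u;     /*  2 bits */
    return f;
}

/* A fast mutex should only ever hold 0, 1 or 2; anything else suggests
 * the memory was overwritten. (This follows the description above and
 * does not account for a mutex created with the process-shared flag.) */
static int is_plausible_fast_value(uint32_t v)
{
    return v <= 2u;
}
```

For example, decoding 0x04D24005 under this layout yields tid 1234, type 1, shared 0, counter 1 and state 1.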
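It also means that, if you configure your program to use recursive mutexes, you can see which thread holds the mutex by pulling the bits out (either by printing the mutex value when trylock indicates you're about to stall, or by dumping state with gdb on a hung process). That, plus the output of ps -t, will let you know whether the thread that locked the mutex still exists.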
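One way to apply the trylock suggestion is sketched below. It is an illustration under assumptions, not a drop-in replacement for MutexLock: lock_with_stall_report and mutex_raw_value are hypothetical names, the retry count and 10 ms pause are arbitrary, and the tid extraction relies on the 32-bit bionic layout described above (on other C libraries the printed value is meaningless).

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Copy the first 32 bits out of the mutex for diagnostics.
 * ASSUMPTION: the state lives in the first 32 bits, as in the bionic
 * layout quoted above. The read is racy, so treat it as a snapshot. */
static uint32_t mutex_raw_value(const pthread_mutex_t *m)
{
    uint32_t v;
    memcpy(&v, m, sizeof v);
    return v;
}

/* Try to take the lock up to max_tries times, pausing between attempts.
 * Returns 0 once the lock is held, EBUSY if it never became free (after
 * logging the raw mutex value), or any other error from trylock. */
static int lock_with_stall_report(pthread_mutex_t *m, int max_tries,
                                  const char *name)
{
    const struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    for (int i = 0; i < max_tries; ++i) {
        int rc = pthread_mutex_trylock(m);
        if (rc != EBUSY)
            return rc;            /* acquired (0) or a genuine error */
        nanosleep(&pause, NULL);
    }

    uint32_t v = mutex_raw_value(m);
    fprintf(stderr, "%s: still busy after %d tries, raw=0x%08x, tid bits=%u\n",
            name, max_tries, (unsigned)v, (unsigned)((v >> 16) & 0xFFFFu));
    return EBUSY;
}
```

On Android the fprintf would normally be swapped for a LogCat call, and the logged tid bits can then be compared against the thread list from ps -t. The tid bits are only filled in for recursive and errorcheck mutexes.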