Unconventional and dodgy Android crash during JNI/OpenGL ES loading code

Kag*_*nar 5 crash java-native-interface android opengl-es android-ndk

Bounty

Since this is an important problem to me I've stuck a bounty on. I'm not looking for the exact answer -- whatever answer leads me to fix this problem gets the bounty. Please make sure you've seen the edit just below.

Edit: I've since managed to catch the crash in gdb just as it dies (via "adb shell setprop debug.db.uid 32767") and noticed this is exactly the same problem as the one described in this post on Google Groups. The backtrace shown there is the same (except for precise addresses) as that of my crashing thread. I'll admit, I'm no debugging tool wizard, so if you have any ideas about what I should be looking for, please let me know.

The quick and dirty rundown

I've whittled away most of my reasonably large application's code so that the app does only the following: it loads a bunch of textures via JNI wrappers (C++ calling into Java) so that the Java libraries handle the decoding for me, makes OpenGL textures out of them, and clears the screen to a rather pretty but mocking dark blue color. It's dying in libc, but only about one run in ten.

To make matters worse, the crash doesn't even look related to any of the code I've written -- it seems to happen in a delayed fashion, yet it doesn't seem to be related to something as convenient to blame as the garbage collector. There is no specific point in my own code at which the crash occurs -- it shifts around from run to run.

The longer story

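I'm ultimately getting a standard crash dump with a stack that tells me next to nothing, since it has two entries: one in libc and one that looks like an invalid or null stack frame. The resolved symbol in libc is pthread_mutex_unlock. I don't even use this function myself anymore, as I've eliminated the need for multithreading. (The native code is called from the surface view and just renders.)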

pthread_mutex_unlock is resulting in a segmentation fault, usually at address 0 but sometimes a small value (less than 0x200) instead of 0. The default (and most common) mutex in Bionic only has one pointer it can segfault on, and that's the pointer to the pthread_mutex_t structure itself. However, a more complex mutex (there's several options) may use additional pointers. So, chances are libc is fine and libdvm is having the issue (assuming I can trust my stack trace even that far).
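As a rough illustration of why a fault address of 0 or a small value hints at a null structure pointer: touching a member through a null base pointer faults at that member's offset. The sketch below uses a made-up struct (FakeMutex and FaultAddress are hypothetical names, and this is not Bionic's real pthread_mutex_t layout); it only shows the arithmetic and is not a diagnosis.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical layout, for illustration only. Bionic's real
// pthread_mutex_t is laid out differently.
struct FakeMutex {
    int32_t state;          // offset 0
    int32_t owner;          // offset 4
    char    padding[0x1f0]; // filler so the last member sits near 0x200
    void*   waiters;        // offset 0x1f8
};

// Address the CPU would try to touch when a member located at
// `memberOffset` is accessed through the pointer value `base`.
inline uintptr_t FaultAddress(uintptr_t base, size_t memberOffset) {
    return base + memberOffset;
}
```

With a null base, FaultAddress(0, offsetof(FakeMutex, state)) is 0 and FaultAddress(0, offsetof(FakeMutex, waiters)) is 0x1f8, i.e. both land below 0x200.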

Let me note that this problem only seems to be reproducible if I do one of these two things: disable loading of the data portion of images (while still reading the format/dimension information), which leaves the buffer I use for loading textures into OpenGL uninitialized; or disable creation of the OpenGL texture by removing only the final glTexImage2D call.

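Note that the aforementioned buffer used for loading textures into OpenGL is created only once and destroyed only once. I've tried enlarging it, and I'm confident I don't have a buffer overflow specific to that buffer.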

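The main culprits I can think of are:

  • I'm not using JNI properly, and it's doing something nasty.
  • I've got an off-by-one error somewhere, and it's corrupting a stack frame.
  • I'm passing OpenGL ES something bad, and it's doing Something Equally Bad(tm).
  • My custom memory allocator isn't functioning properly.

I've been combing my code for these culprits (and more!) for days. I've been hesitant to use a debugger because this crash seems to be timing-sensitive. However, I can still trigger the crash with my own native code completely unoptimized and debugging options enabled. (gdb itself runs at a crawl, and so does the application while it's attached.)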

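Things I've done

  • Used CheckJNI.
  • Removed as much code as possible until it stopped crashing.
  • Wrote a signal handler and a small logging system to dump the last things done before the signal was raised.
  • Tried (and failed) to exacerbate the problem.
  • Padded the native heap arrays with canaries on both ends. They never changed.
  • Audited 100% of the code in the code path. (I just don't see the problem.)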
  • Thought the problem magically disappeared when I fixed a minor error, ran the code fifty times to make sure this was so, and then crashed the next day the first time I ran. (Ooh, I've never been so angry at a bug before!)
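For reference, the canary padding mentioned in the list above can be sketched roughly as follows. This is not my actual allocator; GuardedAlloc, GuardsIntact, GuardedFree, the guard pattern, and the guard size are all hypothetical names and values chosen for illustration.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical guard pattern and size; any recognizable values would do.
static const uint32_t kCanary = 0xDEADC0DEu;
static const size_t kGuardWords = 4;
static const size_t kGuardBytes = kGuardWords * sizeof(uint32_t);

// Block layout: [payload size][front guard][payload][back guard]
static void FillGuard(unsigned char* guard) {
    for (size_t i = 0; i < kGuardWords; ++i)
        memcpy(guard + i * sizeof(uint32_t), &kCanary, sizeof(kCanary));
}

static bool GuardOk(const unsigned char* guard) {
    for (size_t i = 0; i < kGuardWords; ++i)
        if (memcmp(guard + i * sizeof(uint32_t), &kCanary, sizeof(kCanary)) != 0)
            return false;
    return true;
}

// Allocates `bytes` of payload surrounded by canaries.
// Returns a pointer to the payload, or NULL if malloc fails.
void* GuardedAlloc(size_t bytes) {
    unsigned char* raw = static_cast<unsigned char*>(
        malloc(sizeof(size_t) + 2 * kGuardBytes + bytes));
    if (raw == NULL) return NULL;
    memcpy(raw, &bytes, sizeof(size_t));  // remember the payload size
    unsigned char* front = raw + sizeof(size_t);
    unsigned char* payload = front + kGuardBytes;
    FillGuard(front);
    FillGuard(payload + bytes);
    return payload;
}

// True if neither canary has been overwritten.
bool GuardsIntact(const void* payloadPtr) {
    const unsigned char* payload = static_cast<const unsigned char*>(payloadPtr);
    const unsigned char* front = payload - kGuardBytes;
    size_t bytes;
    memcpy(&bytes, front - sizeof(size_t), sizeof(size_t));
    return GuardOk(front) && GuardOk(payload + bytes);
}

// Checks the canaries one last time, then releases the whole block.
void GuardedFree(void* payloadPtr) {
    assert(GuardsIntact(payloadPtr));
    free(static_cast<unsigned char*>(payloadPtr) - kGuardBytes - sizeof(size_t));
}
```

Calling GuardsIntact on each array once per frame (or from the signal handler's log) narrows down when an overrun happens, but it only catches writes that land in the guard words themselves; a wild write further away goes unnoticed.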

Here's a snippet of the usual native crash info from LogCat:

I/DEBUG   ( 5818): signal 11 (SIGSEGV), fault addr 00000000
I/DEBUG   ( 5818):  r0 0000006e  r1 00000080  r2 fffffc5e  r3 100ffe58
I/DEBUG   ( 5818):  r4 00000000  r5 00000000  r6 00000000  r7 00000000
I/DEBUG   ( 5818):  r8 00000000  r9 8054f999  10 10000000  fp 0013e768
I/DEBUG   ( 5818):  ip 3b9aca00  sp 100ffe58  lr afd10640  pc 00000000  cpsr 60000010
I/DEBUG   ( 5818):  d0  643a64696f72646e  d1  6472656767756265
I/DEBUG   ( 5818):  d2  8083297880832965  d3  8083298880832973
I/DEBUG   ( 5818):  d4  8083291080832908  d5  8083292080832918
I/DEBUG   ( 5818):  d6  8083293080832928  d7  8083294880832938
I/DEBUG   ( 5818):  d8  0000000000000000  d9  0000000000000000
I/DEBUG   ( 5818):  d10 0000000000000000  d11 0000000000000000
I/DEBUG   ( 5818):  d12 0000000000000000  d13 0000000000000000
I/DEBUG   ( 5818):  d14 0000000000000000  d15 0000000000000000
I/DEBUG   ( 5818):  d16 0000000000000000  d17 3fe999999999999a
I/DEBUG   ( 5818):  d18 42eccefa43de3400  d19 3fe00000000000b4
I/DEBUG   ( 5818):  d20 4008000000000000  d21 3fd99a27ad32ddf5
I/DEBUG   ( 5818):  d22 3fd24998d6307188  d23 3fcc7288e957b53b
I/DEBUG   ( 5818):  d24 3fc74721cad6b0ed  d25 3fc39a09d078c69f
I/DEBUG   ( 5818):  d26 0000000000000000  d27 0000000000000000
I/DEBUG   ( 5818):  d28 0000000000000000  d29 0000000000000000
I/DEBUG   ( 5818):  d30 0000000000000000  d31 0000000000000000
I/DEBUG   ( 5818):  scr 80000012
I/DEBUG   ( 5818): 
I/DEBUG   ( 5818):          #00  pc 00000000  
I/DEBUG   ( 5818):          #01  pc 0001063c  /system/lib/libc.so
I/DEBUG   ( 5818): 
I/DEBUG   ( 5818): code around pc:
I/DEBUG   ( 5818): 
I/DEBUG   ( 5818): code around lr:
I/DEBUG   ( 5818): afd10620 e1a01008 e1a02007 e1a03006 e1a00005 
I/DEBUG   ( 5818): afd10630 ebfff95d e1a05000 e1a00004 ebffff46 
I/DEBUG   ( 5818): afd10640 e375006e 03a0006e 13a00000 e8bd81f0 
I/DEBUG   ( 5818): afd10650 e304cdd3 e3043240 e92d4010 e341c062 
I/DEBUG   ( 5818): afd10660 e1a0e002 e24dd008 e340300f e1a0200d 
I/DEBUG   ( 5818): 
I/DEBUG   ( 5818): stack:
I/DEBUG   ( 5818):     100ffe18  00000000  
I/DEBUG   ( 5818):     100ffe1c  00000000  
I/DEBUG   ( 5818):     100ffe20  00000000  
I/DEBUG   ( 5818):     100ffe24  ffffff92  
I/DEBUG   ( 5818):     100ffe28  100ffe58  
I/DEBUG   ( 5818):     100ffe2c  00000000  
I/DEBUG   ( 5818):     100ffe30  00000080  
I/DEBUG   ( 5818):     100ffe34  8054f999  /system/lib/libdvm.so
I/DEBUG   ( 5818):     100ffe38  10000000  
I/DEBUG   ( 5818):     100ffe3c  afd10640  /system/lib/libc.so
I/DEBUG   ( 5818):     100ffe40  00000000  
I/DEBUG   ( 5818):     100ffe44  00000000  
I/DEBUG   ( 5818):     100ffe48  00000000  
I/DEBUG   ( 5818):     100ffe4c  00000000  
I/DEBUG   ( 5818):     100ffe50  e3a07077  
I/DEBUG   ( 5818):     100ffe54  ef900077  
I/DEBUG   ( 5818): #01 100ffe58  00000000  
I/DEBUG   ( 5818):     100ffe5c  00000000  
I/DEBUG   ( 5818):     100ffe60  00000000  
I/DEBUG   ( 5818):     100ffe64  00000000  
I/DEBUG   ( 5818):     100ffe68  00000000  
I/DEBUG   ( 5818):     100ffe6c  00000000  
I/DEBUG   ( 5818):     100ffe70  00000000  
I/DEBUG   ( 5818):     100ffe74  00000000  
I/DEBUG   ( 5818):     100ffe78  00000000  
I/DEBUG   ( 5818):     100ffe7c  00000000  
I/DEBUG   ( 5818):     100ffe80  00000000  
I/DEBUG   ( 5818):     100ffe84  00000000  
I/DEBUG   ( 5818):     100ffe88  00000000  
I/DEBUG   ( 5818):     100ffe8c  00000000  
I/DEBUG   ( 5818):     100ffe90  00000000  
I/DEBUG   ( 5818):     100ffe94  00000000  
I/DEBUG   ( 5818):     100ffe98  00000000  
I/DEBUG   ( 5818):     100ffe9c  00000000  

Using NDK r6, Android platform 2.2 (API level 8), compiling with -Wall -Werror, ARM mode only.

I'm looking for any ideas, especially those that are verifiable in a deterministic way. If more information would help, just leave a comment (or if you can't, an answer) and I'll update my question ASAP. Thanks for reading this far!
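For anyone who wants to map the dump above onto a disassembly of libc.so, here is a small sketch of the address arithmetic, using only numbers copied from the dump. The helper names are made up, and it assumes (as the numbers suggest) that frame #01 is reported as lr minus one 4-byte ARM instruction. Which symbol lives at the resulting offsets depends on the exact libc.so build on the device.

```cpp
#include <assert.h>
#include <stdint.h>

// Values copied from the crash dump above.
static const uint32_t kLr = 0xafd10640u;           // lr register
static const uint32_t kFrame1Offset = 0x0001063cu; // "#01 pc 0001063c /system/lib/libc.so"
static const uint32_t kCallInsn = 0xebffff46u;     // word at afd1063c in "code around lr"

// In ARM mode, lr holds the address just past the 4-byte call instruction.
inline uint32_t CallSiteFromLr(uint32_t lr) { return lr - 4; }

// Load base of the module = absolute call-site address minus the
// module-relative pc printed in the backtrace.
inline uint32_t ModuleBase(uint32_t callSite, uint32_t offsetInModule) {
    return callSite - offsetInModule;
}

// Destination of an ARM-mode B/BL: the 24-bit immediate is sign-extended,
// multiplied by 4, and added to the instruction address plus 8.
inline uint32_t BranchTarget(uint32_t insnAddr, uint32_t insn) {
    int32_t imm24 = static_cast<int32_t>(insn & 0x00ffffffu);
    if (imm24 & 0x00800000) imm24 -= 0x01000000;  // sign-extend 24 -> 32 bits
    return insnAddr + 8 + static_cast<uint32_t>(imm24 * 4);
}
```

With the values above, the call site is 0xafd1063c, libc's load base comes out as 0xafd00000, and the BL at the call site targets 0xafd1035c (offset 0x1035c into libc.so).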

JNI Interface

There are both j2n and n2j calls. The only j2n calls right now are here:

private static class Renderer implements GLSurfaceView.Renderer {
    public void onDrawFrame(GL10 gl) {
        GraphicsLib.graphicsStep();
    }

    public void onSurfaceChanged(GL10 gl, int width, int height) {
        GraphicsLib.graphicsInit(width, height);
    }

    public void onSurfaceCreated(GL10 gl, EGLConfig config) {
        // Do nothing.
    }
}

This code goes through this interface:

public class GraphicsLib {

     static {
         System.loadLibrary("graphicslib");
     }

     public static native void graphicsInit(int width, int height);
     public static native void graphicsStep();
}

Which on the native side looks like:

extern "C" {
    JNIEXPORT void JNICALL FN(graphicsInit)(JNIEnv* env, jobject obj,  jint width, jint height);
    JNIEXPORT void JNICALL FN(graphicsStep)(JNIEnv* env, jobject obj);
};

The function definitions themselves begin with a copy of the prototypes.

graphicsInit just stores away the dimensions it was passed and sets up OpenGL a bit, without doing anything particularly interesting. graphicsStep clears the screen to a nice color and calls LoadSprites(env).

The more complex side consists of the n2j calls used in LoadSprites(), which loads one sprite every frame. Not an elegant solution, but it's been working with the exception of this crash.

LoadSprites works like this:

GameAssetsInfo gai;
void LoadSprites(JNIEnv* env)
{
    InitGameAssets(gai, env);
    CatchJNIException(env, "j0");
    ...
    static int z = 0;
    if (z < numSprites)
    {
        CatchJNIException(env, "j1");
        OpenGameImage(gai, SpriteIDFromNumber(z));
        CatchJNIException(env, "j2");
        unsigned int actualWidth = GetGameImageWidth(gai);
        CatchJNIException(env, "j3");
        unsigned int actualHeight = GetGameImageHeight(gai);
        CatchJNIException(env, "j4");
        ...
        jint i;
        int r = 0;
        CatchJNIException(env, "j5");
        do {
            CatchJNIException(env, "j6");
            i = ReadGameImage(gai);
            CatchJNIException(env, "j7");
            if (i > 0)
            {
                // Deal with the pure data chunk -- One line at a time.
                CatchJNIException(env, "j8");
                StoreGameImageChunk(gai, (int*)sprites[z].data + r, 0, i);
                ...
                r += sprites[z].width;
                CatchJNIException(env, "j9");
                UnreadGameImage(gai);
                CatchJNIException(env, "j10");
            } else {
                break;
            }
        } while (true);

        CatchJNIException(env, "j11");
        CloseGameImage(gai);
        CatchJNIException(env, "j12");

        ... OpenGL ES calls ...

        glTexImage2D( ... );

        z++;
    }

    CatchJNIException(env, "j13");
}

Where CatchJNIException is this (and never prints anything for me):

void CatchJNIException(JNIEnv* env, const char* str)
{
    jthrowable exc = env->ExceptionOccurred();
    if (exc) {
        jclass newExcCls;
        env->ExceptionDescribe();
        env->ExceptionClear();
        newExcCls = env->FindClass( 
            "java/lang/IllegalArgumentException");
        if (newExcCls == NULL) {
            // Couldn't find the exception class.. Uuh..
            LOGE("Failed to catch JNI exception entirely -- could not find exception class.");
            return;
            abort();
        }
        LOGE("Caught JNI exception. (%s)", str);
        env->ThrowNew( newExcCls, "thrown from C code");
//      abort();
    }
}

The relevant part of GameAssetsInfo and its associated code is called only from native code and works like this:

void InitGameAssets(GameAssetsInfo& gameasset, JNIEnv* env)
{
    CatchJNIException(env, "jS0");
    FST;
    char str[64];
    sprintf(str, "%s/GameAssets", ROOTSTR);

    gameasset.env = env;
    CatchJNIException(gameasset.env, "jS1");
    gameasset.cls = gameasset.env->FindClass(str);
    CatchJNIException(gameasset.env, "jS2");
    gameasset.openAsset = gameasset.env->GetStaticMethodID(gameasset.cls, "OpenAsset", "(I)V");
    CatchJNIException(gameasset.env, "jS3");
    gameasset.readAsset = gameasset.env->GetStaticMethodID(gameasset.cls, "ReadAsset", "()I");
    CatchJNIException(gameasset.env, "jS4");
    gameasset.closeAsset = gameasset.env->GetStaticMethodID(gameasset.cls, "CloseAsset", "()V");
    CatchJNIException(gameasset.env, "jS5");
    gameasset.buffID = gameasset.env->GetStaticFieldID(gameasset.cls, "buff", "[B");

    CatchJNIException(gameasset.env, "jS6");
    gameasset.openImage = gameasset.env->GetStaticMethodID(gameasset.cls, "OpenImage", "(I)V");
    CatchJNIException(gameasset.env, "jS7");
    gameasset.readImage = gameasset.env->GetStaticMethodID(gameasset.cls, "ReadImage", "()I");
    CatchJNIException(gameasset.env, "jS8");
    gameasset.closeImage = gameasset.env->GetStaticMethodID(gameasset.cls, "CloseImage", "()V");
    CatchJNIException(gameasset.env, "jS9");
    gameasset.buffIntID = gameasset.env->GetStaticFieldID(gameasset.cls, "buffInt", "[I");
    CatchJNIException(gameasset.env, "jS10");
    gameasset.imageWidth = gameasset.env->GetStaticFieldID(gameasset.cls, "imageWidth", "I");
    CatchJNIException(gameasset.env, "jS11");
    gameasset.imageHeight = gameasset.env->GetStaticFieldID(gameasset.cls, "imageHeight", "I");
    CatchJNIException(gameasset.env, "jS12");
    gameasset.imageHasAlpha = gameasset.env->GetStaticFieldID(gameasset.cls, "imageHasAlpha", "I");
    CatchJNIException(gameasset.env, "jS13");
}

void OpenGameAsset(GameAssetsInfo& gameasset, int rsc)
{
    FST;
    CatchJNIException(gameasset.env, "jS14");
    gameasset.env->CallStaticVoidMethod(gameasset.cls, gameasset.openAsset, rsc);
    CatchJNIException(gameasset.env, "jS15");
}

void CloseGameAsset(GameAssetsInfo& gameasset)
{
    FST;
    CatchJNIException(gameasset.env, "jS16");
    gameasset.env->CallStaticVoidMethod(gameasset.cls, gameasset.closeAsset);
    CatchJNIException(gameasset.env, "jS17");
}

int ReadGameAsset(GameAssetsInfo& gameasset)
{
    FST;
    CatchJNIException(gameasset.env, "jS18");
    int ret = gameasset.env->CallStaticIntMethod(gameasset.cls, gameasset.readAsset);
    CatchJNIException(gameasset.env, "jS19");
    if (ret > 0)
    {
        CatchJNIException(gameasset.env, "jS20");
        gameasset.obj = gameasset.env->GetStaticObjectField(gameasset.cls, gameasset.buffID);
        CatchJNIException(gameasset.env, "jS21");
        gameasset.arr = reinterpret_cast<jbyteArray*>(&gameasset.obj);
    }
    return ret;
}

void UnreadGameAsset(GameAssetsInfo& gameasset)
{
    FST;
    CatchJNIException(gameasset.env, "jS22");
    gameasset.env->DeleteLocalRef(gameasset.obj);
    CatchJNIException(gameasset.env, "jS23");
}

void StoreGameAssetChunk(GameAssetsInfo& gameasset, void* store, int offset, int length)
{
    FST;
    CatchJNIException(gameasset.env, "jS24");
    gameasset.env->GetByteArrayRegion(*gameasset.arr, offset, length, (jbyte*)store);
    CatchJNIException(gameasset.env, "jS25");
}

void OpenGameImage(GameAssetsInfo& gameasset, int rsc)
{
    FST;
    CatchJNIException(gameasset.env, "jS26");
    gameasset.env->CallStaticVoidMethod(gameasset.cls, gameasset.openImage, rsc);
    CatchJNIException(gameasset.env, "jS27");
    gameasset.l_imageWidth = (int)gameasset.env->GetStaticIntField(gameasset.cls, gameasset.imageWidth);
    CatchJNIException(gameasset.env, "jS28");
    gameasset.l_imageHeight = (int)gameasset.env->GetStaticIntField(gameasset.cls, gameasset.imageHeight);
    CatchJNIException(gameasset.env, "jS29");
    gameasset.l_imageHasAlpha = (int)gameasset.env->GetStaticIntField(gameasset.cls, gameasset.imageHasAlpha);
    CatchJNIException(gameasset.env, "jS30");
}

void CloseGameImage(GameAssetsInfo& gameasset)
{
    FST;
    CatchJNIException(gameasset.env, "jS31");
    gameasset.env->CallStaticVoidMethod(gameasset.cls, gameasset.closeImage);
    CatchJNIException(gameasset.env, "jS32");
}

int ReadGameImage(GameAssetsInfo& gameasset)
{
    FST;
    CatchJNIException(gameasset.env, "jS33");
    int ret = gameasset.env->CallStaticIntMethod(gameasset.cls, gameasset.readImage);
    CatchJNIException(gameasset.env, "jS34");
    if ( ret > 0 )
    {
        CatchJNIException(gameasset.env, "jS35");
        gameasset.obj = gameasset.env->GetStaticObjectField(gameasset.cls, gameasset.buffIntID);
        CatchJNIException(gameasset.env, "jS36");
        gameasset.arrInt = reinterpret_cast<jintArray*>(&gameasset.obj);
    }
    return ret;
}

void UnreadGameImage(GameAssetsInfo& gameasset)
{
    FST;
    CatchJNIException(gameasset.env, "jS37");
    gameasset.env->DeleteLocalRef(gameasset.obj);
    CatchJNIException(gameasset.env, "jS38");
}

void StoreGameImageChunk(GameAssetsInfo& gameasset, void* store, int offset, int length)
{
    FST;
    CatchJNIException(gameasset.env, "jS39");
    gameasset.env->GetIntArrayRegion(*gameasset.arrInt, offset, length, (jint*)store);
    CatchJNIException(gameasset.env, "jS40");
}

int GetGameImageWidth(GameAssetsInfo& gameasset) { return gameasset.l_imageWidth; }
int GetGameImageHeight(GameAssetsInfo& gameasset) { return gameasset.l_imageHeight; }
int GetGameImageHasAlpha(GameAssetsInfo& gameasset) { return gameasset.l_imageHasAlpha; }

And it's backed by this on the Java side:

public class GameAssets {
    static public Resources res = null;
    static public InputStream is = null;
    static public byte buff[];
    static public int buffInt[];
    static public final int buffSize = 1024;
    static public final int buffIntSize = 2048;

    static public int imageWidth;
    static public int imageHeight;
    static public int imageHasAlpha;
    static public int imageLocX;
    static public int imageLocY;
    static public Bitmap mBitmap;
    static public BitmapFactory.Options decodeResourceOptions = new BitmapFactory.Options();

    public GameAssets(Resources r) {
        res = r;
        buff = new byte[buffSize];
        buffInt = new int[buffIntSize];
        decodeResourceOptions.inScaled = false;
    }
    public static final void OpenAsset(int id) {
        is = res.openRawResource(id);
    }
    public static final int ReadAsset() {
        int num = 0;
        try {
            num = is.read(buff);
        } catch (Exception e) {
            ;
        }
        return num;
    }
    public static final void CloseAsset() {
        try {
            is.close();
        } catch (Exception e) {
            ;
        }
        is = null;
    }

    // We want all the advantages that BitmapFactory can provide -- reading
    // images of compressed image formats -- so we provide our own interface
    // for it.
    public static final void OpenImage(int id) {
        mBitmap = BitmapFactory.decodeResource(res, id, decodeResourceOptions);
        imageWidth = mBitmap.getWidth();
        imageHeight = mBitmap.getHeight();
        imageHasAlpha = mBitmap.hasAlpha() ? 1 : 0;
        imageLocX = 0;
        imageLocY = 0;
    }
    public static final int ReadImage() {
        if (imageLocY >= imageHeight) return 0;
        int numReadPixels = buffIntSize;
        if (imageLocX + buffIntSize >= imageWidth)
        {
            numReadPixels = imageWidth - imageLocX;
            mBitmap.getPixels(buffInt, 0, imageWidth, imageLocX, imageLocY, numReadPixels, 1);
            imageLocY++;
        }
        else
        {
            mBitmap.getPixels(buffInt, 0, imageWidth, imageLocX, imageLocY, numReadPixels, 1);
            imageLocX += numReadPixels;
        }
        return numReadPixels;
    }
    public static final void CloseImage() {
    }
}

Please note the distinct lack of thread safety in the game asset code.

Let me know if more information would be useful.

jog*_*ito 1

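Posting from my earlier comment: "A JNI exception may have occurred, and since you don't return after the exception, that could lead to a crash. I don't know how Android's logging works, but in C a simple printf doesn't necessarily output the log immediately. So one possible crash scenario is that an exception occurred, but the system crashed before the log was written out."
I'll be offline for a few days. Hopefully the crash doesn't come back. I hate it when problems magically disappear without a clear explanation. They usually come right back to bite you ;-) Anyway, I hope you don't get bitten.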
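To illustrate the buffering point with plain C stdio (not Android's logging, whose behavior I haven't checked): a log helper can flush after every message so that nothing is left sitting in a stdio buffer if the process dies right afterwards. LogNow is a hypothetical name used only for this sketch.

```cpp
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

// Hypothetical helper: formats one message, appends a newline, and flushes
// the stream immediately so the text is handed to the OS before returning.
// Returns the number of characters in the formatted message (excluding the
// newline), or a negative value if formatting failed.
int LogNow(FILE* out, const char* fmt, ...) {
    assert(out != NULL && fmt != NULL);
    va_list args;
    va_start(args, fmt);
    int written = vfprintf(out, fmt, args);
    va_end(args);
    fputc('\n', out);
    fflush(out);  // do not leave the message in the stdio buffer
    return written;
}
```

Something like LogNow(stderr, "checkpoint %s", "j7") after each JNI call would rule out the "log lost in a buffer" scenario, at the cost of slower logging.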