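Tracking down a mysterious high-priority thread suspend inside the kernel

Description

I am working on an embedded Linux system running on a multicore ARMv7a SoC (kernel 3.4 and bionic, Android-like). We have a userspace thread that basically services events coming out of the kernel. The events are generated from an IRQ and userspace must react to them with very low latency.

The thread runs with SCHED_FIFO priority 0. It is the only priority 0 thread in the system. Approximate code for the thread: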

    while (1)
    {
        struct pollfd fds[1];
        fds[0].fd = fd;
        fds[0].events = POLLIN|POLLRDNORM|POLLPRI;

        int ret = poll(fds, 1, reallyLongTimeout);
        FTRACE("poll() exit");
        if (ret > 0)
        {
            // notify worker threads of pending events
        }
    }

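Usually we get really good latency (the thread makes a full round trip back into poll() within the same millisecond in which the IRQ happened), but at random we see delays in the tens of milliseconds that wreck everything. After tracing it all over the place I have come to the conclusion that the delay happens after the IRQ fires and before the poll() system call returns, because the thread puts itself to sleep. Some time later it gets woken up by some unknown force and everything carries on.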
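
For reference, the round-trip delay described above can also be estimated from user space. The following is only a minimal sketch under stated assumptions: `poll_and_stamp` and `pipe_wakeup_latency_ns` are hypothetical helper names, a pipe stands in for the real event source, and `CLOCK_MONOTONIC` is assumed to be an acceptable time base (it is not necessarily the clock the kernel tracer uses).

```cpp
#include <poll.h>
#include <time.h>
#include <unistd.h>
#include <cstdint>

// Monotonic clock reading in nanoseconds.
static std::int64_t now_ns() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Result of one poll() call plus the timestamp taken right after it returned.
struct PollStamp {
    int ret;               // return value of poll()
    std::int64_t exit_ns;  // CLOCK_MONOTONIC time at poll() exit
};

// Hypothetical helper: same event mask as the loop in the question,
// but the moment poll() returns is recorded.
PollStamp poll_and_stamp(int fd, int timeout_ms) {
    pollfd fds[1];
    fds[0].fd = fd;
    fds[0].events = POLLIN | POLLRDNORM | POLLPRI;
    fds[0].revents = 0;
    const int ret = poll(fds, 1, timeout_ms);
    return PollStamp{ret, now_ns()};
}

// Hypothetical self-check: a pipe plays the role of the event source.
// Returns the nanoseconds between raising the event and poll() exit,
// or -1 if anything failed.
std::int64_t pipe_wakeup_latency_ns() {
    int p[2];
    if (pipe(p) != 0) return -1;
    std::int64_t latency = -1;
    const std::int64_t raised = now_ns();
    const char byte = 'x';
    if (write(p[1], &byte, 1) == 1) {
        const PollStamp s = poll_and_stamp(p[0], 1000);
        if (s.ret == 1) latency = s.exit_ns - raised;
    }
    close(p[0]);
    close(p[1]);
    return latency;
}
```

In the real system the "raised" timestamp would have to come from the driver or the IRQ handler; logging the difference per event makes the occasional tens-of-milliseconds outliers visible without kernel tracing.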

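I suspected some other IRQ, but after enabling sched:*, irq:* and timer:* tracing I had to rule that out. I had some difficulty porting the syscalls:* tracers to the ARM kernel. The syscalls tracers work, but if I also enable sched:* I get all sorts of panics inside the ring_buffer code.

After inserting some custom trace points into sys_poll() I have come to the uncomfortable conclusion that my thread falls asleep after sys_poll() has returned, but before it re-emerges in userspace.

Here is the annotated trace with my custom trace points in fs/select.c: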

 <my thread>-915   [001] ...1    17.589394: custom: do_poll:786 - calling do_pollfd
 <my thread>-915   [001] ...1    17.589399: custom: do_poll:794 - failed, no events
 <my thread>-915   [001] ...1    17.589402: custom: do_poll:823 - going to sleep, count = …

multithreading multicore linux-kernel embedded-linux low-latency

Yur*_*nko

2014 06-06

24
votes

2
answers

1439
views

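Is there a lock-free vector implementation?

The first result on Google for "lock free vector" is a research paper by Damian Dechev, Peter Pirkelbauer and Bjarne Stroustrup describing a theoretical lock-free vector. Has this, or any other lock-free vector, been implemented?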

c++ concurrency vector lock-free concurrent-vector

qdi*_*dii

2014 08-19

11
votes

1
answers

4379
views

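Why does my thread sometimes "stutter"?

I am trying to write some multithreaded code that reads from a DAQ device and renders the captured signal at the same time: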

std::atomic <bool> rendering (false);
auto render = [&rendering, &display, &signal] (void)
    {
        while (not rendering)
            {std::this_thread::yield ();};
        do {display.draw (signal);}
            while (display.rendering ()); // returns false when user quits
        rendering = false;
    };
auto capture = [&rendering, &daq] (void)
    {
        for (int i = daq.read_frequency (); i --> 0;)
            daq.record (); // fill the buffer before displaying the signal
        rendering = true;
        do {daq.record ();} 
            while (rendering);
        daq.stop ();
    };
std::thread rendering_thread (render);
std::thread capturing_thread (capture);

rendering_thread.join ();
capturing_thread.join …

c++ multithreading openmp c++11

eve*_*ode

2014 05-05

11
votes

1
answers

630
views

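pthreads v. SSE weak memory ordering

Do the Linux glibc pthread functions on x86_64 act as fences for weakly-ordered memory accesses? (pthread_mutex_lock/unlock are the exact functions I am interested in.)

SSE2 provides some instructions with weak memory ordering (in particular non-temporal stores such as movntps). If you are using these instructions and want to guarantee that another thread/core sees an ordering, then as I understand it you need an explicit fence, e.g. an sfence instruction.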
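
To make the explicit-fence pattern concrete, here is a minimal sketch assuming an x86 target with SSE and the compiler intrinsics `_mm_stream_ps` (movntps) and `_mm_sfence`. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical. The sketch only illustrates the explicit sfence; it says nothing about whether a mutex operation would provide the same ordering.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_set_ps, _mm_stream_ps, _mm_sfence
#include <atomic>

alignas(16) float payload[4];    // target of the non-temporal store (must be 16-byte aligned)
std::atomic<bool> ready{false};  // flag another thread would watch

// Producer: write the payload with a weakly-ordered store, then publish it.
void publish(float a, float b, float c, float d) {
    const __m128 v = _mm_set_ps(d, c, b, a);  // element 0 = a, ..., element 3 = d
    _mm_stream_ps(payload, v);                // non-temporal store (movntps)
    _mm_sfence();                             // make the streaming store visible before later stores
    ready.store(true, std::memory_order_release);
}

// Consumer: copies the payload only once the flag has been observed.
bool try_consume(float out[4]) {
    if (!ready.load(std::memory_order_acquire)) return false;
    for (int i = 0; i < 4; ++i) out[i] = payload[i];
    return true;
}
```

Without the `_mm_sfence()` call, the flag store could become visible to another core before the streamed payload does, which is exactly the ordering problem the paragraph above describes.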

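Normally you do expect the pthread API to act as a fence where appropriate. However, I suspect that normal C code on x86 does not generate weakly-ordered memory accesses, so I am not confident that pthreads needs to act as a fence for weakly-ordered accesses.

Reading through the glibc pthread source code, a mutex is in the end implemented using "lock cmpxchgl", at least on the uncontended path. So I guess what I need to know is: does that instruction act as a fence for SSE2 weakly-ordered accesses?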

multithreading sse pthreads atomic memory-fences

use*_*586

2014 08-09

9
votes

1
answers

285
views

Parallelizing multiple nested loops with tbb

What is the best way to parallelize three nested, independent loops with tbb?

for(int i=0; i<100; i++){
    for(int j=0; j<100; j++){
        for(int k=0; k<100; k++){
            printf("Hello World \n");
        }
     }
 }
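
One common idea is to flatten the three loops into a single index range and split that range among workers. The sketch below is an illustration of that idea only: it uses plain `std::thread` instead of TBB so that it has no external dependency, and `parallel_for_3d` is a hypothetical helper name. With TBB, the same flattened range could presumably be handed to `tbb::parallel_for`, or the three dimensions kept separate with `tbb::blocked_range3d`.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Calls body(i, j, k) once for every point of an ni x nj x nk iteration
// space. The flattened range [0, ni*nj*nk) is cut into contiguous chunks,
// one per worker thread.
template <typename Body>
void parallel_for_3d(std::size_t ni, std::size_t nj, std::size_t nk,
                     unsigned workers, Body body) {
    const std::size_t total = ni * nj * nk;
    if (workers == 0) workers = 1;
    const std::size_t chunk = (total + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(total, w * chunk);
        const std::size_t end = std::min(total, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back([=] {
            for (std::size_t idx = begin; idx < end; ++idx) {
                // Recover the three loop indices from the flat index.
                const std::size_t i = idx / (nj * nk);
                const std::size_t j = (idx / nk) % nj;
                const std::size_t k = idx % nk;
                body(i, j, k);
            }
        });
    }
    for (auto& t : pool) t.join();
}
```

The body must be safe to call concurrently for different (i, j, k); calling printf from it, as in the question, would work but would serialize on the output stream.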

c++ tbb

use*_*182

2015 04-20

8
votes

1
answers

2574
views

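Relative shebang: how to write an executable script running a bundled portable interpreter

Let's say we have a program/package that comes with its own interpreter and a set of scripts that should invoke it when executed (using a shebang).

And let's say we want to keep it portable, so that it keeps working even if it is simply copied to a different location (on a different machine) without running setup/install or modifying the environment (PATH). A system interpreter must not be mixed in for these scripts.

The given constraints rule out the two known approaches, namely a shebang with an absolute path: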

#!/usr/bin/python

and a search in the environment:

#!/usr/bin/env python

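A separate launcher looks ugly and is not acceptable.

I found a good summary of the shebang limitations, which describes why a relative path in the shebang is useless and why the interpreter cannot take more than one argument: http://www.in-ulm.de/~mascheck/various/shebang/

I also found practical solutions for most languages using the "multi-line shebang" trick. It allows writing scripts like this: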

#!/bin/sh
"exec" "`dirname $0`/python2.7" "$0" "$@"
print copyright

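But sometimes we do not want to use this approach to extend/patch existing scripts that rely on a shebang with an absolute path to the interpreter. For example, Python's setup.py supports the --executable option, which basically allows specifying the shebang content for the scripts it generates: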

python setup.py build --executable=/opt/local/bin/python

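So, in particular, what can be specified for --executable= in order to achieve the desired portability? Or in other words, since I would like to keep the question from being too Python-specific...

The question

How can one write a shebang that specifies an interpreter with a path relative to the location of the script being executed?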

python unix linux shell shebang

Ant*_*ton

2015 10-21

7
votes

2
answers

4066
views

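concurrent_vector for a 2D array

I am currently trying to represent a 2D array using tbb::concurrent_vector<T>. This 2D array will be accessed by many different threads, which is why I want it to handle parallel access as efficiently as possible.

I came up with two solutions:

Use a tbb::concurrent_vector<tbb::concurrent_vector<T> > to store it.
Store everything in a single tbb::concurrent_vector<T> and access elements with x * width + y.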
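
The second option can be sketched as a thin wrapper around one contiguous buffer. This is a minimal illustration only: `Grid2D` is a hypothetical name, and `std::vector` stands in for `tbb::concurrent_vector<T>` so that the example has no TBB dependency. The index arithmetic is the same either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2D view over a single flat buffer.
// Assumption: std::vector replaces tbb::concurrent_vector<T> for illustration.
template <typename T>
class Grid2D {
public:
    Grid2D(std::size_t height, std::size_t width)
        : width_(width), height_(height), cells_(height * width) {}

    // Flat position of element (x, y): x selects the row, y the column.
    std::size_t index(std::size_t x, std::size_t y) const {
        assert(x < height_ && y < width_);
        return x * width_ + y;
    }

    T& at(std::size_t x, std::size_t y) { return cells_[index(x, y)]; }
    const T& at(std::size_t x, std::size_t y) const { return cells_[index(x, y)]; }

private:
    std::size_t width_;
    std::size_t height_;
    std::vector<T> cells_;  // height * width value-initialized elements
};
```

For example, in a grid of height 3 and width 4, element (2, 1) lives at flat index 2 * 4 + 1 = 9. Each access touches exactly one element of one container, so there is no intermediate per-row object involved.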

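I prefer the second one, because I do not want to lock an entire row to access a single element (I assume that to access the element array[x][y], the tbb implementation will lock the x-th row and then the y-th element).

I would like to know which solution seems better to you.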

c++ parallel-processing multithreading tbb concurrent-vector

Rip*_*lka

2014 08-19

6
votes

1
answers

1928
views

Why does OpenMP outperform threads?

I have been calling this in OpenMP:

#pragma omp parallel for num_threads(totalThreads)
for(unsigned i=0; i<totalThreads; i++)
{
workOnTheseEdges(startIndex[i], endIndex[i]);
}

and this with C++11 std::threads (I believe those are just pthreads):

vector<thread> threads;
for(unsigned i=0; i<totalThreads; i++)
{
threads.push_back(thread(workOnTheseEdges,startIndex[i], endIndex[i])); 
}
for (auto& thread : threads)
{
 thread.join();
}

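But the OpenMP implementation is twice as fast! I would have expected C++11 threads to be faster, since they are more low-level. Note: the code above is not called just once but probably 10,000 times in a loop, so maybe that has something to do with it?

Edit: to clarify, in practice I use either the OpenMP version or the C++11 version, not both at the same time. With the OpenMP code it takes 45 seconds; with the C++11 code it takes 100 seconds.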
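
One factor that can be measured directly is the cost of creating and joining the threads on every call, since the std::thread version above does that each time it runs, whereas OpenMP runtimes commonly keep their worker threads alive between parallel regions. The following is a rough sketch for estimating that overhead; `spawn_join_cost_us` is a hypothetical helper, and the empty task means it measures thread management only, not the real workload.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Creates and joins `threads` std::thread objects, `rounds` times in a row,
// each thread running an empty task. Returns the total wall-clock time in
// microseconds spent purely on thread creation and joining.
long long spawn_join_cost_us(unsigned rounds, unsigned threads) {
    const auto start = std::chrono::steady_clock::now();
    for (unsigned r = 0; r < rounds; ++r) {
        std::vector<std::thread> pool;
        pool.reserve(threads);
        for (unsigned t = 0; t < threads; ++t)
            pool.emplace_back([] {});  // empty task: no useful work
        for (auto& th : pool) th.join();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling it with `rounds = 10000` and the real `totalThreads` gives an estimate that can be compared against the 55-second gap reported above; whether it explains the difference depends on the machine and the workload.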

c++ multithreading c++11

use*_*666

2014 08-26

6
votes

2
answers

3381
views

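How to do multithreaded queue processing

C++ containers are supposed to be thread-safe by default. I must be using queue incorrectly with multiple threads, because for this code: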

#include <thread>
using std::thread;
#include <iostream>
using std::cout;
using std::endl;
#include <queue>
using std::queue;
#include <string>
using std::string;
using std::to_string;
#include <functional>
using std::ref;


void fillWorkQueue(queue<string>& itemQueue) {
    int size = 40000;
    for(int i = 0; i < size; i++)
        itemQueue.push(to_string(i));
}

void doWork(queue<string>& itemQueue) {
    while(!itemQueue.empty()) {
        itemQueue.pop();
    }   
}

void singleThreaded() {
    queue<string> itemQueue;
    fillWorkQueue(itemQueue);
    doWork(itemQueue);
    cout << "done\n";
}

void multiThreaded() {
    queue<string> itemQueue;
    fillWorkQueue(itemQueue);
    thread t1(doWork, ref(itemQueue));
    thread t2(doWork, ref(itemQueue)); …
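
Note that std::queue gives no guarantee for unsynchronized concurrent modification, so two threads calling empty() and pop() on the same queue race with each other. A minimal sketch of one way to guard the queue with a std::mutex follows; `drainQueue` and `drainWithTwoThreads` are hypothetical names, and the "work" is reduced to counting items.

```cpp
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Pops items until the queue is empty. The empty() check and the pop()
// happen under the same lock, so no other thread can slip in between them.
// Returns the number of items this caller processed.
std::size_t drainQueue(std::queue<std::string>& itemQueue, std::mutex& queueMutex) {
    std::size_t processed = 0;
    for (;;) {
        std::string item;
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            if (itemQueue.empty()) break;
            item = std::move(itemQueue.front());
            itemQueue.pop();
        }
        ++processed;  // real work on `item` would go here, outside the lock
    }
    return processed;
}

// Fills the queue with 40000 items, drains it with two threads, and returns
// the total number of items processed by both.
std::size_t drainWithTwoThreads() {
    std::queue<std::string> itemQueue;
    for (int i = 0; i < 40000; ++i) itemQueue.push(std::to_string(i));
    std::mutex queueMutex;
    std::size_t first = 0;
    std::size_t second = 0;
    std::thread t1([&] { first = drainQueue(itemQueue, queueMutex); });
    std::thread t2([&] { second = drainQueue(itemQueue, queueMutex); });
    t1.join();
    t2.join();
    return first + second;
}
```

Every item is popped exactly once, so the two per-thread counts always add up to the number of items pushed.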

c++ multithreading c++11 concurrent-queue

Chr*_*ord

2014 05-14

6
votes

2
answers

20k
views