Posts by ANT*_*ONY

My question is as follows: in some cases, people use mmap instead of reading from a file. One such piece of code is:

 *mapping = mmap(NULL, *mapping_size, PROT_READ | PROT_WRITE,
      MAP_POPULATE | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);


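The code above tries to allocate a large amount of memory. I would like to know what mmap does in this case and how it works. Everyone talks about the advantages of mmap with respect to files, but code like this, with fd set to -1, is common. What does it mean, and what is the benefit of doing it this way? I hope someone can clear this up; the question is vague because I am not sure how to frame it fully.

Thanks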
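
For reference, here is a minimal sketch of the call pattern being asked about, assuming Linux with glibc. Per the mmap man page, MAP_ANONYMOUS means the mapping is not backed by any file and its contents start out zeroed, which is why fd is passed as -1; MAP_POPULATE asks the kernel to prefault the pages. The helper names `alloc_anon` and `free_anon` are hypothetical and not part of the original code.

```c
#define _GNU_SOURCE            /* for MAP_ANONYMOUS / MAP_POPULATE on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Map `size` bytes of zero-filled memory that is not backed by a file.
 * Returns NULL on failure instead of MAP_FAILED. */
void *alloc_anon(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_POPULATE | MAP_ANONYMOUS | MAP_PRIVATE,
                   -1,   /* no file descriptor: the mapping is anonymous */
                   0);   /* offset is meaningless without a file */
    return p == MAP_FAILED ? NULL : p;
}

/* Release a mapping obtained from alloc_anon(); returns 0 on success. */
int free_anon(void *p, size_t size)
{
    return munmap(p, size);
}
```

Used this way, mmap acts as a page-granular allocator rather than as a file reader: the caller gets writable, zeroed memory and hands it back with munmap.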

c linux mmap linux-kernel

ANT*_*ONY

2017 05-23

4
votes

2
answers

4908
views

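What is the effect of the second argument in __builtin_prefetch()?

The GCC documentation here specifies the usage of __builtin_prefetch.

The third argument works perfectly. If it is 0, the compiler generates a prefetchnta (%rax) instruction; if it is 1, prefetcht2 (%rax); if it is 2, prefetcht1 (%rax); and if it is 3 (the default), prefetcht0 (%rax).

If we vary the third argument, the opcode changes accordingly.

But the second argument does not seem to have any effect.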

__builtin_prefetch(&x,1,2);
__builtin_prefetch(&x,0,2);
__builtin_prefetch(&x,0,1);
__builtin_prefetch(&x,0,0);


The above is the sample code; the assembly it generated is:

  27:   0f 18 10                prefetcht1 (%rax)
  2a:   48 8d 45 fc             lea    -0x4(%rbp),%rax
  2e:   0f 18 10                prefetcht1 (%rax)
  31:   48 8d 45 fc             lea    -0x4(%rbp),%rax
  35:   0f 18 18                prefetcht2 (%rax)
  38:   48 8d 45 fc             lea    -0x4(%rbp),%rax
  3c:   0f 18 00                prefetchnta (%rax)


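The change in opcode caused by the third argument can be observed. But even when I change the second argument (which specifies read or write), the assembly stays the same: compare <27,2a> and <2e,31>. So it does not pass any information to the machine. What, then, is the purpose of the second argument?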
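
To make the two values of the second argument concrete, here is a small sketch of how it is typically used: 0 when the prefetched data will be read, 1 when it will be written. The GCC documentation describes it as a compile-time hint, so whether it changes the emitted instruction depends on the target; my understanding, which is an assumption to verify against your compiler, is that on x86 a distinct write form (prefetchw) is emitted only when the target enables it, for example with -mprfchw. The function names and the distance of 16 elements are hypothetical.

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 16   /* hypothetical look-ahead, in elements */

/* Sum an array, hinting that upcoming elements will be READ (rw = 0). */
long sum_with_prefetch(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        s += a[i];
    }
    return s;
}

/* Fill an array, hinting that upcoming elements will be WRITTEN (rw = 1). */
void fill_with_prefetch(long *a, size_t n, long v)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 1, 3);
        a[i] = v;
    }
}
```

Comparing the disassembly of these two functions with and without the relevant target flag is one way to check whether the hint reaches the machine on a given setup.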

c x86 assembly gcc prefetch

ANT*_*ONY

2019 02-23

4
votes

2
answers

1556
views

How do I resolve "<not counted>" in perf?

The output of perf stat -d ./sample.out is:

Performance counter stats for './sample.out':

          0.586266 task-clock (msec)         #    0.007 CPUs utilized          
                 2 context-switches          #    0.003 M/sec                  
                 1 cpu-migrations            #    0.002 M/sec                  
               116 page-faults               #    0.198 M/sec                  
          7,35,790 cycles                    #    1.255 GHz                     [81.06%]
     <not counted> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
     <not counted> instructions            
     <not counted> branches                
     <not counted> branch-misses           
   <not supported> L1-dcache-loads:HG      
     <not counted> L1-dcache-load-misses:HG
     <not counted> LLC-loads:HG            
   <not supported> LLC-load-misses:HG      

       0.088013919 seconds time elapsed


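I have read about why <not counted> appears, but I am getting it even for basic counters such as instructions and branches. Can anyone suggest how to make this work?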
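
As background for the output above, here is a sketch of the arithmetic behind the bracketed percentage and the <not counted> rows, as far as I understand perf stat's behaviour. Each counter reports a raw value plus how long it was enabled and how long it was actually running on the hardware; when events are multiplexed, the raw value is scaled by enabled/running, and an event that never ran has nothing to scale. The struct and function names are hypothetical and do not come from perf itself.

```c
#include <stdint.h>

/* Hypothetical mirror of the three values read back for one event. */
struct counter_sample {
    uint64_t raw;      /* events counted while the event was on the PMU */
    uint64_t enabled;  /* time the event was enabled, in ns */
    uint64_t running;  /* time the event was actually scheduled, in ns */
};

/* Returns 0 for the case that would be shown as "<not counted>". */
int was_counted(const struct counter_sample *s)
{
    return s->enabled != 0 && s->running != 0;
}

/* Share of the enabled time the event was running, e.g. 81.06. */
double running_percent(const struct counter_sample *s)
{
    if (s->enabled == 0)
        return 0.0;
    return 100.0 * (double)s->running / (double)s->enabled;
}

/* Estimate for the whole enabled window, extrapolated from the part
 * that was measured; 0 if the event never ran. */
uint64_t scaled_count(const struct counter_sample *s)
{
    if (!was_counted(s))
        return 0;
    return (uint64_t)((double)s->raw * (double)s->enabled
                      / (double)s->running);
}
```

With a task-clock of about 0.59 ms, as in the output above, it seems plausible that some events were never scheduled before the program exited, but that is an assumption rather than a confirmed diagnosis.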

Interestingly:

sudo perf stat sleep 3

gives the output:

Performance counter stats …


linux performance x86 perf

ANT*_*ONY

2017 04-13

3
votes

1
answers

1470
views

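Converting Pentium II timing code to inline assembly?

I am trying to use the following code in GCC. It throws errors (I guess because of __asm). Why does this simple, easy format not work in GCC? The syntax of extended assembly is given here. I get confused when more variables are used in inline assembly. Could someone convert the following program into the appropriate form and give the necessary explanation wherever variables are used?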

    int time, subtime;
    float x = 5.0f;
    __asm {
            cpuid
            rdtsc
            mov     subtime, eax
            cpuid
            rdtsc
            sub     eax, subtime
            mov     subtime, eax    // Only the last value of subtime is kept
            // subtime should now represent the overhead cost of the
            // MOV and CPUID instructions
            fld     x
            fld     x
            cpuid                   // Serialize execution
            rdtsc                   // Read time stamp to EAX
            mov     time, eax
            fdiv                    // Perform division
            cpuid                   // Serialize …

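
As a rough illustration of what the GCC form could look like, here is a sketch in extended assembly, assuming an x86-64 target. It is not a line-by-line conversion of the truncated listing above: variables are passed through operand constraints ("=a" binds an output to EAX, "=d" to EDX) instead of being named inside the asm text, and the division is written in C, so the compiler will probably emit an SSE divide rather than the x87 fdiv of the original. The function names are hypothetical.

```c
#include <stdint.h>

/* Execute CPUID to serialize, then RDTSC to read the time-stamp counter. */
uint64_t tsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "cpuid\n\t"              /* wait for earlier instructions to retire */
        "rdtsc"                  /* EDX:EAX = time-stamp counter */
        : "=a"(lo), "=d"(hi)     /* outputs: EAX -> lo, EDX -> hi */
        : "a"(0)                 /* input: CPUID leaf 0 in EAX */
        : "ebx", "ecx", "memory" /* CPUID also overwrites EBX and ECX */
    );
    return ((uint64_t)hi << 32) | lo;
}

/* Time one float division, minus the cost of two back-to-back reads.
 * The result can be slightly negative because of measurement noise. */
int64_t time_division(float x, float *quotient)
{
    uint64_t t0 = tsc_serialized();
    uint64_t t1 = tsc_serialized();
    uint64_t overhead = t1 - t0;  /* cost of the CPUID + RDTSC pair alone */

    volatile float q;             /* volatile keeps the store between reads */
    uint64_t start = tsc_serialized();
    q = x / x;
    uint64_t end = tsc_serialized();

    *quotient = q;
    return (int64_t)(end - start) - (int64_t)overhead;
}
```

The clobber list is what replaces MSVC's implicit register tracking: GCC does not parse the asm text, so every register the instructions overwrite has to be declared as an output or a clobber.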

x86 assembly gcc code-conversion visual-c++

ANT*_*ONY

2019 09-13

2
votes

1
answers

121
views