Related questions and solutions (0)

Difference between malloc and calloc?

What is the difference between doing:

ptr = (char **) malloc (MAXELEMS * sizeof(char *));


or:

ptr = (char **) calloc (MAXELEMS, sizeof(char*));


When is it a good idea to use calloc over malloc, or vice versa?

c malloc calloc

use*_*033

2013 08-26

743
votes

13
answers

510k
views

MOVSD performance depends on arguments

I just noticed that pieces of my code exhibit different performance when copying memory. A test showed that memory-copy performance degrades if the address of the destination buffer is greater than the address of the source. It sounds ridiculous, but the following code shows the difference (Delphi):

  const MEM_CHUNK = 50 * 1024 * 1024;
        ROUNDS_COUNT = 100;


  LpSrc := VirtualAlloc(0,MEM_CHUNK,MEM_COMMIT,PAGE_READWRITE);
  LpDest := VirtualAlloc(0,MEM_CHUNK,MEM_COMMIT,PAGE_READWRITE);

  QueryPerformanceCounter(LTick1);
  for i := 0 to ROUNDS_COUNT - 1 do
    CopyMemory(LpDest,LpSrc,MEM_CHUNK);
  QueryPerformanceCounter(LTick2);
    // show timings

  QueryPerformanceCounter(LTick1);
  for i …


delphi performance x86 assembly memory-bandwidth

use*_*735

2019 07-23

9
votes

1
answer

252
views

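Copying from an uninitialized buffer is much faster than copying from an initialized one

I was tasked with developing test software that generates 100 Gbps of traffic over a single TCP socket on Linux (x86-64, kernel 4.15) on a machine with 32 GB of RAM.

I developed code like the following (some sanity checks removed for simplicity) to run over a pair of veth interfaces, one of which is in a different netns.

According to the open-source tool bmon, it generates about 60 Gbps on my PC. To my surprise, if I remove the statement memset(buff, 0, size);, I get about 94 Gbps. This is very puzzling.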

void test(int sock) {
    int size = 500 * 0x100000;
    char *buff = malloc(size);
    //optional
    memset(buff, 0, size);
    int offset = 0;
    int chunkSize = 0x200000;
    while (1) {
        offset = 0;
        while (offset < size) {
            chunkSize = size - offset;
            if (chunkSize > CHUNK_SIZE) chunkSize = CHUNK_SIZE;
            send(sock, &buff[offset], chunkSize, …


c sockets linux x86 linux-kernel

pac*_*tie

2022 06-13

6
votes

1
answer

355
views

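A program that deliberately raises the L1 cache miss rate

I am currently trying to write a program with as high an L1 miss rate as possible.

To measure the L1 miss rate, I use the MEM_LOAD_RETIRED.L1_MISS and MEM_LOAD_RETIRED.L1_HIT performance counter events on an Intel Core i7 processor (I am not interested in fill-buffer hits). I modified the Linux kernel to provide exact measurements at every context switch, so that I can determine exactly how many hits and misses each program gets.

The hardware prefetchers are disabled.

This is the code I currently have: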

#define LINE_SIZE 64
#define CACHE_SIZE 4096 * 8
#define MEM_SIZE CACHE_SIZE * 64


void main(int argc, char* argv[])
{

    volatile register char* addr asm ("r12") = mmap(0, MEM_SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);

    volatile register unsigned long idx asm ("r13") = 0;
    volatile register unsigned long store_val asm ("r14") = 0;

    volatile register unsigned long x64 asm ("r15") = 88172645463325252ull;

    while(1) …


c linux intel cpu-cache

arn*_*arn

2022 08-09

5
votes

1
answer

246
views

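Why is iterating over std::array much faster than iterating over std::vector?

Editor's note:
A follow-up question with optimization enabled that times only the loop:
Why is iterating through `std::vector` faster than iterating through `std::array`?
There we can see the effect of lazy-allocation page faults when reading untouched BSS memory versus dynamically allocated and written memory that is initialized outside the timed loop.

I tried profiling this code: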

#include <vector>
#include <array>
#include <stdio.h>

using namespace std;

constexpr int n = 400'000'000;
//vector<int> v(n);
array<int, n> v;

int main()
{
    int res = 0;
    for(int x : v)
        res += x;
    printf("%d\n", res);
}


On my machine, the array version is faster than the vector version.

Memory allocation is irrelevant in this case, since it happens only once.

$ g++ arrVsVec.cpp -O3
$ time ./a.out
0

real    0m0,445s
user    0m0,203s
sys 0m0,238s


$ g++ arrVsVec.cpp -O3
$ time ./a.out
0

real    0m0,749s
user    0m0,273s
sys …


c++ linux performance microbenchmark

tuk*_*ket

2019 07-21

3
votes

3
answers

185
views

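Idiomatic way of performance evaluation?

I am evaluating a network + rendering workload for my project.

The program continuously runs a main loop: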

while (true) {
   doSomething()
   drawSomething()
   doSomething2()
   sendSomething()
}


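The main loop runs more than 60 times per second.

I want to see a performance breakdown: how much time each procedure takes.

My concern is that if I print the time interval at every entry and exit of each procedure,

it would incur a huge performance overhead.

I am curious what the idiomatic way of measuring performance is.

Is log printing good enough?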

benchmarking microbenchmark

shp*_*ark

lucky-day

1
vote

1
answer

1322
views

Tag statistics

c ×3

linux ×3

microbenchmark ×2

performance ×2

x86 ×2

assembly ×1

benchmarking ×1

c++ ×1

calloc ×1

cpu-cache ×1

delphi ×1

intel ×1

linux-kernel ×1

malloc ×1

memory-bandwidth ×1

sockets ×1
