小编jsg*_*guy的帖子

为什么这个C++函数会产生如此多的分支错误预测？

让A是包含奇数个零和一的数组.如果n是大小A,则A构造为使得第一ceil(n/2)元素0和剩余元素1.

所以如果n = 9,A看起来像这样:

0,0,0,0,0,1,1,1,1

目标是找到1s数组中的总和,我们使用此函数执行此操作:

s = 0;
void test1(int curIndex){
    //A is 0,0,0,...,0,1,1,1,1,1...,1

    if(curIndex == ceil(n/2)) return;

    if(A[curIndex] == 1) return;

    test1(curIndex+1);
    test1(size-curIndex-1);

    s += A[curIndex+1] + A[size-curIndex-1];

}

Run Code Online (Sandbox Code Playgroud)

对于给出的问题,这个函数相当愚蠢,但它是一个不同函数的模拟,我希望看起来像这样,并产生相同数量的分支误预测.

以下是整个实验代码:

#include <iostream>
#include <fstream>

using namespace std;


int size;
int *A;
int half;
int s;

void test1(int curIndex){
    //A is 0,0,0,...,0,1,1,1,1,1...,1

    if(curIndex == half) return;
    if(A[curIndex] == …

Run Code Online (Sandbox Code Playgroud)

c++ performance branch-prediction

jsg*_*guy

2016 09-19

33
推荐指数

1
解决办法

1279
查看次数

为什么DFS在一棵树中较慢而在另一棵树中较快？

更新:原来在解析器中有一个生成树的错误.更多在最终编辑.

我们T是一个二叉树,使得每一个内部节点正好有两个孩子.对于这棵树,我们要的代码,为每个节点的功能v中T发现了被定义子树的节点数目v.

例

输入

期望的输出

用红色表示我们想要计算的数字.树的节点将存储在一个数组中,让我们TreeArray按照预先排序布局调用它.

对于上面的示例,TreeArray将包含以下对象:

10, 11, 0, 12, 13, 2, 7, 3, 14, 1, 15, 16, 4, 8, 17, 18, 5, 9, 6

树的节点由以下结构描述:

struct tree_node{

    long long int id; //id of the node, randomly generated
    int numChildren; //number of children, it is 2 but for the leafs it's 0
    int size; //size of the subtree rooted at the current node,
    // what we want …

Run Code Online (Sandbox Code Playgroud)

c++ algorithm tree performance caching

jsg*_*guy

2016 09-15

30
推荐指数

1
解决办法

562
查看次数

为什么G ++编译器不能以同样的方式处理这两个函数？

我有一个A零和一个数组.我想找到所有数字的总和A.我想测试两个函数:

第一个功能

void test1(int curIndex){
    if(curIndex == size) return;
    test1(curIndex+1);
    s+=A[curIndex];
}

Run Code Online (Sandbox Code Playgroud)

第二功能

void test2(int curIndex){
    if(curIndex == size) return;
    s+=A[curIndex];
    test2(curIndex+1);
}

Run Code Online (Sandbox Code Playgroud)

我使用PAPI库来计算指令数量,这是整个实验:

#include <iostream>
#include <fstream>
#include "Statistics.h"

using namespace std;


int size;
int *A;
int s;

void test3(int curIndex){
    if(curIndex == size) return;
    test3(curIndex+1);
    s+=A[curIndex];
}

int main(int argc, char* argv[]){

    size = atoi(argv[1]);
    if(argc!=2){
        cout<<"type ./executable size{odd integer}"<<endl;
        return 1;
    }
    if(size%2!=1){
        cout<<"size must be an odd number"<<endl;
        return 1; …

Run Code Online (Sandbox Code Playgroud)

c++ performance

jsg*_*guy

2016 09-20

28
推荐指数

1
解决办法

1894
查看次数

为什么Perf和Papi为L3缓存引用和未命中提供不同的值？

我正在开发一个项目,我们必须实现一个在理论上被证明是缓存友好的算法.简单来说,如果N是输入,并且B是每次我们有高速缓存未命中时在高速缓存和RAM之间传输的元素数,则该算法将需要O(N/B)访问RAM.

我想表明这确实是实践中的行为.为了更好地理解如何测量各种缓存相关的硬件计数器,我决定使用不同的工具.一个是Perf,另一个是PAPI库.不幸的是,我使用这些工具越多,我就越不了解他们的确切做法.

我正在使用Intel(R)Core(TM)i5-3470 CPU @ 3.20GHz,8 GB RAM,L1缓存256 KB,L2缓存1 MB,L3缓存6 MB.缓存行大小为64字节.我猜这必须是块的大小B.

我们来看下面的例子:

#include <iostream>

using namespace std;

struct node{
    int l, r;
};

int main(int argc, char* argv[]){

    int n = 1000000;

    node* A = new node[n];

    int i;
    for(i=0;i<n;i++){
        A[i].l = 1;
        A[i].r = 4;
    }

    return 0;
}

Run Code Online (Sandbox Code Playgroud)

每个节点需要8个字节,这意味着一个缓存行可以容纳8个节点,所以我应该期待大约1000000/8 = 125000L3缓存未命中.

没有优化(否-O3),这是perf的输出:

 perf stat -B -e cache-references,cache-misses ./cachetests 

 Performance counter stats for …

Run Code Online (Sandbox Code Playgroud)

c++ performance caching papi perf

jsg*_*guy

2016 09-27

13
推荐指数

1
解决办法

1822
查看次数

为什么访问类的私有变量与访问结构的变量一样有效？

我实现了一些主数据结构是树的算法.我使用一个类来表示一个节点和一个表示树的类.因为节点得到了很多更新,所以我称之为许多setter和getter.

因为我多次听说函数调用很昂贵,所以我想也许如果我使用结构表示节点和树,它会使我的算法在实践中更有效率.

在这样做之前,我决定进行一个小实验,看看是否真的如此.

我创建了一个有一个私有变量,一个setter和一个getter的类.此外,我创建了一个具有一个变量的结构,没有setter/getter,因为我们可以通过调用来更新变量struct.varName.结果如下:

运行次数是我们调用setter/getter的次数.这是实验的代码:

#include <iostream>
#include <fstream>

#define BILLION  1000000000LL

using namespace std;

class foo{
private:
    int a;
public:

    void set(int newA){
        a = newA;
    }
    int get(){
        return a;
    }
};

struct bar{
    int a;
};

timespec startT, endT;

void startTimer(){
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startT);
}

double endTimer(){
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endT);
    return endT.tv_sec * BILLION + endT.tv_nsec - (startT.tv_sec * BILLION + startT.tv_nsec);
}


int main() {

    int runs = 10000000;
    int startRun = 10000;
    int step = …

Run Code Online (Sandbox Code Playgroud)

c++ algorithm struct

jsg*_*guy

2016 04-30

9
推荐指数

1
解决办法

170
查看次数

如何让matplotlib显示所有x坐标？

例如,在以下代码中:

import numpy as np
import matplotlib.pyplot as plt

N = 10
x = [1,2,3,4,5,6,7,8,9,10]
y = np.random.rand(N)

plt.scatter(x, y)
plt.show()

Run Code Online (Sandbox Code Playgroud)

我得到以下情节

在此输入图像描述

如您所见,在x轴上只显示偶数值.如何强制matplotlib显示所有值,即1 2 3 4 5 6 7 8 9 10？

python matplotlib

jsg*_*guy

lucky-day

8
推荐指数

1
解决办法

2万
查看次数

为什么这个递归C++函数有这么糟糕的缓存行为？

我们T是一个有根的二叉树,使得每一个内部节点正好有两个孩子.树的节点将存储在一个数组中,让我们TreeArray按照预先排序布局调用它.

例如,如果这是我们拥有的树:

然后TreeArray将包含以下节点对象:

7, 3, 1, 0, 2, 6, 12, 9, 8, 11, 13

此树中的节点是此类结构:

struct tree_node{

    int id; //id of the node, randomly generated
    int numChildren; //number of children, it is 2 but for the leafs it's 0

    int pos; //position in TreeArray where the node is stored
    int lpos; //position of the left child
    int rpos; //position of the right child

    tree_node(){
            id = -1;
            pos = lpos = rpos = -1; …

Run Code Online (Sandbox Code Playgroud)

c++ tree caching

jsg*_*guy

2016 09-08

7
推荐指数

1
解决办法

377
查看次数

为什么必须为存储在堆中的函数的局部变量调用delete？

假设您具有以下功能:

 void doSomething(){
    int *data = new int[100];
 }

Run Code Online (Sandbox Code Playgroud)

为什么会产生内存泄漏？由于我无法在函数外部访问此变量,为什么每次调用此函数时编译器都不会自行调用delete？

c++ memory memory-leaks

jsg*_*guy

lucky-day

6
推荐指数

2
解决办法

5081
查看次数

如果任何命令运行时间过长，则中止脚本

我有一个名为的应用程序A，它接收输入n并在返回答案之前运行一段时间。

我想使用以下简单的 bash 脚本A来增加大小：n

#!/bin/sh

time ./A 10
time ./A 20
time ./A 30
.
.
.

Run Code Online (Sandbox Code Playgroud)

然而我还需要一些非常重要的东西。如果一个time调用执行时间太长，因为输入很大，那么我需要整个 bash 脚本停止运行。

有一个简单的方法可以做到这一点吗？

bash

jsg*_*guy

2017 04-21

5
推荐指数

1
解决办法

4789
查看次数

如果以root身份运行它,为什么这个C++程序需要很长时间才能完成？

我想通过执行以下代码清除L1,L2和L3缓存50次.但是如果我通过输入来运行它会变得非常慢sudo ./a.out.另一方面,如果我只是写./a.out它将几乎立即完成执行.我不明白这个的原因,因为我没有在终端中出现任何错误.

#include <iostream>
#include <cstdlib>
#include <vector>
#include <fstream>
#include <unistd.h>

using namespace std;

void clear_cache(){
    sync();
    std::ofstream ofs("/proc/sys/vm/drop_caches");
    ofs << "3" << std::endl;
    sync();
}


int main() {

    for(int i = 0; i < 50; i++)
        clear_cache();

    return 0;
};

Run Code Online (Sandbox Code Playgroud)

c++ caching

jsg*_*guy

lucky-day

4
推荐指数

1
解决办法

157
查看次数