Posts by ead

Cleanly suppressing gcc's `final` suggestion warnings (`-Wsuggest-final-types` and `-Wsuggest-final-methods`)

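I like to compile my code with -Wsuggest-final-types and -Wsuggest-final-methods so that I am warned about opportunities where the final keyword could be used to let the compiler optimize more aggressively.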

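Sometimes, though, the suggestions are incorrect. For example, I have a class Base with a virtual ~Base() destructor that is used polymorphically in another project, and gcc suggests that Base could be marked final.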

Is there a way to "cleanly" tell the compiler that Base is used polymorphically and should not be marked final?

The only way I can think of is using #pragma directives, but I find that they clutter the code and make it hard to read.

Ideally, I'm looking for a non-final keyword or attribute that can be prepended or appended to the class/method declaration.

c++ gcc warnings final

Score: 7 | Answers: 1 | Views: 531

PyPy: using None in a list of integers causes a severe performance penalty

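Because the algorithm I want to implement uses indices 1..n, and because shifting every index by one is very error-prone, I decided to be clever and insert a dummy element at the beginning of every list, so that I can use the original formulas from the paper.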

For the sake of brevity, consider the following toy example:

def calc(N):
    nums = [0] + list(range(1, N + 1))
    return sum(nums[1:])  # skip first element


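However, I was worried that my results might be bogus, because I could accidentally access the 0th element somewhere without realizing it. So I got even cleverer and used None instead of 0 as the first element, so that any arithmetic operation on it would cause a runtime error: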

def calc_safe(N):
    nums = [None] + list(range(1, N + 1))  # here we use "None"
    return sum(nums[1:])


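Surprisingly, this small change led to a huge performance penalty under PyPy (even with the current 5.8 version): the code became roughly 15 times slower (7.5 s vs. 0.5 s)! Here are the timings on my machine: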

                    pypy-5.8    cpython
calc(10**8)         0.5 sec     5.5 sec
calc_safe(10**8)    7.5 sec     5.5 sec


As a side note: CPython doesn't care whether None is used or not.

So my question is twofold:

Obviously using None is not a good idea, but why?
Is it possible to get the safety of the None approach and keep the performance?

Edit: As Armin explained, not all lists are the same. We can see which strategy is used via:

import __pypy__
print(__pypy__.strategy(nums))


In the first case it is IntegerListStrategy, in the second case ObjectListStrategy. The same happens if we use a big integer value (e.g. 2**100) instead of None.
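One way to keep the safety of the None guard while keeping the list purely integral is to drop the dummy element and do the 1-based bounds check in a thin wrapper instead. A minimal sketch, assuming a hypothetical OneBasedList helper (not something PyPy or the original post provides). The list it wraps only ever holds ints, so PyPy should be able to keep IntegerListStrategy for it; the cost of the wrapper itself under PyPy has not been measured here.

```python
class OneBasedList:
    """Hypothetical helper: a 1-based view over a list of ints.

    The backing list never contains None, so its storage stays homogeneous;
    touching index 0 (or anything outside 1..n) still fails loudly.
    """

    def __init__(self, values):
        self._data = list(values)

    def __len__(self):
        return len(self._data)

    def _check(self, i):
        # Reject 0, negative indices, indices past n and slices.
        if not isinstance(i, int) or not 1 <= i <= len(self._data):
            raise IndexError("1-based index out of range: %r" % (i,))

    def __getitem__(self, i):
        self._check(i)
        return self._data[i - 1]

    def __setitem__(self, i, value):
        self._check(i)
        self._data[i - 1] = value


def calc_wrapped(N):
    nums = OneBasedList(range(1, N + 1))
    # Same result as calc(N), but indexing follows the paper's 1..n convention.
    return sum(nums[i] for i in range(1, N + 1))


if __name__ == "__main__":
    print(calc_wrapped(10))  # 55
    try:
        OneBasedList([1, 2, 3])[0]
    except IndexError as exc:
        print("caught:", exc)
```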

python performance pypy

Score: 7 | Answers: 1 | Views: 77

Reason for the differences in memory consumption and performance between np.zeros and np.full

Measuring the memory consumption of np.zeros:

import psutil
import numpy as np

process = psutil.Process()
N=10**8
start_rss = process.memory_info().rss
a = np.zeros(N, dtype=np.float64)
print("memory for a", process.memory_info().rss - start_rss)


The result is an unexpected 8192 bytes, i.e. almost 0, whereas 1e8 doubles would need 8e8 bytes.

When np.zeros(N, dtype=np.float64) is replaced by np.full(N, 0.0, dtype=np.float64), the memory needed for a is 800002048 bytes.

There are similar differences in running time:

import numpy as np
N=10**8
%timeit np.zeros(N, dtype=np.float64)
# 11.8 ms ± 389 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.full(N, 0.0, dtype=np.float64)
# 419 ms …

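The usual explanation for these numbers is that np.zeros asks the allocator for already-zeroed memory (calloc), and for a block this large the kernel hands out zero pages lazily, backing a page with physical memory only on its first write; np.full has to write every element, so every page becomes resident immediately. A Linux-only sketch of that check, reading /proc/self/statm instead of psutil; the array size and the thresholds are illustrative assumptions, not measurements from the question.

```python
import os

import numpy as np

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def rss_bytes():
    """Resident set size of this process (Linux only: reads /proc/self/statm)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * PAGE_SIZE


# 40 MB of float64: on 64-bit glibc this is above the largest mmap threshold,
# so calloc can hand back fresh pages that are already zero without writing them.
N = 5 * 10**6

start = rss_bytes()
zeros = np.zeros(N, dtype=np.float64)
after_zeros = rss_bytes() - start  # pages mapped, but not yet touched

start = rss_bytes()
full = np.full(N, 0.0, dtype=np.float64)
after_full = rss_bytes() - start  # np.full writes every element

start = rss_bytes()
zeros[:] = 1.0  # the first write to the zeroed pages makes them resident
after_touch = rss_bytes() - start

print("np.zeros           :", after_zeros)
print("np.full            :", after_full)
print("writing into zeros :", after_touch)
```

If this reading is right, np.zeros does not avoid the cost of touching the memory, it defers it: the page faults show up at the first write, as in the last measurement.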

python performance numpy

Score: 7 | Answers: 1 | Views: 236

Cython - converting a list of strings to char**

How do I convert a Python list of Python strings into a null-terminated char** so that I can pass it to an external C function?

I have:

struct saferun_task:
    saferun_jail   *jail
    saferun_limits *limits

    char **argv
    int stdin_fd  
    int stdout_fd
    int stderr_fd

int saferun_run(saferun_inst *inst, saferun_task *task, saferun_stat *stat)


inside a cdef extern block.

I want to convert something like ('./a.out', 'param1', 'param2') into something I can assign to saferun_task.argv.

How can I do that?
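What saferun_task.argv expects is an array of len(args) + 1 pointers whose last entry is NULL, each pointing at a NUL-terminated byte string that stays alive while the C code uses it. As an illustration of that layout only, here it is built with the standard-library ctypes module rather than Cython (a swapped-in technique, not a Cython answer); make_argv is a hypothetical helper name.

```python
import ctypes


def make_argv(args, encoding="utf-8"):
    """Build a NULL-terminated char** (as a ctypes array) from a sequence of str."""
    encoded = [s.encode(encoding) for s in args]
    ArgvType = ctypes.c_char_p * (len(encoded) + 1)  # one extra slot for the NULL
    # ctypes keeps references to the bytes objects inside the array, so the
    # C strings stay valid for as long as the returned array is alive.
    return ArgvType(*encoded, None)


argv = make_argv(("./a.out", "param1", "param2"))
# A c_char_p array can be passed where a foreign function declares a
# ctypes.POINTER(ctypes.c_char_p) argument.
print([argv[i] for i in range(len(argv))])  # [b'./a.out', b'param1', b'param2', None]
```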

arrays list cython

Score: 6 | Answers: 1 | Views: 3266

Cython compilation appends text to the file name; how do I get rid of it?

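I'm using Cython on Ubuntu. Everything works fine except for one thing that bothers me: when compiling a Cython project to a .so file, "cpython-36m-x86_64-linux-gnu" is appended to the file name of the .pyx file. For example, if I build "helloworld.pyx", the resulting .so file is called "helloworld.cpython-36m-x86_64-linux-gnu.so". However, I just want it to be called "helloworld.so".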

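I assumed the answer would be fairly trivial, so I started googling, but even after 30 minutes I couldn't find anything on this topic. Does anyone have an idea?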

Here is my .pyx file:

print('hello world')


The setup.py file:

from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("helloworld.pyx")
)


Building:

python setup.py build_ext --inplace
Compiling helloworld.pyx because it changed.
[1/1] Cythonizing helloworld.pyx
running build_ext
building 'helloworld' extension
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/**/anaconda3/include/python3.6m -c helloworld.c -o build/temp.linux-x86_64-3.6/helloworld.o
gcc -pthread -shared -L/home/**/anaconda3/lib -Wl,-rpath=/home/ed/anaconda3/lib,--no-as-needed build/temp.linux-x86_64-3.6/helloworld.o -L/home/**/anaconda3/lib -lpython3.6m -o /home/**/new_project/helloworld.cpython-36m-x86_64-linux-gnu.so

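The appended part is the interpreter's extension-module suffix, which Python reports as sysconfig.get_config_var("EXT_SUFFIX") (judging by the build log, ".cpython-36m-x86_64-linux-gnu.so" on this system). A sketch of one way to drop it, assuming setuptools is used instead of distutils: subclass build_ext and override its get_ext_filename method. strip_abi_tag and BuildExtWithoutAbiTag are hypothetical names.

```python
import os
import sysconfig

from setuptools.command.build_ext import build_ext


def strip_abi_tag(filename, ext_suffix):
    """Turn e.g. 'helloworld.cpython-36m-x86_64-linux-gnu.so' into 'helloworld.so'."""
    if ext_suffix and filename.endswith(ext_suffix):
        # Keep only the plain platform extension (".so", ".pyd", ...).
        return filename[: -len(ext_suffix)] + os.path.splitext(ext_suffix)[1]
    return filename


class BuildExtWithoutAbiTag(build_ext):
    """build_ext variant that writes helloworld.so instead of the tagged name."""

    def get_ext_filename(self, ext_name):
        filename = super().get_ext_filename(ext_name)
        return strip_abi_tag(filename, sysconfig.get_config_var("EXT_SUFFIX"))


# Usage in setup.py (not executed here):
#     from setuptools import setup
#     from Cython.Build import cythonize
#     setup(ext_modules=cythonize("helloworld.pyx"),
#           cmdclass={"build_ext": BuildExtWithoutAbiTag})
```

On Linux the plain ".so" suffix is still listed in importlib.machinery.EXTENSION_SUFFIXES, so import helloworld keeps working; the trade-off is that builds for different Python versions can no longer sit side by side in the same directory.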

python gcc distutils cython cythonize

Score: 6 | Answers: 1 | Views: 2048

How g++ handles copying std::complex

As part of a self-education project, I looked into how g++ handles the std::complex type, and was puzzled by this simple function:

#include <complex>  
std::complex<double> c;

void get(std::complex<double> &res){
    res=c;
}


Compiling with g++-6.3 -O3 (or -Os) on Linux64, I get this result:

    movsd   c(%rip), %xmm0
    movsd   %xmm0, (%rdi)
    movsd   c+8(%rip), %xmm0
    movsd   %xmm0, 8(%rdi)
    ret


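So it moves the real and imaginary parts separately as 64-bit doubles. However, I would expect the assembly to use two movups instead of four movsd, i.e. to move the real and imaginary parts together as one 128-bit pack: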

    movups  c(%rip), %xmm0
    movups  %xmm0, (%rdi)
    ret


Not only is this twice as fast as the movsd version on my machine (Intel Broadwell), it also needs only 16 bytes, whereas the movsd version needs 36 bytes.

What is the reason for g++ generating the movsd version?

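Is there an additional compiler flag, besides -O3, that I should use to trigger the use of movups?
Are there disadvantages of using movups that I'm not aware of?
Does g++ simply not produce optimal assembly here?
Something else?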

More context: I tried to compare two possible function signatures:

std::complex<double> get(){
    return c;
}


and

void get(std::complex<double> &res){
    res=c;
}


Due to the System V …

c++ assembly gcc

Score: 6 | Answers: 1 | Views: 121

Tail call optimization seems to slightly worsen performance

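For a quicksort implementation, the left column shows data for plain -O2-optimized code, and the right column shows -O2-optimized code with the -fno-optimize-sibling-calls flag, i.e. with tail call optimization turned off. Each value is the average of 3 different runs; the variation between runs seems negligible. The array values are in the range 1-1000, and times are in milliseconds. The compiler is MinGW g++, version 6.3.0.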

size of array     with TCO (ms)   without TCO (ms)
      8M                35,083           34,051 
      4M                 8,952            8,627
      1M                   613              609


Here is my code:

#include <bits/stdc++.h>
using namespace std;

int N = 4000000;

void qsort(int* arr,int start=0,int finish=N-1){
    if(start>=finish) return ;
    int i=start+1,j = finish,temp;
    auto pivot = arr[start];
    while(i!=j){
        while (arr[j]>=pivot && j>i) --j;
        while (arr[i]<pivot && i<j) ++i;
        if(i==j) break;
        temp=arr[i];arr[i]=arr[j];arr[j]=temp; //swap big guy to right side
    }
    if(arr[i]>=arr[start]) --i;

    temp = arr[start];arr[start]=arr[i];arr[i]=temp; //swap pivot …

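Assuming the truncated function ends, as quicksort usually does, with two recursive calls, the second one is a tail call, and -foptimize-sibling-calls (enabled at -O2) turns it into a jump back to the top of the function. Written out by hand, that is the familiar "recurse on one part, loop on the other" form. A Python sketch of that shape, with a Lomuto partition standing in for the asker's partition code; partition, quicksort_recursive and quicksort_loop are hypothetical names, and the sketch says nothing about why the timings differ.

```python
import random


def partition(arr, lo, hi):
    """Lomuto partition around arr[hi]; returns the pivot's final index."""
    pivot = arr[hi]
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]
    return i


def quicksort_recursive(arr, lo, hi):
    if lo >= hi:
        return
    p = partition(arr, lo, hi)
    quicksort_recursive(arr, lo, p - 1)
    quicksort_recursive(arr, p + 1, hi)  # tail call: nothing happens after it


def quicksort_loop(arr, lo, hi):
    # Same algorithm with the tail call replaced by "update lo and loop",
    # roughly what tail-call elimination does in the generated machine code.
    while lo < hi:
        p = partition(arr, lo, hi)
        quicksort_loop(arr, lo, p - 1)
        lo = p + 1


random.seed(0)
data = [random.randint(1, 1000) for _ in range(10_000)]
a, b = data[:], data[:]
quicksort_recursive(a, 0, len(a) - 1)
quicksort_loop(b, 0, len(b) - 1)
print(a == b == sorted(data))  # True
```

CPython performs no tail-call elimination, so in Python the loop has to be written explicitly.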

c++ algorithm optimization g++ compiler-optimization

Score: 6 | Answers: 1 | Views: 202

Why does gcc perform tail call optimization for one version but not for the other?

While experimenting with tail call optimization (TCO), I stumbled upon the following strange example:

unsigned long long int fac1(unsigned long long int n){
  if (n==0)
    return 1;
  return n*fac1(n-1);
}


Actually, I was impressed that gcc was able to perform TCO here (with the -O2 flag), because it is not that straightforward:

fac1(unsigned long long):
        testq   %rdi, %rdi
        movl    $1, %eax
        je      .L4
.L3:
        imulq   %rdi, %rax
        subq    $1, %rdi
        jne     .L3
        rep ret
.L4:
        rep ret


However, after changing the return type from unsigned long long int to unsigned int, gcc is no longer able to perform TCO:

unsigned int fac2(unsigned long long int n){
  if (n==0)
    return 1;
  return n*fac2(n-1);
}

Run Code Online (Sandbox Code Playgroud)

We can clearly see the recursive call in the generated assembly:

fac2(unsigned long long):
        testq   %rdi, %rdi
        jne …

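The loop gcc emitted for fac1 is the accumulator form of n*fac1(n-1): multiply into a running product and count n down to 0. That rewrite is valid because unsigned multiplication modulo 2^64 is associative and commutative. As far as the arithmetic goes, the same rewrite also looks legal for fac2, whose result is n! reduced modulo 2^32. A Python sketch that checks this by modelling the C++ unsigned types with bit masks; the masks and function names are assumptions made for this illustration, and the sketch does not explain gcc's decision.

```python
MASK64 = (1 << 64) - 1  # models unsigned long long
MASK32 = (1 << 32) - 1  # models unsigned int


def fac1_recursive(n):
    # return n * fac1(n - 1): 64-bit unsigned multiply
    return 1 if n == 0 else (n * fac1_recursive(n - 1)) & MASK64


def fac1_loop(n):
    # What the .L3 loop does: multiply into an accumulator, then decrement n.
    acc = 1
    while n != 0:
        acc = (acc * n) & MASK64
        n -= 1
    return acc


def fac2_recursive(n):
    # The 64-bit product n * fac2(n - 1) is truncated to unsigned int on return.
    return 1 if n == 0 else ((n * fac2_recursive(n - 1)) & MASK64) & MASK32


def fac2_loop(n):
    # Same accumulator rewrite, truncating to 32 bits once at the end.
    return fac1_loop(n) & MASK32


for n in range(60):
    assert fac1_recursive(n) == fac1_loop(n)
    assert fac2_recursive(n) == fac2_loop(n)
print("accumulator rewrite matches for n < 60")
```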

c++ optimization gcc tail-call-optimization

Score: 6 | Answers: 1 | Views: 163

Why is heap insertion faster than inserting into an unsorted list?

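After inserting 100000000 elements into my heap and into my unsorted list, heap insertion actually seems to be faster (12 seconds vs. 20 seconds). Why is that? I believe heap insertion is O(log n), while unsorted-list insertion is O(1). I also noticed that my heap insertion implementation doesn't actually scale with the number of inputs, which confuses me as well.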

Here is the code I ran:

int main ()
{
    clock_t unsortedStart;
    clock_t heapStart;

    double unsortedDuration;
    double heapDuration;

    int num_pushes = 100000000;
    int interval = 10000;

    ofstream unsorted ("unsorted.txt");
    ofstream heap ("heap.txt");

    UnsortedPQ<int> unsortedPQ; 
    HeapPQ<int> heapPQ; 

    unsortedStart = clock();

    for (int i = 0; i < num_pushes; ++i)
    {
        if (i % interval == 0) {
            unsortedDuration = ( clock() - unsortedStart ) / (double) CLOCKS_PER_SEC;
            unsorted << unsortedDuration << " " << i << endl;
        }

        unsortedPQ.insertItem(rand() % 100); …

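One observation that bears on the flat heap timings: for random keys, an inserted element usually moves up only a level or so, so the average cost of a binary-heap insert stays bounded by a small constant even though the worst case is O(log n). A Python sketch that counts sift-up swaps for random keys in 0..99, mirroring rand() % 100 in the question; MinHeap and average_swaps are hypothetical names, not the asker's HeapPQ.

```python
import random


class MinHeap:
    """Array-based binary min-heap that counts sift-up swaps."""

    def __init__(self):
        self.items = []
        self.swaps = 0

    def insert(self, value):
        items = self.items
        items.append(value)
        i = len(items) - 1
        # Sift up while the new element is strictly smaller than its parent.
        while i > 0:
            parent = (i - 1) // 2
            if items[parent] <= items[i]:
                break
            items[parent], items[i] = items[i], items[parent]
            self.swaps += 1
            i = parent


def average_swaps(n, seed=0):
    """Average number of sift-up swaps per insert for n keys drawn from 0..99."""
    rng = random.Random(seed)
    heap = MinHeap()
    for _ in range(n):
        heap.insert(rng.randrange(100))  # plays the role of rand() % 100
    return heap.swaps / n


for n in (1_000, 10_000, 100_000):
    print(n, round(average_swaps(n), 3))
```

This does not by itself explain why the unsorted list was slower; that depends on the UnsortedPQ implementation, which is not shown.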

c++ heap list data-structures

Score: 6 | Answers: 1 | Views: 282

Cython compiler directive language_level not respected

I'm using Cython's compiler directives (http://docs.cython.org/en/latest/src/reference/compilation.html#globally).

$ cat temp.pyx
# cython: language_level=3
print("abc", "def", sep=" ,") # invalid in python 2


Compiling:

$ cythonize -i temp.pyx
Error compiling Cython file:
------------------------------------------------------------
...
# cython: language_level=3


print("abc", "def", sep=" ,")
                      ^
------------------------------------------------------------

temp.pyx:4:23: Expected ')', found '='


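So the language_level directive is not respected. As a result, cythonize ends up using Python 2 semantics and raises an error, because the print statement above is invalid in Python 2.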

However, adding any Python statement before it (here an import) makes it work:

$ cat temp.pyx
# cython: language_level=3
import os
print("abc", "def", sep=" ,")


Compiling and running:

$ cythonize -i temp.pyx; python -c "import temp"
abc ,def


Any idea how the import statement causes language_level to be respected?

I also asked the same question on the Cython GitHub repository.

compiler-directives cython python-2.7 python-3.x cythonize

Score: 6 | Answers: 1 | Views: 4906

Tag statistics

c++ ×5

gcc ×4

list ×2

optimization ×2

performance ×2

compiler-directives ×1

compiler-optimization ×1

data-structures ×1

g++ ×1

heap ×1

pypy ×1

tail-call-optimization ×1
