Posts by Vem*_*ulo

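Why can't the compiler optimize away an unused static std::string?

If I compile this code with GCC or Clang with -O2 optimization enabled, I still get some global object initialization. Is it even possible for any code to access these variables?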

#include <string>
static const std::string s = "";

int main() { return 0; }


Compiler output:

main:
        xor     eax, eax
        ret
_GLOBAL__sub_I_main:
        mov     edx, OFFSET FLAT:__dso_handle
        mov     esi, OFFSET FLAT:s
        mov     edi, OFFSET FLAT:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev
        mov     QWORD PTR s[rip], OFFSET FLAT:s+16
        mov     QWORD PTR s[rip+8], 0
        mov     BYTE PTR s[rip+16], 0
        jmp     __cxa_atexit


Specifically, I did not expect the _GLOBAL__sub_I_main: section.
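One detail visible in the output above is that _GLOBAL__sub_I_main ends by handing the std::string destructor to __cxa_atexit, so the object carries work that has to run at program exit. A minimal sketch, assuming C++17 and that the string is only ever read, contrasts std::string with std::string_view. The names kEmpty and empty_length are hypothetical and not part of the original code, and whether a particular compiler emits no initializer at all for the string_view version is something to confirm on the actual toolchain.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// std::string has a non-trivial destructor, so a namespace-scope instance
// needs that destructor registered to run at program exit.
static_assert(!std::is_trivially_destructible_v<std::string>);

// std::string_view is trivially destructible on the mainstream standard
// libraries, so there is nothing to register at exit.
static_assert(std::is_trivially_destructible_v<std::string_view>);

// Hypothetical replacement for `s`: a constexpr object is constant-initialized,
// so it needs no dynamic initialization at startup.
static constexpr std::string_view kEmpty = "";

// Hypothetical helper, usable in constant expressions.
constexpr std::size_t empty_length() { return kEmpty.size(); }

static_assert(empty_length() == 0);
```

This only sidesteps the question for read-only strings; it does not explain why the initializer for the unused std::string itself is kept.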

Godbolt link

Edit: even with a simple custom type, the compiler still generates some code.

class Aloha
{
public:
    Aloha () : i(1) {}
    ~Aloha() = default;
private:
    int i;
};
static …


c++ static-constructor compiler-optimization

Vem*_*ulo

2022-03-12

Score: 7
Answers: 1
Views: 751

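How do I create an LLVM array type with AllocaInst?

I want to create an LLVM ArrayType on the stack, so I want to use AllocaInst (Type *Ty, Value *ArraySize=nullptr, const Twine &Name="", Instruction *InsertBefore=nullptr). The problem is that I don't understand this interface. I guess Ty would be something like ArrayType::get(I.getType(), 4), but what should I pass for ArraySize? It also takes a Value*, which confuses me.

Either I have misunderstood LLVM's alloca, or I need to supply an LLVM constant as the array size. If I have to supply a constant, isn't that somewhat redundant, since the ArrayType already carries the number of elements?

As an example, this is the line I tried: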

AllocaInst* arr_alloc = new AllocaInst(ArrayType::get(I.getType(), num)
                                       /*, What is this parameter for?*/,
                                       "",
                                       funcEntry.getFirstInsertionPt());


c++ llvm llvm-c++-api

Vem*_*ulo

lucky-day

Score: 5
Answers: 1
Views: 5290

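False dependency issue on the Fermi architecture

I am trying to achieve "3-way overlap" using 3 streams, as shown in the examples in the CUDA Streams and Concurrency webinar, but I cannot get it to work.

I have a GeForce GT 550M (Fermi architecture with one copy engine), and I am using Windows 7 (64-bit).

Here is the code I wrote.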

#include <iostream>

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

// includes, project
#include "helper_cuda.h"
#include "helper_functions.h" // helper utility functions 

#include <stdio.h>

using namespace std;

#define DATA_SIZE 6000000
#define NUM_THREADS 32
#define NUM_BLOCKS 16
#define NUM_STREAMS 3

__global__ void kernel(const int *in, int *out, int dataSize)
{
    int start = blockIdx.x * blockDim.x + threadIdx.x;
    int end =  dataSize;
    for (int i = start; i < end; i += blockDim.x * …


cuda nsight

Vem*_*ulo

2014-07-17

Score: 2
Answers: 1
Views: 442