atl*_*ste 10 llvm calling-convention c++11 llvm-ir
I'm trying to call a method in C++ code from LLVM IR. I'm using 64-bit Visual C++, or as LLVM describes it:
Machine CPU: skylake
Machine info: x86_64-pc-windows-msvc
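For integer and pointer types, my code works fine. Floating-point numbers, however, seem to be handled a bit strangely.
Basically, the call looks like this: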
#include <cstdint>

struct SomeStruct
{
    static void Breakpoint(uint8_t* ptr) { return; } // used to set a breakpoint
    static double Set(uint8_t* ptr, double foo) { return foo * 2; }
};
and the LLVM IR looks like this:
define i32 @main(i32, i8**) {
varinit:
  ; omitted here: initialize %ptr from i8**.
  %5 = load i8*, i8** %instance0
  ; call to some method. This works - I use it to set a breakpoint
  call void @"Helper::Breakpoint"(i8* %5)
  ; this call fails:
  call void @"Helper::Set"(i8* %5, double 0xC19EC46965A6494D)
  ret i32 0
}
declare double @"SomeStruct::Callback"(i8*, double)
I figured the problem was probably related to how the calling convention works, so I tried a few adjustments to correct for that:
// during initialization of the function
auto function = llvm::Function::Create(functionType, llvm::Function::ExternalLinkage, name, module);
function->setCallingConv(llvm::CallingConv::X86_64_Win64);
...
// during calling of the function
call->setCallingConv(llvm::CallingConv::X86_64_Win64);
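Unfortunately, whatever I try, I end up with "illegal instruction" errors, which another user reports to be a calling-convention problem (see "Clang producing executable with illegal instruction"). I have tried X86_64_Win64, Stdcall, Fastcall and no calling-convention specification at all, all with the same result.
I've read https://msdn.microsoft.com/en-us/library/ms235286.aspx in an attempt to figure out what's going on. I then looked at the assembly output that LLVM is supposed to generate (using the targetMachine->addPassesToEmitFile API call) and found: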
movq (%rdx), %rsi
movq %rsi, %rcx
callq "Helper2<double>::Breakpoint"
vmovsd __real@c19ec46965a6494d(%rip), %xmm1
movq %rsi, %rcx
callq "Helper2<double>::Set"
xorl %eax, %eax
addq $32, %rsp
popq %rsi
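According to MSDN, argument 2 should be in %xmm1, so that looks correct as well. However, when I check in the debugger whether everything works, Visual Studio shows a lot of question marks (i.e. "illegal instruction").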
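To make the register expectation concrete, here is a minimal sketch of the positional rule in the Microsoft x64 calling convention for the first four arguments. The helper name `Win64ArgRegister` is made up for illustration and is not part of LLVM or of the code in this post; it also ignores aggregates, varargs and other special cases.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Illustrative helper (hypothetical, not an LLVM API): returns the register
// used for the argument at zero-based position `index` under the Microsoft
// x64 calling convention, or "stack" from the fifth argument onwards.
std::string Win64ArgRegister(std::size_t index, bool isFloatingPoint)
{
    static const std::array<const char*, 4> intRegs = {"rcx", "rdx", "r8", "r9"};
    static const std::array<const char*, 4> fpRegs = {"xmm0", "xmm1", "xmm2", "xmm3"};

    if (index >= intRegs.size())
    {
        return "stack";
    }
    // The slot is chosen by argument position, not by how many arguments
    // of the same kind preceded it.
    return isFloatingPoint ? fpRegs[index] : intRegs[index];
}
```

For the failing call, the pointer is argument 0 (rcx) and the double is argument 1 (xmm1), which matches the generated assembly.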
Any feedback is appreciated.
Disassembled code:
00000144F2480007 48 B8 B6 48 B8 C8 FA 7F 00 00 mov rax,7FFAC8B848B6h
00000144F2480011 48 89 D1 mov rcx,rdx
00000144F2480014 48 89 54 24 20 mov qword ptr [rsp+20h],rdx
00000144F2480019 FF D0 call rax
00000144F248001B 48 B8 C0 48 B8 C8 FA 7F 00 00 mov rax,7FFAC8B848C0h
00000144F2480025 48 B9 00 00 47 F2 44 01 00 00 mov rcx,144F2470000h
00000144F248002F ?? ?? ??
00000144F2480030 ?? ?? ??
00000144F2480031 FF 08 dec dword ptr [rax]
00000144F2480033 10 09 adc byte ptr [rcx],cl
00000144F2480035 48 8B 4C 24 20 mov rcx,qword ptr [rsp+20h]
00000144F248003A FF D0 call rax
00000144F248003C 31 C0 xor eax,eax
00000144F248003E 48 83 C4 28 add rsp,28h
00000144F2480042 C3 ret
Some of the information about the memory is missing here. Memory view:
0x00000144F248001B 48 b8 c0 48 b8 c8 fa 7f 00 00 48 b9 00 00 47 f2 44 01 00 00 62 f1 ff 08 10 09 48 8b 4c 24 20 ff d0 31 c0 48 83 c4 28 c3 00 00 00 00 00 ...
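The bytes hidden behind the question marks are '62 f1'.
Some code helps to show how I get the JIT to compile and so on. I'm afraid it's a bit long, but it helps to understand what's going on... I don't know how to reduce it to a smaller piece of code.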
// Note: FunctionBinderBase basically holds an llvm::Function* object
// which is bound using the above code and a name.
llvm::ExecutionEngine* Module::Compile(std::unordered_map<std::string, FunctionBinderBase*>& externalFunctions)
{
// DebugFlag = true;
#if (LLVMDEBUG >= 1)
this->module->dump();
#endif
// -- Initialize LLVM compiler: --
std::string error;
// Helper function, gets the current machine triplet.
llvm::Triple triple(MachineContextInfo::Triplet());
const llvm::Target *target = llvm::TargetRegistry::lookupTarget("x86-64", triple, error);
if (!target)
{
throw std::runtime_error(error); // needs <stdexcept>; throwing error.c_str() would dangle once 'error' is destroyed
}
llvm::TargetOptions Options;
// Options.PrintMachineCode = true;
// Options.EnableFastISel = true;
std::unique_ptr<llvm::TargetMachine> targetMachine(
target->createTargetMachine(MachineContextInfo::Triplet(), MachineContextInfo::CPU(), "", Options, llvm::Reloc::Default, llvm::CodeModel::Default, llvm::CodeGenOpt::Aggressive));
if (!targetMachine.get())
{
throw "Could not allocate target machine!";
}
// Create the target machine; set the module data layout to the correct values.
auto DL = targetMachine->createDataLayout();
module->setDataLayout(DL);
module->setTargetTriple(MachineContextInfo::Triplet());
// Pass manager builder:
llvm::PassManagerBuilder pmbuilder;
pmbuilder.OptLevel = 3;
pmbuilder.BBVectorize = false;
pmbuilder.SLPVectorize = true;
pmbuilder.LoopVectorize = true;
pmbuilder.Inliner = llvm::createFunctionInliningPass(3, 2);
llvm::TargetLibraryInfoImpl *TLI = new llvm::TargetLibraryInfoImpl(triple);
pmbuilder.LibraryInfo = TLI;
// Generate pass managers:
// 1. Function pass manager:
llvm::legacy::FunctionPassManager FPM(module.get());
pmbuilder.populateFunctionPassManager(FPM);
// 2. Module pass manager:
llvm::legacy::PassManager PM;
PM.add(llvm::createTargetTransformInfoWrapperPass(targetMachine->getTargetIRAnalysis()));
pmbuilder.populateModulePassManager(PM);
// 3. Execute passes:
// - Per-function passes:
FPM.doInitialization();
for (llvm::Module::iterator I = module->begin(), E = module->end(); I != E; ++I)
{
if (!I->isDeclaration())
{
FPM.run(*I);
}
}
FPM.doFinalization();
// - Per-module passes:
PM.run(*module);
// Fix function pointers; the PM.run will ruin them, this fixes that.
for (auto it : externalFunctions)
{
auto name = it.first;
auto fcn = module->getFunction(name);
it.second->function = fcn;
}
#if (LLVMDEBUG >= 2)
// -- ASSEMBLER dump code
// 3. Code generation pass manager:
llvm::legacy::PassManager CGP;
CGP.add(llvm::createTargetTransformInfoWrapperPass(targetMachine->getTargetIRAnalysis()));
pmbuilder.populateModulePassManager(CGP);
std::string result;
llvm::raw_string_ostream str(result);
llvm::buffer_ostream os(str);
targetMachine->addPassesToEmitFile(CGP, os, llvm::TargetMachine::CodeGenFileType::CGFT_AssemblyFile);
CGP.run(*module);
str.flush();
auto stringref = os.str();
std::string assembly(stringref.begin(), stringref.end());
std::cout << "ASM code: " << std::endl << "---------------------" << std::endl << assembly << std::endl << "---------------------" << std::endl;
// -- end of ASSEMBLER dump code.
for (auto it : externalFunctions)
{
auto name = it.first;
auto fcn = module->getFunction(name);
it.second->function = fcn;
}
#endif
#if (LLVMDEBUG >= 2)
module->dump();
#endif
// All done, *RUN*.
llvm::EngineBuilder engineBuilder(std::move(module));
engineBuilder.setEngineKind(llvm::EngineKind::JIT);
engineBuilder.setMCPU(MachineContextInfo::CPU());
engineBuilder.setMArch("x86-64");
engineBuilder.setUseOrcMCJITReplacement(false);
engineBuilder.setOptLevel(llvm::CodeGenOpt::None);
llvm::ExecutionEngine* engine = engineBuilder.create();
// Define external functions
for (auto it : externalFunctions)
{
auto fcn = it.second;
if (fcn->function)
{
engine->addGlobalMapping(fcn->function, const_cast<void*>(fcn->FunctionPointer())); // Yuck... LLVM only takes non-const pointers
}
}
// Finalize
engine->finalizeObject();
return engine;
}
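Update (progress)
Apparently my Skylake has a problem with the vmovsd instruction. When I run the same code on a Haswell (server), the test succeeds. I've checked the assembly output on both: it is exactly the same.
Just to be sure: XSAVE/XRESTORE shouldn't be a problem on Win10-x64, but let's find out anyway. I checked the CPU features with the code from https://msdn.microsoft.com/en-us/library/hskdteyh.aspx and XSAVE/XRESTORE with the code from https://insufficientlycomplicated.wordpress.com/2011/11/07/detection-intel-advanced-vector-extensions-avx-in-visual-studio/. The latter runs just fine. As for the former, these are the results: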
GenuineIntel
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
3DNOW not supported
3DNOWEXT not supported
ABM not supported
ADX supported
AES supported
AVX supported
AVX2 supported
AVX512CD not supported
AVX512ER not supported
AVX512F not supported
AVX512PF not supported
BMI1 supported
BMI2 supported
CLFSH supported
CMPXCHG16B supported
CX8 supported
ERMS supported
F16C supported
FMA supported
FSGSBASE supported
FXSR supported
HLE supported
INVPCID supported
LAHF supported
LZCNT supported
MMX supported
MMXEXT not supported
MONITOR supported
MOVBE supported
MSR supported
OSXSAVE supported
PCLMULQDQ supported
POPCNT supported
PREFETCHWT1 not supported
RDRAND supported
RDSEED supported
RDTSCP supported
RTM supported
SEP supported
SHA not supported
SSE supported
SSE2 supported
SSE3 supported
SSE4.1 supported
SSE4.2 supported
SSE4a not supported
SSSE3 supported
SYSCALL supported
TBM not supported
XOP not supported
XSAVE supported
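For reference, the two flags that matter most here (AVX2 and AVX512F) are both reported in the EBX register of CPUID leaf 7, sub-leaf 0, at bits 5 and 16 respectively. The sketch below decodes them from a raw EBX value so that it stays compiler-independent; the struct and function names are made up for illustration. On MSVC the value could be obtained with `__cpuidex(regs, 7, 0)` from `<intrin.h>` and read from `regs[1]`.

```cpp
#include <cstdint>

// Illustrative names (hypothetical, not from the MSDN sample).
struct Leaf7Features
{
    bool avx2;
    bool avx512f;
};

// Decodes the EBX value returned by CPUID leaf 7, sub-leaf 0.
Leaf7Features DecodeLeaf7Ebx(std::uint32_t ebx)
{
    Leaf7Features features;
    features.avx2 = (ebx & (1u << 5)) != 0;     // bit 5: AVX2
    features.avx512f = (ebx & (1u << 16)) != 0; // bit 16: AVX512F
    return features;
}
```

On the i7-6700HQ above, this would be expected to report AVX2 set and AVX512F clear, in line with the listing.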
This is strange, so I figured: why not just emit the instruction directly.
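#include <emmintrin.h> // _mm_load_sd
#include <iostream>
#include <string>

int main()
{
    const double value = 1.2;
    const double value2 = 1.3;
    auto x1 = _mm_load_sd(&value);
    auto x2 = _mm_load_sd(&value2);
    // Wait for input so the debugger can be attached and the disassembly inspected.
    std::string s;
    std::getline(std::cin, s);
}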
This code runs fine. Disassembly:
auto x1 = _mm_load_sd(&value);
00007FF7C4833724 C5 FB 10 45 08 vmovsd xmm0,qword ptr [value]
auto x1 = _mm_load_sd(&value);
00007FF7C4833729 C5 F1 57 C9 vxorpd xmm1,xmm1,xmm1
00007FF7C483372D C5 F3 10 C0 vmovsd xmm0,xmm1,xmm0
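Evidently it doesn't load into register xmm1 directly, but it still proves that the instruction itself works.
I just checked what happens on the other machine, the Intel Haswell, and found this: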
0000015077F20110 C5 FB 10 08 vmovsd xmm1,qword ptr [rax]
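Apparently, on the Intel Haswell it emits a different byte encoding for the instruction than on my Skylake.
@Ha. was actually kind enough to point me in the right direction. Yes, the hidden bytes do indeed indicate VMOVSD, but apparently it is encoded as EVEX. That's all well and good, but the EVEX prefix/encoding is being introduced into the latest Skylake architecture as part of AVX512, which won't be supported until Skylake Purley in 2017. In other words, this is an invalid instruction on my CPU.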
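The difference is visible in the first byte of the two encodings quoted in this post: `62 f1 ff 08 10 09` on the Skylake versus `c5 fb 10 08` on the Haswell. A minimal sketch of that check follows, assuming 64-bit mode and that the byte passed in is the first byte of the instruction with no legacy prefixes in front of it. The function name is made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Illustrative helper (hypothetical): classifies the leading byte of an
// x86-64 instruction as a VEX or EVEX prefix. Assumes 64-bit mode, where
// 0xC4, 0xC5 and 0x62 are not available as the legacy LES, LDS and BOUND
// opcodes.
std::string ClassifyVectorPrefix(std::uint8_t firstByte)
{
    switch (firstByte)
    {
    case 0xC5: return "VEX (2-byte)";
    case 0xC4: return "VEX (3-byte)";
    case 0x62: return "EVEX";
    default:   return "other";
    }
}
```

Applied to the bytes above, 0x62 classifies as EVEX (which requires AVX-512 support) and 0xC5 as a 2-byte VEX prefix (which only requires AVX).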
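To check, I put a breakpoint in X86MCCodeEmitter::EmitMemModRMByte. At some point I do see bool HasEVEX = [...] evaluate to true. This confirms that the code generator/emitter is producing the wrong output.
My conclusion is therefore that this must be a bug in the LLVM target information for Skylake CPUs. That leaves only two things to do: figure out where exactly this bug is in LLVM so that it can be fixed, and report the bug to the LLVM team...
So where is it in LLVM? That's hard to tell... x86.td.def defines the skylake features as "FeatureAVX512", which probably sets X86SSELevel to AVX512F. That in turn yields the wrong instructions. As a workaround, it's best to simply tell LLVM that we have an Intel Haswell, and all will be well: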
// MCPU is used to call createTargetMachine
llvm::StringRef MCPU = llvm::sys::getHostCPUName();
if (MCPU.str() == "skylake")
{
MCPU = llvm::StringRef("haswell");
}
Tested, works.
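A slightly narrower variant of this workaround would downgrade the CPU name only when the host really lacks AVX512F, so that a Skylake part that does support it is left alone. The sketch below is an assumption-laden illustration: `SelectCodegenCpu` is a hypothetical helper, and the `hostHasAvx512f` flag is assumed to be supplied by the caller (for example from a CPUID query).

```cpp
#include <string>

// Hypothetical helper (not an LLVM API): picks the CPU name to hand to the
// code generator. Falls back to "haswell" only when the host is reported as
// "skylake" but does not actually support AVX512F.
std::string SelectCodegenCpu(const std::string& hostCpu, bool hostHasAvx512f)
{
    if (hostCpu == "skylake" && !hostHasAvx512f)
    {
        return "haswell";
    }
    return hostCpu;
}
```

It could be called with `llvm::sys::getHostCPUName().str()` as the first argument, and the result passed on as the CPU name used to create the target machine.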