Rem*_*emz 482 c++ abi llvm undefined-behavior llvm-codegen
我知道C++ 中的"未定义行为"几乎可以让编译器做任何想做的事情.但是,我遇到了让我感到惊讶的崩溃,因为我认为代码足够安全.
在这种情况下,真正的问题仅发生在使用特定编译器的特定平台上,并且仅在启用了优化时才发生.
我尝试了几件事来重现问题并将其简化到最大程度.这是一个名为的函数的摘录Serialize,它将获取bool参数,并将字符串true或复制false到现有的目标缓冲区.
如果bool参数是未初始化的值,那么这个函数是否会在代码审查中,没有办法告诉它实际上可能会崩溃?
// Zero-filled global buffer of 16 characters
char destBuffer[16];
void Serialize(bool boolValue) {
// Determine which string to print based on boolValue
const char* whichString = boolValue ? "true" : "false";
// Compute the length of the string we selected
const size_t len = strlen(whichString);
// Copy string into destination buffer, which is zero-filled (thus already null-terminated)
memcpy(destBuffer, whichString, len);
}
Run Code Online (Sandbox Code Playgroud)
如果使用clang 5.0.0 +优化执行此代码,它将/可能崩溃.
boolValue ? "true" : "false"对于我来说,预期的三元运算符看起来足够安全,我假设,"无论垃圾价值是多少boolValue都没关系,因为它无论如何都会评估为真或假."
我已经设置了一个Compiler Explorer示例,它显示了反汇编中的问题,这里是完整的示例.注意:为了重现问题,我发现有效的组合是使用Clang 5.0.0和-O2优化.
#include <iostream>
#include <cstring>
// Simple struct, with an empty constructor that doesn't initialize anything
struct FStruct {
bool uninitializedBool;
__attribute__ ((noinline)) // Note: the constructor must be declared noinline to trigger the problem
FStruct() {};
};
char destBuffer[16];
// Small utility function that allocates and returns a string "true" or "false" depending on the value of the parameter
void Serialize(bool boolValue) {
// Determine which string to print depending if 'boolValue' is evaluated as true or false
const char* whichString = boolValue ? "true" : "false";
// Compute the length of the string we selected
size_t len = strlen(whichString);
memcpy(destBuffer, whichString, len);
}
int main()
{
// Locally construct an instance of our struct here on the stack. The bool member uninitializedBool is uninitialized.
FStruct structInstance;
// Output "true" or "false" to stdout
Serialize(structInstance.uninitializedBool);
return 0;
}
Run Code Online (Sandbox Code Playgroud)
这个问题的产生是因为优化的:它是足够聪明的推断,"真"与"假"只长1.因此,而不是真正的计算长度不同的字符串,它使用布尔本身的价值,这应该技术上可以是0或1,并且如下所示:
const size_t len = strlen(whichString); // original code
const size_t len = 5 - boolValue; // clang clever optimization
Run Code Online (Sandbox Code Playgroud)
虽然这很"聪明",但可以这么说,我的问题是:C++标准是否允许编译器假设bool只能有一个内部数字表示'0'或'1'并以这种方式使用它?
或者这是实现定义的情况,在这种情况下,实现假定它的所有bool只包含0或1,而任何其他值是未定义的行为区域?
Pet*_*des 275
But also note that ISO C++ allows a compiler to emit code that crashes on purpose (e.g. with an illegal instruction) if the program encounters UB, e.g. as a way to help you find errors. (Or because it's a DeathStation 9000. Being strictly conforming is not sufficient for a C++ implementation to be useful for any real purpose). So ISO C++ would allow a compiler to make asm that crashed (for totally different reasons) even on similar code that read an uninitialized uint32_t. Even though that's required to be a fixed-layout type with no trap representations.
It's an interesting question about how real implementations work, but remember that even if the answer was different, your code would still be unsafe because modern C++ is not a portable version of assembly language.
You're compiling for the x86-64 System V ABI, which specifies that a bool as a function arg in a register is represented by the bit-patterns false=0 and true=1 in the low 8 bits of the register1. In memory, bool is a 1-byte type that again must have an integer value of 0 or 1.
(An ABI is a set of implementation choices that compilers for the same platform agree on so they can make code that calls each other's functions, including type sizes, struct layout rules, and calling conventions.)
ISO C++ doesn't specify it, but this ABI decision is widespread because it makes bool->int conversion cheap (just zero-extension). I'm not aware of any ABIs that don't let the compiler assume 0 or 1 for bool, for any architecture (not just x86). It allows optimizations like !mybool with xor eax,1 to flip the low bit: Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction. Or compiling a&&b to a bitwise AND for bool types. Some compilers do actually take advantage Boolean values as 8 bit in compilers. Are operations on them inefficient?.
In general, the as-if rule allows allows the compiler to take advantage of things that are true on the target platform being compiled for, because the end result will be executable code that implements the same externally-visible behaviour as the C++ source. (With all the restrictions that Undefined Behaviour places on what is actually "externally visible": not with a debugger, but from another thread in a well-formed / legal C++ program.)
The compiler is definitely allowed to take full advantage of an ABI guarantee in its code-gen, and make code like you found which optimizes strlen(whichString) to
5U - boolValue. (BTW, this optimization is kind of clever, but maybe shortsighted vs. branching and inlining memcpyas stores of immediate data2.)
Or the compiler could have created a table of pointers and indexed it with the integer value of the bool, again assuming it was a 0 or 1. (This possibility is what @Barmar's answer suggested.)
Your __attribute((noinline)) constructor with optimization enabled led to clang just loading a byte from the stack to use as uninitializedBool. It made space for the object in main with push rax (which is smaller and for various reason about as efficient as sub rsp, 8), so whatever garbage was in AL on entry to main is the value it used for uninitializedBool. This is why you actually got values that weren't just 0.
5U - random garbage can easily wrap to a large unsigned value, leading memcpy to go into unmapped memory. The destination is in static storage, not the stack, so you're not overwriting a return address or something.
Other implementations could make different choices, e.g. false=0 and true=any non-zero value. Then clang probably wouldn't make code that crashes for this specific instance of UB. (But it would still be allowed to if it wanted to.) I don't know of any implementations that choose anything other what x86-64 does for bool, but the C++ standard allows many things that nobody does or even would want to do on hardware that's anything like current CPUs.
ISO C++ leaves it unspecified what you'll find when you examine or modify the object representation of a bool. (e.g. by memcpying the bool into unsigned char, which you're allowed to do because char* can alias anything. And unsigned char is guaranteed to have no padding bits, so the C++ standard does formally let you hexdump object representations without any UB. Pointer-casting to copy the object representation is different from assigning char foo = my_bool, of course, so booleanization to 0 or 1 wouldn't happen and you'd get the raw object representation.)
You've partially "hidden" the UB on this execution path from the compiler with noinline. Even if it doesn't inline, though, interprocedural optimizations could still make a version of the function that depends on the definition of another function. (First, clang is making an executable, not a Unix shared library where symbol-interposition can happen. Second, the definition in inside the class{} definition so all translation units must have the same definition. Like with the inline keyword.)
So a compiler could emit just a ret or ud2 (illegal instruction) as the definition for main, because the path of execution starting at the top of main unavoidably encounters Undefined Behaviour. (Which the compiler can see at compile time if it decided to follow the path through the non-inline constructor.)
Any program that encounters UB is totally undefined for its entire existence. But UB inside a function or if() branch that never actually runs doesn't corrupt the rest of the program. In practice that means that compilers can decide to emit an illegal instruction, or a ret, or not emit anything and fall into the next block / function, for the whole basic block that can be proven at compile time to contain or lead to UB.
GCC and Clang in practice do actually sometimes emit ud2 on UB, instead of even trying to generate code for paths of execution that make no sense. Or for cases like falling off the end of a non-void function, gcc will sometimes omit a ret instruction. If you were thinking that "my function will just return with whatever garbage is in RAX", you are sorely mistaken. Modern C++ compilers don't treat the language like a portable assembly language any more. Your program really has to be valid C++, without making assumptions about how a stand-alone non inlined version of your function might look in asm.
Another fun example is Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?. x86 doesn't fault on unaligned integers, right? So why would a misaligned uint16_t* be a problem? Because alignof(uint16_t) == 2, and violating that assumption led to a segfault when auto-vectorizing with SSE2.
See also What Every C Programmer Should Know About Undefined Behavior #1/3, an article by a clang developer.
bool.Expect total hostility toward many mistakes by the programmer, especially things modern compilers warn about. This is why you should use -Wall and fix warnings. C++ is not a user-friendly language, and something in C++ can be unsafe even if it would be safe in asm on the target you're compiling for. (e.g. signed overflow is UB in C++ and compilers will assume it doesn't happen, even when compiling for 2's complement x86, unless you use clang/gcc -fwrapv.)
Compile-time-visible UB is always dangerous, and it's really hard to be sure (with link-time optimization) that you've really hidden UB from the compiler and can thus reason about what kind of asm it will generate.
Not to be over-dramatic; often compilers do let you get away with some things and emit code like you're expecting even when something is UB. But maybe it will be a problem in the future if compiler devs implement some optimization that gains more info about value-ranges (e.g. that a variable is non-negative, maybe allowing it to optimize sign-extension to free zero-extension on x86-64). For example, in current gcc and clang, doing tmp = a+INT_MIN doesn't optimize a<0 as always-false, only that tmp is always negative. (Because INT_MIN + a=INT_MAX is negative on this 2's complement target, and a can't be any higher than that.)
So gcc/clang don't currently backtrack to derive range info for the inputs of a calculation, only on the results based on the assumption of no signed overflow: example on Godbolt. I don't know if this is optimization is intentionally "missed" in the name of user-friendliness or what.
Also note that implementations (aka compilers) are allowed to define behaviour that ISO C++ leaves undefined. For example, all compilers that support Intel's intrinsics (like _mm_add_ps(__m128, __m128) for manual SIMD vectorization) must allow forming mis-aligned pointers, which is UB in C++ even if you don't dereference them. __m128i _mm_loadu_si128(const __m128i *) does unaligned loads by taking a misaligned __m128i* arg, not a void* or char*. Is `reinterpret_cast`ing between hardware vector pointer and the corresponding type an undefined behavior?
GNU C/C++ also defines the behaviour of left-shifting a negative signed number (even without -fwrapv), separately from the normal signed-overflow UB rules. (This is UB in ISO C++, while right shifts of signed numbers are implementation-defined (logical vs. arithmetic); good quality implementations choose arithmetic on HW that has arithmetic right shifts, but ISO C++ doesn't specify). This is documented in the GCC manual's Integer section, along with defining implementation-defined behaviour that C standards require implementations to define one way or another.
There are definitely quality-of-implementation issues that compiler developers care about; they generally aren't trying to make compilers that are intentionally hostile, but taking advantage of all the UB potholes in C++ (except ones they choose to define) to optimize better can be nearly indistinguishable at times.
Footnote 1: The upper 56 bits can be garbage which the callee must ignore, as usual for types narrower than a register.
(Other ABIs do make different choices here. Some do require narrow integer types to be zero- or sign-extended to fill a register when passed to or returned from functions, like MIPS64 and PowerPC64. See the last section of this x86-64 answer which compares vs. those earlier ISAs.)
For example, a caller might have calculated a & 0x01010101 in RDI and used it for something else, before calling bool_func(a&1). The caller could optimize away the &1 because it already did that to the low byte as part of and edi, 0x01010101, and it knows the callee is required to ignore the high bytes.
Or if a bool is passed as the 3rd arg, maybe a caller optimizing for code-size loads it with mov dl, [mem] instead of movzx edx, [mem], saving 1 byte at the cost of a false dependency on the old value of RDX (or other partial-register effect, depending on CPU model). Or for the first arg, mov dil, byte [r10] instead of movzx edi, byte [r10], because both require a REX prefix anyway.
This is why clang emits movzx eax, dil in Serialize, instead of sub eax, edi. (For integer args, clang violates this ABI rule, instead depending on the undocumented behaviour of gcc and clang to zero- or sign-extend narrow integers to 32 bits. Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?
So I was interested to see that it doesn't do the same thing for bool.)
Footnote 2: After branching, you'd just have a 4-byte mov-immediate, or a 4-byte + 1-byte store. The length is implicit in the store widths + offsets.
OTOH, glibc memcpy will do two 4-byte loads/stores with an overlap that depends on length, so this really does end up making the whole thing free of conditional branches on the boolean. See the L(between_4_7): block in glibc's memcpy/memmove. Or at least, go the same way for either boolean in memcpy's branching to select a chunk size.
If inlining, you could use 2x mov-immediate + cmov and a conditional offset, or you could leave the string data in memory.
Or if tuning for Intel Ice Lake (with the Fast Short REP MOV feature), an actual rep movsb might be optimal. glibc memcpy might start using rep movsb for small sizes on CPUs with that feature, saving a lot of branching.
In gcc and clang, you can compile with -fsanitize=undefined to add run-time instrumentation that will warn or error out on UB that happens at runtime. That won't catch unitialized variables, though. (Because it doesn't increase type sizes to make room for an "uninitialized" bit).
See https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
To find usage of uninitialized data, there's Address Sanitizer and Memory Sanitizer in clang/LLVM. https://github.com/google/sanitizers/wiki/MemorySanitizer shows examples of clang -fsanitize=memory -fPIE -pie detecting uninitialized memory reads. It might work best if you compile without optimization, so all reads of variables end up actually loading from memory in the asm. They show it being used at -O2 in a case where the load wouldn't optimize away. I haven't tried it myself. (In some cases, e.g. not initializing an accumulator before summing an array, clang -O3 will emit code that sums into a vector register that it never initialized. So with optimization, you can have a case where there's no memory read associated with the UB. But -fsanitize=memory changes the generated asm, and might result in a check for this.)
It will tolerate copying of uninitialized memory, and also simple logic and arithmetic operations with it. In general, MemorySanitizer silently tracks the spread of uninitialized data in memory, and reports a warning when a code branch is taken (or not taken) depending on an uninitialized value.
MemorySanitizer implements a subset of functionality found in Valgrind (Memcheck tool).
It should work for this case because the call to glibc memcpy with a length calculated from uninitialized memory will (inside the library) result in a branch based on length. If it had inlined a fully branchless version that just used cmov, indexing, and two stores, it might not have worked.
Valgrind's memcheck will also look for this kind of problem, again not complaining if the program simply copies around uninitialized data. But it says it will detect when a "Conditional jump or move depends on uninitialised value(s)", to try to catch any externally-visible behaviour that depends on uninitialized data.
Perhaps the idea behind not flagging just a load is that structs can have padding, and copying the whole struct (including padding) with a wide vector load/store is not an error even if the individual members were only written one at a time. At the asm level, the information about what was padding and what is actually part of the value has been lost.
ric*_*ici 56
允许编译器假定作为参数传递的布尔值是有效的布尔值(即已初始化或转换为true或的值false).该true值不必与整数1相同 - 实际上,可以有各种表示true和false- 但参数必须是这两个值之一的一些有效表示,其中"有效表示"是实现 -定义.
因此,如果您未能初始化a bool,或者如果您通过某种不同类型的指针成功覆盖它,则编译器的假设将是错误的,并且随后会出现未定义的行为.你被警告过:
50)以本国际标准描述的方式将bool值用作"未定义",例如通过检查未初始化的自动对象的值,可能会使其表现为既不是真也不是假.(§6.9.1第6段脚注,基本类型)
M.M*_*M.M 51
函数本身是正确的,但在测试程序中,调用函数的语句通过使用未初始化变量的值导致未定义的行为.
该错误在调用函数中,可以通过代码检查或调用函数的静态分析来检测.使用编译器资源管理器链接,gcc 8.2编译器会检测错误.(也许你可以提交针对clang的bug报告,它没有发现问题).
未定义的行为意味着任何事情都可能发生,其中包括程序在触发未定义行为的事件之后崩溃几行.
NB.答案"未定义的行为会导致_____吗?" 总是"是".这就是未定义行为的定义.
Bar*_*mar 23
一个bool只允许保存的值true或false,并且将所生成的代码可以假定它将只保持这两个值中的一个.在赋值中为三元生成的代码可以使用该值作为指向两个字符串的指针数组的索引,即它可能转换为类似的内容:
// the compile could make asm that "looks" like this, from your source
const static char *strings[] = {"false", "true"};
const char *whichString = strings[boolValue];
Run Code Online (Sandbox Code Playgroud)
如果0未初始化,它实际上可以保存任何整数值,这将导致访问false数组边界之外.
Tom*_*ner 15
总结你的问题很多,你问的是C++标准是否允许编译器假设一个bool只能有内部数字表示'0'或'1'并以这种方式使用它?
标准没有说明a的内部表示bool.它只定义铸造时会发生什么情况bool到int(反之亦然).大多数情况下,由于这些完整的转换(以及人们非常依赖它们的事实),编译器将使用0和1,但它不必(尽管它必须遵守它使用的任何较低级别ABI的约束) ).
因此,编译器在看到a时bool有权认为所述bool包含true'或' false'位模式中的任何一种并做任何感觉.因此,如果对值true和false为1和0,分别,编译器确实允许优化strlen到5 - <boolean value>.其他有趣的行为是可能的!
正如在此重复陈述的那样,未定义的行为具有未定义的结果.包括但不仅限于
| 归档时间: |
|
| 查看次数: |
31363 次 |
| 最近记录: |