Does the C++ standard allow for an uninitialized bool to crash a program?

Question

Does the C++ standard allow for an uninitialized bool to crash a program?

Rem*_*emz 482 c++ abi llvm undefined-behavior llvm-codegen

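I know that "undefined behaviour" in C++ can pretty much allow the compiler to do anything it wants. However, I had a crash that surprised me, as I assumed that the code was safe enough.

In this case, the real problem happened only on a specific platform using a specific compiler, and only if optimization was enabled.

I tried several things in order to reproduce the problem and simplify it as much as possible. Here's an extract of a function called Serialize, which takes a bool parameter and copies the string true or false to an existing destination buffer.

If this function were in a code review, would there be no way to tell that it could, in fact, crash if the bool parameter was an uninitialized value?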

// Zero-filled global buffer of 16 characters
char destBuffer[16];

void Serialize(bool boolValue) {
    // Determine which string to print based on boolValue
    const char* whichString = boolValue ? "true" : "false";

    // Compute the length of the string we selected
    const size_t len = strlen(whichString);

    // Copy string into destination buffer, which is zero-filled (thus already null-terminated)
    memcpy(destBuffer, whichString, len);
}

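If this code is executed with clang 5.0.0 + optimizations, it will/can crash.

The ternary operator boolValue ? "true" : "false" looked safe enough to me. I was assuming, "Whatever garbage value is in boolValue doesn't matter, since it will evaluate to true or false anyhow."

I have set up a Compiler Explorer example that shows the problem in the disassembly; here is the complete example. Note: in order to reproduce the issue, the combination I found that worked is Clang 5.0.0 with -O2 optimization.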

#include <iostream>
#include <cstring>

// Simple struct, with an empty constructor that doesn't initialize anything
struct FStruct {
    bool uninitializedBool;

   __attribute__ ((noinline))  // Note: the constructor must be declared noinline to trigger the problem
   FStruct() {};
};

char destBuffer[16];

// Small utility function that copies the string "true" or "false" into destBuffer depending on the value of the parameter
void Serialize(bool boolValue) {
    // Determine which string to print depending if 'boolValue' is evaluated as true or false
    const char* whichString = boolValue ? "true" : "false";

    // Compute the length of the string we selected
    size_t len = strlen(whichString);

    memcpy(destBuffer, whichString, len);
}

int main()
{
    // Locally construct an instance of our struct here on the stack. The bool member uninitializedBool is uninitialized.
    FStruct structInstance;

    // Copy "true" or "false" into destBuffer
    Serialize(structInstance.uninitializedBool);
    return 0;
}

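The problem arises because of the optimizer: it was clever enough to deduce that the strings "true" and "false" differ in length by only 1. So instead of really calculating the length, it uses the value of the bool itself, which should technically be either 0 or 1, and goes like this: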

const size_t len = strlen(whichString); // original code
const size_t len = 5 - boolValue;       // clang clever optimization

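While this is "clever", so to speak, my question is: does the C++ standard allow a compiler to assume a bool can only have an internal numerical representation of '0' or '1' and use it in such a way?

Or is this a case of implementation-defined behaviour, in which case the implementation assumed that all its bools will only ever contain 0 or 1, and any other value is undefined behaviour territory?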

Answer 1

Pet*_*des 275

Yes, ISO C++ allows (but doesn't require) implementations to make this choice.

But also note that ISO C++ allows a compiler to emit code that crashes on purpose (e.g. with an illegal instruction) if the program encounters UB, e.g. as a way to help you find errors. (Or because it's a DeathStation 9000. Being strictly conforming is not sufficient for a C++ implementation to be useful for any real purpose). So ISO C++ would allow a compiler to make asm that crashed (for totally different reasons) even on similar code that read an uninitialized uint32_t. Even though that's required to be a fixed-layout type with no trap representations.

It's an interesting question about how real implementations work, but remember that even if the answer was different, your code would still be unsafe because modern C++ is not a portable version of assembly language.

You're compiling for the x86-64 System V ABI, which specifies that a bool as a function arg in a register is represented by the bit-patterns false=0 and true=1 in the low 8 bits of the register¹. In memory, bool is a 1-byte type that again must have an integer value of 0 or 1.

(An ABI is a set of implementation choices that compilers for the same platform agree on so they can make code that calls each other's functions, including type sizes, struct layout rules, and calling conventions.)

ISO C++ doesn't specify it, but this ABI decision is widespread because it makes bool->int conversion cheap (just zero-extension). I'm not aware of any ABIs that don't let the compiler assume 0 or 1 for bool, for any architecture (not just x86). It allows optimizations like !mybool with xor eax,1 to flip the low bit (see "Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction"), or compiling a&&b to a bitwise AND for bool types. Some compilers do actually take advantage of this; see "Boolean values as 8 bit in compilers. Are operations on them inefficient?".

In general, the as-if rule allows the compiler to take advantage of things that are true on the target platform being compiled for, because the end result will be executable code that implements the same externally-visible behaviour as the C++ source. (With all the restrictions that Undefined Behaviour places on what is actually "externally visible": not with a debugger, but from another thread in a well-formed / legal C++ program.)

The compiler is definitely allowed to take full advantage of an ABI guarantee in its code-gen, and make code like you found which optimizes strlen(whichString) to 5U - boolValue. (BTW, this optimization is kind of clever, but maybe shortsighted vs. branching and inlining memcpy as stores of immediate data².)

Or the compiler could have created a table of pointers and indexed it with the integer value of the bool, again assuming it was a 0 or 1. (This possibility is what @Barmar's answer suggested.)

Your __attribute__((noinline)) constructor with optimization enabled led to clang just loading a byte from the stack to use as uninitializedBool. It made space for the object in main with push rax (which is smaller and for various reasons about as efficient as sub rsp, 8), so whatever garbage was in AL on entry to main is the value it used for uninitializedBool. This is why you actually got values that weren't just 0.

5U - random garbage can easily wrap to a large unsigned value, leading memcpy to go into unmapped memory. The destination is in static storage, not the stack, so you're not overwriting a return address or something.
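
To make the wrap-around concrete, here is a minimal sketch that models the optimized length computation with an ordinary integer byte instead of a real uninitialized bool (reading one would itself be UB). The name `modeled_length` and the parameter `raw` are hypothetical stand-ins for illustration; this is not the exact code clang emits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model: 'raw' stands in for the byte the compiler reads as the
// bool. The ABI promises it is 0 or 1, so the compiler computes the length
// arithmetically instead of calling strlen.
constexpr std::size_t modeled_length(std::uint8_t raw) {
    // Unsigned arithmetic: any raw value above 5 wraps to a huge size_t.
    return std::size_t{5} - raw;
}

// For the two valid representations the result matches strlen.
static_assert(modeled_length(0) == 5, "false -> strlen(\"false\")");
static_assert(modeled_length(1) == 4, "true  -> strlen(\"true\")");
```

With a garbage byte such as 0xAA the modeled length is far larger than the 16-byte destBuffer, which is how the memcpy ends up running into unmapped memory.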

Other implementations could make different choices, e.g. false=0 and true=any non-zero value. Then clang probably wouldn't make code that crashes for this specific instance of UB. (But it would still be allowed to if it wanted to.) I don't know of any implementations that choose anything other than what x86-64 does for bool, but the C++ standard allows many things that nobody does or even would want to do on hardware that's anything like current CPUs.

ISO C++ leaves it unspecified what you'll find when you examine or modify the object representation of a bool. (e.g. by memcpying the bool into unsigned char, which you're allowed to do because char* can alias anything. And unsigned char is guaranteed to have no padding bits, so the C++ standard does formally let you hexdump object representations without any UB. Pointer-casting to copy the object representation is different from assigning char foo = my_bool, of course, so booleanization to 0 or 1 wouldn't happen and you'd get the raw object representation.)
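
As a rough illustration of inspecting an object representation without UB, the sketch below copies the bytes of an initialized bool into unsigned char storage. The helper name `object_bytes` is made up for this example, and the bytes you see are whatever your implementation chose.

```cpp
#include <array>
#include <cstring>

// Copy the object representation of an *initialized* bool into raw bytes.
// memcpy into unsigned char is the sanctioned way to look at the bytes;
// no booleanization to 0/1 happens here, unlike `char c = b;`.
std::array<unsigned char, sizeof(bool)> object_bytes(bool b) {
    std::array<unsigned char, sizeof(bool)> bytes{};
    std::memcpy(bytes.data(), &b, sizeof b);
    return bytes;
}
```

On an x86-64 System V target one would expect a single byte holding 0 or 1, in line with the ABI described above, but the code itself does not rely on that.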

You've partially "hidden" the UB on this execution path from the compiler with noinline. Even if it doesn't inline, though, interprocedural optimizations could still make a version of the function that depends on the definition of another function. (First, clang is making an executable, not a Unix shared library where symbol-interposition can happen. Second, the definition is inside the class{} definition so all translation units must have the same definition. Like with the inline keyword.)

So a compiler could emit just a ret or ud2 (illegal instruction) as the definition for main, because the path of execution starting at the top of main unavoidably encounters Undefined Behaviour. (Which the compiler can see at compile time if it decided to follow the path through the non-inline constructor.)

Any program that encounters UB is totally undefined for its entire existence. But UB inside a function or if() branch that never actually runs doesn't corrupt the rest of the program. In practice that means that compilers can decide to emit an illegal instruction, or a ret, or not emit anything and fall into the next block / function, for the whole basic block that can be proven at compile time to contain or lead to UB.

GCC and Clang in practice do actually sometimes emit ud2 on UB, instead of even trying to generate code for paths of execution that make no sense. Or for cases like falling off the end of a non-void function, gcc will sometimes omit a ret instruction. If you were thinking that "my function will just return with whatever garbage is in RAX", you are sorely mistaken. Modern C++ compilers don't treat the language like a portable assembly language any more. Your program really has to be valid C++, without making assumptions about how a stand-alone non inlined version of your function might look in asm.

Another fun example is Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?. x86 doesn't fault on unaligned integers, right? So why would a misaligned uint16_t* be a problem? Because alignof(uint16_t) == 2, and violating that assumption led to a segfault when auto-vectorizing with SSE2.
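
For the misaligned-pointer case, one common way to stay within the rules is to copy the bytes instead of dereferencing a misaligned uint16_t*. The following is a small sketch of that idea; `load_u16` is a hypothetical helper name, not something from the linked question.

```cpp
#include <cstdint>
#include <cstring>

// Read a 16-bit value from an address that may not be 2-byte aligned.
// memcpy has no alignment requirement, so the compiler cannot assume
// alignof(uint16_t) for 'p' and is free to emit a plain unaligned load
// where the target allows it.
std::uint16_t load_u16(const unsigned char* p) {
    std::uint16_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

The value comes back in the host's native byte order, the same as a direct load would give.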

See also What Every C Programmer Should Know About Undefined Behavior #1/3, an article by a clang developer.

Key point: if the compiler noticed the UB at compile time, it could "break" (emit surprising asm) the path through your code that causes UB even if targeting an ABI where any bit-pattern is a valid object representation for `bool`.

Expect total hostility toward many mistakes by the programmer, especially things modern compilers warn about. This is why you should use -Wall and fix warnings. C++ is not a user-friendly language, and something in C++ can be unsafe even if it would be safe in asm on the target you're compiling for. (e.g. signed overflow is UB in C++ and compilers will assume it doesn't happen, even when compiling for 2's complement x86, unless you use clang/gcc -fwrapv.)
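
As an example of writing code that stays valid C++ instead of relying on what the hardware would do, here is a sketch of checking for signed overflow before it happens, so the overflowing addition is never evaluated. The function name `add_overflows` is an assumption made for illustration.

```cpp
#include <limits>

// Returns true when a + b would overflow int. The check only uses
// subtractions that cannot themselves overflow.
constexpr bool add_overflows(int a, int b) {
    return (b > 0) ? (a > std::numeric_limits<int>::max() - b)
                   : (a < std::numeric_limits<int>::min() - b);
}

static_assert(!add_overflows(1, 2), "small values are fine");
```

A caller would test `add_overflows(a, b)` first and only compute `a + b` when it returns false.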

Compile-time-visible UB is always dangerous, and it's really hard to be sure (with link-time optimization) that you've really hidden UB from the compiler and can thus reason about what kind of asm it will generate.

Not to be over-dramatic; often compilers do let you get away with some things and emit code like you're expecting even when something is UB. But maybe it will be a problem in the future if compiler devs implement some optimization that gains more info about value-ranges (e.g. that a variable is non-negative, maybe allowing it to optimize sign-extension to free zero-extension on x86-64). For example, in current gcc and clang, doing tmp = a+INT_MIN doesn't optimize a<0 as always-false, only that tmp is always negative. (Because INT_MIN + a=INT_MAX is negative on this 2's complement target, and a can't be any higher than that.)

So gcc/clang don't currently backtrack to derive range info for the inputs of a calculation, only on the results based on the assumption of no signed overflow: example on Godbolt. I don't know if this optimization is intentionally "missed" in the name of user-friendliness or what.

Also note that implementations (aka compilers) are allowed to define behaviour that ISO C++ leaves undefined. For example, all compilers that support Intel's intrinsics (like _mm_add_ps(__m128, __m128) for manual SIMD vectorization) must allow forming mis-aligned pointers, which is UB in C++ even if you don't dereference them. __m128i _mm_loadu_si128(const __m128i *) does unaligned loads by taking a misaligned __m128i* arg, not a void* or char*. See "Is `reinterpret_cast`ing between hardware vector pointer and the corresponding type an undefined behavior?"

GNU C/C++ also defines the behaviour of left-shifting a negative signed number (even without -fwrapv), separately from the normal signed-overflow UB rules. (This is UB in ISO C++, while right shifts of signed numbers are implementation-defined (logical vs. arithmetic); good quality implementations choose arithmetic on HW that has arithmetic right shifts, but ISO C++ doesn't specify). This is documented in the GCC manual's Integer section, along with defining implementation-defined behaviour that C standards require implementations to define one way or another.

There are definitely quality-of-implementation issues that compiler developers care about; they generally aren't trying to make compilers that are intentionally hostile, but taking advantage of all the UB potholes in C++ (except ones they choose to define) to optimize better can be nearly indistinguishable at times.

Footnote 1: The upper 56 bits can be garbage which the callee must ignore, as usual for types narrower than a register.

(Other ABIs do make different choices here. Some do require narrow integer types to be zero- or sign-extended to fill a register when passed to or returned from functions, like MIPS64 and PowerPC64. See the last section of this x86-64 answer which compares vs. those earlier ISAs.)

For example, a caller might have calculated a & 0x01010101 in RDI and used it for something else, before calling bool_func(a&1). The caller could optimize away the &1 because it already did that to the low byte as part of and edi, 0x01010101, and it knows the callee is required to ignore the high bytes.

Or if a bool is passed as the 3rd arg, maybe a caller optimizing for code-size loads it with mov dl, [mem] instead of movzx edx, [mem], saving 1 byte at the cost of a false dependency on the old value of RDX (or other partial-register effect, depending on CPU model). Or for the first arg, mov dil, byte [r10] instead of movzx edi, byte [r10], because both require a REX prefix anyway.

This is why clang emits movzx eax, dil in Serialize, instead of sub eax, edi. (For integer args, clang violates this ABI rule, instead depending on the undocumented behaviour of gcc and clang to zero- or sign-extend narrow integers to 32 bits. Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI? So I was interested to see that it doesn't do the same thing for bool.)

Footnote 2: After branching, you'd just have a 4-byte mov-immediate, or a 4-byte + 1-byte store. The length is implicit in the store widths + offsets.

OTOH, glibc memcpy will do two 4-byte loads/stores with an overlap that depends on length, so this really does end up making the whole thing free of conditional branches on the boolean. See the L(between_4_7): block in glibc's memcpy/memmove. Or at least, go the same way for either boolean in memcpy's branching to select a chunk size.

If inlining, you could use 2x mov-immediate + cmov and a conditional offset, or you could leave the string data in memory.

Or if tuning for Intel Ice Lake (with the Fast Short REP MOV feature), an actual rep movsb might be optimal. glibc memcpy might start using rep movsb for small sizes on CPUs with that feature, saving a lot of branching.

Tools for detecting UB and usage of uninitialized values

In gcc and clang, you can compile with -fsanitize=undefined to add run-time instrumentation that will warn or error out on UB that happens at runtime. That won't catch uninitialized variables, though. (Because it doesn't increase type sizes to make room for an "uninitialized" bit).

See https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

To find usage of uninitialized data, there's Address Sanitizer and Memory Sanitizer in clang/LLVM. https://github.com/google/sanitizers/wiki/MemorySanitizer shows examples of clang -fsanitize=memory -fPIE -pie detecting uninitialized memory reads. It might work best if you compile without optimization, so all reads of variables end up actually loading from memory in the asm. They show it being used at -O2 in a case where the load wouldn't optimize away. I haven't tried it myself. (In some cases, e.g. not initializing an accumulator before summing an array, clang -O3 will emit code that sums into a vector register that it never initialized. So with optimization, you can have a case where there's no memory read associated with the UB. But -fsanitize=memory changes the generated asm, and might result in a check for this.)

It will tolerate copying of uninitialized memory, and also simple logic and arithmetic operations with it. In general, MemorySanitizer silently tracks the spread of uninitialized data in memory, and reports a warning when a code branch is taken (or not taken) depending on an uninitialized value.

MemorySanitizer implements a subset of functionality found in Valgrind (Memcheck tool).

It should work for this case because the call to glibc memcpy with a length calculated from uninitialized memory will (inside the library) result in a branch based on length. If it had inlined a fully branchless version that just used cmov, indexing, and two stores, it might not have worked.

Valgrind's memcheck will also look for this kind of problem, again not complaining if the program simply copies around uninitialized data. But it says it will detect when a "Conditional jump or move depends on uninitialised value(s)", to try to catch any externally-visible behaviour that depends on uninitialized data.

Perhaps the idea behind not flagging just a load is that structs can have padding, and copying the whole struct (including padding) with a wide vector load/store is not an error even if the individual members were only written one at a time. At the asm level, the information about what was padding and what is actually part of the value has been lost.

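Also, this illustrates _why_ the UB featurebug was introduced in the design of the C and C++ languages in the first place: because it gives the compiler _exactly_ this kind of freedom, which has now permitted the most modern compilers to perform the high-quality optimizations that make C/C++ such high-performance mid-level languages. (10 upvotes)
@The_Sympathizer: UB was included to allow implementations to behave in whatever ways would be *most useful to their customers*. It was not intended to suggest that all behaviors should be considered equally useful. (4 upvotes)
@VioletGiraffe - The definition of "undefined behavior" in C++ is so loose that many programmers are sometimes not aware of it. I certainly run into it from time to time. I wish I could program as strictly to the C++ standard as you apparently always do, but I can't. And, to be fair, neither can the people I've worked with, and I've worked with top-notch people. (3 upvotes)
I've seen worse cases where the variable took values not in the range of an 8-bit integer, but of the entire CPU register. And Itanium has a worse one yet: use of an uninitialized variable can crash outright. (2 upvotes)
@Joshua: oh right, good point, Itanium's explicit speculation will tag register values with an equivalent of "not a number" such that using the value faults. (2 upvotes)
And so the war between C++ compiler writers and C++ programmers trying to write useful programs continues. This answer, totally comprehensive in answering this question, could also be used as ad copy for vendors of static analysis tools... (2 upvotes)
@Joshua: Crashing outright is a "good thing" compared with having a compiler optimize `if (x > 300) FATAL_ERROR(); else {foo[x]=23;}` by eliminating the conditional check because `x` "can't" exceed 255, but then allowing the code to overwrite arbitrary storage because `x` is in fact larger than that. (2 upvotes)
@Joshua: On some implementations, many forms of UB will crash outright with very high (sometimes 100%) probability by design. Reliably trapping various kinds of erroneous actions often imposes a severe run-time performance hit, but if one is e.g. performing load calculations for a highway bridge, a guarantee that overflow won't cause the program to produce wrong results may be worth the increased execution time, and the authors of the Standard would not have wanted to forbid such implementations. (2 upvotes)
@davidbak: Are you advocating for writing programs that depend on uninitialized data? Garbage in, garbage out. It makes no sense to take a performance hit to give meaningful (well-defined) behaviour to nonsensical buggy programs. (2 upvotes)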

Answer 2

ric*_*ici 56

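The compiler is allowed to assume that a boolean value passed as an argument is a valid boolean value (i.e. one which has been initialised or converted to true or false). The true value doesn't have to be the same as the integer 1 (indeed, there can be various representations of true and false), but the parameter must be some valid representation of one of those two values, where "valid representation" is implementation-defined.

So if you fail to initialise a bool, or if you succeed in overwriting it through some pointer of a different type, then the compiler's assumptions will be wrong and undefined behaviour will ensue. You had been warned:

50) Using a bool value in ways described by this International Standard as "undefined", such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false. (Footnote to para 6 of §6.9.1, Fundamental Types)

"The true value doesn't have to be the same as the integer 1" is kind of misleading. Sure, the actual bit pattern *could* be something else, but when implicitly converted/promoted (the only way you'd see a value other than `true`/`false`), [`true` is always `1`, and `false` is always `0`](https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions). Of course, such a compiler would also be unable to use the trick this compiler was trying to use (using the fact that the actual bit pattern of a `bool` could only be `0` or `1`), so it's somewhat irrelevant to the OP's problem. (11 upvotes)
@shadowranger: My point is that the implementation is in charge. If it limits the valid representations of `true` to the bit pattern `1`, that's its prerogative. If it chooses some other set of representations, then it indeed could not use the optimisation noted here. If it does choose that particular representation, then it can. It only needs to be internally consistent. You *can* examine the representation of a `bool` by copying it into a byte array; that is not UB (but it is implementation-defined). (7 upvotes)
The point about *undefined behavior* is that the compiler is allowed to draw far more conclusions from it, e.g. to assume that a code path which would lead to accessing an uninitialized value is never taken at all, because ensuring that is precisely the responsibility of the programmer. So it's not just about the possibility that the low-level values could be different from zero or one. (5 upvotes)
@ShadowRanger You can always inspect the object representation directly. (4 upvotes)
Yes, optimizing compilers (i.e. real-world C++ implementations) will sometimes emit code that depends on a `bool` having a bit-pattern of `0` or `1`. They don't re-booleanize a `bool` every time they read it from memory (or from a register holding a function arg). That's what this answer is saying. [examples](/sf/answers/3307150241/): gcc4.7+ can optimize `return a||b` to `or eax, edi` in a function returning `bool`, or MSVC can optimize `a&b` to `test cl, dl`. x86's `test` is a *bitwise* `and`, so if `cl=1` and `dl=2`, test sets flags according to `cl&dl = 0`. (3 upvotes)
@supercat What I meant is: too many programmers may think the worst thing that can happen is that an uninitialized bool might differ from the two legal values. But the effects of UB can be arbitrary. E.g. when you have `if(condition1) foo = expression; /* the only initialization of foo */ if(condition2) bar(foo); /* the only use of foo */`, the compiler may assume that `condition2` implies `condition1`, without having to prove it. In the absence of other side effects, it may transform this into `if(condition2) bar(expression);`, and it may even use that assumption in subsequent code. (3 upvotes)
@burnsba: Neither C nor C++ provides any runtime mechanism to test for uninitialised values. Without hardware support (which is uncommon, to say the least), any such mechanism would impose a large cost. Static analysis can't always catch the error either, but a visual inspection will show you variables which are not initialised in their declaration. If you always provide an initialiser, you won't have this particular problem. (2 upvotes)
@BurnsBA: Some implementations (including gcc and clang) can add run-time instrumentation to detect some forms of UB that aren't always detectable at compile time, e.g. `gcc -fsanitize=undefined -O3 foo.c`. See https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/. **To find usage of uninitialized data, there's Address Sanitizer and Memory Sanitizer in clang/LLVM. https://github.com/google/sanitizers/wiki/MemorySanitizer shows examples of detecting uninitialized memory reads.** (2 upvotes)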

Answer 3

M.M*_*M.M 51

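The function itself is correct, but in your test program, the statement that calls the function causes undefined behaviour by using the value of an uninitialized variable.

The bug is in the calling function, and it could be detected by code review or static analysis of the calling function. Using your Compiler Explorer link, the gcc 8.2 compiler does detect the bug. (Maybe you could file a bug report against clang that it doesn't find the problem.)

Undefined behaviour means anything can happen, which includes the program crashing a few lines after the event that triggered the undefined behaviour.

NB. The answer to "Can undefined behaviour cause _____?" is always "Yes". That's literally the definition of undefined behaviour.

@JoshuaGreen see [dcl.init]/12 "If an indeterminate value is produced by an evaluation, the behaviour is undefined except in the following cases:" (and none of those cases has an exception for `bool`). Copying requires evaluating the source. (10 upvotes)
@JoshuaGreen And the reason for that is that you might have a platform that triggers a hardware fault if you access some invalid values for some types. These are sometimes called "trap representations". (8 upvotes)
Itanium, while obscure, is a CPU that's still in production, has trap values, and has at least two semi-modern C++ compilers (Intel/HP). It literally has `true`, `false` and `not-a-thing` values for booleans. (6 upvotes)
On the flip side, the answer to "Does the Standard require all compilers to process something a certain way?" is generally "no", even/especially in cases where it's obvious that any quality compiler should do so; the more obvious something is, the less need there should be for the authors of the Standard to actually say it. (3 upvotes)
Is the first clause true? Does just _copying_ an uninitialized `bool` trigger UB? (2 upvotes)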
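
Since the bug is on the caller's side, one possible fix is simply to give the member a value. The sketch below reuses Serialize and destBuffer from the question; the struct name `FStructFixed` and the member name `initializedBool` are hypothetical, chosen only to contrast with the original.

```cpp
#include <cstddef>
#include <cstring>

// Zero-filled global buffer, as in the question.
char destBuffer[16];

// Same shape as the question's struct, but the member now has a default
// member initializer, so even an empty constructor leaves it well-defined.
struct FStructFixed {
    bool initializedBool = false;
    FStructFixed() {}
};

// Unchanged from the question: copies "true" or "false" into destBuffer.
void Serialize(bool boolValue) {
    const char* whichString = boolValue ? "true" : "false";
    const std::size_t len = std::strlen(whichString);
    std::memcpy(destBuffer, whichString, len);
}
```

With this change, `Serialize(FStructFixed{}.initializedBool)` reads a valid bool, and whatever length computation the optimizer picks stays within 4 or 5 bytes.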

Answer 4

Bar*_*mar 23

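A bool is only allowed to hold the values true or false, and the generated code can assume that it will only hold one of these two values. The code generated for the ternary in the assignment could use the value as the index into an array of pointers to the two strings, i.e. it might be converted to something like: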

// the compiler could make asm that "looks" like this, from your source
const static char *strings[] = {"false", "true"};
const char *whichString = strings[boolValue];

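If boolValue is uninitialized, it could actually hold any integer value, which would then cause an access outside the bounds of the strings array.

@Remz I'm just using the array to show what the generated code could be equivalent to, not suggesting that anyone would actually write that. (3 upvotes)
@Havenard, `int` is probably bigger than `bool`, so that doesn't prove anything. (2 upvotes)
@MSalters: `std::bitset<8>` doesn't give me nice names for all my different flags. Depending on what they are, that may be important. (2 upvotes)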
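
If a byte really can hold arbitrary values (for example because it came from a file or the network), a safer pattern is to keep it as an integer and convert it to bool explicitly before indexing. This is only a sketch of that idea; `name_for_raw_byte` is a hypothetical helper.

```cpp
#include <cstdint>

// Keep untrusted data in an integer type, then booleanize it explicitly.
// The comparison always yields a valid bool, so the table index below can
// only be 0 or 1, whatever 'raw' contained.
const char* name_for_raw_byte(std::uint8_t raw) {
    static const char* const strings[] = {"false", "true"};
    const bool value = (raw != 0);
    return strings[value];
}
```

Here `name_for_raw_byte(0xAA)` selects the "true" entry instead of indexing far past the end of the table.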

Answer 5

Tom*_*ner 15

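Summarising your question a lot, you are asking: does the C++ standard allow a compiler to assume a bool can only have an internal numerical representation of '0' or '1' and use it in such a way?

The standard says nothing about the internal representation of a bool. It only defines what happens when casting a bool to an int (or vice versa). Mostly, because of these integral conversions (and the fact that people rely rather heavily on them), the compiler will use 0 and 1, but it doesn't have to (although it has to respect the constraints of any lower-level ABI it uses).

So, the compiler, when it sees a bool, is entitled to consider that said bool contains either the 'true' or the 'false' bit pattern and do anything it feels like. So if the values for true and false are 1 and 0, respectively, the compiler is indeed allowed to optimise strlen to 5 - <boolean value>. Other fun behaviours are possible!

As gets repeatedly stated here, undefined behaviour has undefined results. Including but not limited to

Your code working as you expected it to
Your code failing at random times
Your code not being run at all.

See What every programmer should know about undefined behaviour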
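
The conversions the standard does pin down can be checked at compile time. The following sketch uses two hypothetical helpers, `as_int` and `as_bool`, to show them; it says nothing about how the bool is stored in memory.

```cpp
// bool -> int: true becomes 1 and false becomes 0, whatever the bit pattern.
constexpr int as_int(bool b) { return b; }

// int -> bool: zero becomes false, every other value becomes true.
constexpr bool as_bool(int i) { return static_cast<bool>(i); }

static_assert(as_int(true) == 1, "true converts to 1");
static_assert(as_int(false) == 0, "false converts to 0");
static_assert(as_bool(42), "non-zero converts to true");
static_assert(as_int(as_bool(42)) == 1, "round trip normalizes to 1");
```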

Archived: 7 years ago
Views: 31363
Last activity: 6 years, 3 months ago
