How do I find out what the compiler optimized away?

Question

bie*_*000 1 embedded assembly gcc g++ compiler-optimization

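There is a problem with my code. If I compile it with -O0 or -Og, it seems to work fine.

However, with any other optimization flag, such as -Os or -O1, it does not work. How can I find out what the compiler optimized away?

Compiler: arm-none-eabi-g++ 8.3.1

Source code: https://github.com/bielu000/stm32-libopencm3/tree/uart_not_working_version

I'm linking to the repo because it is hard to put the whole code here. The main code is in src/app/src.

I think the problem is in the server_run function.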

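Take a look at the screenshot.

On the left (working) ->

Optimization: -O1,
__attribute__((optimize("-O0"))) void server_run();

On the right (not working) ->

Optimization: -O1,
void server_run();

In the optimized (right) version I don't see any call that gets the buffer capacity. Why?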

Function body

extern "C" {
  #include <libopencm3/stm32/usart.h> 
  #include <libopencm3/stm32/gpio.h>
  #include <libopencm3/stm32/rcc.h>
  #include <libopencm3/cm3/nvic.h>
}

#include <server.hpp>
#include <stdint.h>
#include <ring_buffer.hpp>
#include <Os.hpp>
#include <Timer.hpp>
#include <target.h>

static uint8_t w_buffer[1024]; // write buffer
static uint8_t r_buffer[1024]; // read buffer

utils::containers::RingBuffer write_rb{w_buffer, sizeof(w_buffer)};
utils::containers::RingBuffer read_rb{r_buffer, sizeof(r_buffer)};

static void sendData()
{
  if (write_rb.capacity() != 0)
  {
    usart_send(USART1, write_rb.read());
    usart_enable_tx_interrupt(USART1);
  }
  else 
  {
    usart_disable_tx_interrupt(USART1);
  }
}

static void readData()
{
  auto data = usart_recv(USART1);
  read_rb.write(static_cast<uint8_t>(data));
}

void server_init()
{
  //RCC
  rcc_periph_clock_enable(RCC_USART1);

  //GPIO
  gpio_set_mode(GPIO_BANK_USART1_TX, GPIO_MODE_OUTPUT_50_MHZ, 
    GPIO_CNF_OUTPUT_ALTFN_PUSHPULL, GPIO_USART1_TX);

  gpio_set_mode(GPIO_BANK_USART1_RX, GPIO_MODE_INPUT, 
    GPIO_CNF_OUTPUT_ALTFN_OPENDRAIN, GPIO_USART1_RX);

  //USART
  usart_set_mode(USART1, USART_MODE_TX_RX);
  usart_set_baudrate(USART1, 9600);
  usart_set_parity(USART1, USART_PARITY_NONE);
  usart_set_databits(USART1, 8);
  usart_set_stopbits(USART1, 1);
  usart_set_flow_control(USART1, USART_FLOWCONTROL_NONE);
  usart_enable_rx_interrupt(USART1);

  //ISR
  nvic_enable_irq(NVIC_USART1_IRQ);
  
  //Enable 
  usart_enable(USART1);
}

void server_run()
{
  while(true)
  {
    size_t xsize = read_rb.capacity();
    if (xsize >= 64)
    {
      while (read_rb.capacity() != 0)
      {
        write_rb.write(read_rb.read());
      }
      sendData();
    }
  }
}

void usart1_isr()
{
  if (usart_get_flag(USART1, USART_FLAG_TXE) != 0) 
  {
    sendData();
  }

  if (usart_get_flag(USART1, USART_FLAG_RXNE) != 0) // when data is ready to read
  {
    readData(); 
  }
}

Update:

I changed the type of the xsize variable to

std::atomic<size_t> xsize = read_rb.capacity();

and now it works even with -Os. But why?

Answer 1

Pet*_*des 5

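Usually, code that only works with optimization disabled is a sign of undefined behaviour (UB): the compiler is allowed to make assumptions that your code violates. With optimization disabled it doesn't always take advantage of those assumptions. For example, at -O0 every variable is treated much like volatile, so strict-aliasing and data-race UB are rarely a problem in unoptimized builds. Forgetting to use atomic<T> for a shared variable usually only causes problems once optimization is enabled, unless you use a read-modify-write operation like ++. (See: MCU programming - C++ O2 optimization breaks while loop.)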
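As a minimal sketch of the shared-variable case: the names `data_ready`, `on_interrupt` and `wait_for_data` are hypothetical and not taken from the asker's code. With a plain `bool`, an optimizing compiler may load the flag once and hoist the check out of the loop; `std::atomic<bool>` forces a real load on every iteration.

```cpp
#include <atomic>

// Hypothetical flag shared between an interrupt handler and the main loop.
// With a plain `bool`, -O1 and above may read it once and never look again.
std::atomic<bool> data_ready{false};

// Stand-in for an interrupt handler that publishes the flag.
void on_interrupt() {
    data_ready.store(true, std::memory_order_release);
}

// Polls the flag at most max_polls times so the sketch always terminates.
// Returns true as soon as the flag is observed set, false otherwise.
bool wait_for_data(unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; ++i) {
        if (data_ready.load(std::memory_order_acquire)) {
            return true;
        }
    }
    return false;
}
```

On real hardware the wait would normally be unbounded; the bound here only keeps the example easy to run on a host.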

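Obviously, compile with full warnings (-Wall -Wextra). UB that is visible at compile time is often noticed and warned about by the compiler, especially in cases where it gives up and assumes a code path is unreachable because it hits UB, to the point of not even emitting a return instruction along that path.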
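One common instance of compile-time-visible UB is a non-void function with a path that falls off the end. The function `sign` below is a made-up illustration, not something from the question; the broken variant is shown only in a comment, followed by the corrected version.

```cpp
// Broken variant (do not use):
//
//   int sign(int x) {
//       if (x > 0) return 1;
//       if (x < 0) return -1;
//   }   // x == 0 falls off the end: undefined behaviour in C++
//
// GCC reports this as "control reaches end of non-void function"
// (-Wreturn-type, which -Wall includes). With optimization enabled the
// compiler may treat the x == 0 path as unreachable.

// Corrected version: every path returns a value.
int sign(int x) {
    if (x > 0) return 1;
    if (x < 0) return -1;
    return 0;
}
```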

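But if you really want an answer to the literal question you asked (what got optimized, or optimized away), rather than to the implicit debugging question:

The only answer to such a general question is a very general one: compare the compiler's asm output, or the compiler's internal representation of the program logic.

Comparing asm text output is hard, because one tiny difference in register allocation can make every instruction in a whole function use different registers.

So a better bet might be to get GCC to print its GIMPLE representation of the code, which is the form it uses to represent the program during most optimization passes. At some stages it can even dump it back out in a C-like form.

For example, take this function. I'm not sure it demonstrates anything interesting about optimization, other than z being optimized away (constant propagation):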

int foo(int x) {
    int z = 1;
    int y = x * 2 + z;
    return y;
}

With -O1 on Godbolt's ARM (none) GCC 8.3.1 (Godbolt has a GIMPLE tree viewer), we get this asm output:

foo(int):
        lsl     r0, r0, #1
        add     r0, r0, #1
        bx      lr

The GIMPLE output after the optimization pass 232t.optimized:

;; Function foo (_Z3fooi, funcdef_no=0, decl_uid=4625, cgraph_uid=0, symbol_order=0)

foo (int x)
{
  int y;
  int _1;

  <bb 2> [local count: 1073741825]:
  # DEBUG BEGIN_STMT
  # DEBUG z => 1
  # DEBUG BEGIN_STMT
  _1 = x_2(D) * 2;
  y_3 = _1 + 1;
  # DEBUG y => y_3
  # DEBUG BEGIN_STMT
  return y_3;

}

And this RTL output after the pass 312r.final:

;; Function foo (_Z3fooi, funcdef_no=0, decl_uid=4625, cgraph_uid=0, symbol_order=0)



foo

Dataflow summary:
;;  invalidated by call      0 [r0] 1 [r1] 2 [r2] 3 [r3] 12 [ip] 14 [lr] 15 [pc] 16 [s0] 17 [s1] 18 [s2] 19 [s3] 20 [s4] 21 [s5] 22 [s6] 23 [s7] 24 [s8] 25 [s9] 26 [s10] 27 [s11] 28 [s12] 29 [s13] 30 [s14] 31 [s15] 32 [s16] 33 [s17] 34 [s18] 35 [s19] 36 [s20] 37 [s21] 38 [s22] 39 [s23] 40 [s24] 41 [s25] 42 [s26] 43 [s27] 44 [s28] 45 [s29] 46 [s30] 47 [s31] 48 [d16] 49 [?16] 50 [d17] 51 [?17] 52 [d18] 53 [?18] 54 [d19] 55 [?19] 56 [d20] 57 [?20] 58 [d21] 59 [?21] 60 [d22] 61 [?22] 62 [d23] 63 [?23] 64 [d24] 65 [?24] 66 [d25] 67 [?25] 68 [d26] 69 [?26] 70 [d27] 71 [?27] 72 [d28] 73 [?28] 74 [d29] 75 [?29] 76 [d30] 77 [?30] 78 [d31] 79 [?31] 80 [wr0] 81 [wr1] 82 [wr2] 83 [wr3] 84 [wr4] 85 [wr5] 86 [wr6] 87 [wr7] 88 [wr8] 89 [wr9] 90 [wr10] 91 [wr11] 92 [wr12] 93 [wr13] 94 [wr14] 95 [wr15] 96 [wcgr0] 97 [wcgr1] 98 [wcgr2] 99 [wcgr3] 100 [cc] 101 [vfpcc]
;;  hardware regs used   13 [sp]
;;  regular block artificial uses    13 [sp]
;;  eh block artificial uses     13 [sp] 103 [afp]
;;  entry block defs     0 [r0] 1 [r1] 2 [r2] 3 [r3] 13 [sp] 14 [lr]
;;  exit block uses      0 [r0] 13 [sp] 14 [lr]
;;  regs ever live   0 [r0]
;;  ref usage   r0={3d,4u} r1={1d} r2={1d} r3={1d} r13={1d,2u} r14={1d,1u} 
;;    total ref usage 15{8d,7u,0e} in 4{4 regular + 0 call} insns.
(note 1 0 28 NOTE_INSN_DELETED)
(note 28 1 4 (var_location x (reg:SI 0 r0 [ x ])) NOTE_INSN_VAR_LOCATION)
(note 4 28 21 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 21 4 2 NOTE_INSN_PROLOGUE_END)
(note 2 21 3 NOTE_INSN_DELETED)
(note 3 2 25 NOTE_INSN_FUNCTION_BEG)
(note 25 3 29 ./example.cpp:2 NOTE_INSN_BEGIN_STMT)
(note 29 25 26 (var_location z (const_int 1 [0x1])) NOTE_INSN_VAR_LOCATION)
(note 26 29 30 ./example.cpp:3 NOTE_INSN_BEGIN_STMT)
(note 30 26 27 (var_location y (plus:SI (ashift:SI (reg:SI 0 r0 [ x ])
        (const_int 1 [0x1]))
    (const_int 1 [0x1]))) NOTE_INSN_VAR_LOCATION)
(note 27 30 11 ./example.cpp:4 NOTE_INSN_BEGIN_STMT)
(insn 11 27 31 (set (reg:SI 0 r0 [114])
        (ashift:SI (reg:SI 0 r0 [ x ])
            (const_int 1 [0x1]))) "./example.cpp":3 129 {*arm_shiftsi3}
     (nil))
(note 31 11 32 (var_location y (plus:SI (ashift:SI (entry_value:SI (reg:SI 0 r0 [ x ]))
        (const_int 1 [0x1]))
    (const_int 1 [0x1]))) NOTE_INSN_VAR_LOCATION)
(note 32 31 12 (var_location x (entry_value:SI (reg:SI 0 r0 [ x ]))) NOTE_INSN_VAR_LOCATION)
(note 12 32 17 NOTE_INSN_DELETED)
(insn 17 12 33 (set (reg/i:SI 0 r0)
        (plus:SI (reg:SI 0 r0 [114])
            (const_int 1 [0x1]))) "./example.cpp":5 4 {*arm_addsi3}
     (nil))
(note 33 17 18 (var_location y (reg/i:SI 0 r0)) NOTE_INSN_VAR_LOCATION)
(insn 18 33 22 (use (reg/i:SI 0 r0)) "./example.cpp":5 -1
     (nil))
(note 22 18 23 NOTE_INSN_EPILOGUE_BEG)
(jump_insn 23 22 24 (return) "./example.cpp":5 220 {*arm_return}
     (nil)
 -> return)
(barrier 24 23 20)
(note 20 24 0 NOTE_INSN_DELETED)

If you really want to understand "what GCC optimized away", you'd better brush up on GIMPLE and/or RTL. (GCC internals manual: https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html)

I'm not going to clutter this answer with the GIMPLE and RTL output for -O0, but you can (I think) set up 2 compiler panes on Godbolt so that you can diff them.
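Outside Godbolt, a rough way to get the same dumps locally is GCC's `-fdump-tree-optimized` and `-fdump-rtl-final` options. The sketch below is only an illustration: the file name `dump_demo.cpp`, the macro `NO_OPT` and the two function names are assumptions. It keeps an optimized and an unoptimized copy of the same function in one translation unit, using the GCC-specific `optimize` attribute from the question, so both show up in the same dump file.

```cpp
// Build with GCC, for example:
//   g++ -O1 -fdump-tree-optimized -fdump-rtl-final -c dump_demo.cpp
// GCC then writes dump files named after the source file with a
// pass suffix such as 232t.optimized or 312r.final (the pass numbers
// vary between GCC versions).

// The optimize attribute is GCC-specific, so it is hidden from other compilers.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_OPT __attribute__((optimize("O0")))
#else
#define NO_OPT
#endif

// Compiled at the command-line optimization level (-O1 above).
int foo_optimized(int x) {
    int z = 1;
    int y = x * 2 + z;
    return y;
}

// Same body, but this one function is compiled as if with -O0.
NO_OPT int foo_unoptimized(int x) {
    int z = 1;
    int y = x * 2 + z;
    return y;
}
```

Both functions compute the same result; only their GIMPLE, RTL and asm should differ. The GCC manual recommends the `optimize` attribute for debugging purposes only, which is how it is used here.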

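`size_t xsize` is a local variable whose address is never taken, so it is obviously not a shared variable and does not need to be `atomic<size_t>`: it is not accessed directly by multiple threads or by interrupt handlers. Your change just happens to hide your bug for some reason, perhaps by making the stores and loads slower so that some race condition no longer occurs, or because it constrains optimization of the surrounding code. The point of this answer is *not* to randomly change things to `atomic<T>`. You should assume your code is still broken and could still break at any time after any unrelated change. (2 upvotes)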
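@bielu000: If you want to ask a specific debugging question about your actual code, post a separate question. Of course, you would need to reduce it to a [mcve] to make it suitable for Stack Overflow. Few people are interested in debugging your whole project for you. And posting hard-to-read *images* of disassembly doesn't help, especially when the source is hidden behind an off-site link that could rot in the future. Your edit made this a worse Stack Overflow question. (2 upvotes)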
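To illustrate the comment about xsize: the state that is actually shared between an interrupt handler and the main loop is the buffer's own bookkeeping, not a local copy of its size. The source of the asker's RingBuffer is not shown here, so the following is only a hypothetical single-producer/single-consumer queue (the name `SpscByteQueue` and its interface are assumptions) showing where atomics would belong in such a design.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer byte queue.
// NOT the asker's utils::containers::RingBuffer; only a sketch.
template <std::size_t N>
class SpscByteQueue {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Called only by the producer (for example an ISR).
    bool push(std::uint8_t byte) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;           // full
        data_[head % N] = byte;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Called only by the consumer (for example the main loop).
    bool pop(std::uint8_t& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;               // empty
        out = data_[tail % N];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Number of bytes currently stored (a snapshot; may change right after).
    std::size_t size() const {
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        const std::size_t head = head_.load(std::memory_order_acquire);
        return head - tail;
    }

private:
    std::uint8_t data_[N] = {};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

Because the shared indices are atomic, a local such as `size_t n = q.size();` can stay a plain variable: each call to `size()` performs fresh loads, so the compiler cannot fold repeated checks in a polling loop into one.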
