ROL/ROR on a variable using inline assembly in Objective-C

Question

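A few days ago I asked the question below. Because I needed a quick answer, I added:

"The code does not need to use inline assembly. However, I haven't found a way to do this using Objective-C / C++ / C instructions."

Today I would like to learn something, so I am asking the question again, this time looking for an answer that uses inline assembly.

I would like to perform ROR and ROL operations on variables in an Objective-C program. However, I can't manage it - I am not an assembly expert.

Here is what I have done so far: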

uint8_t v1 = ....;
uint8_t v2 = ....; // v2 is either 1, 2, 3, 4 or 5

asm("ROR v1, v2");

The error I get is:

Unknown use of instruction mnemonic with unknown size suffix

How can I fix this?

Answer 1

CRD*_*CRD 5

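A rotate is just two shifts - some bits go left, the others go right. Once you see this, rotating is easy without assembly. Some compilers recognize the pattern and compile it to a rotate instruction. See Wikipedia for the code.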
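To make the "two shifts" idea concrete for the 8-bit variables in the question, here is a minimal sketch in plain C. The names rotl8 and rotr8 are made up for illustration and are not taken from any library. The count is masked so that a count of 0 or 8 never turns into a shift by the full width.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by n bits (hypothetical helper).
   n is reduced modulo 8, so any count is safe. */
static inline uint8_t rotl8(uint8_t v, unsigned n)
{
    n &= 7;
    /* bits shifted out on the left come back in on the right */
    return (uint8_t)((v << n) | (v >> ((8 - n) & 7)));
}

/* Rotate an 8-bit value right by n bits (hypothetical helper). */
static inline uint8_t rotr8(uint8_t v, unsigned n)
{
    n &= 7;
    /* bits shifted out on the right come back in on the left */
    return (uint8_t)((v >> n) | (v << ((8 - n) & 7)));
}
```

With the question's variables this would be called as v1 = rotr8(v1, v2); whether the compiler turns it into a single rotate instruction depends on the compiler and the operand size.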

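Update: Xcode 4.6.2 on x86-64 (others not tested) compiles the double shift + or into a rotate for 32- and 64-bit operands; for 8- and 16-bit operands the double shift + or is kept. Why? Maybe the compiler understands something about the performance of these instructions, maybe it just didn't optimize - but in general, if you can avoid assembler, do so: the compiler invariably knows best! To get the operation inlined you can also declare the function static inline, or use a macro defined in the same way as the standard MAX macro (a macro has the advantage of adapting to the type of its operands).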
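As a rough sketch of the macro approach, assuming a compiler that supports the __typeof__ extension (GCC and Clang do): the ROTL name is invented here, the macro is only meant for unsigned operands, and like MAX it evaluates its arguments more than once.

```c
#include <stdint.h>

/* Hypothetical type-adapting rotate-left macro for unsigned operands.
   The width is taken from sizeof(v); the count is masked to that width.
   Caveat: v and n are evaluated more than once, so avoid side effects. */
#define ROTL_BITS(v) (sizeof(v) * 8)
#define ROTL(v, n) \
    ((__typeof__(v))(((v) << ((n) & (ROTL_BITS(v) - 1))) | \
                     ((v) >> ((ROTL_BITS(v) - (n)) & (ROTL_BITS(v) - 1)))))
```

The cast back to the operand type matters for 8- and 16-bit values, because C promotes them to int before shifting and the bits that moved past the original width have to be discarded.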

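Addendum after OP comment

Here is x86-64 assembler as an example; for full details of how to use the asm construct, start here.

First the non-assembler version: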

#import <Foundation/Foundation.h>
#include <stdint.h>

static inline uint32_t rotl32_i64(uint32_t value, unsigned shift)
{
   // assume shift is in range 1..31: for 0 the right shift would be by 32,
   // which is undefined in C, and larger values make the subtraction wrong
   // however we know the compiler will spot the pattern and replace
   // the expression with a single roll and there will be no subtraction
   // so if the compiler changes this may break without:
   //    shift &= 0x1f;
   return (value << shift) | (value >> (32 - shift));
}

void test_rotl32(uint32_t value, unsigned shift)
{
   uint32_t shifted = rotl32_i64(value, shift);

   // print value, shift count and rotated result in hex
   NSLog(@"%8x <<< %u -> %8x", value & 0xFFFFFFFF, shift, shifted & 0xFFFFFFFF);
}

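If you look at the assembler output for profiling (so that the optimizer kicks in) in Xcode (Product > Generate Output > Assembly File, then select Profiling in the pop-up menu at the bottom of the window), you will see that rotl32_i64 is inlined into test_rotl32 and compiles down to a rotate (roll) instruction.

Producing the assembler directly is a bit more involved than for the ARM code FrankH presented. This is because a variable shift count must be in one specific register, cl, so we need to give the compiler enough information to arrange that. Here goes: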

#include <stdint.h>

static inline uint32_t rotl32_i64_asm(uint32_t value, unsigned shift)
{
   // x86-64 - shift must be in register cl so create a register local assigned to cl
   // no need to mask as the rotate instruction will do that
   register uint8_t cl asm ( "cl" ) = shift;
   uint32_t shifted;
   // emit the rotate left long
   // %n values are replaced by args:
   //    0: "=r" (shifted) - any register (r), result(=), store in var (shifted)
   //    1: "0" (value) - *same* register as %0 (0), load from var (value)
   //    2: "r" (cl) - any register (r), load from var (cl - which is the cl register so this one is used)
   __asm__ ("roll %2,%0" : "=r" (shifted) : "0" (value), "r" (cl));
   return shifted;
}
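For comparison, the same rotate can probably be written a little more compactly with the "c" constraint, which asks the compiler to put the operand in the C register, so the register-local variable is not needed. This is a sketch and not the code from the answer: the name rotl32_asm_c is made up, and the non-x86 branch is only a portable fallback added so that the snippet builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical variant of the inline-asm rotate using the "c" constraint. */
static inline uint32_t rotl32_asm_c(uint32_t value, unsigned shift)
{
#if defined(__x86_64__) || defined(__i386__)
    /* "+r": value is both input and output, kept in one general register.
       "c":  the 8-bit count goes into the C register, so %1 prints as %cl.
       "cc": the rotate modifies the flags. */
    __asm__ ("roll %1, %0" : "+r" (value) : "c" ((uint8_t)shift) : "cc");
    return value;
#else
    /* Fallback for other architectures: plain shifts with a masked count. */
    shift &= 31;
    return (value << shift) | (value >> ((32 - shift) & 31));
#endif
}
```

The hardware masks the count of a 32-bit rotate to 5 bits, which is why the asm path does no masking of its own.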

Change test_rotl32 to call rotl32_i64_asm and check the assembly output again - it should be the same, i.e. the compiler did as well as we did.

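Note further that if you include the commented-out masking line, rotl32_i64 essentially becomes a generic rotl32 - the compiler will do the right thing for any architecture, all for the cost of a single and instruction in the x86-64 version. (A shift of 0 would still need care, since the right shift would then be by 32.)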
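A minimal sketch of what such a generic rotl32 could look like is given here. It is an illustration rather than the answer's code: besides the masking line it also masks the right-shift count, an extra step assumed here so that a shift of 0 does not become a shift by 32.

```c
#include <stdint.h>

/* Portable 32-bit rotate left; any shift count is accepted. */
static inline uint32_t rotl32(uint32_t value, unsigned shift)
{
    shift &= 0x1f;  /* the masking line from the comment in rotl32_i64 */
    /* (32 - shift) & 0x1f maps shift == 0 to a right shift by 0, not 32 */
    return (value << shift) | (value >> ((32 - shift) & 0x1f));
}
```

Compilers that recognize the rotate idiom can still be expected to match this form, but that is worth confirming in the assembly output for the compiler in use.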

So asm is there if you need it, using it can be somewhat involved, and the compiler will always do as well or better by itself...

HTH
