dAE*_*527 1 c c++ optimization assembly hex
这是我当前的代码:
//Input:hex string , 1234ABCDEEFF0505DDCC ....
//Output?BYTE stream
void HexString2Hex(/*IN*/ char* hexstring, /*OUT*/ BYTE* hexBuff)
{
for (int i = 0; i < strlen(hexstring); i += 2)
{
BYTE val = 0;
if (hexstring[i] < 'A')
val += 0x10 * (hexstring[i] - '0');
else
val += 0xA0 + 0x10 * (hexstring[i] - 'A');
if (hexstring[i+1] < 'A')
val += hexstring[i + 1] - '0';
else
val += 0xA + hexstring[i + 1] - 'A';
hexBuff[i / 2] = val;
}
}
Run Code Online (Sandbox Code Playgroud)
问题是:当输入的十六进制字符串非常大(比如长度为 1000000)时,这个函数需要一百秒,这对我来说是不可接受的。(CPU:i7-8700,3.2GHz。内存:32G)
那么,是否有任何替代算法可以更快地完成工作?
谢谢你们
编辑 1:感谢稻田的评论。我太粗心了,没有注意到 strlen(time:O(n)) 被执行了数百次。所以我原来的函数是 O(n*n) 这太可怕了。
更新后的代码如下:
int len=strlen(hexstring);
for (int i = 0; i < len; i += 2)
Run Code Online (Sandbox Code Playgroud)
而且,对于 Emanuel P 的建议,我尝试过,但似乎不太好。下面是我的代码
map<string, BYTE> by_map;
Run Code Online (Sandbox Code Playgroud)
//init table (map here)
char *xx1 = "0123456789ABCDEF";
for (int i = 0; i < 16;i++)
{
for (int j = 0; j < 16; j++)
{
_tmp[0] = xx1[i];
_tmp[1] = xx1[j];
BYTE val = 0;
if (xx1[i] < 'A')
val += 0x10 * (xx1[i] - '0');
else
val += 0xA0 + 0x10 * (xx1[i] - 'A');
if (xx1[j] < 'A')
val += xx1[j] - '0';
else
val += 0xA + xx1[j] - 'A';
by_map.insert(map<string, BYTE>::value_type(_tmp, val));
}
}
Run Code Online (Sandbox Code Playgroud)
//search map
void HexString2Hex2(char* hexstring, BYTE* hexBuff)
{
char _tmp[3] = { 0 };
for (int i = 0; i < strlen(hexstring); i += 2)
{
_tmp[0] = hexstring[i];
_tmp[1] = hexstring[i + 1];
//DWORD dw = 0;
//sscanf(_tmp, "%02X", &dw);
hexBuff[i / 2] = by_map[_tmp];
}
}
Run Code Online (Sandbox Code Playgroud)
Edit2:事实上,当我修复 strlen 错误时,我的问题就解决了。下面是我的最终代码:
void HexString2Bytes(/*IN*/ char* hexstr, /*OUT*/ BYTE* dst)
{
static uint_fast8_t LOOKUP[256];
for (int i = 0; i < 10; i++)
{
LOOKUP['0' + i] = i;
}
for (int i = 0; i < 6; i++)
{
LOOKUP['A' + i] = 0xA + i;
}
for (size_t i = 0; hexstr[i] != '\0'; i += 2)
{
*dst = LOOKUP[hexstr[i]] << 4 |
LOOKUP[hexstr[i + 1]];
dst++;
}
}
Run Code Online (Sandbox Code Playgroud)
顺便说一句,真诚地感谢你们。你太棒了!真正的研究人员!
创建最高效代码的标准方法(以 RAM/ROM 为代价)是使用查找表。像这样的东西:
static const uint_fast8_t LOOKUP [256] =
{
['0'] = 0x0, ['1'] = 0x1, ['2'] = 0x2, ['3'] = 0x3,
['4'] = 0x4, ['5'] = 0x5, ['6'] = 0x6, ['7'] = 0x7,
['8'] = 0x8, ['9'] = 0x9, ['A'] = 0xA, ['B'] = 0xB,
['C'] = 0xC, ['D'] = 0xD, ['E'] = 0xE, ['F'] = 0xF,
};
Run Code Online (Sandbox Code Playgroud)
这牺牲了 256 字节的只读内存,因此我们不必进行任何形式的算术运算。如果uint_fast8_t编译器认为这有助于提高性能,则它可以让编译器选择更大的类型。
完整的代码将是这样的:
void hexstr_to_bytes (const char* restrict hexstr, uint8_t* restrict dst)
{
static const uint_fast8_t LOOKUP [256] =
{
['0'] = 0x0, ['1'] = 0x1, ['2'] = 0x2, ['3'] = 0x3,
['4'] = 0x4, ['5'] = 0x5, ['6'] = 0x6, ['7'] = 0x7,
['8'] = 0x8, ['9'] = 0x9, ['A'] = 0xA, ['B'] = 0xB,
['C'] = 0xC, ['D'] = 0xD, ['E'] = 0xE, ['F'] = 0xF,
};
for(size_t i=0; hexstr[i]!='\0'; i+=2)
{
*dst = LOOKUP[ hexstr[i ] ] << 4 |
LOOKUP[ hexstr[i+1] ];
dst++;
}
}
Run Code Online (Sandbox Code Playgroud)
在 x86_64 ( Godbolt )上测试时,这可以归结为大约 10 条指令。除循环条件外,无分支。值得注意的是,没有任何错误检查,因此您必须确保其他地方的数据正常(并且包含偶数个半字节)。
测试代码:
#include <stdio.h>
#include <stdint.h>
void hexstr_to_bytes (const char* restrict hexstr, uint8_t* restrict dst)
{
static const uint_fast8_t LOOKUP [256] =
{
['0'] = 0x0, ['1'] = 0x1, ['2'] = 0x2, ['3'] = 0x3,
['4'] = 0x4, ['5'] = 0x5, ['6'] = 0x6, ['7'] = 0x7,
['8'] = 0x8, ['9'] = 0x9, ['A'] = 0xA, ['B'] = 0xB,
['C'] = 0xC, ['D'] = 0xD, ['E'] = 0xE, ['F'] = 0xF,
};
for(size_t i=0; hexstr[i]!='\0'; i+=2)
{
*dst = LOOKUP[ hexstr[i ] ] << 4 |
LOOKUP[ hexstr[i+1] ];
dst++;
}
}
int main (void)
{
const char hexstr[] = "DEADBEEFC0FFEE";
uint8_t bytes [(sizeof hexstr - 1)/2];
hexstr_to_bytes(hexstr, bytes);
for(size_t i=0; i<sizeof bytes; i++)
{
printf("%.2X ", bytes[i]);
}
}
Run Code Online (Sandbox Code Playgroud)