Bar*_*rry 25 c++ gcc templates string-literals
I want to write some functions that take a string literal, and only a string literal:
template <size_t N>
void foo(const char (&str)[N]);
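Unfortunately, this is too permissive: it matches any array of char, whether or not it is a true string literal. The two cannot be told apart at compile time without requiring the caller to wrap the literal/array, but at run time the two arrays will sit in entirely different places in memory: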
foo("Hello"); // at 0x400f81
const char msg[] = {'1', '2', '3'};
foo(msg); // at 0x7fff3552767f
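Is there a way to tell where in memory the string data lives, so that I can at least assert that the function takes only a string literal? (I am using gcc 4.7.3, but a solution for any compiler would be great.)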
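To make the over-matching concrete, here is a minimal sketch. The helper name `extent_of` is made up for illustration; it has the same parameter type as `foo` above and just reports the deduced `N`. Both calls compile, and nothing in the type distinguishes the literal from the hand-built array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper with the same parameter type as foo():
// it accepts any array of const char and returns the deduced extent N.
template <std::size_t N>
std::size_t extent_of(const char (&)[N]) { return N; }

// Small demonstration: both a literal and a plain array are accepted.
inline void extent_demo()
{
    const char msg[] = {'1', '2', '3'};   // not a literal, no terminating '\0'
    assert(extent_of("Hello") == 6);      // five characters plus '\0'
    assert(extent_of(msg) == 3);          // exactly the three listed characters
}
```

The only visible difference is the extent, and that says nothing about whether the argument was a literal.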
Mik*_*han 13
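You seem to assume that a necessary trait of a "true string literal" is that the compiler bakes it into the static storage of the executable.
That is not actually the case. The C and C++ standards guarantee that a string literal has static storage duration, so it must exist for the lifetime of the program. But if a compiler can arrange that without placing the literal in static storage, it is free to do so, and some compilers sometimes do.
It is clear, however, that the property you want to test for a given string literal is whether it is in fact in static storage. And since, as far as the language standards guarantee, it need not be in static storage, no solution to your problem can be founded on portable C/C++ alone.
Whether a given string literal is in fact in static storage is the question of whether its address lies within one of the address ranges assigned to the linkage sections that qualify as static storage, in the nomenclature of your particular toolchain, when your program is built by that toolchain.
So the solution I suggest is to let your program know the address ranges of its own linkage sections that qualify as static storage. It can then test whether a given string literal is in static storage with obvious code.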
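Here is an illustration of this solution for a toy C++ project, prog, built with the GNU/Linux x86_64 toolchain (C++98 or later will do, and the approach is only slightly more fiddly for C). In this setting we link in ELF format, and the linkage sections we will treat as static storage are .bss (0-initialized static data), .rodata (read-only static data) and .data (read/write static data).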
Here are our source files:
section_bounds.h
#ifndef SECTION_BOUNDS_H
#define SECTION_BOUNDS_H
// Export delimiting values for our `.bss`, `.rodata` and `.data` sections
extern unsigned long const section_bss_start;
extern unsigned long const section_bss_size;
extern unsigned long const section_bss_end;
extern unsigned long const section_rodata_start;
extern unsigned long const section_rodata_size;
extern unsigned long const section_rodata_end;
extern unsigned long const section_data_start;
extern unsigned long const section_data_size;
extern unsigned long const section_data_end;
#endif
section_bounds.cpp
// Assign either placeholder or pre-defined values to 
// the section delimiting globals.
#ifndef BSS_START
#define BSS_START 0x0
#endif
#ifndef BSS_SIZE
#define BSS_SIZE 0xffff
#endif
#ifndef RODATA_START
#define RODATA_START 0x0
#endif
#ifndef RODATA_SIZE
#define RODATA_SIZE 0xffff
#endif
#ifndef DATA_START
#define DATA_START 0x0
#endif
#ifndef DATA_SIZE
#define DATA_SIZE 0xffff
#endif
extern unsigned long const 
    section_bss_start = BSS_START;
extern unsigned long const section_bss_size = BSS_SIZE;
extern unsigned long const 
    section_bss_end = section_bss_start + section_bss_size;
extern unsigned long const 
    section_rodata_start = RODATA_START;
extern unsigned long const 
    section_rodata_size = RODATA_SIZE;
extern unsigned long const 
    section_rodata_end = section_rodata_start + section_rodata_size;
extern unsigned long const 
    section_data_start = DATA_START;
extern unsigned long const 
    section_data_size = DATA_SIZE;
extern unsigned long const 
    section_data_end = section_data_start + section_data_size;
cstr_storage_triage.h
#ifndef CSTR_STORAGE_TRIAGE_H
#define CSTR_STORAGE_TRIAGE_H
// Classify the storage type addressed by `s` and print it on `cout`
extern void cstr_storage_triage(const char *s);
#endif
cstr_storage_triage.cpp
#include "cstr_storage_triage.h"
#include "section_bounds.h"
#include <iostream>
using namespace std;
void cstr_storage_triage(const char *s)
{
    unsigned long addr = (unsigned long)s;
    cout << "When s = " << (void*)s << " -> \"" << s << '\"' << endl;
    if (addr >= section_bss_start && addr < section_bss_end) {
        cout << "then s is in static 0-initialized data\n";
    } else if (addr >= section_rodata_start && addr < section_rodata_end) {
        cout << "then s is in static read-only data\n";     
    } else if (addr >= section_data_start && addr < section_data_end){
        cout << "then s is in static read/write data\n";
    } else {
        cout << "then s is on the stack/heap\n";
    }       
}
main.cpp
// Demonstrate storage classification of various arrays of char 
#include "cstr_storage_triage.h"
static char in_bss[1];
static char const * in_rodata = "In static read-only data";
static char in_rwdata[] = "In static read/write data";  
int main()
{
    char on_stack[] = "On stack";
    cstr_storage_triage(in_bss);
    cstr_storage_triage(in_rodata);
    cstr_storage_triage(in_rwdata);
    cstr_storage_triage(on_stack);
    cstr_storage_triage("Where am I?");
    return 0;
}
Here is our makefile:
.PHONY: all clean
SRCS = main.cpp cstr_storage_triage.cpp section_bounds.cpp 
OBJS = $(SRCS:.cpp=.o)
TARG = prog
MAP_FILE = $(TARG).map
ifdef AGAIN
BSS_BOUNDS := $(shell grep -m 1 '^\.bss ' $(MAP_FILE))
BSS_START := $(word 2,$(BSS_BOUNDS))
BSS_SIZE := $(word 3,$(BSS_BOUNDS))
RODATA_BOUNDS := $(shell grep -m 1 '^\.rodata ' $(MAP_FILE))
RODATA_START := $(word 2,$(RODATA_BOUNDS))
RODATA_SIZE := $(word 3,$(RODATA_BOUNDS))
DATA_BOUNDS := $(shell grep -m 1 '^\.data ' $(MAP_FILE))
DATA_START := $(word 2,$(DATA_BOUNDS))
DATA_SIZE := $(word 3,$(DATA_BOUNDS))
CPPFLAGS += \
    -DBSS_START=$(BSS_START) \
    -DBSS_SIZE=$(BSS_SIZE) \
    -DRODATA_START=$(RODATA_START) \
    -DRODATA_SIZE=$(RODATA_SIZE) \
    -DDATA_START=$(DATA_START) \
    -DDATA_SIZE=$(DATA_SIZE)
endif
all: $(TARG)
clean:
    rm -f $(OBJS) $(MAP_FILE) $(TARG)
ifndef AGAIN
$(MAP_FILE): $(OBJS)
    g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
    touch section_bounds.cpp
$(TARG): $(MAP_FILE)
    $(MAKE) AGAIN=1
else
$(TARG): $(OBJS)
    g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)
endif
Here is what a run of make looks like:
$ make
g++    -c -o main.o main.cpp
g++    -c -o cstr_storage_triage.o cstr_storage_triage.cpp
g++    -c -o section_bounds.o section_bounds.cpp
g++ -o prog  -Wl,-Map=prog.map main.o cstr_storage_triage.o section_bounds.o 
touch section_bounds.cpp
make AGAIN=1
make[1]: Entering directory `/home/imk/develop/SO/string_lit_only'
g++  -DBSS_START=0x00000000006020c0 -DBSS_SIZE=0x118 -DRODATA_START=0x0000000000400bf0
 -DRODATA_SIZE=0x120 -DDATA_START=0x0000000000602070 -DDATA_SIZE=0x3a
  -c -o section_bounds.o section_bounds.cpp
g++ -o prog  main.o cstr_storage_triage.o section_bounds.o
And lastly, here is what prog does:
$ ./prog
When s = 0x6021d1 -> ""
then s is in static 0-initialized data
When s = 0x400bf4 -> "In static read-only data"
then s is in static read-only data
When s = 0x602090 -> "In static read/write data"
then s is in static read/write data
When s = 0x7fffa1b053a0 -> "On stack"
then s is on the stack/heap
When s = 0x400c0d -> "Where am I?"
then s is in static read-only data
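As a cross-check of that output, the classification can be replayed by hand with the section starts and sizes that appear in the `-D` options of the make run above. This is only a sketch: the names `Section`, `contains` and `classify` are made up for illustration, and `unsigned long long` is used so that the stack address fits on any platform.

```cpp
#include <cassert>

typedef unsigned long long addr_t;   // wide enough for the 48-bit stack address

enum Storage { STATIC_ZERO_INIT, STATIC_READ_ONLY, STATIC_READ_WRITE, NOT_STATIC };

struct Section { addr_t start; addr_t size; };

// Half-open range test: start <= addr < start + size
inline bool contains(const Section& s, addr_t addr)
{
    return addr >= s.start && addr < s.start + s.size;
}

// Bounds copied from the -D options in the make output above
const Section kBss    = {0x6020c0ULL, 0x118ULL};  // ends at 0x6021d8
const Section kRodata = {0x400bf0ULL, 0x120ULL};  // ends at 0x400d10
const Section kData   = {0x602070ULL, 0x3aULL};   // ends at 0x6020aa

// Same order of tests as cstr_storage_triage()
inline Storage classify(addr_t addr)
{
    if (contains(kBss, addr))    return STATIC_ZERO_INIT;
    if (contains(kRodata, addr)) return STATIC_READ_ONLY;
    if (contains(kData, addr))   return STATIC_READ_WRITE;
    return NOT_STATIC;
}
```

Feeding in the five addresses printed by prog gives the same five verdicts, for example `classify(0x400c0dULL)` lands in .rodata because 0x400bf0 <= 0x400c0d < 0x400d10.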
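If it is obvious how this works, you need read no further.
The program will compile and link even before we know the addresses and sizes of its static storage sections. It would need to, wouldn't it!? In that case, the global section_* variables that ought to hold these values are all built with placeholder values.
When make is run, the recipes: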
$(TARG): $(MAP_FILE)
    $(MAKE) AGAIN=1
and
$(MAP_FILE): $(OBJS)
    g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
    touch section_bounds.cpp
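are operative, because AGAIN is undefined. They tell make that in order to build prog it must first build the linker map file of prog, as per the second recipe, and then re-timestamp section_bounds.cpp. After that, make is to call itself again, with AGAIN defined as 1.
Executing the makefile again with AGAIN defined, make now finds that it must compute all of the variables: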
BSS_BOUNDS
BSS_START
BSS_SIZE
RODATA_BOUNDS
RODATA_START
RODATA_SIZE
DATA_BOUNDS
DATA_START
DATA_SIZE
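For each static storage section S, it computes S_BOUNDS by grepping the linker map file for the line that reports the address and size of S. From that line, it assigns the second word (the section address) to S_START and the third word (the section size) to S_SIZE. All of the section-delimiting values are then appended, via -D options, to the CPPFLAGS that are automatically passed to compilations.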
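For readers less used to make's `$(word n,...)`, the same extraction can be sketched in C++. This is an illustration only, not part of the build: `SectionBounds` and `parse_map_line` are made-up names, and the sample line in the comment is reconstructed from the values in the make output above rather than copied from a real map file.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

struct SectionBounds {
    std::string name;          // word 1, e.g. ".bss"
    unsigned long long start;  // word 2, the section address
    unsigned long long size;   // word 3, the section size
    bool ok;                   // false if the line had fewer than three words
};

// Split a map-file line such as
//   ".bss            0x00000000006020c0      0x118"
// into whitespace-separated words and parse words 2 and 3 as hexadecimal.
inline SectionBounds parse_map_line(const std::string& line)
{
    SectionBounds b;
    b.start = 0;
    b.size = 0;
    b.ok = false;
    std::istringstream in(line);
    std::string start_word, size_word;
    if (in >> b.name >> start_word >> size_word) {
        // With base 16, strtoull accepts an optional "0x" prefix
        b.start = std::strtoull(start_word.c_str(), 0, 16);
        b.size  = std::strtoull(size_word.c_str(), 0, 16);
        b.ok = true;
    }
    return b;
}
```

In the makefile the two hexadecimal words are not parsed at all; they are pasted as text into the `-D` options and the compiler reads them as integer literals.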
Because AGAIN is defined, the operative recipe for $(TARG) is now the customary one:
$(TARG): $(OBJS)
    g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)
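But we touched section_bounds.cpp in the parent make, so it has to be recompiled, and therefore prog has to be relinked. This time, when section_bounds.cpp is compiled, all of the section-delimiting macros: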
BSS_START
BSS_SIZE
RODATA_START
RODATA_SIZE
DATA_START
DATA_SIZE
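will have predefined values and will not take their placeholder values.
And those predefined values will be correct, because the second linkage adds no symbols to the linkage, removes none, and does not alter the size or storage class of any symbol. It just assigns different values to symbols that were already present in the first linkage. Consequently, the addresses and sizes of the static storage sections are unchanged, and they are now known to your program.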
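If a coarser test is enough, a different technique avoids the two-pass build: ask the linker for the bounds directly. The sketch below is not the method described above. It assumes an ELF target linked with a linker that provides the symbols `__executable_start` and `_end` (the GNU ld default linker scripts do), and the function name `in_program_image` is made up. It only tells you that an address lies somewhere in the main executable's loaded image, which also covers code and writable statics, and it says nothing about literals that live in shared libraries.

```cpp
#include <cassert>
#include <stdint.h>

// Assumption: both symbols are supplied by the linker, not by this program.
// Only their addresses are meaningful; their "values" must never be read.
extern "C" char __executable_start;  // lowest address of the executable image
extern "C" char _end;                // first address past the end of .bss

// True if p points inside the main executable's loaded image
// (.text, .rodata, .data, .bss and whatever lies between them).
inline bool in_program_image(const void* p)
{
    uintptr_t a = reinterpret_cast<uintptr_t>(p);
    return a >= reinterpret_cast<uintptr_t>(&__executable_start)
        && a <  reinterpret_cast<uintptr_t>(&_end);
}
```

A call such as `assert(in_program_image(str))` inside `foo` would then reject stack and heap arrays, at the cost of also accepting any static char array.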
Depending on your actual needs, this may not work for you:
#include <cstdlib>
template <size_t N>
void foo(const char (&str)[N]) {}
template <char> struct check_literal {};
#define foo(arg) foo((check_literal<arg[0]>(),arg))    
int main()
{
    // This compiles
    foo("abc");
    // This does not
    static const char abc[] = "abc";
    foo(abc);
}
This works only with g++ and clang++, using -std=c++11.
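A related macro trick, which is not the one shown above, leans on adjacent string-literal concatenation instead of on `arg[0]` being a constant expression. The names `literal_size` and `LITERAL_ONLY` are made up for this sketch. The idea is that `"" s ""` is only well-formed when `s` is itself a string literal, so passing a named array is a syntax error. It is not airtight (a sufficiently contrived argument can still get through), so treat it as a guard against accidents rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Same parameter type as foo(); returns the deduced extent for demonstration.
template <std::size_t N>
std::size_t literal_size(const char (&)[N]) { return N; }

// Hypothetical wrapper: the empty literals on both sides concatenate
// with s only if s is a string literal.
#define LITERAL_ONLY(s) ("" s "")

inline void literal_only_demo()
{
    // Compiles: "" "abc" "" is the single literal "abc"
    assert(literal_size(LITERAL_ONLY("abc")) == 4);

    // Would not compile: "" abc "" is a syntax error
    // static const char abc[] = "abc";
    // literal_size(LITERAL_ONLY(abc));
}
```

Because it uses nothing beyond literal concatenation, this variant does not depend on a particular compiler or on -std=c++11.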