Dro*_* K. 34 c undefined-behavior language-lawyer flexible-array-member
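- By using flexible array members (FAMs) within structure types, are we exposing our programs to the possibility of undefined behavior?
- Is it possible for a program to use FAMs and still be a strictly conforming program?
- Is the offset of the flexible array member required to be at the end of the struct?
The questions apply to both C99 (TC3) and C11 (TC1).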
    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    int main(void) {
        struct s {
            size_t len;
            char pad;
            int array[];
        };

        struct s *s = malloc(sizeof *s + sizeof *s->array);

        printf("sizeof *s: %zu\n", sizeof *s);
        printf("offsetof(struct s, array): %zu\n", offsetof(struct s, array));

        s->array[0] = 0;
        s->len = 1;

        printf("%d\n", s->array[0]);

        free(s);
        return 0;
    }
Output:
sizeof *s: 16
offsetof(struct s, array): 12
0
Dro*_* K. 26
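- Yes. Common conventions of using FAMs expose our programs to the possibility of undefined behavior. Having said that, I'm unaware of any existing conforming implementation that would misbehave.
- Possible, but unlikely. Even if we don't actually reach undefined behavior, we are still likely to fail strict conformance.
- No. The offset of the FAM is not required to be at the end of the struct; it may overlay any trailing padding bytes.
The answers apply to both C99 (TC3) and C11 (TC1).
FAMs were first introduced in C99 (TC0) (Dec 1999), and their original specification required the offset of the FAM to be at the end of the struct. The original specification was well-defined, and as such it neither led to undefined behavior nor posed a problem for strict conformance.
C99 (TC0) §6.7.2.1 p16 (Dec 1999)
[This document is the official standard; it is copyrighted and not freely available]
The problem was that common C99 implementations, such as GCC, didn't follow the requirement of the standard, and allowed the FAM to overlay any trailing padding bytes. Their approach was considered to be more efficient, and since making them follow the requirement of the standard would have broken backwards compatibility, the committee chose to change the specification. As of C99 TC2 (Nov 2004), the standard no longer required the offset of the FAM to be at the end of the struct.
C99 (TC2) §6.7.2.1 p16 (Nov 2004)
[...] the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.
The new specification removed the statement that required the offset of the FAM to be at the end of the struct, and it introduced a very unfortunate consequence, because the standard gives the implementation the liberty not to keep the values of any padding bytes within structures or unions in a consistent state. More specifically:
C99 (TC3) §6.2.6.1 p6
When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
This means that if any of our FAM elements correspond to (or overlay) any trailing padding bytes, then upon storing to a member of the struct they (may) take unspecified values. We don't even need to ponder whether this applies to a value stored to the FAM itself; even the strict interpretation, that this only applies to members other than the FAM, is damaging enough.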
    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    int main(void) {
        struct s {
            size_t len;
            char pad;
            int array[];
        };

        struct s *s = malloc(sizeof *s + sizeof *s->array);

        if (sizeof *s > offsetof(struct s, array)) {
            s->array[0] = 123;
            s->len = 1; /* any padding bytes take unspecified values */

            printf("%d\n", s->array[0]); /* indeterminate value */
        }

        free(s);
        return 0;
    }
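Once we store to a member of the struct, the padding bytes take unspecified values, and therefore any assumption made about the values of the FAM elements that correspond to any trailing padding bytes is now false. This means that any such assumption leads to us failing strict conformance.
Although the values of the padding bytes are "unspecified values", the same can't be said about the type being affected by them, because an object representation based on unspecified values can generate a trap representation. So the only standard term that describes both possibilities is "indeterminate value". If the type of the FAM happens to have trap representations, then accessing it is not just a matter of an unspecified value, but undefined behavior.
But wait, there's more. If we agree that the only standard term to describe such a value is "indeterminate value", then even if the type of the FAM happens not to have trap representations, we have reached undefined behavior, since the official interpretation of the C standards committee is that passing indeterminate values to standard library functions is undefined behavior.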
Jon*_*ler 19
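This is a long answer dealing extensively with a tricky topic.
I believe the key problem is a misunderstanding of what §6.2.6.1 6 in the C99 and C11 standards means, and an inappropriate application of it to a simple integer assignment such as: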
fam_ptr->nonfam_member = 23;
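This assignment is not permitted to change any padding bytes in the structure pointed at by fam_ptr. Consequently, the analysis based on the assumption that it can change padding bytes in the structure is erroneous.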
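In principle, I'm not terribly concerned about the C99 standard and its corrigenda; they're not the current standard. However, the evolution of the specification of flexible array members is informative.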
The C99 standard — ISO/IEC 9899:1999 — had 3 technical corrigenda:
It was TC3, for example, that stated that gets() was obsolescent and
deprecated, leading to it being removed from the C11 standard.
The C11 standard — ISO/IEC 9899:2011 — has one technical
corrigendum, but that simply sets the value of two macros accidentally
left in the form 201ymmL — the values required for
__STDC_VERSION__ and __STDC_LIB_EXT1__ were corrected to the value
201112L.
(You can see the TC1 — formally "ISO/IEC 9899:2011/Cor.1:2012(en)
Information technology — Programming languages — C TECHNICAL
CORRIGENDUM 1" — at
https://www.iso.org/obp/ui/#iso:std:iso-iec:9899:ed-3:v1:cor:1:v1:en.
I've not worked out how you get a download of it, but it is so simple
that it really doesn't matter very much.)
ISO/IEC 9899:1999 (before TC2) §6.7.2.1 16:
As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. With two exceptions, the flexible array member is ignored. First, the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length.106) Second, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
126) The length is unspecified to allow for the fact that implementations may give array members different alignments according to their lengths.
(This footnote was removed in the rewrite.) The original C99 standard included one example:
17 EXAMPLE Assuming that all array members are aligned the same, after the declarations:

    struct s { int n; double d[]; };
    struct ss { int n; double d[1]; };

the three expressions:

    sizeof (struct s)
    offsetof(struct s, d)
    offsetof(struct ss, d)

have the same value. The structure struct s has a flexible array member d.
18 If sizeof (double) is 8, then after the following code is executed:

    struct s *s1;
    struct s *s2;
    s1 = malloc(sizeof (struct s) + 64);
    s2 = malloc(sizeof (struct s) + 46);

and assuming that the calls to malloc succeed, the objects pointed to by s1 and s2 behave as if the identifiers had been declared as:

    struct { int n; double d[8]; } *s1;
    struct { int n; double d[5]; } *s2;

19 Following the further successful assignments:

    s1 = malloc(sizeof (struct s) + 10);
    s2 = malloc(sizeof (struct s) + 6);

they then behave as if the declarations were:

    struct { int n; double d[1]; } *s1, *s2;

and:

    double *dp;
    dp = &(s1->d[0]); // valid
    *dp = 42;         // valid
    dp = &(s2->d[0]); // valid
    *dp = 42;         // undefined behavior

20 The assignment:

    *s1 = *s2;

only copies the member n and not any of the array elements. Similarly:

    struct s t1 = { 0 };          // valid
    struct s t2 = { 2 };          // valid
    struct ss tt = { 1, { 4.2 }}; // valid
    struct s t3 = { 1, { 4.2 }};  // invalid: there is nothing for the 4.2 to initialize
    t1.n = 4;                     // valid
    t1.d[0] = 4.2;                // undefined behavior
Some of this example material was removed in C11. The change was not noted (and did not need to be noted) in TC2 because the examples are not normative. But the rewritten material in C11 is informative when studied.
N983 from the WG14 Pre-Santa Cruz-2002 mailing is, I believe, the initial statement of a defect report. It states that some C compilers (citing three) manage to put a FAM before the padding at the end of a structure. The final defect report was DR 282.
As I understand it, this report led to the change in TC2, though I have not traced all the steps in the process. It appears that the DR is no longer available separately.
TC2 used the wording found in the C11 standard in the normative material.
So, what does the C11 standard have to say about flexible array members?
§6.7.2.1 Structure and union specifiers
3 A structure or union shall not contain a member with incomplete or function type (hence, a structure shall not contain an instance of itself, but may contain a pointer to an instance of itself), except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.
This firmly positions the FAM at the end of the structure — 'the last member' is by definition at the end of the structure, and this is confirmed by:
15 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
17 There may be unnamed padding at the end of a structure or union.
18 As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
This paragraph contains the change in 20 of ISO/IEC 9899:1999/Cor.2:2004(E) — the TC2 for C99;
The data at the end of the main part of a structure containing a flexible array member is regular trailing padding that can occur with any structure type. Such padding can't be accessed legitimately, but can be passed to library functions etc via pointers to the structure without incurring undefined behaviour.
The C11 standard contains three examples, but the first and third are related to anonymous structures and unions rather than the mechanics of flexible array members. Remember, examples are not 'normative', but they are illustrative.
20 EXAMPLE 2 After the declaration:

    struct s { int n; double d[]; };

the structure struct s has a flexible array member d. A typical way to use this is:

    int m = /* some value */;
    struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));

and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:

    struct { int n; double d[m]; } *p;

(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
21 Following the above declaration:

    struct s t1 = { 0 };         // valid
    struct s t2 = { 1, { 4.2 }}; // invalid
    t1.n = 4;                    // valid
    t1.d[0] = 4.2;               // might be undefined behavior

The initialization of t2 is invalid (and violates a constraint) because struct s is treated as if it did not contain member d. The assignment to t1.d[0] is probably undefined behavior, but it is possible that

    sizeof (struct s) >= offsetof(struct s, d) + sizeof (double)

in which case the assignment would be legitimate. Nevertheless, it cannot appear in strictly conforming code.
22 After the further declaration:

    struct ss { int n; };

the expressions:

    sizeof (struct s) >= sizeof (struct ss)
    sizeof (struct s) >= offsetof(struct s, d)

are always equal to 1.
23 If sizeof (double) is 8, then after the following code is executed:

    struct s *s1;
    struct s *s2;
    s1 = malloc(sizeof (struct s) + 64);
    s2 = malloc(sizeof (struct s) + 46);

and assuming that the calls to malloc succeed, the objects pointed to by s1 and s2 behave, for most purposes, as if the identifiers had been declared as:

    struct { int n; double d[8]; } *s1;
    struct { int n; double d[5]; } *s2;

24 Following the further successful assignments:

    s1 = malloc(sizeof (struct s) + 10);
    s2 = malloc(sizeof (struct s) + 6);

they then behave as if the declarations were:

    struct { int n; double d[1]; } *s1, *s2;

and:

    double *dp;
    dp = &(s1->d[0]); // valid
    *dp = 42;         // valid
    dp = &(s2->d[0]); // valid
    *dp = 42;         // undefined behavior

25 The assignment:

    *s1 = *s2;

only copies the member n; if any of the array elements are within the first sizeof (struct s) bytes of the structure, they might be copied or simply overwritten with indeterminate values.
Note that this changed between C99 and C11.
Another part of the standard describes this copying behaviour:
§6.2.6 Representation of types §6.2.6.1 General
6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
51) Thus, for example, structure assignment need not copy any padding bits.
In the C chat room, I wrote some information of which this is a paraphrase:
Consider:
struct fam1 { double d; char c; char fam[]; };
Assuming double requires 8-byte alignment (or 4-byte; it doesn't matter
too much, but I'll stick with 8), then struct non_fam1a { double d; char c; }; would have 7 padding bytes after c and a size of 16.
Further, struct non_fam1b { double d; char c; char nonfam[4]; }; would
have 3 bytes padding after the nonfam array, and a size of 16.
The suggestion is that the start of fam in struct fam1 can be at offset
9, even though sizeof(struct fam1) is 16.
So that the bytes after c are not padding (necessarily).
So, for a small enough FAM, the space needed for the named members plus the FAM might still
be less than sizeof(struct fam1).
The prototypical allocation is:
struct fam1 *fam = malloc(sizeof(struct fam1) + array_size * sizeof(char));
when the FAM is of type char (as in struct fam1).
That's a (gross) over-estimate when the offset of fam is less than
sizeof(struct fam1).
There are macros out there for calculating the 'precise' required storage based on FAM offsets that are less than the size of the structure. Such as this one: https://gustedt.wordpress.com/2011/03/14/flexible-array-member/
The question asks:
- By using flexible array members (FAMs) within structure types, are we exposing our programs to the possibility of undefined behavior?
- Is it possible for a program to use FAMs and still be a strictly conforming program?
- Is the offset of the flexible array member required to be at the end of the struct?
The questions apply to both C99 (TC3) and C11 (TC1).
I believe that if you code correctly, the answers are "No", "Yes", "No and Yes, depending …".
Question 1
I am assuming that the intent of question 1 is "must your program inevitably be exposed to undefined behaviour if you use any FAM anywhere?" To state what I think is obvious: there are lots of ways of exposing a program to undefined behaviour (and some of those ways involve structures with flexible array members).
I do not think that simply using a FAM means that the program automatically has (invokes, is exposed to) undefined behaviour.
Question 2
Section §4 Conformance defines:
5 A strictly conforming program shall use only those features of the language and library specified in this International Standard.3) It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit.
3) A strictly conforming program can use conditional features (see 6.10.8.3) provided the use is guarded by an appropriate conditional inclusion preprocessing directive using the related macro. …
7 A conforming program is one that is acceptable to a conforming implementation.5).
5) Strictly conforming programs are intended to be maximally portable among conforming implementations. Conforming programs may depend upon nonportable features of a conforming implementation.
I don't think there are any features of standard C which, if used in the way that the standard intends, makes the program not strictly conforming. If there are any such, they are related to locale-dependent behaviour. The behaviour of FAM code is not inherently locale-dependent.
I do not think that the use of a FAM inherently means that the program is not strictly conforming.
Question 3
I think question 3 is ambiguous between:
The answer to 3A is "No" (witness the C11 example at 25, quoted above).
The answer to 3B is "Yes" (witness §6.7.2.1 15, quoted above).
I need to quote the C standard and Dror's answer. I'll use [DK] to indicate the
start of a quote from Dror's answer, and unmarked quotations are from the C standard.
As of 2017-07-01 18:00 -08:00, the short answer by Dror K said:
[DK]
- Yes. Common conventions of using FAMs expose our programs to the possibility of undefined behavior. Having said that, I'm unaware of any existing conforming implementation that would misbehave.
I'm not convinced that simply using a FAM means that the program automatically has undefined behaviour.
[DK]
- Possible, but unlikely. Even if we don't actually reach undefined behavior, we are still likely to fail strict conformance.
I'm not convinced that the use of a FAM automatically renders a program not strictly conforming.
[DK]
- No. The offset of the FAM is not required to be at the end of the struct, it may overlay any trailing padding bytes.
This is the answer to my interpretation 3A, and I agree with this.
The long answer contains interpretation of the short answers above.
[DK]The problem was that common C99 implementations, such as GCC, didn't follow the requirement of the standard, and allowed the FAM to overlay any trailing padding bytes. Their approach was considered to be more efficient, and since for them to follow the requirement of the standard- would result with breaking backwards compatibility, the committee chose to change the specification, and as of C99 TC2 (Nov 2004) the standard no longer required the offset of the FAM to be at the end of the struct.
I agree with this analysis.
[DK]The new specification removed the statement that required the offset of the FAM to be at the end of the struct, and it introduced a very unfortunate consequence, because the standard gives the implementation the liberty not to keep the values of any padding bytes within structures or unions in a consistent state.
I agree that the new specification removed the requirement that the FAM be stored at an offset greater than or equal to the size of the structure.
I don't agree that there is a problem with the padding bytes.
The standard explicitly says that structure assignment for a structure containing a FAM effectively ignores the FAM (§6.7.2.1 18). It must copy the non-FAM members. It is explicitly stated that padding bytes need not be copied at all (§6.2.6.1 6 and footnote 51). And the Example 2 explicitly states (non-normatively §6.7.2.1 25) that if the FAM overlaps the space defined by the structure, the data from the part of the FAM that overlaps with the end of the structure might or might not be copied.
[DK]This means that if any of our FAM elements correspond to (or overlay) any trailing padding bytes, upon storing to a member of the struct- they (may) take unspecified values. We don't even need to ponder whether this applies to a value stored to the FAM itself, even the strict interpretation that this only applies to members other than the FAM, is damaging enough.
I don't see this as a problem. Any expectation that you can copy a structure containing a FAM using structure assignment and have the FAM array copied is inherently flawed — the copy leaves the FAM data logically uncopied. Any program that depends on the FAM data within the scope of the structure is broken; that is a property of the (flawed) program, not the standard.
[DK]

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    int main(void) {
        struct s {
            size_t len;
            char pad;
            int array[];
        };

        struct s *s = malloc(sizeof *s + sizeof *s->array);

        if (sizeof *s > offsetof(struct s, array)) {
            s->array[0] = 123;
            s->len = 1; /* any padding bytes take unspecified values */

            printf("%d\n", s->array[0]); /* indeterminate value */
        }

        free(s);
        return 0;
    }
Ideally, of course, the code would set the named member pad to a
determinate value, but leaving it unset doesn't actually cause a problem
since it is never accessed.
I emphatically disagree that the value of s->array[0] in the
printf() is indeterminate; its value is 123.
The prior standard quote is (it is the same §6.2.6.1 6 in both C99 and C11, though the footnote number is 42 in C99 and 51 in C11):
When a value is stored in an object of structure or union type, including in a member object, the by
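- @JonathanLeffler, I agree with Dror's interpretation, for the simple reason that if you restrict it to members that are themselves structures or unions, the paragraph doesn't make much sense. The intent of the paragraph is to give the compiler leeway to handle padding bytes as it sees fit when a structure element is changed. (4 upvotes)
- It doesn't mean that. It says "member object". An int is an object. It does not say "member object having structure or union type". (2 upvotes)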
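If strictly conforming programs are allowed to make use of implementation-defined behavior in cases where they would "work" for all legitimate behaviors (despite the fact that nearly any kind of useful output depends on implementation-defined details such as the execution character set), then using flexible array members within a strictly conforming program should be possible, provided that the program doesn't care whether the offset of the flexible array member coincides with the size of the struct.
Arrays are not regarded as having any padding internally, so any padding that gets added because of the FAM would precede it. If there is enough space within or beyond a struct to accommodate members of a FAM, those members are part of the FAM. For example, given: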
struct { long long x; char y; short z[]; } foo;
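the size of "foo" may be padded out beyond the start of z because of alignment, but any such padding will be usable as part of z. Writing y might disturb padding that precedes z, but should not disturb any part of z itself.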