Rather than discuss this in the PR, I thought it worth making an issue of it since it will need addressing.
The file in question is crypto/asn1/x_long.c The relevant commit is here.
Over in #3869 @dot-asm wrote:
H-m-m-m-m... I feel that I have to rise kind of objection to [at least] rationale behind replacing *(long *)x = y with memcpy. I mean rationale is to counter undefined behaviour. Well, formally speaking memcpy doesn't really cut it. The change kind of suggests that we are working with kind of "wire" data, i.e. pieces of data payload that is or is to be delivered by some inter-peer communication means. But[!] if this was actually case, then we wouldn't be able to assume that data is arranged in host byte order. I mean surrounding code is written under assumption that data comes in host byte order, isn't it? And consequently we have to assume that producer or consumer of this data is application code in same process context, not some "wire" data. But in such case behaviour is rendered undefined only if original and assumed types are different. Which in turn means that memcpy would be as inappropriate. [And that COPY_SIZE thing actually makes it apparent, doesn't it?]
Just in case for reference. This kind of contradicts concerns I've risen in #3701. Well, yes, but there is difference between first impressions/reactions and conclusions you can draw after actually contemplating the problem. This is such case...
Can we agree that *(long *)x = y is bad? In these cases, it is undefined according to the C standard -- it is allowed to do nothing, crash the program or format the boot drive although it probably tries to do something resembling what is asked. There are alignment and size concerns when the alignment of long is stricter than that of a pointer or when a long is larger than a pointer.
The memcpy solution is better in the sense that it is well defined and the COPY_SIZE _thing_ (good word, it's ugly) prevents overflows. I think there could be problems with loss of information if the size of a pointer and a long don't match. Is memcpy likely to be confused with a "wire" operation in this context? There are no endian guards which indicate "wire" more strong to me.
I don't think a union will be any better. A size mismatch will could result in loss of information, although it would cover the overflow concerns.
Would using a ptrint_t instead of a long make sense? It doesn't feel right to me.
Would allocating a block on the heap to contain the long make sense? This seems like the proper solution. The pointer stays a pointer and overflows aren't possible. Are memory leaks likely to result?
[Edit: Replaced some voids with ASN1_VALUE in the examples, to match crypto/asn1 better.]
I may be wrong, but I seem to recall C only considers using a pointer cast to the wrong type to be undefined, not casting it around. I.e. I believe this is defined, if rather goofy:
long l;
ASN1_VALUE **ptr = (ASN1_VALUE **)&l;
*((long *)ptr) = 42;
In the case of x_long.c, that's approximately what's happening here. While normally the representation of the ASN.1 type is a pointer of some type and *pval is written to without cast[0], x_long.c is weird and the representation is a long. Even though the pointers are passed around as ASN1_VALUE **pval, sizeof(*pval) is not meaningful and pval must not be dereferenced until you check it is not one of the special types. See f617b4969a9261b9d7d381670aefbe2cf766a2cb. (I feel like there were other similar changes which demonstrated this, but I can't seem to find them right now.)
I.e., I agree with Andy that the memcpy and COPY_SIZE bits in the commit are questionable. Concretely, that change breaks if long larger than ASN1_VALUE *.
[0] Thus doing the approximately:
FOO *p;
ASN1_VALUE **ptr = (ASN1_VALUE **)&p;
*ptr = malloc(sizeof(int));
which is, in itself, a bit of type-punning. At least the long version ultimately dereferences a pointer of the right type. Of course, if we're sufficiently worried about undefinedness, this entirely ASN.1 library works by stashing offsets into tables, doing math on pointers, and them casting them around (see offset2ptr), so there's all kinds of silliness going on.
You can cast a pointer to void* and back. void** doesnât work. Thatâs it. Memcpy looks ugly but is safer. Those who forget the pre-x86 days are doomed to relive them âș
Rich, I think you misunderstand where the memcpy is going. memcpy doesn't help here and only breaks things. If the pointer cast is invalid, the bug is that crypto/asn1's entire calling convention is wrong and cannot support x_long.c.
yeah, sorry, quick read w/o looking at context. ignore that comment of mine.
What does memcpy break that the cast doesn't?
There are cases where the cast is broken and memcpy isn't: sizeof(long) > sizeof(pointer) is nasty.
I rather suspect that the crypto/asn1 calling convention is the bug here.
Yeah, I probably should have been clearer that, if the pointer cast was invalid (the "I may be wrong" part---sounds like I was indeed wrong? I haven't had a chance to review the spec there yet), then what I wrote about the memcpy still holds but we have the additional existing problem that the original conversion from long * to ASN1_VALUE ** (itself not done that way but by offset2ptr which deserves its own definedness analysis) would be undefined.
@paulidale Right, the calling convention is either incorrect or at best extremely confusing, depending on what kinds of (non-dereferencing) pointer casts C allows. The type of that pointer is wrong for x_long.c. It is a long * with the wrong type. It is incorrect to dereference the pointer without casting it. It was probably incorrect to get it into that state to begin with, but the memcpy won't fix that.
You all know that the LONG and ZLONG types are strongly discouraged, right (its use has been completely removed from our code)? It has me wondering if this particular code is worth the effort...
That one relies on ASN1_TFLG_EMBED to avoid the allocation, right? This embed flag seems to be the exact same situation:
https://git.openssl.org/gitweb/?p=openssl.git;a=blob;f=crypto/asn1/tasn_dec.c;h=c9b637516e38a2fd7c901d606323085389e2c092;hb=HEAD#l508
It appears ASN1_VALUE **val is actually something *val pointing to the field in the struct. ASN1_TFLG_EMBED appears to just be a generalized version of LONG/ZLONG, pointer-munging and all. So if C forbids stashing long *val in a ASN1_VALUE **val, it would also forbid the generalized case with ASN1_TFLG_EMBED.
Well, no, not quite. LONG / ZLONG came along before ASN1_TFLG_EMBED, and looking at it today, they are a bastardization that embed without telling anyone, by relying on sizeof(long) == sizeof(void *).
UINT32 / ZUINT32 / UINT64 / ZUINT64, on the other hand, do rely on ASN1_TFLG_EMBED, and as far as I know, correctly so.
I wasn't aware that these had been removed from the code base. I was reimplementing a number of recently removed _fixes_.
Still, I'm happier with them gone. I'd advocate for removing them entirely with the next API change.
LONG / ZLONG are scheduled for deprecation with 1.2.0. See #3126
@levitte thanks muchly for that. I hadn't seen the deprecation PR. Not soon enough :)
This memcpy vs cast magic discussion seems kind of irrelevant now.
Pauli
In these cases, it is undefined according to the C standard -- it is allowed to do nothing, crash the program or format the boot drive
Here we go again. This is urban legend. Compiler is not actually allowed to [deliberately] do the said things. Standard effectively says that there might be side effects of executing the generated code, but nowhere does it say that compiler is allowed to generate sequence of instructions that would [attempt to] format the boot drive or any of the sort.
although it probably tries to do something resembling what is asked.
No, there is no "probably". Compiler will generate machine instructions as prescribed. It's the effect of execution of these instructions that is (or may be) undefined, not the way the instruction sequence looks.
Yeah, the formatting of the boot drive is over the top. A compiler that did this in response to an undefined operation wouldn't survive long.
Crashing the program is allowed -- misaligned pointers being dereferenced on a platform that doesn't handle it produces a boom usually. Odd word address on a 68k family device e.g.
Doing nothing is less likely in this case but it is possible in others. The expression x = x++ is undefined and can legitimately do nothing.
Memcpy... is safer.
No, it's as safe [or as unsafe]. Just consider you have pointers to objects of different sizes, say int and long on LP64. Copying to long as if it was int has a chance of working on little-endian [in case when most significant part was initially zeroed] and always produces nonsense on big-endian. In other words formally speaking undefined behavour. And in addition to endiannes problem on big-endian copy to int as if it was long will corrupt adjacent memory on either platform. And all this is irregardless what you use, *(long *)x = y or memcpy. That's the point. The only difference is that memcpy masks possible alignment problems [on platforms that don't tolerate unaligned access]. And it's not actually obvious if masking is a good thing. I for one would actually appreciate a crash, at the root of the problem, rather than letting it go indefinitely. As an example let's say it's a time stamp for an authentication token and you are deciding if you should meet or reject request based on copied value. It's copied to long as if it was int, i.e. no corruption, for simplicity... From this angle memcpy is actually less safe, isn't it? Because memcpy guarantees that problem remains undetected in situation when could be detected.
Crashing the program is allowed -- misaligned pointers being dereferenced on a platform that doesn't handle it produces a boom usually
As you might have noticed, I'm not very good with [English] words, but I'd still argue against this formulation [in the context]. Because it's CPU that is responsible for generating exception, not compiler. Well, standard doesn't actually prohibit compiler to generate code that checks pointer alignment, but the thing is that if it did discuss that, then it would probably have to specify an action, i.e. define behaviour. Bottom line is that it's not about "allowing" any specific thing, but refusal to discuss possible side effects. Or in other words, it's not like crashing is allowed, but rather is what we get de-facto.
[It's not the only thing standard deliberately doesn't discuss. Another example is padding between fields of the structures. However! It's not actually up to compiler developer to pick one. It is customarily specified, just not in language standard, but ... in ABI specification. What I'm saying is that when you see something unspecified in language standard, it doesn't actually mean that you are free to let your imagination do the work.]
Can we agree that
*(long *)x = yis bad?
But at the same time we can't deny that it (or its memcpy equivalent) is necessary under the circumstances. As David said, it is specified behavour as long as we are looking at objects of same type. Behaviour becomes undefined in cases when original types are different. But in such case it doesn't matter which of the two alternatives we use. That's the point. Yes, it's "bad", but it's rather "badass" than "replacing it with memcpy will make everything right". Once again, memcpy is just as bad/badass.
I don't think a union will be any better.
Do note that question about union was not actually related to badness comparison between the options in question. It's rather question of its own. And question is if compiler is not prohibited to reorder dereferences. More specifically, consider
long l = something;
memcpy((int *)&l,some_pointer,sizeof(l));
if (l == something) { magic(); }
Can it be compiled as memcpy(&l,some_printer,sizeof(l)); magic();? With rationale that (int *)&l is pointer to incompatible type and hence can't overlap l? Naturally provided that memcpy is built-in function, i.e. compiler can make all possible assumptions about what it does and how. For example it could even l = *some_pointer; magic(); on x86 since it's safe to assume that it tolerates misaligned access? Note that I'm not saying that I know answer to this question. Only that I do know that if you pass it through union, then compiler would be actually free to do that...
Only that I do know that if you pass it through union, then compiler would be actually free to do that...
I meant that compiler would not be free to do that [reorder dereferences]. Sorry :-)
I'm sorry, but I have to disagree with some of @dot-asm's comments here relating to undefined behavior. That is, in this context, we are worrying about the precise details of the C language specification, and "undefined behavior" is a specific technical term witih precise meaning in this context. In
I'd still argue against this formulation [in the context]. Because it's CPU that is responsible for generating exception, not compiler. Well, standard doesn't actually prohibit compiler to generate code that checks pointer alignment, but the thing is that if it did discuss that, then it would probably have to specify an action, i.e. define behaviour
we seem to be generating confusion with a different sort of (un)defined behavior, that is, what we should expect to happen at runtime, which is probably counterproductive. That is, we can talk about what happens at runtime, but can we try to avoid using "undefined behavior" when referring to it?
The C specification requires the compiler to produce machine instructions that have the same net effect as the source code does when interpreted in the C abstract machine. There are some 200-ish places in the spec where the abstract machine has undefined behavior, and producing machine instructions that have the same effect as undefined behavior gives the compiler great leeway to act in conformance with the language specification, and I will repeat that the compiler is permitted to generate instructions that crash the program, reformat the hard drive, launch the missiles, etc. in this case, where "permitted" is in terms of "compliance with the C language specification". The undefinedness is at the source level, and not just when the generated instructions are executed.
Some of the undefined behaviors (like constructing pointers outside the bounds of an array/object) do have natural corresponding runtime exceptions on some hardware, such as segmented addressing modes, but I see that more as being data used as inputs in the specification process -- the undefined behavior is specified solely in the language spec.
Getting off my soapbox, there are two highly relevant potential sources of undefined behavior here (and maybe others, like the offset2ptr case that was mentioned):
Converting a (pointer to an object or incomplete type) to a (pointer to a different object or incomplete type) with a different alignment requirement is undefined behavior, n1256.pdf section 6.3.2.3 paragraph 7. This applies to the mere construction of the value; no accesses or dereferences are needed to trigger the undefined behavior. There are exceptions elsewhere in this section that permit intercasting to/from (qualified) void* and char*, as these are "universal" ways to access any object. In this case the relevant types are pointer-to-long and pointer-to-(ASN1_VALUE*); if long is 32 bits and pointers 64, the alignment requirements are likely different, so casting through these pointers triggers undefined behavior.
An object must only be accessed via a pointer of compatible type (section 6.5 paragraph 7) [or some other exceptions that don't apply here]; for compatible types we see section 6.2.7 paragraph 1; struct ASN1_VALUE_st* and long are not compatible types. However, if I understand correctly, even though we are handed an ASN1_VALUE** as input to this function, the actual object pointed to by the pointer value we receive is still a long, so the *(long *)val cast is necessary to avoid undefined behavior (as opposed to just dereferencing *val). The memcpy approach also avoids this undefined behavior by using the exception that allows char* accesses a byte-at-a-time. Restricting the memcpy with COPY_SIZE is problematic at runtime in several senses, as Andy points out, but it is perfectly well-defined behavior at the C language level. It may not produce consistent results at runtime on various platforms, but this is not the fault of C!
My conclusion is that this whole setup is quite risky w.r.t. pointer alignment undefined behavior, it's good that we're not using it in-tree anymore (but even with that caveat we still have to support it until 1.2.0), and COPY_SIZE should be replaced with sizeof(long), optionally with some casts to long* retained.
Andy also wrote:
long l = something;
memcpy((int *)&l,some_pointer,sizeof(l));
if (l == something) { magic(); }
Can it be compiled as memcpy(&l,some_printer,sizeof(l)); magic();? With rationale that (int *)&l is pointer to incompatible type and hence can't overlap l? Naturally provided that memcpy is built-in function, i.e. compiler can make all possible assumptions about what it does and how. For example it could even l = *some_pointer; magic(); on x86 since it's safe to assume that it tolerates misaligned access? Note that I'm not saying that I know answer to this question. Only that I do know that if you pass it through union, then compiler would not be actually free to do that...
(with edit applied)
This one is a bit complicated, because the actual access in memcpy occurs via char* and the (int*) cast in there is a red herring w.r.t. the question of accessing an lvalue. That is, the actual access is something like *(char*)(int*)&l. When such pointer casts are defined behavior, they are value preserving, so the memcpy would be equivalent to memcpy(&l, some_pointer, sizeof(l)), which the compiler is not permitted to reorder via strict aliasing assumptions.
If we instead considered
long l = something;
*(int*)&l = some_integer;
if (l == something) { magic(); }
that would be a strict aliasing violation and the compiler is permitted to reorder. (It's also an access via an incompatible type, which is undefined behavior in its own right, so this example is not so good.)
But all of the above ignores the possibility that merely computing (int*)&l is undefined behavior due to the pointer alignment requirements of section 6.3.2.3 mentioned in my previous comment.
That is, we can talk about what happens at runtime, but can we try to avoid using "undefined behavior" when referring to it?
You probably have to keep in mind my occupational hazard, I can't let be imagining machine code when looking at source code or even reading specification :-)
The undefinedness is at the source level, and not just when the generated instructions are executed.
But wouldn't it mean that code has to be executed on system equipped with missiles? Well, maybe less hypothetical example [since vast majority of systems is equipped with storage] in context actually authorized to write to storage controller I/O ports to actually have capacity to initiate "disk reformat"? Otherwise how do you reconcile the fact that execution of same instruction sequence can result in both defined and undefined behaviour? I mean let's say you pass pointer to memory-mapped I/O port to ... memcpy...
the compiler is permitted to generate instructions that crash the program, reformat the hard drive, launch the missiles, etc. in this case, where "permitted" is in terms of "compliance with the C language specification".
Yet nowhere does it say so. It's just conclusion people tend to draw. You said that standard is about C abstract machine. And thing is that it, abstract machine, does not have any of these things, not even formattable storage, and hence aren't all references to above mentioned extreme side effects unavoidably rendered to what happens at runtime?
Wow, thanks, @kaduk! That was very thorough.
Unfortunately I think that means, while you're no longer using LONG/ZLONG in tree, ASN1_TFLG_EMBED has similar properties as far as this is concerned. It ends up doing approximately:
FOO foo;
// asn1_get_field_ptr. (NB: it uses offset2ptr instead of &foo which
// may be its own can of worms.)
ASN1_VALUE **pval = (ASN1_VALUE **)&foo; // pval really has type FOO*
// First few lines of asn1_template_noexp_d2i.
ASN1_VALUE *tval = (ASN1_VALUE *)pval; // tval really has type FOO*
pval = &tval; // pval now really has type FOO**
// What parsing the FOO ultimately ends up doing, either by way
// of the primitive's c2i hook or filling in the fields one-by-one.
*((FOO *)*pval) = blahblahblah;
Per your pointer alignment analysis, we have casts:
FOO * to ASN1_VALUE **ASN1_VALUE ** to ASN1_VALUE *.The first cast requires that FOO and ASN1_VALUE * have the same alignment, which is not true for all types in x_int64.c. The embed flag may also be applied to arbitrary ASN.1 fields it seems, so any struct with different alignment then a point carries risk.
The second cast requires that ASN1_VALUE * and ASN1_VALUE have the same alignment. One is a pointer and the other is some incomplete type. I'm not sure how those alignment rule work here given that ASN1_VALUE is never defined at all...
Presumably these particular undefined behavior concerns could be addressed by stopping using the incomplete struct ASN1_VALUE_st type and using void* as it's intended. (Note: void* not void**. Some other pointer is a perfectly fine thing for a void* to point to.)
Yes to what @kaduk just said! Letâs just use void*
That is, we can talk about what happens at runtime, but can we try to avoid using "undefined behavior" when referring to it?
You probably have to keep in mind my occupational hazard, I can't let be imagining machine code when looking at source code or even reading specification :-)
Sure. And it's well established that compilers are less good at generating machine code than you are :)
The undefinedness is at the source level, and not just when the generated instructions are executed.
But wouldn't it mean that code has to be executed on system equipped with missiles? Well, maybe less hypothetical example [since vast majority of systems is equipped with storage] in context actually authorized to write to storage controller I/O ports to actually have capacity to initiate "disk reformat"? Otherwise how do you reconcile the fact that execution of same instruction sequence can result in both defined and undefined behaviour? I mean let's say you pass pointer to memory-mapped I/O port to ... memcpy...
Yes, the instructions generated by the compiler would have to be executed on the machine connected to the missiles. Presumably, if those instructions were executed somewhere else they would get SIGILL or similar.
W.r.t. the storage controller/disk reformat hypothetical, this would seem to involve the compiler needing to emit instructions to write to the I/O ports only in the case when the C-level undefined behavior is triggered. This, in general, means emitting more instructions, which "real" compiler authors are loth to do, so we don't see the issue in practice. But if we admit a malicious compiler author, then the possibilities become more exciting.
I'm not sure I understand your question about " the fact that execution of same instruction sequence can result in both defined and undefined behaviour", though -- C-level undefined behavior is generally going to only be relevant when translating C code into machine instructions; the relevance of undefined behavior has passed by the time the instructions are executed. That is, the compiler has generated instructions and the processor is expected to deterministically execute those instructions (modulo silicon bugs), so executing any given sequence of instructions is just ... executing the program as realized from the C abstract machine. Perhaps you are thinking of the class of behaviors such as where a variable is dereferenced prior to being checked against NULL and the compiler omits the NULL check entirely (so that "undefined behavior" is invoked when the input is NULL but behavior is defined on non-NULL inputs)? This sort of thing seems like a case of the compiler taking "undefined behavior" as license to emit no instructions. That is, in the specific case causing undefined behavior, the compiler is permitted to issue the same instructions that it would issue in the case of non-defined behavior, and for efficiency of output the conditional is dropped, since both branches contain the same instructions.
the compiler is permitted to generate instructions that crash the program, reformat the hard drive, launch the missiles, etc. in this case, where "permitted" is in terms of "compliance with the C language specification".
Yet nowhere does it say so. It's just conclusion people tend to draw. You said that standard is about C abstract machine. And thing is that it, abstract machine, does not have any of these things, not even formattable storage, and hence aren't all references to above mentioned extreme side effects unavoidably rendered to what happens at runtime?
They are side effects, and visible at runtime, yes. ("Side effects" is also a technical term in the context of the C language specification, though I think you are not using that meaning here.)
I'm not sure if I'm repeating myself, but it seems helpful to think of undefined behavior as being a property of the C abstract machine. Compilers are charged with translating from the abstract machine to a particular physical machine, in the form of emitting instructions. What instructions the compiler emits when translating from abstract machine undefined behavior are up to the compiler, and can be non-deterministic with respect to invocations of the compiler, other occurrences of the same source code in the compilation unit, etc., though of course each instance of undefined behavior being translated to instructions will have a specific (possibly empty) set of corresponding physical-machine instructions. What happens at runtime is "just" executing those instructions, but since the compiler has no constraints on what those instructions are (not even consistency, as is imposed on implementation-defined behavior), the side effects of these compiler-chosen instructions can be, effectively, arbitrary, subject only to the limitations of what the processor will execute and what hardware is connected to the processor.
Someone will have to explain why a pointer to void is better than a pointer to an undefined (only declared) struct.
Or let me put it this way: why is a pointer better than a pointer?
Because âvoid*â has a particular guarantee: you can stuff anything into it (via casting) and you can get it back out unchanged (via casting).
By "anything" I mean any pointer. For example, this is not guaranteed to work:
struct foo *fp = malloc(sizeof *fp);
struct bar *bp = (struct bar *)fp;
struct foo *f2 = (struct foo *)bp;
assert(f2 == fp); /* might or might not assert */
If, however, you replace the second line with this:
void *bp = fp;
then the code is guaranteed to work.
void* is very special.
( Even though on many, if not most, machines treat all pointers the same )
Actually, I believe that void * can be passed to and from pointers to other types without casting, that's what makes a void * special.
But that's exactly what we do not want. void * takes away all type safety, we want to avoid that.
@richsalz, if I'm not entirely mistaken, nothing bad should happen as long as you just pass a pointer around. It's when you try to dereference it that you may or may not get in trouble.
You are mistaken. For example, pointers can be "typed" and a compiler can round off the representation to an even multiple of the structure size.
If we want to pass pointers around, you have to use void* as the transport. Nothing else is guaranteed to work, even if other things will work essentially everywhere.
I should clarify (and credit Greg Hudson for pointing it out) that my statement about computing a pointer value with wrong alignment is supposed to apply when the computed pointer value is not not correctly aligned for the new type, which is something of a substantial difference in terms of practical effect. That is, since malloc() is guaranteed to return a pointer that is aligned for any type, the above example will not be problematic.
But if, say,
struct s {
int a;
long b;
char *c
} __attribute(packed)__;
was used, that's the sort of thing that could get us in trouble.
a compiler can round off the representation to an even multiple of the structure size.
You didn't really mean that... that would mean that struct { char foo[1024]; } would have to be 1024 byte aligned...
Yes I meant that. But it doesnât have to be aligned, it can be. Imagine an array of such structures.
Stop emailing me OK I will call 911
On Jul 7, 2017 9:44 AM, "Richard Levitte" notifications@github.com wrote:
a compiler can round off the representation to an even multiple of the
structure size.You didn't really mean that... that would mean that struct { char
foo[1024]; } would have to be 1024 byte aligned...â
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/openssl/openssl/issues/3877#issuecomment-313733388,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AcIBenY3MDkDvh6Zn1kvDRkshtkk5CTQks5sLmBbgaJpZM4OQTjl
.
By "anything" I mean any pointer. For example, this is not guaranteed to work:
struct foo *fp = malloc(sizeof *fp); struct bar *bp = (struct bar *)fp; struct foo *f2 = (struct foo *)bp; assert(f2 == fp); /* might or might not assert */
You're thinking alignment, and that pointers would be fiddled with so it would have proper alignment. Then, this would happen whenever there's an assignment like struct foo *whatever = something, right?
If, however, you replace the second line with this:
void *bp = fp;then the code is guaranteed to work.
I doubt it, because the next line would, according to what you seem to say, still realign the passed pointer for the final assignment, wouldn't it?
struct foo *f2 = bp;
No. The C standard says that "void" is special and the code that I said is guaranteed to work Is Guaranteed To Work. No other pointer is guaranteed. Yes, it doesn't happen on almost all computers we will see, but the only safe way to "hold" a pointer is to use void, not to use an incomplete type. I don't know of any other ways to say this. the only reference I have at and is K&R 2ed copyright 1988, p199 sections A7.6 to A6.8
What I end up hearing from you guys is that you'd like to see this change:
diff --git a/crypto/asn1/tasn_dec.c b/crypto/asn1/tasn_dec.c
index c9b637516e..42756d3b66 100644
--- a/crypto/asn1/tasn_dec.c
+++ b/crypto/asn1/tasn_dec.c
@@ -495,7 +495,7 @@ static int asn1_template_noexp_d2i(ASN1_VALUE **val,
{
int flags, aclass;
int ret;
- ASN1_VALUE *tval;
+ void *tval;
const unsigned char *p, *q;
if (!val)
return 0;
@@ -510,8 +510,8 @@ static int asn1_template_noexp_d2i(ASN1_VALUE **val,
* a pointer to a field.
*/
if (tt->flags & ASN1_TFLG_EMBED) {
- tval = (ASN1_VALUE *)val;
- val = &tval;
+ tval = val;
+ val = (ASN1_VALUE **)&tval;
}
if (flags & ASN1_TFLG_SK_MASK) {
diff --git a/crypto/asn1/tasn_enc.c b/crypto/asn1/tasn_enc.c
index caa48696da..d8ca717fb0 100644
--- a/crypto/asn1/tasn_enc.c
+++ b/crypto/asn1/tasn_enc.c
@@ -196,7 +196,7 @@ static int asn1_template_ex_i2d(ASN1_VALUE **pval, unsigned char **out,
const ASN1_TEMPLATE *tt, int tag, int iclass)
{
int i, ret, flags, ttag, tclass, ndef;
- ASN1_VALUE *tval;
+ void *tval;
flags = tt->flags;
/*
@@ -204,8 +204,8 @@ static int asn1_template_ex_i2d(ASN1_VALUE **pval, unsigned char **out,
* a pointer to a field.
*/
if (flags & ASN1_TFLG_EMBED) {
- tval = (ASN1_VALUE *)pval;
- pval = &tval;
+ tval = pval;
+ pval = (ASN1_VALUE **)&tval;
}
/*
* Work out tag and class to use: tagging may come either from the
diff --git a/crypto/asn1/tasn_fre.c b/crypto/asn1/tasn_fre.c
index ae91461774..e5dc34081d 100644
--- a/crypto/asn1/tasn_fre.c
+++ b/crypto/asn1/tasn_fre.c
@@ -126,10 +126,11 @@ static void asn1_item_embed_free(ASN1_VALUE **pval, const ASN1_ITEM *it,
void asn1_template_free(ASN1_VALUE **pval, const ASN1_TEMPLATE *tt)
{
int embed = tt->flags & ASN1_TFLG_EMBED;
- ASN1_VALUE *tval;
+ void *tval;
+
if (embed) {
- tval = (ASN1_VALUE *)pval;
- pval = &tval;
+ tval = pval;
+ pval = (ASN1_VALUE **)&tval;
}
if (tt->flags & ASN1_TFLG_SK_MASK) {
STACK_OF(ASN1_VALUE) *sk = (STACK_OF(ASN1_VALUE) *)*pval;
diff --git a/crypto/asn1/tasn_prn.c b/crypto/asn1/tasn_prn.c
index b5698060e8..32278eb782 100644
--- a/crypto/asn1/tasn_prn.c
+++ b/crypto/asn1/tasn_prn.c
@@ -266,7 +266,8 @@ static int asn1_template_print_ctx(BIO *out, ASN1_VALUE **fld, int indent,
{
int i, flags;
const char *sname, *fname;
- ASN1_VALUE *tfld;
+ void *tfld;
+
flags = tt->flags;
if (pctx->flags & ASN1_PCTX_FLAGS_SHOW_FIELD_STRUCT_NAME)
sname = ASN1_ITEM_ptr(tt->item)->sname;
@@ -282,8 +283,8 @@ static int asn1_template_print_ctx(BIO *out, ASN1_VALUE **fld, int indent,
* a pointer to a field.
*/
if (flags & ASN1_TFLG_EMBED) {
- tfld = (ASN1_VALUE *)fld;
- fld = &tfld;
+ tfld = fld;
+ fld = (ASN1_VALUE **)&tfld;
}
if (flags & ASN1_TFLG_SK_MASK) {
Do I understand that correctly?
What I end up hearing from you guys is that you'd like to see this change:
I'm not sure there's much benefit from a patch that small, as we'd still be passing things around as ASN1_VALUE** -- I had in mind something much more invasive that would probably constitute and ABI break. (And is also a rather daunting task to think about starting.)
Oook... if it's that invasive, then yeah, that won't go in before 1.2.0...
Btw, @kaduk, you mentioned n1265.pdf... are we supposed to find it by googling? I'm asking, 'cause I found one, but it's about something quite different. still C, but...
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf looks better (searching for just "n1256" without the .pdf).
@kaduk thanks for explaining things the way I should have but better :) I linked to the n1265.pdf in the PR that spawned this issue. I should have linked it in the first post here.
@paulidale I think COPY_SIZE should not allow copying different sized types, it should
be a static assertion, maybe like
#define COPY_SIZE(a, b) (sizeof(int [sizeof(a) == sizeof(b) ? 1 : -1]), sizeof(a))
I don't think that this line in long_i2c is correct:
memcpy(<mp, pval, COPY_SIZE(*pval, ltmp));
because *pval is of type ASN1_VALUE *
and ltmp is of type long.
Above assertion would fire on Win64 for instance
becaue sizeof (long) != sizeof (ASN1_VALUE*).
Changing COPY_SIZE like that will (could?) break existing code :( Although the argument could be made that such code is broken already. I agree COPY_SIZE needs to go away with the rest of this.
Instead of void *, could a union be used for ASN1_VALUE? Casts of pointers to _type_ to unions containing _type_ are explicitly defined. The union will have sufficient space for the copy, type checking is retained and the casting can be avoided: union->long =*long_ptr;
Yes, avoiding type-casts is generally superior.
I admit I have not spend too much time thinking about it,
but when this code gets rewritten, could we avoid using the non-portable "long"
and use something more portable like "int64_t" or even "long long" (C99)?
@bernd-edlinger I think the consensus is that in this instance the ASN1_VALUE ** is aliasing a pointer value that was generated by computing the offset of a long struct member and adding that offset to the address of the start of the struct. That is, the object being pointed to is actually a long, and we do not really need to worry about the size of a pointer. Look at (e.g.) 6fb4f30611e8e5a5598234463f644cb950de760d for an example of how this code was used before it was removed.
Hmm...
While casting the pointer to a member of type long to long* is perfectly OK,
it would not be OK to cast the pointer to a member of type long to union*
and access the memory location like union->m_long.
Are we back at LONG / ZLONG? I fail to see why we should spend so much energy on them specifically...
What needs to happen with this issue? Ping @paulidale.
I think we close this and make a new one to remove x_long for 1.2.
Or maybe do even more invasive changes^Wrewrites to the ASN.1 implementation...
I believe the COPY_SIZE bits are still wrong and should be reverted. It's true that there is some pointer-munging but that is true of the entire library and not specific to x_long.
Note that LONG / ZLONG are deprecated from 1.2 on... per policy, that usually means that it should still be usable for some short foreseeable future, but I'd personally be perfectly ok with removing any trace of this particular type already in 1.2.
This seems to have died off. As a final note, the use of LONG / ZLONG is strongly discouraged (and as mentioned before, to be deprecated), so I don't see that there's much benefit in fixing them.
Unless someone cries bloody murder, I'll close this issue next time around that I see it.