Go: cmd/cgo: make identical C types identical Go types across packages

Created on 3 Dec 2015  ·  53 Comments  ·  Source: golang/go

https://golang.org/cmd/cgo/ says:

Cgo translates C types into equivalent unexported Go types. Because the translations are unexported, a Go package should not expose C types in its exported API: a C type used in one Go package is different from the same C type used in another.

While that's a convenient workaround for allowing access to struct fields and other names that would otherwise be inaccessible as Go identifiers, it greatly complicates the process of writing Go APIs for export to C callers. The Go code to produce and/or manipulate C values must be essentially confined to a single package.

It would be nice to remove that restriction: instead of treating C types as unexported local types, we should treat them as exported types in the "C" package (and similarly export the lower-case names that would otherwise be unexported).

All 53 comments

Somewhat related: does C require types to be defined the same way across translation units? Reading through C99, it seems to only require that objects and functions with external linkage need to have the same type across translation units (6.2.7), but I can't find anything per se that disallows for example "typedef int foo;" in one translation unit and "typedef unsigned foo;" in another (assuming they don't in turn lead to incompatible object/function declarations).

(Not to say that cgo _needs_ to support that.)

C and C++ permit the same name to designate different types in different compilation units.

I definitely agree that this restriction should be lifted. It makes it very hard to break up your code to promote maintainability. It would make more sense for a C.int, etc to be a C.int everywhere, just as an int is an int everywhere.

I don't believe we should lift this restriction. It is explicitly not a goal to make it possible to expose C types directly in Go package APIs. As Ian said, it's not even clear this is sound.

It is explicitly not a goal to make it possible to expose C types directly in Go package APIs.

The point of this request is not to write general-purpose Go packages using C types. It is to enable the creation of support libraries for Go packages that call C functions and/or Go packages that export C APIs with richer structure than primitive pointers and integers. (For example: one might want to return a protocol buffer from a Go function to a C or C++ caller without making more than one copy of the marshaled data. That operation is complex enough that it needs a support library, and because it needs to manipulate Go types it must be written in Go.)

As Ian said, it's not even clear this is sound.

The request is to make "identical" C types identical Go types, not to make C types "with the same name" identical Go types. I believe it is sound provided that we enforce that the C types are actually identical.

OK, I'm happy to reopen this, but I have no idea how to do it. It seems fundamentally at odds with Go's package system. I'm happy to look at implementation proposals though.

For the time being, is there a workaround for this? Maybe using an interface that can represent the same struct from different packages?

You can work around it in one direction (going from the C types in another package to the C types in the current package) using reflect and unsafe.Pointer. The same technique may be possible in the other direction too.

If you want to add back some of the type-safety at run time, you can use the reflect package to iterate over the struct fields to verify that they're compatible.
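A rough sketch of that workaround, in pure Go so it runs without a C toolchain. The types fooFromPkgA and fooFromPkgB are hypothetical stand-ins for the types cgo would generate for the same C struct in two packages, and sameLayout and toPkgB are illustrative helper names, not part of any library. In a single file a direct conversion would also be legal; the unsafe.Pointer route is what remains available once the two types live in different packages with unexported fields.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// fooFromPkgA and fooFromPkgB are hypothetical stand-ins for the Go types
// cgo would generate for one C struct in two different packages.
type fooFromPkgA struct {
	i int32
	p *byte
}

type fooFromPkgB struct {
	i int32
	p *byte
}

// sameLayout reports whether two struct types have the same size and the
// same sequence of field names, offsets, kinds, and field sizes.
func sameLayout(a, b reflect.Type) bool {
	if a.Kind() != reflect.Struct || b.Kind() != reflect.Struct {
		return false
	}
	if a.Size() != b.Size() || a.NumField() != b.NumField() {
		return false
	}
	for n := 0; n < a.NumField(); n++ {
		fa, fb := a.Field(n), b.Field(n)
		if fa.Name != fb.Name || fa.Offset != fb.Offset ||
			fa.Type.Kind() != fb.Type.Kind() || fa.Type.Size() != fb.Type.Size() {
			return false
		}
	}
	return true
}

// toPkgB reinterprets a *fooFromPkgA as a *fooFromPkgB, but only after the
// run-time layout check has passed.
func toPkgB(a *fooFromPkgA) (*fooFromPkgB, error) {
	if !sameLayout(reflect.TypeOf(*a), reflect.TypeOf(fooFromPkgB{})) {
		return nil, fmt.Errorf("incompatible struct layouts")
	}
	return (*fooFromPkgB)(unsafe.Pointer(a)), nil
}

func main() {
	b, err := toPkgB(&fooFromPkgA{i: 42})
	if err != nil {
		panic(err)
	}
	fmt.Println(b.i)
}
```

The check is shallow: it compares field kinds and sizes, not nested struct contents, so a real helper would probably need to recurse.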

Just pondering if aliases or whatever comes out of https://github.com/golang/go/issues/18130 would help with this problem.

@joegrasse I don't think type aliases per se would help with the general problem: either way, you end up needing one "canonical definition" for each type, and if you've already got a canonical definition then you don't need to be able to refer to it by different names.

However, it might at least solve the subproblem of making the Go types for C typedefs have the same aliasing structure as the C types. (I'm honestly not sure whether that's currently the case: it hadn't even occurred to me to check.)

@joegrasse, probably not, but if we do #16623 (let compiler know more about cgo) then the compiler would be in a position to resolve this, if we wanted to.

(Fixed issue number, sorry.)

@rsc, do you mind double checking the issue number. I think you might have mistyped it.

@joegrasse Russ meant #16623.

Thanks

@bcmills Forgive my backtracking, but I don't understand why aliasing wouldn't solve the problem. I thought that the problem with C structs in Go was that the compiler views a C struct within a package differently from the same C struct outside of the package. Therefore you don't have a "canonical definition" for that type. Wouldn't aliasing the C struct as a Go struct help with the problem?

@14rcole Consider this program:

foo.h:

typedef struct {
  int i;
} Foo;

foo/foo.go:

package foo

// #include "foo.h"
import "C"

func Frozzle(x *C.Foo) {
  …
}

bar.h:

typedef struct {
  int i;
} Bar;

bar/bar.go:

package bar

// #include "bar.h"
import "C"

import "foo"

func Bozzle(y *C.Bar) { foo.Frozzle(y) }

This program should compile: C.Bar in package bar is the same C type (a typedef of a struct with the same definition) as C.Foo in package foo. However, that would require cgo to write the definitions of C.Foo and C.Bar such that they are aliases for the same underlying type. Since the type includes the i field, which is currently unexported, there is no package in which that type could be defined.

There are other possible ways to solve the problem (e.g. by rewriting field names so that they are always exported and combining all of the C declarations into one package), but they involve more than just a suitable application of aliases.

Also, an alias has to exist in one package P and point to another package Q. That implies P imports Q (to point at it). In the general version of the problem in this issue, both P and Q define some C type and don't know about each other at all. Then some other package M (for main) imports both and tries to mix one with the other. There's no way for aliases per se to solve this problem, because P and Q need to continue not knowing about each other, and M can't change the definitions in P and Q.

Following the latest proposed solution in #16623 (but not strictly dependent upon it), the compilers could treat declarations for _Cfoo_bar as though they were declared from a synthetic "C" package. I believe we could also easily turn off symbol visibility rules for this package (e.g., so that lowercase struct fields are still accessible).

Then I think usual type identity rules would just work as desired, and usual type-reexporting information would help to catch ODR (one-definition rule) violations across C compilation units.

@mdempsky

the compilers could treat declarations for _Cfoo_bar as though they were declared from a synthetic "C" package

It can't be just one package, unfortunately. In a valid program, a type C.X can legitimately mean two different things in two different compilation units.

We could perhaps do some sort of name-mangling to disambiguate, though. For example, we could encode the complete C type definition in the mangled name and have the compiler treat all C-mangled names as being in the same package.
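One way to picture the mangling idea, purely as an illustration: derive the Go-side name from the C name plus a hash of the whitespace-normalized C definition. The mangle function, the "_Ctype_" prefix usage, and the choice of SHA-256 here are all assumptions made for this sketch, not anything cgo does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// mangle builds a hypothetical Go-side name for a C type from its C name
// and its full definition. Whitespace is normalized first so that purely
// cosmetic differences between headers do not change the result.
func mangle(cName, cDef string) string {
	norm := strings.Join(strings.Fields(cDef), " ")
	sum := sha256.Sum256([]byte(norm))
	return "_Ctype_" + cName + "_" + hex.EncodeToString(sum[:8])
}

func main() {
	a := mangle("X", "struct { int i; }")
	b := mangle("X", "struct {\n  int i;\n}")
	c := mangle("X", "struct { long i; }")
	fmt.Println(a == b) // same definition, different formatting
	fmt.Println(a == c) // same name, different definition
}
```

Textual normalization is far weaker than C's real type-compatibility rules; a serious version would have to work from the parsed type rather than its source text.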

The remaining concern with that approach is what to do with reflect. (If we've mangled the names to avoid collisions, should reflect report the mangled names, the colliding names, or something else entirely?)

What if two packages each define their own, mutually incompatible version of a type T in the cgo preamble? I think any solution to this issue must check that the two cgo types are indeed compatible.

Here is a very contrived example of how I came across this issue. I believe it to be a simpler case than what @bcmills and @rsc have both discussed above (although I could be wrong).

Consider the package:
cp/cp.go

package cp

import "C"

func CTest(name *C.char) string {
    return C.GoString(name)
}

and the program:
ct/main.go

package main

import (
    "fmt"

    "cp"
)
import "C"

func main() {
    s := C.CString("Hello World") // This needs to be freed later
    fmt.Println(cp.CTest(s))
}

When you try to build ct/main.go, you get the following error:

# ct
./main.go:12: cannot use s (type *C.char) as type *cp.C.char in argument to cp.CTest

Before coming across this issue, I would have thought that this program should compile, because cp.CTest takes a *C.char and s is a *C.char. I am not creating any new types, just using the basic C char type. For some reason though, *C.char in package cp becomes a *cp.C.char.
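The behaviour follows from ordinary Go type identity, which can be seen without cgo. In this sketch cpChar and mainChar are hypothetical stand-ins for the separate defined types that cgo generates for C.char in each package; int8 as the underlying type is an assumption for illustration only.

```go
package main

import (
	"fmt"
	"reflect"
)

// cpChar and mainChar are hypothetical stand-ins for the per-package types
// that cgo generates for C.char. Both are defined types, so they are
// distinct even though their underlying types match.
type cpChar int8
type mainChar int8

// cTest plays the role of cp.CTest: it accepts only *cpChar.
func cTest(name *cpChar) int8 { return int8(*name) }

// identical reports whether two values have the same dynamic type.
func identical(a, b interface{}) bool {
	return reflect.TypeOf(a) == reflect.TypeOf(b)
}

func main() {
	var s mainChar = 'H'
	// cTest(&s) would not compile: *mainChar is not *cpChar.
	fmt.Println(identical(&s, (*cpChar)(nil)))
	// An explicit pointer conversion works here because cpChar can be named.
	fmt.Println(cTest((*cpChar)(&s)))
}
```

In the real cgo case the explicit conversion is not available to the caller, because the generated type in package cp is unexported and cannot be named from package main.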

After #16623 we can figure out what the semantics should be here. It could be that we only support this for built-in C types like char/int/etc.

After #16623 we can figure out what the semantics should be here. It could be that we only support this for built-in C types like char/int/etc.

What about for types in external C libs? For external libs we can be pretty sure C.foo will always be C.foo, maybe even prefix it MyLib so like C.MyLib.foo or C.MyLib_foo?

@AlexRouSg

What about for types in external C libs?

According to the C standard, "[a]ll declarations of structure, union, or enumerated types that have the same scope and use the same tag declare the same type." That is a property of the type declarations themselves, not the libraries that implement or make use of those declarations.

The proposal here is that C types that are "the same type" according to the C standard should translate to Go types that are identical according to the Go spec.

@bcmills
ohhhhh, was replying to rsc saying it might be limited to only primitive types.

I'm hitting a similar, and I believe related, problem.

I'm interfacing with two different headers providing the same functionality.

In one case the header looks like this (cpython)

/* overly simplified */
typedef long Py_ssize_t;

int PyTuple_SetItem(PyObject *p, Py_ssize_t pos, PyObject *o);

The other provider (pypy) has a header that looks like this:

typedef long Py_ssize_t;

int PyTuple_SetItem(PyObject *p, long pos, PyObject *o);

When calling from go, I can't use either C.long(...) or C.Py_ssize_t(...) for the second argument and satisfy both implementations (despite Py_ssize_t and long being identical C types).

I don't know terribly much about Go, but at least for primitives, does it make sense to expose a typedef of a primitive X as a type alias? Would that even solve the problem? Where do I start hacking? :)

@asottile Are you including both of those headers in cgo comments in the same Go source file? If so, at least one of C.long or C.Py_ssize_t should work; if it does not, that seems like a separate bug.

I have a single call:

https://github.com/asottile/dockerfile/blob/bf98b2fd9f9598f141771c4170535139d59969b9/pylib/main.go#L122

This compiles fine with cpython, but not with pypy.

If I change it to C.long(i) it compiles fine with pypy, but not with cpython.

So despite Py_ssize_t and long being identical C types in the same source module, I can't write go source that satisfies both implementations of the header.

Ah, I see. That would presumably be fixed by a solution to this issue, but seems like a simpler problem to solve on its own since it does not cross package boundaries. Mind filing a separate issue for it?

As a workaround, you can probably add one declaration or the other explicitly to your cgo preamble, or else define a static wrapper function in the cgo preamble.

Yep I'll try and write up a separate issue for this!

I've opened #21809

Change https://golang.org/cl/63277 mentions this issue: cmd/cgo: use type aliases for primitive types

Change https://golang.org/cl/63276 mentions this issue: misc/cgo/errors: port test.bash to Go

Change https://golang.org/cl/63692 mentions this issue: errors_test: fix erroneous regexp detection

Change https://golang.org/cl/63730 mentions this issue: misc/cgo/errors: test that the Go rune type is not identical to C.int

I think I have a partial solution to this for struct and union types. As expected, Go type aliases are the key.

Caveats:

  • All of the C struct fields must begin with a capital letter (so that they become exported fields in Go).
  • The C types must either be complete in both packages or incomplete in both packages (so that they have the same size and fields in both packages).
  • We need to figure out a solution for primitives to make it work: otherwise the individual field types will conflict.

We start by defining each converted type as an _alias_ for its underlying Go struct type. Now the same types are identical, but _too many_ types are identical: types with the same layout but different C tags are erroneously aliased to the same type.

To fix that problem, we can use _Go_ struct field tags to encode the C struct tags! Because struct tags still count for type identity (#16085 notwithstanding), if we apply a Go tag containing the C tag to the first field on each struct, the two Go types will be mutually convertible but not identical. If the Go struct type does not have any fields, we add a zero-size field named _ and apply the tag to that.

https://play.golang.org/p/Dq9icy_BlH illustrates the general approach.
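For readers who cannot open the playground link, here is a separate minimal sketch of the same idea (not a copy of the linked program). The alias names pkgAFoo, pkgBFoo, and pkgBBar and the `cgo:"..."` tag format are assumptions chosen for illustration; they model what two packages might each generate for struct Foo and struct Bar.

```go
package main

import (
	"fmt"
	"reflect"
)

// Each alias names an unnamed struct type. The Go field tag carries the C
// struct tag, so equal layouts with different C tags stay distinct types.
type (
	pkgAFoo = struct {
		I int32 `cgo:"struct Foo"`
	}
	pkgBFoo = struct {
		I int32 `cgo:"struct Foo"`
	}
	pkgBBar = struct {
		I int32 `cgo:"struct Bar"`
	}
)

// useFoo accepts the Foo type as seen from "package A".
func useFoo(f *pkgAFoo) int32 { return f.I }

// sameType reports whether two values have identical dynamic types.
func sameType(a, b interface{}) bool {
	return reflect.TypeOf(a) == reflect.TypeOf(b)
}

func main() {
	foo := pkgBFoo{I: 1}
	fmt.Println(useFoo(&foo)) // identical types: no conversion needed

	bar := pkgBBar{I: 2}
	// useFoo(&bar) would not compile: the field tags differ.
	fmt.Println(useFoo((*pkgAFoo)(&bar))) // conversion ignores tags

	fmt.Println(sameType(pkgAFoo{}, pkgBFoo{}), sameType(pkgAFoo{}, pkgBBar{}))
}
```

The field is capitalized here on purpose: with a lower-case field the two struct types would stop being identical as soon as they were declared in different packages, which is the first caveat listed above.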

A simple solution for primitives would be to add some package to the standard library containing declarations for all of the C types:

package ctypes

// #cgo CGO_NOALIAS=1
import "C"

type (
    Int = C.int
    Uint = C.uint
    …
)

Then cgo would be able to rewrite all of the local types to be aliases for that:

package usercode

import "ctypes"

type (
    _Ctype_int = ctypes.Int
    ...
)

For certain C types with sizes defined by the standard (e.g., int32_t), we would instead emit typedefs directly to the corresponding Go types (e.g., int32).
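The effect of the rewriting can be simulated in one file. In this sketch ctypesLong is a hypothetical stand-in for a type such as ctypes.Long (the ctypes package itself is only a proposal in this thread), int64 is an arbitrary placeholder for its underlying type, and pkgALong and pkgBLong model the aliases cgo would emit in two user packages.

```go
package main

import "fmt"

// ctypesLong is a hypothetical stand-in for the single canonical
// definition that a shared ctypes package would provide for C's long.
type ctypesLong int64

// pkgALong and pkgBLong model the aliases cgo would emit in two different
// user packages. Both denote the very same type as ctypesLong.
type (
	pkgALong = ctypesLong
	pkgBLong = ctypesLong
)

// double is written against package A's name for the type.
func double(x pkgALong) pkgALong { return 2 * x }

func main() {
	var v pkgBLong = 21    // declared with package B's name
	fmt.Println(double(v)) // accepted without any conversion
}
```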

@bcmills
Do you think there would be a workaround for the all fields must start with a capital letter requirement? Cause passing around third party C structs where you can't rename the fields could be very useful.

Maybe have a tag to tell cgo to caps the first letter in go?

Do you think there would be a workaround for the all fields must start with a capital letter requirement?

The only alternative I can see, short of renaming fields, would be a language change to allow lower-case names to be exported anyway, which would potentially require changes in associated tooling (godoc, the ast package, and likely others).

I doubt that this use-case is compelling enough to justify such a change.

Maybe have a tag to tell cgo to caps the first letter in go?

That could work, or to add a prefix (such as "C_") to each field. That wouldn't be source-compatible with existing cgo files, but it could be viable as an explicit option (e.g. specified in the cgo prelude).

I'm unclear about what problem people are talking about solving at this point.

I'm unclear about what problem people are talking about solving at this point.

The same one this issue has always been about: making identical C types (in a cgo-using Go program) identical Go types across Go packages.

Comment 329947340 addresses the subproblem of translating identical numeric C types to identical Go types.

Comment 329946826 addresses the subproblem of translating identical struct and union C types to identical Go types. The solution proposed in that comment requires that the names of the C members start with a capital letter (so that they become exported fields of the Go struct type). Workarounds for that requirement are discussed in comments 329969062 and 330593346.

I have no interest in solving the general problem, nor in the associated complexity. Packages should not be exporting, say, *C.FILE in their APIs. If two different packages export *C.FILE and those are different types, that's OK.

I am slightly more sympathetic to *C.char, but even there I don't understand why the package API doesn't just use appropriate Go types instead (like []byte).

Oh, I see what you're saying now. I tried to address that question in comment 168719378, but apparently I was not convincing enough.

I am slightly more sympathetic to *C.char, but even there I don't understand why the package API doesn't just use appropriate Go types instead (like []byte).

*C.char is honestly one of the least problematic types, because it already loosely corresponds to at least three idiomatic Go types ([]byte, *byte, or unsafe.Pointer, depending on usage).

C.long is a better example for a primitive, because there is no Go type to which it portably corresponds.

Packages should not be exporting, say, *C.FILE in their APIs.

Agreed. As I noted previously, “The point of this request is not to write general-purpose Go packages using C types. It is to enable the creation of support libraries for Go packages that call C functions.”

To give some concrete examples:

  • If the package returns the Go type time.Time, it may need a way to obtain one from a *C.struct_tm.
  • If the package returns the Go type *os.File, it may need a way to obtain one from a *C.FILE.
  • If the package returns the Go type string, it may need a way to obtain one from a *C.wchar_t.
  • If the package uses the Go x/text libraries, it may need a way to convert those types to and from a *C.struct_lconv.
  • If the package returns a Go proto.Message, it may need to obtain one from a *C.ProtobufCMessage

...and so on. Most of these conversions involve struct types and require non-trivial boilerplate, and some are quite subtle to implement correctly.

At the moment, either each package must implement its own copy of these conversions (inefficient and error-prone), or the exported API of the conversion helper-package must rely on error-prone unsafe.Pointer conversions.
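To give a feel for the boilerplate involved, here is a simplified sketch of the first conversion in pure Go. The tm struct is a hypothetical Go-side mirror of a few fields of C's struct tm (a real helper would take a *C.struct_tm), and timeFromTm is an illustrative name. It assumes the fields describe a UTC time and ignores tm_isdst and the remaining fields.

```go
package main

import (
	"fmt"
	"time"
)

// tm is a hypothetical Go-side mirror of some fields of C's struct tm.
// As in C, Mon counts months from 0 and Year counts years since 1900.
type tm struct {
	Sec, Min, Hour  int32
	Mday, Mon, Year int32
}

// timeFromTm converts the broken-down time to a time.Time, treating the
// fields as UTC. Daylight-saving and time-zone fields are ignored here.
func timeFromTm(t *tm) time.Time {
	return time.Date(
		int(t.Year)+1900,    // tm_year is an offset from 1900
		time.Month(t.Mon+1), // tm_mon is 0-based, time.Month is 1-based
		int(t.Mday), int(t.Hour), int(t.Min), int(t.Sec),
		0, time.UTC,
	)
}

func main() {
	t := &tm{Hour: 12, Min: 30, Mday: 3, Mon: 11, Year: 115}
	fmt.Println(timeFromTm(t).Format(time.RFC3339)) // 2015-12-03T12:30:00Z
}
```

Even this small case has two off-by-something traps (the 1900 year offset and the 0-based month), which is the kind of subtlety that argues for writing such a helper once.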

@rsc Here is a very basic example of my problem and interest in this issue. As you stated here, I would really only care about the basic C types.

@joegrasse Honestly, I think that example only undermines my point. It isn't at all obvious why your cp package needs to accept a parameter of type *C.char instead of the idiomatic Go []byte or string type, considering that you can easily construct the former from the latter (as illustrated in https://golang.org/cl/56530):

package cp

import "C"

import "unsafe"

func CTest(name string) {
    b := make([]byte, len(name)+1)
    copy(b, name)
    p := (*C.char)(unsafe.Pointer(&b[0]))
    C.use(p)
}

You can apply the transformation in the reverse direction using the workaround library described in https://github.com/golang/go/issues/13656#issuecomment-303216308, or perhaps its eventual replacement described in #19367.

To reiterate: I really _don't_ think *C.char is a compelling example for this issue at all.

@bcmills That example was a very contrived example just to demonstrate the problem. I could have chosen any basic C type to display the problem.

@joegrasse, part of Russ's point is that this problem is not worth solving if it only affects contrived examples. I think we all understand the _nature_ of the problem: what we need to understand is its _importance_. (See https://blog.golang.org/toward-go2 for a much more in-depth discussion on this point.)

In case the relevance of this issue is actually unclear and someone would benefit from a real-world example, let me help out. Otherwise please ignore this comment; I have nothing technical to add to the discussion.

There are a lot of C libraries that will give you an instance of a non-basic C type, which you then use in a different C library. Last time I ran into this was while using Vulkan (the low level OpenGL "successor"), so here we go:

If you want to do GPU accelerated graphics stuff, you would typically use a library like Glfw to handle the OS dependent details, like window creation and input. There are go bindings for that, which is nice. Glfw will do the OS specific incantations for you and return an instance of VkSurfaceKHR, a Vulkan type.

Now you want to draw something into your window, so you need to pass the VkSurfaceKHR to Vulkan and do something with it.

But since the Vulkan and Glfw bindings are in different packages, you can't just get the C.VkSurfaceKHR from Glfw and use it in a Vulkan function call.

You can't put both bindings into one package, because Glfw supports m graphics apis and there are n platform abstraction libraries that support Vulkan. So you would end up with m*n go packages.

This is a very real problem I encounter every couple of weeks in different contexts.

@MaVo159

Was just about to describe a very similar problem and you beat me to it.

A related issue is one large library that has a number of "modules" defined by optional header files.

It's natural to want to make these true separate packages in Go. This is doable with an internal/ package that implements everything which is used by packages that expose the actual API.

But this means that the implementation of every optional module is included in the build artifact regardless of what actually gets imported, which depending on the library/module can be rather large.

It's not a show-stopping issue, generally, but it sounds like this would fix it.

My use case is more like a union of the problems described by @MaVo159 and @jimmyfrasche.

A package using multiple intercommunicating C libraries that wants users to expand on its functionality by calling the C libraries' functions directly. Something similar to plugin behaviour.

Just thought of a problem that will affect the implementation and its usefulness.

It is common for some C libraries to hide the fields of some structs. For example SDL defines typedef struct SDL_Window SDL_Window; in the public headers and struct SDL_Window{...} in the internal headers so you can't directly access the fields.

I think in these cases we would have no choice but to use unsafe.Pointer if we want to pass them around? Unless there is some way cgo can get the struct definition without including the internal headers, as we would most likely be missing the proper #defines for it.

@AlexRouSg

I think in these cases we would have no choice but to use unsafe.Pointer if we want to pass them around?

Those should be fine, as long as the compiler treats blank-named fields as exported for the purpose of computing type identity (but I should verify that).

Then we can define an “opaque” struct type for each incomplete type, and as long as the type is incomplete in all of the Go packages they'll remain equivalent.

The generated code would look something like:

type _Ctype_struct_SDL_Window = struct {
    _ struct{} `cgo: struct SDL_Window`
}

(But see also #19487: defining Go types at all for incomplete types is a bit of a thorny problem.)
