Crystal: [RFC] C bindings helper

Created on 11 Oct 2019 · 16 Comments · Source: crystal-lang/crystal

I recently stumbled upon a tool called hsc2hs: it's a template language for Haskell that eases writing C bindings.

How it would work

The idea applied to Crystal would be something like this:

You write a .crc file with some directives

# pcre.crc

<%= include "pcre.h" %>

lib LibPCRE
  enum Options
    CASELESS = <%= const PCRE_CASELESS %>
  end
end

Here we just have two of the simplest directives:

  • include will include a C header file
  • const will output the value of a C #define or constant

The crc tool generates a C file that, when compiled and run, produces a Crystal file

The C file generated by the program above will look like this:

// pcre.crc.c
// This line because of our "include" directive
#include "pcre.h"
#include <stdio.h>

int main(int argc, char** argv) {
  printf("lib LibPCRE\n");
  printf("  enum Options\n");
  printf("    CASELESS = ");
  printf("%d", PCRE_CASELESS); // this line because of the "const" directive
  printf("\n");
  printf("  end\n");
  printf("end\n");
  return 0;
}

We compile and run the C file, redirecting its output to the final Crystal file

$ clang pcre.crc.c -o some_temp_name
$ ./some_temp_name > pcre.cr

The generated file will look like this:

# pcre.cr
lib LibPCRE
  enum Options
    CASELESS = 1
  end
end

We got the 1 directly from the pcre.h header file! :-)

Thoughts

I think the idea of hsc2hs is brilliant: instead of trying to automatically generate bindings from a C header, like we tried to do in crystal_lib, which is very complex, we just get what we really need from C headers.

The things we could get from C headers are:

  • the value of #define and constants
  • sizeof structs
  • alignment of structs (maybe we don't need this)
  • offsetof struct fields: with just this, we can write and read any value from a struct by casting the struct pointer to a Pointer(UInt8), adding the offset, then casting the pointer to the type of the struct field and fetching the value from it. Of course it works too if we have a plain struct (not a pointer of a struct) and we use pointerof.

The above means we can bind to any C struct:

  • we represent the struct as UInt8[N] where N is the sizeof the struct.
  • to read or write a value we use offsetof and use pointers and casting

Not as nice as writing a full C struct binding with all the types, but in many cases we are only interested in a couple of fields from the struct.
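
The byte-offset access described above can be sketched in C, which is where the offsets come from in the first place. This is only an illustration under assumptions: struct demo_record and the read_int_at / write_int_at helpers are hypothetical names, not part of any real header or of the proposed tool.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <string.h>  /* memcpy */

/* Hypothetical struct standing in for one declared in a C header. */
struct demo_record {
  char   tag;
  double weight;
  int    id;
};

/* Reads an int stored `offset` bytes into an opaque byte blob.
   memcpy avoids alignment and aliasing problems with a direct cast. */
int read_int_at(const unsigned char *bytes, size_t offset) {
  int value;
  memcpy(&value, bytes + offset, sizeof value);
  return value;
}

/* Writes an int `offset` bytes into an opaque byte blob. */
void write_int_at(unsigned char *bytes, size_t offset, int value) {
  memcpy(bytes + offset, &value, sizeof value);
}
```

With a blob of sizeof(struct demo_record) bytes, read_int_at(blob, offsetof(struct demo_record, id)) fetches the id field without the caller knowing anything else about the layout. The Crystal side would presumably do the equivalent with a Pointer(UInt8), the offset, and a cast, as described in the list above.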

The benefit of doing this is portability: the template will be translated to C, which is then compiled and run to generate a Crystal file on the host machine.
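
To make the portability point concrete, here is a rough sketch of what the generator side could look like for a struct size and a field offset. Everything named here is an assumption for illustration: struct demo_record stands in for a struct from a real header, and LibDemo, Record, RECORD_ID_OFFSET and emit_demo_binding are hypothetical names, not output of any existing tool.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* snprintf */

/* Hypothetical struct; in practice it would come from an included header. */
struct demo_record {
  char   tag;
  double weight;
  int    id;
};

/* Writes Crystal source text into buf and returns snprintf's result.
   The numbers are computed by the C compiler for the machine it targets,
   so no layout rules need to be reimplemented on the Crystal side. */
int emit_demo_binding(char *buf, size_t cap) {
  return snprintf(buf, cap,
                  "lib LibDemo\n"
                  "  alias Record = UInt8[%zu]\n"
                  "  RECORD_ID_OFFSET = %zu\n"
                  "end\n",
                  sizeof(struct demo_record),
                  offsetof(struct demo_record, id));
}
```

A main function that calls emit_demo_binding and writes the buffer to stdout would play the role of the generated pcre.crc.c program shown earlier. The exact size and offset depend on the platform's padding rules, which is precisely why they are taken from the compiler instead of being hard-coded.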

The disadvantage is that this doesn't work when cross-compiling. However, I believe cross-compiling is mainly useful for the Crystal compiler itself, to be able to port it to other platforms (once Crystal is available in a target platform there's no need to cross-compile, unless the target platform has limited resources, which might be a point against this proposal).

How to implement this

This can be an external tool that you have to run to generate the Crystal file, and it could be automated by a hook in shards. An alternative is to have this functionality embedded in the compiler. It doesn't sound that hard to implement; in a way it's similar to ECR.
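
To give a feel for the size of such a tool, here is a minimal sketch of the template-to-C step, written in C. It is an assumption-laden illustration, not the proposed implementation: the translate function and its helpers are hypothetical, it only understands the include and const directives from the example above, it silently drops anything else, and it prints constants as long, which will not suit every constant.

```c
#include <stdio.h>
#include <string.h>

/* Writes s[0..n) so it is safe inside a C string literal (few escapes only). */
static void emit_escaped(FILE *out, const char *s, size_t n) {
  for (size_t i = 0; i < n; i++) {
    switch (s[i]) {
      case '\n': fputs("\\n", out); break;
      case '"':  fputs("\\\"", out); break;
      case '\\': fputs("\\\\", out); break;
      default:   fputc(s[i], out); break;
    }
  }
}

static const char *skip_spaces(const char *p) {
  while (*p == ' ') p++;
  return p;
}

/* Length of [arg, end) without trailing spaces. */
static int trimmed_len(const char *arg, const char *end) {
  while (end > arg && end[-1] == ' ') end--;
  return (int)(end - arg);
}

/* Translates a template string into the source of a C generator program. */
void translate(const char *tmpl, FILE *out) {
  /* Pass 1: hoist every include directive to the top of the C file. */
  for (const char *p = tmpl; (p = strstr(p, "<%=")) != NULL; ) {
    const char *body = skip_spaces(p + 3);
    const char *end = strstr(body, "%>");
    if (end == NULL) break;
    if (strncmp(body, "include ", 8) == 0) {
      const char *name = skip_spaces(body + 8);
      fprintf(out, "#include %.*s\n", trimmed_len(name, end), name);
    }
    p = end + 2;
  }
  fputs("#include <stdio.h>\n\nint main(void) {\n", out);

  /* Pass 2: literal text becomes fputs calls, const becomes a printf. */
  const char *p = tmpl;
  while (*p != '\0') {
    const char *open = strstr(p, "<%=");
    size_t text_len = open ? (size_t)(open - p) : strlen(p);
    if (text_len > 0) {
      fputs("  fputs(\"", out);
      emit_escaped(out, p, text_len);
      fputs("\", stdout);\n", out);
    }
    if (open == NULL) break;
    const char *body = skip_spaces(open + 3);
    const char *end = strstr(body, "%>");
    if (end == NULL) break;
    if (strncmp(body, "const ", 6) == 0) {
      const char *name = skip_spaces(body + 6);
      fprintf(out, "  printf(\"%%ld\", (long)(%.*s));\n",
              trimmed_len(name, end), name);
    }
    p = end + 2;
  }
  fputs("  return 0;\n}\n", out);
}
```

Feeding it the pcre.crc text from the top of this post should produce a C file along the lines of pcre.crc.c, which is then compiled and run as shown earlier. A real tool would need error reporting, more directives (sizeof, offsetof) and a way to pick the right printf format per constant.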

I'm interested in your feedback! What do you think?

All 16 comments

The disadvantage is that this doesn't work when cross-compiling.

once Crystal is available in a target platform there's no need to cross-compile, unless the target platform has limited resources, which might be a point against this proposal

Very good point. The target platforms with the most limited resources are the ones that will need cross-compiling the most. Given how quickly IoT is growing, a binding generator that also works in that case may become more and more important for Crystal to thrive.

If parsing C is the hard part, can't it be partially solved by using libclang, with https://github.com/crystal-lang/clang.cr?

Parsing is not the problem. The problem is automatically mapping things to Crystal, which isn't clear.

And isn't that what crystal_lib is doing?

Another problem is that doing it with clang imposes a dependency on it, and the code is pretty complex. The template way, like what hsc2hs does, has no dependencies other than a C compiler, which is available on basically every dev machine. And the code should be pretty simple too.

I understand the immediate benefit for expanding macros and getting the actual value, but I'm wondering how it works for types and function definitions?

I'd love to see a small POC in a shard.

@ysbaddaden You don't use it for types and function definitions

What hsc2hs provides, however, is a way to map a C integer type to an equivalent Haskell type. They do it with a macro that I don't understand: https://github.com/haskell/hsc2hs/blob/9056de46495348ea8a8fff419c82a91afda0e7e7/template-hsc.h#L69-L78

It casts a float literal to type, then casts one to int and if equal then type is integer, otherwise float. Then some overflow check for unsigned vs signed.
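
A simplified sketch of that idea, loosely following the description above and not hsc2hs's actual macro: the IS_INTEGER, IS_SIGNED and CRYSTAL_TYPE macros and the crystal_type_name helper are hypothetical names, and the mapping to Crystal type names assumes 8-bit bytes and only the common sizes.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* fixed-width types used when trying it out */
#include <stdio.h>   /* snprintf */

/* Integer types truncate 1.5 to 1; floating-point types keep 1.5. */
#define IS_INTEGER(T) ((T)1.5 == (T)1)
/* For unsigned types -1 wraps around to the maximum, so it is not below 0. */
#define IS_SIGNED(T)  ((T)(-1) < (T)0)

/* Builds a Crystal-style type name such as Int32, UInt16 or Float64. */
int crystal_type_name(int is_integer, int is_signed, size_t size,
                      char *buf, size_t cap) {
  if (is_integer)
    return snprintf(buf, cap, "%sInt%zu", is_signed ? "" : "U", size * 8);
  return snprintf(buf, cap, "Float%zu", size * 8);
}

/* Convenience wrapper: classifies the C type T and writes the name to buf. */
#define CRYSTAL_TYPE(T, buf, cap) \
  crystal_type_name(IS_INTEGER(T), IS_SIGNED(T), sizeof(T), (buf), (cap))
```

For example, CRYSTAL_TYPE(uint16_t, buf, sizeof buf) leaves "UInt16" in buf. Some compilers warn that the comparison in IS_SIGNED is always false for unsigned types; that is expected here. Types without a Crystal counterpart (long double, for instance) would need special handling.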

Note that cross-compiling is still possible, just more complex: it needs a cross-compiler environment to be installed, which allows building the executable directly.

@oprypin there are many many edge cases with C headers that make interpretation of the clang AST and mapping to Crystal complex.

I believe CCR as explained here is a better solution than crystal_lib: it's much simpler (and very smart). It feels perfect for mapping complex C libraries, like the libc, where defines and structs are scattered in dozens of files and places.

It's still bothersome to have to write a mapping (though it gives control over the actual mapping), so I still like c2cr, where we can just do c_include "pcre.h" to automagically map everything from that header.

Wait, why "@oprypin"? 😂

I started replying to "And isn't that what crystal_lib is doing?" but drifted away!

My comment was a reply to this comment, not in general

@ysbaddaden c2cr looks useful!

I think the main advantages of doing it like hsc2hs, so like in #8336, are:

  • no dependencies on external libraries. We don't need to depend on clang or libclang, so users wanting to use Crystal don't need those things
  • autogenerating Crystal code from a C header file means one has to check the generated .cr file, or remember the tool's rules, to know what names to use for structs, fields, constants, etc. The way ccr works, you choose the names and you just ask for a little help from the tool: constant values, struct sizes, field offsets, etc. Then you don't need to look at the generated file because all the definitions are there in the template. Also, compilation errors, or even runtime errors, will point to the template file, which is source code you can read, rather than to the generated file
  • also many times a tool will have some problems with names that are similar in C that will result in the same Crystal name. The tool can choose to change some names to avoid clashes but then the user of the resulting code has to remember these rules too
  • instead of mapping an entire library, where some types might not be needed or used at all by the Crystal code, you can just map what's important to you

I agree with all those points. CCR is also much simpler to implement and doesn't require additional large dependencies (libclang).

The drawback is that you must write the mapping yourself, and everyone may follow different rules. An automated tool always applies one set of rules (no surprises), whereas with manual mapping you control the rules and can bend them as needed: to make names look nicer, to avoid potential clashes (never happened to me), or to avoid having to guess why a type wasn't mapped because of a c2cr limitation (happens a lot with llvm-c).
