I recently stumbled upon a tool called hsc2hs: it's a template language for Haskell that makes writing C bindings easier.
The idea applied to Crystal would be something like this:
```crystal
# pcre.crc
<%= include "pcre.h" %>
lib LibPCRE
  enum Options
    CASELESS = <%= const PCRE_CASELESS %>
  end
end
```
Here we have just two of the simplest directives:
- `include` will include a C header file
- `const` will output the value of a C `#define` or constant

The C file generated by the program above will look like this:
```c
// pcre.crc.c
// This line because of our "include" directive
#include "pcre.h"
#include <stdio.h>

int main(int argc, char** argv) {
  printf("lib LibPCRE\n");
  printf("  enum Options\n");
  printf("    CASELESS = ");
  printf("%d", PCRE_CASELESS); // this line because of the "const" directive
  printf("\n");
  printf("  end\n");
  printf("end\n");
  return 0;
}
```
```shell
$ clang pcre.crc.c -o some_temp_name
$ ./some_temp_name > pcre.cr
```
The generated file will look like this:
```crystal
# pcre.cr
lib LibPCRE
  enum Options
    CASELESS = 1
  end
end
```
We got the 1 directly from the pcre.h header file! :-)
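To generalize the `const` directive, a generator could route every constant through one small helper so the same code works for any integer-typed constant. The sketch below assumes a hypothetical `EMIT_CONST` macro and a hypothetical `emit_pcre_options` function, and uses a stand-in `MY_CASELESS` define instead of the real `pcre.h` so it compiles without PCRE installed. Casting to `long long` is just one possible design choice for the format string.

```c
#include <stdio.h>

/* Stand-in for PCRE_CASELESS from pcre.h (hypothetical, so the sketch
   compiles without PCRE installed). */
#define MY_CASELESS 0x00000001

/* Hypothetical helper for the "const" directive: prints
   "<indent><name> = <value>". Casting to long long lets one format
   string cover int, long and enum constants (unsigned values above
   LLONG_MAX would need a separate unsigned variant). */
#define EMIT_CONST(out, indent, name, value) \
  fprintf((out), "%s%s = %lld\n", (indent), (name), (long long)(value))

/* Hypothetical body of the generated program, writing to any stream. */
void emit_pcre_options(FILE *out) {
  fputs("lib LibPCRE\n", out);
  fputs("  enum Options\n", out);
  EMIT_CONST(out, "    ", "CASELESS", MY_CASELESS);
  fputs("  end\n", out);
  fputs("end\n", out);
}
```

A generator's `main` would simply call `emit_pcre_options(stdout)`; taking a `FILE *` instead of writing straight to `stdout` only makes the output easier to check.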
I think the idea of hsc2hs is brilliant: instead of trying to automatically generate bindings from a C header, like we tried to do in crystal_lib, which is very complex, we just get what we really need from C headers.
The things we could get from C headers are:
- `sizeof` structs
- alignment of structs (maybe we don't need this)
- `offsetof` struct fields: with just this, we can write and read any value of a struct by casting the struct pointer to a `Pointer(UInt8)`, adding the offset, then casting the pointer to the type of the struct field and fetching the value from it. Of course, it also works if we have a plain struct (not a pointer to a struct) and we use `pointerof`.

The above means we can bind to any C struct:
- declare it as `UInt8[N]`, where N is the struct's `sizeof`
- access its fields with `offsetof`, using pointers and casting

Not as nice as writing a full C struct binding with all the types, but in many cases we are only interested in a couple of fields from the struct.
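Both halves of that idea can be sketched in C: what a generator might print for hypothetical `sizeof`/alignment/`offsetof` directives, and the "raw bytes plus offset, then read as the field's type" access that the Crystal side would do with `Pointer(UInt8)` and a cast. The `struct point3` type, the `Point3Layout` module name and the helper functions are all made up for illustration; `memcpy` is used instead of a direct pointer cast to sidestep C's alignment and aliasing rules.

```c
#include <stddef.h> /* offsetof */
#include <stdio.h>
#include <string.h> /* memcpy */

/* Hypothetical C struct; normally it would come from a library header. */
struct point3 {
  char tag;
  int x;
  double y;
};

/* What a generator might print for hypothetical sizeof/alignment/offsetof
   directives: plain Crystal constants describing the struct layout. */
void emit_point3_layout(FILE *out) {
  fputs("module Point3Layout\n", out);
  fprintf(out, "  SIZE = %zu\n", sizeof(struct point3));
  fprintf(out, "  ALIGN = %zu\n", (size_t)_Alignof(struct point3));
  fprintf(out, "  X_OFFSET = %zu\n", offsetof(struct point3, x));
  fprintf(out, "  Y_OFFSET = %zu\n", offsetof(struct point3, y));
  fputs("end\n", out);
}

/* The offsetof access pattern, in C: view the struct as raw bytes, add
   the field's offset, and read a value of the field's type. memcpy avoids
   unaligned reads and strict-aliasing problems. */
int read_int_at(const void *base, size_t offset) {
  int value;
  memcpy(&value, (const unsigned char *)base + offset, sizeof value);
  return value;
}

double read_double_at(const void *base, size_t offset) {
  double value;
  memcpy(&value, (const unsigned char *)base + offset, sizeof value);
  return value;
}
```

The printed numbers depend on the target ABI (on x86-64, for example, `X_OFFSET` is 4 and `SIZE` is 16), which is exactly why they are computed by a C compiler on the host rather than hard-coded.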
The benefit of doing this is portability: the template will be compiled to C, which is then compiled and run on the host machine to generate a Crystal file.
The disadvantage is that this doesn't work when cross-compiling. However, I believe cross-compiling is mainly useful for the Crystal compiler itself, to be able to port it to other platforms (once Crystal is available in a target platform there's no need to cross-compile, unless the target platform has limited resources, which might be a point against this proposal).
This can be an external tool that you have to run to generate the Crystal file, and it could be automated by a hook in shards. An alternative is to have this functionality embedded in the compiler. It doesn't sound that hard to implement; in a way, it's similar to ECR.
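To give an idea of how small the translation step could be, here is a hypothetical sketch, written in C for consistency with the generated program, that handles only the two directives from the example. The names `ccr_translate`, `emit_text` and `next_directive` are invented; unknown directives are silently ignored, and a real tool would also need error reporting and more directives.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mini "ccr" translator: turns a template that mixes Crystal
   text with <%= include "..." %> and <%= const NAME %> directives into the
   source of a C program that prints the final Crystal code. */

/* Emit LEN bytes of literal template text as an fputs call, escaping
   backslashes, quotes and newlines for a C string literal. */
static void emit_text(const char *text, size_t len, FILE *out) {
  if (len == 0) return;
  fputs("  fputs(\"", out);
  for (size_t i = 0; i < len; i++) {
    char c = text[i];
    if (c == '\\' || c == '"') fprintf(out, "\\%c", c);
    else if (c == '\n') fputs("\\n", out);
    else fputc(c, out);
  }
  fputs("\", stdout);\n", out);
}

/* Find the next "<%= ... %>" at or after P. On success, store where it
   starts and its space-trimmed body, and return the position after "%>". */
static const char *next_directive(const char *p, const char **start,
                                  const char **body, size_t *body_len) {
  const char *open = strstr(p, "<%=");
  if (open == NULL) return NULL;
  const char *close = strstr(open, "%>");
  if (close == NULL) return NULL;
  const char *b = open + 3, *e = close;
  while (b < e && *b == ' ') b++;
  while (e > b && e[-1] == ' ') e--;
  *start = open;
  *body = b;
  *body_len = (size_t)(e - b);
  return close + 2;
}

void ccr_translate(const char *tmpl, FILE *out) {
  const char *p, *start, *body, *end;
  size_t len;

  /* Pass 1: #include lines must come before main(), so hoist them. */
  for (p = tmpl; (end = next_directive(p, &start, &body, &len)) != NULL; p = end)
    if (len > 8 && strncmp(body, "include ", 8) == 0)
      fprintf(out, "#include %.*s\n", (int)(len - 8), body + 8);
  fputs("#include <stdio.h>\n\nint main(void) {\n", out);

  /* Pass 2: literal text becomes fputs calls, "const X" a printf of X. */
  for (p = tmpl; (end = next_directive(p, &start, &body, &len)) != NULL; p = end) {
    emit_text(p, (size_t)(start - p), out);
    if (len > 6 && strncmp(body, "const ", 6) == 0)
      fprintf(out, "  printf(\"%%lld\", (long long)(%.*s));\n",
              (int)(len - 6), body + 6);
    else if (len > 8 && strncmp(body, "include ", 8) == 0 && *end == '\n')
      end++; /* drop the newline that ended the include line */
  }
  emit_text(p, strlen(p), out); /* trailing text after the last directive */
  fputs("  return 0;\n}\n", out);
}
```

Fed the `pcre.crc` template, this produces a program equivalent to `pcre.crc.c`, except that literal text goes through `fputs` (so a `%` in the template can't be misread as a format directive) and the constant is printed with `%lld`. That program is then compiled and run exactly as in the `clang` step.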
I'm interested in your feedback! What do you think?
> The disadvantage is that this doesn't work when cross-compiling.

> once Crystal is available in a target platform there's no need to cross-compile, unless the target platform has limited resources, which might be a point against this proposal
Very good point. The target platforms with the most limited resources will need cross-compiling the most. Given the rapid growth of IoT, a binding generator that also works in that case could become more and more important for Crystal to thrive.
If parsing C is the hard part, can't it be partially solved by using libclang, with https://github.com/crystal-lang/clang.cr?
Parsing is not the problem. The problem is automatically mapping things to Crystal, which isn't clear.
And isn't that what crystal_lib is doing?
Another problem is that doing it with clang imposes a dependency on it, and the code is pretty complex. The template way, like what hsc2hs does, has no dependencies other than a C compiler, which is available on basically every dev machine. And the code should be pretty simple too.
I understand the immediate benefit of expanding macros and getting the actual value, but I'm wondering: how does it work for types and function definitions?
I'd love to see a small POC in a shard.
@ysbaddaden You don't use it for types and function definitions.
What hsc2hs provides, however, is a way to map a C integer type to an equivalent Haskell type. They do it with a macro that I don't understand: https://github.com/haskell/hsc2hs/blob/9056de46495348ea8a8fff419c82a91afda0e7e7/template-hsc.h#L69-L78
It casts a float literal to the type, then casts that to int and back to the type: if the value is unchanged, the type is an integer type, otherwise it's a floating-point type. Then there's an overflow check to tell unsigned from signed.
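Here is a sketch of that trick in C, written from the description above and modelled on the linked hsc2hs macro rather than copied from it. The `CRYSTAL_TYPE` macro name and the Crystal type names it prints are assumptions for this example; `long double` has no Crystal counterpart on most platforms, so a floating type wider than `double` is reported as unsupported.

```c
#include <stdio.h>

/* Hypothetical helper: print the Crystal name of arithmetic C type T.
   1) (T)1.4 truncates to 1 for integer types but not for floating
      types, so a round trip through int leaves it unchanged only
      when T is an integer type.
   2) For integer types, (T)(-1) wraps to the maximum value when T is
      unsigned, so it compares below (T)0 only when T is signed.
   Bit width assumes 8-bit bytes. */
#define CRYSTAL_TYPE(out, T)                                   \
  do {                                                         \
    if ((T)(int)(T)1.4 == (T)1.4)                              \
      fprintf((out), "%s%zu", (T)(-1) < (T)0 ? "Int" : "UInt", \
              sizeof(T) * 8);                                  \
    else if (sizeof(T) == sizeof(float))                       \
      fputs("Float32", (out));                                 \
    else if (sizeof(T) == sizeof(double))                      \
      fputs("Float64", (out));                                 \
    else                                                       \
      fputs("<unsupported floating-point type>", (out));       \
  } while (0)
```

A hypothetical `type` directive built on this could turn `alias SizeT = <%= type size_t %>` into `alias SizeT = UInt64` on a typical 64-bit platform.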
Note that cross-compiling is still possible, but more complex: it needs a cross-compiler environment to be installed, which allows building the executable directly.
@oprypin there are many, many edge cases with C headers that make interpreting the clang AST and mapping it to Crystal complex.
I believe CCR as explained here is a better solution than crystal_lib: it's much simpler (and very smart). It feels perfect for mapping complex C libraries, like libc, where defines and structs are scattered across dozens of files and places.
It's still bothersome to have to write a mapping (though it gives control over the actual mapping), so I still like c2cr, where we can just do `c_include "pcre.h"` to automagically map everything from that header.
Wait, why "@oprypin"? 😂
I started replying to "And isn't that what crystal_lib is doing?" but drifted away!
My comment was a reply to this comment, not in general
@ysbaddaden c2cr looks useful!
I think the main advantages of doing it like hsc2hs, so like in #8336, are:
- … `.cr` file, or remember the tool's rules, to know what names to use for structs, fields, constants, etc. The way ccr works, you choose the names and you just ask the tool for a little help: constant values, struct sizes, field offsets, etc. Then you don't need to look at the generated file, because all the definitions are there in the template.
- Also, compilation errors, or even runtime errors, will point to source code that you can read and that is not part of the generated file: they'll point to the template file.

I agree with all those points. CCR is also much simpler to implement and doesn't require additional large dependencies (libclang).
The drawback is that you must write the mapping, and everyone may have different rules. An automated tool will always use one set of rules (no surprises), but with manual mapping you have control over the rules and can bend them as needed: to make things look nicer, to avoid potential clashes (never happened to me), or instead of having to guess why a type wasn't mapped because of a c2cr limitation (happens a lot with llvm-c).
Speaking of printf
https://github.com/oprypin/crsfml/blob/ce0f5c7aa60b7879ecbfcb174624450b78e96e06/generate.cr#L1884