Runtime: [Feature request] Higher Kinded Polymorphism

Created on 29 Jul 2016  路  10Comments  路  Source: dotnet/runtime

Hi,

_I don't know i this has been asked somewhere already, so please forgive me if it does. My search didn't find any results. This might been similar: https://github.com/dotnet/coreclr/issues/990_

I have been using C# and VB.NET for a few years and for me what makes those languages incredible is tooling but also the fact that you can produce both quite low-level code and very abstract code.

The Functional Programming paradigm has enabled non-performance-critical oriented developers to avoid many pitfalls and to reuse code. Languages such as Scala, F# or C# have proven that it can be combined successfully with an OOP approach.

Unfortunately C# doesn't support Higher Kind Polymorphism and this apparently comes from the CLR not supporting it. (See https://github.com/dotnet/roslyn/issues/2212).

It would enable programmers /languages to write functions such as:

public static T<A> To<T, A>(this IEnumerable<A> xs)
    where T : <>, new(), ICollection<> 
{
    var ta = new T<A>();
    foreach(var x in xs) {
        ta.Add(x);
    }
    return ta;
}

In the current form either you specialize the functions by hand (IE write a lot of overloads) or end up returning and interface (such as IEnumerable<T>) where you lose Type information.

I know changing the IL is a pain but it will need to happen eventually, I just wanted to make sure that when it does, Higher Kinded Polymorphism is included.

area-TypeSystem-coreclr enhancement

Most helpful comment

So I did a little experiment with this. It's very much an early prototype but was just enough to flag up some of the issues and possibilities of adding higher kinded types. The first big issue I found was to do with metadata layout.

The most obvious way to add higher kinded types is to extend generic parameters to be themselves generic, thus why some people have described this feature as generic generics. Currently generic parameters in the metadata (see ECMA-335 II.22.20) have an _Owner_ field that is a coded token for either a _TypeDef_ or a _MethodDef_. An extension to allow that coded token to also refer to a _GenericParam_ would allow the encoding of generic generics. This extension presents an immediate problem due to the interaction of how coded tokens are represented and the constraint that the _GenericParam_ table must be sorted by the _Owner_ field (to allow fast lookups by owner).

For example take a single generic param on a type that is itself generic. Thus we need to encode two generic params in the table with their owners set correctly, tokens are encoded with their type in the low bits and their index in the remaining bits. So our type is either 0b00 (_TypeDef_), 0b01 (_MethodDef_), or (0b10) _GenericParam_ (0b11 is unused). In metadata tables the index is 1 indexed, so our _TypeDef_ index is at least 1. If so then this is ok as the indices are 0b100 for the _GenericParam_ whose Owner is the _TypeDef_, and 0b110 for the _GenericParam_ whose Owner is the other _GenericParam_. These can be put into sorted order (they already are). However if the _TypeDef_'s index is anything else, say 2 (0b10), then the indices are start as 0b1000 and 0b0110). These aren't sorted, so we sort and fix up references and we get 0b1010 (because it's owner is now in index 2) and 0b1000. These still aren't sorted!

I tried a few approaches and the only working approach I found was to flip the encoding (so that the high bits are the type). That has some complexity to it because the _Owner_ field can be 16 or 32 bits wide, when 16 bits bits 14 and 15 hold the type, when 32 bits it would be bits 30 and 31. Thus when resizing the table you need to shift bits, not just do a straight copy. However this does allow the two _Owner_ tokens to be encoded and sorted (0x1 and 0x8001).

That was the big stumbling block to get the prototype working, after that fix it was pretty much just a case of loosening the constraints on the verifier to allow generics where they we're previously not allowed.

Two big points that I didn't really look into with my prototype were code sharing and a functional approach to generic parameters. Code sharing doesn't look to be too difficult, simply need to synthesise generic versions of the __Cannon type as needed. That is you'll get __Cannon'1, __Cannon'2 types generated as generic params of that order are encountered.

A functional approach is how to solve the problem of partial applying or flip the order of generic params. For example imagine a (badly designed but beside the point) type that takes a generic param T<Value, Key> if you wanted to pass Dictionary<Key, Value> as a type parameter you'd need some way of flipping Key and Value around. Likewise imagine you want something like Functor<T<V>>, to pass Dictionary<Key, Value> to that you need to partially apply the Key parameter. I haven't given this much thought yet but it look's to be the big hindrance to progressing this idea any further.

Bit of a long comment, but maybe gives some ideas and directions to actually getting this feature one day.

Prototype code at https://github.com/Frassle/coreclr/tree/higherkinds

All 10 comments

CC @jkotas @kouvel

When can we have this 馃槂?

So I did a little experiment with this. It's very much an early prototype but was just enough to flag up some of the issues and possibilities of adding higher kinded types. The first big issue I found was to do with metadata layout.

The most obvious way to add higher kinded types is to extend generic parameters to be themselves generic, thus why some people have described this feature as generic generics. Currently generic parameters in the metadata (see ECMA-335 II.22.20) have an _Owner_ field that is a coded token for either a _TypeDef_ or a _MethodDef_. An extension to allow that coded token to also refer to a _GenericParam_ would allow the encoding of generic generics. This extension presents an immediate problem due to the interaction of how coded tokens are represented and the constraint that the _GenericParam_ table must be sorted by the _Owner_ field (to allow fast lookups by owner).

For example take a single generic param on a type that is itself generic. Thus we need to encode two generic params in the table with their owners set correctly, tokens are encoded with their type in the low bits and their index in the remaining bits. So our type is either 0b00 (_TypeDef_), 0b01 (_MethodDef_), or (0b10) _GenericParam_ (0b11 is unused). In metadata tables the index is 1 indexed, so our _TypeDef_ index is at least 1. If so then this is ok as the indices are 0b100 for the _GenericParam_ whose Owner is the _TypeDef_, and 0b110 for the _GenericParam_ whose Owner is the other _GenericParam_. These can be put into sorted order (they already are). However if the _TypeDef_'s index is anything else, say 2 (0b10), then the indices are start as 0b1000 and 0b0110). These aren't sorted, so we sort and fix up references and we get 0b1010 (because it's owner is now in index 2) and 0b1000. These still aren't sorted!

I tried a few approaches and the only working approach I found was to flip the encoding (so that the high bits are the type). That has some complexity to it because the _Owner_ field can be 16 or 32 bits wide, when 16 bits bits 14 and 15 hold the type, when 32 bits it would be bits 30 and 31. Thus when resizing the table you need to shift bits, not just do a straight copy. However this does allow the two _Owner_ tokens to be encoded and sorted (0x1 and 0x8001).

That was the big stumbling block to get the prototype working, after that fix it was pretty much just a case of loosening the constraints on the verifier to allow generics where they we're previously not allowed.

Two big points that I didn't really look into with my prototype were code sharing and a functional approach to generic parameters. Code sharing doesn't look to be too difficult, simply need to synthesise generic versions of the __Cannon type as needed. That is you'll get __Cannon'1, __Cannon'2 types generated as generic params of that order are encountered.

A functional approach is how to solve the problem of partial applying or flip the order of generic params. For example imagine a (badly designed but beside the point) type that takes a generic param T<Value, Key> if you wanted to pass Dictionary<Key, Value> as a type parameter you'd need some way of flipping Key and Value around. Likewise imagine you want something like Functor<T<V>>, to pass Dictionary<Key, Value> to that you need to partially apply the Key parameter. I haven't given this much thought yet but it look's to be the big hindrance to progressing this idea any further.

Bit of a long comment, but maybe gives some ideas and directions to actually getting this feature one day.

Prototype code at https://github.com/Frassle/coreclr/tree/higherkinds

Cool stuff @Frassle 馃槂

I spent my train ride this morning re-thinking the approach above. I'll need to find some time to code up a new prototype but I think I can do generic generics with a less invasive change to the metadata. Rather than re-coding indices to allow generic params to parent to generic params, we'll just add an indirection table (maybe even the generic constraint table) and parent generic params to that. There's a 1-to-1 mapping so the end result will be equivalent, but it won't require as large a metadata change.

I also thought up an approach to solve the problem of flipping or partially applying type parameters, using the idea of re-indexing holes. Basically allow type instantiations to leave a hole but pointing to a new index. Would allow flipping of type parameters Type<@1, @0> or partial application of non-final parameters Type<@0, int>. These would leave a type that was still not fully initialized but when the runtime gets the final initialization it could use the indices to fill in the type parameters in the wanted order.

HKT notes

@Frassle before you jump too far into this, is there a chance this could include values as type parameters? E.g. one would need to be able to allow type parameters to also point to compile time constants. To implement something like Tensor<3,3,float>.

BTW, the pic is unreadable :)

I've thought briefly about that before, I think it's mostly orthogonal to higher kinds. Need to do some more reading and learning before planning out how dependent types could work.

Github must of shrunk the picture. external upload

@Frassle the reason I mentioned that feature in this thread is because when you mentioned metadata encoding, I thought if you use the only reserved value for generic arguments, there will be no more values to encode references to constants.

I am referring to this part (whose relevance I might have misunderstood):

So our type is either 0b00 (TypeDef), 0b01 (MethodDef), or (0b10) GenericParam (0b11 is unused).

That's the encoding for parent pointers (every generic param has a parent that is either a TypeDef or MethodDef, or now a GenericParam). Constraints are a different table, and values would probably be another table.

Closing this issue since its been tracked in csharplang. If that gets traction we will reopen it to be supported in coreclr.

Was this page helpful?
0 / 5 - 0 ratings