Vavr: Why is Set a Function1?

Created on 22 May 2018  路  11Comments  路  Source: vavr-io/vavr

I found out after spending several minutes on a compilation error in my code, that Set extends Function1. I see it has been done in #963 but I don't understand the rationale behind.

However I do see the problems it brings to the party for pattern matching:

Set<String> abc = Set("abc");
Set<String> result = Match(abc).of( // Does not compile: the Java inference engine sees abc as a Function1<String, Boolean> before a Set<String> thus expects result to be of type Boolean
    Case($(), abc)
);

Even without considering this pattern matching issue, I think this doesn't make sense. A Set is not a function taking a T and returning whether this Set contains it.
It is also not intuitive: I don't think many (any?) developers expect this.
Last: this looks like "favoritism" towards io.vavr.collection.Set#contains method. Why isn't Set<T> a Function1<T, T> in the name of io.vavr.collection.Set#add?

There's also no hint/explanation in the Javadoc.

!BACK-COMPAT desigrefactorinimprovement 芦vavr-collection禄

Most helpful comment

@nfekete Thank you, that's interesting. If PartialFunction uses Option under the hood, one use-case is missing: We can't ask the PF anymore if it is defined for some input without calculating the output (which may be expensive).

The Bit Mapped Trie (BMT) used by Map.get already calculates Option instances. Your new PF is constructed using a method reference, which performs better/has less overhead than a lambda instance. This leads to the benchmark results we observe.

The general case would be to have a domain, which is implemented in two independent ways, with PF_old and PF_new. Also both PF_old and PF_new have to be instantiated the same way (lambda vs method reference) to get a comparable benchmark.


Nevertheless, because of the uncovered use-case mentioned above I prefer to stay with the Scala-like PF. Also we will keep our Case overloads (there exists a workaround using type-witnesses). We will remove the {Set, Seq, Map, Multimap} extends [Function|PartialFunction].

We could then add conversion methods to cover the missing functionality:

interface Set<T> {
    Function<T, Boolean> toFunction();
}

interface Seq<T> {
    PartialFunction<Integer, T> toPartialFunction();
}

interface Map<K, V> {
    PartialFunction<K, V> toPartialFunction();
}

interface Multimap<K, V> {
    PartialFunction<K, Traversable<V>> toPartialFunction();
}

Additionally Seq and Map have methods withDefault which make the PartialFunctions total. (Interestingly Multimap does not have these methods). We should move (and rename) the withDefault methods to `PartialFunction:

interface PartialFunction<T, R> extends Function1<T, R> {

    default Function1<T, R> total(R defaultValue) { ... }

    default Function1<T, R> total(Function<? super T, ? extends R> defaultFunction) { ... }

}

All 11 comments

I agree... It's just too easy to create a function out of a Set with a method reference to the contains method. I think this could be handled while we're still in the pre 1.0 phase. Or in any phase, as it's quite easy to work around it. Similarily for Seq, Map, Multimap.

Set<Integer> set = HashSet.of(1,2,3);
Function1<Integer, Boolean> f = Function1.of(set::contains);

Should I submit a PR?

@ all

We aligned to Scala, there Set extends (A) => Boolean:

screen shot 2018-05-23 at 13 10 49

The same applies to Seq and Map and Multimap:

public interface Seq<T> extends Traversable<T>, PartialFunction<Integer, T>, Serializable {
}

public interface Map<K, V> extends Traversable<Tuple2<K, V>>, PartialFunction<K, V>, Serializable {
}

public interface Multimap<K, V> extends Traversable<Tuple2<K, V>>, PartialFunction<K, Traversable<V>>, Serializable {
}

where PartialFunction extends Function1.


@Sir4ur0n

However, the real problem are the Match.Case overloads. In our case the following are interesting:

    @GwtIncompatible
    public static <T, R> Case<T, R> Case(Pattern0<T> pattern, Function<? super T, ? extends R> f) {
        Objects.requireNonNull(pattern, "pattern is null");
        Objects.requireNonNull(f, "f is null");
        return new Case0<>(pattern, f);
    }

    @GwtIncompatible
    public static <T, R> Case<T, R> Case(Pattern0<T> pattern, Supplier<? extends R> supplier) {
        Objects.requireNonNull(pattern, "pattern is null");
        Objects.requireNonNull(supplier, "supplier is null");
        return new Case0<>(pattern, ignored -> supplier.get());
    }

    @GwtIncompatible
    public static <T, R> Case<T, R> Case(Pattern0<T> pattern, R retVal) {
        Objects.requireNonNull(pattern, "pattern is null");
        return new Case0<>(pattern, ignored -> retVal);
    }

If we comment out Case(Pattern0<T> pattern, Function<? super T, ? extends R> f), your initial code example compiles fine, because Case<T, R> Case(Pattern0<T> pattern, R retVal) is taken.


Method overloading in combination with inheritance leads to ambiguities. This is a weakness of our current API. We should think about our options to remove these potential ambiguities completely.

I managed to find an example of why Scala's Set extends (A) => Boolean: https://alvinalexander.com/scala/how-use-scala-set-function-predicate

However I keep thinking this is not a good design, even in Scala. I feel like this is a time saver in 1% of the cases but misleading in 99% of the cases... As @nfekete pointed out, it's just so easy/cheap to write contains (either in Java or Scala) for these 1% that it doesn't seem relevant.

Overall using a Scala Set as a function seems to be nearly never used (or I didn't use the right terms with my friend Google).

I'm also not a good Scala developer: Scala may be handling overloading + inheritance just fine. However Java definitely doesn't.

Of course a solution is to NOT use overloading (different method names). I'm just afraid of ending with very long method names to describe each signature (CasePatternValue, CasePatternSupplier, CasePatternFunction), or worse, cryptic method names (CasePV, CasePS, CasePF) leading to overall less usability/affordance.

I agree, Set as a Function is rarely used.

Having Seq, Map and Multimap being PartialFunctions is useful in the sense that they can be used in conjunction with the Traversable.collect(PartialFunction) method. There is no syntactic sugar in order to create PartialFunction instances, they are no FunctionalInterfaces.

However, PartialFunction extends Function (like in Scala). This leads to the same ambiguities we already experienced with Set.

We could remove

public static <T, R> Case<T, R> Case(Pattern0<T> pattern, R retVal) {
    Objects.requireNonNull(pattern, "pattern is null");
    return new Case0<>(pattern, ignored -> retVal);
}

But then we wouldn't be able anymore to return constants using pattern matching, your initial example would look like this:

Set<String> abc = Set("abc");
Set<String> result = Match(abc).of(
    Case($(), () => abc)
);

or like this:

Set<String> abc = Set("abc");
Set<String> result = Match(abc).of(
    Case($(), set => set) // resp. Function.identity()
);

The question is, which functionality to remove:

  • [ ] Set extends Function1, Seq/Map/Multimap extends PartialFunction
  • [ ] Case(Pattern0<T> pattern, R retVal)

馃

What if we would have PartialFunction defined as

public interface PartialFunctionNew<T, R> extends Function1<T, Option<R>>{
    static <T, R> PartialFunctionNew<T, R> of(PartialFunctionNew<T, R> f) {
        return f;
    }
}

Then, creating a PartialFunction from a Map would be just

Map<String, Integer> map = HashMap.of("a", 1, "b", 2, "c", 3);
PartialFunction<String, Integer> pf = PartialFunctionNew.of(map::get);

This version of PartialFunction could be used in the same way as Traversable.collect(), and

traversable.collect(oldPartialFunction) 

would become

traversable.flatMap(newPartialFunction);

I know, this would break the alignment with Scala. It would also incur one extra allocation of Option<R>, in case partial function is defined at that value, otherwise it will return the singleton none().

But calling isDefinedAt and possibly apply in PartialFunction's current form has it's own cost: a double lookup in case of a Map for example. So we could argue which one is more efficient of the two. My off-the-cuff prediction would be that the allocation of Option would be the cheaper one as JVM is quite good at optimization, and escape analysis would make it possible to allocate these Option instances on the stack whenever it can make sure it doesn't escape the local context - but I admit, I haven't made any measurements on this.

I've made some quick measurement testing with the current PartialFunction and the functional interface one that would return Option. You can see the JMH benchmark and the results at https://gist.github.com/nfekete/d671372ec8fb631308a495296b47326e

There are two runs, one with 100% hit ratio, the other with 50% hit ratio (all keys have entries in the map, half of the keys have entries in the map, by random).

Interestingly, the functional interface version that returns Option is both faster and lighter on the GC. It has around half the memory allocations (avg 64 bytes vs 32 bytes when run with 100% hit ratio) compared to the current version.

@nfekete Thank you, that's interesting. If PartialFunction uses Option under the hood, one use-case is missing: We can't ask the PF anymore if it is defined for some input without calculating the output (which may be expensive).

The Bit Mapped Trie (BMT) used by Map.get already calculates Option instances. Your new PF is constructed using a method reference, which performs better/has less overhead than a lambda instance. This leads to the benchmark results we observe.

The general case would be to have a domain, which is implemented in two independent ways, with PF_old and PF_new. Also both PF_old and PF_new have to be instantiated the same way (lambda vs method reference) to get a comparable benchmark.


Nevertheless, because of the uncovered use-case mentioned above I prefer to stay with the Scala-like PF. Also we will keep our Case overloads (there exists a workaround using type-witnesses). We will remove the {Set, Seq, Map, Multimap} extends [Function|PartialFunction].

We could then add conversion methods to cover the missing functionality:

interface Set<T> {
    Function<T, Boolean> toFunction();
}

interface Seq<T> {
    PartialFunction<Integer, T> toPartialFunction();
}

interface Map<K, V> {
    PartialFunction<K, V> toPartialFunction();
}

interface Multimap<K, V> {
    PartialFunction<K, Traversable<V>> toPartialFunction();
}

Additionally Seq and Map have methods withDefault which make the PartialFunctions total. (Interestingly Multimap does not have these methods). We should move (and rename) the withDefault methods to `PartialFunction:

interface PartialFunction<T, R> extends Function1<T, R> {

    default Function1<T, R> total(R defaultValue) { ... }

    default Function1<T, R> total(Function<? super T, ? extends R> defaultFunction) { ... }

}

The same question just asked on scala's gitter:)

Map#apply looks up the key and returns the value (or throws an exception if not found)

I just had a run-in with this obscure feature. This issue creeps up whenever there are overloaded methods for Function as well as arbitrary types. Since java libraries have become more functional the potential for issues like this has grown. Depending on the situation the compilation error message is more or less bad, but never helpful. Most of the time the error occurs on a seemingly unrelated piece of code. Especially when using fluent APIs, which have also become more prevalent.

@chisui you are right. It had historic reasons, we aligned to Scala. However, newer version of Scala also removed the Function type from Set:

Screenshot 2020-08-06 at 22 54 53

Let's do it and also remove it!
Any volunteers for a PR?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

yarulan picture yarulan  路  5Comments

enelson picture enelson  路  5Comments

roookeee picture roookeee  路  5Comments

civitz picture civitz  路  6Comments

maystrovyy picture maystrovyy  路  3Comments