Fp-ts: Solution for Stack Safety on sequencing arrays (no trampoline)

Created on 29 Oct 2020 · 7Comments · Source: gcanti/fp-ts

🚀 Feature request

Current Behavior

currently for using sequence or traverse on arrays we do this

import * as A from "fp-ts/lib/Array";
import * as T from "fp-ts/lib/Task";

A.sequence(T.task)([T.of(1), T.of(2)])

but after reading this issue I was thinking if we want to colocate the sequence of Array with our data types we can optimize and get better performance also stack safety.
if we look at issues concerning stack safety most(all) of them is there are some array that wants to traverse or sequence on some data types like this issue or this issue or this issue

I write sequence arrays for major data types if the community and @gcanti think this is a good idea I can make a PR.

Suggested Solution

I named this function sequenceA, A is an acronym for Array

Option

as far as know option is stack safe but the A.sequence(O.option) is painfully slow for large arrays.

my solution is this

const sequenceA = <A>(arr: Array<Option<A>>) => {
  const result = [];
  for (const a of arr) {
    if (a._tag === "None") {
      return none;
    }
    result.push(a.value);
  }
  return some(result);
};

I write a small benchmark to see how much difference it makes

import * as O from "fp-ts/lib/Option";
import * as A from "fp-ts/lib/Array";
import { pipe } from "fp-ts/lib/pipeable";

const sequenceA = <A>(arr: Array<O.Option<A>>): Option<Array<A>> => {
  const result = [];
  for (const a of arr) {
    if (a._tag === "None") {
      return O.none;
    }
    result.push(a.value);
  }
  return O.some(result);
};

console.time("current");
pipe(A.range(0, 100000), A.map(O.some), A.sequence(O.option));
console.timeEnd("current");

console.time("suggestion");
pipe(A.range(0, 100000), A.map(O.some), sequenceA);
console.timeEnd("suggestion");

the result is

current: 40.118s
suggestion: 17.546ms

from now on I write just the result for other data types the array size is 100,000 for all of them

Either

export const sequenceA: <E, A>(
  arr: Array<Either<E, A>>
) => Either<E, Array<A>> = (arr) => {
  const result = [];
  for (const e of arr) {
    if (e._tag === "Left") {
      return e;
    }
    result.push(e.right);
  }
  return right(result);
};

benchmark result

current: 44.256s
solution: 17.599ms

Task

export const sequenceA: <A>(arr: Array<Task<A>>) => Task<Array<A>> = (
  arr
) => () => Promise.all(arr.map((x) => x()));

benchmark result

current: RangeError: Maximum call stack size exceeded
suggestion: 52.492ms

Reader

export const sequenceA: <R, A>(
  arr: Array<Reader<R, A>>
) => Reader<R, Array<A>> = (arr) => (r) => arr.map((x) => x(r));

benchmark result

current: RangeError: Maximum call stack size exceeded
suggestion: 10.789ms

IO

export const sequenceA: <A>(arr: Array<IO<A>>) => IO<Array<A>> = (
  arr
) => () => arr.map((x) => x());

benchmark result

current: RangeError: Maximum call stack size exceeded
suggestion: 24.152ms

What about Monad Transformers?

we can compose them together and make other sequenceA

TaskEither

export const sequenceA: <E, A>(
  arr: Array<TaskEither<E, A>>
) => TaskEither<E, Array<A>> = (arr) =>
  pipe(TASK.sequenceA(arr), TASK.map(EITHER.sequenceA));

benchmark result

current: RangeError: Maximum call stack size exceeded
suggestion: 57.102ms

ReaderTaskEither

const sequenceA: <R, E, A>(
  arr: Array<ReaderTaskEither<R, E, A>>
) => ReaderTaskEither<R, E, A[]> = (arr) =>
  pipe(READER.sequenceA(arr), READER.map(TASKEITHER.sequenceA));

current: RangeError: Maximum call stack size exceeded
suggestion: 81.367ms

Note

I know that this solution is not exactly like something that Haskell does, but Haskell has a compiler that optimizes the hell out of these tasks.
I think this solution is rather easy to maintain compared to trampolining and the performance gain is so much that we can consider implementing it.

Source

mohaalak

👍7 🚀4

Most helpful comment

Actually I find myself writing all that parametrisation by hand each time on each project. array.sequence(instance), array.traverse(instance) etc.. Looks like it's worth adding optimized versions of them to each ADT module. I would however suggest a full name for them: sequenceArray, traverseArray. There's already a bunch of magic letters in the library: "T" for tuples, "W" for widen, let's not introduce another one.

raveclassic on 31 Oct 2020

👍5

All 7 comments

@mohaalak LGTM (not sure about the name though)

gcanti on 31 Oct 2020

👍2

@mohaalak LGTM (not sure about the name though)

yeah the name looks odd to me too, but I don't have any other idea for it

mohaalak on 31 Oct 2020

raveclassic on 31 Oct 2020

👍5

Agree with @raveclassic that magic letters should be avoided to improve clarity for both new and existing users of the library.

I also wanted to bring up the addition of TraversableWithIndex versions of these helpers. Data types that can be indexed and define a traverse function also define a traverseWithIndex function. I frequently utilize both in various projects.

Is it worth adding an additional function that exposes the index alongside the value to the main lib? Or should these helpers be defined on a per project basis?