Elixir: Enum.filter_map/2 and Stream.filter_map/2

Created on 1 Aug 2017  路  18Comments  路  Source: elixir-lang/elixir

TL;DR This proposal is for an Enum.filter_map/2 (and Stream.filter_map/2) function that would properly implement the use case of going over an enumerable and taking some elements out while at the same element applying a transformation (mapping) to elements that are kept.

Also relevant: https://github.com/elixir-lang/elixir/issues/6164.

With the 1.5 release, Elixir deprecated the usage of Enum.filter_map/3 because it was counterintuitive and the same could be achieved with Enum.filter/2 + Enum.map/2, with Stream, or with a for comprehension.

Enum.filter_map/2 only takes an enumerable and a function. The function is called with each element of the enumerable and can return either:

  • {:cont, term} - term ends up in the resulting collection
  • :skip - the current element is not included in the resulting collection

An example of usage would be:

# Keep strings that start with a number and transform them to that number in one pass
strings = ["1234", "abc", "12ab"]
Enum.filter_map(strings, fn string ->
  case Integer.parse(string) do
    {int, _rest} ->
      {:cont, int}
    :error ->
      :skip
  end
end)
#=> [1234, 12]

If you have comments, please feel free to drop them here. I'll start work on this once we get everything down.

Most helpful comment

However, if the main premise is that the issue is discoverability, let's try to improve that with documentation! The most likely place developers will look at it is Enum.filter/2, so I have pushed docs that will hopefully guide people to the proper direction.

All 18 comments

I would love to have this feature, and would be happy to help implement it if you decide to move forward.

Although it would be nice to have, I wonder if it is worth adding considering flat_map/2 can do the same as filter_map/2 anyways:

iex(4)> strings = ["1234", "abc", "12ab"]
["1234", "abc", "12ab"]
iex(5)> Enum.flat_map(strings, fn string ->
...(5)>   case Integer.parse(string) do
...(5)>     {int, _rest} ->
...(5)>       [int]
...(5)>     :error ->
...(5)>       []
...(5)>   end
...(5)> end)
[1234, 12]

I've always used flat_map over filter_map since [] is the obvious nil value to filter out, and [value] is the obvious 'some' value to add in, and nicely enough I can even do something like [value, value] to add in the value twice if I want.

So in thinking on this, I'm not sure there is a point or reason to add filter_map/2 to Enum and Stream considering that Enum and Stream both already have a flat_map/2.

Plus if we follow the standard monad'y style, then flat_map (or bind, or to haskellers it is >>=) is the traditional name for the algorithm that Enum/Stream.flat_map/2 is already doing, so it is a 'prime' function to keep and should generally be one of the most picked functions, thus also making filter_map additionally useless as it has no monad'y correlation (rightfully so since flat_map/2 already does what filter_map/2 would do).

Technically flat_map and a constructor for a type are the only 2 things necessary for something to be a monad (EDIT: And keeping the identity laws in mind too of course, which Enum already does) as everything else can be built from those two. Helpers like map are useful to reduce verbosity (list wrapping the result), but something like filter_map does not reduce any verbosity (you still have to wrap the valid return with 'something') and in fact adds verbosity (:skip is longer than [] and {:cont, value} is longer than [value]) while also reducing functionality, which seems a very bad trade-off to me...

So yeah, I vote no on adding filter_map/2 I am sure now, due to my above arguments. I was unable to find a valid and useful reason for adding it compared to just using flat_map/2.

With that in mind, you could easily implement this feature as a lightweight convenience function wrapper around flat_map.

The valid reason for adding it is mostly intent (flat_map doesn't necessarily express filtering+mapping). As you said we could not have map either and build it through flat_map and always a one element list as the return value, but it's more clear (and performant) to use map.

I like the flat_map workaround, but it is not obvious to a new developer. Where as when I Enum. + tab in iex, if I see filter_map I know that is what I am looking for.

I'd think it would be fine to just defdelegate filter_map/2 to flat_map/2. Returning nil (the BEAM nil, the empty list of [] specifically, not the magical nil atom) make perfect sense to 'remove' the result and returning the result in a list makes sense to return it as it is obvious you are returning 'something' ([value]) or 'nothing' ([]).

However as to new developers to elixir, many languages have flat_map of some sort built in, from Javascript (known as then) to Swift (known as flatMap) to about any functional language in existence to even C++ (known as value_or in the latest proposal, for who knows what reason) to many more languages, though the names vary, in _'most'_ cases it is flatMap/flat_map, so it has enough googlability and documentability that it is already well known.

But still, a defdelegate of filter_map makes sense, plus it makes it obvious that the person should learn flat_map in general since it is just delegating to it.

(flat_map doesn't necessarily express filtering+mapping)

Also I must remark that this is one of the very purposes of flat_map so it does express it quite well.

in fact adds verbosity (:skip is longer than [] and {:cont, value} is longer than [value])

I completely agree. Something about adding in those extra atoms where a nil-like default would be perfectly reasonable feels like a bit of extra convolution.

Though one point - it is not totally obvious that the term that you want to return should be wrapped in a list. Especially if the collection that I am passing in is a map.

flat_filter_map would describe the behaviour more succinctly.

Given the usage of Enum.map/2 I would expect a function called Enum.filter_map/2 to follow a similar convention of return values.

Especially if the collection that I am passing in is a map.

Weeeeell if it was properly monadic then you should always return what is passed in, so if you flat_map'd over a map then each invocation of the function would also return a map (with the keys that you want to merge into the main base). But Enum is not written that way (The Witchcraft/Algae libraries are though! If you want it like that).

Given the usage of Enum.map/2 I would expect a function called Enum.filter_map/2 to follow a similar convention of return values.

Except they are very different. map takes a value and returns a value. filter_map would take a value and returns an 'optional' value, however Elixir does not have an optional datatype (closest thing would be [value]/[] in Erlang parlance, which happens to be what flat_map already takes as the return value). So map and filter_map would not follow similar conventions of return values as they do very different operations.

EDIT: An aside, if flat_map were a good and proper flat_map then we'd be able to do:

some_custom_type
|> Enum.flat_map(&do_something/1)
|> Enum.flat_map(&do_something_else/1)
|> Enum.flat_map(&do_something_more/1)
|> Enum.flat_map(&do_something_blorp/1)

For any given some_custom_type that implements the proper 'access' for it and you'd be able to just pass it along, which imagine if it were this:

{:error, "failed"}
|> Enum.flat_map(&do_something/1)
|> Enum.flat_map(&do_something_else/1)
|> Enum.flat_map(&do_something_more/1)
|> Enum.flat_map(&do_something_blorp/1)

Then the flat_map's could just skip the whole way down without calling any of the functions. This is the basis of Monad composition, the ability to carry along an error state while worrying only about the success states, it is the happy path in purest functional form. ^.^

For note, the above with haskell's operator would become:

some_custom_type
>>= do_something()
>>= do_something_else()
>>= do_something_more()
>>= do_something_blorp()

And so ends the slight haskell mini-tutorial of monad binding. ^.^

I'm .. Going to ignore almost all of what has been written since the opening issue.

The current implementation is:

  def filter_map(enumerable, filter, mapper) when is_list(enumerable) do
    for item <- enumerable, filter.(item), do: mapper.(item)
  end

  def filter_map(enumerable, filter, mapper) do
    enumerable
    |> reduce([], R.filter_map(filter, mapper))
    |> :lists.reverse
  end

# and R.filter_map is ...

  defmacro filter_map(filter, mapper, fun \\ nil) do
    quote do
      fn entry, acc ->
        if unquote(filter).(entry) do
          next(unquote(fun), unquote(mapper).(entry), acc)
        else
          skip(acc)
        end
      end
    end
  end

The beauty about this current implementation is that all the interfaces are respected: the mapper returns whatever you map to, the filter is a predicate. The downside to the current suggestion is that it introduces two terms that become a new interface... But we see here how a little bit of additional work might allow one to, well.. Basically re-implement this.

having to write bespoke functions for filter_map makes it almost useless. it's just as easy to write the recursive version as it is to write the fun you need to pass to filter_map

Returning nil (the BEAM nil, the empty list of [] specifically, not the magical nil atom)

Nobody calls it nil in actual conversations. Better to stick with "empty list" and avoid any confusion. :)

I find the argument that this could be implemented with flat_map to be a weak one. After all, everything in Enum today can be implemented with reduce, so should we strip all other functionality?

I would say that, if we ask Elixir developers how to convert ["1234", "abc", "12ab"] into a list of valid integers, many would implement it using two passes. Yes, flat_map definitely solves it, but the point of the current proposal is that the relationship is not immediately clear. Especially because imperative languages that support flat_map typically also support imperative constructs when traversing collections (such as next or pass).

However, the criticisms that filter_map has a more verbose output that flat_map is definitely a valid one. I see two options here:

  1. Keep the current proposal as is

  2. Allow filter_map to return an empty list or a list with one element - this makes the relationship with flat_map clearer

However, if the main premise is that the issue is discoverability, let's try to improve that with documentation! The most likely place developers will look at it is Enum.filter/2, so I have pushed docs that will hopefully guide people to the proper direction.

Was this page helpful?
0 / 5 - 0 ratings