Foldable for non-Haskellers: Haskell's controversial FTP proposal
In the Haskell world, there is currently a big fuss about the “Foldable/Traversable in Prelude”-proposal.
Edit: For the record: it was a proposal, and has been implemented in the current version of Haskell (GHC)
As a non-Haskeller, you probably wonder what all the fuss is about.
First things first: some context
In most languages, you have functions to iterate over a sequence
/ array
/ iterable
/ list
, for example in C# (assuming we have an array called values
)
int[] values = new int[] {1, 2, 3};
foreach(var v in values) {
Console.WriteLine(v);
}
In Haskell, we usually refer to an sequence
with a list
, and the example code would look like this:
values = [1,2,3]
mapM_ print values
Let’s say we need a function that concatenates several sequences of the type a
.
In somewhat contrived C# we would write it like this (assuming we use arrays):
static Ta[] Concat<Ta>(Ta[][] values) {
var res = new Ta[]{};
foreach(var v in values) {
res = res.Concat(v).ToArray();
}
return res;
}
int[][] values = new int[][] {
new int[] {1, 2, 3},
new int[] {4, 5, 6}
};
Concat(values);
// results in [1,2,3,4,5,6]
In Haskell we’d write it like this:
values :: [[Int]]
values = [[1,2,3],[4,5,6]]
concat :: [[a]] -> [a]
concat vals = foldr (\val acc -> val ++ acc) [] vals
concat values
-- results in [1,2,3,4,5,6]
Optional reading: type definitions
The odd
someName :: a -> b -> c
you see on top of the function implementations is a type definition.
- The thing before the
::
defines the name of the function- The last term defines the type of return value.
- Other terms define type of the function’s parameters.
A few examples:
value :: Int
- Takes no input
- Returns a value of the type
Int
- Example:
value = 5
values :: [Int]
- Takes no input
- Returns a
list
of values of the typeInt
- Example:
values = [1,2,3]
values :: [[Int]]
- Takes no input
- Returns a
list
of alist
of values of the typeInt
- Example:
values = [[1,2,3],[4,5,6]]
decrement :: Int -> Int
- Takes a parameter of the type
Int
- Returns a value of the type
Int
- Example:
decrement x = x - 1
add :: Int -> Int -> Int
- Takes two parameters of the type
Int
- Returns a value of the type
Int
- Example:
addTwoInts x y = x + y
length :: [a] -> Int
- Takes a
list
ofa
wherea
can be any type- Returns a value of the type
Int
- This function works on any
list
, no matter which is the type ofa
:
length [4,5,6]
returns3
length ['a','b','c']
also returns3
length "abc"
also returns3
, as aString
is a synonym for alist
ofChar
decrement :: Num a => a -> a
- Requires a type
a
to implement theNum
type class.
- A
type class
is somewhat similar to aninterface
in other languages.- The type class
Num
implements arithmetic operators for the type.- Takes a parameter of the type
a
- Returns a value of the type
a
- Example:
decrement x = x - 1
.Now back to the main content…
What is a Foldable
anyway
Notice that in the Haskell example the concat
function is defined for a list
of lists
, i.e. concat :: [[a]] -> [a]
.
There is a well known functional paradigm called a fold
, which allows you to convert something that has multiple values into a single value.
Let’s take the example of a sum
:
- Starts with an initial
accumulator
value of0
- Adds each of the sequence’s
values
one by one to theaccumulator
- returns the final
accumulator
.
This would be the code:
sum :: Num a => [a] -> a
sum vals = foldl (\acc val -> acc + val) 0 vals
product :: Num a => [a] -> a
product vals = foldl (\acc val -> acc * val) 1 vals
Now, imagine that you have a binary tree structure
where a node
contains either:
- a node with a left and a right child node
- a leaf with a value
- an empty node:
data Tree a = Node (Tree a) (Tree a)
| Leaf a
| Empty
And that you need to get the sum and product for all these leaves. This goes as follows:
acc
refers tothe currently accumulated value
ln
refers to aleft child node
rn
refers to aright child node
val
refers to avalue of a leaf node
If it’s a node
, the XXXacc
recursively calls itself to add or multiply it’s children’s values,
if it’s a leaf, it just adds/multiplies the value with the accumulator.
sum :: Num a => Tree a -> a
sum tree = sumAcc tree 0
where
sumAcc node acc =
case node of
Leaf val -> acc + val
Node ln rn -> acc + (sumAcc ln) + (sumAcc rn)
Empty -> acc
prod :: Num a => Tree a -> a
prod tree = prodAcc tree 1
where
prodAcc node acc =
case node of
Leaf val -> acc * val
Node ln rn -> acc * (prodAcc ln) * (prodAcc rn)
Empty -> acc
You might see a pattern here. Haskellers hate copy paste programming, so they extract what is common.
If you take a look at what is common and different, you see only two differences:
- We use
+
and*
- We start from
0
and1
So a Haskeller would extract this functionality, and as this is a common pattern, we’d call it a Foldable
type class instance. Code would look like this
data Tree a = Node (Tree a) (Tree a)
| Leaf a
| Empty
instance Foldable Tree where
-- some code here
sum :: Num a => Tree a -> a
sum tree = foldl (\acc val -> acc + val) 0 tree
prod :: Num a => Tree a -> a
prod tree = foldl (\acc val -> acc * val) 1 tree
Optional reading: cleaning up Haskell code
Even though this is compilable Haskell code, and the program would work, you would never see Haskell code like this in the wild. Haskellers apply some concepts to make the source code even leaner to read; let me show you how:
- Let’s take a look at this expression:
sum tree = foldl (\acc val -> acc + val) 0 tree
- The type declaration is
sum :: Num a => Tree a -> a
so we know the function
- Requires
a
to implement theNum
type class.- Takes a
Tree
that contains values of the typea
- Returns a value of the type
a
- In Haskell you can “convert operators into functions” by putting parenthesis around them, so
\acc val -> acc + val
is equivalent to\acc val -> (+) acc val
- The type for this accumulator function is
Num a => a -> a -> a
- As both parameters now have the same order, we can apply a concept called currying (i.e. omit the last parameters if they are the same), so
\acc val -> (+) acc val
is equivalent to\acc -> (+) acc
.- The type for this accumulator function remains
Num a => a -> a -> a
- we curry it once more, so
\acc -> (+) acc
is equivalent to\ -> (+)
.- The type for this accumulator function remains
Num a => a -> a -> a
- Calling a function with no parameters is the same as calling the function directly, so
\ -> (+)
is equivalent to(+)
- The type for this accumulator function remains
Num a => a -> a -> a
- Replacing this in the original declaration, we now we end up with
sum tree = foldl (+) 0 tree
- The type declaration still is
sum :: Num a => Tree a -> a
- As the function’s parameters are similar once again, we can omit the
tree
part, so
sum tree = foldl (+) 0 tree
is equivalent tosum = foldl (+) 0
- The type declaration still is
sum :: Num a => Tree a -> a
So a Haskeller would most likely end up with the following code, while maintaining the same types:
data Tree a = Node (Tree a) (Tree a) | Leaf a | Empty instance Foldable Tree where -- some code here sum :: Num a => Tree a -> a sum = foldl (+) 0 prod :: Num a => Tree a -> a prod = foldl (*) 1
This has one important consequence when you are doing Haskell:
In order to figure out what parameters a function requires, you need to look at it’s type declaration, and not at the implementation.
So both the
sum
and theproduct
take aTree
ofa
, and return ana
, wherea
implements theNum
type class. This is the reason why Haskellers care so much about type definitions.
Back to the controversial FTP
-proposal
Haskell works with a system they call modules
(referred to as namespaces
in other languages).
By default, Haskell applications import a module called Prelude
, which contains a lot of handy helper functions, for example the sum
function we mentioned in the beginning :
sum :: Num a => [a] -> a
sum = foldl (+) 0
Now, as Haskellers do not only want to use sum
on a list
, but also on any Foldable
instance, they decided to change the type signature to this:
sum :: (Foldable t, Num a) => t a -> a
sum = foldl (+) 0
Because there exists a Foldable
instance for a list
, we can use the “Foldable sum
” function, and remove the sum
function that only works on lists.
So what is controversial about this proposal?
People come from different backgrounds. When you talk about a list
, people new in Haskell think about a sequence, and they can imagine what is happening.
However, due to the new signature (i.e. the Foldable
thing), people new to Haskell might have a hard time to get started with Haskell, because they need to learn about
the concept of Foldable
first, so the barrier of entry is getting larger.
In my opinion, it might be a little harder to grasp at first, but if, as a beginner, you are willing to just accept a few things without knowing why they are like they are, the barrier to entry shouldn’t be much higher than before.
UPDATE
I assumed I understood what all the fuss was about, but apparently I did not. Luckily, @bitemyapp went the extra mile to explain while it was wrong:
Either and (,) in Haskell are not arbitrary
https://t.co/2yJ1NBjMvS
— Chris Allen (@bitemyapp) 20 oktober 2015
I’d suggest you to read his article, but I’ll give you the recap here:
Next to a Foldable
, we also have a Functor
. A Functor
is something that you can map over
, a structure that contains zero, one or more values. An example in GHCI:
Prelude> let inc = (1 +)
Prelude> inc 5
6
Prelude> fmap inc [1,2,3]
[2,3,4]
Prelude> fmap inc (Just 5)
Just 6
Prelude> fmap inc Nothing
Nothing
Prelude> fmap inc (Right 4)
Right 5
Prelude> fmap inc (Left 3)
Left 3
Just think of this as a map
or select (C#)
statement. In the example above we have the following statements:
let inc = (1 +)
- is of the type
inc :: Num a => a -> a
, so it takes aNum
instance, and adds1
to it?
- is of the type
inc 5
- returns
6
- returns
fmap inc [1,2,3]
- returns
[2,3,4]
- the container is a
List
, and the values in the list areNum
instances. - the type signature for
fmap
is :fmap (a->b) -> [a] -> [b]
fmap
takes a transformation function and a list, and returns aList
containing the transformed values in the same order.- For a
list
, Haskell’sfmap
is equivalent tomap
, which is equivalent to a C#select
statement. - In C#,
fmap
could be implemented for a list asList<Tb> fmap<Ta,Tb>(Func<Ta,Tb> transform,List<Ta> list) => list.Select(transform).ToList()
. Bifunctors
:- The
Maybe
type is a type that either represents a value or nothing.- this type was introduced to avoid null checking in all of your code.
- values are represented using
Just a
where a can be any value. - nothing is represented using
Nothing
. - when you want to change the value of a
Maybe
, you typically only want to change it, when it contains a value. If it does not contain a value, you just want to do nothing with it. - how do we change the value of something in a container when using Haskell? Exactly: using
fmap
fmap inc (Just 5)
- returns
Just 6
, so it applied theinc
function to the value of theMaybe
- returns
fmap inc Nothing
- returns
Nothing
, because it doesn’t make any sense to apply changes to a “null value”
- returns
- let’s say we have a var called
maybeVal
, that has the typeMaybe Num
, so it is eitherNothing
orJust x
where x is a number. - if we want to either increment the value if it’s defined, or just return a
0
if it is not, we could do it like this: maybe 0 inc maybeVal
- if
maybeVal
isNothing
it uses the first argument :0
maybe 0 inc Nothing
returns0
- if
maybeVal
has a value (Just 5
), it applies theinc
function to the value and returns that value, i.e.6
maybe 0 inc (Just 5)
returns6
fmap
is theFunctor
implementation,maybe
is theBifunctor
implementation.
- if
- the
Either
type- Now, avoiding
null
checks everywhere is fun and all, but what if you would like to return some kind of error message in case a value is not defined? - For this we have the
Either a b
type, that can return 2 different types:- It can
Left a
, whereLeft a
is typically an error message - It can return a
Right b
, whereRight b
is similar to aJust b
, so this contains the value. - Let’s say we have a function to parse a
date
from astring
. We want it to return either the date, or an error message.- this would be the type of the function:
parseDateFromString :: String -> Either Error Date
- this would be equivalent to
Either<Error,Date> parseDateFromString(String)
- in case there is a problem with the parsing we’d get return value like this:
Left (Error "Invalid date format; expected YYYY/MM/DD")
- in case there is no problem with the parsing we’d get the value like this:
Right (Date 2015 10 20)
- this would be the type of the function:
- What if we would like to print the year if it’s a valid year, or display the error when it is not? Let’s assume we have a value
eitherErrorOrDate
- we would call
either displayError printYear eitherErrorOrDate
- the type of
eitherErrorOrDate
would beEither Error Date
- for now, just take my word for it: a type of
IO ()
means:something that interacted with the outside world, but didn't return any value
- the type of the function
displayError
would beError -> IO ()
, so it takes an error, and implies a changed world when executed. - the type of the function
printYear
would beDate -> IO ()
, so it takes an error, and implies a changed world when executed.
- the type of
- we would call
- It can
- Now, avoiding
- The
- returns
Now what’s wrong with the controversial FTP proposal:
In a Maybe
or an Either
, it makes sense that you only want to map
to the Just
value or the Right
value.
Now, let’s say you have a Tuple
, and would map over this:
- A tuple has the type
(a,b)
- So when you
fmap
inc
over(1,2)
, you’d expect a(2,3)
, correct? - Well, you don’t : GHCI returns a
(1,3)
- This is because a tuple can contain 2 different types, f.e.
("Hello",4)
- As an
fmap
can only take one type as an input parameter, they decided to go for the second element, sofmap inc ("Hello",4)
returns("Hello", 5)
. - This doesn’t make sense, as the first element might be just as important in a tuple as the other, so this shouldn’t have been a
Functor
in the first place.
Thank you Chris for taking the time to explain this!