Foldable for non-Haskellers: Haskell's controversial FTP proposal
In the Haskell world, there is currently a big fuss about the “Foldable/Traversable in Prelude”-proposal.
Edit: For the record: it was a proposal, and has been implemented in the current version of Haskell (GHC)
As a non-Haskeller, you probably wonder what all the fuss is about.
First things first: some context
In most languages, you have functions to iterate over a sequence / array / iterable / list, for example in C# (assuming we have an array called values)
int[] values = new int[] {1, 2, 3};
foreach(var v in values) {
Console.WriteLine(v);
}
In Haskell, we usually refer to an sequence with a list, and the example code would look like this:
values = [1,2,3]
mapM_ print values
Let’s say we need a function that concatenates several sequences of the type a.
In somewhat contrived C# we would write it like this (assuming we use arrays):
static Ta[] Concat<Ta>(Ta[][] values) {
var res = new Ta[]{};
foreach(var v in values) {
res = res.Concat(v).ToArray();
}
return res;
}
int[][] values = new int[][] {
new int[] {1, 2, 3},
new int[] {4, 5, 6}
};
Concat(values);
// results in [1,2,3,4,5,6]
In Haskell we’d write it like this:
values :: [[Int]]
values = [[1,2,3],[4,5,6]]
concat :: [[a]] -> [a]
concat vals = foldr (\val acc -> val ++ acc) [] vals
concat values
-- results in [1,2,3,4,5,6]
Optional reading: type definitions
The odd
someName :: a -> b -> cyou see on top of the function implementations is a type definition.
- The thing before the
::defines the name of the function- The last term defines the type of return value.
- Other terms define type of the function’s parameters.
A few examples:
value :: Int
- Takes no input
- Returns a value of the type
Int- Example:
value = 5values :: [Int]
- Takes no input
- Returns a
listof values of the typeInt- Example:
values = [1,2,3]values :: [[Int]]
- Takes no input
- Returns a
listof alistof values of the typeInt- Example:
values = [[1,2,3],[4,5,6]]decrement :: Int -> Int
- Takes a parameter of the type
Int- Returns a value of the type
Int- Example:
decrement x = x - 1add :: Int -> Int -> Int
- Takes two parameters of the type
Int- Returns a value of the type
Int- Example:
addTwoInts x y = x + ylength :: [a] -> Int
- Takes a
listofawhereacan be any type- Returns a value of the type
Int- This function works on any
list, no matter which is the type ofa:
length [4,5,6]returns3length ['a','b','c']also returns3length "abc"also returns3, as aStringis a synonym for alistofChardecrement :: Num a => a -> a
- Requires a type
ato implement theNumtype class.
- A
type classis somewhat similar to aninterfacein other languages.- The type class
Numimplements arithmetic operators for the type.- Takes a parameter of the type
a- Returns a value of the type
a- Example:
decrement x = x - 1.Now back to the main content…
What is a Foldable anyway
Notice that in the Haskell example the concat function is defined for a list of lists, i.e. concat :: [[a]] -> [a].
There is a well known functional paradigm called a fold, which allows you to convert something that has multiple values into a single value.
Let’s take the example of a sum:
- Starts with an initial
accumulatorvalue of0 - Adds each of the sequence’s
valuesone by one to theaccumulator - returns the final
accumulator.
This would be the code:
sum :: Num a => [a] -> a
sum vals = foldl (\acc val -> acc + val) 0 vals
product :: Num a => [a] -> a
product vals = foldl (\acc val -> acc * val) 1 vals
Now, imagine that you have a binary tree structure where a node contains either:
- a node with a left and a right child node
- a leaf with a value
- an empty node:
data Tree a = Node (Tree a) (Tree a)
| Leaf a
| Empty
And that you need to get the sum and product for all these leaves. This goes as follows:
accrefers tothe currently accumulated valuelnrefers to aleft child nodernrefers to aright child nodevalrefers to avalue of a leaf node
If it’s a node, the XXXacc recursively calls itself to add or multiply it’s children’s values,
if it’s a leaf, it just adds/multiplies the value with the accumulator.
sum :: Num a => Tree a -> a
sum tree = sumAcc tree 0
where
sumAcc node acc =
case node of
Leaf val -> acc + val
Node ln rn -> acc + (sumAcc ln) + (sumAcc rn)
Empty -> acc
prod :: Num a => Tree a -> a
prod tree = prodAcc tree 1
where
prodAcc node acc =
case node of
Leaf val -> acc * val
Node ln rn -> acc * (prodAcc ln) * (prodAcc rn)
Empty -> acc
You might see a pattern here. Haskellers hate copy paste programming, so they extract what is common.
If you take a look at what is common and different, you see only two differences:
- We use
+and* - We start from
0and1
So a Haskeller would extract this functionality, and as this is a common pattern, we’d call it a Foldable type class instance. Code would look like this
data Tree a = Node (Tree a) (Tree a)
| Leaf a
| Empty
instance Foldable Tree where
-- some code here
sum :: Num a => Tree a -> a
sum tree = foldl (\acc val -> acc + val) 0 tree
prod :: Num a => Tree a -> a
prod tree = foldl (\acc val -> acc * val) 1 tree
Optional reading: cleaning up Haskell code
Even though this is compilable Haskell code, and the program would work, you would never see Haskell code like this in the wild. Haskellers apply some concepts to make the source code even leaner to read; let me show you how:
- Let’s take a look at this expression:
sum tree = foldl (\acc val -> acc + val) 0 tree- The type declaration is
sum :: Num a => Tree a -> aso we know the function
- Requires
ato implement theNumtype class.- Takes a
Treethat contains values of the typea- Returns a value of the type
a- In Haskell you can “convert operators into functions” by putting parenthesis around them, so
\acc val -> acc + valis equivalent to\acc val -> (+) acc val- The type for this accumulator function is
Num a => a -> a -> a- As both parameters now have the same order, we can apply a concept called currying (i.e. omit the last parameters if they are the same), so
\acc val -> (+) acc valis equivalent to\acc -> (+) acc.- The type for this accumulator function remains
Num a => a -> a -> a- we curry it once more, so
\acc -> (+) accis equivalent to\ -> (+).- The type for this accumulator function remains
Num a => a -> a -> a- Calling a function with no parameters is the same as calling the function directly, so
\ -> (+)is equivalent to(+)- The type for this accumulator function remains
Num a => a -> a -> a- Replacing this in the original declaration, we now we end up with
sum tree = foldl (+) 0 tree- The type declaration still is
sum :: Num a => Tree a -> a- As the function’s parameters are similar once again, we can omit the
treepart, so
sum tree = foldl (+) 0 treeis equivalent tosum = foldl (+) 0- The type declaration still is
sum :: Num a => Tree a -> aSo a Haskeller would most likely end up with the following code, while maintaining the same types:
data Tree a = Node (Tree a) (Tree a) | Leaf a | Empty instance Foldable Tree where -- some code here sum :: Num a => Tree a -> a sum = foldl (+) 0 prod :: Num a => Tree a -> a prod = foldl (*) 1
This has one important consequence when you are doing Haskell:
In order to figure out what parameters a function requires, you need to look at it’s type declaration, and not at the implementation.
So both the
sumand theproducttake aTreeofa, and return ana, whereaimplements theNumtype class. This is the reason why Haskellers care so much about type definitions.
Back to the controversial FTP-proposal
Haskell works with a system they call modules (referred to as namespaces in other languages).
By default, Haskell applications import a module called Prelude, which contains a lot of handy helper functions, for example the sum function we mentioned in the beginning :
sum :: Num a => [a] -> a
sum = foldl (+) 0
Now, as Haskellers do not only want to use sum on a list, but also on any Foldable instance, they decided to change the type signature to this:
sum :: (Foldable t, Num a) => t a -> a
sum = foldl (+) 0
Because there exists a Foldable instance for a list, we can use the “Foldable sum” function, and remove the sum function that only works on lists.
So what is controversial about this proposal?
People come from different backgrounds. When you talk about a list, people new in Haskell think about a sequence, and they can imagine what is happening.
However, due to the new signature (i.e. the Foldable thing), people new to Haskell might have a hard time to get started with Haskell, because they need to learn about
the concept of Foldable first, so the barrier of entry is getting larger.
In my opinion, it might be a little harder to grasp at first, but if, as a beginner, you are willing to just accept a few things without knowing why they are like they are, the barrier to entry shouldn’t be much higher than before.
UPDATE
I assumed I understood what all the fuss was about, but apparently I did not. Luckily, @bitemyapp went the extra mile to explain while it was wrong:
Either and (,) in Haskell are not arbitrary
https://t.co/2yJ1NBjMvS
— Chris Allen (@bitemyapp) 20 oktober 2015
I’d suggest you to read his article, but I’ll give you the recap here:
Next to a Foldable, we also have a Functor. A Functor is something that you can map over, a structure that contains zero, one or more values. An example in GHCI:
Prelude> let inc = (1 +)
Prelude> inc 5
6
Prelude> fmap inc [1,2,3]
[2,3,4]
Prelude> fmap inc (Just 5)
Just 6
Prelude> fmap inc Nothing
Nothing
Prelude> fmap inc (Right 4)
Right 5
Prelude> fmap inc (Left 3)
Left 3
Just think of this as a map or select (C#) statement. In the example above we have the following statements:
let inc = (1 +)- is of the type
inc :: Num a => a -> a, so it takes aNuminstance, and adds1to it?
- is of the type
inc 5- returns
6
- returns
fmap inc [1,2,3]- returns
[2,3,4] - the container is a
List, and the values in the list areNuminstances. - the type signature for
fmapis :fmap (a->b) -> [a] -> [b] fmaptakes a transformation function and a list, and returns aListcontaining the transformed values in the same order.- For a
list, Haskell’sfmapis equivalent tomap, which is equivalent to a C#selectstatement. - In C#,
fmapcould be implemented for a list asList<Tb> fmap<Ta,Tb>(Func<Ta,Tb> transform,List<Ta> list) => list.Select(transform).ToList(). Bifunctors:- The
Maybetype is a type that either represents a value or nothing.- this type was introduced to avoid null checking in all of your code.
- values are represented using
Just awhere a can be any value. - nothing is represented using
Nothing. - when you want to change the value of a
Maybe, you typically only want to change it, when it contains a value. If it does not contain a value, you just want to do nothing with it. - how do we change the value of something in a container when using Haskell? Exactly: using
fmap fmap inc (Just 5)- returns
Just 6, so it applied theincfunction to the value of theMaybe
- returns
fmap inc Nothing- returns
Nothing, because it doesn’t make any sense to apply changes to a “null value”
- returns
- let’s say we have a var called
maybeVal, that has the typeMaybe Num, so it is eitherNothingorJust xwhere x is a number. - if we want to either increment the value if it’s defined, or just return a
0if it is not, we could do it like this: maybe 0 inc maybeVal- if
maybeValisNothingit uses the first argument :0maybe 0 inc Nothingreturns0
- if
maybeValhas a value (Just 5), it applies theincfunction to the value and returns that value, i.e.6maybe 0 inc (Just 5)returns6
fmapis theFunctorimplementation,maybeis theBifunctorimplementation.
- if
- the
Eithertype- Now, avoiding
nullchecks everywhere is fun and all, but what if you would like to return some kind of error message in case a value is not defined? - For this we have the
Either a btype, that can return 2 different types:- It can
Left a, whereLeft ais typically an error message - It can return a
Right b, whereRight bis similar to aJust b, so this contains the value. - Let’s say we have a function to parse a
datefrom astring. We want it to return either the date, or an error message.- this would be the type of the function:
parseDateFromString :: String -> Either Error Date - this would be equivalent to
Either<Error,Date> parseDateFromString(String) - in case there is a problem with the parsing we’d get return value like this:
Left (Error "Invalid date format; expected YYYY/MM/DD") - in case there is no problem with the parsing we’d get the value like this:
Right (Date 2015 10 20)
- this would be the type of the function:
- What if we would like to print the year if it’s a valid year, or display the error when it is not? Let’s assume we have a value
eitherErrorOrDate- we would call
either displayError printYear eitherErrorOrDate- the type of
eitherErrorOrDatewould beEither Error Date - for now, just take my word for it: a type of
IO ()means:something that interacted with the outside world, but didn't return any value - the type of the function
displayErrorwould beError -> IO (), so it takes an error, and implies a changed world when executed. - the type of the function
printYearwould beDate -> IO (), so it takes an error, and implies a changed world when executed.
- the type of
- we would call
- It can
- Now, avoiding
- The
- returns
Now what’s wrong with the controversial FTP proposal:
In a Maybe or an Either, it makes sense that you only want to map to the Just value or the Right value.
Now, let’s say you have a Tuple, and would map over this:
- A tuple has the type
(a,b) - So when you
fmapincover(1,2), you’d expect a(2,3), correct? - Well, you don’t : GHCI returns a
(1,3) - This is because a tuple can contain 2 different types, f.e.
("Hello",4) - As an
fmapcan only take one type as an input parameter, they decided to go for the second element, sofmap inc ("Hello",4)returns("Hello", 5). - This doesn’t make sense, as the first element might be just as important in a tuple as the other, so this shouldn’t have been a
Functorin the first place.
Thank you Chris for taking the time to explain this!