Scala Crash Course, part 2: Collections

Collections

In Scala, collections are a set of useful classes and interfaces which enable you (efficient) data storage, and processing. They are divided in mutable and immutable structures (check more about this here).

Before going further, just a small word on performance: know your data structures! Whatever you are using, if you are concerned with performance, documentation is your best friend!

Lists

Scala lists are an ubiquitous data structure, in its essence a simple linked lists (#ScalaDoc), coming in both immutable and mutable flavour.Let's first construct a simple list:

List
  1. All
  2. work
  3. and
  4. no
  5. play
  6. makes
  7. Jack
  8. a
  9. dull
  10. boy
  11. .

Very useful methods on lists are head and tail. Head of a list is the first element of a list:

All

whereas the tail of a list is the list following the first element:

List
  1. work
  2. and
  3. no
  4. play
  5. makes
  6. Jack
  7. a
  8. dull
  9. boy
  10. .

Lists allow quick addition of an element to the beginning to the list (prepending):

List
  1. Jack Torrance:
  2. All
  3. work
  4. and
  5. no
  6. play
  7. makes
  8. Jack
  9. a
  10. dull
  11. boy
  12. .

Concatenating two lists:

List
  1. All
  2. work
  3. and
  4. no
  5. play
  6. makes
  7. Jack
  8. a
  9. dull
  10. boy
  11. .
  12. Jack
  13. !
  14. What
  15. ...
  16. are
  17. you
  18. ?

A frequently useful method is getting unique elements from a list:

List
  1. Breaking
  2. the
  3. law
  4. ,
  5. breaking

For more details on lists, including other useful methods check the documentation or a random tutorial.

Immutable vs mutable list

Lists are immutable - they cannot be changed!

List
  1. Can't
  2. change
  3. this
  4. !

But you can convert it to an array and change:

Can change this !

Or even better, check scala.collection.mutable package for various mutable structures, like ListBuffer:

ListBuffer
  1. Adding
  2. elements
  3. Oh, the Joy!

If you find yourself lost in immutable structures, check scala.collection.mutable

Case classes behave in the same manner:

Student
  • name John
  • surname Explosion

You cannot change m, but you CAN copy it with a change:

Student
  • name John
  • surname Brutal

Sets

Sets are data structures which store elements without an order and repetition.

An example of a set is given here:

Set
  • .
  • stirred
  • A
  • ,
  • not
  • Shaken
  • martini

And some of the most useful methods on sets are union:

Set
  • is
  • try
  • .
  • do
  • stirred
  • A
  • or
  • There
  • ,
  • not
  • Do
  • Shaken
  • martini
  • no
Set
  • .
  • ,
  • not
Set
  • is
  • try
  • do
  • or
  • There
  • Do
  • no

Maps

A map (also known as associative array or dictionary) is a collection of key-value pairs, such that each key appears exactly only once.

Map
  • saving save
  • tokens token
  • token token
  • occurring occur

Fetching the value of a specific key in the map:

token

The set of all the keys in a map:

Set
  • saving
  • tokens
  • token
  • occurring

Mutable map

Again, if you find yourself lost in the fact that you cannot add/change/update values in maps, check scala.collection.mutable package, which holds a mutable map:

Map
  • is 9
  • My 2
  • hovercraft 2

...and it is easy to add new elements...

Map
  • is 9
  • My 2
  • full 1
  • hovercraft 2

...another way of adding an element

Map
  • is 9
  • of 99
  • My 2
  • full 1
  • hovercraft 2

and it is easy to change elements (or again, another flavor of adding):

Map
  • is 9
  • of 99
  • My 2
  • eels 4
  • full 1
  • hovercraft 2

Tuples

Tuples are fixed-length lists in Scala (length up to 22 in Scala), denoted in a specific format:

Tuple2
  • _1 like
  • _2 2

You can access specific elements of that list (first, second) by using the following notation:

2

A tuple (N=2) is equivalent to a pair (as a key-value pair in the map above):

true

Option

Options are containers for optional values, which can contain values Some(X), if a value is present, and None if the value is missing. They are very useful to eliminate using null as a missing value.

In the following example, our map lemmas contains a method get which returns an optional value, whose specific value can then be accessed with a method get:

token

In case this option does not contain a value, get returns a None:

I'm sorry, Dave.

The getOrElse is particularly useful, as it enables you to either obtain a value of an option, or fall back to a default value (its parameter):

I'm afraid I can't do that.

Important methods on Collections

map

Applies a specific function on all the elements in a collection.

def map[B, Coll[B]](f: A => B): Coll[B]

The map method is one of the most frequently used methods.

In our case, having two sentences in a list:

List
  1. Daisy, Daisy, give me your answer do.
  2. I'm half crazy all for the love of you.

We define a function dyingHal and map it on each sentence in the list:

Da-sY, daiSy, give M- -our answer D-. i'M haLF craZY all For the LOVe oF You.

Let's do a short exercise through which we'll showcase the rest of the functions. Given a text:

It's not pining, it's passed on! This parrot is no more! It has ceased to be! It's expired and gone to meet its maker! This is a late parrot! It's a stiff! Bereft of life, it rests in peace! If you hadn't nailed it to the perch, it would be pushing up the daisies! It's rung down the curtain and joined the choir invisible! This is an ex-parrot!

Let's do some frequency analysis...

flatten

Flattens a collection (of collections...)

flatten[B]: Coll[B]

First, we'll split the text into sentences over exclamations points. This is a very bad way to do sentence segmentation, but you will learn better ways very soon. Afterwards, we'll split the sentences into words.

List
  1. It's
  2. not
  3. pining,
  4. it's
  5. passed
  6. on
  7. This
  8. parrot
  9. is

The first split creates an array of strings (our sentences). The split inside a map results in an array of arrays of strings (our words, in sentences). As we will take a look only at sentences, we need to flatten everything to a single array.

flatMap

Applies a function that returns a sequence to a collection, and flattens the result.

flatMap[B, Coll[B]](f: A => Coll[B]): Coll[B]

We can do the same as previously by invoking flatMap:

List
  1. It's
  2. not
  3. pining,
  4. it's
  5. passed
  6. on
  7. This
  8. parrot
  9. is

filter

Filters out a collection with a Boolean function.

filter(p: A => Boolean): Coll[A]

Seeing how our bad sentence splitting (take care of your sentence splitting!) creates some empty strings, we need to filter them out:

List
  1. it's
  2. not
  3. pining,
  4. it's
  5. passed

We can also filter out other things, like stopwords:

List
  1. It's
  2. not
  3. pining,
  4. it's
  5. passed
  6. on
  7. This
  8. parrot
  9. is
  10. no

groupBy

Groups elements of a collection by a specific discriminator function, into key (the value of the descriminator function) and value (a list of all the elements of the starting collection which produce the same value of the descriminator function).

groupBy[K](f: A => K): Map[K, Coll[K]]

We will group our words by themselves:

Map
  • down
    List
    1. down
  • it's
    List
    1. it's
    2. it's
    3. it's
    4. it's
    5. it's
  • this
    List
    1. this
    2. this
    3. this
Exercise
Group words by their first letter.

mapValues

Applies a function to every value in a map.

mapValues[C](f: B => C): Map[A, C]

The reason we grouped the words by themselves is to count them up easily. We will do that with the mapValues function which applies a desired counting function over the values of our group map.

Map
  • down 1
  • it's 5
  • this 3
  • in 1
  • pining, 1
  • perch, 1
  • life, 1
  • is 3
  • expired 1
  • late 1

maxBy

Returns a maximum value in a collection.

maxBy[B](f: A => B): A

Out of curiosity, let's take a look at the most frequent word in our text by using maxBy:

Tuple2
  • _1 it's
  • _2 5

fold

The fold method comes in three similar flavours, fold, foldLeft and foldRight (check the differences between them here). In its essence, you can view these functions as iterators with an accumulator. Starting with a dedicated starting element, this function applies a function to a starting element and the first element. Then it applies the same function to the result and the second element, and so on...

foldLeft[B](z: B)(op: (B, A) => B): B

You can use, for example, foldLeft to iterate through the map of occurrences and calculate the total number of words:

68

and use that to calculate word frequencies:

Map
  • down 0.014705882352941176
  • it's 0.07352941176470588
  • this 0.04411764705882353
  • in 0.014705882352941176
  • pining, 0.014705882352941176
  • perch, 0.014705882352941176
  • life, 0.014705882352941176
  • is 0.04411764705882353
  • expired 0.014705882352941176
  • late 0.014705882352941176

foreach

As opposed to map, which applies a function to each element of a collection, foreach calls a non-returning procedure over each element, and does not result in a new collection, as map does.

foreach[U](f: A => U): Unit

So, if we'd want to check whether our frequencies sum to 1, we would do the following:

0.9999999999999989
Exercise
Do the same by using the foldLeft function.