Groovy: Grouping using groupBy vs collectEntries

groupBy has been available since Groovy 1.0 and collectEntries since Groovy 1.7.9. Both can group elements from a Collection together into a Map. I had a little fun with them last week.

They both have their strong suits: groupBy accepts one or more closures indicating the grouping key, while collectEntries is highly flexible in determining both the key and value.
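As a quick illustration of both points, here is a minimal sketch. The `talks` list and its `track` and `level` keys are hypothetical sample data, not part of the example that follows.

```groovy
// Hypothetical sample data, only used for this sketch
def talks = [
    [title: 'Intro to Groovy',     track: 'JVM', level: 'beginner'],
    [title: 'AST transformations', track: 'JVM', level: 'advanced'],
    [title: 'CSS Grid',            track: 'Web', level: 'beginner']
]

// groupBy with two closures nests the result: first by track, then by level
def byTrackAndLevel = talks.groupBy({ it.track }, { it.level })
assert byTrackAndLevel['JVM']['advanced']*.title == ['AST transformations']
assert byTrackAndLevel['Web']['beginner']*.title == ['CSS Grid']

// collectEntries lets the closure decide both the key and the value of each entry
def levelByTitle = talks.collectEntries { [(it.title): it.level] }
assert levelByTitle['CSS Grid'] == 'beginner'
```

With several closures, groupBy returns nested Maps with a List at the innermost level, whereas collectEntries returns a flat Map with exactly one value per key.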

import groovy.transform.ToString
import java.time.*
import static java.time.Month.*
import static Importance.*

@ToString(ignoreNulls = true)
class Event {
    Long id
    String title
    LocalDate date
    Boolean exclusive
}

def events = [
    new Event(id: 1, title: "Greach", date: LocalDate.of(2017, MARCH, 31)),
    new Event(id: 2, title: "J-Spring", date: LocalDate.of(2017, MAY, 10)),
    new Event(id: 3, title: "GOTO", date: LocalDate.of(2017, JUNE, 12)),
    new Event(id: 4, title: "Gr8Conf", date: LocalDate.of(2017, MAY, 31))
]

// groupBy will return a Map with a List of values for the key grouped on
Map<Month, List<Event>> eventsByMonth = events.groupBy { it.date.month }

List<Event> eventsInMay = eventsByMonth[MAY]
assert eventsInMay*.title == ['J-Spring', 'Gr8Conf']

// but what if you don't need a List?
// e.g. you want a lookup table by id and these are unique
Map<Long, List<Event>> eventsById = events.groupBy { it.id }
assert eventsById.toMapString() == '[1:[Event(1, Greach, 2017-03-31)], 2:[Event(2, J-Spring, 2017-05-10)], 3:[Event(3, GOTO, 2017-06-12)], 4:[Event(4, Gr8Conf, 2017-05-31)]]'

// let's get the single event for Greach which is id 1
List<Event> greachEvents = eventsById[1L]
assert greachEvents.first().title == 'Greach' // awkward: we need first() on a one-element List

// There's only one event for id 1, right?
// When key and value have a one-to-one relation, such
// as an Event and its id, which is typical for a lookup table,
// groupBy is a less than ideal solution

// We don't want Map<Long, List<Event>>, but Map<Long, Event> instead
// collectEntries to the rescue!
Map<Long, Event> eventById = events.collectEntries {
    // here we return a Map with id as the key and Event as the value
    [(it.id) : it]
}

assert eventById.toMapString() == '[1:Event(1, Greach, 2017-03-31), 2:Event(2, J-Spring, 2017-05-10), 3:Event(3, GOTO, 2017-06-12), 4:Event(4, Gr8Conf, 2017-05-31)]'

// Looking up by id now yields a single Event, or null if the id is unknown
Event greachEvent = eventById[1L]
assert greachEvent.title == 'Greach'

// Let's say you need to combine these functionalities together.

// We're introducing a slightly different data set, now indicating how
// exclusive an event is 🙂

events = [
    new Event(id: 1, title: "Greach", exclusive: false),
    new Event(id: 2, title: "J-Spring", exclusive: true),
    new Event(id: 3, title: "GOTO", exclusive: false),
    new Event(id: 4, title: "Gr8Conf", exclusive: true)
]

enum Importance { EXCLUSIVE, NOT_EXCLUSIVE, BOTH }
def importance = { it.exclusive ? EXCLUSIVE : NOT_EXCLUSIVE }

Map<Importance, List<Event>> eventsByImportance = events.groupBy(importance)
// This would yield
// [NOT_EXCLUSIVE:[Event(1, Greach, false), Event(3, GOTO, false)], EXCLUSIVE:[Event(2, J-Spring, true), Event(4, Gr8Conf, true)]]

// ...and getting exclusive events would be very familiar
assert eventsByImportance[EXCLUSIVE]*.title == ['J-Spring', 'Gr8Conf']

// Now we build a fixed structure that lets a client check whether a certain Event is
// important ("EXCLUSIVE") or not ("NOT_EXCLUSIVE"), or simply look it up under the
// "BOTH" key if importance doesn't matter

Map<Importance, Map<Long, Event>> calendar = events
    .groupBy(importance) // first groupBy, which returns Map<Importance, List<Event>>
    .collectEntries { Importance key, List<Event> value ->
        // ...and transform List<Event> into Map<Long, Event>
        [(key): value.collectEntries { Event e -> [(e.id) : e]}]
    }

// If the event data came from an external source, make sure the essential keys are ALWAYS queryable
[EXCLUSIVE, NOT_EXCLUSIVE].each { key ->
    calendar.putIfAbsent(key, [:])
}
// and make sure BOTH can be used to locate every event
calendar[BOTH] = calendar[EXCLUSIVE] + calendar[NOT_EXCLUSIVE]

// We can now query this calendar for various things:

// Only if event 4 is still exclusive can I find it here and edit it
Event event = calendar[EXCLUSIVE][4L] // Gr8Conf

// Export the ids of all exclusive events so we can send participants a gift
println calendar[EXCLUSIVE].keySet() // [2, 4]

// Get me all events sorted by title
Collection<Event> all = calendar[BOTH].values().sort { it.title }
// [Event(3, GOTO, false), Event(4, Gr8Conf, true), Event(1, Greach, false), Event(2, J-Spring, true)]

Used Groovy 2.4 and Java 8.

So both can perform similar functions, such as grouping into a Map. If you need to group multiple values in a List by a certain key, use groupBy. If you know you can never have multiple values because there’s a one-to-one relation between key and value, use collectEntries to express this in a Map.
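One caveat worth sketching: if the one-to-one assumption turns out to be wrong, collectEntries does not complain. A later entry with the same key simply replaces the earlier one. The `rows` data below is hypothetical and deliberately contains a duplicate id.

```groovy
// Hypothetical data: two rows accidentally share id 1
def rows = [
    [id: 1, title: 'Greach'],
    [id: 2, title: 'GOTO'],
    [id: 1, title: 'Gr8Conf']
]

// collectEntries keeps only the last value seen for a duplicate key
def byId = rows.collectEntries { [(it.id): it.title] }
assert byId == [1: 'Gr8Conf', 2: 'GOTO']

// groupBy keeps every value, so duplicates remain visible
def grouped = rows.groupBy { it.id }
def duplicates = grouped.findAll { id, values -> values.size() > 1 }
assert duplicates.keySet() as List == [1]
```

If the data comes from a source you don't control, a groupBy pass like this one could serve as a cheap uniqueness check before building the lookup table with collectEntries.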