Comp 110 Filter-Map-Reduce Pipeline

Filter-Map-Reduce Pipeline

The filter-map-reduce pipeline is a pattern for analyzing and processing data that you will see again and again in this class. The data we will be working with will be stored in lists or arrays that we apply functions and algorithms to in order to collect some sort of meaningful information. Note that while filteringmapping and reducing can all be done independently of each other, they just happen to work nicely together, too.

Let's take it step by step!

Filtering

Filtering is often the first step in our data processing pipeline. A function that filters takes a collection of data objects and returns a subset of those data objects based on some criteria outlined within the function. 

For example, you could write a function that given a list of birthdays, returns to you a list of birthdays with odd-numbered dates. When you filter a data set, the type of the data you pass in as a parameter must match the type of data you get back. This is helpful because we can use multiple filters one after another on the same data. Let's take a look at some code examples.

This first function filters a list of numbers and returns back the subset of that list that is negative.

let filterByNegative = (list: Node<number>): Node<number> => {
    if (list === null) {
        return null;
    } else if (first(list) < 0) {
        return cons(first(list), filterByNegative(rest(list)));    
    } else {
        return filterByNegative(rest(list));
    }
};
let main = async () => {
    let numbers: Node<number> = cons(-3, cons(3, cons(0, cons(-1, null))));
    print(filterByNegative(numbers));
};

The resulting output would be -3 --> -1 --> null. Since this is a filtering function, the type of the list parameter matches the type of list being returned. 

Filter functions can take in more than one parameter to help with the filtering. Here is a function that filters a list of people based on their horoscope sign.

let filterBySign = (people: Node<Person>, sign: string): List<Person> => {
    if (people === null) {
        return null;
    } else if (first(people).zodiacSign ==== sign) {
        return cons(first(people), filterBySign(rest(people), sign));
    } else {
        return filterBySign(rest(people), sign);
    }
};
class Person {
    age: number = 18;
    name: string = "Jeffrey";
    zodiacSign: string = "Aquarius";
};
let main = async () => {
    let p1: Person = new Person();
    p1.zodiacSign = "Gemini";
    let p2: Person = new Person();
    let p3: Person = new Person();
    let people: Node<Person> = cons(p1, cons(p2, cons(p3, null)));    
    print(filterBySign(people, "Aquarius"));
};

The resulting output will be a list of Person objects:

agenamezodiacSign
18JeffreyAquarius
18JeffreyAquarius
null

Our p1 Person object was not included because we changed its zodiacSign property to be different than the default value of "Aquarius". If we had filtered by a different criteria, for example using the "Gemini" string, only p1 would appear in our new list.

Mapping

Mapping functions take a data set of one form, and convert it into data of another form. We will often use mapping to construct a list of a completely different type than the one passed in as a parameter. This is often useful when dealing with objects in order to just work with a certain property. 

let mapToSign = (people: Node<Person>): Node<string> => {
    if (people === null) {
        return null;    
    } else {
        return cons(first(people).zodiacSign, mapToSign(rest(people)));    } };
class Person {
    age: number = 18;
    name: string = "Jeffrey";
    zodiacSign: string = "Aquarius";
};
let main = async () => {
    let p1: Person = new Person();
    p1.zodiacSign = "Gemini";
    let p2: Person = new Person();
    let p3: Person = new Person();
    let people: Node<Person> = cons(p1, cons(p2, cons(p3, null)));
    print(mapToSign(people));
};

The resulting output will be Gemini --> Aquarius --> Aquarius --> null. Notice that the return type of this function is a Node of type strings! This is because the zodiacSign property of the Person class is a string. We are converting our list of Person objects into a list of strings.

In many mapping functions this means that the type of list or array return will be different than the one passed in, but it doesn't always have to be. For example, a function that takes in a list of words and returns back a list consisting only of the first letters of those words is still a mapping function! It is transforming our data set so that we can hone in on certain properties or characteristics of our data elements.

let firstLetter = (words: Node<string>): Node<string> => {
    if (words === null) {
        return null;
    } else {
        let letter: string = first(words).substr(0, 1);
        return cons(letter, firstLetter(rest(words)));
    }
};
let main = async () => {
    let list: Node<string> = cons("Apple", cons("Banana", cons("Carrot", null)));
    print(firstLetter(list));
};

The resulting output will be A --> B --> C --> null.

Reducing

Reducing takes a data set and simplifies it in some way. This often means taking in a data set and reducing it to a single value, but a reducer function can also result in a collection of values. You can think of filtering as a special kind of reducing. However, the types of the data taken in by a reduce function and the type of data returned by the function must match.

Here is an example of a reducer function that takes a list of strings and simplifies them all into a single string.

let concatenator = (words: Node<string>): string => {
    if (words === null) {
        return "";
    } else {
        return first(words) + concatenator(rest(words));
    }
};
let main = async () => {
    let list: Node<string> = cons("Apple", cons("Banana", cons("Carrot", null)));
    print(concatenator(list));
};

The resulting output is "AppleBananaCarrot". 

Bringing it all together...

With a single dataset, we can use any combination of these three functions to retrieve various specific data.

Below illustrates an example of the filter map reduce pipeline, using anonymous literals!

1) Create our list

// creation of list of students
class Student {
    gpa: Number = 0.0;
    major: String = "undecided";
}

let kaki = new Student();
kaki.gpa = 3.7;
kaki.major = "computer science";

let morgan = new Student();
morgan.gpa = 3.5;
morgan.major = "education";

let madison = new Student();
madison.gpa = 3.9;
madison.major = "computer science";

let lizzie = new Student();
lizzie.gpa = 3.3;
lizzie.major = "german";

let studentList = listify(kaki, morgan, madison, lizzie);
print(studentList);

2) Use filter/map/reduce to count the number of computer science students that have a gpa greater than or equal to 3.5

// function that will behave as a "reducer"
let countCSStudents = (memo: number, item: string): number => {
    if (item === "computer science") {
        return memo + 1;
    } else {
        return memo; 
    }
};

// TASK: we want to use filter/map/reduce to count the number of computer science students that have a gpa greater than or equal to 3.5

// filter out students with a gpa below 3.5
// rather than using a normal predicate function, we are using a "function literal" or "anonymous function"
let studentList = filter(studentList, (item) => item.gpa >= 3.5); 

// map students so that we have a list of the majors
let majorList = map(overThreeFive, (item) => item.major); // map the students to 

//use reduce to count the number of "computer science" majors in the list
let csStudents = reduce(majors, countCSStudents, 0);

print(csStudents); // 2 is printed

But what about when these functions are nested? What would that look like?

The same logic as above can also be implemented as:

let csStudents = reduce(map(filter(studentList, (item) => item.gpa >= 3.5) , (item) => item.major), countCSStudents, 0);
// when tracing this, we work from the inside out
// filter studentList first, then map, then reduce
print(csStudents); // 2 is printed