Ergonomic, idiomatic Rust and error handling?

Update: 11th June, 1017: See summary at bottom of post

As I learn to program in Rust I’m constantly struck by how is’ not a C like language. You can write code in a C-like fashion, but the language, and the libraries don’t seem to encourage that approach. If anything, it seems to steer the naive beginner, like myself, towards a more functional style of programming. Is this intentional?

In this post, I’m documenting my solution to one of the Exercism problems, and how I migrated from an initial solution based on mutable variables, to one which is solely functional. I then write a purely imperative/mutable version and then benchmark the two final functional and imperative/mutable solutions, to see the cost of the abstractions I’ve introduced. I’ll then summarise my thoughts on the readability of the code.


The Exercism problem: Largest Series Product

Version 1: A mutable variable

This is a fairly simple problem, and the problem page gives some really big hints as to how to solve it. So my first solution to the problem looks like this:

pub fn lsp(digits: &str, size: u32) -> Result<u32, &'static str> {
    if size == 0 { return Ok(1) };
    if !digits.chars().all(|c| c.is_digit(10)) {
        return Err("Invalid digit in input");
    }
    if size > digits.len() as u32 {
        return Err("Window longer than digits");
    }
    let mut max: u32 = 0;
    let numbers: Vec<u32> = digits
        .chars()
        .map(|c| c.to_digit(10).unwrap())
        .collect();
    for window in numbers.windows(size as usize) {
        let product = window.iter().product();
        if product > max { max = product; }
    }
    Ok(max)
}

Okay, so there’s only a single mutable variable in the code: the max variable the remembers the highest product. The ‘work’ in the function is done by the windows() function that takes the numbers vector (as an iterator) and produces a series of sliding windows on that iterator. The product() function then takes that iterator and calculates the produce.

So the function does:

  1. If the size is 0, just return 1.
  2. If one of the digits isn’t a number, return and Error string. The Err(...) function generates a Result<T> where T is the type of the argument provided to Err().
  3. If the size is bigger that the string of digits, then return an error.
  4. Convert the string of digits into a Vec<u32> - i.e. a list of integers.
  5. Calculate all the possible products of adjacent windows of size size remembering the biggest.
  6. Return the biggest one as a Result<u32>

I think this is a very readable program (assuming you can read a little bit of Rust), but there is an ergonomic problem with the code. I’ll demonstrate it with the next version where I remove the mutable variable.

Version 2: Remove the mutable

The next version of the program looks like this:

pub fn lsp(digits: &str, size: u32) -> Result<u32, &'static str> {
    if size == 0 { return Ok(1) };
    if !digits.chars().all(|c| c.is_digit(10)) {
        return Err("Invalid digit in input");
    }
    if size > digits.len() as u32 {
        return Err("Window longer than digits");
    }
    let numbers: Vec<u32> = digits
        .chars()
        .map(|c| c.to_digit(10).unwrap())
        .collect();
    Ok(numbers
        .windows(size as usize)
        .map(|w| w.iter().product())
        .max()
        .unwrap())
}

Now the mutable variable has disappeared. The for loop has been replaced by using the iterator directly, and passing it to the map() function which then converts the window iterator w into a product, which is then fed through the max() function which remembers the largest u32, thus eliminating the mutable variable from the first version.

The unwrap() is needed because the output from max() is an Option<T> which means the result is either Some(u32) in this case, or None.

It’s still iterating through the digits/numbers three times, though, just like the first version.

It’s probably fairly obvious that we can eliminate the numbers variable by dropping the collect() and simply chaining the windows() function after the map(). So we then get:

Version 3: One less iterator?

Actually, no, sadly not.

pub fn lsp(digits: &str, size: u32) -> Result<u32, &'static str> {
    if size == 0 { return Ok(1) };
    if !digits.chars().all(|c| c.is_digit(10)) {
        return Err("Invalid digit in input");
    }
    if size > digits.len() as u32 {
        return Err("Window longer than digits");
    }
    Ok(digits
        .chars()
        .map(|c| c.to_digit(10).unwrap())
        .collect::<Vec<_>>()
        .windows(size as usize)
        .map(|w| w.iter().product())
        .max()
        .unwrap())
}

So we didn’t actually eliminate the vector, because it’s still there in the .collect::<Vec<_>>() method. The ::<> turbo fish is a type hint to collect() to indicate what we want to collection into (as a collection), and the _ means that it should infer the collected type from the expression. However, we have eliminated a temporary variable, but are we losing readability of the program?

This version still traverses the digits string twice; once to look for invalid digits, and the other to convert them into integers. Can we combine them?

If you look carefully, you’ll see that to_digit() is followed by unwrap(). This means that it probably returns an Option or a Result. The documentation for to_digit() shows that it returns an Option. So, we unwrap it to get the value as the result of map.

The .max() method also returns an Option and so we need to unwrap that before passing it to the Ok(). (Note, I could use an ok_or(...) instead of the unwrap() and also drop the Ok(...) part as an alternative.)

Version 4: Better error handling?

The next version looks like the following. It reduces the two traversals of the digits string down to one by doing the error handling at to_digit():

#[derive(Debug)]
enum Error {
    Digit,
    Window,
}


pub fn lsp(digits: &str, size: u32) -> Result<u32, &'static str> {
    if size == 0 { return Ok(1) };
    digits
        .chars()
        .map(|c| c.to_digit(10).ok_or(Error::Digit))
        .collect::<Result<Vec<u32>, _>>()
        .and_then(|numbers| {
            numbers
                .windows(size as usize)
                .map(|w| w.iter().product())
                .max()
                .ok_or(Error::Window)
        })
        .or_else(|e| match e {
            Error::Digit => Err("Invalid digit in input"),
            Error::Window => Err("Window longer than digits"),
        })
}

In this version the two errors are incorporated into the digits..... expression. They are both done by ok_or(), one on the to_digit() map, and the other after the .max(). The method ok_or() defined on the Option converts an Option into a Result which is what the function returns. The bit at the end is to convert the Error enum that is defined into the str that the function signature returns.

Another interesting thing is that the output of the .map() is a vector of Result enums due to the ok_or() and thus the collect has that strange type annotation: <Result<Vec<u32>, _> which means that the collection is done inside a Result; that’s a pretty handy feature of the collect() implementation on an iterator of Result enums.

However, as it’s a program that calling this function, an enum is probably a better thing to return, as the caller can match on an enum. This simplifies the function further:

#[derive(Debug)]
pub enum Error {
    Digit,
    Window,
}


pub fn lsp(digits: &str, size: u32) -> Result<u32, Error> {
    if size == 0 { return Ok(1) };
    digits
        .chars()
        .map(|c| c.to_digit(10).ok_or(Error::Digit))
        .collect::<Result<Vec<u32>, _>>()
        .and_then(|numbers| {
            numbers
                .windows(size as usize)
                .map(|w| w.iter().product())
                .max()
                .ok_or(Error::Window)
        })
}

That’s about as far as I can go towards a functional approach. There’s no mutable variables, two passes through lists.

However, I wanted to compare it to a fully mutable version:

version 5: The immutable C style version

#[derive(Debug)]
pub enum Error {
    Digit,
    Window,
}

pub fn lsp(digits: &str, size: u32) -> Result<u32, Error> {
    if size == 0 { return Ok(1) };
    let mut prod = vec![0u32; size as usize];
    let mut pointer: usize = 0;
    let mut max_value: u32 = 0;
    let mut count: u32 = 0;
    for c in digits.chars() {
        count += 1;
        prod[pointer] = c.to_digit(10).ok_or(Error::Digit)?;
        pointer = (pointer + 1) % (size as usize);
        let mut sum: u32 = 1;
        for p in 0..size as usize {
            sum *= prod[p];
        }
        if sum > max_value { max_value = sum; }
    }
    if count < size { return Err(Error::Window); }
    Ok(max_value)
}

This one takes a completely different approach. The theory is:

  1. Initialise a window vector with 0 unsigned integers.
  2. Convert each digit and put it into the array but moving a pointer forwards and wrapping at the end of the array. The array is sized as the window.
  3. Calculate the product from the contents of the array.
  4. Continue until all the digits are consumed.

So this has the advantage over the functional version of only doing one pass through the digits. It also short-circuits the whole function if a non-digit is found in digits: the little ? on the end of the prod[pointer] ...? is syntactic sugar for a try!(...) macro which exits if the expression resolves to Result<E> (an error).

Performance of functional vs the imperative/mutable version

As I keep saying, I’m still at the beginning stages of learning Rust and, thus, I’m curious as to how much the functional abstractions actually cost in terms of implementation efficiency.

Thus, I’ve benchmarked the imperative (mutable) and functional versions of the program on my laptop, which is an old MacBook Pro early 2011 edition with an “Intel® Core™ i5-2435M CPU @ 2.40GHz” (from cat /proc/cpuinfo). So not exactly going to set the world on fire, and very, very slow in comparison to my desktop. The results were for 10 million iterations of:

fn main() {
    let mut value: u32 = 0;
    let sec1 = timeit!(10_000_000, {
        value = lsp_imperative("0123045678912345678987654111110", 6).unwrap();
    });
    println!("LSP imperative/mutable: {}: {}", value, sec1);
    let sec2 = timeit!(10_000_000, {
        value = lsp_functional("0123045678912345678987654111110", 6).unwrap();
    });
    println!("LSP functional: {}: {}", value, sec2);
}

Were:

(I ran it 10 times and took the lowest/fastest times for each one).

So the functional version (that I wrote) is about 50% longer that the mutable version. I expected that because the mutable version uses a trick to get around having to generate a complete vector of integers from the characters before doing the products. It also is more efficient in how it uses memory because it re-uses the vector that is used to generate the product.

However, I also think that the functional version is easier to read that the imperative/mutable version, precisely because it is more concise.

(The code for the benchmarking can be found here).

Conclusion

Rust lets you program using both functional and imperative styles. I find the functional style more digestible, and fun, to program in, but my two tests so far, the functional versions that I write are generally slower (but not necessarily significantly) than the mutable versions.

As Rust allows you to write in either way, you could, hypothetically, profile functions that needed to be fast and re-write them into the most efficient form possible. The rest of the code, however, can be elegant, idiomatic and, in theory, easier to understand.

However, I’m not sure which is the most idiomatic method, so I’m going to ask the Rust User Community and post back the response here as an update.

Update 2017-06-11

So, as promised, I asked the question on the Rust users formum and got back an interesting set of replies, which I’ll try to summarise:

  1. Functional, Imperative or a Mix is idiomatic.
  2. Fast is good.
  3. Most importantly, use the form/style that is most readable.

It’s kind of what I expected. I still think that the libraries, the use of iterators and the immutable-by-default nature of variables steers users of Rust towards a more functional style of programming. And the Rust language design, and compiler, try very, very hard to make those abstractions cost as little as possible. With no garbage collector, and very tightly controlled heap allocations, Rust does seem to be very, very quick.

comments powered by Disqus