Objects, Mutability, and the Mysterious Case of Hash.new([])
Do you ever have moments when the language you’ve been coding with professionally does something unexpected and completely rocks your world? So much that you rerun your code a few times just to make sure your eyeballs are working right?
You are not alone. Ruby is my first language, and it still surprises me. Most recently, it surprised me when I instantiated a hash with a default value.
Using default values is handy because you don’t have to check for nil when a key does not yet exist in the hash. For example, adding food to a fridge is easy with a default value of zero:
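Here is a minimal sketch of that counting fridge. The item names (eggs, milk, butter) are assumptions for illustration, not the original example:

```ruby
# Missing keys read as 0 instead of nil, so += works on a brand-new key.
fridge = Hash.new(0)

fridge[:eggs] += 12   # expands to fridge[:eggs] = fridge[:eggs] + 12
fridge[:milk] += 1
fridge[:eggs] -= 2

p fridge[:eggs]          # 10
p fridge[:butter]        # 0 (the default is returned...)
p fridge.key?(:butter)   # false (...but no key is stored)
```

Reading a missing key hands back the default. Only the assignment done by `+=` actually creates the key.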
However, I want to organize my fridge by categories. I can use an empty array as the default value:
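Here is a reconstruction of this step. It assumes the items were added with the `<<=` operator mentioned further down, and it takes the keys and items from the expected result shown below:

```ruby
# Every missing key shares this ONE array as its default value.
sorted_fridge = Hash.new([])

# Assumed reconstruction: <<= expands to
#   sorted_fridge[:cheeses] = sorted_fridge[:cheeses] << :gouda
sorted_fridge[:cheeses] <<= :gouda
sorted_fridge[:meats]   <<= :salami

p sorted_fridge[:cheeses]  # [:gouda, :salami]
p sorted_fridge[:meats]    # [:gouda, :salami]
```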
I was not expecting my cheese to end up with my meat! Just for kicks, I attempt to create a new key/value in my hash using only the shovel operator:
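Here is a sketch of the shovel-only attempt. As before, it assumes the cheeses and meats went in with `<<=`. The setup is repeated so the snippet runs on its own:

```ruby
sorted_fridge = Hash.new([])
sorted_fridge[:cheeses] <<= :gouda
sorted_fridge[:meats]   <<= :salami

# Plain << mutates the shared default array but never assigns a key.
sorted_fridge[:cakes] << :marzipan

p sorted_fridge.keys       # [:cheeses, :meats]  (no :cakes!)
p sorted_fridge[:cheeses]  # [:gouda, :salami, :marzipan]
p sorted_fridge[:meats]    # [:gouda, :salami, :marzipan]
```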
WHERE ARE MY CAKES. And why are new values getting assigned to every key in the hash?? I am totally confused because I was expecting this:
sorted_fridge = {:cheeses=>[:gouda], :meats=>[:salami], :cakes=>[:marzipan]}
My fridge is in DISARRAY.
Everything Is an Object
Sometimes syntactic sugar like the += and <<= operators makes it easier for me to miss what the code is actually doing. Here is the code in plain old Ruby. Take a look at the right side of each assignment:
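Here is a sketch of what the `<<=` lines expand to, again assuming the cheese-then-meat order. Printing a missing key between the two steps shows the surprise:

```ruby
sorted_fridge = Hash.new([])

# Desugared <<=: on each right-hand side, a missing key returns the
# shared default array, and << mutates that array before it is assigned.
sorted_fridge[:cheeses] = sorted_fridge[:cheeses] << :gouda

p sorted_fridge[:meats]   # [:gouda]  (not the empty array I expected)

sorted_fridge[:meats] = sorted_fridge[:meats] << :salami

p sorted_fridge[:cheeses] # [:gouda, :salami]
```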
Shouldn’t sorted_fridge[:cheese] evaluate to an empty array? I am lost, so I turn to Ruby’s Hash documentation for the method #new:
new → new_hash
new(obj) → new_hash
new {|hash, key| block } → new_hash
Returns a new, empty hash. If this hash is subsequently accessed by a key that doesn't correspond to a hash entry, the value returned depends on the style of new used to create the hash. In the first form, the access returns nil. If obj is specified, this single object will be used for all default values.
Emphasis mine. I wrongly assumed that I was getting a new, clean array each time a missing key fell back to the default value. The default value is actually a single object! I can confirm this with the method #object_id, which is like asking the object for its fingerprint, or unique identity:
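A quick check along those lines. The keys are illustrative, and `Hash#default` returns the hash’s default object:

```ruby
sorted_fridge = Hash.new([])

# Two different missing keys hand back the very same default object.
p sorted_fridge[:cheeses].object_id == sorted_fridge[:cakes].object_id  # true
p sorted_fridge[:cheeses].equal?(sorted_fridge.default)                 # true
```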
I also notice that #<< and #+ are completely different methods. The shovel operator (<<) modifies the object it is called on (in my case, the default value), whereas #+ concatenates two objects to create a new object. So each time I used the shovel operator on a missing key, I mutated the one shared default array, and every key/value pair appeared to change because all keys point to that same object!
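A small side-by-side sketch of the two methods on a plain array. The variable names are illustrative:

```ruby
original = [:gouda]

shoveled = original << :brie    # Array#<< mutates original and returns it
p shoveled.equal?(original)     # true
p original                      # [:gouda, :brie]

added = original + [:cheddar]   # Array#+ builds a brand-new array
p added.equal?(original)        # false
p original                      # [:gouda, :brie] (unchanged)
p added                         # [:gouda, :brie, :cheddar]
```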
How can I fix my screwed-up fridge? Reading further in the #new documentation:
If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block's responsibility to store the value in the hash if required.
I can instantiate a hash with a block that returns an empty array as the default value. Each time I access a key that does not exist in my hash, the block runs again, so a fresh, empty array is returned:
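Here is a sketch of two block styles. The quoted documentation hints at the difference: a block that only *returns* `[]` still loses items added with a bare `<<`, while a block that also stores the array under the key fixes the fridge completely.

```ruby
# Variant 1: the block only RETURNS a fresh array; nothing is stored.
returns_only = Hash.new { [] }
returns_only[:cakes] << :marzipan    # lands in a throwaway array
p returns_only.key?(:cakes)          # false
returns_only[:cheeses] <<= :gouda    # <<= assigns the result, so this sticks
p returns_only[:cheeses]             # [:gouda]

# Variant 2: the block also STORES the fresh array under the missing key.
sorted_fridge = Hash.new { |hash, key| hash[key] = [] }
sorted_fridge[:cheeses] << :gouda
sorted_fridge[:meats]   << :salami
sorted_fridge[:cakes]   << :marzipan

p sorted_fridge[:cakes]                                  # [:marzipan]
p sorted_fridge[:cheeses].equal?(sorted_fridge[:meats])  # false
```

With variant 2, simply reading a missing key also creates it, which is usually what you want for grouping but worth knowing.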
Immutable vs. Mutable
Mystery solved! But I am still bothered. Why didn’t I notice that I was mutating an object when working with arrays, but not when working with integers?
What does mutability actually mean?
I know that in Ruby:
- Integers, floats, and symbols are immutable.
- Arrays, strings, and hashes are mutable.
Here’s what that means: I can’t just say that 3 is now 4. But I can say that “Princess” is now “Princess Elsa” by modifying the object itself with "Princess" << " Elsa".
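A short sketch of the difference. The string keeps its identity while its contents change, but the integer never changes:

```ruby
name = "Princess"
id_before = name.object_id

name << " Elsa"                 # String#<< appends in place
p name                          # "Princess Elsa"
p name.object_id == id_before   # true: same object, new contents

# Integers have no in-place methods: 3 + 1 returns a separate object, 4.
three = 3
four = three + 1
p three  # 3
p four   # 4
```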
Think about it this way: everything in Ruby is an object. The integer 3 is an object. I can prove it using the fingerprint method:
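A sketch of that proof. In CRuby, the standard interpreter, small integers are stored as immediate values, so every 3 is literally the same object. These checks describe CRuby’s behavior; other implementations may differ in the details.

```ruby
# The integer 3 is one object no matter where it shows up.
p 3.object_id == 3.object_id  # true
p 3.equal?(1 + 2)             # true
p 3.frozen?                   # true: integers cannot be mutated
```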
Whenever I work with the integer 3, I am always working with the SAME object in Ruby, just as I was always working with the SAME default value object.
What about strings? When I use the string “princess” in my program, am I using the same object every time?
Nope! Ruby creates a new object each time. And because that object is not saved to a variable, it becomes eligible for garbage collection once nothing refers to it.
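A sketch that shows this. It assumes the file does *not* use the frozen-string magic comment covered later, because that comment changes the result:

```ruby
a = "princess"
b = "princess"

p a == b       # true:  same contents
p a.equal?(b)  # false: two different objects

# Evaluating the same literal three times yields three separate objects.
strings = Array.new(3) { "princess" }
p strings.uniq.size                   # 1: equal contents...
p strings.map(&:object_id).uniq.size  # 3: ...but three distinct objects
```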
If I want to hang on to the object, I must assign it to a variable. Variables are essentially pointers to places in memory where objects live. When I reassign a variable, I tell it to point to a different place (object) in memory:
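A minimal sketch of reassignment. The variable names are illustrative:

```ruby
title = "princess"
alias_for_title = title   # both variables point to the same object

title = "queen"           # reassignment: title now points to a new object

p title                          # "queen"
p alias_for_title                # "princess" (the original object is untouched)
p title.equal?(alias_for_title)  # false
```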
Contrast that with mutating the object itself:
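Here is the same setup with mutation instead of reassignment. Both variables still point to one object, so both see the change:

```ruby
title = "princess"
alias_for_title = title

title << " elsa"   # mutates the one object both variables point to

p title                          # "princess elsa"
p alias_for_title                # "princess elsa"
p title.equal?(alias_for_title)  # true
```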
It is important to know which methods mutate objects. Many Ruby methods either modify the object in place or return a modified copy. Sometimes the name gives it away: #gsub vs. #gsub! (by convention, the bang marks the “dangerous” version, which here means mutation). But other mutating methods, like #<<, are not as obvious. If you need to use a mutating method but want to keep the original object intact, use the #dup method to make a copy first.
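Here is a sketch of both ideas. Note that `#dup` makes a *shallow* copy: the new array is a separate object, but the elements inside it are shared with the original.

```ruby
name = "princess anna"

copy = name.gsub("anna", "elsa")   # returns a new string
p name                             # "princess anna" (unchanged)
p copy                             # "princess elsa"

name.gsub!("anna", "elsa")         # modifies name in place
p name                             # "princess elsa"

original = [:gouda]
working = original.dup             # shallow copy: new array, same elements
working << :brie
p original                         # [:gouda]
p working                          # [:gouda, :brie]
```

Also note that `#gsub!` returns nil when nothing was replaced, so don’t chain it as if it always returned the string.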
Knowing this, there’s another way I could fix my Hash.new([]) dilemma. Instead of mutating the default value, I can use a non-mutating method (#+) and assign the result to the key. I would then need to put the value inside an array in order to concatenate the two:
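A sketch of the `+=` fix, using the items from the expected fridge above:

```ruby
sorted_fridge = Hash.new([])

# += expands to sorted_fridge[key] = sorted_fridge[key] + [item].
# Array#+ builds a new array, so the shared default is never mutated.
sorted_fridge[:cheeses] += [:gouda]
sorted_fridge[:meats]   += [:salami]
sorted_fridge[:cakes]   += [:marzipan]

p sorted_fridge[:cheeses]  # [:gouda]
p sorted_fridge[:cakes]    # [:marzipan]
p sorted_fridge.default    # [] (still empty)
```

One trade-off is that every `+=` allocates a new array. For a small fridge that hardly matters, but the block form avoids the extra copies.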
Side note: Ruby’s Magic Comment
With all this talk of mutability, it’s worth mentioning frozen objects. Freezing an object makes it immutable! For example, if I were to call #freeze on the default value, I would no longer be able to mutate it:
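Here is a sketch of a frozen default. Mutating it raises `FrozenError` (Ruby 2.5+). Non-mutating methods still work because they return new objects:

```ruby
sorted_fridge = Hash.new([].freeze)

begin
  sorted_fridge[:cakes] << :marzipan   # tries to mutate the frozen default
rescue FrozenError => e
  puts "Refused: #{e.class}"           # Refused: FrozenError
end

# Array#+ returns a new (unfrozen) array, so += is still allowed.
sorted_fridge[:cakes] += [:marzipan]
p sorted_fridge[:cakes]   # [:marzipan]
p sorted_fridge.default   # [] (still empty and frozen)
```

Freezing the default turns the silent "everything changed" bug into a loud error, which makes it a cheap safety net.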
What use are frozen objects to me anyway? Well, remember how Ruby creates a new object every time we instantiate a string with the same value, like “princess”? If we have a program that uses the string “princess” a million times, we could have a million objects taking up memory. If we freeze our string literals, identical literals can share a single object. By saving RAM, we might see performance gains.
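This is where the magic comment comes in. Below is a sketch of a file that starts with `# frozen_string_literal: true`. The comment only takes effect in the comment block at the very top of a file, before any code. Under it, string literals are frozen and identical literals are deduplicated (the default behavior of CRuby, the standard interpreter):

```ruby
# frozen_string_literal: true

a = "princess"
b = "princess"

p a.frozen?    # true
p a.equal?(b)  # true: both literals share one object

begin
  a << " elsa"
rescue FrozenError
  puts "string literals in this file can't be mutated"
end

# When you do need a mutable string, ask for one explicitly.
mutable = +"princess"  # unary plus returns an unfrozen copy of a frozen string
mutable << " elsa"
p mutable              # "princess elsa"
```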
TL;DR
Be careful when instantiating a Hash with a mutable default value! Pass a block instead.
Most objects in Ruby are mutable; this is a different paradigm from most functional programming languages, where values are immutable by default.
You should know which Ruby methods are mutating and which are non-mutating (return a copy). Reassigning a variable does not mutate the object it previously pointed to.
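There is a big difference between objects and variables. Variables point to specific places (objects) in memory, and Ruby passes arguments by reference value: a method receives a copy of the reference, so it can mutate the caller’s object but cannot reassign the caller’s variable. Each object has a unique identity, which can be retrieved using #object_id.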
Use Ruby’s magic comment (# frozen_string_literal: true) at the top of your files to freeze string literals, which can save memory and may improve performance.