ActiveRecord, Part 1
Who is this article for? Those who know the basics of SQL and are trying to get a handle on ActiveRecord.
First, My Struggle.
For my first technical blog post, I chose a topic that used to be challenging: ActiveRecord. I think the reason for this is because I love SQL a little too much. SQL was invented in 1974 and there’s a reason why it has persisted so long. Every time I begin an SQL query, I know exactly how to start: with a SELECT clause containing the columns I want back. And every time I know exactly what is going to be returned: a beautiful, simple table.
With ActiveRecord, I never knew what I was going to get back. Would it spit out a fake-array of objects at me? Why was it giving me a Student object when I wanted Teacher objects? Sometimes it would return a hash and I’d be lost for words. ActiveRecord’s seeming “inconsistencies” were making me frustrated. I would find myself looking up articles on why ActiveRecord sucks to enforce my own confirmation bias. I would ask the Almighty Doge, “why can’t I just make SQL queries instead of using this dumb ActiveRecord interface?”
“Because (he would whisper to me), YOU are not a good computer interface.”
I guess you’re right, Doge. The purpose of ActiveRecord, which is a type of Object Relational Mapper (ORM), is to serve as the translator between your models and your database. ActiveRecord transforms the rows in your tables into objects that you can easily work with:
Each table in your database is represented by a model
Each column in the table is represented by the model’s attributes
Each row in that table is represented by an instance of the model
ORMs are also useful because they abstract away the SQL. If you make a change to your queries, you don’t need to worry about SQL syntax because ActiveRecord plays nicely with any type (MySQL, PostgreSQL, SQLite, etc).
For these reasons, ORMs make it easy to interact with your database. Okay, I’m beginning to like ActiveRecord a little more.
What ActiveRecord Returns
ActiveRecord has great documentation with good examples that you should check out. However, I don’t think the documentation is good at breaking down ActiveRecord from a 10,000 foot view.
Let’s start off with my first conundrum: what “things” will ActiveRecord queries return? If you want an official list of all ActiveRecord return types, go into the rails console and do something along the lines of <object>.all.class.ancestors. But I'm going to simplify this for us and break them down into four main categories:
A Calculation
A Hash or Array
An Object or Collection of Objects
An empty ActiveRecord_Relation (which is AR’s way of saying “you fucked up that query but I could still formulate an SQL statement from it, so… here ya go).
First Things First
For the examples below I’ll use the repo here. Fair warning, the dataset is a list of enrollments for Hogwarts School of Witchcraft and Wizardry. The relationships looks like this:
Below are a list of "tools in my toolbox" I use when writing ActiveRecord queries. It's also good to become familiar with how to interact with the Rails command line:
Enter rails c in your terminal. This allows you to interact directly with your application from the command line. The rails console uses IRB, and you can make ActiveRecord queries from here easily.
Enter rails db in your terminal. This connects you directly with the application's development database and you can make SQL queries. (You can also enter psql, \c and then select which database you would like to connect to.) Ctrl + \ to escape.
It usually helps me to see a visual representation of the tables that are generated from ActiveRecord queries. To do this, append your ActiveRecord query with .to_sql to return the SQL that is actually being executed under the hood by the pg gem. Copy it, then hop into the rails db and paste it to see a visual representation of the tables being created.
Let’s go through types of queries based on what we want returned.
1. A Calculation
I’ll start simple. We want the average grade of all students. ActiveRecord uses the same types of aggregate functions, or calculations, that SQL uses.
In the rails console:
$ Enrollment.average(:grade)
=> 0.860555555555555556e2
The average grade of Hogwarts students is 86%. Not bad.
2. A Hash or Array
Now let’s find the student with the highest average grade.
What do I want back? The student ID and a number that is their average grade. This could look like a hash with a key of student ID and value of average grade.
First, let’s think about what this will look like in SQL. We need a GROUP BY in our query. I think of GROUP BY as smashing rows together based on a common attribute value. Grouping by student_id on Enrollments will combine rows based on their value. To illustrate the process:
Notice when I’m using a GROUP BY, I use student_id in the SELECT. If I use *, I get an SQL error that looks like this:
$ SELECT * FROM enrollments GROUP BY student_id;
=> ERROR: column "enrollments.id" must appear in GROUP BY clause or be used in an aggregate function
This is because I told SQL to smash together rows based on the student_id, but I didn’t tell it what to do with the enrollment's id column (or subject_id or grade columns either). The number of rows in our grouped student_id column is less than the number of rows in the enrollment’s id column. It’s almost like you told SQL to do this:
The same confusion happens in ActiveRecord. We get the same SQL error when we just use .group:
$ Enrollment.group(:student_id)
=> ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "enrollments.id" must appear in GROUP BY clause or be used in an aggregate function
But we don’t even use a .select in this ActiveRecord statement, so what is going on here?
By default, ActiveRecord selects ALL attributes on the model, unless told otherwise. You can override this default by specifying specific attributes in the .select OR by using an aggregate function. I’m going to resolve this one by using an aggregate function:
$ Enrollment.group(:student_id)
.average(:grade)
=> {7=>0.7e2, 2=>0.755e2, 6=>0.93e2, 9=>0.935e2, 10=>0.8e2, 12=>0.805e2, 14=>0.823333333333333333e2, 3=>0.9625e2, 11=>0.81e2, 5=>0.94e2, 13=>0.89e2, 1=>0.875e2, 4=>0.885e2, 15=>0.69e2, 16=>0.91e2, 8=>0.94e2}
Now we have a hash of a random assortment of student_ids and their corresponding average grades. Let’s add .order and .first to the query to sort by descending grade and return the highest one:
$ Enrollment.group(:student_id)
.order(“avg(grade) desc”)
.average(:grade)
.first
=> [3, 0.96e2]
*NOTE: The order of AR methods does not matter, EXCEPT when there is an aggregate function on its own. Try switching the .order and .average in the above and we're thrown an error. As soon as .average is called, a hash is created and we can no longer use AR methods on it.
So, the highest average grade is 96% and belongs to student with id #3.
TL;DR
Whenever you are grouping things together, you usually want to find some calculation based on the grouping. In general, when you use .group you will also use an aggregate function.
Our query is great and all, but I don’t know which student has student_id #3 with an average grade of 96% (it’s probably Hermione but I can’t prove it yet). We want the name of the student, but since this attribute lives on the Student model, I need to move my ActiveRecord method from Enrollment to Student.
When I do this, I still need access to the grade attribute on Enrollment. Here’s where .joins comes in to give me access. I also need the name of the student, so I will change my .group from student_id to name:
$ Student.joins(:enrollments)
.group(:name)
.order(“avg(grade) desc”)
.average(:grade)
.first
=> [“Hermione”, 0.96e2]
I told you so!
3. An Object or Collection of Objects
Our above query is great and all, however it is slightly limiting. It only gives me two pieces of information: the student’s name and the highest grade. What if I wanted my Harry Potter fan website to have a page for the student with the highest grade AND display all other information about her, like her house, her classes, etc. I need ActiveRecord to give me back a Student object instead.
This is where .select comes in. Adding .select(students.*) to my query will allow me to grab all the attributes I need on Student. I also need to move our aggregate function .average inside of the .select:
$ Student.select("students.*, avg(grade) AS average_grade")
.joins(:enrollments)
.group(:id)
.order("avg(grade) desc")
.first
=> #<Student id: 3, name: "Hermione Granger", house: "Gryffindor">
As you can see, ActiveRecord returned a Student object.
TL;DR
Whenever you add .select containing a calculation as an argument, ActiveRecord will return objects instead of hashes.
Usually, I will group by the object’s primary key, the id (by default, ActiveRecord assumes :id is the id of the model the method lives on). If I choose to group by anything else, like the name, I will get that same pesky error that says "students.id must appear in your group by clause or appear in an aggregate function." To fix this, I would need to change my .select to only include the name attribute:
$ Student.select("name, avg(grade) AS average_grade”)
.joins(:enrollments)
.group(:name)
.order(“avg(grade) desc”)
.first
=> #<Student id: nil, name: "Hermione Granger">
Unless you are grouping by the model’s primary key, whatever attributes you put in you select must also be in your group by column.
One more thing: AS. By specifying a name for my aggregate function with AS, I have created a virtual attribute called average_grade. Even though I don't see the attribute on the returned object, I can call .average_grade on it and it will return the student's average grade:
Student.select("name, avg(grade) AS average_grade")
.joins(:enrollments)
.group(:name)
.order("avg(grade) desc")
.first
.average_grade
=> 0.9625e2