Efficiently Counting Records in Rails
Why We Don't Use all.length
Main Content
As a junior Rails developer, I’ve been analyzing some of the anti-patterns I and
other junior developers have fallen into. One of the most common has been the
misuse of .all
in ActiveRecord, particularly when counting database records.
Counting seems easy, at least for small collections, and vanilla ruby provides multiple aliases to do it:
irb(main):001:0> [1,2,3].count
=> 3
irb(main):002:0> [1,2,3].size
=> 3
irb(main):003:0> [1,2,3].length
=> 3
The problem with naively transferring this methodology to Rail is that ActiveRecord collections are backed by databases which, for large applications, can be unreasonably large to treat as simple arrays and hashes.
Counting Rails Collections, the Wrong Way
So when going from Ruby to Rails, a newcomer might say “well, lets grab all the records, and issue any of those three methods to get the count.
irb(main):001:0> posts = Post.all
Post Load (0.5ms) SELECT "post".* FROM "posts"
=> #<ActiveRecord::Relation [#<Post id: 1, user_id: 2, text: "Hello there",
created_at: "2015-11-10 23:45:59", updated_at: "2015-11-10 23:45:59">, ...]
irb(main):002:0> posts.count
10
irb(main):003:0> posts.length
10
Notice the SQL statement that gets called.
SELECT "post".* FROM "posts"
This is asking the database to return the entire table! For a small application in is early stages, it might not seam like much (the example application I pulled this from has only ten records at the moment), but imagine how high the load on the database, application server, and network would be if every time you wanted a count of items.
A Better Approach and Rails Optimizations
If we want to avoid pulling down the entire database just to get a count of items, we can try chaining the operations together:
irb(main):001:0> Post.all.count
(0.4ms) SELECT COUNT(*) FROM "posts"
=> 10
irb(main):002:0> Post.all.size
(0.4ms) SELECT COUNT(*) FROM "posts"
=> 10
irb(main):003:0> Post.all.length
Posts Load (0.6ms) SELECT "posts".* FROM "posts"
=> 10
Notice that when calling .all.count
and .all.size
, Rails automatically
optimizes the database call, to have the database itself return the COUNT
.
However, the same optimization is not made for .all.length
.
SELECT COUNT(*) FROM "posts"
The upshot is, dont call .all.length
!
To take things to a further extreme, what happens to the database call when we
omit .all
from the chain?
irb(main):001:0> Post.count
(0.4ms) SELECT COUNT(*) FROM "posts"
=> 10
irb(main):002:0> Post.size
NoMethodError: undefined method `size' for #<Class:0x007ffd0375efd8>
irb(main):003:0> Post.length
NoMethodError: undefined method `length' for #<Class:0x007ffd0375efd8>
So calling .count
gives us the same optimized database call, while neither
.size
nor .length
are defined.
Use the SQL
This simple example shows the power of using the Rails console and understanding
the basics of SQL syntax; knowing exactly what your application code is doing
with the database behind-the-scenes is crucial to realizing the best performance
from the database. As you build more complicated database queries, it’s
important—especially until you’ve gained expertise in Rails—to test them out
on the console. More complicated queries should probably make use of named
scopes or custom SQL queries using .where