Benito Serna Tips and tools for Ruby on Rails developers

A guide for preloading associations in rails

Maybe you are already familiar with includes or preload, but you know that a lot of the time you will need more than just preload(:comments).

It is common to have complex nested associations, scopes that you need or want to reuse, places where preloading all the associated records could hurt the performance of your app.

How do you work with complex nested associations?… How do you simplify the preloading of those nested associations?… How do you “preload an scope”?… How do you preload just the latest n of each record?

ActiveRecord gives us some ways to do some of those things, but some of them are not in the documentation or is not that easy to find them.

Well, this is a guide to help you get better on this specific skill of preloading associations, for when you need a more than just a preload(:comments).

It starts with the basics, with just a regular has_many, preload or includes, and build from this to then show you things like…

Do you want to get a PDF with this guide?

Index

Preload direct associations

You can preload direct associations preload, includes or eager_load. They are very similar but with some differences.

If you want to know the difference in more detail you can see What is the difference between includes, preload, eager_load and joins in ActiveRecord?

For example given the next model:

class Post < ApplicationRecord
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :post
end

You will be able to fetch the posts, preloading the comments doing:

posts = Post.preload(:comments)
# or...
posts = Post.includes(:comments)
# or...
posts = Post.eager_load(:comments)

And if you ask for the comments of each post it will not require a new call to the database.

puts posts.map { |post| post.comments.to_a }

You will be able to fetch the comments, preloading the posts doing:

comments = Comment.preload(:post)
# or...
comments = Comment.includes(:post)
# or...
comments = Comment.eager_load(:post)

And in the same way, if you ask for the post of each comment it will not require a new call to the database.

puts comments.map { |comment| comment.post }

Preload nested associations

For nested associations you can pass a hash with the root association as the key and the associations that you want to preload as an array.

The array can include more nested associations.

For example, given the next model:

class Post < ApplicationRecord
  belongs_to :author, class_name: "User"
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :post
  belongs_to :user
  has_many :votes, class_name: "CommentVote"
end

class CommentVote < ApplicationRecord
  belongs_to :voter, class_name: "User"
  belongs_to :comment
end

class User < ApplicationRecord
end

You can fetch the posts preloading comments and users:

posts = Post.preload(comments: :user)

puts posts.flat_map(&:comments).map(&:user)

Fetch comments preloading posts with author:

comments = Comment.preload(post: :author)

puts comments.map(&:post).map(&:author)

Fetch posts preloading comments with users and votes

posts = Post.preload(comments: [:user, :votes])

puts posts.flat_map(&:comments).map(&:user)
puts posts.flat_map(&:comments).map(&:votes).map(&:to_a)

Fetch posts preloading comments with users, votes and voters

posts = Post.preload(comments: [:user, votes: [:voter]])

puts posts.flat_map(&:comments).map(&:user)
puts posts.flat_map(&:comments).flat_map(&:votes).map(&:voter)

Fetch posts preloading author, comments and users

posts = Post.preload(:author, comments: :user)

puts posts.map(&:author)
puts posts.flat_map(&:comments).map(&:user)

Preload an scope

Some times you already have and scope that you want to reuse in a place where it can produce n+1 queries.

For example:

class Post < ApplicationRecord
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :post
  scope :popular, -> { where("votes_count > 5") }
end

class CommentVote < ApplicationRecord
  belongs_to :comment, counter_cache: :votes_count
end

If you ask for the popular comments of one post there is no problem.

post.comments.popular.count

But if you ask for the popular comments for each post on a list, you may have some n+1 queries.

posts = Post.limit(5)

posts.map do |post|
  # Here we are doing a new call to the database for each post
  post.comments.popular.size
end

Here the bad news is that you can’t really “just preload a scope”…

But the good news is that you can solve the problem by creating a new association for the scope and preload that association.

In the example, you can add a new has_many: :popular_comments, passing the scope on the association declaration, like this:

class Post < ApplicationRecord
  has_many :comments
  has_many :popular_comments, -> { popular }, class_name: "Comment"
end

class Comment < ApplicationRecord
  belongs_to :post
  scope :popular, -> { where("votes_count > 5") }
end

class CommentVote < ApplicationRecord
  belongs_to :comment, counter_cache: :votes_count
end

Now, in a similar way, you will be able to ask for the popular comments of one post without problems.

post.popular_comments.count

But now, if you need to ask for the popular comments for each post on a list, you can preload them to avoid the n+1 queries.

posts = Post.preload(:popular_comments).limit(5)

posts.map do |post|
  # Here we will use the preloaded comments. No n+1 queries!
  post.popular_comments.size
end

Preload the top 1 of an association

Imagine, that you have a model like this…

class Posts
  has_many :comments

  def latest_comment
    comments.order(:created_at).last
  end
end

When you are fetching a single post there is no problem…

post = Post.find(id)
puts post.latest_comment

But when you try to fetch a list of posts, the method seems to ignore the includes and runs a query for each post to get the latest_comment for each post.

Post.includes(:comments).each do |post|
  puts post.latest_comment
end

One way of solving this problem is to instead order comments on the association definition and to later just ask for the last sorted comment.

class Post
  has_many :sorted_comments, -> { order(:created_at) }, class_name: "Comment"

  def latest_comment
    sorted_comments.last
  end
end

And now you will be able to preload the sorted_comments ask for the lastest_comment of each post without n+1 queries.

Post.includes(:sorted_comments).each do |post|
  puts post.latest_comment
end

The problem with this method is that you will fetch all the comments for every post in the list and sometimes it could be a problem.

But, you can avoid this problem!

You can use has_one association for the latest_comment, by providing a scope with just the latest_comment for each post.

class Post < ActiveRecord::Base
  has_many :comments
  has_one :latest_comment, -> { latest_comment_per_posts }, class_name: "Comment"
end
class Comment < ActiveRecord::Base
  belongs_to :post

  def self.latest_comment_per_posts
    latest_comments_ids = select("max(id)").group(:post_id)
    where(id: latest_comments_ids)
  end
end

Here, in the first line of latest_comment_per_post we are building the sql to find the latest comment id for each post.

latest_comments_ids = select("max(id)").group(:post_id)

Something interesting is that the query will not be executed it will just pass the ActiveRecord::Relation to the where clause.

where(id: latest_comments_ids)

And now you can preload the has one association and fetch just the latest_comment for each post.

Post.includes(:latest_comment).all.each do |post|
  puts post.latest_comment.body
end

You can explore other ways of doing it on 5 ways to fix the latest-comment n+1 problem.

Preload the top n of an association

Sometimes you need to fetch just the top elements on a list, like:

For example, for the next model:

class Post < ApplicationRecord
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :post
end

If we want to fetch the latest 3 comments for a post, we can write something like this:

class Post < ApplicationRecord
  has_many :comments

  def latest_comments
    comments.order(id: :desc).limit(3)
  end
end

post.latest_comments

But as we saw on preload an scope, if we try to use it on a list we will get n+1 queries

Post.all.each do |post|
  # This will create a call for each post
  post.latest_comments.each do |comment|
    puts comment.body
  end
end

And in this case, you can’t just move the scope to the association, because it will fetch only 3 comments in total, not for each post.

class Post < ApplicationRecord
  has_many :comments
  has_many :latest_comments, -> { order(id: :desc).limit(3) },
    class_name: "Comment"
end

Post.includes(:latest_comments).limit(10).map do |post|
  post.latest_comments.each do |comment|
    puts comment.body
  end
end

On the the sql produced on the preload, you can see that we are just fetching 3 comments in total.

SELECT "posts".* FROM "posts" LIMIT 10

SELECT "comments".*
  FROM "comments"
  WHERE "comments"."post_id" IN (...)
  ORDER BY "comments"."id" DESC
  LIMIT 3

One way to solve the problem could be to, keep just the order on the association declaration, and do the selection in ruby, like this:

class Post < ApplicationRecord
  has_many :comments
  has_many :sorted_comments, -> { order(id: :desc) },
    class_name: "Comment"
end

class Comment < ApplicationRecord
  belongs_to :post
end

Post.includes(:sorted_comments).limit(10).map do |post|
  post.sorted_comments.to_a.first(3).each do |comment|
    puts comment.body
  end
end

But if you have many comments per post it can be a problem. Because it will load them all.

You can check the Benchmarks for the fixes to the latest-comment n+1 problem to get more specific on what is “many comments”.

One way to solve this problem is to use a lateral join. There are more, but you can start with this.

class Post < ApplicationRecord
  has_many :comments
  has_many :latest_comments, -> { latest_comments_per_post },
    class_name: "Comment"
end

class Comment < ApplicationRecord
  belongs_to :post

  def self.latest_comments_per_post
    latest_comments = Post
      .select("latest_comments.*")
      .joins(<<-SQL)
        JOIN LATERAL (
          SELECT * FROM comments
          WHERE post_id = posts.id
          ORDER BY id DESC LIMIT 3
        ) AS latest_comments ON TRUE
      SQL

    from(latest_comments, "comments").order(id: :desc)
  end
end

Post.includes(:latest_comments).each do |post|
  post.latest_comments.each do |comment|
    puts comment.body
  end
end

On the method latest_comments_per_posts we are fetching just the latest three comments per posts. Because it will apply the limit “inside” the join.

def self.latest_comments_per_post
  latest_comments = Post
     # Here we select just the "comments"
    .select("latest_comments.*")

    # Here we are doing the limit inside the join,
    # it will apply it for the comments of each post.
    .joins(<<-SQL)
      JOIN LATERAL (
        SELECT * FROM comments
        WHERE post_id = posts.id
        ORDER BY id DESC LIMIT 3
      ) AS latest_comments ON TRUE
    SQL

  # Here we use the returned "latest_comments" and
  # we name them like "comments". It will tell rails
  # to instantiate Comment objects
  from(latest_comments, "comments").order(id: :desc)
end

You can explore other ways of doing it on 5 ways to fetch the latest-N-of-each record on Rails.

Simplify preloading with has_many through

Preloading nested associations is not that hard, but sometimes could be unnecessary for certain actions.

For example, with the next model:

class Post < ApplicationRecord
  belongs_to :author, class_name: "User"
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :post
  belongs_to :user
  has_many :votes, class_name: "CommentVote"
end

class CommentVote < ApplicationRecord
  belongs_to :voter, class_name: "User"
  belongs_to :comment
end

class User < ApplicationRecord
end

Imagine that you want to fetch the people that have a vote on a comment in a post, the “comment voters”.

To implement it, you can have a method in the post to do that…

class Post < ApplicationRecord
  belongs_to :author, class_name: "User"
  has_many :comments

  def comment_voters
    comments.preload(votes: :voter)
      .flat_map(:votes)
      .flat_map(:voter)
      .uniq
  end
end

And it will correctly preload the votes and voters to compute the result

puts post.comment_voters

But if you try to do it for a list of post, it will execute n+1 queries, and also you will be loading a bunch of data that you don’t need.

You can try to do a “better query” to avoid preloading all that data, but making it work in a list will be very hard.

Fortunately you can use a has_many through: associations to do this operation in a much simpler way.

You can add a has_many :voters to Comment and a has_many :comment_voters to Post.

Like this:

class Post < ApplicationRecord
  belongs_to :author, class_name: "User"
  has_many :comments
  has_many :comment_voters, -> { distinct }, through: :comments, source: :voters
end

class Comment < ApplicationRecord
  belongs_to :post
  belongs_to :user
  has_many :votes, class_name: "CommentVote"
  has_many :voters, through: :votes
end

class CommentVote < ApplicationRecord
  belongs_to :voter, class_name: "User"
  belongs_to :comment
end

class User < ApplicationRecord
end

This will correctly compute the data in just one query

puts post.comment_voters

But, you will also be able to preload the comment_voters just as any other association.

Post.preload(:comment_voters).all.each do |post|
  puts post.comment_voters.to_a
end

Preload object

Sometimes using using the default preloading mechanisms from rails, preload, includes and eager_load is not possible… or maybe it is, but you just can’t figure it out how to do it.

Maybe you know how to represent the association with a has_many :through, but you need something faster, or to use less memory.

In that kind of situations, one thing that you can do is to write a custom object to represent that preload.

For example, with the example from the last section:

class Post < ApplicationRecord
  belongs_to :author, class_name: "User"
  has_many :comments
  has_many :comment_voters, -> { distinct }, through: :comments, source: :voters
end

class Comment < ApplicationRecord
  belongs_to :post
  belongs_to :user
  has_many :votes, class_name: "CommentVote"
  has_many :voters, through: :votes
end

class CommentVote < ApplicationRecord
  belongs_to :voter, class_name: "User"
  belongs_to :comment
end

class User < ApplicationRecord
end

You know that this will correctly compute the data in just one query

post = Post.first
post.comment_voters

And, that you will also be able to preload the comment_voters just as any other association.

Post.preload(:comment_voters).all.each do |post|
  puts post.comment_voters.to_a
end

But…

If you see the executed queries, it could be not as performant as you need…

irb(main):055:0> Post.preload(:comment_voters).to_a.count

Post Load (0.7ms)  SELECT "posts".* FROM "posts"

Comment Load (3.5ms)  SELECT "comments".* FROM "comments" WHERE
"comments"."post_id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?)  [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4],
["post_id", 5], ["post_id", 6], ["post_id", 7], ["post_id", 8], ["post_id", 9],
["post_id", 10], ["post_id", 11], ["post_id", 12], ["post_id", 13], ["post_id",
14], ["post_id", 15], ["post_id", 16]...

CommentVote Load (18.3ms)  SELECT "comment_votes".* FROM "comment_votes" WHERE
"comment_votes"."comment_id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94,
95, 96, 97, 98, 99, 100...

User Load (0.2ms)  SELECT DISTINCT "users".* FROM "users" WHERE "users"."id" IN
(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["id", 8], ["id", 5], ["id", 1], ["id", 6],
["id", 4], ["id", 7], ["id", 3], ["id", 10], ["id", 2], ["id", 9]]

Maybe you can try to eager_load, but maybe it could not be what you need either…

irb(main):059:0> posts = Post.eager_load(:comment_voters).to_a

SQL (47.0ms)  SELECT "posts"."id" AS t0_r0, "posts"."author_id" AS t0_r1,
"posts"."title" AS t0_r2, "posts"."body" AS t0_r3, "posts"."created_at" AS
t0_r4, "posts"."updated_at" AS t0_r5, "users"."id" AS t1_r0, "users"."name" AS
t1_r1, "users"."created_at" AS t1_r2, "users"."updated_at" AS t1_r3 FROM
"posts" LEFT OUTER JOIN "comments" ON "comments"."post_id" = "posts"."id" LEFT
OUTER JOIN "comment_votes" ON "comment_votes"."comment_id" = "comments"."id"
LEFT OUTER JOIN "users" ON "users"."id" = "comment_votes"."voter_id"

Trying a custom query

Maybe what you want is to write a custom query, because you don’t want to fetch data that you don’t need.

Let’s imagine that what you need is to fetch just the users with voted comments on a collection of posts. Without preloading anything else, like this:

posts = Post.limit(30)
comment_voters = User
  .select("users.*, comments.post_id") # selects the user attributes and the post_id of the comment.
  .joins(comment_votes: [:comment])
  .distinct
  .where(comments: { post_id: posts })
  .to_a
User Load (16.5ms)  SELECT DISTINCT users.*, comments.post_id FROM "users"
INNER JOIN "comment_votes" ON "comment_votes"."voter_id" = "users"."id" INNER
JOIN "comments" ON "comments"."id" = "comment_votes"."comment_id" WHERE
"comments"."post_id" IN (SELECT "posts"."id" FROM "posts")

You can then group this voters by post_id, and then in the view you can pick the voters for each post.

posts = Post.limit(30)
comment_voters = User
  .select("users.*, comments.post_id") # selects the user attributes and the post_id of the comment.
  .joins(comment_votes: [:comment])
  .where(comments: { post_id: posts })
  .distinct
  .group_by(&:post_id)

posts.each do |post|
  comment_voters[post.id].map(&:name)
end

That’s ok, but…

What if you don’t want to expose that code in your controller or view?… What if instead we move that to a model object?… Where should you put that code?

A preload object

There are many options, but one of them, is to create a new object to help you with two things:

For this example you can call the object Post::CommentVotersPreload and use it like this:

posts = Post.limit(30)
comment_voters = Post::CommentVotersPreload.new(posts)

posts.each do |post|
  comment_voters.for_post(post).map(&:name)
end

And the implementation could look something like this:

class Post::CommentVotersPreload
  def initialize(posts)
    @posts = posts
  end

  def for_post(post)
    comment_voters[post.id]
  end

  private

  def comment_voters
    @comment_voters ||= fetch_records.group_by(&:post_id)
  end

  def fetch_records
    User.select("users.*, comments.post_id")
      .joins(comment_votes: [:comment])
      .where(comments: { post_id: @posts })
      .distinct
  end
end

You can use this kind of objects in many different problems.

What you will need is:

If you need more examples to understand the idea, you can check this other examples:

When should you use it?

The answer to this question is not that simple (at least for me).

What I could recommend you is to first try to solve the problem in a “railsy” way, using the mechanisms, provided by rails. Normally those mechanisms will compose better with other parts of the framework.

But if you have already tried what rails provides, and you can’t find a way to do what you need to do… or you need something more efficient… or you just think that the final code is better…

Then go for it and use a “preload object”, just try to have a reason.

Final notes

You made to the end! Congrats!

Let’s recap what you can do now…

I hope that you can feel that now you are a better rails developer!

I hope you have enjoyed this guide!

If you have some feedback on how this guide can help you better, please write a comment =)

Related articles