Sometimes just using preload
or includes
is not possible… or maybe it is, but you just can’t figure it out how to do it.
Maybe you want to preload some records matching two keys, or preload a grouped relation.
Maybe you know how to represent the association with a has_many
:through
, but you need something faster, or to use less memory.
In that kind of situations, one thing that you can do is to write a custom object to represent that preload.
Here I will try to explain how and when you can do it with an example.
For example, given the next model:
class Post < ApplicationRecord
belongs_to :author, class_name: "User"
has_many :comments
end
class Comment < ApplicationRecord
belongs_to :post
belongs_to :user
has_many :votes, class_name: "CommentVote"
end
class CommentVote < ApplicationRecord
belongs_to :voter, class_name: "User"
belongs_to :comment
end
class User < ApplicationRecord
has_many :comment_votes, foreign_key: :voter_id
end
Imagine that you want to fetch the people that have a vote on a comment to a post, the “comment voters”.
To implement it, you can have a method in the post to do that…
class Post < ApplicationRecord
belongs_to :author, class_name: "User"
has_many :comments
def comment_voters
comments.preload(votes: :voter)
.flat_map(&:votes)
.flat_map(&:voter)
.uniq
end
end
And it will correctly preload the votes
and voters
to compute the result.
post = Post.first
post.comment_voters
But if you try to do it for a list of post, it will execute n+1 queries… What could you do?
In this particular case you can try to re-shape your associations to solve it with a has_many :through
.
You can add a has_many :voters
to Comment
and a has_many :comment_voters
to Post
.
Like this:
class Post < ApplicationRecord
belongs_to :author, class_name: "User"
has_many :comments
has_many :comment_voters, -> { distinct }, through: :comments, source: :voters
end
class Comment < ApplicationRecord
belongs_to :post
belongs_to :user
has_many :votes, class_name: "CommentVote"
has_many :voters, through: :votes
end
class CommentVote < ApplicationRecord
belongs_to :voter, class_name: "User"
belongs_to :comment
end
class User < ApplicationRecord
end
This will correctly compute the data in just one query
post = Post.first
post.comment_voters
And, you will also be able to preload the comment_voters
just as any other association.
Post.preload(:comment_voters).all.each do |post|
puts post.comment_voters.to_a
end
But…
If you see the executed queries, it could be not as performant as you need…
irb(main):055:0> Post.preload(:comment_voters).to_a.count
Post Load (0.7ms) SELECT "posts".* FROM "posts"
Comment Load (3.5ms) SELECT "comments".* FROM "comments" WHERE
"comments"."post_id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?) [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4],
["post_id", 5], ["post_id", 6], ["post_id", 7], ["post_id", 8], ["post_id", 9],
["post_id", 10], ["post_id", 11], ["post_id", 12], ["post_id", 13], ["post_id",
14], ["post_id", 15], ["post_id", 16]...
CommentVote Load (18.3ms) SELECT "comment_votes".* FROM "comment_votes" WHERE
"comment_votes"."comment_id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94,
95, 96, 97, 98, 99, 100...
User Load (0.2ms) SELECT DISTINCT "users".* FROM "users" WHERE "users"."id" IN
(?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [["id", 8], ["id", 5], ["id", 1], ["id", 6],
["id", 4], ["id", 7], ["id", 3], ["id", 10], ["id", 2], ["id", 9]]
Maybe you can try to eager_load
, but maybe it could not be what you need either…
irb(main):059:0> posts = Post.eager_load(:comment_voters).to_a
SQL (47.0ms) SELECT "posts"."id" AS t0_r0, "posts"."author_id" AS t0_r1,
"posts"."title" AS t0_r2, "posts"."body" AS t0_r3, "posts"."created_at" AS
t0_r4, "posts"."updated_at" AS t0_r5, "users"."id" AS t1_r0, "users"."name" AS
t1_r1, "users"."created_at" AS t1_r2, "users"."updated_at" AS t1_r3 FROM
"posts" LEFT OUTER JOIN "comments" ON "comments"."post_id" = "posts"."id" LEFT
OUTER JOIN "comment_votes" ON "comment_votes"."comment_id" = "comments"."id"
LEFT OUTER JOIN "users" ON "users"."id" = "comment_votes"."voter_id"
Maybe what you want is to write a custom query, because you don’t want to fetch data that you don’t need.
Let’s imagine that what you need is to fetch just the users with voted comments on a collection of posts. Without preloading anything else, like this:
posts = Post.limit(30)
comment_voters = User
.select("users.*, comments.post_id") # selects the user attributes and the post_id of the comment.
.joins(comment_votes: [:comment])
.distinct
.where(comments: { post_id: posts })
.to_a
User Load (16.5ms) SELECT DISTINCT users.*, comments.post_id FROM "users"
INNER JOIN "comment_votes" ON "comment_votes"."voter_id" = "users"."id" INNER
JOIN "comments" ON "comments"."id" = "comment_votes"."comment_id" WHERE
"comments"."post_id" IN (SELECT "posts"."id" FROM "posts")
You can then group this voters by post_id
, and then in the view you can pick the voters for each post.
posts = Post.limit(30)
comment_voters = User
.select("users.*, comments.post_id") # selects the user attributes and the post_id of the comment.
.joins(comment_votes: [:comment])
.where(comments: { post_id: posts })
.distinct
.group_by(&:post_id)
posts.each do |post|
comment_voters[post.id].map(&:name)
end
That’s ok, but…
What if you don’t want to expose that code in your controller or view?… What if instead we move that to a model object?… Where should you put that code?
There are many options, but one of them, is to create a new object to help you with two things:
For this example you can call the object Post::CommentVotersPreload
and use it like this:
posts = Post.limit(30)
comment_voters = Post::CommentVotersPreload.new(posts)
posts.each do |post|
comment_voters.for_post(post).map(&:name)
end
And the implementation could look something like this:
class Post::CommentVotersPreload
def initialize(posts)
@posts = posts
end
def for_post(post)
comment_voters[post.id]
end
private
def comment_voters
@comment_voters ||= fetch_records.group_by(&:post_id)
end
def fetch_records
User.select("users.*, comments.post_id")
.joins(comment_votes: [:comment])
.where(comments: { post_id: @posts })
.distinct
end
end
You can use this kind of objects in many different problems.
What you will need is:
fetch_records
method to fetch the records you want to preload in one call to the database (or at least not n+1 queries).find_for
method to return the records related to each associated object.If you need more examples to understand the idea, you can check this other examples:
The answer to this question is not that simple (at least for me).
What I could recommend you is to first try to solve the problem in a “railsy” way… Try to use the mechanisms, provided by rails. Normally those mechanisms will compose better with other parts of the framework.
If you don’t know how to do it, because you don’t know enough about rails, try to study the rails guides, specially the associations guide, and the query interface guide. You can also study my guide for preloading associations.
But if you already know what rails provides, and you can’t find a way to do what you need to do… or you need something more efficient… or you just think that the final code is better…
Then go for it and use a “preload object”, just try to have a reason.
Learn just enough fundamentals to be fluent preloading associations with ActiveRecord, and start helping your team to avoid n+1 queries on production.