I have some posts about using lateral joins in rails to fetch the “top N” of each record. Some months ago Ben Sheldon helped me see a performance problem on the queries that I was using.
Here I try to explain the problem, the solution proposed by him, and also an introduction about a tool he published to build this kind of “top N” queries using lateral joins.
In his words, “The outermost condition of the association (e.g. WHERE user_id IN ($1, ...)
) doesn’t get pushed down in the current query. This leads to the lateral join being applied to all records before the condition is applied.”
The EXPLAIN
helps to see this problem. This is the example that has the problem:
class User < ActiveRecord::Base
has_many :posts
has_many :last_posts, -> { last_n_per_user(3) }, class_name: "Post"
end
class Post < ActiveRecord::Base
scope :last_n_per_user, ->(n) {
sql = <<-SQL
JOIN LATERAL (
SELECT * FROM posts
WHERE user_id = users.id
ORDER BY id DESC LIMIT :limit
) AS selected_posts ON TRUE
SQL
selected_posts = User
.select("selected_posts.*")
.joins(User.sanitize_sql([sql, limit: n]))
from(selected_posts, "posts")
}
end
puts "users count: 100"
puts "posts count: 10000"
users = User.preload(:last_posts).limit(2)
puts users.explain
This returns…
users count: 100
posts count: 10000
EXPLAIN SELECT "posts".* FROM (SELECT selected_posts.* FROM "users" JOIN LATERAL (
SELECT * FROM posts
WHERE user_id = users.id
ORDER BY id DESC LIMIT 3
) AS selected_posts ON TRUE) posts WHERE "posts"."user_id" IN ($1, $2) [["user_id", 1], ["user_id", 2]]
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.29..1139.25 rows=100 width=25)
-> Seq Scan on users (cost=0.00..2.00 rows=100 width=8)
-> Subquery Scan on selected_posts (cost=0.29..11.36 rows=1 width=25)
Filter: (selected_posts.user_id = ANY ('{1,2}'::integer[]))
-> Limit (cost=0.29..11.32 rows=3 width=25)
-> Index Scan Backward using posts_pkey on posts (cost=0.29..368.29 rows=100 width=25)
Filter: (user_id = users.id)
In this example I have 100 users in the database, and even when I used limit(2)
in the query, the plan shows 100 rows, instead of 2.
The proposed solution by Ben Sheldon, looks like this…
class User < ActiveRecord::Base
has_many :posts
has_many :last_posts, -> { last_n_per_user(3) }, class_name: "Post"
end
class Post < ActiveRecord::Base
scope :last_n_per_user, ->(n) {
selected_posts_table = Arel::Table.new('selected_posts')
sql = <<-SQL
JOIN LATERAL (
SELECT * FROM posts
WHERE user_id = users.id
ORDER BY id DESC LIMIT :limit
) AS #{selected_posts_table.name} ON TRUE
SQL
selected_posts = User
.select(User.arel_table["id"].as("user_id"))
.select(Post.column_names.excluding("user_id").map { |column| selected_posts_table[column] })
.joins(User.sanitize_sql([sql, limit: n]))
from(selected_posts, "posts")
}
end
ActiveRecord::Base.connection.execute("ANALYZE users, posts")
puts "users count: #{User.count}"
puts "posts count: #{Post.count}"
users = User.preload(:last_posts).limit(2)
puts users.explain
Here using Arel, he SELECT users.id as user_id
and then enumerates the columns from the Post record excluding the user_id
.
In his words, he does it because… “The error happens because the query in the sub-select has two user_id
columns: one from users.id AS user_id
and then the user_id
from latest.*
. That’s why it’s necessary to enumerate the columns to omit the latter. It’s not great looking 😞”
And now the EXPLAIN
shows that the scan for users is limited to 2 users.
users count: 100
posts count: 10000
EXPLAIN SELECT "posts".* FROM (SELECT "users"."id" AS user_id, "selected_posts"."id", "selected_posts"."title" FROM "users" JOIN LATERAL (
SELECT * FROM posts
WHERE user_id = users.id
ORDER BY id DESC LIMIT 3
) AS selected_posts ON TRUE) posts WHERE "posts"."user_id" IN ($1, $2) [["user_id", 1], ["user_id", 2]]
QUERY PLAN
--------------------------------------------------------------------------------------------------
Nested Loop (cost=0.29..25.02 rows=6 width=29)
-> Seq Scan on users (cost=0.00..2.25 rows=2 width=8)
Filter: (id = ANY ('{1,2}'::bigint[]))
-> Limit (cost=0.29..11.32 rows=3 width=25)
-> Index Scan Backward using posts_pkey on posts (cost=0.29..368.29 rows=100 width=25)
Filter: (user_id = users.id)
(6 rows)
Ben Sheldon published a new tool called activerecord-has_some_of_many that will give you methods like has_one_of_many
and has_some_of_many
to automatically build associations like the one in this example.
It will help you build “top N” queries using lateral joins that you will be able to preload (or includes
) to avoid N+1 queries and also are compatible with typical queries and batch methods (find_each
, in_batches
, find_in_batches
).
Learn just enough fundamentals to be fluent preloading associations with ActiveRecord, and start helping your team to avoid n+1 queries on production.