Benito Serna Tips and tools for Ruby on Rails developers

Fixing a bug in my lateral joins queries with rails

January 21, 2025

I have some posts about using lateral joins in rails to fetch the “top N” of each record. Some months ago Ben Sheldon helped me see a performance problem on the queries that I was using.

Here I try to explain the problem, the solution proposed by him, and also an introduction about a tool he published to build this kind of “top N” queries using lateral joins.

The problem

In his words, “The outermost condition of the association (e.g. WHERE user_id IN ($1, ...)) doesn’t get pushed down in the current query. This leads to the lateral join being applied to all records before the condition is applied.”

The EXPLAIN helps to see this problem. This is the example that has the problem:

class User < ActiveRecord::Base
  has_many :posts
  has_many :last_posts, -> { last_n_per_user(3) }, class_name: "Post"
end

class Post < ActiveRecord::Base
  scope :last_n_per_user, ->(n) {
    sql = <<-SQL
      JOIN LATERAL (
        SELECT * FROM posts
        WHERE user_id = users.id
        ORDER BY id DESC LIMIT :limit
      ) AS selected_posts ON TRUE
    SQL

    selected_posts = User
    .select("selected_posts.*")
    .joins(User.sanitize_sql([sql, limit: n]))

    from(selected_posts, "posts")
  }
end

puts "users count: 100"
puts "posts count: 10000"
users = User.preload(:last_posts).limit(2)
puts users.explain

This returns…

users count: 100
posts count: 10000
EXPLAIN SELECT "posts".* FROM (SELECT selected_posts.* FROM "users" JOIN LATERAL (
          SELECT * FROM posts
          WHERE user_id = users.id
          ORDER BY id DESC LIMIT 3
        ) AS selected_posts ON TRUE) posts WHERE "posts"."user_id" IN ($1, $2) [["user_id", 1], ["user_id", 2]]
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..1139.25 rows=100 width=25)
   ->  Seq Scan on users  (cost=0.00..2.00 rows=100 width=8)
   ->  Subquery Scan on selected_posts  (cost=0.29..11.36 rows=1 width=25)
         Filter: (selected_posts.user_id = ANY ('{1,2}'::integer[]))
         ->  Limit  (cost=0.29..11.32 rows=3 width=25)
               ->  Index Scan Backward using posts_pkey on posts  (cost=0.29..368.29 rows=100 width=25)
                     Filter: (user_id = users.id)

In this example I have 100 users in the database, and even when I used limit(2) in the query, the plan shows 100 rows, instead of 2.

Proposed solution

The proposed solution by Ben Sheldon, looks like this…

class User < ActiveRecord::Base
  has_many :posts
  has_many :last_posts, -> { last_n_per_user(3) }, class_name: "Post"
end

class Post < ActiveRecord::Base
  scope :last_n_per_user, ->(n) {
    selected_posts_table = Arel::Table.new('selected_posts')

    sql = <<-SQL
      JOIN LATERAL (
        SELECT * FROM posts
        WHERE user_id = users.id
        ORDER BY id DESC LIMIT :limit
      ) AS #{selected_posts_table.name} ON TRUE
    SQL

    selected_posts = User
      .select(User.arel_table["id"].as("user_id"))
      .select(Post.column_names.excluding("user_id").map { |column| selected_posts_table[column] })
      .joins(User.sanitize_sql([sql, limit: n]))

    from(selected_posts, "posts")
  }
end

ActiveRecord::Base.connection.execute("ANALYZE users, posts")
puts "users count: #{User.count}"
puts "posts count: #{Post.count}"
users = User.preload(:last_posts).limit(2)
puts users.explain

Here using Arel, he SELECT users.id as user_id and then enumerates the columns from the Post record excluding the user_id.

In his words, he does it because… “The error happens because the query in the sub-select has two user_id columns: one from users.id AS user_id and then the user_id from latest.*. That’s why it’s necessary to enumerate the columns to omit the latter. It’s not great looking 😞”

And now the EXPLAIN shows that the scan for users is limited to 2 users.

users count: 100
posts count: 10000
EXPLAIN SELECT "posts".* FROM (SELECT "users"."id" AS user_id, "selected_posts"."id", "selected_posts"."title" FROM "users" JOIN LATERAL (
          SELECT * FROM posts
          WHERE user_id = users.id
          ORDER BY id DESC LIMIT 3
        ) AS selected_posts ON TRUE) posts WHERE "posts"."user_id" IN ($1, $2) [["user_id", 1], ["user_id", 2]]
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..25.02 rows=6 width=29)
   ->  Seq Scan on users  (cost=0.00..2.25 rows=2 width=8)
         Filter: (id = ANY ('{1,2}'::bigint[]))
   ->  Limit  (cost=0.29..11.32 rows=3 width=25)
         ->  Index Scan Backward using posts_pkey on posts  (cost=0.29..368.29 rows=100 width=25)
               Filter: (user_id = users.id)
(6 rows)

A new tool to define top-N-per-group associations using lateral joins

Ben Sheldon published a new tool called activerecord-has_some_of_many that will give you methods like has_one_of_many and has_some_of_many to automatically build associations like the one in this example.

It will help you build “top N” queries using lateral joins that you will be able to preload (or includes) to avoid N+1 queries and also are compatible with typical queries and batch methods (find_eachin_batchesfind_in_batches).

Related articles

Download a free ebook to learn the basics of n+1 queries on Rails basics

Learn just enough fundamentals to be fluent preloading associations with ActiveRecord, and start helping your team to avoid n+1 queries on production.