Benito Serna
Ruby/Rails, TDD, Software...

Benchmarks for the fixes to the latest-comment n+1 problem

What fix should you pick for your n+1 queries problem?

When you are trying to solve an n+1 problem, hopefully you will find different ways to solve it. But what is the one that you should use?

You can search the internet, but opinions can be overwhelming and confusing, because… “Window functions are great option here"… or "Window functions will work, but they can be horrible on performance"… "I’d go with caching (in a column)"… "You should use Russian-doll caching"… "I’d write custom SQL"…

Talking specifically about the fixes that I share to you to solve the "latest comment” n+1 queries problem… I think you can use the first one, “the association with default order”, unless your metrics are telling you that you need something different.

But any way, I think that been aware of the performance implications of your code and understanding the tradeoffs is a good thing

That’s why I want to share with you some benchmarks to the 5 fixes in different scenarios, to help you understand which solution is better in each case, and which solution would be enough for your case.

Benchmark structure

You will have results for this scenarios…

Each scenario will include the memory and the “iterations per second” benchmarks.

With this “reports"…

All the reports where tested in a MacBook Pro (13-inch 2020) with:

Benchmark code

This is the code used for the benchmark…

require "./config"
require "./seeds"
require "benchmark"
require "benchmark/memory"
require "benchmark/ips"

class Post < ActiveRecord::Base
  has_many :comments
  has_many :sorted_comments, -> { order(:id) }, class_name: "Comment"
  has_one :latest_comment, -> { Comment.latest_comments_for_posts }, class_name: "Comment"
  belongs_to :cached_comment, class_name: "Comment"
end

class Comment < ActiveRecord::Base
  belongs_to :post
  after_commit :update_cached_comment

  def self.latest_comments_for_posts
    where(id: select("max(id)").group(:post_id))
  end

  private

  def update_cached_comment
    post.update(cached_comment_id: id)
  end
end

class Feed
  def posts
    @posts ||= fetch_posts
  end

  def fetch_posts
    posts = Post.all
    comments = Comment.latest_comments_for_posts.group_by(&:post_id)
    posts.map { |post| FeedPost.new(post, comments[post.id]&.first) }
  end
end

class FeedPost
  attr_reader :latest_comment

  def initialize(post, latest_comment)
    @post = post
    @latest_comment = latest_comment
  end
end

def cache(post)
  @cache ||= {}
  @cache[post.id] ||= yield
end

def clear_cache!
  @cache = {}
end

Seeds.run(posts_count: 20, comments_count: 1000)

n = 1
puts ""
puts "Memory Benchmark"
puts "---------"

Benchmark.memory do |benchmark|
  benchmark.report("sorted_comments") do
    n.times do
      Post.includes(:sorted_comments).all.each do |post|
        post.sorted_comments.last.body
      end
    end
  end

  benchmark.report("latest_comment") do
    n.times do
      Post.includes(:latest_comment).all.each do |post|
        post.latest_comment.body
      end
    end
  end

  benchmark.report("feed looping") do
    n.times do
      Feed.new.posts.each do |post|
        post.latest_comment.body
      end
    end
  end

  feed = Feed.new
  benchmark.report("cached feed") do
    n.times do
      feed.posts.each do |post|
        post.latest_comment.body
      end
    end
  end

  benchmark.report("cached_comment") do
    n.times do
      Post.includes(:cached_comment).all.each do |post|
        post.cached_comment.body
      end
    end
  end

  benchmark.report("russian doll") do
    n.times do
      Post.all.each do |post|
        cache(post) do
          post.latest_comment.body
        end
      end
    end
  end

  clear_cache!
  benchmark.compare!
end


puts ""
puts "Benchmark ips"
puts "---------"


Benchmark.ips do |benchmark|
  benchmark.config(:time => 5, :warmup => 1)

  benchmark.report("sorted_comments") do
    Post.includes(:sorted_comments).all.each do |post|
      post.sorted_comments.last.body
    end
  end

  benchmark.report("latest_comment") do
    Post.includes(:latest_comment).all.each do |post|
      post.latest_comment.body
    end
  end

  benchmark.report("feed looping") do
    Feed.new.posts.each do |post|
      post.latest_comment.body
    end
  end

  feed = Feed.new
  benchmark.report("cached feed") do
    n.times do
      feed.posts.each do |post|
        post.latest_comment.body
      end
    end
  end

  benchmark.report("cached_comment") do
    Post.includes(:cached_comment).all.each do |post|
      post.cached_comment.body
    end
  end

  benchmark.report("russian doll") do
    Post.all.each do |post|
      cache(post) do
        post.latest_comment.body
      end
    end
  end

  benchmark.compare!
end

You can see the code on github at bhserna/latest_comment_n-1_fixes_benchmark

Benchmark results

Now here you have the reports…

Scenario 1: Posts: 20, Comments: 5 per post

Memory Benchmark
---------
Calculating -------------------------------------
     sorted_comments   484.348k memsize (    15.950k retained)
                         5.630k objects (   146.000  retained)
                        50.000  strings (    50.000  retained)
      latest_comment   116.855k memsize (     2.065k retained)
                         1.426k objects (    16.000  retained)
                        50.000  strings (    15.000  retained)
        feed looping    55.176k memsize (     0.000  retained)
                       702.000  objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
         cached feed    55.136k memsize (    35.840k retained)
                       701.000  objects (   390.000  retained)
                        50.000  strings (    45.000  retained)
      cached_comment    70.032k memsize (    40.000  retained)
                       883.000  objects (     1.000  retained)
                        50.000  strings (     1.000  retained)
        russian doll   488.744k memsize (     1.304k retained)
                         7.808k objects (    24.000  retained)
                        50.000  strings (    20.000  retained)

Comparison:
         cached feed:      55136 allocated
        feed looping:      55176 allocated - 1.00x more
      cached_comment:      70032 allocated - 1.27x more
      latest_comment:     116855 allocated - 2.12x more
     sorted_comments:     484348 allocated - 8.78x more
        russian doll:     488744 allocated - 8.86x more

Benchmark ips
---------
Warming up --------------------------------------
     sorted_comments    29.000  i/100ms
      latest_comment    58.000  i/100ms
        feed looping    92.000  i/100ms
         cached feed    14.814k i/100ms
      cached_comment    76.000  i/100ms
        russian doll   324.000  i/100ms
Calculating -------------------------------------
     sorted_comments    317.872  (± 4.4%) i/s -      1.595k in   5.026951s
      latest_comment    606.529  (± 4.1%) i/s -      3.074k in   5.077027s
        feed looping      1.020k (± 3.4%) i/s -      5.152k in   5.056148s
         cached feed    153.459k (± 2.5%) i/s -    770.328k in   5.022960s
      cached_comment    803.337  (± 2.7%) i/s -      4.028k in   5.017891s
        russian doll      3.322k (± 2.1%) i/s -     16.848k in   5.073899s

Comparison:
         cached feed:   153459.3 i/s
        russian doll:     3322.0 i/s - 46.19x  (± 0.00) slower
        feed looping:     1020.1 i/s - 150.43x  (± 0.00) slower
      cached_comment:      803.3 i/s - 191.03x  (± 0.00) slower
      latest_comment:      606.5 i/s - 253.01x  (± 0.00) slower
     sorted_comments:      317.9 i/s - 482.77x  (± 0.00) slower

Scenario 2: Posts: 20, Comments: 10 per post

Memory Benchmark
---------
Calculating -------------------------------------
     sorted_comments   586.444k memsize (    15.950k retained)
                         6.930k objects (   146.000  retained)
                        50.000  strings (    50.000  retained)
      latest_comment   116.855k memsize (     2.065k retained)
                         1.426k objects (    16.000  retained)
                        50.000  strings (    15.000  retained)
        feed looping    55.176k memsize (     0.000  retained)
                       702.000  objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
         cached feed    55.136k memsize (    35.840k retained)
                       701.000  objects (   390.000  retained)
                        50.000  strings (    45.000  retained)
      cached_comment    70.032k memsize (    40.000  retained)
                       883.000  objects (     1.000  retained)
                        50.000  strings (     1.000  retained)
        russian doll   488.744k memsize (     1.304k retained)
                         7.808k objects (    24.000  retained)
                        50.000  strings (    20.000  retained)

Comparison:
         cached feed:      55136 allocated
        feed looping:      55176 allocated - 1.00x more
      cached_comment:      70032 allocated - 1.27x more
      latest_comment:     116855 allocated - 2.12x more
        russian doll:     488744 allocated - 8.86x more
     sorted_comments:     586444 allocated - 10.64x more

Benchmark ips
---------
Warming up --------------------------------------
     sorted_comments    25.000  i/100ms
      latest_comment    52.000  i/100ms
        feed looping    95.000  i/100ms
         cached feed    15.271k i/100ms
      cached_comment    78.000  i/100ms
        russian doll   325.000  i/100ms
Calculating -------------------------------------
     sorted_comments    240.019  (± 5.0%) i/s -      1.200k in   5.011152s
      latest_comment    532.712  (± 4.1%) i/s -      2.704k in   5.085499s
        feed looping    987.484  (± 2.8%) i/s -      4.940k in   5.006529s
         cached feed    156.682k (± 1.7%) i/s -    794.092k in   5.069703s
      cached_comment    744.650  (± 7.4%) i/s -      3.744k in   5.059516s
        russian doll      3.135k (± 4.9%) i/s -     15.925k in   5.092878s

Comparison:
         cached feed:   156681.7 i/s
        russian doll:     3134.7 i/s - 49.98x  (± 0.00) slower
        feed looping:      987.5 i/s - 158.67x  (± 0.00) slower
      cached_comment:      744.7 i/s - 210.41x  (± 0.00) slower
      latest_comment:      532.7 i/s - 294.12x  (± 0.00) slower
     sorted_comments:      240.0 i/s - 652.79x  (± 0.00) slower

Scenario 3: Posts: 20, Comments: 50 per post

Memory Benchmark
---------
Calculating -------------------------------------
     sorted_comments     1.408M memsize (    15.950k retained)
                        17.330k objects (   146.000  retained)
                        50.000  strings (    50.000  retained)
      latest_comment   116.855k memsize (     2.065k retained)
                         1.426k objects (    16.000  retained)
                        50.000  strings (    15.000  retained)
        feed looping    55.176k memsize (     0.000  retained)
                       702.000  objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
         cached feed    55.136k memsize (    35.840k retained)
                       701.000  objects (   390.000  retained)
                        50.000  strings (    45.000  retained)
      cached_comment    70.032k memsize (    40.000  retained)
                       883.000  objects (     1.000  retained)
                        50.000  strings (     1.000  retained)
        russian doll   488.744k memsize (     1.304k retained)
                         7.808k objects (    24.000  retained)
                        50.000  strings (    20.000  retained)

Comparison:
         cached feed:      55136 allocated
        feed looping:      55176 allocated - 1.00x more
      cached_comment:      70032 allocated - 1.27x more
      latest_comment:     116855 allocated - 2.12x more
        russian doll:     488744 allocated - 8.86x more
     sorted_comments:    1408084 allocated - 25.54x more

Benchmark ips
---------
Warming up --------------------------------------
     sorted_comments     8.000  i/100ms
      latest_comment    29.000  i/100ms
        feed looping    92.000  i/100ms
         cached feed    15.385k i/100ms
      cached_comment    76.000  i/100ms
        russian doll   343.000  i/100ms
Calculating -------------------------------------
     sorted_comments     87.033  (± 3.4%) i/s -    440.000  in   5.060406s
      latest_comment    287.003  (± 3.8%) i/s -      1.450k in   5.060856s
        feed looping    877.671  (± 3.0%) i/s -      4.416k in   5.036340s
         cached feed    152.899k (± 2.3%) i/s -    769.250k in   5.033712s
      cached_comment    719.988  (± 5.6%) i/s -      3.648k in   5.084290s
        russian doll      3.284k (± 6.0%) i/s -     16.464k in   5.035281s

Comparison:
         cached feed:   152899.4 i/s
        russian doll:     3284.4 i/s - 46.55x  (± 0.00) slower
        feed looping:      877.7 i/s - 174.21x  (± 0.00) slower
      cached_comment:      720.0 i/s - 212.36x  (± 0.00) slower
      latest_comment:      287.0 i/s - 532.75x  (± 0.00) slower
     sorted_comments:       87.0 i/s - 1756.80x  (± 0.00) slower

Scenario 4: Posts: 20, Comments: 100 per post

Memory Benchmark
---------
Calculating -------------------------------------
     sorted_comments     2.434M memsize (    15.950k retained)
                        30.330k objects (   146.000  retained)
                        50.000  strings (    50.000  retained)
      latest_comment   116.855k memsize (     2.065k retained)
                         1.426k objects (    16.000  retained)
                        50.000  strings (    15.000  retained)
        feed looping    55.176k memsize (     0.000  retained)
                       702.000  objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
         cached feed    55.136k memsize (    35.840k retained)
                       701.000  objects (   390.000  retained)
                        50.000  strings (    45.000  retained)
      cached_comment    70.032k memsize (    40.000  retained)
                       883.000  objects (     1.000  retained)
                        50.000  strings (     1.000  retained)
        russian doll   488.744k memsize (     1.304k retained)
                         7.808k objects (    24.000  retained)
                        50.000  strings (    20.000  retained)

Comparison:
         cached feed:      55136 allocated
        feed looping:      55176 allocated - 1.00x more
      cached_comment:      70032 allocated - 1.27x more
      latest_comment:     116855 allocated - 2.12x more
        russian doll:     488744 allocated - 8.86x more
     sorted_comments:    2433508 allocated - 44.14x more

Benchmark ips
---------
Warming up --------------------------------------
     sorted_comments     4.000  i/100ms
      latest_comment    19.000  i/100ms
        feed looping    79.000  i/100ms
         cached feed    15.535k i/100ms
      cached_comment    85.000  i/100ms
        russian doll   353.000  i/100ms
Calculating -------------------------------------
     sorted_comments     47.354  (± 4.2%) i/s -    240.000  in   5.075750s
      latest_comment    185.682  (± 2.2%) i/s -    931.000  in   5.016461s
        feed looping    754.015  (± 1.7%) i/s -      3.792k in   5.030455s
         cached feed    154.707k (± 1.1%) i/s -    776.750k in   5.021351s
      cached_comment    819.179  (± 2.0%) i/s -      4.165k in   5.086336s
        russian doll      3.390k (± 2.6%) i/s -     17.297k in   5.104940s

Comparison:
         cached feed:   154707.3 i/s
        russian doll:     3390.5 i/s - 45.63x  (± 0.00) slower
      cached_comment:      819.2 i/s - 188.86x  (± 0.00) slower
        feed looping:      754.0 i/s - 205.18x  (± 0.00) slower
      latest_comment:      185.7 i/s - 833.18x  (± 0.00) slower
     sorted_comments:       47.4 i/s - 3267.07x  (± 0.00) slower

Scenario 5: Posts: 20, Comments: 1000 per post

Memory Benchmark
---------
Calculating -------------------------------------
     sorted_comments    20.977M memsize (    15.950k retained)
                       264.330k objects (   146.000  retained)
                        50.000  strings (    50.000  retained)
      latest_comment   116.855k memsize (     2.065k retained)
                         1.426k objects (    16.000  retained)
                        50.000  strings (    15.000  retained)
        feed looping    55.176k memsize (     0.000  retained)
                       702.000  objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
         cached feed    55.136k memsize (    35.840k retained)
                       701.000  objects (   390.000  retained)
                        50.000  strings (    45.000  retained)
      cached_comment    70.032k memsize (    40.000  retained)
                       883.000  objects (     1.000  retained)
                        50.000  strings (     1.000  retained)
        russian doll   488.744k memsize (     1.304k retained)
                         7.808k objects (    24.000  retained)
                        50.000  strings (    20.000  retained)

Comparison:
         cached feed:      55136 allocated
        feed looping:      55176 allocated - 1.00x more
      cached_comment:      70032 allocated - 1.27x more
      latest_comment:     116855 allocated - 2.12x more
        russian doll:     488744 allocated - 8.86x more
     sorted_comments:   20977100 allocated - 380.46x more

Benchmark ips
---------
Warming up --------------------------------------
     sorted_comments     1.000  i/100ms
      latest_comment    22.000  i/100ms
        feed looping    25.000  i/100ms
         cached feed    14.818k i/100ms
      cached_comment    57.000  i/100ms
        russian doll   288.000  i/100ms
Calculating -------------------------------------
     sorted_comments      4.665  (± 0.0%) i/s -     24.000  in   5.177711s
      latest_comment    206.056  (± 7.3%) i/s -      1.034k in   5.050651s
        feed looping    239.214  (± 4.2%) i/s -      1.200k in   5.026543s
         cached feed    146.296k (± 5.1%) i/s -    740.900k in   5.079435s
      cached_comment    706.969  (±10.8%) i/s -      3.534k in   5.064015s
        russian doll      3.235k (± 4.9%) i/s -     16.416k in   5.087702s

Comparison:
         cached feed:   146296.1 i/s
        russian doll:     3235.0 i/s - 45.22x  (± 0.00) slower
      cached_comment:      707.0 i/s - 206.93x  (± 0.00) slower
        feed looping:      239.2 i/s - 611.57x  (± 0.00) slower
      latest_comment:      206.1 i/s - 709.98x  (± 0.00) slower
     sorted_comments:        4.7 i/s - 31362.27x  (± 0.00) slower

So… What fix should you pick?

I think that you can use the simplest solution, “the association with default order”, maybe until you are in the 50 comments per post, after that maybe you could try other options like “fragment caching” or “looping through the latest comment for each post”.

Download a cheatsheet

You can also download a cheatsheet to help you decide which fix to pick in different contexts, based mostly on my interpretation of the benchmarks and the complexity of each solution.

Get it here: Cheatsheet for the “latest-comment” n+1 queries problem