Select x records per category in a table

Joel Gascoigne was trying to figure out how he could select a number of rows in his database for each value of a partition column.

I’m just going to summarise what I had learned from exploring this with him last night.

Given the following DDL and data

create sequence foo_id_seq start 1 increment 1 no cycle;
create table foo(
  id          integer not null default nextval('foo_id_seq'),
  content     text,
  kind        text,
  created_at  timestamp with time zone default now(),
  primary key(id)
);

insert into foo(content, kind) values ('a', 'A');
insert into foo(content, kind) values ('b', 'A');
insert into foo(content, kind) values ('c', 'A');
insert into foo(content, kind) values ('d', 'A');
insert into foo(content, kind) values ('a', 'B');
insert into foo(content, kind) values ('b', 'B');
insert into foo(content, kind) values ('a', 'C');
insert into foo(content, kind) values ('b', 'C');
insert into foo(content, kind) values ('c', 'C');
insert into foo(content, kind) values ('d', 'C');

The simplest way (although not a single query) would be to firstly select the different kinds and to do a union all query.

select distinct kind from foo;

 kind 
------
 B
 C
 A

Having taken that we can construct a query as follows

(select * from foo where kind = 'A' order by created_at desc limit 2)
union all
(select * from foo where kind = 'B' order by created_at desc limit 2)
union all
(select * from foo where kind = 'C' order by created_at desc limit 2);

Whilst easy to construct, it’s a bit painful (have to go back to the application – generate the query on that end and resubmit a second query and then process the results).

An alternative using Postgres’ window functions is to use over partition by row_number():

select q.content, q.kind 
from (select row_number() 
        over (partition by kind order by created_at desc) as row_number,
        content,
        kind from foo) q 
where q.row_number < 3;

Now, this will work for Postgres >= 8.4 and should also work for MSSQL and Oracle too.

Unfortunately doing something like this in MySQL involves setting up variables and iterating through results until certain conditions are met, thus resetting the row_number counter:

set @order_prev=0, @partition_prev='';
select q.content, q.kind
from (
  select content, 
         created_at,
         @order_prev := if(@partition_prev = kind, @order_prev+1, 1) as row_number,
         @partition_prev := kind as kind
  from foo
  order by kind, created_at desc, content
) q
where row_number < 3;
“The simplest way (although not a single query) would be to firstly select the different kinds and to do a union all query.”