Archive for April, 2009

(geekstuff) T-SQL equivilant to MySql LIMIT clause

Thursday, April 2nd, 2009

As various of you know, one of the things I occasionally do is port code back and forth between using Oracle, MS SQL, and Mysql as database engines. One of the problems that has perpetually been a thorn in my side is that MS SQL didn’t have a equivilant to the handy MySQL LIMIT clause. Oh, you could achive the same effect with sub-selects and use of the TOP keyword, but it was ‘expensive’ – whereas LIMIT querys always execute very quickly. This matters very little when you have 10,000 or 100,000 rows, but get up into the tens of millions and it’s a problem.

Well, as of SQL Server 2005, there is a solution that executes, to all appearances, as quickly as the MySQL LIMIT clause. It’s detailed in this blog, but I’ll give the overview.

There’s now a function called ROW_NUMBER() that returns the current row number of the query. You can’t *directly* use this to define a range (as in SELECT ROW_NUMBER() OVER (ORDER BY column) AS RowNum WHERE RowNum > 1000 AND RowNum < 500) but you can create a temporary object, and then select from the output of it, like this:

WITH testRN AS (SELECT ROW_NUMBER() OVER (ORDER BY field1) AS RowNum, * FROM test) SELECT * FROM testRN WHERE RowNum BETWEEN 500 AND 1000;

I’m currently using this function to dump a 219 million row table, a million rows at a time, and I don’t see any performance degredataion as I move deeper into the table, so I’d say it’s a pretty good solution, unlike the many LIMIT clause suggestions out there in network land that use sub-selects.