By Ian Boyd


2010-01-11 16:44:37 8 Comments

I'll use a concrete, but hypothetical, example.

Each Order normally has only one line item:

Orders:

OrderGUID   OrderNumber
=========   ============
{FFB2...}   STL-7442-1      
{3EC6...}   MPT-9931-8A

LineItems:

LineItemGUID   Order ID Quantity   Description
============   ======== ========   =================================
{098FBE3...}   1        7          prefabulated amulite
{1609B09...}   2        32         spurving bearing

But occasionally there will be an order with two line items:

LineItemID   Order ID    Quantity   Description
==========   ========    ========   =================================
{A58A1...}   6,784,329   5          pentametric fan
{0E9BC...}   6,784,329   5          differential girdlespring 

Normally when showing the orders to the user:

SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
    INNER JOIN LineItems 
    ON Orders.OrderID = LineItems.OrderID

I want to show the single item on the order. But with this occasional order containing two (or more) items, the orders would appear be duplicated:

OrderNumber   Quantity   Description
===========   ========   ====================
STL-7442-1    7          prefabulated amulite
MPT-9931-8A   32         spurving bearing
KSG-0619-81   5          panametric fan
KSG-0619-81   5          differential girdlespring

What I really want is to have SQL Server just pick one, as it will be good enough:

OrderNumber   Quantity   Description
===========   ========   ====================
STL-7442-1    7          prefabulated amulite
MPT-9931-8A   32         differential girdlespring
KSG-0619-81   5          panametric fan

If I get adventurous, I might show the user, an ellipsis to indicate that there's more than one:

OrderNumber   Quantity   Description
===========   ========   ====================
STL-7442-1    7          prefabulated amulite
MPT-9931-8A   32         differential girdlespring
KSG-0619-81   5          panametric fan, ...

So the question is how to either

  • eliminate "duplicate" rows
  • only join to one of the rows, to avoid duplication

First attempt

My first naive attempt was to only join to the "TOP 1" line items:

SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
    INNER JOIN (
       SELECT TOP 1 LineItems.Quantity, LineItems.Description
       FROM LineItems
       WHERE LineItems.OrderID = Orders.OrderID) LineItems2
    ON 1=1

But that gives the error:

The column or prefix 'Orders' does not
match with a table name or alias name
used in the query.

Presumably because the inner select doesn't see the outer table.

11 comments

@P. Olesen 2018-11-28 11:55:08

From SQL Server 2012 and onwards I think this will do the trick:

SELECT DISTINCT
    o.OrderNumber ,
    FIRST_VALUE(li.Quantity) OVER ( PARTITION BY o.OrderNumber ORDER BY li.Description ) AS Quantity ,
    FIRST_VALUE(li.Description) OVER ( PARTITION BY o.OrderNumber ORDER BY li.Description ) AS Description
FROM    Orders AS o
    INNER JOIN LineItems AS li ON o.OrderID = li.OrderID

@Quassnoi 2010-01-11 16:48:26

SELECT   Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM     Orders
JOIN     LineItems
ON       LineItems.LineItemGUID =
         (
         SELECT  TOP 1 LineItemGUID 
         FROM    LineItems
         WHERE   OrderID = Orders.OrderID
         )

In SQL Server 2005 and above, you could just replace INNER JOIN with CROSS APPLY:

SELECT  Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
FROM    Orders
CROSS APPLY
        (
        SELECT  TOP 1 LineItems.Quantity, LineItems.Description
        FROM    LineItems
        WHERE   LineItems.OrderID = Orders.OrderID
        ) LineItems2

Please note that TOP 1 without ORDER BY is not deterministic: this query you will get you one line item per order, but it is not defined which one will it be.

Multiple invocations of the query can give you different line items for the same order, even if the underlying did not change.

If you want deterministic order, you should add an ORDER BY clause to the innermost query.

@Ian Boyd 2010-01-11 16:54:01

Excellent, that works; moving TOP 1 from derived table clause to join clause.

@Alex 2011-12-22 10:41:03

and the "OUTER JOIN" equivalent would be "OUTER APPLY"

@Alex Nolasco 2012-01-31 23:11:07

How about for LEFT OUTER JOIN?

@Quassnoi 2012-02-01 09:53:28

@AlexanderN: what exactly you want to know about LEFT OUTER JOIN?

@Brett Ryan 2012-09-12 07:20:30

How do you do this if the join is via a compound key/has multiple columns?

@Sunday Ironfoot 2012-12-11 11:19:38

I just knocked 1.3 seconds off the execution time of a complex T-SQL query thanks to this (3.7 down to 2.4 seconds). :-D

@Quassnoi 2013-11-29 18:25:26

@Lex: INNER JOIN does not generate NULL value for rows missing from the joined query, LEFT JOIN does.

@Brett Ryan 2014-06-26 03:50:34

Is there a way to do this without a top 1 in the inner select? Unfortunately OpenEdge being so state of the art you get [DataDirect][OpenEdge JDBC Driver][OpenEdge] TOP clause used in unsupported context. (13694)

@Jeff Davis 2015-09-17 14:27:35

Is there an advantage to using Cross Apply?

@Quassnoi 2015-09-17 14:31:55

@JeffDavis: it will save you one seek per row if OrderId is the leading column in the primary key.

@BornToCode 2015-12-06 11:28:09

@Quassnoi - Could you please explain your last comment about cross apply being more efficient if order id is primary key? Why is that and why it doesn't happen with join?

@Quassnoi 2015-12-06 12:30:06

@BornToCode: cross apply would retrieve the whole record and return it right away. Join would retrieve the row, take line item guid from it, self join the table on the guid (that's an additional seek) and return the record from the joined table.

@Dirk Horsten 2016-04-07 08:16:48

Unfortunately not supported on Sybase IQ

@statler 2017-01-16 10:07:23

Importantly - the second solution - Cross Apply / Outer Apply will work with SQLCE which cannot use Scalar subqueries in joins as per the top query

@hastrb 2018-03-15 12:04:56

CROSS APPLY instead INNER JOIN and OUTER APPLY instead LEFT JOIN (the same as LEFT OUTER JOIN).

@Paul Wehland 2018-09-21 10:59:17

Keep in mind that TOP, without ORDER BY is non-deterministic. You are not guaranteed to get the row you think you will.

@9Rune5 2018-12-06 13:05:57

Alternatively, use MIN() instead of TOP 1. I find that I am usually interested in the first Id.

@avb 2017-05-10 10:01:10

,Another aproach using common table expression:

with firstOnly as (
    select Orders.OrderNumber, LineItems.Quantity, LineItems.Description, ROW_NUMBER() over (partiton by Orders.OrderID order by Orders.OrderID) lp
    FROM Orders
        join LineItems on Orders.OrderID = LineItems.OrderID
) select *
  from firstOnly
  where lp = 1

or, in the end maybe you would like to show all rows joined?

comma separated version here:

  select *
  from Orders o
    cross apply (
        select CAST((select l.Description + ','
        from LineItems l
        where l.OrderID = s.OrderID
        for xml path('')) as nvarchar(max)) l
    ) lines

@Anand 2017-05-09 18:12:59

My favorite way to run this query is with a not exists clause. I believe this is the most efficient way to run this sort of query:

select o.OrderNumber,
       li.Quantity,
       li.Description
from Orders as o
inner join LineItems as li
on li.OrderID = o.OrderID
where not exists (
    select 1
    from LineItems as li_later
    where li_later.OrderID = o.OrderID
    and li_later.LineItemGUID > li.LineItemGUID
    )

But I have not tested this method against other methods suggested here.

@Abdullah Yousuf 2016-09-17 10:19:44

Correlated sub queries are sub queries that depend on the outer query. It’s like a for loop in SQL. The sub-query will run once for each row in the outer query:

select * from users join widgets on widgets.id = (
    select id from widgets
    where widgets.user_id = users.id
    order by created_at desc
    limit 1
)

@smerlung 2016-09-13 08:08:42

I solve a similar problem by using LEFT JOIN and GROUP BY Orders.OrderNumber. Is there a reason not to do it this way?

SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
    LEFT JOIN LineItems 
    ON Orders.OrderID = LineItems.OrderID
GROUP BY Orders.OrderNumber

I'll answer your answer question with an answer in your own question:

Orders             LineItems
+-------------+    +---------+----------+---------------+
| OrderNumber |    | OrderID | Quantity | Description   |
+-------------+    +---------+----------+---------------+
| 22586       |    | 22586   | 17       | Trunion       |
+-------------+    | 22586   | 3        | Girdle Spring |
                   +---------+----------+---------------+

Joining the two together on OrderNumber gives:

OrderNumber  Quantity  Description
-----------  --------  -------------
22586        17        Trunion
22586        3         Girdle Spring

2 row(s) affected

Where we wanted it to to return only one row:

OrderNumber  Quantity  Description
-----------  --------  -------------
22586        17        Trunion

1 row(s) affected

This is why I use GROUP BY Orders.OrderNumber which only returns one row per OrderNumber.

@Don Cheadle 2017-10-16 21:33:09

This will not work for at least SQL Server. Will get is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. for the Columns not in the Group By. In a sense, this is because it's indeterminate which data to show from the non-Grouped columns. I.e. how does it know which row to pull when there's many? Not sure which DB you're using.

@smerlung 2017-11-22 20:46:20

I ran this on MySql

@Spencer Sullivan 2019-04-01 14:07:52

Please remove this post as it is not pertinent to the question asked (SQL Server)

@BornToCode 2016-03-03 16:14:47

@Quassnoi answer is good, in some cases (especially if the outer table is big), a more efficient query might be with using windowed functions, like this:

SELECT  Orders.OrderNumber, LineItems2.Quantity, LineItems2.Description
FROM    Orders
LEFT JOIN 
        (
        SELECT  LineItems.Quantity, LineItems.Description, OrderId, ROW_NUMBER()
                OVER (PARTITION BY OrderId ORDER BY (SELECT NULL)) AS RowNum
        FROM    LineItems

        ) LineItems2 ON LineItems2.OrderId = Orders.OrderID And RowNum = 1

Sometimes you just need to test which query gives better performance.

@user890332 2019-05-10 17:55:39

This is the only answer I found that does a real "Left" join, meaning it does not add any more lines then is in the "Left" table. You just need to put in subquery and add "where RowNum is not null"

@ernst 2013-02-14 21:47:20

Tried the cross, works nicely, but takes slightly longer. Adjusted line columns to have max and added group which kept speed and dropped the extra record.

Here's the adjusted query:

SELECT Orders.OrderNumber, max(LineItems.Quantity), max(LineItems.Description)
FROM Orders
    INNER JOIN LineItems 
    ON Orders.OrderID = LineItems.OrderID
Group by Orders.OrderNumber

@Brianorca 2015-08-04 23:52:28

But having max separately on two columns means the quantity might not be related to the description. If the order was 2 Widgets and 10 Gadgets, the query would return 10 Widgets.

@Justin Fisher 2012-04-06 21:25:16

I know this question was answered a while ago, but when dealing with large data sets, nested queries can be costly. Here is a different solution where the nested query will only be ran once, instead of for each row returned.

SELECT 
  Orders.OrderNumber,
  LineItems.Quantity, 
  LineItems.Description
FROM 
  Orders
  INNER JOIN (
    SELECT
      Orders.OrderNumber,
      Max(LineItem.LineItemID) AS LineItemID
    FROM
      Orders INNER JOIN LineItems
      ON Orders.OrderNumber = LineItems.OrderNumber
    GROUP BY Orders.OrderNumber
  ) AS Items ON Orders.OrderNumber = Items.OrderNumber
  INNER JOIN LineItems 
  ON Items.LineItemID = LineItems.LineItemID

@GER 2015-01-20 16:43:04

This is also much faster if your 'LineItemId' column is not indexed properly. Compared to the accepted answer.

@NickG 2015-04-24 16:04:11

But how would you do this if Max is not usable as you need to order by a column different to the one you want to return?

@stifin 2015-06-09 10:39:30

you can order the derived table whichever way you want and use TOP 1 in SQL Server or LIMIT 1 in MySQL

@Peter Radocchia 2010-01-11 16:59:58

EDIT: nevermind, Quassnoi has a better answer.

For SQL2K, something like this:

SELECT 
  Orders.OrderNumber
, LineItems.Quantity
, LineItems.Description
FROM (  
  SELECT 
    Orders.OrderID
  , Orders.OrderNumber
  , FirstLineItemID = (
      SELECT TOP 1 LineItemID
      FROM LineItems
      WHERE LineItems.OrderID = Orders.OrderID
      ORDER BY LineItemID -- or whatever else
      )
  FROM Orders
  ) Orders
JOIN LineItems 
  ON LineItems.OrderID = Orders.OrderID 
 AND LineItems.LineItemID = Orders.FirstLineItemID

@Tomalak 2010-01-11 16:50:24

You could do:

SELECT 
  Orders.OrderNumber, 
  LineItems.Quantity, 
  LineItems.Description
FROM 
  Orders INNER JOIN LineItems 
  ON Orders.OrderID = LineItems.OrderID
WHERE
  LineItems.LineItemID = (
    SELECT MIN(LineItemID) 
    FROM   LineItems
    WHERE  OrderID = Orders.OrderID
  )

This requires an index (or primary key) on LineItems.LineItemID and an index on LineItems.OrderID or it will be slow.

@leo 2015-07-02 09:53:46

This does not work if an Orders has no LineItems. The sub-expression then evaluates LineItems.LineItemID = null and removes the left entity orders completely from the result.

@Tomalak 2015-07-02 09:58:28

That's also the effect of the inner join, so... yeah.

@leo 2015-07-02 10:38:32

Solution that can be adapted for LEFT OUTER JOIN: stackoverflow.com/a/20576200/510583

@Tomalak 2015-07-02 10:56:26

@leo Yes, but the OP used an inner join himself, so I don't understand your objection.

Related Questions

Sponsored Content

26 Answered Questions

28 Answered Questions

[SOLVED] How can I prevent SQL injection in PHP?

24 Answered Questions

[SOLVED] What is the difference between "INNER JOIN" and "OUTER JOIN"?

11 Answered Questions

[SOLVED] LEFT JOIN vs. LEFT OUTER JOIN in SQL Server

45 Answered Questions

42 Answered Questions

[SOLVED] How to return only the Date from a SQL Server DateTime datatype

33 Answered Questions

[SOLVED] How do I UPDATE from a SELECT in SQL Server?

23 Answered Questions

[SOLVED] Check if table exists in SQL Server

4 Answered Questions

[SOLVED] Inserting multiple rows in a single SQL query?

37 Answered Questions

Sponsored Content