Subquery or Left Join [Programming SQL]

Prev: records before 6 months
Next: Hash Match best?

From: tshad on 29 Mar 2010 19:23

If you have a Customer table with a column VendorName which could be null,
which is better?

SELECT CustomerID,
CustomerName,
VendorName = (SELECT VendorName
FROM Vendor
WHERE VendorID =
Customers.VendorID)
FROMCustomers

OR

SELECT CustomerID,
CustomerName,
VendorName
FROM Customers
LEFT JOIN Vendors ON Customers.VendorID = VendorID

I am curious because I am trying optimize a query that is doing an OUTER
JOIN on 2 views.

Each view is doing 4 or 5 LEFT JOINS just to get one value from each table
where the value could be null.

Thanks,

Tom

From: Plamen Ratchev on 29 Mar 2010 21:06

These two queries will be different if there are multiple matches based on VendorId (the first one will result in error;
the second will return multiple rows per vendor).

It is best to test with your data and look at the execution plans of the queries. You may find that the query optimizer
may produce very similar plans for both queries.

--
Plamen Ratchev
http://www.SQLStudio.com

From: tshad on 29 Mar 2010 22:15

"Plamen Ratchev" <Plamen(a)SQLStudio.com> wrote in message
news:qbednTbjT_It0yzWnZ2dnUVZ_o-hnZ2d(a)speakeasy.net...
> These two queries will be different if there are multiple matches based on
> VendorId (the first one will result in error; the second will return
> multiple rows per vendor).
>

Why would the first one cause an error:

SELECT CustomerID,
CustomerName,
VendorName = (SELECT VendorName
FROM Vendor
WHERE VendorID =
Customers.VendorID)
FROMCustomers

VendorID is the PK on Vendor and unique.

But the Customer table could have a null in that field, so I assume the name
would either be a name or null - which is fine.

On the second one:

SELECT CustomerID,
CustomerName,
VendorName
FROM Customers
LEFT JOIN Vendors ON Customers.VendorID = VendorID

Again, the VenderID in the Vendor table is unique.

I did run both plans and it did give me the same plan (except that is added
one operator for the subquery - Compute Scalar - 1%).
It gave me Hash Match 44% (right outer join - interesting that it reversed
the join), Clustered Index Scan (Vendor) 1%, and Clustered Index Scan
(Customers) 54%.

I am a little confused here.

A LEFT OUTER JOIN says to give you all the rows in Customers - so in the
second one, I should get 25,389 which is what I got.

But if it is doing an OUTER JOIN in the subQuery and only passing back 1
record.

This was why I was looking at changing the LEFT OUTER JOINS to subqueries
thinking that was better, but I guess that isn't the case.

Tom

> It is best to test with your data and look at the execution plans of the
> queries. You may find that the query optimizer may produce very similar
> plans for both queries.
>
> --
> Plamen Ratchev
> http://www.SQLStudio.com

From: Plamen Ratchev on 29 Mar 2010 22:25

tshad wrote:
> "Plamen Ratchev" <Plamen(a)SQLStudio.com> wrote in message
> news:qbednTbjT_It0yzWnZ2dnUVZ_o-hnZ2d(a)speakeasy.net...
>> These two queries will be different if there are multiple matches based on
>> VendorId (the first one will result in error; the second will return
>> multiple rows per vendor).
>>
>
> Why would the first one cause an error:
>

Note that I specified you will get an error if there are multiple matches (that is multiple rows returned in the
subquery). If VendorId is unique in table Vendor then it will not happen.

>
> I am a little confused here.
>
> A LEFT OUTER JOIN says to give you all the rows in Customers - so in the
> second one, I should get 25,389 which is what I got.
>
> But if it is doing an OUTER JOIN in the subQuery and only passing back 1
> record.
>
> This was why I was looking at changing the LEFT OUTER JOINS to subqueries
> thinking that was better, but I guess that isn't the case.
>
The correlated subquery has to be executed for each row and the transformation to left join is natural, basically it
does the same. Not sure why you think using a subquery in this case would be better.

--
Plamen Ratchev
http://www.SQLStudio.com

From: tshad on 29 Mar 2010 22:41

"Plamen Ratchev" <Plamen(a)SQLStudio.com> wrote in message
news:qbednTHjT_KT_CzWnZ2dnUVZ_o88AAAA(a)speakeasy.net...
> tshad wrote:
>> "Plamen Ratchev" <Plamen(a)SQLStudio.com> wrote in message
>> news:qbednTbjT_It0yzWnZ2dnUVZ_o-hnZ2d(a)speakeasy.net...
>>> These two queries will be different if there are multiple matches based
>>> on VendorId (the first one will result in error; the second will return
>>> multiple rows per vendor).
>>>
>>
>> Why would the first one cause an error:
>>
>
> Note that I specified you will get an error if there are multiple matches
> (that is multiple rows returned in the subquery). If VendorId is unique in
> table Vendor then it will not happen.
>
>>
>> I am a little confused here.
>>
>> A LEFT OUTER JOIN says to give you all the rows in Customers - so in the
>> second one, I should get 25,389 which is what I got.
>>
>> But if it is doing an OUTER JOIN in the subQuery and only passing back 1
>> record.
>>
>> This was why I was looking at changing the LEFT OUTER JOINS to subqueries
>> thinking that was better, but I guess that isn't the case.
>>
> The correlated subquery has to be executed for each row and the
> transformation to left join is natural, basically it does the same. Not
> sure why you think using a subquery in this case would be better.
>
I was thinking that I am only getting one row back vs all the rows back in
the OUTER JOIN. But as you say, I will get one back for each row - so
effectively the same.

I was thinking that with 340,000 reads on this query (which takes about 5 -
7 seconds to run), that maybe the Left Joins were the problem but I guess
not.

Maybe with the FULL JOIN and all the LEFT JOINS, that many reads and time is
reasonable.

Thanks,

Tom

> --
> Plamen Ratchev
> http://www.SQLStudio.com

| Next | Last
Pages: 1 2
Prev: records before 6 months
Next: Hash Match best?