The Sub-query vs. the “UPDATE FROM” Statement

From time to time I run across a table I need to update based on an ID column or other condition that doesn’t originate from that table. This usually occurs when there are one or more foreign keys in play. For instance, consider the tables below:

Categories
CategoryID
Name

Products
ProductID
CategoryID
Name
Price

Suppose I want to collapse the categories named Televisions and Ipods into a the Electronics category, knowing only the category names. There are two obvious ways to pull off the update statement: using a sub-query OR an UPDATE FROM.

Using a sub-query

DECLARE @NewCategoryID int
SELECT @NewCategoryID = CategoryID FROM Categories WHERE Name='Electronics'

UPDATE Products SET CategoryID = @NewCategoryID WHERE CategoryID in (
    SELECT CategoryID FROM Categories WHERE Name IN ('Televisions', 'Ipods')
)

Here we simply ensure that our category is in a list. It’s important to remember that the list here gets queried fresh for each row. For a small table or a simple, well indexed, table this isn’t a real problem. Now, imagine there are millions of categories in our table. We’ve dramatically increased the processing time by adding a sub-query.

Using an UPDATE FROM

DECLARE @NewCategoryID int
SELECT @NewCategoryID = CategoryID FROM Categories WHERE Name='Electronics'

UPDATE Products SET CategoryID = @NewCategoryID
FROM Products p INNER JOIN Categories c ON p.CategoryID = c.CategoryID
WHERE c.Name IN ('Televisions', 'Electronics')

The UPDATE FROM statement has an advantage over the sub-query because it performs the query with the join only once. The WHERE still has to be evaluated on each row, but if you look at our last example the WHERE was being evaluated for every row in the categories table for every row in the products table, or, the product of the count of rows in each table. So we’ve saved time by not running an extra query for each row and by cutting down the number of expression evaluations to perform.

Leave a Reply