I have a customer table (8 million records or so) and I want to set each customer record with the value of the latest order date from the orders table.
I'm in a replication environment, so I don't want to update records needlessly by setting nulls to nulls and dates to the same date.
Here's the table layout (vastly simplified);
Here's what I want to to... Update only those customers that have an order on file that is later than the last recorded order date on the Customer table.
My attempt fails... well, it updates everything, but it updates EVERYTHING -- not just the records that need updating. (And it generates a bajillion replication messages needlessly.)
I've tried various things to pull the max(order_date) out the subquery to use in a WHERE clause for the customer, but I've had no luck. But somehow, the results of the subquery need to be part of the where clause. Well, unless I scrap the subquery idea and do a JOIN.
Any help is appreciated!
Using SQLA 22.214.171.12450, replicating via dbremote
asked 05 Mar '10, 22:53
Here's what I did to get to this statement:
I started by computing the max order date for each customer:
Then I used this query to compose the list of customer rows that actually needed to be updated by joining the derived table (dt) back to the customer table and restricted the rows to those that have a later date (or the customer date is null).
A similar strategy can typically be used for most update scenarios like this one.
Note that if you update a column in a table to its same value then the server will not actually update the column but it may (depending on your version and build) cause triggers to fire, including updating any DEFAULT CURRENT TIMESTAMP (and similar) columns. If you are seeing bajillion rows being updated, I will presume that you must have something like this in your full schema.
I'm not a query expert,... so the query gurus may have a better solution?
Hah - Mark beat me to it - I was going to post the identical answer (though Mark has a small typo in his answer).
The problem you have is that you've only specified the subquery in the UPDATE statement's SET clause, and not in the query's WHERE clause - consequently you'll update every single customer tuple (as you learned). You could write this using two subqueries:
However, as Mark has already posted, doing it over a join is easier and more efficient, since rather than nested-iteration semantics you'll compute the last-order date for every customer in one go.
answered 06 Mar '10, 02:36