The forum will experience an outage sometime between February 10 at 7:00pm EST and February 12 at 11:59 EST for installation of security updates. The actual time and duration of the outage are unknown but attempts will be made to minimize the downtime. We apologize for any inconvenience.

I have a customer table (8 million records or so) and I want to set each customer record with the value of the latest order date from the orders table.

I'm in a replication environment, so I don't want to update records needlessly by setting nulls to nulls and dates to the same date.

Here's the table layout (vastly simplified);

CREATE TABLE "rhiner"."customer" (
    "Customer_id" integer NOT NULL DEFAULT global autoincrement,
    "Last_order_date" date NULL,
    PRIMARY KEY ( "Customer_id" )

CREATE TABLE "rhiner"."orders" (
        "order_id" integer NOT NULL DEFAULT global autoincrement,
    "order_date" date NULL,
    "customer_id" integer NULL,
    PRIMARY KEY ( "order_id" )

ALTER TABLE "rhiner"."orders" 
    ADD NOT NULL FOREIGN KEY "customer" ( "customer_id" ) 
    REFERENCES "rhiner"."customer" ( "Customer_id" );

Here's what I want to to... Update only those customers that have an order on file that is later than the last recorded order date on the Customer table.

My attempt fails... well, it updates everything, but it updates EVERYTHING -- not just the records that need updating. (And it generates a bajillion replication messages needlessly.)

update rhiner.customer c
set last_order_date = 
  (select max(order_date)  
  from rhiner.orders 
  where orders.customer_id = customer.customer_id

I've tried various things to pull the max(order_date) out the subquery to use in a WHERE clause for the customer, but I've had no luck. But somehow, the results of the subquery need to be part of the where clause. Well, unless I scrap the subquery idea and do a JOIN.

Any help is appreciated!

Using SQLA, replicating via dbremote

asked 05 Mar '10, 22:53

Ron%20Hiner's gravatar image

Ron Hiner
accept rate: 9%


Well, that's one type of problem one might call a SQL Pattern - cf.

(06 Mar '10, 09:40) Volker Barth

Try this:

update customer cust
   set cust.last_order_date = dt.max_order_date
  from ( select c.customer_id, max( o.order_date ) as max_order_date
           from customer c join orders o on c.customer_id = o.customer_id
          group by c.customer_id ) dt
 where cust.customer_id = dt.customer_id
   and ( cust.last_order_date is null 
         or cust.last_order_date < dt.max_order_date )

Here's what I did to get to this statement:

I started by computing the max order date for each customer:

select c.customer_id, max( o.order_date ) as max_order_date
  from customer c 
  join orders o on c.customer_id = o.customer_id
 group by c.customer_id

Then I used this query to compose the list of customer rows that actually needed to be updated by joining the derived table (dt) back to the customer table and restricted the rows to those that have a later date (or the customer date is null).

A similar strategy can typically be used for most update scenarios like this one.

Note that if you update a column in a table to its same value then the server will not actually update the column but it may (depending on your version and build) cause triggers to fire, including updating any DEFAULT CURRENT TIMESTAMP (and similar) columns. If you are seeing bajillion rows being updated, I will presume that you must have something like this in your full schema.

I'm not a query expert,... so the query gurus may have a better solution?


permanent link

answered 06 Mar '10, 02:30

Mark%20Culp's gravatar image

Mark Culp
accept rate: 40%

edited 06 Mar '10, 02:35

Awesome Mark! I'm not sure what typo Glenn was referring to... but I added some parens to the where clause. where cust.customer_id = dt.customer_id and ( ( cust.last_order_date is null ) or (cust.last_order_date < dt.max_order_date ) )

(10 Mar '10, 21:51) Ron Hiner

Hah - Mark beat me to it - I was going to post the identical answer (though Mark has a small typo in his answer).

The problem you have is that you've only specified the subquery in the UPDATE statement's SET clause, and not in the query's WHERE clause - consequently you'll update every single customer tuple (as you learned). You could write this using two subqueries:

  • one in the SET clause, to give the last-order date to set the customer row to
  • a second, near-identical subquery to filter out those customers who do not need any changes.

However, as Mark has already posted, doing it over a join is easier and more efficient, since rather than nested-iteration semantics you'll compute the last-order date for every customer in one go.

permanent link

answered 06 Mar '10, 02:36

Glenn%20Paulley's gravatar image

Glenn Paulley
accept rate: 43%

thanks both Mark and Glenn! This is very cool. I never knew about this ability. The capabilities of this product continue to surprise me. And I've only been using it since it was called "Watcom SQL 2.0". I wonder if this could be used to simulate crosstabs.

(10 Mar '10, 18:55) Ron Hiner
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text]( "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:


question asked: 05 Mar '10, 22:53

question was seen: 1,350 times

last updated: 06 Mar '10, 02:36