SAP SQL Anywhere

Question

I have a customer table (8 million records or so) and I want to set each customer record with the value of the latest order date from the orders table.

I'm in a replication environment, so I don't want to update records needlessly by setting nulls to nulls and dates to the same date.

Here's the table layout (vastly simplified);

CREATE TABLE "rhiner"."customer" (
    "Customer_id" integer NOT NULL DEFAULT global autoincrement,
    "Last_order_date" date NULL,
    PRIMARY KEY ( "Customer_id" )
);

CREATE TABLE "rhiner"."orders" (
        "order_id" integer NOT NULL DEFAULT global autoincrement,
    "order_date" date NULL,
    "customer_id" integer NULL,
    PRIMARY KEY ( "order_id" )
 );

ALTER TABLE "rhiner"."orders" 
    ADD NOT NULL FOREIGN KEY "customer" ( "customer_id" ) 
    REFERENCES "rhiner"."customer" ( "Customer_id" );

Here's what I want to to... Update only those customers that have an order on file that is later than the last recorded order date on the Customer table.

My attempt fails... well, it updates everything, but it updates EVERYTHING -- not just the records that need updating. (And it generates a bajillion replication messages needlessly.)

update rhiner.customer c
set last_order_date = 
  (select max(order_date)  
  from rhiner.orders 
  where orders.customer_id = customer.customer_id
       )

I've tried various things to pull the max(order_date) out the subquery to use in a WHERE clause for the customer, but I've had no luck. But somehow, the results of the subquery need to be part of the where clause. Well, unless I scrap the subquery idea and do a JOIN.

Any help is appreciated!

Using SQLA 9.0.2.3850, replicating via dbremote

Accepted Answer

Try this:

update customer cust
   set cust.last_order_date = dt.max_order_date
  from ( select c.customer_id, max( o.order_date ) as max_order_date
           from customer c join orders o on c.customer_id = o.customer_id
          group by c.customer_id ) dt
 where cust.customer_id = dt.customer_id
   and ( cust.last_order_date is null 
         or cust.last_order_date < dt.max_order_date )

Here's what I did to get to this statement:

I started by computing the max order date for each customer:

select c.customer_id, max( o.order_date ) as max_order_date
  from customer c 
  join orders o on c.customer_id = o.customer_id
 group by c.customer_id

Then I used this query to compose the list of customer rows that actually needed to be updated by joining the derived table (dt) back to the customer table and restricted the rows to those that have a later date (or the customer date is null).

A similar strategy can typically be used for most update scenarios like this one.

Note that if you update a column in a table to its same value then the server will not actually update the column but it may (depending on your version and build) cause triggers to fire, including updating any DEFAULT CURRENT TIMESTAMP (and similar) columns. If you are seeing bajillion rows being updated, I will presume that you must have something like this in your full schema.

I'm not a query expert,... so the query gurus may have a better solution?

HTH

Accepted Answer

Hah - Mark beat me to it - I was going to post the identical answer (though Mark has a small typo in his answer).

The problem you have is that you've only specified the subquery in the UPDATE statement's SET clause, and not in the query's WHERE clause - consequently you'll update every single customer tuple (as you learned). You could write this using two subqueries:

one in the SET clause, to give the last-order date to set the customer row to
a second, near-identical subquery to filter out those customers who do not need any changes.

However, as Mark has already posted, doing it over a join is easier and more efficient, since rather than nested-iteration semantics you'll compute the last-order date for every customer in one go.

SAP SQL Anywhere

How can I use a subquery to limit my WHERE clause in an UPDATE?

Follow this question

Related questions