example: table A (delivery_schedule) has 100 records which joins to table B (deliveries made). The relation is 1 to many. If a new schedule record is added, it will not have a delivery made record. I want to select all of the records from table A and the most RECENT delivery from table B. the syntax is: select A.customerId, A.companyName, A.productName, isNull( max( invoiceDate ), '2014-01-01' ), isNull( quantityReceived, 0 ) from table A left outer join table b on A.customerID = B.customerID and a.productName = b.productName group by A.customerId, A.companyName, A.productName the SQL is working as expected, but it's not what i want. i've tried TOP and LIMIT without success. is there an easy way to get what i'm looking for? |
For a discussion of WINDOW and FIRST_VALUE see this blog post: Great Moments In History: Housing Bubble (I cannot code these things without looking at examples :) The WITH clause is just like CREATE VIEW except it's local to the following SELECT. It simplifies the SELECT by not having to code all that inside the FROM clause as a derived table. CREATE TABLE a ( customerID INTEGER NOT NULL, companyName VARCHAR ( 10 ) NOT NULL, productName VARCHAR ( 10 ) NOT NULL, PRIMARY KEY ( customerID, productName ) ); CREATE TABLE b ( customerID INTEGER NOT NULL, productName VARCHAR ( 10 ) NOT NULL, invoiceDate DATE NOT NULL, quantityReceived INTEGER NOT NULL, PRIMARY KEY ( customerID, productName, invoiceDate ), FOREIGN KEY fa ( customerID, productName ) REFERENCES a ( customerID, productName ) ); INSERT a VALUES ( 1, 'aaa', 'product1' ); INSERT b VALUES ( 1, 'product1', '2015-01-01', 10 ); INSERT b VALUES ( 1, 'product1', '2015-01-02', 20 ); INSERT a VALUES ( 1, 'aaa', 'product2' ); INSERT b VALUES ( 1, 'product2', '2014-11-01', 100 ); INSERT b VALUES ( 1, 'product2', '2014-12-01', 200 ); INSERT a VALUES ( 1, 'aaa', 'product3' ); COMMIT; WITH latest_b AS ( SELECT DISTINCT customerID, productName, FIRST_VALUE ( invoiceDate ) OVER date_window AS latest_invoiceDate, FIRST_VALUE ( quantityReceived ) OVER date_window AS latest_quantityReceived FROM b WINDOW date_window AS ( PARTITION BY customerID, productName ORDER BY invoiceDate DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) ) SELECT a.customerID, a.companyName, a.productName, COALESCE ( latest_b.latest_invoiceDate, CURRENT DATE ) AS invoiceDate, COALESCE ( latest_b.latest_quantityReceived, 0 ) AS quantityReceived FROM a LEFT OUTER JOIN latest_b ON a.customerID = latest_b.customerID AND a.productName = latest_b.productName ORDER BY a.customerID, a.productName; customerID companyName productName invoiceDate quantityReceived ----------- ----------- ----------- ----------- ---------------- 1 aaa product1 2015-01-02 20 1 aaa product2 2014-12-01 200 1 aaa product3 2015-01-28 0 |
This is a clunkier way of doing it than Breck's, avoiding the OLAP functionality. BUT, while this may seem simpler if you aren't familiar with WINDOW, PARTITION BY and their friends, in real world examples it soon gets very messy! So the moral is, it's well worthwhile getting to grips with the OLAP as it is extraordinary powerful. There are quite a few things you would need to resolve in your example for real eg two deliveries on the same day. However for what it's worth heres's my fudge:
|
In addition to Justin's example, here's a similar query. It uses a group-by derived table to get the "latest delivery per customer/product instead of a sub select in the WHERE clause. I can't tell whether this is easier to use... Note: The missing WHERE clause helps to prevent the typical error that the LEFT JOIN is incidentally turned into an INNER JOIN by suppressing NULL values. Of course Justin has avoided that pitfall by explicitly using "b.InvoiceDate is null OR " in the WHERE clause. - Similarily, in this query it's important to do the B INNER JOIN Z before the outer join to A. select A.PK, a.CustomerID, a.CompanyName, a.ProductName, B.InvoiceDate, B.QuantityReceived from A left outer join (B inner join (select CustomerID, ProductName, max(InvoiceDate) as MaxInvoiceDate from B group by CustomerID, ProductName) Z on B.CustomerID = Z.CustomerID and B.ProductName = Z.ProductName and B.InvoiceDate = Z.MaxInvoiceDate) on A.CustomerID = B.CustomerID and A.ProductName = B.ProductName order by A.CustomerID, A.ProductName The result set is identical: PK,CustomerID,CompanyName,ProductName,InvoiceDate,QuantityReceived 1,1,'Co A','Prod A','2015-01-25',50 2,1,'Co A','Prod B','2015-01-06',30 3,1,'Co A','Prod C',, 4,2,'Co B','Prod A','2015-01-15',225 5,2,'Co B','Prod B',, 6,3,'Co C','Prod B',, 7,4,'Co D','Prod B',, I'm still getting used to use derived queries to calculate the "max values" (here Z) and to then join that with the "real data" (here B) to get the rows with the maximum values per ID. |