How do you delete duplicate rows when their data does not differ at all?

Say, I have imported data from a proxy table PT to a local table T, which has been created before via CREATE TABLE T LIKE FT. And unfortunately I have missed the fact that the PRIMARY KEY constraint was missing for T, and some imports and data manipulation tasks later I notice there are duplicate rows in T with fully identical values, so I cannot distinguish them by their values at all. (I.e. "select * from T" will show duplicate rows, "select distinct * from T" will not.)

How can I drop the duplicate rows? (Because I don't want to start from scratch again via deleting all data and re-importing again?

(I'm gonna post an answer, because I have stumbled upon that too often, so I need to publish the solution...)

sa-17

asked 08 Apr, 12:15

Volker Barth
40.2k●362●550●822
accept rate: 34%

flat view

2 Answers:

active answers oldest answers newest answers popular answers

Even when 2 rows have identical values (including the column that misses the desired "primary key", here named ID), their ROWID values will be different, and this can easily be used to distinguish them.

You can delete the 2nd ff. instance of the same logical row by using row_number() to number the identical rows based on their ROWID value and then simply delete all but the first instances per ID.

delete T
-- select rowid(T) as InternalRowID, DT.InternalRowID, DT.LogicalRowNumber, T.
from T
   inner join 
      (select ID,
          rowid(T) as InternalRowID,
          row_number() over (partition by ID order by InternalRowID) as LogicalRowNumber
       from T
       where ID in (select ID from T group by ID having count() > 1)
       order by ID, LogicalRowNumber
      ) DT
         on T.ID = DT.ID and rowid(T) = DT.InternalRowID and DT.LogicalRowNumber > 1
order by T.DD;

After the duplicates have been deleted, make sure you declare the missing PK on T.ID. :)

permanent link

answered 08 Apr, 12:28

Volker Barth
40.2k●362●550●822
accept rate: 34%

edited 08 Apr, 12:30

flat view

Slidely different, but equal idea:

create table t(tt as integer);
//Many inserts, simulated with a few rowgenerator records
insert into t(tt) select row_num from sa_rowgenerator(1,5);
insert into t(tt) select row_num from sa_rowgenerator(1,5);
insert into t(tt) select row_num from sa_rowgenerator(1,5);
insert into t(tt) select row_num from sa_rowgenerator(1,5);

delete T
where rowid(T) in 
(
  select rID from (
    select rowid(T) rID, Row_Number() over w RowNum, * from T
    window W as (partition by tt /*and all other fields that shoudl be partitioned separately*/ order by rID)
  ) T2 
  where T2.RowNum > 1
)

This should leave only 1 set of the 5 records (for those are unique)

permanent link

answered 09 Apr, 08:33

ArcoW
261●3●3●15
accept rate: 0%

flat view

Your answer

toggle preview

community wiki:

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

*italic* or _italic_
**bold** or __bold__
link:[text](http://url.com/ "title")
image?![alt text](/path/img.jpg "title")
numbered list: 1. Foo 2. Bar
to add a line break simply add two spaces to where you would like the new line to be.
basic HTML tags are also supported

learn more about Markdown

Question tags:

sa-17 ×246

question asked: 08 Apr, 12:15

question was seen: 169 times

last updated: 09 Apr, 08:33

SAP SQL Anywhere

How do you delete duplicate rows when their data does not differ at all?

Follow this question

Related questions