Please be aware that the content in SAP SQL Anywhere Forum will be migrated to the SAP Community in June and this forum will be retired.

Say, I have imported data from a proxy table PT to a local table T, which has been created before via CREATE TABLE T LIKE FT. And unfortunately I have missed the fact that the PRIMARY KEY constraint was missing for T, and some imports and data manipulation tasks later I notice there are duplicate rows in T with fully identical values, so I cannot distinguish them by their values at all. (I.e. "select * from T" will show duplicate rows, "select distinct * from T" will not.)

How can I drop the duplicate rows? (Because I don't want to start from scratch again via deleting all data and re-importing again?

(I'm gonna post an answer, because I have stumbled upon that too often, so I need to publish the solution...)

asked 08 Apr, 12:15

Volker%20Barth's gravatar image

Volker Barth
40.2k362550822
accept rate: 34%


Even when 2 rows have identical values (including the column that misses the desired "primary key", here named ID), their ROWID values will be different, and this can easily be used to distinguish them.

You can delete the 2nd ff. instance of the same logical row by using row_number() to number the identical rows based on their ROWID value and then simply delete all but the first instances per ID.

delete T
-- select rowid(T) as InternalRowID, DT.InternalRowID, DT.LogicalRowNumber, T.
from T
   inner join 
      (select ID,
          rowid(T) as InternalRowID,
          row_number() over (partition by ID order by InternalRowID) as LogicalRowNumber
       from T
       where ID in (select ID from T group by ID having count() > 1)
       order by ID, LogicalRowNumber
      ) DT
         on T.ID = DT.ID and rowid(T) = DT.InternalRowID and DT.LogicalRowNumber > 1
order by T.DD;

After the duplicates have been deleted, make sure you declare the missing PK on T.ID. :)

permanent link

answered 08 Apr, 12:28

Volker%20Barth's gravatar image

Volker Barth
40.2k362550822
accept rate: 34%

edited 08 Apr, 12:30

Slidely different, but equal idea:

create table t(tt as integer);
//Many inserts, simulated with a few rowgenerator records
insert into t(tt) select row_num from sa_rowgenerator(1,5);
insert into t(tt) select row_num from sa_rowgenerator(1,5);
insert into t(tt) select row_num from sa_rowgenerator(1,5);
insert into t(tt) select row_num from sa_rowgenerator(1,5);

delete T
where rowid(T) in 
(
  select rID from (
    select rowid(T) rID, Row_Number() over w RowNum, * from T
    window W as (partition by tt /*and all other fields that shoudl be partitioned separately*/ order by rID)
  ) T2 
  where T2.RowNum > 1
)

This should leave only 1 set of the 5 records (for those are unique)

permanent link

answered 09 Apr, 08:33

ArcoW's gravatar image

ArcoW
2613315
accept rate: 0%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×246

question asked: 08 Apr, 12:15

question was seen: 169 times

last updated: 09 Apr, 08:33