How do I import data from a file while skipping existing rows?

Once in a while I have to import data from a file into a table, where some rows may already exists (row with same primekey and same or diffent contents).

For an INSERT statement I can specify ON EXISTING {SKIP | UPDATE}.

Using temporary tables I can combine INPUT INTO or LOAD TABLE with an INSERT statement, but that's an additional overhead I'd like to avoid.

So I'd like to see LOAD TABLE and / or INPUT INTO to be enhanced with the same option.

asked 22 Jan '10, 09:00

Reimer%20Pods's gravatar image

Reimer Pods
4.2k344583
accept rate: 11%

I second that suggestion. SKIP and especially UPDATE are wondrous things.

(22 Jan '10, 11:00) Breck Carter

Thank you for your suggestion. We have previously discussed this issue within engineering - that is, handling of errors while loading data - and have this request listed as an enhancement to be consider for a future release.

permanent link

answered 22 Jan '10, 13:36

Mark%20Culp's gravatar image

Mark Culp
23.3k9132275
accept rate: 40%

Just for the record:

v16 has introduced according options for LOAD TABLE to handle errors, from the docs:

ALLOW ( integer | ALL | NO ) ERRORS clause This clause can only be specified once for the statement. The default value for this clause is 0, which means that a violation generates an error and the statement is rolled back. If you specify an integer, n, then on error n+1 the database server rolls back the statement. The values ALLOW NO ERRORS and ALLOW 0 ERRORS are equivalent. This clause allows the database server to set problematic data aside and progress with the load operation.

The database server reports the last error that was encountered to the user, and this error is also logged to the MESSAGE log. Rows that are written to the ROW log can be changed and used as input to a subsequent LOAD TABLE statement.

If a ROW LOG is written to a database server or client file, its contents are written in the same character set as the original input file. If a MESSAGE LOG is written to a server or client file, its contents are written in the client's language and in the client connection's CHAR character set. If a ROW or MESSAGE LOG is written to a CHAR or NCHAR variable, it is written in the CHAR or NCHAR (respectively) character set.

So these three new options ALLOW n ERRORS, MESSAGE LOG and ROW LOG should help to identify (and correct) invalid rows. Note, that's not the same as an ON EXISTING SKIP but should do as well for any kind of errors.


AFAIK, no similar change has been made for the INPUT statement.

permanent link

answered 09 Dec '14, 06:14

Volker%20Barth's gravatar image

Volker Barth
31.3k312458674
accept rate: 33%

As addition to Reimer's proposal:

Besides ON EXISTING, there might be other reasons for failure:

  • Wrong FK values
  • Wrong datatypes (both type and length)

I found it sometimes difficult to track down such errors, particularly when they only show up in some rows of a huge file.

One (uncomfortable) way I have used is to load into a local temp table with just varchar data to find out the failing values, and then to check the column values.

LOAD TABLE is fine and fast IMHO, but looking for unfitting values seems hard.

So a "LOAD TABLE ... [STOP | SKIP] ON ERROR" (instead of a rollback as it does IIRC) would be useful. I'm not sure if it currently lists the according line number - that would otherwise be of help, too.

permanent link

answered 22 Jan '10, 14:28

Volker%20Barth's gravatar image

Volker Barth
31.3k312458674
accept rate: 33%

cf. the response with John Smirnios in the NG: news://forums.sybase.com:119/4b59e12f$1@forums-1-dub

(22 Jan '10, 17:39) Volker Barth

I think that a good way to achieve what you ask for in existing versions of the server would be to use the OPENSTRING construct in an INSERT statement:

INSERT INTO T_INS
ON EXISTING UPDATE
SELECT *
FROM OPENSTRING( VALUE '1,1\n2,3')
WITH( TABLE T_INS ) O;

The OPENSTRING construct is the same code used to scan files for the LOAD TABLE statement and the INSERT will operate mostly the same way that LOAD TABLE does. There are differences, such as the automatic commit used for LOAD TABLE, the way that rows are logged, the checkpoint before and optionally after the statement, and the way that computed columns, check constraints, or triggers are handled. There may be some other differences I have not recalled. Using the OPENSTRING in an INSERT or even MERGE statement gives you the full flexibility of inserting and also allows you to use conditions in the WHERE clause to identify problematic rows (for example, not matching the data type or not matching foreign key declarations).

permanent link

answered 05 Jan '15, 14:20

Ivan%20T.%20Bowman's gravatar image

Ivan T. Bowman
2.8k22732
accept rate: 39%

A great suggestion, as usual:)

Just for the record: To import from files, as Reimer originally asked, it would be "OPENSTRING(FILE 'MyFileName.txt')..." obviously.

(06 Jan '15, 03:54) Volker Barth

Thanks to Ivan for that great answer and to Volker for the helpfull addition.

(06 Jan '15, 13:57) Reimer Pods
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×107
×13

question asked: 22 Jan '10, 09:00

question was seen: 2,673 times

last updated: 06 Jan '15, 13:57