How do I import data from a file while skipping existing rows? Once in a while I have to import data from a file into a table, where some rows may already exists (row with same primekey and same or diffent contents). For an INSERT statement I can specify ON EXISTING {SKIP | UPDATE}. Using temporary tables I can combine INPUT INTO or LOAD TABLE with an INSERT statement, but that's an additional overhead I'd like to avoid. So I'd like to see LOAD TABLE and / or INPUT INTO to be enhanced with the same option. |
Thank you for your suggestion. We have previously discussed this issue within engineering - that is, handling of errors while loading data - and have this request listed as an enhancement to be consider for a future release. |
Just for the record: v16 has introduced according options for LOAD TABLE to handle errors, from the docs:
So these three new options ALLOW n ERRORS, MESSAGE LOG and ROW LOG should help to identify (and correct) invalid rows. Note, that's not the same as an ON EXISTING SKIP but should do as well for any kind of errors. AFAIK, no similar change has been made for the INPUT statement. |
As addition to Reimer's proposal: Besides ON EXISTING, there might be other reasons for failure:
I found it sometimes difficult to track down such errors, particularly when they only show up in some rows of a huge file. One (uncomfortable) way I have used is to load into a local temp table with just varchar data to find out the failing values, and then to check the column values. LOAD TABLE is fine and fast IMHO, but looking for unfitting values seems hard. So a "LOAD TABLE ... [STOP | SKIP] ON ERROR" (instead of a rollback as it does IIRC) would be useful. I'm not sure if it currently lists the according line number - that would otherwise be of help, too. cf. the response with John Smirnios in the NG: news://forums.sybase.com:119/4b59e12f$1@forums-1-dub |
I think that a good way to achieve what you ask for in existing versions of the server would be to use the OPENSTRING construct in an INSERT statement:
The OPENSTRING construct is the same code used to scan files for the LOAD TABLE statement and the INSERT will operate mostly the same way that LOAD TABLE does. There are differences, such as the automatic commit used for LOAD TABLE, the way that rows are logged, the checkpoint before and optionally after the statement, and the way that computed columns, check constraints, or triggers are handled. There may be some other differences I have not recalled. Using the OPENSTRING in an INSERT or even MERGE statement gives you the full flexibility of inserting and also allows you to use conditions in the WHERE clause to identify problematic rows (for example, not matching the data type or not matching foreign key declarations). A great suggestion, as usual:) Just for the record: To import from files, as Reimer originally asked, it would be "OPENSTRING(FILE 'MyFileName.txt')..." obviously.
(06 Jan '15, 03:54)
Volker Barth
Thanks to Ivan for that great answer and to Volker for the helpfull addition.
(06 Jan '15, 13:57)
Reimer Pods
|
I second that suggestion. SKIP and especially UPDATE are wondrous things.