It's common for the first line of a comma-separated or tab-separated text file to be the column names for the data that follows, so it's natural to want to take advantage of this in OPENSTRING or LOAD TABLE. I'm sure that this would have already been considered by the SQLA development folks, so I'm wondering: Thanks! Loyal fan, asked 15 Sep '10, 23:45 Dan Konigsbach Volker Barth |
Maybe I am completely off the mark here, but assuming your server is on Windows, could you not create a remote data access server using the Microsoft Access Text Driver as the underlying ODBC driver. I believe the driver will then interpret the first line as the column names and make the appropriate guess as to what datatype the data is. You can then create a proxy table to the csv file and use select * into ... to create the base table with the proper column names, datatypes and data. Try the following: 1) create a file called foo.csv with the following contents
2) create a dsn called mycsvdsn using the Mictosoft Access Text Driver and set the directory to point to the directory that foo.csv resides in. 3) connect to SA and create a remote server
4) create the proxy table
5) query the proxy table to make sure the column names and data look correct
6) create and load the base table
7) verify that mybasetab has the correct schema and data HTH answered 16 Sep '10, 13:26 Karim Khamis |
OPENSTRING WITH and LOAD TABLE (...) serve multiple roles:
I'm suggesting that you want to sever the first role from them when input can self-identify its columns. Case 1: The input's first row says it has columns InA, InB, InC, InD, not necessarily in that order, and where you don't need InC. You might have
and not care that there's an extra column or if the file actually has the columns in a different order. Or, your destination table has columns DestP, DestQ, DestR, where InA -> DestQ, InB -> DestP, InD -> DestR. Making up a silly syntax just as an example, you might have
Case 2: The input file's first row has the same column names as your destination table, but not necessarily in the same order, and it might have extra columns or missing columns. Then, you might want something like a WITH AUTO NAME option, e.g.
which won't care about extra columns in the input and will set missing columns to their default. A capability like this would make external unload/reload much more robust and flexible. The old I-SQL INPUT had some self-defining formats, and though those formats are obsolete, the concept was good. A column name header row is so common in CSVs and TSVs that (humbly, and with great respect) I'd rather not have to go through Microsoft Desktop Driver ODBC support to use them. [All syntax examples are only to illustrate the point - not actual syntax proposals.] answered 16 Sep '10, 19:46 Dan Konigsbach |
I don't know about everyone else, but I'm not quite sure what you're asking for. Are you saying there should be a function to create said table 'on-the-fly' based on the column names?
No, if you look at the OPENSTRING WITH clause or the LOAD TABLE column-name list, you will see that these let you identify the columns in the input data, so that you can correlate them with the destination. (Technically, the LOAD TABLE column list refers to destination columns, but it is still identifying the columns in the input data.)
This means that the SQL statement needs to know the format of the incoming data, which is inherently less robust than having self-defining data.
When a CSV or TSV files is capable of being self-defining, I'm suggesting it should do so.
Understand now?
Without the row schema for OPENSTRING, we wouldn't be able to infer the datatype of the incoming data. Are you suggesting an option to return all columns as CHAR or perhaps NCHAR? We could use CHAR if the source encoding matches the CHAR encoding and NCHAR otherwise. For LOAD TABLE, I suppose we could match the column names from the data file with those in the target table and infer the datatype that way. Is that it?
I guess it would be nice for the LOAD TABLE to use the first line of a CSV as the field names, especially if for whatever reason you had exported them in a different order than you were importing them.
Obviously the unload table would need an option for it too :-)
@John: IMO this is a job for proxy tables (see Karim's reply), rather than adding a whole bunch of intelligence to the other techniques.
Karim's reply is pretty smart, I think it's even better than using the field names in the load table because it creates the fields too. Ta