I would like to have a table that would be de-normalized. This makes for a fairly large table with almost 600 columns. The question I have is that the columns typically vary in length. For flexibility I am thinking of just making them all VARCHAR ( 256 ), even though most of them could easily be VARCHAR ( 40 ). Is there an issue with performance in doing this?
Within the server there will be little to no difference, since the server only stores the number of bytes that are actually present in each column.

There will be a client-side difference, though. When a column is described, the server will tell the client that the column could be as large as 256 bytes. This will result in the client allocating (at least) 256 bytes for the column (note: the exact number will depend on the API). This also affects the stream layer within the client library: the low-level stream will be expecting up to 256 bytes for each column and therefore will need to allocate a large enough buffer to hold N rows (where N varies) when prefetching rows from the server. There are limits on the size of the buffer that will be allocated, so depending on the number of rows being fetched this could limit the number of rows that are prefetched (and hence could reduce throughput if you are fetching thousands or millions of rows).

HTH

answered 29 Jul '16, 10:03
Mark Culp
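To put rough numbers on that prefetch effect, here is a back-of-envelope sketch; the 1 MB buffer size, the table, and the column names are made-up illustrations, not documented limits:

    -- Worst-case describe sizes for a 600-column denormalized row:
    --    600 columns * 256 bytes = 153,600 bytes per row
    --    600 columns *  40 bytes =  24,000 bytes per row
    -- With a hypothetical 1 MB prefetch buffer that is roughly
    --    1,048,576 / 153,600 = ~6  rows prefetched per round trip
    --    1,048,576 /  24,000 = ~43 rows prefetched per round trip
    -- so the wider declarations can cut prefetch throughput even
    -- though the stored data is identical in both cases.
    CREATE TABLE wide_denorm (
       pkey INTEGER NOT NULL PRIMARY KEY,
       col1 VARCHAR ( 256 ),  -- described to the client as up to 256 bytes
       col2 VARCHAR ( 40 )    -- described to the client as up to 40 bytes
       -- ... remaining columns omitted for brevity
    );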
I recently worked on converting a database from SQL Anywhere 9 to Microsoft SQL Server where one of the tables contained 5,110 columns, most of which were defined as LONG VARCHAR. It contained 456,811 rows, 472M total = 459M table + 3.1M ext + 9.5M index, at 1,083 bytes per row, so it was clearly sparse. The "bytes per row" figure includes all disk space allocated to the table (all data and index pages), so it overstates the total amount of column data in the rows. As far as I know it was a key table, with no performance issues reported to me. FWIW, MSS was utterly incapable of handling such a [cough] curiosity without major surgery (on the table, not MSS :)

answered 29 Jul '16, 10:32
Breck Carter

On a side note, what is the impact of using VARCHAR ( 20 ) vs NUMERIC ( 20, 10 )?

(29 Jul '16, 10:56)
TPS
Here are a couple of excerpts from my book:

1.5.1 A String Is a String: BINARY, CHARACTER, LONG

All character and binary columns are stored as varying length character strings regardless of how they are declared. The maximum length specifies a limit on the byte size of the data portion of the string, with a default of 1 byte. The LONG VARCHAR and LONG BINARY types have an implied maximum length of 2GB.

    <string_type>    ::= <char_type> [ "(" <maximum_length> ")" ]
                       | LONG BINARY
                       | LONG VARCHAR
    <char_type>      ::= BINARY
                       | CHAR [ VARYING ]
                       | CHARACTER [ VARYING ]
                       | VARBINARY
                       | VARCHAR
    <maximum_length> ::= integer literal in the range 1 to 32767

Tip: All these data types, including LONG VARCHAR and LONG BINARY, may be used for local and global variables in stored procedures and other SQL scripts, as well as for columns in tables.

Storage requirements depend on the current length of each column value rather than the maximum length. Long strings are split and require more overhead than short strings, whereas short strings are stored efficiently even if they are declared as LONG VARCHAR. Here's how it works: String values up to 254 bytes in length are always stored together with the other columns in the row. When the length grows to 255 bytes or larger the value is partitioned into two pieces; the first piece is 254 bytes long and remains where it was, while the remainder is called a blob continuation and is placed on one or more separate pages called extension pages. These extension pages are kept separate so that a query or sequential scan that doesn't need to look at the long values won't have to retrieve all these pages. This arrangement is described in more detail in Section 10.6.2, "Table Fragmentation."

From a SQL programming point of view, a string is a string in SQL Anywhere 9 and you don't have to worry about the declared data type. For example, if you think all company names will fit into 30 characters but you are concerned about exceptions, there is no performance penalty for using CHARACTER ( 100 ) or even 1000. Similarly, a description column that will usually require only a few hundred characters can be declared as LONG VARCHAR to handle those special cases; your database won't grow in size until you actually store very long values.

Exactly the same data may be stored in either CHARACTER or BINARY columns. In particular, the zero byte (hexadecimal 00) may be stored in a CHARACTER column and it is treated as data, not a string terminator.

Tip: In some programming environments the zero byte string terminator is called "null." This is not the same as the database NULL value implemented by SQL Anywhere 9; database NULLs require special handling when they are used in applications.

There are a few exceptions to the assumption "a string is a string." First, sorting and comparisons involving BINARY columns always use the actual binary values, whereas CHARACTER columns are sorted and compared according to the database collation sequence and case sensitivity. For example, in a case-insensitive database (the default) the CHARACTER values 'a' and 'A' are treated as being equal, whereas the BINARY 'a' is treated as being less than the BINARY 'A' when they are compared or sorted.

Tip: Use the CAST function when you need to perform case-sensitive comparisons in a case-insensitive database; e.g., IF CAST ( char1 AS BINARY ) = CAST ( char2 AS BINARY ). This also works in the WHERE and ORDER BY clauses, and can be used when you need to ignore the database collation sequence.

Note: This book doesn't discuss multi-byte character sets, except to note that some techniques, like the Tip above, are only intended for single-byte character sets.
(29 Jul '16, 13:02)
Breck Carter
Continued... Second, a few functions only work on the first 255 bytes of their character string arguments: SOUNDEX, SIMILAR, and all the date and time functions ignore anything past 255 bytes.
(29 Jul '16, 13:02)
Breck Carter
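To make the CAST tip above concrete, here is a minimal sketch (the table, column names, and data are hypothetical) contrasting collation-based and binary comparison:

    -- Hypothetical table for demonstrating case-sensitive comparison
    -- in a case-insensitive database.
    CREATE TABLE t (
       pkey  INTEGER NOT NULL PRIMARY KEY,
       char1 VARCHAR ( 40 ),
       char2 VARCHAR ( 40 ) );

    INSERT INTO t VALUES ( 1, 'a', 'A' );

    -- In a case-insensitive database this returns the row because
    -- 'a' = 'A' under the collation sequence...
    SELECT * FROM t WHERE char1 = char2;

    -- ...whereas casting both sides to BINARY compares the actual
    -- byte values, so 'a' <> 'A' and no row is returned.
    SELECT * FROM t
     WHERE CAST ( char1 AS BINARY ) = CAST ( char2 AS BINARY );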
Just to hint at some similar FAQs - yes, I'm aware you have not asked about CHAR or LONG VARCHAR :) - but there are further details on when the declared length might matter or not:
As to the "de-normalized" table: have you considered using a materialized view, in case that would be of help?
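In case it helps, here is a minimal sketch of that approach (the tables and column names are made up for the example): keep the normalized base tables and let the server maintain the denormalized copy as a materialized view instead of storing a 600-column table yourself.

    -- Hypothetical normalized base tables.
    CREATE TABLE customer (
       cust_id INTEGER NOT NULL PRIMARY KEY,
       name    VARCHAR ( 40 ) NOT NULL );

    CREATE TABLE orders (
       order_id INTEGER NOT NULL PRIMARY KEY,
       cust_id  INTEGER NOT NULL REFERENCES customer,
       note     VARCHAR ( 256 ) );

    -- A denormalized, physically stored join of the two tables.
    CREATE MATERIALIZED VIEW customer_orders AS
       SELECT o.order_id, o.cust_id, c.name, o.note
         FROM orders o JOIN customer c ON c.cust_id = o.cust_id;

    -- Populate the view (manual refresh).
    REFRESH MATERIALIZED VIEW customer_orders;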