According to this reply, a LONG VARCHAR COMPRESSED column value shorter than 256 bytes will not be compressed, because the length is below the default INLINE 256: http://sqlanywhere-forum.sap.com/questions/1038/column-compression/1039#1039

Suppose I have a LONG VARCHAR column with the following characteristics (in particular, avg_length = 152):

    SELECT * FROM sa_column_stats (
       tab_name  = 'ttt',
       col_name  = 'ccc',
       tab_owner = 'DBA',
       max_rows  = 1000000000 );

    table_owner,table_name,column_name,num_rows_processed,num_values_compressed,
    avg_compression_ratio,avg_length,stddev_length,min_length,max_length,
    avg_uncompressed_length,stddev_uncompressed_length,min_uncompressed_length,
    max_uncompressed_length

    'DBA','ttt','ccc',9757647,0,,151.89775032807609,2348.2860711857497,0,170,
    151.89775032807609,2348.2860711857497,0,170

What are the pros and cons of forcing compression with COMPRESSED INLINE 0 PREFIX 0? Will that even work? ( see, I'm not afraid to look stoopid :)

Edit: Yes, it seems to work ( not completely stoopid, at least not this time :)

Here is a small test, first of an uncompressed table fff:

    CREATE TABLE fff (
       pkey INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
       ccc  LONG VARCHAR NULL );

    INSERT fff ( ccc )
    SELECT REPEAT ( 'ABCDEFGHIJ', 15 )
      FROM sa_rowgenerator ( 1, 10000 );
    COMMIT;

Foxhound shows it uses up 1.7M of disk space (181 bytes per row) in an 11.0.1.2276 database with a 4K page size:

    CREATE TABLE DBA.fff ( -- 10,000 rows, 1.7M total = 1.7M table + 0 ext + 16k index, 181 bytes per row
       pkey /* PK */ INTEGER NOT NULL DEFAULT autoincrement,
       ccc           LONG VARCHAR NULL,
       CONSTRAINT ASA85 PRIMARY KEY ( -- 16k
          pkey ) );

Here is what Foxhound says for table ttt, which is identical except for COMPRESSED INLINE 0 PREFIX 0:

    CREATE TABLE DBA.ttt ( -- 10,000 rows, 868k total = 344k table + 452k ext + 72k index, 89 bytes per row
       pkey /* PK */ INTEGER NOT NULL DEFAULT autoincrement,
       ccc           LONG VARCHAR COMPRESSED INLINE 0 PREFIX 0 NULL,
       CONSTRAINT ASA84 PRIMARY KEY ( -- 72k
          pkey ) );

Here are the compression statistics (in particular, avg_compression_ratio = 77%):

    SELECT @@VERSION, * FROM sa_column_stats (
       tab_name  = 'ttt',
       col_name  = 'ccc',
       tab_owner = 'DBA',
       max_rows  = 1000000000 );

    @@VERSION,table_owner,table_name,column_name,num_rows_processed,num_values_compressed,
    avg_compression_ratio,avg_length,stddev_length,min_length,max_length,
    avg_uncompressed_length,stddev_uncompressed_length,min_uncompressed_length,
    max_uncompressed_length

    '11.0.1.2276','DBA','ttt','ccc',10000,10000,76.66666666666667,150.0,0.0,150,150,,,,

So... it looks like COMPRESSED INLINE 0 PREFIX 0 can reduce disk usage quite a bit, even for a relatively short column. What are the cons?

asked 23 Sep '10, 20:03
Breck Carter
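An aside on why the test data compressed so well: the documentation (quoted in an answer below) says column compression uses the same algorithm as the COMPRESS function, which is deflate-based, and REPEAT ( 'ABCDEFGHIJ', 15 ) is extremely repetitive. Here is an illustrative sketch using Python's zlib, which is not SQL Anywhere's actual storage path but shows the same kind of ratio on the same 150-byte value:

```python
import zlib

# The 150-byte test value from the example above: highly repetitive,
# so a deflate-style compressor shrinks it dramatically.
value = ("ABCDEFGHIJ" * 15).encode("ascii")
assert len(value) == 150

compressed = zlib.compress(value)
saved = 100.0 * (1 - len(compressed) / len(value))
print(f"{len(value)} bytes -> {len(compressed)} bytes ({saved:.1f}% saved)")
```

The exact ratio differs from sa_column_stats' 76.67% because the engine adds its own per-value overhead, but the shape of the result is the same: repetitive short values compress very well.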
There are three cons:
Determining whether or not to use compression is, perhaps counterintuitively, more a question of performance than of disk savings, since disk space is really cheap these days. Saving a few megabytes here and there is not very compelling if all your queries slow down (especially when you can get a terabyte for less than $100).

If you have a fast enough processor, you likely want to compress as much as possible, since decompression is going to be cheaper than extra disk I/O: reading 20 compressed pages and decompressing them all may be faster than reading 200 uncompressed pages. If you have a fast disk and a slow processor, decompression may be slower, and doing some extra reads may not be that big a deal.

answered 23 Sep '10, 22:50
Graeme Perrow
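The page-count argument in this answer reduces to simple arithmetic. The per-page costs below are invented for illustration (they are assumptions, not measurements of any particular disk or CPU):

```python
# Back-of-envelope version of the answer's example: is reading 20
# compressed pages plus decompressing them faster than reading 200
# uncompressed pages? Costs are assumed, in milliseconds per page.
read_ms = 8.0        # assumed time to read one page from disk
decompress_ms = 1.0  # assumed CPU time to decompress one page

uncompressed_total = 200 * read_ms                    # 1600 ms
compressed_total = 20 * (read_ms + decompress_ms)     # 180 ms
print(uncompressed_total, compressed_total)
```

With these assumed numbers the compressed read wins by almost an order of magnitude; flip the ratio of disk speed to CPU speed far enough and the conclusion flips too, which is the answer's point.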
Graeme alludes to a third con but doesn't explicitly state it:
@Graeme: What if 99.9% of the queries don't involve the compressed column, and what if INLINE 0 PREFIX 0 reduces the size of the data portion of the row by one third (even if not COMPRESSED)? Performance is not just CPU; performance is disk I/O... sometimes it's ALL ABOUT disk I/O, and having that cheapo terabyte disk doesn't help if it takes forever to read it. (and yes, in fact, I am using a pair of cheapo terabyte disks for the testing, but alas, they cost more than $100 when I bought them a couple of years ago :)

@Graeme: Folks sometimes put the blobs in separate tables... INLINE 0 PREFIX 0 has a somewhat similar effect without all the DDL Sturm und Drang... and possibly better performance. Caveat: putting blobs in a separate table is still necessary if the point is to avoid unnecessary MobiLink upload traffic when the other columns are frequently uploaded but the blobs aren't.

Thanks Mark - I've updated my answer.
One con: there are at least 12 bytes of header overhead for the compression.

answered 24 Sep '10, 06:44
Martin

Is that written down somewhere, or did someone tell you... or do you know because you wrote the code? I'm not doubting you, I just have a compelling need for, as they say, "original sources" :)

The documentation says that column compression uses the same algorithm as the COMPRESS function, and the documentation for the COMPRESS function says that 12 extra bytes are needed.

Ooooo... it also says this, I wonder if it applies to COMPRESSED columns? "If you are storing compressed values in a table, the column should be BINARY or LONG BINARY so that character set conversion is not performed on the data."
@Breck: No, that part does not apply to compressed columns. With a compressed column, the character set conversion is performed on the uncompressed data, so you can use VARCHAR or BINARY. That passage refers to storing a value that has been compressed externally (or through the COMPRESS function).

@Graeme: I get it... you would definitely NOT want any conversion applied to the value after compression, so the data type should be BINARY when using the COMPRESS function.
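Martin's 12-byte figure is SQL Anywhere's documented overhead for the COMPRESS function; any deflate-style format carries a similar fixed cost. An illustrative sketch with Python's zlib (whose raw overhead is a 2-byte header plus a 4-byte Adler-32 checksum plus block framing, not the same 12 bytes, but the effect is identical) shows why very short values can come out larger than they went in:

```python
import zlib

# Fixed per-value overhead means compressing very short values
# *grows* them: the output is always header + framing + checksum,
# even when there is nothing to squeeze.
for text in (b"x", b"hello", b"short value"):
    out = zlib.compress(text)
    print(f"{len(text):2d} bytes in -> {len(out):2d} bytes out")
```

This is exactly the con for a column whose values average ~152 bytes: the fixed overhead eats a meaningful fraction of whatever the algorithm saves.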
Here are the results of two tests of the INSERT side of a high-throughput application, before and after COMPRESSED INLINE 0 PREFIX 0 was specified for one single LONG VARCHAR column:

                                              Average        Compression  Average Core2   Transaction
                        Disk Space Disk Space Uncompressed   Ratio        Quad CPU Usage  Time for
                Rows    for Table  per Row    Column Length  for Column   for INSERT      INSERT
                                                                          Process         Process
    ------  ---------   ---------  ---------  -------------  -----------  --------------  -----------
    Before  9,703,524   5.9G       656 bytes  152 bytes      0            35%             1 sec
    After      28,776   18M        657 bytes  158 bytes      12.6%        80%             30 sec

Some comments and observations...
Clearly, at least in this case, COMPRESSED INLINE 0 PREFIX 0 was a really bad idea... and perhaps it is a bad idea in all cases. Perhaps there is a good reason the defaults are INLINE 256 PREFIX 8, and perhaps the suggestion "don't use COMPRESSED on short columns" is a good one.

answered 25 Sep '10, 08:18
Breck Carter
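One way to see why the production column managed only 12.6% while the REPEAT test managed 77%: real data has far less redundancy than a repeated 10-character pattern. An illustrative comparison with Python's zlib (not SQL Anywhere's engine, and using random bytes as an extreme stand-in for low-redundancy data), on values the same ~152-byte size as the production column:

```python
import os
import zlib

# Two 152-byte values: one as repetitive as the REPEAT test data,
# one with no redundancy at all. Real production text falls in between.
repetitive = b"ABCDEFGHIJ" * 15 + b"AB"  # 152 bytes, highly repetitive
random_ish = os.urandom(152)             # 152 bytes, incompressible

for label, data in (("repetitive", repetitive), ("random", random_ish)):
    out = zlib.compress(data)
    print(f"{label:10s} {len(data)} bytes -> {len(out)} bytes")
```

The repetitive value shrinks to a fraction of its size; the incompressible one actually grows, because the fixed overhead is paid either way. That gap is the difference between the 77% test result and the 12.6% production result above.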