According to this reply, a LONG VARCHAR COMPRESSED column value shorter that 256 bytes will not be compressed because the length is shorter than the default INLINE 256: http://sqlanywhere-forum.sap.com/questions/1038/column-compression/1039#1039
Suppose I have a LONG VARCHAR column with the following characteristics (in particular, avg_length = 152):
SELECT * FROM sa_column_stats ( tab_name = 'ttt', col_name = 'ccc', tab_owner = 'DBA', max_rows = 1000000000 ); table_owner,table_name,column_name,num_rows_processed,num_values_compressed,avg_compression_ratio,avg_length,stddev_length,min_length,max_length,avg_uncompressed_length,stddev_uncompressed_length,min_uncompressed_length,max_uncompressed_length 'DBA','ttt','ccc',9757647,0,,151.89775032807609,2348.2860711857497,0,170,151.89775032807609,2348.2860711857497,0,170
What are the pros and cons of forcing compression with COMPRESSED INLINE 0 PREFIX 0?
Will that even work? ( see, I'm not afraid to look stoopid :)
Edit: Yes, it seems to work ( not completely stoopid, at least not this time :)
Here is a small test, first of an uncompressed table fff:
CREATE TABLE fff ( pkey INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY, ccc LONG VARCHAR NULL ); INSERT fff ( ccc ) SELECT REPEAT ( 'ABCDEFGHIJ', 15 ) FROM sa_rowgenerator ( 1, 10000 ); COMMIT;
Foxhound shows it uses up 1.7M of disk space (181 bytes per row) in a 18.104.22.1686 database with 4K page size:
CREATE TABLE DBA.fff ( -- 10,000 rows, 1.7M total = 1.7M table + 0 ext + 16k index, 181 bytes per row pkey /* PK */ INTEGER NOT NULL DEFAULT autoincrement, ccc LONG VARCHAR NULL, CONSTRAINT ASA85 PRIMARY KEY ( -- 16k pkey ) );
Here is what Foxhound says for table ttt, which is identical except for COMPRESSED INLINE 0 PREFIX 0:
CREATE TABLE DBA.ttt ( -- 10,000 rows, 868k total = 344k table + 452k ext + 72k index, 89 bytes per row pkey /* PK */ INTEGER NOT NULL DEFAULT autoincrement, ccc LONG VARCHAR COMPRESSED INLINE 0 PREFIX 0 NULL, CONSTRAINT ASA84 PRIMARY KEY ( -- 72k pkey ) );
Here are the compression statistics (in particular, avg_compression_ratio = 77%):
SELECT @@VERSION, * FROM sa_column_stats ( tab_name = 'ttt', col_name = 'ccc', tab_owner = 'DBA', max_rows = 1000000000 ); @@VERSION,table_owner,table_name,column_name,num_rows_processed,num_values_compressed,avg_compression_ratio,avg_length,stddev_length,min_length,max_length,avg_uncompressed_length,stddev_uncompressed_length,min_uncompressed_length,max_uncompressed_length '22.214.171.1246','DBA','ttt','ccc',10000,10000,76.66666666666667,150.0,0.0,150,150,,,,
So... it looks like COMPRESSED INLINE 0 PREFIX 0 can reduce disk usage quite a bit, even for a relatively short column.
What are the cons?
There are three cons:
Determining whether or not to use compression is, possibly counterintuitively, more a question of performance than disk savings, since disk space is really cheap these days. Saving a few megabytes here and there is less compelling if all your queries slow down (especially if you can get a terabyte for less than $100). If you have a fast enough processor, you likely want to compress as much as possible since decompression is going to be cheaper than extra disk I/Os (i.e. reading 20 compressed pages and decompressing them all may be faster than reading 200 uncompressed pages). If you have a fast disk and a slow processor, decompression may be slower but doing some extra reads may not be that big a deal.
One con: you have at least 12 bytes header for the compression.
answered 24 Sep '10, 06:44
Here are the results of two tests of the INSERT side of a high-throughput application, before and after COMPRESSED INLINE 0 PREFIX 0 was specified for one single LONG VARCHAR column:
Average Core2 Quad Transaction Disk Disk Average Compression CPU Usage Time Space Space Uncompressed Ratio for INSERT for INSERT Rows for Table per Row Column Length for Column Process Process --------- --------- --------- --------- ---------- ---------- ------- Before 9,703,524 5.9G 656 bytes 152 bytes 0 35% 1 sec After 28,776 18M 657 bytes 158 bytes 12.6% 80% 30 sec
Some comments and observations...
Clearly, at least in this case, COMPRESSED INLINE 0 PREFIX 0 was a really bad idea... and perhaps it is a bad idea in all cases.
Perhaps there is a good reason the defaults are INLINE 256 PREFIX 8.
And perhaps the suggestion "don't use COMPRESSED on short columns" is a good one.
answered 25 Sep '10, 08:18