Please be aware that the content in SAP SQL Anywhere Forum will be migrated to the SAP Community in June and this forum will be retired.

According to this reply, a LONG VARCHAR COMPRESSED column value shorter that 256 bytes will not be compressed because the length is shorter than the default INLINE 256: http://sqlanywhere-forum.sap.com/questions/1038/column-compression/1039#1039

Suppose I have a LONG VARCHAR column with the following characteristics (in particular, avg_length = 152):

SELECT * FROM sa_column_stats ( tab_name = 'ttt', col_name = 'ccc', tab_owner = 'DBA', max_rows = 1000000000 );
table_owner,table_name,column_name,num_rows_processed,num_values_compressed,avg_compression_ratio,avg_length,stddev_length,min_length,max_length,avg_uncompressed_length,stddev_uncompressed_length,min_uncompressed_length,max_uncompressed_length
'DBA','ttt','ccc',9757647,0,,151.89775032807609,2348.2860711857497,0,170,151.89775032807609,2348.2860711857497,0,170

What are the pros and cons of forcing compression with COMPRESSED INLINE 0 PREFIX 0?

Will that even work? ( see, I'm not afraid to look stoopid :)


Edit: Yes, it seems to work ( not completely stoopid, at least not this time :)

Here is a small test, first of an uncompressed table fff:

CREATE TABLE fff (
   pkey   INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   ccc    LONG VARCHAR NULL );

INSERT fff ( ccc )
SELECT REPEAT ( 'ABCDEFGHIJ', 15 )
  FROM sa_rowgenerator ( 1, 10000 );
COMMIT;

Foxhound shows it uses up 1.7M of disk space (181 bytes per row) in a 11.0.1.2276 database with 4K page size:

CREATE TABLE DBA.fff ( -- 10,000 rows, 1.7M total = 1.7M table + 0 ext + 16k index, 181 bytes per row
   pkey   /* PK        */ INTEGER NOT NULL DEFAULT autoincrement,
   ccc                    LONG VARCHAR NULL,
   CONSTRAINT ASA85 PRIMARY KEY ( -- 16k
      pkey )
 );

Here is what Foxhound says for table ttt, which is identical except for COMPRESSED INLINE 0 PREFIX 0:

CREATE TABLE DBA.ttt ( -- 10,000 rows, 868k total = 344k table + 452k ext + 72k index, 89 bytes per row
   pkey   /* PK        */ INTEGER NOT NULL DEFAULT autoincrement,
   ccc                    LONG VARCHAR COMPRESSED INLINE 0 PREFIX 0 NULL,
   CONSTRAINT ASA84 PRIMARY KEY ( -- 72k
      pkey )
 );

Here are the compression statistics (in particular, avg_compression_ratio = 77%):

SELECT @@VERSION, * 
  FROM sa_column_stats ( tab_name = 'ttt', col_name = 'ccc', tab_owner = 'DBA', max_rows = 1000000000 );

@@VERSION,table_owner,table_name,column_name,num_rows_processed,num_values_compressed,avg_compression_ratio,avg_length,stddev_length,min_length,max_length,avg_uncompressed_length,stddev_uncompressed_length,min_uncompressed_length,max_uncompressed_length
'11.0.1.2276','DBA','ttt','ccc',10000,10000,76.66666666666667,150.0,0.0,150,150,,,,

So... it looks like COMPRESSED INLINE 0 PREFIX 0 can reduce disk usage quite a bit, even for a relatively short column.

What are the cons?

asked 23 Sep '10, 20:03

Breck%20Carter's gravatar image

Breck Carter
32.5k5417261050
accept rate: 20%

edited 23 Sep '10, 20:29


There are three cons:

  1. Values that are short enough (sorry, I don't remember the threshold) will actually take up more space even when they are compressed because the overhead of compression is larger than the space savings.
  2. If you have prefix 0 inline 0, then no uncompressed data is stored at all. This means that every access to every piece of your string requires decompression. If you do select substr(ccc,1,1) from ttt we would have to decompress the first chunk of each and every string. If you have a non-zero values for prefix, no decompression of any string would be necessary. If you have prefix 0 but a non-zero inline value, then strings shorter than the inline value would not need to be decompressed but longer ones would.
  3. Since none of the row data is stored inline, all of the compressed data is stored in an extension page. This means that at least two pages must be read (one containing the pointer to the extension page, and the extension page itself) in order to get the compressed data. For short values kept inline, all the compressed data can be stored within the row itself, thus saving a page read.

Determining whether or not to use compression is, possibly counterintuitively, more a question of performance than disk savings, since disk space is really cheap these days. Saving a few megabytes here and there is less compelling if all your queries slow down (especially if you can get a terabyte for less than $100). If you have a fast enough processor, you likely want to compress as much as possible since decompression is going to be cheaper than extra disk I/Os (i.e. reading 20 compressed pages and decompressing them all may be faster than reading 200 uncompressed pages). If you have a fast disk and a slow processor, decompression may be slower but doing some extra reads may not be that big a deal.

permanent link

answered 23 Sep '10, 22:50

Graeme%20Perrow's gravatar image

Graeme Perrow
9.6k379124
accept rate: 54%

edited 24 Sep '10, 12:30

5

Graeme alludes to a third con but doesn't explicitly state it:

  1. Using inline 0 prefix 0 requires an extra disk i/o for any access to the column data because the compressed data is stored in the extension arena (i.e. set of pages).
(24 Sep '10, 04:34) Mark Culp

@Graeme: What if the 99.9% of the queries don't involve the commpressed column, and what if INLINE 0 PREFIX 0 reduces the size of the data portion of the row by one third? (even if not COMPRESSED) Performance is not just CPU, performance is disk I/O... sometimes it's ALL ABOUT disk I/O, and having that cheapo terabyte disk doesn't help if it takes forever to read it. (and yes, in fact, I am using a pair of cheapo terabyte disks for the testing, but alas, they cost more than $100 when I bought them a couple of years ago :)

(24 Sep '10, 10:43) Breck Carter

@Graeme: Folks sometimes put the blobs in separate tables... INLINE 0 PREFIX 0 has a somewhat similar effect without all the DDL sturm und drang... and possibly better performance. Caveat: Putting blobs in a separate table is still necessary if the point is to avoid unnecessary MobiLink upload traffic if the other columns are frequently uploaded but the blobs aren't.

(24 Sep '10, 10:51) Breck Carter

Thanks Mark - I've updated my answer.

(24 Sep '10, 12:30) Graeme Perrow

One con: you have at least 12 bytes header for the compression.

permanent link

answered 24 Sep '10, 06:44

Martin's gravatar image

Martin
9.0k130169257
accept rate: 14%

Is that written down somewhere, or did someone tell you... or do you know because you wrote the code? I'm not doubting you, I just have a compelling need for, as they say, "original sources" :)

(24 Sep '10, 10:46) Breck Carter

Documentation says, that for the column compression the same algorithm is used as for the Compress function. In the documentation for the Compress function it says, that 12 bytes extra are needed.

(24 Sep '10, 12:07) Martin

Ooooo... it also says this, I wonder if it applies to COMPRESSED columns? "If you are storing compressed values in a table, the column should be BINARY or LONG BINARY so that character set conversion is not performed on the data."

(24 Sep '10, 13:02) Breck Carter
1

@Breck: No, that part does not apply to compressed columns. With a compressed column, the character set conversion is performed on the uncompressed data, so you can use VARCHAR or BINARY. That's referring to storing a value that has been compressed externally (or through the COMPRESS function).

(24 Sep '10, 14:39) Graeme Perrow

@Graeme: I get it, you would definitely NOT want any conversion operation applied to the value after compression, so the data type should be BINARY, when using the COMPRESS function.

(24 Sep '10, 19:32) Breck Carter

Here are the results of two tests of the INSERT side of a high-throughput application, before and after COMPRESSED INLINE 0 PREFIX 0 was specified for one single LONG VARCHAR column:

                                                     Average    Core2 Quad  Transaction
                   Disk      Disk       Average    Compression  CPU Usage      Time     
                   Space     Space   Uncompressed     Ratio     for INSERT  for INSERT
          Rows   for Table  per Row  Column Length  for Column   Process     Process
       --------- --------- ---------   ---------    ----------  ----------   -------
Before 9,703,524   5.9G    656 bytes   152 bytes        0          35%        1 sec
After     28,776    18M    657 bytes   158 bytes      12.6%        80%       30 sec

Some comments and observations...

  1. The two tests aren't exactly the same: the "Before" test ran a lot longer, and the uncompressed column lengths turned out to be slightly different. Otherwise, the only difference is that the "After" test used COMPRESSED INLINE 0 PREFIX 0 on one single LONG VARCHAR column.

  2. The "Disk Space" columns are based on the total number of pages allocated to the table. This includes the data, extension and index pages as well as any free space contained therein... in other words, real disk space usage.

  3. The compression ratio was fairly low. That was a disappointment.

  4. Approximately 5 bytes per row were saved, hardly worth it (6 extra bytes per column value, minus 1 extra byte disk space per row). The compression ratio indicates 20 bytes per row were saved; perhaps the missing 15 bytes went towards the overhead of creating extension pages.

  5. It is possible that the "After" test was too short to yield accurate Disk Space values, making point 4 meaningless. But that's moot, read on...

  6. All four CPU processors were pegged at 100% in the "After" test, with 80% of that going to the INSERT Process... that was a big surprise.

  7. The Transaction Time was an even bigger surprise. From the end-user point of view, the application was crushed. That's the reason the "After" test was cut short: the patient was dying, time to stop the experiment.

Clearly, at least in this case, COMPRESSED INLINE 0 PREFIX 0 was a really bad idea... and perhaps it is a bad idea in all cases.

Perhaps there is a good reason the defaults are INLINE 256 PREFIX 8.

And perhaps the suggestion "don't use COMPRESSED on short columns" is a good one.

permanent link

answered 25 Sep '10, 08:18

Breck%20Carter's gravatar image

Breck Carter
32.5k5417261050
accept rate: 20%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×275

question asked: 23 Sep '10, 20:03

question was seen: 2,233 times

last updated: 28 Sep '10, 21:16