Hi,

we have a discussion if it is worth to shorten the data in a PK-column. The column is VARCHAR(15), the data contents is also 15 character but more than half of it are redundant '0'-characters, means, the column contains data like that '000000000123456' while we know we need only '123456'. There are a lot of accesses to this table because this is a very central table referenced a lot of other tables via foreign key relations. That means the columns in those other tables are also subject of this optimization. The UL-Database runs on a Windows CE device on Flash Memory or SD-Card and we have optimization goals like overall performance and battery run-time.

Thanks for any hints,

Chris

asked 19 May '14, 03:39

Chris%20Werner's gravatar image

Chris Werner
20691018
accept rate: 100%

1

Do all such values start with several leading zeroes? - If so, I would think this would make indexes (as used to maintain FK relationships) on these values rather inefficient - compare this UL doc page "Choose and set the maximum hash size":

The row data

The row values of the data being stored in the database also influence the effectiveness of a hashed index.

For example, if you have a common prefix shared among entries of a given column, you may render the hash ineffective if you choose a size that only hashes prefixes. In this case, you need to choose a size that ensures more than just the common prefix is hashed. If the common prefix is long, you should consider not hashing the values at all.

I think the "000000000" might make up such a common prefix.

AFAIK, that would only have an influence if the index does store hashed column data at all, i.e. when configured that way...

(19 May '14, 04:11) Volker Barth

Thank you for the comment. Yes, all values start with eight or nine '0'-characters. The data are filled up to the maximum length with zeroes (for an external source of the data handle it that way). The longest signifikant content has 7 characters (so more than half of the data is not signifikant). We haven't plans yet to use hashed column data in indexes but that may change in the future.

(19 May '14, 05:02) Chris Werner
Replies hidden
1

Are all values strictly numeric? In that case the column could be converted to "integer", which would lead to better storage (and index) efficiency.

(19 May '14, 06:14) Reimer Pods

Thanks Reimer. For the time being the values are numeric but we can't exclude alphanumeric values for the future for this is controled by a third party ERP which allows this attribute to be alphanumeric. So we are bound to the alphanumeric datatype.

(19 May '14, 08:21) Chris Werner
1

Note that by default UL stores value hashes in the index to improve performance. (This works very well because an index can often be searched down to the target row without fetching any actual rows to do comparisons -- assuming the hashes represent the data well.)

The default hash size is 4 bytes. For a string, this is essentially the first four bytes of the string. So, in your case all index hashes will be "0000", and that's not going to help :-) In this case, you should get better performance by disabling the index hash. That way, index pages will contain more entries, and that improves performance too.

If you can arrange your key values to have more unique prefixes, the index hashing should be a significant benefit. You can obtain a best-case boundary for testing by using integers and the default 4 byte index hashes.

(20 May '14, 12:02) Tim McClements

Thanks for the very helpfull explanation! I am going to arrange the key values to have more unique prefixes by eleminating the leading zeroes from it.

(20 May '14, 13:46) Chris Werner
showing 4 of 6 show all flat view

It will go faster without the zero prefixes, and it will go even faster with an integer type. For these two cases, the default index value hashing is appropriate.

With the leading zeros, disabling the index value hashing will help. You can also try increasing the size of the index value hash to contain the entire key value. The maximum is 32 bytes.

As usual for performance decisions, to quantify the benefits you will need to test your application with the changes. Happily, adjusting the index value hash size is simple enough and doesn't require any application changes.

See the create table and create index statement 'with max hash size' clause.

permanent link

answered 20 May '14, 12:20

Tim%20McClements's gravatar image

Tim McClements
2.0k1830
accept rate: 35%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×162
×21

question asked: 19 May '14, 03:39

question was seen: 2,096 times

last updated: 20 May '14, 13:46