There is no better performance test data than production data, BUT sometimes production data must be "anonymized" before being used by developers.

Does anyone have any suggestions about how to do this so that the SQL Anywhere query optimizer will pick the more-or-less the same plans and result in more-or-less the same performance?

Some possible rules come to mind: Any two equal input values must result in two equal output values; any two different input values must result in two different output values; the distribution of output values must "look" more-or-less-the same as the distribution of input values.

Does this only matter for columns in primary keys, foreign keys and indexes?

Should statistics be recreated after anonymizing? ...or is there a danger recreation will throw everything off?

asked 17 Nov '14, 14:02

Breck%20Carter's gravatar image

Breck Carter
32.5k5417261050
accept rate: 20%

It seems that you need the obfuscating functionality. I do not know if it makes sense to obfuscate primitive values, but for strings you can try to use the funny function from the link http://dba.stackexchange.com/questions/23786/data-obfuscation-in-sql-server.
Or something similar.

(18 Nov '14, 03:03) Vlad
Replies hidden

Thanks!

The point of the question, however, is not how to anonymize data, but how to do it to a SQL Anywhere database without affecting query performance one way or another.

It may not be possible... perhaps the answer is to leave the database data alone and anonymize the query output, but that may not satisfy the letter of the law.

(18 Nov '14, 07:39) Breck Carter

In order to understand: Is there a further rule "input and output values must be of (almost) the same length"? - Say, encrypting all strings with the same key would satisfy the first and second rule but the distribution would certainly be different, and (at least when using the builtin ENCRYPT() function) output data would be noticably longer, so I guess that could have a significant influence on plans.


If you would however "simply" modify data by replacing particular bytes with other bytes (as in a "Substitution cipher", I guess all four rules would hold but apparently it would be not too difficult to encode that "anonymization".

Yes, that's a comment, no answer at all...

(18 Nov '14, 07:58) Volker Barth

Or at least, translate the strings to any ASCII-based language before encrypting them. This will add a little more complexity :)

(18 Nov '14, 08:24) Vlad
Be the first one to answer this question!
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×1

question asked: 17 Nov '14, 14:02

question was seen: 8,222 times

last updated: 18 Nov '14, 08:24

Related questions