How is performance impacted when you have a table that, let's say, could have over 1M rows, and there are 3-5 columns that are consistently NULL?
Is it bad design to have data structures in place that support this? Should we look at normalizing it a little more?
P.S. Love this place.
Glad you like this place! It's only going to get better when more people show up.
I'm not sure what you are asking in your second sentence; in particular, I'm not sure what NULL has to do with normalization.
Tackling the first question...
I don't think NULL values cause performance problems per se, at least not that you'd notice. I know that historically, especially with those icky legacy DBMS products like [expletive deleted] and [bodily harm], folks used to worry a lot about fixed-versus-variable length columns, physical ordering of columns in rows, [excrement] like that.
What CAN hurt performance somewhat is inserting rows with all NULLs, all zeroes, or all empty character strings, and then later updating those rows to fill in the values: instant row splits across multiple pages, instant fragmentation of your physical database storage. At least, that was true historically with SQL Anywhere... I am not absolutely sure V11 hasn't eased that problem, so I will let Mr. Paulley et al. fill in the blanks here.
I haven't run any INSERT/UPDATE/row-split tests lately, and when I did, it wasn't easy to demonstrate the problem; you had to work at it. And it wasn't directly caused by NULLs per se, but by values that grow in size over time.
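The growth pattern described above can be sketched outside SQL Anywhere, too. Here's a minimal illustration using SQLite (a different engine with different storage internals, so this only shows the general effect, not SQL Anywhere's row splits; the table and column names are invented for the demo): rows inserted with NULL columns are tiny, and filling them in later forces the storage to expand and rows to be rewritten.

```python
import sqlite3

# Demo only: SQLite, not SQL Anywhere. Table/column names are made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT)")

# Insert 10,000 rows with the three TEXT columns left NULL.
con.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(10000)])
con.commit()
pages_before = con.execute("PRAGMA page_count").fetchone()[0]

# Now fill in the NULL columns with real values, as the application
# might do later in the rows' lifetime.
big = "x" * 200
con.execute("UPDATE t SET a = ?, b = ?, c = ?", (big, big, big))
con.commit()
pages_after = con.execute("PRAGMA page_count").fetchone()[0]

# The page count grows substantially after the update: the rows no
# longer fit where they were originally written.
print(pages_before, pages_after)
```

The point isn't the exact numbers, just that "insert skinny, update fat" makes the engine do extra physical work after the fact, which is Breck's caution in a nutshell.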
I guess I'm saying, there are lots of reasons to hate NULLs but performance isn't one of them.
answered 15 Jan '10, 08:06
In addition to Breck's and Martin's responses: NULLs are quite ignorable with respect to storage size.
From the SA 11 docs:
answered 15 Jan '10, 09:18