Null
is an important, but sometimes hard concept. What’s the difference between an empty string and a null
string? One of my first Stack Overflow questions was (NOT) NULL for NVARCHAR columns.
Some people claim that using null
is always wrong or is some kind of voodoo.
If you feel the urge to use Nullable<int> or Nullable<bool> stop! You're doing it wrong…
— Mike Hadlow (@mikehadlow) December 13, 2012
This time Mike is wrong. There is a place for null
, including Nullable<bool>
and Nullable<int>
.
Null means no value
null
is a special value that means no value. For plain C pointers it’s just a name for the magic number 0 (I know that according to the specification it can have another numerical representation than 0, but in reality it doesn’t). In C# null
is a special value that’s not part of the reference value space. The same is true for nullable value types (Nullable<int>
or int?
for short) where an int?
can take any permitted value for an int
or be null
. For a SQL column the same is true, a nullable int column can take any possible int
value plus or be null
.
Empty string or Null
For strings things get a bit more complicated. A SQL NVARCHAR() NULL
can be either empty or null
. If you allow the string to be null
you’d better have a strict definition of how null
is different to an empty string. There might be cases where null
means unspecified while an empty string means specified as empty. They are unusual (I even failed to come up with an example). In most cases I find it best to use empty strings instead of null
. Unfortunately C# doesn’t allow non-null strings.
Empty values should be null
In Nullability Voodoo Mike argues why using null
is wrong.
Rather than using the nullability of EndDate to mean that the task hasn’t completed, consider giving the task a status instead.
He is right that EndDate
being null
is a bad way of marking a task as not completed. Especially if there are many different states that are dependent on different fields it can quickly get hard to find the state. I prefer an explicit state field. It might be implemented as an in memory only, calculated field, on the entity corresponding to the database row. That keeps the database normalized.
Even if Mike is right that having only a null
value for EndDate
is a bad marker for a “not completed” state, that’s still not a reason to not have EndDate
nullable. If the table indeed has a state field, which clearly marks a row as “not completed”, what’s the right value to put in for EndDate
? As the task is not yet completed there is no end date. It is undefined. Undefined is represented as null
.
Exclude Undefined values from Calculations
Using null
for undefined values effectively excludes them from calculations which is good. When a manager comes running, asking for a quick ad hoc report shoving the average number of items shipped for each order you don’t want to include non shipped orders (with incomplete data) in the calculation. If the ShippedItemCount
column isn’t nullable, all non shipped orders have a 0 value. In the ad hoc report a filter has to be applied to ignore those 0 values in the calculation.
If null
is instead used for ShippedItemCount
until the order is actually shipped, those values are automatically excluded from the calculation.
Mike is Right and Wrong
Mike is right in that null
should be used with care. null
is a powerful tool that should only be used where appropriate. In fact, Anders Hejlsberg regrets that non-nullable reference types are not available in C#. null
should be an opt-in for where it is appropriate. Not mandatory as it is now.
An interesting perspective and some good points. It’s a bit of a never-ending debate, but for me in databases:
* I choose empty string over null in all character types: char, varchar, text… etc. (often there’s no choice one way or the other depending on the DB and the data type). To me they are the same. I see no value in differentiating them. Nearly any logic that you write that differentiates “” from null and treats them differently, would be either abusing the meaning of null, or abusing the meaning of “” (similar to your example of the date/status column above). I’d love an example to counter this.
* I never make a nullable boolean type. Otherwise it’s no longer boolean. :-) In this case I’d make a status type that supported Yes, No, Maybe, Unset etc… whatever null was supposed to imply.
* I will judiciously use null for all number types: numeric, integer etc, as I don’t like magic values that imply no value — not even zero. But I will use zero where it makes sense, like an account balance of $0.
* I almost always use nullable date fields, unless the date is actually required or has a domain sensible default.
* I avoid nullable foreign keys wherever possible, especially against static lookup tables, preferring instead to have a record that means “none” or “unselected”. Of course, sometimes null is needed for something like a person hierarchy, where the top of the hierarchy has no parent… I would not create a person record named “root” for example… I’d just break down and use a nullable foreign key.
Cheers,
Clinton
I will MOST often set DateTime fields (and mapped SQL Date columns) to a domain default when appropriate instead of NULL. This allows proper partitioning on that column. An example would be an “ending date” that gets set to 9999-12-31 23:59:59.997 so all queries against the column are BETWEEN clauses. This quickly lets the database partition all the not-ended rows into a partition that ONLY has that date.
While I understand that there might be performance gains to be won by such an approach, I would prefer to keep that as an implementation detail of my data storage layer and have my data access layer abstract that away from my domain layer. Letting a performance trick for the database leak into the domain layer feels like a bad idea. What if the next release of the database engine behaves differently? Will you change your domain logic?
An simple example for a valid null string/varchar.
Take a look at a Person table:
FirstName: Bart
Middle : null
LastName: Calixto
and
FirstName: John
Middle: “”
LastName: Doe,
what’s the difference ?
Middle name for bart calixto is undefined, so you don’t know if he has or not a middle name, while you know that John Doe does not have a middle name.
I know is not very practical, but … you can ask people to fill in their middle name if it’s null while if it’s empty you can’t because you don’t know if it isn’t filled or if they don’t have one.
I favor null over empty string where the value can actually be undefined.