On Null or Why Empty Strings are not Same as Null

Null is an important, but sometimes hard concept. What’s the difference between an empty string and a null string? One of my first Stack Overflow questions was (NOT) NULL for NVARCHAR columns.

Some people claim that using null is always wrong or is some kind of voodoo.

This time Mike is wrong. There is a place for null, including Nullable<bool> and Nullable<int>.

Null means no value

null is a special value that means no value. For plain C pointers it’s just a name for the magic number 0 (I know that according to the specification it can have another numerical representation than 0, but in reality it doesn’t). In C# null is a special value that’s not part of the reference value space. The same is true for nullable value types (Nullable<int> or int? for short) where an int? can take any permitted value for an int or be null. For a SQL column the same is true, a nullable int column can take any possible int value plus or be null.

Empty string or Null

For strings things get a bit more complicated. A SQL NVARCHAR() NULL can be either empty or null. If you allow the string to be null you’d better have a strict definition of how null is different to an empty string. There might be cases where null means unspecified while an empty string means specified as empty. They are unusual (I even failed to come up with an example). In most cases I find it best to use empty strings instead of null. Unfortunately C# doesn’t allow non-null strings.

Empty values should be null

In Nullability Voodoo Mike argues why using null is wrong.

Rather than using the nullability of EndDate to mean that the task hasn’t completed, consider giving the task a status instead.

He is right that EndDate being null is a bad way of marking a task as not completed. Especially if there are many different states that are dependent on different fields it can quickly get hard to find the state. I prefer an explicit state field. It might be implemented as an in memory only, calculated field, on the entity corresponding to the database row. That keeps the database normalized.

Even if Mike is right that having only a null value for EndDate is a bad marker for a “not completed” state, that’s still not a reason to not have EndDate nullable. If the table indeed has a state field, which clearly marks a row as “not completed”, what’s the right value to put in for EndDate? As the task is not yet completed there is no end date. It is undefined. Undefined is represented as null.

Exclude Undefined values from Calculations

Using null for undefined values effectively excludes them from calculations which is good. When a manager comes running, asking for a quick ad hoc report shoving the average number of items shipped for each order you don’t want to include non shipped orders (with incomplete data) in the calculation. If the ShippedItemCount column isn’t nullable, all non shipped orders have a 0 value. In the ad hoc report a filter has to be applied to ignore those 0 values in the calculation.

If null is instead used for ShippedItemCount until the order is actually shipped, those values are automatically excluded from the calculation.

Mike is Right and Wrong

Mike is right in that null should be used with care. null is a powerful tool that should only be used where appropriate. In fact, Anders Hejlsberg regrets that non-nullable reference types are not available in C#. null should be an opt-in for where it is appropriate. Not mandatory as it is now.

4 comments

  1. An interesting perspective and some good points. It’s a bit of a never-ending debate, but for me in databases:

    * I choose empty string over null in all character types: char, varchar, text… etc. (often there’s no choice one way or the other depending on the DB and the data type). To me they are the same. I see no value in differentiating them. Nearly any logic that you write that differentiates “” from null and treats them differently, would be either abusing the meaning of null, or abusing the meaning of “” (similar to your example of the date/status column above). I’d love an example to counter this.

    * I never make a nullable boolean type. Otherwise it’s no longer boolean. :-) In this case I’d make a status type that supported Yes, No, Maybe, Unset etc… whatever null was supposed to imply.

    * I will judiciously use null for all number types: numeric, integer etc, as I don’t like magic values that imply no value — not even zero. But I will use zero where it makes sense, like an account balance of $0.

    * I almost always use nullable date fields, unless the date is actually required or has a domain sensible default.

    * I avoid nullable foreign keys wherever possible, especially against static lookup tables, preferring instead to have a record that means “none” or “unselected”. Of course, sometimes null is needed for something like a person hierarchy, where the top of the hierarchy has no parent… I would not create a person record named “root” for example… I’d just break down and use a nullable foreign key.

    Cheers,
    Clinton

  2. I will MOST often set DateTime fields (and mapped SQL Date columns) to a domain default when appropriate instead of NULL. This allows proper partitioning on that column. An example would be an “ending date” that gets set to 9999-12-31 23:59:59.997 so all queries against the column are BETWEEN clauses. This quickly lets the database partition all the not-ended rows into a partition that ONLY has that date.

    1. While I understand that there might be performance gains to be won by such an approach, I would prefer to keep that as an implementation detail of my data storage layer and have my data access layer abstract that away from my domain layer. Letting a performance trick for the database leak into the domain layer feels like a bad idea. What if the next release of the database engine behaves differently? Will you change your domain logic?

  3. An simple example for a valid null string/varchar.

    Take a look at a Person table:

    FirstName: Bart
    Middle : null
    LastName: Calixto

    and

    FirstName: John
    Middle: “”
    LastName: Doe,

    what’s the difference ?
    Middle name for bart calixto is undefined, so you don’t know if he has or not a middle name, while you know that John Doe does not have a middle name.

    I know is not very practical, but … you can ask people to fill in their middle name if it’s null while if it’s empty you can’t because you don’t know if it isn’t filled or if they don’t have one.

    I favor null over empty string where the value can actually be undefined.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.