Beware of Uri.ToString()

When working with URLs, it’s sometimes better to use the Uri class than to keep the URL in a plain string. The Uri class validates the format and helps split out the parts of the Uri in a safe manner. But there is a big gotcha: Uri.ToString() returns an unescaped representation of the Uri.

The contents of this post might sound simple, but they were behind a nasty heisenbug. Every single insight in this post is something that I learned in a very painful way. I hope that reading this post will convey the same insights in a less painful way.

TL;DR in two lines of code

The entire problem can be expressed in two lines of code.

var uri = new Uri("http://localhost?p1=Value&p2=A%20B%26p3%3DFooled!");
Console.WriteLine("uri.ToString(): " + uri.ToString());

It looks simple and it should be simple, but it isn’t. When running these two lines on the .NET Framework 4 the following output is produced:

http://localhost/?p1=Value&p2=A B&p3=Fooled!

The query string has been decoded in such a way that it looks like there is an extra parameter p3!
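To make the danger concrete, here is a minimal sketch that splits both forms of the query string into parameters. The QueryDemo class and its Parse helper are hypothetical names made up for this illustration; the two input strings are copied from the code and output above. It only shows what any downstream code that re-parses the ToString() output would see.

```csharp
using System;
using System.Collections.Generic;

public static class QueryDemo
{
    // Hypothetical helper: splits a query string (without the leading '?')
    // on '&' and '=' first, and only then decodes each key and value.
    public static List<KeyValuePair<string, string>> Parse(string query)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (var pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var parts = pair.Split(new[] { '=' }, 2);
            var key = Uri.UnescapeDataString(parts[0]);
            var value = parts.Length > 1 ? Uri.UnescapeDataString(parts[1]) : "";
            result.Add(new KeyValuePair<string, string>(key, value));
        }
        return result;
    }

    public static void Main()
    {
        // The query as passed to the Uri constructor.
        var escaped = "p1=Value&p2=A%20B%26p3%3DFooled!";
        // The query as it appears in the ToString() output on .NET 4.0.
        var unescaped = "p1=Value&p2=A B&p3=Fooled!";

        Console.WriteLine(Parse(escaped).Count);   // 2 parameters: p1, p2
        Console.WriteLine(Parse(unescaped).Count); // 3 parameters: p1, p2, p3
    }
}
```

With the escaped form, p2 carries the whole value "A B&p3=Fooled!". With the unescaped form, that information is lost and a bogus p3 appears.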

When targeting .NET 4.5, however, only the space is unescaped. This can be explained as a result of the breaking changes to System.Uri in .NET 4.5. But that is not the whole story. It gets more complicated (and bug-prone) because .NET 4.5 is an in-place upgrade to .NET 4.0.

Causing Pain by Using a .NET 4.5 library from .NET 4.0

Normally .NET won’t allow an assembly to reference another assembly with a higher framework target than its own. But since .NET 4.5 is an in-place upgrade to .NET 4.0, this can be achieved by some clever (or if you prefer to call it “evil”, that’s fine) ordering of compilation.

I’ve created two applications.

  • NET40App is a console app that targets .NET 4.0.
  • NET45App is a console app that targets .NET 4.5.

Trying to add the NET45App as a reference of the NET40App isn’t allowed. Compilation will fail due to the NET45App targeting a higher framework version than the NET40App. But if we compile things in a clever/evil order we can work around that:

  1. Build NET45App with target framework .NET 4.0.
  2. Build NET40App and let it reference the .NET 4.0 build of the NET45App.
  3. Rebuild NET45App with target framework .NET 4.5 and copy it to the NET40App’s directory.

This is exactly what the BuildAndTest.bat file in the source for this post does.

The result is two console applications. The NET45App runs against the .NET 4.5 framework. The NET40App runs against the .NET 4.0 framework, but all it does is call the Main method of the NET45App, so exactly the same code is executed in both cases.

When running those applications the following output is produced.

*** Running the .NET 4.5 application ***
http://localhost/?p1=Value&p2=A B%26p3%3DFooled!
Default Regex.MatchTimeout: -00:00:00.0010000
 
*** Running the .NET 4.0 application ***
http://localhost/?p1=Value&p2=A B&p3=Fooled!
Default Regex.MatchTimeout: -00:00:00.0010000

Running from the .NET 4.0 application decodes the entire query string. Running from the .NET 4.5 application decodes the space, but not the reserved characters & and =. The Regex.MatchTimeout property is accessed just to prove that the code indeed has access to .NET 4.5 features (that property is not available in .NET 4.0).
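For readers who want to reproduce the experiment, the following is a rough sketch of what the shared code could look like. It is an assumption reconstructed from the output above, not the actual source of the post; the class name Net45AppSketch and its two helper methods are invented for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Net45AppSketch
{
    // Returns whatever Uri.ToString() produces for the test URL in the
    // current runtime context. The result depends on the framework
    // behaviour in effect, so no expected value is hard-coded here.
    public static string DescribeUri()
    {
        var uri = new Uri("http://localhost?p1=Value&p2=A%20B%26p3%3DFooled!");
        return uri.ToString();
    }

    // Regex.MatchTimeout was introduced in .NET 4.5, so code touching it
    // cannot be compiled against .NET 4.0. A Regex created without an
    // explicit timeout reports Regex.InfiniteMatchTimeout (-1 ms).
    public static string DescribeDefaultTimeout()
    {
        var regex = new Regex("a");
        return "Default Regex.MatchTimeout: " + regex.MatchTimeout;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeUri());
        Console.WriteLine(DescribeDefaultTimeout());
    }
}
```

The NET40App would then contain nothing but a call to Net45AppSketch.Main(), so that the only difference between the two runs is the target framework of the executable that starts the process.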

What this shows is that it isn’t enough to target and test a class library with .NET 4.5. It can still be called in a .NET 4.0 context and misbehave. I guess that the reason we had a heisenbug was that the deployed web sites targeted .NET 4.0 while the test sites targeted .NET 4.5.

There is one more question here, though, that I haven’t been able to figure out: how can the Uri class behave differently depending on the target framework of the assembly? As far as I’ve been able to find out, the exact same version of System.dll is used in both cases. If anyone knows, please leave a comment!

The Real Solution

Having examined the System.Uri class in more depth than can be considered healthy, I think that I have come to some conclusions on when and how to use it properly.

The first observation is to never use the ToString() method for anything but displaying to the user. And it probably shouldn’t be used in that case either, because the unescaping behaviour is confusing. No, the property to use is OriginalString, which always returns the exact string that was passed to the constructor.
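A small sketch for comparing the representations side by side. The class name UriRepresentations is made up for this example, and AbsoluteUri is included only for comparison; it is not discussed in this post. Only the OriginalString line is guaranteed to echo the input; the other two are printed without a stated expected value because they depend on the framework behaviour described above.

```csharp
using System;

public static class UriRepresentations
{
    public const string Input = "http://localhost?p1=Value&p2=A%20B%26p3%3DFooled!";

    public static void Main()
    {
        var uri = new Uri(Input);

        // Always the exact string that was passed to the constructor.
        Console.WriteLine("OriginalString: " + uri.OriginalString);

        // Canonical, escaped form of the Uri (shown for comparison).
        Console.WriteLine("AbsoluteUri:    " + uri.AbsoluteUri);

        // Unescaped form; how much is unescaped depends on the runtime context.
        Console.WriteLine("ToString():     " + uri.ToString());
    }
}
```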

But what is the point of using the Uri class at all then? There is one strong reason: security. When handling URIs/URLs that are received from user or network input, the Uri class safely parses them and provides access to the components. The Uri class is a Uri reader/parser. When creating Uris in an application (e.g. as the target of a redirect) there is little value in using the Uri class.
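As an illustration of the reader/parser role, here is a minimal sketch that validates an incoming URL before it is trusted. The IncomingUrlCheck class, the IsAllowedHttpUrl method and the expectedHost parameter are hypothetical names chosen for this example; real validation rules will depend on the application.

```csharp
using System;

public static class IncomingUrlCheck
{
    // Hypothetical helper: accepts only absolute http/https URLs that
    // point at the expected host. The components are read from the parsed
    // Uri instead of being picked out of the raw string by hand.
    public static bool IsAllowedHttpUrl(string input, string expectedHost)
    {
        Uri uri;
        if (!Uri.TryCreate(input, UriKind.Absolute, out uri))
        {
            return false; // Not a well-formed absolute Uri.
        }

        var isHttp = uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
        var hostMatches = string.Equals(uri.Host, expectedHost, StringComparison.OrdinalIgnoreCase);
        return isHttp && hostMatches;
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowedHttpUrl("http://localhost?p1=Value", "localhost")); // True
        Console.WriteLine(IsAllowedHttpUrl("javascript:alert(1)", "localhost"));       // False
        Console.WriteLine(IsAllowedHttpUrl("http://evil.example/", "localhost"));      // False
    }
}
```

Note that the sketch never calls ToString() on the parsed Uri. If the original text is needed later, OriginalString is the safer choice.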

This is actually exactly how Katana (Microsoft’s OWIN framework) works. It uses the Uri class for the incoming URI in IOwinRequest.Uri but accepts a string as the parameter to IOwinResponse.Redirect.
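For the outgoing direction, a redirect target can be assembled as a plain string, escaping each component explicitly. The sketch below is one possible way to do that, not something taken from Katana; the RedirectTargets class and the BuildRedirectTarget method are hypothetical. It relies only on Uri.EscapeDataString, and the sample value deliberately avoids characters such as '!' whose escaping differs between framework versions.

```csharp
using System;

public static class RedirectTargets
{
    // Hypothetical helper: appends a single query parameter to a base URL.
    // The name and value are escaped individually, so reserved characters
    // such as '&' and '=' inside the value cannot create extra parameters.
    public static string BuildRedirectTarget(string baseUrl, string name, string value)
    {
        return baseUrl + "?" + Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
    }

    public static void Main()
    {
        var target = BuildRedirectTarget("http://localhost/", "p2", "A B&p3=Fooled");
        Console.WriteLine(target); // http://localhost/?p2=A%20B%26p3%3DFooled
    }
}
```

The resulting string can be handed directly to an API that takes a string, such as IOwinResponse.Redirect, without a round trip through the Uri class.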
