When working with URLs, it is sometimes better to use the Uri class than to keep the URL in a plain string. The Uri class validates that the string is a well-formed URI and helps split it into its parts in a safe manner. But there is a big gotcha: Uri.ToString() returns an unescaped representation of the Uri.
The contents of this post might sound simple, but they were behind a nasty heisenbug. Every single insight in this post is something that I learned in a very painful way. I hope that reading this post will convey the same insights in a less painful way.
TL;DR; in two lines of code
The entire problem can be expressed in two lines of code.
    var uri = new Uri("http://localhost?p1=Value&p2=A%20B%26p3%3DFooled!");
    Console.WriteLine("uri.ToString(): " + uri.ToString());
It looks simple and it should be simple, but it isn’t. When running these two lines on the .NET Framework 4 the following output is produced:
    http://localhost/?p1=Value&p2=A B&p3=Fooled!
The query string has been decoded in such a way that it looks like there is an extra parameter p3!
When targeting .NET 4.5, however, only the space is unescaped. This can be explained as a result of the breaking changes to System.Uri in .NET 4.5. But that is not the whole story. It gets more complicated (and bug-prone) because .NET 4.5 is an in-place upgrade to .NET 4.0.
Causing Pain by Using a .NET 4.5 library from .NET 4.0
Normally .NET won’t allow an assembly to reference another assembly with a higher framework target than the referencing assembly. But since .NET 4.5 is an in-place upgrade to .NET 4.0, this can be achieved by some clever (or if you prefer to call it “evil”, that’s fine) ordering of compilation.
I’ve created two applications.
- NET40App is a console app that targets .NET 4.0.
- NET45App is a console app that targets .NET 4.5.
Trying to add the NET45App as a reference of the NET40App isn’t allowed. Compilation will fail due to the NET45App targeting a higher framework version than the NET40App. But if we compile things in a clever/evil order we can work around that:
- Build NET45App with target framework .NET 4.0.
- Build NET40App and let it reference the .NET 4.0 build of the NET45App.
- Rebuild NET45App with target framework .NET 4.5 and copy it to the NET40App’s directory.
This is exactly what the BuildAndTest.bat file in the source for this post does.
The result is two console applications. The NET45App runs against the .NET 4.5 framework. The NET40App runs against the .NET 4.0 framework, but all it does is call the Main method of the NET45App, so exactly the same code is executed in both cases.
When running those applications the following output is produced.
    *** Running the .NET 4.5 application ***
    http://localhost/?p1=Value&p2=A B%26p3%3DFooled!
    Default Regex.MatchTimeout: -00:00:00.0010000
    *** Running the .NET 4.0 application ***
    http://localhost/?p1=Value&p2=A B&p3=Fooled!
    Default Regex.MatchTimeout: -00:00:00.0010000
Running from the .NET 4.0 application decodes the entire query string. Running from the .NET 4.5 application decodes the space, but not the reserved characters & and =. The Regex.MatchTimeout property is accessed just to prove that the code indeed has access to .NET 4.5 features (that property is not available in .NET 4.0).
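The shared program from the post's source is not reproduced here. As a rough sketch only, code producing output of that shape could look like the following; the class and method names are made up for illustration, and the Uri line will print whichever variant the current runtime produces.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the shared program; not the post's actual source.
public static class Net45AppSketch
{
    // Regex.MatchTimeout only exists from .NET 4.5 onwards. With no timeout
    // configured it holds Regex.InfiniteMatchTimeout, which is -1 millisecond.
    public static TimeSpan DefaultMatchTimeout()
    {
        return new Regex("a").MatchTimeout;
    }

    public static void Main()
    {
        var uri = new Uri("http://localhost?p1=Value&p2=A%20B%26p3%3DFooled!");

        // How much of the query string is unescaped here depends on the
        // framework context, which is the point of the experiment.
        Console.WriteLine(uri.ToString());

        Console.WriteLine("Default Regex.MatchTimeout: " + DefaultMatchTimeout());
    }
}
```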
What this shows is that it isn’t enough to target and test a class library with .NET 4.5. It can still be called in a .NET 4.0 context and misbehave. I guess that the reason we had a heisenbug was that the deployed web sites targeted .NET 4.0 while the test sites targeted .NET 4.5.
There is one more question here though that I haven’t been able to figure out: How can the Uri class behave differently depending on the target framework of the assembly? As far as I’ve been able to find out, the exact same version of System.dll is used in both cases. If anyone knows, please leave a comment!
The Real Solution
Having examined the System.Uri class in more depth than can be considered healthy, I think that I have come to some conclusions on when and how to use it properly.
The first observation is to never use the ToString() method for anything but displaying to the user. And it probably shouldn’t be used in that case either, because the unescaping behaviour is confusing. No, the property to use is OriginalString, which always returns the exact string that was passed to the constructor.
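A minimal sketch of the difference, reusing the URL from the two-line example. The class and helper names are placeholders. No ToString() output is shown because, as described above, it depends on the framework context.

```csharp
using System;

public static class OriginalStringDemo
{
    // Hypothetical helper: the string to store or pass on is the one
    // that was originally given to the Uri constructor.
    public static string ForTransport(Uri uri)
    {
        return uri.OriginalString;
    }

    public static void Main()
    {
        const string input = "http://localhost?p1=Value&p2=A%20B%26p3%3DFooled!";
        var uri = new Uri(input);

        // Framework-dependent: some or all of the query string may be unescaped.
        Console.WriteLine("ToString():     " + uri.ToString());

        // The exact constructor argument, with the escapes intact.
        Console.WriteLine("OriginalString: " + uri.OriginalString);
        Console.WriteLine("Round-trips:    " + (ForTransport(uri) == input));
    }
}
```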
But what is the point of using the Uri class at all then? There is one strong reason: security. When handling URIs/URLs that are received from user or network input, the Uri class safely parses them and provides access to the components. The Uri class is a URI reader/parser. When creating URIs in an application (e.g. as the target of a redirect) there is little value in using the Uri class.
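To illustrate the reader/parser role, here is a small sketch that checks an untrusted URL by inspecting its parsed components instead of searching the raw string. The helper name, the return-URL scenario and the host names are assumptions made for the example, not something from the original bug.

```csharp
using System;

public static class UriReaderDemo
{
    // Hypothetical helper: accept only absolute http/https URLs on an expected host.
    public static bool IsAllowedReturnUrl(string untrusted, string expectedHost)
    {
        Uri uri;
        if (!Uri.TryCreate(untrusted, UriKind.Absolute, out uri))
        {
            return false; // not a well-formed absolute URI
        }

        bool isWeb = uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;

        // Compare the parsed host, not a substring of the raw input.
        return isWeb && string.Equals(uri.Host, expectedHost, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowedReturnUrl("https://example.com/home?x=1", "example.com"));
        Console.WriteLine(IsAllowedReturnUrl("https://example.com.evil.test/", "example.com"));
        Console.WriteLine(IsAllowedReturnUrl("not a url", "example.com"));
    }
}
```

A naive substring check on the raw input would accept the second URL, while the comparison on the parsed Host property rejects it.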
This is actually exactly how Katana (Microsoft’s OWIN framework) works. It uses the Uri class for the incoming URI in IOwinRequest.Uri but accepts a string as the parameter to IOwinResponse.Redirect.