Partial Commits with Git

Every once in a while I’m working on a feature, only to discover that I need to extend another part of the code first. If I were disciplined, I would create another branch at that point. But I’m not. I end up with both the extended utility class and the actual feature as pending changes. With git it is simple to make two separate commits while ensuring that every commit compiles.

I’m working on my new big thing: the command line calculator. I’ve already done addition and am quite happy with that, and I’m now implementing subtraction. Halfway through the subtraction implementation I discover that I need to make some changes to the console output formatter class. It has the + sign hard-coded and now needs to take that as a parameter. I do that and end up with a working solution.

Doing a git status however shows a mess.

C:\git\spikes\gitpartial [master +1 ~2 -0 !]> git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
 
        modified:   ConsoleFormatter.cs
        modified:   Program.cs
 
Untracked files:
  (use "git add <file>..." to include in what will be committed)
 
        Subtraction.cs
 
no changes added to commit (use "git add" and/or "git commit -a")
C:\git\spikes\gitpartial [master +1 ~2 -0 !]>

I’ve got the updated ConsoleFormatter.cs, the updated Program.cs and the new Subtraction.cs. The first one contains the updated console formatting features that are independent of the added functionality. I want to commit ConsoleFormatter.cs separately. And not only commit it: I want to compile and test the exact code I’m going to commit, by hiding the other files from view. With git this can be done with just a few commands. With Subversion, I’ve never quite figured out how to do it in a simple enough way, so I usually end up with one big commit. If anyone knows how to do this as simply in svn, please leave a comment.

Beware of Uri.ToString()

When working with URLs, it’s sometimes better to use the Uri class than to keep the URI in a simple string. The Uri class helps validate that the format is a valid URI and helps split out the parts of the URI in a safe manner. But there is a big gotcha in that Uri.ToString() returns an unescaped representation of the URI.

The contents of this post might sound simple, but they were behind a nasty heisenbug. Every single insight in this post is something that I learned in a very painful way. I hope that reading this post will convey the same insights in a less painful way.

TL;DR; in two lines of code

The entire problem can be expressed in two lines of code.

var uri = new Uri("http://localhost?p1=Value&p2=A%20B%26p3%3DFooled!");
Console.WriteLine("uri.ToString(): " + uri.ToString());

It looks simple and it should be simple, but it isn’t. When running these two lines on the .NET Framework 4 the following output is produced:

http://localhost/?p1=Value&p2=A B&p3=Fooled!

The query string has been decoded in such a way that it looks like there is an extra parameter p3!

When targeting .NET 4.5 however only the space is unescaped. This can be explained as a result of the breaking changes to System.Uri in .NET 4.5. But that is not the whole story. It gets more complicated (and bug prone) because .NET 4.5 is an in place upgrade to .NET 4.0.
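One way to sidestep the problem, as a minimal sketch: when the URI is going to be sent on rather than displayed, read it from a property that keeps the escaping. AbsoluteUri and OriginalString are existing properties on System.Uri; the helper name ToWireFormat is made up for this example. What ToString() prints depends on the framework version, so no expected output is shown.

```csharp
using System;

static class UriDemo
{
    // Hypothetical helper: returns the escaped form of the URI.
    public static string ToWireFormat(Uri uri)
    {
        // AbsoluteUri keeps reserved characters such as %26 and %3D escaped.
        return uri.AbsoluteUri;
    }

    static void Main()
    {
        var uri = new Uri("http://localhost?p1=Value&p2=A%20B%26p3%3DFooled!");

        Console.WriteLine("ToString():     " + uri.ToString());
        Console.WriteLine("AbsoluteUri:    " + ToWireFormat(uri));
        // OriginalString is exactly the string passed to the constructor.
        Console.WriteLine("OriginalString: " + uri.OriginalString);
    }
}
```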

Kentor.AuthServices 0.7.2 SAML2 for ASP.NET Released

Last week we released version 0.7.2 of the Kentor.AuthServices SAML2 Service Provider for ASP.NET. With this release and the 0.6.0 release the week before (that I never blogged about) we’ve introduced some new features to better support SAML2 in federation setups. The first is that we can now load and parse federation metadata. No more manual configuration of peer identity providers. The second is that we now support using a discovery service to let the user select an identity provider to authenticate with.

The core AuthServices, MVC and Owin packages are all available for download on Nuget. The source and issue list are on GitHub.

Contents

  • Idp metadata support.
  • Federation metadata support.
  • Discovery service support.
  • Http Redirect binding preferred.
  • Bug fixes.

Regression Testing Processing Algorithms

This is a guest post by Albin Sunnanbo sharing experiences on regression testing.

On several occasions I have worked with systems that processed lots of work items with a fairly complicated algorithm. When doing a larger rewrite of such an algorithm you want to regression test it. Of course you have a bunch of unit tests and/or integration tests that map to your requirements, but unit tests tend to test systems from one angle and you need to complement them with other test methods to cover other angles too. We have used a copy of production data to run a comprehensive regression test with just a few hours of work.

Our systems had the following workflow

  1. Users or imports produce some kind of work item in the system, e.g. orders.
  2. There is a completion phase of the work where the user commits each work item and makes the result final, e.g. sends the order.
  3. Once each item is final the system processes the work item and produces an output that is saved in a database before it is exported to another system.

We have successfully used the following approach to regression testing for that kind of algorithm.

.NET == and .Equals()

Equality might look like a simple concept at first glance, but looking deeper it isn’t. In C# objects can be compared with the == operator, with the Equals(Object) member, with the Object.Equals(Object, Object) method or using custom comparers that implement one or more of the IEquatable<T>, IComparable, IStructuralEquatable or IStructuralComparable interfaces. There’s also an Object.ReferenceEquals(Object, Object) method that can be used. In this post, we’ll take a closer look at the basics: == and .Equals().

Plain Vanilla Operator ==

The most common way to compare objects in C# is to use the == operator.

For predefined value types, the equality operator (==) returns true if the values of its operands are equal, false otherwise. For reference types other than string, == returns true if its two operands refer to the same object. For the string type, == compares the values of the strings.

Looking first at simple value types, this makes sense and makes comparisons of e.g. integers behave logically. Looking at more complex value types such as DateTime it also makes sense. If we put the current date in two variables we expect them to be equal.

var d1 = DateTime.Now.Date;
var d2 = DateTime.Now.Date;
 
Console.WriteLine(d1 == d2); // Writes True

Reference types are handled differently; == by default checks whether the two variables are references to the same object. The contents of the objects don’t matter.

var sb1 = new StringBuilder("Blue");
var sb2 = new StringBuilder("Blue");
 
Console.WriteLine(sb1 == sb2); // Writes False

The string type is an exception pointed out in the documentation. It is a reference type stored on the heap, but everything possible has been done to make it behave like a value type. It is immutable. == compares the contents of the strings.

But string is not the only one; looking just in the System namespace, the classes Uri and Version compare the content instead of checking if the variables reference the same object. It’s possible to test this by comparing the output of == to that of Object.ReferenceEquals(Object, Object). The latter checks if the two references are to the same object or to different objects.

// Strings are highly optimized to share storage space. Using a StringBuilder is
// a way to get two different string instances with the same value.
var s1 = "Blue";
var sb = new StringBuilder("Bl");
sb.Append("ue");
var s2 = sb.ToString();
 
Console.WriteLine(s1 == s2); // True
Console.WriteLine(object.ReferenceEquals(s1, s2)); // False
 
var u1 = new Uri("http://localhost");
var u2 = new Uri("http://localhost");
Console.WriteLine(u1 == u2);  // True
Console.WriteLine(object.ReferenceEquals(u1, u2)); // False
 
var v1 = new Version(1, 2, 3);
var v2 = new Version(1, 2, 3);
Console.WriteLine(v1 == v2); // True
Console.WriteLine(object.ReferenceEquals(v1, v2)); // False

Overloaded Operator ==

For string, Uri and Version the default implementation of == is obviously not used, but instead a more specific overload is provided by the framework. In fact, all of them overload the == operator by implementing public static bool operator ==.

Note that the operator method is static. It isn’t an instance member. It isn’t virtual. The decision to use it or not is made entirely at compile time. If the references are cast to another type, such as object, the custom operator won’t be used. It’s enough to cast one of the operands to object to get the default reference comparison. Using the strings from the previous example, we’ll treat one of them as an object.

object o1 = s1;
Console.WriteLine(o1 == s2); // False
Console.WriteLine(s2 == o1); // False

Compiling the code will give a warning: Possible unintended reference comparison; to get a value comparison, cast the left hand side to type ‘string’.

The verdict for == is that it behaves consistently until inheritance is involved. Since it is resolved at compile time it simply can’t deal with inheritance. So == will be a reasonable default for the 90%+ of cases in a program where no inheritance is involved and the compile time type of the references is the same as the run time type. For the other few percent of comparisons, something more powerful is needed.
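Generic code is another place where the compile time resolution of == shows. The sketch below uses a made-up helper, SameByOperator, with only a class constraint on T. The compiler knows nothing about operators defined on T, so == compiles to a reference comparison even when T turns out to be string.

```csharp
using System;
using System.Text;

static class OperatorResolutionDemo
{
    // Hypothetical helper: with only a "class" constraint, == is a
    // reference comparison, regardless of any operator overload on T.
    public static bool SameByOperator<T>(T a, T b) where T : class
    {
        return a == b;
    }

    static void Main()
    {
        var s1 = "Blue";
        // Built at run time, so it is a separate instance from the literal.
        var s2 = new StringBuilder("Bl").Append("ue").ToString();

        Console.WriteLine(s1 == s2);               // True
        Console.WriteLine(SameByOperator(s1, s2)); // False
    }
}
```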

.Equals(Object)

When the dynamic type of the objects needs to be taken into consideration, the .Equals(Object) method can be used. It is virtual and allows each class to define its own behaviour. Adjusting the code above to use Equals shows the difference.

Console.WriteLine(s2.Equals(o1)); // True
Console.WriteLine(o1.Equals(s2)); // True

The method is virtual, so in both cases an override of .Equals() on String will be called. Overload resolution, however, is done on the static (i.e. compile time) types. Here one side of each call is typed as object, so both calls end up in String.Equals(Object). The more specific String.Equals(String) is picked only when both the receiver and the argument are statically typed as string, as in s1.Equals(s2). The only difference between the two is that the former has to cast the parameter, which is a small performance penalty. Providing a specialized overload with the right type can give some performance improvements, so for library code like String it’s a good idea to provide that overload.
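To see which overload the compiler picks, here is a small sketch with a made-up Colour class that records the overload that ran. The class and the LastCalled field exist only for this illustration.

```csharp
using System;

sealed class Colour
{
    public string Name { get; }
    public Colour(string name) { Name = name; }

    // Records which overload ran last; for demonstration only.
    public static string LastCalled = "";

    public bool Equals(Colour other)
    {
        LastCalled = "Equals(Colour)";
        return other != null && other.Name == Name;
    }

    public override bool Equals(object obj)
    {
        LastCalled = "Equals(object)";
        return obj is Colour c && c.Name == Name;
    }

    public override int GetHashCode() => Name.GetHashCode();
}

static class OverloadDemo
{
    static void Main()
    {
        var c1 = new Colour("Blue");
        var c2 = new Colour("Blue");
        object o1 = c1;

        c2.Equals(c1);
        Console.WriteLine(Colour.LastCalled); // Equals(Colour)

        c2.Equals(o1);
        Console.WriteLine(Colour.LastCalled); // Equals(object)

        o1.Equals(c2);
        Console.WriteLine(Colour.LastCalled); // Equals(object)
    }
}
```

All three calls return true; only the static types at the call site decide which overload does the work.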

The IEquatable<T> Interface

All types inherit the .Equals(Object) method from Object, so it can be used on any type in the .NET Framework. In some cases it also makes sense to mark a type as implementing a more specific version of .Equals(), comparing to the right type. That is exactly what the IEquatable<T> interface does.

public interface IEquatable<T>
{
  bool Equals(T other);
}
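A minimal sketch of a type implementing the interface, using a made-up Money struct. The generic collections look up equality through EqualityComparer<T>.Default, which uses IEquatable<T>.Equals when the type implements it, so List<T>.Contains ends up in the typed overload.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type used only for this example.
struct Money : IEquatable<Money>
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // The typed overload: no boxing and no cast needed.
    public bool Equals(Money other) =>
        Amount == other.Amount && Currency == other.Currency;

    // Keep the untyped version in agreement with the typed one.
    public override bool Equals(object obj) => obj is Money m && Equals(m);

    public override int GetHashCode() =>
        Amount.GetHashCode() ^ (Currency ?? "").GetHashCode();
}

static class EquatableDemo
{
    static void Main()
    {
        var prices = new List<Money> { new Money(10m, "SEK") };

        Console.WriteLine(prices.Contains(new Money(10m, "SEK"))); // True
        Console.WriteLine(prices.Contains(new Money(10m, "EUR"))); // False
    }
}
```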

With that, it might be tempting to wrap up this post and declare it done. But there are a few important details to add, regarding consistency.

Consistency

The first observation regarding consistency is that for non-virtual calls, the basic mathematical requirements of an equivalence relation should hold:

  • a == a and a.Equals(a) should always be true (Reflexivity).
  • a == b, b == a, a.Equals(b) and b.Equals(a) should always give the same result. (Symmetry)
  • If a == b is true and b == c is true, then a == c should also be true (Transitivity). The same applies to a.Equals(b), b.Equals(c) and a.Equals(c).
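Symmetry is the requirement that is easiest to break by accident. The sketch below uses a made-up CaseInsensitiveName class whose Equals() tries to be helpful and also accepts plain strings. String.Equals() knows nothing about that class, so the two directions disagree.

```csharp
using System;

// Hypothetical class, broken on purpose to show a symmetry violation.
sealed class CaseInsensitiveName
{
    private readonly string value;

    public CaseInsensitiveName(string value) { this.value = value; }

    public override bool Equals(object obj)
    {
        if (obj is CaseInsensitiveName other)
            return string.Equals(value, other.value, StringComparison.OrdinalIgnoreCase);

        // The mistake: a plain string will never return the favour.
        if (obj is string s)
            return string.Equals(value, s, StringComparison.OrdinalIgnoreCase);

        return false;
    }

    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(value);
}

static class SymmetryDemo
{
    static void Main()
    {
        var name = new CaseInsensitiveName("Alice");
        object text = "alice";

        Console.WriteLine(name.Equals(text)); // True
        Console.WriteLine(text.Equals(name)); // False
    }
}
```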

There is also one more important part of consistency that must be dealt with, at least if the class will ever be used in a Dictionary<TKey, TValue>: GetHashCode(). A dictionary works by first grouping items in buckets using the Object.GetHashCode() virtual method. Then it ensures that it has found the right item by checking equality (by calling .Equals() unless a custom comparer is provided). That means that if two objects are considered equal, but give different hash codes, the Dictionary<TKey, TValue> behaves peculiarly. Let’s have some fun and try!

struct Person
{
  public int Age { get; set; }
  public string Name { get; set; }
 
  // A person is uniquely identified by name, so let's use it for equality.
  public override bool Equals(Object obj)
  {
    return (obj is Person) && ((Person)obj).Name == Name;
  }
 
  // For laziness reasons we (incorrectly) use the age as the hash code.
  public override int GetHashCode()
  {
    return Age;
  }
}

The Person struct is clearly incorrectly implemented as Equals() and GetHashCode() won’t behave consistently. If we use Person as the key to a dictionary we can get some “fun” results.

var favColours = new Dictionary<Person, string>();
 
var p = new Person()
{
  Age = 1,
  Name = "Alice"
};
 
favColours[p] = "Blue";
 
// Happy birthday Alice!
p.Age = 2;
favColours[p] = "Green";
 
Console.WriteLine(favColours.Count); // 2
 
var keys = favColours.Keys.ToArray();
Console.WriteLine(keys[0].Equals(keys[1])); // True

The output of that snippet of code shows we have two person objects (being a struct, a copy is made when stored in the dictionary) that are used as keys. They have resulted in different entries in the dictionary – but comparing them they are equal. That’s confusing. Don’t go there.

When implementing custom equality, three different methods should always be implemented and behave consistently.

  • Make an overload for the == operator (and for !=, which C# requires to be overloaded as a pair with ==).
  • Override .Equals(Object) and optionally provide an optimized .Equals(MyType).
  • Override .GetHashCode() and make sure that it returns the same hash code for all objects that compare as equal.
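Put together, a consistent version of Person could look like the sketch below. It is one possible implementation, not the only one; the key point is that Equals(), the operators and GetHashCode() all look at the same field. The CountAfterBirthday helper is made up for the example and repeats the dictionary experiment from above.

```csharp
using System;
using System.Collections.Generic;

struct Person : IEquatable<Person>
{
    public int Age { get; set; }
    public string Name { get; set; }

    // A person is identified by name only.
    public bool Equals(Person other) => Name == other.Name;

    public override bool Equals(object obj) => obj is Person p && Equals(p);

    // Based on the same field as Equals(), so equal persons share a hash code.
    public override int GetHashCode() => Name == null ? 0 : Name.GetHashCode();

    public static bool operator ==(Person a, Person b) => a.Equals(b);
    public static bool operator !=(Person a, Person b) => !a.Equals(b);
}

static class ConsistentPersonDemo
{
    // Hypothetical helper repeating the birthday experiment.
    public static int CountAfterBirthday()
    {
        var favColours = new Dictionary<Person, string>();
        var p = new Person { Age = 1, Name = "Alice" };
        favColours[p] = "Blue";

        // Happy birthday Alice! The age no longer affects the hash code.
        p.Age = 2;
        favColours[p] = "Green";

        return favColours.Count;
    }

    static void Main()
    {
        Console.WriteLine(CountAfterBirthday()); // 1
    }
}
```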

That’s all for now regarding equality; in my next post I’ll have a look at comparisons with IComparable.

Software Development is a Job – Coding is a Passion

I'm Anders Abel, a systems architect and developer working for Kentor in Stockholm, Sweden.

The complete code for all posts is available on GitHub.
