String Split and Join with Escaping

.NET offers the simple string.Split() and string.Join() methods for splitting and joining delimited strings. But what if there is no separator character that is guaranteed never to occur in the strings? Then the separator character must be escaped. And then the escape character must be escaped too… This turns out to be quite an interesting algorithm to write.
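A minimal illustration of the problem, using only the built-in string.Join and string.Split: once a value contains the separator, splitting no longer recovers the original values. The values are made up for the example.

```csharp
using System;

// Two values, the first of which contains the separator character.
string[] values = { "a,b", "c" };

// Naive join: the embedded comma can't be told apart from a real separator.
string joined = string.Join(",", values);
string[] splitBack = joined.Split(',');

Console.WriteLine(joined);           // a,b,c
Console.WriteLine(splitBack.Length); // 3, although only 2 values were joined
```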

I thought that this functionality would be built in, but as far as I could find out it isn't. If there is a built-in way, please leave a comment to educate me. Since this is string manipulation, Regular Expressions could be used too, but…

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Jamie Zawinski

Solving this with a Regular Expression would require some black magic double look-behind assertion that I wouldn't understand even while writing the code, much less later when coming back to fix a bug. So I went for implementing it myself.

Design Considerations

This is just a small helper that I'm writing as part of a bigger project. It is not performance critical, so I haven't spent any time optimizing it. But I did think a bit about performance implications. One approach that I ruled out directly was to build up the split strings character by character while looping through the input string. It would make the implementation quite easy to follow, but it would allocate a new string for each character copied from the source string. That is a bit too much pressure on the garbage collector for my taste.
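For comparison, here is a rough sketch of the character-by-character approach that was ruled out. It is hypothetical code written only for this illustration (SplitCharByChar is a made-up name, not part of the helper): it is short and easy to follow, but every `current += ...` creates a new string, which is the allocation pressure described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the rejected approach: build each segment one char
// at a time. Each `current += ...` allocates a brand new string.
static List<string> SplitCharByChar(string source, char delimiter, char escape)
{
    var result = new List<string>();
    string current = "";
    for (int i = 0; i < source.Length; i++)
    {
        char c = source[i];
        if (c == escape && i + 1 < source.Length)
        {
            // Escaped char: take the next character literally.
            current += source[++i];
        }
        else if (c == delimiter)
        {
            // Unescaped delimiter: close the current segment.
            result.Add(current);
            current = "";
        }
        else
        {
            // Plain char (a lone trailing escape char also ends up here).
            current += c;
        }
    }
    result.Add(current); // the final segment, possibly empty
    return result;
}

foreach (string part in SplitCharByChar("a/,b//c,/,,//,", ',', '/'))
{
    Console.WriteLine($"[{part}]");
}
```

With , as delimiter and / as escape character, the call above yields "a,b/c", ",", "/" and a final empty string.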

So I went for an iterative approach where I loop through the string, keeping track of where the current segment started and checking whether the end of the segment has been found. I think that the resulting code is fairly readable. But because of some edge cases it is more complex, with more quirks, than I first imagined. With , as the delimiter and / as the escape character, consider the following escaped and joined strings:

  • aa,bb,cc
  • ,aa,,bb,
  • a/,b//c,/,,//,
  • a/,

Strings can be empty, even the final one. They can end with an escape sequence. And an escaped escape character can precede a delimiter where the string should be split.
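To make the list concrete, the sketch below reconstructs the original values behind each example by escaping and joining them. It uses the same escaping order as the Join method in the code below (escape character first, then the delimiter); EscapeAndJoin is a hypothetical name used only here.

```csharp
using System;
using System.Linq;

// Hypothetical helper: escape '/' as "//" and ',' as "/,", then join with ','.
static string EscapeAndJoin(string[] values) =>
    string.Join(",", values.Select(s => s.Replace("/", "//").Replace(",", "/,")));

string[] examples =
{
    EscapeAndJoin(new[] { "aa", "bb", "cc" }),       // aa,bb,cc
    EscapeAndJoin(new[] { "", "aa", "", "bb", "" }), // ,aa,,bb,
    EscapeAndJoin(new[] { "a,b/c", ",", "/", "" }),  // a/,b//c,/,,//,
    EscapeAndJoin(new[] { "a," }),                   // a/,
};

foreach (string example in examples)
{
    Console.WriteLine(example);
}
```

Read in reverse, this is what Split has to recover: the third string holds four values, and its trailing //, is an escaped escape character followed by a real delimiter, which is why the last value is empty.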

The Code

using System.Collections.Generic;
using System.Linq;

/// <summary>
/// Helpers for delimited strings, with support for escaping the delimiter
/// character.
/// </summary>
public static class DelimitedString
{
  const string DelimiterString = ",";
  const char DelimiterChar = ',';
 
  // Use a single / as escape char, avoid \ as that would require
  // all escape chars to be escaped in the source code...
  const char EscapeChar = '/';
  const string EscapeString = "/";
 
  /// <summary>
  /// Join strings with a delimiter and escape any occurrence of the
  /// delimiter and the escape character in the strings.
  /// </summary>
  /// <param name="strings">Strings to join</param>
  /// <returns>Joined string</returns>
  public static string Join(params string[] strings)
  {
    return string.Join(
      DelimiterString,
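      strings.Select(
        // Escape the escape character first; otherwise the escape characters
        // added in front of delimiters would be doubled as well.
        s => s
        .Replace(EscapeString, EscapeString + EscapeString)
        .Replace(DelimiterString, EscapeString + DelimiterString)));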
  }
 
  /// <summary>
  /// Split a delimited string, respecting escaped delimiter
  /// characters.
  /// </summary>
  /// <param name="source">Joined string from <see cref="Join(string[])"/></param>
  /// <returns>Unescaped, split strings</returns>
  public static string[] Split(string source)
  {
    var result = new List<string>();
 
    int segmentStart = 0;
    for (int i = 0; i < source.Length; i++)
    {
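      bool readEscapeChar = false;
      if (source[i] == EscapeChar)
      {
        // Skip past the escape char; the escaped char that follows
        // can never end the segment.
        readEscapeChar = true;
        i++;
        if (i == source.Length)
        {
          // Join never produces an unpaired trailing escape char.
          throw new System.FormatException(
            "Delimited string ends with an unpaired escape character.");
        }
      }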
 
      if (!readEscapeChar && source[i] == DelimiterChar)
      {
        // An unescaped delimiter ends the current segment.
        result.Add(UnEscapeString(
          source.Substring(segmentStart, i - segmentStart)));
        segmentStart = i + 1;
      }
 
      if (i == source.Length - 1)
      {
        // Last char reached: add the remaining, possibly empty, segment.
        result.Add(UnEscapeString(source.Substring(segmentStart)));
      }
    }
 
    return result.ToArray();
  }
 
  // Reverse the escaping done by Join for a single segment.
  static string UnEscapeString(string src)
  {
    return src.Replace(EscapeString + DelimiterString, DelimiterString)
      .Replace(EscapeString + EscapeString, EscapeString);
  }
}

The code is part of Kentor.AuthServices and is also available on GitHub. As it is published on this blog, it is dual-licensed under both the license of Kentor.AuthServices and the license for code snippets from this blog (see footer). The code is covered by tests.
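The tests are not shown in this post. As a hypothetical illustration of what a round-trip check could look like, the sketch below condenses Join, Split and UnEscapeString from above into local functions (hard-coding , and / instead of the constants) so that it runs on its own, then checks that the original values behind the examples from the design section survive Join followed by Split.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed copy of DelimitedString.Join: escape '/' first, then ','.
static string Join(string[] strings) =>
    string.Join(",", strings.Select(s => s.Replace("/", "//").Replace(",", "/,")));

// Condensed copy of UnEscapeString.
static string Unescape(string s) => s.Replace("/,", ",").Replace("//", "/");

// Condensed copy of DelimitedString.Split, same loop structure.
static string[] Split(string source)
{
    var result = new List<string>();
    int segmentStart = 0;
    for (int i = 0; i < source.Length; i++)
    {
        bool readEscapeChar = false;
        if (source[i] == '/')
        {
            readEscapeChar = true;
            i++; // the escaped char can never end a segment
        }

        if (!readEscapeChar && source[i] == ',')
        {
            result.Add(Unescape(source.Substring(segmentStart, i - segmentStart)));
            segmentStart = i + 1;
        }

        if (i == source.Length - 1)
        {
            // Last char reached: add the remaining, possibly empty, segment.
            result.Add(Unescape(source.Substring(segmentStart)));
        }
    }

    return result.ToArray();
}

// The original values behind the examples from the design section.
string[][] cases =
{
    new[] { "aa", "bb", "cc" },
    new[] { "", "aa", "", "bb", "" },
    new[] { "a,b/c", ",", "/", "" },
    new[] { "a," },
};

foreach (string[] original in cases)
{
    string joined = Join(original);
    bool roundTrips = Split(joined).SequenceEqual(original);
    Console.WriteLine($"{joined} round-trips: {roundTrips}");
}

// Edge case: Join(new[] { "" }) is "", and Split("") returns an empty array.
Console.WriteLine(Split(Join(new[] { "" })).Length); // 0
```

One behaviour the last line makes visible: because the final segment is only added inside the loop, Split("") returns an empty array. Since Join returns "" both for no strings and for a single empty string, the two cases cannot be told apart after joining; whether that matters depends on the caller.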
