Built in .NET CSV Parser

In administrative systems, there is often a need to import and parse CSV files. .NET actually has a built-in CSV parser, although it is well hidden in a VB.NET namespace (Microsoft.VisualBasic.FileIO). If I had known about it, I wouldn’t have had to write all those custom (sometimes buggy) parsers.

To really test the parser, I’m going to parse a CSV file in the Swedish format.

Name; FactoryLocation; EstablishedYear; ProfitMillionSEK
Volvo; "Gothenburg, Sweden; Gent, Belgium"; 1926; 0,345463
#A comment line
Saab; Trollhättan, Sweden; 1945; -3 009

Note that there is an embedded ; in the FactoryLocation field of Volvo, which is part of the field text and not a field delimiter.

There are three special formatting rules that apply to Swedish CSV files.

  • The decimal delimiter is ,
  • The field delimiter is ; so that it is not confused with the decimal delimiter
  • The thousand separator in numbers is a space.
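The number rules above can be sketched in code. This is a minimal example, not the code used later in this post: instead of relying on the sv-SE culture data, it builds an explicit NumberFormatInfo, assuming the file uses a comma for decimals and an ordinary space (U+0020) between thousands. The class name SwedishNumbers and its Parse method are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class SwedishNumbers
{
    // Assumption: "," is the decimal delimiter and an ordinary space
    // separates thousands, as in the sample file above.
    private static readonly NumberFormatInfo Format = CreateFormat();

    private static NumberFormatInfo CreateFormat()
    {
        // Start from the invariant format so only the two separators differ.
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NumberDecimalSeparator = ",";
        format.NumberGroupSeparator = " ";
        return format;
    }

    public static double Parse(string text)
    {
        // AllowThousands is needed for the group separator to be accepted.
        return double.Parse(
            text.Trim(),
            NumberStyles.Float | NumberStyles.AllowThousands,
            Format);
    }
}
```

With this in place, SwedishNumbers.Parse("0,345463") should give 0.345463 and SwedishNumbers.Parse("-3 009") should give -3009. Using a real CultureInfo for sv-SE is shorter, but the separators it reports come from the operating system's culture data.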

I really did my best to come up with a format that would demand flexibility from the parser, but the TextFieldParser has flexible configuration options and it just worked.

To use the TextFieldParser, a reference to the Microsoft.VisualBasic assembly has to be added to the project. Then it is just a matter of instantiating the parser, setting the needed configuration through properties, and starting to parse.

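// TextFieldParser is in the Microsoft.VisualBasic.FileIO namespace;
// CultureInfo is in System.Globalization.
// The code below is the body of an iterator method
// (for example one returning IEnumerable<Brand>).
// Assumed here: swedishCulture is the sv-SE culture.
CultureInfo swedishCulture = CultureInfo.GetCultureInfo("sv-SE");
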
using (TextFieldParser parser = new TextFieldParser(path))
{
    parser.CommentTokens = new string[] { "#" };
    parser.SetDelimiters(new string[] { ";" });
    parser.HasFieldsEnclosedInQuotes = true;
 
    // Skip over header line.
    parser.ReadLine();
 
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();
        yield return new Brand()
        {
            Name = fields[0],
            FactoryLocation = fields[1],
            EstablishedYear = int.Parse(fields[2]),
            Profit = double.Parse(fields[3], swedishCulture)
        };
    }
}

The parser can be configured with comment tokens and delimiters, and it can handle fields enclosed in quotes. There are also multiple read functions: ReadLine returns a line as a plain string, ReadFields splits the line into fields, and ReadToEnd returns the remainder of the file as one huge string. To be honest, it’s way better than any of the parsers I’ve written.
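To make the difference between the read functions concrete, here is a small sketch that runs all three against an in-memory string instead of a file. The class ReadFunctionsDemo, its Sample data and the ReadThreeWays method are made up for this illustration; only the TextFieldParser members are the real API.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class ReadFunctionsDemo
{
    // Hypothetical sample data, shaped like the file in this post.
    public const string Sample =
        "Name;FactoryLocation;EstablishedYear\n" +
        "Volvo;\"Gothenburg, Sweden; Gent, Belgium\";1926\n" +
        "#A comment line\n" +
        "Saab;Trollhättan, Sweden;1945\n";

    public static (string Header, string[] Fields, string Rest) ReadThreeWays(string csv)
    {
        // TextFieldParser also accepts a TextReader, so no file is needed.
        using (var parser = new TextFieldParser(new StringReader(csv)))
        {
            parser.CommentTokens = new[] { "#" };
            parser.SetDelimiters(";");
            parser.HasFieldsEnclosedInQuotes = true;

            string header = parser.ReadLine();      // raw line, not split
            string[] fields = parser.ReadFields();  // next data line, split into fields
            string rest = parser.ReadToEnd();       // whatever is left, as one string
            return (header, fields, rest);
        }
    }
}
```

Calling ReadFunctionsDemo.ReadThreeWays(ReadFunctionsDemo.Sample) should return the header as one unsplit string, the Volvo line as three fields (with the embedded ; kept inside the second field), and the remaining text including the Saab line.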

The only thing I could possibly wish for is built-in conversion to data types other than strings, plus object materialization. It could be an interesting thing to write, so maybe I’ll come back with a materialization wrapper.

Wouldn’t it be cool to have a data annotations based CSV parser? Create a class with the proper annotations and then automatically parse data from a CSV file!
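A rough sketch of how such an annotation-driven wrapper might look, under several assumptions: CsvColumnAttribute, CsvMaterializer and SampleBrand are hypothetical names invented here (they are not part of .NET or of System.ComponentModel.DataAnnotations), the first line is always a header, and every annotated property has a type that Convert.ChangeType can produce from a string.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Reflection;
using Microsoft.VisualBasic.FileIO;

// Hypothetical annotation: maps a property to a zero-based column index.
[AttributeUsage(AttributeTargets.Property)]
public sealed class CsvColumnAttribute : Attribute
{
    public int Index { get; }
    public CsvColumnAttribute(int index) { Index = index; }
}

// Hypothetical class to materialize rows into.
public class SampleBrand
{
    [CsvColumn(0)] public string Name { get; set; }
    [CsvColumn(1)] public int EstablishedYear { get; set; }
    [CsvColumn(2)] public double Profit { get; set; }
}

public static class CsvMaterializer
{
    public static IEnumerable<T> Read<T>(TextReader reader, string delimiter, IFormatProvider format)
        where T : new()
    {
        // Collect the annotated properties once, before reading any rows.
        var columns = typeof(T).GetProperties()
            .Select(p => new { Property = p, Column = p.GetCustomAttribute<CsvColumnAttribute>() })
            .Where(x => x.Column != null)
            .ToList();

        using (var parser = new TextFieldParser(reader))
        {
            parser.SetDelimiters(delimiter);
            parser.HasFieldsEnclosedInQuotes = true;
            parser.ReadLine(); // Assumption: the first line is a header.

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                var item = new T();
                foreach (var c in columns)
                {
                    // Convert the raw string to the property's type.
                    object value = Convert.ChangeType(
                        fields[c.Column.Index], c.Property.PropertyType, format);
                    c.Property.SetValue(item, value);
                }
                yield return item;
            }
        }
    }
}
```

Usage would then be something like CsvMaterializer.Read<SampleBrand>(new StringReader(text), ";", culture). Error handling for missing columns or unparsable values is left out to keep the sketch short.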

  • miki on 2013-01-21

    This is very simple, the most important part is missing – ability to process multiline strings (in quotes).

    • lunchbeast on 2013-03-19

      ‘Most’ important part is missing? The most important stuff is there and well explained. You should be able to figure out the rest.

  • Brian on 2013-07-17

    Thank you Anders – this is most helpful!

  • Danny Warren on 2014-01-10

    Life Saver! And Life Changing! ;-) I am poised to parse a couple different kinds of CSV this sprint and was just looking for tips on how to do it intelligently. Thanks for showing me the light. I will never manually parse another CSV file again!

  • Tony on 2014-06-25

    Excellent post! Thanks! Works great (converted to VB.. forgive the trespass.. still working on learning C#).

  • Antony on 2014-08-09

    Thanks Anders, you just save me writing my own parser.


I'm Anders Abel, a systems architect and developer working for Kentor in Stockholm, Sweden.

The complete code for all posts is available on GitHub.
