Thursday, January 3, 2008

CSV in Java – Why is it so "not standard"?

XML is getting there. Libraries to consume and create XML are available on just about every IT platform, more and more real-world problems are being solved with XML, and developers can fine-tune their applications for speed and memory constraints.


But XML is not the only document exchange format out there, far from it. CSV remains dominant. Its simplicity, human readability, and compact data structure still make it the best format for exchanging large volumes of data.

As much as I like the Java community's habit of standardizing common solutions around emerging technologies, I regret that it seems to have forgotten all about CSV. There are discontinued solutions out there, such as CSV JDBC, which allows read-only access to CSV files, and simpler CSV file parsers.

But they all lack basic plausibility and integrity checks, column count verification, type safety, and much more that I have come to like from working with XML. For each project I had to recode parts of the mentioned libraries. Rewriting libraries instead of configuring them is not my typical approach to coding Java applications.

I would like to see a solution where I can attach or link an XSD to a CSV file. A special parser would then create a DOM object or SAX stream based on the XSD file. I could access columns as XML elements, have type safety, define nullable attributes, use marshalling frameworks for binding to POJOs, use XSLT to create hierarchical structures, and much more.
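To make the idea a little more concrete, here is a rough sketch of what such a bridge might look like, using only the XML classes that ship with the JDK. It is not an existing library: the class name CsvXsdSketch, the rows/row wrapper elements, and the little name/age schema are all made up for illustration. The CSV handling is deliberately naive (a plain comma split, no quoted fields), and the header names are assumed to be valid XML element names.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical sketch: turn CSV rows into a DOM tree and validate it against an XSD.
class CsvXsdSketch {

    // Assumed example schema: every row holds a string "name" and an integer "age".
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='rows'><xs:complexType><xs:sequence>"
      + "<xs:element name='row' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='age' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Builds a DOM tree from CSV text; the first line supplies the element names. */
    static Document csvToDom(String csv) throws Exception {
        String[] lines = csv.trim().split("\\r?\\n");
        String[] header = lines[0].split(",", -1);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(null, "rows");
        doc.appendChild(root);
        for (int i = 1; i < lines.length; i++) {
            // Naive split: quoted fields and embedded commas are not supported.
            String[] cells = lines[i].split(",", -1);
            Element row = doc.createElementNS(null, "row");
            for (int c = 0; c < cells.length; c++) {
                // Cells beyond the header get a placeholder name the schema will reject.
                String name = c < header.length ? header[c].trim() : "extra";
                Element cell = doc.createElementNS(null, name);
                cell.setTextContent(cells[c].trim());
                row.appendChild(cell);
            }
            root.appendChild(row);
        }
        return doc;
    }

    /** Returns true if the CSV, viewed as XML, conforms to the given schema. */
    static boolean isValid(String csv, String xsd) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new DOMSource(csvToDom(csv)));
            return true;
        } catch (SAXException e) {
            return false; // wrong type, missing column, or unexpected column
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("name,age\nAlice,30\nBob,41", XSD)); // well-formed rows
        System.out.println(isValid("name,age\nAlice,thirty", XSD));     // "thirty" is not an xs:int
        System.out.println(isValid("name,age\nAlice", XSD));            // age column is missing
    }
}
```

Once the rows exist as a DOM tree, the rest of the wish list (XSLT, marshalling frameworks) could in principle be applied to it like any other XML document. Holding a whole file in memory as DOM would of course defeat CSV's advantage for large bulks of data, so a SAX-style streaming variant would be the more realistic design.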

Is something like this available? Can we quit maligning CSV processing and lift it to the XML standard?