xmltocsv

Convert an XML file to a CSV (comma-separated value) file.

Usage

xmltocsv [options] <inputfile.xml>

options: -ignoreattributes - don't consider attribute data
         -ignorechildren - don't consider child elements
         -stdin - pipe input from stdin instead of from a file

Building

If you have CMake, create a directory called "build" under the project root directory. Navigate to it, then type

cmake ..
make

or

cmake -G <your favourite generator> ..

If you don't have CMake, simply navigate to the src directory and type

gcc *.c -lm

Or use your favourite C compiler.

The code should be completely portable and build anywhere with a C compiler.

Converting XML to CSV

XML files have a tree structure, whilst a CSV file is a dataframe: a 2-dimensional data structure with rows representing records and columns representing fields, with mixed numbers and strings allowed in the fields. So CSV files cannot represent XML data perfectly. However, many XML files are basically dataframes with only a little bit of extra structure. So the program looks for the largest dataframe-like structure in the XML file, pulls it out, and converts it. Children are folded into the CSV output recursively.
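To make the idea of "the largest dataframe-like structure" concrete, here is a minimal sketch of one plausible heuristic: pick the element that has the most direct children sharing the same tag. It is not taken from the xmltocsv source. The `Node` type and the functions `record_count` and `find_table_root` are hypothetical names invented for this illustration, and the real program may use different criteria.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal tree node, for illustration only.
   The project's own XML parser defines its own types. */
typedef struct Node {
    const char *tag;      /* element name */
    struct Node *child;   /* first child, or NULL */
    struct Node *next;    /* next sibling, or NULL */
} Node;

/* Count the direct children of parent whose tag equals tag. */
static int count_children_with_tag(const Node *parent, const char *tag)
{
    int n = 0;
    const Node *c;

    for (c = parent->child; c; c = c->next)
        if (strcmp(c->tag, tag) == 0)
            n++;
    return n;
}

/* Largest number of same-tagged children directly under node.
   A node with many identically named children looks like a table of records. */
static int record_count(const Node *node)
{
    int best = 0;
    const Node *c;

    for (c = node->child; c; c = c->next) {
        int n = count_children_with_tag(node, c->tag);
        if (n > best)
            best = n;
    }
    return best;
}

/* Depth-first search for the node with the highest record count.
   On a tie, the node nearest the root (visited first) is kept. */
const Node *find_table_root(const Node *root)
{
    const Node *best = root;
    const Node *c;

    assert(root != NULL);
    for (c = root->child; c; c = c->next) {
        const Node *cand = find_table_root(c);
        if (record_count(cand) > record_count(best))
            best = cand;
    }
    return best;
}
```

Under this assumed heuristic, a document whose root holds one metadata element and one element containing three identically named records would yield the element containing the records.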

So

xmltocsv inputfile.xml > outputfile.csv

should do what you want, most of the time.

Some XML files structure data with child elements, and some use attributes. If the attributes are just noise, pass -ignoreattributes to ignore them. If all the data is in attributes and you want to ignore the child elements, pass -ignorechildren. (You can't ignore both attributes and children or you will have no data).

Strings containing quotation marks or commas need to be escaped before writing to CSV. Most strings need no escaping and are left alone: whitespace is trimmed, but otherwise they appear as in the XML file.
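For reference, the usual CSV convention is to wrap such a field in double quotes and to double any quotation marks inside it. The sketch below shows that convention; `csv_escape_field` is a hypothetical name, and this is not necessarily how xmltocsv itself performs the escaping.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated CSV-safe copy of field; the caller frees it.
   Fields containing a comma, a double quote, or a line break are wrapped
   in double quotes, with embedded quotes doubled. Other fields are copied
   unchanged. Returns NULL if allocation fails. */
char *csv_escape_field(const char *field)
{
    size_t len, quotes = 0, i, j = 0;
    int needs_quoting = 0;
    char *out;

    assert(field != NULL);
    len = strlen(field);
    for (i = 0; i < len; i++) {
        if (field[i] == '"')
            quotes++;
        if (field[i] == '"' || field[i] == ',' ||
            field[i] == '\n' || field[i] == '\r')
            needs_quoting = 1;
    }

    /* Worst case: every quote doubled, two enclosing quotes, terminator. */
    out = malloc(len + quotes + 3);
    if (!out)
        return NULL;

    if (needs_quoting)
        out[j++] = '"';
    for (i = 0; i < len; i++) {
        if (field[i] == '"')
            out[j++] = '"';   /* double the embedded quote */
        out[j++] = field[i];
    }
    if (needs_quoting)
        out[j++] = '"';
    out[j] = '\0';
    return out;
}
```

A field with no special characters comes back as a plain copy, so most values appear in the output exactly as they did after trimming.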

Sometimes you want to set up a pipeline. In that case, pass the -stdin option to read the XML data from standard input, and omit the filename. Alternatively, you can use "-" as the input file name to denote stdin. It's not a very good pipeline because, in the nature of XML, the entire document must be read in before the structure can be analysed, but it should work on modern systems with plenty of memory.
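Reading the whole document before analysing it amounts to something like the following sketch, which slurps a stream into one growing buffer. The names `names_stdin` and `slurp_stream` are hypothetical and are not part of the xmltocsv source; the program's actual input handling may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: a file name of "-" denotes standard input. */
int names_stdin(const char *filename)
{
    assert(filename != NULL);
    return strcmp(filename, "-") == 0;
}

/* Read everything from fp into a NUL-terminated buffer.
   On success, returns the buffer (caller frees) and stores the byte
   count in *length if length is not NULL. Returns NULL on failure. */
char *slurp_stream(FILE *fp, size_t *length)
{
    size_t capacity = 4096, used = 0, got;
    char *buf, *grown;

    assert(fp != NULL);
    buf = malloc(capacity);
    if (!buf)
        return NULL;

    /* Always leave one byte spare for the terminator. */
    while ((got = fread(buf + used, 1, capacity - used - 1, fp)) > 0) {
        used += got;
        if (capacity - used - 1 == 0) {
            /* Buffer full: double it before the next read. */
            grown = realloc(buf, capacity * 2);
            if (!grown) {
                free(buf);
                return NULL;
            }
            buf = grown;
            capacity *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[used] = '\0';
    if (length)
        *length = used;
    return buf;
}
```

A caller would pass `stdin` when `names_stdin` returns non-zero and an `fopen`ed file otherwise. Memory use grows with the size of the input, which is why the pipeline only suits documents that fit in memory.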

See how you get on.

Components

All the code is authored by Malcolm McLean.

Both the XML parser and the options parser are modular and re-usable, and you might want to take them for other projects.

XML Parser docs.
Options Parser docs.
