Knowledge Base

Preserving for the future: Shell scripts, AoC, and more

Python json to csv

I'm sure there are many tutorials for how to convert json arrays to csv. I wrote one that finds all the possible keys and then generates a csv. Check it out at my gitlab. The logic is broken up into several parts.

   # Learn keys at this level
   keys=[]
   for i in json_obj:
      for j in list(i.keys()):
         if j not in keys:
            keys.append(j)
            debugprint("Found key {0}".format(j),debug=debug)

And then I found the concatenation to a large string, and then a single write operation to a file, to be much more performant than writing to the file for each entry of each row (duh).

   x=0
   fullstring=""
   for i in json_obj:
      x += 1
      # only if there are actually contents of "all" do we print the headers
      if x == 1:
         for k in keys:
            fullstring += str(k) + ","
         fullstring += "\n"
      for k in keys:
         p = ""
         try:
            p = str(i[k]).replace(",",":")
         except:
            # no value for this key for this entry
            pass
         pp = "{0},".format(p)
         fullstring += pp
      fullstring += "\n"

   if fullstring != "":
      if csvfile == "stdout":
         print(fullstring)
      else:
         with open(csvfile,"a") as of:
            of.write(fullstring)

So you'll see that I also replace some commas with colons. I should probably have parameterized those, but I can leave that as an exercise to the reader. I even threw in a cli front-end to my library.

$ ./json-to-csv.py --help
usage: json-to-csv.py [-h] [-d [{0,1,2,3,4,5,6,7,8,9,10}]] [-i INFILE] [-o OUTFILE]

optional arguments:
  -h, --help            show this help message and exit
  -d [{0,1,2,3,4,5,6,7,8,9,10}], --debug [{0,1,2,3,4,5,6,7,8,9,10}]
                        Set debug level.
  -i INFILE, --infile INFILE
                        Json file to convert
  -o OUTFILE, --outfile OUTFILE
                        Csv file to output

Backstory

Some hospitals present their machine-readable data for pricing transparency as json. And some devices don't know how to read json, but a csv can be opened like a spreadsheet.

Comments