> The response for all queries is formatted as a two dimensional JSON array where the first row provides column names and subsequent rows provide data values.
Hmmm... looking at the example of this I can't help but think there has got to be a better way. This is really just a standard CSV (first row is header, all other rows are data). Using JSON for data formatted this way is kind of a waste of JSON. If you pass that example into a JSON decoder you get a much more difficult-to-use object. Am I missing something, or is this just typical "the government doesn't do tech properly" stuff?
EDIT: sorry, I try to be less negative but sometimes it is hard. I do applaud them for at least making the info available. I don't have a use for it, but if someone else does, then dealing with a stupid format is better than not having the data at all.
EDIT2: (to clarify) I was not saying the data should just be CSV... I'm saying it effectively is, and that defeats the purpose of using JSON. They should still use JSON, but with their data formatted differently. Using their example, properly closed but truncated to just the first 2 records, the PHP function json_decode() turns it into this array:
1) Whether it's user-friendly or not, it's still JSON, which makes JSON-P possible (they support JSON-P).
2) It probably mirrors the way they store their data (in tables, whether SQL or Access or Excel, doesn't really matter).
3) It's more compact than traditional JSON, which means less bandwidth, which means less cost. Keep in mind that this is essentially a not-for-profit API from a not-for-profit organization with the worst possible budgeting scenario. Yes, Gzip would basically eliminate this benefit, but they don't have gzip enabled and enabling it may be difficult or impossible under whatever constraints they operate under.
Also, it's entirely possible that this API has existed privately for a long, long time in a CSV format, and they simply made a minor enhancement to make it JSON and JSON-P compatible and open up the endpoints to the public.
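The compactness claim in point 3 is easy to check. A minimal sketch, using made-up sample values in the shape of the Census response (the field names here are hypothetical, not actual Census columns), compares the wire size of the header-row format against a traditional array of objects:

```javascript
// Hypothetical sample in the Census-style "table" format:
// first row is the header, subsequent rows are data values.
const tableStyle = [
  ["NAME", "P0010001", "state"],
  ["Alabama", "4779736", "01"],
  ["Alaska", "710231", "02"]
];

// The same data as a traditional array of objects,
// where every record repeats every key.
const objectStyle = [
  { NAME: "Alabama", P0010001: "4779736", state: "01" },
  { NAME: "Alaska", P0010001: "710231", state: "02" }
];

const tableBytes = JSON.stringify(tableStyle).length;
const objectBytes = JSON.stringify(objectStyle).length;

// The table form is smaller because the keys appear once
// instead of once per record; the gap grows with row count.
console.log(tableBytes, objectBytes);
```

With thousands of rows the repeated keys dominate, which is where the bandwidth savings come from (and, as noted, where gzip would have erased most of the difference).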
- I daresay Census is sitting on some _large_ data sets. Fancy data structures are one thing when you want the population of California, another when you're trying to get California, by ethnicity and age group, by zip code.
- Data structure compactness will matter less to HN readers than to someone sitting on the other end of a phone dial-up in Oklahoma.
- I am but an egg, but don't fancier data structures require particular implementation decisions for particular tables? Census has a lot of tables -> a lot of decisions.
- God only knows how many different systems are involved in holding all their data. The simpler the data structure, the less they have to get into those -- and/or the simpler the layer they need between some mainframe and the API output.
Also, it may not seem very friendly to HN readers. But it is stupid simple, you can _see_ how it is organized, making it accessible to a wider audience. And it's so simple that it can be adopted by any agency publishing tables. Note that there are a lot of agencies with a lot of tables -- something like this has a chance of becoming standard among all of them.
A public agency has special accessibility concerns, and it's just a reality that government agencies have particular legacy technology issues. A lowest common denominator format helps get the data over those obstacles.
Maybe they can do better in places, and they're free to; there is no reason they can't support other formats too. But this format will be available for whatever subset isn't served by those extensions.
I'm having trouble thinking of any situation where that would be more convenient or useful than the format they are using now. Where would your proposed format be an advantage?
Why would that be better? You'd probably still have to convert it. And also that structure would be harder to convert to from a CSV format or a SQL query.
Factual.com serves their data like this too. I thought it was odd at first, but it does save a lot of bytes on the wire. I just parsed it with JSON.parse() and used underscore to map to my object format. You don't really even need underscore for that. It's just a matter of mapping an array of rows to an array of objects. It's nice there is an API now.
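That mapping step doesn't need underscore at all. A minimal sketch in plain JavaScript, again with hypothetical field names standing in for real Census columns: peel off the header row, then zip it against each data row.

```javascript
// Hypothetical response in the header-row format:
// first row names the columns, the rest are data.
const raw = [
  ["NAME", "P0010001", "state"],
  ["Alabama", "4779736", "01"],
  ["Alaska", "710231", "02"]
];

// Split the header from the data rows, then build one
// object per row by pairing each column name with its value.
const [header, ...rows] = raw;
const records = rows.map(row =>
  Object.fromEntries(header.map((key, i) => [key, row[i]]))
);

console.log(records[0].NAME); // "Alabama"
```

A few lines of code, and the "CSV in JSON" shape becomes the array of objects most consumers expect -- which is part of why the format is less of a problem than it first appears.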
One of the talks given at Fluent Conf last week (on Google's Feedback canvas-screenshot system) talked about how moving data from object-based storage to array-based storage saved serialization time and memory use. I would guess that is why they did it this way.
I can't edit my original post anymore so I'll reply to myself. Having had some time to think about this, I've changed my mind slightly. If you are fetching various datasets with various unknown field names, you really don't have any other way to programmatically know which keys to pull. Having the first row give the field names might make processing known datasets wonky, but it is the unknown datasets that need the most help. I take back my original criticism. Thanks for all the insight.
I don't find that format to be very pleasant.