> The response for all queries is formatted as a two dimensional JSON array where the first row provides column names and subsequent rows provide data values.
Hmmm... looking at the example of this I can't help but think there has got to be a better way. This is really just a standard CSV (first row is header, all other rows are data). Using JSON for data formatted this way is kind of a waste of JSON. If you pass that example into a JSON decoder you get a much more difficult-to-use object. Am I missing something, or is this just typical "the government doesn't do tech properly" stuff?
EDIT: sorry, I try to be less negative but sometimes it is hard. I do applaud them for at least making the info available. I don't have a use for it, but if someone else does, then dealing with a stupid format is better than not having the data at all.
EDIT2: (to clarify) I was not saying the data should just be CSV... I'm saying it effectively is, and that defeats the purpose of using JSON. They should still use JSON, but with their data formatted differently. Using their example, properly closed but truncated to just the first 2 records, the PHP function json_decode() turns it into this array:
1) Whether it's user-friendly or not, it's still JSON, which makes JSON-P possible (they support JSON-P).
2) It probably mirrors the way they store their data (in tables, whether SQL or Access or Excel, doesn't really matter).
3) It's more compact than traditional JSON, which means less bandwidth, which means less cost. Keep in mind that this is essentially a not-for-profit API from a not-for-profit organization with the worst possible budgeting scenario. Yes, Gzip would basically eliminate this benefit, but they don't have gzip enabled and enabling it may be difficult or impossible under whatever constraints they operate under.
Also, it's entirely possible that this API has existed privately for a long, long time in a CSV format, and they simply made a minor enhancement to make it JSON and JSON-P compatible and open up the endpoints to the public.
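The compactness claim in point 3 is easy to check. A minimal sketch, using made-up sample values in the shape of the Census response (the field names here are hypothetical, not actual Census columns), compares the wire size of the header-row format against a traditional array of objects:

```javascript
// Hypothetical sample in the Census-style "table" format:
// first row is the header, subsequent rows are data values.
const tableStyle = [
  ["NAME", "P0010001", "state"],
  ["Alabama", "4779736", "01"],
  ["Alaska", "710231", "02"]
];

// The same data as a traditional array of objects,
// where every record repeats every key.
const objectStyle = [
  { NAME: "Alabama", P0010001: "4779736", state: "01" },
  { NAME: "Alaska", P0010001: "710231", state: "02" }
];

const tableBytes = JSON.stringify(tableStyle).length;
const objectBytes = JSON.stringify(objectStyle).length;

// The table form is smaller because the keys appear once
// instead of once per record; the gap grows with row count.
console.log(tableBytes, objectBytes);
```

With thousands of rows the repeated keys dominate, which is where the bandwidth savings come from (and, as noted, where gzip would have erased most of the difference).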
- I daresay Census is sitting on some _large_ data sets. Fancy data structures are one thing when you want the population of California, another when you're trying to get California, by ethnicity and age group, by zip code.
- Data structure compactness will matter less to HN readers than to someone sitting on the other end of a phone dial-up in Oklahoma.
- I am but an egg, but don't fancier data structures require particular implementation decisions for particular tables? Census has a lot of tables -> a lot of decisions.
- God only knows how many different systems are involved in holding all their data. The simpler the data structure, the less they have to get into those -- and/or the simpler the layer they need between some mainframe and the API output.
Also, it may not seem very friendly to HN readers. But it is stupid simple, you can _see_ how it is organized, making it accessible to a wider audience. And it's so simple that it can be adopted by any agency publishing tables. Note that there are a lot of agencies with a lot of tables -- something like this has a chance of becoming standard among all of them.
A public agency has special accessibility concerns, and it's just a reality that government agencies have particular legacy technology issues. A lowest common denominator format helps get the data over those obstacles.
Maybe they can do better in places, and they're free to; there is no reason they can't support other formats too. But this format will be available for whatever subset isn't served by those extensions.
I'm having trouble thinking of any situation where that would be more convenient or useful than the format they are using now. Where would your proposed format be an advantage?
Why would that be better? You'd probably still have to convert it. And also that structure would be harder to convert to from a CSV format or a SQL query.
Factual.com serves their data like this too. I thought it was odd at first, but it does save a lot of bytes on the wire. I just parsed it with JSON.parse() and used underscore to map to my object format. You don't really even need underscore for that. It's just a matter of mapping an array of rows to an array of objects. It's nice there is an API now.
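That mapping step doesn't need underscore at all. A minimal sketch in plain JavaScript, again with hypothetical field names standing in for real Census columns: peel off the header row, then zip it against each data row.

```javascript
// Hypothetical response in the header-row format:
// first row names the columns, the rest are data.
const raw = [
  ["NAME", "P0010001", "state"],
  ["Alabama", "4779736", "01"],
  ["Alaska", "710231", "02"]
];

// Split the header from the data rows, then build one
// object per row by pairing each column name with its value.
const [header, ...rows] = raw;
const records = rows.map(row =>
  Object.fromEntries(header.map((key, i) => [key, row[i]]))
);

console.log(records[0].NAME); // "Alabama"
```

A few lines of code, and the "CSV in JSON" shape becomes the array of objects most consumers expect -- which is part of why the format is less of a problem than it first appears.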
One of the talks given at Fluent Conf last week (on Google's Feedback canvas-screenshot system) talked about how moving data from object-based storage to array-based storage saved serialization time and memory use. I would guess that is why they did it this way.
I can't edit my original post anymore so I'll reply to myself. Having had some time to think about this, I've changed my mind slightly. If you are fetching various datasets with various unknown field names, you really don't have any other way to programmatically know which keys to pull. Having the first row give the field names might make processing known datasets wonky, but it is the unknown datasets that need the most help. I take back my original criticism. Thanks for all the insight.
I don't find that format to be very pleasant.