Smithsonian Open Access

peatmoss · on Oct 22, 2022

This is really fantastic to see in the wild on HN. In my day job, I’m part of the team that sponsors (i.e. covers hosting costs for) this and a number of other really interesting datasets.

https://registry.opendata.aws/smithsonian-open-access/

crancher · on Oct 22, 2022

I've integrated the Smithsonian API into my in-development VR exhibit design app.

It's very distractingly awesome searching for something like "dog painting" and wandering around the exhibit created from the results. There's just so much.

Thank you for helping make this resource available.

peatmoss · on Oct 24, 2022

Do you have anything online about your VR exhibit design app? This sounds really cool!

Our team wasn't involved in the Smithsonian API portion—we really focus on the bulk access pattern, mostly via object storage (e.g. S3). That is, we see object storage / "just give me all the data" as the base of the pyramid for data access.

Purpose-built APIs that add value can be consumers of data via more general access patterns. Specialized APIs can add heaps of value for a given use case, but they can also limit the breadth of applications for other use cases.

ethbr0 · on Oct 22, 2022

Do you happen to know any of the background on the Smithsonian digitization effort?

Curious, because I know Dr. Clough went there after his stint at Georgia Tech. Would love to read the story of how digitization was implemented: that's a big ship to turn! https://en.wikipedia.org/wiki/G._Wayne_Clough#Secretary_of_t...

peatmoss · on Oct 24, 2022

It sounds like you may be more familiar with that effort than I am. In my interactions with SI (and others like NARA), my takeaway is that these people are ridiculously smart. We try to be helpful cloud experts where formats and access patterns are concerned, but we start with the assumption that the people coming to us wanting to share data are the experts in their data. From my vantage point, SI seems pretty smart :-)

SequoiaHope · on Oct 22, 2022

I've said this before, but I often think about what it would be like if the big AI companies actually honored copyright (like the rest of us have to, even though it is a very bad system) and relied on open access images (there are more than 10 million in various collections online) and improvements to sample efficiency to make their AI art generators.

Like if they worked with Twitter and Instagram to add image license options, with sticky defaults so users could specify that their image uploads are freely licensed, and encourage users to add alt text so AI algorithms and blind users alike have a better experience.

The result would be a better internet, with more openness. Instead, they've tried to just fly under the radar, upsetting a lot of people in the process by scooping up art whether the artist wants them to or not.

Anyway, open access image libraries are fantastic! Glad to see this.

kmeisthax · on Oct 22, 2022

Honestly? I think it would result in a smarter image generator. Part of the problem with the "hope-Authors-Guild-v-Google-is-controlling-precedent" approach is that the data set is extremely noisy. In AI, the training set is gospel, and people are almost certainly overfitting their models. DALL-E 2 is suspiciously familiar with how to draw Getty Images watermarks, for example.

Man, if I knew how half this training software worked, I'd be downloading the whole image set today and shoving it straight through my poor aging 1080 Ti.

zozbot234 · on Oct 22, 2022

> Like if they worked with Twitter and Instagram to add image license options, with sticky defaults so users could specify that their image uploads are freely licensed

Flickr has always done this, though not with "sticky defaults" (which you don't actually want; open content licenses work only when they are absolutely irrevocable and non-repudiatable for the licensor, which a "sticky default choice" might not be). Even YT gives users the choice of posting freely licensed videos, though it gets very little use in practice.

ghaff · on Oct 22, 2022

A problem is that defaults are so powerful. If you default uploads to MIT-0 or something like that, it comes across like you're hoping people won't notice.

And I'd add that I'm not sure how well most of the Creative Commons options work anyway. Unless you add your own watermark--which I don't like doing--to a photo, the attribution and photo get separated unless the user is being meticulous and probably even then a lot of the time. (I try to be careful but images get copied from presentation to presentation etc.)

Plus it's very tempting for people to choose non-commercial CC if they choose CC. But there really are very few interesting uses (except maybe education) that are genuinely non-commercial.

pryelluw · on Oct 22, 2022

Browsing the available datasets took me back to my days working on Fieldscope[0]. I would spend so much time going through the Chesapeake Bay buoy data. Coding science projects is very rewarding. Too bad there isn’t more work out there. These days I’m stuck working on boring problems.

[0] - https://www.fieldscope.org/

vz8 · on Oct 22, 2022

What a great post to start off a weekend - thanks. For anyone excited about this, you may want to check out the amazing collections at Rijksmuseum [0] as well.

[0] https://www.rijksmuseum.nl/en/rijksstudio

chrisweekly · on Oct 22, 2022

This is what the internet, and our tax dollars, are for. Warms my heart, puts a grin on my face.

dERtuTOR · on Oct 22, 2022

Thank you! For me, this is a fantastic find of treasures beyond belief. Thank you for making my day wonderful, and I wish you the same!

chriscjcj · on Oct 22, 2022

It is a pleasure to see the results of state-of-the-art photography and imaging technology.

iancmceachern · on Oct 22, 2022

It's nice how different this is from the British Museum, which hides things away and famously refuses to return stolen artifacts.

Symbiote · on Oct 22, 2022

The British Museum's collection is digitised here. Images appear to be under a Creative Commons non-commercial licence.

https://www.britishmuseum.org/collection

iancmceachern · on Oct 22, 2022

But does the British Museum have a policy of ethical returns like the Smithsonian?

https://www.si.edu/newsdesk/releases/smithsonian-adopts-poli...

gtsnexp · on Oct 22, 2022

Bravo!!!