Great! That would be an amazing resource saver and an enabler for scientists and small companies alike, incentivizing the use of AWS over in-house systems.
Is there a specific initiative/team/person to follow for Public Data Sets on AWS? Is there anything in place to keep data sets up-to-date (especially derivatives/mashups)? Is there a way to contribute to such an initiative? AWS Public Data Sets is awesome, but it seems unnecessarily restrictive[0].
Part of what sounds interesting to me in the longer-term vision for the dat project[1] is the ability to write transformations that point to a living data source and output an up-to-date parsed/processed version of it (and only download diffs!). Processed data sets tend to be stale, or alternatively you have to start from a data dump and run a slew of scripts to process or index the data, which can take days.
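To make the "only download diffs" idea concrete, here's a toy sketch (entirely hypothetical, and not how dat's actual protocol works): split a dump into fixed-size chunks, compare per-chunk hashes, and fetch only the chunks that changed. rsync does a smarter rolling-window version of the same trick.

```python
# Toy diff-only transfer: hash fixed-size chunks and re-fetch only
# the chunks whose hashes differ. All names here are made up.
import hashlib

def chunk_hashes(data: bytes, size: int = 4) -> list:
    """Hash each fixed-size chunk of `data`."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def changed_chunks(old: bytes, new: bytes, size: int = 4) -> list:
    """Indices of chunks a subscriber would need to re-download."""
    old_h, new_h = chunk_hashes(old, size), chunk_hashes(new, size)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]

# Only the middle chunk changed, so only chunk 1 needs transferring.
to_fetch = changed_chunks(b"aaaabbbbcccc", b"aaaaXbbbcccc")
```

With a multi-gigabyte dump where only a few records change between releases, this kind of scheme is the difference between a days-long re-download and a few seconds.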
One painful example of this problem was Freebase's very interesting data set, WEX (pairing Wikipedia textual content with structured data from Freebase), of which there is an outdated snapshot on AWS Public Data Sets[2] containing less than half the data of newer versions. Google had acquired Freebase a little while earlier, and WEX received only one or two updates after that. I was lucky enough to download what I think was the last WEX data dump before they killed the download.freebase.com[3] subdomain. I have yet to confirm whether it's gone or was simply moved/renamed[4].
While it was amazing that Freebase/Google provided dumps of this processed data, and that Amazon provided an easy-to-access snapshot of it, we really ought to have a way of publishing and subscribing to the latest post-processed version of a data set derived from one or more regularly updated data sets, be they from NASA/Landsat, Wikipedia, or elsewhere. I don't know exactly what this process would look like, but the raw data is there; all we need is a way to publish the processing software/commands (Docker?) to be re-run whenever a data dependency is updated.
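I don't know what the real plumbing would be, but the core loop could be as simple as fingerprinting the upstream dump and re-running the transform only on change. A minimal sketch (all names are mine, purely for illustration; a real version would check the remote ETag/Last-Modified rather than hashing the full bytes):

```python
# Hypothetical "re-run processing when a dependency updates" loop.
import hashlib

def fingerprint(data: bytes) -> str:
    """Stable content hash of a source dump."""
    return hashlib.sha256(data).hexdigest()

def maybe_reprocess(source: bytes, state: dict, process):
    """Re-run `process` only if `source` changed since the last run.

    Returns (new_state, ran) so a scheduler can log what happened.
    """
    digest = fingerprint(source)
    if state.get("digest") == digest:
        return state, False  # dependency unchanged: keep cached derivative
    return {"digest": digest, "derived": process(source)}, True

# First run processes; a second run over identical input is a no-op.
state, ran_first = maybe_reprocess(b"raw dump v1", {}, lambda b: b.upper())
state, ran_again = maybe_reprocess(b"raw dump v1", state, lambda b: b.upper())
```

Point a scheduler at that check, publish the resulting derivative, and subscribers always see a fresh post-processed version without anyone re-running a slew of scripts by hand.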
It seems like AWS Public Data Sets would be an ideal destination for data sets and more accessible derivatives. Is any of that in line with the intent of AWS Public Data Sets?
I apologize for letting that turn into a bit of a rant, but I wanted to provide an anecdote and context.
I don't know anything about this data, so excuse my ignorance. What's available in terms of historical data? Can I get images for a certain region for the last 10 years, say?
I worked on Landsat and SPOT data back in the '80s using American (Gould) and Canadian image processing systems. Both came with Fortran source code, so it was very educational.
ATI tried to recruit me for consumer products, but their understanding of image processing was so primitive that we couldn't communicate. All they understood was red-eye removal and edge detection. :)
The various Landsat resolutions are fine for earth sciences, including ground cover, cloud cover, and ice studies.
But I think most of the people here would be more interested in SPOT or higher-resolution data.
An interesting bit of trivia: one of the earliest Sony CD-ROMs ever burned (circa 1985) had Landsat sample data on it. It was distributed to a few Japanese geoscientists who had an obvious need for mass storage.