Will you post all of the data collected (in its original, unadulterated form) for free, unrestricted download by anyone for any purpose, including competing with Last.fm?
Intending to share, and actually making a public promise to do so, are quite different things.
No one even knows yet whether it's possible to crowdsource good-enough BPM data like this, so even demonstrating that it's feasible would be progress :-)
No. We are giving you data that you use to improve your tempo estimation algorithm. If you wanted to improve the state of the art, you would share the data you collect.
In particular, you note the MIREX competition (http://www.music-ir.org/mirex) earlier in your post. Share your data with the MIREX competition, and make it open for use by others, if you want to improve the state of the art.
* Charts (of course)
* Recommendations: I use Last.fm for two kinds of recommendations: (1) where it uses listener history to figure out similar bands and albums, which in some ways is inferior to feature-based recommender systems like Pandora, and (2) where it uses neighbor listening history (e.g., neighbor radio) -- I can't count the number of times I've been introduced to awesome bands in genres I wasn't looking for just because a neighbor had them on their charts.
* Shoutboxes for each artist, album, and song: I can't count the number of times I've just instantly "loved" a song and needed someplace to shout something out with others who like the song.
Last.fm has a huge user base and should try to be a one-stop shop for music, like Netflix is for video. Here are some tips:
* Revamp your business model: copy Grooveshark's business model if you have to, but get anyone who wants to listen to a song to come to your site and not go to YouTube for a low-quality version.
* Increase avg. time on site: If you implement the suggestion above (where I can play any song I like), then you have a wealth of _real_ related songs/albums/artists to offer the user. You may be able to remove annoying audio ads and just rely on click ads. It also helps if you implement a site-wide list like YouTube's playlist.
* Improve recommendations: Netflix had a decent algorithm, but conducting the Netflix Prize got them a lot of publicity _and_ a much better algorithm. You could do the same with the ton of data that you have. Yahoo! is already ahead of you in this: see this year's KDD Cup: http://www.kdd.org/kdd2011/kddcup.shtml
http://musicmachinery.com/2011/02/22/is-the-kdd-cup-really-m...
Because it's entirely anonymised, not just the users but the artists too -- cf. Netflix's problems with deanonymization:
http://33bits.org/2010/03/15/open-letter-to-netflix/
This means you can't use any interesting characteristics of the music itself, or the associated metadata, to aid the recommendations. All the interesting domain knowledge is stripped out, which likely means the best solutions still won't work as well as algorithms that use metadata (like Last.fm's) or content analysis (like Pandora's) or both, and certainly won't lead to any particularly interesting insights about what drives people's tastes.
Disclaimer: I work at Last.fm
(Generalizing from myself with a sample size of one)
This smacks of the oft-ridiculed Java AbstractFactoryFactoryInterface. But let me put it bluntly: AbstractFactoryFactoryInterface's are how you write real, modular software–not little fart applications.
http://magicscalingsprinkles.wordpress.com/2010/02/08/why-i-...
[N.B. I'm not saying there isn't a lot of truth in the factorial article; it's just that you have to know which challenges just need a one-liner function and which require an AbstractFactoryFactoryInterface.]
[1]: http://www.korokithakis.net/posts/book-clouds/
But, thanks :-)