Bret Taylor's blog - Latest Comments in We need a Wikipedia for data - Bret Taylor's blog

Re: We need a Wikipedia for data - Bret Taylor's blog

Chris — Sat, 03 Apr 2010 00:06:02 -0000

It would be really cool if the data that was open came in standard formats that were readily and easily understandable to query. While this seems like somewhat of a pipe dream, I was pleasantly surprised when I ran across mapbox.com, which looks like a really positive OSS contribution toward that end. (also: http://chrischris.posterous...

Re: We need a Wikipedia for data - Bret Taylor's blog

Bart Van Loon — Wed, 08 Apr 2009 06:57:04 -0000

Hi,

You might be interested in The Data Tank. We just opened up a technology preview on http://thedatatank.com and are looking for enthusiastic people around the world to join our team!

Re: We need a Wikipedia for data - Bret Taylor's blog

Rory McCann — Fri, 27 Feb 2009 09:22:59 -0000

+1 for OSM.

At least in the USA they have decent Free GIS data. There is basically *no* geographic data for most countries in the EU; that's why OSM exists.

As well as Google providing the GPS traces, they could also provide the camera view images to OSM. That way OSM could also add pedestrian crossings, post boxes, etc. to the database.

Re: We need a Wikipedia for data - Bret Taylor's blog

Marc Perramond — Mon, 14 Jul 2008 15:35:44 -0000

Hear hear! My company (http://www.insideview.com) faces all of the challenges you outlined above for a specific vertical application: aggregating and making sense of all the business information available to sales and marketing professionals. We license data from traditional editorial players like Reuters, D&B, and Hoovers, as well as web site harvesters like SimplyHired and ZoomInfo, user-contributed communities like Jigsaw, and social networks such as LinkedIn and Facebook. Each of these has "data issues", i.e. each model brings different strengths and weaknesses to the table as a data source. And perhaps most interestingly for us, since we have to do the heavy lifting as a meta-aggregator, they all have different formats and unique identifiers. It's tough enough for our algorithm... imagine individual end users trying to make sense of it all.

I have also wondered about the possibility of a DataWiki (I like the term, Bret!), or perhaps of Wikipedia taking on the challenge itself (Wikipedia already has a large number of company profiles and there is now a formal Companies WikiProject (http://en.wikipedia.org/wik...)). I agree it would probably have to be seeded by a significant player in the data space, something that will be painful to do for a company that gets significant revenue from licensing its data. I also agree with many of the other comments made here about the importance of a data standard (probably by vertical, as xml dude suggested). In our case, solving the challenge of a single standard for various data sets would be HUGE. The issue of seeding, while a challenge, is only temporary. Data is becoming increasingly commoditized and the price is fast approaching zero. Case in point: Jigsaw's recent open data initiative, in which they decided to give away their database of basic company profiles in order to generate interest in their service (and their less commoditized data for individual executive contacts). There's some seed right there!

Re: We need a Wikipedia for data - Bret Taylor's blog

anon — Wed, 30 Apr 2008 20:47:10 -0000

Great idea - but check Freebase, this has some of what you're proposing.

Re: We need a Wikipedia for data - Bret Taylor's blog

macdavid — Sat, 26 Apr 2008 08:05:49 -0000

Bret, this is something many of us are thinking about, but as you point out, it is impossibly complex to do well. In specific areas of focus it is feasible, of course, but data per se is very complicated. I work around the world supporting the development of different institutions and countries. If you or others establish a group looking at this, let me know. Data and effective Information Management are the key to stimulating change and advancement.

Re: We need a Wikipedia for data - Bret Taylor's blog

xml dude — Wed, 16 Apr 2008 12:15:05 -0000

I really like your ideas and have had similar ones. There is a parallel between electricity and data: each is really valuable because it allows us to do all the things we need to do, and is very adaptable to being converted into forms usable directly or indirectly to achieve our goals. To that end, data needs to have a 60 Hz 120 V standard, which might not be perfect for every application but can be transformed to whatever is required. To me, XML and vertical XML standards are the way to go.

Once you get the data inside your firewall, what you convert it to is your business. If you create data at the 60 Hz 120 V standard, you should be able to roll your meter backwards. It costs money to collect data so it should be worth something in both directions.

Re: We need a Wikipedia for data - Bret Taylor's blog

Nils Hitze — Mon, 14 Apr 2008 10:38:46 -0000

This is so true, thanks for writing this.

Re: We need a Wikipedia for data - Bret Taylor's blog

ChemSpiderman — Sun, 13 Apr 2008 19:37:02 -0000

At ChemSpider we've been working hard to put together a free access website for chemical structures and related information/data. At present we are close to 20 million structures linked to other websites and data sources and, other than the efforts of the NIH and PubChem, ChemSpider has one of the richest (and crowdsource-curated) datasets available online. We are working hard to curate the Wikipedia Chemistry dataset with members of WP:Chem (http://www.chemconnector.co...) at present. I agree that we need more data online and available. It is interesting to note how few are willing to PROVIDE data though, even among the advocates within this domain.

Re: We need a Wikipedia for data - Bret Taylor's blog

Sebastian Wain — Sun, 13 Apr 2008 16:45:47 -0000

IMHO, as a specific subset of data, we need Open Marketing Data, where companies of all sizes can benefit from consumer information and analytics. A good initiative would be sharing data between companies of different sizes in a secure way, without compromising consumer identities. Social networks are very trendy, but there is a window of opportunity for business networks in a SOA sense. Google Analytics recently added a benchmarking option, but it's not enough for a serious change.

Re: We need a Wikipedia for data - Bret Taylor's blog

Phillip Shoemaker — Fri, 11 Apr 2008 13:08:35 -0000

Bret, I absolutely agree with your thoughts on the Wikipedia for data. I work at Numenta, where we are focusing on creating an intelligent platform. Programming for the platform is fairly straightforward; however, once you want to solve a problem like finding a pattern for predictive toxicology, machine vision, or audio problems, you run into the issue of datasets. Where do you find a good dataset for handling the toxicology of certain drugs on myriad people? Well, the answer is, you don't. Not easily, anyway. Additionally, for audio issues, people tend to go to NIST and pay a lot of money for datasets.

If these were in an open system, more people could experiment and solve real problems with any technology (of course, we'd prefer it if people used HTMs).

Re: We need a Wikipedia for data - Bret Taylor's blog

Jono — Fri, 11 Apr 2008 12:34:31 -0000

I agree. We were looking at displaying a TV guide on our site a while back, not to bring in additional revenue, but just in an attempt to be more user-focused, as it is something that would benefit much of our target audience.

Unfortunately it was going to cost us $6k-$9k/month (there is only one source here in Australia) which is crazy, so we obviously decided this was not an option.

Re: We need a Wikipedia for data - Bret Taylor's blog

Andrew Turner — Fri, 11 Apr 2008 08:55:54 -0000

There are already several efforts like this going on (as is easy to notice in all the comments), each in a different application space. But what's most important is that they keep this data open and easily shareable, so that one person can build a Geospatial Data "Wiki", and someone else a Business Listing Wiki, and these two could be brought together by someone looking for local businesses (as an example).

So it's imperative that these systems support open, commonly used formats and APIs.

Specifically for Geo data, a project I work on is Mapufacture that is bringing together various data sources in a large number of formats and then sharing them out via common formats - so a user (or developer) can just use the format that makes sense for them.

Re: We need a Wikipedia for data - Bret Taylor's blog

JP — Thu, 10 Apr 2008 23:58:38 -0000

Dealipedia, the business deal wiki, currently has almost 20,000 transactions on record and offers a free daily newsletter roundup of recent M&A, VC investment, IPO and bankruptcy deals.

http://www.dealipedia.com/
http://www.dealipedia.com/n...

Re: We need a Wikipedia for data - Bret Taylor's blog

Scatman Dave — Thu, 10 Apr 2008 18:26:25 -0000

WELL Agreed

Re: We need a Wikipedia for data - Bret Taylor's blog

TimG — Thu, 10 Apr 2008 18:16:12 -0000

The most stupid thing is, you could scrape it off the site pretty easily anyway, since they expose it record by record in all its beautiful toiletness glory.

Re: We need a Wikipedia for data - Bret Taylor's blog

Jo jo — Thu, 10 Apr 2008 17:37:10 -0000

In the early '80s I was a member of the ANSI cartographic data standards committee (can't remember the exact name). The idea was to at least get the government agencies that provide such data to do it in standard ways. You're right, it's a hard problem. But it could be sooo valuable.

Re: We need a Wikipedia for data - Bret Taylor's blog

patrickatevri — Thu, 10 Apr 2008 17:27:07 -0000

Interesting post. But this part is provoking:

"No one really wants factual data accuracy and completeness to be their competitive advantage; we all want the best data possible to build the best products possible, and discrepancies in data quality are artifacts of the extremely inefficient economy of buying and selling data we currently live in. If everyone had the same, high quality data, all of our products would be better for it."

This is certainly true, but I think you've missed some of the incentives.

First, no one wants to compete in an efficient market. Efficient markets are really hard to make money in. It's why venture capital money does not go to people who want to sell wheat (or corn/oil/insert-random-commodity-here). INEFFICIENT markets are where the money is -- that is, in an inefficient market, if your company has figured out what really matters, you have a huge competitive edge. In an efficient market, all players know exactly what matters and a competitive advantage is hard to find.

I should be clear that I am not arguing with your conclusion -- I think the world would be a much better place if the market for data sets was basically an efficient commodity market. I don't, however, believe that any of the current players in this inefficient market have a lot of incentives to move us in that direction. Quite the opposite, in fact; a winner in an inefficient market is basically a player who has "solved" the game, and that player has every incentive to keep the solution secret.

Re: We need a Wikipedia for data - Bret Taylor's blog

Michael Gaio — Thu, 10 Apr 2008 15:33:58 -0000

We have an open-source "wikipedia" of data:

http://freebase.com/

Re: We need a Wikipedia for data - Bret Taylor's blog

peter murray-rust — Thu, 10 Apr 2008 13:41:09 -0000

Just to endorse Rufus and many other posters on this. I'm a scientist and have campaigned for Open Data (see the WPedia entry (http://en.wikipedia.org/wik...) for what I hope is a fair summary). I believe that almost all scientists want their data to be Open but don't realise the problems and the methods for making it Open.
(Peter Murray-Rust)

Re: We need a Wikipedia for data - Bret Taylor's blog

Chris DeBrusk — Thu, 10 Apr 2008 13:17:36 -0000

While accessibility is really important, I think usability is equally important to make this concept work. The sites out there today are either too complicated to interface with unless you are versed in esoteric W3C specifications, or too one-dimensional - upload data, download data, repeat.

Make it interactive and fun to use and I think you'd get all sorts of data from all sorts of people - not just the tech community.

Re: We need a Wikipedia for data - Bret Taylor's blog

Jan Horna — Thu, 10 Apr 2008 11:22:31 -0000

How about microformats? If I have data for public sharing, I should be able to export it into a reusable XML format with a given structure (e.g. a microformat). I can imagine the data flowing this way among different web apps.
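
As a rough illustration of the export idea above, here is a minimal sketch in Python using only the standard library. The function name `records_to_xml`, the tag names `dataset` and `record`, and the sample data are all made up for the example; this is plain XML with a fixed structure, not an implementation of any actual microformat specification.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, root_tag="dataset", item_tag="record"):
    """Serialize a list of flat dicts into one XML string.

    Tag names are hypothetical defaults; dict keys must be valid XML names.
    """
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for field, value in record.items():
            # One child element per field, with the value stored as text.
            child = ET.SubElement(item, field)
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, invented for this sketch.
sample = [{"name": "Example Cafe", "city": "Prague"}]
xml_text = records_to_xml(sample)
print(xml_text)
```

Another web app could read the same structure back with `ET.fromstring`, which is the kind of flow between apps described above, provided both sides agree on the tag names.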

Re: We need a Wikipedia for data - Bret Taylor's blog

marc wick — Thu, 10 Apr 2008 08:59:31 -0000

Bret

I agree with Evan that we don't necessarily need a huge ultimate data store for all and everything. What we need are relations between open datasets in the way of the LinkingOpenData Project:
http://esw.w3.org/topic/Swe...

The project already interlinks data from Wikipedia, GeoNames, MusicBrainz, WordNet and many more.

Re: We need a Wikipedia for data - Bret Taylor's blog

Evan Prodromou — Thu, 10 Apr 2008 08:44:51 -0000

Hi, Bret. As someone very involved in Open Content and Open Data, I'm glad to see the firestorm of discussion you've started.

I think that it's more likely that we'll see Open Data split vertically rather than one big open data warehouse. People are more able to concentrate on creating a TV guide (like TVIV) or business listings (like Openguides) than stare at some big spreadsheet-style interface and come up with some "data" to share. I think projects like Freebase will be great for aggregating data created by more vertical projects.

I've started a project called Vinismo, where we're documenting every wine in every country in the world, both with unstructured text and with structured (RDF) data. It's an exciting project, and I think there are lots of other similar ones out there.