Crowd-sourcing — accurately!

Many popular sites, such as Wikipedia and Tripadvisor, rely on public participation to gather information — a process known as crowd data sourcing. While this kind of collective intelligence is often valuable, it is also fallible, and policing such sites for inaccuracies and offensive material is a costly undertaking.

But not for long. Prof. Tova Milo of Tel Aviv University’s Blavatnik School of Computer Science has developed a new database technology that can automatically evaluate information gathered from the public. By reviewing the incoming information and identifying questionable input, the program can efficiently moderate this input with minimal human interaction. The technology can be put to work in a number of ways, from fact-checking online encyclopedia content to alerting moderators about potentially offensive commentary — both saving valuable man-hours and improving the quality of information.

For her research, which was demonstrated in part at the 2011 International Conference on Data Engineering and will be presented in more detail at next year’s conference, Prof. Milo was awarded a European Research Council (ERC) advanced research grant, a highly prestigious grant administered by the European Union.

Mining crowd intelligence

It’s not just websites like Wikipedia that have a crowd-sourced component to their data. The bookselling site Amazon uses crowd-sourced data to provide reviews and book lists, and most news sites crowd-source comments and responses to articles. Because these sites are designed to be dynamic, Prof. Milo explains, “Every day, old information is updated and new information comes in. It’s very difficult to maintain.”

Typically, overworked staff members are tasked with sorting through the piles of incoming information to determine whether any inappropriate material has made its way onto a site. Prof. Milo’s database technology could change that.

The framework Prof. Milo has developed provides tools for managing this information. The application flags the parts of incoming information that seem questionable. From there, it can send automatic notices to moderators, alerting them to comments that should be taken down, facts that need to be checked, and places where more information is needed. In some cases, Prof. Milo says, the program can even determine which staff members or other people are best able to evaluate the information.
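The flag-and-route workflow described above can be illustrated with a small sketch. This is not Prof. Milo's system: the `Submission` type, the word blocklist, the `known_values` store, and the `expertise` mapping are all hypothetical stand-ins chosen only to show the general shape of the idea.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical blocklist; a real system would use something far richer.
BLOCKLIST = {"idiot", "stupid"}


@dataclass
class Submission:
    topic: str
    text: str
    claimed_value: Optional[str] = None  # a fact asserted by the contributor, if any


def flag_reasons(sub, known_values):
    """Return the reasons (possibly none) why a submission looks questionable."""
    reasons = []
    words = {w.strip(".,!?").lower() for w in sub.text.split()}
    if words & BLOCKLIST:
        reasons.append("possibly offensive")
    known = known_values.get(sub.topic)
    if sub.claimed_value is not None:
        if known is None:
            reasons.append("more information needed")
        elif sub.claimed_value != known:
            reasons.append("fact needs checking")
    return reasons


def pick_moderator(sub, expertise):
    """Route to the first moderator who covers the topic, else a general queue."""
    for name, topics in expertise.items():
        if sub.topic in topics:
            return name
    return "general-queue"


# Example data (invented for illustration).
known_values = {"Eiffel Tower height": "330 m"}
expertise = {"moderator_a": {"Eiffel Tower height"}}
sub = Submission("Eiffel Tower height", "It is 300 m tall.", "300 m")
reasons = flag_reasons(sub, known_values)
if reasons:
    print(pick_moderator(sub, expertise), reasons)
```

Only flagged submissions reach a person, which is where the savings in moderator time would come from in a scheme like this.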

Filling in the blanks

Prof. Milo’s technology has already been demonstrated in the form of a social trivia game. At a conference, participants play a computer game in which they answer trivia questions about their fellow attendees. The questions are automatically selected to maximize the information obtained about each person. Answers are scored, and the program identifies the questions for which it has satisfactory information and those for which information is still lacking. This demonstrates the program’s ability not only to pinpoint information gaps but also to engage the crowd itself in filling them.
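One way to picture the gap-finding step is the sketch below. It is an assumption-laden illustration, not the method used in the demonstration: it treats a question as settled once it has at least `min_answers` responses that mostly agree (low Shannon entropy), and it asks next about the open question with the fewest answers. The thresholds and function names are hypothetical.

```python
from collections import Counter
from math import log2


def answer_entropy(answers):
    """Shannon entropy in bits of the answer distribution; 0 means full agreement."""
    if not answers:
        return 0.0
    total = len(answers)
    counts = Counter(answers)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def needs_more_answers(answers, min_answers=3, max_entropy=0.9):
    """A question is still open if it has too few answers or they disagree too much."""
    return len(answers) < min_answers or answer_entropy(answers) > max_entropy


def next_question(collected):
    """Pick the open question with the fewest answers, breaking ties by disagreement."""
    open_qs = [q for q, a in collected.items() if needs_more_answers(a)]
    if not open_qs:
        return None
    return min(open_qs, key=lambda q: (len(collected[q]), -answer_entropy(collected[q])))


# Invented answers about one conference participant.
collected = {
    "hometown": ["Haifa", "Haifa", "Haifa"],
    "field": ["databases", "AI"],
    "hobby": ["chess", "chess", "hiking", "tennis"],
}
print(next_question(collected))
```

Here "hometown" is settled, while "field" has too few answers and "hobby" has conflicting ones, so the crowd's effort is directed only at the questions that still need it.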

Ultimately, the system ensures that the crowd is being used efficiently. “It’s about knowing to ask the right people the right questions,” she says. When human input is used more selectively, the results are of higher quality, and sites save money and time on controlling content.
