The Free and Open Software Behind IBM’s Jeopardy Champion Watson

IBM has completed another man-versus-machine feat. In 1997, IBM’s Deep Blue supercomputer defeated chess champion Garry Kasparov. Now, in February 2011, IBM has defeated Jeopardy champions Brad Rutter and Ken Jennings using a supercomputer named Watson. This latest victory puts the score at IBM supercomputers two, mankind zero.

IBM is a solid research company that I have always admired for its very ambitious technical feats. I believe IBM chooses these feats very carefully. The average person cannot grasp what current Artificial Intelligence technology means to their lives, yet they can understand what it means to beat a chess or Jeopardy champion.

As a computer programmer and contributor to several AI projects, I am very interested in learning more about how Watson works. There is a great deal to Watson, and this article is by no means an exhaustive treatment of its technology; a book could easily be written on the subject. Rather, this article looks at the building blocks that were used to construct Watson.

IBM did not build Watson from scratch. They leveraged existing open source projects to provide much of the infrastructure for the Watson project. Much can be learned about how Watson functions by examining the free and open source software (FOSS) components that make up Watson.

Watson in a Nutshell

We will begin by taking a very high-level look at how Watson works. The goal of Watson is to answer Jeopardy questions. The following is a typical Jeopardy question.

“This NFL quarterback is a great-great-great-grandson of Brigham Young.”

Jeopardy questions can be complex, usually requiring the contestant to put several facts together. Fortunately for Watson, the answers are formatted very simply. In this case the answer is “Who is Steve Young?” Watson simply has to return a fact; it is not necessary for Watson to explain anything.

The basic process that Watson follows is summarized here. Watson uses a complex array of natural language processing, semantic analysis, information retrieval, automated reasoning, and machine learning to answer the questions. Many existing algorithms from these fields are used; Watson introduced few new algorithms in these areas. Rather, Watson uses many existing algorithms to generate potential answers, and a confidence is measured for each answer. If the confidence in the best answer is high enough, Watson will provide that answer. In Jeopardy, you are penalized for incorrect answers. Because of this, the machine does not want to simply “guess”; Watson has to be reasonably sure.

A simplified illustration of this process is shown here.
[Figure: Simple diagram of IBM Watson]

First, the sentence is parsed. Then hypotheses are created. These hypotheses are then checked against evidence. Finally, the hypotheses have confidence levels assigned to them. If the top hypothesis has a confidence level above the threshold, Watson proposes an answer.
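
To make this pipeline concrete, here is a minimal sketch in Java of the final step: ranking the scored hypotheses and answering only when the best one clears a confidence threshold. The Hypothesis type, the scores, and the threshold value are all invented for illustration; this is not Watson’s actual code.

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical candidate-answer record; Watson's real data structures are far richer.
record Hypothesis(String answer, double confidence) {}

public class AnswerSelector {

    // Answer only when the best hypothesis clears the threshold; in Jeopardy
    // a wrong answer costs money, so staying silent can beat guessing.
    static Optional<String> selectAnswer(List<Hypothesis> hypotheses, double threshold) {
        return hypotheses.stream()
                .max(Comparator.comparingDouble(Hypothesis::confidence))
                .filter(best -> best.confidence() >= threshold)
                .map(Hypothesis::answer);
    }

    public static void main(String[] args) {
        List<Hypothesis> candidates = List.of(
                new Hypothesis("Steve Young", 0.92),   // made-up scores for illustration
                new Hypothesis("Brigham Young", 0.41),
                new Hypothesis("Joe Montana", 0.07));

        selectAnswer(candidates, 0.80).ifPresentOrElse(
                answer -> System.out.println("Who is " + answer + "?"),
                () -> System.out.println("(stay silent)"));
    }
}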

Fortunately for Watson, this is a highly parallelizable task. Because of this, Watson was designed to execute tasks using grid computing. You can see this from Watson’s hardware stats.

  • 90 IBM Power 750 Servers
  • Additional I/O, network and cluster controller nodes
  • 2,880 POWER7 processor cores and 16 terabytes of RAM
  • Hardware cost of around $3 million USD

These are the stats that I saw repeated in many articles about Watson. I am, however, more interested in the software side of Watson. A great deal of custom software was written for Watson, but in this article I will focus on the “off the shelf” software and data used to create it.

All Things Deep

Watson is based on an underlying technology called DeepQA. DeepQA is based on deep learning. Deep learning, in a nutshell, means that you can train very small parts of a system independently, which allows you to create a very deep system.

Consider this somewhat abstract example. A typical corporation may have thousands of employees all working to earn a profit, and each employee contributes to the final “bottom line” of the company in some way. Traditional machine learning attempts to train all employees simultaneously to achieve this goal, with no regard for what individual blocks of the company accomplish. This has numerous issues, the largest being the vanishing gradient problem, analyzed in 1991 by Jürgen Schmidhuber's student Sepp Hochreiter. The vanishing gradient problem simply means that the final goal becomes diluted before it reaches the lowest levels of the network, much like the telephone game that children play. Deep learning allows individualized training of different sections of an AI program, or model.
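
To see why the final goal becomes diluted, consider a toy back-propagation calculation. If each layer scales the error signal by a small factor on its way down, the signal all but vanishes after a handful of layers. The 0.25 factor below is purely illustrative:

public class VanishingGradient {
    public static void main(String[] args) {
        // Suppose each layer multiplies the back-propagated error by 0.25,
        // a plausible derivative magnitude for a saturating activation.
        double gradient = 1.0;
        for (int layer = 1; layer <= 10; layer++) {
            gradient *= 0.25;
            System.out.printf("after layer %2d: %.10f%n", layer, gradient);
        }
        // After 10 layers the signal is roughly 0.0000009537: the lowest
        // layers receive almost no information about the final goal.
    }
}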

The Building Blocks of Watson

Watson is made of many software components. Some of them were custom components designed by the IBM teams. However, much of Watson uses “off the shelf” components that are freely available. The major open source components, Apache Hadoop and Apache UIMA, are discussed below.

Software alone will not solve the problem; Watson also needs a massive amount of data. Watson made use of two types of data. Computers usually work with very structured data: when you think of computer data, you think of databases with organized rows and columns. Watson did make use of some structured data. However, most of the answers come from unstructured data sources. These unstructured data sources are listed here.

  • The Complete Text of Wikipedia
  • Encyclopedias
  • Dictionaries
  • Thesauri
  • Newswire Articles
  • Literary Works

These unstructured data sources are simply large volumes of English text. They are organized by topic, but any additional information must come from actually reading the sentences.

Watson did make use of some structured data sources. These structured data sources typically contained information about the English language. This gave Watson a jump start in being able to begin reading English and finding what it needed. The following structured data sources were used by Watson.

  • WordNet
  • DBpedia
  • YAGO

All three of these data sources are freely available. Though not technically software projects, they will be discussed later in this article. Next, we will look at the open source software projects.

Apache Hadoop

Apache Hadoop is a natural fit for Watson. Hadoop is an open source framework used for grid computing. Hadoop uses the MapReduce algorithm to split one very large job into many smaller components. This allows Watson to make use of the large number of CPU cores present in its hardware.

The task of forming hypotheses and then validating those hypotheses against data is highly parallel. This allows Watson to arrive at a Jeopardy answer quickly enough to be competitive with a human contestant. The speed of individual CPU cores has pretty much topped out at around 3 GHz, so for software to attack larger problems, parallel programming is the key. Frameworks like Hadoop make parallel programming somewhat easier, though it is still often hard for programmers to recast a linear task as a parallel one.
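
As a concrete example of the MapReduce style, here is the classic word-count job written against the Hadoop API. This is a standard tutorial example, not anything Watson actually ran; it shows how a large text-processing job splits into independent map tasks whose results are merged by reduce tasks.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: each task independently turns its slice of text into (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: all counts for the same word arrive together and are summed.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Each mapper works on its own slice of the input with no coordination, which is exactly the property that lets a job scale across thousands of cores.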

Apache Hadoop also provides a distributed file system used by Watson. This allows hard disks and Watson’s 16 TB of RAM to be shared across the nodes. Watson has a massive amount of data, and Hadoop’s distributed file system allows that data to be moved quickly through the system. When running as a contestant, Watson uses only RAM; hard disks are simply too slow.
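
Reading a file from the distributed file system looks much like ordinary file I/O. The sketch below uses the standard Hadoop FileSystem API; the file path is invented, as Watson’s actual data layout is not public:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster's default file system (HDFS when so configured).
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/corpus/wikipedia.txt"))) { // hypothetical path
            byte[] buffer = new byte[4096];
            int read = in.read(buffer);
            if (read > 0) {
                System.out.println(new String(buffer, 0, read, "UTF-8"));
            }
        }
    }
}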

Apache UIMA

Apache Unstructured Information Management Architecture (UIMA) is a framework for handling unstructured data. Watson must deal with a very large volume of unstructured data, which makes UIMA a natural component choice. UIMA is most useful when dealing with large amounts of unstructured text, such as English prose or log files from computer programs, though it can even process audio and video.

UIMA provides standards-based frameworks that allow analysis and annotation of large volumes of computer text. Watson used Apache UIMA for real-time content analytics and natural language processing. This allowed Watson to accomplish the following tasks:

  • Comprehend clues
  • Find possible answers
  • Gather supporting evidence
  • Score the answers
  • Compute confidence in each answer
  • Improve contextual understanding

Using Hadoop and UIMA together, all of this could be done in less than three seconds.
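
To give a feel for the UIMA programming model, here is a minimal annotator sketch. A real Watson annotator would be vastly more sophisticated; this one simply marks every occurrence of a fixed keyword in the document text, and the keyword and class name are my own invention:

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

// A toy UIMA annotator: scans the document text and adds an Annotation
// over every occurrence of a keyword. Real annotators attach rich, typed
// feature structures rather than the generic Annotation type.
public class KeywordAnnotator extends JCasAnnotator_ImplBase {
    private static final String KEYWORD = "quarterback"; // illustrative only

    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        String text = jcas.getDocumentText();
        int pos = text.indexOf(KEYWORD);
        while (pos >= 0) {
            Annotation a = new Annotation(jcas, pos, pos + KEYWORD.length());
            a.addToIndexes(); // downstream components can now query this span
            pos = text.indexOf(KEYWORD, pos + 1);
        }
    }
}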

Next I will examine the structured databases that Watson made use of. For anyone interested in Natural Language Processing and AI, these are just as interesting as the software components used by Watson.

WordNet

WordNet is a lexical database for the English language developed by the Cognitive Science Laboratory at Princeton University. At the simplest level, WordNet can be thought of as a “super thesaurus”. WordNet helps Watson make sense of English at a low level. WordNet allows Watson to know that some words go together: for example, the term “car pool” should not be treated as two separate words, and the term “United States of America” should not be treated as four separate words. WordNet can be used online at the following URL.
http://wordnetweb.princeton.edu/perl/webwn

WordNet provides thesaurus-type services, which allows Watson to know that two words are synonyms. WordNet also provides a hierarchy of words. This can be important; consider that “every parrot is a bird, but not every bird is a parrot”. WordNet can also help you understand simple relations, such as “water is a liquid”.

WordNet is very useful for helping Watson make assumptions. For example, Watson may know that birds can fly. Therefore, by extension, it can assume that a parrot, being a bird, can fly. Yet Watson may find evidence to contradict a temporary belief that a chicken might be able to fly. This is where confidence levels become very important: they allow Watson to weigh conflicting evidence as to whether a chicken can, or cannot, fly.
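
The is-a reasoning described above can be sketched with a toy hypernym (“is a kind of”) table. The data below is a hand-made stand-in; WordNet’s real structure is a graph of synsets with many relation types:

import java.util.Map;

public class IsAToy {
    // Hand-made hypernym ("is a kind of") links; WordNet stores these as
    // relations between synsets rather than raw strings.
    static final Map<String, String> HYPERNYM = Map.of(
            "parrot", "bird",
            "chicken", "bird",
            "bird", "animal",
            "water", "liquid");

    // Walk the chain upward: "every parrot is a bird, but not every bird is a parrot".
    static boolean isA(String word, String category) {
        for (String cur = word; cur != null; cur = HYPERNYM.get(cur)) {
            if (cur.equals(category)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isA("parrot", "bird"));  // true
        System.out.println(isA("bird", "parrot"));  // false
        System.out.println(isA("water", "liquid")); // true
    }
}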

DBPedia

DBpedia is a project to extract structured information from Wikipedia. This allows sophisticated queries against Wikipedia, making it easier for the amazing amount of information in Wikipedia to be used in new and interesting ways. Watson makes use of DBpedia to get quickly to the specific part of Wikipedia it might be looking for. DBpedia can be used at the following URL.
http://wiki.dbpedia.org/OnlineAccess

DBpedia makes use of SPARQL, an SQL-like query language for RDF. This allows DBpedia to query the RDF data extracted from Wikipedia and return a list of Wikipedia pages that might contain the requested information. The following is an example SPARQL query.

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://dbpedia.org/resource/>

SELECT ?name ?birth ?death ?person WHERE {
     ?person dbo:birthPlace :Berlin .
     ?person dbo:birthDate ?birth .
     ?person foaf:name ?name .
     ?person dbo:deathDate ?death .
     FILTER (?birth < "1900-01-01"^^xsd:date) .
}
ORDER BY ?name

This example, taken from the DBpedia examples, returns a list of the articles for everyone born in Berlin before 1900.
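
As a hint of how a program might issue such a query, here is a sketch using Apache Jena’s query API against DBpedia’s public SPARQL endpoint. Jena is just one common choice; I do not know what RDF tooling Watson used internally:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class DBpediaQuery {
    public static void main(String[] args) {
        String sparql = """
                PREFIX dbo: <http://dbpedia.org/ontology/>
                PREFIX foaf: <http://xmlns.com/foaf/0.1/>
                PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
                PREFIX : <http://dbpedia.org/resource/>
                SELECT ?name ?birth WHERE {
                    ?person dbo:birthPlace :Berlin .
                    ?person dbo:birthDate ?birth .
                    ?person foaf:name ?name .
                    FILTER (?birth < "1900-01-01"^^xsd:date)
                } LIMIT 10
                """;
        // Send the query to DBpedia's public endpoint and print the bindings.
        try (QueryExecution qe =
                     QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", sparql)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("name") + "  " + row.get("birth"));
            }
        }
    }
}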

DBpedia would be very valuable to Watson. Using DBpedia, Watson could quickly navigate the vast Wikipedia data and find a specific article.

YAGO

YAGO is a knowledge base that includes much of the information provided by WordNet and DBpedia. YAGO was developed at the Max Planck Institute for Informatics in Saarbrücken. The knowledge base contains more than 2 million entities: persons, organizations, cities, and so on. YAGO knows 20 million facts about these entities and has a manually confirmed accuracy of 95%. The YAGO ontology is licensed under the GNU Free Documentation License. YAGO can be tried online at the following URL.
http://mpiat5401.ag5.mpi-sb.mpg.de:8081/webyagospotlx/Browser

YAGO has a great deal of information to help Watson know what something is. Using YAGO, Watson could quickly get basic facts about people and places.

Conclusions

Watson is an amazing achievement. It shows how many different technologies and data sources can be used together. Watson also showcases some interesting computer technologies and databases that have wide application beyond just Watson.

IBM sees a great future for DeepQA, the technology that Watson was built upon. IBM sees applications in any field where a vast amount of information must be analyzed. Fields that IBM has specifically mentioned are medicine and law.

Grid computing and parallel programming are the future of the information technology industry. As we must process greater and greater amounts of information, computers must be designed to act in parallel. Watson gives us a glimpse of that future.
