The Future of Search

Can we ever truly know what the future holds? Not really, but we can make educated predictions based on previous knowledge, contextual clues, public opinion, and momentary relevance. Fittingly, when it comes to the future of search, those same ingredients describe what search technology itself will continue to evolve to draw on.

I recently had the opportunity to learn from two very smart men, Bob Pritchett and Perry Fizzano. Bob Pritchett is one of the co-founders of Logos Bible Software, a company that produces electronic Bible study resources for customers around the world. Perry Fizzano is an assistant professor of Computer Science at Western Washington University and one of the developers of Treelicious, a system that allows hierarchical navigation of tagged web pages.

Spending time with these two men gave me some insight into where they believe search is headed. The products they have helped create are good indicators of that direction: both rely heavily on semantic technology, which I agree is the foundation of the future of search.

Getting To Semantic Search

Over the years, we have watched search technology evolve from requiring deep knowledge of complicated Boolean queries to what Pritchett referred to as “bag of words” searches: we throw a handful of words into our favorite search engine, and its algorithms return relevant results without our supplying much context or structure. Lately, search technology has taken an even more logical approach.
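
To make the “bag of words” idea concrete, here is a minimal Python sketch. It is not how any real search engine ranks results; it simply assumes a tiny in-memory set of documents (the names `bag_of_words_search` and `SAMPLE_DOCS` are hypothetical) and scores each document by how often it contains the query’s words, ignoring word order and grammar entirely.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep only runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words_search(query: str, docs: dict[str, str]) -> list[tuple[str, int]]:
    """Rank documents by how often they contain any of the query's words.

    Word order and grammar are ignored: the query is treated as an
    unordered "bag" of words.
    """
    query_terms = set(tokenize(query))
    results = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties broken alphabetically by document id.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample documents, for illustration only.
SAMPLE_DOCS = {
    "a": "Java developer jobs in Seattle",
    "b": "Seattle coffee guide",
    "c": "Senior Java developer, remote",
}

print(bag_of_words_search("seattle java developer", SAMPLE_DOCS))
```

Notice that “seattle java developer” and “developer java seattle” return identical rankings, which is exactly the lack of context and structure the bag-of-words approach accepts.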

If you remember logic from your high school math class, you know that a basic logic problem can be answered with “if-then” statements; for example, if a and b, then c. Applying this kind of deductive logic to search technology brings you to what Pritchett referred to as RDF triplets (more commonly called RDF triples). Each triple is a simple statement made of a subject, a predicate, and an object, such as “Michelle Obama” “is married to” “Barack Obama,” and rules can chain such statements together to deduce new facts. You may recall from our semantic technology article here on SourceCon that RDF (Resource Description Framework) is one of the W3C’s standard Semantic Web technologies, which help search engines clarify and classify bits of knowledge.
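
The snippet below sketches how triples plus “if-then” rules can produce facts nobody stated directly. It is a simplified illustration, not Pritchett’s system or a real RDF library: plain Python tuples stand in for RDF statements, and the rules, predicate names, and people in `FACTS` are all hypothetical. A production system would typically use an RDF store queried with a language such as SPARQL.

```python
from collections.abc import Callable, Iterable

Triple = tuple[str, str, str]  # (subject, predicate, object)
Rule = Callable[[set[Triple]], set[Triple]]


def spouse_is_symmetric(facts: set[Triple]) -> set[Triple]:
    # If X married_to Y, then Y married_to X.
    return {(o, p, s) for (s, p, o) in facts if p == "married_to"}


def works_in_city(facts: set[Triple]) -> set[Triple]:
    # If X works_at Y and Y located_in Z, then X works_in Z.
    located = {(s, o) for (s, p, o) in facts if p == "located_in"}
    return {
        (x, "works_in", z)
        for (x, p, y) in facts if p == "works_at"
        for (place, z) in located if place == y
    }


def forward_chain(facts: Iterable[Triple], rules: list[Rule]) -> set[Triple]:
    """Apply every rule repeatedly until no new triples appear."""
    known = set(facts)
    while True:
        derived: set[Triple] = set()
        for rule in rules:
            derived |= rule(known)
        derived -= known
        if not derived:
            return known
        known |= derived


# Hypothetical facts, for illustration only.
FACTS = {
    ("Alice", "works_at", "Acme"),
    ("Acme", "located_in", "Seattle"),
    ("Bob", "married_to", "Carol"),
}

inferred = forward_chain(FACTS, [spouse_is_symmetric, works_in_city])
print(sorted(inferred - FACTS))
```

The loop keeps applying the rules until nothing new appears (a technique known as forward chaining), so one conclusion can feed into the next.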

Couple this logical technology with human tagging, historical and regional dialect, and large, shared knowledge bases (such as the lexical database WordNet, and OpenCyc, which is both a knowledge base and a ‘commonsense reasoning engine’), and you have search technology that is not only capable of almost human-like deductive reasoning but can also evolve within the boundaries of cultural and regional relevance by tapping into human contributions.

Two Examples

Treelicious and Logos are both good examples of these capabilities. Treelicious, created by Fizzano and one of his graduate students, Matt Mullins, builds on the collaborative, user-generated tags of the social bookmarking service Delicious. User-generated classification of this kind has come to be known as a “folksonomy.” According to Fizzano, Treelicious combines “a free-tagging system and polyhierarchy – the best of both worlds… it takes the freedom and fluidity of tagging systems and leverages a thesaurus to impose semantic structure on the tags. Navigation around the tag space now becomes more intuitive and informative since we can generalize to broader content and specify to narrower content.”

While Treelicious does not appear to be available to the general public just yet, a search with it returns two columns of results. Column one lists web results; column two draws on “folksonomy” data from its sources to let you broaden or narrow the search. Broader terms appear in the “Generalize” box. Narrower terms are split into “Specify (branches),” which allow further refinement, and “Specify (leaves),” which are terminal terms that cannot be specified further. These results are determined by the semantic technology combined with the folksonomy associated with specific sites, terms, images, and so forth. (You can read more information on Treelicious in this abstract.)
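
Here is one way the Generalize and Specify boxes could be computed from a thesaurus. This is a hypothetical sketch, not Treelicious’s actual code: the `BROADER` mapping is a tiny invented thesaurus, and a “branch” is assumed to be any narrower term that has narrower terms of its own, while a “leaf” has none. Because the structure is a polyhierarchy, a tag such as `python` can sit under more than one broader term.

```python
# Hypothetical thesaurus: each tag maps to its broader terms. Because this is a
# polyhierarchy, a tag may have more than one broader term.
BROADER: dict[str, set[str]] = {
    "python": {"programming", "scripting"},
    "perl": {"scripting"},
    "scripting": {"programming"},
    "django": {"python", "web"},
    "flask": {"python", "web"},
    "numpy": {"python"},
}


def narrower_terms(tag: str) -> set[str]:
    # Invert the thesaurus: find every tag that lists `tag` as a broader term.
    return {child for child, parents in BROADER.items() if tag in parents}


def navigate(tag: str) -> dict[str, list[str]]:
    children = narrower_terms(tag)
    return {
        "Generalize": sorted(BROADER.get(tag, set())),
        # Branches still have narrower terms beneath them...
        "Specify (branches)": sorted(c for c in children if narrower_terms(c)),
        # ...while leaves are terminal and cannot be narrowed further.
        "Specify (leaves)": sorted(c for c in children if not narrower_terms(c)),
    }


print(navigate("scripting"))
```

For the tag `scripting`, this toy thesaurus offers `programming` as a broader term, `python` as a branch (it has narrower terms of its own), and `perl` as a leaf.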

With Pritchett’s company, the search concepts are similar but much narrower in focus. The search technology has to account for the fact that its knowledge base, the Bible, was originally written in Hebrew (Old Testament) and Greek (New Testament), so most people understand the Bible through translations from these two quite complex languages. That makes the core idea of the technology impressive: it can interpret the underlying original-language data and make semantic connections between the searched word and related words. That data often involves several Hebrew or Greek words that share a single English translation, grammatical “gender” assignments, Biblical and historical measurements no longer used today, and so forth. The goal is to associate every word, every story, and every verse in the Bible with everything else, much like the tagging references mentioned earlier.
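
The many-to-one translation problem can be illustrated with a small index from English glosses to original-language words. This is a hypothetical sketch, not how Logos Bible Software actually stores its data: `ALIGNMENTS` is a hand-made sample, and the point is simply that one English search term can expand into several distinct Hebrew or Greek lemmas (for example, the Greek verbs agapaō and phileō are both commonly rendered “love”).

```python
from collections import defaultdict

# Hypothetical sample alignments of (English gloss, original-language lemma, language).
ALIGNMENTS = [
    ("love", "agapaō", "Greek"),
    ("love", "phileō", "Greek"),
    ("word", "logos", "Greek"),
    ("word", "rhēma", "Greek"),
    ("word", "dābār", "Hebrew"),
]


def build_gloss_index(
    alignments: list[tuple[str, str, str]],
) -> dict[str, set[tuple[str, str]]]:
    # Map each English gloss to every (language, lemma) pair it translates.
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for gloss, lemma, language in alignments:
        index[gloss.lower()].add((language, lemma))
    return index


def expand_query(term: str, index: dict[str, set[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Return all original-language lemmas an English search term may stand for."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_gloss_index(ALIGNMENTS)
print(expand_query("Love", INDEX))
```

A search for “love” can then be run against every underlying lemma, or narrowed to just one of them, which is the kind of distinction a plain English-text search cannot make.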

It’s All Greek To Me

If this all sounds like ‘Greek’ to you at this point, hang in there… there IS a simple explanation and conclusion to come from all of this!

Remember when WolframAlpha was introduced in 2009? Everyone thought it would be the end-all of search engines, until we played around with it and discovered its limitations. It cannot take a multi-faceted question and produce a relevant result. For example, type ‘Barack Obama’ into WolframAlpha and you get basic information about the current President of the United States. Now type in ‘Michelle Obama’s husband.’ Even though you are asking about the same person, WolframAlpha cannot make the logical deduction of who Michelle Obama’s husband is. If you’re familiar with START, which was developed in 1993 by members of the MIT Computer Science and Artificial Intelligence Laboratory, it had the same problem: one-dimensional search only.
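
The “Michelle Obama’s husband” query is a two-hop question: first resolve the relationship to a person, then return facts about that person. The sketch below shows that chain over a handful of hand-written triples. The facts table, the `RELATION_WORDS` mapping, and the functions are hypothetical illustrations of the idea, not how WolframAlpha, START, or any other engine works.

```python
import re

# Hypothetical hand-written facts (subject, predicate, object), for illustration.
TRIPLES = {
    ("Barack Obama", "spouse", "Michelle Obama"),
    ("Michelle Obama", "spouse", "Barack Obama"),
    ("Barack Obama", "office", "President of the United States"),
}

# Hypothetical mapping from everyday relationship words to predicates.
# (This toy ignores gender: "husband" and "wife" both map to "spouse".)
RELATION_WORDS = {"husband": "spouse", "wife": "spouse"}


def objects(subject: str, predicate: str) -> list[str]:
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def resolve(query: str) -> str:
    """First hop: turn "<entity>'s <relationship>" into the entity it points to."""
    match = re.fullmatch(r"(.+)['’]s (\w+)", query.strip())
    if match:
        predicate = RELATION_WORDS.get(match.group(2).lower())
        targets = objects(match.group(1), predicate) if predicate else []
        if targets:
            return targets[0]
    return query.strip()


def describe(query: str) -> tuple[str, list[tuple[str, str]]]:
    """Second hop: return everything known about the resolved entity."""
    entity = resolve(query)
    return entity, sorted((p, o) for s, p, o in TRIPLES if s == entity)


print(describe("Michelle Obama's husband"))
```

With the relationship resolved first, ‘Michelle Obama’s husband’ and ‘Barack Obama’ return the same answer, which is the multi-step deduction described above.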

Last year, IBM designed a computer system that can understand questions posed in everyday human language and respond with a precise, factual answer, reportedly within about three seconds. It does this by linking multiple knowledge databases spread across many servers to improve computational deductive reasoning. The system is named Watson, and in February 2011 it made a guest appearance on the game show Jeopardy! Not only did Watson play competitively against Ken Jennings and Brad Rutter, it beat both men decisively.

Josh Letourneau wrote an article arguing that Watson’s success signals the end of the need for sourcing professionals to know complex Boolean logic. There is some truth to this: look at all the automation tools already at our disposal that can build Boolean strings for us. Do we really still need such deep knowledge of Boolean today?
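
Much of what those automation tools do is mechanical: OR together the synonyms within each group, AND the groups together, and exclude unwanted terms with NOT. The helper below is a hypothetical sketch of that pattern, assuming uppercase operators and quoted phrases; exact syntax varies by search engine (Google, for instance, excludes terms with a minus sign rather than NOT).

```python
def quote(term: str) -> str:
    # Multi-word phrases are wrapped in quotes so they match as exact phrases.
    return f'"{term}"' if " " in term else term


def or_group(terms: list[str]) -> str:
    # Synonyms are OR-ed together and parenthesized when there is more than one.
    joined = " OR ".join(quote(t) for t in terms)
    return f"({joined})" if len(terms) > 1 else joined


def build_boolean_string(groups: list[list[str]], excluded: tuple[str, ...] = ()) -> str:
    """AND together the OR-groups, then append a NOT for each excluded term."""
    parts = [or_group(g) for g in groups if g]
    parts += [f"NOT {quote(t)}" for t in excluded]
    return " AND ".join(parts)


print(build_boolean_string(
    [["java", "j2ee"], ["software engineer", "developer"], ["seattle"]],
    excluded=("recruiter",),
))
```

The sourcer still decides which synonyms belong in each group and what to exclude; the tool only handles the syntax.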

What the Future Holds

So, circling back to the original question: what is the future of search? Some of you may have visions of Skynet dancing through your heads right now. According to Pritchett, though, that won’t happen. He believes search technologies will certainly understand more and more of the structure of human language and gain stronger logical, deductive reasoning capabilities. However, until technology can brainstorm, build itself, and walk across the room, the creators (humans) will always be greater than the creation (technology).

Search in the future will consist of ever-improving tools and interfaces designed to search semi-structured and unstructured data in an attempt to make sense of it. Need an example? The Internet, the ultimate semi-structured database of information. Search engine developers keep finding better ways to make sense of it, and semantic technology sits at the foundation of the best of them.

A few parting thoughts as you consider what improving search technology means for your future:

The advancement of search technologies will NOT put you out of a job, as long as you keep your other skills sharp. It will simply make some of your activities easier so you can focus on other things. Hint: perhaps it’s time to brush up on your communication skills.
If the future of search is linking databases for RDF triplet search, then the companies that own or are busy compiling data hold the key to the future of search. Think about which companies gather enormous amounts of social data right now: Twitter, Facebook, and LinkedIn come to mind immediately. Each also has its own way of letting users tag data and associate people with other people, places, and things. These companies are certainly on to something.
Where will quality control for these knowledge databases come from? It cannot rely entirely on crowdsourcing and folksonomy. Just because 1,000 people agree on an incorrect piece of information doesn’t make it true; we all have biases. Perhaps the logic built into semantic technology will provide its own layer of fact-checking.

As always, I am interested in your thoughts on the future of search. Please feel free to add a comment below and let’s keep the conversation moving forward.