Category Archives: Natural Language Engines

Brainboost Officially Launches

Brainboost, the natural language engine, has implemented some changes and re-launched. As regular readers know, I’m a fan of their search technology. I recently interviewed Assaf Rozenblatt, and a while back I also did a review of Brainboost.

Besides the new logo and different color scheme, there are two major relevancy differences that I immediately noticed. The first is that on the home page there is now a whole slew of sample questions to ask. In fact, it looks just like a directory in that it is arranged hierarchically by subject. The subjects themselves are not links, so clicking on History is not possible. But beneath History is a selection of three sample questions. I suppose these sample questions help first-time users know how best to search Brainboost. And they also provide positive examples of the technology for those evaluating it for enterprise search. For me, as someone who already uses Brainboost on a regular basis, the categorized questions didn’t help in any way.

The second big difference that stands out is on the results pages: the first results are Brainboost’s natural language results, same as before. But beneath those are regular search results from a selection of other engines. Brainboost gets its results to natural language queries by reformulating them and sending them against other engines; smart meta-search. The relation between the two sets of results is not exactly clear to me, and to be honest I probably will not use the regular search results section much.

A less noticeable change is that Brainboost now has tips. I did a search and got this: “Brainboost is not a chatbot. It was designed to answer questions which are factual in nature.” In case you are wondering what query triggered the tip, it was how do you jumpstart a motorcycle?, since my motorcycle isn’t starting again after being out in the rain and I can’t remember the positive/negative hookups. The one result for this search is not relevant at all.

For other queries, like what is the population of Scotland? the results were just as good as when I reviewed Brainboost. I will keep playing around with the new Brainboost to see if I can find other differences. If you haven’t used this engine, I recommend giving it a shot. It can definitely come in handy when you are looking for the answer to a question.

BrainBoost – Interview with Founder Assaf Rozenblatt

BrainBoost is a natural language search engine. Ask BrainBoost questions in plain English and you’ll get answers in plain English. BrainBoost is automated and uses no human editorial intervention. The legend goes that BrainBoost was created by 24-year-old software programmer Assaf Rozenblatt. It took him a year to build it, and he built it so that his fiancée could better do her college research.

For more information and analysis, check out the review of BrainBoost I did for the Search Lounge.

Hi Assaf, thank you so much for joining me here at the Search Lounge. I know you started BrainBoost, but what exactly is your role these days? And can you provide some more background about the size and structure of the company?

We are a very small team at the moment, with only a handful of developers.
We are still primarily focused on development, but we will be switching gears soon to the sales and marketing of our licensable AnswerRank technology.
I am still very hands-on with the software development and help improve the technology on an ongoing basis.

A big issue in Internet search is evaluating the trustworthiness of sources. This issue is amplified in BrainBoost because the answers are shown right on the search results page and do not require users to click through to investigate the trustworthiness of the source. For example, for the search what is the population of Scotland?, the first three answers are slightly different (5.2 million, 5.1 million, and just over 5 million; like I said, just a slight difference). Would it help if you included a published/crawled date? Or some kind of page rank metric? Do you have any suggestions for how BrainBoost users should address this issue?

We are currently working on a PageRank-like system to help identify trustworthy sources.

How do you evaluate the relevancy and quality of the results that are returned on BrainBoost? Do you have a formal process in place for doing this? And, what subjects or types of queries do you think BrainBoost is particularly good at? How about subjects or types of queries that need some improvement?

For QA, we compiled a database of common questions and manually researched the answers for each of them. We then run the questions through the BrainBoost engine, which in turn automatically goes out to find answers. Precision is then easily determined by checking what percentage of the automatically generated BrainBoost answers match our manually found answers.
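The evaluation Assaf describes amounts to a standard precision measurement over a gold-answer set. Here is a rough sketch of that bookkeeping; this is my own illustration, not BrainBoost’s actual QA harness, and every name in it is hypothetical:

```python
# Sketch of the precision check described above: run each benchmark question
# through the engine and compare its top answer against a hand-researched one.

def normalize(answer: str) -> str:
    """Lowercase and drop punctuation so minor formatting differences match."""
    return "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace()).strip()

def precision(engine_answers: dict[str, str], gold_answers: dict[str, str]) -> float:
    """Fraction of questions whose engine answer matches the researched answer."""
    matches = sum(
        1
        for question, gold in gold_answers.items()
        if normalize(engine_answers.get(question, "")) == normalize(gold)
    )
    return matches / len(gold_answers)

gold = {"What is the population of Scotland?": "5.1 million"}
found = {"What is the population of Scotland?": "5.1 Million."}
print(precision(found, gold))  # 1.0
```

Real answer matching would need fuzzier comparison (treating “5.1 million” and “just over 5 million” as agreeing, say), but the basic arithmetic is this simple.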

There really isn’t a question type that is problematic for us at this time.

BrainBoost is 100% automated, but would you consider blending BrainBoost’s technology with some editorial content or mapping of results for certain types of queries?

Extracting answers from unstructured documents is what really sets us apart from existing ‘Answer Engines’ like Ask Jeeves and the new MSN search. It’s a much trickier problem to solve, and we are going to continue focusing on it for the time being.

Can you provide any insight into how BrainBoost reformulates a query when it sends it to another engine? Any chance you might be willing to provide an example of how this works?

Query reformulation helps ensure search engines return web pages that most likely contain answers somewhere within them. A simple example: “what does NASA stand for” gets reformulated into “NASA stands for”. This simple reordering of words (and the conjugation of the verb) greatly boosts the likelihood that relevant documents are returned by the engines. With larger and especially multipart questions this can get very complicated.
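To make the idea concrete, here is a toy version of rule-based reformulation along the lines of the NASA example. The pattern list is purely my own illustration; BrainBoost’s real rewriting rules are surely far more extensive:

```python
import re

# Rewrite a question into the declarative phrase an answer page would contain,
# e.g. "what does NASA stand for" -> "NASA stands for".
REWRITE_RULES = [
    (re.compile(r"^what does (.+) stand for\??$", re.IGNORECASE), r"\1 stands for"),
    (re.compile(r"^what is the population of (.+?)\??$", re.IGNORECASE),
     r"the population of \1 is"),
]

def reformulate(question: str) -> str:
    """Apply the first matching rewrite rule, or pass the question through."""
    for pattern, template in REWRITE_RULES:
        if pattern.match(question):
            return pattern.sub(template, question)
    return question  # no rule matched: send the question unchanged

print(reformulate("what does NASA stand for?"))  # NASA stands for
```

Multipart questions would need parsing rather than pattern matching, which is presumably where it gets “very complicated.”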

There’s something I don’t quite understand about BrainBoost. I enter a search on BB; BB reformulates my query and sends the new query against other engines; the other engines provide results; BB gathers those results and ranks them. OK, so here’s the question: how does BB take a result from another engine and then show a different description (and title?) than what I would see on the other engine? Or am I missing a piece of the puzzle?

BrainBoost does not just display the results it gathers from other engines. It merely uses those results as its starting point. The core technology of BrainBoost is a system we call AnswerRank. The AnswerRank system is given a question and a collection of documents. AnswerRank then analyzes the documents line by line and automatically extracts the very best answers from those documents. The top few hundred search results from the popular engines are what we feed into AnswerRank. BrainBoost begins where the search engines leave off.
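In other words, the pipeline is: fetch the top results, split each document into lines, score every line against the question, and surface the best candidates. A minimal sketch, with naive keyword-overlap scoring standing in for the proprietary AnswerRank scoring:

```python
# Toy line-by-line answer extraction in the spirit of the pipeline above.

def score(question: str, sentence: str) -> int:
    """Count question words (minus stopwords) that appear in the sentence."""
    stop = {"what", "is", "the", "of", "a", "an", "does"}
    q_words = {w.lower().strip("?") for w in question.split()} - stop
    s_words = {w.lower().strip(".,") for w in sentence.split()}
    return len(q_words & s_words)

def extract_answers(question: str, documents: list[str], top_n: int = 3) -> list[str]:
    """Rank every line of every document and return the best candidates."""
    candidates = [line.strip() for doc in documents
                  for line in doc.splitlines() if line.strip()]
    candidates.sort(key=lambda line: score(question, line), reverse=True)
    return candidates[:top_n]

docs = [
    "Scotland is famous for its lochs.\nThe population of Scotland is 5.1 million.",
    "Edinburgh is the capital of Scotland.",
]
print(extract_answers("What is the population of Scotland?", docs, top_n=1))
# ['The population of Scotland is 5.1 million.']
```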

Does BrainBoost give a higher weight to certain sources? How about results from certain engines?

No, not at this time. All sources begin processing with an equal weight.

I’ve noticed that it matters if I don’t format my search like a question. Compare these two queries: population of Scotland vs. what is the population of Scotland?. Is that done on purpose?

BrainBoost pays close attention to all words in the question. The type of words you use and the order in which you use them determine which classification, or algorithm, BrainBoost will use to answer your question. Whereas most search engines ignore words like ‘what’, ‘where’, ‘when’ and ‘how’, BrainBoost very much relies on them. In this case, the wording of the two questions resulted in two distinct classifications.

Sometimes I see repeat phrases being displayed. For example, for the query what is BrainBoost?, the following phrase is repeated several times:
“BrainBoost is a Question Answering search engine.”
This probably is not too big a deal, and in fact it may even be a good thing because it shows agreement, but what is your opinion about it?

We chose not to filter out answers that provide the same information in slightly different ways. Like you said, it really does help with identifying agreement towards a specific answer.

I read some helpful information you posted about BrainBoost in a thread on Search Guild. You wrote: “BrainBoost classifies incoming questions into distinct categories. Classification enables BrainBoost to predict what lexical properties the answer will most likely contain.” Can you expound on this? Do you classify searches based on the subject or topic of the search? Or do you parse the query to look for clues in the phrasing of the search? Or…?

It’s best to give an example: when asked “how long do cats live?”, BrainBoost recognizes that the user is looking for sentences that quantify the answer in terms of years/months/weeks, etc. Responding with an answer that talks about inches/feet/centimeters would not be very intelligent at all. BrainBoost has many dozens of these types of classifications, all of which help ensure suitable answers are returned.

It seems like I hear very little about BrainBoost. Are you purposefully trying to keep a low profile? Or might that change in the future? I like BrainBoost and since it is so easy to use I think a lot of other people would like it too.

Yes, we have been trying to keep a low profile. It’s given us the luxury of time we needed to perfect our AnswerRank system.

A considerable amount of time was also spent on packaging AnswerRank technology into a licensable software component that can be ‘plugged into’ any existing keyword-based search system, allowing companies to add Question Answering to their existing in-house search.

What do you see as the current state of natural search engines on the web? Would you care to predict for us what the world of natural search will look like a couple years from now?

I think Natural Language question answering mixed with sophisticated personalization is the future of search.

Lastly, what is your favorite drink?

Triple Grande Latte

Assaf, thank you for your time. Is there anything else you would like to add?

Thanks for your time Chris.


BrainBoost Review

Type of Engine: Natural language.
Overall: Very Good.
If this engine were a drink it would be…a Jack and Ginger. An old-time favorite search type that tastes refreshing after not being tried for a long time.

Don’t be annoyed that there’s no space between the words Brain and Boost. Instead, go ahead and ask BrainBoost questions in plain English and you’ll get answers in plain English. BrainBoost boasts that it’s completely automated and uses no human editorial intervention. I guess that’s impressive since it’s pretty good as is, but I still think any engine can only get better if editors are used in some capacity. And just because Ask Jeeves uses editors, don’t let that fool you. It’s common sense to me that a successful combination of the two approaches would be best for relevance; but I digress.

The legend goes that BrainBoost was created by 24-year-old software programmer Assaf Rozenblatt. It took him a year to build it, and he built it so that his fiancée could better do her college research. (And all I gave my wife was this search engine review blog. Ouch.)

BrainBoost is honest. What does that mean? It means that when it doesn’t know the answer it doesn’t pretend it does know. Usually. Of course it’s not perfect and you get your share of false positive matches, but generally speaking it’s solid.

UI and Features
The Snap Open feature is cool because it opens to the relevant part of the listing; think of an anchor tag where it takes you right to the text on the page that answers your question.
Otherwise it’s all pretty straightforward.

Query Examples
I wanted to know Tony Gwynn’s lifetime average, so I asked:
what was tony gwynn’s lifetime average? and I got no results. Strange, since that should be a relatively easy one.
So I refined my query: what was tony gwynn’s lifetime batting average? This time I got 2 results, and the second one has the answer displayed right there on the search results page: .339 (turns out from other results I looked at that it was actually .338, but that’s certainly not BrainBoost’s fault). I didn’t even have to click to the site. Now obviously that has potential repercussions for all the engines that make $ by driving traffic to sites. But for now I’ll stay out of the financial fray.

For my next search I wanted to find out how much an annual subscription to Smithsonian magazine costs. So I queried:
how much is a year’s subscription to smithsonian magazine? I got 2 results that both had to do with getting a subscription as part of donating to an organization. Not good.
A little refinement was in order:
what is the annual cost of smithsonian magazine? returned no results.
Better to try again, but this time I gamed the system by using a keyword-based phrase query instead of natural language. subscription to Smithsonian returned 6 results, 5 of which answered my question. Though with this one I had to click through to the actual results (man, that sounds lazy) to see the answer, because BrainBoost’s display text didn’t show it. That’s par for the course with search engines, but I was hoping BrainBoost would display the answer right up front for this query. And in case you’re wondering, how much is a subscription to Smithsonian? had a very similar result set, though it missed one of the results from my previous query and the display text for the same results was different. But the point is I was able to query this by keywords and by natural language and get good, though slightly different, results.

And of course everyone enjoys a good laugh at the expense of natural language engines, just like we all enjoy laughing at translation engines. So here’s a good one: what is a sous chef? The first result:
“The sous chef is legal. He’s an American.”
But then a few results down is this great display text:
“A sous chef is a chef ranking above line cooks and below an executive chef or chef de cuisine.”
Good stuff. I’m a fan and will use BrainBoost when that nagging trivia question hits me, like what’s the population of Wales? You’ve got to see that one for yourself, because it’s really good.