Category Archives: Search Engine Reviews

Hi Cuil

When I joined Searchme in 2006, a couple of other startup search engines were making headlines: Powerset, which was recently bought by Microsoft, and Cuil, which launched today.

Cuil has been known as the startup building the huge index. I ran some pet queries and it seems like they’re pretty comprehensive, though other people have emailed me about missing results for their queries. So try it yourself. The UI is mostly text, with a couple of visual elements thrown in, like thumbnails next to results. For the life of me, though, I can’t figure out where they got the thumbnails used for a query for Searchlounge. The text is in columns like a newspaper. They’ve implemented a tabbed query suggestion feature along the top of the SERP, and an Explore by Category feature on the right (strangely, several times when I clicked this I got a no-results page, so something must be buggy).

I’m not going to delve into a relevance analysis, so give it a whirl yourself and see what you think. There are tons of posts already about it, but here’s Techcrunch’s: Cuil Exits Stealth Mode With A Massive Search Engine

Brainboost Officially Launches

Brainboost, the natural language engine, has implemented some changes and re-launched. As regular readers know, I’m a fan of their search technology. I recently interviewed Assaf Rozenblatt, and a while back I also did a review of Brainboost.

Besides the new logo and different color scheme, there are two major relevancy differences that I immediately noticed. The first is that on the home page there is now a whole slew of sample questions to ask. In fact, it looks just like a directory in that it is arranged hierarchically by subject. The subjects themselves are not links, so clicking on History is not possible. But beneath History is a selection of three sample questions. I suppose these sample questions help first-time users know how best to search Brainboost. And they also provide positive examples of the technology for those evaluating it for enterprise search. For me, as someone who already uses Brainboost on a regular basis, the categorized questions didn’t help in any way.

The second big difference that stands out is that on the results pages the first results are Brainboost’s natural language results; OK same as before. But beneath those are regular search results from a selection of other engines. Brainboost gets its results to natural language queries by reformulating queries and sending them against other engines; smart meta-search. It is not exactly clear to me the relation between the two sets of search results, and to be honest I probably will not use the regular search results section much.

A less noticeable thing is that Brainboost now has tips. I did a search and then got this: Brainboost is not a chatbot. It was designed to answer questions which are factual in nature. In case you are wondering what query triggered the tip, it was how do you jumpstart a motorcycle?, since my motorcycle isn’t starting again after being out in the rain and I can’t remember the positive/negative hook ups. The one result for this search is not relevant at all.

For other queries, like what is the population of Scotland? the results were just as good as when I reviewed Brainboost. I will keep playing around with the new Brainboost to see if I can find other differences. If you haven’t used this engine, I recommend giving it a shot. It can definitely come in handy when you are looking for the answer to a question.

Soople

Soople is a simple but powerful idea. It is a search interface that overlays Google so that searchers don’t have to know Google’s syntax in order to take advantage of its advanced search features.

One of the things that frustrates me is that general web search engines, with few exceptions such as MSN’s search builder, have not done much of anything beyond tabs to help users formulate queries, and MSN’s search builder isn’t great at this point. Tabs work to target queries to source types rather than helping to formulate searches. Soople changes all that by laying a very easy to use interface over Google’s bare search box.

On Soople, rather than typing in syntax like define: ontology, there is a search box called Definitions, so that all you have to do is enter the word ontology into that box and it formulates the query and sends it against Google. Now, you may be thinking that define: is not too difficult a thing to remember, but there are about fifteen other things that Soople helps with such as searching within a site, filtering by file type, mathematical equations, related sites, conversions, etc. To see all they offer, check out their overview of all functions.
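To make the idea concrete, here is a minimal sketch of how a labeled box could be turned into Google syntax. I don't know how Soople actually does it, so treat this as an assumption: the field names and the `build_query` helper are hypothetical, and only the operators themselves (`define:`, `site:`, `filetype:`, `related:`) are Google's.

```python
# Hypothetical mapping from a labeled search box to a Google operator.
# The field names are made up for illustration; the operators are Google's.
OPERATORS = {
    "definitions": "define:{value}",
    "site": "{terms} site:{value}",
    "filetype": "{terms} filetype:{value}",
    "related": "related:{value}",
}


def build_query(field, value, terms=""):
    """Turn the contents of one labeled box into a Google query string."""
    template = OPERATORS[field]
    # str.format ignores keyword arguments a template does not use.
    return template.format(value=value.strip(), terms=terms.strip()).strip()


print(build_query("definitions", "ontology"))
print(build_query("filetype", "pdf", terms="ontology"))
```

The user only ever sees a box labeled Definitions or File type; the operator is filled in behind the scenes.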

Something I really like is that Soople has pop-up JavaScript windows that succinctly explain each type of search. That way there’s no confusion for me as a user about what’s happening. They also offer a personalized My Soople interface and a free account takes about 3 seconds to set up. Then you can personalize the My Soople section. I personalized it by including every type of search they offer. Is that really personalization? If a tree falls….oh, never mind, it works for me.

Soople also offers some proprietary search help not available on Google, such as topic searches with advanced features for focusing searches by subject: books, fashion, sports, etc.

The search set functionality lets each user target searches to a selection of sites. In other words, you can create a search set for something like sports by adding sites like ESPN, MLB, Yahoo Sports, etc. Then save that set of sites and run searches against just that group. As Soople’s instructions aptly put it, “This way you can create your own miniwebs to search in.”
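One plausible way to build such a search set on top of Google is to OR together a `site:` operator for each saved site. This is only a sketch under that assumption, not Soople's actual code; the `search_set_query` helper and the domain names are my own placeholders.

```python
def search_set_query(terms, sites):
    """Restrict a query to a saved list of sites with OR-ed site: operators."""
    if not sites:
        return terms
    scope = " OR ".join(f"site:{site}" for site in sites)
    return f"{terms} ({scope})"


# Placeholder domains for the sports example.
sports = ["espn.com", "mlb.com", "sports.yahoo.com"]
print(search_set_query("trade deadline", sports))
```

Saving the list once and reusing it for every query is what makes the set feel like its own miniweb.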

Recently I sent Soople over to a friend of mine whose opinion about search I respect more than anyone else I know. His response was: I gave up years ago and don’t even bother with the advanced operators. I use the single box and simply alter my query words. Works well enough most of the time and I think the masses fall into this camp. I’ll call it Camp Lazy Users. So, if I’ve given up on advanced operators then I really have no need for a Soople type page — it tries to give me stuff I do not use to begin with. Not saying it doesn’t have value, just doesn’t work for me.

My response to him was as follows: In my opinion, presenting a single search box and a few tabs to users and expecting that to satisfy all queries is a lot to ask. It’s difficult enough for me to keep up with the syntax each engine uses, let alone for most users, so I think an interface like this is great.

I believe that “helping” users with queries is one of the things we’ll see web search engines working on in the next year. Right now everyone is convinced that the simple Google type of UI is the be all, end all. But I disagree. Remember Dialog? Searching on Dialog offered some of the most powerful searching I’ve run across and subsequently its syntax was the most difficult to learn. Because of that I hated using Dialog and I never go back to use them because I’ve forgotten how to. And that’s a shame. But we’re going to see the same thing happening to web search engines if they’re not careful. They’re going to throw in all these new search options that hardly anyone will use because they’re so stuck on the clean UI. But it’s not too late. They can still build interfaces, or improve upon the advanced search pages, to help guide us through the search process.

Soople may not change the world of search, but I for one hope it does. For web search engines, who purport to be the interface to the world’s information, why is it that they leave us hanging with a naked search box augmented with tabs? That’s not enough. Although there are advanced search pages to use, and they’re a step in the right direction, they seem to be focused on parameters like language, adult content filtering, and Boolean terminology. All of that is important, no doubt. But where are the easy to use syntactical interfaces (is that the right way to say that?) for things like unit conversions and equations and flight information and time zones and so forth and so on? I hope the Soople idea catches on. Helping users create better queries will result in better search relevancy.

Kartoo

Type of engine: Visual search.

SUMMARY
Relevancy of results: Needs improvement.
Freshness of results: Needs improvement.
Features and functionality: Good.
Quality of help and “about us” pages: Average.
Business model: They sell a variety of search packages that can be reviewed on their solutions page.

INTRO
Remember a few years ago when you were sitting at your desk one slow afternoon and you got an email about a cool new search tool called Kartoo? The sender wrote something like, “You’ve got to check this out. It’s really cool.” And you checked it out and you thought, “Yeah, that is cool.” Remember that day? I certainly do. And then remember what happened next? Every time someone asked you about a cool search tool, you said, “Have you seen Kartoo?”
For years, “cool search engine” has been synonymous with Kartoo. In fact, check out the Yahoo results for cool search engine. What is that in position #1? It’s none other than Kartoo.

But here is the big question: besides calling Kartoo cool, how often do you use it? If you’re like me, the answer is seldom. So I decided to take a closer look at Kartoo and investigate its usefulness beyond the cool factor.

Kartoo launched on April 25, 2002 and since then has gained a reputation as the visual web search engine. On Google, MSN and Yahoo, a search for visual search engine returns Kartoo in the first position. Kartoo offers a unique experience because although they are another meta-search engine, their search is based on their “visual display interface”, as they call it. The visual interface uses Macromedia Flash, though they do offer an HTML version as well.

UI AND FEATURES
Kartoo has a legend for their visual display so that you can identify things such as Sponsored Sites, recently updated pages, clusters, certain file types, and domain types (.org, .net, .com). These are all useful designations but it takes some practice to get used to them. There is a lot to look at on a Kartoo results page, or map as it’s called, and so it’s not easy to distinguish some of the subtle visual clues that are available.

Kartoo offers Boolean and other advanced search syntax. Take a look at their Key Tips page for more about this. They state that by adding a question mark at the end of a query, Kartoo interprets the query as a natural language query. But I tried what is the population of Scotland? and then what year did Kartoo launch? and did not find the results useful. The Kartoo interface is not built for natural language searches because you have to mouse-over each result looking to see if it has your answer. And even doing that I could not find the answers I was looking for.

There is a FAQs page that gives more explanations about their technology. One particular question I found interesting is this one:
Is KartOO technology more pertinent than other search engines?
“It often is but not always… In fact, KartOO technology analyses the words you are asking for and then decides to question the most accurate search engines….As to the notion of relevance: when you ask for the word “ray” for example, you may mean the sea animal or the light device. The results you obtain may therefore be accurate or totally irrelevant to what you are looking for.
What is significant about KartOO in such a situation is that this technology provides a map that summarizes all the various and possible topics so that retrieved sites are in fact grouped into a form of topical “family”. A list, i.e., a linear classification of search results, could not represent all the applications connected to a word like “nuclear” for example, and above all, a list could not display the links existing between the applications.”

In other words, Kartoo’s visual interface acts as a clustering engine because it lets users look horizontally across a broad selection of sites quicker than going through a linear list. As their example states, this can be particularly effective for ambiguous queries where the searcher is trying to understand various meanings of the search term. Though it is my opinion that if someone is searching for info about something like the planet Saturn they will type in Saturn the planet rather than the ambiguous Saturn. But in any case I am a big believer in clustering even though I also think engines can expect searchers to help them out a bit.

The search results are a bit slow, but they distract you by showing a neat looking genie who is deep in thought.

One thing that bothered me is that after I clicked on something in the map (that is to say, the search results page), I could not go back one step. I had to reload the original query, which is really frustrating. When you click on the Next Map link down in the lower right-hand corner there is an option to go back to Previous Map, but there is no such option if you click on a topic. There is a drop-down list of my recent searches, so I can get back that way. However, the interface let me down because my queries were too long: I can’t tell the difference between “raymond chandler” “dashiell hammett” and “raymond chandler” “dashiell hammett” mystery because the words get cut off.

On the search results page if you mouse over the paper looking icons you’ll see the text summary for the site appear to the left. I should mention a small thing, but something I like. When I click on a site, Kartoo counts the number of times I click on it. That’s helpful with a visual interface so that I can quickly see the paths I have already traveled.

RELEVANCY EVALUATION
I’ve been reading, and really enjoying, some classic hard-boiled mysteries by the masters Raymond Chandler and Dashiell Hammett. I wanted to know what influence Hammett had on Chandler. What did Chandler think of Hammett’s stories?

I searched very generally, just using the men’s names, “raymond chandler” “dashiell hammett”. The first thing I did was to click on the two links that figure most prominently in the middle of the map. The first one is a Wal-Mart page selling a book called “Hard Boiled Mystery Writers: Raymond Chandler, Dashiell Hammett, Ross Macdonald.” OK, that is relevant to my search terms, but I was not looking for a shopping site selling a book about the writers. I was looking for information on the web about the writers. It turns out the second link is also a retail site selling the same book, only it’s from AddALL.com and it doesn’t have the three paragraph summary that the Wal-Mart site has.

Two of the results on the map were classified as articles (in other words, there is a yellow line that connects the listings with the word “articles”, implying that both listings are related to that topic), which sounded promising. One was in French and the other was from High Beam, but to view it I needed to sign up with them.

To summarize the remainder of my experience for this first search, each site I clicked on was either a shopping site selling the book I mentioned earlier or a page in French, with one exception: there was a detailed bibliography of Raymond Chandler that includes this nice quote, “Dashiell Hammett may have shown how mean those streets could be, but Raymond Chandler imagined a man who could go down those streets who was not himself mean.” Not exactly a detailed comparison, but a good quote nonetheless.

Over on the left side of the page is a list of twenty related topics. Normally I would acknowledge that my query isn’t very good and would refine or adjust it accordingly. But in this case I will use Kartoo’s related topics as my refinement.

Here are some of them:
Hardboiled mystery writers, Ross MacDonald, Auteur, Roman, Article, Amazon, Fiction, Hard, Library, Mystery, Writers, Book, Matthew

I clicked on hardboiled mystery writers. Doing so creates a new set of topics, some of which are good, such as detective fiction. Some of which are not so good, like isbn and featured.
I noticed that there are different Amazon country sites showing up, such as .UK and .CA. There are also other shopping sites like Overstock. Along with these shopping sites there are a couple of decent sites, such as a Dashiell Hammett bibliography from a fan site that lists four books about the two writers.

I also noticed that even though I limited my search to English pages only, there were still French results.

Not having much luck with my Chandler and Hammett query, it is time to try a whole different user mission. Lately there has been a lot of news here in California about the resignation of the Secretary of State. So I decided it would be a good idea to learn more about just what exactly that position entails.
I searched for California secretary of state job functions. A cursory glance through the results shows a variety of suggested topics that are not quite relevant. There is logistique supply chain, California whitewater, vacation, features, and so forth. None of the related topics offered me anything useful.

Turning to the site results, there is a site about long distance phone rates, another about whitewater rafting, and a travel site. There is also a report written by the former Secretary of State in 2000. As with the related topics, none of these results are helpful to me.

I clicked on the next map and was shown some job listings, a hotel site, a vacation home rental site and so forth. Again, nothing to help me understand the job functions of this position.

I refined my search and entered in California secretary of state responsibilities. Again there are vacation rental listings, a cat breeder site, a computer store, insurance company, long distance phone service, and so forth. Again, nothing close to my user mission.

CONCLUSION
So what is going on here? I am seeing some obvious issues. One is that shopping sites are being boosted high in results. Although I can see why some shopping sites would be returned for my Chandler/Hammett search, these should have been a relatively small percentage of the result set. And for my Secretary of State search there should have been few if any commercial sites. It seems like the word California created a slew of false positives which would explain the vacation rental and whitewater rafting sites.

Another area I see for improvement is the related topics. By way of comparison, I tried California Secretary of State responsibilities on Clusty and got some relevant clusters, such as Kevin Shelley and Office of the California Secretary of State. And in the list of results is this helpful site called State Executive Branch Overview that has a section called What are the Duties of the Secretary of State?

And lastly, the results are not fresh. There were too many results from several years ago appearing in my maps. I realize that Kartoo is a meta-search engine so they are relying on external indexes, but they should still be able to improve upon relevancy, topics and freshness of results from the engines they are pulling from.

So yes, there is no doubt that Kartoo is cool looking, but it really needs to create better topics and to return more relevant sites in order for it to be useful. Right now I consider Kartoo a novelty with great potential more than a really useful search tool. It may be that the Kartoo.com web search is just their way of getting attention to the search solutions that they are selling, but if that is the case I think it is all the more reason to improve upon the web search part of their business.

Gigablast


Type of engine: General web search.

SUMMARY
Relevancy of results: Needs improvement.
Freshness of results: Very good, and not only are the results fresh but the index date is listed next to each result.
Features and functionality: Very straightforward and easy to use.
Quality of help and “about us” pages: The Help sections are OK, but I learned more about using Gigablast from reading articles on other sites about it than by reading their help sections.
Business model: Selling ancillary search services, such as enterprise search. No sponsored links, banner ads, or any other form of advertising presented to the user.

INTRO
I’ve been wanting to do a review of Gigablast for a while because the story of Gigablast is inspiring. Gigablast was written by former Infoseek engineer Matt Wells. Matt has a blog of sorts, but he has not updated it since February, 2004. The blog mostly covers Gigablast, but also has notes about other non-Gigablast issues. I like his candidness, such as “Alright you SEO people. Get your bots off my search results.”
The thing I really like about Gigablast is that it knows what its business is, namely general web search, and it simply goes about doing that. There are not half a dozen search tabs or other distractions. Although they do offer some other services, as I will mention later, you can go to Gigablast and it is as obvious as night and day what you can do there. You go to Gigablast, you enter a search and that’s that.

Right on the front page they list how many sites have been indexed. As of January 12, 2005 that number is 1,014,363,952. Over the past four days the number did not change, but I know it was not too long ago that the number was half what it is now.

UI & FEATURES and QUERY EXAMPLES
Really the only feature that users see on a regular basis is Giga Bits, and so I have decided to write about features and queries together. Giga Bits are related terms that appear at the top of every search result page. They append the original search rather than replacing it. Giga Bits can sometimes be used to find answers to natural language queries. For example, for What is the capital of Sweden?, the first Giga Bit result is Stockholm. I happen to know that answer is correct, but if you did not know that you might not pick up on the fact that the Giga Bit term can actually be the answer. So although it is a nice bonus, the way it is implemented right now will not be clear to many users. After clicking on the Giga Bits suggestion of Stockholm, the first answer is in Swedish and there is nowhere to set user preferences for things like language.

Giga Bits is a nice idea, but the current implementation is confusing and it took me a while to figure it out. I am still not clear on what the percentages next to each term mean so I tend to ignore them. I think engines do themselves a disservice by naming things with cute and clever names like Giga Bits. How about just calling it something so that users immediately know what they are seeing? Like related terms or refine your query, etc.

As I mentioned earlier, one really cool feature Gigablast has is that it tells you when each page was indexed. Another very nice thing is that there are a few options next to each search result. You can click on older copies to go to the Internet Archive (see my Wayback Machine review) and see archived versions of the site. You can also look at cached archive copies, and a stripped version that takes out images and just leaves text. These are all very nice.

Gigablast also has something that used to be common, but you don’t see so much anymore. And that is an easy link to other search engines’ results. At the bottom of each search page you will see: Try your search on google yahoo alltheweb dmoz alta vista teoma wisenut.

As a side note, Gigablast used to default to OR searches, but now defaults to AND. Seems wise to me, particularly now that their index size is growing.
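Why AND makes more sense as an index grows is easy to see with a toy inverted index. This is a sketch only: the document IDs and the `search` helper are invented for illustration and say nothing about how Gigablast is actually built.

```python
# Toy inverted index: each term maps to the set of document IDs containing it.
# The IDs are made up for illustration.
index = {
    "gigablast": {1, 2, 3},
    "business": {2, 3, 4, 5},
    "model": {3, 5, 6},
}


def search(terms, mode="AND"):
    """Return matching document IDs under AND (intersection) or OR (union)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if mode == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


query = ["gigablast", "business", "model"]
print(sorted(search(query, "AND")))  # only documents containing every term
print(sorted(search(query, "OR")))   # documents containing any term
```

With OR, every added page that mentions any one of the words lands in the result set, so the pile of partial matches grows along with the index. AND keeps only the pages that contain everything the searcher typed.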

Time to try another query, such as How does Gigablast make money?.
The first listing is a meta-search result page with a list of work at home businesses (MLMs), a few of which were pulled from Gigablast. Not so good. I looked through all ten of the results on Gigablast’s first page of results and none of them were relevant. Results 8, 9, and 10 are all the same entry from John Battelle’s blog and the only place the word “Gigablast” appears is in his list of search companies in the menu section of the page.

But the first Giga Bit is intriguing: “arrangement with Google”. So, by using that modifier, we get how does gigablast make money? “arrangement with Google”. The first two results are duplicates, as in the same content and same title, but one is hosted by Resource Shelf and the other by Free Pint. A tough thing for engines to detect, but nonetheless a bad user experience because they offer no differentiating content. And plus, the content is not relevant. The site mentions Gigablast and it mentions Google, and I thought maybe I’d find some juicy bit about the two companies working together, but nope, nothing like that. Aggh, but it keeps getting worse because not only listings 1 and 2, but all of 1 – 7 are actually the same site, and they all have the same title too. The fourth result is a TinyURL which probably should not be indexed in the first place. #7 is a redirect that takes you right to the same site. For #8 the page discusses both Gigablast and Google, but offers no insights into them having any kind of agreement. Afraid I was led on a wild goose chase.

I also noticed that on the top of the results page it says, “Results 1 to 8 of about 113,” but it is interesting the way they have done this. Because at the bottom of the page it says, “No more results found. Show relevant partial matches for your query.” But when I click on that I get “Results 9 to 18 of about 231.” So now I am a confused user. When it said 8 of 113 I thought that the 8 were AND matches and the remainder were partial matches, however when I clicked on the partial match link I got 231 results. What happened to 113? Where did that go? Not a huge deal, especially since I think other engines get these numbers wrong sometimes too, but it is distracting nonetheless.

For the next Giga Bit refinement, how does gigablast make money? “Matt Wells”, the first result is a Search Engine Watch interview with him from September, 2003, and it has a great answer to my question:

Q. From a business perspective, Gigablast carries no advertising? Is this a decision you plan to keep? How does Gigablast make money?

A: Money is derived from selling search services on my products page. At this point I don’t think I’ll put up advertisements unless I need the revenue to support Gigablast or myself.

Sometimes, the Giga Bit suggestions are pretty random, like how does gigablast make money? “going to make decisions”. There are four results to this strange query. Three of them are the same John Battelle posting and the fourth is from Geeking with Greg Linden. Greg’s posting is indeed about Gigablast, but not about how they make money. There are other posts about other companies like Google and Technorati making money, but not about Gigablast making money.

Some comments about these duplicate and non-relevant results. It seems that Gigablast is indexing and keeping multiple snapshots of the same site. For the above example, the duplicate listing was indexed in September, November and then in December. Although the site may have been updated during those months, the page is the same page. There is only one page. The difference is that the URLs are not being normalized. So here is what two of the three indexed URLs look like:

http://www.battellemedia.com/archives/000627.php

http://battellemedia.com/archives/000627.php

As you can tell, they are nearly identical except that one has WWW. In certain rare cases these could actually be different, but in this case they are the same. The third instance of this site is this:

http://www.snipurl.com/62pj

I think Gigablast could do a couple of things to resolve this situation. If they have the bandwidth they could compare page content and if there is enough overlap, and the URL is nearly identical, they could conclude it is the same page and only show it once, or at least offer it as a cluster. I understand though that site content comparisons are tough and the devil is in the details. The other thing that might work is to compare the display text for the results. For all three of them, the displayed text is exactly the same. That should be a dead giveaway. And the last thing is to not index Tiny URLs and Snip URLs at all. Although these are handy features for capturing long URLs, I think engines should follow them through and only gather the original URL. Plus, in this case, the original URLs are not even that long.
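Here is a rough sketch of two of those suggestions, normalizing the URL and comparing the display text. It is my own illustration under stated assumptions, not anything Gigablast has published: `normalize_url` and `dedupe` are hypothetical helpers, and the snippet text is a placeholder.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Lowercase the host, drop a leading 'www.', and trim a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def dedupe(results):
    """Keep the first result per normalized URL or identical display text."""
    seen_urls, seen_snippets, unique = set(), set(), []
    for url, snippet in results:
        key = normalize_url(url)
        text = " ".join(snippet.lower().split())
        if key in seen_urls or text in seen_snippets:
            continue
        seen_urls.add(key)
        seen_snippets.add(text)
        unique.append((url, snippet))
    return unique


# Placeholder snippet text; the URLs are the three from the example above.
results = [
    ("http://www.battellemedia.com/archives/000627.php", "Same display text"),
    ("http://battellemedia.com/archives/000627.php", "Same display text"),
    ("http://www.snipurl.com/62pj", "Same display text"),
]
print(dedupe(results))
```

The first two URLs collapse on the normalized key alone, and the Snip URL is caught only by the identical display text. Treating the www and non-www hosts as the same is a heuristic that will be wrong in the rare cases where they really differ, so a real engine would probably want to confirm with a content comparison before dropping a result.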

The interesting thing about this de-duplicating business is that for their XML feeds Gigablast offers the option of de-duplicating results. However, I could not find a way to add the parameter to a regular search string. If I am missing something here, please let me know because it seems odd that for RSS searching this functionality is implemented, but not for regular web searching.

I also wanted to modify my query a bit more, so I tried Gigablast business model.
Result #1 is a collection of entries on a search engine blog. There is an entry about Gigablast, but the term “business model” pertains to another entry about a different engine. Same with #s 2, 3 and 10, which are all duplicates of each other. #8 and #9 are a different set, but same problems. Worth noting is that both sets of duplicates are from Resource Shelf. Why is that? The results are not relevant and they are also duplicates of each other. But then I hit gold with #6, well not exactly gold, but a direct path to gold. It is a SiteLines posting by Rita Vine referring readers to an article by Gwen Harris from July/August, 2004. And here is exactly the information I have been looking for:

At present Wells runs Gigablast without any keyword-activated advertising: there are no banner ads and no sponsored links. This isn’t a matter of principle — it’s just that advertisements slow down the query response time.
Income comes from selling the technology. As Wells explains, he has “built Gigablast to be more efficient than the other engines” to save on time and hardware. Webmasters will be interested in his product line for creating indexes or integrating Gigablast web search results.

So, great I got my answer. But what happened that Gigablast did not take me right to the info I needed? So close, but instead of taking me right to the answer to my question it took me to a site that linked to the answer. The reason in this instance is quite straightforward. Nowhere in Gwen Harris’ article does she use the words “business” or “model”. But fortunately the referring site used the phrase “business model”. I don’t know what Gigablast could do better in a situation like this. If I had continued to modify my query I eventually would have gotten directly to it because I did find the site in Gigablast’s index.

CONCLUSION
I am a fan of Gigablast. I have a lot of respect for what Matt Wells has done. He set out on his own to write a new search engine and he did just that. He scaled the technology so that he can index a billion pages with less hardware and financial investment than the big engines, though of course he still has a ways to go to catch up with the big boys. It seems the crawling and indexing parts of Gigablast are its main focus and strengths. Unfortunately, it is lagging right now in its algorithm and search results.

As with many engines, Gigablast has created XML feeds, but I would rather see them improving their results before adding extras like this. This happens at other engines as well. Even MSN has been building RSS feeds and their relevance needs some work too. (See my earlier MSN review.) I suppose the reason engines are doing this is because it is easier to implement RSS feeds than to improve the elusive beast called relevancy. Plus it may be that different engineers work on each of these, so adding XML feeds in no way impedes the progress of other initiatives like relevancy. Whatever the case may be, things like RSS search feeds are great, I love them and subscribe to several from Technorati and Google, but they are used by the few, not the many, and they are only as useful as the quality of the engine’s results.

What I saw in Gigablast’s results was the following:
• Many duplicate sets
• Many results from one or two sources
• Poor matching for multiple terms. In other words, Gigablast noticed the words in my query on a page, but the words were spread out and not in proximity to each other.

With some improvements to things like this, Gigablast can be a very nice homegrown engine and alternative to the big fellas.
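The third point, proximity, is the easiest one to illustrate. One simple signal an engine could use is the smallest window of words that contains every query term: the tighter the window, the more likely the page is really about the phrase. This is a sketch of that idea only, with a hypothetical `min_window` helper; I have no idea what Gigablast's ranking actually does.

```python
def min_window(text, terms):
    """Return the smallest number of words spanning all terms, or None."""
    words = text.lower().split()  # naive tokenizer: punctuation is not stripped
    wanted = {term.lower() for term in terms}
    last_seen = {}
    best = None
    for position, word in enumerate(words):
        if word in wanted:
            last_seen[word] = position
            if len(last_seen) == len(wanted):
                # Window from the oldest "most recent" term to this word.
                span = position - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


close = "gigablast business model explained"
spread = "business news about gigablast and another engine model"
print(min_window(close, ["business", "model"]))
print(min_window(spread, ["business", "model"]))
```

A page where the words sit side by side scores a window of 2, while a page where “business” belongs to one blog entry and “model” to another scores much wider, which is exactly the situation I ran into with the Gigablast business model query.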
