Monthly Archives: November 2004


Exalead

Type of engine:
General web search with integrated browsing capability.
Overall: Average.*
If this engine were a drink it would be…a French Kiss. It’s French, it has lots of ingredients, and if you take your time with it, it’ll get you where you need to go.

SUMMARY
Relevancy of results:
Average.
*There are good sites in there, sometimes you just have to click around too much to find them. This score is what brings their overall rating down.
Freshness of results: Very good.
I was impressed. Of course I’ll need to check back over time since they just went live and so I assume the index was created recently. They advertise real-time indexing, but I’m assuming (though I don’t know) that it’s just for a targeted subset like news articles.
Breadth of results: Very good.
Not only is the breadth good, but the related terms and categories provide good access points.
Features and functionality: Very good.
Quality of help and “about us” pages: Very good.
Includes keyboard shortcuts and explanations of special features like phonetic search, which I’ll discuss a bit more later.
Business model: Exalead makes its money by selling enterprise search. The web version of their search is a showcase for their technology. As far as I could tell there were no sponsored links.

INTRO
Exalead is a French company that has been around since 2000, but just recently went live with a beta web search product. Their main line of business is enterprise search, but their web search is a nice way to attract attention. Although I’d heard about them, I hadn’t used Exalead until this review.

They claim they have indexed one billion pages and have plans to increase the size. And hey, a billion pages isn’t too shabby a starting point.

UI & FEATURES
The front door is very sleek and minimalist with its aquamarine and silvery gray colors, but do a search and you’ll be presented with a lot of information. Although it is a lot of info, it’s very well organized and you’ll be getting around Exalead like a pro in no time. They do a nice job of keeping the focus on the site results in the main, central column of the page. The ancillary, though still very useful, stuff is on the left and right sides. Let’s take a walk through it all…

Thumbnails appear on the right side, next to the sites. I’m on a laptop so the images are pretty small, but on a larger monitor I’m pretty sure they’d be clearly visible. Thumbnail images are getting more popular and I think they do have value. However, going back to A9, I’d like to see more engines adopting an easily customizable interface so that I can include or exclude extras like thumbnails. If you click on the thumbnail it loads the site in the bottom half of the search result page. A nice feature is that the search terms you entered will be highlighted on the page. You can also bookmark the result to access it again the next time you’re using Exalead. This is a feature some of the big engines are using, but I’m not a convert yet. Between my browser’s bookmarks, my C drive, my RSS reader, and the online bookmark program I use, I’m not sure I need another set of bookmarks on a search engine. But who knows, maybe someday I’ll be convinced that I want to save at the search level.

Moving to the left side of the page, there are several things to see. For related terms, you can click on the square next to a term and it’ll cross the term out. It’s a nice way to track movement, especially for someone like me who is constantly refining queries and trying different things. It gets confusing to remember what I’ve already done. As far as I could tell, the related terms refine within results rather than sending out a brand new query.

Related categories pull relevant ODP categories. Exalead also displays the breadcrumb trail beneath each site result when it’s available. Thank you! I’ll never understand why the major engines moved away from doing that. There are several excellent web directories and if you map results to a directory it can only help users. The naysayers love to say that only 5% of people browse, but that’s misleading because a) people don’t browse if the interface isn’t done well, as it is with Exalead; b) for those of us who do browse it is extremely valuable; c) let’s run the numbers. Let’s pick a number that represents how many English speakers search the internet each day. How about 250 million? If 5% of those people browse that means that 12.5 million people browse. Obviously I’m making these numbers up, but you get the point. The number of people who like to browse categories is still in the millions, and that’s even with category browse not being promoted by Google.
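For anyone who wants to rerun that back-of-the-envelope math with their own guesses, here is a minimal sketch. Both inputs are the made-up figures from the paragraph above, and the variable names are purely illustrative.

```python
# Made-up inputs from the text: 250 million daily searchers, 5% of whom browse.
daily_searchers = 250_000_000
browse_percent = 5

# Integer arithmetic keeps the result exact.
browsers = daily_searchers * browse_percent // 100

print(f"{browsers:,} people browse categories each day")
```

Swap in any other estimate for the two inputs; even at 1% the count stays in the millions.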

Getting back to the features, you can limit results to audio or video. Very nice because a direct link to the file itself is actually returned so you don’t have to click to the page and then have to find the file on the page. This is conceptually similar, though implemented differently, to what BrainBoost does when they “snap open” to the section of the page that is relevant. I like this direct targeting of information and getting users as close to their goal as possible.

Exalead’s Advanced search is quite nice. You can limit by country, language, file format, title, and date. In the search method field there are some interesting choices: automatic word stemming, phonetic search, and approximate spelling. You can also set these on the preferences page.

QUERY EXAMPLES
I wanted to start with a relatively easy query, just to get a feel for Exalead’s interface: wes anderson. The results were all very relevant, but like I said, this is a give-me query. I like the way the results page looks. There’s a lot to see, but it’s easy to understand. If you click on the folder icon it will open a new page with results just from that site; sub-pages, in other words. The related terms were good, things like names of actors who have been in Anderson’s movies, other directors’ names, and movies that are similar to Anderson’s films, like I Heart Huckabees. The related categories had Rushmore and The Royal Tenenbaums. It also had Indiana Sports and Recreation, which wasn’t relevant to me.

I played around with the phonetic search for wess andersen, but all the results on the first page included the words wess andersen, and didn’t deviate from the way I spelled it. I also tried wess andersen with the approximate spelling search method. This did return some sites with the term wes, but no variations on andersen. I’m not sure how they’re building their phonetic search algorithm, but it’s a nice feature to have for those occasional things, like names, that you know how to pronounce but not spell. I also tried to search with the phonetic search for information about the German soccer player Karl-Heinz Rummenigge, pretending I didn’t know how to spell his name. I tried karl heinz rumineger, karl heinz ruminiger, and karl heinz rumminiger, but got zero results for any of them. Not so good.
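To show what a phonetic match could look like in principle, here is a minimal sketch of Soundex, a classic phonetic coding scheme. This is an assumption for illustration only: nothing in this review says Exalead uses Soundex, and I don't know what their algorithm actually is. The function name `soundex` is mine.

```python
# Soundex digit groups: letters that sound alike share a digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex code for a single word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    digits = []
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of same-coded letters
        code = CODES.get(ch, "")  # vowels map to "" and reset the run
        if code and code != prev:
            digits.append(code)
        prev = code
    # Keep the first letter, then pad or truncate to three digits.
    return (word[0].upper() + "".join(digits) + "000")[:4]


for guess in ("rumineger", "ruminiger", "rumminiger"):
    print(guess, soundex(guess) == soundex("rummenigge"))
```

Under this scheme wess and wes share a code, as do andersen and anderson, and all three of my Rummenigge misspellings collapse to the same code as the correct spelling. So a Soundex-style index would at least have had a chance of matching them.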

Let’s move on to a query that’s a bit tougher: 2004 world series winner returned NPR.org as the first result. Probably when Exalead did their last crawl the NPR homepage had news about the World Series. The other results on page one weren’t so good. I can tell why each of them was returned, but none of them were about the Red Sox’ win. There was a site about the Little League World Series, one about the Poker World Series, and then a few, like the World Conservation Union, that were not relevant at all. However, all is not lost. Lurking there on the left side of the page was a list of related terms that looked helpful. I clicked on the first one, series winner, and a bunch more sites about poker were returned, plus a couple of other random topics. But no MLB. The next related term, world series winner, did better for me. The first result was an article about Manny Ramirez winning the MVP. The results could have been better for this query.

To refine my query, I wanted to try out some Boolean logic. I entered: 2004 world series winner NOT “little league” NOT poker. And indeed I didn’t get anything about poker or little league. However, the World Conservation Union’s site was not only still there, but it got bumped up in the results. I played around with the related terms and the related categories, but the results just weren’t quite right. The categories were too general, like Recreation and Sports, and the terms looked good but didn’t return good sites.

CONCLUSION
I am really excited to see a new web search engine that not only incorporates, but actually highlights, the categories from a web directory. That alone is enough to keep me coming back. But since not everyone is a devout directory user, there are several other nice and useful features on Exalead. The phonetic search is nice to see applied to a web index, even though it wasn’t really working so well for me. Also, being able to limit to audio or video files is helpful. Their interface is unique in that it offers several ways to get to results, such as thumbnails, previews, etc. Although most of these features are available in one form or another on other engines, Exalead has done a fine job of combining and bringing all these different elements together. Plus they’re searching their own web index.

I hope they put some effort into improving relevance, because that’s the missing ingredient here. With their own index, their unique interface, and their wide-ranging selection of advanced features, Exalead has huge potential.

Wondir

Type of Engine:
Question and answer.
Overall: Average.
If this engine were a drink it would be…a shot of Jagermeister. Your first time it’s a bit scary, but fascinating nonetheless, and you don’t really know what the heck it is. But have a drink and you’ll enjoy it for novelty’s sake as well as alcoholic (informational) content.

Intro
To put it simply, Wondir is a collection of questions and answers. There are several search related services, as in non-traditional engines, that I keep reading about on blogs and elsewhere. One of them is Wondir, so I decided it’s time to check them out.

On Wondir, anyone can ask and anyone can answer questions. It’s a community-based model that connects those who know with those who want to know. Google Answers is a similar service, and there were several others in the past including LookSmart Live. But all parts of Wondir are free to everyone and their revenue comes from Google AdWords. (For those of you not familiar with this free way of getting answers to questions, there is also an institution in your town called the public library that is free and you can – for the most part – trust the answers you get there. Or try accessing librarians remotely at AskNow. Check it out, it’s a fantastic service.) Wondir also distributes its question and answer box to vertical, specialty sites. I like that idea because if you go to ichef.com you’ll see what looks like an ichef question and answer box. However, when you ask a question it takes you right to Wondir. This is a good way for Wondir’s positive strengths to be accessed by people who otherwise wouldn’t go to the Wondir.com site.

Warning, prediction ahead:
As an aside, I predict several things that we think of as different are all going to merge: email, RSS, online bookmarking tools, article databases, web directories, and search engines. The line is already blurring, but consider a technology that is neutral in that it doesn’t care where information comes from so long as it’s relevant. So, like RSS but broader, I can subscribe to the content I want while remaining source neutral. But let’s get crazy and add email into the equation. Right now email is delivered and sorted by sender, date, subject, and a few other flags. But why not deliver and sort it by content so that all emails, blog posts, newly indexed articles, and so forth are filtered by topic. Within a couple years people (or at least this person) will not be using Outlook or web-based email the way we think of it now; I’ll be using some kind of reader that aggregates search results, news feeds, email, and who knows what else into one uber-reader. Think of federated search, but with the added advantage that instead of searching in the past on content that has been crawled, indexed, and stored, we’ll be fed in real-time with relevant content from newly published sources like blogs, news, and email.
Back on topic….

UI & Features
Wondir is confusing at first. Not confusing like What is outside the universe? (think about it!), but confusing in that it takes you a few moments to orient yourself. However, you can’t go wrong these days by using the big rectangular “search” box in the middle. I put search in quotes because really it’s an Ask box in this case. But just like a search, you enter your question and then things start happening.

The whole system is very transparent, everything is public, which is probably why it’s confusing at first since there’s so much to see. For instance, the Wondir Question Ticker lets you see what other users are asking and the Question Board shows a chronological list of questions. You can also sort by answered questions or unanswered questions. You can jump to a different day or go to a subject category to see questions by high-level subject such as Games, Travel, Mature Content, etc.

Just like RSS feeds, you can subscribe to be notified when a word or a phrase appears in a question or answer. You can also get answers by IM. Both of those require registration, but you can stay unregistered and click back to the date you entered your question and see if it’s been answered yet or not. Registering really makes a difference though, and any serious users of Wondir should register right away to take advantage of features. Otherwise you’ll be clicking through long lists of questions trying to find ones you’ve asked to see if anyone has answered them yet.

Query Examples
(Query in this case meaning questions.)
I asked, How do you tie a necktie?
I got the response: Your question will be placed in Home Improvement unless you select a different category for it.
I decided my question was really more of a “How-To” and moved it to that category.

But Wondir also returns results from news articles, search engines, and news groups. For search engines, there were five results. Three were relevant from AlltheWeb, and two were not relevant from About. That’s good relevance, but for now I’m testing Wondir’s question and answer service so won’t linger on web results. It turned out someone had asked a similar question to mine already, so there were already answers. Here’s one of them: there is more than one way, why not try, tie rack.com. Well, that seems a reasonable answer, doesn’t it? Unfortunately your intrepid investigator, never willing to stop in my quest to find the truth, was unable to find a site called tie rack.com. There’s a tie-rack.co.uk, but it’s not helpful. Too bad.

I tried a second question: who is the strongest US chess player right now? There weren’t any similar questions already asked, but I was taken to a page with questions about Bobby Fischer, the strongest chess player in US history. Right after I asked the question, it showed up on the question board. I also saw it scroll by on the Wondir question ticker right away. I got an answer in 14 minutes. Unfortunately the answer was “Robert.” Some joker trying to be funny. Do you see me laughing? I waited a couple of hours but no one supplied a real answer to my question. (Update: I checked back the following day and still no answer.)

There are many useful and legitimate questions. I saw ones about child custody issues, recipes, pregnancy, etc. But it probably comes as no surprise that there are also many nonsensical and silly questions. Here’s a sampling, just for fun:

Question: what can i do about my puppy who just ate 5 snicker chocolate bars????
Answer: report your self to the animal patrol f*#!%r
Answer: …You’re a dumb and unfit dog carer, I’m afraid.

Question: my best friend and i really like each other but we’re taking things slow but he likes another girl and she also likes him. we want to be together but alot of people are in our way. what should i do?

Question: how do you know when to give up the one you love

Question: What is 1 plus 1?
Answer: 6
Answer: Your I.Q.

And so forth….

Conclusion
The interface is confusing at first, but if you spend a bit of time on Wondir it becomes easier to get around. I like the transparency of the service in that every communication is public, but I’d like to see Wondir make navigation a bit easier. There should be easier ways to view the questions and answers that have already been posted. You can get tricky with things like subscribing to terms, but sometimes you just want to cruise around and look for questions and answers about something without subscribing to it. How about a plain old search box for searching all questions and answers?

To me Wondir is a representative example of the Internet in general: it’s communal, meaning you can be connected to people with similar interests, as well as people who have no authority or business answering questions; it’s got a lot of great content, but it’s also got a lot of crap, and sorting through it is the hardest part; and it’s interesting at first, but the real issue is figuring out how useful it is to you.

For now, Wondir is a novelty item to me, but in the future maybe I’ll incorporate their content into my RSS feeds so I can be notified when topics I’m interested in are asked about. But I would like to add that there were tons of questions coming in, and tons of answers too, so it seems that there are people who really find this service valuable. Even if it’s not valuable to you, it is worth playing around with.

MSN

MSN (beta-search)
Type of Engine: Full-scale Internet search engine.
Overall: Average.
If this engine were a drink it would be…a Cosmo, but hold the Triple Sec. It’s got most of the basic ingredients, but something’s missing. And without that sweet ingredient, in this case relevance, it just doesn’t taste right.

Intro
I’ve shied away from reviewing the big kids because I didn’t feel I could add anything to the abundant literature already out there, but I couldn’t ignore MSN’s beta search. Its release is simply too important to the industry. It floors me that anyone expects MSN’s newly built search to go live in Beta and be better than, or even as good as, Google. But the mainstream media, as well as industry watchers, all write about how it compares to Google. Fine, that’s a legitimate question to ask, but I’d like to step back from that comparison and simply judge MSN on its own. It could be argued that MSN might have done better waiting until it has an excellent product, but that isn’t Microsoft’s way of doing things. Building a large-scale search engine with crawling, indexing, and an algorithm is a huge (large, big, mammoth) task, but I guess since they’re Microsoft, it’s expected that they can do it. I’m going to write this review as if MSN search is a new company and not the search arm of the largest software company in the world. Even though MSN has offered search for years, it’s always been farmed out to other companies, until now.

UI & Features
I really like the Search Builder because it’s surfaced right there on the search page. It’s right under the main search box. The advanced fields they currently have are useful, but I bet MSN will be really juicing these up in the future. Be sure to play around with the sliders, one of the industry’s latest fads, under Results Ranking. Yeah yeah, I know, only 2 or 3% of users use advanced logic. But maybe that’s because engines don’t let you build the logic the way MSN does now. In my opinion this is the single best front-end feature of MSN’s search, and I hope it jumpstarts the industry to be more creative in helping regular users build complex queries through natural language prompts.

There’s a settings page that lets you do the basic stuff like porn filtering and language preferences. There’s also a place to set default location for local search.

Following in the wake of Google and Yahoo, MSN search now has a blog. They also have a good About Us section, especially the Web Search Help page. I like this kind of transparency, it’s very helpful.

Query Examples
I was at Internet Librarian for a few days, and when I got home my motorcycle wouldn’t start. So what better query to test than instructions for jumpstarting a motorcycle? The very first result was a PDF file. Unfortunately I didn’t bother to look closely at the file extension so when I clicked the link it opened a 76 page PDF file. Not cool. I take full responsibility for clicking on this link, but how about helping me out by putting a PDF icon near the display title instead of in faded letters after the URL? I did a Control F to find motorcycle in the PDF document and the only reference was to inflating a motorcycle’s tires. Not helpful at all. The next result was a blog by a guy from Texas. There are some motorcycle references and the word jumpstart appears, but the ideas are not connected at all. Not helpful. Result #3 was equally bad, another PDF file, this one having to do with campus security at San Jacinto College. I clicked on every one of the ten links on the first page of results and they were all bad. Not a single one, NOT ONE, was relevant to my query. So what the heck is going on here? I’d say there’s something funky with their proximity matching. Several of the sites were quite large and had a lot of text on them, so it seems like MSN is indexing all the words on the page but paying no attention to how close the terms are to each other. I don’t know if that’s a problem with their index, or if their algorithm isn’t taking proximity into account; if I had to guess, I’d say the latter.
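To make the proximity idea concrete, here is a rough sketch of one way an engine could measure how close query terms sit to each other on a page. It is an illustration under my own assumptions, not a description of how MSN ranks anything; the function name `min_span` and the sample pages are hypothetical.

```python
import re


def min_span(text, terms):
    """Length in words of the shortest window holding every term, or None."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    wanted = {t.lower() for t in terms}
    last_seen = {}  # most recent position of each query term
    best = None
    for i, word in enumerate(words):
        if word in wanted:
            last_seen[word] = i
            if len(last_seen) == len(wanted):
                # Window runs from the oldest "latest sighting" to here.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


query = ["jumpstart", "motorcycle"]
tight = "how to jumpstart a motorcycle battery"
loose = "jumpstart " + "filler " * 50 + "motorcycle"

print(min_span(tight, query), min_span(loose, query))
```

A page where the terms sit a few words apart gets a small span, while a 76 page PDF that mentions each term once, far apart, gets a huge one. An engine that ignored this number would treat both pages as equally good matches, which is about what my results looked like.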

OK, so maybe my query needs to be better; computers don’t make mistakes, humans do. I tried jumpstarting motorcycle battery. The first difference I noticed is that this query produced four sponsored results whereas my previous one had none. Do I need to say that all of the sponsored sites are irrelevant? Can you guess what they’re about? They want to sell me a new motorcycle battery. The second thing I noticed is that my Texas blogger is back and so is my 76 page PDF file. But wait, there’s good news. The first result took me to a page with a link to a somewhat relevant subpage. To be specific it’s a message board from the Philippines with various people discussing the topic. I wish MSN had taken me right to the page, but at least it got me close. The second result is pretty good, it’s about batteries for one particular motorcycle model. It doesn’t give actual instructions, but it’s close. The rest of the results are not so great, but they’re better than my previous query.

Time for another query refinement, this time including the name of my bike: Suzuki gs500e jumpstart. Not many results for this one, only one sponsored result from eBay and three not relevant web results. Same problem again, the terms appear on the sites, but nowhere near each other.

A new approach was called for: Search Builder. Going back to the search homepage I entered the following:
Search Terms: jumpstarting a motorcycle (I required the exact phrase).
Site/Domain: *blank*
Links to: *blank*
Country/Region: *blank*
Language: English
Results Ranking: I slid the first bar far towards Exact Match. The second bar, for popularity, I left in the middle. And updated recently I also left in the middle.
…NO RESULTS…

At this point I’m a sad customer and I leave. Or instead of doing it myself I just give up and call a tow truck to jumpstart my bike for me.

I also wanted to do a local search: florist (with location set to Loudonville, NY). Before I get to relevance, one bad thing is that I can’t change my location on the fly. I have to click into Settings and do it there. So even though I’m in California, I wanted to buy some flowers for my sister-in-law so I had to change my default location for her region. Ignoring the sponsored sites, the first few results are good enough that I’m happy. Some of the following results are slightly off, like a weather page that also has a link to shopping categories like florists. The only complaint I have is that from the display data it doesn’t look like any of these florists are actually in Loudonville, but they are in the Albany area so I assume they deliver to Loudonville.

They’re also providing some direct answers through Encarta and other sources. You can ask questions like who is the prime minister of the united kingdom? And MSN will show you this:
Answer: United Kingdom: Prime Minister: Tony Blair
You can also do mathematical calculations (e.g. 4 * 4), dictionary definitions (e.g. define laconic) and measurement conversions (e.g. how many centimeters in a mile). I tried other questions like who was jefferson’s vice-president, but it didn’t work.

MSN has tabs for News and Image searches. I won’t go into a full review, but I played around with both of these and they were both good. The news was recent and included authoritative sources, and the photo results were relevant.

Conclusion:
Within the context of being a brand-new search tool, it’s not bad. There are definitely a lot of documents in their index but their relevancy needs a lot of fine tuning. Right now it’s not entering my regular repertoire, but when it takes the next step past this Beta version I will reevaluate it. From the outside looking in, it seems like they’ve built up their crawling technology, but I’m not convinced they’ve done enough with their relevancy algorithm yet. Nonetheless it’s always exciting to have a new large scale engine appear with its own crawling and indexing technology, and this is no exception.

Wayback Machine

Wayback Machine (part of the Internet Archive)
Type of Engine:
Not so much an algorithmically based engine as it is an access point to archived versions of web sites. And an essential tool for searchers.
Overall: Very good.
If this engine were a drink it would be…Cola. I can’t make my favorite drink, Jack and Coke, without it, but I don’t drink it by itself. Just like you can’t search just with the Wayback Machine, but if you mix it with your favorite search engine you’ll be a happy customer.

Intro
Although the Internet Archive has been around for 8 years, and although they’re not really a search engine per se, I love what they do so much that I wanted to write about them for the Lounge. Their goals are lofty, inspiring, and unique. In my opinion they are one of the most important sites on the web. Founded in 1996 by Brewster Kahle, the Internet Archive is a public nonprofit organization whose goal is to create and keep regularly scheduled snapshots of as many web sites as possible. The Wayback Machine is the interface for viewing these stored versions of web sites. The Archive also archives movies, audio files, and books.

Kahle has stated that his organization’s goal is to store everything. “It could be one of the greatest achievements of all time” (2004). It is a clear and powerful mission statement. It has vastly significant consequences for future generations. We cannot begin to fathom how different the last 1,500 years would have been had Alexandria’s Library been preserved and its knowledge not lost. The same can be said for other information that has been lost. “The early manuscripts at the Library of Alexandria were burned, much of early printing was not saved, and many early films were recycled for their silver content” (Kahle, 1996). Although 1,500 years from now scholars will probably not be very interested in a personal web page dedicated to someone’s dog, there are countless other web sites that do contain valuable information on many subjects. Kahle does not want history to repeat itself by society losing valuable information that’s stored only on the web. For preserving books, he has even gone so far as to suggest that every book in the Library of Congress could be scanned. He says that universal access to all the knowledge in the Library of Congress could be had for around $280,000,000. He estimates he can scan and digitize all 28,000,000 books for $10 each. And in terms of the web, he says it “is growing at about 20 terabytes of compressed data a month, which is manageable” (2004). OK, if he says so.
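Kahle's figures are easy to check. The sketch below simply multiplies out the numbers quoted above; the yearly growth line is my own extrapolation from his monthly figure and assumes the rate stays flat, which it almost certainly won't.

```python
# Figures quoted from Kahle in the paragraph above.
books = 28_000_000
cost_per_book_usd = 10
web_growth_tb_per_month = 20

total_scan_cost_usd = books * cost_per_book_usd
# Extrapolation (my assumption): growth stays constant for a full year.
web_growth_tb_per_year = web_growth_tb_per_month * 12

print(f"Scanning cost: ${total_scan_cost_usd:,}")
print(f"Web growth: {web_growth_tb_per_year} TB per year")
```

The scanning total matches the $280,000,000 he cites, so at least the arithmetic holds up.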

The Internet Archive has been very successful in taking snapshots of millions of web sites, but there is still the major challenge of providing access to it all. Currently they have addressed this by creating an interface called the Wayback Machine that lets users view archived versions of web pages. Just type in a URL and all the archived versions of the site will be presented. The Internet Archive has said more than once that it wants to create a textual search interface to its archive, but no such interface currently exists for the public. This is the main area where the Internet Archive needs to improve. The Wayback Machine, although incredibly powerful, needs to be augmented by text searching so that users can locate archived web sites by topic. Of course that’s no easy thing to do, but since IA is affiliated with Alexa (also founded by Kahle), maybe Alexa can share its indexing capabilities. Easier said than done.

UI & Features
There’s really not much to say. You just type in a URL, hit “Take Me Back”, and there you go. On the Advanced Search page there are a few more options such as merging aliases, a.k.a. de-duplicating, where yahoo.com and yahoo.com/index will be mapped to each other. There’s also a function to compare two snapshots but unfortunately this wasn’t working for me.
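As a rough idea of what merging aliases might involve, here is a minimal sketch that maps a few URL variants to one canonical key. The rules (dropping the scheme, a leading www., a trailing /index page, and trailing slashes) are my own guesses for illustration; I don't know which rules the Wayback Machine actually applies, and `canonical_key` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit


def canonical_key(url):
    """Map alias URLs such as yahoo.com and yahoo.com/index to one key."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat /index, /index.htm and /index.html as the directory itself.
    path = re.sub(r"/index(\.html?)?$", "", parts.path)
    return host + path.rstrip("/")


for alias in ("yahoo.com", "yahoo.com/index", "http://www.yahoo.com/"):
    print(alias, "->", canonical_key(alias))
```

Snapshots whose URLs share a key could then be shown as a single timeline instead of several partial ones.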

Query Examples
Go as obscure as you want and there’s a good chance the Wayback Machine will find it. First I tried a search for a Bukowski page I know of. The first snapshot was from March 1999, and periodic snapshots have followed since then.

I then tried something less obscure, http://www.mlb.com, as in Major League Baseball. The first archived page is from December 22, 1996. For 1996-1999 there are only 1 to 5 snapshots per year. In 2000 things started to kick in, and since 2001 there have been snapshots on almost a monthly basis. But let’s take a closer look just for fun. It turns out that mlb.com was owned by Morgan, Lewis and Bockius, a law firm. Then beginning with the October 9, 2000 snapshot it becomes the homepage for Major League Baseball.

It’s not clear to me why on some days there are multiple snapshots, but it doesn’t really bother me.

Conclusion
If the Wayback Machine ever goes live with a good searching interface it will be incredibly potent. Not only would you be able to target specific web sites by URL, but you’d also be able to search archived versions of all that valuable content that has been lost from the web. Imagine the power of that. It’s the Internet version of a library with its online catalog plus access to the content of all the books the library ever had.

Knowing that the Internet Archive exists and is working quietly behind the scenes makes it easier for me to sleep at night.