The Search Lounge

1/26/2005

Local Search

Filed under: — Chris

For the Search Lounge I have been writing reviews of specific engines, but I wanted to try a different tack. I want to focus on types of search, or to put it another way, user missions. I recently did a foray into this by writing about shopping searches on Google and Yahoo, and in the future I plan to do blog and other types of search. I hope writing from the perspective of the type of information that is being sought will be useful.

Intro
For years people have predicted that local search will be one of “the next big things” in the search industry. I don’t dispute that. Local has always been a gaping hole on search engines. Recently, the major engines have started pushing their local search, so I thought it would be a good idea to check them out and see how they stack up against each other. Just to be clear on definitions, local search is search that is targeted to a US city or region.

I will look at each of the big four: Ask Jeeves, Google, MSN and Yahoo. A lot of the data these engines use comes from other sources, but I will focus on the user experience coming through each engine. For most users, it means little where the backend data comes from, even if Ask Jeeves and MSN both use Citysearch.

A major comment I have about all four engines is that their search is focused on local businesses. That is definitely valuable, but I hope they will expand that and allow for other local searching. Like maybe I want to search for general web sites about my city, or for bloggers who live near me, or for local government information. Of course I can use the general web search for this type of thing, but eventually I hope it will be integrated into the local products.

Conclusions
(I’m putting the conclusions earlier in this article to save people the pain of reading all the gory details. But for those of you who want the details, they are included below.)

Google and Yahoo are my preferred choices, with Google being the slight winner. Because I live in San Francisco and have so many other options for local information, MSN’s portal features, about which I will mention more later, are not particularly compelling. For other users, or for people from other cities, it very well may be different.

To provide more context, MSN is the only one of the four that is a full local portal. The other three are more search-based. So, depending on what you’re looking for you’ll want to use different ones (gee, what a surprise). I think Google offers the best search, but MSN’s browsing options could be useful. Yahoo stands out because they control their own data and in the long run that will set them apart. Ask didn’t really stand out to me in any significant way.

It’s interesting to think about the aforementioned strengths of each local engine because they accurately reflect each company as a whole. MSN is a destination company, Google is a search company, Yahoo is a destination/search/media company, and Ask is hanging with the others, but needs a little more oomph.

All four engines default to my saved search location, but Ask and Yahoo also keep a list of other recently searched locations for easy access. I find that feature useful because although I live in San Francisco, I also often search for information about Santa Cruz and San Diego. All four engines have useful help pages dedicated specifically to local.

Breakdown by Engine

Ask Jeeves Local
Strengths: Saves recent locations and searches.
Areas for Improvement: Customizing Citysearch’s data; improving local news.

Ask Jeeves local, which is tagged as still being a beta release, is focused on business listings. In fact, when you are on their Local page, the defaulted highlighted search tab is simply called Business Listings. They also have tabs for Maps, Directions, Local News and Weather. Maps and Driving Directions could probably be consolidated in one tab, but not a big deal.
When you search on Ask, you go to a Citysearch results page, though it is branded to look like Jeeves until you actually click on a listing and then you go right to Citysearch. The order of results is different on Ask than on Citysearch. I guess Ask is overlaying their own algorithm onto Citysearch’s data. Another difference is that on Citysearch you can sort by Best Of, Distance, Alphabetical, or Top Results. But on Ask you can only sort by Distance or Ratings. It seems that since they are using the same listings data, even though they are ranking it differently in search, they would be able to use the other sort options.
The results are sorted by distance as the default, but distance from what exactly? I couldn’t tell. With each listing there is a user rating, the address and phone number, and links for maps, directions, and a website, if one exists.

Query Examples
Vegetarian restaurants -location: San Francisco.
Three of the first ten listings are indeed vegetarian restaurants, but the other seven are not. There’s a sushi restaurant, a taqueria, an Italian restaurant, and so forth. I looked at all of them and most of them have vegetarian listed as one of the cuisines, but not all. For instance, Max’s does not have the word vegetarian anywhere on the page. So why was it returned at all? I don’t know the answer to that. I even looked at the HTML code and couldn’t find vegetarian anywhere. And although the other restaurants do have vegetarian as one of the cuisine options, that is not what I was searching for. I was specifically searching for vegetarian-only restaurants. Although the engine might be forgiven for not realizing my specific user mission, in this day and age just about every restaurant has at least one veggie option, even if it’s just pasta or grilled cheese, so showing me non-vegetarian restaurants that have vegetarian options is not quite good enough. But this problem is really Citysearch’s, since Citysearch is the one entering the data. So Ask might be forgiven for doing what it should: searching Citysearch’s data. But then again, as a user I don’t care, all I care about is finding what I’m looking for.

Jeeves has a tab for local news, but it is a bit confusing because the search box prompts for: City, State or Zip. OK, so I enter San Francisco. The results consist of 10 local news items, most of which are relevant. Some crime articles, some events, that kind of thing. But if I just wanted to see local headlines I can go to a local newspaper site. What I want is to be able to take advantage of Ask’s search technology to search articles from multiple sources. The second article is about a major drug bust that happened in the city, so I should get relevant results if I search for drug bust. However, I don’t. What I get instead is news from around the world about this topic. Since I am still in the local tab, I will give them the benefit of the doubt and say the search is being run against local sources, but not MY local sources. There is an article from Boise and one from Palm Beach. The article from San Francisco is there too. If I refine my query and try San Francisco, CA drug bust I get no results at all. The local news tab is not offering me much.

Google Local
Strengths: Relevant and extensive listing of results from their web search that match the local listings; results are shown on a map.
Areas for Improvement: All searches first return only business listings. So a search for something like reviews of pinball bars – location: San Francisco doesn’t return general review types of sites, just specific listings. However, I only call this an area for improvement because I know they have the data and it could be incorporated.

Google local, another beta release, is also focused on business. It says right there on the page: Find local businesses and services on the web. They are pushing Keyhole. When I first heard of Google’s acquisition of Keyhole, I thought they were off on a tangent. Then I heard Chris Sherman’s keynote speech at Internet Librarian in November and he explained how Keyhole will integrate with Google’s search so users will be able to actually see locations. I’m not 100% behind it yet, but I can certainly see potential.

Query Examples
vegetarian restaurants – location: San Francisco. Right off the bat I am impressed with what I see. Nine out of the first ten listings are vegetarian-only restaurants. Yea! They are really targeting vegetarian restaurants, rather than those restaurants that have some vegetarian dishes. I really like the interface. Along with the expected info, like phone number, address, and web site, there is also a nice map displayed right next to the listings. The only drawback I see is that although I targeted San Francisco, half of the restaurants are actually across the bay in the East Bay. Fortunately I have the option to search within 1 mile, 5 miles, 15 miles, or the default 45 miles. It’s a small thing, but I think the default should be 5 or 15 miles.

At the top of the page, Google shows three sponsored links. In this case, they were fairly relevant. One was for a vegan store, another for Citysearch, and the third for Green’s vegetarian restaurant.

But let me get to the best part, which is clicking on one of the restaurant’s links. Google takes you to a search results type of page with references to the restaurant. They seem to be sending the name of the restaurant to their general web index and returning matching results. It’s really a great feature because it provides not only the restaurant’s homepage, but also many reviews. All the results for Millennium Restaurant were relevant to the restaurant. The only suggestion I can make is that I wish Google’s URL was easier to parse and understand. Here’s what it looks like: http://local.google.com/local?q=vegetarian+restaurants&hl=en&lr=&c2coff=1&safe=off&sa=G&near=san+francisco,+ca&radius=0&latlng=37775000,-122418333,8758249238891447007
I thought that really long number at the end was a cookie, but actually now I’m guessing it’s some kind of internal mapping ID number they’ve generated. Otherwise, I don’t see anything that indicates how the phrase “Millennium Restaurant” was sent against their web search.
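For anyone who wants to take that URL apart, here is a minimal Python sketch using the standard library's urllib.parse. It only splits the query string into named parameters. The variable names are mine, and reading the first two latlng values as degrees multiplied by one million is an assumption, not something Google documents here.

```python
from urllib.parse import urlsplit, parse_qs

# The URL exactly as quoted above, joined onto one line.
url = (
    "http://local.google.com/local?q=vegetarian+restaurants&hl=en&lr=&c2coff=1"
    "&safe=off&sa=G&near=san+francisco,+ca&radius=0"
    "&latlng=37775000,-122418333,8758249238891447007"
)

# keep_blank_values=True keeps empty parameters such as "lr=".
params = parse_qs(urlsplit(url).query, keep_blank_values=True)

query = params["q"][0]      # "+" is decoded to a space
near = params["near"][0]
lat_raw, lng_raw, mystery_id = params["latlng"][0].split(",")

# Assumption: the first two values look like degrees times one million.
lat = int(lat_raw) / 1_000_000
lng = int(lng_raw) / 1_000_000

print("query:", query)
print("near:", near)
print("lat/lng guess:", lat, lng)
print("unexplained third value:", mystery_id)
```

None of the parsed parameters contains the restaurant name, which fits my guess that the long number is some kind of internal ID.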

MSN Local
Strengths: Full local portal.
Areas for Improvement: MSN local search is the same as Citysearch’s search.

MSN uses Citysearch’s data for search, but they also have a lot of other content. The front doors for San Francisco are different on MSN vs. Citysearch.com. MSN, being a portal and all, is pushing its own properties for things like news and shopping. But as soon as you do a search, MSN kicks you over to the Citysearch interface. I also noticed that when I come through the main MSN.com homepage there is a traffic option, but I couldn’t find it on the local page.

Query Examples
vegetarian restaurants –location: San Francisco. Well, the results are different from what I got on Ask, but it comes as no surprise that the same thing is happening: non-vegetarian restaurants are being returned along with vegetarian-only restaurants. And so my thoughts for Ask are the same for MSN. Yes, this is Citysearch’s issue because of the data they are providing. But, as the interface to that data, MSN may want to address this type of thing.

Where MSN differentiates itself is that it is a true portal. It has event listings, job listings, sports news, and so forth. They are pulling a lot of content from a lot of sources. Each sub-page has a similar interface. You get to these pages by browsing from the local front door.

Yahoo Local
Strengths: Searching their own content (I think); traffic (as in cars and roads, not site traffic) monitoring.
Areas for Improvement: More integration with general web search results.

I really like Yahoo’s integration of real time traffic monitoring. Not exactly search, but nice nonetheless. Yahoo states that they do not sell rankings, but they do offer businesses the opportunity to enhance their listings by adding visual elements. They also offer regular users the ability to contact Yahoo Local in order to add or update listing information.

Query Examples
vegetarian restaurants – location: San Francisco. Same problem as Ask and MSN: many of the results are for restaurants that are not vegetarian only. At first I thought Yahoo was simply text-searching the full restaurant description, because a sushi restaurant created a positive match because it had this text: “Cooked seafood and vegetarian dinners are available.” However, Yahoo also offers a “vegetarian restaurants” category, and clicking on that did not change the results at all, so obviously the restaurants were categorized into the vegetarian restaurants category. In this case it looks like the editorial guidelines were a bit loose, though I do understand the logic behind it. I poked around in the HTML for an Italian restaurant and found the following metadata: Category Types: Vegetarian Restaurants, American Restaurants, Barbecue Restaurants. Interesting combination…vegetarian and barbecue.
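To make the difference between "has a vegetarian category" and "is vegetarian only" concrete, here is a small Python sketch. The listings and their names are made up for illustration; only the three-category combination on the Italian entry mirrors the metadata quoted above. I am not claiming this is how Yahoo matches listings.

```python
# Hypothetical listings. Names are invented; only the category combination
# on "Italian Place" mirrors the metadata quoted above.
listings = [
    {"name": "Veggie Place", "categories": {"Vegetarian Restaurants"}},
    {"name": "Italian Place", "categories": {"Vegetarian Restaurants",
                                             "American Restaurants",
                                             "Barbecue Restaurants"}},
    {"name": "Sushi Place", "categories": {"Japanese Restaurants",
                                           "Vegetarian Restaurants"}},
    {"name": "Taqueria", "categories": {"Mexican Restaurants"}},
]

TARGET = "Vegetarian Restaurants"

def loose_match(listing):
    # Matches any restaurant that carries the category at all.
    return TARGET in listing["categories"]

def strict_match(listing):
    # Matches only restaurants whose sole category is the target.
    return listing["categories"] == {TARGET}

loose = [item["name"] for item in listings if loose_match(item)]
strict = [item["name"] for item in listings if strict_match(item)]

print("loose:", loose)
print("strict:", strict)
```

The strict version is deliberately crude. A vegetarian-only restaurant that is also tagged with a second compatible category would be dropped, so a real fix would need better category data rather than just a tighter filter.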

There are two sponsored results. One is a local restaurant review site, which is reasonable. But the other is for a hotel. I happen to know that the hotel is the hotel where one of San Francisco’s best vegetarian restaurants is located, but many people won’t clue into that.

There are some nice refinement options, such as by rating, price, and atmosphere, and you can choose to view results on a map. It would be nice if Yahoo integrated their general web search results the way Google does it.

1/24/2005

Shopping Searches on Google and Yahoo

Filed under: — Chris

(Taking a break from the Defining Relevancy series. I will return to that again later.)

Intro
Google and Yahoo have relevant results for many non-commercial searches, but I am often disappointed by their results for shopping queries. Google and Yahoo have become every search engine optimizer and spammer’s target, and so they get “hacked” by spammers fairly often. They also get bombarded by the big shopping sites such as Amazon, eBay, Bizrate, etc. These companies depend on prominent search results on Google and Yahoo in order to survive, and so they target their efforts to doing just that. Oftentimes these sites are useful, but it troubles me that a small group of sites are dominating search results. Not only does it limit the immediate usefulness for users, but it also limits the opportunities of other merchants to sell online, and that will hurt us all in the long run.

I would like to look at some specific queries. To be sure that my queries are recognized as being commercial in nature, I have formulated them all to include the word “buy” in them.

Google
buy printers
The first result is from ZDNet, the second is from CNET. Remember when CNET bought ZDNet? The two sites are indeed different, but take a look at the display titles:
Buy printers - Best printers - Compare printer prices - ZDNet …
Buy printers - Best printers - Compare printer prices - CNET …
Looks fishy to me. Shopping.com and Shop Genie show up prominently as well.

buy ipod
Where is the official Apple store? Nowhere. And look, there is Shopping.com and Shop Genie again. Hi guys!

buy sports goggles
A scan through the results shows some of the usual suspects: My Simon, Dealtime, and Amazon in this case.

buy soccer ball
The first four listings in order are Bizrate, Bizrate (again), Epinions, and Amazon.

And check out this query:
buy some time before I die (intended to be somewhat nonsensical).
Amazon gets the first two spots and there’s Epinions at the bottom of the list.

What is my point with all of this? My point is to demonstrate that there are certain sites which consistently show up in Google results if the query includes “buy” or is otherwise shopping related. Is this bad? I do think it is problematic. Amazon, although a great shopping site, may not always be the best place to buy everything. But they do have a monopoly of sorts on Google for shopping searches.

Yahoo
Let’s see how Yahoo is doing with these same shopping queries.

buy printers
Buy.com, Bizrate, and ZDNet are there. But so are the homepages for Dell and HP. That sounds good at first, but since I am being nitpicky, why are the homepages returned and not Dell and HP’s printer pages?

buy ipod
Amazon has positions 3, 4, 9 and 10. Not good. Why four positions instead of just one or two? Official Apple sites show up in positions 2, 5, 6 and 7; though again, do I need four Apple sites in my top ten? Buy.com sneaks in to number 8. Is this better than Google? Yes and no. It is much better in that the official iPod site is included, but it is worse in that the results are less diverse.

buy sports goggles
Again, we see certain suspects showing up. Overstock, shop.com, shopping.com, and eBay. However, some other sites have managed to sneak in as well, which is good.

buy soccer ball
The first result is from soccer.com. Sounds promising, but it actually goes to the homepage and not the soccer ball page. Number 2 is a poster from Art.com. The rest of the results are not so good: there is a soccer ball rug, a soccer ball piñata, and a soccer memorabilia store.

buy some time before I die
Amazon and eBay are there sure enough, but there are also some other sites to provide diversity.

Froogle and Yahoo Shopping
Overall, my shopping experience has been passable, but not great. Of course I am not the first to point these things out, and I suppose the two companies realized this long ago, which is why they both launched shopping tools. Google has Froogle, which for some reason is still in Beta even though it launched in 2003 and is even one of the tabs on Google’s front door. Yahoo has Yahoo Shopping.

Looking at some examples from above, the results are much better on both engines.
Froogle:
buy ipod
eBay is still there, but the other results come from a variety of merchants. Not only that, but the interface is much more advanced, allowing users to do things like sorting by price and by store. There are also thumbnails to preview the products.

buy soccer ball
Shop.com and Buy.com are prominent again, plus Froogle’s own comparison option. Overall, good results.

Yahoo Shopping:
buy ipod
Buy.com is in there, but also a whole variety of other merchants. And the interface is sweet. There are previews and some really good sorting options, such as specifying desired hard-drive size.

buy soccer ball
Buy.com front and center in position 1, but the remaining listings are diverse, which is good. But, most of the listings are selling soccer ball related things like video games, shin guards, soccer ball charms, etc. That is very bad. If I wanted those things I would search for those things. Trust me.

Conclusion
For shopping queries it is not worth doing general web searches on Google and Yahoo. Although many users optimistically think they should be able to do shopping searches using general web search, the reality is otherwise.

Google recognizes and admits this and has Froogle results promoted above its general web results. But Yahoo does not do that. I think for general web searches they should promote their shopping results for certain queries, since the user experience is better on Yahoo Shopping. The trick there is to base it on queries, so that only a percentage of searches trigger Froogle and Yahoo Shopping suggestions. I do not want to see Froogle or Yahoo Shopping recommendations for non-commercial queries. Both engines have probably generated heuristic lists of commercial queries over time; these lists could be used for this purpose. Maybe Google is already doing so because soccer promotes Google News instead of Froogle, whereas soccer ball correctly recommends Froogle results.
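Here is a rough Python sketch of what I mean by a heuristic list of commercial queries. The word and phrase lists are hypothetical and tiny; presumably a real engine would build them from query logs. The function name is mine too.

```python
# Hypothetical heuristic lists, invented for illustration only.
COMMERCIAL_WORDS = {"buy", "cheap", "price"}
COMMERCIAL_PHRASES = {"soccer ball", "ipod", "printers", "sports goggles"}

def suggest_shopping(query):
    """Return True if the query should trigger a shopping suggestion."""
    normalized = " ".join(query.lower().split())
    # Any single commercial word is enough to trigger.
    if set(normalized.split()) & COMMERCIAL_WORDS:
        return True
    # Pad with spaces so phrases match whole words only
    # ("ipod" should not match inside "tripod").
    padded = f" {normalized} "
    return any(f" {phrase} " in padded for phrase in COMMERCIAL_PHRASES)

for q in ["soccer", "soccer ball", "tripod reviews", "buy some time before I die"]:
    print(q, "->", suggest_shopping(q))
```

In this sketch soccer does not trigger a shopping suggestion while soccer ball does. The nonsensical query buy some time before I die triggers it too, which shows how blunt a plain word list is.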

Both engines could improve their targeting for shopping queries. Sometimes the results are too broad and other times too specific. As a user I can be trusted to search for what I want to buy. If I want a soccer piñata I will search for that. So instead of doing broad matches to my query, the product results should be specific and targeted.

And lastly, it would benefit users if there were more diversification of results. Whether it’s done through source clustering or another method, providing a broader range of merchants will improve my online shopping experience.
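One simple way to picture source clustering is a cap on how many results any one site can hold near the top. The Python sketch below is only an illustration under that assumption. The URLs are placeholders, the cap of two is arbitrary, and I have no idea how either engine would actually do this.

```python
from urllib.parse import urlsplit

def host_of(url):
    # Treat "www.example.com" and "example.com" as the same source.
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def diversify(urls, max_per_host=2):
    """Keep ranking order, but push results beyond the per-host cap to the end."""
    seen = {}
    kept, demoted = [], []
    for url in urls:
        host = host_of(url)
        seen[host] = seen.get(host, 0) + 1
        if seen[host] <= max_per_host:
            kept.append(url)
        else:
            demoted.append(url)
    return kept + demoted

# Placeholder result list, not real search results.
results = [
    "http://www.bigstore.example/ipod-a",
    "http://www.bigstore.example/ipod-b",
    "http://maker.example/ipod",
    "http://www.bigstore.example/ipod-c",
    "http://smallshop.example/ipod",
]

for url in diversify(results):
    print(url)
```

The third bigstore.example result drops to the bottom, so the smaller merchant moves up a spot. Other subdomains are still counted as separate sources here, which a real implementation would probably want to handle.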

For shopping searches, users should use Froogle and Yahoo Shopping. And the engines should promote those options for relevant searches that are conducted using their regular web search interfaces.

1/20/2005

Search Engine Relevancy. Part 3: A Call to Arms

Filed under: — Chris

A Call to Arms

[Part 3 of a series about relevancy.]

Two years ago, on December 5, 2002, I was working at LookSmart when Danny Sullivan at Search Engine Watch published a short piece called, In Search of the Relevancy Figure. He wrote:

“Where are the relevancy figures? While relevancy is the most important ‘feature’ a search engine can offer, there sadly remains no widely-accepted measure of how relevant the different search engines are. As we shall see, turning relevancy into an easily digested figure is a huge challenge, but it’s a challenge the search engine industry needs to overcome, for its own good and that of consumers.”

It was a call to arms for the search industry to come together and figure out acceptable relevancy metrics. Enough with the empty claims about relevancy, it was time that some standards were set in place so that the public could know definitively which engine was the most relevant. It is a noble, but nearly impossible, idea.

At the time I was running a team that compared the relevancy of search results from a variety of engines. Our insights were used by the executives to make business decisions and by the search engineering team to help improve the company’s own algorithms. Danny Sullivan’s article was sent around the company and commented heavily upon, but in the end we agreed that relevancy figures need to come from the outside. We could advise about our methodologies and analyses, but that was all. After all, would a newspaper trust a book critic who worked for a publisher to review one of that publisher’s books? No, and neither will the public fully embrace a relevancy figure generated by a consortium of search engine companies, no matter how good the intentions and methodologies are.

The single biggest problem with relevancy figures is the devastating, and in some cases illegitimate, damage they can cause a search company. Relevancy is not a universal figure; it is always subjective. It is not one magic number that encompasses all queries for all people. I am not arguing against reviews and criticism of search results; after all, critiques can provide search companies with solid analysis to build upon as is my goal with the Search Lounge. But if Time Magazine or Newsweek published a cover article saying one engine is far and away the most relevant, imagine the effects. Users would desert the other engines and flock to that engine. And that is not right. Users need to use the engine that is best for each unique information need they have.

As one former colleague astutely pointed out to me, it is similar to what happens when magazines publish lists of top universities, top hospitals, top doctors, etc. The rankings are generalized and with the publicity and hype that comes along with them, the winners get to make the rules. But each student and patient has unique needs that may or may not be best served by the university, hospital, or doctor that ranks the highest. The same can be said for search.

Relevancy analyses are often made up of multiple sections or tests. There may be a part that looks at certain types of queries, such as geographical or shopping, or at queries with a certain number of words, or at natural language queries, or popular queries, or news stories, or ambiguous queries, and on and on. One engine may lose the overall relevancy test to another engine, but might win for local queries because it has targeted zip codes and city-level results. So if every user abandons ship and only uses the overall “winner”, then for local searches they will be getting inferior results. This notion can be taken down to the specific query level where an engine may have good results for chess openings and bad results for chess books. You be the judge! The point I am making here is that an overall relevancy figure sabotages the end goal of helping searchers.
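A toy calculation shows how an overall figure can hide this. The scores in this Python sketch are invented for illustration and do not measure any real engine. The engine names and the simple average are my own assumptions.

```python
# Hypothetical scores (0 to 10) per query type; not measurements of any real engine.
scores = {
    "Engine A": {"local": 5, "shopping": 8, "news": 9, "ambiguous": 8},
    "Engine B": {"local": 9, "shopping": 7, "news": 7, "ambiguous": 6},
}

def overall(engine):
    # One possible "relevancy figure": a plain average across query types.
    values = scores[engine].values()
    return sum(values) / len(values)

def best_for(category):
    # The engine with the highest score for a single query type.
    return max(scores, key=lambda engine: scores[engine][category])

overall_winner = max(scores, key=overall)

print("overall winner:", overall_winner, overall(overall_winner))
print("best for local:", best_for("local"))
```

With these made-up numbers Engine A wins the overall average, yet Engine B is clearly better for local queries. A user who only follows the overall figure never finds that out.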

Another major problem is the number of possible ways to evaluate relevancy of search results. And I guarantee there is absolutely no way the industry – that is to say the major search engines, namely Ask, Google, MSN, and Yahoo - would ever agree on one relevancy figure. It just will not happen. Think of these analogies: are you getting all the relevant newspaper articles on a topic if you read only one newspaper? Are you watching the funniest sitcoms on TV if you’re only watching one station? And most pertinent to this topic, are you getting the best books on a topic if you only visit one library or bookstore? The answer to all of these is a resounding NO. And the same is true of search engines. You are absolutely, completely, definitely, not going to get all the best results on one search engine.

Another problem is frequency of analysis. Engines update and release new products so often that it is a full-time job keeping up. Plus there are new search engines that go live every month. I do not have the time to fully analyze and do a Search Lounge review for every single new improvement and release, but I can run one or two or even five queries on a new release or new engine to see if it passes the acceptability barrier. These days most, but not all, engines meet a minimum threshold of acceptability. But a minimum threshold is far from good. And even an engine that is not passable may update its index the very next day and overnight the results may be dramatically better.

The back-end behind each engine’s crawling, indexing, and algorithm technologies is far too complex to produce the same results, and the queries each person will enter during multiple search sessions are too diverse. There are simply too many variables. I’ll throw out one last analogy (I promise): if four people are told to make chocolate chip cookies, will all four taste the same? No. And going one step further, will the tasters all agree on which one is the best? Maybe, but probably not, assuming they all meet at least a minimum threshold of quality. So even if a report is released saying that an engine is the most relevant as judged by a fully objective, scientific study, the counterattacks from the other engines will be swift, immediate, and oftentimes legitimate. The media would be awash with a blizzard of PR releases explaining why the test was incorrect, why the winner did not really win, and why the losing engine is actually more relevant and getting better every day. And there we are, right back where we started with searchers not able to trust corporate press releases.

Next Installment - Part 4: Using Different Engines

Search Engine Relevancy. Part 1: Defining Relevancy
Search Engine Relevancy. Part 2: The Jaded Surfer

1/18/2005

Search Engine Relevancy. Part 2: The Jaded Surfer

Filed under: — Chris

The Jaded Surfer

[Part 2 in a series about relevancy.]

Search engines love to tell us that they are the most relevant, and I don’t blame them. A glance through any engine’s press releases will include claims like “most relevant update”, “a dramatic increase in relevancy”, and so forth. These conflicting claims are like political rhetoric. Ultimately they have the opposite effect of what was intended, because we are all becoming jaded searchers. Here are some examples, and be sure to note how many claim to be the most relevant:

  • With the largest index of websites available on the World Wide Web and the industry’s most advanced search technology, Google Inc. delivers the fastest and easiest way to find relevant information on the Internet.
    http://www.google.com/press/pressrel/aol.html
  • The [MSN] index, much of which is updated weekly or even daily, provides the most relevant, timely and accurate data as quickly as possible, while minimizing frustrating dead links. http://www.microsoft.com/presspass/press/2004/nov04/11-11SearchBetaLaunchPR.asp
  • [Accoona] empowers users to find the most relevant information through its Artificial Intelligence powered search engine.
    http://biz.yahoo.com/bw/041206/66005_1.html
  • A pioneer in Web search technology, Inktomi provides millions of users worldwide with the freshest and most relevant search experience, and ensures that thousands of online retailers and other sites have their content constantly represented.
    http://docs.yahoo.com/docs/pr/release1110.html
  • [Mamma.com]…makes it easier and faster for people to find information by gathering the most relevant results from the best search engines on the Internet. http://www.mammamediasolutions.com/corporate/pr/2004/08-11-04.html
  • Yahoo! Search offers one of the most extensive search services available to ensure consumers find the most relevant results to their search.
    http://docs.yahoo.com/docs/pr/release983.html

Search companies must be allowed to say their product is relevant. Otherwise, how can they market themselves? I am not disputing any of these companies’ claims or passing judgment on their right to proclaim their search product as being relevant. The issue I am emphasizing is that these claims are subjective and need to be understood as such because they cannot all be the most relevant, easiest, most extensive, and fastest.

Relevancy is like Pornography
So, with all that being said, what exactly is this elusive specter called relevancy, and how is it identified? It’s like the classic question: what is pornography? We don’t know how to define it, but we know it when we see it. Similarly, search engine relevancy is subjective and means something different to everyone. And not only that, it can mean different things to the same person at different times. A relevant result can be the site that provides the exact answer to a question; it can be the authority in its topic that provides a broad selection of information; or it might be a new site in a topic that is already known very well. That is why relevancy evaluations must be comparison-based. I know that the results on one engine are bad because another engine has better results. If the other engine did not exist, then my level of expectation would be lowered and it is possible that the first engine’s results would seem relevant to me. Underlying this is the assumption that both engines pass a minimum threshold of relevancy. It is conceivable that the results could all be not relevant, in which case the comparison does not even come into play.

Relevancy evaluation changes based on the types of information being sought. Each and every query for information needs to be reevaluated every single time. Users must never think that the results on their favorite engine are always the most relevant. If searchers do not find what they want, they can do a few things: they can come up with a new search strategy and approach the problem from a different angle; they can stick with the same strategy, but refine the query by making it narrower or broader, or by using advanced options and syntax; and, lastly, if the engine is still not finding what they are looking for, they can go elsewhere and try a different search tool.

Next Installment - Part 3: A Call to Arms

Part 1: Defining Relevancy

Search Engine Relevancy. Part 1: Defining Relevancy

Filed under: — Chris

Relevancy

[Part 1 in a series of postings about relevancy.]

Relevancy is subjective. Each searcher will have a different evaluation of a search tool’s relevancy, and not only that, but each searcher will change that opinion based on the specific search being done. Search relevancy is a moving target that will never be agreed upon. Novice searchers should look to experts for advice, but in the end must reach their own conclusions about relevancy. Those conclusions must be based on using a few search engines, because relevancy is contextual and can only be understood as a comparison.

This is a concept that has been discussed by countless other information professionals, many of whom will say that defining relevancy is not constructive because of its subjectivity. I disagree. I think all serious searchers need to have their own definition of relevancy in order to make judgments about search results. After all, why do we use search engines? We use them to find information. We don’t use them to be impressed by clever features, a large index, or an intriguing name. We use them to find what we are looking for, and we can only find information if the results we get for our searches are relevant. And we can only decide if results are relevant if we have a simple framework for making that decision. Relevancy is the key and the foundation for search. Without relevancy, the rest is fluff. On an engine that has good relevancy, the features that are built around it become especially valuable. On an engine that has poor relevancy, the features are useless.

There are a slew of companies offering various takes on searching electronic sources. Some companies are searching sources such as databases, archives, and home computers, while on the Web there are general search engines, visual engines, clustering engines, natural language engines, and so forth. There are also specialty search tools - tabs or advanced search on general search engines - that focus on news, blogs, images, and so forth. It is great to have these tools, but none of the bells and whistles mean a thing if the results are not relevant. Without relevancy, users will not come back no matter how many special features are available. How often will I visit a restaurant with great atmosphere, but bad food? Not often.

Definitions of Search Engine Relevancy
With relevancy being such an important part of search, how is this elusive term defined when it comes to search engines? Here are my definitions. I am sure your definition will be different. Even if your definition is similar, when it comes to actually evaluating search results people will not always agree. Even if someone agrees with everything I say, we will still often disagree in our evaluation. I may think a result is relevant, when someone else thinks it is not relevant, and because of the subjective nature of relevancy evaluation we can both be right. So, with all those caveats let me present my definitions.

Relevancy: A measure of how well a search tool finds the information being sought.
[Sound too simple? Please, I welcome any other definitions because the more complicated I tried to make my definition, the more I just kept coming back to this simple sentence.]

To break it down further, I think of relevancy in terms of three levels or grades:

Relevant: the search result provides the information I am looking for. It is that simple.

Somewhat relevant: the result is close, and may even propel me along a path that leads to the information I am looking for, but it does not exactly have what I want. A somewhat relevant result is sometimes valuable because it suggests a different way and different terms for a search.

Not relevant: the site provides no help to me. It may contain the terms I searched for, but the context is wrong. It is that simple.
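To show how these three grades can feed a comparison between engines, here is a minimal Python sketch. The point values (2, 1, 0) and the function names are my own assumptions for illustration only; any other weighting would be just as defensible, and two people grading the same results may well disagree.

```python
# Point values for each grade are an assumption made for this illustration.
GRADE_POINTS = {"relevant": 2, "somewhat relevant": 1, "not relevant": 0}

def average_grade(judgments):
    """Average the point values for one engine's graded results."""
    return sum(GRADE_POINTS[j] for j in judgments) / len(judgments)

def compare(engine_a, engine_b):
    """Return which list of graded results scores higher: 'A', 'B', or 'tie'."""
    a, b = average_grade(engine_a), average_grade(engine_b)
    if a == b:
        return "tie"
    return "A" if a > b else "B"
```

The grades themselves still come from a person looking at each result, so the sketch only tallies subjective judgments; it does not make them objective.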

Next Installment - Part 2: The Jaded Surfer

1/13/2005

Gigablast

Filed under: — Chris

Gigablast

Type of engine: General web search.

SUMMARY
Relevancy of results: Needs improvement.
Freshness of results: Very good, and not only are the results fresh but the index date is listed next to each result.
Features and functionality: Very straightforward and easy to use.
Quality of help and “about us” pages: The Help sections are OK, but I learned more about using Gigablast from reading articles on other sites about it than by reading their help sections.
Business model: Selling ancillary search services, such as enterprise search. No sponsored links, banner ads, or any other form of advertising presented to the user.

INTRO
I’ve been wanting to do a review of Gigablast for a while because the story of Gigablast is inspiring. Gigablast was written by former Infoseek engineer Matt Wells. Matt has a blog of sorts, but he has not updated it since February, 2004. The blog mostly covers Gigablast, but also has notes about other non-Gigablast issues. I like his candidness, such as “Alright you SEO people. Get your bots off my search results.”
The thing I really like about Gigablast is that it knows what its business is, namely general web search, and it simply goes about doing that. There are not half a dozen search tabs or other distractions. Although they do offer some other services, as I will mention later, you can go to Gigablast and it is as obvious as night and day what you can do there. You go to Gigablast, you enter a search and that’s that.

Right on the front page they list how many sites have been indexed. As of January 12, 2005 that number is 1,014,363,952. Over the past four days the number did not change, but I know it was not too long ago that the number was half what it is now.

UI & FEATURES and QUERY EXAMPLES
Really the only feature that users see on a regular basis is Giga Bits, and so I have decided to write about features and queries together. Giga Bits are related terms that appear at the top of every search result page. They append the original search rather than replacing it. Giga Bits can sometimes be used to find answers to natural language queries. For example, for What is the capital of Sweden?, the first Giga Bit result is Stockholm. I happen to know that answer is correct, but if you did not know that you might not pick up on the fact that the Giga Bit term can actually be the answer. So although it is a nice bonus, the way it is implemented right now will not be clear to many users. After clicking on the Giga Bits suggestion of Stockholm, the first answer is in Swedish and there is nowhere to set user preferences for things like language.

Giga Bits is a nice idea, but the current implementation is confusing and it took me a while to figure it out. I am still not clear on what the percentages next to each term mean so I tend to ignore them. I think engines do themselves a disservice by naming things with cute and clever names like Giga Bits. How about just calling it something so that users immediately know what they are seeing? Like related terms or refine your query, etc.

As I mentioned earlier, one really cool feature Gigablast has is that it tells you when each page was indexed. Another very nice thing is that there are a few options next to each search result. You can click on older copies to go to the Internet Archive (see my Wayback Machine review) and see archived versions of the site. You can also look at cached archive copies, and a stripped version that takes out images and just leaves text. These are all very nice.

Gigablast also has something that used to be common, but you don’t see so much anymore. And that is an easy link to other search engines’ results. At the bottom of each search page you will see: Try your search on google yahoo alltheweb dmoz alta vista teoma wisenut.

As a side note, Gigablast used to default to OR searches, but now defaults to AND. Seems wise to me, particularly now that their index size is growing.
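For readers unsure of the difference, here is a minimal Python sketch of AND versus OR matching. It is only an illustration with a made-up `matches` function and naive whitespace tokenizing, not a description of how Gigablast actually evaluates queries.

```python
def matches(query, page_text, mode="AND"):
    """Return True if the page matches the query under AND or OR semantics."""
    terms = query.lower().split()
    words = set(page_text.lower().split())
    hits = [term in words for term in terms]
    # AND requires every term to appear; OR is satisfied by any single term.
    return all(hits) if mode == "AND" else any(hits)
```

With OR, a page mentioning only one of several query words still matches, which is why a growing index tends to produce more noise under OR than under AND.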

Time to try another query, such as How does Gigablast make money?.
The first listing is a meta-search result page with a list of work at home businesses (MLMs), a few of which were pulled from Gigablast. Not so good. I looked through all ten of the results on Gigablast’s first page of results and none of them were relevant. Results 8, 9, and 10 are all the same entry from John Battelle’s blog and the only place the word “Gigablast” appears is in his list of search companies in the menu section of the page.

But the first Giga Bit is intriguing: “arrangement with Google”. So, by using that modifier, we get how does gigablast make money? “arrangement with Google”. The first two results are duplicates, as in the same content and same title, but one is hosted by Resource Shelf and the other by Free Pint. A tough thing for engines to detect, but nonetheless a bad user experience because they offer no differentiating content. Plus, the content is not relevant. The site mentions Gigablast and it mentions Google, but I thought maybe I’d find some juicy bit about the two companies working together, but nope, nothing like that. Aggh, but it keeps getting worse because not only listings 1 and 2, but all of 1 – 7 are actually the same site, and they all have the same title too. The fourth result is a TinyURL which probably should not be indexed in the first place. #7 is a redirect that takes you right to the same site. For #8 the page discusses both Gigablast and Google, but offers no insights into them having any kind of agreement together. Afraid I was led on a wild goose chase.

I also noticed that on the top of the results page it says, “Results 1 to 8 of about 113,” but it is interesting the way they have done this. Because at the bottom of the page it says, “No more results found. Show relevant partial matches for your query.” But when I click on that I get “Results 9 to 18 of about 231.” So now I am a confused user. When it said 8 of 113 I thought that the 8 were AND matches and the remainder were partial matches, however when I clicked on the partial match link I got 231 results. What happened to 113? Where did that go? Not a huge deal, especially since I think other engines get these numbers wrong sometimes too, but it is distracting nonetheless.

For the next Giga Bit refinement, how does gigablast make money? “Matt Wells”, the first result is a Search Engine Watch interview with him from September, 2003, and it has a great answer to my question:

Q. From a business perspective, Gigablast carries no advertising? Is this a decision you plan to keep? How does Gigablast make money?

A: Money is derived from selling search services on my products page. At this point I don’t think I’ll put up advertisements unless I need the revenue to support Gigablast or myself.

Sometimes, the Giga Bit suggestions are pretty random, like how does gigablast make money? “going to make decisions”. There are four results to this strange query. Three of them are the same John Battelle posting and the fourth is from Geeking with Greg Linden. Greg’s posting is indeed about Gigablast, but not about how they make money. There are other posts about other companies like Google and Technorati making money, but not about Gigablast making money.

Some comments about these duplicate and non-relevant results. It seems that Gigablast is indexing and keeping multiple snapshots of the same site. For the above example, the duplicate listing was indexed in September, November and then in December. Although the site may have been updated during those months, the page is the same page. There is only one page. The difference is that the URLs are not being normalized. So here is what two of the three indexed URLs look like:
http://www.battellemedia.com/archives/000627.php
http://battellemedia.com/archives/000627.php

As you can tell, they are nearly identical except that one has WWW. In certain rare cases these could actually be different, but in this case they are the same. The third instance of this site is this:
http://www.snipurl.com/62pj

I think Gigablast could do a couple of things to resolve this situation. If they have the bandwidth they could compare page content and if there is enough overlap, and the URL is nearly identical, they could conclude it is the same page and only show it once, or at least offer it as a cluster. I understand though that site content comparisons are tough and the devil is in the details. The other thing that might work is to compare the display text for the results. For all three of them, the displayed text is exactly the same. That should be a dead giveaway. And the last thing is to not index Tiny URLs and Snip URLs at all. Although these are handy features for capturing long URLs, I think engines should follow them through and only gather the original URL. Plus, in this case, the original URLs are not even that long.
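To make two of these suggestions concrete, here is a rough Python sketch of collapsing results by normalized URL and by identical display text. It is my own illustration, not anything Gigablast has published: the function names are made up, and dropping a leading "www." is an assumption that, as noted above, can be wrong in rare cases. It also does not follow redirects, so a shortened URL is only caught here because its display text matches.

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Build a comparison key: lowercase host without a leading 'www.', plus path and query."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: www and non-www hosts serve the same page
    path = parts.path.rstrip("/") or "/"
    return host + path + ("?" + parts.query if parts.query else "")

def dedupe(results):
    """Keep the first result for each normalized URL or identical snippet.

    `results` is a list of (url, snippet) pairs in ranked order.
    """
    seen_urls, seen_snippets, kept = set(), set(), []
    for url, snippet in results:
        key = normalize_url(url)
        text = " ".join(snippet.lower().split())  # ignore case and spacing differences
        if key in seen_urls or text in seen_snippets:
            continue
        seen_urls.add(key)
        seen_snippets.add(text)
        kept.append((url, snippet))
    return kept
```

Run against the three URLs above with identical display text, this would keep only the first listing. A real engine would more likely cluster the duplicates than silently drop them, and an exact-match test on display text is much cruder than a true content comparison.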

The interesting thing about this de-duplicating business is that for their XML feeds Gigablast offers the option of de-duplicating results. However, I could not find a way to add the parameter to a regular search string. If I am missing something here, please let me know because it seems odd that for RSS searching this functionality is implemented, but not for regular web searching.

I also wanted to modify my query a bit more, so I tried Gigablast business model.
Result #1 is a collection of entries on a search engine blog. There is an entry about Gigablast, but the term “business model” pertains to another entry about a different engine. Same with #s 2, 3 and 10, which are all duplicates of each other. #8 and #9 are a different set, but same problems. Worth noting is that both sets of duplicates are from Resource Shelf. Why is that? The results are not relevant and they are also duplicates of each other. But then I hit gold with #6, well not exactly gold, but a direct path to gold. It is a SiteLines posting by Rita Vine referring readers to an article by Gwen Harris from July/August, 2004. And here is exactly the information I have been looking for:

At present Wells runs Gigablast without any keyword-activated advertising: there are no banner ads and no sponsored links. This isn’t a matter of principle - it’s just that advertisements slow down the query response time.
Income comes from selling the technology. As Wells explains, he has “built Gigablast to be more efficient than the other engines” to save on time and hardware. Webmasters will be interested in his product line for creating indexes or integrating Gigablast web search results.

So, great I got my answer. But what happened that Gigablast did not take me right to the info I needed? So close, but instead of taking me right to the answer to my question it took me to a site that linked to the answer. The reason in this instance is quite straightforward. Nowhere in Gwen Harris’ article does she use the words “business” or “model”. But fortunately the referring site used the phrase “business model”. I don’t know what Gigablast could do better in a situation like this. If I had continued to modify my query I eventually would have gotten directly to it because I did find the site in Gigablast’s index.

CONCLUSION
I am a fan of Gigablast. I have a lot of respect for what Matt Wells has done. He set out on his own to write a new search engine and he did just that. He scaled the technology so that he can index a billion pages with less hardware and financial investment than the big engines, though of course he still has a ways to go to catch up with the big boys. It seems the crawling and indexing parts of Gigablast are its main focus and strengths. Unfortunately, it is lagging right now in its algorithm and search results.

As with many engines, Gigablast has created XML feeds, but I would rather see them improving their results before adding extras like this. This happens at other engines as well. Even MSN has been building RSS feeds and their relevance needs some work too. (See my earlier MSN review.) I suppose the reason engines are doing this is because it is easier to implement RSS feeds than to improve the elusive beast called relevancy. Plus it may be that different engineers work on each of these, so adding XML feeds in no way impedes the progress of other initiatives like relevancy. Whatever the case may be, things like RSS search feeds are great, I love them and subscribe to several from Technorati and Google, but they are used by the few, not the many, and they are only as useful as the quality of the engine’s results.

What I saw in Gigablast’s results was the following:
• Many duplicate sets
• Many results from one or two sources
• Poor matching for multiple terms. In other words, Gigablast noticed the words in my query on a page, but the words were spread out and not in proximity to each other.
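
On that last point, here is a small Python sketch of one way to measure proximity: the shortest run of words that contains every query term. This is an illustration under my own assumptions (whitespace tokenizing, no handling of punctuation, a made-up function name), not a description of Gigablast's ranking.

```python
def smallest_window(query, page_text):
    """Return the length in words of the shortest span containing every query term.

    Returns None if any query term is missing from the page.
    """
    terms = set(query.lower().split())
    words = page_text.lower().split()
    last_seen = {}  # most recent position of each query term
    best = None
    for i, word in enumerate(words):
        if word in terms:
            last_seen[word] = i
            if len(last_seen) == len(terms):
                # Shortest span ending here starts at the oldest "last seen" position.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best
```

A page where "business" and "model" sit next to each other gets a window of 2, while a page where they are in separate entries gets a much larger number, which a ranking function could penalize.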

With some improvements to things like this, Gigablast can be a very nice homegrown engine and alternative to the big fellas.

1/7/2005

Findory - Interview with CEO Greg Linden

Filed under: — Chris

The Search Lounge is very pleased to feature an exclusive interview with Greg Linden, founder and CEO of Findory. Findory is a unique service. It’s not quite a search engine, it’s not quite an RSS subscription tool, and it’s not quite a news aggregator site. So what is it then? I’ll quote Greg from an email he sent me: “…Findory isn’t a normal search engine. The primary focus of Findory isn’t search. The primary focus is discovery. The site learns your interests from the articles you read, searches thousands of sources for you, and surfaces interesting news articles and new sources. It’s like a newspaper built just for you. Quite a bit different than your average search engine.”

Findory shows you blog entries and news stories that match your reading patterns. And you do not have to enter a single bit of information about yourself. You do not even need to register to use the basic features. All you have to do is visit Findory’s homepage and either search or browse the listings. Then every time you click on a link to read it, Findory (I think) analyzes the page’s content and also checks to see who else clicked on that link and what else they have clicked on. Then when you return to the Findory.com homepage there will be links that Findory thinks you will be interested in. Although the back-end technology may not be crystal clear, it is actually very simple and powerful to use.
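Since I am only guessing at the back end, here is a minimal Python sketch of the "who else clicked on that link" half of my guess. Everything in it (the `recommend` function, the shape of the `clicks` data, scoring by number of shared articles) is my own assumption and not Findory's actual algorithm.

```python
from collections import Counter

def recommend(reader, clicks, top_n=3):
    """Suggest articles read by people who share at least one article with `reader`.

    `clicks` maps a reader id to the set of article ids that reader clicked.
    """
    mine = clicks.get(reader, set())
    scores = Counter()
    for other, theirs in clicks.items():
        if other == reader:
            continue
        overlap = len(mine & theirs)  # how many articles we have in common
        if overlap == 0:
            continue
        for article in theirs - mine:
            scores[article] += overlap  # weight by how similar the other reader is
    return [article for article, _ in scores.most_common(top_n)]
```

The reader never fills in a profile here; the only input is the click history, which matches the no-registration experience described above.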

Greg also maintains a very good blog that I read constantly called Geeking with Greg. He writes about search, RSS, and other Internet topics. For several years Greg worked on personalization features for Amazon.com.

This interview was conducted exclusively via email.

Hi Greg, thanks for joining us at the Search Lounge. It is a pleasure to have you with us. Would you mind starting off by giving a bit of background information about how Findory fits into a user’s repertoire of online information tools? In other words, what are the benefits for those readers who have yet to try Findory?

Imagine the front page of a newspaper unique to you, emphasizing the news of the day you need to see. Findory is a personalized newspaper that learns your interests and builds a front page of news stories specifically for you.

Findory helps you read the news faster and more efficiently. Rather than skimming many sites to try to find the news of the day, Findory brings all the daily news to one spot, sorting news from thousands of worldwide sources. You will find articles you might otherwise miss while getting broad coverage of major news events.

Unlike news aggregators such as Google News, Findory is personalized to you, focusing your attention on the news you need. Unlike customized news sites such as My Yahoo, Findory requires no effort to use. You do not have to specify categories or keywords; Findory learns what news you want to see just from the articles you read.

Any plans to offer users a personalized interface to go with personalized listings? As a user, I would be particularly interested in changing the category ontology on the top page to promote subjects I am interested in.

It’s a great suggestion. We have built Findory to be super easy to use. No effort. No registration. Just read news and the front page gets better and better.

But some of our readers are interested in customizing the Findory front page. We will be launching more customization features over time, but our site will always remain focused on being easy to use, no effort required.

Would you mind shedding some light on whether Findory crawls and indexes its own search results, or whether they are pulled from other engines? And as a follow-up, is the search algorithm built in-house?

Findory has its own crawl of thousands of news sources and weblogs.

Our personalization technology was designed and developed by Findory. The personalization engine and news and weblog search engines were built in-house. The web search engine personalizes Google web search results.

When I search on Findory I have the option to search news, blogs, or the web. There does not seem to be an option to search a combination of those sources. As a user, I am less concerned about which source bucket the information comes from, than I am about the information itself. Any chance you might offer users an advanced search option to handle that? And how about advanced syntax?

It’s a great suggestion. We’ve designed our search engine to be simple, fast, and easy to use. So, while it is true that there are no advanced search options, the news, weblog, and web searches are quite unusual in that they are all personalized. Different users doing the same search on our site will see different search results, all depending on their history and their interests.

As an outsider looking in, and a user, my impression is that when I click on a link, Findory analyzes the keyword content of the site and looks for other sites with overlapping terms. Findory also does something like Amazon’s “Customers with similar searches also purchased…” functionality, except in this case it is “Other users who read this article also read…” I realize you are not able to give away the secrets of your technology, but what can you share with us about it?

At a high level, we look at what other readers like you are reading to recommend news stories to you. It’s a community. Imagine if friends of yours using Findory were constantly recommending articles to you. Now imagine that those friends are found automatically for you. That’s how Findory works.

How does Findory evaluate its personalized listings and search results in order to improve? In other words, is there any kind of testing or evaluation to determine the relevancy of the listings shown to users?

Great question. We’re constantly testing refinements to our algorithm and trying to improve on it. As we get more and more users of Findory, the quality of our recommendations gets better and better. Findory builds on the strength of its community.

What subjects is Findory particularly good at surfacing for users? Any areas that need improvement?

The quality doesn’t seem to vary by subject, so I’m not sure this is a real issue for us. One area where we are seeking to improve is our weblog coverage. We have thousands of weblogs in our database, but that represents just a small fraction of the weblog content out there. We are working aggressively to extend our crawl.

Sometimes I want to be able to give feedback to the Findory engine when a site is not relevant. For example, I searched for San Francisco because as a San Franciscan I thought I would help Findory by focusing it on my location. However, the next time I used Findory there was an article about the San Francisco 49ers firing their head coach. And the next time I went back the lead article was about the NFL. In that case, Findory was close with the 49ers article, but in the end it was not relevant because I am not a football fan. Any plans to offer this kind of feedback loop?

It’s a great idea. We’re considering customization features such as being able to rate articles or sources or being able to say “not interested” on articles. We are looking at ways to do this that don’t interfere with the main purpose of Findory, reading news.

Do you see a way for regular search engines to integrate your technology?

Absolutely. Our technology is designed to be highly scalable. Despite our complicated personalization, the online portion of our processing only takes tens of milliseconds. We want Findory’s personalization to be helping millions of readers find the information they need.

Blog searching by companies like Findory, Daypop, and Technorati, to name just a few, has become one of the hot search topics these days. How do you see blog searching fitting into the overall search engine industry in the next year or two? What about real-time indexing?

The real issue with weblogs is finding good content. Weblogs are self-published. No publisher means no filter. That’s a good thing and a bad thing. It’s good because it opens the floodgates for so much new content. It’s bad because those filters are sometimes useful, helping readers differentiate useful from useless. This problem will only get worse and worse as the blogging phenomenon accelerates.

Findory is all about relevance. In a sea of information, how do you surface what people need? Current web and weblog search engines all use the same relevance rank for all searchers, but not everyone has the same definition of relevance. Findory learns your interests - what is relevant for you - and surfaces that content.

Some have said 2005 will be the year search becomes personal. Findory has already taken the first steps.

I have yet to see a sponsored result or any banner ad on Findory. Would you mind commenting about Findory’s business model?

Findory currently has no advertising. We have designed a personalized advertising engine that targets based on readers’ interests, much like Google AdWords but more targeted. But, readers aren’t huge fans of advertising. At this point, we prefer to work on features our readers want. When we need to launch advertising to support our website, we’ll be ready.

Lastly, what’s your favorite drink?

Favorite drink? It’d have to be coffee. Caffeine is what keeps us geeks going.

Thanks Greg. Are there any other comments you’d like to add?

Thanks, Chris! Glad to hear you’re enjoying Findory!

1/2/2005

Jobs In Search - Interview with Owner Mike Taylor

Filed under: — Chris

The Search Lounge is very pleased to feature an exclusive interview with Mike Taylor, founder of Jobs In Search.com. Jobs In Search is located in the United Kingdom and bills itself as “The specialist job site for the Search Engine Industry”. As well as having listings for jobs at search engine companies, they also list positions from a variety of search-related companies, including those that do search engine optimization, online marketing, and web design. All features are free for job seekers, including registering. The site has been live since October, 2004 and in that short time I have seen the U.S. listings go from none to more than twenty.

Mike has a background in online recruitment, and to find out more about him check out his blog, including a more comprehensive biography.

This interview was conducted exclusively via email.

Hi Mike, thanks for joining us at the Search Lounge. It’s a pleasure to have you with us. Would you mind starting off by giving a bit of background information about how you came up with the idea of Jobs in Search?

Thanks for the invite Chris. I am pleased to take part.

My background is mainly HR and recruitment, having worked for large blue-chip companies such as IBM, Motorola and Nokia.

It was during my time at Nokia (1999 - 2001) that I first got involved with online recruitment. I ended up being responsible for Nokia UK’s online recruitment strategy and I also served on Nokia’s global Internet recruitment strategy board.

After leaving Nokia in 2001 I set up Web-Based-Recruitment, an online recruitment consultancy offering advice to companies on how to use the Internet to attract and recruit new employees.

As an extension to the online recruitment part of the business I also started Domain Attraction, an online marketing business providing search engine marketing services.

I had the idea for JobsInSearch.com in the middle of 2004 as it enabled me to draw on my background and experience of online recruitment and online marketing.

Is Jobs In Search a full-time job for you, or do you have another 9 to 5 job?

I am now working full time on Jobs In Search but I still run some search engine marketing campaigns for my existing clients.

How do you define the term “search industry”? And this is probably a tough follow-up question, but do you have a rough estimate as to how many people currently work in this industry?

My personal definition (as far as Jobs In Search is concerned) is that the “search engine industry” covers all companies that are involved, or connected with, the term “search”.

This includes the Search Engines, Search Engine Optimization and Search Engine Marketing firms, New Media Agencies, firms providing search engine related software (positioning / analytics etc.) and Web Design firms.

As for how many people work in the search engine industry this was a question we asked ourselves when first starting the project. Although you can obtain figures on the number of people working with the large search engines, finding figures for the smaller search related firms proved very difficult.

Which job areas and skills do you see as being in the highest demand, both now and in the near future?

From the companies we have talked to we sense a shortage of good search engine optimization and search engine marketing professionals. As “search” continues to grow we also expect more and more companies to be recruiting for business development roles in the future.

How do you publicize your site so that people in the industry know about it? And what kind of response are you getting from hiring companies and recruiters? How about from job seekers?

We issued press releases related to the official launch which got picked up by various search engine related news and blog sites. We have also issued some further press releases since which have also been widely reported.

As you would expect we are also trying to maximize our natural search engine ranking positions as well as using pay per click. We also launched our own blog site in November 2004.

The response so far has been very encouraging as we have had a number of high profile companies contact us having seen our press releases. As for job seekers we are continuing to see the number of registrations on the site increase, which suggests that word is getting out there about the site.

I notice the majority of your jobs are for the UK (currently more than forty jobs are listed), which is natural considering that is where you’re located. Do you have plans to expand the international listings? Has it been a challenge for you to make contacts with companies in other countries?

The plan all along was to be an international job site. You are correct in that we started with UK job listings but we expect the majority of jobs to be US based.

As for making contact with companies outside the UK it has not been a problem for us. All of the companies we have spoken to so far were happy to receive further information about our services.

Job seekers have the option of registering with you and sending in their resume and contact information. Can you explain the process and what happens after a job seeker sends in his or her resume in terms of hiring companies and recruiters seeing it?

The first step is that job seekers can register for our jobs by email service and we will send them details of any new jobs as soon as they are posted on the site that match their requirements.

In addition they can also register their resume with us and choose whether or not it can be viewed by companies and recruiters. We do not allow any companies or recruiters to view any resumes without the job seeker’s permission.

How does Jobs In Search work for a hiring company or a recruiter? Do they pay subscription dues? Or do they pay per job posted or per screened applicant? Or do they pay per successful hire? Or some other way?

To advertise on Jobs In Search companies or recruiters are required to purchase job credits. Each job credit lasts for 30 days from the date of posting. As there is no time limit on the use of job credits savings can be made by purchasing multiple job credits in advance.

Having researched the market for the cost of job postings we feel that we offer a very fair price for a specialist job site offering targeted job seeker traffic.

I noticed you have a jobs by email feature, but I didn’t notice an RSS feed option. Is that something you’re considering? Also, we’d love to hear about any other features and functionality improvements that you’re working on. For instance, can I assume that as you get more listings you will break down the geographical regions beyond the country level?

We are currently looking into RSS feeds and hope to add this feature in early 2005. As for other features we have more “Meet The Experts” interview profiles to add in January 2005 and beyond. We may also add an additional “training” feature in 2005 plus more information for job seekers about resume preparation etc.

As for functionality you can currently search for jobs in the major countries by state or county. However, as there is a lot of interest in “local search” we may consider more “local” options in the future.

Lastly, what’s your favorite drink?

A cup of tea!

Thanks Mike!
