A Call to Arms
[Part 3 of a series about relevancy.]
Two years ago, on December 5, 2002, I was working at LookSmart when Danny Sullivan at Search Engine Watch published a short piece called “In Search of the Relevancy Figure.” He wrote:
“Where are the relevancy figures? While relevancy is the most important ‘feature’ a search engine can offer, there sadly remains no widely-accepted measure of how relevant the different search engines are. As we shall see, turning relevancy into an easily digested figure is a huge challenge, but it’s a challenge the search engine industry needs to overcome, for its own good and that of consumers.”
It was a call to arms for the search industry to come together and figure out acceptable relevancy metrics. Enough with the empty claims about relevancy; it was time to set standards so that the public could know definitively which engine was the most relevant. It is a noble, but nearly impossible, idea.
At the time I was running a team that compared the relevancy of search results from a variety of engines. Our insights were used by executives to make business decisions and by the search engineering team to help improve the company’s own algorithms. Danny Sullivan’s article was sent around the company and heavily commented upon, but in the end we agreed that relevancy figures need to come from the outside. We could advise on our methodologies and analyses, but that was all. After all, would a newspaper trust a book critic who worked for a publisher to review one of that publisher’s books? No, and neither will the public fully embrace a relevancy figure generated by a consortium of search engine companies, no matter how good the intentions and methodologies are.
The single biggest problem with relevancy figures is the devastating, and in some cases illegitimate, damage they can cause a search company. Relevancy is not a universal figure; it is always subjective. It is not one magic number that encompasses all queries for all people. I am not arguing against reviews and criticism of search results; after all, critiques can provide search companies with solid analysis to build upon, as is my goal with the Search Lounge. But if Time Magazine or Newsweek published a cover article saying one engine is far and away the most relevant, imagine the effects. Users would desert the other engines and flock to that engine. And that is not right. Users need to use the engine that is best for each unique information need they have.
As one former colleague astutely pointed out to me, it is similar to what happens when magazines publish lists of top universities, top hospitals, top doctors, and so on. The rankings are generalized, and with the publicity and hype that come along with them, the winners get to make the rules. But each student and patient has unique needs that may or may not be best served by the university, hospital, or doctor that ranks the highest. The same can be said for search.
Relevancy analyses often comprise multiple sections or tests. There may be a part that looks at certain types of queries, such as geographical or shopping queries, at queries with a certain number of words, at natural language queries, popular queries, news stories, ambiguous queries, and on and on. One engine may lose the overall relevancy test to another engine but might win for local queries because it has targeted zip codes and city-level results. So if every user abandons ship and only uses the overall “winner”, then for local searches they will be getting inferior results. This notion can be taken down to the specific query level, where an engine may have good results for chess openings and bad results for chess books. You be the judge! The point I am making here is that an overall relevancy figure sabotages the end goal of helping searchers.
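The overall-versus-per-category tension can be sketched with a few hypothetical numbers. The engines, categories, scores, and weights below are all invented for illustration; the point is only that one blended figure can crown a “winner” while hiding a category where it clearly loses:

```python
# Hypothetical relevancy scores (0-100) per query category.
# All names and numbers are invented for illustration only.
scores = {
    "EngineA": {"general": 90, "shopping": 85, "local": 60},
    "EngineB": {"general": 80, "shopping": 75, "local": 95},
}

# Weight each category by its (hypothetical) share of query traffic.
weights = {"general": 0.6, "shopping": 0.3, "local": 0.1}

def overall(engine):
    """Weighted average across categories -- one single 'relevancy figure'."""
    return sum(scores[engine][cat] * w for cat, w in weights.items())

for engine in scores:
    print(engine, round(overall(engine), 1))
# EngineA wins overall (85.5 vs 80.0), yet EngineB is far better
# for local queries (95 vs 60) -- the single figure hides that.
```

A searcher who picks the overall winner for every query would get the worse engine for local searches, which is exactly the problem with publishing one figure.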
Another major problem is the number of possible ways to evaluate the relevancy of search results. And I guarantee there is absolutely no way the industry – that is to say the major search engines, namely Ask, Google, MSN, and Yahoo – would ever agree on one relevancy figure. It just will not happen. Think of these analogies: are you getting all the relevant newspaper articles on a topic if you read only one newspaper? Are you watching the funniest sitcoms on TV if you’re only watching one station? And most pertinent to this topic, are you getting the best books on a topic if you only visit one library or bookstore? The answer to all of these is a resounding NO. And the same is true of search engines. You are absolutely, completely, definitely not going to get all the best results on one search engine.
Another problem is frequency of analysis. Engines update and release new products so often that it is a full-time job keeping up. Plus there are new search engines that go live every month. I do not have the time to fully analyze and do a Search Lounge review for every single new improvement and release, but I can run one or two or even five queries on a new release or new engine to see if it passes the acceptability barrier. These days most, but not all, engines meet a minimum threshold of acceptability. But a minimum threshold is far from good. And even an engine that is not passable may update its index the very next day and overnight the results may be dramatically better.
The back end behind each engine’s crawling, indexing, and algorithm technologies is far too complex to produce the same results, and the queries each person will enter during multiple search sessions are too diverse. There are simply too many variables. I’ll throw out one last analogy (I promise): if four people are told to make chocolate chip cookies, will all four batches taste the same? No. And going one step further, will the tasters all agree on which one is the best? Maybe, but probably not, assuming they all meet at least a minimum threshold of quality. So even if a report is released saying that an engine is the most relevant as judged by a fully objective, scientific study, the counterattacks from the other engines will be swift and oftentimes legitimate. The media would be awash in a blizzard of PR releases explaining why the test was incorrect, why the winner did not really win, and why the losing engine is actually more relevant and getting better every day. And there we are, right back where we started, with searchers unable to trust corporate press releases.
Next Installment – Part 4: Using Different Engines
Search Engine Relevancy. Part 1: Defining Relevancy
Search Engine Relevancy. Part 2: The Jaded Surfer