Non-search Search Companies

The media and the public think search engine relevancy started with Google. It didn’t. What set Google apart from other search companies was that it built a whole business around being good at Search, and for the first several years it didn’t get distracted. All the other early search companies, such as Excite, Lycos, Snap, LookSmart, Hotbot, Go, Infoseek, AltaVista, Inktomi, Ask Jeeves, Yahoo, and MSN, lived a dual life. They said they were search companies, but in reality most of these companies were ruled by business development and marketing. They were non-search search companies.

At each of those companies there were hidden people – engineers, product managers, and editors – who were trying to build relevant search. But the business side of these companies had the ears of the executives, and they were whispering that the product only needed to be decent to bring users. After all, in the mid and late 90s users were happy enough just to know one place to go when they got online, and if they knew how to type in *insert company name from above*.com that was fine for the business arms.

Some of the big players, like Yahoo and MSN, who did believe in providing a good user experience, outsourced much of their search technology. And the good engines, AV and Inktomi, didn’t know what to do with a good thing. Had they focused on public-facing algorithmic search, instead of portalization and enterprise search without a consumer-facing interface, either of them could have been king.

Google came along and said look, all you guys are doing all these other things, but we’re going to do Search. In the beginning they didn’t know how to make money out of it, so they outsourced it. They also had google.com, so while their competitors distributed their search, users realized they could go right to the source. In those early days was Google more relevant than AV, Inktomi, and IBM’s Clever? There’s no reason any of those other products couldn’t have kept up in the race if they’d had the same business-side support from their companies.

Those of us who were the hidden people in non-search search companies, we talked about how search was neglected. But for the execs, search was not something to make money off of (in business speak: monetization), search was something you had to have or else people wouldn’t come to your portal and look at your banner ads (business speak: impressions) or read your articles (business speak: content). And so while all the non-search search companies slept, Google built Search.

What’s the point of all this? The point is that the people who advocated for search and relevancy existed in the companies mentioned above, but they were under-nourished in their corporate diaspora. Now search makes business sense, and the non-search search companies (those still in business along with a host of new ones) are all working on Search. Here at Yahoo, many of the hidden people have gathered from their respective non-search search companies and the result is a group whose sole mission is improving Search.

The search cycle

Search goes like this:

1. CREATE. A new medium becomes popular.
2. SHARE. People want to find and share objects created in the new medium.
3. ORGANIZE. Manual systems are created to organize the new medium.
4. SEARCH. Manual systems become stale and can’t scale quickly enough so automated systems are built.

Looking at an example, let’s start with web search:

1. CREATE. Lots of people started creating web pages.
2. SHARE. Users tried to find information on the web.
3. ORGANIZE. Web directories were built and became popular.
4. SEARCH. Web directories go out of fashion as search engines improve and scale.
What’s next? Some people think personalization and customization, but I think we’re still working on #4.

How about another example, image search:

1. CREATE. Digital photos become popular.
2. SHARE. People want to share their photos.
3. ORGANIZE. Tagging systems like flickr become popular for organizing photos.
4. SEARCH. Search engines like Yahoo and Google search text associated with photos. What’s next for automated ways to search images?

One more example:

1. CREATE. The rise of the bloggers.
2. SHARE. Bloggers want readers.
3. ORGANIZE. People build their own blog rolls. Also blog directories are built.
4. SEARCH. Blog (RSS) search engines like Technorati do real-time indexing.

I’m anxious to see how the cycle works with video content.

When Giant Directories Roamed the Earth

Written by guest contributor Dave Jansik

Back in the day, directories were the definitive sources for finding quality content on the Web. Following the success of Yahoo, several companies and volunteer-driven efforts sprang up to get a piece of the action. The concept: we do all the searching ahead of time for our users, we establish proprietary traffic, and then the inevitable follow up: we charge businesses for inclusion in the directory. Brilliant. Simple.

Over the last six years, I have been involved in building one of the largest editorially and volunteer-driven directories of Web sites. LookSmart’s founders recognized the need to “sort through the crap” that search engines were spewing out in the late 90s. At the time, their intentions were pure and good. The bubble was growing when Microsoft contracted LookSmart to create comprehensive directories of all the best sites out there. With all that money flowing in, we could afford to hire over 100 editors and ontologists worldwide. With the acquisition of Zeal.com we enabled a Web-based platform to harness the skills of hundreds of volunteers to add non-commercial sites and manage categories. Some were in it to get traffic for their own sites while others truly wanted to help build a great search experience. They searched for sites that were determined to be the authority on a topic. Descriptions and titles were hand-written one at a time. The platform enabled users to add numerous types of metadata, link sites and categories, perform mass edit functions, and easily find transgressions such as empty categories. Soon after, we began a paid inclusion service that played off the huge traffic numbers LookSmart received through its distribution on MSN. Thousands of companies were given the right to have as many product pages as they wanted placed in the directory. Suffice it to say that from a quality standpoint, directory search results were heavy on commercial and light on the quality we first intended. I was on the Ontology team, whose task was to create and manage an ontology that accommodated what ended up being over 2 million commercial and non-commercial sites neatly organized into a highly intuitive 250,000-category ontology.

When Google entered the fray, directories came to be considered limited in scope. Many specific queries – the quick answers – couldn’t be answered by a directory. Google got you what you wanted in that crucial first five site returns. LookSmart responded by acquiring WiseNut and Grub.org to position itself in the search arena. MSN recognized the need to compete with Google, dropped our contract in favor of building their own search engine from scratch, and we were sent scrambling to find a way to replace seventy percent of our revenue because MSN accounted for the bulk of our user traffic. Without Microsoft, resources to maintain our search products were pared. Editorial took a big hit, and with such a large directory on our hands the remaining staff and dwindling number of Zeal volunteers couldn’t keep up with the exponential growth of the Web.

It’s 2005 and most of the editors are gone. I have spent much time clearing out most of the commercial sites that once challenged our credibility. A handful of passionate and dedicated volunteers continue to shape and develop the directory – some oversee its vast landscape while others continue to focus on the topics they hold dear. It is their hobby and their community of friends. It’s their baby. Still, fly-by-night contributors and savvy SEOs endure the Zeal quiz solely to submit their sites knowing that being in LookSmart’s directory will boost Google PageRank.

So, what is the future of the directory? The web is a vast, vast place. Will search engines be able to deliver specific content out of the billions of pages they are able to index? Those of us on the directory bandwagon feel that contextual search and browse tools will always be essential for researching the Web. Consider how useful it is for a searcher looking for everything on the paper and packaging industry to have neatly organized categories of guides, associations, major companies, professional journals, and equipment suppliers already found for him. Kids can search for information on mummies and then climb back up or across the directory to find facts on King Tut, the pyramids, and Cleopatra. People who aren’t sure where to go on vacation can look up official travel guides to hundreds of cities worldwide and see what kind of hotels, entertainment, activities, and restaurants are there before booking.

Here are a few other uses and features that can keep directories relevant, interesting, and even profitable:
Search engine seeding. From a technical standpoint, it is widely acknowledged that search engines can use directories “seeded” with relevant URLs to improve relevancy.
Marketing to a specific segment of Web users by creating verticals. The best of the Web has already been categorized – it’s a great place to start zeroing in on an audience and advertisers. Freely distribute the category structure to other verticals to build traffic.
Making them more consumer-friendly for users and contributors. Add simple site rating and ranking tools that bring the best sites to the top of category site lists, incorporate message boards to share ideas and suggest other sites to fellow users, make it easy to submit a new site, offer basic search engine features such as preview panes and highlighted keywords.
Addressing the staleness issue. Include topical RSS feeds, news feeds, or links to community bookmarking services such as Furl. Once a directory is built it is fairly easy for a small staff to maintain and add new content.

It will be interesting for me to see if directories last as a useful research tool. Yahoo’s directory, once their crown jewel, now occupies a smidge of real estate towards the bottom of their homepage. There are other players popping up solely for the submission money, though just about all of them are so thin on content that one visit is plenty. Bookmarking services like del.icio.us and Furl are helping us dynamically learn about what people find interesting on the Web, yet there are issues with the free-wheeling and personalized metadata, or folksonomies, that users of these services employ. No one has invented the perfect, omniscient search tool yet. And ya know, I wouldn’t be surprised if the person who creates one gets their idea while in a directory.

***
Dave Jansik is a Senior Ontologist and Content Producer at LookSmart.com. He has worked on Web directories for 6 years.
[Editor’s note: I’d like to thank my friend Dave for taking the time to put this commentary together. He has been a quiet advocate for helping people find information on the Web for years and it’s great to have him sharing his thoughts with us.]

Search Engine Relevancy. Part 3: A Call to Arms

A Call to Arms

[Part 3 of a series about relevancy.]

Two years ago, on December 5, 2002, I was working at LookSmart when Danny Sullivan at Search Engine Watch published a short piece called “In Search of the Relevancy Figure.” He wrote:

“Where are the relevancy figures? While relevancy is the most important ‘feature’ a search engine can offer, there sadly remains no widely-accepted measure of how relevant the different search engines are. As we shall see, turning relevancy into an easily digested figure is a huge challenge, but it’s a challenge the search engine industry needs to overcome, for its own good and that of consumers.”

It was a call to arms for the search industry to come together and figure out acceptable relevancy metrics. Enough with the empty claims about relevancy; it was time that some standards were set in place so that the public could know definitively which engine was the most relevant. It is a noble, but nearly impossible, idea.

At the time I was running a team that compared the relevancy of search results from a variety of engines. Our insights were used by the executives to make business decisions and by the search engineering team to help improve the company’s own algorithms. Danny Sullivan’s article was sent around the company and commented heavily upon, but in the end we agreed that relevancy figures need to come from the outside. We could advise about our methodologies and analyses, but that was all. After all, would a newspaper trust a book critic who worked for a publisher to review one of that publisher’s books? No, and neither will the public fully embrace a relevancy figure generated by a consortium of search engine companies, no matter how good the intentions and methodologies are.

The single biggest problem with relevancy figures is the devastating, and in some cases illegitimate, damage they can cause a search company. Relevancy is not a universal figure; it is always subjective. It is not one magic number that encompasses all queries for all people. I am not arguing against reviews and criticism of search results; after all, critiques can provide search companies with solid analysis to build upon, as is my goal with the Search Lounge. But if Time Magazine or Newsweek published a cover article saying one engine is far and away the most relevant, imagine the effects. Users would desert the other engines and flock to that engine. And that is not right. Users need to use the engine that is best for each unique information need they have.

As one former colleague astutely pointed out to me, it is similar to what happens when magazines publish lists of top universities, top hospitals, top doctors, etc. The rankings are generalized and with the publicity and hype that comes along with them, the winners get to make the rules. But each student and patient has unique needs that may or may not be best served by the university, hospital, or doctor that ranks the highest. The same can be said for search.

Relevancy analyses are often composed of multiple sections or tests. There may be a part that looks at certain types of queries, such as geographical or shopping, or at queries with a certain number of words, or at natural language queries, or popular queries, or news stories, or ambiguous queries, and on and on. One engine may lose the overall relevancy test to another engine, but might win for local queries because it has targeted zip codes and city-level results. So if every user abandons ship and only uses the overall “winner”, then for local searches they will be getting inferior results. This notion can be taken down to the specific query level where an engine may have good results for chess openings and bad results for chess books. You be the judge! The point I am making here is that an overall relevancy figure sabotages the end goal of helping searchers.
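
To make the point concrete, here is a minimal sketch in Python. The engine names, the categories, and every score are invented purely for illustration; they are not measurements of any real engine, and a simple unweighted average is only one assumed way of rolling scores up into an overall figure.

```python
from statistics import mean

# Hypothetical relevancy scores (0 to 1) per query category.
# All names and numbers are made up for illustration only.
scores = {
    "Engine A": {"local": 0.55, "shopping": 0.85, "news": 0.85},
    "Engine B": {"local": 0.90, "shopping": 0.70, "news": 0.60},
}

def overall_winner(scores):
    """Return the engine with the highest unweighted mean across categories."""
    return max(scores, key=lambda engine: mean(scores[engine].values()))

def category_winners(scores):
    """Return the best-scoring engine for each individual query category."""
    categories = next(iter(scores.values())).keys()
    return {
        category: max(scores, key=lambda engine: scores[engine][category])
        for category in categories
    }

print("Overall:", overall_winner(scores))
print("By category:", category_winners(scores))
```

With these made-up numbers, Engine A takes the overall figure while Engine B is clearly better for local queries, which is exactly the detail a single headline number hides.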

Another major problem is the number of possible ways to evaluate relevancy of search results. And I guarantee there is absolutely no way the industry – that is to say the major search engines, namely Ask, Google, MSN, and Yahoo – would ever agree on one relevancy figure. It just will not happen. Think of these analogies: are you getting all the relevant newspaper articles on a topic if you read only one newspaper? Are you watching the funniest sitcoms on TV if you’re only watching one station? And most pertinent to this topic, are you getting the best books on a topic if you only visit one library or bookstore? The answer to all of these is a resounding NO. And the same is true of search engines. You are absolutely, completely, definitely, not going to get all the best results on one search engine.

Another problem is frequency of analysis. Engines update and release new products so often that it is a full-time job keeping up. Plus there are new search engines that go live every month. I do not have the time to fully analyze and do a Search Lounge review for every single new improvement and release, but I can run one or two or even five queries on a new release or new engine to see if it passes the acceptability barrier. These days most, but not all, engines meet a minimum threshold of acceptability. But a minimum threshold is far from good. And even an engine that is not passable may update its index the very next day and overnight the results may be dramatically better.

The back-end behind each engine’s crawling, indexing, and algorithm technologies is far too complex to produce the same results, and the queries each person will enter during multiple search sessions are too diverse. There are simply too many variables. I’ll throw out one last analogy (I promise): if four people are told to make chocolate chip cookies, will all four batches taste the same? No. And going one step further, will the tasters all agree on which one is the best? Maybe, but probably not, assuming they all meet at least a minimum threshold of quality. So even if a report is released saying that an engine is the most relevant as judged by a fully objective, scientific study, the counterattacks from the other engines will be swift and oftentimes legitimate. The media will be awash with a blizzard of press releases explaining why the test was incorrect, why the winner did not really win, and why the losing engine is actually more relevant and getting better every day. And there we are, right back where we started with searchers not able to trust corporate press releases.

Next Installment – Part 4: Using Different Engines

Search Engine Relevancy. Part 1: Defining Relevancy
Search Engine Relevancy. Part 2: The Jaded Surfer

Search Engine Relevancy. Part 2: The Jaded Surfer

The Jaded Surfer

[Part 2 in a series about relevancy.]

Search engines love to tell us that they are the most relevant, and I don’t blame them. A glance through any engine’s press releases will turn up claims like “most relevant update”, “a dramatic increase in relevancy”, and so forth. These conflicting claims are like political rhetoric. Ultimately they have the opposite effect of what was intended, because we are all becoming jaded searchers. Here are some examples, and be sure to note how many claim to be the most relevant:

Search companies must be allowed to say their product is relevant. Otherwise, how can they market themselves? I am not disputing any of these companies’ claims or passing judgment on their right to proclaim their search product as being relevant. The issue I am emphasizing is that these claims are subjective and need to be understood as such because they cannot all be the most relevant, easiest, most extensive, and fastest.

Relevancy is like Pornography
So, with all that being said, what exactly is this elusive specter called relevancy, and how is it identified? It’s like the classic question: what is pornography? We don’t know how to define it, but we know it when we see it. Similarly, search engine relevancy is subjective and means something different to everyone. And not only that, it can mean different things to the same person at different times. A relevant result can be the site that provides the exact answer to a question; it can be the authority in its topic that provides a broad selection of information; or it might be a new site in a topic that is already known very well. That is why relevancy evaluations must be comparison-based. I know that the results on one engine are bad because another engine has better results. If the other engine did not exist, then my level of expectation would be lowered and it is possible that the first engine’s results would seem relevant to me. Underlying this idea is the assumption that both engines pass a minimum threshold of relevancy. It is conceivable that the results could all be not relevant, in which case the comparison does not even come into play.

Relevancy evaluation changes based on the types of information being sought. Each and every query for information needs to be reevaluated every single time. Users must never think that the results on their favorite engine are always the most relevant. If searchers do not find what they want, they can do a few things: they can come up with a new search strategy and approach the problem from a different angle; they can stick with the same strategy, but refine the query by making it narrower or broader, or by using advanced options and syntax; and, lastly, if the engine is still not finding what they are looking for, they can go elsewhere and try a different search tool.

Next Installment – Part 3: A Call to Arms

Part 1: Defining Relevancy
