Monthly Archives: October 2006

Internet Librarian 2006, Chris Sherman on social search

Notes from Chris Sherman regarding social search.

The web itself, as created by Tim Berners-Lee, and early things on the web, like the Yahoo! directory, are examples of social search. Links and directories involve human-recommendation systems. HTML meta-tags were another example, but they lasted all of about 2 months before spammers ruined them and search engines were forced to basically ignore them.

He thinks algorithmic search has plateaued and innovations are few and far between.

He gave a great recommendation for My Web as a resource for team projects because various people can save and access content together.

I also liked his mention that although it doesn’t get much airplay these days, the Yahoo! Directory is still going strong and is growing.

Other topics: on Wikipedia, external links are becoming more pervasive. Popurls.com - a helpful tool. Yahoo! Answers - beginning to rank and filter questions by quality.

Internet Librarian 2006, Chris Sherman on GYMA

Chris Sherman discussed search. Notes:

Ask - increasing R & D budget. Goal of being #2 but with a dedicated following. Think of Apple’s computers.

Google - the company ranks its top 100 projects using some kind of cool ranking algorithm (not sure of the details). 70% of projects are search- and advertising-related, 20% address pain points on the internet, and 10% are “blue sky”. Google Books is a project that will teach Google how to read. Think about it.

MSN Live Search - great new features marred by confusing and inconsistent interface. He thinks their image search is the best on the web.

Yahoo! - he thinks Yahoo! is having a lot of internal debates right now about company direction. He thinks Y! has not lost any ground technically, but the company is having some communication issues around what we’re doing. I like the way he expressed this. I think it’s fair. He also pointed out that we have 12 Economics PhDs on staff. I presume they’re in Yahoo! Research Labs.

He also mentioned that search engines (I believe he specifically mentioned Yahoo! and Google, though presumably it applies to all four) take privacy very seriously.

I liked his talk, but wanted him to say something, anything, about core relevance and quality of web search results.

Internet Librarian 2006, RIP?

I spent yesterday in Monterey with a thousand internet librarians. I was disappointed with the day. Why?

1. Duplication - I attended a few talks about search and there was too much duplication of content among them. I don’t blame the speakers for this; I blame the organizers. I agree that Exalead is a respectable search engine, but I don’t need three speakers telling me about it.

2. RSS and blogs - If you’re attending IL 2006 you should know what a blog is. You should know how to find them. You should know how to subscribe to them. Blogs were cutting edge (kinda) in 2004; they were still cool to talk about in 2005; but in 2006 we need to move our conference topics onwards.

3. Crowded - I tried to get into one talk and it was too crowded. That’s disappointing.

4. Keynote - J. A. Jance, a mystery author, gave the keynote. She gave an interesting speech. She discussed her life and some of the troubles she’s faced. She was funny and appropriately serious. But why was it the opening keynote? In 2004 Chris Sherman discussed the state of internet search; last year Lee Rainie of the Pew Internet & American Life Project discussed online demographics. Those are appropriate keynote topics for Internet Librarian. Sarah Houghton has a write-up of the talk itself.

5. My expectations - and lastly, to be fair, I should say that in 2004 and 2005 I loved this conference. Really loved it! (My IL 2005 write-up on ysearchblog) I was inspired by it and by the collective excitement of the attendees. Maybe this year, having worked at Yahoo! now for a couple of years, my perspective is different and I’m looking for more in-depth talks about search. I feel a kinship with the other attendees, but most of them are real-life librarians working in public, academic, and special libraries, and their needs are different from mine.

Search journalists

Search is a highly specialized field, yet billions of people interact with it as users. Layer on top of that the huge business implications of the industry and you’ll see why there are tons of articles about search deals, stock prices, and market share. All of these are important, but I’m always disappointed that the mainstream media generally ignore the actual products. Even search “experts” rarely write about the quality of search results or the technology behind the curtain. It’s so much easier to write about deals and market share than to do product evaluations. Other specialized fields, such as ichthyology and crystallography, aren’t subjected to the mainstream media except when something big happens. Otherwise their literature is restricted to a limited audience and a limited group of researchers and writers. Not so for search.

This Businessweek article, A Gaggle of Google Wannabes, gets closer than most articles. In it the author actually gives a brief description of how Ask’s algorithm differs from Google’s. (Note: when I say brief, I mean brief, but still better than nothing.) To quote: “Ask.com’s algorithm, on the other hand, retrieves and ranks results based on the number of times groups identified as related to the topic reference the site. Company executives say the method is superior because it theoretically avoids displaying generally popular sites that are not frequently referenced by other sites on the topic.” In this excerpt there’s a brief explanation of what’s different, but then the article relies on “company executives” to say their product is superior. Nonetheless, I actually did learn something about Ask.
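The idea the article describes can be sketched in a few lines: instead of ranking by global popularity, count only the endorsements coming from pages already identified as belonging to the query’s topic. This is a toy illustration, not Ask’s actual implementation; the page names, link data, and community set are all hypothetical.

```python
# Sketch of topic-community ranking as the article describes it: a page is
# scored by how many pages *within the topic community* reference it, so a
# generally popular page with no topical endorsements scores low.
# All names and link data below are made up for illustration.

def topic_rank(inlinks, community):
    """Score each page by the number of community pages that link to it."""
    return {
        page: sum(1 for src in sources if src in community)
        for page, sources in inlinks.items()
    }

# inlinks maps each candidate page to the set of pages linking to it.
inlinks = {
    "fishkeeping-guide": {"aquarium-club", "cichlid-forum", "celebrity-blog"},
    "viral-meme-page": {"celebrity-blog", "gossip-site", "news-portal"},
}
community = {"aquarium-club", "cichlid-forum"}  # pages judged on-topic

scores = topic_rank(inlinks, community)
```

Both pages have three inbound links, but only the topical page is referenced by the community, so it ranks first for this topic.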

UPDATE 11/7/06: The search engine that would outdo Google, by Bambi Francisco, is an article that actually does some quick and dirty relevancy testing of queries. Wow! She evaluates how Powerset compares to Google for certain natural language queries, and points out a couple of examples of how the two engines handle some stopwords and proximity issues. According to her evaluation Powerset does better with natural language queries.

Author Rank

What if search engines could apply an “author rank” similar to PageRank? Ranking algorithms could then retrieve and/or rank based on trust in the author, which would be determined by linked networks. And those networks could be big. Think 7 or 8 degrees of separation. So big that any real author would eventually be linked to every real searcher. And if a bad-apple spammer got into someone’s network, they would be quickly found out.
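To make the idea concrete, here is a minimal PageRank-style iteration over an author graph. This is only a sketch of the general technique; the author names, graph, and damping factor are all hypothetical, and real trust propagation would need far more (spam detection, edge weights, identity verification).

```python
# Hypothetical "author rank" sketch: propagate trust through a graph of
# authors who vouch for (link to) each other, PageRank-style.

def author_rank(links, damping=0.85, iterations=50):
    """Compute a PageRank-style trust score for each author.

    links maps each author to the list of authors they vouch for.
    Scores sum to 1.0; authors nobody vouches for keep only the baseline.
    """
    authors = set(links) | {a for targets in links.values() for a in targets}
    n = len(authors)
    rank = {a: 1.0 / n for a in authors}
    for _ in range(iterations):
        new_rank = {a: (1.0 - damping) / n for a in authors}
        for author, targets in links.items():
            if targets:
                share = damping * rank[author] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # dangling author: spread their trust mass evenly
                for t in authors:
                    new_rank[t] += damping * rank[author] / n
        rank = new_rank
    return rank

# Illustrative network: three authors vouch for each other; the spammer
# has no incoming endorsements and so ends up with the lowest score.
scores = author_rank({
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "spammer": [],
})
```

The point the paragraph makes falls out of the math: trust flows only along real endorsements, so an unconnected spammer is stuck near the uniform baseline no matter what content they publish.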

There are many places where people link to each other: LinkedIn, Netflix, Yahoo! 360, MySpace, Facebook, flickr, Yahoo! My Web, Furl, delicious, YouTube, email and IM address books…need I go on? I know it could ruin the anonymity of the web, but maybe search spam has forced us to this.

And anonymity might still be preserved. Perhaps there’d be ways to create trusted anonymous connections, such as having a nom de plume. Or is it that important? Journalists and authors use their names (or a pen name that sticks with them), so why should the web be different? My point, though, isn’t to un-anonymize the web; my point is to help search engines know which pages are real and trustworthy based on who created them, not just who links to them.