Monthly Archives: January 2005

Gigablast

Type of engine: General web search.

SUMMARY
Relevancy of results: Needs improvement.
Freshness of results: Very good, and not only are the results fresh but the index date is listed next to each result.
Features and functionality: Very straightforward and easy to use.
Quality of help and “about us” pages: The Help sections are OK, but I learned more about using Gigablast from reading articles about it on other sites than from reading its own help sections.
Business model: Selling ancillary search services, such as enterprise search. No sponsored links, banner ads, or any other form of advertising presented to the user.

INTRO
I’ve been wanting to do a review of Gigablast for a while because the story of Gigablast is inspiring. Gigablast was written by former Infoseek engineer Matt Wells. Matt has a blog of sorts, but he has not updated it since February, 2004. The blog mostly covers Gigablast, but also has notes about other non-Gigablast issues. I like his candidness, such as “Alright you SEO people. Get your bots off my search results.”
The thing I really like about Gigablast is that it knows what its business is, namely general web search, and it simply goes about doing that. There are not half a dozen search tabs or other distractions. Although they do offer some other services, as I will mention later, you can go to Gigablast and it is as clear as day what you can do there. You go to Gigablast, you enter a search and that’s that.

Right on the front page they list how many pages have been indexed. As of January 12, 2005 that number is 1,014,363,952. Over the past four days the number did not change, but I know it was not too long ago that the number was half what it is now.

UI & FEATURES and QUERY EXAMPLES
Really the only feature that users see on a regular basis is Giga Bits, and so I have decided to write about features and queries together. Giga Bits are related terms that appear at the top of every search result page. They are appended to the original search rather than replacing it. Giga Bits can sometimes be used to find answers to natural language queries. For example, for What is the capital of Sweden?, the first Giga Bit result is Stockholm. I happen to know that answer is correct, but if you did not know that you might not pick up on the fact that the Giga Bit term can actually be the answer. So although it is a nice bonus, the way it is implemented right now will not be clear to many users. After clicking on the Giga Bits suggestion of Stockholm, the first result is in Swedish and there is nowhere to set user preferences for things like language.

Giga Bits is a nice idea, but the current implementation is confusing and it took me a while to figure it out. I am still not clear on what the percentages next to each term mean so I tend to ignore them. I think engines do themselves a disservice by naming things with cute and clever names like Giga Bits. How about just calling it something so that users immediately know what they are seeing? Like related terms or refine your query, etc.

As I mentioned earlier, one really cool feature Gigablast has is that it tells you when each page was indexed. Another very nice thing is that there are a few options next to each search result. You can click on older copies to go to the Internet Archive (see my Wayback Machine review) and see archived versions of the site. You can also look at cached copies, and at a stripped version that takes out images and just leaves text. These are all very nice.

Gigablast also has something that used to be common but that you don’t see so much anymore: an easy link to other search engines’ results. At the bottom of each search page you will see: Try your search on google yahoo alltheweb dmoz alta vista teoma wisenut.

As a side note, Gigablast used to default to OR searches, but now defaults to AND. Seems wise to me, particularly now that their index size is growing.
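
To make the difference concrete, here is a minimal sketch of AND versus OR matching. It is only an illustration, not how Gigablast works internally: the `matches` function is hypothetical, and it assumes plain whitespace tokenization with no stemming or punctuation handling.

```python
def matches(query, document, mode="AND"):
    """Return True if the document matches the query under AND or OR semantics.

    Hypothetical helper for illustration; tokenization is a plain split on
    whitespace, so punctuation is not handled.
    """
    terms = query.lower().split()
    words = set(document.lower().split())
    hits = [term in words for term in terms]
    # AND requires every query term; OR requires at least one.
    return all(hits) if mode == "AND" else any(hits)


docs = [
    "gigablast sells enterprise search services",
    "how to make money working from home",
    "how gigablast plans to make money",
]

and_results = [d for d in docs if matches("gigablast money", d, "AND")]
or_results = [d for d in docs if matches("gigablast money", d, "OR")]
```

With OR, all three sample documents match; with AND, only the last one does. As an index grows, OR matching lets in more and more pages that share just one word with the query, which is presumably why defaulting to AND makes sense.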

Time to try another query, such as How does Gigablast make money?
The first listing is a meta-search result page with a list of work at home businesses (MLMs), a few of which were pulled from Gigablast. Not so good. I looked through all ten of the results on Gigablast’s first page of results and none of them were relevant. Results 8, 9, and 10 are all the same entry from John Battelle’s blog and the only place the word “Gigablast” appears is in his list of search companies in the menu section of the page.

But the first Giga Bit is intriguing: “arrangement with Google”. So, by using that modifier, we get how does gigablast make money? “arrangement with Google”. The first two results are duplicates, as in the same content and same title, but one is hosted by Resource Shelf and the other by Free Pint. A tough thing for engines to detect, but nonetheless a bad user experience because they offer no differentiating content. Plus, the content is not relevant. The site mentions Gigablast and it mentions Google, and I thought maybe I’d find some juicy bit about the two companies working together, but nope, nothing like that. Aggh, but it keeps getting worse because not only listings 1 and 2, but all of 1 – 7 are actually the same site, and they all have the same title too. The fourth result is a TinyURL, which probably should not be indexed in the first place. #7 is a redirect that takes you right to the same site. For #8 the page discusses both Gigablast and Google, but offers no insights into them having any kind of agreement. Afraid I was led on a wild goose chase.

I also noticed that on the top of the results page it says, “Results 1 to 8 of about 113,” but it is interesting the way they have done this. Because at the bottom of the page it says, “No more results found. Show relevant partial matches for your query.” But when I click on that I get “Results 9 to 18 of about 231.” So now I am a confused user. When it said 8 of 113 I thought that the 8 were AND matches and the remainder were partial matches, however when I clicked on the partial match link I got 231 results. What happened to 113? Where did that go? Not a huge deal, especially since I think other engines get these numbers wrong sometimes too, but it is distracting nonetheless.

For the next Giga Bit refinement, how does gigablast make money? “Matt Wells”, the first result is a Search Engine Watch interview with him from September, 2003, and it has a great answer to my question:

Q. From a business perspective, Gigablast carries no advertising? Is this a decision you plan to keep? How does Gigablast make money?

A: Money is derived from selling search services on my products page. At this point I don’t think I’ll put up advertisements unless I need the revenue to support Gigablast or myself.

Sometimes, the Giga Bit suggestions are pretty random, like how does gigablast make money? “going to make decisions”. There are four results to this strange query. Three of them are the same John Battelle posting and the fourth is from Geeking with Greg Linden. Greg’s posting is indeed about Gigablast, but not about how they make money. There are other posts about other companies like Google and Technorati making money, but not about Gigablast making money.

Some comments about these duplicate and non-relevant results. It seems that Gigablast is indexing and keeping multiple snapshots of the same site. For the above example, the duplicate listing was indexed in September, November and then in December. Although the site may have been updated during those months, the page is the same page. There is only one page. The difference is that the URLs are not being normalized. So here is what two of the three indexed URLs look like:
http://www.battellemedia.com/archives/000627.php
http://battellemedia.com/archives/000627.php

As you can tell, they are nearly identical except that one has WWW. In certain rare cases these could actually be different, but in this case they are the same. The third instance of this site is this:
http://www.snipurl.com/62pj

I think Gigablast could do a couple of things to resolve this situation. If they have the bandwidth they could compare page content and if there is enough overlap, and the URL is nearly identical, they could conclude it is the same page and only show it once, or at least offer it as a cluster. I understand though that site content comparisons are tough and the devil is in the details. The other thing that might work is to compare the display text for the results. For all three of them, the displayed text is exactly the same. That should be a dead giveaway. And the last thing is to not index Tiny URLs and Snip URLs at all. Although these are handy features for capturing long URLs, I think engines should follow them through and only gather the original URL. Plus, in this case, the original URLs are not even that long.
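
Here is a minimal sketch of the first two ideas, URL normalization and comparing display text. It is my own illustration, not anything Gigablast has published. The function names `normalize_url` and `cluster_results` are hypothetical, the snippet text is a placeholder, and the sketch assumes that a host with and without “www.” serves the same page, which as noted above is usually but not always true. Following Tiny URL and Snip URL redirects through to the original address would need a network request, so it is left out.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key: lowercase host without a leading 'www.', plus path.

    Assumption: the 'www.' and bare hosts serve identical content.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return host + path


def cluster_results(results):
    """Group (url, snippet) pairs that share a normalized URL or identical snippet.

    Single greedy pass: each result joins the first cluster it matches.
    """
    clusters = []
    for url, snippet in results:
        key = normalize_url(url)
        text = " ".join(snippet.split()).lower()  # collapse whitespace, ignore case
        for cluster in clusters:
            if key in cluster["keys"] or text in cluster["snippets"]:
                cluster["keys"].add(key)
                cluster["snippets"].add(text)
                cluster["urls"].append(url)
                break
        else:
            clusters.append({"keys": {key}, "snippets": {text}, "urls": [url]})
    return [cluster["urls"] for cluster in clusters]


# Placeholder snippet text; the three URLs are the ones listed above.
same_text = "placeholder display text shared by all three listings"
results = [
    ("http://www.battellemedia.com/archives/000627.php", same_text),
    ("http://battellemedia.com/archives/000627.php", same_text),
    ("http://www.snipurl.com/62pj", same_text),
    ("http://www.example.com/other.html", "a different page entirely"),
]
clusters = cluster_results(results)
```

The first two URLs collapse because their normalized keys are equal, and the Snip URL joins the same cluster only because its display text is identical. An engine could then show one listing per cluster.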

The interesting thing about this de-duplicating business is that for their XML feeds Gigablast offers the option of de-duplicating results. However, I could not find a way to add the parameter to a regular search string. If I am missing something here, please let me know because it seems odd that for RSS searching this functionality is implemented, but not for regular web searching.

I also wanted to modify my query a bit more, so I tried Gigablast business model.
Result #1 is a collection of entries on a search engine blog. There is an entry about Gigablast, but the term “business model” pertains to another entry about a different engine. Same with #s 2, 3 and 10, which are all duplicates of each other. #8 and #9 are a different set, but same problems. Worth noting is that both sets of duplicates are from Resource Shelf. Why is that? The results are not relevant and they are also duplicates of each other. But then I hit gold with #6, well not exactly gold, but a direct path to gold. It is a SiteLines posting by Rita Vine referring readers to an article by Gwen Harris from July/August, 2004. And here is exactly the information I have been looking for:

At present Wells runs Gigablast without any keyword-activated advertising: there are no banner ads and no sponsored links. This isn’t a matter of principle — it’s just that advertisements slow down the query response time.
Income comes from selling the technology. As Wells explains, he has “built Gigablast to be more efficient than the other engines” to save on time and hardware. Webmasters will be interested in his product line for creating indexes or integrating Gigablast web search results.

So, great I got my answer. But what happened that Gigablast did not take me right to the info I needed? So close, but instead of taking me right to the answer to my question it took me to a site that linked to the answer. The reason in this instance is quite straightforward. Nowhere in Gwen Harris’ article does she use the words “business” or “model”. But fortunately the referring site used the phrase “business model”. I don’t know what Gigablast could do better in a situation like this. If I had continued to modify my query I eventually would have gotten directly to it because I did find the site in Gigablast’s index.

CONCLUSION
I am a fan of Gigablast. I have a lot of respect for what Matt Wells has done. He set out on his own to write a new search engine and he did just that. He scaled the technology so that he can index a billion pages with less hardware and financial investment than the big engines, though of course he still has a ways to go to catch up with the big boys. It seems the crawling and indexing parts of Gigablast are its main focus and strengths. Unfortunately, it is lagging right now in its algorithm and search results.

As with many engines, Gigablast has created XML feeds, but I would rather see them improving their results before adding extras like this. This happens at other engines as well. Even MSN has been building RSS feeds and their relevance needs some work too. (See my earlier MSN review.) I suppose the reason engines are doing this is because it is easier to implement RSS feeds than to improve the elusive beast called relevancy. Plus it may be that different engineers work on each of these, so adding XML feeds in no way impedes the progress of other initiatives like relevancy. Whatever the case may be, things like RSS search feeds are great, I love them and subscribe to several from Technorati and Google, but they are used by the few, not the many, and they are only as useful as the quality of the engine’s results.

What I saw in Gigablast’s results was the following:
• Many duplicate sets
• Many results from one or two sources
• Poor matching for multiple terms. In other words, Gigablast noticed the words in my query on a page, but the words were spread out and not in proximity to each other.

With some improvements to things like this, Gigablast can be a very nice homegrown engine and alternative to the big fellas.
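
As an illustration of the proximity point, here is a minimal sketch of one way to measure how close together the query terms sit on a page. It is not Gigablast’s algorithm, which I have no inside knowledge of. The `min_span` function is hypothetical and assumes the page has already been split into words.

```python
def min_span(query_terms, words):
    """Length in words of the smallest window containing every query term.

    Returns None if any term is missing from the page.
    """
    terms = {t.lower() for t in query_terms}
    last_seen = {}  # term -> most recent position
    best = None
    for i, word in enumerate(words):
        word = word.lower()
        if word in terms:
            last_seen[word] = i
            if len(last_seen) == len(terms):
                # Shortest window ending here starts at the oldest "last seen" term.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


close = "gigablast business model explained".split()
spread = "business news about a different engine whose model".split()

close_span = min_span(["business", "model"], close)    # adjacent terms
spread_span = min_span(["business", "model"], spread)  # terms far apart
```

A ranking function could favor pages with a small span, so that a page where “business” and “model” are adjacent scores above one where the two words belong to separate entries.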

Findory - Interview with CEO Greg Linden

The Search Lounge is very pleased to feature an exclusive interview with Greg Linden, founder and CEO of Findory. Findory is a unique service. It’s not quite a search engine, it’s not quite an RSS subscription tool, and it’s not quite a news aggregator site. So what is it then? I’ll quote Greg from an email he sent me: “…Findory isn’t a normal search engine. The primary focus of Findory isn’t search. The primary focus is discovery. The site learns your interests from the articles you read, searches thousands of sources for you, and surfaces interesting news articles and new sources. It’s like a newspaper built just for you. Quite a bit different than your average search engine.”

Findory shows you blog entries and news stories that match your reading patterns. And you do not have to enter a single bit of information about yourself. You do not even need to register to use the basic features. All you have to do is visit Findory’s homepage and either search or browse the listings. Then every time you click on a link to read it, Findory (I think) analyzes the page’s content and also checks to see who else clicked on that link and what else they have clicked on. Then when you return to the Findory.com homepage there will be links that Findory thinks you will be interested in. Although the back-end technology may not be crystal clear, it is actually very simple and powerful to use.

Greg also maintains a very good blog that I read constantly called Geeking with Greg. He writes about search, RSS, and other Internet topics. For several years Greg worked on personalization features for Amazon.com.

This interview was conducted exclusively via email.

Hi Greg, thanks for joining us at the Search Lounge. It is a pleasure to have you with us. Would you mind starting off by giving a bit of background information about how Findory fits into a user’s repertoire of online information tools? In other words, what are the benefits for those readers who have yet to try Findory?

Imagine the front page of a newspaper unique to you, emphasizing the news of the day you need to see. Findory is a personalized newspaper that learns your interests and builds a front page of news stories specifically for you.

Findory helps you read the news faster and more efficiently. Rather than skimming many sites to try to find the news of the day, Findory brings all the daily news to one spot, sorting news from thousands of worldwide sources. You will find articles you might otherwise miss while getting broad coverage of major news events.

Unlike news aggregators such as Google News, Findory is personalized to you, focusing your attention on the news you need. Unlike customized news sites such as My Yahoo, Findory requires no effort to use. You do not have to specify categories or keywords; Findory learns what news you want to see just from the articles you read.

Any plans to offer users a personalized interface to go with personalized listings? As a user, I would be particularly interested in changing the category ontology on the top page to promote subjects I am interested in.

It’s a great suggestion. We have built Findory to be super easy to use. No effort. No registration. Just read news and the front page gets better and better.

But some of our readers are interested in customizing the Findory front page. We will be launching more customization features over time, but our site will always remain focused on being easy to use, no effort required.

Would you mind shedding some light on whether Findory crawls and indexes its own search results, or whether they are pulled from other engines? And as a follow-up, is the search algorithm built in-house?

Findory has its own crawl of thousands of news sources and weblogs.

Our personalization technology was designed and developed by Findory. The personalization engine and news and weblog search engines were built in-house. The web search engine personalizes Google web search results.

When I search on Findory I have the option to search news, blogs, or the web. There does not seem to be an option to search a combination of those sources. As a user, I am less concerned about which source bucket the information comes from, than I am about the information itself. Any chance you might offer users an advanced search option to handle that? And how about advanced syntax?

It’s a great suggestion. We’ve designed our search engine to be simple, fast, and easy to use. So, while it is true that there are no advanced search options, the news, weblog, and web searches are quite unusual in that they are all personalized. Different users doing the same search on our site will see different search results, all depending on their history and their interests.

As an outsider looking in, and a user, my impression is that when I click on a link, Findory analyzes the keyword content of the site and looks for other sites with overlapping terms. Findory also does something like Amazon’s “Customers with similar searches also purchased…” functionality, except in this case it is “Other users who read this article also read…” I realize you are not able to give away the secrets of your technology, but what can you share with us about it?

At a high level, we look at what other readers like you are reading to recommend news stories to you. It’s a community. Imagine if friends of yours using Findory were constantly recommending articles to you. Now imagine that those friends are found automatically for you. That’s how Findory works.
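
Findory’s actual algorithm is not public, but the general idea Greg describes, recommending what similar readers are reading, can be sketched in a few lines. Everything below is an assumption for illustration: the `recommend` function, the reader names, and the scoring rule (each unread article gets points equal to the number of articles its reader shares with you).

```python
from collections import Counter


def recommend(reader, history, top_n=3):
    """Suggest unread articles, weighted by overlap with other readers.

    history maps a reader name to the set of article ids they have read.
    Illustrative only; this is not Findory's method.
    """
    mine = history[reader]
    scores = Counter()
    for other, theirs in history.items():
        if other == reader:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # readers with nothing in common contribute nothing
        for article in theirs - mine:
            scores[article] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


history = {
    "chris": {"a", "b"},
    "ann": {"a", "b", "c"},
    "bob": {"a", "d"},
    "cat": {"x"},
}
suggestions = recommend("chris", history)
```

Here “ann” shares two articles with “chris”, so her extra article ranks ahead of the one from “bob”, who shares only one. “cat” has nothing in common and is ignored.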

How does Findory evaluate its personalized listings and search results in order to improve? In other words, is there any kind of testing or evaluation to determine the relevancy of the listings shown to users?

Great question. We’re constantly testing refinements to our algorithm and trying to improve on it. As we get more and more users of Findory, the quality of our recommendations gets better and better. Findory builds on the strength of its community.

What subjects is Findory particularly good at surfacing for users? Any areas that need improvement?

The quality doesn’t seem to vary by subject, so I’m not sure this is a real issue for us. One area where we are seeking to improve is our weblog coverage. We have thousands of weblogs in our database, but that represents just a small fraction of the weblog content out there. We are working aggressively to extend our crawl.

Sometimes I want to be able to give feedback to the Findory engine when a site is not relevant. For example, I searched for San Francisco because as a San Franciscan I thought I would help Findory by focusing it on my location. However, the next time I used Findory there was an article about the San Francisco 49ers firing their head coach. And the next time I went back the lead article was about the NFL. In that case, Findory was close with the 49ers article, but in the end it was not relevant because I am not a football fan. Any plans to offer this kind of feedback loop?

It’s a great idea. We’re considering customization features such as being able to rate articles or sources or being able to say “not interested” on articles. We are looking at ways to do this that don’t interfere with the main purpose of Findory, reading news.

Do you see a way for regular search engines to integrate your technology?

Absolutely. Our technology is designed to be highly scalable. Despite our complicated personalization, the online portion of our processing takes only tens of milliseconds. We want Findory’s personalization to be helping millions of readers find the information they need.

Blog searching by companies like Findory, Daypop, and Technorati, to name just a few, has become one of the hot search topics these days. How do you see blog searching fitting into the overall search engine industry in the next year or two? What about real-time indexing?

The real issue with weblogs is finding good content. Weblogs are self-published. No publisher means no filter. That’s a good thing and a bad thing. It’s good because it opens the floodgates for so much new content. It’s bad because those filters are sometimes useful, helping readers differentiate useful from useless. This problem will only get worse and worse as the blogging phenomenon accelerates.

Findory is all about relevance. In a sea of information, how do you surface what people need? Current web and weblog search engines all use the same relevance rank for all searchers, but not everyone has the same definition of relevance. Findory learns your interests — what is relevant for you — and surfaces that content.

Some have said 2005 will be the year search becomes personal. Findory has already taken the first steps.

I have yet to see a sponsored result or any banner ad on Findory. Would you mind commenting about Findory’s business model?

Findory currently has no advertising. We have designed a personalized advertising engine that targets based on readers’ interests, much like Google AdWords but more targeted. But readers aren’t huge fans of advertising. At this point, we prefer to work on features our readers want. When we need to launch advertising to support our website, we’ll be ready.

Lastly, what’s your favorite drink?

Favorite drink? It’d have to be coffee. Caffeine is what keeps us geeks going.

Thanks Greg. Are there any other comments you’d like to add?

Thanks, Chris! Glad to hear you’re enjoying Findory!

Jobs In Search - Interview with Owner Mike Taylor

The Search Lounge is very pleased to feature an exclusive interview with Mike Taylor, founder of JobsInSearch.com. Jobs In Search is located in the United Kingdom and bills itself as “The specialist job site for the Search Engine Industry”. As well as having listings for jobs at search engine companies, they also list positions from a variety of search-related companies, including those that do search engine optimization, online marketing, and web design. All features are free for job seekers, including registering. The site has been live since October, 2004 and in that short time I have seen the U.S. listings go from none to more than twenty.

Mike has a background in online recruitment, and to find out more about him check out his blog, including a more comprehensive biography.

This interview was conducted exclusively via email.

Hi Mike, thanks for joining us at the Search Lounge. It’s a pleasure to have you with us. Would you mind starting off by giving a bit of background information about how you came up with the idea of Jobs in Search?

Thanks for the invite Chris. I am pleased to take part.

My background is mainly HR and recruitment, having worked for large blue-chip companies such as IBM, Motorola and Nokia.

It was during my time at Nokia (1999 - 2001) that I first got involved with online recruitment. I ended up being responsible for Nokia UK’s online recruitment strategy and I also served on Nokia’s global Internet recruitment strategy board.

After leaving Nokia in 2001 I set up Web-Based-Recruitment, an online recruitment consultancy offering advice to companies on how to use the Internet to attract and recruit new employees.

As an extension to the online recruitment part of the business I also started Domain Attraction, an online marketing business providing search engine marketing services.

I had the idea for JobsInSearch.com in the middle of 2004 as it enabled me to draw on my background and experience of online recruitment and online marketing.

Is Jobs In Search a full-time job for you, or do you have another 9 to 5 job?

I am now working full time on Jobs In Search but I still run some search engine marketing campaigns for my existing clients.

How do you define the term “search industry”? And this is probably a tough follow-up question, but do you have a rough estimate as to how many people currently work in this industry?

My personal definition (as far as Jobs In Search is concerned) is that the “search engine industry” covers all companies that are involved, or connected with, the term “search”.

This includes the Search Engines, Search Engine Optimization and Search Engine Marketing firms, New Media Agencies, firms providing search engine related software (positioning / analytics etc.) and Web Design firms.

As for how many people work in the search engine industry this was a question we asked ourselves when first starting the project. Although you can obtain figures on the number of people working with the large search engines, finding figures for the smaller search related firms proved very difficult.

Which job areas and skills do you see as being in the highest demand, both now and in the near future?

From the companies we have talked to we sense a shortage of good search engine optimization and search engine marketing professionals. As “search” continues to grow we also expect more and more companies to be recruiting for business development roles in the future.

How do you publicize your site so that people in the industry know about it? And what kind of response are you getting from hiring companies and recruiters? How about from job seekers?

We issued press releases related to the official launch which got picked up by various search engine related news and blog sites. We have also issued some further press releases since which have also been widely reported.

As you would expect we are also trying to maximize our natural search engine ranking positions as well as using pay per click. We also launched our own blog site in November 2004.

The response so far has been very encouraging as we have had a number of high profile companies contact us having seen our press releases. As for job seekers we are continuing to see the number of registrations on the site increase, which suggests that word is getting out there about the site.

I notice the majority of your jobs are for the UK (currently more than forty jobs are listed), which is natural considering that is where you’re located. Do you have plans to expand the international listings? Has it been a challenge for you to make contacts with companies in other countries?

The plan all along was to be an international job site. You are correct in that we started with UK job listings but we expect the majority of jobs to be US based.

As for making contact with companies outside the UK it has not been a problem for us. All of the companies we have spoken to so far were happy to receive further information about our services.

Job seekers have the option of registering with you and sending in their resume and contact information. Can you explain the process and what happens after a job seeker sends in his or her resume in terms of hiring companies and recruiters seeing it?

The first step is that job seekers can register for our jobs by email service and we will send them details of any new jobs as soon as they are posted on the site that match their requirements.

In addition they can also register their resume with us and choose whether or not it can be viewed by companies and recruiters. We do not allow any companies or recruiters to view any resumes without the job seeker’s permission.

How does Jobs In Search work for a hiring company or a recruiter? Do they pay subscription dues? Or do they pay per job posted or per screened applicant? Or do they pay per successful hire? Or some other way?

To advertise on Jobs In Search, companies or recruiters are required to purchase job credits. Each job credit lasts for 30 days from the date of posting. As there is no time limit on the use of job credits, savings can be made by purchasing multiple job credits in advance.

Having researched the market for the cost of job postings we feel that we offer a very fair price for a specialist job site offering targeted job seeker traffic.

I noticed you have a jobs by email feature, but I didn’t notice an RSS feed option. Is that something you’re considering? Also, we’d love to hear about any other features and functionality improvements that you’re working on. For instance, can I assume that as you get more listings you will break down the geographical regions beyond the country level?

We are currently looking into RSS feeds and hope to add this feature in early 2005. As for other features we have more “Meet The Experts” interview profiles to add in January 2005 and beyond. We may also add an additional “training” feature in 2005 plus more information for job seekers about resume preparation etc.

As for functionality, you can currently search for jobs in the major countries by state or county. However, as there is a lot of interest in “local search” we may consider more “local” options in the future.

Lastly, what’s your favorite drink?

A cup of tea!

Thanks Mike!
