Monthly Archives: November 2005

Everyone is a Creator and Every Creation Will be Indexed

We are moving towards a world in which every form of communication is accessible and the cost of storing data is minuscule. Future historians will be dumbfounded by the massive amount of data available from our era because everyone is now a creator. If you’re not writing a blog, you’re taking digital photos. If you’re not taking digital photos, you’re leaving a trail in your clickstream through the web. If you don’t use the web, you are creating text-retrievable data in your phone calls.

Standing in an airport taxi line listening to a woman blab about loud kids on her flight got me thinking about all the technologies aimed at fostering communication between people: cell phones, text messaging, digital photography, blogs, etc. In general I’m for these new modes of communication that make everyone a creator; however, they create a challenge for search companies. Not only are search engines expected to index and retrieve bits of text from massive collections of blogs, for instance, but in the future they will be required to index and retrieve words from people’s voice communications. Because we are moving to a world in which every form of communication is perceived as creation, search engines will be expected to index and search every word that woman said. Yet what’s the point of archiving her complaints about a loud flight? It’s possible that they could be useful to her in the future, but most likely they never will be, and they will cloud and obscure useful information.

Right now I have the option to archive my IM chat sessions. When will I have this option for my phone calls? When I do, the first thing I’ll do is categorize my conversations and voice mails into a directory. I’ll have a category for my wife, a category for parents with sub-categories for each parent, etc. Then people will create the technology to let me tag my phone calls so I can classify calls as “xmas gift list”, “new year’s eve plans”, etc. And then eventually I’ll be able to do full-text searching of each phone call so I can zoom in to the part I’m interested in. Beyond personal communications, companies will be able to record voice communications with their customers and then post those in a searchable format. I’ll search my ISP for the voice menu instructions about hooking up my DSL service.

OK, great, in the future I’ll be able to search voice conversations as well as web pages, blogs, my email, and my IM using one or several search interfaces. But is this a good thing? For each of these categories of information there will be those who exploit them for their own personal profit: spammers. They will figure out how to drive traffic to their content through each of these technologies. And so the search engines will have to not only index all that content, but also devise crime-fighting techniques to identify spam and remove it from retrieval. The engines are faced with removing the trivial as well as the nefarious. Same as now, but with new formats in the future. Controlled vocabularies and pre-defined structures, like the Library of Congress Subject Headings, are not faced with spam (though bias creeps in), but in this world, where everyone’s a creator, pre-defined taxonomies and ontologies are being stretched thin by the massive scale of information, hence the rise of folksonomies.

And privacy? The only way to have a private dialogue will be face to face, until people start walking around with voice recorders in their cell phones; then every face-to-face conversation will also be archived and retrievable. Is this really what we want? Is it enough to put access restrictions on our own private archives? No.

Librarians, archivists, search engines, and all other information professionals need to decide what to do with so much data. The only way I can see through this mess is a melding of open structures and controlled structures. There needs to be a layer that sits on top of open structures like flickr and Furl and helps to organize things. This is the role of information professionals: to help organize while not interfering too much. Build thesauri, bridge related concepts, and point out broader and narrower terms, but do not hamper users from defining their own tags for content. In that way we can identify the woman’s cell phone call about her flight for what it is: a daily conversation between two people not worthy of permanent preservation.
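To make the idea of a controlled layer sitting on top of open tags a little more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the tiny `THESAURUS`, the function names, and the sample tags are hypothetical and are not taken from flickr, Furl, or any real system. The point is only that users keep typing whatever tags they like, while a small thesaurus quietly bridges variants and adds broader terms.

```python
# Hypothetical controlled layer: each preferred term lists its broader terms
# and the free-form user tags (variants) that should map onto it.
THESAURUS = {
    "air travel": {"broader": ["travel"], "variants": ["flight", "flights", "flying"]},
    "travel": {"broader": [], "variants": ["trip", "trips"]},
}


def build_variant_index(thesaurus):
    """Map every preferred term and every variant to its preferred term."""
    index = {}
    for preferred, entry in thesaurus.items():
        index[preferred] = preferred
        for variant in entry["variants"]:
            index[variant] = preferred
    return index


def expand(tag, thesaurus, variant_index):
    """Return the user's own tag first, then the preferred and broader terms.

    Tags the thesaurus does not know pass through untouched, so users are
    never prevented from defining their own tags.
    """
    normalized = tag.strip().lower()
    terms = [normalized]
    seen = {normalized}
    preferred = variant_index.get(normalized)
    queue = [preferred] if preferred else []
    while queue:
        term = queue.pop(0)
        if term not in seen:
            seen.add(term)
            terms.append(term)
        # Walk up the hierarchy, skipping anything already collected.
        broader = thesaurus.get(term, {}).get("broader", [])
        queue.extend(b for b in broader if b not in seen)
    return terms


variant_index = build_variant_index(THESAURUS)
known = expand("Flight", THESAURUS, variant_index)
unknown = expand("loudkids", THESAURUS, variant_index)
```

A search for “travel” could then also surface items a user tagged only as “flight”, while a one-off tag like “loudkids” is still stored exactly as the user wrote it.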

Update: I developed this idea further in my post about tagsologies.

Cities and Search Engines

I’m reading a book by Joel Kotkin called The City. The author writes that cities have three common aspects that stretch across time and space.

A city offers:

    1. A sacred space for its citizens to come together for purposes of spirituality, camaraderie, and devotion to that which is greater than the individual.
    2. Security for its citizens.
    3. An environment conducive to commerce.

Historically all of these are pretty easy to see. In contemporary times #1 has been supplanted to some degree by secular things like civic pride, community organizations, public gatherings, and sports.

In the context of the web, search engines (portals?) provide these three things for us now. Let’s see if this works:

    1. Sacred space – People gather together through search engines based on shared interests and communities.
    2. Security – Users trust search engines to guide them safely through the web.
    3. Commerce – I think this is the easiest one to see. Search engines facilitate 21st century commerce.

Am I stretching the analogy too far?
Physical location means less than ever and I wonder what the future of the city will be.