We are moving towards a world in which every form of communication is accessible and the cost of storing data is minuscule. Future historians will be dumbfounded by the massive amount of data available from our era because everyone is now a creator. If you’re not writing a blog, you’re taking digital photos. If you’re not taking digital photos, you’re leaving a trail in your clickstream through the web. If you don’t use the web, you are creating text-retrievable data in your phone calls.
Standing in an airport taxi line listening to a woman blab about loud kids on her flight got me thinking about all these technologies that are aimed at fostering communication between people: cell phones, text messaging, digital photography, blogs, etc. In general I’m for these new modes of communication that make everyone a creator; however, they create a challenge for search companies. Not only are search engines expected to index and retrieve bits of text from massive collections of blogs, for instance, but in the future they will be required to index and retrieve words from people’s voice communications. Because we are moving to a world in which every form of communication is perceived as creation, search engines will be required to index and search every word that woman said. Yet what’s the point of archiving her complaints about a loud flight? It’s possible that it could be useful to her in the future, but most likely it never will be, and it will cloud and obscure useful information.
Right now I have the option to archive my IM chat sessions. When will I have this option for my phone calls? When I do, the first thing I’ll do is categorize my conversations and voice mails into a directory. I’ll have a category for my wife, a category for parents with sub-categories for each parent, etc. Then people will create the technology to let me tag my phone calls so I can classify calls as “xmas gift list”, “new year’s eve plans”, etc. And then eventually I’ll be able to do full-text searching of each phone call so I can zoom in to the part I’m interested in. Beyond personal communications, companies will be able to record voice communications with their customers and then post those in a searchable format. I’ll search my ISP’s recordings for the voice menu instructions about hooking up my DSL service.
OK, great, in the future I’ll be able to search voice conversations as well as web pages, blogs, my email, and my IM using one or several search interfaces. But is this a good thing? For each of these categories of information there will be those who exploit them for their own personal profit: spammers. They will figure out how to drive traffic to their content through each of these technologies. And so the search engines will have to not only index all that content, but also devise crime-fighting techniques to identify the spam and remove it from retrieval. The engines are faced with removing the trivial as well as the nefarious. Same as now, but with new formats in the future. Controlled vocabularies and pre-defined structures, like the Library of Congress Subject Headings, are not faced with spam (though bias creeps in), but in this world, where everyone’s a creator, pre-defined taxonomies and ontologies are being stretched thin by the massive scale of information, hence the rise of folksonomies.
And privacy? The only way to have a private dialogue will be face to face, until people start walking around with voice recorders in their cell phones; then every face-to-face conversation will also be archived and retrievable. Is this really what we want? Is it enough to put access restrictions on our own private archives? No.
Librarians, archivists, search engines, and all other information professionals need to decide what to do with so much data. The only way I can see through this mess is a melding of open structures and controlled structures. There needs to be a layer that sits on top of open structures like Flickr and Furl and helps to organize things. This is the role of information professionals: to help organize while not interfering too much. Build thesauri, bridge related concepts, and point out broader and narrower terms, but do not hamper users from defining their own tags for content. In that way we can identify the woman’s cell phone call about her flight for what it is: a daily conversation between two people not worthy of permanent preservation.
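To make the idea of a controlled layer on top of open tags a little more concrete, here is a rough sketch in Python. It is only an illustration under assumptions: the `TagThesaurus` class, its method names, the `search` helper, and the sample calls and tags are all hypothetical and do not correspond to any real Flickr or Furl API. Users keep tagging however they like; the thesaurus only maps variants to preferred terms and records broader and narrower relationships.

```python
from collections import defaultdict


class TagThesaurus:
    """Hypothetical controlled layer that sits on top of free-form user tags."""

    def __init__(self):
        self.preferred = {}               # variant tag -> preferred term
        self.narrower = defaultdict(set)  # broader term -> its narrower terms

    def add_variant(self, variant, term):
        # Bridge a user's spelling ("xmas") to a preferred term ("christmas").
        self.preferred[variant.lower()] = term.lower()

    def add_broader(self, term, broader_term):
        # Record that `term` is a narrower concept under `broader_term`.
        self.narrower[broader_term.lower()].add(term.lower())

    def normalize(self, tag):
        # Tags the thesaurus has never seen pass through unchanged,
        # so users are never blocked from inventing their own.
        tag = tag.lower()
        return self.preferred.get(tag, tag)

    def expand(self, tag):
        # The normalized tag plus every narrower term beneath it.
        seen, stack = set(), [self.normalize(tag)]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.narrower.get(current, ()))
        return seen


def search(items, thesaurus, query_tag):
    """Return names of items carrying the query tag or any narrower tag."""
    wanted = thesaurus.expand(query_tag)
    return sorted(
        name
        for name, tags in items.items()
        if wanted & {thesaurus.normalize(t) for t in tags}
    )


# Made-up sample data: archived calls with user-defined tags.
thesaurus = TagThesaurus()
thesaurus.add_variant("xmas", "christmas")
thesaurus.add_broader("christmas", "holidays")
thesaurus.add_broader("new year's eve", "holidays")

calls = {
    "call-001": ["xmas", "gift list"],
    "call-002": ["New Year's Eve"],
    "call-003": ["DSL setup"],
    "call-004": ["loud-flight-rant"],
}

print(search(calls, thesaurus, "holidays"))
print(search(calls, thesaurus, "loud-flight-rant"))
```

Searching for the broader term “holidays” picks up calls tagged “xmas” and “New Year’s Eve” even though nobody used the word “holidays”, while a tag the thesaurus has never heard of still works as an ordinary tag.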
Update: I developed this idea further in my post about tagsologies.