November 15, 2004

Google and Known Unknowns

By Paul

Some time back, Brad DeLong had an interesting post about searching and creating metadata, which is worth quoting at length:

“…Let's take Donald Rumsfeld's four categories: the known knowns, the known unknowns, the unknown knowns, and the unknown unknowns:

The known knowns: The things that you know, and that you know that you know. Here there is no information retrieval problem at all.

The known unknowns: These are the things that you know are on your hard disk someplace, but you're not sure where they are or what, exactly, they say. Your recollection needs to be refreshed. Here is where search based on full-text indexes plus high-quality metadata shines. We know how to make full-text indexes. We know how to search such indexes plus metadata. The only potential problem is a social engineering one: how to make sure that high-quality metadata about files is created and maintained.

The unknown knowns: Once you have found your known unknown, you then want to find what other files on your hard disk are related to it. The same keyword and text search won't necessarily pick them up. This is what subdirectories--folders--are supposed to be for: one of the benefits of grouping related files in subdirectories is that one can then thrash about and get hold of related information. And, because one file may well belong to more than one possible group of unknown knowns, we have symbolic links--aliases. Once again, however, there is a social engineering problem: how to make sure that files are sorted into the right folders and that the right symbolic links are created, for this task can also be "tedious in the extreme." And we are vain and lazy infovores.

The unknown unknowns: These are things that one would search for if one remembered enough about what was on one's hard disk (or knew enough about what was on the web) to know that one should look for them. Here we have a very difficult problem: how do you jog someone's memory or tell them enough about what is known so that they can figure out what kinds of things they can search for? I think that this is a very hard problem indeed.”

I think products like Google’s Desktop Search fit into the ‘known unknown’ category above. David Pollard expands further on Google’s foray into ‘Personal Content Management’ tools. This Economist article describes the competition between Google and Microsoft in more detail.

My favorite desktop search tool is Copernic Desktop Search. Together with Google, it makes a good combination. DeLong gives a list of similar products. I wonder why our IT departments didn’t recommend these; they ought to be concentrating on improving personal productivity.

Posted at November 15, 2004 09:33 AM

Comments

One of the factors that is currently shaping our world is the spiralling 'cost of not knowing' -- not knowing that Iraq had no WMD, not knowing that 9/11 was going to occur, not knowing what Global Warming will wreak, not knowing about SARS and Mad Cow and the Bird Flu -- and whatever will come next. The real question is, even if we decide we can no longer afford this cost, what can we do? If the world is indeed a 'complex adaptive system' then that means most of what we need to know to be able to continue to steward it in our clumsy and ineffective way is unknowable.

Comment by Dave Pollard at November 16, 2004 07:55 PM | Permalink

Yahoo also has new desktop search software:
http://desktop.yahoo.com/
-Paul

Comment by Paul at May 21, 2005 02:49 PM | Permalink

Trackback Pings

TrackBack URL for this entry:
http://truckandbarter.com/mt/mt-tb.cgi/287

Listed below are links to weblogs that reference Google and Known Unknowns:


» Copernic Desktop Search - The Search Engine for Yo from Ashish's Niti
It out-competes Google in ease of use and Outlook support. [Read More]

Tracked on November 19, 2004 11:48 PM