trenchant.org

by adam mathes

Setting Good Bits

Google bought Pyra, which kind of blows my mind, but wasn’t as surprising to me as it probably was to other people.

Although I theoretically signed an NDA and probably am not even allowed to mention that I ever talked to anybody at Google, I think I can say there were people who interviewed me who definitely “got” weblogs and had some interesting ideas, so who knows, maybe we’ll see something cool come out of this besides the rather obvious and not very interesting “search blogs in real time” (which Blogger used to offer, and which DayPop and others practically offer now) or the “use blogs as fodder to better rate and instantly insert pages more quickly into the search engine” which Google already pretty much does by spidering the crap out of frequently updated sites.

I still don’t pretend to understand the possible business rationale of the deal. (My guess is the Blogger brand is probably more valuable than the actual software to a company like Google. But buying a brand? When you already have the best global brand? Or maybe they wanted something to instantly bundle with the search appliance.)

Anyway, every wannabe web pundit has already talked about this deal, but I’m mentioning it because it’s semi-related to another idea I wanted to get out so I can later seem smart if anybody uses it. (Also because I don’t want to talk about the “other stuff” in my life right now.)

The idea: let Google search only the pages you have linked from your weblog. This allows you to search not the Google-wide web that automatically returns pages Google thinks are the “good” ones, but the explicit subset of what you have previously declared to be “good” pages by blogging them. Expanding this to allow searches of only linked articles from your “friends’” blogs or even Google-defined “good” or “related” blogs would also, I think, be pretty interesting. (I could probably explain this more exactly with maps and nodes and computer dork terminology, but I think you get the idea.)
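A minimal sketch of that scoping idea, in Python: pull the outbound links from a weblog page, then search only the pages behind those links. Everything named here (`LinkCollector`, `linked_urls`, `search_linked`, and the `pages` dictionary standing in for a real crawl or search index) is a made-up illustration, not anything Google or Blogger actually offers.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the absolute href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("http"):
                    self.links.add(value)


def linked_urls(blog_html):
    """Return the set of absolute URLs linked from one weblog page."""
    collector = LinkCollector()
    collector.feed(blog_html)
    return collector.links


def search_linked(blog_html, pages, phrase):
    """Search only the pages the weblog links to.

    `pages` is a hypothetical stand-in for a crawl: it maps URL -> page text.
    """
    good = linked_urls(blog_html)
    phrase = phrase.lower()
    return sorted(url for url in good if phrase in pages.get(url, "").lower())
```

A page that mentions the phrase but was never linked from the weblog is simply never considered, which is the whole point: the weblog acts as the filter.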

Anyway, I wrote software to do something like this, but Google is in a much better position to do it more effectively since they have an existing giant search engine infrastructure. Here are some revised notes from when I was designing it:

Currently, it is difficult for me to find specific pages I remember reading even just hours later. Bookmarks are the obvious answer to that, but there are problems. Bookmarking a page captures only a very small bit of the content of the page, normally just the title. This rarely captures the information you actually wanted from that page. Bookmarks in the major browsers encourage you to use hierarchical organization (folders within folders) which at times is useful, but I find in general doesn’t help me much in searching for pages I’ve visited. And bookmarks don’t scale well.

The next solution would be to search your browser’s cache. Hard drives are big enough that you can let your web cache grow fairly large without any serious worry over disk space. But anyone who has used the “search history” function in Internet Explorer knows that it is not as useful as it seems like it would be. Even if it weren’t a UI mess (Why is it in that tiny sidebar? Why does it only show page titles? Why can’t I resort the results? Why can I see when I saw a page in history mode but not in search history mode?) the fundamental problem with searching through the cache is that you repeat information sifting you have already done. What I mean is the process of searching the web for particular information or doing research or even just surfing randomly involves both hits and misses. While you may find the information you were searching for, you also visited many pages that seemed possibly relevant to your search, but after visiting them you realized they were not relevant or were just boring and not worth ever seeing again. Since your browser cache has no way of telling these arbitrarily “good” pages from “bad” ones, a search through the cache returns a lot of useless results.

Right now, the granularity of what you can search is usually confined to one of the following domains:

  • the whole web
  • the "weblog community"
  • recent news stories
  • a single domain
  • your entire browser cache

You really need to be able to set a “good” bit on pages on the web.

So instead, I wrote a program to search only your predefined “good” pages, called Flick.

Flick is a little side project I was writing before I lost my DSL connection, and probably will never release anyway. Flick (which vaguely stood for “friendly little internet content kettle”) was more of a proof-of-concept toy than a fully developed application.

Flick is a first attempt at a “personal content repository.” If you find a page that has information you think is important, and that you may want to find again later, you “flick” the page. This saves it to the little personal content repository database. Later on, you can search through this repository that only has the specific pages you found valuable.

My proof-of-concept solution was to create a simple client-side cache of pages, along with an inverted index to allow fast keyword searches on it. Not necessarily the most elegant or innovative solution (it heavily piggybacks on top of existing Perl modules that do almost all of the work) but I am finding it to be a fairly useful tool.
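For anyone who hasn’t met an inverted index: it maps each word to the set of pages containing it, so a keyword search becomes a few set lookups instead of a scan of every cached page. Here is a rough sketch in Python (not the Perl modules Flick actually leaned on); `tokenize` and `InvertedIndex` are names made up for this example, and the tokenizer is deliberately naive.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each word to the set of URLs whose cached text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url, text):
        """Index the cached text of one page under its URL."""
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query):
        """Return the URLs that contain every word in the query (AND search)."""
        words = tokenize(query)
        if not words:
            return set()
        # .get() avoids creating empty entries for words never indexed.
        results = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            results &= self.postings.get(word, set())
        return results
```

A real tool would also want stemming, stop words, and some kind of ranking, but the core data structure is about this small.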

To use Flick you surf normally. When you find a useful page, you click a “Flick it” bookmarklet, enter optional commentary, then click “save” and it’s automagically saved to your personal content repository. Later, you can search your flicked pages with a simple, standard web page. (See, Flick was also my attempt at a web-app in a box that runs locally: it packages an HTTP server and database into a self-contained Perl app that I compiled to a native exe. SQLite is fun like that.)
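The storage side of that is simple enough to sketch. This is an illustration in Python using the standard `sqlite3` module, not Flick’s actual code or schema: the `flicked` table, its columns, and the functions `open_repository`, `flick`, and `search` are all assumptions, and a plain `LIKE` match stands in for the real keyword index.

```python
import sqlite3


def open_repository(path=":memory:"):
    """Open (or create) the repository database and its single table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS flicked (
               url TEXT PRIMARY KEY,
               title TEXT,
               body TEXT,
               commentary TEXT,
               flicked_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def flick(conn, url, title, body, commentary=""):
    """Save (or re-save) a page the reader has marked as good."""
    conn.execute(
        "INSERT OR REPLACE INTO flicked (url, title, body, commentary) "
        "VALUES (?, ?, ?, ?)",
        (url, title, body, commentary),
    )
    conn.commit()


def search(conn, term):
    """Return (url, title) for pages whose title, body, or commentary match."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT url, title FROM flicked "
        "WHERE body LIKE ? OR title LIKE ? OR commentary LIKE ? "
        "ORDER BY url",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()
```

Pass a file path instead of `:memory:` to keep the repository between sessions. Note that `%` and `_` in the search term act as `LIKE` wildcards here, which a real tool would want to escape.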

But if you’ve maintained a weblog, you can definitely see a parallel between the “blog this” and “flick this” kind of mentality - it is a conscious, clear indication that “this page is x” where x could be good, bad, weird, stupid, but almost always includes “something I might want to find later.” And it can be pretty useful to keep a literal log of web pages, and with good commentary or quotes chosen it can be a valuable aid in searching, not just sharing interesting links temporarily, but it’s not a replacement for full-text search. (Flick allowed you to add commentary, specify an excerpt, and search through those. It has webloggish moments where it can pop out RSS files and HTMLified link lists of “recently flicked” pages.)
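The “pop out RSS files” part is the most weblog-like bit, and it is not much code. A hedged sketch in Python using the standard `xml.etree.ElementTree` module; the function name `recent_rss`, the feed title, and the `http://localhost/` link are placeholders invented for this example, not what Flick emitted.

```python
import xml.etree.ElementTree as ET


def recent_rss(items, feed_title="Recently flicked", feed_link="http://localhost/"):
    """Build an RSS 2.0 document from (title, url, commentary) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Pages recently saved to the repository"
    for title, url, commentary in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        # ElementTree escapes characters such as & and < when serializing.
        ET.SubElement(item, "description").text = commentary
    return ET.tostring(rss, encoding="unicode")
```

Feed it the most recent rows from the repository and you have something any aggregator can read.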

Again, Google with the Blogger acquisition is in an even better position to do some interesting things with a “find pages linked from my site x that contain phrase y” and in fact if I played around with the Google API I might be able to come up with a proof of concept using their existing tools, but you know, they didn’t hire me a few months ago when they could have (1) so that will never, ever happen. Not that I’m bitter. Although I will admit occasionally I have fantasies about Microsoft crushing them like they did Netscape.



Finally, even though I have certainly made fun of Blogger and numerous weblogs over the years, and haven’t been a Blogger user in a long time, I can’t help but feel (as Kottke noted) a certain (probably irrational) happiness because of the deal. There’s some feeling that this somehow helps to legitimize and boost personal, independent web publishing. And how can that not be a good thing?

  1. Don’t misinterpret this; I am of course still available for hire.
