Fill Crawlers

Some notes on how I am using crawlers as I’m collecting links.

I’ve started dabbling in crawlers with two simple prototypes. These may not even count as crawlers (they’re closer to simple web fetchers), but I think of them as being, or becoming, fill crawlers. Most crawlers are out exploring the Web, discovering material and often categorizing it according to some algorithm that determines relevancy. Here, I’m the one discovering and categorizing; the fill crawler only does the work of watching those pages, keeping me aware of other possibly relevant sites and notifying me when I need to update a link.

So, these crawlers are filling in the blanks for certain links. Filling in missing parts that aren’t editorial. This isn’t a crawler that is feeding the site’s visitors—it’s there for my utility.

For href.cool, the crawler isn’t really a crawler, given that it doesn’t do any exploring yet. It just updates screenshots, lets me know when links are broken and tracks changes over time. Eventually, I hope that it will keep snapshots of some of those pages and help me find neighboring links.
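To make the "broken links and changes over time" part concrete, here is a minimal Python sketch of that kind of check. It is not the actual href.cool code; the names `check_link`, `LinkReport` and `http_fetch` are made up for illustration, and it assumes that comparing a hash of the page body against the previous visit is a good enough signal that something changed.

```python
import hashlib
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A fetcher takes a URL and returns (status code, body bytes).
Fetcher = Callable[[str], Tuple[int, bytes]]


def http_fetch(url: str) -> Tuple[int, bytes]:
    """Fetch a URL over HTTP; a status of 0 means the host was unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, b""
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and so on.
        return 0, b""


@dataclass
class LinkReport:
    url: str
    broken: bool                 # unreachable or HTTP status >= 400
    changed: bool                # body differs from the previous visit
    content_hash: Optional[str]  # hash to store for the next visit


def check_link(
    url: str,
    previous_hash: Optional[str] = None,
    fetch: Fetcher = http_fetch,
) -> LinkReport:
    """Visit one link and report whether it is broken or has changed."""
    status, body = fetch(url)
    if status == 0 or status >= 400:
        # Keep the old hash so a temporary outage doesn't erase history.
        return LinkReport(url, broken=True, changed=False,
                          content_hash=previous_hash)
    digest = hashlib.sha256(body).hexdigest()
    # A first visit (no previous hash) is not reported as a change.
    changed = previous_hash is not None and digest != previous_hash
    return LinkReport(url, broken=False, changed=changed, content_hash=digest)
```

Calling `check_link(url, previous_hash=stored_hash)` on a schedule and saving the returned `content_hash` would be enough to flag dead links and changed pages. The `fetch` argument is only there so the logic can be tried out without touching the network.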

Anyway, I’ve had that crawler since the beginning and it will stay rather limited since it’s for personal use.

For indieweb.xyz, I’ve started a crawler that’s also for keeping the links updated. Yeah, I want to know when something is 404 and keep the comment counts updated. But I also want to get better comment counts by spidering out to see the links that are in the chain. This crawler allows indieweb.xyz to stay updated even if Webmentions don’t continue to come in from that link.
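The "spidering out" idea can be sketched as a breadth-first walk over the reply chain. This is only an illustration, not how indieweb.xyz actually does it: `count_chain_comments` and the `find_replies` callback are hypothetical names, and how replies get discovered for a given page (parsing the page, looking at stored Webmentions, or something else) is left entirely to that callback.

```python
from collections import deque
from typing import Callable, Iterable


def count_chain_comments(
    root: str,
    find_replies: Callable[[str], Iterable[str]],
    max_depth: int = 5,
) -> int:
    """Count the distinct pages that reply, directly or indirectly, to root.

    find_replies(url) is a hypothetical callback that returns the URLs
    known to reply to `url`. max_depth limits how far the walk follows
    replies to replies.
    """
    seen = {root}              # never count a page twice or loop forever
    queue = deque([(root, 0)])
    count = 0
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue           # deep enough; don't expand this page
        for reply in find_replies(url):
            if reply in seen:
                continue
            seen.add(reply)
            count += 1
            queue.append((reply, depth + 1))
    return count
```

With a lookup table standing in for the real Web, `count_chain_comments("post", lambda u: table.get(u, []))` gives the size of the whole chain, while `max_depth=1` gives only the direct replies. The `seen` set matters because comment chains can link back to each other.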

I think the thing that excites me the most about this crawler is that I’d like it to start understanding hypertext beyond the Indieweb. I’m hoping it can begin to index TiddlyWikis or dat:// links, so that they can participate. I’d really like TiddlyWiki users to have more options to broadcast that don’t require plugins or much effort; they should remain focused on writing.

Both of these projects are focused on trying to help the remaining denizens of straight-up Web hypertext find each other, without functioning like another social network that becomes the center of attention. To me, the crawler shouldn’t have the power to filter and sort all these writings; it simply acts as a voracious reader that looks for the key signifiers that normal readers and linkers are looking for anyway. (Such as links in a comment chain or tags that reveal categories.)

That’s all I have to say at the moment. I mostly put this out here so that people out there will know how these sites work—and to connect with other people (like Brad Enslen and Joe Jennett) who are doing cataloging work, to keep that discussion going.
