Some notes on how I am using crawlers as I’m collecting links.
I’ve started dabbling in crawlers with two simple prototypes—these may not
even be considered crawlers, but simple web fetchers or something like
that—but I think of them as being (or becoming) fill crawlers. Most
crawlers are out exploring the Web, discovering material and often categorizing
it, using some kind of algorithm that determines relevancy. Here, I’m the one
discovering and categorizing; the fill crawler only does the work of watching
those pages, keeping me aware of other possibly relevant sites and notifying me
when I need to update a link.
So, these crawlers are filling in the blanks for certain links—the missing
parts that aren’t editorial. This isn’t a crawler that feeds the site’s
visitors—it’s there for my utility.
For href.cool, the crawler isn’t really a crawler, given that it doesn’t do
any exploring yet. It just updates screenshots, lets me know when links are
broken and tracks changes over time. Eventually, I hope that it will keep
snapshots of some of those pages and help me find neighboring links.
Anyway, I’ve had that crawler since the beginning and it will stay rather
limited since it’s for personal use.
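The broken-link part of that work is simple to picture. Here’s a minimal
sketch in Python of what a periodic link check might look like—the actual
crawler isn’t shown here, so the function name and user-agent string are
made up for illustration:

```python
import urllib.request
import urllib.error

def check_link(url, timeout=10):
    """Return (ok, status) for a URL.

    A fill crawler would run something like this periodically over
    every catalogued link and flag the ones that stop responding.
    """
    req = urllib.request.Request(
        url,
        method="HEAD",  # we only care whether the page is alive
        headers={"User-Agent": "fill-crawler/0.1"},  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, resp.status
    except urllib.error.HTTPError as e:
        # The server answered, but with an error (404, 410, etc.)
        return False, e.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout—link is unreachable
        return False, None
```

A real version would also want retries and a delay between requests, but
the shape is the same: fetch, record the status, compare against last time.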
For indieweb.xyz, I’ve started a crawler that’s also for keeping the links
updated. Yeah, I want to know when something is 404 and keep the comment counts
updated. But I also want to get better comment counts by spidering out to see
the links that are in the chain. This crawler allows indieweb.xyz to stay
updated even if Webmentions don’t continue to come in from that link.
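Following the chain could be as simple as scanning each fetched page for
reply links—the microformats `u-in-reply-to` class that Webmention-style
posts already use. A rough sketch, assuming plain HTML input:

```python
from html.parser import HTMLParser

class ReplyLinkFinder(HTMLParser):
    """Collect outbound links marked as replies (the microformats
    'u-in-reply-to' class), so a crawler can follow the comment chain."""

    def __init__(self):
        super().__init__()
        self.reply_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "u-in-reply-to" in classes and attrs.get("href"):
            self.reply_links.append(attrs["href"])

def find_reply_links(html):
    """Return every reply-target URL found in an HTML document."""
    finder = ReplyLinkFinder()
    finder.feed(html)
    return finder.reply_links
```

Each link found this way is another page worth fetching—which is where a
simple link checker starts to become an actual spider.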
I think the thing that excites me the most about this crawler is that I’d like
it to start understanding hypertext beyond the Indieweb. I’m hoping it can begin
to index TiddlyWikis or dat:// links, so that they can participate. I’d really
like TiddlyWiki users to have more broadcast options that don’t require
plugins or much effort—they should remain focused on writing.
Both of these projects are focused on trying to help the remaining denizens of
straight-up Web hypertext find each other, without it functioning like another
social network that becomes the center of attention. To me, rather than giving
the crawler the power to filter and sort all these writings, it simply acts as a
voracious reader that looks for the key signifiers that normal readers/linkers
are looking for anyway. (Such as links in a comment chain or tags that reveal
what a page is about.)
That’s all I have to say at the moment. I mostly put this out here so that
people out there will know how these sites work—and to connect with other
people (like Brad Enslen and Joe Jennett) who are doing cataloging work, to keep
that discussion going.