Scraper: add URL of the page that was scraped

peter · October 30, 2006

Hello,
It would be good if the URL field were automatically filled in with
the URL of the page from which the reference is scraped.

Regards,
Peter

dstillman · October 31, 2006

This was mentioned elsewhere, but there's a necessary distinction between the URL field and, for example, a linked URL. The URL field isn't always appropriate: it should really only be used when the URL provides the actual content (say, a New York Times article) rather than just a reference to a physical copy stored elsewhere (say, a book's Amazon page). It's also complicated by the fact that most library catalogs and many other databases don't provide static URLs, so there's no use in saving those URLs. We may be adding a Repository URL field, however, to go along with the Call Number and Repository fields,* so for dynamic URLs that field would at least store the main URL of the library site.

Our goal is to update most of the scrapers to store a URL one way or another, but we haven't finished doing so yet.

* The astute observer will note that the Call Number and Repository fields already break the convention of having the parent item be an abstract representation rather than a physical/digital instance. However, we're not too keen on separating out that metadata, and the need to store data from multiple libraries is probably rare enough that the Extra field would suffice, so it'll probably stay this way.

peter · November 1, 2006

Ok, that is true of course. It's not _so_ important anyway ;-)