URIs (again)
So I'm still finding the URI scheme for zotero.org/2.0 strange, after having complained about it months ago and having Dan assure me it was just temporary.
Why am I still seeing trailing ids after the natural language slugs? E.g.:
http://www.zotero.org/bdarcus/34
Or:
https://www.zotero.org/groups/critical_geopolitics/23
You already preview the URI in group creation, so you can at the same time ensure the group names are unique (which is important anyway so that people don't create groups with large but unintentional overlap).
I would say if you want to include disambiguation ids somewhere in the URI, they might be GUIDs, and they should certainly go before the slug.
Also, this is an aesthetic thing, but can you change your slugifying code to use dashes instead of underscores to replace spaces?
http://www.zotero.org/bdarcus/34
http://www.zotero.org/bruce/34
http://www.zotero.org/darcus/34
will all point to the same thing. Note that the shorter http://www.zotero.org/bdarcus also points there. But if you change your username, it will become a dead link. Why? Putting human-readable ids makes the URI look "friendlier" on a first parse to me & it is probably more search-engine friendly to put content at the start of a URI. Having created groups that have hyphenated words, I don't know if I agree with your aesthetic. I wouldn't have a strong objection to it, but I like
https://www.zotero.org/groups/atom-probe_tomography
Wikipedia uses the same URI scheme, so maybe we all just like what we know ;-).
On the slugifying, it certainly is aesthetic, and so in that sense there probably is no objectively correct position. FWIW, I take my cue here from frameworks like Django, blog engines like WordPress, Drupal, etc., all of which use dashes (Django actually has a built-in slugify field type which does it all automatically).
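To make the dash convention concrete, here is a minimal sketch of a slugifier in Python. It is not Django's or Zotero's actual code; the function name and the `separator` parameter are just assumptions for illustration.

```python
import re
import unicodedata


def slugify(name: str, separator: str = "-") -> str:
    """Lowercase a name and join its alphanumeric runs with `separator`.

    Illustrative only: not the implementation used by Django or zotero.org.
    """
    # Fold accented characters to their closest ASCII equivalents.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Replace each run of non-alphanumeric characters with one separator.
    slug = re.sub(r"[^a-z0-9]+", separator, ascii_name.lower())
    # Drop separators left over at either end.
    return slug.strip(separator)


print(slugify("Critical Geopolitics"))
print(slugify("Atom-Probe Tomography"))
```

Note that with dashes as the separator, a hyphen that was already in the name can no longer be told apart from a replaced space, which is the trade-off raised above for names like "atom-probe tomography".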
Also, as far as the API is concerned, the item URI is just http://zotero.org/users/34/items/10049, and that will redirect to http://www.zotero.org/bdarcus/34/items/10049
This may not be the ideal scheme, but it does have its advantages and avoids certain problems with other approaches.
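As a rough illustration of the redirect described above, the sketch below maps an API item path to the username-based web URL. The lookup table, the function name, and the regular expression are hypothetical; only the two URL shapes come from this thread.

```python
import re
from typing import Optional

# Hypothetical lookup table; a real site would query its user database.
USERNAMES_BY_ID = {34: "bdarcus"}

# Matches API item paths of the form /users/<user id>/items/<item id>.
API_ITEM_PATH = re.compile(r"^/users/(?P<user_id>[0-9]+)/items/(?P<item_id>[0-9]+)$")


def web_location(api_path: str) -> Optional[str]:
    """Return the web URL an API item path would redirect to, or None."""
    match = API_ITEM_PATH.match(api_path)
    if match is None:
        return None
    user_id = int(match.group("user_id"))
    username = USERNAMES_BY_ID.get(user_id)
    if username is None:
        return None
    # The numeric id stays in the path, so the target survives a rename.
    return f"http://www.zotero.org/{username}/{user_id}/items/{match.group('item_id')}"


print(web_location("/users/34/items/10049"))
```

Because the lookup is keyed on the numeric id, changing the username only changes where the redirect lands, not whether the API URI resolves.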
OK, there are two separate issues here. There's the somewhat anal aesthetics of pretty URIs. This is admittedly not that important (though obviously something I care about).
There's also the really important issue about URI stability. Can we please have one identifier for each user/group/item in zotero, rather than many? This isn't just about serving web pages to users; it's also about being able to access structured data (e.g. RDF).
As an example, what possible real advantage is there to allowing users to change their user names that outweighs the problems on the data end? If I use that URI above for an item to cite something, does it end up in a 404 when I want to grab the data after later changing my user name? Or are you going to put in redirects, and added RDF triples, just to account for those changes?
http://www.zotero.org/groups/atom-probe%20tomography
should point to the right page.
https://www.zotero.org/groups/atom-probe%20tomography/11
https://www.zotero.org/search#group/atom
But it is probably not user friendly that this also works:
https://www.zotero.org/search#group/probe
https://www.zotero.org/search#group/probe%20tomograph
https://www.zotero.org/search#group/e%20t
while this does not:
http://www.zotero.org/search#group/atom%20probe
http://community.muohio.edu/blogs/darcusb/?p=585
http://community.muohio.edu/blogs/darcusb/archives/2009/05/10/html-5-microdata-use-cases
are the same thing?
It seems the choice is "pretty/opaque urls that are surprising on name changes" vs "ugly/magic urls that don't surprise."
As for the choice, this presumes there's a compelling need for users to change their usernames. My position is there is not.
The issue here is that Zotero.org can become a big, open, linked database of scholarly data. These data can be linked to Library of Congress data, and/or this in-development periodical data, etc., etc.
If they follow the principles of linked data, this is easy to do. But a fundamental prerequisite is a sane and stable URI scheme. It seems doubtful to me one exists ATM, which has me worried.
PS - What I meant by opaque is essentially that it's nowhere visible in the interface.
As I understand it, while URIs are mutable by design, it is URNs that are meant to be immutable. That seems to be why the former can be written off the cuff and provide a redirection mechanism, while the latter are issued by a standards authority and do not.
(This is unrelated to aesthetics and readability, of course.)
$ curl -I -H "Accept: application/rdf+xml" http://www.zotero.org/bdarcus/34/items/10051
HTTP/1.1 200 OK
Date: Sat, 16 May 2009 00:55:57 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.2.4
Set-Cookie: zotero_www_session=deleted; expires=Fri, 16-May-2008 00:55:56 GMT; path=/; domain=www.zotero.org
Set-Cookie: zotero_www_session=deleted; expires=Fri, 16-May-2008 00:55:56 GMT; path=/; domain=.zotero.org
Set-Cookie: lussumocookieone=deleted; expires=Fri, 16-May-2008 00:55:56 GMT; path=/; domain=.forums.zotero.org
Set-Cookie: zotero_www_session_v2=ppu887h5s6plr76ja3eu5sb2n2; path=/; domain=.zotero.org
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Content-Type: text/html; charset=utf-8
If I actually request that RDF, I get back HTML. Just because you can change URIs doesn't mean you should. You certainly shouldn't without a good reason. See http://www.w3.org/TR/cooluris/
I read the document, which proved to be very interesting. The reservations it expresses about changing URIs are based on the assumption that changing the URI leaves a dangling link. See also:
http://www.w3.org/Provider/Style/URI
No dangling links here, though, so long as the server continues to perform redirects according to the same scheme. If it were to stop doing so (let's say in connection with the abrupt introduction of a policy that users should not be able to change their usernames ...), that would be a clear breach of the guidelines.
If anything, the Tim Berners-Lee document linked immediately above would favor using the ID number alone, with the username dropped altogether (viz: "What to leave out? Everything!"). It's a question of whether we trust the current redirection scheme to remain in place. If we do, then it boils down to readability and aesthetics (which is the sort of coolness that TBL argues against).
As for the username changing policy, since that dictates the current URI scheme, I don't have a strong opinion (at least once the forums are changed to display full names rather than usernames). Username changes are probably rare enough that it could either not be allowed or be allowed via special request, in which case the user IDs could be removed from the URLs (or, more accurately, they would be a redirected alternative to the username, since not everything that generates URIs will have the usernames). I suspect we'd get a handful of change requests a year, and we could easily keep redirects for those to keep old links working.
For user items, I don't see a better option than the item id. (Yes, in an ideal world there wouldn't be item editing either, but there is.) We can, however, append a more user-friendly slug to the end and ignore it, as this forum does with discussion titles. (Or, better, we could redirect if a request didn't match the current slug.)
(URLs generated in the client and embedded in documents will use the item's secondary key, which is a random 8-character string, instead of the id, which is now server-generated and no longer synced to clients, but, e.g., http://zotero.org/bdarcus/items/2H8FWNL3 will redirect to http://zotero.org/bdarcus/items/12345. We use ids for items on the site because they're less unwieldy.)
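A minimal sketch of the "ignored slug" idea with the redirect variant, in Python. Everything here is an assumption for illustration: the slug table, the example slug, the path shape with a trailing slug, and the choice to serve a bare id without redirecting.

```python
import re
from typing import Optional, Tuple

# Hypothetical store mapping item id -> current human-friendly slug.
CURRENT_SLUGS = {12345: "fuel-prices-shift-math"}

# /<username>/items/<id> with an optional trailing /<slug>.
ITEM_PATH = re.compile(
    r"^/(?P<user>[^/]+)/items/(?P<id>[0-9]+)(?:/(?P<slug>[^/]+))?$"
)


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect location or None) for an item path."""
    match = ITEM_PATH.match(path)
    if match is None:
        return 404, None
    item_id = int(match.group("id"))
    current = CURRENT_SLUGS.get(item_id)
    if current is None:
        return 404, None
    requested = match.group("slug")
    # Only the id selects the item; the slug never affects the lookup.
    if requested is None or requested == current:
        return 200, None
    # A stale slug still finds the item but is redirected to the current one.
    return 301, f"/{match.group('user')}/items/{item_id}/{current}"


print(resolve("/bdarcus/items/12345/old-title"))
```

The point of the design is that the id alone identifies the item, so editing a title can change the slug without ever breaking an old link.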
The current plan is to loosely associate user items with "abstract" items, which would aim to be the canonical representations of user items. I say "loosely" because this sort of aggregation will be inexact and subject to user data changes or algorithm improvements. Abstract items will probably be available at http://zotero.org/items/12345, with the same sort of optional, ignored, human-friendly suffix as discussed above. There'll be a link to the abstract item URI available on the user item web page and in API responses.
Finally, eventually we'll also get around to switching to zotero.org instead of www for site URLs. Generated URIs already all use zotero.org.
My point on content-negotiation wasn't clear, but I was just meaning to say the server was returning incorrect information (I believe; could be wrong).
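To spell out the mismatch I mean, here is a small Python sketch that compares a request's Accept header with the Content-Type that came back. It is deliberately simplified (it ignores q-values and partial wildcards like text/*), and the function names are made up for this example.

```python
def media_type(header_value: str) -> str:
    """Strip parameters such as '; charset=utf-8' and normalise case."""
    return header_value.split(";", 1)[0].strip().lower()


def honours_accept(accept: str, content_type: str) -> bool:
    """True if the response media type is one the Accept header listed.

    Simplified: ignores q-values and partial wildcards such as text/*.
    """
    accepted = {media_type(part) for part in accept.split(",")}
    return "*/*" in accepted or media_type(content_type) in accepted


# Header values taken from the curl output earlier in this thread.
print(honours_accept("application/rdf+xml", "text/html; charset=utf-8"))
```

For the request shown earlier, the check fails: the client asked only for application/rdf+xml and the response was labelled text/html.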
On the periodicals dataset, I believe there's a bug with the PHP framework they're using (which I've reported).
curl -I -H "Accept: application/rdf+xml" http://www.google.com
Leads to similar behavior.
The source proper is then linked/associated from that bookmark.
The tricky/awkward bit is modeling/identifying that linked resource. Maybe you have two separate relation properties: one to what you call the "canonical" version, and another to the user's version (which I would hope would generally be the same).
To jot it down in RDF, something like (warning: strawman):
<http://zotero.org/jdoe/items/1> a a:Bookmark ;
    dct:creator <http://zotero.org/jdoe> ;
    bm:recalls <http://www.nytimes.com/2008/06/25/business/25exurbs.html> ;
    z:udata <http://zotero.org/jdoe/items/1/udata> .
# here we represent the "canonical" data (would be much more verbose normally)
# note: could also use a zotero uri, and add owl:sameAs link to the nytimes URI
<http://www.nytimes.com/2008/06/25/business/25exurbs.html> a bibo:Article ;
    dct:title "Fuel Prices Shift Math for Life in Far Suburbs"@en .
# here we represent the user data, maybe only if it differs from the above
# note: still need a way to merge and disambiguate these data
<http://zotero.org/jdoe/items/1/udata> a bibo:Article ;
    dct:title "Fuel Prices Shift Math for Life in Far Suburbs"@en .
For group names, we're considering removing group ids for public groups and allowing a fixed number of group name changes within a given period (say, twice in six months), with automatic, permanent redirects. It seems there might be legitimate reasons to change a group name, and having a human-readable slug in a URL is valuable, and, if we're going to support post-name-change redirects from /groups/group_name anyway—which we should—there's no need to use the group id. Private groups will continue to use just the group id.
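Here is a rough Python sketch of how the rename limit and permanent redirects for public groups could fit together. The class, its method names, and the 183-day approximation of "six months" are all assumptions; nothing here reflects actual site code, and it tracks a single group without checking slug uniqueness across groups.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MAX_RENAMES = 2                  # "twice ..."
WINDOW = timedelta(days=183)     # "... in six months", approximated


class GroupSlugs:
    """Tracks one group's current slug, rename history, and old slugs."""

    def __init__(self, slug: str) -> None:
        self.slug = slug
        self.rename_times: List[datetime] = []
        self.redirects: Dict[str, str] = {}  # old slug -> current slug

    def rename(self, new_slug: str, now: datetime) -> bool:
        """Rename the group unless the limit for the window is used up."""
        recent = [t for t in self.rename_times if now - t < WINDOW]
        if len(recent) >= MAX_RENAMES:
            return False
        # Re-point every earlier slug straight at the new one (no chains).
        self.redirects = {old: new_slug for old in self.redirects}
        self.redirects[self.slug] = new_slug
        # Renaming back to an old slug must not leave a self-redirect.
        self.redirects.pop(new_slug, None)
        self.slug = new_slug
        self.rename_times.append(now)
        return True

    def lookup(self, slug: str) -> str:
        """Return the slug a request for `slug` should end up at."""
        return self.redirects.get(slug, slug)


group = GroupSlugs("critical_geopolitics")
group.rename("critical-geopolitics", datetime(2009, 5, 1))
print(group.lookup("critical_geopolitics"))
```

Keeping every old slug pointed directly at the current one means an old link needs a single redirect no matter how many renames happened in between.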
Bruce, I don't really see the need for separate entities (bookmark and user data) for user items. Having a single user item with owl:sameAs pointing to an abstract item (if one exists yet, which it may not until some asynchronous processing has occurred) seems perfectly sufficient.
And yes, the abstract items I'm referring to would be abstract Zotero items, with Zotero URIs such as http://zotero.org/items/13245. (These URIs would have associated web pages with statistics and links to users.) Abstract items would in turn point to external resources.
As you note, a user item might only include the original data if different from the abstract item's data.
What about notes and attachments?
The same problem exists in the API, which we currently use internally but haven't yet made public. In the API, we put user data (user, date added, date modified) into the containing Atom entry fields, and the item data into the <content>. We currently just use custom XML in <content> but will likely switch to RDFa once the BIBO mapping is complete. But other than just using the Atom fields for the user data, how, then, do you model the two types of data in the RDFa response without requiring separate API requests for each item returned by the original request?
Might it be worth jotting down the ideas on the Trac wiki (Sean put a page up for the BIBO mapping; could be there, or linked from there)?
https://www.zotero.org/trac/wiki/URIScheme
Dan, please adjust/comment as you see fit.