The URL Problem

Internet-literate people use URLs to figure out where they are online. They use them to judge the accuracy of a website, figure out who is probably writing it, and determine how to interpret the discussion on it.

Internet-illiterate people, by contrast, don’t make use of URLs, much less understand what “URL” means or refers to. Give them a text box marked “URL” on a blog comment form, and they’ll fill it with something else (which the blog comment software will dutifully but distastefully percent-encode, turning each space into %20 and each exclamation point into %21). Here’s a selection of things people struggling to use the Internet entered in that field (with the title of the blog post each comment appeared on in parentheses):

  1. http://hey/ (Movie: Holes)
  2. http://I%20donno%20wat%20this%20is%21 (Movie: Holes)
  3. http://cancel%20e-fax%20service (Cancelling eFax Service)
  4. http://google/ (Google Answers HCI Program)
  5. http://lmw52530yahoo.com/ (Google Answers HCI Program)
  6. http://microsoftinternetexplorer/ (Google Answers HCI Program)
  7. http://maury%20povich (Maury’s Blooper)
  8. http://I%20need%20your%20help%20%21%21%21%21%21%21%21%21%21%21%21 (Maury’s Blooper)
  9. http://St.%20KITT%27S (Maury’s Blooper)
  10. http://MelissaSweetMilaforsale (How To Sell A Wedding Dress)
  11. http://dont%20know (Spiders! Ack!)
  12. http://communitionsfromelsawhere/ (Spiders! Ack!)
  13. http:///????????????????????????????????? (Can we talk about Avril Lavigne for a minute?)
  14. http://metoo,fromethiopia/ (World Youth Congress 2008 – Need Help)
  15. http://abid/ (Maury’s Blooper)
  16. http://bigpond/ (Maury’s Blooper)
  17. http://houston,tx/ (Maury’s Blooper)
  18. http:///???url??? (Harry Potter)
  19. http://sorry%20no%20email (Who Is Josh Server?)

Got that? That’s a greeting (1); an element from the title of the post or blog being commented on (3, 4, 12); an email address (5, also entered in the “email” field); the name of the browser the commenter is apparently using (6); the person being addressed (7); an apparent subject line (8, 10); the commenter’s geographic location (9, 14, 17); the commenter’s name (15, also posted in the “name” field); the commenter’s apparent ISP (16; Big Pond is an Australian ISP); and last but not least, abundant indications that the commenter just does not know what in hell “URL” means (2, 11, 13, 18, 19).
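For the programmers reading along, here is a minimal Python sketch of how entries like these could be decoded and sanity-checked. It is not what the blog software actually does; the function name `describe_url_field` and the hostname pattern are assumptions made up for illustration, and the pattern is only a rough shape test, not a real validator.

```python
import re
from urllib.parse import unquote, urlsplit

# Rough, assumed heuristic: dot-separated labels of letters, digits, and
# hyphens, ending in an alphabetic top-level label. It checks shape only.
HOSTNAME_SHAPE = re.compile(
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}"
)

def describe_url_field(value):
    """Return (decoded_host_text, looks_like_hostname) for one submitted "URL"."""
    host_part = urlsplit(value).netloc   # whatever sits between "//" and the next "/", "?", or "#"
    decoded = unquote(host_part)         # undo the percent-encoding: %20 becomes a space, %21 becomes "!"
    looks_like_hostname = HOSTNAME_SHAPE.fullmatch(decoded) is not None
    return decoded, looks_like_hostname

if __name__ == "__main__":
    for entry in (
        "http://hey/",
        "http://I%20donno%20wat%20this%20is%21",
        "http://lmw52530yahoo.com/",
        "http:///???url???",
    ):
        print(entry, describe_url_field(entry))
```

Note that entry 5 passes the shape test even though it is really an email address with the @ missing. A check like this can flag the obvious non-URLs, but it cannot tell you what the commenter meant.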

These were among the findings from my dissertation. As a result, I’ve been on a kick to fight for greater understanding of URLs. I have this idea that a browser plugin needs to be developed to support this understanding among everyday users. I’ve been talking about this to a lot of programmers I know.

Some of them have agreed with me; some have not. Blaine Cook and I had a lively debate about it a while back. His position was roughly along the lines of what he was saying in a comment in this discussion:

…people don’t understand URLs, and no amount of training will ever make them understand. URLs are comprised of seven abstract concepts: scheme, userinfo, host, port, path, query, and fragment. Even if we say we can safely teach people only about domain, port, path, and query we still have to teach them that they can (usually) ignore the scheme, userinfo, and fragment.

Later, he says “The OpenID community needs to give up on URLs as identifiers.”
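To make the seven components Blaine lists concrete, here is a small sketch using Python’s standard `urllib.parse`. The helper name `url_components` and the example.com address are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit

def url_components(url):
    """Split a URL into scheme, userinfo, host, port, path, query, and fragment."""
    parts = urlsplit(url)
    # urlsplit exposes username/password separately; userinfo is whatever
    # precedes the last "@" in the network-location part, if anything.
    userinfo = parts.netloc.rpartition("@")[0] or None
    return {
        "scheme": parts.scheme,
        "userinfo": userinfo,
        "host": parts.hostname,   # lowercased by urlsplit
        "port": parts.port,       # int, or None when the URL names no port
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }

if __name__ == "__main__":
    example = "https://alice@www.example.com:8080/posts/42?sort=new#comments"
    for name, value in url_components(example).items():
        print(f"{name:9} {value}")
```

Of the seven, the host is the piece that says the most about who is behind a page, which is why it matters so much for the argument that follows.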

The debate we had was not so much an argument with each other as it was an argument past each other (as in “ships in the night”). Blaine’s concern, and the concern of many of the people working on OAuth and similar identification schemes, is identifying individuals. For individuals, URLs are not particularly good identifiers; probably they never have been.

My concern, however, is usually verifying the identity of larger organizations, like corporations, lobbying groups, political parties, etc. It’s a concern I’ve had since I began reading up on media ownership during the Great Consolidations of the mid-’90s.

If online verification standards gave up URLs, it would be harder to determine which organization was involved in a particular website. Now, it doesn’t seem likely that URLs will be entirely abandoned as part of what Lessig calls “architectures of credentials.” What is happening, however, is that interface designers are starting to hide URLs from users as much as possible, particularly on mobile devices.

When I visited Yahoo a little while back to talk about my dissertation research, I discussed this with a guy who does some HCI work for them. He contended that URLs ought to go away to make the user experience less confusing.

I balked, and told him what I’ve taken to saying: if you want the Internet to become TV — where it is hard to figure out who owns a channel, produces a show, or generally influences what you’re seeing — then sure, go ahead and hide URLs from users. If you want the Internet to stay the Internet, though, it’s important that users still have the option to read the URL and do things like write it into whois lookups. If you work in the industry, it’s your choice.

When I threatened the guy at Yahoo with the televisionizing of the Internet, he shrugged and said that he really wasn’t in control of such matters. Of course you are, I said, frustrated. You work in the industry, you make decisions about things like this every day, for your job. But it’s not up to me, he said. It’s the guys down the street who get the final say, he said, gesturing out the cafeteria’s immense plate glass windows, down the tree-shaded highway, in the direction of Cupertino.

Well, guys in Cupertino, and elsewhere in Silicon Valley and Redmond. Which will it be: the transparency of a URL you can take over to InterNIC, or yet another mask on the many-headed hydras of News Corporation, Clear Channel, and Disney?

I’ll leave that gauntlet lying there. I’m aware it’s not that simple, for reasons I’ll go into next. But it’s worth keeping in mind what URLs do for us as tech-savvy types; when parts of the URL disappear for us, we tend to pitch a fit.

* * *

I sat down today to write a chapter about URLs and search engines for the book I’m about to pitch. In it, I’m trying to explain to a popular audience how geeks figure out whether a domain is legit.

I realized as I tried to explain whois lookups that my assumption about their utility is a little presumptuous. On many sites which have a web interface for whois, the registration information doesn’t go as deep as the actual owner — just the registrar. So I wrote using Network Solutions’s lookup service as an example. (I’m not assuming a general audience will be able to run whois from the command line.)
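For readers who do live on the command line, here is a rough sketch of what a whois lookup involves underneath: the WHOIS protocol (RFC 3912) is plain text sent over TCP port 43. Several things here are assumptions: the default server `whois.verisign-grs.com` (the registry server for .com as far as I know), the helper names `whois_query` and `find_fields`, and the “Label: value” layout of the reply, which varies from server to server.

```python
import socket

def whois_query(domain, server="whois.verisign-grs.com", timeout=10.0):
    """Send one WHOIS query over TCP port 43 and return the text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")  # a query is one line ending in CRLF
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def find_fields(reply, label):
    """Collect the values of every 'Label: value' line whose label matches."""
    wanted = label.lower()
    values = []
    for line in reply.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip().lower() == wanted:
            values.append(value.strip())
    return values

# Example use (needs network access, so it is left commented out):
# reply = whois_query("mauryshow.com")
# print(find_fields(reply, "Registrar"))
# print(find_fields(reply, "Registrar WHOIS Server"))
```

A registry-level reply of this kind tends to name the registrar and point at the registrar’s own whois server rather than the owner, which is the same “just the registrar” problem the web interfaces have.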

And then I realized there are other assumptions I’m making about people’s skills and knowledge when I’m suggesting whois lookups. I compared the whois information for mauryshow.com and maurypovich.com as an example. The former is the Maury Povich Show’s actual website; the latter was snapped up by a vendor of New York City-area theater and TV show tickets. The registrants are Universal City Studios LLLP and New York TV Show Tickets Inc., respectively.

This is a pretty good case for how confusing whois can be. Why not assume NYTVSTInc. is the company most closely tied to Maury? There’s “TV Show” in the name. You have to know that Universal City Studios is a TV production house to make the correct judgment in this case. So, what? Do I tell people to go further down the rabbit hole? Check IMDB? How about SEC filings? This is starting to be heavy lifting.

I’m convinced such heavy lifting is valuable in certain cases. Frankly, I wish students were encouraged to do some equivalent of a whois lookup on every text they encounter in school which they are asked to take as fact: textbooks, newspapers, journal articles, etc. To some extent we are already taught to do this from a very early age. Who wasn’t asked by a grade-school teacher to recite the author of a book along with the title? (I have a distinct memory of droning “STOPping by WOODS on a SNOWy EVEning by ROBert – LOUis – STEEvenson” along with the rest of Mrs. Lahorgue’s first grade class.)

It’s important to know where the author of a text is coming from in order to evaluate its veracity. Even more so in an age like ours, where the criteria for evaluating the source of a text are in chaotic upheaval. (Isn’t that video being cited by the Tea Partiers from The Onion? How do we know when medical journals are sponsored by pharmaceutical companies? Which online articles count toward tenure, and which don’t? Would you rather have Facebook or something like Webfinger as the repository for your online identity? Is Adrian Lamo really a journalist?)

The task as I see it, as an educator, is determining when in the reading process we should be recommending that kind of heavy lifting. And then, of course, the more difficult task is getting schools to understand and implement this kind of reading-of-sources. When it comes to URLs, this is a particularly thorny problem, as many schools simply do not allow reasonable access to the Internet.
