The Web is broken

...as a platform for electronic publishing, long-term storage and reference. One might embrace this instability, as net.art has done from early on. But it means that the world continues to be stuck with print books and journals for all "serious" publishing, with all the negative - and no longer necessary - implications for public access to information, research, study and learning opportunities outside rich institutions and countries.

Reasons why the Web is broken for long-term electronic publishing:

  • The URL system is broken since it doesn't sufficiently abstract from physical server addresses. This has always been a problem, but has escalated with the (a) commercialization of the Web and (b) proliferation of content management systems: (a) URLs rely on DNS, DNS does not abstract enough from IP addresses and has been tainted by branding and trademarks. (b) Content management systems create internal namespaces (that taint URLs and document structure) and are highly unstable, getting rewritten or replaced every couple of years. If a document lives in a CMS, it is unlikely to survive for years at its URL. (Ultimately, content management systems just add another layer of spam to the Web.)

  • As a side effect, any kind of reference to an online resource - be it a citation, link or embedded quotation - cannot be made reliably, which is why the WWW has not met its original goal of providing a distributed hypertext system.

  • Still, the Web is not distributed enough, because web servers are single points of failure (and of document death - an issue closely linked to the URL and DNS system).

  • HTML does not provide enough structure for research-level publishing. More capable alternatives such as extended XHTML, DocBook XML and TEI XML have not succeeded because their complexity is too much for people trained on graphical software that emulates, and thus artificially extends, analog tools and their workflows. Because of this legacy, not even the rudimentary semantic markup structure of HTML has been widely understood and used.

  • Changes / editing histories of documents can only be tracked on the level of individual content management systems (such as Wiki engines), not on the Web as a whole. However, built-in revision control and version rollback are a necessary precondition for reliable referencing of documents. [Ted Nelson had that figured out in the 1970s.]

What could be done:

  • Introduce a new document identifier/addressing system that fully abstracts from DNS, using cryptographic hashes as document identifiers and a distributed registry for those document hashes (a first sketch follows below this list).

  • Introduce document revision control and rollback at the protocol/API level. This would allow server software to implement whatever local revision control it prefers, or to rely on existing systems (RCS/CVS, Subversion, git etc.), while still maintaining network compatibility (a second sketch follows below this list).

  • There is no hope for high-level standardization of document formats (such as simplified TEI XML), so simply allow any open standard document format.
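
A first sketch of hash-based addressing, written in Python; SHA-256, the "hash:sha256:" prefix and the function names are illustrative assumptions, not part of the proposal itself. The identifier is derived from the document's bytes, so any faithful copy on any server resolves to the same address, any change yields a new identifier, and a distributed registry - here simulated by a plain dictionary - only has to map identifiers to the servers currently holding copies.

    import hashlib

    def document_id(document: bytes) -> str:
        """Derive a location-independent identifier from the document's content."""
        return "hash:sha256:" + hashlib.sha256(document).hexdigest()

    # Stand-in for a distributed registry mapping identifiers to mirror servers.
    registry: dict[str, list[str]] = {}

    def publish(document: bytes, mirrors: list[str]) -> str:
        """Register the servers that hold a copy, keyed by the content hash."""
        doc_id = document_id(document)
        registry.setdefault(doc_id, []).extend(mirrors)
        return doc_id

    doc = "The Web is broken ...".encode("utf-8")
    doc_id = publish(doc, ["server-a.example", "server-b.example"])
    print(doc_id)            # the same bytes give the same identifier on any server
    print(registry[doc_id])  # any listed mirror can serve the document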

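A second sketch, for revision control at the protocol/API level, again in Python and with invented names (Revision, DocumentStore): the network-facing interface needs only two operations - committing a new revision and retrieving the document at a given revision number - so that a citation can pin an exact state of a text, while how a server stores its history (RCS/CVS, Subversion, git, a database) remains a purely local decision.

    from dataclasses import dataclass, field

    @dataclass
    class Revision:
        number: int
        content: bytes

    @dataclass
    class DocumentStore:
        """Protocol-level view of one document: its full, addressable history."""
        revisions: list[Revision] = field(default_factory=list)

        def commit(self, content: bytes) -> int:
            """Store a new revision and return its number."""
            self.revisions.append(Revision(len(self.revisions) + 1, content))
            return self.revisions[-1].number

        def get(self, number: int | None = None) -> bytes:
            """Return the latest revision, or a specific one for stable citation."""
            rev = self.revisions[-1] if number is None else self.revisions[number - 1]
            return rev.content

    store = DocumentStore()
    store.commit(b"first draft")
    store.commit(b"revised draft")
    print(store.get())   # b'revised draft'
    print(store.get(1))  # b'first draft' - a citation can pin this revision
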
This needs to be written up as a paper, with technical terms unwrapped for non-technical readers.

Tags: internet.
2nd January 2009