Maintaining links

Filed under: Internet · Date: Fri Feb 16 00:06:40 2007

A recent meme among web developers is duplicate links, and why they should be avoided. The reasons are easy to understand, but the methods for avoiding them aren't so readily at hand.

The web is built for fault tolerance

Consider a link like http://www.host.com/story. The link is hackable, it's easy to write, and if you link consistently, it's the only publicly available link. But the web was not built entirely on links. If you append a slash to the link, the contents are the same, but the link has changed.

Depending on the CMS used, the same content could also be accessible through other links, for example http://www.host.com/index.php?id=100 or http://www.host.com/story.html. These are all different links even though the content is exactly the same.
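To see why these count as different links, here is a small Python sketch. It is only an illustration and not part of any CMS: it splits the example URLs with the standard urllib.parse module and compares the path and query of each. The helper name link_key is made up for this example.

```python
from urllib.parse import urlsplit

# The example links from the text, plus the trailing-slash variant.
variants = [
    "http://www.host.com/story",
    "http://www.host.com/story/",
    "http://www.host.com/story.html",
    "http://www.host.com/index.php?id=100",
]


def link_key(url):
    """Return the parts of a URL that identify a resource on one host."""
    parts = urlsplit(url)
    return (parts.path, parts.query)


# A set keeps only distinct keys, so its size is the number of different links.
distinct = {link_key(url) for url in variants}
print(len(distinct), "different links for one piece of content")
```

None of the keys are equal, so a browser, a cache or a search engine has no way of knowing that the four links lead to the same story unless the site tells them.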

Maintaining links

The problem of having multiple links to the same content is rooted in most content management systems and web server configurations. While content management systems manage the data or content of a website, they don't consider links to be content important enough to manage. Links are therefore left unmanaged, and the same content will unavoidably be served from different URLs.

To solve the problem, links must be managed. The CMS must keep track of each link it serves, including the links it has served in the past. It's not enough to keep track of the current state of links, because the site administrator could change a link after publishing a page. Say there is a spelling error in the link, and it's corrected. Most modern systems just throw away the information they had on the old, incorrectly spelled link and start serving the new link instead. What the CMS should do is redirect the old link to the new link.
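The spelling-correction case can be sketched in a few lines of Python. This is a rough in-memory illustration under my own assumptions, not how any particular CMS does it: the class LinkRegistry and its methods publish, rename and resolve are hypothetical names, and a real system would persist the links instead of keeping them in a dictionary.

```python
class LinkRegistry:
    """Hypothetical in-memory registry of every link a site has served."""

    def __init__(self):
        # Maps a link to (HTTP status, target). For status 200 the target
        # is a page id; for status 301 it is the link to redirect to.
        self.links = {}

    def publish(self, uri, page_id):
        """Start serving page_id at uri."""
        self.links[uri] = (200, page_id)

    def rename(self, old_uri, new_uri):
        """Move a page to new_uri and keep old_uri as a permanent redirect."""
        status, page_id = self.links[old_uri]
        if status != 200:
            raise ValueError("only a link that serves a page can be renamed")
        self.links[new_uri] = (200, page_id)
        self.links[old_uri] = (301, new_uri)  # remembered, not thrown away

    def resolve(self, uri):
        """Return (status, target); links never served yield 404."""
        return self.links.get(uri, (404, None))


registry = LinkRegistry()
registry.publish("/stroy", 100)      # published with a spelling error
registry.rename("/stroy", "/story")  # corrected later
print(registry.resolve("/stroy"))    # the old link now redirects
print(registry.resolve("/story"))    # the new link serves the page
```

The point of the sketch is the last line of rename: the old link stays in the registry, so anyone who bookmarked or linked to the misspelled address still ends up at the page.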

Application

Link management requires that the links are stored in a central place -- after all, a link can't point to two different pages, so it only makes sense to prevent such things from happening.

For an experienced developer, adding link management is not difficult, although it takes time to change the entire CMS to work with a universal repository of links.

The following example shows a minimalistic implementation of a link repository for link management:

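CREATE TABLE link (
    uri    VARCHAR NOT NULL PRIMARY KEY,  -- the link itself; one row per link
    id     INTEGER,                       -- CMS identifier of the page or data to show
    status INTEGER DEFAULT 200,           -- HTTP status served for this link
    meta   VARCHAR                        -- details for non-200 statuses, e.g. a redirect target
);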

The id attribute is an identifier in the CMS for the page or data which the link should show. This information will depend on the CMS, and it could require additional attributes, like the type of link. The HTTP status is saved in the status attribute. The default value 200 indicates that everything is OK. If the status is not OK, then the meta attribute can be used to tell what's wrong with the link. If the status is 301 or 302, then meta could point to the new location of the page.
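As a rough sketch of how a CMS might consult such a table, the Python snippet below loads it into an in-memory SQLite database and looks up a few links. The function name resolve, the sample rows and the returned dictionaries are assumptions made for this illustration. Declaring uri as the primary key is also a choice made for the sketch, so that a link can be stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link (
        uri    VARCHAR NOT NULL PRIMARY KEY,
        id     INTEGER,
        status INTEGER DEFAULT 200,
        meta   VARCHAR
    )
""")

# Sample data (made up): a live page, a redirect left behind by a
# spelling correction, and a page that has been removed.
conn.execute("INSERT INTO link (uri, id) VALUES (?, ?)", ("/story", 100))
conn.execute("INSERT INTO link (uri, status, meta) VALUES (?, ?, ?)",
             ("/stroy", 301, "/story"))
conn.execute("INSERT INTO link (uri, status, meta) VALUES (?, ?, ?)",
             ("/old-news", 410, "removed by the editor"))


def resolve(conn, uri):
    """Look up a link and describe what the server should do with it."""
    row = conn.execute(
        "SELECT id, status, meta FROM link WHERE uri = ?", (uri,)
    ).fetchone()
    if row is None:
        return {"status": 404}  # this link was never served
    page_id, status, meta = row
    if status in (301, 302):
        return {"status": status, "location": meta}  # meta is the new location
    if status == 200:
        return {"status": 200, "page_id": page_id}
    return {"status": status, "reason": meta}  # meta says what is wrong


for uri in ("/story", "/stroy", "/old-news", "/missing"):
    print(uri, resolve(conn, uri))
```

Because uri is the primary key here, inserting a second row for /story fails with sqlite3.IntegrityError, which is one way to prevent a link from pointing at two different pages.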

Conclusion

Maintaining links is not difficult, but it requires a certain amount of time to get it right. Managing links helps the CMS keep a single link for each page.

