Wednesday, May 02, 2007

The broken Web

Two months ago I got into an argument with TR about the quality of Microsoft PowerPoint; today I got into an argument with Truman about the quality of web pages and the intellect of their designers. I suppose I have a short fuse when it comes to software quality, layout and usability.

On the first occasion the argument was totally off-topic, but at least it was on TR's own blog, and he was the one who introduced the PowerPoint motif. This time, though it was not entirely off-topic, it was on a third person's blog (Dilip's), which makes it much less excusable. So I may as well put down what I think about the whole thing, and in future link to it rather than argue. (Of course, many others have written similar things, and I will link to some of them below.) In this post I'll focus on the web and web pages; in a later post I'll talk about Microsoft's products. But in both cases a key point is the emphasis on standards versus the emphasis on proprietary (and expensive) products.


We take standards for granted everywhere in daily life: my Sony CDs of Miles Davis will play on my Onkyo player, or someone else's Philips player, or on a laptop, without problem. That's because audio CDs adhere to the Red Book standard. Similarly, you can fill your car with petrol at any of a dozen pumps: the formulation is standard (ignoring the adulteration problem), and a car manufacturer that insisted on its own particular formulation would not find a huge market. (This is also why alternative fuels like CNG find it hard to break in.)

In computing, too, many things are standard, like plain text files (encoded in the ASCII character set) or MP3 audio files. Specifications exist for these, and readers and players must conform to them or they don't work.

What happens if you create a document, a CD or a petrol formulation that deviates from the specifications? In mild cases you may just get degraded performance (e.g., the document displays with errors, the CD plays only partially, the car runs with poor mileage); in extreme cases the product is unusable.

For interoperability, then, adherence to standards is essential. So why does it happen so often that "this web page looks funny"?

The language of the web is HTML, or "HyperText Markup Language". Its specifications are maintained by the World Wide Web Consortium (W3C), whose members include Microsoft, the Mozilla Foundation, Opera Software, and pretty much any other web-related company you can think of. There are now several versions of HTML (and of its successor, XHTML), each with its own specification. The W3C hosts a validator to help you check the correctness of your web page. Mozilla's web page passes, as does Opera's, while Microsoft's has five errors, which is really quite respectable. Most web pages report dozens of errors. In fact, you'd have a hard time finding sites, other than the above two and the W3C's own pages, that pass the validator.

Why is practically every page on the internet invalid HTML? There are several reasons:

  • If you write a syntactically incorrect C program, it will not compile. By contrast, if you write a syntactically incorrect HTML file, most browsers will attempt to display it anyway. If the results look good enough, most web developers won't care that the file is wrong. This leads to the "degraded performance" problem I mentioned above; strict syntax checking would have avoided it.
  • Most people no longer write HTML by hand: they use programs like Microsoft FrontPage. These, too, don't care about errors as long as the result looks OK in a browser (often a specific browser, typically Microsoft Internet Explorer).
  • Back in the early days of the web, there were very few browsers around; the first to achieve real popularity was NCSA Mosaic, but it was quickly supplanted by Netscape Navigator. Netscape, knowing it had a market lead and wanting to retain it, extended HTML in various ways without the consortium's approval, and encouraged developers to use its extensions (which its own HTML authoring program also produced). Later, Microsoft strong-armed its way into the market lead and followed the same tactics. Meanwhile, neither company paid much attention to actually following the W3C's recommendations. The result was a mishmash of standards -- the W3C's (which nobody fully supported), Netscape's, and Microsoft's -- and most developers (the ones who cared) just threw up their hands and supported the two most popular browsers, and eventually just Microsoft's.
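The contrast in the first point, strict versus lenient parsing, can be illustrated with Python's standard library. This is a minimal sketch, not the W3C validator: a real validator also checks a page against its declared DTD, whereas here an XML parser (`xml.etree.ElementTree`) only checks well-formedness, and `html.parser.HTMLParser` stands in for a forgiving browser. The sample markup and the helper names `strict_accepts` and `lenient_tags` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Deliberately malformed markup: <b> and <i> are closed in the wrong order,
# and the <p> element is never closed.
BROKEN = "<p>Some <b>bold <i>and italic</b> text</i>"


def strict_accepts(markup: str) -> bool:
    """Return True if a strict (XML) parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Like a C compiler, the strict parser refuses to go on.
        return False


class TagCollector(HTMLParser):
    """A lenient parser that records every start tag and ignores nesting errors."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup: str) -> list:
    """Parse the markup leniently and return the start tags seen, in order."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


print(strict_accepts(BROKEN))  # False
print(lenient_tags(BROKEN))    # ['p', 'b', 'i']
```

The strict parser rejects the mis-nested tags outright, while the lenient one reports three tags and carries on without complaint; that forgiveness is exactly what lets broken pages go unnoticed as long as they "look OK".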

Today the browser situation is much better: at least three major browsers -- Mozilla, Safari/Konqueror, Opera -- are largely standards-compliant, and the new Internet Explorer 7 is a big improvement over its predecessors. Nevertheless, most web pages out there still don't conform to the standards.

Standards-compliant web pages tend to display properly in all browsers; many other web pages do not. But it is hard to blame web designers too much when the major browser vendors were so slow to come to the table. Unfortunately, there is much else that the designers, and also the writers of web-authoring software, can be blamed for.

Web authoring is not typesetting!

Printing a book and creating a web page have this in common: both involve displaying text and graphics on a medium (paper or a monitor). But there the similarities end.

While a book's layout is in the hands of its creator, a web page's layout is in the hands of the web browser on the user's computer. For this reason, HTML is a "markup language", not a typesetting language: it describes the attributes of various text objects but does not prescribe (except at a crude level) where on the page the text should go. This matters because displays come in various sizes and resolutions, from tiny mobile-phone screens to 2048x1536 monstrosities (and beyond), and a one-size-fits-all approach will not work. Moreover, high-quality serif fonts may suit high-resolution displays, while a simple but readable sans-serif font would be more appropriate for a mobile phone. So one should not hard-code, or make assumptions about, such things as font families and font sizes.

Unfortunately, lots of pages do make such assumptions. For example, suppose I have poor eyesight and choose an unusually large font size. Wikipedia (which passes the validator) looks cluttered because of its multi-column format, but its elements are still correctly positioned.

However, the home page of FreeBSD (a Unix-like operating system), which also validates, is a disaster when viewed at large font sizes.

Needless to say, pages that don't validate fare even worse. Try it (on Mozilla Firefox, Ctrl-+ increases the font size).

The Any Browser campaign has more information on how to make a web page look good on any browser, and how to make it "degrade gracefully" when assumed features aren't present in a browser.

Web misfeatures

The biggest misfeature to be a part of the web standard is probably frames. Rather than holding forth, I'll just link to what Jakob Nielsen says about them. (Nielsen's 1996 list of web-design mistakes is still worth reading, as are many of his later writings.)

Another misfeature is pop-up windows. Enough said. (It is never necessary to open a new window: the user can choose to do that when he/she wants to.)

So far I have talked about HTML. But many websites avoid HTML altogether and are built entirely in Flash. There are several things wrong with this, for example:

  • Web browsers are available on nearly every platform; Flash plugins are unavailable on most platforms, including all 64-bit systems (even 64-bit Windows XP).
  • One can cut and paste from, search, index, and link to HTML pages. One can't do that with Flash pages.
  • The visually impaired can resize the text on HTML pages, or use a screen reader. With Flash they're out of luck.
  • The download is much bigger with Flash, and many people still use dial-up internet connections.

Other non-HTML features include Java (these days less common) and JavaScript (which can be a security risk). A Microsoft-only feature is ActiveX, which is a severe security risk, and thankfully very few sites seem to require it these days (I haven't encountered any in a couple of years).

My other peeves with web design relate to usability (where links are placed, what they do, how you access them) and layout (a subject that overlaps with what I plan to say about Microsoft Word and Powerpoint); having expended sufficient verbiage for an evening, I will return to these topics later.

Dilip (correctly) pulled me up for suggesting that all web designers are idiots. The above observations suggest that to a large extent, they are more sinned against than sinning: if authoring software, and (till recently) browsers, do not satisfy standards, what are they to do?

Nevertheless, I do not absolve them of responsibility. Today, web browsers do follow standards, and it should be easy to ensure that a web page works in them. Moreover, Mozilla Firefox and Opera can both be installed easily on Windows machines (Safari can't, but its parent, Konqueror, can be installed under Cygwin, a Unix-like environment for Windows). Mozilla Firefox alone commands over 10% of the market, and the non-Microsoft browsers put together approach (or perhaps exceed) 20%. These browsers also allow easy scaling of fonts and setting of font-family and font-size preferences. It is, if not idiotic, at least completely irresponsible to test on only one web browser, MS IE, and only on version 6 at that. With the release of (and, often, enforced upgrade to) IE7, many sites are now broken and need to be fixed; they need never have been broken in the first place.


km said...

Great post, Rahul.

MS proved 20 years ago that the race does not always go to whoever launches the best product, but to whoever wins the Standards battle.

And call me old-fashioned, but Lynx was AWESOME :D

(I do mean it. Lynx was fast and I had so much less carpal frickin' tunnel.)

Space Bar said...

excellent post, rahul. shall forward to many friends in the business.

Rahul Siddharthan said...


km - I still use lynx. Not for surfing the net, generally, but for quickly viewing an HTML file, or for viewing HTML email. (I use a text-mode email program, mutt; today I think there's no good reason not to switch to Mozilla Thunderbird or something, I just haven't got around to it.)

There are "better" text-mode browsers that can handle tables and frames, like w3m and links, but I still like lynx...

Dilip D'Souza said...

Lynx! Don't remind me. Grew to love the thing. Miss it like I miss Unix, and my Brown-only text editor bb (imagine vi grown easier), and Lisp, and typing Ctrl-D to exit applications cleanly and quickly and still knowing "kill -9" was waiting in the wings, and little round cloth buttons, and ganne-ka-ras, and Adexolin ... aaaah, I'm regressing! (Or am I?)

km, re standards vs best product, it wasn't just MS. It was the VHS/Beta battle and others too.

Actually I've always been intrigued by the whole push to standards in everything. Another downside, at least when I was following this more closely in software, seemed to me to be that the technology and trends evolved faster than standards could be set up. So in Lisp, even though there was a Common Lisp standard, every version of Lisp you bought was just different enough to drive you crazy if you were porting code.

Rahul, want to take on that aspect?

km said...

Ganne-ka-ras and adexolin? Now, now, Dilip, Better Living Through Chemistry? Nudge nudge wink wink.

Rahul Siddharthan said...

Dilip -- you can always go back to unix (km recently declared his intentions of doing so). As a techie you should have no problem with Linux. Indeed, lots of non-techies are using unix today, in the guise of Mac OS X.

The Betamax format wasn't necessarily superior -- it's a question of what the users wanted, and it looks like they wanted a non-proprietary technology with longer recording time, and weren't satisfied with an expensive 1-hour tape, even if the picture quality was superior.

The same thing is playing out again with HD DVD vs Blu-ray, and this time it looks like Sony may have the edge (but it's too early to say for sure).

About standards in software -- today Common Lisp is a widespread standard (the other Lisp standard is Scheme, currently at R5RS). This is not to say that interoperability problems have been solved -- after all, few C compilers fully implement the C99 standard either -- but at least it's a good thing to aim for. There will always be developers whose itch is not scratched by the standard, so they will add their own features. In a programming language -- especially a compiled one -- that's not so bad. Moreover, today's non-standard extensions may well become tomorrow's standards. But when the goal is to exchange information, you really want to be sure the other guy can read what you are writing. That's where standards are most important, both on the web and in word-processors (the subject of a future post).

Dilip D'Souza said...

Rahul, actually I use Mac OS X myself, having switched from Windows over a year ago. I've noticed the underlying unix base, though I haven't yet made great use of it. Suggestions on how I can?

Rahul Siddharthan said...

dilip - well, I think unix is defined by the command line; if you open a terminal you get a bash (or perhaps tcsh) prompt. What I find it tremendously useful for is searching through text files (via "grep", "sort", etc.) and doing repetitive tasks (for example, making trivial format changes to large numbers of text files using sed, or shrinking every photo in a directory and saving the shrunk copies to a subdirectory). If you've done such things in your unix past, you'll quickly pick it up again.

Beyond that, of course, there's a lot of very useful, free unix software that works on Mac OS X, and you can pick it up via Fink. The graphical programs will run with a separate "rootless" X server and will not look like "native" OS X programs, but that apart, I believe it works very nicely (I haven't used it myself, but some of my colleagues do).

Rahul Siddharthan said...

max - thanks for the spam.