Tuesday, May 29, 2007

The Document Markup Conundrum

There are many languages for formatting documents. HTML is a very common example and a good one to look at. It can be used to define the structure of a document and has some rudimentary style options. LaTeX is another markup language that is similar in those respects, since it also focuses on structure with some styling capabilities. The problem is that you end up worrying about both the structure and the style at the same time.

CSS is an attempt to fix this problem by providing a means to style a document outside of the document itself. However, it's still embeddable and can be declared within the document, which can lead to a document that is confusing and hard to maintain.

The W3C wants the world to move towards XHTML. Others want to move to HTML5. Both have some of the same problems as prior versions, though there are a number of fixes and improvements. In the attempt to retain backwards compatibility, are we really creating the right solution?

I'm thinking we need to start from scratch. To continue with enforcing backwards compatibility is going to hold us back. However, to completely bail on the old formats means we have to wait for a new reader/browser to become compatible with it. Looking at the current and past states of web browsers, waiting for that to happen is probably pointless.

What I would love to see is a standard that is simple to understand and fixes the mistakes made with HTML and others. Also, I'd love it to be supported fairly quickly. One idea I've had is to focus on structure instead of presentation in the document portion itself. One way of doing this is similar to how you can embed data and documentation in a Perl script. After the actual Perl code, you can add a section marker, such as __DATA__, and whatever follows it is kept apart from the code. This becomes interesting for a document, as you now have a clear way of keeping style and script inside a document file without polluting the document itself.
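To make the __DATA__ idea a little more concrete, here is a minimal sketch of how a reader might split such a file. The marker names __STYLE__ and __SCRIPT__, the splitSections function, and the sample document are all hypothetical; nothing here is part of any existing format.

```javascript
// Hypothetical marker syntax, modeled loosely on Perl's __DATA__:
// a line consisting only of __NAME__ starts a new section.
const MARKER = /^__([A-Z]+)__$/;

// Split a document file into its body plus any named trailing sections.
function splitSections(text) {
  const sections = { body: [] };
  let current = "body";
  for (const line of text.split("\n")) {
    const match = MARKER.exec(line.trim());
    if (match) {
      // Start collecting lines under the new section name.
      current = match[1].toLowerCase();
      sections[current] = [];
    } else {
      sections[current].push(line);
    }
  }
  // Join each section's lines back into a single string.
  const result = {};
  for (const [name, lines] of Object.entries(sections)) {
    result[name] = lines.join("\n").trim();
  }
  return result;
}

// A made-up document: structure first, then style, then script.
const example = [
  "\\section{Foo}",
  "Some text.",
  "__STYLE__",
  "section { color: blue }",
  "__SCRIPT__",
  "alert('hi');",
].join("\n");

const parts = splitSections(example);
console.log(parts.body);
console.log(parts.style);
console.log(parts.script);
```

The document portion stays at the top of the file, free of presentation details, and a reader that doesn't care about style or script can stop at the first marker.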

Another thought I was toying with is what type of markup should be used. Do we use the XML/HTML style of tags or something like LaTeX, where text is enclosed in braces? Each way has its pros and cons. Neither really appears to have any real benefit over the other from a technical perspective, so it's just a matter of preference or whichever is cleaner. Honestly, the only real difference between the two that I see is that the LaTeX style is more compact, whereas the XML style can be more readable.

Now, the more important criterion is which is easier for someone to write a document with. Personally, I found the LaTeX style to be a bit easier, but as a programmer, braces are commonplace for me. I've also found it to be clearer. To say \section{Foo} seems clearer than <h1>Foo</h1>. Outside of a bit of wordiness at times, the only thing that I think LaTeX could do better is tables. They're kind of a pain to populate, but I do like the way you can customize the column dividers.

Now, as for the styling aspect of things, CSS does a pretty good job, but it does have its deficiencies. First, I think it could use inheritance between styles. Perl 6 is doing exactly what I think CSS needs to do with its regular expressions: in Perl 6, you can build a regular expression out of other regular expressions. With CSS, I think it would be nice to be able to do the same thing. For example, let's say we have a style for links so that they are all 14 pixels in size, white in color, and have a blue background. Now, let's say I have another style for links that's exactly the same, except I want it italicized. Instead of creating a whole new style that's almost an exact copy of the prior one, we could inherit the prior style and just add the necessary changes.
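The link example can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not real CSS: the basedOn key, the style names, and the resolveStyle function are all made up for the example.

```javascript
// Hypothetical style table. "basedOn" is an invented key that names
// the style to inherit from; it is not part of CSS.
const styles = {
  link: { "font-size": "14px", color: "white", "background-color": "blue" },
  italicLink: { basedOn: "link", "font-style": "italic" },
};

// Resolve a style by collecting its parent's declarations first,
// then layering its own declarations on top.
function resolveStyle(table, name, seen = new Set()) {
  if (seen.has(name)) throw new Error(`circular inheritance at "${name}"`);
  seen.add(name);
  const style = table[name];
  if (!style) throw new Error(`unknown style "${name}"`);
  const { basedOn, ...own } = style;
  const inherited = basedOn ? resolveStyle(table, basedOn, seen) : {};
  return { ...inherited, ...own };
}

const resolved = resolveStyle(styles, "italicLink");
console.log(resolved);
```

The italic style only states what is different, and a change to the base link style carries over to everything that inherits from it.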

From a scripting perspective, I don't have many complaints outside of the fact that JavaScript is a bit weird at times. However, in terms of DOM access, one new function keeps coming to mind: getElementsByAttribute. I see it having two parameters: the attribute you want to search on and the value you want to search for. For example, to find all elements with a class attribute of "foo", you could do this: getElementsByAttribute('class', 'foo'); It may not be the most efficient way of doing things, but it can handle any attribute you may want to search on.
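A minimal sketch of how such a function might behave is below. A few assumptions: it takes the root element as an extra first argument instead of being a method on the document, it matches the attribute value exactly (so a class of "foo bar" would not match "foo"), and the el helper is just a stand-in for DOM elements so the example runs outside a browser.

```javascript
// Collect every descendant of `root` whose attribute `name` equals `value`.
// Assumes DOM-like nodes that expose getAttribute() and an iterable
// `children` collection. Results come back in document order.
function getElementsByAttribute(root, name, value) {
  const matches = [];
  (function visit(node) {
    for (const child of node.children) {
      if (child.getAttribute(name) === value) matches.push(child);
      visit(child);
    }
  })(root);
  return matches;
}

// Hypothetical stand-in for a DOM element, for demonstration only.
function el(tag, attrs = {}, children = []) {
  return {
    tagName: tag,
    children,
    getAttribute: (key) => (key in attrs ? attrs[key] : null),
  };
}

const tree = el("body", {}, [
  el("p", { class: "foo" }),
  el("div", { class: "bar" }, [el("a", { class: "foo", href: "#" })]),
]);

const found = getElementsByAttribute(tree, "class", "foo");
console.log(found.map((node) => node.tagName));
```

As noted, this visits every element under the root, so it trades speed for the ability to search on any attribute.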

As my last thought, is an organization like the W3C the right way to go about creating and updating a standard? They are not necessarily the fastest at getting updates to specs ready. Look at OpenGL and DirectX: OpenGL is slow to be updated, whereas DirectX is updated much more frequently to take advantage of new technologies. A potential problem with moving faster, though, is that the specification may not be as well thought out as it could be.

Well, that's all for now. If I have the time, I'll try to come up with an actual specification.

