SERIES: 2 – Writing Valid, Accessible HTML

Is HTML hard to learn?

HTML is the easiest markup language you will ever learn. It is smaller and easier than specialty markup languages often used in publishing or typesetting. It is much easier than programming.

What the heck is a markup language, anyway?

Simply put, it’s a set of tags used to “mark up” content for interpretation by a device. When you were in grade school and high school, your teachers used a sort of shorthand to “mark up” the reports you wrote. The shorthand told you if you should start a new paragraph, insert a phrase, or move chunks of text around. Markup languages for the web and for publishing/typesetting tell a device what type of content a chunk of text is (a paragraph, a heading, a list) so that the device can render it (display or print it) in a way that will make sense to the person reading the document.

Tags are easy to type. Separating content from style makes writing code even easier, because your webpage code mostly consists of the actual content (the text) you want to be displayed. HTML is not a bunch of cryptic commands, functions, variables, statements, and conditional tests like a programming language. It is far simpler than trigonometry or calculus, or even algebra! It is very human-friendly!

The structure of an HTML document

To start you off, here is a simple webpage example. I recommend you copy this text and paste it into a file named “example1.html”, then double-click it — your browser will open the file automatically. What can you determine about HTML just by looking at the code here and the result in your browser?

example1.html

<html>

<head>

    <title>Basic HTML Structure</title>

</head>

<body>

    <h1>Some Great Website Features</h1>

    <img src="http://static.php.net/www.php.net/images/php.gif" alt="PHP logo" />

    <p>PHP is a <em>programming language</em>. It is used on websites with dynamic (changing) content, such as shopping carts, lists of items from a database, etc.</p>

    <p>WordPress.com is a community, and conversations continue from one blog post to another and through the comments. Our tag surfer feature makes it easy for you to find like-minded bloggers interested in the same topics as you and connect with them.</p>

</body>

</html>

There are several things you probably noticed:

  • The code consists of normal text surrounded by some odd-looking markers in angle brackets. These markers are what we call tags.
  • The first item on the page was a large heading. This is the text between the <h1> </h1> tags. This text was automatically made large and bold.
  • The second item on the page was an image. There is an <img> tag that has an attribute, called “src”, which holds the URL to an image file. The browser is obviously following the link to find and display the image automatically, and “img” is obviously short for “image”.
  • The third and fourth items, the text between the <p> </p> tags, were displayed in big chunks that are separate from each other. So “p” must mean “paragraph”.
  • There is text that did not appear on the page: the text between the <title> </title> tags. Why is that? You may have noticed that every element between the <body> </body> tags was displayed, while the element between the <head> and </head> tags was not.
  • The <html> </html> tags surround all other elements.
  • Did you notice something different about the <em> </em> tags? They are in the middle of some text.

HTML document structure

Elements are the “parts” of a webpage. Most elements consist of “markup” called tags and content made of plain text. Images are an exception, because the picture itself is stored in a separate image file. Some elements are called “empty elements” because they have no contents; this is the case with the <img> element, because the image data is read from an image file instead. Given the URL of the image, the browser knows how to display it.

The <html> element is called the root element because it is the element which holds all others. Every other element in the document sits inside it: <head> and <body> are its direct child elements, and everything else sits inside one of those two. So, all other code must go inside the <html> </html> tags. (There are actually 2 exceptions to this, special tags that provide basic information about the document, but that is not important now.)

The way you place tagged elements inside other elements is called nesting.
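Nesting has one firm rule: an element that starts inside another element must also end inside it, so tags are closed in the reverse order they were opened. Here is a quick sketch using the <p> and <em> tags from example1.html (the sentence is just filler):

```html
<!-- Correct: <em> opens and closes inside <p> -->
<p>PHP is a <em>programming language</em>.</p>

<!-- Incorrect: <em> is still open when </p> closes, so the tags overlap -->
<p>PHP is a <em>programming language</p></em>
```

Browsers will usually try to guess what you meant in the second version, but they don’t all guess the same way, and a validator will report it as an error.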

The root element is further divided into two more elements, <head> and <body>. As you already discovered, all child elements of the body represent items which will be displayed on the page. In contrast, the <head> element does not represent actual content. Instead, it holds special elements that provide additional information to the browser (or cellphone or search engine) about the document.

The <title> element serves several purposes:

  • The browser will display the title on the title bar (top part) of its window, or on the tab if the page is open in a tab. (If there is no title, it will display the URL, or the path and filename.)
  • Search engines usually display the title in search results and make it the text of the link a potential visitor will (hopefully) click to navigate to your website.
  • Browsers usually use the title as the title of a “bookmark” or “favorite” (saved browser link that allows users to visit your website again without searching). This happens when a user likes your page so much they decide to bookmark it!

We put some extra carriage returns, or newlines, in the file. This will not affect the display. Browsers determine how to display elements by their semantic meanings, and this includes determining how much space should surround an element. So, you can use extra newlines, tabs, and spaces to make your source code easy to read. This type of non-rendered spacing is called whitespace. (This term also has a second meaning: space with no content in the webpage display, often used when people talk about good graphic design.)
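If you want to see this for yourself, paste both of the fragments below into the <body> of a test page. They should look identical in the browser, because runs of spaces, tabs, and newlines inside text are collapsed into a single space. (The text is just filler borrowed from example1.html.)

```html
<!-- Version 1: compact -->
<p>PHP is a <em>programming language</em>.</p>

<!-- Version 2: spread over several lines with extra spaces;
     it displays exactly like Version 1 -->
<p>
    PHP    is    a
    <em>programming
        language</em>.
</p>
```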

There are several nested elements in the document, but which one is nested differently? The <em> element. This tag is inside the actual text that makes up a larger element. An element nested this way is an inline element. Inline elements, for the most part, do not change the layout of the page; their content flows along with the surrounding text. At most, the content of an inline element may be bolded, italicized, underlined, or made to stand out in some other way.

The <em> element is used for words or phrases that should be emphasized. Graphical browsers usually display them in italics, and a screen reader for the blind can read them with a little extra emphasis. Do not use the <i> element for emphasized text. Italics are a visual style and mean nothing to assistive technology. Furthermore, you may want to italicize other text as well. If at any time you want to set or override visual styles for any element, it should be done in a stylesheet. For example, you may decide to style all <em> elements with a different text color, a background color, or an underline instead of italics. That is fine, and the emphasis will not be lost!
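Here is a minimal sketch of what that might look like, with the stylesheet placed in the <head> of the page. The particular colors are arbitrary choices for this example; the point is that only the look of <em> changes, while the markup still says these words are emphasized.

```html
<html>
<head>
    <title>Restyling emphasis</title>
    <style type="text/css">
        /* Show emphasis with color instead of italics.
           The words are still marked up with <em>, so their
           meaning (emphasis) is unchanged. */
        em {
            font-style: normal;
            color: #990000;
            background-color: #ffffcc;
        }
    </style>
</head>
<body>
    <p>Do <em>not</em> use the &lt;i&gt; element for emphasized text.</p>
</body>
</html>
```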

Some other examples of inline elements are hyperlinks, images, and strong text (text that is strongly emphasized, usually shown in bold). Actually, images are kind of a different animal: they behave partly like inline elements and partly like block-level elements. Block-level elements are those which start on a new line, occupy their own horizontal space, and are usually separated from other content by whitespace. By default, an image sits in a line of text like a very large letter, but it can also be aligned to one side so that the text or other content flows nicely around it. Notice that in example1.html the image did not share a line with the paragraph after it; that is because the paragraph is a block-level element and starts on a new line, not because of anything in the <img> tag.
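A small sketch shows both behaviors side by side. The class name wrap is made up for this example, and the float property in the stylesheet is what aligns the image to the left so the text flows around it (it takes the place of the older align attribute on <img>).

```html
<html>
<head>
    <title>Inline and block-level elements</title>
    <style type="text/css">
        /* "wrap" is a hypothetical class name for this example.
           Floating the image lets the paragraph text flow around it. */
        img.wrap {
            float: left;
            margin: 0 1em 1em 0;
        }
    </style>
</head>
<body>
    <!-- Block-level: each paragraph starts on a new line. -->
    <p>
        <img class="wrap" src="http://static.php.net/www.php.net/images/php.gif" alt="PHP logo" />
        <!-- Inline: <strong> and <a> stay in the flow of the sentence. -->
        PHP is a <strong>programming language</strong>. Read more at
        <a href="http://www.php.net/">the PHP website</a>.
    </p>
    <p>This second paragraph is block-level, so it starts on a new line.</p>
</body>
</html>
```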

HTML syntax overview

Here are some basic facts about HTML tags and syntax:

  • Most HTML elements have a “start tag” and an “end tag”. The exception is empty elements.
  • Tags begin and end with < and >; these are called angle brackets.
  • Inside these brackets is the element name.
  • Sometimes there are also one or more attributes to provide information that is not content. The “src” attribute in the <img> tag is an example. This attribute is required so that the browser can find and display the image.
  • Attributes are in the form of name-value pairs. The attribute name is followed by an “=” (equals sign) and the value. The value must be surrounded by quotes (either single or double is fine, but the quotes must match).
  • The end tag is identical to the start tag with two exceptions:
    • The first character inside the end tag, after the opening bracket, is a forward slash. (On a US keyboard, this is on the same key as the question mark.)
    • Attributes go inside the start tag, not the end tag.
  • Elements with no content can be coded 2 ways:
    • You can use a start tag and an end tag with nothing between them. (But on <img> you cannot do that.)
    • You can use a “shorthand” tag instead. Put a space and then a forward slash before the closing bracket, as in <br />. (But on <textarea>, a tag used inside forms, you cannot do that; browsers treat <textarea /> as a start tag that is never closed.)
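Here are those rules applied to a few tags, with comments pointing out each part. The image comes from example1.html; the link target and the form field name comments are just illustrations. (A real <textarea> would sit inside a form, and rows and cols are required attributes in HTML 4.01 and XHTML 1.0.)

```html
<!-- Start tag, content, end tag. The end tag repeats the element
     name after a forward slash and carries no attributes. -->
<p>Some text.</p>

<!-- Attributes are name="value" pairs, placed in the start tag only. -->
<a href="http://www.w3.org/">World Wide Web Consortium</a>

<!-- Empty elements in shorthand form: a space, a forward slash, then ">". -->
<img src="http://static.php.net/www.php.net/images/php.gif" alt="PHP logo" />
<br />

<!-- A textarea with no text still needs a separate end tag. -->
<textarea name="comments" rows="4" cols="40"></textarea>
```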

There are some rules of syntax which are only loosely enforced in older standards.

Don’t fall into that trap! Use the stricter rules of the XHTML standard. This will avoid errors, and besides, can you imagine having to change almost every single tag in tens or hundreds of webpages later because you decide to upgrade to a stricter standard? Torture!
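To give you a taste of what “loosely enforced” means, compare the two fragments below. Older HTML accepts the first one: uppercase tag names, unquoted attribute values, and paragraphs and list items with no end tags. The second follows the stricter XHTML rules. The file name logo.gif is hypothetical.

```html
<!-- Accepted by older HTML, but avoid it -->
<P>Some features:
<UL>
<LI>Shopping carts
<LI>Lists of items from a database
</UL>
<IMG SRC=logo.gif ALT=Logo>

<!-- The XHTML way: lowercase names, quoted values, every element closed -->
<p>Some features:</p>
<ul>
    <li>Shopping carts</li>
    <li>Lists of items from a database</li>
</ul>
<p><img src="logo.gif" alt="Logo" /></p>
```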

I will cover how to follow XHTML rules (and where the term “XHTML” came from) soon enough. But before I do that, I want to talk about the minimum software you need to develop webpages, and additional software that you will probably want to use to make your life easier.

My new Series: Writing Valid, Accessible HTML

This series will be very helpful to a lot of people. Novices to HTML will start off with good habits, and established coders will learn to overcome their bad habits and inaccessible designs (and will learn why they should do so).

I strongly encourage both novices and experts to comment on my article/tutorial and ask questions!

I will start all the posts with “SERIES: ” followed by the number of the entry in the series, followed by “ – Writing Valid, Accessible HTML”. This will make it easier to use search once I have a lot of posts. When I add new installments, I will make sure there are “previous/next” installment links.

Here’s the first installment. Enjoy!

SERIES: 1 – Writing Valid, Accessible HTML

This series is intended for both the novice and the experienced HTML coder. It will include information for beginners. This will help you learn the right way and avoid bad habits. I assume you are familiar with basic computer/Internet use and terminology.

Introduction: What is “valid, accessible” HTML (or XHTML)?

I am not referring to avoiding errors which prevent the display of content or which display HTML code in the browser. (Yikes!) Rather, I am referring to standards and best practices. There are international and national standards for many things (machinery, electricity, hardware interfaces, Internet protocols)…there are also standards for coding web pages. Here is what I mean in the context of coding webpages:

W3C (World Wide Web Consortium)
– international, chartered organization responsible for web-related standards; made up of individuals from many technical specialties, companies, governments, countries, races, and levels of experience; working groups develop proposals and solicit input, and members vote on the recommendations
valid code
– the code meets the standards put forth by the W3C for a particular language and version (e.g., HTML 4.01 strict), which the author must declare in his or her document
accessible
– the document meets some official standard for ease of use by people who have conditions such as blindness, color-blindness, and motor impairments; there are voluntary international standards, and countries may have specific laws and standards of their own; in the United States, the best-known Internet-related accessibility law is Section 508, which applies only to the federal government, though all websites should meet its standards; ideally, strive for the highest level of compliance with accessibility standards, especially if yours is a commercial website

What is the purpose of writing valid, accessible HTML?

The easiest issue to explain is accommodating people with physical handicaps or challenges.

Blind users cannot see your graphics. They cannot see the layout of your website, so they cannot interpret visual cues to determine what content is important. In contrast, sighted users can instantly see the layout of content and navigation with a quick visual scan, and so can ignore navigation, advertisements, and extraneous information until they need it. Blind users cannot pick out important points from text that is only visually bolded, or follow ordered steps that are indicated only by arrows.

The blind use special software called screen readers which “read” the pages to them. Like the users themselves, screen readers cannot take advantage of visual cues to determine the order in which content should be read, or the importance of an item; by default, they will just read all the content in the order in which it actually appears in the code (including all your navigation links). Many webpage designers have used tables with invisible borders to control their layouts and overcome display discrepancies between browsers, especially when a complex graphical layout is desired, even though tables are intended for tabular data. Is this a big deal? Yes! Many screen readers cannot handle tables at all, and those that can have no idea in which order the content should be read!

People with other physical disabilities may find it difficult or impossible to access webpages using the mouse, the keyboard, or both. Some people may use alternative pointing devices instead. For example, many people deal with conditions such as carpal tunnel syndrome, multiple sclerosis, or paralysis. There are even Braille printers. Other people are color-blind and require a high-contrast design. Still others are not disabled, but configure their browser to not display images, either because images annoy them or because they have slow Internet connections. These users need important images described to them, and/or a description to take the place of the image on the page.

There are also other usability and accessibility issues.

Some people “surf the web” with text-based browsers. Why? Text-based browsers are faster, because they don’t have to deal with pictures, layout, colors, and fonts. Also, many of these users run an operating system that can boot into a command line instead of a GUI (graphical user interface, pronounced “gooey”) such as a desktop, or that lets them choose the mode at startup.

Writing code that conforms to a standard (preferably the strictest and latest) helps to deal with most cross-browser issues. During the fierce “browser wars” in the 1990s between Netscape and Microsoft’s Internet Explorer, both companies introduced tags that were proprietary (private property, meaning the other browser couldn’t use them). They also handled the standard tags and attributes inconsistently. This caused website designers to target their websites specifically at users of one browser or the other, or even to build separate websites for each. Nonstandard code also causes horrible problems for people using screen readers and other assistive technology!

All browsers are much better today at handling code in a uniform way, but they are not perfect. Also, they attempt to accommodate old, possibly even abandoned (but still valuable) websites by using what is called quirks mode. This means that browsers will render pages that declare certain older versions of HTML, or that don’t declare a version at all, in old, nonstandard ways. A browser may even switch to quirks mode because of a small mistake in the declaration itself, regardless of the version the author meant to declare. This is very important when you manipulate the layout of the page, as opposed to just the visual styling of text.

However, inconsistencies and bugs still exist. It is up to you, the author, to use the most modern standard, make sure you adhere to it, declare your standard in your document, and test your webpages in all browsers.
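Declaring your standard is done with a DOCTYPE line before the <html> start tag (it is one of those special tags that go outside the root element). Below is a sketch of a skeleton page declaring XHTML 1.0 Strict. The DOCTYPE and the xmlns value are the ones published by the W3C; the meta line assumes you save the file in UTF-8, and the title and text are placeholders. A complete DOCTYPE like this one also tells modern browsers to use standards mode instead of quirks mode.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <!-- Tells the browser which character encoding the file uses
         (assumes the file is saved as UTF-8). -->
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Basic XHTML Structure</title>
</head>
<body>
    <h1>Some Great Website Features</h1>
    <p>Content goes here.</p>
</body>
</html>
```

Once a page declares its standard, you can check it against that standard with the W3C Markup Validation Service (http://validator.w3.org/).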

Is writing valid, accessible code hard?

No! Writing valid, accessible code, in combination with what are called best practices (most effective, efficient ways of working), actually makes your job as a web designer much, much easier! Here are a few generalities:

  • Use the most modern version of the standard…XHTML 1.0 strict is usually the ideal way to go. (I will explain the relationship between the HTML language and XHTML coding soon.)
  • Avoid framesets. A frameset is an HTML page that holds other HTML pages, each in its own frame. This allows parts of the webpage to be scrolled separately. Non-graphical browsers cannot display the content at all, nor can cellphones, PDAs, assistive technology, nor very old browsers. Framesets confuse people if there are too many frames, or the frameset is badly-designed, or the user does not know how to print only the frame they want.
  • Try to follow XHTML coding practices, meaning wherever an HTML version does not care how you do something, do it the “XHTML way” even if you are coding to a lower standard; XHTML is more strict about certain things (this is simple, and will be explained later).
  • Avoid tables. If tables are absolutely necessary to display specific, appropriate data, provide an alternative, non-tables version of that information.
  • Avoid tags that are deprecated (to be phased out and not recommended) or non-contextual (not related to the meaning of the content); using them is a lazy way to style elements (and problematic from an accessibility viewpoint).
  • Avoid using too many images, or images that are too large. They slow download times, a critical issue for visitors with slow connections.
  • Provide alt text (alternative text), which is shown in place of an image that cannot be displayed and is read aloud to those using screen readers. (Some browsers, notably older versions of Internet Explorer, also show alt text in a tooltip, pop-up text in a little box, when the viewer hovers the mouse over the picture; the title attribute is the standard way to provide a tooltip.)
  • Do not use styles within individual tags when you can use stylesheets (style declarations coded at the beginning of a document, or in an external document, and which can be reused).
  • Separate content from style: use stylesheets. If you are incorporating dynamic (changeable) content via a server-side scripting language, consider using a templating system to keep the HTML separate.
  • Avoid JavaScript if you can, and never rely on it for required functionality or actual content. (JavaScript is a scripting language which can provide special effects, alter page elements “on the fly”, and provide extra functionality.) Visitors sometimes turn JavaScript off because it may be annoying, error-prone, badly-designed, or used for nefarious purposes. Because it can be turned off in a user’s browser, never, ever rely on JavaScript for validation (checking data for correct entry or malicious code before it is sent via a form to your email or your database).
  • Do a prototype (mock-up, practice version) of a single or a few pages before you do all the pages on a multi-page website. When you are satisfied with the layout, navigation, and style, save a copy and rename it something like “TEMPLATE.html”. Strip all the changeable content out of the new copy…this will now be your template (basic code) to be reused on every page. For each new page, open up the template, save a copy under the new page name, and add all the unique content for that page (header text, images, and text).
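Several of these guidelines show up together in even a tiny piece of markup. The sketch below contrasts a “before” version (deprecated <center> and <font> tags, a purely visual <b>, an inline style, and an image with no alt text) with an “after” version that uses meaningful tags, a stylesheet, and alt text. The class name notice and the image file sale.gif are hypothetical.

```html
<!-- Before: deprecated <center> and <font>, a purely visual <b>,
     an inline style, and an image with no alt text -->
<center>
    <font color="red" size="5"><b>Big Sale!</b></font>
    <p style="color: gray;">Everything must go.</p>
    <img src="sale.gif" />
</center>

<!-- After: the style rules live in a stylesheet. They are shown here
     next to the markup for comparison; in a real page they belong in
     the <head> or in an external .css file. -->
<style type="text/css">
    .notice { text-align: center; color: gray; }
    .notice strong { color: red; font-size: 1.5em; }
</style>

<div class="notice">
    <p><strong>Big Sale!</strong></p>
    <p>Everything must go.</p>
    <p><img src="sale.gif" alt="Big sale banner" /></p>
</div>
```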

I understand the accessibility/usability angle, but how do such practices make it easy for me?

In the long run, your website will be easier to maintain. You will need to re-write less code, and search/find features in your text-editor will be more reliable if you are consistent in your habits. Here are some specific examples:

  • Separating content from style makes it easier to:
    • find mistakes
    • make changes to content
    • make changes to style, even more so if you control the style of multiple pages from one stylesheet
    • avoid non-standard tags
  • Following XHTML coding practices makes it easier to:
    • upgrade the document’s standard later with less rewriting
    • rely on search/find features in your text-editor
    • avoid quirks mode
    • avoid mistakes regarding tag nesting (the way some elements appear inside others)
    • avoid non-standard tags, whether proprietary or just deprecated
  • Avoiding tables:
    • makes it easier to change content and layout
    • avoids the all-too-common disaster where you don’t understand how all your tables/cells are nested (webpage tables allow selected cells to “span” multiple columns and/or rows and to contain more inner tables), and where you may even make a cut/paste mistake and lose your layout or content while trying to change it
    • avoids the nightmare where changes can only be made to pages one-by-one because content in pages is divided into tables, which, in addition to the problems above, can even result in discrepancies between pages because it takes you so many days (or weeks, or months) to update all pages that you don’t remember which files you changed or what you did
  • Using prototypes and templates helps you:
    • keep your entire site consistent
    • separate content from style
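To make the table problem concrete, here is a sketch of the same two-column layout built both ways. The ids content and nav, the widths, and the index.html link are hypothetical. In the second version the main content comes first in the code, so a screen reader reaches it before the navigation links, while the stylesheet still places the navigation on the left. Changing the layout later means editing the stylesheet, not digging through table cells.

```html
<!-- Layout table: the structure lives in the cells, and a screen
     reader meets the navigation before the content -->
<table border="0" width="100%">
    <tr>
        <td width="20%"><a href="index.html">Home</a></td>
        <td>Main content goes here.</td>
    </tr>
</table>

<!-- The same layout with plain elements and a stylesheet (shown here
     for comparison; in a real page the rules belong in the <head>) -->
<style type="text/css">
    /* Content comes first in the code but floats to the right;
       navigation comes second but floats to the left. */
    #content { float: right; width: 78%; }
    #nav { float: left; width: 20%; }
</style>

<div id="content"><p>Main content goes here.</p></div>
<div id="nav"><a href="index.html">Home</a></div>
```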

Okay, I understand this (I think). But first I need to learn HTML. Is it hard?

HTML is the easiest markup language you will ever learn. It is smaller and easier than specialty markup languages often used in publishing or typesetting. It is much easier than programming.