From Ocean Teacher Library
Markup Language Formats
Background
A markup language is an artificial language using a set of annotations to text that give instructions regarding how text is to be displayed. Markup languages have been in use for centuries, and in recent years have also been used in computer typesetting and word-processing systems. [From Wikipedia: Markup language (see below)]
Generalized Markup Language (GML)
The granddaddy of all modern markup languages, GML was developed by IBM in the 1960s. GML introduced the use of "tags" to indicate document sections and important visual elements. Documents could then be printed by devices that were programmed to use the desired fonts, document layout, etc., based on these tags. The method was adopted by industry in a second-generation language, Standard Generalized Markup Language (SGML).
Hypertext Markup Language (HTML)
The official development history of HTML began in 1995, with the publication of Version 2.0 of the standard, after several years of early development based on SGML (above). There have been several new versions, resulting in the current official standard, Version 4.01, published by the W3C in December 1999. An ISO standard based on HTML 4.01, ISO/IEC 15445:2000, followed in 2000. In early 2008, drafts of Version 5 began circulating.
Hyperlinks
"In computing, a hyperlink is a reference, link, or navigation element in a document to another place, such as another section of the same document or to another document that may be on or part of a (different) domain." [From Wikipedia: Hyperlink] The earliest well-known use of hyperlinks within a PC document occurred in the HELP documentation for the QuickBasic programming software from Microsoft (ca. 1988). Hyperlinks are the essential building block in HTML, XHTML, XML, etc. (see below).
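As a rough illustration of the two kinds of target described in the definition (another section of the same document, or another document), the following Python sketch collects the link targets from a small piece of HTML. The page content, the example.org address, and the `LinkCollector` class name are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href target of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Made-up page: one link inside the same document, one to another document.
page = (
    '<p>See <a href="#background">Background</a> and '
    '<a href="https://example.org/xml.html">XML</a>.</p>'
)

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

The first target (starting with `#`) points to a place inside the same document; the second points to a document on another domain.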
HTML Training
- A Beginner's Guide to HTML - Although this guide was written for HTML 2.0, it is still one of the most widely used training resources. Originally written by the US NCSA, it is no longer provided there, but can be found in many places on the Web.
- W3 Schools HTML Tutorial
- W3 Schools HTML 4.01/XHTML 1.0 Reference - Quick overview of tags
- Seven-part training curriculum developed by HTML Goodies
- Available files:
- cruiselist_bottle_data_namibia_wod05_amended.htm - Table of ocean cruises downloaded and edited from the WODSelect website.
Extensible Markup Language (XML)
XML is the Extensible Markup Language (extensible because it is not a fixed format like HTML). XML is not a single, predefined markup language: it is a meta-language, "a language for describing other languages." It is a set of rules for creating semantic tags used to describe data.
XML is fast becoming the standard for data representation and exchange on the Internet. The basic ideas underlying XML are very simple: tags on data elements identify the meaning of the data, rather than, as with HTML, specifying how the data should be formatted, and relationships between data elements are provided via simple nesting and references. Web servers and applications encoding data in XML can quickly make the information available in a simple and usable format. As the information content is separated from information rendering, it is easy to provide multiple views of the same data.
- IMPORTANT NOTE: XML is so popular now that it is appearing as the physical format for some other format "types" such as the Auxiliary Formats, and it is being used within the new hybrid NetCDF Markup Language (NcML). This blurring of the typology is probably a very good thing, because the eventual emergence of XML-based variants for all data types would allow general solutions to current incompatibility issues.
As with HTML, data is identified using tags (identifiers enclosed in angle brackets, like this: <...>). Collectively, the tags are known as "markup". Unlike HTML tags, XML tags describe what the data means, rather than how to display it. Where an HTML tag says something like "display this data in bold font" (<b>...</b>), an XML tag acts like a field name in a program. It puts a label on a piece of data that identifies it, for example, <cruise_id>...</cruise_id>. (XML element names cannot contain spaces, hence the underscore.)
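The "field name" idea can be sketched in Python with the standard-library `xml.etree.ElementTree` module. The record below is hypothetical: the element names `cast`, `cruise_id`, and `depth_m` and their values are assumptions made for illustration, not part of any marine data standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element names and values are illustrative only.
record = "<cast><cruise_id>NAM-001</cruise_id><depth_m>150</depth_m></cast>"

root = ET.fromstring(record)

# Each tag works like a field name: look the value up by its label.
cruise_id = root.find("cruise_id").text
depth_m = float(root.find("depth_m").text)

print(cruise_id, depth_m)
```

Nothing in the record says how the values should be displayed; the program decides what to do with each labelled value.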
XML allows anyone to design a new, custom-built language. However, before a new XML language can be drafted, designers must agree on three things: which tags will be allowed, how tagged elements may nest within one another, and how they should be processed. The first two - the language's vocabulary and structure - are typically codified in a Document Type Definition, or DTD. The XML standard does not compel language designers to use a DTD, but a DTD (or a schema) is needed to formally define the relationships between the various elements that form the document.
- Exchange of data
- A major strength of XML is that it facilitates the exchange of data between different applications and operating systems. Because different organisations (or even different parts of the same organisation) rarely standardise on a single set of tools, it takes a significant amount of work for two groups to communicate. XML makes it easy to send structured data across the web so that nothing gets lost in translation. XML is potentially the answer for oceanographic data exchange, as long as all sides agree on the markup to use.
- Extensibility
- Extensible means that it is not a fixed format like HTML. While HTML tags must follow pre-set standards, new XML tags can be created by anyone at any time. XML will allow groups of people or organisations to create their own customized markup languages for exchanging information in their domain. Examples of existing industry-specific XML include music, chemistry, electronics, linguistics, engineering and mathematics.
- Plain Text
- Since XML is not a binary format, files can be created and edited with a standard text editor, making it useful for storing small amounts of data. At the other end of the spectrum, an XML front end to a database makes it possible to efficiently store large amounts of XML data. XML provides scalability for anything from small configuration files to an industry-wide data repository.
- Data Identification
- The XML standard specifies how to identify data, not how to display it. HTML, on the other hand, describes how things should be displayed without identifying the content. Because the different parts of the information have been identified, they can be used in different ways by different applications.
- Stylability
- When display is important, the style sheet standard, XSL, can dictate how to portray the data. Since XML is inherently style-free, different style sheets can be used to produce output in PostScript, PDF, or any other format.
- Hierarchical
- XML documents are hierarchical in structure. Hierarchical document structures are, in general, faster to access because you can drill down to the part you need, like stepping through a table of contents.
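Two of the points above, hierarchical drill-down and multiple views of the same data, can be sketched in Python with the standard-library `xml.etree.ElementTree` module. The cruise document, its element and attribute names, and its values are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical cruise document; names and values are made up for illustration.
doc = """
<cruise id="C1">
  <station name="S1">
    <sample depth_m="10"><temperature_c>18.2</temperature_c></sample>
    <sample depth_m="50"><temperature_c>14.6</temperature_c></sample>
  </station>
  <station name="S2">
    <sample depth_m="10"><temperature_c>17.9</temperature_c></sample>
  </station>
</cruise>
"""

root = ET.fromstring(doc)

# Drill down the hierarchy: cruise -> station S1 -> its samples.
s1_samples = root.findall("./station[@name='S1']/sample")
s1_depths = [float(s.get("depth_m")) for s in s1_samples]

# A second view of the same data: one comma-separated line per sample.
rows = []
for station in root.findall("station"):
    for sample in station.findall("sample"):
        rows.append(",".join([
            station.get("name"),
            sample.get("depth_m"),
            sample.findtext("temperature_c"),
        ]))

print(s1_depths)
print("\n".join(rows))
```

The same document yields a list of depths for one station and a flat table for all stations without any change to the stored data.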
Well Formed XML
XML text is only considered "well formed" if it obeys all of XML's syntax rules. Every opening tag must have a closing tag spelled exactly the same way, tags must be properly nested in start-stop pairs, and attribute values must be quoted. If text is not well formed, then a conforming XML parser will reject it.
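A minimal way to test well-formedness, assuming Python's standard-library `xml.etree.ElementTree` parser, is to try to parse the text and see whether the parser rejects it. The helper name `is_well_formed` and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<cast><depth_m>150</depth_m></cast>"
bad_nesting = "<cast><depth_m>150</cast></depth_m>"  # tags closed in the wrong order
unclosed = "<cast><depth_m>150</depth_m>"            # missing </cast>

print(is_well_formed(good), is_well_formed(bad_nesting), is_well_formed(unclosed))
```

This checks syntax only; it says nothing about whether the element names or values make sense for a particular markup language.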
Valid XML
XML text is only considered "valid" if it conforms to the rules set up by the original markup language creator as "semantic rules." These rules are contained in an XML schema (see below) or in a document type definition (DTD; see below). They often limit or constrain the types of data that can be placed in various dataset fields, for example. In essence, XML documents obtain meaning and usability from the existence and content of schemas and DTDs.
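Python's standard library does not validate against a DTD or XML schema, so the sketch below is only a hand-written stand-in for the kind of constraint such rules express (required fields and allowed data types). It is not schema validation itself. The `RULES` table, the `check_rules` function, and the element names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, standing in for what a DTD or schema would declare:
# each required child element and the type its text must convert to.
RULES = {"cruise_id": str, "depth_m": float}


def check_rules(text):
    """Return a list of rule violations; an empty list means all rules passed."""
    root = ET.fromstring(text)  # raises ParseError if the text is not well formed
    problems = []
    for name, convert in RULES.items():
        child = root.find(name)
        if child is None:
            problems.append(f"missing <{name}>")
            continue
        try:
            convert(child.text or "")
        except ValueError:
            problems.append(f"<{name}> is not a valid {convert.__name__}")
    return problems


ok = "<cast><cruise_id>NAM-001</cruise_id><depth_m>150</depth_m></cast>"
bad_type = "<cast><cruise_id>NAM-001</cruise_id><depth_m>deep</depth_m></cast>"

print(check_rules(ok))
print(check_rules(bad_type))
```

Both sample records are well formed, but only the first satisfies the rules; this is the distinction between "well formed" and "valid". In practice a real schema or DTD and a validating parser would replace the hand-written table.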
XML Examples
- Metadata record - Example of how XML can be used to store metadata in the ISO_19115 format (select "Save ISO19115/19139 metadata as XML" to view the XML file)
- XBT Temperature Profile Measurements - The actual data from the metadata above
XML Training
Extensible Hypertext Markup Language (XHTML)
XHTML can be thought of as the intersection of HTML and XML in many respects, since it is a reformulation of HTML in XML. [from Wikipedia: XHTML (see below)]
XHTML Training
- W3 Schools XHTML Tutorial
- W3 Schools HTML 4.01/XHTML 1.0 Reference - Quick overview of tags
Wikitext
Wikitext language or wiki markup is a markup language that offers a simplified alternative to HTML and is used to write pages in wiki websites such as Wikipedia. [From Wikipedia: Wikitext (see below)] It was used in the writing of OceanTeacher, but it does not play any role in marine data management, per se.
Additional Resources
- Consult the Marine Data Format Examples page to see downloadable examples of files of this type.
- Wikipedia: Markup language
- Wikipedia: HTML (Hypertext markup language)
- Wikipedia: XML (Extensible markup language)
- Wikipedia: SGML (Standard generalized markup language)
- ISO/IEC 15445:2000 "ISO HTML", based on HTML 4.01
- Wikipedia: XML Schema
- Wikipedia: DTD (Document type definition) - A type of schema, said by Wikipedia to be inherited from the SGML ancestor of XML.
- Wikipedia: Extensible Hypertext Markup Language (XHTML)
- Wikipedia: Wikitext
Subsections of this Article
| Pagename | Short title | Description | |
|---|---|---|---|
| Marine XML | Marine XML | Marine XML | none |
Information about this article
Short title: Markup Language Formats
Description: A markup language is an artificial language using a set of annotations to text that give instructions regarding how text is to be displayed.
Expertise level: beginner
Author: Murray.Brown
Approval status: approved
Approved by: Murray.Brown
Last change: 2010-1-4
Subsection of: Marine Data Format Types
Contact
If you have any direct comments or suggestions for the author of this page, please feel free to send an email to the author (listed above). For discussions on this page, please use the discussions page.



