SGML was written decades ago as a resource for publishers.
It is completely configurable: programmers use 'tags' to segment a file into parts, then write programs which process each segment according to its type.
It's very complex.
HTML is an application of SGML. It was created by Tim Berners-Lee at CERN and later standardized by the World Wide Web Consortium (W3C). It comes with a specialized set of tags which:
Describe how a document should be displayed.
Provide the basis for links between documents and parts of documents.
Each HTML document starts out with an 'html' tag that encloses the whole document. Inside it, a 'head' section declares certain things which pertain to the document as a whole (such as its title), then a 'body' section lists the different parts of the contents.
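To make that layout concrete, here is a minimal sketch using Python's standard html.parser module. The sample page, the OutlineParser class, and the outline_of helper are all made up for illustration; the sketch only prints each opening tag indented by how deeply it is nested.

```python
from html.parser import HTMLParser

# Hypothetical sample page, assumed to follow the usual html/head/body layout.
SAMPLE_HTML = """<html>
<head>
  <title>Sample page</title>
</head>
<body>
  <h1>A heading</h1>
  <p>A paragraph with a <a href="other.html">link</a>.</p>
</body>
</html>"""


class OutlineParser(HTMLParser):
    """Record each opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag) pairs

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def outline_of(html_text):
    """Return the (depth, tag) outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.outline


for depth, tag in outline_of(SAMPLE_HTML):
    print("  " * depth + tag)
```

The sketch assumes every tag in the sample is explicitly closed; real pages often omit closing tags, which this simple depth counter would not handle.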
Look at the source of this page for a simplified sample with comments.
HTML has been very successful because it is simple, and that simplicity made the popularity of the World Wide Web possible.
A disadvantage is that HTML only describes how a document should be displayed, in a form that any HTML-enabled computer program can render. The contents themselves are all written as plain text, and aside from the 'keywords' meta tag, the only way to get at the contents of the document is to apply the information retrieval and data extraction techniques we went into earlier in the course.
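As an illustration of how little machine-readable content a page offers, the following sketch pulls the 'keywords' meta tag out of a page with Python's standard html.parser module. The PAGE string, the KeywordParser class, and the extract_keywords helper are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical page: the only structured hint about its topic is the meta tag.
PAGE = (
    '<html><head>'
    '<meta name="keywords" content="SGML, HTML, XML">'
    '<title>Markup</title>'
    '</head><body><p>Free text only.</p></body></html>'
)


class KeywordParser(HTMLParser):
    """Collect the comma-separated values of any 'keywords' meta tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when written without a value.
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                part.strip() for part in content.split(",") if part.strip()
            )


def extract_keywords(html_text):
    """Return the list of keywords declared in the page, if any."""
    parser = KeywordParser()
    parser.feed(html_text)
    parser.close()
    return parser.keywords


print(extract_keywords(PAGE))
```

Everything else on the page (here the single paragraph) is just text to a program, which is why the retrieval and extraction techniques are needed.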
XML (Extensible Markup Language) is an attempt to go beyond HTML by making markup more semantically explicit. It is much more like SGML (it is in fact a simplified subset of it), but not quite as complex.
XML allows the author of the document to define his or her own tags, each of which expresses a particular type of data. The set of permitted tags, and how they may be nested, is declared in a Document Type Definition, or DTD.
Tags can be nested within other tags to describe a tree structure.
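A small sketch of author-defined, nested tags, using Python's standard xml.etree.ElementTree module. The 'claim' vocabulary below is invented for this example and is not taken from any real standard; ElementTree checks only that the document is well-formed and does not validate it against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using made-up tags for an insurance claim.
CLAIM_XML = """<claim id="c-001">
  <patient>
    <name>Jane Doe</name>
  </patient>
  <procedure>
    <code>X123</code>
    <cost currency="USD">150.00</cost>
  </procedure>
</claim>"""


def walk(element, depth=0):
    """Yield (depth, tag) for an element and all of its descendants."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)


root = ET.fromstring(CLAIM_XML)

# Print the tree structure implied by the nesting of the tags.
for depth, tag in walk(root):
    print("  " * depth + tag)

# Because the tags name the data, a program can ask for a value directly.
cost = float(root.find("procedure/cost").text)
print(cost)
```

Unlike the HTML case, the program does not have to guess which piece of text is the cost: the tag says so.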
XML documents can in turn be processed by a stylesheet language, XSL, which translates the contents of an XML document into HTML that can be viewed in any browser.
It is intended that various industries will come up with standard document type definitions (e.g. for medical insurance forms) to facilitate Electronic Data Interchange (EDI). One existing example is MathML, an XML vocabulary for mathematical notation.
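The idea of the translation step can be sketched in plain Python. This is not XSL itself, only a stand-in that does by hand what a stylesheet would describe declaratively: it reads a made-up 'catalog' document and emits an HTML table. The tag names and the catalog_to_html helper are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML data with author-defined tags.
BOOKS_XML = """<catalog>
  <book><title>First Title</title><price>10.00</price></book>
  <book><title>Second Title</title><price>12.50</price></book>
</catalog>"""


def catalog_to_html(xml_text):
    """Turn each <book> into one HTML table row (title, price)."""
    catalog = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    table = ET.SubElement(body, "table")
    for book in catalog.findall("book"):
        row = ET.SubElement(table, "tr")
        for field in ("title", "price"):
            cell = ET.SubElement(row, "td")
            cell.text = book.findtext(field, default="")
    return ET.tostring(html, encoding="unicode")


html_text = catalog_to_html(BOOKS_XML)
print(html_text)
```

The data tags ('book', 'title', 'price') say what the content is, and the display decision (a table row per book) lives entirely in the translation step, so the same XML could be rendered differently by a different translation.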