Markup languages

A markup language, in computer terminology, is a set of characters or sequences inserted into text to convey information about its presentation or structure. It belongs to the class of computer languages. A text document written in a markup language contains not only the text itself (as a sequence of words and punctuation marks) but also additional information about its various parts, for example indications of headings, emphasis, lists, and so on. In more complex cases, a markup language allows interactive elements and content from other documents to be embedded in a document.

It should be noted that most markup languages are not Turing complete and are not usually considered programming languages, although they are formal computer languages.

HTML (HyperText Markup Language) was developed by the British scientist Tim Berners-Lee around 1989-1991 at CERN, the European Organization for Nuclear Research, in Geneva (Switzerland). HTML was created as a language for exchanging scientific and technical documentation, suitable for use by people who are not layout specialists. HTML sidestepped the complexity of SGML by defining a small set of structural and semantic elements called descriptors, which are also often referred to as "tags". With HTML, you can easily create a relatively simple yet well-designed document. In addition to simplifying document structure, HTML added support for hypertext. Multimedia features were added later.

Initially, HTML was conceived and created as a means of structuring and formatting documents without tying them to a particular means of display. Ideally, text with HTML markup should be reproduced without stylistic or structural distortion on devices with very different capabilities (the color screen of a modern computer, the monochrome screen of an organizer, the small screen of a mobile phone, or devices and programs that read text aloud). However, modern use of HTML is very far from this original purpose. For example, the <table> tag, widely used for page layout, was designed to create ordinary tables in documents. Over time, the core idea of HTML's platform independence has been sacrificed to modern demands for multimedia and graphic design.

XML (eXtensible Markup Language, pronounced "ex-em-el") is a markup language recommended by the World Wide Web Consortium (W3C). The XML specification describes XML documents and partially describes the behavior of XML processors (programs that read XML documents and provide access to their content). XML was designed as a language with a simple, formal syntax that makes documents easy for programs to create and process and easy for humans to read and write, with an emphasis on use on the web. The language is called extensible because it does not fix the markup used in documents: the developer is free to create markup according to the needs of a particular domain, limited only by the syntax rules of the language. The combination of simple formal syntax, human-friendliness, extensibility, and reliance on Unicode for representing document content has led to the widespread use of both XML itself and a variety of specialized XML-based languages in a wide range of software tools.
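
As a minimal sketch of what "extensible" means in practice, the following Python snippet uses the standard library's xml.etree.ElementTree to build and re-read a small document. The element and attribute names (recipe, ingredient, amount) are invented for this illustration and do not belong to any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Element and attribute names below are hypothetical: XML itself does not
# predefine them, it only fixes the syntax rules they must follow.
recipe = ET.Element("recipe", {"name": "pancakes"})
for item, amount in [("flour", "200 g"), ("milk", "300 ml")]:
    ingredient = ET.SubElement(recipe, "ingredient", {"amount": amount})
    ingredient.text = item

xml_text = ET.tostring(recipe, encoding="unicode")

# A generic XML processor can read the document back without knowing
# anything about the vocabulary in advance.
parsed = ET.fromstring(xml_text)
ingredients = [(el.text, el.get("amount")) for el in parsed.findall("ingredient")]
print(ingredients)
```

The same parser would handle any other invented vocabulary, as long as the document follows XML syntax.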

XHTML (Extensible HyperText Markup Language) is a family of XML-based web page markup languages that reproduce and extend the capabilities of HTML 4. The XHTML 1.0 and XHTML 1.1 specifications are W3C recommendations, but development of XHTML has since been discontinued, with HTML recommended in its place. No new versions of XHTML are being released.

The main difference between XHTML and HTML lies in how the document is processed. XHTML documents are handled by an XML parser in the same way as any other XML document. During this processing, errors made by the author are not corrected.
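
The difference in error handling can be sketched with Python's standard library. This is only an illustration under stated assumptions: html.parser is a lenient tokenizer standing in for HTML-style processing (it is not a browser's HTML parser), and the helper is_well_formed is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Mis-nested tags: <b> is closed only after <p> has already been closed.
broken = "<p><b>bold text</p></b>"

class TagCollector(HTMLParser):
    """Records start tags; the lenient tokenizer does not raise on bad nesting."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(broken)
collector.close()

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(collector.tags)
print(is_well_formed(broken))
```

The lenient pass reports both tags, while the XML parser rejects the same input outright, which is the behavior described above for XHTML.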

XHTML conforms to the SGML specification because XML is a subset of SGML. HTML, by contrast, has many parsing peculiarities and has effectively ceased to belong to the SGML family, which is reflected in the HTML5 specification.

The browser chooses the parser for a document based on the Content-Type header received from the server:

  • HTML - text/html
  • XHTML - application/xhtml+xml

For local viewing on the client, the choice is based on the file extension. Internet Explorer up to and including version 8 has no parser for XHTML documents.
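
The selection rule above can be sketched as a small function. The name choose_parser and its return values are assumptions made for this example; a real browser's logic involves more cases (content sniffing, other media types).

```python
def choose_parser(content_type):
    """Return 'html' or 'xml' for a Content-Type header value (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/xhtml+xml":
        return "xml"
    if media_type == "text/html":
        return "html"
    raise ValueError(f"unsupported media type: {media_type!r}")

print(choose_parser("text/html; charset=utf-8"))
print(choose_parser("application/xhtml+xml"))
```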

WML (Wireless Markup Language) is a document markup language for use in cell phones and other mobile devices according to the WAP standard.

Its structure resembles a somewhat simplified HTML, but there are key differences, since WML is aimed at devices that lack the capabilities of personal computers (small screen, limited or no graphics support, little memory, and so on). All information in WML is contained in so-called "decks". A deck is the smallest unit of data that the server can transfer. Decks contain "cards" (each card is delimited by <card> and </card> tags). A deck must always contain at least one card, but it may contain several. Only one card is displayed on the device screen at a time, and the user switches between cards by following links; this reduces the number of requests to the server. The size of WML pages should not exceed 1-4 kilobytes.
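
A minimal sketch of the deck-and-card structure, parsed with Python's xml.etree.ElementTree. The deck content (card ids, titles, text) is invented for illustration, and the DOCTYPE declaration a real WML deck would carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# A hypothetical two-card deck; only the <wml> and <card> structure matters here.
deck_source = """<wml>
  <card id="home" title="Home">
    <p><a href="#news">Go to news</a></p>
  </card>
  <card id="news" title="News">
    <p>Nothing new today.</p>
  </card>
</wml>"""

deck = ET.fromstring(deck_source)

# The whole deck is transferred at once; cards are shown one at a time.
card_ids = [card.get("id") for card in deck.findall("card")]
deck_size = len(deck_source.encode("utf-8"))

print(card_ids)
print(deck_size <= 4096)  # stays within the size limit mentioned above
```

The link with href="#news" points to another card in the same deck, so following it needs no further request to the server.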

VML (Vector Markup Language) was developed by Microsoft to describe vector graphics. It was submitted to the W3C in 1998 by Microsoft, Macromedia, and others. Around the same time, Adobe, Sun, and several other companies submitted PGML for consideration. Both languages later served as the basis for SVG.

PGML (Precision Graphics Markup Language) is an XML-based markup language for describing vector graphics on a web page (diagrams, individual interface elements) as XML text; it uses an imaging model similar to that of PDF and PostScript. It was submitted to the W3C by Adobe Systems, IBM, Netscape Communications, and Sun Microsystems in 1998 but was not adopted as a recommendation. Almost simultaneously, Microsoft submitted its VML proposal; a year later, the more advanced SVG language was developed, drawing on ideas from both technologies. SVG became a W3C recommendation and is now the main format for describing vector graphics on web pages.

SVG (Scalable Vector Graphics) is a markup language created by the World Wide Web Consortium (W3C). It is an XML application designed to describe two-dimensional vector and mixed vector/raster graphics. It supports static, animated, and interactive graphics, in other words both declarative and scripted graphics. It does not support the description of three-dimensional objects. It is an open standard and a recommendation of the W3C, the organization behind standards such as HTML and XHTML. SVG builds on the VML and PGML markup languages and has been under development since 1999.

XBRL (eXtensible Business Reporting Language) is an open standard for electronic financial reporting. The XBRL format is based on XML: it uses XML syntax together with related technologies such as XML Namespaces, XML Schema, XLink, and XPath. One purpose of XBRL is to represent and exchange financial information, such as company financial statements. The XBRL specification is developed and published by XBRL International, Inc., an independent international organization.

To improve the visual presentation of web pages, CSS has become widely used; it allows uniform styles to be defined for many web pages. Another innovation worth noting is the URN (Uniform Resource Name) resource naming system.

A popular concept for the development of the World Wide Web is the creation of a semantic web. The Semantic Web is an extension of the existing World Wide Web designed to make information published on the network more understandable to computers. It is the concept of a network in which every resource written in human language is accompanied by a description that a computer can understand. The Semantic Web gives any application access to clearly structured information, regardless of platform and programming language. Programs will be able to find the resources they need, process information, classify data, identify logical relationships, draw conclusions, and even make decisions based on those conclusions. If widely adopted and well implemented, the Semantic Web has the potential to revolutionize the Internet. To create a computer-friendly description of a resource, the Semantic Web uses RDF (Resource Description Framework), which is commonly written in XML syntax and uses URIs to identify resources. Newer additions in this area are RDFS (RDF Schema) and SPARQL (SPARQL Protocol and RDF Query Language), a query language for fast access to RDF data.

HTML is the Hypertext Markup Language.

The language is used to organize web pages. Let's draw an analogy. You buy a newspaper. It contains several articles. Each article has a title and photos, and the text is set in several columns. This is the structure of a newspaper page.

On a website, everything is the same. To give an article (the content) the correct structure, you need to use a text markup language.

What is HTML for?

HTML is needed to tell the browser how to display the page on the screen.

The language is ubiquitous. It is a universal tool for laying out content on a page, and it works in any browser. Writing code in a programming language requires knowing its features, operators, data types, and so on; HTML asks for much less.

HTML consists of a set of tags (commands) and attributes (properties). They are easy to remember, and you can always find reference materials.

What is HTML code

The code is a set of instructions telling the browser how to display the page. There is a structure that should always be respected: for example, a page has only one H1 heading, the main information is placed in sections, and so on.

The language has three tools: tags, attributes, and attribute values.

There are two types of tags: paired and single.

The structure of the HTML code on the page

We said that the structure of any HTML document is always the same. Below are the required elements.

  1. <!DOCTYPE> - indicates that the document uses HTML.
  2. <html>...</html> - this tag contains the entire code of the page. Anything placed outside it is not recognized by the browser and is not displayed.
  3. <head>...</head> - a paired tag that contains technical information, for example about the document's encoding.
    1. <title>...</title> - the title of the page, placed inside the head section. Each page must have its own unique title.
    2. <meta>, <link> - service information, such as metadata and connections to separate stylesheets (CSS, etc.). It is not displayed to the user.
  4. <body>...</body> - the body of the page. All the main information is contained in this tag.
    1. <a>...</a> - hyperlinks.
    2. <img> - images.
    3. <b>...</b> - bold text.
    4. <i>...</i> - italics.

There can be an unlimited number of elements inside the body.
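
The document structure described above can be sketched and inspected with Python's standard html.parser module. The page content, the example.com URL, and the file name photo.png are placeholders invented for this illustration; the Outline class and VOID_TAGS set are likewise hypothetical helpers.

```python
from html.parser import HTMLParser

page = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Example page</title>
  </head>
  <body>
    <h1>Example page</h1>
    <p>A <a href="https://example.com/">link</a> and an image:
       <img src="photo.png" alt="A photo"></p>
  </body>
</html>"""

# Single (unpaired) tags have no end tag, so they must not increase the depth.
VOID_TAGS = {"meta", "img", "br", "link", "hr", "input"}

class Outline(HTMLParser):
    """Builds an indented outline of the tags in a page."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

outline = Outline()
outline.feed(page)
outline.close()
print("\n".join(outline.lines))
```

The printed outline shows head and body nested inside html, with paired tags (title, h1, p, a) and single tags (meta, img) at their respective levels.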

The more often you use tags, the faster they are remembered. You can always find a reference book with all tags, attributes and their values.

Lightweight markup languages

Languages designed for quick and easy writing of text in a simple text editor are called lightweight markup languages. Features of such languages:

  • Minimal feature set.
  • A small set of supported tags.
  • Easy to learn.
  • The source text reads as easily as the finished document.

They are used where a person has to prepare text in an ordinary text editor (blogs, forums, wikis), or where it is important that a user with an ordinary text editor can also read the text. Here are some widely used lightweight markup languages:

  • Wiki markup.
  • Various automatic documentation systems (e.g. Javadoc).

History

The term "markup" comes from "marking up", the traditional publishing practice of writing special conventional marks in the margins and text of a manuscript or proof before sending it to print. In this way, "markup men" indicated the typeface, style, and font size for each part of the text. Nowadays, text markup is done by editors, proofreaders, graphic designers and, of course, the authors themselves.

GenCode

The idea of using markup languages in computer text processing was probably first presented by William W. Tunnicliffe at a conference in 1967. He called his proposal "generic coding". During the 1970s, Tunnicliffe led the development of the GenCode standard for the publishing industry and later became chairman of the International Organization for Standardization (ISO) committee that created SGML, the first descriptive markup language. Brian Reid, in his dissertation defended in 1980 at Carnegie Mellon University, developed the concept further and produced a working implementation of descriptive markup.

However, IBM researcher Charles Goldfarb is now commonly called the "father" of markup languages. The basic idea came to him in 1969 while he was working on a primitive document management system intended for law firms. In the same year he took part in creating IBM's GML language, which was first presented in 1973.

Some early implementations of computer markup languages can be found in UNIX typesetting utilities such as troff and nroff. They allow formatting commands to be inserted into the text of a document so that it is laid out according to the editor's requirements.

The availability of publishing software with WYSIWYG ("what you see is what you get") capabilities has displaced most of these languages among ordinary users, although serious publishing work still uses markup to describe the non-visual structure of texts, and WYSIWYG editors now most often save documents in formats based on markup languages.

TeX

Another important publishing standard is TeX, created and subsequently refined by Donald Knuth in the 1970s and 1980s. TeX combined powerful text layout and font description capabilities, aimed especially at typesetting mathematical books to professional quality. To achieve this, Knuth spent considerable time studying the art of typesetting. Today TeX is used mainly in academia, where it is the de facto standard in many scientific disciplines. Alongside TeX there is LaTeX, a widely used TeX-based descriptive markup system.

Scribe, GML and SGML

The first language to make a clear distinction between document structure and presentation was Scribe, developed by Brian Reid and described in his 1980 doctoral dissertation. Scribe was revolutionary in a number of ways, not least in introducing the idea of styles kept separate from the marked-up text and of a grammar controlling the use of descriptive elements. Scribe influenced the development of GML (later SGML) and is a direct ancestor of HTML and LaTeX.

In the early 1980s, the idea that markup should focus on the structural aspects of a document and leave its visual presentation to the interpreter led to the creation of SGML. The language was developed by a committee chaired by Goldfarb. It combined ideas from many sources, including Tunnicliffe's project, GenCode. Sharon Adler, Anders Berglund, and James A. Marke were also key members of the SGML committee.

SGML specified precisely the syntax for including markup in text, and separately a way of describing which tags are allowed and where (the DTD, Document Type Definition). This allowed authors to create and use any markup they wished, choosing the tags that made the most sense to them and naming them in their own natural language. SGML is therefore properly regarded as a metalanguage, and many specific markup languages are derived from it. From the late 1980s onward, most significant new markup languages, such as TEI and DocBook, were based on SGML.

In 1986, SGML was published by ISO as the international standard ISO 8879. SGML found wide acceptance and was used in very large projects. However, it was generally considered cumbersome and difficult to learn, a side effect of a design that tried to do too much and be too flexible. For example, SGML made end tags (or start tags, or even both) optional in certain contexts, because its developers assumed that markup would be added by hand by support staff who would appreciate the saved keystrokes.

HTML

By 1991, it seemed that SGML would remain limited to commercial and database applications, while WYSIWYG tools (which saved documents in proprietary binary formats) would serve for other document processing. The situation changed when Sir Tim Berners-Lee, who had learned about SGML from his colleague Anders Berglund and others at CERN, used SGML syntax to create HTML. HTML resembled other SGML-based markup languages, but it was much easier to get started with, even for developers who had never used SGML. Steven DeRose argued that HTML's use of descriptive markup (and of SGML in particular) was a major factor in the success of the Web, because of the flexibility and extensibility it enabled (other factors include the notion of URLs and the free distribution of browsers). HTML is very likely the most widely used markup language in the world today.

However, HTML's status as a markup language has been disputed by some computer scientists. Their main argument is that HTML restricts the placement of tags, requiring each pair of tags to be fully nested within other tags or within the document's root tags. As a result, these scholars consider HTML to be a container language that follows a hierarchical model.

XML

XML (Extensible Markup Language) is a meta markup language that is now widely used. XML was developed by the World Wide Web Consortium in a committee chaired by Jon Bosak. The main purpose of XML was to be simpler than SGML by focusing on a particular problem: documents on the web. Like SGML, XML is a metalanguage: users may create any tags they need (hence "extensible"). The rise of XML was helped by the fact that every XML document can also be written so as to be a valid SGML document, so programs and users of SGML could migrate to XML fairly easily.

However, XML dropped many of the more complex, human-oriented features of SGML in order to simplify implementations (at the cost of larger markup and reduced readability and ease of editing). Other improvements fixed some SGML problems in international settings and made it possible to parse a document hierarchically even when no DTD is available.

XML was designed primarily for semi-structured environments such as documents and publications. However, it appeared to hit a sweet spot between simplicity and flexibility and was quickly adopted for many other uses. Today XML is widely used for passing data between programs. Like HTML, it can be described as a "container" language.

XHTML

From January 2000, W3C recommendations for HTML were based on XML rather than SGML, under the name XHTML (Extensible HyperText Markup Language). The language specification requires that XHTML documents be well-formed XML documents; this allows XHTML to be used for cleaner and more rigorous documents while still using tags familiar from HTML.

One of the most noticeable differences between HTML and XHTML is the rule that all tags must be closed: empty tags such as <br> must either be closed with a regular end tag or be written in a special form, <br /> (the space before the "/" is optional but is often used because it lets some pre-XML browsers and SGML parsers accept the tag). In addition, all attribute values in tags must be quoted. Finally, all tag and attribute names must be written in lowercase to be valid; HTML, by contrast, is case-insensitive.
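
The three rules above can be checked roughly with Python's xml.etree.ElementTree. This is a sketch, not a validator: the function name xhtml_problems is hypothetical, it handles only small fragments without namespaces, and it does not check documents against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def xhtml_problems(fragment):
    """Return a list of rule violations found in a small XHTML fragment."""
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError as error:
        # Unclosed elements and unquoted attribute values are XML syntax errors.
        return [f"not well-formed: {error}"]
    problems = []
    for element in root.iter():
        # Collect the tag name followed by the attribute names of this element.
        names = [element.tag, *element.attrib]
        problems += [f"not lowercase: {name}" for name in names if name != name.lower()]
    return problems

print(xhtml_problems('<p class="note">line one<br />line two</p>'))
print(xhtml_problems('<p>line one<br>line two</p>'))
print(xhtml_problems('<P Class="note">text</P>'))
```

Upper-case names pass the XML parser, since XML is case-sensitive rather than lowercase-only, so that rule needs its own check on top of well-formedness.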

Other XML based developments

Many XML-based languages are now in use, such as RDF (Resource Description Framework), XForms, DocBook, SOAP, and OWL (Web Ontology Language).

Features

A common feature of many markup languages is that they mix the document text with markup instructions in the same data stream or file. This is not required: markup can be isolated from the text using pointers, offsets, identifiers, or other methods of coordination. Such "standoff markup" is typical of the internal representations used by programs that work with marked-up documents. However, embedded or "inline" markup is far more common elsewhere. For example, here is a small piece of text marked up with HTML:

<h1>Anatidae</h1>

<p>The family <i>Anatidae</i> includes ducks, geese, and swans, but <em>not</em> the closely related screamers.</p>

The markup instructions (known as tags) are enclosed in angle brackets <like this>. The text between these instructions is the actual text of the document. The codes h1, p, and em are examples of structural markup: they describe the position, purpose, or meaning of the text they enclose.

More precisely, h1 means "this is a first-level heading", p means "this is a paragraph", and em means "this is an emphasized word or phrase". A program interpreting such markup can apply its own rules or styles to display the different parts of the text, using different typefaces, font sizes, indentation, color, or other styles as needed. A tag such as h1 may, for example, be rendered in a large, bold typeface; in a monospaced (typewriter-style) document it may be underlined; or it may not change the appearance at all.
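
To illustrate how an interpreter is free to choose the presentation of structural tags, here is a small sketch using Python's html.parser. The class PlainTextRenderer and its style choices (upper-case headings, asterisks for emphasis) are assumptions made for this example, as is the shortened sample text; another renderer could present the same markup quite differently.

```python
from html.parser import HTMLParser

sample = "<h1>Anatidae</h1><p>Ducks, geese, and swans, but <em>not</em> screamers.</p>"

class PlainTextRenderer(HTMLParser):
    """One possible presentation: upper-case headings, *asterisks* for emphasis."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements
        self.output = []  # pieces of rendered text

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag == "em":
            self.output.append("*")

    def handle_endtag(self, tag):
        if tag == "em":
            self.output.append("*")
        elif tag in ("h1", "p"):
            self.output.append("\n")
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Text inside a heading is shown in capitals; other text is unchanged.
        self.output.append(data.upper() if "h1" in self.stack else data)

renderer = PlainTextRenderer()
renderer.feed(sample)
renderer.close()
rendered = "".join(renderer.output)
print(rendered)
```

The markup only says what each part of the text is; every decision about how it looks lives in the renderer.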

By contrast, the i tag in HTML is an example of visual markup: it is usually used to specify a particular characteristic of the text (use an italic typeface in this block) without giving the reason.

The TEI (Text Encoding Initiative) has published comprehensive guidelines for encoding texts for the humanities and social sciences. These guidelines have been used to encode historical documents, the works of particular scholars, periodicals, and so on.

Alternative uses

As the idea of using markup languages for text documents developed, markup languages also came into use in other areas, representing various kinds of information, including playlists, vector graphics, web services, and user interfaces. Most of these applications are based on XML because it is a well-structured and extensible language.

The use of XHTML also shows that markup languages can be combined into compound profiles, such as XHTML+SMIL or XHTML+MathML+SVG.

SGML (Standard Generalized Markup Language) is defined in the ISO 8879 standard. It is accepted as the principal language for preparing technical documentation, including interactive electronic technical manuals for products developed with CALS technologies.

SGML defines the structure of documents as a sequence of data objects. Data objects representing parts of a document can be stored in different files. The SGML standard establishes character sets and rules for representing information that allow different systems to recognize and identify that information correctly. These are described in a separate part of the document, called the DTD (Document Type Definition), which is transmitted together with the main SGML document. The DTD specifies the correspondence between characters and their codes, the maximum lengths of identifiers, how tag delimiters are represented, other possible conventions, the syntax of the DTD, and the document type and version. SGML can therefore be called a metalanguage for a family of specific markup languages. In particular, XML can be regarded as a simplified subset of SGML, and HTML (through version 4) as an SGML application.

The technical description in the form of an SGML document includes:

  • the main file containing the technical manual marked up with SGML tags;
  • a description of entities, if the document belongs to a group in which the same entities are used and are assumed to be known;
  • a dictionary explaining the SGML tags.

However, SGML is difficult to learn and use. Therefore, to enable wide use of markup in documents on the WWW, a simplified language, HTML (HyperText Markup Language), was developed on the basis of SGML in 1991, followed from 1996 by XML (eXtensible Markup Language), which, together with HTML, has become the main language for representing documents in various applications.

The HTML language was developed to allow markup to be used widely in documents on the WWW.

An HTML description consists of ASCII text and a sequence of commands (control codes) embedded in it, also called descriptors or tags. This text is called an HTML document, or an HTML page, or, once placed on a web server, a web page. Tags are placed at the appropriate points in the source text; they define fonts, line breaks, the appearance of images, links, and so on. In WWW editors, commands are inserted simply by pressing the corresponding keys.

XML, like HTML, is derived from SGML. XML now claims the role of the main language for representing documents in information technology; it can be regarded as a metalanguage that serves as the basis for creating specialized markup languages in various applications. At the same time, XML is more convenient than SGML, because some minor features of SGML have been eliminated in it. Descriptions in XML are easier to understand and are suited to use in modern browsers, while the basic features of SGML are retained.

For specific applications, dedicated variants of XML are created, called XML vocabularies or XML applications. For example, MathML was developed for describing texts with mathematical notation, and OSD (Open Software Description) for describing software packages. For CALS, the Product Definition eXchange (PDX) data exchange variant is of interest. There are also vocabularies for chemistry (CML - Chemical Markup Language), biology (BSML - Bioinformatic Sequence Markup Language), and other fields.
