What is a markup language. How to learn HTML markup language. Descriptive markup systems

HTML markup language

Many technologies exist today for creating Web pages, and a Webmaster can hardly do without them. But the basis for developing Web documents remains the HTML hypertext markup language.

HTML is primarily a markup language; code written in it is interpreted on the client's computer by a Web browser. This accounts for its relative simplicity and ease of development.

Why do you need a markup language?

When you create a regular document in a word processor program, you can easily format the document, such as setting characters to italic or bold, giving a paragraph a heading or plain text style, and so on. What you do with the document on the monitor screen is transferred to paper in the same form when printed on the printer.

Whether you select an option from the drop-down menus or give a key command, you immediately see the result of your efforts on the screen. However, the specific commands that implement the display of the document on the screen or on paper will be hidden from you.

In the case of Web pages, the user does not deal with paper, but with electronic documents received via the Internet. The principle of displaying a document with formatting tools of the parent application is not acceptable here. The user would have to have too many applications or all kinds of converters on his computer in order to work effectively with the many possible document formats.

The solution to the problem of exchanging documents between different computers and applications over the Internet is based on the HyperText Markup Language (HTML). This language was created in the early 1990s as a document format standard and has been adopted by the vast majority of Internet users and, most importantly, by all manufacturers of software and hardware for the Web. Documents marked up in HTML can be read on any computer that has just one viewer for such documents - a browser.

Thanks to the HTML markup language, a Web client can view a document on the screen of his computer in the form in which the developer intended it: with certain font sizes and paragraph breaks, with a certain arrangement of pictures, hyperlinks, and so on.

A text document written in HTML has a size in bytes that is several times smaller than the size of a similar document prepared in a word processor (for example, Word).

Berners-Lee, the developer of the language, based it on SGML and on methods of working with hypertext, which explains the name he gave it - HTML. The new language used the basic SGML constructs to describe documents and hypertext links.

Hypertext is a way of organizing text, graphics, and other data in which data elements are linked together. Both elements of one document and elements of different documents can be linked. The hypertext structure is at the heart of the World Wide Web.

Hypertexts are electronic documents. You can work with hypertexts only on a computer; in printed form, hypertexts do not exist. An example of a hypertext system is the well-known Windows help system.

Connections in the hypertext structure are carried out using links. Thanks to links, the user can call another document from one document, the next document from it, and so on.

In 1989, Berners-Lee proposed an information system built as a web of linked documents. The documents are stored on servers located all over the world and interconnected by Internet channels. He developed the HTTP protocol - the language in which servers exchange hypertext documents - and wrote the first Web server and browser programs. He approached the Internet community directly, and in 1991 enthusiasts began to create the first Web sites.

Over the years, the World Wide Web has grown rapidly to become the most popular service on the Internet. It currently satisfies the information needs of the widest range of users, including millions of Web sites. Large sites host thousands and hundreds of thousands of documents, and the total number of documents on the WWW is increasing every second, since a huge army of specialists and amateurs in different parts of the globe is working on their creation.

The World Wide Web, or Web for short, is a global hypertext system for disseminating information that uses the Internet as a transport channel.

In fact, the World Wide Web is a hypertext space of documents that is independent of the geography of the Web sites themselves. In this space, the physical distance between nodes is therefore irrelevant. Web pages stored on a computer in the next room and on a server in another country are viewed on the monitor screen in exactly the same way.

The World Wide Web operates according to certain standards, which are developed and implemented by an association of research and industrial organizations - the W3C consortium (short for World Wide Web Consortium).

The HTML markup language was based on the SGML language. The markup tools for paragraphs, headings, lists and other elements available in HTML were also provided in SGML. The merit of the inventor of HTML is that he introduced into the markup language what SGML lacked - hypertext links.

Markup languages

A markup language (for text), in computer terminology, is a set of characters or sequences inserted into text to convey information about its output or structure. It belongs to the class of computer languages. A text document written using a markup language contains not only the text itself (as a sequence of words and punctuation marks), but also additional information about its various parts - for example, an indication of headings, highlights, lists, etc. In more complex cases, a markup language allows you to insert interactive elements and content from other documents into a document.

It should be noted that a markup language is generally not Turing complete and is not usually considered a programming language, although it is a computer language in the broad sense.

HTML (HyperText Markup Language) was developed by the British scientist Tim Berners-Lee around 1989-1991 at the European Organization for Nuclear Research (CERN) in Geneva, Switzerland. HTML was created as a language for the exchange of scientific and technical documentation, suitable for use by people who are not specialists in layout. HTML dealt successfully with the complexity of SGML by defining a small set of structural and semantic elements called descriptors. Descriptors are also often referred to as "tags". With HTML, you can easily create a relatively simple yet well-designed document. In addition to simplifying the structure of the document, HTML added support for hypertext. Multimedia features were added later.

Initially, HTML was conceived and created as a means of structuring and formatting documents without being tied to the means of reproduction (display). Ideally, text with HTML markup should be reproduced without stylistic and structural distortions on devices with very different capabilities (the color screen of a modern computer, the monochrome screen of an organizer, the small screen of a mobile phone, or devices and programs for voice reproduction of texts). However, the modern use of HTML is very far from its original purpose. For example, the <table> tag, often used for page layout, was designed to create ordinary tables in documents. Over time, the core idea of HTML's platform independence has been sacrificed in favor of modern needs for multimedia and graphic design.

XML (eXtensible Markup Language, pronounced "ex-em-el") is a markup language recommended by the World Wide Web Consortium (W3C). The XML specification describes XML documents and partially describes the behavior of XML processors (programs that read XML documents and provide access to their content). XML was designed to be a language with a simple, formal syntax that is easy for programs to create and process and easy for humans to read and write, with an emphasis on use on the Web. The language is called extensible because it does not fix the markup used in documents: the developer is free to create markup according to the needs of a particular area, being limited only by the syntax rules of the language. The combination of simple formal syntax, human-friendliness, extensibility, and reliance on Unicode encodings for representing the content of documents has led to the widespread use of both XML itself and a variety of specialized XML-based derivative languages in a wide variety of software tools.

XHTML (Extensible Hypertext Markup Language) is a family of XML-based web page markup languages that repeat and extend the capabilities of HTML 4. The XHTML 1.0 and XHTML 1.1 specifications are recommendations of the World Wide Web Consortium, but development of XHTML has been stopped, with the recommendation to use HTML instead. New versions of XHTML are not being released.

The main difference between XHTML and HTML is how the document is processed. XHTML documents are processed by their own module (parser) in the same way as XML documents. During this processing, errors made by developers are not corrected.

XHTML conforms to the SGML specification because XML is a subset of SGML. HTML has many peculiarities in how it is processed and has in effect ceased to belong to the SGML family, which is enshrined in the HTML 5 specification.

The browser chooses the parser to process the document based on the content-type header received from the server:

HTML - text/html

XHTML - application/xhtml+xml

· For local viewing on the client, the selection is based on the file extension.

· In Internet Explorer up to and including version 8, there is no parser for processing XHTML documents.
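The selection rule above can be sketched in a few lines of Python. The function name `choose_parser` and the labels it returns are illustrative assumptions, not part of any browser's API, and real browsers apply further rules that are omitted here.

```python
def choose_parser(content_type: str) -> str:
    """Return which parser a browser-like client would pick (illustrative)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/xhtml+xml":
        return "xml"   # strict XML parsing: markup errors are not corrected
    if media_type == "text/html":
        return "html"  # forgiving HTML parsing
    return "unknown"


print(choose_parser("text/html; charset=utf-8"))  # html
print(choose_parser("application/xhtml+xml"))     # xml
```

The same markup can therefore be handled strictly or leniently depending only on the header the server sends.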

WML (Wireless Markup Language) is a document markup language for use in cell phones and other mobile devices according to the WAP standard.

The structure resembles somewhat simplified HTML, but there are key differences, since WML is aimed at devices that lack the capabilities of personal computers (small screen, limited or no graphics, little memory, etc.). All information in WML is contained in so-called "decks". A deck is the smallest unit of data that can be transferred by the server. Decks contain "cards" (each card is delimited by the tags <card> and </card>). There must always be at least one card in a deck, but there may be several. Only one card is displayed on the device screen at a time, and the user switches between them by following links; this is done to reduce the number of requests to the server. The size of WML pages should not exceed 1-4 kilobytes.
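As a rough illustration of the deck-and-card model, the sketch below parses a small deck with Python's standard XML parser. The card ids, titles and text are made up for the example, and a real deck would also carry an XML declaration and a WML DOCTYPE, which are omitted here.

```python
import xml.etree.ElementTree as ET

# A hypothetical two-card deck; ids, titles and text are invented.
DECK = """<wml>
  <card id="home" title="Home">
    <p>Welcome. <a href="#news">News</a></p>
  </card>
  <card id="news" title="News">
    <p>Nothing new today.</p>
  </card>
</wml>"""

deck = ET.fromstring(DECK)
cards = deck.findall("card")
card_ids = [card.get("id") for card in cards]
print(len(cards), card_ids)  # 2 ['home', 'news']

# The whole deck travels in one response, so its size matters.
deck_size = len(DECK.encode("utf-8"))
print(deck_size, "bytes")
```

The link in the first card points at `#news`, a card in the same deck, so following it requires no new request to the server.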

VML (Vector Markup Language) was developed by Microsoft to describe vector graphics. VML was submitted to the W3C in 1998 by Microsoft, Macromedia, and others. Around the same time, Adobe, Sun, and several other companies submitted PGML for consideration. Both of these languages later became the basis for SVG.

PGML (Precision Graphics Markup Language) is an XML-based markup language used to describe vector graphics on a web page (diagrams, individual interface elements) as text in XML format; it uses an imaging model similar to that of PDF and PostScript. It was submitted to the W3C by Adobe Systems, IBM, Netscape Communications and Sun Microsystems in 1998, but was not adopted as a recommendation. Almost simultaneously, Microsoft submitted its VML project for consideration; a year later the more advanced SVG language was developed, drawing on ideas from both technologies. SVG has received a W3C recommendation and has become the main format for describing vector graphics on a web page.

SVG (Scalable Vector Graphics) is a markup language for scalable vector graphics created by the World Wide Web Consortium (W3C). It is an XML-based language designed to describe two-dimensional vector and mixed vector/bitmap graphics. It supports still, animated and interactive graphics - in other words, declarative and scripted. It does not support the description of three-dimensional objects. It is an open standard and a recommendation of the W3C, the organization behind standards such as HTML and XHTML. SVG is based on the VML and PGML markup languages. It has been in development since 1999.

XBRL (eXtensible Business Reporting Language) is an open standard for electronic financial reporting. The XBRL format is based on the Extensible Markup Language XML. XBRL uses XML syntax as well as XML-related technologies such as XML namespaces, XML Schema, XLink, and XPath. One of the purposes of XBRL is to represent and exchange financial information, such as the financial statements of companies. The XBRL language specification is developed and published by XBRL International, Inc., an independent international organization.

To improve the visual presentation of the web, CSS technology has become widely used; it allows you to set uniform design styles for many web pages. Another innovation worth noting is the URN (Uniform Resource Name) resource naming system.

A popular concept for the development of the World Wide Web is the creation of a semantic web. The Semantic Web is an add-on to the existing World Wide Web, designed to make the information posted on the network more understandable to computers. The Semantic Web is the concept of a network in which each resource written in human language is accompanied by a description that a computer can understand. The Semantic Web provides access to clearly structured information for any application, regardless of platform and programming language. Programs will be able to find the necessary resources themselves, process information, classify data, identify logical relationships, draw conclusions, and even make decisions based on these conclusions. If widely adopted and implemented well, the Semantic Web has the potential to revolutionize the Internet. To create a computer-friendly description of a resource, the Semantic Web uses the RDF format (Resource Description Framework), which is based on XML syntax and uses URIs to identify resources. Newer developments in this area are RDFS (RDF Schema) and SPARQL (SPARQL Protocol and RDF Query Language), a query language for fast access to RDF data.


HTML is a Hypertext Markup Language.

Language is used to organize web pages. Let's draw an analogy. You buy a newspaper. It contains several articles. Each article has a title, it has photos. And the text is typed in several columns. This is the structure of a newspaper page.

On the site, everything is the same. To make the correct structure of the article - content - you need to use the text markup language.

What is HTML for?

HTML is needed to tell the browser how to display the page on the screen.

The language is ubiquitous. This is a universal tool for decorating content on a page. It can be used in any browser. If you write code in a programming language, you need to know some features, operators, data types, and so on.

HTML consists of a set of tags - commands, and attributes - properties. They are easy to remember, and you can always find reference materials.

What is HTML code

The code is a set of instructions telling the browser how to display the page. There is a structure that should always be respected: for example, only one H1 heading on the page, the main information placed in sections, and so on.

The language has three tools.

There are two types of tags - paired and single.

The structure of the HTML code on the page

We said that the structure of any html document is always the same. Below are the required elements.

  1. <!DOCTYPE html> - indicates that the document uses HTML.
  2. <html>...</html> - this tag contains the entire code of the page. Anything placed outside it is not recognized by the browser and is not displayed.
  3. <head>...</head> - a paired tag that contains technical information, for example, about the encoding of the document.
    1. <title>...</title> - the title of the page, placed inside the head section. Each page must have its own unique title.
    2. <link> - service information. It connects separate style sheets (CSS and so on) to the page. It is not displayed to the user.
  4. <body>...</body> - the body of the page. All basic information is contained in this tag.
    1. <a>...</a> - hyperlinks.
    2. <img> - images.
    3. <b>...</b> - bold text.
    4. <i>...</i> - italics.

There can be an unlimited number of elements inside the body.
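A minimal page with these required elements might look like the string in the sketch below. The title, text and URL are placeholders, and the `TagCollector` class is a hypothetical helper written for this example; it uses Python's standard `html.parser` module to list the tags in the order a parser meets them.

```python
from html.parser import HTMLParser

# A minimal page; the title, text and URL are placeholders.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Example page</title>
  </head>
  <body>
    <h1>Example heading</h1>
    <p>Some <b>bold</b> and <i>italic</i> text with a
       <a href="https://example.com/">link</a>.</p>
  </body>
</html>"""


class TagCollector(HTMLParser):
    """Record the name of every start tag in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(PAGE)
print(collector.tags)
# ['html', 'head', 'meta', 'title', 'body', 'h1', 'p', 'b', 'i', 'a']
```

The order of the output mirrors the nesting: everything sits inside html, the technical information comes first in head, and the visible content follows in body.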

For example, this is how part of the page code for one of our blog posts looks like.

The more often you use tags, the faster they are remembered. You can always find a reference book with all tags, attributes and their values.

A document markup language is a set of special instructions, called tags, designed to form a structure in documents and define relationships between the various elements of this structure. Language tags, or control tags as they are sometimes called, are set apart from the main content of the document and serve as instructions for the program that displays the content of the document on the client side. The earliest systems used the symbols “<” and “>”, inside which the names of instructions and their parameters were placed. Now this way of writing tags is standard.

The use of hypertext markup for text documents in modern information systems is largely due to the fact that hypertext allows you to create a mechanism for non-linear information viewing. In such systems, data is presented not as a continuous stream of textual information, but as a set of interrelated components, between which the user moves by means of hyperlinks.

Today's most popular hypertext markup language, HTML, was created specifically for organizing information distributed on the Internet, and is one of the key components of WWW technology. With the use of the hypertext document model, the way of presenting various information resources on the web has become more streamlined, and users have received a convenient mechanism for searching and viewing the necessary information.

HTML is a simplified application of the standard generalized markup language SGML (Standard Generalized Markup Language), which was approved by ISO as a standard in the 1980s. SGML is intended for creating other markup languages: it defines the allowed set of tags, their attributes and the internal structure of the document. Correct use of descriptors is checked against a special set of rules called a DTD, which the client program uses when parsing a document. Each class of documents defines its own set of rules that describe the grammar of the corresponding markup language. Using SGML, you can describe structured data, organize the information contained in documents, and present this information in a standardized format. But because of its complexity, SGML was used mainly to describe the syntax of other languages (the most famous of which is HTML), and few applications dealt directly with SGML documents.

Much simpler and more convenient than SGML, HTML allows you to define the design of document elements and has a certain limited set of instructions - tags, with which the markup process is carried out. HTML instructions are primarily intended to control the process of displaying the contents of a document on the screen of a client program and thereby determine the way the document is presented, but not its structure. The element of the hypertext database described by HTML is a text file that can be easily transferred over a network using the HTTP protocol. This feature, as well as the fact that HTML is an open standard and a huge number of users have the opportunity to use the capabilities of this language to design their documents, certainly influenced the growth in popularity of HTML and made it the main mechanism for presenting information on the Web today.

However, modern applications need not only a language for presenting data on the client screen, but also a mechanism for defining the structure of a document and describing the elements contained in it. HTML has a simple set of commands and copes quite successfully with the task of describing textual information and displaying it in a browser. However, the displayed data itself has nothing to do with the tags used for formatting, so parser programs cannot use HTML tags to find the document fragments we need. That is, on encountering, for example, markup such as

rose

the viewer will know what color to display the text contained within the tags and, probably, it will display it correctly, but it is absolutely indifferent to it where this tag is found in the document, what other tags the current fragment is enclosed in, whether there are fragments nested in it, whether relations between objects are built correctly. Such "indifference" to the structure of the document leads to the fact that the search or analysis of information inside it will not differ in any way from working with a continuous text file that is not divided into elements. And this, as you know, is not the most efficient way of working with information.

Another significant disadvantage of HTML is its limited set of tags. The DTD rules for HTML define a fixed set of descriptors, so the developer cannot introduce his own special tags. Although new language extensions appear from time to time, the long road to their standardization, accompanied by constant disagreements between the major browser manufacturers, makes it almost impossible to adapt the language quickly or to use it for displaying specialized information (for example, multimedia or mathematical and chemical formulas).

Summarizing all of the above, it can be argued that HTML today does not fully satisfy the requirements that modern developers place on languages of this kind. A new markup language was proposed to take its place: XML, which is powerful, flexible and, at the same time, convenient.

XML (Extensible Markup Language) is a markup language that describes a whole class of data objects called XML documents. The language is used as a means of describing the grammar of other languages and of checking the correctness of documents. That is, XML itself does not contain any tags for marking up documents; it simply defines the rules by which they are created. Thus if, for example, we decide that the rose element in a document should be marked with a tag of our own, XML allows us to use the tag we define freely, and we can include snippets like the following in the document:

rose

The set of tags can be easily extended. If, suppose, we also want to indicate that the description of a flower should go inside the description of the greenhouse in which it blooms, then we simply set new tags and choose the order in which they appear:

rose
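As a sketch of the idea, assume the hypothetical tag names `flower` and `greenhouse` (any names the author chooses would do). Because such tags describe what the data is rather than how it looks, a program can search for them directly; the example uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names chosen by the document author.
DOC = "<greenhouse><flower>rose</flower></greenhouse>"

root = ET.fromstring(DOC)
# Because the tags carry meaning, a program can look for flowers directly.
flowers = [flower.text for flower in root.iter("flower")]
print(root.tag, flowers)  # greenhouse ['rose']

# Extending the vocabulary is just a matter of using a new tag.
ET.SubElement(root, "flower").text = "tulip"
extended = ET.tostring(root, encoding="unicode")
print(extended)
# <greenhouse><flower>rose</flower><flower>tulip</flower></greenhouse>
```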

The process of creating an XML document is very simple and requires only a basic knowledge of HTML and an understanding of the tasks to be performed using XML as a markup language. Thus, developers have a unique opportunity to define their own commands, allowing them to most effectively determine the data contained in the document. The author of the document creates its structure, builds the necessary links between the elements, using the commands that meet his requirements, and achieves the type of markup that he needs to perform the operations of viewing, searching, analyzing the document.

Another obvious advantage of XML is its ability to be used as a general-purpose language for querying information stores. The W3C is currently reviewing a working version of the XML-QL (or XQL) standard, which may be a serious competitor to SQL in the future. In addition, XML documents can act as a unique way to store data, which includes both tools for parsing information and presenting it on the client side. In this area, one of the promising areas is the integration of Java and XML technologies, which allows using the power of both technologies in building machine-independent applications that also use a universal data format for information exchange.

XML also allows you to control the correctness of data stored in documents, check hierarchical relationships within a document, and establish a single standard for the structure of documents, the content of which can be a variety of data. This means that it can be used in building complex information systems, in which the issue of information exchange between different applications running in the same system is very important. By creating the structure of the information exchange mechanism at the very beginning of work on the project, the manager can save himself in the future from many problems associated with the incompatibility of the data formats used by various components of the system.

Another advantage of XML is that programs that process XML documents are not complicated, and all kinds of software products for working with XML documents have already appeared and are freely distributed. XML is supported in IE5, and support has been announced for subsequent versions of Netscape Communicator, the Oracle DBMS, DB2 and MS Office applications. All this suggests that XML is likely to become the main information exchange language for information systems in the near future, thus replacing HTML. Such well-known specialized markup languages as SMIL, CDF, MathML and XSL have already been created on the basis of XML, and the list of working drafts of new languages under consideration by the W3C is constantly growing.

The XSLT language is used to process documents and to make changes and additions to their markup. It can be used to convert XML into formatted HTML that is easy for a human to read. You can also convert an XML document to plain text, to another, restructured XML document, or even to a JavaScript document. The XSLT language provides access to the content of XML documents and is also used to create new documents based on them. For these reasons, it is worth learning the XSLT language.

It is more common to convert XML documents to HTML documents, and it is this operation that is covered in the examples in this chapter.

Two documents are used to perform an XSLT transformation: the document to be converted and the style sheet that defines the transformation itself. Both are XML documents.
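Python's standard library does not include an XSLT processor, so the sketch below is not XSLT. It is a hand-written conversion that only illustrates the kind of XML-to-HTML mapping a style sheet would describe. The source document, its tag names and the function `to_html` are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document to be converted.
SOURCE = "<greenhouse><flower>rose</flower><flower>tulip</flower></greenhouse>"


def to_html(xml_text: str) -> str:
    """Turn each <flower> element into an HTML list item."""
    source = ET.fromstring(xml_text)
    html_list = ET.Element("ul")
    for flower in source.iter("flower"):
        ET.SubElement(html_list, "li").text = flower.text
    return ET.tostring(html_list, encoding="unicode")


print(to_html(SOURCE))  # <ul><li>rose</li><li>tulip</li></ul>
```

In a real XSLT setup the mapping rule inside the loop would live in the style sheet, itself an XML document, rather than in program code.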

Any document has three components:

· structure;

· content;

· style.

Content is the information that is displayed in the document. The content of the document on paper can be purely textual, and also contain images. If the document is presented in electronic form, it may contain multimedia data, as well as links to other documents. Although the content of different documents varies, they can be classified by type, such as a book or a train ticket.

The style of a document determines the form in which its content is displayed on a particular device (for example, a printer or display). The concept of style includes the characteristics of the font (name, size, color) of the entire output document or its individual blocks, the order of pagination, the arrangement of blocks on pages, and other parameters. The same document can be displayed in different styles both on different media and on the same media.

Document markup languages are artificial languages designed to describe the structure of a document and the relationships between the various objects in that structure. Markup data is also called metadata.

The first markup language was GML (Generalized Markup Language), developed at IBM in the 1960s. Its immediate successor was SGML (Standard Generalized Markup Language), which defines the rules for writing document markup elements. A document that follows the rules of the language is called an SGML document.

The SGML language is defined in the ISO 8879 standard, which specifies the following basic requirements for a document markup language:

· The language must be human readable.

· Marked-up document files must be textual and encoded using ASCII (American Standard Code for Information Interchange) characters. However, the content of the document does not have to be ASCII encoded or textual.

SGML and similar languages use special document markup tools:

· elements and related attributes;

· entities;

· comments.

The structural unit of an SGML document is an element. In marked-up text, each element must be delimited in a certain way. This is done by inserting a start tag at the beginning of the element and an end tag at the end of the element (the word "tag" means a label). The start and end tags have the same name. To distinguish tags from plain text, each tag must begin with a character that marks the start of a tag and end with a character that marks the end of a tag. In addition, the end tag contains a character that marks it as an end tag. In SGML, any characters can be designated for these roles; in practice, however, "<" (left angle bracket) is used to mark the start of a tag, ">" (right angle bracket) to mark the end of a tag, and "/" (slash) to mark an end tag. Elements in an SGML document can contain other elements, so an SGML document can be represented graphically as a hierarchical (tree) structure.


Example 4.3.1. An SGML document specifying a list of students with the results of their examination session can be specified as follows:

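    <student-list>
      <title>List of student grades in session</title>
      <student>
        <full-name>Ivanov Ivan Ivanovich</full-name>
        <group-number>TS-61</group-number>
        <mark-list>
          <mark>A</mark>
          <mark>B</mark>
          <mark>B</mark>
          <mark>B</mark>
        </mark-list>
      </student>
      <student>
        <full-name>Petrov Petr Petrovich</full-name>
        <group-number>TS-62</group-number>
        <mark-list>
          <mark>C</mark>
          <mark>C</mark>
          <mark>D</mark>
          <mark>C</mark>
        </mark-list>
      </student>
    </student-list>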

In this document, the first element is the student-list element. This element contains one title element (title) and several student elements (student data). In turn, each student element contains one full-name element (last name, first name and patronymic of the student), one group-number element (group number) and one mark-list element (list of student grades in the session). And finally, the mark-list element contains several mark (evaluation) elements.

The graphical representation of this list in Fig. 4.3.1 has a tree structure:

Fig. 4.3.1. SGML document structure in graphical representation
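The tree structure can also be printed by a short program. The sketch below uses the element names from the description of Example 4.3.1 with shortened data. It relies on an assumption: this particular document is also well-formed XML, so Python's standard XML parser can read it, although it is not a general SGML parser. The helper `outline` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Element names follow the description of Example 4.3.1; the data is shortened.
DOC = """<student-list>
  <title>List of student grades in session</title>
  <student>
    <full-name>Ivanov Ivan Ivanovich</full-name>
    <group-number>TS-61</group-number>
    <mark-list><mark>A</mark><mark>B</mark></mark-list>
  </student>
</student-list>"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(DOC))
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting in the tree.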

You can use attributes to refine SGML elements. Attributes are written in the start tag of an element in the following form:

attribute-name="attribute-value".

An element can have multiple attributes. Attributes are separated from each other and the element name by at least one space.

Example 4.3.2. For the mark elements in example 4.3.1, you can specify the subject attribute, the value of which is the name of the discipline in which the exam was taken. Then for the first student, the elements will take the following form:

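    <mark subject="...">A</mark>
    <mark subject="...">B</mark>
    <mark subject="...">B</mark>
    <mark subject="...">B</mark>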
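Attributes make it possible to select elements by their properties. In the sketch below the subject names are hypothetical, chosen only to show the `attribute-name="attribute-value"` form, and the fragment is read with Python's standard XML parser on the assumption that it is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Subject names are hypothetical.
MARKS = """<mark-list>
  <mark subject="Mathematics">A</mark>
  <mark subject="Physics">B</mark>
</mark-list>"""

# Map each subject attribute to the mark stored in the element's text.
marks = {mark.get("subject"): mark.text for mark in ET.fromstring(MARKS)}
print(marks)  # {'Mathematics': 'A', 'Physics': 'B'}
```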

Languages like SGML use entities to work with groups of data. An entity is any named piece of data, textual or non-textual. When the document is viewed, the name of the entity is replaced by its value. So, for example, the name of the text entity kpi will be replaced by its value, Kyiv Polytechnic Institute, and the non-text entity image1 will be replaced by the image named image1.
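The replacement of an entity name by its value can be observed with XML, which inherited text entities from SGML. In this sketch the element name `note` and the surrounding sentence are made up; the entity kpi is declared in the document's internal DTD subset and expanded by Python's standard XML parser.

```python
import xml.etree.ElementTree as ET

# The entity kpi is declared once and referenced as &kpi; in the text.
DOC = """<!DOCTYPE note [
  <!ENTITY kpi "Kyiv Polytechnic Institute">
]>
<note>Studied at &kpi;.</note>"""

note = ET.fromstring(DOC)
print(note.text)  # Studied at Kyiv Polytechnic Institute.
```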
