What is parsing? Comprehensive guide to parsing

For a programmer, parsing is one of the branches of computer science most worth understanding. After reading this article, you should understand what parsing means, why it is used in programming, and how it is applied in some technologies.

The meaning of Parse
Why do we use Parse in programming
Applications of Parse in Programming with Detailed Examples
Resources

The meaning of Parse

Parsing, in computer science, is the decomposition of a structure into its constituent parts, each of which is then analyzed for its syntactic role under a given grammar. The term comes from the Latin pars (orationis), meaning part (of speech).
More loosely, to parse can also mean to split or separate something.
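
To make the definition concrete, here is a minimal sketch of a parser for arithmetic expressions. The grammar, the function names (tokenize, parse, evaluate), and the shape of the tree nodes are all assumptions chosen for illustration; they do not come from any particular library.

```javascript
// Hypothetical grammar, chosen only for this illustration:
//   expr   := term (("+" | "-") term)*
//   term   := factor (("*" | "/") factor)*
//   factor := NUMBER | "(" expr ")"

// Step 1: decompose the text into its constituent parts (tokens).
function tokenize(text) {
  // Sticky flag (y): the pattern may only match at lastIndex.
  const pattern = /\s*(\d+(?:\.\d+)?|[-+*\/()])/y;
  const trimmed = text.trimEnd();
  const tokens = [];
  let pos = 0;
  while (pos < trimmed.length) {
    pattern.lastIndex = pos;
    const match = pattern.exec(trimmed);
    if (match === null) {
      throw new SyntaxError(`Unexpected character near position ${pos}`);
    }
    tokens.push(match[1]);
    pos = pattern.lastIndex;
  }
  return tokens;
}

// Step 2: analyze the tokens under the grammar and build a tree.
function parse(tokens) {
  let index = 0;
  const peek = () => tokens[index];
  const next = () => tokens[index++];

  function parseExpr() {
    let node = parseTerm();
    while (peek() === "+" || peek() === "-") {
      node = { type: "binary", operator: next(), left: node, right: parseTerm() };
    }
    return node;
  }

  function parseTerm() {
    let node = parseFactor();
    while (peek() === "*" || peek() === "/") {
      node = { type: "binary", operator: next(), left: node, right: parseFactor() };
    }
    return node;
  }

  function parseFactor() {
    const token = next();
    if (token === "(") {
      const node = parseExpr();
      if (next() !== ")") throw new SyntaxError("Expected ')'");
      return node;
    }
    if (token !== undefined && /^\d/.test(token)) {
      return { type: "number", value: Number(token) };
    }
    throw new SyntaxError(`Unexpected token: ${token}`);
  }

  const tree = parseExpr();
  if (index < tokens.length) throw new SyntaxError(`Unexpected token: ${peek()}`);
  return tree;
}

// Step 3: derive "meaning" from the structure by walking the tree.
function evaluate(node) {
  if (node.type === "number") return node.value;
  const left = evaluate(node.left);
  const right = evaluate(node.right);
  switch (node.operator) {
    case "+": return left + right;
    case "-": return left - right;
    case "*": return left * right;
    default: return left / right;
  }
}

const result = evaluate(parse(tokenize("2 + 3 * (4 - 1)"))); // 11
```

Because multiplication is handled one level below addition in the grammar, the tree groups 3 * (4 - 1) before adding 2, which is how the structure ends up carrying the meaning.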

Why do we use Parse in programming

Interest in parsing goes back to the first attempts to apply computers to natural languages. Messages in a language break down into sentences composed of words, which in turn consist of sequences of symbols when written. Languages can differ on all three levels of composition. The script can be slightly different, as between English and Irish, or very different, as between English and Chinese. Words tend to differ significantly, and so can sentence structure. In English, word order carries the meaning: "Aisha loves food" and "Food loves Aisha" say different things, while another language might order the same sentence as "Aisha food loves".

The computer scientist takes a very abstract view of all this. A language has sentences, and these sentences possess structure. Whether they communicate something is not the computer scientist's concern, but information may be derived from their structure, and it is reasonable to call that information the “meaning” of the sentence. Parsing was born from this need for a clear, well-understood, and unambiguous means of describing objects in the computer and communicating with the computer.

Applications of Parse in Programming

Parsing is performed as a method of understanding the exact meaning of a structure. Parsers are applied in many areas of programming, which include the following:

Information and document retrieval
Knowledge and information extraction
Machine translation
Speech recognition and understanding
And many more

Information and document retrieval

The structure obtained by parsing an object, such as a document, makes it easier to process that object further. Once the structure of a document has been brought to the surface, the document can be converted more easily.
We can see this application in the rendering engine inside browsers.
The browser parses HTML into a DOM tree. HTML parsing involves two stages: tokenization and tree construction. HTML tokens include DOCTYPE, start tag, end tag, comment, character, and end-of-file; attribute names and values are carried inside the tag tokens. If the document is well-formed, parsing it is more straightforward and faster. The tokenizer's output feeds the tree construction stage, which is associated with a DOM Document object and builds up the document tree as its output.

Fig 1. A simple HTML Parsing process
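
The two stages can be sketched in a few lines of JavaScript. This is a deliberately simplified illustration, not the algorithm browsers implement: it assumes tags without attributes, no comments, no DOCTYPE, and no void elements, and the names tokenizeHtml and buildTree are hypothetical.

```javascript
// Stage 1: tokenization. Produce start-tag, end-tag, and character tokens.
// Simplification: anything that matches none of the three alternatives
// (for example a stray "<") is skipped silently.
function tokenizeHtml(html) {
  const pattern = /<\/([a-zA-Z][a-zA-Z0-9]*)\s*>|<([a-zA-Z][a-zA-Z0-9]*)\s*>|([^<]+)/g;
  const tokens = [];
  for (const match of html.matchAll(pattern)) {
    if (match[1] !== undefined) {
      tokens.push({ type: "endTag", name: match[1].toLowerCase() });
    } else if (match[2] !== undefined) {
      tokens.push({ type: "startTag", name: match[2].toLowerCase() });
    } else if (match[3].trim() !== "") {
      // Whitespace-only text between tags is dropped in this sketch.
      tokens.push({ type: "character", data: match[3] });
    }
  }
  return tokens;
}

// Stage 2: tree construction. A stack of open elements tracks where the
// next node should be attached.
function buildTree(tokens) {
  const root = { name: "#document", children: [] };
  const openElements = [root];
  for (const token of tokens) {
    const parent = openElements[openElements.length - 1];
    if (token.type === "startTag") {
      const element = { name: token.name, children: [] };
      parent.children.push(element);
      openElements.push(element);
    } else if (token.type === "endTag") {
      // Close only the innermost open element; mismatched end tags are ignored.
      if (openElements.length > 1 && parent.name === token.name) {
        openElements.pop();
      }
    } else {
      parent.children.push({ name: "#text", data: token.data });
    }
  }
  return root;
}

const htmlTokens = tokenizeHtml("<html><body><p>Hello</p></body></html>");
const documentTree = buildTree(htmlTokens);
// documentTree: #document > html > body > p > #text "Hello"
```

A real HTML parser also recovers from malformed markup in precisely specified ways, which is a large part of why the actual algorithm is far longer than this sketch.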

The stylesheet is linked from the HTML, and the browser fetches it and parses the CSS text into the CSS Object Model (CSSOM), a data structure used for styling, layout, and painting. The CSSOM is combined with the DOM to form a render tree, which contains the nodes required to render the content to the screen. JavaScript is also downloaded, parsed, and then executed.
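
As a rough illustration of turning CSS text into a data structure, the sketch below parses flat rules into plain objects. It is an assumption-laden simplification rather than the real CSSOM: it ignores comments, at-rules, and nesting, and parseCss and the object shape are hypothetical names.

```javascript
// Parse flat rules of the form "selector { property: value; ... }".
function parseCss(cssText) {
  const rulePattern = /([^{}]+)\{([^{}]*)\}/g;
  const rules = [];
  for (const [, selectorText, body] of cssText.matchAll(rulePattern)) {
    const style = {};
    for (const declaration of body.split(";")) {
      const colon = declaration.indexOf(":");
      if (colon === -1) continue; // skip empty or malformed declarations
      const property = declaration.slice(0, colon).trim();
      const value = declaration.slice(colon + 1).trim();
      if (property !== "") style[property] = value;
    }
    rules.push({ selectorText: selectorText.trim(), style });
  }
  return rules;
}

const cssRules = parseCss("h1 { color: red; font-size: 2em; } p { margin: 0 }");
// cssRules[0] is { selectorText: "h1", style: { color: "red", "font-size": "2em" } }
```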

Knowledge and information extraction

Parsing techniques can extract specific (pre-specified) information from textual sources. The same idea appears in functions we use in everyday code. Some JavaScript examples are given below:

parseInt: The parseInt() function converts its argument to a string, parses it, and returns the integer at the start of that string, or NaN if the string does not begin with a number.

parseInt("025"); // returns 25
parseInt("30.88"); // returns 30
parseInt("30 cars"); // returns 30
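
parseInt() also accepts an optional second argument, the radix (number base). The short sketch below shows how it changes the result; the variable names are only for illustration.

```javascript
// Passing the radix makes the intended base explicit.
const decimal = parseInt("025", 10);        // 25
const binary = parseInt("1010", 2);         // 10
const hex = parseInt("ff", 16);             // 255

// Without a radix, a leading "0x" is treated as hexadecimal.
const hexPrefix = parseInt("0x1f");         // 31

// If the string does not start with a number, the result is NaN.
const notANumber = parseInt("cars 30", 10); // NaN
```

Passing the radix explicitly is a common habit because it removes any doubt about how a leading "0" or "0x" will be interpreted.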

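parseFloat: The parseFloat() function converts its argument to a string, parses it, and returns the floating-point number at the start of that string, or NaN if the string does not begin with a number.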

parseFloat("25.00"); // returns 25
parseFloat(" 50 "); // returns 50
parseFloat("50 years"); // returns 50
parseFloat("50H"); // returns 50
parseFloat("I am a girl"); // returns NaN
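
It can help to contrast parseFloat() with Number(), which converts the whole string instead of extracting a leading number. The comparison below is a small sketch; the variable names are illustrative.

```javascript
// parseFloat stops at the first character that cannot be part of a number.
const lenient = parseFloat("50 years");  // 50
const firstOnly = parseFloat("3.14.15"); // 3.14
const exponent = parseFloat("1e3");      // 1000

// Number() requires the entire string to be numeric.
const strict = Number("50 years");       // NaN

// The two also disagree on the empty string.
const emptyStrict = Number("");          // 0
const emptyLenient = parseFloat("");     // NaN
```

Which one fits depends on the goal: parseFloat() suits extracting a number from surrounding text, while Number() suits validating that the input is nothing but a number.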

In summary, each of these parse functions extracts only the kind of value it is designed for, as shown above. Many more parsing functions are available in high-level programming languages.
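
One more built-in example is JSON.parse(), which turns JSON text into JavaScript values and throws a SyntaxError when the text is not valid JSON. The sample data and the tryParseJson helper below are hypothetical and only illustrate one way to handle that error.

```javascript
// Well-formed JSON text becomes a regular JavaScript object.
const text = '{"name": "Aisha", "likes": ["food"], "age": 30}';
const person = JSON.parse(text);
// person.name is "Aisha", person.likes[0] is "food", person.age is 30

// Hypothetical helper: report a parse failure instead of throwing.
function tryParseJson(input) {
  try {
    return { ok: true, value: JSON.parse(input) };
  } catch (error) {
    if (error instanceof SyntaxError) {
      return { ok: false, message: error.message };
    }
    throw error; // anything else is unexpected, so rethrow it
  }
}

const bad = tryParseJson("{name: Aisha}"); // bad.ok is false: keys must be quoted
```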

Machine Translation

Machine translation is the use of computers to translate text from one natural language (that is, human language, e.g. English) into another, and it relies on parsing technology. Any application of computers to linguistic material requires some process of decomposing the sentences in the material into relevant parts, which is parsing.
During the development of high-level programming languages, parsing techniques were also developed for the compilers and interpreters of these languages. Languages like C++ and Java are parsed by their respective compilers before being transformed into machine code or, in Java's case, bytecode. Scripting languages like PHP and Perl are parsed by an interpreter, typically running on a web server, which lets the server send the resulting HTML to a browser.

Other applications of parsing techniques can be found in speech recognition and understanding, text summarization, lexicon induction, high-accuracy OCR, and speech translation.

Resources

This article covers only the basics of parsing. The following resources are a good starting point for learning more.

New developments in parsing technology
What is a Parse? Definition, Types, and Examples
Parsing Techniques: A Practical Guide by Dick Grune and Ceriel J.H. Jacobs
Wikipedia
Parsing HTML documents
