What problems does NLP solve?

Part-of-Speech Tagging

Part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of assigning a part of speech, such as noun, verb, or adjective, to each word (and other token) in a text in some language. The process is based on both the definition and the context of each word. The steps of POS tagging are the following:
Tokenisation: The given text is divided into tokens that can be used for further analysis. The tokens may be words, punctuation marks, or utterance boundaries.
Ambiguity look-up: This step uses a lexicon and a guesser for unknown words. While the lexicon provides a list of word forms and their likely parts of speech, the guesser analyses unknown tokens; together, the lexicon and the guesser make up what is known as the lexical analyser.
Ambiguity resolution: This is also called disambiguation. Disambiguation is based partly on information about the word itself, such as the probability of the word occurring with a certain part of speech: for example, power is more likely to be used as a noun than as a verb. It is also based on contextual information, i.e., on word/tag sequences: for example, the model might prefer a noun analysis over a verb analysis if the preceding word is a preposition or an article. Disambiguation is the most difficult problem in tagging.
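As a concrete illustration of these three steps, here is a minimal sketch using Python's NLTK library. The example sentences, the resource names, and the printed tags reflect common NLTK usage and are assumptions added for illustration, not part of the original text.

```python
import nltk

# One-time resource downloads; these names are the ones used by recent
# NLTK releases and may differ in other versions (an assumption).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

# The same word ("power") receives different tags depending on context:
for sentence in ["The power went out.", "Solar panels power the grid."]:
    tokens = nltk.word_tokenize(sentence)  # step 1: tokenisation
    print(nltk.pos_tag(tokens))            # steps 2-3: look-up + disambiguation

# Expected output, roughly:
# [('The', 'DT'), ('power', 'NN'), ('went', 'VBD'), ('out', 'RP'), ('.', '.')]
# [('Solar', 'JJ'), ('panels', 'NNS'), ('power', 'VBP'), ('the', 'DT'), ('grid', 'NN'), ('.', '.')]

# nltk.help.upenn_tagset("VBP") prints an explanation of a given tag
# (requires the extra download nltk.download("tagsets")).
```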
Automatic Summarisation

The exponential growth of documents on the World Wide Web, now numbering in the billions, highlights the need for tools that provide timely access to, and digests of, these sources in order to alleviate the information overload people are facing. These concerns have sparked interest in the development of automatic summarisation systems. Such systems are designed to take a single article, a cluster of news articles, a broadcast news show, or an email thread as input, and produce a concise and fluent summary of the most important information. Finding the most important information presupposes the ability to understand the semantics of written or spoken documents. Writing a concise and fluent summary requires the capability to reorganise, modify, and merge information expressed in different sentences of the input. Full interpretation of documents and generation of abstracts is certainly still beyond the state of the art for automatic summarisation.
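While full abstractive summarisation remains an open problem, a naive extractive baseline is easy to sketch. The following illustration, which is an invented example rather than a state-of-the-art method, scores each sentence by the average document frequency of its words and keeps the top-scoring sentences in their original order:

```python
import re
from collections import Counter

def extractive_summary(text: str, n_sentences: int = 2) -> str:
    """Naive extractive summariser: keep the sentences whose words
    occur most frequently in the document itself."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not automatically favoured.
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Re-emit the chosen sentences in their original order for fluency.
    return " ".join(s for s in sentences if s in top)
```

Note that this baseline can only select and concatenate existing sentences; the reorganising, modifying, and merging described above is exactly what it cannot do.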

Machine Translation (MT)

MT is automated translation: the process by which computer software is used to translate a text from one natural language to another. In any translation, human or automated, the meaning of the text in the original (source) language must be fully preserved in the target language, i.e., the translation needs to express the same meaning as the source text. On the surface this seems straightforward, but it is far more complex. Translation is not mere word-for-word substitution. A translator must interpret and analyse all of the elements in the text and know how each word may influence another. This requires extensive expertise in grammar, syntax (sentence structure), semantics (meaning), etc., in both the source and target languages, as well as familiarity with each local region.
There are two types of machine translation system: rule-based and statistical:
Rule-based systems use a combination of language and grammar rules plus dictionaries for common words. Specialist dictionaries are created to focus on certain industries or disciplines (“domains”). Rule-based systems typically deliver consistent translations with accurate terminology when trained with specialist dictionaries; a toy sketch of this approach follows after this list.
Statistical systems have no knowledge of language rules. Instead they “learn” to translate by analysing large amounts of data for each language pair. They can be trained for specific industries or disciplines using additional data relevant to the sector needed. Typically statistical systems deliver more fluent-sounding but less consistent translations.
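Here is the promised toy sketch of the rule-based idea, assuming a hypothetical three-entry English-Spanish lexicon and a single reordering rule; real systems use vastly larger dictionaries and rule sets, so this is illustrative only.

```python
# Hypothetical mini-lexicon: word -> (translation, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
}

def translate(sentence: str) -> str:
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    out, i = [], 0
    while i < len(tagged):
        # Grammar rule: English ADJ NOUN becomes NOUN ADJ, since Spanish
        # places most adjectives after the noun.
        if i + 1 < len(tagged) and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            out += [tagged[i + 1][0], tagged[i][0]]
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)

print(translate("The red car"))  # -> "el coche rojo"
```

A statistical system would instead learn both the word correspondences and the reordering behaviour from large parallel corpora, rather than having them written by hand.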

Natural Language Generation (NLG)

NLG is the subfield of Artificial Intelligence and Computational Linguistics that focuses on computer systems that can produce understandable texts in a human language. Typically starting from some nonlinguistic representation of information as input, NLG systems use knowledge about language and about the application domain to automatically produce documents, reports, explanations, help messages, and other kinds of texts.
“The goal of natural language generation (NLG) systems is to figure out how to best communicate what a system knows. The trick is figuring out exactly what the system is to say and how it should say it. Unlike NLU (Natural Language Understanding), NLG systems start with a well-controlled and unambiguous picture of the world rather than arbitrary pieces of text. Simple NLG systems can take the ideas they are given and transform them into language. This is what Siri and her sisters use to produce limited responses. The simple mapping of ideas to sentences is adequate for these environments.”
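The “simple mapping of ideas to sentences” mentioned above can be sketched with a template-based generator. The weather-report domain, the data class, and the template below are invented for this illustration and are not from the original text.

```python
from dataclasses import dataclass

@dataclass
class WeatherFacts:
    """A nonlinguistic representation of what the system knows."""
    city: str
    temp_c: int
    condition: str

def realise(facts: WeatherFacts) -> str:
    # Surface realisation: map the structured facts onto a sentence template.
    return f"In {facts.city} it is {facts.condition}, with a temperature of {facts.temp_c} °C."

print(realise(WeatherFacts(city="Athens", temp_c=31, condition="sunny")))
# In Athens it is sunny, with a temperature of 31 °C.
```

Full NLG systems go well beyond fixed templates, planning document structure and choosing words and referring expressions, but the input-to-text mapping shown here is the core idea.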

Named Entity Recognition (NER)

NER (also known as entity identification, entity chunking, or entity extraction) is a significant subtask in the information extraction field, since it allows the identification of proper nouns in open-domain (i.e., unstructured) text. It seeks to locate and classify elements of a text into predefined categories such as the names of persons, organisations, and locations, expressions of time, quantities, monetary values, percentages, etc.
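As an illustration, here is a minimal NER sketch using the spaCy library. The example sentence and the expected labels are assumptions, and the small English model must be installed separately (python -m spacy download en_core_web_sm).

```python
import spacy

# Load spaCy's small English pipeline (installed separately, see above).
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple opened a new office in London for $1 billion on 3 May 2021.")
for ent in doc.ents:
    print(ent.text, ent.label_)

# Expected output, roughly:
# Apple        ORG    (organisation)
# London       GPE    (geopolitical entity, i.e., a location)
# $1 billion   MONEY
# 3 May 2021   DATE
```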