Introduction to Language


Before we jump into Natural Language Processing, let's take up the foundational question: What is language?

As Christopher Manning and Hinrich Schütze observe:

"Even practically-minded people have to confront the issue of what prior knowledge to try to build into their model."

Any NLP system rests on assumptions about language's nature. These assumptions shape how we design models, train them, and interpret their results.

Let's explore two contrasting views:

  1. The Generative View: language as a rule system
  2. The Functional View: language as communication behaviour

Our Focus: Throughout the series, our focus will be on the functional view rather than the generative view. Why? Because hand-crafted rule systems of the generative view are complex to build and maintain, and they do not scale to real-world language.


The Generative View

Abstract: Language as Rule System

This view is heavily influenced by Noam Chomsky. He proposed that human language is generated by a system of rules - similar to formal languages in computer science.

Chomsky's Hierarchy of Languages

In formal language theory, languages are defined by constraints on their generating rules:

Regular Grammar $\rightarrow$ the simplest, with strict constraints:

  • A non-terminal can expand only to a terminal, a terminal followed by a non-terminal, or the empty string

Context-Free Grammar (CFG) $\rightarrow$ less constrained:

  • Rules don't depend on surrounding context
  • More expressive than regular grammars

$$\text{Regular languages} \subset \text{Context-free languages}$$
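To make the containment concrete: the language $\{a^n b^n : n \geq 0\}$ is context-free (generated by the rule S → 'a' S 'b' | ε) but provably not regular. A minimal from-scratch recognizer sketch:

```python
# A minimal recursive recognizer for { a^n b^n : n >= 0 }, the classic
# context-free-but-not-regular language. It directly mirrors the rule
# S -> 'a' S 'b' | empty.
def matches_anbn(s: str) -> bool:
    """Return True iff s is n 'a's followed by n 'b's."""
    if s == "":
        return True  # rule S -> empty
    # Rule S -> 'a' S 'b': peel one 'a' off the front, one 'b' off the end.
    if s[0] == "a" and s[-1] == "b":
        return matches_anbn(s[1:-1])
    return False

print(matches_anbn("aaabbb"))  # True
print(matches_anbn("aab"))     # False
```

No finite-state machine can do this check, because matching the counts requires unbounded memory.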

Chomsky argued that human language might also be generated by such rules. This leads to questions generative linguistics tries to answer:

  • What rules generate sentences?
  • How do they combine to produce valid sentences?
  • How do rules vary across languages?

Understanding Through Parsing

If we accept that language has underlying rules, then understanding a sentence means inferring the rules that produced it, a process called parsing.

For example, "Our company is training workers" can be parsed as:

(ROOT
  (S
    (NP (PRP$ Our) (NN company))
    (VP (VBZ is)
        (VP (VBG training)
            (NP (NNS workers))))))

Once we have this structure, answering "Who is training workers?" becomes straightforward: inspect the subject node → "Our company."
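One way to see this concretely: represent the bracketed parse as nested tuples (no parser library assumed) and read the subject's leaves off the tree:

```python
# The parse above, written as nested tuples (label, children...),
# so no external tree library is needed.
parse = ("ROOT",
         ("S",
          ("NP", ("PRP$", "Our"), ("NN", "company")),
          ("VP", ("VBZ", "is"),
                 ("VP", ("VBG", "training"),
                        ("NP", ("NNS", "workers"))))))

def leaves(node):
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]

s = parse[1]     # the S node under ROOT
subject = s[1]   # its first child: the subject NP
print(" ".join(leaves(subject)))  # Our company
```

Once the tree exists, "who is training workers?" reduces to a two-step index into the structure.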

Parsing Basics

Parsing infers the syntactic hierarchy of a sentence using rules from the generative view (like context-free grammars).

For "Our company is training workers," a parser applies rules step-by-step:

  • Identify the root as a Sentence (S).
  • Split into Noun Phrase (NP: "Our company") + Verb Phrase (VP: "is training workers").
  • Nest the auxiliary "is" under a VP with main verb "training" + object NP ("workers").

This builds a tree showing relationships, like who does what to whom.

Tree Breakdown

ROOT
└── S
    ├── NP (subject)
    │   ├── PRP$ (Our)
    │   └── NN (company)
    └── VP (verb phrase)
        ├── VBZ (is)
        └── VP
            ├── VBG (training)
            └── NP (object)
                └── NNS (workers)

NP: Noun phrase (things/names).

VP: Verb phrase (actions).

Tags like PRP$ (possessive pronoun), NN (noun) come from standards like Penn Treebank.

How Inference Works

  1. Start with words + part-of-speech tags.
  2. Apply grammar rules (e.g., S → NP VP; VP → VBZ VP; VP → VBG NP).
  3. Build upward: Match "Our company" as NP subject; "is training workers" as progressive VP.
  4. Result: Tree lets you query—subject of "training" is the outer NP ("Our company").
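The steps above can be sketched as a small CYK chart recognizer, written from scratch for a toy version of the grammar (illustrative assumptions: PRPS stands in for PRP$, and the NNS tag is folded into the rule NP → 'workers' to keep every rule binary or lexical):

```python
# Toy CYK recognizer mirroring S -> NP VP; VP -> VBZ VP; VP -> VBG NP.
binary = {("NP", "VP"): "S", ("PRPS", "NN"): "NP",
          ("VBZ", "VP"): "VP", ("VBG", "NP"): "VP"}
lexical = {"Our": "PRPS", "company": "NN", "is": "VBZ",
           "training": "VBG", "workers": "NP"}

def parses_as_sentence(words):
    n = len(words)
    # chart[i][j] = set of nonterminals that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):               # step 1: POS tags
        chart[i][i + 1].add(lexical[w])
    for span in range(2, n + 1):                # steps 2-3: build upward
        for i in range(n - span + 1):
            for k in range(i + 1, i + span):
                for left in chart[i][k]:
                    for right in chart[k][i + span]:
                        if (left, right) in binary:
                            chart[i][i + span].add(binary[(left, right)])
    return "S" in chart[0][n]                   # step 4: did we reach S?

print(parses_as_sentence("Our company is training workers".split()))  # True
```

Real parsers use broader grammars and probabilities, but the bottom-up chart-filling logic is the same.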

The Problem

Language is inherently ambiguous. The same sentence can have multiple valid parses, and an NLP system must choose the correct interpretation. This is a major reason why NLP is difficult.

Classic Example: "I saw the man with the telescope."

Possible parses:

  • I used a telescope to see the man. (Telescope modifies "saw.")
  • I saw a man who had a telescope. (Telescope modifies "man.")
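A chart parser makes the ambiguity countable. The following from-scratch CYK sketch (toy grammar, made up for this one sentence) counts the derivations; the prepositional phrase "with the telescope" can attach to either the VP or the NP:

```python
from collections import defaultdict

# Toy grammar: VP -> VP PP gives the "telescope modifies saw" reading,
# NP -> NP PP gives the "telescope modifies man" reading.
binary = {("NP", "VP"): ["S"], ("V", "NP"): ["VP"], ("VP", "PP"): ["VP"],
          ("Det", "N"): ["NP"], ("NP", "PP"): ["NP"], ("P", "NP"): ["PP"]}
lexical = {"I": ["NP"], "saw": ["V"], "the": ["Det"],
           "man": ["N"], "telescope": ["N"], "with": ["P"]}

def count_parses(words):
    n = len(words)
    # chart[i][j][A] = number of distinct derivations of words[i:j] from A
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for tag in lexical[w]:
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for k in range(i + 1, i + span):
                for l, lc in chart[i][k].items():
                    for r, rc in chart[k][i + span].items():
                        for parent in binary.get((l, r), []):
                            chart[i][i + span][parent] += lc * rc
    return chart[0][n]["S"]

print(count_parses("I saw the man with the telescope".split()))  # 2
```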

A Bayesian Perspective

The generative view fits naturally into a Bayesian framework:

  • Z = hidden generative mechanism (rules and structure)
  • X = observed sentence

  • Generation: P(X | Z): given the rules, generate a sentence
  • Understanding: P(Z | X): given a sentence, infer the rules that produced it

This probabilistic interpretation became foundational in statistical NLP.
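A toy illustration with made-up numbers: take two candidate parse structures Z for an ambiguous sentence X, a uniform prior P(Z), and an assumed likelihood P(X | Z); Bayes' rule then scores the interpretations.

```python
# Toy Bayesian disambiguation (numbers are invented for illustration):
# two candidate attachments Z for an observed sentence X.
prior = {"attach_to_verb": 0.5, "attach_to_noun": 0.5}        # P(Z)
likelihood = {"attach_to_verb": 0.06, "attach_to_noun": 0.02}  # P(X|Z)

evidence = sum(prior[z] * likelihood[z] for z in prior)        # P(X)
posterior = {z: prior[z] * likelihood[z] / evidence for z in prior}

print(posterior)  # attach_to_verb: 0.75, attach_to_noun: 0.25
```

Statistical parsers do exactly this at scale: score every candidate structure and return the most probable one.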


The Functional View

Abstract: Language as Communication

Modern NLP takes a different path: rather than treating language as an abstract rule system, it treats language as part of communication.

A natural language is "a system intended to communicate ideas from a speaker to a hearer."

When someone speaks, they're acting on the listener - changing mental states, influencing actions, or conveying information. Language evolved because it was useful for interaction.

Language as Function Approximation

This raises a question: what if language is a mathematical function?

The function takes:

  • The state of the world
  • The speaker's utterance
  • The listener's mental state

And outputs:

  • A response, action, or no response

Understanding language = learning this function's internal mechanism.

This mirrors how agents are framed in AI: much like reinforcement learning agents map states to actions, we model language mathematically to build systems that "understand" and respond.

Core Function Signature
Think of language processing as: $f(S_w, U_s, M_l) \rightarrow R$
Where:

  • $S_w$: World state (context, facts, environment).
  • $U_s$: Speaker's utterance (input text/speech).
  • $M_l$: Listener's mental state (beliefs, goals, history).
  • $R$: Response (action, reply, or silence).

Agents learn this $f$ via massive data, approximating it with neural nets (e.g., transformers in LLMs like GPTs).
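The signature can be sketched as code. Everything below is hypothetical (the types, the fact-lookup logic, and the example facts are all invented for illustration); a real $f$ would be a learned model, not a dictionary lookup:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical types sketching f(S_w, U_s, M_l) -> R;
# none of these names come from a real library.
@dataclass
class WorldState:
    facts: dict          # S_w: context, facts, environment

@dataclass
class MentalState:
    beliefs: dict        # M_l: listener's beliefs
    goals: list          # ... and goals

def respond(world: WorldState, utterance: str,
            listener: MentalState) -> Optional[str]:
    """Stand-in for the language function f: a reply, or None for silence."""
    query = utterance.rstrip("?")
    if query in world.facts:
        return str(world.facts[query])
    return None  # R may be "no response"

world = WorldState(facts={"capital of France": "Paris"})
listener = MentalState(beliefs={}, goals=["answer questions"])
print(respond(world, "capital of France?", listener))  # Paris
```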

Why This Fits Machine Learning

Machine learning is fundamentally about function approximation:

  • Classification → maps inputs to labels
  • Clustering → groups similar inputs
  • Language understanding → maps conversations to responses

We observe many communication examples and learn the function that maps inputs to outputs.

The Challenge

This view exposes a major problem: the language function depends on the entire environment, speaker's mental state, and listener's mental state. Modeling language accurately might require simulating the entire world.

The Practical Compromise

Instead of approximating the full language function, we approximate smaller sub-functions:

  • Machine Translation: text in one language → text in another
  • Language Modeling: predicting the next word
  • Question Answering: generating answers from context
  • Image Captioning: visual scenes → descriptions

Modern NLP solves these specific tasks using machine learning and deep learning.
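For instance, the language modeling sub-function (predict the next word) can be approximated even by a tiny bigram model; the corpus below is a toy, made up to keep the sketch self-contained:

```python
from collections import Counter, defaultdict

# A tiny bigram language model: count which word follows which,
# then predict the most frequent successor.
corpus = ("our company is training workers . "
          "our company is hiring workers .").split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Most likely next word under the bigram counts."""
    return counts[word].most_common(1)[0][0]

print(predict_next("our"))      # company
print(predict_next("company"))  # is
```

Large language models approximate the same conditional distribution, with a neural network over long contexts instead of a count table over single words.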
