Introduction to Language
Table of Contents
- Introduction to Language
- The Generative View
- Chomsky's Hierarchy of Languages
- Understanding Through Parsing
- Parsing Basics
- Tree Breakdown
- How Inference Works
- The Problem
- A Bayesian Perspective
- The Functional View
- Language as Function Approximation
- Why This Fits Machine Learning
- The Challenge
- The Practical Compromise
Introduction to Language
Before we jump into Natural Language Processing, let's take up the foundational question: What is language?
As Christopher Manning and Hinrich Schütze observe:
"Even practically-minded people have to confront the issue of what prior knowledge to try to build into their model."
Any NLP system rests on assumptions about language's nature. These assumptions shape how we design models, train them, and interpret their results.
Let's explore two contrasting views:
- The Generative View: language as a rule system
- The Functional View: language as communication behaviour
Our Focus: Throughout this series, our focus will be on the functional view rather than the generative view. Why? Because the generative view is too complex to model in practice, and it does not scale.
The Generative View
Abstract: Language as Rule System
This view is heavily influenced by Noam Chomsky. He proposed that human language is generated by a system of rules - similar to formal languages in computer science.
Chomsky's Hierarchy of Languages
In formal language theory, languages are defined by constraints on their generating rules:
Regular Grammar $\rightarrow$ the simplest, with strict constraints:
- A non-terminal can only rewrite to a terminal, a terminal followed by a non-terminal, or the empty string
Context-Free Grammar (CFG) $\rightarrow$ less constrained:
- Rules don't depend on surrounding context
- More expressive than regular grammars
$$\text{Regular languages} \subset \text{Context-free languages}$$
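To make the rule-system idea concrete, here is a minimal sketch in Python using NLTK (assumed installed via `pip install nltk`); the grammar itself is a toy invented for illustration:

```python
# A toy context-free grammar: each rule rewrites one non-terminal,
# regardless of what surrounds it. NLTK is an assumed dependency.
from nltk import CFG

toy_grammar = CFG.fromstring("""
S -> NP VP
NP -> 'our' 'company' | 'workers'
VP -> 'is' VP | 'training' NP
""")

for production in toy_grammar.productions():
    print(production)
```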
Chomsky argued that human language might also be generated by such rules. This leads to questions generative linguistics tries to answer:
- What rules generate sentences?
- How do they combine to produce valid sentences?
- How do rules vary across languages?
Understanding Through Parsing
If we accept that language has underlying rules, then understanding a sentence means inferring the rules that produced it; this process is called parsing.
For example, "Our company is training workers" can be parsed as:
```
(ROOT
  (S
    (NP (PRP$ Our) (NN company))
    (VP (VBZ is)
      (VP (VBG training)
        (NP (NNS workers))))))
```
Once we have this structure, answering "Who is training workers?" becomes straightforward: inspect the subject node → "Our company."
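Here is a sketch of that lookup in Python, loading the bracketed parse above with NLTK (assumed installed) and reading the subject off the tree:

```python
# Load the bracketed parse and extract the subject NP.
from nltk import Tree

parse = Tree.fromstring(
    "(ROOT (S (NP (PRP$ Our) (NN company)) "
    "(VP (VBZ is) (VP (VBG training) (NP (NNS workers))))))"
)

sentence = parse[0]   # ROOT's only child is S
subject = sentence[0]  # the first child of S is the subject NP
print(" ".join(subject.leaves()))  # -> Our company
```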
Parsing Basics
Parsing infers the syntactic hierarchy of a sentence using rules from the generative view (like context-free grammars).
For "Our company is training workers," a parser applies rules step-by-step:
- Identify the root as a Sentence (S).
- Split into Noun Phrase (NP: "Our company") + Verb Phrase (VP: "is training workers").
- Nest the auxiliary "is" under a VP with main verb "training" + object NP ("workers").
This builds a tree showing relationships, like who does what to whom.
Tree Breakdown
```
ROOT
└── S
    ├── NP (subject)
    │   ├── PRP$ (Our)
    │   └── NN (company)
    └── VP (verb phrase)
        ├── VBZ (is)
        └── VP
            ├── VBG (training)
            └── NP (object)
                └── NNS (workers)
```
NP: Noun phrase (things/names).
VP: Verb phrase (actions).
Tags like PRP$ (possessive pronoun) and NN (noun) come from standards like the Penn Treebank tagset.
How Inference Works
- Start with words + part-of-speech tags.
- Apply grammar rules (e.g., S → NP VP; VP → VBZ VP; VP → VBG NP).
- Build upward: Match "Our company" as NP subject; "is training workers" as progressive VP.
- Result: Tree lets you query—subject of "training" is the outer NP ("Our company").
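A sketch of this inference loop using NLTK's chart parser (NLTK assumed installed; the tag PRP$ is renamed PRPS because `$` is not valid in NLTK's grammar rule syntax):

```python
# Rule-based inference: apply the grammar bottom-up over the words.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> PRPS NN | NNS
VP -> VBZ VP | VBG NP
PRPS -> 'Our'
NN -> 'company'
VBZ -> 'is'
VBG -> 'training'
NNS -> 'workers'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("Our company is training workers".split()):
    tree.pretty_print()  # draws the same hierarchy as the tree above
```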
The Problem
Language is inherently ambiguous. The same sentence can have multiple valid parses, and an NLP system must choose the correct interpretation. This is a major reason why NLP is difficult.
Classic Example: "I saw the man with the telescope."
Possible parses:
- I used a telescope to see the man. (Telescope modifies "saw.")
- I saw a man who had a telescope. (Telescope modifies "man.")
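You can watch this ambiguity fall out of a grammar directly. The following sketch (NLTK assumed installed; the grammar is a textbook toy) produces both parses from one rule set:

```python
# One grammar, two valid parses for the same sentence.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'I' | Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I saw the man with the telescope".split()):
    print(tree)  # prints two distinct trees, one per reading
```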
A Bayesian Perspective
The generative view fits naturally into a Bayesian framework:
- $Z$ = hidden generative mechanism (rules and structure)
- $X$ = observed sentence
Generation: $P(X \mid Z)$: given the rules, generate a sentence
Understanding: $P(Z \mid X)$: given a sentence, infer the rules that produced it
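A toy numeric sketch of this inversion, applied to the two telescope parses (all probabilities are invented purely for illustration):

```python
# Toy Bayes inversion over two candidate parses; numbers are made up.
priors = {"instrument_reading": 0.3, "possession_reading": 0.7}       # P(Z)
likelihoods = {"instrument_reading": 0.9, "possession_reading": 0.6}  # P(X|Z)

evidence = sum(priors[z] * likelihoods[z] for z in priors)            # P(X)
posterior = {z: priors[z] * likelihoods[z] / evidence for z in priors}

print(posterior)  # P(Z|X): understanding = picking the likeliest parse
```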
This probabilistic interpretation became foundational in statistical NLP.
The Functional View
Abstract: Language as Communication
Modern NLP takes a different path: instead of treating language as an abstract rule system, it treats language as part of communication.
A natural language is "a system intended to communicate ideas from a speaker to a hearer."
When someone speaks, they're acting on the listener - changing mental states, influencing actions, or conveying information. Language evolved because it was useful for interaction.
Language as Function Approximation
The question arises: what if language is a mathematical function?
The function takes:
- The state of the world
- The speaker's utterance
- The listener's mental state
And outputs:
- A response, action, or no response
Understanding language = learning this function's internal mechanism.
This is exactly how agents are framed in AI: much like reinforcement learning agents map states to actions, a speaker and listener map situations to responses. Modeling this mapping mathematically lets us build systems that "understand" and respond.
Core Function Signature
Think of language processing as: $f(S_w, U_s, M_l) \rightarrow R$
Where:
- $S_w$: World state (context, facts, environment).
- $U_s$: Speaker's utterance (input text/speech).
- $M_l$: Listener's mental state (beliefs, goals, history).
- $R$: Response (action, reply, or silence).
Agents learn this $f$ from massive amounts of data, approximating it with neural networks (e.g., the transformers in LLMs such as GPT).
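Here is a schematic Python sketch of this signature; all the types and the rule inside `respond` are invented placeholders for what a trained model would actually learn:

```python
# Schematic stub of f(S_w, U_s, M_l) -> R; names and logic are placeholders.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class WorldState:      # S_w: context, facts, environment
    facts: dict

@dataclass
class MentalState:     # M_l: beliefs, goals, history
    beliefs: dict
    history: list = field(default_factory=list)

def respond(world: WorldState, utterance: str, listener: MentalState) -> Optional[str]:
    """f(S_w, U_s, M_l) -> R: a reply, an action, or silence (None)."""
    listener.history.append(utterance)
    if utterance.endswith("?"):
        return world.facts.get(utterance, "I don't know.")
    return None  # staying silent is also a valid value of R

world = WorldState(facts={"Who is training workers?": "Our company"})
print(respond(world, "Who is training workers?", MentalState(beliefs={})))
```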
Why This Fits Machine Learning
Machine learning is fundamentally about function approximation:
- Classification → maps inputs to labels
- Clustering → groups similar inputs
- Language understanding → maps conversations to responses
We observe many communication examples and learn the function that maps inputs to outputs.
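As a concrete (if tiny) illustration of learning one such mapping from examples, here is a sketch with scikit-learn (assumed installed; the training data is made up):

```python
# Approximating one small input -> output mapping from labeled examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie", "loved it", "terrible film", "waste of time"]
labels = ["pos", "pos", "neg", "neg"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)                     # learn the mapping
print(model.predict(["what a great film"]))  # -> ['pos']
```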
The Challenge
This view exposes a major problem: the language function depends on the entire environment, speaker's mental state, and listener's mental state. Modeling language accurately might require simulating the entire world.
The Practical Compromise
Instead of approximating the full language function, we approximate smaller sub-functions:
| Task | What it captures |
|---|---|
| Machine translation | Mapping an utterance in one language to an equivalent one in another |
| Sentiment analysis | Mapping text to the speaker's attitude |
| Question answering | Mapping a question plus context to an answer |
| Summarization | Mapping a long text to a shorter one that keeps the key content |
| Dialogue systems | Mapping conversation history to the next response |
Modern NLP solves these specific tasks using machine learning and deep learning.