Syntax Patterns in Language
Generative Grammar (Ch.1)
Syntax: the study of phrase and sentence structure
the scientific study of sentence structure. Specifically, what do native speakers know about the
structure of their language and how is this knowledge organized in the mind?
the study of how the organization of sounds and words becomes meaning
lies between morphology and semantics: often, the lines between syntax and semantics are blurred
As syntacticians (scientists of syntax)…
• What are we studying?
• How are we studying it?
• Why are we studying it?
What are we studying?
• e-Language vs. i-Language
• Prescriptive vs. Descriptive
• Learned vs. Acquired
• Performance vs. Competence
i-Language (Capital ‘L’) vs. e-Language (Lowercase ‘l’)
Both terms “i-” and “e-” language come from Chomsky (i- = internal, e- = external)
i-Language (Capital ‘L’)
Human Language Capacity (the human ability to speak a language)
all humans are capable of acquiring any language they are exposed to
Eg. a child can acquire the complex sounds of an African language just as easily as a language with a
rolling ‘r’: we are all capable of producing these sounds from a young age
what we study as syntacticians
e-Language (Lowercase ‘l’)
the languages we speak
Eg. English, Spanish, etc.
Prescriptive vs. Descriptive
• Prescriptive grammars/rules: prescribe how we should speak (according to some ‘language authority’)
• Descriptive grammars/rules: describe how we actually speak.
We will focus on descriptive grammar.
Another way to look at it…
In sports and games
• Prescriptive rules: if we break them, we play “bad”
• If I make an off-side play in soccer, I break a prescriptive rule.
• Descriptive rules: if we break them, we are not playing the game.
• If I try to push the ball along the ground with my nose, I am not really ‘playing’
• Breaks a descriptive rule:
– *Went home.
– *John loves herself.
– *John and Mike like himself.
• Breaks a prescriptive rule:
– Its continuing mission […] To boldly go where no-one has gone before. (Split infinitive)
– John and me are going to the mall.
* marks a syntactically ill-formed sentence (i.e. descriptively ungrammatical)
# marks a semantically anomalous sentence
Learned vs. Acquired
Learning involves conscious gaining of knowledge
Acquisition involves subconscious gaining of knowledge
Language (native speaker ability) is acquired.
In this course, we will focus on the kind of linguistic knowledge that is acquired by native speakers.
(Of course, when you consciously try to learn a second language, especially when you’re older, you are
trying to learn it, rather than acquire it, because your attempts are conscious).
In this course, we are more interested in acquisition
Performance vs. Competence
• Performance: what we actually produce.
• Competence: what we know about language.
For example, our performance is affected by stuttering, nervousness, speech impairment, etc.
Also, a verbatim transcript of what a speaker says reflects performance, since how we speak is not
always how we write (eg. when we speak, we say “um”, we pause, we retract what we are saying, etc.)
Sentences that highlight this distinction:
– The horse raced past the barn fell. (Garden path)
– The old man the boat. (Garden path)
– Who did Bill say Frank claimed Mary seems to have been likely to have kissed?
– Cheese mice love stinks. (Centre-embedding)
Generative grammar focuses on competence.
How are we studying it?
• Syntax: Humanities or Science?
• The Scientific Method
– Hypotheses
– Falsifiability
• Grammar/Rules
• Data
– Corpora
– Grammaticality Judgments
Syntax: Humanities or Science?
• When you study language from a Humanities perspective, what do you study?
– Themes, motifs, metaphors; you write essays, etc.
• When you study language from a Science perspective, how do you study it?
– Using scientific methodology.
“What are the scientific characteristics that make the
modern approach to language study what it is?”
– (Crystal 1971:77, as cited in Haegeman 2006:4)
• Haegeman (2006:5) identifies the following themes from several definitions of science:
– Knowledge
– Pursuit, hunt, search, seek
– Explanation
– Laws of nature, natural laws, general truths, law
– Order, regularity, systematic
• So, science is an activity aimed at achieving knowledge (Haegeman).
The Scientific Method
Against More Data
Simply collecting more data is not enough: without a theory to start with, we don't know what we are
looking for in the data.
• “In linguistics, as in other sciences, there is an essential interaction between data analysis and […]”
– (Dik 1989:33, as cited in Haegeman 2006:15)
What makes a good hypothesis?
A hypothesis must make predictions, i.e. it must be falsifiable.
• To see whether or not it is correct, you look for evidence that will disprove the hypothesis.
• If you find such evidence, you must revise the hypothesis.
NOTE: Be prepared: In this course we will revise our hypotheses often. To succeed in this course, you
will need to understand why (and how) we move from one hypothesis to another.
Applying the scientific method to language:
“Manchester’s morning rush-hour traffic was brought to a near standstill yesterday as 150 black
cab drivers staged a go-slow protest calculated to cause maximum disruption to commuters.”
(Guardian, 14.9.2000, as cited in Haegeman 2006:7)
The phrase “150 black cab drivers” is ambiguous:
• Either the cabs are black, or the drivers are (ethnically) black
Let’s say we want to find out why this string of words is ambiguous.
No.  Example                    Ambiguous?
(1)  150 black cab drivers      Yes
(2)  150 black taxi drivers     Yes
(3)  black cab drivers          Yes
(4)  a black cab driver         Yes
(5)  150 drivers of black cabs  No
• We start with an attested example (the sentence from the previous slide).
• We then test for different causes of the ambiguity. In each of the test phrases in the above table, we
have changed one thing, keeping the others the same. (We experiment with the data, relying on our
native speaker intuitions about whether ambiguity is present or not.)
• Each example in the above table represents a different hypothesis about the source of the ambiguity.
• Note that each of our hypotheses is falsifiable, and our hypotheses in (2) – (4) are in fact falsified
(proven to be wrong: the changes we made in (2) – (4) do not remove the ambiguity, so they are not
the cause of the ambiguity).
• Note that in each of our hypotheses, we only change one thing in the phrase at a time: our
‘experimentation’ is controlled, orderly, and systematic.
• We want to go further to identify a general law or pattern…
• We’re not just satisfied with a list of data—sentences that make the phrase ambiguous or not. We
want to know why it is ambiguous in (1) – (4) but not in (5).
• We can hypothesize that there are two structures involved in all the ambiguous cases, and not in the
unambiguous one (5):
– (a) [150 [[black cab] drivers]]
– (b) [150 [black [cab drivers]]]
• This ambiguity can be represented in brackets or in trees.
• We can hypothesize that the ambiguity arises because, when we have the sequence adjective – noun –
noun, there are two possibilities for the relations between these words:
Adjective modifies only the first noun: – [[adjective noun] noun]
Adjective modifies the sequence of two nouns: – [adjective [noun noun]]
• We would thus make a prediction that other sequences of adjective – noun – noun will demonstrate
the same ambiguity
In the phrase “drivers of black cabs”, this structural configuration does not obtain
(is not present).
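A short Python sketch can make the two-structure hypothesis concrete. It assumes that words combine strictly in pairs (binary branching), which is an assumption of the sketch rather than something established in these notes; the function names `bracketings` and `show` are hypothetical. It lists every way of grouping an ordered sequence of words into nested pairs, and a three-word sequence such as black cab drivers gets exactly the two groupings in (a) and (b).

```python
from typing import List, Union

# A tree is either a single word (str) or a (left, right) pair of subtrees.
Tree = Union[str, tuple]


def bracketings(words: List[str]) -> List[Tree]:
    """Return every binary-branching grouping of the words, keeping their order."""
    if len(words) == 1:
        return [words[0]]
    trees = []
    # Split the sequence at each possible point and combine the sub-groupings.
    for i in range(1, len(words)):
        for left in bracketings(words[:i]):
            for right in bracketings(words[i:]):
                trees.append((left, right))
    return trees


def show(tree: Tree) -> str:
    """Render a tree in the bracket notation used in these notes."""
    if isinstance(tree, str):
        return tree
    return "[" + show(tree[0]) + " " + show(tree[1]) + "]"


for t in bracketings(["black", "cab", "drivers"]):
    print(show(t))
# [black [cab drivers]]
# [[black cab] drivers]
```

The sketch only counts the groupings that word order leaves open; it says nothing about which of them the grammar actually allows, which is what the hypothesis about adjective – noun – noun sequences is meant to capture. (Longer strings have more groupings: four words already have five.)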
We can test this prediction with other (in this case, constructed) examples:
No.   Example                         Adj – N – N sequence?   Ambiguous?
(6)   A French art student            Yes                     Yes
(7)   A trendy furniture designer     Yes                     Yes
(8)   A trendy designer of furniture  No                      No
(9)   A designer of trendy furniture  No                      No
(10)  The Apple computer bag          Yes                     Yes
(11)  A computer bag from/by Apple    No                      No
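The prediction can be checked mechanically against the table. In this sketch the category tags (D, Adj, N, P) are assigned by hand, which is an assumption (following the table, Apple in (10) is treated as filling the adjective slot), and the ambiguity values are the judgments recorded in the table. A hypothesis is falsified by any example where the prediction and the judgment disagree; the names `EXAMPLES`, `has_adj_n_n`, and `mismatches` are hypothetical.

```python
# Hand-assigned tags (an assumption of this sketch) paired with the
# ambiguity judgments recorded in the table for examples (6)-(11).
EXAMPLES = {
    "(6) A French art student": (["D", "Adj", "N", "N"], True),
    "(7) A trendy furniture designer": (["D", "Adj", "N", "N"], True),
    "(8) A trendy designer of furniture": (["D", "Adj", "N", "P", "N"], False),
    "(9) A designer of trendy furniture": (["D", "N", "P", "Adj", "N"], False),
    "(10) The Apple computer bag": (["D", "Adj", "N", "N"], True),
    "(11) A computer bag from/by Apple": (["D", "N", "N", "P", "N"], False),
}


def has_adj_n_n(tags):
    """Return True if the tags contain the contiguous sequence Adj N N."""
    return any(tags[i:i + 3] == ["Adj", "N", "N"] for i in range(len(tags) - 2))


def mismatches(examples):
    """Return the examples where predicted and judged ambiguity differ."""
    return [label for label, (tags, ambiguous) in examples.items()
            if has_adj_n_n(tags) != ambiguous]


print(mismatches(EXAMPLES))  # → []
```

An empty list means none of these examples falsifies the hypothesis; it does not prove the hypothesis, since a new example could still turn up a mismatch.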
• In the previous example, we tried to come up with a hypothesis for why a given string was ambiguous.
• But more often in this course, we will be trying to come up with hypotheses about why a given
sentence is grammatical or ungrammatical.
• Sometimes we will ask whether something is grammatical or ungrammatical with a particular
interpretation.
• We mark ungrammatical sentences with an asterisk (*).
• We will try to come up with a theory of why the grammatical sentences are grammatical and the
ungrammatical ones ungrammatical.
What is a Theory?
a “network of hypotheses”. -Haegeman (2006)
o This means that hypothesis and theory are not the same thing: a theory is a “network of hypotheses”.
o Therefore, if a hypothesis is wrong, don’t throw out the whole theory, since the two are not the same
thing; revise the hypothesis instead.
• Grammar: a group of rules
• In this course we will often use rules to represent our hypotheses.
• (Remember we’re talking about Descriptive Grammar, which describes how people
actually speak, not Prescriptive Grammar, which attempts to tell people how they should speak.)
Generative grammar: a group of rules that generate the grammatical sentences of a language (and none
of the ungrammatical ones).
Our generative grammar is a model of what we know and how we learned it.
(This is a subtle but important point).
• Using rules we will generate the sentences of the languages we examine.
• Our goal is to be able to develop a grammar that will generate all the grammatical sentences in a
language and none of the ungrammatical ones.
• (Note that we’re not using “generate” to mean the way in which speakers actually produce sentences
while speaking. Instead, we’re interested in developing a model of what they know and how they
learned it: remember the performance vs. competence distinction).
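A generative grammar in this sense can be sketched as a set of rewrite rules plus a procedure that lists every string the rules allow. The rules and words below are hypothetical and much simpler than the phrase structure rules introduced in Ch.3; `RULES`, `generate`, and `LANGUAGE` are names chosen for this sketch. As above, "generate" here means enumerating what the model allows, not describing how a speaker produces a sentence.

```python
from itertools import product

# A toy grammar: each category rewrites as one of several sequences.
# These rules and words are hypothetical, chosen only for illustration.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "D": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def generate(symbol):
    """Return every word string (as a tuple) that `symbol` can be rewritten as.

    This terminates only because these toy rules are not recursive.
    """
    if symbol not in RULES:  # a word: nothing left to rewrite
        return [(symbol,)]
    results = []
    for expansion in RULES[symbol]:
        # Expand each daughter, then combine every choice of daughter strings.
        for parts in product(*(generate(s) for s in expansion)):
            results.append(tuple(word for part in parts for word in part))
    return results


LANGUAGE = {" ".join(words) for words in generate("S")}
print("the dog sees the cat" in LANGUAGE)  # → True
print("dog the sees" in LANGUAGE)          # → False
```

These rules also generate *the dog sleeps the cat, because the VP rule lets any verb take an object. By the standard above (all the grammatical sentences and none of the ungrammatical ones), this toy grammar overgenerates, so the hypothesis encoded in that rule would need to be revised.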
Where do we get our data?
-corpora, grammaticality judgements, experimental work, etc.
Corpora: collections of attested language; one of our sources of data
Ex. Newspapers, books, magazines, Internet, telephone recordings, recorded real-world speech
Corpora are one source of data in syntax, BUT they have limitations:
Eg. a corpus doesn’t tell us that a sentence is ambiguous; we have to use our own abilities to
determine that it is
• Corpora are not sufficient
– Don’t contain negative evidence (i.e. what’s not grammatical)
– Very limited in scope, given that the number of possible sentences in a language is virtually infinite.
– In our previous exercise (black cab drivers) we changed the original attested sentence to test various
hypotheses, and we established whether the ambiguity was still present based on our own intuitions as
native speakers.
NOTE: Ungrammatical sentences generally are not found in corpora, and those that do occur are mostly
seen as the result of performance rather than competence.
Eg. *Where do you wonder if he lives?
Will any corpus tell us that this sentence is ungrammatical? Have you ever heard anyone utter
this sentence? How do you know it is ungrammatical?
– Can corpus data falsify a hypothesis?
– Every day, we produce sentences that have never been uttered before.
• We need mental knowledge (aka competence):
Grammaticality judgments (aka acceptability judgments)
• Some linguists use more formal/statistical ways of getting judgments:
– Likert scale: rate each sentence's (un)grammaticality on a fixed scale (e.g. from 1 to 10)
– Magnitude estimation: how many times better/worse is this sentence than another (reference) sentence?
– A larger number of participants giving judgments so that statistical measures can be applied.
• We will apply acceptability judgments in this class non-statistically. For the most part this will
give us the right results. (Statistical validation of judgments is possible, though.)
The Empirical Validity of Judgments
• Sprouse, Schütze, & Almeida (2013:1):
– Studied a random sample of 300 grammaticality judgments from a prominent linguistics journal.
– Tested those sample sentences using 3 “formal” judgment tasks (magnitude estimation, Likert scale,
forced-choice task) and ran statistics on the results.
– “The results suggest a convergence rate of 95% between informal and formal methods, […] The high
convergence rate suggests both methods are substantially valid.”
Formal vs. informal judgments:
– Formal: use the opinions of a lot of people
– Informal: use the opinion of a random person/colleague
– Both are okay for most phenomena (95% convergence rate), but not all
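The "convergence rate" can be illustrated as the proportion of items on which two methods return the same verdict. This is a deliberate simplification, assuming each judgment is reduced to acceptable/unacceptable; Sprouse, Schütze, & Almeida compared the methods with statistical tests, which this sketch does not reproduce. The function name `convergence_rate` and the sentence IDs and judgments in the usage example are hypothetical, not data from the study.

```python
def convergence_rate(informal, formal):
    """Proportion of items on which two sets of judgments agree.

    Each argument maps a sentence ID to True (acceptable) or False
    (unacceptable); only IDs judged under both methods are compared.
    """
    shared = informal.keys() & formal.keys()
    if not shared:
        raise ValueError("no sentences were judged under both methods")
    agreements = sum(informal[s] == formal[s] for s in shared)
    return agreements / len(shared)


# Hypothetical judgments for four made-up sentence IDs.
informal = {"s1": True, "s2": False, "s3": True, "s4": True}
formal = {"s1": True, "s2": False, "s3": True, "s4": False}
print(convergence_rate(informal, formal))  # → 0.75
```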
Syntactic vs. Semantic Judgements
Syntactic judgements (structure): the structure of the sentence makes no sense
Ex. *Where do you wonder if he lives?
Semantic judgments (meaning): the structure makes sense, but the meaning doesn’t
Ex. #Colourless green ideas sleep furiously
#My toothbrush is pregnant.
• (We’re mostly interested in Syntactic judgments.)
o Judgments of interpretation:
– Are there multiple interpretations for a given sentence (ambiguity)?
Eg. The man saw the fish with the binoculars.
o Judgments of coreference:
– Is a particular sentence grammatical with a particular interpretation?
Eg. John likes him.
– Is the above sentence grammatical/acceptable if John and him are the same person?
Why are we studying it?
• What big questions about language will it help us answer?
• We want to know more about the structure of language partly because we have bigger questions
about language in mind….
– How do we acquire language?
– What parts of Language are innate?
– How do we explain language universals and language variation?
Levels of adequacy (for a grammar):
• Observationally Adequate:
– Accounts for all the observed (corpus/performance) data.
• Descriptively Adequate:
– Accounts for all the observed data and all grammaticality judgments (competence).
• Explanatorily Adequate:
– Accounts for all the observed data and grammaticality judgments, and also explains how speakers
acquire their language.
In practice, we get at most to descriptive adequacy, and rarely, if ever, to explanatory adequacy.
How do we acquire languages?
• Obviously this question is too big to answer here, but …
• Are we instructed by our parents?
• Do we mimic our parents?
Neither can be the whole story, because:
1) Language is infinite: We produce sentences we’ve never heard before
2) We know things about our language that we’ve never been exposed to.
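Point 1) can be illustrated with a single recursive rule. The sketch below assumes a hypothetical rule of the form S → NP said that S (not a rule from these notes) and applies it repeatedly; each extra application yields a new, longer sentence, so a finite set of rules can describe an unbounded set of sentences. The function `embed` and its default sentence are illustrative.

```python
def embed(depth: int, base: str = "Mary kissed Frank") -> str:
    """Apply the hypothetical rule S -> NP said that S `depth` times to `base`."""
    subjects = ["Bill", "Frank", "Mary"]  # cycled to vary the embedding subject
    sentence = base
    for i in range(depth):
        sentence = f"{subjects[i % len(subjects)]} said that {sentence}"
    return sentence


for n in range(3):
    print(embed(n))
# Mary kissed Frank
# Bill said that Mary kissed Frank
# Frank said that Bill said that Mary kissed Frank
```

Nothing in the rule sets an upper limit on depth; very deep embeddings do become hard to process, but that is a performance limit rather than a competence one.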
Language as an Instinct
• You know things about your language that you’ve never been taught:
– Whom did you think ___Shawn called?
– Whom did you think that Shawn called?
– Who did you think ___called Bill?
– *Who did you think that called Bill?
• Despite what they may think, parents rarely teach their children to speak!
Noam Chomsky: the ability of humans to use language is innate (an instinct).
• We are programmed to learn language.
• Most linguists assume an innate component to Language, but the extent of that component is up for
(often considerable) debate.
• Innateness & Universal Grammar (UG)
– Certain rules/principles/constraints are built into our genetic code.
– Properties common to all human languages are likely part of UG.
• Language Universals & Variation
– How do we account for cross-linguistic variation?
Principles: innate principles that govern sentence structure; the building blocks that all languages
use to construct their sentences
Parameters: the different ways in which languages implement these innate principles
• All languages use the same basic hardwired tools. It is the particular implementation
of these tools that varies between languages.
• Notice this gives us a very explanatory theory: if most of language is innate and the rest is a matter
of parameters, then why languages are the way they are (and why they are relatively easy to learn) is
explained.
Other evidence for UG:
• Human specificity of language
• Distinct areas of the brain
• Cross-linguistic similarities in language acquisition (even despite cultural differences)
• Lack of overt instruction
• Language universals
What is Grammar?
Parts of Speech (Ch.2)
Syntactic Categories and Parts of Speech
– Semantic vs. Distributional criteria
– Major classes (Subcategorizations)
– Open vs. Closed
– Lexical vs. Functional
‘Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.
(Lewis Carroll, Through the Looking-Glass, and What Alice Found There, 1872. Carnie p. 65)
What’s in a “Noun”? (Or a “verb”, etc…)
• “If it walks like a duck and quacks like a duck, it’s probably a duck.”
What does this have to do with syntax?
• If it acts like a Noun (Verb, Adj., etc.), then it probably is a Noun (Verb, Adj., etc.).
Eg. We know “wabe” is a noun because it follows a determiner (the wabe): it acts like a noun.
How do we tell how a noun (or a verb, etc…) “acts”? With Distributional Criteria.
• We define syntactic categories using Distributional Criteria:
Syntactic Categories (i.e. Parts of Speech) (Ch.2)
• In elementary school:
– Nouns: person, place, or thing
– Verbs: action, occurrence, state of being
– Adjective: a modifier expressing quality, quantity, or extent
– Adverbs: a modifier expressing manner, quality, degree, etc.
– Preposition: location or origin
These are semantic definitions of parts of speech.
Problems with semantic definitions of Parts of Speech:
1.) Not so clear-cut:
Ex. The assassination of the president.
Ex. Honesty is an important quality.
2.) Multiple parts of speech
Ex. We work at the factory.
Ex. This work is not fun.
Ex. She bought work clothes.
3.) Nonsense words seem to have parts of speech:
– The yinkish dripner blorked quastofically into the nindin with the pidibs.
NOTE: if the unacceptability is due to the semantics we say the sentence is ‘infelicitous.’
if the unacceptability is due to the syntax we say the sentence is ‘ungrammatical.’
• Instead, we can determine a word’s Part of Speech by morphological and syntactic criteria (how it
“acts” in a sentence)
• These are called Distributional Criteria (for nounhood, verbhood, etc...) because they focus on the
distribution of these parts of speech.
– Morphological: what affixes attach to it.
– Syntactic: the syntactic context—where it can appear in a sentence in relation to other components.
• These distributional criteria are language-specific.
Nouns
• Derivational Suffixes:
– -ment, -ness, -ity, -ty, -(t)ion, -ation, -ist, -ery, -ee, -ship, -aire, -acy, -let, -ling, -hood, -ism, -ing
• Inflectional Suffixes
– (dog)-s, (wish)-es, (ox)-en, (child)-ren, (syllab)-i, etc.
• Syntactic Distribution
– After determiners such as the, those, these (e.g. these peanuts)
– Can appear after adjectives (the big peanut)
– Follow prepositions (in school)
– Subject or direct object of a sentence
– Negated by no
Verbs
• Derivational Suffixes:
– -ate, -ize/-ise
• Inflectional Suffixes:
– Past tense: (pour)-ed or (spel)-t
– Present tense third person singular: (think)-s
– Progressive: -ing, Perfective: -en, Passive: -ed & -en
• Syntactic Distribution
– Follow auxiliaries, modals, and the special infinitive marker to
– Can follow subjects
– Can follow adverbs like often and frequently
– Can be negated with not (as opposed to no and un-)
Adjectives
• Derivational Suffixes:
– -ing, -ive, -able, -al, -ate, -ish, -some, -(i)an, -ful, -ly
• Inflectional Suffixes:
– Comparative form -er (or follow more)
– Superlative form -est (or follow most)
– Negated with the prefix un-
• Syntactic distribution:
– Between determiners and nouns
– Can also follow auxiliary am/is/are/was/were/be/been/being (but this overlaps with verbs)
– Adjectives can be modified by adverbs like very (this also overlaps with adverbs)
Adverbs
• Derivational Suffixes:
– Many adverbs end in -ly
• Inflectional Suffixes:
– Generally don’t take inflectional suffixes
– (Comparatives are usually formed with more instead: more quickly)
• Syntactic Distribution:
– Adverbs can’t appear between a determiner and a noun. (e.g. *the quickly dog)
– Can often appear in many positions in a sentence.
– Can often be modified by the adverb very.
NOTE: inflectional suffixes always follow derivational suffixes
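The distributional criteria above can be turned into a rough tagger for the nonsense sentence The yinkish dripner blorked quastofically into the nindin with the pidibs. This is a minimal sketch, assuming only a handful of cues (the -ly and -ish suffixes, nouns following determiners or adjectives, and past-tense -ed) plus a small hand-supplied list of known prepositions; real distributional tests use many more criteria, and these cues are English-specific. The names `DETERMINERS`, `guess_category`, and `tag` are hypothetical.

```python
DETERMINERS = {"the", "a", "an", "this", "these", "those"}


def guess_category(word, previous_tag):
    """Guess a part of speech from a few distributional cues (a rough heuristic)."""
    if word.endswith("ly"):
        return "Adv"  # many adverbs end in -ly
    if word.endswith("ish"):
        return "Adj"  # -ish is an adjective-forming derivational suffix
    if previous_tag in ("D", "Adj"):
        return "N"  # nouns follow determiners and adjectives
    if word.endswith("ed"):
        return "V"  # past tense -ed
    return "?"  # no cue applies


def tag(sentence, known):
    """Tag each word, using `known` for words whose category is already listed."""
    tags = []
    previous = None
    for word in sentence.lower().rstrip(".").split():
        if word in DETERMINERS:
            category = "D"
        elif word in known:
            category = known[word]
        else:
            category = guess_category(word, previous)
        tags.append((word, category))
        previous = category
    return tags


sentence = "The yinkish dripner blorked quastofically into the nindin with the pidibs."
print([c for _, c in tag(sentence, {"into": "P", "with": "P"})])
# ['D', 'Adj', 'N', 'V', 'Adv', 'P', 'D', 'N', 'P', 'D', 'N']
```

Each cue on its own is unreliable: this heuristic would mis-tag real words such as friendly (an adjective, caught by the -ly test) or fish (a noun, caught by the -ish test), which is why distributional criteria are applied together rather than one at a time.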
Open vs. Closed
Open classes: ones you can add new words to
• The syntactic categories so far (N, V, Adj, Adv) are open classes
Ex. N: staycation, truthiness, birther, netiquette, turducken
Ex. V: tweet (=post on Twitter), misunderestimate, phish (try to get personal data for a scam)
Ex. Adj: Harperesque, -licious
Ex. Adv: transgenically, bioluminescently, autistically, holistically
Closed classes: ones you can’t add new words to
• But some categories are closed classes
Ex. Auxiliaries: be, have, do
Lexical vs. Functional
Lexical categories:
– easy to define
– acquired early by children
– easier to translate between languages
Functional categories:
– words which are hard to define
– the nuts-and-bolts of language
– acquired later by children
– often omitted in telegrams and headlines
Eg. What is the MEANING of been in: The textbooks could have been sent last week.
Some Functional Categories
• Prepositions (P):
Can appear before nouns (and determiners)
Eg. to, from, under, over, with, by, at, above, before, after, through, near, on, off, for, in, into, of,
during, across, without, since, until
• Determiners (D):
Can appear before nouns
1. Articles: the, a, an
2. Deictic articles (demonstratives): this, that, these, those
3. Quantifiers: each, every, all, most, many, no, any, less, more
4. Numerals: one, two, three, etc.
5. Possessive pronouns: my, your, his, her, etc.
6. some wh-words: whose, which
• Conjunctions (Conj):
Eg. and, or, nor, but, either, neither
• Complementizers (C):
Eg. that, if, whether, for
• Tense (T):
appear before verbs:
1. Auxiliaries: have, be, do
2. Modals: will, would, shall, should, can, could, may, might
3. Non-finite Tense marker: to
• Negation (Neg):
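Because closed classes have a fixed membership, the functional categories listed above can be written down as finite sets, which is not possible for open classes. The sets below copy the example lists above (the numeral and possessive lists end in "etc." there, so only the listed members are included); the name `closed_class_tags` is hypothetical. A lookup shows that membership alone does not settle a word's category: that, for, and to each belong to two classes, so distribution still has to decide.

```python
# Closed-class members copied from the lists above (numerals and possessives
# are truncated there with "etc.", so these sets are not exhaustive).
CLOSED_CLASSES = {
    "P": {"to", "from", "under", "over", "with", "by", "at", "above", "before",
          "after", "through", "near", "on", "off", "for", "in", "into", "of",
          "during", "across", "without", "since", "until"},
    "D": {"the", "a", "an", "this", "that", "these", "those", "each", "every",
          "all", "most", "many", "no", "any", "less", "more", "one", "two",
          "three", "my", "your", "his", "her", "whose", "which"},
    "Conj": {"and", "or", "nor", "but", "either", "neither"},
    "C": {"that", "if", "whether", "for"},
    "T": {"have", "be", "do", "will", "would", "shall", "should", "can",
          "could", "may", "might", "to"},
}


def closed_class_tags(word):
    """Return, in sorted order, every closed class that lists `word`."""
    return sorted(tag for tag, members in CLOSED_CLASSES.items()
                  if word.lower() in members)


print(closed_class_tags("that"))        # → ['C', 'D']
print(closed_class_tags("to"))          # → ['P', 'T']
print(closed_class_tags("staycation"))  # → []
```

A word missing from every set is not necessarily open-class; it may simply be a closed-class member that these example lists leave out.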
• Generative Grammar (Ch.1)
– What is Syntax?
• Parts of Speech (Ch.2)
– Semantic versus Distributional criteria
– Major classes
– Open vs. Closed
– Lexical vs. Functional
Constituency, Trees, and Rules (Ch.3)
• Hierarchical Structure
• How to Test for Constituency
• Phrase Structure Rules (PSRs)
• Heads and Phrases
Linear Order/Flat Structure vs. Hierarchical Structure
The customer in the corner will order the drinks before the meal.
How are these words put together in a sentence?
Are they simply put one after the other in linear order, or do they have some kind of internal structure?
• Hyp 1: They are just arranged in linear order. No hierarchical structure
(they are not grouped into units).
• Hyp 2: They are grouped into un