Chapter 14 – How Do We Use Language?
• Pragmatics: the aspects of meaning that do not affect the literal truth of what is
being said; these concern things such as the choice among forms of words with the same
meaning, implications in conversation, and maintaining coherence in conversation.
It can be divided into 2 topics:
o How we as hearers and speakers go beyond the literal meaning of what we
hear to draw inferences.
§ Ex: “Can you pass the salt?” Not asking if you have the literal ability,
it’s just a polite way of making a request.
§ Layering: trying to achieve particular goals when speaking through
the multiple layers of meaning in language.
o How we maintain conversations. This must be done through cooperation with
the other participants.
• Central theme: people are always making inferences at all levels on the basis of what
they hear. Our utterances interact with context to give them full meaning.
Making Inferences in Conversation
• Speech Acts: Austin and Searle
o We make inferences from what people say, how they say it, and even from what
they don’t say. In conversation, we have the added ability of asking the person
what they mean.
o According to Austin and Searle, every time we speak we perform a speech act.
o Speech act: an utterance defined in terms of the intentions of the speaker and the
effect that it has on the listener.
o Austin began by exploring sentences containing performative verbs (they
perform an act in their very utterance, ex: “I hereby pronounce you man and
wife” – as long as the circumstances are appropriate; these circumstances are
called the felicity conditions).
o Conclusion: all sentences are performative, though mostly in an indirect way.
They’re all doing something, even if that’s just stating a fact.
o Three effects that each sentence possesses:
§ Locutionary force of an utterance is its literal meaning.
§ Illocutionary force: what the speaker is trying to get done with the
utterance.
§ Perlocutionary force: the effect the utterance actually has on the actions
and beliefs of the listener.
o According to Searle, every speech act falls into one of 5 categories:
§ Representatives. The speaker is asserting a fact and conveying his or her
belief that a statement is true. (“Boris rides a bicycle.”)
§ Directives. The speaker is trying to get the listener to do something. (In
asking the question “Does Boris ride a bicycle?” the speaker is trying to
get the hearer to give info.)
§ Commissives. The speaker commits him/herself to some future course of
action. (“If Boris doesn’t ride a bike, I will give you a present.”)
§ Expressives. The speaker wishes to reveal his/her psychological state.
(“I’m sorry to hear that Boris only rides a bike.”)
§ Declaratives. The speaker brings about a new state of affairs (“Boris –
you’re fired for riding a bike!”)
o Different theorists specify different categories of speech acts.
§ D’Andrade and Wish described 7 types. They distinguished between
assertions and reactions as different types of representatives.
However, there was a lack of agreement and a lack of detailed criteria
for what constitutes any type of speech act. Also, some utterances
might be ambiguous, so how do we select the appropriate speech act?
o Direct speech acts: straightforward utterances where the intention of the
speaker is revealed in the words.
o Indirect speech acts require some work on the part of the listener.
o Speech acts can become increasingly indirect, often with increasing politeness.
o Searle proposed a two-stage mechanism for computing the intended meaning.
§ First, listener tries the literal meaning to see if it makes sense in context.
§ Only if it does not make sense does the listener do the additional work of
finding a non-literal meaning.
o Versus one-stage model: people derive the non-literal meaning either instead of
or as well as the literal one.
o The non-literal meaning is understood as fast as or faster than the literal
meaning, which favours the one-stage model.
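The contrast between the two accounts can be sketched as a toy program. Everything here – the utterance, the contexts, and the "makes sense" test – is invented for illustration; neither model is specified at this level of detail in the literature.

```python
# Toy contrast: Searle's two-stage model vs a one-stage model of
# computing the intended meaning of an indirect speech act.

LITERAL = "query about ability"
NON_LITERAL = "request to pass the salt"

def literal_makes_sense(context):
    # Invented check: at the dinner table, a genuine question about
    # one's physical ability to pass salt would be odd.
    return context != "dinner table"

def two_stage(utterance, context):
    """Try the literal meaning first; only if it fails in context,
    do the extra work of deriving the non-literal meaning."""
    if literal_makes_sense(context):
        return LITERAL
    return NON_LITERAL  # extra step -> predicts slower non-literal comprehension

def one_stage(utterance, context):
    """Derive the contextually appropriate meaning directly,
    instead of (or as well as) the literal one."""
    return NON_LITERAL if context == "dinner table" else LITERAL

print(two_stage("Can you pass the salt?", "dinner table"))
print(one_stage("Can you pass the salt?", "dinner table"))
```

The point of the sketch: the two-stage model builds in an extra processing step for non-literal meanings, so it predicts they should be slower to understand; the finding that non-literal meanings are understood as fast as (or faster than) literal ones is what favours the one-stage view.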
o Kinds of layering: sarcastic, ironic, humorous, teasing, asking rhetorical
questions etc. We probably understand these types of utterance using similar
sorts of mechanisms as with indirect speech acts.
• How to run a conversation: Grice’s maxims
o Grice: in conversations, speakers and listeners cooperate to make the
conversation meaningful and purposeful -> adhere to the cooperative principle.
o To comply with this, you must adhere to 4 conversational maxims:
§ Maxim of quantity: make your contributions as informative as required,
but no more.
§ Maxim of quality: make your contribution true. Do not say anything that
you believe to be false, or for which you lack sufficient evidence.
§ Maxim of relevance: make your contribution relevant to the aims of the
conversation.
§ Maxim of manner: be clear: avoid obscurity, ambiguity, wordiness, and
disorder in your language.
o Is there redundancy in the maxims? Relevance is primary among them, and the
others can be deduced from it.
o We usually try to make sense of conversations that appear to deviate from
the maxims, by assuming that overall the speaker is following the cooperative
principle. To do this, we make a type of inference known as a conversational
implicature.
o Face management is a common reason for violating the maxim of relevance:
people don’t want to hurt or be hurt. The listener’s recognition of this is
important in how they make sensible inferences regarding remarks that violate
the maxims.
o Garrod and Anderson identified other ways that speakers cooperate in conversations.
§ They observed people cooperating in an attempt to solve a computer-
generated maze game.
§ The pairs of speakers quickly adopted similar forms of description ->
entrainment.
• The frequency of name selection can override other factors that
influence lexical choice, such as informativeness, accessibility, and
being at the basic level.
o Brennan and Clark (1996) proposed that in conversations speakers jointly make
conceptual pacts about which names to use.
§ Conceptual pacts are dynamic: they evolve over time, can be simplified,
and even abandoned for new conceptualisations.
o Sometimes we don’t want to cooperate in conversations. Often it seems that the
harder we try to keep something private, the more likely it is to pop out.
o Expt by Wardlow Lane, Groisman, and Ferreira:
§ Speakers described simple objects to other people. Some info was known
only to the speakers -> privileged info.
§ Results: if speakers were told to keep this privileged info secret, they
were more likely to refer to the concealed objects.
§ Explanation: in terms of our monitoring our speech; monitoring can bring
things that we are trying to avoid into awareness. Freud would talk in
terms of repression.
o The right hemisphere of the brain plays an important role in processing some
pragmatic aspects of language -> non-literal processing, ex: jokes, idioms,
metaphors, and proverbs.
The Structure of Conversation
• Two different approaches to analyzing the way in which conversations are structured:
o Discourse analysis: uses the general methods of linguistics and aims to
discover the basic units of discourse and the rules that relate them.
§ Most extreme version: the attempt to find a grammar for conversation
in the same way as there are sentence and story grammars.
§ Labov and Fanshel: looked at the structure of psychotherapy episodes.
Utterances are segmented into units such as speech acts, and
conversational sequences are regulated by a set of sequencing rules
that operate over these units.
o Conversation analysis: much more empirical, aiming to uncover general
properties of the organization of conversation without applying rules. It
was pioneered by ethnomethodologists, who examine social behaviour in its
natural setting.
• In a conversation, speaker A says something, speaker B has a turn, then A again,
etc -> turn-taking.
o A turn varies in length, and might contain more than one idea.
o Other speakers might speak during a turn in the form of back-channel
communication, making sounds (“mhmm”), words (“yep”), or gestures (nodding), to
show that the listener is still listening and understanding.
• Turn structure is made explicit by the example of adjacency pairs (question-answer
pairs, greeting-greeting pairs)
• The exact nature of the turns and their length depend on the social setting.
• Less than 5% of conversation consists of the overlap of the two speakers talking at
once, and the average gap between turns is just a few tenths of a second.
• Sacks, Schegloff, and Jefferson proposed that the minimal turn-constructional unit
from which a turn is constructed is determined by syntactic and semantic structure,
and by the intonational contour of the utterance (over which the speaker has a great
deal of control).
• A speaker is initially assigned just one of these minimal units; at its end there is a
transition relevance place, where a change of speaker might arise.
o A number of rules govern whether or not speakers actually do change at these
places:
§ Gaze: we tend to look at our listeners when we are coming to the end
of a turn.
§ Hand gestures might be used to indicate that the speaker wishes to
keep the turn.
§ Filled pauses indicate a wish to continue speaking.
§ Asking a question invites a change of speaker.
o Advantage of this system: it can predict other characteristics of conversation,
such as when overlaps (competing starts of turns, or where transitional
relevance places have been misidentified) or gaps occur.
• Wilson and Wilson: a more biological model of the control of turn-taking.
o During conversation, endogenous oscillators in the brains of the speaker and
listener become synchronised or entrained.
o Endogenous oscillators: groups of neurons that fire together in a periodic
way and hence act like clocks in the brain.
o Driving force of synchronisation: the speaker’s rate of syllable production.
o A cyclic pattern develops, with the probability of each conversant
initiating speech at any time being out of phase with the other’s, so
minimising the likelihood that the 2 people will start speaking at the same
time.
o 2 key ideas:
§ Biological clocks ensure we don’t speak simultaneously
§ Clocks obtain their timing from the speech stream
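A minimal numerical sketch of the counter-phase idea. The cycle length and the cosine form are invented for illustration; the model itself does not commit to these details, only to entrained oscillators that put the two conversants' readiness out of phase.

```python
import math

# Toy sketch of Wilson and Wilson's oscillator account of turn-taking.
# Assumption: a "readiness to initiate speech" curve that cycles at the
# speaker's syllable rate, with the listener half a cycle out of phase.

CYCLE = 0.2  # assumed seconds per syllable: the entraining rhythm

def readiness(t, phase):
    """Probability-like readiness to initiate speech at time t (0..1)."""
    return 0.5 * (1 + math.cos(2 * math.pi * t / CYCLE + phase))

# Speaker and listener entrain to the same cycle but in opposite phase.
for step in range(5):
    t = step * CYCLE / 4
    a = readiness(t, 0.0)        # current speaker
    b = readiness(t, math.pi)    # listener, half a cycle out of phase
    print(f"t={t:.2f}s  speaker={a:.2f}  listener={b:.2f}")

# At every instant a + b == 1: when one conversant is maximally ready
# to start speaking, the other is minimally ready, which is how the
# model minimises simultaneous starts.
```

This illustrates both key ideas at once: the "clocks" are periodic functions timed by the syllable rate of the speech stream, and their opposite phases keep the peaks of the two readiness curves from coinciding.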
Collaboration in Dialogue
• Audience design: the idea that speakers tailor their utterances to the particular
needs of the addressees.
• Alignment: Conversation is a process of communicating representations of language,
and of trying to make the representation of the speaker and the listener the same –
filling the gaps.
o Exceptions: lying, deliberately withholding info
• Pickering and Garrod’s interactive alignment model:
o During dialogue the linguistic representations of the participants become
aligned at many levels (including the syntactic and lexical).
o Alignment occurs by means of 4 types of largely automatic mechanism:
priming, inference, the use of routine expressions, and the monitoring and
repair of speech.
o This alignment of linguistic representations leads to the alignment of the
speaker’s and the listener’s situation models.
o Priming of words and syntactic structures ensures that linguistic
representations become aligned at a number of levels. This assumes much
less explicit reasoning about one’s interlocutor than alternative views such as
that of Clark.
o The model emphasises the way in which listeners make predictions in conversations,
and that these predictions are made by the speech production system:
comprehension draws on production, particularly in difficult circumstances.
• Other reasons for supposing that audience design is an emergent, interactive
process:
o Horton and Gerrig: the memory requirements of a task influence speakers. If
speakers have a lot to remember, they find it difficult to take the needs of the
listeners and the detailed past history of their conversational interaction into
account.
• Speakers sometimes use prosody and pausing to help listeners disambiguate what they
say. They also seem to monitor what they say with the goal of reducing ambiguity.
• While speakers sometimes avoid linguistic ambiguity, they go out of their way to
avoid non-linguistic ambiguity.
• Non-linguistic ambiguity arises when there are multiple instances of similar objects.
o Ex: several instances of the same object in the visual scene.
• Ferreira et al. expt about non-linguistic ambiguity:
o Results: speakers monitor their speech and can sometimes detect and avoid
linguistic ambiguity before producing it, but almost always avoid
non-linguistic ambiguity.
• When the visual context was potentially ambiguous, speakers tried to disambiguate
their utterances -> they do pay some attention to the needs of the listener.
• Limits to how far a speaker will go to make the listener’s life easier.
• Ferreira and Dell: examined the extent to which speakers used optional
complementisers (ex: the chicken “that” you ate).
o If speakers are trying to produce structures that are as easy to understand
and as unambiguous as possible, they should frequently include these
optional words in sentences that would otherwise be ambiguous. However,
they don’t. Instead they choose structures that are easy to produce and that enable them to produce the main content words as early as possible.
• Speech production proceeds with quickly selected lemmas being produced as soon
as possible.
• While speakers produce prosodic cues (lengthening words and inserting pauses) to
mark syntactic boundaries, and listeners do pay attention to these cues, speakers
tend to do so regardless of whether or not the listener really needs it.
• There are limitations to audience design.
• Speakers over-estimate how good they are at conveying info.
• There are limits to how much speakers tailor their productions to their listeners,
and they do not always do so correctly even when they try.
Sound and Vision
• The study of the visual world provides us with a new tool for studying how we can
understand language and speech.
• The visual world paradigm has proved very popular for investigating sentence
processing and speech production.
• While adults make considerable use of the visual world, children do so to a much lesser
extent. 5-year-old children rely exclusively on verb-bias info.
• Highly reliable cues, such as lexical bias, emerge first in development, with referential
info gradually being used as the child gets older.
• Although referential info may not determine which structures young children
construct, it may reduce the time it takes to construct them.
• Using visual info in comprehension
o Adult readers rely mostly on lexical info to generate alternative syntactic
structures, while adult listeners make a great deal of use of the visual world.
o People can use referential info from the visual scene at which they’re looking to
over-ride very strong lexical biases.
o Expt by Spivey, Tanenhaus, Eberhard: monitor eye movements of participants
following spoken instructions about picking up and moving objects in a visual
workspace. Eye movements were closely linked to the associated referential
expressions (phrases describing objects) in the instructions.
§ Interpretation of this ambiguous prepositional phrase: “Put the apple on
the towel in the box.”
• Normally preferred initial interpretation is the goal-argument
analysis: put the apple on the towel.
• Less usual interpretation: noun-phrase modifier: the apple that is
already on the towel should be put somewhere else.
• The answer depends on the visual context.
§ Eye movements showed that the initial interpretation was the one
consistent with the visual context.
o Chambers, Tanenhaus, Magnuson: properties of objects in the visual world can
influence parsing:
§ Gave participants temporarily ambiguous sentences. Listeners restrict
their attention to objects that are physically compatible with what they
hear. Hence real-world properties of objects constrain the referential
domain, and this info in turn is used from a very early stage to influence parsing.
§ Results: language processing immediately takes into account relevant
non-linguistic context, which argues against models where initial syntactic
decisions are guided solely by syntactic info.
o A particular sort of visual info: info from the speaker themselves, ex: lip reading,
eye movements (which signal the focus of attention). The eye movements of the
listener come to match those of the speaker. If they’re looking over a scene,
there’s a 2 second delay between the speaker’s gaze and the listener’s.
Chapter 15: The Structure of the Language System
Caplan: 4 main characteristics of the language-processing system:
- language system is divided into a number of informationally encapsulated
modules, each taking only one kind of input and delivering only one kind of
output. (The extent to which the modules are encapsulated is controversial).
- Processes within a module are mandatory and automatic when an input comes in.
e.g. we can’t not read a word and access its meaning
- Language processes generally operate unconsciously
- Most language processing takes place very quickly and with great accuracy (these
last two points suggest that much of language is like automatic processing)
The extent to which language processes interact is very controversial, but it seems
that the earlier in processing a process is, the more likely it is to be autonomous.
What Are the Modules of Language?
Recognizing/producing words and decoding/encoding syntax involve specific modules.
The semantic-conceptual system is responsible for organizing and accessing our world
knowledge and for interacting with the perceptual system.
Word meanings are represented by being decomposed into semantic features, and contact
the conceptual system through modality-specific stores. The meanings of words
can be connected together to form a propositional network that is operated on by
schemata (in comprehension – chapter 12) and the conceptualizer (in production).
Neurological case studies show that brain damage can affect some components of
language while leaving others intact, leading some to think e.g. that specific instances
are stored in the mental lexicon in one part of the brain, while general grammatical
rules are processed elsewhere.
There are differences in language processing in the visual vs auditory modalities:
Phonological recoding isn’t always necessary in reading, esp. in
languages with many irregular spellings of words.
We have access to a visual stimulus for much longer than a spoken stimulus ->
different temporal constraints in word recognition.
Still, except for in young children, there is a very high correlation between
reading and listening comprehension skills
Parsing may also differ in the two modalities (b/c we can go back to previous
words we read).
Speech recognition is a data-driven, bottom-up process, while speech
production is a non-modular process involving feedback, b/c the goals of
the tasks are different: in recognition, we need to extract the meaning as
quickly as possible, and we don’t need to construct detailed representations
of everything. In production, we need to be accurate about the construction
of every word and every sentence.