Chapter 1

Navigating a diverse field

What is multimodality?

‘Multimodality’ is a term that is now widely used in the academic world. The number of publication titles featuring the term has grown exponentially since it was first coined in the mid-1990s. Since then, a myriad of conferences, monographs, edited volumes and other academic discussion forums dedicated to multimodality have appeared. Signs of its becoming a shorthand term for a distinct field include the publication of the first edition of the Handbook of Multimodal Analysis (Jewitt, 2009), now in a revised second edition (Jewitt, 2014), the launch of the Routledge Series in Multimodality Studies (2011) and the launch of a journal titled Multimodal Communication (2012). These and many other outlets inviting contributions in the area of multimodality provide platforms for scholars working in different disciplines, including semiotics, linguistics, media studies, new literacy studies, education, sociology and psychology, and addressing a wide range of research questions.

With the term being used so frequently and widely, it may seem as though a shared phenomenon of interest has been recognized and a common object of study identified. Indeed, we can, in relatively generic terms, describe that phenomenon, or object of interest, as something like, ‘We make meaning in a variety of ways’, or, ‘We communicate in a variety of ways’. Yet we must immediately add that ‘multimodality’ (and related concepts, including ‘mode’/‘modality’, ‘[semiotic] resource’) is differently construed. Exactly how the concept is articulated and ‘operationalized’ varies widely, both across and within the different disciplines and research traditions in which the term is now commonly used. Therefore, it is very difficult and potentially problematic to talk about multimodality without making explicit one’s theoretical and methodological stance.

Before going any further, we turn to those who first used the term and explore what it was that they were trying to draw attention to. As far as we can reconstruct, the term first appeared in the middle to late 1990s in different parts of the world. It is used, for instance, by Charles Goodwin, in a seminal article that he submitted to the Journal of Pragmatics in 1998 (Goodwin, 2000). It also features in Gunther Kress and Theo van Leeuwen’s Multimodal Discourse: The Modes and Media of Contemporary Communication (2001), the manuscript of which had been ‘in the making’ for a number of years. These scholars started using the term more or less independently of each other, with Goodwin in the US working in the tradition of ethnomethodology and conversation analysis, and Kress and van Leeuwen (then) in the UK in the tradition of social semiotics. Around this same time, O’Halloran, working (then) in Australia and drawing on earlier work by O’Toole (1994) and Kress and van Leeuwen (1996), began to use the term ‘multisemiotic’ to describe the multimodal character of mathematics texts (see, for instance, O’Halloran [1999b], published in Semiotica).

If a ‘means for making meaning’ is a ‘modality’, or ‘mode’, as it is usually called, then we might say that the term ‘multimodality’ was used to highlight that people use multiple means of meaning making. But that formulation alone does not accurately describe the conceptual shift these scholars were trying to mark and promote. After all, disciplines such as linguistics, semiotics and sociology have studied different forms of meaning making since well before the term ‘multimodality’ was introduced. Indeed, Ferdinand de Saussure (1857–1913), writing in the early 20th century, already suggested that ‘linguistics’ was a ‘branch’ of a more general science he called semiology. Yet the branches of that imaginary science have continued to specialize in the study of one or a small set of means for making meaning: linguistics in speech and writing, semiotics in image and film, musicology in music; and new subdisciplines have emerged: visual sociology, which is concerned with, for example, photography; visual anthropology, which is concerned with, for example, dress. These (sub)disciplines focus on the means of meaning making that fall within their ‘remit’; they do not systematically investigate synergies between the modes that fall inside and outside that remit.

Multimodality questions a strict ‘division of labour’ among the disciplines traditionally focused on meaning making, on the grounds that in the world we are trying to account for, different means of meaning making are not separated but almost always appear together: image with writing, speech with gesture, math symbolism with writing and so forth. It is that recognition of the need for studying how different kinds of meaning making are combined into an integrated, multimodal whole that scholars attempted to highlight when they started using the term ‘multimodality’. It was a recognition of the need to move beyond the empirical boundaries of existing disciplines and develop theories and methods that can account for the ways in which we use gesture, inscription, speech and other means together in order to produce meanings that cannot be accounted for by any of the existing disciplines. This need only became more noticeable with the introduction of digital technologies, which enable people to combine means of making meaning that were more difficult or impossible to disseminate before – for the majority of people anyway (moving image being one pertinent example). That is how the introduction of the notion of multimodality marks a significant turn in theorizing and analysing meaning.

What the early adopters of the term recognized was not only the need to look at the co-occurrence and interplay of different means of making meaning but also that each ‘mode’ offers distinct possibilities and constraints. It had often been argued (e.g. by Saussure and Vygotsky) that language has, ultimately, the highest ‘reach’, that it can serve the widest range of communicative functions or that it enables the highest, most complex forms of thinking and is therefore the ‘most important’. Goodwin, Kress, van Leeuwen and others who first introduced the notion of multimodality have pointed out that there are differences between semiotic resources in terms of the possibilities they offer for making meaning but that it is not the case that one resource has more or less potential than another. The same point was made by O’Halloran, who in her definition of ‘multisemiotic’ emphasized the significance of the combination of different resources, each with its own potential. Thus multimodality marks a departure from the traditional opposition of ‘verbal’ and ‘non-verbal’ communication, which presumes that the verbal is primary and that all other means of making meaning can be dealt with by one and the same term.

We can now formulate three key premises of multimodality:

We should add four important footnotes to this. First, not everyone working in multimodality uses the notion of meaning making. Depending on their disciplinary background and focus, they might say that they are interested in ‘multimodal communication’, ‘multimodal discourse’, or ‘multimodal interaction’. We will use the term ‘meaning making’ unless we are writing about a specific approach to multimodality. Nor does everyone working in multimodality use the term ‘mode’: some prefer to talk about ‘resource’, or ‘semiotic resource’, and generally avoid drawing strong boundaries between different resources, highlighting instead the significance of the multimodal whole (‘gestalt’). Indeed, for that very reason, some scholars whose work we subsume under the heading of ‘multimodality’ do not use that term themselves, while otherwise committing to the three key premises we just presented.

Second, scholarly interest in the connections between different means of making meaning predates the notion of multimodality. For instance, the study of gesture and its relation to speech, gaze and the built environment has a long history in linguistic anthropology, interactional sociology and other disciplines (see e.g. Goffman, 1981; Kendon, 2004a; Mehan, 1980); the relation between image and writing has been studied in semiotics (e.g. Barthes, 1977 [1964]) and so on. These early contributions have produced important insights into what we now call multimodality. At the same time, we should note that the potential empirical scope of multimodality goes further still. We can see a development from an exclusive interest in language to an interest in language and its relations to other means of making meaning, to an interest in making meaning more generally, without a clear base point, whether language or any other mode.

Third, while those using the term ‘multimodality’ generally aim to develop a framework that accounts for the ways in which people combine distinctly different kinds of meaning making, their epistemological perspectives (i.e. their perspective on how we can know the world) are different. As we shall see later on in this chapter, in some approaches to multimodality the assumption is that it is possible and indeed necessary to develop an integrated theoretical and methodological framework for some kinds of meaning making, for instance for the study of speech, gesture, gaze and the material environment. In other approaches, the assumption is that it is possible and necessary to develop an encompassing theoretical and methodological framework to account for all kinds of meaning making – whether in image or in gesture or in writing or in any other mode. So researchers who adopt the notion of multimodality (or whose work is treated by others as being part of the field of multimodality) still draw different boundaries around what it is in the empirical world that they aim to account for. This is not a matter of ambition but a matter of epistemology: some argue that the differences between, say, image and speech are too great to handle within one and the same framework; others argue that, notwithstanding the differences, it is still possible, at a more general level, to establish common principles of meaning making.

Fourth, when exploring how the notion of multimodality has been and is being developed along diverse lines and schools of thought, it is important to keep an eye on the ‘original’ premises we just outlined. Fundamental to all those premises is a concern with the cultural and social resources for making meaning, not with the senses. While there are, of course, important relations to be explored between the senses and the means for making meaning, it is important not to conflate the two. The focus on the cultural and the social shaping of resources used for making meaning also sets the approaches apart from the popular notion that observation of ‘non-verbal behaviour’ offers a ‘way in’ to what an individual ‘really’ thinks (as suggested in e.g. best-selling guidebooks on ‘successful business communication’).

What makes a study ‘multimodal’?

When reviewing literature or when planning your own study, it is important to clarify what makes a study multimodal. The following sets of questions about aims, theory and method can help you assess the centrality (or marginality) of multimodality in a study:

Considering the place of multimodality on these dimensions, we can distinguish between:

When adopting multimodal concepts, you can draw selectively from approaches to multimodality such as the ones we discuss in the book. But picking and mixing can be a tricky approach. When selecting concepts from the frameworks and connecting them to concepts derived from other frameworks, it is important to reflect on their ‘compatibility’. Drawing on a theory raises expectations about methods used. For example, claiming to ‘use’ a theory from one of the approaches discussed in this book raises the expectation (among others, as we shall see in the next section) that you will analyse human artefacts or social interactions. So if you choose to combine that theory with the method of the interview, you are likely to be seen as having produced an incoherent framework. If you believe there are good reasons to use the interview as a method, you need to make a case for it (alternatively, you could treat the interview not as a method but as an object of study and analyse it multimodally).

Making explicit what the place of multimodality is in one’s study along these lines can be a way of setting appropriate expectations about the coherence of the research design. When you submit a research paper to a journal and suggest that the study you present is multimodal, some reviewers will expect multimodality to be central throughout the paper. When you explain that you adopt selected multimodal concepts, reviewers are more likely to assess the ‘fit’ between those concepts and the theoretical and methodological frame within which you integrate them. We will elaborate on the issue of mixing approaches in Chapter 6.

Three approaches to multimodal research

In Chapters 3, 4 and 5 of this book we discuss three approaches to doing multimodality. Each is grounded in a distinct discipline, with a distinct theoretical and methodological outlook: conversation analysis, systemic functional linguistics and social semiotics. Not all scholars working in these originating disciplines are interested in multimodality. For instance, many conversation analysts or systemic functional linguists continue to focus on the study of ‘talk’ or ‘speech’. Yet within each of the three disciplines, we can identify a substantial and growing body of literature and a community of scholars engaging with multimodal research. It is these bodies of work that we will focus on.

While there are significant differences between them, they share a number of important features:

As the last bullet point suggests, the approaches that we focus on in this book have developed a more encompassing multimodal frame, largely by expanding their original frame: all had developed sophisticated toolkits to investigate language in use and then branched out, as it were, to explore meaning made with other means – gesture, for instance, or image. We should point out from the outset that the risk of branching out is that the new territory is described in the terms of the originating discipline. Indeed, this is a common critique of all three approaches, and one that we will attend to throughout the book. When expanding the traditional scope, it is important to keep a close eye on what is typical of a mode or semiotic resource and what may count as a more general principle of meaning making, making sure that linguistic categories are not imposed onto other modes. Every time the frame is expanded, old terms and categories need to be revisited and re-evaluated, in the light of the wider range of empirical cases being considered. So we might ask, ‘What would the counterpart be of a verb in image?’ But we can ask that only if we then immediately add, ‘Maybe image doesn’t have anything like the verb. Maybe it has categories unlike anything language has’.

The same can be said about the names of the originating disciplines. The terms ‘conversation analysis’ or ‘systemic functional linguistics’ no longer match the scope of the disciplines they describe. A number of new terms are now being used to mark the changing scopes of these disciplines. We will, for the moment, continue to use some of the old names and use new names if they are widely used within the community they represent. Thus we use the term ‘systemic functional multimodal discourse analysis’ (SF-MDA) but not, for instance, ‘multimodal conversation analysis’.

We will discuss the three approaches at length in Chapters 3, 4 and 5, respectively. Here we summarize them by briefly introducing their aims, history, theory of meaning, concept of mode, empirical focus and methodology. We also present a typical research question for each approach. If you have problems understanding some of the bullet points at this point, rest assured that we will come back to all of them.

Systemic functional linguistics

Social semiotics

Conversation analysis

Throughout the book, we will cross-reference and point out differences and similarities among these three focal approaches. The main differences are summarized in Table 1.1.

We want to highlight two significant differences here: one theoretical and one methodological.

The theoretical point is, first of all, an issue of naming. The three approaches have different terminological preferences, coupled with different conceptualizations of what we have described so far as ‘means for making meaning’. In social semiotics (SS) and systemic functional linguistics (SFL), the terms ‘mode’ and ‘semiotic resource’ are both used, and definitions have been proposed that make a distinction between the two. In conversation analysis (CA), ‘(semiotic) resource’ is used, but ‘mode’ is not, or very rarely, and some attempts at defining ‘(semiotic) resource’ have been made. Yet none of these definitions is (as yet) widely and consistently used beyond those who proposed them.

Table 1.1 Mapping three approaches to multimodality: SFL, social semiotics and conversation analysis

| | SFL | Social semiotics | CA |
| --- | --- | --- | --- |
| Aims | Recognition of social functions of forms | Recognition of power and agency | Recognition of social order in interaction |
| Theory of meaning | Meaning as choice | Motivated sign | Sequentiality |
| History | European functionalism | SFL, critical linguistics, semiotics | American interactionism, ethnomethodology |
| Conceptualization of ‘means for making meaning’ | Semiotic resource, mode | Mode, semiotic resource | (Semiotic) resource |
| Example representatives | O’Toole, Martin, Unsworth, O’Halloran | Kress, van Leeuwen | Goodwin, Heath, Mondada |
| Empirical focus | Artefacts, including texts and objects | Artefacts, mostly texts | Researcher-generated video recordings of interaction |
| Method of analysis | Micro analysis of selected short fragments, corpus analysis, multimodal analytics | Micro analysis of selected short fragments, historical analysis | Micro analysis of (collections of) selected short fragments |

There is, put simply, much variation in the meanings ascribed to mode and (semiotic) resource. Gesture and gaze, image and writing seem plausible candidates, but what about colour or layout? And is photography a separate mode? What about facial expression and body posture? Are action and movement modes? You will find different answers to these questions not only between different research publications but also within. To avoid potential confusion, it is important to make a deliberate decision on what categories and terms to use when engaging with multimodal research. It will be helpful to formulate some ‘working definitions’, drawing on the ones already put forward by the approach you adopt. Even though the working definition is unlikely to be entirely satisfactory, it is important to strive for maximum conceptual clarity and consistency. We will discuss the definitions proposed within our focal approaches in the respective chapters.

The methodological point is this. CA is primarily interested in meanings made in situ, in dynamic, face-to-face interactions. It looks at artefacts only insofar as these artefacts are being oriented to in observed interactions. So, for instance, Charles Goodwin (2000) looked at the Munsell chart, a tool that the archaeologists participating in the interactions he had video-recorded used to determine the colour of soil. In social semiotics, artefacts have been explored in situ – for instance the use of 3D models in the science classroom (Kress and van Leeuwen, 2001) – but in other social semiotic work, artefacts have also been studied away from specific situated interactions. For instance, Bezemer and Kress (2008) studied textbooks. Their focus was on meanings made by the makers of textbooks (including authors and graphic designers), not on the meanings of those who engage with textbooks, such as teachers and students. In SFL, a similar position is usually taken, recognizing that it is possible to reconstruct meanings from (collections of) artefacts. Thus SS and SFL generally cover a wider empirical scope than CA. For instance, the architectural design of a building would normally fall outside the scope of CA.

There is, of course, significant variation in the degree to which scholars stay close (some might say ‘faithful’) to the principles put forward by the founders of the originating disciplines. Indeed there is a tension between staying faithful to concepts as they were originally defined and the need to revise old concepts in the light of the changing world. After all, the world we live in now looks very different from what it looked like when the originating disciplines appeared. Social, cultural and technological changes constantly challenge old notions.

There are close connections between scholars working with the different frameworks, and indeed some are active members in both communities. The closest links are between SFL and SS; there is far less interaction between representatives of CA, on the one hand, and SFL and SS, on the other. For instance, at the International Conference on Multimodality, CA has to date been under-represented, while SS and SFL were hardly represented at the tenth edition of the International Conference on Conversation Analysis (2010), which was dedicated to ‘multimodal interaction’. CA is closely linked with interactional (socio)linguistics and linguistic anthropology, and this connection is reflected in early work on the role of, for example, gaze in classroom interaction (see the work of Ray McDermott and Frederick Erickson). Social semiotics is closely linked with critical discourse analysis, which developed as a separate branch of critical linguistics. That is visible, for instance, in the joint work of David Machin and Theo van Leeuwen on media discourse (e.g. 2007).

In many studies, selected elements of one of the three approaches have been adopted and brought into connection with concepts and methods derived from other disciplines, such as psychology. For instance, you could use eye-tracking technology to ‘test’ certain concepts proposed in social semiotics (Holsanova, 2012). Other work has attempted to bring together concepts from social semiotics with ethnography. We will elaborate on how elements of the three approaches have been incorporated into other approaches in Chapter 6.