Diegetic sounds come from a source within a film’s fictional universe. These sounds are audible to the characters and film audiences. Diegetic sound helps to shape the space and time of the shot.

     In general, sounds in film include dialogue, singing, music, ambient noise, and sounds made by particular objects.

     Diegetic sound can be on- or off-screen, and could be recorded during filming or manufactured during post-production.

     On-screen sound emanates from characters or objects visible on screen, while off-screen sound comes from sources outside the camera’s frame.

     Keep in mind that diegetic sounds (sound effects, music, dialogues, noises) always come from the world of the story.


     Sound design is an important element in Christopher Nolan’s science fiction film Inception (2010), which depicts the infiltration and manipulation of people’s subconscious through their dreams. Sounds and music, orchestrated by sound editor Richard King, transport audiences between imagined (dream) worlds and the embodied reality.

     In the masked ball scene where Romeo meets Juliet for the first time in Baz Luhrmann’s William Shakespeare’s Romeo and Juliet (1996), a transition in sound design signals to film audiences a transition of moods.

     As Romeo and Juliet discover and look at each other through a fish tank, the diegetic sound (music and chatter from the ongoing party) is turned down and drowned out by a romantic soundtrack. The intrusion of the nondiegetic music transports the couple and film audiences to a parallel universe that is disconnected from the party.

Romeo, costumed as a knight in shining armor, and Juliet, dressed as an angel with white wings, meet each other for the first time at a masked ball. They discover each other’s presence through an aquarium. Romeo stands to the left of the frame. The right of the frame is taken up entirely by the fish tank, with tropical fish swimming near Juliet’s face.

     Gender is as important a factor as race in film sound. Toward the end of John Madden’s Shakespeare in Love (1998), Will Shakespeare and his fellow actors are ready to stage the premiere of Romeo and Juliet. Audiences have filled their playhouse. However, they discover a not-so-minor problem. The actor cast as Juliet is going through a voice change and can no longer play the maiden.

     This clip is provided by Miramax’s official channel.

     Voices are not only racialized but often gendered as well, and the impression is further solidified by costumes, camerawork, and the mise-en-scène. It is particularly ironic that this film imposes a modern ideology of binary genders onto the early modern, gender-fluid world that it is supposed to portray. In Shakespeare’s times, women did not generally perform on the English professional stage. Characters of all genders were performed by all-male casts. Gender variance is a recurring motif in Shakespeare’s plays.


     One important, but often glossed over, element of diegetic sound is the accent. Actors, guided by accent coaches, can use a particular accent to fit the story. It is therefore useful to listen to, rather than simply watch, a film.

     Many people are used to seeing race. By paying attention to accents, we will notice that race and ethnicity are not only visible and palpable but also audible. Jennifer Lynn Stoever points out that “listening operates as an organ of racial discernment, categorization, and resistance.”

     Alexa Alice Joubin’s research shows that audiences often “actively and unconsciously listen for accents and other sonic registers” of characters. They listen attentively to more familiar accents, and filter out, selectively, unfamiliar ones.

     In other words, dominant listening habits visualize linguistic differences. It is a form of racialization.

     Take, for example, Spike Lee’s 2018 BlacKkKlansman, which follows Ron Stallworth (John David Washington), the first Black officer in Colorado Springs in 1972, as he infiltrates the local division of the Ku Klux Klan.

     In one scene, Stallworth speaks to David Duke (Topher Grace), Grand Wizard of the KKK, on the phone.

     Let us first listen to their phone conversation without any visuals.

     When we can only hear the voices and have no visual cues, how do we respond to the clip? Examine how we tend to “hear” or not hear racial identities. Here is a transcript of what they are saying.

Speaker 1:        It’s him.

Speaker 2:        Hey, Ron, I don’t share this with a lot of people, but …

Speaker 1:        Yeah. Well, I’m anxious to meet you.

Speaker 2:        Excited to meet you too, Ron.

Speaker 1:        Aren’t you ever concerned of some smart-ass n__ calling you pretending to be white?

Speaker 2:        No, I can always tell when I’m talking to a n___.

Speaker 1:        How so?

Speaker 2:        Take you for example, Ron. Okay.

Speaker 1:        Yes?

Speaker 2:        I mean, I can tell that you’re a pure Aryan white man from the way you pronounce certain words.

Speaker 1:        Can you gimme any examples?

Speaker 2:        Yeah. Take the word, uh, “are.” Pure Aryans like you or I would pronounce it correctly. N__ pronounces it “are-uh.” Did you ever notice that? It’s like, “Are-uh you gonna fry up that crispy fried chicken, soul brother?”

Speaker 1:        <laugh>. Wow. You are so white. Thank you for teaching me this lesson. If you had not brought it to my attention, I wouldn’t have noticed the difference between how we talk and how n__ talk.

Speaker 2:        Good. Good.

Speaker 1:        Yeah. I’d love to continue this conversation in Colorado Springs. It’s beautiful here, sir. God’s country.

Speaker 2:        Well, that’s what I hear, Ron. I look forward to meeting you and, uh, we’ll be talking real soon.

Speaker 1:        God bless white America.

     Now, let us watch this scene with audio and visual elements.


     Your Turn:     With both visuals and audio on, how have your reactions to the narrative changed? How do we see and/or hear race?

     Tips:     When describing sound and music, we can break the description down into three categories: the sound’s characteristics (such as pitch, volume, and quality), the source (whether it is diegetic or nondiegetic), and the type (musical, orchestral, vocal, dialogue, special sound effects).