Closed-captioning, as a job, has two distinct parts: transcription and timing in. Transcribing is the easy part; as long as you can hear and type and type the things you hear, you’re good to go. Timing in, where you go back through the episode and link your transcribed text up to the show’s action as accurately as you can, is tougher. Not hard, exactly, but complicated. There are a million tiny rules and shortcuts to memorize: character limits, text colors, house style, length. For example: a caption can appear on-screen for a maximum of five seconds and a minimum of one, and there must be a .45-second gap between all captions that end with an end-stop; otherwise, they have to flow seamlessly into one another like water. A caption can’t always be the same length as a spoken sentence, and text can appear on-screen in one of four different colors, each meant to signify a different speaker. Two speakers whose lines appear one after the other can obviously never be rendered in the same color, although this quickly becomes an issue in scenes that feature more than four people speaking at once. Captions need to be raised after a commercial break, with a one-second gap before the next block of transcribed text. If there is already text at the bottom of the screen as a part of the program, you need to raise your caption so both are visible.
There are seventy-six pages of rules like this. Mastering them gives me a tidy, funless pleasure, which is good, because working quickly is the only way to make any money. Captioners are paid not by the amount of time they spend working, but by the number of total minutes in each video they caption. A minute of video can have pretty much anything in it — music, birdsong, total silence, a roomful of people screaming over one another so loudly you can’t make out a single word — but no matter its content, each minute pays the same. Captioners get $3.00 per video minute for shows we have to transcribe from scratch and $2.75 for those where the network sends along a script for us to copy and paste in, though these are often booby-trapped with homophones and inaccurate phrasing, and can take more work to edit than you’d put in doing the whole thing from scratch. A captioner should, on average, be able to hit about forty video minutes per day without breaking a sweat.
There are no mandated breaks at the captioning job. If you want to get up from your computer you can, but every minute of captioning means more money, and every single pause means your money is ticking away. Every second your fingers aren’t touching a keyboard, every minute your eyes are off the screen, you are losing money. Answering texts, looking at email, staring out the window, going to the bathroom, eating a snack, letting your mind drift backward or forward for even a second — if you add these things up, they cost too much to justify.
By the end of my first week, I’m pretty close to forty video minutes a day. When I stand in the office kitchen, circling my wrists as I wait impatiently for the kettle to boil, the sound is like someone rolling an office chair over a long sheet of bubble wrap, or walking across a forest floor covered in dry twigs.
A scientist and a reporter are standing together on a beach. “Will you ever / put these stick insects / back on the island?” the reporter asks. “Once the rat problem / is under control, / definitely,” the scientist says, nodding.
A woman is sitting on a blue stage in a pool of icy light, telling the story of her mother’s death. The phone call, the silent line, her caught breath, her husband, the paramedics. She says that her mother had never once been to the doctor, not once in her whole life. “She never / put herself first.” This is the revelation, she explains, for which her brand-new skin care line is named.
Two men in matching white suits are standing in front of an enormous metal drum. Everything around them is stainless steel or glowing white. One man flips a switch and the machine roars to life. The men stare into its depths as a thick yellow substance churns inside, up and down. Pointing a white, gloved finger at something off-camera, one man yells: “See over there? / That’s where the cheese comes out.”
The daytime captioners give me a wide berth as I pass them on my way in, like my shift might be contagious. Not many other people work at night. Aside from me there’s Night Manager Steve, a rockabilly couple who never make eye contact with me, Darryl, and John. Darryl and John are two very charming men in their late forties who look like alternate-universe versions of each other. Either John is a stretched-out, loud Darryl, or Darryl is a shy, squat John. Both have thick, stained skin that looks like hand-tooled leather. The main difference between them is the hair. While John’s is thin and wiry, Darryl’s is impossibly thick and lustrous, like the hair of a boy prince from a fairy tale. It flops down gently in front of his eyes when he’s at rest.
It’s not clear whether Darryl and John know each other outside the office, but they both bike to work every day, even in the winter, and they always show up at the exact same time. They move in an aura so thick with cigarette smoke you can smell their presence before they walk through the front door.
Darryl speaks to his computer as he captions, keeping up a steady, tense monologue about how much he hates whatever show he’s watching. His exclamations lilt like a melody over the steady beat of everyone’s typing. Fuck, he sighs, sinking deeper into his chair, an episode of British Antiques Roadshow humming in front of him. It’s decoupage! Of course. Shit. John is about a foot taller than Darryl; his girlfriend is a teacher, and they’re both into grindhouse movies from the ’70s and rescuing cats. His voice sounds like sandpaper scraping across gravel and his default setting is jubilance. I got five episodes of Dr. Phil this week, he tells me in the kitchen one evening, grinning as he carefully unwraps an individually packaged tea bag. Can you believe it? That show is monstrous. He chuckles, shakes his head, stirs his tea, smiling.
Night Manager Steve is shaped like an upside-down mop: floppy hair and stick-thin body. He eats an entire large Domino’s pizza in his office every shift, never makes eye contact with me or anyone, and leaves his half-empty cans of Red Bull in the fridge for days on end, where their sickly sweet smell seeps into everyone’s food.
Some evenings, I will go up to his desk to ask for a new assignment, and as the question escapes my throat I’ll realize it’s the first time I’ve spoken to another human being all day. When I speak, a strange queasy sensation moves through me — a pressure bound up with its own release, like the feeling of standing up after your leg’s fallen asleep. For months after I leave this job, it will be reflexively triggered in me by the smell of a fresh Domino’s pizza.
We mostly caption syndicated American panel shows and Australian daytime TV. It’s still not clear to me why an Australian broadcaster has hired a Canadian company to do all their captioning, or how they assign us our work. Some programs come back every week, while others show up once and then disappear completely. Sometimes a hundred episodes of a televangelist’s program will descend upon the office like a collective fever dream. You can spend weeks working on just one show, developing a complex, fraught relationship with the quirks in its format, learning its shape the way you’d learn a lover’s body — and then you never see it again.
I caption home-renovation shows and an educational program for third graders about careers in science. I caption Australian Family Feud and MasterChef Australia and a true crime show that tracks the grisly murder of a beautiful young woman in such detail that I have to periodically step out of the room to scream into my backpack. I caption soap operas and infomercials for revolutionary sprinkler systems and an awards show for innovations in contemporary design and a show called Gardening Australia about gardening in Australia and a show about a crocodile who loves guacamole called Crocamole.
Some shows are fun to watch but bad to caption, like Antiques Roadshow (too much text on the screen, too many proper nouns to spell-check), while others are easy but so boring they make my brain feel liquid, like the grim six-part documentary about the impact of erosion on vulnerable parts of the British Isles. Judge Judy is a gift from heaven — fun to watch, so staged it seems morally harmless, and also she’s always getting people to shut up so she can monologue, which makes transcribing a breeze. Neighbours, the Australian soap opera, provides maybe the best blend of pleasure and ease — it’s engaging, endlessly dramatic, with lots of long, silent pauses for significant eye contact. An all-Neighbours day is a jackpot. I can polish off five episodes in six hours and leave the office early, richer, and psychically no worse for wear.
The worst, by far, is a program called The Doctors. The Doctors is a panel show produced by Dr. Phil’s son, in which a former Bachelor from The Bachelor and a ghoulish plastic surgeon conduct discussions on a range of subjects very loosely adjacent to the topic of medicine. The two men are often joined by a third panelist, usually a hot female psychologist or a hot female sexual health educator or a hot female dermatologist — to offer a hot female perspective. The Bachelor steers the conversation, and his cohost makes inappropriate jokes, laughing like a freshly roused corpse squeezing centuries’ worth of dust from his lungs. The hot extra doctor waits patiently for her turn.
The sight of a new Doctors on my desktop sets my stomach churning. Every single thing about the show seems haunted: the scrubs the Bachelor wears to host each episode; the howling, disembodied audience; the jagged, violent quick cuts; the blunt blue of the set — debt blue, bus blue, blue like the chairs in a hospital waiting room. The show’s content is scaffolded by a logic I recognize from advertising chumboxes on the internet: something shocking next to something gross, something sexual next to something deeply tragic, something disturbing next to something pitiful. Everything in the show’s introductory voice-over is either HORRIFYING, LIGHTHEARTED, or A QUESTION???!?!?: NEXT, The Doctors surprise A HOMELESS VETERAN living on FOOD STAMPS with the MAKEOVER OF A LIFETIME!!! THEN, a man marries his pet . . . snake?!????
A BETTER way to BUST YOUR BELLY FAT!!!! Could your smartphone be KILLING YOU????
Worst of all, the show is a nightmare to caption, filled with graphics and text that appear at random points all over the screen. People talk over each other constantly, and the audience roars so loudly I have to listen to three-second segments a hundred times just to get them right:
WOULD YOU PAY TO SEE A VASECTOMY IN REAL TIME?
MEET THE PSYCHO NURSE WHO CYBERSTALKED A PATIENT?
COMING UP NEXT, A DOG THAT CAN READ?
These phrases are burned so aggressively into my mind that, after a long day of listening to them over and over, I feel certain they’ll last longer in my brain than anything else — like plastic in a landfill, refusing to degrade.
In an office building downtown, a man named Ridge is staring at a scale model of a skyscraper. He is in love with the building, or with the model, or what it represents. It has a name. He whispers it into one of the tiny windows while caressing it, gazing into the windows’ shiny plastic surfaces, wide-eyed, unblinking.
The category is “THINGS THAT GIVE YOU A HEADACHE.” The contestant frowns for a second. “Your partner? Your spouse?” The buzzer sounds, and the audience shouts the correct answer all together: “YOUR WIFE!”
A woman stands over a spotless gas stove in an immaculate white kitchen, calmly flipping a steak in a cast-iron grill. After thirty or forty seconds, she looks directly into the camera. “Right now / we’re just / letting the meat talk.”
A caption is a small map of the mind of the person who wrote it. In her misspellings you can see which nouns she’s never heard before, or the direction her mind travels when she mishears a word. In her elisions, you see which words she thinks are necessary to keep, which parts of the idea she thought could be erased without consequence.
Our house style guide says that when a captioner is describing a sound, her language has to be accurate, understandable, and as objective as possible. Draining your writing of all opinion is key. What is the least emotional way to convey that someone is raising their voice, or that something is difficult to hear? How do you describe a distant sound without bringing attention to yourself as a listener?
One night, I burn a whole shift on a single forty-five-minute stand-up comedy special. A guy in it has this extended bit where he impersonates the keyboard player from the Bon Jovi song “It’s My Life.” The joke is that he plays a few notes during the verses, but has nothing at all to do during the big bombastic chorus because there’s no keyboard. He never speaks once. The joke rests on the seamless synchronization of his gestures and the music playing over them; it only makes sense if you can tell what’s happening. How do you transcribe the absence of something not just accurately, but hilariously, while remaining entirely objective? This problem costs me approximately $60 to figure out.
The Doctors does this thing at the end of every episode called the Word of the Day. Each show ends with the Bachelor delivering a closing monologue about an issue they’ve discussed on the show — but when he says a certain word, a piercing siren goes off, balloons drop from the ceiling, and the word flashes up on every screen in the studio. That’s our word of the day! the plastic surgeon yells. Everyone in the audience gets a prize. It is absolutely terrifying, every time.
I start jotting down the words of the day on a small piece of paper next to my computer. At the end of a seven-Doctors week, the list reads:
and then I’m so creeped out I have to stop.
Sometimes, like maybe once every ten videos, a horrible squealing crash will shudder through my headphones. I have no idea what causes the noise, but it makes me feel like I have stuck a wet fork into an electrical socket, like a screaming ghost is gnawing on the inside of my skull. It travels all the way down my spine into the joints of my toes. Every time it happens, I whip my headphones off and slam them down on the desk. No one looks at me or asks what’s wrong. If anyone else ever hears the screeching, they never talk about it. Too expensive.
The panel is discussing birth defects. Behind them, an enormous screen blazes with the words: BIRTH DEFECTS? Their guest stares directly into the eyes of the host, unblinking, hyper-serious. “Chimerism / is a very rare condition. / Basically, it means / I am my own twin.”
The panel is discussing a news story about a woman who was murdered in a hotel room. Her killer wrapped her body up in garbage bags and shoved it under the bed, where it lay undiscovered until someone finally complained to the management about the smell in the room. “Crazy!” says the woman with the very shiny hair. “I’ve stayed in a lot of bad hotels in my time,” says the surgeon, “but [indistinct].”
On an average night, the office has one of the most gorgeous soundscapes I’ve ever heard in my life. The captioning keyboards are deep and sonorous — not like the crisp, nervous clatter of the laptop keys I’m used to. Some captioners type more dramatically than others. John has a light touch, but Darryl sounds like he is slapping the keyboard with the full force of his upper body. You can see his shoulders working through his T-shirt. Night Steve clicks more frantically than he types; you can hear him slamming his mouse all the way from the kitchen.
In addition to closed-captioning, the company specializes in something called “re-speaking,” where one person sits in a soundproof room watching a slowed-down episode of a show and clearly repeats every single word of it into a microphone that’s hooked up to a computer. Their speech is run through a voice-capture program that transcribes the spoken text so it can be cleaned up and used as a caption. The re-speaking offices are supposed to be soundproof, but sometimes you can hear murmuring through the walls: a guy reciting the lyrics to Sugar Ray’s “Every Morning” clear, crisp, sharp, and slow, hitting the Ds super hard. A woman transcribing a 60 Minutes interview with Donald Trump, forced to play both sides.
The captioning office is not far from Lake Ontario, and something about the building’s positioning makes it particularly susceptible to weather. When wind hits the outside wall at an angle it makes this high, wavering, wailing sound that splits and multiplies into a ghostly chord. In these moments, the whole office seems to flicker in and out of some alternate dimension, like the signal in a busted old TV. I waste precious minutes paused over my keyboard, holding my phone up to the window, trying to capture this sound. Years later, the files will stay on the desktop of my computer. 1/8/17 SPOOKY CAPTION PLACE, 6/22/17 CAPTION PLACE HOWLING, et cetera.
Most nights I leave the office around 1 AM, when the drunk young men of the neighborhood are spilling out of one club so they can migrate to another. I wait for the streetcar in the doorway of the Dollarama as they come together around me in drifts, then split apart. I feel like a rock in a river. The young men yell for and at each other, collapse on the sidewalk, stumble into waiting cars. Their presence calls a cloud of cologne and perfume and vodka and vomit out over the street. They seem to me like a single organic unit, the wildlife of the city. It is like a baptism to stand among them, let them rinse the eerie quiet of the caption place out of my mind.
In the night vision color scheme the sand is blinding white. The tide is moving out. An enormous turtle is propelling itself across the beach at quarter speed. The researchers watch him for ten or twelve seconds until he hits a rough patch and has to stop. He stands for a minute, wrinkly and bathed in weird moonlight. Soft music twinkles in the background. One of the researchers turns to the camera: “Now we’re going to measure / the turtle.”
“Caption Place” is excerpted with permission from Best Young Woman Job Book: A Memoir by Emma Healey, published in 2022 by Random House Canada.