The Fireship style, measured: 13 cuts a minute and zero face time
Every dev-tool founder who opens a YouTube channel has the same reference video in their head. Fast, funny, dense, never boring, never a face. The Fireship style is the most imitated grammar in developer content, and almost every imitation gets it wrong in the same way: people copy the vibe and miss the mechanics.
So we measured the mechanics. As part of a nine-video study of long-form creators (the same study behind our MKBHD teardown), we ran ffmpeg scene detection and vision-classified frame sampling on three recent long Fireship videos: "Every operating system concept in one video" (11:30, 640K views), "I read every major CS paper of the last 100 years" (10:11, 586K), and "10 open source tools that feel illegal" (10:03, 1.32M). That gave us 134 classified frames and a full cut list per video. Worth noting up front: the main channel rarely exceeds 8 minutes, and these three are the longest of his last 40 uploads.
This is an independent editorial analysis. We are not affiliated with or endorsed by Fireship.
The headline numbers
The cut rhythm is the fastest we measured anywhere in the study: 11.8 to 15.4 hard cuts per minute across the three videos, 13.3 aggregate. Median shot length is about 3 seconds. The 90th percentile shot is about 9 seconds. For comparison, MKBHD averages 7.4 cuts per minute and Ali Abdaal runs 7.7 to 9.3. Fireship's aggregate rate is about 1.8 times MKBHD's and roughly 1.4 to 1.7 times Ali Abdaal's.
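If you want to run the same arithmetic on your own edit, the three headline numbers fall out of a plain list of cut timestamps. The sketch below is a minimal illustration, not the study's actual pipeline: the function name `shot_stats`, its arguments, and the nearest-rank percentile method are assumptions, and the sample timestamps are made up.

```python
import math
from statistics import median


def shot_stats(cut_times, duration):
    """Summarise cut rhythm from hard-cut timestamps and total runtime (seconds)."""
    # Shot boundaries: video start, every hard cut, video end.
    bounds = [0.0] + sorted(cut_times) + [duration]
    shots = sorted(b - a for a, b in zip(bounds, bounds[1:]))
    # Nearest-rank 90th percentile (an assumed method; others differ slightly).
    rank = max(1, math.ceil(0.9 * len(shots)))
    return {
        "cuts_per_min": len(cut_times) * 60.0 / duration,
        "median_shot": median(shots),
        "p90_shot": shots[rank - 1],
    }


# Made-up example: four hard cuts in a 30-second clip.
stats = shot_stats([2.0, 5.0, 8.0, 20.0], 30.0)
print(stats)
```

Cuts per minute alone can mislead, because one long terminal shot drags the average down while the median stays near 3 seconds. That is why the study reports all three numbers together.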
And then the number that surprises nobody who watches the channel but should surprise everyone who makes content: talking-head frames are 0 percent. Zero of 134 classified frames contain the creator's face. The entire retention engine that other channels build on charisma, eye contact, and reaction shots is simply absent. What replaces it is a rotation of five visual surfaces.
What is on screen (share of 134 classified frames)
Memes and stock inserts are the single biggest surface at roughly 35 percent of frames. Custom flat diagrams on pure black take about 25 percent. Code, terminals, and screen recordings cover about 18 percent, paper and document scans about 12 percent, and numbered title or logo cards about 6 percent. Together those five surfaces account for roughly 96 percent of the sample. The rest of this piece walks through each device and how to build it.
Memes are paragraph breaks, not jokes
The most misunderstood device in the style. Imitators treat memes as humor injections and sprinkle them wherever a joke lands. In the measured footage the placement is far more systematic: meme, stock, and archival inserts run 1 to 3 seconds each and land after each explanatory beat. Explain a concept, then cut to the insert. Explain the next concept, insert again.
Functionally they are paragraph breaks. Dense technical narration has no natural visual rhythm, so the inserts manufacture one: they give the viewer a half-breath of cognitive rest, they reset the scroll-away timer with a new image, and they timestamp the boundary between one idea and the next. The joke is a bonus. The rhythm is the point.
To reproduce it: write your script first, mark every spot where a complete thought ends, and budget one 1 to 3 second insert per mark. At 13 cuts per minute you will need a lot of them, which is why a searchable local library of reaction clips and stills matters more than any single great meme.
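The budgeting step above can be roughed out in a few lines. This is a sketch under a loud assumption: it treats every sentence end as "a complete thought ends," which is cruder than marking the script by hand. `insert_budget` and the 2-second default are hypothetical choices, picked from the middle of the measured 1 to 3 second range.

```python
import re


def insert_budget(script, insert_seconds=2.0):
    """Return (number of thought marks, total seconds of inserts to source)."""
    # Assumption: a sentence end stands in for "a complete thought ends".
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    return len(sentences), len(sentences) * insert_seconds


script = (
    "The kernel talks to the hardware. "
    "A page fault traps into the kernel! "
    "Why does that matter?"
)
marks, seconds = insert_budget(script)
print(marks, seconds)
```

Run it over a full script and the mark count tells you how many clips and stills to pull before you open the editor.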
Additive diagrams on a black canvas
The second-largest surface, about a quarter of all frames, is the signature look: flat custom diagrams on pure black. The key mechanic is that they are additive. Elements pop in one per narrated clause. Say "the kernel," a box appears. Say "talks to the hardware," an arrow and a second box appear. The diagram is never shown finished and then explained; it assembles itself in sync with the voiceover.
This matters for the cut math. A 20-second diagram sequence with an element appearing every 2 to 3 seconds reads as continuous motion even though scene detection may register few hard cuts. Within-frame animation substitutes for cutting, which is how the style sustains its density without becoming a strobe.
To reproduce it: black background, flat shapes, no gradients or drop shadows, and one animation trigger per clause of your narration. Build the diagram from the script, not the other way around. If a clause has no visual change attached to it, either the clause is filler or the diagram is missing a beat.
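One way to enforce "one animation trigger per clause" is to build the reveal schedule straight from the timed script and list any clause that has no element attached. The sketch below assumes you already have clause start times; `build_reveals`, the tuple layout, and the element names are hypothetical.

```python
def build_reveals(clauses):
    """clauses: list of (start_seconds, clause_text, element_name_or_None)."""
    reveals, unmatched, visible = [], [], []
    for start, text, element in clauses:
        if element is None:
            # No visual change attached: filler clause or missing beat.
            unmatched.append(text)
            continue
        visible = visible + [element]  # additive: nothing is ever removed
        reveals.append((start, list(visible)))
    return reveals, unmatched


clauses = [
    (0.0, "the kernel", "kernel_box"),
    (1.5, "talks to the hardware", "arrow_and_hardware_box"),
    (3.5, "which is honestly kind of wild", None),
]
reveals, unmatched = build_reveals(clauses)
print(reveals)
print(unmatched)
```

Each entry in `reveals` is the full set of elements on screen at that moment, so the diagram only ever grows. Anything in `unmatched` is a prompt to either cut the clause or draw the missing piece.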
Angled stickers instead of captions
Here is a cross-creator fact from the full study that deserves its own article: across all 502 classified frames from Fireship, MKBHD, and Ali Abdaal combined, exactly zero contained burned-in speech captions. The word-by-word subtitle style that dominates Shorts and TikTok does not appear in high-performing long-form at all. We covered why in the seven laws of retention editing.
What Fireship uses instead is keyword stamps: bold 1 to 3 word phrases set on colored rectangles rotated 3 to 10 degrees, slapped on like stickers. PAGE FAULT. SELF-HOSTED. They land on the technical term at the moment it is spoken, and only on terms worth remembering. In the terminal segments the same job is done by a floating label that sits beside the prompt and names the flag under discussion.
To reproduce it: treat on-screen text as an index, not a transcript. One sticker per new concept, rotated a few degrees so it reads as placed rather than templated, removed after 1 to 2 seconds. Let platform captions handle accessibility.
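As a rough illustration of text-as-index, the sketch below turns a word-timed transcript into sticker cues: one per keyword, placed at the moment the term is first spoken, held briefly, and rotated by a few degrees. Everything here is an assumption for illustration: the `sticker_cues` name, the `(word, start_seconds)` transcript layout, the 1.5-second hold, and the angle cycle (chosen to stay inside the measured 3 to 10 degree range, with sign indicating direction).

```python
def sticker_cues(words, keywords, hold=1.5, angles=(-6, 4, -3, 8)):
    """words: list of (text, start_seconds); keywords: phrases of 1 to 3 words."""
    tokens = [w.lower().strip(".,!?") for w, _ in words]
    cues, seen = [], set()
    for i in range(len(tokens)):
        for phrase in keywords:
            parts = phrase.lower().split()
            if phrase in seen or tokens[i:i + len(parts)] != parts:
                continue
            seen.add(phrase)  # one sticker per new concept, first mention only
            start = words[i][1]
            cues.append({
                "text": phrase.upper(),
                "start": start,
                "end": start + hold,
                "rotate_deg": angles[len(cues) % len(angles)],
            })
    return cues


words = [
    ("A", 0.0), ("page", 0.5), ("fault", 0.75), ("hits", 1.0),
    ("self-hosted", 2.0), ("boxes.", 2.5),
    ("Another", 3.0), ("page", 3.25), ("fault", 3.5),
]
cues = sticker_cues(words, ["page fault", "self-hosted"])
for cue in cues:
    print(cue)
```

The keyword list is the editorial decision; the code only handles timing. Repeat mentions are skipped on purpose so the sticker marks the introduction of a term rather than every use of it.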
Live terminals: the only legitimate long shot
In a style with a median shot of about 3 seconds, the longest unbroken shots we measured ran 23.7 to 47 seconds. Every single one was either the sponsor segment or a live-typed terminal. That is not a lapse in discipline. A terminal being typed into carries its own within-frame motion: the cursor, the keystrokes, the output scrolling in. The screen changes constantly even though the shot never cuts.
The terminals are real and live-typed with real output, and that authenticity is load-bearing for a developer audience that can smell a mocked-up screenshot instantly. The paper-explainer videos get the same treatment with a different surface: real document scans marked up with translucent highlighter and hand-drawn red arrows, about 12 percent of frames.
To reproduce it: record the actual command running, keep the whole execution as one shot, and resist cutting away mid-command. The rule of thumb from the study: a long hold is legitimate only when the frame animates itself. A static slide held for 30 seconds is a retention leak; a terminal held for 30 seconds is a feature.
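The rule of thumb is easy to turn into a check on a labeled shot list. The sketch below is hypothetical throughout: the `flag_static_holds` name, the shot dictionary layout, the `kind` labels, and the 10-second threshold (picked to sit just above the measured 90th percentile of about 9 seconds) are all assumptions.

```python
SELF_ANIMATING = {"terminal"}  # surfaces whose frame moves on its own


def flag_static_holds(shots, max_static=10.0, exempt=SELF_ANIMATING):
    """shots: list of dicts with 'start', 'end' (seconds) and 'kind'."""
    flagged = []
    for shot in shots:
        length = shot["end"] - shot["start"]
        if length > max_static and shot["kind"] not in exempt:
            flagged.append((shot["kind"], length))
    return flagged


shots = [
    {"start": 0.0, "end": 3.0, "kind": "meme"},
    {"start": 3.0, "end": 33.0, "kind": "slide"},
    {"start": 33.0, "end": 63.0, "kind": "terminal"},
]
flagged = flag_static_holds(shots)
print(flagged)
```

Both 30-second shots are the same length, but only the slide is flagged. Pass a larger `exempt` set if you want to allow other deliberate long holds, such as a sponsor segment.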
The hidden skeleton: a 10-item listicle and a hard stop
Under the chaos, the structure is rigid. The videos are hidden numbered listicles, 10 items, with roughly one boundary per minute. Each boundary is a 2 to 4 second black card with a numbered badge and a logo, and that card is the only segmentation the video gets. None of the three videos uses YouTube chapter markers. The numbered card does the chapter's job in-edit.
The hook follows the same discipline. Cold open on sentence one, direct premise, escalating jokes and stakes at 12 to 18 cuts per minute, promise landed within 25 to 35 seconds, then straight into item 1. The sponsor read arrives mid-roll at 45 to 52 percent of runtime in two of the three videos, rendered in the exact house style, and it is reliably the longest unbroken shot in the video.
The ending is the most copyable device of all. Content runs to the final 5 seconds or so, followed by a fixed sign-off ("Thanks for watching and I will see you in the next one") and a gag clip as the final frame. Zero endscreen real estate, and the edit stays hot to the end: 5 to 10 cuts in the last 30 seconds. Compare that with the standard creator outro, a minute of subscribe animations over dead air, and it is obvious which one respects the viewer's time.
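To check whether your own ending stays hot, count the cuts that land in the final 30 seconds and measure how long the last shot runs. This is a minimal sketch: `audit_ending`, its parameters, and the pass mark of 5 cuts (the low end of the measured 5 to 10) are assumptions, and the timestamps are invented for a 10:03 runtime.

```python
def audit_ending(cut_times, duration, tail_seconds=30.0, min_tail_cuts=5):
    """Report how busy the last stretch of the edit is (times in seconds)."""
    tail_start = duration - tail_seconds
    tail_cuts = sum(1 for t in cut_times if t >= tail_start)
    last_cut = max(cut_times) if cut_times else 0.0
    return {
        "tail_cuts": tail_cuts,
        "stays_hot": tail_cuts >= min_tail_cuts,
        "final_shot_seconds": duration - last_cut,
    }


# Invented cut list for a 603-second (10:03) video.
cuts = [100.0, 300.0, 575.0, 580.0, 586.0, 591.0, 596.0, 600.0]
ending = audit_ending(cuts, 603.0)
print(ending)
```

A long `final_shot_seconds` is usually the signature of outro padding: the edit stopped cutting well before the video stopped playing.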
The recipe on one card
| Device | Measured spec |
|---|---|
| Cut rhythm | 13.3 cuts/min aggregate; median shot ~3s; p90 ~9s |
| Face time | 0% of frames; voiceover only |
| Meme inserts | ~35% of frames, 1 to 3s each, after each explanatory beat |
| Diagrams | ~25% of frames; flat on black; elements pop in per narrated clause |
| Text | No speech captions; 1 to 3 word stickers rotated 3 to 10 degrees |
| Long shots | 23.7 to 47s; only terminals and the sponsor segment |
| Structure | Hidden 10-item listicle; ~1 boundary/min; 2 to 4s numbered black cards |
| Ending | Content to final ~5s; fixed sign-off; gag final frame; no endscreen |
How much of this can WritePanda automate?
Honest answer: the assembly, not the taste. This study is baked into WritePanda's editing agent as the dev-explainer recipe, so you can hand it a screen recording plus narration and ask for the structural layer: the numbered black boundary cards, angled sticker keywords timed to the transcript, additive diagram segments rendered as motion graphics, cuts on beat, and the hard-stop ending with no outro padding. It will also flag any long static stretch that is not a terminal or a deliberate payoff.
What it will not do is write your jokes, pick the meme that lands, or give you the writing density that makes this style work at 13 cuts a minute. The script is still the product. The editor is the part you no longer have to do by hand. See how agent-driven editing works for the full workflow.
FAQ
How many cuts per minute does Fireship use?
We measured 11.8 to 15.4 hard cuts per minute across the three videos, 13.3 aggregate, with a median shot around 3 seconds. Hooks run 12 to 18 cuts per minute.
Does Fireship use captions?
No. Zero of 134 classified frames had burned-in speech captions. Text on screen is limited to angled sticker keywords and floating terminal labels; platform captions handle accessibility.
Does he ever appear on camera?
Not in the footage we measured: talking-head frames were 0 percent. That makes this the most practical style for founders who want to publish without being on camera at all.
Can I use this style without becoming a clone?
Yes. The devices are structural: cut rhythm, additive diagrams, live terminals, listicle skeleton, hard-stop ending. Change the visual language and voice, and the structure works without borrowing the brand.