How to Write a Video Script Using the AIDA Framework
Quick Answer
AIDA stands for Attention, Interest, Desire, Action. To apply it to a video script: open with a hook that stops the viewer (Attention), build credibility and context (Interest), show the transformation or benefit vividly (Desire), then close with a single clear call to action (Action). The framework works for tutorials, sales videos, and YouTube content alike.
“I had been structuring my sales videos as feature lists because that's what I knew from slide decks. Rebuilding them around AIDA moved my sales page conversion rate by 2.3 percentage points in the first month. The Desire stage rewrite — making outcomes specific instead of vague — was where most of the lift came from.”
Rachel V. — Online Course Creator, Denver CO
Why AIDA Works for Video (Not Just Ads)
AIDA (Attention, Interest, Desire, Action) originated as an advertising framework in the late 1800s. It has outlasted many trends in copywriting, marketing, and now video because it mirrors the order in which people typically respond to persuasion: they notice something, want to know more, want the outcome, and then act. Your job as a video creator is to move your viewer through those four states intentionally.
Having applied AIDA to hundreds of video scripts across YouTube tutorials, course content, brand videos, and sales pages, I can tell you it works as well spoken on camera as it does in written copy — if you adapt it for the medium. Spoken AIDA has different timing and different tools than written AIDA, and that's what this guide covers.
Stage 1: Attention — Stop the Scroll in 3–5 Seconds
On short-form platforms, the Attention stage is contained in the first 3–5 seconds of your video. YouTube long-form gives you slightly more room, but tighter is always better. In that window, the viewer decides whether to stay or skip. Written copy can earn Attention with a headline. Video earns it with an opening visual or spoken statement that is unexpected, provocative, or directly relevant to a frustration the viewer already has.
Written AIDA Attention vs. Spoken AIDA Attention
Written: a bold headline, unusual statistic, or compelling image.
Spoken video: a specific question, a surprising statement, or a direct problem declaration. Examples that work:
- 'If your videos keep getting skipped in the first 5 seconds, this is why.'
- 'I tried every teleprompter on the market and almost all of them made me sound worse.'
- 'Here's what nobody tells you about recording video from home.'
The formula is: specificity + either surprise or direct problem recognition. Generic opening lines — 'Hi, welcome back to my channel' or 'Today we're going to talk about...' — fail the Attention test because they contain no surprise and make no immediate promise.
Stage 2: Interest — Build the Bridge from Problem to Solution
Once you have the viewer's Attention, Interest is where you earn the right to keep it. This is the hardest stage for most video creators because it requires talking about the viewer, not about yourself or your solution.
Interest is built by making the viewer feel understood. You describe their specific situation — the frustration, the failed attempts, the context — with enough precision that they feel seen. When someone hears their own experience described accurately, they lean in. That lean-in is Interest.
Practical Interest Techniques for Video
- Before-state narration: 'You've probably tried X and run into problem Y.' Describe the experience they know before showing them the solution they don't.
- Contrasting common knowledge with a better approach: 'Most people do this. Here's why it produces diminishing returns — and what actually works.'
- Credentials earned, not declared: 'After recording more than 300 videos, I learned...' is more compelling than 'I'm an expert.' Show the experience in the claim.
The Interest stage in a video typically runs 20–30% of total length: roughly 2–3 minutes in a 10-minute video, or 12–18 seconds in a 60-second Short.
Stage 3: Desire — Make the Outcome Vivid and Specific
Desire is about the transformation, not the mechanism. Viewers don't desire a process — they desire the outcome the process creates. This distinction is the difference between a video that informs and a video that motivates action.
Written AIDA Desire often uses lifestyle imagery or testimonials. Spoken video AIDA Desire uses:
- Specific results with concrete numbers: 'After applying this method, my average view duration went from 38% to 67% — in one week.'
- Before/after contrast stated verbally: 'Imagine finishing a video session in one take instead of six. That's what this process gives you.'
- Viewer-referenced outcomes: 'By the end of this video, you'll have a structure you can apply to every video you make from here on.'
Notice all of these are specific. Vague desire ('you'll be a better creator') produces no emotional response. Specific desire ('you'll cut your recording time in half') produces a response you can feel.
Stage 4: Action — One Clear Next Step
The Action stage is where most video creators either skip the CTA entirely or stack so many asks that the viewer does nothing. One action. Not five. One.
The most effective CTA in video is the one most tightly connected to what the viewer just experienced. If you just demonstrated a technique, the action is 'try this in your next recording and leave a comment telling me what changed.' If you're selling a product, the action is 'click the link and start your free trial.' If you're building an audience, the action is 'subscribe and I'll send you a new strategy every week.'
One of the ways I've seen creators execute Action well is pairing it with a bridge: 'You've got the framework now — if you want to put it into practice immediately, load your next script into Telepront and let the voice-scroll prompter handle pacing while you focus on delivery.' That's a connected action that extends the session rather than ending it.
Putting AIDA Together: A Script Structure in Practice
A 60-second short-form video using AIDA might look like this:
- 0–5s (Attention): 'Your first 5 seconds are the only ones that matter — here's exactly what to say.'
- 5–20s (Interest): 'Most creators open with an introduction. Viewers swipe during introductions. The algorithm sees that swipe and buries the video.'
- 20–45s (Desire): 'When you open with a specific problem declaration instead, viewers stop and watch. I tested this on 12 consecutive videos and saw a 40% improvement in 30-second retention on every one.'
- 45–55s (Action): 'Write your next hook using the formula in my pinned comment. Test it on your next video and see what happens.'
Notice how naturally each stage flows into the next when the ratio is right: roughly 10% Attention, 25% Interest, 50% Desire, and 15% Action for short-form. For long-form, the proportions shift. Attention stays short, Interest expands, Desire deepens, and Action stays singular and crisp.
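Because these proportions are simple arithmetic, a small script can turn them into timestamps for any video length. The sketch below is a minimal Python example, assuming the rough short-form split of 10/25/50/15; `plan_timeline`, `Segment`, and `SHORT_FORM_SPLIT` are hypothetical names, not part of Telepront or any other tool.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical constant: the rough short-form split of 10% / 25% / 50% / 15%.
SHORT_FORM_SPLIT = {"Attention": 0.10, "Interest": 0.25, "Desire": 0.50, "Action": 0.15}


@dataclass
class Segment:
    stage: str
    start: float  # seconds from the start of the video
    end: float


def plan_timeline(total_seconds: float, split: dict[str, float]) -> list[Segment]:
    """Convert stage proportions into back-to-back time windows, in order."""
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("stage proportions must add up to 1.0")
    segments = []
    cursor = 0.0
    for stage, share in split.items():  # dicts preserve insertion order
        end = cursor + share * total_seconds
        # Round only the reported values; the running cursor stays unrounded.
        segments.append(Segment(stage, round(cursor, 1), round(end, 1)))
        cursor = end
    return segments


if __name__ == "__main__":
    for seg in plan_timeline(60, SHORT_FORM_SPLIT):
        print(f"{seg.start:5.1f}-{seg.end:5.1f}s  {seg.stage}")
```

For a 60-second video this yields Attention 0–6s, Interest 6–21s, Desire 21–51s, and Action 51–60s. Treat the output as a starting point rather than a rule. For long-form, pass your own split, since this guide doesn't give fixed long-form percentages.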
“The distinction between written AIDA and spoken AIDA is something I genuinely hadn't thought about before. The advice about Attention being earned through problem specificity rather than bold typography was the key insight for me. My first 5 seconds are completely different now and my click-through rate has noticeably improved.”
Samuel O. — YouTube Finance Creator, Atlanta GA

Creators Love It
“Excellent framework application guide. I appreciated that this treats AIDA as a dynamic, proportional structure rather than a rigid four-equal-parts formula. The 10/25/50/15 short-form ratio is a practical detail that most AIDA explainers skip entirely.”
Lily T.
Brand Strategist, New York NY
Every Question Answered
Is AIDA the best framework for all video scripts?
AIDA is one of the most versatile frameworks for persuasive and educational video because it follows the natural sequence of a viewer's decision-making. For purely informational content (tutorials and how-tos without a sale), PAS (Problem-Agitate-Solve) is sometimes simpler and faster. For storytelling and narrative content, a three-act structure works better. AIDA makes a good default because it handles the widest variety of video objectives.
How long should the Attention hook be in a video script?
On short-form platforms (Shorts, Reels, TikTok), the Attention hook should complete within the first 3–5 seconds — roughly 7–12 spoken words at normal pace. On YouTube long-form, you have up to 15 seconds before significant viewer drop-off begins, but tighter is always better. A hook should make one clear promise or provoke one clear question. Adding a second promise dilutes both.
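The word count follows directly from speaking pace: words = seconds × words per minute ÷ 60. Here is a minimal Python sketch, assuming a conversational pace of 140 words per minute. That figure is an assumption, not a number from this guide, so measure your own pace and pass it in. `hook_word_budget` is a hypothetical helper name.

```python
def hook_word_budget(seconds: float, words_per_minute: float = 140) -> int:
    """Estimate how many spoken words fit in a hook of `seconds` length.

    The 140 WPM default is an assumed conversational pace; pass your own
    measured pace for a better estimate.
    """
    if seconds <= 0 or words_per_minute <= 0:
        raise ValueError("seconds and words_per_minute must be positive")
    # words = seconds spoken x words per minute / 60 seconds per minute
    return round(seconds * words_per_minute / 60)


if __name__ == "__main__":
    for secs in (3, 5, 15):
        print(f"{secs:>2}s hook -> about {hook_word_budget(secs)} words")
```

At 140 WPM, a 3-second hook allows about 7 words and a 5-second hook about 12, in line with the 7–12 word range. A 15-second long-form opening allows roughly 35.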
Can I use AIDA in a how-to tutorial that isn't trying to sell anything?
Absolutely — AIDA applies anywhere you want a viewer to take an action, and 'watch to the end' or 'try this technique' counts as an action. Attention: open with the problem the tutorial solves. Interest: establish why this approach works and why you know. Desire: show what the viewer can do or know after watching. Action: tell them the specific next step to apply what they learned.
What's the most common AIDA mistake video creators make?
Under-investing in Desire. Most creators spend 60–70% of their video in the Interest stage (building context, explaining the mechanism) and only 10–15% showing the specific outcome. Viewers need to feel the transformation before they'll act on it. Concrete, specific outcome descriptions — numbers, before/after contrasts, viewer-referenced results — should make up at least 40–50% of a persuasive video script.
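If you plan scripts with timestamps, you can check this balance before recording. The Python sketch below assumes a hypothetical outline format of (stage, start second, end second) tuples and treats 40% as the minimum Desire share, the low end of the 40–50% guideline. Neither the format nor the helper names (`stage_shares`, `desire_is_underweight`) come from any particular tool.

```python
from __future__ import annotations

# Assumed rule of thumb: the low end of the 40-50% Desire guideline.
MIN_DESIRE_SHARE = 0.40


def stage_shares(outline: list[tuple[str, float, float]]) -> dict[str, float]:
    """Return each stage's share (0.0-1.0) of the total timed runtime.

    `outline` is a hypothetical list of (stage, start_seconds, end_seconds)
    tuples; a stage that appears more than once has its durations summed.
    """
    durations: dict[str, float] = {}
    for stage, start, end in outline:
        if end <= start:
            raise ValueError(f"{stage}: end ({end}) must be after start ({start})")
        durations[stage] = durations.get(stage, 0.0) + (end - start)
    total = sum(durations.values())
    if total == 0:
        raise ValueError("outline is empty")
    return {stage: secs / total for stage, secs in durations.items()}


def desire_is_underweight(outline: list[tuple[str, float, float]],
                          minimum: float = MIN_DESIRE_SHARE) -> bool:
    """True when Desire gets less than `minimum` of the timed runtime."""
    return stage_shares(outline).get("Desire", 0.0) < minimum


if __name__ == "__main__":
    # The timed 60-second outline from this guide: Desire runs 20-45s.
    example = [("Attention", 0, 5), ("Interest", 5, 20),
               ("Desire", 20, 45), ("Action", 45, 55)]
    # A hypothetical Interest-heavy draft: lots of context, little outcome.
    interest_heavy = [("Attention", 0, 5), ("Interest", 5, 45),
                      ("Desire", 45, 52), ("Action", 52, 60)]
    for name, outline in [("example", example), ("interest_heavy", interest_heavy)]:
        share = stage_shares(outline)["Desire"]
        print(f"{name}: Desire {share:.0%}, underweight={desire_is_underweight(outline)}")
```

On the 60-second outline, Desire takes 25 of the 55 timed seconds (about 45%), so the check passes. A draft that spends 40 seconds on Interest and only 7 on Desire would be flagged.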
Should I say the AIDA stages out loud in my video?
No — AIDA is an invisible architecture, not a visible label. The viewer should experience the progression naturally without being told 'now we're in the Desire stage.' Labeling the framework breaks the immersion and turns persuasion into instruction. The only exception is a video explicitly teaching AIDA, where naming the stages is the content.