IAAP Webinar: AI & Emotion in Digital Life
Scientific Computing Laboratory, Institute of Physics Belgrade, University of Belgrade
January 22, 2026
Key Insight
Specialized models outperform LLMs for well-defined classification tasks at scale
Model: BART Large MNLI (~2GB RAM)
# MLK's "I Have a Dream"
text <- "So even though we face the
difficulties of today and tomorrow,
I still have a dream. It is a dream
deeply rooted in the American dream."
# Custom emotion labels
emotions <- c("anger", "fear", "joy",
"sadness", "optimism",
"hope", "surprise")
# Analyze with BART
results <- transformer_scores(
text = text,
classes = emotions,
model = "bart-large-mnli"
)Zero-shot
No training needed - define any labels you want!
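To inspect the output, a minimal sketch (assuming the call above returns a list with one named score vector per input text):

```r
# Show label probabilities, highest first (assumption: results is a
# list with one named numeric vector per input text)
sort(results[[1]], decreasing = TRUE)
```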
Let’s see CLIP in action with political portraits…
Trump Portraits - Zero-shot Emotion Classification
Same approach works for faces - dramatically different emotional profiles from official portraits
Images and text describing similar concepts are mapped to nearby locations in a shared embedding space.
CLIP was trained on roughly 400 million image-text pairs; this massive scale enables generalization to new concepts, including emotions.
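To make "nearby locations" concrete, here is a toy sketch with made-up 3-dimensional vectors (real CLIP embeddings have hundreds of dimensions):

```r
# Cosine similarity: how closely two vectors point in the same direction
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Made-up embeddings, for illustration only
img_smiling_face <- c(0.9, 0.1, 0.2)  # hypothetical image embedding
txt_joyful_scene <- c(0.8, 0.2, 0.1)  # hypothetical text embedding
txt_angry_scene  <- c(0.1, 0.9, 0.3)  # hypothetical text embedding

cosine(img_smiling_face, txt_joyful_scene)  # high: nearby in the space
cosine(img_smiling_face, txt_angry_scene)   # low: far apart
```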
Key Insight
Label phrasing significantly affects accuracy. Try “a happy person” vs “happy” vs “a joyful scene”!
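One way to test phrasing effects yourself, sketched with transforEmotion's image_scores() (an assumption that your installed version exports it; "portrait.jpg" is a placeholder path):

```r
# Score the same image under three label phrasings (sketch)
phrasings <- list(
  adjective = c("happy", "sad", "angry"),
  person    = c("a happy person", "a sad person", "an angry person"),
  scene     = c("a joyful scene", "a sad scene", "an angry scene")
)

lapply(phrasings, function(labels) {
  image_scores(image = "portrait.jpg", classes = labels)
})
```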
```r
# Analyze video emotions
video_results <- video_scores(
  video = "speech.mp4",
  classes = c("anger", "fear",
              "joy", "sadness",
              "surprise", "neutral"),
  nframes = 100
)
# Returns emotion scores
# for each sampled frame
```

Time Series Output
Track emotional dynamics across video frames - great for speeches, debates, or social media content
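A minimal plotting sketch (assuming video_results can be coerced to a data frame with one row per sampled frame and one numeric column per emotion):

```r
# Plot each emotion's trajectory across the sampled frames
scores <- as.data.frame(video_results)
matplot(scores, type = "l", lty = 1,
        xlab = "Sampled frame", ylab = "Emotion score")
legend("topright", legend = colnames(scores),
       col = seq_len(ncol(scores)), lty = 1, cex = 0.7)
```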
| Labeling Approach | Joy Score |
|---|---|
| Adjectives (“joyful”) | 0.53 |
| Person (“a joyful person”) | 0.62 |
| Scene (“a joyful scene”) | 0.82 |
Label Engineering Matters
Scene descriptions capture context better for social images
| Model | Label Set | Hit@1 | Hit@2 |
|---|---|---|---|
| CLIP ViT-Base | Adjectives | 33.5% | 50.7% |
| CLIP ViT-Large | Scene Descriptions | 33.0% | 46.4% |
| Fine-tuned ViT-Large | Scene Descriptions | 44.1% | 62.0% |
Hit@1: Highest score is correct. Hit@2: Correct label in top 2.
8-class setup on 442 test images. Fine-tuning with just 2,489 images improves Hit@1 by 11 percentage points.
| Metric | Before | After |
|---|---|---|
| Hit@1 | 33% | 44% |
| Hit@2 | 46% | 62% |
Small dataset (2,489 images) yields significant gains
Practical Advice
Start with transforEmotion for large-scale analysis; validate edge cases with an LLM if needed.
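A sketch of that workflow (assuming transformer_scores() returns one named score vector per input text; the 0.5 threshold is illustrative):

```r
# Score everything at scale, then flag low-confidence cases
# for closer review (e.g., with an LLM)
texts <- c("I am thrilled!", "This is fine, I guess.")
all_scores <- transformer_scores(text = texts, classes = emotions)

top_score <- sapply(all_scores, max)    # strength of the top label
needs_review <- texts[top_score < 0.5]  # illustrative threshold
```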
Co-authors: Hudson Golino (UVA), Alexander P. Christensen (Vanderbilt)
Questions?