Day 1 - Video generation with eye-tracking and QR code generation

I will try to publish the things I build with Claude Code. I hope more people do the same so I can copy their ideas. Today is day 1!

Have you tested your breed-identifying skills lately? Try them on BreedGuessr.

What I did

I had breed-tagged images from the Stanford Dogs Dataset (20,580 images across 120 breeds). I created a video for TikTok, hoping it would go viral. My idea was a rapid slideshow of dogs with their breed names, with every dog's eyes aligned across frames so the video feels less dizzying.

Key Technical Decisions

Human Detection: Started with OpenCV’s Haar cascade, but it missed too many humans (especially in profile). Switched to LLaVA 7B via Ollama - roughly 10x better detection, worth the ~2s per image tradeoff.
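
A minimal sketch of that LLaVA check through Ollama's local REST API (the prompt wording, the `llava:7b` tag, and the helper name `contains_human` are my assumptions, not necessarily what this project uses):

```python
import base64
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def contains_human(image_path: str) -> bool:
    """Ask LLaVA 7B (via Ollama) whether a person is visible in the image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = requests.post(
        OLLAMA_URL,
        json={
            "model": "llava:7b",  # assumed model tag
            "prompt": "Is there a human or any part of a human visible in this image? Answer only yes or no.",
            "images": [image_b64],
            "stream": False,
        },
        timeout=60,
    )
    response.raise_for_status()
    answer = response.json()["response"].strip().lower()
    return answer.startswith("yes")
```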

Eye Detection: OpenCV’s human eye cascade surprisingly works on dogs. Added quality filters: exactly 2 eyes, similar size (ratio < 2.5x), roughly horizontal alignment, reasonable distance (5-70% of image width). Each detection gets a confidence score.
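
A sketch of what the cascade plus quality filters can look like; the thresholds for eye count, size ratio, and eye distance come from the description above, while the slope cutoff for "roughly horizontal" is an assumed value:

```python
import cv2

eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_eye_pair(image):
    """Return ((x1, y1), (x2, y2)) eye centers if detection passes the filters, else None."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(eyes) != 2:                          # filter 1: exactly two eyes
        return None

    (x1, y1, w1, h1), (x2, y2, w2, h2) = eyes
    if max(w1, w2) / min(w1, w2) > 2.5:         # filter 2: similar size (ratio < 2.5x)
        return None

    c1 = (x1 + w1 / 2, y1 + h1 / 2)
    c2 = (x2 + w2 / 2, y2 + h2 / 2)
    dx, dy = abs(c1[0] - c2[0]), abs(c1[1] - c2[1])
    if dx == 0 or dy / dx > 0.5:                # filter 3: roughly horizontal (assumed slope threshold)
        return None

    width = image.shape[1]
    if not (0.05 * width <= dx <= 0.70 * width):  # filter 4: eye distance 5-70% of image width
        return None

    return c1, c2
```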

The Alignment Magic: Every dog’s eyes land at the exact same screen position. Calculate midpoint between eyes, scale so eye distance = 180px, translate so midpoint lands at (360, 400) on a 720x1280 canvas. This creates the morphing illusion.
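
In code, this is a single affine warp per image: a uniform scale plus a translation, no rotation. A sketch, with the canvas constants taken from the numbers above and the function and variable names invented for illustration:

```python
import numpy as np
import cv2

CANVAS_W, CANVAS_H = 720, 1280
TARGET_EYE_DIST = 180            # desired distance between the eyes, in pixels
TARGET_MIDPOINT = (360, 400)     # where the eye midpoint should land on the canvas

def align_to_canvas(image, eye_left, eye_right):
    """Scale and translate the image so the eye midpoint lands at a fixed canvas position."""
    mx = (eye_left[0] + eye_right[0]) / 2
    my = (eye_left[1] + eye_right[1]) / 2
    eye_dist = np.hypot(eye_right[0] - eye_left[0], eye_right[1] - eye_left[1])

    s = TARGET_EYE_DIST / eye_dist        # uniform scale so eye distance becomes 180 px
    tx = TARGET_MIDPOINT[0] - s * mx      # translation that moves the midpoint to (360, 400)
    ty = TARGET_MIDPOINT[1] - s * my

    M = np.float32([[s, 0, tx],
                    [0, s, ty]])
    return cv2.warpAffine(image, M, (CANVAS_W, CANVAS_H))
```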

Video Assembly: FFmpeg concatenates the aligned frames at 5fps, burns in breed subtitles, adds a QR code splash screen, and encodes H.264 for Twitter.
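
Roughly what that boils down to when driven from Python (a sketch; the file names are placeholders, the subtitles are assumed to be an SRT file, and the QR splash screen would be prepended in a separate concat step):

```python
import subprocess

def assemble_video(frames_pattern="frames/frame_%04d.png",
                   subtitles="breeds.srt",
                   output="breedguessr.mp4"):
    """Turn the aligned frames into a 5 fps H.264 slideshow with burned-in breed subtitles."""
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", "5",                 # 5 frames per second
        "-i", frames_pattern,              # numbered frame images
        "-vf", f"subtitles={subtitles}",   # burn in the breed names
        "-c:v", "libx264",                 # H.264 encode
        "-pix_fmt", "yuv420p",             # broad player compatibility
        output,
    ], check=True)
```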

Results

From 20,580 source images: ~9,000 rejected for bad eye detection, ~150 rejected for humans, ~450 viable candidates. Top 150-375 selected by confidence. Processing time: ~30 minutes (dominated by LLaVA inference).

See the result on TikTok.