Google Flow at labs.google/fx/tools/flow is one of the most capable AI video generation tools available right now. Feed it a text prompt, wait 60–90 seconds, and you get a cinematic clip powered by Veo 3.1. The output quality is genuinely impressive.
The problem: it's entirely manual. One prompt at a time, one click at a time. For any real production workflow — running 50+ prompts across different orientations, retrying on failures, tracking which media IDs correspond to which prompt — it's completely unusable as-is.
So we built our own automation layer.
I. What It Does
The tool is a desktop app built with Electron + React + TypeScript. It embeds the Flow web interface in a hidden browser window and drives it programmatically while showing a control panel in a separate window.
Key capabilities:
- Batch queue: paste a list of prompts, set batch size, let it run
- Rolling concurrency: submits new prompts as previous ones complete, keeping the queue saturated
- Two generation modes: text-to-video and image-to-video (with product asset selection)
- Retry logic: configurable max retries per prompt before skipping
- Media ID extraction: extracts the mediaId from each completed tile's DOM for downstream use
- Session persistence: cookies and user-agent stored across restarts
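To make the rolling-concurrency and retry behavior concrete, here is a minimal sketch of a worker-pool queue. It is an illustration under stated assumptions, not the tool's actual code: `runRollingQueue`, `QueueJob`, `QueueResult`, and the `generate` callback (standing in for whatever drives one Flow generation and returns a media ID) are all hypothetical names.

```typescript
// Hypothetical types and names, for illustration only.
type QueueJob = { prompt: string; attempts: number };
type QueueResult = { prompt: string; ok: boolean; attempts: number; mediaId?: string };

// `generate` is an assumed callback that runs one generation and
// resolves with a media ID, or rejects on failure.
async function runRollingQueue(
  prompts: string[],
  concurrency: number,
  maxRetries: number,
  generate: (prompt: string) => Promise<string>,
): Promise<QueueResult[]> {
  const pending: QueueJob[] = prompts.map((prompt) => ({ prompt, attempts: 0 }));
  const results: QueueResult[] = [];

  // Each worker pulls the next job as soon as its previous one settles,
  // so up to `concurrency` generations stay in flight while work remains.
  async function worker(): Promise<void> {
    for (;;) {
      const job = pending.shift();
      if (!job) return;
      job.attempts += 1;
      try {
        const mediaId = await generate(job.prompt);
        results.push({ prompt: job.prompt, ok: true, attempts: job.attempts, mediaId });
      } catch {
        if (job.attempts <= maxRetries) {
          pending.push(job); // retry later, behind the remaining prompts
        } else {
          // Retries exhausted: record the failure and skip this prompt.
          results.push({ prompt: job.prompt, ok: false, attempts: job.attempts });
        }
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, prompts.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `maxRetries = 2`, a prompt is attempted at most three times before it is recorded as failed. Results arrive in completion order, which is why each one carries its prompt alongside the media ID.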
II. The Engineering
1. Automating a Slate Editor
The prompt input on Flow uses Slate.js, a rich-text editor framework. You can't just set element.value = "..." like a regular textarea.
Our approach: simulate a ClipboardEvent with a DataTransfer object containing the prompt text. Slate intercepts paste events at the React layer, which makes the paste-event path the most reliable way to inject text without fighting the editor's internal state machine.
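// Build a synthetic paste payload carrying the prompt as plain text.
const dataTransfer = new DataTransfer()
dataTransfer.setData('text/plain', prompt)
const pasteEvent = new ClipboardEvent('paste', {
  bubbles: true, cancelable: true, composed: true,
  clipboardData: dataTransfer,
})
// Dispatch on the Slate editor element so its paste handler receives the event.
promptInput.dispatchEvent(pasteEvent)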
If the paste path fails (which can happen on first load), we fall back to document.execCommand('insertText'), inserting the prompt one character at a time.
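The paste-then-fallback order can be sketched as a small helper that tries each injection strategy in turn and verifies the result by reading the editor back. This is a sketch under assumptions: the source does not say how a failed paste is detected, so the read-back comparison is a guess, and `injectWithFallback`, `InjectStrategy`, `readEditorText`, and `clearEditor` are hypothetical names. The DOM-specific strategies are passed in as callbacks so the helper itself stays free of browser globals.

```typescript
// An injection strategy is any function that tries to put `text` into the editor.
type InjectStrategy = (text: string) => void | Promise<void>;

// Tries each strategy in order. Returns the index of the first strategy after
// which the editor reads back the expected text, or -1 if none succeeded.
// `readEditorText` and `clearEditor` are hypothetical hooks supplied by the caller.
async function injectWithFallback(
  text: string,
  strategies: InjectStrategy[],
  readEditorText: () => string,
  clearEditor: () => void,
): Promise<number> {
  for (let i = 0; i < strategies.length; i++) {
    const strategy = strategies[i];
    if (!strategy) continue;
    try {
      await strategy(text);
    } catch {
      // A throwing strategy counts as a failed attempt; move on to the next one.
    }
    if (readEditorText().trim() === text.trim()) return i;
    clearEditor(); // discard partial input before the next attempt
  }
  return -1;
}
```

In the renderer, the first strategy would dispatch the synthetic paste event and the second would loop over the prompt calling `document.execCommand('insertText', false, ch)` for each character. `execCommand` is deprecated in the web platform, so that fallback is worth treating as best-effort.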
2. Fingerprinting and User-Agent
Running automation against a web app risks triggering bot detection. Flow sits behind reCAPTCHA Enterprise and has its own session monitoring.
Our mitigations:
- Persistent user-agent: generated once per install using a realistic Chrome/macOS pattern, saved to disk, and reused across restarts (avoids the "UA changes every session" detection signal)
- Electron fingerprint spoofing: overrides navigator.platform, navigator.hardwareConcurrency, the canvas fingerprint, and WebGL renderer strings via a preload script injected before page load
- reCAPTCHA serialization: Google flags concurrent token requests from the same session as UNUSUAL_ACTIVITY. We serialize all solves with a mutex so only one token is in flight at a time.
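The serialization step can be illustrated with a promise-chain mutex, one common way to build a lock in single-threaded JavaScript. This is a minimal sketch rather than the tool's implementation: `Mutex`, `captchaMutex`, `getSerializedToken`, and the `solve` callback (whatever actually requests a token) are hypothetical names.

```typescript
// A tiny promise-chain mutex: each task starts only after the previous one settles.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(task);
    // Swallow the outcome on the chain itself so one failed task
    // does not block or reject the tasks queued behind it.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}

// One shared lock for every token request in the session.
const captchaMutex = new Mutex();

// `solve` is an assumed callback that performs a single reCAPTCHA solve.
function getSerializedToken(solve: () => Promise<string>): Promise<string> {
  return captchaMutex.runExclusive(solve);
}
```

Callers can still fire requests concurrently. The mutex queues them in call order, so at most one solve is in flight, and a rejected solve surfaces to its own caller only.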
3. Headless API Path
Beyond DOM automation, we also reverse-engineered Flow's internal API from network traffic: direct calls to aisandbox-pa.googleapis.com with the same payload structure the web UI uses. This path bypasses the DOM entirely and is significantly faster.
The trade-off: it requires a valid reCAPTCHA token per request, and Google rotates the endpoint structure periodically.
4. Tile Tracking
Flow renders generated media as "tiles" in a grid. Each tile gets a data-tile-id attribute. We snapshot tile IDs before clicking Generate, then watch for new tile IDs to appear — that's how we match tiles to prompts in a concurrent batch.
const beforeTileIds = getAllTileIds()
generateButton.click()
const newTileId = await waitForNewTile(beforeTileIds, 5000)
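A polling version of waitForNewTile might look like the sketch below. The real helper's internals are not shown in this post, so treat this as one plausible shape: the `getTileIds` and `pollMs` parameters are additions made here so the example is self-contained, and a MutationObserver would be an equally reasonable way to watch the grid.

```typescript
// Polls `getTileIds` until an ID appears that was not in `before`.
// Rejects if no new ID shows up within `timeoutMs` milliseconds.
async function waitForNewTile(
  before: readonly string[],
  timeoutMs: number,
  getTileIds: () => string[],
  pollMs: number = 100,
): Promise<string> {
  const known = new Set<string>(before);
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // The first ID that was absent from the snapshot is treated as the new tile.
    const fresh = getTileIds().find((id) => !known.has(id));
    if (fresh !== undefined) return fresh;
    if (Date.now() >= deadline) {
      throw new Error(`No new tile appeared within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```

In the renderer, `getTileIds` would read the data-tile-id attribute from every tile element currently in the grid. If several prompts are submitted back to back, each snapshot should be taken immediately before its own click, so that two prompts do not claim the same new tile.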
III. Why It's Internal
This tool exists in a gray area. It automates a Google product that doesn't offer an official API. We use it for internal production workflows only — not distributed, not open-sourced.
The reverse-engineering work is also fragile by nature: Flow updates its UI and internal API periodically, and the tool requires maintenance to keep up.
If Google ever ships an official Flow API with batch support, this becomes obsolete. Until then, it works.