Multi-AI Workflow: How I Shipped an OCR Feature with Gemini + Claude
Today I added a full OCR feature to VoxLogAI — without writing a single line of code myself. Just AI prompts, a few clicks, and some glue.
Here's how it went down:
🧠 The Setup
VoxLogAI started as a tool for fast, AI-powered audio transcription (YouTube, MP3, etc.). I wanted to expand it to also extract text from PDFs and images, which means adding OCR support.
Instead of opening an editor, I opened Google AI Studio and started chatting with Gemini (gemini-2.5-pro-exp-03-25).
📜 The Prompt
First, I set up a system prompt to give Gemini context:
You will act as a technical Product Owner and will help the user refining tickets. The original codebase will be passed under <codebase/>.
Then I passed in the code using a tool called Repomix, which lets us flatten the repo into an XML structure.
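To make the "flatten the repo into XML" step concrete, here is a minimal Python sketch of the idea. It is not how Repomix works internally and does not reproduce its output format. The function name `flatten_repo`, the `<file path="...">` element, and the suffix filter are all assumptions made for illustration. Only the `<codebase>` wrapper comes from the system prompt above.

```python
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def flatten_repo(root: str, suffixes: tuple[str, ...] = (".py", ".md")) -> str:
    """Return matching files under `root` wrapped in a <codebase> element.

    Illustrative stand-in only: the tag names and the filter are assumptions.
    """
    root_path = Path(root)
    parts = ["<codebase>"]
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        # Skip hidden files and directories such as .git or .venv.
        if any(part.startswith(".") for part in rel.parts):
            continue
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Escape &, < and > so file contents cannot break the wrapper tags.
        body = escape(path.read_text(encoding="utf-8", errors="replace"))
        parts.append(f"<file path={quoteattr(rel.as_posix())}>\n{body}\n</file>")
    parts.append("</codebase>")
    return "\n".join(parts)
```

Calling `flatten_repo(".")` from the project root returns one string that can be pasted into the chat. In practice a dedicated tool is the better choice, since it can also deal with ignore files and very large repos.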
I also included working OCR scripts for reference:
gemini_image_ocr.py
gemini_pdf_ocr.py
My prompt:
Please look at gemini_image_ocr.py and gemini_pdf_ocr.py
These are scripts that send an image/pdf to Gemini to perform OCR.
Now, I’d like to incorporate that into my app. But how, UI-wise? At the moment it's only for audio transcription, so I guess we’d need to add a new OCR tab?
What do you think? Brainstorm with me before writing any code.
<example_code>
[the contents of the python scripts]
</example_code>
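If you would rather script this than copy and paste, the prompt above could be assembled roughly like this. It is a sketch only: `build_brainstorm_prompt` is a hypothetical helper, and the `# filename` header placed before each script is an assumption, not something the original prompt used.

```python
from pathlib import Path


def build_brainstorm_prompt(question: str, script_paths: list[str]) -> str:
    """Append the reference scripts to the question inside <example_code> tags."""
    sections = []
    for script in script_paths:
        source = Path(script).read_text(encoding="utf-8")
        # Label each script with its file name so the model can tell them apart.
        sections.append(f"# {Path(script).name}\n{source.rstrip()}")
    examples = "\n\n".join(sections)
    return f"{question.strip()}\n<example_code>\n{examples}\n</example_code>"
```

For example, `build_brainstorm_prompt(question, ["gemini_image_ocr.py", "gemini_pdf_ocr.py"])` returns the question followed by both scripts inside a single `<example_code>` block.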

💡 Gemini’s Design Help
Gemini responded with a few UI/UX options. Initially it leaned toward Option 2 (embed OCR in existing flow), but after I asked about mobile usage, it switched to Option 3 — a dedicated OCR tab.
Then I asked it:
“then let's refine ticket for option 3 please”
You can view the refined ticket here: the full Gemini-generated ticket (Markdown).
🛠️ Claude Does the Code
With the ticket in hand, I switched to Claude.
I asked it to read the ticket and implement the feature, and it handled that end to end. The result worked out of the box, apart from one minor bug.

I reported the bug (a minor UI issue) to Claude, and it fixed it.

Then I asked Claude to refactor a bloated file, and it did.
"perfect! now, a bit of refactor is needed IMO: we have the transcriber logic in transcriber.py, which i like. shouldn't we have a ocr.py (or wtv name you think is best) and include the logic there too? app.py seems to be doing a lot now. what do you think? i'm open to be challenged"
🧼 Gemini Does the PR Review
After Claude was done, I staged the changes. Claude can occasionally overwrite previous changes if the context gets messy — committing often helps preserve state.
Then I saved the git diff into a file to prepare the code review I would be asking Gemini to do:
git diff --staged > /tmp/git_diff
Then I went back to Gemini, passed it the git diff and said:
"please now put your Senior developer hat on and review the git diff that implemetnes this feature. focus on code quality, tech debt, securiuty vulnerabilities, etc"
Gemini came back with a bunch of solid code-level suggestions: Gemini's full code review for the OCR feature (Markdown)
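The diff-and-review step can also be scripted. The sketch below assumes `git` is on the PATH. The helper names `staged_diff` and `build_review_prompt` and the `<git_diff>` wrapper tag are hypothetical, and the instruction text is a tidied paraphrase of the prompt above rather than the exact wording that was sent.

```python
import subprocess


def staged_diff(repo_dir: str = ".") -> str:
    """Return the output of `git diff --staged` for the given repository."""
    result = subprocess.run(
        ["git", "diff", "--staged"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git exits with an error
    )
    return result.stdout


def build_review_prompt(diff_text: str) -> str:
    """Wrap a diff in a review request; the <git_diff> tag is an assumption."""
    if not diff_text.strip():
        raise ValueError("The diff is empty; stage some changes first.")
    instructions = (
        "Please put your senior developer hat on and review this git diff. "
        "Focus on code quality, tech debt, and security vulnerabilities."
    )
    return f"{instructions}\n<git_diff>\n{diff_text.rstrip()}\n</git_diff>"
```

`build_review_prompt(staged_diff())` then produces text that can be pasted straight into the review chat. The guard against an empty diff helps catch the case where nothing was staged.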
🔁 Claude Finalizes
I passed Gemini’s suggestions back to Claude (literally copy/paste).
Then ran:
git diff > /tmp/git_diff_latest
Sent that to Gemini for a final pass — it approved.
✅ Shipped.
Changes committed. Feature done. You can see the PR here: feat: Add Image/PDF OCR and Refactor UI Logic
- 🧾 Cost: $2.97
- ⏱️ Time: < 1 hour
- 👨‍💻 Code written manually: 0 lines
🧠 Lessons & Workflow Tips
- Claude is fantastic for coding — especially in small, clean codebases.
- Gemini is killer at reasoning, product design, and high-level planning.
- The combo? 🔥. Design with Gemini, code with Claude.
- I always pass code as XML because AI models seem to understand structure better that way.
- With Claude, commit often — it might overwrite things if the context window overflows.
⚡ Verdict
This was a real productivity boost.
Not because AI replaced the thinking — but because it helped me move fast through the grind.
And, more importantly, it was fun!