Okay, Senior Developer hat is on. Let’s review this diff focusing on quality, potential issues, and security.

Overall Impression:

The implementation follows the “Mode Switcher” approach (Option 3) reasonably well. It adds the required UI elements, backend endpoints, and processing logic for image and PDF OCR. However, there are several areas for improvement regarding code quality (DRY principle), robustness, and potentially a bug in the frontend logic.

Positive Points:

Functionality: The core features (mode switching, conditional UI, new upload types, backend OCR calls) seem to be implemented.
Security (Sanitization): Excellent use of DOMPurify.sanitize for rendering the OCR results on the frontend. This is crucial.
Security (Input Validation): Good input validation on the backend (app.py) for file presence, filename, extensions, and size limits. Frontend validation is also present.
Privacy: The backend OCR implementation processes files directly (in memory or via PIL object) without saving temporary files unnecessarily, respecting the no-data-retention requirement for OCR.
Configuration: Correctly added ocr.py to Dockerfile and Pillow to requirements.txt.
Documentation: README.md updated clearly to reflect the new features.

Areas for Improvement / Concerns:

Frontend JavaScript - DRY Principle Violation (High Priority Tech Debt):
- Drag & Drop: The code for setting up drag and drop event listeners (dragenter, dragover, dragleave, drop, preventDefaults, highlight/unhighlight logic) is almost identical for the audio, image, and PDF upload zones. This is significant duplication.
- File Selection Handlers: handleFileSelect, handleImageSelect, handlePdfSelect are very similar (check file, check size, update UI, reset errors).
- Recommendation: Refactor this duplicated logic. Create helper functions, perhaps one generic setupDragAndDrop(zoneElement, fileInputElement, highlightColor, borderColor) and one generic handleFileSelection(event, infoElement, nameElement, maxSizeMB, errorElement, successCallback) to reduce code drastically and improve maintainability.
Frontend JavaScript - Tab Switching Logic (Potential Bug / Incompleteness):
- The original tabs.forEach(tab => { tab.addEventListener('click', () => {...}); }); block appears unchanged. This listener was likely set up only for the initial tabs present (.tab elements found on page load).
- When the mode switches and the document-ocr-tabs become visible, it’s unclear if this original listener correctly handles clicks on the newly relevant tabs (“Upload Image”, “Upload PDF”). It might only be attached to the audio/video tabs, or it might incorrectly interact with all .tab elements regardless of visibility.
- Recommendation: This needs thorough testing. The tab switching logic likely needs to be adjusted. It should probably operate only on the tabs within the currently active tab header (.audio-video-tabs or .document-ocr-tabs). A better approach might be to attach the listener to the parent tab header and delegate based on the clicked element (event.target.closest('.tab')), checking if it belongs to the active header.
Backend Robustness (ocr.py) - Missing Retries (Medium Priority Tech Debt):
- The transcriber.py wisely uses @retry from the tenacity library for API calls (upload_file, generate_content), as network issues or transient API errors can occur.
- The new ocr.py functions (ocr_image, ocr_pdf) make direct calls to client.models.generate_content without any retry mechanism. These calls are just as susceptible to transient failures.
- Recommendation: Add the same @retry decorator pattern used in transcriber.py to the ocr_image and ocr_pdf functions (or potentially wrap the generate_content call itself within a decorated helper function).
Backend Code Quality (ocr.py):
- from google.genai import types is imported inside the ocr_pdf function. Imports should generally be at the top level of the module for clarity and standard practice, unless there’s a specific reason (like avoiding circular imports, which isn’t the case here).
- Recommendation: Move from google.genai import types to the top of ocr.py.
Backend Efficiency/Security (app.py) - In-Memory Handling:
- ocr_pdf: pdf_content = file.read() reads the entire PDF (up to 20MB) into memory.
- ocr_image: image = PIL.Image.open(file) also loads the image data into memory via Pillow.
- Concern: While functional and respecting privacy (no disk writes), loading large files entirely into memory can lead to high RAM usage, potentially causing Denial of Service (DoS) under concurrent load, especially on resource-constrained environments like the specified 1gb Fly.io VM. Pillow also represents an additional dependency and potential attack surface if vulnerabilities exist in its parsers for complex/malformed image formats.
- Recommendation:
  - (Short-term): Accept this as a known limitation given the current implementation simplicity. Ensure Pillow is kept up-to-date.
  - (Long-term/Ideal): Investigate if the Gemini API (generate_content) can accept file streams or chunked uploads for images/PDFs. If so, refactor to avoid reading the entire file into memory. This would be more scalable and potentially safer. If not, this in-memory approach is likely necessary.
Frontend Code Quality - Naming:
- The main action button is still ID’d transcribe-button, even though it now performs transcription OR OCR.
- The results container is transcript-container, transcript-header, transcript-content, transcript-status.
- Recommendation: Rename these elements to be more generic now that the app does more than transcription (e.g., action-button, result-container, result-header, result-content, result-status). This improves clarity and maintainability.

Summary of Recommendations:

Fix: Address the potential bug in the frontend tab switching logic to ensure it works correctly for both modes.
Refactor (JS): Apply the DRY principle heavily to the frontend JavaScript for Drag & Drop and File Selection handling.
Refactor (JS): Rename UI elements (transcribe-button, transcript-container, etc.) to be mode-agnostic (action-button, result-container).
Add (Backend): Implement @retry logic in ocr.py for robustness.
Fix (Backend): Move the types import to the top level in ocr.py.
Monitor/Consider (Backend): Be aware of the memory implications of reading full files in app.py. Keep Pillow updated. Investigate streaming/chunking if performance/security becomes a concern.

Overall, this is a solid first implementation of the feature, but addressing the duplication and robustness points will significantly improve the quality and maintainability of the codebase. The potential tab switching bug needs immediate attention.