OCR and Document Intelligence

This page documents the server-side OCR helpers in SC/suredms-web-service/src/main/java/com/sureclinical/suredms/ocr. The main classes are OcrUtils, DocumentAiUtils, and VisionApiUtils. Full paths for these and the supporting classes are listed under Entry Points.

Purpose

The OCR layer extracts text from PDFs and images, optionally adds OCR text layers back into generated PDFs, and normalizes page and image handling so other services can consume the results.

Scope

This page focuses on OCR and document intelligence helpers. It does not cover the broader web-service API or desktop conversion utilities unless they feed OCR directly.

Entry Points

SC/suredms-web-service/src/main/java/com/sureclinical/suredms/ocr/OcrUtils.java
SC/suredms-web-service/src/main/java/com/sureclinical/suredms/ocr/OcrSettings.java
SC/suredms-web-service/src/main/java/com/sureclinical/suredms/ocr/DocumentAiUtils.java
SC/suredms-web-service/src/main/java/com/sureclinical/suredms/ocr/VisionApiUtils.java
SC/suredms-web-service/src/main/java/com/sureclinical/suredms/ocr/tesseract/TesseractUtils.java

Primary Components

OcrUtils is the top-level facade. It chooses the engine, reads OCR text, adds OCR layers, and preprocesses pages for OCR runs.
OcrSettings holds processing options such as DPI, page size, orientation, auto-rotate, debug mode, and image colorspace.
DocumentAiUtils wraps Google Document AI integration, including caching of OCR results and PDF generation with OCR overlays.
VisionApiUtils wraps Google Vision OCR and converts OCR responses into the PDF overlay path.
TesseractUtils provides a local OCR path for image-based text extraction.
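The options OcrSettings is described as holding can be pictured as one immutable bundle. The sketch below is hypothetical. The record name, the enum values, the 72 to 1200 DPI range and the defaults are illustrative assumptions, not the actual OcrSettings API.

```java
import java.util.Objects;

/** Hypothetical orientation values; the real OcrSettings may model this differently. */
enum Orientation { AUTO, PORTRAIT, LANDSCAPE }

/** Hypothetical colorspace choices for page images sent to OCR. */
enum ImageColorSpace { RGB, GRAYSCALE, BINARY }

/**
 * Illustrative, immutable bundle of OCR processing options.
 * Field names, validation range and defaults are assumptions.
 */
record OcrSettingsSketch(int dpi,
                         String pageSize,
                         Orientation orientation,
                         boolean autoRotate,
                         boolean debug,
                         ImageColorSpace colorSpace) {

    OcrSettingsSketch {
        // Reject DPI values that would produce unusable page images (assumed bounds).
        if (dpi < 72 || dpi > 1200) {
            throw new IllegalArgumentException("dpi out of range: " + dpi);
        }
        Objects.requireNonNull(pageSize, "pageSize");
        Objects.requireNonNull(orientation, "orientation");
        Objects.requireNonNull(colorSpace, "colorSpace");
    }

    /** Assumed defaults: 300 DPI, Letter, auto orientation with auto-rotate, grayscale. */
    static OcrSettingsSketch defaults() {
        return new OcrSettingsSketch(300, "LETTER", Orientation.AUTO, true, false,
                ImageColorSpace.GRAYSCALE);
    }

    /** Returns a copy with a different DPI; the original instance is unchanged. */
    OcrSettingsSketch withDpi(int newDpi) {
        return new OcrSettingsSketch(newDpi, pageSize, orientation, autoRotate, debug, colorSpace);
    }
}
```

Keeping the settings immutable lets one instance be shared across concurrent OCR runs, with per-run tweaks made through copies such as withDpi.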

Data Flow

Callers pass a PDF or image file into OcrUtils with an engine name.
OcrUtils routes the request to Vision, Tesseract, or Document AI.
The selected engine extracts OCR data or text.
For overlay generation, pages may be preprocessed, split into images, and rendered back into a searchable PDF.
DocumentAiUtils caches OCR output so repeated runs can reuse prior results.
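The routing step can be sketched as follows. This is a hypothetical illustration. The engine names ("vision", "tesseract", "document-ai"), the OcrEngine enum and the OcrRouterSketch class are assumptions, and the returned class names stand in for the real VisionApiUtils, TesseractUtils and DocumentAiUtils calls, whose signatures are not documented here. It also encodes the documented rule that the Tesseract path accepts image inputs only.

```java
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical engine identifiers; the names OcrUtils actually accepts may differ. */
enum OcrEngine {
    VISION, TESSERACT, DOCUMENT_AI;

    /** Parses a caller-supplied name such as "vision" or "document-ai". */
    static OcrEngine fromName(String name) {
        String normalized = name.trim().toUpperCase(Locale.ROOT).replace('-', '_');
        return OcrEngine.valueOf(normalized); // IllegalArgumentException for unknown names
    }
}

final class OcrRouterSketch {
    private OcrRouterSketch() {}

    static boolean isPdf(Path file) {
        return file.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /**
     * Picks the helper class for a file and engine name. Returns a label
     * instead of calling the real helpers, whose APIs are not shown here.
     */
    static String route(Path file, String engineName) {
        OcrEngine engine = OcrEngine.fromName(engineName);
        if (engine == OcrEngine.TESSERACT && isPdf(file)) {
            throw new UnsupportedOperationException(
                    "Tesseract path accepts image inputs only: " + file);
        }
        return switch (engine) {
            case VISION -> "VisionApiUtils";
            case TESSERACT -> "TesseractUtils";
            case DOCUMENT_AI -> "DocumentAiUtils";
        };
    }
}
```

Rejecting unsupported combinations up front gives callers a clear error rather than a failure deep inside an engine.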

Key Behaviors

Vision engine results are serialized page by page into a JSON-like text format.
Tesseract is limited to image inputs for now.
Document AI is the more complete path for PDFs and multi-page document processing.
Page size and DPI settings control image scaling before OCR overlay generation.
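The interaction between page size and DPI comes down to simple arithmetic. PDF page dimensions are measured in points (1/72 inch), so a page rendered at a given DPI needs points / 72 x DPI pixels along each axis. The minimal sketch below shows that calculation plus an aspect-preserving fit. The class and method names are hypothetical and are not taken from OcrUtils.

```java
/** Pixel dimensions of a rendered page. */
record PixelSize(int width, int height) {}

final class PageScaleSketch {
    private static final double POINTS_PER_INCH = 72.0;

    private PageScaleSketch() {}

    /** Converts a length in PDF points (1/72 inch) to pixels at the given DPI. */
    static int pointsToPixels(double points, int dpi) {
        return (int) Math.round(points / POINTS_PER_INCH * dpi);
    }

    /** Pixel size needed to render a page of the given point size at the given DPI. */
    static PixelSize renderSize(double widthPt, double heightPt, int dpi) {
        return new PixelSize(pointsToPixels(widthPt, dpi), pointsToPixels(heightPt, dpi));
    }

    /**
     * Scale factor that fits a srcW x srcH image inside the target page box
     * without distorting its aspect ratio.
     */
    static double fitScale(int srcW, int srcH, PixelSize target) {
        return Math.min((double) target.width() / srcW, (double) target.height() / srcH);
    }
}
```

For example, US Letter (612 x 792 pt) at 300 DPI needs 2550 x 3300 pixels, so a 5100 x 6600 scan would be scaled by 0.5 before the overlay is built.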

Dependencies and Integrations

Google Cloud Vision and Document AI provide OCR services.
iText PDF OCR classes create the final searchable PDF outputs.
Shared image and PDF utilities handle page conversion, resizing, and color-space adjustments.
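The kind of resizing and color-space adjustment these shared utilities perform can be illustrated with the standard java.awt.image API. This is an assumption-based sketch. The project's actual utilities, interpolation settings and target color spaces are not documented here, and PageImageSketch is a hypothetical name.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class PageImageSketch {
    private PageImageSketch() {}

    /** Resizes an image into a new RGB image using bilinear interpolation. */
    static BufferedImage resize(BufferedImage src, int width, int height) {
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose(); // release native graphics resources
        }
        return out;
    }

    /** Converts an image to 8-bit grayscale by drawing it into a TYPE_BYTE_GRAY image. */
    static BufferedImage toGrayscale(BufferedImage src) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

Grayscale page images are smaller to store and upload than RGB, which is one plausible reason to expose colorspace as a setting.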

Edge Cases and Constraints

OcrUtils still contains TODOs for multi-file support and broader PDF handling in the Tesseract path.
DocumentAiUtils can use either an in-memory runtime cache or a static cache folder for development and testing.
VisionApiUtils currently contains a very basic local GoogleOcrEngine adapter.
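The two caching modes described for DocumentAiUtils (an in-memory runtime cache or a static cache folder) can be sketched as below. This page does not say how DocumentAiUtils keys or stores results. The SHA-256 content key, the `<digest>.json` file naming and the OcrResultCacheSketch class are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical OCR result cache keyed by a SHA-256 digest of the input bytes.
 * cacheDir == null: results live in memory for the life of the process.
 * Otherwise: each result is stored as <digest>.json in cacheDir.
 */
final class OcrResultCacheSketch {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final Path cacheDir;

    OcrResultCacheSketch(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    static String digest(byte[] input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(input));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Returns the cached result for the input, calling ocr only on a cache miss. */
    String getOrCompute(byte[] input, Function<byte[], String> ocr) {
        String key = digest(input);
        if (cacheDir == null) {
            return memory.computeIfAbsent(key, k -> ocr.apply(input));
        }
        Path file = cacheDir.resolve(key + ".json");
        try {
            if (Files.exists(file)) {
                return Files.readString(file, StandardCharsets.UTF_8);
            }
            String result = ocr.apply(input);
            Files.createDirectories(cacheDir);
            Files.writeString(file, result, StandardCharsets.UTF_8);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keying on file content rather than file name means a renamed but identical document still hits the cache. The folder mode also survives restarts, which suits the development and testing use described above.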
