Archive Parsing Pipeline
This page describes how SureDrive archive content is parsed into the internal archive model. The main orchestration class is SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/ArchiveParserManager.java, which selects a parser implementation from the ArchiveParserService registry.
Purpose
The parsing pipeline normalizes files and streams into an ArchiveParsingResult that contains an ArchiveModel, parser warnings, and any fixed-up metadata. It is the shared import path for workbook-driven content models and related archive definitions.
Scope
This page focuses on parsing orchestration and model conversion. It does not document the desktop runtime import UI or OCR processing.
Entry Points
- SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/ArchiveParserManager.java
- SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/engine/ArchiveParserService.java
- SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/ArchiveParserContext.java
- SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/model/ArchiveModel.java
- SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/engine/ArchiveParserErrorPolicy.java
Primary Components
- ArchiveParserManager discovers available ArchiveParserService implementations through ServiceProvider and selects one based on the declared input type.
- ArchiveParserService is the parser contract. Implementations advertise the ArchiveParserInputType values they support and return an ArchiveParsingResult.
- ArchiveParsingContext carries request-scoped details such as the username and archive id.
- ArchiveModel is the intermediate model that accumulates parsed content-model metadata, categories, folders, organizations, people, roles, documents, and discrepancy types.
- ArchiveParsingResult contains the archive model and any parser errors or warnings.
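The contract and the selection step can be sketched in Java. This is a minimal illustration under stated assumptions, not the SureDMS source. The following are hypothetical:

- the method names supportedInputTypes and parse
- the reduced record shapes
- the XLSX input type
- the ParserSelector helper

The real manager obtains its parser list from ServiceProvider. Here the list is passed to the constructor so that the example runs on its own.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical subset of input types; only SAVE is named on this page.
enum ArchiveParserInputType { SAVE, XLSX }

// Request-scoped details (username, archive id), simplified for the sketch.
record ArchiveParsingContext(String username, String archiveId) {}

// Greatly simplified stand-in for the intermediate model.
record ArchiveModel(String contentModelName) {}

// Shared result contract: the model plus any warnings.
record ArchiveParsingResult(ArchiveModel model, List<String> warnings) {}

// Hypothetical shape of the parser contract; the real interface may differ.
interface ArchiveParserService {
    Set<ArchiveParserInputType> supportedInputTypes();

    ArchiveParsingResult parse(InputStream source, ArchiveParsingContext context);
}

// Hypothetical helper: returns the first registered parser that advertises the type.
final class ParserSelector {
    private final List<ArchiveParserService> parsers;

    // In SureDMS the list is discovered through ServiceProvider; here it is injected.
    ParserSelector(List<ArchiveParserService> parsers) {
        this.parsers = new ArrayList<>(parsers);
    }

    ArchiveParserService select(ArchiveParserInputType type) {
        for (ArchiveParserService parser : parsers) {
            if (parser.supportedInputTypes().contains(type)) {
                return parser;
            }
        }
        throw new IllegalArgumentException("No parser registered for input type " + type);
    }
}
```

In this sketch, callers select by advertised type and never reference concrete classes such as ExcelArchiveParser. Supporting a new file kind therefore only means registering another implementation that returns the same result contract.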
Data Flow
- ArchiveParserManager.parse(...) accepts a file or a stream, optionally together with an explicit ArchiveParserInputType.
- The manager resolves the input type from the MIME type, or uses the explicit caller-supplied type.
- The selected parser implementation parses the source into an ArchiveParsingResult.
- If FIX_ERRORS_IF_POSSIBLE is active, the manager fills in missing property metadata defaults.
- The calling feature consumes the result to build the final archive objects or to display validation issues.
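The data flow above can be condensed into one orchestration method. This is a sketch, not the real ArchiveParserManager. It makes these simplifying assumptions:

- The result holds only property metadata and warnings.
- Parsers are modeled as Supplier instances keyed by input type.
- An explicit type takes precedence over the MIME type.

The enums are hypothetical apart from the SAVE and FIX_ERRORS_IF_POSSIBLE names taken from this page. The STRICT value, the MIME mapping, the STRING default, and the warning text are also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical input types; only SAVE is named on this page.
enum InputType { SAVE, XLSX }

// Hypothetical policy values; only FIX_ERRORS_IF_POSSIBLE is named on this page.
enum ErrorPolicy { STRICT, FIX_ERRORS_IF_POSSIBLE }

// One property definition; a null valueType means the source left it blank.
record PropertyMetadata(String name, String valueType) {}

record ParsingResult(List<PropertyMetadata> properties, List<String> warnings) {}

// Hypothetical orchestration mirroring the data-flow steps above.
final class ParsePipelineSketch {
    // Hypothetical MIME mapping used only for this sketch.
    private static final Map<String, InputType> BY_MIME = Map.of(
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", InputType.XLSX);
    // Hypothetical default applied to blank property value types.
    static final String DEFAULT_VALUE_TYPE = "STRING";

    private final Map<InputType, Supplier<ParsingResult>> parsers;

    ParsePipelineSketch(Map<InputType, Supplier<ParsingResult>> parsers) {
        this.parsers = parsers;
    }

    // Resolve the type: an explicit caller-supplied type wins (assumption), else use the MIME type.
    static InputType resolve(String mimeType, InputType explicitType) {
        if (explicitType != null) {
            return explicitType;
        }
        if (mimeType == null) {
            return InputType.SAVE; // Map.of lookups reject null keys
        }
        return BY_MIME.getOrDefault(mimeType, InputType.SAVE);
    }

    ParsingResult parse(String mimeType, InputType explicitType, ErrorPolicy policy) {
        InputType type = resolve(mimeType, explicitType);
        Supplier<ParsingResult> parser = parsers.get(type);
        if (parser == null) {
            throw new IllegalStateException("No parser for input type " + type);
        }
        // The selected parser produces the shared result contract.
        ParsingResult raw = parser.get();
        if (policy != ErrorPolicy.FIX_ERRORS_IF_POSSIBLE) {
            return raw; // strict: hand the result back untouched
        }
        // Fix-up: fill missing property metadata with a default and record a warning.
        List<PropertyMetadata> fixed = new ArrayList<>();
        List<String> warnings = new ArrayList<>(raw.warnings());
        for (PropertyMetadata p : raw.properties()) {
            if (p.valueType() == null) {
                fixed.add(new PropertyMetadata(p.name(), DEFAULT_VALUE_TYPE));
                warnings.add("Defaulted value type for " + p.name());
            } else {
                fixed.add(p);
            }
        }
        return new ParsingResult(fixed, warnings);
    }
}
```

In this sketch the fix-up runs once, after parsing, so individual parsers stay strict. The policy is applied in a single place, which matches the page's description of the manager filling in defaults.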
Key Behaviors
- Input type resolution is centralized in the manager rather than scattered across individual callers.
- Parsers can be specialized for different file kinds while still returning the same result contract.
- The error policy (ArchiveParserErrorPolicy) determines whether the parser preserves strict validation errors or fills in defaults where possible.
Dependencies and Integrations
- ArchiveParserManager depends on ServiceProvider to find parser implementations.
- ArchiveModel depends on shared entity enums such as ContentModelVersion, NumberingScheme, RequirementType, PropertyValueType, and MetadataType.
- ExcelArchiveParser and the specialized parsers in suredms-parser implement the concrete workbook handling.
- Desktop and import tools in other modules reuse the same parser contracts when they process archive definitions.
Edge Cases and Constraints
- Unknown MIME types fall back to ArchiveParserInputType.SAVE.
- Parsing a file in an unsupported office format throws a clear parsing error.
- The manager logs content-model details after parsing, which helps diagnose unexpected workbook inputs.
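The first two edge cases can be illustrated with a small MIME-type resolver. The following assumptions are for illustration only:

- An office format counts as "unsupported" when its MIME type is recognized but has no registered parser. Legacy .xls and .doc are used as examples here.
- A missing MIME type is treated like an unknown one.

The resolver class, the exception type, and the XLSX value are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical input types; only SAVE is named on this page.
enum ArchiveInputType { SAVE, XLSX }

// Hypothetical exception standing in for the real parsing error.
final class UnsupportedArchiveFormatException extends RuntimeException {
    UnsupportedArchiveFormatException(String message) {
        super(message);
    }
}

final class MimeTypeResolverSketch {
    // Hypothetical: MIME types that have a registered parser.
    private static final Map<String, ArchiveInputType> SUPPORTED = Map.of(
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", ArchiveInputType.XLSX);

    // Hypothetical: office formats that are recognized but not parsed.
    private static final Set<String> UNSUPPORTED_OFFICE = Set.of(
        "application/vnd.ms-excel", // legacy .xls
        "application/msword");      // legacy .doc

    static ArchiveInputType resolve(String mimeType) {
        if (mimeType != null) {
            ArchiveInputType supported = SUPPORTED.get(mimeType);
            if (supported != null) {
                return supported;
            }
            // Recognized office format without a parser: fail loudly instead of guessing.
            if (UNSUPPORTED_OFFICE.contains(mimeType)) {
                throw new UnsupportedArchiveFormatException("Unsupported office format: " + mimeType);
            }
        }
        // Anything else, including a missing MIME type, falls back to SAVE.
        return ArchiveInputType.SAVE;
    }
}
```

In this sketch, a recognized but unsupported format is kept separate from a truly unknown type. Only the unknown case falls back silently; the unsupported case surfaces a parsing error.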