Archive Parsing Pipeline

This page describes how SureDrive archive content is parsed into the internal archive model. The main orchestration class is SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/ArchiveParserManager.java, which selects a parser implementation from the ArchiveParserService registry.

Purpose

The parsing pipeline normalizes files and streams into an ArchiveParsingResult that contains an ArchiveModel, parser warnings, and any fixed-up metadata. It is the shared import path for workbook-driven content models and related archive definitions.

Scope

This page focuses on parsing orchestration and model conversion. It does not document the desktop runtime import UI or OCR processing.

Entry Points

  • SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/ArchiveParserManager.java
  • SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/engine/ArchiveParserService.java
  • SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/ArchiveParserContext.java
  • SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/model/ArchiveModel.java
  • SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/engine/ArchiveParserErrorPolicy.java

Primary Components

  • ArchiveParserManager discovers available ArchiveParserService implementations through ServiceProvider and selects one based on the declared input type.
  • ArchiveParserService is the parser contract. Implementations advertise supported ArchiveParserInputType values and return an ArchiveParsingResult.
  • ArchiveParserContext carries request-scoped details such as username and archive id.
  • ArchiveModel is the intermediate model that accumulates parsed content model metadata, categories, folders, organizations, people, roles, documents, and discrepancy types.
  • ArchiveParsingResult contains the archive model and any parser errors or warnings.
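
The relationship between the parser contract and the manager's selection step can be sketched as follows. This is a minimal, self-contained illustration, not the actual SureDrive code: the nested types InputType, ParserService, ParsingResult, and Registry are hypothetical stand-ins for ArchiveParserInputType, ArchiveParserService, ArchiveParsingResult, and the ServiceProvider lookup, and the WORKBOOK value is invented for the example (only SAVE is named on this page).

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class ParserSelectionSketch {
    // Hypothetical input types; WORKBOOK is an assumed name, SAVE appears on this page.
    enum InputType { WORKBOOK, SAVE }

    // Stand-in for ArchiveParsingResult: a model placeholder plus warnings.
    static final class ParsingResult {
        final String modelName;
        final List<String> warnings = new ArrayList<>();
        ParsingResult(String modelName) { this.modelName = modelName; }
    }

    // Stand-in for the ArchiveParserService contract.
    interface ParserService {
        Set<InputType> supportedTypes();
        ParsingResult parse(byte[] source);
    }

    // Hypothetical implementation that only advertises WORKBOOK.
    static final class WorkbookParser implements ParserService {
        @Override public Set<InputType> supportedTypes() { return EnumSet.of(InputType.WORKBOOK); }
        @Override public ParsingResult parse(byte[] source) {
            return new ParsingResult("workbook-model(" + source.length + " bytes)");
        }
    }

    // Stand-in for the registry: returns the first service that supports the type.
    static final class Registry {
        private final List<ParserService> services;
        Registry(List<ParserService> services) { this.services = services; }

        Optional<ParserService> select(InputType type) {
            return services.stream()
                    .filter(s -> s.supportedTypes().contains(type))
                    .findFirst();
        }
    }
}
```

In this sketch a caller asks the registry for a parser by input type and receives an empty Optional when no implementation advertises that type. How the real manager reacts to a missing implementation is not stated on this page.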

Data Flow

  1. ArchiveParserManager.parse(...) accepts a file or stream, optionally with an explicit ArchiveParserInputType.
  2. The manager resolves the input type from MIME type or the explicit caller-supplied type.
  3. The selected parser implementation parses the source into an ArchiveParsingResult.
  4. If the error policy is FIX_ERRORS_IF_POSSIBLE, the manager fills in defaults for missing property metadata.
  5. The calling feature consumes the result to build the final archive objects or display validation issues.
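
Steps 3 and 4 above can be illustrated with a small sketch of a post-parse fix-up pass. Everything here is an assumption made for illustration: the Result type, the STRICT policy name, the map of property names to value types, and the "STRING" default are hypothetical, and the page does not say which defaults the real manager applies. Only the FIX_ERRORS_IF_POSSIBLE name comes from this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParseFlowSketch {
    // STRICT is an assumed name; FIX_ERRORS_IF_POSSIBLE is named on this page.
    enum ErrorPolicy { STRICT, FIX_ERRORS_IF_POSSIBLE }

    // Hypothetical result: property name -> value type (null means "missing").
    static final class Result {
        final Map<String, String> propertyValueTypes = new LinkedHashMap<>();
        final List<String> warnings = new ArrayList<>();
    }

    // Stand-in for step 3: pretend only properties ending in "Date" declare a type.
    static Result parseSource(List<String> propertyNames) {
        Result result = new Result();
        for (String name : propertyNames) {
            result.propertyValueTypes.put(name, name.endsWith("Date") ? "DATE" : null);
        }
        return result;
    }

    // Stand-in for step 4: fill defaults only when the policy allows it.
    static Result applyPolicy(Result result, ErrorPolicy policy) {
        for (Map.Entry<String, String> entry : result.propertyValueTypes.entrySet()) {
            if (entry.getValue() != null) {
                continue; // metadata already present, nothing to fix
            }
            if (policy == ErrorPolicy.FIX_ERRORS_IF_POSSIBLE) {
                entry.setValue("STRING"); // assumed default, for illustration only
                result.warnings.add("Defaulted value type for " + entry.getKey());
            } else {
                result.warnings.add("Missing value type for " + entry.getKey());
            }
        }
        return result;
    }
}
```

Keeping the fix-up as a separate pass over the result, as sketched here, means individual parsers do not each need their own defaulting logic.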

Key Behaviors

  • Input type resolution is centralized in the manager rather than scattered across individual callers.
  • Parsers can be specialized for different file kinds while still returning the same result contract.
  • The error policy determines whether strict validation errors are preserved or missing values are filled with defaults where possible.

Dependencies and Integrations

  • ArchiveParserManager depends on ServiceProvider to find parser implementations.
  • ArchiveModel depends on shared entity enums such as ContentModelVersion, NumberingScheme, RequirementType, PropertyValueType, and MetadataType.
  • ExcelArchiveParser and the specialized parsers in suredms-parser implement the concrete workbook handling.
  • Desktop and import tools in other modules reuse the same parser contracts when they process archive definitions.

Edge Cases and Constraints

  • Unknown MIME types fall back to ArchiveParserInputType.SAVE.
  • Parsing a file with an unsupported office format throws a clear parsing error.
  • The manager logs content-model details after parsing, which helps diagnose unexpected workbook inputs.
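
The resolution and fallback rules above might look roughly like the following sketch. It is written under several assumptions that this page does not confirm: the WORKBOOK type name, the UnsupportedFormatException class, the choice of application/msword as an example of an unsupported office format, and the treatment of a null MIME type as unknown are all hypothetical. Only the fallback to SAVE for unknown MIME types comes from this page.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class InputTypeResolutionSketch {
    // WORKBOOK is an assumed name; SAVE is the fallback named on this page.
    enum InputType { WORKBOOK, SAVE }

    // Hypothetical exception type standing in for the real parsing error.
    static final class UnsupportedFormatException extends RuntimeException {
        UnsupportedFormatException(String message) { super(message); }
    }

    // Assumed mapping from MIME type to input type.
    private static final Map<String, InputType> BY_MIME = Map.of(
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", InputType.WORKBOOK);

    // Assumed example of an office format the pipeline does not parse.
    private static final Set<String> UNSUPPORTED_OFFICE = Set.of("application/msword");

    static InputType resolve(String mimeType, InputType explicitType) {
        if (explicitType != null) {
            return explicitType; // the caller-supplied type wins
        }
        if (mimeType == null) {
            return InputType.SAVE; // assumption: no MIME type is treated as unknown
        }
        String normalized = mimeType.toLowerCase(Locale.ROOT);
        if (UNSUPPORTED_OFFICE.contains(normalized)) {
            throw new UnsupportedFormatException("Unsupported office format: " + mimeType);
        }
        return BY_MIME.getOrDefault(normalized, InputType.SAVE); // unknown types fall back
    }
}
```

Because resolution lives in one method, callers in this sketch never inspect MIME types themselves, which mirrors the centralized resolution described under Key Behaviors.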