Excel and Template Importers

This page covers the workbook-driven import path used for SureDrive and related content models. The main parser is SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/excel/ExcelArchiveParser.java, with version-specific wrappers such as SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/excel/ExcelEtmf33Parser.java and SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/excel/ExcelQms10Parser.java.

Purpose

The Excel importers translate workbook sheets into an ArchiveModel by reading categories, content types, folders, data property definitions, organizations, people, roles, and discrepancy types.

Scope

This page focuses on workbook parsing and template adapters. It does not cover OCR or XML parsing.

Entry Points

  • SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/excel/ExcelArchiveParser.java
  • SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/excel/ExcelEntityReader.java
  • SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/excel/ExcelSheet.java
  • SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/excel/ISFContentModelArchiveParser.java
  • SC/suredms-xls-parser/src/main/java/com/sureclinical/suredms/parser/ExcelParser.java

Primary Components

  • ExcelArchiveParser reads the workbook, detects sheets, and dispatches each sheet to helper methods.
  • ExcelEntityReader converts rows into parser models and validates folder numbering, parent-child structure, metadata values, and sheet-specific rules.
  • ExcelSheet abstracts Apache POI sheet access and normalizes cell reading, column counting, and header lookup.
  • ISFContentModelArchiveParser is the simplified parser variant for ISF content model workbooks.
  • ExcelParser in suredms-xls-parser is the desktop-side workbook parser that produces ArchiveCtx objects and drives the older desktop import pipeline.
  • Version-specific wrappers such as ExcelEtmf33Parser and ExcelQms10Parser load bundled template workbooks and then hand them to ExcelArchiveParser.

Data Flow

  1. A workbook is opened with Apache POI.
  2. The parser checks for known sheet names such as properties, annotations, categories, folders, organizations, persons, roles, users, and discrepancy types.
  3. ExcelEntityReader maps rows into parser models and attaches annotations or validation errors.
  4. Parsed models are assembled into an ArchiveModel.
  5. Template-specific parsers set the content model version, name, and date after parsing.
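
The flow above can be sketched with plain Java collections standing in for Apache POI sheets. This is only an illustration under stated assumptions, not the actual ExcelArchiveParser code: SketchArchiveModel, SheetDispatchSketch, the alias table, and the "people" legacy label are hypothetical names chosen for the example, and skipping unknown sheets is a choice made here rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for ArchiveModel: collects raw rows per entity kind.
final class SketchArchiveModel {
    final Map<String, List<List<String>>> entities = new LinkedHashMap<>();
    String name;    // set by a template-specific wrapper after parsing
    String version; // set by a template-specific wrapper after parsing
}

final class SheetDispatchSketch {
    // Maps a normalized sheet label to a canonical entity kind.
    // "people" is an assumed legacy label used only for illustration.
    private static final Map<String, String> SHEET_ALIASES = Map.of(
            "categories", "categories",
            "folders", "folders",
            "organizations", "organizations",
            "persons", "persons",
            "people", "persons",
            "roles", "roles");

    // Steps 2 to 4: recognize known sheets, collect their rows, assemble the model.
    static SketchArchiveModel parse(Map<String, List<List<String>>> workbook) {
        SketchArchiveModel model = new SketchArchiveModel();
        for (Map.Entry<String, List<List<String>>> sheet : workbook.entrySet()) {
            String label = sheet.getKey().trim().toLowerCase(Locale.ROOT);
            String kind = SHEET_ALIASES.get(label);
            if (kind == null) {
                continue; // unknown sheet: skipped in this sketch
            }
            model.entities.computeIfAbsent(kind, k -> new ArrayList<>())
                    .addAll(sheet.getValue());
        }
        return model;
    }

    // Step 5: a template wrapper delegates to the generic parse, then stamps metadata.
    static SketchArchiveModel parseTemplate(
            Map<String, List<List<String>>> workbook, String name, String version) {
        SketchArchiveModel model = parse(workbook);
        model.name = name;
        model.version = version;
        return model;
    }
}
```

Keeping the alias lookup in one table means a new or renamed sheet label only needs a new map entry, while the per-kind handling stays unchanged.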

Key Behaviors

  • The parser supports both legacy and current sheet labels, which lets older workbook templates continue to work.
  • If FIX_ERRORS_IF_POSSIBLE is enabled, the parser tolerates some input formatting issues and fills in sensible defaults instead of reporting an error.
  • Folder and content-type ids are validated to prevent duplicate ids and invalid parent relationships.
  • The desktop ExcelParser follows a parallel but separate path and builds ArchiveCtx for client-side processing.
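
The id validation mentioned above can be illustrated with a minimal sketch. It assumes each folder row carries an id and an optional parent id. FolderRow, FolderIdValidationSketch, and the error message wording are hypothetical and do not mirror the real ExcelEntityReader.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical row model: an id plus an optional parent id (null for top-level folders).
final class FolderRow {
    final String id;
    final String parentId;

    FolderRow(String id, String parentId) {
        this.id = id;
        this.parentId = parentId;
    }
}

final class FolderIdValidationSketch {
    // Returns one message per problem; an empty list means the rows passed both checks.
    static List<String> validate(List<FolderRow> rows) {
        List<String> errors = new ArrayList<>();
        Set<String> seen = new HashSet<>();

        // First pass: every id may appear only once.
        for (FolderRow row : rows) {
            if (!seen.add(row.id)) {
                errors.add("Duplicate id: " + row.id);
            }
        }

        // Second pass: a parent must exist and must not be the row itself.
        for (FolderRow row : rows) {
            if (row.parentId == null) {
                continue;
            }
            if (row.parentId.equals(row.id)) {
                errors.add("Folder " + row.id + " is its own parent");
            } else if (!seen.contains(row.parentId)) {
                errors.add("Unknown parent " + row.parentId + " for " + row.id);
            }
        }
        return errors;
    }
}
```

Collecting messages instead of throwing on the first problem matches the idea of attaching validation errors to parsed rows, so a user can fix a workbook in one pass.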

Dependencies and Integrations

  • Apache POI provides workbook access.
  • Parser models in SC/suredms-parser/src/main/java/com/sureclinical/suredms/parser/model are the intermediate structures.
  • Shared entity enums and helper utilities supply numbering, metadata, and validation behavior.
  • ServiceProvider is used by the wider parser system to select parser implementations.

Edge Cases and Constraints

  • Unsupported Office XML files raise a specific parsing error that asks the user to convert the workbook.
  • An empty row terminates the scan of a sheet, so data placed below a blank row is not read.
  • ExcelSheet treats formula, numeric, boolean, and blank cells differently so values stay consistent across sheets.
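
The last two points can be sketched without Apache POI by modeling a cell as a plain Object (String, Double, Boolean, or null for blank). This is an assumption-laden illustration, not the ExcelSheet implementation: RowScanSketch and its methods are hypothetical, formula cells are left out, and the exact normalization rules (for example printing whole numbers without a trailing ".0") are choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class RowScanSketch {
    // Normalizes a raw cell value to text, with one branch per cell kind.
    static String cellText(Object cell) {
        if (cell == null) {
            return ""; // blank cell
        }
        if (cell instanceof Double) {
            double d = (Double) cell;
            // Whole numbers print without ".0" so an id typed as 12 stays "12".
            if (d == Math.rint(d) && Math.abs(d) < 1e15) {
                return String.valueOf((long) d);
            }
            return String.valueOf(d);
        }
        if (cell instanceof Boolean) {
            return ((Boolean) cell) ? "true" : "false";
        }
        return cell.toString().trim(); // text cell
    }

    // A row is empty when every cell normalizes to the empty string.
    static boolean isEmptyRow(List<Object> row) {
        for (Object cell : row) {
            if (!cellText(cell).isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Reads rows in order and stops at the first empty row.
    static List<List<String>> readUntilEmptyRow(List<List<Object>> rows) {
        List<List<String>> result = new ArrayList<>();
        for (List<Object> row : rows) {
            if (isEmptyRow(row)) {
                break; // rows after the first empty row are never visited
            }
            List<String> texts = new ArrayList<>();
            for (Object cell : row) {
                texts.add(cellText(cell));
            }
            result.add(texts);
        }
        return result;
    }
}
```

One practical consequence of stopping at the first empty row is that a stray blank line in the middle of a sheet silently truncates the import, so template authors should keep data rows contiguous.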