
Document Parsing System

This platform automates data extraction from financial documents and is designed to streamline workflows for collection agency managers. It processes uploaded documents such as payment schedules and fee schedules.

OCR converts uploaded documents into structured JSON, making them immediately actionable for downstream systems.

Primary Objective: Convert unstructured document data into structured data that enables automated, high-speed business workflows.


1. Dashboard — System Overview

The Dashboard provides a consolidated, real-time view of document parsing operations. Administrators and managers can use it to monitor document activity and system performance.

The Document Hub menu is located in the left-hand panel of the Home screen, with four navigation options:

  • Dashboard (Home Screen)
  • Upload
  • Documents
  • Analytics

To upload documents, click Upload in the left-hand menu. One or more documents can be uploaded at once (up to 5 files per batch).

Dashboard Overview

The Dashboard Overview contains four metric sections.

Total Documents

Shows the cumulative number of documents the user has uploaded to date, including all historical files.

Processing Queue

Displays the number of documents currently waiting for processing. This is particularly useful during batch uploads, when it helps users estimate how long the remaining queued documents will take to process.

Processed Today

Indicates how many documents were successfully processed during the current day. This is useful for daily reporting, providing an at-a-glance count for operational logs.

Success Rate

Shows the percentage of documents processed successfully. This is especially useful for identifying upload failures caused by unsupported file size or format.

Tip: A declining success rate may indicate document formatting issues or OCR extraction challenges.
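
As a rough illustration of how such a percentage could be derived, the sketch below assumes the success rate is the share of finished documents that did not fail. The function name and the choice of denominator are assumptions; the platform's exact formula is not stated here.

```python
def success_rate(succeeded: int, failed: int) -> float:
    """Return the percentage of finished documents that succeeded.

    Assumption: only documents that finished (succeeded or failed) count
    toward the denominator; documents still in the queue are ignored.
    """
    finished = succeeded + failed
    if finished == 0:
        return 0.0  # nothing processed yet; avoid dividing by zero
    return round(100.0 * succeeded / finished, 1)
```

Under this definition, 47 successes and 3 failures give a success rate of 94.0.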

Status Distribution

Documents are classified according to their current stage in the processing lifecycle. Typical statuses include:

  • Uploaded
  • Processing
  • Processed
  • Review Required
  • Approved
  • Rejected
  • Failed

Note: Status distribution helps operations teams quickly identify bottlenecks in the processing pipeline.
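
A status distribution of this kind can be sketched as a simple count per status. The example below is illustrative only: the function name is hypothetical, and the input is assumed to be a plain list of status strings, one per document.

```python
from collections import Counter

# Status names taken from the list above.
STATUSES = ["Uploaded", "Processing", "Processed", "Review Required",
            "Approved", "Rejected", "Failed"]

def status_distribution(document_statuses):
    """Count documents per status, reporting zero for unused statuses."""
    counts = Counter(document_statuses)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"Unknown status values: {sorted(unknown)}")
    # Keep the lifecycle order so the result reads like the pipeline.
    return {status: counts.get(status, 0) for status in STATUSES}
```

A count that keeps growing in one status, for example Review Required, would point to the stage where documents are piling up.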

Processing Performance

The dashboard tracks the following performance indicators:

  • Average document processing time
  • Processing throughput

Info: Processing time is reported in milliseconds for finer precision.

System Health

System health indicators monitor the status of core infrastructure components.

Health Checks:

  • Database connectivity
  • OCR service availability
  • File storage availability

Warning: If the OCR service or database becomes unavailable, document processing will fail.
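
The dependency described in the warning can be expressed as a small gate. This is a minimal sketch: the component keys and the function name are hypothetical, and because the warning names only the OCR service and the database, the sketch does not treat file storage as blocking. Whether a storage outage also stops processing is not stated.

```python
# Components the warning above names as required for processing.
REQUIRED_FOR_PROCESSING = ("database", "ocr_service")

def can_process(health: dict) -> bool:
    """True only if every required component reports healthy.

    A component missing from the health report is treated as unavailable.
    """
    return all(health.get(name, False) for name in REQUIRED_FOR_PROCESSING)
```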


2. Document Upload

The Document Upload screen allows authorized users to submit documents for parsing.

Supported File Types

Currently Supported

  • PDF

Planned Future Support

  • CSV
  • Excel
  • Image files (PNG, JPG)

Warning: Currently, only PDF files are accepted. Unsupported formats will be rejected during upload.

Upload Constraints

Constraint                        Value
Maximum file size                 10 MB
Maximum files per batch upload    5 files

Note: Files that exceed these limits do not enter the upload queue. Files in the queue have already passed validation for file type (PDF) and size (10 MB or less).
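
The constraints above could be checked before files are queued, roughly as follows. This is a sketch under several assumptions: the function and constant names are hypothetical, file type is judged by extension only, 1 MB is taken as 1024 * 1024 bytes, a file of exactly 10 MB is accepted, and an oversized batch is rejected as a whole rather than trimmed.

```python
from pathlib import Path

MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # 10 MB (assumes 1 MB = 1024 * 1024 bytes)
MAX_FILES_PER_BATCH = 5
ALLOWED_EXTENSIONS = {".pdf"}

def validate_batch(files):
    """Split (name, size_in_bytes) pairs into accepted names and rejections.

    Returns (accepted, rejected), where rejected maps a file name to a reason.
    """
    if len(files) > MAX_FILES_PER_BATCH:
        raise ValueError(f"A batch may contain at most {MAX_FILES_PER_BATCH} files")
    accepted, rejected = [], {}
    for name, size in files:
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            rejected[name] = "unsupported file type"
        elif size > MAX_FILE_SIZE_BYTES:
            rejected[name] = "file exceeds 10 MB"
        else:
            accepted.append(name)
    return accepted, rejected
```

Only the accepted names would then appear in the upload queue; the rejection reasons could be shown to the user next to each file.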

Upload Workflow

Step 1 — Select File

Users choose files from their local system using one of the following methods:

  • Drag-and-drop upload
  • File browser selection

Step 2 — Upload Queue Preview

Selected files appear in the upload queue preview before processing begins.

Tip: Multiple documents can be uploaded together for batch processing.

Step 3 — Start Upload

When the user clicks the Upload button:

  • Files are uploaded to the server.
  • The processing pipeline begins.

Warning: Files remain in the queue until the user explicitly triggers the upload.

Step 4 — Upload Status

Documents display real-time upload status. Possible statuses include:

  • Queued
  • Uploading
  • Processing
  • Completed
  • Failed

3. Document Library & Parsing

The Document Library stores all uploaded documents and provides access to parsing results. This screen is the core operational interface for reviewing parsed data.

Document Repository

Uploaded documents appear in a tabular list with the following columns:

  • Document Name
  • Status
  • File Size
  • Upload Date
  • Processed Date
  • Actions

Document Actions

Users can perform the following actions on documents:

  • Download original document
  • View parsed document details
  • Approve parsed results
  • Reject parsing results
  • Delete document

Note: Only authorized users can approve or reject parsed documents.

Document Lifecycle

Status             Description
Uploaded           File successfully uploaded
Processing         OCR and parsing in progress
Processed          Parsing completed
Review Required    Manual review needed
Approved           Document validated
Rejected           Parsing result rejected
Failed             Processing error occurred
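
One way to reason about the lifecycle is as a set of allowed status transitions. The mapping below is an assumption inferred from the status descriptions; the exact transitions the system enforces are not listed here, and the function name is hypothetical.

```python
# Assumed transitions; the statuses come from the table above, but the
# allowed moves between them are inferred, not documented.
ALLOWED_TRANSITIONS = {
    "Uploaded": {"Processing"},
    "Processing": {"Processed", "Failed"},
    "Processed": {"Review Required"},
    "Review Required": {"Approved", "Rejected"},
    "Approved": set(),
    "Rejected": set(),
    "Failed": set(),
}

def can_transition(current: str, target: str) -> bool:
    """Return True if moving from current to target is allowed in this sketch."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

For example, `can_transition("Review Required", "Approved")` is true in this model, while `can_transition("Approved", "Processing")` is not.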

Document Parsing Details

When a document is opened, the Document Details panel displays the following parsing information.

Metadata:

  • Document ID
  • Storage path
  • File type
  • File size

Processing Information:

  • OCR provider
  • Page count
  • Processing time

Info: The current OCR provider is AWS Textract.

Extracted Data Output

Parsed document data is displayed in JSON format. Example structure:

{
  "key": "Account Name",
  "value": "Dexter & Company",
  "confidence": 0.92,
  "boundingBox": {
    "top": 0.16,
    "left": 0.17,
    "width": 0.38,
    "height": 0.04
  }
}

Extracted Attributes:

Field           Description
Key             Label detected in document
Value           Extracted value
Confidence      OCR confidence score
Bounding Box    Location of field within the document

Tip: Bounding box coordinates allow the system to map extracted data back to the original document location.
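
To show how a bounding box can be mapped back onto a page, the sketch below converts the example field's coordinates to pixels. It assumes the values are fractions of the page width and height (0 to 1), which matches how AWS Textract reports bounding boxes and is consistent with the sample values, but is not confirmed for this system. The function name and the page dimensions are hypothetical.

```python
import json

def bounding_box_to_pixels(field: dict, page_width: int, page_height: int) -> dict:
    """Convert a normalized bounding box to pixel coordinates.

    Assumption: top/left/width/height are fractions of the page size (0 to 1).
    """
    box = field["boundingBox"]
    return {
        "x": round(box["left"] * page_width),
        "y": round(box["top"] * page_height),
        "width": round(box["width"] * page_width),
        "height": round(box["height"] * page_height),
    }

# The example field shown earlier, parsed from JSON.
field = json.loads("""
{
  "key": "Account Name",
  "value": "Dexter & Company",
  "confidence": 0.92,
  "boundingBox": {"top": 0.16, "left": 0.17, "width": 0.38, "height": 0.04}
}
""")

# Hypothetical rendered page size of 1000 x 2000 pixels.
pixels = bounding_box_to_pixels(field, page_width=1000, page_height=2000)
```

The resulting rectangle could be drawn over a rendered page image to highlight where the "Account Name" value was found.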

Document Review Workflow

After parsing completes, documents enter Review Required status. Users can perform one of three actions:

Approve — Confirms that extracted data is correct.

Reject — Indicates parsing errors or incorrect field extraction.

Delete — Removes the document entirely from the system.

Warning: Documents should be reviewed carefully before approval. Approved data may trigger downstream workflows.
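
During review, the confidence score can help a reviewer decide where to look first. The sketch below lists fields whose confidence falls below a cutoff. The threshold value and the function name are hypothetical; no cutoff is defined for this system.

```python
REVIEW_THRESHOLD = 0.90  # hypothetical cutoff, not taken from the system

def fields_needing_attention(fields, threshold=REVIEW_THRESHOLD):
    """Return the keys of extracted fields whose confidence is below threshold."""
    return [f["key"] for f in fields if f["confidence"] < threshold]
```

With the default cutoff, a field with confidence 0.92 would not be flagged, while any field scoring below 0.90 would be returned for closer inspection before approval.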


4. Analytics & Monitoring

The Analytics section provides operational visibility into document processing.

Document Processing Metrics

The Analytics screen displays the following metrics:

  • Total documents processed
  • Processing success rate
  • Failed document count
  • Documents currently in queue

Tip: Monitoring these metrics helps maintain consistent document processing performance.

Recent Document Activity

Displays recently processed documents, including:

  • Document name
  • File type
  • Processing status
  • Upload timestamp
  • Processing duration

Processing Performance Monitoring

Performance analytics measure:

  • Average processing time
  • Document processing throughput

Info: Processing time varies by document size and complexity.
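
The two performance measures can be computed from per-document durations, as in the sketch below. The function names are hypothetical, and expressing throughput as documents per minute is an assumption, since no unit is specified.

```python
from statistics import mean

def average_processing_time_ms(durations_ms):
    """Mean processing time in milliseconds; 0.0 when there is no data."""
    return mean(durations_ms) if durations_ms else 0.0

def throughput_per_minute(document_count, window_seconds):
    """Documents processed per minute over an observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return document_count * 60 / window_seconds
```

For instance, 30 documents processed in a 600-second window corresponds to 3 documents per minute.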

Data Storage Model

The system stores document files and parsed data separately, improving traceability and scalability.

Component             Storage Location
Uploaded documents    File storage
Parsed JSON output    Database
Storage path          Metadata reference
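
The separation between files and parsed data can be pictured as a database record that holds the parsed output plus a path pointing into file storage. The sketch below is illustrative: the class, field, and function names are hypothetical, and a plain dictionary stands in for the real file store.

```python
from dataclasses import dataclass, field

@dataclass
class DocumentRecord:
    """Database record for one document (field names are illustrative)."""
    document_id: str
    storage_path: str        # reference to the file held in file storage
    file_type: str
    file_size_bytes: int
    parsed_fields: list = field(default_factory=list)  # parsed JSON output

def load_original(record: DocumentRecord, file_store: dict) -> bytes:
    """Fetch the original file bytes using the stored path as the lookup key."""
    return file_store[record.storage_path]
```

Because the record carries only a path, the original file can be moved or archived independently of the parsed data, as long as the path reference is updated.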

Future Enhancements

Planned system improvements include:

  • CSV document parsing
  • Excel document parsing
  • Image-based document parsing (PNG, JPG)
  • Improved error detection
  • Automatic payment schedule generation

Tip: Payment schedule generation will enable direct conversion of parsed documents into structured payment workflows.