Document Parsing System
This platform automates data extraction from financial documents and is designed to streamline workflows for collection agency managers. It processes uploaded documents such as payment schedules and fee schedules.
OCR converts uploaded documents into structured JSON, making them immediately actionable for downstream systems.
Primary Objective: Convert unstructured document data into structured data that enables automated, high-speed business workflows.
1. Dashboard — System Overview
The Dashboard provides a consolidated, real-time view of document parsing operations, letting administrators and managers monitor document activity and system performance.
The Document Hub menu is located in the left-hand panel of the Home screen, with four navigation options:
- Dashboard (Home Screen)
- Upload
- Documents
- Analytics
To upload documents, click Upload in the left-hand menu. One or multiple documents can be uploaded at once.
Dashboard Overview
The Dashboard Overview contains four metric sections.
Total Documents
Shows the cumulative number of documents the user has uploaded to date, including all historical files.
Processing Queue
Displays the number of documents currently waiting for processing. This is particularly useful during batch uploads, enabling users to gauge the time required to complete processing of all queued documents.
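As a rough illustration of how queue depth relates to completion time, the sketch below multiplies the queue length by an average processing time. The function name, the `workers` parameter, and the assumption that documents are processed at a steady average rate are all hypothetical; the platform may schedule work differently.

```python
def estimated_queue_seconds(queued_documents: int,
                            avg_processing_ms: float,
                            workers: int = 1) -> float:
    """Estimate seconds needed to drain the queue.

    Assumption: every document takes roughly the average processing time
    and `workers` documents are processed in parallel.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return queued_documents * avg_processing_ms / 1000.0 / workers
```

For example, 12 queued documents at an average of 2,500 ms each would take about 30 seconds with a single worker.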
Processed Today
Indicates how many documents were successfully processed during the current day. This is useful for daily reporting, providing an at-a-glance count for operational logs.
Success Rate
Shows the percentage of documents processed successfully. This is especially useful for identifying upload failures caused by unsupported file size or format.
Tip: A declining success rate may indicate document formatting issues or OCR extraction challenges.
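The success rate can be read as a simple ratio. The sketch below assumes the rate is the share of finished documents (succeeded plus failed) that succeeded; the function name and the choice of denominator are illustrative assumptions, not confirmed platform behavior.

```python
def success_rate(succeeded: int, failed: int) -> float:
    """Return the percentage of finished documents that succeeded.

    Assumption: only documents that finished (succeeded or failed) count
    toward the denominator; the platform may define this differently.
    """
    finished = succeeded + failed
    if finished == 0:
        return 0.0  # nothing processed yet, so avoid dividing by zero
    return round(100.0 * succeeded / finished, 1)
```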
Status Distribution
Documents are classified according to their current stage in the processing lifecycle. Typical statuses include:
- Uploaded
- Processing
- Processed
- Review Required
- Approved
- Rejected
- Failed
Note: Status distribution helps operations teams quickly identify bottlenecks in the processing pipeline.
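A status distribution is essentially a count of documents per status. The following is a minimal sketch, assuming each document is represented as a dictionary with a `status` field; that record shape and the function name are hypothetical.

```python
from collections import Counter

# Statuses as listed in this guide.
STATUSES = [
    "Uploaded", "Processing", "Processed", "Review Required",
    "Approved", "Rejected", "Failed",
]

def status_distribution(documents):
    """Count documents per status, reporting zero for unused statuses."""
    counts = Counter(doc["status"] for doc in documents)
    return {status: counts.get(status, 0) for status in STATUSES}
```

A large count under Processing or Review Required relative to the other statuses would point to the stage where documents are piling up.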
Processing Performance
The dashboard tracks the following performance indicators:
- Average document processing time
- Processing throughput
Info: Processing time is reported in milliseconds for finer-grained measurement.
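The two indicators can be derived from per-document durations. This sketch assumes durations are available in milliseconds and that throughput is expressed as documents per minute over an observation window; both function names and the per-minute unit are assumptions for illustration.

```python
from typing import Sequence

def average_processing_ms(durations_ms: Sequence[float]) -> float:
    """Mean processing time in milliseconds (0.0 when there is no data)."""
    if not durations_ms:
        return 0.0
    return sum(durations_ms) / len(durations_ms)

def throughput_per_minute(doc_count: int, window_seconds: float) -> float:
    """Documents processed per minute over the given window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return doc_count * 60.0 / window_seconds
```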
System Health
System health indicators monitor the status of core infrastructure components.
Health Checks:
- Database connectivity
- OCR service availability
- File storage availability
Warning: If the OCR service or database becomes unavailable, document processing will fail.
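One way to picture these checks is as a set of named probes whose results are combined. The sketch below is an assumption-laden illustration: the probe names, the callable-based interface, and the `can_process` rule (database and OCR service must both be up, per the warning above) are hypothetical and not the platform's actual implementation.

```python
from typing import Callable, Dict

def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run each probe; a probe that raises is reported as unhealthy."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results

def can_process(results: Dict[str, bool]) -> bool:
    """Processing needs both the database and the OCR service."""
    return results.get("database", False) and results.get("ocr_service", False)
```

In practice each probe would wrap a real connectivity test; here any zero-argument function returning a boolean will do.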
2. Document Upload
The Document Upload screen allows authorized users to submit documents for parsing.
Supported File Types
Currently Supported
- PDF
Planned Future Support
- CSV
- Excel
- Image files (PNG, JPG)
Warning: Currently, only PDF files are accepted. Unsupported formats will be rejected during upload.
Upload Constraints
| Constraint | Value |
|---|---|
| Maximum file size | 10 MB |
| Maximum files per batch upload | 5 files |
Note: Files exceeding these limits will not enter the upload queue. Files present in the queue have already passed validation for file type (PDF) and size (10 MB or less).
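The constraints above can be expressed as a small validation routine. This is a sketch under stated assumptions: 10 MB is taken as 10 * 1024 * 1024 bytes, file type is judged by extension only, and files beyond the fifth valid one are rejected individually rather than failing the whole batch. The function name and rejection messages are hypothetical.

```python
import os

MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # assumption: 10 MB in binary units
MAX_FILES_PER_BATCH = 5

def validate_batch(files):
    """Split (filename, size_in_bytes) pairs into accepted and rejected.

    Returns a list of accepted filenames and a dict mapping each rejected
    filename to the reason it was rejected.
    """
    accepted, rejected = [], {}
    for name, size in files:
        extension = os.path.splitext(name)[1].lower()
        if extension != ".pdf":
            rejected[name] = "unsupported file type"
        elif size > MAX_FILE_SIZE_BYTES:
            rejected[name] = "file exceeds 10 MB"
        elif len(accepted) >= MAX_FILES_PER_BATCH:
            rejected[name] = "batch limit of 5 files reached"
        else:
            accepted.append(name)
    return accepted, rejected
```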
Upload Workflow
Step 1 — Select File
Users choose files from their local system using one of the following methods:
- Drag-and-drop upload
- File browser selection
Step 2 — Upload Queue Preview
Selected files appear in the upload queue preview before processing begins.
Tip: Multiple documents can be uploaded together for batch processing.
Step 3 — Start Upload
When the user clicks the Upload button:
- Files are uploaded to the server.
- The processing pipeline begins.
Warning: Files remain in the queue until the user explicitly triggers the upload.
Step 4 — Upload Status
Documents display real-time upload status. Possible statuses include:
- Queued
- Uploading
- Processing
- Completed
- Failed
3. Document Library & Parsing
The Document Library stores all uploaded documents and provides access to parsing results. This screen is the core operational interface for reviewing parsed data.
Document Repository
Uploaded documents appear in a tabular list with the following columns:
- Document Name
- Status
- File Size
- Upload Date
- Processed Date
- Actions
Document Actions
Users can perform the following actions on documents:
- Download original document
- View parsed document details
- Approve parsed results
- Reject parsing results
- Delete document
Note: Only authorized users can approve or reject parsed documents.
Document Lifecycle
| Status | Description |
|---|---|
| Uploaded | File successfully uploaded |
| Processing | OCR and parsing in progress |
| Processed | Parsing completed |
| Review Required | Manual review needed |
| Approved | Document validated |
| Rejected | Parsing result rejected |
| Failed | Processing error occurred |
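The lifecycle can be modeled as a small state machine. The statuses below come from the table above, but the allowed transitions are an assumption inferred from the descriptions in this guide (for example, that Failed is reached only from Processing); the platform's actual rules may differ.

```python
from enum import Enum

class DocStatus(Enum):
    UPLOADED = "Uploaded"
    PROCESSING = "Processing"
    PROCESSED = "Processed"
    REVIEW_REQUIRED = "Review Required"
    APPROVED = "Approved"
    REJECTED = "Rejected"
    FAILED = "Failed"

# Assumed transition map; terminal statuses have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    DocStatus.UPLOADED: {DocStatus.PROCESSING},
    DocStatus.PROCESSING: {DocStatus.PROCESSED, DocStatus.FAILED},
    DocStatus.PROCESSED: {DocStatus.REVIEW_REQUIRED},
    DocStatus.REVIEW_REQUIRED: {DocStatus.APPROVED, DocStatus.REJECTED},
    DocStatus.APPROVED: set(),
    DocStatus.REJECTED: set(),
    DocStatus.FAILED: set(),
}

def can_transition(current: DocStatus, target: DocStatus) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS[current]
```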
Document Parsing Details
When a document is opened, the Document Details panel displays the following parsing information.
Metadata:
- Document ID
- Storage path
- File type
- File size
Processing Information:
- OCR provider
- Page count
- Processing time
Info: The current OCR provider is AWS Textract.
Extracted Data Output
Parsed document data is displayed in JSON format. Example structure:
{
  "key": "Account Name",
  "value": "Dexter & Company",
  "confidence": 0.92,
  "boundingBox": {
    "top": 0.16,
    "left": 0.17,
    "width": 0.38,
    "height": 0.04
  }
}
Extracted Attributes:
| Field | Description |
|---|---|
| Key | Label detected in document |
| Value | Extracted value |
| Confidence | OCR confidence score |
| Bounding Box | Location of field within the document |
Tip: Bounding box coordinates allow the system to map extracted data back to the original document location.
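To show how a bounding box maps back to the page, the sketch below scales the example field to pixel coordinates. It assumes the coordinates are fractions of the page width and height, which the 0 to 1 values in the example suggest. The function names, the page dimensions, and the 0.90 confidence threshold are illustrative assumptions.

```python
import json

# Sample field copied from the example structure in this guide.
SAMPLE_FIELD = json.loads("""
{
  "key": "Account Name",
  "value": "Dexter & Company",
  "confidence": 0.92,
  "boundingBox": {"top": 0.16, "left": 0.17, "width": 0.38, "height": 0.04}
}
""")

def to_pixel_box(field, page_width_px, page_height_px):
    """Scale a normalized box to (left, top, width, height) in pixels."""
    box = field["boundingBox"]
    return (
        round(box["left"] * page_width_px),
        round(box["top"] * page_height_px),
        round(box["width"] * page_width_px),
        round(box["height"] * page_height_px),
    )

def is_low_confidence(field, threshold=0.90):
    """Flag fields whose confidence falls below an assumed threshold."""
    return field["confidence"] < threshold
```

A reviewer-facing viewer could use the pixel box to highlight the extracted field on the rendered page and draw attention to low-confidence fields first.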
Document Review Workflow
After parsing completes, documents enter Review Required status. Users can perform one of three actions:
- Approve: Confirms that extracted data is correct.
- Reject: Indicates parsing errors or incorrect field extraction.
- Delete: Removes the document entirely from the system.
Warning: Documents should be reviewed carefully before approval. Approved data may trigger downstream workflows.
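The approve and reject steps can be sketched as a guarded status change. This is a minimal illustration, assuming a document is a dictionary with a `status` field and that authorization is passed in as a boolean. The function name and error handling are hypothetical, and deletion is left out because this guide does not describe its rules.

```python
def apply_review(document: dict, action: str, is_authorized: bool) -> dict:
    """Return a copy of the document with its status updated by a review.

    Assumptions: only documents in Review Required can be reviewed, and
    only authorized users may approve or reject.
    """
    if action not in ("approve", "reject"):
        raise ValueError("action must be 'approve' or 'reject'")
    if not is_authorized:
        raise PermissionError("user may not approve or reject documents")
    if document["status"] != "Review Required":
        raise ValueError("document is not awaiting review")
    new_status = "Approved" if action == "approve" else "Rejected"
    return {**document, "status": new_status}
```

Returning a copy rather than mutating the input keeps the pre-review record available, which can help when approved data goes on to trigger downstream workflows.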
4. Analytics & Monitoring
The Analytics section provides operational visibility into document processing.
Document Processing Metrics
The Analytics screen displays the following metrics:
- Total documents processed
- Processing success rate
- Failed document count
- Documents currently in queue
Tip: Monitoring these metrics helps maintain consistent document processing performance.
Recent Document Activity
Displays recently processed documents, including:
- Document name
- File type
- Processing status
- Upload timestamp
- Processing duration
Processing Performance Monitoring
Performance analytics measure:
- Average processing time
- Document processing throughput
Info: Processing time varies by document size and complexity.
Data Storage Model
The system stores document files and parsed data separately, improving traceability and scalability.
| Component | Storage Location |
|---|---|
| Uploaded documents | File storage |
| Parsed JSON output | Database |
| Storage path | Metadata reference |
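The separation in the table can be illustrated with two stores linked by a storage path. In this sketch, plain dictionaries stand in for file storage and the database; the `DocumentRecord` class, the function names, and the `documents/<id>.pdf` path layout are hypothetical.

```python
import json
from dataclasses import dataclass

@dataclass
class DocumentRecord:
    document_id: str
    storage_path: str  # metadata reference to the file in file storage
    parsed_json: str   # parsed output kept alongside the metadata

file_storage = {}  # stand-in for file storage: path -> raw bytes
database = {}      # stand-in for the database: document ID -> record

def store_document(document_id: str, content: bytes, parsed) -> DocumentRecord:
    """Save the original file and its parsed output in separate stores."""
    path = f"documents/{document_id}.pdf"  # hypothetical path layout
    file_storage[path] = content
    record = DocumentRecord(document_id, path, json.dumps(parsed))
    database[document_id] = record
    return record

def load_original(document_id: str) -> bytes:
    """Follow the stored path from the database back to the original file."""
    return file_storage[database[document_id].storage_path]
```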
Future Enhancements
Planned system improvements include:
- CSV document parsing
- Excel document parsing
- Image-based document parsing (PNG, JPG)
- Improved error detection
- Automatic payment schedule generation
Tip: Payment schedule generation will enable direct conversion of parsed documents into structured payment workflows.