IDV & Biometrics: OCR Comparison

A deep dive into how FrankieOne compares OCR data with user-provided information to ensure data consistency and prevent fraud.

What is OCR Comparison?

Optical Character Recognition (OCR) extracts text from an image of a document, such as a driver’s license or passport. OCR Comparison is a security step within the IDV workflow in which FrankieOne verifies that the data extracted from the document matches the information provided by the user or already stored on their entity profile.

The primary objective is to ensure the details on the government-issued ID match the entity profile being submitted for verification, preventing “bait-and-switch” fraud scenarios.

How OCR Comparison Works

The comparison logic is triggered in two primary scenarios:

  1. Manual Input First: If a user manually enters their details (e.g., name, DOB) before the document scan, the OCR comparison is triggered immediately after the scan to check for discrepancies.
  2. Biometrics First / Post-Verification Updates: If the IDV process happens first, an entity is created using the initial OCR data. If the user later updates key fields (name, DOB, ID number) on their profile, the system re-runs the OCR comparison logic to check if the changes invalidate the original verification.

The IDV_OCR_COMPARISON Supplementary Data

The result of this check is found within a Process Result Object (PRO) where the supplementaryData object has a type of IDV_OCR_COMPARISON.

"supplementaryData": {
  "type": "IDV_OCR_COMPARISON",
  "outcomeRaw": "suspected",
  "resultMap": {
    "last_name_match": { "resultNormalized": "clear" },
    "date_of_birth_match": { "resultNormalized": "clear" }
  },
  "mismatchMap": {
    "first_name_match": {
      "originalData": "Jon",
      "reviewedData": "John",
      "resultNormalized": "suspected"
    }
  },
  "ocrResultId": "pro_01J...",
  "comparisonSource": "ENTITY"
}

Key Fields:

  • resultMap: An object showing which fields matched successfully.
  • mismatchMap: An object that highlights the specific fields that did not match. This is crucial for debugging SUSPECTED results, as it shows the discrepancy (e.g., original “Jon” vs. OCR “John”).
  • comparisonSource: Indicates what the OCR data was compared against. This will be ENTITY (for existing entity data) or OCR_UPDATE (for data updated post-scan).
  • ocrResultId: The ID of the IDV_OCR Process Result Object that this comparison is based on.
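As an illustration of how these fields might be read, here is a minimal Python sketch. It assumes the Process Result Object has already been fetched and decoded into a dictionary; the function name summarize_ocr_comparison is hypothetical and not part of any FrankieOne SDK. The sample payload mirrors the example above, wrapped in braces so it parses as JSON, with ocrResultId left out because its value is elided in the source.

```python
import json

# Sample payload based on the example above (ocrResultId omitted: its value is elided).
sample = json.loads("""
{
  "supplementaryData": {
    "type": "IDV_OCR_COMPARISON",
    "outcomeRaw": "suspected",
    "resultMap": {
      "last_name_match": {"resultNormalized": "clear"},
      "date_of_birth_match": {"resultNormalized": "clear"}
    },
    "mismatchMap": {
      "first_name_match": {
        "originalData": "Jon",
        "reviewedData": "John",
        "resultNormalized": "suspected"
      }
    },
    "comparisonSource": "ENTITY"
  }
}
""")


def summarize_ocr_comparison(process_result):
    """Hypothetical helper: return (matched_fields, mismatches) for a comparison result."""
    data = process_result.get("supplementaryData", {})
    if data.get("type") != "IDV_OCR_COMPARISON":
        raise ValueError("not an IDV_OCR_COMPARISON result")
    # Fields listed in resultMap matched successfully.
    matched = sorted(data.get("resultMap", {}))
    # Each mismatchMap entry carries both sides of the discrepancy.
    mismatches = [
        (field, detail.get("originalData"), detail.get("reviewedData"))
        for field, detail in sorted(data.get("mismatchMap", {}).items())
    ]
    return matched, mismatches


matched, mismatches = summarize_ocr_comparison(sample)
for field, original, reviewed in mismatches:
    print(f"{field}: {original!r} vs {reviewed!r}")
```

Returning both sides of each mismatch keeps the discrepancy visible to whoever reviews a SUSPECTED result, rather than reducing it to a pass/fail flag.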

Matching Logic and Fuzziness

To account for minor OCR inaccuracies or common typos, the comparison does not always require an exact match. FrankieOne uses the Levenshtein distance algorithm to measure the difference between the user-provided string and the OCR-extracted string.

This “fuzziness” is configurable. The default configuration allows for small discrepancies before a field is flagged as SUSPECTED.

Field                 Default Levenshtein Distance
First Name            2
Last Name             2
Date of Birth         1
Document ID Number    2

There is also a setting, compare_ocr_max_mismatch_levenshtein_distance, which by default allows up to 2 mismatched characters across all fields before the overall result is marked as SUSPECTED.
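The thresholds above can be sketched in Python as follows. This is an illustration under stated assumptions, not FrankieOne's implementation: the dictionary keys and the compare_fields function are hypothetical, strings are compared as-is (no case or whitespace normalization), and the overall limit is read as a cap on the sum of per-field distances, which is only one possible interpretation of compare_ocr_max_mismatch_levenshtein_distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


# Default per-field distances from the table above; the key names are hypothetical.
DEFAULT_ALLOWED_DISTANCE = {
    "first_name": 2,
    "last_name": 2,
    "date_of_birth": 1,
    "document_id_number": 2,
}
# Assumed to cap the SUM of distances across all fields.
MAX_TOTAL_MISMATCH = 2


def compare_fields(user_data: dict, ocr_data: dict):
    """Return ("clear" | "suspected", per-field distances) under the assumptions above."""
    distances = {
        field: levenshtein(user_data[field], ocr_data[field])
        for field in DEFAULT_ALLOWED_DISTANCE
    }
    fields_ok = all(d <= DEFAULT_ALLOWED_DISTANCE[f] for f, d in distances.items())
    total_ok = sum(distances.values()) <= MAX_TOTAL_MISMATCH
    return ("clear" if fields_ok and total_ok else "suspected"), distances


user = {"first_name": "Katherine", "last_name": "Smith",
        "date_of_birth": "1990-01-01", "document_id_number": "A1234567"}
ocr = dict(user, first_name="Katharine")  # one substituted character
outcome, distances = compare_fields(user, ocr)
print(outcome, distances["first_name"])  # clear 1
```

In this sketch a single one-character slip stays within tolerance, while one-character slips in three different fields would sum to 3 and be flagged, even though each field is individually within its limit.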


Result Invalidation

A key security feature of the OCR comparison is the re-check that occurs if a user updates their profile information after the initial IDV process. This protects against a scenario where a fraudster:

  1. Uses their real ID and face to pass an IDV check.
  2. Later, changes the name, DOB, or ID number on their profile to match the details of a stolen identity.
  3. Attempts to use this fraudulent profile to pass other checks like KYC or AML.

If the system detects that a change to a key field exceeds the Levenshtein distance configured for that field (e.g., fuzzy_logic_idv_allowed_distance_family_name), the original OCR and IDV results are invalidated: their systemStatus is changed to STALE or MARKED_INVALID.
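The invalidation rule can be pictured with a short Python sketch. Everything here is an assumption made for illustration: the status_after_update function, the "family_name" key, and the "VALID" placeholder status are hypothetical, and the threshold of 2 is borrowed from the Last Name default rather than confirmed as the value of fuzzy_logic_idv_allowed_distance_family_name. The sketch returns STALE for simplicity; the text does not say when MARKED_INVALID is used instead.

```python
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Edit distance via memoized recursion; adequate for short fields such as names."""
    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return i + j  # remaining characters must all be inserted or deleted
        return min(
            dist(i - 1, j) + 1,
            dist(i, j - 1) + 1,
            dist(i - 1, j - 1) + (a[i - 1] != b[j - 1]),
        )
    return dist(len(a), len(b))


# Hypothetical stand-in for fuzzy_logic_idv_allowed_distance_family_name (value assumed).
ALLOWED = {"family_name": 2}


def status_after_update(verified: dict, updated: dict, allowed: dict) -> str:
    """Return "STALE" if any key field changed by more than its allowed distance."""
    for field, limit in allowed.items():
        if levenshtein(verified[field], updated[field]) > limit:
            return "STALE"
    return "VALID"  # placeholder meaning "results left untouched"


print(status_after_update({"family_name": "Smith"}, {"family_name": "Smyth"}, ALLOWED))
print(status_after_update({"family_name": "Smith"}, {"family_name": "Nguyen"}, ALLOWED))
```

The first call represents a typo correction that stays within tolerance; the second represents a wholesale name change of the kind described in the fraud scenario above.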

Best Practices

  • Check the mismatchMap: When you receive an OCR comparison result of SUSPECTED, always inspect the mismatchMap to understand exactly which field caused the issue.
  • Understand Configuration: Be aware that the definition of a “mismatch” is configurable. A SUSPECTED result does not necessarily mean fraud; it could be a minor OCR error that falls within your organization’s acceptable risk tolerance.
  • Secure Profile Updates: Be mindful of the result invalidation logic. If you allow users to freely edit their profile details after a successful IDV check, an edit to a key field that exceeds the configured distance may trigger an invalidation, requiring them to re-verify.

Additional Resources