Purpose built vs. general purpose OCR – which is better?

Is Tesseract receipt OCR enough? Discover a more intelligent solution.

Published in: Receipt and Invoice OCR

Amazon’s Textract, the open-source Tesseract, and the OCR built into off-the-shelf PDF readers are great for digitising random notes, books, or newspapers… but what if you’re working with receipts and invoices instead?

General purpose OCR is just not going to cut it there because the problem is much more specific, and needs something more than just elbow grease.

Also, you’ll often need to do it at scale. When it’s tax season and you need to digitise and file hundreds of receipts and invoices, all of which contain critical financial data, accuracy alone is not enough: even an OCR algorithm that is 100% accurate every time only returns raw text. It cannot, on its own, extract meaning from receipts in bulk and give you structured, actionable data.

General Purpose OCR: The 30,000 Foot View

To understand TAGGUN’s solution, let’s look at why receipts are a pain without a specialized receipt data engine:

There’s no standardized way of putting information into receipts, or to format them (layouts, spacing, font choice). How do you show vendor headers, subtotals, taxes, totals?
Often, there will be lines on receipts that are simply meant to be ignored – maybe because of regulations as old as the POS machine, and the software either hasn’t been updated, or can’t be.
Scans or images of receipts won’t be perfect. There will be lighting/angle/blur issues, and you’d have to first edit and fix each image to even make a Computer Vision (CV) analysis possible.
The biggest challenge by far is interpreting meaning once you have the text. How do you even know you’re looking at a receipt? How do you know which sequences of characters are vendor names, geolocation, item names, codes, or the price of each item? How do you then extract them as structured data that can be acted upon?
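To make the context problem concrete, here is a minimal Python sketch of the template-style approach: a regular expression that pulls the total out of raw OCR text. The two receipt texts, the pattern, and the `naive_total` helper are all invented for illustration. The point is only that a rule written for one layout silently fails on another.

```python
import re
from typing import Optional

# Hypothetical raw OCR output from two receipts with different layouts.
RECEIPT_A = """CORNER CAFE
Latte        4.50
Muffin       3.25
TOTAL        7.75
"""

RECEIPT_B = """Green Grocer Ltd
Apples 2.10
Subtotal 2.10
Tax 0.21
Amount due: 2.31
Total items: 1
"""

# A rule tuned to layout A: a line that reads "TOTAL <amount>".
TOTAL_PATTERN = re.compile(
    r"^total\s+(\d+\.\d{2})\s*$", re.IGNORECASE | re.MULTILINE
)


def naive_total(ocr_text: str) -> Optional[float]:
    """Return the amount on a 'TOTAL <amount>' line, or None if there is none."""
    match = TOTAL_PATTERN.search(ocr_text)
    return float(match.group(1)) if match else None


print(naive_total(RECEIPT_A))  # 7.75
print(naive_total(RECEIPT_B))  # None: this vendor prints "Amount due" instead
```

Receipt B contains the word "total" twice ("Subtotal" and "Total items"), yet neither line is the amount owed. Telling those apart is the context that plain text extraction does not give you.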

So the key here is context – and general purpose OCR cannot extract that from receipts for you.

Sure, you can throw more resources at general purpose tools, but this problem space is just too specific for them to cover meaningfully. They can extract text blocks from images, and if you feed them templates they’ll be decent at parsing receipts from one – maybe two – specific vendors. But every new vendor or layout needs another template, so the effort skyrockets; forget ever doing it at scale.

The only real solution is one that has been trained to know what receipts and invoices are, what they look like, and how they are meant to be read…one that also learns, adapts, and gets better each time it does so.

And that’s where we come in

Our receipt scanning API combines state-of-the-art OCR with cognitive learning technology – a combination of machine learning, regular expressions, NLP, and fuzzy-matching for logically related fields – that can:

scan and parse receipts in 205 languages, without needing templates,
extract information while preserving its context,
and output structured data that can be easily read by humans and machines alike.

Best of all? It operates as a traditional RESTful API in the cloud. You simply send images of receipts – photographs or scanned copies in JPEG, PDF, PNG8, PNG24, GIF, or HEIF format, or just the URL of an image on your servers – and within seconds you get back JSON results with an accuracy rate of 90%+, processed in real time.

Now, the million dollar question.

Why does TAGGUN work better than General Purpose OCR?

Purpose Built

TAGGUN has been extensively trained on millions of receipts. We can parse structured, contextual data from just pictures of receipts – even if they are in a previously unseen format, or a completely different language.

For example, if you’re trying to budget for your household, all you have to do is:

Point your phone at each receipt
Take a picture
Send it to TAGGUN
Get back data as simple (just the totals) or as complex (vendor information, quantities and prices for each item, taxes, etc.) as you want
Move on to the next one.

It’s literally that simple. With TAGGUN, you’re saving precious time and effort in picking through a mountain of receipts.

Even if general purpose OCR APIs worked with 100% accuracy, you’d get a long block of digitised text that you’d still have to scroll through to find the data you want – not nearly as useful as separate fields clearly marking date of purchase, vendor name and location, item ID, name, taxes, and cost.

Only a specialised engine can create structured data from unstructured receipts (which might even be handwritten!).

In fact, our platform can recognise and extract a large number of receipt data points unique to certain countries or regions. For example, the VAT validation API field streamlines the process of verifying VAT numbers directly from scanned receipts. This ensures that financial records are not only digitised but also validated for legitimacy, providing an extra layer of fraud prevention.

Accurate, And Fast

Imagine having all that AI-powered goodness… and then combining it with Google Vision AI/Azure Cognitive Services, the two best raw OCR technologies. TAGGUN gives you a powerful solution with 90%+ accurate results in under 5 seconds, with confidence metrics for each extracted field, so your team can inspect and review fields with low confidence scores and push accuracy even higher.
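Here is a minimal Python sketch of how a team might use per-field confidence scores to decide what needs a human look. The response shape (a `value` and a `confidence` per field), the field names, and the 0.8 threshold are assumptions made for this example, not TAGGUN’s documented schema.

```python
from typing import Any, Dict, List

# Hypothetical parsed result: each field has a value and a confidence in [0, 1].
SAMPLE_RESULT: Dict[str, Dict[str, Any]] = {
    "vendor_name": {"value": "Corner Cafe", "confidence": 0.97},
    "date": {"value": "2024-03-14", "confidence": 0.58},
    "total": {"value": 7.75, "confidence": 0.97},
    "tax": {"value": 0.70, "confidence": 0.41},
}


def fields_needing_review(
    result: Dict[str, Dict[str, Any]], threshold: float = 0.8
) -> List[str]:
    """Return the names of fields whose confidence is below the threshold."""
    return sorted(
        name for name, field in result.items() if field["confidence"] < threshold
    )


print(fields_needing_review(SAMPLE_RESULT))  # ['date', 'tax']
```

Everything at or above the threshold flows straight through, so reviewers only spend time on the handful of fields the engine itself is unsure about.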

It’s the difference between having a middle schooler read an article about the stock market vs. an actual investment professional. Both can read it, but only the latter can understand it – and do it much faster because they know what they’re looking for.

When you’re this good, other solutions would have to brute-force the problem with human labour, such as Amazon Mechanical Turk, to be more accurate. And at that point, they would no longer be viable no-queue, real-time solutions that you could integrate anywhere.

Built-In Pre And Post Processing

When digitising receipts and invoices for your business, you’ll often run into ones that are low quality, or have artifacts. Most will come from non-employees, so you can’t really control how they are captured. To even get started with these, you’d have to first edit the images to fix noise, blur, contrast, orientation and so on, and then post-process the results to account for errors (using OCR merging, error models, etc.).
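To give a feel for the post-processing half of that work, here is a small Python sketch of one common trick: fuzzy-matching a misread vendor name against a list of known vendors, using the standard library’s `difflib`. The vendor list, the `correct_vendor` helper, and the 0.8 cutoff are illustrative assumptions; this is not a description of how TAGGUN does it internally.

```python
from difflib import get_close_matches
from typing import List, Optional

# Hypothetical list of vendors you already know about.
KNOWN_VENDORS = ["Corner Cafe", "Green Grocer Ltd", "City Hardware"]


def correct_vendor(
    ocr_name: str, known: List[str], cutoff: float = 0.8
) -> Optional[str]:
    """Map a possibly misread vendor name to the closest known vendor.

    Returns None when no known vendor is similar enough.
    """
    # Compare case-insensitively, but return the canonical spelling.
    by_lowercase = {name.lower(): name for name in known}
    matches = get_close_matches(
        ocr_name.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[matches[0]] if matches else None


print(correct_vendor("C0RNER CAFE", KNOWN_VENDORS))   # Corner Cafe
print(correct_vendor("Unknown Shop", KNOWN_VENDORS))  # None
```

This covers exactly one field and one kind of error (a zero read in place of the letter O). A real pipeline needs the same care for dates, amounts, tax lines, and every other field.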

We handle all of that for you, and you can forget these parts of the pipeline even exist. From your point of view, it’s an end-to-end solution. You’re sending in images/scans of receipts and getting back extracted information, saving employees’ time and investors’ capital in the process.

Existing general purpose solutions, on the other hand, are like getting quite a capable motor for free (recognising text), but it’s up to you to build the rest of the car around it.

Easy, Zero-Friction Integration

TAGGUN is a conventional RESTful API that lives in the cloud, and integration into your tech stack is trivially fast; a matter of writing boilerplate code in your language of choice to make POST requests to it with receipt(s), and getting back data. Time-to-market being this low means you can instead focus on parts of your business that actually need that time and effort.
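As a rough idea of what that boilerplate might look like, here is a Python sketch that uses only the standard library to POST the URL of a receipt image and read back JSON. The endpoint, the `X-Api-Key` header name, and the `{"url": ...}` request body are placeholders assumed for illustration; take the real values from the API reference.

```python
import json
import urllib.request

# Placeholder endpoint: an assumption for illustration, not a real API address.
ENDPOINT = "https://api.example.com/v1/receipts"


def build_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request that points the service at a receipt image."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-Api-Key": api_key,  # hypothetical header name
        },
    )


def submit_receipt(image_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_request(image_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `submit_receipt("https://example.com/receipt.jpg", "your-key")` would perform the round trip. Building the request is kept separate from sending it so the former can be inspected or unit-tested without network access.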

General solutions are either too inflexible or come as complex SDKs and libraries, like Tesseract, where the expertise and development costs alone might make the option a non-starter.

Saves You Infrastructure And Hardware Costs

TAGGUN is a REST API gateway to a powerful AI engine with self-learning – so adopting it saves you the cost of building an in-house ML/Deep Learning solution (i.e., hiring professionals, buying high-performance GPUs, renting servers, and sourcing datasets large enough to train the AI), plus the yearly expense of maintaining all of that.

TAGGUN’s pricing is transparent, predictable, and a fraction of what building your own infrastructure from scratch would cost.

Also, TAGGUN is language- and platform-agnostic, while existing solutions might require vendor-specific hardware: with PyTorch and CUDA libraries as dependencies, you need an Nvidia GPU, and the CPU fallback in its absence degrades performance considerably.

Reliable Support

You can count on TAGGUN’s dedicated Global Services team for top-notch, prompt support whenever and wherever you need it. Based on your feedback and usage patterns, our engineers can even train the AI with your data, or build you a custom-made AI model that fits you precisely within weeks, not months or years.

Conclusion

Digitising receipts and invoices is hard.

Just throwing them into a PDF isn’t enough. Just extracting a long block of raw text isn’t enough. For precise, time-sensitive goals like budgeting, filing tax returns, making accurate cash flow forecasts, managing employee expenses, and so on, you need a solution that understands the problem with receipts just as well as you do, and can deliver – in real time – contextual data that gives you the confidence to make informed decisions.

Businesses requiring precise and efficient digitisation of receipts and invoices can benefit from a receipt processing API designed specifically for this purpose, offering real-time processing, multi-language support, and the ability to handle various file formats and data points without the need for extensive pre- and post-processing.

So, really, the choice between TAGGUN and general purpose OCR solutions comes down to this: do you want to work smarter, or harder?

Shouldn’t be too difficult a decision.