Turn any type of document into structured data
Mindee’s machine learning APIs are a very convenient and powerful way to extract key information from documents. Our APIs allow you to quickly and accurately extract information from invoices, receipts, driver licenses, and more! If you’re looking for resources on how to extract data from common files, our APIs are straightforward and easy to implement.
Standardized documents are great for standard processes. But how many of the files we use every day are standard? Every company and every file is slightly different.
As we spoke to more and more customers about our data extraction APIs, it turned out that most of the applications that would benefit from them have unique data extraction needs. Fully incorporating our automation tools into your toolchain would require a unique data model for each of your documents. Sounds nearly impossible, but…
We’re very happy to announce our Document Builder API. You can use it to train a model for any document and, in just a few hours, begin extracting data from your documents!
Spoiler: It’s Magic, but it’s not TOTALLY magic
Since the release of the Document Builder API, we’ve had a lot of exciting conversations about creating a bespoke API for each process. But when the rubber hit the road, there was a realization: this isn’t a magical tool, and there is work that has to be done first.
It is magic… but it is not totally magic. If you are planning to run a marathon, you can’t just lace up your shoes and run 42 km. There is a lot of training before you can complete a marathon. In fact, there are guides and plans to help you train. We cannot (yet) just give the algorithm one file and some criteria and have it all work. Just like marathon training, we have to put in some effort on the front end to see the benefits from the API.
Think of this post as your guide to training your Mindee Document Builder model. I promise that training your Mindee model will be a lot easier than training for a marathon, will take a lot less time, and won’t cause any aches in your knees.
The training process
Let’s start planning the API Builder model we’d like to build. For this example, we’ll train a model to read the W-9 tax form from the USA. We’ll build the API to extract the name, address (street address, city, state, and Zip code), and Social Security Number from each form.
To make this fun, I’ve generated 22 W-9 forms for characters from the Harry Potter series.
The Mindee API Builder requires 20 trained images before it can make any predictions. After this initial model training, you can begin to use the API to get results. Think of this as your first marathon attempt: you’ll finish, but you’ll learn from it, and use that knowledge to improve. The API results will be good, but they won’t yet be perfect. As you continue to train the model, it will get more and more accurate. The model retrains itself after every 20 additional images (at 40, 60, 80, etc.), and after each training you’ll see a marked improvement in how the model works. Let’s see it in action!
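To make the retraining schedule concrete, here is a small Python sketch of the arithmetic. It is only an illustration: the function names are made up for this post, and the batch size of 20 is taken from the schedule described above, not from any Mindee library.

```python
def next_training_threshold(trained_count: int, batch_size: int = 20) -> int:
    """Return the document count at which the next training run would start.

    Assumes training is triggered at every multiple of `batch_size`
    (20, 40, 60, ...), as described in this post.
    """
    if trained_count < 0:
        raise ValueError("trained_count must not be negative")
    return (trained_count // batch_size + 1) * batch_size


def documents_remaining(trained_count: int, batch_size: int = 20) -> int:
    """Return how many more documents are needed to reach the next threshold."""
    return next_training_threshold(trained_count, batch_size) - trained_count


if __name__ == "__main__":
    for count in (0, 19, 20, 22):
        print(count, next_training_threshold(count), documents_remaining(count))
```

With our 22 W-9 forms, the first 20 would trigger the initial training, and the next training run would start once 18 more documents have been trained.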
Preparing for training
You can follow these steps, and also follow along in this video:
We have our list of documents, so let’s begin building our model:
Step 1: Create an account at https://platform.mindee.com
Step 2: Create a new API
Step 3: Name your API, and give it a description and an image:
Step 4: Now we are getting to the fun part, defining the model.
The W-9 form is used to identify each person for whom you will be withholding tax. You’ll want to extract:
- Name
- Street address
- City
- State
- Zip code
- Social Security Number
Identifying and naming the fields
Now we will build the model for the training. You can use many types of fields for each entry:
For the W-9 forms, we’ll define them all as text fields.
Each text field has a name (and an API key), and you can define whether it may contain numeric and alphabetic characters:
- Name: never contains numerics
- Street address: can contain both alpha and numeric characters
- City: never contains numerics
- State: never contains numerics
- Zip code: never contains alpha characters
- SSN: never contains alpha characters
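If it helps to see these rules written down, the field definitions above can be sketched in a few lines of Python. This is not how the platform stores the model; `TextField`, `W9_FIELDS`, `invalid_fields`, and the snake_case field keys are hypothetical names chosen for this illustration, and the sample record is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextField:
    """A text field and the kinds of characters it may contain."""
    name: str
    allow_alpha: bool
    allow_numeric: bool

    def accepts(self, value: str) -> bool:
        # Punctuation and spaces are always allowed; only letters and
        # digits are checked against the field's rules.
        for ch in value:
            if ch.isalpha() and not self.allow_alpha:
                return False
            if ch.isdigit() and not self.allow_numeric:
                return False
        return True


# The six W-9 fields with the rules listed above (keys are hypothetical).
W9_FIELDS = [
    TextField("name", allow_alpha=True, allow_numeric=False),
    TextField("street_address", allow_alpha=True, allow_numeric=True),
    TextField("city", allow_alpha=True, allow_numeric=False),
    TextField("state", allow_alpha=True, allow_numeric=False),
    TextField("zip_code", allow_alpha=False, allow_numeric=True),
    TextField("ssn", allow_alpha=False, allow_numeric=True),
]


def invalid_fields(record: dict) -> list:
    """Return the names of fields whose values break their character rules.

    Fields missing from the record are treated as empty and therefore valid.
    """
    return [f.name for f in W9_FIELDS if not f.accepts(record.get(f.name, ""))]


# A made-up record, for illustration only.
sample = {
    "name": "Tom Riddle",
    "street_address": "12 Example Street",
    "city": "London",
    "state": "NY",
    "zip_code": "10001",
    "ssn": "000-00-0000",
}

if __name__ == "__main__":
    print(invalid_fields(sample))
    print(invalid_fields({**sample, "zip_code": "1000A"}))
```

Writing the rules out like this is also a handy way to sanity-check extracted values on your side before they reach the rest of your toolchain.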
Once these are entered, the data model looks like this:
And now we are ready to train the model.
Training the Document Builder Model
If you’d like to watch a video of the training:
A wise runner once said, just keep putting one foot in front of the other, and you’ll make it to the finish line. Some runs are a slog and not fun. This is the ‘not fun’ part. We’ve got to do the training so that we (and our model) are ready for production.
You can upload images (JPG, PNG, WebP) or PDF files. If you have a number of files ready, you can upload them together as a zip file.
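If your training documents are sitting in a folder, a short script can bundle them into a single zip file for upload. The sketch below uses only Python’s standard library; the function name and the folder and archive paths are placeholders, and the extension list simply mirrors the file types mentioned above.

```python
from pathlib import Path
import zipfile

# File types mentioned in this post; adjust if your files differ.
SUPPORTED_EXTENSIONS = {".jpg", ".png", ".webp", ".pdf"}


def bundle_training_files(source_dir, archive_path):
    """Zip every supported file found directly inside `source_dir`.

    Returns the list of file names that were added to the archive.
    """
    source = Path(source_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.iterdir()):
            # Skip sub-folders and unsupported file types (including the
            # archive itself if it lives in the same folder).
            if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
                zf.write(path, arcname=path.name)
                added.append(path.name)
    return added


if __name__ == "__main__":
    # Placeholder paths: point these at your own folder of documents.
    names = bundle_training_files("w9_forms", "w9_forms.zip")
    print(f"Added {len(names)} files to the archive")
```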
As the files load, you can begin your training. Here is the W-9 for Tom Riddle:
Each word that fits the parameters for the name field (highlighted on the right) is marked by a blue box. Zoom with your trackpad or mouse, and you can click on the boxes that contain the name. If you accidentally click the incorrect box, you’ll see an “x” that removes it from the field on the right.
Tip: the training does not care what order you click on the words. “Tom Riddle” and “riddle Tom” are treated exactly the same way.
Continue for each field in the document, and when you are done, click the “Validate” button.
Once you train the model with 20 documents, the API goes into training mode. You’ll get an email when it completes its training. (This will happen again at 40, 60, and 80 documents trained, and so on.)
So – we train the model with 20 images, and Mindee tells us that the model training is occurring. Let’s see how the model does!
a little bit later…
Watch the video showing the results of the API training:
After a few minutes, you’ll get an email that the model has been trained on the first 20 images. We’ve trained and trained, and now we can try to see if we’re ready for our marathon.
The 21st image you upload will be tested against the model, and we can see how well it is doing. In this case, we’re looking at the W-9 of Cornelius Fudge (senior, we all know that junior is rotting in Azkaban).
After training on just 20 documents, our W-9 API is extracting all of the fields with a high degree of accuracy!
You can build your own Document Builder API for your unique form or document. With just a little bit of training, you will have created an API that can be used in production with a high degree of accuracy!
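If you would like to put a number on “high degree of accuracy” for your own documents, one simple approach is to compare the extracted values with values you have checked by hand, field by field. The sketch below is our own illustration, not the platform’s scoring method: `field_accuracy` and `normalize` are hypothetical helpers, and they assume you have already collected the extracted values into plain dictionaries.

```python
def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial differences don't count."""
    return " ".join(value.split()).casefold()


def field_accuracy(predictions, expected):
    """Return the share of documents in which each field was extracted correctly.

    `predictions` and `expected` are equal-length lists of dicts that map
    field names to text values, one dict per document.
    """
    totals = {}
    correct = {}
    for predicted, truth in zip(predictions, expected):
        for field, true_value in truth.items():
            totals[field] = totals.get(field, 0) + 1
            if normalize(predicted.get(field, "")) == normalize(true_value):
                correct[field] = correct.get(field, 0) + 1
    return {field: correct.get(field, 0) / totals[field] for field in totals}
```

Running this over a handful of held-out documents after each training round (20, 40, 60, and so on) gives you a simple way to watch the model improve.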
Give it a try at https://platform.mindee.com. It is free to try out, and we’d love to hear what you think!