This summer, we trained a neural network to determine whether there are identification documents in photos, and if so, which kinds.
Why this was necessary
We wanted to lighten our colleagues’ workload and protect people from scammers. The new neural network serves two functions: to allow users to restore access to their account and to hide personal documents from the general search.
Restoring access to accounts. Photos of identification documents help the original owners recover their accounts, for example when they lose access to their phone number, or when 2-step verification is enabled but a one-time login code can no longer be received. The new neural network speeds up the appeal process because moderators no longer have to manually return incorrectly completed applications. The system now prevents users from submitting the form without the necessary images attached, and if an attached photo doesn’t show an ID, the user is asked to add one. Of course, restoring access to an account is only possible if the profile page contains actual photos of its owner. Since we are talking about account security and the safety of personal data, there is no room for mistakes.
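The submission check described above can be sketched roughly as follows. This is a minimal illustration, not the production code: `validate_application`, the `contains_id` callable, and the status strings are hypothetical names standing in for the real form handler and the neural network.

```python
from typing import Callable, Sequence

# Hypothetical status values; the real form handler is not described in the post.
MISSING_IMAGES = "attach_images"
MISSING_ID = "add_id_photo"
ACCEPTED = "accepted"


def validate_application(
    images: Sequence[bytes],
    contains_id: Callable[[bytes], bool],
) -> str:
    """Decide whether a restore-access form may be submitted.

    `contains_id` stands in for the neural network: it takes raw image
    bytes and returns True if an identification document is detected.
    """
    if not images:
        # The form cannot be submitted without the required images.
        return MISSING_IMAGES
    if not any(contains_id(image) for image in images):
        # Images are attached, but none of them shows an ID.
        return MISSING_ID
    return ACCEPTED
```

Passing the classifier in as a callable keeps the form logic testable without loading a model.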
Filtering search results in the “Documents” section. All documents that users upload to this section or send in private messages are, by default, hidden from prying eyes and aren’t included in search results. However, you can set the preferred level of privacy for each individual file manually. Before the neural network was introduced, many documents containing sensitive data could still be found by searching for certain keywords, because their owners had changed the privacy settings themselves. To protect the privacy of our users’ personal information, we began removing photos containing identification documents from public search results.
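As a rough sketch of that rule: a file appears in public search only if its owner made it public and the network did not flag it as an identification document. The `UploadedFile` record, its field names, and the `contains_id` callable are assumptions made for illustration; the post does not describe the real data model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class UploadedFile:
    # Hypothetical record; field names are illustrative only.
    name: str
    is_public: bool  # hidden by default; the owner may change this per file


def searchable_files(
    files: Iterable[UploadedFile],
    contains_id: Callable[[UploadedFile], bool],
) -> List[UploadedFile]:
    """Return the files that may appear in public search results."""
    return [f for f in files if f.is_public and not contains_id(f)]
```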
How we approached the problem
It might seem that the easiest way to recognize identification documents in an image is to set up a neural network and train it from scratch on a large dataset. But that’s easier said than done.
Any dataset used needs to be representative. It’s difficult to collect a sufficient number of real samples since open access databases with real identification documents just don’t exist.
There are many systems that are able to identify and parse documents. They are typically made to collect specific information from a photo. To do so, they need the quality of the original image to be ideal. To reach the image quality required, some government service portals require their users to position their passport along the edges of a specific template.
Such systems wouldn’t work for our purposes. When users restore access to their accounts, we tell them that they can hide all the information on their document except for their photo, first and last name, and the stamp (if there is one). However, we still need to identify the document even if the series and number are hidden, if the passport was photographed with something in the background or, conversely, if only a part of the document (the one with the user’s photo, for example) is shown. Different lighting and angles also have to be taken into account. The neural network needs to recognize documents despite all of these variations, and the difficulty lies in training it to do so.
There are some other challenges. Passports, for example, are difficult to distinguish from other types of documents, as well as from various handwritten and printed materials.
Attempting to take the easy way was unsuccessful. The resulting classifier turned out weak, with a small Type-1 error and a large Type-2 error. For example, there were interesting cases when people wrote first and last names by hand, drew photos and passport covers, and the system carelessly accepted such documents.
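To make the two kinds of error concrete, here is a small sketch that computes both rates from labeled predictions. It treats "document" as the positive class; which of the two rates the post calls Type-1 depends on the null hypothesis, which the post does not state, so the function simply reports both. The name `error_rates` and the input format are assumptions.

```python
from typing import Iterable, Tuple


def error_rates(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    Each pair is (is_document, predicted_document).
    """
    fp = fn = positives = negatives = 0
    for is_document, predicted in pairs:
        if is_document:
            positives += 1
            if not predicted:
                fn += 1  # a real document was missed
        else:
            negatives += 1
            if predicted:
                fp += 1  # a non-document was accepted
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr
```

Under this convention, a hand-drawn passport that the system accepts counts as a false positive.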
What we came up with
It turned out that the best solution, in our case, was to use an ensemble of networks together with facial recognition to identify documents and determine their type. We also added a differential classifier, which included an encoder to highlight characteristic features, and a form classifier to allow us to distinguish documents from other images. In addition, the training set was clustered beforehand to normalize the data. Of the possible architectures, VGG and ResNet proved to be the best.
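The post does not say how the individual models are combined, so the following is only one simple way to wire such components together: a cascade in which an image must pass the document classifier and the face check before its type is determined. All names (`classify_image`, `document_score`, `has_face`, `document_type`) and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Optional


def classify_image(
    image: bytes,
    document_score: Callable[[bytes], float],
    has_face: Callable[[bytes], bool],
    document_type: Callable[[bytes], str],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the document type, or None if the image is not accepted.

    Each callable stands in for one model of the ensemble.
    """
    if document_score(image) < threshold:
        # The base classifier does not consider this a document.
        return None
    if not has_face(image):
        # An ID must show the owner's photo, so require a detected face.
        return None
    return document_type(image)
```

Requiring every stage to agree trades some missed documents for fewer wrongly accepted images.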
The base classifier “Document/Not Document” works on a configured 19-layer VGG with a stratified sample. On top of that, a combined ensemble of classifiers is used to reduce Type-2 errors and differentiate the result. First, stratified sampling is performed, then an encoder is used to extract near-contour information, then a modified VGG is applied, and finally, a single network. This approach allowed us to minimize Type-1 errors to approximately 0.002. The probability of a false negative depends on the chosen dataset and the specific area of application.
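Stratified sampling, the first step of the pipeline above, can be illustrated in a few lines. This is a generic sketch rather than the team's code: the function name, the `(item, label)` input format, and the rule of keeping at least one example per class are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_sample(
    items: Sequence[Tuple[T, Hashable]],
    fraction: float,
    seed: int = 0,
) -> List[Tuple[T, Hashable]]:
    """Draw the same fraction from every label group (stratum).

    `items` is a sequence of (example, label) pairs.
    """
    rng = random.Random(seed)
    strata: Dict[Hashable, List[Tuple[T, Hashable]]] = defaultdict(list)
    for item in items:
        strata[item[1]].append(item)

    sample: List[Tuple[T, Hashable]] = []
    for group in strata.values():
        # Keep at least one example per stratum so rare classes survive,
        # and never ask for more examples than the stratum holds.
        k = min(len(group), max(1, round(len(group) * fraction)))
        sample.extend(rng.sample(group, k))
    return sample
```

Sampling per class keeps the class proportions of the subset close to those of the full set, which matters when real documents are much rarer than other images.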
We can now automatically detect passports and driver’s licenses in pictures. Recognition works at any angle, with any background, and even in poor lighting conditions. What matters is that the image contains the part of the document with the photo and name. To identify other document types, we’d need corresponding datasets. We trained the network using our own data, a non-representative dataset containing from five to ten thousand documents. For other images, the dataset is random, but a priori clustering is used in both cases.
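The post does not name the clustering algorithm, so the sketch below uses plain k-means purely as an example of grouping training samples before training. It assumes each image has already been reduced to a numeric feature vector; the function name and the explicit starting centroids are illustrative choices.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(
    points: Sequence[Point],
    centroids: Sequence[Point],
    iterations: int = 20,
) -> List[int]:
    """Return a cluster index for each point.

    `centroids` are the starting centres; passing them explicitly keeps
    the result deterministic.
    """
    centres = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        labels = [
            min(range(len(centres)), key=lambda j: _dist2(p, centres[j]))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centres.append(_mean(members) if members else old)
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return labels
```

The resulting cluster labels could then serve as strata when drawing a balanced training sample.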
From a technical standpoint, the system is written using Python, Keras, TensorFlow, glib, and OpenCV. For practical use of the new system, all that’s required is to integrate it into the Python handlers of the machine learning infrastructure. A detector was also added that can tell whether photos have been altered in graphics editors, but this topic deserves its own separate article.
What we achieved
Now 6% of requests to restore access are automatically returned to the applicant with a request to add or replace the document photo, and 2.5% of applications are rejected. If you look at image analysis in general, including heuristics and facial recognition in photos, then it automates up to 20% of the department’s work.
After the launch of the neural network, we were also able to count the number of passports uploaded to the “Documents” section. It turned out that approximately 2,000 identification documents were appearing in general search results each day. Now the likelihood that they will fall into the wrong hands is minimal.
Neural networks are already helping us fight spam and various kinds of fraud. We will continue our experiments and tell you about them in our blog.