We’re excited to be launching Reincubate Lookup today. It answers a question users kept asking: given a serial number or IMEI from an Apple device, can we precisely identify it and what it's capable of?
When buying iPhones or iPads on eBay, users may want to confirm the specs (particularly the device's age and storage). With the serial or IMEI on the listing, they can confirm specs and even whether the device was distributed locked. Here's an example serial lookup of a device that was sold carrier-locked.
Developers often need to test their apps on specific versions of iOS. Trying to find second-hand devices with an older iOS is like panning for gold, but we can identify the version a device shipped with, ruling out a bunch of listings.
Sysadmins responsible for fleets of devices only get limited specifications and device data from MDM tools. Enterprises and institutions already use the technology behind Reincubate Lookup to enrich their device inventory data.
Clients and users sent our API over 13 million requests for device information over the last year, and that gave us the impetus to go deeper.
Reincubate Lookup’s promise is straightforward: given an Apple serial number or an IMEI, it’ll provide accurate, ethical, and user-friendly data. Each of those parts of the promise has special significance to us, and there’s a story as to why they’re important. Let’s take a look.
There’s plenty of jargon when it comes to learning about mobile devices, so we spent time thinking about the best way to order, present and label the information Lookup shows. We hold back a few data points that might be valuable to some technical users, but which might confuse other people.
The value of device data, and the difficulty consumers have in assessing its reliability, have given rise to a number of services providing this data, gating it with a series of gruesome and unfriendly mechanisms to stop other people scraping it. We chose to avoid a painful captcha system, and have built Lookup to be super-fast, and to work without any interstitial loading pages.
Curiously — given this is data about mobile devices — few similar services have interfaces that work well on phones or tablets. We wanted something that was mobile-friendly and international-friendly.
I hope that we’ve done a good job with the accessibility of Reincubate Lookup. It’s been heavily tested on iPhones and iPads, and we think it looks pretty sweet. It natively supports Dark Mode, and will flip its palette to ease eye-strain. It won’t prompt you for a captcha, and it’s available in 11 languages.
We’ve been learning a lot about this data since we started building our device identifier API back in 2017 (and shipped support for rich device metadata in iPhone Backup Extractor last summer), and we found a number of hard problems that we wanted to solve.
One of the first things we learned is that the data behind Apple’s identifiers is — to be candid — a bit of a mess. Apple themselves have no single canonical model identifier, and no internal naming consistency. In some cases — we’re looking at you, iPhone 3G — the same IDs even get reused for different products.
I spent an enjoyable WWDC hoping to detect signs of internal consistency with Apple engineers and managers, but even they expressed frustration about it. Furthermore, Apple’s documentation on model identification is inconsistent, and just about every form of identifier (there are 12 main ones) gets referred to as a “model” at one point or other in their documentation.
The identifiers Apple uses have changed over time, too. There are three different formats of serial number — soon to be four — and two different formats of UDID. Family numbers once reliably started with “M”, and now they only start with “A”, and instead some MPNs now start with “M”. It goes on.
There are a range of lookup tools already on the web, but it doesn’t take much digging to find bad data in them. Usually, this is structural: the tools (such as Everymac’s) are based on observed relationships between identifiers which appear to be true… until they’re not. Some tools have enough data to recognise this, meaning a lookup produces a range of possible results rather than a single, definitive one. Others provide data that looks just about right… but isn’t.
Structural data problems aren’t the only issue. Many of these systems have flawed sources of data, relying on scraping from Apple’s GSX service, entry from technicians, or — worst of all — data entry from random nutcases browsing the web.
In setting out to get this right, we wanted to satisfy three constraints:
The system must learn as much as possible automatically and independently, as anything requiring regular data entry, editing, or moderation would rapidly age.
The system mustn’t rely on unlicensed or unreliable data. The iPhone Wiki has some data (and we contributed a bunch), but it’s not normalised and isn’t entirely accurate. Apple’s GSX APIs have some data, but it’s also not normalised, has some really weird stuff going on in some of it, and isn’t licensed for use in this way.
The system must produce data and results that we're fully confident in, rather than merely being often right. Thus where a single answer isn't available, or the probability of an answer is low, the system must make that clear.
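To make that last constraint concrete, here is a minimal sketch of what "making it clear" could look like. It is not how Lookup is implemented: the `LookupResult` type, the `resolve` function, the 0.99 threshold and the model names are all hypothetical, chosen only to illustrate the idea of returning a range of candidates when one answer can't be justified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LookupResult:
    status: str        # "definitive", "ambiguous", or "unknown"
    candidates: tuple  # (model, probability) pairs, most likely first


def resolve(observations: dict, threshold: float = 0.99) -> LookupResult:
    """Turn observation counts per candidate model into a hedged result.

    `observations` maps a candidate model name to how often it was seen for
    an identifier. The threshold is an illustrative assumption.
    """
    total = sum(observations.values())
    if total == 0:
        return LookupResult("unknown", ())
    # Rank by count (descending), then by name so the order is deterministic.
    ranked = tuple(
        (model, count / total)
        for model, count in sorted(
            observations.items(), key=lambda kv: (-kv[1], kv[0])
        )
    )
    status = "definitive" if ranked[0][1] >= threshold else "ambiguous"
    return LookupResult(status, ranked)


# Hypothetical data: one identifier that maps cleanly, one that doesn't.
clear = resolve({"iPhone XS 64GB": 995, "iPhone XS 256GB": 5})
unclear = resolve({"iPhone XS 64GB": 6, "iPhone XS 256GB": 4})
```

In this sketch `clear.status` is `"definitive"` and `unclear.status` is `"ambiguous"`, with every candidate and its share still visible to the caller.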
Accuracy wasn’t the only constraint, however. It didn’t take us long to realise where some of this data was sourced in the industry: by networks of employees paid to covertly copy (steal!) data from internal databases.
Until we figured out how to make Reincubate Lookup work, there were only really three sources of IMEI data:
Scraping Apple’s GSX API. This is a violation of Apple’s terms and will result in API access getting withdrawn. There’s a small industry of people reselling access to GSX accounts, burning them out, and trying to get new ones. It doesn’t work.
Licensing the GSMA's IMEI database. This trade body for the mobile industry holds the canonical registry for all IMEI information, and licenses its database of type allocation codes (TACs), which are helpful for device identification. Whilst this sounds ideal, things are more complicated than that, and the organisation itself claims only 95% accuracy for its "GSMA Device Map" data. Aside from anything else, their database doesn't contain normalised identifiers for the devices that it includes. A TAC lookup on it may tell you simply that the device is an iPhone XS. That's hardly granular.
The organisation seems to be partly at war with the mobile manufacturers it represents, simultaneously running programs to maintain IMEI integrity whilst also noting "the GSMA does not have the responsibility or powers to stop illegal TAC and IMEI activity". Misregistration of this data is common enough that they heavily promote their "non-compliant TAC / IMEI reporting process".
What this means, in effect, is that there are two sets of data to think about: the GSMA's self-declared data from vendors, which is canonical in theory but incomplete and inaccurate in practice, and real-world data, accurate and normalised, that one can build by examining mobile devices. We've had to build a database of 3,000 Apple device TACs that's more comprehensive and useful than the GSMA's.
Bribing staff at telcos to act as “data spies”. This is the same sort of abhorrent practice that has US cell firms selling consumer location data to bounty hunters and crazies, and the same way illegal “iCloud unlock” services work.
Some IMEI services advertise this quite flagrantly. We found this ad on the site of a UK service:
As we weren't prepared to scrape GSX or pay for spies, and the GSMA's data wasn't granular enough, we had to seek out another way of finding the data we needed: no spies, no GSX, no underhand behaviour.
The answer, essentially, lies in analysing very large amounts of data, understanding how that data is related, and boiling that down to a set of facts that get automatically updated over time. Some of this data is intrinsic to the devices themselves, and some of it can be derived by examining anonymised questions about data. If enough queries are of the form "this device is A and B, is it also C?", one can start to understand that A and B are related. It’s the same technique that powered our awdit product: sometimes if you look at enough questions you can generate answers.
The analysis started to work for just about any device once we hit about 5,000,000 queries in the database, and where there’s not enough data to answer a given question, Reincubate Lookup will recommend taking a look with iPhone Backup Extractor. That’s smart enough to dig deeper and flag up the question to us.
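As a rough illustration of the "enough questions generate answers" idea, the sketch below counts how often two identifiers turn up in the same query and keeps only the pairs seen often enough to trust. It's a toy under stated assumptions rather than our production pipeline: the `related_pairs` function, the `min_support` cut-off and the placeholder identifiers are all made up for the example.

```python
from collections import Counter
from itertools import combinations


def related_pairs(queries, min_support=3):
    """Count how often two identifiers appear together in one query.

    `queries` is an iterable of sets of identifier strings. Pairs seen in at
    least `min_support` queries are returned with their counts.
    """
    counts = Counter()
    for identifiers in queries:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(identifiers)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Placeholder identifiers standing in for anonymised query contents.
queries = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "D"},
    {"C", "D"},
]
pairs = related_pairs(queries, min_support=3)
```

Here only `("A", "B")` clears the bar, having appeared together in three queries. With millions of real queries the threshold would need to be far higher, and the counts would feed something more careful than a simple cut-off.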
As of today, we've identified just under 9,000 configuration codes for 2,350 models, 127 different production facilities, 304 different distribution regions, and over 3,000 IMEI type allocation codes.
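For anyone unfamiliar with type allocation codes: in the modern 15-digit IMEI format, the TAC is the first eight digits, and the final digit is a Luhn check digit. The sketch below shows that standard structure only. The function names are ours for illustration, this isn't the DeviceIdentifier API, and the sample value is a commonly cited example IMEI rather than a real device of ours.

```python
def luhn_valid(imei: str) -> bool:
    """Check a 15-digit IMEI against its Luhn check digit."""
    if len(imei) != 15 or not (imei.isascii() and imei.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit, starting with the
    # digit immediately to the left of the check digit.
    for position, char in enumerate(reversed(imei)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def tac_of(imei: str) -> str:
    """Return the 8-digit type allocation code of a valid IMEI."""
    if not luhn_valid(imei):
        raise ValueError("not a valid 15-digit IMEI")
    return imei[:8]


sample_tac = tac_of("490154203237518")
```

A TAC on its own only narrows things down to a device type, which is why mapping TACs to normalised models takes a separate database.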
So that’s how we ended up building Reincubate Lookup, and why user-friendliness, accuracy, and ethics have been important as we built it. We hope you like it, and that it provides you value. If you'd like to go deeper, all of Lookup's data comes from our DeviceIdentifier API, and we love seeing people build stuff against that.
Don’t be a stranger: please let us know how you get on.