Introducing Industry Classification Data

Enigma
Jun 9, 2020

By Jordan Dominguez

Understanding what industry a business is in is relevant to nearly every aspect of customer onboarding, underwriting, and monitoring. Without knowing how a business makes its money, it’s difficult to answer any of the following questions:

  • What is the success/failure rate of similar businesses?
  • Are the cash flows of this business healthy based on the industry?
  • Does this company engage in any prohibited or risky activities?
  • How will this company use my services?
  • How resilient is this company to economic downturns? How much might it benefit from economic growth?

In short, industry helps put everything else you know about a business in context. It tells you whether a business having a website is common or uncommon, or whether a business is relatively small or large.

The fundamental value of identifying a business’s industry is exactly why Enigma is excited to introduce our new, high-accuracy industry classification data.

Industry classification has failed financial institutions for years.

Right now, nearly every financial institution that serves small businesses cites industry classification as one of its main challenges. When interviewing institutions, we found that they struggled with poor accuracy rates, antiquated taxonomies, insufficient granularity, and low coverage of small businesses.

Rampant inaccuracy

Some financial institutions cited accuracy rates ranging from 25% to 40%, while others gave examples such as knowing only that a business engaged in retail, but not what that business sold. High error rates force financial institutions to rely on manual research, which creates significant inefficiency.

Systems that don’t reflect today’s businesses

We also learned about key failings in today’s industry taxonomies. NAICS, the standard taxonomy used by most institutions, is expansive and detailed. But it groups businesses in old-fashioned and unintuitive ways, producing nonsensical categories. My personal favorite 2-digit NAICS sector is “Administrative and Support and Waste Management and Remediation Services”, which covers everything from septic tank cleaning to temp agencies. Other institutions use the GICS industry taxonomy, which provides more common-sense groupings but can lack granularity.

Insufficient granularity

Granularity provides details that are often crucial to understanding if and how you want to work with a business. For example, a construction company can be a one-person plumbing contractor, a home remodeling agency, or a commercial construction company building skyscrapers. Each of these hypothetical businesses presents distinct risks and opportunities, so getting details that go deeper than “construction company” is essential. 6-digit NAICS codes can provide much-needed detail, but they’re inconsistent in specificity, and the distinctions they draw range from essential to trivial. In other words, among the roughly 1,000 6-digit NAICS codes, a single code can be the difference between a parking lot (812930) and a pet care service (812910), but codes can also separate minutiae such as whether a company engages in Dimension Stone Mining and Quarrying (212311) or Industrial Sand Mining (212322).

Low small business coverage

Financial institutions have also told us that they have very low fill rates when it comes to identifying industries for small businesses. Low coverage again leads to more manual research, and it can also limit your ability to onboard and service small business customers.

Building accurate, powerful industry data

Overall, financial institutions are in a bind — forced to choose between accurate high-level classifications that lack important details, or less accurate and overly noisy granular classifications. We knew that this was a problem we had to fix, especially because accurate industry data makes our other data attributes even more powerful.

We set out to build our own industry classification system based on the following principles:

  • Unmatched accuracy
  • A modern and intuitive way of segmenting companies
  • Details that added insight instead of noise
  • High coverage

High accuracy through advanced data science

So far, our industry classification is achieving accuracy rates 2–3x higher than those of incumbent providers. This has major implications for financial institutions: readily available and accurate data about every business’s industry will help them reduce risk exposure and related losses. Accurate industry data also allows institutions to minimize resources spent on manual research, a significant operational inefficiency.

We’ve attained this accuracy by building predictive models that reflect how a human would classify companies into an industry. We heard from many different companies that they spend considerable time and resources plugging business names into search engines and combing through search results and business data aggregators to get detailed information. Based on this, we automated the manual investigation process. By leveraging online and other public information about businesses along with advanced linguistic models, we’re now able to replicate the human research process and classify industries far more accurately than current providers. Our internal accuracy bar is strict: we don’t release an industry category until it achieves 85% accuracy or higher.
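As a rough illustration of what a per-category release bar could look like, here is a minimal Python sketch. It is not Enigma’s actual pipeline. It assumes a hand-labeled evaluation set of (predicted, actual) industry pairs and assumes that a category’s accuracy means the share of businesses predicted to be in that category that truly belong to it. The function name `releasable_categories` and the sample data are hypothetical.

```python
from collections import defaultdict

RELEASE_THRESHOLD = 0.85  # the 85% accuracy bar described in the text


def releasable_categories(samples, threshold=RELEASE_THRESHOLD):
    """Return {category: accuracy} for categories meeting the threshold.

    `samples` is an iterable of (predicted_industry, actual_industry) pairs
    from a labeled evaluation set (an assumption for this sketch).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for predicted, actual in samples:
        total[predicted] += 1
        if predicted == actual:
            correct[predicted] += 1

    accuracy = {cat: correct[cat] / total[cat] for cat in total}
    return {cat: acc for cat, acc in accuracy.items() if acc >= threshold}


# Hypothetical evaluation data: 9 of 10 restaurant predictions are right,
# but only 3 of 5 plumbing predictions are.
samples = (
    [("restaurant", "restaurant")] * 9
    + [("restaurant", "bar")]
    + [("plumbing", "plumbing")] * 3
    + [("plumbing", "hvac")] * 2
)
print(releasable_categories(samples))
```

With this made-up data, only the restaurant category would clear the bar. The plumbing category would stay unreleased until its accuracy improved.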

Classification that makes sense for modern businesses

Based on what we learned from our customers, we built an industry taxonomy that provides common-sense groupings of companies and reflects modern business models. We’re also integrating operations flags to capture how modern businesses operate (more details below).

Details, not noise

Our industry classification data provides as much granularity as a 4–6 digit NAICS code, while maintaining the high level of accuracy detailed above. We focus on providing detail for industries where it’s beneficial, not where it creates additional noise.

The coverage you’d expect from an SMB data provider

Lastly, our industry coverage extends to even the smallest businesses in ways that incumbent providers cannot. On average our fill rates are 10% higher, and we expect that number to get better as our small business data expands.

Real-life example: Enigma’s industry classification vs. 3 other providers

This is just the beginning

In the next 2 weeks, our coverage will expand from 20 industries to roughly 50 industries, which together include 60% of US small businesses. Further coverage expansions are planned in the coming months.

We will also be adding further nuance to industries in the form of operations flags that describe how companies offer their goods and services. There’s more to come on this front, but we believe operational details will help our customers get an even deeper understanding of businesses and their associated risks and resiliencies.

Getting started

Enigma’s beta industry classification data is available right now via our API. You can test the data for free, and we welcome your feedback. If you have any questions or would like to learn more about our industry classification data, please reach out.
