Technology Intelligence Platform

Harmonisation of applicant names
In the first three articles in our Technology Intelligence Platform (TIP) series, we explored technological fields and their evolution, then shifted our focus to time series forecasting for patent filings. By leveraging PATSTAT data and combining it with the TIP’s data processing and visualisation capabilities, the accompanying notebooks provided powerful insights into the evolution of technical fields and demonstrated how TIP can be used to anticipate trends in innovation.
In this article, we explore a critical aspect of patent documentation: handling inconsistencies in applicant names. Patent databases store applications with the corresponding applicants and their addresses exactly as they appear in the original filing. Consequently, variations in the representation of applicant names and addresses are common. These discrepancies arise from issues such as differences in word order, capitalisation preferences, inclusion or omission of accents, variations in legal entity designations and typographical errors.
Such inconsistencies pose challenges when retrieving all applications filed by a specific company. Because software treats each variation as a distinct string, a simple query does not recognise that the variations belong to the same entity. The Organisation for Economic Co-operation and Development (OECD) recognises applicant name harmonisation as vital for studying innovation, as it enables researchers, analysts and policymakers to track patenting activities more accurately.
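To make the problem concrete, here is a minimal Python sketch of how basic normalisation can map several spellings to one comparison key. It is an illustration only, not the method used in the notebook: the company name is invented, the `LEGAL_FORMS` list is a tiny hypothetical sample, and the `normalise` helper is our own naming.

```python
import re
import unicodedata

# Hypothetical, deliberately short sample of legal-form tokens.
# A real list would be much longer and country-specific.
LEGAL_FORMS = {"inc", "ltd", "gmbh", "ag", "sa", "corp", "co", "llc", "bv"}


def normalise(name: str) -> str:
    """Return a comparison key that ignores case, accents, punctuation,
    legal-form tokens and word order."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lower-case and remove dots so that "G.M.B.H." collapses to "gmbh".
    lowered = unaccented.casefold().replace(".", "")
    # Replace any remaining punctuation with spaces.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", lowered)
    # Drop legal-form tokens and sort the rest to neutralise word order.
    tokens = [t for t in cleaned.split() if t not in LEGAL_FORMS]
    return " ".join(sorted(tokens))


# Invented applicant name written in three different ways.
variants = [
    "Müller Maschinenbau GmbH",
    "MULLER MASCHINENBAU G.M.B.H.",
    "Maschinenbau, Müller GmbH",
]

# As raw strings the three variants are all different ...
distinct_raw = set(variants)
# ... but they share a single key after normalisation.
distinct_keys = {normalise(v) for v in variants}
print(len(distinct_raw), len(distinct_keys))
```

Sorting the tokens is a blunt way to handle word order and can merge names that are genuinely different, so a key like this is better treated as a candidate match than as a final decision.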
Existing harmonisation efforts
PATSTAT already includes mechanisms to harmonise applicant names. The PATSTAT Standardised Name (PSN), developed by the University of Leuven, and the Harmonised Applicant Name (HAN), provided by the OECD, apply standardisation processes to applicant names.
Practical approach to cleaning applicant names
We have developed a notebook that builds an applicant name harmonisation algorithm from the ground up. Starting from a typical data retrieval query in PATSTAT Global, we apply a set of techniques to cluster applicant names into groups of potential duplicates. By exploring the notebook, you can learn how to use standard lists of variations for abbreviations of legal entities, Python libraries for deduplicating records, and other tools to create your own harmonisation algorithm.
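The clustering idea can be sketched in a few lines of Python. The example below is an assumption-laden illustration rather than the notebook's implementation: it uses `difflib.SequenceMatcher` from the standard library as a stand-in for a dedicated deduplication library, the applicant names are invented, and the `cluster_names` helper and its 0.85 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def cluster_names(names, threshold=0.85):
    """Greedy clustering: each name joins the first cluster whose
    representative (its first member) is similar enough, otherwise
    it starts a new cluster."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([name])
    return clusters


# Invented applicant names, including a typo and case variants.
names = [
    "ACME ROBOTICS",
    "Acme Robotics",
    "ACME ROBOTCS",
    "Borealis Pharma",
    "BOREALIS PHARMA",
]

clusters = cluster_names(names)
for cluster in clusters:
    print(cluster)
```

A greedy pass like this compares each name only with cluster representatives, so the result depends on input order and on the threshold. It is a convenient starting point for experiments, not a tuned solution.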
This notebook does not aim to provide a definitive or fully optimised solution. Instead, it serves as a practical guide, illustrating techniques and evaluating their impact by measuring dataset size reduction after clustering. The results are also compared with existing PSN and HAN methods to assess their relative effectiveness.
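One plausible way to express dataset size reduction is the share of distinct names that disappears after harmonisation. The article does not state the exact formula used in the notebook, so the definition below, the `size_reduction` helper and the toy data are all assumptions made for illustration.

```python
def size_reduction(original_names, harmonised_names):
    """Fraction of distinct names removed by harmonisation.

    0.0 means nothing was merged; values closer to 1.0 mean that
    more names were merged into fewer groups.
    """
    before = len(set(original_names))
    after = len(set(harmonised_names))
    if before == 0:
        raise ValueError("original_names must not be empty")
    return (before - after) / before


# Toy data: five invented raw names and two hypothetical harmonised versions.
raw = ["ACME ROBOTICS", "Acme Robotics", "ACME ROBOTCS",
       "Borealis Pharma", "BOREALIS PHARMA"]
# A custom method that merges all spelling variants.
custom = ["acme robotics", "acme robotics", "acme robotics",
          "borealis pharma", "borealis pharma"]
# A reference method that, in this toy case, misses the typo.
reference = ["acme robotics", "acme robotics", "acme robotcs",
             "borealis pharma", "borealis pharma"]

custom_reduction = size_reduction(raw, custom)
reference_reduction = size_reduction(raw, reference)
print(custom_reduction, reference_reduction)
```

A larger reduction is not automatically better, because over-merging distinct applicants also shrinks the dataset. The figure is most useful alongside a manual check of a sample of clusters.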
We encourage you to clone the notebook and customise it to suit your needs.
Keywords: data processing, visualisation, patent data analysis, harmonisation, PATSTAT