Dirty Data in IP Management Systems

Intellectual Property rights (IPRs) are valuable assets for any business, possibly among the most important that it has. It is therefore crucial that the associated IP data are also treated well. Data with integrity are data that have a complete structure: all characteristics of the data, including business rules, rules for how pieces of data relate to each other, dates, definitions and lineage, must be correct for the data to be complete. Here we explore what “data integrity” within an IP data management system entails.

IP data that have integrity are maintained unchanged during any operation on the IP management system, such as data entry, data transfer, storage or retrieval. Put in simple business terms, IP data integrity is the assurance that the IP data are consistent, certified and can be reconciled. “Dirty data” refers to a lack of data integrity to a greater or lesser degree; the term is used by information technology (IT) professionals to describe inaccurate information or data.

The definition of ‘dirty data’

Dirty data can have a variety of meanings:

  • Missing data
  • Incorrect data, wrongly entered into the tool
  • Incorrectly formatted data
  • Data entered into the wrong field on the IP Management System
  • Stale data, which was once correct but is now out of date
  • Missing links, such as a missing relationship between the data in two or more fields
  • Duplicated data, where the same data exist in more than one place
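Several of these categories lend themselves to automated checks. The Python sketch below is purely illustrative: the `PatentRecord` fields, the list of mandatory fields, the application-number pattern and the 365-day staleness threshold are all hypothetical assumptions, not the rules of any particular IP DMS or IP Office. It flags missing, incorrectly formatted, duplicated and stale data.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

@dataclass
class PatentRecord:
    # Hypothetical, simplified record; a real IP DMS holds many more fields.
    case_ref: str
    title: Optional[str]
    application_number: Optional[str]
    filing_date: Optional[date]
    last_updated: date

# Illustrative rules only, to be replaced by your own business rules.
MANDATORY_FIELDS = ("title", "application_number", "filing_date")
APP_NUMBER_PATTERN = re.compile(r"[A-Z]{2}\d+")  # assumed: country code + digits
STALE_AFTER_DAYS = 365

def find_issues(records: List[PatentRecord], today: date) -> Dict[str, List[str]]:
    """Map each case reference to the dirty-data issues found for it."""
    issues: Dict[str, List[str]] = {}
    seen_numbers: Dict[str, str] = {}  # application number -> first case_ref
    for rec in records:
        found: List[str] = []
        # Missing data: mandatory fields left empty.
        for name in MANDATORY_FIELDS:
            if getattr(rec, name) in (None, ""):
                found.append(f"missing {name}")
        number = rec.application_number
        if number:
            if not APP_NUMBER_PATTERN.fullmatch(number):
                # Incorrectly formatted data.
                found.append("badly formatted application_number")
            elif number in seen_numbers:
                # Duplicated data: the same application under two case references.
                found.append(f"duplicate of {seen_numbers[number]}")
            else:
                seen_numbers[number] = rec.case_ref
        # Stale data: not updated within the agreed review period.
        if (today - rec.last_updated).days > STALE_AFTER_DAYS:
            found.append("stale")
        if found:
            issues[rec.case_ref] = found
    return issues
```

Incorrect values, data in the wrong field and missing links usually need knowledge of the specific portfolio, so they are harder to catch with simple rules like these.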

The root causes which lead to data becoming dirty

There are a number of possible root causes of dirty data:

  • Migration errors
  • Data entry errors
  • System design errors
  • Synchronisation problems
  • Data reporting problems
  • Maintenance problems

Migration is where data are transferred into an IP Data Management System (“DMS”) from another system, perhaps as a result of a system upgrade, or as a result of M&A activity in which data from an external IP DMS are transferred and incorporated. If the data are dirty before the migration, they are likely to remain dirty after it, unless concrete steps have been taken to address the problem.

Data entry mistakes can be made by IP personnel within the organisation, by non-IP personnel who have been given access to the IP DMS, and by external IP personnel who have been provided with access to the DMS. A certain amount of human error is inevitable, but what is the solution when mistakes occur constantly, the fix would make an auditor cringe, and the people making the errors take no responsibility and blame it all on the system?

DMS design and implementation errors can lead to dirty data. Conversely, good system design can greatly reduce data entry errors, for example by focusing on issues such as catching exceptions, formatting, buffering and the way in which choices and selections are presented to the user.

Synchronisation, in this instance, is keeping an operation in the IP DMS in step with a corresponding operation in another system, to ensure overall data integrity. It is not uncommon for the corporate IP DMS to be linked electronically with other corporate systems, used for example by HR or Finance, and keeping these in step can lead to problems with the data. Add the systems belonging to an IP Renewals/Annuities Payment provider, and possibly those of an IP Agent network, and the synchronisation challenges become even greater.

Creating reports from the data can itself introduce dirty data into the reports, if there are errors in the scripts or problems with the reporting functionality of the system. It can also result from a lack of understanding of the data structure within the system.

Finally, the data within the IP DMS may not be properly maintained. If data within the system are not updated as regularly as they should be, dirty data problems will follow.

So, there are several causes of dirty data.
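Synchronisation problems between the IP DMS and, say, a renewals provider's system are typically found by reconciling the two. The Python sketch below assumes, hypothetically, that both systems can export their records as a mapping from a shared case reference to a set of named fields; the export format and the choice of fields to compare are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical export format: case reference -> {field name: value}.
Records = Dict[str, Dict[str, str]]

def reconcile(dms: Records, provider: Records, fields: Tuple[str, ...]) -> dict:
    """Compare two exports keyed by case reference and report any drift."""
    # Cases known to one system but not the other (dict key views act like sets).
    only_in_dms = sorted(dms.keys() - provider.keys())
    only_in_provider = sorted(provider.keys() - dms.keys())
    # Cases in both systems whose compared fields disagree.
    mismatches: List[Tuple[str, str, object, object]] = []
    for ref in sorted(dms.keys() & provider.keys()):
        for name in fields:
            ours, theirs = dms[ref].get(name), provider[ref].get(name)
            if ours != theirs:
                mismatches.append((ref, name, ours, theirs))
    return {
        "only_in_dms": only_in_dms,
        "only_in_provider": only_in_provider,
        "mismatches": mismatches,
    }
```

For example, comparing a hypothetical `next_renewal_due` field would surface cases where the two systems disagree on when a renewal falls due, as well as cases that one system does not know about at all.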

Where are the ‘dirty data’?

Dirty data can exist in the data fields associated with any of the key IP process areas, such as IP creation, IP portfolio management and IP utilisation. Problems can be linked to data fields used at the front end, for example in the patent creation process, from the inventor and the invention report through to Patent Committee or Patent Board decisions. Problems can also exist in the data fields used in the patenting process itself, from drafting, first filing and foreign filing, through prosecution, to the granted patent. Dirty data can also occur in the data fields used to manage the IP assets in the IP portfolio management process, and in the data fields used in licence agreements and contracts in the IP utilisation phase.

Why is ‘dirty data’ an issue for IP?

Dirty data is a serious issue for any corporate IP Department or IP Agency, as it can lead to liability issues or a loss of rights. For example, the ‘rules’ for the proper creation of patent families may not run, key dates may be missed, or the wrong data may be sent to the IP Office. Correspondence may be sent to the wrong person, or IP reports containing incorrect data may be created and used in the decision making process. IP data are ultimately used for IP management purposes and should support well-informed decision making, so dirty data may lead to the wrong decisions being made.

Why is ‘dirty data’ an issue outside of IP?

IPR data are not used only by the IP department. IP data are important wherever business, technologies, products and services are concerned, as they form an integral part of many legal agreements and contracts. IP data are also increasingly reported to, and used by, Senior Management within the corporation, so ‘dirty data’ in IP can adversely affect activities and decision making outside the corporate IP department.

Cleaning up the data

First, it is necessary to understand how serious the ‘dirty data’ problem is. Questions to ask include how and why it has occurred, and where it is happening. If the dirty data challenge is large, how should the work be prioritised? Only when these questions have been considered should the clean-up exercise be undertaken. Cleaning the data may involve using dedicated IP Service Providers and/or developing automated scripts and tools. It will almost certainly also involve some hard manual work.

A three-stage process is strongly recommended:

  • Corrective actions to fix any problems
  • Understanding the root cause
  • Preventative actions to stop problems repeating (processes, systems, education, checks)

‘Dirty data’ cannot be tackled in isolation

Data quality issues cannot be tackled in isolation. Data quality is interlinked with the IP processes or ways of working adopted in the company, the IP systems and tools in use, various legal matters and, of course, the people involved. Last but not least, it involves management and leadership.

Best practices

A number of best practices exist to help address dirty data issues within an IP DMS:

  • Control the data entry
  • Define mandatory and optional data fields properly
  • Assign rights and roles for both IP and non-IP personnel with access to the system
  • Assign personal responsibility
  • Keep a change history
  • Design ‘intelligent’ data fields
  • Use tools to measure and clean the data on a regular basis
  • Make data management a living process
  • Measure, measure, measure!
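Three of these practices, controlling data entry, assigning personal responsibility and keeping a change history, can be combined in one place. The Python sketch below is a hypothetical illustration, not a feature of any particular IP DMS: the `VALIDATORS` rules, the status values and the `ControlledRecord` class are assumptions made for the example. Invalid input is rejected at the point of entry, and every accepted change is logged with the user who made it and a timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

# Hypothetical entry rules for illustration; fields without a rule are accepted.
VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "status": lambda v: v in {"drafting", "filed", "granted", "lapsed"},
    "country": lambda v: isinstance(v, str) and len(v) == 2 and v.isalpha() and v.isupper(),
}

@dataclass
class Change:
    field_name: str
    old: Any
    new: Any
    user: str
    at: datetime

@dataclass
class ControlledRecord:
    case_ref: str
    values: Dict[str, Any] = field(default_factory=dict)
    history: List[Change] = field(default_factory=list)

    def update(self, name: str, value: Any, user: str) -> None:
        """Reject invalid input at entry; otherwise apply it and log who changed what."""
        check = VALIDATORS.get(name)
        if check is not None and not check(value):
            raise ValueError(f"{self.case_ref}: invalid {name} {value!r}")
        old = self.values.get(name)
        self.values[name] = value
        self.history.append(Change(name, old, value, user, datetime.now(timezone.utc)))
```

Because a rejected update never reaches `values` or `history`, the change history records only accepted changes and who made each one.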

The best approach is to make data quality management an ongoing process and an integral part of IP management within the organisation.


To properly address dirty data problems within an IP DMS, it is important to adopt a recognised, iterative, four-step problem-solving process: ‘Plan, Do, Check, Act’. The first step is to thoroughly evaluate and analyse the problem, deciding whether, what, where and how dirty data is a problem and what needs to be done to rectify the situation. The second step involves making the necessary improvements, often on a small scale initially. The third step involves checking the situation and comparing actual results against planned results. The final step is to analyse the differences to determine their causes and act on them, after which the cycle begins again.

When your dirty data challenge has been addressed, it is important not simply to forget the problem and move on to the next issue. Metrics should be defined, agreed and implemented, and regular data reports created, so that you know precisely the state of your data integrity going forward and can react quickly if things go amiss again.
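One simple metric that can be computed the same way in every report is field completeness: the percentage of records in which each field is populated. The Python sketch below is a hypothetical example; which fields to measure and what target percentage to agree are assumptions left to each organisation. Run on a schedule with the results stored, it would show whether data quality is holding steady or drifting.

```python
from typing import Any, Dict, Iterable, List, Mapping

def completeness(records: Iterable[Mapping[str, Any]], fields: Iterable[str]) -> Dict[str, float]:
    """Percentage of records in which each field is populated (not None or "")."""
    rows = list(records)
    names = list(fields)
    if not rows:
        return {name: 100.0 for name in names}  # vacuously complete
    return {
        name: round(100.0 * sum(1 for r in rows if r.get(name) not in (None, "")) / len(rows), 1)
        for name in names
    }

def below_target(scores: Mapping[str, float], target: float) -> List[str]:
    """Fields whose completeness falls short of the agreed target percentage."""
    return sorted(name for name, score in scores.items() if score < target)
```

Completeness is only one dimension of data quality; the same pattern of a repeatable score checked against an agreed target can be applied to format, duplicate or staleness checks.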

As stated at the beginning, IPRs are valuable assets for any business. It is therefore imperative that the associated IP data are also treated with the respect they deserve, and that any dirty data challenges are tackled and resolved.

Donal O’Connell, IPEG Consultancy, Chawton Innovation Services
