Trust starts here

The world needs a paradigm shift in legal-entity data accuracy, dependability and trustworthiness.

Trust can no longer be based on a provider's brand alone — not if its data is characterized by opaque curation processes, hidden bias, proprietary identifiers and restrictive access.

These principles set out to correct that. They are what we strive to achieve. We hope others do too.

Our Legal-Entity Data Principles

Ц has developed the following principles to provide a framework to close the data-trust gap.

They are both principles and statements of intent that we are committed to achieving.

These principles are published under a Creative Commons open licence in the hope that they may be useful to others in this and other fields, and as part of the commitment to openness enshrined in our public-benefit mission.

Foundational data comes from primary sources

Foundational data should be sourced from official primary sources, not opaque third parties. What defines a primary source depends on the context and the domain – a company register may be a primary source for legal-entity existence, but not for asset lists or ownership – but without clarity of the source, and the decisions relating to that source, any dataset is built on sand.

Hidden bias is a silent killer

Bias in data comes not just from its selection, but also from how it is mapped, combined and transformed. These mappings and transformations, and the concepts and assumptions behind them, should be transparent to users, as should any bias in the underlying source data identified in the course of analysis.

Only full audit trails give the trust and utility users need

Full provenance is essential for trust, risk reduction, quality and utility, particularly when data is used in compliance, legal, master-data or investigative contexts. Only full end-to-end, attribute-level audit trails provide the assurance and context users need, whether for compliance, due diligence or machine learning.
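
As a rough illustration of what "attribute-level" could mean in practice, the sketch below keeps every version of every attribute together with its source, retrieval time and transformation. It is a minimal sketch under stated assumptions, not a description of any actual system: all class names, field names and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class AttributeVersion:
    """One observed value of one attribute, with its provenance (hypothetical fields)."""
    value: str
    source: str             # e.g. the name of an official register
    retrieved_at: datetime  # when the value was collected from the source
    transformation: str     # what was done to the raw value, if anything


@dataclass
class EntityRecord:
    """A legal entity whose attributes each carry their own history."""
    entity_id: str
    history: Dict[str, List[AttributeVersion]] = field(default_factory=dict)

    def record(self, name: str, version: AttributeVersion) -> None:
        # Append rather than overwrite, so earlier values stay auditable.
        self.history.setdefault(name, []).append(version)

    def current(self, name: str) -> str:
        # The most recently recorded value of the attribute.
        return self.history[name][-1].value

    def audit_trail(self, name: str) -> List[str]:
        # One human-readable line per recorded version, oldest first.
        return [
            f"{v.retrieved_at.isoformat()} {v.source} [{v.transformation}] -> {v.value!r}"
            for v in self.history[name]
        ]


# Made-up example data, for illustration only.
entity = EntityRecord("example-001")
entity.record("name", AttributeVersion(
    "ACME LIMITED", "example register",
    datetime(2024, 1, 1, tzinfo=timezone.utc), "none"))
entity.record("name", AttributeVersion(
    "Acme Holdings Limited", "example register",
    datetime(2024, 6, 1, tzinfo=timezone.utc), "whitespace trimmed"))
```

Keeping the history per attribute, rather than per record, is the design choice that lets a user ask where one specific value came from without having to reason about the rest of the record.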

Good standards reduce friction

Standards reduce friction for users, and improve consistency. Open and established standards should be used where possible and appropriate, and where standards are missing or deficient, we should work with others to create appropriate standards, sharing our expertise and domain knowledge.

Today’s world needs real-time data

Today’s data world is highly dynamic, with rapid changes to critical data. Collecting from original sources once a month, or even once a quarter, is no longer fit for purpose. Data should be collected and made available so that it is functionally real-time: that is, the latency between a change at the source and that change being available in our database should be functionally insignificant.
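
The latency described above can be made concrete as the gap between two timestamps. The following is a minimal sketch: the function names are hypothetical, and the budget that counts as "functionally insignificant" is an assumption that would depend on the use case.

```python
from datetime import datetime, timedelta, timezone


def publication_latency(changed_at: datetime, available_at: datetime) -> timedelta:
    """Time between a change at the source and its availability downstream."""
    if available_at < changed_at:
        raise ValueError("a change cannot be available before it happened")
    return available_at - changed_at


def within_budget(latencies, budget: timedelta) -> bool:
    """True if every observed latency is at or below the chosen budget."""
    return all(latency <= budget for latency in latencies)


# Made-up timestamps, for illustration only.
changed = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
available = changed + timedelta(seconds=90)
latency = publication_latency(changed, available)

# The five-minute budget is an assumed threshold, not a stated target.
ok = within_budget([latency], timedelta(minutes=5))
```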

Data quality needs a dataset perspective

We live in a data-driven world, and you cannot identify bias, systematic data-quality issues or key insights by looking at individual records. Simply passing on a mapped version of an individual record will fail to identify whether it is a duplicate, or whether it has data that makes no sense in the context of the wider dataset. Data must be continually assessed across the whole dataset, comparing not just how an individual record has changed, or whether it conforms to a schema, but how the dataset as a whole has changed.
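
Two checks that only make sense at dataset level are sketched below: finding likely duplicates, and comparing how the distribution of a field has shifted between two snapshots. This is an illustrative sketch only. The record layout, the function names and the crude name normalisation are all assumptions, and real duplicate detection would need far more care.

```python
from collections import Counter


def normalise(name: str) -> str:
    """Crude normalisation for illustration: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def find_duplicates(records):
    """Group record ids that share the same (jurisdiction, normalised name)."""
    groups = {}
    for rec in records:
        key = (rec["jurisdiction"], normalise(rec["name"]))
        groups.setdefault(key, []).append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def status_shift(previous, current):
    """Change in the share of each status between two dataset snapshots."""
    def shares(records):
        counts = Counter(rec["status"] for rec in records)
        total = sum(counts.values())
        return {status: n / total for status, n in counts.items()}

    before, after = shares(previous), shares(current)
    return {s: after.get(s, 0.0) - before.get(s, 0.0) for s in set(before) | set(after)}


# Made-up snapshots, for illustration only.
previous = [
    {"id": "a1", "jurisdiction": "gb", "name": "Acme Ltd", "status": "active"},
    {"id": "a2", "jurisdiction": "gb", "name": "Beta Ltd", "status": "active"},
]
current = [
    {"id": "a1", "jurisdiction": "gb", "name": "Acme Ltd", "status": "active"},
    {"id": "a2", "jurisdiction": "gb", "name": "Beta Ltd", "status": "dissolved"},
    {"id": "a3", "jurisdiction": "gb", "name": "ACME  Ltd", "status": "active"},
    {"id": "a4", "jurisdiction": "gb", "name": "Gamma Ltd", "status": "dissolved"},
]

duplicates = find_duplicates(current)
shift = status_shift(previous, current)
```

Neither result is visible from any single record: `a3` conforms to the schema on its own, and the jump in dissolved entities only shows up when the two snapshots are compared as a whole.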

Open is an essential component of data quality

Proprietary IDs, opaque data models and limited feedback loops are highly damaging to data quality: they introduce inevitable biases and serious systematic problems, and they limit the diversity and size of the audience. We should be transparent by default in all aspects of our data.