Data Discovery and Classification: Turn Data Inventory to Evidence

Jul 01, 2026 · 18 minute read

Data Discovery and Classification: Turning Your Data Inventory Into Evidence

TL;DR: Most organisations treat the data inventory as a documentation deliverable: a spreadsheet refreshed before an audit, accurate for about a week, and impossible to defend line by line.

Regulators typically expect organisations to demonstrate an accurate and current inventory rather than simply presenting a spreadsheet. They ask what personal data you hold, where it lives, how sensitive it is, who owns it, when you last verified it, and who approved each decision. That is not a documentation question. It is an evidence question.

This article explains why data discovery and classification are the foundation every downstream obligation stands on, why an inventory without lineage is a liability rather than an asset, and how AesirX ComplianceOne turns discovery, classification, and data mapping into a living evidence base: discovery cycles that detect what changed since last time, AI-assisted classification suggestions that a human owner must approve, a proposed-to-approved label lifecycle that records who decided what and why, and a Records of Processing inventory whose completeness is scored field by field against PDPL, GDPR, and the frameworks you have installed.

The inventory stops being a document you produce for the auditor and becomes the proof itself.

This article is written for data governance leads, DPOs, compliance managers, CISOs, internal audit leads, and the department data owners who actually know what their systems hold. It is especially relevant for banks, telcos, payment platforms, e-commerce groups, and multinational subsidiaries operating under Vietnam's Personal Data Protection Law (PDPL) and Decree 356, alongside international frameworks such as GDPR, ISO 27001, and ISO 27701.

The inventory that was true for a week

A compliance manager at a Hanoi insurance group gets a call on a Tuesday. The Ministry has opened a routine review, and the inspector wants something that sounds simple: a complete list of every system that processes personal data, the categories of data each one holds, how sensitive that data is, who owns it, and when the organisation last verified that the list was correct.

The compliance manager has a list. It lives in a spreadsheet assembled eight months ago during the last big governance push. It was accurate for about a week. Since then, two systems on it have been decommissioned, a new customer-analytics platform was stood up by the marketing team without telling anyone, and the payroll system migrated to a new vendor. Nobody can say who labelled the claims database as confidential, on what basis, or whether that label was ever reviewed by someone with the authority to set it.

In many organisations, the inventory exists, but the supporting evidence needed to demonstrate its accuracy may be incomplete.

That gap, between having a data map and being able to prove it, is the most expensive blind spot in enterprise compliance. Every obligation downstream depends on it. You cannot run a meaningful impact assessment, file a cross-border transfer dossier, fulfil a rights request, or notify a breach accurately if you do not know what data you hold, where it lives, and how sensitive it is. And you cannot defend any of it if your inventory cannot show its own history.

“An inventory you cannot prove is not an asset. It is a liability with good formatting.”

Data discovery and classification are usually treated as the boring first step before the real compliance work begins. They are not the first step. They are the foundation the rest of the building stands on. This article is about treating them that way.

The inventory goes stale the moment it’s finished

Most enterprises build their data inventory the same way: once, under pressure, by emailing every department a spreadsheet template and asking them to fill it in.

The result is a snapshot, and a snapshot starts decaying immediately. Each department keeps its own version in its own format. There is no reliable way to tell which entries are current and which are a year out of date without manually chasing every owner. By the time the central team has consolidated everything into one master file, the early submissions are already stale.

I hear the same frustrations from data governance leads across every sector:

Consolidation is a quarterly ordeal because each department maintains its inventory differently.
There is no staleness detection, so nobody can separate a current entry from an abandoned one without manual follow-up.
The annual governance package is assembled by hand from disparate sources, with significant rework every cycle.
Cross-border data flows are tracked in yet another separate spreadsheet that nobody fully trusts.

A static inventory answers the question "what did we have at the moment we built this." A regulator asks "what do you have now, and prove that you knew."

A label without lineage is just an opinion

Classification has the same weakness in most organisations. Someone, at some point, decided that a dataset was confidential or that a field counted as sensitive personal data. But the decision left no trace.

Ask the harder questions and the classification falls apart. Who applied the label? On what basis? Was it a manual judgement, an inherited default, or a guess? Did anyone with authority review it? Has the basis changed since? When a label cannot answer those questions, it is not a classification. It is an opinion someone typed into a cell.

This matters because sensitivity drives obligation. Under Decree 356, the distinction between basic personal data (Article 3) and sensitive personal data (Article 4) changes what safeguards, what assessment, and what handling rules apply. If you cannot prove how a dataset was classified and by whom, you cannot prove that the controls wrapped around it were the right ones.

The downstream cost of not knowing what you hold

Every high-stakes compliance workflow assumes an accurate, current, classified inventory underneath it.

An impact assessment under PDPL Article 21 and Decree 356 Article 19 starts by describing the processing and the data involved. A cross-border transfer dossier under PDPL Article 20 and Decree 356 Article 18 depends on knowing exactly which categories of data leave the country and where they go. A rights request cannot be fulfilled completely if you do not know every system that holds the requester's data. A breach notification has to describe the data affected, which is impossible if the record of what that system held is wrong.

When the foundation is unreliable, every floor above it is unreliable too. The data inventory is not a side project. It is load-bearing.

Discovery is a cycle, not a one-time scan

The first shift is to stop thinking of discovery as a project with an end date and start running it as a repeatable cycle.

In ComplianceOne, data discovery is organised around discovery cycles you create, execute, and formally close. Each cycle scans or ingests the systems and sources available to it, creates or updates the system and dataset records it finds, and, critically, runs delta detection that highlights what has changed since the last cycle. The output is not just a list. It is a list plus a record of what is new, what moved, and what has disappeared.
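To make delta detection concrete, here is a minimal sketch of the idea, not ComplianceOne's implementation. It assumes each cycle yields a snapshot keyed by a stable system identifier; the `SystemRecord` shape, the `detect_delta` function, and the sample systems are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemRecord:
    """Hypothetical shape of one discovered system (illustrative only)."""
    system_id: str
    owner: str
    data_categories: frozenset[str]


def detect_delta(
    previous: dict[str, SystemRecord],
    current: dict[str, SystemRecord],
) -> dict[str, list[str]]:
    """Compare two cycle snapshots keyed by system_id."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # A system counts as changed when any recorded attribute differs.
    changed = sorted(
        sid
        for sid in current.keys() & previous.keys()
        if current[sid] != previous[sid]
    )
    return {"added": added, "changed": changed, "removed": removed}


last_cycle = {
    "claims-db": SystemRecord("claims-db", "claims", frozenset({"identity", "health"})),
    "payroll": SystemRecord("payroll", "hr", frozenset({"identity", "salary"})),
    "legacy-crm": SystemRecord("legacy-crm", "sales", frozenset({"contact"})),
}
this_cycle = {
    "claims-db": SystemRecord("claims-db", "claims", frozenset({"identity", "health"})),
    "payroll": SystemRecord("payroll", "hr", frozenset({"identity", "salary", "bank_account"})),
    "customer-analytics": SystemRecord("customer-analytics", "marketing", frozenset({"behaviour"})),
}

delta = detect_delta(last_cycle, this_cycle)
print(delta)
```

In this toy data the analytics platform shows up as added, the payroll system as changed, and the retired CRM as removed, which is the "new, moved, disappeared" record the cycle is meant to leave behind.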

The mechanics that make this usable in a large environment:

A staging area where discovered systems are reviewed before they are admitted to the inventory, so a scan never silently pollutes the source of truth.
Deduplication rules that stop the same system being recorded three times under three names.
Confidence-scored, AI-assisted classification suggestions that propose a label for each finding rather than leaving it blank.
Import templates and connector-based discovery for the systems a scan cannot reach directly.
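The staging and deduplication ideas can be sketched in a few lines. This is an illustration under stated assumptions, not the product's actual rule set: it assumes duplicates can be recognised by a normalised hostname plus a normalised system name, and the function names and sample hostnames are hypothetical.

```python
import re


def normalise_key(hostname: str, name: str) -> tuple[str, str]:
    """Build a comparison key: lowercase host, name stripped of punctuation and spacing."""
    host = hostname.strip().lower()
    slug = re.sub(r"[^a-z0-9]+", "", name.lower())
    return host, slug


def stage_findings(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split raw findings into unique staged records and suspected duplicates.

    Nothing here writes to the inventory; staged records still await review.
    """
    seen: dict[tuple[str, str], dict] = {}
    duplicates: list[dict] = []
    for finding in findings:
        key = normalise_key(finding["hostname"], finding["name"])
        if key in seen:
            duplicates.append(finding)
        else:
            seen[key] = finding
    return list(seen.values()), duplicates


raw = [
    {"hostname": "db01.internal", "name": "Claims DB"},
    {"hostname": "DB01.internal", "name": "claims-db"},
    {"hostname": "db01.internal", "name": "CLAIMS_DB"},
    {"hostname": "db02.internal", "name": "Payroll"},
]
staged, duplicates = stage_findings(raw)
print(len(staged), "staged,", len(duplicates), "held back as suspected duplicates")
```

A real rule set would likely need fuzzier matching and a human check on the suspected duplicates; the point of the sketch is only that the same system under three names collapses to one staged record before it reaches the source of truth.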

The cycle is the unit of work. Because every cycle produces a delta against the last one, the inventory stops being a snapshot and becomes a time series you can actually defend.

Every finding carries its origin

A discovery that cannot say where it came from is no better than a manual guess.

The design rule is that every discovered record retains its source, the scan or import batch it came from, and the timestamp it was found. Every proposed classification preserves the original machine suggestion, even after a human changes it. That preservation is what turns a label into evidence: you can show the system's suggestion, the human's decision, and the distance between them.

This is the difference between "the database is confidential" and "the discovery scan on this date proposed confidential at this confidence score, the data owner reviewed it on that date, and the DPO approved the enterprise baseline." One is a claim. The other is a record.
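One way to picture a finding that carries its origin is a record where the machine suggestion is immutable and the human decision is stored beside it rather than over it. The sketch below is an assumption about shape, not the product's schema; every class, field name, and sample value is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Suggestion:
    """The original machine proposal; frozen so it cannot be edited later."""
    label: str
    confidence: float  # 0.0 to 1.0


@dataclass
class Finding:
    dataset: str
    source: str        # which scan or import produced the finding
    batch_id: str      # which cycle or batch it belongs to
    found_at: datetime
    suggestion: Suggestion
    decided_label: Optional[str] = None
    decided_by: Optional[str] = None
    decided_at: Optional[datetime] = None

    def decide(self, label: str, actor: str, when: datetime) -> None:
        """Record the human decision without touching the suggestion."""
        self.decided_label = label
        self.decided_by = actor
        self.decided_at = when

    @property
    def overridden(self) -> bool:
        """True when the human decision differs from the machine suggestion."""
        return self.decided_label is not None and self.decided_label != self.suggestion.label


finding = Finding(
    dataset="claims_db.claimants",
    source="scan:postgres-connector",
    batch_id="cycle-2026-03",
    found_at=datetime(2026, 3, 2, 9, 30, tzinfo=timezone.utc),
    suggestion=Suggestion(label="confidential", confidence=0.82),
)
finding.decide(
    "restricted",
    actor="claims.data.owner",
    when=datetime(2026, 3, 9, 14, 0, tzinfo=timezone.utc),
)
print(finding.suggestion.label, "->", finding.decided_label, "| overridden:", finding.overridden)
```

Because the suggestion survives the decision, the record can show both what the system proposed and what the owner chose, which is the "distance between them" described above.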

“Classification becomes evidence the moment it remembers who decided, when, and on what basis.”

Classification with an approval gate

A label that takes effect the instant someone types it is a label nobody is accountable for. ComplianceOne puts a gate in front of it.

New classification labels move through a proposed-to-approved lifecycle. A proposed label is routed for owner review based on the organisation's hierarchy before it is applied. Only after the right person approves does the label become the current truth. Conflicts and disagreements do not vanish into a comment thread: they surface in a dedicated exception queue that escalates to the appropriate data owner or, where needed, the DPO.
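The gate can be thought of as a small state machine in which a label has no effect until it reaches an approved state. The following is a hedged sketch: the state names, the allowed transitions, and the `LabelProposal` class are assumptions made for illustration, not the lifecycle as the product defines it.

```python
from enum import Enum
from typing import Optional


class LabelState(Enum):
    PROPOSED = "proposed"
    IN_REVIEW = "in_review"
    ESCALATED = "escalated"
    APPROVED = "approved"
    REJECTED = "rejected"


# Hypothetical transition table: there is no path from PROPOSED straight to APPROVED.
ALLOWED = {
    LabelState.PROPOSED: {LabelState.IN_REVIEW},
    LabelState.IN_REVIEW: {LabelState.APPROVED, LabelState.REJECTED, LabelState.ESCALATED},
    LabelState.ESCALATED: {LabelState.APPROVED, LabelState.REJECTED},
    LabelState.APPROVED: set(),
    LabelState.REJECTED: set(),
}


class LabelProposal:
    def __init__(self, dataset: str, label: str) -> None:
        self.dataset = dataset
        self.label = label
        self.state = LabelState.PROPOSED
        self.history: list[tuple[LabelState, LabelState, str]] = []

    def transition(self, to_state: LabelState, actor: str) -> None:
        """Move to a new state, refusing any step the table does not allow."""
        if to_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {to_state.value} is not allowed")
        self.history.append((self.state, to_state, actor))
        self.state = to_state

    @property
    def effective_label(self) -> Optional[str]:
        """The label only counts once it has been approved."""
        return self.label if self.state is LabelState.APPROVED else None


proposal = LabelProposal("claims_db.claimants", "restricted")
proposal.transition(LabelState.IN_REVIEW, actor="routing")
proposal.transition(LabelState.ESCALATED, actor="claims.data.owner")
proposal.transition(LabelState.APPROVED, actor="dpo")
print(proposal.effective_label, [step[2] for step in proposal.history])
```

The design choice worth noticing is that `effective_label` returns nothing until approval, so a typed-in label cannot quietly become the current truth.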

The supporting machinery makes the taxonomy itself governable:

A taxonomy builder for custom schemes such as public, internal, confidential, and restricted, rather than a fixed vocabulary that does not fit your business.
Gap reporting that surfaces unlabelled assets and policy gaps instead of letting them hide.
Lifecycle and retention tracking that flags data exceeding its retention period.
An enterprise data map that gives leadership a department-level view of how data is classified across the business.

The point of the gate is not bureaucracy. It is that every label can name the person who stands behind it.

Owner review routing puts the label in front of the person who knows

Central compliance teams do not know what every system holds. The department owners do. The hard part is getting the right slice in front of the right person without burying them.

ComplianceOne routes each proposed classification to the correct department or application owner, and shows that owner only their slice rather than the entire enterprise map. For each item they can approve, modify, reject, or mark it as unknown and needing escalation. Every override is captured in full: the actor, the time, the previous value, the new value, and the reason for the change.

This solves the problem from both ends. The governance lead gets decisions from the people with actual knowledge instead of guessing on their behalf. The department owner is asked a narrow, answerable question instead of being handed a twenty-page document and told to review it. And because every change records its reason, the review history is itself part of the evidence base, not a side effect of it.
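As a rough sketch of those two mechanics, the snippet below filters proposals down to one owner's slice and captures an override with the five attributes named above. The dictionary layout, the function names, and the rule that an empty reason is rejected are assumptions for illustration rather than documented product behaviour.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Override:
    actor: str
    at: datetime
    previous_value: str
    new_value: str
    reason: str


def slice_for_owner(proposals: list[dict], owner: str) -> list[dict]:
    """Return only the proposals this owner is responsible for."""
    return [p for p in proposals if p["owner"] == owner]


def apply_override(proposal: dict, new_label: str, actor: str, reason: str, at: datetime) -> Override:
    """Change a label and append an immutable record of the change."""
    if not reason.strip():
        raise ValueError("an override must state its reason")
    record = Override(actor, at, proposal["label"], new_label, reason)
    proposal["label"] = new_label
    proposal.setdefault("overrides", []).append(record)
    return record


proposals = [
    {"dataset": "loans.applications", "owner": "retail-banking", "label": "internal"},
    {"dataset": "cards.transactions", "owner": "retail-banking", "label": "confidential"},
    {"dataset": "hr.payroll", "owner": "hr", "label": "confidential"},
]

my_slice = slice_for_owner(proposals, "retail-banking")
record = apply_override(
    my_slice[0],
    new_label="confidential",
    actor="retail.privacy.champion",
    reason="Applications include income and identity documents",
    at=datetime(2026, 3, 10, 10, 0, tzinfo=timezone.utc),
)
print(len(my_slice), "items in slice;", record.previous_value, "->", record.new_value)
```

Requiring the reason at the point of change is what lets the review history double as evidence instead of being reconstructed afterwards.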

From classified inventory to the Records of Processing

Discovery and classification produce a current, labelled, owner-approved inventory. Data mapping turns that inventory into the record a regulator actually asks for.

Confirmed discoveries flow directly into the data mapping workflow, maintaining an auditable lineage from scan finding to approved classification label to processing record. From there the inventory becomes structured: systems, data categories with sensitivity levels, processing purposes with their legal bases, retention policies, and data flows with cross-border indicators. The processing activity records are structured to satisfy the Records of Processing expectation that GDPR Article 30 codifies and that Vietnamese practice increasingly expects.

Two capabilities make the record defensible rather than merely present:

A per-field Data Registry that inventories personal data down to the individual element, attaches each one to a data category, and can import deterministically from a regulatory register submission such as the PDPL personal-data-inventory register, detecting conflicts as it goes.
Attestation cycles that assign departments to periodic reviews, detect stale entries, track deadlines, and assign Responsible, Accountable, Consulted, and Informed roles across systems, flows, activities, categories, purposes, and retention policies.
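Stale-entry detection in an attestation cycle reduces to comparing each entry's last attestation date with a review window. This is a minimal sketch assuming a single fixed window in days; the function name, the 365-day default, and the sample entries are hypothetical, and a real program would presumably set windows per record type.

```python
from datetime import date, timedelta


def stale_entries(last_attested: dict[str, date], today: date, max_age_days: int = 365) -> list[str]:
    """Return the entries whose last attestation is older than the review window."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, attested in last_attested.items() if attested < cutoff)


attestations = {
    "claims-db": date(2026, 2, 1),
    "payroll": date(2024, 11, 15),
    "customer-analytics": date(2025, 3, 1),
}
print(stale_entries(attestations, today=date(2026, 3, 10)))
```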

The inventory is no longer a spreadsheet someone owns in name only. It is a governed record with named accountability and a review rhythm.

Find the gap before the regulator does

The final piece answers a question most organisations only get asked during an inspection: is the record actually complete?

The Records of Processing completeness report scores every in-scope processing activity, field by field, against the fields a given framework requires. It shows where critical, high, and conditional fields are missing, drills from a per-field chart into the specific activities that are short, and gives you a direct Fix action that opens the relevant activity at the exact field that needs attention. You choose the framework you are scoring against from the ones you have installed, whether that is PDPL, GDPR, or another.
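Field-by-field scoring can be illustrated with a toy framework profile. The field names and severities below are invented for the example and are not the actual PDPL or GDPR field sets; the sketch also assumes a conditional field is required only when the activity is flagged as cross-border.

```python
# Hypothetical framework profile: field name -> severity.
REQUIRED_FIELDS = {
    "purpose": "critical",
    "legal_basis": "critical",
    "data_categories": "high",
    "retention_period": "high",
    "transfer_destination": "conditional",
}


def is_required(field: str, activity: dict) -> bool:
    """Conditional fields apply only to cross-border activities (an assumption)."""
    if REQUIRED_FIELDS[field] == "conditional":
        return bool(activity.get("cross_border"))
    return True


def score_activity(activity: dict) -> dict:
    """Score one processing activity as the percentage of required fields filled."""
    required = [f for f in REQUIRED_FIELDS if is_required(f, activity)]
    missing = [f for f in required if not activity.get(f)]
    score = round(100 * (len(required) - len(missing)) / len(required))
    return {"score": score, "missing": missing}


activity = {
    "name": "Claims handling",
    "purpose": "Assess and pay insurance claims",
    "data_categories": ["identity", "health"],
    "cross_border": False,
}
result = score_activity(activity)
print(result)
```

Here the activity fills two of the four fields that apply to it, so the report would point at the missing legal basis and retention period before an inspector does.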

This is hygiene before exposure. Instead of discovering during an audit that a third of your processing records are missing a legal basis or a retention period, you see the gap on a dashboard, with a button that takes you straight to the fix.

Walkthrough: the governance lead running an enterprise discovery cycle

Consider a data governance lead at a large enterprise launching the annual data mapping program. The job is to discover systems and datasets across dozens of business units, get them classified correctly, and produce something the governance committee and the regulator can both trust.

The lead opens a new discovery and classification cycle. The platform scans and ingests the available systems, creates or updates the discovered system and dataset records, and proposes a classification for each finding. Those proposals are routed automatically to the correct department and application owners, who approve, modify, reject, or escalate them. As approvals land, the results consolidate into the data inventory register, the classification record, and the governance ownership mapping, with the lineage from each finding back to its source preserved.

Before anything is published, the governance lead and the DPO work the exception queue: unresolved mappings, conflicts, and any item flagged as high-risk or sensitive. Sensitive classifications get a privacy or legal review where needed. Only then does the lead publish the approved baseline, which becomes the current enterprise truth and stays versioned. The final act is an export: the approved data map, the classification register, the ownership mapping, and an evidence pack for the governance committee. None of it was assembled by hand. All of it carries its own history.

Walkthrough: the department owner reviewing only their slice

Now consider one of those department owners, a privacy champion in retail banking who is not a compliance specialist and has a day job.

They receive a task that is narrow and specific: review the proposed classifications for the systems their department owns, not the entire enterprise map. The mandatory fields are clear. They can approve a label, change it with a reason, attach supporting evidence, or escalate something they are unsure about. They can see the status of their contribution at every step, from draft to submitted to approved or returned for revision.

Crucially, they are not asked for the same thing twice. Where they contributed data in a previous cycle, it carries forward rather than arriving as a fresh blank template. The narrow ask, the clear status, and the absence of duplicate requests are what make department-level participation sustainable instead of resented. And every decision they make, with its reason, becomes part of the record.

Walkthrough: when a delta triggers a re-review

The most valuable moment is the one most inventories miss entirely: something changed.

A later discovery cycle finds a new analytics platform that was not there before, and detects that a system materially changed how it processes data. The platform does not wait for the next annual push. The delta surfaces, and a re-review cycle is triggered so the new and changed assets are classified and owned before they drift into the inventory unexamined.

This is also where discovery connects to the rest of the regulatory machine. When a change touches a processing impact assessment or a cross-border transfer that was already filed, the obligation to keep dossiers current applies, which PDPL Article 22 and Decree 356 Article 20 address through the dossier update path and its supporting forms (Mẫu số 03a and 03b). When a discovered flow carries data across a border, the cross-border indicator flags the downstream transfer assessment under PDPL Article 20 and Decree 356 Article 18 rather than letting it pass unnoticed. Regular discovery cycles help keep the inventory current over time, and a current inventory pulls the dependent obligations along with it.

The shift

The change I am arguing for is small to describe and large in consequence. Stop treating the data inventory as a document you produce for an audit, and start treating it as the evidence base the audit is testing.

When discovery runs as a cycle, classification passes through an approval gate, and every decision retains its actor, its reason, and its lineage, the inventory answers the three questions a regulator actually asks. What do you hold? An effective discovery process helps answer that question, and keeps answering it as things change. How sensitive is it, and who says so? A governed classification process helps answer that question, with a named approver behind every label. Can you prove it? The lineage answers that question, because the record remembers its own history.

“The data map stops being something you maintain for the regulator and becomes the thing that proves you were right.”

Moving beyond the spreadsheet

Discovery and classification are not the unglamorous prelude to compliance. They are the part everything else depends on, and the part regulators increasingly test first, because an organisation that cannot describe its own data accurately cannot credibly claim to protect it.

If your inventory still lives in a spreadsheet that was true for a week, the gap to a defensible evidence base is no longer a tooling problem. Discovery cycles, an approval-gated classification lifecycle, owner review routing, a per-field data registry, and a completeness report that finds the holes before an inspector does all exist in one place today.

If you want to see what an approved enterprise data map with full lineage actually looks like, or how the completeness report scores your processing records against the frameworks you operate under, the team at AesirX is happy to walk through it. Visit https://aesirx.io/compliance-one for the current product page, or get in touch to see it run against your own inventory.

Ronni K. Gothard Christiansen
Technical Privacy Engineer and CEO, AesirX.io

Laws and standards referenced

Vietnam: Law on Personal Data Protection (PDPL), Articles 20, 21, 22
Vietnam: Decree 356/2025 (Decree 356), Articles 3, 4, 18, 19, 20, with Mẫu số 03a / 03b / 09 / 10
International: GDPR Article 30 (Records of Processing Activities)
International: ISO 27001 and ISO 27701 (information security and privacy information management)

Disclaimer

This article is operational guidance from a platform vendor, not legal advice. Specific regulatory positions, especially around the PDPL articles, the Decree 356 procedures, and the classification of basic versus sensitive personal data, should be confirmed with qualified Vietnamese counsel for your specific industry and supervisor.

Frequently Asked Questions About Data Discovery and Classification

Answer: Data discovery is the process of finding the systems, applications, and datasets that hold personal data across an organisation, including the ones nobody told the compliance team about. Data classification is the process of labelling that data by sensitivity, such as public, internal, confidential, or restricted, so the right controls and obligations can be applied. Discovery answers "what do we have and where," and classification answers "how sensitive is it and how should it be treated." In a mature program the two run together: discovery surfaces the asset, and classification labels it, ideally through a review that records who decided and why.

Answer: Because a regulator or auditor does not test whether you have a list, they test whether the list is accurate, current, and defensible. A spreadsheet can show what someone believed was true on the day they typed it, but it cannot show who classified each asset, on what basis, when it was last verified, or what changed since. An inventory that functions as evidence retains that lineage, so when an inspector asks "prove that you knew this system held sensitive data and that the right person approved that label," the answer is a record rather than an argument. The difference becomes decisive during a breach investigation, a cross-border filing, or a rights request, where an inaccurate inventory directly undermines everything built on top of it.

Answer: Automated discovery runs as repeatable cycles rather than as a one-time project. Each cycle scans or ingests the available systems, updates the records it finds, and performs delta detection that highlights what is new, what changed, and what disappeared since the previous cycle. New and materially changed assets can trigger a re-review automatically, so they are classified and assigned an owner before they drift into the inventory unexamined. A staging area lets teams check discovered systems before they are admitted to the source of truth, and deduplication rules prevent the same system being recorded several times. The net effect is an inventory that is a maintained time series rather than a snapshot that decays the moment it is finished.

Answer: A defensible classification can name the person who stands behind each label and show the basis for it. That requires an approval gate, where a proposed label is routed to the appropriate data or department owner for review before it takes effect, and an exception path that escalates conflicts to a data owner or DPO rather than burying them. It also requires preserved lineage: the original machine or system suggestion is retained even after a human changes it, and every override records the actor, the time, the previous and new values, and the reason. With that history in place, a classification is no longer an opinion in a cell. It is a reviewed decision with a documented basis, which is exactly what an auditor is looking for.

Answer: A classified, owner-approved inventory is the raw material for the Records of Processing Activities, the structured register of systems, data categories, purposes, legal bases, retention, and data flows that GDPR Article 30 codifies and that Vietnamese practice increasingly expects. Confirmed discoveries flow into the data mapping workflow with their lineage intact, and a completeness report scores each processing record field by field against the framework you are operating under, such as PDPL or GDPR, so gaps are visible before an inspection. From that same foundation, downstream PDPL obligations become tractable: impact assessments under PDPL Article 21 and Decree 356 Article 19, cross-border transfer dossiers under PDPL Article 20 and Decree 356 Article 18, and the dossier updates under PDPL Article 22 and Decree 356 Article 20 all draw on a current, classified inventory rather than starting from a blank page.
