Database Health Report


ThunderScan · Lightning Strikes My DB

A comprehensive analysis of schema quality, normalization integrity,
security posture, and AI readiness for your production database.

📊 Full Scan · 🔒 Confidential · 7 Chapters · 12 Issues Found

Database: portfolio_analytics

Host: prod-db.goldensection.vc

Scan Date: February 23, 2026

Duration: 4 min 52 sec

Prepared For: Isaac Shi · Golden Section VC

Overall Score: 67 / 100

Table of Contents

Executive Summary

Critical Findings

Issue Distribution

Chapter 1 — Schema Quality & Structural Integrity

1.1 Score Breakdown

1.2 Missing Foreign Key Constraints

1.3 Duplicate Entity Detection

Chapter 2 — Normalization Diagnostics

2.1 NF Level Distribution

2.2 Per-Table Normalization Violations

Chapter 3 — Security & Compliance Posture

3.1 Security Score & Findings

3.2 PII Column Inventory

Chapter 4 — AI Readiness Evaluation

4.1 AI Readiness Dimension Scores

4.2 AI Blockers

4.3 Top Vectorization Candidates

Chapter 5 — Compliance Framework Mapping

5.1 Framework Compliance Scores

Chapter 6 — SQL Remediation Scripts

Chapter 7 — Prioritized Remediation Roadmap

★ Executive Summary

Top-line health scores and critical findings at a glance

Overall Score: 67 / 100 (Needs Attention)

Schema Quality: Moderate

Security Score: Risk Present

AI Readiness: Blockers Found

Critical Findings — Immediate Action Required

CRIT

Unencrypted PII — users.phone_number & users.ssn_hash

Two columns storing personally identifiable information are not encrypted at rest. This is an open gap against GDPR Article 32 (security of processing) and SOC 2 CC6.1, and it must be remediated before the next audit.

Table: public.users · Columns: phone_number, ssn_hash · Module: Security

CRIT

Missing FK Index — orders.customer_id

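Column orders.customer_id references customers.id but has no index (and no declared FK constraint; see 1.2). JOINs and lookups by customer must scan the entire orders table, and once the FK is declared, every DELETE or key UPDATE on customers would also scan orders to check for referencing rows.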

Table: public.orders · Column: customer_id · Module: Schema Quality

CRIT

1NF Violation — customer_preferences repeating group

Columns pref_1 through pref_5 form a repeating group, violating First Normal Form. Finding customers with a given preference means checking five columns at once, which makes querying, indexing, and ML feature extraction unreliable.

Table: public.customer_preferences · Module: Normalization

Issue Distribution by Severity

Critical: 3

High: 4

Medium: 3

Low: 2

Chapter 1 Schema Quality & Structural Integrity

FK coverage · constraint analysis · index efficiency · orphan risk · duplicate entities

1.1 Score Breakdown

PK Coverage: 96%

10 tables are missing PKs, mostly legacy audit log tables.

FK Coverage: 78%

54 relationships exist only as implied conventions, without declared constraints.

Constraint Coverage: 71%

NOT NULL and CHECK constraints are sparse in the analytics schema.

Index Efficiency: 82%

18 tables have at least one unindexed FK column, which slows JOINs.

Orphan Risk: 3.2%

2,341 orphaned rows in order_items; 87 orphans in audit_events.
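The PK gap can be cross-checked directly in the system catalog. The query below is a minimal sketch against the standard PostgreSQL catalogs; treating ordinary and partitioned tables outside the system schemas as "tables" is an assumption about how ThunderScan counts them.

```sql
-- List ordinary and partitioned tables that have no PRIMARY KEY constraint.
SELECT n.nspname AS schema_name,
       c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'p')                 -- 'r' = table, 'p' = partitioned table
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND n.nspname NOT LIKE 'pg_toast%'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_constraint con
        WHERE con.conrelid = c.oid
          AND con.contype = 'p')              -- 'p' = primary key constraint
ORDER BY n.nspname, c.relname;
```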

Total Tables: 248

Columns: 1,847

Missing FKs

Orphan Tables

🤖 AI Insight

ThunderScan estimates that declaring the missing FK constraints would improve query planner accuracy for multi-table JOINs by ~22%, based on the current schema topology.

1.2 Missing Foreign Key Constraints

Table	Column	References	Impact	Rows Affected
public.orders	`customer_id`	customers(id)	Critical	142,890
public.order_items	`product_id`	products(id)	High	418,234
analytics.events	`session_id`	user_sessions(id)	High	2,100,441
audit.log_entries	`actor_id`	users(id)	Medium	88,120
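Before any of these constraints can be declared, existing orphans have to be found and fixed. A minimal sketch for the order_items row, assuming both tables live in the public schema and products.id is the referenced key as the table above indicates. Rows with a NULL product_id are skipped because a foreign key does not check NULLs.

```sql
-- Count order_items rows whose product_id has no matching products row.
-- Any such row would make a new FOREIGN KEY fail validation.
SELECT count(*) AS orphaned_rows
FROM public.order_items oi
WHERE oi.product_id IS NOT NULL
  AND NOT EXISTS (
        SELECT 1
        FROM public.products p
        WHERE p.id = oi.product_id);
```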

1.3 Duplicate Entity Detection

⚠ users ↔ analytics.user_profiles

92% column overlap: both have email, name, phone, created_at, country, timezone, language, and avatar_url. This is likely the same entity modeled twice. Consolidate into a single table, or expose one view that JOINs the two while consumers migrate.
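Column overlap of this kind can be approximated from information_schema. This sketch compares column names only, normalizing camelCase to snake_case so that, for example, createdAt and created_at count as the same column; ThunderScan's own 92% metric may be computed differently.

```sql
WITH cols AS (
    SELECT table_schema,
           -- normalize camelCase to snake_case: createdAt -> created_at
           lower(regexp_replace(column_name, '([a-z0-9])([A-Z])', '\1_\2', 'g')) AS norm_name
    FROM information_schema.columns
    WHERE (table_schema, table_name) IN (('public', 'users'),
                                         ('analytics', 'user_profiles'))
)
-- Keep names that appear in both tables (the two tables sit in different schemas).
SELECT norm_name
FROM cols
GROUP BY norm_name
HAVING count(DISTINCT table_schema) = 2
ORDER BY norm_name;
```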

⚠ products ↔ product_catalog

6 overlapping columns with diverging data (SKU, price, category). Likely created by two separate teams. Recommend a data contract review and migration plan.
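A data contract review could start from the rows where the two tables disagree. The query assumes both tables expose columns literally named sku and price, which is an assumption drawn from the column list above rather than a confirmed schema.

```sql
-- SKUs present in both tables whose prices disagree.
-- IS DISTINCT FROM also flags rows where only one side is NULL.
SELECT p.sku,
       p.price  AS products_price,
       pc.price AS catalog_price
FROM products p
JOIN product_catalog pc ON pc.sku = p.sku
WHERE p.price IS DISTINCT FROM pc.price
ORDER BY p.sku;
```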

Chapter 2 Normalization Diagnostics

1NF through BCNF analysis · transitive & partial dependency detection · per-table violations

2.1 NF Level Distribution

[Chart: NF level distribution across Fail 1NF, Fail 2NF, Fail 3NF, Below BCNF, and BCNF+ ✓; 189 tables reach BCNF or higher.]

🤖 AI Insight

Your database averages 2.6NF across all tables. ThunderScan estimates that resolving the 14 tables below 3NF would improve Text-to-SQL accuracy by ~38% by eliminating the ambiguous transitive relationships that confuse query-generation models.

2.2 Per-Table Normalization Violations

Table	NF Violation	Type	Affected Columns	Severity
customer_preferences	Fails 1NF	Repeating group: pref_1…pref_5	`pref_1, pref_2, pref_3, pref_4, pref_5`	Critical
order_tags	Fails 1NF	Multi-value: comma-separated tags	`tags, category_tags`	Critical
user_sessions	Fails 2NF	Partial dep: user_id → user_agent	`user_agent, device_type`	High
order_reports	Fails 2NF	Partial dep on composite key	`customer_email, customer_name`	High
products	Fails 3NF	Transitive: category_id → category_name	`category_name, supplier_name`	Medium
employees	Fails 3NF	Transitive: dept_id → department_name	`department_name, manager_name`	Medium
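Fix 2 in Chapter 6 covers customer_preferences; the comma-separated order_tags.tags column can be split in a similar way. In this sketch, order_tags.order_id and the new table order_tag_values are hypothetical names, and category_tags would be handled the same way.

```sql
-- Sketch: split comma-separated order_tags.tags into one row per tag.
-- order_id and order_tag_values are hypothetical names.
BEGIN;

CREATE TABLE IF NOT EXISTS order_tag_values (
    order_id BIGINT NOT NULL,
    tag      TEXT   NOT NULL,
    PRIMARY KEY (order_id, tag)
);

INSERT INTO order_tag_values (order_id, tag)
SELECT DISTINCT ot.order_id, btrim(t.tag)         -- trim spaces around each tag
FROM order_tags ot
CROSS JOIN LATERAL unnest(string_to_array(ot.tags, ',')) AS t(tag)
WHERE ot.order_id IS NOT NULL
  AND btrim(t.tag) <> ''                          -- skip empty items such as 'a,,b'
ON CONFLICT DO NOTHING;                           -- safe to re-run

COMMIT;
```

A NULL tags value produces no rows, because string_to_array returns NULL and unnest of NULL yields nothing.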

Chapter 3 Security & Compliance Posture

PII detection · encryption gaps · access control · privilege escalation paths

3.1 Security Score & Findings

Security Score

PII Exposures

Privilege Path

Total Findings

CRIT

Unencrypted PII — users.phone_number

Column phone_number stores phone numbers as plain VARCHAR with no encryption. GDPR Article 32 names encryption of personal data as an appropriate security measure, and HIPAA §164.312(a)(2)(iv) lists encryption at rest as an addressable safeguard; AES-256 is the usual choice.

Detected via: regex pattern match on column name + sample data entropy analysis

HIGH

Privilege Escalation Path — analytics → public schema

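Because analytics_writer is a member of report_admin, which is in turn a member of db_superuser, the analytics schema role can exercise INSERT/UPDATE privileges on public schema tables through this indirect chain. It may also be able to SET ROLE db_superuser and gain everything that role holds.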

Role chain: analytics_writer → report_admin → db_superuser
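To confirm the chain and spot any other path, role memberships can be walked recursively through pg_auth_members. A minimal sketch: it reports every role reachable from analytics_writer and whether that role has the SUPERUSER attribute, but it does not evaluate INHERIT settings.

```sql
-- Walk role memberships upward from analytics_writer.
WITH RECURSIVE chain AS (
    SELECT r.oid,
           r.rolname::text AS path,
           r.rolsuper,
           0 AS depth
    FROM pg_roles r
    WHERE r.rolname = 'analytics_writer'
  UNION ALL
    SELECT parent.oid,
           chain.path || ' -> ' || parent.rolname,
           parent.rolsuper,
           chain.depth + 1
    FROM chain
    JOIN pg_auth_members m ON m.member = chain.oid   -- chain role is a member of m.roleid
    JOIN pg_roles parent   ON parent.oid = m.roleid
    WHERE chain.depth < 10                             -- safety cap on recursion depth
)
SELECT path, rolsuper AS is_superuser, depth
FROM chain
ORDER BY depth, path;

-- After review, the first link could be broken with, for example:
-- REVOKE report_admin FROM analytics_writer;
```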

HIGH

Missing Row-Level Security — audit.log_entries

The audit log table is readable by all authenticated roles, and its entries contain actor IDs, IP addresses, and action payloads. Revoke the broad grant and enable RLS so that only audit roles and SECURITY DEFINER functions can read it.

Table: audit.log_entries · Rows: 88,120 · Exposure: all authenticated roles
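A sketch of the RLS change, assuming the broad read access comes from a grant to PUBLIC and using two hypothetical roles, audit_reader and audit_writer, that must already exist. The table owner bypasses RLS unless FORCE ROW LEVEL SECURITY is set, which is how SECURITY DEFINER functions owned by that role keep working.

```sql
-- Sketch: limit audit.log_entries to dedicated audit roles.
-- audit_reader and audit_writer are hypothetical role names.
BEGIN;

-- RLS filters rows but does not remove table grants, so drop the broad one.
-- (Assumes the current exposure comes from a grant to PUBLIC.)
REVOKE SELECT ON audit.log_entries FROM PUBLIC;
GRANT  SELECT ON audit.log_entries TO audit_reader;
GRANT  INSERT ON audit.log_entries TO audit_writer;

ALTER TABLE audit.log_entries ENABLE ROW LEVEL SECURITY;

-- With RLS on, any command without a matching policy is denied by default,
-- so writers need their own policy to keep inserting entries.
CREATE POLICY audit_reader_select ON audit.log_entries
    FOR SELECT TO audit_reader
    USING (true);

CREATE POLICY audit_writer_insert ON audit.log_entries
    FOR INSERT TO audit_writer
    WITH CHECK (true);

COMMIT;
```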

3.2 PII Column Inventory

Table	Column	PII Type	Encrypted	Masked in Logs	Action
public.users	`phone_number`	Phone	✗ No	✗ No	Encrypt Now
public.users	`date_of_birth`	DOB	✓ AES-256	✓ Yes	OK
public.users	`email`	Email	✓ AES-256	~ Partial	Review
public.payments	`card_last4`	Payment	✓ PCI-DSS	✓ Yes	OK
analytics.events	`ip_address`	IP / PII	~ Partial	✗ No	Mask Logs
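For the ip_address row, one common pseudonymization approach is to expose only a network prefix to downstream readers. This sketch assumes ip_address is stored as inet (or as text that casts cleanly to inet) and uses a hypothetical view name, analytics.events_masked. Truncating to /24 for IPv4 and /48 for IPv6 is a widespread convention, not a legal threshold.

```sql
-- Sketch: expose only a network prefix instead of the full IP address.
CREATE OR REPLACE VIEW analytics.events_masked AS
SELECT e.session_id,
       -- set_masklen keeps host bits; network() zeroes them out
       network(set_masklen(e.ip_address::inet,
               CASE family(e.ip_address::inet) WHEN 4 THEN 24 ELSE 48 END)
       ) AS ip_prefix
FROM analytics.events e;
```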

Chapter 4 AI Readiness Evaluation

Text-to-SQL · RAG suitability · vectorization candidates · label consistency · join complexity

4.1 AI Readiness Dimension Scores


Data Completeness: 84%

Schema Clarity (Text-to-SQL): 64%

Vectorization Potential: 71%

Join Integrity: 52%

RAG Suitability: 48%

4.2 AI Blockers — Resolve Before AI Integration

38 ambiguous column names prevent reliable Text-to-SQL

Columns named val, flag, type, data, and info across 12 tables carry no semantic meaning on their own, so LLMs cannot infer their intent and generate incorrect SQL. Adding a COMMENT ON COLUMN for each is estimated to improve accuracy by ~40%.
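A sketch for locating the generic column names that still lack a comment, using only information_schema and col_description(). The five names come from the finding above; the COMMENT statement at the end is a hypothetical template, not a known column in this schema.

```sql
-- Generic column names that have no column comment yet.
SELECT c.table_schema, c.table_name, c.column_name
FROM information_schema.columns c
WHERE c.table_schema NOT IN ('pg_catalog', 'information_schema')
  AND c.column_name IN ('val', 'flag', 'type', 'data', 'info')
  AND col_description(
        format('%I.%I', c.table_schema, c.table_name)::regclass,
        c.ordinal_position) IS NULL          -- ordinal_position matches attnum
ORDER BY c.table_schema, c.table_name, c.column_name;

-- Template for each hit (table, column, and wording are hypothetical):
-- COMMENT ON COLUMN public.orders.flag IS
--     'True when the order is held for manual fraud review.';
```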

Missing FK constraints break RAG relationship traversal

4 implied relationships lack declared FKs. AI agents cannot reliably discover join paths for context retrieval chains, causing incomplete RAG responses.

14 columns with >40% NULL rates bias model training

High NULL rates in label columns used for classification training introduce systematic bias. Impute missing feature values where appropriate, drop rows whose labels are missing, or exclude the affected columns from training datasets.
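NULL rates can be estimated cheaply from planner statistics instead of scanning every table. This sketch reads pg_stats, so its figures are sample-based estimates that are only as fresh as the last ANALYZE; the 0.4 threshold mirrors the >40% cut-off above.

```sql
-- Columns whose sampled NULL fraction exceeds 40%, highest first.
SELECT schemaname,
       tablename,
       attname                            AS column_name,
       round(null_frac::numeric * 100, 1) AS null_pct
FROM pg_stats
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
  AND null_frac > 0.4
ORDER BY null_frac DESC, schemaname, tablename, attname;
```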

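Mixed naming conventions degrade Text-to-SQL accuracy

The analytics schema uses camelCase, while the public schema uses snake_case. Because PostgreSQL folds unquoted identifiers to lowercase, camelCase columns must be double-quoted in every query, which makes cross-schema AI-generated queries error-prone. Standardize on snake_case across all schemas.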

4.3 Top Vectorization Candidates

Table	Column	Type	Suitability	Recommended Use Case
products	`description`	TEXT	92%	Semantic product search
support_tickets	`body`	TEXT	88%	RAG retrieval for support AI
articles	`content`	TEXT	85%	Knowledge base / Q&A
reviews	`review_text`	TEXT	71%	Sentiment analysis

Chapter 5 Compliance Framework Mapping

GDPR · SOC 2 · ISO 27001 · HIPAA · control mapping · open gaps

5.1 Framework Compliance Scores

GDPR: 64% (2 open gaps)

SOC 2: 71% (1 open gap)

ISO 27001: N/A (not assessed)

HIPAA: N/A (not assessed)

Control	Framework	Requirement	Status	Finding
Art. 32	GDPR	Encryption of personal data at rest	Fail	users.phone_number unencrypted
Art. 25	GDPR	Data minimization & pseudonymization	Partial	IP addresses logged in plain text
CC6.1	SOC 2	Logical access controls	Partial	Privilege escalation path detected
CC7.2	SOC 2	System monitoring & anomaly detection	Pass	Audit log active and comprehensive
A.8.2	ISO 27001	Information classification	N/A	Assessment pending

Chapter 6 SQL Remediation Scripts

Auto-generated migration scripts · run inside a transaction unless noted · always back up first

Fix 1 — Add Missing FK Indexes (P0)

⚠ Impact

Eliminates full-table scans on order JOINs and lookups. Expected query speedup: 4–12× for order-related queries.

-- Fix 1: Add indexes for missing FK columns
-- ThunderScan · Generated Feb 23, 2026
-- Run OUTSIDE a transaction: CREATE INDEX CONCURRENTLY cannot run inside
-- a BEGIN/COMMIT block, so execute each statement on its own.

-- orders.customer_id (CRITICAL: 142,890 rows)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer_id
    ON public.orders (customer_id);

-- order_items.product_id (418,234 rows)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_order_items_product_id
    ON public.order_items (product_id);

-- analytics.events.session_id (2,100,441 rows)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_session_id
    ON analytics.events (session_id);
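CREATE INDEX CONCURRENTLY can fail partway (for example on a deadlock or a cancelled session) and leave an INVALID index behind. Such an index is ignored by queries but still adds write overhead, and IF NOT EXISTS will silently skip rebuilding it. A quick post-run check against pg_index:

```sql
-- Indexes marked invalid: failed concurrent builds, or builds still in progress.
SELECT i.indexrelid::regclass AS index_name,
       i.indrelid::regclass   AS table_name
FROM pg_index i
WHERE NOT i.indisvalid;

-- For each leftover from a failed build, drop and rebuild, e.g.:
-- DROP INDEX CONCURRENTLY IF EXISTS public.idx_orders_customer_id;
-- CREATE INDEX CONCURRENTLY idx_orders_customer_id ON public.orders (customer_id);
```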

Fix 2 — Resolve 1NF Violation in customer_preferences (P0)

-- Fix 2: Eliminate repeating group in customer_preferences
-- Replace pref_1…pref_5 columns with a normalized child table
BEGIN;

CREATE TABLE IF NOT EXISTS customer_preference_values (
    id               BIGSERIAL PRIMARY KEY,
    customer_id      INT NOT NULL REFERENCES customers(id),
    preference_key   VARCHAR(100) NOT NULL,
    preference_value TEXT,
    created_at       TIMESTAMPTZ DEFAULT now()
);

-- Named index so a re-run does not create a duplicate
CREATE INDEX IF NOT EXISTS idx_customer_preference_values_customer_id
    ON customer_preference_values (customer_id);

-- Unpivot: one row per non-NULL preference slot
INSERT INTO customer_preference_values (customer_id, preference_key, preference_value)
SELECT customer_id, 'pref_1', pref_1 FROM customer_preferences WHERE pref_1 IS NOT NULL
UNION ALL
SELECT customer_id, 'pref_2', pref_2 FROM customer_preferences WHERE pref_2 IS NOT NULL
UNION ALL
SELECT customer_id, 'pref_3', pref_3 FROM customer_preferences WHERE pref_3 IS NOT NULL
UNION ALL
SELECT customer_id, 'pref_4', pref_4 FROM customer_preferences WHERE pref_4 IS NOT NULL
UNION ALL
SELECT customer_id, 'pref_5', pref_5 FROM customer_preferences WHERE pref_5 IS NOT NULL;

-- Drop the repeating group only after the copy (same transaction)
ALTER TABLE customer_preferences
    DROP COLUMN pref_1,
    DROP COLUMN pref_2,
    DROP COLUMN pref_3,
    DROP COLUMN pref_4,
    DROP COLUMN pref_5;

COMMIT;

Fix 3 — Encrypt PII Column (P0)

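-- Fix 3: Encrypt users.phone_number using pgcrypto
-- Assumes the key is supplied per session via the app.encryption_key setting
CREATE EXTENSION IF NOT EXISTS pgcrypto;

BEGIN;

ALTER TABLE public.users ADD COLUMN phone_number_enc BYTEA;

-- pgp_sym_encrypt defaults to AES-128; request AES-256 explicitly
UPDATE public.users
SET phone_number_enc = pgp_sym_encrypt(
        phone_number,
        current_setting('app.encryption_key'),
        'cipher-algo=aes256')
WHERE phone_number IS NOT NULL;

-- RENAME cannot be combined with other ALTER TABLE actions
ALTER TABLE public.users DROP COLUMN phone_number;
ALTER TABLE public.users RENAME COLUMN phone_number_enc TO phone_number;

COMMIT;

-- DROP COLUMN does not physically erase old plaintext values; they remain in
-- existing row versions, WAL, and backups until the table is fully rewritten
-- and old backups expire.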
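Once Fix 3 has run, phone_number holds pgcrypto ciphertext, so application reads must decrypt explicitly. A minimal sketch, assuming the same app.encryption_key session setting used by the migration; the user id 42 is a placeholder. pgp_sym_decrypt reads the cipher from the message itself, so no cipher option is needed on the read path.

```sql
-- Read path after encryption (placeholder id; key set earlier in the session).
SELECT id,
       pgp_sym_decrypt(phone_number, current_setting('app.encryption_key'))
           AS phone_number
FROM public.users
WHERE id = 42;
```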

Fix 4 — Resolve 3NF Violation in products (P2)

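-- Fix 4: Remove transitive dependencies in products table
-- Assumes categories(id, name) and suppliers(id, name) already exist
BEGIN;

-- Copy names into their parent tables before dropping them from products
INSERT INTO categories (id, name)
SELECT DISTINCT category_id, category_name
FROM products
WHERE category_id IS NOT NULL AND category_name IS NOT NULL
ON CONFLICT (id) DO NOTHING;

INSERT INTO suppliers (id, name)
SELECT DISTINCT supplier_id, supplier_name
FROM products
WHERE supplier_id IS NOT NULL AND supplier_name IS NOT NULL
ON CONFLICT (id) DO NOTHING;

ALTER TABLE products
    DROP COLUMN category_name,
    DROP COLUMN supplier_name;

ALTER TABLE products
    ADD CONSTRAINT fk_products_category
        FOREIGN KEY (category_id) REFERENCES categories(id),
    ADD CONSTRAINT fk_products_supplier
        FOREIGN KEY (supplier_id) REFERENCES suppliers(id);

COMMIT;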

Chapter 7 Prioritized Remediation Roadmap

Action plan ordered by risk severity and implementation complexity

Phase 1 — Immediate (Week 1–2) · P0 Items

Encrypt users.phone_number

Encrypt users.phone_number with pgcrypto (AES-256), then apply the same pattern to any other unencrypted PII columns. This is a blocker for GDPR compliance and the next security audit. Use the Fix 3 migration script above.

Est: 2–4 hrs · DBA · GDPR · SOC2 · P0

Add FK indexes on orders.customer_id & order_items.product_id

Missing indexes cause full table scans on your two highest-traffic tables. Use CONCURRENTLY to avoid blocking writes on production, and run it outside a transaction block. Expected 4–12× query speedup.

Est: 30 min · DBA · Performance · P0

Fix 1NF violation in customer_preferences

Normalize the pref_1…pref_5 repeating columns into a child table. This is required before any ML feature engineering on preference data.

Est: 4–6 hrs · Dev + DBA · Normalization · AI · P0

Phase 2 — Short Term (Month 1) · P1 Items

Revoke privilege escalation path in analytics role

Audit and revoke the indirect role chain analytics_writer → report_admin → db_superuser. Apply principle of least privilege to all analytics roles.

Est: 2 hrs · DBA / Security · SOC2 CC6.1 · P1

Declare 4 missing FK constraints

Add FOREIGN KEY declarations for all 4 implied relationships. This lets the query planner optimize JOINs and gives AI agents reliable join paths for RAG context retrieval.

Est: 1–2 hrs · DBA · Schema · AI Readiness · P1
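One way to add each declaration with minimal locking is the two-step NOT VALID / VALIDATE pattern. The sketch uses order_items.product_id from Chapter 1.2 with a hypothetical constraint name; the VALIDATE step will fail until any orphaned rows (see the orphan counts in Chapter 1.1) are fixed or removed.

```sql
-- Step 1: declare the constraint without checking existing rows.
-- New and updated rows are checked from now on; the statement is quick.
ALTER TABLE public.order_items
    ADD CONSTRAINT fk_order_items_product_id
    FOREIGN KEY (product_id) REFERENCES public.products (id)
    NOT VALID;

-- Step 2: check existing rows. This takes a SHARE UPDATE EXCLUSIVE lock,
-- which does not block ordinary reads and writes on order_items.
-- It fails if orphaned product_id values remain.
ALTER TABLE public.order_items
    VALIDATE CONSTRAINT fk_order_items_product_id;
```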

Add COMMENT ON COLUMN for 38 ambiguous column names

Columns like val, flag, and type have no semantic context. Adding PostgreSQL column comments gives LLMs that context and is estimated to improve Text-to-SQL accuracy by ~40%.

Est: 4–8 hrs · Dev Team · AI Readiness · P1

Phase 3 — Medium Term (Quarter) · P2 Items

Normalize to 3NF — products, employees, order_reports

Resolve the remaining partial and transitive dependencies. Use the Fix 4 migration script as a template, and schedule the work during a low-traffic maintenance window.

Est: 1–2 days · Dev + DBA · Normalization · P2

Standardize column naming to snake_case across all schemas

Rename the 22 camelCase columns in the analytics schema and update all application-layer queries accordingly.

Est: 2–3 days · Dev Team · AI Readiness · Quality · P2
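The rename statements can be generated from the catalog rather than written by hand. This sketch only produces ALTER statements for review and does not execute them. The regex handles simple camelCase; acronyms such as userIDValue come out as user_idvalue and need manual names.

```sql
-- Generate (not execute) RENAME statements for camelCase columns
-- in base tables of the analytics schema.
SELECT format('ALTER TABLE %I.%I RENAME COLUMN %I TO %I;',
              c.table_schema,
              c.table_name,
              c.column_name,
              lower(regexp_replace(c.column_name, '([a-z0-9])([A-Z])', '\1_\2', 'g')))
FROM information_schema.columns c
JOIN information_schema.tables t
  ON t.table_schema = c.table_schema
 AND t.table_name   = c.table_name
 AND t.table_type   = 'BASE TABLE'
WHERE c.table_schema = 'analytics'
  AND c.column_name ~ '[A-Z]'          -- any uppercase letter means it needs quoting
ORDER BY c.table_name, c.column_name;
```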

Create product_embeddings companion table for pgvector

Add product_embeddings(product_id, vector VECTOR(1536), model VARCHAR) to enable semantic product search and a RAG-powered recommendation engine.

Est: 1 day · ML Engineer · AI Integration · P2
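A sketch of that companion table, assuming the pgvector extension (0.5.0 or later, for HNSW indexes) is available and products.id is the key. The embedding column is named embedding rather than vector here purely for readability, the composite key allows one row per product per model, and the prepared statement name similar_products is hypothetical.

```sql
-- Sketch: pgvector companion table for semantic product search.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS product_embeddings (
    product_id BIGINT       NOT NULL REFERENCES products (id),
    embedding  VECTOR(1536) NOT NULL,   -- 1536 dimensions, as specified above
    model      VARCHAR(100) NOT NULL,   -- embedding model that produced the row
    created_at TIMESTAMPTZ  NOT NULL DEFAULT now(),
    PRIMARY KEY (product_id, model)
);

-- Approximate nearest-neighbour index for cosine distance (<=>).
CREATE INDEX IF NOT EXISTS idx_product_embeddings_hnsw
    ON product_embeddings USING hnsw (embedding vector_cosine_ops);

-- Five products closest to a query embedding supplied at EXECUTE time.
PREPARE similar_products(vector) AS
SELECT product_id,
       embedding <=> $1 AS cosine_distance
FROM product_embeddings
ORDER BY embedding <=> $1
LIMIT 5;
```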

Report Summary — ThunderScan Lightning Strikes My DB

Overall Score: 67 / 100

Total Issues: 12

Critical (P0): 3

Action Items: 9

Generated by ThunderScan Lightning Strikes My DB · Feb 23, 2026 · For: Isaac Shi, Golden Section VC · Confidential
