ThunderScan · Lightning Strikes My DB
Database Health Report
A comprehensive analysis of schema quality, normalization integrity,
security posture, and AI readiness for your production database.
📊 Full Scan
🔒 Confidential
7 Chapters
12 Issues Found
Table of Contents
Executive Summary
Critical Findings
Issue Distribution
Chapter 1 — Schema Quality & Structural Integrity
1.1 Score Breakdown
1.2 Missing Foreign Key Constraints
1.3 Duplicate Entity Detection
Chapter 2 — Normalization Diagnostics
2.1 NF Level Distribution
2.2 Per-Table Normalization Violations
Chapter 3 — Security & Compliance Posture
3.1 Security Score & Findings
3.2 PII Column Inventory
Chapter 4 — AI Readiness Evaluation
4.1 AI Readiness Dimension Scores
4.2 AI Blockers
4.3 Top Vectorization Candidates
Chapter 5 — Compliance Framework Mapping
Chapter 6 — SQL Remediation Scripts
Chapter 7 — Prioritized Remediation Roadmap
★ Executive Summary
Top-line health scores and critical findings at a glance
67
Overall Score
Needs Attention
74
Schema Quality
Moderate
61
Security Score
Risk Present
58
AI Readiness
Blockers Found
Critical Findings — Immediate Action Required
CRIT
P0
Unencrypted PII — users.phone_number & users.ssn_hash
Two columns storing personally identifiable information are not encrypted at rest. This violates GDPR Article 32 and SOC 2 CC6.1. Immediate remediation required before next audit.
Table: public.users · Columns: phone_number, ssn_hash · Module: Security
CRIT
P0
Missing FK Index — orders.customer_id
Foreign key column orders.customer_id references customers.id but has no index. This causes a full table scan on every JOIN and slows referential integrity checks at scale.
Table: public.orders · Column: customer_id · Module: Schema Quality
CRIT
P0
1NF Violation — customer_preferences multi-value column
Columns pref_1 through pref_5 represent a repeating group, violating First Normal Form. This makes querying, indexing, and ML feature extraction unreliable.
Table: public.customer_preferences · Module: Normalization
Issue Distribution by Severity
Critical: 3
High: 4
Medium: 3
Low: 2
Chapter 1 Schema Quality & Structural Integrity
FK coverage · constraint analysis · index efficiency · orphan risk · duplicate entities
1.1 Score Breakdown
PK Coverage: 96%
10 tables are missing PKs, mostly legacy audit log tables.
FK Coverage: 78%
54 relationships exist as implied conventions without declared constraints.
Constraint Coverage: 71%
NOT NULL and CHECK constraints are sparse in the analytics schema.
Index Efficiency: 82%
18 tables have at least one unindexed FK column causing slow JOINs.
Orphan Risk: 3.2%
2,341 orphaned rows in order_items; 87 orphans in audit_events.
248
Total Tables
1,847
Columns
4
Missing FKs
2
Orphan Tables
🤖 AI Insight
Declaring missing FK constraints will improve query planner accuracy by ~22% for multi-table JOINs based on the current schema topology.
1.2 Missing Foreign Key Constraints
| Table | Column | References | Impact | Rows Affected |
|---|---|---|---|---|
| public.orders | customer_id | customers(id) | Critical | 142,890 |
| public.order_items | product_id | products(id) | High | 418,234 |
| analytics.events | session_id | user_sessions(id) | High | 2,100,441 |
| audit.log_entries | actor_id | users(id) | Medium | 88,120 |
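The implied relationships in the table above can be declared without a long blocking scan by adding each constraint as NOT VALID and validating it in a separate step. The sketch below covers only orders.customer_id. The constraint name fk_orders_customer is a placeholder, and the sketch assumes customers lives in the public schema, which the table above does not state.

```sql
-- Step 1: count rows that would violate the constraint (orphans).
SELECT count(*) AS orphan_rows
FROM public.orders o
LEFT JOIN public.customers c ON c.id = o.customer_id
WHERE o.customer_id IS NOT NULL
  AND c.id IS NULL;

-- Step 2: declare the constraint without checking existing rows.
-- Rows inserted or updated from now on are checked immediately.
ALTER TABLE public.orders
    ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES public.customers (id)
    NOT VALID;

-- Step 3: once the orphan count is zero, validate the existing rows.
ALTER TABLE public.orders
    VALIDATE CONSTRAINT fk_orders_customer;
```

Step 3 fails if any orphan remains, so the orphaned rows have to be repaired or removed first. The same three steps would apply to the other rows of the table with the table and column names swapped in.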
1.3 Duplicate Entity Detection
⚠ users ↔ analytics.user_profiles
92% column overlap — both have email, name, phone, created_at, country, timezone, language, avatar_url. Likely the same entity modeled twice. Consolidate with a JOIN view.
⚠ products ↔ product_catalog
6 overlapping columns with diverging data (SKU, price, category). Likely created by two separate teams. Recommend a data contract review and migration plan.
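A quick way to review the overlap by hand is to compare column names through information_schema. This is a name-matching sketch only; how ThunderScan derives its overlap percentage is not stated in this report, and the query assumes users lives in the public schema.

```sql
-- Columns shared by name between the two tables, with data types side by side.
SELECT u.column_name,
       u.data_type AS users_type,
       p.data_type AS user_profiles_type
FROM information_schema.columns u
JOIN information_schema.columns p
  ON p.column_name = u.column_name
WHERE u.table_schema = 'public'
  AND u.table_name   = 'users'
  AND p.table_schema = 'analytics'
  AND p.table_name   = 'user_profiles'
ORDER BY u.column_name;
```

Type mismatches in the output are a useful early signal of where a consolidation would need conversion logic.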
Chapter 2 Normalization Diagnostics
1NF through BCNF analysis · transitive & partial dependency detection · per-table violations
2.1 NF Level Distribution
3
Fail 1NF
6
Fail 2NF
5
Fail 3NF
36
Below BCNF
189
BCNF+ ✓
🤖 AI Insight
Your database is at an average of 2.6NF. Resolving the 14 tables below 3NF would improve Text-to-SQL accuracy by ~38% by eliminating ambiguous transitive relationships that confuse query generation models.
2.2 Per-Table Normalization Violations
| Table | NF Violation | Type | Affected Columns | Severity |
|---|---|---|---|---|
| customer_preferences | Fails 1NF | Repeating group: pref_1…pref_5 | pref_1, pref_2, pref_3, pref_4, pref_5 | Critical |
| order_tags | Fails 1NF | Multi-value: comma-separated tags | tags, category_tags | Critical |
| user_sessions | Fails 2NF | Partial dep: user_id → user_agent | user_agent, device_type | High |
| order_reports | Fails 2NF | Partial dep on composite key | customer_email, customer_name | High |
| products | Fails 3NF | Transitive: category_id → category_name | category_name, supplier_name | Medium |
| employees | Fails 3NF | Transitive: dept_id → department_name | department_name, manager_name | Medium |
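The order_tags violation has no script in Chapter 6. A minimal sketch of one way to split the comma-separated list into rows follows. It assumes order_tags has an order_id column and that tags is a text column, neither of which the table above confirms; order_tag_values is a hypothetical target table name.

```sql
-- Hypothetical target table: one row per (order, tag).
CREATE TABLE IF NOT EXISTS order_tag_values (
    order_id BIGINT NOT NULL,
    tag      TEXT   NOT NULL,
    PRIMARY KEY (order_id, tag)
);

-- Split the comma-separated list, trim whitespace, skip empty entries.
INSERT INTO order_tag_values (order_id, tag)
SELECT DISTINCT ot.order_id, btrim(t.tag)
FROM order_tags ot
CROSS JOIN LATERAL unnest(string_to_array(ot.tags, ',')) AS t(tag)
WHERE ot.order_id IS NOT NULL
  AND btrim(t.tag) <> ''
ON CONFLICT DO NOTHING;
```

Rows where tags is NULL produce no output rows, and the composite primary key keeps a rerun from inserting duplicates. The category_tags column would need the same treatment in a second table or with a discriminator column.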
Chapter 3 Security & Compliance Posture
PII detection · encryption gaps · access control · privilege escalation paths
3.1 Security Score & Findings
61
Security Score
2
PII Exposures
1
Privilege Path
3
Total Findings
CRIT
P0
Unencrypted PII — users.phone_number
Column phone_number stores phone numbers as plain VARCHAR with no encryption. GDPR Article 32 names encryption as an appropriate safeguard for personal data, and HIPAA §164.312(a)(2)(iv) lists encryption as an addressable specification; AES-256 or equivalent at rest is the recommended baseline.
Detected via: regex pattern match on column name + sample data entropy analysis
HIGH
P1
Privilege Escalation Path — analytics → public schema
The analytics schema role has INSERT/UPDATE privileges on public schema tables through an indirect role chain: analytics_writer → report_admin → db_superuser.
Role chain: analytics_writer → report_admin → db_superuser
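The chain can be confirmed directly from the catalog by walking role memberships upward from analytics_writer. The recursive query below is a sketch that uses only the standard pg_roles and pg_auth_members catalogs; the role name is taken from the finding above.

```sql
-- Walk role memberships upward from analytics_writer.
WITH RECURSIVE chain AS (
    SELECT r.oid,
           r.rolname::text AS role_name,
           r.rolname::text AS path,
           0               AS depth
    FROM pg_roles r
    WHERE r.rolname = 'analytics_writer'
    UNION ALL
    SELECT parent.oid,
           parent.rolname::text,
           chain.path || ' -> ' || parent.rolname::text,
           chain.depth + 1
    FROM chain
    JOIN pg_auth_members m ON m.member = chain.oid
    JOIN pg_roles parent   ON parent.oid = m.roleid
)
SELECT path, depth
FROM chain
ORDER BY depth;

-- One possible cut point (an assumption, not a recommendation from the scan):
-- REVOKE report_admin FROM analytics_writer;
```

Which link to remove depends on what analytics_writer legitimately needs, so the REVOKE is left commented out.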
HIGH
P1
Missing Row-Level Security — audit.log_entries
The audit log table is readable by all authenticated users. Log entries contain actor IDs, IP addresses, and action payloads. RLS should restrict access to SECURITY DEFINER functions and audit roles only.
Table: audit.log_entries · Rows: 88,120 · Exposure: all authenticated roles
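A minimal sketch of the row-level security change follows. The role audit_reader is hypothetical and must already exist; the policy name is a placeholder.

```sql
-- Turn on row-level security; with no matching policy, a role sees no rows.
ALTER TABLE audit.log_entries ENABLE ROW LEVEL SECURITY;

-- Apply policies to the table owner as well (owners bypass RLS by default).
ALTER TABLE audit.log_entries FORCE ROW LEVEL SECURITY;

-- Hypothetical audit role: may read every row.
CREATE POLICY audit_read ON audit.log_entries
    FOR SELECT
    TO audit_reader
    USING (true);
```

Superusers and roles with BYPASSRLS still see all rows, so this complements the privilege-chain fix instead of replacing it. Because no INSERT policy is defined, whatever process writes the log needs its own policy or must run through a function owned by a role that bypasses RLS.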
3.2 PII Column Inventory
| Table | Column | PII Type | Encrypted | Masked in Logs | Action |
|---|---|---|---|---|---|
| public.users | phone_number | Phone | ✗ No | ✗ No | Encrypt Now |
| public.users | date_of_birth | DOB | ✓ AES-256 | ✓ Yes | OK |
| public.users | email | Email | ✓ AES-256 | ~ Partial | Review |
| public.payments | card_last4 | Payment | ✓ PCI-DSS | ✓ Yes | OK |
| analytics.events | ip_address | IP / PII | ~ Partial | ✗ No | Mask Logs |
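The inventory above can be cross-checked with a simple column-name scan, similar in spirit to the regex detection this report mentions. The pattern list below is illustrative and is not the pattern set ThunderScan uses.

```sql
-- Columns whose names suggest personal data (case-insensitive regex match).
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
  AND column_name::text ~* '(phone|ssn|email|birth|dob|ip_addr|card)'
ORDER BY table_schema, table_name, column_name;
```

A name scan only produces candidates; each hit still needs a look at sample data before it is classified as PII.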
Chapter 4 AI Readiness Evaluation
Text-to-SQL · RAG suitability · vectorization candidates · label consistency · join complexity
4.1 AI Readiness Dimension Scores
58
Overall AI Score
64
Text-to-SQL
71
Vectorization
84
Data Completeness
52
Join Integrity
48
RAG Suitability
Data Completeness: 84%
Schema Clarity (Text-to-SQL): 64%
Vectorization Potential: 71%
Join Integrity: 52%
RAG Suitability: 48%
4.2 AI Blockers — Resolve Before AI Integration
1
38 ambiguous column names prevent reliable Text-to-SQL
Columns named val, flag, type, data, info across 12 tables carry no semantic meaning. LLMs cannot infer their intent and generate wrong SQL as a result. Adding COMMENT ON COLUMN to each is estimated to improve accuracy by ~40%.
2
Missing FK constraints break RAG relationship traversal
4 implied relationships lack declared FKs. AI agents cannot reliably discover join paths for context retrieval chains, causing incomplete RAG responses.
3
14 columns with >40% NULL rates bias model training
High NULL rates in label columns used for classification training introduce systematic bias. Either impute values or exclude columns from training datasets.
4
Mixed naming conventions degrade cross-schema query generation accuracy
The analytics schema uses camelCase; the public schema uses snake_case. Cross-schema AI queries become error-prone. Standardize on snake_case across all schemas.
4.3 Top Vectorization Candidates
| Table | Column | Type | Suitability | Recommended Use Case |
|---|---|---|---|---|
| products | description | TEXT | 92% | Semantic product search |
| support_tickets | body | TEXT | 88% | RAG retrieval for support AI |
| articles | content | TEXT | 85% | Knowledge base / Q&A |
| reviews | review_text | TEXT | 71% | Sentiment analysis |
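Before embedding a candidate column, it helps to profile how much text it actually holds. The query below is a rough sketch for products.description; it does not reproduce the suitability score in the table, whose formula is not given in this report.

```sql
-- Row count, populated rows, and average text length for one candidate column.
SELECT count(*)                        AS total_rows,
       count(description)              AS non_null_rows,
       round(avg(length(description))) AS avg_chars
FROM products;
```

Columns with few populated rows or very short text tend to be weak embedding candidates regardless of their data type.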
Chapter 5 Compliance Framework Mapping
GDPR · SOC 2 · ISO 27001 · HIPAA · control mapping · open gaps
5.1 Framework Compliance Scores
GDPR
64%
2 open gaps
SOC 2
71%
1 open gap
ISO 27001
N/A
Not assessed
HIPAA
N/A
Not assessed
| Control | Framework | Requirement | Status | Finding |
|---|---|---|---|---|
| Art. 32 | GDPR | Encryption of personal data at rest | Fail | users.phone_number unencrypted |
| Art. 25 | GDPR | Data minimization & pseudonymization | Partial | IP addresses logged in plain text |
| CC6.1 | SOC 2 | Logical access controls | Partial | Privilege escalation path detected |
| CC7.2 | SOC 2 | System monitoring & anomaly detection | Pass | Audit log active and comprehensive |
| A.8.2 | ISO 27001 | Information classification | N/A | Assessment pending |
Chapter 6 SQL Remediation Scripts
Auto-generated migration scripts · run in a transaction unless a script says otherwise · always back up first
Fix 1 — Add Missing FK Indexes (P0)
⚠ Impact
Eliminates the full table scan on every order JOIN. Expected query speedup: 4–12× for order-related queries.
-- Fix 1: Add indexes for missing FK columns
-- ThunderScan · Generated Feb 23, 2026
-- Run each statement outside a transaction block:
-- CREATE INDEX CONCURRENTLY fails inside BEGIN ... COMMIT.
-- orders.customer_id (CRITICAL: 142,890 rows)
CREATE INDEX CONCURRENTLY IF NOT EXISTS
    idx_orders_customer_id ON public.orders (customer_id);
-- order_items.product_id (418,234 rows)
CREATE INDEX CONCURRENTLY IF NOT EXISTS
    idx_order_items_product_id ON public.order_items (product_id);
-- analytics.events.session_id (2,100,441 rows)
CREATE INDEX CONCURRENTLY IF NOT EXISTS
    idx_events_session_id ON analytics.events (session_id);
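A concurrent index build that fails or is cancelled leaves an invalid index behind, and IF NOT EXISTS will then skip it on the next run. A catalog check along the following lines can confirm that all three indexes ended up valid; it uses only standard system catalogs.

```sql
-- List any index left in an invalid state.
SELECT n.nspname AS schema_name,
       c.relname AS index_name
FROM pg_index i
JOIN pg_class c     ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;

-- An invalid index has to be dropped before retrying, for example:
-- DROP INDEX CONCURRENTLY IF EXISTS public.idx_orders_customer_id;
```

An empty result means every index in the database is valid.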
Fix 2 — Resolve 1NF Violation in customer_preferences (P0)
-- Fix 2: Eliminate the repeating group in customer_preferences
-- Replace the pref_1 to pref_5 columns with a normalized junction table
BEGIN;
CREATE TABLE IF NOT EXISTS customer_preference_values (
    id               BIGSERIAL PRIMARY KEY,
    customer_id      INT NOT NULL REFERENCES customers(id),
    preference_key   VARCHAR(100) NOT NULL,
    preference_value TEXT,
    created_at       TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX ON customer_preference_values (customer_id);
INSERT INTO customer_preference_values (customer_id, preference_key, preference_value)
SELECT customer_id, 'pref_1', pref_1 FROM customer_preferences WHERE pref_1 IS NOT NULL
UNION ALL SELECT customer_id, 'pref_2', pref_2 FROM customer_preferences WHERE pref_2 IS NOT NULL
UNION ALL SELECT customer_id, 'pref_3', pref_3 FROM customer_preferences WHERE pref_3 IS NOT NULL
UNION ALL SELECT customer_id, 'pref_4', pref_4 FROM customer_preferences WHERE pref_4 IS NOT NULL
UNION ALL SELECT customer_id, 'pref_5', pref_5 FROM customer_preferences WHERE pref_5 IS NOT NULL;
ALTER TABLE customer_preferences
    DROP COLUMN pref_1, DROP COLUMN pref_2, DROP COLUMN pref_3,
    DROP COLUMN pref_4, DROP COLUMN pref_5;
COMMIT;
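Application code that still expects the wide pref_1 to pref_5 layout could read from a compatibility view while it is migrated. The view name below is hypothetical, and the sketch assumes the key/value table created by Fix 2.

```sql
-- Hypothetical compatibility view: pivots key/value rows back into five columns.
CREATE VIEW customer_preferences_legacy_v AS
SELECT customer_id,
       max(preference_value) FILTER (WHERE preference_key = 'pref_1') AS pref_1,
       max(preference_value) FILTER (WHERE preference_key = 'pref_2') AS pref_2,
       max(preference_value) FILTER (WHERE preference_key = 'pref_3') AS pref_3,
       max(preference_value) FILTER (WHERE preference_key = 'pref_4') AS pref_4,
       max(preference_value) FILTER (WHERE preference_key = 'pref_5') AS pref_5
FROM customer_preference_values
GROUP BY customer_id;
```

Customers whose five preference columns were all NULL have no rows in the new table and therefore do not appear in the view. All values come back as TEXT.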
Fix 3 — Encrypt PII Column (P0)
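-- Fix 3: Encrypt users.phone_number using pgcrypto
-- Requires: CREATE EXTENSION IF NOT EXISTS pgcrypto;
BEGIN;
ALTER TABLE public.users
    ADD COLUMN phone_number_enc BYTEA;
-- pgp_sym_encrypt defaults to AES-128, so AES-256 is requested explicitly
UPDATE public.users
SET phone_number_enc = pgp_sym_encrypt(
        phone_number,
        current_setting('app.encryption_key'),
        'cipher-algo=aes256')
WHERE phone_number IS NOT NULL;
-- RENAME cannot be combined with other actions in one ALTER TABLE,
-- so the drop and the rename are separate statements
ALTER TABLE public.users
    DROP COLUMN phone_number;
ALTER TABLE public.users
    RENAME COLUMN phone_number_enc TO phone_number;
COMMIT;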
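Once phone_number holds pgcrypto output, reads have to decrypt explicitly. A minimal sketch of a read path follows, assuming the same app.encryption_key setting used by Fix 3 is available in the session and that users has an id column.

```sql
-- The key must be set for the session first, for example:
-- SET app.encryption_key = '<key from your secret store>';

-- Decrypt a small sample to confirm the migration round-trips.
SELECT id,
       pgp_sym_decrypt(phone_number,
                       current_setting('app.encryption_key')) AS phone_number
FROM public.users
WHERE phone_number IS NOT NULL
LIMIT 10;
```

Running this against a copy of the data before dropping the plaintext column is a cheap way to confirm the key and cipher settings are correct.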
Fix 4 — Resolve 3NF Violation in products (P2)
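-- Fix 4: Remove transitive dependency in products table
-- Assumes categories(id, name) and suppliers(id, name) already exist
-- and that products has category_id and supplier_id columns
BEGIN;
INSERT INTO categories (id, name)
SELECT DISTINCT category_id, category_name FROM products
WHERE category_id IS NOT NULL AND category_name IS NOT NULL
ON CONFLICT (id) DO NOTHING;
-- Preserve supplier names before the column is dropped
INSERT INTO suppliers (id, name)
SELECT DISTINCT supplier_id, supplier_name FROM products
WHERE supplier_id IS NOT NULL AND supplier_name IS NOT NULL
ON CONFLICT (id) DO NOTHING;
ALTER TABLE products
    DROP COLUMN category_name,
    DROP COLUMN supplier_name;
ALTER TABLE products
    ADD CONSTRAINT fk_products_category
        FOREIGN KEY (category_id) REFERENCES categories(id),
    ADD CONSTRAINT fk_products_supplier
        FOREIGN KEY (supplier_id) REFERENCES suppliers(id);
COMMIT;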
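Section 4.2 recommends COMMENT ON COLUMN for the ambiguous column names, which none of the four fixes covers. A sketch in the same style follows. The first query lists candidate columns with their current comments; the COMMENT statement uses analytics.events.flag purely as a hypothetical example, since this report does not list which tables hold the flagged columns.

```sql
-- List columns with generic names and whatever comment they currently carry.
SELECT c.table_schema,
       c.table_name,
       c.column_name,
       col_description(
           format('%I.%I', c.table_schema, c.table_name)::regclass,
           c.ordinal_position::int) AS current_comment
FROM information_schema.columns c
WHERE c.column_name IN ('val', 'flag', 'type', 'data', 'info')
  AND c.table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY c.table_schema, c.table_name, c.column_name;

-- Hypothetical example: replace the table, column, and text for each hit.
COMMENT ON COLUMN analytics.events.flag IS
    'Placeholder: state what the flag means and list its allowed values.';
```

Comments are metadata only, so this change does not rewrite table data or require a maintenance window.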
Chapter 7 Prioritized Remediation Roadmap
Action plan ordered by risk severity and implementation complexity
Phase 1 — Immediate (Week 1–2) · P0 Items
1
Encrypt users.phone_number
Apply pgcrypto AES-256 encryption to all PII columns. Blocker for GDPR compliance and next security audit. Use Fix 3 migration script above.
2
Add FK indexes on orders.customer_id & order_items.product_id
Missing indexes cause full table scans on your two highest-traffic tables. Use CONCURRENTLY to avoid locking production. Expected 4–12× query speedup.
3
Fix 1NF violation in customer_preferences
Normalize pref_1…pref_5 repeating columns into a junction table. Required before any ML feature engineering on preference data.
Phase 2 — Short Term (Month 1) · P1 Items
4
Revoke privilege escalation path in analytics role
Audit and revoke the indirect role chain analytics_writer → report_admin → db_superuser. Apply principle of least privilege to all analytics roles.
5
Declare 4 missing FK constraints
Add FOREIGN KEY declarations for all 4 implied relationships. Enables query planner to optimize JOINs and unlocks reliable AI context traversal for RAG systems.
6
Add COMMENT ON COLUMN for 38 ambiguous column names
Columns like val, flag, type have no semantic context. Adding PostgreSQL column comments provides schema context to LLMs and improves Text-to-SQL accuracy by ~40%.
Phase 3 — Medium Term (Quarter) · P2 Items
7
Normalize to 3NF — products, employees, order_reports
Resolve the remaining partial and transitive dependencies. Use the Fix 4 migration script as a template. Schedule during a low-traffic maintenance window.
8
Standardize column naming to snake_case across all schemas
Rename 22 camelCase columns in analytics schema. Update all application-layer queries accordingly.
9
Create product_embeddings companion table for pgvector
Add product_embeddings(product_id, vector VECTOR(1536), model VARCHAR) to enable semantic product search and a RAG-powered recommendation engine.
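A minimal sketch of the companion table using pgvector follows. It assumes the pgvector extension is installed (version 0.5.0 or later for the HNSW index) and that products has an integer id primary key. The vector column is named embedding here instead of vector, only to keep it distinct from the type name; the index name and the product id 42 are placeholders.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- One embedding per product; model records which embedding model produced it.
CREATE TABLE IF NOT EXISTS product_embeddings (
    product_id BIGINT PRIMARY KEY REFERENCES products (id),
    embedding  vector(1536) NOT NULL,
    model      VARCHAR(100) NOT NULL
);

-- Approximate nearest-neighbour index for cosine distance.
CREATE INDEX IF NOT EXISTS idx_product_embeddings_hnsw
    ON product_embeddings USING hnsw (embedding vector_cosine_ops);

-- Five products most similar to product 42 (placeholder id).
SELECT product_id
FROM product_embeddings
WHERE product_id <> 42
ORDER BY embedding <=> (SELECT embedding
                        FROM product_embeddings
                        WHERE product_id = 42)
LIMIT 5;
```

The 1536 dimension matches the roadmap item above and must equal the output size of whichever embedding model populates the table.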