Views Reference¶
Complete reference for all 24 pg_stat_insights views with examples and use cases including comprehensive replication monitoring, logical replication subscription/publication tracking, and debugging.
Overview¶
pg_stat_insights provides 24 pre-built views for instant query performance analysis, replication monitoring (physical & logical), and diagnostics:
| View | Purpose | Typical Use Case |
|---|---|---|
pg_stat_insights | Main statistics view | All-purpose monitoring |
pg_stat_insights_top_by_time | Slowest queries by total time | Find time-consuming queries |
pg_stat_insights_top_by_calls | Most frequently called | Find hot path queries |
pg_stat_insights_top_by_io | Highest I/O consumers | Identify disk-intensive operations |
pg_stat_insights_top_cache_misses | Poor cache performers | Optimize buffer cache usage |
pg_stat_insights_slow_queries | Queries with mean time >100ms | Find consistently slow queries |
pg_stat_insights_errors | Queries with errors | Troubleshooting |
pg_stat_insights_plan_errors | Plan estimation issues | Query optimization |
pg_stat_insights_histogram_summary | Response time distribution | Performance analysis |
pg_stat_insights_by_bucket | Time-series aggregation | Trend analysis |
pg_stat_insights_replication | Basic replication monitoring | Basic replication lag tracking |
pg_stat_insights_physical_replication | Physical replication details | Streaming replication health |
pg_stat_insights_logical_replication | Logical replication slots | Logical replication lag tracking |
pg_stat_insights_replication_slots | All replication slots | Slot health and WAL retention |
pg_stat_insights_replication_summary | Replication overview | Cluster-wide replication status |
pg_stat_insights_replication_alerts | Critical alerts | Lag/WAL loss/inactive detection |
pg_stat_insights_replication_wal | WAL statistics | WAL retention and sizing |
pg_stat_insights_replication_bottlenecks | Bottleneck detection | Network/I/O/replay analysis |
pg_stat_insights_replication_conflicts | Conflict detection | Logical replication conflicts |
pg_stat_insights_replication_health | Health diagnostics | Comprehensive health check |
pg_stat_insights_replication_performance | Performance trends | Throughput and lag trends |
pg_stat_insights_replication_timeline | Timeline analysis | Historical lag patterns |
pg_stat_insights_subscriptions | Subscription monitoring | Logical replication subscriptions |
pg_stat_insights_subscription_stats | Subscription details | Per-table sync status |
pg_stat_insights_publications | Publication monitoring | Publisher configuration |
pg_stat_insights_replication_origins | Origin tracking | Cascading/bidirectional replication |
pg_stat_insights_replication_dashboard | Unified dashboard | JSON comprehensive view |
Main Views¶
pg_stat_insights¶
The primary statistics view with all 52 metrics
Columns (52 total):
| Category | Columns |
|---|---|
| Identity | userid, dbid, toplevel, queryid, query |
| Planning | plans, total_plan_time, min_plan_time, max_plan_time, mean_plan_time, stddev_plan_time |
| Execution | calls, total_exec_time, min_exec_time, max_exec_time, mean_exec_time, stddev_exec_time, rows |
| Buffer I/O | shared_blks_hit, shared_blks_read, shared_blks_dirtied, shared_blks_written |
| Local I/O | local_blks_hit, local_blks_read, local_blks_dirtied, local_blks_written |
| Temp I/O | temp_blks_read, temp_blks_written |
| I/O Timing | shared_blk_read_time, shared_blk_write_time, local_blk_read_time, local_blk_write_time, temp_blk_read_time, temp_blk_write_time |
| WAL | wal_records, wal_fpi, wal_bytes, wal_buffers_full |
| JIT | jit_functions, jit_generation_time, jit_inlining_count, jit_inlining_time, jit_optimization_count, jit_optimization_time, jit_emission_count, jit_emission_time, jit_deform_count, jit_deform_time |
| Parallel | parallel_workers_to_launch, parallel_workers_launched |
| Timestamps | stats_since, minmax_stats_since |
Example:
-- Comprehensive query analysis
SELECT
LEFT(query, 100) AS query_preview,
calls,
ROUND(mean_exec_time::numeric, 2) AS avg_time_ms,
ROUND(cache_hit_ratio::numeric, 3) AS cache_ratio,
pg_size_pretty(wal_bytes::bigint) AS wal_generated,
parallel_workers_launched,
rows
FROM pg_stat_insights
WHERE calls > 10
ORDER BY total_exec_time DESC
LIMIT 20;
Performance Analysis Views¶
pg_stat_insights_top_by_time¶
Top 100 queries by total execution time
Identifies queries consuming the most cumulative time.
Use Cases:
- [TARGET] Optimization priority - Focus on queries consuming most time
- [DATA] Capacity planning - Identify resource-intensive operations
- [FIND] Performance regression - Detect newly slow queries
Example Analysis:
-- Find queries contributing to 80% of total time
WITH total_time AS (
SELECT SUM(total_exec_time) AS total FROM pg_stat_insights
),
ranked AS (
SELECT
query,
total_exec_time,
SUM(total_exec_time) OVER (ORDER BY total_exec_time DESC) AS running_total,
ROW_NUMBER() OVER (ORDER BY total_exec_time DESC) AS rank
FROM pg_stat_insights
)
SELECT
rank,
LEFT(query, 100) AS query_preview,
ROUND(total_exec_time::numeric, 2) AS time_ms,
ROUND((running_total / (SELECT total FROM total_time) * 100)::numeric, 1) AS cumulative_pct
FROM ranked
WHERE running_total <= (SELECT total FROM total_time) * 0.8
ORDER BY rank;
pg_stat_insights_top_by_calls¶
Top 100 queries by call count
Identifies most frequently executed queries.
Use Cases:
- [HOT] Hot paths - Identify critical code paths
- [FAST] Caching opportunities - Find queries to cache
- [TARGET] Micro-optimization - Even small improvements have big impact
Example:
-- Frequent queries with improvement potential
SELECT
LEFT(query, 100) AS query_preview,
calls,
ROUND(mean_exec_time::numeric, 2) AS avg_ms,
ROUND((calls * mean_exec_time)::numeric, 2) AS total_time_ms,
ROUND(cache_hit_ratio::numeric, 3) AS cache_ratio
FROM pg_stat_insights_top_by_calls
WHERE mean_exec_time > 1 -- Room for optimization
ORDER BY (calls * mean_exec_time) DESC
LIMIT 20;
pg_stat_insights_top_by_io¶
Top 100 queries by I/O operations
Identifies disk-intensive queries.
Sorting Logic:
Use Cases:
- [DISK] Disk bottlenecks - Find I/O-bound queries
- [DEPLOY] Index opportunities - Reduce sequential scans
- [TREND] Storage planning - Identify I/O patterns
Example:
-- I/O-intensive queries with timing
SELECT
LEFT(query, 100) AS query_preview,
calls,
shared_blks_read,
shared_blks_hit,
shared_blk_read_time,
ROUND((shared_blk_read_time / NULLIF(shared_blks_read, 0))::numeric, 3) AS ms_per_block,
ROUND(cache_hit_ratio::numeric, 3) AS cache_ratio
FROM pg_stat_insights_top_by_io
WHERE shared_blks_read > 0
LIMIT 20;
pg_stat_insights_top_cache_misses¶
Queries with poor cache performance
Includes calculated cache_hit_ratio column.
Cache Hit Ratio Calculation:
Use Cases:
- [DATA] Cache optimization - Improve buffer cache efficiency
- [TARGET] Memory tuning - Adjust
shared_buffers - [FIND] Index analysis - Find missing indexes
Example:
-- Queries needing cache optimization
SELECT
LEFT(query, 100) AS query_preview,
calls,
ROUND(cache_hit_ratio::numeric, 3) AS cache_ratio,
shared_blks_hit,
shared_blks_read,
pg_size_pretty((shared_blks_read * 8192)::bigint) AS data_read_from_disk
FROM pg_stat_insights_top_cache_misses
WHERE cache_hit_ratio < 0.95 AND calls > 5
ORDER BY (shared_blks_read * calls) DESC
LIMIT 20;
pg_stat_insights_slow_queries¶
Queries with mean execution time > 100ms
Filters for consistently slow queries.
Filter Logic:
Use Cases:
- [SLOW] Slow query detection - Find performance problems
- [TREND] Optimization candidates - Prioritize improvements
- [WARNING] Performance alerts - Set up monitoring thresholds
Example:
-- Slow queries with full context
SELECT
LEFT(query, 100) AS query_preview,
calls,
ROUND(min_exec_time::numeric, 2) AS min_ms,
ROUND(mean_exec_time::numeric, 2) AS avg_ms,
ROUND(max_exec_time::numeric, 2) AS max_ms,
ROUND(stddev_exec_time::numeric, 2) AS stddev_ms,
rows,
ROUND(cache_hit_ratio::numeric, 3) AS cache_ratio
FROM pg_stat_insights_slow_queries
ORDER BY mean_exec_time DESC
LIMIT 20;
pg_stat_insights_histogram_summary¶
Response time distribution histogram
Shows query count distribution across time buckets.
Columns:
bucket_label- Time range label (e.g., "< 1ms", "1-10ms")bucket_order- Ordering numberquery_count- Queries in this buckettotal_time- Total execution time for bucketavg_time- Average execution time
Use Cases:
- [DATA] Performance profile - Understand query time distribution
- [TARGET] SLA monitoring - Track % of queries under threshold
- [TREND] Trend analysis - Monitor distribution changes
Example:
-- Response time SLA analysis
SELECT
bucket_label,
query_count,
ROUND((query_count::numeric / SUM(query_count) OVER () * 100), 1) AS pct_queries,
ROUND(total_time::numeric, 2) AS total_ms,
ROUND((total_time / SUM(total_time) OVER () * 100), 1) AS pct_time
FROM pg_stat_insights_histogram_summary
ORDER BY bucket_order;
Sample Output:
bucket_label | query_count | pct_queries | total_ms | pct_time
----------------+-------------+-------------+----------+----------
< 1ms | 1,234 | 82.3 | 456.12 | 15.2
1-10ms | 189 | 12.6 | 987.45 | 32.9
10-100ms | 52 | 3.5 | 1245.67 | 41.5
100ms-1s | 18 | 1.2 | 234.56 | 7.8
> 1s | 7 | 0.5 | 78.90 | 2.6
pg_stat_insights_by_bucket¶
Time-series query statistics by time bucket
Groups queries by time periods for trend analysis.
Columns:
bucket_start- Bucket start timestampbucket_end- Bucket end timestampquery_count- Distinct queries in buckettotal_calls- Total executionstotal_time- Total execution timeavg_time- Average execution time
Use Cases:
- [TIME] Time-series analysis - Track performance over time
- [FIND] Pattern detection - Find daily/hourly patterns
- [SCHEDULE] Peak load analysis - Identify busy periods
Example:
-- Hourly performance trends
SELECT
DATE_TRUNC('hour', bucket_start) AS hour,
COUNT(*) AS bucket_count,
SUM(query_count) AS total_queries,
SUM(total_calls) AS total_executions,
ROUND(AVG(avg_time)::numeric, 2) AS avg_response_ms
FROM pg_stat_insights_by_bucket
WHERE bucket_start >= NOW() - INTERVAL '24 hours'
GROUP BY DATE_TRUNC('hour', bucket_start)
ORDER BY hour DESC;
Specialized Views¶
pg_stat_insights_errors¶
Queries that encountered errors
Note
This view may be empty if queries complete successfully. Errors are typically logged but not retained in statistics.
Use Cases:
- [BUG] Error detection - Find failing queries
- [FIND] Debugging - Identify problematic SQL
- [WARNING] Alert generation - Monitor for errors
pg_stat_insights_plan_errors¶
Queries with significant plan estimation errors
Compares planned rows vs actual rows.
Use Cases:
- [DATA] Statistics accuracy - Find stale statistics
- [TARGET] Query tuning - Identify mis-estimated queries
- [TREND] Index effectiveness - Evaluate index usage
Example:
-- Find queries with poor estimates
SELECT
LEFT(query, 100) AS query_preview,
calls,
rows AS actual_rows,
planned_rows,
ABS(rows - planned_rows) AS estimation_error,
ROUND((ABS(rows - planned_rows)::numeric / NULLIF(GREATEST(rows, planned_rows), 0) * 100), 1) AS error_pct
FROM pg_stat_insights_plan_errors
WHERE ABS(rows - planned_rows) > 100
ORDER BY ABS(rows - planned_rows) DESC
LIMIT 20;
pg_stat_insights_replication¶
Replication lag and statistics monitoring
Columns:
pid- Replication process IDusename- Replication userapplication_name- Client applicationclient_addr- Client IP addressrepl_state- Replication state (streaming, catchup, etc.)sync_state- Synchronous state (sync, async, potential)sent_lsn- Last LSN sent to clientwrite_lag_bytes- Bytes behind in writingflush_lag_bytes- Bytes behind in flushingreplay_lag_bytes- Bytes behind in replayingwrite_lag_seconds- Time lag in writingflush_lag_seconds- Time lag in flushingreplay_lag_seconds- Time lag in replaying
Use Cases:
- [SYNC] Replication monitoring - Track replica lag
- [WARNING] Lag alerts - Detect replication delays
- [DATA] Capacity planning - Understand replication load
Example:
-- Monitor replication lag
SELECT
application_name,
client_addr,
sync_state,
pg_size_pretty(replay_lag_bytes::bigint) AS replay_lag_size,
ROUND(replay_lag_seconds::numeric, 2) AS replay_lag_sec,
CASE
WHEN replay_lag_seconds < 1 THEN '[OK] Healthy'
WHEN replay_lag_seconds < 10 THEN '[WARNING] Warning'
ELSE '[CRITICAL] Critical'
END AS status
FROM pg_stat_insights_replication
ORDER BY replay_lag_seconds DESC;
View Comparison¶
When to Use Each View¶
graph TD
A[Start] --> B{What do you need?}
B -->|Find slowest queries| C[pg_stat_insights_top_by_time]
B -->|Find most called queries| D[pg_stat_insights_top_by_calls]
B -->|Find I/O bottlenecks| E[pg_stat_insights_top_by_io]
B -->|Optimize cache| F[pg_stat_insights_top_cache_misses]
B -->|Find consistently slow| G[pg_stat_insights_slow_queries]
B -->|Analyze time distribution| H[pg_stat_insights_histogram_summary]
B -->|Track trends| I[pg_stat_insights_by_bucket]
B -->|Monitor replication| J[pg_stat_insights_replication]
B -->|All metrics| K[pg_stat_insights] View Performance Characteristics¶
| View | Rows Returned | Query Speed | Memory Usage |
|---|---|---|---|
pg_stat_insights | All queries (5,000+) | Medium | High |
pg_stat_insights_top_by_time | 100 | Fast | Low |
pg_stat_insights_top_by_calls | 100 | Fast | Low |
pg_stat_insights_top_by_io | 100 | Fast | Low |
pg_stat_insights_top_cache_misses | 100 | Fast | Low |
pg_stat_insights_slow_queries | Filtered | Fast | Low |
pg_stat_insights_histogram_summary | 10-20 | Very Fast | Very Low |
pg_stat_insights_by_bucket | 100-1000 | Medium | Medium |
pg_stat_insights_replication | 1-10 | Very Fast | Very Low |
Advanced Usage¶
Combining Views¶
-- Slow queries with poor cache performance
SELECT
s.query,
s.mean_exec_time,
c.cache_hit_ratio,
s.calls,
i.shared_blks_read
FROM pg_stat_insights_slow_queries s
JOIN pg_stat_insights_top_cache_misses c USING (queryid)
JOIN pg_stat_insights_top_by_io i USING (queryid)
WHERE c.cache_hit_ratio < 0.9
ORDER BY s.mean_exec_time DESC
LIMIT 15;
Custom Views¶
-- Create your own monitoring view
CREATE VIEW my_critical_queries AS
SELECT
queryid,
LEFT(query, 200) AS query_preview,
calls,
ROUND(mean_exec_time::numeric, 2) AS avg_ms,
ROUND(cache_hit_ratio::numeric, 3) AS cache_ratio,
pg_size_pretty(wal_bytes::bigint) AS wal_size,
CASE
WHEN mean_exec_time > 1000 THEN '[CRITICAL] Critical'
WHEN mean_exec_time > 100 THEN '[WARNING] Warning'
ELSE '[OK] OK'
END AS status
FROM pg_stat_insights
WHERE calls > 10 AND (
mean_exec_time > 100
OR cache_hit_ratio < 0.9
OR wal_bytes > 1000000
)
ORDER BY mean_exec_time DESC;
-- Query your custom view
SELECT * FROM my_critical_queries;
View Maintenance¶
Refresh Statistics¶
Statistics are continuously updated. To see latest data:
Reset Statistics¶
-- Reset all statistics (affects all views)
SELECT pg_stat_insights_reset();
-- Reset specific query
SELECT pg_stat_insights_reset(userid, dbid, queryid);
Performance Considerations¶
View Query Cost¶
All views are lightweight and query shared memory directly:
- Cost: ~1-10ms per query
- Impact: Minimal on production
- Caching: Results can be cached by application
Best Practices¶
- Limit results - Use
LIMITfor large result sets - Filter early - Add WHERE clauses when possible
- Cache results - Cache view results in application
- **Avoid SELECT *** - Select only needed columns
- Use prepared statements - For repeated view queries
Next Steps¶
- Metrics Guide - Learn about all 52 metrics
- Usage Examples - 50+ real-world queries
- Quick Start - Start monitoring now
Replication Monitoring Views¶
pg_stat_insights_physical_replication¶
Enhanced physical replication monitoring with health status
Columns:
pid- Backend process IDusename- Replication user nameapplication_name- Application identifierclient_addr- Client IP addressclient_hostname- Client hostnameclient_port- Client port numberbackend_start- Time when this process was starteduptime_seconds- How long replica has been connectedbackend_xmin- Transaction ID thresholdrepl_state- Current state (streaming, catchup, backup)sync_state- Synchronous commit state (async, potential, quorum, sync)sync_priority- Priority for synchronous replicationsent_lsn- Last WAL location sent to replicawrite_lsn- Last WAL location written by replicaflush_lsn- Last WAL location flushed by replicareplay_lsn- Last WAL location replayed by replicawrite_lag_bytes- Bytes between sent and writeflush_lag_bytes- Bytes between sent and flushreplay_lag_bytes- Bytes between sent and replaywrite_lag_mb- MB lag for writeflush_lag_mb- MB lag for flushreplay_lag_mb- MB lag for replaywrite_lag_seconds- Time lag for writeflush_lag_seconds- Time lag for flushreplay_lag_seconds- Time lag for replayreply_time- Last status update from replicalast_msg_age_seconds- Seconds since last messagehealth_status- HEALTHY, WARNING, CRITICAL, SYNCING, DISCONNECTED
Example: Monitor replica health
SELECT
application_name,
client_addr,
replay_lag_mb,
replay_lag_seconds,
health_status
FROM pg_stat_insights_physical_replication
ORDER BY replay_lag_seconds DESC NULLS LAST;
pg_stat_insights_logical_replication¶
Logical replication slot monitoring with lag tracking
Columns:
slot_name- Replication slot identifierplugin- Logical decoding plugin (pgoutput, wal2json, etc.)slot_type- Always 'logical'database- Database nameactive- Whether slot is actively being usedactive_pid- PID of process using the slotxmin- Oldest transaction that slot prevents from being vacuumedcatalog_xmin- Oldest transaction affecting system catalogsrestart_lsn- WAL position slot needs to restart fromconfirmed_flush_lsn- WAL position confirmed by subscriberwal_status- WAL availability status (reserved, extended, unreserved, lost)safe_wal_size- Bytes until wal_keep_size limittwo_phase- Whether slot supports two-phase commitlag_bytes- Bytes of WAL laglag_mb- MB of WAL lagwal_files_retained- Number of WAL segment files retained
Example: Monitor logical replication lag
SELECT
slot_name,
database,
plugin,
active,
lag_mb,
wal_files_retained,
wal_status
FROM pg_stat_insights_logical_replication
ORDER BY lag_mb DESC;
pg_stat_insights_replication_slots¶
All replication slots (physical + logical) with health monitoring
Columns:
- All standard pg_replication_slots columns
lag_bytes- Total lag in byteslag_mb- Total lag in megabyteswal_files_retained- WAL segments being retainedhealth_status- HEALTHY, INACTIVE, WARNING, CRITICAL, HIGH_LAG
Example: Identify problematic slots
SELECT
slot_name,
slot_type,
database,
active,
health_status,
lag_mb,
wal_status
FROM pg_stat_insights_replication_slots
WHERE health_status IN ('CRITICAL', 'WARNING', 'HIGH_LAG')
ORDER BY lag_mb DESC;
pg_stat_insights_replication_summary¶
Cluster-wide replication overview
Columns:
physical_replicas_connected- Number of streaming replicasphysical_slots_active- Active physical replication slotslogical_slots_active- Active logical replication slotsinactive_slots- Slots not currently in useslots_with_lost_wal- Slots that lost required WALmax_replay_lag_bytes- Maximum replay lag across all replicasmax_replay_lag_seconds- Maximum replay lag in secondsavg_replay_lag_seconds- Average replay lag in secondscurrent_wal_lsn- Current WAL write positiontotal_slot_lag_bytes- Combined lag of all slotsstreaming_replicas- Replicas in streaming statecatchup_replicas- Replicas catching upsync_replicas- Synchronous replicaspotential_sync_replicas- Potential sync candidates
Example: Get cluster replication overview
SELECT
physical_replicas_connected,
logical_slots_active,
max_replay_lag_seconds,
streaming_replicas,
sync_replicas,
ROUND((max_replay_lag_bytes::numeric / 1024 / 1024), 2) AS max_lag_mb
FROM pg_stat_insights_replication_summary;
pg_stat_insights_replication_alerts¶
Critical alert detection for replication issues
SELECT * FROM pg_stat_insights_replication_alerts
WHERE alert_level LIKE 'CRITICAL%' OR alert_level LIKE 'WARNING%';
Columns:
replication_type- PHYSICAL or LOGICALidentifier- application_name (physical) or slot_name (logical)source- client_addr (physical) or database (logical)alert_level- OK, INFO, WARNING, CRITICAL with descriptionlag_seconds- Time lag (physical only)lag_mb- Space lag in megabytesstate- Current replication statesync_state- Synchronization state or plugin namelast_message_age_seconds- Seconds since last message (physical only)
Alert Thresholds:
Physical Replication: - CRITICAL: Not streaming or lag > 5 minutes - WARNING: Lag > 1 minute - INFO: Lag > 10 seconds
Logical Replication: - CRITICAL: WAL lost or lag > 1GB - WARNING: Inactive or WAL unreserved or lag > 100MB
Example: Get all critical alerts
SELECT
replication_type,
identifier,
alert_level,
lag_mb,
state
FROM pg_stat_insights_replication_alerts
WHERE alert_level NOT LIKE 'OK%'
ORDER BY
CASE
WHEN alert_level LIKE 'CRITICAL%' THEN 1
WHEN alert_level LIKE 'WARNING%' THEN 2
ELSE 3
END,
lag_mb DESC NULLS LAST;
pg_stat_insights_replication_wal¶
Detailed WAL statistics and retention analysis
Columns:
current_wal_lsn- Current WAL write positioncurrent_wal_insert_lsn- Current WAL insert positiontotal_wal_generated_bytes- Total WAL generated since cluster inittotal_wal_generated_gb- Total WAL in gigabyteswal_files_count- Number of WAL segment fileswal_total_size_bytes- Total size of all WAL fileswal_total_size_mb- Total WAL size in megabyteswal_keep_size- wal_keep_size settingmax_wal_size- max_wal_size settingmin_wal_size- min_wal_size settingoldest_slot_lsn- Oldest LSN required by any slotwal_retained_mb- WAL retained for replication slots
Example: Check WAL retention
SELECT
wal_files_count,
wal_total_size_mb,
wal_retained_mb,
wal_keep_size,
CASE
WHEN wal_retained_mb::numeric > wal_keep_size::numeric THEN 'WARNING: Exceeding wal_keep_size'
WHEN wal_total_size_mb > 10000 THEN 'WARNING: WAL directory > 10GB'
ELSE 'OK'
END AS status
FROM pg_stat_insights_replication_wal;
pg_stat_insights_replication_bottlenecks¶
Identify replication bottlenecks (network, I/O, or replay)
Columns:
application_name- Replica identifierclient_addr- Replica addressstate- Replication statesync_state- Synchronization statebottleneck_type- Type of bottleneck detected or "No bottleneck detected"write_lag_bytes,flush_lag_bytes,replay_lag_bytes- Lag at each stagewrite_lag_sec,flush_lag_sec,replay_lag_sec- Time lag at each stagelast_msg_age_sec- Seconds since last heartbeatbackend_xmin- Transaction horizonconnection_age_sec- Connection duration
Bottleneck Types:
- Network bottleneck - Slow write phase (network transfer)
- Disk I/O bottleneck - Slow flush phase (disk writes)
- Replay bottleneck - Slow apply phase (query execution on replica)
- High lag - General lag without specific bottleneck
- Not streaming - Connection or configuration issue
Example: Diagnose replication bottlenecks
SELECT
application_name,
bottleneck_type,
write_lag_sec,
flush_lag_sec,
replay_lag_sec,
ROUND((write_lag_bytes::numeric / 1024 / 1024), 2) AS write_lag_mb,
ROUND((flush_lag_bytes::numeric / 1024 / 1024), 2) AS flush_lag_mb,
ROUND((replay_lag_bytes::numeric / 1024 / 1024), 2) AS replay_lag_mb
FROM pg_stat_insights_replication_bottlenecks
WHERE bottleneck_type != 'No bottleneck detected';
pg_stat_insights_replication_conflicts¶
Logical replication conflict and WAL status monitoring
Columns:
slot_name- Logical slot identifierdatabase- Target databaseplugin- Decoding pluginconflicting- Has unresolved conflictswal_status- WAL availability (reserved, lost, unreserved)conflict_status- Human-readable status with severityactive- Slot currently in usexmin,catalog_xmin- Transaction horizonslag_bytes,lag_mb- Current lagsafe_wal_size- Bytes until WAL limitwal_safety_status- WAL safety assessmentwal_files_held- Number of WAL segments retained
Example: Find slots with conflicts or WAL issues
SELECT
slot_name,
database,
conflict_status,
wal_safety_status,
lag_mb,
wal_files_held,
active
FROM pg_stat_insights_replication_conflicts
WHERE conflict_status != 'OK' OR wal_safety_status != 'OK'
ORDER BY
CASE conflict_status
WHEN 'CRITICAL: Required WAL segments lost' THEN 1
WHEN 'CRITICAL: Exceeding wal_keep_size' THEN 2
ELSE 3
END;
pg_stat_insights_replication_health¶
Comprehensive health check with actionable recommendations
Columns:
slot_name- Replication slot nameslot_type- physical or logicaldatabase- Database nameplugin- Logical decoding plugin (if logical)active,temporary- Slot status flagswal_status- WAL segment statusoverall_health- OK, WARNING, CRITICALissues- Array of detected issueslag_bytes,lag_mb- Current lag metricswal_files_held- WAL segments retainedsafe_wal_size,safe_wal_size_mb- Safe WAL space remainingrecommendation- Actionable advice for fixing issues
Example: Get health report with recommendations
SELECT
slot_name,
slot_type,
overall_health,
lag_mb,
wal_files_held,
issues,
recommendation
FROM pg_stat_insights_replication_health
WHERE overall_health != 'OK'
ORDER BY
CASE overall_health
WHEN 'CRITICAL' THEN 1
WHEN 'WARNING' THEN 2
ELSE 3
END;
pg_stat_insights_replication_performance¶
Performance trends and replay throughput analysis
Columns:
application_name,client_addr- Replica identificationstate,sync_state- Replication statussent_lsn,replay_lsn- WAL positionscurrent_lag_bytes,current_lag_mb,current_lag_seconds- Current lag metricsuptime_seconds- Connection durationavg_lag_bytes_per_second- Average lag accumulation ratereplay_rate_bytes_per_second- Replay throughputreplay_rate_mb_per_second- Replay throughput in MB/sperformance_rating- Excellent, Good, Fair, Poor, Critical
Performance Ratings:
- Excellent - Lag < 1 second
- Good - Lag < 5 seconds
- Fair - Lag < 30 seconds
- Poor - Lag < 5 minutes
- Critical - Lag > 5 minutes
Example: Analyze replication performance
SELECT
application_name,
performance_rating,
current_lag_seconds,
replay_rate_mb_per_second,
avg_lag_bytes_per_second,
ROUND((uptime_seconds::numeric / 3600), 1) AS uptime_hours
FROM pg_stat_insights_replication_performance
ORDER BY current_lag_seconds DESC;
pg_stat_insights_replication_timeline¶
Historical timeline analysis with lag trends
Columns:
application_name,client_addr- Replica identificationbackend_start- When connection startedconnected_for_seconds,connected_for_hours- Connection durationsent_lsn,replay_lsn- WAL positionsreplay_lag_bytes,replay_lag_mb,replay_lag_seconds- Current lagavg_lag_mb_per_hour- Average lag accumulation ratereplay_throughput_kb_per_sec- Replay speedreply_time- Last heartbeat timeheartbeat_age_seconds- Time since last heartbeatstatus_message- Human-readable status with severity
Example: Identify lagging replicas over time
SELECT
application_name,
connected_for_hours,
replay_lag_seconds,
avg_lag_mb_per_hour,
replay_throughput_kb_per_sec,
status_message
FROM pg_stat_insights_replication_timeline
ORDER BY replay_lag_seconds DESC;
Replication Monitoring Quick Reference¶
Quick diagnostics for common scenarios:
-- Check overall cluster replication health
SELECT * FROM pg_stat_insights_replication_summary;
-- Find all current alerts
SELECT * FROM pg_stat_insights_replication_alerts
WHERE alert_level != 'OK';
-- Identify bottlenecks
SELECT application_name, bottleneck_type
FROM pg_stat_insights_replication_bottlenecks
WHERE bottleneck_type != 'No bottleneck detected';
-- Check logical replication conflicts
SELECT slot_name, conflict_status, lag_mb
FROM pg_stat_insights_replication_conflicts
WHERE conflict_status != 'OK';
-- Get health recommendations
SELECT slot_name, overall_health, recommendation
FROM pg_stat_insights_replication_health
WHERE recommendation IS NOT NULL;
-- Monitor performance trends
SELECT application_name, performance_rating, replay_rate_mb_per_second
FROM pg_stat_insights_replication_performance;
-- Analyze WAL retention
SELECT wal_files_count, wal_total_size_mb, wal_retained_mb
FROM pg_stat_insights_replication_wal;
Logical Replication Monitoring¶
pg_stat_insights_subscriptions¶
Monitor logical replication subscriptions on subscriber side
Columns:
subscription_name- Subscription identifiersubscription_oid- Subscription object IDdatabase- Target databasestatus- enabled or disabledconnection_info- Publisher connection stringslot_name- Replication slot on publishersync_commit- Synchronous commit modepublications- Array of publication nameshealth_status- Active, Subscription disabled, No replication slotrecommendation- Actionable advice if issues detected
Example: Check subscription health
SELECT
subscription_name,
status,
slot_name,
publications,
health_status,
recommendation
FROM pg_stat_insights_subscriptions
ORDER BY
CASE health_status
WHEN 'Active' THEN 3
WHEN 'No replication slot' THEN 2
ELSE 1
END;
pg_stat_insights_subscription_stats¶
Per-table subscription synchronization status
Columns:
subid- Subscription IDsubscription_name- Subscription namerelid- Table OIDtable_name- Fully qualified table name (schema.table)sync_state- Current sync state code (i,d,s,r)subscription_lsn- LSN of subscriptionsync_state_description- Initialize, Data copy, Synchronized, Ready, Unknownstatus_message- Human-readable status
Sync States:
- i (Initialize) - Initial state before copying
- d (Data copy) - Currently copying initial data
- s (Synchronized) - Fully synchronized with publisher
- r (Ready) - Ready for synchronization
Example: Monitor table sync progress
SELECT
subscription_name,
table_name,
sync_state_description,
status_message
FROM pg_stat_insights_subscription_stats
WHERE sync_state != 's'
ORDER BY subscription_name, table_name;
pg_stat_insights_publications¶
Monitor logical replication publications on publisher side
Columns:
publication_name- Publication identifierpublication_oid- Publication object IDdatabase- Publisher databasescope- "All tables" or "Selected tables"operations- Comma-separated list: INSERT, UPDATE, DELETE, TRUNCATEpartition_mode- "Via root" or "Direct"table_count- Number of tables in publicationactive_subscribers- Number of active replication slots
Example: Monitor publications and subscribers
SELECT
publication_name,
scope,
operations,
table_count,
active_subscribers,
CASE
WHEN active_subscribers = 0 THEN 'No active subscribers'
WHEN active_subscribers > 0 THEN 'Publishing to ' || active_subscribers || ' subscriber(s)'
END AS status
FROM pg_stat_insights_publications
ORDER BY active_subscribers DESC;
pg_stat_insights_replication_origins¶
Track replication origins for cascading and bidirectional replication
Columns:
origin_id- Origin identifierorigin_name- Origin namesession_active- Whether origin session is currently activeremote_lsn- Last LSN received from remotelocal_lsn- Last LSN applied locallylag_bytes- Replication lag in byteslag_mb- Replication lag in megabytes
Use Cases:
- Cascading replication (replica replicating to another replica)
- Bidirectional replication (multi-master)
- Conflict detection in multi-origin setups
- Lag monitoring across replication topology
Example: Monitor cascading replication
SELECT
origin_name,
session_active,
remote_lsn,
local_lsn,
lag_mb,
CASE
WHEN lag_mb > 1000 THEN 'CRITICAL: Lag > 1GB'
WHEN lag_mb > 100 THEN 'WARNING: Lag > 100MB'
WHEN lag_mb > 0 THEN 'OK: ' || lag_mb || ' MB behind'
ELSE 'OK: Fully synchronized'
END AS status
FROM pg_stat_insights_replication_origins
ORDER BY lag_mb DESC;
pg_stat_insights_replication_dashboard¶
Unified JSON dashboard for comprehensive replication monitoring
Structure:
The dashboard view organizes replication data into sections with JSON details:
- CLUSTER_SUMMARY - Overall cluster statistics
- physical_replicas, logical_slots, active_subscriptions
- active_publications, max_lag_seconds
-
critical_alerts, warning_alerts
-
PHYSICAL_REPLICA - One row per physical replica
- client_addr, state, sync_state, health_status
-
replay_lag_mb, replay_lag_seconds, uptime_hours
-
LOGICAL_SLOT - One row per logical replication slot
- database, plugin, active, wal_status
-
lag_mb, wal_files_retained
-
ALERT - One row per active alert
- replication_type, alert_level, lag_mb, state
Example: Get comprehensive dashboard view
-- Full dashboard
SELECT
section,
name,
jsonb_pretty(details::jsonb) AS details
FROM pg_stat_insights_replication_dashboard
ORDER BY section, name;
-- Just cluster summary
SELECT details
FROM pg_stat_insights_replication_dashboard
WHERE section = 'CLUSTER_SUMMARY';
-- Only alerts
SELECT name, details->>'alert_level' AS alert, details->>'lag_mb' AS lag_mb
FROM pg_stat_insights_replication_dashboard
WHERE section = 'ALERT'
ORDER BY details->>'alert_level';
Example: Export to monitoring system
-- Export full replication status as JSON for Grafana/Prometheus
SELECT json_agg(
json_build_object(
'section', section,
'name', name,
'details', details
)
) AS replication_status
FROM pg_stat_insights_replication_dashboard;
Complete Replication Monitoring Workflow¶
Production-ready monitoring queries:
-- Daily Health Check
SELECT * FROM pg_stat_insights_replication_summary;
SELECT * FROM pg_stat_insights_replication_alerts WHERE alert_level != 'OK';
-- Investigate Issues
SELECT * FROM pg_stat_insights_replication_bottlenecks WHERE bottleneck_type != 'No bottleneck detected';
SELECT * FROM pg_stat_insights_replication_health WHERE overall_health != 'OK';
-- Logical Replication Status
SELECT * FROM pg_stat_insights_subscriptions;
SELECT * FROM pg_stat_insights_subscription_stats WHERE sync_state != 's';
SELECT * FROM pg_stat_insights_publications;
-- Performance Analysis
SELECT * FROM pg_stat_insights_replication_performance;
SELECT * FROM pg_stat_insights_replication_timeline ORDER BY replay_lag_seconds DESC LIMIT 10;
-- WAL & Capacity Planning
SELECT * FROM pg_stat_insights_replication_wal;
SELECT * FROM pg_stat_insights_replication_conflicts;
-- Cascading & Multi-Origin
SELECT * FROM pg_stat_insights_replication_origins;
-- Dashboard Export
SELECT * FROM pg_stat_insights_replication_dashboard;