Hi Team,
I am running greenplum-db-6.8.1 with 3 segment servers on EC2 instances (r6g.8xlarge type). We use this database for our reporting functionality, so reads and writes happen very frequently.
The issue is that the segments go down frequently, and each time we have to run a full recovery.
We need help with this. How can we avoid these frequent segment failures? Do we need to change configuration parameters such as gp_fts_probe_interval, gp_fts_probe_timeout, and gp_fts_probe_retries? We are currently using the default values.
https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/admin_guide-highavail-topics-g-detecting-a-failed-segment.html
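For anyone who wants to compare settings, here is a minimal Python sketch of how the three FTS parameters could be read back with `gpconfig -s`. It assumes it is run as gpadmin on the master with `gpconfig` on the PATH. The function name `show_fts_settings` and the injectable `run` argument are only illustrative and not part of Greenplum.

```python
import subprocess

# The FTS parameters named in the question above.
FTS_PARAMS = [
    "gp_fts_probe_interval",
    "gp_fts_probe_timeout",
    "gp_fts_probe_retries",
]


def show_fts_settings(run=subprocess.run):
    """Return {parameter: gpconfig output text} for each FTS parameter.

    `run` is injectable (hypothetical helper design) so the function can be
    exercised without a live cluster.
    """
    results = {}
    for name in FTS_PARAMS:
        # `gpconfig -s <name>` shows the value set on the master and segments.
        proc = run(
            ["gpconfig", "-s", name],
            capture_output=True,
            text=True,
            check=True,
        )
        results[name] = proc.stdout.strip()
    return results
```

Calling `show_fts_settings()` on the master and printing each entry should show whether the values really are the defaults on every segment.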
Below is the configuration of our Greenplum cluster:
1 Master Server -- 32 CPUs and 256 GB memory
3 Segment servers -- 32 CPUs and 256 GB memory
Below is the current gpstate output:
20230630:19:26:59:003334 gpstate:-----:gpadmin-[INFO]:-Gathering data from segments...
.
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-Greenplum instance status summary
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Master instance = Active
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Master standby = No master standby configured
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total segment instance count from metadata = 48
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Primary Segment Status
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total primary segments = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total primary segment valid (at master) = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[WARNING]:-Total primary segment failures (at master) = 24 <<<<<<<<
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of postmaster.pid files found = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of /tmp lock files found = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number postmaster processes found = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Mirror Segment Status
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total mirror segments = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total mirror segment valid (at master) = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total mirror segment failures (at master) = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of postmaster.pid files found = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number of /tmp lock files found = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number postmaster processes found = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[WARNING]:-Total number mirror segments acting as primary segments = 24 <<<<<<<<
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total number mirror segments acting as mirror segments = 0
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------
[gpadmin@----- local]$
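
In case it helps with reading the output above, here is a minimal Python sketch that pulls the numeric counters out of a gpstate summary and groups them by primary and mirror section. It assumes the line layout shown in this post. The function name `parse_gpstate_summary` is hypothetical, and the sample text is a few lines copied from the output above.

```python
import re

# Matches summary lines such as
# "...gpadmin-[INFO]:- Total primary segments = 24"
LINE_RE = re.compile(r"\[(INFO|WARNING)\]:-\s*(.+?)\s*=\s*(\d+)")


def parse_gpstate_summary(text):
    """Group numeric gpstate counters under 'primary' and 'mirror'."""
    sections = {"primary": {}, "mirror": {}}
    current = None
    for line in text.splitlines():
        # Section headers switch which group the following counters go into.
        if "Primary Segment Status" in line:
            current = "primary"
        elif "Mirror Segment Status" in line:
            current = "mirror"
        match = LINE_RE.search(line)
        if match and current:
            sections[current][match.group(2)] = int(match.group(3))
    return sections


sample = """\
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Primary Segment Status
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Total primary segments = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[WARNING]:-Total primary segment failures (at master) = 24
20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:- Mirror Segment Status
20230630:19:27:00:003334 gpstate:-----:gpadmin-[WARNING]:-Total number mirror segments acting as primary segments = 24
"""

summary = parse_gpstate_summary(sample)
print(summary["primary"]["Total primary segment failures (at master)"])
print(summary["mirror"]["Total number mirror segments acting as primary segments"])
```

For the output in this post, both counters are 24, meaning every primary is marked down at the master and every mirror is acting as a primary.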