VMware Tanzu Greenplum

I am using greenplum-db-6.8.1 with 3 segment servers running on EC2 instances (r6g.8xlarge). We perform reads and writes frequently, and we are seeing all the segments go down almost every day. Kindly suggest how to resolve these segment failures.

swamy reddy posted Jun 30, 2023 03:35 PM

Hi Team,

 

I am using greenplum-db-6.8.1 with 3 segment servers running on EC2 instances (r6g.8xlarge type). We use this database for our reporting functionality, so reads and writes happen very frequently.

The issue is that the segments keep going down and we have to perform a full recovery each time.
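For reference, this is roughly the sequence we run each time segments go down (standard gpstate/gprecoverseg options, shown only to illustrate our current workflow):

# Check which segments are marked down
gpstate -e

# Try incremental recovery first (resyncs only changed data)
gprecoverseg

# If incremental recovery is not possible, run a full recovery
gprecoverseg -F

# Once segments are up and synchronized, rebalance so primaries
# return to their preferred roles
gprecoverseg -r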

 

We need help with this. How do we avoid these frequent segment failures? Do we need to change the default values of gp_fts_probe_interval, gp_fts_probe_timeout, and gp_fts_probe_retries, as we are currently using the defaults?

 

https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/admin_guide-highavail-topics-g-detecting-a-failed-segment.html

 

Below is the configuration of our greenplum cluster:

 

1 Master server -- 32 CPUs and 256 GB memory

3 Segment servers -- 32 CPUs and 256 GB memory

 

Below is the current gpstate output:

20230630:19:26:59:003334 gpstate:-----:gpadmin-[INFO]:-Gathering data from segments...

.

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-Greenplum instance status summary

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Master instance                      = Active

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Master standby                      = No master standby configured

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total segment instance count from metadata        = 48

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Primary Segment Status

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total primary segments                  = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total primary segment valid (at master)          = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[WARNING]:-Total primary segment failures (at master)        = 24

       <<<<<<<<

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of postmaster.pid files missing       = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of postmaster.pid files found        = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of postmaster.pid PIDs missing        = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of postmaster.pid PIDs found         = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of /tmp lock files missing          = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of /tmp lock files found           = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number postmaster processes missing         = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number postmaster processes found          = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Mirror Segment Status

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total mirror segments                   = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total mirror segment valid (at master)          = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total mirror segment failures (at master)         = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of postmaster.pid files missing       = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of postmaster.pid files found        = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of postmaster.pid PIDs missing        = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of postmaster.pid PIDs found         = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of /tmp lock files missing          = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number of /tmp lock files found           = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number postmaster processes missing         = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number postmaster processes found          = 24

20230630:19:27:00:003334 gpstate:-----:gpadmin-[WARNING]:-Total number mirror segments acting as primary segments  = 24

       <<<<<<<<

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-  Total number mirror segments acting as mirror segments  = 0

20230630:19:27:00:003334 gpstate:-----:gpadmin-[INFO]:-----------------------------------------------------

[gpadmin@----- local]$

Kevin Huang

Increasing gp_fts_probe_interval, gp_fts_probe_timeout, and gp_fts_probe_retries gives the fault-tolerance service (FTS) a longer grace period before it marks segments down. Raising these values can help prevent spurious segment failures, but it will not address the underlying problem if the issue is caused by too much I/O. You can increase these values and see if it helps; a sketch of how to set them is below.
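For example, the FTS parameters can be raised with gpconfig on the master (the values shown are illustrative only, not recommendations; check the documentation for defaults and allowed ranges):

# Set FTS parameters on the master (example values only)
gpconfig -c gp_fts_probe_interval -v 120 --masteronly
gpconfig -c gp_fts_probe_timeout -v 40 --masteronly
gpconfig -c gp_fts_probe_retries -v 10 --masteronly

# Verify the settings and reload the configuration without a restart
gpconfig -s gp_fts_probe_timeout
gpstop -u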

 

You can also check whether the following OS parameters relating to memory are set properly on all segment hosts (a quick way to verify them across hosts is sketched after the list):

vm.swappiness = 10

vm.zone_reclaim_mode = 0

vm.dirty_expire_centisecs = 500

vm.dirty_writeback_centisecs = 100

vm.dirty_background_ratio = 0 # See System Memory

vm.dirty_ratio = 0

vm.dirty_background_bytes = 1610612736

vm.dirty_bytes = 4294967296
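A minimal way to check these values on every host is with gpssh (the host file path below is just an example; substitute a file listing all of your segment hosts):

# Query the current values on all hosts (hostfile path is an example)
gpssh -f /home/gpadmin/hostfile_all -e 'sysctl vm.swappiness vm.zone_reclaim_mode vm.dirty_background_bytes vm.dirty_bytes'

# After editing /etc/sysctl.conf on each host, reload the settings
gpssh -f /home/gpadmin/hostfile_all -e 'sudo sysctl -p'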

 

Another parameter to look at is vm.min_free_kbytes, which needs to be set to roughly 3% of physical memory. At its default value, which is usually too low, it can cause segment failures.
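On a 256 GB host, 3% works out to roughly 7.7 GB (about 8053064 kB). One way to derive and persist the value on each host (run as root; this appends to /etc/sysctl.conf) is:

# Append vm.min_free_kbytes = 3% of MemTotal to /etc/sysctl.conf
awk 'BEGIN {OFMT = "%.0f";} /MemTotal/ {print "vm.min_free_kbytes =", $2 * .03;}' /proc/meminfo >> /etc/sysctl.conf

# Apply the change
sysctl -p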

https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/install_guide-prep_os.html