Hello Rohan,
Please understand that you can get this error for more reasons than the partition being full; that was just a common example.
In our OVA/appliance builds the MySQL partition should be /var/lib/mysql; you can use commands such as df -h to check the size/space used.
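As a quick sketch of the kind of checks I mean (the /var/lib/mysql path is from our appliance builds; the mysql client invocation is illustrative and assumes your credentials/options are already configured):

```shell
# Free space on the partition holding the MySQL data directory:
df -h /var/lib/mysql

# Total on-disk size of the MySQL data directory:
du -sh /var/lib/mysql

# Per-database size from information_schema (only attempted if the
# mysql client is on the PATH; adjust credentials for your setup):
if command -v mysql >/dev/null 2>&1; then
  mysql -e "SELECT table_schema AS db,
                   ROUND(SUM(data_length + index_length)/1024/1024, 1) AS size_mb
            FROM information_schema.tables
            GROUP BY table_schema;"
fi
```

The information_schema query reports logical table/index sizes per database, which can differ from the on-disk usage df/du report (binary logs, temp files, and unreclaimed space also live on that partition).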
I would look over the KB I provided, as it answers your question and gives more insight into this procedure.
Here is a quick look at the KB as it addresses this question:
1. Execute the create_slave.sh script on the primary node: /opt/SecureSpan/Appliance/bin/create_slave.sh
2. Provide the fully qualified domain name (FQDN) of the secondary node when prompted.
3. Execute the same create_slave.sh script on the secondary node: /opt/SecureSpan/Appliance/bin/create_slave.sh
4. Provide the FQDN of the primary node when prompted.
The KB provides a bit more detail on these steps.
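Putting those steps together, a sketch of the sequence (the script path is from the KB; the gw1/gw2 example FQDNs are placeholders for illustration only):

```shell
# Path from the KB; run interactively on each node in turn.
CREATE_SLAVE=/opt/SecureSpan/Appliance/bin/create_slave.sh

# 1. On the PRIMARY node: run $CREATE_SLAVE and, when prompted,
#    provide the SECONDARY node's FQDN (e.g. gw2.example.com).
# 2. On the SECONDARY node: run the same $CREATE_SLAVE and, when
#    prompted, provide the PRIMARY node's FQDN (e.g. gw1.example.com).

echo "Script to run on both nodes: $CREATE_SLAVE"
```

The key point is that on each node you provide the FQDN of the *other* node, not the node you are logged into.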
------------------------------
Support Engineer
Broadcom
------------------------------
Original Message:
Sent: 09-18-2020 10:31 AM
From: ROHAN SINHA
Subject: Replication Auto-Fail with error:1236
Hi Matthew,
Thanks for your reply. We don't think our database got full. Could you provide commands to check the database size?
Also, I am a bit confused by the below step in the replication procedure:
3>Restart the replication on the Primary database node:
[primary]
# ./restart_replication.sh
Enter hostname or IP for the MASTER: [SET ME]
machine.mycompany.com
Enter replication user: [repluser]
repluser
........
I am running this command on the primary gateway GW1. Here it asks for the hostname of the master. Ideally the master for GW1 would be 'GW2'. So should I enter the 'GW2' hostname here, or 'GW1' (the primary gateway where I am running this command)?
Please help me out in this step.
Regards,
Rohan
------------------------------
[Technology Architect]
[Infosys Limited]
Original Message:
Sent: 09-18-2020 09:09 AM
From: Matthew Hogan
Subject: Replication Auto-Fail with error:1236
Hello Rohan,
This error is common when the database gets full and is then fixed without repairing replication. Yes, reinitializing replication is the recommended approach. This is done on both the primary and the secondary nodes. The below KB may state the required steps more clearly.
Reinitialize replication in a multi-node cluster: https://knowledge.broadcom.com/external/article?articleId=44402
------------------------------
Support Engineer
Broadcom
Original Message:
Sent: 09-18-2020 02:11 AM
From: ROHAN SINHA
Subject: Replication Auto-Fail with error:1236
Hi All,
We have a cluster of 2 nodes, GW1 and GW2, with their databases SSG1 and SSG2 respectively. Our primary database SSG1 shut down automatically due to some technical issues. When we restarted the mysql service for the SSG1 database, we noticed the below error on the secondary database SSG2:
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
Based on the below tech doc link, we understand that restarting replication of the secondary database is the correct solution. This is given in the section 'When Secondary Node Slave Fails' in the below link:
https://techdocs.broadcom.com/us/en/ca-enterprise-software/layer7-api-management/api-gateway/9-4/install-configure-upgrade/configure-a-gateway-cluster/configuring-cluster-database-replication/restart-replication.html
Please let us know if this is the correct solution. If yes, there is a step in that link, as mentioned below, which is to be performed on the primary node (GW1):
3>Restart the replication on the Primary database node:
[primary]
# ./restart_replication.sh
Enter hostname or IP for the MASTER: [SET ME]
machine.mycompany.com
Enter replication user: [repluser]
repluser
........
As per our understanding, the highlighted hostname should be the secondary gateway's hostname, GW2, as it is the master of the primary node. Please confirm whether this understanding is correct.
Regards,
Rohan
------------------------------
[Technology Architect]
[Infosys Limited]
------------------------------