Layer7 API Management

  • 1.  Replication Auto-Fail with error:1236

    Posted Sep 18, 2020 02:12 AM
    Hi All,

We have a cluster of 2 nodes, GW1 and GW2, with their databases SSG1 and SSG2 respectively. Our primary database SSG1 shut down automatically due to a technical issue. When we restarted the mysql service for the SSG1 database, we noticed the below error on the secondary database SSG2:

    Last_IO_Errno: 1236

    Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
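    For reference, these fields come from the standard MySQL replication status check, which we ran on the secondary:

    [secondary]
    mysql> SHOW SLAVE STATUS\G
    ...
    (output includes the Last_IO_Errno and Last_IO_Error fields shown above)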


    Based on the tech doc link below, we understand that restarting replication of the secondary database is the correct solution. This is described in the section 'When Secondary Node Slave Fails' at the following link:

    https://techdocs.broadcom.com/us/en/ca-enterprise-software/layer7-api-management/api-gateway/9-4/install-configure-upgrade/configure-a-gateway-cluster/configuring-cluster-database-replication/restart-replication.html

    Please let us know if this is the correct solution. If yes, there is a step in that link, shown below, which is to be performed on the primary node (GW1):

    3. Restart the replication on the Primary database node:
    [primary]
    # ./restart_replication.sh
    Enter hostname or IP for the MASTER: [SET ME] machine.mycompany.com
    Enter replication user: [repluser] 
    repluser
    ........

    As per our understanding, the hostname entered here should be the secondary gateway hostname, GW2, as it is the master of the primary node. Please confirm whether this understanding is correct.

    Regards,
    Rohan

    ------------------------------
    [Technology Architect]
    [Infosys Limited]
    ------------------------------


  • 2.  RE: Replication Auto-Fail with error:1236

    Broadcom Employee
    Posted Sep 18, 2020 09:10 AM
    Hello Rohan,

    This error is common when the database partition gets full and is then fixed without repairing replication. Error 1236 means the slave is asking the master for a binary log file that the master no longer lists in its binary log index (for example, because the binary logs were purged or reset when the master was recovered). Yes, reinitializing replication is the recommended step, and it is done on both the primary and the secondary. The KB below may better state the steps required.

    Reinitialize replication in a multi-node cluster: https://knowledge.broadcom.com/external/article?articleId=44402
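    As a quick health check once replication is reinitialized, you can run the standard MySQL status command on each node; the fields below are standard replication status fields:

    mysql> SHOW SLAVE STATUS\G
    ...
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Last_IO_Errno: 0
    ...

    Both threads should report Yes once replication is re-established.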

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 3.  RE: Replication Auto-Fail with error:1236

    Posted Sep 18, 2020 10:32 AM
    Hi Matthew,

    Thanks for your reply. We don't think that our database got full. Can you provide the commands to check the database size?

    Also, I am getting a bit confused by the below step of the replication procedure:

    3. Restart the replication on the Primary database node:
    [primary]
    # ./restart_replication.sh
    Enter hostname or IP for the MASTER: [SET ME] machine.mycompany.com
    Enter replication user: [repluser] 
    repluser
    ........
    I am running this command on the primary gateway GW1. Here it asks to enter the hostname for the master. Ideally, the master for GW1 will be GW2. So should I put the GW2 hostname here, or GW1 (the primary gateway where I am running this command)?

    Please help me out in this step.

    Regards,
    Rohan

    ------------------------------
    [Technology Architect]
    [Infosys Limited]
    ------------------------------



  • 4.  RE: Replication Auto-Fail with error:1236
    Best Answer

    Broadcom Employee
    Posted Sep 18, 2020 11:47 AM
    Edited by ROHAN SINHA Sep 18, 2020 12:21 PM
    Hello Rohan,

    Please understand that you can get this error for more reasons than the partition being full; that was just a common example.

    The MySQL partition in our OVA/appliance builds should be /var/lib/mysql; you can use commands such as df -h to check the size and space used.
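    For example, assuming the default appliance layout (adjust the path if your build differs):

    # Free space on the partition holding the MySQL data directory
    df -h /var/lib/mysql

    # Approximate size per database in MB, from inside mysql
    mysql> SELECT table_schema,
                  ROUND(SUM(data_length + index_length)/1024/1024, 1) AS size_mb
           FROM information_schema.tables
           GROUP BY table_schema;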

    I would look over the provided KB, as it answers your question; I shared it because I feel it provides more insight into this procedure.

    Here is a quick look at the KB where it addresses this question:

    1. Execute the create_slave.sh script on the primary node: /opt/SecureSpan/Appliance/bin/create_slave.sh
    Provide the fully qualified domain name (FQDN) of the secondary node when prompted.

    2. Execute the create_slave.sh script on the secondary node: /opt/SecureSpan/Appliance/bin/create_slave.sh
    Provide the FQDN of the primary node when prompted.
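    Mapped to your GW1/GW2 naming, the flow is roughly as follows (the FQDNs are placeholders, and the script's exact prompt text may differ):

    [primary GW1]
    # /opt/SecureSpan/Appliance/bin/create_slave.sh
    ... when prompted, enter the secondary's FQDN, e.g. gw2.mycompany.com

    [secondary GW2]
    # /opt/SecureSpan/Appliance/bin/create_slave.sh
    ... when prompted, enter the primary's FQDN, e.g. gw1.mycompany.com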

    The KB provides a bit more detail on these steps.

    ------------------------------
    Support Engineer
    Broadcom
    ------------------------------



  • 5.  RE: Replication Auto-Fail with error:1236

    Posted Sep 18, 2020 12:21 PM
    Thanks Matthew,

    We followed the steps in the KB link you provided, and after that replication is working. So the issue is resolved now.



    Regards,
    Rohan

    ------------------------------
    [Technology Architect]
    [Infosys Limited]
    ------------------------------