GemFire WAN Replication to AWS
Setting up WAN replication with GemFire between private data centers with no NAT in between is fairly straightforward, but what if you want to do WAN replication between machines that are separated by a NAT (or even multiple NATs)? Luckily, GemFire's Gateway Receiver covers this case with a configuration setting called "hostname-for-clients", which controls the address remote systems use to connect to it.
Locators
You can find a lot more detail about how WAN replication works on the WAN Replication reference manual page, but I'll recap the basics here. Skip down to How to Set up WAN Replication to AWS if you just want an example of a working config.
GemFire can use a process called a Locator to maintain a list of all the servers in a single cluster (called a "distributed system" in GemFire parlance). As servers start up, they register themselves with the Locator (or possibly with multiple Locators, if you run more than one for reliability). Then, when local clients want to connect to the distributed system, they query the Locator for a server to connect to. In that sense, the Locator is a bit like DNS.
Locators also maintain a list of remote Locators that are responsible for cataloging other distributed systems. In a WAN replication setup, a server that wants to send data to a remote distributed system asks its local Locator for a remote server that can accept replication events. The local Locator receives a list of such remote servers when it makes contact with the remote Locators in its configuration, so it can hand back a remote server from that list to the local server. To extend the DNS metaphor, this is similar to a DNS server forwarding requests to remote DNS servers for domains it isn't authoritative for. In this way the whole system appears to be one large, unified catalog of resources.
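The catalog behaviour described above can be sketched as a toy Python model. The Locator class, its methods, and the host names are hypothetical illustrations, not GemFire's API; the sketch only shows how a local Locator learns remote servers from a remote Locator and hands one back to a local server.

```python
"""Toy model of GemFire Locators acting as a DNS-like catalog.

Hypothetical illustration only: none of these names are GemFire APIs.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Locator:
    distributed_system_id: int
    # Addresses registered by servers in this distributed system, e.g. "host[1530]".
    servers: list[str] = field(default_factory=list)
    # Server lists learned from remote Locators, keyed by distributed-system-id.
    remote_servers: dict[int, list[str]] = field(default_factory=dict)

    def register(self, address: str) -> None:
        """Called (conceptually) when a local server starts up."""
        self.servers.append(address)

    def exchange_with(self, remote: Locator) -> None:
        """Both Locators learn each other's server lists, like WAN locator discovery."""
        self.remote_servers[remote.distributed_system_id] = list(remote.servers)
        remote.remote_servers[self.distributed_system_id] = list(self.servers)

    def find_remote_server(self, distributed_system_id: int) -> str | None:
        """What a local gateway sender asks for: any server in the remote system."""
        candidates = self.remote_servers.get(distributed_system_id, [])
        return candidates[0] if candidates else None


if __name__ == "__main__":
    local = Locator(distributed_system_id=1)
    aws = Locator(distributed_system_id=2)
    aws.register("aws-server.example.com[1530]")  # hypothetical receiver address
    local.exchange_with(aws)
    print(local.find_remote_server(2))  # aws-server.example.com[1530]
    print(local.find_remote_server(3))  # None: no Locator for site 3 was contacted
```

Notice that whatever address a server registers is exactly what the other site receives, which is the detail that causes trouble behind a NAT.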
Problems arise when servers are behind a firewall/NAT, because of the way servers register themselves with a Locator. Normally, when a server starts up, it simply takes whatever address it is bound to and registers that with the Locator(s) for its distributed system. Often this is a private IP address that isn't routable across the Internet: it works fine for machines within the same private network, but it can't be reached directly over the Internet. So if a server in one site tried to connect to the other site using that site's private address, the connection would likely fail unless some other provision were made at the network level.
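A simplified Python model of which address ends up in the catalog, and why a private one fails across the Internet. The advertised_address rule (use hostname-for-clients when set, otherwise the bind address) reflects the behaviour this post describes; the function names and example addresses are hypothetical, and the routability check only inspects literal IPs using the standard ipaddress module.

```python
"""Which address does a gateway receiver hand to remote sites? (simplified model)"""
from __future__ import annotations

import ipaddress


def advertised_address(bind_address: str, hostname_for_clients: str | None = None) -> str:
    """Prefer the hostname-for-clients hint; otherwise fall back to the bind address."""
    return hostname_for_clients or bind_address


def routable_over_internet(address: str) -> bool:
    """Rough check: private literal IPs (10/8, 172.16/12, 192.168/16, ...) are not.

    Hostnames are not resolved here and are assumed to resolve to a public address.
    """
    try:
        return not ipaddress.ip_address(address).is_private
    except ValueError:
        return True  # not an IP literal, so treat it as a public hostname


if __name__ == "__main__":
    # Hypothetical EC2 instance bound to its VPC-private IP.
    plain = advertised_address("172.31.5.10")
    hinted = advertised_address("172.31.5.10", "ec2-host.example.com")
    print(plain, routable_over_internet(plain))    # 172.31.5.10 False
    print(hinted, routable_over_internet(hinted))  # ec2-host.example.com True
```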
Here is what that problem typically looks like. The local locator appears to contact the remote locator successfully, because its log contains an entry like this:
[info ... locator-1 <WAN Locator Discovery Thread> tid=0x4e] Locator discovery task exchanged locator information 192.168.?.?[10334] with ec2-54-235-?-?.compute-1.amazonaws.com[10334].
However, when you start the server with the gateway sender, you'll see warnings like the following in its log, which indicate a problem:
[warning ... server-1 <Event Processor for GatewaySender_sender1> tid=0x42] Remote locator host port information for remote site "2" is not available in local locator "192.168.?.?[10334]".
[warning ... server-1 <Event Processor for GatewaySender_sender1> tid=0x42] GatewaySender "sender1" could not get remote locator information for remote site "2".
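If you want to spot this condition without reading logs by hand, a small Python scan for the two warnings is enough. The regular expressions are built from the messages quoted above and are an assumption: the exact wording may differ between GemFire versions, and the log path in the usage comment is hypothetical.

```python
"""Scan GemFire server log text for the gateway sender warnings shown above."""
from __future__ import annotations

import re
import sys

# Built from the quoted warning messages; adjust if your GemFire version words them differently.
WAN_WARNING_PATTERNS = [
    re.compile(r'Remote locator host port information for remote site "(?P<site>\d+)" is not available'),
    re.compile(r'GatewaySender "[^"]+" could not get remote locator information for remote site "(?P<site>\d+)"'),
]


def find_wan_warnings(log_text: str) -> list[tuple[int, str]]:
    """Return (line number, remote site id) for every matching warning line."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for pattern in WAN_WARNING_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((lineno, match.group("site")))
                break  # one report per line is enough
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_wan_log.py server/server1-1.log  (hypothetical path)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for lineno, site in find_wan_warnings(log_file.read()):
            print(f"line {lineno}: no remote locator info for site {site}")
```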
To correct this problem, we need to give the gateway receiver a hint about our "public" name or IP, which it passes along to remote systems when they need to connect. The gateway receiver configuration supports this through the hostname-for-clients attribute on the gateway-receiver element in cache.xml. The following section walks you through setting this up, replicating from a machine on your local network to AWS.
How to Set up WAN Replication to AWS
These instructions set up two machines, each running collocated Locator and Server processes, so you can experiment with WAN replication in GemFire. The destination machine is in AWS, and the source is a machine on your home network. Ideally, the Locator and Server would run on separate systems, and with slight modifications these instructions could be used to set up that kind of scenario if needed. They assume you are familiar with AWS basics, so review the AWS documentation if you are not.
On AWS
- Create an Elastic IP in AWS.
This step is optional but recommended, because when you stop and restart instances in AWS, their external IP addresses are likely to change. An Elastic IP lets you keep the same external IP/hostname across instance restarts. Trust me, you will be less frustrated if you do this.
- Launch a micro instance using Amazon Linux and associate your Elastic IP with it.
Really, you could do this with any image type. GemFire provides a DEB installer, an RPM installer, and plain tar.gz and zip files to install by hand.
- When you define a security group for the instance, allow SSH as well as TCP port 10334 and TCP ports 1530-1551. Port 10334 is for the Locator process, and ports 1530-1551 are for the gateway receiver in the Server process.
- After the instance has started, use scp to copy the GemFire install package you want to use to ec2-user@<hostname-for-elastic-ip>:/home/ec2-user, where <hostname-for-elastic-ip> is the hostname for the Elastic IP you created earlier.
If you are using Amazon Linux or another RHEL-based image, you can use the RPM installer to make things easier. The rest of these instructions assume you are using the RPM-based installer.
- SSH to the instance you created.
- Install a JDK.
In Amazon Linux, you can use sudo yum install java-1.7.0-openjdk-devel.
- Install GemFire.
Again in Amazon Linux, you can use sudo yum localinstall /home/ec2-user/pivotal-gemfire-*.rpm.
- Create directories called locator and server in /home/ec2-user.
- Create gemfire.properties in the locator directory with the following content, replacing <hostname-for-elastic-ip> with the hostname for the Elastic IP you created earlier, and <local-system-public-ip> with the public IP of your local network (the address your NAT presents to the Internet):
name=locator-2
distributed-system-id=2
mcast-port=0
locators=<hostname-for-elastic-ip>[10334]
bind-address=<hostname-for-elastic-ip>
remote-locators=<local-system-public-ip>[10334]
- Create gemfire.properties in the server directory with the following content, replacing <hostname-for-elastic-ip> with the hostname for the Elastic IP you created earlier:
name=server1-2
distributed-system-id=2
mcast-port=0
locators=<hostname-for-elastic-ip>[10334]
bind-address=<hostname-for-elastic-ip>
- Create cache.xml in the server directory with the following content, replacing <hostname-for-elastic-ip> with the hostname for the Elastic IP you created earlier:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cache PUBLIC "-//GemStone Systems, Inc.//GemFire Declarative Cache 7.0//EN" "http://www.gemstone.com/dtd/cache7_0.dtd">
<cache>
  <gateway-receiver start-port="1530" end-port="1551" hostname-for-clients="<hostname-for-elastic-ip>" />
  <region name="TEST">
    <region-attributes scope="distributed-ack" data-policy="replicate" />
  </region>
</cache>
- Change directory to the /home/ec2-user directory.
- Start /opt/pivotal/Pivotal_GemFire_*/bin/gfsh and issue the following commands to start the locator and the server:
start locator --name=locator-2 --dir=locator
start server --name=server1-2 --dir=server
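Before moving on, it can help to confirm from your local machine that the security group actually lets traffic through. The Python sketch below simply attempts TCP connections with the standard socket module; the function names are hypothetical. Because a gateway receiver generally listens on a single port chosen from its start-port/end-port range, expect the locator port plus one open receiver port once both processes are running.

```python
"""Probe the AWS locator port and the gateway receiver port range over TCP."""
from __future__ import annotations

import socket
import sys

LOCATOR_PORT = 10334
RECEIVER_PORTS = range(1530, 1552)  # 1530-1551 inclusive, matching the security group


def is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timeout), or unresolvable host
        return False


def probe_wan_ports(host: str, timeout: float = 2.0) -> dict[str, object]:
    """Report locator reachability and which receiver ports accepted a connection."""
    return {
        "locator": is_open(host, LOCATOR_PORT, timeout),
        "receiver_ports": [p for p in RECEIVER_PORTS if is_open(host, p, timeout)],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python probe_wan_ports.py <hostname-for-elastic-ip>
    print(probe_wan_ports(sys.argv[1]))
```

Filtered ports wait for the full timeout, so a completely blocked range can take about 22 times the timeout to report back.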
On Local System
- Change directory to your home directory, and create directories called locator and server.
On Windows these are %USERPROFILE%\locator and %USERPROFILE%\server, and on most other OSes they are ~/locator and ~/server.
- Create a gemfire.properties file in the locator directory with the following content, replacing <local-internal-ip> with the IP address of your local system, and <hostname-for-elastic-ip> with the hostname for the Elastic IP you created earlier:
name=locator-1
distributed-system-id=1
mcast-port=0
locators=<local-internal-ip>[10334]
bind-address=<local-internal-ip>
remote-locators=<hostname-for-elastic-ip>[10334]
- Create a gemfire.properties file in the server directory with the following content, replacing <local-internal-ip> with the IP address of your local system:
name=server1-1
distributed-system-id=1
mcast-port=0
locators=<local-internal-ip>[10334]
bind-address=<local-internal-ip>
- Create a cache.xml file in the server directory with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cache PUBLIC "-//GemStone Systems, Inc.//GemFire Declarative Cache 7.0//EN" "http://www.gemstone.com/dtd/cache7_0.dtd">
<cache>
  <gateway-sender id="sender1" parallel="false" remote-distributed-system-id="2"/>
  <region name="TEST">
    <region-attributes scope="distributed-ack" data-policy="replicate" gateway-sender-ids="sender1"/>
  </region>
</cache>
- In gfsh, run the following commands, replacing <path-to-locator-dir> and <path-to-server-dir> with the paths to the directories you created in step 1:
start locator --name=locator-1 --dir=<path-to-locator-dir>
start server --name=server1-1 --dir=<path-to-server-dir>
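Because the two sites' files have to agree with each other, a quick cross-check can catch typos before you start anything. This Python sketch (hypothetical helper names, standard library only) parses simple key=value properties and the local cache.xml, then checks three relationships from the steps above: the sites use different distributed-system-id values, every gateway-sender-ids entry names a defined gateway-sender, and each sender's remote-distributed-system-id matches the remote site's id. The properties parser is deliberately minimal and does not handle every Java properties escape rule.

```python
"""Cross-check the two sites' WAN settings (minimal sketch, standard library only)."""
from __future__ import annotations

import xml.etree.ElementTree as ET


def parse_properties(text: str) -> dict[str, str]:
    """Minimal key=value parser; skips blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def check_wan_config(local_props: dict[str, str], remote_props: dict[str, str],
                     local_cache_xml: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    local_id = local_props.get("distributed-system-id")
    remote_id = remote_props.get("distributed-system-id")
    if local_id == remote_id:
        problems.append(f"both sites use distributed-system-id={local_id}")

    root = ET.fromstring(local_cache_xml)
    senders = {s.get("id"): s for s in root.iter("gateway-sender")}

    # Every sender a region refers to must be defined.
    for region in root.iter("region"):
        attrs = region.find("region-attributes")
        ids = attrs.get("gateway-sender-ids", "") if attrs is not None else ""
        for sender_id in (i.strip() for i in ids.split(",") if i.strip()):
            if sender_id not in senders:
                problems.append(f"region {region.get('name')} uses undefined sender {sender_id}")

    # Every sender must target the remote site's distributed-system-id.
    for sender_id, sender in senders.items():
        target = sender.get("remote-distributed-system-id")
        if target != remote_id:
            problems.append(f"sender {sender_id} targets {target}, remote site is {remote_id}")
    return problems
```

For example, pass the parsed local and AWS locator gemfire.properties files plus the text of the local server's cache.xml; an empty result means these particular checks found nothing.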
Comments
You are correct that if you wanted bidirectional replication, you would need to define a gateway sender on the AWS side, much like the one in the local cache.xml (step 4 on the local side), and a gateway receiver on the local side, much like the one in the AWS cache.xml (step 11 on the AWS side).
Additionally, you would need to configure the NAT/firewall for your local network to forward incoming TCP ports 1530-1551 to the local machine running the GemFire server, and port 10334 to the same machine (or whichever machine runs your local Locator process).
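The extra cache.xml elements for the reverse direction can be sketched with Python's xml.etree.ElementTree. This is an illustration under assumptions: sender2 is a hypothetical id, the local receiver reuses the 1530-1551 range, and it sets hostname-for-clients to your local public IP for the same NAT reason the AWS receiver needs it. It also assumes that, for minimal files like the ones in this post, the 7.0 DTD wants gateway elements ahead of region definitions with senders before the receiver, and inserts them in that order.

```python
"""Add the reverse-direction gateway elements for bidirectional replication (sketch)."""
from __future__ import annotations

import xml.etree.ElementTree as ET


def _insert_before(root: ET.Element, elem: ET.Element, later_tags: set[str]) -> None:
    """Insert elem ahead of the first child whose tag must come after it."""
    for index, child in enumerate(list(root)):
        if child.tag in later_tags:
            root.insert(index, elem)
            return
    root.append(elem)


def add_gateway_sender(root: ET.Element, sender_id: str, remote_ds_id: int, region_name: str) -> None:
    """AWS side: define a serial sender and attach it to the named region."""
    sender = ET.Element("gateway-sender", {
        "id": sender_id,
        "parallel": "false",
        "remote-distributed-system-id": str(remote_ds_id),
    })
    _insert_before(root, sender, {"gateway-receiver", "region"})
    for region in root.iter("region"):
        if region.get("name") == region_name:
            attrs = region.find("region-attributes")
            if attrs is None:
                attrs = ET.SubElement(region, "region-attributes")
            existing = attrs.get("gateway-sender-ids")
            attrs.set("gateway-sender-ids", f"{existing},{sender_id}" if existing else sender_id)


def add_gateway_receiver(root: ET.Element, hostname_for_clients: str,
                         start_port: int = 1530, end_port: int = 1551) -> None:
    """Local side: accept events from AWS, advertising the NAT's public address."""
    receiver = ET.Element("gateway-receiver", {
        "start-port": str(start_port),
        "end-port": str(end_port),
        "hostname-for-clients": hostname_for_clients,
    })
    _insert_before(root, receiver, {"region"})


if __name__ == "__main__":
    aws = ET.fromstring(
        '<cache><gateway-receiver start-port="1530" end-port="1551" '
        'hostname-for-clients="aws-host.example.com"/>'
        '<region name="TEST"><region-attributes scope="distributed-ack" '
        'data-policy="replicate"/></region></cache>'
    )
    add_gateway_sender(aws, "sender2", 1, "TEST")  # hypothetical sender id
    print(ET.tostring(aws, encoding="unicode"))
```

The same idea applies on the local side with add_gateway_receiver(local_root, "<local-system-public-ip>"), after which you would write the tree back out and restore the XML declaration and DOCTYPE lines.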