Debezium Setup v2.4 for Postgres
This page give instructions for setting up a development environment to test CDC for Postgres. A docker compose file is used to create all necessary applications and an example version of Postgres is included as part of that. To use the CDC set up in a production environment with your own instance of Postgres please consult the official documentation.
download the attached docker-compose file and the register connector file, and save them to a new directory.
A .env file with variables for the values shown below also needs to be created in the directory.
TRANSFORMER_LICENSE=<transformer-license>
DEBEZIUM_VERSION=2.4
GRAPH_DATABASE_ENDPOINT=<your-graphdb-endpoint>
GRAPH_DATABASE_TYPE=<graphdb-type> #see Graph Writer Docs
GRAPH_DATABASE_USERNAME=<username>
GRAPH_DATABASE_PASSWORD=<password>
HOST_DIRECTORY_PATH=<host-directory-path> # e.g. /Users/myuser/Documents/transformer_testing/graphbuild/sstransformer/
HOST_IP=<host-ip> # e.g. 192.168.1.23The Debezium version is the latest at the time of writing, see here for what is current. HOST_DIRECTORY_PATH is the directory that the Semi Structured Transformer will use to create its directory structure under.
The following commands must all be run from a terminal within that directory.
Start the deployment stack
docker-compose -f docker-compose-postgres.yaml up
This will deploy Zookeeper, Kafka, Postgres, Kafka Connect, the Graph Writer and the Semi-Structured Transformer.
(If you have problems with start up of applications within the docker compose and all the details in your .env file are correct this can be caused by left over docker resources and networks. Running docker system prune can be used to remove any hanging docker resources that may be causing problems.)
Place the mapping file in the mapping directory
Once the Semi-Structured Transformer has been created from the stack it should have created a mapping directory underneath the host directory. Using the example in the .env variables above /Users/myuser/Documents/transformer_testing/graphbuild/sstransformer/ would have a directory underneath it with the path /Users/myuser/Documents/transformer_testing/graphbuild/sstransformer/mapping/
Register the Postgres connector
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @register-postgres.json
The first time it connects to a Postgres server or cluster, the connector takes a consistent snapshot of all schemas it is connected to. After that snapshot is complete, the connector continuously captures row-level changes that insert, update, and delete database content and that were committed to a Postgres database. When the Connector is registered it will therefore capture all the current data in the customers table it is connected to and this will be transformed by the CDC pipeline into Graph data. It will then wait to capture the further changes that are made. If you wish to change whether an initial snapshot is taken of a database this can be configured using the snapshot.mode property in the connector configuration. See here for the current Debezium configuration instructions.
Start the Postgres CLI
docker-compose -f docker-compose-postgres.yaml exec postgres env PGOPTIONS="--search_path=inventory" bash -c 'psql -U $POSTGRES_USER postgres'
(Optional) Monitor your Kafka topic
You can connect to your cluster with Conduktor using
localhost:9092and consuming thedbserver1.inventory.customerstopic
Testing End To End CDC
Run the following queries in your Postgres terminal to simulate change data, view your Graph data between each query to see the changes. Ensure you have configured your table correctly, see the Postgres Setup section below for more info.
List Databases:
\lConnect to inventory database:
\c postgresList tables:
\dtSelect all:
SELECT * FROM customers;Insert Query:
INSERT INTO customers (id, first_name, last_name, email) VALUES('1005', 'Russ', 'Waterson', '[email protected]');Update Query:
UPDATE customers SET first_name='Russell' WHERE id='1005';Delete Query:
DELETE FROM customers WHERE id='1005';
Shut down cluster
docker-compose -f docker-compose-postgres.yaml down
Troubleshooting
If you get an error with zookeeper where it fails to start due to something similar: Refusing session request for client /172.22.0.1:56004 as it has seen zxid 0xd1 our last zxid is 0x5a client must try another server. This is due to an old Kafka connection persisting within Docker. To fix this, run docker system prune to clean up any dangling networks and caches etc. Also close any connections you may have with the cluster, either in the terminal or with Conduktor.
Postgres DB Setup
To have the correct shape of Debezium data, the user’s Postgres tables need to have a specific flag set. The Replica Identity needs to be set to FULL so that each column is returned in the before data with any update or delete operations. This can be done with the following command against the table in the database:
ALTER TABLE customers REPLICA IDENTITY FULL where the tables name is customers
Last updated

