Debezium Setup v2.4 for MongoDB
This page give instructions for setting up a development environment to test CDC for MongoDB. A docker compose file is used to create all necessary applications and an example version of Mongodb is included as part of that. To use the CDC set up in a production environment with your own instance of Mongodb please consult the official documentation.
download the attached docker-compose file and the register connector file, and save them to a new directory.
A .env file with variables for the values shown below also needs to be created in the directory.
TRANSFORMER_LICENSE=<transformer-license>
DEBEZIUM_VERSION=2.4
GRAPH_DATABASE_ENDPOINT=<your-graphdb-endpoint>
GRAPH_DATABASE_TYPE=<graphdb-type> #see Graph Writer Docs
GRAPH_DATABASE_USERNAME=<username>
GRAPH_DATABASE_PASSWORD=<password>
HOST_DIRECTORY_PATH=<host-directory-path> # e.g. /Users/myuser/Documents/transformer_testing/graphbuild/sstransformer/
HOST_IP=<host-ip> # e.g. 192.168.1.23The Debezium version is the latest at the time of writing, see here for what is current. HOST_DIRECTORY_PATH is the directory that the Semi Structured Transformer will use to put its directory structure under.
The following commands must all be run from a terminal within that directory.
Start the deployment stack
docker-compose -f docker-compose-mongodb.yaml up
This will deploy Zookeeper, Kafka, MongoDB, Kafka Connect, the Graph Writer and the Semi-Structured Transformer.
(If you have problems with start up of applications within the docker compose and all the details in your .env file are correct this can be caused by left over docker resources and networks. Running docker system prune can be used to remove any hanging docker resources that may be causing problems.)
Place the mapping file in the mapping directory
Once the Semi-Structured Transformer has been created from the stack it should have created a mapping directory underneath the host directory. Using the example in the .env variables above /Users/myuser/Documents/transformer_testing/graphbuild/sstransformer/ would have a directory underneath it with the path /Users/myuser/Documents/transformer_testing/graphbuild/sstransformer/mapping/
Connect to the MongoDB image with mongo shell
docker exec -it mongodb-mongodb-1 mongosh
You may need to change the docker container name that you are connecting to, you can check the names of the running containers with
docker ps
Initialise MongoDB replica set and create a DB and collection (table)
Create DB: use inventory;
Initialise Replica Set: rs.initiate({ _id: "rs0", members: [ { _id: 0, host: "mongodb:27017" } ] });
Create collection (table): db.createCollection( "customers", { changeStreamPreAndPostImages: { enabled: true } } );
Ensure the
changeStreamPreAndPostImagesis enabled when creating new tables
Check collection created: db.getCollectionNames();
Should return
[ 'customers' ]
Check collection can be queried: db.customers.find()
Should return nothing (as the collection is empty) with no errors
Start MongoDB connector
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @register-mongodb.json
The first time it connects to a MongoDB replica set or cluster, the connector takes a consistent snapshot of all databases and collections it is connected to . After that snapshot is complete, the connector continuously transmits changes that insert, update, and delete database content and that were committed to a MongoDB database. When the Connector is registered it will therefore capture all the current data in the customers collection it is connected to and this will be transformed by the CDC pipeline into Graph data. It will then wait to capture the further changes that are made. If you wish to change whether an initial snapshot is taken of a database this can be configured using the snapshot.mode property in the connector configuration. See here for the current Debezium configuration instructions.
(Optional) Monitor your Kafka topic
You can connect to your cluster with Conduktor using
localhost:9092and consuming thedbserver1.inventory.customerstopic
Testing End To End CDC
Run the following queries in your MongoDB terminal to simulate change data, view your Graph data between each query to see the changes.
List collections (tables):
db.getCollectionNames()Select all:
db.customers.find()Insert Query:
db.customers.insertOne( { _id : 1005, "first_name" : "Russ", "last_name" : "Waterson", "email" : "[email protected]" } );Update Query:
db.customers.updateOne( { _id: 1005 }, { $set: { "first_name" : "Russell" } } );Delete Query:
db.customers.deleteOne( { _id : 1005 } );
Shut down cluster
docker-compose -f docker-compose-mongodb.yaml down
Troubleshooting
If you get an error with zookeeper where it fails to start due to something similar: Refusing session request for client /172.22.0.1:56004 as it has seen zxid 0xd1 our last zxid is 0x5a client must try another server. This is due to an old Kafka connection persisting within Docker. To fix this, run docker system prune to clean up any dangling networks and caches etc. Also close any connections you may have with the cluster, either in the terminal or with Conduktor.
Last updated

