The Transporter 0.4 series has begun


Compose's Transporter is a powerful way of transporting data between different databases. With each iteration, we look to improve it and the newly launched Transporter 0.4 series is no different.

In the Transporter 0.3 series, we revamped the way you built data transfer pipelines and tightened it up so all the configuration was in one expressive JavaScript file. In the process, the Compose developers put into place the next step in Transporter's evolution. Now with the Transporter 0.4 series, that work has helped deliver the beta of the Transporter commit log.

Introducing the commit log

In the older Transporter editions, when records came from a source, they were regarded as purely transient and stateless. The Transporter would manipulate them and send them to the sink databases and that was it. This was simple and efficient, but as the use of Transporter grew, so did the number of scenarios it had to cope with. Specifically, it has to handle scenarios where the Transporter, the network or one of the databases goes offline during a transfer.

With Transporter 0.4's commit log, every operation is recorded as efficiently as possible. When the Transporter is restarted after some process-ending incident, the commit log allows it to re-synchronize with the databases it has been talking to and then pick up cleanly where it left off.

Using the commit log

Because it is a beta feature, we've kept it off by default, following the fine principle of least surprise. To enable it on a Transporter pipeline, you just need to give the pipeline a log_dir setting that points to the directory where you want the commit log to live. So a typical Transporter pipeline that looks like this:

t.Source("source", source, "/.*/").Save("sink", sink, "/.*/")  

becomes

t.Config({"log_dir":"./commitlogdir"}).Source("source", source, "/.*/").Save("sink", sink, "/.*/")  

The Config function is a general purpose function that takes a set of values which then get applied to configurable elements inside the pipeline. In this case, when log_dir gets set, the commit log is activated and starts using that directory. Each commit log directory should be unique to a pipeline.
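One way to keep commit log directories unique is to derive each one from the pipeline's name. The following is a minimal sketch under stated assumptions: logDirFor is a hypothetical helper, not part of Transporter, and the stub `t` object only records the calls made on it so the snippet can run under Node outside of Transporter. In a real pipeline file, `t`, `source` and `sink` come from Transporter and your adaptor configuration.

```javascript
// Stand-in for the `t` object that Transporter provides to a pipeline file.
// It only records what it is given, so this sketch can run under Node.
function makeStubTransporter() {
  const calls = [];
  const t = {
    calls,
    Config(settings) { calls.push(["Config", settings]); return t; },
    Source(name, adaptor, namespace) { calls.push(["Source", name, namespace]); return t; },
    Save(name, adaptor, namespace) { calls.push(["Save", name, namespace]); return t; },
  };
  return t;
}

// Hypothetical helper: build one commit log directory per pipeline name,
// trimming any trailing slashes from the base directory first.
function logDirFor(baseDir, pipelineName) {
  return baseDir.replace(/\/+$/, "") + "/" + pipelineName;
}

const t = makeStubTransporter();
const source = {}; // placeholder for a configured source adaptor
const sink = {};   // placeholder for a configured sink adaptor

// Same shape as the pipeline above, with a per-pipeline log_dir.
t.Config({ "log_dir": logDirFor("./commitlogs/", "orders-to-search") })
  .Source("source", source, "/.*/")
  .Save("sink", sink, "/.*/");

console.log(t.calls[0][1].log_dir);
```

The pipeline name "orders-to-search" is made up for the example; any scheme works as long as no two pipelines end up sharing a directory.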

When the Transporter starts running, data records are embellished with metadata as they are read. When those records are written out and confirmed, that information is preserved in the log as writes done by a particular database sink, since Transporter can happily write to multiple databases.

To stop the commit log from getting too big, it is compacted every hour or so. It is worth remembering that the commit log will have copies of the data in it and you should treat it as a copy of the database for practical security and privacy purposes.
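To give a feel for what compaction can mean, here is a toy sketch of one common strategy: keeping only the newest entry for each record key. This is an assumption made for illustration and not a description of how Transporter's compactor works; the entry shape (offset, key, value) is hypothetical too.

```javascript
// Toy log entries look like { offset, key, value }.
// Offsets grow as entries are appended to the log.
function compact(entries) {
  const latest = new Map();
  for (const entry of entries) {
    latest.set(entry.key, entry); // a later entry replaces an earlier one with the same key
  }
  // Return the survivors in their original log order.
  return [...latest.values()].sort((a, b) => a.offset - b.offset);
}

const log = [
  { offset: 0, key: "user:1", value: { name: "Ann" } },
  { offset: 1, key: "user:2", value: { name: "Bob" } },
  { offset: 2, key: "user:1", value: { name: "Anne" } },
];

const compacted = compact(log);
console.log(compacted.map((entry) => entry.offset)); // offsets 1 and 2 remain
```

Even in this toy version the compacted log still holds full record values, which is why the log directory deserves the same care as the database itself.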

The commit log comes into its own when the Transporter is re-run. The Transporter consults the commit log first and uses it to work out what reading and writing it had previously done. Once that is established, it resumes transporting data over the network without unnecessary duplication and redundancy.
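The resume behaviour can be sketched with a toy model. This is only an illustration under simplifying assumptions: a single sink, an in-memory array standing in for the on-disk commit log, and records identified by their position in the source. The function names are hypothetical and none of this is Transporter's actual code.

```javascript
// Each toy commit log entry notes that a sink confirmed the write of one source offset.
function resumeOffset(commitLog, sinkName) {
  let last = -1;
  for (const entry of commitLog) {
    if (entry.sink === sinkName && entry.offset > last) last = entry.offset;
  }
  return last + 1; // the first offset that still needs writing
}

// Copy records to the sink, starting from wherever the commit log says we got to.
// `failAt` simulates a crash just before that offset is written (-1 means no crash).
function run(records, commitLog, sinkName, written, failAt) {
  for (let i = resumeOffset(commitLog, sinkName); i < records.length; i++) {
    if (i === failAt) return false;                // process dies here
    written.push(records[i]);                      // the write to the sink
    commitLog.push({ sink: sinkName, offset: i }); // the confirmed write, logged
  }
  return true;
}

const records = ["a", "b", "c", "d"];
const commitLog = [];
const written = [];

run(records, commitLog, "sink", written, 2);  // first run stops before the third record
run(records, commitLog, "sink", written, -1); // re-run resumes at offset 2

console.log(written); // every record appears once, in order
```

Without the commit log, the second run would have to start from offset 0 and write "a" and "b" again.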

More commitment

That should be all you need to know about the commit log if you want to start using it today. The MongoDB and RabbitMQ adaptors support recording their read operations in the commit log and can use them to resume reading. Elasticsearch, File, MongoDB, PostgreSQL, and RethinkDB adaptors can record their write operations in the commit log and use that to resume writing.
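If you want to check a planned pipeline against that list, the support matrix can be written down as a small lookup. This is a sketch only: the lowercase keys are labels chosen for this example and are not necessarily the adaptor function names used in a pipeline file, and resumeSupport is a hypothetical helper.

```javascript
// Adaptors that can record reads in the commit log and resume reading.
const readResume = new Set(["mongodb", "rabbitmq"]);

// Adaptors that can record writes in the commit log and resume writing.
const writeResume = new Set(["elasticsearch", "file", "mongodb", "postgresql", "rethinkdb"]);

// Report which halves of a source-to-sink pairing can resume from the commit log.
function resumeSupport(sourceAdaptor, sinkAdaptor) {
  return {
    read: readResume.has(sourceAdaptor),
    write: writeResume.has(sinkAdaptor),
  };
}

console.log(resumeSupport("mongodb", "elasticsearch")); // both halves supported
console.log(resumeSupport("postgresql", "rabbitmq"));   // neither half supported
```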

If you are interested in the details of the commit log implementation, consult the design document which explains its operation.

Also...

Besides the huge changes with logging, other changes in Transporter 0.4.x include being compiled with Go 1.8 and a new contributed option to set read preferences on MongoDB connections.

With logging in place, Transporter moves on to its next phase of development. We'll be covering the commit log and other Transporter features over the coming weeks in Compose Articles. In the meantime, check out the Transporter Wiki, Readme and, of course, the latest Releases.


If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.

attribution Natalia Oommen

Dj Walker-Morgan
Dj Walker-Morgan is Compose's resident Content Curator, and has been both a developer and writer since Apples came in II flavors and Commodores had Pets.
