Syncing CRM data to your application on a batch schedule is one of the most common ways to integrate with your customers’ Salesforce. For example, fetching Account and Opportunity records every 4 hours to power a lead scoring algorithm. But some sales automation use cases require knowing about changes to customer CRM records in real time, like triggering an internal workflow or Slack notification immediately after an Opportunity status changes to closed-won. In these use cases, consuming Salesforce change events in real time is critical to delivering a great product experience for end users.
Many SaaS products (including HubSpot) offer an HTTP webhooks API to support this real-time event integration use case, but Salesforce doesn’t have a webhooks API. Instead, Salesforce lets you consume Change Data Capture (CDC) events via a gRPC-based Pub/Sub API, a less common but more robust pattern for consuming events. However, it’s also significantly more difficult to work with and comes with several Salesforce-specific quirks that developers must work around.
This guide explains how the Salesforce Pub/Sub API works and how to use it to build a real-time CDC event integration with your customers’ Salesforce.
To get the most out of this guide, you should have the following:
gRPC is a modern, open-source, high-performance remote procedure call (RPC) framework that can operate in any environment. It uses protobuf as its interface definition language, which helps developers define services and message types for their RPC calls. gRPC supports long-running connections and bi-directional communication, which the Salesforce Pub/Sub gRPC API leverages for subscribing to CDC events.
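To make that concrete, here is a simplified excerpt of the kind of service definition protobuf expresses, loosely modeled on Salesforce’s pubsub_api.proto (names abridged; the real file defines several more RPCs and message types):

```protobuf
service PubSub {
  // Bidirectional stream: the client sends FetchRequests asking for events,
  // and the server streams FetchResponses back over the same connection.
  rpc Subscribe (stream FetchRequest) returns (stream FetchResponse);

  // Fetches the Avro schema needed to decode event payloads.
  rpc GetSchema (SchemaRequest) returns (SchemaInfo);
}
```

The `stream` keyword on both the request and response of `Subscribe` is what enables the long-running, bi-directional subscription described above.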
Avro is a data serialization system used to exchange data between different systems. It is similar to other serialization formats such as JSON and XML, but is designed to be high-performance and language-agnostic. One of Avro’s benefits is its compact binary format, which is very efficient to transmit over the network. In the context of Salesforce, Avro encodes the payloads of Change Data Capture (CDC) events sent to subscribers via the Salesforce Pub/Sub gRPC API.
For this project, we will use the buf Connect suite of libraries, which provides an easy-to-use and modern interface for gRPC clients in Node.js. We will also use the fast and modern avsc library to decode the Avro-encoded payloads of the events.
We can then install the dependencies using npm:
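Assuming buf Connect–era package names (the exact packages depend on your tooling versions; newer Connect releases live under the @connectrpc scope instead of @bufbuild):

```shell
npm install @bufbuild/connect @bufbuild/connect-node @bufbuild/protobuf avsc
# buf itself can be run via npx or installed as a dev dependency:
npm install --save-dev @bufbuild/buf
```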
We need to generate the gRPC client code from the Salesforce Pub/Sub API protobuf file. To do this, we use buf, a command-line tool that helps us manage protobuf files and generate client code from them.
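A minimal buf.gen.yaml for this setup might look like the following. The plugin names reflect the buf Connect generation of the tooling and are an assumption about your environment:

```yaml
# buf.gen.yaml — generates TypeScript message types and a Connect client
version: v1
plugins:
  - plugin: buf.build/bufbuild/es
    out: gen
    opt: target=ts
  - plugin: buf.build/bufbuild/connect-es
    out: gen
    opt: target=ts
```

With pubsub_api.proto in the current directory, running `buf generate .` writes the generated message types and client stubs into gen/.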
After compiling the protobuf files to client code, we connect to the Salesforce Pub/Sub gRPC API. Here we assume that you already have code in place that handles getting the customer’s access token, instance URL, and tenant ID via OAuth.
This script will continue running and log to the console each time a CDC event is received from Salesforce. In your system, you could push the Avro payload into an internal queue for processing.
This basic implementation shows how you can start consuming CDC events, but it doesn’t cover some of the Salesforce-specific quirks you need to handle before bringing your integration to production.
This is quite complex to deal with, but we do handle it in the Supaglue product. Our code for identifying the changed fields using the bitmap fields can be found here.
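To give a feel for the core idea: each entry in the event header’s changedFields array is a hex bitmap, and bit i being set means the field at position i in the event’s Avro schema changed. A simplified sketch, ignoring the compound-field and extended-bitmap cases that a full implementation must also handle (the function name and inputs are illustrative):

```typescript
// Simplified sketch: map one hex bitmap from ChangeEventHeader.changedFields
// to field names, given the Avro schema's fields in declaration order.
// Real payloads also contain compound/extended bitmap entries not handled here.
export function changedFieldNames(bitmapHex: string, schemaFields: string[]): string[] {
  const bits = BigInt(bitmapHex); // e.g. "0x5" -> 0b101
  const changed: string[] = [];
  for (let i = 0; i < schemaFields.length; i++) {
    if ((bits >> BigInt(i)) & 1n) {
      changed.push(schemaFields[i]);
    }
  }
  return changed;
}

// With a hypothetical three-field schema, bits 0 and 2 are set in 0x5:
console.log(changedFieldNames("0x5", ["Id", "Name", "Phone"])); // -> [ 'Id', 'Phone' ]
```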
Many streaming pub/sub platforms offer delivery guarantees, like Kafka’s at-least-once or exactly-once semantics. Salesforce’s Pub/Sub API does not offer these guarantees and can potentially be lossy. Some of the events you receive may be Gap or Overflow Events, which need to be handled differently by your system depending on your use case. Handling these events correctly can be fairly complex, especially for Overflow Events.
Salesforce produces a Gap Event when it is unable to generate a regular change event: for example, when an internal error occurs, when the generated change event would exceed 1MB in size, or when the change was made by a process (like a data cleanup job) that modifies records directly in the database.
A Gap Event looks something like this:
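An illustrative (not verbatim) shape of a decoded Gap Event, with placeholder values. Note the GAP_-prefixed change type, the record IDs of the affected records, and the absence of any field-level change data:

```json
{
  "ChangeEventHeader": {
    "entityName": "Account",
    "recordIds": ["001XXXXXXXXXXXXXXX"],
    "changeType": "GAP_UPDATE",
    "changedFields": [],
    "changeOrigin": "",
    "transactionKey": "0000xxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "sequenceNumber": 1,
    "commitUser": "005XXXXXXXXXXXXXXX"
  }
}
```

Because the event tells you only *which* records changed, not *what* changed, your system has to re-fetch those records from the Salesforce API to learn their current state.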
In the case where over 100,000 changes happen in a single transaction, Salesforce emits standard CDC events for the first 100,000 changes and a single Overflow Event per entity type for the changes beyond the first 100,000. For example, if a cascade delete results in the deletion of 110,000 account, contact, opportunity, and activity records in a single transaction, you would receive events for the first 100,000 records deleted, plus one Overflow Event for each entity type covering the remaining 10,000 records.
When you receive an Overflow Event, the correct process to handle synchronization is pretty complex:
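At a high level, the routing logic can be sketched as follows: detect the GAP_OVERFLOW change type and fall back to re-syncing the affected entity in bulk, since the overflowed changes were never emitted at all. The resync and refetch hooks here are hypothetical placeholders for your own sync system:

```typescript
// Illustrative sketch of routing CDC events, including Gap and Overflow events.
type DecodedHeader = {
  entityName: string;
  recordIds: string[];
  changeType: string; // e.g. "UPDATE", "GAP_UPDATE", "GAP_OVERFLOW"
};

export function classifyEvent(header: DecodedHeader): "normal" | "gap" | "overflow" {
  if (header.changeType === "GAP_OVERFLOW") return "overflow";
  if (header.changeType.startsWith("GAP_")) return "gap";
  return "normal";
}

export function handleEvent(header: DecodedHeader): void {
  switch (classifyEvent(header)) {
    case "overflow":
      // Changes beyond the first 100,000 were never emitted, so the only safe
      // recovery is a bulk re-sync of this entity type (e.g. via the Bulk API),
      // reconciled against the records you already have.
      // triggerFullResync(header.entityName); // hypothetical hook
      break;
    case "gap":
      // The event carries record IDs but no field-level payload; re-fetch
      // exactly these records from the Salesforce API.
      // refetchRecords(header.entityName, header.recordIds); // hypothetical hook
      break;
    case "normal":
      // Decode the Avro payload and apply the change as usual.
      break;
  }
}
```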
This was a basic guide on how to consume and decode Avro-formatted CDC events from Salesforce using the Salesforce Pub/Sub gRPC API. We also covered some of the more complex considerations, like how to identify which fields changed via the bitmap fields in the event headers, and how to handle Gap and Overflow Events.
Building a real-time CDC events integration with Salesforce is doable, but it’s tricky to get right and even more complex to maintain. Yet a real-time Salesforce integration delivers superior user experiences and differentiation against competitors who fall back to batch. If you’re looking for a real-time Salesforce integration but don’t want to sink significant engineering resources into it, we’d love to help.
Supaglue has a robust implementation with all these details considered that produces events you can consume using standard HTTP webhooks. We even handle the OAuth flow (including token refresh) so you don’t have to! If you are interested in trying this out, read about our real-time events feature or sign up to get early access.