In today's data-driven world, ensuring data consistency and interoperability across different systems is crucial. This is where a schema registry comes into play. In this walkthrough, we'll explore how to use a schema registry (Apicurio Registry) with the Solace JCSMP API, focusing on the Apache Avro format for schema definition.

You'll learn about:

✔️ What a schema registry is and why it's important
✔️ What a Serializer and Deserializer (SERDEs) is and the role they play
✔️ How to set up an instance of the Apicurio Registry using Docker Compose
✔️ How to create and register schemas
✔️ How to use schemas in your event-driven applications with Solace's JCSMP API
✔️ Best practices for schema evolution

  1. A general understanding of event-driven architecture (EDA) terms and concepts.
  2. A locally running PubSub+ Broker or a free trial account of Solace PubSub+ Cloud. Don't have one? Sign up here.
    • Along with the connection information for the broker
  3. Basic knowledge of Apache Avro schema format
  4. Docker installed on your system
  5. Java Development Kit (JDK) version 11+ installed on your system
  6. An IDE of your choice (e.g., IntelliJ IDEA, Eclipse, Visual Studio Code)
  7. Download the provided BETA zip package named Schema-Registry-Beta-Package_0.1.0.zip that contains all the necessary pieces you will need. This is available on the Solace Product Portal and unzip it to your preferred directory.

NOTE: If you cannot access the Solace Product Portal, please click the Report a mistake at the bottom left of the codelab and open an issue asking for access.

A schema registry is a central repository for managing and storing schemas. It helps ensure data consistency, enables data governance, and supports schema evolution. Here's why it's important:

  1. Data Consistency: Ensures that producers and consumers agree on the data format.
  2. Interoperability: Allows different systems to communicate effectively.
  3. Schema Evolution: Supports versioning and compatibility checks as schemas change over time.
  4. Data Governance: Centralizes schema management for better control and auditing.

A SERDEs (Serializer/Deserializer) in the context of a schema registry is a component that handles two key functions:

  1. Serialization: Converting data objects from their native format (like Java objects) into a binary format suitable for transmission or storage.
  2. Deserialization: Converting the binary data back into its original format for processing.

In a schema registry system, SERDEs work closely with schemas to:

For example:

This is a fundamental concept in message-based systems where data needs to be:

We'll use Docker Compose to set up the Apicurio Registry quickly and easily. We've prepared a Docker Compose file that will launch an instance of the Apicurio Registry and all the necessary components with a pre-defined configuration.

  1. In these subsequent steps we will use the package that came from the downloaded zip from the prerequisites section. Navigate to the extracted folder called Schema-Registry-Beta-Package. You should see the following files and folders:

  1. Extract the tar folder named solace-schema-registry-dist.tar.
  2. Open a terminal or command prompt window and navigate to the extracted location of the folder called solace-schema-registry-dist.
  3. Optionally you can make changes to the .env file to change things such as default login or ports. We will leave everything to defaults for this codelab.

  1. Run the following command: docker compose up -d and all the components will start up with the default values configured.
  2. Once the script is done running, you should now be able to go to your browser and navigate to localhost:8888 which should re-direct you to the Apicurio Registry login screen.

That's it, you have now installed an instance of the Apicurio Registry with the Postgres storage option!

Let's create a simple schema for a User event:

  1. Open the Apicurio Registry UI in your web browser by going to localhost:8888.
  2. Login with the predefined credentials for a developer. In this case the username is sr-developer and password is devPassword.

  1. Click on Create artifact button. Once the dialogue opens enter the following as shown below:
    • Group Id: Set to com.solace.samples.serdes.avro.schema
    • Artifact Id: Set to User
    • Type: Set to Avro Schema

  1. We will skip the Artifact Metadata step as it is optional and will click Next.
  2. For the "Version Content" section, copy the Avro schema from below and either paste directly or save it into a file and upload it and click Next when done.
{
  "namespace": "com.solace.samples.serdes.avro.schema",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"}
  ]
}

  1. We will skip the Version Metadata step as it is optional and will simply click the Create button.
  2. Finally after you have successfully created the new schema you should see the following:

You've now created and registered your first schema!

Now, let's see how to use this schema in Java using the Solace Messaging API for Java (JCSMP):

  1. Extract the SERDEs zip called pubsubplus-schema-registry-avro-serde-0.1.0-beta-release.zip and extract the JCSMP zip called sol-jcsmp-10.25.3-jcsmp_schema_registry.44338.zip
  2. In a new window, navigate to the Schema-Registry-Beta-Package/jcsmp folder. This is where you will find the sample application.
  3. Create a new folder called lib and copy the contents of pubsubplus-schema-registry-avro-serde-0.1.0-beta-release.zip/lib and sol-jcsmp-10.25.3-jcsmp_schema_registry.44338/lib into this new folder.
  4. Open a command window or terminal in the Schema-Registry-Beta-Package/jcsmp directory.

NOTE: For winows users, use the gradlew.bat file instead of gradlew in the below steps. For non-Windows users, you will have to run chmod +ux gradelw to change permissions on gradelw.

  1. Run the following command to build the sample on Windows ./gradlew.bat build and . You should see a BUILD SUCCESSFUL message.
  2. Run the sample application and provide the broker connection details like so ./gradlew.bat runHelloWorldJCSMPAvroSerde --args="localhost:55555 default default".

This sample talks to the locally deployed Apicurio Schema Registry and retrieves the schema along with the schema ID. It will then do the following:

Configures the Serializer and Deserializer:

// Create and configure Avro serializer and deserializer
        try (Serializer<GenericRecord> serializer = new AvroSerializer<>();
             Deserializer<GenericRecord> deserializer = new AvroDeserializer<>()) {

            serializer.configure(getConfig());
            deserializer.configure(getConfig());
   /**
     * Returns a configuration map for the Avro serializer and deserializer.
     *
     * @return A Map containing configuration properties
     */
    private static Map<String, Object> getConfig() {
        Map<String, Object> config = new HashMap<>();
        config.put(SchemaResolverProperties.REGISTRY_URL, REGISTRY_URL);
        config.put(SchemaResolverProperties.AUTH_USERNAME, REGISTRY_USERNAME);
        config.put(SchemaResolverProperties.AUTH_PASSWORD, REGISTRY_PASSWORD);
        return config;
    }

Serializes the message payload and publishes the message to the connected broker on test/topic destination with the serialized payload and schema ID:

   // Create and populate a GenericRecord with sample data
            GenericRecord user = initEmptyUserRecord();
            user.put("name", "John Doe");
            user.put("id", "123");
            user.put("email", "support@solace.com");

            // Serialize and send the message
            BytesMessage msg = JCSMPFactory.onlyInstance().createMessage(BytesMessage.class);
            SerdeMessage.serialize(serializer, msg, user);
            System.out.printf("Sending Message:%n%s%n", msg.dump());
            producer.send(msg, topic);

It will then receive the published message and deserialize it by looking up the schema ID and retrieving the schema and print it out to console:

  // Set up the message consumer with a deserialization callback
            XMLMessageConsumer cons = session.getMessageConsumer(Consumed.with(deserializer, (msg, genericRecord) -> {
                System.out.printf("Got record: %s%n", genericRecord);
                latch.countDown(); // Signal the main thread that a message has been received
            }, (msg, deserializationException) -> {
                System.out.printf("Got exception: %s%n", deserializationException);
                System.out.printf("But still have access to the message: %s%n", msg.dump());
                latch.countDown();
            }, jcsmpException -> {
                System.out.printf("Got exception: %s%n", jcsmpException);
                latch.countDown();
            }));
            cons.start();

The sourcecode can be further looked at by opening the HelloWorldJCSMPAvroSerde.java file.

As your data model evolves, you'll need to update your schemas. Here are some best practices:

  1. Backward Compatibility: Ensure new schema versions can read old data.
  2. Forward Compatibility: Ensure old schema versions can read new data (ignoring new fields).
  3. Full Compatibility: Aim for both backward and forward compatibility.
  4. Versioning: Use semantic versioning for your schemas.
  5. Default Values: Provide default values for new fields to maintain backward compatibility.
  6. Avoid Renaming: Instead of renaming fields, add new fields and deprecate old ones.

When updating a schema in Apicurio Registry:

  1. Create a new version of the existing schema.
  2. Make your changes, following the best practices above.
  3. Use the compatibility rule in the registry to ensure your changes don't break existing consumers.

✔️ Schema registries are crucial for maintaining data consistency in event-driven architectures.
✔️ Apicurio Registry provides a powerful platform for managing schemas.
✔️ Apache Avro is a popular choice for schema definition due to its compact binary format and schema evolution capabilities.
✔️ Solace's JCSMP API can be used effectively with the Apicurio Registry for robust event-driven applications.
✔️ Proper schema evolution practices ensure smooth updates without breaking existing systems.
✔️ Using schemas in your applications helps catch data issues early and improves overall system reliability.
✔️ More schema SERDEs will be made available in the future including JSON.

Soly Image Caption

Thanks for participating in this codelab! Let us know what you thought in the Solace Community Forum! If you found any issues along the way we'd appreciate it if you'd raise them by clicking the Report a mistake button at the bottom left of this codelab.