No business likes an angry user. That goes double if the user is a taxi driver. The language can be...intense.

As NYC Modern Taxi Company modernizes its operations through event-based integration, things will go wrong. What's important is that issues are resolved quickly and as painlessly as possible for the end user, so the taxis keep moving. To help with that, NYC Modern Taxi is adding OpenTelemetry distributed tracing to all of their microservices. The goal is to trace events as they move through Solace and potentially dozens of microservices.

Instead of plowing through logs trying to understand what happened, NYC Modern Taxi can search through graphs like these that show how the event moved across microservices: flame-graph

In this CodeLab you'll learn about:

✅ How OpenTelemetry works (the basics; if you'd like more detail, here is a solid intro blog post.)

✅ How to get the Solace PubSub+ Python API, and set up a basic environment.

✅ How to include OpenTelemetry tracing in your processes

✅ How to visualize your event-driven applications with Jaeger

opentelemetry

  1. A general understanding of event-driven architecture (EDA) terms and concepts.
  2. A free trial account of Solace PubSub+ Cloud. Don't have one? Sign up here.
  3. Download Jaeger for free in the OS of your choosing
  4. Download Python for free in the OS of your choosing
  5. Download the sample code, or use git commands to grab it from GitHub.

At Solace, we've been following the story of NYC Modern Taxi, a taxi company struggling to stay relevant by creating a new ride sharing app. To be clear, it's a fake company. But its problems are all too real.

At midnight, NYC Modern Taxi went live with the app, and in the morning drivers started complaining about not being able to log in. The issue appeared to be with events moving from Salesforce (where potential drivers are recruited) into an operations database used by the mobile application. But given the number of hops the event took along the way, the location of the issue wasn't exactly clear.

opening_graphic

Luckily, to give support personnel better visibility into what's happening at runtime, NYC Modern Taxi added OpenTelemetry to their microservices. In contrast to traditional tracing with line after line of text, distributed tracing shows you a searchable, graphical picture of when, where and how one event flowed through your enterprise. It let NYC Modern Taxi watch Tommy's account update zig to the cloud, zag into an on-premises data center, fan out into 40 thanks to a pub-sub architecture, slow down in a bottleneck and then...watch a programming error stop it completely.

Since you can't recreate the whole infrastructure here, you'll focus on a small slice. A "Salesforce" Python script will publish a message to Solace. That message will be consumed by two additional Python scripts, one playing the role of an outbound REST call, the other mocking a database insert. Each process sends a trace to Jaeger, which both collects the traces and displays them in pretty, useful pictures.

logicalarch

Access to a Solace messaging service, Solace PubSub+, is available in any one of three flavours:

  1. Hardware Appliance
  2. Software broker image (Docker, Virtual image)
  3. Solace Cloud service instance

This tutorial will walk you through setting up a Solace Cloud service instance, which also gives you access to the Event Portal. If you are interested in setting up a local broker running on Docker or a virtual machine, check out the PubSub+ Event Broker: Software documentation.

Sign up for free Solace Cloud account

Navigate to the Create a New Account page and fill out the required information. No credit card required!

Create a messaging service

After you create your Solace Cloud account and sign in to the Solace Cloud Console, you'll be routed to the event mesh page.

Solace Cloud Event Mesh Page

Click on 'Cluster Manager' and all the messaging services associated with your account will show up, if you have any already created. To create a new service, click either button as depicted in the image below:

Solace Cloud Landing Page

Fill out all the details for your messaging service, and then click "Create" at the bottom of the page.

Create Solace Cloud Messaging Service

Your service should be ready to use in a couple of seconds! 🌪

Click on the previously created service and navigate to the Connect tab. Expand the "Solace Messaging" menu to get the connection details. connection-service

Use a text editor to save the connection parameters: the Host, Message VPN Name, Client Username and Password. You'll need them later.
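
The scripts later in this CodeLab receive these four values through the environment variables SOLACE_HOST, SOLACE_VPN, SOLACE_USERNAME and SOLACE_PASSWORD. As a rough sketch of how a script might collect them and fail early when one is missing, consider the helper below. The function name `load_broker_props` is hypothetical, and the property keys are the ones commonly used with the Solace PubSub+ Python API; the sample code you downloaded may organize this differently.

```python
import os

# Environment variable names come from the run commands used in this CodeLab.
REQUIRED_VARS = ("SOLACE_HOST", "SOLACE_VPN", "SOLACE_USERNAME", "SOLACE_PASSWORD")


def load_broker_props(env=os.environ):
    """Build a broker properties dict, failing early if a setting is missing.

    Hypothetical helper for illustration; not part of the CodeLab sample code.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing connection settings: " + ", ".join(missing))
    return {
        "solace.messaging.transport.host": env["SOLACE_HOST"],
        "solace.messaging.service.vpn-name": env["SOLACE_VPN"],
        "solace.messaging.authentication.scheme.basic.username": env["SOLACE_USERNAME"],
        "solace.messaging.authentication.scheme.basic.password": env["SOLACE_PASSWORD"],
    }
```

Failing with a clear message at startup tends to be friendlier than a connection error deep inside the messaging library.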

✔️ You've created the first part of the solution! check1

Jaeger collects and visualizes the spans created by your microservices. It has several different components such as an underlying database, a GUI and a span collector, but fortunately it has an all-in-one distribution that makes setup easy.

Steps to install

  1. Extract the Jaeger distribution file you downloaded as part of the prerequisites. (e.g. jaeger-1.17.1-windows-amd64.tar.gz on Windows)
  2. Open a command terminal.
  3. Run jaeger-all-in-one
  4. Using a web browser, navigate to http://localhost:16686/
  5. You should be greeted by a picture of a...chipmunk? Gopher? Not really sure.
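
If the page doesn't load, it can help to confirm that something is actually listening on the UI port before digging further. The snippet below is a minimal sketch using only the Python standard library; the function name `jaeger_ui_reachable` is hypothetical, and it only checks that a TCP connection to port 16686 succeeds, not that Jaeger itself is healthy.

```python
import socket


def jaeger_ui_reachable(host="localhost", port=16686, timeout=1.0):
    """Return True if something accepts TCP connections on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("Jaeger UI reachable:", jaeger_ui_reachable())
```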

✔️ You've created the second part of the solution! check2

Python is a great language for prototyping applications. It's easy to get up and running quickly, and since it doesn't compile you just change a line and reload. This CodeLab uses Python to simulate Salesforce publishing the message and two event consumers using the Solace PubSub+ Python API. And since there is a native OpenTelemetry API for Python it can produce the OpenTelemetry tracing as well.

Steps to install

  1. Open the Python installer that you downloaded in the prerequisites.
  2. You'll be guided through the install process.

Once it's complete, you should have a program called IDLE (Integrated Development and Learning Environment). IDLE

That's what you'll use to modify and run the Python scripts.

  1. If you haven't already, unzip the CodeLabs code into a convenient directory
  2. Open a terminal window and navigate to the downloaded github repo
  3. Configure a virtual Python environment
    python -m pip install --user virtualenv
    python -m venv venv
    ## Activate the virtual environment on MacOS
    source venv/bin/activate
    ## Activate the virtual environment on Windows
    source venv/Scripts/activate
    
  4. Install the required OpenTelemetry dependencies with pip
    pip install -r requirements.txt
    
  5. Execute the solace_telemetry_publisher_Salesforce.py by passing the correct environment variables
    SOLACE_HOST=<host_name> SOLACE_VPN=<vpn_name> SOLACE_USERNAME=<username> SOLACE_PASSWORD=<password> python solace_telemetry_publisher_Salesforce.py
    
  6. If it's successful, you'll get a message like:
    2020-08-26 08:33:41,885 [INFO] pysolace.messaging.core: solace_session.py:470  ESTABLISH SESSION ON HOST tcp://mr-d8f4yze27kt.messaging.solace.cloud:55555
    parentSpan trace_id  on sender side:18915100849980568506040033268233107768
    parentSpan span_id  on sender side:16251617562995641485
    Process finished with exit code 0
    
  7. To check out the trace you generated (which only has one span in it at this point), open up the Jaeger UI.
  8. You should see a service listed with the name: Listen for Salesforce Platform Account, publish Solace DriverUpserted

jaeger_services

  1. Click on the "Find Traces" button on the bottom of the left-hand column.
  2. In the results, you should see the trace of the microservice, which should contain one span. single_span

✔️ You've created the third part of the solution! check3

Code Walk Through (optional)

  1. The code creates a trace factory, using the default TracerProvider. The code creates the actual tracer later on using this factory.
  2. Then we create the Jaeger exporter, which contains the details on how to connect to Jaeger. As you can see, it's running locally on port 6831.
    trace.set_tracer_provider(TracerProvider())
    jaeger_exporter = jaeger.JaegerSpanExporter(
    	service_name="<Boomi> Listen for Salesforce Platform Account, publish Solace DriverUpserted",
    	agent_host_name="localhost",
    	agent_port=6831,
    )
    
  3. Now that there is connectivity to Jaeger, the code specifies that the spans we create will be sent in batches using the BatchExportSpanProcessor.
    trace.get_tracer_provider().add_span_processor(
    	BatchExportSpanProcessor(jaeger_exporter)
    )
    
  4. Using the trace factory that you created above, you get a tracer instance, called tracer. Using tracer, you create a span, which describes what happens in this microservice.
    tracer = trace.get_tracer(__name__)
    # THIS IS PER https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/messaging.md
    parentSpan = tracer.start_span(
    	"RideUpdated send",
    	kind=SpanKind.PRODUCER,
    	attributes={
            "messaging.system": "solace",
            "messaging.destination": "RideUpdated send",
            "messaging.destination_kind": "topic",
            "messaging.protocol": "jcsmp",
            "messaging.protocol_version": "1.0",
            "messaging.url": "url of solace box"}
    )
    
  5. Now that the span is established, you need to get the span's trace_id and the span_id
    trace_id = parentSpan.get_context().trace_id
    span_id = parentSpan.get_context().span_id
    
  6. As the script publishes the event, it includes the trace id and the span id as two headers in the message. Downstream consumers now have access to them.
    outbound_msg = OutboundMessage.builder() \
    	.with_property("trace_id", str(trace_id)) \
    	.with_property("span_id", str(span_id)) \
    	.build("Hello World! This is a message published from Python!")
    
  7. Finally, end the parentSpan, so it will be sent to Jaeger
    parentSpan.end()
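
One detail worth knowing when you compare logs with the Jaeger UI: the scripts print and send the IDs as decimal integers, while Jaeger displays trace IDs in hexadecimal. The sketch below, which assumes nothing beyond the Python standard library, mirrors the `str(...)` serialization from step 6 and shows one way to convert the same IDs into the zero-padded hex form. The helper names are hypothetical; the ID values are copied from the sample log output above.

```python
def build_trace_headers(trace_id: int, span_id: int) -> dict:
    """Serialize the IDs as decimal strings, mirroring the str(...) calls in step 6."""
    return {"trace_id": str(trace_id), "span_id": str(span_id)}


def as_hex_ids(trace_id: int, span_id: int) -> tuple:
    """Format IDs as zero-padded hex: 32 digits (128-bit trace ID), 16 digits (64-bit span ID)."""
    return format(trace_id, "032x"), format(span_id, "016x")


# IDs taken from the sample log output in this CodeLab.
headers = build_trace_headers(18915100849980568506040033268233107768, 16251617562995641485)
trace_hex, span_hex = as_hex_ids(int(headers["trace_id"]), int(headers["span_id"]))
print(headers)
print(trace_hex, span_hex)
```

Searching the Jaeger UI for the hex trace ID is usually quicker than browsing by service name when you already have a log line in hand.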
    

Now that "Salesforce" is publishing an event to Solace, you need to get the "REST" consumer and the "database" consumers up and running. It's basically the same procedure as before.

  1. In a new terminal, navigate to the directory where the code resides
  2. Activate your virtual environment by executing source venv/bin/activate on macOS or source venv/Scripts/activate on Windows
  3. Execute solace_telemetry_consumer_Database.py, passing the correct environment variables
    SOLACE_HOST=<host_name> SOLACE_VPN=<vpn_name> SOLACE_USERNAME=<username> SOLACE_PASSWORD=<password> python solace_telemetry_consumer_Database.py
    
  4. You should see something like:
    2020-08-26 08:32:20,449 [INFO] pysolace.messaging.core: [solace_session.py:470]  ESTABLISH SESSION ON HOST [tcp://mr-d8f4yze27kt.messaging.solace.cloud:55555]
    Execute Direct Consume - String
    Subscribed to: opentelemetry/helloworld
    
  5. The database consumer listens for incoming messages until it receives a keyboard interrupt. At that point you'll see a message that says: Process finished with exit code 0

🔁 Repeat this process with solace_telemetry_consumer_REST.py, so you have two consumers running at the same time.

✔️ You've created the entire solution! check5

Code Walk Through (optional)

Much of the code here is either:

We'll focus on what's different, which is mainly extracting the span_id and trace_id from the incoming message and using them to start a new span, specifying the publisher's span as the parent span.

  1. When an event arrives on the opentelemetry/helloworld topic, the direct_message_handler(msg) function is called.
  2. The function first extracts the span id and the trace id from the event
    trace_id = str(msg.get_property("trace_id"))
    span_id = str(msg.get_property("span_id"))
    print("parentSpan trace_id on receiver side:" + trace_id)
    print("parentSpan span_id on receiver side:" + span_id)
    
  3. Using the span_id and trace_id, it then recreates the publisher's context
    propagated_context = SpanContext(int(trace_id), int(span_id), True)
    
  4. Using the recreated context, it then opens a new span, specifying the parent span as the publisher. This allows the linkage between the publisher and the consumer.
    childSpan = tracer.start_span("RideUpdated receive", parent=propagated_context)
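
The walkthrough code assumes both headers are always present. If a message arrives without them, `str(None)` becomes the string "None" and the later `int(...)` call raises a ValueError. A more defensive consumer might validate the headers first and fall back to starting a root span. The following is a minimal sketch of that idea using only the standard library; `parse_parent_ids` is a hypothetical helper, and a plain dict stands in for the message properties.

```python
from typing import Mapping, Optional, Tuple


def parse_parent_ids(properties: Mapping[str, object]) -> Optional[Tuple[int, int]]:
    """Return (trace_id, span_id) as ints, or None if the headers are missing or malformed."""
    try:
        trace_id = int(str(properties["trace_id"]))
        span_id = int(str(properties["span_id"]))
    except (KeyError, ValueError):
        return None
    # Zero is not a valid ID in OpenTelemetry; also enforce the 128-bit and 64-bit ranges.
    if not (0 < trace_id < 2**128 and 0 < span_id < 2**64):
        return None
    return trace_id, span_id


# Example with the IDs from the sample logs, plus a message that carries no headers.
good = {"trace_id": "18915100849980568506040033268233107768", "span_id": "16251617562995641485"}
print(parse_parent_ids(good))
print(parse_parent_ids({}))
```

When the result is None, the consumer could start a span without a parent, so the event is still traced even though it cannot be linked back to the publisher.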
    

Now all the pieces are in place. To see the end-to-end solution:

  1. Run the two consumers in two separate terminal instances
  2. Run the publisher in another terminal instance
  3. As soon as you run the publisher, you should see log entries in each of the consumers showing that the events were received. The logs should look like:
    CALLBACK: Message Received on Topic: opentelemetry/helloworld.
    Message String: Hello World! This is a message published from Python!
    
    parentSpan trace_id on receiver side:18915100849980568506040033268233107768
    parentSpan span_id on receiver side:16251617562995641485
    
  4. Head back to the Jaeger UI one more time.
  5. You should still see a service listed with the name: Listen for Salesforce Platform Account, publish Solace DriverUpserted. Click on the "Find Traces" button again to make sure you have the latest results.
  6. There should be a result that includes all three spans that you created jaeger_3_span.png
  7. Clicking on the result gives you a Flame Graph, showing exactly how long each process took and how the processes are related jaeger_detail.png
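
To get a feel for how the three spans end up nested in that view, here is a toy model of the parent/child relationship. It is only a sketch: the span IDs and durations are made up for illustration, the record layout is hypothetical, and it is not Jaeger's actual data format.

```python
from collections import defaultdict

# Hypothetical span records: (span_id, parent_id, name, duration_ms).
spans = [
    ("a1", None, "RideUpdated send", 12),
    ("b2", "a1", "RideUpdated receive (database)", 30),
    ("c3", "a1", "RideUpdated receive (REST)", 45),
]


def render_tree(spans):
    """Group spans by parent and return indented lines, one per span."""
    children = defaultdict(list)
    for span_id, parent_id, name, duration in spans:
        children[parent_id].append((span_id, name, duration))
    lines = []

    def walk(parent_id, depth):
        for span_id, name, duration in children[parent_id]:
            lines.append("  " * depth + f"{name} ({duration} ms)")
            walk(span_id, depth + 1)

    walk(None, 0)  # Root spans are the ones with no parent.
    return lines


print("\n".join(render_tree(spans)))
```

The two consumer spans sit under the publisher span because each one was started with the publisher's span as its parent.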

✔️ With event-driven integration, real-time data can reach across your enterprise faster

✔️ Event-driven integration also loosely couples publishers and consumers of information, which lets you create innovative solutions faster

✔️ But that loose coupling means that you need to think carefully about how to trace events going through your enterprise

✔️ OpenTelemetry gives you visibility into the movement of events

Thanks for participating in this codelab! Let us know what you thought in the Solace Community Forum! If you found any issues along the way we'd appreciate it if you'd raise them by clicking the Report a mistake button at the bottom left of this codelab.