The Wire: Instrumenting Custom RPC Interfaces

One of TraceView’s key strengths is its ability to follow requests from tier to tier inside of an application, helping track down tricky bottlenecks and problematic interactions in a distributed system. Out of the box, TraceView supports a number of common RPC protocols/frameworks like HTTP, Thrift, and WCF.

But what if you’re using something else for RPC? As it turns out, all the APIs our instrumentation team uses to build automatic support for RPC protocols are available to our customers as well. In this article, I walk through the theory behind instrumenting an RPC transport, using ZeroMQ as an example.

Distributed Tracing: It’s All About the Context

The basic mechanism of distributed tracing involves following a single request transaction throughout its lifespan, across tiers of the application. To do this, a unique identifier for the transaction is propagated along with the request; this identifier is referred to as the trace context.

TraceView is based on the X-Trace methodology (if you want to learn more, check out TR’s great blog post on X-Trace fundamentals), so our context, when serialized for transmission, will have this structure:

1B17186D2DEF7747D6628ADD3BC6CCF4DABD675864590FCF50DC0E27EE

[header]         [trace_id]               [event_id]

The header identifies the protocol with certain flags (this will in almost all cases be “1B”), the trace_id is the global identifier of this transaction, and the event_id is the identifier of the immediately preceding traced event in the transaction flow, to link the events causally.  (If the event_id seems unnecessary, recognize that it helps to sort out traces that cross hosts with a lot of clock skew, for instance.)
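
To make the layout concrete, here is a minimal sketch that splits the example ID above into its three fields. The field widths (2 hex characters of header, 40 of trace_id, 16 of event_id) are an assumption inferred from the 58-character example, and `parse_context` and `TraceContext` are hypothetical names for illustration, not part of the TraceView API.

```python
import string
from collections import namedtuple

# Hypothetical container for the three fields; not part of any TraceView library.
TraceContext = namedtuple('TraceContext', ['header', 'trace_id', 'event_id'])

# Assumed field widths in hex characters (2 + 40 + 16 = 58).
HEADER_LEN, TRACE_ID_LEN, EVENT_ID_LEN = 2, 40, 16


def parse_context(serialized):
    """Split a serialized context string into header, trace_id and event_id."""
    expected = HEADER_LEN + TRACE_ID_LEN + EVENT_ID_LEN
    if len(serialized) != expected:
        raise ValueError('expected %d hex characters, got %d'
                         % (expected, len(serialized)))
    if any(ch not in string.hexdigits for ch in serialized):
        raise ValueError('context must contain only hex characters')
    trace_end = HEADER_LEN + TRACE_ID_LEN
    return TraceContext(
        header=serialized[:HEADER_LEN],
        trace_id=serialized[HEADER_LEN:trace_end],
        event_id=serialized[trace_end:],
    )


example = parse_context(
    '1B17186D2DEF7747D6628ADD3BC6CCF4DABD675864590FCF50DC0E27EE')
print(example.header)
print(example.trace_id)
print(example.event_id)
```

Two contexts from the same transaction would share a trace_id and differ only in event_id, which is what lets events from different hosts be stitched into one trace.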

To make the trace connect both sides, the following four things must happen:

  • The outgoing request from the client side must encode its current context and send it to the remote side with the RPC call.
    • In HTTP(S), this takes the form of an X-Trace header.
    • This should be an extension that’s backwards-compatible with the uninstrumented protocol; otherwise it has the potential to break things.
  • Remote-side instrumentation must check for an incoming context and, if one is provided, apply it as the current context.
  • When the remote work is done, the remote side must encode its current context (modified by any events sent on the remote side) and send it back with the RPC response payload.
  • The client side must then check for a returning trace context and, if one is found and the current request is being traced, apply it as the current context.
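
The four steps can be sketched without any real tracing library. Everything below is hypothetical: `ToyTracer` is a toy stand-in whose context is just a `trace_id:event_id` string, and the "RPC" is a plain function call. It only illustrates the order in which the context is encoded, applied, and handed back.

```python
import uuid


class ToyTracer:
    """Toy stand-in for a tracing library (hypothetical, illustration only)."""

    def __init__(self):
        self.current = None  # serialized context of the request in flight

    def start(self):
        self.current = '%s:%s' % (uuid.uuid4().hex, uuid.uuid4().hex[:8])

    def log_event(self):
        # A new event keeps the trace_id and gets a fresh event_id.
        trace_id = self.current.split(':')[0]
        self.current = '%s:%s' % (trace_id, uuid.uuid4().hex[:8])


def server_handle(request, tracer):
    # Step 2: apply the incoming context, if any, as the current context.
    incoming = request.get('context')
    if incoming is not None:
        tracer.current = incoming
        tracer.log_event()  # remote work produces new events
    response = {'payload': request['payload'].upper()}
    # Step 3: send the (modified) context back with the response.
    if incoming is not None:
        response['context'] = tracer.current
    return response


def client_call(payload, tracer, server_tracer):
    request = {'payload': payload}
    # Step 1: encode the current context into the outgoing request.
    if tracer.current is not None:
        request['context'] = tracer.current
    response = server_handle(request, server_tracer)
    # Step 4: adopt the returned context only if this request is traced.
    if tracer.current is not None and 'context' in response:
        tracer.current = response['context']
    return response['payload']


client = ToyTracer()
client.start()
before = client.current
reply = client_call('hello', client, ToyTracer())

untraced = ToyTracer()
untraced_reply = client_call('hello', untraced, ToyTracer())
```

After the traced call, the client's context carries the same trace_id as before but the event_id of the server's last event; the untraced call goes through with no context attached at all.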

(Figure: trace event diagram)

TraceView API Usage

You’ll want to use the following methods, available in slightly different forms across each of AppNeta’s supported languages:

  • log_entry(), log_exit(), log_method() – instrument a block of code (e.g. an RPC client call)
  • get_context() – serializes the current context to a string
  • start_trace() – starts a trace, or, if provided a context, continues a trace from that context within the current process
  • end_trace() – ends a trace within the current process

(At the end of this post you’ll find links to API documentation in each language.)

Hands-on with ZeroMQ

Let’s use a simple client/server example inspired by the ZeroMQ zguide.

To save some space, I won’t insert the original code here.  It’s a simple application that sends the string “Hello” from the client and receives the response “World” from the server.

Check out the code here: Client, Server.

The main challenge with ZeroMQ is that there is no easy way to attach our context metadata to each request/response. ZeroMQ is essentially acting as a socket, and in this example we’re just sending strings back and forth. So we’ll need to create an envelope that can hold both our trace context and the actual application payload. As a simple example, we’ll just pickle a dictionary with those two items:

packed = pickle.dumps({'TraceContext': str(oboe.Context.get_default()), 'Payload': 'Hello'})

Note that if you have to modify an RPC protocol in a non-backwards-compatible way like this, you’ll need to ensure that you are running the instrumented code in your entire system!  Protocols like HTTP manage to sidestep this problem by providing x-headers.
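
One way to soften this, at least on the receiving side, is to mark enveloped messages so that bare ones can still be recognized. The sketch below is an assumption rather than what this example does: `MAGIC`, `pack_message`, and `unpack_message` are hypothetical names, and it swaps pickle for JSON because unpickling data from an untrusted peer can execute arbitrary code.

```python
import json

MAGIC = b'XTRv1\n'  # hypothetical marker identifying an enveloped message


def pack_message(payload, trace_context=None):
    """Wrap a text payload and an optional trace context into bytes."""
    envelope = {'TraceContext': trace_context, 'Payload': payload}
    return MAGIC + json.dumps(envelope).encode('utf-8')


def unpack_message(raw):
    """Return (payload, trace_context); the context is None for bare messages."""
    if not raw.startswith(MAGIC):
        # Sent by an uninstrumented peer: the whole message is the payload.
        return raw.decode('utf-8'), None
    envelope = json.loads(raw[len(MAGIC):].decode('utf-8'))
    return envelope['Payload'], envelope.get('TraceContext')


context = '1B17186D2DEF7747D6628ADD3BC6CCF4DABD675864590FCF50DC0E27EE'
enveloped = unpack_message(pack_message('Hello', context))
bare = unpack_message(b'Hello')
```

This only helps an instrumented receiver cope with uninstrumented senders. An uninstrumented receiver would still see the marker and envelope as part of the payload, so the caveat above still applies in that direction.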

Here’s what we want to end up with:

(Figure: trace details)

Here’s our final, instrumented code:

Client

(gist: full code)

import pickle

import oboe


def instrumented_rpc(socket, message):
    """Example instrumented RPC method."""
    keys = {'IsService': True, 'RemoteURL': 'tcp://localhost:5556'}
    oboe.log_entry('RPC-Client', keys=keys)

    # Send the current trace context to the server alongside the payload.
    wrapper = pickle.dumps({'TraceContext': str(oboe.Context.get_default()), 'Payload': message})
    socket.send(wrapper)

    # Unwrap the reply to recover the payload and the server's updated context.
    wrapper = pickle.loads(socket.recv())
    reply = wrapper['Payload']

    # Continue the trace from the context the server returned.
    c = oboe.Context(wrapper['TraceContext'])
    c.set_as_default()
    oboe.log_exit('RPC-Client')

    return reply

Server

(gist: full code)

import pickle
import time

import oboe
import zmq


def run_server():
    print('Distributed tracing server: receive request, sleep, reply')

    zmq_context = zmq.Context()
    socket = zmq_context.socket(zmq.REP)
    socket.bind('tcp://*:5556')

    while True:
        # Wait for the next request from the client
        wrapper = pickle.loads(socket.recv())
        message = wrapper['Payload']

        # Begin backend trace based on upstream context
        keys = {'URL': '/server.py', 'Domain': 'localhost'}
        oboe.start_trace('Backend', keys=keys, xtr=wrapper['TraceContext'])

        print('Received request: %s' % (message,))

        # Do some 'work'
        time.sleep(1)
        reply = 'World'

        # End backend trace, reply back to client with the updated context
        trace_context = oboe.end_trace('Backend')
        wrapper = pickle.dumps({'TraceContext': str(trace_context), 'Payload': reply})
        socket.send(wrapper)

And the events it generates:

(Figure: X-Trace event)

Appendix: API Docs

Here are the API docs for reference:

Dan Kuebrich: Dan Kuebrich is a web performance geek, currently working on Application Performance Management at AppNeta. He was previously a founder of Tracelytics (acquired by AppNeta), and before that worked on AmieStreet/Songza.com.

View Comments

  • I am trying to do something similar with Node JS but am having trouble setting the context properly on the Server side. Do you have any examples of Node JS RPC implementations?