What is LogCorrelation?

While working on an article about CDI interceptors on producer methods I mention logCorrelation. I will not go into detail on this topic over at the other blog post as it would be simply too much over there. And this gives a great topic for a separate post anyway. And here we go…

So what is LogCorrelation?

Consider you have a more or less distributed application topology. You might have a server which does maintain customer data. There might be another box handling all the document archive, another one which holds the calculation kernel, etc etc.

Nowadays all people would say that are microservices. 8 years ago all people called it SOA. To be honest I GIVE A SHIT how some sales people name it as all this is around since much longer than I’m working in the industry (which is a whoopie 25 years already). It’s just modular applications talking with each other somehow. Sometimes via SOAP or REST, but maybe even via MessageQueue, shared database tables or file based with batches handling the passing over – to me it doesn’t matter much.

But for all those the problem is the same: Consider a user clicks on some button in his browser or fat client. This triggers an ‘application action’. And this single action might hit the first server, then this server pings another one, etc. Synchronous or asynchronous also doesn’t matter. This might go all over the place in your company and even externally.  At the end something happens and the user most times gets some response. And it is really, REALLY hard to tell what’s wrong and where it went wrong if something doesn’t work as expected or returns wrong results. Often you don’t even have a clue which servers were involved. And if your whole application starts to behave ‘laggy’ then you will have a hard time judging which system you need to tinker with.

Now how cool would it be if you could follow this single action over all the involved servers?

And this is exactly what logCorrelation does!

What is the trick?

The trick is really plain simple. Each ‘action’ gets an own unique logCorrelationId. That might be a UUID for example. The only requirement is that it’s ‘decently’ unique.

If a server gets a request then he checks if there was a logCorrelationId passed to him. If so, then he takes this id and stores it in a ThreadLocal. If there was no id passed, then this is a ‘new’ action and we generate a fresh logCorrelationId. Of course this logCorrelationId will also get set as e.g. HTTP header for all subsequent outgoing HTTP calls on this very thread.

Where do I show the logCorrealationId?

Our applications now all handle this logCorrelationId properly, but where can I look at it? What is the benefit of all this?

At my customers I mainly use Apache log4j as logging backend, (often with slf4j as API). The point is that only log4j (and logback, but with way worse performance) support a nice little feature called MDC which stands for Mapped Diagnostic Context.  It is basically a ThreadLocal<Map<String, String>> which will get logged out in each and every line you log out on this very thread.

This log4j feature can also be accessed via the slf4j API. E.g. in a javax.servlet.Filter

MDC.set("correlationId", logCorrelationId);
MDC.set("sessionId", httpSessionId);
MDC.set("userId", loggedInUser.getId());

For enabling it in the log output you need to configure a ConversionPattern:

<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
    <appender name="console" class="org.apache.log4j.ConsoleAppender">
        <param name="Target" value="System.out"/>
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{ISO8601} [%t] %X{sessionId} %X{userId} %X{correlationId} %-5p %c{2} %m%n"/>
        </layout>
    </appender>

If you logging is configured properly in your company and you funnel all back into log aggregation systems like ELK (OpenSource with commercial support offering) or Splunk (Commercial with limited free offering) then you can now simply follow a single action over all the various systems.

What about non-user requests?

Non user requests can sometimes even be filled with more information. At a few customers we use Camunda BPMN Suite (OpenSource with commercial support). The core has a Thread which basically polls the DB and fetches new tasks to execute from it. Those will then get ‘executed’ in a parallel thread. For those threads we intercept the Executor and fill the logCorrelationId with the camunda jobId which basically is a uuid starting with ‘cam-‘. So once a process task blows up we can exactly figure what went wrong – even on a different server.

Of course this trick is not limited to the process engine…

PS: how does my stuff look like?

Probably it’s also worth sharing my LogCorrelationUtil:

/**
 * Helper for log correlation.
 *
 * @author Mark Struberg
 */
public class LogCorrelationUtil {

    public static final String REQUEST_HEADER_CORRELATION_ID = "X_LOG_CORRELATION_ID";
    public static final String MDC_CORRELATION_ID = "correlationId";

    private LogCorrelationUtil() {
    }

    /**
     * Creates a new log correlation Id ONLY IF there is no existing one!.
     * Takes about 4 uS, because I use a much faster UUID algo
     *
     * @param logEnvironment prefix for the logCorrelationId if a new one has to be created. Determines the environment the uuid got created in.
     * @param existingLogCorrelationId oder {@code null} if there is none yet
     */
    public static String enforceLogCorrelationId(LogEnvironment logEnvironment, String existingLogCorrelationId) {
        if (existingLogCorrelationId != null && existingLogCorrelationId.length() > 0) {
            return existingLogCorrelationId;
        }
        ThreadLocalRandom random = ThreadLocalRandom.current();
        String uuid = new UUID(random.nextLong(), random.nextLong()).toString();

        if (logEnvironment != null) {
            StringBuilder sb = new StringBuilder(60);
            sb.append(logEnvironment);
            sb.append(uuid);
            uuid = sb.toString();
        }
        return uuid;
    }

    /**
     * @return the logCorrelationId for this thread or {@code null}
     */
    public static final String getCorrelationId() {
        return MDC.get(MDC_CORRELATION_ID);
    }

    /**
     * Set the given logCorrelationId for the current Thread.
     */
    public static final void setCorrelationId(String logCorrelationId) {
        MDC.put(MDC_CORRELATION_ID, logCorrelationId);
    }

    /**
     * Clears the logCorrelationId from the current Thread.
     * This method MUST be called at the end of each request 
     * to prevent mem leaks!
     */
    public static final void clearCorrelationId() {
        MDC.remove(MDC_CORRELATION_ID);
    }
}
Advertisements

About struberg
I'm an Apache Software Foundation member blogging about Java, µC, TheASF, OpenWebBeans, Maven, MyFaces, CODI, GIT, OpenJPA, TomEE, DeltaSpike, ...

One Response to What is LogCorrelation?

  1. Pingback: Applying interceptors to producer methods | Struberg's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: